[ https://issues.apache.org/jira/browse/SPARK-22019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16167725#comment-16167725 ]
Jen-Ming Chung edited comment on SPARK-22019 at 9/15/17 11:18 AM:
------------------------------------------------------------------

Hi [~client.test],

The schema inferred by {{sqc.read().json(stringdataset)}} is as follows:

{code}
root
 |-- id: long (nullable = true)
 |-- str: string (nullable = true)
{code}

However, in the POJO class {{SampleData.class}} the member {{id}} is declared as {{int}} instead of {{long}}, which causes the subsequent exception in your test case. If you change the type of {{id}} in {{SampleData.class}} to {{long}} and run the test case again, you can expect the following results:

{code}
+--------+
|     str|
+--------+
|everyone|
|everyone|
|everyone|
|   Hello|
|   Hello|
|   Hello|
+--------+

root
 |-- str: string (nullable = true)
{code}

As you can see, {{id}} is missing from the schema, so we need to add {{id}} and the corresponding getter and setter to {{SampleDataFlat}}:

{code}
class SampleDataFlat {
    ...
    long id;

    public long getId() {
        return id;
    }

    public void setId(long id) {
        this.id = id;
    }

    public SampleDataFlat(String str, long id) {
        this.str = str;
        this.id = id;
    }
    ...
}
{code}

Then you will get the following results:

{code}
+---+--------+
| id|     str|
+---+--------+
|  1|everyone|
|  2|everyone|
|  3|everyone|
|  1|   Hello|
|  2|   Hello|
|  3|   Hello|
+---+--------+

root
 |-- id: long (nullable = true)
 |-- str: string (nullable = true)
{code}

> JavaBean int type property
> ---------------------------
>
> Key: SPARK-22019
> URL: https://issues.apache.org/jira/browse/SPARK-22019
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.0
> Reporter: taiho choi
>
> When the type of SampleData's id is int, the following code generates errors; when it is long, it's OK.
>
> {code:java}
> @Test
> public void testDataSet2() {
>     ArrayList<String> arr = new ArrayList<>();
>     arr.add("{\"str\": \"everyone\", \"id\": 1}");
>     arr.add("{\"str\": \"Hello\", \"id\": 1}");
>
>     // 1. Read the array and convert it to a string dataset.
>     JavaRDD<String> data = sc.parallelize(arr);
>     Dataset<String> stringdataset = sqc.createDataset(data.rdd(), Encoders.STRING());
>     stringdataset.show(); // PASS
>
>     // 2. Convert the string dataset to a SampleData dataset.
>     Dataset<SampleData> df = sqc.read().json(stringdataset).as(Encoders.bean(SampleData.class));
>     df.show();        // PASS
>     df.printSchema(); // PASS
>
>     Dataset<SampleDataFlat> fad = df.flatMap(SampleDataFlat::flatMap, Encoders.bean(SampleDataFlat.class));
>     fad.show();       // ERROR
>     fad.printSchema();
> }
>
> public static class SampleData implements Serializable {
>     String str;
>     int id;
>
>     public String getStr() {
>         return str;
>     }
>
>     public void setStr(String str) {
>         this.str = str;
>     }
>
>     public int getId() {
>         return id;
>     }
>
>     public void setId(int id) {
>         this.id = id;
>     }
> }
>
> public static class SampleDataFlat {
>     String str;
>
>     public String getStr() {
>         return str;
>     }
>
>     public void setStr(String str) {
>         this.str = str;
>     }
>
>     public SampleDataFlat(String str, long id) {
>         this.str = str;
>     }
>
>     public static Iterator<SampleDataFlat> flatMap(SampleData data) {
>         ArrayList<SampleDataFlat> arr = new ArrayList<>();
>         arr.add(new SampleDataFlat(data.getStr(), data.getId()));
>         arr.add(new SampleDataFlat(data.getStr(), data.getId() + 1));
>         arr.add(new SampleDataFlat(data.getStr(), data.getId() + 2));
>         return arr.iterator();
>     }
> }
> {code}
>
> == Error message ==
> Caused by: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 38, Column 16: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 38, Column 16: No applicable constructor/method found for actual parameters "long"; candidates are: "public void SparkUnitTest$SampleData.setId(int)"
> /* 024 */   public java.lang.Object apply(java.lang.Object _i) {
> /* 025 */     InternalRow i = (InternalRow) _i;
> /* 026 */
> /* 027 */     final SparkUnitTest$SampleData value1 = false ? null : new SparkUnitTest$SampleData();
> /* 028 */     this.javaBean = value1;
> /* 029 */     if (!false) {
> /* 030 */
> /* 031 */
> /* 032 */       boolean isNull3 = i.isNullAt(0);
> /* 033 */       long value3 = isNull3 ? -1L : (i.getLong(0));
> /* 034 */
> /* 035 */       if (isNull3) {
> /* 036 */         throw new NullPointerException(((java.lang.String) references[0]));
> /* 037 */       }
> /* 038 */       javaBean.setId(value3);
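The mismatch above can be reproduced without Spark: the JSON reader infers {{id}} as {{long}}, and the generated code then calls the bean setter with a {{long}} argument, which Java will not implicitly narrow to {{int}}. A minimal plain-Java sketch of the corrected bean (class and field names follow the issue; everything else is illustrative, not Spark's actual codegen):

```java
// Corrected bean: id declared as long so it matches Spark's inferred schema.
// If the field were "int id" with "setId(int)", passing a long argument would
// not compile, which mirrors the reported error: No applicable
// constructor/method found for actual parameters "long".
public class SampleData {
    private String str;
    private long id; // long, not int: JSON integers are inferred as long

    public String getStr() { return str; }
    public void setStr(String str) { this.str = str; }
    public long getId() { return id; }
    public void setId(long id) { this.id = id; }

    public static void main(String[] args) {
        SampleData d = new SampleData();
        d.setStr("everyone");
        d.setId(1L); // a long value, as the generated deserializer would pass
        System.out.println(d.getId() + " " + d.getStr()); // prints "1 everyone"
    }
}
```

The same reasoning applies to {{SampleDataFlat}}: once its constructor and setter take {{long}}, the generated deserializer finds an applicable method and the flatMap step compiles.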