[ https://issues.apache.org/jira/browse/SPARK-22935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308194#comment-16308194 ]
Kazuaki Ishizaki commented on SPARK-22935:
------------------------------------------

[~jlaskowski] If you look at the schema of this Dataset, the type of {{timestamp}} is {{timestamp}}, not {{date}}. {{inferSchema}} always infers a value with a time component as {{timestamp}}. If you change the declaration of {{timestamp}} in the {{CDR}} class from {{java.sql.Date}} to {{java.sql.Timestamp}} as below, it works well.

{code}
Dataset<Row> df = spark
  .read()
  .format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .option("delimiter", ";")
  .csv("CDR_SAMPLE.csv");
df.printSchema();
Dataset<CDR> cdr = df
  .as(Encoders.bean(CDR.class));
cdr.printSchema();
Dataset<CDR> ds = cdr.filter((FilterFunction<CDR>) x -> (x.timestamp != null));
...

// result
root
 |-- timestamp: timestamp (nullable = true)
{code}

{code}
// CDR.java
public class CDR implements java.io.Serializable {
  // Declared as Timestamp so that it matches the inferred column type
  public java.sql.Timestamp timestamp;
  public java.sql.Timestamp getTimestamp() { return this.timestamp; }
  public void setTimestamp(java.sql.Timestamp timestamp) { this.timestamp = timestamp; }
}
{code}

> Dataset with Java Beans for java.sql.Date throws CompileException
> -----------------------------------------------------------------
>
>                 Key: SPARK-22935
>                 URL: https://issues.apache.org/jira/browse/SPARK-22935
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.1, 2.3.0
>            Reporter: Kazuaki Ishizaki
>
> The following code can throw an exception with or without whole-stage codegen.
> {code}
> public void SPARK22935() {
>   Dataset<CDR> cdr = spark
>     .read()
>     .format("csv")
>     .option("header", "true")
>     .option("inferSchema", "true")
>     .option("delimiter", ";")
>     .csv("CDR_SAMPLE.csv")
>     .as(Encoders.bean(CDR.class));
>   Dataset<CDR> ds = cdr.filter((FilterFunction<CDR>) x -> (x.timestamp != null));
>   long c = ds.count();
>   cdr.show(2);
>   ds.show(2);
>   System.out.println("cnt=" + c);
> }
> // CDR.java
> public class CDR implements java.io.Serializable {
>   public java.sql.Date timestamp;
>   public java.sql.Date getTimestamp() { return this.timestamp; }
>   public void setTimestamp(java.sql.Date timestamp) { this.timestamp = timestamp; }
> }
> // CDR_SAMPLE.csv
> timestamp
> 2017-10-29T02:37:07.815Z
> 2017-10-29T02:38:07.815Z
> {code}
> result
> {code}
> 12:17:10.352 ERROR org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 61, Column 70: No applicable constructor/method found for actual parameters "long"; candidates are: "public static java.sql.Date org.apache.spark.sql.catalyst.util.DateTimeUtils.toJavaDate(int)"
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 61, Column 70: No applicable constructor/method found for actual parameters "long"; candidates are: "public static java.sql.Date org.apache.spark.sql.catalyst.util.DateTimeUtils.toJavaDate(int)"
>         at org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:11821)
> ...
> {code}
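
For context on the "long" versus {{toJavaDate(int)}} mismatch in the log above, here is a minimal sketch. It assumes that Spark SQL represents a timestamp internally as a long count of microseconds since the epoch and a date as an int count of days since the epoch, which is what the candidate signature in the error message suggests. The class and method names ({{EpochRepresentation}}, {{toEpochMicros}}, {{toEpochDays}}) are hypothetical and use only {{java.time}}, not Spark.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, not part of Spark: illustrates the two value shapes
// that the generated code is assumed to be mixing up.
public class EpochRepresentation {

    // Timestamp-style value: microseconds since 1970-01-01T00:00:00Z, as a long.
    public static long toEpochMicros(Instant instant) {
        return ChronoUnit.MICROS.between(Instant.EPOCH, instant);
    }

    // Date-style value: whole days since 1970-01-01 (UTC), as an int.
    public static int toEpochDays(Instant instant) {
        return Math.toIntExact(ChronoUnit.DAYS.between(Instant.EPOCH, instant));
    }

    public static void main(String[] args) {
        // First data row of CDR_SAMPLE.csv from the report.
        Instant sample = Instant.parse("2017-10-29T02:37:07.815Z");

        long micros = toEpochMicros(sample);
        int days = toEpochDays(sample);

        System.out.println("micros since epoch (long): " + micros);
        System.out.println("days since epoch (int):    " + days);

        // The microsecond count does not fit in an int, so a method that
        // accepts only an int day count cannot be applied to it.
        System.out.println("fits in int: " + (micros <= Integer.MAX_VALUE));
    }
}
```

Under that assumption, a column inferred as {{timestamp}} carries the long value, while a bean field declared as {{java.sql.Date}} asks for a conversion that accepts only the int value. Declaring the field as {{java.sql.Timestamp}} avoids the mismatch.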