[jira] [Commented] (SPARK-29764) Error on Serializing POJO with java datetime property to a Parquet file
[ https://issues.apache.org/jira/browse/SPARK-29764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972605#comment-16972605 ] Felix Kizhakkel Jose commented on SPARK-29764:
--
Is the reproducer still hard to follow? Should I trim it down further?

> Error on Serializing POJO with java datetime property to a Parquet file
> ---
>
> Key: SPARK-29764
> URL: https://issues.apache.org/jira/browse/SPARK-29764
> Project: Spark
> Issue Type: Bug
> Components: Java API, Spark Core, SQL
> Affects Versions: 2.4.4
> Reporter: Felix Kizhakkel Jose
> Priority: Major
> Attachments: SparkParquetSampleCode.docx
>
> Hello,
> I have been doing a proof of concept for data lake structure and analytics using Apache Spark.
> When I add java.time.LocalDateTime/java.time.LocalDate properties to my data model, serialization to Parquet starts failing.
>
> *My Data Model:*
> {code:java}
> @Data
> public class Employee {
>     private UUID id = UUID.randomUUID();
>     private String name;
>     private int age;
>     private LocalDate dob;
>     private LocalDateTime startDateTime;
>     private String phone;
>     private Address address;
> }
> {code}
>
> *Serialization Snippet:*
> {code:java}
> public void serialize() {
>     // getInputDataToSerialize() creates 100,000 Employee objects
>     List<Employee> inputDataToSerialize = getInputDataToSerialize();
>     Encoder<Employee> employeeEncoder = Encoders.bean(Employee.class);
>     Dataset<Employee> employeeDataset = sparkSession.createDataset(inputDataToSerialize, employeeEncoder);
>     employeeDataset.write()
>         .mode(SaveMode.Append)
>         .parquet("/Users/felix/Downloads/spark.parquet");
> }
> {code}
>
> *Exception Stack Trace:*
> {noformat}
> java.lang.IllegalStateException: Failed to execute CommandLineRunner
>     at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:803)
>     at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:784)
>     at org.springframework.boot.SpringApplication.afterRefresh(SpringApplication.java:771)
>     at org.springframework.boot.SpringApplication.run(SpringApplication.java:316)
>     at org.springframework.boot.SpringApplication.run(SpringApplication.java:1186)
>     at org.springframework.boot.SpringApplication.run(SpringApplication.java:1175)
>     at com.felix.Application.main(Application.java:45)
> Caused by: org.apache.spark.SparkException: Job aborted.
>     at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:198)
>     at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:170)
>     at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
>     at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
>     at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
>     at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:131)
>     at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:155)
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>     at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>     at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>     at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
>     at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
>     at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:676)
>     at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:78)
>     at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>     at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>     at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
>     at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:290)
>     at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
>     at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
>     at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:566)
>     at com.felix.SparkParquetSerializer.serialize(SparkParquetSerializer.java:24)
>     at com.felix.Application.run(Application.java:63)
>     at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:800)
>     ... 6 more
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure:
> {noformat}
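[Editorial note, not part of the original thread.] The failure pattern above is consistent with the bean encoder in Spark 2.4 having no mapping for java.time types, whereas it does map java.sql.Date and java.sql.Timestamp to proper DATE/TIMESTAMP columns. A commonly suggested workaround, sketched here under that assumption (the class shape mirrors the reporter's Employee; the converter names are illustrative, not from the thread), is to expose java.sql types on the bean and convert at the boundaries:

```java
import java.sql.Date;
import java.sql.Timestamp;
import java.time.LocalDate;
import java.time.LocalDateTime;

// Hypothetical variant of the reporter's bean: the properties Spark sees
// are java.sql types (which Encoders.bean maps to date/timestamp columns),
// while application code keeps using java.time via the converters below.
public class Employee {
    private Date dob;                // intended to become a DATE column
    private Timestamp startDateTime; // intended to become a TIMESTAMP column

    public Date getDob() { return dob; }
    public void setDob(Date dob) { this.dob = dob; }
    public Timestamp getStartDateTime() { return startDateTime; }
    public void setStartDateTime(Timestamp t) { this.startDateTime = t; }

    // Boundary converters; no get/set prefix, so bean introspection
    // does not treat them as additional properties.
    public LocalDate dobAsLocalDate() { return dob.toLocalDate(); }
    public void dobFrom(LocalDate d) { this.dob = Date.valueOf(d); }
    public LocalDateTime startAsLocalDateTime() { return startDateTime.toLocalDateTime(); }
    public void startFrom(LocalDateTime t) { this.startDateTime = Timestamp.valueOf(t); }
}
```

With this shape, the write path in the snippet above should no longer hit the java.time fields; on newer Spark versions, which added java.time support to the encoders, the converters become unnecessary.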
[jira] [Commented] (SPARK-29764) Error on Serializing POJO with java datetime property to a Parquet file
[ https://issues.apache.org/jira/browse/SPARK-29764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972223#comment-16972223 ] Hyukjin Kwon commented on SPARK-29764:
--
Priority doesn't affect whether an issue gets help; it is only related to the release process. Issues that are difficult to read, or that don't look clear to reproduce, often don't get much attention. If I were you, I would put together a minimized reproducer without the unrelated information, rather than copy-pasting the whole code.

> Error on Serializing POJO with java datetime property to a Parquet file
> [...]
[jira] [Commented] (SPARK-29764) Error on Serializing POJO with java datetime property to a Parquet file
[ https://issues.apache.org/jira/browse/SPARK-29764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16970451#comment-16970451 ] Felix Kizhakkel Jose commented on SPARK-29764:
--
How do I get help now that the priority has been reduced? It seems no one is looking at this issue. Is there any SLA on getting help? [~apachespark]

> Error on Serializing POJO with java datetime property to a Parquet file
> [...]
[jira] [Commented] (SPARK-29764) Error on Serializing POJO with java datetime property to a Parquet file
[ https://issues.apache.org/jira/browse/SPARK-29764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969301#comment-16969301 ] Felix Kizhakkel Jose commented on SPARK-29764:
--
[~hyukjin.kwon] Could you please help me with this?

> Error on Serializing POJO with java datetime property to a Parquet file
> [...]
[jira] [Commented] (SPARK-29764) Error on Serializing POJO with java datetime property to a Parquet file
[ https://issues.apache.org/jira/browse/SPARK-29764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968411#comment-16968411 ] Felix Kizhakkel Jose commented on SPARK-29764:
--
[~hyukjin.kwon] Sorry, I didn't know Critical is for committers. Please find the attached code sample for the issue I am describing, where the dob and startDateTime fields in the Employee object cause Spark to fail to persist to a Parquet file. [^SparkParquetSampleCode.docx]

> Error on Serializing POJO with java datetime property to a Parquet file
> [...]
[jira] [Commented] (SPARK-29764) Error on Serializing POJO with java datetime property to a Parquet file
[ https://issues.apache.org/jira/browse/SPARK-29764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968312#comment-16968312 ] Hyukjin Kwon commented on SPARK-29764:
--
Can you show a self-contained reproducer? From the code you provided, it's not clear where the problem is.

> Error on Serializing POJO with java datetime property to a Parquet file
> [...]
[jira] [Commented] (SPARK-29764) Error on Serializing POJO with java datetime property to a Parquet file
[ https://issues.apache.org/jira/browse/SPARK-29764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968306#comment-16968306 ] Hyukjin Kwon commented on SPARK-29764: -- Please don't set the priority to Critical or higher; those levels are usually reserved for committers.
> Error on Serializing POJO with java datetime property to a Parquet file
> ---
>
> Key: SPARK-29764
> URL: https://issues.apache.org/jira/browse/SPARK-29764
> Project: Spark
> Issue Type: Bug
> Components: Java API, Spark Core, SQL
> Affects Versions: 2.4.4
> Reporter: Felix Kizhakkel Jose
> Priority: Major
>
> Hello,
> I have been doing a proof of concept for a data lake structure and analytics using Apache Spark.
> When I add java.time.LocalDateTime/java.time.LocalDate properties to my data model, serialization to Parquet starts failing.
>
> *My Data Model:*
> @Data
> public class Employee {
>     private UUID id = UUID.randomUUID();
>     private String name;
>     private int age;
>     private LocalDate dob;
>     private LocalDateTime startDateTime;
>     private String phone;
>     private Address address;
> }
>
> *Serialization Snippet:*
> public void serialize() {
>     List<Employee> inputDataToSerialize = getInputDataToSerialize(); // this creates 100,000 Employee objects
>     Encoder<Employee> employeeEncoder = Encoders.bean(Employee.class);
>     Dataset<Employee> employeeDataset = sparkSession.createDataset(inputDataToSerialize, employeeEncoder);
>     employeeDataset.write()
>         .mode(SaveMode.Append)
>         .parquet("/Users/felix/Downloads/spark.parquet");
> }
>
> *Exception Stack Trace:*
> java.lang.IllegalStateException: Failed to execute CommandLineRunner
>     at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:803)
>     at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:784)
>     at org.springframework.boot.SpringApplication.afterRefresh(SpringApplication.java:771)
>     at org.springframework.boot.SpringApplication.run(SpringApplication.java:316)
>     at org.springframework.boot.SpringApplication.run(SpringApplication.java:1186)
>     at org.springframework.boot.SpringApplication.run(SpringApplication.java:1175)
>     at com.felix.Application.main(Application.java:45)
> Caused by: org.apache.spark.SparkException: Job aborted.
>     at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:198)
>     at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:170)
>     at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
>     at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
>     at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
>     at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:131)
>     at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:155)
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>     at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>     at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>     at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
>     at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
>     at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:676)
>     at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:78)
>     at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>     at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>     at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
>     at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:290)
>     at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
>     at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
>     at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:566)
>     at com.felix.SparkParquetSerializer.serialize(SparkParquetSerializer.java:24)
>     at com.felix.Application.run(Application.java:63)
>     at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:800)
>     ... 6 more
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver
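For context on the failure mode: in Spark 2.4, `Encoders.bean` maps `java.sql.Date` and `java.sql.Timestamp` to Catalyst `DateType`/`TimestampType`, but it does not give `java.time.LocalDate`/`LocalDateTime` that treatment and instead reflects over them as nested beans. One possible workaround, sketched below under the assumption that the data model can be changed to expose `java.sql` types (the `TimeBridge` helper is hypothetical, not part of the reporter's code), is to bridge the `java.time` values at the POJO boundary:

```java
import java.sql.Date;
import java.sql.Timestamp;
import java.time.LocalDate;
import java.time.LocalDateTime;

public class TimeBridge {
    // java.sql.Date is a type the bean encoder recognizes as DateType.
    public static Date toSqlDate(LocalDate d) {
        return Date.valueOf(d);
    }

    // java.sql.Timestamp is recognized as TimestampType.
    public static Timestamp toSqlTimestamp(LocalDateTime t) {
        return Timestamp.valueOf(t);
    }

    public static void main(String[] args) {
        // prints 2019-11-06
        System.out.println(toSqlDate(LocalDate.of(2019, 11, 6)));
        System.out.println(toSqlTimestamp(LocalDateTime.of(2019, 11, 6, 10, 30)));
    }
}
```

With getters returning `java.sql.Date`/`java.sql.Timestamp`, the bean encoder should produce proper date/timestamp columns instead of nested structs; the original `java.time` values can still back the fields internally.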
[jira] [Commented] (SPARK-29764) Error on Serializing POJO with java datetime property to a Parquet file
[ https://issues.apache.org/jira/browse/SPARK-29764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967800#comment-16967800 ] Felix Kizhakkel Jose commented on SPARK-29764: -- The Spark schema generated for the POJO is:
message spark_schema {
  optional group address {
    optional binary city (UTF8);
    optional binary streetName (UTF8);
    optional group zip {
      required int32 ext;
      required int32 zip;
    }
  }
  required int32 age;
  optional group dob {
    optional group chronology {
      optional binary calendarType (UTF8);
      optional binary id (UTF8);
    }
    required int32 dayOfMonth;
    optional binary dayOfWeek (UTF8);
    required int32 dayOfYear;
    optional group era {
      required int32 value;
    }
    required boolean leapYear;
    optional binary month (UTF8);
    required int32 monthValue;
    required int32 year;
  }
  optional group id {
    required int64 leastSignificantBits;
    required int64 mostSignificantBits;
  }
  optional binary name (UTF8);
  optional binary phone (UTF8);
  optional group startDateTime {
    required int32 dayOfMonth;
    optional binary dayOfWeek (UTF8);
    required int32 dayOfYear;
    required int32 hour;
    required int32 minute;
    optional binary month (UTF8);
    required int32 monthValue;
    required int32 nano;
    required int32 second;
    required int32 year;
  }
}
Also, I don't know why the date/time fields are not recognized as int96 (TimestampType) or int32 (DateType); instead each is represented as a group. I don't know whether that is the reason I get a NegativeArraySizeException when I persist a large data set to Parquet. Any help is very much appreciated.
> Error on Serializing POJO with java datetime property to a Parquet file
> ---
>
> Key: SPARK-29764
> URL: https://issues.apache.org/jira/browse/SPARK-29764
> Project: Spark
> Issue Type: Bug
> Components: Java API, Spark Core, SQL
> Affects Versions: 2.4.4
> Reporter: Felix Kizhakkel Jose
> Priority: Blocker
>
> Hello,
> I have been doing a proof of concept for a data lake structure and analytics using Apache Spark.
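A likely explanation for the `group` schema reported above: the bean encoder introspects the JavaBean properties of `java.time.LocalDate` and encodes each zero-argument getter as a struct field, which matches the field list under the `dob` group exactly. A JDK-only sketch of that introspection (the class name `BeanIntrospection` is illustrative, not Spark's actual code path):

```java
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.time.LocalDate;
import java.util.Arrays;

public class BeanIntrospection {
    // List the readable bean properties of a class; for LocalDate these are
    // chronology, dayOfMonth, dayOfWeek, dayOfYear, era, leapYear, month,
    // monthValue, year -- the same fields that appear under the "dob" group.
    public static String[] beanProperties(Class<?> cls) throws Exception {
        return Arrays.stream(Introspector.getBeanInfo(cls).getPropertyDescriptors())
                .map(PropertyDescriptor::getName)
                .filter(n -> !n.equals("class")) // drop Object.getClass()
                .sorted()
                .toArray(String[]::new);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(Arrays.toString(beanProperties(LocalDate.class)));
    }
}
```

This suggests the encoder is not special-casing `java.time` types at all, which would explain why the column is a nested struct rather than an int32 DATE or int96 timestamp.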