[jira] [Commented] (SPARK-29764) Error on Serializing POJO with java datetime property to a Parquet file

2019-11-12 Thread Felix Kizhakkel Jose (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972605#comment-16972605
 ] 

Felix Kizhakkel Jose commented on SPARK-29764:
--

Is the reproducer still hard to follow? Do I need to trim it further?

> Error on Serializing POJO with java datetime property to a Parquet file
> ---
>
> Key: SPARK-29764
> URL: https://issues.apache.org/jira/browse/SPARK-29764
> Project: Spark
>  Issue Type: Bug
>  Components: Java API, Spark Core, SQL
>Affects Versions: 2.4.4
>Reporter: Felix Kizhakkel Jose
>Priority: Major
> Attachments: SparkParquetSampleCode.docx
>
>
> Hello,
>  I have been doing a proof of concept for a data lake structure and analytics 
> using Apache Spark. 
>  When I add java.time.LocalDateTime/java.time.LocalDate properties to my 
> data model, serialization to Parquet starts failing.
>  *My Data Model:*
> {code:java}
> @Data
> public class Employee {
>     private UUID id = UUID.randomUUID();
>     private String name;
>     private int age;
>     private LocalDate dob;
>     private LocalDateTime startDateTime;
>     private String phone;
>     private Address address;
> }
> {code}
>  
>  *Serialization Snippet*
> {code:java}
> public void serialize() {
>     // this creates 100,000 employee objects
>     List<Employee> inputDataToSerialize = getInputDataToSerialize();
>     Encoder<Employee> employeeEncoder = Encoders.bean(Employee.class);
>     Dataset<Employee> employeeDataset = sparkSession.createDataset(
>             inputDataToSerialize, employeeEncoder);
>     employeeDataset.write()
>             .mode(SaveMode.Append)
>             .parquet("/Users/felix/Downloads/spark.parquet");
> }
> {code}
> *Exception Stack Trace:*
> {code}
> java.lang.IllegalStateException: Failed to execute CommandLineRunner
>     at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:803)
>     at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:784)
>     at org.springframework.boot.SpringApplication.afterRefresh(SpringApplication.java:771)
>     at org.springframework.boot.SpringApplication.run(SpringApplication.java:316)
>     at org.springframework.boot.SpringApplication.run(SpringApplication.java:1186)
>     at org.springframework.boot.SpringApplication.run(SpringApplication.java:1175)
>     at com.felix.Application.main(Application.java:45)
> Caused by: org.apache.spark.SparkException: Job aborted.
>     at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:198)
>     at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:170)
>     at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
>     at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
>     at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
>     at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:131)
>     at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:155)
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>     at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>     at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>     at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
>     at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
>     at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:676)
>     at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:78)
>     at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>     at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>     at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
>     at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:290)
>     at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
>     at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
>     at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:566)
>     at com.felix.SparkParquetSerializer.serialize(SparkParquetSerializer.java:24)
>     at com.felix.Application.run(Application.java:63)
>     at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:800)
>     ... 6 more
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure:
> {code}

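Stripped of the Spring Boot wrapper, the failure above appears to come down to the bean's java.time fields: Spark 2.4's bean encoder (Encoders.bean) does not seem to map java.time.LocalDate/LocalDateTime properties, while java.sql.Date and java.sql.Timestamp are supported bean field types. A minimal, JDK-only sketch of that workaround follows; the EmployeeRecord class and its helpers are hypothetical illustrations, not code from the attached sample:

```java
import java.sql.Date;
import java.sql.Timestamp;
import java.time.LocalDate;
import java.time.LocalDateTime;

// Hypothetical bean shape for the workaround: store java.sql types
// (which the Spark 2.4 bean encoder handles) and convert to/from
// java.time at the application boundary.
class EmployeeRecord {
    private Date dob;                // instead of LocalDate
    private Timestamp startDateTime; // instead of LocalDateTime

    public Date getDob() { return dob; }
    public void setDob(Date dob) { this.dob = dob; }
    public Timestamp getStartDateTime() { return startDateTime; }
    public void setStartDateTime(Timestamp t) { this.startDateTime = t; }

    // Convenience accessors for application code that prefers java.time.
    public LocalDate dobAsLocalDate() { return dob.toLocalDate(); }
    public LocalDateTime startAsLocalDateTime() { return startDateTime.toLocalDateTime(); }

    public static void main(String[] args) {
        EmployeeRecord e = new EmployeeRecord();
        e.setDob(Date.valueOf(LocalDate.of(1990, 1, 15)));
        e.setStartDateTime(Timestamp.valueOf(LocalDateTime.of(2019, 11, 1, 9, 30)));
        System.out.println(e.dobAsLocalDate());       // 1990-01-15
        System.out.println(e.startAsLocalDateTime()); // 2019-11-01T09:30
    }
}
```

A bean shaped like this should pass through Encoders.bean unchanged; only the getters/setters matter to the encoder, so the java.time convenience accessors can coexist with the java.sql fields.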
[jira] [Commented] (SPARK-29764) Error on Serializing POJO with java datetime property to a Parquet file

2019-11-12 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972223#comment-16972223
 ] 

Hyukjin Kwon commented on SPARK-29764:
--

Priority doesn't affect whether you get help; it is only related to release 
processes. Issues that are difficult to read, or that don't look easy to 
reproduce, often don't get much attention.
If I were you, I would make a minimized reproducer without unrelated 
information, rather than just copying and pasting the whole code.



[jira] [Commented] (SPARK-29764) Error on Serializing POJO with java datetime property to a Parquet file

2019-11-08 Thread Felix Kizhakkel Jose (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16970451#comment-16970451
 ] 

Felix Kizhakkel Jose commented on SPARK-29764:
--

How do I get help once the priority is reduced? It seems like no one is 
looking at this issue. Is there any SLA on getting help? [~apachespark]


[jira] [Commented] (SPARK-29764) Error on Serializing POJO with java datetime property to a Parquet file

2019-11-07 Thread Felix Kizhakkel Jose (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969301#comment-16969301
 ] 

Felix Kizhakkel Jose commented on SPARK-29764:
--

[~hyukjin.kwon] Could you please help me with this?


[jira] [Commented] (SPARK-29764) Error on Serializing POJO with java datetime property to a Parquet file

2019-11-06 Thread Felix Kizhakkel Jose (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968411#comment-16968411
 ] 

Felix Kizhakkel Jose commented on SPARK-29764:
--

[~hyukjin.kwon] Sorry, I didn't know Critical is reserved for committers. 
Please find attached a code sample for the issue I am describing, where the 
dob and startDateTime fields in the Employee object cause Spark to fail to 
persist to a Parquet file.
[^SparkParquetSampleCode.docx]
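If the dob and startDateTime fields singled out above are switched to java.sql.Date and java.sql.Timestamp as a workaround (an assumption on my part, not something the attachment prescribes), the conversion is lossless, which a small JDK-only check confirms:

```java
import java.sql.Date;
import java.sql.Timestamp;
import java.time.LocalDate;
import java.time.LocalDateTime;

class RoundTripCheck {
    public static void main(String[] args) {
        // java.sql.Timestamp carries nanosecond precision, so a
        // LocalDateTime survives the round-trip unchanged.
        LocalDateTime ldt = LocalDateTime.of(2019, 11, 6, 8, 45, 30, 123_456_789);
        Timestamp ts = Timestamp.valueOf(ldt);
        System.out.println(ts.toLocalDateTime().equals(ldt)); // true

        // java.sql.Date is day-precision, which matches LocalDate exactly.
        LocalDate ld = LocalDate.of(1985, 6, 21);
        System.out.println(Date.valueOf(ld).toLocalDate().equals(ld)); // true
    }
}
```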


[jira] [Commented] (SPARK-29764) Error on Serializing POJO with java datetime property to a Parquet file

2019-11-06 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968312#comment-16968312
 ] 

Hyukjin Kwon commented on SPARK-29764:
--

Can you show a self-contained reproducer? Given the code you provided, it's 
not clear what is wrong.


[jira] [Commented] (SPARK-29764) Error on Serializing POJO with java datetime property to a Parquet file

2019-11-06 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968306#comment-16968306
 ] 

Hyukjin Kwon commented on SPARK-29764:
--

Please don't set the priority to Critical or higher; those levels are usually reserved for committers.

> Error on Serializing POJO with java datetime property to a Parquet file
> ---
>
> Key: SPARK-29764
> URL: https://issues.apache.org/jira/browse/SPARK-29764
> Project: Spark
>  Issue Type: Bug
>  Components: Java API, Spark Core, SQL
>Affects Versions: 2.4.4
>Reporter: Felix Kizhakkel Jose
>Priority: Major
>
> Hello,
>  I have been doing a proof of concept for data lake structure and analytics 
> using Apache Spark. 
>  When I add a java java.time.LocalDateTime/java.time.LocalDate properties in 
> my data model, the serialization to Parquet start failing.
>  *My Data Model:*
> @Data
>  public class Employee
> { private UUID id = UUID.randomUUID(); private String name; private int age; 
> private LocalDate dob; private LocalDateTime startDateTime; private String 
> phone; private Address address; }
>  
>  *Serialization Snippet*
> {color:#0747a6}public void serialize(){color}
> {color:#0747a6}{ List inputDataToSerialize = 
> getInputDataToSerialize(); // this creates 100,000 employee objects 
> Encoder employeeEncoder = Encoders.bean(Employee.class); 
> Dataset employeeDataset = sparkSession.createDataset( 
> inputDataToSerialize, employeeEncoder ); employeeDataset.write() 
> .mode(SaveMode.Append) .parquet("/Users/felix/Downloads/spark.parquet"); 
> }{color}
>  *Exception Stack Trace:*
> java.lang.IllegalStateException: Failed to execute CommandLineRunner
>     at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:803)
>     at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:784)
>     at org.springframework.boot.SpringApplication.afterRefresh(SpringApplication.java:771)
>     at org.springframework.boot.SpringApplication.run(SpringApplication.java:316)
>     at org.springframework.boot.SpringApplication.run(SpringApplication.java:1186)
>     at org.springframework.boot.SpringApplication.run(SpringApplication.java:1175)
>     at com.felix.Application.main(Application.java:45)
> Caused by: org.apache.spark.SparkException: Job aborted.
>     at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:198)
>     at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:170)
>     at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
>     at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
>     at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
>     at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:131)
>     at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:155)
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>     at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>     at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>     at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
>     at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
>     at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:676)
>     at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:78)
>     at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>     at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>     at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
>     at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:290)
>     at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
>     at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
>     at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:566)
>     at com.felix.SparkParquetSerializer.serialize(SparkParquetSerializer.java:24)
>     at com.felix.Application.run(Application.java:63)
>     at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:800)
>     ... 6 more
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor drive
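[Editorial note on the reproducer above: Spark 2.4's Encoders.bean has built-in mappings for java.sql.Date (Parquet DATE) and java.sql.Timestamp (Parquet INT96), but none for java.time.LocalDate/LocalDateTime, which fall through to generic bean handling. One possible workaround, sketched here as a suggestion rather than something from this thread, is to keep the java.time fields internally but expose java.sql accessors to the encoder. The simplified Employee below is hypothetical and omits the reporter's other fields:]

```java
import java.sql.Date;
import java.sql.Timestamp;
import java.time.LocalDate;
import java.time.LocalDateTime;

public class Employee {
    // Internal representation stays java.time.
    private LocalDate dob;
    private LocalDateTime startDateTime;

    // The bean encoder only sees java.sql types, which it maps to
    // DateType/TimestampType instead of a nested struct.
    public Date getDob() {
        return dob == null ? null : Date.valueOf(dob);
    }
    public void setDob(Date d) {
        this.dob = d == null ? null : d.toLocalDate();
    }

    public Timestamp getStartDateTime() {
        return startDateTime == null ? null : Timestamp.valueOf(startDateTime);
    }
    public void setStartDateTime(Timestamp t) {
        this.startDateTime = t == null ? null : t.toLocalDateTime();
    }
}
```

With this shape, Encoders.bean(Employee.class) would infer date/timestamp columns for dob and startDateTime; the conversion itself needs no Spark classes and round-trips losslessly at the shown precision.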

[jira] [Commented] (SPARK-29764) Error on Serializing POJO with java datetime property to a Parquet file

2019-11-05 Thread Felix Kizhakkel Jose (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967800#comment-16967800
 ] 

Felix Kizhakkel Jose commented on SPARK-29764:
--

The Spark Schema generated for the POJO is:
message spark_schema {
  optional group address {
    optional binary city (UTF8);
    optional binary streetName (UTF8);
    optional group zip {
      required int32 ext;
      required int32 zip;
    }
  }
  required int32 age;
  optional group dob {
    optional group chronology {
      optional binary calendarType (UTF8);
      optional binary id (UTF8);
    }
    required int32 dayOfMonth;
    optional binary dayOfWeek (UTF8);
    required int32 dayOfYear;
    optional group era {
      required int32 value;
    }
    required boolean leapYear;
    optional binary month (UTF8);
    required int32 monthValue;
    required int32 year;
  }
  optional group id {
    required int64 leastSignificantBits;
    required int64 mostSignificantBits;
  }
  optional binary name (UTF8);
  optional binary phone (UTF8);
  optional group startDateTime {
    required int32 dayOfMonth;
    optional binary dayOfWeek (UTF8);
    required int32 dayOfYear;
    required int32 hour;
    required int32 minute;
    optional binary month (UTF8);
    required int32 monthValue;
    required int32 nano;
    required int32 second;
    required int32 year;
  }
}

Also, I don't know why it's not recognized as INT96 (TimestampType) or INT32 
(DateType); instead it's represented as a group. I don't know whether that's the 
reason I get a NegativeArraySizeException when I have a large data set to persist to 
Parquet. Any help is very much appreciated.
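[Editorial note: the group layout above follows from how Encoders.bean handles types it has no built-in mapping for — it falls back to JavaBeans introspection, so every readable no-arg getter on LocalDate becomes a struct field. A small sketch (not from this thread) that should reproduce the field list seen under dob:]

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.time.LocalDate;
import java.util.Set;
import java.util.TreeSet;

public class BeanProps {
    // Collect readable JavaBean property names, skipping the synthetic
    // "class" property contributed by Object.getClass().
    static Set<String> beanProperties(Class<?> cls) throws IntrospectionException {
        Set<String> names = new TreeSet<>();
        for (PropertyDescriptor pd : Introspector.getBeanInfo(cls).getPropertyDescriptors()) {
            if (pd.getReadMethod() != null && !"class".equals(pd.getName())) {
                names.add(pd.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IntrospectionException {
        // Should list chronology, dayOfMonth, dayOfWeek, dayOfYear, era,
        // leapYear, month, monthValue, year -- the nine fields of the
        // "dob" group in the schema above.
        System.out.println(beanProperties(LocalDate.class));
    }
}
```

Since LocalDate is just another bean from the encoder's point of view, there is nothing in the inferred schema to mark it as a date, which is why no DATE/INT96 logical type appears.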

> Error on Serializing POJO with java datetime property to a Parquet file
> ---
>
> Key: SPARK-29764
> URL: https://issues.apache.org/jira/browse/SPARK-29764
> Project: Spark
>  Issue Type: Bug
>  Components: Java API, Spark Core, SQL
>Affects Versions: 2.4.4
>Reporter: Felix Kizhakkel Jose
>Priority: Blocker
>