[jira] [Updated] (SPARK-31698) NPE on big dataset plans

2020-05-13 Thread Viacheslav Tradunsky (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viacheslav Tradunsky updated SPARK-31698:
-
Environment: AWS EMR: 30 machines, 7TB RAM total.  (was: AWS EMR: 30 
machine, 7TB RAM total.)

> NPE on big dataset plans
> 
>
> Key: SPARK-31698
> URL: https://issues.apache.org/jira/browse/SPARK-31698
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.4
> Environment: AWS EMR: 30 machines, 7TB RAM total.
>Reporter: Viacheslav Tradunsky
>Priority: Major
> Attachments: Spark_NPE_big_dataset.log
>
>
> We have a big dataset whose plan contains 275 SQL operations and more than 
> 275 joins.
> On the terminal operation to write the data, the job fails with a 
> NullPointerException.
>  
> I understand that such a large number of operations might not be what Spark 
> is designed for, but a NullPointerException is not an ideal way to fail in 
> this case.
>  
> For more details, please see the attached stack trace.
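The report does not include the failing job code. As a rough sketch of the workload shape described (a few hundred chained joins followed by a terminal parquet write), assuming illustrative names, data, and an output path not taken from the report, it might look like:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical minimal repro sketch; all identifiers here are illustrative,
// not taken from the reporter's job.
object ManyJoinsRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("many-joins-repro").getOrCreate()
    import spark.implicits._

    val base = Seq((1, "a"), (2, "b")).toDF("id", "v0")

    // Build a plan with 275 joins. Because Spark is lazy, nothing runs until
    // the terminal write action below, which is where the report says the
    // NullPointerException surfaces.
    val joined = (1 to 275).foldLeft(base) { (df, i) =>
      df.join(base.withColumnRenamed("v0", s"v$i"), Seq("id"))
    }

    joined.write.parquet("/tmp/many_joins_out")
    spark.stop()
  }
}
```

For very deep plans like this, `Dataset.checkpoint()` (with a checkpoint directory configured) truncates the lineage and is a common way to keep the plan size manageable, though whether it avoids this particular NPE is not established by the report.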



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31698) NPE on big dataset plans

2020-05-13 Thread Viacheslav Tradunsky (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viacheslav Tradunsky updated SPARK-31698:
-
Environment: AWS EMR: 30 machine, 7TB RAM total.  (was: AWS EMR)






[jira] [Updated] (SPARK-31698) NPE on big dataset plans

2020-05-13 Thread Viacheslav Tradunsky (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viacheslav Tradunsky updated SPARK-31698:
-
Docs Text:   (was: org.apache.spark.SparkException: Job aborted.
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:198)
  at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:159)
  at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
  at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
  at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:156)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
  at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
  at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
  at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
  at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
  at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
  at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:566)
  at com.company.app.executor.spark.SparkDatasetGenerationJob.generateDataset(SparkDatasetGenerationJob.scala:51)
  at com.company.app.executor.spark.SparkDatasetGenerationJob.call(SparkDatasetGenerationJob.scala:82)
  at com.company.app.executor.spark.SparkDatasetGenerationJob.call(SparkDatasetGenerationJob.scala:11)
  at org.apache.livy.rsc.driver.BypassJob.call(BypassJob.java:40)
  at org.apache.livy.rsc.driver.BypassJob.call(BypassJob.java:27)
  at org.apache.livy.rsc.driver.JobWrapper.call(JobWrapper.java:64))

[jira] [Updated] (SPARK-31698) NPE on big dataset plans

2020-05-13 Thread Viacheslav Tradunsky (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viacheslav Tradunsky updated SPARK-31698:
-
Attachment: Spark_NPE_big_dataset.log



