[jira] [Commented] (SPARK-20108) Spark query is getting failed with exception
[ https://issues.apache.org/jira/browse/SPARK-20108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15953196#comment-15953196 ]

Hyukjin Kwon commented on SPARK-20108:
--------------------------------------

It will help others like me to track down the problem and solve it.

> Spark query is getting failed with exception
> ---------------------------------------------
>
>                 Key: SPARK-20108
>                 URL: https://issues.apache.org/jira/browse/SPARK-20108
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.0.0
>            Reporter: ZS EDGE
>
> In our project we have implemented logic that programmatically generates
> Spark queries. These queries are executed as subqueries; below is a
> sample query:
>
> sqlContext.sql("INSERT INTO TABLE test_client_r2_r2_2_prod_db1_oz.S3_EMPDTL_Incremental_invalid
>     SELECT 'S3_EMPDTL_Incremental', S3_EMPDTL_Incremental.row_id,
>         S3_EMPDTL_Incremental.SOURCE_FILE_NAME, S3_EMPDTL_Incremental.SOURCE_ROW_ID,
>         'S3_EMPDTL_Incremental', '2017-03-22 20:18:59', '1',
>         'Emp_id#$Emp_name#$Emp_phone#$Emp_salary_in_K#$Emp_address_id#$Date_of_Birth#$Status#$Dept_id#$Date_of_joining#$Row_Number#$Dec_check#$',
>         'test', 'Y', 'N/A', '', ''
>     FROM S3_EMPDTL_Incremental_r AS S3_EMPDTL_Incremental
>     WHERE row_id IN (select row_id from s3_empdtl_incremental_r where row_id IN (42949672960))")
>
> While executing the above code in pyspark, it throws the exception below:
>
> FAILS>>
> org.apache.spark.SparkException: Task failed while writing rows
>         at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:261)
>         at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
>         at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>         at org.apache.spark.scheduler.Task.run(Task.scala:85)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
>         at org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.getValue(HashedRelation.scala:463)
>         at org.apache.spark.sql.execution.joins.LongHashedRelation.getValue(HashedRelation.scala:762)
>         at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
>         at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>         at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
>         at org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply$mcV$sp(WriterContainer.scala:253)
>         at org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply(WriterContainer.scala:252)
>         at org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply(WriterContainer.scala:252)
>         at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1325)
>         at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:258)
>         ... 8 more
> [Stage 32:=>  (10 + 5) / 26]
> 17/03/22 15:42:10 ERROR TaskSetManager: Task 4 in stage 32.0 failed 4 times; aborting job
> 17/03/22 15:42:10 ERROR InsertIntoHadoopFsRelationCommand: Aborting job.
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 4 in stage 32.0 failed 4 times, most recent failure: Lost task 4.3 in stage 32.0 (TID 857, ip-10-116-1-73.ec2.internal): org.apache.spark.SparkException: Task failed while writing rows
>         at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:261)
>         at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
>         at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
[jira] [Commented] (SPARK-20108) Spark query is getting failed with exception
[ https://issues.apache.org/jira/browse/SPARK-20108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15953195#comment-15953195 ]

Hyukjin Kwon commented on SPARK-20108:
--------------------------------------

It seems almost impossible for me to reproduce. Do you mind if I ask for a self-contained reproducer?
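For reference, a minimal self-contained sketch of the reported query pattern might look like the following. It uses the table and column names from the report but hypothetical single-row data and a hypothetical app name, targets the Spark 2.0-era SQLContext API, and strips the INSERT target down to a plain SELECT; whether it actually triggers the reported ArrayIndexOutOfBoundsException is not confirmed.

    # Sketch only: hypothetical data standing in for the reporter's tables.
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="SPARK-20108-sketch")  # hypothetical app name
    sqlContext = SQLContext(sc)

    # 42949672960 = 10 * 2^32, i.e. a value outside the 32-bit int range,
    # taken from the IN(...) literal in the reported query; the file name
    # and source row id are made up.
    sqlContext.createDataFrame(
        [(42949672960, "file1.csv", 1)],
        ["row_id", "SOURCE_FILE_NAME", "SOURCE_ROW_ID"],
    ).registerTempTable("s3_empdtl_incremental_r")

    # Same shape as the failing statement: a nested IN subquery filtering
    # on the long row_id column (the INSERT INTO target is omitted here).
    sqlContext.sql("""
        SELECT row_id, SOURCE_FILE_NAME, SOURCE_ROW_ID
        FROM s3_empdtl_incremental_r
        WHERE row_id IN (
            SELECT row_id FROM s3_empdtl_incremental_r
            WHERE row_id IN (42949672960)
        )
    """).show()

If this does not fail on its own, the full INSERT INTO ... SELECT against a real target table, as in the original report, may also be needed.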