[jira] [Comment Edited] (SPARK-26862) assertion failed in ParquetRowConverter

2020-09-29 Thread Rahul Jaiswal (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203786#comment-17203786
 ] 

Rahul Jaiswal edited comment on SPARK-26862 at 9/29/20, 8:31 AM:
-

Is it resolved?

I am facing the same issue after a CDH upgrade.


was (Author: rahul.jaiswal):
Is it resolved?

I am facing the same issue.

> assertion failed in ParquetRowConverter
> ---
>
> Key: SPARK-26862
> URL: https://issues.apache.org/jira/browse/SPARK-26862
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Nan Zhu
>Priority: Major
>
> When I run the following query over an internal table (A and B are typed as 
> string, C is Array[String]):
> ```
> import org.apache.spark.sql.expressions.Window
> import org.apache.spark.sql.functions.{col, row_number}
>
> spark.read.table("table1")
>   .select(col("A"), col("B"), col("C"))
>   .withColumn("row_num",
>     row_number().over(Window.partitionBy(col("A")).orderBy(col("B").desc)))
> ```
> I received the following error:
> ```
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 730 
> in stage 12.0 failed 4 times, most recent failure: Lost task 730.3 in stage 
> 12.0 (TID 2468, hadoopworker650-dca1.prod.uber.internal, executor 88): 
> java.lang.AssertionError: assertion failed
> at scala.Predef$.assert(Predef.scala:156)
> at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter$ParquetArrayConverter.(ParquetRowConverter.scala:514)
> at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter.org$apache$spark$sql$execution$datasources$parquet$ParquetRowConverter$$newConverter(ParquetRowConverter.scala:318)
> at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter$$anonfun$7.apply(ParquetRowConverter.scala:190)
> at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter$$anonfun$7.apply(ParquetRowConverter.scala:185)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> at scala.collection.AbstractTraversable.map(Traversable.scala:104)
> at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter.(ParquetRowConverter.scala:185)
> at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter.org$apache$spark$sql$execution$datasources$parquet$ParquetRowConverter$$newConverter(ParquetRowConverter.scala:324)
> at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter$$anonfun$7.apply(ParquetRowConverter.scala:190)
> at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter$$anonfun$7.apply(ParquetRowConverter.scala:185)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> at scala.collection.AbstractTraversable.map(Traversable.scala:104)
> at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter.(ParquetRowConverter.scala:185)
> at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetRecordMaterializer.(ParquetRecordMaterializer.scala:43)
> at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetReadSupport.prepareForRead(ParquetReadSupport.scala:126)
> at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:204)
> at 
> org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:182)
> at 
> org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140)
> at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(ParquetFileFormat.scala:452)
> at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(ParquetFileFormat.scala:364)
> at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:124)
> at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:177)
> at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
> at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.
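The assertion fires while ParquetRowConverter builds the converter for the array column C (ParquetRowConverter.scala:514, inside ParquetArrayConverter), which suggests the physical Parquet encoding of that column is not what the reader expects; one common cause is a legacy two-level list layout produced by an older writer. A quick check is to print the physical schema of one of the table's data files. Below is a minimal Scala sketch using the parquet-hadoop API; the file path is a hypothetical placeholder.

```
// Diagnostic sketch: print the physical Parquet schema of one data file to
// see how the array column C is encoded. The path below is hypothetical.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.hadoop.ParquetFileReader
import org.apache.parquet.hadoop.util.HadoopInputFile

val inputFile = HadoopInputFile.fromPath(
  new Path("hdfs:///warehouse/table1/part-00000.parquet"),  // hypothetical
  new Configuration())

val reader = ParquetFileReader.open(inputFile)
try {
  // The standard 3-level list encoding prints roughly as:
  //   optional group C (LIST) { repeated group list { optional binary element (UTF8); } }
  // Legacy 2-level layouts (e.g. a bare "repeated binary C (UTF8)") take a
  // different code path in ParquetRowConverter.
  println(reader.getFooter.getFileMetaData.getSchema)
} finally {
  reader.close()
}
```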

[jira] [Commented] (SPARK-18496) java.lang.AssertionError: assertion failed

2020-09-29 Thread Rahul Jaiswal (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-18496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203785#comment-17203785
 ] 

Rahul Jaiswal commented on SPARK-18496:
---

I am facing the same issue. Has a solution been found?

> java.lang.AssertionError: assertion failed
> --
>
> Key: SPARK-18496
> URL: https://issues.apache.org/jira/browse/SPARK-18496
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.2
> Environment: 2.0.2 snapshot
>Reporter: Harish
>Priority: Major
>  Labels: bulk-closed
>
> I am getting this error when I store the estimates from Julia output into a 
> DataFrame and then call df.cache():
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> z:org.apache.spark.api.python.PythonRDD.runJob.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 
> in stage 177.0 failed 4 times, most recent failure: Lost task 0.3 in stage 
> 177.0 (TID 9722, 10.63.136.108): java.lang.AssertionError: assertion failed
>   at scala.Predef$.assert(Predef.scala:156)
>   at 
> org.apache.spark.storage.BlockInfo.checkInvariants(BlockInfoManager.scala:84)
>   at 
> org.apache.spark.storage.BlockInfo.readerCount_$eq(BlockInfoManager.scala:66)
>   at 
> org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:362)
>   at 
> org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:361)
>   at scala.Option.foreach(Option.scala:257)
>   at 
> org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:361)
>   at 
> org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:356)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at 
> org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:356)
>   at 
> org.apache.spark.storage.BlockManager.releaseAllLocksForTask(BlockManager.scala:646)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:281)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Driver stacktrace:
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1454)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1442)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1441)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>   at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1441)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
>   at scala.Option.foreach(Option.scala:257)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1667)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1622)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1611)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>   at 
> org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:632)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1890)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1903)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1916)
>   at org.apache.spark.api.python.PythonRDD$.runJob(PythonRDD.scala:441)
>   at org.apache.spark.api.python.PythonRDD.runJob(PythonRDD.scala)
>   at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>   at py4j.Gateway.invoke(Gateway.java:280)
>   at py4j.commands.A
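For reference, the failing pattern as described is simply cache-then-materialize on a DataFrame built from externally computed values. The reporter used PySpark (hence the py4j frames); a Scala equivalent of the pattern is sketched below with hypothetical data. The assertion is a lock-accounting invariant in BlockInfoManager, so this shows the shape of the workload rather than a deterministic reproduction.

```
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("cache-assert-shape")
  .master("local[4]")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-in for the estimates computed externally (in Julia).
val estimates = Seq((1L, 0.42), (2L, 0.57), (3L, 0.13)).toDF("id", "estimate")

// cache() only marks the plan for storage; the first action materializes the
// blocks through the block manager, which is where the reporter's
// java.lang.AssertionError surfaced (BlockInfoManager.checkInvariants).
estimates.cache()
estimates.count()

spark.stop()
```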