[jira] [Comment Edited] (SPARK-26862) assertion failed in ParquetRowConverter
[ https://issues.apache.org/jira/browse/SPARK-26862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203786#comment-17203786 ] Rahul Jaiswal edited comment on SPARK-26862 at 9/29/20, 8:31 AM: - is it resolved? I am facing the same issue after CDH upgrade. was (Author: rahul.jaiswal): is it resolved? I am facing the same issue. > assertion failed in ParquetRowConverter > --- > > Key: SPARK-26862 > URL: https://issues.apache.org/jira/browse/SPARK-26862 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Nan Zhu >Priority: Major > > When I run the following query over a internal table (A and B are typed in > string, C is Array[String]) > ``` > spark.read.table("table1") > .select(col("A"), col("B"), col("C")) > .withColumn("row_num", > row_number().over(Window.partitionBy(col("A")).orderBy(col("B").desc))) > ``` > I received error as > > ``` > org.apache.spark.SparkException: Job aborted due to stage failure: Task 730 > in stage 12.0 failed 4 times, most recent failure: Lost task 730.3 in stage > 12.0 (TID 2468, hadoopworker650-dca1.prod.uber.internal, executor 88): > java.lang.AssertionError: assertion failed > at scala.Predef$.assert(Predef.scala:156) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter$ParquetArrayConverter.(ParquetRowConverter.scala:514) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter.org$apache$spark$sql$execution$datasources$parquet$ParquetRowConverter$$newConverter(ParquetRowConverter.scala:318) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter$$anonfun$7.apply(ParquetRowConverter.scala:190) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter$$anonfun$7.apply(ParquetRowConverter.scala:185) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.AbstractTraversable.map(Traversable.scala:104) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter.(ParquetRowConverter.scala:185) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter.org$apache$spark$sql$execution$datasources$parquet$ParquetRowConverter$$newConverter(ParquetRowConverter.scala:324) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter$$anonfun$7.apply(ParquetRowConverter.scala:190) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter$$anonfun$7.apply(ParquetRowConverter.scala:185) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.AbstractTraversable.map(Traversable.scala:104) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter.(ParquetRowConverter.scala:185) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetRecordMaterializer.(ParquetRecordMaterializer.scala:43) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetReadSupport.prepareForRead(ParquetReadSupport.scala:126) > at > org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:204) > at > org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:182) > at > org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(ParquetFileFormat.scala:452) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(ParquetFileFormat.scala:364) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:124) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:177) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$Generat
[jira] [Commented] (SPARK-26862) assertion failed in ParquetRowConverter
[ https://issues.apache.org/jira/browse/SPARK-26862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203786#comment-17203786 ] Rahul Jaiswal commented on SPARK-26862: --- is it resolved? I am facing the same issue. > assertion failed in ParquetRowConverter > --- > > Key: SPARK-26862 > URL: https://issues.apache.org/jira/browse/SPARK-26862 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Nan Zhu >Priority: Major > > When I run the following query over a internal table (A and B are typed in > string, C is Array[String]) > ``` > spark.read.table("table1") > .select(col("A"), col("B"), col("C")) > .withColumn("row_num", > row_number().over(Window.partitionBy(col("A")).orderBy(col("B").desc))) > ``` > I received error as > > ``` > org.apache.spark.SparkException: Job aborted due to stage failure: Task 730 > in stage 12.0 failed 4 times, most recent failure: Lost task 730.3 in stage > 12.0 (TID 2468, hadoopworker650-dca1.prod.uber.internal, executor 88): > java.lang.AssertionError: assertion failed > at scala.Predef$.assert(Predef.scala:156) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter$ParquetArrayConverter.(ParquetRowConverter.scala:514) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter.org$apache$spark$sql$execution$datasources$parquet$ParquetRowConverter$$newConverter(ParquetRowConverter.scala:318) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter$$anonfun$7.apply(ParquetRowConverter.scala:190) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter$$anonfun$7.apply(ParquetRowConverter.scala:185) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.AbstractTraversable.map(Traversable.scala:104) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter.(ParquetRowConverter.scala:185) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter.org$apache$spark$sql$execution$datasources$parquet$ParquetRowConverter$$newConverter(ParquetRowConverter.scala:324) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter$$anonfun$7.apply(ParquetRowConverter.scala:190) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter$$anonfun$7.apply(ParquetRowConverter.scala:185) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.AbstractTraversable.map(Traversable.scala:104) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter.(ParquetRowConverter.scala:185) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetRecordMaterializer.(ParquetRecordMaterializer.scala:43) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetReadSupport.prepareForRead(ParquetReadSupport.scala:126) > at > org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:204) > at > org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:182) > at > org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(ParquetFileFormat.scala:452) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(ParquetFileFormat.scala:364) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:124) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:177) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.
[jira] [Commented] (SPARK-18496) java.lang.AssertionError: assertion failed
[ https://issues.apache.org/jira/browse/SPARK-18496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203785#comment-17203785 ] Rahul Jaiswal commented on SPARK-18496: --- I am facing the same issue, have we found the solution? > java.lang.AssertionError: assertion failed > -- > > Key: SPARK-18496 > URL: https://issues.apache.org/jira/browse/SPARK-18496 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.2 > Environment: 2.0.2 snapshot >Reporter: Harish >Priority: Major > Labels: bulk-closed > > I am getting this error when i store the estimates from Julia output to a DF > and then i do df.cache() > py4j.protocol.Py4JJavaError: An error occurred while calling > z:org.apache.spark.api.python.PythonRDD.runJob. > : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 > in stage 177.0 failed 4 times, most recent failure: Lost task 0.3 in stage > 177.0 (TID 9722, 10.63.136.108): java.lang.AssertionError: assertion failed > at scala.Predef$.assert(Predef.scala:156) > at > org.apache.spark.storage.BlockInfo.checkInvariants(BlockInfoManager.scala:84) > at > org.apache.spark.storage.BlockInfo.readerCount_$eq(BlockInfoManager.scala:66) > at > org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:362) > at > org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:361) > at scala.Option.foreach(Option.scala:257) > at > org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:361) > at > org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:356) > at scala.collection.Iterator$class.foreach(Iterator.scala:893) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) > at > org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:356) > at > org.apache.spark.storage.BlockManager.releaseAllLocksForTask(BlockManager.scala:646) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:281) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Driver stacktrace: > at > org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1454) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1442) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1441) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at > org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1441) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811) > at scala.Option.foreach(Option.scala:257) > at > org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1667) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1622) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1611) > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) > at > org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:632) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1890) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1903) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1916) > at org.apache.spark.api.python.PythonRDD$.runJob(PythonRDD.scala:441) > at org.apache.spark.api.python.PythonRDD.runJob(PythonRDD.scala) > at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237) > at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) > at py4j.Gateway.invoke(Gateway.java:280) > at py4j.commands.A