RushabhK commented on PR #9844: URL: https://github.com/apache/incubator-gluten/pull/9844#issuecomment-2943073257
> > @JkSelf I tested this change on my setup. It's still giving the same exception: `is not a Parquet file. Expected magic number at tail, but found [2, 0, 0, 0]`. This file is ~250 MB in size.
> >
> > This is the complete stack trace:
> >
> > ```
> > Py4JJavaError: An error occurred while calling o135.count.
> > : org.apache.spark.SparkException: Job aborted due to stage failure: Task 1161 in stage 1.0 failed 4 times, most recent failure: Lost task 1161.3 in stage 1.0 (TID 1208) (241.130.178.8 executor 2): java.lang.RuntimeException: gs://<some_path>/gluten-part-d0a3b6a4-ccc9-41b3-a44e-34177ab18674.zstd.parquet is not a Parquet file. Expected magic number at tail, but found [2, 0, 0, 0]
> > 	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:565)
> > 	at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:799)
> > 	at org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:666)
> > 	at org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter(ParquetFooterReader.java:85)
> > 	at org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter(ParquetFooterReader.java:71)
> > 	at org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter(ParquetFooterReader.java:66)
> > 	at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(ParquetFileFormat.scala:213)
> > 	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:219)
> > 	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:282)
> > 	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:131)
> > 	at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:593)
> > 	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
> > 	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.hashAgg_doAggregateWithoutKey_0$(Unknown Source)
> > 	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
> > 	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> > 	at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
> > 	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
> > 	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
> > 	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
> > 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104)
> > 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
> > 	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
> > 	at org.apache.spark.scheduler.Task.run(Task.scala:141)
> > 	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
> > 	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
> > 	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
> > 	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
> > 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
> > 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> > 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> > 	at java.lang.Thread.run(Thread.java:750)
> >
> > Driver stacktrace:
> > 	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2856)
> > 	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2792)
> > 	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2791)
> > 	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
> > 	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
> > 	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
> > 	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2791)
> > 	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1247)
> > 	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1247)
> > 	at scala.Option.foreach(Option.scala:407)
> > 	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1247)
> > 	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3060)
> > 	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2994)
> > 	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2983)
> > 	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
> > Caused by: java.lang.RuntimeException: gs://<some_path>/gluten-part-d0a3b6a4-ccc9-41b3-a44e-34177ab18674.zstd.parquet is not a Parquet file. Expected magic number at tail, but found [2, 0, 0, 0]
> > 	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:565)
> > 	at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:799)
> > 	at org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:666)
> > 	at org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter(ParquetFooterReader.java:85)
> > 	at org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter(ParquetFooterReader.java:71)
> > 	at org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter(ParquetFooterReader.java:66)
> > 	at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(ParquetFileFormat.scala:213)
> > 	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:219)
> > 	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:282)
> > 	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:131)
> > 	at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:593)
> > 	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
> > 	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.hashAgg_doAggregateWithoutKey_0$(Unknown Source)
> > 	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
> > 	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> > 	at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
> > 	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
> > 	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
> > 	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
> > 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104)
> > 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
> > 	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
> > 	at org.apache.spark.scheduler.Task.run(Task.scala:141)
> > 	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
> > 	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
> > 	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
> > 	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
> > 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
> > 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> > 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> > 	at java.lang.Thread.run(Thread.java:750)
> > [Stage 1:======================================> (1168 + 4) / 1627]
> > ```
>
> @RushabhK Ok. Can you help to provide the reproduced code? Thanks.

@JkSelf I can elaborate on how I am testing this in the following steps:

1. I took the Gluten build with these changes and built my new Spark image.
2. I have a Spark job that writes Parquet with 300 tasks, configured with 8 cores per executor.
3. While it is writing from the 300 tasks, I kill one of the executors (8 failed tasks); the job retries the tasks and then finishes.
4. I then try reading the Parquet files and just do a `df.count()` on them to materialize the read. This is when I encounter the exception above.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
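As background for the error discussed above: a valid Parquet file always ends with the 4-byte magic `PAR1`, and the reader's "Expected magic number at tail" message means the file's last 4 bytes were something else (here `[2, 0, 0, 0]`), which is typical of a partially written file left behind by a killed task. A minimal sketch (not from the thread; the function name is hypothetical) for checking a suspect file's tail:

```python
import os

PARQUET_MAGIC = b"PAR1"  # every valid Parquet file ends with these 4 bytes


def has_parquet_tail_magic(path: str) -> bool:
    """Return True if the file at `path` ends with the Parquet magic bytes.

    A False result on a *.parquet file suggests a truncated or partial
    write, e.g. from an executor killed mid-write, matching the
    "Expected magic number at tail" error above.
    """
    size = os.path.getsize(path)
    if size < len(PARQUET_MAGIC):
        return False  # too small to even hold the magic
    with open(path, "rb") as f:
        f.seek(size - len(PARQUET_MAGIC))
        return f.read(len(PARQUET_MAGIC)) == PARQUET_MAGIC
```

Running this over the output directory before the `df.count()` would distinguish genuinely corrupt files from a reader-side bug.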
