Doesn't quite seem the same. What is the rest of the error -- why did the
class fail to initialize?
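
For what it's worth, "Could not initialize class" on the JVM is the *second*
symptom: the first access to a class whose static initializer throws raises an
ExceptionInInitializerError with the real cause attached, and every later
access reports only NoClassDefFoundError. So the actual root cause is probably
earlier in the executor log. A minimal Scala sketch of that behavior (Broken
is a made-up stand-in for RebaseDateTime$):

object Broken {
  // Fails during static initialization; stand-in for whatever failed
  // inside RebaseDateTime$'s initializer on the executor.
  val data: Int = sys.error("root cause lives here")
}

object Demo extends App {
  try Broken.data
  catch { case e: Throwable => println(s"first access:  $e") }  // ExceptionInInitializerError
  try Broken.data
  catch { case e: Throwable => println(s"second access: $e") }  // NoClassDefFoundError: Could not initialize class Broken$
}

Running Demo prints the ExceptionInInitializerError (with the real cause) on
the first access and only "Could not initialize class" on the second, which is
what your stack trace shows.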

On Wed, Mar 9, 2022 at 10:08 AM Andreas Weise <andreas.we...@gmail.com>
wrote:

> Hi,
>
> When playing around with spark.dynamicAllocation.enabled, I see the
> following error after the first round of executors has been killed.
>
> Py4JJavaError: An error occurred while calling o337.showString.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 18.0 failed 4 times, most recent failure: Lost task 1.3 in stage 18.0 (TID 220) (10.128.6.170 executor 13): java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.sql.catalyst.util.RebaseDateTime$
>   at org.apache.spark.sql.catalyst.util.RebaseDateTime.lastSwitchJulianTs(RebaseDateTime.scala)
>   at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorUpdaterFactory.rebaseTimestamp(ParquetVectorUpdaterFactory.java:1067)
>   at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorUpdaterFactory.rebaseInt96(ParquetVectorUpdaterFactory.java:1088)
>   at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorUpdaterFactory.access$1500(ParquetVectorUpdaterFactory.java:43)
>   at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorUpdaterFactory$BinaryToSQLTimestampRebaseUpdater.decodeSingleDictionaryId(ParquetVectorUpdaterFactory.java:860)
>   at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorUpdater.decodeDictionaryIds(ParquetVectorUpdater.java:75)
>   at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBatch(VectorizedColumnReader.java:216)
>   at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:298)
>   at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:196)
>   at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
>   at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:104)
>   at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:191)
>   at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:104)
>   at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:522)
>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithKeys_1$(Unknown Source)
>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithKeys_0$(Unknown Source)
>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
>   at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
>   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>   at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
>   at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
>   at org.apache.spark.scheduler.Task.run(Task.scala:131)
>   at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
>
> We tested on Spark 3.2.1 on k8s with these dynamicAllocation settings:
>
> spark.dynamicAllocation.enabled=true
> spark.dynamicAllocation.maxExecutors=4
> spark.dynamicAllocation.minExecutors=1
> spark.dynamicAllocation.executorIdleTimeout=30s
> spark.dynamicAllocation.shuffleTracking.enabled=true
> spark.dynamicAllocation.shuffleTracking.timeout=30s
> spark.decommission.enabled=true
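>
> In case it helps reproduce: a minimal Scala sketch of setting the same
> options when building the session instead of via --conf flags (the app
> name is made up, everything else is taken from the list above):
>
> import org.apache.spark.sql.SparkSession
>
> val spark = SparkSession.builder()
>   .appName("dyn-alloc-test") // hypothetical
>   .config("spark.dynamicAllocation.enabled", "true")
>   .config("spark.dynamicAllocation.maxExecutors", "4")
>   .config("spark.dynamicAllocation.minExecutors", "1")
>   .config("spark.dynamicAllocation.executorIdleTimeout", "30s")
>   .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
>   .config("spark.dynamicAllocation.shuffleTracking.timeout", "30s")
>   .config("spark.decommission.enabled", "true")
>   .getOrCreate()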
>
> This might be related to SPARK-34772 /
> https://www.mail-archive.com/commits@spark.apache.org/msg50240.html, but
> since that was fixed in 3.2.0, it might be worth filing a new issue?
>
> Best regards
> Andreas
>
