Hi All,

I am running a Spark application over 1.8 TB of data stored in Hive tables. I read the data with HiveContext and process it. The cluster has 5 nodes, with 25 cores and 250 GB of memory per node. I launch the application with 25 executors, 5 cores per executor and 45 GB of memory per executor, and I have also set spark.yarn.executor.memoryOverhead=2024.
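For reference, my launch settings are roughly equivalent to the following in code form (the app name and query are just placeholders, not my actual job):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val conf = new SparkConf()
      .setAppName("MyHiveJob")                           // placeholder name
      .set("spark.executor.instances", "25")             // 25 executors
      .set("spark.executor.cores", "5")                  // 5 cores each
      .set("spark.executor.memory", "45g")               // 45 GB per executor
      .set("spark.yarn.executor.memoryOverhead", "2024") // overhead in MB

    val sc = new SparkContext(conf)
    val hiveContext = new HiveContext(sc)
    val df = hiveContext.sql("SELECT * FROM my_db.my_table") // placeholder query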
During execution, tasks are lost and ShuffleMapTasks are re-submitted. The tasks fail with the following message:

java.lang.IllegalArgumentException: requirement failed: File segment length cannot be negative (got -27045427)
    at scala.Predef$.require(Predef.scala:233)
    at org.apache.spark.storage.FileSegment.<init>(FileSegment.scala:28)
    at org.apache.spark.storage.DiskBlockObjectWriter.fileSegment(DiskBlockObjectWriter.scala:220)
    at org.apache.spark.shuffle.sort.ShuffleExternalSorter.writeSortedFile(ShuffleExternalSorter.java:184)
    at org.apache.spark.shuffle.sort.ShuffleExternalSorter.closeAndGetSpills(ShuffleExternalSorter.java:398)
    at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.closeAndWriteOutput(UnsafeShuffleWriter.java:206)
    at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:166)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

My understanding is that this happens because a shuffle block grows larger than 2 GB, so its length overflows the signed Int and becomes negative, which triggers the exception above. Can someone shed some light on this? What is the fix?

Thanks,
Padma CH
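P.S. Would splitting the shuffle into more partitions, so that no single shuffle block exceeds 2 GB, be the right direction? I was thinking of something along these lines (the partition count of 2000 is just a guess for my data size, about 1 GB per partition on average for 1.8 TB):

    // Increase the number of shuffle partitions so each shuffle block stays well under 2 GB
    hiveContext.setConf("spark.sql.shuffle.partitions", "2000")

    // Or explicitly repartition the DataFrame before the shuffle-heavy stage
    val repartitioned = df.repartition(2000)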