[jira] [Created] (SPARK-22352) task failures with java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE error

2017-10-25 Thread Gowtham (JIRA)
Gowtham created SPARK-22352:
---

 Summary: task failures with java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE error
 Key: SPARK-22352
 URL: https://issues.apache.org/jira/browse/SPARK-22352
 Project: Spark
  Issue Type: Bug
  Components: Spark Shell
Affects Versions: 2.1.1
Reporter: Gowtham


A few tasks failed with the exception below when I ran count on a Dataset. The
Dataset is the union of 14 datasets, and I cache it before the count; its total
size is 170 GB deserialized in memory. Spark spawned 200 tasks, of which 3
failed with the java.lang.IllegalArgumentException: Size exceeds
Integer.MAX_VALUE error.
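
A possible workaround (my assumption, not a verified fix): the failure happens
when a cached block that spilled to disk exceeds 2 GB, because DiskStore reads
the block back through FileChannel.map. Raising the partition count before
caching keeps every block under that limit. In the sketch below, dsUnion is a
placeholder name for the union of the 14 datasets:

{code:scala}
import org.apache.spark.storage.StorageLevel

// dsUnion stands in for the real union of the 14 datasets (hypothetical name).
// 170 GB over 200 partitions averages ~0.85 GB per block, but a skewed
// partition can still cross 2 GB; ~400 partitions keeps every block
// comfortably under the FileChannel.map limit.
val repartitioned = dsUnion.repartition(400)
repartitioned.persist(StorageLevel.MEMORY_AND_DISK)
repartitioned.count()
{code}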

Complete exception:

17/10/25 16:16:28 INFO memory.MemoryStore: Will not store rdd_33_68
17/10/25 16:16:28 WARN memory.MemoryStore: Not enough space to cache rdd_33_68 in memory! (computed 9.9 GB so far)
17/10/25 16:16:28 INFO memory.MemoryStore: Memory use = 4.5 MB (blocks) + 9.9 GB (scratch space shared across 1 tasks(s)) = 9.9 GB. Storage limit = 14.7 GB.
17/10/25 16:16:28 WARN storage.BlockManager: Persisting block rdd_33_68 to disk instead.
17/10/25 16:17:00 ERROR executor.Executor: Exception in task 68.0 in stage 2.0 (TID 2155)
java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:869)
at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:103)
at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:91)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1303)
at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:105)
at org.apache.spark.storage.BlockManager.getLocalValues(BlockManager.scala:468)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:704)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
17/10/25 16:17:00 ERROR server.TransportRequestHandler: Error while invoking RpcHandler#receive() on RPC id 6932313655306109864
java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:869)
at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:103)
at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:91)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1303)
at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:105)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doGetLocalBytes(BlockManager.scala:526)
at org.apache.spark.storage.BlockManager$$anonfun$getLocalBytes$2.apply(BlockManager.scala:504)
at org.apache.spark.storage.BlockManager$$anonfun$getLocalBytes$2.apply(BlockManager.scala:504)
at scala.Option.map(Option.scala:146)
at org.apache.spark.storage.BlockManager.getLocalBytes(BlockManager.scala:504)
at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:307)
at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:60)
at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:60)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at 

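For what it's worth, the exception is a hard JVM limit rather than a Spark
configuration issue: FileChannel.map returns a MappedByteBuffer, which is
indexed by Int, so any mapping request above Integer.MAX_VALUE bytes (~2 GB)
is rejected up front. A minimal standalone reproduction (my sketch, not taken
from the job above):

{code:scala}
import java.io.RandomAccessFile
import java.nio.channels.FileChannel

// Requesting a mapping larger than Integer.MAX_VALUE bytes fails immediately
// with java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE,
// the same error DiskStore.getBytes hits for a cached block over 2 GB.
val channel = new RandomAccessFile("/tmp/probe.bin", "rw").getChannel
try {
  channel.map(FileChannel.MapMode.READ_WRITE, 0, Int.MaxValue.toLong + 1)
} finally {
  channel.close()
}
{code}
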
[jira] [Commented] (SPARK-13210) NPE in Sort

2017-04-06 Thread Gowtham (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-13210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960147#comment-15960147 ]

Gowtham commented on SPARK-13210:
-

We are facing the same exception in Spark 2.1.0:

java.lang.NullPointerException
at org.apache.spark.memory.TaskMemoryManager.getPage(TaskMemoryManager.java:347)
at org.apache.spark.shuffle.sort.ShuffleExternalSorter.writeSortedFile(ShuffleExternalSorter.java:193)
at org.apache.spark.shuffle.sort.ShuffleExternalSorter.spill(ShuffleExternalSorter.java:254)
at org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:171)
at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:245)
at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:92)
at org.apache.spark.shuffle.sort.ShuffleInMemorySorter.reset(ShuffleInMemorySorter.java:100)
at org.apache.spark.shuffle.sort.ShuffleExternalSorter.spill(ShuffleExternalSorter.java:256)
at org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:171)
at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:245)
at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:92)
at org.apache.spark.shuffle.sort.ShuffleExternalSorter.growPointerArrayIfNecessary(ShuffleExternalSorter.java:328)
at org.apache.spark.shuffle.sort.ShuffleExternalSorter.insertRecord(ShuffleExternalSorter.java:379)
at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.insertRecordIntoSorter(UnsafeShuffleWriter.java:246)
at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:167)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

> NPE in Sort
> ---
>
> Key: SPARK-13210
> URL: https://issues.apache.org/jira/browse/SPARK-13210
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0, 2.0.0
>Reporter: Davies Liu
>Assignee: Davies Liu
>Priority: Critical
> Fix For: 1.6.1, 2.0.0
>
>
> When running TPCDS query Q78 at scale 10:
> {code}
> 16/02/04 22:39:09 ERROR Executor: Managed memory leak detected; size = 268435456 bytes, TID = 143
> 16/02/04 22:39:09 ERROR Executor: Exception in task 0.0 in stage 47.0 (TID 143)
> java.lang.NullPointerException
>   at org.apache.spark.memory.TaskMemoryManager.getPage(TaskMemoryManager.java:333)
>   at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter$SortComparator.compare(UnsafeInMemorySorter.java:60)
>   at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter$SortComparator.compare(UnsafeInMemorySorter.java:39)
>   at org.apache.spark.util.collection.TimSort.countRunAndMakeAscending(TimSort.java:270)
>   at org.apache.spark.util.collection.TimSort.sort(TimSort.java:142)
>   at org.apache.spark.util.collection.Sorter.sort(Sorter.scala:37)
>   at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.getSortedIterator(UnsafeInMemorySorter.java:239)
>   at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.getSortedIterator(UnsafeExternalSorter.java:415)
>   at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:116)
>   at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:168)
>   at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:87)
>   at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:60)
>   at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:735)
>   at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:735)
>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
>   at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
>   at 
>