Re: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
For uniform partitioning, you can try a custom Partitioner.

View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-OutOfMemoryError-Requested-array-size-exceeds-VM-limit-tp16809p26477.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
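To make the custom Partitioner suggestion concrete, here is a minimal sketch of one common way to even out partition sizes: key salting. It is not taken from this thread. The class and method names (SaltedKeySketch, saltedKey, partitionFor, histogram) and the key "2014-W42" are hypothetical, and the code uses only the Java standard library so the assignment logic can be checked in isolation. In Spark, the body of partitionFor would sit in the getPartition(Object key) method of a class extending org.apache.spark.Partitioner, and the salted partial results would need a second aggregation step to merge them per original key.

```java
import java.util.Arrays;

/** Hypothetical sketch of key salting; not Spark code, only the assignment logic. */
final class SaltedKeySketch {

    /** Appends a salt in [0, salts) so that one hot key becomes several distinct keys. */
    static String saltedKey(String key, long recordId, int salts) {
        return key + "#" + Math.floorMod(recordId, (long) salts);
    }

    /** Non-negative hash modulo: maps a key to a partition index in [0, numPartitions). */
    static int partitionFor(Object key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    /** Counts how many records of a single hot key land in each partition. */
    static int[] histogram(String hotKey, int records, int salts, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (long id = 0; id < records; id++) {
            // With salts <= 1 the key is left untouched, so every record shares one hash.
            String key = salts > 1 ? saltedKey(hotKey, id, salts) : hotKey;
            counts[partitionFor(key, numPartitions)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Unsalted: all records of the hot key fall into a single partition.
        System.out.println(Arrays.toString(histogram("2014-W42", 1000, 1, 8)));
        // Salted with 8 salts: the same records are spread over the partitions.
        System.out.println(Arrays.toString(histogram("2014-W42", 1000, 8, 8)));
    }
}
```

Adding more partitions alone does not help when most records share a few keys, because every record with the same key hashes to the same partition. Salting changes the key itself, which is why it can break up a large partition where a higher partition count cannot.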
Re: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
"The only option is to split your problem further by increasing parallelism."

My understanding is that this means increasing the number of partitions; is that right? That did not seem to help, because the partitions are not uniformly sized. When I increase the number of partitions, Spark creates many empty partitions, and the larger partition is not broken down into smaller ones.

Any hints on how I can get uniform partitions? I found many threads on this, but was not able to do anything effective from the Java API. I would appreciate any help or insight you can provide.

View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-OutOfMemoryError-Requested-array-size-exceeds-VM-limit-tp16809p19097.html
Re: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
That's true, Guillaume. I'm currently aggregating documents using a week as the time range. I will have to make it daily and aggregate the results later. Thanks for your hints anyway.

Arian Pasquali
http://about.me/arianpasquali

2014-10-20 13:53 GMT+01:00 Guillaume Pitel guillaume.pi...@exensa.com:

Hi,

The array that you (or the serializer) are trying to allocate is just too big for the JVM. No configuration can help: https://plumbr.eu/outofmemoryerror/requested-array-size-exceeds-vm-limit

The only option is to split your problem further by increasing parallelism.

Guillaume

Hi,

I’m using Spark 1.1.0 and I’m having some issues setting up the memory options. I get “Requested array size exceeds VM limit” and I’m probably missing something regarding memory configuration (https://spark.apache.org/docs/1.1.0/configuration.html).

My server has 30G of memory and these are my current settings.

    ## this one seems to be deprecated
    export SPARK_MEM='25g'

    ## the worker memory option seems to be the memory for each worker (by default we have a worker for each core)
    export SPARK_WORKER_MEMORY='5g'

I probably need to specify some options using SPARK_DAEMON_JAVA_OPTS, but I’m not quite sure how. I have tried some different options like the following, but I still couldn’t get it right:

    export SPARK_DAEMON_JAVA_OPTS='-Xmx8G -XX:+UseCompressedOops'
    export JAVA_OPTS='-Xmx8G -XX:+UseCompressedOops'

Does anyone have any idea how I can approach this?
14/10/11 13:00:16 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329
14/10/11 13:00:16 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 1566 non-empty blocks out of 1566 blocks
14/10/11 13:00:16 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 4 ms
14/10/11 13:02:06 INFO ExternalAppendOnlyMap: Thread 63 spilling in-memory map of 3925 MB to disk (1 time so far)
14/10/11 13:05:17 INFO ExternalAppendOnlyMap: Thread 63 spilling in-memory map of 3925 MB to disk (2 times so far)
14/10/11 13:09:15 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 1566)
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    at java.util.Arrays.copyOf(Arrays.java:2271)
    at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
    at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
    at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1876)
    at java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1785)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1188)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:42)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:73)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
14/10/11 13:09:15 ERROR ExecutorUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-2,5,main]
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    at java.util.Arrays.copyOf(Arrays.java:2271)
    at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
    at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140

Arian

--
*Guillaume PITEL, Président*
+33(0)626 222 431
eXenSa S.A.S. http://www.exensa.com/
41, rue Périer - 92120 Montrouge - FRANCE
Tel +33(0)184 163 677 / Fax +33(0)972 283 705
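Arian's plan at the top of this message, aggregating per day instead of per week and combining the daily results afterwards, might look roughly like the sketch below. It is only an illustration under assumptions: counting documents stands in for whatever aggregation is actually performed, the class and method names (DailyThenWeeklySketch, countPerDay, rollUpToWeeks) are hypothetical, and plain Java collections stand in for Spark RDDs.

```java
import java.time.LocalDate;
import java.time.temporal.IsoFields;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical two-step aggregation: small daily groups first, weekly roll-up second. */
final class DailyThenWeeklySketch {

    /** Step 1: aggregate per day, so each group holds at most one day of documents. */
    static Map<LocalDate, Long> countPerDay(Iterable<LocalDate> documentDates) {
        Map<LocalDate, Long> perDay = new TreeMap<>();
        for (LocalDate date : documentDates) {
            perDay.merge(date, 1L, Long::sum);
        }
        return perDay;
    }

    /** Step 2: combine the already-reduced daily results into ISO weeks. */
    static Map<String, Long> rollUpToWeeks(Map<LocalDate, Long> perDay) {
        Map<String, Long> perWeek = new TreeMap<>();
        for (Map.Entry<LocalDate, Long> entry : perDay.entrySet()) {
            LocalDate date = entry.getKey();
            String week = date.get(IsoFields.WEEK_BASED_YEAR)
                    + "-W" + date.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR);
            perWeek.merge(week, entry.getValue(), Long::sum);
        }
        return perWeek;
    }
}
```

The point of the split is that the second step only sees one small value per day rather than every document of the week, so no single group has to be held or serialized as one very large object.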
Re: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
Hi Arian,

You get this exception when you try to create an array that is larger than the maximum contiguous block of memory in your Java VM's heap. Here, since you set the worker memory to *5Gb* and export the *_OPTS with *8Gb*, your application thinks it has 8Gb of memory whereas it only has 5Gb, and hence it exceeds the VM limit.

Thanks
Best Regards
Re: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
Hi Akhil,

Thanks for your help, but I was originally running without the -Xmx option. With that I was just trying to push the limit of my heap size, but obviously doing it wrong.

Arian Pasquali
http://about.me/arianpasquali
Re: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
Try setting SPARK_EXECUTOR_MEMORY=5g (not sure how many workers you have). You can also set the executor memory on the SparkConf used to create the SparkContext (like *sparkConf.set("spark.executor.memory", "5g")*).

Thanks
Best Regards
Re: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
Hi,

The array that you (or the serializer) are trying to allocate is just too big for the JVM. No configuration can help: https://plumbr.eu/outofmemoryerror/requested-array-size-exceeds-vm-limit

The only option is to split your problem further by increasing parallelism.

Guillaume
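A short sketch of why no memory setting helps here. Java arrays are indexed by int, so a single byte[] can hold at most Integer.MAX_VALUE bytes (about 2 GB), however large the heap is; the practical limit on common JVMs is slightly lower. The stack trace in this thread shows ByteArrayOutputStream.grow calling Arrays.copyOf, that is, the Java serializer growing one byte[]. The names below (ArrayLimitSketch, exceedsArrayLimit, minPartitions) are hypothetical. The 3925 MB figure is borrowed from the spill messages in the log purely as an illustrative size, the 256 MB target is an arbitrary assumption, and the arithmetic assumes evenly sized partitions.

```java
/** Hypothetical arithmetic helper; the sizes used are illustrative assumptions. */
final class ArrayLimitSketch {

    /** Java array indices are ints, so no array can have more elements than this. */
    static final long MAX_ARRAY_LENGTH = Integer.MAX_VALUE;

    /** True if this many bytes cannot fit in one byte[], whatever the heap size. */
    static boolean exceedsArrayLimit(long serializedBytes) {
        return serializedBytes > MAX_ARRAY_LENGTH;
    }

    /** Smallest number of equally sized pieces that keeps each piece within the target. */
    static long minPartitions(long totalBytes, long targetBytesPerPartition) {
        if (totalBytes < 0 || targetBytesPerPartition <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative and target positive");
        }
        // Ceiling division, with at least one partition.
        return Math.max(1L, (totalBytes + targetBytesPerPartition - 1) / targetBytesPerPartition);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println(exceedsArrayLimit(3925 * mb));     // true
        System.out.println(minPartitions(3925 * mb, 256 * mb)); // 16
    }
}
```

If the data is skewed, the largest partition can still exceed the limit even when the average is well below it, which is why increasing the partition count alone may not be enough.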