Re: 1.4.0 regression: out-of-memory errors on small data
I went ahead and tested your file; the results are in this gist: https://gist.github.com/dennyglee/c933b5ae01c57bd01d94. Basically, when running with {Java 7, MaxPermSize = 256m} or {Java 8, default} the query ran without any issues; I was able to reproduce the issue with {Java 7, default}. I included the commands I used to start spark-shell, but basically I used all defaults (no changes to driver or executor memory); the only addition was --driver-class-path to connect to a MySQL Hive metastore. This is on an OS X MacBook Pro. One thing I did notice is that your Java 7 is update 51 while mine is update 79. Could you see whether updating to Java 7 update 79 perhaps allows you to use the MaxPermSize setting?

On Mon, Jul 6, 2015 at 1:36 PM, Simeon Simeonov s...@swoop.com wrote:

The file is at https://www.dropbox.com/s/a00sd4x65448dl2/apache-spark-failure-data-part-0.gz?dl=1 and the command was included in the gist:

SPARK_REPL_OPTS=-XX:MaxPermSize=256m spark-1.4.0-bin-hadoop2.6/bin/spark-shell --packages com.databricks:spark-csv_2.10:1.0.3 --driver-memory 4g --executor-memory 4g

/Sim

Simeon Simeonov, Founder & CTO, Swoop | http://swoop.com/ | @simeons (http://twitter.com/simeons) | blog.simeonov.com | 617.299.6746
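Since the Java update level turned out to matter here (update 51 vs. update 79), the quick check is the number after the underscore in the first line of `java -version` output. A minimal POSIX-shell sketch, using a hypothetical sample version string rather than a live JVM:

```shell
# First line of 'java -version' output (hypothetical sample shown here;
# on a real machine you would capture it with: java -version 2>&1 | head -n 1)
ver='java version "1.7.0_51"'

# The update level is the number after the underscore.
update=${ver##*_}     # strip everything through the last '_'
update=${update%\"}   # strip the trailing double quote
echo "$update"        # prints 51
```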
Re: 1.4.0 regression: out-of-memory errors on small data
You meant SPARK_REPL_OPTS? I did a quick search; it looks like it has been removed since 1.0. I think it did not affect the behavior of the shell.

On Mon, Jul 6, 2015 at 9:04 AM, Simeon Simeonov s...@swoop.com wrote:

Yin, that did the trick. I'm curious what the effect of the environment variable was, however, as the behavior of the shell changed from hanging to quitting when the env var value got to 1g. /Sim
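To make the distinction concrete, here is a sketch of the two invocation styles discussed in this thread (the first silently does nothing in 1.4.0, per Yin's note; the second is the form that worked). Paths are the ones reported earlier; this is illustrative, not a verified recipe beyond what the thread confirms:

```shell
# Ineffective: SPARK_REPL_OPTS was removed after Spark 1.0, so the
# MaxPermSize value never reaches the driver JVM.
SPARK_REPL_OPTS=-XX:MaxPermSize=256m spark-1.4.0-bin-hadoop2.6/bin/spark-shell

# Effective: pass the JVM flag through the driver's extra Java options,
# which spark-shell forwards to the driver process.
spark-1.4.0-bin-hadoop2.6/bin/spark-shell \
  --conf "spark.driver.extraJavaOptions=-XX:MaxPermSize=256m"
```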
Re: 1.4.0 regression: out-of-memory errors on small data
Hi Sim, I think the right way to set the PermGen size is through the driver extra JVM options, i.e.:

--conf spark.driver.extraJavaOptions=-XX:MaxPermSize=256m

Can you try it? Without this conf, your driver's PermGen size is still 128m. Thanks, Yin
Re: 1.4.0 regression: out-of-memory errors on small data
Yin, that did the trick. I'm curious what the effect of the environment variable was, however, as the behavior of the shell changed from hanging to quitting when the env var value got to 1g. /Sim
Re: 1.4.0 regression: out-of-memory errors on small data
We have hit the same issue in the Spark shell when registering a temp table. We observed it happening for users who had JDK 6; the problem went away after installing JDK 8. This was only for the tutorial materials, which were about loading a Parquet file. Regards, Andy

On Sat, Jul 4, 2015 at 2:54 AM, sim s...@swoop.com wrote:

@bipin, in my case the error happens immediately in a fresh shell in 1.4.0.

View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/1-4-0-regression-out-of-memory-errors-on-small-data-tp23595p23614.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org; for additional commands, e-mail: user-h...@spark.apache.org

Andy Huang | Managing Consultant | Servian Pty Ltd | t: 02 9376 0700 | f: 02 9376 0730 | m: 0433 221 979
Re: 1.4.0 regression: out-of-memory errors on small data
I had run into the same problem; everything was working swimmingly with Spark 1.3.1. When I switched to Spark 1.4, either upgrading to Java 8 (from Java 7) or bumping up the PermGen size solved my issue. HTH!
Re: 1.4.0 regression: out-of-memory errors on small data
The file is at https://www.dropbox.com/s/a00sd4x65448dl2/apache-spark-failure-data-part-0.gz?dl=1 and the command was included in the gist:

SPARK_REPL_OPTS=-XX:MaxPermSize=256m spark-1.4.0-bin-hadoop2.6/bin/spark-shell --packages com.databricks:spark-csv_2.10:1.0.3 --driver-memory 4g --executor-memory 4g

/Sim
Re: 1.4.0 regression: out-of-memory errors on small data
Yin, with 512MB PermGen, the process still hung and had to be kill -9'ed. At 1GB, the spark-shell processes stopped hanging and started exiting with:

scala> println(dfCount.first.getLong(0))
15/07/06 00:10:07 INFO storage.MemoryStore: ensureFreeSpace(235040) called with curMem=0, maxMem=2223023063
15/07/06 00:10:07 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 229.5 KB, free 2.1 GB)
15/07/06 00:10:08 INFO storage.MemoryStore: ensureFreeSpace(20184) called with curMem=235040, maxMem=2223023063
15/07/06 00:10:08 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 19.7 KB, free 2.1 GB)
15/07/06 00:10:08 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:65464 (size: 19.7 KB, free: 2.1 GB)
15/07/06 00:10:08 INFO spark.SparkContext: Created broadcast 2 from first at <console>:30
java.lang.OutOfMemoryError: PermGen space
Stopping spark context.
Exception in thread "main" Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "main"
15/07/06 00:10:14 INFO storage.BlockManagerInfo: Removed broadcast_2_piece0 on localhost:65464 in memory (size: 19.7 KB, free: 2.1 GB)

That did not change up to 4GB of PermGen space and 8GB each for the driver and executor. I stopped at this point because the exercise started looking silly. It is clear that 1.4.0 uses memory in a substantially different manner. I'd be happy to share the test file so you can reproduce this in your own environment. /Sim
Re: 1.4.0 regression: out-of-memory errors on small data
I have never seen an issue like this. Setting the PermGen size to 256m should solve the problem. Can you send me your test file and the command used to launch the Spark shell or your application? Thanks, Yin

On Sun, Jul 5, 2015 at 9:17 PM, Simeon Simeonov s...@swoop.com wrote: Yin, with 512MB PermGen, the process still hung and had to be kill -9'ed.
Re: 1.4.0 regression: out-of-memory errors on small data
Hi Sim, it seems you already set the PermGen size to 256m, right? I notice that in your shell session you created a new HiveContext, which further increases memory consumption on PermGen. But the spark shell has already created a HiveContext for you (sqlContext; you can use asInstanceOf to access HiveContext's methods). Can you just use the sqlContext created by the shell and try again? Thanks, Yin

On Thu, Jul 2, 2015 at 12:50 PM, Yin Huai yh...@databricks.com wrote:

Hi Sim, Spark 1.4.0's memory consumption on PermGen is higher than Spark 1.3's (explained in https://issues.apache.org/jira/browse/SPARK-8776). Can you add --conf spark.driver.extraJavaOptions=-XX:MaxPermSize=256m to the command you use to launch the Spark shell? This will increase the PermGen size from 128m (our default) to 256m. Thanks, Yin

On Thu, Jul 2, 2015 at 12:40 PM, sim s...@swoop.com wrote:

A very simple Spark SQL COUNT operation succeeds in spark-shell for 1.3.1 and fails with a series of out-of-memory errors in 1.4.0. This gist https://gist.github.com/ssimeonov/a49b75dc086c3ac6f3c4 includes the code and the full output from the 1.3.1 and 1.4.0 runs, including the command line showing how spark-shell is started. Should the 1.4.0 spark-shell be started with different options to avoid this problem? Thanks, Sim

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/1-4-0-regression-out-of-memory-errors-on-small-data-tp23595.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
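[Editor's note: the fix Yin describes amounts to adding one --conf flag to the spark-shell launch command. A minimal sketch, assuming a local spark-1.4.0-bin-hadoop2.6 install and the spark-csv package mentioned earlier in the thread; the SPARK_HOME path is an assumption, adjust to your setup:]

```shell
# Bump the driver's PermGen from Spark's 128m default to 256m (see SPARK-8776).
# SPARK_HOME below is an assumed install location; point it at your own Spark 1.4.0.
SPARK_HOME=./spark-1.4.0-bin-hadoop2.6
"$SPARK_HOME/bin/spark-shell" \
  --conf "spark.driver.extraJavaOptions=-XX:MaxPermSize=256m" \
  --packages com.databricks:spark-csv_2.10:1.0.3
```

Note that on Java 8 the -XX:MaxPermSize flag is ignored, since PermGen was removed in favor of Metaspace; that is consistent with the reports later in this thread that upgrading to Java 8 also makes the errors go away.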
Re: 1.4.0 regression: out-of-memory errors on small data
Same error with the new code:

import org.apache.spark.sql.hive.HiveContext
val ctx = sqlContext.asInstanceOf[HiveContext]
import ctx.implicits._
val df = ctx.jsonFile("file:///Users/sim/dev/spx/data/view-clicks-training/2015/06/18/part-0.gz")
df.registerTempTable("training")
val dfCount = ctx.sql("select count(*) as cnt from training")
println(dfCount.first.getLong(0))

/Sim

Simeon Simeonov, Founder & CTO, Swoop http://swoop.com/
@simeons http://twitter.com/simeons | blog.simeonov.com http://blog.simeonov.com/ | 617.299.6746