Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-06 Thread Denny Lee
I went ahead and tested your file and the results from the tests can be
seen in the gist: https://gist.github.com/dennyglee/c933b5ae01c57bd01d94.

Basically, when running with {Java 7, MaxPermSize = 256m} or {Java 8, default},
the query ran without any issues.  I was able to recreate the issue with
{Java 7, default}.  I included the commands I used to start the spark-shell,
but basically I just used all defaults (no changes to driver or executor
memory); the only addition was driver-class-path to connect to the MySQL Hive
metastore.  This is on an OS X MacBook Pro.

One thing I did notice is that your Java 7 is update 51 while mine is update
79.  Could you see whether updating to Java 7 update 79 allows you to use the
MaxPermSize setting?
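
A quick sketch for double-checking which Java build the shell actually picks
up (the second command assumes an OS X setup like the one described above):

java -version               # shows the exact update, e.g. 1.7.0_51 vs 1.7.0_79
/usr/libexec/java_home -V   # on OS X, lists every installed JDK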




Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-06 Thread Yin Huai
You meant SPARK_REPL_OPTS? I did a quick search. Looks like it has been
removed since 1.0. I think it did not affect the behavior of the shell.
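
For what it's worth, a sketch of a form the 1.4 shell does honor for that JVM
flag (assuming spark-shell accepts the standard spark-submit flags; the --conf
spark.driver.extraJavaOptions form discussed elsewhere in this thread works as
well):

spark-1.4.0-bin-hadoop2.6/bin/spark-shell --driver-java-options -XX:MaxPermSize=256m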

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-06 Thread Yin Huai
Hi Sim,

I think the right way to set the PermGen size is through the driver's extra
JVM options, i.e.

--conf spark.driver.extraJavaOptions=-XX:MaxPermSize=256m

Can you try it? Without this conf, your driver's PermGen size is still 128m.
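
Putting that together with the spark-shell invocation quoted later in this
thread, the full command would look roughly like this (a sketch; the package
and memory settings are taken from Sim's original command):

spark-1.4.0-bin-hadoop2.6/bin/spark-shell \
  --conf spark.driver.extraJavaOptions=-XX:MaxPermSize=256m \
  --packages com.databricks:spark-csv_2.10:1.0.3 \
  --driver-memory 4g --executor-memory 4g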

Thanks,

Yin

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-06 Thread Simeon Simeonov
Yin, that did the trick.

I'm curious what effect the environment variable had, though, since the
behavior of the shell changed from hanging to exiting once the env var value
reached 1g.

/Sim

Simeon Simeonov, Founder & CTO, Swoop http://swoop.com/
@simeons http://twitter.com/simeons | blog.simeonov.com http://blog.simeonov.com/ | 617.299.6746


Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-05 Thread Andy Huang
We have hit the same issue in the spark shell when registering a temp table.
We observed it happening for those who had JDK 6; the problem went away after
installing JDK 8. This was only with the tutorial materials, which are about
loading a parquet file.

Regards
Andy

On Sat, Jul 4, 2015 at 2:54 AM, sim s...@swoop.com wrote:

 @bipin, in my case the error happens immediately in a fresh shell in 1.4.0.



-- 
Andy Huang | Managing Consultant | Servian Pty Ltd | t: 02 9376 0700 | f: 02 9376 0730 | m: 0433221979


Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-05 Thread Denny Lee
I had run into the same problem: everything was working swimmingly with
Spark 1.3.1.  When I switched to Spark 1.4, either upgrading to Java 8 (from
Java 7) or bumping up the PermGen size solved my issue.  HTH!




Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-05 Thread Simeon Simeonov
The file is at 
https://www.dropbox.com/s/a00sd4x65448dl2/apache-spark-failure-data-part-0.gz?dl=1

The command was included in the gist

SPARK_REPL_OPTS=-XX:MaxPermSize=256m 
spark-1.4.0-bin-hadoop2.6/bin/spark-shell --packages 
com.databricks:spark-csv_2.10:1.0.3 --driver-memory 4g --executor-memory 4g

/Sim

Simeon Simeonov, Founder & CTO, Swoop http://swoop.com/
@simeons http://twitter.com/simeons | blog.simeonov.com http://blog.simeonov.com/ | 617.299.6746


Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-05 Thread Simeon Simeonov
Yin,

With 512MB of PermGen, the process still hung and had to be kill -9ed.

At 1GB the spark shell & associated processes stopped hanging and started
exiting with:

scala> println(dfCount.first.getLong(0))
15/07/06 00:10:07 INFO storage.MemoryStore: ensureFreeSpace(235040) called with curMem=0, maxMem=2223023063
15/07/06 00:10:07 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 229.5 KB, free 2.1 GB)
15/07/06 00:10:08 INFO storage.MemoryStore: ensureFreeSpace(20184) called with curMem=235040, maxMem=2223023063
15/07/06 00:10:08 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 19.7 KB, free 2.1 GB)
15/07/06 00:10:08 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:65464 (size: 19.7 KB, free: 2.1 GB)
15/07/06 00:10:08 INFO spark.SparkContext: Created broadcast 2 from first at <console>:30
java.lang.OutOfMemoryError: PermGen space
Stopping spark context.
Exception in thread "main"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "main"
15/07/06 00:10:14 INFO storage.BlockManagerInfo: Removed broadcast_2_piece0 on localhost:65464 in memory (size: 19.7 KB, free: 2.1 GB)

That did not change up until 4GB of PermGen space and 8GB each for driver &
executor.

I stopped at this point because the exercise started looking silly. It is clear 
that 1.4.0 is using memory in a substantially different manner.

I'd be happy to share the test file so you can reproduce this in your own 
environment.

/Sim

Simeon Simeonov, Founder & CTO, Swoop http://swoop.com/
@simeons http://twitter.com/simeons | blog.simeonov.com http://blog.simeonov.com/ | 617.299.6746



Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-05 Thread Yin Huai
I have never seen an issue like this. Setting the PermGen size to 256m should
solve the problem. Can you send me your test file and the command used to
launch the spark shell or your application?

Thanks,

Yin


Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-05 Thread Yin Huai
Sim,

Can you increase the PermGen size? Please let me know what your setting is
when the problem disappears.

Thanks,

Yin


Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-02 Thread Yin Huai
Hi Sim,

Seems you already set the PermGen size to 256m, right? I notice that in your
shell you created a new HiveContext (which further increased the memory
consumption on PermGen). But the spark shell has already created a HiveContext
for you (sqlContext); you can use asInstanceOf to access HiveContext's
methods. Can you just use the sqlContext created by the shell and try again?
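
Concretely, that means something along these lines in the shell (a two-line
sketch; it is also the approach Sim uses in the follow-up later in this
thread):

import org.apache.spark.sql.hive.HiveContext
val ctx = sqlContext.asInstanceOf[HiveContext]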

Thanks,

Yin


Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-02 Thread Simeon Simeonov
Same error with the new code:

import org.apache.spark.sql.hive.HiveContext

val ctx = sqlContext.asInstanceOf[HiveContext]
import ctx.implicits._

val df = ctx.jsonFile("file:///Users/sim/dev/spx/data/view-clicks-training/2015/06/18/part-0.gz")
df.registerTempTable("training")

val dfCount = ctx.sql("select count(*) as cnt from training")
println(dfCount.first.getLong(0))

/Sim

Simeon Simeonov, Founder & CTO, Swoop http://swoop.com/
@simeons http://twitter.com/simeons | blog.simeonov.com http://blog.simeonov.com/ | 617.299.6746



Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-02 Thread Yin Huai
Hi Sim,

Spark 1.4.0's memory consumption on PermGen is higher than Spark 1.3's
(explained in https://issues.apache.org/jira/browse/SPARK-8776). Can you
add --conf spark.driver.extraJavaOptions=-XX:MaxPermSize=256m to the
command you use to launch the Spark shell? This will increase the PermGen
size from 128m (our default) to 256m.
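
If passing the flag on every launch gets tedious, the same setting can also go
into conf/spark-defaults.conf (a sketch of Spark's standard configuration-file
mechanism, not something specific to this thread):

# conf/spark-defaults.conf
spark.driver.extraJavaOptions  -XX:MaxPermSize=256m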

Thanks,

Yin

On Thu, Jul 2, 2015 at 12:40 PM, sim s...@swoop.com wrote:

 A very simple Spark SQL COUNT operation succeeds in spark-shell for 1.3.1
 and
 fails with a series of out-of-memory errors in 1.4.0.

 This gist https://gist.github.com/ssimeonov/a49b75dc086c3ac6f3c4
 includes the code and the full output from the 1.3.1 and 1.4.0 runs,
 including the command line showing how spark-shell is started.

 Should the 1.4.0 spark-shell be started with different options to avoid
 this
 problem?

 Thanks,
 Sim



