Re: java.lang.OutOfMemoryError: GC overhead limit exceeded when using UDFs in SparkSQL (Spark 2.0.0)

2016-08-09 Thread Zoltan Fedor
... There is a big table (5.6 billion rows, 450 GB in memory) loaded into 300 executors' memory in SparkSQL, on which we would do some calculation using UDFs in pyspark. If I run my SQL on only a portion of the data ...

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded when using UDFs in SparkSQL (Spark 2.0.0)

2016-08-09 Thread Davies Liu
... using UDFs in pyspark. If I run my SQL on only a portion of the data (filtering by one of the attributes), let's say 800 million records, then all works well. But when I run the same SQL on all the data, then I receive ...

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded when using UDFs in SparkSQL (Spark 2.0.0)

2016-08-09 Thread Zoltan Fedor
... the data (filtering by one of the attributes), let's say 800 million records, then all works well. But when I run the same SQL on all the data, then I receive "java.lang.OutOfMemoryError: GC overhead limit exceeded" from basically all of the executors ...

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded when using UDFs in SparkSQL (Spark 2.0.0)

2016-08-08 Thread Davies Liu
...D_ACCOUNT_STATE_UNIV FROM ma""")

results_df.registerTempTable("m")
sqlContext.cacheTable("m")

results_df = sqlContext.sql("""SELECT COUNT(*) FROM m""")
print(results_df.take(1))

- the error ...

java.lang.OutOfMemoryError: GC overhead limit exceeded when using UDFs in SparkSQL (Spark 2.0.0)

2016-08-08 Thread Zoltan Fedor
... a portion of the data (filtering by one of the attributes), let's say 800 million records, then all works well. But when I run the same SQL on all the data, then I receive "java.lang.OutOfMemoryError: GC overhead limit exceeded" from basically all of the executors. It seems to me that pyspark ...
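
A minimal PySpark sketch of the pattern described in this thread, mirroring the code fragment quoted above (Spark 2.0-era API; the UDF body, column name, and source path are hypothetical stand-ins, not the poster's actual code):

    from pyspark import SparkContext
    from pyspark.sql import SQLContext
    from pyspark.sql.types import StringType

    sc = SparkContext(appName="udf-gc-overhead")
    sqlContext = SQLContext(sc)

    # Hypothetical Python UDF: every input row is shipped to a Python
    # worker process and back, so the JVM holds serialization buffers
    # on top of the 450 GB cached table.
    sqlContext.registerFunction("normalize", lambda s: s.strip().upper(), StringType())

    df = sqlContext.read.parquet("hdfs:///path/to/big_table")  # assumed source
    df.registerTempTable("ma")

    results_df = sqlContext.sql("""SELECT normalize(name) AS name FROM ma""")
    results_df.registerTempTable("m")
    sqlContext.cacheTable("m")

    results_df = sqlContext.sql("""SELECT COUNT(*) FROM m""")
    print(results_df.take(1))

The filtered 800-million-row run fits; the full 5.6-billion-row run is where the executors throw "GC overhead limit exceeded", which is the JVM giving up after spending nearly all of its time in garbage collection while recovering almost no heap.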

Re: df.partitionBy().parquet() java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-12-02 Thread Cheng Lian
... the partitioned dataset successfully. I can see the output in HDFS once all Spark tasks are done. After the Spark tasks are done, the job appears to keep running for over an hour, until I get the following (full stack trace below): java.lang.OutOfMemoryError: GC overhead limit exceeded ...

Re: df.partitionBy().parquet() java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-12-02 Thread Adrien Mogenet
... I can see the output in HDFS once all Spark tasks are done. After the Spark tasks are done, the job appears to keep running for over an hour, until I get the following (full stack trace below): java.lang.OutOfMemoryError: GC overhead limit exceeded ...

df.partitionBy().parquet() java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-11-28 Thread Don Drake
... an hour, until I get the following (full stack trace below):

java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.apache.parquet.format.converter.ParquetMetadataConverter.toParquetStatistics(ParquetMetadataConverter.java:238)

I had set the driver memory to be 20GB. I attempted ...
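
For reference, a hedged sketch of the write pattern in the subject line and one commonly suggested mitigation from that era; the column name and paths are hypothetical, and the config line is a general Parquet knob, not necessarily the fix proposed in this thread:

    df = sqlContext.read.json("hdfs:///path/to/input")  # hypothetical source

    # The summary metadata files (_metadata/_common_metadata) are written
    # by the driver after all tasks finish, which matches the "job keeps
    # running for an hour after the tasks are done" symptom.
    sc._jsc.hadoopConfiguration().set("parquet.enable.summary-metadata", "false")

    df.write.partitionBy("event_date").parquet("hdfs:///path/to/output")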

newbie simple app, small data set: Py4JJavaError java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-11-18 Thread Andy Davidson
'An error occurred while calling {0}{1}{2}.\n'.
--> 300     format(target_id, '.', name), value)
    301 else:
    302     raise Py4JError(

Py4JJavaError: An error occurred while calling o65.partitions.
: java.lang.OutOfMemoryError: GC overhead limit exceeded ...
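
The "o65.partitions" target in the traceback is a Py4J handle to a JVM-side RDD method, so the OutOfMemoryError is thrown in the driver JVM, not in Python. A hedged sketch of the usual first step, raising the driver heap at launch (the 4g figure is illustrative):

    spark-submit --driver-memory 4g my_app.py
    # or, for an interactive session:
    pyspark --driver-memory 4g

Setting spark.driver.memory from inside the script after the SparkContext exists has no effect in local or client mode, because the driver JVM has already started by then.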

RE: [sparkR] Any insight on java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-11-07 Thread Sun, Rui
... Dhaval Patel [mailto:dhaval1...@gmail.com]
Sent: Saturday, November 7, 2015 12:26 AM
To: Spark User Group
Subject: [sparkR] Any insight on java.lang.OutOfMemoryError: GC overhead limit exceeded

I have been struggling with this error for the past 3 days and have tried all possible ways/suggest...

[sparkR] Any insight on java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-11-06 Thread Dhaval Patel
...broadcast_2_piece0 on localhost:39562 in memory (size: 2.4 KB, free: 530.0 MB)
15/11/06 10:45:20 INFO ContextCleaner: Cleaned accumulator 2
15/11/06 10:45:53 WARN ServletHandler: Error for /static/timeline-view.css
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.zip.Zip...

java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-10-04 Thread t_ras
I get java.lang.OutOfMemoryError: GC overhead limit exceeded when trying a count action on a file. The file is a 217 GB CSV file. I'm using 10 r3.8xlarge (Ubuntu) machines, CDH 5.3.6 and Spark 1.2.0. Configuration:

spark.app.id: local-1443956477103
spark.app.name: Spark shell
spark.cores.max ...
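
A hedged sketch of the job described, in PySpark for illustration (the original ran in the Spark shell; the path and partition count are assumptions). With a 217 GB input, forcing more, smaller partitions is the usual first lever, since fewer, larger partitions concentrate allocation pressure on each executor:

    from pyspark import SparkContext

    sc = SparkContext(appName="big-csv-count")
    # minPartitions is a lower bound; ~2000 partitions keeps each task's
    # slice of a 217 GB file to roughly 100 MB.
    lines = sc.textFile("hdfs:///path/to/big.csv", minPartitions=2000)
    print(lines.count())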

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-10-04 Thread Ted Yu
1.2.0 is quite old. You may want to try 1.5.1, which was released in the past week. Cheers

On Oct 4, 2015, at 4:26 AM, t_ras <marti...@netvision.net.il> wrote:
> I get java.lang.OutOfMemoryError: GC overhead limit exceeded when trying a
> count action on a file. ...

AW: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-08-11 Thread rene.pfitzner
... Sent: Saturday, July 11, 2015 03:58
To: Ted Yu; Robin East; user
Subject: Re: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

Hello again. So I could compute triangle numbers when running the code from the spark shell without workers (with the --driver-memory 15g option ...

Re: Strange Error: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-07-15 Thread Saeed Shahrivari
... 15/07/15 18:24:05 WARN scheduler.TaskSetManager: Lost task 267.0 in stage 0.0 (TID 267, psh-11.nse.ir): java.lang.OutOfMemoryError: GC overhead limit exceeded

It seems that the map function keeps the hashDocs RDD in memory, and when the memory is filled in an executor, the application ...

Re: Strange Error: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-07-15 Thread Ted Yu
... the HTML that has the shortest URL. However, after running for 2-3 hours the application crashes due to a memory issue. Here is the exception:

15/07/15 18:24:05 WARN scheduler.TaskSetManager: Lost task 267.0 in stage 0.0 (TID 267, psh-11.nse.ir): java.lang.OutOfMemoryError: GC overhead limit ...

Strange Error: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-07-15 Thread Saeed Shahrivari
...): java.lang.OutOfMemoryError: GC overhead limit exceeded

It seems that the map function keeps the hashDocs RDD in memory, and when the memory is filled in an executor, the application crashes. Persisting the map output to disk solves the problem. Adding the following line between the map and the reduce solves ...
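
The actual line is cut off in this snippet; a hedged illustration of the pattern being described (shown in PySpark syntax for consistency with the other sketches here; the poster's code was likely Scala), persisting the intermediate RDD to disk so executors stop trying to hold it all on-heap:

    from pyspark import StorageLevel

    # Between the map and the reduce: spill blocks that no longer fit in
    # memory to disk instead of letting the executors thrash in GC.
    hashDocs.persist(StorageLevel.MEMORY_AND_DISK)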

Re: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-07-10 Thread Roman Sokolov
Hello again. So I could compute triangle numbers when running the code from the spark shell without workers (with the --driver-memory 15g option), but with workers I have errors. So I run the spark shell:

./bin/spark-shell --master spark://192.168.0.31:7077 --executor-memory 6900m --driver-memory 15g

and workers ...

Re: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-06-26 Thread Roman Sokolov
Ok, but what does it mean? I did not change the core files of Spark, so is it a bug there? PS: on small datasets (500 MB) I have no problem.

On 25.06.2015 18:02, Ted Yu yuzhih...@gmail.com wrote: The assertion failure from TriangleCount.scala corresponds with the following lines: ...

Re: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-06-26 Thread Robin East
You’ll get this issue if you just take the first 2000 lines of that file. The problem is that triangleCount() expects srcId < dstId, which is not the case in the file (e.g. vertex 28). You can get round this by calling graph.convertToCanonicalEdges(), which removes bi-directional edges and ensures ...

Re: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-06-26 Thread Roman Sokolov
Yep, I already found it. So I added one line:

val graph = GraphLoader.edgeListFile(sc, ...)
val newgraph = graph.convertToCanonicalEdges()

and could successfully count triangles on newgraph. Next I will test it on bigger (several GB) networks. I am using Spark 1.3 and 1.4 but haven't seen ...

Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-06-25 Thread Roman Sokolov
Hello! I am trying to compute the number of triangles with GraphX, but I get a memory/heap-size error even though the dataset is very small (1 GB). I run the code in spark-shell on a machine with 16 GB RAM (I also tried with 2 workers on separate machines, 8 GB RAM each). So I have 15x more memory than the ...

Re: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-06-25 Thread Ted Yu
The assertion failure from TriangleCount.scala corresponds with the following lines:

g.outerJoinVertices(counters) { (vid, _, optCounter: Option[Int]) =>
  val dblCount = optCounter.getOrElse(0)
  // double count should be even (divisible by two)
  assert((dblCount & 1) == 0) ...

Re: Spark job throwing “java.lang.OutOfMemoryError: GC overhead limit exceeded”

2015-06-15 Thread Deng Ching-Mallete
... java.lang.OutOfMemoryError: GC overhead limit exceeded. The job is trying to process a file of size 4.5 GB. I've tried the following Spark configuration:

--num-executors 6 --executor-memory 6G --executor-cores 6 --driver-memory 3G

I tried increasing the cores and executors, which sometimes works ...

Spark job throwing “java.lang.OutOfMemoryError: GC overhead limit exceeded”

2015-06-15 Thread diplomatic Guru
Hello All, I have a Spark job that throws java.lang.OutOfMemoryError: GC overhead limit exceeded. The job is trying to process a file of size 4.5 GB. I've tried the following Spark configuration:

--num-executors 6 --executor-memory 6G --executor-cores 6 --driver-memory 3G

I tried increasing the cores and executors, which sometimes works ...
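
For reference, the flags above as a full launch command, a hedged sketch (the script name and YARN master are assumptions; the flag values are the poster's):

    spark-submit --master yarn \
      --num-executors 6 \
      --executor-memory 6G \
      --executor-cores 6 \
      --driver-memory 3G \
      my_job.py

With 6 cores per executor, the 6 GB heap is shared by 6 concurrent tasks, about 1 GB each; lowering --executor-cores or raising --executor-memory changes that per-task share, which is often what decides whether this error appears.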

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-01-28 Thread Guru Medasani
... kras...@gmail.com
Cc: Sandy Ryza sandy.r...@cloudera.com, user@spark.apache.org
Subject: Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

I have YARN configured with yarn.nodemanager.vmem-check-enabled=false and yarn.nodemanager.pmem-check-enabled=false to avoid ...

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-01-27 Thread Sandy Ryza
Hi Antony,

If you look in the YARN NodeManager logs, do you see that it's killing the executors? Or are they crashing for a different reason?

-Sandy

On Tue, Jan 27, 2015 at 12:43 PM, Antony Mayi antonym...@yahoo.com.invalid wrote: Hi, I am using spark.yarn.executor.memoryOverhead=8192 yet ...

java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-01-27 Thread Antony Mayi
Hi, I am using spark.yarn.executor.memoryOverhead=8192, yet executors crash with this error. Does that mean I genuinely don't have enough RAM, or is this a matter of config tuning? Other config options used:

spark.storage.memoryFraction=0.3
SPARK_EXECUTOR_MEMORY=14G

running Spark 1.2.0 as ...
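
A hedged sketch of how these settings fit together in a Spark 1.2-era PySpark job, plus the repartitioning step suggested later in this thread (the input path and partition count are illustrative; the setting values are the poster's):

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .set("spark.yarn.executor.memoryOverhead", "8192")  # MB of off-heap headroom for the YARN container
            .set("spark.storage.memoryFraction", "0.3")         # cap the block cache at 30% of heap
            .set("spark.executor.memory", "14g"))               # equivalent of SPARK_EXECUTOR_MEMORY=14G
    sc = SparkContext(conf=conf)

    rdd = sc.textFile("hdfs:///path/to/input")  # hypothetical input
    # Smaller tasks spread allocation pressure; an executor OOM (rather
    # than a YARN container kill) often just means individual tasks are
    # too large for their share of the heap.
    rdd = rdd.repartition(2000)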

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-01-27 Thread Guru Medasani
... sandy.r...@cloudera.com
Date: Tuesday, January 27, 2015 at 3:33 PM
To: Antony Mayi antonym...@yahoo.com
Cc: user@spark.apache.org
Subject: Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

Hi Antony, if you look in the YARN NodeManager logs, do you see that it's ...

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-01-27 Thread Sven Krasser
... 2015 at 3:33 PM
To: Antony Mayi antonym...@yahoo.com
Cc: user@spark.apache.org
Subject: Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

Hi Antony, if you look in the YARN NodeManager logs, do you see that it's killing the executors? Or are they crashing ...

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-01-27 Thread Guru Medasani
... Subject: Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

Since it's an executor running OOM, it doesn't look like a container being killed by YARN to me. As a starting point, can you repartition your job into smaller tasks?

-Sven

On Tue, Jan 27, 2015 at 2:34 PM, Guru Medasani gdm...@outlook.com ...

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-01-27 Thread Antony Mayi
... 17:02:53 ERROR executor.Executor: Exception in task 21.0 in stage 12.0 (TID 1312)
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.lang.Integer.valueOf(Integer.java:642)
    at scala.runtime.BoxesRunTime.boxToInteger(BoxesRunTime.java:70) ...

[Spark Streaming] java.lang.OutOfMemoryError: GC overhead limit exceeded

2014-09-08 Thread Yan Fang
Hi guys, my Spark Streaming application has this java.lang.OutOfMemoryError: GC overhead limit exceeded error in the Spark Streaming driver program. I have done the following to debug it:

1. Increased the driver memory from 1 GB to 2 GB; the error then came after 22 hrs. When the memory was 1 GB ...
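
A driver that only OOMs after ~22 hours of streaming usually points at slowly accumulating driver-side state rather than any single batch. One hedged illustration of the kind of knob involved (a general Spark setting, not a fix confirmed in this thread; the value is illustrative):

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("long-running-streaming")
            # Default is 1000; each retained stage keeps task metadata in
            # the driver heap, and a streaming job creates new stages with
            # every batch for as long as it runs.
            .set("spark.ui.retainedStages", "200"))
    sc = SparkContext(conf=conf)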

Spark app throwing java.lang.OutOfMemoryError: GC overhead limit exceeded

2014-08-04 Thread buntu
I've got a 40-node CDH 5.1 cluster and am attempting to run a simple Spark app that processes about 10-15 GB of raw data, but I keep running into this error: java.lang.OutOfMemoryError: GC overhead limit exceeded. Each node has 8 cores and 2 GB memory. I notice the heap size on the executors is set ...

Re: Spark app throwing java.lang.OutOfMemoryError: GC overhead limit exceeded

2014-08-04 Thread Sean Owen
... to run a simple Spark app that processes about 10-15 GB of raw data, but I keep running into this error: java.lang.OutOfMemoryError: GC overhead limit exceeded. Each node has 8 cores and 2 GB memory. I notice the heap size on the executors is set to 512 MB, with total heap size on each executor ...

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2014-07-21 Thread Abel Coronado Iruegas
Hi Yifan,

This works for me:

export SPARK_JAVA_OPTS="-Xms10g -Xmx40g -XX:MaxPermSize=10g"
export ADD_JARS=/home/abel/spark/MLI/target/MLI-assembly-1.0.jar
export SPARK_MEM=40g
./spark-shell

Regards

On Mon, Jul 21, 2014 at 7:48 AM, Yifan LI iamyifa...@gmail.com wrote: Hi, I am trying to load ...

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2014-07-21 Thread Yifan LI
Thanks, Abel.

Best, Yifan LI

On Jul 21, 2014, at 4:16 PM, Abel Coronado Iruegas acoronadoirue...@gmail.com wrote: Hi Yifan, this works for me:

export SPARK_JAVA_OPTS="-Xms10g -Xmx40g -XX:MaxPermSize=10g"
export ADD_JARS=/home/abel/spark/MLI/target/MLI-assembly-1.0.jar
export ...

java.lang.OutOfMemoryError (java.lang.OutOfMemoryError: GC overhead limit exceeded)

2014-07-08 Thread Konstantin Kudryavtsev
Hi all, I hit the following exception during the map step:

java.lang.OutOfMemoryError (java.lang.OutOfMemoryError: GC overhead limit exceeded)
java.lang.reflect.Array.newInstance(Array.java:70)
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read ...
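
The trace points at Kryo deserializing an object array, i.e. the OOM fires while reading serialized JVM-side data back, not in user code. A hedged sketch of the serializer settings usually in play for this symptom in Spark 1.0-era jobs (values are illustrative; the snippet does not show what the poster ultimately changed):

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            # Kryo handles JVM-side data: shuffle blocks, cached
            # partitions, and broadcast variables.
            .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
            # A larger buffer ceiling avoids failures on big serialized records.
            .set("spark.kryoserializer.buffer.max.mb", "256"))
    sc = SparkContext(conf=conf)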

Re: java.lang.OutOfMemoryError (java.lang.OutOfMemoryError: GC overhead limit exceeded)

2014-07-08 Thread Aaron Davidson
... On Tue, Jul 8, 2014 at 9:27 AM, Konstantin Kudryavtsev kudryavtsev.konstan...@gmail.com wrote: Hi all, I hit the following exception during the map step: java.lang.OutOfMemoryError (java.lang.OutOfMemoryError: GC overhead limit exceeded) java.lang.reflect.Array.newInstance(Array.java:70) ...

Re: java.lang.OutOfMemoryError (java.lang.OutOfMemoryError: GC overhead limit exceeded)

2014-07-08 Thread Jerry Lam
kudryavtsev.konstan...@gmail.com wrote: Hi all, I hit the following exception during the map step: java.lang.OutOfMemoryError (java.lang.OutOfMemoryError: GC overhead limit exceeded) java.lang.reflect.Array.newInstance(Array.java:70) ...

Re: java.lang.OutOfMemoryError (java.lang.OutOfMemoryError: GC overhead limit exceeded)

2014-07-08 Thread Aaron Davidson
... to caching or serializing. On Tue, Jul 8, 2014 at 9:27 AM, Konstantin Kudryavtsev kudryavtsev.konstan...@gmail.com wrote: Hi all, I hit the following exception during the map step: java.lang.OutOfMemoryError (java.lang.OutOfMemoryError: GC overhead limit exceeded) ...