Re: Spark 2.2.0 GC Overhead Limit Exceeded and OOM errors in the executors

2017-10-29 Thread mmdenny
Hi Supun, Did you look at https://spark.apache.org/docs/latest/tuning.html? In addition to the info there, if you're partitioning by a key with a lot of data skew, a single task's memory requirement may be larger than the RAM of a given executor, while the rest of the tasks
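A common mitigation for that kind of skew is to salt the hot key so its rows spread over several partitions. A minimal Scala sketch (the DataFrame, column names, and salt factor are illustrative assumptions, not from this thread):

import org.apache.spark.sql.functions._

// Assumed: df has a skewed grouping column "key" and a numeric column "value".
val salted = df.withColumn("salted_key", concat_ws("_", col("key"), (rand() * 8).cast("int")))
val partial = salted.groupBy("salted_key").agg(sum("value").as("partial_sum"))
// Strip the salt and combine the partial aggregates into the final per-key totals.
val result = partial
  .withColumn("key", regexp_extract(col("salted_key"), "^(.*)_\\d+$", 1))
  .groupBy("key")
  .agg(sum("partial_sum").as("total"))

The same idea applies to skewed joins: salt the large side and replicate the small side across the salt values.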

Spark 2.2.0 GC Overhead Limit Exceeded and OOM errors in the executors

2017-10-27 Thread Supun Nakandala
pipeline is iterative. I get OOM and GC overhead limit exceeded errors, and I work around them by increasing the heap size or the number of partitions, but even after doing that there is still high GC pressure. I know that my partitions should be small enough that they fit in memory. But when
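Not the advice given in this thread, but one generic pattern for an iterative pipeline like this is to use more, smaller partitions and keep intermediate results in serialized form so each task's working set stays well below the executor heap (names and values are placeholders):

import org.apache.spark.storage.StorageLevel

// Assumed: current is the intermediate Dataset produced by one iteration of the pipeline.
val repartitioned = current.repartition(800)             // more, smaller tasks; value is a placeholder
repartitioned.persist(StorageLevel.MEMORY_AND_DISK_SER)  // serialized caching reduces on-heap object count
repartitioned.count()                                    // materialize once before the next iteration uses it
// Remember to unpersist the previous iteration's result once it is no longer needed.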

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded when using UDFs in SparkSQL (Spark 2.0.0)

2016-08-09 Thread Zoltan Fedor
>> Python)? >> The cached table could take 1.5G, it means almost nothing left for other >> things. > True. I have also tried with memoryOverhead being set to 800 (10% of the 8Gb memory), but no difference. The "GC overhead limit exceeded"

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded when using UDFs in SparkSQL (Spark 2.0.0)

2016-08-09 Thread Davies Liu
or >> Python)? >> The cached table could take 1.5G, it means almost nothing left for other >> things. > True. I have also tried with memoryOverhead being set to 800 (10% of the 8Gb memory), but no difference. The "GC overhead limit exceeded" is still the same.

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded when using UDFs in SparkSQL (Spark 2.0.0)

2016-08-09 Thread Zoltan Fedor
the data (filtering by one of the attributes), let's say 800 million records, then all works well. But when I run the same SQL on all the data, then I receive "java.lang.OutOfMemoryError: GC overhead limit exceeded" from basically all of the executors.

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded when using UDFs in SparkSQL (Spark 2.0.0)

2016-08-08 Thread Davies Liu
other things. Python UDFs do require some buffering in the JVM; the size of the buffering depends on how many rows are being processed by the Python process. > - a table of 5.6 billion rows loaded into the memory of the executors (taking up 450Gb of memory), partitioned evenly across the executors

java.lang.OutOfMemoryError: GC overhead limit exceeded when using UDFs in SparkSQL (Spark 2.0.0)

2016-08-08 Thread Zoltan Fedor
a portion of the data (filtering by one of the attributes), let's say 800 million records, then all works well. But when I run the same SQL on all the data, then I receive "java.lang.OutOfMemoryError: GC overhead limit exceeded" from basically all of the executors. It seems to me that py

Re: GC overhead limit exceeded

2016-05-16 Thread Takeshi Yamamuro
error in the Apache Spark... >>> "spark.driver.memory 60g, spark.python.worker.memory 60g, spark.master local[*]" >>> The amount of data is about 5Gb, but Spark says that "GC overhead limit exceeded"

Re: GC overhead limit exceeded

2016-05-16 Thread Aleksandr Modestov
; the number of partitions. // maropu On Mon, May 16, 2016 at 10:00 PM, AlexModestov <aleksandrmodes...@gmail.com> wrote: >> I get the error in the apache spark... >> "spark.driver.memory 60g, spark.python.worker.memory 60g"

Re: GC overhead limit exceeded

2016-05-16 Thread Takeshi Yamamuro
wrote: > I get the error in the apache spark... > "spark.driver.memory 60g, spark.python.worker.memory 60g, spark.master local[*]" > The amount of data is about 5Gb, but spark says that "GC overhead limit exceeded". I guess that my conf-file gives enough resources.

GC overhead limit exceeded

2016-05-16 Thread AlexModestov
I get the error in the Apache Spark... "spark.driver.memory 60g, spark.python.worker.memory 60g, spark.master local[*]". The amount of data is about 5Gb, but Spark says that "GC overhead limit exceeded". I guess that my conf-file gives enough resources. "16/05/16 15:13:02

_metada file throwing an "GC overhead limit exceeded" after a write

2016-02-12 Thread Maurin Lenglart
Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. Exception in thread "qtp1919278883-98" java.lang.OutOfMemoryError: GC overhead limit exceeded

Lost tasks due to OutOfMemoryError (GC overhead limit exceeded)

2016-01-12 Thread Barak Yaish
ionRequired","true"); sparkConf.set("spark.kryoserializer.buffer.max.mb","512"); sparkConf.set("spark.default.parallelism","300"); sparkConf.set("spark.rpc.askTimeout","500"); I'm trying to load data from HDFS and running some SQL on it (mostly groupby) using DataFrames.
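For context, a self-contained Scala version of that kind of configuration (property values are carried over from the snippet, not recommendations; the app name is an assumption):

import org.apache.spark.{SparkConf, SparkContext}

val sparkConf = new SparkConf()
  .setAppName("hdfs-dataframe-job")  // assumed name
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryoserializer.buffer.max.mb", "512")  // Spark 1.x name; later versions use spark.kryoserializer.buffer.max
  .set("spark.default.parallelism", "300")
  .set("spark.rpc.askTimeout", "500")
val sc = new SparkContext(sparkConf)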

Re: Lost tasks due to OutOfMemoryError (GC overhead limit exceeded)

2016-01-12 Thread Muthu Jayakumar
k.rpc.askTimeout","500"); > I'm trying to load data from hdfs and running some sqls on it (mostly groupby) using DataFrames. The logs keep saying that tasks are lost due to OutOfMemoryError (GC overhead limit exceeded). > Can you advise what the recommended settings (memory, cores, partitions, etc.) are for the given hardware? > Thanks!

Re: df.partitionBy().parquet() java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-12-02 Thread Cheng Lian
the partitioned dataset successfully. I can see the output in HDFS once all Spark tasks are done. After the Spark tasks are done, the job appears to be running for over an hour, until I get the following (full stack trace below): java.lang.OutOfMemoryError: GC overhead limit exceeded

Re: df.partitionBy().parquet() java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-12-02 Thread Adrien Mogenet
I can see the output in HDFS once all Spark tasks are done. >> After the spark tasks are done, the job appears to be running for over an hour, until I get the following (full stack trace below): >> java.lang.OutOfMemoryError: GC overhead limit exceeded

df.partitionBy().parquet() java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-11-28 Thread Don Drake
an hour, until I get the following (full stack trace below): java.lang.OutOfMemoryError: GC overhead limit exceeded at org.apache.parquet.format.converter.ParquetMetadataConverter.toParquetStatistics(ParquetMetadataConverter.java:238) I had set the driver memory to be 20GB. I attempted

newbie simple app, small data set: Py4JJavaError java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-11-18 Thread Andy Davidson
'An error occurred while calling {0}{1}{2}.\n'. --> 300 format(target_id, '.', name), value) 301 else: 302 raise Py4JError( Py4JJavaError: An error occurred while calling o65.partitions. : java.lang.OutOfMemoryError: GC overhead limit exceed

RE: [sparkR] Any insight on java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-11-07 Thread Sun, Rui
Dhaval Patel [mailto:dhaval1...@gmail.com] Sent: Saturday, November 7, 2015 12:26 AM To: Spark User Group Subject: [sparkR] Any insight on java.lang.OutOfMemoryError: GC overhead limit exceeded I have been struggling with this error for the past 3 days and have tried all possible ways/suggest

[sparkR] Any insight on java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-11-06 Thread Dhaval Patel
cast_2_piece0 on localhost:39562 in memory (size: 2.4 KB, free: 530.0 MB) 15/11/06 10:45:20 INFO ContextCleaner: Cleaned accumulator 2 15/11/06 10:45:53 WARN ServletHandler: Error for /static/timeline-view.css java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.zip.Zip

java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-10-04 Thread t_ras
I get java.lang.OutOfMemoryError: GC overhead limit exceeded when trying a count action on a file. The file is a CSV file, 217GB in size. I'm using 10 r3.8xlarge (Ubuntu) machines, CDH 5.3.6 and Spark 1.2.0. Configuration: spark.app.id:local-1443956477103 spark.app.name:Spark shell spark.cores.max

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-10-04 Thread Ted Yu
1.2.0 is quite old. You may want to try 1.5.1 which was released in the past week. Cheers > On Oct 4, 2015, at 4:26 AM, t_ras <marti...@netvision.net.il> wrote: > I get java.lang.OutOfMemoryError: GC overhead limit exceeded when trying a count action on a file.

Re: Spark executor lost because of GC overhead limit exceeded even though using 20 executors using 25GB each

2015-08-18 Thread Ted Yu
to Spark. Thanks in advance. WARN scheduler.TaskSetManager: Lost task 7.0 in stage 363.0 (TID 3373, myhost.com): java.lang.OutOfMemoryError: GC overhead limit exceeded at org.apache.spark.sql.types.UTF8String.toString(UTF8String.scala:150

Spark executor lost because of GC overhead limit exceeded even though using 20 executors using 25GB each

2015-08-18 Thread unk1102
, Rpc client disassociated, shuffle not found, etc. Please help me solve this; I am going crazy as I am new to Spark. Thanks in advance. WARN scheduler.TaskSetManager: Lost task 7.0 in stage 363.0 (TID 3373, myhost.com): java.lang.OutOfMemoryError: GC overhead limit exceeded

AW: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-08-11 Thread rene.pfitzner
: Saturday, July 11, 2015 03:58 To: Ted Yu; Robin East; user Subject: Re: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded Hello again. So I could compute triangle numbers when running the code from spark shell without workers (with --driver-memory 15g option

Re: How to fix OutOfMemoryError: GC overhead limit exceeded when using Spark Streaming checkpointing

2015-08-10 Thread Cody Koeninger
. Tried increasing spark.executor.memory e.g. from 1g to 2g but the below error still happens. Any recommendations? Something to do with specifying -Xmx in the submit job scripts? Thanks. Exception in thread main java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.Arrays.copyOf

Re: How to fix OutOfMemoryError: GC overhead limit exceeded when using Spark Streaming checkpointing

2015-08-10 Thread Ted Yu
getting the below error. Tried increasing spark.executor.memory e.g. from 1g to 2g but the below error still happens. Any recommendations? Something to do with specifying -Xmx in the submit job scripts? Thanks. Exception in thread main java.lang.OutOfMemoryError: GC overhead limit exceeded

How to fix OutOfMemoryError: GC overhead limit exceeded when using Spark Streaming checkpointing

2015-08-10 Thread Dmitry Goldenberg
We're getting the below error. Tried increasing spark.executor.memory e.g. from 1g to 2g but the below error still happens. Any recommendations? Something to do with specifying -Xmx in the submit job scripts? Thanks. Exception in thread main java.lang.OutOfMemoryError: GC overhead limit
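For reference, a minimal sketch of a checkpointed streaming driver with the executor memory set in the configuration (paths, app name, and values are assumptions; the driver heap itself is normally set with --driver-memory rather than a raw -Xmx):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val checkpointDir = "hdfs:///tmp/streaming-checkpoint"  // placeholder path

def createContext(): StreamingContext = {
  val conf = new SparkConf()
    .setAppName("checkpointed-stream")   // assumed name
    .set("spark.executor.memory", "2g")  // executor heap, as discussed above
  val ssc = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint(checkpointDir)
  // ... DStream definitions go here ...
  ssc
}

val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()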

Re: How to fix OutOfMemoryError: GC overhead limit exceeded when using Spark Streaming checkpointing

2015-08-10 Thread Dmitry Goldenberg
spark.executor.memory e.g. from 1g to 2g but the below error still happens. Any recommendations? Something to do with specifying -Xmx in the submit job scripts? Thanks. Exception in thread main java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.Arrays.copyOf(Arrays.java

Re: How to fix OutOfMemoryError: GC overhead limit exceeded when using Spark Streaming checkpointing

2015-08-10 Thread Cody Koeninger
still happens. Any recommendations? Something to do with specifying -Xmx in the submit job scripts? Thanks. Exception in thread main java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.Arrays.copyOf(Arrays.java:3332) at java.lang.AbstractStringBuilder.expandCapacity

Re: How to fix OutOfMemoryError: GC overhead limit exceeded when using Spark Streaming checkpointing

2015-08-10 Thread Ted Yu
java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.Arrays.copyOf(Arrays.java:3332) at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137) at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121

Re: How to fix OutOfMemoryError: GC overhead limit exceeded when using Spark Streaming checkpointing

2015-08-10 Thread Dmitry Goldenberg
increasing spark.executor.memory e.g. from 1g to 2g but the below error still happens. Any recommendations? Something to do with specifying -Xmx in the submit job scripts? Thanks. Exception in thread main java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.Arrays.copyOf

Re: How to fix OutOfMemoryError: GC overhead limit exceeded when using Spark Streaming checkpointing

2015-08-10 Thread Dmitry Goldenberg
: We're getting the below error. Tried increasing spark.executor.memory e.g. from 1g to 2g but the below error still happens. Any recommendations? Something to do with specifying -Xmx in the submit job scripts? Thanks. Exception in thread main java.lang.OutOfMemoryError: GC overhead limit

Re: How to fix OutOfMemoryError: GC overhead limit exceeded when using Spark Streaming checkpointing

2015-08-10 Thread Cody Koeninger
with specifying -Xmx in the submit job scripts? Thanks. Exception in thread main java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.Arrays.copyOf(Arrays.java:3332) at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137

Re: How to fix OutOfMemoryError: GC overhead limit exceeded when using Spark Streaming checkpointing

2015-08-10 Thread Cody Koeninger
1g to 2g but the below error still happens. Any recommendations? Something to do with specifying -Xmx in the submit job scripts? Thanks. Exception in thread main java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.Arrays.copyOf(Arrays.java:3332

Re: Strange Error: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-07-15 Thread Saeed Shahrivari
: 15/07/15 18:24:05 WARN scheduler.TaskSetManager: Lost task 267.0 in stage 0.0 (TID 267, psh-11.nse.ir): java.lang.OutOfMemoryError: GC overhead limit exceeded It seems that the map function keeps the hashDocs RDD in the memory and when the memory is filled in an executor, the application

Re: Strange Error: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-07-15 Thread Ted Yu
the html that has the shortest URL. However, after running for 2-3 hours the application crashes due to memory issue. Here is the exception: 15/07/15 18:24:05 WARN scheduler.TaskSetManager: Lost task 267.0 in stage 0.0 (TID 267, psh-11.nse.ir): java.lang.OutOfMemoryError: GC overhead limit

Strange Error: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-07-15 Thread Saeed Shahrivari
): java.lang.OutOfMemoryError: GC overhead limit exceeded It seems that the map function keeps the hashDocs RDD in memory, and when the memory of an executor fills up, the application crashes. Persisting the map output to disk solves the problem. Adding the following line between map and reduce solves
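A hedged sketch of that pattern, persisting the map output to disk before the reduce (the RDD shape, hashing, and selection logic are assumptions based on the description above, not the poster's exact code):

import org.apache.spark.storage.StorageLevel

// Assumed shape: pages is an RDD[(String, String)] of (url, html).
val hashDocs = pages
  .map { case (url, html) => (html.hashCode, (url, html)) }
  .persist(StorageLevel.DISK_ONLY)  // keep the map output on disk instead of filling executor memory
// Keep, per hash, the document with the shortest URL.
val deduped = hashDocs.reduceByKey((a, b) => if (a._1.length <= b._1.length) a else b)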

Re: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-07-10 Thread Roman Sokolov
Hello again. So I could compute triangle numbers when run the code from spark shell without workers (with --driver-memory 15g option), but with workers I have errors. So I run spark shell: ./bin/spark-shell --master spark://192.168.0.31:7077 --executor-memory 6900m --driver-memory 15g and workers

Re: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-06-26 Thread Roman Sokolov
Ok, but what does it mean? I did not change the core files of Spark, so is it a bug there? PS: on small datasets (500 Mb) I have no problem. On 25.06.2015 18:02, Ted Yu yuzhih...@gmail.com wrote: The assertion failure from TriangleCount.scala corresponds with the following lines:

Re: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-06-26 Thread Robin East
You’ll get this issue if you just take the first 2000 lines of that file. The problem is triangleCount() expects srcId < dstId, which is not the case in the file (e.g. vertex 28). You can get round this by calling graph.convertToCanonicalEdges(), which removes bi-directional edges and ensures

Re: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-06-26 Thread Roman Sokolov
Yep, I already found it. So I added one line: val graph = GraphLoader.edgeListFile(sc, , ...) val newgraph = graph.convertToCanonicalEdges() and could successfully count triangles on newgraph. Next I will test it on bigger (several Gb) networks. I am using Spark 1.3 and 1.4 but haven't seen
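A slightly fuller version of that fix, end to end (the file path is a placeholder):

import org.apache.spark.graphx.GraphLoader

val graph = GraphLoader.edgeListFile(sc, "hdfs:///data/edges.txt")  // placeholder path
// Canonicalize edges (srcId < dstId, bi-directional duplicates removed) so triangleCount's precondition holds.
val newgraph = graph.convertToCanonicalEdges()
val triangleCounts = newgraph.triangleCount().vertices
triangleCounts.take(10).foreach(println)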

Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-06-25 Thread Roman Sokolov
Hello! I am trying to compute the number of triangles with GraphX, but I get a memory (GC overhead / heap space) error even though the dataset is very small (1Gb). I run the code in spark-shell, on a 16Gb RAM machine (I also tried with 2 workers on separate machines, 8Gb RAM each). So I have 15x more memory than the

Re: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-06-25 Thread Ted Yu
The assertion failure from TriangleCount.scala corresponds with the following lines: g.outerJoinVertices(counters) { (vid, _, optCounter: Option[Int]) => val dblCount = optCounter.getOrElse(0) // double count should be even (divisible by two) assert((dblCount & 1) == 0)

Re: Spark job throwing “java.lang.OutOfMemoryError: GC overhead limit exceeded”

2015-06-15 Thread Deng Ching-Mallete
java.lang.OutOfMemoryError: GC overhead limit exceeded. The job is trying to process a file of size 4.5G. I've tried the following Spark configuration: --num-executors 6 --executor-memory 6G --executor-cores 6 --driver-memory 3G I tried adding more cores and executors, which sometimes works

Spark job throwing “java.lang.OutOfMemoryError: GC overhead limit exceeded”

2015-06-15 Thread diplomatic Guru
Hello All, I have a Spark job that throws java.lang.OutOfMemoryError: GC overhead limit exceeded. The job is trying to process a file of size 4.5G. I've tried the following Spark configuration: --num-executors 6 --executor-memory 6G --executor-cores 6 --driver-memory 3G I tried adding more

Re: DataFrame operation on parquet: GC overhead limit exceeded

2015-03-23 Thread Yiannis Gkoufas
is the following: val people = sqlContext.parquetFile("/data.parquet"); val res = people.groupBy("name","date").agg(sum("power"),sum("supply")).take(10); System.out.println(res); The dataset consists of 16 billion entries. The error I get is java.lang.OutOfMemoryError: GC overhead limit exceeded My

Re: DataFrame operation on parquet: GC overhead limit exceeded

2015-03-23 Thread Martin Goodson
= sqlContext.parquetFile("/data.parquet"); val res = people.groupBy("name","date").agg(sum("power"),sum("supply")).take(10); System.out.println(res); The dataset consists of 16 billion entries. The error I get is java.lang.OutOfMemoryError: GC overhead limit exceeded My configuration

Re: DataFrame operation on parquet: GC overhead limit exceeded

2015-03-23 Thread Patrick Wendell
consists of 16 billion entries. The error I get is java.lang.OutOfMemoryError: GC overhead limit exceeded My configuration is: spark.serializer org.apache.spark.serializer.KryoSerializer spark.driver.memory 6g spark.executor.extraJavaOptions -XX:+UseCompressedOops spark.shuffle.manager

Re: DataFrame operation on parquet: GC overhead limit exceeded

2015-03-20 Thread Yin Huai
overhead limit exceeded My configuration is: spark.serializer org.apache.spark.serializer.KryoSerializer spark.driver.memory 6g spark.executor.extraJavaOptions -XX:+UseCompressedOops spark.shuffle.manager sort Any idea how I can work around this? Thanks a lot

Re: DataFrame operation on parquet: GC overhead limit exceeded

2015-03-20 Thread Yiannis Gkoufas
); System.out.println(res); The dataset consists of 16 billion entries. The error I get is java.lang.OutOfMemoryError: GC overhead limit exceeded My configuration is: spark.serializer org.apache.spark.serializer.KryoSerializer spark.driver.memory 6g spark.executor.extraJavaOptions -XX:+UseCompressedOops

Re: DataFrame operation on parquet: GC overhead limit exceeded

2015-03-20 Thread Yiannis Gkoufas
. The error I get is java.lang.OutOfMemoryError: GC overhead limit exceeded My configuration is: spark.serializer org.apache.spark.serializer.KryoSerializer spark.driver.memory 6g spark.executor.extraJavaOptions -XX:+UseCompressedOops spark.shuffle.manager sort Any idea how I can work around

Re: DataFrame operation on parquet: GC overhead limit exceeded

2015-03-19 Thread Yiannis Gkoufas
overhead limit exceeded My configuration is: spark.serializer org.apache.spark.serializer.KryoSerializer spark.driver.memory 6g spark.executor.extraJavaOptions -XX:+UseCompressedOops spark.shuffle.manager sort Any idea how I can work around this? Thanks a lot

Re: DataFrame operation on parquet: GC overhead limit exceeded

2015-03-19 Thread Yin Huai
(power),sum(supply)).take(10); System.out.println(res); The dataset consists of 16 billion entries. The error I get is java.lang.OutOfMemoryError: GC overhead limit exceeded My configuration is: spark.serializer org.apache.spark.serializer.KryoSerializer spark.driver.memory 6g

DataFrame operation on parquet: GC overhead limit exceeded

2015-03-18 Thread Yiannis Gkoufas
,date).agg(sum(power),sum(supply)).take(10); System.out.println(res); The dataset consists of 16 billion entries. The error I get is java.lang.OutOfMemoryError: GC overhead limit exceeded My configuration is: spark.serializer org.apache.spark.serializer.KryoSerializer spark.driver.memory 6g
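One knob that often matters for a wide aggregation like this is the number of shuffle partitions; a hedged sketch (the setting and its value are an illustration, not the fix proposed in this thread):

import org.apache.spark.sql.functions._

sqlContext.setConf("spark.sql.shuffle.partitions", "400")  // default is 200; a larger value gives smaller reduce tasks
val people = sqlContext.parquetFile("/data.parquet")
val res = people.groupBy("name", "date").agg(sum("power"), sum("supply")).take(10)
res.foreach(println)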

Re: DataFrame operation on parquet: GC overhead limit exceeded

2015-03-18 Thread Cheng Lian
); System.out.println(res); The dataset consists of 16 billion entries. The error I get is java.lang.OutOfMemoryError: GC overhead limit exceeded My configuration is: spark.serializer org.apache.spark.serializer.KryoSerializer spark.driver.memory 6g spark.executor.extraJavaOptions -XX

Re: DataFrame operation on parquet: GC overhead limit exceeded

2015-03-18 Thread Yiannis Gkoufas
is the following: val people = sqlContext.parquetFile("/data.parquet"); val res = people.groupBy("name","date").agg(sum("power"),sum("supply")).take(10); System.out.println(res); The dataset consists of 16 billion entries. The error I get is java.lang.OutOfMemoryError: GC overhead limit exceeded My configuration

Re: DataFrame operation on parquet: GC overhead limit exceeded

2015-03-18 Thread Yiannis Gkoufas
(supply)).take(10); System.out.println(res); The dataset consists of 16 billion entries. The error I get is java.lang.OutOfMemoryError: GC overhead limit exceeded My configuration is: spark.serializer org.apache.spark.serializer.KryoSerializer spark.driver.memory 6g

Re: Column Similarities using DIMSUM fails with GC overhead limit exceeded

2015-03-02 Thread Pat Ferrel
Sab, not sure what you require for the similarity metric or your use case but you can also look at spark-rowsimilarity or spark-itemsimilarity (column-wise) here http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html

Re: Column Similarities using DIMSUM fails with GC overhead limit exceeded

2015-03-02 Thread Sabarish Sasidharan
Thanks Debasish, Reza and Pat. In my case, I am doing an SVD and then doing the similarities computation. So a rowSimilarities() would be a good fit; looking forward to it. In the meanwhile I will try to see if I can further limit the number of similarities computed in some other fashion, or

Re: Column Similarities using DIMSUM fails with GC overhead limit exceeded

2015-03-02 Thread Reza Zadeh
Hi Sab, The current method is optimized for having many rows and few columns. In your case it is exactly the opposite. We are working on your case, tracked by this JIRA: https://issues.apache.org/jira/browse/SPARK-4823 Your case is very common, so I will put some time into building it. In the

Column Similarities using DIMSUM fails with GC overhead limit exceeded

2015-03-01 Thread Sabarish Sasidharan
the BlockManager doesn't respond within the heartbeat interval. In the second attempt I am seeing a GC overhead limit exceeded error. And it is almost always in the RowMatrix.columnSimilaritiesDIMSUM - mapPartitionsWithIndex (line 570) java.lang.OutOfMemoryError: GC overhead limit exceeded

Re: Column Similarities using DIMSUM fails with GC overhead limit exceeded

2015-03-01 Thread Sabarish Sasidharan
Sorry, I actually meant 30 x 1 matrix (missed a 0) Regards Sab

Re: Column Similarities using DIMSUM fails with GC overhead limit exceeded

2015-03-01 Thread Reza Zadeh
Hi Sab, In this dense case, the output will contain 10K x 10K entries, i.e. 100 million doubles, which doesn't fit in 1GB with overheads. For a dense matrix, similarColumns() scales quadratically in the number of columns, so you need more memory across the cluster. Reza On Sun, Mar 1, 2015

Re: Column Similarities using DIMSUM fails with GC overhead limit exceeded

2015-03-01 Thread Sabarish Sasidharan
Hi Reza, I see that ((int, int), double) pairs are generated for any combination that meets the criteria controlled by the threshold. But assuming a simple 1x10K matrix, that means I would need at least 12GB memory per executor for the flat map just for these pairs, excluding any other overhead.

Re: Column Similarities using DIMSUM fails with GC overhead limit exceeded

2015-03-01 Thread Debasish Das
Column-based similarities work well if the number of columns is moderate (10K, 100K; we actually scaled it to 1.5M columns, but that really stress-tests the shuffle and you need to tune the shuffle parameters)... You can either use DIMSUM sampling or come up with your own threshold based on your application that
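The threshold-based variant mentioned here corresponds to RowMatrix.columnSimilarities(threshold) in MLlib; a brief sketch (the input RDD and threshold value are assumptions):

import org.apache.spark.mllib.linalg.distributed.RowMatrix

// Assumed: rows is an RDD[org.apache.spark.mllib.linalg.Vector] holding the data row-wise.
val mat = new RowMatrix(rows)
// DIMSUM sampling: pairs whose similarity is below the threshold may be dropped, trading accuracy for memory/shuffle.
val similarities = mat.columnSimilarities(threshold = 0.1)
similarities.entries.take(10).foreach(println)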

Re: Column Similarities using DIMSUM fails with GC overhead limit exceeded

2015-03-01 Thread Reza Zadeh
running on a single node of 15G and giving the driver 1G and the executor 9G. This is on a single node hadoop. In the first attempt the BlockManager doesn't respond within the heart beat interval. In the second attempt I am seeing a GC overhead limit exceeded error. And it is almost always

Re: loads of memory still GC overhead limit exceeded

2015-02-20 Thread Xiangrui Meng
spark.shuffle.io.preferDirectBufs (to true) getting again GC overhead limit exceeded: === spark stdout === 15/02/19 12:08:08 WARN scheduler.TaskSetManager: Lost task 7.0 in stage 18.0 (TID 5329, 192.168.1.93): java.lang.OutOfMemoryError: GC overhead limit exceeded

Re: loads of memory still GC overhead limit exceeded

2015-02-20 Thread Ilya Ganelin
again GC overhead limit exceeded: === spark stdout === 15/02/19 12:08:08 WARN scheduler.TaskSetManager: Lost task 7.0 in stage 18.0 (TID 5329, 192.168.1.93): java.lang.OutOfMemoryError: GC overhead limit exceeded at java.io.ObjectInputStream.defaultReadFields

Re: loads of memory still GC overhead limit exceeded

2015-02-20 Thread Antony Mayi
19, 2015 at 5:10 AM Antony Mayi antonym...@yahoo.com.invalid wrote: now with reverted spark.shuffle.io.preferDirectBufs (to true) getting again GC overhead limit exceeded: === spark stdout ===15/02/19 12:08:08 WARN scheduler.TaskSetManager: Lost task 7.0 in stage 18.0 (TID 5329, 192.168.1.93

Re: loads of memory still GC overhead limit exceeded

2015-02-19 Thread Sean Owen
) at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:398) 15/02/19 05:41:06 ERROR executor.Executor: Exception in task 131.0 in stage 51.0 (TID 7259) java.lang.OutOfMemoryError: GC overhead limit exceeded at java.lang.reflect.Array.newInstance(Array.java:75

loads of memory still GC overhead limit exceeded

2015-02-19 Thread Antony Mayi
) java.lang.OutOfMemoryError: GC overhead limit exceeded at java.lang.reflect.Array.newInstance(Array.java:75) at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1671) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1345

Re: loads of memory still GC overhead limit exceeded

2015-02-19 Thread Antony Mayi
) java.lang.OutOfMemoryError: GC overhead limit exceeded at java.lang.reflect.Array.newInstance(Array.java:75) at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1671) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1345

Re: loads of memory still GC overhead limit exceeded

2015-02-19 Thread Sean Owen
) at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:398) 15/02/19 05:41:06 ERROR executor.Executor: Exception in task 131.0 in stage 51.0 (TID 7259) java.lang.OutOfMemoryError: GC overhead limit exceeded at java.lang.reflect.Array.newInstance(Array.java:75

Re: loads of memory still GC overhead limit exceeded

2015-02-19 Thread Antony Mayi
it is from within the ALS.trainImplicit() call. BTW, the exception varies between this GC overhead limit exceeded and Java heap space (which I guess is just a different outcome of the same problem). I just tried another run and here are the logs (filtered) - note I tried this run

Re: loads of memory still GC overhead limit exceeded

2015-02-19 Thread Antony Mayi
now with reverted spark.shuffle.io.preferDirectBufs (to true) getting again GC overhead limit exceeded: === spark stdout ===15/02/19 12:08:08 WARN scheduler.TaskSetManager: Lost task 7.0 in stage 18.0 (TID 5329, 192.168.1.93): java.lang.OutOfMemoryError: GC overhead limit exceeded

Re: loads of memory still GC overhead limit exceeded

2015-02-19 Thread Ilya Ganelin
spark.shuffle.io.preferDirectBufs (to true) getting again GC overhead limit exceeded: === spark stdout === 15/02/19 12:08:08 WARN scheduler.TaskSetManager: Lost task 7.0 in stage 18.0 (TID 5329, 192.168.1.93): java.lang.OutOfMemoryError: GC overhead limit exceeded at java.io.ObjectInputStream.defaultReadFields

Re: failing GraphX application ('GC overhead limit exceeded', 'Lost executor', 'Connection refused', etc.)

2015-02-14 Thread Matthew Cornell
): java.lang.OutOfMemoryError: GC overhead limit exceeded 15/02/12 08:05:06 WARN TaskSetManager: Lost task 0.0 in stage 31.1 (TID 48, compute-0-2.wright): FetchFailed(BlockManagerId(0, wright.cs.umass.edu, 60837), shuffleId=0, mapId=1, reduceId=1, message= org.apache.spark.shuffle.FetchFailedException: Failed

failing GraphX application ('GC overhead limit exceeded', 'Lost executor', 'Connection refused', etc.)

2015-02-12 Thread Matthew Cornell
Hi Folks, I'm running a five-step path-following algorithm on a movie graph with 120K vertices and 400K edges. The graph has vertices for actors, directors, movies, users, and user ratings, and my Scala code is walking the path rating -> movie -> rating -> user -> rating. There are 75K rating nodes

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-01-28 Thread Guru Medasani
kras...@gmail.com Cc: Sandy Ryza sandy.r...@cloudera.com, user@spark.apache.org user@spark.apache.org Subject: Re: java.lang.OutOfMemoryError: GC overhead limit exceeded I have yarn configured with yarn.nodemanager.vmem-check-enabled=false and yarn.nodemanager.pmem-check-enabled=false to avoid

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-01-27 Thread Sandy Ryza
Hi Antony, If you look in the YARN NodeManager logs, do you see that it's killing the executors? Or are they crashing for a different reason? -Sandy On Tue, Jan 27, 2015 at 12:43 PM, Antony Mayi antonym...@yahoo.com.invalid wrote: Hi, I am using spark.yarn.executor.memoryOverhead=8192 yet

java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-01-27 Thread Antony Mayi
Hi, I am using spark.yarn.executor.memoryOverhead=8192 yet executors still crash with this error. Does that mean I genuinely don't have enough RAM, or is this a matter of config tuning? Other config options used: spark.storage.memoryFraction=0.3 SPARK_EXECUTOR_MEMORY=14G running Spark 1.2.0 as
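For reference, those settings gathered in one Scala snippet for a Spark 1.2-era YARN job (values are the ones quoted above; the app name is an assumption):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("yarn-job")                             // assumed name
  .set("spark.yarn.executor.memoryOverhead", "8192")  // MB of off-heap headroom per executor container
  .set("spark.storage.memoryFraction", "0.3")         // Spark 1.x storage fraction (pre unified memory manager)
  .set("spark.executor.memory", "14g")                // equivalent of SPARK_EXECUTOR_MEMORY=14G
val sc = new SparkContext(conf)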

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-01-27 Thread Guru Medasani
sandy.r...@cloudera.com Date: Tuesday, January 27, 2015 at 3:33 PM To: Antony Mayi antonym...@yahoo.com Cc: user@spark.apache.org user@spark.apache.org Subject: Re: java.lang.OutOfMemoryError: GC overhead limit exceeded Hi Antony, If you look in the YARN NodeManager logs, do you see that it's

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-01-27 Thread Sven Krasser
, 2015 at 3:33 PM To: Antony Mayi antonym...@yahoo.com Cc: user@spark.apache.org user@spark.apache.org Subject: Re: java.lang.OutOfMemoryError: GC overhead limit exceeded Hi Antony, If you look in the YARN NodeManager logs, do you see that it's killing the executors? Or are they crashing

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-01-27 Thread Guru Medasani
: Re: java.lang.OutOfMemoryError: GC overhead limit exceeded Since it's an executor running OOM it doesn't look like a container being killed by YARN to me. As a starting point, can you repartition your job into smaller tasks? -Sven On Tue, Jan 27, 2015 at 2:34 PM, Guru Medasani gdm...@outlook.com

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-01-27 Thread Antony Mayi
17:02:53 ERROR executor.Executor: Exception in task 21.0 in stage 12.0 (TID 1312) java.lang.OutOfMemoryError: GC overhead limit exceeded at java.lang.Integer.valueOf(Integer.java:642) at scala.runtime.BoxesRunTime.boxToInteger(BoxesRunTime.java:70

Re: Spark-Shell: OOM: GC overhead limit exceeded

2014-10-08 Thread sranga
Increasing the driver memory resolved this issue. Thanks to Nick for the hint. Here is how I am starting the shell: spark-shell --driver-memory 4g --driver-cores 4 --master local

Spark-Shell: OOM: GC overhead limit exceeded

2014-10-07 Thread sranga
spark.default.parallelism 24 Any help is appreciated. The stack trace of the error is given below. - Ranga == Stack trace == java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.Arrays.copyOf(Arrays.java:3332

Re: still GC overhead limit exceeded after increasing heap space

2014-10-05 Thread Andrew Ash
this issue, I have increased the Java heap space to -Xms64g -Xmx64g, but still hit the java.lang.OutOfMemoryError: GC overhead limit exceeded error. Does anyone have other suggestions? I am reading 200 GB of data and my total memory is 120 GB, so I use MEMORY_AND_DISK_SER and Kryo

Re: still GC overhead limit exceeded after increasing heap space

2014-10-02 Thread Sean Owen
-Xmx64g, but still met the java.lang.OutOfMemoryError: GC overhead limit exceeded error. Does anyone have other suggestions? I am reading a data of 200 GB and my total memory is 120 GB, so I use MEMORY_AND_DISK_SER and kryo serialization. Thanks a lot! -- View this message in context

still GC overhead limit exceeded after increasing heap space

2014-10-01 Thread anny9699
Hi, After reading some previous posts about this issue, I have increased the Java heap space to -Xms64g -Xmx64g, but still hit the java.lang.OutOfMemoryError: GC overhead limit exceeded error. Does anyone have other suggestions? I am reading 200 GB of data and my total memory is 120 GB, so I
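A small sketch of that combination, Kryo plus serialized storage that spills to disk (app name and input path are placeholders):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val conf = new SparkConf()
  .setAppName("large-input-job")  // assumed name
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
val sc = new SparkContext(conf)

val data = sc.textFile("hdfs:///data/input")    // placeholder path for the 200 GB input
data.persist(StorageLevel.MEMORY_AND_DISK_SER)  // serialized blocks in memory, spilled to disk when they don't fit
println(data.count())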

Re: still GC overhead limit exceeded after increasing heap space

2014-10-01 Thread Liquan Pei
to -Xms64g -Xmx64g, but still met the java.lang.OutOfMemoryError: GC overhead limit exceeded error. Does anyone have other suggestions? I am reading a data of 200 GB and my total memory is 120 GB, so I use MEMORY_AND_DISK_SER and kryo serialization. Thanks a lot! -- View this message

Re: still GC overhead limit exceeded after increasing heap space

2014-10-01 Thread anny9699
Hi Liquan, I have 8 workers, each with 15.7GB memory. What you said makes sense, but if I don't increase heap space, it keeps telling me GC overhead limit exceeded. Thanks! Anny On Wed, Oct 1, 2014 at 1:41 PM, Liquan Pei [via Apache Spark User List] ml-node+s1001560n1554...@n3.nabble.com

Re: still GC overhead limit exceeded after increasing heap space

2014-10-01 Thread 陈韵竹
the java heap space to -Xms64g -Xmx64g, but still met the java.lang.OutOfMemoryError: GC overhead limit exceeded error. Does anyone have other suggestions? I am reading a data of 200 GB and my total memory is 120 GB, so I use MEMORY_AND_DISK_SER and kryo serialization. Thanks a lot! -- View

Re: still GC overhead limit exceeded after increasing heap space

2014-10-01 Thread Liquan Pei
: GC overhead limit exceeded error. Does anyone have other suggestions? I am reading a data of 200 GB and my total memory is 120 GB, so I use MEMORY_AND_DISK_SER and kryo serialization. Thanks a lot! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/still

Re: [GraphX] how to set memory configurations to avoid OutOfMemoryError GC overhead limit exceeded

2014-09-09 Thread Ankur Dave
At 2014-09-05 12:13:18 +0200, Yifan LI iamyifa...@gmail.com wrote: But how do I assign the storage level to a new vertices RDD that is mapped from an existing vertices RDD, e.g. val newVertexRDD = graph.collectNeighborIds(EdgeDirection.Out).map{case(id:VertexId, a:Array[VertexId]) => (id,

[Spark Streaming] java.lang.OutOfMemoryError: GC overhead limit exceeded

2014-09-08 Thread Yan Fang
Hi guys, My Spark Streaming application has this java.lang.OutOfMemoryError: GC overhead limit exceeded error in the Spark Streaming driver program. I have done the following to debug it: 1. increased the driver memory from 1GB to 2GB; the error then came after 22 hrs. When the memory was 1GB

Re: [GraphX] how to set memory configurations to avoid OutOfMemoryError GC overhead limit exceeded

2014-09-05 Thread Yifan LI
Thank you, Ankur! :) But how do I assign the storage level to a new vertices RDD that is mapped from an existing vertices RDD, e.g. val newVertexRDD = graph.collectNeighborIds(EdgeDirection.Out).map{case(id:VertexId, a:Array[VertexId]) => (id, initialHashMap(a))} The new one will be combined with
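Hedged completion of that idea: the derived RDD can be given its own storage level before it is joined back into the graph (the storage level is an assumption; the map is the one from the snippet):

import org.apache.spark.graphx.{EdgeDirection, VertexId}
import org.apache.spark.storage.StorageLevel

// Assumed: graph is the existing Graph and initialHashMap builds the per-vertex state, as in the snippet.
val newVertexRDD = graph
  .collectNeighborIds(EdgeDirection.Out)
  .map { case (id: VertexId, a: Array[VertexId]) => (id, initialHashMap(a)) }
  .persist(StorageLevel.MEMORY_AND_DISK)  // explicit storage level for the derived vertex data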

Re: [GraphX] how to set memory configurations to avoid OutOfMemoryError GC overhead limit exceeded

2014-09-03 Thread Yifan LI
wrote: I am testing our application (similar to personalised page rank using Pregel; note that each vertex property needs considerably more space to store after each new iteration) [...] But when we ran it on a larger graph (e.g. LiveJournal), it always ends with the error GC overhead limit

Re: [GraphX] how to set memory configurations to avoid OutOfMemoryError GC overhead limit exceeded

2014-09-03 Thread Ankur Dave
At 2014-09-03 17:58:09 +0200, Yifan LI iamyifa...@gmail.com wrote: val graph = GraphLoader.edgeListFile(sc, edgesFile, minEdgePartitions = numPartitions).partitionBy(PartitionStrategy.EdgePartition2D).persist(StorageLevel.MEMORY_AND_DISK) Error: java.lang.UnsupportedOperationException: Cannot
