Hi Supun,
Did you look at https://spark.apache.org/docs/latest/tuning.html?
In addition to the info there: if you're partitioning by some key with a lot of
data skew, a single task's memory requirement may be larger than the RAM of a
given executor, even if the rest of the tasks in the pipeline are fine.
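A minimal sketch of one common mitigation for that kind of skew, salting the
hot key before the shuffle (the DataFrame, column names and salt factor below
are made up for illustration):

import org.apache.spark.sql.functions._

// Hypothetical DataFrame `df` with a heavily skewed "userId" key.
// Adding a random salt splits the hot key across many groups, so no single
// task has to hold the entire skewed group in memory.
val saltFactor = 32  // illustrative value
val salted = df.withColumn("salt", (rand() * saltFactor).cast("int"))

// Aggregate per (userId, salt) first, then combine the partial results.
val partial = salted.groupBy("userId", "salt").agg(sum("amount").as("partialSum"))
val result  = partial.groupBy("userId").agg(sum("partialSum").as("total"))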
I get OOM and "GC overhead limit exceeded" errors, and I work around them by
increasing the heap size or the number of partitions, but even after doing
that there is still high GC pressure.
I know that my partitions should be small enough that each one fits in
memory. But when
>> or Python)?
>> The cached table could take 1.5G, which means almost nothing is left for
>> other things.
> True. I have also tried with memoryOverhead set to 800 (10% of the 8Gb
> memory), but no difference. The "GC overhead limit exceeded" error is still
> the same.
Python UDFs do require some buffering in the JVM; the size of the buffering
depends on how many rows are being processed by the Python process at a time.
> - a table of 5.6 billion rows loaded into the memory of the executors
> (taking up 450Gb of memory), partitioned evenly across the executors
When I run the SQL on a portion of the data (filtering by one of the
attributes), let's say 800 million records, then all works well. But when I
run the same SQL on all the data, then I receive "java.lang.OutOfMemoryError:
GC overhead limit exceeded" from basically all of the executors.
It seems to me that py
> the number of partitions.
>
> // maropu
>
> On Mon, May 16, 2016 at 10:00 PM, AlexModestov <
> aleksandrmodes...@gmail.com> wrote:
I get the error in Apache Spark...
"spark.driver.memory 60g
spark.python.worker.memory 60g
spark.master local[*]"
The amount of data is about 5Gb, but Spark says "GC overhead limit
exceeded". I guess that my conf-file gives enough resources.
"16/05/16 15:13:02
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further
details.
Exception in thread "qtp1919278883-98" java.lang.OutOfM
ionRequired","true");
sparkConf.set("spark.kryoserializer.buffer.max.mb","512");
sparkConf.set("spark.default.parallelism","300");
sparkConf.set("spark.rpc.askTimeout","500");
I'm trying to load data from hdfs and running some sqls on it (m
> I'm trying to load data from hdfs and running some SQL on it (mostly
> groupBy) using DataFrames. The logs keep saying that tasks are lost due to
> OutOfMemoryError (GC overhead limit exceeded).
>
> Can you advise what the recommended settings (memory, cores,
> partitions, etc.) are for the given hardware?
>
> Thanks!
>
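Hard to give exact numbers without the hardware details, but a sketch of the
knobs that usually matter for a DataFrame groupBy workload (all values below
are illustrative only, typically set in spark-defaults.conf or passed to the
context):

import org.apache.spark.SparkConf

// Illustrative starting points, not a recommendation for any specific cluster.
val conf = new SparkConf()
  .set("spark.executor.memory", "8g")          // leave headroom below node RAM for overhead
  .set("spark.executor.cores", "4")            // smaller executors keep individual heaps manageable
  .set("spark.sql.shuffle.partitions", "400")  // DataFrame groupBy defaults to 200 shuffle partitions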
the partitioned dataset successfully. I can see the output in
HDFS once all Spark tasks are done.
After the Spark tasks are done, the job appears to keep running for
over an hour, until I get the following (full stack trace below):
java.lang.OutOfMemoryError: GC overhead limit exceeded
at
org.apache.parquet.format.converter.ParquetMetadataConverter.toParquetStatistics(ParquetMetadataConverter.java:238)
I had set the driver memory to be 20GB.
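The frame above is the driver merging Parquet footer metadata after the write
finishes; a sketch of one common workaround, assuming the job writes Parquet
with summary metadata enabled (standard Parquet/Spark SQL settings, offered as
a guess rather than something confirmed in this thread):

// The _metadata summary files are assembled on the driver from every task's
// footer; disabling them (and schema merging on read) avoids that
// memory-heavy, single-threaded merge step.
sc.hadoopConfiguration.set("parquet.enable.summary-metadata", "false")
sqlContext.setConf("spark.sql.parquet.mergeSchema", "false")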
I attempted
'An error occurred while calling {0}{1}{2}.\n'.
--> 300 format(target_id, '.', name), value)
301 else:
302 raise Py4JError(
Py4JJavaError: An error occurred while calling o65.partitions.
: java.lang.OutOfMemoryError: GC overhead limit exceeded
Dhaval Patel [mailto:dhaval1...@gmail.com]
Sent: Saturday, November 7, 2015 12:26 AM
To: Spark User Group
Subject: [sparkR] Any insight on java.lang.OutOfMemoryError: GC overhead limit
exceeded
I have been struggling with this error for the past 3 days and have tried all
possible ways/suggestions
broadcast_2_piece0 on
localhost:39562 in memory (size: 2.4 KB, free: 530.0 MB)
15/11/06 10:45:20 INFO ContextCleaner: Cleaned accumulator 2
15/11/06 10:45:53 WARN ServletHandler: Error for /static/timeline-view.css
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.zip.Zip
I get java.lang.OutOfMemoryError: GC overhead limit exceeded when trying a
count action on a file.
The file is a CSV file, 217GB in size.
I'm using 10 r3.8xlarge (Ubuntu) machines, CDH 5.3.6 and Spark 1.2.0.
configuration:
spark.app.id:local-1443956477103
spark.app.name:Spark shell
spark.cores.max
1.2.0 is quite old.
You may want to try 1.5.1 which was released in the past week.
Cheers
> On Oct 4, 2015, at 4:26 AM, t_ras <marti...@netvision.net.il> wrote:
>
> I get java.lang.OutOfMemoryError: GC overhead limit exceeded when trying a
> count action on a file.
>
>
RPC client disassociated, shuffle not found, etc. Please help me solve this; I
am going mad as I am new to Spark. Thanks in advance.
WARN scheduler.TaskSetManager: Lost task 7.0 in stage 363.0 (TID 3373,
myhost.com): java.lang.OutOfMemoryError: GC overhead limit exceeded
at
org.apache.spark.sql.types.UTF8String.toString(UTF8String.scala:150
Sent: Saturday, 11 July 2015 03:58
To: Ted Yu; Robin East; user
Subject: Re: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC
overhead limit exceeded
We're getting the below error. Tried increasing spark.executor.memory e.g.
from 1g to 2g but the below error still happens.
Any recommendations? Something to do with specifying -Xmx in the submit job
scripts?
Thanks.
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit
exceeded
at java.util.Arrays.copyOf(Arrays.java:3332)
at
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
at
java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
the html that has the shortest URL. However, after
running for 2-3 hours the application crashes due to a memory issue. Here is
the exception:
15/07/15 18:24:05 WARN scheduler.TaskSetManager: Lost task 267.0 in stage 0.0
(TID 267, psh-11.nse.ir): java.lang.OutOfMemoryError: GC overhead limit
exceeded
It seems that the map function keeps the hashDocs RDD in memory,
and when the memory fills up in an executor, the application crashes.
Persisting the map output to disk solves the problem. Adding the
following line between map and reduce solved it:
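The line itself is truncated above; a sketch of what it typically looks like
(the mapped RDD and function names are placeholders for the job's own logic):

import org.apache.spark.storage.StorageLevel

// Persist the map output with a disk-backed level before the reduce, so an
// executor spills to disk instead of dying with a GC overhead error.
val mapped = hashDocs.map(doc => shortestUrlHtml(doc))   // shortestUrlHtml is a placeholder
mapped.persist(StorageLevel.DISK_ONLY)                   // or MEMORY_AND_DISK
val result = mapped.reduce((a, b) => pickShorter(a, b))  // pickShorter is a placeholder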
Hello again.
So I could compute triangle counts when running the code from the spark shell
without workers (with the --driver-memory 15g option), but with workers I get
errors. So I run the spark shell:
./bin/spark-shell --master spark://192.168.0.31:7077 --executor-memory
6900m --driver-memory 15g
and workers
Ok, but what does it mean? I did not change the core files of Spark, so is
it a bug there?
PS: on small datasets (500 Mb) I have no problem.
On 25.06.2015 18:02, Ted Yu yuzhih...@gmail.com wrote:
The assertion failure from TriangleCount.scala corresponds with the
following lines:
You’ll get this issue if you just take the first 2000 lines of that file. The
problem is triangleCount() expects srcId < dstId, which is not the case in the
file (e.g. vertex 28). You can get round this by calling
graph.convertToCanonicalEdges(), which removes bi-directional edges and ensures
Yep, I already found it. So I added 1 line:
val graph = GraphLoader.edgeListFile(sc, ...)
val newgraph = graph.convertToCanonicalEdges()
and could successfully count triangles on newgraph. Next will test it on
bigger (several Gb) networks.
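For reference, a small sketch of counting triangles on the canonicalized graph
(the variable name follows the message above):

// Each triangle is counted once at each of its three vertices.
val triCounts = newgraph.triangleCount().vertices
val totalTriangles = triCounts.map(_._2.toLong).reduce(_ + _) / 3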
I am using Spark 1.3 and 1.4 but haven't seen
Hello!
I am trying to compute the number of triangles with GraphX, but I get a memory
(GC overhead / heap space) error even though the dataset is very small (1Gb).
I run the code in spark-shell on a 16Gb RAM machine (I also tried with 2
workers on separate machines with 8Gb RAM each). So I have 15x more memory than the
The assertion failure from TriangleCount.scala corresponds with the
following lines:
g.outerJoinVertices(counters) {
  (vid, _, optCounter: Option[Int]) =>
    val dblCount = optCounter.getOrElse(0)
    // double count should be even (divisible by two)
    assert((dblCount & 1) == 0)
Hello All,
I have a Spark job that throws java.lang.OutOfMemoryError: GC overhead
limit exceeded.
The job is trying to process a file of size 4.5G.
I've tried the following spark configuration:
--num-executors 6 --executor-memory 6G --executor-cores 6 --driver-memory 3G
I tried increasing cores and executors, which sometimes works
is the following:
val people = sqlContext.parquetFile("/data.parquet")
val res = people.groupBy("name", "date")
  .agg(sum("power"), sum("supply"))
  .take(10)
System.out.println(res)
The dataset consists of 16 billion entries.
The error I get is java.lang.OutOfMemoryError: GC overhead limit exceeded
My configuration is:
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.driver.memory 6g
spark.executor.extraJavaOptions -XX:+UseCompressedOops
spark.shuffle.manager sort
Any idea how I can work around this?
Thanks a lot
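One knob worth trying for this particular query (an illustrative sketch,
reusing the names from the message above): raise the shuffle parallelism so
each aggregation task handles a smaller slice of the 16 billion rows.

import org.apache.spark.sql.functions.sum

// Default is 200 shuffle partitions for DataFrame aggregations; 2000 is an
// illustrative value and should be tuned so each task's slice stays small.
sqlContext.setConf("spark.sql.shuffle.partitions", "2000")

val res = people.groupBy("name", "date")
  .agg(sum("power"), sum("supply"))
  .take(10)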
Sab, not sure what you require for the similarity metric or your use case but
you can also look at spark-rowsimilarity or spark-itemsimilarity (column-wise)
here http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html
Thanks Debasish, Reza and Pat. In my case, I am doing an SVD and then doing
the similarity computation. So a rowSimilarities() would be a good fit;
looking forward to it.
In the meanwhile I will try to see if I can further limit the number of
similarities computed in some other fashion or
Hi Sab,
The current method is optimized for having many rows and few columns. In
your case it is exactly the opposite. We are working on your case, tracked
by this JIRA: https://issues.apache.org/jira/browse/SPARK-4823
Your case is very common, so I will put some time into building it.
In the
the BlockManager doesn't respond within the heart beat interval. In the
second attempt I am seeing a GC overhead limit exceeded error. And it is
almost always in the RowMatrix.columnSimilaritiesDIMSUM -
mapPartitionsWithIndex (line 570)
java.lang.OutOfMemoryError: GC overhead limit exceeded
Sorry, I actually meant 30 x 1 matrix (missed a 0)
Regards
Sab
Hi Sab,
In this dense case, the output will contain 10K x 10K entries, i.e. 100
million doubles, which doesn't fit in 1GB with overheads.
For a dense matrix, similarColumns() scales quadratically in the number of
columns, so you need more memory across the cluster.
Reza
On Sun, Mar 1, 2015
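For context, a sketch of the DIMSUM sampling route discussed here, using the
threshold variant of column similarities in MLlib (the matrix contents are
hypothetical; in practice the rows would come from the SVD output mentioned
above):

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// Hypothetical rows, just to make the snippet self-contained.
val rows = sc.parallelize(Seq(
  Vectors.dense(1.0, 0.0, 2.0),
  Vectors.dense(0.0, 3.0, 4.0)))
val mat = new RowMatrix(rows)

// With a threshold, DIMSUM sampling drops column pairs whose similarity is
// likely below it, bounding the shuffle instead of emitting every pair.
val approxSims = mat.columnSimilarities(threshold = 0.1)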
Hi Reza
I see that ((int, int), double) pairs are generated for any combination
that meets the criteria controlled by the threshold. But assuming a simple
1x10K matrix, that means I would need at least 12GB of memory per executor for
the flat map just for these pairs excluding any other overhead.
Column-based similarities work well if the number of columns is moderate (10K,
100K; we actually scaled it to 1.5M columns, but that really stress-tests the
shuffle and you need to tune the shuffle parameters)... You can either use
DIMSUM sampling or come up with your own threshold based on your application that
running on a single node of 15G and giving the driver 1G
and the executor 9G. This is on a single node hadoop. In the first attempt
the BlockManager doesn't respond within the heart beat interval. In the
second attempt I am seeing a GC overhead limit exceeded error. And it is
almost always
at
org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:398)
15/02/19 05:41:06 ERROR executor.Executor: Exception in task 131.0 in stage
51.0 (TID 7259)
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.reflect.Array.newInstance(Array.java:75)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1671)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1345)
it is from within the ALS.trainImplicit() call. Btw, the exception varies
between this GC overhead limit exceeded and Java heap space (which I guess
is just a different outcome of the same problem).
just tried another run and here are the logs (filtered) - note I tried this run
now with reverted spark.shuffle.io.preferDirectBufs (to true), getting again GC
overhead limit exceeded:
=== spark stdout ===
15/02/19 12:08:08 WARN scheduler.TaskSetManager: Lost task 7.0 in stage 18.0
(TID 5329, 192.168.1.93): java.lang.OutOfMemoryError: GC overhead limit
exceeded
at
java.io.ObjectInputStream.defaultReadFields
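Not from the thread, but since the failure is inside ALS.trainImplicit(), one
knob that often helps is the number of blocks the ratings are split into; a
sketch with illustrative values only:

import org.apache.spark.mllib.recommendation.{ALS, Rating}
import org.apache.spark.rdd.RDD

// Sketch only: `ratings` stands for the job's RDD[Rating]. More blocks means
// smaller per-task rating/factor partitions, at the cost of extra shuffle.
def train(ratings: RDD[Rating]) =
  ALS.trainImplicit(ratings, rank = 10, iterations = 10,
    lambda = 0.01, blocks = 200, alpha = 40.0)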
): java.lang.OutOfMemoryError: GC overhead limit exceeded
15/02/12 08:05:06 WARN TaskSetManager: Lost task 0.0 in stage 31.1 (TID 48,
compute-0-2.wright): FetchFailed(BlockManagerId(0, wright.cs.umass.edu, 60837),
shuffleId=0, mapId=1, reduceId=1, message=
org.apache.spark.shuffle.FetchFailedException: Failed
Hi Folks,
I'm running a five-step path-following algorithm on a movie graph with 120K
vertices and 400K edges. The graph has vertices for actors, directors, movies,
users, and user ratings, and my Scala code is walking the path rating -> movie
-> rating -> user -> rating. There are 75K rating nodes
kras...@gmail.com
Cc: Sandy Ryza sandy.r...@cloudera.com, user@spark.apache.org
user@spark.apache.org
Subject: Re: java.lang.OutOfMemoryError: GC overhead limit exceeded
I have yarn configured with yarn.nodemanager.vmem-check-enabled=false and
yarn.nodemanager.pmem-check-enabled=false to avoid
Hi Antony,
If you look in the YARN NodeManager logs, do you see that it's killing the
executors? Or are they crashing for a different reason?
-Sandy
On Tue, Jan 27, 2015 at 12:43 PM, Antony Mayi antonym...@yahoo.com.invalid
wrote:
Hi,
I am using spark.yarn.executor.memoryOverhead=8192, yet I'm getting executors
crashing with this error.
Does that mean I genuinely don't have enough RAM, or is this a matter of
config tuning?
Other config options used: spark.storage.memoryFraction=0.3
SPARK_EXECUTOR_MEMORY=14G
running spark 1.2.0 as
: Re: java.lang.OutOfMemoryError: GC overhead limit exceeded
Since it's an executor running OOM it doesn't look like a container being
killed by YARN to me. As a starting point, can you repartition your job
into smaller tasks?
-Sven
On Tue, Jan 27, 2015 at 2:34 PM, Guru Medasani gdm...@outlook.com
17:02:53 ERROR executor.Executor: Exception in task 21.0 in
stage 12.0 (TID 1312)
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.Integer.valueOf(Integer.java:642)
at scala.runtime.BoxesRunTime.boxToInteger(BoxesRunTime.java:70
Increasing the driver memory resolved this issue. Thanks to Nick for the
hint. Here is how I am starting the shell: spark-shell --driver-memory 4g
--driver-cores 4 --master local
spark.default.parallelism 24
Any help is appreciated. The stack trace of the error is given below.
- Ranga
== Stack trace ==
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOf(Arrays.java:3332
Hi,
After reading some previous posts about this issue, I have increased the
java heap space to -Xms64g -Xmx64g, but still met the
java.lang.OutOfMemoryError: GC overhead limit exceeded error. Does anyone
have other suggestions?
I am reading 200 GB of data and my total memory is 120 GB, so I use
MEMORY_AND_DISK_SER and kryo serialization.
Thanks a lot!
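A sketch of that setup (the path and app name are placeholders): storing
cached partitions as serialized bytes leaves far fewer live objects for the
GC to trace, and partitions that don't fit in memory spill to disk.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Sketch only; the app name and input path are placeholders.
val conf = new SparkConf()
  .setAppName("large-input-job")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
val sc = new SparkContext(conf)

// Serialized, disk-backed caching: less GC pressure than caching deserialized
// objects, and overflow goes to disk instead of failing the executor.
val data = sc.textFile("hdfs:///path/to/input")
data.persist(StorageLevel.MEMORY_AND_DISK_SER)
data.count()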
Hi Liquan,
I have 8 workers, each with 15.7GB memory.
What you said makes sense, but if I don't increase heap space, it keeps
telling me GC overhead limit exceeded.
Thanks!
Anny
On Wed, Oct 1, 2014 at 1:41 PM, Liquan Pei [via Apache Spark User List]
ml-node+s1001560n1554...@n3.nabble.com
Hi guys,
My Spark Streaming application has this java.lang.OutOfMemoryError: GC
overhead limit exceeded error in the SparkStreaming driver program. I have
done the following to debug it:
1. Increased the driver memory from 1GB to 2GB; the error then came after 22
hrs. When the memory was 1GB
Thank you, Ankur! :)
But how do I assign the storage level to a new vertex RDD that is mapped from
an existing vertex RDD, e.g.
val newVertexRDD =
graph.collectNeighborIds(EdgeDirection.Out).map{ case (id: VertexId,
a: Array[VertexId]) => (id, initialHashMap(a)) }
The new one will be combined with
wrote:
I am testing our application (similar to personalised page rank using
Pregel; note that each vertex property needs considerably more space to
store after each new iteration)
[...]
But when we ran it on a larger graph (e.g. LiveJournal), it always ends with
the error GC overhead limit
At 2014-09-03 17:58:09 +0200, Yifan LI iamyifa...@gmail.com wrote:
val graph = GraphLoader.edgeListFile(sc, edgesFile, minEdgePartitions = numPartitions)
  .partitionBy(PartitionStrategy.EdgePartition2D)
  .persist(StorageLevel.MEMORY_AND_DISK)
Error: java.lang.UnsupportedOperationException: Cannot
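For what it's worth, a sketch assuming Spark 1.1 or later, where
GraphLoader.edgeListFile accepts storage levels directly; GraphLoader caches
with MEMORY_ONLY by default, which is why a later persist() with a different
level can fail (edgesFile and numPartitions are the values from the message
above):

import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}
import org.apache.spark.storage.StorageLevel

// Pass the desired storage levels at load time instead of re-persisting the
// already-cached graph.
val graph = GraphLoader
  .edgeListFile(sc, edgesFile, false, numPartitions,
    StorageLevel.MEMORY_AND_DISK, StorageLevel.MEMORY_AND_DISK)
  .partitionBy(PartitionStrategy.EdgePartition2D)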