Re: Why Spark having OutOfMemory Exception?

2016-04-21 Thread Zhan Zhang
The data may not be large, but the driver needs to do a lot of bookkeeping. In your case, it is possible the driver control plane takes too much memory. I think you can find a Java developer to look at the core dump. Otherwise, it is hard to tell exactly which part is using all the memory.
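For context on the "driver control plane" point: when many small jobs are submitted in a loop, the driver retains per-job and per-stage metadata (including UI history). A minimal sketch of trimming that retained state with standard Spark settings follows; the application name and the retention values are illustrative assumptions, not details from this thread.

    import org.apache.spark.{SparkConf, SparkContext}

    // Illustrative settings to shrink driver-side bookkeeping when many
    // short jobs are submitted in a loop; the values are assumptions.
    val conf = new SparkConf()
      .setAppName("aggregation-loop")
      .set("spark.ui.retainedJobs", "100")    // keep fewer finished jobs in UI history
      .set("spark.ui.retainedStages", "100")  // likewise for stage metadata

    val sc = new SparkContext(conf)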

Re: Re: Re: Re: Why Spark having OutOfMemory Exception?

2016-04-20 Thread 李明伟
Hi, the input data size is less than 10 MB. The task result size should be smaller, I think, because I am doing aggregation on the data. At 2016-04-20 16:18:31, "Jeff Zhang" wrote: Do you mean the input data size is 10 MB, or the task result size? >>> But my way is to set up

Re: Re: Re: Why Spark having OutOfMemory Exception?

2016-04-20 Thread Jeff Zhang
Do you mean the input data size is 10 MB, or the task result size? >>> But my way is to set up a forever loop to handle continued incoming data. Not sure if it is the right way to use Spark. Not sure what this means; do you use Spark Streaming, or do you run batch jobs in the forever loop? On Wed, Apr
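For reference, the Spark Streaming alternative Jeff is alluding to looks roughly like the sketch below. The data source (a socket), the batch interval, and the word-count style aggregation are assumptions for illustration, not details from the thread.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Rough sketch of handling continuously arriving data with Spark Streaming
    // instead of a hand-rolled forever loop. Source and interval are assumed.
    val conf = new SparkConf().setAppName("streaming-aggregation")
    val ssc = new StreamingContext(conf, Seconds(60))  // one micro-batch per minute

    // Example source: a text stream from a socket (replace with the real source).
    val lines = ssc.socketTextStream("localhost", 9999)

    // Per-batch aggregation; results stay distributed instead of being
    // pulled back to the driver.
    val counts = lines.flatMap(_.split("\\s+")).map((_, 1L)).reduceByKey(_ + _)
    counts.print()

    ssc.start()
    ssc.awaitTermination()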

Re: Re: Re: Why Spark having OutOfMemory Exception?

2016-04-20 Thread 李明伟
Hi Jeff, the total size of my data is less than 10 MB. I already set the driver memory to 4 GB. On 2016-04-20 13:42:25, "Jeff Zhang" wrote: Seems it is an OOM on the driver side when fetching task results. You can try to increase spark.driver.memory and

Re: Re: Why Spark having OutOfMemory Exception?

2016-04-19 Thread Jeff Zhang
Seems it is an OOM on the driver side when fetching task results. You can try to increase spark.driver.memory and spark.driver.maxResultSize. On Tue, Apr 19, 2016 at 4:06 PM, 李明伟 wrote: > Hi Zhan Zhang > > Please see the exception trace below. It is saying some GC overhead limit >
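A minimal sketch of the two settings Jeff mentions. Note that spark.driver.memory only takes effect if it is supplied before the driver JVM starts (for example via spark-submit or spark-defaults.conf), while spark.driver.maxResultSize can also be raised in the application's SparkConf. The 4g/2g values are illustrative assumptions.

    import org.apache.spark.{SparkConf, SparkContext}

    // Illustrative values only. spark.driver.memory must be supplied before
    // the driver JVM launches, e.g.:
    //   spark-submit --driver-memory 4g --conf spark.driver.maxResultSize=2g ...
    // Setting it in SparkConf below would have no effect in client mode,
    // because the driver JVM is already running by then.
    val conf = new SparkConf()
      .setAppName("oom-investigation")
      .set("spark.driver.maxResultSize", "2g") // cap on total serialized task results per action

    val sc = new SparkContext(conf)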

Re: Why Spark having OutOfMemory Exception?

2016-04-18 Thread Zhan Zhang
What kind of OOM? Driver or executor side? You can use a core dump to find what caused the OOM. Thanks. Zhan Zhang On Apr 18, 2016, at 9:44 PM, 李明伟 wrote: Hi Samaga, thanks very much for your reply and sorry for the delayed reply. Cassandra or Hive
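One way to capture the evidence Zhan is asking for is to have each JVM write a heap dump when it hits an OOM; whether the dump appears on the driver host or an executor host also tells you which side failed. A hedged sketch using standard HotSpot flags through Spark's extraJavaOptions settings; the dump path is an assumption.

    import org.apache.spark.SparkConf

    // Standard HotSpot flags; the dump directory is an assumption for illustration.
    val dumpOpts = "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/spark-heap-dumps"

    val conf = new SparkConf()
      .setAppName("oom-diagnosis")
      // Executor JVMs pick this up from the conf; for the driver in client
      // mode, pass --driver-java-options to spark-submit instead.
      .set("spark.executor.extraJavaOptions", dumpOpts)
      .set("spark.driver.extraJavaOptions", dumpOpts)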

RE: Why Spark having OutOfMemory Exception?

2016-04-11 Thread Lohith Samaga M
Hi Kramer, Some options: 1. Store in Cassandra with TTL = 24 hours. When you read the full table, you get the latest 24 hours of data. 2. Store in Hive as an ORC file and use a timestamp field to filter out the old data. 3. Try windowing in Spark or Flink (have not used
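A minimal sketch of option 2 above (ORC plus a timestamp filter) using the DataFrame API. The paths, the "event_time" and "some_key" column names, and the 24-hour window are assumptions for illustration; the sketch uses the SparkSession API (Spark 2.x+), with the HiveContext equivalent applying on the 1.x line current at the time of the thread.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, expr}

    val spark = SparkSession.builder().appName("orc-window").getOrCreate()

    // Write side: append each batch / loop iteration to an ORC directory.
    // eventsDf is assumed to carry a timestamp column named "event_time".
    // eventsDf.write.mode("append").orc("/data/events_orc")

    // Read side: only rows from the last 24 hours are processed.
    val recent = spark.read.orc("/data/events_orc")
      .where(col("event_time") >= expr("current_timestamp() - INTERVAL 24 HOURS"))

    recent.groupBy(col("some_key")).count().show()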