The data may not be large, but the driver needs to do a lot of bookkeeping. In
your case, it is possible that the driver control plane is taking too much memory.
I think you could find a Java developer to look at the core dump. Otherwise, it is
hard to tell exactly which part is using all the memory.
Hi,
The input data size is less than 10M. The task result size should be even
smaller, I think, because I am doing aggregation on the data.
At 2016-04-20 16:18:31, "Jeff Zhang" wrote:
Do you mean the input data size is 10M, or the task result size?
>>> But my way is to set up a forever loop to handle continuously incoming data. Not
sure if it is the right way to use Spark
Not sure what this means. Do you use Spark Streaming, or are you running batch jobs in
the forever loop?
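Not part of the original thread, but the forever-loop pattern under discussion can be sketched in plain Python (the `read_next_batch` helper is a hypothetical stand-in for whatever feeds the loop). The point is that keeping only a running aggregate bounds driver-side memory, whereas retaining every batch's results in a long-lived driver grows without limit:

```python
import itertools

def read_next_batch(i):
    # Hypothetical stand-in for whatever delivers new data to the loop.
    return [i, i + 1, i + 2]

running_total = 0   # bounded: one number, no matter how long the loop runs
all_results = []    # unbounded: grows every iteration, like caching every
                    # task result in a long-lived driver

# A "forever" loop, capped at 1000 iterations for the demo.
for i in itertools.islice(itertools.count(), 1000):
    batch = read_next_batch(i)
    running_total += sum(batch)   # aggregate, then let the batch go
    all_results.append(batch)     # anti-pattern: retains everything

print(running_total)     # 1501500
print(len(all_results))  # 1000
```

Spark Streaming (or Structured Streaming in later versions) handles this micro-batch loop for you and manages state eviction, which is why it is usually preferred over a hand-rolled forever loop.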
On Wed, Apr
Hi Jeff
The total size of my data is less than 10M. I have already set the driver memory
to 4GB.
On 2016-04-20 13:42:25, "Jeff Zhang" wrote:
It seems to be an OOM on the driver side when fetching task results.
You can try to increase spark.driver.memory and spark.driver.maxResultSize.
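For reference, both settings can be passed on the spark-submit command line (the class and jar names below are placeholders, and 4g/2g are example values). Note that the driver heap size must be fixed before the driver JVM starts, so `--driver-memory` on the command line is the reliable way to set it in client mode:

```shell
# Placeholder class and jar names; substitute your own application.
spark-submit \
  --driver-memory 4g \
  --conf spark.driver.maxResultSize=2g \
  --class com.example.MyApp \
  myapp.jar
```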
On Tue, Apr 19, 2016 at 4:06 PM, 李明伟 wrote:
> Hi Zhan Zhang
>
>
> Please see the exception trace below. It is saying some GC overhead limit
>
What kind of OOM, driver or executor side? You can use a core dump to find what
caused the OOM.
Thanks.
Zhan Zhang
On Apr 18, 2016, at 9:44 PM, 李明伟
> wrote:
Hi Samaga
Thanks very much for your reply, and sorry for the delayed reply.
Cassandra or Hive
Hi Kramer,
Some options:
1. Store in Cassandra with TTL = 24 hours. When you read the full
table, you get only the latest 24 hours of data.
2. Store in Hive as an ORC file and use a timestamp field to filter out the
old data.
3. Try windowing in Spark or Flink (have not used