Re: Out of memory Error when using Collection Accumulator Spark 2.2

2018-02-26 Thread Naresh Goud
What is your driver memory?
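
If it is still at the default (1g in Spark 2.x), raising it at submit time is the first thing I would try. A rough sketch, with illustrative sizes and a placeholder class/jar name:

# Sizes are illustrative only; tune them for your data volume.
# --driver-memory sizes the driver JVM heap (it must be set at launch;
# setting spark.driver.memory from application code is too late in
# client mode). spark.driver.maxResultSize caps the total serialized
# task results pulled back to the driver, and accumulator updates
# travel back with task results.
spark-submit \
  --driver-memory 8g \
  --conf spark.driver.maxResultSize=4g \
  --class com.example.YourApp your-app.jar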

Thanks,
Naresh
www.linkedin.com/in/naresh-dulam
http://hadoopandspark.blogspot.com/




Out of memory Error when using Collection Accumulator Spark 2.2

2018-02-26 Thread Patrick
Hi,

We are getting an OOM error while accumulating the results from each
worker. To avoid collecting data onto the driver node, we used an
accumulator instead, as in the code snippet below.

Is there any Spark config to tune the accumulator, or am I going about
collecting this huge data set the wrong way?

  // One pipe-delimited String per input row.
  CollectionAccumulator<String> accumulate;
  Dataset<Row> bin;

// Executor side: append each row's fields to the accumulator.
bin.foreach((ForeachFunction<Row>) row -> {
  accumulate.add(row.get(0) + "|" + row.get(1) + "|" + row.get(2));
});

// Driver side: unpack every accumulated entry into a PayloadBin.
accumulate.value().forEach(element -> {
  String[] arr = element.split("\\|");
  String count = arr[2];
  double percentage =
      (total == 0.0) ? 0.0 : (Double.valueOf(count) / total);
  PayloadBin payload = new PayloadBin(arr[0],
      arr[1], 0, Long.valueOf(count), percentage, sortBy, sortOrder);
  binArray.add(payload);
});
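
For comparison, an aggregation-first version we are considering would keep
the reduction on the executors and bring back only one row per bin. A rough
sketch ("col0" and "col1" are placeholders for our real grouping columns):

// Sketch only: "col0" and "col1" stand in for the real grouping columns.
// The raw rows never leave the executors; only the per-bin counts do.
Dataset<Row> counts = bin.groupBy("col0", "col1").count();

for (Row row : counts.collectAsList()) {
  long count = row.getLong(2);  // the "count" column appended by count()
  double percentage = (total == 0.0) ? 0.0 : (count / total);
  binArray.add(new PayloadBin(row.getString(0), row.getString(1),
      0, count, percentage, sortBy, sortOrder));
}

With the accumulator version, the job keeps dying with: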


18/02/21 17:35:23 INFO storage.BlockManagerInfo: Added taskresult_5050 in memory on rhlhddfrd225.fairisaac.com:41640 (size: 3.7 MB, free: 8.3 GB)
18/02/21 17:35:24 INFO storage.BlockManagerInfo: Removed taskresult_5034 on rhlhddfrd218.fairisaac.com:46584 in memory (size: 3.7 MB, free: 8.4 GB)
18/02/21 17:35:25 INFO scheduler.TaskSetManager: Finished task 59.0 in stage 20.0 (TID 5034) in 9908 ms on rhlhddfrd218.fairisaac.com (executor 92) (14/200)

Exception in thread "dag-scheduler-event-loop" java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3181)
        at java.util.ArrayList.toArray(ArrayList.java:376)
        at java.util.Collections$SynchronizedCollection.toArray(Collections.java:2024)
        at java.util.ArrayList.<init>(ArrayList.java:177)
        at org.apache.spark.util.CollectionAccumulator.value(AccumulatorV2.scala:470)