Re: Driver OutOfMemoryError in MapOutputTracker$.serializeMapStatuses for 40 TB shuffle.

2019-11-12 Thread Jacob Lynn
Thanks for the pointer, Vadim. However, I just tried it with Spark 2.4 and get the same failure. (I was previously testing with 2.2 and/or 2.3.) And I don't see this particular issue referred to there. The ticket that Harel commented on indeed appears to be the most similar one to this issue:

Re: Driver OutOfMemoryError in MapOutputTracker$.serializeMapStatuses for 40 TB shuffle.

2019-11-11 Thread Vadim Semenov
There's an umbrella ticket for various 2GB limitations: https://issues.apache.org/jira/browse/SPARK-6235

Re: Driver OutOfMemoryError in MapOutputTracker$.serializeMapStatuses for 40 TB shuffle.

2019-11-08 Thread Jacob Lynn
Sorry for the noise, folks! I understand that reducing the number of partitions works around the issue (at the scale I'm working at, anyway) -- as I mentioned in my initial email -- and I understand the root cause. I'm not looking for advice on how to resolve my issue. I'm just pointing out that

Re: Driver OutOfMemoryError in MapOutputTracker$.serializeMapStatuses for 40 TB shuffle.

2019-11-08 Thread Vadim Semenov
Basically, the driver tracks the map output status of every partition and sends it over to the executors, so what it's trying to do is serialize and compress that map; but because it's so big, the result goes over 2GiB, which is Java's limit on the maximum size of a byte array, so the whole thing fails. The size of the data doesn't matter here
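
An illustration of the JVM limit being referred to: array lengths in Java are Int-indexed, so a single byte array tops out at roughly 2 GiB. The sketch below is illustrative only; the exact failure mode near the limit varies by JVM and heap size.

    object ArrayLimitDemo {
      def main(args: Array[String]): Unit = {
        // Array lengths are Ints, so Int.MaxValue (2^31 - 1) elements is the hard ceiling.
        println(s"Max byte-array length: ${Int.MaxValue} (~2 GiB)")
        try {
          // Right at the theoretical limit; most JVMs refuse this with
          // "Requested array size exceeds VM limit" or a plain heap-space OutOfMemoryError.
          val buf = new Array[Byte](Int.MaxValue)
          println(s"Allocated ${buf.length} bytes")
        } catch {
          case e: OutOfMemoryError => println(s"Allocation failed: $e")
        }
      }
    }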

Re: Driver OutOfMemoryError in MapOutputTracker$.serializeMapStatuses for 40 TB shuffle.

2019-11-08 Thread Jacob Lynn
File system is HDFS. Executors are 2 cores, 14GB RAM. But I don't think either of these relates to the problem -- this is a memory allocation issue on the driver side, and it happens in an intermediate stage that has no HDFS read/write.

Re: Driver OutOfMemoryError in MapOutputTracker$.serializeMapStatuses for 40 TB shuffle.

2019-11-08 Thread Spico Florin
Hi! What file system are you using: EMRFS or HDFS? Also, how much memory are you using for the reducers?

Re: Driver OutOfMemoryError in MapOutputTracker$.serializeMapStatuses for 40 TB shuffle.

2019-11-07 Thread abeboparebop
I ran into the same issue processing 20TB of data, with 200k tasks on both the map and reduce sides. Reducing to 100k tasks each resolved the issue. But this could/would be a major problem in cases where the data is bigger or the computation is heavier, since reducing the number of partitions may
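
A sketch of that workaround, with illustrative paths and operators (the original message doesn't include code): any shuffle operator that accepts a partition count can be capped the same way, and DataFrame/Dataset jobs can be capped globally via spark.sql.shuffle.partitions.

    import org.apache.spark.sql.SparkSession

    object FewerShufflePartitions {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("fewer-shuffle-partitions").getOrCreate()
        val sc = spark.sparkContext

        val fewer = 100000 // down from 200k, as described above

        // RDD API: pass the partition count directly to the shuffle operator.
        val pairs   = sc.textFile("hdfs:///path/to/input").map(line => (line, 1L))
        val reduced = pairs.reduceByKey(_ + _, fewer)
        reduced.saveAsTextFile("hdfs:///path/to/output")

        // DataFrame/Dataset API: cap the number of shuffle partitions globally.
        spark.conf.set("spark.sql.shuffle.partitions", fewer.toString)

        spark.stop()
      }
    }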

Re: Driver OutOfMemoryError in MapOutputTracker$.serializeMapStatuses for 40 TB shuffle.

2018-09-07 Thread Harel Gliksman
I understand the error is because the number of partitions is very high, yet when processing 40 TB (and this number is expected to grow) this number seems reasonable: 40TB / 300,000 gives a partition size of ~130MB (assuming the data is evenly distributed).
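
The arithmetic behind the ~130MB figure (treating 40 TB as 40 * 10^12 bytes):

    object PartitionSizeEstimate {
      def main(args: Array[String]): Unit = {
        val totalBytes    = 40e12     // ~40 TB of shuffle data
        val numPartitions = 300000
        val perPartition  = totalBytes / numPartitions
        println(f"${perPartition / 1e6}%.0f MB per partition") // ~133 MB
      }
    }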

Re: Driver OutOfMemoryError in MapOutputTracker$.serializeMapStatuses for 40 TB shuffle.

2018-09-07 Thread Vadim Semenov
You have too many partitions, so when the driver tries to gather the status of all map outputs and send it back to the executors, it chokes on the size of the structure that needs to be GZipped, and since it's bigger than 2GiB, it produces the OOM.
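
A rough order-of-magnitude check of why the structure gets so large. This assumes about one bit of bookkeeping per (map task, reduce partition) pair, which is a simplification of Spark's actual MapStatus encoding, but it suggests why even the compressed form can exceed the limit:

    object MapStatusEstimate {
      def main(args: Array[String]): Unit = {
        val mapTasks    = 300000L   // map-side tasks in the failing job
        val reduceTasks = 300000L   // reduce-side shuffle partitions
        // Assumption: ~1 bit of per-block bookkeeping per (map, reduce) pair.
        val bits  = mapTasks * reduceTasks
        val bytes = bits / 8
        println(f"~${bytes / 1e9}%.1f GB before compression") // ~11.3 GB
        // Even after GZip, a structure of this size can stay well above the
        // 2 GiB ceiling of a single Java byte array.
      }
    }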

Driver OutOfMemoryError in MapOutputTracker$.serializeMapStatuses for 40 TB shuffle.

2018-09-07 Thread Harel Gliksman
Hi, We are running a Spark (2.3.1) job on an EMR cluster with 500 r3.2xlarge nodes (60 GB RAM, 8 vcores, 160 GB SSD). Driver memory is set to 25GB. It processes ~40 TB of data using aggregateByKey, in which we specify numPartitions = 300,000. Map-side tasks succeed, but reduce-side tasks all fail. We
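
A minimal sketch of what such a job might look like. The input path, key/value types, and aggregation logic below are assumptions for illustration, not taken from the report; only the numPartitions = 300,000 setting is.

    import org.apache.spark.sql.SparkSession

    object ShuffleSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("aggregateByKey-sketch").getOrCreate()
        val sc = spark.sparkContext

        // Hypothetical input: a pair RDD keyed on the first tab-separated field.
        val pairs = sc.textFile("hdfs:///path/to/input")
          .map(line => (line.split('\t')(0), 1L))

        // 300,000 reduce partitions, as in the report. Each shuffle partition gets a
        // map-output status entry that the driver must track and later serialize.
        val aggregated = pairs.aggregateByKey(0L, numPartitions = 300000)(
          (acc, v) => acc + v, // fold a value into the per-partition accumulator
          (a, b) => a + b      // merge accumulators from different partitions
        )

        aggregated.saveAsTextFile("hdfs:///path/to/output")
        spark.stop()
      }
    }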