Re: Driver OutOfMemoryError in MapOutputTracker$.serializeMapStatuses for 40 TB shuffle.

2019-11-08 Thread Jacob Lynn
Sorry for the noise, folks! I understand that reducing the number of partitions works around the issue (at the scale I'm working at, anyway) -- as I mentioned in my initial email -- and I understand the root cause. I'm not looking for advice on how to resolve my issue. I'm just pointing out that
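A minimal sketch of the partition-reduction workaround being discussed, assuming a SQL/DataFrame job; the setting name is real Spark configuration, but the value shown is illustrative, taken from the task counts quoted later in this thread:

    // Keep shuffle partition counts low enough that the driver's
    // serialized map-status blob stays under Java's ~2 GiB array limit.
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("large-shuffle")
      .config("spark.sql.shuffle.partitions", "100000") // illustrative value
      .getOrCreate()

    // For RDD jobs, the equivalent knob is the numPartitions argument
    // of wide transformations, e.g. rdd.reduceByKey(_ + _, 100000)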

Re: How to use spark-on-k8s pod template?

2019-11-08 Thread David Mitchell
Are you using Spark 2.3 or above? See the documentation: https://spark.apache.org/docs/latest/running-on-kubernetes.html It looks like you do not need: --conf spark.kubernetes.driver.podTemplateFile='/spark-pod-template.yaml' \ --conf
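A minimal sketch of wiring a pod template into a session, under these assumptions: Spark 3.0+ (pod template support arrived after 2.3, which only marks the start of Kubernetes support), and the API-server URL, image name, and template path are placeholders. The same settings can equally be passed as --conf flags to spark-submit, as in the quoted command:

    // Hypothetical values throughout; the template file itself is
    // ordinary Kubernetes pod YAML that Spark merges into the driver
    // and executor pods it builds.
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .master("k8s://https://example-apiserver:443")
      .config("spark.kubernetes.container.image", "my-spark:latest")
      .config("spark.kubernetes.driver.podTemplateFile", "/spark-pod-template.yaml")
      .config("spark.kubernetes.executor.podTemplateFile", "/spark-pod-template.yaml")
      .getOrCreate()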

Re: Build customized resource manager

2019-11-08 Thread Tom Graves
I don't know if it all works, but some work was done to make the cluster manager pluggable; see SPARK-13904. Tom On Wednesday, November 6, 2019, 07:22:59 PM CST, Klaus Ma wrote: Any suggestions? - Klaus On Mon, Nov 4, 2019 at 5:04 PM Klaus Ma wrote: Hi team, AFAIK, we built
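For anyone digging into SPARK-13904: the hook it added is the ExternalClusterManager trait, discovered through java.util.ServiceLoader (a META-INF/services file named org.apache.spark.scheduler.ExternalClusterManager listing your implementation class). A rough sketch of the shape, with caveats: the trait is private[spark], so implementations conventionally sit in the org.apache.spark namespace, "myrm" is a made-up master-URL scheme, and the exact signatures should be checked against your Spark version:

    package org.apache.spark.scheduler

    import org.apache.spark.SparkContext

    class MyClusterManager extends ExternalClusterManager {
      // Claim master URLs with a custom scheme, e.g. "myrm://host:port".
      override def canCreate(masterURL: String): Boolean =
        masterURL.startsWith("myrm://")

      override def createTaskScheduler(sc: SparkContext, masterURL: String): TaskScheduler =
        new TaskSchedulerImpl(sc)

      override def createSchedulerBackend(
          sc: SparkContext,
          masterURL: String,
          scheduler: TaskScheduler): SchedulerBackend =
        ??? // a SchedulerBackend that talks to the custom resource manager

      override def initialize(scheduler: TaskScheduler, backend: SchedulerBackend): Unit =
        scheduler.asInstanceOf[TaskSchedulerImpl].initialize(backend)
    }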

Re: Driver OutOfMemoryError in MapOutputTracker$.serializeMapStatuses for 40 TB shuffle.

2019-11-08 Thread Vadim Semenov
Basically, the driver tracks the map output partitions and sends them over to the executors, so what it's trying to do is serialize and compress that map; but because it's so big, it goes over 2GiB, and that's Java's limit on the max size of byte arrays, so the whole thing fails. The size of the data doesn't matter here
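A back-of-the-envelope illustration of why it overflows, using the task counts quoted later in the thread; the real on-wire format is a compressed map-status structure, so treat this as an upper-bound sketch, not the exact encoding:

    // 200k map tasks x 200k reduce partitions = 4e10 tracked block
    // sizes. A Java byte array is Int-indexed, so a single serialized
    // blob caps out at Int.MaxValue bytes (~2 GiB); even a tiny
    // per-entry cost blows past that, regardless of how many TB the
    // shuffle data itself is.
    val mapTasks    = 200000L
    val reduceParts = 200000L
    val entries     = mapTasks * reduceParts // 40,000,000,000
    val maxBytes    = Int.MaxValue.toLong    // 2,147,483,647 (~2 GiB)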

Re: Driver OutOfMemoryError in MapOutputTracker$.serializeMapStatuses for 40 TB shuffle.

2019-11-08 Thread Jacob Lynn
The file system is HDFS. Executors are 2 cores, 14GB RAM. But I don't think either of these relates to the problem -- this is a memory allocation issue on the driver side, and it happens in an intermediate stage that has no HDFS read/write. On Fri, Nov 8, 2019 at 10:01 AM Spico Florin wrote: > Hi! >

Re: Driver OutOfMemoryError in MapOutputTracker$.serializeMapStatuses for 40 TB shuffle.

2019-11-08 Thread Spico Florin
Hi! What file system are you using: EMRFS or HDFS? Also, what memory are you using for the reducer? On Thu, Nov 7, 2019 at 8:37 PM abeboparebop wrote: > I ran into the same issue processing 20TB of data, with 200k tasks on both > the map and reduce sides. Reducing to 100k tasks each resolved