Re: very high maxresults setting (no collect())

Adrian Bridgett Thu, 22 Sep 2016 00:54:07 -0700

Hi Michael,

No Spark upgrade; we've been changing some of our data pipelines, so the data volumes have probably been getting a bit larger. Just in the last few weeks we've seen quite a few jobs needing a larger maxResultSize. Some jobs have gone from "fine with 1GB default" to 3GB. Wondering what besides a collect could cause this (as there's certainly not an explicit collect()).
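As a rough illustration of why the limit can trip without a collect(): spark.driver.maxResultSize caps the total serialized size of the task results sent back to the driver for one action, so many tasks each returning a small result can still add up. The sketch below is plain Python arithmetic, not Spark code; the helper names, the task count and the per-task size are all made-up assumptions, not measurements from these jobs.

```python
# Hypothetical helpers for back-of-the-envelope arithmetic; not part of Spark.
MIB = 1024 ** 2
GIB = 1024 ** 3


def total_result_bytes(num_tasks, bytes_per_task):
    """Total size of results returned to the driver, assuming equal-sized tasks."""
    return num_tasks * bytes_per_task


def exceeds_limit(num_tasks, bytes_per_task, limit_bytes):
    """True if the summed task results would be larger than the configured limit."""
    return total_result_bytes(num_tasks, bytes_per_task) > limit_bytes


if __name__ == "__main__":
    # Made-up numbers: 20,000 tasks, each returning about 100 KiB to the driver.
    tasks, per_task = 20_000, 100 * 1024
    print(total_result_bytes(tasks, per_task) / MIB, "MiB in total")
    print("over a 1 GiB limit:", exceeds_limit(tasks, per_task, 1 * GIB))
    print("over a 3 GiB limit:", exceeds_limit(tasks, per_task, 3 * GIB))
```

With these assumed numbers the total sits between the 1GB default and a 3GB setting, which would be consistent with a job that used to fit and no longer does once the task count or per-task result size grows.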

Mesos, parquet source data, a broadcast of a small table earlier which is joined, then just a few aggregations, select, coalesce and spark-csv write. The executors go along nicely (as does the driver) and then we start to hit memory pressure on the driver in the output loop and the job grinds to a crawl (we eventually have to kill it and restart with more memory).


Adrian

