Hi Michael,
No Spark upgrade, but we've been changing some of our data pipelines, so
the data volumes have probably been growing. Just in the last few weeks
quite a few jobs have needed a larger spark.driver.maxResultSize; some
have gone from fine with the 1 GB default to needing 3 GB. I'm wondering
what, besides a collect, could cause this, as there's certainly no
explicit collect() in the code.
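For context, this is roughly how we've been bumping the limit when a job hits it (the jar name is a placeholder; both config keys are standard Spark settings):

```shell
# Raise the cap on total serialized results returned to the driver.
# Default for spark.driver.maxResultSize is 1g; 0 means unlimited,
# which just moves the failure to a driver OOM instead.
spark-submit \
  --conf spark.driver.maxResultSize=3g \
  --conf spark.driver.memory=6g \
  our-job.jar
```

The same keys can also go in spark-defaults.conf if we wanted it cluster-wide, but so far we've been doing it per job.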
We're on Mesos, reading Parquet source data; there's a broadcast of a
small table early on, which is joined, then just a few aggregations, a
select, a coalesce, and a spark-csv write. The executors hum along
nicely (as does the driver), then we start to hit memory pressure on the
driver during the output stage and the job grinds to a crawl; we
eventually have to kill it and restart with more memory.
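For reference, the job is roughly this shape (paths, table and column names, and the aggregation are placeholders, not our real code; this is the Spark 1.x DataFrame API with the spark-csv package):

```scala
import org.apache.spark.sql.functions.{broadcast, sum}

// Small dimension table, hinted for a broadcast join.
val small = sqlContext.read.parquet("hdfs:///dims/small_table")
val facts = sqlContext.read.parquet("hdfs:///facts/events")

val out = facts
  .join(broadcast(small), Seq("key"))  // broadcast of the small table
  .groupBy("key")
  .agg(sum("value").as("total"))       // a few aggregations like this
  .select("key", "total")
  .coalesce(16)                        // shrink partition count for output

// spark-csv write; this is where the driver starts to struggle.
out.write
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .save("hdfs:///out/report")
```

Nothing in there collects to the driver explicitly, which is why the growing maxResultSize requirement surprised us.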
Adrian
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org