Have you tried shuffle compression?
spark.shuffle.compress (true|false)
If your filesystem is capable of it, I’ve also noticed that file consolidation
helps disk usage a bit.
spark.shuffle.consolidateFiles (true|false)
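
A minimal sketch of how the two settings above might look, assuming you set them cluster-wide in conf/spark-defaults.conf; the values shown are only an illustration, and spark.shuffle.consolidateFiles applies to the Spark 1.x releases current at the time of this thread:

```properties
# conf/spark-defaults.conf (illustrative values, not a recommendation)
# Compress map output files written during a shuffle.
spark.shuffle.compress          true
# Consolidate intermediate shuffle files to reduce the number of files on disk.
spark.shuffle.consolidateFiles  true
```

The same keys can also be passed per job with `--conf key=value` on spark-submit instead of editing the defaults file.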
Steve
On Jun 24, 2015, at 3:27 PM, Ulanov, Alexander <alexander.ula...@hp.com> wrote:
Did you happen to have a look at this: https://github.com/abashev/vfs-s3
Thanks
Best Regards
On Tue, May 12, 2015 at 11:33 PM, Stephen Carman <scar...@coldlight.com> wrote:
We have a small Mesos cluster, and the slaves need to have a VFS set up on them
so that they can pull down the data they need from S3 when Spark runs.
There doesn’t seem to be any obvious guidance online on how to do this, or how
to accomplish it easily. Does anyone have some best practices or so
I think as long as the two frameworks follow the same paradigm for how their
interfaces work, it’s fine to have 2 competing frameworks. This way the
frameworks have some motivation to be the best at what they do rather than
being the only choice whether you like it or not. They also seem to have