We have a series of Spark jobs which run in succession over various cached
datasets, do small groupings and transforms, and then call
saveAsSequenceFile() on them.
Each call to saveAsSequenceFile() appears to have done its work: the
task says it completed in xxx.x seconds, but then it pauses.
Not quite sure, but it could be a GC pause if you are holding too many
objects in memory. You can check the tuning guide at
http://spark.apache.org/docs/1.2.0/tuning.html if you haven't
already been through it.
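
One way to check whether GC is behind the pause is to turn on GC logging
for the executors, which the tuning guide describes. A minimal sketch,
assuming a JDK 8 era JVM (these flags changed in later JDKs); the object
name and app name are made up for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical driver program; only the extraJavaOptions setting matters here.
object GcLoggingExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("gc-logging-example") // hypothetical app name
      // Ask each executor JVM to print a line for every garbage collection.
      .set("spark.executor.extraJavaOptions",
        "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps")

    val sc = new SparkContext(conf)
    try {
      // ... run the existing jobs and saveAsSequenceFile() calls here ...
    } finally {
      sc.stop()
    }
  }
}
```

The GC messages should then show up in the executors' stdout logs on the
workers. If long collections line up with the pause after each save, the
memory tuning section of the guide is the place to look next.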
Thanks
Best Regards
On Sat, Jan 31, 2015 at 7:22 AM, Corey Nolet cjno...@gmail.com