Re: Inconsistent behavior of randomSplit in YARN mode

2015-12-28 Thread Gaurav Kumar
produces consistent results.
On Mon, Dec 28, 2015 at 3:04 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> bq. the train and test have overlap in the numbers being outputted
>
> Can the call to repa

Inconsistent behavior of randomSplit in YARN mode

2015-12-27 Thread Gaurav Kumar
expect consistent behavior since repartition is not evaluated again and again.

Re: spark 1.4 GC issue

2015-11-13 Thread Gaurav Kumar
Please have a look at the tuning guide (http://spark.apache.org/docs/1.4.0/tuning.html). You may also want to use the latest build of JDK 7/8 and use G1GC instead. I saw considerable reductions in GC time just by doing that. The rest of the tuning parameters are better explained in the link above.
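
For reference, switching executors and the driver to G1GC could look roughly like the fragment below. This is only a sketch: the message does not say how the collector was changed, so the file location and the choice to set both properties are assumptions. The two property names are standard Spark settings and -XX:+UseG1GC is the standard HotSpot flag.

```
# conf/spark-defaults.conf (assumed location; the same values can be passed with --conf)
spark.executor.extraJavaOptions  -XX:+UseG1GC
spark.driver.extraJavaOptions    -XX:+UseG1GC
```

Whether this helps depends on the workload, so it is worth comparing the GC time shown in the Spark UI before and after the change.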

Save GraphX to disk

2015-11-13 Thread Gaurav Kumar
Hi, I was wondering how to save a graph to disk and load it back again. I know how to save vertices and edges to disk and construct the graph from them, but I am not sure whether there is a method to save the graph itself to disk.
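
The approach mentioned in the question (save vertices and edges separately, then rebuild the graph) might look like the sketch below. It is not a built-in "save graph" method. The object name GraphIO, the directory layout, and the attribute types (String for vertices, Int for edges) are hypothetical choices made only for illustration.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Hypothetical helper: persists a graph as two object files and rebuilds it.
object GraphIO {

  // Write the vertex and edge RDDs under the given directory (assumed layout).
  def save(graph: Graph[String, Int], dir: String): Unit = {
    graph.vertices.saveAsObjectFile(s"$dir/vertices")
    graph.edges.saveAsObjectFile(s"$dir/edges")
  }

  // Read both RDDs back and construct the graph from them.
  def load(sc: SparkContext, dir: String): Graph[String, Int] = {
    val vertices: RDD[(VertexId, String)] =
      sc.objectFile[(VertexId, String)](s"$dir/vertices")
    val edges: RDD[Edge[Int]] =
      sc.objectFile[Edge[Int]](s"$dir/edges")
    Graph(vertices, edges)
  }
}
```

Object files use Java serialization, so the attribute types must be serializable and the files are readable only from JVM code. Partitioning is not preserved by this round trip.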