Re: getting different results from same line of code repeated

2015-11-20 Thread Walrus theCat
I'm running into all kinds of problems with Spark 1.5.1 -- does anyone have a version that's working smoothly for them?

On Fri, Nov 20, 2015 at 10:50 AM, Dean Wampler wrote:
> I didn't expect that to fail. I would call it a bug for sure, since it's
> practically useless

Re: getting different results from same line of code repeated

2015-11-20 Thread Ted Yu
Mind trying the 1.5.2 release?

Thanks

On Fri, Nov 20, 2015 at 10:56 AM, Walrus theCat wrote:
> I'm running into all kinds of problems with Spark 1.5.1 -- does anyone
> have a version that's working smoothly for them?
>
> On Fri, Nov 20, 2015 at 10:50 AM, Dean Wampler

getting different results from same line of code repeated

2015-11-18 Thread Walrus theCat
Hi, I'm launching a Spark cluster with the spark-ec2 script and playing around in spark-shell. I'm running the same line of code over and over again, and getting different results, and sometimes exceptions. Towards the end, after I cache the first RDD, it gives me the correct result multiple

Re: getting different results from same line of code repeated

2015-11-18 Thread Dean Wampler
Methods like first() and take(n) are not guaranteed to return the same result in a distributed context, because Spark fetches the data from one or more partitions by running a distributed job over the cluster, with tasks on the nodes where the chosen partitions are located.
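
To make the effect concrete, here is a toy model in plain Python. It is not Spark code and not Spark's actual algorithm. It assumes the variation comes from the RDD being recomputed on every action with an ordering that is not stable between runs, which the thread itself does not confirm. The names ToyRDD and make_unordered_rdd are hypothetical and exist only for this sketch.

```python
import random


class ToyRDD:
    """Toy stand-in for an RDD: partitions are recomputed on every action unless cached."""

    def __init__(self, compute_partitions):
        # compute_partitions: zero-argument function returning a list of partitions
        self._compute = compute_partitions
        self._cached = None

    def cache(self):
        # Materialize the partitions once and reuse them for later actions.
        # (Real Spark caches lazily on the first action; this toy does it eagerly.)
        self._cached = self._compute()
        return self

    def _partitions(self):
        return self._cached if self._cached is not None else self._compute()

    def take(self, n):
        # Scan partitions in order and stop as soon as n elements are collected.
        out = []
        for part in self._partitions():
            for item in part:
                if len(out) == n:
                    return out
                out.append(item)
        return out

    def collect(self):
        return [item for part in self._partitions() for item in part]


def make_unordered_rdd(data, num_partitions, rng):
    """Build a ToyRDD whose element placement changes on every recomputation."""

    def compute():
        items = list(data)
        rng.shuffle(items)  # assumption: models ordering that depends on task timing
        return [items[i::num_partitions] for i in range(num_partitions)]

    return ToyRDD(compute)


rng = random.Random(0)
rdd = make_unordered_rdd(range(100), num_partitions=4, rng=rng)

# Repeat the "same line of code" before and after caching.
uncached = {tuple(rdd.take(3)) for _ in range(20)}
rdd.cache()
cached = {tuple(rdd.take(3)) for _ in range(20)}

print(len(uncached) > 1)  # did take(3) vary while the data was being recomputed?
print(len(cached) == 1)   # did take(3) stay fixed once the data was cached?
```

In this model the full contents of the dataset never change, only which elements happen to come first, so sorting the output of collect() gives the same list every time. That would be consistent with the original report that results settled down after caching the first RDD, though the real cause on the cluster may be different.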