Re: Java 8 vs Scala

2015-07-14 Thread Tristan Blakers
We have had excellent results operating on RDDs using Java 8 with lambdas. It’s slightly more verbose than Scala, but I haven’t found this an issue, and haven’t missed any functionality. The new DataFrame API makes the Spark platform even more language-agnostic. Tristan On 15 July 2015 at
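The Java 8 feature doing the heavy lifting in that comparison is lambda syntax. A minimal plain-Java sketch of the same functional style (no Spark dependency; the data and names here are purely illustrative):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class LambdaStyle {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("spark", "rdd", "lambda");

        // Java 8 lambdas give map/filter chains close to Scala's,
        // at the cost of a little extra type ceremony.
        List<Integer> lengths = words.stream()
                .filter(w -> w.length() > 3)
                .map(String::length)
                .collect(Collectors.toList());

        System.out.println(lengths); // [5, 6]
    }
}
```

Spark's Java RDD API accepts the same lambda shapes (e.g. in `map` and `filter`), which is what makes it workable without Scala.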

Re: JavaPairRDD

2015-05-13 Thread Tristan Blakers
You could use a map() operation, but the easiest way is probably to just call the values() method on the JavaPairRDD&lt;A,B&gt; to get a JavaRDD&lt;B&gt;. See this link: https://www.safaribooksonline.com/library/view/learning-spark/9781449359034/ch04.html Tristan On 13 May 2015 at 23:12, Yasemin Kaya
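The projection that values() performs can be shown without Spark: it is the same as mapping each key/value pair to its value. A plain-Java analogy over Map.Entry (illustrative only; Spark's JavaPairRDD.values() wraps exactly this projection):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ValuesProjection {
    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = Arrays.asList(
                new SimpleEntry<>("a", 1),
                new SimpleEntry<>("b", 2));

        // The map() route: project the value out of each pair by hand.
        List<Integer> viaMap = pairs.stream()
                .map(Map.Entry::getValue)
                .collect(Collectors.toList());

        System.out.println(viaMap); // [1, 2]
    }
}
```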

Re: com.esotericsoftware.kryo.KryoException: java.lang.IndexOutOfBoundsException: Index:

2015-05-06 Thread Tristan Blakers
...@gmail.com wrote: Thanks Tristan for sharing this. Actually this happens when I am reading a csv file of 3.5 GB. best, /Shahab On Tue, May 5, 2015 at 9:15 AM, Tristan Blakers tris...@blackfrog.org wrote: Hi Shahab, I’ve seen exceptions very similar to this (it also manifests

Re: com.esotericsoftware.kryo.KryoException: java.lang.IndexOutOfBoundsException: Index:

2015-05-05 Thread Tristan Blakers
Hi Shahab, I’ve seen exceptions very similar to this (it also manifests as a negative array size exception), and I believe it’s really a bug in Kryo. See this thread:

Re: NegativeArraySizeException when doing joins on skewed data

2015-02-26 Thread Tristan Blakers
. Perhaps we can get kryo to turn off that buffer, or we can at least get it to flush more often.) thanks, Imran On Thu, Feb 26, 2015 at 1:06 AM, Tristan Blakers tris...@blackfrog.org wrote: I get the same exception simply by doing a large broadcast of about 6GB. Note that I’m broadcasting

Re: NegativeArraySizeException when doing joins on skewed data

2015-02-25 Thread Tristan Blakers
I get the same exception simply by doing a large broadcast of about 6GB. Note that I’m broadcasting a small number (~3m) of fat objects. There’s plenty of free RAM. This and related kryo exceptions seem to crop up whenever an object graph of more than a couple of GB gets passed around. at
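A NegativeArraySizeException on multi-GB object graphs is consistent with a byte[]-backed serialization buffer whose int capacity doubles past Integer.MAX_VALUE and wraps negative. A sketch of the arithmetic only (not Kryo's actual code):

```java
public class BufferOverflowSketch {
    public static void main(String[] args) {
        // Java arrays are indexed by int, so ~2GB is a hard ceiling.
        int capacity = 1 << 30;        // a 1 GiB buffer
        int doubled = capacity * 2;    // grow-by-doubling overflows...
        System.out.println(doubled);   // -2147483648
        // ...and new byte[doubled] would then throw NegativeArraySizeException.
    }
}
```

This would explain why the failures cluster around the "couple of GB" mark regardless of available RAM.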

Kryo buffer overflows

2015-01-28 Thread Tristan Blakers
A search shows several historical threads for similar Kryo issues, but none seem to have a definitive solution. Currently using Spark 1.2.0. While collecting/broadcasting/grouping moderately sized data sets (~500MB - 1GB), I regularly see exceptions such as the one below. I’ve tried increasing
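For reference, the knobs usually suggested in these threads are the Kryo buffer sizes. A sketch of the relevant spark-defaults.conf entries, using the Spark 1.2-era key names (later releases renamed them to spark.kryoserializer.buffer and spark.kryoserializer.buffer.max; the values below are illustrative, not recommendations):

```properties
# spark-defaults.conf (Spark 1.2-era key names)
spark.serializer                    org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.mb      64
spark.kryoserializer.buffer.max.mb  512
```

As the thread notes, raising these only helps up to the JVM's ~2GB array limit.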

Incorrect results when calling collect() ?

2014-12-18 Thread Tristan Blakers
Hi, I’m getting some seemingly invalid results when I collect an RDD. This is happening in both Spark 1.1.0 and 1.2.0, using Java 8 on Mac. See the following code snippet: JavaRDD&lt;Thing&gt; rdd = pairRDD.values(); rdd.foreach( e -&gt; System.out.println( "RDD Foreach: " + e ) ); rdd.collect().forEach( e -&gt;

Re: Incorrect results when calling collect() ?

2014-12-18 Thread Tristan Blakers
at 21:25, Sean Owen so...@cloudera.com wrote: It sounds a lot like your values are mutable classes and you are mutating or reusing them somewhere? It might work until you actually try to materialize them all and find many point to the same object. On Thu, Dec 18, 2014 at 10:06 AM, Tristan
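Sean's diagnosis is easy to reproduce without Spark: reuse one mutable holder while building a collection and every element ends up aliasing the same object. An illustrative sketch (a bare int[] stands in for a reused record):

```java
import java.util.ArrayList;
import java.util.List;

public class MutableReusePitfall {
    public static void main(String[] args) {
        List<int[]> collected = new ArrayList<>();
        int[] holder = new int[1];     // one mutable record, reused

        for (int i = 0; i < 3; i++) {
            holder[0] = i;             // mutate in place...
            collected.add(holder);     // ...but store the same reference
        }

        // Every entry now points at the last written value.
        System.out.println(collected.get(0)[0]); // 2
        System.out.println(collected.get(1)[0]); // 2
    }
}
```

Per-partition iteration can mask this, because each element is inspected before the next mutation; materializing everything with collect() exposes it.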

Re: Incorrect results when calling collect() ?

2014-12-18 Thread Tristan Blakers
, Tristan Blakers tris...@blackfrog.org wrote: Suspected the same thing, but because the underlying data classes are deserialised by Avro I think they have to be mutable as you need to provide the no-args constructor with settable fields. Nothing is being cached in my code anywhere
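When the deserializer forces mutable records, the usual workaround (a sketch of the general technique, not necessarily the poster's fix) is a defensive copy of each record before it leaves the iterator, so stored elements no longer alias the reused instance:

```java
import java.util.ArrayList;
import java.util.List;

public class CopyBeforeCollect {
    public static void main(String[] args) {
        List<int[]> out = new ArrayList<>();
        int[] holder = new int[1];     // reused mutable record, as before

        for (int i = 0; i < 3; i++) {
            holder[0] = i;
            out.add(holder.clone());   // defensive copy breaks the aliasing
        }

        System.out.println(out.get(0)[0]); // 0
        System.out.println(out.get(2)[0]); // 2
    }
}
```

For real Avro records the copy would be a per-field clone rather than an array clone; the principle is the same.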