How to use scala.math.Ordering in Java

2016-01-20 Thread ddav
Hi, I am writing my Spark application in Java and I need to use a RangePartitioner. JavaPairRDD progRef1 = sc.textFile(programReferenceDataFile, 12).filter( (String s) -> !s.startsWith("#")).mapToPair(
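RangePartitioner's constructor takes an implicit scala.math.Ordering (and a ClassTag), which Java cannot supply automatically. A minimal sketch of one workaround, assuming a Spark/Scala project layout: define a serializable java.util.Comparator for the key type, then lift it to an Ordering explicitly. Only the Comparator below actually runs without Spark on the classpath; the RangePartitioner wiring is shown in comments.

```java
import java.util.Comparator;

public class OrderingForJava {
    // A Comparator defining the key order RangePartitioner needs.
    // It must be Serializable, since Spark ships it to executors.
    static class StringKeyOrder implements Comparator<String>, java.io.Serializable {
        @Override
        public int compare(String a, String b) {
            return a.compareTo(b);
        }
    }

    public static void main(String[] args) {
        Comparator<String> cmp = new StringKeyOrder();
        // In a Spark project (Scala library on the classpath), this Comparator
        // can be lifted to a scala.math.Ordering from Java roughly like so:
        //   scala.math.Ordering<String> ord =
        //       scala.math.Ordering$.MODULE$.comparatorToOrdering(cmp);
        //   scala.reflect.ClassTag<String> tag =
        //       scala.reflect.ClassTag$.MODULE$.apply(String.class);
        //   new RangePartitioner<>(12, progRef1.rdd(), true, ord, tag);
        // The lines above are a sketch; only the plain Comparator runs here.
        System.out.println(cmp.compare("alpha", "beta"));
    }
}
```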

RDD immutability

2016-01-19 Thread ddav
Hi, Certain APIs (map, mapValues) give the developer access to the data stored in RDDs. Am I correct in saying that these APIs must never modify the data, but always return a new object with a copy of the data if the data needs to be updated for the returned RDD? Thanks, Dave.
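The copy-not-mutate discipline the question describes can be sketched in plain Java, without Spark on the classpath. The Record class and field names below are hypothetical; the point is that a function passed to map()/mapValues() should return a fresh object rather than modify the one it received, since RDD contents must stay immutable.

```java
import java.util.function.Function;

public class CopyOnMap {
    // Hypothetical record type standing in for the RDD's value class.
    static class Record {
        final String id;
        final int value;
        Record(String id, int value) { this.id = id; this.value = value; }
    }

    // Correct style for a map()/mapValues() function: never mutate the input
    // record; build and return a new object carrying the updated field.
    static final Function<Record, Record> increment =
            r -> new Record(r.id, r.value + 1);

    public static void main(String[] args) {
        Record original = new Record("a", 1);
        Record updated = increment.apply(original);
        // The original is untouched; only the returned copy changed.
        System.out.println(original.value + " " + updated.value); // 1 2
    }
}
```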

RangePartitioning

2016-01-19 Thread ddav
Hi, I have the following pair RDD created in Java. JavaPairRDD progRef = sc.textFile(programReferenceDataFile, 12).filter( (String s) -> !s.startsWith("#")).mapToPair( (String s) -> {
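The preview breaks off inside the mapToPair lambda, so its body is unknown. A hypothetical body (delimiter and field layout are assumptions, not the poster's code) would split each non-comment line and key the record by its first field. A Map.Entry stands in for scala.Tuple2 so the sketch runs without Spark:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class LineToPair {
    // Hypothetical mapToPair body: split each line on a delimiter and key
    // the record by its first field. In Spark this lambda would return a
    // scala.Tuple2<String, String>; Map.Entry stands in for it here.
    static Map.Entry<String, String> toPair(String line) {
        String[] fields = line.split(",", 2);   // "," delimiter is an assumption
        return new SimpleEntry<>(fields[0], fields[1]);
    }

    public static void main(String[] args) {
        Map.Entry<String, String> p = toPair("prog42,Some Programme Title");
        System.out.println(p.getKey() + " -> " + p.getValue());
    }
}
```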

Kafka Streaming and partitioning

2016-01-13 Thread ddav
Hi, I have the following use case: 1. Reference data stored in an RDD that is persisted and partitioned using a simple custom partitioner. 2. An input stream from Kafka that uses the same partitioner algorithm as the ref data RDD - this partitioning is done in Kafka. I am using the Kafka direct
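The shared-partitioner idea can be sketched in plain Java. In a real project this class would extend org.apache.spark.Partitioner; the class and method names below mirror that API but nothing here depends on Spark. Using the same key-to-partition function on the Kafka producer side and in the RDD partitioner is what keeps a key's stream records aligned with the same partition index as the reference data.

```java
public class RefDataPartitioner {
    // Plain-Java mirror of a simple custom Spark Partitioner (hypothetical
    // name). The non-negative modulo matches what hash partitioning
    // conventionally does, so negative hashCodes still map into range.
    private final int numPartitions;

    RefDataPartitioner(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    int numPartitions() {
        return numPartitions;
    }

    int getPartition(Object key) {
        int mod = key.hashCode() % numPartitions;
        return mod < 0 ? mod + numPartitions : mod;
    }

    public static void main(String[] args) {
        RefDataPartitioner p = new RefDataPartitioner(12);
        // The same function applied on the producer side and the RDD side
        // yields the same index for a given key.
        System.out.println(p.getPartition("prog42"));
    }
}
```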

Co-Partitioned Joins

2016-01-13 Thread ddav
Hi, I am quite new to Spark and have some questions on joins and co-partitioning. Are the following assumptions correct? When a join takes place and one of the RDDs has been partitioned, does Spark make a best effort to execute the join for a specific partition where the partitioned data