Hi,
I am writing my Spark application in Java and I need to use a
RangePartitioner.
JavaPairRDD progRef1 =
    sc.textFile(programReferenceDataFile, 12)
      .filter((String s) -> !s.startsWith("#"))
      .mapToPair(
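Spark's RangePartitioner works by sampling the keys and picking sorted range bounds, then assigning each key to the partition whose range contains it. A plain-Java sketch of that assignment step (the bounds here are made-up literals; the real partitioner computes them by sampling the RDD):

```java
import java.util.Arrays;

public class RangeSketch {
    // Assign a key to a partition by comparing it against sorted range
    // bounds, which is essentially what RangePartitioner.getPartition does.
    static int partitionFor(String key, String[] bounds) {
        // binarySearch returns -(insertionPoint) - 1 when the key is absent
        int pos = Arrays.binarySearch(bounds, key);
        return pos >= 0 ? pos : -pos - 1;
    }

    public static void main(String[] args) {
        // Two bounds split the key space into 3 ranges:
        // keys < "h", "h" <= keys < "p", keys >= "p"
        String[] bounds = {"h", "p"};
        System.out.println(partitionFor("apple", bounds)); // 0
        System.out.println(partitionFor("mango", bounds)); // 1
        System.out.println(partitionFor("zebra", bounds)); // 2
    }
}
```

Because the bounds are sorted, range partitioning also gives you a total ordering across partitions, which is why sortByKey uses it internally.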
Hi,
Certain APIs (map, mapValues) give the developer access to the data stored
in RDDs.
Am I correct in saying that these APIs must never modify the data in place,
but must always return a new object with a copy of the data if the data
needs to be updated for the returned RDD?
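For context, this matters most when the RDD is persisted deserialized in memory: the function may be handed the cached object itself, so mutating it would corrupt the cache for every other consumer. A plain-Java sketch of the copy-then-return pattern (the entry type and field are illustrative, not a Spark API):

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

public class CopyOnTransform {
    // A mapValues-style function should build a new value rather than
    // mutating the one it was handed - the input may be a shared,
    // cached object.
    static Map.Entry<String, int[]> bumpCount(Map.Entry<String, int[]> in) {
        int[] fresh = in.getValue().clone(); // copy, don't mutate
        fresh[0] += 1;
        return new SimpleImmutableEntry<>(in.getKey(), fresh);
    }

    public static void main(String[] args) {
        Map.Entry<String, int[]> original =
            new SimpleImmutableEntry<>("prog1", new int[]{41});
        Map.Entry<String, int[]> updated = bumpCount(original);
        System.out.println(original.getValue()[0]); // 41 - untouched
        System.out.println(updated.getValue()[0]);  // 42
    }
}
```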
Thanks,
Dave.
Hi,
I have the following pair RDD created in Java.
JavaPairRDD progRef =
    sc.textFile(programReferenceDataFile, 12)
      .filter((String s) -> !s.startsWith("#"))
      .mapToPair((String s) -> {
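The body of that lambda would parse each line into a key/value pair. A plain-Java sketch of the parsing step, assuming comma-separated reference data with the key in the first column (in Spark the function would return a scala.Tuple2 rather than a Map.Entry):

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

public class LineToPair {
    // Roughly what a mapToPair lambda does: split the line once and use
    // the first field as the key, the remainder as the value.
    static Map.Entry<String, String> toPair(String line) {
        String[] cols = line.split(",", 2);
        return new SimpleImmutableEntry<>(
            cols[0], cols.length > 1 ? cols[1] : "");
    }

    public static void main(String[] args) {
        Map.Entry<String, String> p = toPair("prog42,Some Programme,2015");
        System.out.println(p.getKey());   // prog42
        System.out.println(p.getValue()); // Some Programme,2015
    }
}
```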
Hi,
I have the following use case:
1. Reference data stored in an RDD that is persisted and partitioned using a
simple custom partitioner.
2. Input stream from Kafka that uses the same partitioner algorithm as the
ref data RDD - this partitioning is done in Kafka.
I am using Kafka direct
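The key to this use case is that both sides really do apply the same function. A plain-Java sketch of one partition function shared by the Kafka producer (choosing the topic partition) and a custom Spark Partitioner's getPartition (the key and partition count are made up):

```java
public class SharedPartitioner {
    // One function used on both sides: the producer picks the topic
    // partition with it, and the Spark Partitioner returns the same
    // number from getPartition(), so a given key lands in the
    // same-numbered partition everywhere.
    static int partitionFor(String key, int numPartitions) {
        // Math.floorMod keeps the result non-negative even for
        // negative hashCodes
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int producerSide = partitionFor("prog42", 12);
        int sparkSide = partitionFor("prog42", 12);
        System.out.println(producerSide == sparkSide); // true
    }
}
```

Note that partition *numbering* matching is the easy part; whether Spark can exploit it without a shuffle also depends on both RDDs reporting the same Partitioner object semantics (equal numPartitions and an equals() that matches).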
Hi,
I am quite new to Spark and have some questions on joins and
co-partitioning.
Are the following assumptions correct?
When a join takes place and one of the RDDs has been partitioned, does
Spark make a best effort to execute the join for a specific partition where
the partitioned data
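The intuition behind co-partitioned joins can be sketched in plain Java: once both datasets have been bucketed with the same function, joining bucket i needs only bucket i from each side, so no records cross buckets (i.e. no shuffle across partitions). All names below are illustrative:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CoPartitionedJoin {
    // Join one pair of same-numbered buckets. Because both sides were
    // bucketed with the same function, every match for a key is already
    // local to that bucket.
    static List<String> joinBucket(Map<String, String> left,
                                   Map<String, String> right) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : left.entrySet()) {
            String r = right.get(e.getKey());
            if (r != null) {
                out.add(e.getKey() + "=" + e.getValue() + "|" + r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> leftBucket = new HashMap<>();
        leftBucket.put("prog42", "ref-data");
        Map<String, String> rightBucket = new HashMap<>();
        rightBucket.put("prog42", "stream-event");
        System.out.println(joinBucket(leftBucket, rightBucket));
        // [prog42=ref-data|stream-event]
    }
}
```

When only one RDD has a known partitioner, Spark repartitions the other side to match it rather than shuffling both, which is the "best effort" behaviour the question is getting at.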