Re: Kafka Streaming and partitioning

2017-02-26 Thread tonyye
Hi Dave, I had the same question and was wondering if you had found a way to do the join without causing a shuffle? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Kafka-Streaming-and-partitioning-tp25955p28425.html Sent from the Apache Spark User List

Kafka Streaming and partitioning

2016-01-13 Thread ddav
on the RDD partitions, i.e. check the first entry in each partition to determine the partition number of the data. Thank you in advance for any help on this issue. Dave.

Re: Kafka Streaming and partitioning

2016-01-13 Thread Cody Koeninger
>>> Spark will do a shuffle under the hood in this case and the join will take place. The join will do its best to run on a node that has local access to the reference data RDD.
>>>
>>> Is there any difference between

Re: Kafka Streaming and partitioning

2016-01-13 Thread David D
>>>> I have two ways to do this.
>>>> 1. Explicitly call partitionBy(CustomPartitioner) on the input stream RDD followed by a join. This results in a shuffle of the input stream RDD and then the co-partitioned join to t
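For readers following the thread, here is a minimal sketch of the idea behind option 1, written in plain Python rather than against the Spark API. It assumes both datasets are partitioned with the same function, so that partition i of one side only ever needs partition i of the other. The names `partition_for`, `partition_by`, `co_partitioned_join` and `NUM_PARTITIONS` are hypothetical and exist only for this illustration; they are not Spark calls.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count shared by both datasets


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Hypothetical custom partitioner: deterministic key -> partition index."""
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def partition_by(records, num_partitions=NUM_PARTITIONS):
    """The 'shuffle' step: move each (key, value) record to its partition."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


def co_partitioned_join(left_parts, right_parts):
    """Join partition i of the left side only with partition i of the right side.

    Because both sides used the same partitioner, matching keys are already
    in the same partition index and no record has to move again.
    """
    if len(left_parts) != len(right_parts):
        raise ValueError("both sides must have the same number of partitions")
    joined = []
    for left, right in zip(left_parts, right_parts):
        lookup = defaultdict(list)
        for key, value in right:
            lookup[key].append(value)
        for key, value in left:
            for other in lookup.get(key, []):
                joined.append((key, (value, other)))
    return joined


# Illustrative data only (not from the thread).
reference = [("a", "ref-a"), ("b", "ref-b"), ("c", "ref-c")]
stream_batch = [("a", 1), ("c", 2), ("a", 3), ("d", 4)]

reference_parts = partition_by(reference)  # partitioned once, reused per batch
batch_parts = partition_by(stream_batch)   # the per-batch shuffle from option 1
result = sorted(co_partitioned_join(batch_parts, reference_parts))
```

In this sketch the only data movement is inside `partition_by`; the join itself works partition by partition. That per-batch `partition_by` call corresponds to the shuffle that the thread is trying to avoid.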

Re: Kafka Streaming and partitioning

2016-01-13 Thread Cody Koeninger
> on the RDD partitions i.e. check the first entry in each partition to determine the partition number of the data.
>
> Thank you in advance for any help on this issue.
> Dave.

Re: Kafka Streaming and partitioning

2016-01-13 Thread Dave
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Kafka-Streaming-and-partitioning-tp25955.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Kafka Streaming and partitioning

2016-01-13 Thread Cody Koeninger
>> o an already created RDD and not to do a shuffle. Spark in this case trusts that the data is set up correctly (as in the use case above) and simply fills in the necessary metadata on the RDD partitions, i.e. check the first entry in each partition to determine the

Re: Kafka Streaming and partitioning

2016-01-13 Thread Dave
nce for any help on this issue. Dave.