Re: RDD partition after calling mapToPair

2015-11-24 Thread trung kien
Thanks, Cody, for the very useful information. It's much clearer to me now. I had a lot of wrong assumptions.

On Nov 23, 2015 10:19 PM, "Cody Koeninger" wrote:
> Partitioner is an optional field when defining an rdd. KafkaRDD doesn't
> define one, so you can't really assume

2015-11-23 Thread Thúy Hằng Lê
Thanks, Cody. I still have some concerns about this. What do you mean when you say the Spark direct stream doesn't have a default partitioner? Could you please explain in more detail? When I assign 20 cores to 20 Kafka partitions, I expect each core to work on one partition. Is that correct? I'm
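The setup in the question can be pictured with a small, idealised model. For the direct stream, Spark's Kafka integration guide describes a one-to-one mapping between Kafka partitions and RDD partitions, and each RDD partition is processed by one task. The Python sketch below is a toy, not Spark code: the `waves` helper is hypothetical, and it assumes each task uses one core (`spark.task.cpus` = 1, the default) and that all tasks take the same time. It only shows how many tasks can run at once. Which core picks up which partition is left to the scheduler on every batch and is not pinned in advance.

```python
from typing import List

# Toy idealisation, not Spark code. Assumptions (hypothetical model):
# - the direct stream gives one RDD partition per Kafka partition,
# - each RDD partition is processed by exactly one task,
# - each task occupies one core (spark.task.cpus = 1, the default),
# - all tasks take the same time, so they run in clean "waves".


def waves(num_kafka_partitions: int, num_cores: int) -> List[List[int]]:
    """Group partition ids into sets of tasks that can run at the same time."""
    # 1:1 mapping: Kafka partition i -> RDD partition i -> task i
    ids = list(range(num_kafka_partitions))
    return [ids[i:i + num_cores] for i in range(0, len(ids), num_cores)]


print(len(waves(20, 20)))  # 1: all 20 tasks can run concurrently
print(len(waves(20, 8)))   # 3: 8 + 8 + 4 tasks
```

With 20 cores and 20 Kafka partitions, every task in a batch can run in parallel; with fewer cores, the remaining tasks simply wait for a free core.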

2015-11-23 Thread Cody Koeninger
A partitioner is an optional field when defining an RDD. KafkaRDD doesn't define one, so you can't really assume anything about the way it's partitioned, because Spark doesn't know anything about the way it's partitioned. If you want to rely on some property of how things were partitioned as they
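To illustrate the point about the partitioner being optional, here is a toy model in Python. It is not Spark's API: `ToyRDD`, `HashPartitioner`, `map_to_pair`, `map_values` and `partition_by` are hypothetical stand-ins, loosely named after Spark's `mapToPair`, `mapValues` and `partitionBy`. A KafkaRDD-like input starts with no partitioner, so nothing guarantees that equal keys share a partition. A key-changing map cannot keep a partitioner, and only an explicit shuffle by key attaches one.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, List, Optional, Tuple

Pair = Tuple[Hashable, object]


@dataclass(frozen=True)
class HashPartitioner:
    """Toy stand-in for a hash partitioner: maps a key to a partition index."""
    num_partitions: int

    def get_partition(self, key: Hashable) -> int:
        return hash(key) % self.num_partitions


@dataclass
class ToyRDD:
    """Toy pair RDD: a list of partitions plus an OPTIONAL partitioner."""
    partitions: List[List[Pair]]
    partitioner: Optional[HashPartitioner] = None  # None = layout unknown (like KafkaRDD)

    def map_to_pair(self, f: Callable[[Pair], Pair]) -> "ToyRDD":
        # f may change keys, so any old key -> partition rule can't be trusted:
        # same physical layout, but the result carries no partitioner.
        return ToyRDD([[f(kv) for kv in part] for part in self.partitions], None)

    def map_values(self, f: Callable[[object], object]) -> "ToyRDD":
        # Keys are untouched, so an existing partitioner stays valid.
        return ToyRDD(
            [[(k, f(v)) for k, v in part] for part in self.partitions],
            self.partitioner,
        )

    def partition_by(self, partitioner: HashPartitioner) -> "ToyRDD":
        # Shuffle: move every pair to the partition its key hashes to.
        out: List[List[Pair]] = [[] for _ in range(partitioner.num_partitions)]
        for part in self.partitions:
            for k, v in part:
                out[partitioner.get_partition(k)].append((k, v))
        return ToyRDD(out, partitioner)


# KafkaRDD-like input: key 1 appears in both partitions and nothing records why.
kafka_like = ToyRDD([[(1, "a"), (2, "b")], [(1, "c"), (3, "d")]])
print(kafka_like.map_to_pair(lambda kv: (kv[0], kv[1].upper())).partitioner)  # None
print(kafka_like.partition_by(HashPartitioner(2)).partitions)
# [[(2, 'b')], [(1, 'a'), (1, 'c'), (3, 'd')]]
```

In real Spark, `mapValues` preserves the parent RDD's partitioner while `map` and `mapToPair` do not, and `partitionBy` on a pair RDD sets one; the toy only mirrors that bookkeeping, not Spark's execution.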