
Re: spark streaming 1.3 coalesce on kafkadirectstream

2015-07-22 Thread Tathagata Das

With DirectKafkaStream there are two approaches. 1. Increase the number of Kafka partitions; Spark will automatically read them in parallel. 2. If that's not possible, then explicitly repartition, but only if there are more cores in the cluster than the number of Kafka partitions, AND the first map-like
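
The part of the second approach that is visible above can be written down as a small check. This is only a sketch in plain Python, not Spark code: `should_repartition`, `total_cores` and `kafka_partitions` are hypothetical names made up for illustration, and the second condition (the one starting "AND the first map-like") is cut off in the message, so it is deliberately not modelled.

```python
def should_repartition(total_cores, kafka_partitions):
    """Return True when the cluster has more cores than Kafka partitions.

    Covers only the first condition quoted in the reply; the second
    condition is truncated in the source and is left out here.
    """
    return total_cores > kafka_partitions


# Hypothetical cluster sizes, purely for illustration.
print(should_repartition(16, 4))  # True: cores would sit idle without repartitioning
print(should_repartition(4, 4))   # False: one core per Kafka partition already
print(should_repartition(2, 8))   # False: the Kafka partitions already outnumber the cores
```

When the check is false, the direct stream's one-to-one mapping from Kafka partitions to Spark partitions already keeps every core busy, so the shuffle cost of a repartition would presumably buy nothing.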

spark streaming 1.3 coalesce on kafkadirectstream

2015-07-20 Thread Shushant Arora

Does Spark Streaming 1.3 launch a task for each partition offset range, whether its size is 0 or not? If yes, how can I force it not to launch tasks for empty RDDs? I am not able to use coalesce on directKafkaStream. Should we always enforce repartitioning before processing a direct stream? use case
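
To make "offset range of size 0" concrete, here is a minimal plain-Python sketch. It does not use the Spark API: the `OffsetRange` tuple below is a hypothetical stand-in with assumed field names, and the topic name and offsets are invented. It only shows how one would tell empty ranges from non-empty ones in a batch; it says nothing about how Spark schedules tasks for them.

```python
from collections import namedtuple

# Hypothetical stand-in for one Kafka offset range in a batch
# (one range per Kafka partition); not the Spark class of the same name.
OffsetRange = namedtuple(
    "OffsetRange", ["topic", "partition", "from_offset", "until_offset"]
)


def message_count(offset_range):
    """Number of messages covered by the range; 0 means the range is empty."""
    return offset_range.until_offset - offset_range.from_offset


# Invented batch: partition 1 received nothing new.
batch = [
    OffsetRange("events", 0, 100, 250),
    OffsetRange("events", 1, 40, 40),
    OffsetRange("events", 2, 7, 19),
]

non_empty = [r for r in batch if message_count(r) > 0]
print(len(batch), len(non_empty))            # 3 2
print(sum(message_count(r) for r in batch))  # 162
```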