In Spark Streaming 1.3:
Say I have 10 executors, each with 4 cores, so 40 tasks can run in parallel
at once. If I repartition the Kafka direct stream to 40 partitions, versus
keeping the 300 partitions I have in the Kafka topic, which one will be more
efficient? Should I repartition the Kafka stream equal to num of
For Java, do
OffsetRange[] offsetRanges = ((HasOffsetRanges) rdd.rdd()).offsetRanges();
If you fix that error, you should be seeing data.
You can call arbitrary RDD operations on a DStream, using
DStream.transform. Take a look at the docs.
For the direct kafka approach you are doing,
- tasks
I'd suggest upgrading to 1.4, as it has better metrics and UI.
Thanks
Best Regards
On Mon, Jul 20, 2015 at 7:01 PM, Shushant Arora shushantaror...@gmail.com
wrote:
Is coalesce not applicable to a Kafka stream? How do I coalesce a Kafka
direct stream? It is not there in the API.
Shall calling
Hi
1. I am using Spark Streaming 1.3 to read from a Kafka queue and push
events to an external source.
I passed 20 executors to my job, but only 6 are shown in the Executors tab.
When I used the high-level streaming API in 1.2, it showed 20 executors. My
cluster is a 10-node YARN cluster with each node
Will calling repartition on the direct stream, with the number of executors
as the number of partitions, improve performance?
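
As a rough way to reason about this, here is a minimal Java sketch using the
figures mentioned in this thread (10 executors with 4 cores each, a topic
with 300 partitions). It assumes one task per core and one task per
partition; the direct approach creates one RDD partition per Kafka partition.
The class and method names are hypothetical, and the sketch deliberately
ignores the shuffle that repartition adds, so it cannot settle which option
is faster.

```java
public class TaskWaves {
    // Number of sequential "waves" needed to run one task per partition
    // when at most `slots` tasks can run concurrently (ceiling division).
    public static int waves(int partitions, int slots) {
        return (partitions + slots - 1) / slots;
    }

    public static void main(String[] args) {
        int executors = 10;        // figure taken from the thread
        int coresPerExecutor = 4;  // figure taken from the thread
        int slots = executors * coresPerExecutor; // assumes one task per core

        // 300 Kafka partitions, no repartition: tasks run in several waves.
        System.out.println(waves(300, slots)); // prints 8

        // After repartition(40): a single wave, at the cost of a shuffle
        // (the shuffle cost is not modeled here).
        System.out.println(waves(40, slots)); // prints 1
    }
}
```

Whether a single wave after a shuffle beats eight waves without one depends
on how expensive each task is, so it is worth measuring both on a real batch.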
In 1.3, do tasks get launched for partitions which are empty? Does the driver
make