1. If you are consuming data from Kafka or any other receiver-based
source, then you can start 1-2 receivers per worker (assuming you have at
least 4 cores per worker).
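
A minimal sketch of the multi-receiver approach with the receiver-based
Kafka API (the Zookeeper quorum, group id, and topic map here are
placeholders you would fill in for your setup):

```scala
import org.apache.spark.streaming.kafka.KafkaUtils

// Create one input DStream per receiver so each runs on its own worker
// core, then union them into a single stream for processing.
val numReceivers = 2  // assumption: tune to your worker/core count
val kafkaStreams = (1 to numReceivers).map { _ =>
  KafkaUtils.createStream(ssc, "zkhost:2181", "my-group", Map("my-topic" -> 1))
}
val unified = ssc.union(kafkaStreams)
```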

2. If you have a single receiver or a fileStream, then what you can do to
distribute the data across machines is a repartition.
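
For example, something like this (the path and partition count are
assumptions for illustration):

```scala
// A single fileStream source reads on one node; repartition() reshuffles
// the received batches across the cluster before the heavy processing.
val lines = ssc.textFileStream("hdfs:///input")  // hypothetical path
val distributed = lines.repartition(8)           // e.g. ~2-3x total cores
```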

Thanks
Best Regards

On Thu, Mar 19, 2015 at 11:32 PM, Mohit Anchlia <mohitanch...@gmail.com>
wrote:

> I am trying to understand how to load balance the incoming data to
> multiple spark streaming workers. Could somebody help me understand how I
> can distribute my incoming data from various sources such that incoming
> data is going to multiple spark streaming nodes? Is it done by spark client
> with help of spark master similar to hadoop client asking namenodes for the
> list of datanodes?
>
