Hi Mohit,
please make sure you use the "Reply to all" button and include the mailing
list; otherwise only I will get your message ;)
Regarding your question:
Yes, that's also my understanding. You can partition streaming RDDs only by
time intervals, not by size. So depending on your incoming rate,
posting my question again :)
Thanks for the pointer. Looking at the description below from the site, it
looks like in Spark the block size is not fixed; it is determined by the
block interval, and in fact within the same batch you could have blocks of
different sizes. Did I get that right?
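That matches my reading of the docs too: each receiver cuts the incoming stream into blocks every block interval, so the number of blocks (and hence partitions of the batch RDD) per receiver is roughly batch interval / block interval, while each block's size depends on the data rate during that interval. A minimal sketch, assuming the documented `spark.streaming.blockInterval` setting (the app name and intervals here are made up for illustration):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch: blocks are cut by time, not by size.
val conf = new SparkConf()
  .setAppName("BlockIntervalDemo")
  // one block per receiver every 200 ms (the documented default)
  .set("spark.streaming.blockInterval", "200ms")

// With a 2 s batch interval, each batch holds roughly
// 2000 ms / 200 ms = 10 blocks per receiver, i.e. ~10 partitions,
// but how much data each block contains depends on the incoming rate.
val ssc = new StreamingContext(conf, Seconds(2))
```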
-
Another
1. If you are consuming data from Kafka or any other receiver-based
source, then you can start 1-2 receivers per worker (assuming you'll have
at least 4 cores per worker).
2. If you have a single receiver, or it is a fileStream, then what you can
do to distribute the data across machines is to do a
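For the multiple-receiver case in point 1, a common pattern is to create several Kafka input streams and union them so they are processed as one DStream. A sketch, assuming an existing `StreamingContext` called `ssc` and placeholder ZooKeeper/topic names (none of these identifiers come from this thread):

```scala
import org.apache.spark.streaming.kafka.KafkaUtils

// Sketch: one receiver per stream; Spark schedules the receivers
// across different workers, so ingestion itself is spread out.
val numReceivers = 2
val kafkaStreams = (1 to numReceivers).map { _ =>
  // placeholder ZooKeeper quorum, consumer group, and topic map
  KafkaUtils.createStream(ssc, "zkhost:2181", "my-group", Map("my-topic" -> 1))
}

// Union the per-receiver streams into a single DStream,
// then repartition so processing is balanced over the whole cluster.
val unified = ssc.union(kafkaStreams).repartition(8)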
Hi Mohit,
it also depends on what the source for your streaming application is.
If you use Kafka, you can easily partition topics and have multiple
receivers on different machines.
If you have something like an HTTP or socket stream, you probably can't do
that. The Spark RDDs generated by your
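For such single-receiver sources, one option is to redistribute each batch after it arrives rather than at the source. A sketch using `DStream.repartition`, assuming an existing `StreamingContext` `ssc` and a placeholder host/port:

```scala
// Sketch: a lone socket receiver runs on one machine, but
// repartition() reshuffles each batch RDD across the cluster
// so the downstream processing is load-balanced.
val lines = ssc.socketTextStream("localhost", 9999) // placeholder endpoint
val balanced = lines.repartition(ssc.sparkContext.defaultParallelism)
```

The trade-off is an extra shuffle per batch, which is usually acceptable when the processing work dominates the network cost.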
I am trying to understand how to load-balance incoming data across multiple
Spark Streaming workers. Could somebody help me understand how I can
distribute my incoming data from various sources so that it goes to
multiple Spark Streaming nodes? Is it done by the Spark client with
help
Done :)
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/History-Server-renered-page-not-suitable-for-load-balancing-tp7447p8550.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.