I have a requirement to write to a Kafka queue from a Spark Streaming
application.
I am using Spark 1.2 streaming. Since different executors in Spark are
allocated at each run, instantiating a new Kafka producer at each run
seems like a costly operation. Is there a way to reuse objects in processing?
Use foreachPartition, and allocate whatever the costly resource is once per
partition.
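To illustrate the pattern being suggested, here is a minimal plain-Python sketch (not from the original thread, and with no real Spark or Kafka involved): `FakeProducer` is a hypothetical stand-in for the costly resource, and the loop over `partitions` stands in for what `rdd.foreachPartition(send_partition)` would do. The point is that the producer is constructed once per partition, not once per record.

```python
class FakeProducer:
    """Hypothetical stand-in for a Kafka producer; counts constructions."""
    instances = 0

    def __init__(self):
        FakeProducer.instances += 1
        self.sent = []

    def send(self, record):
        self.sent.append(record)

    def close(self):
        pass


def send_partition(partition):
    # Allocate the costly resource once for the whole partition...
    producer = FakeProducer()
    for record in partition:  # ...then reuse it for every record.
        producer.send(record)
    producer.close()


partitions = [[1, 2, 3], [4, 5]]  # two partitions, five records
for p in partitions:  # stands in for rdd.foreachPartition(send_partition)
    send_partition(p)

print(FakeProducer.instances)  # 2 producers for 5 records, not 5
```

With a real producer the same shape applies: construct it at the top of the function passed to foreachPartition, send every record in the iterator, then close it.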
On Mon, Jul 6, 2015 at 6:11 AM, Shushant Arora shushantaror...@gmail.com
wrote:
I have a requirement to write to a Kafka queue from a Spark Streaming
application.
I am using Spark 1.2 streaming. Since
Yeah, creating a new producer at the granularity of partitions may not be
that costly.
On Mon, Jul 6, 2015 at 6:40 AM, Cody Koeninger c...@koeninger.org wrote:
Use foreachPartition, and allocate whatever the costly resource is once
per partition.
On Mon, Jul 6, 2015 at 6:11 AM, Shushant
What's the difference between foreachPartition vs mapPartitions for a
DStream? Both work at partition granularity.
One is a transformation and the other is an action, but if I also call an
action after mapPartitions, which one is more efficient and recommended?
On Tue, Jul 7, 2015 at 12:21 AM,
Both have the same efficiency. The primary difference is that one is a
transformation (hence is lazy, and requires another action to actually
execute), and the other is an action.
But it may be a slightly better design in general to have transformations
be purely functional (that is, with no external side effects).
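The lazy-vs-eager distinction described above can be sketched with plain Python generators as stand-ins for Spark's evaluation model (an assumed simplification; no Spark involved). Building the generator, like calling mapPartitions, does no work until something consumes it, while a plain function call, like foreachPartition, runs immediately.

```python
calls = []


def costly_setup(partition):
    """Generator: stands in for a mapPartitions transformation."""
    calls.append("setup")
    for x in partition:
        yield x * 2


data = [1, 2, 3]

# "mapPartitions"-style: building the generator runs nothing yet (lazy).
mapped = costly_setup(data)
assert calls == []  # no work done until an "action" consumes it

result = list(mapped)  # the "action" forces evaluation
assert calls == ["setup"]
assert result == [2, 4, 6]


# "foreachPartition"-style: a plain call executes immediately (eager).
def foreach_partition(partition, f):
    f(partition)


seen = []
foreach_partition(data, lambda p: seen.extend(p))
assert seen == [1, 2, 3]
```

So if the only goal is a side effect (like writing to Kafka), foreachPartition does it in one step; mapPartitions would additionally need a trailing action just to trigger execution.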
When using foreachPartition, the jobs that get created are not displayed
on the driver console but are visible on the web UI.
On the driver it prints some stage progress of the form
[Stage 2: (0 + 2) / 5] and then disappears.
I am using foreachPartition as :