writing to kafka using spark streaming

2015-07-06 Thread Shushant Arora
I have a requirement to write in kafka queue from a spark streaming application. I am using spark 1.2 streaming. Since different executors in spark are allocated at each run so instantiating a new kafka producer at each run seems a costly operation .Is there a way to reuse objects in processing

Re: writing to kafka using spark streaming

2015-07-06 Thread Cody Koeninger
Use foreachPartition, and allocate whatever the costly resource is once per partition. On Mon, Jul 6, 2015 at 6:11 AM, Shushant Arora shushantaror...@gmail.com wrote: I have a requirement to write in kafka queue from a spark streaming application. I am using spark 1.2 streaming. Since

Re: writing to kafka using spark streaming

2015-07-06 Thread Tathagata Das
Yeah, creating a new producer at the granularity of partitions may not be that costly. On Mon, Jul 6, 2015 at 6:40 AM, Cody Koeninger c...@koeninger.org wrote: Use foreachPartition, and allocate whatever the costly resource is once per partition. On Mon, Jul 6, 2015 at 6:11 AM, Shushant

Re: writing to kafka using spark streaming

2015-07-06 Thread Shushant Arora
whats the difference between foreachPartition vs mapPartitions for a Dtstream both works at partition granularity? One is an operation and another is action but if I call an opeartion afterwords mapPartitions also, which one is more efficient and recommeded? On Tue, Jul 7, 2015 at 12:21 AM,

Re: writing to kafka using spark streaming

2015-07-06 Thread Tathagata Das
Both have same efficiency. The primary difference is that one is a transformation (hence is lazy, and requires another action to actually execute), and the other is an action. But it may be a slightly better design in general to have transformations be purely functional (that is, no external side

Re: writing to kafka using spark streaming

2015-07-06 Thread Shushant Arora
On using foreachPartition jobs get created are not displayed on driver console but are visible on web ui. On driver it creates some stage statistics of form [Stage 2: (0 + 2) / 5] and disappeared . I am using foreachPartition as :