So you need some state between messages in a partition. You can use
mapPartitions or foreachPartition, which allow you to write code to process
an entire partition.

On Thu, Aug 13, 2015 at 11:48 AM, Priya Ch <learnings.chitt...@gmail.com>
wrote:

> Hi Philip,
>
>  I have the following requirement -
> I read the streams of data from various partitions of kafka topic. And
> then I union the dstreams and apply hash partitioner so messages of same
> key would go into single partition of an rdd, which is ofcourse handled by
> a single thread. This way we trying to resolve concurrency issue.
>
> Now one of the partitions of the rdd holds messages with same key. Let's
> say 1st message in the partition may correspond to ticket issuance and 2nd
> message might corresponds to update on the ticket. Now while handling 1st
> message there is different logic and 2nd message's logic depends on 1st
> message.
> Hence using rdd.foreach i am handling different logic for individual
> messages. Now bulk rdd.saveToCassandra will now work.
>
> Hope you got what i am trying to say..
>
> On Fri, Aug 14, 2015 at 12:07 AM, Philip Weaver <philip.wea...@gmail.com>
> wrote:
>
>> All you'd need to do is *transform* the rdd before writing it, e.g.
>> using the .map function.
>>
>>
>> On Thu, Aug 13, 2015 at 11:30 AM, Priya Ch <learnings.chitt...@gmail.com>
>> wrote:
>>
>>> Hi All,
>>>
>>>  I have a question in writing rdd to cassandra. Instead of writing
>>> entire rdd to cassandra, i want to write individual statement into
>>> cassandra beacuse there is a need to perform to ETL on each message ( which
>>> requires checking with the DB).
>>> How could i insert statements individually? Using
>>> CassandraConnector.session ??
>>>
>>> If so, what is the performance impact of this ? How about using
>>> sc.parallelize() for eah message in the rdd and then insert into cassandra ?
>>>
>>> Thanks,
>>> Padma Ch
>>>
>>
>>
>

Reply via email to