And yet another way is to demultiplex at one point which will yield separate 
DStreams for each message type which you can then process in independent DAG 
pipelines in the following way:

 

MessageType1DStream = MainDStream.filter(message type1)

MessageType2DStream = MainDStream.filter(message type2)

MessageType3DStream = MainDStream.filter(message type3)

 

Then proceed your processing independently with MessageType1DStream, 
MessageType2DStream and MessageType3DStream ie each of them is a starting point 
of a new DAG pipeline running in parallel

 

From: Tathagata Das [mailto:t...@databricks.com] 
Sent: Thursday, April 16, 2015 12:52 AM
To: Jianshi Huang
Cc: user; Shao, Saisai; Huang Jie
Subject: Re: How to do dispatching in Streaming?

 

It may be worthwhile to do architect the computation in a different way. 

 

dstream.foreachRDD { rdd => 

   rdd.foreach { record => 

      // do different things for each record based on filters

   }

}

 

TD

 

On Sun, Apr 12, 2015 at 7:52 PM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:

Hi,

 

I have a Kafka topic that contains dozens of different types of messages. And 
for each one I'll need to create a DStream for it.

 

Currently I have to filter the Kafka stream over and over, which is very 
inefficient.

 

So what's the best way to do dispatching in Spark Streaming? (one DStream -> 
multiple DStreams)

 




Thanks,

-- 

Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/

 

Reply via email to