Hi,

I agree with Evo: Spark works at a different abstraction level than Storm,
and there is no direct translation from Storm topologies to Spark
Streaming jobs. I think the closest notion is the lineage
of DStreams or RDDs, which is similar to the logical plan of an engine like
Apache Pig. Here
https://github.com/JerryLead/SparkInternals/blob/master/pdf/2-JobLogicalPlan.pdf
is a diagram of a Spark logical plan by a third party. I would suggest
reading the book "Learning Spark"
https://www.safaribooksonline.com/library/view/learning-spark/9781449359034/foreword01.html
for more on this. But in general I think that Storm has an abstraction
level closer to MapReduce, and Spark has an abstraction level closer to
Pig, so the correspondence between Storm and Spark notions cannot be
perfect.
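
To make the lineage idea concrete for the three-stage example from the
original question, here is a minimal Scala sketch, not a definitive
implementation. Each "bolt" becomes a transformation that returns a new
DStream, and repartition is placed just before the expensive stage so that
stage runs with more tasks. The in-memory queue source, the hashtag set, the
partition count of 8, the sentiment function, and the println sink are all
hypothetical placeholders; a real job would read from an actual tweet source
and write to a database.

```scala
import scala.collection.mutable
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object TweetPipelineSketch {
  // Hypothetical stand-in for the real (expensive) sentiment model.
  def sentiment(tweet: String): Int =
    if (tweet.toLowerCase.contains("good")) 1 else -1

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("TweetPipelineSketch")
      .setMaster("local[4]") // assumption: local run with 4 cores
    val ssc = new StreamingContext(conf, Seconds(1))

    // Assumption: an in-memory queue stands in for the real tweet source.
    val queue = mutable.Queue[RDD[String]](
      ssc.sparkContext.parallelize(
        Seq("#spark is good", "#storm is slow", "no hashtag here")))
    val tweets = ssc.queueStream(queue)

    // "Bolt 1": keep only tweets that contain one of the given hashtags.
    val hashtags = Set("#spark", "#storm") // hypothetical hashtag list
    val tagged = tweets.filter(t => hashtags.exists(h => t.contains(h)))

    // "Bolt 2": spread the data over more partitions before the costly step,
    // so that more tasks can run the sentiment logic in parallel.
    val scored = tagged.repartition(8).map(t => (t, sentiment(t)))

    // "Bolt 3": output action; println stands in for the database write.
    scored.foreachRDD { rdd =>
      rdd.foreachPartition { rows =>
        rows.foreach { case (tweet, score) => println(s"$score\t$tweet") }
      }
    }

    ssc.start()
    ssc.awaitTerminationOrTimeout(5000) // run briefly, then shut down
    ssc.stop()
  }
}
```

No explicit "emit" is needed: the DStream returned by one transformation is
simply the input of the next. The partition count only helps up to the number
of cores available to the executors, so it would need tuning on a real
cluster.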

Greetings,

Juan




2015-05-06 11:37 GMT+02:00 Evo Eftimov <evo.efti...@isecc.com>:

> What is called a Bolt in Storm is essentially a combination of
> [Transformation/Action and DStream RDD] in Spark. So, to achieve higher
> parallelism for a specific Transformation/Action on a specific DStream RDD,
> simply repartition it to the required number of partitions, which directly
> relates to the number of tasks (threads) that can process it in parallel.
>
>
>
> *From:* anshu shukla [mailto:anshushuk...@gmail.com]
> *Sent:* Wednesday, May 6, 2015 9:33 AM
> *To:* ayan guha
> *Cc:* user@spark.apache.org; d...@spark.apache.org
> *Subject:* Re: Creating topology in spark streaming
>
>
>
> But the main problem is how to increase the level of parallelism for any
> particular bolt logic.
>
>
>
> Suppose I want this type of topology:
>
>
>
> https://storm.apache.org/documentation/images/topology.png
>
>
>
> How can we manage it?
>
>
>
> On Wed, May 6, 2015 at 1:36 PM, ayan guha <guha.a...@gmail.com> wrote:
>
> Every transformation on a DStream creates another DStream. You may
> want to take a look at foreachRDD. Also, kindly share your code so people
> can help better.
>
> On 6 May 2015 17:54, "anshu shukla" <anshushuk...@gmail.com> wrote:
>
> Please help, guys. Even after going through all the examples given, I have
> not understood how to pass DStreams from one bolt/logic to another
> (without writing them to HDFS etc.), just like the emit function in Storm.
>
> Suppose I have a topology with 3 bolts (say):
>
>
>
> *BOLT1 (parse the tweets and emit tweets using the given
> hashtags) =====> BOLT2 (complex logic for sentiment analysis over
> tweets) =====> BOLT3 (submit tweets to the SQL database using Spark SQL)*
>
>
>
>
>
> Now, since sentiment analysis will take most of the time, we have to
> increase its level of parallelism to tune latency. How can we increase the
> level of parallelism when the logic of the topology is not clear?
>
>
>
> --
>
> Thanks & Regards,
> Anshu Shukla
>
> Indian Institute of Sciences
>
