Re: Spark or Storm

Ashish Soni Wed, 17 Jun 2015 05:52:16 -0700

As per my best understanding, Spark Streaming offers exactly-once
processing. Is this achieved only through updateStateByKey, or is there
another way to do the same?


Ashish
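
For reference, the per-key running state asked about here can be sketched in
plain Python. This is only an illustration under stated assumptions, not Spark
code: update_minutes has the (new values, previous state) shape that
updateStateByKey expects for its update function, while apply_batch, bill, and
the phone numbers are hypothetical names made up for the example. The free
allowance and the per-minute rate come from the billing rule quoted further
down the thread. The sketch models only the state bookkeeping, not
checkpointing or exactly-once delivery.

```python
from typing import Dict, List, Optional, Tuple

FREE_MINUTES = 500       # from the thread: minutes 0-500 are free
RATE_PER_MINUTE = 1.0    # from the thread: $1 per minute above that


def update_minutes(new_values: List[int], running_total: Optional[int]) -> int:
    """Update function: fold this batch's minutes into the previous total.

    running_total is None the first time a key is seen.
    """
    return (running_total or 0) + sum(new_values)


def bill(total_minutes: int) -> float:
    """Apply the simple rule: only minutes above the free allowance are charged."""
    return max(0, total_minutes - FREE_MINUTES) * RATE_PER_MINUTE


def apply_batch(state: Dict[str, int],
                batch: List[Tuple[str, int]]) -> Dict[str, int]:
    """Hypothetical stand-in for one micro-batch of a keyed state update.

    Events are grouped by phone number, then the update function is called
    once per key. Keys with existing state but no new events are called with
    an empty list, so their totals carry over unchanged.
    """
    grouped: Dict[str, List[int]] = {}
    for phone, minutes in batch:
        grouped.setdefault(phone, []).append(minutes)
    keys = set(state) | set(grouped)
    return {k: update_minutes(grouped.get(k, []), state.get(k)) for k in keys}


# Two micro-batches; events for the same phone arrive in both.
state = apply_batch({}, [("555-0100", 300), ("555-0100", 150), ("555-0101", 20)])
state = apply_batch(state, [("555-0100", 100)])
print(state["555-0100"], bill(state["555-0100"]))  # 550 50.0
```

Because the update function only ever sees the values for one key, it does not
matter which node or executor originally received an event, as long as the
framework groups events by key before calling it.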

On Wed, Jun 17, 2015 at 8:48 AM, Enno Shioji <eshi...@gmail.com> wrote:

> In that case I assume you need exactly once semantics. There's no
> out-of-the-box way to do that in Spark. There is updateStateByKey, but it's
> not practical with your use case as the state is too large (it'll try to
> dump the entire intermediate state on every checkpoint, which would be
> prohibitively expensive).
>
> So either you have to implement something yourself, or you can use Storm
> Trident (or the transactional low-level API).
>
> On Wed, Jun 17, 2015 at 1:26 PM, Ashish Soni <asoni.le...@gmail.com>
> wrote:
>
>> My use case is below.
>>
>> We are going to receive a lot of events as a stream (basically a Kafka
>> stream), and then we need to process them and compute on them.
>>
>> Consider that you have a phone contract with ATT, and every call / SMS /
>> data usage is an event. Your bill needs to be calculated in real time, so
>> that when you log in to your account you can see all those variables: how
>> much you have used, how much is left, and what your bill is to date. There
>> are also different rules which need to be considered when you calculate the
>> total bill. One simple rule would be that 0-500 min are free but anything
>> above that is $1 a min.
>>
>> How do I maintain a shared state (total amount, total min, total data,
>> etc.) so that I know how much I have accumulated at any given point, given
>> that events for the same phone can go to any node / executor?
>>
>> Can someone please tell me how I can achieve this in Spark? In Storm I
>> can have a bolt which can do this.
>>
>> Thanks,
>>
>>
>>
>> On Wed, Jun 17, 2015 at 4:52 AM, Enno Shioji <eshi...@gmail.com> wrote:
>>
>>> I guess both. In terms of syntax, I was comparing it with Trident.
>>>
>>> If you are joining, Spark Streaming actually does offer windowed join
>>> out of the box. We couldn't use this though as our event stream can grow
>>> "out-of-sync", so we had to implement something on top of Storm. If your
>>> event streams don't become out of sync, you may find the built-in join in
>>> Spark Streaming useful. Storm also has a join keyword but its semantics are
>>> different.
>>>
>>>
>>> > Also, what do you mean by "No Back Pressure" ?
>>>
>>> So when a topology is overloaded, Storm is designed so that it will stop
>>> reading from the source. Spark on the other hand, will keep reading from
>>> the source and spilling it internally. This may be fine, in fairness, but it
>>> does mean you have to worry about the persistent store usage in the
>>> processing cluster, whereas with Storm you don't have to worry because the
>>> messages just remain in the data store.
>>>
>>> Spark came up with the idea of rate limiting, but I don't feel this is
>>> as nice as back pressure, because it's very difficult to tune it so that
>>> you don't cap the cluster's processing power yet still prevent the
>>> persistent storage from getting used up.
>>>
>>>
>>> On Wed, Jun 17, 2015 at 9:33 AM, Spark Enthusiast <
>>> sparkenthusi...@yahoo.in> wrote:
>>>
>>>> When you say Storm, did you mean Storm with Trident or Storm?
>>>>
>>>> My use case does not have simple transformation. There are complex
>>>> events that need to be generated by joining the incoming event stream.
>>>>
>>>> Also, what do you mean by "No Back Pressure" ?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>   On Wednesday, 17 June 2015 11:57 AM, Enno Shioji <eshi...@gmail.com>
>>>> wrote:
>>>>
>>>>
>>>> We've evaluated Spark Streaming vs. Storm and ended up sticking with
>>>> Storm.
>>>>
>>>> Some of the important drawbacks are:
>>>> - Spark has no back pressure (the receiver rate limit can alleviate this
>>>>   to a certain point, but it's far from ideal).
>>>> - There are also no exactly-once semantics (updateStateByKey can achieve
>>>>   these semantics, but it is not practical if you have any significant
>>>>   amount of state, because it does so by dumping the entire state on
>>>>   every checkpoint).
>>>>
>>>> There are also some minor drawbacks that I'm sure will be fixed
>>>> quickly, like no task timeout, not being able to read from Kafka using
>>>> multiple nodes, data loss hazard with Kafka.
>>>>
>>>> It's also not possible to attain very low latency in Spark, if that's
>>>> what you need.
>>>>
>>>> The pro for Spark is the concise and IMO more intuitive syntax,
>>>> especially if you compare it with Storm's Java API.
>>>>
>>>> I admit I might be a bit biased towards Storm tho as I'm more familiar
>>>> with it.
>>>>
>>>> Also, you can do some processing with Kinesis. If all you need to do is
>>>> straightforward transformation and you are reading from Kinesis to begin
>>>> with, it might be an easier option to just do the transformation in
>>>> Kinesis.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Jun 17, 2015 at 7:15 AM, Sabarish Sasidharan <
>>>> sabarish.sasidha...@manthan.com> wrote:
>>>>
>>>> Whatever you write in bolts would be the logic you want to apply on
>>>> your events. In Spark, that logic would be coded in map() or similar such
>>>> transformations and/or actions. Spark doesn't enforce a structure for
>>>> capturing your processing logic like Storm does.
>>>> Regards
>>>> Sab
>>>>
>>>> Probably overloading the question a bit.
>>>>
>>>> In Storm, bolts have the functionality of getting triggered on events.
>>>> Is that kind of functionality possible with Spark Streaming? During each
>>>> phase of the data processing, the transformed data is stored to the
>>>> database, and this transformed data should then be sent to a new pipeline
>>>> for further processing.
>>>>
>>>> How can this be achieved using Spark?
>>>>
>>>>
>>>>
>>>> On Wed, Jun 17, 2015 at 10:10 AM, Spark Enthusiast <
>>>> sparkenthusi...@yahoo.in> wrote:
>>>>
>>>> I have a use case where a stream of incoming events has to be
>>>> aggregated and joined to create complex events. The aggregation will have
>>>> to happen at an interval of 1 minute (or less).
>>>>
>>>> The pipeline is:
>>>>
>>>> Upstream services --(send events)--> Kafka --> Event Stream Processor
>>>>   --(enrich event)--> Complex Event Processor --> Elasticsearch.
>>>>
>>>> From what I understand, Storm will make a very good ESP and Spark
>>>> Streaming will make a good CEP.
>>>>
>>>> But, we are also evaluating Storm with Trident.
>>>>
>>>> How does Spark Streaming compare with Storm with Trident?
>>>>
>>>> Sridhar Chellappa
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>   On Wednesday, 17 June 2015 10:02 AM, ayan guha <guha.a...@gmail.com>
>>>> wrote:
>>>>
>>>>
>>>> I have a similar scenario where we need to bring data from Kinesis to
>>>> HBase. Data velocity is 20k per 10 mins. A little manipulation of the data
>>>> will be required, but that's regardless of the tool, so we will be writing
>>>> that piece as a Java POJO.
>>>> The whole environment is on AWS. HBase is on a long-running EMR cluster
>>>> and Kinesis on a separate cluster.
>>>> TIA.
>>>> Best
>>>> Ayan
>>>> On 17 Jun 2015 12:13, "Will Briggs" <wrbri...@gmail.com> wrote:
>>>>
>>>> The programming models for the two frameworks are conceptually rather
>>>> different; I haven't worked with Storm for quite some time, but based on my
>>>> old experience with it, I would equate Spark Streaming more with Storm's
>>>> Trident API, rather than with the raw Bolt API. Even then, there are
>>>> significant differences, but it's a bit closer.
>>>>
>>>> If you can share your use case, we might be able to provide better
>>>> guidance.
>>>>
>>>> Regards,
>>>> Will
>>>>
>>>> On June 16, 2015, at 9:46 PM, asoni.le...@gmail.com wrote:
>>>>
>>>> Hi All,
>>>>
>>>> I am evaluating Spark vs. Storm (Spark Streaming) and I am not able
>>>> to see what the equivalent of a Storm bolt is in Spark.
>>>>
>>>> Any help on this will be appreciated.
>>>>
>>>> Thanks ,
>>>> Ashish
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
