In general, if you have multiple steps in a workflow, for every batch:

1. Stream data from S3
2. Write it to HBase
3. Execute a Hive step using the data in S3

In this case, all three steps are part of the workflow. That's why I 
mentioned workflow orchestration.
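
For illustration, a minimal sketch of such a per-batch workflow in Scala. The 
bucket path, writeToHBase and runHiveStep are hypothetical placeholders here; a 
real job would use the HBase client API per partition and run the Hive step via 
a HiveContext or an external EMR step:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object S3ToHBaseToHive {
      // Placeholders: a real job would open an HBase connection per partition
      // and submit the Hive step through HiveContext or an EMR step
      def writeToHBase(record: String): Unit = ???
      def runHiveStep(path: String): Unit = ???

      def main(args: Array[String]): Unit = {
        val inputPath = "s3a://my-bucket/incoming/"  // hypothetical bucket
        val conf = new SparkConf().setAppName("S3ToHBaseToHive")
        val ssc = new StreamingContext(conf, Seconds(60))

        // Step 1: stream data from S3; textFileStream picks up new files
        // that appear under the path
        val lines = ssc.textFileStream(inputPath)

        lines.foreachRDD { rdd =>
          if (!rdd.isEmpty()) {
            // Step 2: write the batch to HBase, one connection per partition
            rdd.foreachPartition(_.foreach(writeToHBase))
            // Step 3: run the Hive step over the same data in S3
            runHiveStep(inputPath)
          }
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }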

The other question (2) is about how to manage the clusters without any downtime 
or data loss (especially when you want to bring down a cluster and create a 
new one for running Spark Streaming).
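
One common way to handle that handover (a sketch under assumptions, not a 
definitive recipe; the checkpoint path and batch interval below are made up): 
checkpoint to a durable store such as S3, stop the old context gracefully so 
in-flight batches finish, and let the new cluster recover the pipeline via 
StreamingContext.getOrCreate:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object RestartableStream {
      val checkpointDir = "s3a://my-bucket/checkpoints/"  // hypothetical durable path

      def createContext(): StreamingContext = {
        val conf = new SparkConf()
          .setAppName("RestartableStream")
          // finish in-flight batches before shutting down
          .set("spark.streaming.stopGracefullyOnShutdown", "true")
        val ssc = new StreamingContext(conf, Seconds(60))
        ssc.checkpoint(checkpointDir)
        // ... define the DStream pipeline here ...
        ssc
      }

      def main(args: Array[String]): Unit = {
        // On the replacement cluster this recovers from the checkpoint if one
        // exists, otherwise it builds a fresh context
        val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
        ssc.start()
        ssc.awaitTermination()
      }
    }

For strict no-loss guarantees with receiver-based sources you would also need 
the write-ahead log (spark.streaming.receiver.writeAheadLog.enable) or a 
replayable source such as Kafka/Kinesis.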


Sent from my iPhone

> On Jun 22, 2016, at 10:17 AM, Mich Talebzadeh <mich.talebza...@gmail.com> 
> wrote:
> 
> Hi Pandees,
> 
> can you kindly explain what you are trying to achieve by incorporating Spark 
> Streaming with workflow orchestration? Is this some form of back-to-back 
> seamless integration?
> 
> I have not used it myself but would be interested in knowing more about your 
> use case.
> 
> Cheers,
> 
> 
> 
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
>  
> 
>> On 22 June 2016 at 15:54, pandees waran <pande...@gmail.com> wrote:
>> Hi Mich, please let me know if you have any thoughts on the below. 
>> 
>> ---------- Forwarded message ----------
>> From: pandees waran <pande...@gmail.com>
>> Date: Wed, Jun 22, 2016 at 7:53 AM
>> Subject: spark streaming questions
>> To: user@spark.apache.org
>> 
>> 
>> Hello all,
>> 
>> I have a few questions regarding Spark Streaming:
>> 
>> * I am wondering whether anyone uses Spark Streaming with workflow 
>> orchestrators such as Data Pipeline/SWF/any other framework. Are there any 
>> advantages/drawbacks to using a workflow orchestrator for Spark Streaming?
>> 
>> * How do you guys manage the cluster (bringing down/creating a new cluster) 
>> without any data loss in streaming?
>> 
>> I would like to hear your thoughts on this.
>> 
>> 
>> 
>> 
>> -- 
>> Thanks,
>> Pandeeswaran
> 
