Re: Deploying spark-streaming application on production

2015-10-01 Thread Jeetendra Gangele
Ya. Also I think I need to enable checkpointing and, rather than building up the lineage DAG, store the RDD data into HDFS. On 23 September 2015 at 01:04, Adrian Tanase wrote: > btw I re-read the docs and I want to clarify that reliable receiver + WAL > gives you at
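The checkpointing approach mentioned above can be sketched roughly as follows. This is a minimal sketch, not Jeetendra's actual application: the checkpoint path, app name, and batch interval are illustrative assumptions.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointedApp {
  // Hypothetical HDFS path; substitute your own checkpoint location.
  val checkpointDir = "hdfs:///user/spark/streaming-checkpoint"

  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("mqtt-to-hdfs")
    val ssc = new StreamingContext(conf, Seconds(10))
    // Persists streaming metadata (and, for stateful operations, RDD data)
    // to HDFS, truncating the lineage DAG instead of rebuilding it on recovery.
    ssc.checkpoint(checkpointDir)
    ssc
  }

  def main(args: Array[String]): Unit = {
    // Recover from an existing checkpoint if present, otherwise build a fresh context.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```

One caveat relevant to this thread: recovery from a checkpoint generally does not survive deploying a changed application jar, since the serialized DAG must match the code, which is why the rest of the thread discusses buffering on the MQTT side across restarts.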

Re: Deploying spark-streaming application on production

2015-09-22 Thread Adrian Tanase
btw I re-read the docs and I want to clarify that reliable receiver + WAL gives you at-least-once, not exactly-once semantics. Sent from my iPhone On 21 Sep 2015, at 21:50, Adrian Tanase wrote: I'm wondering, isn't this the canonical use case for
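For reference, the receiver write-ahead log discussed here (available since Spark 1.2) is turned on via a configuration flag; a sketch with an assumed app name is below. When the WAL is enabled, the Spark docs also suggest un-replicated serialized storage for received blocks, since the log already provides durability.

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("mqtt-to-hdfs")
  // Log all data received by receivers to the checkpoint directory before
  // processing; combined with a reliable receiver this gives at-least-once.
  .set("spark.streaming.receiver.writeAheadLog.enable", "true")
```

Note that this requires a checkpoint directory to be set on the StreamingContext, as the WAL files are written under it.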

Re: Deploying spark-streaming application on production

2015-09-22 Thread Petr Novak
Or, if there is an option on the MQTT server to block event ingestion towards Spark while still receiving and buffering events in MQTT and waiting for an ACK, then it would be possible just to gracefully shut down the Spark job to finish what is in its buffers and restart. Petr On Tue, Sep 22, 2015 at 10:53

Re: Deploying spark-streaming application on production

2015-09-22 Thread Petr Novak
If MQTT can be configured with a long enough ACK timeout and can buffer enough events while waiting for the Spark job restart, then I think one could do even without the WAL, assuming that the Spark job shuts down gracefully. Possibly saving its own custom metadata somewhere, e.g. ZooKeeper, if required to
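The graceful shutdown Petr describes can be sketched as below; the app name and batch interval are assumptions, and the stream setup is elided.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("mqtt-to-hdfs")
  // Since Spark 1.4: on JVM shutdown (e.g. when the job is killed for
  // redeployment), stop the streaming context gracefully.
  .set("spark.streaming.stopGracefullyOnShutdown", "true")

val ssc = new StreamingContext(conf, Seconds(10))
// ... set up the MQTT input stream and HDFS output here ...

// Alternatively, stop explicitly from a control path:
// stopGracefully = true stops the receivers first, then waits for
// already-received data to finish processing before shutting down.
ssc.stop(stopSparkContext = true, stopGracefully = true)
```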

Re: Deploying spark-streaming application on production

2015-09-22 Thread Petr Novak
Ah, the problem probably is async ingestion into Spark receiver buffers, hence the WAL is required, I would say. Petr On Tue, Sep 22, 2015 at 10:53 AM, Petr Novak wrote: > If MQTT can be configured with long enough timeout for ACK and can buffer > enough events while waiting for

Deploying spark-streaming application on production

2015-09-21 Thread Jeetendra Gangele
Hi All, I have a Spark Streaming application with a 10 ms batch interval which is reading the MQTT channel and dumping the data from MQTT to HDFS. So suppose I have to deploy a new application jar (with changes in the Spark Streaming application), what is the best way to deploy it? Currently I am doing as below

Re: Deploying spark-streaming application on production

2015-09-21 Thread Petr Novak
In short, there is no direct support for it in Spark AFAIK. You will either manage it in MQTT or have to add another layer of indirection, either in-memory based (observable streams, in-mem db) or disk based (Kafka, HDFS files, db), which will keep your unprocessed events. Now realizing, there is

Re: Deploying spark-streaming application on production

2015-09-21 Thread Adrian Tanase
I'm wondering, isn't this the canonical use case for WAL + reliable receiver? As far as I know you can tune the MQTT server to wait for acks on messages (QoS level 2?). With some support from the client library you could achieve exactly-once semantics on the read side, if you ack the message only after
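For context, the receiver-based MQTT stream from the spark-streaming-mqtt module of that era looked like the sketch below; the broker URL and topic are assumptions. Whether per-message acks at QoS 2 are actually honored depends on the receiver implementation, so the reliable-receiver behavior Adrian describes may require extending the stock receiver.

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.mqtt.MQTTUtils

// Hypothetical broker URL and topic.
val brokerUrl = "tcp://mqtt.example.com:1883"
val topic = "events"

// Receiver-based DStream of message payloads as strings. With the WAL
// enabled, un-replicated serialized storage avoids storing the data twice.
val lines = MQTTUtils.createStream(ssc, brokerUrl, topic,
  StorageLevel.MEMORY_AND_DISK_SER)
```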

Re: Deploying spark-streaming application on production

2015-09-21 Thread Petr Novak
I think you would have to persist events somehow if you don't want to miss them. I don't see any other option there. Either in MQTT, if it is supported there, or by routing them through Kafka. There is a WriteAheadLog in Spark but you would have to decouple MQTT stream reading and processing into 2
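The Kafka indirection Petr suggests, with a separate bridge process routing MQTT into Kafka and Spark consuming from Kafka, could be sketched with the direct stream API (Spark 1.3+); the broker list and topic name below are assumptions.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

// Hypothetical brokers and topic; an external bridge writes MQTT events here.
val kafkaParams = Map("metadata.broker.list" -> "kafka1:9092,kafka2:9092")

// Direct stream: offsets are tracked by Spark itself rather than a receiver,
// so events survive a Spark restart as long as Kafka retains them.
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("mqtt-events"))
```

This sidesteps the receiver buffering problem entirely: after a redeploy, the new job resumes from the stored offsets instead of relying on a WAL.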

Re: Deploying spark-streaming application on production

2015-09-21 Thread Petr Novak
I should read my posts at least once to avoid so many typos. Hopefully you are brave enough to read through. Petr On Mon, Sep 21, 2015 at 11:23 AM, Petr Novak wrote: > I think you would have to persist events somehow if you don't want to miss > them. I don't see any other