[ 
https://issues.apache.org/jira/browse/SPARK-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984927#comment-13984927
 ] 

Tathagata Das commented on SPARK-1645:
--------------------------------------

Ah, I think I get it now. So instead of the default push-based model as it is 
now (where a sink is running with the receiver), you simply want to make it 
pull-based. 

So if the current situation is this 

!http://i.imgur.com/m8oiOwl.png?1!  

you propose this  

!http://i.imgur.com/N6Ee1cb.png?1!

Right?
Assuming it is right, that does make it very convenient for Spark Streaming's 
receivers. However, what does it mean for reliable receiving? When the receiver 
pulls the data from the source, will it acknowledge the source only when 
Spark acknowledges that it has reliably saved the data?
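
To make the question concrete, here is a rough sketch of the ack ordering I am 
asking about, in Python purely for illustration. PollableSink, receive_once and 
the store callback are hypothetical names, not Flume or Spark APIs. The only 
point is that the receiver acks a batch to the sink after the store step 
reports success, and otherwise hands the batch back for redelivery.

```python
from collections import deque


class PollableSink:
    """Hypothetical stand-in for a Flume sink that buffers events until polled."""

    def __init__(self, events):
        self._queue = deque(events)
        self._in_flight = {}  # batch_id -> events handed out but not yet acked
        self._next_id = 0

    def poll(self, max_events):
        """Hand out up to max_events as one batch; they stay in flight until acked."""
        count = min(max_events, len(self._queue))
        batch = [self._queue.popleft() for _ in range(count)]
        if not batch:
            return None, []
        batch_id = self._next_id
        self._next_id += 1
        self._in_flight[batch_id] = batch
        return batch_id, batch

    def ack(self, batch_id):
        """The receiver confirmed a reliable save, so the batch can be dropped."""
        del self._in_flight[batch_id]

    def nack(self, batch_id):
        """The save failed, so put the batch back at the front for redelivery."""
        self._queue.extendleft(reversed(self._in_flight.pop(batch_id)))

    def pending(self):
        """Events the sink still holds, whether queued or in flight."""
        return len(self._queue) + sum(len(b) for b in self._in_flight.values())


def receive_once(sink, store, max_events=2):
    """Pull one batch and ack it only if store(batch) reports success."""
    batch_id, batch = sink.poll(max_events)
    if batch_id is None:
        return False
    if store(batch):  # assumption: store returns True once the data is safe
        sink.ack(batch_id)
        return True
    sink.nack(batch_id)
    return False
```

With this ordering, a failed store leaves every event in the sink, so nothing 
is lost if the receiver dies between pulling and saving. The cost is that the 
sink has to hold un-acked batches until it hears back.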


> Improve Spark Streaming compatibility with Flume
> ------------------------------------------------
>
>                 Key: SPARK-1645
>                 URL: https://issues.apache.org/jira/browse/SPARK-1645
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>            Reporter: Hari Shreedharan
>
> Currently the following issues affect Spark Streaming and Flume compatibility:
> * If a Spark worker goes down, it needs to be restarted on the same node, 
> else Flume cannot send data to it. We can fix this by adding a Flume receiver 
> that polls Flume, and a Flume sink that supports this.
> * Receiver sends acks to Flume before the driver knows about the data. The 
> new receiver should also handle this case.
> * Data loss when driver goes down - This is true for any streaming ingest, 
> not just Flume. I will file a separate jira for this and we should work on it 
> there. This is a longer term project and requires considerable development 
> work.
> I intend to start working on these soon. Any input is appreciated. (It'd be 
> great if someone can add me as a contributor on jira, so I can assign the 
> jira to myself).



--
This message was sent by Atlassian JIRA
(v6.2#6252)
