[jira] [Commented] (SPARK-3129) Prevent data loss in Spark Streaming

Thomas Graves (JIRA) Wed, 20 Aug 2014 09:06:53 -0700

    [ https://issues.apache.org/jira/browse/SPARK-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104077#comment-14104077 ]


Thomas Graves commented on SPARK-3129:
--------------------------------------

Yes, that probably means using reflection.

I think having a file-based one makes sense, so we don't pull in other
dependencies if you don't need them. You can always make it more complex and
use ZooKeeper for those who want to install it. For YARN, you could save it in
the .sparkStaging directory along with the application jars; that way it knows
where to find it.
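A minimal sketch of the file-based approach, assuming the recovery state can be serialized as JSON and using a local directory as a stand-in for the HDFS .sparkStaging directory. The file name `driver-recovery-state.json` and the functions `save_state`/`load_state` are hypothetical, not Spark APIs. Writing to a temporary file and then renaming it means a crash mid-write never leaves a partial file at the final path; note that HDFS rename semantics differ from the local `os.replace` used here.

```python
import json
import os
import tempfile

# Hypothetical file name for whatever the driver would need after a restart.
STATE_FILE = "driver-recovery-state.json"


def save_state(staging_dir: str, state: dict) -> str:
    """Write recovery state into the staging directory and return its path."""
    os.makedirs(staging_dir, exist_ok=True)
    path = os.path.join(staging_dir, STATE_FILE)
    # Write to a temp file in the same directory, then rename it into place,
    # so readers never see a half-written file at the final path.
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)
    return path


def load_state(staging_dir: str):
    """Return the saved state, or None if this is a fresh (non-restart) start."""
    path = os.path.join(staging_dir, STATE_FILE)
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)
```

Because the directory is derived from the application (as the staging directory already is on YARN), a restarted driver needs no extra service such as ZooKeeper to locate its state.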

You still have the question of how authentication works. This would require
either the secret key also being stored somewhere in HDFS (and protected), or
some other way for executors to accept connections and recognize that this is a
restart.
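One way to read the first option, sketched under stated assumptions: the driver persists its secret in a file with owner-only permissions (mode 0600, analogous to a protected HDFS file), a restarted driver reloads the same secret, and an executor recognizes it by checking an HMAC over a random challenge. The file name `driver-secret.key`, the function names, and the challenge-response exchange are all hypothetical illustrations, not Spark's actual authentication protocol.

```python
import hashlib
import hmac
import os
import secrets
import tempfile

SECRET_FILE = "driver-secret.key"  # hypothetical file name


def load_or_create_secret(staging_dir: str) -> bytes:
    """Reuse the persisted secret after a restart, or create one on first start."""
    path = os.path.join(staging_dir, SECRET_FILE)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    os.makedirs(staging_dir, exist_ok=True)
    secret = secrets.token_bytes(32)
    # O_EXCL fails if another process created the file first; mode 0o600
    # (further restricted by the umask) keeps it readable only by the owner.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(secret)
    return secret


def respond(secret: bytes, challenge: bytes) -> bytes:
    """Driver side: prove knowledge of the secret without sending it."""
    return hmac.new(secret, challenge, hashlib.sha256).digest()


def verify(secret: bytes, challenge: bytes, response: bytes) -> bool:
    """Executor side: accept the (possibly restarted) driver if the HMAC matches."""
    return hmac.compare_digest(respond(secret, challenge), response)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as staging:
        original = load_or_create_secret(staging)   # first driver
        restarted = load_or_create_secret(staging)  # driver after a restart
        challenge = secrets.token_bytes(16)         # sent by an executor
        print(verify(original, challenge, respond(restarted, challenge)))  # True
```

The executor never needs to learn that a restart happened; it only checks that whoever connects holds the same secret, which is why the stored key must be protected.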

> Prevent data loss in Spark Streaming
> ------------------------------------
>
>                 Key: SPARK-3129
>                 URL: https://issues.apache.org/jira/browse/SPARK-3129
>             Project: Spark
>          Issue Type: New Feature
>            Reporter: Hari Shreedharan
>            Assignee: Hari Shreedharan
>         Attachments: StreamingPreventDataLoss.pdf
>
>
> Spark Streaming can lose small amounts of data when the driver goes down and
> the sending system cannot re-send the data (or the data has already expired
> on the sender side). The attached document has more details.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
