[ 
https://issues.apache.org/jira/browse/SPARK-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102504#comment-14102504
 ] 

Thomas Graves commented on SPARK-3129:
--------------------------------------

A couple of random thoughts on this for YARN.  YARN added this ability in 2.4.0, 
and you have to tell it you want it in the application submission context.  So 
you will have to properly handle other versions of YARN where it's not supported.
I believe YARN will tell you which nodes you already have containers running on, 
but you'll have to figure out details about ports, etc. I haven't looked at all 
the specifics.
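
One way to cope with YARN versions that lack the feature is to look the setter
up by reflection and fall back when it is missing. The sketch below is only an
illustration under assumptions: it assumes the ability in question is the flag
set through setKeepContainersAcrossApplicationAttempts on the application
submission context (the comment above does not name it), and NewContext and
OldContext are hypothetical stand-ins so the example runs without Hadoop on the
classpath.

```java
import java.lang.reflect.Method;

final class KeepContainersSketch {

    /** Hypothetical stand-in for a submission context that has the setter. */
    public static class NewContext {
        public boolean keep = false;

        public void setKeepContainersAcrossApplicationAttempts(boolean keep) {
            this.keep = keep;
        }
    }

    /** Hypothetical stand-in for an older submission context without the setter. */
    public static class OldContext { }

    /**
     * Tries to turn the flag on via reflection.
     * Returns true if the setter exists and was invoked, false if this
     * version of the context does not support it.
     * The method name is an assumption about which feature is meant.
     */
    public static boolean tryEnableKeepContainers(Object submissionContext) {
        try {
            Method setter = submissionContext.getClass()
                .getMethod("setKeepContainersAcrossApplicationAttempts", boolean.class);
            setter.invoke(submissionContext, true);
            return true;
        } catch (NoSuchMethodException e) {
            // Older version: feature not available, caller must fall back.
            return false;
        } catch (ReflectiveOperationException e) {
            // The setter exists but could not be called; surface the failure.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(tryEnableKeepContainers(new NewContext()));
        System.out.println(tryEnableKeepContainers(new OldContext()));
    }
}
```

Returning a boolean rather than throwing lets the caller decide what to do on
an unsupported version, for example logging a warning and continuing without
the feature.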

You'll have to figure out how to do authentication properly.  This gets 
forgotten about many times. 

I think we should flesh out more of the high-level design concerns between 
YARN/standalone/Mesos and, on YARN, the client/cluster modes. 

> Prevent data loss in Spark Streaming
> ------------------------------------
>
>                 Key: SPARK-3129
>                 URL: https://issues.apache.org/jira/browse/SPARK-3129
>             Project: Spark
>          Issue Type: New Feature
>            Reporter: Hari Shreedharan
>            Assignee: Hari Shreedharan
>         Attachments: StreamingPreventDataLoss.pdf
>
>
> Spark Streaming can lose small amounts of data when the driver goes down - and the 
> sending system cannot re-send the data (or the data has already expired on 
> the sender side). The document attached has more details. 


