[jira] [Commented] (FLINK-10205) Batch Job: InputSplit Fault tolerant for DataSourceTask

ASF GitHub Bot (JIRA) Wed, 31 Oct 2018 03:00:58 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669861#comment-16669861
 ]


ASF GitHub Bot commented on FLINK-10205:
----------------------------------------

tillrohrmann commented on issue #6684:     [FLINK-10205] Batch Job: InputSplit 
Fault tolerant for DataSource…
URL: https://github.com/apache/flink/pull/6684#issuecomment-434628107
 
 
   Thanks for the explanation @isunjin. Now I understand that you don't think 
that it is strictly required to let a failed task process exactly the same 
`InputSplits` and that it is just a side effect of the current implementation.
   
   So in the end you've implemented it this way, because the 
`InputSplitAssigner` does not support returning `InputSplits`. Maybe that is 
something we should change. We could, for example, add a new interface which 
needs to be implemented by an `InputSplitAssigner` to support fine grained 
recovery. Otherwise, such a failure will result into a global failover.
   
   My concern is that by storing `InputSplits` in the `Executions` that we are 
mixing a bit of concerns. For example, assume that we have three tasks failing 
and we also lost a slot. Then we could only restart two of these tasks and need 
to distribute the slots of the third `Execution` among the newly started 
`Executions`. It would be much easier to simply return all slots to the 
`InputSplitAssigner` and let the newly started `Executions` pull from there.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Batch Job: InputSplit Fault tolerant for DataSourceTask
> -------------------------------------------------------
>
>                 Key: FLINK-10205
>                 URL: https://issues.apache.org/jira/browse/FLINK-10205
>             Project: Flink
>          Issue Type: Sub-task
>          Components: JobManager
>    Affects Versions: 1.6.1, 1.6.2, 1.7.0
>            Reporter: JIN SUN
>            Assignee: JIN SUN
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.7.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Today DataSource Task pull InputSplits from JobManager to achieve better 
> performance, however, when a DataSourceTask failed and rerun, it will not get 
> the same splits as its previous version. this will introduce inconsistent 
> result or even data corruption.
> Furthermore,  if there are two executions run at the same time (in batch 
> scenario), this two executions should process same splits.
> we need to fix the issue to make the inputs of a DataSourceTask 
> deterministic. The propose is save all splits into ExecutionVertex and 
> DataSourceTask will pull split from there.
>  document:
> [https://docs.google.com/document/d/1FdZdcA63tPUEewcCimTFy9Iz2jlVlMRANZkO4RngIuk/edit?usp=sharing]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (FLINK-10205) Batch Job: InputSplit Fault tolerant for DataSourceTask

Reply via email to