[ https://issues.apache.org/jira/browse/FLINK-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16659655#comment-16659655 ]
ASF GitHub Bot commented on FLINK-10205:
----------------------------------------
isunjin commented on issue #6684: [FLINK-10205] Batch Job: InputSplit Fault tolerant for DataSource… URL: https://github.com/apache/flink/pull/6684#issuecomment-431950619

@tillrohrmann

> The thing I'm questioning is whether the InputSplits of the failed task need to be processed by the same (restarted) task or can be given to any running task.

Agree. I think a failed task **doesn't** necessarily need to be processed by the same task (ExecutionVertex).

> So far I'm not convinced that something would break if we simply return the InputSplits to the InputSplitAssigner

Agree. I think "simply return the InputSplits to the InputSplitAssigner" would work; the point is how to make it work. Restarting the entire graph calls ```ExecutionJobVertex.resetForNewExecution```, which creates a new ```InputSplitAssigner``` and thereby "returns" all ```InputSplits``` to the ```InputSplitAssigner```. My point is that for fine-grained failover we might not want to return all ```InputSplits```, only the failed ones. However, not every subclass of ```InputSplitAssigner``` currently has the logic to simply take ```InputSplits``` back, for example ```LocatableInputSplitAssigner``` or any other customized ```InputSplitAssigner```.

Returning the ```InputSplits``` to the ```InputSplitAssigner``` also implies a transaction between the task and the JobManager (maybe more than one): we need to make sure the ```InputSplits``` are returned to the ```InputSplitAssigner``` exactly once. What happens if we have speculative execution, where two tasks consume the same set of ```InputSplits``` but do not fail at the same time? Does every ```InputSplitAssigner``` need to keep a list to deduplicate? What happens if the TM dies or has a network issue and an ```InputSplit``` cannot be returned?
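The "return splits with deduplication" idea above could be sketched roughly as follows. This is a hypothetical, simplified assigner (splits reduced to integer ids, class and method names are illustrative, not Flink's actual ```InputSplitAssigner``` API): a returned split is re-queued for retry, and a guard set drops duplicate returns such as two speculative attempts returning the same split.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not Flink's real API: an assigner that lets failed
// tasks return their splits, deduplicating so a split returned twice
// (e.g. by two speculative executions) is only re-assigned once.
class ReturningSplitAssigner {
    private final Deque<Integer> pending = new ArrayDeque<>(); // unassigned splits, by id
    private final Set<Integer> returned = new HashSet<>();     // dedup guard for returns

    ReturningSplitAssigner(int numSplits) {
        for (int i = 0; i < numSplits; i++) {
            pending.add(i);
        }
    }

    /** Hand out the next split, or null when none are left. */
    synchronized Integer getNextInputSplit() {
        return pending.poll();
    }

    /** Return a failed task's split; duplicate returns are ignored. */
    synchronized boolean returnInputSplit(int splitId) {
        if (!returned.add(splitId)) {
            return false; // already returned once, drop the duplicate
        }
        pending.addFirst(splitId); // re-assign the failed split first
        return true;
    }
}
```

Even in this toy form, the open questions from the comment show up: the dedup set grows with every return, and it cannot tell "duplicate return from a speculative attempt" apart from "the same split failing a second time".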
Saving the ```InputSplits``` in the ExecutionVertex is one way to "return" them to the ```InputSplitAssigner```; the "side effect" of this implementation is that it also implies the ```InputSplits``` will be handled by the same task (ExecutionVertex). But this seems a simple and safe way to implement "simply return the InputSplits to the InputSplitAssigner" transactionally.

@tillrohrmann, the above is my understanding; let me know if we are on the same page. I would be happy to redo this if you have any other suggestion.
----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Batch Job: InputSplit Fault tolerant for DataSourceTask
> -------------------------------------------------------
>
>                 Key: FLINK-10205
>                 URL: https://issues.apache.org/jira/browse/FLINK-10205
>             Project: Flink
>          Issue Type: Sub-task
>          Components: JobManager
>    Affects Versions: 1.6.1, 1.6.2, 1.7.0
>            Reporter: JIN SUN
>            Assignee: JIN SUN
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.7.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Today the DataSourceTask pulls InputSplits from the JobManager to achieve better performance. However, when a DataSourceTask fails and reruns, it will not get the same splits as its previous attempt; this can produce inconsistent results or even data corruption.
> Furthermore, if two executions run at the same time (in a batch scenario), these two executions should process the same splits.
> We need to fix this to make the inputs of a DataSourceTask deterministic. The proposal is to save all splits into the ExecutionVertex, and the DataSourceTask will pull splits from there.
> document: > [https://docs.google.com/document/d/1FdZdcA63tPUEewcCimTFy9Iz2jlVlMRANZkO4RngIuk/edit?usp=sharing] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
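The proposal above (save the assigned splits in the ExecutionVertex so a restarted attempt deterministically replays them) could be sketched like this. All names here are illustrative, not Flink's real classes; the assigner is abstracted as a plain supplier of split ids.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of the FLINK-10205 proposal: the vertex remembers every
// split it handed out, so a restarted execution attempt replays exactly the
// same splits instead of pulling fresh (possibly different) ones.
class VertexSplitCache {
    private final List<Integer> assigned = new ArrayList<>(); // splits this vertex has consumed
    private int replayCursor = 0; // read position for the current attempt

    /** Called on failover: the next attempt re-reads from the start. */
    void resetForNewAttempt() {
        replayCursor = 0;
    }

    /** Replay cached splits first; fall back to the assigner for new ones. */
    Integer nextSplit(Supplier<Integer> assigner) {
        if (replayCursor < assigned.size()) {
            return assigned.get(replayCursor++); // deterministic replay
        }
        Integer fresh = assigner.get(); // null means no splits left
        if (fresh != null) {
            assigned.add(fresh);
            replayCursor++;
        }
        return fresh;
    }
}
```

This also illustrates the "side effect" discussed in the comment: because the cache lives in the vertex, the replayed splits are necessarily processed by the same (restarted) task, and two concurrent speculative attempts reading from the same cache would see the same split sequence.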