[ https://issues.apache.org/jira/browse/FLINK-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16659655#comment-16659655 ]
ASF GitHub Bot commented on FLINK-10205:
----------------------------------------
isunjin commented on issue #6684: [FLINK-10205] Batch Job: InputSplit Fault tolerant for DataSource… URL: https://github.com/apache/flink/pull/6684#issuecomment-431950619

@tillrohrmann

> The thing I'm questioning is whether the InputSplits of the failed task need to be processed by the same (restarted) task or can be given to any running task.

Agree. I think a failed task **doesn't** necessarily need to be processed by the same task (ExecutionVertex).

> So far I'm not convinced that something would break if we simply return the InputSplits to the InputSplitAssigner

Agree. I think "simply return the InputSplits to the InputSplitAssigner" would work; the point is how to make it work. Restarting the entire graph calls ```ExecutionJobVertex.resetForNewExecution```, which creates a new ```InputSplitAssigner``` and thereby "returns" all ```InputSplits``` to the ```InputSplitAssigner```. My point is that for fine-grained failover we might not want to return all ```InputSplits```, only the failed ones. However, not every subclass of ```InputSplitAssigner``` currently has the logic to simply take ```InputSplits``` back, for example ```LocatableInputSplitAssigner``` or any other customized ```InputSplitAssigner```.

Returning the ```InputSplits``` to the ```InputSplitAssigner``` also implies a transaction between the task and the JobManager (maybe more than one): we need to make sure the ```InputSplits``` are returned to the ```InputSplitAssigner``` exactly once. What happens if we have speculative execution, where two tasks consume the same set of ```InputSplits``` but do not fail at the same time? Does every ```InputSplitAssigner``` need to keep a list to deduplicate? What happens if the TM dies or has a network issue and an ```InputSplit``` cannot be returned?
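The "return splits with deduplication" idea above could be sketched roughly as follows. This is a hypothetical, simplified assigner (splits reduced to integer ids, class and method names are illustrative, not Flink's actual ```InputSplitAssigner``` API): a returned split is re-queued for retry, and a guard set drops duplicate returns such as two speculative attempts returning the same split.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not Flink's real API: an assigner that lets failed
// tasks return their splits, deduplicating so a split returned twice
// (e.g. by two speculative executions) is only re-assigned once.
class ReturningSplitAssigner {
    private final Deque<Integer> pending = new ArrayDeque<>(); // unassigned splits, by id
    private final Set<Integer> returned = new HashSet<>();     // dedup guard for returns

    ReturningSplitAssigner(int numSplits) {
        for (int i = 0; i < numSplits; i++) {
            pending.add(i);
        }
    }

    /** Hand out the next split, or null when none are left. */
    synchronized Integer getNextInputSplit() {
        return pending.poll();
    }

    /** Return a failed task's split; duplicate returns are ignored. */
    synchronized boolean returnInputSplit(int splitId) {
        if (!returned.add(splitId)) {
            return false; // already returned once, drop the duplicate
        }
        pending.addFirst(splitId); // re-assign the failed split first
        return true;
    }
}
```

Even in this toy form, the open questions from the comment show up: the dedup set grows with every return, and it cannot tell "duplicate return from a speculative attempt" apart from "the same split failing a second time".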
Saving the ```InputSplits``` in the ExecutionVertex is one way to "return" them to the ```InputSplitAssigner```; the "side effect" of this implementation is that it also implies the ```InputSplits``` will be handled by the same task (ExecutionVertex). But this seems a simple and safe way to implement "simply return the InputSplits to the InputSplitAssigner" transactionally.

@tillrohrmann, the above is my understanding; let me know if we are on the same page. I would be happy to redo this if you have any other suggestion.
----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Batch Job: InputSplit Fault tolerant for DataSourceTask
> -------------------------------------------------------
>
>                 Key: FLINK-10205
>                 URL: https://issues.apache.org/jira/browse/FLINK-10205
>             Project: Flink
>          Issue Type: Sub-task
>          Components: JobManager
>    Affects Versions: 1.6.1, 1.6.2, 1.7.0
>            Reporter: JIN SUN
>            Assignee: JIN SUN
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.7.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Today the DataSourceTask pulls InputSplits from the JobManager to achieve better performance. However, when a DataSourceTask fails and reruns, it will not get the same splits as its previous attempt; this can produce inconsistent results or even data corruption.
> Furthermore, if two executions run at the same time (in a batch scenario), these two executions should process the same splits.
> We need to fix this to make the inputs of a DataSourceTask deterministic. The proposal is to save all splits into the ExecutionVertex, and the DataSourceTask will pull splits from there.
> document: > [https://docs.google.com/document/d/1FdZdcA63tPUEewcCimTFy9Iz2jlVlMRANZkO4RngIuk/edit?usp=sharing] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
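The proposal above (save the assigned splits in the ExecutionVertex so a restarted attempt deterministically replays them) could be sketched like this. All names here are illustrative, not Flink's real classes; the assigner is abstracted as a plain supplier of split ids.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of the FLINK-10205 proposal: the vertex remembers every
// split it handed out, so a restarted execution attempt replays exactly the
// same splits instead of pulling fresh (possibly different) ones.
class VertexSplitCache {
    private final List<Integer> assigned = new ArrayList<>(); // splits this vertex has consumed
    private int replayCursor = 0; // read position for the current attempt

    /** Called on failover: the next attempt re-reads from the start. */
    void resetForNewAttempt() {
        replayCursor = 0;
    }

    /** Replay cached splits first; fall back to the assigner for new ones. */
    Integer nextSplit(Supplier<Integer> assigner) {
        if (replayCursor < assigned.size()) {
            return assigned.get(replayCursor++); // deterministic replay
        }
        Integer fresh = assigner.get(); // null means no splits left
        if (fresh != null) {
            assigned.add(fresh);
            replayCursor++;
        }
        return fresh;
    }
}
```

This also illustrates the "side effect" discussed in the comment: because the cache lives in the vertex, the replayed splits are necessarily processed by the same (restarted) task, and two concurrent speculative attempts reading from the same cache would see the same split sequence.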