@StefanRRichter thanks for the comments.
- For the inconsistency issue, 
[this](https://github.com/isunjin/flink/commit/b61b58d963ea11d34e2eb7ec6f4fe4bfed4dca4a)
 is the repro. The logic is simple: we throw an exception in the WordCount 
example and use restartRegion as the failover strategy. The job was expected to 
fail, but it succeeds with an incorrect result. The reason is that on restart 
the task calls requestNextSplit, which returns nothing because the splits were 
already drained. Since there is no input, the flatMap method is never executed 
and the exception is never thrown.

- The goal of the general approach is to keep the "deterministic behavior" 
assumption valid as much as possible, since determinism is crucial for 
failover. The code is not aimed specifically at making DataSourceTask 
deterministic; right now DataSourceTask is only used in the batch scenario. For 
the streaming scenario, it will work once we treat the splitIndex as state.

- For load balancing, I think the first priority is to make the data consistent; 
we can certainly add more logic later to make it more efficient.

- Thanks for letting me know about this. However, this is a bug right now and it 
is actually blocking me from moving forward. We can refactor this code later if 
we end up with a fundamentally different design.
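
To make the first two points concrete, here is a minimal standalone sketch of the failure mode. It is not Flink code: `DrainingAssigner`, `IndexedAssigner`, `attemptDraining`, `attemptIndexed` and this `flatMap` are hypothetical names, and the indexed variant only assumes that a request is keyed by a per-task splitIndex, as mentioned above. With the draining assigner the restarted attempt gets no split and finishes "successfully"; with the indexed assigner the replayed request returns the same split, so the failure is reproduced.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch; none of these classes are Flink APIs.
class SplitReplaySketch {

    /** Hands out each split exactly once; a restarted task sees an empty queue. */
    static class DrainingAssigner {
        private final Queue<String> remaining;

        DrainingAssigner(List<String> splits) {
            this.remaining = new ArrayDeque<>(splits);
        }

        /** Returns the next unassigned split, or null once everything is drained. */
        String requestNextSplit() {
            return remaining.poll();
        }
    }

    /** Answers by index, so replaying the same request returns the same split. */
    static class IndexedAssigner {
        private final List<String> splits;

        IndexedAssigner(List<String> splits) {
            this.splits = splits;
        }

        /** Returns the split at splitIndex, or null if the index is past the end. */
        String requestSplit(int splitIndex) {
            return splitIndex < splits.size() ? splits.get(splitIndex) : null;
        }
    }

    /** Stand-in for the user function in the repro: it always throws. */
    static void flatMap(String split) {
        throw new IllegalStateException("injected failure for split: " + split);
    }

    /** One task attempt; returns true if the attempt finished without an error. */
    static boolean attemptDraining(DrainingAssigner assigner) {
        try {
            String split;
            while ((split = assigner.requestNextSplit()) != null) {
                flatMap(split);
            }
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    /** One task attempt that restarts from splitIndex 0 (assumed not yet checkpointed). */
    static boolean attemptIndexed(IndexedAssigner assigner) {
        try {
            int splitIndex = 0;
            String split;
            while ((split = assigner.requestSplit(splitIndex)) != null) {
                flatMap(split);
                splitIndex++;
            }
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> splits = Arrays.asList("to be or not to be");

        DrainingAssigner draining = new DrainingAssigner(splits);
        System.out.println("draining, attempt 1 finished: " + attemptDraining(draining));
        System.out.println("draining, attempt 2 finished: " + attemptDraining(draining));

        IndexedAssigner indexed = new IndexedAssigner(splits);
        System.out.println("indexed, attempt 1 finished: " + attemptIndexed(indexed));
        System.out.println("indexed, attempt 2 finished: " + attemptIndexed(indexed));
    }
}
```

In this toy model the second draining attempt is the silent "success" from the repro, while both indexed attempts fail the same way. Load balancing is deliberately ignored here.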


 

[ Full content available at: https://github.com/apache/flink/pull/6684 ]
This message was relayed via gitbox.apache.org for [email protected]
