Hi, Does the split API in Bounded/UnboundedSource guarantee to return the same result if invoked in different parallel instances in a distributed environment?
For example, assume the original source can split into 3 sub-sources. Say the runner creates 3 parallel source operator instances (perhaps running in different servers) and uses each instance to handle 1 of the 3 sub-sources. In this case, if each operator instance invokes the split method in a distributed manner, will they get the same split result? My understanding is that the current API does not guarantee the 3 operator instances will receive the same split result. It is possible that 1 of the 3 instances receive 4 sub-sources and the other two receives 3. Or, even if they all get 3 sub-sources, there could be gaps and overlaps in the data streams. If so, shall we add an API to indicate that whether a source can split at runtime? One solution is to avoid this problem is to split the source at translation time and directly pass sub-sources to operator instances. But this is not ideal. The server runs the translation might not have access to the source (DB, KV, MQ, etc). Or the application may want to dynamically change the source parallel width at runtime. Hence, the runner/engine sometimes have to split the source during runtime in a distributed environment. Thanks, Shen