lincoln-lil opened a new pull request, #20393:
URL: https://github.com/apache/flink/pull/20393

   ## What is the purpose of the change
   this is two main parts of the parent issue FLINK-27849
   1. FLINK-28570: Introduces a StreamNonDeterministicPlanResolver to validate 
and try to solve (lookup join only) the non-deterministic updates problem which 
may cause wrong result or error
    which introduce a new optimizer option: 
"table.optimizer.non-deterministic-update.handling" with default value  
`IGNORE` to keep the compatibility. When user cares about the effects of 
non-deterministic updates, the option can be set to `TRY_RESOLVE`, then the 
inner `StreamNonDeterministicPlanResolver` will validate if there's any 
non-deterministic updates which may cause wrong result or error, and also try 
to eliminate the non determinism generated by lookup join node by adding 
materialization to a new physical lookup joins operator, will raise an error if 
there still exists other non-deterministic beside lookup join
   
   2. FLINK-28568:  Implements a new lookup join operator (sync mode only) with 
state to eliminate the non determinism
    which is a followup issue of FLINK-28570 to finish the materialization 
support in runtime, note that only sync lookup function is supported for now 
due to some runtime limitation (async lookup join based on the 
AsyncWaitOperator which has operatror state, and the materialization 
requirement here needs a keyed state, so unless we invent a new 
KeyedAsyncWaitOperator here, the work can not be done for now)
   
   ## Brief change log
   * add 'table.optimizer.non-deterministic-update.handling' to 
OptimizerConfigOptions
   * implement a StreamNonDeterministicPlanResolver to visit all stream 
physical node and try to rewrite lookup join node with materialization enabled
   * add new tests and update existing tests to best effortly covering the new 
logic 
   
   ## Verifying this change
   NonDeterministicDagTest KeyedLookupJoinHarnessTest and existing tests
   
   
   ## Does this pull request potentially affect one of the following parts:
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API, i.e., is any changed class annotated with 
@Public(Evolving): (no)
     - The serializers: (no )
     - The runtime per-record code paths (performance sensitive): (no)
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
     - The S3 file system connector: (no)
   
   ## Documentation
     - Does this pull request introduce a new feature? (yes)
     - If yes, how is the feature documented? (docs)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to