lincoln-lil opened a new pull request, #20393:
URL: https://github.com/apache/flink/pull/20393
## What is the purpose of the change
this is two main parts of the parent issue FLINK-27849
1. FLINK-28570: Introduces a StreamNonDeterministicPlanResolver to validate
and try to solve (lookup join only) the non-deterministic updates problem which
may cause wrong result or error
which introduce a new optimizer option:
"table.optimizer.non-deterministic-update.handling" with default value
`IGNORE` to keep the compatibility. When user cares about the effects of
non-deterministic updates, the option can be set to `TRY_RESOLVE`, then the
inner `StreamNonDeterministicPlanResolver` will validate if there's any
non-deterministic updates which may cause wrong result or error, and also try
to eliminate the non determinism generated by lookup join node by adding
materialization to a new physical lookup joins operator, will raise an error if
there still exists other non-deterministic beside lookup join
2. FLINK-28568: Implements a new lookup join operator (sync mode only) with
state to eliminate the non determinism
which is a followup issue of FLINK-28570 to finish the materialization
support in runtime, note that only sync lookup function is supported for now
due to some runtime limitation (async lookup join based on the
AsyncWaitOperator which has operatror state, and the materialization
requirement here needs a keyed state, so unless we invent a new
KeyedAsyncWaitOperator here, the work can not be done for now)
## Brief change log
* add 'table.optimizer.non-deterministic-update.handling' to
OptimizerConfigOptions
* implement a StreamNonDeterministicPlanResolver to visit all stream
physical node and try to rewrite lookup join node with materialization enabled
* add new tests and update existing tests to best effortly covering the new
logic
## Verifying this change
NonDeterministicDagTest KeyedLookupJoinHarnessTest and existing tests
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (no)
- The public API, i.e., is any changed class annotated with
@Public(Evolving): (no)
- The serializers: (no )
- The runtime per-record code paths (performance sensitive): (no)
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
- The S3 file system connector: (no)
## Documentation
- Does this pull request introduce a new feature? (yes)
- If yes, how is the feature documented? (docs)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]