## What is the purpose of the change

IN clauses are translated to a JOIN with VALUES if the number of elements in the IN clause exceeds a certain threshold. This should not be done, because a streaming join is very heavy and materializes both inputs, which is fine for the VALUES input but not for the other input.
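
For illustration, an IN list can be evaluated row by row without keeping any join state, either as a cascade of OR'ed equality checks or as a single hash-set lookup. The sketch below is plain Java with hypothetical names (`InListSketch`, `matchesCascade`, `matchesSet`). It is not Flink's generated code and only shows the two evaluation shapes under the assumption of integer literals.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; the names below are not part of Flink or Calcite.
public class InListSketch {

    // Cascade of predicates: x = v1 OR x = v2 OR ... checked per row, no state kept.
    public static boolean matchesCascade(int value, List<Integer> literals) {
        for (int literal : literals) {
            if (value == literal) {
                return true;
            }
        }
        return false;
    }

    // Set-based IN: the literals are collected into a HashSet once,
    // so each row needs only a single lookup.
    public static boolean matchesSet(int value, Set<Integer> literals) {
        return literals.contains(value);
    }

    public static void main(String[] args) {
        List<Integer> literals = Arrays.asList(1, 2, 3, 5, 8);
        Set<Integer> literalSet = new HashSet<>(literals);
        for (int row : new int[] {2, 4, 8}) {
            System.out.println(row
                + " cascade=" + matchesCascade(row, literals)
                + " set=" + matchesSet(row, literalSet));
        }
    }
}
```

Both variants only need the current row and the constant literal list, so the non-VALUES input never has to be materialized.
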
This pull request forces the use of a cascade of predicates in all cases, for both streaming and batch. A rule has also been added to convert these predicates to IN and NOT_IN. Flink's code generation already uses a HashSet for IN/NOT_IN.

## Brief change log

- Add a configuration for SqlToRelConverter to force the use of a cascade of predicates in all cases, for both streaming and batch.
- Add rules to convert predicates to IN and NOT_IN.
- Add test cases.

## Verifying this change

This change added tests and can be verified as follows:

- Add integration tests for IN/NOT_IN in CalcITCase.
- Add unit tests that check the plan in CalcTest.
- Add a test that validates replacing the SqlToRelConverter Config.

## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): (no)
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
- The serializers: (no)
- The runtime per-record code paths (performance sensitive): (no)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
- The S3 file system connector: (no)

## Documentation

- Does this pull request introduce a new feature? (no)

[ Full content available at: https://github.com/apache/flink/pull/6792 ]
