## What is the purpose of the change

Currently, IN clauses are translated into a JOIN with a VALUES relation if the
number of elements in the IN clause exceeds a certain threshold. This should not
be done, because a streaming join is very heavy and materializes both of its
inputs. That is fine for the VALUES input, but not for the other one.
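To make the behaviour above concrete, here is a hypothetical Java model of the threshold decision. `InTranslation`, `Strategy` and `choose` are invented for this sketch. In Calcite the threshold corresponds to the `inSubQueryThreshold` setting of `SqlToRelConverter.Config`, but the exact boundary comparison and the default of 20 used below are assumptions to check against the Calcite version in use. Raising the threshold to `Integer.MAX_VALUE` is shown as one assumed way of forcing the cascade.

```java
/**
 * Hypothetical model of the threshold-based IN translation. The class, enum and
 * method names are illustrative and do not exist in Calcite or Flink.
 */
public final class InTranslation {

    /** The two plan shapes an IN list can be translated into. */
    enum Strategy { OR_CASCADE, JOIN_WITH_VALUES }

    /**
     * Short IN lists become a cascade of OR'ed equality predicates; lists that
     * reach the threshold become a join with a VALUES relation (boundary assumed).
     */
    static Strategy choose(int inListSize, int threshold) {
        return inListSize < threshold ? Strategy.OR_CASCADE : Strategy.JOIN_WITH_VALUES;
    }

    public static void main(String[] args) {
        int assumedDefaultThreshold = 20; // assumed default, not taken from this PR
        System.out.println(choose(5, assumedDefaultThreshold));  // OR_CASCADE
        System.out.println(choose(50, assumedDefaultThreshold)); // JOIN_WITH_VALUES
        // No realistic IN list reaches Integer.MAX_VALUE elements, so with this
        // threshold every IN list stays a cascade of predicates.
        System.out.println(choose(50, Integer.MAX_VALUE));       // OR_CASCADE
    }
}
```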

This pull request forces the use of a cascade of predicates in all cases, for
both streaming and batch. It also adds a rule that converts these predicates
into IN and NOT_IN.
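The conversion rule can be sketched on a toy predicate model rather than Calcite's `RexNode` API; `Comparison`, `InPredicate` and `collapse` are hypothetical names, and the real rule's matching logic may differ. The idea shown: an OR of equality comparisons on one column collapses into IN, and an AND of inequality comparisons on one column collapses into NOT IN.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy sketch of collapsing a predicate cascade into IN / NOT IN (hypothetical types). */
public final class CascadeToIn {

    /** "column = literal" (isEquality) or "column <> literal" (!isEquality). */
    static final class Comparison {
        final String column;
        final boolean isEquality;
        final Object literal;

        Comparison(String column, boolean isEquality, Object literal) {
            this.column = column;
            this.isEquality = isEquality;
            this.literal = literal;
        }
    }

    /** "column IN (values)" or, if negated, "column NOT IN (values)". */
    static final class InPredicate {
        final String column;
        final boolean negated;
        final Set<Object> values;

        InPredicate(String column, boolean negated, Set<Object> values) {
            this.column = column;
            this.negated = negated;
            this.values = values;
        }

        @Override
        public String toString() {
            return column + (negated ? " NOT IN " : " IN ") + values;
        }
    }

    /**
     * OR of "x = v_i" becomes "x IN (...)"; AND of "x <> v_i" becomes "x NOT IN (...)".
     * Returns null when the cascade does not have that shape.
     */
    static InPredicate collapse(List<Comparison> operands, boolean isDisjunction) {
        if (operands.isEmpty()) {
            return null;
        }
        String column = operands.get(0).column;
        Set<Object> values = new LinkedHashSet<>(); // keeps literal order, drops duplicates
        for (Comparison c : operands) {
            // Every operand must reference the same column, using '=' under OR and '<>' under AND.
            if (!c.column.equals(column) || c.isEquality != isDisjunction) {
                return null;
            }
            values.add(c.literal);
        }
        return new InPredicate(column, !isDisjunction, values);
    }

    public static void main(String[] args) {
        List<Comparison> ors = Arrays.asList(
                new Comparison("a", true, 1), new Comparison("a", true, 2), new Comparison("a", true, 3));
        System.out.println(collapse(ors, true)); // a IN [1, 2, 3]

        List<Comparison> ands = Arrays.asList(
                new Comparison("b", false, "x"), new Comparison("b", false, "y"));
        System.out.println(collapse(ands, false)); // b NOT IN [x, y]
    }
}
```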

Flink's code generation already uses a HashSet for IN/NOT_IN.
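Why the collapsed form pays off can be sketched in plain Java: a cascade compares the value against every literal for each record, while a HashSet lookup is expected constant time. `InSetEvaluation` is a hypothetical class, not Flink's generated code, and it ignores SQL NULL (three-valued logic) semantics.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Sketch contrasting a predicate cascade with HashSet-based IN / NOT IN evaluation. */
public final class InSetEvaluation {

    private final Set<Integer> values;

    /** Builds the set once, rather than once per record. */
    InSetEvaluation(Integer... literals) {
        this.values = new HashSet<>(Arrays.asList(literals));
    }

    /** x IN (...): one expected O(1) hash lookup per record. */
    boolean in(int x) {
        return values.contains(x);
    }

    /** x NOT IN (...): the negated lookup. */
    boolean notIn(int x) {
        return !values.contains(x);
    }

    /** x = v_1 OR x = v_2 OR ...: the cascade, O(n) comparisons per record. */
    static boolean orCascade(int x, int... literals) {
        for (int v : literals) {
            if (x == v) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        InSetEvaluation set = new InSetEvaluation(1, 2, 3);
        System.out.println(set.in(2) + " " + orCascade(2, 1, 2, 3));    // true true
        System.out.println(set.notIn(7) + " " + orCascade(7, 1, 2, 3)); // true false
    }
}
```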



## Brief change log

  - Add a `SqlToRelConverter` configuration that forces the use of a cascade of
predicates in all cases, for both streaming and batch.
  - Add rules that convert these predicates into IN and NOT_IN.
  - Add test cases.


## Verifying this change

This change added tests and can be verified as follows:

  - Added integration tests for IN/NOT_IN in CalcITCase.
  - Added unit tests for the resulting plans in CalcTest.
  - Added a test that validates replacing the SqlToRelConverter config.

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): (no)
  - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
  - The serializers: (no)
  - The runtime per-record code paths (performance sensitive): (no)
  - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
  - The S3 file system connector: (no)

## Documentation

  - Does this pull request introduce a new feature? (no)


[ Full content available at: https://github.com/apache/flink/pull/6792 ]