[ https://issues.apache.org/jira/browse/FLINK-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636645#comment-16636645 ]
Hequn Cheng commented on FLINK-10474:
-------------------------------------

Hi, thanks for the discussion and ideas. I also prefer the first option. I think we can override {{getInSubQueryThreshold}} in {{SqlToRelConverter.Config}} so that the IN clause is not converted to a join. During code generation we can then use a HashSet to evaluate the IN predicate more efficiently.

> Don't translate IN to JOIN with VALUES for streaming queries
> ------------------------------------------------------------
>
>                 Key: FLINK-10474
>                 URL: https://issues.apache.org/jira/browse/FLINK-10474
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table API & SQL
>   Affects Versions: 1.6.1, 1.7.0
>            Reporter: Fabian Hueske
>            Assignee: Hequn Cheng
>            Priority: Major
>
> IN clauses are translated to JOIN with VALUES if the number of elements in
> the IN clause exceeds a certain threshold. This should not be done, because a
> streaming join is very heavy and materializes both inputs (which is fine for
> the VALUES input but not for the other).
> There are two ways to solve this:
> # don't translate IN to a JOIN at all
> # translate it to a JOIN but have a special join strategy if one input is
> bounded and final (non-updating)
> Option 1. should be easy to do, option 2. requires much more effort.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
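As a rough illustration of the HashSet idea from the comment above, the following plain-Java sketch shows why a set lookup avoids the cost of a streaming join. It is not Flink's generated code: the class {{InPredicateSketch}} and the methods {{compileIn}} and {{filter}} are hypothetical names chosen for this example, and SQL NULL semantics (three-valued logic) are not modeled.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class InPredicateSketch {

    // Build the lookup set once, when the predicate is created, so that each
    // incoming record only pays for a single hash lookup. Only the literal
    // list is held in memory; nothing from the stream is materialized.
    public static <T> Predicate<T> compileIn(List<T> literals) {
        Set<T> lookup = new HashSet<>(literals);
        return lookup::contains;
    }

    // Stand-in for a stateless streaming filter: each record is tested
    // independently of every other record.
    public static List<Integer> filter(List<Integer> records, Predicate<Integer> predicate) {
        return records.stream().filter(predicate).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Roughly corresponds to: SELECT x FROM t WHERE x IN (1, 3, 5, 7, 9)
        Predicate<Integer> in = compileIn(Arrays.asList(1, 3, 5, 7, 9));
        System.out.println(filter(Arrays.asList(1, 2, 3, 4, 5), in)); // prints [1, 3, 5]
    }
}
```

The point of the sketch is the contrast with a join: the predicate keeps state only for the fixed literal list, whereas a streaming join would also have to keep the records of the other input.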