[ 
https://issues.apache.org/jira/browse/FLINK-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16642878#comment-16642878
 ] 

Piotr Nowojski commented on FLINK-10474:
----------------------------------------

Sorry for jumping into the discussion a bit late, but I would like to point
out a couple of drawbacks of approach 1:
 # It's less general. Option 2 would/could cover more cases, such as IN
queries with a bounded table (not just values) and JOINs with bounded tables
(or values). JOINs with bounded tables are something users are asking for and
something we would like to have. If we go with approach 1 now, that effort
will be wasted once bounded JOINs are implemented.
 # It adds complexity. Although it may be easier to implement, it doesn't add
any new features to Flink, and it increases code complexity by adding code
that handles only a special case.
 # It will complicate the planning logic and make streaming plans diverge
further from batch plans. This again will raise the complexity of the project
(more moving parts, more things one has to consider when analysing planning
results).

> Don't translate IN to JOIN with VALUES for streaming queries
> ------------------------------------------------------------
>
>                 Key: FLINK-10474
>                 URL: https://issues.apache.org/jira/browse/FLINK-10474
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table API & SQL
>    Affects Versions: 1.6.1, 1.7.0
>            Reporter: Fabian Hueske
>            Assignee: Hequn Cheng
>            Priority: Major
>              Labels: pull-request-available
>
> IN clauses are translated to JOIN with VALUES if the number of elements in 
> the IN clause exceeds a certain threshold. This should not be done, because a 
> streaming join is very heavy and materializes both inputs (which is fine for 
> the VALUES input but not for the other).
> There are two ways to solve this:
>  # don't translate IN to a JOIN at all
>  # translate it to a JOIN but have a special join strategy if one input is 
> bounded and final (non-updating)
> Option 1. should be easy to do, option 2. requires much more effort.
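
For illustration only, here is a minimal Python sketch of why the two join
translations discussed above differ in cost. It is not Flink code: the
function names and the in-memory "state" model are assumptions made up for
this example. A generic streaming join has to retain every row of the
unbounded input, whereas a strategy that knows one input is bounded and final
only needs to keep that input.

```python
def in_as_generic_join(stream, values):
    """Hypothetical model of a generic streaming join.

    Both inputs are materialized, because a generic join cannot know that
    the VALUES side will never receive further rows or updates.
    Returns (matching rows, number of entries held in state).
    """
    left_state = []      # every row of the unbounded input is retained
    right_state = set()  # the VALUES input (small and bounded)
    for v in values:
        right_state.add(v)

    out = []
    for row in stream:
        left_state.append(row)  # kept in case a matching right row arrives later
        if row in right_state:
            out.append(row)
    return out, len(left_state) + len(right_state)


def in_as_bounded_lookup(stream, values):
    """Hypothetical model of a join strategy for a bounded, final input.

    Only the bounded side is kept; stream rows are probed and then dropped.
    Returns (matching rows, number of entries held in state).
    """
    lookup = frozenset(values)
    out = [row for row in stream if row in lookup]
    return out, len(lookup)
```

Both functions return the same matching rows; only the amount of retained
state differs. In the generic variant the state grows with the length of the
stream, while in the bounded variant it is fixed by the size of the VALUES
input.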



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
