[ https://issues.apache.org/jira/browse/FLINK-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16642878#comment-16642878 ]
Piotr Nowojski commented on FLINK-10474:
----------------------------------------
Sorry for jumping into the discussion a bit late, but I would like to point
out a couple of drawbacks of approach 1:
# It is less general. Option 2 could cover more cases, such as IN queries
against a bounded table (not just values) and JOINs with bounded tables (or
values). JOINs with bounded tables are something users are asking for and
something that we would like to have. If we go with approach 1 now, it will be
wasted effort once bounded JOINs are implemented.
# It adds complexity. Although it may be easier to implement, it doesn't add
any new features to Flink, while increasing code complexity by adding code
that handles only a special case.
# It will complicate the planning logic and make streaming plans diverge
further from batch plans. This again will raise the complexity of the project
(more moving parts, more things one has to consider when analysing planning
results).
> Don't translate IN to JOIN with VALUES for streaming queries
> ------------------------------------------------------------
>
> Key: FLINK-10474
> URL: https://issues.apache.org/jira/browse/FLINK-10474
> Project: Flink
> Issue Type: Improvement
> Components: Table API & SQL
> Affects Versions: 1.6.1, 1.7.0
> Reporter: Fabian Hueske
> Assignee: Hequn Cheng
> Priority: Major
> Labels: pull-request-available
>
> IN clauses are translated to JOIN with VALUES if the number of elements in
> the IN clause exceeds a certain threshold. This should not be done, because a
> streaming join is very heavy and materializes both inputs (which is fine for
> the VALUES input but not for the other).
> There are two ways to solve this:
> # don't translate IN to a JOIN at all
> # translate it to a JOIN but have a special join strategy if one input is
> bounded and final (non-updating)
> Option 1. should be easy to do, option 2. requires much more effort.
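For readers comparing the two options, here is a minimal sketch of the difference in state requirements. It is plain Python over iterables, not Flink code: the function names `in_as_filter` and `bounded_side_join` are hypothetical, and the row layout (dicts with a key column) is an assumption made only for illustration.

```python
from typing import Hashable, Iterable, Iterator, Tuple


def in_as_filter(stream: Iterable[dict], key: str,
                 values: Iterable[Hashable]) -> Iterator[dict]:
    """Option 1 (sketch): evaluate IN as a stateless filter on a fixed set."""
    allowed = frozenset(values)      # built once from the literals, never updated
    for row in stream:               # stream rows pass through and are not stored
        if row[key] in allowed:
            yield row


def bounded_side_join(stream: Iterable[dict], key: str,
                      bounded_rows: Iterable[dict],
                      bounded_key: str) -> Iterator[Tuple[dict, dict]]:
    """Option 2 (sketch): a join that materializes only the bounded, final input."""
    index: dict = {}
    for b in bounded_rows:           # the bounded input is read to completion first
        index.setdefault(b[bounded_key], []).append(b)
    for row in stream:               # the unbounded input is only probed, not kept
        for b in index.get(row[key], ()):
            yield (row, b)
```

In both functions the only state held is derived from the bounded side, which is the property the issue asks for. The second function also covers the bounded-table JOIN case, whereas the first handles only IN against a fixed set of values.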