[
https://issues.apache.org/jira/browse/FLINK-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15469873#comment-15469873
]
Timo Walther commented on FLINK-4565:
-------------------------------------
Calcite translates the IN operator in
{{org.apache.calcite.sql2rel.SqlToRelConverter#convertExpression}}. Calcite
translates this into an Aggregate and Join. After fixing some issue in
"DataSetAggregate" we can execute: {{"SELECT WordCount.word FROM WordCount
WHERE WordCount.word IN (SELECT WordCount1.word AS w FROM WordCount1)"}}. The
plan looks like:
{code}
== Physical Execution Plan ==
Stage 4 : Data Source
content : collect elements with CollectionInputFormat
Partitioning : RANDOM_PARTITIONED
Stage 3 : Map
content : from: (word, frequency)
ship_strategy : Forward
exchange_mode : BATCH
driver_strategy : Map
Partitioning : RANDOM_PARTITIONED
Stage 8 : Map
content : from: (word, frequency)
ship_strategy : Forward
exchange_mode : BATCH
driver_strategy : Map
Partitioning : RANDOM_PARTITIONED
Stage 7 : Map
content : prepare select: (word)
ship_strategy : Forward
exchange_mode : PIPELINED
driver_strategy : Map
Partitioning : RANDOM_PARTITIONED
Stage 6 : GroupCombine
content : groupBy: (word), select:(word)
ship_strategy : Forward
exchange_mode : PIPELINED
driver_strategy : Sorted Combine
Partitioning : RANDOM_PARTITIONED
Stage 5 : GroupReduce
content : groupBy: (word),
select:(word)
ship_strategy : Hash Partition
on [0]
exchange_mode : PIPELINED
driver_strategy : Sorted Group
Reduce
Partitioning :
RANDOM_PARTITIONED
Stage 2 : Join
content : where:
(=(word, w)), join: (word, frequency, w)
ship_strategy : Hash
Partition on [0]
exchange_mode :
PIPELINED
driver_strategy :
Hybrid Hash (build: from: (word, frequency) (id: 3))
Partitioning :
RANDOM_PARTITIONED
Stage 1 : FlatMap
content :
select: (word)
ship_strategy :
Forward
exchange_mode :
PIPELINED
driver_strategy
: FlatMap
Partitioning :
RANDOM_PARTITIONED
Stage 0 : Data
Sink
content
: org.apache.flink.api.java.io.DiscardingOutputFormat
ship_strategy : Forward
exchange_mode : PIPELINED
Partitioning : RANDOM_PARTITIONED
{code}
> Support for SQL IN operator
> ---------------------------
>
> Key: FLINK-4565
> URL: https://issues.apache.org/jira/browse/FLINK-4565
> Project: Flink
> Issue Type: Improvement
> Components: Table API & SQL
> Reporter: Timo Walther
> Assignee: Simone Robutti
>
> It seems that Flink SQL supports the uncorrelated sub-query IN operator. But
> it should also be available in the Table API and tested.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)