[ 
https://issues.apache.org/jira/browse/ASTERIXDB-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Luo updated ASTERIXDB-2350:
--------------------------------
    Summary: Improve IN subquery  (was: Improvement of IN subquery)

> Improve IN subquery
> -------------------
>
>                 Key: ASTERIXDB-2350
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2350
>             Project: Apache AsterixDB
>          Issue Type: Improvement
>          Components: COMP - Compiler
>            Reporter: Chen Luo
>            Priority: Minor
>
> By default, IN is translated into a hybrid hash join followed by a sort group 
> by, which is expensive for simply in queries with constant values.
> For example
> {code}
> select *
> from lineitem
> where l_partkey in [1, 2 ,3];
> {code}
> is translated into
> {code}
> distribute result [$$16]
> -- DISTRIBUTE_RESULT  |PARTITIONED|
>   exchange
>   -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>     project ([$$16])
>     -- STREAM_PROJECT  |PARTITIONED|
>       assign [$$16] <- [{"lineitem": $$lineitem}]
>       -- ASSIGN  |PARTITIONED|
>         project ([$$lineitem])
>         -- STREAM_PROJECT  |PARTITIONED|
>           select ($$14)
>           -- STREAM_SELECT  |PARTITIONED|
>             project ([$$14, $$lineitem])
>             -- STREAM_PROJECT  |PARTITIONED|
>               exchange
>               -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                 group by ([$$20 := $$17]) decor ([$$lineitem]) {
>                           aggregate [$$14] <- [non-empty-stream()]
>                           -- AGGREGATE  |LOCAL|
>                             select (not(is-missing($$19)))
>                             -- STREAM_SELECT  |LOCAL|
>                               nested tuple source
>                               -- NESTED_TUPLE_SOURCE  |LOCAL|
>                        }
>                 -- PRE_CLUSTERED_GROUP_BY[$$17]  |PARTITIONED|
>                   exchange
>                   -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                     order (ASC, $$17) 
>                     -- STABLE_SORT [$$17(ASC)]  |PARTITIONED|
>                       exchange
>                       -- HASH_PARTITION_EXCHANGE [$$17]  |PARTITIONED|
>                         project ([$$lineitem, $$19, $$17])
>                         -- STREAM_PROJECT  |PARTITIONED|
>                           exchange
>                           -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                             left outer join (eq($$18, $#1))
>                             -- HYBRID_HASH_JOIN [$$18][$#1]  |PARTITIONED|
>                               exchange
>                               -- HASH_PARTITION_EXCHANGE [$$18]  |PARTITIONED|
>                                 assign [$$18] <- [$$lineitem.getField(1)]
>                                 -- ASSIGN  |PARTITIONED|
>                                   exchange
>                                   -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                                     data-scan []<-[$$17, $$lineitem] <- 
> Default.lineitem
>                                     -- DATASOURCE_SCAN  |PARTITIONED|
>                                       exchange
>                                       -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                                         empty-tuple-source
>                                         -- EMPTY_TUPLE_SOURCE  |PARTITIONED|
>                               exchange
>                               -- HASH_PARTITION_EXCHANGE [$#1]  |PARTITIONED|
>                                 assign [$$19] <- [TRUE]
>                                 -- ASSIGN  |UNPARTITIONED|
>                                   unnest $#1 <- scan-collection(array: [ 1, 
> 2, 3 ])
>                                   -- UNNEST  |UNPARTITIONED|
>                                     empty-tuple-source
>                                     -- EMPTY_TUPLE_SOURCE  |UNPARTITIONED|
> {code}
> While the following query
> {code}
> select *
> from lineitem
> where l_partkey = 1 OR l_partkey=2 OR l_partkey =3;
> {code}
> is translated into
> {code}
> distribute result [$$18]
> -- DISTRIBUTE_RESULT  |PARTITIONED|
>   exchange
>   -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>     project ([$$18])
>     -- STREAM_PROJECT  |PARTITIONED|
>       assign [$$18] <- [{"lineitem": $$lineitem}]
>       -- ASSIGN  |PARTITIONED|
>         project ([$$lineitem])
>         -- STREAM_PROJECT  |PARTITIONED|
>           exchange
>           -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>             join (eq($$21, $$19))
>             -- HYBRID_HASH_JOIN [$$19][$$21]  |PARTITIONED|
>               exchange
>               -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                 assign [$$19] <- [$$lineitem.getField(1)]
>                 -- ASSIGN  |PARTITIONED|
>                   project ([$$lineitem])
>                   -- STREAM_PROJECT  |PARTITIONED|
>                     exchange
>                     -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                       data-scan []<-[$$20, $$lineitem] <- Default.lineitem
>                       -- DATASOURCE_SCAN  |PARTITIONED|
>                         exchange
>                         -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                           empty-tuple-source
>                           -- EMPTY_TUPLE_SOURCE  |PARTITIONED|
>               exchange
>               -- BROADCAST_EXCHANGE  |PARTITIONED|
>                 unnest $$21 <- scan-collection(array: [ 1, 2, 3 ])
>                 -- UNNEST  |UNPARTITIONED|
>                   empty-tuple-source
>                   -- EMPTY_TUPLE_SOURCE  |UNPARTITIONED|
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to