[ https://issues.apache.org/jira/browse/ASTERIXDB-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chen Luo updated ASTERIXDB-2350: -------------------------------- Summary: Improve IN subquery (was: Improvement of IN subquery) > Improve IN subquery > ------------------- > > Key: ASTERIXDB-2350 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2350 > Project: Apache AsterixDB > Issue Type: Improvement > Components: COMP - Compiler > Reporter: Chen Luo > Priority: Minor > > By default, IN is translated into a hybrid hash join followed by a sort group > by, which is expensive for simply in queries with constant values. > For example > {code} > select * > from lineitem > where l_partkey in [1, 2 ,3]; > {code} > is translated into > {code} > distribute result [$$16] > -- DISTRIBUTE_RESULT |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > project ([$$16]) > -- STREAM_PROJECT |PARTITIONED| > assign [$$16] <- [{"lineitem": $$lineitem}] > -- ASSIGN |PARTITIONED| > project ([$$lineitem]) > -- STREAM_PROJECT |PARTITIONED| > select ($$14) > -- STREAM_SELECT |PARTITIONED| > project ([$$14, $$lineitem]) > -- STREAM_PROJECT |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > group by ([$$20 := $$17]) decor ([$$lineitem]) { > aggregate [$$14] <- [non-empty-stream()] > -- AGGREGATE |LOCAL| > select (not(is-missing($$19))) > -- STREAM_SELECT |LOCAL| > nested tuple source > -- NESTED_TUPLE_SOURCE |LOCAL| > } > -- PRE_CLUSTERED_GROUP_BY[$$17] |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > order (ASC, $$17) > -- STABLE_SORT [$$17(ASC)] |PARTITIONED| > exchange > -- HASH_PARTITION_EXCHANGE [$$17] |PARTITIONED| > project ([$$lineitem, $$19, $$17]) > -- STREAM_PROJECT |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > left outer join (eq($$18, $#1)) > -- HYBRID_HASH_JOIN [$$18][$#1] |PARTITIONED| > exchange > -- HASH_PARTITION_EXCHANGE [$$18] |PARTITIONED| > assign [$$18] <- [$$lineitem.getField(1)] > -- ASSIGN |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > data-scan []<-[$$17, $$lineitem] <- > Default.lineitem > -- DATASOURCE_SCAN |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > empty-tuple-source > -- EMPTY_TUPLE_SOURCE |PARTITIONED| > exchange > -- HASH_PARTITION_EXCHANGE [$#1] |PARTITIONED| > assign [$$19] <- [TRUE] > -- ASSIGN |UNPARTITIONED| > unnest $#1 <- scan-collection(array: [ 1, > 2, 3 ]) > -- UNNEST |UNPARTITIONED| > empty-tuple-source > -- EMPTY_TUPLE_SOURCE |UNPARTITIONED| > {code} > While the following query > {code} > select * > from lineitem > where l_partkey = 1 OR l_partkey=2 OR l_partkey =3; > {code} > is translated into > {code} > distribute result [$$18] > -- DISTRIBUTE_RESULT |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > project ([$$18]) > -- STREAM_PROJECT |PARTITIONED| > assign [$$18] <- [{"lineitem": $$lineitem}] > -- ASSIGN |PARTITIONED| > project ([$$lineitem]) > -- STREAM_PROJECT |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > join (eq($$21, $$19)) > -- HYBRID_HASH_JOIN [$$19][$$21] |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > assign [$$19] <- [$$lineitem.getField(1)] > -- ASSIGN |PARTITIONED| > project ([$$lineitem]) > -- STREAM_PROJECT |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > data-scan []<-[$$20, $$lineitem] <- Default.lineitem > -- DATASOURCE_SCAN |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > empty-tuple-source > -- EMPTY_TUPLE_SOURCE |PARTITIONED| > exchange > -- BROADCAST_EXCHANGE |PARTITIONED| > unnest $$21 <- scan-collection(array: [ 1, 2, 3 ]) > -- UNNEST |UNPARTITIONED| > empty-tuple-source > -- EMPTY_TUPLE_SOURCE |UNPARTITIONED| > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)