Chen Luo created ASTERIXDB-2350: ----------------------------------- Summary: Improvement of IN subquery Key: ASTERIXDB-2350 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2350 Project: Apache AsterixDB Issue Type: Improvement Components: COMP - Compiler Reporter: Chen Luo
By default, IN is translated into a hybrid hash join followed by a sort group by, which is expensive for simply in queries with constant values. For example {code} select * from lineitem where l_partkey in [1, 2 ,3]; {code} is translated into {code} distribute result [$$16] -- DISTRIBUTE_RESULT |PARTITIONED| exchange -- ONE_TO_ONE_EXCHANGE |PARTITIONED| project ([$$16]) -- STREAM_PROJECT |PARTITIONED| assign [$$16] <- [{"lineitem": $$lineitem}] -- ASSIGN |PARTITIONED| project ([$$lineitem]) -- STREAM_PROJECT |PARTITIONED| select ($$14) -- STREAM_SELECT |PARTITIONED| project ([$$14, $$lineitem]) -- STREAM_PROJECT |PARTITIONED| exchange -- ONE_TO_ONE_EXCHANGE |PARTITIONED| group by ([$$20 := $$17]) decor ([$$lineitem]) { aggregate [$$14] <- [non-empty-stream()] -- AGGREGATE |LOCAL| select (not(is-missing($$19))) -- STREAM_SELECT |LOCAL| nested tuple source -- NESTED_TUPLE_SOURCE |LOCAL| } -- PRE_CLUSTERED_GROUP_BY[$$17] |PARTITIONED| exchange -- ONE_TO_ONE_EXCHANGE |PARTITIONED| order (ASC, $$17) -- STABLE_SORT [$$17(ASC)] |PARTITIONED| exchange -- HASH_PARTITION_EXCHANGE [$$17] |PARTITIONED| project ([$$lineitem, $$19, $$17]) -- STREAM_PROJECT |PARTITIONED| exchange -- ONE_TO_ONE_EXCHANGE |PARTITIONED| left outer join (eq($$18, $#1)) -- HYBRID_HASH_JOIN [$$18][$#1] |PARTITIONED| exchange -- HASH_PARTITION_EXCHANGE [$$18] |PARTITIONED| assign [$$18] <- [$$lineitem.getField(1)] -- ASSIGN |PARTITIONED| exchange -- ONE_TO_ONE_EXCHANGE |PARTITIONED| data-scan []<-[$$17, $$lineitem] <- Default.lineitem -- DATASOURCE_SCAN |PARTITIONED| exchange -- ONE_TO_ONE_EXCHANGE |PARTITIONED| empty-tuple-source -- EMPTY_TUPLE_SOURCE |PARTITIONED| exchange -- HASH_PARTITION_EXCHANGE [$#1] |PARTITIONED| assign [$$19] <- [TRUE] -- ASSIGN |UNPARTITIONED| unnest $#1 <- scan-collection(array: [ 1, 2, 3 ]) -- UNNEST |UNPARTITIONED| empty-tuple-source -- EMPTY_TUPLE_SOURCE |UNPARTITIONED| {code} While the following query {code} select * from lineitem where l_partkey = 1 OR l_partkey=2 OR l_partkey =3; {code} is translated into {code} distribute result [$$18] -- DISTRIBUTE_RESULT |PARTITIONED| exchange -- ONE_TO_ONE_EXCHANGE |PARTITIONED| project ([$$18]) -- STREAM_PROJECT |PARTITIONED| assign [$$18] <- [{"lineitem": $$lineitem}] -- ASSIGN |PARTITIONED| project ([$$lineitem]) -- STREAM_PROJECT |PARTITIONED| exchange -- ONE_TO_ONE_EXCHANGE |PARTITIONED| join (eq($$21, $$19)) -- HYBRID_HASH_JOIN [$$19][$$21] |PARTITIONED| exchange -- ONE_TO_ONE_EXCHANGE |PARTITIONED| assign [$$19] <- [$$lineitem.getField(1)] -- ASSIGN |PARTITIONED| project ([$$lineitem]) -- STREAM_PROJECT |PARTITIONED| exchange -- ONE_TO_ONE_EXCHANGE |PARTITIONED| data-scan []<-[$$20, $$lineitem] <- Default.lineitem -- DATASOURCE_SCAN |PARTITIONED| exchange -- ONE_TO_ONE_EXCHANGE |PARTITIONED| empty-tuple-source -- EMPTY_TUPLE_SOURCE |PARTITIONED| exchange -- BROADCAST_EXCHANGE |PARTITIONED| unnest $$21 <- scan-collection(array: [ 1, 2, 3 ]) -- UNNEST |UNPARTITIONED| empty-tuple-source -- EMPTY_TUPLE_SOURCE |UNPARTITIONED| {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)