No problem. Let me re-open it.

> On Nov 10, 2015, at 3:20 PM, Yingyi Bu <[email protected]> wrote:
>
> Ah, yes!
> So this should be a bug then...
>
> Best,
> Yingyi
>
>
> On Tue, Nov 10, 2015 at 3:15 PM, Jianfeng Jia <[email protected]> wrote:
>
>> Actually, I’m still confused about the “cardinality” here. Isn’t the
>> cardinality of $ps 5?
>>>> let $ps := ["b","a", "b","c","c"]
>>
>>
>>> On Nov 10, 2015, at 2:50 PM, Yingyi Bu <[email protected]> wrote:
>>>
>>> Jianfeng,
>>>
>>> The result of the query is correct.
>>> The cardinality of the returned results should be the same as the number of
>>> input binding tuples for $p.
>>>
>>> Best,
>>> Yingyi
>>>
>>>
>>> On Tue, Nov 10, 2015 at 2:34 PM, Jianfeng Jia (JIRA) <[email protected]> wrote:
>>>
>>>> Jianfeng Jia created ASTERIXDB-1168:
>>>> ---------------------------------------
>>>>
>>>> Summary: Should not sort&group after an OrderedList left-join with a dataset
>>>> Key: ASTERIXDB-1168
>>>> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1168
>>>> Project: Apache AsterixDB
>>>> Issue Type: Bug
>>>> Components: Optimizer
>>>> Reporter: Jianfeng Jia
>>>>
>>>>
>>>> Hi,
>>>> Here is the context for this issue: I wanted to look up some records in
>>>> the DB through the REST API, and I wanted to look them up in a batch. So I
>>>> packaged the "keys" into an OrderedList and expected that a left outer join
>>>> would give me all matching records in an order consistent with the query
>>>> order. However, the result was re-sorted and grouped, which confused the
>>>> client-side response handler.
>>>>
>>>> Here is a synthetic query that emulates a similar use case:
>>>>
>>>> ---------------------------------------------------------------------------
>>>> drop dataverse test if exists;
>>>> create dataverse test;
>>>>
>>>> use dataverse test;
>>>>
>>>> create type TType as closed {
>>>>   id: int64,
>>>>   content: string
>>>> }
>>>>
>>>> create dataset TData (TType) primary key id;
>>>>
>>>> insert into dataset TData ( [ {"id":1, "content":"a"}, {"id":2, "content": "b"}, {"id":3, "content":"c"}])
>>>>
>>>> // now let's query on
>>>> let $ps := ["b","a", "b","c","c"]
>>>>
>>>> for $p in $ps
>>>> return { "p":$p,
>>>>   "match": for $x in dataset TData where $x.content = $p return $x.id
>>>> }
>>>>
>>>> ---------------------------------------------------------------------------
>>>>
>>>> What I expected is the following:
>>>>
>>>> ---------------------------------------------------------------------------
>>>> [ { "p": "b", "match": [ 2 ] }
>>>> , { "p": "a", "match": [ 1 ] }
>>>> , { "p": "b", "match": [ 2 ] }
>>>> , { "p": "c", "match": [ 3 ] }
>>>> , { "p": "c", "match": [ 3 ] }
>>>> ]
>>>>
>>>> ---------------------------------------------------------------------------
>>>>
>>>> The returned result is the following, which is aggregated and re-sorted:
>>>>
>>>> ---------------------------------------------------------------------------
>>>> [ { "p": "a", "match": [ 1 ] }
>>>> , { "p": "b", "match": [ 2, 2 ] }
>>>> , { "p": "c", "match": [ 3, 3 ] }
>>>> ]
>>>>
>>>> ---------------------------------------------------------------------------
>>>>
>>>> The optimized logical plan is the following:
>>>>
>>>> ---------------------------------------------------------------------------
>>>> distribute result [%0->$$4]
>>>> -- DISTRIBUTE_RESULT |PARTITIONED|
>>>>   exchange
>>>>   -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
>>>>     project ([$$4])
>>>>     -- STREAM_PROJECT |PARTITIONED|
>>>>       assign [$$4] <- [function-call: asterix:closed-record-constructor, Args:[AString: {p}, %0->$$1, AString: {match}, %0->$$9]]
>>>>       -- ASSIGN |PARTITIONED|
>>>>         project ([$$1, $$9])
>>>>         -- STREAM_PROJECT |PARTITIONED|
>>>>           exchange
>>>>           -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
>>>>             group by ([$$0 := %0->$$12; $$1 := %0->$$13]) decor ([]) {
>>>>                       aggregate [$$9] <- [function-call: asterix:listify, Args:[%0->$$10]]
>>>>                       -- AGGREGATE |LOCAL|
>>>>                         select (function-call: algebricks:not, Args:[function-call: algebricks:is-null, Args:[%0->$$11]])
>>>>                         -- STREAM_SELECT |LOCAL|
>>>>                           nested tuple source
>>>>                           -- NESTED_TUPLE_SOURCE |LOCAL|
>>>>                    }
>>>>             -- PRE_CLUSTERED_GROUP_BY[$$12, $$13] |PARTITIONED|
>>>>               exchange
>>>>               -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
>>>>                 order (ASC, %0->$$12) (ASC, %0->$$13)
>>>>                 -- STABLE_SORT [$$12(ASC), $$13(ASC)] |PARTITIONED|
>>>>                   exchange
>>>>                   -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
>>>>                     project ([$$10, $$11, $$12, $$13])
>>>>                     -- STREAM_PROJECT |PARTITIONED|
>>>>                       exchange
>>>>                       -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
>>>>                         left outer join (function-call: algebricks:eq, Args:[%0->$$14, %0->$$13])
>>>>                         -- HYBRID_HASH_JOIN [$$13][$$14] |PARTITIONED|
>>>>                           exchange
>>>>                           -- HASH_PARTITION_EXCHANGE [$$13] |PARTITIONED|
>>>>                             unnest $$13 <- function-call: asterix:scan-collection, Args:[%0->$$12]
>>>>                             -- UNNEST |UNPARTITIONED|
>>>>                               assign [$$12] <- [AOrderedList: [ AString: {b}, AString: {a}, AString: {b}, AString: {c}, AString: {c} ]]
>>>>                               -- ASSIGN |UNPARTITIONED|
>>>>                                 empty-tuple-source
>>>>                                 -- EMPTY_TUPLE_SOURCE |UNPARTITIONED|
>>>>                           exchange
>>>>                           -- HASH_PARTITION_EXCHANGE [$$14] |PARTITIONED|
>>>>                             project ([$$10, $$11, $$14])
>>>>                             -- STREAM_PROJECT |PARTITIONED|
>>>>                               assign [$$11, $$14] <- [TRUE, function-call: asterix:field-access-by-index, Args:[%0->$$2, AInt32: {1}]]
>>>>                               -- ASSIGN |PARTITIONED|
>>>>                                 exchange
>>>>                                 -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
>>>>                                   data-scan []<-[$$10, $$2] <- test:TData
>>>>                                   -- DATASOURCE_SCAN |PARTITIONED|
>>>>                                     exchange
>>>>                                     -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
>>>>                                       empty-tuple-source
>>>>                                       -- EMPTY_TUPLE_SOURCE
>>>>
>>>> ---------------------------------------------------------------------------------
>>>>
>>>> Why is there a STABLE_SORT + PRE_CLUSTERED_GROUP_BY after the left outer
>>>> join?
>>>> We can close this issue if this is an intended design.
>>>>
>>>>
>>>> --
>>>> This message was sent by Atlassian JIRA
>>>> (v6.3.4#6332)
>>>>
>>
>>
>> Best,
>>
>> Jianfeng Jia
>> PhD Candidate of Computer Science
>> University of California, Irvine
>>
Best,

Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine
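
For reference, the gap between the expected and the returned result in the quoted report can be reproduced outside AsterixDB. The following is only a minimal Python sketch, not AsterixDB code: the function names `per_binding` and `sort_then_group` are hypothetical, and the second function assumes that grouping on the value of `$p` alone is what collapses the duplicate bindings.

```python
from itertools import groupby

# Rows and lookup keys copied from the query in the report.
TDATA = [
    {"id": 1, "content": "a"},
    {"id": 2, "content": "b"},
    {"id": 3, "content": "c"},
]
PS = ["b", "a", "b", "c", "c"]


def per_binding(ps, rows):
    """One output record per element of ps, in list order (the expected result)."""
    return [
        {"p": p, "match": [r["id"] for r in rows if r["content"] == p]}
        for p in ps
    ]


def sort_then_group(ps, rows):
    """Left outer join, then sort and group on the value of p only.

    Assumption: grouping on the value alone (rather than on the position
    of each binding) is what merges the duplicate "b" and "c" entries.
    """
    joined = []
    for p in ps:
        matches = [r["id"] for r in rows if r["content"] == p]
        # Keep unmatched bindings with a None marker (left outer join).
        joined.extend((p, m) for m in (matches or [None]))
    joined.sort(key=lambda pair: pair[0])  # stable sort on p
    return [
        {"p": p, "match": [m for _, m in group if m is not None]}
        for p, group in groupby(joined, key=lambda pair: pair[0])
    ]


if __name__ == "__main__":
    print(per_binding(PS, TDATA))      # five records, in the order of PS
    print(sort_then_group(PS, TDATA))  # three records, sorted by p
```

Under these assumptions, the first function yields one record per binding of `$p`, while the second yields one record per distinct value, which matches the shape of the two results shown in the report.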
