Ah, yes! So this should be a bug then...

Best,
Yingyi
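P.S. To make the two behaviours concrete, here is a rough Python sketch of the query quoted below. It is not AsterixDB code: `TDATA`, `lookup_per_binding` and `lookup_sorted_grouped` are hypothetical names, and the second function only imitates the effect of the sort + group-by in the reported plan under the assumption that grouping is done on the value of $p alone.

```python
# Hypothetical in-memory stand-in for the dataset TData in the quoted query.
TDATA = [
    {"id": 1, "content": "a"},
    {"id": 2, "content": "b"},
    {"id": 3, "content": "c"},
]


def lookup_per_binding(ps, data):
    """Expected semantics: one output record per element of ps, in list order."""
    return [
        {"p": p, "match": [x["id"] for x in data if x["content"] == p]}
        for p in ps
    ]


def lookup_sorted_grouped(ps, data):
    """Imitates the reported behaviour: sort the bindings, then group equal
    values of p together, so duplicate keys collapse into one record."""
    groups = {}
    for p in sorted(ps):
        matches = groups.setdefault(p, [])
        matches.extend(x["id"] for x in data if x["content"] == p)
    return [{"p": p, "match": m} for p, m in groups.items()]


if __name__ == "__main__":
    ps = ["b", "a", "b", "c", "c"]
    print(lookup_per_binding(ps, TDATA))     # 5 records, in the order of ps
    print(lookup_sorted_grouped(ps, TDATA))  # 3 records, sorted by p
```

The first function returns as many records as there are binding tuples for $p (5 here); the second returns one per distinct value (3 here), which is what the issue reports.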
On Tue, Nov 10, 2015 at 3:15 PM, Jianfeng Jia <[email protected]> wrote:
> Actually, I’m still confused about the “cardinality” here. Isn’t the
> cardinality of $ps 5?
> >> let $ps := ["b","a", "b","c","c"]
>
>
> > On Nov 10, 2015, at 2:50 PM, Yingyi Bu <[email protected]> wrote:
> >
> > Jianfeng,
> >
> > The result of the query is correct.
> > The cardinality of the returned results should be the same as the number of
> > input binding tuples for $p.
> >
> > Best,
> > Yingyi
> >
> >
> > On Tue, Nov 10, 2015 at 2:34 PM, Jianfeng Jia (JIRA) <[email protected]>
> > wrote:
> >
> >> Jianfeng Jia created ASTERIXDB-1168:
> >> ---------------------------------------
> >>
> >>     Summary: Should not sort&group after an OrderedList left-join with a dataset
> >>     Key: ASTERIXDB-1168
> >>     URL: https://issues.apache.org/jira/browse/ASTERIXDB-1168
> >>     Project: Apache AsterixDB
> >>     Issue Type: Bug
> >>     Components: Optimizer
> >>     Reporter: Jianfeng Jia
> >>
> >>
> >> Hi,
> >> Here is the context for this issue: I wanted to look up some records in
> >> the DB through the REST API, and I wanted to look them up in a batch. So I
> >> packaged the "keys" into an OrderedList and expected a left outer join to
> >> give me all matching records in an order consistent with the query order.
> >> However, the result was re-sorted and grouped, which confused the
> >> client-side response handler.
> >>
> >> Here is a synthetic query that emulates a similar use case:
> >> ---------------------------------------------------------------------------
> >> drop dataverse test if exists;
> >> create dataverse test;
> >>
> >> use dataverse test;
> >>
> >> create type TType as closed {
> >>   id: int64,
> >>   content: string
> >> }
> >>
> >> create dataset TData (TType) primary key id;
> >>
> >> insert into dataset TData ( [ {"id":1, "content":"a"}, {"id":2, "content": "b"}, {"id":3, "content":"c"}])
> >>
> >> // now let's query on
> >> let $ps := ["b","a", "b","c","c"]
> >>
> >> for $p in $ps
> >> return { "p":$p,
> >>   "match": for $x in dataset TData where $x.content = $p return $x.id
> >> }
> >> ---------------------------------------------------------------------------
> >>
> >> What I expected is the following:
> >> ---------------------------------------------------------------------------
> >> [ { "p": "b", "match": [ 2 ] }
> >> , { "p": "a", "match": [ 1 ] }
> >> , { "p": "b", "match": [ 2 ] }
> >> , { "p": "c", "match": [ 3 ] }
> >> , { "p": "c", "match": [ 3 ] }
> >> ]
> >> ---------------------------------------------------------------------------
> >>
> >> The returned result is the following, which is aggregated and re-sorted.
> >> ---------------------------------------------------------------------------
> >> [ { "p": "a", "match": [ 1 ] }
> >> , { "p": "b", "match": [ 2, 2 ] }
> >> , { "p": "c", "match": [ 3, 3 ] }
> >> ]
> >> ---------------------------------------------------------------------------
> >>
> >> The optimized logical plan is the following:
> >> ---------------------------------------------------------------------------
> >> distribute result [%0->$$4]
> >> -- DISTRIBUTE_RESULT |PARTITIONED|
> >>   exchange
> >>   -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> >>     project ([$$4])
> >>     -- STREAM_PROJECT |PARTITIONED|
> >>       assign [$$4] <- [function-call: asterix:closed-record-constructor, Args:[AString: {p}, %0->$$1, AString: {match}, %0->$$9]]
> >>       -- ASSIGN |PARTITIONED|
> >>         project ([$$1, $$9])
> >>         -- STREAM_PROJECT |PARTITIONED|
> >>           exchange
> >>           -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> >>             group by ([$$0 := %0->$$12; $$1 := %0->$$13]) decor ([]) {
> >>                 aggregate [$$9] <- [function-call: asterix:listify, Args:[%0->$$10]]
> >>                 -- AGGREGATE |LOCAL|
> >>                   select (function-call: algebricks:not, Args:[function-call: algebricks:is-null, Args:[%0->$$11]])
> >>                   -- STREAM_SELECT |LOCAL|
> >>                     nested tuple source
> >>                     -- NESTED_TUPLE_SOURCE |LOCAL|
> >>             }
> >>             -- PRE_CLUSTERED_GROUP_BY[$$12, $$13] |PARTITIONED|
> >>               exchange
> >>               -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> >>                 order (ASC, %0->$$12) (ASC, %0->$$13)
> >>                 -- STABLE_SORT [$$12(ASC), $$13(ASC)] |PARTITIONED|
> >>                   exchange
> >>                   -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> >>                     project ([$$10, $$11, $$12, $$13])
> >>                     -- STREAM_PROJECT |PARTITIONED|
> >>                       exchange
> >>                       -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> >>                         left outer join (function-call: algebricks:eq, Args:[%0->$$14, %0->$$13])
> >>                         -- HYBRID_HASH_JOIN [$$13][$$14] |PARTITIONED|
> >>                           exchange
> >>                           -- HASH_PARTITION_EXCHANGE [$$13] |PARTITIONED|
> >>                             unnest $$13 <- function-call: asterix:scan-collection, Args:[%0->$$12]
> >>                             -- UNNEST |UNPARTITIONED|
> >>                               assign [$$12] <- [AOrderedList: [ AString: {b}, AString: {a}, AString: {b}, AString: {c}, AString: {c} ]]
> >>                               -- ASSIGN |UNPARTITIONED|
> >>                                 empty-tuple-source
> >>                                 -- EMPTY_TUPLE_SOURCE |UNPARTITIONED|
> >>                           exchange
> >>                           -- HASH_PARTITION_EXCHANGE [$$14] |PARTITIONED|
> >>                             project ([$$10, $$11, $$14])
> >>                             -- STREAM_PROJECT |PARTITIONED|
> >>                               assign [$$11, $$14] <- [TRUE, function-call: asterix:field-access-by-index, Args:[%0->$$2, AInt32: {1}]]
> >>                               -- ASSIGN |PARTITIONED|
> >>                                 exchange
> >>                                 -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> >>                                   data-scan []<-[$$10, $$2] <- test:TData
> >>                                   -- DATASOURCE_SCAN |PARTITIONED|
> >>                                     exchange
> >>                                     -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> >>                                       empty-tuple-source
> >>                                       -- EMPTY_TUPLE_SOURCE
> >>
> >> ---------------------------------------------------------------------------------
> >>
> >> Why is there a STABLE_SORT + PRE_CLUSTERED_GROUP_BY after the left outer
> >> join?
> >> We can close this issue if this is the intended design.
> >>
> >>
> >>
> >>
> >> --
> >> This message was sent by Atlassian JIRA
> >> (v6.3.4#6332)
> >>
> >
>
> Best,
>
> Jianfeng Jia
> PhD Candidate of Computer Science
> University of California, Irvine
>
>
