[
https://issues.apache.org/jira/browse/ASTERIXDB-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261268#comment-15261268
]
Yingyi Bu commented on ASTERIXDB-1418:
--------------------------------------
This seems to be a duplicate of ASTERIXDB-967, provided that the query works
when the nested "distinct by" is rewritten as a "group by".
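For reference, a minimal sketch of that rewrite, assuming the reported query is
otherwise unchanged. Only the "users" field differs: the nested "distinct by"
becomes a nested "group by" on the user id. The variable name $uid is
introduced here for illustration, and whether this form actually avoids the
exception has not been verified.
{code}
use dataverse twitter
for $t in dataset ds_tweet_trump
group by
$county := $t.geo_tag.countyID,
$timebin := interval-bin($t.create_at, date("2012-01-01"), day-time-duration("P1D")) with $t
return {
"county": $county,
"time": $timebin,
"count": count($t),
// One output item per distinct user id within the group, then count them.
"users": count( for $tt in $t group by $uid := $tt.user.id with $tt return $uid )
}
{code}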
> Doesn't support a Nested Aggregation Query
> ------------------------------------------
>
> Key: ASTERIXDB-1418
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1418
> Project: Apache AsterixDB
> Issue Type: Bug
> Components: AsterixDB, Optimizer
> Reporter: Jianfeng Jia
> Assignee: Yingyi Bu
>
> When I ran the following query:
> {code}
> use dataverse twitter
> for $t in dataset ds_tweet_trump
> group by
> $county := $t.geo_tag.countyID,
> $timebin := interval-bin($t.create_at, date("2012-01-01"), day-time-duration("P1D")) with $t
> return {
> "county": $county,
> "time": $timebin,
> "count": count($t),
> "users": count( for $tt in $t distinct by $tt.user.id return $tt.user.id)
> }
> {code}
> I got the following exception:
> {code}
> Attempting to construct a nested plan with 3 operator descriptors. Currently,
> nested plans can only consist in linear pipelines of Asterix micro operators.
> [AlgebricksException]
> {code}
> The DDL:
> {code}
> create dataverse twitter if not exists;
> use dataverse twitter
> create type typeUser if not exists as open {
> id: int64,
> name: string,
> screen_name : string,
> lang : string,
> location: string,
> create_at: date,
> description: string,
> followers_count: int32,
> friends_count: int32,
> statues_count: int64
> }
> create type typePlace if not exists as open{
> country : string,
> country_code : string,
> full_name : string,
> id : string,
> name : string,
> place_type : string,
> bounding_box : rectangle
> }
> create type typeGeoTag if not exists as open {
> stateID: int32,
> stateName: string,
> countyID: int32,
> countyName: string,
> cityID: int32?,
> cityName: string?
> }
> create type typeTweet if not exists as open{
> create_at : datetime,
> id: int64,
> "text": string,
> in_reply_to_status : int64,
> in_reply_to_user : int64,
> favorite_count : int64,
> coordinate: point?,
> retweet_count : int64,
> lang : string,
> is_retweet: boolean,
> hashtags : {{ string }} ?,
> user_mentions : {{ int64 }} ? ,
> user : typeUser,
> place : typePlace?,
> geo_tag: typeGeoTag
> }
> create dataset ds_tweet(typeTweet) if not exists primary key id;
> //with filter on create_at;
> {code}
> The logical plan is generated successfully:
> {code}
> distribute result [%0->$$13]
> -- DISTRIBUTE_RESULT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> project ([$$13])
> -- STREAM_PROJECT |PARTITIONED|
> assign [$$13] <- [function-call: asterix:closed-record-constructor,
> Args:[AString: {county}, %0->$$1, AString: {time}, %0->$$2, AString: {count},
> %0->$$25, AString: {users}, %0->$$26]]
> -- ASSIGN |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> group by ([$$1 := %0->$$32; $$2 := %0->$$33]) decor ([]) {
> aggregate [$$25] <- [function-call: asterix:agg-sum,
> Args:[%0->$$30]]
> -- AGGREGATE |LOCAL|
> nested tuple source
> -- NESTED_TUPLE_SOURCE |LOCAL|
> }
> {
> aggregate [$$26] <- [function-call: asterix:agg-sum,
> Args:[%0->$$31]]
> -- AGGREGATE |LOCAL|
> nested tuple source
> -- NESTED_TUPLE_SOURCE |LOCAL|
> }
> -- PRE_CLUSTERED_GROUP_BY[$$32, $$33] |PARTITIONED|
> exchange
> -- HASH_PARTITION_MERGE_EXCHANGE MERGE:[$$32(ASC), $$33(ASC)]
> HASH:[$$32, $$33] |PARTITIONED|
> group by ([$$32 := %0->$$21; $$33 := %0->$$22]) decor ([]) {
> aggregate [$$30] <- [function-call:
> asterix:agg-count, Args:[%0->$$3]]
> -- AGGREGATE |LOCAL|
> nested tuple source
> -- NESTED_TUPLE_SOURCE |LOCAL|
> }
> {
> aggregate [$$31] <- [function-call:
> asterix:agg-count, Args:[%0->$$23]]
> -- AGGREGATE |LOCAL|
> exchange
> -- ONE_TO_ONE_EXCHANGE |LOCAL|
> distinct ([%0->$$23])
> -- PRE_SORTED_DISTINCT_BY |LOCAL|
> exchange
> -- ONE_TO_ONE_EXCHANGE |LOCAL|
> order (ASC, %0->$$23)
> -- IN_MEMORY_STABLE_SORT [$$23(ASC)] |LOCAL|
> nested tuple source
> -- NESTED_TUPLE_SOURCE |LOCAL|
> }
> -- PRE_CLUSTERED_GROUP_BY[$$21, $$22] |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> order (ASC, %0->$$21) (ASC, %0->$$22)
> -- STABLE_SORT [$$21(ASC), $$22(ASC)] |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> assign [$$22, $$21, $$23] <- [function-call:
> asterix:interval-bin, Args:[function-call: asterix:field-access-by-index,
> Args:[%0->$$3, AInt32: {0}], ADate: { 2012-01-01 },
> org.apache.asterix.om.base.ADayTimeDuration@5265c00], function-call:
> asterix:field-access-by-index, Args:[function-call:
> asterix:field-access-by-index, Args:[%0->$$3, AInt32: {14}], AInt32: {2}],
> function-call: asterix:field-access-by-index, Args:[function-call:
> asterix:field-access-by-index, Args:[%0->$$3, AInt32: {12}], AInt32: {0}]]
> -- ASSIGN |PARTITIONED|
> project ([$$3])
> -- STREAM_PROJECT |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> data-scan []<-[$$24, $$3] <- twitter:ds_tweet
> -- DATASOURCE_SCAN |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> empty-tuple-source
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)