[
https://issues.apache.org/jira/browse/DRILL-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206642#comment-14206642
]
Jacques Nadeau edited comment on DRILL-1649 at 11/11/14 5:21 PM:
-----------------------------------------------------------------
Physical plan (with plan simplifications)
{code}
00-00    Screen
00-01      Project(uid=[$0], trans_id=[$1], EXPR$2=[$2])
00-02        Project(uid=[$2], trans_id=[$0], EXPR$2=[ITEM($3, 'evnt_id')])
00-03          HashJoin(condition=[=($1, $4)], joinType=[inner])
00-05            HashAgg(group=[{0}], max_event_time=[MAX($1)])
00-07              Project(trans_id=[ITEM($2, 'trans_id')], $f1=[ITEM($1, 'event_time')])
00-09                SelectionVectorRemover
00-11                  Filter(condition=[>=(ITEM($2, 'trans_time'), ITEM($1, 'event_time'))])
00-13                    Project(uid=[$1], event=[$3], transaction=[$4])
00-14                      Flatten(flattenField=[$4])
00-15                        Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3], EXPR$4=[$2])
00-16                          Flatten(flattenField=[$3])
00-17                            Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$0])
00-18                              Scan(groupscan=[EasyGroupScan [selectionRoot=/flatten/single-user-transactions.json, numFiles=1, columns = [`uid`, `events`, `transactions`]]])
00-04            Project(uid=[$0], event=[$1], $f2=[ITEM($1, 'event_time')])
00-06              Project(uid=[$1], event=[$2])
00-08                Flatten(flattenField=[$2])
00-10                  Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$0])
00-12                    Scan(groupscan=[EasyGroupScan [selectionRoot=/flatten/single-user-transactions.json, numFiles=1, columns = [`uid`, `events`]]])
{code}
> JSON : Joining 2 sub-queries (one of them uses flatten) fails with "Hash Join
> does not support schema changes"
> --------------------------------------------------------------------------------------------------------------
>
> Key: DRILL-1649
> URL: https://issues.apache.org/jira/browse/DRILL-1649
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Drill, Storage - JSON
> Reporter: Rahul Challapalli
> Attachments: error.log, single-user-transactions.json
>
>
> git.commit.id.abbrev=60aa446
> I am running this test against Jason's branch, which has fixes for a few
> flatten issues.
> The query below fails:
> {code}
> select event_info.uid, transaction_info.trans_id, event_info.event.evnt_id
> from (
>   select userinfo.transaction.trans_id trans_id,
>          max(userinfo.event.event_time) max_event_time
>   from (
>     select uid, flatten(events) event, flatten(transactions) transaction
>     from `json_kvgenflatten/single-user-transactions.json`
>   ) userinfo
>   where userinfo.transaction.trans_time >= userinfo.event.event_time
>   group by userinfo.transaction.trans_id
> ) transaction_info
> inner join
> (
>   select uid, flatten(events) event
>   from `json_kvgenflatten/single-user-transactions.json`
> ) event_info
> on transaction_info.max_event_time = event_info.event.event_time;
> {code}
> The problem persists even if I create views on top of each sub-query
> and then join them:
> {code}
> create view v1 as
> select userinfo.transaction.trans_id trans_id,
>        max(userinfo.event.event_time) max_event_time
> from (
>   select uid, flatten(events) event, flatten(transactions) transaction
>   from `json_kvgenflatten/single-user-transactions.json`
> ) userinfo
> where userinfo.transaction.trans_time >= userinfo.event.event_time
> group by userinfo.transaction.trans_id;
>
> create view v2 as
> select uid, flatten(events) event
> from `json_kvgenflatten/single-user-transactions.json`;
>
> select v2.uid, v1.trans_id, v2.event.evnt_id
> from v1 inner join v2
> on v1.max_event_time = v2.event.event_time;
> {code}
> However, if I create 2 files with the exact data from the outputs of the 2
> sub-queries and try to join them, then everything works fine.
> I attached the data and the error log files. Let me know if you need anything
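
For reference, the materialize-then-join variant described in the report could look roughly like the sketch below. The report does not say how the two files were produced, so this is only one possible way: it assumes a writable `dfs.tmp` workspace and that `store.format` can be set to `json`, and the table names `t1` and `t2` are hypothetical.

```sql
-- Assumption: dfs.tmp is a writable workspace; t1 and t2 are hypothetical names.
alter session set `store.format` = 'json';

-- Materialize the output of the first sub-query (flatten + filter + aggregate).
create table dfs.tmp.`t1` as
select userinfo.transaction.trans_id trans_id,
       max(userinfo.event.event_time) max_event_time
from (
  select uid, flatten(events) event, flatten(transactions) transaction
  from `json_kvgenflatten/single-user-transactions.json`
) userinfo
where userinfo.transaction.trans_time >= userinfo.event.event_time
group by userinfo.transaction.trans_id;

-- Materialize the output of the second sub-query (flatten only).
create table dfs.tmp.`t2` as
select uid, flatten(events) event
from `json_kvgenflatten/single-user-transactions.json`;

-- Join the materialized outputs instead of the flatten sub-queries.
select t2.uid, t1.trans_id, t2.event.evnt_id
from dfs.tmp.`t1` t1 inner join dfs.tmp.`t2` t2
on t1.max_event_time = t2.event.event_time;
```

If this form succeeds while the sub-query and view forms fail, that would suggest the schema change seen by the hash join comes from the flatten operators upstream of the join rather than from the data itself.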