[
https://issues.apache.org/jira/browse/ASTERIXDB-1030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060665#comment-15060665
]
Yingyi Bu edited comment on ASTERIXDB-1030 at 12/16/15 8:04 PM:
----------------------------------------------------------------
This is the optimized query plan with my patch:
{noformat}
distribute result [%0->$$14]
-- DISTRIBUTE_RESULT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
project ([$$14])
-- STREAM_PROJECT |PARTITIONED|
assign [$$14] <- [function-call: asterix:closed-record-constructor, Args:[AString: {id}, %0->$$16]]
-- ASSIGN |PARTITIONED|
project ([$$16])
-- STREAM_PROJECT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
join (function-call: algebricks:eq, Args:[%0->$$19, %0->$$16])
-- HYBRID_HASH_JOIN [$$16][$$19] |PARTITIONED|
exchange
-- HASH_PARTITION_EXCHANGE [$$16] |PARTITIONED|
project ([$$16])
-- STREAM_PROJECT |PARTITIONED|
assign [$$16] <- [function-call: asterix:field-access-by-index, Args:[%0->$$0, AInt32: {1}]]
-- ASSIGN |PARTITIONED|
project ([$$0])
-- STREAM_PROJECT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
data-scan []<-[$$17, $$0] <- emergencyTest:NearbySheltersDuringTornadoDangerChannelSubscriptions
-- DATASOURCE_SCAN |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
empty-tuple-source
-- EMPTY_TUPLE_SOURCE |PARTITIONED|
exchange
-- HASH_PARTITION_EXCHANGE [$$19] |PARTITIONED|
project ([$$19])
-- STREAM_PROJECT |PARTITIONED|
select (function-call: algebricks:ge, Args:[function-call: asterix:field-access-by-index, Args:[%0->$$2, AInt32: {4}], function-call: asterix:numeric-subtract, Args:[function-call: asterix:current-datetime, Args:[], org.apache.asterix.om.base.ADayTimeDuration@927c0]])
-- STREAM_SELECT |PARTITIONED|
assign [$$19] <- [function-call: asterix:field-access-by-index, Args:[%0->$$2, AInt32: {5}]]
-- ASSIGN |PARTITIONED|
project ([$$2])
-- STREAM_PROJECT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
unnest-map [$$18, $$2] <- function-call: asterix:index-search, Args:[AString: {CHPReports}, AInt32: {0}, AString: {emergencyTest}, AString: {CHPReports}, ABoolean: {false}, ABoolean: {false}, ABoolean: {false}, AInt32: {1}, %0->$$23, AInt32: {1}, %0->$$23, TRUE, TRUE, TRUE]
-- BTREE_SEARCH |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
order (ASC, %0->$$23)
-- STABLE_SORT [$$23(ASC)] |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
project ([$$23])
-- STREAM_PROJECT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
unnest-map [$$22, $$23] <- function-call: asterix:index-search, Args:[AString: {times}, AInt32: {0}, AString: {emergencyTest}, AString: {CHPReports}, ABoolean: {false}, ABoolean: {false}, ABoolean: {false}, AInt32: {1}, %0->$$21, AInt32: {0}, TRUE, TRUE, FALSE]
-- BTREE_SEARCH |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
assign [$$21] <- [function-call: asterix:numeric-subtract, Args:[function-call: asterix:current-datetime, Args:[], org.apache.asterix.om.base.ADayTimeDuration@927c0]]
-- ASSIGN |PARTITIONED|
empty-tuple-source
-- EMPTY_TUPLE_SOURCE |PARTITIONED|
{noformat}
The plan seems fine to me because it leverages the secondary index on the
timestamp field of dataset CHPReports.
What is the downside of pushing a constant through a join? Is it a correctness
issue or a performance issue?
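For readers following the plan, the index path on the CHPReports side (secondary B-tree search on the "times" index, sort of the resulting primary keys, primary B-tree search, then a re-check of the predicate) can be roughly sketched as below. This is only an illustrative model, not AsterixDB code: the function name, the in-memory dict/list structures, and the record field names "id" and "timestamp" are all assumptions made for the example.

```python
from bisect import bisect_left


def secondary_to_primary_lookup(primary, secondary, low):
    """Illustrative model of the index path in the plan (not AsterixDB code).

    primary:   dict mapping primary key -> record (hypothetical layout)
    secondary: sorted list of (timestamp, primary_key) pairs
    low:       lower bound, i.e. current-datetime minus the duration
    """
    # Secondary BTREE_SEARCH: find every entry with timestamp >= low.
    start = bisect_left(secondary, (low,))
    pks = [pk for _, pk in secondary[start:]]

    # STABLE_SORT: order primary keys before probing the primary index.
    pks.sort()

    # Primary BTREE_SEARCH followed by STREAM_SELECT, which re-applies
    # the original predicate to the fetched records.
    return [primary[pk] for pk in pks if primary[pk]["timestamp"] >= low]


# Hypothetical data; timestamps are plain integers for simplicity.
reports = {
    1: {"id": 1, "timestamp": 10},
    2: {"id": 2, "timestamp": 30},
    3: {"id": 3, "timestamp": 20},
}
times_index = sorted((r["timestamp"], pk) for pk, r in reports.items())
recent = secondary_to_primary_lookup(reports, times_index, 20)
```

With this sample data only the reports with keys 2 and 3 survive, and those are the only records that would go on to be hash-partitioned for the join.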
> Constant created between two data scans gets inserted into their join
> ---------------------------------------------------------------------
>
> Key: ASTERIXDB-1030
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1030
> Project: Apache AsterixDB
> Issue Type: Bug
> Components: AsterixDB, Optimizer
> Reporter: Steven Jacobs
> Assignee: Yingyi Bu
> Priority: Minor
> Labels: FixedInBranch
>
> Constant created between two data scans gets inserted into their join