[
https://issues.apache.org/jira/browse/HIVE-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859114#comment-13859114
]
Yin Huai commented on HIVE-5945:
--------------------------------
Thanks Navis :) I played with your patch and found an issue, which I commented
on at the review board. I am also attaching more info here. For the query in the
description, we can have 4 map-joins. There will be 3 different intermediate
tables called $INTNAME. The current patch does not update the size of $INTNAME.
Here are the logs.
{code}
13/12/30 16:48:25 INFO ql.Driver: MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 12.76 sec HDFS Read: 388445624 HDFS Write:
20815654 SUCCESS
13/12/30 16:48:25 INFO ql.Driver: Job 0: Map: 1 Cumulative CPU: 12.76 sec
HDFS Read: 388445624 HDFS Write: 20815654 SUCCESS
Job 1: Map: 1 Cumulative CPU: 9.18 sec HDFS Read: 20816111 HDFS Write:
28593993 SUCCESS
13/12/30 16:48:25 INFO ql.Driver: Job 1: Map: 1 Cumulative CPU: 9.18 sec
HDFS Read: 20816111 HDFS Write: 28593993 SUCCESS
Job 2: Map: 1 Cumulative CPU: 17.38 sec HDFS Read: 80660331 HDFS Write:
378063 SUCCESS
13/12/30 16:48:25 INFO ql.Driver: Job 2: Map: 1 Cumulative CPU: 17.38 sec
HDFS Read: 80660331 HDFS Write: 378063 SUCCESS
Job 3: Map: 1 Cumulative CPU: 2.06 sec HDFS Read: 378520 HDFS Write: 96
SUCCESS
13/12/30 16:48:25 INFO ql.Driver: Job 3: Map: 1 Cumulative CPU: 2.06 sec
HDFS Read: 378520 HDFS Write: 96 SUCCESS
Job 4: Map: 1 Reduce: 1 Cumulative CPU: 2.45 sec HDFS Read: 553 HDFS
Write: 96 SUCCESS
13/12/30 16:48:25 INFO ql.Driver: Job 4: Map: 1 Reduce: 1 Cumulative CPU:
2.45 sec HDFS Read: 553 HDFS Write: 96 SUCCESS
Job 5: Map: 1 Reduce: 1 Cumulative CPU: 2.33 sec HDFS Read: 553 HDFS
Write: 0 SUCCESS
13/12/30 16:48:25 INFO ql.Driver: Job 5: Map: 1 Reduce: 1 Cumulative CPU:
2.33 sec HDFS Read: 553 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 46 seconds 160 msec
{code}
{code}
Map-join1:
plan.ConditionalResolverCommonJoin: Driver alias is store_sales with size
388445409 (total size of others : 0, threshold : 25000000)
Stage-28 is selected by condition resolver.
Map-join2:
plan.ConditionalResolverCommonJoin: Driver alias is $INTNAME with size 20815654
(total size of others : 5051899, threshold : 25000000)
Stage-26 is selected by condition resolver.
Map-join3:
plan.ConditionalResolverCommonJoin: Driver alias is customer_demographics with
size 80660096 (total size of others : 20815654, threshold : 25000000)
Stage-24 is filtered out by condition resolver.
Map-join4:
plan.ConditionalResolverCommonJoin: Driver alias is $INTNAME with size 20815654
(total size of others : 3155, threshold : 25000000)
Stage-22 is selected by condition resolver.
{code}
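To make the stale-size problem concrete, here is a minimal sketch (not Hive code) that redoes the "total size of others" arithmetic for Map-join3 with the numbers from the logs above. The helper `others_total`, the `<=` comparison against the threshold, and the assumption that Job 1's HDFS Write is the real size of the $INTNAME consumed by Map-join3 are all mine.

```python
THRESHOLD = 25_000_000  # threshold printed in the resolver logs above

def others_total(driver_alias, alias_sizes):
    """Sum the sizes of every alias except the driver (big-table) alias."""
    return sum(size for alias, size in alias_sizes.items() if alias != driver_alias)

# Map-join3 as logged: $INTNAME still carries the size of Job 0's output.
stale = {"customer_demographics": 80660096, "$INTNAME": 20815654}

# Same join with $INTNAME refreshed to Job 1's HDFS Write
# (assumption: that value is the real size of this intermediate table).
refreshed = {"customer_demographics": 80660096, "$INTNAME": 28593993}

stale_others = others_total("customer_demographics", stale)
refreshed_others = others_total("customer_demographics", refreshed)

# Hypothetical check: the small side "fits" if it is at most the threshold.
print(stale_others, stale_others <= THRESHOLD)
print(refreshed_others, refreshed_others <= THRESHOLD)
```

With the stale size the small side appears to fit under the threshold; with the refreshed size it does not, which is why the size of each $INTNAME matters.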
By the way, a minor question: why does the log of map-join 1 show the total size of others as 0?
> ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those
> tables which are not used in the child of this conditional task.
> -----------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-5945
> URL: https://issues.apache.org/jira/browse/HIVE-5945
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0
> Reporter: Yin Huai
> Assignee: Navis
> Priority: Critical
> Attachments: HIVE-5945.1.patch.txt, HIVE-5945.2.patch.txt,
> HIVE-5945.3.patch.txt, HIVE-5945.4.patch.txt, HIVE-5945.5.patch.txt
>
>
> Here is an example
> {code}
> select
> i_item_id,
> s_state,
> avg(ss_quantity) agg1,
> avg(ss_list_price) agg2,
> avg(ss_coupon_amt) agg3,
> avg(ss_sales_price) agg4
> FROM store_sales
> JOIN date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
> JOIN item on (store_sales.ss_item_sk = item.i_item_sk)
> JOIN customer_demographics on (store_sales.ss_cdemo_sk =
> customer_demographics.cd_demo_sk)
> JOIN store on (store_sales.ss_store_sk = store.s_store_sk)
> where
> cd_gender = 'F' and
> cd_marital_status = 'U' and
> cd_education_status = 'Primary' and
> d_year = 2002 and
> s_state in ('GA','PA', 'LA', 'SC', 'MI', 'AL')
> group by
> i_item_id,
> s_state
> order by
> i_item_id,
> s_state
> limit 100;
> {code}
> I turned off noconditionaltask, so I expected 4 map-only
> jobs for this query. However, I got 1 map-only job (joining store_sales and
> date_dim) and 3 MR jobs (for reduce joins).
> So, I checked the conditional task that determines the plan of the join involving
> item. In ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask,
> aliasToFileSizeMap contains all input tables used in this query and the
> intermediate table generated by joining store_sales and date_dim. So, when we
> sum the sizes of all small tables, the size of store_sales (which is around
> 45GB in my test) is also counted.
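The over-counting described in the issue can be sketched as follows. This is an illustration in Python, not the Hive implementation: the function names, the `<=` threshold check, and the pairing of sizes with aliases (item and store taken from the "total size of others" values in the logs, store_sales from the roughly 45GB mentioned in the description) are assumptions.

```python
THRESHOLD = 25_000_000  # threshold value shown in the resolver logs

# Hypothetical stand-in for aliasToFileSizeMap: it holds every input of the
# query, not only the inputs of the join being resolved.
alias_to_size = {
    "store_sales": 45 * 1024**3,        # "around 45GB" per the description
    "$INTNAME": 20815654,               # output of store_sales JOIN date_dim
    "item": 5051899,                    # assumed from the Map-join2 log line
    "customer_demographics": 80660096,
    "store": 3155,                      # assumed from the Map-join4 log line
}

def naive_small_total(sizes, driver_alias):
    """Sum every alias in the map except the driver (the behavior reported as a bug)."""
    return sum(size for alias, size in sizes.items() if alias != driver_alias)

def scoped_small_total(sizes, join_aliases, driver_alias):
    """Sum only the aliases that actually take part in this join."""
    return sum(sizes[alias] for alias in join_aliases if alias != driver_alias)

# The join involving item reads only $INTNAME and item.
join_aliases = ["$INTNAME", "item"]

naive = naive_small_total(alias_to_size, "$INTNAME")
scoped = scoped_small_total(alias_to_size, join_aliases, "$INTNAME")

print(naive <= THRESHOLD)   # store_sales is counted, so the check fails
print(scoped <= THRESHOLD)  # only item is counted
```

Restricting the sum to the aliases of the join under consideration keeps unrelated tables such as store_sales out of the small-table total.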