> On Dec. 18, 2013, 1:47 a.m., Yin Huai wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverCommonJoin.java,
> > line 242
> > <https://reviews.apache.org/r/16172/diff/1/?file=396419#file396419line242>
> >
> > aliasToKnownSize can also contain tables which will not be used in the
> > next job. For example, we have a query like SELECT ... FROM a JOIN b ON
> > (a.key1=b.key1) JOIN c ON (a.key2=b.key). Let's also assume that "a" is the
> > big table. We can first use a Map only job to do a JOIN b. Then, we should
> > evaluate the size of table c and the result of a JOIN b. But here,
> > aliasToKnownSize also has the size of table a, which will be counted in
> > sumOfOthers.
No, it's not. Below are the log messages.
[ConditionalResolverCommonJoin/resolveMapJoinTask] aliasToKnownSize : {b=11624,
c=11624, a=11624}
[ConditionalResolverCommonJoin/resolveMapJoinTask] aliases : [b, a]
[ConditionalResolverCommonJoin/resolveMapJoinTask] aliasToKnownSize : {b=11624,
c=11624, a=11624, $INTNAME=167608}
[ConditionalResolverCommonJoin/resolveMapJoinTask] aliases : [c, $INTNAME]
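To make the log above concrete, here is a minimal sketch (not the actual patch code) of summing only the aliases that take part in the current join. The class name JoinSizeSketch and the method sumOfOthers are hypothetical, and skipping aliases with no known size is a simplifying assumption; the sizes are the ones from the second pair of log lines.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class JoinSizeSketch {

  // Sums the known sizes of the participating aliases, excluding the
  // candidate big table. Entries that are in aliasToKnownSize but not in
  // 'aliases' (tables from other jobs) are never visited.
  static long sumOfOthers(Map<String, Long> aliasToKnownSize,
                          List<String> aliases, String bigTable) {
    long sum = 0;
    for (String alias : aliases) {
      if (alias.equals(bigTable)) {
        continue; // the big table is streamed, not loaded into memory
      }
      Long size = aliasToKnownSize.get(alias);
      if (size != null) { // assumption: unknown sizes are skipped here
        sum += size;
      }
    }
    return sum;
  }

  public static void main(String[] args) {
    Map<String, Long> known = new LinkedHashMap<>();
    known.put("b", 11624L);
    known.put("c", 11624L);
    known.put("a", 11624L);
    known.put("$INTNAME", 167608L);

    // Second join in the log: only c and $INTNAME participate.
    List<String> aliases = Arrays.asList("c", "$INTNAME");
    System.out.println(sumOfOthers(known, aliases, "$INTNAME")); // prints 11624
  }
}
```

With $INTNAME as the big-table candidate, only c contributes to the sum, even though a and b are still present in the size map.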
> On Dec. 18, 2013, 1:47 a.m., Yin Huai wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java,
> > line 467
> > <https://reviews.apache.org/r/16172/diff/1/?file=396418#file396418line467>
> >
> > A question that is not closely related to this issue: have we documented
> > that we prefer the rightmost alias as the big table? I also see we have
> > the same assumption in JoinOperator.
Preferring the rightmost alias is first introduced by this patch (previously
it was decided by the iteration order of aliasToWork), which changes the
result of auto_join25.q. (This part of the change is not related to this
issue, but I thought it was too confusing to understand.)
> On Dec. 18, 2013, 1:47 a.m., Yin Huai wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverCommonJoin.java,
> > line 255
> > <https://reviews.apache.org/r/16172/diff/1/?file=396419#file396419line255>
> >
> > Let's change it to log the exception instead of printing the stack
> > trace.
ok.
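A minimal sketch of that kind of change, using java.util.logging only so the example is self-contained (Hive's own logger setup is not shown, and the class and method names here are hypothetical, not the code at line 255):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LogInsteadOfPrint {
  private static final Logger LOG =
      Logger.getLogger(LogInsteadOfPrint.class.getName());

  // Returns the parsed size, or -1 when the input cannot be parsed.
  static long parseSize(String raw) {
    try {
      return Long.parseLong(raw);
    } catch (NumberFormatException e) {
      // Instead of e.printStackTrace(): pass the exception to the logger so
      // the message and stack trace go through the configured handlers.
      LOG.log(Level.WARNING, "Failed to parse size: " + raw, e);
      return -1L;
    }
  }

  public static void main(String[] args) {
    System.out.println(parseSize("11624"));
    System.out.println(parseSize("not-a-number"));
  }
}
```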
- Navis
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16172/#review30594
-----------------------------------------------------------
On Dec. 11, 2013, 2:12 a.m., Navis Ryu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16172/
> -----------------------------------------------------------
>
> (Updated Dec. 11, 2013, 2:12 a.m.)
>
>
> Review request for hive.
>
>
> Bugs: HIVE-5945
> https://issues.apache.org/jira/browse/HIVE-5945
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Here is an example
> {code}
> select
> i_item_id,
> s_state,
> avg(ss_quantity) agg1,
> avg(ss_list_price) agg2,
> avg(ss_coupon_amt) agg3,
> avg(ss_sales_price) agg4
> FROM store_sales
> JOIN date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
> JOIN item on (store_sales.ss_item_sk = item.i_item_sk)
> JOIN customer_demographics on (store_sales.ss_cdemo_sk =
> customer_demographics.cd_demo_sk)
> JOIN store on (store_sales.ss_store_sk = store.s_store_sk)
> where
> cd_gender = 'F' and
> cd_marital_status = 'U' and
> cd_education_status = 'Primary' and
> d_year = 2002 and
> s_state in ('GA','PA', 'LA', 'SC', 'MI', 'AL')
> group by
> i_item_id,
> s_state
> order by
> i_item_id,
> s_state
> limit 100;
> {code}
> I turned off noconditionaltask, so I expected 4 Map-only jobs for this
> query. However, I got 1 Map-only job (joining store_sales and date_dim) and
> 3 MR jobs (for reduce joins).
>
> So, I checked the conditional task determining the plan of the join involving
> item. In ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask,
> aliasToFileSizeMap contains all input tables used in this query and the
> intermediate table generated by joining store_sales and date_dim. So, when we
> sum the sizes of all small tables, the size of store_sales (which is around
> 45GB in my test) will also be counted.
>
>
> Diffs
> -----
>
> ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 197a20f
>
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java
> 2efa7c2
>
> ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverCommonJoin.java
> faf2f9b
>
> ql/src/test/org/apache/hadoop/hive/ql/plan/TestConditionalResolverCommonJoin.java
> 67203c9
> ql/src/test/results/clientpositive/auto_join25.q.out 7427239
>
> Diff: https://reviews.apache.org/r/16172/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Navis Ryu
>
>