Re: Is YSmart integrated into Hive on tez ?

2015-09-01 Thread Jeff Zhang
+ dev mail list

The original correlation optimization may have been designed for the MR
engine, but a similar optimization could be applied to Tez too. Is there an
existing JIRA to track that?



On Tue, Sep 1, 2015 at 1:58 PM, Jeff Zhang  wrote:

> Hi Pengcheng,
>
> Is there a reason why the correlation optimization is disabled in Tez?
>
> Even when I change the code to enable the correlation optimization in
> Tez, I still get the same query plan.
>
> >>> Vertex dependency in root stage
> >>> Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
> >>> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
>
> On Tue, Sep 1, 2015 at 1:14 AM, Pengcheng Xiong  wrote:
>
>> Hi Jeff,
>>
>> From the code base point of view, YSmart is integrated into Hive on Tez
>> because it is one of the optimizations in current Hive. However, from
>> the execution point of view, it is currently disabled when Hive is running
>> on Tez. You may take a look at the source code of Hive.
>>
>> Optimizer.java, L175-180:
>> {code}
>>
>> if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVEOPTCORRELATION) &&
>>     !HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVEGROUPBYSKEW) &&
>>     !HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVE_OPTIMIZE_SKEWJOIN_COMPILETIME) &&
>>     !isTezExecEngine) {
>>   transformations.add(new CorrelationOptimizer());
>> }
>> {code}
>>
>> Hope it helps.
>>
>> Best
>> Pengcheng Xiong
>>
>>
>> On Mon, Aug 31, 2015 at 12:56 AM, Jeff Zhang  wrote:
>>
>>> The reason I ask is that when I execute the following SQL, it
>>> generates a query plan with 4 vertices. As I understand it, if YSmart
>>> is integrated into Hive, the plan should need only 3 vertices, since the
>>> join key and the group-by key are the same. Does anybody know why?
>>> Thanks.
>>>
>>>
>>> insert overwrite directory '/tmp/jzhang/1'
>>> select o.o_orderkey as orderkey, count(1)
>>> from lineitem l join orders o on l.l_orderkey = o.o_orderkey
>>> group by o.o_orderkey;
>>>
>>> *YSmart Hive Jira*
>>>
>>> https://issues.apache.org/jira/browse/HIVE-2206
>>>
>>>
>>>
>>>
>>> --
>>> Best Regards
>>>
>>> Jeff Zhang
>>>
>>
>>
>
>
> --
> Best Regards
>
> Jeff Zhang
>



-- 
Best Regards

Jeff Zhang


Re: Is YSmart integrated into Hive on tez ?

2015-08-31 Thread Pengcheng Xiong
Hi Jeff,

From the code base point of view, YSmart is integrated into Hive on Tez
because it is one of the optimizations in current Hive. However, from
the execution point of view, it is currently disabled when Hive is running
on Tez. You may take a look at the source code of Hive.

Optimizer.java, L175-180:
{code}

if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVEOPTCORRELATION) &&
    !HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVEGROUPBYSKEW) &&
    !HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVE_OPTIMIZE_SKEWJOIN_COMPILETIME) &&
    !isTezExecEngine) {
  transformations.add(new CorrelationOptimizer());
}
{code}
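
To make the effect of that condition concrete, here is a minimal standalone sketch that models it with a plain map instead of HiveConf. The class and method names (CorrelationGateSketch, wouldAddCorrelationOptimizer) are hypothetical and are not part of Hive. The property names and the defaults used here (all three flags false, engine "mr") are assumptions about how the ConfVars above map to settings, so please verify them against your Hive version.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the check quoted from Optimizer.java.
// It does not use any Hive classes; settings live in a plain map.
class CorrelationGateSketch {

    // Reads a boolean setting, falling back to the given default when unset.
    static boolean getBool(Map<String, String> conf, String key, boolean dflt) {
        String v = conf.get(key);
        return v == null ? dflt : Boolean.parseBoolean(v);
    }

    // Mirrors the quoted condition: correlation enabled, both skew
    // options off, and the execution engine is not Tez.
    // Property names and defaults are assumptions for illustration.
    static boolean wouldAddCorrelationOptimizer(Map<String, String> conf) {
        boolean isTezExecEngine =
            "tez".equals(conf.getOrDefault("hive.execution.engine", "mr"));
        return getBool(conf, "hive.optimize.correlation", false)
            && !getBool(conf, "hive.groupby.skewindata", false)
            && !getBool(conf, "hive.optimize.skewjoin.compiletime", false)
            && !isTezExecEngine;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("hive.optimize.correlation", "true");

        conf.put("hive.execution.engine", "mr");
        System.out.println("mr:  " + wouldAddCorrelationOptimizer(conf));

        conf.put("hive.execution.engine", "tez");
        System.out.println("tez: " + wouldAddCorrelationOptimizer(conf));
    }
}
```

Under these assumptions the sketch reports true for mr and false for tez, so turning on the correlation flag alone does not add the optimizer to the transformation list when the engine is Tez.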

Hope it helps.

Best
Pengcheng Xiong


On Mon, Aug 31, 2015 at 12:56 AM, Jeff Zhang  wrote:

> The reason I ask is that when I execute the following SQL, it
> generates a query plan with 4 vertices. As I understand it, if YSmart
> is integrated into Hive, the plan should need only 3 vertices, since the
> join key and the group-by key are the same. Does anybody know why?
> Thanks.
>
>
> insert overwrite directory '/tmp/jzhang/1'
> select o.o_orderkey as orderkey, count(1)
> from lineitem l join orders o on l.l_orderkey = o.o_orderkey
> group by o.o_orderkey;
>
> *YSmart Hive Jira*
>
> https://issues.apache.org/jira/browse/HIVE-2206
>
>
>
>
> --
> Best Regards
>
> Jeff Zhang
>