Re: [DISCUSS] Re: deprecating MR in the first release of Hive 2.0

2015-10-30 Thread Thejas Nair
the jira - https://issues.apache.org/jira/browse/HIVE-12300




Re: [DISCUSS] Re: deprecating MR in the first release of Hive 2.0

2015-10-26 Thread Sergey Shelukhin
There appear to be no objections, so I will start by filing a JIRA :)




[DISCUSS] Re: deprecating MR in the first release of Hive 2.0

2015-10-22 Thread Thejas Nair
(Adding [DISCUSS] to the subject to bring it to the attention of a wider
audience.)

+1 Given how much investment is going into the Tez and Spark execution
modes, it makes sense to convey that better to the user community and
recommend the use of the new modes over MR. Users who choose those
modes are going to get a better experience, and it will help to improve
the overall perception of Hive.

Once most users have moved to the new modes, we can start looking into
removing MR support (though that is likely to take a while).




deprecating MR in the first release of Hive 2.0

2015-10-21 Thread Sergey Shelukhin
We have discussed removing hadoop-1 and MR support in the Hive 2 line in the
past.
Hadoop-1 removal seems to be non-controversial and on track; before we cut the 
first release of Hive 2, I propose we deprecate MR.

Tez and Spark engines provide vast perf improvements over MR.
For a long time, most contributors have done their execution optimization work
for these engines; that work is not portable to MR, so MR is falling further
behind.
At the same time, supporting the additional code has its own development costs
for new features and bug fixes, plus we have to run tests for it, both in Apache
and for local changes, and deploy the code.

However, MR is hard to remove. It can also serve as a baseline when tracking
down bugs in other engines (though not a bulletproof one, since MR logic can
itself be incorrect), or as something to mock during perf benchmarks.

Therefore, I propose that for now we add deprecation warnings suggesting the
alternatives:

  *   to Hive configuration documentation.
  *   to Hive wiki.
  *   to release notes on Hive 2.
  *   in Beeline and CLI when using MR.
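
A minimal sketch of the last item, assuming the check keys off the existing
hive.execution.engine setting. The function name and the warning text are
hypothetical, and it is written in Python for brevity; it is not Hive's actual
implementation:

```python
from typing import Mapping, Optional

# Hypothetical wording; the text of any real warning would be settled in the JIRA.
MR_DEPRECATION_WARNING = (
    "WARNING: the MR execution engine is deprecated in Hive 2; "
    "consider using tez or spark instead."
)


def deprecation_warning(conf: Mapping[str, str]) -> Optional[str]:
    """Return a warning if the session is configured to run on MR, else None."""
    # hive.execution.engine selects the engine (mr, tez, spark).
    # Treating an unset value as "mr" is an assumption made for this sketch.
    engine = conf.get("hive.execution.engine", "mr").strip().lower()
    return MR_DEPRECATION_WARNING if engine == "mr" else None


if __name__ == "__main__":
    # Show what a CLI or Beeline session would print for each engine.
    for engine in ("mr", "tez", "spark"):
        print(engine, "->", deprecation_warning({"hive.execution.engine": engine}))
```

A client would call such a check once when a session starts or when the engine
setting changes, so that users on tez or spark never see the message.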

Additionally, I propose we remove the Minimr test driver from HiveQA runs for
master.

What do you think?