[jira] [Commented] (HIVE-16433) Not nullify variable "rj" to avoid NPE due to race condition in ExecDriver.

2017-04-13 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967219#comment-15967219
 ] 

zhihai xu commented on HIVE-16433:
--

Test 
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 succeeded in my local build.
None of these test failures are related to my patch.

> Not nullify variable "rj" to avoid NPE due to race condition in ExecDriver.
> ---
>
> Key: HIVE-16433
> URL: https://issues.apache.org/jira/browse/HIVE-16433
> Project: Hive
>  Issue Type: Bug
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
> Attachments: HIVE-16433.000.patch
>
>
> Not nullify variable {{rj}} to avoid NPE due to race condition in ExecDriver. 
> Currently {{rj}} is set to null in ExecDriver.shutdown, which is called from 
> another thread for query cancellation and can happen at any time. There is a 
> potential race condition: the rj is still accessed after shutdown is called, 
> for example if the following code is executed right after 
> ExecDriver.shutdown is called.
> {code}
>   this.jobID = rj.getJobID();
>   updateStatusInQueryDisplay();
>   returnVal = jobExecHelper.progress(rj, jc, ctx);
> {code}
> Currently the purpose of nullifying rj is mainly to make sure 
> {{rj.killJob()}} is only called once.
> I will add a flag jobKilled to make sure {{rj.killJob()}} will only be called 
> once.
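
A minimal, self-contained sketch of the approach described above (not the actual HIVE-16433 patch): keep the job handle non-null and guard the kill with a flag so {{rj.killJob()}} runs at most once. The RunningJob interface and field names below are simplified stand-ins for the real ExecDriver members.
{code}
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal stand-in for the Hadoop job handle used by ExecDriver.
interface RunningJob {
  String getJobID();
  void killJob() throws Exception;
}

class ExecDriverSketch {
  // The handle is never reset to null, so progress polling cannot hit an NPE.
  private volatile RunningJob rj;
  // Flag guaranteeing killJob() is invoked at most once.
  private final AtomicBoolean jobKilled = new AtomicBoolean(false);

  void setRunningJob(RunningJob job) {
    this.rj = job;
  }

  // Called from another thread on query cancellation.
  void shutdown() {
    RunningJob job = rj;
    if (job != null && jobKilled.compareAndSet(false, true)) {
      try {
        job.killJob();
      } catch (Exception e) {
        // log and move on; the job may already have finished
      }
    }
  }

  // Polling side: safe even if shutdown() runs concurrently.
  String pollJobId() {
    return rj == null ? null : rj.getJobID();
  }
}
{code}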



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16430) Add log to show the cancelled query id when cancelOperation is called.

2017-04-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967225#comment-15967225
 ] 

Hive QA commented on HIVE-16430:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12863205/HIVE-16430.001.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10571 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[varchar_join1] 
(batchId=5)
org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery 
(batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4672/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4672/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4672/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12863205 - PreCommit-HIVE-Build

> Add log to show the cancelled query id when cancelOperation is called.
> --
>
> Key: HIVE-16430
> URL: https://issues.apache.org/jira/browse/HIVE-16430
> Project: Hive
>  Issue Type: Improvement
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Trivial
> Attachments: HIVE-16430.000.patch, HIVE-16430.001.patch
>
>
> Add log to show the cancelled query id when cancelOperation is called.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16418) Allow HiveKey to skip some bytes for comparison

2017-04-13 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-16418:
--
Attachment: HIVE-16418.1.patch

> Allow HiveKey to skip some bytes for comparison
> ---
>
> Key: HIVE-16418
> URL: https://issues.apache.org/jira/browse/HIVE-16418
> Project: Hive
>  Issue Type: New Feature
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-16418.1.patch
>
>
> The feature is required when we have to serialize some fields and prevent 
> them from being used in comparison, e.g. HIVE-14412.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16418) Allow HiveKey to skip some bytes for comparison

2017-04-13 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-16418:
--
Status: Patch Available  (was: Open)

> Allow HiveKey to skip some bytes for comparison
> ---
>
> Key: HIVE-16418
> URL: https://issues.apache.org/jira/browse/HIVE-16418
> Project: Hive
>  Issue Type: New Feature
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-16418.1.patch
>
>
> The feature is required when we have to serialize some fields and prevent 
> them from being used in comparison, e.g. HIVE-14412.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16418) Allow HiveKey to skip some bytes for comparison

2017-04-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967305#comment-15967305
 ] 

Hive QA commented on HIVE-16418:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12863216/HIVE-16418.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10573 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_decimal] 
(batchId=9)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_count_distinct]
 (batchId=109)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4673/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4673/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4673/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12863216 - PreCommit-HIVE-Build

> Allow HiveKey to skip some bytes for comparison
> ---
>
> Key: HIVE-16418
> URL: https://issues.apache.org/jira/browse/HIVE-16418
> Project: Hive
>  Issue Type: New Feature
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-16418.1.patch
>
>
> The feature is required when we have to serialize some fields and prevent 
> them from being used in comparison, e.g. HIVE-14412.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16418) Allow HiveKey to skip some bytes for comparison

2017-04-13 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967307#comment-15967307
 ] 

Gopal V commented on HIVE-16418:


Please explain the behaviour of TimestampTZ when it comes to equality across 
all operators - is the same UTC timestamp from different timezones equal or not?

If I have two rows with 

TimestampTZ('2005-04-03 10:01:00','Asia/Shanghai')
TimestampTZ('2005-04-03 10:01:00 GMT+08:00')

What is the output of count(distinct tzcol)?

What is the output of select distinct tzcol from table?

What will be the output of select max(tzcol), min(tzcol) from table?

Ordering fixes need to come after working out equality conditions when it comes 
to group-by, distinct, joins and over() partitions (some of those are driven by 
hash equality as well).
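
For concreteness (plain java.time, not Hive code): the two example values denote the same instant, since Asia/Shanghai is UTC+08:00 on that date, so the questions above hinge on whether Hive compares the underlying instants or the original (instant, zone) pairs.
{code}
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

public class TimestampTzEquality {
  public static void main(String[] args) {
    // TimestampTZ('2005-04-03 10:01:00','Asia/Shanghai')
    ZonedDateTime shanghai =
        ZonedDateTime.of(2005, 4, 3, 10, 1, 0, 0, ZoneId.of("Asia/Shanghai"));
    // TimestampTZ('2005-04-03 10:01:00 GMT+08:00')
    OffsetDateTime fixedOffset =
        OffsetDateTime.of(2005, 4, 3, 10, 1, 0, 0, ZoneOffset.ofHours(8));

    // Same point on the UTC timeline...
    System.out.println(shanghai.toInstant().equals(fixedOffset.toInstant())); // true
    // ...but different zone/offset representations.
    System.out.println(shanghai + "  vs  " + fixedOffset);
  }
}
{code}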

> Allow HiveKey to skip some bytes for comparison
> ---
>
> Key: HIVE-16418
> URL: https://issues.apache.org/jira/browse/HIVE-16418
> Project: Hive
>  Issue Type: New Feature
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-16418.1.patch
>
>
> The feature is required when we have to serialize some fields and prevent 
> them from being used in comparison, e.g. HIVE-14412.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-16418) Allow HiveKey to skip some bytes for comparison

2017-04-13 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967307#comment-15967307
 ] 

Gopal V edited comment on HIVE-16418 at 4/13/17 8:57 AM:
-

Please explain the behaviour of TimestampTZ when it comes to equality across 
all operators - is the same UTC timestamp from different timezones equal or not?

If I have two rows with 

TimestampTZ('2005-04-03 10:01:00','Asia/Shanghai')
TimestampTZ('2005-04-03 10:01:00 GMT+08:00')

What is the output of count(distinct tzcol)?

What is the output of select distinct tzcol from table?

What will be the output of select max(tzcol), min(tzcol) from table?

Ordering fixes need to come after working out equality conditions when it comes 
to group-by, distinct, joins and over() partitions (some of those are driven by 
hash equality as well).

This feature has to be -1, if it does not actually handle all these logical 
operations consistently in all execution engines.


was (Author: gopalv):
Please explain the behaviour of TimestampTZ when it comes to equality across 
all operators - is the same UTC timestamp from different timezones equal or not?

If I have two rows with 

TimestampTZ('2005-04-03 10:01:00','Asia/Shanghai')
TimestampTZ('2005-04-03 10:01:00 GMT+08:00')

What is the output of count(distinct tzcol)?

What is the output of select distinct tzcol from table?

What will be the output of select max(tzcol), min(tzcol) from table?

Ordering fixes need to come after working out equality conditions when it comes 
to group-by, distinct, joins and over() partitions (some of those are driven by 
hash equality as well).

> Allow HiveKey to skip some bytes for comparison
> ---
>
> Key: HIVE-16418
> URL: https://issues.apache.org/jira/browse/HIVE-16418
> Project: Hive
>  Issue Type: New Feature
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-16418.1.patch
>
>
> The feature is required when we have to serialize some fields and prevent 
> them from being used in comparison, e.g. HIVE-14412.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16436) Response times in "Task Execution Summary" at the end of the job is not correct

2017-04-13 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-16436:

Attachment: HIVE-16436.1.patch

> Response times in "Task Execution Summary" at the end of the job is not 
> correct
> ---
>
> Key: HIVE-16436
> URL: https://issues.apache.org/jira/browse/HIVE-16436
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
> Attachments: HIVE-16436.1.patch
>
>
> "Task execution summary" is printed at the of running a hive query. E.g
> {noformat}
> Task Execution Summary
> --
>   VERTICES  DURATION(ms)   CPU_TIME(ms)GC_TIME(ms)   INPUT_RECORDS   
> OUTPUT_RECORDS
> --
>  Map 1 277869.00  0  0   1,500,000,000
> 1,500,000,000
>  Map 2 277868.00  0  0   5,999,989,709
>31,162,299
>  Reducer 3  59875.00  0  0   1,531,162,299
> 2,018
>  Reducer 4   2436.00  0  0   2,018
> 2
>  Reducer 5375.00  0  0   2
> 0
> --
> {noformat}
> Response times reported here for Map-1/Map-2 are not correct. Not sure if 
> this is broken due to any other patch. Creating this JIRA for tracking 
> purposes.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16436) Response times in "Task Execution Summary" at the end of the job is not correct

2017-04-13 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan reassigned HIVE-16436:
---

Assignee: Rajesh Balamohan

> Response times in "Task Execution Summary" at the end of the job is not 
> correct
> ---
>
> Key: HIVE-16436
> URL: https://issues.apache.org/jira/browse/HIVE-16436
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: HIVE-16436.1.patch
>
>
> "Task execution summary" is printed at the of running a hive query. E.g
> {noformat}
> Task Execution Summary
> --
>   VERTICES  DURATION(ms)   CPU_TIME(ms)GC_TIME(ms)   INPUT_RECORDS   
> OUTPUT_RECORDS
> --
>  Map 1 277869.00  0  0   1,500,000,000
> 1,500,000,000
>  Map 2 277868.00  0  0   5,999,989,709
>31,162,299
>  Reducer 3  59875.00  0  0   1,531,162,299
> 2,018
>  Reducer 4   2436.00  0  0   2,018
> 2
>  Reducer 5375.00  0  0   2
> 0
> --
> {noformat}
> Response times reported here for Map-1/Map-2 are not correct. Not sure if 
> this is broken due to any other patch. Creating this JIRA for tracking 
> purposes.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16436) Response times in "Task Execution Summary" at the end of the job is not correct

2017-04-13 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967338#comment-15967338
 ] 

Rajesh Balamohan commented on HIVE-16436:
-

{{perfLogger.PerfLogEnd}} was getting invoked for every report causing this 
issue.
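
A minimal sketch of that fix idea, using a simplified stand-in for Hive's PerfLogger rather than the real class: record a vertex's end time only on the first report where it is fully complete, so later status reports cannot overwrite it.
{code}
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-in for Hive's PerfLogger; not the real class.
class SimplePerfLogger {
  private final Map<String, Long> begin = new ConcurrentHashMap<>();
  private final Map<String, Long> end = new ConcurrentHashMap<>();

  void perfLogBegin(String key) { begin.putIfAbsent(key, System.currentTimeMillis()); }
  void perfLogEnd(String key)   { end.put(key, System.currentTimeMillis()); }
  Long durationMs(String key)   { return end.get(key) - begin.get(key); }
}

class ProgressMonitorSketch {
  private final SimplePerfLogger perfLogger = new SimplePerfLogger();
  // Vertices whose end time has already been recorded.
  private final Set<String> endedVertices = new HashSet<>();

  // Called for every status report. Only the first report where the vertex is
  // fully complete closes the timer, so later reports (which also satisfy
  // completed == total) can no longer overwrite the end time.
  void onStatusReport(String vertex, int completed, int total) {
    if (completed == total && endedVertices.add(vertex)) {
      perfLogger.perfLogEnd("RunVertex-" + vertex);
    }
  }
}
{code}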

> Response times in "Task Execution Summary" at the end of the job is not 
> correct
> ---
>
> Key: HIVE-16436
> URL: https://issues.apache.org/jira/browse/HIVE-16436
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: HIVE-16436.1.patch
>
>
> "Task execution summary" is printed at the of running a hive query. E.g
> {noformat}
> Task Execution Summary
> --
>   VERTICES  DURATION(ms)   CPU_TIME(ms)GC_TIME(ms)   INPUT_RECORDS   
> OUTPUT_RECORDS
> --
>  Map 1 277869.00  0  0   1,500,000,000
> 1,500,000,000
>  Map 2 277868.00  0  0   5,999,989,709
>31,162,299
>  Reducer 3  59875.00  0  0   1,531,162,299
> 2,018
>  Reducer 4   2436.00  0  0   2,018
> 2
>  Reducer 5375.00  0  0   2
> 0
> --
> {noformat}
> Response times reported here for Map-1/Map-2 are not correct. Not sure if 
> this is broken due to any other patch. Creating this JIRA for tracking 
> purposes.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16436) Response times in "Task Execution Summary" at the end of the job is not correct

2017-04-13 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-16436:

Status: Patch Available  (was: Open)

> Response times in "Task Execution Summary" at the end of the job is not 
> correct
> ---
>
> Key: HIVE-16436
> URL: https://issues.apache.org/jira/browse/HIVE-16436
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: HIVE-16436.1.patch
>
>
> "Task execution summary" is printed at the of running a hive query. E.g
> {noformat}
> Task Execution Summary
> --
>   VERTICES  DURATION(ms)   CPU_TIME(ms)GC_TIME(ms)   INPUT_RECORDS   
> OUTPUT_RECORDS
> --
>  Map 1 277869.00  0  0   1,500,000,000
> 1,500,000,000
>  Map 2 277868.00  0  0   5,999,989,709
>31,162,299
>  Reducer 3  59875.00  0  0   1,531,162,299
> 2,018
>  Reducer 4   2436.00  0  0   2,018
> 2
>  Reducer 5375.00  0  0   2
> 0
> --
> {noformat}
> Response times reported here for Map-1/Map-2 are not correct. Not sure if 
> this is broken due to any other patch. Creating this JIRA for tracking 
> purposes.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-16436) Response times in "Task Execution Summary" at the end of the job is not correct

2017-04-13 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967338#comment-15967338
 ] 

Rajesh Balamohan edited comment on HIVE-16436 at 4/13/17 9:28 AM:
--

{{perfLogger.PerfLogEnd}} was getting invoked for every report causing this 
issue.

\cc [~anishek], [~gopalv]


was (Author: rajesh.balamohan):
{{perfLogger.PerfLogEnd}} was getting invoked for every report causing this 
issue.

> Response times in "Task Execution Summary" at the end of the job is not 
> correct
> ---
>
> Key: HIVE-16436
> URL: https://issues.apache.org/jira/browse/HIVE-16436
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: HIVE-16436.1.patch
>
>
> "Task execution summary" is printed at the of running a hive query. E.g
> {noformat}
> Task Execution Summary
> --
>   VERTICES  DURATION(ms)   CPU_TIME(ms)GC_TIME(ms)   INPUT_RECORDS   
> OUTPUT_RECORDS
> --
>  Map 1 277869.00  0  0   1,500,000,000
> 1,500,000,000
>  Map 2 277868.00  0  0   5,999,989,709
>31,162,299
>  Reducer 3  59875.00  0  0   1,531,162,299
> 2,018
>  Reducer 4   2436.00  0  0   2,018
> 2
>  Reducer 5375.00  0  0   2
> 0
> --
> {noformat}
> Response times reported here for Map-1/Map-2 are not correct. Not sure if 
> this is broken due to any other patch. Creating this JIRA for tracking 
> purposes.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16436) Response times in "Task Execution Summary" at the end of the job is not correct

2017-04-13 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967399#comment-15967399
 ] 

Gopal V commented on HIVE-16436:


Alright, completed == total holds until the end of the query, causing the perf 
logger to keep resetting the endTime.

+1 

> Response times in "Task Execution Summary" at the end of the job is not 
> correct
> ---
>
> Key: HIVE-16436
> URL: https://issues.apache.org/jira/browse/HIVE-16436
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: HIVE-16436.1.patch
>
>
> "Task execution summary" is printed at the of running a hive query. E.g
> {noformat}
> Task Execution Summary
> --
>   VERTICES  DURATION(ms)   CPU_TIME(ms)GC_TIME(ms)   INPUT_RECORDS   
> OUTPUT_RECORDS
> --
>  Map 1 277869.00  0  0   1,500,000,000
> 1,500,000,000
>  Map 2 277868.00  0  0   5,999,989,709
>31,162,299
>  Reducer 3  59875.00  0  0   1,531,162,299
> 2,018
>  Reducer 4   2436.00  0  0   2,018
> 2
>  Reducer 5375.00  0  0   2
> 0
> --
> {noformat}
> Response times reported here for Map-1/Map-2 are not correct. Not sure if 
> this is broken due to any other patch. Creating this JIRA for tracking 
> purposes.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16436) Response times in "Task Execution Summary" at the end of the job is not correct

2017-04-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967416#comment-15967416
 ] 

Hive QA commented on HIVE-16436:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12863230/HIVE-16436.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10571 tests 
executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.pig.TestRCFileHCatStorer.testWriteDecimalX 
(batchId=178)
org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteDate (batchId=178)
org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteDate2 (batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4674/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4674/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4674/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12863230 - PreCommit-HIVE-Build

> Response times in "Task Execution Summary" at the end of the job is not 
> correct
> ---
>
> Key: HIVE-16436
> URL: https://issues.apache.org/jira/browse/HIVE-16436
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: HIVE-16436.1.patch
>
>
> "Task execution summary" is printed at the of running a hive query. E.g
> {noformat}
> Task Execution Summary
> --
>   VERTICES  DURATION(ms)   CPU_TIME(ms)GC_TIME(ms)   INPUT_RECORDS   
> OUTPUT_RECORDS
> --
>  Map 1 277869.00  0  0   1,500,000,000
> 1,500,000,000
>  Map 2 277868.00  0  0   5,999,989,709
>31,162,299
>  Reducer 3  59875.00  0  0   1,531,162,299
> 2,018
>  Reducer 4   2436.00  0  0   2,018
> 2
>  Reducer 5375.00  0  0   2
> 0
> --
> {noformat}
> Response times reported here for Map-1/Map-2 are not correct. Not sure if 
> this is broken due to any other patch. Creating this JIRA for tracking 
> purposes.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16418) Allow HiveKey to skip some bytes for comparison

2017-04-13 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967464#comment-15967464
 ] 

Rui Li commented on HIVE-16418:
---

[~gopalv] - thanks for the review.
My plan is to only allow GMT timezone format, which means '2005-04-03 10:01:00 
Asia/Shanghai' will be converted to '2005-04-03 10:01:00 GMT+08:00' internally. 
Per Jason's 
[comment|https://issues.apache.org/jira/browse/HIVE-14412?focusedCommentId=15527345&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15527345],
 the timezone part shouldn't be used for comparison. Therefore, '2005-04-03 
10:01:00 GMT+08:00' == '2005-04-03 02:01:00 GMT'. And if you run a 
count(distinct) on these two timestamps, the result should be 1.

I agree this may cause some confusion in queries with distinct/groupBy like you 
mentioned. [~jdere], [~xuefuz], could you please share how this should be 
handled according to the SQL standard?

This patch could have been included in HIVE-14412, but I'd like to get some 
early feedback and suggestions. The basic idea is to store all the 
non-comparable bytes at the beginning of HiveKey. A boolean is added to HiveKey 
to indicate whether such bytes exist, and these bytes will be skipped 
accordingly in comparison. In the serialized format, the boolean will be encoded 
using the MSB of the length part. Does this make sense?
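
To illustrate the comparison rule being proposed (a sketch only, not the patch): ordering ignores a block of non-comparable bytes at the front of the key. In the actual proposal the "has prefix" boolean is packed into the MSB of the serialized length; here it is passed explicitly for clarity.
{code}
// Sketch of the proposed comparison rule; HiveKey/WritableComparator details
// are omitted and the "has prefix" flag is passed explicitly here.
class SkippingKeyComparator {

  // Compare key bytes, ignoring the first `skip` bytes of a key when that key
  // carries non-comparable bytes at the front.
  static int compare(byte[] a, boolean aHasPrefix, byte[] b, boolean bHasPrefix, int skip) {
    int aStart = aHasPrefix ? skip : 0;
    int bStart = bHasPrefix ? skip : 0;
    return compareBytes(a, aStart, a.length - aStart, b, bStart, b.length - bStart);
  }

  // Lexicographic comparison of unsigned bytes.
  static int compareBytes(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
    int n = Math.min(aLen, bLen);
    for (int i = 0; i < n; i++) {
      int x = a[aOff + i] & 0xff;
      int y = b[bOff + i] & 0xff;
      if (x != y) {
        return x - y;
      }
    }
    return aLen - bLen;
  }
}
{code}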

> Allow HiveKey to skip some bytes for comparison
> ---
>
> Key: HIVE-16418
> URL: https://issues.apache.org/jira/browse/HIVE-16418
> Project: Hive
>  Issue Type: New Feature
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-16418.1.patch
>
>
> The feature is required when we have to serialize some fields and prevent 
> them from being used in comparison, e.g. HIVE-14412.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16418) Allow HiveKey to skip some bytes for comparison

2017-04-13 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967569#comment-15967569
 ] 

Gopal V commented on HIVE-16418:


bq. The basic idea is to store all the non-comparable bytes at the beginning of 
HiveKey.

That's the part that confuses me: what is in these bytes?

From the standard perspective, I usually use Postgres as a "good 
interpretation" - Postgres *does* not store the TZ information provided by the 
user after parsing, so it never has to handle 2 different serialized 
timestamps which are equivalent (but not equal as serialized bytes).

{code}
8.5.1.3. Time Stamps

For timestamp with time zone, the internally stored value is always in UTC 
(Universal Coordinated Time, traditionally known as Greenwich Mean Time, GMT). 
An input value that has an explicit time zone specified is converted to UTC 
using the appropriate offset for that time zone. If no time zone is stated in 
the input string, then it is assumed to be in the time zone indicated by the 
system's timezone parameter, and is converted to UTC using the offset for the 
timezone zone.
...
When a timestamp with time zone value is output, it is always converted from 
UTC to the current timezone zone, and displayed as local time in that zone.
{code}

If Hive follows a similar interpretation of the standard, the BinarySortable 
and LazyBinary representations of Timestamp_with_timezone are identical to 
the Timestamp implementation that already exists.
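
A small sketch of that point (plain Java, assuming epoch-millis storage): once both inputs are normalized to the same UTC instant at parse time, a sort-friendly binary encoding of the instant is byte-for-byte identical for both rows, so no bytes need to be skipped in comparison.
{code}
import java.nio.ByteBuffer;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.Arrays;

public class SortableInstantEncoding {
  // Encode epoch millis as 8 big-endian bytes with the sign bit flipped, so
  // unsigned lexicographic byte order matches numeric order (the usual trick
  // sortable binary encodings use for longs).
  static byte[] encode(long epochMillis) {
    return ByteBuffer.allocate(8).putLong(epochMillis ^ Long.MIN_VALUE).array();
  }

  public static void main(String[] args) {
    long a = ZonedDateTime.of(2005, 4, 3, 10, 1, 0, 0,
        ZoneId.of("Asia/Shanghai")).toInstant().toEpochMilli();
    long b = ZonedDateTime.of(2005, 4, 3, 10, 1, 0, 0,
        ZoneOffset.ofHours(8)).toInstant().toEpochMilli();

    // Same instant, hence identical key bytes for both rows.
    System.out.println(Arrays.equals(encode(a), encode(b))); // true
  }
}
{code}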

> Allow HiveKey to skip some bytes for comparison
> ---
>
> Key: HIVE-16418
> URL: https://issues.apache.org/jira/browse/HIVE-16418
> Project: Hive
>  Issue Type: New Feature
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-16418.1.patch
>
>
> The feature is required when we have to serialize some fields and prevent 
> them from being used in comparison, e.g. HIVE-14412.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16316) Prepare master branch for 3.0.0 development.

2017-04-13 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967743#comment-15967743
 ] 

Naveen Gangam commented on HIVE-16316:
--

[~wzheng] Thanks for the review. I have committed the fix to master. 

> Prepare master branch for 3.0.0 development.
> 
>
> Key: HIVE-16316
> URL: https://issues.apache.org/jira/browse/HIVE-16316
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Fix For: 3.0.0
>
> Attachments: HIVE-16316-addendum.patch, HIVE-16316.patch
>
>
> master branch is now being used for 3.0.0 development. The build files will 
> need to reflect this change.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-16316) Prepare master branch for 3.0.0 development.

2017-04-13 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15966277#comment-15966277
 ] 

Naveen Gangam edited comment on HIVE-16316 at 4/13/17 3:31 PM:
---

Add-on patch to change the upgrade path from 2.2 --> 3.0 to 2.2 --> 2.3 --> 3.0.
[~pxiong] [~aihuaxu] Could you please review? Thanks


was (Author: ngangam):
Add-on patch to change the upgrade patch from 2.2 --> 3.0 to 2.2 --> 2.3 --> 
3.0.
[~pxiong] [~aihuaxu] Could you please review? Thanks

> Prepare master branch for 3.0.0 development.
> 
>
> Key: HIVE-16316
> URL: https://issues.apache.org/jira/browse/HIVE-16316
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Fix For: 3.0.0
>
> Attachments: HIVE-16316-addendum.patch, HIVE-16316.patch
>
>
> master branch is now being used for 3.0.0 development. The build files will 
> need to reflect this change.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16430) Add log to show the cancelled query id when cancelOperation is called.

2017-04-13 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967757#comment-15967757
 ] 

zhihai xu commented on HIVE-16430:
--

Test org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[varchar_join1] 
succeeded in my local build.
None of these test failures are related to my patch.

> Add log to show the cancelled query id when cancelOperation is called.
> --
>
> Key: HIVE-16430
> URL: https://issues.apache.org/jira/browse/HIVE-16430
> Project: Hive
>  Issue Type: Improvement
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Trivial
> Attachments: HIVE-16430.000.patch, HIVE-16430.001.patch
>
>
> Add log to show the cancelled query id when cancelOperation is called.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-15571) Support Insert into for druid storage handler

2017-04-13 Thread Nishant Bangarwa (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishant Bangarwa reassigned HIVE-15571:
---

Assignee: Nishant Bangarwa  (was: slim bouguerra)

> Support Insert into for druid storage handler
> -
>
> Key: HIVE-15571
> URL: https://issues.apache.org/jira/browse/HIVE-15571
> Project: Hive
>  Issue Type: New Feature
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: Nishant Bangarwa
>
> Add support for the insert into operator in the Druid storage handler.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16433) Not nullify variable "rj" to avoid NPE due to race condition in ExecDriver.

2017-04-13 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated HIVE-16433:
-
Description: 
Not nullify variable {{rj}} to avoid NPE due to race condition in ExecDriver. 
currently  {{rj}} is set to null in ExecDriver.shutdown which is called from 
other thread for query cancellation. It can happen at any time. There is a 
potential race condition,  the rj is still accessed after shutdown is called. 
For example: if the following code is executed right after ExecDriver.shutdown 
is called.
{code}
  this.jobID = rj.getJobID();
  updateStatusInQueryDisplay();
  returnVal = jobExecHelper.progress(rj, jc, ctx);
{code}
Currently the purpose of nullifying  {{rj}} is mainly to make sure 
{{rj.killJob()}} is only called once.
I will add a flag {{jobKilled}} to make sure {{rj.killJob()}} will be only 
called once.

  was:
Not nullify variable {{rj}} to avoid NPE due to race condition in ExecDriver. 
currently  {{rj}} is set to null in ExecDriver.shutdown which is called from 
other thread for query cancellation. It can happen at any time. There is a 
potential race condition,  the rj is still accessed after shutdown is called. 
For example: if the following code is executed right after ExecDriver.shutdown 
is called.
{code}
  this.jobID = rj.getJobID();
  updateStatusInQueryDisplay();
  returnVal = jobExecHelper.progress(rj, jc, ctx);
{code}
Currently the purpose of nullifying  {{rj}} is mainly to make sure 
{{rj.killJob()}} is only called once.
I will add a flag jobKilled to make sure {{rj.killJob()}} will be only called 
once.


> Not nullify variable "rj" to avoid NPE due to race condition in ExecDriver.
> ---
>
> Key: HIVE-16433
> URL: https://issues.apache.org/jira/browse/HIVE-16433
> Project: Hive
>  Issue Type: Bug
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
> Attachments: HIVE-16433.000.patch
>
>
> Not nullify variable {{rj}} to avoid NPE due to race condition in ExecDriver. 
> currently  {{rj}} is set to null in ExecDriver.shutdown which is called from 
> other thread for query cancellation. It can happen at any time. There is a 
> potential race condition,  the rj is still accessed after shutdown is called. 
> For example: if the following code is executed right after 
> ExecDriver.shutdown is called.
> {code}
>   this.jobID = rj.getJobID();
>   updateStatusInQueryDisplay();
>   returnVal = jobExecHelper.progress(rj, jc, ctx);
> {code}
> Currently the purpose of nullifying  {{rj}} is mainly to make sure 
> {{rj.killJob()}} is only called once.
> I will add a flag {{jobKilled}} to make sure {{rj.killJob()}} will be only 
> called once.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16433) Not nullify variable "rj" to avoid NPE due to race condition in ExecDriver.

2017-04-13 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated HIVE-16433:
-
Description: 
Not nullify variable {{rj}} to avoid NPE due to race condition in ExecDriver. 
currently  {{rj}} is set to null in ExecDriver.shutdown which is called from 
other thread for query cancellation. It can happen at any time. There is a 
potential race condition,  the rj is still accessed after shutdown is called. 
For example: if the following code is executed right after ExecDriver.shutdown 
is called.
{code}
  this.jobID = rj.getJobID();
  updateStatusInQueryDisplay();
  returnVal = jobExecHelper.progress(rj, jc, ctx);
{code}
Currently the purpose of nullifying  {{rj}} is mainly to make sure 
{{rj.killJob()}} is only called once.
I will add a flag jobKilled to make sure {{rj.killJob()}} will be only called 
once.

  was:
Not nullify variable {{rj}} to avoid NPE due to race condition in ExecDriver. 
currently  {{rj}} is set to null in ExecDriver.shutdown which is called from 
other thread for query cancellation. It can happen at any time. There is a 
potential race condition,  the rj is still accessed after shutdown is called. 
For example: if the following code is executed right after ExecDriver.shutdown 
is called.
{code}
  this.jobID = rj.getJobID();
  updateStatusInQueryDisplay();
  returnVal = jobExecHelper.progress(rj, jc, ctx);
{code}
Currently the purpose of nullifying  rj is mainly to make sure {{rj.killJob()}} 
is only called once.
I will add a flag jobKilled to make sure {{rj.killJob()}} will be only called 
once.


> Not nullify variable "rj" to avoid NPE due to race condition in ExecDriver.
> ---
>
> Key: HIVE-16433
> URL: https://issues.apache.org/jira/browse/HIVE-16433
> Project: Hive
>  Issue Type: Bug
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
> Attachments: HIVE-16433.000.patch
>
>
> Not nullify variable {{rj}} to avoid NPE due to race condition in ExecDriver. 
> currently  {{rj}} is set to null in ExecDriver.shutdown which is called from 
> other thread for query cancellation. It can happen at any time. There is a 
> potential race condition,  the rj is still accessed after shutdown is called. 
> For example: if the following code is executed right after 
> ExecDriver.shutdown is called.
> {code}
>   this.jobID = rj.getJobID();
>   updateStatusInQueryDisplay();
>   returnVal = jobExecHelper.progress(rj, jc, ctx);
> {code}
> Currently the purpose of nullifying  {{rj}} is mainly to make sure 
> {{rj.killJob()}} is only called once.
> I will add a flag jobKilled to make sure {{rj.killJob()}} will be only called 
> once.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16433) Not nullify variable "rj" to avoid NPE due to race condition in ExecDriver.

2017-04-13 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated HIVE-16433:
-
Description: 
Not nullify variable {{rj}} to avoid NPE due to race condition in ExecDriver. 
currently  {{rj}} is set to null in ExecDriver.shutdown which is called from 
other thread for query cancellation. It can happen at any time. There is a 
potential race condition,  the {{rj}} is still accessed after shutdown is 
called. For example: if the following code is executed right after 
ExecDriver.shutdown is called.
{code}
  this.jobID = rj.getJobID();
  updateStatusInQueryDisplay();
  returnVal = jobExecHelper.progress(rj, jc, ctx);
{code}
Currently the purpose of nullifying  {{rj}} is mainly to make sure 
{{rj.killJob()}} is only called once.
I will add a flag {{jobKilled}} to make sure {{rj.killJob()}} will be only 
called once.

  was:
Not nullify variable {{rj}} to avoid NPE due to race condition in ExecDriver. 
currently  {{rj}} is set to null in ExecDriver.shutdown which is called from 
other thread for query cancellation. It can happen at any time. There is a 
potential race condition,  the rj is still accessed after shutdown is called. 
For example: if the following code is executed right after ExecDriver.shutdown 
is called.
{code}
  this.jobID = rj.getJobID();
  updateStatusInQueryDisplay();
  returnVal = jobExecHelper.progress(rj, jc, ctx);
{code}
Currently the purpose of nullifying  {{rj}} is mainly to make sure 
{{rj.killJob()}} is only called once.
I will add a flag {{jobKilled}} to make sure {{rj.killJob()}} will be only 
called once.


> Not nullify variable "rj" to avoid NPE due to race condition in ExecDriver.
> ---
>
> Key: HIVE-16433
> URL: https://issues.apache.org/jira/browse/HIVE-16433
> Project: Hive
>  Issue Type: Bug
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
> Attachments: HIVE-16433.000.patch
>
>
> Not nullify variable {{rj}} to avoid NPE due to race condition in ExecDriver. 
> currently  {{rj}} is set to null in ExecDriver.shutdown which is called from 
> other thread for query cancellation. It can happen at any time. There is a 
> potential race condition,  the {{rj}} is still accessed after shutdown is 
> called. For example: if the following code is executed right after 
> ExecDriver.shutdown is called.
> {code}
>   this.jobID = rj.getJobID();
>   updateStatusInQueryDisplay();
>   returnVal = jobExecHelper.progress(rj, jc, ctx);
> {code}
> Currently the purpose of nullifying  {{rj}} is mainly to make sure 
> {{rj.killJob()}} is only called once.
> I will add a flag {{jobKilled}} to make sure {{rj.killJob()}} will be only 
> called once.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16429) Should call invokeFailureHooks in handleInterruption to track failed query execution due to interrupted command.

2017-04-13 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967932#comment-15967932
 ] 

Jimmy Xiang commented on HIVE-16429:


Please add a new function which takes two more parameters so that the change is 
smaller.

> Should call invokeFailureHooks in handleInterruption to track failed query 
> execution due to interrupted command.
> 
>
> Key: HIVE-16429
> URL: https://issues.apache.org/jira/browse/HIVE-16429
> Project: Hive
>  Issue Type: Improvement
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
> Attachments: HIVE-16429.000.patch
>
>
> Should call invokeFailureHooks in handleInterruption to track failed query 
> execution due to interrupted command.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16422) Should kill running Spark Jobs when a query is cancelled.

2017-04-13 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-16422:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks [~zxu] for the patch.

> Should kill running Spark Jobs when a query is cancelled.
> -
>
> Key: HIVE-16422
> URL: https://issues.apache.org/jira/browse/HIVE-16422
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.1.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 3.0.0
>
> Attachments: HIVE-16422.000.txt
>
>
> Should kill running Spark jobs when a query is cancelled. When a query is 
> cancelled, Driver.releaseDriverContext will be called by Driver.close. 
> releaseDriverContext will call DriverContext.shutdown, which will call 
> shutdown on all the running tasks.
> {code}
>   public synchronized void shutdown() {
> LOG.debug("Shutting down query " + ctx.getCmd());
> shutdown = true;
> for (TaskRunner runner : running) {
>   if (runner.isRunning()) {
> Task task = runner.getTask();
> LOG.warn("Shutting down task : " + task);
> try {
>   task.shutdown();
> } catch (Exception e) {
>   console.printError("Exception on shutting down task " + 
> task.getId() + ": " + e);
> }
> Thread thread = runner.getRunner();
> if (thread != null) {
>   thread.interrupt();
> }
>   }
> }
> running.clear();
>   }
> {code}
> Since SparkTask does not implement a shutdown method to kill the running Spark 
> job, the Spark job may still be running after the query is cancelled. So it 
> would be good to kill the Spark job in SparkTask.shutdown to save cluster 
> resources.
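
A rough sketch of the idea (not the committed patch): give the Spark task a shutdown() override that cancels the submitted job, mirroring how the MapReduce task kills its RunningJob. SparkJobHandle below is a hypothetical stand-in for whatever handle the task keeps for the submitted job.
{code}
// Sketch of the idea only, not the committed patch. SparkJobHandle is a
// hypothetical stand-in for the handle the task keeps for the submitted job.
interface SparkJobHandle {
  void cancel() throws Exception;
}

class SparkTaskSketch {
  private volatile SparkJobHandle jobHandle;

  void setJobHandle(SparkJobHandle handle) {
    this.jobHandle = handle;
  }

  // Invoked by DriverContext.shutdown() (shown above) when the query is
  // cancelled; cancels the cluster-side job instead of leaking it.
  void shutdown() {
    SparkJobHandle handle = jobHandle;
    if (handle != null) {
      try {
        handle.cancel();
      } catch (Exception e) {
        // best effort: log and continue
      }
    }
  }
}
{code}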



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16433) Not nullify variable "rj" to avoid NPE due to race condition in ExecDriver.

2017-04-13 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967940#comment-15967940
 ] 

Jimmy Xiang commented on HIVE-16433:


Can you attach a stacktrace to show the NPE problem?

> Not nullify variable "rj" to avoid NPE due to race condition in ExecDriver.
> ---
>
> Key: HIVE-16433
> URL: https://issues.apache.org/jira/browse/HIVE-16433
> Project: Hive
>  Issue Type: Bug
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
> Attachments: HIVE-16433.000.patch
>
>
> Not nullify variable {{rj}} to avoid NPE due to race condition in ExecDriver. 
> Currently {{rj}} is set to null in ExecDriver.shutdown, which is called from 
> another thread for query cancellation and can happen at any time. There is a 
> potential race condition: {{rj}} may still be accessed after shutdown is 
> called, for example if the following code is executed right after 
> ExecDriver.shutdown is called.
> {code}
>   this.jobID = rj.getJobID();
>   updateStatusInQueryDisplay();
>   returnVal = jobExecHelper.progress(rj, jc, ctx);
> {code}
> Currently the purpose of nullifying {{rj}} is mainly to make sure 
> {{rj.killJob()}} is only called once.
> I will add a flag {{jobKilled}} to make sure {{rj.killJob()}} will only be 
> called once.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16433) Not nullify variable "rj" to avoid NPE due to race condition in ExecDriver.

2017-04-13 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967948#comment-15967948
 ] 

Jimmy Xiang commented on HIVE-16433:


By the way, is the new method isTaskShutdown used anywhere?

> Not nullify variable "rj" to avoid NPE due to race condition in ExecDriver.
> ---
>
> Key: HIVE-16433
> URL: https://issues.apache.org/jira/browse/HIVE-16433
> Project: Hive
>  Issue Type: Bug
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
> Attachments: HIVE-16433.000.patch
>
>
> Not nullify variable {{rj}} to avoid NPE due to race condition in ExecDriver. 
> Currently {{rj}} is set to null in ExecDriver.shutdown, which is called from 
> another thread for query cancellation and can happen at any time. There is a 
> potential race condition: {{rj}} may still be accessed after shutdown is 
> called, for example if the following code is executed right after 
> ExecDriver.shutdown is called.
> {code}
>   this.jobID = rj.getJobID();
>   updateStatusInQueryDisplay();
>   returnVal = jobExecHelper.progress(rj, jc, ctx);
> {code}
> Currently the purpose of nullifying {{rj}} is mainly to make sure 
> {{rj.killJob()}} is only called once.
> I will add a flag {{jobKilled}} to make sure {{rj.killJob()}} will only be 
> called once.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16193) Hive show compactions not reflecting the status of the application

2017-04-13 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967966#comment-15967966
 ] 

Wei Zheng commented on HIVE-16193:
--

Committed addendum patch to same branches.

> Hive show compactions not reflecting the status of the application
> --
>
> Key: HIVE-16193
> URL: https://issues.apache.org/jira/browse/HIVE-16193
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Kavan Suresh
>Assignee: Wei Zheng
> Fix For: 2.2.0, 2.3.0, 3.0.0
>
> Attachments: HIVE-16193.1.patch, HIVE-16193.2.patch, 
> HIVE-16193.addendum.patch
>
>
> In a test for [HIVE-13354|https://issues.apache.org/jira/browse/HIVE-13354], 
we set properties to make the compaction fail. Recently, show compactions 
indicates that compactions have been succeeding on the tables even though the 
corresponding application gets killed as expected. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16436) Response times in "Task Execution Summary" at the end of the job is not correct

2017-04-13 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran reassigned HIVE-16436:


Assignee: Prasanth Jayachandran  (was: Rajesh Balamohan)

> Response times in "Task Execution Summary" at the end of the job is not 
> correct
> ---
>
> Key: HIVE-16436
> URL: https://issues.apache.org/jira/browse/HIVE-16436
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-16436.1.patch
>
>
> "Task execution summary" is printed at the of running a hive query. E.g
> {noformat}
> Task Execution Summary
> --
>   VERTICES  DURATION(ms)   CPU_TIME(ms)GC_TIME(ms)   INPUT_RECORDS   
> OUTPUT_RECORDS
> --
>  Map 1 277869.00  0  0   1,500,000,000
> 1,500,000,000
>  Map 2 277868.00  0  0   5,999,989,709
>31,162,299
>  Reducer 3  59875.00  0  0   1,531,162,299
> 2,018
>  Reducer 4   2436.00  0  0   2,018
> 2
>  Reducer 5375.00  0  0   2
> 0
> --
> {noformat}
> Response times reported here for Map-1/Map-2 are not correct. Not sure if 
> this is broken due to any other patch. Creating this JIRA for tracking 
> purposes.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16436) Response times in "Task Execution Summary" at the end of the job is not correct

2017-04-13 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran reassigned HIVE-16436:


Assignee: Rajesh Balamohan  (was: Prasanth Jayachandran)

> Response times in "Task Execution Summary" at the end of the job is not 
> correct
> ---
>
> Key: HIVE-16436
> URL: https://issues.apache.org/jira/browse/HIVE-16436
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: HIVE-16436.1.patch, HIVE-16436.2.patch
>
>
> "Task execution summary" is printed at the of running a hive query. E.g
> {noformat}
> Task Execution Summary
> --
>   VERTICES  DURATION(ms)   CPU_TIME(ms)GC_TIME(ms)   INPUT_RECORDS   
> OUTPUT_RECORDS
> --
>  Map 1 277869.00  0  0   1,500,000,000
> 1,500,000,000
>  Map 2 277868.00  0  0   5,999,989,709
>31,162,299
>  Reducer 3  59875.00  0  0   1,531,162,299
> 2,018
>  Reducer 4   2436.00  0  0   2,018
> 2
>  Reducer 5375.00  0  0   2
> 0
> --
> {noformat}
> Response times reported here for Map-1/Map-2 are not correct. Not sure if 
> this is broken due to any other patch. Creating this JIRA for tracking 
> purposes.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16436) Response times in "Task Execution Summary" at the end of the job is not correct

2017-04-13 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-16436:
-
Attachment: HIVE-16436.2.patch

Minor change added to .2: using endTimeHasMethod to be consistent with 
startTimeHasMethod.

> Response times in "Task Execution Summary" at the end of the job is not 
> correct
> ---
>
> Key: HIVE-16436
> URL: https://issues.apache.org/jira/browse/HIVE-16436
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-16436.1.patch, HIVE-16436.2.patch
>
>
> "Task execution summary" is printed at the of running a hive query. E.g
> {noformat}
> Task Execution Summary
> --
>   VERTICES  DURATION(ms)   CPU_TIME(ms)GC_TIME(ms)   INPUT_RECORDS   
> OUTPUT_RECORDS
> --
>  Map 1 277869.00  0  0   1,500,000,000
> 1,500,000,000
>  Map 2 277868.00  0  0   5,999,989,709
>31,162,299
>  Reducer 3  59875.00  0  0   1,531,162,299
> 2,018
>  Reducer 4   2436.00  0  0   2,018
> 2
>  Reducer 5375.00  0  0   2
> 0
> --
> {noformat}
> Response times reported here for Map-1/Map-2 are not correct. Not sure if 
> this is broken due to any other patch. Creating this JIRA for tracking 
> purposes.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16436) Response times in "Task Execution Summary" at the end of the job is not correct

2017-04-13 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967975#comment-15967975
 ] 

Prasanth Jayachandran commented on HIVE-16436:
--

To make sure .2 works: [~rajesh.balamohan], did you see the end time become 
negative as well? I made a minor change in .2 to make it consistent. If that 
does not work, I will commit .1

> Response times in "Task Execution Summary" at the end of the job is not 
> correct
> ---
>
> Key: HIVE-16436
> URL: https://issues.apache.org/jira/browse/HIVE-16436
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: HIVE-16436.1.patch, HIVE-16436.2.patch
>
>
> "Task execution summary" is printed at the of running a hive query. E.g
> {noformat}
> Task Execution Summary
> --
>   VERTICES  DURATION(ms)   CPU_TIME(ms)GC_TIME(ms)   INPUT_RECORDS   
> OUTPUT_RECORDS
> --
>  Map 1 277869.00  0  0   1,500,000,000
> 1,500,000,000
>  Map 2 277868.00  0  0   5,999,989,709
>31,162,299
>  Reducer 3  59875.00  0  0   1,531,162,299
> 2,018
>  Reducer 4   2436.00  0  0   2,018
> 2
>  Reducer 5375.00  0  0   2
> 0
> --
> {noformat}
> Response times reported here for Map-1/Map-2 are not correct. Not sure if 
> this is broken due to any other patch. Creating this JIRA for tracking 
> purposes.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15708) Upgrade calcite version to 1.12

2017-04-13 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-15708:

Attachment: HIVE-15708.23.patch

.23.patch addresses the latest code review feedback.

> Upgrade calcite version to 1.12
> ---
>
> Key: HIVE-15708
> URL: https://issues.apache.org/jira/browse/HIVE-15708
> Project: Hive
>  Issue Type: Task
>  Components: CBO, Logical Optimizer
>Affects Versions: 2.2.0
>Reporter: Ashutosh Chauhan
>Assignee: Remus Rusanu
> Attachments: HIVE-15708.01.patch, HIVE-15708.02.patch, 
> HIVE-15708.03.patch, HIVE-15708.04.patch, HIVE-15708.05.patch, 
> HIVE-15708.06.patch, HIVE-15708.07.patch, HIVE-15708.08.patch, 
> HIVE-15708.09.patch, HIVE-15708.10.patch, HIVE-15708.11.patch, 
> HIVE-15708.12.patch, HIVE-15708.13.patch, HIVE-15708.14.patch, 
> HIVE-15708.15.patch, HIVE-15708.15.patch, HIVE-15708.16.patch, 
> HIVE-15708.17.patch, HIVE-15708.18.patch, HIVE-15708.19.patch, 
> HIVe-15708.20.patch, HIVE-15708.21.patch, HIVE-15708.22.patch, 
> HIVE-15708.23.patch
>
>
> Currently we are on 1.10; we need to upgrade the Calcite version to 1.12.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15229) 'like any' and 'like all' operators in hive

2017-04-13 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968009#comment-15968009
 ] 

Carl Steinbach commented on HIVE-15229:
---

+1

> 'like any' and 'like all' operators in hive
> ---
>
> Key: HIVE-15229
> URL: https://issues.apache.org/jira/browse/HIVE-15229
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Reporter: Simanchal Das
>Assignee: Simanchal Das
>Priority: Minor
> Attachments: HIVE-15229.1.patch, HIVE-15229.2.patch, 
> HIVE-15229.3.patch, HIVE-15229.4.patch, HIVE-15229.5.patch
>
>
> In Teradata, the 'like any' and 'like all' operators are mostly used when 
> matching a text field against a number of patterns.
> 'like any' and 'like all' are equivalent to multiple like conditions, as in 
> the example below.
> {noformat}
> --like any
> select col1 from table1 where col2 like any ('%accountant%', '%accounting%', 
> '%retail%', '%bank%', '%insurance%');
> --Can be written using multiple like condition 
> select col1 from table1 where col2 like '%accountant%' or col2 like 
> '%accounting%' or col2 like '%retail%' or col2 like '%bank%' or col2 like 
> '%insurance%' ;
> --like all
> select col1 from table1 where col2 like all ('%accountant%', '%accounting%', 
> '%retail%', '%bank%', '%insurance%');
> --Can be written using multiple like operator 
> select col1 from table1 where col2 like '%accountant%' and col2 like 
> '%accounting%' and col2 like '%retail%' and col2 like '%bank%' and col2 like 
> '%insurance%' ;
> {noformat}
> Problem statement:
> Nowadays, many data warehouse projects are being migrated from Teradata 
> to Hive, and data engineers and business analysts keep looking for these two 
> operators.
> If we introduce these two operators in Hive, many scripts can be migrated 
> smoothly instead of rewriting these operators as multiple like operators.
> Result:
> 1. The 'LIKE ANY' operator returns true if a text (column value) matches any 
> pattern.
> 2. The 'LIKE ALL' operator returns true if a text (column value) matches all 
> patterns.
> 3. 'LIKE ANY' and 'LIKE ALL' return NULL not only if the expression on the 
> left-hand side is NULL, but also if one of the patterns in the list is NULL.
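
A small sketch of the semantics listed above (illustrative only, not the Hive UDF): a three-valued result where Java null models SQL NULL, and SQL LIKE patterns are translated to regexes.
{code}
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative sketch of the rules above, not the Hive UDF itself.
public class LikeAnySketch {

  // Translate a SQL LIKE pattern into an equivalent regex: '%' -> '.*', '_' -> '.'.
  static Pattern toRegex(String likePattern) {
    StringBuilder sb = new StringBuilder();
    for (char c : likePattern.toCharArray()) {
      if (c == '%') {
        sb.append(".*");
      } else if (c == '_') {
        sb.append('.');
      } else {
        sb.append(Pattern.quote(String.valueOf(c)));
      }
    }
    return Pattern.compile(sb.toString(), Pattern.DOTALL);
  }

  // TRUE if the text matches any pattern; null (SQL NULL) if the text or any
  // pattern is NULL, following rules 1 and 3 above.
  static Boolean likeAny(String text, List<String> patterns) {
    if (text == null || patterns.contains(null)) {
      return null;
    }
    for (String p : patterns) {
      if (toRegex(p).matcher(text).matches()) {
        return Boolean.TRUE;
      }
    }
    return Boolean.FALSE;
  }

  public static void main(String[] args) {
    System.out.println(likeAny("senior accountant", Arrays.asList("%accountant%", "%bank%"))); // true
    System.out.println(likeAny("teacher", Arrays.asList("%accountant%", "%bank%")));           // false
    System.out.println(likeAny("teacher", Arrays.asList("%accountant%", null)));               // null
  }
}
{code}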



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16429) Should call invokeFailureHooks in handleInterruption to track failed query execution due to interrupted command.

2017-04-13 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated HIVE-16429:
-
Attachment: HIVE-16429.001.patch

> Should call invokeFailureHooks in handleInterruption to track failed query 
> execution due to interrupted command.
> 
>
> Key: HIVE-16429
> URL: https://issues.apache.org/jira/browse/HIVE-16429
> Project: Hive
>  Issue Type: Improvement
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
> Attachments: HIVE-16429.000.patch, HIVE-16429.001.patch
>
>
> Should call invokeFailureHooks in handleInterruption to track failed query 
> execution due to interrupted command.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15229) 'like any' and 'like all' operators in hive

2017-04-13 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968019#comment-15968019
 ] 

Carl Steinbach commented on HIVE-15229:
---

[~simanchal], the change to HiveParser no longer applies cleanly. Can you 
please refresh the patch? Thanks.

> 'like any' and 'like all' operators in hive
> ---
>
> Key: HIVE-15229
> URL: https://issues.apache.org/jira/browse/HIVE-15229
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Reporter: Simanchal Das
>Assignee: Simanchal Das
>Priority: Minor
> Attachments: HIVE-15229.1.patch, HIVE-15229.2.patch, 
> HIVE-15229.3.patch, HIVE-15229.4.patch, HIVE-15229.5.patch
>
>
> In Teradata, the 'like any' and 'like all' operators are mostly used when 
> matching a text field against a number of patterns.
> 'like any' and 'like all' are equivalent to multiple like conditions, as in 
> the example below.
> {noformat}
> --like any
> select col1 from table1 where col2 like any ('%accountant%', '%accounting%', 
> '%retail%', '%bank%', '%insurance%');
> --Can be written using multiple like condition 
> select col1 from table1 where col2 like '%accountant%' or col2 like 
> '%accounting%' or col2 like '%retail%' or col2 like '%bank%' or col2 like 
> '%insurance%' ;
> --like all
> select col1 from table1 where col2 like all ('%accountant%', '%accounting%', 
> '%retail%', '%bank%', '%insurance%');
> --Can be written using multiple like operator 
> select col1 from table1 where col2 like '%accountant%' and col2 like 
> '%accounting%' and col2 like '%retail%' and col2 like '%bank%' and col2 like 
> '%insurance%' ;
> {noformat}
> Problem statement:
> Nowadays many data warehouse projects are being migrated from Teradata to 
> Hive.
> Data engineers and business analysts frequently look for these two operators.
> If we introduce these two operators in Hive, many scripts can be migrated 
> as-is instead of rewriting these operators as multiple LIKE conditions.
> Result:
> 1. 'LIKE ANY' returns true if the text (column value) matches any of the 
> patterns.
> 2. 'LIKE ALL' returns true if the text (column value) matches all of the 
> patterns.
> 3. 'LIKE ANY' and 'LIKE ALL' return NULL not only if the expression on the 
> left hand side is NULL, but also if one of the patterns in the list is NULL.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16438) Support inclusive sort order on ReduceSink deduplication

2017-04-13 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu reassigned HIVE-16438:
---


> Support inclusive sort order on ReduceSink deduplication
> 
>
> Key: HIVE-16438
> URL: https://issues.apache.org/jira/browse/HIVE-16438
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>
> HIVE-11132, HIVE-12664 and HIVE-15708 hit issues around the 
> ReduceSinkDeDuplicate logic. The patch for HIVE-15708 forces the sort order 
> of all ReduceSink operators to be the same, disallowing a less restrictive 
> sort in a parent RS operator even when it is covered by the more restrictive 
> child RS operator. Logically this case should be allowed, i.e. if the child 
> operator sorts by {{(A, B, C)}} and the parent sorts by {{(A, B)}}, they 
> should be merged. This is the JIRA to track that work.
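> A minimal sketch of the prefix check this would need (illustrative names 
> only, not the actual ReduceSinkDeDuplication code): the parent RS is covered 
> by the child RS when the parent's key columns and sort directions form a 
> prefix of the child's.
> {code}
> import java.util.Arrays;
> import java.util.List;
> 
> // Illustrative helper: a parent ReduceSink sorted on a prefix of the child's
> // keys, with matching sort directions, is covered by the child and can be merged.
> public final class SortPrefixCheck {
> 
>   public static boolean parentCoveredByChild(List<String> parentKeys, String parentOrder,
>                                              List<String> childKeys, String childOrder) {
>     if (parentKeys.size() > childKeys.size()) {
>       return false;
>     }
>     // Parent key columns must be a prefix of the child key columns.
>     for (int i = 0; i < parentKeys.size(); i++) {
>       if (!parentKeys.get(i).equals(childKeys.get(i))) {
>         return false;
>       }
>     }
>     // Sort directions (e.g. "++" vs "+++") must agree on that prefix.
>     return childOrder.startsWith(parentOrder);
>   }
> 
>   public static void main(String[] args) {
>     // Child sorts by (A, B, C) ascending, parent by (A, B) ascending -> mergeable.
>     System.out.println(parentCoveredByChild(
>         Arrays.asList("A", "B"), "++",
>         Arrays.asList("A", "B", "C"), "+++"));  // prints true
>   }
> }
> {code}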



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16429) Should call invokeFailureHooks in handleInterruption to track failed query execution due to interrupted command.

2017-04-13 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968024#comment-15968024
 ] 

zhihai xu commented on HIVE-16429:
--

Thanks for the review [~jxiang]! Good suggestion! It will make the code more 
readable. I attached a new patch, HIVE-16429.001.patch, which creates a new 
function handleInterruptionWithHook; please review it.
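For reference, a rough, self-contained sketch of the shape of that helper 
(class and member names here are illustrative stand-ins, not the actual Driver 
code):
{code}
import java.util.ArrayList;
import java.util.List;

// Rough sketch only: every interruption path goes through one helper that logs
// the message, invokes the failure hooks, and returns the error code.
public class InterruptionHandlingSketch {

  // Stand-in for Hive's failure hook interface.
  interface FailureHook {
    void run(String queryId, String errorMessage);
  }

  private final List<FailureHook> failureHooks = new ArrayList<>();
  private final String queryId;

  public InterruptionHandlingSketch(String queryId) {
    this.queryId = queryId;
  }

  // Analogous in spirit to handleInterruptionWithHook: record the failure and
  // notify the hooks before returning the error code.
  int handleInterruptionWithHook(String msg) {
    String errorMessage = "FAILED: command has been interrupted: " + msg;
    System.err.println(errorMessage);      // the real Driver logs this instead
    for (FailureHook hook : failureHooks) {
      hook.run(queryId, errorMessage);     // i.e. invokeFailureHooks(...)
    }
    return 1000;                           // illustrative error code
  }
}
{code}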

> Should call invokeFailureHooks in handleInterruption to track failed query 
> execution due to interrupted command.
> 
>
> Key: HIVE-16429
> URL: https://issues.apache.org/jira/browse/HIVE-16429
> Project: Hive
>  Issue Type: Improvement
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
> Attachments: HIVE-16429.000.patch, HIVE-16429.001.patch
>
>
> Should call invokeFailureHooks in handleInterruption to track failed query 
> execution due to interrupted command.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16433) Not nullify variable "rj" to avoid NPE due to race condition in ExecDriver.

2017-04-13 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968046#comment-15968046
 ] 

zhihai xu commented on HIVE-16433:
--

Thanks for the review [~jxiang]! I don't currently have a stack trace, but it 
is a potential race condition: the user can cancel the query at any time, so 
ExecDriver.shutdown may be called at any time and set the variable {{rj}} to 
null. Yes, I plan to use the isTaskShutdown function from the hook to find out 
whether ExecDriver was shut down by the user, so we can better monitor the 
query from outside.
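A minimal sketch of the planned change (simplified and illustrative; only the 
{{jobKilled}} idea comes from the patch, the surrounding class is not the real 
ExecDriver):
{code}
import java.io.IOException;
import org.apache.hadoop.mapred.RunningJob;

// Sketch: keep the RunningJob reference non-null after submission and guard
// killJob() with a flag, so a concurrent shutdown cannot cause an NPE in code
// that still reads rj (e.g. rj.getJobID(), progress polling).
public class ExecDriverSketch {

  private RunningJob rj;              // no longer set back to null in shutdown()
  private boolean jobKilled = false;  // replaces "rj = null" as the kill guard

  public synchronized void shutdown() {
    killJob();
  }

  private synchronized void killJob() {
    if (rj != null && !jobKilled) {
      try {
        rj.killJob();                 // called at most once
        jobKilled = true;
      } catch (IOException e) {
        // log and ignore, as before
      }
    }
  }
}
{code}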

> Not nullify variable "rj" to avoid NPE due to race condition in ExecDriver.
> ---
>
> Key: HIVE-16433
> URL: https://issues.apache.org/jira/browse/HIVE-16433
> Project: Hive
>  Issue Type: Bug
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
> Attachments: HIVE-16433.000.patch
>
>
> Not nullify variable {{rj}} to avoid NPE due to race condition in ExecDriver. 
> currently  {{rj}} is set to null in ExecDriver.shutdown which is called from 
> other thread for query cancellation. It can happen at any time. There is a 
> potential race condition,  the {{rj}} is still accessed after shutdown is 
> called. For example: if the following code is executed right after 
> ExecDriver.shutdown is called.
> {code}
>   this.jobID = rj.getJobID();
>   updateStatusInQueryDisplay();
>   returnVal = jobExecHelper.progress(rj, jc, ctx);
> {code}
> Currently the purpose of nullifying  {{rj}} is mainly to make sure 
> {{rj.killJob()}} is only called once.
> I will add a flag {{jobKilled}} to make sure {{rj.killJob()}} will be only 
> called once.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15708) Upgrade calcite version to 1.12

2017-04-13 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968053#comment-15968053
 ] 

Ashutosh Chauhan commented on HIVE-15708:
-

+1

> Upgrade calcite version to 1.12
> ---
>
> Key: HIVE-15708
> URL: https://issues.apache.org/jira/browse/HIVE-15708
> Project: Hive
>  Issue Type: Task
>  Components: CBO, Logical Optimizer
>Affects Versions: 2.2.0
>Reporter: Ashutosh Chauhan
>Assignee: Remus Rusanu
> Attachments: HIVE-15708.01.patch, HIVE-15708.02.patch, 
> HIVE-15708.03.patch, HIVE-15708.04.patch, HIVE-15708.05.patch, 
> HIVE-15708.06.patch, HIVE-15708.07.patch, HIVE-15708.08.patch, 
> HIVE-15708.09.patch, HIVE-15708.10.patch, HIVE-15708.11.patch, 
> HIVE-15708.12.patch, HIVE-15708.13.patch, HIVE-15708.14.patch, 
> HIVE-15708.15.patch, HIVE-15708.15.patch, HIVE-15708.16.patch, 
> HIVE-15708.17.patch, HIVE-15708.18.patch, HIVE-15708.19.patch, 
> HIVe-15708.20.patch, HIVE-15708.21.patch, HIVE-15708.22.patch, 
> HIVE-15708.23.patch
>
>
> Currently we are on calcite 1.10; need to upgrade to 1.12.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16429) Should call invokeFailureHooks in handleInterruption to track failed query execution due to interrupted command.

2017-04-13 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968052#comment-15968052
 ] 

Jimmy Xiang commented on HIVE-16429:


Thanks for making the change. +1

> Should call invokeFailureHooks in handleInterruption to track failed query 
> execution due to interrupted command.
> 
>
> Key: HIVE-16429
> URL: https://issues.apache.org/jira/browse/HIVE-16429
> Project: Hive
>  Issue Type: Improvement
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
> Attachments: HIVE-16429.000.patch, HIVE-16429.001.patch
>
>
> Should call invokeFailureHooks in handleInterruption to track failed query 
> execution due to interrupted command.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16433) Not nullify variable "rj" to avoid NPE due to race condition in ExecDriver.

2017-04-13 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968060#comment-15968060
 ] 

Jimmy Xiang commented on HIVE-16433:


Ok, thanks for the explanation. +1

> Not nullify variable "rj" to avoid NPE due to race condition in ExecDriver.
> ---
>
> Key: HIVE-16433
> URL: https://issues.apache.org/jira/browse/HIVE-16433
> Project: Hive
>  Issue Type: Bug
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
> Attachments: HIVE-16433.000.patch
>
>
> Not nullify variable {{rj}} to avoid NPE due to race condition in ExecDriver. 
> currently  {{rj}} is set to null in ExecDriver.shutdown which is called from 
> other thread for query cancellation. It can happen at any time. There is a 
> potential race condition,  the {{rj}} is still accessed after shutdown is 
> called. For example: if the following code is executed right after 
> ExecDriver.shutdown is called.
> {code}
>   this.jobID = rj.getJobID();
>   updateStatusInQueryDisplay();
>   returnVal = jobExecHelper.progress(rj, jc, ctx);
> {code}
> Currently the purpose of nullifying  {{rj}} is mainly to make sure 
> {{rj.killJob()}} is only called once.
> I will add a flag {{jobKilled}} to make sure {{rj.killJob()}} will be only 
> called once.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16418) Allow HiveKey to skip some bytes for comparison

2017-04-13 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968071#comment-15968071
 ] 

Xuefu Zhang commented on HIVE-16418:


For datatype "TIMESTAMP WITH TIME ZONE", I don't think there is a standard on 
how it's represented internally or whether the original timezone needs to be  
maintained. Almost unexceptionally, TIMESTAMP WITH TIME ZONE just means that 
the column can understand input time in various timezones rather than carrying 
the timezone over. This is the case for Postgres, as Gopal mentioned above.

Carrying over the original timezone, great functionality, brings implementation 
complexity. The complexity comes as we are combining two values in one field: a 
timestamp value (either local or UTC) and a timezone code.

Personally I'd be happy to see if Hive can carry out timezone info. (It may be 
worth checking if we can extend HiveKey for timestamptz.) Given the complexity, 
I'd be as happy if we just follow what Postgres does if the original problems 
such as those in HIVE-14305 are solved.

> Allow HiveKey to skip some bytes for comparison
> ---
>
> Key: HIVE-16418
> URL: https://issues.apache.org/jira/browse/HIVE-16418
> Project: Hive
>  Issue Type: New Feature
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-16418.1.patch
>
>
> The feature is required when we have to serialize some fields and prevent 
> them from being used in comparison, e.g. HIVE-14412.
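> Conceptually, the comparison would only look at a prefix of the serialized 
> key, e.g. (a simplified sketch, not the actual HiveKey comparator):
> {code}
> import org.apache.hadoop.io.WritableComparator;
> 
> // Conceptual sketch: compare two serialized keys while ignoring the last
> // `skipBytes` bytes of each, so trailing fields can ride along in the key
> // without affecting the sort order.
> public final class PrefixCompare {
>   public static int compareIgnoringTail(byte[] b1, byte[] b2, int skipBytes) {
>     int l1 = Math.max(0, b1.length - skipBytes);
>     int l2 = Math.max(0, b2.length - skipBytes);
>     return WritableComparator.compareBytes(b1, 0, l1, b2, 0, l2);
>   }
> }
> {code}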



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16439) Exclude older v2 version of jackson lib from dependent jars in pom.xml

2017-04-13 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-16439:

Summary: Exclude older v2 version of jackson lib from dependent jars in 
pom.xml   (was: Exclude older v2 version of jackson lib from pom.xml )

> Exclude older v2 version of jackson lib from dependent jars in pom.xml 
> ---
>
> Key: HIVE-16439
> URL: https://issues.apache.org/jira/browse/HIVE-16439
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>
> There are multiple versions of jackson libs included in the dependent jars 
> like spark-client and metrics-json. That causes older versions of jackson 
> libs to be used.   
> We need to exclude them from the dependencies and use the explicit one 
> (currently 2.6.5).
>   <groupId>com.fasterxml.jackson.core</groupId>
>   <artifactId>jackson-databind</artifactId>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16439) Exclude older v2 version of jackson lib from pom.xml

2017-04-13 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu reassigned HIVE-16439:
---


> Exclude older v2 version of jackson lib from pom.xml 
> -
>
> Key: HIVE-16439
> URL: https://issues.apache.org/jira/browse/HIVE-16439
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>
> There are multiple versions of jackson libs included in the dependent jars 
> like spark-client and metrics-json. That causes older versions of jackson 
> libs to be used.   
> We need to exclude them from the dependencies and use the explicit one 
> (currently 2.6.5).
>   <groupId>com.fasterxml.jackson.core</groupId>
>   <artifactId>jackson-databind</artifactId>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16321) Possible deadlock in metastore with Acid enabled

2017-04-13 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16321:
--
Attachment: HIVE-16321.01.patch

> Possible deadlock in metastore with Acid enabled
> 
>
> Key: HIVE-16321
> URL: https://issues.apache.org/jira/browse/HIVE-16321
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-16321.01.patch
>
>
> TxnStore.MutexAPI is a mechanism by which different Metastore instances can 
> coordinate their operations.  It uses a JDBC connection to achieve this.
> In some cases this may lead to deadlock.  TxnHandler uses a connection pool 
> of fixed size.  Suppose you have X simultaneous calls to TxnHandler.lock(), 
> where X is >= the size of the pool.  This takes all connections from the 
> pool, so when
> {noformat}
> handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name());
> {noformat} 
> is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the 
> pool is empty and the system is deadlocked.
> MutexAPI can't use the same connection as the operation it's protecting.  
> (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example).
> We could make MutexAPI use a separate connection pool (size > 'primary' conn 
> pool).
> Or we could make TxnHandler.lock(LockRequest rqst) return immediately after 
> enqueueing the lock with the expectation that the caller will always follow 
> up with a call to checkLock(CheckLockRequest rqst).
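> A minimal sketch of the first option, a dedicated pool for the mutex (types 
> and names are illustrative, not the real TxnHandler pooling code):
> {code}
> import java.sql.Connection;
> import java.sql.SQLException;
> import javax.sql.DataSource;
> 
> // Sketch of option 1: MutexAPI draws from its own pool, so exhausting the
> // primary pool with lock() calls can no longer starve acquireLock().
> public class TxnPoolsSketch {
>   private final DataSource primaryPool;  // used by lock(), checkLock(), etc.
>   private final DataSource mutexPool;    // used only by MutexAPI.acquireLock()
> 
>   public TxnPoolsSketch(DataSource primaryPool, DataSource mutexPool) {
>     this.primaryPool = primaryPool;
>     this.mutexPool = mutexPool;
>   }
> 
>   Connection getDbConn() throws SQLException {
>     return primaryPool.getConnection();
>   }
> 
>   Connection getMutexConn() throws SQLException {
>     return mutexPool.getConnection();    // separate pool avoids the starvation
>   }
> }
> {code}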
> cc [~f1sherox]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16321) Possible deadlock in metastore with Acid enabled

2017-04-13 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16321:
--
Status: Patch Available  (was: Open)

> Possible deadlock in metastore with Acid enabled
> 
>
> Key: HIVE-16321
> URL: https://issues.apache.org/jira/browse/HIVE-16321
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-16321.01.patch
>
>
> TxnStore.MutexAPI is a mechanism by which different Metastore instances can 
> coordinate their operations.  It uses a JDBC connection to achieve this.
> In some cases this may lead to deadlock.  TxnHandler uses a connection pool 
> of fixed size.  Suppose you have X simultaneous calls to TxnHandler.lock(), 
> where X is >= the size of the pool.  This takes all connections from the 
> pool, so when
> {noformat}
> handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name());
> {noformat} 
> is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the 
> pool is empty and the system is deadlocked.
> MutexAPI can't use the same connection as the operation it's protecting.  
> (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example).
> We could make MutexAPI use a separate connection pool (size > 'primary' conn 
> pool).
> Or we could make TxnHandler.lock(LockRequest rqst) return immediately after 
> enqueueing the lock with the expectation that the caller will always follow 
> up with a call to checkLock(CheckLockRequest rqst).
> cc [~f1sherox]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16439) Exclude older v2 version of jackson lib from dependent jars in pom.xml

2017-04-13 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-16439:

Attachment: HIVE-16439.1.patch

patch-1: exclude the dependencies from common and spark-client projects.

> Exclude older v2 version of jackson lib from dependent jars in pom.xml 
> ---
>
> Key: HIVE-16439
> URL: https://issues.apache.org/jira/browse/HIVE-16439
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-16439.1.patch
>
>
> There are multiple versions of jackson libs included in the dependent jars 
> like spark-client and metrics-json. That causes older versions of jackson 
> libs to be used.   
> We need to exclude them from the dependencies and use the explicit one 
> (currently 2.6.5).
>   <groupId>com.fasterxml.jackson.core</groupId>
>   <artifactId>jackson-databind</artifactId>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16439) Exclude older v2 version of jackson lib from dependent jars in pom.xml

2017-04-13 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-16439:

Status: Patch Available  (was: Open)

> Exclude older v2 version of jackson lib from dependent jars in pom.xml 
> ---
>
> Key: HIVE-16439
> URL: https://issues.apache.org/jira/browse/HIVE-16439
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-16439.1.patch
>
>
> There are multiple versions of jackson libs included in the dependent jars 
> like spark-client and metrics-json. That causes older versions of jackson 
> libs to be used.   
> We need to exclude them from the dependencies and use the explicit one 
> (currently 2.6.5).
>   <groupId>com.fasterxml.jackson.core</groupId>
>   <artifactId>jackson-databind</artifactId>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16439) Exclude older v2 version of jackson lib from dependent jars in pom.xml

2017-04-13 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-16439:

Attachment: HIVE-16439.1.patch

> Exclude older v2 version of jackson lib from dependent jars in pom.xml 
> ---
>
> Key: HIVE-16439
> URL: https://issues.apache.org/jira/browse/HIVE-16439
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-16439.1.patch
>
>
> There are multiple versions of jackson libs included in the dependent jars 
> like spark-client and metrics-json. That causes older versions of jackson 
> libs to be used.   
> We need to exclude them from the dependencies and use the explicit one 
> (currently 2.6.5).
>   <groupId>com.fasterxml.jackson.core</groupId>
>   <artifactId>jackson-databind</artifactId>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16439) Exclude older v2 version of jackson lib from dependent jars in pom.xml

2017-04-13 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-16439:

Attachment: (was: HIVE-16439.1.patch)

> Exclude older v2 version of jackson lib from dependent jars in pom.xml 
> ---
>
> Key: HIVE-16439
> URL: https://issues.apache.org/jira/browse/HIVE-16439
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-16439.1.patch
>
>
> There are multiple versions of jackson libs included in the dependent jars 
> like spark-client and metrics-json. That causes older versions of jackson 
> libs to be used.   
> We need to exclude them from the dependencies and use the explicit one 
> (currently 2.6.5).
>   <groupId>com.fasterxml.jackson.core</groupId>
>   <artifactId>jackson-databind</artifactId>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16287) Alter table partition rename with location - moves partition back to hive warehouse

2017-04-13 Thread Ying Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968090#comment-15968090
 ] 

Ying Chen commented on HIVE-16287:
--

In the change for HiveAlterHandler - {{destPath = 
wh.getPartitionPath(msdb.getDatabase(dbname), tbl, part_vals);}} - shouldn't 
new_part.getValues() be used instead of part_vals?

> Alter table partition rename with location - moves partition back to hive 
> warehouse
> ---
>
> Key: HIVE-16287
> URL: https://issues.apache.org/jira/browse/HIVE-16287
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.1.0
> Environment: RHEL 6.8 
>Reporter: Ying Chen
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Attachments: HIVE-16287.01.patch, HIVE-16287.02.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I was renaming a partition in a table that I had created using the location 
> clause, and noticed that after the rename completed, the partition was moved 
> to the Hive warehouse (hive.metastore.warehouse.dir).
> {quote}
> create table test_local_part (col1 int) partitioned by (col2 int) location 
> '/tmp/testtable/test_local_part';
> insert into test_local_part  partition (col2=1) values (1),(3);
> insert into test_local_part  partition (col2=2) values (3);
> alter table test_local_part partition (col2='1') rename to partition 
> (col2='4');
> {quote}
> Running: 
>describe formatted test_local_part partition (col2='2')
> # Detailed Partition Information   
> Partition Value:  [2]  
> Database: default  
> Table:test_local_part  
> CreateTime:   Mon Mar 20 13:25:28 PDT 2017 
> LastAccessTime:   UNKNOWN  
> Protect Mode: None 
> Location: 
> *hdfs://my.server.com:8020/tmp/testtable/test_local_part/col2=2*
> Running: 
>describe formatted test_local_part partition (col2='4')
> # Detailed Partition Information   
> Partition Value:  [4]  
> Database: default  
> Table:test_local_part  
> CreateTime:   Mon Mar 20 13:24:53 PDT 2017 
> LastAccessTime:   UNKNOWN  
> Protect Mode: None 
> Location: 
> *hdfs://my.server.com:8020/apps/hive/warehouse/test_local_part/col2=4*
> ---
> Per Sergio's comment - "The rename should create the new partition name in 
> the same location of the table. "



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16436) Response times in "Task Execution Summary" at the end of the job is not correct

2017-04-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968101#comment-15968101
 ] 

Hive QA commented on HIVE-16436:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12863322/HIVE-16436.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 10571 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=143)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4675/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4675/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4675/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12863322 - PreCommit-HIVE-Build

> Response times in "Task Execution Summary" at the end of the job is not 
> correct
> ---
>
> Key: HIVE-16436
> URL: https://issues.apache.org/jira/browse/HIVE-16436
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: HIVE-16436.1.patch, HIVE-16436.2.patch
>
>
> "Task execution summary" is printed at the of running a hive query. E.g
> {noformat}
> Task Execution Summary
> --
>   VERTICES  DURATION(ms)   CPU_TIME(ms)GC_TIME(ms)   INPUT_RECORDS   
> OUTPUT_RECORDS
> --
>  Map 1 277869.00  0  0   1,500,000,000
> 1,500,000,000
>  Map 2 277868.00  0  0   5,999,989,709
>31,162,299
>  Reducer 3  59875.00  0  0   1,531,162,299
> 2,018
>  Reducer 4   2436.00  0  0   2,018
> 2
>  Reducer 5375.00  0  0   2
> 0
> --
> {noformat}
> The response times reported here for Map-1/Map-2 are not correct.  Not sure 
> whether this was broken by any other patch. Creating this jira for tracking 
> purposes.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16427) Fix multi-insert query and write qtests

2017-04-13 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-16427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-16427:
---
Description: 
On HIVE-16415, it was found that the bug reported to be fixed in HIVE-14519 was 
not actually fixed.

This task is to find the problem, fix it, and add qtests to verify no future 
regression.

Specifically, the following query does not produce correct answers: 
{code}
From (select * from src) a
insert overwrite directory '/tmp/emp/dir1/'
select key, value
insert overwrite directory '/tmp/emp/dir2/'
select 'header'
limit 0
insert overwrite directory '/tmp/emp/dir3/'
select key, value 
where key = 100;
{code}

This gives an incorrect result on master: all dirs end up with 0 rows instead 
of just dir2.

  was:
On HIVE-16415, it was found that the bug reported to be fixed in HIVE-14519 was 
not actually fixed.

This task is to find the problem, fix it, and add qtests to verify no future 
regression.

Specifically, the following query does not produce correct answers: 
{code}
From (select * from src) a
insert overwrite directory '/tmp/emp/dir1/'
select key, value
insert overwrite directory '/tmp/emp/dir2/'
select 'header'
limit 0
insert overwrite directory '/tmp/emp/dir3/'
select key, value 
where key = 100;
{code}


> Fix multi-insert query and write qtests
> ---
>
> Key: HIVE-16427
> URL: https://issues.apache.org/jira/browse/HIVE-16427
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Thomas Poepping
>
> On HIVE-16415, it was found that the bug reported to be fixed in HIVE-14519 
> was not actually fixed.
> This task is to find the problem, fix it, and add qtests to verify no future 
> regression.
> Specifically, the following query does not produce correct answers: 
> {code}
> From (select * from src) a
> insert overwrite directory '/tmp/emp/dir1/'
> select key, value
> insert overwrite directory '/tmp/emp/dir2/'
> select 'header'
> limit 0
> insert overwrite directory '/tmp/emp/dir3/'
> select key, value 
> where key = 100;
> {code}
> This gives an incorrect result on master: all dirs end up with 0 rows instead 
> of just dir2.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16287) Alter table partition rename with location - moves partition back to hive warehouse

2017-04-13 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968107#comment-15968107
 ] 

Vihang Karajgaonkar commented on HIVE-16287:


Thanks [~ying1] for reviewing. I ran the tests locally and found the same. I 
will do some tests and update the patch in a few minutes.

> Alter table partition rename with location - moves partition back to hive 
> warehouse
> ---
>
> Key: HIVE-16287
> URL: https://issues.apache.org/jira/browse/HIVE-16287
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.1.0
> Environment: RHEL 6.8 
>Reporter: Ying Chen
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Attachments: HIVE-16287.01.patch, HIVE-16287.02.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I was renaming a partition in a table that I had created using the location 
> clause, and noticed that after the rename completed, the partition was moved 
> to the Hive warehouse (hive.metastore.warehouse.dir).
> {quote}
> create table test_local_part (col1 int) partitioned by (col2 int) location 
> '/tmp/testtable/test_local_part';
> insert into test_local_part  partition (col2=1) values (1),(3);
> insert into test_local_part  partition (col2=2) values (3);
> alter table test_local_part partition (col2='1') rename to partition 
> (col2='4');
> {quote}
> Running: 
>describe formatted test_local_part partition (col2='2')
> # Detailed Partition Information   
> Partition Value:  [2]  
> Database: default  
> Table:test_local_part  
> CreateTime:   Mon Mar 20 13:25:28 PDT 2017 
> LastAccessTime:   UNKNOWN  
> Protect Mode: None 
> Location: 
> *hdfs://my.server.com:8020/tmp/testtable/test_local_part/col2=2*
> Running: 
>describe formatted test_local_part partition (col2='4')
> # Detailed Partition Information   
> Partition Value:  [4]  
> Database: default  
> Table:test_local_part  
> CreateTime:   Mon Mar 20 13:24:53 PDT 2017 
> LastAccessTime:   UNKNOWN  
> Protect Mode: None 
> Location: 
> *hdfs://my.server.com:8020/apps/hive/warehouse/test_local_part/col2=4*
> ---
> Per Sergio's comment - "The rename should create the new partition name in 
> the same location of the table. "



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16427) Fix multi-insert query and write qtests

2017-04-13 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-16427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968111#comment-15968111
 ] 

Sergio Peña commented on HIVE-16427:


[~ychena] It seems the patch on HIVE-14519 fixed the issue only partially. 
Here's another test case that causes a multi-insert query to fail.

> Fix multi-insert query and write qtests
> ---
>
> Key: HIVE-16427
> URL: https://issues.apache.org/jira/browse/HIVE-16427
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Thomas Poepping
>
> On HIVE-16415, it was found that the bug reported to be fixed in HIVE-14519 
> was not actually fixed.
> This task is to find the problem, fix it, and add qtests to verify no future 
> regression.
> Specifically, the following query does not produce correct answers: 
> {code}
> From (select * from src) a
> insert overwrite directory '/tmp/emp/dir1/'
> select key, value
> insert overwrite directory '/tmp/emp/dir2/'
> select 'header'
> limit 0
> insert overwrite directory '/tmp/emp/dir3/'
> select key, value 
> where key = 100;
> {code}
> This gives an incorrect result on master: all dirs end up with 0 rows instead 
> of just dir2.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16415) Add blobstore tests for insertion of zero rows

2017-04-13 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968115#comment-15968115
 ] 

Sergio Peña commented on HIVE-16415:


[~poeppt] can we add these zero-row tests to TestCliDriver as well? I think 
they would be good to have.

> Add blobstore tests for insertion of zero rows
> --
>
> Key: HIVE-16415
> URL: https://issues.apache.org/jira/browse/HIVE-16415
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.1.1
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-16415.patch
>
>
> This patch introduces two regression tests into the hive-blobstore qtest 
> module: zero_rows_hdfs.q and zero_rows_blobstore.q. These tests run INSERT 
> commands with a WHERE clause whose condition causes zero rows to be 
> selected.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16287) Alter table partition rename with location - moves partition back to hive warehouse

2017-04-13 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-16287:
---
Attachment: HIVE-16287.03.patch

Updating the patch for the test failures.

> Alter table partition rename with location - moves partition back to hive 
> warehouse
> ---
>
> Key: HIVE-16287
> URL: https://issues.apache.org/jira/browse/HIVE-16287
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.1.0
> Environment: RHEL 6.8 
>Reporter: Ying Chen
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Attachments: HIVE-16287.01.patch, HIVE-16287.02.patch, 
> HIVE-16287.03.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I was renaming a partition in a table that I had created using the location 
> clause, and noticed that after the rename completed, the partition was moved 
> to the Hive warehouse (hive.metastore.warehouse.dir).
> {quote}
> create table test_local_part (col1 int) partitioned by (col2 int) location 
> '/tmp/testtable/test_local_part';
> insert into test_local_part  partition (col2=1) values (1),(3);
> insert into test_local_part  partition (col2=2) values (3);
> alter table test_local_part partition (col2='1') rename to partition 
> (col2='4');
> {quote}
> Running: 
>describe formatted test_local_part partition (col2='2')
> # Detailed Partition Information   
> Partition Value:  [2]  
> Database: default  
> Table:test_local_part  
> CreateTime:   Mon Mar 20 13:25:28 PDT 2017 
> LastAccessTime:   UNKNOWN  
> Protect Mode: None 
> Location: 
> *hdfs://my.server.com:8020/tmp/testtable/test_local_part/col2=2*
> Running: 
>describe formatted test_local_part partition (col2='4')
> # Detailed Partition Information   
> Partition Value:  [4]  
> Database: default  
> Table:test_local_part  
> CreateTime:   Mon Mar 20 13:24:53 PDT 2017 
> LastAccessTime:   UNKNOWN  
> Protect Mode: None 
> Location: 
> *hdfs://my.server.com:8020/apps/hive/warehouse/test_local_part/col2=4*
> ---
> Per Sergio's comment - "The rename should create the new partition name in 
> the same location of the table. "



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16416) Service: move constants out from HiveAuthFactory

2017-04-13 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968131#comment-15968131
 ] 

Zoltan Haindrich commented on HIVE-16416:
-

test failures are not related

> Service: move constants out from HiveAuthFactory
> 
>
> Key: HIVE-16416
> URL: https://issues.apache.org/jira/browse/HIVE-16416
> Project: Hive
>  Issue Type: Sub-task
>  Components: Server Infrastructure
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
> Attachments: HIVE-16416.1.patch
>
>
> It took me a while to notice that it is only a few constants that keep 
> pulling in this class :)
> It contains a tricky dependency on the whole ql module, but in client mode 
> that part is totally unused - moving the constants out of it enables the 
> client to operate without the factory.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16355) Service: embedded mode should only be available if service is loaded onto the classpath

2017-04-13 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968133#comment-15968133
 ] 

Zoltan Haindrich commented on HIVE-16355:
-

failures are not related

> Service: embedded mode should only be available if service is loaded onto the 
> classpath
> ---
>
> Key: HIVE-16355
> URL: https://issues.apache.org/jira/browse/HIVE-16355
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore, Server Infrastructure
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
> Attachments: HIVE-16355.1.patch, HIVE-16355.2.patch, 
> HIVE-16355.2.patch
>
>
> I would like to relax the hard reference to 
> {{EmbeddedThriftBinaryCLIService}} so that it is only used when the 
> {{service}} module is loaded onto the classpath.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16440) Fix failing test columnstats_partlvl_invalid_values when autogather column stats is on

2017-04-13 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong reassigned HIVE-16440:
--


> Fix failing test columnstats_partlvl_invalid_values when autogather column 
> stats is on
> --
>
> Key: HIVE-16440
> URL: https://issues.apache.org/jira/browse/HIVE-16440
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16440) Fix failing test columnstats_partlvl_invalid_values when autogather column stats is on

2017-04-13 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16440:
---
Status: Patch Available  (was: Open)

> Fix failing test columnstats_partlvl_invalid_values when autogather column 
> stats is on
> --
>
> Key: HIVE-16440
> URL: https://issues.apache.org/jira/browse/HIVE-16440
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16440.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16440) Fix failing test columnstats_partlvl_invalid_values when autogather column stats is on

2017-04-13 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16440:
---
Attachment: HIVE-16440.01.patch

> Fix failing test columnstats_partlvl_invalid_values when autogather column 
> stats is on
> --
>
> Key: HIVE-16440
> URL: https://issues.apache.org/jira/browse/HIVE-16440
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16440.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-13567) Auto-gather column stats - phase 2

2017-04-13 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13567:
---
Status: Open  (was: Patch Available)

> Auto-gather column stats - phase 2
> --
>
> Key: HIVE-13567
> URL: https://issues.apache.org/jira/browse/HIVE-13567
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13567.01.patch, HIVE-13567.02.patch, 
> HIVE-13567.03.patch, HIVE-13567.04.patch, HIVE-13567.05.patch
>
>
> In phase 2, we are going to turn auto-gathering of column stats on by 
> default. This requires updating the golden files.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-13567) Auto-gather column stats - phase 2

2017-04-13 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13567:
---
Status: Patch Available  (was: Open)

> Auto-gather column stats - phase 2
> --
>
> Key: HIVE-13567
> URL: https://issues.apache.org/jira/browse/HIVE-13567
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13567.01.patch, HIVE-13567.02.patch, 
> HIVE-13567.03.patch, HIVE-13567.04.patch, HIVE-13567.05.patch
>
>
> In phase 2, we are going to turn auto-gathering of column stats on by 
> default. This requires updating the golden files.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-13567) Auto-gather column stats - phase 2

2017-04-13 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13567:
---
Attachment: HIVE-13567.05.patch

> Auto-gather column stats - phase 2
> --
>
> Key: HIVE-13567
> URL: https://issues.apache.org/jira/browse/HIVE-13567
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13567.01.patch, HIVE-13567.02.patch, 
> HIVE-13567.03.patch, HIVE-13567.04.patch, HIVE-13567.05.patch
>
>
> In phase 2, we are going to turn auto-gathering of column stats on by 
> default. This requires updating the golden files.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15708) Upgrade calcite version to 1.12

2017-04-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968153#comment-15968153
 ] 

Hive QA commented on HIVE-15708:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12863330/HIVE-15708.23.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10571 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.parse.TestParseNegativeDriver.testCliDriver[unknown_function4]
 (batchId=233)
org.apache.hadoop.hive.ql.parse.TestParseNegativeDriver.testCliDriver[wrong_distinct2]
 (batchId=233)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4676/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4676/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4676/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12863330 - PreCommit-HIVE-Build

> Upgrade calcite version to 1.12
> ---
>
> Key: HIVE-15708
> URL: https://issues.apache.org/jira/browse/HIVE-15708
> Project: Hive
>  Issue Type: Task
>  Components: CBO, Logical Optimizer
>Affects Versions: 2.2.0
>Reporter: Ashutosh Chauhan
>Assignee: Remus Rusanu
> Attachments: HIVE-15708.01.patch, HIVE-15708.02.patch, 
> HIVE-15708.03.patch, HIVE-15708.04.patch, HIVE-15708.05.patch, 
> HIVE-15708.06.patch, HIVE-15708.07.patch, HIVE-15708.08.patch, 
> HIVE-15708.09.patch, HIVE-15708.10.patch, HIVE-15708.11.patch, 
> HIVE-15708.12.patch, HIVE-15708.13.patch, HIVE-15708.14.patch, 
> HIVE-15708.15.patch, HIVE-15708.15.patch, HIVE-15708.16.patch, 
> HIVE-15708.17.patch, HIVE-15708.18.patch, HIVE-15708.19.patch, 
> HIVe-15708.20.patch, HIVE-15708.21.patch, HIVE-15708.22.patch, 
> HIVE-15708.23.patch
>
>
> Currently we are on calcite 1.10; need to upgrade to 1.12.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-11418) Dropping a database in an encryption zone with CASCADE and trash enabled fails

2017-04-13 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-11418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968168#comment-15968168
 ] 

Sergio Peña commented on HIVE-11418:


PURGE does not exist on DROP DATABASE yet. We were thinking of adding it on 
branch-2 in order to delete an encrypted table without sending it to the 
trash, since this patch depends on hadoop 2.8 and will only work on 3.0. 

The current workaround is to delete one table at a time when they're encrypted 
and trash is enabled. So one option (if we don't want to introduce PURGE only 
to remove it again in 3.0) is to mark this as a known bug on Hive versions 
older than 3.0. Does that make sense?

> Dropping a database in an encryption zone with CASCADE and trash enabled fails
> --
>
> Key: HIVE-11418
> URL: https://issues.apache.org/jira/browse/HIVE-11418
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 1.2.0
>Reporter: Sergio Peña
>Assignee: Sahil Takiar
> Attachments: HIVE-11418.1.patch, HIVE-11418.2.patch
>
>
> Here's the query that fails:
> {noformat}
> hive> CREATE DATABASE db;
> hive> USE db;
> hive> CREATE TABLE a(id int);
> hive> SET fs.trash.interval=1;
> hive> DROP DATABASE db CASCADE;
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Unable to drop 
> db.a because it is in an encryption zone and trash
>  is enabled.  Use PURGE option to skip trash.)
> {noformat}
> DROP DATABASE does not support PURGE, so we have to remove the tables one by 
> one, and then drop the database.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-11418) Dropping a database in an encryption zone with CASCADE and trash enabled fails

2017-04-13 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968180#comment-15968180
 ] 

Ashutosh Chauhan commented on HIVE-11418:
-

Yeah, adding syntax for a workaround of a bug would be confusing. Better not 
to introduce it and instead document this as a known issue, fixed in a later 
version.

> Dropping a database in an encryption zone with CASCADE and trash enabled fails
> --
>
> Key: HIVE-11418
> URL: https://issues.apache.org/jira/browse/HIVE-11418
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 1.2.0
>Reporter: Sergio Peña
>Assignee: Sahil Takiar
> Attachments: HIVE-11418.1.patch, HIVE-11418.2.patch
>
>
> Here's the query that fails:
> {noformat}
> hive> CREATE DATABASE db;
> hive> USE db;
> hive> CREATE TABLE a(id int);
> hive> SET fs.trash.interval=1;
> hive> DROP DATABASE db CASCADE;
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Unable to drop 
> db.a because it is in an encryption zone and trash
>  is enabled.  Use PURGE option to skip trash.)
> {noformat}
> DROP DATABASE does not support PURGE, so we have to remove the tables one by 
> one, and then drop the database.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16426) Query cancel: improve the way to handle files

2017-04-13 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-16426:

Attachment: HIVE-16426.1.patch

1. Use a thread-local variable to store the cancel state so it is accessible 
without being passed around as a parameter. 
2. Add checkpoints for file operations.
3. Remove backgroundHandle.cancel to avoid failed file cleanup caused by the 
interruption. From what I observed, that method does not seem very effective 
for already-scheduled operations, for example ongoing HMS API calls. 
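A minimal sketch of item 1, the thread-local cancel state with a checkpoint 
helper (illustrative only, not the attached patch):
{code}
import java.io.InterruptedIOException;

// Sketch: the cancel state lives in a ThreadLocal so file-handling code can
// check it at checkpoints without the flag being passed through every method.
public final class CancelState {

  private static final ThreadLocal<Boolean> CANCELLED =
      ThreadLocal.withInitial(() -> Boolean.FALSE);

  public static void markCancelled() {
    CANCELLED.set(Boolean.TRUE);
  }

  // Checkpoint: call around expensive file operations so a cancelled query
  // stops promptly but still gets a chance to clean up its tmp files.
  public static void checkpoint(String operation) throws InterruptedIOException {
    if (CANCELLED.get().booleanValue()) {
      throw new InterruptedIOException("Query cancelled during: " + operation);
    }
  }

  public static void reset() {
    CANCELLED.remove();
  }
}
{code}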

> Query cancel: improve the way to handle files
> -
>
> Key: HIVE-16426
> URL: https://issues.apache.org/jira/browse/HIVE-16426
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-16426.1.patch
>
>
> 1. Add data structure support to make it easy to check the query cancel 
> status.
> 2. Handle query cancel more gracefully. Remove possible file leaks caused by 
> query cancellation, as shown in the following stack:
> {noformat}
> 2017-04-11 09:57:30,727 WARN  org.apache.hadoop.hive.ql.exec.Utilities: 
> [HiveServer2-Background-Pool: Thread-149]: Failed to clean-up tmp directories.
> java.io.InterruptedIOException: Call interrupted
> at org.apache.hadoop.ipc.Client.call(Client.java:1496)
> at org.apache.hadoop.ipc.Client.call(Client.java:1439)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> at com.sun.proxy.$Proxy20.delete(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:535)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
> at com.sun.proxy.$Proxy21.delete(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:2059)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$13.doCall(DistributedFileSystem.java:675)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$13.doCall(DistributedFileSystem.java:671)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:671)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.clearWork(Utilities.java:277)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:463)
> at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:142)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1978)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1691)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1423)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1207)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1202)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:238)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:88)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$3$1.run(SQLOperation.java:303)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$3.run(SQLOperation.java:316)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> 3. Add checkpoints to related file operations to improve response time for 
> query cancellation. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16426) Query cancel: improve the way to handle files

2017-04-13 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-16426:

Status: Patch Available  (was: Open)

> Query cancel: improve the way to handle files
> -
>
> Key: HIVE-16426
> URL: https://issues.apache.org/jira/browse/HIVE-16426
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-16426.1.patch
>
>
> 1. Add data structure support to make it easy to check the query cancel 
> status.
> 2. Handle query cancel more gracefully. Remove possible file leaks caused by 
> query cancellation, as shown in the following stack:
> {noformat}
> 2017-04-11 09:57:30,727 WARN  org.apache.hadoop.hive.ql.exec.Utilities: 
> [HiveServer2-Background-Pool: Thread-149]: Failed to clean-up tmp directories.
> java.io.InterruptedIOException: Call interrupted
> at org.apache.hadoop.ipc.Client.call(Client.java:1496)
> at org.apache.hadoop.ipc.Client.call(Client.java:1439)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> at com.sun.proxy.$Proxy20.delete(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:535)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
> at com.sun.proxy.$Proxy21.delete(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:2059)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$13.doCall(DistributedFileSystem.java:675)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$13.doCall(DistributedFileSystem.java:671)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:671)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.clearWork(Utilities.java:277)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:463)
> at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:142)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1978)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1691)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1423)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1207)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1202)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:238)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:88)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$3$1.run(SQLOperation.java:303)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$3.run(SQLOperation.java:316)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> 3. Add checkpoints to related file operations to improve response time for 
> query cancellation. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15708) Upgrade calcite version to 1.12

2017-04-13 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-15708:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Failures not reproducible.
Pushed to master.  Thanks, Remus!

> Upgrade calcite version to 1.12
> ---
>
> Key: HIVE-15708
> URL: https://issues.apache.org/jira/browse/HIVE-15708
> Project: Hive
>  Issue Type: Task
>  Components: CBO, Logical Optimizer
>Affects Versions: 2.2.0
>Reporter: Ashutosh Chauhan
>Assignee: Remus Rusanu
> Fix For: 3.0.0
>
> Attachments: HIVE-15708.01.patch, HIVE-15708.02.patch, 
> HIVE-15708.03.patch, HIVE-15708.04.patch, HIVE-15708.05.patch, 
> HIVE-15708.06.patch, HIVE-15708.07.patch, HIVE-15708.08.patch, 
> HIVE-15708.09.patch, HIVE-15708.10.patch, HIVE-15708.11.patch, 
> HIVE-15708.12.patch, HIVE-15708.13.patch, HIVE-15708.14.patch, 
> HIVE-15708.15.patch, HIVE-15708.15.patch, HIVE-15708.16.patch, 
> HIVE-15708.17.patch, HIVE-15708.18.patch, HIVE-15708.19.patch, 
> HIVe-15708.20.patch, HIVE-15708.21.patch, HIVE-15708.22.patch, 
> HIVE-15708.23.patch
>
>
> Currently we are on Calcite 1.10. Need to upgrade the Calcite version to 1.12.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16429) Should call invokeFailureHooks in handleInterruption to track failed query execution due to interrupted command.

2017-04-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968203#comment-15968203
 ] 

Hive QA commented on HIVE-16429:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12863332/HIVE-16429.001.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 10571 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4677/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4677/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4677/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12863332 - PreCommit-HIVE-Build

> Should call invokeFailureHooks in handleInterruption to track failed query 
> execution due to interrupted command.
> 
>
> Key: HIVE-16429
> URL: https://issues.apache.org/jira/browse/HIVE-16429
> Project: Hive
>  Issue Type: Improvement
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
> Attachments: HIVE-16429.000.patch, HIVE-16429.001.patch
>
>
> Should call invokeFailureHooks in handleInterruption to track failed query 
> execution due to interrupted command.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16423) Add hint to enforce semi join optimization

2017-04-13 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-16423:
--
Summary: Add hint to enforce semi join optimization  (was: De-duplicate 
semijoin branches and add hint to enforce semi join optimization)

> Add hint to enforce semi join optimization
> --
>
> Key: HIVE-16423
> URL: https://issues.apache.org/jira/browse/HIVE-16423
> Project: Hive
>  Issue Type: Task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16423.1.patch
>
>
> Currently in an n-way join, a semi join branch is created n times. Instead, 
> it should reuse the  same branch.
> Add hints in semijoin to enforce particular semi join optimization.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16427) Fix multi-insert query and write qtests

2017-04-13 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968214#comment-15968214
 ] 

Yongzhi Chen commented on HIVE-16427:
-

Yes, it is a different case. HIVE-14519 fixed the case where the null return was 
caused by a filter; this test case covers a null value caused by a LIMIT statement.

> Fix multi-insert query and write qtests
> ---
>
> Key: HIVE-16427
> URL: https://issues.apache.org/jira/browse/HIVE-16427
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Thomas Poepping
>
> On HIVE-16415, it was found that the bug reported to be fixed in HIVE-14519 
> was not actually fixed.
> This task is to find the problem, fix it, and add qtests to verify no future 
> regression.
> Specifically, the following query does not produce correct answers: 
> {code}
> From (select * from src) a
> insert overwrite directory '/tmp/emp/dir1/'
> select key, value
> insert overwrite directory '/tmp/emp/dir2/'
> select 'header'
> limit 0
> insert overwrite directory '/tmp/emp/dir3/'
> select key, value 
> where key = 100;
> {code}
> This gives incorrect results on master: all three directories end up with 0 
> rows, instead of only dir2 being empty.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16423) Add hint to enforce semi join optimization

2017-04-13 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-16423:
--
Description: Add hints in semijoin to enforce particular semi join 
optimization.  (was: Currently in an n-way join, a semi join branch is created 
n times. Instead, it should reuse the  same branch.
Add hints in semijoin to enforce particular semi join optimization.)

> Add hint to enforce semi join optimization
> --
>
> Key: HIVE-16423
> URL: https://issues.apache.org/jira/browse/HIVE-16423
> Project: Hive
>  Issue Type: Task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16423.1.patch
>
>
> Add hints in semijoin to enforce particular semi join optimization.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16441) De-duplicate semijoin branches in n-way joins

2017-04-13 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal reassigned HIVE-16441:
-


> De-duplicate semijoin branches in n-way joins
> -
>
> Key: HIVE-16441
> URL: https://issues.apache.org/jira/browse/HIVE-16441
> Project: Hive
>  Issue Type: Task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>
> Currently in n-way joins, the semi join optimization creates n branches on the 
> same key. Instead, it should reuse one branch for all the joins.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Work started] (HIVE-16441) De-duplicate semijoin branches in n-way joins

2017-04-13 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-16441 started by Deepak Jaiswal.
-
> De-duplicate semijoin branches in n-way joins
> -
>
> Key: HIVE-16441
> URL: https://issues.apache.org/jira/browse/HIVE-16441
> Project: Hive
>  Issue Type: Task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>
> Currently in n-way joins, the semi join optimization creates n branches on the 
> same key. Instead, it should reuse one branch for all the joins.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16423) Add hint to enforce semi join optimization

2017-04-13 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-16423:
--
Attachment: HIVE-16423.2.patch

Segregated the patch to only focus on hints.

> Add hint to enforce semi join optimization
> --
>
> Key: HIVE-16423
> URL: https://issues.apache.org/jira/browse/HIVE-16423
> Project: Hive
>  Issue Type: Task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16423.1.patch, HIVE-16423.2.patch
>
>
> Add hints in semijoin to enforce particular semi join optimization.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16431) Support Parquet StatsNoJobTask for Spark & Tez engine

2017-04-13 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968233#comment-15968233
 ] 

Xuefu Zhang commented on HIVE-16431:


Patch looks good. I checked the same thing for MR and found that an equality 
check (instead of isAssignableFrom) is in place there. Should we be consistent in 
all three places?
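
For readers following the equality-versus-assignability point, a tiny self-contained Java example (illustrative only, not Hive code; the class names are made up) shows why the two checks behave differently when the actual input format is a subclass of the expected one:

{code}
// Illustrative only: difference between an equality check and isAssignableFrom.
public class FormatCheckDemo {
  static class ParquetLikeInputFormat {}
  static class CustomParquetInputFormat extends ParquetLikeInputFormat {}

  public static void main(String[] args) {
    Class<?> expected = ParquetLikeInputFormat.class;
    Class<?> actual = CustomParquetInputFormat.class;

    // Equality only matches the exact class, so a subclass is rejected.
    System.out.println(expected.equals(actual));            // false

    // isAssignableFrom also accepts subclasses of the expected format.
    System.out.println(expected.isAssignableFrom(actual));  // true
  }
}
{code}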

> Support Parquet StatsNoJobTask for Spark & Tez engine
> -
>
> Key: HIVE-16431
> URL: https://issues.apache.org/jira/browse/HIVE-16431
> Project: Hive
>  Issue Type: Improvement
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-16431.1.patch
>
>
> It seems only MR uses StatsNoJobTask for Parquet input format when computing 
> stats. We should add it to Tez & Spark as well.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16431) Support Parquet StatsNoJobTask for Spark & Tez engine

2017-04-13 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968240#comment-15968240
 ] 

Chao Sun commented on HIVE-16431:
-

[~xuefuz] Hmm... MR should already be done. Which class were you looking at?

> Support Parquet StatsNoJobTask for Spark & Tez engine
> -
>
> Key: HIVE-16431
> URL: https://issues.apache.org/jira/browse/HIVE-16431
> Project: Hive
>  Issue Type: Improvement
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-16431.1.patch
>
>
> It seems only MR uses StatsNoJobTask for Parquet input format when computing 
> stats. We should add it to Tez & Spark as well.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-10299) Enable new cost model for Tez execution engine

2017-04-13 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-10299:
---
Status: Patch Available  (was: Open)

> Enable new cost model for Tez execution engine
> --
>
> Key: HIVE-10299
> URL: https://issues.apache.org/jira/browse/HIVE-10299
> Project: Hive
>  Issue Type: Task
>Reporter: Jesus Camacho Rodriguez
>Assignee: Vineet Garg
> Attachments: HIVE-10299.2.patch, HIVE-10299.3.patch, HIVE-10299.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-10299) Enable new cost model for Tez execution engine

2017-04-13 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-10299:
---
Status: Open  (was: Patch Available)

> Enable new cost model for Tez execution engine
> --
>
> Key: HIVE-10299
> URL: https://issues.apache.org/jira/browse/HIVE-10299
> Project: Hive
>  Issue Type: Task
>Reporter: Jesus Camacho Rodriguez
>Assignee: Vineet Garg
> Attachments: HIVE-10299.2.patch, HIVE-10299.3.patch, HIVE-10299.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16418) Allow HiveKey to skip some bytes for comparison

2017-04-13 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968256#comment-15968256
 ] 

Eugene Koifman commented on HIVE-16418:
---

It seems that storing TZ will cause more confusion than it's worth.  I'm not 
even sure it's a good idea to automatically convert UTC to local TZ.  This 
should be done by the application displaying the value or through an explicit 
conversion/formatting SQL function.  This way it's always obvious what the 
default comparison/sorting is and what modifications to those operations any 
particular query is requesting.

> Allow HiveKey to skip some bytes for comparison
> ---
>
> Key: HIVE-16418
> URL: https://issues.apache.org/jira/browse/HIVE-16418
> Project: Hive
>  Issue Type: New Feature
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-16418.1.patch
>
>
> The feature is required when we have to serialize some fields and prevent 
> them from being used in comparison, e.g. HIVE-14412.
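
As a rough illustration of the requested capability (not the actual HIVE-16418 patch), a raw comparator could exclude a fixed-length suffix of the serialized key from the comparison; the class name and {{skipBytes}} field below are invented for the example:

{code}
import org.apache.hadoop.io.WritableComparator;

// Hypothetical sketch: compare keys while skipping a fixed-length suffix
// (bytes serialized only to carry data, not meant to affect ordering).
public class PrefixOnlyComparator {
  private final int skipBytes;   // number of trailing bytes to exclude (assumption)

  public PrefixOnlyComparator(int skipBytes) {
    this.skipBytes = skipBytes;
  }

  public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    // Exclude the suffix from both keys before the lexicographic comparison.
    int cmpLen1 = Math.max(0, l1 - skipBytes);
    int cmpLen2 = Math.max(0, l2 - skipBytes);
    return WritableComparator.compareBytes(b1, s1, cmpLen1, b2, s2, cmpLen2);
  }
}
{code}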



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15986) Support "is [not] distinct from"

2017-04-13 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15986:
---
Attachment: HIVE-15986.7.patch

> Support "is [not] distinct from"
> 
>
> Key: HIVE-15986
> URL: https://issues.apache.org/jira/browse/HIVE-15986
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Carter Shanklin
>Assignee: Vineet Garg
> Attachments: HIVE-15986.1.patch, HIVE-15986.2.patch, 
> HIVE-15986.3.patch, HIVE-15986.4.patch, HIVE-15986.5.patch, 
> HIVE-15986.5.patch, HIVE-15986.7.patch
>
>
> Support standard "is [not] distinct from" syntax. For example this gives a 
> standard way to do a comparison to null safe join: select * from t1 join t2 
> on t1.x is not distinct from t2.y. SQL standard reference Section 8.15



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15986) Support "is [not] distinct from"

2017-04-13 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15986:
---
Status: Open  (was: Patch Available)

> Support "is [not] distinct from"
> 
>
> Key: HIVE-15986
> URL: https://issues.apache.org/jira/browse/HIVE-15986
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Carter Shanklin
>Assignee: Vineet Garg
> Attachments: HIVE-15986.1.patch, HIVE-15986.2.patch, 
> HIVE-15986.3.patch, HIVE-15986.4.patch, HIVE-15986.5.patch, 
> HIVE-15986.5.patch, HIVE-15986.7.patch
>
>
> Support standard "is [not] distinct from" syntax. For example this gives a 
> standard way to do a comparison to null safe join: select * from t1 join t2 
> on t1.x is not distinct from t2.y. SQL standard reference Section 8.15



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15986) Support "is [not] distinct from"

2017-04-13 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15986:
---
Status: Patch Available  (was: Open)

> Support "is [not] distinct from"
> 
>
> Key: HIVE-15986
> URL: https://issues.apache.org/jira/browse/HIVE-15986
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Carter Shanklin
>Assignee: Vineet Garg
> Attachments: HIVE-15986.1.patch, HIVE-15986.2.patch, 
> HIVE-15986.3.patch, HIVE-15986.4.patch, HIVE-15986.5.patch, 
> HIVE-15986.5.patch, HIVE-15986.7.patch
>
>
> Support standard "is [not] distinct from" syntax. For example this gives a 
> standard way to do a comparison to null safe join: select * from t1 join t2 
> on t1.x is not distinct from t2.y. SQL standard reference Section 8.15



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16431) Support Parquet StatsNoJobTask for Spark & Tez engine

2017-04-13 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968261#comment-15968261
 ] 

Xuefu Zhang commented on HIVE-16431:


I was looking at GenMRTableScan1.java. My code could be old though.

> Support Parquet StatsNoJobTask for Spark & Tez engine
> -
>
> Key: HIVE-16431
> URL: https://issues.apache.org/jira/browse/HIVE-16431
> Project: Hive
>  Issue Type: Improvement
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-16431.1.patch
>
>
> It seems only MR uses StatsNoJobTask for Parquet input format when computing 
> stats. We should add it to Tez & Spark as well.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16431) Support Parquet StatsNoJobTask for Spark & Tez engine

2017-04-13 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968264#comment-15968264
 ] 

Chao Sun commented on HIVE-16431:
-

[~xuefuz] That is done in HIVE-14858: 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java#L94


> Support Parquet StatsNoJobTask for Spark & Tez engine
> -
>
> Key: HIVE-16431
> URL: https://issues.apache.org/jira/browse/HIVE-16431
> Project: Hive
>  Issue Type: Improvement
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-16431.1.patch
>
>
> It seems only MR uses StatsNoJobTask for Parquet input format when computing 
> stats. We should add it to Tez & Spark as well.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16431) Support Parquet StatsNoJobTask for Spark & Tez engine

2017-04-13 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968286#comment-15968286
 ] 

Xuefu Zhang commented on HIVE-16431:


Cool then! +1

> Support Parquet StatsNoJobTask for Spark & Tez engine
> -
>
> Key: HIVE-16431
> URL: https://issues.apache.org/jira/browse/HIVE-16431
> Project: Hive
>  Issue Type: Improvement
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-16431.1.patch
>
>
> It seems only MR uses StatsNoJobTask for Parquet input format when computing 
> stats. We should add it to Tez & Spark as well.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16321) Possible deadlock in metastore with Acid enabled

2017-04-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968294#comment-15968294
 ] 

Hive QA commented on HIVE-16321:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12863338/HIVE-16321.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 10571 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4678/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4678/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4678/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12863338 - PreCommit-HIVE-Build

> Possible deadlock in metastore with Acid enabled
> 
>
> Key: HIVE-16321
> URL: https://issues.apache.org/jira/browse/HIVE-16321
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-16321.01.patch
>
>
> TxnStore.MutexAPI is a mechanism by which different Metastore instances can 
> coordinate their operations.  It uses a JDBC connection to achieve this.
> In some cases this may lead to deadlock.  TxnHandler uses a connection pool 
> of fixed size.  Suppose you have X simultaneous calls to TxnHandler.lock(), 
> where X is >= the size of the pool.  This takes all connections from the pool, so 
> when
> {noformat}
> handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name());
> {noformat} 
> is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the 
> pool is empty and the system is deadlocked.
> MutexAPI can't use the same connection as the operation it's protecting.  
> (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example).
> We could make MutexAPI use a separate connection pool (size > 'primary' conn 
> pool).
> Or we could make TxnHandler.lock(LockRequest rqst) return immediately after 
> enqueueing the lock with the expectation that the caller will always follow 
> up with a call to checkLock(CheckLockRequest rqst).
> cc [~f1sherox]
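
To make the failure mode concrete, here is a small, self-contained Java illustration of the pattern described above. It is not Hive code; the Semaphore simply stands in for the fixed-size JDBC connection pool.

{code}
import java.util.concurrent.Semaphore;

// Illustration of the described deadlock: each operation already holds one
// "connection" and then needs a second one for the mutex, exhausting the pool.
public class PoolDeadlockDemo {
  public static void main(String[] args) {
    final int poolSize = 2;
    final Semaphore pool = new Semaphore(poolSize);   // stands in for the JDBC pool

    for (int i = 0; i < poolSize; i++) {              // X callers, X >= pool size
      new Thread(() -> {
        pool.acquireUninterruptibly();                // connection held by lock()
        try {
          // The mutex acquisition needs its own connection, but the pool is empty:
          // every thread blocks here and none ever releases -> deadlock.
          pool.acquireUninterruptibly();
          pool.release();
        } finally {
          pool.release();
        }
      }).start();
    }
    System.out.println("All workers started; they will hang on the second acquire.");
  }
}
{code}

Running this hangs: every thread holds one permit and waits for a second. Either remedy mentioned in the description breaks the cycle, i.e. a separate pool for the mutex or returning before the nested acquire.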



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15986) Support "is [not] distinct from"

2017-04-13 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968297#comment-15968297
 ] 

Ashutosh Chauhan commented on HIVE-15986:
-

+1

> Support "is [not] distinct from"
> 
>
> Key: HIVE-15986
> URL: https://issues.apache.org/jira/browse/HIVE-15986
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Carter Shanklin
>Assignee: Vineet Garg
> Attachments: HIVE-15986.1.patch, HIVE-15986.2.patch, 
> HIVE-15986.3.patch, HIVE-15986.4.patch, HIVE-15986.5.patch, 
> HIVE-15986.5.patch, HIVE-15986.7.patch
>
>
> Support standard "is [not] distinct from" syntax. For example this gives a 
> standard way to do a comparison to null safe join: select * from t1 join t2 
> on t1.x is not distinct from t2.y. SQL standard reference Section 8.15



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16321) Possible deadlock in metastore with Acid enabled

2017-04-13 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968315#comment-15968315
 ] 

Eugene Koifman commented on HIVE-16321:
---

wow, a clean run!

[~wzheng] could you review please

> Possible deadlock in metastore with Acid enabled
> 
>
> Key: HIVE-16321
> URL: https://issues.apache.org/jira/browse/HIVE-16321
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-16321.01.patch
>
>
> TxnStore.MutexAPI is a mechanism by which different Metastore instances can 
> coordinate their operations.  It uses a JDBC connection to achieve this.
> In some cases this may lead to deadlock.  TxnHandler uses a connection pool 
> of fixed size.  Suppose you have X simultaneous calls to TxnHandler.lock(), 
> where X is >= the size of the pool.  This takes all connections from the pool, so 
> when
> {noformat}
> handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name());
> {noformat} 
> is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the 
> pool is empty and the system is deadlocked.
> MutexAPI can't use the same connection as the operation it's protecting.  
> (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example).
> We could make MutexAPI use a separate connection pool (size > 'primary' conn 
> pool).
> Or we could make TxnHandler.lock(LockRequest rqst) return immediately after 
> enqueueing the lock with the expectation that the caller will always follow 
> up with a call to checkLock(CheckLockRequest rqst).
> cc [~f1sherox]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16361) Automatically kill runaway client processes

2017-04-13 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16361:
--
Summary: Automatically kill runaway client processes   (was: Automatically 
kill runaway processes )

> Automatically kill runaway client processes 
> 
>
> Key: HIVE-16361
> URL: https://issues.apache.org/jira/browse/HIVE-16361
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Priority: Critical
>
> HIVE-13249 added an enforceable limit on how many transactions can be opened 
> concurrently; beyond that limit the system starts to reject new work, to prevent 
> it from getting to a point where it cannot manage the load.
> Another condition to guard against is a runaway process (usually some app, e.g. 
> Storm, using the Streaming Ingest API) that creates a very large number of 
> transactions very quickly, all of which immediately get aborted due to some 
> misconfiguration.  This can cause a large amount of metadata to accumulate in 
> the ACID system, slowing everything down and causing instability.
> Now that we have TXNS.TXN_AGENT_INFO information, we could probably use it to 
> refuse work from a client, even before we open any txns, if it trips some 
> "runaway client" heuristic.
> This is effectively an unintentional DoS attack.
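
For illustration only, a sketch of what such a "runaway client" heuristic could look like, keyed by the client's agent info. Every name, the threshold, and the window below are invented for the example and are not part of any Hive patch:

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Purely hypothetical sketch of a "runaway client" heuristic (not Hive code):
// refuse new open-transaction requests from a client whose recent abort count
// within a sliding window is too high.
public class RunawayClientGuard {
  private static final int MAX_RECENT_ABORTS = 1000;   // threshold (assumption)
  private static final long WINDOW_MS = 60_000L;       // sliding window (assumption)

  private static final class Stats {
    long windowStart;
    int aborts;
  }

  private final Map<String, Stats> statsByClient = new ConcurrentHashMap<>();

  // Called when a transaction from this client (agent info) is aborted.
  public void recordAbort(String clientAgentInfo) {
    Stats s = statsByClient.computeIfAbsent(clientAgentInfo, k -> new Stats());
    synchronized (s) {
      long now = System.currentTimeMillis();
      if (now - s.windowStart > WINDOW_MS) {   // start a new window
        s.windowStart = now;
        s.aborts = 0;
      }
      s.aborts++;
    }
  }

  // Called before opening new transactions; true means "refuse work".
  public boolean isRunaway(String clientAgentInfo) {
    Stats s = statsByClient.get(clientAgentInfo);
    if (s == null) {
      return false;
    }
    synchronized (s) {
      return System.currentTimeMillis() - s.windowStart <= WINDOW_MS
          && s.aborts >= MAX_RECENT_ABORTS;
    }
  }
}
{code}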



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16442) ExplainSemanticAnalyzer shares QueryState between statements

2017-04-13 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16442:
--
Component/s: Query Planning

> ExplainSemanticAnalyzer shares QueryState between statements
> 
>
> Key: HIVE-16442
> URL: https://issues.apache.org/jira/browse/HIVE-16442
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Eugene Koifman
>
> explain analyze .
> will call the Driver recursively to actually execute the query
> when it does
> {noformat}
> BaseSemanticAnalyzer sem = SemanticAnalyzerFactory.get(queryState, input);
> {noformat}
> it ends up sharing QueryState between different query executions



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16439) Exclude older v2 version of jackson lib from dependent jars in pom.xml

2017-04-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968355#comment-15968355
 ] 

Hive QA commented on HIVE-16439:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12863340/HIVE-16439.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 10571 tests 
executed
*Failed tests:*
{noformat}
org.apache.hive.spark.client.rpc.TestRpc.testEncryption (batchId=280)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4679/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4679/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4679/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12863340 - PreCommit-HIVE-Build

> Exclude older v2 version of jackson lib from dependent jars in pom.xml 
> ---
>
> Key: HIVE-16439
> URL: https://issues.apache.org/jira/browse/HIVE-16439
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-16439.1.patch
>
>
> There are multiple versions of jackson libs included in the dependent jars 
> like spark-client and metrics-json. That causes older versions of jackson 
> libs to be used.   
> We need to exclude them from the dependencies and use the explicit one 
> (currently 2.6.5), i.e. the com.fasterxml.jackson.core:jackson-databind artifact.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16443) HiveOperation doesn't have operations for Update, Delete, Merge

2017-04-13 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-16443:
-


> HiveOperation doesn't have operations for Update, Delete, Merge
> ---
>
> Key: HIVE-16443
> URL: https://issues.apache.org/jira/browse/HIVE-16443
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning, Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> Ideally it should have these, with proper privileges specified:
>   SQLUPDATE("UPDATE", null, null, true, false),
>   SQLDELETE("DELETE", null, null, true, false),
>   SQLMERGE("MERGE", null, null, true, false);



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16443) HiveOperation doesn't have operations for Update, Delete, Merge

2017-04-13 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16443:
--
Description: 
Ideally it should have these, with proper privileges specified:

  SQLUPDATE("UPDATE", null, null, true, false),
  SQLDELETE("DELETE", null, null, true, false),
  SQLMERGE("MERGE", null, null, true, false);

It would also be useful to have INSERT and SELECT

All of these currently map to QUERY, which is not informative.

  was:
Ideally it should have these, with proper privileges specified:

  SQLUPDATE("UPDATE", null, null, true, false),
  SQLDELETE("DELETE", null, null, true, false),
  SQLMERGE("MERGE", null, null, true, false);



> HiveOperation doesn't have operations for Update, Delete, Merge
> ---
>
> Key: HIVE-16443
> URL: https://issues.apache.org/jira/browse/HIVE-16443
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning, Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> Ideally it should have these, with proper privileges specified:
>   SQLUPDATE("UPDATE", null, null, true, false),
>   SQLDELETE("DELETE", null, null, true, false),
>   SQLMERGE("MERGE", null, null, true, false);
> It would also be useful to have INSERT and SELECT
> All of these currently map to QUERY, which is not informative.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16443) HiveOperation doesn't have operations for Update, Delete, Merge

2017-04-13 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16443:
--
Description: 
Ideally it should have these, with proper privileges specified:

  SQLUPDATE("UPDATE", null, null, true, false),
  SQLDELETE("DELETE", null, null, true, false),
  SQLMERGE("MERGE", null, null, true, false);

It would also be useful to have INSERT and SELECT

All of these currently map to QUERY, which is not informative.

See how the VIEW-related statements are handled in SemanticAnalyzerFactory to set 
a more specific operation type.

  was:
Ideally it should have these, with proper privileges specified:

  SQLUPDATE("UPDATE", null, null, true, false),
  SQLDELETE("DELETE", null, null, true, false),
  SQLMERGE("MERGE", null, null, true, false);

It would also be useful to have INSERT and SELECT

All of these currently map to QUERY, which is not informative.


> HiveOperation doesn't have operations for Update, Delete, Merge
> ---
>
> Key: HIVE-16443
> URL: https://issues.apache.org/jira/browse/HIVE-16443
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning, Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> Ideally it should have these, with proper privileges specified:
>   SQLUPDATE("UPDATE", null, null, true, false),
>   SQLDELETE("DELETE", null, null, true, false),
>   SQLMERGE("MERGE", null, null, true, false);
> It would also be useful to have INSERT and SELECT
> All of these currently map to QUERY, which is not informative.
> See how the VIEW-related statements are handled in SemanticAnalyzerFactory to 
> set a more specific operation type.
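
As a purely illustrative Java sketch of the idea (not the real HiveOperation enum or SemanticAnalyzerFactory code; the constructor shape, flag names, and parse-tree token names are assumptions), the new constants and a factory-style mapping to a more specific operation might look like this:

{code}
// Illustrative sketch only; the real HiveOperation enum and its constructor differ.
public class OperationTypeSketch {
  enum Operation {
    QUERY(false),
    SQLUPDATE(true),
    SQLDELETE(true),
    SQLMERGE(true);

    final boolean requiresWrite;   // placeholder for privilege/transaction flags

    Operation(boolean requiresWrite) {
      this.requiresWrite = requiresWrite;
    }
  }

  // Factory-style mapping from a (hypothetical) parse-tree token to a specific
  // operation, analogous to how VIEW-related statements are specialized.
  static Operation operationFor(String rootToken) {
    switch (rootToken) {
      case "TOK_UPDATE_TABLE": return Operation.SQLUPDATE;
      case "TOK_DELETE_FROM":  return Operation.SQLDELETE;
      case "TOK_MERGE":        return Operation.SQLMERGE;
      default:                 return Operation.QUERY;  // today everything collapses here
    }
  }

  public static void main(String[] args) {
    System.out.println(operationFor("TOK_UPDATE_TABLE"));  // SQLUPDATE
  }
}
{code}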



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16415) Add blobstore tests for insertion of zero rows

2017-04-13 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968391#comment-15968391
 ] 

Thomas Poepping commented on HIVE-16415:


[~spena] I like that idea. Do you think the commit message deserves to be 
edited with that addition? "Add tests covering insertion of zero rows", maybe 
even "Add tests covering single inserts of zero rows" to differentiate it from the 
separate issue [~ashutoshc] found.

I have a new diff; I'll upload it now.

> Add blobstore tests for insertion of zero rows
> --
>
> Key: HIVE-16415
> URL: https://issues.apache.org/jira/browse/HIVE-16415
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.1.1
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Attachments: HIVE-16415.patch
>
>
> This patch introduces two regression tests into the hive-blobstore qtest 
> module: zero_rows_hdfs.q and zero_rows_blobstore.q. These tests run INSERT 
> commands with a WHERE clause whose condition causes zero rows to be selected.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

