[jira] [Work logged] (HIVE-25142) Rehashing in map join fast hash table causing corruption for large keys

2021-05-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25142?focusedWorklogId=599619&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599619
 ]

ASF GitHub Bot logged work on HIVE-25142:
-

Author: ASF GitHub Bot
Created on: 20/May/21 05:36
Start Date: 20/May/21 05:36
Worklog Time Spent: 10m 
  Work Description: maheshk114 opened a new pull request #2300:
URL: https://github.com/apache/hive/pull/2300


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599619)
Remaining Estimate: 0h
Time Spent: 10m

> Rehashing in map join fast hash table  causing corruption for large keys
> 
>
> Key: HIVE-25142
> URL: https://issues.apache.org/jira/browse/HIVE-25142
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In map join, the hash table is created using the keys. To support rehashing, 
> the keys are stored in a write buffer. The hash table contains the offset of 
> each key along with its hash code. When rehashing is done, the offset is 
> extracted from the hash table and the hash code is generated again. For 
> large keys (size greater than 255 bytes), the key length is also stored along 
> with the key. In the fast hash table implementation the key is not extracted 
> properly: a code bug causes the wrong key to be extracted, which generates a 
> wrong hash code and corrupts the hash table.
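The bug class described above can be illustrated with a short, simplified sketch (this is not the actual Hive code; the 2-byte prefix, the 255-byte threshold, and the function names are assumptions made for illustration): large keys are stored with a length prefix, and a rehash that reads the stored bytes without skipping that prefix recomputes the hash over the wrong bytes.

```python
# Simplified sketch of length-prefixed key storage; NOT the actual Hive
# code -- the byte layout here is illustrative only.

def store_key(buffer: bytearray, key: bytes) -> int:
    """Append a key to the write buffer and return its offset."""
    offset = len(buffer)
    if len(key) > 255:
        buffer += len(key).to_bytes(2, "big")  # large keys get a length prefix
    buffer += key
    return offset

def read_key_correct(buffer: bytearray, offset: int, key_len: int) -> bytes:
    """Skip the length prefix of large keys before reading the key bytes."""
    if key_len > 255:
        offset += 2
    return bytes(buffer[offset:offset + key_len])

def read_key_buggy(buffer: bytearray, offset: int, key_len: int) -> bytes:
    """Forgetting the prefix reads prefix bytes as if they were key bytes."""
    return bytes(buffer[offset:offset + key_len])

buf = bytearray()
big_key = b"k" * 300  # larger than 255 bytes, so it is stored with a prefix
off = store_key(buf, big_key)

assert read_key_correct(buf, off, len(big_key)) == big_key
# The buggy read returns different bytes, so rehashing computes a wrong
# hash code and the entry lands in the wrong slot after the resize:
assert read_key_buggy(buf, off, len(big_key)) != big_key
```

Hashing the bytes returned by the buggy reader yields a different hash code than the one computed at insert time, which is exactly the corruption the issue describes.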



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25142) Rehashing in map join fast hash table causing corruption for large keys

2021-05-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25142:
--
Labels: pull-request-available  (was: )

> Rehashing in map join fast hash table  causing corruption for large keys
> 
>
> Key: HIVE-25142
> URL: https://issues.apache.org/jira/browse/HIVE-25142
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In map join, the hash table is created using the keys. To support rehashing, 
> the keys are stored in a write buffer. The hash table contains the offset of 
> each key along with its hash code. When rehashing is done, the offset is 
> extracted from the hash table and the hash code is generated again. For 
> large keys (size greater than 255 bytes), the key length is also stored along 
> with the key. In the fast hash table implementation the key is not extracted 
> properly: a code bug causes the wrong key to be extracted, which generates a 
> wrong hash code and corrupts the hash table.





[jira] [Assigned] (HIVE-25142) Rehashing in map join fast hash table causing corruption for large keys

2021-05-19 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-25142:
--


> Rehashing in map join fast hash table  causing corruption for large keys
> 
>
> Key: HIVE-25142
> URL: https://issues.apache.org/jira/browse/HIVE-25142
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> In map join, the hash table is created using the keys. To support rehashing, 
> the keys are stored in a write buffer. The hash table contains the offset of 
> each key along with its hash code. When rehashing is done, the offset is 
> extracted from the hash table and the hash code is generated again. For 
> large keys (size greater than 255 bytes), the key length is also stored along 
> with the key. In the fast hash table implementation the key is not extracted 
> properly: a code bug causes the wrong key to be extracted, which generates a 
> wrong hash code and corrupts the hash table.





[jira] [Work logged] (HIVE-25136) Remove MetaExceptions From RawStore First Cut

2021-05-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25136?focusedWorklogId=599576&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599576
 ]

ASF GitHub Bot logged work on HIVE-25136:
-

Author: ASF GitHub Bot
Created on: 20/May/21 02:26
Start Date: 20/May/21 02:26
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #2290:
URL: https://github.com/apache/hive/pull/2290#issuecomment-844634120


   @miklosgergely This work regarding `MetaExceptions` is coming from my emails 
to dev@hive. This one is a bit more significant and removes many instances of 
the Thrift-generated `MetaException` from the bowels of the Hive Metastore.  
Please let me know if you can take a look at this. Thanks!
   
   @nrg4878 too




Issue Time Tracking
---

Worklog Id: (was: 599576)
Time Spent: 20m  (was: 10m)

> Remove MetaExceptions From RawStore First Cut
> -
>
> Key: HIVE-25136
> URL: https://issues.apache.org/jira/browse/HIVE-25136
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-25127) Remove Thrift Exceptions From RawStore getCatalogs

2021-05-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25127?focusedWorklogId=599575&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599575
 ]

ASF GitHub Bot logged work on HIVE-25127:
-

Author: ASF GitHub Bot
Created on: 20/May/21 02:24
Start Date: 20/May/21 02:24
Worklog Time Spent: 10m 
  Work Description: belugabehr opened a new pull request #2283:
URL: https://github.com/apache/hive/pull/2283


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 599575)
Time Spent: 40m  (was: 0.5h)

> Remove Thrift Exceptions From RawStore getCatalogs
> --
>
> Key: HIVE-25127
> URL: https://issues.apache.org/jira/browse/HIVE-25127
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-25127) Remove Thrift Exceptions From RawStore getCatalogs

2021-05-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25127?focusedWorklogId=599573&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599573
 ]

ASF GitHub Bot logged work on HIVE-25127:
-

Author: ASF GitHub Bot
Created on: 20/May/21 02:22
Start Date: 20/May/21 02:22
Worklog Time Spent: 10m 
  Work Description: belugabehr closed pull request #2283:
URL: https://github.com/apache/hive/pull/2283


   




Issue Time Tracking
---

Worklog Id: (was: 599573)
Time Spent: 0.5h  (was: 20m)

> Remove Thrift Exceptions From RawStore getCatalogs
> --
>
> Key: HIVE-25127
> URL: https://issues.apache.org/jira/browse/HIVE-25127
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-25141) Review Error Level Logging in HMS Module

2021-05-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25141:
--
Labels: pull-request-available  (was: )

> Review Error Level Logging in HMS Module
> 
>
> Key: HIVE-25141
> URL: https://issues.apache.org/jira/browse/HIVE-25141
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> * Remove "log *and* throw" (it should be one or the other)
>  * Remove superfluous code
>  * Ensure that stack traces are being logged (and not just the Exception 
> message) to ease troubleshooting
>  * Remove double-printing of the Exception message (SLF4J dictates that the 
> Exception message will be printed as part of the logger's formatting)
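For illustration, the first, third, and fourth bullets look like this in a small Python sketch (the issue targets SLF4J in Java, but the antipattern is language-neutral; all function names and paths here are made up):

```python
import logging

logging.basicConfig(level=logging.ERROR)
logger = logging.getLogger(__name__)

def read_config_bad(path):
    # Antipattern: "log *and* throw" -- the caller usually logs the same
    # error again, and formatting only str(e) drops the stack trace.
    try:
        with open(path) as f:
            return f.read()
    except OSError as e:
        logger.error("failed to read config: %s", e)  # message only, no trace
        raise  # ...and it likely gets logged a second time upstream

def read_config_good(path):
    # Preferred: just throw, and let a single top-level handler log once.
    with open(path) as f:
        return f.read()

def main(path):
    try:
        return read_config_good(path)
    except OSError:
        # logger.exception logs the message AND the full traceback, once
        logger.exception("failed to read config %s", path)
        return None
```

The key difference is that `logger.exception` attaches the traceback to the single log line, so the log is useful for troubleshooting without duplicating the message up the call stack.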





[jira] [Work logged] (HIVE-25141) Review Error Level Logging in HMS Module

2021-05-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25141?focusedWorklogId=599431&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599431
 ]

ASF GitHub Bot logged work on HIVE-25141:
-

Author: ASF GitHub Bot
Created on: 19/May/21 20:08
Start Date: 19/May/21 20:08
Worklog Time Spent: 10m 
  Work Description: belugabehr opened a new pull request #2299:
URL: https://github.com/apache/hive/pull/2299


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 599431)
Remaining Estimate: 0h
Time Spent: 10m

> Review Error Level Logging in HMS Module
> 
>
> Key: HIVE-25141
> URL: https://issues.apache.org/jira/browse/HIVE-25141
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> * Remove "log *and* throw" (it should be one or the other)
>  * Remove superfluous code
>  * Ensure that stack traces are being logged (and not just the Exception 
> message) to ease troubleshooting
>  * Remove double-printing of the Exception message (SLF4J dictates that the 
> Exception message will be printed as part of the logger's formatting)





[jira] [Assigned] (HIVE-25141) Review Error Level Logging in HMS Module

2021-05-19 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-25141:
-


> Review Error Level Logging in HMS Module
> 
>
> Key: HIVE-25141
> URL: https://issues.apache.org/jira/browse/HIVE-25141
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>
> * Remove "log *and* throw" (it should be one or the other)
>  * Remove superfluous code
>  * Ensure that stack traces are being logged (and not just the Exception 
> message) to ease troubleshooting
>  * Remove double-printing of the Exception message (SLF4J dictates that the 
> Exception message will be printed as part of the logger's formatting)





[jira] [Work logged] (HIVE-25112) Simplify TXN Compactor Heartbeat Thread

2021-05-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25112?focusedWorklogId=599384&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599384
 ]

ASF GitHub Bot logged work on HIVE-25112:
-

Author: ASF GitHub Bot
Created on: 19/May/21 18:10
Start Date: 19/May/21 18:10
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #2270:
URL: https://github.com/apache/hive/pull/2270#issuecomment-844347235


   @klcopp 




Issue Time Tracking
---

Worklog Id: (was: 599384)
Time Spent: 40m  (was: 0.5h)

> Simplify TXN Compactor Heartbeat Thread
> ---
>
> Key: HIVE-25112
> URL: https://issues.apache.org/jira/browse/HIVE-25112
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Simplify the Thread structure. Threads do not need an explicit "start"/"stop" 
> state; they already have one: a thread is either running or interrupted, and 
> the API is designed to work this way with thread pools and forced exits.
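As a rough sketch of the idea (Python's stdlib has no `Thread.interrupt()`, so an `Event` stands in for Java's interrupted status, and the heartbeat example is hypothetical): the loop has no hand-rolled started/stopped state machine, only a single cancellation signal it waits on.

```python
import threading
import time

def heartbeat_loop(stop: threading.Event, beats: list, interval: float = 0.01):
    # The loop's only lifecycle state is "running until signalled". In Java,
    # Thread.interrupt() and InterruptedException play the role of the Event
    # used here, and thread pools / forced exits rely on the same mechanism.
    while not stop.wait(interval):  # wait() doubles as the sleep between beats
        beats.append("beat")

stop = threading.Event()
beats: list = []
worker = threading.Thread(target=heartbeat_loop, args=(stop, beats), daemon=True)
worker.start()

time.sleep(0.05)   # ...compaction work would happen here...

stop.set()         # the moral equivalent of interrupting the thread
worker.join(timeout=1)
assert not worker.is_alive() and len(beats) >= 1
```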





[jira] [Work logged] (HIVE-25112) Simplify TXN Compactor Heartbeat Thread

2021-05-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25112?focusedWorklogId=599383&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599383
 ]

ASF GitHub Bot logged work on HIVE-25112:
-

Author: ASF GitHub Bot
Created on: 19/May/21 18:08
Start Date: 19/May/21 18:08
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #2270:
URL: https://github.com/apache/hive/pull/2270#issuecomment-844345978


   @miklosgergely :)




Issue Time Tracking
---

Worklog Id: (was: 599383)
Time Spent: 0.5h  (was: 20m)

> Simplify TXN Compactor Heartbeat Thread
> ---
>
> Key: HIVE-25112
> URL: https://issues.apache.org/jira/browse/HIVE-25112
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Simplify the Thread structure. Threads do not need an explicit "start"/"stop" 
> state; they already have one: a thread is either running or interrupted, and 
> the API is designed to work this way with thread pools and forced exits.





[jira] [Commented] (HIVE-24920) TRANSLATED_TO_EXTERNAL tables may write to the same location

2021-05-19 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347763#comment-17347763
 ] 

Zoltan Haindrich commented on HIVE-24920:
-

[~ngangam], [~thejas]: I've updated the PR - and implemented that 
TRANSLATED_TO_EXTERNAL tables may follow renames

> TRANSLATED_TO_EXTERNAL tables may write to the same location
> 
>
> Key: HIVE-24920
> URL: https://issues.apache.org/jira/browse/HIVE-24920
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {code}
> create table t (a integer);
> insert into t values(1);
> alter table t rename to t2;
> create table t (a integer); -- I expected an exception from this command 
> (location already exists), but because it's an external table no exception is 
> raised
> insert into t values(2);
> select * from t;  -- shows 1 and 2
> drop table t2;-- wipes out data location
> select * from t;  -- empty resultset
> {code}





[jira] [Commented] (HIVE-24909) Skip the repl events from getting logged in notification log

2021-05-19 Thread Pravin Sinha (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347747#comment-17347747
 ] 

Pravin Sinha commented on HIVE-24909:
-

Committed to master.

Thanks for the patch, [~haymant]

> Skip the repl events from getting logged in notification log
> 
>
> Key: HIVE-24909
> URL: https://issues.apache.org/jira/browse/HIVE-24909
> Project: Hive
>  Issue Type: Bug
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Currently, REPL dump events are logged and replicated as part of the 
> replication policy. Whenever one replication cycle completes, we always have 
> one transaction left open on the target, corresponding to the REPL dump 
> operation. This will never be cleaned up without manually dealing with the 
> transaction on the target cluster.





[jira] [Comment Edited] (HIVE-24909) Skip the repl events from getting logged in notification log

2021-05-19 Thread Pravin Sinha (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347745#comment-17347745
 ] 

Pravin Sinha edited comment on HIVE-24909 at 5/19/21, 3:48 PM:
---

+1

Committed to master.

Thanks for the patch, [~haymant]


was (Author: pkumarsinha):
Thanks for the patch, [~haymant]

> Skip the repl events from getting logged in notification log
> 
>
> Key: HIVE-24909
> URL: https://issues.apache.org/jira/browse/HIVE-24909
> Project: Hive
>  Issue Type: Bug
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Currently, REPL dump events are logged and replicated as part of the 
> replication policy. Whenever one replication cycle completes, we always have 
> one transaction left open on the target, corresponding to the REPL dump 
> operation. This will never be cleaned up without manually dealing with the 
> transaction on the target cluster.





[jira] [Comment Edited] (HIVE-24909) Skip the repl events from getting logged in notification log

2021-05-19 Thread Pravin Sinha (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347745#comment-17347745
 ] 

Pravin Sinha edited comment on HIVE-24909 at 5/19/21, 3:48 PM:
---

+1

 


was (Author: pkumarsinha):
+1

Committed to master.

Thanks for the patch, [~haymant]

> Skip the repl events from getting logged in notification log
> 
>
> Key: HIVE-24909
> URL: https://issues.apache.org/jira/browse/HIVE-24909
> Project: Hive
>  Issue Type: Bug
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Currently, REPL dump events are logged and replicated as part of the 
> replication policy. Whenever one replication cycle completes, we always have 
> one transaction left open on the target, corresponding to the REPL dump 
> operation. This will never be cleaned up without manually dealing with the 
> transaction on the target cluster.





[jira] [Resolved] (HIVE-24909) Skip the repl events from getting logged in notification log

2021-05-19 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha resolved HIVE-24909.
-
Resolution: Fixed

Thanks for the patch, [~haymant]

> Skip the repl events from getting logged in notification log
> 
>
> Key: HIVE-24909
> URL: https://issues.apache.org/jira/browse/HIVE-24909
> Project: Hive
>  Issue Type: Bug
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Currently, REPL dump events are logged and replicated as part of the 
> replication policy. Whenever one replication cycle completes, we always have 
> one transaction left open on the target, corresponding to the REPL dump 
> operation. This will never be cleaned up without manually dealing with the 
> transaction on the target cluster.





[jira] [Work logged] (HIVE-24909) Skip the repl events from getting logged in notification log

2021-05-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24909?focusedWorklogId=599315&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599315
 ]

ASF GitHub Bot logged work on HIVE-24909:
-

Author: ASF GitHub Bot
Created on: 19/May/21 15:44
Start Date: 19/May/21 15:44
Worklog Time Spent: 10m 
  Work Description: pkumarsinha merged pull request #2101:
URL: https://github.com/apache/hive/pull/2101


   




Issue Time Tracking
---

Worklog Id: (was: 599315)
Time Spent: 7h 10m  (was: 7h)

> Skip the repl events from getting logged in notification log
> 
>
> Key: HIVE-24909
> URL: https://issues.apache.org/jira/browse/HIVE-24909
> Project: Hive
>  Issue Type: Bug
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Currently, REPL dump events are logged and replicated as part of the 
> replication policy. Whenever one replication cycle completes, we always have 
> one transaction left open on the target, corresponding to the REPL dump 
> operation. This will never be cleaned up without manually dealing with the 
> transaction on the target cluster.





[jira] [Updated] (HIVE-25140) Hive Distributed Tracing -- Part 1: Disabled

2021-05-19 Thread Matt McCline (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-25140:

Description: 
Infrastructure except exporters to Jaeger or OpenTelemetry (OTel), due to Thrift 
and protobuf version conflicts. A logging-only exporter is used.

There are Spans for BeeLine and HiveServer2. The code was developed on 
branch-3.1, and porting Spans to the Hive MetaStore on master is taking more 
time due to major metastore code refactoring.

  was:
Infrastructure except exporters to Jaeger or OpenTelemetry (OTel), due to Thrift 
and protobuf version conflicts.

Has Spans for BeeLine and HiveServer2. The code was developed on branch-3.1 
and porting Spans to the Hive MetaStore on master is taking more time due to 
major code refactoring.


> Hive Distributed Tracing -- Part 1: Disabled
> 
>
> Key: HIVE-25140
> URL: https://issues.apache.org/jira/browse/HIVE-25140
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Major
>
> Infrastructure except exporters to Jaeger or OpenTelemetry (OTel), due to 
> Thrift and protobuf version conflicts. A logging-only exporter is used.
> There are Spans for BeeLine and HiveServer2. The code was developed on 
> branch-3.1, and porting Spans to the Hive MetaStore on master is taking more 
> time due to major metastore code refactoring.





[jira] [Assigned] (HIVE-25140) Hive Distributed Tracing -- Part 1: Disabled

2021-05-19 Thread Matt McCline (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline reassigned HIVE-25140:
---


> Hive Distributed Tracing -- Part 1: Disabled
> 
>
> Key: HIVE-25140
> URL: https://issues.apache.org/jira/browse/HIVE-25140
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Major
>
> Infrastructure except exporters to Jaeger or OpenTelemetry (OTel), due to 
> Thrift and protobuf version conflicts.
> Has Spans for BeeLine and HiveServer2. The code was developed on 
> branch-3.1 and porting Spans to the Hive MetaStore on master is taking more 
> time due to major code refactoring.





[jira] [Work logged] (HIVE-25127) Remove Thrift Exceptions From RawStore getCatalogs

2021-05-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25127?focusedWorklogId=599304&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599304
 ]

ASF GitHub Bot logged work on HIVE-25127:
-

Author: ASF GitHub Bot
Created on: 19/May/21 15:24
Start Date: 19/May/21 15:24
Worklog Time Spent: 10m 
  Work Description: belugabehr opened a new pull request #2283:
URL: https://github.com/apache/hive/pull/2283


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 599304)
Time Spent: 20m  (was: 10m)

> Remove Thrift Exceptions From RawStore getCatalogs
> --
>
> Key: HIVE-25127
> URL: https://issues.apache.org/jira/browse/HIVE-25127
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-25127) Remove Thrift Exceptions From RawStore getCatalogs

2021-05-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25127:
--
Labels: pull-request-available  (was: )

> Remove Thrift Exceptions From RawStore getCatalogs
> --
>
> Key: HIVE-25127
> URL: https://issues.apache.org/jira/browse/HIVE-25127
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-25127) Remove Thrift Exceptions From RawStore getCatalogs

2021-05-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25127?focusedWorklogId=599302&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599302
 ]

ASF GitHub Bot logged work on HIVE-25127:
-

Author: ASF GitHub Bot
Created on: 19/May/21 15:22
Start Date: 19/May/21 15:22
Worklog Time Spent: 10m 
  Work Description: belugabehr closed pull request #2283:
URL: https://github.com/apache/hive/pull/2283


   




Issue Time Tracking
---

Worklog Id: (was: 599302)
Remaining Estimate: 0h
Time Spent: 10m

> Remove Thrift Exceptions From RawStore getCatalogs
> --
>
> Key: HIVE-25127
> URL: https://issues.apache.org/jira/browse/HIVE-25127
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-25069) Hive Distributed Tracing

2021-05-19 Thread Matt McCline (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-25069:

Attachment: (was: HIVE-25069.01.patch)

> Hive Distributed Tracing
> 
>
> Key: HIVE-25069
> URL: https://issues.apache.org/jira/browse/HIVE-25069
> Project: Hive
>  Issue Type: New Feature
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Major
> Attachments: image-2021-05-10-09-20-54-688.png, 
> image-2021-05-10-09-30-44-570.png, image-2021-05-10-19-06-02-679.png
>
>
> Instrument Hive code to gather distributed traces and export trace data to a 
> configurable collector.
> Distributed tracing is a revolutionary tool for debugging issues.
> We will use the new OpenTelemetry open-source standard that our industry has 
> aligned on. OpenTelemetry is the merger of two earlier distributed tracing 
> projects, OpenTracing and OpenCensus.
> Next step: Add a design document that goes into more detail on the benefits of 
> distributed tracing and describes how Hive will be enhanced.
> Also see:
> HBASE-22120 Replace HTrace with OpenTelemetry
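To make the "logging-only exporter" idea from the sub-tasks concrete, here is a minimal stdlib-only sketch of span instrumentation (illustrative only: this is neither the OpenTelemetry API nor Hive's implementation, and names like `compile_query` are invented):

```python
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("tracing")

exported = []  # finished spans collect here; a real exporter ships them to a collector

@contextmanager
def span(name, trace_id):
    # A minimal "span": measure the duration, then "export" it by logging.
    start = time.monotonic()
    try:
        yield
    finally:
        dur_ms = (time.monotonic() - start) * 1000
        exported.append((name, trace_id))
        log.info("span=%s trace=%s duration_ms=%.2f", name, trace_id, dur_ms)

# Spans nest naturally, mirroring the call tree of a traced request:
with span("compile_query", trace_id="abc123"):
    with span("parse", trace_id="abc123"):
        time.sleep(0.001)

# Inner spans finish (and export) before their parents:
assert [n for n, _ in exported] == ["parse", "compile_query"]
```

A real OpenTelemetry setup would replace the list and the log line with a configured span exporter, which is exactly the piece the issue says is blocked by Thrift/protobuf version conflicts.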





[jira] [Issue Comment Deleted] (HIVE-25069) Hive Distributed Tracing

2021-05-19 Thread Matt McCline (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-25069:

Comment: was deleted

(was: A first work-in-progress patch. Work was done on branch-3.1 and manually 
merging changes to master is tedious. The tracing infrastructure modules are in, 
but only a few Hive classes have been merged; enough, though, to give Hive QA a 
run. Traces will be exported to a logging-only exporter.)

> Hive Distributed Tracing
> 
>
> Key: HIVE-25069
> URL: https://issues.apache.org/jira/browse/HIVE-25069
> Project: Hive
>  Issue Type: New Feature
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Major
> Attachments: image-2021-05-10-09-20-54-688.png, 
> image-2021-05-10-09-30-44-570.png, image-2021-05-10-19-06-02-679.png
>
>
> Instrument Hive code to gather distributed traces and export trace data to a 
> configurable collector.
> Distributed tracing is a revolutionary tool for debugging issues.
> We will use the new OpenTelemetry open-source standard that our industry has 
> aligned on. OpenTelemetry is the merger of two earlier distributed tracing 
> projects, OpenTracing and OpenCensus.
> Next step: Add a design document that goes into more detail on the benefits of 
> distributed tracing and describes how Hive will be enhanced.
> Also see:
> HBASE-22120 Replace HTrace with OpenTelemetry





[jira] [Issue Comment Deleted] (HIVE-25069) Hive Distributed Tracing

2021-05-19 Thread Matt McCline (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-25069:

Comment: was deleted

(was: I'll try a pull request instead.)

> Hive Distributed Tracing
> 
>
> Key: HIVE-25069
> URL: https://issues.apache.org/jira/browse/HIVE-25069
> Project: Hive
>  Issue Type: New Feature
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Major
> Attachments: image-2021-05-10-09-20-54-688.png, 
> image-2021-05-10-09-30-44-570.png, image-2021-05-10-19-06-02-679.png
>
>
> Instrument Hive code to gather distributed traces and export trace data to a 
> configurable collector.
> Distributed tracing is a revolutionary tool for debugging issues.
> We will use the new OpenTelemetry open-source standard that our industry has 
> aligned on. OpenTelemetry is the merger of two earlier distributed tracing 
> projects OpenTracing and OpenCensus.
> Next step: Add a design document that goes into more detail on the benefits of 
> distributed tracing and describes how Hive will be enhanced.
> Also see:
> HBASE-22120 Replace HTrace with OpenTelemetry





[jira] [Work logged] (HIVE-25128) Remove Thrift Exceptions From RawStore alterCatalog

2021-05-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25128?focusedWorklogId=599301&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599301
 ]

ASF GitHub Bot logged work on HIVE-25128:
-

Author: ASF GitHub Bot
Created on: 19/May/21 15:17
Start Date: 19/May/21 15:17
Worklog Time Spent: 10m 
  Work Description: belugabehr opened a new pull request #2291:
URL: https://github.com/apache/hive/pull/2291


   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599301)
Time Spent: 40m  (was: 0.5h)

> Remove Thrift Exceptions From RawStore alterCatalog
> ---
>
> Key: HIVE-25128
> URL: https://issues.apache.org/jira/browse/HIVE-25128
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-25128) Remove Thrift Exceptions From RawStore alterCatalog

2021-05-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25128?focusedWorklogId=599300&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599300
 ]

ASF GitHub Bot logged work on HIVE-25128:
-

Author: ASF GitHub Bot
Created on: 19/May/21 15:16
Start Date: 19/May/21 15:16
Worklog Time Spent: 10m 
  Work Description: belugabehr closed pull request #2291:
URL: https://github.com/apache/hive/pull/2291


   




Issue Time Tracking
---

Worklog Id: (was: 599300)
Time Spent: 0.5h  (was: 20m)

> Remove Thrift Exceptions From RawStore alterCatalog
> ---
>
> Key: HIVE-25128
> URL: https://issues.apache.org/jira/browse/HIVE-25128
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Commented] (HIVE-25010) Create TestIcebergCliDriver and TestIcebergNegativeCliDriver

2021-05-19 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347719#comment-17347719
 ] 

Stamatis Zampetakis commented on HIVE-25010:


I tried running the tests for this driver locally using the following command, 
which usually works fine with other CliDriver tests, but I got some weird 
failures, such as the one shown below.
{code:sh}
cd itests/qtest
mvn test -Dtest=TestIcebergCliDriver -Dtest.output.overwrite
{code}
The exception that I got was the following: 
{noformat}
org.apache.hive.iceberg.org.apache.iceberg.exceptions.NoSuchIcebergTableException
{noformat}

It turns out that in order to run the Iceberg tests, the project needs to be 
compiled with the iceberg profile enabled.

{code:sh}
mvn clean install -DskipTests -Pitests,iceberg
{code}

Failing to include the iceberg profile in the compilation can lead to further 
problems, since old versions of the iceberg module may be mixed with currently 
compiled SNAPSHOTs, making the problem harder to debug.

> Create TestIcebergCliDriver and TestIcebergNegativeCliDriver
> 
>
> Key: HIVE-25010
> URL: https://issues.apache.org/jira/browse/HIVE-25010
> Project: Hive
>  Issue Type: Test
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> We should create iceberg specific drivers to run iceberg qtests.





[jira] [Commented] (HIVE-24920) TRANSLATED_TO_EXTERNAL tables may write to the same location

2021-05-19 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347702#comment-17347702
 ] 

Zoltan Haindrich commented on HIVE-24920:
-

If we are about to do that, then an existing external table dir might cause some 
trouble:
{code}
create external table t (i integer);
-- this will create dir WH/t
insert into t values (1);
drop table t;
-- this will leave WH/t as is because it's a full external table without the 
purge option
create table t(i integer);
-- this will create a table at the same external location, which is now 
occupied... your current proposal doesn't handle this case
select * from t;
1
-- shows the inserted record from the previous table instance...
{code}

I don't think we should just accept the above behaviour: the user has used a 
statement which should have created a normal managed table (create table t), so 
it should be empty in any circumstances. If we want to do the same kind of 
renames for translated tables, we should still retain the "existing location 
dir" avoidance mechanisms of the existing patch, and make the variant which 
throws an exception if the location exists the default.

This could probably enable our users to choose the behaviour they would like to 
see.

[~thejas]: what do you think?

> TRANSLATED_TO_EXTERNAL tables may write to the same location
> 
>
> Key: HIVE-24920
> URL: https://issues.apache.org/jira/browse/HIVE-24920
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {code}
> create table t (a integer);
> insert into t values(1);
> alter table t rename to t2;
> create table t (a integer); -- I expected an exception from this command 
> (location already exists) but because its an external table no exception
> insert into t values(2);
> select * from t;  -- shows 1 and 2
> drop table t2;-- wipes out data location
> select * from t;  -- empty resultset
> {code}





[jira] [Work logged] (HIVE-25079) Create new metric about number of writes to tables with manually disabled compaction

2021-05-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25079?focusedWorklogId=599244&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599244
 ]

ASF GitHub Bot logged work on HIVE-25079:
-

Author: ASF GitHub Bot
Created on: 19/May/21 13:45
Start Date: 19/May/21 13:45
Worklog Time Spent: 10m 
  Work Description: klcopp commented on a change in pull request #2281:
URL: https://github.com/apache/hive/pull/2281#discussion_r635252137



##
File path: 
ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestCompactionMetrics.java
##
@@ -687,6 +689,26 @@ static boolean equivalent(Map<String, String> lhs, Map<String, String> rhs) {
 return value.isEmpty()? Collections.emptyMap() : 
Splitter.on(',').withKeyValueSeparator("->").split(value);
   }
 
+  @Test
+  public void textWritesToDisabledCompactionTable() throws Exception {

Review comment:
   nit: typo "text" -> "test"

##
File path: 
ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestCompactionMetrics.java
##
@@ -687,6 +689,26 @@ static boolean equivalent(Map<String, String> lhs, Map<String, String> rhs) {
 return value.isEmpty()? Collections.emptyMap() : 
Splitter.on(',').withKeyValueSeparator("->").split(value);
   }
 
+  @Test
+  public void textWritesToDisabledCompactionTable() throws Exception {
+MetastoreConf.setVar(conf, 
MetastoreConf.ConfVars.TRANSACTIONAL_EVENT_LISTENERS, 
"org.apache.hadoop.hive.metastore.HMSMetricsListener");

Review comment:
   nit: HMSMetricsListener.class.getName() would be nicer

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSMetricsListener.java
##
@@ -86,4 +92,24 @@ public void onAddPartition(AddPartitionEvent partitionEvent) 
throws MetaException
 
Metrics.getOrCreateGauge(MetricsConstants.TOTAL_PARTITIONS).incrementAndGet();
 createdParts.inc();
   }
+
+  @Override
+  public void onAllocWriteId(AllocWriteIdEvent allocWriteIdEvent, Connection 
dbConn, SQLGenerator sqlGenerator) throws MetaException {
+Table table = getTable(allocWriteIdEvent);

Review comment:
   Before doing anything we should first check if metrics are enabled 






Issue Time Tracking
---

Worklog Id: (was: 599244)
Time Spent: 0.5h  (was: 20m)

> Create new metric about number of writes to tables with manually disabled 
> compaction
> 
>
> Key: HIVE-25079
> URL: https://issues.apache.org/jira/browse/HIVE-25079
> Project: Hive
>  Issue Type: Bug
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Create a new metric that measures the number of writes to tables that have 
> compaction turned off manually. It does not matter whether the write is 
> committed or aborted (both are bad...)





[jira] [Updated] (HIVE-25139) Filter out null table properties in HiveIcebergMetaHook

2021-05-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25139:
--
Labels: pull-request-available  (was: )

> Filter out null table properties in HiveIcebergMetaHook
> ---
>
> Key: HIVE-25139
> URL: https://issues.apache.org/jira/browse/HIVE-25139
> Project: Hive
>  Issue Type: Bug
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-25139) Filter out null table properties in HiveIcebergMetaHook

2021-05-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25139?focusedWorklogId=599237&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599237
 ]

ASF GitHub Bot logged work on HIVE-25139:
-

Author: ASF GitHub Bot
Created on: 19/May/21 13:36
Start Date: 19/May/21 13:36
Worklog Time Spent: 10m 
  Work Description: lcspinter opened a new pull request #2298:
URL: https://github.com/apache/hive/pull/2298


   
   
   




Issue Time Tracking
---

Worklog Id: (was: 599237)
Remaining Estimate: 0h
Time Spent: 10m

> Filter out null table properties in HiveIcebergMetaHook
> ---
>
> Key: HIVE-25139
> URL: https://issues.apache.org/jira/browse/HIVE-25139
> Project: Hive
>  Issue Type: Bug
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (HIVE-24665) Add commitAlterTable method to the HiveMetaHook interface

2021-05-19 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Pintér resolved HIVE-24665.
--
Resolution: Duplicate

> Add commitAlterTable method to the HiveMetaHook interface
> -
>
> Key: HIVE-24665
> URL: https://issues.apache.org/jira/browse/HIVE-24665
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: László Pintér
>Priority: Major
>
> Currently we have pre and post hooks for create table and drop table 
> commands, but only a pre hook for alter table commands. We should add a post 
> hook as well (with a default implementation).





[jira] [Assigned] (HIVE-25139) Filter out null table properties in HiveIcebergMetaHook

2021-05-19 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Pintér reassigned HIVE-25139:



> Filter out null table properties in HiveIcebergMetaHook
> ---
>
> Key: HIVE-25139
> URL: https://issues.apache.org/jira/browse/HIVE-25139
> Project: Hive
>  Issue Type: Bug
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>






[jira] [Work logged] (HIVE-25080) Create metric about oldest entry in "ready for cleaning" state

2021-05-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25080?focusedWorklogId=599211&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599211
 ]

ASF GitHub Bot logged work on HIVE-25080:
-

Author: ASF GitHub Bot
Created on: 19/May/21 13:08
Start Date: 19/May/21 13:08
Worklog Time Spent: 10m 
  Work Description: klcopp commented on a change in pull request #2297:
URL: https://github.com/apache/hive/pull/2297#discussion_r635220587



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##
@@ -292,7 +292,10 @@
 TxnStatus.OPEN + "' AND \"TXN_TYPE\" != "+ 
TxnType.REPL_CREATED.getValue() +") \"T\" CROSS JOIN (" +
   "SELECT COUNT(*), MIN(\"TXN_ID\"), ({0} - MIN(\"TXN_STARTED\"))/1000 
FROM \"TXNS\" WHERE \"TXN_STATE\"='" +
 TxnStatus.ABORTED + "') \"A\" CROSS JOIN (" +
-  "SELECT COUNT(*), ({0} - MIN(\"HL_ACQUIRED_AT\"))/1000 FROM 
\"HIVE_LOCKS\") \"HL\"";
+  "SELECT COUNT(*), ({0} - MIN(\"HL_ACQUIRED_AT\"))/1000 FROM 
\"HIVE_LOCKS\") \"HL\" CROSS JOIN (" +
+  "SELECT ({0} - MIN(\"CQ_ENQUEUE_TIME\"))/1000 from \"COMPACTION_QUEUE\" 
WHERE " +

Review comment:
   I think CQ_ENQUEUE_TIME is the time that the compaction was put in 
"initiated" state. Either the CQ_ENQUEUE_TIME value should be updated when the 
compaction is put in "ready for cleaning", or we need a new column in the 
compaction queue.






Issue Time Tracking
---

Worklog Id: (was: 599211)
Time Spent: 20m  (was: 10m)

> Create metric about oldest entry in "ready for cleaning" state
> --
>
> Key: HIVE-25080
> URL: https://issues.apache.org/jira/browse/HIVE-25080
> Project: Hive
>  Issue Type: Bug
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When a compaction txn commits, COMPACTION_QUEUE.CQ_COMMIT_TIME is updated 
> with the current time. Then the compaction state is set to "ready for 
> cleaning". (... and then the Cleaner runs and the state is set to "succeeded" 
> hopefully)
> Based on this we know (roughly) how long a compaction has been in state 
> "ready for cleaning".
> We should create a metric similar to compaction_oldest_enqueue_age_in_sec 
> that would show whether the cleaner is blocked by something, i.e. find the 
> compaction in "ready for cleaning" that has the oldest commit time.





[jira] [Work logged] (HIVE-25109) CBO fails when updating table has constraints defined

2021-05-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25109?focusedWorklogId=599190&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599190
 ]

ASF GitHub Bot logged work on HIVE-25109:
-

Author: ASF GitHub Bot
Created on: 19/May/21 12:41
Start Date: 19/May/21 12:41
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #2268:
URL: https://github.com/apache/hive/pull/2268#discussion_r635199267



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
##
@@ -5031,7 +5038,7 @@ private RelNode genLogicalPlan(QB qb, boolean outerMostQB,
 
   // Build Rel for Constraint checks
   Pair constraintPair =
-  genConstraintFilterLogicalPlan(qb, srcRel, outerNameToPosMap, 
outerRR);
+  genConstraintFilterLogicalPlan(qb, selPair, outerNameToPosMap, 
outerRR);

Review comment:
   I went through the code where `selectRel` gets its value and found that it 
cannot be null: it is coming from `internalGenSelectLogicalPlan`, which can 
create it in the following ways
   ```
   outputRel = genUDTFPlan(genericUDTF, genericUDTFName, udtfTableAlias, 
udtfColAliases, qb,
   ...
   RelNode udtf = HiveTableFunctionScan.create(cluster, traitSet, list, 
rexNode, null, retType,
 null);
   outputRel = genSelectRelNode(columnList, outputRR, srcRel);
   ...
 HiveRelNode selRel = HiveProject.create(
   
   outputRel = new HiveAggregate(cluster, 
cluster.traitSetOf(HiveRelNode.CONVENTION),
   ```






Issue Time Tracking
---

Worklog Id: (was: 599190)
Time Spent: 40m  (was: 0.5h)

> CBO fails when updating table has constraints defined
> -
>
> Key: HIVE-25109
> URL: https://issues.apache.org/jira/browse/HIVE-25109
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Logical Optimizer
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {code}
> create table acid_uami_n0(i int,
>  de decimal(5,2) constraint nn1 not null enforced,
>  vc varchar(128) constraint ch2 CHECK (de >= cast(i as 
> decimal(5,2))) enforced)
>  clustered by (i) into 2 buckets stored as orc TBLPROPERTIES 
> ('transactional'='true');
> -- update
> explain cbo
> update acid_uami_n0 set de = 893.14 where de = 103.00;
> {code}
> hive.log
> {code}
> 2021-05-13T06:08:05,547 ERROR [061f4d3b-9cbd-464f-80db-f0cd443dc3d7 main] 
> parse.UpdateDeleteSemanticAnalyzer: CBO failed, skipping CBO. 
> org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: Result 
> Schema didn't match Optimized Op Tree Schema
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.PlanModifierForASTConv.renameTopLevelSelectInResultSchema(PlanModifierForASTConv.java:217)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.PlanModifierForASTConv.convertOpTree(PlanModifierForASTConv.java:105)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:119)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1410)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:572)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12488)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:449)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyzeInternal(RewriteSemanticAnalyzer.java:67)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.reparseAndSuperAnalyze(UpdateDeleteSemanticAnalyzer.java:208)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> 

[jira] [Work logged] (HIVE-25034) Implement CTAS for Iceberg

2021-05-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25034?focusedWorklogId=599188&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599188
 ]

ASF GitHub Bot logged work on HIVE-25034:
-

Author: ASF GitHub Bot
Created on: 19/May/21 12:35
Start Date: 19/May/21 12:35
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2243:
URL: https://github.com/apache/hive/pull/2243#discussion_r635195134



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##
@@ -7759,6 +7759,18 @@ protected Operator genFileSinkPlan(String dest, QB qb, 
Operator input)
   .getMsg(destinationPath.toUri().toString()));
 }
   }
+  // handle direct insert CTAS case
+  // for direct insert CTAS, the table creation DDL is not added to the 
task plan in TaskCompiler,
+  // therefore we need to add the InsertHook here manually so that 
HiveMetaHook#commitInsertTable is called

Review comment:
   Rephrased the comment






Issue Time Tracking
---

Worklog Id: (was: 599188)
Time Spent: 4h 20m  (was: 4h 10m)

> Implement CTAS for Iceberg
> --
>
> Key: HIVE-25034
> URL: https://issues.apache.org/jira/browse/HIVE-25034
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-25093) date_format() UDF is returning values in UTC time zone only

2021-05-19 Thread Ashish Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Sharma updated HIVE-25093:
-
Description: 
*HIVE - 1.2*

sshuser@hn0-dateti:~$ *timedatectl*

  Local time: Thu 2021-05-06 11:56:08 IST
  Universal time: Thu 2021-05-06 06:26:08 UTC
RTC time: Thu 2021-05-06 06:26:08
   Time zone: Asia/Kolkata (IST, +0530)
 Network time on: yes
NTP synchronized: yes
 RTC in local TZ: no

sshuser@hn0-dateti:~$ beeline
0: jdbc:hive2://localhost:10001/default> *select date_format(current_timestamp,"yyyy-MM-dd HH:mm:ss.SSS z");*
+--+--+
| _c0  |
+--+--+
| 2021-05-06 11:58:53.760 IST  |
+--+--+
1 row selected (1.271 seconds)


*HIVE - 3.1.0*

sshuser@hn0-testja:~$ *timedatectl*
  Local time: Thu 2021-05-06 12:03:32 IST
  Universal time: Thu 2021-05-06 06:33:32 UTC
RTC time: Thu 2021-05-06 06:33:32
   Time zone: Asia/Kolkata (IST, +0530)
 Network time on: yes
NTP synchronized: yes
 RTC in local TZ: no

sshuser@hn0-testja:~$ beeline
0: jdbc:hive2://zk0-testja.e0mrrixnyxde5h1suy> *select date_format(current_timestamp,"yyyy-MM-dd HH:mm:ss.SSS z");*
+--+
| _c0  |
+--+
| *2021-05-06 06:33:59.078 UTC*  |
+--+
1 row selected (13.396 seconds)

0: jdbc:hive2://zk0-testja.e0mrrixnyxde5h1suy> *set 
hive.local.time.zone=Asia/Kolkata;*
No rows affected (0.025 seconds)
0: jdbc:hive2://zk0-testja.e0mrrixnyxde5h1suy> *select date_format(current_timestamp,"yyyy-MM-dd HH:mm:ss.SSS z");*
+--+
| _c0  |
+--+
| *{color:red}2021-05-06 12:08:15.118 UTC{color}*  | 
+--+
1 row selected (1.074 seconds)

expected result was *2021-05-06 12:08:15.118 IST*


As part of HIVE-12192 it was decided to have a common time zone, "UTC", for all 
computation. As a result, the date_format() function was hard-coded to "UTC".

But later, in HIVE-21039, it was decided that the user session time zone value 
should be the default, not UTC.

date_format() was not fixed as part of HIVE-21039.

What should be the ideal time zone value for date_format()?
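The two behaviours can be reproduced outside Hive with plain java.time (a hypothetical illustration of hard-coded UTC vs. honouring a session zone; the class and method here are made up for this sketch and are not Hive's UDF code):

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class DateFormatZoneDemo {

    // Formats an instant with the given zone, mirroring the two strategies
    // discussed above: a hard-coded UTC zone vs. the session-local zone.
    static String format(Instant instant, ZoneId zone) {
        return DateTimeFormatter
                .ofPattern("yyyy-MM-dd HH:mm:ss.SSS z")
                .withZone(zone)
                .format(instant);
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2021-05-06T06:33:59.078Z");
        // Hard-coded UTC (the behaviour reported above):
        System.out.println(format(t, ZoneId.of("UTC")));          // 2021-05-06 06:33:59.078 UTC
        // Honouring hive.local.time.zone=Asia/Kolkata (the expected behaviour):
        System.out.println(format(t, ZoneId.of("Asia/Kolkata"))); // 2021-05-06 12:03:59.078 IST
    }
}
```

With the session zone applied, the same instant formats to the IST value the reporter expected.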

  was:
*HIVE - 1.2*

sshuser@hn0-dateti:~$ *timedatectl*

  Local time: Thu 2021-05-06 11:56:08 IST
  Universal time: Thu 2021-05-06 06:26:08 UTC
RTC time: Thu 2021-05-06 06:26:08
   Time zone: Asia/Kolkata (IST, +0530)
 Network time on: yes
NTP synchronized: yes
 RTC in local TZ: no

sshuser@hn0-dateti:~$ beeline
0: jdbc:hive2://localhost:10001/default> *select date_format(current_timestamp,"yyyy-MM-dd HH:mm:ss.SSS z");*
+--+--+
| _c0  |
+--+--+
| 2021-05-06 11:58:53.760 IST  |
+--+--+
1 row selected (1.271 seconds)


*HIVE - 3.1.0*

sshuser@hn0-testja:~$ *timedatectl*
  Local time: Thu 2021-05-06 12:03:32 IST
  Universal time: Thu 2021-05-06 06:33:32 UTC
RTC time: Thu 2021-05-06 06:33:32
   Time zone: Asia/Kolkata (IST, +0530)
 Network time on: yes
NTP synchronized: yes
 RTC in local TZ: no

sshuser@hn0-testja:~$ beeline
0: jdbc:hive2://zk0-testja.e0mrrixnyxde5h1suy> *select date_format(2021-05-06 12:03:32,"yyyy-MM-dd HH:mm:ss.SSS z");*
+--+
| _c0  |
+--+
| *2021-05-06 06:33:59.078 UTC*  |
+--+
1 row selected (13.396 seconds)

0: jdbc:hive2://zk0-testja.e0mrrixnyxde5h1suy> *set 
hive.local.time.zone=Asia/Kolkata;*
No rows affected (0.025 seconds)
0: jdbc:hive2://zk0-testja.e0mrrixnyxde5h1suy> *select date_format(current_timestamp,"yyyy-MM-dd HH:mm:ss.SSS z");*
+--+
| _c0  |
+--+
| *{color:red}2021-05-06 12:08:15.118 UTC{color}*  | 
+--+
1 row selected (1.074 seconds)

expected result was *2021-05-06 12:08:15.118 IST*


As part of HIVE-12192 it was decided to have a common time zone, "UTC", for all 
computation. As a result, the date_format() function was hard-coded to "UTC".

But later, in HIVE-21039, it was decided that the user session time zone value 
should be the default, not UTC.

date_format() was not fixed as part of HIVE-21039.

What should be the ideal time zone value for date_format()?


> date_format() UDF is returning values in UTC time zone only 
> 
>
> Key: HIVE-25093
> URL: https://issues.apache.org/jira/browse/HIVE-25093
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 3.1.2
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>  

[jira] [Work logged] (HIVE-25109) CBO fails when updating table has constraints defined

2021-05-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25109?focusedWorklogId=599186&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599186
 ]

ASF GitHub Bot logged work on HIVE-25109:
-

Author: ASF GitHub Bot
Created on: 19/May/21 12:29
Start Date: 19/May/21 12:29
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #2268:
URL: https://github.com/apache/hive/pull/2268#discussion_r635190684



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
##
@@ -3475,15 +3475,22 @@ private RelNode genFilterLogicalPlan(QB qb, RelNode 
srcRel, ImmutableMap(constraintRel, inputRR);
+  RelNode constraintRel = genFilterRelNode(constraintUDF, selPair.left, 
outerNameToPosMap, outerRR);
+
+  List originalInputRefs = toRexNodeList(selPair.left);
+  List selectedRefs = Lists.newArrayList();
+  for (int index = 0; index < selPair.right.getColumnInfos().size(); 
index++) {
+selectedRefs.add(originalInputRefs.get(index));
+  }

Review comment:
   The Project may contain columns which are not in the top Project and are 
not present in the row schema. However, these columns may be referenced in 
constraint filter expressions or in sort and order by keys.
   I found that at the end of Project generation all columns coming from the 
input RowResolver of the Project are added to the output RowResolver:
   
https://github.com/apache/hive/blob/d0d3f0aa50fa7b50ec74cae0dda0b93271799313/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L4738
   
   Since these are added to the end of the list, the selected ones should be a 
prefix of the full list.
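Schematically, that prefix property is what makes taking the leading input refs safe (a simplified sketch using placeholder strings; `PrefixRefsSketch` is illustrative and not a Calcite or Hive API):

```java
import java.util.List;

public class PrefixRefsSketch {

    // Returns the prefix of the input refs corresponding to the columns
    // that are actually present in the top Project's row schema.
    static List<String> selectedRefs(List<String> originalInputRefs, int selectedColumnCount) {
        return originalInputRefs.subList(0, selectedColumnCount);
    }

    public static void main(String[] args) {
        // All columns coming from the input RowResolver; the extra,
        // non-selected ones are appended at the end of the list.
        List<String> originalInputRefs = List.of("$0", "$1", "$2", "$3");
        System.out.println(selectedRefs(originalInputRefs, 2)); // [$0, $1]
    }
}
```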






Issue Time Tracking
---

Worklog Id: (was: 599186)
Time Spent: 0.5h  (was: 20m)

> CBO fails when updating table has constraints defined
> -
>
> Key: HIVE-25109
> URL: https://issues.apache.org/jira/browse/HIVE-25109
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Logical Optimizer
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {code}
> create table acid_uami_n0(i int,
>  de decimal(5,2) constraint nn1 not null enforced,
>  vc varchar(128) constraint ch2 CHECK (de >= cast(i as 
> decimal(5,2))) enforced)
>  clustered by (i) into 2 buckets stored as orc TBLPROPERTIES 
> ('transactional'='true');
> -- update
> explain cbo
> update acid_uami_n0 set de = 893.14 where de = 103.00;
> {code}
> hive.log
> {code}
> 2021-05-13T06:08:05,547 ERROR [061f4d3b-9cbd-464f-80db-f0cd443dc3d7 main] 
> parse.UpdateDeleteSemanticAnalyzer: CBO failed, skipping CBO. 
> org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: Result 
> Schema didn't match Optimized Op Tree Schema
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.PlanModifierForASTConv.renameTopLevelSelectInResultSchema(PlanModifierForASTConv.java:217)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.PlanModifierForASTConv.convertOpTree(PlanModifierForASTConv.java:105)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:119)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1410)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:572)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12488)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:449)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyzeInternal(RewriteSemanticAnalyzer.java:67)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> 

[jira] [Updated] (HIVE-25093) date_format() UDF is returning values in UTC time zone only

2021-05-19 Thread Ashish Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Sharma updated HIVE-25093:
-
Description: 
*HIVE - 1.2*

sshuser@hn0-dateti:~$ *timedatectl*

  Local time: Thu 2021-05-06 11:56:08 IST
  Universal time: Thu 2021-05-06 06:26:08 UTC
RTC time: Thu 2021-05-06 06:26:08
   Time zone: Asia/Kolkata (IST, +0530)
 Network time on: yes
NTP synchronized: yes
 RTC in local TZ: no

sshuser@hn0-dateti:~$ beeline
0: jdbc:hive2://localhost:10001/default> *select date_format(current_timestamp,"yyyy-MM-dd HH:mm:ss.SSS z");*
+------------------------------+--+
|             _c0              |
+------------------------------+--+
| 2021-05-06 11:58:53.760 IST  |
+------------------------------+--+
1 row selected (1.271 seconds)


*HIVE - 3.1.0*

sshuser@hn0-testja:~$ *timedatectl*
  Local time: Thu 2021-05-06 12:03:32 IST
  Universal time: Thu 2021-05-06 06:33:32 UTC
RTC time: Thu 2021-05-06 06:33:32
   Time zone: Asia/Kolkata (IST, +0530)
 Network time on: yes
NTP synchronized: yes
 RTC in local TZ: no

sshuser@hn0-testja:~$ beeline
0: jdbc:hive2://zk0-testja.e0mrrixnyxde5h1suy> *select date_format('2021-05-06 12:03:32',"yyyy-MM-dd HH:mm:ss.SSS z");*
+------------------------------+
|             _c0              |
+------------------------------+
| *2021-05-06 06:33:59.078 UTC*  |
+------------------------------+
1 row selected (13.396 seconds)

0: jdbc:hive2://zk0-testja.e0mrrixnyxde5h1suy> *set 
hive.local.time.zone=Asia/Kolkata;*
No rows affected (0.025 seconds)
0: jdbc:hive2://zk0-testja.e0mrrixnyxde5h1suy> *select date_format(current_timestamp,"yyyy-MM-dd HH:mm:ss.SSS z");*
+------------------------------+
|             _c0              |
+------------------------------+
| *{color:red}2021-05-06 12:08:15.118 UTC{color}*  |
+------------------------------+
1 row selected (1.074 seconds)

expected result was *2021-05-06 12:08:15.118 IST*


As part of HIVE-12192 it was decided to use a common time zone, UTC, for all 
computation. As a result, the date_format() function was hard-coded to "UTC".

But later, in HIVE-21039, it was decided that the user's session time zone, 
not UTC, should be the default.

date_format() was not fixed as part of HIVE-21039.

So what should be the ideal time zone for date_format()?

  was:
*HIVE - 1.2*

sshuser@hn0-dateti:~$ *timedatectl*

  Local time: Thu 2021-05-06 11:56:08 IST
  Universal time: Thu 2021-05-06 06:26:08 UTC
RTC time: Thu 2021-05-06 06:26:08
   Time zone: Asia/Kolkata (IST, +0530)
 Network time on: yes
NTP synchronized: yes
 RTC in local TZ: no

sshuser@hn0-dateti:~$ beeline
0: jdbc:hive2://localhost:10001/default> *select date_format(current_timestamp,"yyyy-MM-dd HH:mm:ss.SSS z");*
+------------------------------+--+
|             _c0              |
+------------------------------+--+
| 2021-05-06 11:58:53.760 IST  |
+------------------------------+--+
1 row selected (1.271 seconds)


*HIVE - 3.1.0*

sshuser@hn0-testja:~$ *timedatectl*
  Local time: Thu 2021-05-06 12:03:32 IST
  Universal time: Thu 2021-05-06 06:33:32 UTC
RTC time: Thu 2021-05-06 06:33:32
   Time zone: Asia/Kolkata (IST, +0530)
 Network time on: yes
NTP synchronized: yes
 RTC in local TZ: no

sshuser@hn0-testja:~$ beeline
0: jdbc:hive2://zk0-testja.e0mrrixnyxde5h1suy> *select date_format(current_timestamp,"yyyy-MM-dd HH:mm:ss.SSS z");*
+------------------------------+
|             _c0              |
+------------------------------+
| *2021-05-06 06:33:59.078 UTC*  |
+------------------------------+
1 row selected (13.396 seconds)

0: jdbc:hive2://zk0-testja.e0mrrixnyxde5h1suy> *set 
hive.local.time.zone=Asia/Kolkata;*
No rows affected (0.025 seconds)
0: jdbc:hive2://zk0-testja.e0mrrixnyxde5h1suy> *select date_format(current_timestamp,"yyyy-MM-dd HH:mm:ss.SSS z");*
+------------------------------+
|             _c0              |
+------------------------------+
| *{color:red}2021-05-06 12:08:15.118 UTC{color}*  |
+------------------------------+
1 row selected (1.074 seconds)

expected result was *2021-05-06 12:08:15.118 IST*


As part of HIVE-12192 it was decided to use a common time zone, UTC, for all 
computation. As a result, the date_format() function was hard-coded to "UTC".

But later, in HIVE-21039, it was decided that the user's session time zone, 
not UTC, should be the default.

date_format() was not fixed as part of HIVE-21039.

So what should be the ideal time zone for date_format()?
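To illustrate the behaviour under discussion, here is a minimal java.time sketch (independent of Hive's actual UDF code) showing how formatting the same instant with a hard-coded UTC zone versus the session time zone changes the output. The instant and zone ids are taken from the queries above; the locale-dependent `z` zone-name field is omitted so the output is deterministic:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class TimeZoneFormatDemo {

    // Same pattern as in the queries above, minus the locale-dependent "z" field.
    static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    // Format a fixed instant in the given zone -- the choice of zone is the
    // whole difference between the HIVE-12192 and HIVE-21039 behaviours.
    static String format(Instant instant, String zoneId) {
        return FMT.withZone(ZoneId.of(zoneId)).format(instant);
    }

    public static void main(String[] args) {
        Instant instant = Instant.parse("2021-05-06T06:38:15.118Z");
        // Hard-coded UTC, as date_format() currently behaves:
        System.out.println(format(instant, "UTC"));          // 2021-05-06 06:38:15.118
        // Session time zone (hive.local.time.zone=Asia/Kolkata), as users expect:
        System.out.println(format(instant, "Asia/Kolkata")); // 2021-05-06 12:08:15.118
    }
}
```

In these terms, the fix discussed here amounts to date_format() resolving its zone from the session setting instead of pinning it to UTC.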


> date_format() UDF is returning values in UTC time zone only 
> 
>
> Key: HIVE-25093
> URL: https://issues.apache.org/jira/browse/HIVE-25093
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 3.1.2
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>  

[jira] [Work logged] (HIVE-25034) Implement CTAS for Iceberg

2021-05-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25034?focusedWorklogId=599170&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599170
 ]

ASF GitHub Bot logged work on HIVE-25034:
-

Author: ASF GitHub Bot
Created on: 19/May/21 12:02
Start Date: 19/May/21 12:02
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2243:
URL: https://github.com/apache/hive/pull/2243#discussion_r635171207



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -514,6 +519,75 @@ public void testInsertOverwritePartitionedTable() throws 
IOException {
 HiveIcebergTestUtils.validateData(table, expected, 0);
   }
 
+  @Test
+  public void testCTASFromHiveTable() {
+Assume.assumeTrue("CTAS target table is supported only for HiveCatalog 
tables",
+testTableType == TestTables.TestTableType.HIVE_CATALOG);

Review comment:
   As discussed, since `Catalogs.createTable()` does not register the 
table in HMS for non-HiveCatalogs, any subsequent SELECTs on the target table 
wouldn't work. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599170)
Time Spent: 4h 10m  (was: 4h)

> Implement CTAS for Iceberg
> --
>
> Key: HIVE-25034
> URL: https://issues.apache.org/jira/browse/HIVE-25034
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25034) Implement CTAS for Iceberg

2021-05-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25034?focusedWorklogId=599164&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599164
 ]

ASF GitHub Bot logged work on HIVE-25034:
-

Author: ASF GitHub Bot
Created on: 19/May/21 11:55
Start Date: 19/May/21 11:55
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2243:
URL: https://github.com/apache/hive/pull/2243#discussion_r635167120



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergCTASHook.java
##
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.util.Properties;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHook;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHookContext;
+import org.apache.iceberg.mr.Catalogs;
+import org.apache.iceberg.mr.InputFormatConfig;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class HiveIcebergCTASHook implements QueryLifeTimeHook {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(HiveIcebergCTASHook.class);
+
+  @Override
+  public void beforeCompile(QueryLifeTimeHookContext ctx) {
+
+  }
+
+  @Override
+  public void afterCompile(QueryLifeTimeHookContext ctx, boolean hasError) {
+if (hasError) {
+  checkAndRollbackIcebergCTAS(ctx);
+}
+  }
+
+  @Override
+  public void beforeExecution(QueryLifeTimeHookContext ctx) {
+
+  }
+
+  @Override
+  public void afterExecution(QueryLifeTimeHookContext ctx, boolean hasError) {
+if (hasError) {
+  checkAndRollbackIcebergCTAS(ctx);
+}
+  }
+
+  private void checkAndRollbackIcebergCTAS(QueryLifeTimeHookContext ctx) {
+HiveConf conf = ctx.getHiveConf();
+String queryId = conf.getVar(HiveConf.ConfVars.HIVEQUERYID);
+if 
(conf.getBoolean(String.format(InputFormatConfig.IS_CTAS_QUERY_TEMPLATE, 
queryId), false)) {
+  try {
+String tableName = 
conf.get(String.format(InputFormatConfig.CTAS_TABLE_NAME_TEMPLATE, queryId));
+LOG.info("Dropping the following CTAS target table as part of 
rollback: {}", tableName);
+Properties props = new Properties();
+props.put(Catalogs.NAME, tableName);
+Catalogs.dropTable(conf, props);

Review comment:
   Good point. As discussed, the table properties of the target table 
should contain the catalog_name (and the corresponding fields such as type), so 
we should drop the table from the correct catalog.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599164)
Time Spent: 4h  (was: 3h 50m)

> Implement CTAS for Iceberg
> --
>
> Key: HIVE-25034
> URL: https://issues.apache.org/jira/browse/HIVE-25034
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25034) Implement CTAS for Iceberg

2021-05-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25034?focusedWorklogId=599163&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599163
 ]

ASF GitHub Bot logged work on HIVE-25034:
-

Author: ASF GitHub Bot
Created on: 19/May/21 11:54
Start Date: 19/May/21 11:54
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2243:
URL: https://github.com/apache/hive/pull/2243#discussion_r635166430



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -514,6 +519,75 @@ public void testInsertOverwritePartitionedTable() throws 
IOException {
 HiveIcebergTestUtils.validateData(table, expected, 0);
   }
 
+  @Test
+  public void testCTASFromHiveTable() {
+Assume.assumeTrue("CTAS target table is supported only for HiveCatalog 
tables",
+testTableType == TestTables.TestTableType.HIVE_CATALOG);
+
+shell.executeStatement("CREATE TABLE source (id bigint, name string) 
PARTITIONED BY (dept string) STORED AS ORC");
+shell.executeStatement("INSERT INTO source VALUES (1, 'Mike', 'HR'), (2, 
'Linda', 'Finance')");
+
+shell.executeStatement(String.format(
+"CREATE TABLE target STORED BY '%s' %s TBLPROPERTIES ('%s'='%s') AS 
SELECT * FROM source",
+HiveIcebergStorageHandler.class.getName(),
+testTables.locationForCreateTableSQL(TableIdentifier.of("default", 
"target")),
+TableProperties.DEFAULT_FILE_FORMAT, fileFormat));
+
+List<Object[]> objects = shell.executeStatement("SELECT * FROM target ORDER BY id");
+Assert.assertEquals(2, objects.size());
+Assert.assertArrayEquals(new Object[]{1L, "Mike", "HR"}, objects.get(0));
+Assert.assertArrayEquals(new Object[]{2L, "Linda", "Finance"}, 
objects.get(1));
+  }
+
+  @Test
+  public void testCTASFromDifferentIcebergCatalog() {
+Assume.assumeTrue("CTAS target table is supported only for HiveCatalog 
tables",
+testTableType == TestTables.TestTableType.HIVE_CATALOG);
+
+// get source data from a different catalog
+shell.executeStatement(String.format(
+"CREATE TABLE source STORED BY '%s' LOCATION '%s' TBLPROPERTIES 
('%s'='%s', '%s'='%s')",
+HiveIcebergStorageHandler.class.getName(),
+temp.getRoot().getPath() + "/default/source/",
+InputFormatConfig.CATALOG_NAME, Catalogs.ICEBERG_HADOOP_TABLE_NAME,
+InputFormatConfig.TABLE_SCHEMA, 
SchemaParser.toJson(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA)));
+shell.executeStatement("INSERT INTO source VALUES (1, 'Mike', 'Roger'), 
(2, 'Linda', 'Albright')");
+
+// CTAS into a new HiveCatalog table
+shell.executeStatement(String.format(
+"CREATE TABLE target STORED BY '%s' TBLPROPERTIES ('%s'='%s') AS 
SELECT * FROM source",
+HiveIcebergStorageHandler.class.getName(),
+TableProperties.DEFAULT_FILE_FORMAT, fileFormat));
+
+List<Object[]> objects = shell.executeStatement("SELECT * FROM target ORDER BY customer_id");
+Assert.assertEquals(2, objects.size());
+Assert.assertArrayEquals(new Object[]{1L, "Mike", "Roger"}, 
objects.get(0));
+Assert.assertArrayEquals(new Object[]{2L, "Linda", "Albright"}, 
objects.get(1));
+  }
+
+  @Test
+  public void testCTASFailureRollback() throws IOException {

Review comment:
   Sure!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599163)
Time Spent: 3h 50m  (was: 3h 40m)

> Implement CTAS for Iceberg
> --
>
> Key: HIVE-25034
> URL: https://issues.apache.org/jira/browse/HIVE-25034
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25109) CBO fails when updating table has constraints defined

2021-05-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25109?focusedWorklogId=599138&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599138
 ]

ASF GitHub Bot logged work on HIVE-25109:
-

Author: ASF GitHub Bot
Created on: 19/May/21 10:56
Start Date: 19/May/21 10:56
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #2268:
URL: https://github.com/apache/hive/pull/2268#discussion_r635129376



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
##
@@ -5031,7 +5038,7 @@ private RelNode genLogicalPlan(QB qb, boolean outerMostQB,
 
   // Build Rel for Constraint checks
   Pair constraintPair =
-  genConstraintFilterLogicalPlan(qb, srcRel, outerNameToPosMap, 
outerRR);
+  genConstraintFilterLogicalPlan(qb, selPair, outerNameToPosMap, 
outerRR);

Review comment:
   will this work okay when `selectRel == null`?
   previous code was passing `srcRel` which is optionally the previous `srcRel`

##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
##
@@ -3475,15 +3475,22 @@ private RelNode genFilterLogicalPlan(QB qb, RelNode 
srcRel, ImmutableMap(constraintRel, inputRR);
+  RelNode constraintRel = genFilterRelNode(constraintUDF, selPair.left, 
outerNameToPosMap, outerRR);
+
+  List<RexNode> originalInputRefs = toRexNodeList(selPair.left);
+  List<RexNode> selectedRefs = Lists.newArrayList();
+  for (int index = 0; index < selPair.right.getColumnInfos().size(); 
index++) {
+selectedRefs.add(originalInputRefs.get(index));
+  }

Review comment:
   I'm not sure about this; this block could be replaced with something like
   ```
   selectedRefs.addAll(originalInputRefs.subList(0, selPair.right.getColumnInfos().size()))
   ```
   which looks odd to me because it would mean that the `selected` ones may 
only be a prefix of the original ones - is that true in every case?
   shouldn't this code be checking the `ref` of the `RexInputRefs`?
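   For illustration, a self-contained sketch of the equivalence raised here -- the patch's index loop collects a prefix, which `subList` expresses directly (plain strings stand in for the `RexNode` refs; names mirror the diff but this is not Hive code):

```java
import java.util.ArrayList;
import java.util.List;

public class PrefixDemo {

    // Loop form from the patch: copy the first `selectedCount` refs.
    static List<String> prefixByLoop(List<String> originalInputRefs, int selectedCount) {
        List<String> selectedRefs = new ArrayList<>();
        for (int index = 0; index < selectedCount; index++) {
            selectedRefs.add(originalInputRefs.get(index));
        }
        return selectedRefs;
    }

    // subList form suggested in the review: the same prefix, expressed directly.
    static List<String> prefixBySubList(List<String> originalInputRefs, int selectedCount) {
        return new ArrayList<>(originalInputRefs.subList(0, selectedCount));
    }

    public static void main(String[] args) {
        List<String> refs = List.of("a", "b", "c", "d");
        // Both forms yield the same prefix.
        System.out.println(prefixByLoop(refs, 2).equals(prefixBySubList(refs, 2))); // true
    }
}
```

   Whether a prefix is always the right selection is exactly the open question in the review; the sketch only shows the two forms are interchangeable.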




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599138)
Time Spent: 20m  (was: 10m)

> CBO fails when updating table has constraints defined
> -
>
> Key: HIVE-25109
> URL: https://issues.apache.org/jira/browse/HIVE-25109
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Logical Optimizer
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code}
> create table acid_uami_n0(i int,
>  de decimal(5,2) constraint nn1 not null enforced,
>  vc varchar(128) constraint ch2 CHECK (de >= cast(i as 
> decimal(5,2))) enforced)
>  clustered by (i) into 2 buckets stored as orc TBLPROPERTIES 
> ('transactional'='true');
> -- update
> explain cbo
> update acid_uami_n0 set de = 893.14 where de = 103.00;
> {code}
> hive.log
> {code}
> 2021-05-13T06:08:05,547 ERROR [061f4d3b-9cbd-464f-80db-f0cd443dc3d7 main] 
> parse.UpdateDeleteSemanticAnalyzer: CBO failed, skipping CBO. 
> org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: Result 
> Schema didn't match Optimized Op Tree Schema
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.PlanModifierForASTConv.renameTopLevelSelectInResultSchema(PlanModifierForASTConv.java:217)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.PlanModifierForASTConv.convertOpTree(PlanModifierForASTConv.java:105)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:119)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1410)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:572)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12488)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:449)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> 

[jira] [Work logged] (HIVE-25117) Vector PTF ClassCastException with Decimal64

2021-05-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25117?focusedWorklogId=599123&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599123
 ]

ASF GitHub Bot logged work on HIVE-25117:
-

Author: ASF GitHub Bot
Created on: 19/May/21 10:28
Start Date: 19/May/21 10:28
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #2286:
URL: https://github.com/apache/hive/pull/2286#discussion_r635106215



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java
##
@@ -4962,9 +4969,8 @@ private static void createVectorPTFDesc(Operator ptfOp,
 evaluatorWindowFrameDefs,
 evaluatorInputExprNodeDescLists);
 
-TypeInfo[] reducerBatchTypeInfos = vContext.getAllTypeInfos();
-
 vectorPTFDesc.setReducerBatchTypeInfos(reducerBatchTypeInfos);
+
vectorPTFDesc.setReducerBatchDataTypePhysicalVariations(reducerBatchDataTypePhysicalVariations);

Review comment:
   Shall we have setReducerBatchTypeInfos(Types, TypeVariations) instead of 
creating a separate method?
   Seems like good practice to make sure we pass TypeVariations along with 
Types.

##
File path: ql/src/java/org/apache/hadoop/hive/ql/plan/VectorPTFDesc.java
##
@@ -487,10 +495,18 @@ public void setOutputColumnNames(String[] 
outputColumnNames) {
 return outputTypeInfos;
   }
 
+  public DataTypePhysicalVariation[] getOutputDataTypePhysicalVariations() {
+return outputDataTypePhysicalVariations;
+  }
+
   public void setOutputTypeInfos(TypeInfo[] outputTypeInfos) {
 this.outputTypeInfos = outputTypeInfos;
   }
 
+  public void setOutputDataTypePhysicalVariations(DataTypePhysicalVariation[] 
outputDataTypePhysicalVariations) {

Review comment:
   As per comment above this can be simplified

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java
##
@@ -4978,6 +4984,7 @@ private static void createVectorPTFDesc(Operator ptfOp,
 
 vectorPTFDesc.setOutputColumnNames(outputColumnNames);
 vectorPTFDesc.setOutputTypeInfos(outputTypeInfos);
+
vectorPTFDesc.setOutputDataTypePhysicalVariations(outputDataTypePhysicalVariations);

Review comment:
   Same comment as above for outputTypes

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ptf/VectorPTFOperator.java
##
@@ -250,13 +253,16 @@ protected VectorizedRowBatch setupOverflowBatch() throws 
HiveException {
 for (int i = 0; i < outputProjectionColumnMap.length; i++) {
   int outputColumn = outputProjectionColumnMap[i];
   String typeName = outputTypeInfos[i].getTypeName();
-  allocateOverflowBatchColumnVector(overflowBatch, outputColumn, typeName);
+  allocateOverflowBatchColumnVector(overflowBatch, outputColumn, typeName, 
outputDataTypePhysicalVariations[i]);
 }
 
 // Now, add any scratch columns needed for children operators.
 int outputColumn = initialColumnCount;
+DataTypePhysicalVariation[] dataTypePhysicalVariations = 
vOutContext.getScratchDataTypePhysicalVariations();
 for (String typeName : vOutContext.getScratchColumnTypeNames()) {
-  allocateOverflowBatchColumnVector(overflowBatch, outputColumn++, 
typeName);
+  allocateOverflowBatchColumnVector(overflowBatch, outputColumn, typeName,
+  dataTypePhysicalVariations[outputColumn-initialColumnCount]);

Review comment:
   I would expect the ScratchCol Types not to include the initial 
outputCols Types. Why do we need the outputColumn-initialColumnCount here?

##
File path: ql/src/java/org/apache/hadoop/hive/ql/plan/VectorPTFDesc.java
##
@@ -419,6 +422,11 @@ public void setReducerBatchTypeInfos(TypeInfo[] 
reducerBatchTypeInfos) {
 this.reducerBatchTypeInfos = reducerBatchTypeInfos;
   }
 
+  public void setReducerBatchDataTypePhysicalVariations(

Review comment:
   As per comment above this can be simplified
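   The suggestion above -- one setter taking both the type array and its physical-variation array so the two cannot drift apart -- could look roughly like this (an illustrative sketch with simplified `String` element types, not Hive's actual `VectorPTFDesc` API):

```java
public class BatchTypeDesc {

    private String[] typeInfos;
    private String[] physicalVariations;

    // One setter takes both parallel arrays, so a caller cannot update the
    // types without also supplying the matching physical variations.
    public void setReducerBatchTypes(String[] typeInfos, String[] physicalVariations) {
        if (typeInfos.length != physicalVariations.length) {
            throw new IllegalArgumentException("type and variation arrays must be parallel");
        }
        this.typeInfos = typeInfos.clone();
        this.physicalVariations = physicalVariations.clone();
    }

    public String[] getTypeInfos() {
        return typeInfos;
    }

    public String[] getPhysicalVariations() {
        return physicalVariations;
    }
}
```

   The length check is the point of the combined setter: a mismatched pair fails fast at the call site instead of surfacing later as a ClassCastException in the operator.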




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599123)
Time Spent: 20m  (was: 10m)

> Vector PTF ClassCastException with Decimal64
> 
>
> Key: HIVE-25117
> URL: https://issues.apache.org/jira/browse/HIVE-25117
> Project: Hive
>  Issue Type: Bug
>Reporter: Panagiotis Garefalakis
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
> Attachments: vector_ptf_classcast_exception.q
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Only reproduces when there is at least 1 

[jira] [Assigned] (HIVE-25138) Auto disable scheduled queries after repeated failures

2021-05-19 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-25138:
---


> Auto disable scheduled queries after repeated failures
> --
>
> Key: HIVE-25138
> URL: https://issues.apache.org/jira/browse/HIVE-25138
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24936) Fix file name parsing and copy file move.

2021-05-19 Thread Harish JP (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish JP updated HIVE-24936:
-
Description: 
The taskId and taskAttemptId are not extracted correctly for copy files 
(1_02_copy_3), and when moving an incompatible copy file, the rename 
utility generates wrong file names. Ex: 1_02_copy_3 is renamed to 
1_02_copy_3_1 if 1_02_copy_3 already exists; ideally it should be 
1_02_copy_N.

 

Incompatible files should always be renamed using the current task id; otherwise 
they can get deleted if the file name conflicts with another task's output file. 
Ex: if the input file name for a task is 5_01 and is incompatible, then if we 
move this file, it will be treated as an output file for task id 5, attempt 1. If 
that attempt exists, it will try to generate the same file, fail, and another 
attempt will be made. There will then be 2 files, 5_01 and 5_02; the deduping 
code will remove 5_01, resulting in data loss. There are other scenarios where 
the same can happen.

  was:The taskId and taskAttemptId are not extracted correctly for copy files 
(1_02_copy_3), and when moving an incompatible copy file, the rename 
utility generates wrong file names. Ex: 1_02_copy_3 is renamed to 
1_02_copy_3_1 if 1_02_copy_3 already exists; ideally it should be 
1_02_copy_N.
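As a sketch of the parsing and renaming behaviour the fix aims for (the name pattern and helper names here are assumptions based on the convention described in this ticket, not Hive's actual Utilities code):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CopyFileName {

    // Assumed shape of the names from this ticket: <taskId>_<attemptId>[_copy_<n>].
    private static final Pattern NAME =
            Pattern.compile("(\\d+)_(\\d+)(?:_copy_(\\d+))?");

    private static Matcher match(String name) {
        Matcher m = NAME.matcher(name);
        if (!m.matches()) {
            throw new IllegalArgumentException("unexpected file name: " + name);
        }
        return m;
    }

    // taskId/attemptId must come from the leading groups; the copy suffix is ignored.
    public static int taskId(String name) {
        return Integer.parseInt(match(name).group(1));
    }

    public static int attemptId(String name) {
        return Integer.parseInt(match(name).group(2));
    }

    // On a name clash, bump the copy counter instead of appending "_1":
    // 1_02_copy_3 becomes 1_02_copy_4, never 1_02_copy_3_1.
    public static String nextCopyName(String name) {
        Matcher m = match(name);
        int copy = m.group(3) == null ? 0 : Integer.parseInt(m.group(3));
        return m.group(1) + "_" + m.group(2) + "_copy_" + (copy + 1);
    }
}
```

Parsing the copy suffix explicitly is what prevents a moved incompatible file from masquerading as another task's attempt output and being removed by the deduping code.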


> Fix file name parsing and copy file move.
> -
>
> Key: HIVE-24936
> URL: https://issues.apache.org/jira/browse/HIVE-24936
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Harish JP
>Assignee: Harish JP
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The taskId and taskAttemptId are not extracted correctly for copy files 
> (1_02_copy_3), and when moving an incompatible copy file, the 
> rename utility generates wrong file names. Ex: 1_02_copy_3 is renamed to 
> 1_02_copy_3_1 if 1_02_copy_3 already exists; ideally it should be 
> 1_02_copy_N.
>  
> Incompatible files should always be renamed using the current task id; 
> otherwise they can get deleted if the file name conflicts with another task's 
> output file. Ex: if the input file name for a task is 5_01 and is 
> incompatible, then if we move this file, it will be treated as an output file 
> for task id 5, attempt 1. If that attempt exists, it will try to generate the 
> same file, fail, and another attempt will be made. There will then be 2 files, 
> 5_01 and 5_02; the deduping code will remove 5_01, resulting in data 
> loss. There are other scenarios where the same can happen.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25034) Implement CTAS for Iceberg

2021-05-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25034?focusedWorklogId=599064&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599064
 ]

ASF GitHub Bot logged work on HIVE-25034:
-

Author: ASF GitHub Bot
Created on: 19/May/21 08:20
Start Date: 19/May/21 08:20
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2243:
URL: https://github.com/apache/hive/pull/2243#discussion_r635016099



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -514,6 +519,75 @@ public void testInsertOverwritePartitionedTable() throws 
IOException {
 HiveIcebergTestUtils.validateData(table, expected, 0);
   }
 
+  @Test
+  public void testCTASFromHiveTable() {
+Assume.assumeTrue("CTAS target table is supported only for HiveCatalog 
tables",
+testTableType == TestTables.TestTableType.HIVE_CATALOG);
+
+shell.executeStatement("CREATE TABLE source (id bigint, name string) 
PARTITIONED BY (dept string) STORED AS ORC");
+shell.executeStatement("INSERT INTO source VALUES (1, 'Mike', 'HR'), (2, 
'Linda', 'Finance')");
+
+shell.executeStatement(String.format(
+"CREATE TABLE target STORED BY '%s' %s TBLPROPERTIES ('%s'='%s') AS 
SELECT * FROM source",
+HiveIcebergStorageHandler.class.getName(),
+testTables.locationForCreateTableSQL(TableIdentifier.of("default", 
"target")),
+TableProperties.DEFAULT_FILE_FORMAT, fileFormat));
+
+List<Object[]> objects = shell.executeStatement("SELECT * FROM target ORDER BY id");
+Assert.assertEquals(2, objects.size());
+Assert.assertArrayEquals(new Object[]{1L, "Mike", "HR"}, objects.get(0));
+Assert.assertArrayEquals(new Object[]{2L, "Linda", "Finance"}, 
objects.get(1));
+  }
+
+  @Test
+  public void testCTASFromDifferentIcebergCatalog() {
+Assume.assumeTrue("CTAS target table is supported only for HiveCatalog 
tables",
+testTableType == TestTables.TestTableType.HIVE_CATALOG);
+
+// get source data from a different catalog
+shell.executeStatement(String.format(
+"CREATE TABLE source STORED BY '%s' LOCATION '%s' TBLPROPERTIES 
('%s'='%s', '%s'='%s')",
+HiveIcebergStorageHandler.class.getName(),
+temp.getRoot().getPath() + "/default/source/",
+InputFormatConfig.CATALOG_NAME, Catalogs.ICEBERG_HADOOP_TABLE_NAME,
+InputFormatConfig.TABLE_SCHEMA, 
SchemaParser.toJson(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA)));
+shell.executeStatement("INSERT INTO source VALUES (1, 'Mike', 'Roger'), 
(2, 'Linda', 'Albright')");
+
+// CTAS into a new HiveCatalog table
+shell.executeStatement(String.format(
+"CREATE TABLE target STORED BY '%s' TBLPROPERTIES ('%s'='%s') AS 
SELECT * FROM source",
+HiveIcebergStorageHandler.class.getName(),
+TableProperties.DEFAULT_FILE_FORMAT, fileFormat));
+
+List<Object[]> objects = shell.executeStatement("SELECT * FROM target ORDER BY customer_id");
+Assert.assertEquals(2, objects.size());
+Assert.assertArrayEquals(new Object[]{1L, "Mike", "Roger"}, 
objects.get(0));
+Assert.assertArrayEquals(new Object[]{2L, "Linda", "Albright"}, 
objects.get(1));
+  }
+
+  @Test
+  public void testCTASFailureRollback() throws IOException {

Review comment:
   Could you please add a test method that checks the rollback in case the 
source and dest tables are in different catalogs? 

##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergCTASHook.java
##
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.util.Properties;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHook;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHookContext;
+import org.apache.iceberg.mr.Catalogs;
+import org.apache.iceberg.mr.InputFormatConfig;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class HiveIcebergCTASHook implements QueryLifeTimeHook {
+

[jira] [Work logged] (HIVE-25034) Implement CTAS for Iceberg

2021-05-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25034?focusedWorklogId=599055&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599055
 ]

ASF GitHub Bot logged work on HIVE-25034:
-

Author: ASF GitHub Bot
Created on: 19/May/21 07:54
Start Date: 19/May/21 07:54
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2243:
URL: https://github.com/apache/hive/pull/2243#discussion_r634998109



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##
@@ -7759,6 +7759,18 @@ protected Operator genFileSinkPlan(String dest, QB qb, Operator input)
   .getMsg(destinationPath.toUri().toString()));
 }
   }
+  // handle direct insert CTAS case
+  // for direct insert CTAS, the table creation DDL is not added to the task plan in TaskCompiler,
+  // therefore we need to add the InsertHook here manually so that HiveMetaHook#commitInsertTable is called
+  if (qb.isCTAS() && tableDesc != null && tableDesc.getStorageHandler() != null) {
+try {
+  if (HiveUtils.getStorageHandler(conf, tableDesc.getStorageHandler()).directInsertCTAS()) {
+createPreInsertDesc(destinationTable, false);
+  }
+} catch (HiveException e) {

Review comment:
   Right. Now that I think about it, the main reason I swallowed the 
exception is that this is a general hive codepath, so I didn't want to screw up 
any normal hive ctas queries. But I think we can assume that even for normal 
hive ctas, if the table has a storage handler defined, it should be loadable. 

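The fail-fast behaviour the reviewers converge on here — propagating a handler-loading failure at analysis time instead of swallowing it — can be illustrated with a self-contained sketch. The class and exception names below are stand-ins (not Hive's actual `HiveUtils`/`SemanticException` code), and `org.example.MissingHandler` is a hypothetical class name:

```java
public class HandlerLoadDemo {

    // Stand-in for Hive's SemanticException: thrown to fail query analysis fast.
    static class AnalysisException extends Exception {
        AnalysisException(Throwable cause) { super(cause); }
    }

    // If a table declares a storage handler class, it must be loadable; a CTAS
    // into that table cannot succeed later anyway, so propagate the failure
    // instead of logging and continuing.
    static Class<?> loadHandler(String handlerClassName) throws AnalysisException {
        try {
            return Class.forName(handlerClassName);
        } catch (ClassNotFoundException e) {
            throw new AnalysisException(e); // fail at analysis time, not at commit time
        }
    }

    public static void main(String[] args) {
        try {
            loadHandler("org.example.MissingHandler"); // hypothetical, not on classpath
        } catch (AnalysisException e) {
            System.out.println("CTAS rejected: handler not on classpath");
        }
    }
}
```

The design point is only when the error surfaces: swallowing the exception defers the failure to commit time, while rethrowing rejects the query during semantic analysis.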



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 599055)
Time Spent: 3.5h  (was: 3h 20m)

> Implement CTAS for Iceberg
> --
>
> Key: HIVE-25034
> URL: https://issues.apache.org/jira/browse/HIVE-25034
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25122) Intermittent test failures in org.apache.hadoop.hive.cli.TestBeeLineDriver

2021-05-19 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347384#comment-17347384
 ] 

Peter Vary commented on HIVE-25122:
---

I would keep it open for a while, so if there is a recurrence of the issue others 
might find it.
If this was only a one-off issue then we can close it.

Or we can just run a flaky test check on this test, to see if it is failing. If 
not, then we can close.
http://ci.hive.apache.org/job/hive-flaky-check


> Intermittent test failures in org.apache.hadoop.hive.cli.TestBeeLineDriver
> --
>
> Key: HIVE-25122
> URL: https://issues.apache.org/jira/browse/HIVE-25122
> Project: Hive
>  Issue Type: Bug
>Reporter: Harish JP
>Priority: Minor
> Attachments: org.apache.hadoop.hive.cli.TestBeeLineDriver.txt
>
>
> Hive test is failing with error. The build link where it failed: 
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-2120/4/tests/]
> Error info: [^org.apache.hadoop.hive.cli.TestBeeLineDriver.txt]





[jira] [Comment Edited] (HIVE-25122) Intermittent test failures in org.apache.hadoop.hive.cli.TestBeeLineDriver

2021-05-19 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347384#comment-17347384
 ] 

Peter Vary edited comment on HIVE-25122 at 5/19/21, 7:53 AM:
-

We can just run a flaky test check on this test, to see if it is failing. If 
not, then I think we can close.
http://ci.hive.apache.org/job/hive-flaky-check



was (Author: pvary):
I would keep for a while, so if there is a reoccurrence of the issue others 
might find it.
If this was only a one off issue then we can close it.

Or we can just run a flaky test check on this test, to see if it is failing. If 
not, then we can close.
http://ci.hive.apache.org/job/hive-flaky-check







[jira] [Work logged] (HIVE-25034) Implement CTAS for Iceberg

2021-05-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25034?focusedWorklogId=599052&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599052
 ]

ASF GitHub Bot logged work on HIVE-25034:
-

Author: ASF GitHub Bot
Created on: 19/May/21 07:46
Start Date: 19/May/21 07:46
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2243:
URL: https://github.com/apache/hive/pull/2243#discussion_r634992707



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##
@@ -7759,6 +7759,18 @@ protected Operator genFileSinkPlan(String dest, QB qb, Operator input)
   .getMsg(destinationPath.toUri().toString()));
 }
   }
+  // handle direct insert CTAS case
+  // for direct insert CTAS, the table creation DDL is not added to the task plan in TaskCompiler,
+  // therefore we need to add the InsertHook here manually so that HiveMetaHook#commitInsertTable is called
+  if (qb.isCTAS() && tableDesc != null && tableDesc.getStorageHandler() != null) {
+try {
+  if (HiveUtils.getStorageHandler(conf, tableDesc.getStorageHandler()).directInsertCTAS()) {
+createPreInsertDesc(destinationTable, false);
+  }
+} catch (HiveException e) {

Review comment:
   I think that would make sense, as we are not able to perform the CTAS 
without the classes on classpath anyway.






Issue Time Tracking
---

Worklog Id: (was: 599052)
Time Spent: 3h 20m  (was: 3h 10m)

> Implement CTAS for Iceberg
> --
>
> Key: HIVE-25034
> URL: https://issues.apache.org/jira/browse/HIVE-25034
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-25130) alter table concat gives NullPointerException, when data is inserted from Spark

2021-05-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25130?focusedWorklogId=599051&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599051
 ]

ASF GitHub Bot logged work on HIVE-25130:
-

Author: ASF GitHub Bot
Created on: 19/May/21 07:43
Start Date: 19/May/21 07:43
Worklog Time Spent: 10m 
  Work Description: kishendas commented on a change in pull request #2285:
URL: https://github.com/apache/hive/pull/2285#discussion_r634990814



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
##
@@ -1252,32 +1253,45 @@ public static String getTaskIdFromFilename(String filename) {
* @param filename
*  filename to extract taskid from
*/
-  private static String getPrefixedTaskIdFromFilename(String filename) {
+  static String getPrefixedTaskIdFromFilename(String filename) {
 return getTaskIdFromFilename(filename, FILE_NAME_PREFIXED_TASK_ID_REGEX);
   }
 
   private static String getTaskIdFromFilename(String filename, Pattern pattern) {
-return getIdFromFilename(filename, pattern, 1);
+return getIdFromFilename(filename, pattern, 1, false);
   }
 
-  private static int getAttemptIdFromFilename(String filename) {
-String attemptStr = getIdFromFilename(filename, FILE_NAME_PREFIXED_TASK_ID_REGEX, 3);
+  static int getAttemptIdFromFilename(String filename) {
+String attemptStr = getIdFromFilename(filename, FILE_NAME_PREFIXED_TASK_ID_REGEX, 3, true);
 return Integer.parseInt(attemptStr.substring(1));
   }
 
-  private static String getIdFromFilename(String filename, Pattern pattern, int group) {
+  private static String getIdFromFilename(String filename, Pattern pattern, int group, boolean extractAttemptId) {
 String taskId = filename;
 int dirEnd = filename.lastIndexOf(Path.SEPARATOR);
-if (dirEnd != -1) {
+if (dirEnd!=-1) {
   taskId = filename.substring(dirEnd + 1);
 }
 
-Matcher m = pattern.matcher(taskId);
-if (!m.matches()) {
-  LOG.warn("Unable to get task id from file name: {}. Using last component {}"
-  + " as task id.", filename, taskId);
+// Spark emitted files have the format part-[number-string]-uuid..
+// Examples: part-00026-23003837-facb-49ec-b1c4-eeda902cacf3.c000.zlib.orc, 00026-23003837 is the taskId
+// and part-4-c6acfdee-0c32-492e-b209-c2f1cf40.c000, 4-c6acfdee is the taskId
+String strings[] = taskId.split("-");

Review comment:
   Agreed. I am not sure if the file format has changed in recent times. 
Let me talk to the Spark team and get their insights as well. 
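The Spark file-name convention described in this comment can be sketched as a small stand-alone parser. This is an illustrative approximation only — the class, method, and regex below are mine, not Hive's actual `Utilities` code or its `FILE_NAME_PREFIXED_TASK_ID_REGEX` — under the assumption that the task id is the digit group plus the first hex group after the `part-` prefix:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SparkFileNameParser {

    // Illustrative pattern: capture the digits and the first hex group after
    // "part-" as the task id, e.g. "00026-23003837" in
    // part-00026-23003837-facb-49ec-b1c4-eeda902cacf3.c000.zlib.orc
    private static final Pattern SPARK_FILE_NAME =
            Pattern.compile("part-(\\d+-[0-9a-f]+)-.*");

    /** Returns the task id, or null when the name does not follow the Spark format. */
    static String taskIdFromSparkFileName(String fileName) {
        Matcher m = SPARK_FILE_NAME.matcher(fileName);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(taskIdFromSparkFileName(
                "part-00026-23003837-facb-49ec-b1c4-eeda902cacf3.c000.zlib.orc"));
    }
}
```

A classic Hive-style name such as `000000_0` does not match this pattern, which is why a parser written only for one convention needs a fallback for the other.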






Issue Time Tracking
---

Worklog Id: (was: 599051)
Time Spent: 40m  (was: 0.5h)

> alter table concat gives NullPointerException, when data is inserted from 
> Spark
> ---
>
> Key: HIVE-25130
> URL: https://issues.apache.org/jira/browse/HIVE-25130
> Project: Hive
>  Issue Type: Bug
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This is the complete stack trace of the NullPointerException
> 2021-03-01 14:50:32,201 ERROR org.apache.hadoop.hive.ql.exec.Task: [HiveServer2-Background-Pool: Thread-76760]: Job Commit failed with exception 'java.lang.NullPointerException(null)'
> java.lang.NullPointerException
> at org.apache.hadoop.hive.ql.exec.Utilities.getAttemptIdFromFilename(Utilities.java:1333)
> at org.apache.hadoop.hive.ql.exec.Utilities.compareTempOrDuplicateFiles(Utilities.java:1966)
> at org.apache.hadoop.hive.ql.exec.Utilities.ponderRemovingTempOrDuplicateFile(Utilities.java:1907)
> at org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFilesNonMm(Utilities.java:1892)
> at org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1797)
> at org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1674)
> at org.apache.hadoop.hive.ql.exec.Utilities.mvFileToFinalPath(Utilities.java:1544)
> at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.jobCloseOp(AbstractFileMergeOperator.java:304)
> at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798)
> at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:637)
> at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:335)
> at
>