[jira] [Updated] (HIVE-24328) Run distcp in parallel for all file entries in repl load.

2020-11-10 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-24328:
---
Status: In Progress  (was: Patch Available)

> Run distcp in parallel for all file entries in repl load.
> -
>
> Key: HIVE-24328
> URL: https://issues.apache.org/jira/browse/HIVE-24328
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24328.01.patch, HIVE-24328.02.patch
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24328) Run distcp in parallel for all file entries in repl load.

2020-11-10 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-24328:
---
Attachment: HIVE-24328.02.patch
Status: Patch Available  (was: In Progress)

> Run distcp in parallel for all file entries in repl load.
> -
>
> Key: HIVE-24328
> URL: https://issues.apache.org/jira/browse/HIVE-24328
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24328.01.patch, HIVE-24328.02.patch
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24366) changeMarker value sent to atlas export API is set to 0 in the 2nd repl dump call

2020-11-10 Thread Pravin Sinha (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17229740#comment-17229740
 ] 

Pravin Sinha commented on HIVE-24366:
-

+1

> changeMarker value sent to atlas export API is set to 0 in the 2nd repl dump 
> call
> -
>
> Key: HIVE-24366
> URL: https://issues.apache.org/jira/browse/HIVE-24366
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24366.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24366) changeMarker value sent to atlas export API is set to 0 in the 2nd repl dump call

2020-11-10 Thread Arko Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arko Sharma updated HIVE-24366:
---
Attachment: HIVE-24366.01.patch

> changeMarker value sent to atlas export API is set to 0 in the 2nd repl dump 
> call
> -
>
> Key: HIVE-24366
> URL: https://issues.apache.org/jira/browse/HIVE-24366
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24366.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24022) Optimise HiveMetaStoreAuthorizer.createHiveMetaStoreAuthorizer

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24022?focusedWorklogId=510025&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-510025
 ]

ASF GitHub Bot logged work on HIVE-24022:
-

Author: ASF GitHub Bot
Created on: 11/Nov/20 00:33
Start Date: 11/Nov/20 00:33
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1385:
URL: https://github.com/apache/hive/pull/1385#issuecomment-725051526


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 510025)
Time Spent: 1h 10m  (was: 1h)

> Optimise HiveMetaStoreAuthorizer.createHiveMetaStoreAuthorizer
> --
>
> Key: HIVE-24022
> URL: https://issues.apache.org/jira/browse/HIVE-24022
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Sam An
>Priority: Minor
>  Labels: performance, pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> For a table with 3000+ partitions, analyze table takes much longer, as 
> HiveMetaStoreAuthorizer tries to create a HiveConf for every partition request.
>  
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/metastore/HiveMetaStoreAuthorizer.java#L319]
>  
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/metastore/HiveMetaStoreAuthorizer.java#L447]
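> For illustration, a hedged sketch (not the committed fix) of the obvious 
> remedy: HiveConf construction re-parses the configuration files, so it can 
> be built once and reused instead of being created per partition request. The 
> class name below is illustrative:
> {code}
> import org.apache.hadoop.hive.conf.HiveConf;
>
> public class AuthorizerConfHolder {
>   // Built once: new HiveConf() loads and parses hive-site.xml and friends,
>   // which is the per-request cost the ticket describes.
>   private static final HiveConf CACHED_CONF = new HiveConf();
>
>   public static HiveConf get() {
>     return CACHED_CONF;
>   }
> }
> {code}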



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23413) Create a new config to skip all locks

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23413?focusedWorklogId=510024&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-510024
 ]

ASF GitHub Bot logged work on HIVE-23413:
-

Author: ASF GitHub Bot
Created on: 11/Nov/20 00:33
Start Date: 11/Nov/20 00:33
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1220:
URL: https://github.com/apache/hive/pull/1220#issuecomment-725051537


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 510024)
Time Spent: 2h 20m  (was: 2h 10m)

> Create a new config to skip all locks
> -
>
> Key: HIVE-23413
> URL: https://issues.apache.org/jira/browse/HIVE-23413
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23413.1.patch, HIVE-23413.2.patch
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> From time to time a query is blocked on locks when it should not be.
> To have a quick workaround for this, we should have a config which the user 
> can set in the session to disable acquiring/checking locks, so we can provide 
> relief immediately and then later investigate and fix the root cause.
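> As an illustration of the intended usage, a minimal sketch assuming a 
> hypothetical property name (this ticket defines the real one):
> {code}
> import org.apache.hadoop.hive.conf.HiveConf;
>
> HiveConf conf = new HiveConf();
> // Hypothetical session-level switch: skip acquiring/checking locks so a
> // wrongly-blocked query can proceed while the root cause is investigated.
> conf.setBoolean("hive.txn.skip.locking", true);
> {code}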



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24097) correct NPE exception in HiveMetastoreAuthorizer

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24097?focusedWorklogId=510021&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-510021
 ]

ASF GitHub Bot logged work on HIVE-24097:
-

Author: ASF GitHub Bot
Created on: 11/Nov/20 00:33
Start Date: 11/Nov/20 00:33
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1448:
URL: https://github.com/apache/hive/pull/1448#issuecomment-725051522


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 510021)
Time Spent: 50m  (was: 40m)

> correct NPE exception in HiveMetastoreAuthorizer
> 
>
> Key: HIVE-24097
> URL: https://issues.apache.org/jira/browse/HIVE-24097
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Sam An
>Assignee: Sam An
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> In some testing, we found it's possible to get an NPE if the preEventType does 
> not fall within the several event types the HMS currently checks. This leaves 
> the AuthzContext as a null pointer. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24117) Fix for not setting managed table location in incremental load

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24117?focusedWorklogId=510022&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-510022
 ]

ASF GitHub Bot logged work on HIVE-24117:
-

Author: ASF GitHub Bot
Created on: 11/Nov/20 00:33
Start Date: 11/Nov/20 00:33
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1467:
URL: https://github.com/apache/hive/pull/1467#issuecomment-725051507


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 510022)
Time Spent: 40m  (was: 0.5h)

> Fix for not setting managed table location in incremental load
> --
>
> Key: HIVE-24117
> URL: https://issues.apache.org/jira/browse/HIVE-24117
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24117.01.patch, HIVE-24117.02.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24131) Use original src location always when data copy runs on target

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24131?focusedWorklogId=510023&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-510023
 ]

ASF GitHub Bot logged work on HIVE-24131:
-

Author: ASF GitHub Bot
Created on: 11/Nov/20 00:33
Start Date: 11/Nov/20 00:33
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1477:
URL: https://github.com/apache/hive/pull/1477#issuecomment-725051488


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 510023)
Time Spent: 0.5h  (was: 20m)

> Use original src location always when data copy runs on target 
> ---
>
> Key: HIVE-24131
> URL: https://issues.apache.org/jira/browse/HIVE-24131
> Project: Hive
>  Issue Type: Bug
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24131.01.patch, HIVE-24131.02.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24359) Hive Compaction hangs because of doAs when worker set to HS2

2020-11-10 Thread Rajkumar Singh (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17229536#comment-17229536
 ] 

Rajkumar Singh commented on HIVE-24359:
---

Even without HIVE-24089 this will fail or hang if doAs is enabled, since we 
create the proxy user here 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java#L539
 if the table/sds path owner is different from the hive user.
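For context, a minimal sketch of the proxy-user pattern the linked Worker.java 
code uses (the tableOwner variable is illustrative, not the actual name in 
that file):
{code}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

// Impersonate the table/partition owner on top of the service's own login.
UserGroupInformation proxyUser = UserGroupInformation.createProxyUser(
    tableOwner, UserGroupInformation.getLoginUser());
proxyUser.doAs((PrivilegedExceptionAction<Void>) () -> {
  // Compaction filesystem work runs as tableOwner here; if the Kerberos
  // login context is missing, the login inside this call can block on stdin.
  return null;
});
{code}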

> Hive Compaction hangs because of doAs when worker set to HS2
> 
>
> Key: HIVE-24359
> URL: https://issues.apache.org/jira/browse/HIVE-24359
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Transactions
>Reporter: Chiran Ravani
>Priority: Critical
>
> When creating a managed table and inserting data using Impala, with the 
> compaction worker set to HiveServer2 in a secured environment (Kerberized 
> cluster), the worker thread hangs indefinitely, expecting the user to provide 
> Kerberos credentials on STDIN.
> The problem appears to be that no login context is sent from HS2 to HMS as 
> part of QueryCompactor, and that the HS2 JVM property 
> javax.security.auth.useSubjectCredsOnly is set to false, which causes it to 
> prompt for logins via stdin. However, setting it to true also does not help, 
> as the context does not seem to be passed in any case.
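> For illustration (not part of the original report), the JGSS behavior 
> described above is driven by a standard JVM system property; a minimal 
> sketch of how that switch is set:
> {code}
> // Standard JVM property read by JGSS. When false, JGSS may acquire
> // credentials itself, e.g. by prompting on stdin as in the jstack below.
> System.setProperty("javax.security.auth.useSubjectCredsOnly", "true");
> // Equivalent JVM flag: -Djavax.security.auth.useSubjectCredsOnly=true
> // As noted above, this alone does not help here, because no login context
> // (Subject) reaches the worker's HMS call in the first place.
> {code}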
> Below is what is observed in the HS2 jstack. Note that the thread is waiting 
> for stdin in "com.sun.security.auth.module.Krb5LoginModule.promptForName":
> {code}
> "c570-node2.abc.host.com-44_executor" #47 daemon prio=1 os_prio=0 
> tid=0x01506000 nid=0x1348 runnable [0x7f1beea95000]
>java.lang.Thread.State: RUNNABLE
> at java.io.FileInputStream.readBytes(Native Method)
> at java.io.FileInputStream.read(FileInputStream.java:255)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
> - locked <0x9fa38c90> (a java.io.BufferedInputStream)
> at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
> at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
> at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
> - locked <0x8c7d5010> (a java.io.InputStreamReader)
> at java.io.InputStreamReader.read(InputStreamReader.java:184)
> at java.io.BufferedReader.fill(BufferedReader.java:161)
> at java.io.BufferedReader.readLine(BufferedReader.java:324)
> - locked <0x8c7d5010> (a java.io.InputStreamReader)
> at java.io.BufferedReader.readLine(BufferedReader.java:389)
> at 
> com.sun.security.auth.callback.TextCallbackHandler.readLine(TextCallbackHandler.java:153)
> at 
> com.sun.security.auth.callback.TextCallbackHandler.handle(TextCallbackHandler.java:120)
> at 
> com.sun.security.auth.module.Krb5LoginModule.promptForName(Krb5LoginModule.java:862)
> at 
> com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:708)
> at 
> com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
> at 
> javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
> at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
> at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
> at java.security.AccessController.doPrivileged(Native Method)
> at 
> javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
> at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
> at sun.security.jgss.GSSUtil.login(GSSUtil.java:258)
> at sun.security.jgss.krb5.Krb5Util.getInitialTicket(Krb5Util.java:175)
> at 
> sun.security.jgss.krb5.Krb5InitCredential$1.run(Krb5InitCredential.java:341)
> at 
> sun.security.jgss.krb5.Krb5InitCredential$1.run(Krb5InitCredential.java:337)
> at java.security.AccessController.doPrivileged(Native Method)
> at 
> sun.security.jgss.krb5.Krb5InitCredential.getTgt(Krb5InitCredential.java:336)
> at 
> sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:146)
> at 
> sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
> at 
> sun.secur

[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=509912&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509912
 ]

ASF GitHub Bot logged work on HIVE-23410:
-

Author: ASF GitHub Bot
Created on: 10/Nov/20 20:21
Start Date: 10/Nov/20 20:21
Worklog Time Spent: 10m 
  Work Description: kuczoram opened a new pull request #1660:
URL: https://github.com/apache/hive/pull/1660


   …he move step
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 509912)
Time Spent: 50m  (was: 40m)

> ACID: Improve the delete and update operations to avoid the move step
> -
>
> Key: HIVE-23410
> URL: https://issues.apache.org/jira/browse/HIVE-23410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23410.1.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This is a follow-up task for 
> [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the 
> insert operation has been modified to write directly to the table locations 
> instead of the staging directory. The same improvement should be done for the 
> ACID update and delete operations as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24366) changeMarker value sent to atlas export API is set to 0 in the 2nd repl dump call

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24366?focusedWorklogId=509906&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509906
 ]

ASF GitHub Bot logged work on HIVE-24366:
-

Author: ASF GitHub Bot
Created on: 10/Nov/20 20:02
Start Date: 10/Nov/20 20:02
Worklog Time Spent: 10m 
  Work Description: ArkoSharma opened a new pull request #1659:
URL: https://github.com/apache/hive/pull/1659


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 509906)
Remaining Estimate: 0h
Time Spent: 10m

> changeMarker value sent to atlas export API is set to 0 in the 2nd repl dump 
> call
> -
>
> Key: HIVE-24366
> URL: https://issues.apache.org/jira/browse/HIVE-24366
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24366) changeMarker value sent to atlas export API is set to 0 in the 2nd repl dump call

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24366:
--
Labels: pull-request-available  (was: )

> changeMarker value sent to atlas export API is set to 0 in the 2nd repl dump 
> call
> -
>
> Key: HIVE-24366
> URL: https://issues.apache.org/jira/browse/HIVE-24366
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24366) changeMarker value sent to atlas export API is set to 0 in the 2nd repl dump call

2020-11-10 Thread Arko Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arko Sharma reassigned HIVE-24366:
--


> changeMarker value sent to atlas export API is set to 0 in the 2nd repl dump 
> call
> -
>
> Key: HIVE-24366
> URL: https://issues.apache.org/jira/browse/HIVE-24366
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=509900&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509900
 ]

ASF GitHub Bot logged work on HIVE-23410:
-

Author: ASF GitHub Bot
Created on: 10/Nov/20 19:39
Start Date: 10/Nov/20 19:39
Worklog Time Spent: 10m 
  Work Description: kuczoram closed pull request #1620:
URL: https://github.com/apache/hive/pull/1620


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 509900)
Time Spent: 40m  (was: 0.5h)

> ACID: Improve the delete and update operations to avoid the move step
> -
>
> Key: HIVE-23410
> URL: https://issues.apache.org/jira/browse/HIVE-23410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23410.1.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This is a follow-up task for 
> [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the 
> insert operation has been modified to write directly to the table locations 
> instead of the staging directory. The same improvement should be done for the 
> ACID update and delete operations as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24336) Turn off the direct insert for EXPLAIN ANALYZE queries

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24336:
--
Labels: pull-request-available  (was: )

> Turn off the direct insert for EXPLAIN ANALYZE queries
> --
>
> Key: HIVE-24336
> URL: https://issues.apache.org/jira/browse/HIVE-24336
> Project: Hive
>  Issue Type: Bug
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If we do an EXPLAIN ANALYZE for an INSERT query with direct insert on, the 
> new files will be created in the table directory, and they won't be 
> cleaned-up when the EXPLAIN query is finished.
> Example: 
> {noformat}
> create table analyze_table (id int) stored as orc 
> tblproperties('transactional'='true');
> explain analyze insert into analyze_table values (1),(2),(3),(4);
> select * from analyze_table;
> 1
> 2
> 3
> 4
> Time taken: 0.1 seconds, Fetched: 4 row(s)
> The result should be empty after the explain command.
> {noformat}
> An EXPLAIN ANALYZE query will execute the actual query and the files will be 
> created within the staging directory, but the MoveTask won't move them to the 
> final location. So when the EXPLAIN ANALYZE query is finished, the staging 
> directory will be deleted, and the table directory will be the same as before 
> the EXPLAIN query. But with direct insert on the files will be written into 
> the table directory, so an additional cleanup would be necessary in order to 
> restore the files within the table directory to the state before the EXPLAIN 
> ANALYZE query. This could be avoided by turning off the direct insert for an 
> EXPLAIN ANALYZE query. Since the direct insert improves the performance by 
> eliminating the file movements within the MoveTask but has no effect on 
> the query execution plan, it can be safely turned off for explain queries.
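> (Editorial aside, a hedged sketch: direct insert has a HiveConf switch, 
> introduced with HIVE-21164, so it can also be disabled manually around an 
> EXPLAIN ANALYZE as a workaround.)
> {code}
> HiveConf conf = new HiveConf();
> // "hive.acid.direct.insert.enabled" is the property from HIVE-21164; with
> // it off, inserts go through the staging directory and MoveTask again.
> conf.setBoolean("hive.acid.direct.insert.enabled", false);
> {code}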



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-24336) Turn off the direct insert for EXPLAIN ANALYZE queries

2020-11-10 Thread Marta Kuczora (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-24336 started by Marta Kuczora.

> Turn off the direct insert for EXPLAIN ANALYZE queries
> --
>
> Key: HIVE-24336
> URL: https://issues.apache.org/jira/browse/HIVE-24336
> Project: Hive
>  Issue Type: Bug
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If we do an EXPLAIN ANALYZE for an INSERT query with direct insert on, the 
> new files will be created in the table directory, and they won't be 
> cleaned-up when the EXPLAIN query is finished.
> Example: 
> {noformat}
> create table analyze_table (id int) stored as orc 
> tblproperties('transactional'='true');
> explain analyze insert into analyze_table values (1),(2),(3),(4);
> select * from analyze_table;
> 1
> 2
> 3
> 4
> Time taken: 0.1 seconds, Fetched: 4 row(s)
> The result should be empty after the explain command.
> {noformat}
> An EXPLAIN ANALYZE query will execute the actual query and the files will be 
> created within the staging directory, but the MoveTask won't move them to the 
> final location. So when the EXPLAIN ANALYZE query is finished, the staging 
> directory will be deleted, and the table directory will be the same as before 
> the EXPLAIN query. But with direct insert on the files will be written into 
> the table directory, so an additional cleanup would be necessary in order to 
> restore the files within the table directory to the state before the EXPLAIN 
> ANALYZE query. This could be avoided by turning off the direct insert for an 
> EXPLAIN ANALYZE query. Since the direct insert improves the performance by 
> eliminating the file movements within the MoveTask but has no effect on 
> the query execution plan, it can be safely turned off for explain queries.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24336) Turn off the direct insert for EXPLAIN ANALYZE queries

2020-11-10 Thread Marta Kuczora (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marta Kuczora resolved HIVE-24336.
--
Resolution: Fixed

> Turn off the direct insert for EXPLAIN ANALYZE queries
> --
>
> Key: HIVE-24336
> URL: https://issues.apache.org/jira/browse/HIVE-24336
> Project: Hive
>  Issue Type: Bug
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If we do an EXPLAIN ANALYZE for an INSERT query with direct insert on, the 
> new files will be created in the table directory, and they won't be 
> cleaned-up when the EXPLAIN query is finished.
> Example: 
> {noformat}
> create table analyze_table (id int) stored as orc 
> tblproperties('transactional'='true');
> explain analyze insert into analyze_table values (1),(2),(3),(4);
> select * from analyze_table;
> 1
> 2
> 3
> 4
> Time taken: 0.1 seconds, Fetched: 4 row(s)
> The result should be empty after the explain command.
> {noformat}
> An EXPLAIN ANALYZE query will execute the actual query and the files will be 
> created within the staging directory, but the MoveTask won't move them to the 
> final location. So when the EXPLAIN ANALYZE query is finished, the staging 
> directory will be deleted, and the table directory will be the same as before 
> the EXPLAIN query. But with direct insert on the files will be written into 
> the table directory, so an additional cleanup would be necessary in order to 
> restore the files within the table directory to the state before the EXPLAIN 
> ANALYZE query. This could be avoided by turning off the direct insert for an 
> EXPLAIN ANALYZE query. Since the direct insert improves the performance by 
> eliminating the file movements within the MoveTask but has no effect on 
> the query execution plan, it can be safely turned off for explain queries.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24336) Turn off the direct insert for EXPLAIN ANALYZE queries

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24336?focusedWorklogId=509894&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509894
 ]

ASF GitHub Bot logged work on HIVE-24336:
-

Author: ASF GitHub Bot
Created on: 10/Nov/20 19:18
Start Date: 10/Nov/20 19:18
Worklog Time Spent: 10m 
  Work Description: kuczoram merged pull request #1632:
URL: https://github.com/apache/hive/pull/1632


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 509894)
Remaining Estimate: 0h
Time Spent: 10m

> Turn off the direct insert for EXPLAIN ANALYZE queries
> --
>
> Key: HIVE-24336
> URL: https://issues.apache.org/jira/browse/HIVE-24336
> Project: Hive
>  Issue Type: Bug
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If we do an EXPLAIN ANALYZE for an INSERT query with direct insert on, the 
> new files will be created in the table directory, and they won't be 
> cleaned-up when the EXPLAIN query is finished.
> Example: 
> {noformat}
> create table analyze_table (id int) stored as orc 
> tblproperties('transactional'='true');
> explain analyze insert into analyze_table values (1),(2),(3),(4);
> select * from analyze_table;
> 1
> 2
> 3
> 4
> Time taken: 0.1 seconds, Fetched: 4 row(s)
> The result should be empty after the explain command.
> {noformat}
> An EXPLAIN ANALYZE query will execute the actual query and the files will be 
> created within the staging directory, but the MoveTask won't move them to the 
> final location. So when the EXPLAIN ANALYZE query is finished, the staging 
> directory will be deleted, and the table directory will be the same as before 
> the EXPLAIN query. But with direct insert on the files will be written into 
> the table directory, so an additional cleanup would be necessary in order to 
> restore the files within the table directory to the state before the EXPLAIN 
> ANALYZE query. This could be avoided by turning off the direct insert for an 
> EXPLAIN ANALYZE query. Since the direct insert improves the performance by 
> eliminating the file movements within the MoveTask but has no effect on 
> the query execution plan, it can be safely turned off for explain queries.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24336) Turn off the direct insert for EXPLAIN ANALYZE queries

2020-11-10 Thread Marta Kuczora (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17229456#comment-17229456
 ] 

Marta Kuczora commented on HIVE-24336:
--

Pushed to master. Thanks a lot [~szita] for the review!

> Turn off the direct insert for EXPLAIN ANALYZE queries
> --
>
> Key: HIVE-24336
> URL: https://issues.apache.org/jira/browse/HIVE-24336
> Project: Hive
>  Issue Type: Bug
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If we do an EXPLAIN ANALYZE for an INSERT query with direct insert on, the 
> new files will be created in the table directory, and they won't be 
> cleaned-up when the EXPLAIN query is finished.
> Example: 
> {noformat}
> create table analyze_table (id int) stored as orc 
> tblproperties('transactional'='true');
> explain analyze insert into analyze_table values (1),(2),(3),(4);
> select * from analyze_table;
> 1
> 2
> 3
> 4
> Time taken: 0.1 seconds, Fetched: 4 row(s)
> The result should be empty after the explain command.
> {noformat}
> An EXPLAIN ANALYZE query will execute the actual query and the files will be 
> created within the staging directory, but the MoveTask won't move them to the 
> final location. So when the EXPLAIN ANALYZE query is finished, the staging 
> directory will be deleted, and the table directory will be the same as before 
> the EXPLAIN query. But with direct insert on the files will be written into 
> the table directory, so an additional cleanup would be necessary in order to 
> restore the files within the table directory to the state before the EXPLAIN 
> ANALYZE query. This could be avoided by turning off the direct insert for an 
> EXPLAIN ANALYZE query. Since the direct insert improves the performance by 
> eliminating the file movements within the MoveTask but has no effect on 
> the query execution plan, it can be safely turned off for explain queries.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23976) Enable vectorization for multi-col semi join reducers

2020-11-10 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-23976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17229424#comment-17229424
 ] 

László Bodor commented on HIVE-23976:
-

thanks [~jcamachorodriguez]! as agreed, we will address further improvements in 
follow-up tickets with [~zabetak]


> Enable vectorization for multi-col semi join reducers
> -
>
> Key: HIVE-23976
> URL: https://issues.apache.org/jira/browse/HIVE-23976
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> HIVE-21196 introduces multi-column semi-join reducers in the query engine. 
> However, the implementation relies on GenericUDFMurmurHash, which is not 
> vectorized, so the respective operators cannot be executed in vectorized 
> mode. 
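> For intuition, an editorial sketch in plain Java (not Hive's vectorization 
> API): a row-mode UDF such as GenericUDFMurmurHash is invoked once per row, 
> whereas a vectorized expression processes a whole batch of column values per 
> call, which is what the affected operators need:
> {code}
> int[] keys = {1, 2, 3, 4};                 // one column's batch of values
> long[] hashes = new long[keys.length];
> for (int i = 0; i < keys.length; i++) {    // single pass over the batch
>   hashes[i] = Integer.hashCode(keys[i]);   // stand-in for the murmur hash
> }
> {code}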



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23976) Enable vectorization for multi-col semi join reducers

2020-11-10 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-23976.

Fix Version/s: 4.0.0
   Resolution: Fixed

> Enable vectorization for multi-col semi join reducers
> -
>
> Key: HIVE-23976
> URL: https://issues.apache.org/jira/browse/HIVE-23976
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> HIVE-21196 introduces multi-column semi-join reducers in the query engine. 
> However, the implementation relies on GenericUDFMurmurHash, which is not 
> vectorized, so the respective operators cannot be executed in vectorized 
> mode. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23976) Enable vectorization for multi-col semi join reducers

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23976?focusedWorklogId=509868&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509868
 ]

ASF GitHub Bot logged work on HIVE-23976:
-

Author: ASF GitHub Bot
Created on: 10/Nov/20 18:24
Start Date: 10/Nov/20 18:24
Worklog Time Spent: 10m 
  Work Description: jcamachor merged pull request #1458:
URL: https://github.com/apache/hive/pull/1458


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 509868)
Time Spent: 1h 20m  (was: 1h 10m)

> Enable vectorization for multi-col semi join reducers
> -
>
> Key: HIVE-23976
> URL: https://issues.apache.org/jira/browse/HIVE-23976
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> HIVE-21196 introduces multi-column semi-join reducers in the query engine. 
> However, the implementation relies on GenericUDFMurmurHash, which is not 
> vectorized, so the respective operators cannot be executed in vectorized 
> mode. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23976) Enable vectorization for multi-col semi join reducers

2020-11-10 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17229419#comment-17229419
 ] 

Jesus Camacho Rodriguez commented on HIVE-23976:


Pushed to master, thanks [~abstractdog]!

> Enable vectorization for multi-col semi join reducers
> -
>
> Key: HIVE-23976
> URL: https://issues.apache.org/jira/browse/HIVE-23976
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> HIVE-21196 introduces multi-column semi-join reducers in the query engine. 
> However, the implementation relies on GenericUDFMurmurHash, which is not 
> vectorized, so the respective operators cannot be executed in vectorized 
> mode. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24333) Cut long methods in Driver to smaller, more manageable pieces

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24333?focusedWorklogId=509865&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509865
 ]

ASF GitHub Bot logged work on HIVE-24333:
-

Author: ASF GitHub Bot
Created on: 10/Nov/20 18:19
Start Date: 10/Nov/20 18:19
Worklog Time Spent: 10m 
  Work Description: miklosgergely commented on a change in pull request 
#1629:
URL: https://github.com/apache/hive/pull/1629#discussion_r520773191



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -638,46 +654,28 @@ public boolean getResults(List results) throws IOException {
 }
 
 if (isFetchingTable()) {
-  /**
-   * If resultset serialization to thrift object is enabled, and if the destination table is
-   * indeed written using ThriftJDBCBinarySerDe, read one row from the output sequence file,
-   * since it is a blob of row batches.
-   */
-  if (driverContext.getFetchTask().getWork().isUsingThriftJDBCBinarySerDe()) {
-maxRows = 1;
-  }
-  driverContext.getFetchTask().setMaxRows(maxRows);
-  return driverContext.getFetchTask().fetch(results);
+  return getFetchingTableResults(results);
 }
 
 if (driverContext.getResStream() == null) {
   driverContext.setResStream(context.getStream());
-}
-if (driverContext.getResStream() == null) {
-  return false;
+  if (driverContext.getResStream() == null) {
+return false;
+  }
 }

Review comment:
   I agree, it's nicer like this, fixed.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 509865)
Time Spent: 2h 40m  (was: 2.5h)

> Cut long methods in Driver to smaller, more manageable pieces
> -
>
> Key: HIVE-24333
> URL: https://issues.apache.org/jira/browse/HIVE-24333
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Some methods in Driver are too long to be easily understandable. They should 
> be cut into pieces to make them easier to understand.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24333) Cut long methods in Driver to smaller, more manageable pieces

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24333?focusedWorklogId=509864&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509864
 ]

ASF GitHub Bot logged work on HIVE-24333:
-

Author: ASF GitHub Bot
Created on: 10/Nov/20 18:18
Start Date: 10/Nov/20 18:18
Worklog Time Spent: 10m 
  Work Description: miklosgergely commented on a change in pull request 
#1629:
URL: https://github.com/apache/hive/pull/1629#discussion_r520772501



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -687,13 +685,36 @@ public boolean getResults(List results) throws IOException {
 return false;
   }
 
-  if (ss == Utilities.StreamStatus.EOF) {
+  if (streamStatus == Utilities.StreamStatus.EOF) {
 driverContext.setResStream(context.getStream());
   }
 }
 return true;
   }
 
+  @SuppressWarnings("rawtypes")
+  private boolean getFetchingTableResults(List results) throws IOException {
+// If result set serialization to thrift object is enabled, and if the destination table is indeed written using
+// ThriftJDBCBinarySerDe, read one row from the output sequence file, since it is a blob of row batches.
+if (driverContext.getFetchTask().getWork().isUsingThriftJDBCBinarySerDe()) {
+  maxRows = 1;
+}
+driverContext.getFetchTask().setMaxRows(maxRows);
+return driverContext.getFetchTask().fetch(results);
+  }
+
+  private String getRow(ByteStream.Output bos, Utilities.StreamStatus streamStatus) {
+String row;

Review comment:
   final added.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 509864)
Time Spent: 2.5h  (was: 2h 20m)

> Cut long methods in Driver to smaller, more manageable pieces
> -
>
> Key: HIVE-24333
> URL: https://issues.apache.org/jira/browse/HIVE-24333
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Some methods in Driver are too long to be easily understandable. They should 
> be cut into pieces to make them easier to understand.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24333) Cut long methods in Driver to smaller, more manageable pieces

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24333?focusedWorklogId=509863&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509863
 ]

ASF GitHub Bot logged work on HIVE-24333:
-

Author: ASF GitHub Bot
Created on: 10/Nov/20 18:17
Start Date: 10/Nov/20 18:17
Worklog Time Spent: 10m 
  Work Description: miklosgergely commented on a change in pull request 
#1629:
URL: https://github.com/apache/hive/pull/1629#discussion_r520772410



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -149,207 +149,64 @@ private CommandProcessorResponse run(String command, boolean alreadyCompiled) th
   runInternal(command, alreadyCompiled);
   return new CommandProcessorResponse(getSchema(), null);
 } catch (CommandProcessorException cpe) {
-  SessionState ss = SessionState.get();
-  if (ss == null) {
-throw cpe;
-  }
-  MetaDataFormatter mdf = MetaDataFormatUtils.getFormatter(ss.getConf());
-  if (!(mdf instanceof JsonMetaDataFormatter)) {
-throw cpe;
-  }
-  /*Here we want to encode the error in machine readable way (e.g. JSON)
-   * Ideally, errorCode would always be set to a canonical error defined in ErrorMsg.
-   * In practice that is rarely the case, so the messy logic below tries to tease
-   * out canonical error code if it can.  Exclude stack trace from output when
-   * the error is a specific/expected one.
-   * It's written to stdout for backward compatibility (WebHCat consumes it).*/
-  try {
-if (cpe.getCause() == null) {
-  mdf.error(ss.out, cpe.getMessage(), cpe.getResponseCode(), cpe.getSqlState());
-  throw cpe;
-}
-ErrorMsg canonicalErr = ErrorMsg.getErrorMsg(cpe.getResponseCode());
-if (canonicalErr != null && canonicalErr != ErrorMsg.GENERIC_ERROR) {
-  /*Some HiveExceptions (e.g. SemanticException) don't set
-canonical ErrorMsg explicitly, but there is logic
-(e.g. #compile()) to find an appropriate canonical error and
-return its code as error code. In this case we want to
-preserve it for downstream code to interpret*/
-  mdf.error(ss.out, cpe.getMessage(), cpe.getResponseCode(), cpe.getSqlState(), null);
-  throw cpe;
-}
-if (cpe.getCause() instanceof HiveException) {
-  HiveException rc = (HiveException)cpe.getCause();
-  mdf.error(ss.out, cpe.getMessage(), rc.getCanonicalErrorMsg().getErrorCode(), cpe.getSqlState(),
-  rc.getCanonicalErrorMsg() == ErrorMsg.GENERIC_ERROR ? StringUtils.stringifyException(rc) : null);
-} else {
-  ErrorMsg canonicalMsg = ErrorMsg.getErrorMsg(cpe.getCause().getMessage());
-  mdf.error(ss.out, cpe.getMessage(), canonicalMsg.getErrorCode(), cpe.getSqlState(),
-  StringUtils.stringifyException(cpe.getCause()));
-}
-  } catch (HiveException ex) {
-CONSOLE.printError("Unable to JSON-encode the error", StringUtils.stringifyException(ex));
-  }
+  processRunException(cpe);
   throw cpe;
 }
   }
 
  private void runInternal(String command, boolean alreadyCompiled) throws CommandProcessorException {
 DriverState.setDriverState(driverState);
 
-driverState.lock();
-try {
-  if (driverContext != null && driverContext.getPlan() != null
-  && driverContext.getPlan().isPrepareQuery()
-  && !driverContext.getPlan().isExplain()) {
-LOG.info("Skip running tasks for prepare plan");
-return;
-  }
-  if (alreadyCompiled) {
-if (driverState.isCompiled()) {
-  driverState.executing();
-} else {
-  String errorMessage = "FAILED: Precompiled query has been cancelled or closed.";
-  CONSOLE.printError(errorMessage);
-  throw DriverUtils.createProcessorException(driverContext, 12, errorMessage, null, null);
-}
-  } else {
-driverState.compiling();
-  }
-} finally {
-  driverState.unlock();
+if (driverContext != null && driverContext.getPlan() != null &&

Review comment:
   Agree, nice catch, fixed.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 509863)
Time Spent: 2h 20m  (was: 2h 10m)

> Cut long methods in Driver to smaller, more manageable pieces
> -
>
> Key: HIVE-24333
> URL: https://issues.apache.org/jira/browse/HIVE-24333
>   

[jira] [Work logged] (HIVE-24333) Cut long methods in Driver to smaller, more manageable pieces

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24333?focusedWorklogId=509861&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509861
 ]

ASF GitHub Bot logged work on HIVE-24333:
-

Author: ASF GitHub Bot
Created on: 10/Nov/20 18:17
Start Date: 10/Nov/20 18:17
Worklog Time Spent: 10m 
  Work Description: miklosgergely commented on a change in pull request 
#1629:
URL: https://github.com/apache/hive/pull/1629#discussion_r520772193



##
File path: ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java
##
@@ -67,7 +67,6 @@
 import org.apache.hadoop.hive.ql.session.SessionState.LogHelper;
 import org.apache.hadoop.util.StringUtils;
 import org.apache.hive.common.util.ShutdownHookManager;
-import org.apache.hive.common.util.TxnIdUtils;

Review comment:
   OK, removed.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 509861)
Time Spent: 2h 10m  (was: 2h)

> Cut long methods in Driver to smaller, more manageable pieces
> -
>
> Key: HIVE-24333
> URL: https://issues.apache.org/jira/browse/HIVE-24333
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Some methods in Driver are too long to be easily understandable. They should 
> be cut into pieces to make them easier to understand.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24333) Cut long methods in Driver to smaller, more manageable pieces

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24333?focusedWorklogId=509860&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509860
 ]

ASF GitHub Bot logged work on HIVE-24333:
-

Author: ASF GitHub Bot
Created on: 10/Nov/20 18:17
Start Date: 10/Nov/20 18:17
Worklog Time Spent: 10m 
  Work Description: miklosgergely commented on a change in pull request 
#1629:
URL: https://github.com/apache/hive/pull/1629#discussion_r520772082



##
File path: ql/src/java/org/apache/hadoop/hive/ql/DriverFactory.java
##
@@ -73,7 +73,7 @@ private static IReExecutionPlugin buildReExecPlugin(String name) throws RuntimeE
 if("reexecute_lost_am".equals(name)) {
   return new ReExecuteLostAMQueryPlugin();
 }
-if (name.equals("dagsubmit")) {
+if ("dagsubmit".equals(name)) {

Review comment:
   OK, removed.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 509860)
Time Spent: 2h  (was: 1h 50m)

> Cut long methods in Driver to smaller, more manageable pieces
> -
>
> Key: HIVE-24333
> URL: https://issues.apache.org/jira/browse/HIVE-24333
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Some methods in Driver are too long to be easily understandable. They should 
> be cut into pieces to make them easier to understand.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24357) Exchange SWO table/algorithm strategy

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24357?focusedWorklogId=509846&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509846
 ]

ASF GitHub Bot logged work on HIVE-24357:
-

Author: ASF GitHub Bot
Created on: 10/Nov/20 17:37
Start Date: 10/Nov/20 17:37
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #1653:
URL: https://github.com/apache/hive/pull/1653#issuecomment-724856174


   @jcamachor Could you please take a look?
   
   Right now I'm waiting for the "final" test results - I expect at most qtest 
diffs like `A and A` simplified to `A`.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 509846)
Time Spent: 0.5h  (was: 20m)

> Exchange SWO table/algorithm strategy
> -
>
> Key: HIVE-24357
> URL: https://issues.apache.org/jira/browse/HIVE-24357
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: swo.before.jointree.dot.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> SWO right now runs like: 
> {code}
> for every strategy s: for every table t: try s for t
> {code}
> this means that an earlier strategy may leave a more entangled operator 
> tree behind, in case it's able to merge for a less prioritized table.
> It would probably make more sense to do:
> {code}
> for every table t: for every strategy s: try s for t
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24350) NullScanTaskDispatcher should use stats

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24350?focusedWorklogId=509845&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509845
 ]

ASF GitHub Bot logged work on HIVE-24350:
-

Author: ASF GitHub Bot
Created on: 10/Nov/20 17:35
Start Date: 10/Nov/20 17:35
Worklog Time Spent: 10m 
  Work Description: mustafaiman commented on a change in pull request #1645:
URL: https://github.com/apache/hive/pull/1645#discussion_r520745722



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/NullScanTaskDispatcher.java
##
@@ -93,94 +91,109 @@ private String getAliasForTableScanOperator(MapWork work,
 return null;
   }
 
-  private PartitionDesc changePartitionToMetadataOnly(PartitionDesc desc,
-  Path path) {
-if (desc == null) {
-  return null;
-}
-FileStatus[] filesFoundInPartitionDir = null;
+  private void lookupAndProcessPath(MapWork work, Path path,
+   Collection aliasesToOptimize) {
 try {
-  filesFoundInPartitionDir = 
Utilities.listNonHiddenFileStatus(physicalContext.getConf(), path);
+  boolean isEmpty = 
Utilities.listNonHiddenFileStatus(physicalContext.getConf(), path).length == 0;
+  processPath(work, path, aliasesToOptimize, isEmpty);
 } catch (IOException e) {
-  LOG.error("Cannot determine if the table is empty", e);
-}
-if (!isMetadataOnlyAllowed(filesFoundInPartitionDir)) {
-  return desc;
+  LOG.warn("Could not determine if path {} was empty." +
+  "Cannot use null scan optimization for this path.", path, e);
 }
-
-boolean isEmpty = filesFoundInPartitionDir == null || 
filesFoundInPartitionDir.length == 0;
-desc.setInputFileFormatClass(isEmpty ? ZeroRowsInputFormat.class : 
OneNullRowInputFormat.class);
-desc.setOutputFileFormatClass(HiveIgnoreKeyTextOutputFormat.class);
-desc.getProperties().setProperty(serdeConstants.SERIALIZATION_LIB,
-NullStructSerDe.class.getName());
-return desc;
   }
 
-  private boolean isMetadataOnlyAllowed(FileStatus[] filesFoundInPartitionDir) 
{
-if (filesFoundInPartitionDir == null || filesFoundInPartitionDir.length == 
0) {
-  return true; // empty folders are safe to convert to metadata-only
-}
-for (FileStatus f : filesFoundInPartitionDir) {
-  if (AcidUtils.isDeleteDelta(f.getPath())) {
-/*
- * as described in HIVE-23712, an acid partition is not a safe subject 
of metadata-only
- * optimization, because there is a chance that it contains no data 
but contains folders
- * (e.g: delta_002_002_, 
delete_delta_003_003_), without scanning
- * the underlying file contents, we cannot tell whether this partition 
contains data or not
- */
-return false;
-  }
+  private void processPath(MapWork work, Path path, Collection<String> aliasesToOptimize,
+   boolean isEmpty) {
+PartitionDesc partDesc = work.getPathToPartitionInfo().get(path).clone();
+partDesc.setInputFileFormatClass(isEmpty ? ZeroRowsInputFormat.class : 
OneNullRowInputFormat.class);
+partDesc.setOutputFileFormatClass(HiveIgnoreKeyTextOutputFormat.class);
+partDesc.getProperties().setProperty(serdeConstants.SERIALIZATION_LIB,
+NullStructSerDe.class.getName());
+Path fakePath =
+new Path(NullScanFileSystem.getBase() + partDesc.getTableName()
++ "/part" + encode(partDesc.getPartSpec()));
+StringInternUtils.internUriStringsInPath(fakePath);
+work.addPathToPartitionInfo(fakePath, partDesc);
+work.addPathToAlias(fakePath, new ArrayList<>(aliasesToOptimize));
+Collection<String> aliasesContainingPath = work.getPathToAliases().get(path);
+aliasesContainingPath.removeAll(aliasesToOptimize);
+if (aliasesContainingPath.isEmpty()) {
+  work.removePathToAlias(path);
+  work.removePathToPartitionInfo(path);
 }
-return true;
   }
 
-  private void processAlias(MapWork work, Path path,
-  Collection<String> aliasesAffected, Set<String> aliases) {
-// the aliases that are allowed to map to a null scan.
-Collection<String> allowed = aliasesAffected.stream()
-.filter(a -> aliases.contains(a)).collect(Collectors.toList());
-if (!allowed.isEmpty()) {
-  PartitionDesc partDesc = work.getPathToPartitionInfo().get(path).clone();
-  PartitionDesc newPartition =
-  changePartitionToMetadataOnly(partDesc, path);
-  // Prefix partition with something to avoid it being a hidden file.
-  Path fakePath =
-  new Path(NullScanFileSystem.getBase() + newPartition.getTableName()
-  + "/part" + encode(newPartition.getPartSpec()));
-  StringInternUtils.internUriStringsInPath(fakePath);
-  work.addPathToPartitionInfo(fakePath, newPartition);
-  work.addPathToAlias(fakePath, new ArrayList<>(allowed));
-  alias

[jira] [Work logged] (HIVE-24350) NullScanTaskDispatcher should use stats

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24350?focusedWorklogId=509844&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509844
 ]

ASF GitHub Bot logged work on HIVE-24350:
-

Author: ASF GitHub Bot
Created on: 10/Nov/20 17:34
Start Date: 10/Nov/20 17:34
Worklog Time Spent: 10m 
  Work Description: mustafaiman commented on a change in pull request #1645:
URL: https://github.com/apache/hive/pull/1645#discussion_r520745049



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/NullScanTaskDispatcher.java
##
@@ -93,94 +91,109 @@ private String getAliasForTableScanOperator(MapWork work,
 return null;
   }
 
-  private PartitionDesc changePartitionToMetadataOnly(PartitionDesc desc,
-  Path path) {
-if (desc == null) {
-  return null;
-}
-FileStatus[] filesFoundInPartitionDir = null;
+  private void lookupAndProcessPath(MapWork work, Path path,
+   Collection<String> aliasesToOptimize) {
 try {
-  filesFoundInPartitionDir = 
Utilities.listNonHiddenFileStatus(physicalContext.getConf(), path);
+  boolean isEmpty = 
Utilities.listNonHiddenFileStatus(physicalContext.getConf(), path).length == 0;
+  processPath(work, path, aliasesToOptimize, isEmpty);
 } catch (IOException e) {
-  LOG.error("Cannot determine if the table is empty", e);
-}
-if (!isMetadataOnlyAllowed(filesFoundInPartitionDir)) {
-  return desc;
+  LOG.warn("Could not determine if path {} was empty." +
+  "Cannot use null scan optimization for this path.", path, e);
 }
-
-boolean isEmpty = filesFoundInPartitionDir == null || 
filesFoundInPartitionDir.length == 0;
-desc.setInputFileFormatClass(isEmpty ? ZeroRowsInputFormat.class : 
OneNullRowInputFormat.class);
-desc.setOutputFileFormatClass(HiveIgnoreKeyTextOutputFormat.class);
-desc.getProperties().setProperty(serdeConstants.SERIALIZATION_LIB,
-NullStructSerDe.class.getName());
-return desc;
   }
 
-  private boolean isMetadataOnlyAllowed(FileStatus[] filesFoundInPartitionDir) 
{
-if (filesFoundInPartitionDir == null || filesFoundInPartitionDir.length == 
0) {
-  return true; // empty folders are safe to convert to metadata-only
-}
-for (FileStatus f : filesFoundInPartitionDir) {
-  if (AcidUtils.isDeleteDelta(f.getPath())) {
-/*
- * as described in HIVE-23712, an acid partition is not a safe subject 
of metadata-only
- * optimization, because there is a chance that it contains no data 
but contains folders
- * (e.g: delta_0000002_0000002_0000, delete_delta_0000003_0000003_0000), without scanning
- * the underlying file contents, we cannot tell whether this partition 
contains data or not
- */
-return false;
-  }
+  private void processPath(MapWork work, Path path, Collection<String> aliasesToOptimize,
+   boolean isEmpty) {
+PartitionDesc partDesc = work.getPathToPartitionInfo().get(path).clone();
+partDesc.setInputFileFormatClass(isEmpty ? ZeroRowsInputFormat.class : 
OneNullRowInputFormat.class);
+partDesc.setOutputFileFormatClass(HiveIgnoreKeyTextOutputFormat.class);
+partDesc.getProperties().setProperty(serdeConstants.SERIALIZATION_LIB,
+NullStructSerDe.class.getName());
+Path fakePath =
+new Path(NullScanFileSystem.getBase() + partDesc.getTableName()
++ "/part" + encode(partDesc.getPartSpec()));
+StringInternUtils.internUriStringsInPath(fakePath);
+work.addPathToPartitionInfo(fakePath, partDesc);
+work.addPathToAlias(fakePath, new ArrayList<>(aliasesToOptimize));
+Collection<String> aliasesContainingPath = work.getPathToAliases().get(path);
+aliasesContainingPath.removeAll(aliasesToOptimize);
+if (aliasesContainingPath.isEmpty()) {
+  work.removePathToAlias(path);
+  work.removePathToPartitionInfo(path);
 }
-return true;
   }
 
-  private void processAlias(MapWork work, Path path,
-  Collection<String> aliasesAffected, Set<String> aliases) {
-// the aliases that are allowed to map to a null scan.
-Collection<String> allowed = aliasesAffected.stream()
-.filter(a -> aliases.contains(a)).collect(Collectors.toList());
-if (!allowed.isEmpty()) {
-  PartitionDesc partDesc = work.getPathToPartitionInfo().get(path).clone();
-  PartitionDesc newPartition =
-  changePartitionToMetadataOnly(partDesc, path);
-  // Prefix partition with something to avoid it being a hidden file.
-  Path fakePath =
-  new Path(NullScanFileSystem.getBase() + newPartition.getTableName()
-  + "/part" + encode(newPartition.getPartSpec()));
-  StringInternUtils.internUriStringsInPath(fakePath);
-  work.addPathToPartitionInfo(fakePath, newPartition);
-  work.addPathToAlias(fakePath, new ArrayList<>(allowed));
-  alias

[jira] [Work logged] (HIVE-24357) Exchange SWO table/algorithm strategy

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24357?focusedWorklogId=509843&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509843
 ]

ASF GitHub Bot logged work on HIVE-24357:
-

Author: ASF GitHub Bot
Created on: 10/Nov/20 17:34
Start Date: 10/Nov/20 17:34
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1653:
URL: https://github.com/apache/hive/pull/1653#discussion_r517869934



##
File path: ql/src/test/results/clientpositive/perf/tez/query61.q.out
##
@@ -165,70 +167,81 @@ Stage-0
   SHUFFLE [RS_38]
 PartitionCols:_col2
 Merge Join Operator [MERGEJOIN_256] (rows=2526982 
width=0)
-  
Conds:RS_30._col4=RS_290._col0(Inner),Output:["_col2","_col5"]
-<-Map 20 [SIMPLE_EDGE] vectorized
-  SHUFFLE [RS_290]
+  
Conds:RS_30._col4=RS_295._col0(Inner),Output:["_col2","_col5"]
+<-Map 21 [SIMPLE_EDGE] vectorized
+  SHUFFLE [RS_295]
 PartitionCols:_col0
-Select Operator [SEL_289] (rows=2300 width=4)
+Select Operator [SEL_294] (rows=2300 width=4)
   Output:["_col0"]
-  Filter Operator [FIL_288] (rows=2300 width=259)
+  Filter Operator [FIL_293] (rows=2300 width=259)
 predicate:(((p_channel_dmail = 'Y') or 
(p_channel_email = 'Y') or (p_channel_tv = 'Y')) and p_promo_sk is not null)
 TableScan [TS_18] (rows=2300 width=259)
   
default@promotion,promotion,Tbl:COMPLETE,Col:COMPLETE,Output:["p_promo_sk","p_channel_dmail","p_channel_email","p_channel_tv"]
 <-Reducer 12 [SIMPLE_EDGE]
   SHUFFLE [RS_30]
 PartitionCols:_col4
 Merge Join Operator [MERGEJOIN_255] (rows=2526982 
width=0)
-  
Conds:RS_27._col3=RS_286._col0(Inner),Output:["_col2","_col4","_col5"]
-<-Map 19 [SIMPLE_EDGE] vectorized
-  SHUFFLE [RS_286]
+  
Conds:RS_27._col3=RS_291._col0(Inner),Output:["_col2","_col4","_col5"]
+<-Map 20 [SIMPLE_EDGE] vectorized
+  SHUFFLE [RS_291]
 PartitionCols:_col0
-Select Operator [SEL_285] (rows=341 width=4)
+Select Operator [SEL_290] (rows=341 width=4)
   Output:["_col0"]
-  Filter Operator [FIL_284] (rows=341 
width=115)
+  Filter Operator [FIL_289] (rows=341 
width=115)
 predicate:((s_gmt_offset = -7) and 
s_store_sk is not null)
 TableScan [TS_15] (rows=1704 width=115)
   
default@store,store,Tbl:COMPLETE,Col:COMPLETE,Output:["s_store_sk","s_gmt_offset"]
 <-Reducer 11 [SIMPLE_EDGE]
   SHUFFLE [RS_27]
 PartitionCols:_col3
 Merge Join Operator [MERGEJOIN_254] 
(rows=12627499 width=0)
-  
Conds:RS_24._col1=RS_282._col0(Inner),Output:["_col2","_col3","_col4","_col5"]
-<-Map 18 [SIMPLE_EDGE] vectorized
-  SHUFFLE [RS_282]
+  
Conds:RS_24._col1=RS_287._col0(Inner),Output:["_col2","_col3","_col4","_col5"]
+<-Map 19 [SIMPLE_EDGE] vectorized
+  SHUFFLE [RS_287]
 PartitionCols:_col0
-Select Operator [SEL_281] (rows=46200 
width=4)
+Select Operator [SEL_286] (rows=46200 
width=4)
   Output:["_col0"]
-  Filter Operator [FIL_280] (rows=46200 
width=94)
+  Filter Operator [FIL_285] (rows=46200 
width=94)
 predicate:((i_category = 
'Electronics') and i_item_sk is not null)
 TableScan [TS_12] (rows=462000 
width=94)
   
default@item,item,Tbl:COMPLETE,Col:COMPLETE,Output:["i_item_sk","i_category"]
 <-Reducer 10 [SIMPLE_EDGE]
   SHUF

[jira] [Commented] (HIVE-24365) SWO should not create complex and redundant filter expressions

2020-11-10 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17229364#comment-17229364
 ] 

Zoltan Haindrich commented on HIVE-24365:
-


* in case 2 TS operators are merged by SWO, the FIL expression is corrected to
filter for only that branch
* note that any further changes to the FIL expression created above must not
change the results



> SWO should not create complex and redundant filter expressions
> --
>
> Key: HIVE-24365
> URL: https://issues.apache.org/jira/browse/HIVE-24365
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> for q88 we have complex and mostly unreadable filter expressions, because
> before merging 2 branches the TS filter expression is pushed into a FIL
> operator.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24365) SWO should not create complex and redundant filter expressions

2020-11-10 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24365:

Description: 
for q88 we have complex and mostly unreadable filter expressions, because
before merging 2 branches the TS filter expression is pushed into a FIL operator.

consider 3 scans with filters: (A,B,C)
initially we have
{code} 
T(A)
T(B)
T(C)
{code}

after merging A,B
{code}
T(A || B) -> FIL(A)
  -> FIL(B)
T(C)
{code}

right now if we merge C as well:
{code}
T(A || B || C) -> FIL(A AND (A || B))
   -> FIL(B AND (A || B))
   -> FIL(C)
{code}


  was:
for q88 we have complex and mostly unreadable filter expressions, because
before merging 2 branches the TS filter expression is pushed into a FIL operator.




> SWO should not create complex and redundant filter expressions
> --
>
> Key: HIVE-24365
> URL: https://issues.apache.org/jira/browse/HIVE-24365
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> for q88 we have complex and mostly unreadable filter expressions, because
> before merging 2 branches the TS filter expression is pushed into a FIL
> operator.
> consider 3 scans with filters: (A,B,C)
> initially we have
> {code} 
> T(A)
> T(B)
> T(C)
> {code}
> after merging A,B
> {code}
> T(A || B) -> FIL(A)
>   -> FIL(B)
> T(C)
> {code}
> right now if we merge C as well:
> {code}
> T(A || B || C) -> FIL(A AND (A || B))
>    -> FIL(B AND (A || B))
>    -> FIL(C)
> {code}
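
For contrast, a sketch of the non-redundant tree the third merge could presumably produce, assuming each branch needs only its own original predicate once the merged TS already guarantees A || B || C:

{code}
T(A || B || C) -> FIL(A)
               -> FIL(B)
               -> FIL(C)
{code}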



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24365) SWO should not create complex and redundant filter expressions

2020-11-10 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24365:
---


> SWO should not create complex and redundant filter expressions
> --
>
> Key: HIVE-24365
> URL: https://issues.apache.org/jira/browse/HIVE-24365
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> for q88 we have complex and mostly unreadable filter expressions, because
> before merging 2 branches the TS filter expression is pushed into a FIL
> operator.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24333) Cut long methods in Driver to smaller, more manageable pieces

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24333?focusedWorklogId=509818&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509818
 ]

ASF GitHub Bot logged work on HIVE-24333:
-

Author: ASF GitHub Bot
Created on: 10/Nov/20 16:49
Start Date: 10/Nov/20 16:49
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1629:
URL: https://github.com/apache/hive/pull/1629#discussion_r520712873



##
File path: ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java
##
@@ -67,7 +67,6 @@
 import org.apache.hadoop.hive.ql.session.SessionState.LogHelper;
 import org.apache.hadoop.util.StringUtils;
 import org.apache.hive.common.util.ShutdownHookManager;
-import org.apache.hive.common.util.TxnIdUtils;

Review comment:
   This is a good change, but please revert for this PR to keep it focused 
on the Driver classes.

##
File path: ql/src/java/org/apache/hadoop/hive/ql/DriverFactory.java
##
@@ -73,7 +73,7 @@ private static IReExecutionPlugin buildReExecPlugin(String 
name) throws RuntimeE
 if("reexecute_lost_am".equals(name)) {
   return new ReExecuteLostAMQueryPlugin();
 }
-if (name.equals("dagsubmit")) {
+if ("dagsubmit".equals(name)) {

Review comment:
   This is a good change, but please revert for this PR to keep it focused 
on the Driver classes.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 509818)
Time Spent: 1h 50m  (was: 1h 40m)

> Cut long methods in Driver to smaller, more manageable pieces
> -
>
> Key: HIVE-24333
> URL: https://issues.apache.org/jira/browse/HIVE-24333
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Some methods in Driver are too long to be easily understandable. They should 
> be cut into pieces to make them easier to understand.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24333) Cut long methods in Driver to smaller, more manageable pieces

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24333?focusedWorklogId=509816&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509816
 ]

ASF GitHub Bot logged work on HIVE-24333:
-

Author: ASF GitHub Bot
Created on: 10/Nov/20 16:45
Start Date: 10/Nov/20 16:45
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1629:
URL: https://github.com/apache/hive/pull/1629#discussion_r520710633



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -687,13 +685,36 @@ public boolean getResults(List results) throws 
IOException {
 return false;
   }
 
-  if (ss == Utilities.StreamStatus.EOF) {
+  if (streamStatus == Utilities.StreamStatus.EOF) {
 driverContext.setResStream(context.getStream());
   }
 }
 return true;
   }
 
+  @SuppressWarnings("rawtypes")
+  private boolean getFetchingTableResults(List results) throws IOException {
+// If result set serialization to thrift object is enabled, and if the 
destination table is indeed written using
+// ThriftJDBCBinarySerDe, read one row from the output sequence file, 
since it is a blob of row batches.
+if (driverContext.getFetchTask().getWork().isUsingThriftJDBCBinarySerDe()) 
{
+  maxRows = 1;
+}
+driverContext.getFetchTask().setMaxRows(maxRows);
+return driverContext.getFetchTask().fetch(results);
+  }
+
+  private String getRow(ByteStream.Output bos, Utilities.StreamStatus 
streamStatus) {
+String row;

Review comment:
   nit: Safer to make this `final` to ensure that there aren't multiple 
paths later if this code changes.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 509816)
Time Spent: 1h 40m  (was: 1.5h)

> Cut long methods in Driver to smaller, more manageable pieces
> -
>
> Key: HIVE-24333
> URL: https://issues.apache.org/jira/browse/HIVE-24333
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Some methods in Driver are too long to be easily understandable. They should 
> be cut into pieces to make them easier to understand.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24333) Cut long methods in Driver to smaller, more manageable pieces

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24333?focusedWorklogId=509815&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509815
 ]

ASF GitHub Bot logged work on HIVE-24333:
-

Author: ASF GitHub Bot
Created on: 10/Nov/20 16:44
Start Date: 10/Nov/20 16:44
Worklog Time Spent: 10m 
  Work Description: miklosgergely commented on a change in pull request 
#1629:
URL: https://github.com/apache/hive/pull/1629#discussion_r520709535



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -359,6 +216,101 @@ private void runInternal(String command, boolean 
alreadyCompiled) throws Command
 SessionState.getPerfLogger().cleanupPerfLogMetrics();
   }
 
+  /**
+   * @return If the perfLogger should be reseted.
+   */
+  private boolean validateTxnList() throws CommandProcessorException {
+int retryShapshotCount = 0;
+int maxRetrySnapshotCount = HiveConf.getIntVar(driverContext.getConf(),
+HiveConf.ConfVars.HIVE_TXN_MAX_RETRYSNAPSHOT_COUNT);
+
+try {
+  do {
+driverContext.setOutdatedTxn(false);
+// Inserts will not invalidate the snapshot, that could cause 
duplicates.
+if (!driverTxnHandler.isValidTxnListState()) {
+  LOG.info("Re-compiling after acquiring locks, attempt #" + 
retryShapshotCount);
+  // Snapshot was outdated when locks were acquired, hence regenerate 
context, txn list and retry.
+  // TODO: Lock acquisition should be moved before analyze, this is a 
bit hackish.
+  // Currently, we acquire a snapshot, compile the query with that 
snapshot, and then - acquire locks.
+  // If snapshot is still valid, we continue as usual.
+  // But if snapshot is not valid, we recompile the query.
+  if (driverContext.isOutdatedTxn()) {
+// Later transaction invalidated the snapshot, a new transaction 
is required
+LOG.info("Snapshot is outdated, re-initiating transaction ...");
+driverContext.getTxnManager().rollbackTxn();
+
+String userFromUGI = DriverUtils.getUserFromUGI(driverContext);
+driverContext.getTxnManager().openTxn(context, userFromUGI, 
driverContext.getTxnType());
+lockAndRespond();
+  }
+  driverContext.setRetrial(true);
+  driverContext.getBackupContext().addSubContext(context);
+  
driverContext.getBackupContext().setHiveLocks(context.getHiveLocks());
+  context = driverContext.getBackupContext();
+
+  driverContext.getConf().set(ValidTxnList.VALID_TXNS_KEY,
+  driverContext.getTxnManager().getValidTxns().toString());
+
+  if (driverContext.getPlan().hasAcidResourcesInQuery()) {
+compileInternal(context.getCmd(), true);
+driverTxnHandler.recordValidWriteIds();
+driverTxnHandler.setWriteIdForAcidFileSinks();
+  }
+  // Since we're reusing the compiled plan, we need to update its 
start time for current run
+  
driverContext.getPlan().setQueryStartTime(driverContext.getQueryDisplay().getQueryStartTime());
+}
+// Re-check snapshot only in case we had to release locks and open a 
new transaction,
+// otherwise exclusive locks should protect output tables/partitions 
in snapshot from concurrent writes.
+  } while (driverContext.isOutdatedTxn() && ++retryShapshotCount <= 
maxRetrySnapshotCount);
+
+  if (retryShapshotCount > maxRetrySnapshotCount) {
+// Throw exception
+HiveException e = new HiveException(
+"Operation could not be executed, " + 
SNAPSHOT_WAS_OUTDATED_WHEN_LOCKS_WERE_ACQUIRED + ".");
+DriverUtils.handleHiveException(driverContext, e, 14, null);
+  } else if (retryShapshotCount != 0) {
+// the reason that we set the txn manager for the cxt here is because 
each query has its own ctx object.
+// The txn mgr is shared across the same instance of Driver, which can 
run multiple queries.
+context.setHiveTxnManager(driverContext.getTxnManager());
+  }

Review comment:
   oh, I see the meaning of the return value, it is what needs to be
returned from the original method. Sure, I can move this to a separate method, 
do you have a suggestion for the name?
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 509815)
Time Spent: 1.5h  (was: 1h 20m)

> Cut long methods in Driver to smaller, more manageable pieces
> -
>
> Key: HIVE-24333

[jira] [Work logged] (HIVE-24333) Cut long methods in Driver to smaller, more manageable pieces

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24333?focusedWorklogId=509814&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509814
 ]

ASF GitHub Bot logged work on HIVE-24333:
-

Author: ASF GitHub Bot
Created on: 10/Nov/20 16:42
Start Date: 10/Nov/20 16:42
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1629:
URL: https://github.com/apache/hive/pull/1629#discussion_r520707995



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -638,46 +654,28 @@ public boolean getResults(List results) throws 
IOException {
 }
 
 if (isFetchingTable()) {
-  /**
-   * If resultset serialization to thrift object is enabled, and if the 
destination table is
-   * indeed written using ThriftJDBCBinarySerDe, read one row from the 
output sequence file,
-   * since it is a blob of row batches.
-   */
-  if 
(driverContext.getFetchTask().getWork().isUsingThriftJDBCBinarySerDe()) {
-maxRows = 1;
-  }
-  driverContext.getFetchTask().setMaxRows(maxRows);
-  return driverContext.getFetchTask().fetch(results);
+  return getFetchingTableResults(results);
 }
 
 if (driverContext.getResStream() == null) {
   driverContext.setResStream(context.getStream());
-}
-if (driverContext.getResStream() == null) {
-  return false;
+  if (driverContext.getResStream() == null) {
+return false;
+  }
 }

Review comment:
   This is a little hard to follow.  Perhaps something like:
   
   ```
   if (driverContext.getResStream() == null) {
     // If the driver does not have a stream and neither does the context, return
     Stream contextStream = context.getStream();
     if (contextStream == null) {
       return false;
     }
     driverContext.setResStream(contextStream);
   }
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 509814)
Time Spent: 1h 20m  (was: 1h 10m)

> Cut long methods in Driver to smaller, more manageable pieces
> -
>
> Key: HIVE-24333
> URL: https://issues.apache.org/jira/browse/HIVE-24333
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Some methods in Driver are too long to be easily understandable. They should 
> be cut into pieces to make them easier to understand.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24333) Cut long methods in Driver to smaller, more manageable pieces

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24333?focusedWorklogId=509813&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509813
 ]

ASF GitHub Bot logged work on HIVE-24333:
-

Author: ASF GitHub Bot
Created on: 10/Nov/20 16:41
Start Date: 10/Nov/20 16:41
Worklog Time Spent: 10m 
  Work Description: miklosgergely commented on a change in pull request 
#1629:
URL: https://github.com/apache/hive/pull/1629#discussion_r520707647



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -359,6 +216,101 @@ private void runInternal(String command, boolean 
alreadyCompiled) throws Command
 SessionState.getPerfLogger().cleanupPerfLogMetrics();
   }
 
+  /**
+   * @return If the perfLogger should be reseted.
+   */
+  private boolean validateTxnList() throws CommandProcessorException {
+int retryShapshotCount = 0;
+int maxRetrySnapshotCount = HiveConf.getIntVar(driverContext.getConf(),
+HiveConf.ConfVars.HIVE_TXN_MAX_RETRYSNAPSHOT_COUNT);
+
+try {
+  do {
+driverContext.setOutdatedTxn(false);
+// Inserts will not invalidate the snapshot, that could cause 
duplicates.
+if (!driverTxnHandler.isValidTxnListState()) {
+  LOG.info("Re-compiling after acquiring locks, attempt #" + 
retryShapshotCount);
+  // Snapshot was outdated when locks were acquired, hence regenerate 
context, txn list and retry.
+  // TODO: Lock acquisition should be moved before analyze, this is a 
bit hackish.
+  // Currently, we acquire a snapshot, compile the query with that 
snapshot, and then - acquire locks.
+  // If snapshot is still valid, we continue as usual.
+  // But if snapshot is not valid, we recompile the query.
+  if (driverContext.isOutdatedTxn()) {
+// Later transaction invalidated the snapshot, a new transaction 
is required
+LOG.info("Snapshot is outdated, re-initiating transaction ...");
+driverContext.getTxnManager().rollbackTxn();
+
+String userFromUGI = DriverUtils.getUserFromUGI(driverContext);
+driverContext.getTxnManager().openTxn(context, userFromUGI, 
driverContext.getTxnType());
+lockAndRespond();
+  }
+  driverContext.setRetrial(true);
+  driverContext.getBackupContext().addSubContext(context);
+  
driverContext.getBackupContext().setHiveLocks(context.getHiveLocks());
+  context = driverContext.getBackupContext();
+
+  driverContext.getConf().set(ValidTxnList.VALID_TXNS_KEY,
+  driverContext.getTxnManager().getValidTxns().toString());
+
+  if (driverContext.getPlan().hasAcidResourcesInQuery()) {
+compileInternal(context.getCmd(), true);
+driverTxnHandler.recordValidWriteIds();
+driverTxnHandler.setWriteIdForAcidFileSinks();
+  }
+  // Since we're reusing the compiled plan, we need to update its 
start time for current run
+  
driverContext.getPlan().setQueryStartTime(driverContext.getQueryDisplay().getQueryStartTime());
+}
+// Re-check snapshot only in case we had to release locks and open a 
new transaction,
+// otherwise exclusive locks should protect output tables/partitions 
in snapshot from concurrent writes.
+  } while (driverContext.isOutdatedTxn() && ++retryShapshotCount <= 
maxRetrySnapshotCount);
+
+  if (retryShapshotCount > maxRetrySnapshotCount) {
+// Throw exception
+HiveException e = new HiveException(
+"Operation could not be executed, " + 
SNAPSHOT_WAS_OUTDATED_WHEN_LOCKS_WERE_ACQUIRED + ".");
+DriverUtils.handleHiveException(driverContext, e, 14, null);
+  } else if (retryShapshotCount != 0) {
+// the reason that we set the txn manager for the cxt here is because 
each query has its own ctx object.
+// The txn mgr is shared across the same instance of Driver, which can 
run multiple queries.
+context.setHiveTxnManager(driverContext.getTxnManager());
+  }

Review comment:
   DriverUtils.handleHiveException throws a CommandProcessorException which 
is not caught, as it is neither a LockException nor a SemanticException. Still 
the block that you've mentioned can be moved to a separate method, but I don't 
see the meaning of the returned boolean. What would that be used for?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 509813)
Time Spent: 1h 10m  (was: 1h)

> Cut long methods in Driver to smaller, more manageable pieces

[jira] [Work logged] (HIVE-24333) Cut long methods in Driver to smaller, more manageable pieces

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24333?focusedWorklogId=509807&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509807
 ]

ASF GitHub Bot logged work on HIVE-24333:
-

Author: ASF GitHub Bot
Created on: 10/Nov/20 16:32
Start Date: 10/Nov/20 16:32
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1629:
URL: https://github.com/apache/hive/pull/1629#discussion_r520701097



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -359,6 +216,101 @@ private void runInternal(String command, boolean 
alreadyCompiled) throws Command
 SessionState.getPerfLogger().cleanupPerfLogMetrics();
   }
 
+  /**
+   * @return If the perfLogger should be reseted.
+   */
+  private boolean validateTxnList() throws CommandProcessorException {
+int retryShapshotCount = 0;
+int maxRetrySnapshotCount = HiveConf.getIntVar(driverContext.getConf(),
+HiveConf.ConfVars.HIVE_TXN_MAX_RETRYSNAPSHOT_COUNT);
+
+try {
+  do {
+driverContext.setOutdatedTxn(false);
+// Inserts will not invalidate the snapshot, that could cause 
duplicates.
+if (!driverTxnHandler.isValidTxnListState()) {
+  LOG.info("Re-compiling after acquiring locks, attempt #" + 
retryShapshotCount);
+  // Snapshot was outdated when locks were acquired, hence regenerate 
context, txn list and retry.
+  // TODO: Lock acquisition should be moved before analyze, this is a 
bit hackish.
+  // Currently, we acquire a snapshot, compile the query with that 
snapshot, and then - acquire locks.
+  // If snapshot is still valid, we continue as usual.
+  // But if snapshot is not valid, we recompile the query.
+  if (driverContext.isOutdatedTxn()) {
+// Later transaction invalidated the snapshot, a new transaction 
is required
+LOG.info("Snapshot is outdated, re-initiating transaction ...");
+driverContext.getTxnManager().rollbackTxn();
+
+String userFromUGI = DriverUtils.getUserFromUGI(driverContext);
+driverContext.getTxnManager().openTxn(context, userFromUGI, 
driverContext.getTxnType());
+lockAndRespond();
+  }
+  driverContext.setRetrial(true);
+  driverContext.getBackupContext().addSubContext(context);
+  
driverContext.getBackupContext().setHiveLocks(context.getHiveLocks());
+  context = driverContext.getBackupContext();
+
+  driverContext.getConf().set(ValidTxnList.VALID_TXNS_KEY,
+  driverContext.getTxnManager().getValidTxns().toString());
+
+  if (driverContext.getPlan().hasAcidResourcesInQuery()) {
+compileInternal(context.getCmd(), true);
+driverTxnHandler.recordValidWriteIds();
+driverTxnHandler.setWriteIdForAcidFileSinks();
+  }
+  // Since we're reusing the compiled plan, we need to update its 
start time for current run
+  
driverContext.getPlan().setQueryStartTime(driverContext.getQueryDisplay().getQueryStartTime());
+}
+// Re-check snapshot only in case we had to release locks and open a 
new transaction,
+// otherwise exclusive locks should protect output tables/partitions 
in snapshot from concurrent writes.
+  } while (driverContext.isOutdatedTxn() && ++retryShapshotCount <= 
maxRetrySnapshotCount);
+
+  if (retryShapshotCount > maxRetrySnapshotCount) {
+// Throw exception
+HiveException e = new HiveException(
+"Operation could not be executed, " + 
SNAPSHOT_WAS_OUTDATED_WHEN_LOCKS_WERE_ACQUIRED + ".");
+DriverUtils.handleHiveException(driverContext, e, 14, null);
+  } else if (retryShapshotCount != 0) {
+// the reason that we set the txn manager for the cxt here is because 
each query has its own ctx object.
+// The txn mgr is shared across the same instance of Driver, which can 
run multiple queries.
+context.setHiveTxnManager(driverContext.getTxnManager());
+  }

Review comment:
   It is probably overwriting the error code of '14' with the 
outer-try-catch code of '13'.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 509807)
Time Spent: 1h  (was: 50m)

> Cut long methods in Driver to smaller, more manageable pieces
> -
>
> Key: HIVE-24333
> URL: https://issues.apache.org/jira/browse/HIVE-24333
> Project: Hive

[jira] [Work logged] (HIVE-24333) Cut long methods in Driver to smaller, more manageable pieces

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24333?focusedWorklogId=509805&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509805
 ]

ASF GitHub Bot logged work on HIVE-24333:
-

Author: ASF GitHub Bot
Created on: 10/Nov/20 16:31
Start Date: 10/Nov/20 16:31
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1629:
URL: https://github.com/apache/hive/pull/1629#discussion_r520699765



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -359,6 +216,101 @@ private void runInternal(String command, boolean 
alreadyCompiled) throws Command
 SessionState.getPerfLogger().cleanupPerfLogMetrics();
   }
 
+  /**
+   * @return If the perfLogger should be reseted.
+   */
+  private boolean validateTxnList() throws CommandProcessorException {
+int retryShapshotCount = 0;
+int maxRetrySnapshotCount = HiveConf.getIntVar(driverContext.getConf(),
+HiveConf.ConfVars.HIVE_TXN_MAX_RETRYSNAPSHOT_COUNT);
+
+try {
+  do {
+driverContext.setOutdatedTxn(false);
+// Inserts will not invalidate the snapshot, that could cause 
duplicates.
+if (!driverTxnHandler.isValidTxnListState()) {
+  LOG.info("Re-compiling after acquiring locks, attempt #" + 
retryShapshotCount);
+  // Snapshot was outdated when locks were acquired, hence regenerate 
context, txn list and retry.
+  // TODO: Lock acquisition should be moved before analyze, this is a 
bit hackish.
+  // Currently, we acquire a snapshot, compile the query with that 
snapshot, and then - acquire locks.
+  // If snapshot is still valid, we continue as usual.
+  // But if snapshot is not valid, we recompile the query.
+  if (driverContext.isOutdatedTxn()) {
+// Later transaction invalidated the snapshot, a new transaction 
is required
+LOG.info("Snapshot is outdated, re-initiating transaction ...");
+driverContext.getTxnManager().rollbackTxn();
+
+String userFromUGI = DriverUtils.getUserFromUGI(driverContext);
+driverContext.getTxnManager().openTxn(context, userFromUGI, 
driverContext.getTxnType());
+lockAndRespond();
+  }
+  driverContext.setRetrial(true);
+  driverContext.getBackupContext().addSubContext(context);
+  
driverContext.getBackupContext().setHiveLocks(context.getHiveLocks());
+  context = driverContext.getBackupContext();
+
+  driverContext.getConf().set(ValidTxnList.VALID_TXNS_KEY,
+  driverContext.getTxnManager().getValidTxns().toString());
+
+  if (driverContext.getPlan().hasAcidResourcesInQuery()) {
+compileInternal(context.getCmd(), true);
+driverTxnHandler.recordValidWriteIds();
+driverTxnHandler.setWriteIdForAcidFileSinks();
+  }
+  // Since we're reusing the compiled plan, we need to update its 
start time for current run
+  
driverContext.getPlan().setQueryStartTime(driverContext.getQueryDisplay().getQueryStartTime());
+}
+// Re-check snapshot only in case we had to release locks and open a 
new transaction,
+// otherwise exclusive locks should protect output tables/partitions 
in snapshot from concurrent writes.
+  } while (driverContext.isOutdatedTxn() && ++retryShapshotCount <= 
maxRetrySnapshotCount);
+
+  if (retryShapshotCount > maxRetrySnapshotCount) {
+// Throw exception
+HiveException e = new HiveException(
+"Operation could not be executed, " + 
SNAPSHOT_WAS_OUTDATED_WHEN_LOCKS_WERE_ACQUIRED + ".");
+DriverUtils.handleHiveException(driverContext, e, 14, null);
+  } else if (retryShapshotCount != 0) {
+// the reason that we set the txn manager for the cxt here is because 
each query has its own ctx object.
+// The txn mgr is shared across the same instance of Driver, which can 
run multiple queries.
+context.setHiveTxnManager(driverContext.getTxnManager());
+  }

Review comment:
   Since you are slicing and dicing things, you may want to look at 
breaking this out a bit.
   
   It is bad form to throw an exception, and then handle it, within the same 
method.
   
   I think this should be moved out of this `try` block.
   
   ```
 if (retryShapshotCount > maxRetrySnapshotCount) {
   // Throw exception
   HiveException e = new HiveException(
   "Operation could not be executed, " + 
SNAPSHOT_WAS_OUTDATED_WHEN_LOCKS_WERE_ACQUIRED + ".");
   DriverUtils.handleHiveException(driverContext, e, 14, null);
 }
 if (retryShapshotCount != 0) {
   // the reason that we set the txn manager for the cxt here is 
because each query has its own ctx object.
   // The txn mgr is shared across the same instance of Driver, which can run multiple queries.

[jira] [Work logged] (HIVE-24333) Cut long methods in Driver to smaller, more manageable pieces

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24333?focusedWorklogId=509800&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509800
 ]

ASF GitHub Bot logged work on HIVE-24333:
-

Author: ASF GitHub Bot
Created on: 10/Nov/20 16:22
Start Date: 10/Nov/20 16:22
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1629:
URL: https://github.com/apache/hive/pull/1629#discussion_r520693419



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -149,207 +149,64 @@ private CommandProcessorResponse run(String command, 
boolean alreadyCompiled) th
   runInternal(command, alreadyCompiled);
   return new CommandProcessorResponse(getSchema(), null);
 } catch (CommandProcessorException cpe) {
-  SessionState ss = SessionState.get();
-  if (ss == null) {
-throw cpe;
-  }
-  MetaDataFormatter mdf = MetaDataFormatUtils.getFormatter(ss.getConf());
-  if (!(mdf instanceof JsonMetaDataFormatter)) {
-throw cpe;
-  }
-  /*Here we want to encode the error in machine readable way (e.g. JSON)
-   * Ideally, errorCode would always be set to a canonical error defined 
in ErrorMsg.
-   * In practice that is rarely the case, so the messy logic below tries 
to tease
-   * out canonical error code if it can.  Exclude stack trace from output 
when
-   * the error is a specific/expected one.
-   * It's written to stdout for backward compatibility (WebHCat consumes 
it).*/
-  try {
-if (cpe.getCause() == null) {
-  mdf.error(ss.out, cpe.getMessage(), cpe.getResponseCode(), 
cpe.getSqlState());
-  throw cpe;
-}
-ErrorMsg canonicalErr = ErrorMsg.getErrorMsg(cpe.getResponseCode());
-if (canonicalErr != null && canonicalErr != ErrorMsg.GENERIC_ERROR) {
-  /*Some HiveExceptions (e.g. SemanticException) don't set
-canonical ErrorMsg explicitly, but there is logic
-(e.g. #compile()) to find an appropriate canonical error and
-return its code as error code. In this case we want to
-preserve it for downstream code to interpret*/
-  mdf.error(ss.out, cpe.getMessage(), cpe.getResponseCode(), 
cpe.getSqlState(), null);
-  throw cpe;
-}
-if (cpe.getCause() instanceof HiveException) {
-  HiveException rc = (HiveException)cpe.getCause();
-  mdf.error(ss.out, cpe.getMessage(), 
rc.getCanonicalErrorMsg().getErrorCode(), cpe.getSqlState(),
-  rc.getCanonicalErrorMsg() == ErrorMsg.GENERIC_ERROR ? 
StringUtils.stringifyException(rc) : null);
-} else {
-  ErrorMsg canonicalMsg = 
ErrorMsg.getErrorMsg(cpe.getCause().getMessage());
-  mdf.error(ss.out, cpe.getMessage(), canonicalMsg.getErrorCode(), 
cpe.getSqlState(),
-  StringUtils.stringifyException(cpe.getCause()));
-}
-  } catch (HiveException ex) {
-CONSOLE.printError("Unable to JSON-encode the error", 
StringUtils.stringifyException(ex));
-  }
+  processRunException(cpe);
   throw cpe;
 }
   }
 
   private void runInternal(String command, boolean alreadyCompiled) throws 
CommandProcessorException {
 DriverState.setDriverState(driverState);
 
-driverState.lock();
-try {
-  if (driverContext != null && driverContext.getPlan() != null
-  && driverContext.getPlan().isPrepareQuery()
-  && !driverContext.getPlan().isExplain()) {
-LOG.info("Skip running tasks for prepare plan");
-return;
-  }
-  if (alreadyCompiled) {
-if (driverState.isCompiled()) {
-  driverState.executing();
-} else {
-  String errorMessage = "FAILED: Precompiled query has been cancelled 
or closed.";
-  CONSOLE.printError(errorMessage);
-  throw DriverUtils.createProcessorException(driverContext, 12, 
errorMessage, null, null);
-}
-  } else {
-driverState.compiling();
-  }
-} finally {
-  driverState.unlock();
+if (driverContext != null && driverContext.getPlan() != null &&

Review comment:
   I do see that the `driverContext` plan is set in a few different places. 
 I'm not sure if this can be checked outside of a lock.  At the very least, I 
would minimize the risk a bit:
   
   ```
   QueryPlan plan = driverContext.getPlan();
   if (plan != null
 && plan.isPrepareQuery()
 && !plan.isExplain()) {
   ```
   
   At least you know that the value of `plan` cannot possibly change between 
calls.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Work logged] (HIVE-24333) Cut long methods in Driver to smaller, more manageable pieces

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24333?focusedWorklogId=509799&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509799
 ]

ASF GitHub Bot logged work on HIVE-24333:
-

Author: ASF GitHub Bot
Created on: 10/Nov/20 16:18
Start Date: 10/Nov/20 16:18
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1629:
URL: https://github.com/apache/hive/pull/1629#discussion_r520689424



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -149,207 +149,64 @@ private CommandProcessorResponse run(String command, 
boolean alreadyCompiled) th
   runInternal(command, alreadyCompiled);
   return new CommandProcessorResponse(getSchema(), null);
 } catch (CommandProcessorException cpe) {
-  SessionState ss = SessionState.get();
-  if (ss == null) {
-throw cpe;
-  }
-  MetaDataFormatter mdf = MetaDataFormatUtils.getFormatter(ss.getConf());
-  if (!(mdf instanceof JsonMetaDataFormatter)) {
-throw cpe;
-  }
-  /*Here we want to encode the error in machine readable way (e.g. JSON)
-   * Ideally, errorCode would always be set to a canonical error defined 
in ErrorMsg.
-   * In practice that is rarely the case, so the messy logic below tries 
to tease
-   * out canonical error code if it can.  Exclude stack trace from output 
when
-   * the error is a specific/expected one.
-   * It's written to stdout for backward compatibility (WebHCat consumes 
it).*/
-  try {
-if (cpe.getCause() == null) {
-  mdf.error(ss.out, cpe.getMessage(), cpe.getResponseCode(), 
cpe.getSqlState());
-  throw cpe;
-}
-ErrorMsg canonicalErr = ErrorMsg.getErrorMsg(cpe.getResponseCode());
-if (canonicalErr != null && canonicalErr != ErrorMsg.GENERIC_ERROR) {
-  /*Some HiveExceptions (e.g. SemanticException) don't set
-canonical ErrorMsg explicitly, but there is logic
-(e.g. #compile()) to find an appropriate canonical error and
-return its code as error code. In this case we want to
-preserve it for downstream code to interpret*/
-  mdf.error(ss.out, cpe.getMessage(), cpe.getResponseCode(), 
cpe.getSqlState(), null);
-  throw cpe;
-}
-if (cpe.getCause() instanceof HiveException) {
-  HiveException rc = (HiveException)cpe.getCause();
-  mdf.error(ss.out, cpe.getMessage(), 
rc.getCanonicalErrorMsg().getErrorCode(), cpe.getSqlState(),
-  rc.getCanonicalErrorMsg() == ErrorMsg.GENERIC_ERROR ? 
StringUtils.stringifyException(rc) : null);
-} else {
-  ErrorMsg canonicalMsg = 
ErrorMsg.getErrorMsg(cpe.getCause().getMessage());
-  mdf.error(ss.out, cpe.getMessage(), canonicalMsg.getErrorCode(), 
cpe.getSqlState(),
-  StringUtils.stringifyException(cpe.getCause()));
-}
-  } catch (HiveException ex) {
-CONSOLE.printError("Unable to JSON-encode the error", 
StringUtils.stringifyException(ex));
-  }
+  processRunException(cpe);
   throw cpe;
 }
   }
 
   private void runInternal(String command, boolean alreadyCompiled) throws 
CommandProcessorException {
 DriverState.setDriverState(driverState);
 
-driverState.lock();
-try {
-  if (driverContext != null && driverContext.getPlan() != null
-  && driverContext.getPlan().isPrepareQuery()
-  && !driverContext.getPlan().isExplain()) {
-LOG.info("Skip running tasks for prepare plan");
-return;
-  }
-  if (alreadyCompiled) {
-if (driverState.isCompiled()) {
-  driverState.executing();
-} else {
-  String errorMessage = "FAILED: Precompiled query has been cancelled 
or closed.";
-  CONSOLE.printError(errorMessage);
-  throw DriverUtils.createProcessorException(driverContext, 12, 
errorMessage, null, null);
-}
-  } else {
-driverState.compiling();
-  }
-} finally {
-  driverState.unlock();
+if (driverContext != null && driverContext.getPlan() != null &&

Review comment:
   This is nice because it has been moved outside the lock scope.  I was 
double-checking if this variable can safely be examined outside the scope of 
the lock, and I noted that `driverContext` can never be `null`.  It is defined 
as `final` and set in the constructor.  Please remove this check.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 509799)
Time Spent: 0.5h  (was: 20m)

[jira] [Work logged] (HIVE-24333) Cut long methods in Driver to smaller, more manageable pieces

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24333?focusedWorklogId=509798&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509798
 ]

ASF GitHub Bot logged work on HIVE-24333:
-

Author: ASF GitHub Bot
Created on: 10/Nov/20 16:17
Start Date: 10/Nov/20 16:17
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1629:
URL: https://github.com/apache/hive/pull/1629#discussion_r520689424



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -149,207 +149,64 @@ private CommandProcessorResponse run(String command, 
boolean alreadyCompiled) th
   runInternal(command, alreadyCompiled);
   return new CommandProcessorResponse(getSchema(), null);
 } catch (CommandProcessorException cpe) {
-  SessionState ss = SessionState.get();
-  if (ss == null) {
-throw cpe;
-  }
-  MetaDataFormatter mdf = MetaDataFormatUtils.getFormatter(ss.getConf());
-  if (!(mdf instanceof JsonMetaDataFormatter)) {
-throw cpe;
-  }
-  /*Here we want to encode the error in machine readable way (e.g. JSON)
-   * Ideally, errorCode would always be set to a canonical error defined 
in ErrorMsg.
-   * In practice that is rarely the case, so the messy logic below tries 
to tease
-   * out canonical error code if it can.  Exclude stack trace from output 
when
-   * the error is a specific/expected one.
-   * It's written to stdout for backward compatibility (WebHCat consumes 
it).*/
-  try {
-if (cpe.getCause() == null) {
-  mdf.error(ss.out, cpe.getMessage(), cpe.getResponseCode(), 
cpe.getSqlState());
-  throw cpe;
-}
-ErrorMsg canonicalErr = ErrorMsg.getErrorMsg(cpe.getResponseCode());
-if (canonicalErr != null && canonicalErr != ErrorMsg.GENERIC_ERROR) {
-  /*Some HiveExceptions (e.g. SemanticException) don't set
-canonical ErrorMsg explicitly, but there is logic
-(e.g. #compile()) to find an appropriate canonical error and
-return its code as error code. In this case we want to
-preserve it for downstream code to interpret*/
-  mdf.error(ss.out, cpe.getMessage(), cpe.getResponseCode(), 
cpe.getSqlState(), null);
-  throw cpe;
-}
-if (cpe.getCause() instanceof HiveException) {
-  HiveException rc = (HiveException)cpe.getCause();
-  mdf.error(ss.out, cpe.getMessage(), 
rc.getCanonicalErrorMsg().getErrorCode(), cpe.getSqlState(),
-  rc.getCanonicalErrorMsg() == ErrorMsg.GENERIC_ERROR ? 
StringUtils.stringifyException(rc) : null);
-} else {
-  ErrorMsg canonicalMsg = 
ErrorMsg.getErrorMsg(cpe.getCause().getMessage());
-  mdf.error(ss.out, cpe.getMessage(), canonicalMsg.getErrorCode(), 
cpe.getSqlState(),
-  StringUtils.stringifyException(cpe.getCause()));
-}
-  } catch (HiveException ex) {
-CONSOLE.printError("Unable to JSON-encode the error", 
StringUtils.stringifyException(ex));
-  }
+  processRunException(cpe);
   throw cpe;
 }
   }
 
   private void runInternal(String command, boolean alreadyCompiled) throws 
CommandProcessorException {
 DriverState.setDriverState(driverState);
 
-driverState.lock();
-try {
-  if (driverContext != null && driverContext.getPlan() != null
-  && driverContext.getPlan().isPrepareQuery()
-  && !driverContext.getPlan().isExplain()) {
-LOG.info("Skip running tasks for prepare plan");
-return;
-  }
-  if (alreadyCompiled) {
-if (driverState.isCompiled()) {
-  driverState.executing();
-} else {
-  String errorMessage = "FAILED: Precompiled query has been cancelled 
or closed.";
-  CONSOLE.printError(errorMessage);
-  throw DriverUtils.createProcessorException(driverContext, 12, 
errorMessage, null, null);
-}
-  } else {
-driverState.compiling();
-  }
-} finally {
-  driverState.unlock();
+if (driverContext != null && driverContext.getPlan() != null &&

Review comment:
   This is nice because it has been moved outside the lock scope.  Do note 
however that driverContext can never be `null`.  It is defined as `final` and 
set in the constructor.  Please remove this check.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 509798)
Time Spent: 20m  (was: 10m)

> Cut long methods in Driver to smaller, more manageable pieces
> ---

[jira] [Updated] (HIVE-24362) AST tree processing is suboptimal for tree with large number of nodes

2020-11-10 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-24362:
---
Description: In Hive, the children information is stored as a list of objects. 
During processing of a node's children, the list of objects is converted to a 
list of Nodes. This can cause long compilation times if the number of children 
is large (300,000). The list of children can be cached in the AST node to avoid 
this re-computation. The caching part was already fixed as part of HIVE-24031; 
the allocation of the array is fixed in this Jira.  (was: In Hive, the children 
information is stored as a list of objects. During processing of a node's 
children, the list of objects is converted to a list of Nodes. This can cause 
long compilation times if the number of children is large. The list of children 
can be cached in the AST node to avoid this re-computation.)

> AST tree processing is suboptimal for tree with large number of nodes
> -
>
> Key: HIVE-24362
> URL: https://issues.apache.org/jira/browse/HIVE-24362
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In Hive, the children information is stored as a list of objects. During 
> processing of a node's children, the list of objects is converted to a list 
> of Nodes. This can cause long compilation times if the number of children is 
> large (300,000). The list of children can be cached in the AST node to avoid 
> this re-computation. The caching part was already fixed as part of HIVE-24031; 
> the allocation of the array is fixed in this Jira.
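
A minimal sketch of the caching idea, using hypothetical names rather than the actual org.apache.hadoop.hive.ql.parse.ASTNode internals:

{code}
import java.util.ArrayList;
import java.util.List;

// Stand-ins for the real parse-tree types; illustration only.
interface Node { List<? extends Node> getChildren(); }

class CachingAstNode implements Node {
  private final List<CachingAstNode> antlrChildren = new ArrayList<>();
  // Cached Node-typed view of the children, so repeated tree walks do not
  // re-allocate and re-populate the list on every call.
  private List<Node> childrenCache;

  @Override
  public List<Node> getChildren() {
    if (childrenCache == null) {
      // Sizing the list up front also avoids incremental array growth,
      // which matters when a node has hundreds of thousands of children.
      childrenCache = new ArrayList<>(antlrChildren.size());
      childrenCache.addAll(antlrChildren);
    }
    return childrenCache;
  }

  // Note: a real implementation must also invalidate the cache whenever a
  // child is added or removed.
}
{code}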



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24362) AST tree processing is suboptimal for tree with large number of nodes

2020-11-10 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera resolved HIVE-24362.

Resolution: Fixed

Pushed to master. Thanks [~pgaref] for review.

> AST tree processing is suboptimal for tree with large number of nodes
> -
>
> Key: HIVE-24362
> URL: https://issues.apache.org/jira/browse/HIVE-24362
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In Hive, the children information is stored as a list of objects. During 
> processing of a node's children, the list of objects is converted to a list 
> of Nodes. This can cause long compilation times if the number of children is 
> large. The list of children can be cached in the AST node to avoid this 
> re-computation. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24362) AST tree processing is suboptimal for tree with large number of nodes

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24362?focusedWorklogId=509739&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509739
 ]

ASF GitHub Bot logged work on HIVE-24362:
-

Author: ASF GitHub Bot
Created on: 10/Nov/20 14:46
Start Date: 10/Nov/20 14:46
Worklog Time Spent: 10m 
  Work Description: maheshk114 merged pull request #1656:
URL: https://github.com/apache/hive/pull/1656


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 509739)
Time Spent: 20m  (was: 10m)

> AST tree processing is suboptimal for tree with large number of nodes
> -
>
> Key: HIVE-24362
> URL: https://issues.apache.org/jira/browse/HIVE-24362
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In Hive, the children information is stored as a list of objects. During 
> processing of a node's children, the list of objects is converted to a list 
> of Nodes. This can cause long compilation times if the number of children is 
> large. The list of children can be cached in the AST node to avoid this 
> re-computation. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-24341) Sweep phase for proactive cache eviction

2020-11-10 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-24341 started by Ádám Szita.
-
> Sweep phase for proactive cache eviction
> 
>
> Key: HIVE-24341
> URL: https://issues.apache.org/jira/browse/HIVE-24341
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24340) Mark phase for proactive cache eviction

2020-11-10 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ádám Szita updated HIVE-24340:
--
Status: Patch Available  (was: In Progress)

> Mark phase for proactive cache eviction
> ---
>
> Key: HIVE-24340
> URL: https://issues.apache.org/jira/browse/HIVE-24340
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24340) Mark phase for proactive cache eviction

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24340?focusedWorklogId=509716&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509716
 ]

ASF GitHub Bot logged work on HIVE-24340:
-

Author: ASF GitHub Bot
Created on: 10/Nov/20 14:27
Start Date: 10/Nov/20 14:27
Worklog Time Spent: 10m 
  Work Description: szlta opened a new pull request #1658:
URL: https://github.com/apache/hive/pull/1658


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 509716)
Remaining Estimate: 0h
Time Spent: 10m

> Mark phase for proactive cache eviction
> ---
>
> Key: HIVE-24340
> URL: https://issues.apache.org/jira/browse/HIVE-24340
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24340) Mark phase for proactive cache eviction

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24340:
--
Labels: pull-request-available  (was: )

> Mark phase for proactive cache eviction
> ---
>
> Key: HIVE-24340
> URL: https://issues.apache.org/jira/browse/HIVE-24340
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-20022) Upgrade hadoop.version to 3.1.1

2020-11-10 Thread yongtaoliao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yongtaoliao reassigned HIVE-20022:
--

Assignee: yongtaoliao  (was: Daniel Voros)

> Upgrade hadoop.version to 3.1.1
> ---
>
> Key: HIVE-20022
> URL: https://issues.apache.org/jira/browse/HIVE-20022
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Daniel Voros
>Assignee: yongtaoliao
>Priority: Blocker
> Attachments: HIVE-20022.1.patch, HIVE-20022.2.patch, 
> HIVE-20022.3.patch, HIVE-20022.3.patch, HIVE-20022.4.patch, HIVE-20022.4.patch
>
>
> HIVE-19304 relies on YARN-7142 and YARN-8122, which will only be released in
> Hadoop 3.1.1. We should upgrade when possible.
> cc [~gsaha]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24328) Run distcp in parallel for all file entries in repl load.

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24328?focusedWorklogId=509616&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509616
 ]

ASF GitHub Bot logged work on HIVE-24328:
-

Author: ASF GitHub Bot
Created on: 10/Nov/20 10:04
Start Date: 10/Nov/20 10:04
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1648:
URL: https://github.com/apache/hive/pull/1648#discussion_r520437679



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -647,6 +647,9 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
         "Provide the maximum number of partitions of a table that will be batched together during  \n"
             + "repl load. All the partitions in a batch will make a single metastore call to update the metadata. \n"
             + "The data for these partitions will be copied before copying the metadata batch. "),
+    REPL_PARALLEL_COPY_TASKS("hive.repl.parallel.copy.tasks",1000,

Review comment:
   Setting this to 100
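
As a side note, a minimal sketch of the fan-out being sized here, assuming only
that each file entry's copy (e.g. one distcp invocation) is wrapped in an
independent Callable and that the pool width comes from a setting such as
hive.repl.parallel.copy.tasks; this is illustrative, not the actual Hive repl
copy code.

import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelCopyRunner {
  // Runs every copy task, at most `parallelism` at a time, and surfaces the
  // first failure; repl load must not move on to the metadata step until all
  // data copies have finished.
  static void copyAll(List<Callable<Void>> copyTasks, int parallelism)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(parallelism);
    try {
      // invokeAll blocks until every task has completed.
      for (Future<Void> done : pool.invokeAll(copyTasks)) {
        done.get(); // rethrows if the corresponding copy failed
      }
    } finally {
      pool.shutdown();
    }
  }
}

A bounded default matters here: each slot can hold a distcp run, so a pool of
1000 could overwhelm the cluster, which is presumably why the review settles on
100.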





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 509616)
Time Spent: 3h 20m  (was: 3h 10m)

> Run distcp in parallel for all file entries in repl load.
> -
>
> Key: HIVE-24328
> URL: https://issues.apache.org/jira/browse/HIVE-24328
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24328.01.patch
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24350) NullScanTaskDispatcher should use stats

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24350?focusedWorklogId=509607&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509607
 ]

ASF GitHub Bot logged work on HIVE-24350:
-

Author: ASF GitHub Bot
Created on: 10/Nov/20 09:32
Start Date: 10/Nov/20 09:32
Worklog Time Spent: 10m 
  Work Description: rbalamohan commented on a change in pull request #1645:
URL: https://github.com/apache/hive/pull/1645#discussion_r520415139



##
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/NullScanTaskDispatcher.java
##
@@ -93,94 +91,109 @@ private String getAliasForTableScanOperator(MapWork work,
     return null;
   }
 
-  private PartitionDesc changePartitionToMetadataOnly(PartitionDesc desc,
-      Path path) {
-    if (desc == null) {
-      return null;
-    }
-    FileStatus[] filesFoundInPartitionDir = null;
+  private void lookupAndProcessPath(MapWork work, Path path,
+      Collection<String> aliasesToOptimize) {
     try {
-      filesFoundInPartitionDir = Utilities.listNonHiddenFileStatus(physicalContext.getConf(), path);
+      boolean isEmpty = Utilities.listNonHiddenFileStatus(physicalContext.getConf(), path).length == 0;
+      processPath(work, path, aliasesToOptimize, isEmpty);
     } catch (IOException e) {
-      LOG.error("Cannot determine if the table is empty", e);
-    }
-    if (!isMetadataOnlyAllowed(filesFoundInPartitionDir)) {
-      return desc;
+      LOG.warn("Could not determine if path {} was empty." +
+          "Cannot use null scan optimization for this path.", path, e);
     }
-
-    boolean isEmpty = filesFoundInPartitionDir == null || filesFoundInPartitionDir.length == 0;
-    desc.setInputFileFormatClass(isEmpty ? ZeroRowsInputFormat.class : OneNullRowInputFormat.class);
-    desc.setOutputFileFormatClass(HiveIgnoreKeyTextOutputFormat.class);
-    desc.getProperties().setProperty(serdeConstants.SERIALIZATION_LIB,
-        NullStructSerDe.class.getName());
-    return desc;
   }
 
-  private boolean isMetadataOnlyAllowed(FileStatus[] filesFoundInPartitionDir) {
-    if (filesFoundInPartitionDir == null || filesFoundInPartitionDir.length == 0) {
-      return true; // empty folders are safe to convert to metadata-only
-    }
-    for (FileStatus f : filesFoundInPartitionDir) {
-      if (AcidUtils.isDeleteDelta(f.getPath())) {
-        /*
-         * as described in HIVE-23712, an acid partition is not a safe subject of metadata-only
-         * optimization, because there is a chance that it contains no data but contains folders
-         * (e.g: delta_0000002_0000002_0000, delete_delta_0000003_0000003_0000), without scanning
-         * the underlying file contents, we cannot tell whether this partition contains data or not
-         */
-        return false;
-      }
+  private void processPath(MapWork work, Path path, Collection<String> aliasesToOptimize,
+      boolean isEmpty) {
+    PartitionDesc partDesc = work.getPathToPartitionInfo().get(path).clone();
+    partDesc.setInputFileFormatClass(isEmpty ? ZeroRowsInputFormat.class : OneNullRowInputFormat.class);
+    partDesc.setOutputFileFormatClass(HiveIgnoreKeyTextOutputFormat.class);
+    partDesc.getProperties().setProperty(serdeConstants.SERIALIZATION_LIB,
+        NullStructSerDe.class.getName());
+    Path fakePath =
+        new Path(NullScanFileSystem.getBase() + partDesc.getTableName()
+            + "/part" + encode(partDesc.getPartSpec()));
+    StringInternUtils.internUriStringsInPath(fakePath);
+    work.addPathToPartitionInfo(fakePath, partDesc);
+    work.addPathToAlias(fakePath, new ArrayList<>(aliasesToOptimize));
+    Collection<String> aliasesContainingPath = work.getPathToAliases().get(path);
+    aliasesContainingPath.removeAll(aliasesToOptimize);
+    if (aliasesContainingPath.isEmpty()) {
+      work.removePathToAlias(path);
+      work.removePathToPartitionInfo(path);
     }
-    return true;
   }
 
-  private void processAlias(MapWork work, Path path,
-      Collection<String> aliasesAffected, Set<String> aliases) {
-    // the aliases that are allowed to map to a null scan.
-    Collection<String> allowed = aliasesAffected.stream()
-        .filter(a -> aliases.contains(a)).collect(Collectors.toList());
-    if (!allowed.isEmpty()) {
-      PartitionDesc partDesc = work.getPathToPartitionInfo().get(path).clone();
-      PartitionDesc newPartition =
-          changePartitionToMetadataOnly(partDesc, path);
-      // Prefix partition with something to avoid it being a hidden file.
-      Path fakePath =
-          new Path(NullScanFileSystem.getBase() + newPartition.getTableName()
-              + "/part" + encode(newPartition.getPartSpec()));
-      StringInternUtils.internUriStringsInPath(fakePath);
-      work.addPathToPartitionInfo(fakePath, newPartition);
-      work.addPathToAlias(fakePath, new ArrayList<>(allowed));
-      aliase

[jira] [Work logged] (HIVE-23976) Enable vectorization for multi-col semi join reducers

2020-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23976?focusedWorklogId=509584&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-509584
 ]

ASF GitHub Bot logged work on HIVE-23976:
-

Author: ASF GitHub Bot
Created on: 10/Nov/20 08:31
Start Date: 10/Nov/20 08:31
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on pull request #1458:
URL: https://github.com/apache/hive/pull/1458#issuecomment-724547490


   > @abstractdog , can you trigger tests again?
   
   yeah, rebased onto master and pushed to trigger a new test run



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 509584)
Time Spent: 1h 10m  (was: 1h)

> Enable vectorization for multi-col semi join reducers
> -
>
> Key: HIVE-23976
> URL: https://issues.apache.org/jira/browse/HIVE-23976
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> HIVE-21196 introduces multi-column semi-join reducers in the query engine.
> However, the implementation relies on GenericUDFMurmurHash, which is not
> vectorized, so the respective operators cannot be executed in vectorized
> mode.
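
For intuition, a sketch of the shape a vectorized version takes, under the
assumption that the work is reorganized into one tight loop per column batch
instead of a per-row generic UDF call; the mixing function below is merely
Murmur-style, and a real vectorized expression would have to match
GenericUDFMurmurHash bit-for-bit so both sides of the semi-join agree on the
bloom-filter hashes.

final class MultiColumnHashSketch {
  // Hashes two long columns for a whole batch; real code would handle N
  // columns, nulls, and selected-row vectors as Hive's vectorized row
  // batches require.
  static void hashBatch(long[] col1, long[] col2, int[] hashOut, int batchSize) {
    for (int i = 0; i < batchSize; i++) {
      long h = mix(col1[i]);
      h = 31 * h + mix(col2[i]); // fold the second column into the first
      hashOut[i] = (int) (h ^ (h >>> 32));
    }
  }

  private static long mix(long v) { // 64-bit finalizer in the Murmur3 style
    v ^= v >>> 33;
    v *= 0xff51afd7ed558ccdL;
    v ^= v >>> 33;
    return v;
  }
}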



--
This message was sent by Atlassian Jira
(v8.3.4#803005)