[jira] [Resolved] (HIVE-24247) StorageBasedAuthorizationProvider does not look into Hadoop ACL while checking for access

2020-10-13 Thread Adesh Kumar Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adesh Kumar Rao resolved HIVE-24247.

Resolution: Invalid

> StorageBasedAuthorizationProvider does not look into Hadoop ACL while 
> checking for access
> --
>
> Key: HIVE-24247
> URL: https://issues.apache.org/jira/browse/HIVE-24247
> Project: Hive
>  Issue Type: Bug
>Reporter: Adesh Kumar Rao
>Assignee: Adesh Kumar Rao
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> StorageBasedAuthorizationProvider uses the
> {noformat}
> FileSystem.access(Path, Action)
> {noformat}
> method to check access.
> This method fetches the FileStatus object and checks access based on that; 
> ACLs are not present in FileStatus.
>  
> Instead, Hive should use
> {noformat}
> FileSystem.get(path.toUri(), conf).access(Path, Action)
> {noformat}
> so that the underlying file system implementation can perform the access checks.
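
For illustration, here is a minimal sketch of the change proposed in the description (the helper class is invented for this digest; note the issue was ultimately resolved as Invalid):

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;

public final class AccessCheckSketch {
  // Resolve the concrete FileSystem that owns the path and delegate the
  // permission check to it. An implementation such as HDFS can then evaluate
  // extended ACL entries in addition to the plain permission bits.
  public static void checkAccess(Path path, FsAction action, Configuration conf)
      throws IOException {
    FileSystem fs = FileSystem.get(path.toUri(), conf);
    // FileSystem.access() throws AccessControlException (an IOException)
    // when access is denied.
    fs.access(path, action);
  }
}
{code}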



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24247) StorageBasedAuthorizationProvider does not look into Hadoop ACL while checking for access

2020-10-13 Thread Adesh Kumar Rao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213665#comment-17213665
 ] 

Adesh Kumar Rao commented on HIVE-24247:


Not a bug.

> StorageBasedAuthorizationProvider does not look into Hadoop ACL while 
> checking for access
> --
>
> Key: HIVE-24247
> URL: https://issues.apache.org/jira/browse/HIVE-24247
> Project: Hive
>  Issue Type: Bug
>Reporter: Adesh Kumar Rao
>Assignee: Adesh Kumar Rao
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> StorageBasedAuthorizationProvider uses the
> {noformat}
> FileSystem.access(Path, Action)
> {noformat}
> method to check access.
> This method fetches the FileStatus object and checks access based on that; 
> ACLs are not present in FileStatus.
>  
> Instead, Hive should use
> {noformat}
> FileSystem.get(path.toUri(), conf).access(Path, Action)
> {noformat}
> so that the underlying file system implementation can perform the access checks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24247) StorageBasedAuthorizationProvider does not look into Hadoop ACL while checking for access

2020-10-13 Thread Adesh Kumar Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adesh Kumar Rao updated HIVE-24247:
---
Fix Version/s: (was: 4.0.0)
Affects Version/s: (was: 4.0.0)

> StorageBasedAuthorizationProvider does not look into Hadoop ACL while 
> checking for access
> --
>
> Key: HIVE-24247
> URL: https://issues.apache.org/jira/browse/HIVE-24247
> Project: Hive
>  Issue Type: Bug
>Reporter: Adesh Kumar Rao
>Assignee: Adesh Kumar Rao
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> StorageBasedAuthorizationProvider uses the
> {noformat}
> FileSystem.access(Path, Action)
> {noformat}
> method to check access.
> This method fetches the FileStatus object and checks access based on that; 
> ACLs are not present in FileStatus.
>  
> Instead, Hive should use
> {noformat}
> FileSystem.get(path.toUri(), conf).access(Path, Action)
> {noformat}
> so that the underlying file system implementation can perform the access checks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24247) StorageBasedAuthorizationProvider does not look into Hadoop ACL while checking for access

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24247?focusedWorklogId=500469&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-500469
 ]

ASF GitHub Bot logged work on HIVE-24247:
-

Author: ASF GitHub Bot
Created on: 14/Oct/20 06:50
Start Date: 14/Oct/20 06:50
Worklog Time Spent: 10m 
  Work Description: adesh-rao closed pull request #1575:
URL: https://github.com/apache/hive/pull/1575


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 500469)
Time Spent: 20m  (was: 10m)

> StorageBasedAuthorizationProvider does not look into Hadoop ACL while 
> checking for access
> --
>
> Key: HIVE-24247
> URL: https://issues.apache.org/jira/browse/HIVE-24247
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Adesh Kumar Rao
>Assignee: Adesh Kumar Rao
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> StorageBasedAuthorizationProvider uses the
> {noformat}
> FileSystem.access(Path, Action)
> {noformat}
> method to check access.
> This method fetches the FileStatus object and checks access based on that; 
> ACLs are not present in FileStatus.
>  
> Instead, Hive should use
> {noformat}
> FileSystem.get(path.toUri(), conf).access(Path, Action)
> {noformat}
> so that the underlying file system implementation can perform the access checks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24203) Implement stats annotation rule for the LateralViewJoinOperator

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24203?focusedWorklogId=500449&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-500449
 ]

ASF GitHub Bot logged work on HIVE-24203:
-

Author: ASF GitHub Bot
Created on: 14/Oct/20 06:07
Start Date: 14/Oct/20 06:07
Worklog Time Spent: 10m 
  Work Description: okumin commented on pull request #1531:
URL: https://github.com/apache/hive/pull/1531#issuecomment-708179596


   @kgyrtkirk I have updated some points so that # of rows will never be 0.
   Could you please have a look when you have a chance?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 500449)
Time Spent: 3h  (was: 2h 50m)

> Implement stats annotation rule for the LateralViewJoinOperator
> ---
>
> Key: HIVE-24203
> URL: https://issues.apache.org/jira/browse/HIVE-24203
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 4.0.0, 3.1.2, 2.3.7
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> StatsRulesProcFactory doesn't have any rules to handle a JOIN by LATERAL VIEW.
> This can cause an underestimation when the UDTF in a LATERAL VIEW generates 
> multiple rows.
> HIVE-20262 has already added the rule for UDTF.
> This issue adds the rule for LateralViewJoinOperator.
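
To make the underestimation concrete, here is a worked example with hypothetical numbers (illustrative only, not Hive code):

{code:java}
// If the SELECT branch feeds 100 rows into a LATERAL VIEW whose UDTF emits
// 5 output rows per input on average, the join output should be estimated at
// about 500 rows. Without a dedicated rule the estimate stays near 100, and
// downstream operators may be under-provisioned.
public final class LateralViewEstimateSketch {
  public static void main(String[] args) {
    long selectRows = 100;  // rows estimated on the SELECT branch
    long udtfRows = 500;    // rows estimated on the UDTF branch
    double factor = (double) udtfRows / (double) selectRows;    // 5.0
    long joinEstimate = (long) Math.ceil(selectRows * factor);  // 500
    System.out.println("estimated join rows: " + joinEstimate);
  }
}
{code}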



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24203) Implement stats annotation rule for the LateralViewJoinOperator

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24203?focusedWorklogId=500448&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-500448
 ]

ASF GitHub Bot logged work on HIVE-24203:
-

Author: ASF GitHub Bot
Created on: 14/Oct/20 06:06
Start Date: 14/Oct/20 06:06
Worklog Time Spent: 10m 
  Work Description: okumin commented on a change in pull request #1531:
URL: https://github.com/apache/hive/pull/1531#discussion_r504422554



##
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
##
@@ -2961,10 +2961,11 @@ public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx,
   final Statistics selectStats = parents.get(LateralViewJoinOperator.SELECT_TAG).getStatistics();
   final Statistics udtfStats = parents.get(LateralViewJoinOperator.UDTF_TAG).getStatistics();
 
-  final double factor = (double) udtfStats.getNumRows() / (double) selectStats.getNumRows();
+  final long udtfNumRows = Math.max(udtfStats.getNumRows(), 1);
+  final double factor = (double) udtfNumRows / (double) Math.max(selectStats.getNumRows(), 1);

Review comment:
   `factor` will be greater than 0.0 and can never be 0 or infinity.
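
A standalone illustration of why the guard matters (hypothetical values, not the Hive code): without the clamp, two zero estimates yield NaN, and a zero denominator alone yields infinity.

{code:java}
public final class FactorGuardSketch {
  public static void main(String[] args) {
    long udtfRows = 0;    // worst-case stats estimate
    long selectRows = 0;  // worst-case stats estimate
    // Clamping both operands to >= 1 keeps the ratio strictly positive and finite.
    double guarded = (double) Math.max(udtfRows, 1) / (double) Math.max(selectRows, 1);
    System.out.println(guarded);                         // 1.0
    System.out.println((double) udtfRows / selectRows);  // NaN without the guard
  }
}
{code}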





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 500448)
Time Spent: 2h 50m  (was: 2h 40m)

> Implement stats annotation rule for the LateralViewJoinOperator
> ---
>
> Key: HIVE-24203
> URL: https://issues.apache.org/jira/browse/HIVE-24203
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 4.0.0, 3.1.2, 2.3.7
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> StatsRulesProcFactory doesn't have any rules to handle a JOIN by LATERAL VIEW.
> This can cause an underestimation when the UDTF in a LATERAL VIEW generates 
> multiple rows.
> HIVE-20262 has already added the rule for UDTF.
> This issue adds the rule for LateralViewJoinOperator.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24271) Create managed table relies on hive.create.as.acid settings.

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24271:
--
Labels: pull-request-available  (was: )

> Create managed table relies on hive.create.as.acid settings.
> 
>
> Key: HIVE-24271
> URL: https://issues.apache.org/jira/browse/HIVE-24271
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> 0: jdbc:hive2://ngangam-3.ngangam.root.hwx.si> set hive.create.as.acid;
> +----------------------------+
> |            set             |
> +----------------------------+
> | hive.create.as.acid=false  |
> +----------------------------+
> 1 row selected (0.018 seconds)
> 0: jdbc:hive2://ngangam-3.ngangam.root.hwx.si> set hive.create.as.insert.only;
> +-----------------------------------+
> |                set                |
> +-----------------------------------+
> | hive.create.as.insert.only=false  |
> +-----------------------------------+
> 1 row selected (0.013 seconds)
> 0: jdbc:hive2://ngangam-3.ngangam.root.hwx.si> create managed table 
> mgd_table(a int);
> INFO  : Compiling 
> command(queryId=hive_20201014053526_9ba1ffa3-3aa2-47c3-8514-1fe58fe4f140): 
> create managed table mgd_table(a int)
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:null, properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20201014053526_9ba1ffa3-3aa2-47c3-8514-1fe58fe4f140); 
> Time taken: 0.021 seconds
> INFO  : Executing 
> command(queryId=hive_20201014053526_9ba1ffa3-3aa2-47c3-8514-1fe58fe4f140): 
> create managed table mgd_table(a int)
> INFO  : Starting task [Stage-0:DDL] in serial mode
> INFO  : Completed executing 
> command(queryId=hive_20201014053526_9ba1ffa3-3aa2-47c3-8514-1fe58fe4f140); 
> Time taken: 0.048 seconds
> INFO  : OK
> No rows affected (0.107 seconds)
> 0: jdbc:hive2://ngangam-3.ngangam.root.hwx.si> describe formatted mgd_table;
> INFO  : Compiling 
> command(queryId=hive_20201014053533_8919be7d-41b0-41e5-b9eb-847801a9d8c5): 
> describe formatted mgd_table
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:col_name, 
> type:string, comment:from deserializer), FieldSchema(name:data_type, 
> type:string, comment:from deserializer), FieldSchema(name:comment, 
> type:string, comment:from deserializer)], properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20201014053533_8919be7d-41b0-41e5-b9eb-847801a9d8c5); 
> Time taken: 0.037 seconds
> INFO  : Executing 
> command(queryId=hive_20201014053533_8919be7d-41b0-41e5-b9eb-847801a9d8c5): 
> describe formatted mgd_table
> INFO  : Starting task [Stage-0:DDL] in serial mode
> INFO  : Completed executing 
> command(queryId=hive_20201014053533_8919be7d-41b0-41e5-b9eb-847801a9d8c5); 
> Time taken: 0.03 seconds
> INFO  : OK
> +-------------------------------+-------------------------------+----------+
> |           col_name            |           data_type           | comment  |
> +-------------------------------+-------------------------------+----------+
> | a                             | int                           |          |
> |                               | NULL                          | NULL     |
> | # Detailed Table Information  | NULL                          | NULL     |
> | Database:                     | bothfalseonhs2                | NULL     |
> | OwnerType:                    | USER                          | NULL     |
> | Owner:                        | hive                          | NULL     |
> | CreateTime:                   | Wed Oct 14 05:35:26 UTC 2020  | NULL     |
> | LastAccessTime:               | UNKNOWN                       | NULL     |
> | Retention:                    | 0                             | NULL     |

[jira] [Work logged] (HIVE-24271) Create managed table relies on hive.create.as.acid settings.

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24271?focusedWorklogId=500442&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-500442
 ]

ASF GitHub Bot logged work on HIVE-24271:
-

Author: ASF GitHub Bot
Created on: 14/Oct/20 05:48
Start Date: 14/Oct/20 05:48
Worklog Time Spent: 10m 
  Work Description: nrg4878 opened a new pull request #1578:
URL: https://github.com/apache/hive/pull/1578


   … table (Naveen Gangam)
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   Currently "create managed table" does not create a ACID table when 
hive.create.as.acid=false and hive.create.as.insert.only=false on the HS2 
server. This DDL should create an ACID table regardless of the server settings.
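
   A minimal sketch of the intended decision (names and structure are invented for illustration; this is not the actual patch):

{code:java}
import org.apache.hadoop.conf.Configuration;

public final class ManagedTableDecisionSketch {
  // With the fix, an explicit "CREATE MANAGED TABLE" forces a transactional
  // table even when both server-side defaults are false.
  static boolean shouldBeAcid(boolean explicitManagedKeyword, Configuration conf) {
    boolean createAsAcid = conf.getBoolean("hive.create.as.acid", false);
    boolean createAsInsertOnly = conf.getBoolean("hive.create.as.insert.only", false);
    return explicitManagedKeyword || createAsAcid || createAsInsertOnly;
  }
}
{code}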
   
   
   ### Does this PR introduce _any_ user-facing change?
   NO
   
   
   ### How was this patch tested?
   Manually tested the fix.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 500442)
Remaining Estimate: 0h
Time Spent: 10m

> Create managed table relies on hive.create.as.acid settings.
> 
>
> Key: HIVE-24271
> URL: https://issues.apache.org/jira/browse/HIVE-24271
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> 0: jdbc:hive2://ngangam-3.ngangam.root.hwx.si> set hive.create.as.acid;
> +----------------------------+
> |            set             |
> +----------------------------+
> | hive.create.as.acid=false  |
> +----------------------------+
> 1 row selected (0.018 seconds)
> 0: jdbc:hive2://ngangam-3.ngangam.root.hwx.si> set hive.create.as.insert.only;
> +-----------------------------------+
> |                set                |
> +-----------------------------------+
> | hive.create.as.insert.only=false  |
> +-----------------------------------+
> 1 row selected (0.013 seconds)
> 0: jdbc:hive2://ngangam-3.ngangam.root.hwx.si> create managed table 
> mgd_table(a int);
> INFO  : Compiling 
> command(queryId=hive_20201014053526_9ba1ffa3-3aa2-47c3-8514-1fe58fe4f140): 
> create managed table mgd_table(a int)
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:null, properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20201014053526_9ba1ffa3-3aa2-47c3-8514-1fe58fe4f140); 
> Time taken: 0.021 seconds
> INFO  : Executing 
> command(queryId=hive_20201014053526_9ba1ffa3-3aa2-47c3-8514-1fe58fe4f140): 
> create managed table mgd_table(a int)
> INFO  : Starting task [Stage-0:DDL] in serial mode
> INFO  : Completed executing 
> command(queryId=hive_20201014053526_9ba1ffa3-3aa2-47c3-8514-1fe58fe4f140); 
> Time taken: 0.048 seconds
> INFO  : OK
> No rows affected (0.107 seconds)
> 0: jdbc:hive2://ngangam-3.ngangam.root.hwx.si> describe formatted mgd_table;
> INFO  : Compiling 
> command(queryId=hive_20201014053533_8919be7d-41b0-41e5-b9eb-847801a9d8c5): 
> describe formatted mgd_table
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:col_name, 
> type:string, comment:from deserializer), FieldSchema(name:data_type, 
> type:string, comment:from deserializer), FieldSchema(name:comment, 
> type:string, comment:from deserializer)], properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20201014053533_8919be7d-41b0-41e5-b9eb-847801a9d8c5); 
> Time taken: 0.037 seconds
> INFO  : Executing 
> command(queryId=hive_20201014053533_8919be7d-41b0-41e5-b9eb-847801a9d8c5): 
> describe formatted mgd_table
> INFO  : Starting task [Stage-0:DDL] in serial mode
> INFO  : Completed executing 
> command(queryId=hive_20201014053533_8919be7d-41b0-41e5-b9eb-847801a9d8c5); 
> Time taken: 0.03 seconds
> INFO  : OK
> +-------------------------------+-------------------------------+----------+
> |           col_name            |           data_type           | comment  |
> +-------------------------------+-------------------------------+----------+
> | a                             | int                           
> 

[jira] [Assigned] (HIVE-24271) Create managed table relies on hive.create.as.acid settings.

2020-10-13 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam reassigned HIVE-24271:



> Create managed table relies on hive.create.as.acid settings.
> 
>
> Key: HIVE-24271
> URL: https://issues.apache.org/jira/browse/HIVE-24271
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>
> 0: jdbc:hive2://ngangam-3.ngangam.root.hwx.si> set hive.create.as.acid;
> +----------------------------+
> |            set             |
> +----------------------------+
> | hive.create.as.acid=false  |
> +----------------------------+
> 1 row selected (0.018 seconds)
> 0: jdbc:hive2://ngangam-3.ngangam.root.hwx.si> set hive.create.as.insert.only;
> +-----------------------------------+
> |                set                |
> +-----------------------------------+
> | hive.create.as.insert.only=false  |
> +-----------------------------------+
> 1 row selected (0.013 seconds)
> 0: jdbc:hive2://ngangam-3.ngangam.root.hwx.si> create managed table 
> mgd_table(a int);
> INFO  : Compiling 
> command(queryId=hive_20201014053526_9ba1ffa3-3aa2-47c3-8514-1fe58fe4f140): 
> create managed table mgd_table(a int)
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:null, properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20201014053526_9ba1ffa3-3aa2-47c3-8514-1fe58fe4f140); 
> Time taken: 0.021 seconds
> INFO  : Executing 
> command(queryId=hive_20201014053526_9ba1ffa3-3aa2-47c3-8514-1fe58fe4f140): 
> create managed table mgd_table(a int)
> INFO  : Starting task [Stage-0:DDL] in serial mode
> INFO  : Completed executing 
> command(queryId=hive_20201014053526_9ba1ffa3-3aa2-47c3-8514-1fe58fe4f140); 
> Time taken: 0.048 seconds
> INFO  : OK
> No rows affected (0.107 seconds)
> 0: jdbc:hive2://ngangam-3.ngangam.root.hwx.si> describe formatted mgd_table;
> INFO  : Compiling 
> command(queryId=hive_20201014053533_8919be7d-41b0-41e5-b9eb-847801a9d8c5): 
> describe formatted mgd_table
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:col_name, 
> type:string, comment:from deserializer), FieldSchema(name:data_type, 
> type:string, comment:from deserializer), FieldSchema(name:comment, 
> type:string, comment:from deserializer)], properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20201014053533_8919be7d-41b0-41e5-b9eb-847801a9d8c5); 
> Time taken: 0.037 seconds
> INFO  : Executing 
> command(queryId=hive_20201014053533_8919be7d-41b0-41e5-b9eb-847801a9d8c5): 
> describe formatted mgd_table
> INFO  : Starting task [Stage-0:DDL] in serial mode
> INFO  : Completed executing 
> command(queryId=hive_20201014053533_8919be7d-41b0-41e5-b9eb-847801a9d8c5); 
> Time taken: 0.03 seconds
> INFO  : OK
> +-------------------------------+-------------------------------+----------+
> |           col_name            |           data_type           | comment  |
> +-------------------------------+-------------------------------+----------+
> | a                             | int                           |          |
> |                               | NULL                          | NULL     |
> | # Detailed Table Information  | NULL                          | NULL     |
> | Database:                     | bothfalseonhs2                | NULL     |
> | OwnerType:                    | USER                          | NULL     |
> | Owner:                        | hive                          | NULL     |
> | CreateTime:                   | Wed Oct 14 05:35:26 UTC 2020  | NULL     |
> | LastAccessTime:               | UNKNOWN                       | NULL     |
> | Retention:                    | 0                             | NULL     |
> | Location:                     | 
> hdfs://ngangam-3.ngangam.root.hwx.site:8020/warehouse/tablespace/external/hive/bo

[jira] [Work logged] (HIVE-24270) Move scratchdir cleanup to background

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24270?focusedWorklogId=500396&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-500396
 ]

ASF GitHub Bot logged work on HIVE-24270:
-

Author: ASF GitHub Bot
Created on: 14/Oct/20 03:05
Start Date: 14/Oct/20 03:05
Worklog Time Spent: 10m 
  Work Description: rbalamohan commented on a change in pull request #1577:
URL: https://github.com/apache/hive/pull/1577#discussion_r504372394



##
File path: ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
##
@@ -1852,6 +1863,9 @@ public void close() throws IOException {
   closeSparkSession();
   registry.closeCUDFLoaders();
   dropSessionPaths(sessionConf);
+  if (pathCleaner != null) {
+pathCleaner.shutdown();

Review comment:
   dropSessionPaths() would have created the thread, which would have started 
deleting. As shutdown() is called immediately afterwards, it would stop further 
processing and quit without purging the remaining entries (i.e. this would 
prevent the thread from processing more than 1024 entries in PathCleaner). It 
would be good to wait until all files are purged before quitting the thread.

##
File path: ql/src/java/org/apache/hadoop/hive/ql/PathCleaner.java
##
@@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.concurrent.BlockingDeque;
+import java.util.concurrent.LinkedBlockingDeque;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicBoolean;
+
+/**
+ * This class is used to asynchronously remove directories after query execution.
+ */
+public class PathCleaner {
+
+  private static final Logger LOG = LoggerFactory.getLogger(PathCleaner.class.getName());
+
+  private final BlockingDeque<AsyncDeleteAction> deleteActions = new LinkedBlockingDeque<>();
+  private final AtomicBoolean isShutdown = new AtomicBoolean();
+  private final Thread cleanupThread;
+
+  public PathCleaner(String name) {
+    cleanupThread = new Thread(new Runnable() {
+      @Override
+      public void run() {
+        while (!isShutdown.get()) {
+          Path path = null;
+          FileSystem fs;
+          try {
+            AsyncDeleteAction firstAction = deleteActions.poll(30, TimeUnit.SECONDS);
+            if (firstAction != null) {
Review comment:
   Please refer to the other comment. If firstAction is null, the shutdown 
condition can be checked to quit the thread.
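
A self-contained sketch of the suggested drain-on-shutdown behavior (hypothetical code, not the PR's implementation):

{code:java}
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public final class DrainingCleanerSketch {
  private final LinkedBlockingDeque<Runnable> deleteActions = new LinkedBlockingDeque<>();
  private final AtomicBoolean isShutdown = new AtomicBoolean();

  // After shutdown is requested, keep draining the queue and exit only once it
  // is empty, so entries enqueued late (e.g. by dropSessionPaths()) still get purged.
  void runLoop() throws InterruptedException {
    while (true) {
      Runnable action = deleteActions.poll(1, TimeUnit.SECONDS);
      if (action == null) {
        if (isShutdown.get()) {
          break;     // queue drained and shutdown requested: safe to quit
        }
        continue;    // idle: keep waiting for work
      }
      action.run();  // stands in for resolving the FileSystem and deleting the path
    }
  }
}
{code}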





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 500396)
Time Spent: 20m  (was: 10m)

> Move scratchdir cleanup to background
> -
>
> Key: HIVE-24270
> URL: https://issues.apache.org/jira/browse/HIVE-24270
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In cloud environments, scratchdir cleaning at the end of a query may take a 
> long time. This causes the client to hang for up to 1 minute even after the 
> results were streamed back. During this time the client just waits for the 
> cleanup to finish. Cleanup can instead take place in the background in HiveServer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23800) Add hooks when HiveServer2 stops due to OutOfMemoryError

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23800?focusedWorklogId=500390&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-500390
 ]

ASF GitHub Bot logged work on HIVE-23800:
-

Author: ASF GitHub Bot
Created on: 14/Oct/20 02:45
Start Date: 14/Oct/20 02:45
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 opened a new pull request #1205:
URL: https://github.com/apache/hive/pull/1205


   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 500390)
Time Spent: 6h  (was: 5h 50m)

> Add hooks when HiveServer2 stops due to OutOfMemoryError
> 
>
> Key: HIVE-23800
> URL: https://issues.apache.org/jira/browse/HIVE-23800
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> Make the OOM hook an interface of HiveServer2, so users can implement the hook 
> to do something before HS2 stops, such as dumping the heap or alerting the 
> devops team.
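
A hypothetical shape for such a hook (illustrative only; the actual interface name and registration mechanism are defined by the patch, not reproduced here):

{code:java}
public interface OutOfMemoryHook {
  // Invoked before HiveServer2 stops due to an OutOfMemoryError, e.g. to dump
  // the heap or alert the devops team.
  void onOutOfMemory(OutOfMemoryError error);
}
{code}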



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23800) Add hooks when HiveServer2 stops due to OutOfMemoryError

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23800?focusedWorklogId=500386&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-500386
 ]

ASF GitHub Bot logged work on HIVE-23800:
-

Author: ASF GitHub Bot
Created on: 14/Oct/20 02:22
Start Date: 14/Oct/20 02:22
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #1205:
URL: https://github.com/apache/hive/pull/1205#issuecomment-708113764


   Rerunning tests to check the failed reading of the XML.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 500386)
Time Spent: 5h 40m  (was: 5.5h)

> Add hooks when HiveServer2 stops due to OutOfMemoryError
> 
>
> Key: HIVE-23800
> URL: https://issues.apache.org/jira/browse/HIVE-23800
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Make the OOM hook an interface of HiveServer2, so users can implement the hook 
> to do something before HS2 stops, such as dumping the heap or alerting the 
> devops team.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23800) Add hooks when HiveServer2 stops due to OutOfMemoryError

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23800?focusedWorklogId=500387&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-500387
 ]

ASF GitHub Bot logged work on HIVE-23800:
-

Author: ASF GitHub Bot
Created on: 14/Oct/20 02:22
Start Date: 14/Oct/20 02:22
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 closed pull request #1205:
URL: https://github.com/apache/hive/pull/1205


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 500387)
Time Spent: 5h 50m  (was: 5h 40m)

> Add hooks when HiveServer2 stops due to OutOfMemoryError
> 
>
> Key: HIVE-23800
> URL: https://issues.apache.org/jira/browse/HIVE-23800
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> Make the OOM hook an interface of HiveServer2, so users can implement the hook 
> to do something before HS2 stops, such as dumping the heap or alerting the 
> devops team.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23935) Fetching primaryKey through beeline fails with NPE

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23935?focusedWorklogId=500360&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-500360
 ]

ASF GitHub Bot logged work on HIVE-23935:
-

Author: ASF GitHub Bot
Created on: 14/Oct/20 00:55
Start Date: 14/Oct/20 00:55
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1321:
URL: https://github.com/apache/hive/pull/1321


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 500360)
Time Spent: 50m  (was: 40m)

> Fetching primaryKey through beeline fails with NPE
> --
>
> Key: HIVE-23935
> URL: https://issues.apache.org/jira/browse/HIVE-23935
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Fetching the PrimaryKey of a table through Beeline !primarykeys fails with NPE
> {noformat}
> 0: jdbc:hive2://localhost:1> !primarykeys Persons
> Error: MetaException(message:java.lang.NullPointerException) (state=,code=0)
> org.apache.hive.service.cli.HiveSQLException: 
> MetaException(message:java.lang.NullPointerException)
>   at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:360)
>   at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:351)
>   at 
> org.apache.hive.jdbc.HiveDatabaseMetaData.getPrimaryKeys(HiveDatabaseMetaData.java:573)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hive.beeline.Reflector.invoke(Reflector.java:89)
>   at org.apache.hive.beeline.Commands.metadata(Commands.java:125)
>   at org.apache.hive.beeline.Commands.primarykeys(Commands.java:231)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:57)
>   at 
> org.apache.hive.beeline.BeeLine.execCommandWithPrefix(BeeLine.java:1465)
>   at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1504)
>   at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:1364)
>   at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1134)
>   at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1082)
>   at 
> org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:546)
>   at org.apache.hive.beeline.BeeLine.main(BeeLine.java:528)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:236){noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24014) Need to delete DumpDirectoryCleanerTask

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24014?focusedWorklogId=500359&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-500359
 ]

ASF GitHub Bot logged work on HIVE-24014:
-

Author: ASF GitHub Bot
Created on: 14/Oct/20 00:54
Start Date: 14/Oct/20 00:54
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1378:
URL: https://github.com/apache/hive/pull/1378


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 500359)
Time Spent: 50m  (was: 40m)

> Need to delete DumpDirectoryCleanerTask
> ---
>
> Key: HIVE-24014
> URL: https://issues.apache.org/jira/browse/HIVE-24014
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24014.01.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> With the newer implementation, every dump operation cleans up the 
> dump directory previously consumed by the load operation. Hence, for a policy, 
> at most one dump directory will exist. Also, the dump directory base location 
> config is now a policy-level config, so this DumpDirCleanerTask will no longer 
> be effective.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24106) Abort polling on the operation state when the current thread is interrupted

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24106?focusedWorklogId=500345&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-500345
 ]

ASF GitHub Bot logged work on HIVE-24106:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 23:57
Start Date: 13/Oct/20 23:57
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 opened a new pull request #1456:
URL: https://github.com/apache/hive/pull/1456


   …ead is interrupted
   
   
   
   ### What changes were proposed in this pull request?
   Abort polling on the operation state when the current thread is interrupted
   
   
   
   ### Why are the changes needed?
   If a HiveStatement is run asynchronously as a task, e.g. in a thread or 
future, and we interrupt the task, the HiveStatement continues to poll on the 
operation state until it finishes. It may be better to provide a way to abort 
the execution in such a case.
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   
   ### How was this patch tested?
   Local machine
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 500345)
Time Spent: 1h 40m  (was: 1.5h)

> Abort polling on the operation state when the current thread is interrupted
> ---
>
> Key: HIVE-24106
> URL: https://issues.apache.org/jira/browse/HIVE-24106
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> If a HiveStatement is run asynchronously as a task, e.g. in a thread or future, 
> and we interrupt the task, the HiveStatement continues to poll on the operation 
> state until it finishes. It may be better to provide a way to abort the 
> execution in such a case.
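
A generic sketch of the proposed behavior (not the actual HiveStatement code; the helper names are invented):

{code:java}
import java.util.function.BooleanSupplier;

public final class InterruptiblePollSketch {
  // Poll until the remote operation reports completion, but abort promptly when
  // the calling task is interrupted: Thread.sleep() throws InterruptedException
  // as soon as the interrupt flag is set.
  static void waitForCompletion(BooleanSupplier finished, long pollIntervalMs)
      throws InterruptedException {
    while (!finished.getAsBoolean()) {
      Thread.sleep(pollIntervalMs);
    }
  }
}
{code}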



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24106) Abort polling on the operation state when the current thread is interrupted

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24106?focusedWorklogId=500340&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-500340
 ]

ASF GitHub Bot logged work on HIVE-24106:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 23:45
Start Date: 13/Oct/20 23:45
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 closed pull request #1456:
URL: https://github.com/apache/hive/pull/1456


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 500340)
Time Spent: 1.5h  (was: 1h 20m)

> Abort polling on the operation state when the current thread is interrupted
> ---
>
> Key: HIVE-24106
> URL: https://issues.apache.org/jira/browse/HIVE-24106
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> If a HiveStatement is run asynchronously as a task, e.g. in a thread or future, 
> and we interrupt the task, the HiveStatement continues to poll on the operation 
> state until it finishes. It may be better to provide a way to abort the 
> execution in such a case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-21498) Upgrade Thrift to 0.13.0

2020-10-13 Thread shanyu zhao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213451#comment-17213451
 ] 

shanyu zhao commented on HIVE-21498:


Should we consider cherry-picking it to Hive 2.3? Many people still have 
production environments on Hive 2.3, and this upgrade fixes security 
vulnerabilities.

> Upgrade Thrift to 0.13.0
> 
>
> Key: HIVE-21498
> URL: https://issues.apache.org/jira/browse/HIVE-21498
> Project: Hive
>  Issue Type: Bug
>  Components: Thrift API
>Reporter: Ajith S
>Assignee: Sai Hemanth Gantasala
>Priority: Critical
>  Labels: security
> Fix For: 4.0.0
>
>
> Upgrade to pick up security fixes,
> especially https://issues.apache.org/jira/browse/THRIFT-4506



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24270) Move scratchdir cleanup to background

2020-10-13 Thread Mustafa Iman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213430#comment-17213430
 ] 

Mustafa Iman commented on HIVE-24270:
-

[~rajesh.balamohan] can you take a look?

> Move scratchdir cleanup to background
> -
>
> Key: HIVE-24270
> URL: https://issues.apache.org/jira/browse/HIVE-24270
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In cloud environments, scratchdir cleaning at the end of a query may take a 
> long time. This causes the client to hang for up to 1 minute even after the 
> results were streamed back. During this time the client just waits for the 
> cleanup to finish. Cleanup can instead take place in the background in HiveServer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24270) Move scratchdir cleanup to background

2020-10-13 Thread Mustafa Iman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mustafa Iman updated HIVE-24270:

Status: Patch Available  (was: Open)

> Move scratchdir cleanup to background
> -
>
> Key: HIVE-24270
> URL: https://issues.apache.org/jira/browse/HIVE-24270
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In cloud environments, scratchdir cleaning at the end of a query may take a 
> long time. This causes the client to hang for up to 1 minute even after the 
> results were streamed back. During this time the client just waits for the 
> cleanup to finish. Cleanup can instead take place in the background in HiveServer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24270) Move scratchdir cleanup to background

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24270?focusedWorklogId=500299&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-500299
 ]

ASF GitHub Bot logged work on HIVE-24270:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 21:18
Start Date: 13/Oct/20 21:18
Worklog Time Spent: 10m 
  Work Description: mustafaiman opened a new pull request #1577:
URL: https://github.com/apache/hive/pull/1577


   Change-Id: Ife47a373720bca86edab5f911eade9a471ac5b35
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 500299)
Remaining Estimate: 0h
Time Spent: 10m

> Move scratchdir cleanup to background
> -
>
> Key: HIVE-24270
> URL: https://issues.apache.org/jira/browse/HIVE-24270
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In cloud environments, scratchdir cleaning at the end of a query may take a 
> long time. This causes the client to hang for up to 1 minute even after the 
> results were streamed back. During this time the client just waits for the 
> cleanup to finish. Cleanup can instead take place in the background in HiveServer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24270) Move scratchdir cleanup to background

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24270:
--
Labels: pull-request-available  (was: )

> Move scratchdir cleanup to background
> -
>
> Key: HIVE-24270
> URL: https://issues.apache.org/jira/browse/HIVE-24270
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In cloud environments, scratchdir cleaning at the end of a query may take a 
> long time. This causes the client to hang for up to 1 minute even after the 
> results were streamed back. During this time the client just waits for the 
> cleanup to finish. Cleanup can instead take place in the background in HiveServer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24270) Move scratchdir cleanup to background

2020-10-13 Thread Mustafa Iman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mustafa Iman reassigned HIVE-24270:
---


> Move scratchdir cleanup to background
> -
>
> Key: HIVE-24270
> URL: https://issues.apache.org/jira/browse/HIVE-24270
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>
> In cloud environments, scratchdir cleaning at the end of a query may take a 
> long time. This causes the client to hang for up to 1 minute even after the 
> results were streamed back. During this time the client just waits for the 
> cleanup to finish. Cleanup can instead take place in the background in HiveServer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24267) RetryingClientTimeBased should always perform first invocation

2020-10-13 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-24267:

Attachment: HIVE-24267.02.patch

> RetryingClientTimeBased should always perform first invocation
> --
>
> Key: HIVE-24267
> URL: https://issues.apache.org/jira/browse/HIVE-24267
> Project: Hive
>  Issue Type: Bug
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24267.01.patch, HIVE-24267.02.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24267) RetryingClientTimeBased should always perform first invocation

2020-10-13 Thread Aasha Medhi (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213309#comment-17213309
 ] 

Aasha Medhi commented on HIVE-24267:


+1

> RetryingClientTimeBased should always perform first invocation
> --
>
> Key: HIVE-24267
> URL: https://issues.apache.org/jira/browse/HIVE-24267
> Project: Hive
>  Issue Type: Bug
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24267.01.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24265) Fix acid_stats2 test

2020-10-13 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-24265:
--

Assignee: Jesus Camacho Rodriguez

> Fix acid_stats2 test
> 
>
> Key: HIVE-24265
> URL: https://issues.apache.org/jira/browse/HIVE-24265
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> This test's failure started to create incorrect JUnit XMLs, which were not 
> counted correctly by Jenkins.
> I'll disable the test for now and provide details on when it first failed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24265) Fix acid_stats2 test

2020-10-13 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213219#comment-17213219
 ] 

Zoltan Haindrich commented on HIVE-24265:
-

* I've run the test 3 times without the patch (all passed)
* and 3 times with the patch (all failed)

so it seems like this is definitely related to HIVE-24202

> Fix acid_stats2 test
> 
>
> Key: HIVE-24265
> URL: https://issues.apache.org/jira/browse/HIVE-24265
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Priority: Major
>
> This test's failure started to create incorrect JUnit XMLs, which were not 
> counted correctly by Jenkins.
> I'll disable the test for now and provide details on when it first failed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24255) StorageHandler with select-limit query is returning 0 rows

2020-10-13 Thread Vineet Garg (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213209#comment-17213209
 ] 

Vineet Garg commented on HIVE-24255:


Merged the pull request to upstream master. Thanks [~nareshpr]

> StorageHandler with select-limit query is returning 0 rows
> --
>
> Key: HIVE-24255
> URL: https://issues.apache.org/jira/browse/HIVE-24255
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
>  
> {code:java}
> CREATE EXTERNAL TABLE dbs(db_id bigint, db_location_uri string, name string, 
> owner_name string, owner_type string) STORED BY 
> 'org.apache.hive.storage.jdbc.JdbcStorageHandler' TBLPROPERTIES 
> ('hive.sql.database.type'='METASTORE', 'hive.sql.query'='SELECT `DB_ID`, 
> `DB_LOCATION_URI`, `NAME`, `OWNER_NAME`, `OWNER_TYPE` FROM `DBS`'); 
> {code}
> ==> Wrong Result <==
> {code:java}
> set hive.limit.optimize.enable=true;
> select * from dbs limit 1;
> ----------------------------------------------------------------------------------------------
>  VERTICES  MODE       STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> ----------------------------------------------------------------------------------------------
> Map 1 ..   container  SUCCEEDED      0          0        0        0       0       0
> ----------------------------------------------------------------------------------------------
> VERTICES: 01/01 [==>>] 100% ELAPSED TIME: 0.91 s
> ----------------------------------------------------------------------------------------------
> +------------+----------------------+-----------+-----------------+-----------------+
> | dbs.db_id  | dbs.db_location_uri  | dbs.name  | dbs.owner_name  | dbs.owner_type  |
> +------------+----------------------+-----------+-----------------+-----------------+
> +------------+----------------------+-----------+-----------------+-----------------+
> {code}
> ==> Correct Result <==
> {code:java}
> set hive.limit.optimize.enable=false;
> select * from dbs limit 1;
> ----------------------------------------------------------------------------------------------
>  VERTICES  MODE       STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> ----------------------------------------------------------------------------------------------
> Map 1 ..   container  SUCCEEDED      1          1        0        0       0       0
> ----------------------------------------------------------------------------------------------
> VERTICES: 01/01  [==>>] 100%  ELAPSED TIME: 4.11 s
> ----------------------------------------------------------------------------------------------
> +------------+-----------------------------------------------------+-----------+-----------------+-----------------+
> | dbs.db_id  | dbs.db_location_uri                                 | dbs.name  | dbs.owner_name  | dbs.owner_type  |
> +------------+-----------------------------------------------------+-----------+-----------------+-----------------+
> | 1          | hdfs://abcd:8020/warehouse/tablespace/managed/hive  | default   | public          | ROLE            |
> +------------+-----------------------------------------------------+-----------+-----------------+-----------------+{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24255) StorageHandler with select-limit query is returning 0 rows

2020-10-13 Thread Vineet Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg resolved HIVE-24255.

Fix Version/s: 4.0.0
   Resolution: Fixed

> StorageHandler with select-limit query is returning 0 rows
> --
>
> Key: HIVE-24255
> URL: https://issues.apache.org/jira/browse/HIVE-24255
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
>  
> {code:java}
> CREATE EXTERNAL TABLE dbs(db_id bigint, db_location_uri string, name string, 
> owner_name string, owner_type string) STORED BY 
> 'org.apache.hive.storage.jdbc.JdbcStorageHandler' TBLPROPERTIES 
> ('hive.sql.database.type'='METASTORE', 'hive.sql.query'='SELECT `DB_ID`, 
> `DB_LOCATION_URI`, `NAME`, `OWNER_NAME`, `OWNER_TYPE` FROM `DBS`'); 
> {code}
> ==> Wrong Result <==
> {code:java}
> set hive.limit.optimize.enable=true;
> select * from dbs limit 1;
> ----------------------------------------------------------------------------------------------
>  VERTICES  MODE       STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> ----------------------------------------------------------------------------------------------
> Map 1 ..   container  SUCCEEDED      0          0        0        0       0       0
> ----------------------------------------------------------------------------------------------
> VERTICES: 01/01 [==>>] 100% ELAPSED TIME: 0.91 s
> ----------------------------------------------------------------------------------------------
> +------------+----------------------+-----------+-----------------+-----------------+
> | dbs.db_id  | dbs.db_location_uri  | dbs.name  | dbs.owner_name  | dbs.owner_type  |
> +------------+----------------------+-----------+-----------------+-----------------+
> +------------+----------------------+-----------+-----------------+-----------------+
> {code}
> ==> Correct Result <==
> {code:java}
> set hive.limit.optimize.enable=false;
> select * from dbs limit 1;
> ----------------------------------------------------------------------------------------------
>  VERTICES  MODE       STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> ----------------------------------------------------------------------------------------------
> Map 1 ..   container  SUCCEEDED      1          1        0        0       0       0
> ----------------------------------------------------------------------------------------------
> VERTICES: 01/01  [==>>] 100%  ELAPSED TIME: 4.11 s
> ----------------------------------------------------------------------------------------------
> +------------+-----------------------------------------------------+-----------+-----------------+-----------------+
> | dbs.db_id  | dbs.db_location_uri                                 | dbs.name  | dbs.owner_name  | dbs.owner_type  |
> +------------+-----------------------------------------------------+-----------+-----------------+-----------------+
> | 1          | hdfs://abcd:8020/warehouse/tablespace/managed/hive  | default   | public          | ROLE            |
> +------------+-----------------------------------------------------+-----------+-----------------+-----------------+{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24255) StorageHandler with select-limit query is returning 0 rows

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24255?focusedWorklogId=500157&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-500157
 ]

ASF GitHub Bot logged work on HIVE-24255:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 16:38
Start Date: 13/Oct/20 16:38
Worklog Time Spent: 10m 
  Work Description: vineetgarg02 merged pull request #1568:
URL: https://github.com/apache/hive/pull/1568


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 500157)
Time Spent: 40m  (was: 0.5h)

> StorageHandler with select-limit query is returning 0 rows
> --
>
> Key: HIVE-24255
> URL: https://issues.apache.org/jira/browse/HIVE-24255
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
>  
> {code:java}
> CREATE EXTERNAL TABLE dbs(db_id bigint, db_location_uri string, name string, 
> owner_name string, owner_type string) STORED BY 
> 'org.apache.hive.storage.jdbc.JdbcStorageHandler' TBLPROPERTIES 
> ('hive.sql.database.type'='METASTORE', 'hive.sql.query'='SELECT `DB_ID`, 
> `DB_LOCATION_URI`, `NAME`, `OWNER_NAME`, `OWNER_TYPE` FROM `DBS`'); 
> {code}
> ==> Wrong Result <==
> {code:java}
> set hive.limit.optimize.enable=true;
> select * from dbs limit 1;
> ----------------------------------------------------------------------------------------------
>  VERTICES  MODE       STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> ----------------------------------------------------------------------------------------------
> Map 1 ..   container  SUCCEEDED      0          0        0        0       0       0
> ----------------------------------------------------------------------------------------------
> VERTICES: 01/01 [==>>] 100% ELAPSED TIME: 0.91 s
> ----------------------------------------------------------------------------------------------
> +------------+----------------------+-----------+-----------------+-----------------+
> | dbs.db_id  | dbs.db_location_uri  | dbs.name  | dbs.owner_name  | dbs.owner_type  |
> +------------+----------------------+-----------+-----------------+-----------------+
> +------------+----------------------+-----------+-----------------+-----------------+
> {code}
> ==> Correct Result <==
> {code:java}
> set hive.limit.optimize.enable=false;
> select * from dbs limit 1;
> ----------------------------------------------------------------------------------------------
>  VERTICES  MODE       STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> ----------------------------------------------------------------------------------------------
> Map 1 ..   container  SUCCEEDED      1          1        0        0       0       0
> ----------------------------------------------------------------------------------------------
> VERTICES: 01/01  [==>>] 100%  ELAPSED TIME: 4.11 s
> ----------------------------------------------------------------------------------------------
> +------------+-----------------------------------------------------+-----------+-----------------+-----------------+
> | dbs.db_id  | dbs.db_location_uri                                 | dbs.name  | dbs.owner_name  | dbs.owner_type  |
> +------------+-----------------------------------------------------+-----------+-----------------+-----------------+
> | 1          | hdfs://abcd:8020/warehouse/tablespace/managed/hive  | default   | public          | ROLE            |
> +------------+-----------------------------------------------------+-----------+-----------------+-----------------+{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24231) Enhance shared work optimizer to merge scans with filters on both sides

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24231?focusedWorklogId=500151&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-500151
 ]

ASF GitHub Bot logged work on HIVE-24231:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 16:30
Start Date: 13/Oct/20 16:30
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1553:
URL: https://github.com/apache/hive/pull/1553#discussion_r504075393



##
File path: ql/src/test/results/clientpositive/llap/vectorized_dynamic_partition_pruning.q.out
##
@@ -4816,18 +4816,34 @@ STAGE PLANS:
   alias: srcpart
   filterExpr: ds is not null (type: boolean)
   Statistics: Num rows: 2000 Data size: 389248 Basic stats: COMPLETE Column stats: COMPLETE
-  Group By Operator
-    keys: ds (type: string)
-    minReductionHashAggr: 0.99
-    mode: hash
-    outputColumnNames: _col0
-    Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: COMPLETE
-    Reduce Output Operator
-      key expressions: _col0 (type: string)
-      null sort order: z
-      sort order: +
-      Map-reduce partition columns: _col0 (type: string)
+  Filter Operator

Review comment:
   these two filter operators will be merged by the 'downstream merge' patch





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 500151)
Time Spent: 3h 10m  (was: 3h)

> Enhance shared work optimizer to merge scans with filters on both sides
> ---
>
> Key: HIVE-24231
> URL: https://issues.apache.org/jira/browse/HIVE-24231
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-21737) Upgrade Avro to version 1.10.0

2020-10-13 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213206#comment-17213206
 ] 

Chao Sun commented on HIVE-21737:
-

Sounds good to me [~iemejia]. Thanks!

> Upgrade Avro to version 1.10.0
> --
>
> Key: HIVE-21737
> URL: https://issues.apache.org/jira/browse/HIVE-21737
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Ismaël Mejía
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: pull-request-available
> Attachments: 0001-HIVE-21737-Bump-Apache-Avro-to-1.9.2.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Avro >= 1.9.x brings a lot of fixes, including a leaner Avro that has no 
> Jackson in the public API and no Guava as a dependency. Worth the update.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24184) Re-order methods in Driver

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24184?focusedWorklogId=500109&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-500109
 ]

ASF GitHub Bot logged work on HIVE-24184:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 15:22
Start Date: 13/Oct/20 15:22
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #1512:
URL: https://github.com/apache/hive/pull/1512#issuecomment-707816599


   LGTM pending tests



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 500109)
Time Spent: 50m  (was: 40m)

> Re-order methods in Driver
> --
>
> Key: HIVE-24184
> URL: https://issues.apache.org/jira/browse/HIVE-24184
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Driver is still a huge class with a lot of methods. Their order does not 
> reflect the order of the work the Driver does (compilation, execution, 
> result providing, closing), and the constructors are not at the beginning of 
> the class. All of this makes the class harder to read; re-ordering the 
> methods would make it easier.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24184) Re-order methods in Driver

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24184?focusedWorklogId=500107&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-500107
 ]

ASF GitHub Bot logged work on HIVE-24184:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 15:19
Start Date: 13/Oct/20 15:19
Worklog Time Spent: 10m 
  Work Description: miklosgergely commented on a change in pull request 
#1512:
URL: https://github.com/apache/hive/pull/1512#discussion_r504039229



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -588,15 +353,280 @@ private void runInternal(String command, boolean 
alreadyCompiled) throws Command
 SessionState.getPerfLogger().cleanupPerfLogMetrics();
   }
 
-  @Override
-  public boolean isFetchingTable() {
-return driverContext.getFetchTask() != null;
+  public void lockAndRespond() throws CommandProcessorException {
+// Assumes the query has already been compiled
+if (driverContext.getPlan() == null) {
+  throw new IllegalStateException(
+  "No previously compiled query for driver - queryId=" + 
driverContext.getQueryState().getQueryId());
+}
+
+try {
+  driverTxnHandler.acquireLocksIfNeeded();
+} catch (CommandProcessorException cpe) {
+  driverTxnHandler.rollback(cpe);
+  throw cpe;
+}
   }
 
-  @SuppressWarnings({ "unchecked", "rawtypes" })
   @Override
-  public boolean getResults(List res) throws IOException {
-if (driverState.isDestroyed() || driverState.isClosed()) {
+  public CommandProcessorResponse compileAndRespond(String command) throws 
CommandProcessorException {
+return compileAndRespond(command, false);
+  }
+
+  public CommandProcessorResponse compileAndRespond(String command, boolean 
cleanupTxnList)
+  throws CommandProcessorException {
+try {
+  compileInternal(command, false);
+  return new CommandProcessorResponse(getSchema(), null);
+} catch (CommandProcessorException cpe) {
+  throw cpe;
+} finally {
+  if (cleanupTxnList) {
+// Valid txn list might be generated for a query compiled using this 
command, thus we need to reset it
+driverTxnHandler.cleanupTxnList();
+  }
+}
+  }
+
+  private void compileInternal(String command, boolean deferClose) throws 
CommandProcessorException {
+Metrics metrics = MetricsFactory.getInstance();
+if (metrics != null) {
+  metrics.incrementCounter(MetricsConstant.WAITING_COMPILE_OPS, 1);
+}
+
+PerfLogger perfLogger = SessionState.getPerfLogger(true);
+perfLogger.perfLogBegin(CLASS_NAME, PerfLogger.WAIT_COMPILE);
+
+try (CompileLock compileLock = 
CompileLockFactory.newInstance(driverContext.getConf(), command)) {
+  boolean success = compileLock.tryAcquire();
+
+  perfLogger.perfLogEnd(CLASS_NAME, PerfLogger.WAIT_COMPILE);
+
+  if (metrics != null) {
+metrics.decrementCounter(MetricsConstant.WAITING_COMPILE_OPS, 1);
+  }
+  if (!success) {
+String errorMessage = 
ErrorMsg.COMPILE_LOCK_TIMED_OUT.getErrorCodedMsg();
+throw DriverUtils.createProcessorException(driverContext, 
ErrorMsg.COMPILE_LOCK_TIMED_OUT.getErrorCode(),
+errorMessage, null, null);
+  }
+
+  try {
+compile(command, true, deferClose);
+  } catch (CommandProcessorException cpe) {
+try {
+  driverTxnHandler.endTransactionAndCleanup(false);
+} catch (LockException e) {
+  LOG.warn("Exception in releasing locks. " + 
StringUtils.stringifyException(e));
+}
+throw cpe;
+  }
+}
+//Save compile-time PerfLogging for WebUI.
+//Execution-time Perf logs are done by either another thread's PerfLogger
+//or a reset PerfLogger.
+
driverContext.getQueryDisplay().setPerfLogStarts(QueryDisplay.Phase.COMPILATION,
 perfLogger.getStartTimes());
+
driverContext.getQueryDisplay().setPerfLogEnds(QueryDisplay.Phase.COMPILATION, 
perfLogger.getEndTimes());
+  }
+
+  /**
+   * Compiles a new HQL command, but potentially resets taskID counter. Not 
resetting task counter is useful for
+   * generating re-entrant QL queries.
+   *
+   * @param command  The HiveQL query to compile
+   * @param resetTaskIds Resets taskID counter if true.
+   * @return 0 for ok
+   */
+  public int compile(String command, boolean resetTaskIds) {
+try {
+  compile(command, resetTaskIds, false);
+  return 0;
+} catch (CommandProcessorException cpr) {
+  return cpr.getErrorCode();
+}
+  }
+
+  /**
+   * Compiles an HQL command, creates an execution plan for it.
+   *
+   * @param deferClose indicates if the close/destroy should be deferred when 
the process has been interrupted, it
+   *should be set to true if the compile is called within another 
method like runInternal, which defers the
+   *close to the called in that method.
+   * @param resetTaskIds Reset

[jira] [Commented] (HIVE-24265) Fix acid_stats2 test

2020-10-13 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213159#comment-17213159
 ] 

Jesus Camacho Rodriguez commented on HIVE-24265:


[~kgyrtkirk], in fact I think the patch could have an effect if we were 
serving from the cache incorrectly. Please let me know once you have validated 
the bisect.

> Fix acid_stats2 test
> 
>
> Key: HIVE-24265
> URL: https://issues.apache.org/jira/browse/HIVE-24265
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Priority: Major
>
> This test's failure started to create incorrect junit xmls, which were not 
> counted correctly by Jenkins.
> I'll disable the test for now - and provide details on when it first failed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24184) Re-order methods in Driver

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24184?focusedWorklogId=500100&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-500100
 ]

ASF GitHub Bot logged work on HIVE-24184:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 15:03
Start Date: 13/Oct/20 15:03
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1512:
URL: https://github.com/apache/hive/pull/1512#discussion_r504025808



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -588,15 +353,280 @@ private void runInternal(String command, boolean 
alreadyCompiled) throws Command
 SessionState.getPerfLogger().cleanupPerfLogMetrics();
   }
 
-  @Override
-  public boolean isFetchingTable() {
-return driverContext.getFetchTask() != null;
+  public void lockAndRespond() throws CommandProcessorException {
+// Assumes the query has already been compiled
+if (driverContext.getPlan() == null) {
+  throw new IllegalStateException(
+  "No previously compiled query for driver - queryId=" + 
driverContext.getQueryState().getQueryId());
+}
+
+try {
+  driverTxnHandler.acquireLocksIfNeeded();
+} catch (CommandProcessorException cpe) {
+  driverTxnHandler.rollback(cpe);
+  throw cpe;
+}
   }
 
-  @SuppressWarnings({ "unchecked", "rawtypes" })
   @Override
-  public boolean getResults(List res) throws IOException {
-if (driverState.isDestroyed() || driverState.isClosed()) {
+  public CommandProcessorResponse compileAndRespond(String command) throws 
CommandProcessorException {
+return compileAndRespond(command, false);
+  }
+
+  public CommandProcessorResponse compileAndRespond(String command, boolean 
cleanupTxnList)
+  throws CommandProcessorException {
+try {
+  compileInternal(command, false);
+  return new CommandProcessorResponse(getSchema(), null);
+} catch (CommandProcessorException cpe) {
+  throw cpe;
+} finally {
+  if (cleanupTxnList) {
+// Valid txn list might be generated for a query compiled using this 
command, thus we need to reset it
+driverTxnHandler.cleanupTxnList();
+  }
+}
+  }
+
+  private void compileInternal(String command, boolean deferClose) throws 
CommandProcessorException {
+Metrics metrics = MetricsFactory.getInstance();
+if (metrics != null) {
+  metrics.incrementCounter(MetricsConstant.WAITING_COMPILE_OPS, 1);
+}
+
+PerfLogger perfLogger = SessionState.getPerfLogger(true);
+perfLogger.perfLogBegin(CLASS_NAME, PerfLogger.WAIT_COMPILE);
+
+try (CompileLock compileLock = 
CompileLockFactory.newInstance(driverContext.getConf(), command)) {
+  boolean success = compileLock.tryAcquire();
+
+  perfLogger.perfLogEnd(CLASS_NAME, PerfLogger.WAIT_COMPILE);
+
+  if (metrics != null) {
+metrics.decrementCounter(MetricsConstant.WAITING_COMPILE_OPS, 1);
+  }
+  if (!success) {
+String errorMessage = 
ErrorMsg.COMPILE_LOCK_TIMED_OUT.getErrorCodedMsg();
+throw DriverUtils.createProcessorException(driverContext, 
ErrorMsg.COMPILE_LOCK_TIMED_OUT.getErrorCode(),
+errorMessage, null, null);
+  }
+
+  try {
+compile(command, true, deferClose);
+  } catch (CommandProcessorException cpe) {
+try {
+  driverTxnHandler.endTransactionAndCleanup(false);
+} catch (LockException e) {
+  LOG.warn("Exception in releasing locks. " + 
StringUtils.stringifyException(e));
+}
+throw cpe;
+  }
+}
+//Save compile-time PerfLogging for WebUI.
+//Execution-time Perf logs are done by either another thread's PerfLogger
+//or a reset PerfLogger.
+
driverContext.getQueryDisplay().setPerfLogStarts(QueryDisplay.Phase.COMPILATION,
 perfLogger.getStartTimes());
+
driverContext.getQueryDisplay().setPerfLogEnds(QueryDisplay.Phase.COMPILATION, 
perfLogger.getEndTimes());
+  }
+
+  /**
+   * Compiles a new HQL command, but potentially resets taskID counter. Not 
resetting task counter is useful for
+   * generating re-entrant QL queries.
+   *
+   * @param command  The HiveQL query to compile
+   * @param resetTaskIds Resets taskID counter if true.
+   * @return 0 for ok
+   */
+  public int compile(String command, boolean resetTaskIds) {
+try {
+  compile(command, resetTaskIds, false);
+  return 0;
+} catch (CommandProcessorException cpr) {
+  return cpr.getErrorCode();
+}
+  }
+
+  /**
+   * Compiles an HQL command, creates an execution plan for it.
+   *
+   * @param deferClose indicates if the close/destroy should be deferred when 
the process has been interrupted, it
+   *should be set to true if the compile is called within another 
method like runInternal, which defers the
+   *close to the called in that method.
+   * @param resetTaskIds Resets ta

[jira] [Work logged] (HIVE-24266) Committed rows in hflush'd ACID files may be missing from query result

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24266?focusedWorklogId=500062&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-500062
 ]

ASF GitHub Bot logged work on HIVE-24266:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 13:56
Start Date: 13/Oct/20 13:56
Worklog Time Spent: 10m 
  Work Description: szlta commented on a change in pull request #1576:
URL: https://github.com/apache/hive/pull/1576#discussion_r503970960



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
##
@@ -1156,13 +1157,36 @@ public BISplitStrategy(Context context, FileSystem fs, 
Path dir,
   } else {
  TreeMap<Long, BlockLocation> blockOffsets = SHIMS.getLocationsWithOffset(fs, fileStatus);
  for (Map.Entry<Long, BlockLocation> entry : blockOffsets.entrySet()) {
-  if (entry.getKey() + entry.getValue().getLength() > logicalLen) {
+  long blockOffset = entry.getKey();
+  long blockLength = entry.getValue().getLength();
+  if(blockOffset > logicalLen) {
 //don't create splits for anything past logical EOF
-continue;
+//map is ordered, thus any possible entry in the iteration 
after this is bound to be > logicalLen
+break;
   }
-  OrcSplit orcSplit = new OrcSplit(fileStatus.getPath(), fileKey, 
entry.getKey(),
-entry.getValue().getLength(), entry.getValue().getHosts(), 
null, isOriginal, true,
-deltas, -1, logicalLen, dir, offsetAndBucket);
+  long splitLength = blockLength;
+
+  long blockEndOvershoot = (blockOffset + blockLength) - 
logicalLen;
+  if (blockEndOvershoot > 0) {
+// if logicalLen is placed within a block, we should make 
(this last) split out of the part of this block
+// -> we should read less than block end
+splitLength -= blockEndOvershoot;
+  } else if (blockOffsets.lastKey() == blockOffset && 
blockEndOvershoot < 0) {
+// This is the last block but it ends before logicalLen
+// This can happen with HDFS if hflush was called and blocks 
are not persisted to disk yet, but content
+// is otherwise available for readers, as DNs have these 
buffers in memory at this time.
+// -> we should read more than (persisted) block end, but 
surely not more than the whole block
+if (fileStatus instanceof HdfsLocatedFileStatus) {
+  HdfsLocatedFileStatus hdfsFileStatus = 
(HdfsLocatedFileStatus)fileStatus;
+  if (hdfsFileStatus.getLocatedBlocks().isUnderConstruction()) 
{
+// blockEndOvershoot is negative here...
+splitLength = Math.min(splitLength - blockEndOvershoot, 
hdfsFileStatus.getBlockSize());

Review comment:
   hdfsFileStatus.getBlockSize() is not the block length but the configured 
(maximum) block size (e.g. 256MB) - i.e. the largest a block can ever be - 
which is why we shouldn't read past it in this split
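   
   To spell that out, the capping discussed here amounts to something like the 
following (a standalone sketch mirroring the quoted diff, not necessarily the 
final patch):
   
   ```java
   // blockEndOvershoot > 0: logicalLen ends inside this block -> shrink the split.
   // blockEndOvershoot < 0: under-construction last block -> extend the split up
   // to logicalLen (splitLength - blockEndOvershoot == logicalLen - blockOffset),
   // but never past the configured (maximum) block size.
   static long splitLengthFor(long blockOffset, long blockLength,
       long logicalLen, long configuredBlockSize) {
     long splitLength = blockLength;
     long blockEndOvershoot = (blockOffset + blockLength) - logicalLen;
     if (blockEndOvershoot > 0) {
       splitLength -= blockEndOvershoot;
     } else if (blockEndOvershoot < 0) {
       splitLength = Math.min(splitLength - blockEndOvershoot, configuredBlockSize);
     }
     return splitLength;
   }
   ```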





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 500062)
Time Spent: 40m  (was: 0.5h)

> Committed rows in hflush'd ACID files may be missing from query result
> --
>
> Key: HIVE-24266
> URL: https://issues.apache.org/jira/browse/HIVE-24266
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In an HDFS environment, if a writer uses hflush to write ORC ACID files 
> during a transaction commit, committed rows may appear to be missing when 
> the table is read before the file has been completely persisted to disk 
> (i.e. synced).
> This is because hflush does not persist the new buffers to disk; it only 
> ensures that new readers can see the new content. As a result, the block 
> information that BISplitStrategy relies on is incomplete. Although the side 
> file (_flush_length) tracks the proper end of the file being written, this 
> information is ignored in favour of the block information, and we may end 
> up generating a very short split instead of using the larger, available 
> length.
> When ETLSplitStrategy is used there is no attempt at all to rely on the 
> ACID side file when calculating the file length, so that needs to be fixed 
> too.

[jira] [Work logged] (HIVE-24266) Committed rows in hflush'd ACID files may be missing from query result

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24266?focusedWorklogId=500056&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-500056
 ]

ASF GitHub Bot logged work on HIVE-24266:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 13:27
Start Date: 13/Oct/20 13:27
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1576:
URL: https://github.com/apache/hive/pull/1576#discussion_r503950040



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
##
@@ -1156,13 +1157,36 @@ public BISplitStrategy(Context context, FileSystem fs, 
Path dir,
   } else {
  TreeMap<Long, BlockLocation> blockOffsets = SHIMS.getLocationsWithOffset(fs, fileStatus);
  for (Map.Entry<Long, BlockLocation> entry : blockOffsets.entrySet()) {
-  if (entry.getKey() + entry.getValue().getLength() > logicalLen) {
+  long blockOffset = entry.getKey();
+  long blockLength = entry.getValue().getLength();
+  if(blockOffset > logicalLen) {
 //don't create splits for anything past logical EOF
-continue;
+//map is ordered, thus any possible entry in the iteration 
after this is bound to be > logicalLen
+break;
   }
-  OrcSplit orcSplit = new OrcSplit(fileStatus.getPath(), fileKey, 
entry.getKey(),
-entry.getValue().getLength(), entry.getValue().getHosts(), 
null, isOriginal, true,
-deltas, -1, logicalLen, dir, offsetAndBucket);
+  long splitLength = blockLength;
+
+  long blockEndOvershoot = (blockOffset + blockLength) - 
logicalLen;
+  if (blockEndOvershoot > 0) {
+// if logicalLen is placed within a block, we should make 
(this last) split out of the part of this block
+// -> we should read less than block end
+splitLength -= blockEndOvershoot;
+  } else if (blockOffsets.lastKey() == blockOffset && 
blockEndOvershoot < 0) {
+// This is the last block but it ends before logicalLen
+// This can happen with HDFS if hflush was called and blocks 
are not persisted to disk yet, but content
+// is otherwise available for readers, as DNs have these 
buffers in memory at this time.
+// -> we should read more than (persisted) block end, but 
surely not more than the whole block
+if (fileStatus instanceof HdfsLocatedFileStatus) {
+  HdfsLocatedFileStatus hdfsFileStatus = 
(HdfsLocatedFileStatus)fileStatus;
+  if (hdfsFileStatus.getLocatedBlocks().isUnderConstruction()) 
{
+// blockEndOvershoot is negative here...
+splitLength = Math.min(splitLength - blockEndOvershoot, 
hdfsFileStatus.getBlockSize());

Review comment:
   Why is this capped with blockSize?
   Isn't there a case where we are at the end of the block but still writing?
   SplitLength might be this, and it is easier to understand:
   ```
   logicalLen - blockOffset
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 500056)
Time Spent: 0.5h  (was: 20m)

> Committed rows in hflush'd ACID files may be missing from query result
> --
>
> Key: HIVE-24266
> URL: https://issues.apache.org/jira/browse/HIVE-24266
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In an HDFS environment, if a writer uses hflush to write ORC ACID files 
> during a transaction commit, committed rows may appear to be missing when 
> the table is read before the file has been completely persisted to disk 
> (i.e. synced).
> This is because hflush does not persist the new buffers to disk; it only 
> ensures that new readers can see the new content. As a result, the block 
> information that BISplitStrategy relies on is incomplete. Although the side 
> file (_flush_length) tracks the proper end of the file being written, this 
> information is ignored in favour of the block information, and we may end 
> up generating a very short split instead of using the larger, available 
> length.
> When ETLSplitStrategy is used there is no attempt at all to rely on the 
> ACID side file when calculating the file length, so that needs to be fixed 
> too.

[jira] [Work logged] (HIVE-24266) Committed rows in hflush'd ACID files may be missing from query result

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24266?focusedWorklogId=500051&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-500051
 ]

ASF GitHub Bot logged work on HIVE-24266:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 13:21
Start Date: 13/Oct/20 13:21
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1576:
URL: https://github.com/apache/hive/pull/1576#discussion_r503945609



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
##
@@ -1156,13 +1157,36 @@ public BISplitStrategy(Context context, FileSystem fs, 
Path dir,
   } else {
  TreeMap<Long, BlockLocation> blockOffsets = SHIMS.getLocationsWithOffset(fs, fileStatus);
  for (Map.Entry<Long, BlockLocation> entry : blockOffsets.entrySet()) {
-  if (entry.getKey() + entry.getValue().getLength() > logicalLen) {
+  long blockOffset = entry.getKey();
+  long blockLength = entry.getValue().getLength();
+  if(blockOffset > logicalLen) {

Review comment:
   nit: space





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 500051)
Time Spent: 20m  (was: 10m)

> Committed rows in hflush'd ACID files may be missing from query result
> --
>
> Key: HIVE-24266
> URL: https://issues.apache.org/jira/browse/HIVE-24266
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In an HDFS environment, if a writer uses hflush to write ORC ACID files 
> during a transaction commit, committed rows may appear to be missing when 
> the table is read before the file has been completely persisted to disk 
> (i.e. synced).
> This is because hflush does not persist the new buffers to disk; it only 
> ensures that new readers can see the new content. As a result, the block 
> information that BISplitStrategy relies on is incomplete. Although the side 
> file (_flush_length) tracks the proper end of the file being written, this 
> information is ignored in favour of the block information, and we may end 
> up generating a very short split instead of using the larger, available 
> length.
> When ETLSplitStrategy is used there is no attempt at all to rely on the 
> ACID side file when calculating the file length, so that needs to be fixed 
> too.
> Moreover, newly committed rows may fail to appear because of OrcTail 
> caching in ETLSplitStrategy. For now I'm just going to recommend turning 
> that cache off to anyone who wants real-time row updates to be readable:
> {code:java}
> set hive.orc.cache.stripe.details.mem.size=0;  {code}
> ..as tweaking that code would probably open a can of worms..



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24266) Committed rows in hflush'd ACID files may be missing from query result

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24266:
--
Labels: pull-request-available  (was: )

> Committed rows in hflush'd ACID files may be missing from query result
> --
>
> Key: HIVE-24266
> URL: https://issues.apache.org/jira/browse/HIVE-24266
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In an HDFS environment, if a writer uses hflush to write ORC ACID files 
> during a transaction commit, committed rows may appear to be missing when 
> the table is read before the file has been completely persisted to disk 
> (i.e. synced).
> This is because hflush does not persist the new buffers to disk; it only 
> ensures that new readers can see the new content. As a result, the block 
> information that BISplitStrategy relies on is incomplete. Although the side 
> file (_flush_length) tracks the proper end of the file being written, this 
> information is ignored in favour of the block information, and we may end 
> up generating a very short split instead of using the larger, available 
> length.
> When ETLSplitStrategy is used there is no attempt at all to rely on the 
> ACID side file when calculating the file length, so that needs to be fixed 
> too.
> Moreover, newly committed rows may fail to appear because of OrcTail 
> caching in ETLSplitStrategy. For now I'm just going to recommend turning 
> that cache off to anyone who wants real-time row updates to be readable:
> {code:java}
> set hive.orc.cache.stripe.details.mem.size=0;  {code}
> ..as tweaking that code would probably open a can of worms..



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24266) Committed rows in hflush'd ACID files may be missing from query result

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24266?focusedWorklogId=500045&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-500045
 ]

ASF GitHub Bot logged work on HIVE-24266:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 13:00
Start Date: 13/Oct/20 13:00
Worklog Time Spent: 10m 
  Work Description: szlta opened a new pull request #1576:
URL: https://github.com/apache/hive/pull/1576


   In an HDFS environment, if a writer uses hflush to write ORC ACID files 
during a transaction commit, committed rows may appear to be missing when the 
table is read before the file has been completely persisted to disk (i.e. 
synced).
   
   This is because hflush does not persist the new buffers to disk; it only 
ensures that new readers can see the new content. As a result, the block 
information that BISplitStrategy relies on is incomplete. Although the side 
file (_flush_length) tracks the proper end of the file being written, this 
information is ignored in favour of the block information, and we may end up 
generating a very short split instead of using the larger, available length.
   When ETLSplitStrategy is used there is no attempt at all to rely on the ACID 
side file when calculating the file length, so that needs to be fixed too.
   
   Moreover, newly committed rows may fail to appear because of OrcTail caching 
in ETLSplitStrategy. For now I'm just going to recommend turning that cache off 
to anyone who wants real-time row updates to be readable:
   
   set hive.orc.cache.stripe.details.mem.size=0;  
   ..as tweaking that code would probably open a can of worms..
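   
   For context, the visibility gap looks like this from the Hadoop FileSystem 
API side (a minimal standalone sketch, not part of this patch; the path is 
hypothetical):
   
   ```java
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.FSDataOutputStream;
   import org.apache.hadoop.fs.FileSystem;
   import org.apache.hadoop.fs.Path;
   
   public class HflushVisibilityDemo {
     public static void main(String[] args) throws Exception {
       FileSystem fs = FileSystem.get(new Configuration());
       Path path = new Path("/tmp/hflush-demo");  // hypothetical path
       try (FSDataOutputStream out = fs.create(path)) {
         out.writeBytes("freshly committed rows\n");
         out.hflush();  // new readers can see the bytes now...
         // ...but the reported file length / block info may still lag behind,
         // which is exactly what starves BISplitStrategy of the real length.
         System.out.println("reported length: " + fs.getFileStatus(path).getLen());
       }
     }
   }
   ```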



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 500045)
Remaining Estimate: 0h
Time Spent: 10m

> Committed rows in hflush'd ACID files may be missing from query result
> --
>
> Key: HIVE-24266
> URL: https://issues.apache.org/jira/browse/HIVE-24266
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In an HDFS environment, if a writer uses hflush to write ORC ACID files 
> during a transaction commit, committed rows may appear to be missing when 
> the table is read before the file has been completely persisted to disk 
> (i.e. synced).
> This is because hflush does not persist the new buffers to disk; it only 
> ensures that new readers can see the new content. As a result, the block 
> information that BISplitStrategy relies on is incomplete. Although the side 
> file (_flush_length) tracks the proper end of the file being written, this 
> information is ignored in favour of the block information, and we may end 
> up generating a very short split instead of using the larger, available 
> length.
> When ETLSplitStrategy is used there is no attempt at all to rely on the 
> ACID side file when calculating the file length, so that needs to be fixed 
> too.
> Moreover, newly committed rows may fail to appear because of OrcTail 
> caching in ETLSplitStrategy. For now I'm just going to recommend turning 
> that cache off to anyone who wants real-time row updates to be readable:
> {code:java}
> set hive.orc.cache.stripe.details.mem.size=0;  {code}
> ..as tweaking that code would probably open a can of worms..



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24266) Committed rows in hflush'd ACID files may be missing from query result

2020-10-13 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ádám Szita updated HIVE-24266:
--
Description: 
In an HDFS environment, if a writer uses hflush to write ORC ACID files during 
a transaction commit, committed rows may appear to be missing when the table is 
read before the file has been completely persisted to disk (i.e. synced).

This is because hflush does not persist the new buffers to disk; it only 
ensures that new readers can see the new content. As a result, the block 
information that BISplitStrategy relies on is incomplete. Although the side 
file (_flush_length) tracks the proper end of the file being written, this 
information is ignored in favour of the block information, and we may end up 
generating a very short split instead of using the larger, available length.
When ETLSplitStrategy is used there is no attempt at all to rely on the ACID 
side file when calculating the file length, so that needs to be fixed too.

Moreover, newly committed rows may fail to appear because of OrcTail caching in 
ETLSplitStrategy. For now I'm just going to recommend turning that cache off to 
anyone who wants real-time row updates to be readable:
{code:java}
set hive.orc.cache.stripe.details.mem.size=0;  {code}
..as tweaking that code would probably open a can of worms..

  was:
In an HDFS environment, if a writer uses hflush to write ORC ACID files during 
a transaction commit, committed rows may appear to be missing when the table is 
read before the file has been completely persisted to disk (i.e. synced).

This is because hflush does not persist the new buffers to disk; it only 
ensures that new readers can see the new content. As a result, the block 
information that BISplitStrategy relies on is incomplete. Although the side 
file (_flush_length) tracks the proper end of the file being written, this 
information is ignored in favour of the block information, and we may end up 
generating a very short split instead of using the larger, available length.


> Committed rows in hflush'd ACID files may be missing from query result
> --
>
> Key: HIVE-24266
> URL: https://issues.apache.org/jira/browse/HIVE-24266
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>
> In an HDFS environment, if a writer uses hflush to write ORC ACID files 
> during a transaction commit, committed rows may appear to be missing when 
> the table is read before the file has been completely persisted to disk 
> (i.e. synced).
> This is because hflush does not persist the new buffers to disk; it only 
> ensures that new readers can see the new content. As a result, the block 
> information that BISplitStrategy relies on is incomplete. Although the side 
> file (_flush_length) tracks the proper end of the file being written, this 
> information is ignored in favour of the block information, and we may end 
> up generating a very short split instead of using the larger, available 
> length.
> When ETLSplitStrategy is used there is no attempt at all to rely on the 
> ACID side file when calculating the file length, so that needs to be fixed 
> too.
> Moreover, newly committed rows may fail to appear because of OrcTail 
> caching in ETLSplitStrategy. For now I'm just going to recommend turning 
> that cache off to anyone who wants real-time row updates to be readable:
> {code:java}
> set hive.orc.cache.stripe.details.mem.size=0;  {code}
> ..as tweaking that code would probably open a can of worms..



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24247) StorageBasedAuthorizationProvider does not look into Hadoop ACL while check for access

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24247?focusedWorklogId=500039&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-500039
 ]

ASF GitHub Bot logged work on HIVE-24247:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 12:48
Start Date: 13/Oct/20 12:48
Worklog Time Spent: 10m 
  Work Description: adesh-rao opened a new pull request #1575:
URL: https://github.com/apache/hive/pull/1575


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 500039)
Remaining Estimate: 0h
Time Spent: 10m

> StorageBasedAuthorizationProvider does not look into Hadoop ACL while check 
> for access
> --
>
> Key: HIVE-24247
> URL: https://issues.apache.org/jira/browse/HIVE-24247
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Adesh Kumar Rao
>Assignee: Adesh Kumar Rao
>Priority: Minor
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> StorageBasedAuthorizationProvider uses
> {noformat}
> FileSystem.access(Path, Action)
> {noformat}
> method to check access.
> This method gets the FileStatus object and checks access based on that; ACLs 
> are not present in FileStatus.
>  
> Instead, Hive should use
> {noformat}
> FileSystem.get(path.toUri(), conf);
> {noformat}
> {noformat}
> .access(Path, Action)
> {noformat}
> where the implemented file system can do the access checks.
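> i.e., roughly the following call pattern (a sketch only, assuming a 
> Configuration named conf is in scope; not the final patch):
> {code:java}
> // before: the default access() checks permission bits from FileStatus,
> // so Hadoop ACLs are never consulted
> fs.access(path, FsAction.READ);
> 
> // proposed: resolve the concrete FileSystem for the path and let that
> // implementation perform the check, so it can take ACLs into account
> FileSystem.get(path.toUri(), conf).access(path, FsAction.READ);
> {code}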



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24247) StorageBasedAuthorizationProvider does not look into Hadoop ACL while check for access

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24247:
--
Labels: pull-request-available  (was: )

> StorageBasedAuthorizationProvider does not look into Hadoop ACL while check 
> for access
> --
>
> Key: HIVE-24247
> URL: https://issues.apache.org/jira/browse/HIVE-24247
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Adesh Kumar Rao
>Assignee: Adesh Kumar Rao
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> StorageBasedAuthorizationProvider uses
> {noformat}
> FileSystem.access(Path, Action)
> {noformat}
> method to check access.
> This method gets the FileStatus object and checks access based on that; ACLs 
> are not present in FileStatus.
>  
> Instead, Hive should use
> {noformat}
> FileSystem.get(path.toUri(), conf);
> {noformat}
> {noformat}
> .access(Path, Action)
> {noformat}
> where the implemented file system can do the access checks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24257) Wrong check constraint naming in Hive metastore

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24257?focusedWorklogId=500017&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-500017
 ]

ASF GitHub Bot logged work on HIVE-24257:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 12:07
Start Date: 13/Oct/20 12:07
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on pull request #1570:
URL: https://github.com/apache/hive/pull/1570#issuecomment-707694433


   LGTM. +1.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 500017)
Time Spent: 20m  (was: 10m)

> Wrong check constraint naming in Hive metastore
> ---
>
> Key: HIVE-24257
> URL: https://issues.apache.org/jira/browse/HIVE-24257
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently:
> struct SQLCheckConstraint {
>   1: string catName, // catalog name
>   2: string table_db,// table schema
>   3: string table_name,  // table name
>   4: string column_name, // column name
>   5: string check_expression,// check expression
>   6: string dc_name, // default name
>   7: bool enable_cstr,   // Enable/Disable
>   8: bool validate_cstr, // Validate/No validate
>   9: bool rely_cstr  // Rely/No Rely
> }
> The naming for the check constraint is wrong: field 6 should be cc_name 
> (check constraint name) instead of dc_name.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24231) Enhance shared work optimizer to merge scans with filters on both sides

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24231?focusedWorklogId=499982&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-499982
 ]

ASF GitHub Bot logged work on HIVE-24231:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 10:31
Start Date: 13/Oct/20 10:31
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1553:
URL: https://github.com/apache/hive/pull/1553#discussion_r503843227



##
File path: ql/src/test/results/clientpositive/perf/tez/constraints/query2.q.out
##
@@ -128,46 +128,104 @@ Plan optimized by CBO.
 
 Vertex dependency in root stage
 Map 1 <- Union 2 (CONTAINS)
-Map 9 <- Union 2 (CONTAINS)
-Reducer 3 <- Map 10 (SIMPLE_EDGE), Union 2 (SIMPLE_EDGE)
+Map 13 <- Union 14 (CONTAINS)
+Map 15 <- Union 14 (CONTAINS)
+Map 8 <- Union 2 (CONTAINS)
+Reducer 10 <- Map 9 (SIMPLE_EDGE), Union 14 (SIMPLE_EDGE)
+Reducer 11 <- Reducer 10 (SIMPLE_EDGE)
+Reducer 12 <- Map 9 (SIMPLE_EDGE), Reducer 11 (SIMPLE_EDGE)
+Reducer 3 <- Map 9 (SIMPLE_EDGE), Union 2 (SIMPLE_EDGE)
 Reducer 4 <- Reducer 3 (SIMPLE_EDGE)
-Reducer 5 <- Map 10 (SIMPLE_EDGE), Reducer 4 (SIMPLE_EDGE)
-Reducer 6 <- Reducer 5 (SIMPLE_EDGE), Reducer 8 (SIMPLE_EDGE)
+Reducer 5 <- Map 9 (SIMPLE_EDGE), Reducer 4 (SIMPLE_EDGE)
+Reducer 6 <- Reducer 12 (SIMPLE_EDGE), Reducer 5 (SIMPLE_EDGE)
 Reducer 7 <- Reducer 6 (SIMPLE_EDGE)
-Reducer 8 <- Map 10 (SIMPLE_EDGE), Reducer 4 (SIMPLE_EDGE)
 
 Stage-0
   Fetch Operator
 limit:-1
 Stage-1
   Reducer 7 vectorized
-  File Output Operator [FS_173]
-Select Operator [SEL_172] (rows=12881 width=788)
+  File Output Operator [FS_187]
+Select Operator [SEL_186] (rows=12881 width=788)
   
Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7"]
 <-Reducer 6 [SIMPLE_EDGE]
   SHUFFLE [RS_57]
 Select Operator [SEL_56] (rows=12881 width=788)
   
Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7"]
   Merge Join Operator [MERGEJOIN_146] (rows=12881 width=1572)
 Conds:RS_53.(_col0 - 
53)=RS_54._col0(Inner),Output:["_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col9","_col10","_col11","_col12","_col13","_col14","_col15","_col16"]
+  <-Reducer 12 [SIMPLE_EDGE]
+SHUFFLE [RS_54]
+  PartitionCols:_col0
+  Merge Join Operator [MERGEJOIN_145] (rows=652 width=788)
+
Conds:RS_185._col0=RS_181._col0(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7"]
+  <-Map 9 [SIMPLE_EDGE] vectorized
+SHUFFLE [RS_181]
+  PartitionCols:_col0
+  Select Operator [SEL_177] (rows=652 width=4)
+Output:["_col0"]
+Filter Operator [FIL_173] (rows=652 width=8)
+  predicate:((d_year = 2001) and d_week_seq is not 
null)
+  TableScan [TS_8] (rows=73049 width=99)
+
default@date_dim,date_dim,Tbl:COMPLETE,Col:COMPLETE,Output:["d_date_sk","d_week_seq","d_day_name","d_year"]
+  <-Reducer 11 [SIMPLE_EDGE] vectorized
+SHUFFLE [RS_185]
+  PartitionCols:_col0
+  Group By Operator [GBY_184] (rows=13152 width=788)
+
Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7"],aggregations:["sum(VALUE._col0)","sum(VALUE._col1)","sum(VALUE._col2)","sum(VALUE._col3)","sum(VALUE._col4)","sum(VALUE._col5)","sum(VALUE._col6)"],keys:KEY._col0
+  <-Reducer 10 [SIMPLE_EDGE]
+SHUFFLE [RS_40]
+  PartitionCols:_col0
+  Group By Operator [GBY_39] (rows=3182784 width=788)
+
Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7"],aggregations:["sum(_col1)","sum(_col2)","sum(_col3)","sum(_col4)","sum(_col5)","sum(_col6)","sum(_col7)"],keys:_col0
+Select Operator [SEL_37] (rows=430516591 width=143)
+  
Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7"]
+  Merge Join Operator [MERGEJOIN_144] 
(rows=430516591 width=143)
+Conds:Union 
14._col0=RS_180._col0(Inner),Output:["_col1","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10"]
+  <-Map 9 [SIMPLE_EDGE] vectorized
+SHUFFLE [RS_180]
+  PartitionCols:_col0
+  Select Operator [SEL_176] (rows=73049 
width=36)
+
Output:["_col0","_col1","_col2","_col3","_col4"

[jira] [Work logged] (HIVE-24231) Enhance shared work optimizer to merge scans with filters on both sides

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24231?focusedWorklogId=499981&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-499981
 ]

ASF GitHub Bot logged work on HIVE-24231:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 10:27
Start Date: 13/Oct/20 10:27
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1553:
URL: https://github.com/apache/hive/pull/1553#discussion_r503840610



##
File path: ql/src/test/results/clientpositive/perf/tez/constraints/query44.q.out
##
@@ -103,102 +107,143 @@ Stage-0
 Top N Key Operator [TNK_99] (rows=6951 width=218)
   keys:_col1,top n:100
   Merge Join Operator [MERGEJOIN_116] (rows=6951 width=218)
-
Conds:RS_66._col2=RS_146._col0(Inner),Output:["_col1","_col5","_col7"]
-  <-Map 11 [SIMPLE_EDGE] vectorized
-SHUFFLE [RS_146]
+
Conds:RS_66._col2=RS_163._col0(Inner),Output:["_col1","_col5","_col7"]
+  <-Map 14 [SIMPLE_EDGE] vectorized
+SHUFFLE [RS_163]
   PartitionCols:_col0
-  Select Operator [SEL_144] (rows=462000 width=111)
+  Select Operator [SEL_161] (rows=462000 width=111)
 Output:["_col0","_col1"]
 TableScan [TS_56] (rows=462000 width=111)
   
default@item,i1,Tbl:COMPLETE,Col:COMPLETE,Output:["i_item_sk","i_product_name"]
   <-Reducer 6 [SIMPLE_EDGE]
 SHUFFLE [RS_66]
   PartitionCols:_col2
   Merge Join Operator [MERGEJOIN_115] (rows=6951 width=115)
-
Conds:RS_63._col0=RS_145._col0(Inner),Output:["_col1","_col2","_col5"]
-  <-Map 11 [SIMPLE_EDGE] vectorized
-SHUFFLE [RS_145]
+
Conds:RS_63._col0=RS_162._col0(Inner),Output:["_col1","_col2","_col5"]
+  <-Map 14 [SIMPLE_EDGE] vectorized
+SHUFFLE [RS_162]
   PartitionCols:_col0
-   Please refer to the previous Select Operator 
[SEL_144]
+   Please refer to the previous Select Operator 
[SEL_161]
   <-Reducer 5 [SIMPLE_EDGE]
 SHUFFLE [RS_63]
   PartitionCols:_col0
   Merge Join Operator [MERGEJOIN_114] (rows=6951 
width=12)
-
Conds:RS_138._col1=RS_143._col1(Inner),Output:["_col0","_col1","_col2"]
-  <-Reducer 4 [SIMPLE_EDGE] vectorized
-SHUFFLE [RS_138]
+
Conds:RS_146._col1=RS_160._col1(Inner),Output:["_col0","_col1","_col2"]
+  <-Reducer 12 [SIMPLE_EDGE] vectorized
+SHUFFLE [RS_160]
   PartitionCols:_col1
-  Select Operator [SEL_137] (rows=6951 width=8)
+  Select Operator [SEL_159] (rows=6951 width=8)
 Output:["_col0","_col1"]
-Filter Operator [FIL_136] (rows=6951 width=116)
+Filter Operator [FIL_158] (rows=6951 width=116)
   predicate:(rank_window_0 < 11)
-  PTF Operator [PTF_135] (rows=20854 width=116)
-Function 
definitions:[{},{"name:":"windowingtablefunction","order by:":"_col1 ASC NULLS 
LAST","partition by:":"0"}]
-Select Operator [SEL_134] (rows=20854 
width=116)
+  PTF Operator [PTF_157] (rows=20854 width=116)
+Function 
definitions:[{},{"name:":"windowingtablefunction","order by:":"_col1 DESC NULLS 
FIRST","partition by:":"0"}]
+Select Operator [SEL_156] (rows=20854 
width=116)
   Output:["_col0","_col1"]
-<-Reducer 3 [SIMPLE_EDGE]
-  SHUFFLE [RS_21]
+<-Reducer 11 [SIMPLE_EDGE]
+  SHUFFLE [RS_49]
 PartitionCols:0
-Top N Key Operator [TNK_100] 
(rows=20854 width=228)
+Top N Key Operator [TNK_101] 
(rows=20854 width=228)
   keys:_col1,top n:11
-  Filter Operator [FIL_20] (rows=20854 
width=228)
+  

[jira] [Updated] (HIVE-24256) REPL LOAD fails because of unquoted column name

2020-10-13 Thread Viacheslav Avramenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viacheslav Avramenko updated HIVE-24256:

   Attachment: HIVE-24256.01.patch
Fix Version/s: 4.0.0
   Status: Patch Available  (was: Open)

> REPL LOAD fails because of unquoted column name
> ---
>
> Key: HIVE-24256
> URL: https://issues.apache.org/jira/browse/HIVE-24256
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Viacheslav Avramenko
>Assignee: Viacheslav Avramenko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-24256.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There is an unquoted column name, NWI_TABLE, in one of the SQL queries 
> executed during REPL LOAD.
> This causes the command to fail when Postgres is used for the metastore.
> {code:sql}
> SELECT \"NWI_NEXT\" FROM \"NEXT_WRITE_ID\" WHERE \"NWI_DATABASE\" = ? AND 
> NWI_TABLE = ?
> {code}
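> Presumably the fix is simply to quote that identifier as well, i.e. 
> {{AND \"NWI_TABLE\" = ?}}.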



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24231) Enhance shared work optimizer to merge scans with filters on both sides

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24231?focusedWorklogId=499979&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-499979
 ]

ASF GitHub Bot logged work on HIVE-24231:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 10:26
Start Date: 13/Oct/20 10:26
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1553:
URL: https://github.com/apache/hive/pull/1553#discussion_r503840362



##
File path: ql/src/test/results/clientpositive/llap/subquery_in.q.out
##
@@ -5078,9 +5087,10 @@ STAGE PLANS:
   Edges:
 Reducer 2 <- Map 1 (SIMPLE_EDGE), Reducer 5 (SIMPLE_EDGE)
 Reducer 3 <- Reducer 2 (SIMPLE_EDGE), Reducer 6 (SIMPLE_EDGE)
-Reducer 4 <- Reducer 3 (SIMPLE_EDGE), Reducer 6 (SIMPLE_EDGE)
+Reducer 4 <- Reducer 3 (SIMPLE_EDGE), Reducer 7 (SIMPLE_EDGE)
 Reducer 5 <- Map 1 (SIMPLE_EDGE)
 Reducer 6 <- Map 1 (SIMPLE_EDGE)
+Reducer 7 <- Map 1 (SIMPLE_EDGE)

Review comment:
   yes; I was too brave to enable ts merging for every existing op... :)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 499979)
Time Spent: 2h 40m  (was: 2.5h)

> Enhance shared work optimizer to merge scans with filters on both sides
> ---
>
> Key: HIVE-24231
> URL: https://issues.apache.org/jira/browse/HIVE-24231
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24231) Enhance shared work optimizer to merge scans with filters on both sides

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24231?focusedWorklogId=499978&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-499978
 ]

ASF GitHub Bot logged work on HIVE-24231:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 10:24
Start Date: 13/Oct/20 10:24
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1553:
URL: https://github.com/apache/hive/pull/1553#discussion_r503838695



##
File path: ql/src/test/results/clientpositive/llap/subquery_in.q.out
##
@@ -4355,6 +4355,9 @@ STAGE PLANS:
 sort order: +
 Map-reduce partition columns: _col0 (type: string)
 Statistics: Num rows: 13 Data size: 1352 Basic stats: 
COMPLETE Column stats: COMPLETE
+  Filter Operator

Review comment:
   this difference is gone - no more ts merging for now; will get back to 
it later





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 499978)
Time Spent: 2.5h  (was: 2h 20m)

> Enhance shared work optimizer to merge scans with filters on both sides
> ---
>
> Key: HIVE-24231
> URL: https://issues.apache.org/jira/browse/HIVE-24231
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24231) Enhance shared work optimizer to merge scans with filters on both sides

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24231?focusedWorklogId=499977&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-499977
 ]

ASF GitHub Bot logged work on HIVE-24231:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 10:23
Start Date: 13/Oct/20 10:23
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1553:
URL: https://github.com/apache/hive/pull/1553#discussion_r503838522



##
File path: 
ql/src/test/results/clientpositive/llap/special_character_in_tabnames_1.q.out
##
@@ -1986,18 +1986,18 @@ STAGE PLANS:
 Tez
  A masked pattern was here 
   Edges:
-Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 6 (SIMPLE_EDGE)
+Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 7 (SIMPLE_EDGE)
 Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
-Reducer 4 <- Reducer 3 (SIMPLE_EDGE), Reducer 7 (SIMPLE_EDGE)
+Reducer 4 <- Reducer 3 (SIMPLE_EDGE), Reducer 6 (SIMPLE_EDGE)
 Reducer 5 <- Reducer 4 (SIMPLE_EDGE)
-Reducer 7 <- Map 6 (SIMPLE_EDGE)
+Reducer 6 <- Map 1 (SIMPLE_EDGE)
  A masked pattern was here 
   Vertices:
 Map 1 
 Map Operator Tree:
 TableScan
   alias: b
-  filterExpr: key is not null (type: boolean)
+  filterExpr: (key is not null or (key > '9')) (type: boolean)

Review comment:
   this difference is gone - no more ts merging for now; will get back to 
it later





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 499977)
Time Spent: 2h 20m  (was: 2h 10m)

> Enhance shared work optimizer to merge scans with filters on both sides
> ---
>
> Key: HIVE-24231
> URL: https://issues.apache.org/jira/browse/HIVE-24231
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24231) Enhance shared work optimizer to merge scans with filters on both sides

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24231?focusedWorklogId=499976&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-499976
 ]

ASF GitHub Bot logged work on HIVE-24231:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 10:23
Start Date: 13/Oct/20 10:23
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1553:
URL: https://github.com/apache/hive/pull/1553#discussion_r503838169



##
File path: ql/src/test/results/clientpositive/llap/sharedworkresidual.q.out
##
@@ -143,6 +143,10 @@ STAGE PLANS:
 sort order: 
 Statistics: Num rows: 1 Data size: 188 Basic stats: 
COMPLETE Column stats: NONE
 value expressions: _col0 (type: string)
+Select Operator

Review comment:
   this difference is gone - no more ts merging for now; will get back to 
it later





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 499976)
Time Spent: 2h 10m  (was: 2h)

> Enhance shared work optimizer to merge scans with filters on both sides
> ---
>
> Key: HIVE-24231
> URL: https://issues.apache.org/jira/browse/HIVE-24231
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24269) In SharedWorkOptimizer run simplification after merging TS filter expressions

2020-10-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24269:

Description: https://github.com/apache/hive/pull/1553#discussion_r503837757

> In SharedWorkOptimizer run simplification after merging TS filter expressions
> -
>
> Key: HIVE-24269
> URL: https://issues.apache.org/jira/browse/HIVE-24269
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> https://github.com/apache/hive/pull/1553#discussion_r503837757



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24231) Enhance shared work optimizer to merge scans with filters on both sides

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24231?focusedWorklogId=499975&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-499975
 ]

ASF GitHub Bot logged work on HIVE-24231:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 10:22
Start Date: 13/Oct/20 10:22
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1553:
URL: https://github.com/apache/hive/pull/1553#discussion_r503837757



##
File path: ql/src/test/results/clientpositive/llap/ppd_repeated_alias.q.out
##
@@ -348,14 +348,14 @@ STAGE PLANS:
  A masked pattern was here 
   Edges:
 Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
-Reducer 3 <- Map 4 (XPROD_EDGE), Reducer 2 (XPROD_EDGE)
+Reducer 3 <- Map 1 (XPROD_EDGE), Reducer 2 (XPROD_EDGE)
  A masked pattern was here 
   Vertices:
 Map 1 
 Map Operator Tree:
 TableScan
   alias: c
-  filterExpr: foo is not null (type: boolean)
+  filterExpr: (foo is not null or (foo = 1)) (type: boolean)

Review comment:
   this difference is gone - no more TS merging for now; we'll get back to 
it later
   
   yes; we might want to run simplification on these - I've opened HIVE-24269 
to add that
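
   To make the merge concrete, here is a minimal sketch of the OR-combination step (the class and method names are illustrative only; `ExprNodeGenericFuncDesc.newInstance` and `GenericUDFOPOr` are the real Hive building blocks, mirroring the `GenericUDFOPAnd` usage visible in the patch):

{code:java}
import java.util.Arrays;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPOr;

public class MergedScanFilterSketch {
  // OR the two per-scan filters so the shared scan keeps every row either
  // branch needs; this is what produces (foo is not null or (foo = 1)) above,
  // which the simplification in HIVE-24269 would collapse to: foo is not null
  public static ExprNodeDesc mergeScanFilters(ExprNodeDesc retainable, ExprNodeDesc discardable)
      throws UDFArgumentException {
    return ExprNodeGenericFuncDesc.newInstance(
        new GenericUDFOPOr(), Arrays.asList(retainable, discardable));
  }
}
{code}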





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 499975)
Time Spent: 2h  (was: 1h 50m)

> Enhance shared work optimizer to merge scans with filters on both sides
> ---
>
> Key: HIVE-24231
> URL: https://issues.apache.org/jira/browse/HIVE-24231
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24269) In SharedWorkOptimizer run simplification after merging TS filter expressions

2020-10-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24269:
---


> In SharedWorkOptimizer run simplification after merging TS filter expressions
> -
>
> Key: HIVE-24269
> URL: https://issues.apache.org/jira/browse/HIVE-24269
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24231) Enhance shared work optimizer to merge scans with filters on both sides

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24231?focusedWorklogId=499972&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-499972
 ]

ASF GitHub Bot logged work on HIVE-24231:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 10:18
Start Date: 13/Oct/20 10:18
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1553:
URL: https://github.com/apache/hive/pull/1553#discussion_r503835536



##
File path: ql/src/test/results/clientpositive/llap/join_parse.q.out
##
@@ -499,34 +499,41 @@ STAGE PLANS:
 sort order: +
 Map-reduce partition columns: _col0 (type: string)
 Statistics: Num rows: 500 Data size: 43500 Basic 
stats: COMPLETE Column stats: COMPLETE
+  Filter Operator
+predicate: (value is not null and key is not null) (type: 
boolean)
+Statistics: Num rows: 500 Data size: 89000 Basic stats: 
COMPLETE Column stats: COMPLETE
+Select Operator
+  expressions: key (type: string), value (type: string)
+  outputColumnNames: _col0, _col1
+  Statistics: Num rows: 500 Data size: 89000 Basic stats: 
COMPLETE Column stats: COMPLETE
   Reduce Output Operator
 key expressions: _col0 (type: string)
 null sort order: z
 sort order: +
 Map-reduce partition columns: _col0 (type: string)
-Statistics: Num rows: 500 Data size: 43500 Basic 
stats: COMPLETE Column stats: COMPLETE
+Statistics: Num rows: 500 Data size: 89000 Basic 
stats: COMPLETE Column stats: COMPLETE
+value expressions: _col1 (type: string)
 Execution mode: vectorized, llap
 LLAP IO: all inputs
 Map 6 
 Map Operator Tree:
 TableScan
-  alias: src1
-  filterExpr: (value is not null and key is not null) (type: 
boolean)
-  Statistics: Num rows: 500 Data size: 89000 Basic stats: 
COMPLETE Column stats: COMPLETE
+  alias: src2

Review comment:
   difference is gone - no more TS merging for now; we'll get back to it 
later





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 499972)
Time Spent: 1h 50m  (was: 1h 40m)

> Enhance shared work optimizer to merge scans with filters on both sides
> ---
>
> Key: HIVE-24231
> URL: https://issues.apache.org/jira/browse/HIVE-24231
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24268) Investigate srcpart scans in dynamic_partition_pruning test

2020-10-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24268:

Description: 
there seem to be some opportunities missed by the shared work optimizer

see srcpart scans around 
[here|https://github.com/apache/hive/pull/1553/files/31ddb97bf412a2679489a5a0d45d335d2708005c#diff-cb6322e933f130f318d462f2c3af839dac60f8acca2074b2685f1847066e3565R4313]

https://github.com/apache/hive/pull/1553#discussion_r503834803


  was:
there seem to be some opportunities missed by the shared work optimizer

see srcpart scans around 
[here|https://github.com/apache/hive/pull/1553/files/31ddb97bf412a2679489a5a0d45d335d2708005c#diff-cb6322e933f130f318d462f2c3af839dac60f8acca2074b2685f1847066e3565R4313]




> Investigate srcpart scans in dynamic_partition_pruning test
> ---
>
> Key: HIVE-24268
> URL: https://issues.apache.org/jira/browse/HIVE-24268
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> there seem to be some opportunities missed by the shared work optimizer
> see srcpart scans around 
> [here|https://github.com/apache/hive/pull/1553/files/31ddb97bf412a2679489a5a0d45d335d2708005c#diff-cb6322e933f130f318d462f2c3af839dac60f8acca2074b2685f1847066e3565R4313]
> https://github.com/apache/hive/pull/1553#discussion_r503834803



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24231) Enhance shared work optimizer to merge scans with filters on both sides

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24231?focusedWorklogId=499969&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-499969
 ]

ASF GitHub Bot logged work on HIVE-24231:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 10:17
Start Date: 13/Oct/20 10:17
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1553:
URL: https://github.com/apache/hive/pull/1553#discussion_r503834803



##
File path: 
ql/src/test/results/clientpositive/llap/dynamic_partition_pruning.q.out
##
@@ -4277,21 +4277,37 @@ STAGE PLANS:
   alias: srcpart
   filterExpr: ds is not null (type: boolean)
   Statistics: Num rows: 2000 Data size: 389248 Basic stats: 
COMPLETE Column stats: COMPLETE
-  Group By Operator
-keys: ds (type: string)
-minReductionHashAggr: 0.99
-mode: hash
-outputColumnNames: _col0
-Statistics: Num rows: 2 Data size: 368 Basic stats: 
COMPLETE Column stats: COMPLETE
-Reduce Output Operator
-  key expressions: _col0 (type: string)
-  null sort order: z
-  sort order: +
-  Map-reduce partition columns: _col0 (type: string)
+  Filter Operator
+predicate: ds is not null (type: boolean)

Review comment:
   I see 2 Filter Operators doing the same thing in this plan - these will be 
merged by the "downstream merge" patch.
   
   However, to the best of my knowledge the TS filterExpr should only be considered 
"best-effort", because the reader may decide not to filter by some parts of the 
expr.
   
   In this plan we should probably have removed the 2 extra srcpart scans via the 
`SubTree` logic - that should be investigated - I've opened HIVE-24268
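   
   A hedged sketch of that best-effort contract; `splitConjuncts`, `readerSupports`, and `applyAtReadTime` are made-up names used only to illustrate the shape, not Hive APIs:
   
{code:java}
// illustration only: a reader may honor just the conjuncts it understands,
// so the pushed-down filterExpr can never be the sole guarantee of correctness
void applyPushedDownFilter(TableScanDesc tsDesc) {
  for (ExprNodeDesc conjunct : splitConjuncts(tsDesc.getFilterExpr())) {
    if (readerSupports(conjunct)) {
      applyAtReadTime(conjunct);  // best-effort pruning; rows may still slip through
    }                             // unsupported conjuncts are silently skipped
  }
  // correctness comes from the downstream Filter Operator, which re-evaluates
  // the complete predicate on every row the reader emits
}
{code}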
   
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 499969)
Time Spent: 1h 40m  (was: 1.5h)

> Enhance shared work optimizer to merge scans with filters on both sides
> ---
>
> Key: HIVE-24231
> URL: https://issues.apache.org/jira/browse/HIVE-24231
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24268) Investigate srcpart scans in dynamic_partition_pruning test

2020-10-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24268:
---


> Investigate srcpart scans in dynamic_partition_pruning test
> ---
>
> Key: HIVE-24268
> URL: https://issues.apache.org/jira/browse/HIVE-24268
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> there seem to be some opportunities missed by the shared work optimizer
> see srcpart scans around 
> [here|https://github.com/apache/hive/pull/1553/files/31ddb97bf412a2679489a5a0d45d335d2708005c#diff-cb6322e933f130f318d462f2c3af839dac60f8acca2074b2685f1847066e3565R4313]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24231) Enhance shared work optimizer to merge scans with filters on both sides

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24231?focusedWorklogId=499962&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-499962
 ]

ASF GitHub Bot logged work on HIVE-24231:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 10:03
Start Date: 13/Oct/20 10:03
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1553:
URL: https://github.com/apache/hive/pull/1553#discussion_r503826556



##
File path: 
ql/src/test/results/clientpositive/llap/cbo_SortUnionTransposeRule.q.out
##
@@ -1006,6 +1027,22 @@ STAGE PLANS:
   output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
   serde: 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
 Reducer 5 
+Execution mode: vectorized, llap

Review comment:
   This is caused by enabling the schema merge for all the optimizations. 
Apparently the greedy operator chain matching logic worked slightly better when 
it was first run without TS merging and only then executed to also consider 
merging the schema.
   
   In this patch I will only introduce the new optimization and do the 
generalization either in the "downstream merge" patch or completely 
separately; it will be worth it because it will enable the `RemoveSemiJoin` mode 
to consider merging the TS schema.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 499962)
Time Spent: 1.5h  (was: 1h 20m)

> Enhance shared work optimizer to merge scans with filters on both sides
> ---
>
> Key: HIVE-24231
> URL: https://issues.apache.org/jira/browse/HIVE-24231
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24267) RetryingClientTimeBased should always perform first invocation

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24267?focusedWorklogId=499960&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-499960
 ]

ASF GitHub Bot logged work on HIVE-24267:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 10:00
Start Date: 13/Oct/20 10:00
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1573:
URL: https://github.com/apache/hive/pull/1573#discussion_r503824826



##
File path: 
ql/src/test/org/apache/hadoop/hive/ql/exec/repl/TestAtlasDumpTask.java
##
@@ -114,4 +123,55 @@ public void testAtlasRestClientBuilder() throws 
SemanticException, IOException {
 AtlasRestClient atlasClient = atlasRestCleintBuilder.getClient(conf);
 Assert.assertTrue(atlasClient != null);
   }
+
+  @Test
+  public void testRetryingClientTimeBased() throws SemanticException, 
IOException, AtlasServiceException {
+AtlasClientV2 atlasClientV2 = mock(AtlasClientV2.class);
+AtlasExportRequest exportRequest = mock(AtlasExportRequest.class);
+String exportResponseData = "dumpExportContent";
+InputStream exportedMetadataIS = new 
ByteArrayInputStream(exportResponseData.getBytes(StandardCharsets.UTF_8));
+
when(atlasClientV2.exportData(any(AtlasExportRequest.class))).thenReturn(exportedMetadataIS);
+when(exportRequest.toString()).thenReturn("dummyExportRequest");
+when(conf.getTimeVar(HiveConf.ConfVars.REPL_RETRY_TOTAL_DURATION, 
TimeUnit.SECONDS)).thenReturn(60L);
+when(conf.getTimeVar(HiveConf.ConfVars.REPL_RETRY_INTIAL_DELAY, 
TimeUnit.SECONDS)).thenReturn(1L);
+AtlasRestClient atlasClient = new AtlasRestClientImpl(atlasClientV2, conf);
+AtlasRestClientImpl atlasRestClientImpl = (AtlasRestClientImpl)atlasClient;
+InputStream inputStream = atlasRestClientImpl.exportData(exportRequest);
+ArgumentCaptor expReqCaptor = 
ArgumentCaptor.forClass(AtlasExportRequest.class);
+Mockito.verify(atlasClientV2, 
Mockito.times(1)).exportData(expReqCaptor.capture());
+Assert.assertEquals(expReqCaptor.getValue().toString(), 
"dummyExportRequest");
+byte[] exportResponseDataReadBytes = new byte[exportResponseData.length()];
+inputStream.read(exportResponseDataReadBytes);
+String exportResponseDataReadString = new 
String(exportResponseDataReadBytes, StandardCharsets.UTF_8);
+Assert.assertEquals(exportResponseData, exportResponseDataReadString);
+  }
+
+  @Test
+  public void testRetryingClientTimeBasedExhausted() throws 
AtlasServiceException {
+AtlasClientV2 atlasClientV2 = mock(AtlasClientV2.class);
+AtlasExportRequest exportRequest = mock(AtlasExportRequest.class);
+AtlasServiceException atlasServiceException = 
mock(AtlasServiceException.class);
+when(atlasServiceException.getMessage()).thenReturn("import or export is 
in progress");
+
when(atlasClientV2.exportData(any(AtlasExportRequest.class))).thenThrow(atlasServiceException);
+when(exportRequest.toString()).thenReturn("dummyExportRequest");
+when(conf.getTimeVar(HiveConf.ConfVars.REPL_RETRY_TOTAL_DURATION, 
TimeUnit.SECONDS)).thenReturn(60L);
+when(conf.getTimeVar(HiveConf.ConfVars.REPL_RETRY_INTIAL_DELAY, 
TimeUnit.SECONDS)).thenReturn(10L);
+
when(conf.getTimeVar(HiveConf.ConfVars.REPL_RETRY_MAX_DELAY_BETWEEN_RETRIES, 
TimeUnit.SECONDS)).thenReturn(20L);
+
when(conf.getFloatVar(HiveConf.ConfVars.REPL_RETRY_BACKOFF_COEFFICIENT)).thenReturn(2.0f);
+AtlasRestClient atlasClient = new AtlasRestClientImpl(atlasClientV2, conf);
+AtlasRestClientImpl atlasRestClientImpl = (AtlasRestClientImpl)atlasClient;
+InputStream inputStream = null;
+try {
+  inputStream = atlasRestClientImpl.exportData(exportRequest);
+  Assert.fail("Should have thrown SemanticException.");
+} catch (SemanticException ex) {
+  Assert.assertTrue(ex.getMessage().contains("Retry exhausted for 
retryable error code"));

Review comment:
   The actual exception should also be present
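
   A possible shape for that assertion (a hedged sketch; it assumes the retry code chains the original AtlasServiceException as the cause, which should be verified against the implementation before relying on it):

{code:java}
// a hedged sketch of the stronger assertion being requested: verify not just
// the retry-exhausted message but also that the original exception survives
try {
  atlasRestClientImpl.exportData(exportRequest);
  Assert.fail("Should have thrown SemanticException.");
} catch (SemanticException ex) {
  Assert.assertTrue(ex.getMessage().contains("Retry exhausted for retryable error code"));
  // assumes the original exception is chained as the cause
  Assert.assertTrue(ex.getCause() instanceof AtlasServiceException);
  Assert.assertTrue(ex.getCause().getMessage().contains("import or export is in progress"));
}
{code}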

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/AtlasDumpTask.java
##
@@ -194,9 +194,11 @@ long dumpAtlasMetaData(AtlasRequestBuilder 
atlasRequestBuilder, AtlasReplInfo at
   AtlasExportRequest exportRequest = 
atlasRequestBuilder.createExportRequest(atlasReplInfo,
   atlasReplInfo.getSrcCluster());
   inputStream = atlasRestClient.exportData(exportRequest);
-  FileSystem fs = 
atlasReplInfo.getStagingDir().getFileSystem(atlasReplInfo.getConf());
-  Path exportFilePath = new Path(atlasReplInfo.getStagingDir(), 
ReplUtils.REPL_ATLAS_EXPORT_FILE_NAME);
-  numBytesWritten = Utils.writeFile(fs, exportFilePath, inputStream, conf);
+  if (inputStream != null) {
+FileSystem fs = 
atlasReplInfo.getStagingDir().getFileSystem(atlasReplInfo.getConf());
+Path exportFilePath = new Path(atlasReplInfo.getStagingDir(), 
ReplUtils.REPL_ATLAS_EXP

[jira] [Work logged] (HIVE-24231) Enhance shared work optimizer to merge scans with filters on both sides

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24231?focusedWorklogId=499959&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-499959
 ]

ASF GitHub Bot logged work on HIVE-24231:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 09:58
Start Date: 13/Oct/20 09:58
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1553:
URL: https://github.com/apache/hive/pull/1553#discussion_r503823443



##
File path: ql/src/test/queries/clientpositive/explainuser_1.q
##
@@ -9,6 +9,7 @@
 --! qt:dataset:cbo_t1
 set hive.vectorized.execution.enabled=false;
 set hive.strict.checks.bucketing=false;
+set hive.optimize.shared.work.dppunion=false;

Review comment:
   I didn't want the new feature to cause further churn in the q.out files of 
existing "directed" tests.
   
   I've removed these set calls from the q files.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 499959)
Time Spent: 1h 20m  (was: 1h 10m)

> Enhance shared work optimizer to merge scans with filters on both sides
> ---
>
> Key: HIVE-24231
> URL: https://issues.apache.org/jira/browse/HIVE-24231
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24231) Enhance shared work optimizer to merge scans with filters on both sides

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24231?focusedWorklogId=499957&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-499957
 ]

ASF GitHub Bot logged work on HIVE-24231:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 09:54
Start Date: 13/Oct/20 09:54
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1553:
URL: https://github.com/apache/hive/pull/1553#discussion_r503821169



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java
##
@@ -386,125 +456,81 @@ public boolean sharedWorkOptimization(ParseContext pctx, 
SharedWorkOptimizerCach
   LOG.debug("Merging subtree starting at {} into subtree starting 
at {}",
   discardableTsOp, retainableTsOp);
 } else {
-  ExprNodeDesc newRetainableTsFilterExpr = null;
-  List semijoinExprNodes = new ArrayList<>();
-  if (retainableTsOp.getConf().getFilterExpr() != null) {
-// Gather SJ expressions and normal expressions
-List allExprNodesExceptSemijoin = new 
ArrayList<>();
-splitExpressions(retainableTsOp.getConf().getFilterExpr(),
-allExprNodesExceptSemijoin, semijoinExprNodes);
-// Create new expressions
-if (allExprNodesExceptSemijoin.size() > 1) {
-  newRetainableTsFilterExpr = 
ExprNodeGenericFuncDesc.newInstance(
-  new GenericUDFOPAnd(), allExprNodesExceptSemijoin);
-} else if (allExprNodesExceptSemijoin.size() > 0 &&
-allExprNodesExceptSemijoin.get(0) instanceof 
ExprNodeGenericFuncDesc) {
-  newRetainableTsFilterExpr = 
allExprNodesExceptSemijoin.get(0);
-}
-// Push filter on top of children for retainable
-pushFilterToTopOfTableScan(optimizerCache, retainableTsOp);
+
+  if (sr.discardableOps.size() > 1) {
+throw new RuntimeException("we can't discard more in this 
path");
   }
-  ExprNodeDesc newDiscardableTsFilterExpr = null;
-  if (discardableTsOp.getConf().getFilterExpr() != null) {
-// If there is a single discardable operator, it is a 
TableScanOperator
-// and it means that we will merge filter expressions for it. 
Thus, we
-// might need to remove DPP predicates before doing that
-List allExprNodesExceptSemijoin = new 
ArrayList<>();
-splitExpressions(discardableTsOp.getConf().getFilterExpr(),
-allExprNodesExceptSemijoin, new ArrayList<>());
-// Create new expressions
-if (allExprNodesExceptSemijoin.size() > 1) {
-  newDiscardableTsFilterExpr = 
ExprNodeGenericFuncDesc.newInstance(
-  new GenericUDFOPAnd(), allExprNodesExceptSemijoin);
-} else if (allExprNodesExceptSemijoin.size() > 0 &&
-allExprNodesExceptSemijoin.get(0) instanceof 
ExprNodeGenericFuncDesc) {
-  newDiscardableTsFilterExpr = 
allExprNodesExceptSemijoin.get(0);
-}
-// Remove and add semijoin filter from expressions
-replaceSemijoinExpressions(discardableTsOp, semijoinExprNodes);
-// Push filter on top of children for discardable
-pushFilterToTopOfTableScan(optimizerCache, discardableTsOp);
+
+  SharedWorkModel modelR = new SharedWorkModel(retainableTsOp);
+  SharedWorkModel modelD = new SharedWorkModel(discardableTsOp);
+
+  // Push filter on top of children for retainable
+  pushFilterToTopOfTableScan(optimizerCache, retainableTsOp);
+
+  if (mode == Mode.RemoveSemijoin || mode == Mode.SubtreeMerge) {
+// FIXME: I think idea here is to clear the discardable's 
semijoin filter
+// - by using the retainable's (which should be empty in case 
of this mode)
+replaceSemijoinExpressions(discardableTsOp, 
modelR.getSemiJoinFilter());
   }
+  // Push filter on top of children for discardable
+  pushFilterToTopOfTableScan(optimizerCache, discardableTsOp);
+
   // Obtain filter for shared TS operator
-  ExprNodeGenericFuncDesc exprNode = null;
-  if (newRetainableTsFilterExpr != null && 
newDiscardableTsFilterExpr != null) {
-// Combine
-exprNode = (ExprNodeGenericFuncDesc) newRetainableTsFilterExpr;
-if (!exprNode.isSame(newDiscardableTsFilterExpr)) {
-  // We merge filters from previous scan by ORing with filters 
from current scan
-  if (exprNode.getGene

[jira] [Commented] (HIVE-21737) Upgrade Avro to version 1.10.0

2020-10-13 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213014#comment-17213014
 ] 

Ismaël Mejía commented on HIVE-21737:
-

[~csun] I think the first thing I will probably do is update the Hive patch 
to refer temporarily to the Avro SNAPSHOT that fixes the validation issue, to be 
sure it passes the full Hive suite before doing the release of Avro. WDYT? I 
hope the Hive build lets us use the SNAPSHOT deps only for the tests, and I 
will update the patch once the release is out.

> Upgrade Avro to version 1.10.0
> --
>
> Key: HIVE-21737
> URL: https://issues.apache.org/jira/browse/HIVE-21737
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Ismaël Mejía
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: pull-request-available
> Attachments: 0001-HIVE-21737-Bump-Apache-Avro-to-1.9.2.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Avro >= 1.9.x brings a lot of fixes, including a leaner version of Avro without 
> Jackson in the public API or Guava as a dependency. Worth the update.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24231) Enhance shared work optimizer to merge scans with filters on both sides

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24231?focusedWorklogId=499948&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-499948
 ]

ASF GitHub Bot logged work on HIVE-24231:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 09:40
Start Date: 13/Oct/20 09:40
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1553:
URL: https://github.com/apache/hive/pull/1553#discussion_r503812132



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java
##
@@ -386,125 +456,81 @@ public boolean sharedWorkOptimization(ParseContext pctx, 
SharedWorkOptimizerCach
   LOG.debug("Merging subtree starting at {} into subtree starting 
at {}",
   discardableTsOp, retainableTsOp);
 } else {
-  ExprNodeDesc newRetainableTsFilterExpr = null;
-  List semijoinExprNodes = new ArrayList<>();
-  if (retainableTsOp.getConf().getFilterExpr() != null) {
-// Gather SJ expressions and normal expressions
-List allExprNodesExceptSemijoin = new 
ArrayList<>();
-splitExpressions(retainableTsOp.getConf().getFilterExpr(),
-allExprNodesExceptSemijoin, semijoinExprNodes);
-// Create new expressions
-if (allExprNodesExceptSemijoin.size() > 1) {
-  newRetainableTsFilterExpr = 
ExprNodeGenericFuncDesc.newInstance(
-  new GenericUDFOPAnd(), allExprNodesExceptSemijoin);
-} else if (allExprNodesExceptSemijoin.size() > 0 &&
-allExprNodesExceptSemijoin.get(0) instanceof 
ExprNodeGenericFuncDesc) {
-  newRetainableTsFilterExpr = 
allExprNodesExceptSemijoin.get(0);
-}
-// Push filter on top of children for retainable
-pushFilterToTopOfTableScan(optimizerCache, retainableTsOp);
+
+  if (sr.discardableOps.size() > 1) {
+throw new RuntimeException("we can't discard more in this 
path");

Review comment:
   a few things could go south here - one is that pushing filters out from the 
discardable TS will most likely not work as desired.
   
   I'm tempted to remove this multi-operator matching stuff in HIVE-24241, 
because that approach is much simpler and more cleanly separated from the 
merging of the operators.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 499948)
Time Spent: 1h  (was: 50m)

> Enhance shared work optimizer to merge scans with filters on both sides
> ---
>
> Key: HIVE-24231
> URL: https://issues.apache.org/jira/browse/HIVE-24231
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24231) Enhance shared work optimizer to merge scans with filters on both sides

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24231?focusedWorklogId=499945&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-499945
 ]

ASF GitHub Bot logged work on HIVE-24231:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 09:35
Start Date: 13/Oct/20 09:35
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1553:
URL: https://github.com/apache/hive/pull/1553#discussion_r503809039



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java
##
@@ -326,6 +372,7 @@ public boolean sharedWorkOptimization(ParseContext pctx, 
SharedWorkOptimizerCach
 LOG.debug("{} and {} cannot be merged", retainableTsOp, 
discardableTsOp);
 continue;
   }
+  // FIXME: I think this optimization is assymetric; but the check 
is symmetric

Review comment:
   this could be done as a cleanup - however, I've already concluded that, 
because of the table ordering, the problematic case will never actually happen; 
so we are safe.
   
   I've removed the FIXME.
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 499945)
Time Spent: 50m  (was: 40m)

> Enhance shared work optimizer to merge scans with filters on both sides
> ---
>
> Key: HIVE-24231
> URL: https://issues.apache.org/jira/browse/HIVE-24231
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24231) Enhance shared work optimizer to merge scans with filters on both sides

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24231?focusedWorklogId=499942&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-499942
 ]

ASF GitHub Bot logged work on HIVE-24231:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 09:32
Start Date: 13/Oct/20 09:32
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1553:
URL: https://github.com/apache/hive/pull/1553#discussion_r503806798



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java
##
@@ -386,125 +456,81 @@ public boolean sharedWorkOptimization(ParseContext pctx, 
SharedWorkOptimizerCach
   LOG.debug("Merging subtree starting at {} into subtree starting 
at {}",
   discardableTsOp, retainableTsOp);
 } else {
-  ExprNodeDesc newRetainableTsFilterExpr = null;
-  List semijoinExprNodes = new ArrayList<>();
-  if (retainableTsOp.getConf().getFilterExpr() != null) {
-// Gather SJ expressions and normal expressions
-List allExprNodesExceptSemijoin = new 
ArrayList<>();
-splitExpressions(retainableTsOp.getConf().getFilterExpr(),
-allExprNodesExceptSemijoin, semijoinExprNodes);
-// Create new expressions
-if (allExprNodesExceptSemijoin.size() > 1) {
-  newRetainableTsFilterExpr = 
ExprNodeGenericFuncDesc.newInstance(
-  new GenericUDFOPAnd(), allExprNodesExceptSemijoin);
-} else if (allExprNodesExceptSemijoin.size() > 0 &&
-allExprNodesExceptSemijoin.get(0) instanceof 
ExprNodeGenericFuncDesc) {
-  newRetainableTsFilterExpr = 
allExprNodesExceptSemijoin.get(0);
-}
-// Push filter on top of children for retainable
-pushFilterToTopOfTableScan(optimizerCache, retainableTsOp);
+
+  if (sr.discardableOps.size() > 1) {
+throw new RuntimeException("we can't discard more in this 
path");
   }
-  ExprNodeDesc newDiscardableTsFilterExpr = null;
-  if (discardableTsOp.getConf().getFilterExpr() != null) {
-// If there is a single discardable operator, it is a 
TableScanOperator
-// and it means that we will merge filter expressions for it. 
Thus, we
-// might need to remove DPP predicates before doing that
-List allExprNodesExceptSemijoin = new 
ArrayList<>();
-splitExpressions(discardableTsOp.getConf().getFilterExpr(),
-allExprNodesExceptSemijoin, new ArrayList<>());
-// Create new expressions
-if (allExprNodesExceptSemijoin.size() > 1) {
-  newDiscardableTsFilterExpr = 
ExprNodeGenericFuncDesc.newInstance(
-  new GenericUDFOPAnd(), allExprNodesExceptSemijoin);
-} else if (allExprNodesExceptSemijoin.size() > 0 &&
-allExprNodesExceptSemijoin.get(0) instanceof 
ExprNodeGenericFuncDesc) {
-  newDiscardableTsFilterExpr = 
allExprNodesExceptSemijoin.get(0);
-}
-// Remove and add semijoin filter from expressions
-replaceSemijoinExpressions(discardableTsOp, semijoinExprNodes);
-// Push filter on top of children for discardable
-pushFilterToTopOfTableScan(optimizerCache, discardableTsOp);
+
+  SharedWorkModel modelR = new SharedWorkModel(retainableTsOp);
+  SharedWorkModel modelD = new SharedWorkModel(discardableTsOp);
+
+  // Push filter on top of children for retainable
+  pushFilterToTopOfTableScan(optimizerCache, retainableTsOp);
+
+  if (mode == Mode.RemoveSemijoin || mode == Mode.SubtreeMerge) {
+// FIXME: I think idea here is to clear the discardable's 
semijoin filter

Review comment:
   I made this note because it was not obvious to me what's happening 
here. I've rephrased it to be easier to understand.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 499942)
Time Spent: 40m  (was: 0.5h)

> Enhance shared work optimizer to merge scans with filters on both sides
> ---
>
> Key: HIVE-24231
> URL: https://issues.apache.org/jira/browse/HIVE-24231
> Project: Hive
> 

[jira] [Commented] (HIVE-24265) Fix acid_stats2 test

2020-10-13 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212986#comment-17212986
 ] 

Zoltan Haindrich commented on HIVE-24265:
-

this is odd... HIVE-24202 doesn't look like something which would have made 
stat-related changes.
I'll validate the bisect's result.

> Fix acid_stats2 test
> 
>
> Key: HIVE-24265
> URL: https://issues.apache.org/jira/browse/HIVE-24265
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Priority: Major
>
> This test's failure started to create incorrect junit XMLs which were not 
> counted correctly by Jenkins.
> I'll disable the test now - and provide details on when it failed first.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24265) Fix acid_stats2 test

2020-10-13 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212983#comment-17212983
 ] 

Zoltan Haindrich commented on HIVE-24265:
-

bisect ended up with 8f4f3b90fa5987e82025ecf81f8084c90130fd6b / HIVE-24202
cc: [~jcamachorodriguez]

> Fix acid_stats2 test
> 
>
> Key: HIVE-24265
> URL: https://issues.apache.org/jira/browse/HIVE-24265
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Priority: Major
>
> This test's failure started to create incorrect junit XMLs which were not 
> counted correctly by Jenkins.
> I'll disable the test now - and provide details on when it failed first.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24063) SqlFunctionConverter#getHiveUDF handles cast before getting FunctionInfo

2020-10-13 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng resolved HIVE-24063.

Fix Version/s: 4.0.0
   Resolution: Fixed

> SqlFunctionConverter#getHiveUDF handles cast before getting FunctionInfo
> ---
>
> Key: HIVE-24063
> URL: https://issues.apache.org/jira/browse/HIVE-24063
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When the current SqlOperator is SqlCastFunction, 
> FunctionRegistry.getFunctionInfo would return null, 
> but when hive.allow.udf.load.on.demand is enabled, HiveServer2 will refer to 
> the metastore for the function definition; an exception stack trace can then 
> be seen in the HiveServer2 log:
> INFO exec.FunctionRegistry: Unable to look up default.cast in metastore
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> NoSuchObjectException(message:Function @hive#default.cast does not exist)
>  at org.apache.hadoop.hive.ql.metadata.Hive.getFunction(Hive.java:5495) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.Registry.getFunctionInfoFromMetastoreNoLock(Registry.java:788)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.Registry.getQualifiedFunctionInfo(Registry.java:657)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.Registry.getFunctionInfo(Registry.java:351) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:597)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.SqlFunctionConverter.getHiveUDF(SqlFunctionConverter.java:158)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:112)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:68)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:134)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:68)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:134)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:68)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:134)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] 
>  
> So it may be better to handle the explicit cast before getting the FunctionInfo 
> from the Registry. Even if there is no cast in the query, the method 
> handleExplicitCast returns null quickly when op.kind is not SqlKind.CAST.
>  
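
A minimal sketch of the reordering the description proposes (not the committed patch; the method shape is simplified for illustration):

{code:java}
// handle the cheap explicit-cast path first, and only fall back to the
// registry (which may hit the metastore when hive.allow.udf.load.on.demand
// is enabled) for non-CAST operators
GenericUDF getUDF(SqlOperator op, RelDataType dt) throws SemanticException {
  GenericUDF castUDF = handleExplicitCast(op, dt); // returns null fast unless op.kind == SqlKind.CAST
  if (castUDF != null) {
    return castUDF;                                // a CAST never reaches the registry now
  }
  FunctionInfo fi = FunctionRegistry.getFunctionInfo(op.getName());
  return fi == null ? null : fi.getGenericUDF();
}
{code}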



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24063) SqlFunctionConverter#getHiveUDF handles cast before getting FunctionInfo

2020-10-13 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212980#comment-17212980
 ] 

Zhihua Deng commented on HIVE-24063:


Thanks very much for your review, [~kgyrtkirk]!

> SqlFunctionConverter#getHiveUDF handles cast before getting FunctionInfo
> ---
>
> Key: HIVE-24063
> URL: https://issues.apache.org/jira/browse/HIVE-24063
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When the current SqlOperator is SqlCastFunction, 
> FunctionRegistry.getFunctionInfo would return null, 
> but when hive.allow.udf.load.on.demand is enabled, HiveServer2 will refer to 
> the metastore for the function definition; an exception stack trace can then 
> be seen in the HiveServer2 log:
> INFO exec.FunctionRegistry: Unable to look up default.cast in metastore
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> NoSuchObjectException(message:Function @hive#default.cast does not exist)
>  at org.apache.hadoop.hive.ql.metadata.Hive.getFunction(Hive.java:5495) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.Registry.getFunctionInfoFromMetastoreNoLock(Registry.java:788)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.Registry.getQualifiedFunctionInfo(Registry.java:657)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.Registry.getFunctionInfo(Registry.java:351) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:597)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.SqlFunctionConverter.getHiveUDF(SqlFunctionConverter.java:158)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:112)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:68)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:134)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:68)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:134)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:68)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:134)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] 
>  
> So it may be better to handle the explicit cast before getting the FunctionInfo 
> from the Registry. Even if there is no cast in the query, the method 
> handleExplicitCast returns null quickly when op.kind is not SqlKind.CAST.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24069) HiveHistory should log the task that ends abnormally

2020-10-13 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212977#comment-17212977
 ] 

Zhihua Deng commented on HIVE-24069:


Thanks so much for your help and review, [~kgyrtkirk]!

> HiveHistory should log the task that ends abnormally
> 
>
> Key: HIVE-24069
> URL: https://issues.apache.org/jira/browse/HIVE-24069
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When the task returns with an exitVal not equal to 0, the Executor would 
> skip marking the task return code and calling endTask. This may make the 
> history log incomplete for such tasks.
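
A hedged sketch of the fix direction (not the committed patch; the surrounding accessor shapes are assumptions): record the return code and end the task in the history log for every outcome, not only on success:

{code:java}
int exitVal = tskRes.getExitVal();  // assumed accessor, for illustration
if (SessionState.get().getHiveHistory() != null) {
  SessionState.get().getHiveHistory().setTaskProperty(queryId, tsk.getId(),
      HiveHistory.Keys.TASK_RET_CODE, String.valueOf(exitVal));
  SessionState.get().getHiveHistory().endTask(queryId, tsk);  // previously skipped when exitVal != 0
}
{code}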



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24107) Fix typo in ReloadFunctionsOperation

2020-10-13 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212974#comment-17212974
 ] 

Zhihua Deng commented on HIVE-24107:


Thank you very much for your help and review, [~kgyrtkirk]!

> Fix typo in ReloadFunctionsOperation
> 
>
> Key: HIVE-24107
> URL: https://issues.apache.org/jira/browse/HIVE-24107
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Hive.get() will register all functions as doRegisterAllFns is true, so 
> Hive.get().reloadFunctions() may load all functions from the metastore twice; 
> using Hive.get(false) instead may be better.
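
In code form, the suggested change is a one-liner (a sketch based on the description above):

{code:java}
// Hive.get() eagerly registers all functions (doRegisterAllFns == true), so
// chaining reloadFunctions() onto it loads everything from the metastore twice;
// fetching the handle with false skips the eager registration
Hive.get(false).reloadFunctions();
{code}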



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24248) TestMiniLlapLocalCliDriver[subquery_join_rewrite] is flaky

2020-10-13 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212971#comment-17212971
 ] 

Zhihua Deng commented on HIVE-24248:


Thanks very much for your help [~kgyrtkirk] and review [~kkasa]!

> TestMiniLlapLocalCliDriver[subquery_join_rewrite] is flaky
> --
>
> Key: HIVE-24248
> URL: https://issues.apache.org/jira/browse/HIVE-24248
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-1205/26/tests]
> {code:java}
> java.lang.AssertionError:
> Client Execution succeeded but contained differences (error code = 1) after 
> executing subquery_join_rewrite.q
> 241,244d240
> < 1 1
> < 1 2
> < 2 1
> < 2 2
> 245a242,243
> > 2 2
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24146) Cleanup TaskExecutionException in GenericUDTFExplode

2020-10-13 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212967#comment-17212967
 ] 

Zhihua Deng commented on HIVE-24146:


Thank you very much for your help and review, [~kgyrtkirk]!

> Cleanup TaskExecutionException in GenericUDTFExplode
> 
>
> Key: HIVE-24146
> URL: https://issues.apache.org/jira/browse/HIVE-24146
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> - Remove TaskExecutionException, which may not be used anymore;
> - Remove the default handling in GenericUDTFExplode#process, which has already 
> been verified during function initialization.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24246) Fix for Ranger Deny policy overriding policy with same resource name

2020-10-13 Thread Anishek Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anishek Agarwal updated HIVE-24246:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Fix for Ranger Deny policy overriding policy with same resource name 
> -
>
> Key: HIVE-24246
> URL: https://issues.apache.org/jira/browse/HIVE-24246
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24246.01.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24246) Fix for Ranger Deny policy overriding policy with same resource name

2020-10-13 Thread Anishek Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212958#comment-17212958
 ] 

Anishek Agarwal commented on HIVE-24246:


Committed to master. Thanks for the patch [~aasha] and the review [~pkumarsinha]!

> Fix for Ranger Deny policy overriding policy with same resource name 
> -
>
> Key: HIVE-24246
> URL: https://issues.apache.org/jira/browse/HIVE-24246
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24246.01.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24246) Fix for Ranger Deny policy overriding policy with same resource name

2020-10-13 Thread Pravin Sinha (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212931#comment-17212931
 ] 

Pravin Sinha commented on HIVE-24246:
-

+1

> Fix for Ranger Deny policy overriding policy with same resource name 
> -
>
> Key: HIVE-24246
> URL: https://issues.apache.org/jira/browse/HIVE-24246
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24246.01.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24248) TestMiniLlapLocalCliDriver[subquery_join_rewrite] is flaky

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24248?focusedWorklogId=499910&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-499910
 ]

ASF GitHub Bot logged work on HIVE-24248:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 08:17
Start Date: 13/Oct/20 08:17
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #1565:
URL: https://github.com/apache/hive/pull/1565


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 499910)
Time Spent: 2h 10m  (was: 2h)

> TestMiniLlapLocalCliDriver[subquery_join_rewrite] is flaky
> --
>
> Key: HIVE-24248
> URL: https://issues.apache.org/jira/browse/HIVE-24248
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-1205/26/tests]
> {code:java}
> java.lang.AssertionError:
> Client Execution succeeded but contained differences (error code = 1) after 
> executing subquery_join_rewrite.q
> 241,244d240
> < 1 1
> < 1 2
> < 2 1
> < 2 2
> 245a242,243
> > 2 2
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24246) Fix for Ranger Deny policy overriding policy with same resource name

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24246?focusedWorklogId=499911&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-499911
 ]

ASF GitHub Bot logged work on HIVE-24246:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 08:17
Start Date: 13/Oct/20 08:17
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1566:
URL: https://github.com/apache/hive/pull/1566#discussion_r503755873



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
##
@@ -162,7 +162,8 @@ private void initiateAuthorizationLoadTask() throws 
SemanticException {
 if 
(RANGER_AUTHORIZER.equalsIgnoreCase(conf.getVar(HiveConf.ConfVars.REPL_AUTHORIZATION_PROVIDER_SERVICE)))
 {
   Path rangerLoadRoot = new Path(new Path(work.dumpDirectory).getParent(), 
ReplUtils.REPL_RANGER_BASE_DIR);
   LOG.info("Adding Import Ranger Metadata Task from {} ", rangerLoadRoot);
-  RangerLoadWork rangerLoadWork = new RangerLoadWork(rangerLoadRoot, 
work.getSourceDbName(), work.dbNameToLoadIn,
+  String targetDbName = StringUtils.isEmpty(work.dbNameToLoadIn) ? 
work.getSourceDbName() : work.dbNameToLoadIn;

Review comment:
   no - this is for when you don't pass the target db name





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 499911)
Time Spent: 40m  (was: 0.5h)

> Fix for Ranger Deny policy overriding policy with same resource name 
> -
>
> Key: HIVE-24246
> URL: https://issues.apache.org/jira/browse/HIVE-24246
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24246.01.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24248) TestMiniLlapLocalCliDriver[subquery_join_rewrite] is flaky

2020-10-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24248:
---

Assignee: Zhihua Deng

> TestMiniLlapLocalCliDriver[subquery_join_rewrite] is flaky
> --
>
> Key: HIVE-24248
> URL: https://issues.apache.org/jira/browse/HIVE-24248
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-1205/26/tests]
> {code:java}
> java.lang.AssertionError:
> Client Execution succeeded but contained differences (error code = 1) after 
> executing subquery_join_rewrite.q
> 241,244d240
> < 1 1
> < 1 2
> < 2 1
> < 2 2
> 245a242,243
> > 2 2
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24248) TestMiniLlapLocalCliDriver[subquery_join_rewrite] is flaky

2020-10-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24248.
-
Resolution: Fixed

Merged into master. Thank you Zhihua Deng for fixing this and Krisztian for 
reviewing the changes!

> TestMiniLlapLocalCliDriver[subquery_join_rewrite] is flaky
> --
>
> Key: HIVE-24248
> URL: https://issues.apache.org/jira/browse/HIVE-24248
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-1205/26/tests]
> {code:java}
> java.lang.AssertionError:
> Client Execution succeeded but contained differences (error code = 1) after 
> executing subquery_join_rewrite.q
> 241,244d240
> < 1 1
> < 1 2
> < 2 1
> < 2 2
> 245a242,243
> > 2 2
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24248) TestMiniLlapLocalCliDriver[subquery_join_rewrite] is flaky

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24248?focusedWorklogId=499909&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-499909
 ]

ASF GitHub Bot logged work on HIVE-24248:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 08:16
Start Date: 13/Oct/20 08:16
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #1565:
URL: https://github.com/apache/hive/pull/1565#issuecomment-707574375


   test seems to be stable
   http://ci.hive.apache.org/job/hive-flaky-check/131/



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 499909)
Time Spent: 2h  (was: 1h 50m)

> TestMiniLlapLocalCliDriver[subquery_join_rewrite] is flaky
> --
>
> Key: HIVE-24248
> URL: https://issues.apache.org/jira/browse/HIVE-24248
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-1205/26/tests]
> {code:java}
> java.lang.AssertionError:
> Client Execution succeeded but contained differences (error code = 1) after 
> executing subquery_join_rewrite.q
> 241,244d240
> < 1 1
> < 1 2
> < 2 1
> < 2 2
> 245a242,243
> > 2 2
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24145) Fix preemption issues in reducers and file sink operators

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24145?focusedWorklogId=499908&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-499908
 ]

ASF GitHub Bot logged work on HIVE-24145:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 08:13
Start Date: 13/Oct/20 08:13
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk closed pull request #1485:
URL: https://github.com/apache/hive/pull/1485


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 499908)
Time Spent: 1h 20m  (was: 1h 10m)

> Fix preemption issues in reducers and file sink operators
> -
>
> Key: HIVE-24145
> URL: https://issues.apache.org/jira/browse/HIVE-24145
> Project: Hive
>  Issue Type: Bug
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> There are two issues caused by preemption:
>  # Reducers are reordered as part of optimizations, which causes more 
> preemptions to happen.
>  # Preemption in the middle of writing can leave a file unclosed, leading to 
> errors when the file is read later (see the sketch below).
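> 
> A hedged illustration of the second issue (simplified; writer handling in the 
> real FileSinkOperator is more involved, and the writer field here is an 
> assumption):
> {code:java}
> // Close the underlying writer even on abort/preemption; leaving it open
> // produces a truncated file that fails when it is read later.
> @Override
> public void closeOp(boolean abort) throws HiveException {
>   try {
>     if (writer != null) {
>       writer.close();
>     }
>   } catch (IOException e) {
>     throw new HiveException(e);
>   }
> }
> {code}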



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24146) Cleanup TaskExecutionException in GenericUDTFExplode

2020-10-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24146.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into master. Thank you Zhihua Deng!

> Cleanup TaskExecutionException in GenericUDTFExplode
> 
>
> Key: HIVE-24146
> URL: https://issues.apache.org/jira/browse/HIVE-24146
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> - Remove TaskExecutionException, which may no longer be used;
> - Remove the default handling in GenericUDTFExplode#process, since the input 
> category has already been verified during function initialization (see the 
> sketch below).
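> 
> A minimal sketch of the simplified process() (field names such as inputOI, 
> forwardListObj, and forwardMapObj are assumptions based on the description, 
> not the exact patch):
> {code:java}
> @Override
> public void process(Object[] o) throws HiveException {
>   switch (inputOI.getCategory()) {
>   case LIST:
>     for (Object r : ((ListObjectInspector) inputOI).getList(o[0])) {
>       forwardListObj[0] = r;
>       forward(forwardListObj);
>     }
>     break;
>   case MAP:
>     for (Map.Entry<?, ?> e : ((MapObjectInspector) inputOI).getMap(o[0]).entrySet()) {
>       forwardMapObj[0] = e.getKey();
>       forwardMapObj[1] = e.getValue();
>       forward(forwardMapObj);
>     }
>     break;
>   // no default branch: initialize() already rejects non-LIST/MAP inputs,
>   // so TaskExecutionException is no longer needed here.
>   }
> }
> {code}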



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24107) Fix typo in ReloadFunctionsOperation

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24107?focusedWorklogId=499906&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-499906
 ]

ASF GitHub Bot logged work on HIVE-24107:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 08:12
Start Date: 13/Oct/20 08:12
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #1457:
URL: https://github.com/apache/hive/pull/1457


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 499906)
Time Spent: 1h  (was: 50m)

> Fix typo in ReloadFunctionsOperation
> 
>
> Key: HIVE-24107
> URL: https://issues.apache.org/jira/browse/HIVE-24107
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Hive.get() registers all functions because doRegisterAllFns is true, so 
> Hive.get().reloadFunctions() may load all functions from the metastore twice; 
> using Hive.get(false) instead may be better.
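> 
> A minimal sketch of the proposed change (assuming the boolean argument of 
> Hive.get(boolean) controls the eager function registration described above):
> {code:java}
> // Before: Hive.get() eagerly registers all functions (doRegisterAllFns is
> // true), and reloadFunctions() then loads them from the metastore again.
> Hive.get().reloadFunctions();
> 
> // After: skip the eager registration so functions are loaded only once.
> Hive.get(false).reloadFunctions();
> {code}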



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24107) Fix typo in ReloadFunctionsOperation

2020-10-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24107.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into master. Thank you [~dengzh]!

> Fix typo in ReloadFunctionsOperation
> 
>
> Key: HIVE-24107
> URL: https://issues.apache.org/jira/browse/HIVE-24107
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Hive.get() registers all functions because doRegisterAllFns is true, so 
> Hive.get().reloadFunctions() may load all functions from the metastore twice; 
> using Hive.get(false) instead may be better.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24146) Cleanup TaskExecutionException in GenericUDTFExplode

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24146?focusedWorklogId=499907&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-499907
 ]

ASF GitHub Bot logged work on HIVE-24146:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 08:12
Start Date: 13/Oct/20 08:12
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #1483:
URL: https://github.com/apache/hive/pull/1483


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 499907)
Time Spent: 1h  (was: 50m)

> Cleanup TaskExecutionException in GenericUDTFExplode
> 
>
> Key: HIVE-24146
> URL: https://issues.apache.org/jira/browse/HIVE-24146
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> - Remove TaskExecutionException, which may no longer be used;
> - Remove the default handling in GenericUDTFExplode#process, since the input 
> category has already been verified during function initialization.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24069) HiveHistory should log the task that ends abnormally

2020-10-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24069.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into master. Thank you [~dengzh]!

> HiveHistory should log the task that ends abnormally
> 
>
> Key: HIVE-24069
> URL: https://issues.apache.org/jira/browse/HIVE-24069
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When a task returns with an exitVal not equal to 0, the executor skips 
> marking the task return code and calling endTask. This may leave the 
> history log incomplete for such tasks.
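> 
> A hedged sketch of the fix in the task-execution path (variable names such as 
> hiveHistory, queryId, and task are illustrative, not the exact Hive source):
> {code:java}
> int exitVal = result.getExitVal();
> if (hiveHistory != null) {
>   // Record the return code and end the task for failures too, not only when
>   // exitVal == 0, so the history log stays complete for abnormal tasks.
>   hiveHistory.setTaskProperty(queryId, task.getId(),
>       HiveHistory.Keys.TASK_RET_CODE, String.valueOf(exitVal));
>   hiveHistory.endTask(queryId, task);
> }
> {code}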



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24069) HiveHistory should log the task that ends abnormally

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24069?focusedWorklogId=499905&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-499905
 ]

ASF GitHub Bot logged work on HIVE-24069:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 08:10
Start Date: 13/Oct/20 08:10
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #1429:
URL: https://github.com/apache/hive/pull/1429


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 499905)
Time Spent: 50m  (was: 40m)

> HiveHistory should log the task that ends abnormally
> 
>
> Key: HIVE-24069
> URL: https://issues.apache.org/jira/browse/HIVE-24069
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When a task returns with an exitVal not equal to 0, the executor skips 
> marking the task return code and calling endTask. This may leave the 
> history log incomplete for such tasks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24059) Llap external client - Initial changes for running in cloud environment

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24059?focusedWorklogId=499897&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-499897
 ]

ASF GitHub Bot logged work on HIVE-24059:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 07:58
Start Date: 13/Oct/20 07:58
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk closed pull request #1418:
URL: https://github.com/apache/hive/pull/1418


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 499897)
Time Spent: 1h 50m  (was: 1h 40m)

> Llap external client - Initial changes for running in cloud environment
> ---
>
> Key: HIVE-24059
> URL: https://issues.apache.org/jira/browse/HIVE-24059
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Shubham Chaurasia
>Assignee: Shubham Chaurasia
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24059.01.patch
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Please see problem description in 
> https://issues.apache.org/jira/browse/HIVE-24058
> Initial changes include:
> 1. Moving the LLAP discovery logic from the client side to the server (HS2 / 
> get_splits) side.
> 2. Opening an additional RPC port in the LLAP Daemon.
> 3. JWT-based authentication on this port (see the sketch below).
> cc [~prasanth_j] [~jdere] [~anishek] [~thejas]
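> 
> A hedged illustration of JWT verification on the new port, sketched with the 
> jjwt library purely as an example; the class and key handling here are 
> assumptions, not the actual Hive implementation:
> {code:java}
> import io.jsonwebtoken.Claims;
> import io.jsonwebtoken.Jwts;
> import java.security.Key;
> 
> // Hypothetical authenticator for the extra LLAP daemon RPC port: verifies
> // the token signature and returns the authenticated subject (user).
> public final class JwtPortAuthenticator {
>   private final Key verificationKey;
> 
>   public JwtPortAuthenticator(Key verificationKey) {
>     this.verificationKey = verificationKey;
>   }
> 
>   public String authenticate(String jwt) {
>     Claims claims = Jwts.parserBuilder()
>         .setSigningKey(verificationKey)
>         .build()
>         .parseClaimsJws(jwt) // throws JwtException on bad signature/expiry
>         .getBody();
>     return claims.getSubject();
>   }
> }
> {code}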



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24063) SqlFunctionConverter#getHiveUDF handles cast before getting FunctionInfo

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24063?focusedWorklogId=499899&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-499899
 ]

ASF GitHub Bot logged work on HIVE-24063:
-

Author: ASF GitHub Bot
Created on: 13/Oct/20 07:58
Start Date: 13/Oct/20 07:58
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #1421:
URL: https://github.com/apache/hive/pull/1421


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 499899)
Time Spent: 20m  (was: 10m)

> SqlFunctionConverter#getHiveUDF handles cast before getting FunctionInfo
> ---
>
> Key: HIVE-24063
> URL: https://issues.apache.org/jira/browse/HIVE-24063
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When the current SqlOperator is SqlCastFunction, 
> FunctionRegistry.getFunctionInfo returns null; but when 
> hive.allow.udf.load.on.demand is enabled, HiveServer2 then consults the 
> metastore for the function definition, and an exception stack trace appears 
> in the HiveServer2 log:
> INFO exec.FunctionRegistry: Unable to look up default.cast in metastore
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> NoSuchObjectException(message:Function @hive#default.cast does not exist)
>  at org.apache.hadoop.hive.ql.metadata.Hive.getFunction(Hive.java:5495) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.Registry.getFunctionInfoFromMetastoreNoLock(Registry.java:788)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.Registry.getQualifiedFunctionInfo(Registry.java:657)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.Registry.getFunctionInfo(Registry.java:351) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:597)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.SqlFunctionConverter.getHiveUDF(SqlFunctionConverter.java:158)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:112)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:68)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:134)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:68)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:134)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:68)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:134)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] 
>  
> So it may be better to handle the explicit cast before getting the 
> FunctionInfo from the Registry. Even if there is no cast in the query, the 
> method handleExplicitCast returns null quickly when op.kind is not SqlKind.CAST.
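> 
> A minimal sketch of the suggested reordering in 
> SqlFunctionConverter#getHiveUDF (simplified; the real method takes additional 
> Calcite type and argument-count parameters):
> {code:java}
> // Handle an explicit CAST first: for SqlCastFunction this avoids the doomed
> // "default.cast" metastore lookup, and for any other operator
> // handleExplicitCast returns null quickly because op.kind != SqlKind.CAST.
> GenericUDF castUDF = handleExplicitCast(op, dt);
> if (castUDF != null) {
>   return castUDF;
> }
> FunctionInfo fn = FunctionRegistry.getFunctionInfo(getName(op));
> return fn == null ? null : fn.getGenericUDF();
> {code}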
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)