[jira] [Updated] (HIVE-25332) Refactor UDF CAST( as DATE)

2021-07-15 Thread Ashish Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Sharma updated HIVE-25332:
-
Summary: Refactor UDF CAST( as DATE)  (was: refactor UDF 
CAST( as DATE))

> Refactor UDF CAST( as DATE)
> 
>
> Key: HIVE-25332
> URL: https://issues.apache.org/jira/browse/HIVE-25332
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Minor
>
> Description
> CAST( as DATE) is handled by GenericUDFToDate.class, which was 
> written back in 2013 and has not been refactored since; there is also 
> duplicated code.
> DOD
> Refactor the entire UDF
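The duplicated date-parsing logic the description mentions could be consolidated into one shared helper. Below is a small, self-contained sketch using plain java.time under assumed semantics (null on malformed input, timestamp-like strings truncated to their date prefix); all names are hypothetical and this is not Hive's actual GenericUDFToDate code:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper consolidating date-parsing logic that would otherwise
// be duplicated across date-related UDFs.
class DateCastHelper {
    private static final DateTimeFormatter ISO = DateTimeFormatter.ISO_LOCAL_DATE;

    // Loosely mirrors CAST(<string> AS DATE) semantics: parse the leading
    // yyyy-MM-dd prefix, return null (not an error) on malformed input.
    public static LocalDate castToDate(String s) {
        if (s == null) {
            return null;
        }
        String trimmed = s.trim();
        // Accept timestamp-like strings by taking only the date prefix.
        if (trimmed.length() > 10) {
            trimmed = trimmed.substring(0, 10);
        }
        try {
            return LocalDate.parse(trimmed, ISO);
        } catch (DateTimeParseException e) {
            return null;
        }
    }
}
```

Each date UDF would then delegate to the shared helper instead of carrying its own copy of the parsing code.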



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25332) refactor UDF CAST( as DATE)

2021-07-15 Thread Ashish Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Sharma reassigned HIVE-25332:



> refactor UDF CAST( as DATE)
> 
>
> Key: HIVE-25332
> URL: https://issues.apache.org/jira/browse/HIVE-25332
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Minor
>
> Description
> CAST( as DATE) is handled by GenericUDFToDate.class, which was 
> written back in 2013 and has not been refactored since; there is also 
> duplicated code.
> DOD
> Refactor the entire UDF



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25333) Refactor Existing UDF

2021-07-15 Thread Ashish Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Sharma reassigned HIVE-25333:


Assignee: Ashish Sharma

> Refactor Existing UDF
> -
>
> Key: HIVE-25333
> URL: https://issues.apache.org/jira/browse/HIVE-25333
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Major
>
> Description
> Most of the UDF code was written in 2013 and has not been changed since. 
> Objective of this EPIC - 
> 1. Refactor all existing UDF implementations from UDF.class to GenericUDF.class
> 2. Clean up the code to reduce code duplication
> DOD
> Refactor all UDFs
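The UDF.class-to-GenericUDF.class migration this epic targets can be illustrated with a minimal stand-in: legacy UDF subclasses rely on reflection to resolve an `evaluate()` overload per call, while GenericUDF validates argument types once in `initialize()` and keeps the per-row `evaluate()` path reflection-free. The tiny interface below is illustrative only, not Hive's real API:

```java
// Illustrative-only sketch of the GenericUDF-style contract: type checking
// happens once, up front, instead of via per-call reflection.
class UdfStyles {
    interface MiniGenericUdf {
        void initialize(Class<?>[] argTypes);   // validate/resolve types once
        Object evaluate(Object[] args);         // hot path, no reflection
    }

    static class UpperUdf implements MiniGenericUdf {
        @Override
        public void initialize(Class<?>[] argTypes) {
            if (argTypes.length != 1 || argTypes[0] != String.class) {
                throw new IllegalArgumentException("upper() takes one string");
            }
        }

        @Override
        public Object evaluate(Object[] args) {
            // Null-safe, like SQL semantics: NULL in, NULL out.
            return args[0] == null ? null : ((String) args[0]).toUpperCase();
        }
    }

    public static String callUpper(String s) {
        UpperUdf udf = new UpperUdf();
        udf.initialize(new Class<?>[] { String.class });
        return (String) udf.evaluate(new Object[] { s });
    }
}
```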



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25297) Refactor GenericUDFDateDiff

2021-07-15 Thread Ashish Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Sharma updated HIVE-25297:
-
Parent: HIVE-25333
Issue Type: Sub-task  (was: Task)

> Refactor GenericUDFDateDiff
> ---
>
> Key: HIVE-25297
> URL: https://issues.apache.org/jira/browse/HIVE-25297
> Project: Hive
>  Issue Type: Sub-task
>  Components: UDF
>Affects Versions: All Versions
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Description
> Remove redundant code and refactor entire GenericUDFDateDiff.class code



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25332) Refactor UDF CAST( as DATE)

2021-07-15 Thread Ashish Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Sharma updated HIVE-25332:
-
Parent: HIVE-25333
Issue Type: Sub-task  (was: Improvement)

> Refactor UDF CAST( as DATE)
> 
>
> Key: HIVE-25332
> URL: https://issues.apache.org/jira/browse/HIVE-25332
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Minor
>
> Description
> CAST( as DATE) is handled by GenericUDFToDate.class, which was 
> written back in 2013 and has not been refactored since; there is also 
> duplicated code.
> DOD
> Refactor the entire UDF



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-25332) Refactor UDF CAST( as DATE)

2021-07-15 Thread Ashish Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-25332 started by Ashish Sharma.

> Refactor UDF CAST( as DATE)
> 
>
> Key: HIVE-25332
> URL: https://issues.apache.org/jira/browse/HIVE-25332
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Minor
>
> Description
> CAST( as DATE) is handled by GenericUDFToDate.class, which was 
> written back in 2013 and has not been refactored since; there is also 
> duplicated code.
> DOD
> Refactor the entire UDF



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-25333) Refactor Existing UDF

2021-07-15 Thread Ashish Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-25333 started by Ashish Sharma.

> Refactor Existing UDF
> -
>
> Key: HIVE-25333
> URL: https://issues.apache.org/jira/browse/HIVE-25333
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Major
>
> Description
> Most of the UDF code was written in 2013 and has not been changed since. 
> Objective of this EPIC - 
> 1. Refactor all existing UDF implementations from UDF.class to GenericUDF.class
> 2. Clean up the code to reduce code duplication
> DOD
> Refactor all UDFs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25332) Refactor UDF CAST( as DATE)

2021-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25332?focusedWorklogId=622896&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-622896
 ]

ASF GitHub Bot logged work on HIVE-25332:
-

Author: ASF GitHub Bot
Created on: 15/Jul/21 07:20
Start Date: 15/Jul/21 07:20
Worklog Time Spent: 10m 
  Work Description: ashish-kumar-sharma opened a new pull request #2480:
URL: https://github.com/apache/hive/pull/2480


   
   
   ### What changes were proposed in this pull request?
   
   
   Refactoring GenericUDFToDate to reduce code redundancy 
   ### Why are the changes needed?
   
   
   Complete code refactor.
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Yes


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 622896)
Remaining Estimate: 0h
Time Spent: 10m

> Refactor UDF CAST( as DATE)
> 
>
> Key: HIVE-25332
> URL: https://issues.apache.org/jira/browse/HIVE-25332
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Description
> CAST( as DATE) is handled by GenericUDFToDate.class, which was 
> written back in 2013 and has not been refactored since; there is also 
> duplicated code.
> DOD
> Refactor the entire UDF



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25332) Refactor UDF CAST( as DATE)

2021-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25332:
--
Labels: pull-request-available  (was: )

> Refactor UDF CAST( as DATE)
> 
>
> Key: HIVE-25332
> URL: https://issues.apache.org/jira/browse/HIVE-25332
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Description
> CAST( as DATE) is handled by GenericUDFToDate.class, which was 
> written back in 2013 and has not been refactored since; there is also 
> duplicated code.
> DOD
> Refactor the entire UDF



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25333) Refactor Existing UDF

2021-07-15 Thread Sankar Hariappan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-25333:

Component/s: UDF

> Refactor Existing UDF
> -
>
> Key: HIVE-25333
> URL: https://issues.apache.org/jira/browse/HIVE-25333
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Major
>
> Description
> Most of the UDF code was written in 2013 and has not been changed since. 
> Objective of this EPIC - 
> 1. Refactor all existing UDF implementations from UDF.class to GenericUDF.class
> 2. Clean up the code to reduce code duplication
> DOD
> Refactor all UDFs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25333) Refactor Existing UDF

2021-07-15 Thread Sankar Hariappan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-25333:

Target Version/s: 4.0.0

> Refactor Existing UDF
> -
>
> Key: HIVE-25333
> URL: https://issues.apache.org/jira/browse/HIVE-25333
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Major
>  Labels: Refactoring, UDF
>
> Description
> Most of the UDF code was written in 2013 and has not been changed since. 
> Objective of this EPIC - 
> 1. Refactor all existing UDF implementations from UDF.class to GenericUDF.class
> 2. Clean up the code to reduce code duplication
> DOD
> Refactor all UDFs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25333) Refactor Existing UDF

2021-07-15 Thread Sankar Hariappan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-25333:

Labels: Refactoring UDF  (was: )

> Refactor Existing UDF
> -
>
> Key: HIVE-25333
> URL: https://issues.apache.org/jira/browse/HIVE-25333
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Major
>  Labels: Refactoring, UDF
>
> Description
> Most of the UDF code was written in 2013 and has not been changed since. 
> Objective of this EPIC - 
> 1. Refactor all existing UDF implementations from UDF.class to GenericUDF.class
> 2. Clean up the code to reduce code duplication
> DOD
> Refactor all UDFs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25333) Refactor Existing UDF

2021-07-15 Thread Sankar Hariappan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-25333:

Affects Version/s: 4.0.0
   3.1.2

> Refactor Existing UDF
> -
>
> Key: HIVE-25333
> URL: https://issues.apache.org/jira/browse/HIVE-25333
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Major
>  Labels: Refactoring, UDF
>
> Description
> Most of the UDF code was written in 2013 and has not been changed since. 
> Objective of this EPIC - 
> 1. Refactor all existing UDF implementations from UDF.class to GenericUDF.class
> 2. Clean up the code to reduce code duplication
> DOD
> Refactor all UDFs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25334) Refactor UDF CAST( as TIMESTAMP)

2021-07-15 Thread Ashish Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Sharma reassigned HIVE-25334:



> Refactor UDF CAST( as TIMESTAMP)
> -
>
> Key: HIVE-25334
> URL: https://issues.apache.org/jira/browse/HIVE-25334
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Major
>
> Description 
> Refactor GenericUDFTimestamp.class 
> DOD
> Refactor 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25290) Stabilize TestTxnHandler

2021-07-15 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko reassigned HIVE-25290:
-

Assignee: Haymant Mangla

> Stabilize TestTxnHandler
> 
>
> Key: HIVE-25290
> URL: https://issues.apache.org/jira/browse/HIVE-25290
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Haymant Mangla
>Priority: Major
>
> http://ci.hive.apache.org/job/hive-flaky-check/271/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25330) Make FS calls in CopyUtils retryable

2021-07-15 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha reassigned HIVE-25330:
---

Assignee: Haymant Mangla  (was: Pravin Sinha)

> Make FS calls in CopyUtils retryable
> 
>
> Key: HIVE-25330
> URL: https://issues.apache.org/jira/browse/HIVE-25330
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Haymant Mangla
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25330) Make FS calls in CopyUtils retryable

2021-07-15 Thread Pravin Sinha (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17381180#comment-17381180
 ] 

Pravin Sinha commented on HIVE-25330:
-

One such trace is this:
{code:java}
2021-07-09 03:34:30,643 ERROR org.apache.hadoop.hive.ql.exec.ReplCopyTask: 
[Thread-98208]: java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:477)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1685)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1745)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1742)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1757)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1738)
at 
org.apache.hadoop.hive.ql.parse.repl.CopyUtils.doCopy(CopyUtils.java:154)
at 
org.apache.hadoop.hive.ql.parse.repl.CopyUtils.copyAndVerify(CopyUtils.java:114)
at 
org.apache.hadoop.hive.ql.exec.ReplCopyTask.execute(ReplCopyTask.java:155)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:83){code}
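The retryable behavior the ticket asks for can be sketched as a generic bounded-retry wrapper around an I/O call. This is plain Java under assumed policy choices (fixed sleep between attempts, retry only on IOException), not Hive's actual CopyUtils implementation:

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Sketch of a bounded-retry wrapper for transient filesystem failures such
// as the "Filesystem closed" IOException in the trace above. Names and the
// retry policy are assumptions, not Hive's real code.
class RetryUtils {
    public static <T> T withRetries(Callable<T> op, int maxAttempts, long sleepMillis)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {   // retry only transient I/O failures
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis);
                }
            }
        }
        throw last;   // exhausted all attempts; surface the last failure
    }
}
```

A call site such as `FileSystem.exists` in `CopyUtils.doCopy` would then be wrapped in `withRetries(() -> fs.exists(path), attempts, backoff)`.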

> Make FS calls in CopyUtils retryable
> 
>
> Key: HIVE-25330
> URL: https://issues.apache.org/jira/browse/HIVE-25330
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Haymant Mangla
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25288) Fix TestMmCompactorOnTez

2021-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25288?focusedWorklogId=622916&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-622916
 ]

ASF GitHub Bot logged work on HIVE-25288:
-

Author: ASF GitHub Bot
Created on: 15/Jul/21 09:01
Start Date: 15/Jul/21 09:01
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2476:
URL: https://github.com/apache/hive/pull/2476#discussion_r670274361



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorOnTezTest.java
##
@@ -85,6 +86,8 @@ public void setup() throws Exception {
 hiveConf.setVar(HiveConf.ConfVars.METASTOREWAREHOUSE, TEST_WAREHOUSE_DIR);
 hiveConf.setVar(HiveConf.ConfVars.HIVEINPUTFORMAT, 
HiveInputFormat.class.getName());
 hiveConf.setVar(HiveConf.ConfVars.HIVEFETCHTASKCONVERSION, "none");
+hiveConf.set(MetastoreConf.ConfVars.TXN_OPENTXN_TIMEOUT.getVarname(), 
"2000");

Review comment:
   We should use the code below to set MetastoreConf values, to handle the 
migration between HiveConf and MetastoreConf:
   ```
   MetastoreConf.setVar(hiveConf, 
MetastoreConf.ConfVars.TXN_OPENTXN_TIMEOUT, "2000");
   ```
   
   This is not too relevant here since this is only test code, but I prefer to 
stick to the standard way everywhere, so the next contributor will learn 
correctly.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 622916)
Time Spent: 20m  (was: 10m)

> Fix TestMmCompactorOnTez
> 
>
> Key: HIVE-25288
> URL: https://issues.apache.org/jira/browse/HIVE-25288
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> http://ci.hive.apache.org/job/hive-flaky-check/240/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25288) Fix TestMmCompactorOnTez

2021-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25288?focusedWorklogId=622917&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-622917
 ]

ASF GitHub Bot logged work on HIVE-25288:
-

Author: ASF GitHub Bot
Created on: 15/Jul/21 09:02
Start Date: 15/Jul/21 09:02
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2476:
URL: https://github.com/apache/hive/pull/2476#discussion_r670274835



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -123,7 +123,7 @@ public void run() {
 long minTxnIdSeenOpen = txnHandler.findMinTxnIdSeenOpen();
 final long cleanerWaterMark = minTxnIdSeenOpen < 0 ? minOpenTxnId 
: Math.min(minOpenTxnId, minTxnIdSeenOpen);
 
-LOG.info("Cleaning based on min open txn id: " + minOpenTxnId);
+LOG.info("Cleaning based on min open txn id: " + cleanerWaterMark);

Review comment:
   Do I understand correctly that this is the fix and the other things are 
just general improvements of the code?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 622917)
Time Spent: 0.5h  (was: 20m)

> Fix TestMmCompactorOnTez
> 
>
> Key: HIVE-25288
> URL: https://issues.apache.org/jira/browse/HIVE-25288
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> http://ci.hive.apache.org/job/hive-flaky-check/240/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25288) Fix TestMmCompactorOnTez

2021-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25288?focusedWorklogId=622919&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-622919
 ]

ASF GitHub Bot logged work on HIVE-25288:
-

Author: ASF GitHub Bot
Created on: 15/Jul/21 09:04
Start Date: 15/Jul/21 09:04
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2476:
URL: https://github.com/apache/hive/pull/2476#discussion_r670276793



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -123,7 +123,7 @@ public void run() {
 long minTxnIdSeenOpen = txnHandler.findMinTxnIdSeenOpen();
 final long cleanerWaterMark = minTxnIdSeenOpen < 0 ? minOpenTxnId 
: Math.min(minOpenTxnId, minTxnIdSeenOpen);
 
-LOG.info("Cleaning based on min open txn id: " + minOpenTxnId);
+LOG.info("Cleaning based on min open txn id: " + cleanerWaterMark);

Review comment:
   No, that's just the logging fix. The main fix is in TxnHandler, see 
deleteInvalidOpenTransactions




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 622919)
Time Spent: 40m  (was: 0.5h)

> Fix TestMmCompactorOnTez
> 
>
> Key: HIVE-25288
> URL: https://issues.apache.org/jira/browse/HIVE-25288
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> http://ci.hive.apache.org/job/hive-flaky-check/240/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25325) Add TRUNCATE TABLE support for Hive Iceberg tables

2021-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25325?focusedWorklogId=622920&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-622920
 ]

ASF GitHub Bot logged work on HIVE-25325:
-

Author: ASF GitHub Bot
Created on: 15/Jul/21 09:06
Start Date: 15/Jul/21 09:06
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2471:
URL: https://github.com/apache/hive/pull/2471#discussion_r670278116



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -1313,6 +1313,186 @@ public void testScanTableCaseInsensitive() throws 
IOException {
 Assert.assertArrayEquals(new Object[] {1L, "Bob", "Green"}, rows.get(1));
   }
 
+  @Test
+  public void testTruncateTable() throws IOException, TException, 
InterruptedException {
+// Create an Iceberg table with some records in it then execute a truncate 
table command.
+// Then check if the data is deleted and the table statistics are reset to 
0.
+String databaseName = "default";
+String tableName = "customers";
+Table icebergTable = testTables.createTable(shell, tableName, 
HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
+testTruncateTable(databaseName, tableName, icebergTable, 
HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS,
+HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, true, false);
+  }
+
+  @Test
+  public void testTruncateEmptyTable() throws IOException, TException, 
InterruptedException {
+// Create an empty Iceberg table and execute a truncate table command on 
it.
+String databaseName = "default";
+String tableName = "customers";
+String fullTableName = databaseName + "." + tableName;

Review comment:
   nit: I usually use:
   ```
   TableIdentifier identifier = TableIdentifier.of(databaseName, tableName);
   [...]
   String alterTableCommand =
   "ALTER TABLE " + identifier + " SET 
TBLPROPERTIES('external.table.purge'='true')";
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 622920)
Time Spent: 20m  (was: 10m)

> Add TRUNCATE TABLE support for Hive Iceberg tables
> --
>
> Key: HIVE-25325
> URL: https://issues.apache.org/jira/browse/HIVE-25325
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Implement the TRUNCATE operation for Hive Iceberg tables. Since these tables 
> are unpartitioned in Hive, only the truncate unpartitioned table use case has 
> to be supported.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table

2021-07-15 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu reassigned HIVE-25335:
--

Assignee: zhengchenyu

> Unreasonable setting reduce number, when join big size table(but small row 
> count) and small size table
> --
>
> Key: HIVE-25335
> URL: https://issues.apache.org/jira/browse/HIVE-25335
> Project: Hive
>  Issue Type: Improvement
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
>
> I found a slow application in our cluster: each reducer processed a huge 
> number of bytes, but only two reducers were used. 
> While debugging, I found the reason. In this SQL, one table is big in size 
> (about 30G) but has a small row count (about 3.5M), while another table is 
> small in size (about 100M) but has a larger row count (about 3.6M). So 
> JoinStatsRule.process uses only 100M to estimate the number of reducers, but 
> in fact we need to process 30G.
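The estimation problem described above comes down to which input size feeds the bytes-per-reducer division. A back-of-envelope sketch (the 256 MB constant stands in for a setting like hive.exec.reducers.bytes.per.reducer; this is not JoinStatsRule's actual code):

```java
// Illustrative reducer-count estimate: ceiling division of input bytes by a
// bytes-per-reducer threshold, with a floor of one reducer.
class ReducerEstimate {
    public static long estimateReducers(long inputBytes, long bytesPerReducer) {
        return Math.max(1L, (inputBytes + bytesPerReducer - 1) / bytesPerReducer);
    }
}
```

Feeding in only the small table's 100 MB yields a single reducer, while the roughly 30 GB that actually has to be processed would yield around 120 reducers at a 256 MB threshold, which matches the skew the reporter observed.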



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25326) Include partitioning info in DESCRIBE TABLE command on Iceberg tables

2021-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25326?focusedWorklogId=622924&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-622924
 ]

ASF GitHub Bot logged work on HIVE-25326:
-

Author: ASF GitHub Bot
Created on: 15/Jul/21 09:11
Start Date: 15/Jul/21 09:11
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2472:
URL: https://github.com/apache/hive/pull/2472#discussion_r670282242



##
File path: 
iceberg/iceberg-handler/src/test/queries/positive/describe_iceberg_table.q
##
@@ -0,0 +1,22 @@
+DROP TABLE IF EXISTS ice_t;
+CREATE EXTERNAL TABLE ice_t (i int, s string, ts timestamp, d date) STORED BY 
ICEBERG;
+
+DROP TABLE IF EXISTS ice_t_transform;
+CREATE EXTERNAL TABLE ice_t_transform (year_field date, month_field date, 
day_field date, hour_field timestamp, truncate_field string, bucket_field int, 
identity_field int) PARTITIONED BY SPEC (year(year_field), month(month_field), 
day(day_field), hour(hour_field), truncate(2, truncate_field), bucket(2, 
bucket_field), identity_field) STORED BY ICEBERG;
+
+DROP TABLE IF EXISTS ice_t_transform_prop;
+CREATE EXTERNAL TABLE ice_t_transform_prop (id int, year_field date, 
month_field date, day_field date, hour_field timestamp, truncate_field string, 
bucket_field int, identity_field int) STORED BY ICEBERG TBLPROPERTIES 
('iceberg.mr.table.partition.spec'='{"spec-id":0,"fields":[{"name":"year_field_year","transform":"year","source-id":1,"field-id":1000},{"name":"month_field_month","transform":"month","source-id":2,"field-id":1001},{"name":"day_field_day","transform":"day","source-id":3,"field-id":1002},{"name":"hour_field_hour","transform":"hour","source-id":4,"field-id":1003},{"name":"truncate_field_trunc","transform":"truncate[2]","source-id":5,"field-id":1004},{"name":"bucket_field_bucket","transform":"bucket[2]","source-id":6,"field-id":1005},{"name":"identity_field","transform":"identity","source-id":7,"field-id":1006}]}');
+
+DROP TABLE IF EXISTS ice_t_identity_part;
+CREATE EXTERNAL TABLE ice_t_identity_part (a int) PARTITIONED BY (b string) 
STORED BY ICEBERG;
+
+DESCRIBE FORMATTED ice_t;
+DESCRIBE FORMATTED ice_t_transform;
+DESCRIBE FORMATTED ice_t_transform_prop;
+DESCRIBE FORMATTED ice_t_identity_part;
+
+SET hive.ddl.output.format=json;
+DESCRIBE EXTENDED ice_t;

Review comment:
   When the output format is set to `json` the partition information is 
only available when the `EXTENDED` keyword is used. But in q tests, the output 
of every `DESCRIBE EXTENDED` command is redacted... :( 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 622924)
Time Spent: 50m  (was: 40m)

> Include partitioning info in DESCRIBE TABLE command on Iceberg tables
> -
>
> Key: HIVE-25326
> URL: https://issues.apache.org/jira/browse/HIVE-25326
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25326) Include partitioning info in DESCRIBE TABLE command on Iceberg tables

2021-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25326?focusedWorklogId=622926&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-622926
 ]

ASF GitHub Bot logged work on HIVE-25326:
-

Author: ASF GitHub Bot
Created on: 15/Jul/21 09:12
Start Date: 15/Jul/21 09:12
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2472:
URL: https://github.com/apache/hive/pull/2472#discussion_r670283280



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/info/desc/formatter/TextDescTableFormatter.java
##
@@ -100,6 +102,32 @@ public void describeTable(HiveConf conf, DataOutputStream 
out, String columnPath
 }
   }
 
+  private void addPartitionTransformData(DataOutputStream out, Table table, 
boolean isOutputPadded) throws IOException {
+String partitionTransformData = "";

Review comment:
   done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 622926)
Time Spent: 1h  (was: 50m)

> Include partitioning info in DESCRIBE TABLE command on Iceberg tables
> -
>
> Key: HIVE-25326
> URL: https://issues.apache.org/jira/browse/HIVE-25326
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25325) Add TRUNCATE TABLE support for Hive Iceberg tables

2021-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25325?focusedWorklogId=622928&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-622928
 ]

ASF GitHub Bot logged work on HIVE-25325:
-

Author: ASF GitHub Bot
Created on: 15/Jul/21 09:14
Start Date: 15/Jul/21 09:14
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2471:
URL: https://github.com/apache/hive/pull/2471#discussion_r670284586



##
File path: 
iceberg/iceberg-handler/src/test/queries/positive/truncate_force_iceberg_table.q
##
@@ -0,0 +1,20 @@
+set hive.vectorized.execution.enabled=false;
+
+drop table if exists test_truncate;
+create external table test_truncate (id int, value string) stored by iceberg 
stored as parquet;
+alter table test_truncate set tblproperties('external.table.purge'='false');
+insert into test_truncate values (1, 
'one'),(2,'two'),(3,'three'),(4,'four'),(5,'five'); 
+insert into test_truncate values (6, 'six'), (7, 'seven');
+insert into test_truncate values (8, 'eight'), (9, 'nine'), (10, 'ten');
+analyze table test_truncate compute statistics;
+
+select * from test_truncate order by id;

Review comment:
   Question: I remember that once upon a time we replaced all of the `order 
by` statements in the queries with some QTestUtil construct to save on the 
execution time of the queries. Do I remember correctly? Would it be relevant 
here? If so, I could try to remember what was needed in the `.q` file to sort 
the output.






Issue Time Tracking
---

Worklog Id: (was: 622928)
Time Spent: 0.5h  (was: 20m)

> Add TRUNCATE TABLE support for Hive Iceberg tables
> --
>
> Key: HIVE-25325
> URL: https://issues.apache.org/jira/browse/HIVE-25325
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Implement the TRUNCATE operation for Hive Iceberg tables. Since these tables 
> are unpartitioned in Hive, only the truncate unpartitioned table use case has 
> to be supported.





[jira] [Updated] (HIVE-25335) Unreasonable setting reduce number, when join big size table(but small row count) and small size table

2021-07-15 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HIVE-25335:
---
Attachment: HIVE-25335.001.patch

> Unreasonable setting reduce number, when join big size table(but small row 
> count) and small size table
> --
>
> Key: HIVE-25335
> URL: https://issues.apache.org/jira/browse/HIVE-25335
> Project: Hive
>  Issue Type: Improvement
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
> Attachments: HIVE-25335.001.patch
>
>
> I found an application that runs slowly in our cluster: each reducer 
> processes a huge number of bytes, but only two reducers are used.
> While debugging, I found the reason. In this SQL, one table is big in size 
> (about 30 GB) but has a small row count (about 3.5M rows), while another 
> table is small in size (about 100 MB) but has more rows (about 3.6M). So 
> JoinStatsRule.process uses only the 100 MB figure to estimate the number of 
> reducers, but in fact about 30 GB has to be processed.
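The mismatch described above can be illustrated with the usual bytes-per-reducer estimate. Below is a minimal, self-contained sketch; this is not Hive's actual JoinStatsRule code, and the 256 MiB threshold is an illustrative stand-in for `hive.exec.reducers.bytes.per.reducer`:

```java
public class ReducerEstimate {
    // Illustrative stand-in for hive.exec.reducers.bytes.per.reducer (256 MiB here).
    static final long BYTES_PER_REDUCER = 256L * 1024 * 1024;

    // Ceiling division: at least one reducer, one more per BYTES_PER_REDUCER of input.
    static int estimateReducers(long inputBytes) {
        return (int) Math.max(1, (inputBytes + BYTES_PER_REDUCER - 1) / BYTES_PER_REDUCER);
    }

    public static void main(String[] args) {
        long smallSide = 100L * 1024 * 1024;      // ~100 MB table used by the estimate
        long bigSide = 30L * 1024 * 1024 * 1024;  // ~30 GB table actually processed
        System.out.println(estimateReducers(smallSide)); // 1
        System.out.println(estimateReducers(bigSide));   // 120
    }
}
```

Estimating from the 100 MB side alone yields a single reducer, while the 30 GB that actually flows through the join would justify on the order of a hundred.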





[jira] [Work logged] (HIVE-25325) Add TRUNCATE TABLE support for Hive Iceberg tables

2021-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25325?focusedWorklogId=622932&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-622932
 ]

ASF GitHub Bot logged work on HIVE-25325:
-

Author: ASF GitHub Bot
Created on: 15/Jul/21 09:17
Start Date: 15/Jul/21 09:17
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2471:
URL: https://github.com/apache/hive/pull/2471#discussion_r670287076



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -3360,42 +3360,50 @@ public CmRecycleResponse cm_recycle(final 
CmRecycleRequest request) throws MetaE
   public void truncate_table(final String dbName, final String tableName, 
List<String> partNames)
   throws NoSuchObjectException, MetaException {
 // Deprecated path, won't work for txn tables.
-truncateTableInternal(dbName, tableName, partNames, null, -1);
+truncateTableInternal(dbName, tableName, partNames, null, -1, null);
   }
 
   @Override
   public TruncateTableResponse truncate_table_req(TruncateTableRequest req)
   throws MetaException, TException {
 truncateTableInternal(req.getDbName(), req.getTableName(), 
req.getPartNames(),
-req.getValidWriteIdList(), req.getWriteId());
+req.getValidWriteIdList(), req.getWriteId(), 
req.getEnvironmentContext());
 return new TruncateTableResponse();
   }
 
   private void truncateTableInternal(String dbName, String tableName, 
List<String> partNames,
- String validWriteIds, long writeId) 
throws MetaException, NoSuchObjectException {
+ String validWriteIds, long writeId, 
EnvironmentContext context) throws MetaException, NoSuchObjectException {
 boolean isSkipTrash = false, needCmRecycle = false;
 try {
   String[] parsedDbName = parseDbName(dbName, conf);
   Table tbl = get_table_core(parsedDbName[CAT_NAME], 
parsedDbName[DB_NAME], tableName);
 
-  boolean truncateFiles = !TxnUtils.isTransactionalTable(tbl) ||
-  !MetastoreConf.getBoolVar(getConf(), 
MetastoreConf.ConfVars.TRUNCATE_ACID_USE_BASE);
-
-  if (truncateFiles) {
-isSkipTrash = MetaStoreUtils.isSkipTrash(tbl.getParameters());
-Database db = get_database_core(parsedDbName[CAT_NAME], 
parsedDbName[DB_NAME]);
-needCmRecycle = ReplChangeManager.shouldEnableCm(db, tbl);
+  boolean skipDataDeletion = false;
+  if (context != null && context.getProperties() != null
+  && context.getProperties().get("truncateSkipDataDeletion") != null) {

Review comment:
   I think we should use a constant for this. What do you think?
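To make the suggestion concrete, here is a hedged sketch of hoisting the magic string into a named constant; the class and method names are hypothetical, only the property key `truncateSkipDataDeletion` comes from the patch:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the reviewer's suggestion: a named constant keeps the
// request builder and the handler from drifting apart. Not Hive's actual code.
public class TruncateRequestProps {
    public static final String TRUNCATE_SKIP_DATA_DELETION = "truncateSkipDataDeletion";

    // Null-tolerant lookup: a missing context or missing key defaults to false.
    static boolean skipDataDeletion(Map<String, String> contextProps) {
        if (contextProps == null) {
            return false;
        }
        return Boolean.parseBoolean(contextProps.get(TRUNCATE_SKIP_DATA_DELETION));
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put(TRUNCATE_SKIP_DATA_DELETION, "true");
        System.out.println(skipDataDeletion(props)); // true
        System.out.println(skipDataDeletion(null));  // false
    }
}
```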






Issue Time Tracking
---

Worklog Id: (was: 622932)
Time Spent: 40m  (was: 0.5h)

> Add TRUNCATE TABLE support for Hive Iceberg tables
> --
>
> Key: HIVE-25325
> URL: https://issues.apache.org/jira/browse/HIVE-25325
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Implement the TRUNCATE operation for Hive Iceberg tables. Since these tables 
> are unpartitioned in Hive, only the truncate unpartitioned table use case has 
> to be supported.





[jira] [Assigned] (HIVE-25336) Use single call to get tables in DropDatabaseAnalyzer

2021-07-15 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reassigned HIVE-25336:
---


> Use single call to get tables in DropDatabaseAnalyzer
> -
>
> Key: HIVE-25336
> URL: https://issues.apache.org/jira/browse/HIVE-25336
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>
> Optimise 
> org.apache.hadoop.hive.ql.ddl.database.drop.DropDatabaseAnalyzer.analyzeInternal(DropDatabaseAnalyzer.java:61),
>  where it fetches entire tables one by one. Move to a single call. This could 
> save around 20+ seconds when a large number of tables are present.





[jira] [Work logged] (HIVE-25336) Use single call to get tables in DropDatabaseAnalyzer

2021-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25336?focusedWorklogId=622934&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-622934
 ]

ASF GitHub Bot logged work on HIVE-25336:
-

Author: ASF GitHub Bot
Created on: 15/Jul/21 09:20
Start Date: 15/Jul/21 09:20
Worklog Time Spent: 10m 
  Work Description: ayushtkn opened a new pull request #2481:
URL: https://github.com/apache/hive/pull/2481


   




Issue Time Tracking
---

Worklog Id: (was: 622934)
Remaining Estimate: 0h
Time Spent: 10m

> Use single call to get tables in DropDatabaseAnalyzer
> -
>
> Key: HIVE-25336
> URL: https://issues.apache.org/jira/browse/HIVE-25336
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Optimise 
> org.apache.hadoop.hive.ql.ddl.database.drop.DropDatabaseAnalyzer.analyzeInternal(DropDatabaseAnalyzer.java:61),
>  where it fetches entire tables one by one. Move to a single call. This could 
> save around 20+ seconds when a large number of tables are present.





[jira] [Work started] (HIVE-25334) Refactor UDF CAST( as TIMESTAMP)

2021-07-15 Thread Ashish Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-25334 started by Ashish Sharma.

> Refactor UDF CAST( as TIMESTAMP)
> -
>
> Key: HIVE-25334
> URL: https://issues.apache.org/jira/browse/HIVE-25334
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Major
>
> Description 
> Refactor GenericUDFTimestamp.class 
> DOD
> Refactor 





[jira] [Updated] (HIVE-25336) Use single call to get tables in DropDatabaseAnalyzer

2021-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25336:
--
Labels: pull-request-available  (was: )

> Use single call to get tables in DropDatabaseAnalyzer
> -
>
> Key: HIVE-25336
> URL: https://issues.apache.org/jira/browse/HIVE-25336
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Optimise 
> org.apache.hadoop.hive.ql.ddl.database.drop.DropDatabaseAnalyzer.analyzeInternal(DropDatabaseAnalyzer.java:61),
>  where it fetches entire tables one by one. Move to a single call. This could 
> save around 20+ seconds when a large number of tables are present.





[jira] [Commented] (HIVE-25336) Use single call to get tables in DropDatabaseAnalyzer

2021-07-15 Thread Anishek Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17381205#comment-17381205
 ] 

Anishek Agarwal commented on HIVE-25336:


I don't think you can get all the table definitions at once; that can lead to 
memory pressure on the HMS. Some form of batching would make sense, however, 
or lazy loading.
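A minimal sketch of the batching idea, with a generic fetch function standing in for the metastore client call; everything here is a hypothetical illustration, not Hive's metastore API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Fetch table definitions in fixed-size batches: one RPC per batch instead of
// one per table, without pulling every definition into memory in a single call.
public class BatchedFetch {
    static <T> List<T> fetchInBatches(List<String> names, int batchSize,
                                      Function<List<String>, List<T>> fetch) {
        List<T> result = new ArrayList<>();
        for (int i = 0; i < names.size(); i += batchSize) {
            List<String> batch = names.subList(i, Math.min(i + batchSize, names.size()));
            result.addAll(fetch.apply(batch)); // one call per batch
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = List.of("t1", "t2", "t3", "t4", "t5");
        // Toy "fetch" that echoes the names back; a real client would return Table objects.
        List<String> fetched = fetchInBatches(names, 2, batch -> new ArrayList<>(batch));
        System.out.println(fetched); // [t1, t2, t3, t4, t5]
    }
}
```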

> Use single call to get tables in DropDatabaseAnalyzer
> -
>
> Key: HIVE-25336
> URL: https://issues.apache.org/jira/browse/HIVE-25336
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Optimise 
> org.apache.hadoop.hive.ql.ddl.database.drop.DropDatabaseAnalyzer.analyzeInternal(DropDatabaseAnalyzer.java:61),
>  where it fetches entire tables one by one. Move to a single call. This could 
> save around 20+ seconds when a large number of tables are present.





[jira] [Work logged] (HIVE-25325) Add TRUNCATE TABLE support for Hive Iceberg tables

2021-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25325?focusedWorklogId=622939&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-622939
 ]

ASF GitHub Bot logged work on HIVE-25325:
-

Author: ASF GitHub Bot
Created on: 15/Jul/21 09:40
Start Date: 15/Jul/21 09:40
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2471:
URL: https://github.com/apache/hive/pull/2471#discussion_r670304348



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -3360,42 +3360,50 @@ public CmRecycleResponse cm_recycle(final 
CmRecycleRequest request) throws MetaE
   public void truncate_table(final String dbName, final String tableName, 
List<String> partNames)
   throws NoSuchObjectException, MetaException {
 // Deprecated path, won't work for txn tables.
-truncateTableInternal(dbName, tableName, partNames, null, -1);
+truncateTableInternal(dbName, tableName, partNames, null, -1, null);
   }
 
   @Override
   public TruncateTableResponse truncate_table_req(TruncateTableRequest req)
   throws MetaException, TException {
 truncateTableInternal(req.getDbName(), req.getTableName(), 
req.getPartNames(),
-req.getValidWriteIdList(), req.getWriteId());
+req.getValidWriteIdList(), req.getWriteId(), 
req.getEnvironmentContext());
 return new TruncateTableResponse();
   }
 
   private void truncateTableInternal(String dbName, String tableName, 
List<String> partNames,
- String validWriteIds, long writeId) 
throws MetaException, NoSuchObjectException {
+ String validWriteIds, long writeId, 
EnvironmentContext context) throws MetaException, NoSuchObjectException {
 boolean isSkipTrash = false, needCmRecycle = false;
 try {
   String[] parsedDbName = parseDbName(dbName, conf);
   Table tbl = get_table_core(parsedDbName[CAT_NAME], 
parsedDbName[DB_NAME], tableName);
 
-  boolean truncateFiles = !TxnUtils.isTransactionalTable(tbl) ||
-  !MetastoreConf.getBoolVar(getConf(), 
MetastoreConf.ConfVars.TRUNCATE_ACID_USE_BASE);
-
-  if (truncateFiles) {
-isSkipTrash = MetaStoreUtils.isSkipTrash(tbl.getParameters());
-Database db = get_database_core(parsedDbName[CAT_NAME], 
parsedDbName[DB_NAME]);
-needCmRecycle = ReplChangeManager.shouldEnableCm(db, tbl);
+  boolean skipDataDeletion = false;
+  if (context != null && context.getProperties() != null

Review comment:
   It is up to you, but I have learned this from @marton-bod, and something 
like this might work:
   ```
   Optional.ofNullable(context)
   .map(EnvironmentContext::getProperties)
   .map(prop -> prop.get("truncateSkipDataDeletion"))
   .map(Boolean::parseBoolean)
   .orElse(false);
   ``` 
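For reference, a self-contained version of this chain, using a plain `Map` as a hypothetical stand-in for `EnvironmentContext`; `Optional.ofNullable` is used so a null context falls through to the default rather than throwing:

```java
import java.util.Map;
import java.util.Optional;

// Null-safe property lookup: a missing context, missing properties, or missing
// key all resolve to false, mirroring the explicit null checks in the patch.
public class NullSafeLookup {
    static boolean skipDataDeletion(Map<String, String> context) {
        return Optional.ofNullable(context)
                .map(props -> props.get("truncateSkipDataDeletion"))
                .map(Boolean::parseBoolean)
                .orElse(false);
    }

    public static void main(String[] args) {
        System.out.println(skipDataDeletion(null));                                       // false
        System.out.println(skipDataDeletion(Map.of("truncateSkipDataDeletion", "true"))); // true
    }
}
```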






Issue Time Tracking
---

Worklog Id: (was: 622939)
Time Spent: 50m  (was: 40m)

> Add TRUNCATE TABLE support for Hive Iceberg tables
> --
>
> Key: HIVE-25325
> URL: https://issues.apache.org/jira/browse/HIVE-25325
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Implement the TRUNCATE operation for Hive Iceberg tables. Since these tables 
> are unpartitioned in Hive, only the truncate unpartitioned table use case has 
> to be supported.





[jira] [Work logged] (HIVE-25325) Add TRUNCATE TABLE support for Hive Iceberg tables

2021-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25325?focusedWorklogId=622942&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-622942
 ]

ASF GitHub Bot logged work on HIVE-25325:
-

Author: ASF GitHub Bot
Created on: 15/Jul/21 09:42
Start Date: 15/Jul/21 09:42
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2471:
URL: https://github.com/apache/hive/pull/2471#discussion_r670306068



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -3360,42 +3360,50 @@ public CmRecycleResponse cm_recycle(final 
CmRecycleRequest request) throws MetaE
   public void truncate_table(final String dbName, final String tableName, 
List<String> partNames)
   throws NoSuchObjectException, MetaException {
 // Deprecated path, won't work for txn tables.
-truncateTableInternal(dbName, tableName, partNames, null, -1);
+truncateTableInternal(dbName, tableName, partNames, null, -1, null);
   }
 
   @Override
   public TruncateTableResponse truncate_table_req(TruncateTableRequest req)
   throws MetaException, TException {
 truncateTableInternal(req.getDbName(), req.getTableName(), 
req.getPartNames(),
-req.getValidWriteIdList(), req.getWriteId());
+req.getValidWriteIdList(), req.getWriteId(), 
req.getEnvironmentContext());
 return new TruncateTableResponse();
   }
 
   private void truncateTableInternal(String dbName, String tableName, 
List<String> partNames,
- String validWriteIds, long writeId) 
throws MetaException, NoSuchObjectException {
+ String validWriteIds, long writeId, 
EnvironmentContext context) throws MetaException, NoSuchObjectException {
 boolean isSkipTrash = false, needCmRecycle = false;
 try {
   String[] parsedDbName = parseDbName(dbName, conf);
   Table tbl = get_table_core(parsedDbName[CAT_NAME], 
parsedDbName[DB_NAME], tableName);
 
-  boolean truncateFiles = !TxnUtils.isTransactionalTable(tbl) ||
-  !MetastoreConf.getBoolVar(getConf(), 
MetastoreConf.ConfVars.TRUNCATE_ACID_USE_BASE);
-
-  if (truncateFiles) {
-isSkipTrash = MetaStoreUtils.isSkipTrash(tbl.getParameters());
-Database db = get_database_core(parsedDbName[CAT_NAME], 
parsedDbName[DB_NAME]);
-needCmRecycle = ReplChangeManager.shouldEnableCm(db, tbl);
+  boolean skipDataDeletion = false;
+  if (context != null && context.getProperties() != null
+  && context.getProperties().get("truncateSkipDataDeletion") != null) {
+skipDataDeletion = 
Boolean.parseBoolean(context.getProperties().get("truncateSkipDataDeletion"));
   }
-  // This is not transactional
-  for (Path location : getLocationsForTruncate(getMS(), 
parsedDbName[CAT_NAME],
-  parsedDbName[DB_NAME], tableName, tbl, partNames)) {
-FileSystem fs = location.getFileSystem(getConf());
+
+  if (!skipDataDeletion) {
+boolean truncateFiles = !TxnUtils.isTransactionalTable(tbl)
+|| !MetastoreConf.getBoolVar(getConf(), 
MetastoreConf.ConfVars.TRUNCATE_ACID_USE_BASE);
+
 if (truncateFiles) {
-  truncateDataFiles(location, fs, isSkipTrash, needCmRecycle);
-} else {
-  // For Acid tables we don't need to delete the old files, only write 
an empty baseDir.
-  // Compaction and cleaner will take care of the rest
-  addTruncateBaseFile(location, writeId, fs);
+  isSkipTrash = MetaStoreUtils.isSkipTrash(tbl.getParameters());
+  Database db = get_database_core(parsedDbName[CAT_NAME], 
parsedDbName[DB_NAME]);
+  needCmRecycle = ReplChangeManager.shouldEnableCm(db, tbl);
+}
+// This is not transactional
+for (Path location : getLocationsForTruncate(getMS(), 
parsedDbName[CAT_NAME], parsedDbName[DB_NAME], tableName,
+tbl, partNames)) {
+  FileSystem fs = location.getFileSystem(getConf());
+  if (truncateFiles) {

Review comment:
   Why is this check for `truncateFiles` here? I think we already checked 
it in line 3391.
   Am I right that this is the same code as before just inside the `if 
(!skipDataDeletion)`?






Issue Time Tracking
---

Worklog Id: (was: 622942)
Time Spent: 1h  (was: 50m)

> Add TRUNCATE TABLE support for Hive Iceberg tables
> --
>
> Key: HIVE-25325
> URL: https://issues.apache.o

[jira] [Work logged] (HIVE-25288) Fix TestMmCompactorOnTez

2021-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25288?focusedWorklogId=622943&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-622943
 ]

ASF GitHub Bot logged work on HIVE-25288:
-

Author: ASF GitHub Bot
Created on: 15/Jul/21 09:43
Start Date: 15/Jul/21 09:43
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2476:
URL: https://github.com/apache/hive/pull/2476#discussion_r670306814



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -123,7 +123,7 @@ public void run() {
 long minTxnIdSeenOpen = txnHandler.findMinTxnIdSeenOpen();
 final long cleanerWaterMark = minTxnIdSeenOpen < 0 ? minOpenTxnId 
: Math.min(minOpenTxnId, minTxnIdSeenOpen);
 
-LOG.info("Cleaning based on min open txn id: " + minOpenTxnId);
+LOG.info("Cleaning based on min open txn id: " + cleanerWaterMark);

Review comment:
   Thx






Issue Time Tracking
---

Worklog Id: (was: 622943)
Time Spent: 50m  (was: 40m)

> Fix TestMmCompactorOnTez
> 
>
> Key: HIVE-25288
> URL: https://issues.apache.org/jira/browse/HIVE-25288
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> http://ci.hive.apache.org/job/hive-flaky-check/240/





[jira] [Work logged] (HIVE-25256) Support ALTER TABLE CHANGE COLUMN for Iceberg

2021-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25256?focusedWorklogId=622945&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-622945
 ]

ASF GitHub Bot logged work on HIVE-25256:
-

Author: ASF GitHub Bot
Created on: 15/Jul/21 09:50
Start Date: 15/Jul/21 09:50
Worklog Time Spent: 10m 
  Work Description: szlta commented on a change in pull request #2463:
URL: https://github.com/apache/hive/pull/2463#discussion_r670311756



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -505,19 +512,83 @@ private void 
handleReplaceColumns(org.apache.hadoop.hive.metastore.api.Table hms
 }
 
 for (FieldSchema updatedCol : schemaDifference.getTypeChanged()) {
-  Type newType = 
HiveSchemaUtil.convert(TypeInfoUtils.getTypeInfoFromTypeString(updatedCol.getType()));
-  if (!(newType instanceof Type.PrimitiveType)) {
-throw new MetaException(String.format("Cannot promote type of column: 
'%s' to a non-primitive type: %s.",
-updatedCol.getName(), newType));
-  }
-  updateSchema.updateColumn(updatedCol.getName(), (Type.PrimitiveType) 
newType, updatedCol.getComment());
+  updateSchema.updateColumn(updatedCol.getName(), 
getPrimitiveTypeOrThrow(updatedCol), updatedCol.getComment());
 }
 
 for (FieldSchema updatedCol : schemaDifference.getCommentChanged()) {
   updateSchema.updateColumnDoc(updatedCol.getName(), 
updatedCol.getComment());
 }
   }
 
+  private void handleChangeColumn(org.apache.hadoop.hive.metastore.api.Table 
hmsTable) throws MetaException {
+List<FieldSchema> hmsCols = hmsTable.getSd().getCols();
+List<FieldSchema> icebergCols = 
HiveSchemaUtil.convert(icebergTable.schema());
+// compute schema difference for renames, type/comment changes
+HiveSchemaUtil.SchemaDifference schemaDifference = 
HiveSchemaUtil.getSchemaDiff(hmsCols, icebergCols, true);
+// check column reorder (which could happen even in the absence of any 
rename, type or comment change)
+Map<String, String> renameMapping = ImmutableMap.of();
+if (!schemaDifference.getMissingFromSecond().isEmpty()) {
+  renameMapping = ImmutableMap.of(
+  schemaDifference.getMissingFromSecond().get(0).getName(),
+  schemaDifference.getMissingFromFirst().get(0).getName());
+}
+Pair> outOfOrder = 
HiveSchemaUtil.getFirstOutOfOrderColumn(hmsCols, icebergCols,
+renameMapping);
+
+if (!schemaDifference.isEmpty() || outOfOrder != null) {
+  updateSchema = icebergTable.updateSchema();
+} else {
+  // we should get here if the user didn't change anything about the column
+  // i.e. no changes to the name, type, comment or order
+  LOG.info("Found no difference between new and old schema for ALTER TABLE 
CHANGE COLUMN for" +
+  " table: {}. There will be no Iceberg commit.", 
hmsTable.getTableName());
+  return;
+}
+
+// case 1: column name has been renamed
+if (!schemaDifference.getMissingFromSecond().isEmpty()) {
+  FieldSchema updatedField = 
schemaDifference.getMissingFromSecond().get(0);
+  FieldSchema oldField = schemaDifference.getMissingFromFirst().get(0);
+  updateSchema.renameColumn(oldField.getName(), updatedField.getName());
+
+  // check if type/comment changed too
+  if (!Objects.equals(oldField.getType(), updatedField.getType())) {
+updateSchema.updateColumn(oldField.getName(), 
getPrimitiveTypeOrThrow(updatedField), updatedField.getComment());
+  } else if (!Objects.equals(oldField.getComment(), 
updatedField.getComment())) {
+updateSchema.updateColumnDoc(oldField.getName(), 
updatedField.getComment());
+  }
+
+// case 2: only column type and/or comment changed
+} else if (!schemaDifference.getTypeChanged().isEmpty()) {
+  FieldSchema updatedField = schemaDifference.getTypeChanged().get(0);
+  updateSchema.updateColumn(updatedField.getName(), 
getPrimitiveTypeOrThrow(updatedField),
+  updatedField.getComment());
+
+// case 3: only comment changed
+} else if (!schemaDifference.getCommentChanged().isEmpty()) {
+  FieldSchema updatedField = schemaDifference.getCommentChanged().get(0);
+  updateSchema.updateColumnDoc(updatedField.getName(), 
updatedField.getComment());
+}

Review comment:
   Thanks!






Issue Time Tracking
---

Worklog Id: (was: 622945)
Time Spent: 2h 10m  (was: 2h)

> Support ALTER TABLE CHANGE COLUMN for Iceberg
> -
>
> Key: HIVE

[jira] [Work logged] (HIVE-25334) Refactor UDF CAST( as TIMESTAMP)

2021-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25334?focusedWorklogId=622962&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-622962
 ]

ASF GitHub Bot logged work on HIVE-25334:
-

Author: ASF GitHub Bot
Created on: 15/Jul/21 10:30
Start Date: 15/Jul/21 10:30
Worklog Time Spent: 10m 
  Work Description: ashish-kumar-sharma opened a new pull request #2482:
URL: https://github.com/apache/hive/pull/2482


   
   
   ### What changes were proposed in this pull request?
   
   Refactor cast(<> as timestamp)
   
   ### Why are the changes needed?
   
   The code is old and repetitive.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Yes




Issue Time Tracking
---

Worklog Id: (was: 622962)
Remaining Estimate: 0h
Time Spent: 10m

> Refactor UDF CAST( as TIMESTAMP)
> -
>
> Key: HIVE-25334
> URL: https://issues.apache.org/jira/browse/HIVE-25334
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Description 
> Refactor GenericUDFTimestamp.class 
> DOD
> Refactor 





[jira] [Updated] (HIVE-25334) Refactor UDF CAST( as TIMESTAMP)

2021-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25334:
--
Labels: pull-request-available  (was: )

> Refactor UDF CAST( as TIMESTAMP)
> -
>
> Key: HIVE-25334
> URL: https://issues.apache.org/jira/browse/HIVE-25334
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Description 
> Refactor GenericUDFTimestamp.class 
> DOD
> Refactor 





[jira] [Work logged] (HIVE-25294) Optimise the metadata count queries for local mode

2021-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25294?focusedWorklogId=622979&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-622979
 ]

ASF GitHub Bot logged work on HIVE-25294:
-

Author: ASF GitHub Bot
Created on: 15/Jul/21 11:15
Start Date: 15/Jul/21 11:15
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #2435:
URL: https://github.com/apache/hive/pull/2435#discussion_r670363826



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -374,9 +362,14 @@ public HMSHandler(String name, Configuration conf, boolean 
init) throws MetaExce
 }
   }
 }
-if (init) {
-  init();
-}
+  }
+
+  @VisibleForTesting
+  public static HMSHandler getInitializedHandler(String name, Configuration 
conf)

Review comment:
   I don't understand why we would need this method - we already had the 
`init` boolean trick.
   
   I think you are after making sure that all HMSHandlers are initialized; 
without the patch, didn't you get NPEs when you tried to access databaseCount 
when the handler was not initialized?






Issue Time Tracking
---

Worklog Id: (was: 622979)
Time Spent: 1h 20m  (was: 1h 10m)

> Optimise the metadata count queries for local mode
> --
>
> Key: HIVE-25294
> URL: https://issues.apache.org/jira/browse/HIVE-25294
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> When the Metastore is in local mode, the client uses its own private 
> HMSHandler to get the metadata; the HMSHandler should be initialized before 
> it is ready to serve. When metrics are enabled, the HMSHandler counts the 
> number of databases, tables, and partitions, which could lead to some 
> problems.





[jira] [Commented] (HIVE-25301) Expose notification log table through sys db

2021-07-15 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17381256#comment-17381256
 ] 

Ayush Saxena commented on HIVE-25301:
-

Hey [~pvary]

We don't want to expose this to normal users; this is something to be used at 
the admin level only, in specific replication scenarios. We don't even expose 
a couple of the other replication-related metrics through 
{{INFORMATION_SCHEMA}}.


> Expose notification log table through sys db
> 
>
> Key: HIVE-25301
> URL: https://issues.apache.org/jira/browse/HIVE-25301
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Expose the notification_log table in RDBMS through Hive sys database





[jira] [Commented] (HIVE-25301) Expose notification log table through sys db

2021-07-15 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17381289#comment-17381289
 ] 

Peter Vary commented on HIVE-25301:
---

[~ayushtkn]: Thanks! This makes sense.

> Expose notification log table through sys db
> 
>
> Key: HIVE-25301
> URL: https://issues.apache.org/jira/browse/HIVE-25301
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Expose the notification_log table in RDBMS through Hive sys database





[jira] [Commented] (HIVE-25323) Fix TestVectorCastStatement

2021-07-15 Thread Karen Coppage (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17381338#comment-17381338
 ] 

Karen Coppage commented on HIVE-25323:
--

[~adeshrao] thanks for taking a look!

The reason java.sql.Timestamp was left in vectorization is probably lost to 
time.

Until the fix is done – and I assume it will take a bit of time – we can 
re-enable the test, convert the timestamps from java.sql.Timestamp to 
hive.Timestamp, and then compare.

By the way, do you know why the test was timing out after 5h instead of 
failing?

> Fix TestVectorCastStatement
> ---
>
> Key: HIVE-25323
> URL: https://issues.apache.org/jira/browse/HIVE-25323
> Project: Hive
>  Issue Type: Task
>Reporter: Karen Coppage
>Assignee: Adesh Kumar Rao
>Priority: Major
>
> org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorCastStatement 
> tests were timing out after 5 hours.
> [http://ci.hive.apache.org/job/hive-flaky-check/307/]
> First failure: 
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/master/749/pipeline/242]





[jira] [Updated] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles

2021-07-15 Thread Igor Dvorzhak (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Dvorzhak updated HIVE-25277:
-
Target Version/s: 2.3.9, 3.1.3, 4.0.0  (was: 2.3.6, 2.3.7, 3.1.2)

> Slow Hive partition deletion for Cloud object stores with expensive ListFiles
> -
>
> Key: HIVE-25277
> URL: https://issues.apache.org/jira/browse/HIVE-25277
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: All Versions
>Reporter: Zhou Fang
>Assignee: Zhou Fang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Deleting a Hive partition is slow when using a Cloud object store as the 
> warehouse, for which ListFiles is expensive. A root cause is that the 
> recursive parent dir deletion is very inefficient: there are many duplicated 
> calls to isEmpty (ListFiles is called at the end). This fix sorts the parents 
> to delete according to the path size, and always processes the longest one 
> (e.g., a/b/c always comes before a/b). As a result, each parent path only 
> needs to be checked once.
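
The deepest-first ordering described in the fix can be sketched as follows (a minimal illustration, assuming path-string length is a proxy for depth; the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: process longer (deeper) paths first, so "a/b/c" is
// checked and deleted before "a/b", and each parent is visited only once.
public class PartitionDeleteOrder {

  public static List<String> sortDeepestFirst(List<String> parents) {
    List<String> sorted = new ArrayList<>(parents);
    // Longer path strings correspond to deeper directories here.
    sorted.sort(Comparator.<String>comparingInt(String::length).reversed());
    return sorted;
  }

  public static void main(String[] args) {
    System.out.println(sortDeepestFirst(List.of("a/b", "a", "a/b/c")));
    // prints [a/b/c, a/b, a]
  }
}
```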





[jira] [Work logged] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles

2021-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25277?focusedWorklogId=623116&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-623116
 ]

ASF GitHub Bot logged work on HIVE-25277:
-

Author: ASF GitHub Bot
Created on: 15/Jul/21 15:36
Start Date: 15/Jul/21 15:36
Worklog Time Spent: 10m 
  Work Description: medb commented on pull request #2421:
URL: https://github.com/apache/hive/pull/2421#issuecomment-880798336


   @kgyrtkirk may you take a look and merge this PR?




Issue Time Tracking
---

Worklog Id: (was: 623116)
Time Spent: 1h 10m  (was: 1h)

> Slow Hive partition deletion for Cloud object stores with expensive ListFiles
> -
>
> Key: HIVE-25277
> URL: https://issues.apache.org/jira/browse/HIVE-25277
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: All Versions
>Reporter: Zhou Fang
>Assignee: Zhou Fang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Deleting a Hive partition is slow when using a Cloud object store as the 
> warehouse, for which ListFiles is expensive. A root cause is that the 
> recursive parent dir deletion is very inefficient: there are many duplicated 
> calls to isEmpty (ListFiles is called at the end). This fix sorts the parents 
> to delete according to the path size, and always processes the longest one 
> (e.g., a/b/c always comes before a/b). As a result, each parent path only 
> needs to be checked once.





[jira] [Work logged] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles

2021-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25277?focusedWorklogId=623119&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-623119
 ]

ASF GitHub Bot logged work on HIVE-25277:
-

Author: ASF GitHub Bot
Created on: 15/Jul/21 15:38
Start Date: 15/Jul/21 15:38
Worklog Time Spent: 10m 
  Work Description: medb edited a comment on pull request #2421:
URL: https://github.com/apache/hive/pull/2421#issuecomment-880798336


   @kgyrtkirk @hmangla98 may you take a look and merge this PR?
   




Issue Time Tracking
---

Worklog Id: (was: 623119)
Time Spent: 1h 20m  (was: 1h 10m)

> Slow Hive partition deletion for Cloud object stores with expensive ListFiles
> -
>
> Key: HIVE-25277
> URL: https://issues.apache.org/jira/browse/HIVE-25277
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: All Versions
>Reporter: Zhou Fang
>Assignee: Zhou Fang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Deleting a Hive partition is slow when using a Cloud object store as the 
> warehouse, for which ListFiles is expensive. A root cause is that the 
> recursive parent dir deletion is very inefficient: there are many duplicated 
> calls to isEmpty (ListFiles is called at the end). This fix sorts the parents 
> to delete according to the path size, and always processes the longest one 
> (e.g., a/b/c always comes before a/b). As a result, each parent path only 
> needs to be checked once.





[jira] [Work logged] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles

2021-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25277?focusedWorklogId=623120&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-623120
 ]

ASF GitHub Bot logged work on HIVE-25277:
-

Author: ASF GitHub Bot
Created on: 15/Jul/21 15:38
Start Date: 15/Jul/21 15:38
Worklog Time Spent: 10m 
  Work Description: medb edited a comment on pull request #2421:
URL: https://github.com/apache/hive/pull/2421#issuecomment-880798336


   @kgyrtkirk @hmangla98 @nrg4878 may you take a look and merge this PR?
   




Issue Time Tracking
---

Worklog Id: (was: 623120)
Time Spent: 1.5h  (was: 1h 20m)

> Slow Hive partition deletion for Cloud object stores with expensive ListFiles
> -
>
> Key: HIVE-25277
> URL: https://issues.apache.org/jira/browse/HIVE-25277
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: All Versions
>Reporter: Zhou Fang
>Assignee: Zhou Fang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Deleting a Hive partition is slow when using a Cloud object store as the 
> warehouse, for which ListFiles is expensive. A root cause is that the 
> recursive parent dir deletion is very inefficient: there are many duplicated 
> calls to isEmpty (ListFiles is called at the end). This fix sorts the parents 
> to delete according to the path size, and always processes the longest one 
> (e.g., a/b/c always comes before a/b). As a result, each parent path only 
> needs to be checked once.





[jira] [Updated] (HIVE-25290) Stabilize TestTxnHandler

2021-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25290:
--
Labels: pull-request-available  (was: )

> Stabilize TestTxnHandler
> 
>
> Key: HIVE-25290
> URL: https://issues.apache.org/jira/browse/HIVE-25290
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> http://ci.hive.apache.org/job/hive-flaky-check/271/





[jira] [Work logged] (HIVE-25290) Stabilize TestTxnHandler

2021-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25290?focusedWorklogId=623123&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-623123
 ]

ASF GitHub Bot logged work on HIVE-25290:
-

Author: ASF GitHub Bot
Created on: 15/Jul/21 15:43
Start Date: 15/Jul/21 15:43
Worklog Time Spent: 10m 
  Work Description: hmangla98 opened a new pull request #2483:
URL: https://github.com/apache/hive/pull/2483


   Fix testReplOpenTxn in TestTxnHandler.
   
   http://ci.hive.apache.org/job/hive-flaky-check/271/




Issue Time Tracking
---

Worklog Id: (was: 623123)
Remaining Estimate: 0h
Time Spent: 10m

> Stabilize TestTxnHandler
> 
>
> Key: HIVE-25290
> URL: https://issues.apache.org/jira/browse/HIVE-25290
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Haymant Mangla
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> http://ci.hive.apache.org/job/hive-flaky-check/271/





[jira] [Commented] (HIVE-25336) Use single call to get tables in DropDatabaseAnalyzer

2021-07-15 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17381472#comment-17381472
 ] 

Ayush Saxena commented on HIVE-25336:
-

[~anishek] I haven't introduced any new API; I have used the existing 
{{getAllTableObjects}} call, which is used in a couple of other cases as well, 
one of them being {{show tables extended}}.
 I checked with Rajesh on this; up to around ~10K tables, we won't have any 
problem as such.
 In the long run, if the numbers start going too high, we might try to 
introduce some iterator-based API, or maybe optimise {{getAllTableObjects}} 
itself along the lines of how {{getListing}} works for {{FileSystems}}, and 
replace it everywhere. I can keep a task open for that.

> Use single call to get tables in DropDatabaseAnalyzer
> -
>
> Key: HIVE-25336
> URL: https://issues.apache.org/jira/browse/HIVE-25336
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Optimise 
> org.apache.hadoop.hive.ql.ddl.database.drop.DropDatabaseAnalyzer.analyzeInternal(DropDatabaseAnalyzer.java:61),
>  where it fetches the table objects one by one. Move to a single call. This 
> could save 20+ seconds when a large number of tables is present.





[jira] [Work logged] (HIVE-25321) [HMS] Advance write Id during AlterTableDropPartition and AlterTableExchangePartition

2021-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25321?focusedWorklogId=623178&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-623178
 ]

ASF GitHub Bot logged work on HIVE-25321:
-

Author: ASF GitHub Bot
Created on: 15/Jul/21 17:23
Start Date: 15/Jul/21 17:23
Worklog Time Spent: 10m 
  Work Description: hsnusonic commented on a change in pull request #2465:
URL: https://github.com/apache/hive/pull/2465#discussion_r670660542



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/partition/drop/AbstractDropPartitionAnalyzer.java
##
@@ -105,11 +106,18 @@ protected void analyzeCommand(TableName tableName, 
Map partition
 
 AlterTableDropPartitionDesc desc =
 new AlterTableDropPartitionDesc(tableName, partitionSpecs, mustPurge, 
replicationSpec);
+
+Task ddlTask = TaskFactory.get(new DDLWork(getInputs(), 
getOutputs(), desc));

Review comment:
   Is `ddlTask` used?






Issue Time Tracking
---

Worklog Id: (was: 623178)
Time Spent: 20m  (was: 10m)

> [HMS] Advance write Id during AlterTableDropPartition and 
> AlterTableExchangePartition
> -
>
> Key: HIVE-25321
> URL: https://issues.apache.org/jira/browse/HIVE-25321
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> All DDLs should advance the write ID, so that we can provide consistent data 
> from the cache, based on the validWriteIds. 
>  





[jira] [Assigned] (HIVE-25337) EXPLAIN Tez/Dag

2021-07-15 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor reassigned HIVE-25337:
---

Assignee: László Bodor

> EXPLAIN Tez/Dag
> ---
>
> Key: HIVE-25337
> URL: https://issues.apache.org/jira/browse/HIVE-25337
> Project: Hive
>  Issue Type: New Feature
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>






[jira] [Updated] (HIVE-25337) EXPLAIN Tez/Dag

2021-07-15 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-25337:

Description: Just an idea: consider whether we can expose some genuinely 
tez/dag-related details with Hive's explain command that users are not 
interested in under normal circumstances, e.g. edge types (on the class 
level), input/output classes, output committers if any, dag plan size, etc.

> EXPLAIN Tez/Dag
> ---
>
> Key: HIVE-25337
> URL: https://issues.apache.org/jira/browse/HIVE-25337
> Project: Hive
>  Issue Type: New Feature
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> Just an idea: consider whether we can expose some genuinely tez/dag-related 
> details with Hive's explain command that users are not interested in under 
> normal circumstances, e.g. edge types (on the class level), input/output 
> classes, output committers if any, dag plan size, etc.





[jira] [Comment Edited] (HIVE-25337) EXPLAIN Tez/Dag

2021-07-15 Thread Matt McCline (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17381562#comment-17381562
 ] 

Matt McCline edited comment on HIVE-25337 at 7/15/21, 7:58 PM:
---

Great idea. And when I added EXPLAIN VECTORIZATION, the primary audience was 1 
person (me)! There is tremendous value in showing more information.


was (Author: mattmccline):
Great idea. And, when I added EXPLAIN VECTORIZATION to primary audience was 1 
person (me)! There is a tremendous value in showing more information.

> EXPLAIN Tez/Dag
> ---
>
> Key: HIVE-25337
> URL: https://issues.apache.org/jira/browse/HIVE-25337
> Project: Hive
>  Issue Type: New Feature
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> Just an idea: consider whether we can expose some genuinely tez/dag-related 
> details with Hive's explain command that users are not interested in under 
> normal circumstances, e.g. edge types (on the class level), input/output 
> classes, output committers if any, dag plan size, etc.





[jira] [Commented] (HIVE-25337) EXPLAIN Tez/Dag

2021-07-15 Thread Matt McCline (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17381562#comment-17381562
 ] 

Matt McCline commented on HIVE-25337:
-

Great idea. And, when I added EXPLAIN VECTORIZATION to primary audience was 1 
person (me)! There is a tremendous value in showing more information.

> EXPLAIN Tez/Dag
> ---
>
> Key: HIVE-25337
> URL: https://issues.apache.org/jira/browse/HIVE-25337
> Project: Hive
>  Issue Type: New Feature
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> Just an idea: consider whether we can expose some genuinely tez/dag-related 
> details with Hive's explain command that users are not interested in under 
> normal circumstances, e.g. edge types (on the class level), input/output 
> classes, output committers if any, dag plan size, etc.





[jira] [Work logged] (HIVE-25190) BytesColumnVector fails when the aggregate size is > 1gb

2021-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25190?focusedWorklogId=623306&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-623306
 ]

ASF GitHub Bot logged work on HIVE-25190:
-

Author: ASF GitHub Bot
Created on: 15/Jul/21 21:10
Start Date: 15/Jul/21 21:10
Worklog Time Spent: 10m 
  Work Description: omalley commented on a change in pull request #2408:
URL: https://github.com/apache/hive/pull/2408#discussion_r657298367



##
File path: 
storage-api/src/java/org/apache/hadoop/hive/ql/exec/vector/BytesColumnVector.java
##
@@ -258,73 +262,56 @@ public void setValPreallocated(int elementNum, int 
length) {
   public void setConcat(int elementNum, byte[] leftSourceBuf, int leftStart, 
int leftLen,
   byte[] rightSourceBuf, int rightStart, int rightLen) {
 int newLen = leftLen + rightLen;
-if ((nextFree + newLen) > buffer.length) {
-  increaseBufferSpace(newLen);
-}
-vector[elementNum] = buffer;
-this.start[elementNum] = nextFree;
+ensureValPreallocated(newLen);
+vector[elementNum] = currentValue;
+this.start[elementNum] = currentOffset;
 this.length[elementNum] = newLen;
 
-System.arraycopy(leftSourceBuf, leftStart, buffer, nextFree, leftLen);
-nextFree += leftLen;
-System.arraycopy(rightSourceBuf, rightStart, buffer, nextFree, rightLen);
-nextFree += rightLen;
+System.arraycopy(leftSourceBuf, leftStart, currentValue, currentOffset, 
leftLen);
+System.arraycopy(rightSourceBuf, rightStart, currentValue,
+currentOffset + leftLen, rightLen);
   }
 
   /**
-   * Increase buffer space enough to accommodate next element.
+   * Allocate/reuse enough buffer space to accommodate next element.
+   * Updates the nextFree field to point to the start of the new record.
+   * If smallBuffer is used, smallBufferNextFree is updated.
+   *
* This uses an exponential increase mechanism to rapidly
* increase buffer size to enough to hold all data.
* As batches get re-loaded, buffer space allocated will quickly
* stabilize.
*
* @param nextElemLength size of next element to be added
+   * @return the buffer to use for the next element
*/
-  public void increaseBufferSpace(int nextElemLength) {
-// A call to increaseBufferSpace() or ensureValPreallocated() will ensure 
that buffer[] points to
+  private byte[] allocateBuffer(int nextElemLength) {
+// A call to ensureValPreallocated() will ensure that buffer[] points to
 // a byte[] with sufficient space for the specified size.
-// This will either point to smallBuffer, or to a newly allocated byte 
array for larger values.
 
-if (nextElemLength > MAX_SIZE_FOR_SMALL_BUFFER) {
-  // Larger allocations will be special-cased and will not use the normal 
buffer.
-  // buffer/nextFree will be set to a newly allocated array just for the 
current row.
-  // The next row will require another call to increaseBufferSpace() since 
this new buffer should be used up.
-  byte[] newBuffer = new byte[nextElemLength];
+// If this is a large value or small buffer is maxed out, allocate a

Review comment:
   Given that we let the buffer grow to 1gb and only use it for values 
under 1mb, except in rare cases, I think we would be best off keeping the code 
simpler.






Issue Time Tracking
---

Worklog Id: (was: 623306)
Time Spent: 1h 40m  (was: 1.5h)

> BytesColumnVector fails when the aggregate size is > 1gb
> 
>
> Key: HIVE-25190
> URL: https://issues.apache.org/jira/browse/HIVE-25190
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Currently, BytesColumnVector will allocate a buffer for small values (< 1mb), 
> but fail with:
> {code:java}
> new RuntimeException("Overflow of newLength. smallBuffer.length="
> + smallBuffer.length + ", nextElemLength=" + nextElemLength);
> {code}
> if the aggregate size of the buffer crosses over 1gb. 
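
A minimal sketch of the growth policy under discussion (hypothetical names, not the actual BytesColumnVector code): grow the shared buffer exponentially, but clamp the computed size instead of overflowing and throwing:

```java
// Hypothetical illustration of exponential buffer growth with a cap.
public class BufferGrowth {

  // 1gb cap, as mentioned in the discussion above.
  static final int MAX_BUFFER = 1024 * 1024 * 1024;

  public static int newCapacity(int current, int used, int nextElemLength) {
    long needed = (long) used + nextElemLength;
    long candidate = Math.max(current, 1);
    while (candidate < needed) {
      candidate *= 2; // exponential increase
    }
    // Clamp instead of overflowing int; a caller would fall back to a
    // dedicated allocation for a value that does not fit the shared buffer.
    return (int) Math.min(candidate, MAX_BUFFER);
  }

  public static void main(String[] args) {
    System.out.println(newCapacity(16, 10, 100)); // prints 128
  }
}
```

Doing the arithmetic in `long` is what avoids the "Overflow of newLength" failure mode for sizes near the `int` limit.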





[jira] [Work logged] (HIVE-25190) BytesColumnVector fails when the aggregate size is > 1gb

2021-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25190?focusedWorklogId=623308&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-623308
 ]

ASF GitHub Bot logged work on HIVE-25190:
-

Author: ASF GitHub Bot
Created on: 15/Jul/21 21:14
Start Date: 15/Jul/21 21:14
Worklog Time Spent: 10m 
  Work Description: pavibhai commented on a change in pull request #2408:
URL: https://github.com/apache/hive/pull/2408#discussion_r670809437



##
File path: 
storage-api/src/java/org/apache/hadoop/hive/ql/exec/vector/BytesColumnVector.java
##
@@ -49,13 +49,14 @@
*/
   public int[] length;
 
-  // A call to increaseBufferSpace() or ensureValPreallocated() will ensure 
that buffer[] points to
-  // a byte[] with sufficient space for the specified size.
-  private byte[] buffer;   // optional buffer to use when actually copying in 
data
+  // Calls to ensureValPreallocated() ensure that buffer starting at nextFree
+  // points to a byte[] with sufficient space for the specified size.

Review comment:
   The new names are meaningful






Issue Time Tracking
---

Worklog Id: (was: 623308)
Time Spent: 1h 50m  (was: 1h 40m)

> BytesColumnVector fails when the aggregate size is > 1gb
> 
>
> Key: HIVE-25190
> URL: https://issues.apache.org/jira/browse/HIVE-25190
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently, BytesColumnVector will allocate a buffer for small values (< 1mb), 
> but fail with:
> {code:java}
> new RuntimeException("Overflow of newLength. smallBuffer.length="
> + smallBuffer.length + ", nextElemLength=" + nextElemLength);
> {code}
> if the aggregate size of the buffer crosses over 1gb. 





[jira] [Work logged] (HIVE-21614) Derby does not support CLOB comparisons

2021-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21614?focusedWorklogId=623325&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-623325
 ]

ASF GitHub Bot logged work on HIVE-21614:
-

Author: ASF GitHub Bot
Created on: 15/Jul/21 23:00
Start Date: 15/Jul/21 23:00
Worklog Time Spent: 10m 
  Work Description: hankfanchiu opened a new pull request #2484:
URL: https://github.com/apache/hive/pull/2484


   ### What changes were proposed in this pull request?
   
   On the Hive MetaStore server's side of the 
`HiveMetaStoreClient#listTableNamesByFilter()` API, conditionally manipulate 
the string filter by replacing any `=` comparison on a table parameter with a 
`LIKE` operator.
   
   This replacement is only performed if:
   
   1. The database product backing the Hive MetaStore is Derby,
   2. The `=` comparison is on a table parameter, i.e. the 
`"TABLE_PARAMS"."PARAM_VALUE"` column.
   
   ### Why are the changes needed?
   
   HIVE-12274 changed the type of the `"TABLE_PARAMS"."PARAM_VALUE"` column to 
`CLOB` for Derby and Oracle, e.g.:
   
   
https://github.com/apache/hive/blob/7d4134e7fe9bfb8b8aa3344a9ae72f4f36c98b2c/metastore/scripts/upgrade/derby/039-HIVE-12274.derby.sql#L8-L12
   
   With the type change, invoking `ObjectStore#listTableNamesByFilter()` given 
a `=` comparison -- on a table parameter stored in Derby -- fails with the 
following exception:
   
   ```
   ERROR 42818: Comparisons between 'CLOB (UCS_BASIC)' and 'CLOB (UCS_BASIC)' 
are not supported. Types must be comparable. String types must also have 
matching collation. If collation does not match, a possible solution is to cast 
operands to force them to the default collation (e.g. SELECT tablename FROM 
sys.systables WHERE CAST(tablename AS VARCHAR(128)) = 'T1')
   ```
   
   Without reverting the column type change, we add support for `=` comparison 
on table parameters in Derby.
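
   The rewrite can be illustrated with a small sketch (`DerbyFilterRewrite` and `rewriteForDerby` are hypothetical names, not code from this PR): replace the `=` on the `"TABLE_PARAMS"."PARAM_VALUE"` column with `LIKE`, so Derby never compares two CLOBs with `=`.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the filter rewrite; the class and method names
// are illustrative, not part of this PR.
public class DerbyFilterRewrite {

  private static final Pattern EQ_ON_PARAM_VALUE =
      Pattern.compile("\"TABLE_PARAMS\"\\.\"PARAM_VALUE\"\\s*=\\s*");

  public static String rewriteForDerby(String filter) {
    // Only the "=" on the CLOB column is replaced; the compared value stays
    // untouched, and LIKE behaves like equality for values without wildcards.
    return EQ_ON_PARAM_VALUE.matcher(filter)
        .replaceAll("\"TABLE_PARAMS\".\"PARAM_VALUE\" LIKE ");
  }

  public static void main(String[] args) {
    System.out.println(
        rewriteForDerby("\"TABLE_PARAMS\".\"PARAM_VALUE\" = 'my_value'"));
    // prints "TABLE_PARAMS"."PARAM_VALUE" LIKE 'my_value'
  }
}
```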
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, invoking `HiveMetaStoreClient#listTableNamesByFilter()` with a 
`$param_key = '$param_value'` filter now succeeds for Derby databases.
   
   ### How was this patch tested?
   




Issue Time Tracking
---

Worklog Id: (was: 623325)
Remaining Estimate: 0h
Time Spent: 10m

> Derby does not support CLOB comparisons
> ---
>
> Key: HIVE-21614
> URL: https://issues.apache.org/jira/browse/HIVE-21614
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.3.4, 3.0.0
>Reporter: Vlad Rozov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HiveMetaStoreClient.listTableNamesByFilter() with non empty filter causes 
> exception with Derby DB:
> {noformat}
> Caused by: ERROR 42818: Comparisons between 'CLOB (UCS_BASIC)' and 'CLOB 
> (UCS_BASIC)' are not supported. Types must be comparable. String types must 
> also have matching collation. If collation does not match, a possible 
> solution is to cast operands to force them to the default collation (e.g. 
> SELECT tablename FROM sys.systables WHERE CAST(tablename AS VARCHAR(128)) = 
> 'T1')
>   at org.apache.derby.iapi.error.StandardException.newException(Unknown 
> Source)
>   at org.apache.derby.iapi.error.StandardException.newException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.sql.compile.BinaryComparisonOperatorNode.bindComparisonOperator(Unknown
>  Source)
>   at 
> org.apache.derby.impl.sql.compile.BinaryComparisonOperatorNode.bindExpression(Unknown
>  Source)
>   at 
> org.apache.derby.impl.sql.compile.BinaryOperatorNode.bindExpression(Unknown 
> Source)
>   at 
> org.apache.derby.impl.sql.compile.BinaryLogicalOperatorNode.bindExpression(Unknown
>  Source)
>   at org.apache.derby.impl.sql.compile.AndNode.bindExpression(Unknown 
> Source)
>   at org.apache.derby.impl.sql.compile.SelectNode.bindExpressions(Unknown 
> Source)
>   at 
> org.apache.derby.impl.sql.compile.DMLStatementNode.bindExpressions(Unknown 
> Source)
>   at org.apache.derby.impl.sql.compile.DMLStatementNode.bind(Unknown 
> Source)
>   at org.apache.derby.impl.sql.compile.CursorNode.bindStatement(Unknown 
> Source)
>   at org.apache.derby.impl.sql.GenericStatement.prepMinion(Unknown Source)
>   at org.apache.derby.impl.sql.GenericStatement.prepare(Unknown Source)
>   at 
> org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(Unknown
>

[jira] [Updated] (HIVE-21614) Derby does not support CLOB comparisons

2021-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-21614:
--
Labels: pull-request-available  (was: )

> Derby does not support CLOB comparisons
> ---
>
> Key: HIVE-21614
> URL: https://issues.apache.org/jira/browse/HIVE-21614
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.3.4, 3.0.0
>Reporter: Vlad Rozov
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HiveMetaStoreClient.listTableNamesByFilter() with a non-empty filter causes 
> an exception with Derby DB:
> {noformat}
> Caused by: ERROR 42818: Comparisons between 'CLOB (UCS_BASIC)' and 'CLOB 
> (UCS_BASIC)' are not supported. Types must be comparable. String types must 
> also have matching collation. If collation does not match, a possible 
> solution is to cast operands to force them to the default collation (e.g. 
> SELECT tablename FROM sys.systables WHERE CAST(tablename AS VARCHAR(128)) = 
> 'T1')
>   at org.apache.derby.iapi.error.StandardException.newException(Unknown 
> Source)
>   at org.apache.derby.iapi.error.StandardException.newException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.sql.compile.BinaryComparisonOperatorNode.bindComparisonOperator(Unknown
>  Source)
>   at 
> org.apache.derby.impl.sql.compile.BinaryComparisonOperatorNode.bindExpression(Unknown
>  Source)
>   at 
> org.apache.derby.impl.sql.compile.BinaryOperatorNode.bindExpression(Unknown 
> Source)
>   at 
> org.apache.derby.impl.sql.compile.BinaryLogicalOperatorNode.bindExpression(Unknown
>  Source)
>   at org.apache.derby.impl.sql.compile.AndNode.bindExpression(Unknown 
> Source)
>   at org.apache.derby.impl.sql.compile.SelectNode.bindExpressions(Unknown 
> Source)
>   at 
> org.apache.derby.impl.sql.compile.DMLStatementNode.bindExpressions(Unknown 
> Source)
>   at org.apache.derby.impl.sql.compile.DMLStatementNode.bind(Unknown 
> Source)
>   at org.apache.derby.impl.sql.compile.CursorNode.bindStatement(Unknown 
> Source)
>   at org.apache.derby.impl.sql.GenericStatement.prepMinion(Unknown Source)
>   at org.apache.derby.impl.sql.GenericStatement.prepare(Unknown Source)
>   at 
> org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(Unknown
>  Source)
>   ... 42 more
> {noformat}
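The workaround the error message suggests — casting the CLOB operand to VARCHAR so the default collation applies — could be applied wherever the metastore generates the comparison SQL. A minimal, hypothetical helper sketching that rewrite (the helper names and the VARCHAR width are illustrative, not Hive's actual code):

```java
public final class DerbyFilterSql {
    // Derby cannot compare CLOB values directly; casting to VARCHAR
    // forces the default collation and makes the comparison legal.
    static String castForDerby(String column) {
        return "CAST(" + column + " AS VARCHAR(256))";
    }

    // Build an equality predicate that is safe on a Derby-backed metastore.
    static String eqPredicate(String column, String literal) {
        return castForDerby(column) + " = '" + literal.replace("'", "''") + "'";
    }

    public static void main(String[] args) {
        // prints: CAST("TBLS"."OWNER" AS VARCHAR(256)) = 'hive'
        System.out.println(eqPredicate("\"TBLS\".\"OWNER\"", "hive"));
    }
}
```

On databases other than Derby the extra CAST is a no-op semantically, so such a rewrite could be applied unconditionally or gated on the detected database product.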



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21614) Derby does not support CLOB comparisons

2021-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21614?focusedWorklogId=623326&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-623326
 ]

ASF GitHub Bot logged work on HIVE-21614:
-

Author: ASF GitHub Bot
Created on: 15/Jul/21 23:03
Start Date: 15/Jul/21 23:03
Worklog Time Spent: 10m 
  Work Description: hankfanchiu commented on pull request #2484:
URL: https://github.com/apache/hive/pull/2484#issuecomment-881063844


   @pvary, mind taking a look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 623326)
Time Spent: 20m  (was: 10m)

> Derby does not support CLOB comparisons
> ---
>
> Key: HIVE-21614
> URL: https://issues.apache.org/jira/browse/HIVE-21614
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.3.4, 3.0.0
>Reporter: Vlad Rozov
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HiveMetaStoreClient.listTableNamesByFilter() with a non-empty filter causes 
> an exception with Derby DB:
> {noformat}
> Caused by: ERROR 42818: Comparisons between 'CLOB (UCS_BASIC)' and 'CLOB 
> (UCS_BASIC)' are not supported. Types must be comparable. String types must 
> also have matching collation. If collation does not match, a possible 
> solution is to cast operands to force them to the default collation (e.g. 
> SELECT tablename FROM sys.systables WHERE CAST(tablename AS VARCHAR(128)) = 
> 'T1')
>   at org.apache.derby.iapi.error.StandardException.newException(Unknown 
> Source)
>   at org.apache.derby.iapi.error.StandardException.newException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.sql.compile.BinaryComparisonOperatorNode.bindComparisonOperator(Unknown
>  Source)
>   at 
> org.apache.derby.impl.sql.compile.BinaryComparisonOperatorNode.bindExpression(Unknown
>  Source)
>   at 
> org.apache.derby.impl.sql.compile.BinaryOperatorNode.bindExpression(Unknown 
> Source)
>   at 
> org.apache.derby.impl.sql.compile.BinaryLogicalOperatorNode.bindExpression(Unknown
>  Source)
>   at org.apache.derby.impl.sql.compile.AndNode.bindExpression(Unknown 
> Source)
>   at org.apache.derby.impl.sql.compile.SelectNode.bindExpressions(Unknown 
> Source)
>   at 
> org.apache.derby.impl.sql.compile.DMLStatementNode.bindExpressions(Unknown 
> Source)
>   at org.apache.derby.impl.sql.compile.DMLStatementNode.bind(Unknown 
> Source)
>   at org.apache.derby.impl.sql.compile.CursorNode.bindStatement(Unknown 
> Source)
>   at org.apache.derby.impl.sql.GenericStatement.prepMinion(Unknown Source)
>   at org.apache.derby.impl.sql.GenericStatement.prepare(Unknown Source)
>   at 
> org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(Unknown
>  Source)
>   ... 42 more
> {noformat}





[jira] [Work logged] (HIVE-24713) HS2 never shut down after reconnecting to Zookeeper

2021-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24713?focusedWorklogId=623341&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-623341
 ]

ASF GitHub Bot logged work on HIVE-24713:
-

Author: ASF GitHub Bot
Created on: 16/Jul/21 00:09
Start Date: 16/Jul/21 00:09
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1932:
URL: https://github.com/apache/hive/pull/1932


   




Issue Time Tracking
---

Worklog Id: (was: 623341)
Time Spent: 1h 50m  (was: 1h 40m)

> HS2 never shut down after reconnecting to Zookeeper
> ---
>
> Key: HIVE-24713
> URL: https://issues.apache.org/jira/browse/HIVE-24713
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Eugene Chung
>Assignee: Eugene Chung
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> While using zookeeper discovery mode, HS2 never learns that it has been 
> deregistered from Zookeeper.
> Reproduction is simple.
>  # Find one of the zk servers which holds the DeRegisterWatcher watches of 
> HS2 instances. If the version of ZK server is 3.5.0 or above, it's easily 
> found with [http://zk-server:8080/commands/watches] (ZK AdminServer feature)
>  # Check which HS2 instance is watching on the ZK server found at 1, say it's 
> _hs2-of-2_
>  # Restart the ZK server found at 1
>  # Deregister _hs2-of-2_ with the command
> {noformat}
> hive --service hiveserver2 -deregister hs2-of-2{noformat}
>  # _hs2-of-2_ never knows that it must be shut down, because the watch event 
> of DeregisterWatcher was already fired at step 3.
> The reason of the problem is explained at 
> [https://zookeeper.apache.org/doc/r3.3.3/zookeeperProgrammers.html#sc_WatchRememberThese]
> I added some logging to DeRegisterWatcher and checked what events occurred 
> at the time of step 3 (restarting the ZK server):
>  # WatchedEvent state:Disconnected type:None path:null
>  # WatchedEvent[WatchedEvent state:SyncConnected type:None path:null]
>  # WatchedEvent[WatchedEvent state:SaslAuthenticated type:None path:null]
>  # WatchedEvent[WatchedEvent state:SyncConnected type:NodeDataChanged
>  path:/hiveserver2/serverUri=hs2-of-2:1;version=3.1.2;sequence=000711]
> As the zk manual says, watches are one-time triggers. When the connection to 
> the ZK server was reestablished, state:SyncConnected type:NodeDataChanged for 
> the path is fired and it's the end. *DeregisterWatcher must be registered 
> again for the same znode to get a future NodeDeleted event.*
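The one-time-trigger semantics described above can be modeled without a live ZooKeeper cluster. The sketch below uses assumed toy names (`ZNode`, `fire`), not the real ZooKeeper API, to show why a watcher must re-register itself inside `process()` to see a later NodeDeleted event:

```java
import java.util.ArrayList;
import java.util.List;

public final class OneShotWatchDemo {
    interface Watcher { void process(String event); }

    // Minimal model of a znode whose watches are one-time triggers,
    // mirroring ZooKeeper's semantics: a watch is consumed when it fires.
    static final class ZNode {
        private final List<Watcher> watches = new ArrayList<>();
        void watch(Watcher w) { watches.add(w); }
        void fire(String event) {
            List<Watcher> toNotify = new ArrayList<>(watches);
            watches.clear();               // one-time: drop before notifying
            for (Watcher w : toNotify) w.process(event);
        }
    }

    public static int deliveries = 0;

    public static void main(String[] args) {
        ZNode node = new ZNode();
        // Buggy pattern: register once. The NodeDeleted that follows a
        // reconnect-induced NodeDataChanged is never delivered.
        node.watch(e -> deliveries++);
        node.fire("NodeDataChanged");      // fired on reconnect
        node.fire("NodeDeleted");          // lost: watch already consumed
        System.out.println("buggy deliveries: " + deliveries);   // prints 1

        ZNode node2 = new ZNode();
        // Fixed pattern: the watcher re-arms itself inside process().
        Watcher self = new Watcher() {
            public void process(String e) { deliveries++; node2.watch(this); }
        };
        node2.watch(self);
        node2.fire("NodeDataChanged");
        node2.fire("NodeDeleted");         // seen: watch was re-registered
        System.out.println("total deliveries: " + deliveries);   // prints 3
    }
}
```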





[jira] [Work logged] (HIVE-25294) Optimise the metadata count queries for local mode

2021-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25294?focusedWorklogId=623364&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-623364
 ]

ASF GitHub Bot logged work on HIVE-25294:
-

Author: ASF GitHub Bot
Created on: 16/Jul/21 02:28
Start Date: 16/Jul/21 02:28
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #2435:
URL: https://github.com/apache/hive/pull/2435#discussion_r670920872



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -374,9 +362,14 @@ public HMSHandler(String name, Configuration conf, boolean 
init) throws MetaExce
 }
   }
 }
-if (init) {
-  init();
-}
+  }
+
+  @VisibleForTesting
+  public static HMSHandler getInitializedHandler(String name, Configuration 
conf)

Review comment:
   The `init` flag is set to true only in our tests. The 
[RetryingHMSHandler](https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/RetryingHMSHandler.java#L80)
 initializes the HMSHandler, and HiveMetaStore (including the local mode) uses 
RetryingHMSHandler to delegate requests to the initialized HMSHandler. So I 
propose to remove the boolean trick from the constructors and introduce a new 
method `getInitializedHandler`, mainly for the tests.
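The refactor proposed in the comment — a constructor that never initializes, plus a static factory that does — can be sketched with a toy class (this is not the real HMSHandler; only the factory name `getInitializedHandler` comes from the discussion):

```java
public class HandlerSketch {
    private final String name;
    private boolean initialized = false;

    // Constructor only stores state; no hidden init-flag parameter,
    // and it never triggers initialization itself.
    public HandlerSketch(String name) {
        this.name = name;
    }

    public void init() { initialized = true; }

    public boolean isInitialized() { return initialized; }

    public String getName() { return name; }

    // Factory for tests: construct, then initialize, in one step.
    // Production callers keep wrapping the handler (e.g. in a retrying
    // proxy) that performs init() itself.
    public static HandlerSketch getInitializedHandler(String name) {
        HandlerSketch h = new HandlerSketch(name);
        h.init();
        return h;
    }
}
```

The design choice here is that the constructor has a single, obvious behavior, and the "construct + init" composite is named explicitly instead of being toggled by a boolean argument.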






Issue Time Tracking
---

Worklog Id: (was: 623364)
Time Spent: 1.5h  (was: 1h 20m)

> Optimise the metadata count queries for local mode
> --
>
> Key: HIVE-25294
> URL: https://issues.apache.org/jira/browse/HIVE-25294
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When the Metastore is in local mode, the client uses its own private HMSHandler 
> to fetch metadata, and the HMSHandler must be initialized before it is ready 
> to serve. When metrics are enabled, HMSHandler counts the number of databases, 
> tables, and partitions, which could lead to some problems.





[jira] [Assigned] (HIVE-21614) Derby does not support CLOB comparisons

2021-07-15 Thread Hank Fanchiu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hank Fanchiu reassigned HIVE-21614:
---

Assignee: Hank Fanchiu

> Derby does not support CLOB comparisons
> ---
>
> Key: HIVE-21614
> URL: https://issues.apache.org/jira/browse/HIVE-21614
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.3.4, 3.0.0
>Reporter: Vlad Rozov
>Assignee: Hank Fanchiu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HiveMetaStoreClient.listTableNamesByFilter() with a non-empty filter causes 
> an exception with Derby DB:
> {noformat}
> Caused by: ERROR 42818: Comparisons between 'CLOB (UCS_BASIC)' and 'CLOB 
> (UCS_BASIC)' are not supported. Types must be comparable. String types must 
> also have matching collation. If collation does not match, a possible 
> solution is to cast operands to force them to the default collation (e.g. 
> SELECT tablename FROM sys.systables WHERE CAST(tablename AS VARCHAR(128)) = 
> 'T1')
>   at org.apache.derby.iapi.error.StandardException.newException(Unknown 
> Source)
>   at org.apache.derby.iapi.error.StandardException.newException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.sql.compile.BinaryComparisonOperatorNode.bindComparisonOperator(Unknown
>  Source)
>   at 
> org.apache.derby.impl.sql.compile.BinaryComparisonOperatorNode.bindExpression(Unknown
>  Source)
>   at 
> org.apache.derby.impl.sql.compile.BinaryOperatorNode.bindExpression(Unknown 
> Source)
>   at 
> org.apache.derby.impl.sql.compile.BinaryLogicalOperatorNode.bindExpression(Unknown
>  Source)
>   at org.apache.derby.impl.sql.compile.AndNode.bindExpression(Unknown 
> Source)
>   at org.apache.derby.impl.sql.compile.SelectNode.bindExpressions(Unknown 
> Source)
>   at 
> org.apache.derby.impl.sql.compile.DMLStatementNode.bindExpressions(Unknown 
> Source)
>   at org.apache.derby.impl.sql.compile.DMLStatementNode.bind(Unknown 
> Source)
>   at org.apache.derby.impl.sql.compile.CursorNode.bindStatement(Unknown 
> Source)
>   at org.apache.derby.impl.sql.GenericStatement.prepMinion(Unknown Source)
>   at org.apache.derby.impl.sql.GenericStatement.prepare(Unknown Source)
>   at 
> org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(Unknown
>  Source)
>   ... 42 more
> {noformat}





[jira] [Work started] (HIVE-21614) Derby does not support CLOB comparisons

2021-07-15 Thread Hank Fanchiu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-21614 started by Hank Fanchiu.
---
> Derby does not support CLOB comparisons
> ---
>
> Key: HIVE-21614
> URL: https://issues.apache.org/jira/browse/HIVE-21614
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.3.4, 3.0.0
>Reporter: Vlad Rozov
>Assignee: Hank Fanchiu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HiveMetaStoreClient.listTableNamesByFilter() with a non-empty filter causes 
> an exception with Derby DB:
> {noformat}
> Caused by: ERROR 42818: Comparisons between 'CLOB (UCS_BASIC)' and 'CLOB 
> (UCS_BASIC)' are not supported. Types must be comparable. String types must 
> also have matching collation. If collation does not match, a possible 
> solution is to cast operands to force them to the default collation (e.g. 
> SELECT tablename FROM sys.systables WHERE CAST(tablename AS VARCHAR(128)) = 
> 'T1')
>   at org.apache.derby.iapi.error.StandardException.newException(Unknown 
> Source)
>   at org.apache.derby.iapi.error.StandardException.newException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.sql.compile.BinaryComparisonOperatorNode.bindComparisonOperator(Unknown
>  Source)
>   at 
> org.apache.derby.impl.sql.compile.BinaryComparisonOperatorNode.bindExpression(Unknown
>  Source)
>   at 
> org.apache.derby.impl.sql.compile.BinaryOperatorNode.bindExpression(Unknown 
> Source)
>   at 
> org.apache.derby.impl.sql.compile.BinaryLogicalOperatorNode.bindExpression(Unknown
>  Source)
>   at org.apache.derby.impl.sql.compile.AndNode.bindExpression(Unknown 
> Source)
>   at org.apache.derby.impl.sql.compile.SelectNode.bindExpressions(Unknown 
> Source)
>   at 
> org.apache.derby.impl.sql.compile.DMLStatementNode.bindExpressions(Unknown 
> Source)
>   at org.apache.derby.impl.sql.compile.DMLStatementNode.bind(Unknown 
> Source)
>   at org.apache.derby.impl.sql.compile.CursorNode.bindStatement(Unknown 
> Source)
>   at org.apache.derby.impl.sql.GenericStatement.prepMinion(Unknown Source)
>   at org.apache.derby.impl.sql.GenericStatement.prepare(Unknown Source)
>   at 
> org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(Unknown
>  Source)
>   ... 42 more
> {noformat}





[jira] [Commented] (HIVE-25337) EXPLAIN Tez/Dag

2021-07-15 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-25337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17381779#comment-17381779
 ] 

László Bodor commented on HIVE-25337:
-

thanks [~mattmccline]! btw, you should be aware that EXPLAIN VECTORIZATION 
DETAIL is now the standard starting point when investigating vectorization 
issues ;)

> EXPLAIN Tez/Dag
> ---
>
> Key: HIVE-25337
> URL: https://issues.apache.org/jira/browse/HIVE-25337
> Project: Hive
>  Issue Type: New Feature
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> Just an idea: consider whether we can expose some strictly Tez/DAG-related 
> details through Hive's EXPLAIN command that users are not interested in under 
> normal circumstances, e.g. edge types (at the class level), input/output 
> classes, output committers if any, DAG plan size, etc.





[jira] [Assigned] (HIVE-25338) AIOBE in conv UDF if input is empty

2021-07-15 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R reassigned HIVE-25338:
-


> AIOBE in conv UDF if input is empty
> ---
>
> Key: HIVE-25338
> URL: https://issues.apache.org/jira/browse/HIVE-25338
> Project: Hive
>  Issue Type: New Feature
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>
> Repro
> {code:java}
> create table test (a string);
> insert into test values ("");
> select conv(a,16,10) from test;{code}
> Exception trace:
> {code:java}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
>  at org.apache.hadoop.hive.ql.udf.UDFConv.evaluate(UDFConv.java:160){code}
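The guard such a fix needs can be sketched with a simplified stand-in for the UDF (this is not Hive's actual UDFConv implementation; BigInteger is used only for brevity): return empty output for empty input instead of indexing into an empty character array.

```java
import java.math.BigInteger;

public final class ConvSketch {
    // Simplified conv(): convert num from fromBase to toBase.
    // Empty (or null) input yields empty output instead of an
    // ArrayIndexOutOfBoundsException when the digits are accessed.
    static String conv(String num, int fromBase, int toBase) {
        if (num == null || num.isEmpty()) {
            return "";
        }
        return new BigInteger(num, fromBase).toString(toBase);
    }

    public static void main(String[] args) {
        System.out.println(conv("", 16, 10));   // prints an empty line
        System.out.println(conv("ff", 16, 10)); // prints 255
    }
}
```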





[jira] [Work logged] (HIVE-25338) AIOBE in conv UDF if input is empty

2021-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25338?focusedWorklogId=623430&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-623430
 ]

ASF GitHub Bot logged work on HIVE-25338:
-

Author: ASF GitHub Bot
Created on: 16/Jul/21 06:07
Start Date: 16/Jul/21 06:07
Worklog Time Spent: 10m 
  Work Description: nareshpr opened a new pull request #2485:
URL: https://github.com/apache/hive/pull/2485


   ### What changes were proposed in this pull request?
   If conv UDF has empty input, return the output as empty rather than throwing 
AIOBE.
   
   ### Why are the changes needed?
   Query is failing 
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Included Testcase in q file.




Issue Time Tracking
---

Worklog Id: (was: 623430)
Remaining Estimate: 0h
Time Spent: 10m

> AIOBE in conv UDF if input is empty
> ---
>
> Key: HIVE-25338
> URL: https://issues.apache.org/jira/browse/HIVE-25338
> Project: Hive
>  Issue Type: New Feature
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Repro
> {code:java}
> create table test (a string);
> insert into test values ("");
> select conv(a,16,10) from test;{code}
> Exception trace:
> {code:java}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
>  at org.apache.hadoop.hive.ql.udf.UDFConv.evaluate(UDFConv.java:160){code}





[jira] [Updated] (HIVE-25338) AIOBE in conv UDF if input is empty

2021-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25338:
--
Labels: pull-request-available  (was: )

> AIOBE in conv UDF if input is empty
> ---
>
> Key: HIVE-25338
> URL: https://issues.apache.org/jira/browse/HIVE-25338
> Project: Hive
>  Issue Type: New Feature
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Repro
> {code:java}
> create table test (a string);
> insert into test values ("");
> select conv(a,16,10) from test;{code}
> Exception trace:
> {code:java}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
>  at org.apache.hadoop.hive.ql.udf.UDFConv.evaluate(UDFConv.java:160){code}





[jira] [Updated] (HIVE-25338) AIOBE in conv UDF if input is empty

2021-07-15 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R updated HIVE-25338:
--
Issue Type: Bug  (was: New Feature)

> AIOBE in conv UDF if input is empty
> ---
>
> Key: HIVE-25338
> URL: https://issues.apache.org/jira/browse/HIVE-25338
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Repro
> {code:java}
> create table test (a string);
> insert into test values ("");
> select conv(a,16,10) from test;{code}
> Exception trace:
> {code:java}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
>  at org.apache.hadoop.hive.ql.udf.UDFConv.evaluate(UDFConv.java:160){code}





[jira] [Commented] (HIVE-25270) To create external table without schema should use db schema instead of the metastore default fs

2021-07-15 Thread shezm (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17381817#comment-17381817
 ] 

shezm commented on HIVE-25270:
--

Attached patch #1 that changes behavior of create external table without schema 
.

> To create external table without schema should use db schema instead of the 
> metastore default fs
> 
>
> Key: HIVE-25270
> URL: https://issues.apache.org/jira/browse/HIVE-25270
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: shezm
>Assignee: shezm
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hi
> When Hive creates an external table without specifying the schema of the 
> location, as in the following SQL:
> {code:java}
> CREATE EXTERNAL TABLE `user.test_tbl` (
> id string,
> name string
> )
> LOCATION '/user/data/test_tbl'
> {code}
> The default schema is taken from the fs.defaultFS of the metastore conf.
> But in some cases there are multiple Hadoop NameNodes, e.g. when using 
> Hadoop federation or Hadoop RBF.
> I think that when creating an external table without specifying a schema, the 
> schema of the database location should be used.
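Resolving a scheme-less table location against the database location, rather than the metastore's fs.defaultFS, can be sketched with plain java.net.URI (illustrative only; Hive itself would go through Hadoop's Path/FileSystem APIs, and the names below are assumptions):

```java
import java.net.URI;

public final class LocationResolver {
    // If tableLocation has no scheme, qualify it with the scheme and
    // authority (NameNode) of the database's location instead of the
    // metastore-wide fs.defaultFS.
    static String qualify(String dbLocation, String tableLocation) {
        URI table = URI.create(tableLocation);
        if (table.getScheme() != null) {
            return tableLocation;           // already fully qualified
        }
        URI db = URI.create(dbLocation);
        return db.getScheme() + "://" + db.getAuthority() + table.getPath();
    }

    public static void main(String[] args) {
        // prints: hdfs://nn-a:8020/user/data/test_tbl
        System.out.println(qualify("hdfs://nn-a:8020/warehouse/user.db",
                                   "/user/data/test_tbl"));
    }
}
```

With federation or RBF, each database can live on a different NameNode, so anchoring the table path to the database's authority keeps the table on the same filesystem as its database.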





[jira] [Issue Comment Deleted] (HIVE-25270) To create external table without schema should use db schema instead of the metastore default fs

2021-07-15 Thread shezm (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shezm updated HIVE-25270:
-
Comment: was deleted

(was: Attached patch #1 that changes behavior of create external table without 
schema .)

> To create external table without schema should use db schema instead of the 
> metastore default fs
> 
>
> Key: HIVE-25270
> URL: https://issues.apache.org/jira/browse/HIVE-25270
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: shezm
>Assignee: shezm
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hi
> When Hive creates an external table without specifying the schema of the 
> location, as in the following SQL:
> {code:java}
> CREATE EXTERNAL TABLE `user.test_tbl` (
> id string,
> name string
> )
> LOCATION '/user/data/test_tbl'
> {code}
> The default schema is taken from the fs.defaultFS of the metastore conf.
> But in some cases there are multiple Hadoop NameNodes, e.g. when using 
> Hadoop federation or Hadoop RBF.
> I think that when creating an external table without specifying a schema, the 
> schema of the database location should be used.


