[jira] [Updated] (HIVE-24818) REPL LOAD of views with partitions fails

2021-03-11 Thread Anurag Shekhar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anurag Shekhar updated HIVE-24818:
--
Summary: REPL LOAD of views with partitions fails  (was: REPL LOAD (Bootstrap) of views with partitions fails)

> REPL LOAD of views with partitions fails 
> -
>
> Key: HIVE-24818
> URL: https://issues.apache.org/jira/browse/HIVE-24818
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Reporter: Anurag Shekhar
>Assignee: Anurag Shekhar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23779) BasicStatsTask Info is not getting printed in beeline console

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23779?focusedWorklogId=565114&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-565114
 ]

ASF GitHub Bot logged work on HIVE-23779:
-

Author: ASF GitHub Bot
Created on: 12/Mar/21 05:08
Start Date: 12/Mar/21 05:08
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #2064:
URL: https://github.com/apache/hive/pull/2064#issuecomment-797237927


   > @dengzhhu653, basic stats of partitions touched by ETL will be printed on the beeline console; I don't think it will be huge.
   
   I wonder whether, for cases like loading dynamic partitions, there may be something unpredictable...



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 565114)
Time Spent: 1h 40m  (was: 1.5h)

> BasicStatsTask Info is not getting printed in beeline console
> -
>
> Key: HIVE-23779
> URL: https://issues.apache.org/jira/browse/HIVE-23779
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> After HIVE-16061, partition basic stats are not getting printed in beeline 
> console.
> {code:java}
> INFO : Partition {dt=2020-06-29} stats: [numFiles=21, numRows=22, 
> totalSize=14607, rawDataSize=0]{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23779) BasicStatsTask Info is not getting printed in beeline console

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23779?focusedWorklogId=565095&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-565095
 ]

ASF GitHub Bot logged work on HIVE-23779:
-

Author: ASF GitHub Bot
Created on: 12/Mar/21 03:39
Start Date: 12/Mar/21 03:39
Worklog Time Spent: 10m 
  Work Description: nareshpr commented on pull request #2064:
URL: https://github.com/apache/hive/pull/2064#issuecomment-797212918


   @dengzhhu653, basic stats of partitions touched by ETL will be printed on the beeline console; I don't think it will be huge.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 565095)
Time Spent: 1.5h  (was: 1h 20m)

> BasicStatsTask Info is not getting printed in beeline console
> -
>
> Key: HIVE-23779
> URL: https://issues.apache.org/jira/browse/HIVE-23779
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> After HIVE-16061, partition basic stats are not getting printed in beeline 
> console.
> {code:java}
> INFO : Partition {dt=2020-06-29} stats: [numFiles=21, numRows=22, 
> totalSize=14607, rawDataSize=0]{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24372) can't find applicationId in InPlaceUpdateStream when tasks run in parallel

2021-03-11 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17300016#comment-17300016
 ] 

Zhihua Deng commented on HIVE-24372:


Maybe https://issues.apache.org/jira/browse/HIVE-22416 and 
https://issues.apache.org/jira/browse/HIVE-21722 can fix this problem...

> can't find applicationId in InPlaceUpdateStream when tasks run in parallel
> --
>
> Key: HIVE-24372
> URL: https://issues.apache.org/jira/browse/HIVE-24372
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline, Tez
>Affects Versions: 3.1.0
> Environment: hadoop 3.1.0
> hive 3.1.0
> hive.execution.engine=tez
>  
>Reporter: lhy
>Priority: Major
> Attachments: image-2020-11-12-13-16-28-228.png, 
> image-2020-11-12-13-16-33-852.png, image-2020-11-12-13-17-21-675.png
>
>
> If hive.exec.parallel = false (with hive.session.silent = false),
> then we can find the log "INFO : Status: Running (Executing on YARN cluster with 
> App id application_1603689507490_0109)" in the console.
> !image-2020-11-12-13-16-33-852.png!
> If hive.exec.parallel = true (with hive.session.silent = false),
> then no log showing the application id appears in the console, but it can be 
> found in hiveserver2.log.
> !image-2020-11-12-13-17-21-675.png!
>  
> The Hive CLI driver can show the application logs.
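
For quick reference, a minimal repro sketch assembled from the settings quoted above (the query is illustrative):

{code:java}
-- Session settings from the report; then run any query that launches a Tez DAG.
SET hive.execution.engine=tez;
SET hive.session.silent=false;

SET hive.exec.parallel=false;
SELECT count(*) FROM t;  -- console shows: Status: Running (Executing on YARN cluster with App id ...)

SET hive.exec.parallel=true;
SELECT count(*) FROM t;  -- no App id line on the console; it appears only in hiveserver2.log
{code}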



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24739) Clarify Usage of Thrift TServerEventHandler and Count Number of Messages Processed

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24739?focusedWorklogId=565089&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-565089
 ]

ASF GitHub Bot logged work on HIVE-24739:
-

Author: ASF GitHub Bot
Created on: 12/Mar/21 02:57
Start Date: 12/Mar/21 02:57
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #1946:
URL: https://github.com/apache/hive/pull/1946#discussion_r592854564



##########
File path: service/src/java/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java
##########
@@ -113,43 +113,68 @@ protected void initServer() {
       // TCP Server
       server = new TThreadPoolServer(sargs);
       server.setServerEventHandler(new TServerEventHandler() {
+
         @Override
         public ServerContext createContext(TProtocol input, TProtocol output) {
           Metrics metrics = MetricsFactory.getInstance();
           if (metrics != null) {
-            try {
-              metrics.incrementCounter(MetricsConstant.OPEN_CONNECTIONS);
-              metrics.incrementCounter(MetricsConstant.CUMULATIVE_CONNECTION_COUNT);
-            } catch (Exception e) {
-              LOG.warn("Error Reporting JDO operation to Metrics system", e);
-            }
+            metrics.incrementCounter(MetricsConstant.OPEN_CONNECTIONS);
+            metrics.incrementCounter(MetricsConstant.CUMULATIVE_CONNECTION_COUNT);
           }
           return new ThriftCLIServerContext();
         }
 
+        /**
+         * This is called by the Thrift server when the underlying client
+         * connection is cleaned up by the server because the connection has
+         * been closed.
+         */
         @Override
         public void deleteContext(ServerContext serverContext, TProtocol input, TProtocol output) {
           Metrics metrics = MetricsFactory.getInstance();
           if (metrics != null) {
-            try {
-              metrics.decrementCounter(MetricsConstant.OPEN_CONNECTIONS);
-            } catch (Exception e) {
-              LOG.warn("Error Reporting JDO operation to Metrics system", e);
-            }
+            metrics.decrementCounter(MetricsConstant.OPEN_CONNECTIONS);
           }
-          ThriftCLIServerContext context = (ThriftCLIServerContext) serverContext;
-          SessionHandle sessionHandle = context.getSessionHandle();
-          if (sessionHandle != null) {
-            LOG.info("Session disconnected without closing properly. ");
+
+          final ThriftCLIServerContext context = (ThriftCLIServerContext) serverContext;
+          final Optional<SessionHandle> sessionHandle = context.getSessionHandle();
+
+          if (sessionHandle.isPresent()) {
+            // Normally, the client should politely inform the server it is
+            // closing its session with Hive before closing its network
+            // connection. However, if the client connection dies for any reason
+            // (load-balancer round-robin configuration, firewall kills
+            // long-running sessions, bad client, failed client, timed-out
+            // client, etc.) then the server will close the connection without
+            // having properly cleaned up the Hive session (resources,
+            // configuration, logging etc.). That needs to be cleaned up now.
+            LOG.warn(
+                "Client connection bound to {} unexpectedly closed: closing this Hive session to release its resources. "
+                    + "The connection processed {} total messages during its lifetime of {}ms. Inspect the client connection "
+                    + "for time-out, firewall killing the connection, invalid load balancer configuration, etc.",
+                sessionHandle, context.getMessagesProcessedCount(), context.getDuration().toMillis());
             try {
-              boolean close = cliService.getSessionManager().getSession(sessionHandle).getHiveConf()
+              final boolean close = cliService.getSessionManager().getSession(sessionHandle.get()).getHiveConf()
                   .getBoolVar(ConfVars.HIVE_SERVER2_CLOSE_SESSION_ON_DISCONNECT);
-              LOG.info((close ? "" : "Not ") + "Closing the session: " + sessionHandle);
               if (close) {
-                cliService.closeSession(sessionHandle);
+                cliService.closeSession(sessionHandle.get());
+              } else {
+                LOG.warn("Session not actually closed because configuration {} is set to false",
+                    ConfVars.HIVE_SERVER2_CLOSE_SESSION_ON_DISCONNECT.varname);
              }
             } catch (HiveSQLException e) {
-              LOG.warn("Failed to close session: " + e, e);
+              LOG.warn("Failed to close session", e);
+            }
+          } else {
+            // There is no session handle because the client gracefully closed
+            // the session *or* 

[jira] [Work logged] (HIVE-23779) BasicStatsTask Info is not getting printed in beeline console

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23779?focusedWorklogId=565071&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-565071
 ]

ASF GitHub Bot logged work on HIVE-23779:
-

Author: ASF GitHub Bot
Created on: 12/Mar/21 02:00
Start Date: 12/Mar/21 02:00
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #2064:
URL: https://github.com/apache/hive/pull/2064#issuecomment-797180213


   Might beeline be flooded with these logs? What can we use the logs for? 
Otherwise LGTM...



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 565071)
Time Spent: 1h 20m  (was: 1h 10m)

> BasicStatsTask Info is not getting printed in beeline console
> -
>
> Key: HIVE-23779
> URL: https://issues.apache.org/jira/browse/HIVE-23779
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> After HIVE-16061, partition basic stats are not getting printed in beeline 
> console.
> {code:java}
> INFO : Partition {dt=2020-06-29} stats: [numFiles=21, numRows=22, 
> totalSize=14607, rawDataSize=0]{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24739) Clarify Usage of Thrift TServerEventHandler and Count Number of Messages Processed

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24739?focusedWorklogId=565069&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-565069
 ]

ASF GitHub Bot logged work on HIVE-24739:
-

Author: ASF GitHub Bot
Created on: 12/Mar/21 01:50
Start Date: 12/Mar/21 01:50
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #1946:
URL: https://github.com/apache/hive/pull/1946#discussion_r592854564



##########
File path: service/src/java/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java
##########
@@ -113,43 +113,68 @@ protected void initServer() {
       // TCP Server
       server = new TThreadPoolServer(sargs);
       server.setServerEventHandler(new TServerEventHandler() {
+
         @Override
         public ServerContext createContext(TProtocol input, TProtocol output) {
           Metrics metrics = MetricsFactory.getInstance();
           if (metrics != null) {
-            try {
-              metrics.incrementCounter(MetricsConstant.OPEN_CONNECTIONS);
-              metrics.incrementCounter(MetricsConstant.CUMULATIVE_CONNECTION_COUNT);
-            } catch (Exception e) {
-              LOG.warn("Error Reporting JDO operation to Metrics system", e);
-            }
+            metrics.incrementCounter(MetricsConstant.OPEN_CONNECTIONS);
+            metrics.incrementCounter(MetricsConstant.CUMULATIVE_CONNECTION_COUNT);
           }
           return new ThriftCLIServerContext();
         }
 
+        /**
+         * This is called by the Thrift server when the underlying client
+         * connection is cleaned up by the server because the connection has
+         * been closed.
+         */
         @Override
         public void deleteContext(ServerContext serverContext, TProtocol input, TProtocol output) {
           Metrics metrics = MetricsFactory.getInstance();
           if (metrics != null) {
-            try {
-              metrics.decrementCounter(MetricsConstant.OPEN_CONNECTIONS);
-            } catch (Exception e) {
-              LOG.warn("Error Reporting JDO operation to Metrics system", e);
-            }
+            metrics.decrementCounter(MetricsConstant.OPEN_CONNECTIONS);
           }
-          ThriftCLIServerContext context = (ThriftCLIServerContext) serverContext;
-          SessionHandle sessionHandle = context.getSessionHandle();
-          if (sessionHandle != null) {
-            LOG.info("Session disconnected without closing properly. ");
+
+          final ThriftCLIServerContext context = (ThriftCLIServerContext) serverContext;
+          final Optional<SessionHandle> sessionHandle = context.getSessionHandle();
+
+          if (sessionHandle.isPresent()) {
+            // Normally, the client should politely inform the server it is
+            // closing its session with Hive before closing its network
+            // connection. However, if the client connection dies for any reason
+            // (load-balancer round-robin configuration, firewall kills
+            // long-running sessions, bad client, failed client, timed-out
+            // client, etc.) then the server will close the connection without
+            // having properly cleaned up the Hive session (resources,
+            // configuration, logging etc.). That needs to be cleaned up now.
+            LOG.warn(
+                "Client connection bound to {} unexpectedly closed: closing this Hive session to release its resources. "
+                    + "The connection processed {} total messages during its lifetime of {}ms. Inspect the client connection "
+                    + "for time-out, firewall killing the connection, invalid load balancer configuration, etc.",
+                sessionHandle, context.getMessagesProcessedCount(), context.getDuration().toMillis());
             try {
-              boolean close = cliService.getSessionManager().getSession(sessionHandle).getHiveConf()
+              final boolean close = cliService.getSessionManager().getSession(sessionHandle.get()).getHiveConf()
                   .getBoolVar(ConfVars.HIVE_SERVER2_CLOSE_SESSION_ON_DISCONNECT);
-              LOG.info((close ? "" : "Not ") + "Closing the session: " + sessionHandle);
              if (close) {
-                cliService.closeSession(sessionHandle);
+                cliService.closeSession(sessionHandle.get());
+              } else {
+                LOG.warn("Session not actually closed because configuration {} is set to false",
+                    ConfVars.HIVE_SERVER2_CLOSE_SESSION_ON_DISCONNECT.varname);
              }
             } catch (HiveSQLException e) {
-              LOG.warn("Failed to close session: " + e, e);
+              LOG.warn("Failed to close session", e);
+            }
+          } else {
+            // There is no session handle because the client gracefully closed
+            // the session *or* 

[jira] [Work logged] (HIVE-24201) WorkloadManager kills query being moved to different pool if destination pool does not have enough sessions

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24201?focusedWorklogId=565054&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-565054
 ]

ASF GitHub Bot logged work on HIVE-24201:
-

Author: ASF GitHub Bot
Created on: 12/Mar/21 01:04
Start Date: 12/Mar/21 01:04
Worklog Time Spent: 10m 
  Work Description: Dawn2111 opened a new pull request #2065:
URL: https://github.com/apache/hive/pull/2065


   ### What changes were proposed in this pull request?
Currently, the workload-management move trigger kills the query being moved 
to a different pool if the destination pool does not have enough capacity. This PR 
introduces a "delayed move" configuration which lets the query run in the 
source pool as long as possible if the destination pool is full. It attempts 
the move to the destination pool only when there is a claim upon the source 
pool. If the destination pool is not full, a delayed move behaves like a normal 
move, i.e. the move happens immediately.
   
   ### Why are the changes needed?
   For better utilization of cluster resources. 
   
   ### Does this PR introduce _any_ user-facing change? Yes
   
   ### How was this patch tested? Unit test
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 565054)
Remaining Estimate: 0h
Time Spent: 10m

> WorkloadManager kills query being moved to different pool if destination pool 
> does not have enough sessions
> ---
>
> Key: HIVE-24201
> URL: https://issues.apache.org/jira/browse/HIVE-24201
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, llap
>Affects Versions: 4.0.0
>Reporter: Adesh Kumar Rao
>Assignee: Pritha Dawn
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> To reproduce, create a resource plan with move trigger, like below:
> {code:java}
> ++
> |line|
> ++
> | experiment[status=DISABLED,parallelism=null,defaultPool=default] |
> |  +  default[allocFraction=0.888,schedulingPolicy=null,parallelism=1] |
> |  |  mapped for default |
> |  +  pool2[allocFraction=0.1,schedulingPolicy=fair,parallelism=1] |
> |  |  trigger t1: if (ELAPSED_TIME > 20) { MOVE TO pool1 } |
> |  |  mapped for users: abcd   |
> |  +  pool1[allocFraction=0.012,schedulingPolicy=null,parallelism=1] |
> |  |  mapped for users: efgh   |
>  
> {code}
> Now, run two queries in pool1 and pool2 using different users. The query 
> running in pool2 will try to move to pool1 and will get killed because 
> pool1 will not have a session to handle the query.
> Currently, the workload-management move trigger kills the query being moved 
> to a different pool if the destination pool does not have enough capacity. We 
> could have a "delayed move" configuration which lets the query run in the 
> source pool as long as possible if the destination pool is full. It would 
> attempt the move to the destination pool only when there is a claim upon the 
> source pool. If the destination pool is not full, a delayed move behaves like 
> a normal move, i.e. the move happens immediately.
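
As context, a hedged sketch of how such a plan and MOVE trigger can be declared with Hive's workload-management DDL (pool names and the threshold are taken from the plan above; the proposed "delayed move" switch itself is not shown because this thread does not name it):

{code:java}
-- Sketch only: standard Hive WLM DDL for a plan with a MOVE trigger.
CREATE RESOURCE PLAN experiment;
CREATE POOL experiment.pool2 WITH ALLOC_FRACTION=0.1, QUERY_PARALLELISM=1, SCHEDULING_POLICY='fair';
CREATE POOL experiment.pool1 WITH ALLOC_FRACTION=0.012, QUERY_PARALLELISM=1;
CREATE TRIGGER experiment.t1 WHEN ELAPSED_TIME > 20 DO MOVE TO pool1;
ALTER TRIGGER experiment.t1 ADD TO POOL pool2;
{code}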



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24201) WorkloadManager kills query being moved to different pool if destination pool does not have enough sessions

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24201:
--
Labels: pull-request-available  (was: )

> WorkloadManager kills query being moved to different pool if destination pool 
> does not have enough sessions
> ---
>
> Key: HIVE-24201
> URL: https://issues.apache.org/jira/browse/HIVE-24201
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, llap
>Affects Versions: 4.0.0
>Reporter: Adesh Kumar Rao
>Assignee: Pritha Dawn
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> To reproduce, create a resource plan with move trigger, like below:
> {code:java}
> ++
> |line|
> ++
> | experiment[status=DISABLED,parallelism=null,defaultPool=default] |
> |  +  default[allocFraction=0.888,schedulingPolicy=null,parallelism=1] |
> |  |  mapped for default |
> |  +  pool2[allocFraction=0.1,schedulingPolicy=fair,parallelism=1] |
> |  |  trigger t1: if (ELAPSED_TIME > 20) { MOVE TO pool1 } |
> |  |  mapped for users: abcd   |
> |  +  pool1[allocFraction=0.012,schedulingPolicy=null,parallelism=1] |
> |  |  mapped for users: efgh   |
>  
> {code}
> Now, run two queries in pool1 and pool2 using different users. The query 
> running in pool2 will try to move to pool1 and will get killed because 
> pool1 will not have a session to handle the query.
> Currently, the workload-management move trigger kills the query being moved 
> to a different pool if the destination pool does not have enough capacity. We 
> could have a "delayed move" configuration which lets the query run in the 
> source pool as long as possible if the destination pool is full. It would 
> attempt the move to the destination pool only when there is a claim upon the 
> source pool. If the destination pool is not full, a delayed move behaves like 
> a normal move, i.e. the move happens immediately.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24201) WorkloadManager kills query being moved to different pool if destination pool does not have enough sessions

2021-03-11 Thread Pritha Dawn (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritha Dawn updated HIVE-24201:
---
Description: 
To reproduce, create a resource plan with move trigger, like below:
{code:java}
++
|line|
++
| experiment[status=DISABLED,parallelism=null,defaultPool=default] |
|  +  default[allocFraction=0.888,schedulingPolicy=null,parallelism=1] |
|  |  mapped for default |
|  +  pool2[allocFraction=0.1,schedulingPolicy=fair,parallelism=1] |
|  |  trigger t1: if (ELAPSED_TIME > 20) { MOVE TO pool1 } |
|  |  mapped for users: abcd   |
|  +  pool1[allocFraction=0.012,schedulingPolicy=null,parallelism=1] |
|  |  mapped for users: efgh   |
 
{code}
Now, run two queries in pool1 and pool2 using different users. The query 
running in pool2 will try to move to pool1 and will get killed because 
pool1 will not have a session to handle the query.

Currently, the workload-management move trigger kills the query being moved to 
a different pool if the destination pool does not have enough capacity. We could 
have a "delayed move" configuration which lets the query run in the source pool 
as long as possible if the destination pool is full. It would attempt the move 
to the destination pool only when there is a claim upon the source pool. If the 
destination pool is not full, a delayed move behaves like a normal move, i.e. 
the move happens immediately.

  was:
To reproduce, create a resource plan with move trigger, like below:
{code:java}
++
|line|
++
| experiment[status=DISABLED,parallelism=null,defaultPool=default] |
|  +  default[allocFraction=0.888,schedulingPolicy=null,parallelism=1] |
|  |  mapped for default |
|  +  pool2[allocFraction=0.1,schedulingPolicy=fair,parallelism=1] |
|  |  trigger t1: if (ELAPSED_TIME > 20) { MOVE TO pool1 } |
|  |  mapped for users: abcd   |
|  +  pool1[allocFraction=0.012,schedulingPolicy=null,parallelism=1] |
|  |  mapped for users: efgh   |
 
{code}
Now, run two queries in pool1 and pool2 using different users. The query 
running in pool2 will try to move to pool1 and will get killed because 
pool1 will not have a session to handle the query.

Once killed, this query needs to be re-run externally. This can be optimized: 
the query should be retried in the destination pool directly (it will get 
queued and run once the session is alive).


> WorkloadManager kills query being moved to different pool if destination pool 
> does not have enough sessions
> ---
>
> Key: HIVE-24201
> URL: https://issues.apache.org/jira/browse/HIVE-24201
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, llap
>Affects Versions: 4.0.0
>Reporter: Adesh Kumar Rao
>Assignee: Pritha Dawn
>Priority: Minor
>
> To reproduce, create a resource plan with move trigger, like below:
> {code:java}
> ++
> |line|
> ++
> | experiment[status=DISABLED,parallelism=null,defaultPool=default] |
> |  +  default[allocFraction=0.888,schedulingPolicy=null,parallelism=1] |
> |  |  mapped for default |
> |  +  pool2[allocFraction=0.1,schedulingPolicy=fair,parallelism=1] |
> |  |  trigger t1: if (ELAPSED_TIME > 20) { MOVE TO pool1 } |
> |  |  mapped for users: abcd   |
> |  +  pool1[allocFraction=0.012,schedulingPolicy=null,parallelism=1] |
> |  |  mapped for users: efgh   |
>  
> {code}
> Now, run two queries in pool1 and pool2 using different users. The query 
> running in pool2 will try to move to pool1 and will get killed because 
> pool1 will not have a session to handle the query.
> Currently, the workload-management move trigger kills the query being moved 
> to a different pool if the destination pool does not have enough capacity. We 
> could have a "delayed move" configuration which lets the query run in the 
> source pool as long as possible if the destination pool is full. It would 
> attempt the move to the destination pool only when there is a claim upon the 
> source pool. If the destination pool is not full, a delayed move behaves like 
> a normal move, i.e. the move happens immediately.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-23820) [HS2] Send tableId in request for get_table_request API

2021-03-11 Thread Kishen Das (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17299867#comment-17299867
 ] 

Kishen Das edited comment on HIVE-23820 at 3/12/21, 12:11 AM:
--

[~ashish-kumar-sharma] Sure, you can work on this. By the way, how are you planning to 
pass the tableId in the get_table_req API? I was thinking of enhancing 
getValidWriteIdList to return the tableId as well and sending that back in the 
get_table_req API. We could also cache the tableId at the Hive session level when we 
make get_table_req for the first time in the compilation phase, and send it back 
in subsequent get_table_req calls within the same session.


was (Author: kishendas):
[~ashish-kumar-sharma] Sure, you can work on this. By the way, how are you planning to 
pass the tableId in the get_table_req API? I was thinking of enhancing 
getValidWriteIdList to return the tableId as well and sending that back in the 
get_table_req API. 

> [HS2] Send tableId in request for get_table_request API
> ---
>
> Key: HIVE-23820
> URL: https://issues.apache.org/jira/browse/HIVE-23820
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Ashish Sharma
>Priority: Major
>
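
As a rough client-side illustration of the approach being discussed (the setter and the tableId plumbing are assumptions for illustration, not a committed API):

{code:java}
// Hypothetical sketch: reuse a tableId obtained earlier in the session
// (e.g. alongside getValidWriteIdList) on later get_table_req calls.
GetTableRequest req = new GetTableRequest(dbName, tblName);
req.setId(cachedTableId);          // assumed field carrying the table id
Table table = msClient.getTable(req);
{code}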




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23779) BasicStatsTask Info is not getting printed in beeline console

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23779?focusedWorklogId=565013&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-565013
 ]

ASF GitHub Bot logged work on HIVE-23779:
-

Author: ASF GitHub Bot
Created on: 11/Mar/21 23:11
Start Date: 11/Mar/21 23:11
Worklog Time Spent: 10m 
  Work Description: nareshpr opened a new pull request #2064:
URL: https://github.com/apache/hive/pull/2064


   ### What changes were proposed in this pull request?
   The ETL flow prints partition basic stats in the beeline or client console. After 
HIVE-16061, these stats are not getting printed on the client.
   
   ### Why are the changes needed?
   As there are multiple clients connecting to HS2 and running ETL, it was 
difficult for the end user to search for stats in the HS2 logs. This feature helps 
users know how much data was loaded per partition.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   I did manual testing by including this fix in my local cluster and validated 
that the client console shows the logs.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 565013)
Time Spent: 1h 10m  (was: 1h)

> BasicStatsTask Info is not getting printed in beeline console
> -
>
> Key: HIVE-23779
> URL: https://issues.apache.org/jira/browse/HIVE-23779
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> After HIVE-16061, partition basic stats are not getting printed in beeline 
> console.
> {code:java}
> INFO : Partition {dt=2020-06-29} stats: [numFiles=21, numRows=22, 
> totalSize=14607, rawDataSize=0]{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24877) Support X'xxxx' syntax for hexadecimal values like spark & mysql

2021-03-11 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R updated HIVE-24877:
--
Description: 
Hive currently does not support the following syntax:

select x'abc';
{code:java}
org.apache.hadoop.hive.ql.parse.ParseException: line 2:8 cannot recognize input 
near 'x' ''abc'' '' in selection target
  at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:125)
  at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:93)
  at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:85)
  at org.apache.hadoop.hive.ql.Compiler.parse(Compiler.java:169)
  at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:102)
  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492)
  at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445){code}
Though the same is available via the hex/unhex built-in UDFs, it's better to have 
{{X'value'}} and {{x'value'}} syntax support in Hive.

[https://spark.apache.org/docs/latest/sql-ref-literals.html#binary-literal]

[https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_hex]

[https://mariadb.com/kb/en/hexadecimal-literals/]

  was:
Hive currently does not support the following syntax:

select x'abc';
{code:java}
org.apache.hadoop.hive.ql.parse.ParseException: line 2:8 cannot recognize input 
near 'x' ''abc'' '' in selection 
targetorg.apache.hadoop.hive.ql.parse.ParseException: line 2:8 cannot recognize 
input near 'x' ''abc'' '' in selection target at 
org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:125) at 
org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:93) at 
org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:85) at 
org.apache.hadoop.hive.ql.Compiler.parse(Compiler.java:169) at 
org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:102) at 
org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492) at 
org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445){code}
Though the same is available via the hex/unhex built-in UDFs, it's better to have 
{{X'value'}} and {{x'value'}} syntax support in Hive.

[https://spark.apache.org/docs/latest/sql-ref-literals.html#binary-literal]

[https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_hex]

[https://mariadb.com/kb/en/hexadecimal-literals/]


> Support X'xxxx' syntax for hexadecimal values like spark & mysql
> 
>
> Key: HIVE-24877
> URL: https://issues.apache.org/jira/browse/HIVE-24877
> Project: Hive
>  Issue Type: New Feature
>Reporter: Naresh P R
>Priority: Minor
>
> Hive currently does not support the following syntax:
> select x'abc';
> {code:java}
> org.apache.hadoop.hive.ql.parse.ParseException: line 2:8 cannot recognize 
> input near 'x' ''abc'' '' in selection target
>   at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:125)
>   at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:93)
>   at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:85)
>   at org.apache.hadoop.hive.ql.Compiler.parse(Compiler.java:169)
>   at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:102)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445){code}
> Though the same is available via the hex/unhex built-in UDFs, it's better to have 
> {{X'value'}} and {{x'value'}} syntax support in Hive.
> [https://spark.apache.org/docs/latest/sql-ref-literals.html#binary-literal]
> [https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_hex]
> [https://mariadb.com/kb/en/hexadecimal-literals/]
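
For context, a small sketch of the workaround the description mentions, using the built-in hex/unhex UDFs (literal values are illustrative):

{code:java}
-- Today: round-trip binary through the built-in UDFs.
SELECT hex('abc');       -- '616263'
SELECT unhex('616263');  -- the binary value for 'abc'

-- Desired, per this issue (currently a ParseException in Hive):
SELECT X'616263';
{code}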



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24877) Support X'xxxx' syntax for hexadecimal values like spark & mysql

2021-03-11 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R updated HIVE-24877:
--
Description: 
Hive currently does not support the following syntax:

select x'abc';
{code:java}
org.apache.hadoop.hive.ql.parse.ParseException: line 2:8 cannot recognize input 
near 'x' ''abc'' '' in selection 
targetorg.apache.hadoop.hive.ql.parse.ParseException: line 2:8 cannot recognize 
input near 'x' ''abc'' '' in selection target at 
org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:125) at 
org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:93) at 
org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:85) at 
org.apache.hadoop.hive.ql.Compiler.parse(Compiler.java:169) at 
org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:102) at 
org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492) at 
org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445){code}
Though the same is available via the hex/unhex built-in UDFs, it's better to have 
{{X'value'}} and {{x'value'}} syntax support in Hive.

[https://spark.apache.org/docs/latest/sql-ref-literals.html#binary-literal]

[https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_hex]

[https://mariadb.com/kb/en/hexadecimal-literals/]

  was:
Hive currently does not support the following syntax:

select x'abc';
{code:java}
org.apache.hadoop.hive.ql.parse.ParseException: line 2:8 cannot recognize input 
near 'x' ''abc'' '' in selection target 
org.apache.hadoop.hive.ql.parse.ParseException: line 2:8 cannot recognize input 
near 'x' ''31FECC'' '' in selection target at 
org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:125) at 
org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:93) at 
org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:85) at 
org.apache.hadoop.hive.ql.Compiler.parse(Compiler.java:169) at 
org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:102) at 
org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492) at 
org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445){code}
Though the same is available via the hex/unhex built-in UDFs, it's better to have 
{{X'value'}} and {{x'value'}} syntax support in Hive.

[https://spark.apache.org/docs/latest/sql-ref-literals.html#binary-literal]

[https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_hex]

https://mariadb.com/kb/en/hexadecimal-literals/


> Support X'xxxx' syntax for hexadecimal values like spark & mysql
> 
>
> Key: HIVE-24877
> URL: https://issues.apache.org/jira/browse/HIVE-24877
> Project: Hive
>  Issue Type: New Feature
>Reporter: Naresh P R
>Priority: Minor
>
> Hive currently does not support the following syntax:
> select x'abc';
> {code:java}
> org.apache.hadoop.hive.ql.parse.ParseException: line 2:8 cannot recognize 
> input near 'x' ''abc'' '' in selection 
> targetorg.apache.hadoop.hive.ql.parse.ParseException: line 2:8 cannot 
> recognize input near 'x' ''abc'' '' in selection target at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:125) at 
> org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:93) at 
> org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:85) at 
> org.apache.hadoop.hive.ql.Compiler.parse(Compiler.java:169) at 
> org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:102) at 
> org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492) at 
> org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445){code}
> Though the same is available via the hex/unhex built-in UDFs, it's better to have 
> {{X'value'}} and {{x'value'}} syntax support in Hive.
> [https://spark.apache.org/docs/latest/sql-ref-literals.html#binary-literal]
> [https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_hex]
> [https://mariadb.com/kb/en/hexadecimal-literals/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-24828) [HMS] Provide new HMS API to return latest committed compaction record for a given table

2021-03-11 Thread Yu-Wen Lai (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-24828 started by Yu-Wen Lai.
-
> [HMS] Provide new HMS API to return latest committed compaction record for a 
> given table
> 
>
> Key: HIVE-24828
> URL: https://issues.apache.org/jira/browse/HIVE-24828
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Kishen Das
>Assignee: Yu-Wen Lai
>Priority: Major
>
> We need a new HMS API to return the latest committed compaction record for a 
> given table. A remote cache can use this to decide whether a given table's 
> file metadata has been compacted or not, and thus whether the file metadata 
> has to be refreshed from the file system before serving, or the current data 
> can be served from the cache. 
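
A hypothetical Thrift sketch of the shape such an API could take (every name and field below is an assumption for illustration, not the committed interface):

{code:java}
// Hypothetical only -- illustrates the request/response described above.
struct GetLatestCommittedCompactionRequest {
  1: required string dbName,
  2: required string tableName,
  3: optional list<string> partitionNames
}
struct GetLatestCommittedCompactionResponse {
  1: required i64 compactionId,
  2: required string type,        // e.g. MAJOR / MINOR
  3: optional string partitionName
}
{code}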



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23820) [HS2] Send tableId in request for get_table_request API

2021-03-11 Thread Kishen Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kishen Das reassigned HIVE-23820:
-

Assignee: Ashish Sharma  (was: Kishen Das)

> [HS2] Send tableId in request for get_table_request API
> ---
>
> Key: HIVE-23820
> URL: https://issues.apache.org/jira/browse/HIVE-23820
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Ashish Sharma
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23820) [HS2] Send tableId in request for get_table_request API

2021-03-11 Thread Kishen Das (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17299867#comment-17299867
 ] 

Kishen Das commented on HIVE-23820:
---

[~ashish-kumar-sharma] Sure, you can work on this. Btw how are you planning to 
pass tableId in get_table_req API ? I was thinking of enhancing 
getValidWriteIdList to return tableId as well and sending that back in the 
get_table_req API. 

> [HS2] Send tableId in request for get_table_request API
> ---
>
> Key: HIVE-23820
> URL: https://issues.apache.org/jira/browse/HIVE-23820
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24876) Disable /longconf.jsp page on HS2 web UI for non admin users

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24876?focusedWorklogId=564869&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564869
 ]

ASF GitHub Bot logged work on HIVE-24876:
-

Author: ASF GitHub Bot
Created on: 11/Mar/21 19:26
Start Date: 11/Mar/21 19:26
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera opened a new pull request #2063:
URL: https://github.com/apache/hive/pull/2063


   …'t belong to admin role
   
   
   
   ### What changes were proposed in this pull request?
   Disable the logger configuration page for non-admin users.
   
   
   
   ### Why are the changes needed?
   Otherwise, normal users can flood the log files with unneeded information.
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. If a user needs to access this log config page, they must be configured 
as an admin in hive-site.xml via the hive.user.in.admin.role property, which can 
take comma-separated values:
   
   <property>
     <name>hive.user.in.admin.role</name>
     <value>bob,adam</value>
   </property>
   
   
   
   
   ### How was this patch tested?
   Local machine. Remote cluster.
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 564869)
Remaining Estimate: 0h
Time Spent: 10m

> Disable /longconf.jsp page on HS2 web UI for non admin users
> 
>
> Key: HIVE-24876
> URL: https://issues.apache.org/jira/browse/HIVE-24876
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The /logconf.jsp page should be disabled for users that are not in admin 
> roles. Otherwise, any user can flood the log files via the log levels 
> that can be configured on the HS2 web UI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24876) Disable /longconf.jsp page on HS2 web UI for non admin users

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24876:
--
Labels: pull-request-available  (was: )

> Disable /longconf.jsp page on HS2 web UI for non admin users
> 
>
> Key: HIVE-24876
> URL: https://issues.apache.org/jira/browse/HIVE-24876
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The /logconf.jsp page should be disabled for users that are not in admin 
> roles. Otherwise, any user can flood the log files via the log levels 
> that can be configured on the HS2 web UI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24876) Disable /longconf.jsp page on HS2 web UI for non admin users

2021-03-11 Thread Sai Hemanth Gantasala (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sai Hemanth Gantasala reassigned HIVE-24876:



> Disable /longconf.jsp page on HS2 web UI for non admin users
> 
>
> Key: HIVE-24876
> URL: https://issues.apache.org/jira/browse/HIVE-24876
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>
> The /logconf.jsp page should be disabled for users that are not in admin 
> roles. Otherwise, any user can flood the log files via the log levels 
> that can be configured on the HS2 web UI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24865) Implement Respect/Ignore Nulls in first/last_value

2021-03-11 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24865:
---
Fix Version/s: 4.0.0

> Implement Respect/Ignore Nulls in first/last_value
> --
>
> Key: HIVE-24865
> URL: https://issues.apache.org/jira/browse/HIVE-24865
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser, UDF
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> {code:java}
> <null treatment> ::=
> RESPECT NULLS | IGNORE NULLS
> <first or last value function> ::=
> <first or last value> <left paren> <value expression> <right paren> [ <null treatment> ]
> <first or last value> ::=
> FIRST_VALUE | LAST_VALUE
> {code}
> Example:
> {code:java}
> select last_value(b) ignore nulls over(partition by a order by b) from t1;
> {code}
> Existing non-standard implementation:
> {code:java}
> select last_value(b, true) over(partition by a order by b) from t1;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24865) Implement Respect/Ignore Nulls in first/last_value

2021-03-11 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-24865.
---
Resolution: Fixed

Pushed to master. Thanks [~jcamachorodriguez] for review.

> Implement Respect/Ignore Nulls in first/last_value
> --
>
> Key: HIVE-24865
> URL: https://issues.apache.org/jira/browse/HIVE-24865
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser, UDF
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> {code:java}
> <null treatment> ::=
> RESPECT NULLS | IGNORE NULLS
> <first or last value function> ::=
> <first or last value> <left paren> <value expression> <right paren> [ <null treatment> ]
> <first or last value> ::=
> FIRST_VALUE | LAST_VALUE
> {code}
> Example:
> {code:java}
> select last_value(b) ignore nulls over(partition by a order by b) from t1;
> {code}
> Existing non-standard implementation:
> {code:java}
> select last_value(b, true) over(partition by a order by b) from t1;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24865) Implement Respect/Ignore Nulls in first/last_value

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24865?focusedWorklogId=564780&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564780
 ]

ASF GitHub Bot logged work on HIVE-24865:
-

Author: ASF GitHub Bot
Created on: 11/Mar/21 17:40
Start Date: 11/Mar/21 17:40
Worklog Time Spent: 10m 
  Work Description: kasakrisz merged pull request #2060:
URL: https://github.com/apache/hive/pull/2060


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 564780)
Time Spent: 1h 10m  (was: 1h)

> Implement Respect/Ignore Nulls in first/last_value
> --
>
> Key: HIVE-24865
> URL: https://issues.apache.org/jira/browse/HIVE-24865
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser, UDF
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> {code:java}
> <null treatment> ::=
> RESPECT NULLS | IGNORE NULLS
> <first or last value function> ::=
> <first or last value> <left paren> <value expression> <right paren> [ <null treatment> ]
> <first or last value> ::=
> FIRST_VALUE | LAST_VALUE
> {code}
> Example:
> {code:java}
> select last_value(b) ignore nulls over(partition by a order by b) from t1;
> {code}
> Existing non-standard implementation:
> {code:java}
> select last_value(b, true) over(partition by a order by b) from t1;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24825) Create AcidMetricsService

2021-03-11 Thread Peter Varga (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Varga updated HIVE-24825:
---
Fix Version/s: 4.0.0

> Create AcidMetricsService
> -
>
> Key: HIVE-24825
> URL: https://issues.apache.org/jira/browse/HIVE-24825
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
> Fix For: 4.0.0
>
>
> Create a new service in HMS, that will collect and publish JMX metrics about 
> ACID related processes and metadata.
>  * There should be a subconfig other than METRICS_ENABLED for acid metrics
>  * The collection frequency should be configurable
>  * The existing oldest initiated compaction and the number of compactions in 
> different statuses metrics collection should be moved here from Initiator



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-24824) Define metrics for compaction observability

2021-03-11 Thread Peter Varga (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-24824 started by Peter Varga.
--
> Define metrics for compaction observability
> ---
>
> Key: HIVE-24824
> URL: https://issues.apache.org/jira/browse/HIVE-24824
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Many times, if there are failures in the compaction background processes 
> (Initiator, Worker, Cleaner), it is hard to notice the problem until it causes 
> serious performance degradation.
> We should create new JMX metrics that would make it easier to monitor 
> compaction health (a small publishing sketch follows below). Examples are:
>  * number of failed / initiated compactions
>  * number of aborted txns, oldest aborted txns
>  * tables with disabled compactions and writes
>  * Initiator and Cleaner cycle runtime
>  * Size of ACID metadata tables that should have ~ constant rows 
> (txn_to_writeId, completed_txns)
>  
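
A minimal sketch of publishing one such gauge through Hive's metrics layer (the metric name and the counting logic are illustrative assumptions, not what the patch defines):

{code:java}
import org.apache.hadoop.hive.common.metrics.common.Metrics;
import org.apache.hadoop.hive.common.metrics.common.MetricsFactory;

public class CompactionMetricsSketch {
  public static void publishFailedCompactions(final long failedCount) {
    Metrics metrics = MetricsFactory.getInstance();
    if (metrics != null) {
      // Gauge is re-evaluated on each metrics report; the name is hypothetical.
      metrics.addGauge("compaction_num_failed", () -> failedCount);
    }
  }
}
{code}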



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-24825) Create AcidMetricsService

2021-03-11 Thread Peter Varga (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-24825 started by Peter Varga.
--
> Create AcidMetricsService
> -
>
> Key: HIVE-24825
> URL: https://issues.apache.org/jira/browse/HIVE-24825
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>
> Create a new service in HMS, that will collect and publish JMX metrics about 
> ACID related processes and metadata.
>  * There should be a subconfig other than METRICS_ENABLED for acid metrics
>  * The collection frequency should be configurable
>  * The existing oldest initiated compaction and the number of compactions in 
> different statuses metrics collection should be moved here from Initiator



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24825) Create AcidMetricsService

2021-03-11 Thread Peter Varga (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Varga resolved HIVE-24825.

Resolution: Fixed

> Create AcidMetricsService
> -
>
> Key: HIVE-24825
> URL: https://issues.apache.org/jira/browse/HIVE-24825
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>
> Create a new service in HMS, that will collect and publish JMX metrics about 
> ACID related processes and metadata.
>  * There should be a subconfig other than METRICS_ENABLED for acid metrics
>  * The collection frequency should be configurable
>  * The existing oldest initiated compaction and the number of compactions in 
> different statuses metrics collection should be moved here from Initiator



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24758) Log Tez Task DAG ID, DAG Session ID, HS2 Hostname

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24758?focusedWorklogId=564737&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564737
 ]

ASF GitHub Bot logged work on HIVE-24758:
-

Author: ASF GitHub Bot
Created on: 11/Mar/21 16:50
Start Date: 11/Mar/21 16:50
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1963:
URL: https://github.com/apache/hive/pull/1963#discussion_r592527788



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java
##########
@@ -253,7 +259,7 @@ public int execute() {
           counters = mergedCounters;
         } catch (Exception err) {
           // Don't fail execution due to counters - just don't print summary info
-          LOG.warn("Failed to get counters. Ignoring, summary info will be incomplete. " + err, err);
+          LOG.warn("Failed to get counters. Ignoring, summary info will be incomplete.", err);

Review comment:
   Missing {} here?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 564737)
Time Spent: 2h 10m  (was: 2h)

> Log Tez Task DAG ID, DAG Session ID, HS2 Hostname
> -
>
> Key: HIVE-24758
> URL: https://issues.apache.org/jira/browse/HIVE-24758
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> In order to get the logs for a particular query, submitted to Tez on YARN, 
> the following pieces of information are required:
> * YARN Application ID
> * TEZ DAG ID
> * HS2 Host that ran the job
> Include this information in TezTask output.
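
For illustration, the log line added by the patch under review would render roughly like this (all identifiers below are made up):

{code:java}
INFO  : HS2 Host: [hs2-1.example.com], Query ID: [hive_20210311162500_a1b2c3], Dag ID: [dag_1603689507490_0109_1], DAG Session ID: [application_1603689507490_0109]
{code}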



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24824) Define metrics for compaction observability

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24824?focusedWorklogId=564735&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564735
 ]

ASF GitHub Bot logged work on HIVE-24824:
-

Author: ASF GitHub Bot
Created on: 11/Mar/21 16:48
Start Date: 11/Mar/21 16:48
Worklog Time Spent: 10m 
  Work Description: pvargacl merged pull request #2016:
URL: https://github.com/apache/hive/pull/2016


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 564735)
Time Spent: 1h 20m  (was: 1h 10m)

> Define metrics for compaction observability
> ---
>
> Key: HIVE-24824
> URL: https://issues.apache.org/jira/browse/HIVE-24824
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Many times, if there are failures in the compaction background processes 
> (Initiator, Worker, Cleaner), it is hard to notice the problem until it causes 
> serious performance degradation.
> We should create new JMX metrics that would make it easier to monitor 
> compaction health. Examples are:
>  * number of failed / initiated compactions
>  * number of aborted txns, oldest aborted txns
>  * tables with disabled compactions and writes
>  * Initiator and Cleaner cycle runtime
>  * Size of ACID metadata tables that should have ~ constant rows 
> (txn_to_writeId, completed_txns)
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24758) Log Tez Task DAG ID, DAG Session ID, HS2 Hostname

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24758?focusedWorklogId=564717&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564717
 ]

ASF GitHub Bot logged work on HIVE-24758:
-

Author: ASF GitHub Bot
Created on: 11/Mar/21 16:25
Start Date: 11/Mar/21 16:25
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1963:
URL: https://github.com/apache/hive/pull/1963#discussion_r592506951



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java
##########
@@ -236,6 +239,10 @@ public int execute() {
           throw new HiveException("Operation cancelled");
         }
 
+        // Log all the info required to find the various logs for this query
+        LOG.info("HS2 Host: [{}], Query ID: [{}], Dag ID: [{}], DAG Session ID: [{}]", getHostNameIP(), queryId,

Review comment:
   @pgaref I at least re-used the `hive-common` package feature here.  
Please review once more.  Let not the perfect be the enemy of the good.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 564717)
Time Spent: 2h  (was: 1h 50m)

> Log Tez Task DAG ID, DAG Session ID, HS2 Hostname
> -
>
> Key: HIVE-24758
> URL: https://issues.apache.org/jira/browse/HIVE-24758
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> In order to get the logs for a particular query, submitted to Tez on YARN, 
> the following pieces of information are required:
> * YARN Application ID
> * TEZ DAG ID
> * HS2 Host that ran the job
> Include this information in TezTask output.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24758) Log Tez Task DAG ID, DAG Session ID, HS2 Hostname

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24758?focusedWorklogId=564710&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564710
 ]

ASF GitHub Bot logged work on HIVE-24758:
-

Author: ASF GitHub Bot
Created on: 11/Mar/21 16:18
Start Date: 11/Mar/21 16:18
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1963:
URL: https://github.com/apache/hive/pull/1963#discussion_r592501626



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java
##
@@ -236,6 +239,10 @@ public int execute() {
   throw new HiveException("Operation cancelled");
 }
 
+// Log all the info required to find the various logs for this query
+LOG.info("HS2 Host: [{}], Query ID: [{}], Dag ID: [{}], DAG Session 
ID: [{}]", getHostNameIP(), queryId,

Review comment:
   I'd hate to tie anything to the session state.
   ```java
 // Need to remove this static hack. But this is the way currently to 
get a session.
 SessionState ss = SessionState.get();
   ```
   
   Let me see what else we can do.
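
   For context, one SessionState-free alternative is to resolve the HS2 host directly. The method name matches the diff above; the body is an assumption, not the patch:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class HostNameUtil {
  // Assumed implementation: resolve the local host directly, no SessionState needed.
  static String getHostNameIP() {
    try {
      InetAddress addr = InetAddress.getLocalHost();
      return addr.getHostName() + "/" + addr.getHostAddress();
    } catch (UnknownHostException e) {
      return "unknown-host";
    }
  }
}
```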





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 564710)
Time Spent: 1h 50m  (was: 1h 40m)

> Log Tez Task DAG ID, DAG Session ID, HS2 Hostname
> -
>
> Key: HIVE-24758
> URL: https://issues.apache.org/jira/browse/HIVE-24758
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> In order to get the logs for a particular query, submitted to Tez on YARN, 
> the following pieces of information are required:
> * YARN Application ID
> * TEZ DAG ID
> * HS2 Host that ran the job
> Include this information in TezTask output.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24445) Non blocking DROP table implementation

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24445?focusedWorklogId=564706&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564706
 ]

ASF GitHub Bot logged work on HIVE-24445:
-

Author: ASF GitHub Bot
Created on: 11/Mar/21 16:16
Start Date: 11/Mar/21 16:16
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #2020:
URL: https://github.com/apache/hive/pull/2020#discussion_r592499445



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -2213,9 +2213,9 @@ private void create_table_core(final RawStore ms, final 
CreateTableRequest req)
   }
 
   if (!TableType.VIRTUAL_VIEW.toString().equals(tbl.getTableType())) {
-if (tbl.getSd().getLocation() == null
-|| tbl.getSd().getLocation().isEmpty()) {
-  tblPath = wh.getDefaultTablePath(db, tbl);
+if (tbl.getSd().getLocation() == null || 
tbl.getSd().getLocation().isEmpty()) {

Review comment:
   I think we do, just tried it on a cluster, created a transactional table 
with a custom location and everything works as expected (insert, read, 
compaction...)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 564706)
Time Spent: 3h 40m  (was: 3.5h)

> Non blocking DROP table implementation
> --
>
> Key: HIVE-24445
> URL: https://issues.apache.org/jira/browse/HIVE-24445
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Reporter: Zoltan Chovan
>Assignee: Zoltan Chovan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Implement a way to execute drop table operations in a way that doesn't have 
> to wait for currently running read operations to be finished.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24739) Clarify Usage of Thrift TServerEventHandler and Count Number of Messages Processed

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24739?focusedWorklogId=564704&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564704
 ]

ASF GitHub Bot logged work on HIVE-24739:
-

Author: ASF GitHub Bot
Created on: 11/Mar/21 16:14
Start Date: 11/Mar/21 16:14
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #1946:
URL: https://github.com/apache/hive/pull/1946#issuecomment-796853271


   @pvary Made requested change.  Please review. :)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 564704)
Time Spent: 7h 50m  (was: 7h 40m)

> Clarify Usage of Thrift TServerEventHandler and Count Number of Messages 
> Processed
> --
>
> Key: HIVE-24739
> URL: https://issues.apache.org/jira/browse/HIVE-24739
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> Make the messages emitted from {{TServerEventHandler}} more meaningful.  
> Also, track the number of messages that each client sends to aid in 
> troubleshooting.
> I run into this issue all the time, and this would greatly help clarify 
> the logging.
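
For reference, a minimal counting handler of the kind described; this is a sketch of the idea, not the patch itself, and it assumes the stock Thrift TServerEventHandler callbacks:

{code}
import java.util.concurrent.atomic.AtomicLong;

import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.server.ServerContext;
import org.apache.thrift.server.TServerEventHandler;
import org.apache.thrift.transport.TTransport;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CountingEventHandler implements TServerEventHandler {
  private static final Logger LOG = LoggerFactory.getLogger(CountingEventHandler.class);

  // Per-connection state; on Thrift >= 0.13 ServerContext also declares
  // unwrap()/isWrapperFor(), hence the two extra methods below.
  static final class Context implements ServerContext {
    final AtomicLong messages = new AtomicLong();

    public <T> T unwrap(Class<T> iface) { throw new UnsupportedOperationException(); }
    public boolean isWrapperFor(Class<?> iface) { return false; }
  }

  @Override
  public void preServe() { }

  @Override
  public ServerContext createContext(TProtocol input, TProtocol output) {
    return new Context();
  }

  @Override
  public void processContext(ServerContext ctx, TTransport in, TTransport out) {
    ((Context) ctx).messages.incrementAndGet();   // one call per Thrift message
  }

  @Override
  public void deleteContext(ServerContext ctx, TProtocol input, TProtocol output) {
    LOG.info("Client disconnected after {} messages", ((Context) ctx).messages.get());
  }
}
{code}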



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24832) Remove Spring Artifacts from Log4j Properties Files

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24832?focusedWorklogId=564700&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564700
 ]

ASF GitHub Bot logged work on HIVE-24832:
-

Author: ASF GitHub Bot
Created on: 11/Mar/21 16:09
Start Date: 11/Mar/21 16:09
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #2023:
URL: https://github.com/apache/hive/pull/2023#issuecomment-796850115


   @miklosgergely @pvary Review please? :)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 564700)
Time Spent: 20m  (was: 10m)

> Remove Spring Artifacts from Log4j Properties Files
> ---
>
> Key: HIVE-24832
> URL: https://issues.apache.org/jira/browse/HIVE-24832
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Getting a warning about a bad FILE logger and it looks like it's coming from 
> some antiquated copy & paste code.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24867) Create iceberg-handler module in Hive

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24867?focusedWorklogId=564653&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564653
 ]

ASF GitHub Bot logged work on HIVE-24867:
-

Author: ASF GitHub Bot
Created on: 11/Mar/21 14:57
Start Date: 11/Mar/21 14:57
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2058:
URL: https://github.com/apache/hive/pull/2058#discussion_r592430006



##
File path: iceberg-handler/pom.xml
##
@@ -0,0 +1,189 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
+                             http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <parent>
+    <groupId>org.apache.hive</groupId>
+    <artifactId>hive</artifactId>
+    <version>4.0.0-SNAPSHOT</version>
+    <relativePath>../pom.xml</relativePath>
+  </parent>
+  <modelVersion>4.0.0</modelVersion>
+
+  <artifactId>iceberg-handler</artifactId>
+  <packaging>jar</packaging>
+  <name>Hive Iceberg Handler</name>
+
+  <properties>
+    <hive.path.to.root>..</hive.path.to.root>
+    <iceberg-api.version>0.11.0</iceberg-api.version>
+    <!-- two more version properties (4.0.2 and 1.9.2); their tag names were
+         stripped by the mail archive -->
+  </properties>
+
+  <dependencies>
+    <dependency>
+      <groupId>org.apache.iceberg</groupId>
+      <artifactId>iceberg-api</artifactId>
+      <version>${iceberg-api.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.iceberg</groupId>
+      <artifactId>iceberg-core</artifactId>
+      <version>${iceberg-api.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.iceberg</groupId>
+      <artifactId>iceberg-hive-metastore</artifactId>
+      <version>${iceberg-api.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.iceberg</groupId>
+      <artifactId>iceberg-data</artifactId>
+      <version>${iceberg-api.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.iceberg</groupId>
+      <artifactId>iceberg-parquet</artifactId>
+      <version>${iceberg-api.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.iceberg</groupId>
+      <artifactId>iceberg-orc</artifactId>
+      <version>${iceberg-api.version}</version>
+    </dependency>
+
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>hadoop-client</artifactId>
+      <version>${hadoop.version}</version>
+      <exclusions>
+        <exclusion>
+          <groupId>org.apache.avro</groupId>
+          <artifactId>avro</artifactId>
+        </exclusion>
+      </exclusions>
+    </dependency>
+
+    <dependency>
+      <groupId>org.apache.hive</groupId>
+      <artifactId>hive-exec</artifactId>
+      <version>${project.version}</version>
+    </dependency>

Review comment:
   Makes sense. As discussed offline, I've excluded orc, parquet, avro, 
guava and fasterxml only
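
   For reference, the exclusion pattern being discussed looks roughly like this; the artifactIds below are an illustrative subset, and the authoritative list is in the PR:

```xml
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>${project.version}</version>
  <exclusions>
    <!-- illustrative subset of the orc/parquet/avro/guava/fasterxml exclusions -->
    <exclusion>
      <groupId>org.apache.orc</groupId>
      <artifactId>orc-core</artifactId>
    </exclusion>
    <exclusion>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
    </exclusion>
    <exclusion>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```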





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 564653)
Time Spent: 2h 50m  (was: 2h 40m)

> Create iceberg-handler module in Hive
> -
>
> Key: HIVE-24867
> URL: https://issues.apache.org/jira/browse/HIVE-24867
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> * Create a new iceberg-handler module in Hive
>  * Copy the code from the Iceberg/iceberg-mr module into this new Hive module
>  * Make necessary changes so it compiles with Hive 4.0.0 dependencies 
> (iceberg-mr code was based on Hive 3.1)
>  * Make sure all tests pass



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24867) Create iceberg-handler module in Hive

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24867?focusedWorklogId=564652&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564652
 ]

ASF GitHub Bot logged work on HIVE-24867:
-

Author: ASF GitHub Bot
Created on: 11/Mar/21 14:55
Start Date: 11/Mar/21 14:55
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2058:
URL: https://github.com/apache/hive/pull/2058#discussion_r592428680



##
File path: 
iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergFilterFactory.java
##
@@ -190,6 +190,7 @@ private static int daysFromTimestamp(Timestamp timestamp) {
   // We have to use the LocalDateTime to get the micros. See the comment above.
   private static long microsFromTimestamp(Timestamp timestamp) {
 // 
`org.apache.hadoop.hive.common.type.Timestamp.valueOf(lit.toString()).toSqlTimestamp()`
+// since HIVE-21862 changes literal parsing to UTC based timestamps

Review comment:
   You're right, removed it





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 564652)
Time Spent: 2h 40m  (was: 2.5h)

> Create iceberg-handler module in Hive
> -
>
> Key: HIVE-24867
> URL: https://issues.apache.org/jira/browse/HIVE-24867
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> * Create a new iceberg-handler module in Hive
>  * Copy the code from the Iceberg/iceberg-mr module into this new Hive module
>  * Make necessary changes so it compiles with Hive 4.0.0 dependencies 
> (iceberg-mr code was based on Hive 3.1)
>  * Make sure all tests pass



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24867) Create iceberg-handler module in Hive

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24867?focusedWorklogId=564651&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564651
 ]

ASF GitHub Bot logged work on HIVE-24867:
-

Author: ASF GitHub Bot
Created on: 11/Mar/21 14:55
Start Date: 11/Mar/21 14:55
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2058:
URL: https://github.com/apache/hive/pull/2058#discussion_r592428376



##
File path: iceberg-handler/pom.xml
##
@@ -16,41 +16,46 @@
 
 
 ..
-0.11.0
 4.0.2
-1.9.2
+1.9.2

Review comment:
   Good idea, done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 564651)
Time Spent: 2.5h  (was: 2h 20m)

> Create iceberg-handler module in Hive
> -
>
> Key: HIVE-24867
> URL: https://issues.apache.org/jira/browse/HIVE-24867
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> * Create a new iceberg-handler module in Hive
>  * Copy the code from the Iceberg/iceberg-mr module into this new Hive module
>  * Make necessary changes so it compiles with Hive 4.0.0 dependencies 
> (iceberg-mr code was based on Hive 3.1)
>  * Make sure all tests pass



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24867) Create iceberg-handler module in Hive

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24867?focusedWorklogId=564649&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564649
 ]

ASF GitHub Bot logged work on HIVE-24867:
-

Author: ASF GitHub Bot
Created on: 11/Mar/21 14:55
Start Date: 11/Mar/21 14:55
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2058:
URL: https://github.com/apache/hive/pull/2058#discussion_r592428189



##
File path: 
iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergFilterFactory.java
##
@@ -230,7 +231,8 @@ public void testDateType() {
   public void testTimestampType() {
 Literal timestampLiteral = 
Literal.of("2012-10-02T05:16:17.123456").to(Types.TimestampType.withoutZone());
 long timestampMicros = timestampLiteral.value();
-Timestamp ts = 
Timestamp.valueOf(DateTimeUtil.timestampFromMicros(timestampMicros));
+// 
`org.apache.hadoop.hive.common.type.Timestamp.valueOf(lit.toString()).toSqlTimestamp()`
+Timestamp ts = 
Timestamp.from(DateTimeUtil.timestampFromMicros(timestampMicros).toInstant(ZoneOffset.UTC));

Review comment:
   Removed it





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 564649)
Time Spent: 2h 20m  (was: 2h 10m)

> Create iceberg-handler module in Hive
> -
>
> Key: HIVE-24867
> URL: https://issues.apache.org/jira/browse/HIVE-24867
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> * Create a new iceberg-handler module in Hive
>  * Copy the code from the Iceberg/iceberg-mr module into this new Hive module
>  * Make necessary changes so it compiles with Hive 4.0.0 dependencies 
> (iceberg-mr code was based on Hive 3.1)
>  * Make sure all tests pass



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24718) Moving to file based iteration for copying data

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24718?focusedWorklogId=564628&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564628
 ]

ASF GitHub Bot logged work on HIVE-24718:
-

Author: ASF GitHub Bot
Created on: 11/Mar/21 14:25
Start Date: 11/Mar/21 14:25
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1936:
URL: https://github.com/apache/hive/pull/1936#discussion_r592395172



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTablesMetaDataOnly.java
##
@@ -639,9 +629,11 @@ public void testIncrementalDumpEmptyDumpDirectory() throws 
Throwable {
 .verifyResult(inc2Tuple.lastReplicationId);
   }
 
-  private void assertFalseExternalFileList(Path externalTableFileList)
-  throws IOException {
+  private void assertFalseExternalFileList(String dumpLocation)

Review comment:
   Can you please move this method to ReplicationTestUtils itself? We have 
duplicate code for this method in two classes.

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTablesMetaDataOnly.java
##
@@ -639,9 +629,11 @@ public void testIncrementalDumpEmptyDumpDirectory() throws 
Throwable {
 .verifyResult(inc2Tuple.lastReplicationId);
   }
 
-  private void assertFalseExternalFileList(Path externalTableFileList)
-  throws IOException {
+  private void assertFalseExternalFileList(String dumpLocation)
+  throws IOException {

Review comment:
   I think it doesn't throw IOException 

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcrossInstances.java
##
@@ -2225,17 +2224,11 @@ private void setupUDFJarOnHDFS(Path 
identityUdfLocalPath, Path identityUdfHdfsPa
   /*
* Method used from TestReplicationScenariosExclusiveReplica
*/
-  private void assertExternalFileInfo(List expected, String 
dumplocation, boolean isIncremental,
+  private void assertExternalFileList(List expected, String 
dumplocation,
   WarehouseInstance warehouseInstance)
   throws IOException {
 Path hivePath = new Path(dumplocation, ReplUtils.REPL_HIVE_BASE_DIR);
-Path metadataPath = new Path(hivePath, EximUtil.METADATA_PATH_NAME);
-Path externalTableInfoFile;
-if (isIncremental) {
-  externalTableInfoFile = new Path(hivePath, FILE_NAME);
-} else {
-  externalTableInfoFile = new Path(metadataPath, 
primaryDbName.toLowerCase() + File.separator + FILE_NAME);
-}
-ReplicationTestUtils.assertExternalFileInfo(warehouseInstance, expected, 
externalTableInfoFile);
+Path externalTblFileList = new Path(hivePath, EximUtil.FILE_LIST_EXTERNAL);

Review comment:
   This is still not addressed. The same method (and code), 
assertExternalFileList, is defined in three classes, and essentially they do 
nothing more than the path formation. As discussed, can we not use 
ReplicationTestUtils.assertExternalFileList directly by modifying the signature 
a bit?
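
   The consolidation asked for above could look roughly like this in ReplicationTestUtils; the signature is an assumption, while the path construction is copied from the diff:

```java
public static void assertExternalFileList(WarehouseInstance warehouseInstance,
    List<String> expected, String dumpLocation) throws IOException {
  Path hivePath = new Path(dumpLocation, ReplUtils.REPL_HIVE_BASE_DIR);
  Path externalTblFileList = new Path(hivePath, EximUtil.FILE_LIST_EXTERNAL);
  // delegate to the existing Path-based assertion in this class
  assertExternalFileList(warehouseInstance, expected, externalTblFileList);
}
```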





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 564628)
Time Spent: 6h  (was: 5h 50m)

> Moving to file based iteration for copying data
> ---
>
> Key: HIVE-24718
> URL: https://issues.apache.org/jira/browse/HIVE-24718
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24718.01.patch, HIVE-24718.02.patch, 
> HIVE-24718.04.patch, HIVE-24718.05.patch, HIVE-24718.06.patch
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24445) Non blocking DROP table implementation

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24445?focusedWorklogId=564619&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564619
 ]

ASF GitHub Bot logged work on HIVE-24445:
-

Author: ASF GitHub Bot
Created on: 11/Mar/21 14:12
Start Date: 11/Mar/21 14:12
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2020:
URL: https://github.com/apache/hive/pull/2020#discussion_r592392273



##
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##
@@ -1129,14 +1129,17 @@ public void createTable(Table tbl, boolean ifNotExists,
   principalPrivs.setRolePrivileges(grants.getRoleGrants());
   tTbl.setPrivileges(principalPrivs);
 }
+if (HiveConf.getBoolVar(conf, 
ConfVars.HIVE_TXN_LOCKLESS_READS_ENABLED) && 
AcidUtils.isTransactionalTable(tbl)) {

Review comment:
   i don't think it would be a problem, but i'll double check. The thing is 
that I am indirectly passing the HIVE_TXN_LOCKLESS_READS_ENABLED config value to 
HMS via the txnId attribute: if it is set, HIVE_TXN_LOCKLESS_READS_ENABLED was 
enabled.  





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 564619)
Time Spent: 3h 10m  (was: 3h)

> Non blocking DROP table implementation
> --
>
> Key: HIVE-24445
> URL: https://issues.apache.org/jira/browse/HIVE-24445
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Reporter: Zoltan Chovan
>Assignee: Zoltan Chovan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Implement a way to execute drop table operations in a way that doesn't have 
> to wait for currently running read operations to be finished.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24445) Non blocking DROP table implementation

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24445?focusedWorklogId=564623&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564623
 ]

ASF GitHub Bot logged work on HIVE-24445:
-

Author: ASF GitHub Bot
Created on: 11/Mar/21 14:17
Start Date: 11/Mar/21 14:17
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2020:
URL: https://github.com/apache/hive/pull/2020#discussion_r592395791



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -2213,9 +2213,9 @@ private void create_table_core(final RawStore ms, final 
CreateTableRequest req)
   }
 
   if (!TableType.VIRTUAL_VIEW.toString().equals(tbl.getTableType())) {
-if (tbl.getSd().getLocation() == null
-|| tbl.getSd().getLocation().isEmpty()) {
-  tblPath = wh.getDefaultTablePath(db, tbl);
+if (tbl.getSd().getLocation() == null || 
tbl.getSd().getLocation().isEmpty()) {

Review comment:
   i don't think we support custom location for acid managed tables, do we? 
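
   A quick way to check the question above on a cluster (table name and location are illustrative):

```sql
CREATE TABLE acid_custom_loc (id INT)
STORED AS ORC
LOCATION '/warehouse/custom/acid_custom_loc'
TBLPROPERTIES ('transactional'='true');

INSERT INTO acid_custom_loc VALUES (1);
SELECT * FROM acid_custom_loc;  -- succeeds if custom locations are supported
```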





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 564623)
Time Spent: 3.5h  (was: 3h 20m)

> Non blocking DROP table implementation
> --
>
> Key: HIVE-24445
> URL: https://issues.apache.org/jira/browse/HIVE-24445
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Reporter: Zoltan Chovan
>Assignee: Zoltan Chovan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Implement a way to execute drop table operations in a way that doesn't have 
> to wait for currently running read operations to be finished.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24445) Non blocking DROP table implementation

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24445?focusedWorklogId=564622&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564622
 ]

ASF GitHub Bot logged work on HIVE-24445:
-

Author: ASF GitHub Bot
Created on: 11/Mar/21 14:14
Start Date: 11/Mar/21 14:14
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2020:
URL: https://github.com/apache/hive/pull/2020#discussion_r592393806



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -2997,6 +2997,9 @@ private static void populateLlapDaemonVarsSet(Set 
llapDaemonVarsSetLocal
 HIVE_TXN_READONLY_ENABLED("hive.txn.readonly.enabled", false,
   "Enables read-only transaction classification and related 
optimizations"),
 
+HIVE_TXN_LOCKLESS_READS_ENABLED("hive.txn.lockless.reads.enabled", false,

Review comment:
   makes sense. i'll create a separate config for async drop; however, 
HIVE_TXN_LOCKLESS_READS_ENABLED could still be leveraged to enable the whole 
lockless-read feature.
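
   Illustratively, a dedicated flag next to the existing one could follow the same HiveConf pattern as in the diff above; the name below is hypothetical, not from the patch:

```java
HIVE_ACID_LOCKLESS_DROP_ENABLED("hive.acid.lockless.drop.enabled", false,
    "Enables non-blocking DROP TABLE for transactional tables"),  // hypothetical name
```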





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 564622)
Time Spent: 3h 20m  (was: 3h 10m)

> Non blocking DROP table implementation
> --
>
> Key: HIVE-24445
> URL: https://issues.apache.org/jira/browse/HIVE-24445
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Reporter: Zoltan Chovan
>Assignee: Zoltan Chovan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Implement a way to execute drop table operations in a way that doesn't have 
> to wait for currently running read operations to be finished.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24445) Non blocking DROP table implementation

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24445?focusedWorklogId=564616&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564616
 ]

ASF GitHub Bot logged work on HIVE-24445:
-

Author: ASF GitHub Bot
Created on: 11/Mar/21 14:02
Start Date: 11/Mar/21 14:02
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2020:
URL: https://github.com/apache/hive/pull/2020#discussion_r592384397



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -2863,6 +2862,10 @@ private boolean drop_table_core(final RawStore ms, final 
String catName, final S
 deletePartitionData(partPaths, ifPurge, 
ReplChangeManager.shouldEnableCm(db, tbl));
 // Delete the data in the table
 deleteTableData(tblPath, ifPurge, ReplChangeManager.shouldEnableCm(db, 
tbl));
+  } else if (TxnUtils.isTransactionalTable(tbl)) {
+CompactionRequest rqst = new CompactionRequest(dbname, name, 
CompactionType.MAJOR);

Review comment:
   Yeah, that's just a placeholder for marking the table as "ready for 
cleaning". 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 564616)
Time Spent: 3h  (was: 2h 50m)

> Non blocking DROP table implementation
> --
>
> Key: HIVE-24445
> URL: https://issues.apache.org/jira/browse/HIVE-24445
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Reporter: Zoltan Chovan
>Assignee: Zoltan Chovan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Implement a way to execute drop table operations in a way that doesn't have 
> to wait for currently running read operations to be finished.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24873) TPCDS query51 doesn't vectorize: Only PTF directly under reduce-shuffle is supported

2021-03-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-24873:

Description: 
{code}
EXPLAIN VECTORIZATION DETAIL WITH web_v1 as (
select
  ws_item_sk item_sk, d_date,
  sum(sum(ws_sales_price))
  over (partition by ws_item_sk order by d_date rows between unbounded 
preceding and current row) cume_sales
from web_sales
,date_dim
where ws_sold_date_sk=d_date_sk
  and d_month_seq between 1214 and 1214+11
  and ws_item_sk is not NULL
group by ws_item_sk, d_date),
store_v1 as (
select
  ss_item_sk item_sk, d_date,
  sum(sum(ss_sales_price))
  over (partition by ss_item_sk order by d_date rows between unbounded 
preceding and current row) cume_sales
from store_sales
,date_dim
where ss_sold_date_sk=d_date_sk
  and d_month_seq between 1214 and 1214+11
  and ss_item_sk is not NULL
group by ss_item_sk, d_date)
 select  *
from (select item_sk
 ,d_date
 ,web_sales
 ,store_sales
 ,max(web_sales)
 over (partition by item_sk order by d_date rows between unbounded 
preceding and current row) web_cumulative
 ,max(store_sales)
 over (partition by item_sk order by d_date rows between unbounded 
preceding and current row) store_cumulative
 from (select case when web.item_sk is not null then web.item_sk else 
store.item_sk end item_sk
 ,case when web.d_date is not null then web.d_date else 
store.d_date end d_date
 ,web.cume_sales web_sales
 ,store.cume_sales store_sales
   from web_v1 web full outer join store_v1 store on (web.item_sk = 
store.item_sk
  and web.d_date = 
store.d_date)
  )x )y
where web_cumulative > store_cumulative
order by item_sk
,d_date
limit 100;
{code}

{code}
Reducer 2
notVectorizedReason: PTF operator: Only PTF directly under reduce-shuffle is 
supported
window functions:
window function: GenericUDAFSumHiveDecimal
window frame: ROWS PRECEDING(MAX)~CURRENT

...

Reducer 8
notVectorizedReason: PTF operator: Only PTF directly under reduce-shuffle is 
supported
window functions:
  window function: GenericUDAFSumHiveDecimal
  window frame: ROWS PRECEDING(MAX)~CURRENT |
{code}


The interesting part is:
{code}
explain vectorization detail select
  ws_item_sk item_sk, d_date,
  sum(sum(ws_sales_price))
  over (partition by ws_item_sk order by d_date rows between unbounded 
preceding and current row) cume_sales
from web_sales
,date_dim
where ws_sold_date_sk=d_date_sk
  and d_month_seq between 1214 and 1214+11
  and ws_item_sk is not NULL
group by ws_item_sk, d_date;
{code}

the same applies to query63:
{code}
...
,avg(sum(ss_sales_price)) over (partition by i_manager_id) avg_monthly_sales
...
{code}

  was:
{code}
EXPLAIN VECTORIZATION DETAIL WITH web_v1 as (
select
  ws_item_sk item_sk, d_date,
  sum(sum(ws_sales_price))
  over (partition by ws_item_sk order by d_date rows between unbounded 
preceding and current row) cume_sales
from web_sales
,date_dim
where ws_sold_date_sk=d_date_sk
  and d_month_seq between 1214 and 1214+11
  and ws_item_sk is not NULL
group by ws_item_sk, d_date),
store_v1 as (
select
  ss_item_sk item_sk, d_date,
  sum(sum(ss_sales_price))
  over (partition by ss_item_sk order by d_date rows between unbounded 
preceding and current row) cume_sales
from store_sales
,date_dim
where ss_sold_date_sk=d_date_sk
  and d_month_seq between 1214 and 1214+11
  and ss_item_sk is not NULL
group by ss_item_sk, d_date)
 select  *
from (select item_sk
 ,d_date
 ,web_sales
 ,store_sales
 ,max(web_sales)
 over (partition by item_sk order by d_date rows between unbounded 
preceding and current row) web_cumulative
 ,max(store_sales)
 over (partition by item_sk order by d_date rows between unbounded 
preceding and current row) store_cumulative
 from (select case when web.item_sk is not null then web.item_sk else 
store.item_sk end item_sk
 ,case when web.d_date is not null then web.d_date else 
store.d_date end d_date
 ,web.cume_sales web_sales
 ,store.cume_sales store_sales
   from web_v1 web full outer join store_v1 store on (web.item_sk = 
store.item_sk
  and web.d_date = 
store.d_date)
  )x )y
where web_cumulative > store_cumulative
order by item_sk
,d_date
limit 100;
{code}

{code}
Reducer 2
notVectorizedReason: PTF operator: Only PTF directly under reduce-shuffle is 
supported
window functions:
window function: GenericUDAFSumHiveDecimal
window frame: ROWS PRECEDING(MAX)~CURRENT

...

Reducer 8
notVectorizedReason: PTF operator: Only PTF directly under reduce-shuffle is 
supported
window functions:
  window function: GenericUDAFSumHiveDecimal
  window frame: ROWS PRECEDING(MAX)~CURRENT 

[jira] [Updated] (HIVE-24873) TPCDS query51 doesn't vectorize: Only PTF directly under reduce-shuffle is supported

2021-03-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-24873:

Description: 
{code}
EXPLAIN VECTORIZATION DETAIL WITH web_v1 as (
select
  ws_item_sk item_sk, d_date,
  sum(sum(ws_sales_price))
  over (partition by ws_item_sk order by d_date rows between unbounded 
preceding and current row) cume_sales
from web_sales
,date_dim
where ws_sold_date_sk=d_date_sk
  and d_month_seq between 1214 and 1214+11
  and ws_item_sk is not NULL
group by ws_item_sk, d_date),
store_v1 as (
select
  ss_item_sk item_sk, d_date,
  sum(sum(ss_sales_price))
  over (partition by ss_item_sk order by d_date rows between unbounded 
preceding and current row) cume_sales
from store_sales
,date_dim
where ss_sold_date_sk=d_date_sk
  and d_month_seq between 1214 and 1214+11
  and ss_item_sk is not NULL
group by ss_item_sk, d_date)
 select  *
from (select item_sk
 ,d_date
 ,web_sales
 ,store_sales
 ,max(web_sales)
 over (partition by item_sk order by d_date rows between unbounded 
preceding and current row) web_cumulative
 ,max(store_sales)
 over (partition by item_sk order by d_date rows between unbounded 
preceding and current row) store_cumulative
 from (select case when web.item_sk is not null then web.item_sk else 
store.item_sk end item_sk
 ,case when web.d_date is not null then web.d_date else 
store.d_date end d_date
 ,web.cume_sales web_sales
 ,store.cume_sales store_sales
   from web_v1 web full outer join store_v1 store on (web.item_sk = 
store.item_sk
  and web.d_date = 
store.d_date)
  )x )y
where web_cumulative > store_cumulative
order by item_sk
,d_date
limit 100;
{code}

{code}
Reducer 2
notVectorizedReason: PTF operator: Only PTF directly under reduce-shuffle is 
supported
window functions:
window function: GenericUDAFSumHiveDecimal
window frame: ROWS PRECEDING(MAX)~CURRENT

...

Reducer 8
notVectorizedReason: PTF operator: Only PTF directly under reduce-shuffle is 
supported
window functions:
  window function: GenericUDAFSumHiveDecimal
  window frame: ROWS PRECEDING(MAX)~CURRENT |
{code}


The interesting part is:
{code}
explain vectorization detail select
  ws_item_sk item_sk, d_date,
  sum(sum(ws_sales_price))
  over (partition by ws_item_sk order by d_date rows between unbounded 
preceding and current row) cume_sales
from web_sales
,date_dim
where ws_sold_date_sk=d_date_sk
  and d_month_seq between 1214 and 1214+11
  and ws_item_sk is not NULL
group by ws_item_sk, d_date;
{code}

  was:
{code}
EXPLAIN VECTORIZATION DETAIL WITH web_v1 as (
select
  ws_item_sk item_sk, d_date,
  sum(sum(ws_sales_price))
  over (partition by ws_item_sk order by d_date rows between unbounded 
preceding and current row) cume_sales
from web_sales
,date_dim
where ws_sold_date_sk=d_date_sk
  and d_month_seq between 1214 and 1214+11
  and ws_item_sk is not NULL
group by ws_item_sk, d_date),
store_v1 as (
select
  ss_item_sk item_sk, d_date,
  sum(sum(ss_sales_price))
  over (partition by ss_item_sk order by d_date rows between unbounded 
preceding and current row) cume_sales
from store_sales
,date_dim
where ss_sold_date_sk=d_date_sk
  and d_month_seq between 1214 and 1214+11
  and ss_item_sk is not NULL
group by ss_item_sk, d_date)
 select  *
from (select item_sk
 ,d_date
 ,web_sales
 ,store_sales
 ,max(web_sales)
 over (partition by item_sk order by d_date rows between unbounded 
preceding and current row) web_cumulative
 ,max(store_sales)
 over (partition by item_sk order by d_date rows between unbounded 
preceding and current row) store_cumulative
 from (select case when web.item_sk is not null then web.item_sk else 
store.item_sk end item_sk
 ,case when web.d_date is not null then web.d_date else 
store.d_date end d_date
 ,web.cume_sales web_sales
 ,store.cume_sales store_sales
   from web_v1 web full outer join store_v1 store on (web.item_sk = 
store.item_sk
  and web.d_date = 
store.d_date)
  )x )y
where web_cumulative > store_cumulative
order by item_sk
,d_date
limit 100;
{code}

{code}
Reducer 2
notVectorizedReason: PTF operator: Only PTF directly under reduce-shuffle is 
supported
window functions:
window function: GenericUDAFSumHiveDecimal
window frame: ROWS PRECEDING(MAX)~CURRENT

...

Reducer 8
notVectorizedReason: PTF operator: Only PTF directly under reduce-shuffle is 
supported
window functions:
  window function: GenericUDAFSumHiveDecimal
  window frame: ROWS PRECEDING(MAX)~CURRENT |
{code}


> TPCDS query51 doesn't vectorize:   Only PTF directly under reduce-shuffle is 
> supported
> 

[jira] [Work logged] (HIVE-24817) "not in" clause returns incorrect data when there is coercion

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24817?focusedWorklogId=564608&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564608
 ]

ASF GitHub Bot logged work on HIVE-24817:
-

Author: ASF GitHub Bot
Created on: 11/Mar/21 13:49
Start Date: 11/Mar/21 13:49
Worklog Time Spent: 10m 
  Work Description: scarlin-cloudera commented on a change in pull request 
#2027:
URL: https://github.com/apache/hive/pull/2027#discussion_r592373912



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/type/TypeCheckProcFactory.java
##
@@ -1007,17 +1001,12 @@ protected T getXpathOrFuncExprNodeDesc(ASTNode node,
 T columnDesc = children.get(0);
 T valueDesc = interpretNode(columnDesc, children.get(i));
 if (valueDesc == null) {
-  if (hasNullValue) {
-// Skip if null value has already been added
-continue;
-  }
-  TypeInfo targetType = exprFactory.getTypeInfo(columnDesc);
+  // Keep original
+  TypeInfo targetType = exprFactory.getTypeInfo(children.get(i));
   if (!expressions.containsKey(targetType)) {
 expressions.put(targetType, columnDesc);
   }
-  T nullConst = exprFactory.createConstantExpr(targetType, null);
-  expressions.put(targetType, nullConst);
-  hasNullValue = true;
+  expressions.put(targetType, children.get(i));
 } else {

Review comment:
   Yeah, `not in (null)` is a very weird construct. It is equivalent to why you 
need to say "x IS NULL" and can't say "x = NULL".
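
   Spelled out with the table from the bug report, the three-valued logic goes like this:

```sql
-- my_tbl/int_col are from the report above
SELECT count(*) FROM my_tbl WHERE int_col NOT IN (355.8);

-- after the bad coercion the predicate degenerates to:
--   int_col NOT IN (NULL)
--   == NOT (int_col = NULL)
--   == NOT (NULL)
--   == NULL          -- never TRUE, so every row is filtered out
```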





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 564608)
Time Spent: 2h 10m  (was: 2h)

> "not in" clause returns incorrect data when there is coercion
> -
>
> Key: HIVE-24817
> URL: https://issues.apache.org/jira/browse/HIVE-24817
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Steve Carlin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> When the query has a where clause that has an integer column checking against 
> being "not in" a decimal column, the decimal column is being changed to null, 
> causing incorrect results.
> This is a sample query of a failure:
> select count(*) from my_tbl where int_col not in (355.8);
> Since the int_col can never be 355.8, one would expect all the rows to be 
> returned, but it is changing the 355.8 into a null value causing no rows to 
> be returned.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24873) TPCDS query51 doesn't vectorize: reduce-shuffle is supported

2021-03-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-24873:

Description: 
{code}
EXPLAIN VECTORIZATION DETAIL WITH web_v1 as (
select
  ws_item_sk item_sk, d_date,
  sum(sum(ws_sales_price))
  over (partition by ws_item_sk order by d_date rows between unbounded 
preceding and current row) cume_sales
from web_sales
,date_dim
where ws_sold_date_sk=d_date_sk
  and d_month_seq between 1214 and 1214+11
  and ws_item_sk is not NULL
group by ws_item_sk, d_date),
store_v1 as (
select
  ss_item_sk item_sk, d_date,
  sum(sum(ss_sales_price))
  over (partition by ss_item_sk order by d_date rows between unbounded 
preceding and current row) cume_sales
from store_sales
,date_dim
where ss_sold_date_sk=d_date_sk
  and d_month_seq between 1214 and 1214+11
  and ss_item_sk is not NULL
group by ss_item_sk, d_date)
 select  *
from (select item_sk
 ,d_date
 ,web_sales
 ,store_sales
 ,max(web_sales)
 over (partition by item_sk order by d_date rows between unbounded 
preceding and current row) web_cumulative
 ,max(store_sales)
 over (partition by item_sk order by d_date rows between unbounded 
preceding and current row) store_cumulative
 from (select case when web.item_sk is not null then web.item_sk else 
store.item_sk end item_sk
 ,case when web.d_date is not null then web.d_date else 
store.d_date end d_date
 ,web.cume_sales web_sales
 ,store.cume_sales store_sales
   from web_v1 web full outer join store_v1 store on (web.item_sk = 
store.item_sk
  and web.d_date = 
store.d_date)
  )x )y
where web_cumulative > store_cumulative
order by item_sk
,d_date
limit 100;
{code}

{code}
Reducer 2
notVectorizedReason: PTF operator: Only PTF directly under reduce-shuffle is 
supported
window functions:
window function: GenericUDAFSumHiveDecimal
window frame: ROWS PRECEDING(MAX)~CURRENT

...

Reducer 8
notVectorizedReason: PTF operator: Only PTF directly under reduce-shuffle is 
supported
window functions:
  window function: GenericUDAFSumHiveDecimal
  window frame: ROWS PRECEDING(MAX)~CURRENT |
{code}

  was:
{code}
EXPLAIN VECTORIZATION DETAIL WITH web_v1 as (
select
  ws_item_sk item_sk, d_date,
  sum(sum(ws_sales_price))
  over (partition by ws_item_sk order by d_date rows between unbounded 
preceding and current row) cume_sales
from web_sales
,date_dim
where ws_sold_date_sk=d_date_sk
  and d_month_seq between 1214 and 1214+11
  and ws_item_sk is not NULL
group by ws_item_sk, d_date),
store_v1 as (
select
  ss_item_sk item_sk, d_date,
  sum(sum(ss_sales_price))
  over (partition by ss_item_sk order by d_date rows between unbounded 
preceding and current row) cume_sales
from store_sales
,date_dim
where ss_sold_date_sk=d_date_sk
  and d_month_seq between 1214 and 1214+11
  and ss_item_sk is not NULL
group by ss_item_sk, d_date)
 select  *
from (select item_sk
 ,d_date
 ,web_sales
 ,store_sales
 ,max(web_sales)
 over (partition by item_sk order by d_date rows between unbounded 
preceding and current row) web_cumulative
 ,max(store_sales)
 over (partition by item_sk order by d_date rows between unbounded 
preceding and current row) store_cumulative
 from (select case when web.item_sk is not null then web.item_sk else 
store.item_sk end item_sk
 ,case when web.d_date is not null then web.d_date else 
store.d_date end d_date
 ,web.cume_sales web_sales
 ,store.cume_sales store_sales
   from web_v1 web full outer join store_v1 store on (web.item_sk = 
store.item_sk
  and web.d_date = 
store.d_date)
  )x )y
where web_cumulative > store_cumulative
order by item_sk
,d_date
limit 100;
{code}

{code}
| Reducer 2  |
...
| notVectorizedReason: PTF operator: Only PTF directly under 
reduce-shuffle is supported |
{code}


> TPCDS query51 doesn't vectorize:  reduce-shuffle is supported
> -
>
> Key: HIVE-24873
> URL: https://issues.apache.org/jira/browse/HIVE-24873
> Project: Hive
>  Issue Type: Sub-task
>Reporter: László Bodor
>Priority: Major
>
> {code}
> EXPLAIN VECTORIZATION DETAIL WITH web_v1 as (
> select
>   ws_item_sk item_sk, d_date,
>   sum(sum(ws_sales_price))
>   over (partition by ws_item_sk order by d_date rows between unbounded 
> preceding and current row) cume_sales
> from web_sales
> ,date_dim
> where ws_sold_date_sk=d_date_sk
>   and d_month_seq between 1214 and 1214+11
>   and ws_item_sk is not NULL
> group by ws_item_sk, d_date),

[jira] [Updated] (HIVE-24873) TPCDS query51 doesn't vectorize: Only PTF directly under reduce-shuffle is supported

2021-03-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-24873:

Summary: TPCDS query51 doesn't vectorize:   Only PTF directly under 
reduce-shuffle is supported  (was: TPCDS query51 doesn't vectorize:  
reduce-shuffle is supported)

> TPCDS query51 doesn't vectorize:   Only PTF directly under reduce-shuffle is 
> supported
> --
>
> Key: HIVE-24873
> URL: https://issues.apache.org/jira/browse/HIVE-24873
> Project: Hive
>  Issue Type: Sub-task
>Reporter: László Bodor
>Priority: Major
>
> {code}
> EXPLAIN VECTORIZATION DETAIL WITH web_v1 as (
> select
>   ws_item_sk item_sk, d_date,
>   sum(sum(ws_sales_price))
>   over (partition by ws_item_sk order by d_date rows between unbounded 
> preceding and current row) cume_sales
> from web_sales
> ,date_dim
> where ws_sold_date_sk=d_date_sk
>   and d_month_seq between 1214 and 1214+11
>   and ws_item_sk is not NULL
> group by ws_item_sk, d_date),
> store_v1 as (
> select
>   ss_item_sk item_sk, d_date,
>   sum(sum(ss_sales_price))
>   over (partition by ss_item_sk order by d_date rows between unbounded 
> preceding and current row) cume_sales
> from store_sales
> ,date_dim
> where ss_sold_date_sk=d_date_sk
>   and d_month_seq between 1214 and 1214+11
>   and ss_item_sk is not NULL
> group by ss_item_sk, d_date)
>  select  *
> from (select item_sk
>  ,d_date
>  ,web_sales
>  ,store_sales
>  ,max(web_sales)
>  over (partition by item_sk order by d_date rows between unbounded 
> preceding and current row) web_cumulative
>  ,max(store_sales)
>  over (partition by item_sk order by d_date rows between unbounded 
> preceding and current row) store_cumulative
>  from (select case when web.item_sk is not null then web.item_sk else 
> store.item_sk end item_sk
>  ,case when web.d_date is not null then web.d_date else 
> store.d_date end d_date
>  ,web.cume_sales web_sales
>  ,store.cume_sales store_sales
>from web_v1 web full outer join store_v1 store on (web.item_sk = 
> store.item_sk
>   and web.d_date = 
> store.d_date)
>   )x )y
> where web_cumulative > store_cumulative
> order by item_sk
> ,d_date
> limit 100;
> {code}
> {code}
> Reducer 2
> notVectorizedReason: PTF operator: Only PTF directly under reduce-shuffle is 
> supported
> window functions:
> window function: GenericUDAFSumHiveDecimal
> window frame: ROWS PRECEDING(MAX)~CURRENT
> ...
> Reducer 8
> notVectorizedReason: PTF operator: Only PTF directly under reduce-shuffle is 
> supported
> window functions:
>   window function: GenericUDAFSumHiveDecimal
>   window frame: ROWS PRECEDING(MAX)~CURRENT |
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24874) Worker performance metric

2021-03-11 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-24874:
--
Description: 
Wrap the Compaction Worker with PerformanceLogger.
Major and minor compactions should be measured as separate metrics.

> Worker performance metric
> -
>
> Key: HIVE-24874
> URL: https://issues.apache.org/jira/browse/HIVE-24874
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Priority: Major
>
> Wrap the Compaction Worker with PerformanceLogger.
> Major and minor compactions should be measured as separate metrics.
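
A hedged sketch of the wrapping, assuming org.apache.hadoop.hive.ql.log.PerfLogger (the begin/end method casing differs across Hive versions, and the metric names here are illustrative):

{code}
PerfLogger perfLogger = PerfLogger.getPerfLogger(conf, false);
String metric = ci.type == CompactionType.MAJOR
    ? "MAJOR_COMPACTION" : "MINOR_COMPACTION";   // illustrative names
perfLogger.perfLogBegin(Worker.class.getName(), metric);
try {
  runCompaction(ci);   // stands in for the Worker's actual compaction call
} finally {
  perfLogger.perfLogEnd(Worker.class.getName(), metric);
}
{code}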



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24873) TPCDS query51 doesn't vectorize: reduce-shuffle is supported

2021-03-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-24873:

Description: 
{code}
EXPLAIN VECTORIZATION DETAIL WITH web_v1 as (
select
  ws_item_sk item_sk, d_date,
  sum(sum(ws_sales_price))
  over (partition by ws_item_sk order by d_date rows between unbounded 
preceding and current row) cume_sales
from web_sales
,date_dim
where ws_sold_date_sk=d_date_sk
  and d_month_seq between 1214 and 1214+11
  and ws_item_sk is not NULL
group by ws_item_sk, d_date),
store_v1 as (
select
  ss_item_sk item_sk, d_date,
  sum(sum(ss_sales_price))
  over (partition by ss_item_sk order by d_date rows between unbounded 
preceding and current row) cume_sales
from store_sales
,date_dim
where ss_sold_date_sk=d_date_sk
  and d_month_seq between 1214 and 1214+11
  and ss_item_sk is not NULL
group by ss_item_sk, d_date)
 select  *
from (select item_sk
 ,d_date
 ,web_sales
 ,store_sales
 ,max(web_sales)
 over (partition by item_sk order by d_date rows between unbounded 
preceding and current row) web_cumulative
 ,max(store_sales)
 over (partition by item_sk order by d_date rows between unbounded 
preceding and current row) store_cumulative
 from (select case when web.item_sk is not null then web.item_sk else 
store.item_sk end item_sk
 ,case when web.d_date is not null then web.d_date else 
store.d_date end d_date
 ,web.cume_sales web_sales
 ,store.cume_sales store_sales
   from web_v1 web full outer join store_v1 store on (web.item_sk = 
store.item_sk
  and web.d_date = 
store.d_date)
  )x )y
where web_cumulative > store_cumulative
order by item_sk
,d_date
limit 100;
{code}

{code}
| Reducer 2  |
...
| notVectorizedReason: PTF operator: Only PTF directly under 
reduce-shuffle is supported |
{code}

> TPCDS query51 doesn't vectorize:  reduce-shuffle is supported
> -
>
> Key: HIVE-24873
> URL: https://issues.apache.org/jira/browse/HIVE-24873
> Project: Hive
>  Issue Type: Sub-task
>Reporter: László Bodor
>Priority: Major
>
> {code}
> EXPLAIN VECTORIZATION DETAIL WITH web_v1 as (
> select
>   ws_item_sk item_sk, d_date,
>   sum(sum(ws_sales_price))
>   over (partition by ws_item_sk order by d_date rows between unbounded 
> preceding and current row) cume_sales
> from web_sales
> ,date_dim
> where ws_sold_date_sk=d_date_sk
>   and d_month_seq between 1214 and 1214+11
>   and ws_item_sk is not NULL
> group by ws_item_sk, d_date),
> store_v1 as (
> select
>   ss_item_sk item_sk, d_date,
>   sum(sum(ss_sales_price))
>   over (partition by ss_item_sk order by d_date rows between unbounded 
> preceding and current row) cume_sales
> from store_sales
> ,date_dim
> where ss_sold_date_sk=d_date_sk
>   and d_month_seq between 1214 and 1214+11
>   and ss_item_sk is not NULL
> group by ss_item_sk, d_date)
>  select  *
> from (select item_sk
>  ,d_date
>  ,web_sales
>  ,store_sales
>  ,max(web_sales)
>  over (partition by item_sk order by d_date rows between unbounded 
> preceding and current row) web_cumulative
>  ,max(store_sales)
>  over (partition by item_sk order by d_date rows between unbounded 
> preceding and current row) store_cumulative
>  from (select case when web.item_sk is not null then web.item_sk else 
> store.item_sk end item_sk
>  ,case when web.d_date is not null then web.d_date else 
> store.d_date end d_date
>  ,web.cume_sales web_sales
>  ,store.cume_sales store_sales
>from web_v1 web full outer join store_v1 store on (web.item_sk = 
> store.item_sk
>   and web.d_date = 
> store.d_date)
>   )x )y
> where web_cumulative > store_cumulative
> order by item_sk
> ,d_date
> limit 100;
> {code}
> {code}
> | Reducer 2  |
> ...
> | notVectorizedReason: PTF operator: Only PTF directly under 
> reduce-shuffle is supported |
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24761) Support vectorization for bounded windows in PTF

2021-03-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-24761:

Parent: HIVE-24872
Issue Type: Sub-task  (was: Improvement)

> Support vectorization for bounded windows in PTF
> 
>
> Key: HIVE-24761
> URL: https://issues.apache.org/jira/browse/HIVE-24761
> Project: Hive
>  Issue Type: Sub-task
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> {code}
>  notVectorizedReason: PTF operator: *** only UNBOUNDED start frame is 
> supported
> {code}
> Currently, bounded windows are not supported in VectorPTFOperator. If we 
> simply remove the compile-time check:
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L2911
> {code}
>   if (!windowFrameDef.isStartUnbounded()) {
> setOperatorIssue(functionName + " only UNBOUNDED start frame is 
> supported");
> return false;
>   }
> {code}
> We get incorrect results. That's because the vectorized codepath completely 
> ignores boundaries and simply iterates through all the input batches in 
> [VectorPTFGroupBatches|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ptf/VectorPTFGroupBatches.java#L172]:
> {code}
> for (VectorPTFEvaluatorBase evaluator : evaluators) {
>   evaluator.evaluateGroupBatch(batch);
>   if (isLastGroupBatch) {
> evaluator.doLastBatchWork();
>   }
> }
> {code}
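
For reference, a bounded-start window that this check currently rejects (tables as in the TPCDS examples above):

{code}
SELECT ss_item_sk, d_date,
       sum(ss_sales_price) OVER (PARTITION BY ss_item_sk ORDER BY d_date
                                 ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS recent_sales
FROM store_sales
JOIN date_dim ON ss_sold_date_sk = d_date_sk;
{code}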



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24824) Define metrics for compaction observability

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24824?focusedWorklogId=564567&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564567
 ]

ASF GitHub Bot logged work on HIVE-24824:
-

Author: ASF GitHub Bot
Created on: 11/Mar/21 12:33
Start Date: 11/Mar/21 12:33
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #2016:
URL: https://github.com/apache/hive/pull/2016#discussion_r592322560



##
File path: 
ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestInitiator.java
##
@@ -865,7 +865,7 @@ public void processCompactionCandidatesInParallel() throws 
Exception {
   }
 
   @Test
-  public void testInitiatorMetricsEnabled() throws Exception {
+  public void testAcidMetricsEnabled() throws Exception {

Review comment:
   Moved all the test to a common place TestCompactionMetrics.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 564567)
Time Spent: 1h 10m  (was: 1h)

> Define metrics for compaction observability
> ---
>
> Key: HIVE-24824
> URL: https://issues.apache.org/jira/browse/HIVE-24824
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Many times, if there are failures in the Compaction background processes 
> (Initiator, Worker, Cleaner), it is hard to notice the problem until it causes 
> serious performance degradation.
> We should create new JMX metrics, that would make it easier to monitor the 
> compaction health. Examples are:
>  * number of failed / initiated compaction
>  * number of aborted txns, oldest aborted txns
>  * tables with disabled compactions and writes
>  * Initiator and Cleaner cycle runtime
>  * Size of ACID metadata tables that should have ~ constant rows 
> (txn_to_writeId, completed_txns)
>  
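
As a rough sketch of the plumbing (not the actual Hive implementation; the metric names and the lookup helper below are invented, and it assumes the Dropwizard metrics-core and metrics-jmx artifacts are on the classpath):

{code}
import com.codahale.metrics.Counter;
import com.codahale.metrics.Gauge;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.jmx.JmxReporter; // in Metrics 3.x: com.codahale.metrics.JmxReporter

public class CompactionMetricsSketch {
    public static void main(String[] args) throws InterruptedException {
        MetricRegistry registry = new MetricRegistry();
        // Expose everything registered below as JMX MBeans.
        JmxReporter.forRegistry(registry).inDomain("hive.compaction").build().start();

        Counter initiated = registry.counter("compaction_initiated");
        Counter failed = registry.counter("compaction_failed");
        // Gauges are read lazily, so a query against the ACID metadata tables
        // can back them without holding anything open between reads.
        registry.register("oldest_aborted_txn_age_ms",
                (Gauge<Long>) CompactionMetricsSketch::lookupOldestAbortedTxnAgeMs);

        initiated.inc(); // Initiator queued one compaction
        failed.inc();    // Worker reported one failure
        Thread.sleep(60_000); // keep the JVM alive so the MBeans can be inspected
    }

    private static long lookupOldestAbortedTxnAgeMs() {
        return 0L; // placeholder; a real implementation would query TXNS
    }
}
{code}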



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24867) Create iceberg-handler module in Hive

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24867?focusedWorklogId=564560&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564560
 ]

ASF GitHub Bot logged work on HIVE-24867:
-

Author: ASF GitHub Bot
Created on: 11/Mar/21 12:16
Start Date: 11/Mar/21 12:16
Worklog Time Spent: 10m 
  Work Description: pvary commented on pull request #2058:
URL: https://github.com/apache/hive/pull/2058#issuecomment-796694558


   Checked the relevant files, left a few comments



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 564560)
Time Spent: 2h 10m  (was: 2h)

> Create iceberg-handler module in Hive
> -
>
> Key: HIVE-24867
> URL: https://issues.apache.org/jira/browse/HIVE-24867
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> * Create a new iceberg-handler module in Hive
>  * Copy the code from the Iceberg/iceberg-mr module into this new Hive module
>  * Make necessary changes so it compiles with Hive 4.0.0 dependencies 
> (iceberg-mr code was based on Hive 3.1)
>  * Make sure all tests pass



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24867) Create iceberg-handler module in Hive

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24867?focusedWorklogId=564553&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564553
 ]

ASF GitHub Bot logged work on HIVE-24867:
-

Author: ASF GitHub Bot
Created on: 11/Mar/21 12:07
Start Date: 11/Mar/21 12:07
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2058:
URL: https://github.com/apache/hive/pull/2058#discussion_r592306506



##
File path: 
iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergFilterFactory.java
##
@@ -230,7 +231,8 @@ public void testDateType() {
   public void testTimestampType() {
 Literal timestampLiteral = 
Literal.of("2012-10-02T05:16:17.123456").to(Types.TimestampType.withoutZone());
 long timestampMicros = timestampLiteral.value();
-Timestamp ts = 
Timestamp.valueOf(DateTimeUtil.timestampFromMicros(timestampMicros));
+// 
`org.apache.hadoop.hive.common.type.Timestamp.valueOf(lit.toString()).toSqlTimestamp()`
+Timestamp ts = 
Timestamp.from(DateTimeUtil.timestampFromMicros(timestampMicros).toInstant(ZoneOffset.UTC));

Review comment:
   Same problem as with the comment in `HiveIcebergFilterFactory`.
   Either remove the comment, or rephrase it
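
For reference, a self-contained Java sketch of the conversion being discussed (the class and helper names are invented, and Iceberg's DateTimeUtil is deliberately not used so the snippet stands alone): the epoch microseconds are interpreted as a UTC local date-time and then turned into a java.sql.Timestamp.

{code}
import java.sql.Timestamp;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class TimestampMicrosSketch {

    static Timestamp fromMicros(long micros) {
        long seconds = Math.floorDiv(micros, 1_000_000L);
        long microOfSecond = Math.floorMod(micros, 1_000_000L);
        LocalDateTime ldt = LocalDateTime.ofEpochSecond(
                seconds, (int) (microOfSecond * 1_000L), ZoneOffset.UTC);
        // Fixing the offset to UTC mirrors the post-HIVE-21862 literal parsing.
        return Timestamp.from(ldt.toInstant(ZoneOffset.UTC));
    }

    public static void main(String[] args) {
        // 2012-10-02T05:16:17.123456 UTC, like the test literal above.
        long micros = Instant.parse("2012-10-02T05:16:17.123456Z")
                .getEpochSecond() * 1_000_000L + 123_456L;
        System.out.println(fromMicros(micros)); // Timestamp.toString renders in the local zone
    }
}
{code}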





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 564553)
Time Spent: 2h  (was: 1h 50m)

> Create iceberg-handler module in Hive
> -
>
> Key: HIVE-24867
> URL: https://issues.apache.org/jira/browse/HIVE-24867
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> * Create a new iceberg-handler module in Hive
>  * Copy the code from the Iceberg/iceberg-mr module into this new Hive module
>  * Make necessary changes so it compiles with Hive 4.0.0 dependencies 
> (iceberg-mr code was based on Hive 3.1)
>  * Make sure all tests pass



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24871) Initiator / Cleaner performance metrics

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24871?focusedWorklogId=564546&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564546
 ]

ASF GitHub Bot logged work on HIVE-24871:
-

Author: ASF GitHub Bot
Created on: 11/Mar/21 11:59
Start Date: 11/Mar/21 11:59
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2061:
URL: https://github.com/apache/hive/pull/2061#discussion_r592301473



##
File path: 
ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestInitiator.java
##
@@ -44,11 +44,7 @@
 import org.junit.Assert;
 import org.junit.Test;
 
-import java.util.ArrayList;
-import java.util.Collections;
-import java.util.HashMap;
-import java.util.List;
-import java.util.Map;
+import java.util.*;

Review comment:
   fixed





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 564546)
Time Spent: 50m  (was: 40m)

> Initiator / Cleaner performance metrics
> ---
>
> Key: HIVE-24871
> URL: https://issues.apache.org/jira/browse/HIVE-24871
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The PerformanceLogger should be used in the Initiator and Cleaner services.
>  * One cycle of the Initiator should be measured, ignoring the time spent 
> waiting on the lock for the AUX table
>  * One compaction cleanup should be measured in the Cleaner (using different 
> metrics for major and minor compaction cleanup)
> Important note: the PerformanceLogger implementation from metastore should be 
> used (not the ql one), otherwise the metric won't be published in HMS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24867) Create iceberg-handler module in Hive

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24867?focusedWorklogId=564545&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564545
 ]

ASF GitHub Bot logged work on HIVE-24867:
-

Author: ASF GitHub Bot
Created on: 11/Mar/21 11:58
Start Date: 11/Mar/21 11:58
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2058:
URL: https://github.com/apache/hive/pull/2058#discussion_r592300835



##
File path: iceberg-handler/pom.xml
##
@@ -16,41 +16,46 @@
 
 
 ..
-0.11.0
 4.0.2
-1.9.2
+1.9.2

Review comment:
   Maybe it is just a slightly misleading name. We will use the 1.9.2 
avro for Iceberg as well.
   Shall we just rename all the iceberg specific versions to:
   - iceberg.kryo.version
   - iceberg.avro.version
   ?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 564545)
Time Spent: 1h 50m  (was: 1h 40m)

> Create iceberg-handler module in Hive
> -
>
> Key: HIVE-24867
> URL: https://issues.apache.org/jira/browse/HIVE-24867
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> * Create a new iceberg-handler module in Hive
>  * Copy the code from the Iceberg/iceberg-mr module into this new Hive module
>  * Make necessary changes so it compiles with Hive 4.0.0 dependencies 
> (iceberg-mr code was based on Hive 3.1)
>  * Make sure all tests pass



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24871) Initiator / Cleaner performance metrics

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24871?focusedWorklogId=564544&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564544
 ]

ASF GitHub Bot logged work on HIVE-24871:
-

Author: ASF GitHub Bot
Created on: 11/Mar/21 11:56
Start Date: 11/Mar/21 11:56
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2061:
URL: https://github.com/apache/hive/pull/2061#discussion_r592299917



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java
##
@@ -171,9 +176,12 @@ public void run() {
   StringUtils.stringifyException(t));
 }
 finally {
-  if(handle != null) {
+  if (handle != null) {
 handle.releaseLocks();
   }
+  if (metricsEnabled) {

Review comment:
   Checked: it won't fail and won't report the metric in this case





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 564544)
Time Spent: 40m  (was: 0.5h)

> Initiator / Cleaner performance metrics
> ---
>
> Key: HIVE-24871
> URL: https://issues.apache.org/jira/browse/HIVE-24871
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The PerformanceLogger should be used in the Initiator and Cleaner services.
>  * One cycle of the Initiator should be measured, ignoring the time spent 
> waiting on the lock for the AUX table
>  * One compaction cleanup should be measured in the Cleaner (using different 
> metrics for major and minor compaction cleanup)
> Important note: the PerformanceLogger implementation from metastore should be 
> used (not the ql one), otherwise the metric won't be published in HMS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24867) Create iceberg-handler module in Hive

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24867?focusedWorklogId=564542&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564542
 ]

ASF GitHub Bot logged work on HIVE-24867:
-

Author: ASF GitHub Bot
Created on: 11/Mar/21 11:53
Start Date: 11/Mar/21 11:53
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2058:
URL: https://github.com/apache/hive/pull/2058#discussion_r592297779



##
File path: 
iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergFilterFactory.java
##
@@ -190,6 +190,7 @@ private static int daysFromTimestamp(Timestamp timestamp) {
   // We have to use the LocalDateTime to get the micros. See the comment above.
   private static long microsFromTimestamp(Timestamp timestamp) {
 // 
`org.apache.hadoop.hive.common.type.Timestamp.valueOf(lit.toString()).toSqlTimestamp()`
+// since HIVE-21862 changes literal parsing to UTC based timestamps

Review comment:
   The comment in this form does not make sense to me 😢 
   Either:
   ```
   // HIVE-21862 changes literal parsing to UTC based timestamps to this:
   // 
`org.apache.hadoop.hive.common.type.Timestamp.valueOf(lit.toString()).toSqlTimestamp()`
   ```
   Or just remove the comment?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 564542)
Time Spent: 1h 40m  (was: 1.5h)

> Create iceberg-handler module in Hive
> -
>
> Key: HIVE-24867
> URL: https://issues.apache.org/jira/browse/HIVE-24867
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> * Create a new iceberg-handler module in Hive
>  * Copy the code from the Iceberg/iceberg-mr module into this new Hive module
>  * Make necessary changes so it compiles with Hive 4.0.0 dependencies 
> (iceberg-mr code was based on Hive 3.1)
>  * Make sure all tests pass



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24871) Initiator / Cleaner performance metrics

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24871?focusedWorklogId=564537&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564537
 ]

ASF GitHub Bot logged work on HIVE-24871:
-

Author: ASF GitHub Bot
Created on: 11/Mar/21 11:43
Start Date: 11/Mar/21 11:43
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #2061:
URL: https://github.com/apache/hive/pull/2061#discussion_r592291682



##
File path: 
ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestInitiator.java
##
@@ -44,11 +44,7 @@
 import org.junit.Assert;
 import org.junit.Test;
 
-import java.util.ArrayList;
-import java.util.Collections;
-import java.util.HashMap;
-import java.util.List;
-import java.util.Map;
+import java.util.*;

Review comment:
   revert this please





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 564537)
Time Spent: 0.5h  (was: 20m)

> Initiator / Cleaner performance metrics
> ---
>
> Key: HIVE-24871
> URL: https://issues.apache.org/jira/browse/HIVE-24871
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The PerformanceLogger should be used in the Initiator and Cleaner services.
>  * One cycle of the Initiator should be measured, ignoring the time spent 
> waiting on the lock for the AUX table
>  * One compaction cleanup should be measured in the Cleaner (using different 
> metrics for major and minor compaction cleanup)
> Important note: the PerformanceLogger implementation from metastore should be 
> used (not the ql one), otherwise the metric won't be published in HMS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24871) Initiator / Cleaner performance metrics

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24871?focusedWorklogId=564535&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564535
 ]

ASF GitHub Bot logged work on HIVE-24871:
-

Author: ASF GitHub Bot
Created on: 11/Mar/21 11:41
Start Date: 11/Mar/21 11:41
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #2061:
URL: https://github.com/apache/hive/pull/2061#discussion_r592290451



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java
##
@@ -171,9 +176,12 @@ public void run() {
   StringUtils.stringifyException(t));
 }
 finally {
-  if(handle != null) {
+  if (handle != null) {
 handle.releaseLocks();
   }
+  if (metricsEnabled) {

Review comment:
   Won't this fail if the acquireLock times out and the perflogger was not 
started? The PerfLogger in ql has a method for checking this, 
startTimeHasMethod; maybe it is worth copying that and using it here
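
A minimal sketch of the suggested guard (the PerfLogger below is a toy stand-in written for this example, not the Hive class; only the startTimeHasMethod idea mirrors the ql helper named above):

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PerfLoggerGuardSketch {

    // Toy logger: records begin timestamps per method name.
    static class PerfLogger {
        private final Map<String, Long> startTimes = new ConcurrentHashMap<>();

        void perfLogBegin(String method) {
            startTimes.put(method, System.nanoTime());
        }

        boolean startTimeHasMethod(String method) {
            return startTimes.containsKey(method);
        }

        void perfLogEnd(String method) {
            long elapsedMs = (System.nanoTime() - startTimes.remove(method)) / 1_000_000L;
            System.out.println(method + " took " + elapsedMs + " ms");
        }
    }

    public static void main(String[] args) {
        PerfLogger perf = new PerfLogger();
        boolean metricsEnabled = true;
        try {
            perf.perfLogBegin("initiator_cycle");
            // ... one Initiator cycle; other paths may throw before the begin call ...
        } finally {
            // Guard: if the lock acquisition timed out before perfLogBegin ran,
            // skip the end call instead of failing or reporting a bogus metric.
            if (metricsEnabled && perf.startTimeHasMethod("initiator_cycle")) {
                perf.perfLogEnd("initiator_cycle");
            }
        }
    }
}
{code}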





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 564535)
Time Spent: 20m  (was: 10m)

> Initiator / Cleaner performance metrics
> ---
>
> Key: HIVE-24871
> URL: https://issues.apache.org/jira/browse/HIVE-24871
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The PerformanceLogger should be used in the Initiator and Cleaner services.
>  * One cycle of the Initiator should be measured, ignoring the time spent 
> waiting on the lock for the AUX table
>  * One compaction cleanup should be measured in the Cleaner (using different 
> metrics for major and minor compaction cleanup)
> Important note: the PerformanceLogger implementation from metastore should be 
> used (not the ql one), otherwise the metric won't be published in HMS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24867) Create iceberg-handler module in Hive

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24867?focusedWorklogId=564534&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564534
 ]

ASF GitHub Bot logged work on HIVE-24867:
-

Author: ASF GitHub Bot
Created on: 11/Mar/21 11:40
Start Date: 11/Mar/21 11:40
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2058:
URL: https://github.com/apache/hive/pull/2058#discussion_r592290164



##
File path: iceberg-handler/pom.xml
##
@@ -0,0 +1,189 @@
+
+http://maven.apache.org/POM/4.0.0";
+ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance";
+ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd";>
+
+org.apache.hive
+hive
+4.0.0-SNAPSHOT
+../pom.xml
+
+4.0.0
+
+iceberg-handler
+jar
+Hive Iceberg Handler
+
+
+..
+0.11.0
+4.0.2
+1.9.2
+
+
+
+
+org.apache.iceberg
+iceberg-api
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-core
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-hive-metastore
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-data
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-parquet
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-orc
+${iceberg-api.version}
+
+
+
+org.apache.hadoop
+hadoop-client
+${hadoop.version}
+
+
+org.apache.avro
+avro
+
+
+
+
+
+org.apache.hive
+hive-exec
+${project.version}
+
+

Review comment:
   I would try to focus on keeping the same source files as in the Iceberg 
repo, so we can easily port changes between the two, but otherwise I would not 
try to stick to the same things just because they were the same there.
   
   OTOH, if we know the reason why they were removed and it applies here too, 
then we should do the same.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 564534)
Time Spent: 1.5h  (was: 1h 20m)

> Create iceberg-handler module in Hive
> -
>
> Key: HIVE-24867
> URL: https://issues.apache.org/jira/browse/HIVE-24867
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> * Create a new iceberg-handler module in Hive
>  * Copy the code from the Iceberg/iceberg-mr module into this new Hive module
>  * Make necessary changes so it compiles with Hive 4.0.0 dependencies 
> (iceberg-mr code was based on Hive 3.1)
>  * Make sure all tests pass



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24857) Trigger Tez output commit after close operation

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24857?focusedWorklogId=564531&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564531
 ]

ASF GitHub Bot logged work on HIVE-24857:
-

Author: ASF GitHub Bot
Created on: 11/Mar/21 11:36
Start Date: 11/Mar/21 11:36
Worklog Time Spent: 10m 
  Work Description: pvary merged pull request #2048:
URL: https://github.com/apache/hive/pull/2048


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 564531)
Time Spent: 0.5h  (was: 20m)

> Trigger Tez output commit after close operation
> ---
>
> Key: HIVE-24857
> URL: https://issues.apache.org/jira/browse/HIVE-24857
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently Tez triggers the OutputCommitter.commit() operation between the 
> proc.run() and proc.close() operations in TezProcessor. However, when writing 
> out data, calling the proc.close() operation may still produce some extra 
> records, which would be missed by the output committer.
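
A toy Java sketch of the ordering issue (the Processor interface and buffering below are invented for illustration and are much simpler than Tez's real APIs): committing between run() and close() misses whatever close() still flushes, so the commit has to move after close().

{code}
import java.util.ArrayList;
import java.util.List;

public class CommitAfterCloseSketch {

    interface Processor {
        void run(List<String> out);
        void close(List<String> out); // may still emit buffered records
    }

    static void commit(List<String> out) {
        System.out.println("committed " + out.size() + " records");
    }

    public static void main(String[] args) {
        Processor proc = new Processor() {
            private final List<String> buffer = new ArrayList<>();
            @Override public void run(List<String> out) { buffer.add("r1"); }
            @Override public void close(List<String> out) { out.addAll(buffer); out.add("flush-marker"); }
        };

        List<String> output = new ArrayList<>();
        proc.run(output);   // output is still empty: records sit in the buffer
        proc.close(output); // close() flushes the remaining records
        commit(output);     // committing only now sees everything close() produced
    }
}
{code}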



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24857) Trigger Tez output commit after close operation

2021-03-11 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-24857.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.

Thanks for the patch [~Marton Bod]!

> Trigger Tez output commit after close operation
> ---
>
> Key: HIVE-24857
> URL: https://issues.apache.org/jira/browse/HIVE-24857
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently Tez triggers the OutputCommitter.commit() operation between the 
> proc.run() and proc.close() operations in TezProcessor. However, when writing 
> out data, calling the proc.close() operation may still produce some extra 
> records, which would be missed by the output committer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24871) Initiator / Cleaner performance metrics

2021-03-11 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-24871:
--
Summary: Initiator / Cleaner performance metrics  (was: Initiator / Cleaner 
performance should be measured with PerformanceLogger)

> Initiator / Cleaner performance metrics
> ---
>
> Key: HIVE-24871
> URL: https://issues.apache.org/jira/browse/HIVE-24871
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The PerformanceLogger should be used in the Initiator and Cleaner services.
>  * One cycle of the Initiator should be measured, ignoring the time spent 
> waiting on the lock for the AUX table
>  * One compaction cleanup should be measured in the Cleaner (using different 
> metrics for major and minor compaction cleanup)
> Important note: the PerformanceLogger implementation from metastore should be 
> used (not the ql one), otherwise the metric won't be published in HMS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24871) Initiator / Cleaner performance should be measured with PerformanceLogger

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24871?focusedWorklogId=564495&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564495
 ]

ASF GitHub Bot logged work on HIVE-24871:
-

Author: ASF GitHub Bot
Created on: 11/Mar/21 10:11
Start Date: 11/Mar/21 10:11
Worklog Time Spent: 10m 
  Work Description: deniskuzZ opened a new pull request #2061:
URL: https://github.com/apache/hive/pull/2061


   …erformanceLogger
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 564495)
Remaining Estimate: 0h
Time Spent: 10m

> Initiator / Cleaner performance should be measured with PerformanceLogger
> -
>
> Key: HIVE-24871
> URL: https://issues.apache.org/jira/browse/HIVE-24871
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The PerformanceLogger should be used in the Initiator and Cleaner services.
>  * One cycle of the Initiator should be measured, ignoring the time spent 
> waiting on the lock for the AUX table
>  * One compaction cleanup should be measured in the Cleaner (using different 
> metrics for major and minor compaction cleanup)
> Important note: the PerformanceLogger implementation from metastore should be 
> used (not the ql one), otherwise the metric won't be published in HMS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24871) Initiator / Cleaner performance should be measured with PerformanceLogger

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24871:
--
Labels: pull-request-available  (was: )

> Initiator / Cleaner performance should be measured with PerformanceLogger
> -
>
> Key: HIVE-24871
> URL: https://issues.apache.org/jira/browse/HIVE-24871
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The PerformanceLogger should be used in the Initiator and Cleaner services.
>  * One cycle of the Initiator should be measured, ignoring the time spent 
> waiting on the lock for the AUX table
>  * One compaction cleanup should be measured in the Cleaner (using different 
> metrics for major and minor compaction cleanup)
> Important note: the PerformanceLogger implementation from metastore should be 
> used (not the ql one), otherwise the metric won't be published in HMS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24862) Fix race condition causing NPE during dynamic partition loading

2021-03-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ádám Szita resolved HIVE-24862.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

> Fix race condition causing NPE during dynamic partition loading
> ---
>
> Key: HIVE-24862
> URL: https://issues.apache.org/jira/browse/HIVE-24862
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Zoltan Chovan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The following properties default to 15 threads.
> {noformat}
> hive.load.dynamic.partitions.thread
> hive.mv.files.thread  
> {noformat}
> During loadDynamicPartitions, it ends up initializing {{newFiles}} without 
> synchronization (HIVE-20661, HIVE-24738). 
>  
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2871]
> This causes a race condition when the dynamic partition thread internally makes 
> use of {{hive.mv.files.threads}} in copyFiles/replaceFiles. 
>  This causes an NPE during retrieval in {{addInsertFileInformation()}}.
>  
> e.g. stacktrace
> {noformat}
> Caused by: java.lang.NullPointerException
>   at org.apache.hadoop.fs.FileSystem.fixRelativePart(FileSystem.java:2734)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.fixRelativePart(DistributedFileSystem.java:3396)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1740)
>   at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1740)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.addInsertFileInformation(Hive.java:3566)
>   at org.apache.hadoop.hive.ql.metadata.Hive.fireInsertEvent(Hive.java:3540)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:2414)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.lambda$loadDynamicPartitions$4(Hive.java:2909)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
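
A minimal standalone reproduction of the underlying pattern (thread and file counts are made up; this is not the Hive code path itself): several loader threads append to an ArrayList created without synchronization, which can lose writes, leave null slots that later surface as an NPE, or even throw during an internal resize, while a synchronized wrapper behaves correctly.

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class NewFilesRaceSketch {
    public static void main(String[] args) throws InterruptedException {
        // Unsafe: plain ArrayList shared between partition-load threads.
        List<String> newFiles = new ArrayList<>();
        // Safe: the HIVE-24862-style fix is to synchronize the shared list.
        List<String> safeFiles = Collections.synchronizedList(new ArrayList<>());

        ExecutorService pool = Executors.newFixedThreadPool(15);
        for (int i = 0; i < 15; i++) {
            final int part = i;
            pool.submit(() -> {
                for (int f = 0; f < 1_000; f++) {
                    try {
                        newFiles.add("part-" + part + "/file-" + f); // racy
                    } catch (RuntimeException e) {
                        // the race can even throw ArrayIndexOutOfBoundsException mid-resize
                    }
                    safeFiles.add("part-" + part + "/file-" + f); // safe
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);

        System.out.println("unsafe size: " + newFiles.size() + " (often != 15000, may contain nulls)");
        System.out.println("safe size:   " + safeFiles.size() + " (always 15000)");
    }
}
{code}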



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24862) Fix race condition causing NPE during dynamic partition loading

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24862?focusedWorklogId=564490&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564490
 ]

ASF GitHub Bot logged work on HIVE-24862:
-

Author: ASF GitHub Bot
Created on: 11/Mar/21 09:54
Start Date: 11/Mar/21 09:54
Worklog Time Spent: 10m 
  Work Description: szlta merged pull request #2053:
URL: https://github.com/apache/hive/pull/2053


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 564490)
Time Spent: 40m  (was: 0.5h)

> Fix race condition causing NPE during dynamic partition loading
> ---
>
> Key: HIVE-24862
> URL: https://issues.apache.org/jira/browse/HIVE-24862
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Zoltan Chovan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The following properties default to 15 threads.
> {noformat}
> hive.load.dynamic.partitions.thread
> hive.mv.files.thread  
> {noformat}
> During loadDynamicPartitions, it ends up initializing {{newFiles}} without 
> synchronization (HIVE-20661, HIVE-24738). 
>  
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2871]
> This causes a race condition when the dynamic partition thread internally makes 
> use of {{hive.mv.files.threads}} in copyFiles/replaceFiles. 
>  This causes an NPE during retrieval in {{addInsertFileInformation()}}.
>  
> e.g. stacktrace
> {noformat}
> Caused by: java.lang.NullPointerException
>   at org.apache.hadoop.fs.FileSystem.fixRelativePart(FileSystem.java:2734)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.fixRelativePart(DistributedFileSystem.java:3396)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1740)
>   at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1740)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.addInsertFileInformation(Hive.java:3566)
>   at org.apache.hadoop.hive.ql.metadata.Hive.fireInsertEvent(Hive.java:3540)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:2414)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.lambda$loadDynamicPartitions$4(Hive.java:2909)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24862) Fix race condition causing NPE during dynamic partition loading

2021-03-11 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-24862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17299453#comment-17299453
 ] 

Ádám Szita commented on HIVE-24862:
---

Committed to master, thanks [~zchovan]!

> Fix race condition causing NPE during dynamic partition loading
> ---
>
> Key: HIVE-24862
> URL: https://issues.apache.org/jira/browse/HIVE-24862
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Zoltan Chovan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The following properties default to 15 threads.
> {noformat}
> hive.load.dynamic.partitions.thread
> hive.mv.files.thread  
> {noformat}
> During loadDynamicPartitions, it ends up initializing {{newFiles}} without 
> synchronization (HIVE-20661, HIVE-24738). 
>  
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2871]
> This causes a race condition when the dynamic partition thread internally makes 
> use of {{hive.mv.files.threads}} in copyFiles/replaceFiles. 
>  This causes an NPE during retrieval in {{addInsertFileInformation()}}.
>  
> e.g. stacktrace
> {noformat}
> Caused by: java.lang.NullPointerException
>   at org.apache.hadoop.fs.FileSystem.fixRelativePart(FileSystem.java:2734)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.fixRelativePart(DistributedFileSystem.java:3396)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1740)
>   at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1740)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.addInsertFileInformation(Hive.java:3566)
>   at org.apache.hadoop.hive.ql.metadata.Hive.fireInsertEvent(Hive.java:3540)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:2414)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.lambda$loadDynamicPartitions$4(Hive.java:2909)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24812) Disable sharedworkoptimizer remove semijoin by default

2021-03-11 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24812.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Merged into master. Thank you Krisztian for reviewing the changes!

> Disable sharedworkoptimizer remove semijoin by default
> --
>
> Key: HIVE-24812
> URL: https://issues.apache.org/jira/browse/HIVE-24812
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> SJ removal backfired a bit when I was testing stuff - because of the 
> additional opportunities parallel edges may enable; it can increase 
> the shuffled memory amount and/or even make MJ broadcast inputs larger.
> Set hive.optimize.shared.work.semijoin=false by default for now.
> Right now it's better to leave dppunion to pick up these cases instead of 
> removing the SJ fully - after HIVE-24376 we might enable it back. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24812) Disable sharedworkoptimizer remove semijoin by default

2021-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24812?focusedWorklogId=564487&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564487
 ]

ASF GitHub Bot logged work on HIVE-24812:
-

Author: ASF GitHub Bot
Created on: 11/Mar/21 09:52
Start Date: 11/Mar/21 09:52
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #2006:
URL: https://github.com/apache/hive/pull/2006


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 564487)
Time Spent: 0.5h  (was: 20m)

> Disable sharedworkoptimizer remove semijoin by default
> --
>
> Key: HIVE-24812
> URL: https://issues.apache.org/jira/browse/HIVE-24812
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> SJ removal backfired a bit when I was testing stuff - because of the 
> additional opportunities parallel edges may enable; it can increase 
> the shuffled memory amount and/or even make MJ broadcast inputs larger.
> Set hive.optimize.shared.work.semijoin=false by default for now.
> Right now it's better to leave dppunion to pick up these cases instead of 
> removing the SJ fully - after HIVE-24376 we might enable it back. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24871) Initiator / Cleaner performance should be measured with PerformanceLogger

2021-03-11 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-24871:
--
Description: 
The PerformanceLogger should be used in the Initiator and Cleaner services.
 * One cycle of the Initiator should be measured, ignoring the time spent 
waiting on the lock for the AUX table
 * One compaction cleanup should be measured in the Cleaner (using different 
metrics for major and minor compaction cleanup)

Important note: the PerformanceLogger implementation from metastore should be 
used (not the ql one), otherwise the metric won't be published in HMS.

> Initiator / Cleaner performance should be measured with PerformanceLogger
> -
>
> Key: HIVE-24871
> URL: https://issues.apache.org/jira/browse/HIVE-24871
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Priority: Major
>
> The PerformanceLogger should be used in the Initiator and Cleaner services.
>  * One cycle of the Initiator should be measured, ignoring the time spent 
> waiting on the lock for the AUX table
>  * One compaction cleanup should be measured in the Cleaner (using different 
> metrics for major and minor compaction cleanup)
> Important note: the PerformanceLogger implementation from metastore should be 
> used (not the ql one), otherwise the metric won't be published in HMS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24870) Metastore: cleanup unused column descriptors asynchronously in batches

2021-03-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-24870:

Description: 
HIVE-2246 introduces CD_ID for optimizing metastore db (details there). 
ObjectStore.removeUnusedColumnDescriptor is a maintenance task that is called 
on every alter-partition kind of operation. During a replication, 
alterPartition can be a heavy path, and there is no direct advantage to running 
removeUnusedColumnDescriptor immediately. Moreover, there is a 
{code}
select count(*) from "SDS" where "CD_ID"=12345;
{code}
kind of query in it, which can take a relatively long time compared to alter 
partition. 

https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L4982
{code}
  query = pm.newQuery("select count(1) from " +
"org.apache.hadoop.hive.metastore.model.MStorageDescriptor where 
(this.cd == inCD)");
  query.declareParameters("MColumnDescriptor inCD");
  long count = ((Long)query.execute(oldCD)).longValue();

  //if no other SD references this CD, we can throw it out.
  if (count == 0) {
{code}

My proposal is to run this in a batched way, in every configurable amount of 
seconds/minutes/whatever.

  was:
HIVE-2246 introduces CD_ID for optimizing metastore db (details there). 
ObjectStore.removeUnusedColumnDescriptor is a maintenance task that is called 
on every alter-partition kind of operation. During a replication, 
alterPartition can be a heavy path, and there is no direct advantage to running 
removeUnusedColumnDescriptor immediately. Moreover, there is a 
{code}
select count(*) from "SDS" where "CD_ID"=12345;
{code}
kind of query in it, which can take a relatively long time compared to alter 
partition. 

{code}
  query = pm.newQuery("select count(1) from " +
"org.apache.hadoop.hive.metastore.model.MStorageDescriptor where 
(this.cd == inCD)");
  query.declareParameters("MColumnDescriptor inCD");
{code}

My proposal is to run this in a batched way, in every configurable amount of 
seconds/minutes/whatever.


> Metastore: cleanup unused column descriptors asynchronously in batches
> --
>
> Key: HIVE-24870
> URL: https://issues.apache.org/jira/browse/HIVE-24870
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> HIVE-2246 introduces CD_ID for optimizing metastore db (details there). 
> ObjectStore.removeUnusedColumnDescriptor is a maintenance task that is called 
> on every alter-partition kind of operation. During a replication, 
> alterPartition can be a heavy path, and there is no direct advantage to running 
> removeUnusedColumnDescriptor immediately. Moreover, there is a 
> {code}
> select count(*) from "SDS" where "CD_ID"=12345;
> {code}
> kind of query in it, which can take a relatively long time compared to alter 
> partition. 
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L4982
> {code}
>   query = pm.newQuery("select count(1) from " +
> "org.apache.hadoop.hive.metastore.model.MStorageDescriptor where 
> (this.cd == inCD)");
>   query.declareParameters("MColumnDescriptor inCD");
>   long count = ((Long)query.execute(oldCD)).longValue();
>   //if no other SD references this CD, we can throw it out.
>   if (count == 0) {
> {code}
> My proposal is to run this in a batched way, in every configurable amount of 
> seconds/minutes/whatever.
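
A hedged sketch of what the batched variant could look like (all names, the queue choice and the interval handling are invented for illustration, not the eventual Hive design): the alter-partition path only enqueues the candidate CD_ID, and a single background task drains the queue periodically and probes the whole batch at once.

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class BatchedCdCleanupSketch {
    private final ConcurrentLinkedQueue<Long> candidateCdIds = new ConcurrentLinkedQueue<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // Called from the alter-partition path: O(1), no metastore query.
    void markPossiblyUnused(long cdId) {
        candidateCdIds.add(cdId);
    }

    void start(long intervalSeconds) {
        scheduler.scheduleWithFixedDelay(this::drainBatch,
                intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
    }

    private void drainBatch() {
        List<Long> batch = new ArrayList<>();
        for (Long id; (id = candidateCdIds.poll()) != null; ) {
            batch.add(id);
        }
        if (!batch.isEmpty()) {
            // One round-trip for the whole batch instead of one count(*) probe
            // per alter-partition, e.g. an IN (...) query against "SDS".
            System.out.println("checking " + batch.size() + " CD_IDs for reuse");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BatchedCdCleanupSketch cleaner = new BatchedCdCleanupSketch();
        cleaner.start(1);
        for (long cd = 1; cd <= 5; cd++) {
            cleaner.markPossiblyUnused(cd);
        }
        Thread.sleep(2_000); // let one drain cycle run
        cleaner.scheduler.shutdown();
    }
}
{code}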



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24870) Metastore: cleanup unused column descriptors asynchronously in batches

2021-03-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-24870:

Summary: Metastore: cleanup unused column descriptors asynchronously in 
batches  (was: Metastore: cleanup unused column descriptors asynchronously)

> Metastore: cleanup unused column descriptors asynchronously in batches
> --
>
> Key: HIVE-24870
> URL: https://issues.apache.org/jira/browse/HIVE-24870
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> HIVE-2246 introduces CD_ID for optimizing metastore db (details there). 
> ObjectStore.removeUnusedColumnDescriptor is a maintenance task that is called 
> in every alter partition kind of operation. During a replication, 
> alterPartition could be a heavy path, and has no direct advantage of running 
> removeUnusedColumnDescriptor immediately. Moreover, there is a 
> {code}
> select count(*) from "SDS" where "CD_ID"=12345;
> {code}
> kind of query in it, which can take a relatively long time compared to alter 
> partition. 
> {code}
>   query = pm.newQuery("select count(1) from " +
> "org.apache.hadoop.hive.metastore.model.MStorageDescriptor where 
> (this.cd == inCD)");
>   query.declareParameters("MColumnDescriptor inCD");
> {code}
> My proposal is to run this in a batched way, in every configurable amount of 
> seconds/minutes/whatever.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24870) Metastore: cleanup unused column descriptors asynchronously

2021-03-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-24870:

Description: 
HIVE-2246 introduces CD_ID for optimizing metastore db (details there). 
ObjectStore.removeUnusedColumnDescriptor is a maintenance task that is called 
in every alter partition kind of operation. During a replication, 
alterPartition could be a heavy path, and has no direct advantage of running 
removeUnusedColumnDescriptor immediately. Moreover, there is a 
{code}
select count(*) from "SDS" where "CD_ID"=12345;
{code}
kind of query in it, which can take a relatively long time compared to alter 
partition. 

{code}
  query = pm.newQuery("select count(1) from " +
"org.apache.hadoop.hive.metastore.model.MStorageDescriptor where 
(this.cd == inCD)");
  query.declareParameters("MColumnDescriptor inCD");
{code}

My proposal is to run this in a batched way, in every configurable amount of 
seconds/minutes/whatever.

  was:
HIVE-2246 introduces CD_ID for optimizing metastore db (details there). 
ObjectStore.removeUnusedColumnDescriptor is a maintenance task that is called 
in every alter partition kind of opeartion. During a replication, 
alterPartition could be a heavy path, and has no direct advantage of running 
removeUnusedColumnDescriptor immediately. Moreover, there is a 
{code}
select count(*) from "SDS" where "CD_ID"=12345;
{code}
kind of query in it, which can take a relatively long time compared to alter 
partition. 

{code}
  query = pm.newQuery("select count(1) from " +
"org.apache.hadoop.hive.metastore.model.MStorageDescriptor where 
(this.cd == inCD)");
  query.declareParameters("MColumnDescriptor inCD");
{code}

My proposal is to run this in a batched way, in every configurable amount of 
seconds/minutes/whatever.


> Metastore: cleanup unused column descriptors asynchronously
> ---
>
> Key: HIVE-24870
> URL: https://issues.apache.org/jira/browse/HIVE-24870
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> HIVE-2246 introduces CD_ID for optimizing metastore db (details there). 
> ObjectStore.removeUnusedColumnDescriptor is a maintenance task that is called 
> in every alter partition kind of operation. During a replication, 
> alterPartition could be a heavy path, and has no direct advantage of running 
> removeUnusedColumnDescriptor immediately. Moreover, there is a 
> {code}
> select count(*) from "SDS" where "CD_ID"=12345;
> {code}
> kind of query in it, which can take a relatively long time compared to alter 
> partition. 
> {code}
>   query = pm.newQuery("select count(1) from " +
> "org.apache.hadoop.hive.metastore.model.MStorageDescriptor where 
> (this.cd == inCD)");
>   query.declareParameters("MColumnDescriptor inCD");
> {code}
> My proposal is to run this in a batched way, in every configurable amount of 
> seconds/minutes/whatever.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24870) Metastore: cleanup unused column descriptors asynchronously

2021-03-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-24870:

Description: 
HIVE-2246 introduces CD_ID for optimizing metastore db (details there). 
ObjectStore.removeUnusedColumnDescriptor is a maintenance task that is called 
in every alter partition kind of opeartion. During a replication, 
alterPartition could be a heavy path, and has no direct advantage of running 
removeUnusedColumnDescriptor immediately. Moreover, there is a 
{code}
select count(*) from "SDS" where "CD_ID"=12345;
{code}
kind of query in it, which can take a relatively long time compared to alter 
partition. 

{code}
  query = pm.newQuery("select count(1) from " +
"org.apache.hadoop.hive.metastore.model.MStorageDescriptor where 
(this.cd == inCD)");
  query.declareParameters("MColumnDescriptor inCD");
{code}

My proposal is to run this in a batched way, in every configurable amount of 
seconds/minutes/whatever.

  was:
HIVE-2246 introduces CD_ID for optimizing metastore db (details there). 
ObjectStore.removeUnusedColumnDescriptor is a maintenance task that is called 
in every alter partition kind of opeartion. During a replication, 
alterPartition could be a heavy path, and has no direct advantage of running 
removeUnusedColumnDescriptor immediately. Moreover, there is a 
{code}
select count(*) from "SDS" where "CD_ID"=12345;
{code}
kind of query in it, which can take a relatively long time compared to alter 
partition. 

{code}
  query = pm.newQuery("select count(1) from " +
"org.apache.hadoop.hive.metastore.model.MStorageDescriptor where 
(this.cd == inCD)");
  query.declareParameters("MColumnDescriptor inCD");
{code}


> Metastore: cleanup unused column descriptors asynchronously
> ---
>
> Key: HIVE-24870
> URL: https://issues.apache.org/jira/browse/HIVE-24870
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> HIVE-2246 introduces CD_ID for optimizing metastore db (details there). 
> ObjectStore.removeUnusedColumnDescriptor is a maintenance task that is called 
> in every alter partition kind of opeartion. During a replication, 
> alterPartition could be a heavy path, and has no direct advantage of running 
> removeUnusedColumnDescriptor immediately. Moreover, there is a 
> {code}
> select count(*) from "SDS" where "CD_ID"=12345;
> {code}
> kind of query in it, which can take a relatively long time compared to alter 
> partition. 
> {code}
>   query = pm.newQuery("select count(1) from " +
> "org.apache.hadoop.hive.metastore.model.MStorageDescriptor where 
> (this.cd == inCD)");
>   query.declareParameters("MColumnDescriptor inCD");
> {code}
> My proposal is to run this in a batched way, in every configurable amount of 
> seconds/minutes/whatever.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24870) Metastore: cleanup unused column descriptors asynchronously

2021-03-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-24870:

Description: 
HIVE-2246 introduces CD_ID for optimizing metastore db (details there). 
ObjectStore.removeUnusedColumnDescriptor is a maintenance task that is called 
in every alter partition kind of opeartion. During a replication, 
alterPartition could be a heavy path, and has no direct advantage of running 
removeUnusedColumnDescriptor immediately. Moreover, there is a 
{code}
select count(*) from "SDS" where "CD_ID"=12345;
{code}
kind of query in it, which can take a relatively long time compared to alter 
partition. 

{code}
  query = pm.newQuery("select count(1) from " +
"org.apache.hadoop.hive.metastore.model.MStorageDescriptor where 
(this.cd == inCD)");
  query.declareParameters("MColumnDescriptor inCD");
{code}

  was:
HIVE-2246 introduces CD_ID for optimizing metastore db (details there). 
ObjectStore.removeUnusedColumnDescriptor is a maintenance task that is called 
in every alter partition kind of opeartion. During a replication, 
alterPartition could be a heavy path, and has no direct advantage of running 
removeUnusedColumnDescriptor immediately.

{code}
  query = pm.newQuery("select count(1) from " +
"org.apache.hadoop.hive.metastore.model.MStorageDescriptor where 
(this.cd == inCD)");
  query.declareParameters("MColumnDescriptor inCD");
{code}


> Metastore: cleanup unused column descriptors asynchronously
> ---
>
> Key: HIVE-24870
> URL: https://issues.apache.org/jira/browse/HIVE-24870
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> HIVE-2246 introduces CD_ID for optimizing metastore db (details there). 
> ObjectStore.removeUnusedColumnDescriptor is a maintenance task that is called 
> in every alter partition kind of opeartion. During a replication, 
> alterPartition could be a heavy path, and has no direct advantage of running 
> removeUnusedColumnDescriptor immediately. Moreover, there is a 
> {code}
> select count(*) from "SDS" where "CD_ID"=12345;
> {code}
> kind of query in it, which can take a relatively long time compared to alter 
> partition. 
> {code}
>   query = pm.newQuery("select count(1) from " +
> "org.apache.hadoop.hive.metastore.model.MStorageDescriptor where 
> (this.cd == inCD)");
>   query.declareParameters("MColumnDescriptor inCD");
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24870) Metastore: cleanup unused column descriptors asynchronously

2021-03-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-24870:

Description: 
HIVE-2246 introduces CD_ID for optimizing metastore db (details there). 
ObjectStore.removeUnusedColumnDescriptor is a maintenance task that is called 
in every alter partition kind of opeartion. During a replication, 
alterPartition could be a heavy path, and has no direct advantage of running 
removeUnusedColumnDescriptor immediately.

{code}
  query = pm.newQuery("select count(1) from " +
"org.apache.hadoop.hive.metastore.model.MStorageDescriptor where 
(this.cd == inCD)");
  query.declareParameters("MColumnDescriptor inCD");
{code}

> Metastore: cleanup unused column descriptors asynchronously
> ---
>
> Key: HIVE-24870
> URL: https://issues.apache.org/jira/browse/HIVE-24870
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Priority: Major
>
> HIVE-2246 introduces CD_ID for optimizing metastore db (details there). 
> ObjectStore.removeUnusedColumnDescriptor is a maintenance task that is called 
> in every alter partition kind of opeartion. During a replication, 
> alterPartition could be a heavy path, and has no direct advantage of running 
> removeUnusedColumnDescriptor immediately.
> {code}
>   query = pm.newQuery("select count(1) from " +
> "org.apache.hadoop.hive.metastore.model.MStorageDescriptor where 
> (this.cd == inCD)");
>   query.declareParameters("MColumnDescriptor inCD");
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24870) Metastore: cleanup unused column descriptors asynchronously

2021-03-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor reassigned HIVE-24870:
---

Assignee: László Bodor

> Metastore: cleanup unused column descriptors asynchronously
> ---
>
> Key: HIVE-24870
> URL: https://issues.apache.org/jira/browse/HIVE-24870
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> HIVE-2246 introduces CD_ID for optimizing metastore db (details there). 
> ObjectStore.removeUnusedColumnDescriptor is a maintenance task that is called 
> in every alter partition kind of opeartion. During a replication, 
> alterPartition could be a heavy path, and has no direct advantage of running 
> removeUnusedColumnDescriptor immediately.
> {code}
>   query = pm.newQuery("select count(1) from " +
> "org.apache.hadoop.hive.metastore.model.MStorageDescriptor where 
> (this.cd == inCD)");
>   query.declareParameters("MColumnDescriptor inCD");
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23820) [HS2] Send tableId in request for get_table_request API

2021-03-11 Thread Ashish Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17299422#comment-17299422
 ] 

Ashish Sharma commented on HIVE-23820:
--

[~kishendas] I am working on HIVE-23571, for which this ticket is a blocker. If 
you are not working on this ticket, can I pick it up so that I can implement the 
end-to-end writeId and tableId check from client to ObjectStore and RawStore?

> [HS2] Send tableId in request for get_table_request API
> ---
>
> Key: HIVE-23820
> URL: https://issues.apache.org/jira/browse/HIVE-23820
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)