[jira] [Updated] (HIVE-24818) REPL LOAD of views with partitions fails
[ https://issues.apache.org/jira/browse/HIVE-24818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anurag Shekhar updated HIVE-24818:
----------------------------------
    Summary: REPL LOAD of views with partitions fails  (was: REPL LOAD (Bootstrap) of views with partitions fails)

> REPL LOAD of views with partitions fails
> ----------------------------------------
>
>                 Key: HIVE-24818
>                 URL: https://issues.apache.org/jira/browse/HIVE-24818
>             Project: Hive
>          Issue Type: Bug
>          Components: repl
>            Reporter: Anurag Shekhar
>            Assignee: Anurag Shekhar
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-23779) BasicStatsTask Info is not getting printed in beeline console
[ https://issues.apache.org/jira/browse/HIVE-23779?focusedWorklogId=565114&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-565114 ]

ASF GitHub Bot logged work on HIVE-23779:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 12/Mar/21 05:08
            Start Date: 12/Mar/21 05:08
    Worklog Time Spent: 10m

Work Description: dengzhhu653 commented on pull request #2064:
URL: https://github.com/apache/hive/pull/2064#issuecomment-797237927

> @dengzhhu653, Basic stats of partitions touched by ETL will be printed on the beeline console; I don't think it will be huge.

I wonder whether, for cases like loading dynamic partitions, there may be something unpredictable...

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id:     (was: 565114)
    Time Spent: 1h 40m  (was: 1.5h)

> BasicStatsTask Info is not getting printed in beeline console
> -------------------------------------------------------------
>
>                 Key: HIVE-23779
>                 URL: https://issues.apache.org/jira/browse/HIVE-23779
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Naresh P R
>            Assignee: Naresh P R
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> After HIVE-16061, partition basic stats are not getting printed in the beeline console.
> {code:java}
> INFO : Partition {dt=2020-06-29} stats: [numFiles=21, numRows=22, totalSize=14607, rawDataSize=0]{code}
[jira] [Work logged] (HIVE-23779) BasicStatsTask Info is not getting printed in beeline console
[ https://issues.apache.org/jira/browse/HIVE-23779?focusedWorklogId=565095&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-565095 ]

ASF GitHub Bot logged work on HIVE-23779:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 12/Mar/21 03:39
            Start Date: 12/Mar/21 03:39
    Worklog Time Spent: 10m

Work Description: nareshpr commented on pull request #2064:
URL: https://github.com/apache/hive/pull/2064#issuecomment-797212918

@dengzhhu653, Basic stats of partitions touched by ETL will be printed on the beeline console; I don't think it will be huge.

Issue Time Tracking
-------------------
    Worklog Id:     (was: 565095)
    Time Spent: 1.5h  (was: 1h 20m)
[jira] [Commented] (HIVE-24372) can't find applicationId in InPlaceUpdateStream when tasks run in parallel
[ https://issues.apache.org/jira/browse/HIVE-24372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17300016#comment-17300016 ]

Zhihua Deng commented on HIVE-24372:
------------------------------------

Maybe https://issues.apache.org/jira/browse/HIVE-22416 and https://issues.apache.org/jira/browse/HIVE-21722 can fix this problem...

> can't find applicationId in InPlaceUpdateStream when tasks run in parallel
> --------------------------------------------------------------------------
>
>                 Key: HIVE-24372
>                 URL: https://issues.apache.org/jira/browse/HIVE-24372
>             Project: Hive
>          Issue Type: Improvement
>          Components: Beeline, Tez
>    Affects Versions: 3.1.0
>         Environment: hadoop 3.1.0
> hive 3.1.0
> hive.execution.engine=tez
>            Reporter: lhy
>            Priority: Major
>         Attachments: image-2020-11-12-13-16-28-228.png, image-2020-11-12-13-16-33-852.png, image-2020-11-12-13-17-21-675.png
>
> With hive.exec.parallel = false (hive.session.silent = false), the log
> "INFO : Status: Running (Executing on YARN cluster with App id application_1603689507490_0109)"
> can be found in the console.
> !image-2020-11-12-13-16-33-852.png!
> With hive.exec.parallel = true (hive.session.silent = false), no log showing the application id appears in the console, but it can be found in hiveserver2.log.
> !image-2020-11-12-13-17-21-675.png!
>
> The Hive CLI driver can show application logs.
[jira] [Work logged] (HIVE-24739) Clarify Usage of Thrift TServerEventHandler and Count Number of Messages Processed
[ https://issues.apache.org/jira/browse/HIVE-24739?focusedWorklogId=565089&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-565089 ]

ASF GitHub Bot logged work on HIVE-24739:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 12/Mar/21 02:57
            Start Date: 12/Mar/21 02:57
    Worklog Time Spent: 10m

Work Description: dengzhhu653 commented on a change in pull request #1946:
URL: https://github.com/apache/hive/pull/1946#discussion_r592854564

##########
File path: service/src/java/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java
##########
@@ -113,43 +113,68 @@ protected void initServer() {
       // TCP Server
       server = new TThreadPoolServer(sargs);
       server.setServerEventHandler(new TServerEventHandler() {
+        @Override
         public ServerContext createContext(TProtocol input, TProtocol output) {
           Metrics metrics = MetricsFactory.getInstance();
           if (metrics != null) {
-            try {
-              metrics.incrementCounter(MetricsConstant.OPEN_CONNECTIONS);
-              metrics.incrementCounter(MetricsConstant.CUMULATIVE_CONNECTION_COUNT);
-            } catch (Exception e) {
-              LOG.warn("Error Reporting JDO operation to Metrics system", e);
-            }
+            metrics.incrementCounter(MetricsConstant.OPEN_CONNECTIONS);
+            metrics.incrementCounter(MetricsConstant.CUMULATIVE_CONNECTION_COUNT);
           }
           return new ThriftCLIServerContext();
         }

+        /**
+         * This is called by the Thrift server when the underlying client
+         * connection is cleaned up by the server because the connection has
+         * been closed.
+         */
         @Override
         public void deleteContext(ServerContext serverContext, TProtocol input, TProtocol output) {
           Metrics metrics = MetricsFactory.getInstance();
           if (metrics != null) {
-            try {
-              metrics.decrementCounter(MetricsConstant.OPEN_CONNECTIONS);
-            } catch (Exception e) {
-              LOG.warn("Error Reporting JDO operation to Metrics system", e);
-            }
+            metrics.decrementCounter(MetricsConstant.OPEN_CONNECTIONS);
           }
-          ThriftCLIServerContext context = (ThriftCLIServerContext) serverContext;
-          SessionHandle sessionHandle = context.getSessionHandle();
-          if (sessionHandle != null) {
-            LOG.info("Session disconnected without closing properly. ");
+
+          final ThriftCLIServerContext context = (ThriftCLIServerContext) serverContext;
+          final Optional<SessionHandle> sessionHandle = context.getSessionHandle();
+
+          if (sessionHandle.isPresent()) {
+            // Normally, the client should politely inform the server it is
+            // closing its session with Hive before closing its network
+            // connection. However, if the client connection dies for any reason
+            // (load-balancer round-robin configuration, firewall kills
+            // long-running sessions, bad client, failed client, timed-out
+            // client, etc.) then the server will close the connection without
+            // having properly cleaned up the Hive session (resources,
+            // configuration, logging etc.). That needs to be cleaned up now.
+            LOG.warn(
+                "Client connection bound to {} unexpectedly closed: closing this Hive session to release its resources. "
+                    + "The connection processed {} total messages during its lifetime of {}ms. Inspect the client connection "
+                    + "for time-out, firewall killing the connection, invalid load balancer configuration, etc.",
+                sessionHandle, context.getMessagesProcessedCount(), context.getDuration().toMillis());
             try {
-              boolean close = cliService.getSessionManager().getSession(sessionHandle).getHiveConf()
+              final boolean close = cliService.getSessionManager().getSession(sessionHandle.get()).getHiveConf()
                   .getBoolVar(ConfVars.HIVE_SERVER2_CLOSE_SESSION_ON_DISCONNECT);
-              LOG.info((close ? "" : "Not ") + "Closing the session: " + sessionHandle);
               if (close) {
-                cliService.closeSession(sessionHandle);
+                cliService.closeSession(sessionHandle.get());
+              } else {
+                LOG.warn("Session not actually closed because configuration {} is set to false",
+                    ConfVars.HIVE_SERVER2_CLOSE_SESSION_ON_DISCONNECT.varname);
              }
            } catch (HiveSQLException e) {
-              LOG.warn("Failed to close session: " + e, e);
+              LOG.warn("Failed to close session", e);
+            }
+          } else {
+            // There is no session handle because the client gracefully closed
+            // the session *or*
[jira] [Work logged] (HIVE-23779) BasicStatsTask Info is not getting printed in beeline console
[ https://issues.apache.org/jira/browse/HIVE-23779?focusedWorklogId=565071&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-565071 ]

ASF GitHub Bot logged work on HIVE-23779:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 12/Mar/21 02:00
            Start Date: 12/Mar/21 02:00
    Worklog Time Spent: 10m

Work Description: dengzhhu653 commented on pull request #2064:
URL: https://github.com/apache/hive/pull/2064#issuecomment-797180213

Might the beeline be flooded with the logs? What can we use the logs for? Others LGTM...

Issue Time Tracking
-------------------
    Worklog Id:     (was: 565071)
    Time Spent: 1h 20m  (was: 1h 10m)
[jira] [Work logged] (HIVE-24739) Clarify Usage of Thrift TServerEventHandler and Count Number of Messages Processed
[ https://issues.apache.org/jira/browse/HIVE-24739?focusedWorklogId=565069&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-565069 ]

ASF GitHub Bot logged work on HIVE-24739:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 12/Mar/21 01:50
            Start Date: 12/Mar/21 01:50
    Worklog Time Spent: 10m

Work Description: dengzhhu653 commented on a change in pull request #1946:
URL: https://github.com/apache/hive/pull/1946#discussion_r592854564
[jira] [Work logged] (HIVE-24201) WorkloadManager kills query being moved to different pool if destination pool does not have enough sessions
[ https://issues.apache.org/jira/browse/HIVE-24201?focusedWorklogId=565054&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-565054 ]

ASF GitHub Bot logged work on HIVE-24201:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 12/Mar/21 01:04
            Start Date: 12/Mar/21 01:04
    Worklog Time Spent: 10m

Work Description: Dawn2111 opened a new pull request #2065:
URL: https://github.com/apache/hive/pull/2065

### What changes were proposed in this pull request?
Currently, the workload management move trigger kills the query being moved to a different pool if the destination pool does not have enough capacity. This PR introduces a "delayed move" configuration which lets the query run in the source pool as long as possible if the destination pool is full. It attempts the move to the destination pool only when there is a claim upon the source pool. If the destination pool is not full, delayed move behaves as a normal move, i.e. the move happens immediately.

### Why are the changes needed?
For better utilization of cluster resources.

### Does this PR introduce _any_ user-facing change?
Yes

### How was this patch tested?
Unit test

Issue Time Tracking
-------------------
    Worklog Id:     (was: 565054)
    Remaining Estimate: 0h
    Time Spent: 10m

> WorkloadManager kills query being moved to different pool if destination pool
> does not have enough sessions
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-24201
>                 URL: https://issues.apache.org/jira/browse/HIVE-24201
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2, llap
>    Affects Versions: 4.0.0
>            Reporter: Adesh Kumar Rao
>            Assignee: Pritha Dawn
>            Priority: Minor
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> To reproduce, create a resource plan with a move trigger, like below:
> {code:java}
> ++
> |                               line                                 |
> ++
> | experiment[status=DISABLED,parallelism=null,defaultPool=default]   |
> |  + default[allocFraction=0.888,schedulingPolicy=null,parallelism=1]|
> |  |  mapped for default                                             |
> |  + pool2[allocFraction=0.1,schedulingPolicy=fair,parallelism=1]    |
> |  |  trigger t1: if (ELAPSED_TIME > 20) { MOVE TO pool1 }           |
> |  |  mapped for users: abcd                                         |
> |  + pool1[allocFraction=0.012,schedulingPolicy=null,parallelism=1]  |
> |  |  mapped for users: efgh                                         |
> {code}
> Now, run two queries in pool1 and pool2 using different users. The query
> running in pool2 will try to move to pool1, and it will get killed because
> pool1 will not have a session to handle the query.
> Currently, the workload management move trigger kills the query being moved
> to a different pool if the destination pool does not have enough capacity. We
> could have a "delayed move" configuration which lets the query run in the
> source pool as long as possible if the destination pool is full. It would
> attempt the move to the destination pool only when there is a claim upon the
> source pool. If the destination pool is not full, delayed move behaves as a
> normal move, i.e. the move happens immediately.
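The normal-move vs. delayed-move behaviour described in the PR can be sketched as a small decision function. This is a hypothetical illustration, not Hive's WorkloadManager code: the names `DelayedMovePolicy`, `PoolState`, and `hasWaitingClaims` are invented for the example, and the real implementation works against HiveServer2's internal session-pool state rather than simple flags.

```java
// Illustrative sketch of the "delayed move" decision (names are invented).
public class DelayedMovePolicy {

    /** Simplified snapshot of a WLM pool's occupancy. */
    public static final class PoolState {
        final int activeSessions;
        final int maxSessions;
        final boolean hasWaitingClaims; // another query wants a slot in this pool

        public PoolState(int activeSessions, int maxSessions, boolean hasWaitingClaims) {
            this.activeSessions = activeSessions;
            this.maxSessions = maxSessions;
            this.hasWaitingClaims = hasWaitingClaims;
        }

        boolean isFull() { return activeSessions >= maxSessions; }
    }

    /**
     * Normal move: move immediately; if the destination is full, the query is killed.
     * Delayed move: if the destination is full, keep running in the source pool
     * until something else claims the source pool's capacity.
     */
    public static String decide(PoolState source, PoolState dest, boolean delayedMove) {
        if (!dest.isFull()) {
            return "MOVE_NOW";            // both modes behave the same
        }
        if (!delayedMove) {
            return "KILL";                // behaviour this issue complains about
        }
        // Delayed move: stay put until the source pool itself is contended.
        return source.hasWaitingClaims ? "MOVE_OR_KILL" : "STAY_IN_SOURCE";
    }
}
```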
[jira] [Updated] (HIVE-24201) WorkloadManager kills query being moved to different pool if destination pool does not have enough sessions
[ https://issues.apache.org/jira/browse/HIVE-24201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-24201:
----------------------------------
    Labels: pull-request-available  (was: )
[jira] [Updated] (HIVE-24201) WorkloadManager kills query being moved to different pool if destination pool does not have enough sessions
[ https://issues.apache.org/jira/browse/HIVE-24201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pritha Dawn updated HIVE-24201:
-------------------------------
    Description:
To reproduce, create a resource plan with a move trigger, like below:
{code:java}
++
|                               line                                 |
++
| experiment[status=DISABLED,parallelism=null,defaultPool=default]   |
|  + default[allocFraction=0.888,schedulingPolicy=null,parallelism=1]|
|  |  mapped for default                                             |
|  + pool2[allocFraction=0.1,schedulingPolicy=fair,parallelism=1]    |
|  |  trigger t1: if (ELAPSED_TIME > 20) { MOVE TO pool1 }           |
|  |  mapped for users: abcd                                         |
|  + pool1[allocFraction=0.012,schedulingPolicy=null,parallelism=1]  |
|  |  mapped for users: efgh                                         |
{code}
Now, run two queries in pool1 and pool2 using different users. The query running in pool2 will try to move to pool1, and it will get killed because pool1 will not have a session to handle the query.

Currently, the workload management move trigger kills the query being moved to a different pool if the destination pool does not have enough capacity. We could have a "delayed move" configuration which lets the query run in the source pool as long as possible if the destination pool is full. It would attempt the move to the destination pool only when there is a claim upon the source pool. If the destination pool is not full, delayed move behaves as a normal move, i.e. the move happens immediately.

was: [same reproduction steps as above, ending instead with:] Once killed, this query needs to be re-run externally. This can be optimized: the query should be retried in the destination pool directly (it will get queued and run once a session is available).
[jira] [Comment Edited] (HIVE-23820) [HS2] Send tableId in request for get_table_request API
[ https://issues.apache.org/jira/browse/HIVE-23820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17299867#comment-17299867 ]

Kishen Das edited comment on HIVE-23820 at 3/12/21, 12:11 AM:
--------------------------------------------------------------

[~ashish-kumar-sharma] Sure, you can work on this. Btw, how are you planning to pass tableId in the get_table_req API? I was thinking of enhancing getValidWriteIdList to return tableId as well and sending that back in the get_table_req API. We can also cache tableId at the Hive session level when we make get_table_req for the first time in the compilation phase, and send it back in get_table_req in subsequent calls within the same session.

was (Author: kishendas):
[~ashish-kumar-sharma] Sure, you can work on this. Btw, how are you planning to pass tableId in the get_table_req API? I was thinking of enhancing getValidWriteIdList to return tableId as well and sending that back in the get_table_req API.

> [HS2] Send tableId in request for get_table_request API
> -------------------------------------------------------
>
>                 Key: HIVE-23820
>                 URL: https://issues.apache.org/jira/browse/HIVE-23820
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Kishen Das
>            Assignee: Ashish Sharma
>            Priority: Major
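The session-level caching idea floated in the comment above could look something like the sketch below. This is a hypothetical illustration, not HS2's actual session code: the class name `SessionTableIdCache` and its method are invented, and the `loader` argument stands in for the first metastore get_table_req call of the session.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Illustrative sketch: remember the tableId resolved by the first
// get_table_req of a session so later calls in the same session can
// send it back without another lookup.
public class SessionTableIdCache {

    // One cache per Hive session; key is "db.table".
    private final Map<String, Long> tableIds = new ConcurrentHashMap<>();

    /**
     * Returns the cached tableId for db.table, invoking the loader
     * (a stand-in for the first metastore call) only on a cache miss.
     */
    public long getTableId(String dbName, String tableName, Supplier<Long> loader) {
        return tableIds.computeIfAbsent(dbName + "." + tableName, k -> loader.get());
    }
}
```

With this shape, subsequent compilations in the same session would reuse the cached id instead of asking the metastore again.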
[jira] [Work logged] (HIVE-23779) BasicStatsTask Info is not getting printed in beeline console
[ https://issues.apache.org/jira/browse/HIVE-23779?focusedWorklogId=565013&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-565013 ]

ASF GitHub Bot logged work on HIVE-23779:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 11/Mar/21 23:11
            Start Date: 11/Mar/21 23:11
    Worklog Time Spent: 10m

Work Description: nareshpr opened a new pull request #2064:
URL: https://github.com/apache/hive/pull/2064

### What changes were proposed in this pull request?
The ETL flow prints partition basic stats in the beeline or client console. After HIVE-16061, these stats are not getting printed on the client.

### Why are the changes needed?
As there are multiple clients connecting to HS2 and running ETL, it was difficult for end users to search for stats in the HS2 logs. This feature helps users know the data loaded per partition.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
I did manual testing by including this fix in my local cluster and validated that the client console shows the logs.

Issue Time Tracking
-------------------
    Worklog Id:     (was: 565013)
    Time Spent: 1h 10m  (was: 1h)
[jira] [Updated] (HIVE-24877) Support X'xxxx' syntax for hexadecimal values like spark & mysql
[ https://issues.apache.org/jira/browse/HIVE-24877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Naresh P R updated HIVE-24877:
------------------------------
    Description:
Hive currently does not support the following syntax:
select x'abc';
{code:java}
org.apache.hadoop.hive.ql.parse.ParseException: line 2:8 cannot recognize input near 'x' ''abc'' '' in selection target
 at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:125)
 at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:93)
 at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:85)
 at org.apache.hadoop.hive.ql.Compiler.parse(Compiler.java:169)
 at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:102)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492)
 at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445){code}
Though we have the same via the hex/unhex built-in UDFs, it would be better to have {{X'value'}} and {{x'value'}} syntax support in Hive.
[https://spark.apache.org/docs/latest/sql-ref-literals.html#binary-literal]
[https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_hex]
[https://mariadb.com/kb/en/hexadecimal-literals/]

was: [same description, with the ParseException header line duplicated]

> Support X'xxxx' syntax for hexadecimal values like spark & mysql
> ----------------------------------------------------------------
>
>                 Key: HIVE-24877
>                 URL: https://issues.apache.org/jira/browse/HIVE-24877
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Naresh P R
>            Priority: Minor
[jira] [Updated] (HIVE-24877) Support X'xxxx' syntax for hexadecimal values like spark & mysql
[ https://issues.apache.org/jira/browse/HIVE-24877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naresh P R updated HIVE-24877: -- Description: Hive currently does not support the following syntax: select x'abc'; {code:java} org.apache.hadoop.hive.ql.parse.ParseException: line 2:8 cannot recognize input near 'x' ''abc'' '' in selection target at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:125) at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:93) at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:85) at org.apache.hadoop.hive.ql.Compiler.parse(Compiler.java:169) at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:102) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445){code} Though the same is available via the hex/unhex built-in UDFs, it's better to have {{X'value'}} and {{x'value'}} syntax support in Hive. 
[https://spark.apache.org/docs/latest/sql-ref-literals.html#binary-literal] [https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_hex] [https://mariadb.com/kb/en/hexadecimal-literals/] was: Hive currently does not support the following syntax: select x'abc'; {code:java} org.apache.hadoop.hive.ql.parse.ParseException: line 2:8 cannot recognize input near 'x' ''abc'' '' in selection target org.apache.hadoop.hive.ql.parse.ParseException: line 2:8 cannot recognize input near 'x' ''31FECC'' '' in selection target at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:125) at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:93) at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:85) at org.apache.hadoop.hive.ql.Compiler.parse(Compiler.java:169) at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:102) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445){code} Though the same is available via the hex/unhex built-in UDFs, it's better to have {{X'value'}} and {{x'value'}} syntax support in Hive. 
[https://spark.apache.org/docs/latest/sql-ref-literals.html#binary-literal] [https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_hex] https://mariadb.com/kb/en/hexadecimal-literals/ > Support X'xxxx' syntax for hexadecimal values like spark & mysql > > > Key: HIVE-24877 > URL: https://issues.apache.org/jira/browse/HIVE-24877 > Project: Hive > Issue Type: New Feature >Reporter: Naresh P R >Priority: Minor > > Hive currently does not support the following syntax: > select x'abc'; > {code:java} > org.apache.hadoop.hive.ql.parse.ParseException: line 2:8 cannot recognize > input near 'x' ''abc'' '' in selection target > at > org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:125) at > org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:93) at > org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:85) at > org.apache.hadoop.hive.ql.Compiler.parse(Compiler.java:169) at > org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:102) at > org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492) at > org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445){code} > Though the same is available via the hex/unhex built-in UDFs, it's better to have > {{X'value'}} and {{x'value'}} syntax support in Hive. > [https://spark.apache.org/docs/latest/sql-ref-literals.html#binary-literal] > [https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_hex] > [https://mariadb.com/kb/en/hexadecimal-literals/] -- This message was sent by Atlassian Jira (v8.3.4#803005)
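What an {{X'value'}} literal denotes can be sketched outside Hive: it is just a byte string spelled in hexadecimal, equivalent to what the unhex UDF produces. A minimal Python sketch (illustrative only, not Hive's parser; the function name is made up):

```python
def parse_hex_literal(literal: str) -> bytes:
    """Decode a SQL-style hexadecimal literal such as X'4869' or x'4869'
    into the byte string it denotes. A sketch of the semantics only."""
    if len(literal) < 3 or literal[0] not in "xX" or literal[1] != "'" or literal[-1] != "'":
        raise ValueError("not a hex literal: %r" % literal)
    # bytes.fromhex rejects odd digit counts and non-hex characters,
    # much like MySQL's X'' form, which requires an even number of digits.
    return bytes.fromhex(literal[2:-1])

# X'4869' denotes the two bytes 0x48 0x69, the same result as unhex('4869'):
print(parse_hex_literal("X'4869'"))  # b'Hi'
```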
[jira] [Work started] (HIVE-24828) [HMS] Provide new HMS API to return latest committed compaction record for a given table
[ https://issues.apache.org/jira/browse/HIVE-24828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-24828 started by Yu-Wen Lai. - > [HMS] Provide new HMS API to return latest committed compaction record for a > given table > > > Key: HIVE-24828 > URL: https://issues.apache.org/jira/browse/HIVE-24828 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Kishen Das >Assignee: Yu-Wen Lai >Priority: Major > > We need a new HMS API to return the latest committed compaction record for a > given table. This lets a remote cache determine whether a given table's file > metadata has been compacted since it was cached, and therefore whether the > file metadata must be refreshed from the file system before serving or the > currently cached data can be served. -- This message was sent by Atlassian Jira (v8.3.4#803005)
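The description above is essentially a cache-invalidation rule keyed on the latest committed compaction. A hedged Python sketch of that rule (every name here is an assumption for illustration, not the actual HMS API):

```python
# Hypothetical sketch of the caching decision described above: compare the
# compaction id recorded with the cache entry against the latest committed
# compaction id the proposed HMS API would return, and reload file metadata
# from the file system only when they differ.

class FileMetadataCache:
    def __init__(self):
        self._entries = {}  # table name -> (compaction_id, file_metadata)

    def get(self, table, latest_compaction_id, load_from_fs):
        cached = self._entries.get(table)
        if cached is not None and cached[0] == latest_compaction_id:
            return cached[1]            # no compaction since caching: serve as-is
        metadata = load_from_fs(table)  # compacted since last seen: refresh
        self._entries[table] = (latest_compaction_id, metadata)
        return metadata
```

Under this sketch, a cache hit with an unchanged compaction id never touches the file system; a newly committed compaction forces exactly one reload.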
[jira] [Assigned] (HIVE-23820) [HS2] Send tableId in request for get_table_request API
[ https://issues.apache.org/jira/browse/HIVE-23820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kishen Das reassigned HIVE-23820: - Assignee: Ashish Sharma (was: Kishen Das) > [HS2] Send tableId in request for get_table_request API > --- > > Key: HIVE-23820 > URL: https://issues.apache.org/jira/browse/HIVE-23820 > Project: Hive > Issue Type: Sub-task >Reporter: Kishen Das >Assignee: Ashish Sharma >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23820) [HS2] Send tableId in request for get_table_request API
[ https://issues.apache.org/jira/browse/HIVE-23820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17299867#comment-17299867 ] Kishen Das commented on HIVE-23820: --- [~ashish-kumar-sharma] Sure, you can work on this. Btw, how are you planning to pass tableId in the get_table_req API? I was thinking of enhancing getValidWriteIdList to return the tableId as well and sending that back in the get_table_req API. > [HS2] Send tableId in request for get_table_request API > --- > > Key: HIVE-23820 > URL: https://issues.apache.org/jira/browse/HIVE-23820 > Project: Hive > Issue Type: Sub-task >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
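The flow proposed in the comment above can be sketched as follows; every type and field name below is illustrative only (the real change would live in the HMS Thrift definitions, not in Python):

```python
# Hypothetical sketch: HS2 first fetches the valid write-id list (extended
# here to also carry the table id), then echoes that id back in the
# subsequent get_table request so HMS can correlate the two lookups.
from dataclasses import dataclass

@dataclass
class ValidWriteIdList:
    table_name: str
    table_id: int          # proposed addition from the comment above
    write_ids: tuple

@dataclass
class GetTableRequest:
    db_name: str
    tbl_name: str
    table_id: int = -1     # -1 as a sentinel for "unknown"

def build_get_table_request(db, tbl, write_id_list):
    # Propagate the tableId obtained from getValidWriteIdList.
    return GetTableRequest(db, tbl, table_id=write_id_list.table_id)
```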
[jira] [Work logged] (HIVE-24876) Disable /longconf.jsp page on HS2 web UI for non admin users
[ https://issues.apache.org/jira/browse/HIVE-24876?focusedWorklogId=564869&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564869 ] ASF GitHub Bot logged work on HIVE-24876: - Author: ASF GitHub Bot Created on: 11/Mar/21 19:26 Start Date: 11/Mar/21 19:26 Worklog Time Spent: 10m Work Description: saihemanth-cloudera opened a new pull request #2063: URL: https://github.com/apache/hive/pull/2063 …'t belong to admin role ### What changes were proposed in this pull request? Disable the logger configuration page for non-admin users. ### Why are the changes needed? Otherwise normal users can flood the log files with unneeded output. ### Does this PR introduce _any_ user-facing change? Yes. If a user needs to access this log config page, they must be configured as an admin in hive-site.xml via the hive.users.in.admin.role property, which takes comma-separated values, e.g. hive.users.in.admin.role = bob,adam. ### How was this patch tested? Local machine. Remote cluster. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 564869) Remaining Estimate: 0h Time Spent: 10m > Disable /longconf.jsp page on HS2 web UI for non admin users > > > Key: HIVE-24876 > URL: https://issues.apache.org/jira/browse/HIVE-24876 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 4.0.0 >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > /logconf.jsp page should be disabled for users that are not in admin > roles. Otherwise, any user can flood the log files with the different log levels > that can be configured on the HS2 web UI. -- This message was sent by Atlassian Jira (v8.3.4#803005)
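The flattened admin-role property in the PR description above appears to be a hive-site.xml snippet; restored as a sketch (the key is spelled as in HiveConf, hive.users.in.admin.role, and the user names are just the examples from the description):

```xml
<!-- hive-site.xml: users granted the admin role, and hence access to the
     HS2 web UI log configuration page (example values only) -->
<property>
  <name>hive.users.in.admin.role</name>
  <value>bob,adam</value> <!-- comma-separated list of admin users -->
</property>
```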
[jira] [Updated] (HIVE-24876) Disable /longconf.jsp page on HS2 web UI for non admin users
[ https://issues.apache.org/jira/browse/HIVE-24876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24876: -- Labels: pull-request-available (was: ) > Disable /longconf.jsp page on HS2 web UI for non admin users > > > Key: HIVE-24876 > URL: https://issues.apache.org/jira/browse/HIVE-24876 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 4.0.0 >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > /logconf.jsp page should be disabled to the users that are not in admin > roles. Otherwise, any user can flood the log files with different log levels > that can be configured on HS2 web UI. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24876) Disable /longconf.jsp page on HS2 web UI for non admin users
[ https://issues.apache.org/jira/browse/HIVE-24876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sai Hemanth Gantasala reassigned HIVE-24876: > Disable /longconf.jsp page on HS2 web UI for non admin users > > > Key: HIVE-24876 > URL: https://issues.apache.org/jira/browse/HIVE-24876 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 4.0.0 >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > > /logconf.jsp page should be disabled to the users that are not in admin > roles. Otherwise, any user can flood the log files with different log levels > that can be configured on HS2 web UI. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24865) Implement Respect/Ignore Nulls in first/last_value
[ https://issues.apache.org/jira/browse/HIVE-24865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-24865: --- Fix Version/s: 4.0.0 > Implement Respect/Ignore Nulls in first/last_value > -- > > Key: HIVE-24865 > URL: https://issues.apache.org/jira/browse/HIVE-24865 > Project: Hive > Issue Type: Improvement > Components: Parser, UDF >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > {code:java} > <null treatment> ::= RESPECT NULLS | IGNORE NULLS > <first or last value function> ::= <first or last value> ( <value expression> ) [ <null treatment> ] > <first or last value> ::= FIRST_VALUE | LAST_VALUE > {code} > Example: > {code:java} > select last_value(b) ignore nulls over(partition by a order by b) from t1; > {code} > Existing non-standard implementation: > {code:java} > select last_value(b, true) over(partition by a order by b) from t1; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24865) Implement Respect/Ignore Nulls in first/last_value
[ https://issues.apache.org/jira/browse/HIVE-24865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa resolved HIVE-24865. --- Resolution: Fixed Pushed to master. Thanks [~jcamachorodriguez] for review. > Implement Respect/Ignore Nulls in first/last_value > -- > > Key: HIVE-24865 > URL: https://issues.apache.org/jira/browse/HIVE-24865 > Project: Hive > Issue Type: Improvement > Components: Parser, UDF >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > {code:java} > <null treatment> ::= RESPECT NULLS | IGNORE NULLS > <first or last value function> ::= <first or last value> ( <value expression> ) [ <null treatment> ] > <first or last value> ::= FIRST_VALUE | LAST_VALUE > {code} > Example: > {code:java} > select last_value(b) ignore nulls over(partition by a order by b) from t1; > {code} > Existing non-standard implementation: > {code:java} > select last_value(b, true) over(partition by a order by b) from t1; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24865) Implement Respect/Ignore Nulls in first/last_value
[ https://issues.apache.org/jira/browse/HIVE-24865?focusedWorklogId=564780&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564780 ] ASF GitHub Bot logged work on HIVE-24865: - Author: ASF GitHub Bot Created on: 11/Mar/21 17:40 Start Date: 11/Mar/21 17:40 Worklog Time Spent: 10m Work Description: kasakrisz merged pull request #2060: URL: https://github.com/apache/hive/pull/2060 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 564780) Time Spent: 1h 10m (was: 1h) > Implement Respect/Ignore Nulls in first/last_value > -- > > Key: HIVE-24865 > URL: https://issues.apache.org/jira/browse/HIVE-24865 > Project: Hive > Issue Type: Improvement > Components: Parser, UDF >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > {code:java} > <null treatment> ::= RESPECT NULLS | IGNORE NULLS > <first or last value function> ::= <first or last value> ( <value expression> ) [ <null treatment> ] > <first or last value> ::= FIRST_VALUE | LAST_VALUE > {code} > Example: > {code:java} > select last_value(b) ignore nulls over(partition by a order by b) from t1; > {code} > Existing non-standard implementation: > {code:java} > select last_value(b, true) over(partition by a order by b) from t1; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
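The windowing behavior being added can be modeled in a few lines of Python; this is a model of the last_value(x) IGNORE NULLS semantics over an ordered frame, not Hive's implementation:

```python
def last_value_ignore_nulls(ordered_values):
    """For each row of an ordered window frame (modeled here as rows from
    unbounded preceding to the current row), return the last non-NULL
    value seen so far -- the effect of last_value(x) IGNORE NULLS."""
    out, last = [], None
    for v in ordered_values:
        if v is not None:
            last = v       # IGNORE NULLS: NULLs never overwrite the result
        out.append(last)
    return out

# A partition of b values ordered with NULLs interleaved:
print(last_value_ignore_nulls([None, 1, None, 3]))  # [None, 1, 1, 3]
```

With RESPECT NULLS (the default), the third row would instead yield NULL, since the NULL itself would become the frame's last value.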
[jira] [Updated] (HIVE-24825) Create AcidMetricsService
[ https://issues.apache.org/jira/browse/HIVE-24825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Varga updated HIVE-24825: --- Fix Version/s: 4.0.0 > Create AcidMetricsService > - > > Key: HIVE-24825 > URL: https://issues.apache.org/jira/browse/HIVE-24825 > Project: Hive > Issue Type: Sub-task >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Fix For: 4.0.0 > > > Create a new service in HMS, that will collect and publish JMX metrics about > ACID related processes and metadata. > * There should be a subconfig other than METRICS_ENABLED for acid metrics > * The collection frequency should be configurable > * The existing oldest initiated compaction and the number of compactions in > different statuses metrics collection should be moved here from Initiator -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (HIVE-24824) Define metrics for compaction observability
[ https://issues.apache.org/jira/browse/HIVE-24824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-24824 started by Peter Varga. -- > Define metrics for compaction observability > --- > > Key: HIVE-24824 > URL: https://issues.apache.org/jira/browse/HIVE-24824 > Project: Hive > Issue Type: Improvement >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > Many times, if there are failures in the compaction background processes > (Initiator, Worker, Cleaner), it is hard to notice the problem until it causes > serious performance degradation. > We should create new JMX metrics that would make it easier to monitor > compaction health. Examples are: > * number of failed / initiated compactions > * number of aborted txns, oldest aborted txns > * tables with disabled compactions and writes > * Initiator and Cleaner cycle runtime > * Size of ACID metadata tables that should have ~ constant rows > (txn_to_writeId, completed_txns) > -- This message was sent by Atlassian Jira (v8.3.4#803005)
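Two of the metrics listed above can be sketched in Python to make the definitions concrete; the metric names and the input shape are assumptions for illustration, not the actual AcidMetricsService code:

```python
import time
from collections import Counter

def compaction_metrics(compactions, now=None):
    """Compute example gauges from a list of (state, start_time_seconds)
    tuples: a per-state compaction count and the age of the oldest
    'initiated' compaction. Illustrative names, not Hive's JMX keys."""
    now = time.time() if now is None else now
    by_state = Counter(state for state, _ in compactions)
    initiated = [ts for state, ts in compactions if state == "initiated"]
    oldest_age = now - min(initiated) if initiated else 0
    metrics = {"compaction_num_%s" % s: n for s, n in by_state.items()}
    metrics["compaction_oldest_initiated_age_in_sec"] = oldest_age
    return metrics
```

A collector service could evaluate this on the configured frequency and publish each dict entry as a JMX gauge.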
[jira] [Work started] (HIVE-24825) Create AcidMetricsService
[ https://issues.apache.org/jira/browse/HIVE-24825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-24825 started by Peter Varga. -- > Create AcidMetricsService > - > > Key: HIVE-24825 > URL: https://issues.apache.org/jira/browse/HIVE-24825 > Project: Hive > Issue Type: Sub-task >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > > Create a new service in HMS, that will collect and publish JMX metrics about > ACID related processes and metadata. > * There should be a subconfig other than METRICS_ENABLED for acid metrics > * The collection frequency should be configurable > * The existing oldest initiated compaction and the number of compactions in > different statuses metrics collection should be moved here from Initiator -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24825) Create AcidMetricsService
[ https://issues.apache.org/jira/browse/HIVE-24825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Varga resolved HIVE-24825. Resolution: Fixed > Create AcidMetricsService > - > > Key: HIVE-24825 > URL: https://issues.apache.org/jira/browse/HIVE-24825 > Project: Hive > Issue Type: Sub-task >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > > Create a new service in HMS, that will collect and publish JMX metrics about > ACID related processes and metadata. > * There should be a subconfig other than METRICS_ENABLED for acid metrics > * The collection frequency should be configurable > * The existing oldest initiated compaction and the number of compactions in > different statuses metrics collection should be moved here from Initiator -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24758) Log Tez Task DAG ID, DAG Session ID, HS2 Hostname
[ https://issues.apache.org/jira/browse/HIVE-24758?focusedWorklogId=564737&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564737 ] ASF GitHub Bot logged work on HIVE-24758: - Author: ASF GitHub Bot Created on: 11/Mar/21 16:50 Start Date: 11/Mar/21 16:50 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #1963: URL: https://github.com/apache/hive/pull/1963#discussion_r592527788 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java ## @@ -253,7 +259,7 @@ public int execute() { counters = mergedCounters; } catch (Exception err) { // Don't fail execution due to counters - just don't print summary info - LOG.warn("Failed to get counters. Ignoring, summary info will be incomplete. " + err, err); + LOG.warn("Failed to get counters. Ignoring, summary info will be incomplete.", err); Review comment: Missing {} here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 564737) Time Spent: 2h 10m (was: 2h) > Log Tez Task DAG ID, DAG Session ID, HS2 Hostname > - > > Key: HIVE-24758 > URL: https://issues.apache.org/jira/browse/HIVE-24758 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 2h 10m > Remaining Estimate: 0h > > In order to get the logs for a particular query, submitted to Tez on YARN, > the following pieces of information are required: > * YARN Application ID > * TEZ DAG ID > * HS2 Host that ran the job > Include this information in TezTask output. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24824) Define metrics for compaction observability
[ https://issues.apache.org/jira/browse/HIVE-24824?focusedWorklogId=564735&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564735 ] ASF GitHub Bot logged work on HIVE-24824: - Author: ASF GitHub Bot Created on: 11/Mar/21 16:48 Start Date: 11/Mar/21 16:48 Worklog Time Spent: 10m Work Description: pvargacl merged pull request #2016: URL: https://github.com/apache/hive/pull/2016 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 564735) Time Spent: 1h 20m (was: 1h 10m) > Define metrics for compaction observability > --- > > Key: HIVE-24824 > URL: https://issues.apache.org/jira/browse/HIVE-24824 > Project: Hive > Issue Type: Improvement >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > Many times, if there are failures in the compaction background processes > (Initiator, Worker, Cleaner), it is hard to notice the problem until it causes > serious performance degradation. > We should create new JMX metrics that would make it easier to monitor > compaction health. Examples are: > * number of failed / initiated compactions > * number of aborted txns, oldest aborted txns > * tables with disabled compactions and writes > * Initiator and Cleaner cycle runtime > * Size of ACID metadata tables that should have ~ constant rows > (txn_to_writeId, completed_txns) > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24758) Log Tez Task DAG ID, DAG Session ID, HS2 Hostname
[ https://issues.apache.org/jira/browse/HIVE-24758?focusedWorklogId=564717&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564717 ] ASF GitHub Bot logged work on HIVE-24758: - Author: ASF GitHub Bot Created on: 11/Mar/21 16:25 Start Date: 11/Mar/21 16:25 Worklog Time Spent: 10m Work Description: belugabehr commented on a change in pull request #1963: URL: https://github.com/apache/hive/pull/1963#discussion_r592506951 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java ## @@ -236,6 +239,10 @@ public int execute() { throw new HiveException("Operation cancelled"); } +// Log all the info required to find the various logs for this query +LOG.info("HS2 Host: [{}], Query ID: [{}], Dag ID: [{}], DAG Session ID: [{}]", getHostNameIP(), queryId, Review comment: @pgaref I at least re-used the `hive-common` package feature here. Please review once more. Let not the perfect be the enemy of the good. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 564717) Time Spent: 2h (was: 1h 50m) > Log Tez Task DAG ID, DAG Session ID, HS2 Hostname > - > > Key: HIVE-24758 > URL: https://issues.apache.org/jira/browse/HIVE-24758 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > > In order to get the logs for a particular query, submitted to Tez on YARN, > the following pieces of information are required: > * YARN Application ID > * TEZ DAG ID > * HS2 Host that ran the job > Include this information in TezTask output. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24758) Log Tez Task DAG ID, DAG Session ID, HS2 Hostname
[ https://issues.apache.org/jira/browse/HIVE-24758?focusedWorklogId=564710&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564710 ] ASF GitHub Bot logged work on HIVE-24758: - Author: ASF GitHub Bot Created on: 11/Mar/21 16:18 Start Date: 11/Mar/21 16:18 Worklog Time Spent: 10m Work Description: belugabehr commented on a change in pull request #1963: URL: https://github.com/apache/hive/pull/1963#discussion_r592501626 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java ## @@ -236,6 +239,10 @@ public int execute() { throw new HiveException("Operation cancelled"); } +// Log all the info required to find the various logs for this query +LOG.info("HS2 Host: [{}], Query ID: [{}], Dag ID: [{}], DAG Session ID: [{}]", getHostNameIP(), queryId, Review comment: I'd hate to tie anything to the session state. ```java // Need to remove this static hack. But this is the way currently to get a session. SessionState ss = SessionState.get(); ``` Let me see what else we can do. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 564710) Time Spent: 1h 50m (was: 1h 40m) > Log Tez Task DAG ID, DAG Session ID, HS2 Hostname > - > > Key: HIVE-24758 > URL: https://issues.apache.org/jira/browse/HIVE-24758 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > In order to get the logs for a particular query, submitted to Tez on YARN, > the following pieces of information are required: > * YARN Application ID > * TEZ DAG ID > * HS2 Host that ran the job > Include this information in TezTask output. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24445) Non blocking DROP table implementation
[ https://issues.apache.org/jira/browse/HIVE-24445?focusedWorklogId=564706&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564706 ] ASF GitHub Bot logged work on HIVE-24445: - Author: ASF GitHub Bot Created on: 11/Mar/21 16:16 Start Date: 11/Mar/21 16:16 Worklog Time Spent: 10m Work Description: pvargacl commented on a change in pull request #2020: URL: https://github.com/apache/hive/pull/2020#discussion_r592499445 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java ## @@ -2213,9 +2213,9 @@ private void create_table_core(final RawStore ms, final CreateTableRequest req) } if (!TableType.VIRTUAL_VIEW.toString().equals(tbl.getTableType())) { -if (tbl.getSd().getLocation() == null -|| tbl.getSd().getLocation().isEmpty()) { - tblPath = wh.getDefaultTablePath(db, tbl); +if (tbl.getSd().getLocation() == null || tbl.getSd().getLocation().isEmpty()) { Review comment: I think we do, just tried it on a cluster, created a transactional table with a custom location and everything works as expected (insert, read, compaction...) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 564706) Time Spent: 3h 40m (was: 3.5h) > Non blocking DROP table implementation > -- > > Key: HIVE-24445 > URL: https://issues.apache.org/jira/browse/HIVE-24445 > Project: Hive > Issue Type: New Feature > Components: Hive >Reporter: Zoltan Chovan >Assignee: Zoltan Chovan >Priority: Major > Labels: pull-request-available > Time Spent: 3h 40m > Remaining Estimate: 0h > > Implement a way to execute drop table operations in a way that doesn't have > to wait for currently running read operations to be finished. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24739) Clarify Usage of Thrift TServerEventHandler and Count Number of Messages Processed
[ https://issues.apache.org/jira/browse/HIVE-24739?focusedWorklogId=564704&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564704 ] ASF GitHub Bot logged work on HIVE-24739: - Author: ASF GitHub Bot Created on: 11/Mar/21 16:14 Start Date: 11/Mar/21 16:14 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #1946: URL: https://github.com/apache/hive/pull/1946#issuecomment-796853271 @pvary Made requested change. Please review. :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 564704) Time Spent: 7h 50m (was: 7h 40m) > Clarify Usage of Thrift TServerEventHandler and Count Number of Messages > Processed > -- > > Key: HIVE-24739 > URL: https://issues.apache.org/jira/browse/HIVE-24739 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 7h 50m > Remaining Estimate: 0h > > Make the messages emitted from {{TServerEventHandler}} more meaningful. > Also, track the number of messages that each client sends to aid in > troubleshooting. > I run into this issue all the time with and this would greatly help clarify > the logging. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24832) Remove Spring Artifacts from Log4j Properties Files
[ https://issues.apache.org/jira/browse/HIVE-24832?focusedWorklogId=564700&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564700 ] ASF GitHub Bot logged work on HIVE-24832: - Author: ASF GitHub Bot Created on: 11/Mar/21 16:09 Start Date: 11/Mar/21 16:09 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #2023: URL: https://github.com/apache/hive/pull/2023#issuecomment-796850115 @miklosgergely @pvary Review please? :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 564700) Time Spent: 20m (was: 10m) > Remove Spring Artifacts from Log4j Properties Files > --- > > Key: HIVE-24832 > URL: https://issues.apache.org/jira/browse/HIVE-24832 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Getting a warning about a bad FILE logger and it looks like it's coming from > some antiquated copy & paste code. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24867) Create iceberg-handler module in Hive
[ https://issues.apache.org/jira/browse/HIVE-24867?focusedWorklogId=564653&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564653 ] ASF GitHub Bot logged work on HIVE-24867: - Author: ASF GitHub Bot Created on: 11/Mar/21 14:57 Start Date: 11/Mar/21 14:57 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2058: URL: https://github.com/apache/hive/pull/2058#discussion_r592430006 ## File path: iceberg-handler/pom.xml ## @@ -0,0 +1,189 @@ + +http://maven.apache.org/POM/4.0.0"; + xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"; + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd";> + +org.apache.hive +hive +4.0.0-SNAPSHOT +../pom.xml + +4.0.0 + +iceberg-handler +jar +Hive Iceberg Handler + + +.. +0.11.0 +4.0.2 +1.9.2 + + + + +org.apache.iceberg +iceberg-api +${iceberg-api.version} + + +org.apache.iceberg +iceberg-core +${iceberg-api.version} + + +org.apache.iceberg +iceberg-hive-metastore +${iceberg-api.version} + + +org.apache.iceberg +iceberg-data +${iceberg-api.version} + + +org.apache.iceberg +iceberg-parquet +${iceberg-api.version} + + +org.apache.iceberg +iceberg-orc +${iceberg-api.version} + + + +org.apache.hadoop +hadoop-client +${hadoop.version} + + +org.apache.avro +avro + + + + + +org.apache.hive +hive-exec +${project.version} + + Review comment: Makes sense. As discussed offline, I've excluded orc, parquet, avro, guava and fasterxml only This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 564653) Time Spent: 2h 50m (was: 2h 40m) > Create iceberg-handler module in Hive > - > > Key: HIVE-24867 > URL: https://issues.apache.org/jira/browse/HIVE-24867 > Project: Hive > Issue Type: Improvement >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 2h 50m > Remaining Estimate: 0h > > * Create a new iceberg-handler module in Hive > * Copy the code from the Iceberg/iceberg-mr module into this new Hive module > * Make necessary changes so it compiles with Hive 4.0.0 dependencies > (iceberg-mr code was based on Hive 3.1) > * Make sure all tests pass -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24867) Create iceberg-handler module in Hive
[ https://issues.apache.org/jira/browse/HIVE-24867?focusedWorklogId=564652&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564652 ] ASF GitHub Bot logged work on HIVE-24867: - Author: ASF GitHub Bot Created on: 11/Mar/21 14:55 Start Date: 11/Mar/21 14:55 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2058: URL: https://github.com/apache/hive/pull/2058#discussion_r592428680 ## File path: iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergFilterFactory.java ## @@ -190,6 +190,7 @@ private static int daysFromTimestamp(Timestamp timestamp) { // We have to use the LocalDateTime to get the micros. See the comment above. private static long microsFromTimestamp(Timestamp timestamp) { // `org.apache.hadoop.hive.common.type.Timestamp.valueOf(lit.toString()).toSqlTimestamp()` +// since HIVE-21862 changes literal parsing to UTC based timestamps Review comment: You're right, removed it This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 564652) Time Spent: 2h 40m (was: 2.5h) > Create iceberg-handler module in Hive > - > > Key: HIVE-24867 > URL: https://issues.apache.org/jira/browse/HIVE-24867 > Project: Hive > Issue Type: Improvement >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 2h 40m > Remaining Estimate: 0h > > * Create a new iceberg-handler module in Hive > * Copy the code from the Iceberg/iceberg-mr module into this new Hive module > * Make necessary changes so it compiles with Hive 4.0.0 dependencies > (iceberg-mr code was based on Hive 3.1) > * Make sure all tests pass -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24867) Create iceberg-handler module in Hive
[ https://issues.apache.org/jira/browse/HIVE-24867?focusedWorklogId=564651&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564651 ] ASF GitHub Bot logged work on HIVE-24867: - Author: ASF GitHub Bot Created on: 11/Mar/21 14:55 Start Date: 11/Mar/21 14:55 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2058: URL: https://github.com/apache/hive/pull/2058#discussion_r592428376 ## File path: iceberg-handler/pom.xml ## @@ -16,41 +16,46 @@ .. -0.11.0 4.0.2 -1.9.2 +1.9.2 Review comment: Good idea, done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 564651) Time Spent: 2.5h (was: 2h 20m) > Create iceberg-handler module in Hive > - > > Key: HIVE-24867 > URL: https://issues.apache.org/jira/browse/HIVE-24867 > Project: Hive > Issue Type: Improvement >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 2.5h > Remaining Estimate: 0h > > * Create a new iceberg-handler module in Hive > * Copy the code from the Iceberg/iceberg-mr module into this new Hive module > * Make necessary changes so it compiles with Hive 4.0.0 dependencies > (iceberg-mr code was based on Hive 3.1) > * Make sure all tests pass -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24867) Create iceberg-handler module in Hive
[ https://issues.apache.org/jira/browse/HIVE-24867?focusedWorklogId=564649&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564649 ] ASF GitHub Bot logged work on HIVE-24867: - Author: ASF GitHub Bot Created on: 11/Mar/21 14:55 Start Date: 11/Mar/21 14:55 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2058: URL: https://github.com/apache/hive/pull/2058#discussion_r592428189 ## File path: iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergFilterFactory.java ## @@ -230,7 +231,8 @@ public void testDateType() { public void testTimestampType() { Literal timestampLiteral = Literal.of("2012-10-02T05:16:17.123456").to(Types.TimestampType.withoutZone()); long timestampMicros = timestampLiteral.value(); -Timestamp ts = Timestamp.valueOf(DateTimeUtil.timestampFromMicros(timestampMicros)); +// `org.apache.hadoop.hive.common.type.Timestamp.valueOf(lit.toString()).toSqlTimestamp()` +Timestamp ts = Timestamp.from(DateTimeUtil.timestampFromMicros(timestampMicros).toInstant(ZoneOffset.UTC)); Review comment: Removed it This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 564649) Time Spent: 2h 20m (was: 2h 10m) > Create iceberg-handler module in Hive > - > > Key: HIVE-24867 > URL: https://issues.apache.org/jira/browse/HIVE-24867 > Project: Hive > Issue Type: Improvement >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 2h 20m > Remaining Estimate: 0h > > * Create a new iceberg-handler module in Hive > * Copy the code from the Iceberg/iceberg-mr module into this new Hive module > * Make necessary changes so it compiles with Hive 4.0.0 dependencies > (iceberg-mr code was based on Hive 3.1) > * Make sure all tests pass -- This message was sent by Atlassian Jira (v8.3.4#803005)
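The two review threads above both concern converting zone-less timestamp literals to microseconds now that HIVE-21862 made literal parsing UTC-based. Below is a minimal sketch of that round trip using only `java.time`; the method names are illustrative stand-ins, not Iceberg's `DateTimeUtil` or Hive's `Timestamp` API.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

public class TimestampMicros {
    private static final LocalDateTime EPOCH =
            LocalDateTime.ofEpochSecond(0, 0, ZoneOffset.UTC);

    // Interpret a zone-less literal as UTC and convert it to
    // microseconds since the epoch (the UTC-based behavior the
    // review comments refer to).
    static long microsFromLiteral(String literal) {
        LocalDateTime ldt = LocalDateTime.parse(literal);
        return ChronoUnit.MICROS.between(EPOCH, ldt);
    }

    // Inverse: microseconds since the epoch back to a UTC LocalDateTime,
    // mirroring the Timestamp.from(...).toInstant(ZoneOffset.UTC) line
    // quoted in the diff above.
    static LocalDateTime timestampFromMicros(long micros) {
        return EPOCH.plus(micros, ChronoUnit.MICROS);
    }

    public static void main(String[] args) {
        long micros = microsFromLiteral("2012-10-02T05:16:17.123456");
        // Round trip preserves the literal, including the micros.
        System.out.println(timestampFromMicros(micros));
    }
}
```

Because both directions pin the offset to UTC, the round trip is lossless regardless of the JVM's default time zone, which is the property the test in `TestHiveIcebergFilterFactory` relies on.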
[jira] [Work logged] (HIVE-24718) Moving to file based iteration for copying data
[ https://issues.apache.org/jira/browse/HIVE-24718?focusedWorklogId=564628&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564628 ] ASF GitHub Bot logged work on HIVE-24718: - Author: ASF GitHub Bot Created on: 11/Mar/21 14:25 Start Date: 11/Mar/21 14:25 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #1936: URL: https://github.com/apache/hive/pull/1936#discussion_r592395172 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTablesMetaDataOnly.java ## @@ -639,9 +629,11 @@ public void testIncrementalDumpEmptyDumpDirectory() throws Throwable { .verifyResult(inc2Tuple.lastReplicationId); } - private void assertFalseExternalFileList(Path externalTableFileList) - throws IOException { + private void assertFalseExternalFileList(String dumpLocation) Review comment: Can you please move this method to ReplicationTestUtils itself. We have duplicate code for this method on two classes. 
## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTablesMetaDataOnly.java ## @@ -639,9 +629,11 @@ public void testIncrementalDumpEmptyDumpDirectory() throws Throwable { .verifyResult(inc2Tuple.lastReplicationId); } - private void assertFalseExternalFileList(Path externalTableFileList) - throws IOException { + private void assertFalseExternalFileList(String dumpLocation) + throws IOException { Review comment: I think it doesn't throw IOException ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcrossInstances.java ## @@ -2225,17 +2224,11 @@ private void setupUDFJarOnHDFS(Path identityUdfLocalPath, Path identityUdfHdfsPa /* * Method used from TestReplicationScenariosExclusiveReplica */ - private void assertExternalFileInfo(List expected, String dumplocation, boolean isIncremental, + private void assertExternalFileList(List expected, String dumplocation, WarehouseInstance warehouseInstance) throws IOException { Path hivePath = new Path(dumplocation, ReplUtils.REPL_HIVE_BASE_DIR); -Path metadataPath = new Path(hivePath, EximUtil.METADATA_PATH_NAME); -Path externalTableInfoFile; -if (isIncremental) { - externalTableInfoFile = new Path(hivePath, FILE_NAME); -} else { - externalTableInfoFile = new Path(metadataPath, primaryDbName.toLowerCase() + File.separator + FILE_NAME); -} -ReplicationTestUtils.assertExternalFileInfo(warehouseInstance, expected, externalTableInfoFile); +Path externalTblFileList = new Path(hivePath, EximUtil.FILE_LIST_EXTERNAL); Review comment: This is still not addressed. The same method(and code) is defined in three classes. assertExternalFileList. And essentially they aren't doing more than the path formation. As discussed, can we not use the ReplicationTestUtils.assertExternalFileList directly by modifying the signature a bit. This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 564628) Time Spent: 6h (was: 5h 50m) > Moving to file based iteration for copying data > --- > > Key: HIVE-24718 > URL: https://issues.apache.org/jira/browse/HIVE-24718 > Project: Hive > Issue Type: Bug >Reporter: Arko Sharma >Assignee: Arko Sharma >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24718.01.patch, HIVE-24718.02.patch, > HIVE-24718.04.patch, HIVE-24718.05.patch, HIVE-24718.06.patch > > Time Spent: 6h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
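The review above asks to collapse three per-class copies of `assertExternalFileList` into one shared helper in `ReplicationTestUtils`, parameterizing whatever differs between callers. A hypothetical sketch of that refactoring pattern follows; the constant values standing in for `ReplUtils.REPL_HIVE_BASE_DIR` and `EximUtil.FILE_LIST_EXTERNAL` are assumptions, and the comparison helper takes pre-read lines so it stays filesystem-free.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.List;

public class ReplicationTestUtilsSketch {
    // Assumed stand-ins for ReplUtils.REPL_HIVE_BASE_DIR and
    // EximUtil.FILE_LIST_EXTERNAL in the discussion above.
    static final String REPL_HIVE_BASE_DIR = "hive";
    static final String FILE_LIST_EXTERNAL = "_file_list_external";

    // The path formation the three copies duplicated: build
    // <dumpLocation>/<base>/<fileList> in exactly one place, so each
    // test class only supplies its own dump location.
    static Path externalFileListPath(String dumpLocation) {
        return Paths.get(dumpLocation, REPL_HIVE_BASE_DIR, FILE_LIST_EXTERNAL);
    }

    // Order-insensitive comparison of expected entries against the
    // file's lines; the caller reads the file itself.
    static boolean matchesExternalFileList(List<String> expected,
                                           List<String> actualLines) {
        return new HashSet<>(expected).equals(new HashSet<>(actualLines));
    }
}
```

With the path formation centralized, each test class's wrapper reduces to a one-line delegation, which is the shape the reviewer is asking for.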
[jira] [Work logged] (HIVE-24445) Non blocking DROP table implementation
[ https://issues.apache.org/jira/browse/HIVE-24445?focusedWorklogId=564619&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564619 ] ASF GitHub Bot logged work on HIVE-24445: - Author: ASF GitHub Bot Created on: 11/Mar/21 14:12 Start Date: 11/Mar/21 14:12 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #2020: URL: https://github.com/apache/hive/pull/2020#discussion_r592392273 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -1129,14 +1129,17 @@ public void createTable(Table tbl, boolean ifNotExists, principalPrivs.setRolePrivileges(grants.getRoleGrants()); tTbl.setPrivileges(principalPrivs); } +if (HiveConf.getBoolVar(conf, ConfVars.HIVE_TXN_LOCKLESS_READS_ENABLED) && AcidUtils.isTransactionalTable(tbl)) { Review comment: i don't think it would be a problem, but i'll double check. Thing is that I am indirectly passing HIVE_TXN_LOCKLESS_READS_ENABLED config value to HMS via the txnId attribute. If set - HIVE_TXN_LOCKLESS_READS_ENABLED was enabled. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 564619) Time Spent: 3h 10m (was: 3h) > Non blocking DROP table implementation > -- > > Key: HIVE-24445 > URL: https://issues.apache.org/jira/browse/HIVE-24445 > Project: Hive > Issue Type: New Feature > Components: Hive >Reporter: Zoltan Chovan >Assignee: Zoltan Chovan >Priority: Major > Labels: pull-request-available > Time Spent: 3h 10m > Remaining Estimate: 0h > > Implement a way to execute drop table operations in a way that doesn't have > to wait for currently running read operations to be finished. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24445) Non blocking DROP table implementation
[ https://issues.apache.org/jira/browse/HIVE-24445?focusedWorklogId=564623&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564623 ] ASF GitHub Bot logged work on HIVE-24445: - Author: ASF GitHub Bot Created on: 11/Mar/21 14:17 Start Date: 11/Mar/21 14:17 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #2020: URL: https://github.com/apache/hive/pull/2020#discussion_r592395791 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java ## @@ -2213,9 +2213,9 @@ private void create_table_core(final RawStore ms, final CreateTableRequest req) } if (!TableType.VIRTUAL_VIEW.toString().equals(tbl.getTableType())) { -if (tbl.getSd().getLocation() == null -|| tbl.getSd().getLocation().isEmpty()) { - tblPath = wh.getDefaultTablePath(db, tbl); +if (tbl.getSd().getLocation() == null || tbl.getSd().getLocation().isEmpty()) { Review comment: i don't think we support custom location for acid managed tables, do we? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 564623) Time Spent: 3.5h (was: 3h 20m) > Non blocking DROP table implementation > -- > > Key: HIVE-24445 > URL: https://issues.apache.org/jira/browse/HIVE-24445 > Project: Hive > Issue Type: New Feature > Components: Hive >Reporter: Zoltan Chovan >Assignee: Zoltan Chovan >Priority: Major > Labels: pull-request-available > Time Spent: 3.5h > Remaining Estimate: 0h > > Implement a way to execute drop table operations in a way that doesn't have > to wait for currently running read operations to be finished. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24445) Non blocking DROP table implementation
[ https://issues.apache.org/jira/browse/HIVE-24445?focusedWorklogId=564622&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564622 ] ASF GitHub Bot logged work on HIVE-24445: - Author: ASF GitHub Bot Created on: 11/Mar/21 14:14 Start Date: 11/Mar/21 14:14 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #2020: URL: https://github.com/apache/hive/pull/2020#discussion_r592393806 ## File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java ## @@ -2997,6 +2997,9 @@ private static void populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal HIVE_TXN_READONLY_ENABLED("hive.txn.readonly.enabled", false, "Enables read-only transaction classification and related optimizations"), +HIVE_TXN_LOCKLESS_READS_ENABLED("hive.txn.lockless.reads.enabled", false, Review comment: makes sense. i'll create a separate config for async drop, however HIVE_TXN_LOCKLESS_READS_ENABLED could be still leveraged to enable whole lockless read feature. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 564622) Time Spent: 3h 20m (was: 3h 10m) > Non blocking DROP table implementation > -- > > Key: HIVE-24445 > URL: https://issues.apache.org/jira/browse/HIVE-24445 > Project: Hive > Issue Type: New Feature > Components: Hive >Reporter: Zoltan Chovan >Assignee: Zoltan Chovan >Priority: Major > Labels: pull-request-available > Time Spent: 3h 20m > Remaining Estimate: 0h > > Implement a way to execute drop table operations in a way that doesn't have > to wait for currently running read operations to be finished. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24445) Non blocking DROP table implementation
[ https://issues.apache.org/jira/browse/HIVE-24445?focusedWorklogId=564616&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564616 ] ASF GitHub Bot logged work on HIVE-24445: - Author: ASF GitHub Bot Created on: 11/Mar/21 14:02 Start Date: 11/Mar/21 14:02 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #2020: URL: https://github.com/apache/hive/pull/2020#discussion_r592384397 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java ## @@ -2863,6 +2862,10 @@ private boolean drop_table_core(final RawStore ms, final String catName, final S deletePartitionData(partPaths, ifPurge, ReplChangeManager.shouldEnableCm(db, tbl)); // Delete the data in the table deleteTableData(tblPath, ifPurge, ReplChangeManager.shouldEnableCm(db, tbl)); + } else if (TxnUtils.isTransactionalTable(tbl)) { +CompactionRequest rqst = new CompactionRequest(dbname, name, CompactionType.MAJOR); Review comment: Yeah, that's just a placeholder for marking table as "ready for cleaning". This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 564616) Time Spent: 3h (was: 2h 50m) > Non blocking DROP table implementation > -- > > Key: HIVE-24445 > URL: https://issues.apache.org/jira/browse/HIVE-24445 > Project: Hive > Issue Type: New Feature > Components: Hive >Reporter: Zoltan Chovan >Assignee: Zoltan Chovan >Priority: Major > Labels: pull-request-available > Time Spent: 3h > Remaining Estimate: 0h > > Implement a way to execute drop table operations in a way that doesn't have > to wait for currently running read operations to be finished. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24873) TPCDS query51 doesn't vectorize: Only PTF directly under reduce-shuffle is supported
[ https://issues.apache.org/jira/browse/HIVE-24873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-24873: Description: {code} EXPLAIN VECTORIZATION DETAIL WITH web_v1 as ( select ws_item_sk item_sk, d_date, sum(sum(ws_sales_price)) over (partition by ws_item_sk order by d_date rows between unbounded preceding and current row) cume_sales from web_sales ,date_dim where ws_sold_date_sk=d_date_sk and d_month_seq between 1214 and 1214+11 and ws_item_sk is not NULL group by ws_item_sk, d_date), store_v1 as ( select ss_item_sk item_sk, d_date, sum(sum(ss_sales_price)) over (partition by ss_item_sk order by d_date rows between unbounded preceding and current row) cume_sales from store_sales ,date_dim where ss_sold_date_sk=d_date_sk and d_month_seq between 1214 and 1214+11 and ss_item_sk is not NULL group by ss_item_sk, d_date) select * from (select item_sk ,d_date ,web_sales ,store_sales ,max(web_sales) over (partition by item_sk order by d_date rows between unbounded preceding and current row) web_cumulative ,max(store_sales) over (partition by item_sk order by d_date rows between unbounded preceding and current row) store_cumulative from (select case when web.item_sk is not null then web.item_sk else store.item_sk end item_sk ,case when web.d_date is not null then web.d_date else store.d_date end d_date ,web.cume_sales web_sales ,store.cume_sales store_sales from web_v1 web full outer join store_v1 store on (web.item_sk = store.item_sk and web.d_date = store.d_date) )x )y where web_cumulative > store_cumulative order by item_sk ,d_date limit 100; {code} {code} Reducer 2 notVectorizedReason: PTF operator: Only PTF directly under reduce-shuffle is supported window functions: window function: GenericUDAFSumHiveDecimal window frame: ROWS PRECEDING(MAX)~CURRENT ... 
Reducer 8 notVectorizedReason: PTF operator: Only PTF directly under reduce-shuffle is supported window functions: window function: GenericUDAFSumHiveDecimal window frame: ROWS PRECEDING(MAX)~CURRENT | {code} The interesting part is: {code} explain vectorization detail select ws_item_sk item_sk, d_date, sum(sum(ws_sales_price)) over (partition by ws_item_sk order by d_date rows between unbounded preceding and current row) cume_sales from web_sales ,date_dim where ws_sold_date_sk=d_date_sk and d_month_seq between 1214 and 1214+11 and ws_item_sk is not NULL group by ws_item_sk, d_date; {code} the same applies to query63: {code} ... ,avg(sum(ss_sales_price)) over (partition by i_manager_id) avg_monthly_sales ... {code} was: {code} EXPLAIN VECTORIZATION DETAIL WITH web_v1 as ( select ws_item_sk item_sk, d_date, sum(sum(ws_sales_price)) over (partition by ws_item_sk order by d_date rows between unbounded preceding and current row) cume_sales from web_sales ,date_dim where ws_sold_date_sk=d_date_sk and d_month_seq between 1214 and 1214+11 and ws_item_sk is not NULL group by ws_item_sk, d_date), store_v1 as ( select ss_item_sk item_sk, d_date, sum(sum(ss_sales_price)) over (partition by ss_item_sk order by d_date rows between unbounded preceding and current row) cume_sales from store_sales ,date_dim where ss_sold_date_sk=d_date_sk and d_month_seq between 1214 and 1214+11 and ss_item_sk is not NULL group by ss_item_sk, d_date) select * from (select item_sk ,d_date ,web_sales ,store_sales ,max(web_sales) over (partition by item_sk order by d_date rows between unbounded preceding and current row) web_cumulative ,max(store_sales) over (partition by item_sk order by d_date rows between unbounded preceding and current row) store_cumulative from (select case when web.item_sk is not null then web.item_sk else store.item_sk end item_sk ,case when web.d_date is not null then web.d_date else store.d_date end d_date ,web.cume_sales web_sales ,store.cume_sales store_sales from web_v1 
web full outer join store_v1 store on (web.item_sk = store.item_sk and web.d_date = store.d_date) )x )y where web_cumulative > store_cumulative order by item_sk ,d_date limit 100; {code} {code} Reducer 2 notVectorizedReason: PTF operator: Only PTF directly under reduce-shuffle is supported window functions: window function: GenericUDAFSumHiveDecimal window frame: ROWS PRECEDING(MAX)~CURRENT ... Reducer 8 notVectorizedReason: PTF operator: Only PTF directly under reduce-shuffle is supported window functions: window function: GenericUDAFSumHiveDecimal window frame: ROWS PRECEDING(MAX)~CURRENT
[jira] [Updated] (HIVE-24873) TPCDS query51 doesn't vectorize: Only PTF directly under reduce-shuffle is supported
[ https://issues.apache.org/jira/browse/HIVE-24873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-24873: Description: {code} EXPLAIN VECTORIZATION DETAIL WITH web_v1 as ( select ws_item_sk item_sk, d_date, sum(sum(ws_sales_price)) over (partition by ws_item_sk order by d_date rows between unbounded preceding and current row) cume_sales from web_sales ,date_dim where ws_sold_date_sk=d_date_sk and d_month_seq between 1214 and 1214+11 and ws_item_sk is not NULL group by ws_item_sk, d_date), store_v1 as ( select ss_item_sk item_sk, d_date, sum(sum(ss_sales_price)) over (partition by ss_item_sk order by d_date rows between unbounded preceding and current row) cume_sales from store_sales ,date_dim where ss_sold_date_sk=d_date_sk and d_month_seq between 1214 and 1214+11 and ss_item_sk is not NULL group by ss_item_sk, d_date) select * from (select item_sk ,d_date ,web_sales ,store_sales ,max(web_sales) over (partition by item_sk order by d_date rows between unbounded preceding and current row) web_cumulative ,max(store_sales) over (partition by item_sk order by d_date rows between unbounded preceding and current row) store_cumulative from (select case when web.item_sk is not null then web.item_sk else store.item_sk end item_sk ,case when web.d_date is not null then web.d_date else store.d_date end d_date ,web.cume_sales web_sales ,store.cume_sales store_sales from web_v1 web full outer join store_v1 store on (web.item_sk = store.item_sk and web.d_date = store.d_date) )x )y where web_cumulative > store_cumulative order by item_sk ,d_date limit 100; {code} {code} Reducer 2 notVectorizedReason: PTF operator: Only PTF directly under reduce-shuffle is supported window functions: window function: GenericUDAFSumHiveDecimal window frame: ROWS PRECEDING(MAX)~CURRENT ... 
Reducer 8 notVectorizedReason: PTF operator: Only PTF directly under reduce-shuffle is supported window functions: window function: GenericUDAFSumHiveDecimal window frame: ROWS PRECEDING(MAX)~CURRENT | {code} The interesting part is: {code} explain vectorization detail select ws_item_sk item_sk, d_date, sum(sum(ws_sales_price)) over (partition by ws_item_sk order by d_date rows between unbounded preceding and current row) cume_sales from web_sales ,date_dim where ws_sold_date_sk=d_date_sk and d_month_seq between 1214 and 1214+11 and ws_item_sk is not NULL group by ws_item_sk, d_date; {code} was: {code} EXPLAIN VECTORIZATION DETAIL WITH web_v1 as ( select ws_item_sk item_sk, d_date, sum(sum(ws_sales_price)) over (partition by ws_item_sk order by d_date rows between unbounded preceding and current row) cume_sales from web_sales ,date_dim where ws_sold_date_sk=d_date_sk and d_month_seq between 1214 and 1214+11 and ws_item_sk is not NULL group by ws_item_sk, d_date), store_v1 as ( select ss_item_sk item_sk, d_date, sum(sum(ss_sales_price)) over (partition by ss_item_sk order by d_date rows between unbounded preceding and current row) cume_sales from store_sales ,date_dim where ss_sold_date_sk=d_date_sk and d_month_seq between 1214 and 1214+11 and ss_item_sk is not NULL group by ss_item_sk, d_date) select * from (select item_sk ,d_date ,web_sales ,store_sales ,max(web_sales) over (partition by item_sk order by d_date rows between unbounded preceding and current row) web_cumulative ,max(store_sales) over (partition by item_sk order by d_date rows between unbounded preceding and current row) store_cumulative from (select case when web.item_sk is not null then web.item_sk else store.item_sk end item_sk ,case when web.d_date is not null then web.d_date else store.d_date end d_date ,web.cume_sales web_sales ,store.cume_sales store_sales from web_v1 web full outer join store_v1 store on (web.item_sk = store.item_sk and web.d_date = store.d_date) )x )y where web_cumulative > 
store_cumulative order by item_sk ,d_date limit 100; {code} {code} Reducer 2 notVectorizedReason: PTF operator: Only PTF directly under reduce-shuffle is supported window functions: window function: GenericUDAFSumHiveDecimal window frame: ROWS PRECEDING(MAX)~CURRENT ... Reducer 8 notVectorizedReason: PTF operator: Only PTF directly under reduce-shuffle is supported window functions: window function: GenericUDAFSumHiveDecimal window frame: ROWS PRECEDING(MAX)~CURRENT | {code} > TPCDS query51 doesn't vectorize: Only PTF directly under reduce-shuffle is > supported >
[jira] [Work logged] (HIVE-24817) "not in" clause returns incorrect data when there is coercion
[ https://issues.apache.org/jira/browse/HIVE-24817?focusedWorklogId=564608&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564608 ] ASF GitHub Bot logged work on HIVE-24817: - Author: ASF GitHub Bot Created on: 11/Mar/21 13:49 Start Date: 11/Mar/21 13:49 Worklog Time Spent: 10m Work Description: scarlin-cloudera commented on a change in pull request #2027: URL: https://github.com/apache/hive/pull/2027#discussion_r592373912 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/type/TypeCheckProcFactory.java ## @@ -1007,17 +1001,12 @@ protected T getXpathOrFuncExprNodeDesc(ASTNode node, T columnDesc = children.get(0); T valueDesc = interpretNode(columnDesc, children.get(i)); if (valueDesc == null) { - if (hasNullValue) { -// Skip if null value has already been added -continue; - } - TypeInfo targetType = exprFactory.getTypeInfo(columnDesc); + // Keep original + TypeInfo targetType = exprFactory.getTypeInfo(children.get(i)); if (!expressions.containsKey(targetType)) { expressions.put(targetType, columnDesc); } - T nullConst = exprFactory.createConstantExpr(targetType, null); - expressions.put(targetType, nullConst); - hasNullValue = true; + expressions.put(targetType, children.get(i)); } else { Review comment: Yeah, not in (null) is a very weird construct. Equivalent to why you need to say "x IS NULL" and you can't say "x = NULL" This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 564608) Time Spent: 2h 10m (was: 2h) > "not in" clause returns incorrect data when there is coercion > - > > Key: HIVE-24817 > URL: https://issues.apache.org/jira/browse/HIVE-24817 > Project: Hive > Issue Type: Bug > Components: CBO >Reporter: Steve Carlin >Priority: Major > Labels: pull-request-available > Time Spent: 2h 10m > Remaining Estimate: 0h > > When the query has a where clause that has an integer column checking against > being "not in" a decimal column, the decimal column is being changed to null, > causing incorrect results. > This is a sample query of a failure: > select count(*) from my_tbl where int_col not in (355.8); > Since the int_col can never be 355.8, one would expect all the rows to be > returned, but it is changing the 355.8 into a null value causing no rows to > be returned. -- This message was sent by Atlassian Jira (v8.3.4#803005)
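The exchange above rests on SQL three-valued logic: once coercion turns a comparand into NULL, `x NOT IN (...)` can be at best UNKNOWN for every row, and a WHERE clause keeps only TRUE rows, which is why the query returned nothing. The sketch below emulates that evaluation in Java, with a `Boolean` `null` standing for SQL UNKNOWN; it illustrates the semantics only and is not Hive's evaluator.

```java
import java.util.Arrays;
import java.util.List;

public class NotInSemantics {
    // SQL: x IN (list) is TRUE if some equality is TRUE, UNKNOWN (null
    // here) if none is TRUE but a comparand is NULL, else FALSE.
    static Boolean in(Integer x, List<Integer> values) {
        if (x == null) {
            return null;
        }
        boolean sawNull = false;
        for (Integer v : values) {
            if (v == null) {
                sawNull = true;
            } else if (v.equals(x)) {
                return Boolean.TRUE;
            }
        }
        return sawNull ? null : Boolean.FALSE;
    }

    // SQL NOT flips TRUE/FALSE and leaves UNKNOWN as UNKNOWN.
    static Boolean notIn(Integer x, List<Integer> values) {
        Boolean r = in(x, values);
        return r == null ? null : !r;
    }

    public static void main(String[] args) {
        // A WHERE clause keeps a row only when the predicate is TRUE,
        // so UNKNOWN filters the row out: "not in (NULL)" drops all rows,
        // just as "x = NULL" is never TRUE.
        System.out.println(notIn(7, Arrays.asList(3, 5)));    // TRUE
        System.out.println(notIn(7, Arrays.asList(3, null))); // UNKNOWN
    }
}
```

This is exactly the failure mode in the issue description: `int_col not in (355.8)` should compare against the (never-equal) decimal, but coercing 355.8 to NULL made the predicate UNKNOWN for every row instead of TRUE.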
[jira] [Updated] (HIVE-24873) TPCDS query51 doesn't vectorize: reduce-shuffle is supported
[ https://issues.apache.org/jira/browse/HIVE-24873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-24873: Description: {code} EXPLAIN VECTORIZATION DETAIL WITH web_v1 as ( select ws_item_sk item_sk, d_date, sum(sum(ws_sales_price)) over (partition by ws_item_sk order by d_date rows between unbounded preceding and current row) cume_sales from web_sales ,date_dim where ws_sold_date_sk=d_date_sk and d_month_seq between 1214 and 1214+11 and ws_item_sk is not NULL group by ws_item_sk, d_date), store_v1 as ( select ss_item_sk item_sk, d_date, sum(sum(ss_sales_price)) over (partition by ss_item_sk order by d_date rows between unbounded preceding and current row) cume_sales from store_sales ,date_dim where ss_sold_date_sk=d_date_sk and d_month_seq between 1214 and 1214+11 and ss_item_sk is not NULL group by ss_item_sk, d_date) select * from (select item_sk ,d_date ,web_sales ,store_sales ,max(web_sales) over (partition by item_sk order by d_date rows between unbounded preceding and current row) web_cumulative ,max(store_sales) over (partition by item_sk order by d_date rows between unbounded preceding and current row) store_cumulative from (select case when web.item_sk is not null then web.item_sk else store.item_sk end item_sk ,case when web.d_date is not null then web.d_date else store.d_date end d_date ,web.cume_sales web_sales ,store.cume_sales store_sales from web_v1 web full outer join store_v1 store on (web.item_sk = store.item_sk and web.d_date = store.d_date) )x )y where web_cumulative > store_cumulative order by item_sk ,d_date limit 100; {code} {code} Reducer 2 notVectorizedReason: PTF operator: Only PTF directly under reduce-shuffle is supported window functions: window function: GenericUDAFSumHiveDecimal window frame: ROWS PRECEDING(MAX)~CURRENT ... 
Reducer 8 notVectorizedReason: PTF operator: Only PTF directly under reduce-shuffle is supported window functions: window function: GenericUDAFSumHiveDecimal window frame: ROWS PRECEDING(MAX)~CURRENT | {code} was: {code} EXPLAIN VECTORIZATION DETAIL WITH web_v1 as ( select ws_item_sk item_sk, d_date, sum(sum(ws_sales_price)) over (partition by ws_item_sk order by d_date rows between unbounded preceding and current row) cume_sales from web_sales ,date_dim where ws_sold_date_sk=d_date_sk and d_month_seq between 1214 and 1214+11 and ws_item_sk is not NULL group by ws_item_sk, d_date), store_v1 as ( select ss_item_sk item_sk, d_date, sum(sum(ss_sales_price)) over (partition by ss_item_sk order by d_date rows between unbounded preceding and current row) cume_sales from store_sales ,date_dim where ss_sold_date_sk=d_date_sk and d_month_seq between 1214 and 1214+11 and ss_item_sk is not NULL group by ss_item_sk, d_date) select * from (select item_sk ,d_date ,web_sales ,store_sales ,max(web_sales) over (partition by item_sk order by d_date rows between unbounded preceding and current row) web_cumulative ,max(store_sales) over (partition by item_sk order by d_date rows between unbounded preceding and current row) store_cumulative from (select case when web.item_sk is not null then web.item_sk else store.item_sk end item_sk ,case when web.d_date is not null then web.d_date else store.d_date end d_date ,web.cume_sales web_sales ,store.cume_sales store_sales from web_v1 web full outer join store_v1 store on (web.item_sk = store.item_sk and web.d_date = store.d_date) )x )y where web_cumulative > store_cumulative order by item_sk ,d_date limit 100; {code} {code} | Reducer 2 | ... 
| notVectorizedReason: PTF operator: Only PTF directly under reduce-shuffle is supported | {code} > TPCDS query51 doesn't vectorize: reduce-shuffle is supported > - > > Key: HIVE-24873 > URL: https://issues.apache.org/jira/browse/HIVE-24873 > Project: Hive > Issue Type: Sub-task >Reporter: László Bodor >Priority: Major > > {code} > EXPLAIN VECTORIZATION DETAIL WITH web_v1 as ( > select > ws_item_sk item_sk, d_date, > sum(sum(ws_sales_price)) > over (partition by ws_item_sk order by d_date rows between unbounded > preceding and current row) cume_sales > from web_sales > ,date_dim > where ws_sold_date_sk=d_date_sk > and d_month_seq between 1214 and 1214+11 > and ws_item_sk is not NULL > group by ws_item_sk, d_date),
[jira] [Updated] (HIVE-24873) TPCDS query51 doesn't vectorize: Only PTF directly under reduce-shuffle is supported
[ https://issues.apache.org/jira/browse/HIVE-24873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-24873: Summary: TPCDS query51 doesn't vectorize: Only PTF directly under reduce-shuffle is supported (was: TPCDS query51 doesn't vectorize: reduce-shuffle is supported) > TPCDS query51 doesn't vectorize: Only PTF directly under reduce-shuffle is > supported > -- > > Key: HIVE-24873 > URL: https://issues.apache.org/jira/browse/HIVE-24873 > Project: Hive > Issue Type: Sub-task >Reporter: László Bodor >Priority: Major > > {code} > EXPLAIN VECTORIZATION DETAIL WITH web_v1 as ( > select > ws_item_sk item_sk, d_date, > sum(sum(ws_sales_price)) > over (partition by ws_item_sk order by d_date rows between unbounded > preceding and current row) cume_sales > from web_sales > ,date_dim > where ws_sold_date_sk=d_date_sk > and d_month_seq between 1214 and 1214+11 > and ws_item_sk is not NULL > group by ws_item_sk, d_date), > store_v1 as ( > select > ss_item_sk item_sk, d_date, > sum(sum(ss_sales_price)) > over (partition by ss_item_sk order by d_date rows between unbounded > preceding and current row) cume_sales > from store_sales > ,date_dim > where ss_sold_date_sk=d_date_sk > and d_month_seq between 1214 and 1214+11 > and ss_item_sk is not NULL > group by ss_item_sk, d_date) > select * > from (select item_sk > ,d_date > ,web_sales > ,store_sales > ,max(web_sales) > over (partition by item_sk order by d_date rows between unbounded > preceding and current row) web_cumulative > ,max(store_sales) > over (partition by item_sk order by d_date rows between unbounded > preceding and current row) store_cumulative > from (select case when web.item_sk is not null then web.item_sk else > store.item_sk end item_sk > ,case when web.d_date is not null then web.d_date else > store.d_date end d_date > ,web.cume_sales web_sales > ,store.cume_sales store_sales >from web_v1 web full outer join store_v1 store on (web.item_sk = > store.item_sk > and 
web.d_date = > store.d_date) > )x )y > where web_cumulative > store_cumulative > order by item_sk > ,d_date > limit 100; > {code} > {code} > Reducer 2 > notVectorizedReason: PTF operator: Only PTF directly under reduce-shuffle is > supported > window functions: > window function: GenericUDAFSumHiveDecimal > window frame: ROWS PRECEDING(MAX)~CURRENT > ... > Reducer 8 > notVectorizedReason: PTF operator: Only PTF directly under reduce-shuffle is > supported > window functions: > window function: GenericUDAFSumHiveDecimal > window frame: ROWS PRECEDING(MAX)~CURRENT | > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
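The notVectorizedReason above comes from a placement check in the Vectorizer. As a rough illustration (simplified stand-in operator types, not Hive's actual Vectorizer code), the check has this shape: a PTF operator passes only when the operator feeding it is the reduce-shuffle output itself:

```java
import java.util.List;

// Sketch only: the shape of the "Only PTF directly under reduce-shuffle is
// supported" check. Operator names are simplified stand-ins, not Hive's
// actual operator classes.
class PtfPlacementSketch {
    enum Op { REDUCE_SHUFFLE, SELECT, GROUP_BY, PTF }

    /** A PTF vectorizes only if the operator directly feeding it is the reduce-shuffle. */
    static boolean ptfVectorizable(List<Op> pipeline) {
        for (int i = 0; i < pipeline.size(); i++) {
            if (pipeline.get(i) == Op.PTF) {
                return i > 0 && pipeline.get(i - 1) == Op.REDUCE_SHUFFLE;
            }
        }
        return true; // no PTF in this pipeline, nothing to reject
    }
}
```

In query51 the windowed sums are apparently stacked above other operators in Reducer 2 and Reducer 8, so the PTF does not sit directly on the shuffle output and the reducers fall back to row mode.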
[jira] [Updated] (HIVE-24874) Worker performance metric
[ https://issues.apache.org/jira/browse/HIVE-24874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denys Kuzmenko updated HIVE-24874: -- Description: Wrap Compaction Worker with PerformanceLogger. Major and Minor compactions should be measured as separate metrics. > Worker performance metric > - > > Key: HIVE-24874 > URL: https://issues.apache.org/jira/browse/HIVE-24874 > Project: Hive > Issue Type: Sub-task >Reporter: Denys Kuzmenko >Priority: Major > > Wrap Compaction Worker with PerformanceLogger. > Major and Minor compactions should be measured as separate metrics. -- This message was sent by Atlassian Jira (v8.3.4#803005)
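A minimal sketch of what measuring major and minor compactions as separate metrics can look like (illustrative class and metric names, not the actual Hive PerformanceLogger API): the Worker's compaction run is timed and the elapsed nanoseconds are accumulated under a per-type key:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: times a compaction run and accumulates the elapsed time under
// a per-type metric key, which is roughly what wrapping the Worker with a
// PerformanceLogger gives you. Names here are illustrative, not Hive APIs.
class CompactionTimerSketch {
    enum CompactionType { MAJOR, MINOR }

    private final Map<String, AtomicLong> totalNanos = new ConcurrentHashMap<>();

    void timed(CompactionType type, Runnable compaction) {
        long start = System.nanoTime();
        try {
            compaction.run();
        } finally {
            // Separate metric per compaction type, e.g. "compaction_worker_major".
            String key = "compaction_worker_" + type.name().toLowerCase();
            totalNanos.computeIfAbsent(key, k -> new AtomicLong())
                      .addAndGet(System.nanoTime() - start);
        }
    }

    /** Accumulated time for a metric key, or -1 if never recorded. */
    long totalFor(String key) {
        AtomicLong v = totalNanos.get(key);
        return v == null ? -1 : v.get();
    }
}
```

Timing in the `finally` block means a failed compaction is still measured, which matters if failure runtimes are the ones you want to observe.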
[jira] [Updated] (HIVE-24873) TPCDS query51 doesn't vectorize: reduce-shuffle is supported
[ https://issues.apache.org/jira/browse/HIVE-24873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-24873: Description: {code} EXPLAIN VECTORIZATION DETAIL WITH web_v1 as ( select ws_item_sk item_sk, d_date, sum(sum(ws_sales_price)) over (partition by ws_item_sk order by d_date rows between unbounded preceding and current row) cume_sales from web_sales ,date_dim where ws_sold_date_sk=d_date_sk and d_month_seq between 1214 and 1214+11 and ws_item_sk is not NULL group by ws_item_sk, d_date), store_v1 as ( select ss_item_sk item_sk, d_date, sum(sum(ss_sales_price)) over (partition by ss_item_sk order by d_date rows between unbounded preceding and current row) cume_sales from store_sales ,date_dim where ss_sold_date_sk=d_date_sk and d_month_seq between 1214 and 1214+11 and ss_item_sk is not NULL group by ss_item_sk, d_date) select * from (select item_sk ,d_date ,web_sales ,store_sales ,max(web_sales) over (partition by item_sk order by d_date rows between unbounded preceding and current row) web_cumulative ,max(store_sales) over (partition by item_sk order by d_date rows between unbounded preceding and current row) store_cumulative from (select case when web.item_sk is not null then web.item_sk else store.item_sk end item_sk ,case when web.d_date is not null then web.d_date else store.d_date end d_date ,web.cume_sales web_sales ,store.cume_sales store_sales from web_v1 web full outer join store_v1 store on (web.item_sk = store.item_sk and web.d_date = store.d_date) )x )y where web_cumulative > store_cumulative order by item_sk ,d_date limit 100; {code} {code} | Reducer 2 | ... 
| notVectorizedReason: PTF operator: Only PTF directly under reduce-shuffle is supported | {code} > TPCDS query51 doesn't vectorize: reduce-shuffle is supported > - > > Key: HIVE-24873 > URL: https://issues.apache.org/jira/browse/HIVE-24873 > Project: Hive > Issue Type: Sub-task >Reporter: László Bodor >Priority: Major > > {code} > EXPLAIN VECTORIZATION DETAIL WITH web_v1 as ( > select > ws_item_sk item_sk, d_date, > sum(sum(ws_sales_price)) > over (partition by ws_item_sk order by d_date rows between unbounded > preceding and current row) cume_sales > from web_sales > ,date_dim > where ws_sold_date_sk=d_date_sk > and d_month_seq between 1214 and 1214+11 > and ws_item_sk is not NULL > group by ws_item_sk, d_date), > store_v1 as ( > select > ss_item_sk item_sk, d_date, > sum(sum(ss_sales_price)) > over (partition by ss_item_sk order by d_date rows between unbounded > preceding and current row) cume_sales > from store_sales > ,date_dim > where ss_sold_date_sk=d_date_sk > and d_month_seq between 1214 and 1214+11 > and ss_item_sk is not NULL > group by ss_item_sk, d_date) > select * > from (select item_sk > ,d_date > ,web_sales > ,store_sales > ,max(web_sales) > over (partition by item_sk order by d_date rows between unbounded > preceding and current row) web_cumulative > ,max(store_sales) > over (partition by item_sk order by d_date rows between unbounded > preceding and current row) store_cumulative > from (select case when web.item_sk is not null then web.item_sk else > store.item_sk end item_sk > ,case when web.d_date is not null then web.d_date else > store.d_date end d_date > ,web.cume_sales web_sales > ,store.cume_sales store_sales >from web_v1 web full outer join store_v1 store on (web.item_sk = > store.item_sk > and web.d_date = > store.d_date) > )x )y > where web_cumulative > store_cumulative > order by item_sk > ,d_date > limit 100; > {code} > {code} > | Reducer 2 | > ... 
> | notVectorizedReason: PTF operator: Only PTF directly under > reduce-shuffle is supported | > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24761) Support vectorization for bounded windows in PTF
[ https://issues.apache.org/jira/browse/HIVE-24761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-24761: Parent: HIVE-24872 Issue Type: Sub-task (was: Improvement) > Support vectorization for bounded windows in PTF > > > Key: HIVE-24761 > URL: https://issues.apache.org/jira/browse/HIVE-24761 > Project: Hive > Issue Type: Sub-task >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > > {code} > notVectorizedReason: PTF operator: *** only UNBOUNDED start frame is > supported > {code} > Currently, bounded windows are not supported in VectorPTFOperator. If we > simply remove the check compile-time: > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L2911 > {code} > if (!windowFrameDef.isStartUnbounded()) { > setOperatorIssue(functionName + " only UNBOUNDED start frame is > supported"); > return false; > } > {code} > We get incorrect results, that's because vectorized codepath completely > ignores boundaries, and simply iterates through all the input batches in > [VectorPTFGroupBatches|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ptf/VectorPTFGroupBatches.java#L172]: > {code} > for (VectorPTFEvaluatorBase evaluator : evaluators) { > evaluator.evaluateGroupBatch(batch); > if (isLastGroupBatch) { > evaluator.doLastBatchWork(); > } > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
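To see why ignoring the frame produces incorrect results, compare an unbounded-start running sum (what the vectorized path effectively computes when it iterates over all group batches) with a bounded ROWS BETWEEN 1 PRECEDING AND CURRENT ROW frame; a plain-Java sketch with hypothetical helper names:

```java
import java.util.Arrays;

// Sketch only: contrasts an UNBOUNDED-start running sum (what the vectorized
// path effectively computes) with a bounded "1 PRECEDING" frame. Method names
// are hypothetical, not Hive APIs.
class FrameSketch {

    // ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    static long[] unboundedRunningSum(long[] values) {
        long[] out = new long[values.length];
        long sum = 0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i];
            out[i] = sum;
        }
        return out;
    }

    // ROWS BETWEEN <preceding> PRECEDING AND CURRENT ROW
    static long[] boundedRunningSum(long[] values, int preceding) {
        long[] out = new long[values.length];
        for (int i = 0; i < values.length; i++) {
            long sum = 0;
            for (int j = Math.max(0, i - preceding); j <= i; j++) {
                sum += values[j];
            }
            out[i] = sum;
        }
        return out;
    }

    public static void main(String[] args) {
        long[] prices = {1, 2, 3, 4};
        System.out.println(Arrays.toString(unboundedRunningSum(prices)));  // [1, 3, 6, 10]
        System.out.println(Arrays.toString(boundedRunningSum(prices, 1))); // [1, 3, 5, 7]
    }
}
```

The two results diverge from the second row on, which is why simply removing the compile-time check is not enough: the evaluators would need to track frame boundaries across batches.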
[jira] [Work logged] (HIVE-24824) Define metrics for compaction observability
[ https://issues.apache.org/jira/browse/HIVE-24824?focusedWorklogId=564567&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564567 ] ASF GitHub Bot logged work on HIVE-24824: - Author: ASF GitHub Bot Created on: 11/Mar/21 12:33 Start Date: 11/Mar/21 12:33 Worklog Time Spent: 10m Work Description: pvargacl commented on a change in pull request #2016: URL: https://github.com/apache/hive/pull/2016#discussion_r592322560 ## File path: ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestInitiator.java ## @@ -865,7 +865,7 @@ public void processCompactionCandidatesInParallel() throws Exception { } @Test - public void testInitiatorMetricsEnabled() throws Exception { + public void testAcidMetricsEnabled() throws Exception { Review comment: Moved all the tests to a common place, TestCompactionMetrics. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 564567) Time Spent: 1h 10m (was: 1h) > Define metrics for compaction observability > --- > > Key: HIVE-24824 > URL: https://issues.apache.org/jira/browse/HIVE-24824 > Project: Hive > Issue Type: Improvement >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > Many times if there are failures in the Compaction background processes > (Initiator, Worker, Cleaner) it is hard to notice the problem until it causes > serious performance degradation. > We should create new JMX metrics that would make it easier to monitor the > compaction health. 
Examples are: > * number of failed / initiated compaction > * number of aborted txns, oldest aborted txns > * tables with disabled compactions and writes > * Initiator and Cleaner cycle runtime > * Size of ACID metadata tables that should have ~ constant rows > (txn_to_writeId, completed_txns) > -- This message was sent by Atlassian Jira (v8.3.4#803005)
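As a hedged sketch of the proposal (illustrative ObjectName and attribute names, not what Hive actually registers), two of the listed counters exposed as a standard JMX MBean could look like:

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.ObjectName;

// Sketch only: backs two of the proposed counters with a standard MBean so
// they show up over JMX. The ObjectName and attribute names are illustrative.
class CompactionMetricsSketch {

    public interface CompactionMetricsMBean {
        long getInitiatedCompactions();
        long getFailedCompactions();
    }

    public static class CompactionMetrics implements CompactionMetricsMBean {
        final AtomicLong initiated = new AtomicLong();
        final AtomicLong failed = new AtomicLong();
        @Override public long getInitiatedCompactions() { return initiated.get(); }
        @Override public long getFailedCompactions() { return failed.get(); }
    }

    /** Registers the counters on the platform MBean server. */
    static CompactionMetrics register() {
        try {
            CompactionMetrics metrics = new CompactionMetrics();
            ManagementFactory.getPlatformMBeanServer().registerMBean(
                metrics, new ObjectName("metrics:type=compaction")); // illustrative name
            return metrics;
        } catch (Exception e) {
            throw new IllegalStateException("MBean registration failed", e);
        }
    }
}
```

Counters like these can then be scraped by any JMX-capable monitoring agent, which is the observability goal described in the ticket.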
[jira] [Work logged] (HIVE-24867) Create iceberg-handler module in Hive
[ https://issues.apache.org/jira/browse/HIVE-24867?focusedWorklogId=564560&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564560 ] ASF GitHub Bot logged work on HIVE-24867: - Author: ASF GitHub Bot Created on: 11/Mar/21 12:16 Start Date: 11/Mar/21 12:16 Worklog Time Spent: 10m Work Description: pvary commented on pull request #2058: URL: https://github.com/apache/hive/pull/2058#issuecomment-796694558 Checked the relevant files, left a few comments This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 564560) Time Spent: 2h 10m (was: 2h) > Create iceberg-handler module in Hive > - > > Key: HIVE-24867 > URL: https://issues.apache.org/jira/browse/HIVE-24867 > Project: Hive > Issue Type: Improvement >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 2h 10m > Remaining Estimate: 0h > > * Create a new iceberg-handler module in Hive > * Copy the code from the Iceberg/iceberg-mr module into this new Hive module > * Make necessary changes so it compiles with Hive 4.0.0 dependencies > (iceberg-mr code was based on Hive 3.1) > * Make sure all tests pass -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24867) Create iceberg-handler module in Hive
[ https://issues.apache.org/jira/browse/HIVE-24867?focusedWorklogId=564553&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564553 ] ASF GitHub Bot logged work on HIVE-24867: - Author: ASF GitHub Bot Created on: 11/Mar/21 12:07 Start Date: 11/Mar/21 12:07 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2058: URL: https://github.com/apache/hive/pull/2058#discussion_r592306506 ## File path: iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergFilterFactory.java ## @@ -230,7 +231,8 @@ public void testDateType() { public void testTimestampType() { Literal timestampLiteral = Literal.of("2012-10-02T05:16:17.123456").to(Types.TimestampType.withoutZone()); long timestampMicros = timestampLiteral.value(); -Timestamp ts = Timestamp.valueOf(DateTimeUtil.timestampFromMicros(timestampMicros)); +// `org.apache.hadoop.hive.common.type.Timestamp.valueOf(lit.toString()).toSqlTimestamp()` +Timestamp ts = Timestamp.from(DateTimeUtil.timestampFromMicros(timestampMicros).toInstant(ZoneOffset.UTC)); Review comment: Same problem as with the comment in `HiveIcebergFilterFactory`. Either remove the comment or rephrase it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 564553) Time Spent: 2h (was: 1h 50m) > Create iceberg-handler module in Hive > - > > Key: HIVE-24867 > URL: https://issues.apache.org/jira/browse/HIVE-24867 > Project: Hive > Issue Type: Improvement >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > > * Create a new iceberg-handler module in Hive > * Copy the code from the Iceberg/iceberg-mr module into this new Hive module > * Make necessary changes so it compiles with Hive 4.0.0 dependencies > (iceberg-mr code was based on Hive 3.1) > * Make sure all tests pass -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24871) Initiator / Cleaner performance metrics
[ https://issues.apache.org/jira/browse/HIVE-24871?focusedWorklogId=564546&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564546 ] ASF GitHub Bot logged work on HIVE-24871: - Author: ASF GitHub Bot Created on: 11/Mar/21 11:59 Start Date: 11/Mar/21 11:59 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #2061: URL: https://github.com/apache/hive/pull/2061#discussion_r592301473 ## File path: ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestInitiator.java ## @@ -44,11 +44,7 @@ import org.junit.Assert; import org.junit.Test; -import java.util.ArrayList; -import java.util.Collections; -import java.util.HashMap; -import java.util.List; -import java.util.Map; +import java.util.*; Review comment: fixed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 564546) Time Spent: 50m (was: 40m) > Initiator / Cleaner performance metrics > --- > > Key: HIVE-24871 > URL: https://issues.apache.org/jira/browse/HIVE-24871 > Project: Hive > Issue Type: Sub-task >Reporter: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > The PerformanceLogger should be used in Initiator and Cleaner service. > * One cycle of Initiator should be measured, with ignoring the time spent > waiting on the lock for AUX table > * One compaction cleanup should be measured in Cleaner (using different > metric for major and minor compaction cleanup) > Important note: the PerformanceLogger implementation from metastore should be > used (not the ql one) otherwise the metric won't be published in HMS. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24867) Create iceberg-handler module in Hive
[ https://issues.apache.org/jira/browse/HIVE-24867?focusedWorklogId=564545&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564545 ] ASF GitHub Bot logged work on HIVE-24867: - Author: ASF GitHub Bot Created on: 11/Mar/21 11:58 Start Date: 11/Mar/21 11:58 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2058: URL: https://github.com/apache/hive/pull/2058#discussion_r592300835 ## File path: iceberg-handler/pom.xml ## @@ -16,41 +16,46 @@ .. -0.11.0 4.0.2 -1.9.2 +1.9.2 Review comment: Maybe it is just a little bit misleading name. We will use the 1.9.2 avro for Iceberg as well too. Shall we just rename all the iceberg specific versions to: - iceberg.kryo.version - iceberg.avro.version ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 564545) Time Spent: 1h 50m (was: 1h 40m) > Create iceberg-handler module in Hive > - > > Key: HIVE-24867 > URL: https://issues.apache.org/jira/browse/HIVE-24867 > Project: Hive > Issue Type: Improvement >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > * Create a new iceberg-handler module in Hive > * Copy the code from the Iceberg/iceberg-mr module into this new Hive module > * Make necessary changes so it compiles with Hive 4.0.0 dependencies > (iceberg-mr code was based on Hive 3.1) > * Make sure all tests pass -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24871) Initiator / Cleaner performance metrics
[ https://issues.apache.org/jira/browse/HIVE-24871?focusedWorklogId=564544&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564544 ] ASF GitHub Bot logged work on HIVE-24871: - Author: ASF GitHub Bot Created on: 11/Mar/21 11:56 Start Date: 11/Mar/21 11:56 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #2061: URL: https://github.com/apache/hive/pull/2061#discussion_r592299917 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java ## @@ -171,9 +176,12 @@ public void run() { StringUtils.stringifyException(t)); } finally { - if(handle != null) { + if (handle != null) { handle.releaseLocks(); } + if (metricsEnabled) { Review comment: checked: it won't fail and won't report metric in this case This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 564544) Time Spent: 40m (was: 0.5h) > Initiator / Cleaner performance metrics > --- > > Key: HIVE-24871 > URL: https://issues.apache.org/jira/browse/HIVE-24871 > Project: Hive > Issue Type: Sub-task >Reporter: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > The PerformanceLogger should be used in Initiator and Cleaner service. > * One cycle of Initiator should be measured, with ignoring the time spent > waiting on the lock for AUX table > * One compaction cleanup should be measured in Cleaner (using different > metric for major and minor compaction cleanup) > Important note: the PerformanceLogger implementation from metastore should be > used (not the ql one) otherwise the metric won't be published in HMS. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24867) Create iceberg-handler module in Hive
[ https://issues.apache.org/jira/browse/HIVE-24867?focusedWorklogId=564542&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564542 ] ASF GitHub Bot logged work on HIVE-24867: - Author: ASF GitHub Bot Created on: 11/Mar/21 11:53 Start Date: 11/Mar/21 11:53 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2058: URL: https://github.com/apache/hive/pull/2058#discussion_r592297779 ## File path: iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergFilterFactory.java ## @@ -190,6 +190,7 @@ private static int daysFromTimestamp(Timestamp timestamp) { // We have to use the LocalDateTime to get the micros. See the comment above. private static long microsFromTimestamp(Timestamp timestamp) { // `org.apache.hadoop.hive.common.type.Timestamp.valueOf(lit.toString()).toSqlTimestamp()` +// since HIVE-21862 changes literal parsing to UTC based timestamps Review comment: The comment in this form does not make sense to me 😢 Either: ``` // HIVE-21862 changes literal parsing to UTC based timestamps to this: // `org.apache.hadoop.hive.common.type.Timestamp.valueOf(lit.toString()).toSqlTimestamp()` ``` Or just remove the comment? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 564542) Time Spent: 1h 40m (was: 1.5h) > Create iceberg-handler module in Hive > - > > Key: HIVE-24867 > URL: https://issues.apache.org/jira/browse/HIVE-24867 > Project: Hive > Issue Type: Improvement >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > * Create a new iceberg-handler module in Hive > * Copy the code from the Iceberg/iceberg-mr module into this new Hive module > * Make necessary changes so it compiles with Hive 4.0.0 dependencies > (iceberg-mr code was based on Hive 3.1) > * Make sure all tests pass -- This message was sent by Atlassian Jira (v8.3.4#803005)
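For context on the UTC-based conversion debated in the review comments above, a hedged sketch (illustrative helper names, not the actual Iceberg or Hive code) of round-tripping microseconds-since-epoch through LocalDateTime at UTC:

```java
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Sketch only: round-trips microseconds-since-epoch through LocalDateTime at
// UTC, the conversion style discussed in the review thread. Helper names are
// illustrative, not the Iceberg/Hive implementation.
class MicrosSketch {

    static LocalDateTime localDateTimeFromMicros(long micros) {
        long seconds = Math.floorDiv(micros, 1_000_000L);
        long nanos = Math.floorMod(micros, 1_000_000L) * 1_000L;
        return LocalDateTime.ofEpochSecond(seconds, (int) nanos, ZoneOffset.UTC);
    }

    static long microsFromLocalDateTime(LocalDateTime ldt) {
        return ldt.toEpochSecond(ZoneOffset.UTC) * 1_000_000L + ldt.getNano() / 1_000L;
    }

    static Timestamp sqlTimestampFromMicros(long micros) {
        // Interpret the micros as a UTC instant, matching UTC-based literal parsing.
        return Timestamp.from(localDateTimeFromMicros(micros).toInstant(ZoneOffset.UTC));
    }
}
```

The round trip preserves microsecond precision, since both directions carry the sub-second part as whole microseconds.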
[jira] [Work logged] (HIVE-24871) Initiator / Cleaner performance metrics
[ https://issues.apache.org/jira/browse/HIVE-24871?focusedWorklogId=564537&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564537 ] ASF GitHub Bot logged work on HIVE-24871: - Author: ASF GitHub Bot Created on: 11/Mar/21 11:43 Start Date: 11/Mar/21 11:43 Worklog Time Spent: 10m Work Description: pvargacl commented on a change in pull request #2061: URL: https://github.com/apache/hive/pull/2061#discussion_r592291682 ## File path: ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestInitiator.java ## @@ -44,11 +44,7 @@ import org.junit.Assert; import org.junit.Test; -import java.util.ArrayList; -import java.util.Collections; -import java.util.HashMap; -import java.util.List; -import java.util.Map; +import java.util.*; Review comment: revert this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 564537) Time Spent: 0.5h (was: 20m) > Initiator / Cleaner performance metrics > --- > > Key: HIVE-24871 > URL: https://issues.apache.org/jira/browse/HIVE-24871 > Project: Hive > Issue Type: Sub-task >Reporter: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > The PerformanceLogger should be used in Initiator and Cleaner service. > * One cycle of Initiator should be measured, with ignoring the time spent > waiting on the lock for AUX table > * One compaction cleanup should be measured in Cleaner (using different > metric for major and minor compaction cleanup) > Important note: the PerformanceLogger implementation from metastore should be > used (not the ql one) otherwise the metric won't be published in HMS. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24871) Initiator / Cleaner performance metrics
[ https://issues.apache.org/jira/browse/HIVE-24871?focusedWorklogId=564535&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564535 ] ASF GitHub Bot logged work on HIVE-24871: - Author: ASF GitHub Bot Created on: 11/Mar/21 11:41 Start Date: 11/Mar/21 11:41 Worklog Time Spent: 10m Work Description: pvargacl commented on a change in pull request #2061: URL: https://github.com/apache/hive/pull/2061#discussion_r592290451 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java ## @@ -171,9 +176,12 @@ public void run() { StringUtils.stringifyException(t)); } finally { - if(handle != null) { + if (handle != null) { handle.releaseLocks(); } + if (metricsEnabled) { Review comment: Won't this fail if the acquireLock times out and the PerfLogger was not started? The PerfLogger in ql has a method for checking this, startTimeHasMethod; maybe it is worth copying that and using it here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 564535) Time Spent: 20m (was: 10m) > Initiator / Cleaner performance metrics > --- > > Key: HIVE-24871 > URL: https://issues.apache.org/jira/browse/HIVE-24871 > Project: Hive > Issue Type: Sub-task >Reporter: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The PerformanceLogger should be used in Initiator and Cleaner service. 
> * One cycle of Initiator should be measured, with ignoring the time spent > waiting on the lock for AUX table > * One compaction cleanup should be measured in Cleaner (using different > metric for major and minor compaction cleanup) > Important note: the PerformanceLogger implementation from metastore should be > used (not the ql one) otherwise the metric won't be published in HMS. -- This message was sent by Atlassian Jira (v8.3.4#803005)
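The concern raised in the review above, ending a perf measurement that was never begun, can be handled with a guard like the ql PerfLogger's startTimeHasMethod check. A minimal sketch (not the actual Hive PerfLogger API):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a perf logger where ending a measurement that was never begun
// is a no-op returning -1 instead of an error, mirroring the kind of
// "was this timer started?" guard discussed above. Not the Hive PerfLogger.
class GuardedPerfLogger {
    private final Map<String, Long> startTimes = new HashMap<>();

    void begin(String method) {
        startTimes.put(method, System.nanoTime());
    }

    boolean startTimeHasBeenRecorded(String method) {
        return startTimes.containsKey(method);
    }

    long end(String method) {
        Long start = startTimes.remove(method);
        if (start == null) {
            return -1; // end without begin: report nothing instead of failing
        }
        return System.nanoTime() - start;
    }
}
```

With such a guard, the `finally` block can unconditionally call `end` even when the lock-acquisition path skipped `begin`.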
[jira] [Work logged] (HIVE-24867) Create iceberg-handler module in Hive
[ https://issues.apache.org/jira/browse/HIVE-24867?focusedWorklogId=564534&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564534 ] ASF GitHub Bot logged work on HIVE-24867: - Author: ASF GitHub Bot Created on: 11/Mar/21 11:40 Start Date: 11/Mar/21 11:40 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2058: URL: https://github.com/apache/hive/pull/2058#discussion_r592290164 ## File path: iceberg-handler/pom.xml ## @@ -0,0 +1,189 @@ + +http://maven.apache.org/POM/4.0.0"; + xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"; + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd";> + +org.apache.hive +hive +4.0.0-SNAPSHOT +../pom.xml + +4.0.0 + +iceberg-handler +jar +Hive Iceberg Handler + + +.. +0.11.0 +4.0.2 +1.9.2 + + + + +org.apache.iceberg +iceberg-api +${iceberg-api.version} + + +org.apache.iceberg +iceberg-core +${iceberg-api.version} + + +org.apache.iceberg +iceberg-hive-metastore +${iceberg-api.version} + + +org.apache.iceberg +iceberg-data +${iceberg-api.version} + + +org.apache.iceberg +iceberg-parquet +${iceberg-api.version} + + +org.apache.iceberg +iceberg-orc +${iceberg-api.version} + + + +org.apache.hadoop +hadoop-client +${hadoop.version} + + +org.apache.avro +avro + + + + + +org.apache.hive +hive-exec +${project.version} + + Review comment: I would try to focus on keeping the same source files as in the Iceberg repo, so we can easily port changes between the 2, but otherwise I would not try to stick to the same things just because it was the same there. OTOH if we know the reason why they were removed and it applies here too, then we should do the same This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 564534) Time Spent: 1.5h (was: 1h 20m) > Create iceberg-handler module in Hive > - > > Key: HIVE-24867 > URL: https://issues.apache.org/jira/browse/HIVE-24867 > Project: Hive > Issue Type: Improvement >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > * Create a new iceberg-handler module in Hive > * Copy the code from the Iceberg/iceberg-mr module into this new Hive module > * Make necessary changes so it compiles with Hive 4.0.0 dependencies > (iceberg-mr code was based on Hive 3.1) > * Make sure all tests pass -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24857) Trigger Tez output commit after close operation
[ https://issues.apache.org/jira/browse/HIVE-24857?focusedWorklogId=564531&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564531 ] ASF GitHub Bot logged work on HIVE-24857: - Author: ASF GitHub Bot Created on: 11/Mar/21 11:36 Start Date: 11/Mar/21 11:36 Worklog Time Spent: 10m Work Description: pvary merged pull request #2048: URL: https://github.com/apache/hive/pull/2048 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 564531) Time Spent: 0.5h (was: 20m) > Trigger Tez output commit after close operation > --- > > Key: HIVE-24857 > URL: https://issues.apache.org/jira/browse/HIVE-24857 > Project: Hive > Issue Type: Improvement >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently Tez triggers the OutputCommitter.commit() operation between the > proc.run() and proc.close() operations in TezProcessor. However, when writing > out data, calling the proc.close() operation may still produce some extra > records, which would be missed by the output committer. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24857) Trigger Tez output commit after close operation
[ https://issues.apache.org/jira/browse/HIVE-24857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary resolved HIVE-24857. --- Fix Version/s: 4.0.0 Resolution: Fixed Pushed to master. Thanks for the patch [~Marton Bod]! > Trigger Tez output commit after close operation > --- > > Key: HIVE-24857 > URL: https://issues.apache.org/jira/browse/HIVE-24857 > Project: Hive > Issue Type: Improvement >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently Tez triggers the OutputCommitter.commit() operation between the > proc.run() and proc.close() operations in TezProcessor. However, when writing > out data, calling the proc.close() operation may still produce some extra > records, which would be missed by the output committer. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24871) Initiator / Cleaner performance metrics
[ https://issues.apache.org/jira/browse/HIVE-24871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denys Kuzmenko updated HIVE-24871: -- Summary: Initiator / Cleaner performance metrics (was: Initiator / Cleaner performance should be measured with PerformanceLogger) > Initiator / Cleaner performance metrics > --- > > Key: HIVE-24871 > URL: https://issues.apache.org/jira/browse/HIVE-24871 > Project: Hive > Issue Type: Sub-task >Reporter: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The PerformanceLogger should be used in Initiator and Cleaner service. > * One cycle of Initiator should be measured, with ignoring the time spent > waiting on the lock for AUX table > * One compaction cleanup should be measured in Cleaner (using different > metric for major and minor compaction cleanup) > Important note: the PerformanceLogger implementation from metastore should be > used (not the ql one) otherwise the metric won't be published in HMS. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24871) Initiator / Cleaner performance should be measured with PerformanceLogger
[ https://issues.apache.org/jira/browse/HIVE-24871?focusedWorklogId=564495&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564495 ] ASF GitHub Bot logged work on HIVE-24871: - Author: ASF GitHub Bot Created on: 11/Mar/21 10:11 Start Date: 11/Mar/21 10:11 Worklog Time Spent: 10m Work Description: deniskuzZ opened a new pull request #2061: URL: https://github.com/apache/hive/pull/2061 …erformanceLogger ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 564495) Remaining Estimate: 0h Time Spent: 10m > Initiator / Cleaner performance should be measured with PerformanceLogger > - > > Key: HIVE-24871 > URL: https://issues.apache.org/jira/browse/HIVE-24871 > Project: Hive > Issue Type: Sub-task >Reporter: Denys Kuzmenko >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > The PerformanceLogger should be used in Initiator and Cleaner service. > * One cycle of Initiator should be measured, with ignoring the time spent > waiting on the lock for AUX table > * One compaction cleanup should be measured in Cleaner (using different > metric for major and minor compaction cleanup) > Important note: the PerformanceLogger implementation from metastore should be > used (not the ql one) otherwise the metric won't be published in HMS. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24871) Initiator / Cleaner performance should be measured with PerformanceLogger
[ https://issues.apache.org/jira/browse/HIVE-24871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24871: -- Labels: pull-request-available (was: ) > Initiator / Cleaner performance should be measured with PerformanceLogger > - > > Key: HIVE-24871 > URL: https://issues.apache.org/jira/browse/HIVE-24871 > Project: Hive > Issue Type: Sub-task >Reporter: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The PerformanceLogger should be used in Initiator and Cleaner service. > * One cycle of Initiator should be measured, with ignoring the time spent > waiting on the lock for AUX table > * One compaction cleanup should be measured in Cleaner (using different > metric for major and minor compaction cleanup) > Important note: the PerformanceLogger implementation from metastore should be > used (not the ql one) otherwise the metric won't be published in HMS. -- This message was sent by Atlassian Jira (v8.3.4#803005)
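[Editorial note] The measurement HIVE-24871 asks for (one Initiator cycle, excluding time spent waiting on the AUX-table lock) can be sketched generically. The class and method names below are illustrative assumptions; the actual patch would report through the metastore PerformanceLogger, whose API is not shown here.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Generic sketch: start the clock only after the lock is acquired,
// so lock-wait time is excluded from the measured cycle duration.
public class CompactionCycleTimer {

    // Returns the elapsed time of the cycle body in nanoseconds,
    // excluding the time spent blocked on the lock.
    static long timeCycle(ReentrantLock auxTableLock, Runnable cycle) {
        auxTableLock.lock();          // waiting here is NOT measured
        try {
            long start = System.nanoTime();
            cycle.run();              // the Initiator cycle itself
            return System.nanoTime() - start;
        } finally {
            auxTableLock.unlock();
        }
    }

    public static void main(String[] args) {
        ReentrantLock lock = new ReentrantLock();
        long elapsed = timeCycle(lock, () -> {
            try {
                TimeUnit.MILLISECONDS.sleep(50); // simulated cycle work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("cycle took ~" + TimeUnit.NANOSECONDS.toMillis(elapsed) + " ms");
    }
}
```

A separate metric name per compaction type (major vs. minor) would then distinguish the two cleanup paths, as the ticket notes.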
[jira] [Resolved] (HIVE-24862) Fix race condition causing NPE during dynamic partition loading
[ https://issues.apache.org/jira/browse/HIVE-24862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ádám Szita resolved HIVE-24862. --- Fix Version/s: 4.0.0 Resolution: Fixed > Fix race condition causing NPE during dynamic partition loading > --- > > Key: HIVE-24862 > URL: https://issues.apache.org/jira/browse/HIVE-24862 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Zoltan Chovan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > The following properties default to 15 threads. > {noformat} > hive.load.dynamic.partitions.thread > hive.mv.files.thread > {noformat} > During loadDynamicPartitions, it ends up initializing {{newFiles}} without > synchronization (HIVE-20661, HIVE-24738). > > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2871] > This causes a race condition when the dynamic partition thread internally makes use > of {{hive.mv.files.threads}} in copyFiles/replaceFiles. > This causes an NPE during retrieval in {{addInsertFileInformation()}}.
> > e.g stacktrace > {noformat} > Caused by: java.lang.NullPointerException > at org.apache.hadoop.fs.FileSystem.fixRelativePart(FileSystem.java:2734) > at > org.apache.hadoop.hdfs.DistributedFileSystem.fixRelativePart(DistributedFileSystem.java:3396) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1740) > at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1740) > at > org.apache.hadoop.hive.ql.metadata.Hive.addInsertFileInformation(Hive.java:3566) > at org.apache.hadoop.hive.ql.metadata.Hive.fireInsertEvent(Hive.java:3540) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:2414) > at > org.apache.hadoop.hive.ql.metadata.Hive.lambda$loadDynamicPartitions$4(Hive.java:2909) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24862) Fix race condition causing NPE during dynamic partition loading
[ https://issues.apache.org/jira/browse/HIVE-24862?focusedWorklogId=564490&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564490 ] ASF GitHub Bot logged work on HIVE-24862: - Author: ASF GitHub Bot Created on: 11/Mar/21 09:54 Start Date: 11/Mar/21 09:54 Worklog Time Spent: 10m Work Description: szlta merged pull request #2053: URL: https://github.com/apache/hive/pull/2053 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 564490) Time Spent: 40m (was: 0.5h) > Fix race condition causing NPE during dynamic partition loading > --- > > Key: HIVE-24862 > URL: https://issues.apache.org/jira/browse/HIVE-24862 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Zoltan Chovan >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > The following properties default to 15 threads. > {noformat} > hive.load.dynamic.partitions.thread > hive.mv.files.thread > {noformat} > During loadDynamicPartitions, it ends up initializing {{newFiles}} without > synchronization (HIVE-20661, HIVE-24738). > > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2871] > This causes a race condition when the dynamic partition thread internally makes use > of {{hive.mv.files.threads}} in copyFiles/replaceFiles. > This causes an NPE during retrieval in {{addInsertFileInformation()}}.
> > e.g stacktrace > {noformat} > Caused by: java.lang.NullPointerException > at org.apache.hadoop.fs.FileSystem.fixRelativePart(FileSystem.java:2734) > at > org.apache.hadoop.hdfs.DistributedFileSystem.fixRelativePart(DistributedFileSystem.java:3396) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1740) > at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1740) > at > org.apache.hadoop.hive.ql.metadata.Hive.addInsertFileInformation(Hive.java:3566) > at org.apache.hadoop.hive.ql.metadata.Hive.fireInsertEvent(Hive.java:3540) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:2414) > at > org.apache.hadoop.hive.ql.metadata.Hive.lambda$loadDynamicPartitions$4(Hive.java:2909) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24862) Fix race condition causing NPE during dynamic partition loading
[ https://issues.apache.org/jira/browse/HIVE-24862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17299453#comment-17299453 ] Ádám Szita commented on HIVE-24862: --- Committed to master, thanks [~zchovan]! > Fix race condition causing NPE during dynamic partition loading > --- > > Key: HIVE-24862 > URL: https://issues.apache.org/jira/browse/HIVE-24862 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Zoltan Chovan >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > The following properties default to 15 threads. > {noformat} > hive.load.dynamic.partitions.thread > hive.mv.files.thread > {noformat} > During loadDynamicPartitions, it ends up initializing {{newFiles}} without > synchronization (HIVE-20661, HIVE-24738). > > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2871] > This causes a race condition when the dynamic partition thread internally makes use > of {{hive.mv.files.threads}} in copyFiles/replaceFiles. > This causes an NPE during retrieval in {{addInsertFileInformation()}}.
> > e.g stacktrace > {noformat} > Caused by: java.lang.NullPointerException > at org.apache.hadoop.fs.FileSystem.fixRelativePart(FileSystem.java:2734) > at > org.apache.hadoop.hdfs.DistributedFileSystem.fixRelativePart(DistributedFileSystem.java:3396) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1740) > at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1740) > at > org.apache.hadoop.hive.ql.metadata.Hive.addInsertFileInformation(Hive.java:3566) > at org.apache.hadoop.hive.ql.metadata.Hive.fireInsertEvent(Hive.java:3540) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:2414) > at > org.apache.hadoop.hive.ql.metadata.Hive.lambda$loadDynamicPartitions$4(Hive.java:2909) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
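[Editorial note] The fix pattern for the race above can be illustrated with a minimal, self-contained sketch (not Hive's actual code; `collectFiles` and the path format are illustrative). Many loader threads append file paths to a shared list; a plain `ArrayList` can drop elements or leave null slots under concurrent add(), which later surfaces as the NPE when the paths are resolved. Wrapping the list makes each add atomic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class NewFilesList {

    static List<String> collectFiles(int partitions, int threads) throws InterruptedException {
        // The synchronized wrapper is the essential change: without it,
        // concurrent add() calls on a plain ArrayList can race.
        List<String> newFiles = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < partitions; i++) {
            final int part = i;
            // Each task plays the role of one dynamic-partition load thread.
            pool.submit(() -> newFiles.add("part=" + part + "/000000_0"));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return newFiles;
    }

    public static void main(String[] args) throws InterruptedException {
        // 15 threads mirrors the default of hive.load.dynamic.partitions.thread.
        List<String> files = collectFiles(1000, 15);
        System.out.println(files.size() + " files, contains null: " + files.contains(null));
    }
}
```

With the synchronized wrapper every add is observed and no slot is left null, so the later path resolution cannot hit a null entry.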
[jira] [Resolved] (HIVE-24812) Disable sharedworkoptimizer remove semijoin by default
[ https://issues.apache.org/jira/browse/HIVE-24812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-24812. - Fix Version/s: 4.0.0 Resolution: Fixed Merged into master. Thank you Krisztian for reviewing the changes! > Disable sharedworkoptimizer remove semijoin by default > -- > > Key: HIVE-24812 > URL: https://issues.apache.org/jira/browse/HIVE-24812 > Project: Hive > Issue Type: Sub-task >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > SJ removal backfired a bit when I was testing stuff - because of the > additional opportunities parallel edges may enable; because it will increase > the shuffled memory amount and/or even make MJ broadcast inputs larger > set hive.optimize.shared.work.semijoin=false by default for now > right now it's better to leave dppunion to pick up these cases instead of > removing the SJ fully - after HIVE-24376 we might enable it back
[jira] [Work logged] (HIVE-24812) Disable sharedworkoptimizer remove semijoin by default
[ https://issues.apache.org/jira/browse/HIVE-24812?focusedWorklogId=564487&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564487 ] ASF GitHub Bot logged work on HIVE-24812: - Author: ASF GitHub Bot Created on: 11/Mar/21 09:52 Start Date: 11/Mar/21 09:52 Worklog Time Spent: 10m Work Description: kgyrtkirk merged pull request #2006: URL: https://github.com/apache/hive/pull/2006 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 564487) Time Spent: 0.5h (was: 20m) > Disable sharedworkoptimizer remove semijoin by default > -- > > Key: HIVE-24812 > URL: https://issues.apache.org/jira/browse/HIVE-24812 > Project: Hive > Issue Type: Sub-task >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > SJ removal backfired a bit when I was testing stuff - because of the > additional opportunities parallel edges may enable; because it will increase > the shuffled memory amount and/or even make MJ broadcast inputs larger > set hive.optimize.shared.work.semijoin=false by default for now > right now it's better to leave dppunion to pick up these cases instead of > removing the SJ fully - after HIVE-24376 we might enable it back -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24871) Initiator / Cleaner performance should be measured with PerformanceLogger
[ https://issues.apache.org/jira/browse/HIVE-24871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denys Kuzmenko updated HIVE-24871: -- Description: The PerformanceLogger should be used in Initiator and Cleaner service. * One cycle of Initiator should be measured, with ignoring the time spent waiting on the lock for AUX table * One compaction cleanup should be measured in Cleaner (using different metric for major and minor compaction cleanup) Important note: the PerformanceLogger implementation from metastore should be used (not the ql one) otherwise the metric won't be published in HMS. > Initiator / Cleaner performance should be measured with PerformanceLogger > - > > Key: HIVE-24871 > URL: https://issues.apache.org/jira/browse/HIVE-24871 > Project: Hive > Issue Type: Sub-task >Reporter: Denys Kuzmenko >Priority: Major > > The PerformanceLogger should be used in Initiator and Cleaner service. > * One cycle of Initiator should be measured, with ignoring the time spent > waiting on the lock for AUX table > * One compaction cleanup should be measured in Cleaner (using different > metric for major and minor compaction cleanup) > Important note: the PerformanceLogger implementation from metastore should be > used (not the ql one) otherwise the metric won't be published in HMS. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24870) Metastore: cleanup unused column descriptors asynchronously in batches
[ https://issues.apache.org/jira/browse/HIVE-24870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-24870: Description: HIVE-2246 introduces CD_ID for optimizing metastore db (details there). ObjectStore.removeUnusedColumnDescriptor is a maintenance task that is called in every alter partition kind of operation. During a replication, alterPartition could be a heavy path, and has no direct advantage of running removeUnusedColumnDescriptor immediately. Moreover, there is a {code} select count(*) from "SDS" where "CD_ID"=12345; {code} kind of query in it, which can take a relatively long time compared to alter partition. https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L4982 {code} query = pm.newQuery("select count(1) from " + "org.apache.hadoop.hive.metastore.model.MStorageDescriptor where (this.cd == inCD)"); query.declareParameters("MColumnDescriptor inCD"); long count = ((Long)query.execute(oldCD)).longValue(); //if no other SD references this CD, we can throw it out. if (count == 0) { {code} My proposal is to run this in a batched way, in every configurable amount of seconds/minutes/whatever. was: HIVE-2246 introduces CD_ID for optimizing metastore db (details there). ObjectStore.removeUnusedColumnDescriptor is a maintenance task that is called in every alter partition kind of operation. During a replication, alterPartition could be a heavy path, and has no direct advantage of running removeUnusedColumnDescriptor immediately. Moreover, there is a {code} select count(*) from "SDS" where "CD_ID"=12345; {code} kind of query in it, which can take a relatively long time compared to alter partition. 
{code} query = pm.newQuery("select count(1) from " + "org.apache.hadoop.hive.metastore.model.MStorageDescriptor where (this.cd == inCD)"); query.declareParameters("MColumnDescriptor inCD"); {code} My proposal is to run this in a batched way, in every configurable amount of seconds/minutes/whatever. > Metastore: cleanup unused column descriptors asynchronously in batches > -- > > Key: HIVE-24870 > URL: https://issues.apache.org/jira/browse/HIVE-24870 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > > HIVE-2246 introduces CD_ID for optimizing metastore db (details there). > ObjectStore.removeUnusedColumnDescriptor is a maintenance task that is called > in every alter partition kind of operation. During a replication, > alterPartition could be a heavy path, and has no direct advantage of running > removeUnusedColumnDescriptor immediately. Moreover, there is a > {code} > select count(*) from "SDS" where "CD_ID"=12345; > {code} > kind of query in it, which can take a relatively long time compared to alter > partition. > https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L4982 > {code} > query = pm.newQuery("select count(1) from " + > "org.apache.hadoop.hive.metastore.model.MStorageDescriptor where > (this.cd == inCD)"); > query.declareParameters("MColumnDescriptor inCD"); > long count = ((Long)query.execute(oldCD)).longValue(); > //if no other SD references this CD, we can throw it out. > if (count == 0) { > {code} > My proposal is to run this in a batched way, in every configurable amount of > seconds/minutes/whatever. -- This message was sent by Atlassian Jira (v8.3.4#803005)
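[Editorial note] The batching proposed in HIVE-24870 could look roughly like the hypothetical sketch below: an in-memory stand-in, not the actual ObjectStore change. The alterPartition path only enqueues candidate CD_IDs (O(1), no count query), and a periodic sweep checks references and deletes orphans in one pass. The class, `markCandidate`/`sweep` names, and the `Set`-as-SDS-table model are all illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

public class BatchedCdCleanup {
    private final ConcurrentLinkedQueue<Long> candidates = new ConcurrentLinkedQueue<>();
    private final Set<Long> referencedCds; // stand-in for "select count(*) from SDS where CD_ID=?"

    BatchedCdCleanup(Set<Long> referencedCds) {
        this.referencedCds = referencedCds;
    }

    // Called from the alterPartition path: O(1), no per-call count query.
    void markCandidate(long cdId) {
        candidates.add(cdId);
    }

    // Called periodically (e.g. from a scheduled metastore thread):
    // drains the queue and deletes only descriptors no SD references.
    List<Long> sweep() {
        List<Long> deleted = new ArrayList<>();
        Long cdId;
        while ((cdId = candidates.poll()) != null) {
            if (!referencedCds.contains(cdId)) {
                deleted.add(cdId); // would issue a single batched DELETE here
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        Set<Long> sds = ConcurrentHashMap.newKeySet();
        sds.add(1L);               // CD 1 is still referenced by an SD
        BatchedCdCleanup cleanup = new BatchedCdCleanup(sds);
        cleanup.markCandidate(1L); // still in use: kept
        cleanup.markCandidate(2L); // orphaned: removed on the next sweep
        System.out.println(cleanup.sweep()); // prints [2]
    }
}
```

The sweep interval would be the "configurable amount of seconds/minutes" the ticket mentions; the trade-off is that orphaned descriptors linger until the next sweep.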
[jira] [Updated] (HIVE-24870) Metastore: cleanup unused column descriptors asynchronously in batches
[ https://issues.apache.org/jira/browse/HIVE-24870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-24870: Summary: Metastore: cleanup unused column descriptors asynchronously in batches (was: Metastore: cleanup unused column descriptors asynchronously) > Metastore: cleanup unused column descriptors asynchronously in batches > -- > > Key: HIVE-24870 > URL: https://issues.apache.org/jira/browse/HIVE-24870 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > > HIVE-2246 introduces CD_ID for optimizing metastore db (details there). > ObjectStore.removeUnusedColumnDescriptor is a maintenance task that is called > in every alter partition kind of operation. During a replication, > alterPartition could be a heavy path, and has no direct advantage of running > removeUnusedColumnDescriptor immediately. Moreover, there is a > {code} > select count(*) from "SDS" where "CD_ID"=12345; > {code} > kind of query in it, which can take a relatively long time compared to alter > partition. > {code} > query = pm.newQuery("select count(1) from " + > "org.apache.hadoop.hive.metastore.model.MStorageDescriptor where > (this.cd == inCD)"); > query.declareParameters("MColumnDescriptor inCD"); > {code} > My proposal is to run this in a batched way, in every configurable amount of > seconds/minutes/whatever. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24870) Metastore: cleanup unused column descriptors asynchronously
[ https://issues.apache.org/jira/browse/HIVE-24870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-24870: Description: HIVE-2246 introduces CD_ID for optimizing metastore db (details there). ObjectStore.removeUnusedColumnDescriptor is a maintenance task that is called in every alter partition kind of operation. During a replication, alterPartition could be a heavy path, and has no direct advantage of running removeUnusedColumnDescriptor immediately. Moreover, there is a {code} select count(*) from "SDS" where "CD_ID"=12345; {code} kind of query in it, which can take a relatively long time compared to alter partition. {code} query = pm.newQuery("select count(1) from " + "org.apache.hadoop.hive.metastore.model.MStorageDescriptor where (this.cd == inCD)"); query.declareParameters("MColumnDescriptor inCD"); {code} My proposal is to run this in a batched way, in every configurable amount of seconds/minutes/whatever. was: HIVE-2246 introduces CD_ID for optimizing metastore db (details there). ObjectStore.removeUnusedColumnDescriptor is a maintenance task that is called in every alter partition kind of operation. During a replication, alterPartition could be a heavy path, and has no direct advantage of running removeUnusedColumnDescriptor immediately. Moreover, there is a {code} select count(*) from "SDS" where "CD_ID"=12345; {code} kind of query in it, which can take a relatively long time compared to alter partition. {code} query = pm.newQuery("select count(1) from " + "org.apache.hadoop.hive.metastore.model.MStorageDescriptor where (this.cd == inCD)"); query.declareParameters("MColumnDescriptor inCD"); {code} My proposal is to run this in a batched way, in every configurable amount of seconds/minutes/whatever.
> Metastore: cleanup unused column descriptors asynchronously > --- > > Key: HIVE-24870 > URL: https://issues.apache.org/jira/browse/HIVE-24870 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > > HIVE-2246 introduces CD_ID for optimizing metastore db (details there). > ObjectStore.removeUnusedColumnDescriptor is a maintenance task that is called > in every alter partition kind of operation. During a replication, > alterPartition could be a heavy path, and has no direct advantage of running > removeUnusedColumnDescriptor immediately. Moreover, there is a > {code} > select count(*) from "SDS" where "CD_ID"=12345; > {code} > kind of query in it, which can take a relatively long time compared to alter > partition. > {code} > query = pm.newQuery("select count(1) from " + > "org.apache.hadoop.hive.metastore.model.MStorageDescriptor where > (this.cd == inCD)"); > query.declareParameters("MColumnDescriptor inCD"); > {code} > My proposal is to run this in a batched way, in every configurable amount of > seconds/minutes/whatever. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24870) Metastore: cleanup unused column descriptors asynchronously
[ https://issues.apache.org/jira/browse/HIVE-24870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-24870: Description: HIVE-2246 introduces CD_ID for optimizing metastore db (details there). ObjectStore.removeUnusedColumnDescriptor is a maintenance task that is called in every alter partition kind of operation. During a replication, alterPartition could be a heavy path, and has no direct advantage of running removeUnusedColumnDescriptor immediately. Moreover, there is a {code} select count(*) from "SDS" where "CD_ID"=12345; {code} kind of query in it, which can take a relatively long time compared to alter partition. {code} query = pm.newQuery("select count(1) from " + "org.apache.hadoop.hive.metastore.model.MStorageDescriptor where (this.cd == inCD)"); query.declareParameters("MColumnDescriptor inCD"); {code} My proposal is to run this in a batched way, in every configurable amount of seconds/minutes/whatever. was: HIVE-2246 introduces CD_ID for optimizing metastore db (details there). ObjectStore.removeUnusedColumnDescriptor is a maintenance task that is called in every alter partition kind of operation. During a replication, alterPartition could be a heavy path, and has no direct advantage of running removeUnusedColumnDescriptor immediately. Moreover, there is a {code} select count(*) from "SDS" where "CD_ID"=12345; {code} kind of query in it, which can take a relatively long time compared to alter partition.
{code} query = pm.newQuery("select count(1) from " + "org.apache.hadoop.hive.metastore.model.MStorageDescriptor where (this.cd == inCD)"); query.declareParameters("MColumnDescriptor inCD"); {code} > Metastore: cleanup unused column descriptors asynchronously > --- > > Key: HIVE-24870 > URL: https://issues.apache.org/jira/browse/HIVE-24870 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > > HIVE-2246 introduces CD_ID for optimizing metastore db (details there). > ObjectStore.removeUnusedColumnDescriptor is a maintenance task that is called > in every alter partition kind of operation. During a replication, > alterPartition could be a heavy path, and has no direct advantage of running > removeUnusedColumnDescriptor immediately. Moreover, there is a > {code} > select count(*) from "SDS" where "CD_ID"=12345; > {code} > kind of query in it, which can take a relatively long time compared to alter > partition. > {code} > query = pm.newQuery("select count(1) from " + > "org.apache.hadoop.hive.metastore.model.MStorageDescriptor where > (this.cd == inCD)"); > query.declareParameters("MColumnDescriptor inCD"); > {code} > My proposal is to run this in a batched way, in every configurable amount of > seconds/minutes/whatever. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24870) Metastore: cleanup unused column descriptors asynchronously
[ https://issues.apache.org/jira/browse/HIVE-24870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-24870: Description: HIVE-2246 introduces CD_ID for optimizing metastore db (details there). ObjectStore.removeUnusedColumnDescriptor is a maintenance task that is called in every alter partition kind of operation. During a replication, alterPartition could be a heavy path, and has no direct advantage of running removeUnusedColumnDescriptor immediately. Moreover, there is a {code} select count(*) from "SDS" where "CD_ID"=12345; {code} kind of query in it, which can take a relatively long time compared to alter partition. {code} query = pm.newQuery("select count(1) from " + "org.apache.hadoop.hive.metastore.model.MStorageDescriptor where (this.cd == inCD)"); query.declareParameters("MColumnDescriptor inCD"); {code} was: HIVE-2246 introduces CD_ID for optimizing metastore db (details there). ObjectStore.removeUnusedColumnDescriptor is a maintenance task that is called in every alter partition kind of operation. During a replication, alterPartition could be a heavy path, and has no direct advantage of running removeUnusedColumnDescriptor immediately. {code} query = pm.newQuery("select count(1) from " + "org.apache.hadoop.hive.metastore.model.MStorageDescriptor where (this.cd == inCD)"); query.declareParameters("MColumnDescriptor inCD"); {code} > Metastore: cleanup unused column descriptors asynchronously > --- > > Key: HIVE-24870 > URL: https://issues.apache.org/jira/browse/HIVE-24870 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > > HIVE-2246 introduces CD_ID for optimizing metastore db (details there). > ObjectStore.removeUnusedColumnDescriptor is a maintenance task that is called > in every alter partition kind of operation.
During a replication, > alterPartition could be a heavy path, and has no direct advantage of running > removeUnusedColumnDescriptor immediately. Moreover, there is a > {code} > select count(*) from "SDS" where "CD_ID"=12345; > {code} > kind of query in it, which can take a relatively long time compared to alter > partition. > {code} > query = pm.newQuery("select count(1) from " + > "org.apache.hadoop.hive.metastore.model.MStorageDescriptor where > (this.cd == inCD)"); > query.declareParameters("MColumnDescriptor inCD"); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24870) Metastore: cleanup unused column descriptors asynchronously
[ https://issues.apache.org/jira/browse/HIVE-24870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-24870: Description: HIVE-2246 introduces CD_ID for optimizing metastore db (details there). ObjectStore.removeUnusedColumnDescriptor is a maintenance task that is called in every alter partition kind of operation. During a replication, alterPartition could be a heavy path, and has no direct advantage of running removeUnusedColumnDescriptor immediately. {code} query = pm.newQuery("select count(1) from " + "org.apache.hadoop.hive.metastore.model.MStorageDescriptor where (this.cd == inCD)"); query.declareParameters("MColumnDescriptor inCD"); {code} > Metastore: cleanup unused column descriptors asynchronously > --- > > Key: HIVE-24870 > URL: https://issues.apache.org/jira/browse/HIVE-24870 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Priority: Major > > HIVE-2246 introduces CD_ID for optimizing metastore db (details there). > ObjectStore.removeUnusedColumnDescriptor is a maintenance task that is called > in every alter partition kind of operation. During a replication, > alterPartition could be a heavy path, and has no direct advantage of running > removeUnusedColumnDescriptor immediately. > {code} > query = pm.newQuery("select count(1) from " + > "org.apache.hadoop.hive.metastore.model.MStorageDescriptor where > (this.cd == inCD)"); > query.declareParameters("MColumnDescriptor inCD"); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24870) Metastore: cleanup unused column descriptors asynchronously
[ https://issues.apache.org/jira/browse/HIVE-24870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor reassigned HIVE-24870: --- Assignee: László Bodor > Metastore: cleanup unused column descriptors asynchronously > --- > > Key: HIVE-24870 > URL: https://issues.apache.org/jira/browse/HIVE-24870 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > > HIVE-2246 introduces CD_ID for optimizing metastore db (details there). > ObjectStore.removeUnusedColumnDescriptor is a maintenance task that is called > in every alter partition kind of operation. During a replication, > alterPartition could be a heavy path, and has no direct advantage of running > removeUnusedColumnDescriptor immediately. > {code} > query = pm.newQuery("select count(1) from " + > "org.apache.hadoop.hive.metastore.model.MStorageDescriptor where > (this.cd == inCD)"); > query.declareParameters("MColumnDescriptor inCD"); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23820) [HS2] Send tableId in request for get_table_request API
[ https://issues.apache.org/jira/browse/HIVE-23820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17299422#comment-17299422 ] Ashish Sharma commented on HIVE-23820: -- [~kishendas] I am working on HIVE-23571, for which this ticket is a blocker. If you are not working on this ticket, can I pick it up so that I can implement an end-to-end writeId and tableId check from client to ObjectStore and RawStore? > [HS2] Send tableId in request for get_table_request API > --- > > Key: HIVE-23820 > URL: https://issues.apache.org/jira/browse/HIVE-23820 > Project: Hive > Issue Type: Sub-task >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)