[jira] [Updated] (HIVE-25084) Incorrect aggregate results on bucketed table
[ https://issues.apache.org/jira/browse/HIVE-25084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Naresh P R updated HIVE-25084:
------------------------------
    Attachment: test4.q

> Incorrect aggregate results on bucketed table
> ---------------------------------------------
>
>                 Key: HIVE-25084
>                 URL: https://issues.apache.org/jira/browse/HIVE-25084
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Naresh P R
>            Priority: Major
>         Attachments: test4.q
>
> Steps to repro:
> {code:java}
> CREATE TABLE test_table(
>   col1 int,
>   col2 char(32),
>   col3 varchar(3))
> CLUSTERED BY (col2)
> SORTED BY (
>   col2 ASC,
>   col3 ASC,
>   col1 ASC)
> INTO 32 BUCKETS stored as orc;
>
> set hive.query.results.cache.enabled=false;
> insert into test_table values(2, "123456", "15");
> insert into test_table values(1, "123456", "15");
>
> SELECT col2, col3, max(col1) AS max_sequence FROM test_table GROUP BY col2, col3;
>
> ==> LocalFetch correct result <==
> 123456  15  2
>
> ==> Wrong result with Tez/Llap <==
> set hive.fetch.task.conversion=none;
> 123456  15  2
> 123456  15  1
>
> ==> Correct result with Tez/Llap disabling map aggregation <==
> set hive.map.aggr=false;
> 123456  15  2
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-25083) Extra reviewer pattern
[ https://issues.apache.org/jira/browse/HIVE-25083?focusedWorklogId=591632&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-591632 ]

ASF GitHub Bot logged work on HIVE-25083:
-----------------------------------------
        Author: ASF GitHub Bot
    Created on: 30/Apr/21 16:34
    Start Date: 30/Apr/21 16:34
    Worklog Time Spent: 10m

Work Description: pgaref opened a new pull request #2237:
URL: https://github.com/apache/hive/pull/2237

Change-Id: I9f507147d8749a0eab4fcf7ea8ea24449a6f6024

### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 591632)
    Remaining Estimate: 0h
    Time Spent: 10m

> Extra reviewer pattern
> ----------------------
>
>                 Key: HIVE-25083
>                 URL: https://issues.apache.org/jira/browse/HIVE-25083
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Panagiotis Garefalakis
>            Assignee: Panagiotis Garefalakis
>            Priority: Minor
>          Time Spent: 10m
>  Remaining Estimate: 0h
[jira] [Updated] (HIVE-25083) Extra reviewer pattern
[ https://issues.apache.org/jira/browse/HIVE-25083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-25083:
----------------------------------
    Labels: pull-request-available  (was: )

> Extra reviewer pattern
> ----------------------
>
>                 Key: HIVE-25083
>                 URL: https://issues.apache.org/jira/browse/HIVE-25083
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Panagiotis Garefalakis
>            Assignee: Panagiotis Garefalakis
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
[jira] [Assigned] (HIVE-25083) Extra reviewer pattern
[ https://issues.apache.org/jira/browse/HIVE-25083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Panagiotis Garefalakis reassigned HIVE-25083:
---------------------------------------------

> Extra reviewer pattern
> ----------------------
>
>                 Key: HIVE-25083
>                 URL: https://issues.apache.org/jira/browse/HIVE-25083
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Panagiotis Garefalakis
>            Assignee: Panagiotis Garefalakis
>            Priority: Minor
[jira] [Work logged] (HIVE-25082) Make updateTimezone a default method on SettableTreeReader
[ https://issues.apache.org/jira/browse/HIVE-25082?focusedWorklogId=591622&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-591622 ]

ASF GitHub Bot logged work on HIVE-25082:
-----------------------------------------
        Author: ASF GitHub Bot
    Created on: 30/Apr/21 16:21
    Start Date: 30/Apr/21 16:21
    Worklog Time Spent: 10m

Work Description: pgaref opened a new pull request #2236:
URL: https://github.com/apache/hive/pull/2236

Change-Id: I1585469ac7f6ec032fc666d467cb0725bff19633

### What changes were proposed in this pull request?
Avoid useless TimestampStreamReader instance checks by making updateTimezone() a default method in SettableTreeReader.

### Why are the changes needed?
Cleaner code, fewer instanceof checks.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing tests

Issue Time Tracking
-------------------
    Worklog Id: (was: 591622)
    Remaining Estimate: 0h
    Time Spent: 10m

> Make updateTimezone a default method on SettableTreeReader
> -----------------------------------------------------------
>
>                 Key: HIVE-25082
>                 URL: https://issues.apache.org/jira/browse/HIVE-25082
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Panagiotis Garefalakis
>            Assignee: Panagiotis Garefalakis
>            Priority: Minor
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Avoid useless TimestampStreamReader instance checks by making
> updateTimezone() a default method in SettableTreeReader
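The refactor described above — pushing updateTimezone() into the interface as a no-op default so callers no longer need instanceof checks — can be sketched with simplified stand-in types (the names below are illustrative, not the actual Hive/ORC classes):

```java
import java.util.List;
import java.util.TimeZone;

// Simplified stand-in for SettableTreeReader: the no-op default means
// readers that carry no timezone-sensitive state need no special handling.
interface SettableReader {
    default void updateTimezone(TimeZone tz) {
        // no-op for readers without timezone-sensitive state
    }
}

class LongReader implements SettableReader { }

class TimestampReader implements SettableReader {
    TimeZone tz = TimeZone.getTimeZone("UTC");

    @Override
    public void updateTimezone(TimeZone tz) {
        this.tz = tz; // only the timestamp reader actually reacts
    }
}

public class DefaultMethodSketch {
    public static void main(String[] args) {
        List<SettableReader> readers =
            List.of(new LongReader(), new TimestampReader());
        TimeZone pst = TimeZone.getTimeZone("America/Los_Angeles");
        // Before the refactor this loop would need:
        //   if (r instanceof TimestampReader) ((TimestampReader) r).updateTimezone(pst);
        for (SettableReader r : readers) {
            r.updateTimezone(pst);
        }
        System.out.println(((TimestampReader) readers.get(1)).tz.getID());
    }
}
```

The caller iterates uniformly; only readers that override the default do any work.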
[jira] [Updated] (HIVE-25082) Make updateTimezone a default method on SettableTreeReader
[ https://issues.apache.org/jira/browse/HIVE-25082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-25082:
----------------------------------
    Labels: pull-request-available  (was: )

> Make updateTimezone a default method on SettableTreeReader
> -----------------------------------------------------------
>
>                 Key: HIVE-25082
>                 URL: https://issues.apache.org/jira/browse/HIVE-25082
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Panagiotis Garefalakis
>            Assignee: Panagiotis Garefalakis
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Avoid useless TimestampStreamReader instance checks by making
> updateTimezone() a default method in SettableTreeReader
[jira] [Updated] (HIVE-25082) Make updateTimezone a default method on SettableTreeReader
[ https://issues.apache.org/jira/browse/HIVE-25082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Panagiotis Garefalakis updated HIVE-25082:
------------------------------------------
    Summary: Make updateTimezone a default method on SettableTreeReader  (was: Make SettableTreeReader updateTimezone a default method)

> Make updateTimezone a default method on SettableTreeReader
> -----------------------------------------------------------
>
>                 Key: HIVE-25082
>                 URL: https://issues.apache.org/jira/browse/HIVE-25082
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Panagiotis Garefalakis
>            Assignee: Panagiotis Garefalakis
>            Priority: Minor
>
> Avoid useless TimestampStreamReader instance checks by making
> updateTimezone() a default method in SettableTreeReader
[jira] [Assigned] (HIVE-25082) Make SettableTreeReader updateTimezone a default method
[ https://issues.apache.org/jira/browse/HIVE-25082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Panagiotis Garefalakis reassigned HIVE-25082:
---------------------------------------------

> Make SettableTreeReader updateTimezone a default method
> --------------------------------------------------------
>
>                 Key: HIVE-25082
>                 URL: https://issues.apache.org/jira/browse/HIVE-25082
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Panagiotis Garefalakis
>            Assignee: Panagiotis Garefalakis
>            Priority: Minor
>
> Avoid useless TimestampStreamReader instance checks by making
> updateTimezone() a default method in SettableTreeReader
[jira] [Work logged] (HIVE-25061) PTF: Improve BoundaryCache / ValueBoundaryScanner
[ https://issues.apache.org/jira/browse/HIVE-25061?focusedWorklogId=591584&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-591584 ]

ASF GitHub Bot logged work on HIVE-25061:
-----------------------------------------
        Author: ASF GitHub Bot
    Created on: 30/Apr/21 15:34
    Start Date: 30/Apr/21 15:34
    Worklog Time Spent: 10m

Work Description: pgaref commented on a change in pull request #2225:
URL: https://github.com/apache/hive/pull/2225#discussion_r623972738

## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/ValueBoundaryScanner.java
## @@ -414,16 +423,35 @@ protected int computeStartPreceding(int rowIdx, PTFPartition p) throws HiveExcep
       return r + 1;
     } else {
       // Use Case 5.
+      Pair start = binaryPreSearchBack(r, p, sortKey, rowVal, amt);
+      // start again with linear search from the last point where !isDistanceGreater was true
+      r = start.getLeft();
+      rowVal = start.getRight();
       while (r >= 0 && !isDistanceGreater(sortKey, rowVal, amt) ) {
         Pair stepResult = skipOrStepBack(r, p);
         r = stepResult.getLeft();
         rowVal = stepResult.getRight();
       }
-      return r + 1;
     }
   }

+  private Pair binaryPreSearchBack(int r, PTFPartition p, Object sortKey,

Review comment:
I guess existing PTF tests should cover this optimization, but it would be great to add specific ones for cases 4 and 5 above.

Issue Time Tracking
-------------------
    Worklog Id: (was: 591584)
    Time Spent: 40m  (was: 0.5h)

> PTF: Improve BoundaryCache / ValueBoundaryScanner
> --------------------------------------------------
>
>                 Key: HIVE-25061
>                 URL: https://issues.apache.org/jira/browse/HIVE-25061
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: Screen Shot 2021-04-27 at 1.02.37 PM.png
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> First, I need to check whether TreeMap is really needed for our case.
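The optimization under review — jumping close to the window boundary with a binary pre-search instead of stepping back one row at a time — can be illustrated outside Hive with a plain array. This is a simplification: here the whole boundary search collapses into a binary search over a monotone in-range predicate, whereas the actual patch does a coarse pre-search and then finishes with the existing linear skipOrStepBack loop. All names below are illustrative, not Hive APIs:

```java
import java.util.function.IntPredicate;

public class BoundarySketch {
    // Find the window start: the smallest index i in [0, r] for which
    // withinRange(i) holds, assuming the predicate is monotone over the
    // sorted partition (false ... false, then true ... true up to r).
    static int findStart(int r, IntPredicate withinRange) {
        int lo = 0, hi = r;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (withinRange.test(mid)) {
                hi = mid;      // mid is in range; boundary is at or before mid
            } else {
                lo = mid + 1;  // mid is too far back; move forward
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        int[] sorted = {1, 3, 4, 7, 9, 12, 15};
        int current = 15, amt = 10; // e.g. RANGE 10 PRECEDING from the row holding 15
        int start = findStart(sorted.length - 1, i -> current - sorted[i] <= amt);
        System.out.println(start); // first index whose value is within 10 of 15
    }
}
```

For a partition of n rows this turns an O(n) step-back into O(log n) probes, which is the point of binaryPreSearchBack for large RANGE windows.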
[jira] [Work logged] (HIVE-25061) PTF: Improve BoundaryCache / ValueBoundaryScanner
[ https://issues.apache.org/jira/browse/HIVE-25061?focusedWorklogId=591583&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-591583 ]

ASF GitHub Bot logged work on HIVE-25061:
-----------------------------------------
        Author: ASF GitHub Bot
    Created on: 30/Apr/21 15:32
    Start Date: 30/Apr/21 15:32
    Worklog Time Spent: 10m

Work Description: pgaref commented on a change in pull request #2225:
URL: https://github.com/apache/hive/pull/2225#discussion_r623971857

## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/ValueBoundaryScanner.java
## @@ -406,6 +411,10 @@ protected int computeStartPreceding(int rowIdx, PTFPartition p) throws HiveExcep
     // Use Case 4.
     if ( expressionDef.getOrder() == Order.DESC ) {
+      Pair start = binaryPreSearchBack(r, p, sortKey, rowVal, amt);

Review comment:
Let's add some context on why binary search is useful here.

Issue Time Tracking
-------------------
    Worklog Id: (was: 591583)
    Time Spent: 0.5h  (was: 20m)

> PTF: Improve BoundaryCache / ValueBoundaryScanner
> --------------------------------------------------
>
>                 Key: HIVE-25061
>                 URL: https://issues.apache.org/jira/browse/HIVE-25061
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: Screen Shot 2021-04-27 at 1.02.37 PM.png
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> First, I need to check whether TreeMap is really needed for our case.
[jira] [Work logged] (HIVE-25061) PTF: Improve BoundaryCache / ValueBoundaryScanner
[ https://issues.apache.org/jira/browse/HIVE-25061?focusedWorklogId=591574&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-591574 ]

ASF GitHub Bot logged work on HIVE-25061:
-----------------------------------------
        Author: ASF GitHub Bot
    Created on: 30/Apr/21 15:29
    Start Date: 30/Apr/21 15:29
    Worklog Time Spent: 10m

Work Description: pgaref commented on a change in pull request #2225:
URL: https://github.com/apache/hive/pull/2225#discussion_r623969469

## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/BasePartitionEvaluator.java
## @@ -218,6 +220,9 @@ public BasePartitionEvaluator(
     this.outputOI = outputOI;
     this.nullsLast = nullsLast;
     this.isCountEvaluator = wrappedEvaluator instanceof GenericUDAFCount.GenericUDAFCountEvaluator;
+    // use a periodic logger which ignores very small partitions
+    this.stopwatch = new PeriodicLoggerWithStopwatch(

Review comment:
should probably remove/comment out logging here

Issue Time Tracking
-------------------
    Worklog Id: (was: 591574)
    Time Spent: 20m  (was: 10m)

> PTF: Improve BoundaryCache / ValueBoundaryScanner
> --------------------------------------------------
>
>                 Key: HIVE-25061
>                 URL: https://issues.apache.org/jira/browse/HIVE-25061
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: Screen Shot 2021-04-27 at 1.02.37 PM.png
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> First, I need to check whether TreeMap is really needed for our case.
[jira] [Work logged] (HIVE-23458) Introduce unified thread pool for scheduled jobs
[ https://issues.apache.org/jira/browse/HIVE-23458?focusedWorklogId=591454&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-591454 ]

ASF GitHub Bot logged work on HIVE-23458:
-----------------------------------------
        Author: ASF GitHub Bot
    Created on: 30/Apr/21 11:37
    Start Date: 30/Apr/21 11:37
    Worklog Time Spent: 10m

Work Description: EugeneChung edited a comment on pull request #1919:
URL: https://github.com/apache/hive/pull/1919#issuecomment-808207986

If hive.query.timeout.seconds is set to a value greater than 0, a new thread is always created (and then destroyed) for every SQL operation by calling Executors.newSingleThreadScheduledExecutor(). Most of the scheduled tasks for cancelling the operation are never executed, either. The unified scheduler pool removes those inefficiencies.

Issue Time Tracking
-------------------
    Worklog Id: (was: 591454)
    Time Spent: 2h 10m  (was: 2h)

> Introduce unified thread pool for scheduled jobs
> -------------------------------------------------
>
>                 Key: HIVE-23458
>                 URL: https://issues.apache.org/jira/browse/HIVE-23458
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>            Reporter: Eugene Chung
>            Assignee: Eugene Chung
>            Priority: Major
>              Labels: pull-request-available, todoc4.0
>             Fix For: 4.0.0
>
>         Attachments: HIVE-23458.01.patch, HIVE-23458.02.patch, HIVE-23458.03.patch
>
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> As I mentioned in [the comment of HIVE-23164|https://issues.apache.org/jira/browse/HIVE-23164?focusedCommentId=17089506&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17089506],
> I've made a unified scheduled executor service like org.apache.hadoop.hive.metastore.ThreadPool.
> I think it could help
> 1. to minimize the possibility of creating non-daemon threads when developers need a ScheduledExecutorService;
> 2. to improve the utilization of server resources, because currently each module creates its own ScheduledExecutorService and every thread is used for only one job;
> 3. administrators of Hive servers, by providing the hive.exec.scheduler.num.threads configuration so that they can predict and set how many threads are used and needed.
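The inefficiency described in the comment above, and the shared-pool alternative, can be sketched as follows. The pool size and thread naming are assumptions for illustration (in the patch the size would come from the proposed hive.exec.scheduler.num.threads setting):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class SharedSchedulerSketch {
    // One process-wide scheduler with daemon threads, instead of calling
    // Executors.newSingleThreadScheduledExecutor() per SQL operation.
    private static final ScheduledExecutorService SCHEDULER =
        Executors.newScheduledThreadPool(4, r -> {
            Thread t = new Thread(r, "shared-scheduler");
            t.setDaemon(true); // daemon threads never block JVM shutdown
            return t;
        });

    public static void main(String[] args) {
        // Schedule a "query timeout" task, then cancel it when the query
        // finishes early -- the common case, so the task body rarely runs.
        ScheduledFuture<?> timeout =
            SCHEDULER.schedule(() -> System.out.println("query timed out"),
                               60, TimeUnit.SECONDS);
        // ... query completes well before the timeout ...
        boolean cancelled = timeout.cancel(false);
        System.out.println("cancelled=" + cancelled);
    }
}
```

With the per-operation approach, each of those cancelled timeouts would also have paid for the creation and teardown of a dedicated scheduler thread.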
[jira] [Resolved] (HIVE-25076) Get number of write tasks from jobConf for Iceberg commits
[ https://issues.apache.org/jira/browse/HIVE-25076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marton Bod resolved HIVE-25076.
-------------------------------
    Resolution: Fixed

> Get number of write tasks from jobConf for Iceberg commits
> -----------------------------------------------------------
>
>                 Key: HIVE-25076
>                 URL: https://issues.apache.org/jira/browse/HIVE-25076
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Marton Bod
>            Assignee: Marton Bod
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> When writing empty data into Iceberg tables, we can end up with a succeeded
> task count of 0. With the current logic, we might then erroneously fall back
> to the number of mapper tasks in the commit logic, which would result in
> failures. We should instead save the succeeded task count into the JobConf
> under a specified key and retrieve it from there.
[jira] [Work logged] (HIVE-25076) Get number of write tasks from jobConf for Iceberg commits
[ https://issues.apache.org/jira/browse/HIVE-25076?focusedWorklogId=591449&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-591449 ]

ASF GitHub Bot logged work on HIVE-25076:
-----------------------------------------
        Author: ASF GitHub Bot
    Created on: 30/Apr/21 11:13
    Start Date: 30/Apr/21 11:13
    Worklog Time Spent: 10m

Work Description: lcspinter merged pull request #2233:
URL: https://github.com/apache/hive/pull/2233

Issue Time Tracking
-------------------
    Worklog Id: (was: 591449)
    Time Spent: 20m  (was: 10m)

> Get number of write tasks from jobConf for Iceberg commits
> -----------------------------------------------------------
>
>                 Key: HIVE-25076
>                 URL: https://issues.apache.org/jira/browse/HIVE-25076
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Marton Bod
>            Assignee: Marton Bod
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> When writing empty data into Iceberg tables, we can end up with a succeeded
> task count of 0. With the current logic, we might then erroneously fall back
> to the number of mapper tasks in the commit logic, which would result in
> failures. We should instead save the succeeded task count into the JobConf
> under a specified key and retrieve it from there.
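The fix described in the issue — recording the succeeded writer count under an explicit conf key instead of inferring it from the mapper count — can be sketched like this. A java.util.Properties stands in for the Hadoop JobConf, and the key name is hypothetical, not the one actually used by the Hive-Iceberg handler:

```java
import java.util.Properties;

public class TaskCountSketch {
    // Hypothetical key name; a Properties object stands in for the JobConf.
    static final String SUCCEEDED_TASKS_KEY = "iceberg.write.succeeded.tasks";

    static void recordSucceededTasks(Properties jobConf, int count) {
        jobConf.setProperty(SUCCEEDED_TASKS_KEY, Integer.toString(count));
    }

    static int succeededTasks(Properties jobConf) {
        // Default to 0 so an empty write commits no data files instead of
        // falling back to the (possibly nonzero) number of mapper tasks.
        return Integer.parseInt(jobConf.getProperty(SUCCEEDED_TASKS_KEY, "0"));
    }

    public static void main(String[] args) {
        Properties jobConf = new Properties();
        recordSucceededTasks(jobConf, 0); // empty insert: zero succeeded writers
        System.out.println(succeededTasks(jobConf));
    }
}
```

The point of the explicit key is that "no tasks succeeded" and "task count was never written" become distinguishable from a stale or unrelated task count.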
[jira] [Resolved] (HIVE-25033) HPL/SQL thrift call fails when returning null
[ https://issues.apache.org/jira/browse/HIVE-25033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Attila Magyar resolved HIVE-25033.
----------------------------------
    Resolution: Fixed

> HPL/SQL thrift call fails when returning null
> ----------------------------------------------
>
>                 Key: HIVE-25033
>                 URL: https://issues.apache.org/jira/browse/HIVE-25033
>             Project: Hive
>          Issue Type: Sub-task
>          Components: hpl/sql
>    Affects Versions: 4.0.0
>            Reporter: Attila Magyar
>            Assignee: Attila Magyar
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
[jira] [Work logged] (HIVE-25033) HPL/SQL thrift call fails when returning null
[ https://issues.apache.org/jira/browse/HIVE-25033?focusedWorklogId=591395&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-591395 ]

ASF GitHub Bot logged work on HIVE-25033:
-----------------------------------------
        Author: ASF GitHub Bot
    Created on: 30/Apr/21 08:18
    Start Date: 30/Apr/21 08:18
    Worklog Time Spent: 10m

Work Description: kasakrisz merged pull request #2194:
URL: https://github.com/apache/hive/pull/2194

Issue Time Tracking
-------------------
    Worklog Id: (was: 591395)
    Time Spent: 40m  (was: 0.5h)

> HPL/SQL thrift call fails when returning null
> ----------------------------------------------
>
>                 Key: HIVE-25033
>                 URL: https://issues.apache.org/jira/browse/HIVE-25033
>             Project: Hive
>          Issue Type: Sub-task
>          Components: hpl/sql
>    Affects Versions: 4.0.0
>            Reporter: Attila Magyar
>            Assignee: Attila Magyar
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
[jira] [Work started] (HIVE-25079) Create new metric about number of writes to tables with manually disabled compaction
[ https://issues.apache.org/jira/browse/HIVE-25079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HIVE-25079 started by Antal Sinkovits.
----------------------------------------------

> Create new metric about number of writes to tables with manually disabled compaction
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-25079
>                 URL: https://issues.apache.org/jira/browse/HIVE-25079
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Antal Sinkovits
>            Assignee: Antal Sinkovits
>            Priority: Major
>
> Create a new metric that measures the number of writes to tables that have
> compaction turned off manually. It does not matter if the write is committed
> or aborted (both are bad...)
[jira] [Assigned] (HIVE-25081) Put metrics collection behind a feature flag
[ https://issues.apache.org/jira/browse/HIVE-25081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antal Sinkovits reassigned HIVE-25081:
--------------------------------------

> Put metrics collection behind a feature flag
> ---------------------------------------------
>
>                 Key: HIVE-25081
>                 URL: https://issues.apache.org/jira/browse/HIVE-25081
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Antal Sinkovits
>            Assignee: Antal Sinkovits
>            Priority: Major
>
> Most metrics we're creating are collected in AcidMetricsService, which is
> behind a feature flag. However there are some metrics that are collected
> outside of the service. These should be behind a feature flag in addition to
> hive.metastore.metrics.enabled.
[jira] [Assigned] (HIVE-25080) Create metric about oldest entry in "ready for cleaning" state
[ https://issues.apache.org/jira/browse/HIVE-25080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antal Sinkovits reassigned HIVE-25080:
--------------------------------------

> Create metric about oldest entry in "ready for cleaning" state
> ---------------------------------------------------------------
>
>                 Key: HIVE-25080
>                 URL: https://issues.apache.org/jira/browse/HIVE-25080
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Antal Sinkovits
>            Assignee: Antal Sinkovits
>            Priority: Major
>
> When a compaction txn commits, COMPACTION_QUEUE.CQ_COMMIT_TIME is updated
> with the current time. Then the compaction state is set to "ready for
> cleaning". (... and then the Cleaner runs and the state is set to "succeeded",
> hopefully)
> Based on this we know (roughly) how long a compaction has been in state
> "ready for cleaning".
> We should create a metric similar to compaction_oldest_enqueue_age_in_sec
> that would show that the cleaner is blocked by something, i.e. find the
> compaction in "ready for cleaning" that has the oldest commit time.
[jira] [Assigned] (HIVE-25079) Create new metric about number of writes to tables with manually disabled compaction
[ https://issues.apache.org/jira/browse/HIVE-25079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antal Sinkovits reassigned HIVE-25079:
--------------------------------------

> Create new metric about number of writes to tables with manually disabled compaction
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-25079
>                 URL: https://issues.apache.org/jira/browse/HIVE-25079
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Antal Sinkovits
>            Assignee: Antal Sinkovits
>            Priority: Major
>
> Create a new metric that measures the number of writes to tables that have
> compaction turned off manually. It does not matter if the write is committed
> or aborted (both are bad...)
[jira] [Resolved] (HIVE-24722) LLAP cache hydration
[ https://issues.apache.org/jira/browse/HIVE-24722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antal Sinkovits resolved HIVE-24722.
------------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

All subtasks are committed, closing this.

> LLAP cache hydration
> ---------------------
>
>                 Key: HIVE-24722
>                 URL: https://issues.apache.org/jira/browse/HIVE-24722
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Antal Sinkovits
>            Assignee: Antal Sinkovits
>            Priority: Major
>              Labels: llap
>             Fix For: 4.0.0
>
> Provide a way to save and reload the contents of the cache in the llap
> daemons.
[jira] [Work logged] (HIVE-25071) Number of reducers limited to fixed 1 when updating/deleting
[ https://issues.apache.org/jira/browse/HIVE-25071?focusedWorklogId=591383&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-591383 ]

ASF GitHub Bot logged work on HIVE-25071:
-----------------------------------------
        Author: ASF GitHub Bot
    Created on: 30/Apr/21 07:28
    Start Date: 30/Apr/21 07:28
    Worklog Time Spent: 10m

Work Description: kasakrisz commented on pull request #2231:
URL: https://github.com/apache/hive/pull/2231#issuecomment-829900602

Hi Marta,
Thanks for reviewing this patch.

This is what I found about distributing rows to reducers while I was debugging.
Let's say we have the following statements:
```
create table acidtbl(a int, b int) clustered by (a) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true');
insert ...
delete from acidtbl where a = 1 or a = 3;
```
In this case the plan of the delete statement after ReduceSinkDeDuplication looks like:
```
TS[0]-FIL[8]-SEL[2]-RS[5]-SEL[6]-FS[7]
```
So with Tez we have a mapper: TS[0]-FIL[8]-SEL[2]-RS[5], and two reducers, each of them running: SEL[6]-FS[7].

RS[5] has
  Partition keys: GenericUDFBridge ==> UDFToInteger (Column[_col0])
  Sort keys: Column[_col0]
and maxReducers: 2, where _col0 is the row_id coming from SEL[2].

UDFToInteger() extracts the bucket_id field, which is used to generate a `reducesink.key` in the RS operator. This is passed to the wrapped `OutputCollector` with the row. In this case that is an `org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput`. This class is part of Tez, which I'm not familiar with, but I found that this is where rows are distributed to reducers by the key coming from RS.

Hive/Hadoop also has a setting `hive.exec.reducers.max`/`mapreduce.job.reduces`. This limits maxReducers in the RS operator. If the table has more buckets than the max reducers, then the FileSink operator also distributes the rows into different files. If I understand correctly, this is done by the `multiFileSpray` functionality.

Issue Time Tracking
-------------------
    Worklog Id: (was: 591383)
    Time Spent: 0.5h  (was: 20m)

> Number of reducers limited to fixed 1 when updating/deleting
> -------------------------------------------------------------
>
>                 Key: HIVE-25071
>                 URL: https://issues.apache.org/jira/browse/HIVE-25071
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When updating/deleting bucketed tables, an extra ReduceSink operator is
> created to enforce bucketing. After HIVE-22538 the number of reducers is
> limited to a fixed 1 in these RS operators.
> This can lead to performance degradation.
> Prior to HIVE-22538 multiple reducers were available in such cases. The
> reason for limiting the number of reducers is to ensure RowId ascending order
> in delete delta files produced by the update/delete statements.
> This is the plan of a delete statement like:
> {code}
> DELETE FROM t1 WHERE a = 1;
> {code}
> {code}
> TS[0]-FIL[8]-SEL[2]-RS[3]-SEL[4]-RS[5]-SEL[6]-FS[7]
> {code}
> RowId order is ensured by RS[3] and bucketing is enforced by RS[5]: the
> number of reducers was limited to the bucket count of the table or
> hive.exec.reducers.max. However RS[5] does not provide any ordering, so the
> above plan may generate unsorted delete deltas, which leads to corrupted data
> reads.
> Prior to HIVE-22538 these RS operators were merged by ReduceSinkDeduplication
> and the resulting RS kept the ordering and enabled multiple reducers. It
> could do so because ReduceSinkDeduplication was prepared for ACID writes.
> This was removed by HIVE-22538 to get a more generic ReduceSinkDeduplication.