[jira] [Commented] (HIVE-6518) Add a GC canary to the VectorGroupByOperator to flush whenever a GC is triggered
[ https://issues.apache.org/jira/browse/HIVE-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938056#comment-13938056 ]

Harish Butani commented on HIVE-6518:

+1 for port to 0.13

Add a GC canary to the VectorGroupByOperator to flush whenever a GC is triggered

Key: HIVE-6518
URL: https://issues.apache.org/jira/browse/HIVE-6518
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.13.0
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
Fix For: 0.14.0
Attachments: HIVE-6518.1-tez.patch, HIVE-6518.2-tez.patch, HIVE-6518.2.patch, HIVE-6518.3.patch

The current VectorGroupByOperator implementation flushes the in-memory hashes when the maximum number of entries or the configured fraction of memory is hit. This works for most cases, but there are some corner cases where we hit GC overhead limits or heap size limits before either of those conditions is reached, due to the rest of the pipeline.

This patch adds a SoftReference as a GC canary. If the soft reference is dead, then a full GC pass happened sometime in the recent past, and the aggregation hashtables should be flushed immediately before another full GC is triggered.

--
This message was sent by Atlassian JIRA (v6.2#6252)
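The canary mechanism described in the issue can be sketched roughly as follows. This is a minimal illustration, not the code from the attached patches: the class name, the `maxEntries` limit, the `HashMap` standing in for the aggregation hashtables, and the `simulateCollectedCanary()` helper are all assumptions made for the example. Only `java.lang.ref.SoftReference` and its `get()`/`clear()` methods are real JDK API.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an aggregating operator; not Hive's VectorGroupByOperator.
final class GcCanarySketch {
    // Stand-in for the in-memory aggregation hashtables.
    private final Map<String, Long> aggregates = new HashMap<>();
    private final int maxEntries;
    // The canary: a softly reachable object that the JVM may clear under memory pressure.
    private SoftReference<Object> gcCanary = new SoftReference<>(new Object());
    private int flushCount = 0;

    GcCanarySketch(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    void add(String key, long value) {
        aggregates.merge(key, value, Long::sum);
        if (shouldFlush()) {
            flush();
        }
    }

    // Flush on the usual size limit, or when the canary has been cleared by the collector.
    boolean shouldFlush() {
        return aggregates.size() >= maxEntries || gcCanary.get() == null;
    }

    void flush() {
        // A real operator would forward the partial aggregates downstream here.
        aggregates.clear();
        flushCount++;
        // Re-arm the canary so the next collection can be detected as well.
        gcCanary = new SoftReference<>(new Object());
    }

    // Test helper (assumption): clears the reference the way the collector would.
    void simulateCollectedCanary() {
        gcCanary.clear();
    }

    int flushCount() {
        return flushCount;
    }

    int size() {
        return aggregates.size();
    }

    public static void main(String[] args) {
        GcCanarySketch op = new GcCanarySketch(1_000_000);
        op.add("a", 1L);
        op.simulateCollectedCanary();
        op.add("b", 1L);
        System.out.println("flushes so far: " + op.flushCount());
    }
}
```

The check costs one `get()` call per row, so it can sit next to the existing entry-count and memory-fraction checks. When exactly a JVM clears soft references depends on the collector and its settings, so the canary is a heuristic signal rather than a precise GC notification.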
[jira] [Commented] (HIVE-6518) Add a GC canary to the VectorGroupByOperator to flush whenever a GC is triggered
[ https://issues.apache.org/jira/browse/HIVE-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938186#comment-13938186 ]

Jitendra Nath Pandey commented on HIVE-6518:

Committed to branch-0.13 as well.

Fix For: 0.13.0
[jira] [Commented] (HIVE-6518) Add a GC canary to the VectorGroupByOperator to flush whenever a GC is triggered
[ https://issues.apache.org/jira/browse/HIVE-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934667#comment-13934667 ]

Hive QA commented on HIVE-6518:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12634401/HIVE-6518.3.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5389 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_auto_sortmerge_join_16
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin6
{noformat}

Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1769/testReport
Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1769/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12634401
[jira] [Commented] (HIVE-6518) Add a GC canary to the VectorGroupByOperator to flush whenever a GC is triggered
[ https://issues.apache.org/jira/browse/HIVE-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935444#comment-13935444 ]

Gopal V commented on HIVE-6518:

The test failures don't seem to be related to this fix - they aren't vectorized.
[jira] [Commented] (HIVE-6518) Add a GC canary to the VectorGroupByOperator to flush whenever a GC is triggered
[ https://issues.apache.org/jira/browse/HIVE-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916282#comment-13916282 ]

Remus Rusanu commented on HIVE-6518:

Can you somehow modify the LOG.debug at the top of flush() to call out that the flush was triggered by gcCanary.get() == null? I was thinking: keep a count of gcCanary allocations and print it in the LOG.debug message. This will tell us whether the GC is the trigger, and also how often it has occurred in the operator's lifetime, when debugging etc.

+1

Attachments: HIVE-6518.1-tez.patch
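The logging suggestion in this comment might look something like the sketch below. It is only an illustration under stated assumptions: the class name, the `canaryAllocations` counter, the message layout, and the `simulateCollectedCanary()` helper are hypothetical, and the method returns the message as a string instead of calling the operator's actual LOG.debug, so that the example stays self-contained.

```java
import java.lang.ref.SoftReference;

// Hypothetical sketch of a flush message that reports the GC canary state.
final class CanaryFlushLogSketch {
    private SoftReference<Object> gcCanary = new SoftReference<>(new Object());
    // Number of canaries allocated so far, including the initial one.
    private long canaryAllocations = 1;

    boolean gcTriggered() {
        return gcCanary.get() == null;
    }

    // Builds the debug message for a flush and re-arms the canary if it was cleared.
    String describeFlush(int entries) {
        boolean byGc = gcTriggered();
        String message = String.format(
            "flush: entries=%d, gcCanaryDead=%b, canaryAllocations=%d",
            entries, byGc, canaryAllocations);
        if (byGc) {
            gcCanary = new SoftReference<>(new Object());
            canaryAllocations++;
        }
        return message;
    }

    // Test helper (assumption): clears the reference the way the collector would.
    void simulateCollectedCanary() {
        gcCanary.clear();
    }

    public static void main(String[] args) {
        CanaryFlushLogSketch sketch = new CanaryFlushLogSketch();
        System.out.println(sketch.describeFlush(5));
        sketch.simulateCollectedCanary();
        System.out.println(sketch.describeFlush(7));
    }
}
```

In an operator, the returned string would be passed to the debug logger at the top of flush(); the allocation count then shows how many GC-triggered flushes have happened over the operator's lifetime.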
[jira] [Commented] (HIVE-6518) Add a GC canary to the VectorGroupByOperator to flush whenever a GC is triggered
[ https://issues.apache.org/jira/browse/HIVE-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915212#comment-13915212 ]

Gunther Hagleitner commented on HIVE-6518:

I like it. Sounds like this will allow you to be more aggressive with caching/flushing params, while having a trigger that will flush out stuff when necessary.

+1 (assuming tests pass)

Attachments: HIVE-6518.1-tez.patch
[jira] [Commented] (HIVE-6518) Add a GC canary to the VectorGroupByOperator to flush whenever a GC is triggered
[ https://issues.apache.org/jira/browse/HIVE-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915238#comment-13915238 ]

Gopal V commented on HIVE-6518:

Yes, also the ORC scenario is more complex for strings in dictionaries. Taking a substring does not release the rest of the string's memory, because in vectorized mode only the start:len values get modified; no new allocations are made. So a group by SUBSTR() will keep the entire string in memory, but the VGBY does not know that it does.

Attachments: HIVE-6518.1-tez.patch
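The memory effect described in this comment can be illustrated with a small sketch. It is a simplification and not Hive's vectorized string representation: the `BytesView` class and its methods are hypothetical, and unlike the vectorized code path it allocates a small view object per substring. The point it shows is the same, though: a start:len view keeps the whole backing array reachable, so an estimate based on the visible length undercounts the memory actually retained.

```java
import java.util.Arrays;

// Hypothetical illustration of a start:len view over a shared byte buffer.
final class SubstringViewSketch {
    static final class BytesView {
        final byte[] buffer;
        final int start;
        final int length;

        BytesView(byte[] buffer, int start, int length) {
            this.buffer = buffer;
            this.start = start;
            this.length = length;
        }

        // Substring by adjusting start and length only; the bytes are not copied.
        BytesView substr(int offset, int len) {
            return new BytesView(buffer, start + offset, len);
        }

        // Copies only the visible bytes, so the original buffer can be collected.
        BytesView compact() {
            return new BytesView(Arrays.copyOfRange(buffer, start, start + length), 0, length);
        }

        // What a size estimate based on the key length would see.
        int visibleBytes() {
            return length;
        }

        // What the view actually keeps reachable on the heap.
        int retainedBytes() {
            return buffer.length;
        }
    }

    public static void main(String[] args) {
        BytesView whole = new BytesView(new byte[1000], 0, 1000);
        BytesView key = whole.substr(0, 3);
        System.out.println("visible=" + key.visibleBytes() + " retained=" + key.retainedBytes());
    }
}
```

A group-by key built from such a view looks 3 bytes long to a size estimate while pinning the full buffer, which is the kind of gap the GC canary is meant to catch.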