[jira] [Commented] (HIVE-6518) Add a GC canary to the VectorGroupByOperator to flush whenever a GC is triggered

2014-03-17 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938056#comment-13938056
 ] 

Harish Butani commented on HIVE-6518:
-

+1 for port to 0.13

 Add a GC canary to the VectorGroupByOperator to flush whenever a GC is 
 triggered
 

 Key: HIVE-6518
 URL: https://issues.apache.org/jira/browse/HIVE-6518
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.0
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
 Fix For: 0.14.0

 Attachments: HIVE-6518.1-tez.patch, HIVE-6518.2-tez.patch, 
 HIVE-6518.2.patch, HIVE-6518.3.patch


 The current VectorGroupByOperator implementation flushes the in-memory hashes 
 when the maximum number of entries or the memory-fraction threshold is hit.
 This works for most cases, but there are some corner cases where we hit GC 
 overhead limits or heap size limits before either of those conditions is 
 reached, due to the rest of the pipeline.
 This patch adds a SoftReference as a GC canary. If the soft reference is 
 dead, then a full GC pass happened sometime in the near past, and the 
 aggregation hashtables should be flushed immediately before another full GC 
 is triggered.
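
The canary idea in the description can be sketched in a few lines of Java. This is a minimal illustration and not the code from the attached patches: the class `GcCanaryAggregator`, its fields, and the `simulateGc()` hook are hypothetical names made up for this example. It only assumes the standard `java.lang.ref.SoftReference` behaviour, where the JVM may clear a softly reachable object under memory pressure and must clear it before throwing `OutOfMemoryError`.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an operator that buffers aggregates in memory.
// This is not Hive's VectorGroupByOperator; all names here are illustrative.
class GcCanaryAggregator {
    private final Map<String, Long> counts = new HashMap<>();
    private final int maxEntries;
    private int flushCount = 0;

    // The canary: a softly reachable object with no strong references.
    // The JVM may clear it when memory is tight, and must clear it before
    // throwing OutOfMemoryError.
    private SoftReference<Object> gcCanary = new SoftReference<>(new Object());

    GcCanaryAggregator(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    void add(String key) {
        counts.merge(key, 1L, Long::sum);
        // Flush on the usual size threshold, or when the canary has been cleared.
        if (counts.size() >= maxEntries || gcCanary.get() == null) {
            flush();
        }
    }

    void flush() {
        // A real operator would forward the partial aggregates downstream here.
        counts.clear();
        flushCount++;
        // Re-arm the canary so the next collection can be detected too.
        gcCanary = new SoftReference<>(new Object());
    }

    // Demo hook: simulates the collector clearing the soft reference.
    void simulateGc() {
        gcCanary.clear();
    }

    int size() { return counts.size(); }
    int flushCount() { return flushCount; }
}
```

The check costs one `get()` call per row and needs no knowledge of how much memory the rest of the pipeline is holding, which is the case the size and memory-fraction thresholds miss.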



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6518) Add a GC canary to the VectorGroupByOperator to flush whenever a GC is triggered

2014-03-17 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938186#comment-13938186
 ] 

Jitendra Nath Pandey commented on HIVE-6518:


Committed to branch-0.13 as well. 

 Add a GC canary to the VectorGroupByOperator to flush whenever a GC is 
 triggered
 

 Key: HIVE-6518
 URL: https://issues.apache.org/jira/browse/HIVE-6518
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.0
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
 Fix For: 0.13.0

 Attachments: HIVE-6518.1-tez.patch, HIVE-6518.2-tez.patch, 
 HIVE-6518.2.patch, HIVE-6518.3.patch




[jira] [Commented] (HIVE-6518) Add a GC canary to the VectorGroupByOperator to flush whenever a GC is triggered

2014-03-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934667#comment-13934667
 ] 

Hive QA commented on HIVE-6518:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12634401/HIVE-6518.3.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5389 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_auto_sortmerge_join_16
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin6
{noformat}

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1769/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1769/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12634401

 Add a GC canary to the VectorGroupByOperator to flush whenever a GC is 
 triggered
 

 Key: HIVE-6518
 URL: https://issues.apache.org/jira/browse/HIVE-6518
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.0
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
 Attachments: HIVE-6518.1-tez.patch, HIVE-6518.2-tez.patch, 
 HIVE-6518.2.patch, HIVE-6518.3.patch




[jira] [Commented] (HIVE-6518) Add a GC canary to the VectorGroupByOperator to flush whenever a GC is triggered

2014-03-14 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935444#comment-13935444
 ] 

Gopal V commented on HIVE-6518:
---

The test failures don't seem to be related to this fix - they aren't vectorized.

 Add a GC canary to the VectorGroupByOperator to flush whenever a GC is 
 triggered
 

 Key: HIVE-6518
 URL: https://issues.apache.org/jira/browse/HIVE-6518
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.0
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
 Attachments: HIVE-6518.1-tez.patch, HIVE-6518.2-tez.patch, 
 HIVE-6518.2.patch, HIVE-6518.3.patch




[jira] [Commented] (HIVE-6518) Add a GC canary to the VectorGroupByOperator to flush whenever a GC is triggered

2014-02-28 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916282#comment-13916282
 ] 

Remus Rusanu commented on HIVE-6518:


Can you somehow modify the LOG.debug at the top of flush() to call out that the 
flush was triggered by gcCanary.get() == null? I was thinking: keep a count 
of gcCanary allocations and print it in the LOG.debug message. This will tell 
us whether the GC is the trigger and also how often it has occurred in the 
operator lifetime, which helps when debugging etc.
+1
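
One way the suggested logging could look, as a rough sketch only: the class `CountingCanary` and its method names are hypothetical and are not taken from the patch, and the message is built as a plain string so that it can be passed to whatever logger the operator already uses.

```java
import java.lang.ref.SoftReference;

// Hypothetical helper that tracks how many canaries have been allocated.
class CountingCanary {
    private SoftReference<Object> gcCanary = new SoftReference<>(new Object());
    private long canaryAllocations = 1;

    // True once the collector has cleared the soft reference.
    boolean isDead() {
        return gcCanary.get() == null;
    }

    // Builds the debug message; call this before renew() so the
    // dead/alive state reflects what triggered the flush.
    String flushMessage(int entries) {
        return "flush: entries=" + entries
            + " gcCanaryDead=" + isDead()
            + " gcCanaryAllocations=" + canaryAllocations;
    }

    // Allocates a fresh canary after a flush and counts the allocation.
    void renew() {
        gcCanary = new SoftReference<>(new Object());
        canaryAllocations++;
    }

    // Demo hook: simulates the collector clearing the soft reference.
    void clearForDemo() {
        gcCanary.clear();
    }
}
```

If the canary is only re-allocated after it has died, the allocation count minus one is the number of collections the operator has observed so far.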

 Add a GC canary to the VectorGroupByOperator to flush whenever a GC is 
 triggered
 

 Key: HIVE-6518
 URL: https://issues.apache.org/jira/browse/HIVE-6518
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.0
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
 Attachments: HIVE-6518.1-tez.patch




[jira] [Commented] (HIVE-6518) Add a GC canary to the VectorGroupByOperator to flush whenever a GC is triggered

2014-02-27 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915212#comment-13915212
 ] 

Gunther Hagleitner commented on HIVE-6518:
--

I like it. Sounds like this will allow you to be more aggressive with 
caching/flushing params, while having a trigger that will flush out stuff when 
necessary.

+1 (assuming tests pass)

 Add a GC canary to the VectorGroupByOperator to flush whenever a GC is 
 triggered
 

 Key: HIVE-6518
 URL: https://issues.apache.org/jira/browse/HIVE-6518
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.0
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
 Attachments: HIVE-6518.1-tez.patch




[jira] [Commented] (HIVE-6518) Add a GC canary to the VectorGroupByOperator to flush whenever a GC is triggered

2014-02-27 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915238#comment-13915238
 ] 

Gopal V commented on HIVE-6518:
---

Yes, also the ORC scenario is more complex for strings in dictionaries. 

Taking a substring does not release the rest of the data, because in 
vectorized mode only the start:len values get modified; no new allocations 
are made.

So a GROUP BY on SUBSTR() will keep the entire string in memory, but the VGBY 
does not know that it does.
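
The retention effect described above can be illustrated with a small sketch. The `ByteSlice` class below is a hypothetical model of a (buffer, start, length) view and is not Hive's column vector implementation; it only shows why a short key can pin a much larger backing array.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical view over a shared byte buffer, identified by start and length.
final class ByteSlice {
    final byte[] buffer;
    final int start;
    final int length;

    ByteSlice(byte[] buffer, int start, int length) {
        this.buffer = buffer;
        this.start = start;
        this.length = length;
    }

    // Substring without allocation: only start and length change,
    // and the new slice still references the whole original buffer.
    ByteSlice substr(int offset, int len) {
        return new ByteSlice(buffer, start + offset, len);
    }

    // Bytes the slice logically represents.
    int logicalBytes() {
        return length;
    }

    // Bytes kept reachable by holding on to this slice.
    int retainedBytes() {
        return buffer.length;
    }

    // Copies just the visible bytes so the original buffer can be collected.
    ByteSlice compact() {
        return new ByteSlice(Arrays.copyOfRange(buffer, start, start + length), 0, length);
    }

    String asString() {
        return new String(buffer, start, length, StandardCharsets.UTF_8);
    }
}
```

An accounting scheme that sums `logicalBytes()` over the hash table keys would underestimate the heap actually pinned, which is the gap the canary is meant to cover.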

 Add a GC canary to the VectorGroupByOperator to flush whenever a GC is 
 triggered
 

 Key: HIVE-6518
 URL: https://issues.apache.org/jira/browse/HIVE-6518
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.0
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
 Attachments: HIVE-6518.1-tez.patch

