
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM

2013-11-12 Thread churro morales (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13820286#comment-13820286
 ] 

churro morales commented on HBASE-9865:
---

Okay folks, 

We ran a rack of region servers with the patch.  We increased the new 
generation sizes in the hopes that these new gc'able objects would never make 
it into tenured space.  After doing a jmap -histo on the regionservers with the 
patch applied and those without, I noticed a significant drop in the amount of 
space taken by Object[] .  This patch has been running in our cluster on a 
subset of boxes for around a week and everything is looking good from garbage 
collection to replication lag.

Thanks to the community!


[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM

2013-11-12 Thread Lars Hofhansl (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13820350#comment-13820350
 ] 

Lars Hofhansl commented on HBASE-9865:
--

Cool. Thanks [~churromorales]. If there are no objections I will commit this to 
all branches today.

 WALEdit.heapSize() is incorrect in certain replication scenarios which may 
 cause RegionServers to go OOM
 

 Key: HBASE-9865
 URL: https://issues.apache.org/jira/browse/HBASE-9865
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.5, 0.95.0
Reporter: churro morales
Assignee: Lars Hofhansl
 Fix For: 0.98.0, 0.96.1, 0.94.14

 Attachments: 9865-0.94-v2.txt, 9865-0.94-v4.txt, 9865-sample-1.txt, 
 9865-sample.txt, 9865-trunk-v2.txt, 9865-trunk-v3.txt, 9865-trunk-v4.txt, 
 9865-trunk.txt


 WALEdit.heapSize() is incorrect in certain replication scenarios which may 
 cause RegionServers to go OOM.
 A little background on this issue.  We noticed that our source replication 
 regionservers would get into gc storms and sometimes even OOM. 
 We noticed a case where it showed that there were around 25k WALEdits to 
 replicate, each one with an ArrayList of KeyValues.  The array list had a 
 capacity of around 90k (using 350KB of heap memory) but had around 6 non null 
 entries.
 When the ReplicationSource.readAllEntriesToReplicateOrNextFile() gets a 
 WALEdit it removes all kv's that are scoped other than local.  
 But in doing so we don't account for the capacity of the ArrayList when 
 determining heapSize for a WALEdit.  The logic for shipping a batch is 
 whether you have hit a size capacity or number of entries capacity.  
 Therefore, if you have a WALEdit with 25k entries and suppose all are removed: 
 the size of the ArrayList is 0 (we don't even count the collection's heap 
 size currently) and the capacity is ignored.
 This will yield a heapSize() of 0 bytes while in the best case it would be at 
 least 10 bytes (provided you pass initialCapacity and you have a 32-bit 
 JVM).
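 To make the batching consequence concrete, here is a minimal sketch of a ship-when-full loop. It is not the ReplicationSource code: the Edit class, the fillBatch helper, and the 64 MB / 25,000-entry limits are assumptions for illustration. Only the figures of roughly 25k edits and roughly 350KB retained per edit come from the description above.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchAccountingSketch {
  /** Hypothetical stand-in for a WALEdit: reported heapSize() vs. bytes actually retained. */
  static final class Edit {
    final long reportedBytes;
    final long retainedBytes;

    Edit(long reportedBytes, long retainedBytes) {
      this.reportedBytes = reportedBytes;
      this.retainedBytes = retainedBytes;
    }
  }

  /** Counts how many pending edits are batched before the size or count limit is reached. */
  static int fillBatch(List<Edit> pending, long sizeLimitBytes, int countLimit) {
    long reported = 0;
    int count = 0;
    for (Edit e : pending) {
      if (reported >= sizeLimitBytes || count >= countLimit) {
        break;
      }
      reported += e.reportedBytes;
      count++;
    }
    return count;
  }

  /** Sums the memory really held by the first count edits. */
  static long retainedBytes(List<Edit> pending, int count) {
    long total = 0;
    for (int i = 0; i < count; i++) {
      total += pending.get(i).retainedBytes;
    }
    return total;
  }

  /** Builds n edits that all report and retain the same sizes. */
  static List<Edit> edits(int n, long reportedBytes, long retainedBytes) {
    List<Edit> list = new ArrayList<>(n);
    for (int i = 0; i < n; i++) {
      list.add(new Edit(reportedBytes, retainedBytes));
    }
    return list;
  }

  public static void main(String[] args) {
    long sizeLimit = 64L * 1024 * 1024; // assumed size limit for a batch
    int countLimit = 25_000;            // assumed entry-count limit for a batch
    long retained = 350L * 1024;        // ~350KB backing array per edit (from the report)

    List<Edit> underReported = edits(25_000, 0, retained);
    List<Edit> accurate = edits(25_000, retained, retained);

    int a = fillBatch(underReported, sizeLimit, countLimit);
    int b = fillBatch(accurate, sizeLimit, countLimit);
    System.out.println(a + " edits, " + retainedBytes(underReported, a) + " bytes retained");
    System.out.println(b + " edits, " + retainedBytes(accurate, b) + " bytes retained");
  }
}
```

 Under these assumed limits, edits that report 0 bytes are only stopped by the entry-count limit, so the batch holds all 25,000 edits (about 8.96 GB retained), whereas accurate reporting stops the batch after 188 edits.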
 I have some ideas on how to address this problem and want to know everyone's 
 thoughts:
 1. We use a probabilistic counter such as HyperLogLog and create something 
 like:
   * class CapacityEstimateArrayList extends ArrayList
   ** this class overrides all additive methods to update the 
 probabilistic counts
   ** it includes one additional method called estimateCapacity 
 (we would take estimateCapacity - size() and fill in sizes for all references)
   * Then we can do something like this in WALEdit.heapSize:
   
 {code}
 public long heapSize() {
   // Fixed overhead of the ArrayList object itself
   long ret = ClassSize.ARRAYLIST;
   for (KeyValue kv : kvs) {
     ret += kv.heapSize();
   }
   // Unused slots in the backing array still hold one reference each
   long nullEntriesEstimate = kvs.estimateCapacity() - kvs.size();
   ret += ClassSize.align(nullEntriesEstimate * ClassSize.REFERENCE);
   if (scopes != null) {
     ret += ClassSize.TREEMAP;
     ret += ClassSize.align(scopes.size() * ClassSize.MAP_ENTRY);
     // TODO this isn't quite right, need help here
   }
   return ret;
 }
 {code}
 2. In ReplicationSource.removeNonReplicableEdits() we know the size of the 
 array originally, and we provide some percentage threshold.  When that 
 threshold is met (50% of the entries have been removed) we can call 
 kvs.trimToSize()
 3. In the heapSize() method for WALEdit we could use reflection (please don't 
 shoot me for this) to grab the actual capacity of the list, doing something 
 like this:
 {code}
 public int getArrayListCapacity() {
   try {
     // elementData is the private backing array of java.util.ArrayList
     Field f = ArrayList.class.getDeclaredField("elementData");
     f.setAccessible(true);
     return ((Object[]) f.get(kvs)).length;
   } catch (Exception e) {
     log.warn("Exception in trying to get capacity on ArrayList", e);
     // Fall back to the element count if the field is not accessible
     return kvs.size();
   }
 }
 {code}
 I am partial to (1), using HyperLogLog and creating a 
 CapacityEstimateArrayList; this is reusable throughout the code for other 
 classes that implement HeapSize and contain ArrayLists.  The memory 
 footprint is very small and it is very fast.  The issue is that this is an 
 estimate; although we can configure the precision, we will most likely always 
 be conservative.  The estimateCapacity will always be less than the 
 actualCapacity, but it will be close. I think that putting the logic in 
 removeNonReplicableEdits (2) will work, but this only solves the heapSize 
 problem in this particular scenario.  Solution 3 is slow and horrible but it 
 gives us the exact answer.
 I would love to hear if anyone else has any other ideas on how to remedy this 
 problem.  I have code for trunk and 0.94 for all 3 ideas and can provide a 
 patch if the community thinks any of these approaches is a viable one.
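 As a rough illustration of the "estimate that is never above the real capacity" idea, the sketch below swaps HyperLogLog for something simpler: it tracks the largest size the list has reached since the last trimToSize(), which is a lower bound on the backing array length. The class name CapacityLowerBoundList and its methods are hypothetical, not part of HBase or of any attached patch, and clone() and serialization are not handled.

```java
import java.util.ArrayList;
import java.util.Collection;

/** Hypothetical list that remembers a lower bound on its backing array length. */
public class CapacityLowerBoundList<E> extends ArrayList<E> {
  private static final long serialVersionUID = 1L;

  // Largest size seen since the last trimToSize(); the backing array is at least this long.
  private int highWater = 0;

  @Override
  public boolean add(E e) {
    boolean changed = super.add(e);
    note();
    return changed;
  }

  @Override
  public void add(int index, E e) {
    super.add(index, e);
    note();
  }

  @Override
  public boolean addAll(Collection<? extends E> c) {
    boolean changed = super.addAll(c);
    note();
    return changed;
  }

  @Override
  public boolean addAll(int index, Collection<? extends E> c) {
    boolean changed = super.addAll(index, c);
    note();
    return changed;
  }

  @Override
  public void trimToSize() {
    super.trimToSize();
    // After trimming, the backing array holds exactly size() slots.
    highWater = size();
  }

  private void note() {
    highWater = Math.max(highWater, size());
  }

  /** Lower bound on the number of slots in the backing array. */
  public int capacityLowerBound() {
    return highWater;
  }

  /** Lower bound on the bytes used by the reference slots, given a reference size in bytes. */
  public long referenceBytesLowerBound(int referenceSize) {
    return (long) highWater * referenceSize;
  }
}
```

 A heapSize() built on this would add referenceBytesLowerBound(ClassSize.REFERENCE) instead of counting only live entries, so a list that once held many entries keeps reporting those slots until it is trimmed.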



--

[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM

2013-11-06 Thread Jean-Daniel Cryans (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815094#comment-13815094
 ] 

Jean-Daniel Cryans commented on HBASE-9865:
---

+1, would wait for churro's cluster testing before committing.


[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM

2013-11-06 Thread Lars Hofhansl (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815162#comment-13815162
 ] 

Lars Hofhansl commented on HBASE-9865:
--

If I get time I might write a microtest comparing the reuse approach with new 
allocations and varying batch sizes.


[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM

2013-11-05 Thread Dave Latham (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814247#comment-13814247
 ] 

Dave Latham commented on HBASE-9865:


Looks good to me.  Thanks, Lars.




--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM

2013-11-05 Thread churro morales (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814278#comment-13814278
 ] 

churro morales commented on HBASE-9865:
---

One thing I noticed in WALEdit: we should be accounting for the ArrayList 
object as well.
Instead of:
{code}
public long heapSize() {
  long ret = 0;
{code}
this would be correct, although it doesn't matter very much:
{code}
public long heapSize() {
  long ret = ClassSize.ARRAYLIST;
{code}

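If you didn't want to bleed the ArrayList implementation that WALEdit uses, 
maybe something like this might work.
For WALEdit:
{code}
public void removeIf(Predicate<KeyValue> predicate) {
  // Remember the size before removal so the trim threshold is meaningful
  int originalSize = kvs.size();
  // Iterate backwards so removals do not shift unvisited indices
  for (int i = kvs.size() - 1; i >= 0; i--) {
    KeyValue kv = kvs.get(i);
    if (predicate.apply(kv)) {
      kvs.remove(i);
    }
  }
  // Release the backing array once more than half of the entries are gone
  if (kvs.size() < originalSize / 2) {
    kvs.trimToSize();
  }
}
{code}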

And ReplicationSource would change to:
{code}
protected void removeNonReplicableEdits(WALEdit edit) {
  final NavigableMap<byte[], Integer> scopes = edit.getScopes();
  edit.removeIf(new Predicate<KeyValue>() {
    @Override
    public boolean apply(KeyValue keyValue) {
      // Drop KeyValues whose family has no replication scope
      return scopes == null || !scopes.containsKey(keyValue.getFamily());
    }
  });
}
{code}

I don't think it adds much by doing this but it is an alternative if we don't 
want to bleed that the WALEdit uses an ArrayList. 
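
For anyone who wants to try the remove-then-trim idea outside HBase, here is a small self-contained sketch. It is only an illustration: the class TrimOnRemoveSketch and the helper removeAndTrim are hypothetical names, it uses java.util.function.Predicate rather than the Predicate type in the snippet above, and the one-half threshold is the same assumption as in the proposal.

```java
import java.util.ArrayList;
import java.util.function.Predicate;

public class TrimOnRemoveSketch {
  /**
   * Removes every element matching the predicate and trims the backing array
   * when fewer than half of the original elements remain.
   * Returns the number of removed elements.
   */
  static <T> int removeAndTrim(ArrayList<T> list, Predicate<? super T> predicate) {
    int originalSize = list.size();
    // Walk backwards so that removing by index does not skip elements.
    for (int i = list.size() - 1; i >= 0; i--) {
      if (predicate.test(list.get(i))) {
        list.remove(i);
      }
    }
    if (list.size() < originalSize / 2) {
      list.trimToSize();
    }
    return originalSize - list.size();
  }

  public static void main(String[] args) {
    ArrayList<Integer> values = new ArrayList<>();
    for (int i = 0; i < 10; i++) {
      values.add(i);
    }
    int removed = removeAndTrim(values, v -> v < 6);
    System.out.println(removed + " removed, " + values.size() + " left");
  }
}
```

The trim copies the surviving elements into a smaller array, so it only pays off when a large share of the entries was dropped, which is why it sits behind a threshold.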


[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM

2013-11-05 Thread Lars Hofhansl (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814302#comment-13814302
 ] 

Lars Hofhansl commented on HBASE-9865:
--

Thanks Churro (and Dave). While we're at it, might as well fix 
WALEdit.heapSize().
The other change does not help with readability, I think. It's not so bad to 
leak this out of WALEdit; if anything, it declares that this is a random access 
list.

I'll make a 0.94 patch as well. Any chance you would try it on a real cluster?

 WALEdit.heapSize() is incorrect in certain replication scenarios which may 
 cause RegionServers to go OOM
 

 Key: HBASE-9865
 URL: https://issues.apache.org/jira/browse/HBASE-9865
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.5, 0.95.0
Reporter: churro morales
Assignee: Lars Hofhansl
 Attachments: 9865-0.94-v2.txt, 9865-sample-1.txt, 9865-sample.txt, 
 9865-trunk-v2.txt, 9865-trunk-v3.txt, 9865-trunk.txt


 WALEdit.heapSize() is incorrect in certain replication scenarios which may 
 cause RegionServers to go OOM.
 A little background on this issue.  We noticed that our source replication 
 regionservers would get into gc storms and sometimes even OOM. 
 We noticed a case where it showed that there were around 25k WALEdits to 
 replicate, each one with an ArrayList of KeyValues.  The array list had a 
 capacity of around 90k (using 350KB of heap memory) but had around 6 non null 
 entries.
 When the ReplicationSource.readAllEntriesToReplicateOrNextFile() gets a 
 WALEdit it removes all kv's that are scoped other than local.  
 But in doing so we don't account for the capacity of the ArrayList when 
 determining heapSize for a WALEdit.  The logic for shipping a batch is 
 whether you have hit a size capacity or number of entries capacity.  
 Therefore if have a WALEdit with 25k entries and suppose all are removed: 
 The size of the arrayList is 0 (we don't even count the collection's heap 
 size currently) but the capacity is ignored.
 This will yield a heapSize() of 0 bytes while in the best case it would be at 
 least 10 bytes (provided you pass initialCapacity and you have 32 bit 
 JVM) 
 I have some ideas on how to address this problem and want to know everyone's 
 thoughts:
 1. We use a probabalistic counter such as HyperLogLog and create something 
 like:
   * class CapacityEstimateArrayList implements ArrayList
   ** this class overrides all additive methods to update the 
 probabalistic counts
   ** it includes one additional method called estimateCapacity 
 (we would take estimateCapacity - size() and fill in sizes for all references)
   * Then we can do something like this in WALEdit.heapSize:
   
 {code}
 public long heapSize() {
   long ret = ClassSize.ARRAYLIST;
   for (KeyValue kv : kvs) {
     ret += kv.heapSize();
   }
   // Charge for the unused slots in the backing array as well.
   long nullEntriesEstimate = kvs.estimateCapacity() - kvs.size();
   ret += ClassSize.align(nullEntriesEstimate * ClassSize.REFERENCE);
   if (scopes != null) {
     ret += ClassSize.TREEMAP;
     ret += ClassSize.align(scopes.size() * ClassSize.MAP_ENTRY);
     // TODO this isn't quite right, need help here
   }
   return ret;
 }
 {code}
 2. In ReplicationSource.removeNonReplicableEdits() we know the size of the 
 array originally, and we provide some percentage threshold.  When that 
 threshold is met (50% of the entries have been removed) we can call 
 kvs.trimToSize()
 3. In the heapSize() method for WALEdit we could use reflection (Please don't 
 shoot me for this) to grab the actual capacity of the list.  Doing something 
 like this:
 {code}
 public int getArrayListCapacity() {
   try {
     // elementData is the private backing array of java.util.ArrayList
     Field f = ArrayList.class.getDeclaredField("elementData");
     f.setAccessible(true);
     return ((Object[]) f.get(kvs)).length;
   } catch (Exception e) {
     log.warn("Exception in trying to get capacity on ArrayList", e);
     return kvs.size();
   }
 }
 {code}
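 For illustration only, here is a minimal sketch of the "estimate the capacity without reflection" idea from (1). Note that it swaps HyperLogLog for a plain high-water mark: since an ArrayList does not shrink its backing array on remove() or clear(), the largest size the list has reached is a lower bound on its capacity. The class name HighWaterMarkList and the method unusedSlotBytes are hypothetical and are not part of HBase or of any attached patch.

```java
import java.util.ArrayList;
import java.util.Collection;

/**
 * Hypothetical list that remembers the largest size it has reached.
 * The high-water mark never exceeds the real capacity, so it is a
 * conservative (low) estimate of the backing array length.
 */
class HighWaterMarkList<E> extends ArrayList<E> {
  private static final long serialVersionUID = 1L;
  private int highWaterMark;

  HighWaterMarkList() {
    super();
  }

  HighWaterMarkList(int initialCapacity) {
    super(initialCapacity);
    highWaterMark = initialCapacity; // backing array is allocated at this length
  }

  private void record() {
    highWaterMark = Math.max(highWaterMark, size());
  }

  @Override public boolean add(E e) {
    boolean changed = super.add(e);
    record();
    return changed;
  }

  @Override public void add(int index, E e) {
    super.add(index, e);
    record();
  }

  @Override public boolean addAll(Collection<? extends E> c) {
    boolean changed = super.addAll(c);
    record();
    return changed;
  }

  @Override public boolean addAll(int index, Collection<? extends E> c) {
    boolean changed = super.addAll(index, c);
    record();
    return changed;
  }

  @Override public void trimToSize() {
    super.trimToSize();
    highWaterMark = size(); // capacity now equals size
  }

  /** Lower bound on the length of the backing array. */
  int estimateCapacity() {
    return Math.max(highWaterMark, size());
  }

  /** Estimated bytes held by empty slots, given the JVM reference size. */
  long unusedSlotBytes(int referenceSize) {
    return (long) (estimateCapacity() - size()) * referenceSize;
  }
}
```

 A heapSize() implementation could then add unusedSlotBytes(ClassSize.REFERENCE) to the total, assuming kvs were declared as this type.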
 I am partial to (1) using HyperLogLog and creating a 
 CapacityEstimateArrayList; this is reusable throughout the code for other 
 classes that implement HeapSize and contain ArrayLists.  The memory 
 footprint is very small and it is very fast.  The issue is that this is an 
 estimate; although we can configure the precision, we will most likely always be 
 conservative.  The estimateCapacity will always be less than the 
 actualCapacity, but it will be close. I think that putting the logic in 
 removeNonReplicableEdits will work, but this only solves the heapSize problem 
 in this particular scenario.  Solution 3 is slow and horrible, but it gives 
 us the exact answer.
 I would love to hear if anyone else has any other ideas on how to remedy this 
 problem.  I have code for trunk and 0.94 for all 3 ideas and can provide a 
 patch if the community thinks any of these approaches is a viable one.

[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM

2013-11-05 Thread churro morales (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814312#comment-13814312
 ] 

churro morales commented on HBASE-9865:
---

Hi Lars, 

I'm sure at the very least we will be able to apply it to a few nodes in our 
cluster and monitor how this patch affects garbage collection.  Once I have 
gathered results, I will be sure to share them.
Cheers



[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM

2013-11-04 Thread Dave Latham (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813072#comment-13813072
 ] 

Dave Latham commented on HBASE-9865:


Lars, you beat me to the punch on that read failure case.  I'm also not sure 
why it is the way it is or how it should be, but noticed the patch had seemed 
to change it.  Seems like it's best to replicate all you can for a corrupt log. 
 JD, any thoughts or more cool stories?

I also agree that #2 is more serious than #1.  However the issue as filed and 
described was targeted at #1.  Lars, what do you think about adding a simple 
check in ReplicationSource.removeNonReplicableEdits to trimToSize if more than 
half the KVs are removed?
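
To make the suggestion concrete, here is a rough sketch of what such a check could look like. It is not taken from any of the attached patches: the class TrimAfterFilter and the method filterAndMaybeTrim are made-up names, the keep predicate stands in for whatever scope test removeNonReplicableEdits applies, and it uses Java 8 APIs for brevity.

```java
import java.util.ArrayList;
import java.util.function.Predicate;

/** Hypothetical helper: filter a list in place and release slack capacity. */
final class TrimAfterFilter {
  private TrimAfterFilter() {}

  /**
   * Removes every element that fails the keep test. If more than half of the
   * original elements were removed, shrinks the backing array to the new size.
   *
   * @return true when trimToSize() was called
   */
  static <T> boolean filterAndMaybeTrim(ArrayList<T> list, Predicate<T> keep) {
    int before = list.size();
    list.removeIf(element -> !keep.test(element));
    int removed = before - list.size();
    if ((long) removed * 2 > before) {
      list.trimToSize();
      return true;
    }
    return false;
  }
}
```

The trim copies the surviving elements into a smaller array, so it costs one allocation per trimmed edit, but it lets the large backing array be collected.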

A little more background as we've deciphered some behavior on our cluster in 
case anyone is curious.  We're running clusters in a pair of data centers, and 
just migrated one of those data centers which involved shutting off replication 
with one cluster and getting it going with another one.  As part of that 
process we managed to get some edits stuck in a replication cycle without 
realizing it ( HBASE-9888 and HBASE-7709 ).  Because those edits got batched up 
with edits from other clusters ( HBASE-9158 ) it created some enormous edits 
that varied by position leading to this particular pain.


[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM

2013-11-04 Thread Jeffrey Zhong (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813129#comment-13813129
 ] 

Jeffrey Zhong commented on HBASE-9865:
--

[~lhofhansl] The following code is a dead code path and should never be reached 
in the current implementation. The handling here is confusing enough though.
{code}
else if (currentNbEntries != 0) {
  ...
  considerDumping = true;
  currentNbEntries = 0;
}
{code}



[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM

2013-11-04 Thread Lars Hofhansl (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813155#comment-13813155
 ] 

Lars Hofhansl commented on HBASE-9865:
--

Why is that code dead, though? Maybe a few edits are read, then we get an 
IOException. In that case currentNbEntries would be > 0. If the queue was not 
recovered we could reach this code, no?


[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM

2013-11-04 Thread Lars Hofhansl (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813164#comment-13813164
 ] 

Lars Hofhansl commented on HBASE-9865:
--

bq.  Lars, what do you think about adding a simple check in 
ReplicationSource.removeNonReplicableEdits to trimToSize if more than half the 
KVs are removed?

No harm in that :)
I doubt it makes a difference. The worst part, and something that just occurred 
to me, is that all KVs loaded via entriesArray/WALEdit will stay referenced until 
we reuse the WALEdit. The KVs might themselves be large. If we only have a few 
large batches followed by only small batches we'll keep those KVs from the 
higher indexes in memory forever.
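
A small standalone sketch of the effect, assuming a reusable buffer that only overwrites the slots it needs. ReusableBatch, staleReferences and releaseTail are hypothetical names used for illustration; they do not correspond to the actual entriesArray handling in ReplicationSource.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical reusable buffer that is refilled from index 0 on each batch. */
final class ReusableBatch {
  private final Object[] slots;
  private int used;

  ReusableBatch(int capacity) {
    slots = new Object[capacity];
  }

  /** Overwrites the first entries.size() slots; higher slots are left as they were. */
  void load(List<?> entries) {
    if (entries.size() > slots.length) {
      throw new IllegalArgumentException("batch larger than buffer");
    }
    for (int i = 0; i < entries.size(); i++) {
      slots[i] = entries.get(i);
    }
    used = entries.size();
  }

  /** Counts objects from earlier, larger batches that are still reachable. */
  int staleReferences() {
    int stale = 0;
    for (int i = used; i < slots.length; i++) {
      if (slots[i] != null) {
        stale++;
      }
    }
    return stale;
  }

  /** Clears every slot above the current batch so the old objects can be collected. */
  void releaseTail() {
    Arrays.fill(slots, used, slots.length, null);
  }
}
```

Loading six entries and then two leaves four stale references until releaseTail() is called, which is the pattern of a large batch followed by small ones.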



[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM

2013-11-04 Thread Jean-Daniel Cryans (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813190#comment-13813190
 ] 

Jean-Daniel Cryans commented on HBASE-9865:
---

bq. Why is that code dead, though? Maybe a few edits are read, then we get an 
IOException. In that case currentNbEntries would be > 0. If the queue was not 
recovered we could reach this code, no?

In readAllEntriesToReplicateOrNextFile we do:

{code}
try {
  entry = this.repLogReader.readNextAndSetPosition(this.entriesArray,
      this.currentNbEntries);
} catch (IOException ie) {
  LOG.debug("Break on IOE: " + ie.getMessage());
  break;
}
{code}

So that IOE won't come out. Only seek() and the first readNextAndSetPosition() 
calls can throw the IOE.
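
As a simplified illustration of that control flow (not the real ReplicationSource code): when the read inside the loop is wrapped in its own try/catch, a failure after some entries were read just ends the loop, and the caller gets the partial batch instead of an exception. BreakOnReadError, EntryReader and readBatch are hypothetical names, and the sketch leaves out the seek() and first read that happen outside the loop.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class BreakOnReadError {
  private BreakOnReadError() {}

  /** Hypothetical reader: next entry, null at end of log, IOException on a bad record. */
  interface EntryReader {
    String next() throws IOException;
  }

  /**
   * Reads up to maxEntries entries. An IOException raised inside the loop is
   * swallowed and stops the loop, so it never reaches the caller.
   */
  static List<String> readBatch(EntryReader reader, int maxEntries) {
    List<String> batch = new ArrayList<>();
    while (batch.size() < maxEntries) {
      String entry;
      try {
        entry = reader.next();
      } catch (IOException ioe) {
        break; // keep what was read so far
      }
      if (entry == null) {
        break; // end of log
      }
      batch.add(entry);
    }
    return batch;
  }
}
```

With this shape, an outer handler that checks for a non-zero entry count after an IOException cannot be triggered by a mid-loop read failure.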


[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM

2013-11-04 Thread Lars Hofhansl (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813308#comment-13813308
 ] 

Lars Hofhansl commented on HBASE-9865:
--

Ahh... Cool. I'll remove that part. I have decouple the ArrayList from the 
method then, i.e. by just passing in the List.

 WALEdit.heapSize() is incorrect in certain replication scenarios which may 
 cause RegionServers to go OOM
 

 Key: HBASE-9865
 URL: https://issues.apache.org/jira/browse/HBASE-9865
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.5, 0.95.0
Reporter: churro morales
Assignee: Lars Hofhansl
 Attachments: 9865-0.94-v2.txt, 9865-sample-1.txt, 9865-sample.txt, 
 9865-trunk-v2.txt, 9865-trunk.txt


 WALEdit.heapSize() is incorrect in certain replication scenarios which may 
 cause RegionServers to go OOM.
 A little background on this issue.  We noticed that our source replication 
 regionservers would get into gc storms and sometimes even OOM. 
 We noticed a case where it showed that there were around 25k WALEdits to 
 replicate, each one with an ArrayList of KeyValues.  The array list had a 
 capacity of around 90k (using 350KB of heap memory) but had around 6 non null 
 entries.
 When the ReplicationSource.readAllEntriesToReplicateOrNextFile() gets a 
 WALEdit it removes all kv's that are scoped other than local.  
 But in doing so we don't account for the capacity of the ArrayList when 
 determining heapSize for a WALEdit.  The logic for shipping a batch is 
 whether you have hit a size capacity or number of entries capacity.  
 Therefore, if we have a WALEdit with 25k entries and all of them are removed, 
 the size of the ArrayList is 0 (we don't even count the collection's heap 
 size currently) but the capacity is ignored.
 This will yield a heapSize() of 0 bytes while in the best case it would be at 
 least 10 bytes (provided you pass initialCapacity and you have 32 bit 
 JVM) 
 I have some ideas on how to address this problem and want to know everyone's 
 thoughts:
 1. We use a probabilistic counter such as HyperLogLog and create something 
 like:
   * class CapacityEstimateArrayList extends ArrayList
   ** this class overrides all additive methods to update the 
 probabilistic counts
   ** it includes one additional method called estimateCapacity 
 (we would take estimateCapacity - size() and fill in sizes for all references)
   * Then we can do something like this in WALEdit.heapSize:
   
 {code}
   public long heapSize() {
     long ret = ClassSize.ARRAYLIST;
     for (KeyValue kv : kvs) {
       ret += kv.heapSize();
     }
     // account for the unused (null) slots in the backing array
     long nullEntriesEstimate = kvs.estimateCapacity() - kvs.size();
     ret += ClassSize.align(nullEntriesEstimate * ClassSize.REFERENCE);
     if (scopes != null) {
       ret += ClassSize.TREEMAP;
       ret += ClassSize.align(scopes.size() * ClassSize.MAP_ENTRY);
       // TODO this isn't quite right, need help here
     }
     return ret;
   }
 {code}
 2. In ReplicationSource.removeNonReplicableEdits() we know the original size 
 of the array, and we provide some percentage threshold.  When that 
 threshold is met (e.g. 50% of the entries have been removed) we can call 
 kvs.trimToSize().
 3. In the heapSize() method for WALEdit we could use reflection (please don't 
 shoot me for this) to grab the actual capacity of the list, doing something 
 like this:
 {code}
 public int getArrayListCapacity() {
   try {
     // elementData is the private backing array of java.util.ArrayList
     Field f = ArrayList.class.getDeclaredField("elementData");
     f.setAccessible(true);
     return ((Object[]) f.get(kvs)).length;
   } catch (Exception e) {
     // fall back to the element count if reflection fails
     log.warn("Exception in trying to get capacity on ArrayList", e);
     return kvs.size();
   }
 }
 {code}
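Option 2 in the list above could look roughly like the following. This is a minimal sketch, not the code in ReplicationSource: the class name, the method name, and the use of a Java 8 Predicate are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.function.Predicate;

// Hypothetical helper illustrating option 2; not the ReplicationSource code.
final class TrimAfterFilter {
    /**
     * Removes entries matching `drop`, then trims the backing array when at
     * least `threshold` (0.0 to 1.0) of the original entries were removed.
     * Returns true if the list was trimmed.
     */
    static <T> boolean filterAndTrim(ArrayList<T> list, Predicate<T> drop, double threshold) {
        int before = list.size();
        list.removeIf(drop);
        int removed = before - list.size();
        if (before > 0 && removed >= threshold * before) {
            list.trimToSize(); // shrink capacity down to the current size
            return true;
        }
        return false;
    }
}
```

Trimming only past a threshold avoids paying for an array copy on edits that lose just a few entries, at the cost of leaving some slack unaccounted for below the threshold.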
 I am partial to (1): using HyperLogLog and creating a 
 CapacityEstimateArrayList, which is reusable throughout the code for other 
 classes that implement HeapSize and contain ArrayLists.  The memory 
 footprint is very small and it is very fast.  The issue is that this is an 
 estimate; although we can configure the precision, we will most likely always 
 be conservative.  The estimateCapacity will always be less than the 
 actualCapacity, but it will be close.  I think that putting the logic in 
 removeNonReplicableEdits (2) will work, but this only solves the heapSize 
 problem in this particular scenario.  Solution 3 is slow and horrible, but it 
 gives us the exact answer.
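To make the shape of option 1 concrete, here is a minimal sketch of a list that reports a conservative capacity estimate. It deliberately swaps the HyperLogLog counter for a plain high-water mark of size(), which is a different and simpler technique. The assumption is that the backing array only shrinks on trimToSize(); clone() and deserialization are not handled. Everything apart from the names CapacityEstimateArrayList and estimateCapacity is invented for the example.

```java
import java.util.ArrayList;
import java.util.Collection;

// Hypothetical class: tracks the largest size() seen instead of using HyperLogLog.
final class CapacityEstimateArrayList<E> extends ArrayList<E> {
    private static final long serialVersionUID = 1L;
    private int highWaterMark; // largest size() seen since the last trim

    private void record() { highWaterMark = Math.max(highWaterMark, size()); }

    // Every additive method updates the mark after delegating to ArrayList.
    @Override public boolean add(E e) { boolean r = super.add(e); record(); return r; }
    @Override public void add(int i, E e) { super.add(i, e); record(); }
    @Override public boolean addAll(Collection<? extends E> c) {
        boolean r = super.addAll(c); record(); return r;
    }
    @Override public boolean addAll(int i, Collection<? extends E> c) {
        boolean r = super.addAll(i, c); record(); return r;
    }
    // Trimming shrinks the backing array, so the mark is reset to the size.
    @Override public void trimToSize() { super.trimToSize(); highWaterMark = size(); }

    /** Conservative estimate: at most the backing array length in ordinary use. */
    int estimateCapacity() { return highWaterMark; }
}
```

With this in place, estimateCapacity() - size() gives the number of slots known to have been allocated but now empty, which is the quantity the heapSize() sketch in option 1 multiplies by the reference size.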
 I would love to hear if anyone else has any other ideas on how to remedy this 
 problem.  I have code for trunk and 0.94 for all 3 ideas and can provide a 
 patch if the community thinks any of these approaches is a viable one.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM

2013-11-04 Thread Hadoop QA (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813465#comment-13813465
]

Hadoop QA commented on HBASE-9865:
--

{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12612049/9865-trunk-v3.txt
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author
tags.

{color:green}+1 tests included{color}. The patch appears to include 6 new
or modified tests.

{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop
1.0 profile.

{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop
2.0 profile.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any
warning messages.

{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.

{color:red}-1 findbugs{color}. The patch appears to introduce 2 new
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.

{color:green}+1 lineLengths{color}. The patch does not introduce lines
longer than 100

{color:red}-1 site{color}. The patch appears to cause mvn site goal to
fail.

{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.regionserver.TestHRegion
org.apache.hadoop.hbase.regionserver.TestHRegionBusyWait

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/7729//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/7729//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/7729//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/7729//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/7729//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/7729//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/7729//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/7729//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/7729//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/7729//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/7729//console

This message is automatically generated.

WALEdit.heapSize() is incorrect in certain replication scenarios which may
cause RegionServers to go OOM

Key: HBASE-9865
URL: https://issues.apache.org/jira/browse/HBASE-9865
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.5, 0.95.0
Reporter: churro morales
Assignee: Lars Hofhansl
Attachments: 9865-0.94-v2.txt, 9865-sample-1.txt, 9865-sample.txt,
9865-trunk-v2.txt, 9865-trunk-v3.txt, 9865-trunk.txt


[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM

2013-11-04 Thread Lars Hofhansl (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813497#comment-13813497
 ] 

Lars Hofhansl commented on HBASE-9865:
--

The test failures are unrelated. Will check the findbugs warnings.


[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM

2013-11-04 Thread Lars Hofhansl (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813523#comment-13813523
 ] 

Lars Hofhansl commented on HBASE-9865:
--

None of the classes/methods changed in this patch cause any new findbugs 
warning. Not sure why it would report this (or I cannot read the find bugs 
report correctly).


[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM

2013-11-04 Thread Lars Hofhansl (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813675#comment-13813675
 ] 

Lars Hofhansl commented on HBASE-9865:
--

Any comments on last patch? Should be good to go.


[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM

2013-11-03 Thread Lars Hofhansl (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812509#comment-13812509
 ] 

Lars Hofhansl commented on HBASE-9865:
--

This is not quite right yet in the partial read failure case (a log was 
partially read and is then found to be corrupted).



[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM

2013-11-03 Thread Lars Hofhansl (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812626#comment-13812626
 ] 

Lars Hofhansl commented on HBASE-9865:
--

I'm trying to grok the details of the failure logic; this has gotten pretty 
convoluted over time.
Specifically this part in ReplicationSource.run():

{code}
  try {
    if (readAllEntriesToReplicateOrNextFile(currentWALisBeingWrittenTo)) {
      continue;
    }
  } catch (IOException ioe) {
    ...
    if (this.replicationQueueInfo.isQueueRecovered()) {
      ...
      considerDumping = true;
      ...
    } else if (currentNbEntries != 0) {
      ...
      considerDumping = true;
      currentNbEntries = 0;
    }
    ...
  } finally {
{code}

So when we find a corrupt log file we won't replicate any of it 
({{currentNbEntries = 0}}), unless the queue was recovered, in which case we 
*do* want to replicate the partial set of edits we managed to read?
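
One way to restate that reading as code is sketched below. It is a hypothetical model only: the elided lines in the excerpt are ignored, the class PartialReadOutcome and the method onReadFailure are invented for illustration, and it encodes the interpretation asked about above rather than confirmed behaviour.

```java
// Hypothetical model of the catch block quoted above; field names mirror the
// excerpt, but this is not the ReplicationSource code.
final class PartialReadOutcome {
    final boolean considerDumping;
    final int entriesToShip;

    PartialReadOutcome(boolean considerDumping, int entriesToShip) {
        this.considerDumping = considerDumping;
        this.entriesToShip = entriesToShip;
    }

    /** What happens to the entries read so far when the read throws an IOException. */
    static PartialReadOutcome onReadFailure(boolean queueRecovered, int currentNbEntries) {
        if (queueRecovered) {
            // Recovered queue: the partial set of edits read so far is kept.
            return new PartialReadOutcome(true, currentNbEntries);
        } else if (currentNbEntries != 0) {
            // Live queue: the partial batch is discarded (currentNbEntries = 0).
            return new PartialReadOutcome(true, 0);
        }
        // Nothing was read, so neither branch of the excerpt applies.
        return new PartialReadOutcome(false, currentNbEntries);
    }
}
```

Laid out this way, the asymmetry is easy to see: only the non-recovered branch resets the entry count.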



[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM

2013-11-02 Thread Hadoop QA (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812187#comment-13812187
]

Hadoop QA commented on HBASE-9865:
--

{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12611785/9865-trunk.txt
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author
tags.

{color:green}+1 tests included{color}. The patch appears to include 3 new
or modified tests.

{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop
1.0 profile.

{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop
2.0 profile.

{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2
warning messages.

{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.

{color:red}-1 findbugs{color}. The patch appears to introduce 3 new
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.

{color:green}+1 lineLengths{color}. The patch does not introduce lines
longer than 100 characters.

{color:red}-1 site{color}. The patch appears to cause mvn site goal to
fail.

{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/7714//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/7714//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/7714//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/7714//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/7714//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/7714//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/7714//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/7714//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/7714//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/7714//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/7714//console

This message is automatically generated.

[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM

2013-11-02 Thread Hadoop QA (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812240#comment-13812240
]

Hadoop QA commented on HBASE-9865:
--

{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12611797/9865-trunk-v2.txt
against trunk revision .