[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820286#comment-13820286 ]

churro morales commented on HBASE-9865:
---

Okay folks,

We ran a rack of region servers with the patch. We increased the new generation sizes in the hopes that these new gc'able objects would never make it into tenured space. After doing a jmap -histo on the regionservers with the patch applied and on those without, I noticed a significant drop in the amount of space taken by Object[]. This patch has been running in our cluster on a subset of boxes for around a week and everything is looking good, from garbage collection to replication lag.

Thanks to the community!

WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM

Key: HBASE-9865
URL: https://issues.apache.org/jira/browse/HBASE-9865
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.5, 0.95.0
Reporter: churro morales
Assignee: Lars Hofhansl
Fix For: 0.98.0, 0.96.1, 0.94.14
Attachments: 9865-0.94-v2.txt, 9865-0.94-v4.txt, 9865-sample-1.txt, 9865-sample.txt, 9865-trunk-v2.txt, 9865-trunk-v3.txt, 9865-trunk-v4.txt, 9865-trunk.txt

WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM.

A little background on this issue. We noticed that our source replication regionservers would get into gc storms and sometimes even OOM. We noticed a case where there were around 25k WALEdits to replicate, each one with an ArrayList of KeyValues. The array list had a capacity of around 90k (using 350KB of heap memory) but had around 6 non-null entries.

When ReplicationSource.readAllEntriesToReplicateOrNextFile() gets a WALEdit it removes all kv's that are scoped other than local. But in doing so we don't account for the capacity of the ArrayList when determining heapSize for a WALEdit. The logic for shipping a batch is whether you have hit a size capacity or a number-of-entries capacity.

Therefore if you have a WALEdit with 25k entries and suppose all are removed: the size of the ArrayList is 0 (we don't even count the collection's heap size currently) but the capacity is ignored. This will yield a heapSize() of 0 bytes while in the best case it would be at least 10 bytes (provided you pass initialCapacity and you have a 32 bit JVM).

I have some ideas on how to address this problem and want to know everyone's thoughts:

1. We use a probabilistic counter such as HyperLogLog and create something like:
* class CapacityEstimateArrayList extends ArrayList
** this class overrides all additive methods to update the probabilistic counts
** it includes one additional method called estimateCapacity (we would take estimateCapacity - size() and fill in sizes for all references)
* Then we can do something like this in WALEdit.heapSize:
{code}
public long heapSize() {
  // Count the ArrayList object itself, not just its contents.
  long ret = ClassSize.ARRAYLIST;
  for (KeyValue kv : kvs) {
    ret += kv.heapSize();
  }
  // Charge one reference for every allocated but unused slot in the backing array.
  long nullEntriesEstimate = kvs.getCapacityEstimate() - kvs.size();
  ret += ClassSize.align(nullEntriesEstimate * ClassSize.REFERENCE);
  if (scopes != null) {
    ret += ClassSize.TREEMAP;
    ret += ClassSize.align(scopes.size() * ClassSize.MAP_ENTRY);
    // TODO this isn't quite right, need help here
  }
  return ret;
}
{code}

2. In ReplicationSource.removeNonReplicableEdits() we know the size of the array originally, and we provide some percentage threshold. When that threshold is met (50% of the entries have been removed) we can call kvs.trimToSize().

3. In the heapSize() method for WALEdit we could use reflection (please don't shoot me for this) to grab the actual capacity of the list. Doing something like this:
{code}
public int getArrayListCapacity() {
  try {
    // elementData is the private backing array of java.util.ArrayList.
    Field f = ArrayList.class.getDeclaredField("elementData");
    f.setAccessible(true);
    return ((Object[]) f.get(kvs)).length;
  } catch (Exception e) {
    log.warn("Exception in trying to get capacity on ArrayList", e);
    // Fall back to the element count if the field cannot be read.
    return kvs.size();
  }
}
{code}

I am partial to (1), using HyperLogLog and creating a CapacityEstimateArrayList; this is reusable throughout the code for other classes that implement HeapSize and contain ArrayLists. The memory footprint is very small and it is very fast. The issue is that this is an estimate; although we can configure the precision, we will most likely always be conservative. The estimateCapacity will always be less than the actualCapacity, but it will be close.

I think that putting the logic in removeNonReplicableEdits will work, but this only solves the heapSize problem in this particular scenario. Solution 3 is slow and horrible but that gives us the exact answer.

I would love to hear if anyone else has any other ideas on how to remedy this problem. I have code for trunk and 0.94 for all 3 ideas and can provide a patch if the community thinks any of these approaches is a viable one.
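For readers who want to experiment with idea (1), here is a rough sketch of a capacity-tracking list. It is not the attached patch and it does not use HyperLogLog: it swaps in a plain high-water mark, i.e. the largest size the list has ever reached, which is a lower bound on the backing array's capacity because ArrayList only shrinks that array in trimToSize(). HighWaterMarkList, capacityLowerBound and slackBytes are hypothetical names, and the 8-byte reference size used in main is an assumption about the JVM.

```java
import java.util.ArrayList;
import java.util.Collection;

/** Hypothetical helper (not an HBase class): remembers the largest size the list has reached. */
final class HighWaterMarkList<E> extends ArrayList<E> {
    private static final long serialVersionUID = 1L;

    // Largest size() observed since construction or the last trimToSize().
    private int highWaterMark;

    @Override
    public boolean add(E e) {
        boolean changed = super.add(e);
        recordSize();
        return changed;
    }

    @Override
    public void add(int index, E e) {
        super.add(index, e);
        recordSize();
    }

    @Override
    public boolean addAll(Collection<? extends E> c) {
        boolean changed = super.addAll(c);
        recordSize();
        return changed;
    }

    @Override
    public boolean addAll(int index, Collection<? extends E> c) {
        boolean changed = super.addAll(index, c);
        recordSize();
        return changed;
    }

    @Override
    public void trimToSize() {
        super.trimToSize();
        // The backing array now holds exactly size() slots.
        highWaterMark = size();
    }

    private void recordSize() {
        highWaterMark = Math.max(highWaterMark, size());
    }

    /** Lower bound on the backing array length; the real capacity may be larger. */
    int capacityLowerBound() {
        return highWaterMark;
    }

    /** Bytes held by allocated but unused slots, given an assumed reference size. */
    long slackBytes(int referenceBytes) {
        return (long) Math.max(0, highWaterMark - size()) * referenceBytes;
    }

    public static void main(String[] args) {
        HighWaterMarkList<String> kvs = new HighWaterMarkList<>();
        for (int i = 0; i < 90_000; i++) {
            kvs.add("kv" + i);
        }
        kvs.subList(6, kvs.size()).clear(); // keep 6 entries, as in the report
        System.out.println("size=" + kvs.size()
            + " capacity>=" + kvs.capacityLowerBound()
            + " slackBytes>=" + kvs.slackBytes(8)); // 8 = assumed reference size
    }
}
```

Like the estimate described above, this never overstates the capacity, since growth beyond the high-water mark and any initialCapacity passed to a constructor are not seen. A heapSize() built on it would therefore still undercount, just by far less than ignoring capacity entirely.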
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820350#comment-13820350 ]

Lars Hofhansl commented on HBASE-9865:
---

Cool. Thanks [~churromorales].
If there are no objections I will commit this to all branches today.
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815094#comment-13815094 ]

Jean-Daniel Cryans commented on HBASE-9865:
---

+1, would wait for churro's cluster testing before committing.
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815162#comment-13815162 ]

Lars Hofhansl commented on HBASE-9865:
---

If I get time I might write a microtest comparing the reuse approach with new allocations and varying batch sizes.
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814247#comment-13814247 ]

Dave Latham commented on HBASE-9865:
---

Looks good to me. Thanks, Lars.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814278#comment-13814278 ]

churro morales commented on HBASE-9865:
---

One thing I noticed in WALEdit: we should be accounting for the ArrayList object as well. Instead of:
{code}
public long heapSize() {
  long ret = 0;
{code}
this would be correct, although it doesn't matter very much:
{code}
public long heapSize() {
  long ret = ClassSize.ARRAYLIST;
{code}

If you didn't want to expose the ArrayList implementation that WALEdit uses, maybe something like this might work.

For WALEdit:
{code}
// Predicate is assumed to be Guava's com.google.common.base.Predicate.
public void removeIf(Predicate<KeyValue> predicate) {
  // Remember the size before filtering so the trim check has a baseline.
  int originalSize = kvs.size();
  // Walk backwards so removals do not shift the indices still to be visited.
  for (int i = kvs.size() - 1; i >= 0; i--) {
    KeyValue kv = kvs.get(i);
    if (predicate.apply(kv)) {
      kvs.remove(i);
    }
  }
  // Release the unused backing array once more than half of the entries are gone.
  if (kvs.size() < originalSize / 2) {
    kvs.trimToSize();
  }
}
{code}

And ReplicationSource would change to:
{code}
protected void removeNonReplicableEdits(WALEdit edit) {
  final NavigableMap<byte[], Integer> scopes = edit.getScopes();
  edit.removeIf(new Predicate<KeyValue>() {
    @Override
    public boolean apply(KeyValue keyValue) {
      // Drop every KeyValue whose family has no replication scope.
      return scopes == null || !scopes.containsKey(keyValue.getFamily());
    }
  });
}
{code}

I don't think it adds much by doing this, but it is an alternative if we don't want to expose that WALEdit uses an ArrayList.
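For anyone who wants to try the remove-then-trim idea outside HBase, here is a minimal self-contained sketch. It is not the attached patch: TrimAfterFilter and removeAndMaybeTrim are hypothetical names, String stands in for KeyValue, and it uses java.util.function.Predicate (Java 8+) instead of the Predicate type in the snippet above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.function.Predicate;

/** Hypothetical stand-alone illustration; not part of HBase. */
final class TrimAfterFilter {
    private TrimAfterFilter() {}

    /**
     * Removes every element matching shouldRemove, then calls trimToSize()
     * if fewer than half of the original entries remain.
     *
     * @return true if the backing array was trimmed
     */
    static <T> boolean removeAndMaybeTrim(ArrayList<T> list, Predicate<T> shouldRemove) {
        int originalSize = list.size();
        // Walk backwards so removals do not shift the indices still to be visited.
        for (int i = originalSize - 1; i >= 0; i--) {
            if (shouldRemove.test(list.get(i))) {
                list.remove(i);
            }
        }
        boolean trim = list.size() < originalSize / 2;
        if (trim) {
            list.trimToSize();
        }
        return trim;
    }

    public static void main(String[] args) {
        // Strings of the form "family:qualifier" stand in for KeyValues.
        ArrayList<String> kvs = new ArrayList<>(
            Arrays.asList("cf1:a", "cf2:b", "cf2:c", "cf2:d", "cf2:e"));
        // Pretend only family cf1 has a replication scope.
        boolean trimmed = removeAndMaybeTrim(kvs, kv -> !kv.startsWith("cf1:"));
        System.out.println(kvs + " trimmed=" + trimmed);
    }
}
```

Note that the threshold compares against the size before filtering, not the capacity, so a list that was already mostly empty on arrival would not be trimmed by this check.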
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814302#comment-13814302 ]

Lars Hofhansl commented on HBASE-9865:
---

Thanks Churro (and Dave).

While we're at it, might as well fix WALEdit.heapSize().
The other change does not help with readability, I think. It's not so bad to leak this out of WALEdit; if anything it declares that this is a random access list.

I'll make a 0.94 patch as well. Any chance you would try it on a real cluster?
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814312#comment-13814312 ] churro morales commented on HBASE-9865: ---
Hi Lars, I'm sure at the very least we will be able to apply it to a few nodes in our cluster and monitor how this patch affects garbage collection. Upon gathering results, I will be sure to share. Cheers
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813072#comment-13813072 ] Dave Latham commented on HBASE-9865:
Lars, you beat me to the punch on that read failure case. I'm also not sure why it is the way it is or how it should be, but noticed the patch had seemed to change it. Seems like it's best to replicate all you can for a corrupt log. JD, any thoughts or more cool stories?

I also agree that #2 is more serious than #1. However the issue as filed and described was targeted at #1. Lars, what do you think about adding a simple check in ReplicationSource.removeNonReplicableEdits to trimToSize if more than half the KVs are removed?

A little more background as we've deciphered some behavior on our cluster in case anyone is curious. We're running clusters in a pair of data centers, and just migrated one of those data centers, which involved shutting off replication with one cluster and getting it going with another one. As part of that process we managed to get some edits stuck in a replication cycle without realizing it ( HBASE-9888 and HBASE-7709 ). Because those edits got batched up with edits from other clusters ( HBASE-9158 ) it created some enormous edits that varied by position, leading to this particular pain.
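The trimToSize check suggested above could look roughly like the following. It is a minimal sketch on a plain ArrayList, not the HBase patch: the class name, the method name, and the keep predicate are hypothetical, and the "more than half removed" threshold is taken from the comment.

```java
import java.util.ArrayList;
import java.util.function.Predicate;

final class TrimAfterFilter {
    /**
     * Removes every element that fails the keep test, then releases the spare
     * capacity if more than half of the original elements were removed.
     * Returns the number of removed elements.
     */
    static <T> int filterAndTrim(ArrayList<T> list, Predicate<T> keep) {
        int before = list.size();
        list.removeIf(keep.negate());   // removal alone does not shrink the backing array
        int removed = before - list.size();
        if (removed > before / 2) {
            list.trimToSize();          // reallocate the backing array to size()
        }
        return removed;
    }

    public static void main(String[] args) {
        ArrayList<String> kvs = new ArrayList<>(90_000);
        for (int i = 0; i < 10; i++) {
            kvs.add(i < 2 ? "replicate-" + i : "local-" + i);
        }
        int removed = filterAndTrim(kvs, s -> s.startsWith("replicate-"));
        System.out.println("removed=" + removed + ", remaining=" + kvs.size());
    }
}
```

The threshold keeps the reallocation cost of trimToSize() away from lists that lose only a few entries.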
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813129#comment-13813129 ] Jeffrey Zhong commented on HBASE-9865: --
[~lhofhansl] The following code is a dead code path and should never be called in the current implementation. The handling here is confusing enough though.
{code}
else if (currentNbEntries != 0) {
  ...
  considerDumping = true;
  currentNbEntries = 0;
}
{code}
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813155#comment-13813155 ] Lars Hofhansl commented on HBASE-9865: --
Why is that code dead, though? Maybe a few edits are read, then we get an IOException. In that case currentNbEntries would not be 0. If the queue was not recovered we could reach this code, no?
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813164#comment-13813164 ] Lars Hofhansl commented on HBASE-9865: --
bq. Lars, what do you think about adding a simple check in ReplicationSource.removeNonReplicableEdits to trimToSize if more than half the KVs are removed?

No harm in that :) I doubt it makes a difference. The worst part, and something that just occurred to me, is that all KVs loaded via entriesArray/WALEdit will stay referenced until we reuse the WALEdit. The KVs might themselves be large. If we only have a few large batches followed by only small batches, we'll keep those KVs from the higher indexes in memory forever.
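The retention effect described above, where reused slots at higher indexes keep their old contents alive, can be illustrated with a small stand-alone model. This is a sketch only: ReusedBatchBuffer, Entry, and releaseUnused are hypothetical names that mimic a reusable entries array and are not HBase classes.

```java
import java.util.ArrayList;
import java.util.List;

final class ReusedBatchBuffer {
    /** Hypothetical stand-in for a reusable entry that holds a list of payloads. */
    static final class Entry {
        final List<byte[]> payloads = new ArrayList<>();
    }

    private final Entry[] entries;

    ReusedBatchBuffer(int slots) {
        entries = new Entry[slots];
        for (int i = 0; i < slots; i++) {
            entries[i] = new Entry();
        }
    }

    /** Fills the first batchSize slots, reusing the existing Entry objects. */
    void load(int batchSize, int payloadBytes) {
        for (int i = 0; i < batchSize; i++) {
            entries[i].payloads.clear();
            entries[i].payloads.add(new byte[payloadBytes]);
        }
    }

    /** Drops payload references held by slots at or above usedSlots. */
    void releaseUnused(int usedSlots) {
        for (int i = usedSlots; i < entries.length; i++) {
            entries[i].payloads.clear();
        }
    }

    /** Counts slots that still reference at least one payload. */
    int slotsHoldingPayloads() {
        int n = 0;
        for (Entry e : entries) {
            if (!e.payloads.isEmpty()) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        ReusedBatchBuffer buffer = new ReusedBatchBuffer(100);
        buffer.load(100, 1024); // one large batch touches every slot
        buffer.load(5, 16);     // later small batches only touch the first five
        System.out.println("before release: " + buffer.slotsHoldingPayloads());
        buffer.releaseUnused(5);
        System.out.println("after release:  " + buffer.slotsHoldingPayloads());
    }
}
```

In this model the small batch overwrites only the low slots, so the payloads loaded by the earlier large batch stay reachable until something clears them explicitly.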
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813190#comment-13813190 ] Jean-Daniel Cryans commented on HBASE-9865: ---
bq. Why is that code dead, though? Maybe a few edits are read, then we get an IOException. In that case currentNbEntries would not be 0. If the queue was not recovered we could reach this code, no?

In readAllEntriesToReplicateOrNextFile we do:
{code}
try {
  entry = this.repLogReader.readNextAndSetPosition(this.entriesArray, this.currentNbEntries);
} catch (IOException ie) {
  LOG.debug("Break on IOE: " + ie.getMessage());
  break;
}
{code}
So that IOE won't come out. Only seek() and the first readNextAndSetPosition() calls can throw the IOE.
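The catch-and-break pattern quoted above can be shown in isolation. The sketch below is not the ReplicationSource code: EntryReader and readAll are hypothetical, and it models only the reads inside the loop, where an IOException ends the loop instead of reaching the caller.

```java
import java.io.IOException;

final class BreakOnReadError {
    /** Hypothetical reader: returns the next entry, null at end of input, or throws. */
    interface EntryReader {
        Integer next() throws IOException;
    }

    /**
     * Reads entries until the reader returns null or throws.
     * The IOException is swallowed, so the caller only sees how many entries were read.
     */
    static int readAll(EntryReader reader) {
        int count = 0;
        while (true) {
            Integer entry;
            try {
                entry = reader.next();
            } catch (IOException ie) {
                break; // stop reading; the exception does not propagate
            }
            if (entry == null) {
                break;
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        int read = readAll(() -> {
            if (calls[0] == 3) {
                throw new IOException("simulated read failure");
            }
            return calls[0]++;
        });
        System.out.println("entries read before the failure: " + read);
    }
}
```

Because the loop exits normally, any code that runs only when an exception escapes the read loop would not be triggered by a failure of this kind.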
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813308#comment-13813308 ] Lars Hofhansl commented on HBASE-9865: --
Ahh... Cool. I'll remove that part. I have to decouple the ArrayList from the method then, i.e. by just passing in the List.

-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813465#comment-13813465 ]

Hadoop QA commented on HBASE-9865:
--

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12612049/9865-trunk-v3.txt
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:red}-1 site{color}. The patch appears to cause mvn site goal to fail.
{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.regionserver.TestHRegion
org.apache.hadoop.hbase.regionserver.TestHRegionBusyWait

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7729//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7729//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7729//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7729//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7729//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7729//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7729//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7729//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7729//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7729//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7729//console

This message is automatically generated.
WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM Key: HBASE-9865 URL: https://issues.apache.org/jira/browse/HBASE-9865 Project: HBase Issue Type: Bug Affects Versions: 0.94.5, 0.95.0 Reporter: churro morales Assignee: Lars Hofhansl Attachments: 9865-0.94-v2.txt, 9865-sample-1.txt, 9865-sample.txt, 9865-trunk-v2.txt, 9865-trunk-v3.txt, 9865-trunk.txt
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813497#comment-13813497 ] Lars Hofhansl commented on HBASE-9865: -- The test failures are unrelated. Will check the findbugs warnings. WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM Key: HBASE-9865 URL: https://issues.apache.org/jira/browse/HBASE-9865 Project: HBase Issue Type: Bug Affects Versions: 0.94.5, 0.95.0 Reporter: churro morales Assignee: Lars Hofhansl Attachments: 9865-0.94-v2.txt, 9865-sample-1.txt, 9865-sample.txt, 9865-trunk-v2.txt, 9865-trunk-v3.txt, 9865-trunk.txt
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813523#comment-13813523 ] Lars Hofhansl commented on HBASE-9865: -- None of the classes/methods changed in this patch cause any new findbugs warning. Not sure why it would report this (or I cannot read the find bugs report correctly). WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM Key: HBASE-9865 URL: https://issues.apache.org/jira/browse/HBASE-9865 Project: HBase Issue Type: Bug Affects Versions: 0.94.5, 0.95.0 Reporter: churro morales Assignee: Lars Hofhansl Attachments: 9865-0.94-v2.txt, 9865-sample-1.txt, 9865-sample.txt, 9865-trunk-v2.txt, 9865-trunk-v3.txt, 9865-trunk.txt
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813675#comment-13813675 ] Lars Hofhansl commented on HBASE-9865: -- Any comments on last patch? Should be good to go. WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM Key: HBASE-9865 URL: https://issues.apache.org/jira/browse/HBASE-9865 Project: HBase Issue Type: Bug Affects Versions: 0.94.5, 0.95.0 Reporter: churro morales Assignee: Lars Hofhansl Attachments: 9865-0.94-v2.txt, 9865-sample-1.txt, 9865-sample.txt, 9865-trunk-v2.txt, 9865-trunk-v3.txt, 9865-trunk.txt
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812509#comment-13812509 ] Lars Hofhansl commented on HBASE-9865: -- This is not quite right in the partial read failure case, yet. (a log was partially read and is then found corrupted) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM Key: HBASE-9865 URL: https://issues.apache.org/jira/browse/HBASE-9865 Project: HBase Issue Type: Bug Affects Versions: 0.94.5, 0.95.0 Reporter: churro morales Assignee: Lars Hofhansl Attachments: 9865-0.94-v2.txt, 9865-sample-1.txt, 9865-sample.txt, 9865-trunk-v2.txt, 9865-trunk.txt
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812626#comment-13812626 ]

Lars Hofhansl commented on HBASE-9865:
--

I'm trying to grok the details of the failure logic; this has gotten pretty convoluted over time. Specifically this part in ReplicationSource.run():
{code}
try {
  if (readAllEntriesToReplicateOrNextFile(currentWALisBeingWrittenTo)) {
    continue;
  }
} catch (IOException ioe) {
  ...
  if (this.replicationQueueInfo.isQueueRecovered()) {
    ...
    considerDumping = true;
    ...
  } else if (currentNbEntries != 0) {
    ...
    considerDumping = true;
    currentNbEntries = 0;
  }
  ...
} finally {
{code}
So when we find a corrupt log file we won't replicate any of it ({{currentNbEntries = 0}}), unless the queue was recovered, in which case we *do* want to replicate the partial set of edits we managed to read?

WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM Key: HBASE-9865 URL: https://issues.apache.org/jira/browse/HBASE-9865 Project: HBase Issue Type: Bug Affects Versions: 0.94.5, 0.95.0 Reporter: churro morales Assignee: Lars Hofhansl Attachments: 9865-0.94-v2.txt, 9865-sample-1.txt, 9865-sample.txt, 9865-trunk-v2.txt, 9865-trunk.txt
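The reading of the ReplicationSource.run() failure handling asked about in this comment can be restated as a small decision function. This is only a sketch of the two branches visible in the quoted snippet; the elided `...` sections may do more, and the names `PartialReadPolicy`, `Outcome`, and `onReadFailure` are hypothetical and do not exist in HBase.

```java
/** Hypothetical model of the two visible catch branches; not HBase code. */
final class PartialReadPolicy {
    /** What the caller would do after an IOException while reading a WAL. */
    static final class Outcome {
        final boolean considerDumping;
        final int entriesToShip;

        Outcome(boolean considerDumping, int entriesToShip) {
            this.considerDumping = considerDumping;
            this.entriesToShip = entriesToShip;
        }
    }

    /**
     * @param queueRecovered   whether the replication queue was recovered from another server
     * @param currentNbEntries number of edits read before the failure
     */
    static Outcome onReadFailure(boolean queueRecovered, int currentNbEntries) {
        if (queueRecovered) {
            // Recovered queue: the partially read edits are kept.
            return new Outcome(true, currentNbEntries);
        } else if (currentNbEntries != 0) {
            // Live queue with a partial read: the counter is reset to zero.
            return new Outcome(true, 0);
        }
        // Neither visible branch applies; the snippet leaves both values unchanged.
        return new Outcome(false, currentNbEntries);
    }
}
```

Under this model the asymmetry in the question is explicit: only the non-recovered branch resets the entry count to zero, so a recovered queue carries its partial batch forward.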
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812187#comment-13812187 ]

Hadoop QA commented on HBASE-9865:
--

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12611785/9865-trunk.txt
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.
{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:red}-1 findbugs{color}. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:red}-1 site{color}. The patch appears to cause mvn site goal to fail.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7714//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7714//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7714//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7714//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7714//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7714//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7714//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7714//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7714//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7714//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7714//console

This message is automatically generated.
WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM Key: HBASE-9865 URL: https://issues.apache.org/jira/browse/HBASE-9865 Project: HBase Issue Type: Bug Affects Versions: 0.94.5, 0.95.0 Reporter: churro morales Assignee: Lars Hofhansl Attachments: 9865-sample-1.txt, 9865-sample.txt, 9865-trunk.txt
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812240#comment-13812240 ] Hadoop QA commented on HBASE-9865:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12611797/9865-trunk-v2.txt against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:red}-1 findbugs{color}. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:red}-1 site{color}. The patch appears to cause mvn site goal to fail.
{color:green}+1 core tests{color}. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7717//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7717//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7717//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7717//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7717//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7717//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7717//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7717//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7717//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7717//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7717//console This message is automatically generated. 
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811426#comment-13811426 ] Dave Latham commented on HBASE-9865:

[~jdcryans] Would love to hear your thoughts if you have a chance to read over this issue.

WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM

Key: HBASE-9865
URL: https://issues.apache.org/jira/browse/HBASE-9865
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.5, 0.95.0
Reporter: churro morales
Attachments: 9865-sample-1.txt, 9865-sample.txt

WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM.

A little background on this issue. We noticed that our source replication regionservers would get into gc storms and sometimes even OOM. We noticed a case where there were around 25k WALEdits to replicate, each one with an ArrayList of KeyValues. The array list had a capacity of around 90k (using 350KB of heap memory) but had around 6 non-null entries.

When ReplicationSource.readAllEntriesToReplicateOrNextFile() gets a WALEdit, it removes all kv's that are scoped other than local. But in doing so we don't account for the capacity of the ArrayList when determining heapSize for a WALEdit. The logic for shipping a batch is whether you have hit a size capacity or a number-of-entries capacity.

Therefore, if we have a WALEdit with 25k entries and suppose all are removed: the size of the arrayList is 0 (we don't even count the collection's heap size currently) but the capacity is ignored. This will yield a heapSize() of 0 bytes while in the best case it would be at least 10 bytes (provided you pass initialCapacity and you have 32 bit JVM).

I have some ideas on how to address this problem and want to know everyone's thoughts:

1. We use a probabilistic counter such as HyperLogLog and create something like:
* class CapacityEstimateArrayList extends ArrayList
** this class overrides all additive methods to update the probabilistic counts
** it includes one additional method called estimateCapacity (we would take estimateCapacity - size() and fill in sizes for all references)
* Then we can do something like this in WALEdit.heapSize:
{code}
public long heapSize() {
  long ret = ClassSize.ARRAYLIST;
  for (KeyValue kv : kvs) {
    ret += kv.heapSize();
  }
  long nullEntriesEstimate = kvs.getCapacityEstimate() - kvs.size();
  ret += ClassSize.align(nullEntriesEstimate * ClassSize.REFERENCE);
  if (scopes != null) {
    ret += ClassSize.TREEMAP;
    ret += ClassSize.align(scopes.size() * ClassSize.MAP_ENTRY);
    // TODO this isn't quite right, need help here
  }
  return ret;
}
{code}

2. In ReplicationSource.removeNonReplicableEdits() we know the size of the array originally, and we provide some percentage threshold. When that threshold is met (50% of the entries have been removed) we can call kvs.trimToSize().

3. In the heapSize() method for WALEdit we could use reflection (please don't shoot me for this) to grab the actual capacity of the list. Doing something like this:
{code}
public int getArrayListCapacity() {
  try {
    Field f = ArrayList.class.getDeclaredField("elementData");
    f.setAccessible(true);
    return ((Object[]) f.get(kvs)).length;
  } catch (Exception e) {
    log.warn("Exception in trying to get capacity on ArrayList", e);
    return kvs.size();
  }
}
{code}

I am partial to (1), using HyperLogLog and creating a CapacityEstimateArrayList; this is reusable throughout the code for other classes that implement HeapSize and contain ArrayLists. The memory footprint is very small and it is very fast. The issue is that this is an estimate; although we can configure the precision, we will most likely always be conservative. The estimateCapacity will always be less than the actualCapacity, but it will be close.

I think that putting the logic in removeNonReplicableEdits will work, but this only solves the heapSize problem in this particular scenario. Solution 3 is slow and horrible, but it gives us the exact answer.

I would love to hear if anyone else has any other ideas on how to remedy this problem. I have code for trunk and 0.94 for all 3 ideas and can provide a patch if the community thinks any of these approaches is a viable one.

-- This message was sent by Atlassian JIRA (v6.1#6144)
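Idea 2 from the description (trim once a threshold of entries has been removed) can be sketched in isolation as follows. This is a minimal illustration, not the attached patch: the class name TrimAfterFilter and the method removeAndMaybeTrim are hypothetical, the filter is passed in as a plain predicate rather than the real scope check, and removeIf requires Java 8, which is newer than the JDKs HBase 0.94 targeted.

```java
import java.util.ArrayList;
import java.util.function.Predicate;

/** Hypothetical helper illustrating idea 2; not HBase code. */
final class TrimAfterFilter {
    private TrimAfterFilter() {}

    /**
     * Removes every element matching {@code drop}. If at least
     * {@code threshold} (a fraction between 0.0 and 1.0) of the original
     * entries were removed, the unused slots are released via trimToSize().
     *
     * @return true if the list was trimmed
     */
    static <T> boolean removeAndMaybeTrim(ArrayList<T> list, Predicate<T> drop, double threshold) {
        int before = list.size();
        list.removeIf(drop);
        int removed = before - list.size();
        boolean trim = before > 0 && removed >= threshold * before;
        if (trim) {
            // Shrinks the backing array so its length equals size().
            list.trimToSize();
        }
        return trim;
    }
}
```

With a threshold of 0.5, a list that loses 6 of its 10 entries is trimmed, while a list that loses 1 of 4 is left alone. As the description notes, this only helps on the code path that does the filtering.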
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811524#comment-13811524 ] churro morales commented on HBASE-9865: --- Hi lars, Thanks for the patch, I will review and apply it on a few nodes in our cluster to monitor how garbage collection is affected in a high volume environment. I will let you know the results.
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811644#comment-13811644 ] Lars Hofhansl commented on HBASE-9865:

Cool. Thanks Churro. The patch addresses issue #2. I'm less worried about #1 because, unlike #2, it is not permanent. Agree that J-D's input would be good.
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811667#comment-13811667 ] Jean-Daniel Cryans commented on HBASE-9865: --- Story time guys! Joking, it's not that bad. When writing replication's first version there was a concern that creating all those new objects would be costly, but I don't think that we've ever measured it. Now it looks like that reusing has its own issues. I'm in favor of the latest approach since it does make the code cleaner, KISS. I would be even +1ier if we get [~churromorales]'s feedback for his real use case.
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13810543#comment-13810543 ] churro morales commented on HBASE-9865:

Hi Lars. I believe the batching logic in shipping replication logs is incorrect (if we delete a lot of kv's from the HLogEntry and the number of key values for each entry is sufficiently large). In this case, the logic which determines batchSize will underestimate the actual heap size required. We will keep everything in the heap until we ship the edits; after that we can overwrite the values in the array for new HLogEntries and then garbage collection will take care of them. I think the problem is that when we remove quite a few kvs from the HLogEntry (ReplicationSource.removeNonReplicableEdits()), we have to account for the memory footprint of the ArrayList of kv's in the heap.
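The size of the underestimate raised in this message can be checked with simple arithmetic. The sketch below assumes 4-byte object references (for example a 32-bit JVM or compressed oops) and ignores the array header; both are assumptions, and the class and method names are hypothetical. Under those assumptions a backing array with about 90k slots holds 90,000 x 4 = 360,000 bytes, which is consistent with the roughly 350KB reported in the issue.

```java
/** Hypothetical arithmetic helper; not HBase code. */
final class BackingArrayCost {
    private BackingArrayCost() {}

    /** Bytes occupied by the reference slots of a backing array (header ignored). */
    static long slotBytes(long capacity, int referenceBytes) {
        return capacity * referenceBytes;
    }

    /**
     * Bytes held by empty slots, i.e. the part a size-based heapSize()
     * never sees when it only sums the remaining elements.
     */
    static long unaccountedBytes(long capacity, long size, int referenceBytes) {
        return (capacity - size) * referenceBytes;
    }
}
```

For the reported case (capacity about 90,000 with 6 live entries) almost the whole array is unaccounted for, so a batch limit based on heapSize() would not stop such edits from accumulating.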
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13810611#comment-13810611 ] Lars Hofhansl commented on HBASE-9865:

Hi Churro,

But that is not useful heap, right? It happens to be an implementation detail of ArrayList. (If we trimToSize()'d each ArrayList after clearing it, we'd be fine.)

The GC will unfortunately *not* take care of it... The elements array in ArrayList will never shrink unless trimToSize() is called. Reusing or clearing the ArrayList will only reset the size member, but not shrink the elements array. That seems to be the core of the problem.

As large edits flow through the replication source, eventually all reused WALEdits will have been used for a large number of KVs and hence have had their ArrayLists' capacity grown accordingly; that memory footprint will never be reduced.

Rather than accounting for useless heap, I'd rather not have that heap used in the first place.
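The point made in this message, that clear() keeps the grown backing array while only trimToSize() or a fresh list releases it, suggests bounding what a reused container may retain. The following is a minimal sketch under that reading and is not the patch attached to the issue: ReusableBuffer, maxRetained and highWater are hypothetical names, and the buffer tracks the largest size it has seen because ArrayList does not expose its capacity.

```java
import java.util.ArrayList;

/** Hypothetical reusable container that refuses to keep an oversized backing array. */
final class ReusableBuffer<T> {
    private final int maxRetained;
    private ArrayList<T> items = new ArrayList<>();
    // Largest size seen since the last reset; the capacity is at least this.
    private int highWater;

    ReusableBuffer(int maxRetained) {
        this.maxRetained = maxRetained;
    }

    void add(T item) {
        items.add(item);
        highWater = Math.max(highWater, items.size());
    }

    int size() {
        return items.size();
    }

    /** Empties the buffer; returns true if the backing array was released. */
    boolean reset() {
        boolean release = highWater > maxRetained;
        if (release) {
            // Drop the oversized array instead of calling clear(), which would keep it.
            items = new ArrayList<>();
        } else {
            // clear() keeps the capacity, which is acceptable while it is small.
            items.clear();
        }
        highWater = 0;
        return release;
    }
}
```

A buffer created with maxRetained = 4 releases its array after holding 10 items and keeps it after holding 1. Not reusing the container at all, and allocating a fresh one per edit, is the simpler alternative to this bookkeeping.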
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13810727#comment-13810727 ] churro morales commented on HBASE-9865: ---
Lars, You are totally correct, my apologies. Because the WALEdits are reused, the ArrayList of kv's can only grow. In our scenario, we ran a copyTable job to the same cluster (not replicating the new table). Thus we had batches that were quite large, with most of the kv's being removed from the list as they were not to be replicated. Quite a few regionservers are OOM'ing well after the job completed; those that removed the kv's from their replication logs never had the capacity reset. I agree with you that rather than dealing with the useless heap, let's not have the ArrayList use the heap at all.
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13810774#comment-13810774 ] Dave Latham commented on HBASE-9865:
There are actually two issues at play here that can each cause heap problems:
1. As Churro notes, each time we read a WALEdit we then discard KVs that don't have a proper replication scope set. We then try to measure the heap used by that WALEdit and only measure the remaining KVs without accounting for the extra ArrayList capacity (which is not currently trimmed). For this reason we may use far more than replication.source.size.capacity memory while replicating a single batch, even if the WALEdits are all new. So the computed heap usage during a batch is underestimated.
2. As Lars notes, after using the WALEdit entriesArray for one batch we keep them around to be reused. Each one contains an ArrayList with capacity at least as large as the largest single WALEdit that occurred in any batch. So HLogs that have high variance in the size of the batches written will cause each WALEdit to grow large over time. (We saw a region server go out of memory today using 4 GB of memory for this.) So the heap usage of the replication source as a whole is not checked or compared to replication.source.size.capacity.
For #1 I'd propose kvs.trimToSize() in ReplicationSource.removeNonReplicableEdits if more than X% of the kvs are removed.
For #2 I'd propose throwing away the WALEdit reuse altogether unless there are some real numbers about that having significant benefit. Each time we read it we create all the KeyValue instances and their backing arrays, which should be more than the WALEdits in any case. Trying to keep the arrays mostly reused seems like a recipe for getting them tenured and then making more old gen GC work.
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13810782#comment-13810782 ] Dave Latham commented on HBASE-9865: If we only recycle the kvs arrays during readFields there's still an ugly case where successive batches use fewer and fewer WALEdits so the tails of the previous batches don't get GCed and we can still OOM.
-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811037#comment-13811037 ] Lars Hofhansl commented on HBASE-9865: -- I agree. A simple approach would be to just create the array of HLog.Entries to replicate, and dump it when done. Would be detrimental in a case where most WALEdits only have a few KVs, in that case we'd produce a lot of extra garbage now. But I think that is a good tradeoff.
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809789#comment-13809789 ] Ted Yu commented on HBASE-9865: --- You may have seen this: http://stackoverflow.com/questions/2497063/how-to-get-the-capacity-of-the-arraylist-in-java
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809826#comment-13809826 ] Lars Hofhansl commented on HBASE-9865: --
So if I understand this correctly, the gist of the problem is that we're reusing the WALEdits (see ReplicationHLogReaderManager.readNextAndSetPosition reusing entriesArray), and thus their internal kvs ArrayList can only grow and never shrink. Some of the WALEdits can have a large list of KVs if they were created by a batch operation.
Calculating the correct heapsize would be papering over the problem (I think). We should ensure that at some point we can reduce the capacity of the internal kvs array of the reused WALEdit. The right point seems to be where the WALEdits are reused for reading. We could look at WALEdit.readFields. There we clear the kvs list (which does not reduce its capacity of course).
It's not immediately clear to me what the correct solution is. We do not always want to reset the capacity since that is expensive too and the next time we'll need to recreate the internal array. Upon reuse in WALEdit.readFields, we could check the current size before we call clear; then, if that size is (say) twice the size required for the new edit, call trimToSize() (which, after the clear, will set the capacity to 0). I also think a WALEdit should start with the kvs ArrayList of capacity 1 (rather than the default of 10). Or we could use a probabilistic approach and reset the ArrayList with a probability proportional to the previous size.
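As a rough sketch of the shrink-on-reuse idea Lars describes for WALEdit.readFields: compare the previous size with the size about to be read, and only drop the backing array when the old list was much larger. `ReusableEdit` and `resetFor` are hypothetical stand-ins (the real WALEdit holds KeyValues and reads from a DataInput), and the factor of two is just the "say, twice" from the comment.

```java
import java.util.ArrayList;

/** Hypothetical stand-in for a reused edit; not the real WALEdit. */
final class ReusableEdit {
    // Start small, as suggested in the comment, instead of the default capacity.
    private final ArrayList<String> kvs = new ArrayList<>(1);

    /** Prepares the list to receive {@code needed} entries on reuse. */
    void resetFor(int needed) {
        int previous = kvs.size(); // only a lower bound on the real capacity
        kvs.clear();               // clear() keeps the backing array
        if (previous > 2 * Math.max(needed, 1)) {
            kvs.trimToSize();      // the list is empty, so capacity drops to 0
        }
        kvs.ensureCapacity(needed);
    }

    void add(String kv) {
        kvs.add(kv);
    }

    int size() {
        return kvs.size();
    }
}
```

One caveat with this sketch: because size() is only a lower bound on capacity, a list whose entries were removed after reading (the replication-scope filtering case) can look small here while still holding a large backing array.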
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809908#comment-13809908 ] Lars Hofhansl commented on HBASE-9865: -- Might actually be better to keep track of the maximum size seen as a proxy for the capacity.
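A minimal sketch of the "maximum size seen" idea, assuming a thin wrapper around the list: since an ArrayList's backing array never shrinks unless trimToSize() is called, the largest size observed since the last trim is a lower bound on its capacity and can be fed into a heap estimate. `HighWaterList` and its methods are hypothetical names, and the per-reference byte count is passed in rather than taken from ClassSize.

```java
import java.util.ArrayList;

/** Hypothetical wrapper for illustration only; not part of HBase. */
final class HighWaterList<T> {
    private final ArrayList<T> items = new ArrayList<>();
    private int maxSizeSeen = 0;

    void add(T item) {
        items.add(item);
        maxSizeSeen = Math.max(maxSizeSeen, items.size());
    }

    /** Empties the list; the backing array, and so the high-water mark, is kept. */
    void clear() {
        items.clear();
    }

    /** Releases unused capacity and resets the high-water mark to match. */
    void trim() {
        items.trimToSize();
        maxSizeSeen = items.size();
    }

    int size() {
        return items.size();
    }

    /** Largest size seen since the last trim: a lower bound on the capacity. */
    int capacityLowerBound() {
        return maxSizeSeen;
    }

    /** Estimated bytes held by empty slots, given the size of one reference. */
    long unusedSlotBytes(int bytesPerReference) {
        return (long) (maxSizeSeen - items.size()) * bytesPerReference;
    }
}
```

The estimate stays conservative, since the real array is usually somewhat larger than the high-water mark because of ArrayList's growth policy.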
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
    [ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809935#comment-13809935 ]

Lars Hofhansl commented on HBASE-9865:
--------------------------------------

Yet another trivial option is to replace kvs.clear() with kvs = new ArrayList<KeyValue>(length) in WALEdit.readFields.
Which approach is better depends on the variance of the sizes of the WALEdits.

WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
--------------------------------------------------------------------------------------------------------

Key: HBASE-9865
URL: https://issues.apache.org/jira/browse/HBASE-9865
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.5, 0.95.0
Reporter: churro morales
Attachments: 9865-sample.txt

WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM.

A little background on this issue. We noticed that our source replication regionservers would get into gc storms and sometimes even OOM.
We noticed a case where there were around 25k WALEdits to replicate, each one with an ArrayList of KeyValues. Each list had a capacity of around 90k (using 350KB of heap memory) but only around 6 non-null entries.

When ReplicationSource.readAllEntriesToReplicateOrNextFile() gets a WALEdit, it removes all kv's that are scoped other than local. But in doing so we don't account for the capacity of the ArrayList when determining heapSize for a WALEdit. The logic for shipping a batch is whether you have hit a size capacity or a number-of-entries capacity.

Therefore, if you have a WALEdit with 25k entries and suppose all are removed:
the size of the ArrayList is 0 (we don't even count the collection's heap size currently), and the capacity is ignored.
This will yield a heapSize() of 0 bytes, while in the best case it would be at least 10 bytes (provided you pass initialCapacity and you have a 32 bit JVM).

I have some ideas on how to address this problem and want to know everyone's thoughts:

1. We use a probabilistic counter such as HyperLogLog and create something like:
* class CapacityEstimateArrayList extends ArrayList
** this class overrides all additive methods to update the probabilistic counts
** it includes one additional method called estimateCapacity (we would take estimateCapacity - size() and fill in sizes for all references)
* Then we can do something like this in WALEdit.heapSize:

{code}
public long heapSize() {
  long ret = ClassSize.ARRAYLIST;
  for (KeyValue kv : kvs) {
    ret += kv.heapSize();
  }
  // Count the unused slots in the backing array: one reference each.
  long nullEntriesEstimate = kvs.getCapacityEstimate() - kvs.size();
  ret += ClassSize.align(nullEntriesEstimate * ClassSize.REFERENCE);
  if (scopes != null) {
    ret += ClassSize.TREEMAP;
    ret += ClassSize.align(scopes.size() * ClassSize.MAP_ENTRY);
    // TODO this isn't quite right, need help here
  }
  return ret;
}
{code}

2. In ReplicationSource.removeNonReplicableEdits() we know the size of the array originally, and we provide some percentage threshold. When that threshold is met (for example, 50% of the entries have been removed) we can call kvs.trimToSize().

3. In the heapSize() method for WALEdit we could use reflection (please don't shoot me for this) to grab the actual capacity of the list. Doing something like this:

{code}
public int getArrayListCapacity() {
  try {
    // elementData is ArrayList's private backing array.
    Field f = ArrayList.class.getDeclaredField("elementData");
    f.setAccessible(true);
    return ((Object[]) f.get(kvs)).length;
  } catch (Exception e) {
    log.warn("Exception in trying to get capacity on ArrayList", e);
    // Fall back to the element count if the field cannot be read.
    return kvs.size();
  }
}
{code}

I am partial to (1), using HyperLogLog and creating a CapacityEstimateArrayList. It is reusable throughout the code for other classes that implement HeapSize and contain ArrayLists. The memory footprint is very small and it is very fast. The issue is that this is an estimate: although we can configure the precision, we will most likely always be conservative. The estimateCapacity will always be less than the actual capacity, but it will be close.
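To put rough numbers on the unused-capacity term, here is a minimal sketch of the arithmetic using the figures from the description (capacity of about 90k, about 25k WALEdits). The REFERENCE and ALIGNMENT values below are assumptions for illustration (4-byte references, 8-byte alignment) and are not HBase's actual ClassSize constants, which are computed per JVM. The class and method names are hypothetical. The sketch counts only the reference slots of the backing array and ignores the array header and the KeyValues themselves.

```java
final class CapacityHeapEstimate {
    // Assumed values for illustration only (4-byte references, 8-byte alignment).
    static final long REFERENCE = 4;
    static final long ALIGNMENT = 8;

    /** Rounds a byte count up to the next multiple of ALIGNMENT. */
    static long align(long bytes) {
        return (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    /** Bytes taken by the backing array's reference slots, used or null. */
    static long backingArrayBytes(long capacity) {
        return align(capacity * REFERENCE);
    }

    public static void main(String[] args) {
        long capacity = 90_000;   // approximate capacity per list, from the description
        long edits = 25_000;      // approximate number of WALEdits, from the description

        long perEdit = backingArrayBytes(capacity);
        long total = perEdit * edits;

        // prints "360000 bytes of references per WALEdit"
        System.out.println(perEdit + " bytes of references per WALEdit");
        // prints "8583 MiB across 25000 WALEdits"
        System.out.println(total / (1024 * 1024) + " MiB across " + edits + " WALEdits");
    }
}
```

Under these assumptions, 360,000 bytes per list is consistent with the roughly 350KB quoted above, and none of it is visible to a heapSize() that only sums the remaining KeyValues.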

I think that putting the logic in removeNonReplicableEdits (option 2) will work, but this only solves the heapSize problem in this particular scenario. Solution 3 is slow and horrible, but it gives us the exact answer.

I would love to hear if anyone else has any other ideas on how to remedy this problem. I have code for trunk and 0.94 for all 3 ideas and can provide a patch if the community thinks any of these approaches is a viable one.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
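
For reference, option 2 from the description above could look roughly like the following. This is a sketch under assumptions, not the attached patch: the class TrimOnRemoval and the method removeAndTrim are hypothetical names, a generic ArrayList stands in for the list of KeyValues, and the predicate stands in for the scope check done in removeNonReplicableEdits. It uses only standard java.util calls (removeIf, trimToSize).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.function.Predicate;

final class TrimOnRemoval {
    /**
     * Removes every element matching shouldRemove. If at least thresholdFraction
     * of the original elements were removed, shrinks the backing array with
     * trimToSize(). Returns true if the list was trimmed.
     */
    static <T> boolean removeAndTrim(ArrayList<T> list,
                                     Predicate<? super T> shouldRemove,
                                     double thresholdFraction) {
        int originalSize = list.size();
        list.removeIf(shouldRemove);
        int removed = originalSize - list.size();

        boolean trim = originalSize > 0 && removed >= thresholdFraction * originalSize;
        if (trim) {
            // Releases the unused slots so capacity matches size again.
            list.trimToSize();
        }
        return trim;
    }

    public static void main(String[] args) {
        ArrayList<Integer> kvs = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10));
        // Remove 8 of 10 entries, which is above the 50% threshold.
        boolean trimmed = removeAndTrim(kvs, x -> x > 2, 0.5);
        // prints "trimmed=true, size=2"
        System.out.println("trimmed=" + trimmed + ", size=" + kvs.size());
    }
}
```

The trade-off is that trimToSize() copies the remaining elements into a new array, so a threshold keeps that cost away from edits where only a few entries were filtered out.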