[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814278#comment-13814278 ]
churro morales commented on HBASE-9865:
---------------------------------------

One thing I noticed in WALEdit: we should be accounting for the ArrayList object itself as well. Instead of:

{code}
public long heapSize() {
  long ret = 0;
{code}

this would be correct, although it doesn't matter very much:

{code}
public long heapSize() {
  long ret = ClassSize.ARRAYLIST;
{code}

If you didn't want to bleed the ArrayList implementation that WALEdit uses, maybe something like this might work. For WALEdit:

{code}
public void removeIf(Predicate<KeyValue> predicate) {
  // Remember the size before removal so we can tell how much was dropped.
  int originalSize = kvs.size();
  // Iterate backwards so removals don't shift the indices still to be visited.
  for (int i = kvs.size() - 1; i >= 0; i--) {
    KeyValue kv = kvs.get(i);
    if (predicate.apply(kv)) {
      kvs.remove(i);
    }
  }
  // If more than half the entries were removed, shrink the backing array.
  if (kvs.size() < originalSize / 2) {
    kvs.trimToSize();
  }
}
{code}

And ReplicationSource would change to:

{code}
protected void removeNonReplicableEdits(WALEdit edit) {
  final NavigableMap<byte[], Integer> scopes = edit.getScopes();
  edit.removeIf(new Predicate<KeyValue>() {
    @Override
    public boolean apply(KeyValue keyValue) {
      return scopes == null || !scopes.containsKey(keyValue.getFamily());
    }
  });
}
{code}

I don't think it adds much by doing this, but it is an alternative if we don't want to bleed that WALEdit uses an ArrayList.

> WALEdit.heapSize() is incorrect in certain replication scenarios which may
> cause RegionServers to go OOM
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-9865
>                 URL: https://issues.apache.org/jira/browse/HBASE-9865
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.5, 0.95.0
>            Reporter: churro morales
>            Assignee: Lars Hofhansl
>         Attachments: 9865-0.94-v2.txt, 9865-sample-1.txt, 9865-sample.txt, 9865-trunk-v2.txt, 9865-trunk-v3.txt, 9865-trunk.txt
>
>
> WALEdit.heapSize() is incorrect in certain replication scenarios, which may cause RegionServers to go OOM.
> A little background on this issue: we noticed that our source replication regionservers would get into GC storms and sometimes even OOM.
> We noticed a case where there were around 25k WALEdits to replicate, each one with an ArrayList of KeyValues. The ArrayList had a capacity of around 90k (using 350KB of heap memory) but only around 6 non-null entries.
> When ReplicationSource.readAllEntriesToReplicateOrNextFile() gets a WALEdit, it removes all KeyValues that are scoped other than local. But in doing so we don't account for the capacity of the ArrayList when determining heapSize() for a WALEdit. The logic for shipping a batch is whether you have hit a size capacity or a number-of-entries capacity.
> Therefore, if you have a WALEdit with 25k entries and suppose all are removed: the size of the ArrayList is 0 (we don't even count the collection's heap size currently) and the capacity is ignored. This will yield a heapSize() of 0 bytes, while in the best case it would be at least 100000 bytes (provided you pass initialCapacity and you have a 32-bit JVM).
> I have some ideas on how to address this problem and want to know everyone's thoughts:
> 1. We use a probabilistic counter such as HyperLogLog and create something like:
>    * class CapacityEstimateArrayList implements ArrayList
>      ** this class overrides all additive methods to update the probabilistic counts
>      ** it includes one additional method called estimateCapacity (we would take estimateCapacity - size() and fill in sizes for all references)
>    * Then we can do something like this in WALEdit.heapSize():
>
> {code}
> public long heapSize() {
>   long ret = ClassSize.ARRAYLIST;
>   for (KeyValue kv : kvs) {
>     ret += kv.heapSize();
>   }
>   long nullEntriesEstimate = kvs.getCapacityEstimate() - kvs.size();
>   ret += ClassSize.align(nullEntriesEstimate * ClassSize.REFERENCE);
>   if (scopes != null) {
>     ret += ClassSize.TREEMAP;
>     ret += ClassSize.align(scopes.size() * ClassSize.MAP_ENTRY);
>     // TODO this isn't quite right, need help here
>   }
>   return ret;
> }
> {code}
> 2. In ReplicationSource.removeNonReplicableEdits() we know the size of the array originally, and we provide some percentage threshold. When that threshold is met (say, 50% of the entries have been removed) we can call kvs.trimToSize().
> 3. In the heapSize() method for WALEdit we could use reflection (please don't shoot me for this) to grab the actual capacity of the list, doing something like this:
>
> {code}
> public int getArrayListCapacity() {
>   try {
>     Field f = ArrayList.class.getDeclaredField("elementData");
>     f.setAccessible(true);
>     return ((Object[]) f.get(kvs)).length;
>   } catch (Exception e) {
>     log.warn("Exception in trying to get capacity on ArrayList", e);
>     return kvs.size();
>   }
> }
> {code}
> I am partial to (1), using HyperLogLog and creating a CapacityEstimateArrayList; this is reusable throughout the code for other classes that implement HeapSize and contain ArrayLists. The memory footprint is very small and it is very fast. The issue is that this is an estimate, although we can configure the precision; we will most likely always be conservative. The estimateCapacity will always be less than the actual capacity, but it will be close. I think that putting the logic in removeNonReplicableEdits() will work, but that only solves the heapSize problem in this particular scenario. Solution 3 is slow and horrible, but it gives us the exact answer.
> I would love to hear if anyone else has other ideas on how to remedy this problem. I have code for trunk and 0.94 for all 3 ideas and can provide a patch if the community thinks any of these approaches is viable.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
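To make the under-counting in the description concrete, here is a minimal standalone sketch (not HBase code: the class name, the `REFERENCE` and `ARRAYLIST` constants, and the per-KeyValue heap cost are illustrative assumptions, and `ClassSize.align()` rounding is ignored) contrasting a heapSize() that counts only live entries with one that also charges for the backing array's unused slots:

```java
public class HeapSizeDemo {
    // Assumed sizes: one object reference on a 64-bit JVM without
    // compressed oops, and a rough ArrayList shell overhead.
    static final long REFERENCE = 8;
    static final long ARRAYLIST = 48;

    // heapSize as currently computed: only the live KeyValues are counted,
    // so a huge-capacity list with few entries looks nearly free.
    static long currentHeapSize(int liveEntries, long avgKvHeap) {
        return liveEntries * avgKvHeap;
    }

    // heapSize that also charges one reference per unused (null) slot
    // in the backing array, as sketched in option (1) above.
    static long capacityAwareHeapSize(int liveEntries, long avgKvHeap, int capacity) {
        long ret = ARRAYLIST + liveEntries * avgKvHeap;
        long nullEntries = capacity - liveEntries;
        ret += nullEntries * REFERENCE;
        return ret;
    }

    public static void main(String[] args) {
        // The scenario from the report: capacity ~90k, only 6 live entries,
        // assuming ~40 bytes per KeyValue for illustration.
        System.out.println(currentHeapSize(6, 40));                    // 240
        System.out.println(capacityAwareHeapSize(6, 40, 90_000));      // 720240
    }
}
```

The gap between the two numbers (a few hundred bytes versus roughly 700KB per edit) is what lets tens of thousands of mostly-empty WALEdits accumulate in a batch without ever tripping the size threshold.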