[jira] [Commented] (OAK-4966) Re-introduce a blocker for compaction based on available heap
[ https://issues.apache.org/jira/browse/OAK-4966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15633238#comment-15633238 ]

Michael Dürig commented on OAK-4966:
------------------------------------

Nice, I like the way {{GCMemoryBarrier.checkMemory()}} is implemented now! One small oversight: {{GCMemoryBarrier.getMemoryPool()}} should check {{MemoryPoolMXBean.isCollectionUsageThresholdSupported()}} instead of {{MemoryPoolMXBean.isUsageThresholdSupported()}}. Also, I think the various accessors of the gc memory threshold should mention that this number is a percentage. And finally, a bit of Javadoc on {{GCMemoryBarrier}} would be nice ;-)


> Re-introduce a blocker for compaction based on available heap
> -------------------------------------------------------------
>
>                 Key: OAK-4966
>                 URL: https://issues.apache.org/jira/browse/OAK-4966
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: segment-tar
>            Reporter: Alex Parvulescu
>            Assignee: Alex Parvulescu
>             Fix For: 1.6, 1.5.13
>
>         Attachments: OAK-4966-jmx-notification-v2.patch, OAK-4966-jmx-notification.patch, OAK-4966.patch
>
>
> As seen in a local test, running compaction on a tight heap can lead to OOMEs. There used to be a best effort barrier against this situation ('not enough heap for compaction'), but we removed it with the compaction maps.
> I think it makes sense to add it again based on the max size of some of the caches: the segment cache, {{256MB}} by default [0], and some writer caches, which can go up to {{2GB}} all combined [1], and probably others I missed.
> [0] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/SegmentCache.java#L48
> [1] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/WriterCacheManager.java#L50

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
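The fix being requested — selecting the pool via {{isCollectionUsageThresholdSupported()}} — can be sketched roughly like this. This is a hypothetical standalone version, not the actual Oak {{GCMemoryBarrier.getMemoryPool()}}; the class name and the largest-pool heuristic are illustrative assumptions:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

public class PoolSelection {

    // Pick the heap pool on which a *collection* usage threshold can be set.
    // Checking isCollectionUsageThresholdSupported() (not
    // isUsageThresholdSupported()) matters because the two thresholds are
    // independent features of a pool.
    static MemoryPoolMXBean getMemoryPool() {
        long maxSize = 0;
        MemoryPoolMXBean best = null;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP
                    && pool.isCollectionUsageThresholdSupported()) {
                // prefer the largest matching pool (usually the old generation)
                long size = pool.getUsage().getMax();
                if (size > maxSize) {
                    maxSize = size;
                    best = pool;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        MemoryPoolMXBean pool = getMemoryPool();
        System.out.println(pool == null ? "none" : pool.getName());
    }
}
```

On stock HotSpot collectors at least one heap pool (the old generation) supports collection usage thresholds, so this normally returns a non-null pool.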
[jira] [Commented] (OAK-4966) Re-introduce a blocker for compaction based on available heap
[ https://issues.apache.org/jira/browse/OAK-4966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15632866#comment-15632866 ]

Alex Parvulescu commented on OAK-4966:
--------------------------------------

re. {{MemoryNotificationInfo.MEMORY_COLLECTION_THRESHOLD_EXCEEDED}}: very interesting point. I've switched over to this notification, which also removes our direct dependency on FELIX-5394.

re. {{GCMemoryBarrier.checkMemory()}}: yes, unfortunately. There might be any number of listeners, and any of them could change the threshold notification. While there's no guarantee that someone would not actually disable it, at the very least we can guard against an overflow of uninteresting notifications. I've dug around a bit and found something that looks better re. the memory values (it seems there's data attached to the notification itself that we can use).

In the current form a listener is expected to be created for each compaction run and not reused. If it does raise the flag, compaction will be canceled, so I'm not really expecting this to be a dynamic barrier (the flag should be reset on the next run only; even though it could theoretically be reset from the {{checkMemory()}} method itself, the probability of this happening is pretty small).
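The "data attached to the notification itself" is the {{MemoryNotificationInfo}} carried in the notification's user data. A hypothetical listener (not the actual patch code; the 2% headroom policy is an invented example) could read the observed usage from there instead of calling back into the MXBean:

```java
import java.lang.management.MemoryNotificationInfo;
import javax.management.Notification;
import javax.management.NotificationListener;
import javax.management.openmbean.CompositeData;

public class UsageFromNotification implements NotificationListener {

    private volatile boolean sufficientMemory = true;

    @Override
    public void handleNotification(Notification notification, Object handback) {
        if (MemoryNotificationInfo.MEMORY_COLLECTION_THRESHOLD_EXCEEDED
                .equals(notification.getType())) {
            // The usage snapshot travels with the notification itself, so no
            // potentially slow call back into the MXBean is needed here.
            MemoryNotificationInfo info = MemoryNotificationInfo
                    .from((CompositeData) notification.getUserData());
            long used = info.getUsage().getUsed();
            long max = info.getUsage().getMax();
            // illustrative policy: flag when less than 2% of the pool is left
            if (max > 0 && max - used < max * 2 / 100) {
                sufficientMemory = false;
            }
        }
    }

    public boolean isSufficientMemory() {
        return sufficientMemory;
    }
}
```

The type check up front also doubles as the guard against "uninteresting" notifications mentioned above: anything other than a collection-threshold event is ignored cheaply.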
[jira] [Commented] (OAK-4966) Re-introduce a blocker for compaction based on available heap
[ https://issues.apache.org/jira/browse/OAK-4966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15632678#comment-15632678 ]

Michael Dürig commented on OAK-4966:
------------------------------------

* Regarding {{MemoryListener}}: shouldn't it rather react to {{MemoryNotificationInfo.MEMORY_COLLECTION_THRESHOLD_EXCEEDED}} instead of {{MemoryNotificationInfo.MEMORY_THRESHOLD_EXCEEDED}}? IIUC the latter will receive notifications even though gc might be able to release most of the memory. This might lead to false positives.
* Re. {{GCMemoryBarrier.checkMemory()}}: is this needed in this form? Isn't it sufficient to just look at {{MemoryPoolMXBean.getCollectionUsageThresholdCount()}}? It might even be unsafe to call {{checkMemory()}} from the listener, as the listener is expected to be fast but {{MemoryPoolMXBean.getUsage()}} might not be.
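The counter-based alternative suggested here — do no work in the listener and just consult the pool's crossing counter when the barrier is queried — might look like the following sketch (illustrative, not the actual {{GCMemoryBarrier}}; the baseline-count idea is an assumption about how per-run semantics could be kept):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

public class ThresholdCountBarrier {

    private final MemoryPoolMXBean pool;
    private final long initialCount;

    ThresholdCountBarrier(MemoryPoolMXBean pool) {
        this.pool = pool;
        // Remember the count at the start of the compaction run so that
        // crossings from earlier runs are not counted against this one.
        this.initialCount = pool.getCollectionUsageThresholdCount();
    }

    /** True if the collection usage threshold was crossed during this run. */
    boolean thresholdExceeded() {
        // Reading a counter maintained by the JVM is cheap, unlike
        // computing a fresh MemoryUsage snapshot via getUsage().
        return pool.getCollectionUsageThresholdCount() > initialCount;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP
                    && pool.isCollectionUsageThresholdSupported()) {
                ThresholdCountBarrier barrier = new ThresholdCountBarrier(pool);
                System.out.println(pool.getName() + " exceeded: "
                        + barrier.thresholdExceeded());
            }
        }
    }
}
```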
[jira] [Commented] (OAK-4966) Re-introduce a blocker for compaction based on available heap
[ https://issues.apache.org/jira/browse/OAK-4966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629325#comment-15629325 ]

Alex Parvulescu commented on OAK-4966:
--------------------------------------

Sounds good, will do that then. But this also means this fix will have to wait for that bundle's release.
[jira] [Commented] (OAK-4966) Re-introduce a blocker for compaction based on available heap
[ https://issues.apache.org/jira/browse/OAK-4966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629318#comment-15629318 ]

Michael Dürig commented on OAK-4966:
------------------------------------

+1
[jira] [Commented] (OAK-4966) Re-introduce a blocker for compaction based on available heap
[ https://issues.apache.org/jira/browse/OAK-4966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629303#comment-15629303 ]

Chetan Mehrotra commented on OAK-4966:
--------------------------------------

bq. fixing the memoryusage plugin to allow multiple listeners.

We can open an issue in Felix and get this fixed, so I would lean towards this approach.
[jira] [Commented] (OAK-4966) Re-introduce a blocker for compaction based on available heap
[ https://issues.apache.org/jira/browse/OAK-4966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629253#comment-15629253 ]

Alex Parvulescu commented on OAK-4966:
--------------------------------------

In an interesting turn of events, it seems setting the global usage threshold on the memory pool affects _all_ JMX clients of the pool. While testing out a patch, I accidentally triggered a heap dump from the {{org.apache.felix.webconsole.plugins.memoryusage}} plugin [0]:

bq. *WARN* [Service Thread] org.apache.felix.webconsole.plugins.memoryusage Received Memory Threshold Exceeded Notification, dumping Heap

It seems there's some overlap in features here, with both components competing for the same threshold setting. It could be argued that the {{memoryusage}} plugin should in fact check whether the notification falls under its own locally defined threshold, instead of blindly dumping the heap on each event received. This would allow registering another listener while keeping that one disabled (or coming up with a trick to allow multiple listeners jointly registered on the same pool, using the smallest threshold of all as a common ground). Our single purpose is to cancel compaction; I'm not sure dumping the entire heap is what we want to do here.

I see 2 options:
* reverting to a polling approach?
* fixing the {{memoryusage}} plugin to allow multiple listeners.

[~mduerig], [~chetanm] thoughts?

[0] https://github.com/apache/felix/blob/trunk/webconsole-plugins/memoryusage/src/main/java/org/apache/felix/webconsole/plugins/memoryusage/internal/MemoryUsageSupport.java#L553
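The clash described above is easy to reproduce: the usage threshold is a single pool-wide setting, so two components that both call {{setUsageThreshold()}} simply overwrite each other. A minimal demonstration, assuming a heap pool that supports usage thresholds and has at least 20MB of maximum capacity:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

public class SharedThresholdDemo {

    public static void main(String[] args) {
        MemoryPoolMXBean pool = ManagementFactory.getMemoryPoolMXBeans().stream()
                .filter(p -> p.getType() == MemoryType.HEAP
                        && p.isUsageThresholdSupported())
                .findFirst()
                .orElseThrow(IllegalStateException::new);

        // the first client (e.g. the memoryusage plugin) sets its threshold...
        pool.setUsageThreshold(10 * 1024 * 1024);
        // ...then a second client overwrites it: there is only one slot,
        // and every registered listener sees crossings of whichever value won
        pool.setUsageThreshold(20 * 1024 * 1024);

        System.out.println(pool.getUsageThreshold());
    }
}
```

This is why a listener that reacts to every threshold notification, without comparing the attached usage against its own locally defined limit, ends up firing on thresholds set by unrelated code.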
[jira] [Commented] (OAK-4966) Re-introduce a blocker for compaction based on available heap
[ https://issues.apache.org/jira/browse/OAK-4966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15610939#comment-15610939 ]

Michael Dürig commented on OAK-4966:
------------------------------------

Interesting. According to https://docs.oracle.com/javase/7/docs/api/java/lang/management/MemoryPoolMXBean.html, we could use {{MEMORY_COLLECTION_THRESHOLD_EXCEEDED}} to get notified when memory usage is still above a certain threshold after a JVM gc.
[jira] [Commented] (OAK-4966) Re-introduce a blocker for compaction based on available heap
[ https://issues.apache.org/jira/browse/OAK-4966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15610790#comment-15610790 ]

Chetan Mehrotra commented on OAK-4966:
--------------------------------------

bq. I'll provide a simpler version that only relies on polling available heap and comparing with a configurable threshold (10%?) stopping compaction if needed.

[~alex.parvulescu] Also have a look at the notification support in {{MemoryMXBean}}; see [1] for details on how it can be used. Instead of polling, you can register a listener which gets notified if the memory usage level crosses a certain threshold.

[1] http://www.javaspecialists.eu/archive/Issue092.html
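The registration pattern being pointed to relies on the fact that the platform {{MemoryMXBean}} is also a {{NotificationEmitter}}. A hedged sketch (the {{register}} helper and its callback are invented for illustration, not part of Oak):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryNotificationInfo;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;

public class MemoryListenerRegistration {

    // Attach a listener to the platform MemoryMXBean instead of polling.
    // The cast is safe: the platform bean implements NotificationEmitter.
    static NotificationListener register(Runnable onLowMemory) {
        NotificationListener listener = (notification, handback) -> {
            if (MemoryNotificationInfo.MEMORY_THRESHOLD_EXCEEDED
                    .equals(notification.getType())) {
                onLowMemory.run();
            }
        };
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        ((NotificationEmitter) memory).addNotificationListener(listener, null, null);
        return listener;
    }

    public static void main(String[] args) {
        register(() -> System.out.println("memory threshold exceeded"));
    }
}
```

Notifications only arrive once a usage threshold has actually been set on a pool that supports it; registering the listener alone is not enough.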
[jira] [Commented] (OAK-4966) Re-introduce a blocker for compaction based on available heap
[ https://issues.apache.org/jira/browse/OAK-4966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608741#comment-15608741 ]

Alex Parvulescu commented on OAK-4966:
--------------------------------------

bq. I agree on the upfront vs. continuous checking of available heap. Let's start simple and improve when required. Let's keep an eye on this part during our testing.

That's not totally correct though. If we only check upfront, the node deduplication cache will be empty before the first compaction, so on an instance that has {{1GB}} or {{2GB}} of available heap the caches will spike to more than {{2GB}} during the run, crashing the instance. So if we go with a percentage of available heap, it needs to be a continuous check. My proposal was to provide an estimate (even if higher than the real value) to stop early if there's not enough heap, based on the max size of all the existing caches.

I'll provide a simpler version that only relies on polling available heap, comparing it with a configurable threshold (10%?), and stopping compaction if needed.
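The "simpler version" described — poll available heap and compare against a configurable percentage — could be sketched as follows. The class name and 10% default are illustrative, taken only from the suggestion in this comment:

```java
public class HeapPolling {

    // Minimum fraction of heap that must remain free; the thread suggests
    // 10% as a starting point.
    private final int minFreePercent;

    HeapPolling(int minFreePercent) {
        this.minFreePercent = minFreePercent;
    }

    /** True when the fraction of free heap drops below the configured threshold. */
    boolean shouldCancelCompaction() {
        Runtime rt = Runtime.getRuntime();
        // free heap = currently unused portion of the committed heap plus the
        // headroom the JVM could still allocate up to -Xmx
        long free = rt.freeMemory() + (rt.maxMemory() - rt.totalMemory());
        return free * 100 < rt.maxMemory() * minFreePercent;
    }

    public static void main(String[] args) {
        System.out.println(new HeapPolling(10).shouldCancelCompaction());
    }
}
```

Note that this only sees what the JVM reports between collections; it cannot distinguish garbage that the next gc would reclaim, which is exactly the false-positive concern raised later in the thread about non-collection thresholds.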
[jira] [Commented] (OAK-4966) Re-introduce a blocker for compaction based on available heap
[ https://issues.apache.org/jira/browse/OAK-4966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608700#comment-15608700 ]

Alex Parvulescu commented on OAK-4966:
--------------------------------------

bq. My general concern here are all those estimates for memory consumption of nodes, templates and strings. They may hold for one repository but might be completely off for another one.

This only applies to the strings cache. The templates cache and _especially_ the node cache have a very simple structure based on record ids, and computing a max weight from the configured size will not change from one repo to another.

bq. Slight changes in the code might greatly affect how much memory e.g. a node needs.

Agreed, but I don't see the relation to the node deduplication cache. As long as record ids don't change, this won't change, and full nodes don't come into play.

bq. Also checking memory up front doesn't account for the case where a memory intensive operation is started on an instance where gc is already running.

Yes, this was already stated on the provided patch. The previous impl didn't do this either, and as far as I know it did ok. If this has become a goal, I agree we can also add a polling component that can stop compaction if things get too tight in terms of resources.

bq. What about a simpler (I think so) approach where we check the available memory say once a second during compaction.

The current heap-needed calculation hints at a {{3GB}} heap with default values. If the instance doesn't have that, it doesn't even make sense to start compaction. Beyond using the cache weight to gate compaction, I think there is some value in estimating how big a cache can get based on existing data, and in improving on that without getting too crazy about perfection. Simply stating a _number of items_ in a cache has no relation to how much heap it can consume, and if we're not able to estimate this, then how would anyone else be able to? So I would actually keep the cache weight changes, even if for display value only (as JMX).
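The structural argument — that a record-id-based cache's worst-case weight follows from its configured capacity, not from repository content — can be illustrated with a toy calculation. All figures here are hypothetical placeholders, not Oak's actual entry sizes:

```java
public class NodeCacheWeight {

    // Hypothetical figure, for illustration only: a node-deduplication cache
    // entry is essentially a pair of record ids plus map-entry overhead, so
    // its weight can be derived from structure rather than measured per repo.
    static final long BYTES_PER_ENTRY = 2 * 20 + 32; // two record ids + overhead

    /** Estimated worst-case heap weight of a cache with the given capacity. */
    static long maxWeight(long capacityEntries) {
        return capacityEntries * BYTES_PER_ENTRY;
    }

    public static void main(String[] args) {
        // e.g. a cache capped at one million entries
        System.out.println(maxWeight(1_000_000));
    }
}
```

Under these assumptions a million-entry cap bounds the cache at roughly 72MB regardless of what the repository contains, which is the kind of number worth exposing over JMX even if it is not used to gate compaction.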
[jira] [Commented] (OAK-4966) Re-introduce a blocker for compaction based on available heap
[ https://issues.apache.org/jira/browse/OAK-4966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608650#comment-15608650 ]

Michael Dürig commented on OAK-4966:
------------------------------------

My general concern here are all those estimates for memory consumption of nodes, templates and strings. They may hold for one repository but might be completely off for another one. Furthermore, they are hard to maintain going forward: slight changes in the code might greatly affect how much memory e.g. a node needs. Also, checking memory up front doesn't account for the case where a memory intensive operation is started on an instance where gc is already running.

What about a simpler (I think so) approach where we check the available memory, say, once a second during compaction? Once memory stays below a certain threshold for, say, 10 subsequent checks, we just cancel gc. This should be quite simple to implement in {{FileStore.GarbageCollector.CancelCompactionSupplier}}, similar to the way we check available disk space.
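The debounced cancellation proposed above — cancel only after N consecutive low-memory observations — can be sketched independently of the Oak classes it would live in. The class name, the pluggable {{lowMemory}} probe, and the constants are all illustrative assumptions:

```java
import java.util.function.BooleanSupplier;

public class MemoryCancelSupplier implements BooleanSupplier {

    // From the comment: check roughly once a second during compaction and
    // cancel after 10 consecutive low-memory observations.
    private final int requiredConsecutive;
    private final BooleanSupplier lowMemory;
    private int consecutive;

    MemoryCancelSupplier(int requiredConsecutive, BooleanSupplier lowMemory) {
        this.requiredConsecutive = requiredConsecutive;
        this.lowMemory = lowMemory;
    }

    /** Meant to be consulted periodically, e.g. once a second, during compaction. */
    @Override
    public boolean getAsBoolean() {
        if (lowMemory.getAsBoolean()) {
            consecutive++;
        } else {
            consecutive = 0; // a single healthy reading resets the streak
        }
        return consecutive >= requiredConsecutive;
    }

    public static void main(String[] args) {
        MemoryCancelSupplier cancel =
                new MemoryCancelSupplier(10, () -> true); // always low, for demo
        boolean cancelled = false;
        for (int i = 0; i < 10; i++) {
            cancelled = cancel.getAsBoolean();
        }
        System.out.println(cancelled);
    }
}
```

Requiring a streak rather than a single reading makes the check robust against transient spikes right before a collection, which is the weakness of one-shot polling.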