[jira] [Commented] (OAK-3612) The string/template cache of the SegmentWriter should be cleared before cleanup
[ https://issues.apache.org/jira/browse/OAK-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010863#comment-15010863 ] Francesco Mari commented on OAK-3612: - The deadlock is more complicated than expected. It boils down to a call to {{SegmentWriter#flush}} performed while holding the implicit lock for the {{SegmentWriter}}. This deadlock is not easy to solve without going through a refactoring of the {{SegmentWriter}}, which should be the scope of OAK-1828. I run my tests using the less invasive solution proposed by [~alex.parvulescu], obtaining good results. The standby node is now able to delete the pre-compacted state when the cleanup operation is triggered through the JMX bean. {noformat} 18.11.2015 12:59:26.520 *INFO* [RMI TCP Connection(16)-10.132.4.207] org.apache.jackrabbit.oak.plugins.segment.file.FileStore TarMK GC #0: cleanup started. Current repository size is 1.2 GB (1249641472 bytes) 18.11.2015 12:59:26.773 *INFO* [RMI TCP Connection(16)-10.132.4.207] org.apache.jackrabbit.oak.plugins.segment.file.TarReader-GC Cleaned segments from data5a.tar: e724aad7-dbf4-441b-a451-b92f0516c156, 35e3c300-00ea-4477-a005-e37db181c664, fd130fc5-0b2c-432b-a9b8-6b6167ff8312, 9ed0e54d-e3ab-4ef3-ad15-136781fb4b6b, c4e93c7f-754d-4f6d-a9ff-51e5bac7acec, 0512412d-ea96-4eea-ac9d-8e401ea35914, c523dcd2-1e31-495e-a15f-0ceaee0e66ec, 4c54a6f4-3c12-41a2-ac25-9d45dff38fa8, edc6a4cf-c358-4fef-a043-90857135e760, beeafeda-3486-4931-a58e-c57e41e5c51c, 7e63ecc3-2ab5-407e-a86b-896ca1f7e895, 1dd480ee-efe0-4757-afec-1e290af3f8c9, 5e245021-67be-4266-af9c-347d5032e79c, ef99a42b-3c37-4ac3-ae50-0de8aa441d29, efbdd640-4c04-4ef7-a21b-4fc3db8ee96b, 0decb0ae-88bf-4897-a102-15f97e015084, 59ad4f38-3f70-44fc-a013-f259ebf1ca36, ab78d2db-426f-42dc-a03f-9f23e24b4c32, 6e4b1472-5af2-4a1e-a2d3-47a19c1bd109, 58f50b51-458e-4931-aca5-3a78f849a0b9, 68212eb1-c960-421c-a7cf-3e91ba9114f5, c52df7b9-93cc-421b-aa83-d70523ca3476 18.11.2015 12:59:26.783 *INFO* [RMI TCP Connection(16)-10.132.4.207] org.apache.jackrabbit.oak.plugins.segment.file.FileStore TarMK GC #0: cleanup marking file for deletion: data5a.tar 18.11.2015 12:59:26.783 *INFO* [RMI TCP Connection(16)-10.132.4.207] org.apache.jackrabbit.oak.plugins.segment.file.FileStore TarMK GC #0: cleanup completed in 263.8 ms (263 ms). Post cleanup size is 1.2 GB (1249584128 bytes) and space reclaimed 57.3 kB (57344 bytes). Compaction map weight/depth is 0 B/0 (0 bytes/0). {noformat} I will perform some more tests, but the fix seems to be effective so far. > The string/template cache of the SegmentWriter should be cleared before > cleanup > --- > > Key: OAK-3612 > URL: https://issues.apache.org/jira/browse/OAK-3612 > Project: Jackrabbit Oak > Issue Type: Bug > Components: segmentmk >Reporter: Francesco Mari > Fix For: 1.4 > > Attachments: OAK-3612-01.patch > > > The string/template cache of the SegmentWriter > (org.apache.jackrabbit.oak.plugins.segment.SegmentWriter#records) is not > cleared before the cleanup phase. This might maintain in-memory references to > segments, thus preventing them to be cleaned up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-3612) The string/template cache of the SegmentWriter should be cleared before cleanup
[ https://issues.apache.org/jira/browse/OAK-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006684#comment-15006684 ] Francesco Mari commented on OAK-3612: - Good catch, thanks for pointing it out. I will try to fix the deadlock, if possible, and run the integration tests again. If this results too complicated, I will opt for the simpler solution you suggested above. > The string/template cache of the SegmentWriter should be cleared before > cleanup > --- > > Key: OAK-3612 > URL: https://issues.apache.org/jira/browse/OAK-3612 > Project: Jackrabbit Oak > Issue Type: Bug > Components: segmentmk >Reporter: Francesco Mari > Fix For: 1.4 > > Attachments: OAK-3612-01.patch > > > The string/template cache of the SegmentWriter > (org.apache.jackrabbit.oak.plugins.segment.SegmentWriter#records) is not > cleared before the cleanup phase. This might maintain in-memory references to > segments, thus preventing them to be cleaned up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-3612) The string/template cache of the SegmentWriter should be cleared before cleanup
[ https://issues.apache.org/jira/browse/OAK-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006660#comment-15006660 ] Alex Parvulescu commented on OAK-3612: -- one problem with the patch: it introduces a deadlock [1]. run _SegmentCompactionIT_ with 10 writers for a bit to verify it's not just my setup, also see the existing warning in the FileStore about possible deadlocks when locking access to the tracker's writer [0]. I don't disagree with what you are saying, but changes like these have impacts that we cannot predict, so I'd rather approach this with a bit more caution. [0] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/segment/file/FileStore.java#L752 [1] {code} Found one Java-level deadlock: = "RandomWriter": waiting to lock monitor 0x7f89ab006518 (object 0x0007000c73c8, a org.apache.jackrabbit.oak.plugins.segment.SegmentWriter), which is held by "RandomWriter" "RandomWriter": waiting for ownable synchronizer 0x0007001311b8, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync), which is held by "TarMK flush thread [target/SegmentCompactionIT4821636020606907980dir], active since Mon Nov 16 14:24:48 CET 2015, previous max duration 356ms" "TarMK flush thread [target/SegmentCompactionIT4821636020606907980dir], active since Mon Nov 16 14:24:48 CET 2015, previous max duration 356ms": waiting to lock monitor 0x7f89ab006518 (object 0x0007000c73c8, a org.apache.jackrabbit.oak.plugins.segment.SegmentWriter), which is held by "RandomWriter" Java stack information for the threads listed above: === "RandomWriter": at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeTemplate(SegmentWriter.java:1157) - waiting to lock <0x0007000c73c8> (a org.apache.jackrabbit.oak.plugins.segment.SegmentWriter) at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeNode(SegmentWriter.java:1326) at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter$2.childNodeAdded(SegmentWriter.java:1344) at org.apache.jackrabbit.oak.plugins.memory.ModifiedNodeState.compareAgainstBaseState(ModifiedNodeState.java:413) at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeNode(SegmentWriter.java:1341) at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter$2.childNodeChanged(SegmentWriter.java:1352) at org.apache.jackrabbit.oak.plugins.memory.ModifiedNodeState.compareAgainstBaseState(ModifiedNodeState.java:417) at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeNode(SegmentWriter.java:1341) at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter$2.childNodeChanged(SegmentWriter.java:1352) at org.apache.jackrabbit.oak.plugins.memory.ModifiedNodeState.compareAgainstBaseState(ModifiedNodeState.java:417) at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeNode(SegmentWriter.java:1341) at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter$2.childNodeChanged(SegmentWriter.java:1352) at org.apache.jackrabbit.oak.plugins.memory.ModifiedNodeState.compareAgainstBaseState(ModifiedNodeState.java:417) at org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeNode(SegmentWriter.java:1341) at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeBuilder.getNodeState(SegmentNodeBuilder.java:100) at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeBuilder.updated(SegmentNodeBuilder.java:85) at org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.updated(MemoryNodeBuilder.java:214) at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeBuilder.updated(SegmentNodeBuilder.java:81) at org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.setChildNode(MemoryNodeBuilder.java:346) at org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.setChildNode(MemoryNodeBuilder.java:337) at org.apache.jackrabbit.oak.plugins.segment.SegmentCompactionIT$RandomWriter.addRandomNode(SegmentCompactionIT.java:550) at org.apache.jackrabbit.oak.plugins.segment.SegmentCompactionIT$RandomWriter.modify(SegmentCompactionIT.java:501) at org.apache.jackrabbit.oak.plugins.segment.SegmentCompactionIT$RandomWriter.access$4(SegmentCompactionIT.java:487) at org.apache.jackrabbit.oak.plugins.segment.SegmentCompactionIT$RandomWriter$1.call(SegmentCompactionIT.java:466) at org.apache.jackrabbit.oak.plugins.segment.SegmentCompactionIT$RandomWriter$1.call(SegmentCompactionIT.java:1) at org.apache.jackrabbit.oak.plugins.segment.SegmentCompactionIT$RandomWriter.run(SegmentCompactionIT.java:446) at org.apache.jackrabbit.oak.plugins.segment.SegmentCompactionIT$RandomWriter.call(Segmen
[jira] [Commented] (OAK-3612) The string/template cache of the SegmentWriter should be cleared before cleanup
[ https://issues.apache.org/jira/browse/OAK-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006499#comment-15006499 ] Francesco Mari commented on OAK-3612: - I like your solution because it prevents me to make a change in oak-core, but on the other hand I think that it would be more correct to place that change where I initially proposed. I think that the {{SegmentTracker}} should be responsible for the management of the {{SegmentWriter}} it owns. {{SegmentTracker#clearCache}} looks like a natural hook to make sure that every piece of cache directly or indirectly maintained by the {{SegmentTracker}} is cleared. Moreover, the reference held by the {{SegmentWriter}} looks like a general problem of the Segment Store, and not a specific problem of the Standby Store. Thus, it would probably make more sense to make the fix in oak-core. > The string/template cache of the SegmentWriter should be cleared before > cleanup > --- > > Key: OAK-3612 > URL: https://issues.apache.org/jira/browse/OAK-3612 > Project: Jackrabbit Oak > Issue Type: Bug > Components: segmentmk >Reporter: Francesco Mari > Fix For: 1.4 > > Attachments: OAK-3612-01.patch > > > The string/template cache of the SegmentWriter > (org.apache.jackrabbit.oak.plugins.segment.SegmentWriter#records) is not > cleared before the cleanup phase. This might maintain in-memory references to > segments, thus preventing them to be cleaned up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-3612) The string/template cache of the SegmentWriter should be cleared before cleanup
[ https://issues.apache.org/jira/browse/OAK-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006379#comment-15006379 ] Alex Parvulescu commented on OAK-3612: -- right. I see what you mean now. I suggest you add the specific cleanup calls to the {{cleanup}} method on the standby [0] this should be easy enough without the need to change anything on the oak-core level. ie: {code} if (delegate instanceof FileStore) { try { delegate.getTracker().getWriter().dropCache(); tracker.clearCache(); ((FileStore) delegate).cleanup(); } catch (IOException e) { {code} [0] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-tarmk-standby/src/main/java/org/apache/jackrabbit/oak/plugins/segment/standby/store/StandbyStore.java#L242 > The string/template cache of the SegmentWriter should be cleared before > cleanup > --- > > Key: OAK-3612 > URL: https://issues.apache.org/jira/browse/OAK-3612 > Project: Jackrabbit Oak > Issue Type: Bug > Components: segmentmk >Reporter: Francesco Mari > Fix For: 1.4 > > Attachments: OAK-3612-01.patch > > > The string/template cache of the SegmentWriter > (org.apache.jackrabbit.oak.plugins.segment.SegmentWriter#records) is not > cleared before the cleanup phase. This might maintain in-memory references to > segments, thus preventing them to be cleaned up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-3612) The string/template cache of the SegmentWriter should be cleared before cleanup
[ https://issues.apache.org/jira/browse/OAK-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002086#comment-15002086 ] Francesco Mari commented on OAK-3612: - [~alex.parvulescu], the record cache in the {{SegmentWriter}} is actually cleaned up [here|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/segment/file/FileStore.java#L1434] on successful compaction. Anyway, I can confirm that in the context of the Cold Standby it is important to have this change in place. I could convince myself by doing the following: 1. Generate garbage on the primary. 2. Take heap dump #1 on the standby. 3. Run compaction on the primary. 4. Take heap dump #2 on the standby. 5. Cleanup the standby using the JMX bean. 6. Take heap dump #3 on the standby. I use the heap dump #1 to retrieve the {{SegmentId}} of the current root state, and search for that same {{SegmentId}} in the heap dump #2. This allows me to find who references the old root state before cleanup is called on the standby. In my tests, I usually have the following references: - A reference from {{CacheLIRS}}. - A weak reference from {{SegmentTracker}}. Being a weak reference, this can be ignored for the purpose of our conversation. - A reference from {{SegmentWriter}}. On the standby, the cleanup phase is called without executing the compaction phase first. The {{FileStore#cleanup}} method takes care of clearing {{CacheLIRS}} (through {{SegmentTracker#clearCache}}). The only missing (non-weak) reference, that nobody takes care of, is the one from {{SegmentWriter}}. If you reproduce the steps above, you will see from the heap dump #3 that the only reference to the old {{SegmentId}} is precisely the one from {{SegmentWriter}}. > The string/template cache of the SegmentWriter should be cleared before > cleanup > --- > > Key: OAK-3612 > URL: https://issues.apache.org/jira/browse/OAK-3612 > Project: Jackrabbit Oak > Issue Type: Bug > Components: segmentmk >Reporter: Francesco Mari > Fix For: 1.4 > > Attachments: OAK-3612-01.patch > > > The string/template cache of the SegmentWriter > (org.apache.jackrabbit.oak.plugins.segment.SegmentWriter#records) is not > cleared before the cleanup phase. This might maintain in-memory references to > segments, thus preventing them to be cleaned up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-3612) The string/template cache of the SegmentWriter should be cleared before cleanup
[ https://issues.apache.org/jira/browse/OAK-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998551#comment-14998551 ] Alex Parvulescu commented on OAK-3612: -- bq. cache of the SegmentWriter (org.apache.jackrabbit.oak.plugins.segment.SegmentWriter#records) is not cleared before the cleanup phase it depends, we already clear the record cache on a successful compaction run [0], so this would only apply for the cases where compaction was skipped so I'm not exactly sure how much more efficient this will make the cleanup phase. any numbers to back this up? [0] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/segment/file/FileStore.java#L1088 > The string/template cache of the SegmentWriter should be cleared before > cleanup > --- > > Key: OAK-3612 > URL: https://issues.apache.org/jira/browse/OAK-3612 > Project: Jackrabbit Oak > Issue Type: Bug > Components: segmentmk >Reporter: Francesco Mari > Attachments: OAK-3612-01.patch > > > The string/template cache of the SegmentWriter > (org.apache.jackrabbit.oak.plugins.segment.SegmentWriter#records) is not > cleared before the cleanup phase. This might maintain in-memory references to > segments, thus preventing them to be cleaned up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)