[jira] [Commented] (OAK-3612) The string/template cache of the SegmentWriter should be cleared before cleanup

2015-11-18 Thread Francesco Mari (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010863#comment-15010863
 ] 

Francesco Mari commented on OAK-3612:
-

The deadlock is more complicated than expected. It boils down to a call to 
{{SegmentWriter#flush}} performed while holding the implicit lock for the 
{{SegmentWriter}}. This deadlock is not easy to solve without going through a 
refactoring of the {{SegmentWriter}}, which should be the scope of OAK-1828.

I run my tests using the less invasive solution proposed by [~alex.parvulescu], 
obtaining good results. The standby node is now able to delete the 
pre-compacted state when the cleanup operation is triggered through the JMX 
bean.

{noformat}
18.11.2015 12:59:26.520 *INFO* [RMI TCP Connection(16)-10.132.4.207] 
org.apache.jackrabbit.oak.plugins.segment.file.FileStore TarMK GC #0: cleanup 
started. Current repository size is 1.2 GB (1249641472 bytes)
18.11.2015 12:59:26.773 *INFO* [RMI TCP Connection(16)-10.132.4.207] 
org.apache.jackrabbit.oak.plugins.segment.file.TarReader-GC Cleaned segments 
from data5a.tar: 
e724aad7-dbf4-441b-a451-b92f0516c156, 
35e3c300-00ea-4477-a005-e37db181c664, fd130fc5-0b2c-432b-a9b8-6b6167ff8312, 
9ed0e54d-e3ab-4ef3-ad15-136781fb4b6b, 
c4e93c7f-754d-4f6d-a9ff-51e5bac7acec, 
0512412d-ea96-4eea-ac9d-8e401ea35914, c523dcd2-1e31-495e-a15f-0ceaee0e66ec, 
4c54a6f4-3c12-41a2-ac25-9d45dff38fa8, 
edc6a4cf-c358-4fef-a043-90857135e760, 
beeafeda-3486-4931-a58e-c57e41e5c51c, 7e63ecc3-2ab5-407e-a86b-896ca1f7e895, 
1dd480ee-efe0-4757-afec-1e290af3f8c9, 
5e245021-67be-4266-af9c-347d5032e79c, 
ef99a42b-3c37-4ac3-ae50-0de8aa441d29, efbdd640-4c04-4ef7-a21b-4fc3db8ee96b, 
0decb0ae-88bf-4897-a102-15f97e015084, 
59ad4f38-3f70-44fc-a013-f259ebf1ca36, 
ab78d2db-426f-42dc-a03f-9f23e24b4c32, 6e4b1472-5af2-4a1e-a2d3-47a19c1bd109, 
58f50b51-458e-4931-aca5-3a78f849a0b9, 
68212eb1-c960-421c-a7cf-3e91ba9114f5, 
c52df7b9-93cc-421b-aa83-d70523ca3476
18.11.2015 12:59:26.783 *INFO* [RMI TCP Connection(16)-10.132.4.207] 
org.apache.jackrabbit.oak.plugins.segment.file.FileStore TarMK GC #0: cleanup 
marking file for deletion: data5a.tar
18.11.2015 12:59:26.783 *INFO* [RMI TCP Connection(16)-10.132.4.207] 
org.apache.jackrabbit.oak.plugins.segment.file.FileStore TarMK GC #0: cleanup 
completed in 263.8 ms (263 ms). Post cleanup size is 1.2 GB (1249584128 bytes) 
and space reclaimed 57.3 kB (57344 bytes). Compaction map weight/depth is 0 B/0 
(0 bytes/0).
{noformat}

I will perform some more tests, but the fix seems to be effective so far.

> The string/template cache of the SegmentWriter should be cleared before 
> cleanup
> ---
>
> Key: OAK-3612
> URL: https://issues.apache.org/jira/browse/OAK-3612
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: segmentmk
>Reporter: Francesco Mari
> Fix For: 1.4
>
> Attachments: OAK-3612-01.patch
>
>
> The string/template cache of the SegmentWriter 
> (org.apache.jackrabbit.oak.plugins.segment.SegmentWriter#records) is not 
> cleared before the cleanup phase. This might maintain in-memory references to 
> segments, thus preventing them to be cleaned up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-3612) The string/template cache of the SegmentWriter should be cleared before cleanup

2015-11-16 Thread Francesco Mari (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006684#comment-15006684
 ] 

Francesco Mari commented on OAK-3612:
-

Good catch, thanks for pointing it out. I will try to fix the deadlock, if 
possible, and run the integration tests again. If this results too complicated, 
I will opt for the simpler solution you suggested above.

> The string/template cache of the SegmentWriter should be cleared before 
> cleanup
> ---
>
> Key: OAK-3612
> URL: https://issues.apache.org/jira/browse/OAK-3612
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: segmentmk
>Reporter: Francesco Mari
> Fix For: 1.4
>
> Attachments: OAK-3612-01.patch
>
>
> The string/template cache of the SegmentWriter 
> (org.apache.jackrabbit.oak.plugins.segment.SegmentWriter#records) is not 
> cleared before the cleanup phase. This might maintain in-memory references to 
> segments, thus preventing them to be cleaned up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-3612) The string/template cache of the SegmentWriter should be cleared before cleanup

2015-11-16 Thread Alex Parvulescu (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006660#comment-15006660
 ] 

Alex Parvulescu commented on OAK-3612:
--

one problem with the patch: it introduces a deadlock [1]. run 
_SegmentCompactionIT_ with 10 writers for a bit to verify it's not just my 
setup, also see the existing warning in the FileStore about possible deadlocks 
when locking access to the tracker's writer [0].

I don't disagree with what you are saying, but changes like these have impacts 
that we cannot predict, so I'd rather approach this with a bit more caution.

[0]
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/segment/file/FileStore.java#L752
[1]
{code}
Found one Java-level deadlock:
=
"RandomWriter":
  waiting to lock monitor 0x7f89ab006518 (object 0x0007000c73c8, a 
org.apache.jackrabbit.oak.plugins.segment.SegmentWriter),
  which is held by "RandomWriter"
"RandomWriter":
  waiting for ownable synchronizer 0x0007001311b8, (a 
java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
  which is held by "TarMK flush thread 
[target/SegmentCompactionIT4821636020606907980dir], active since Mon Nov 16 
14:24:48 CET 2015, previous max duration 356ms"
"TarMK flush thread [target/SegmentCompactionIT4821636020606907980dir], active 
since Mon Nov 16 14:24:48 CET 2015, previous max duration 356ms":
  waiting to lock monitor 0x7f89ab006518 (object 0x0007000c73c8, a 
org.apache.jackrabbit.oak.plugins.segment.SegmentWriter),
  which is held by "RandomWriter"

Java stack information for the threads listed above:
===
"RandomWriter":
at 
org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeTemplate(SegmentWriter.java:1157)
- waiting to lock <0x0007000c73c8> (a 
org.apache.jackrabbit.oak.plugins.segment.SegmentWriter)
at 
org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeNode(SegmentWriter.java:1326)
at 
org.apache.jackrabbit.oak.plugins.segment.SegmentWriter$2.childNodeAdded(SegmentWriter.java:1344)
at 
org.apache.jackrabbit.oak.plugins.memory.ModifiedNodeState.compareAgainstBaseState(ModifiedNodeState.java:413)
at 
org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeNode(SegmentWriter.java:1341)
at 
org.apache.jackrabbit.oak.plugins.segment.SegmentWriter$2.childNodeChanged(SegmentWriter.java:1352)
at 
org.apache.jackrabbit.oak.plugins.memory.ModifiedNodeState.compareAgainstBaseState(ModifiedNodeState.java:417)
at 
org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeNode(SegmentWriter.java:1341)
at 
org.apache.jackrabbit.oak.plugins.segment.SegmentWriter$2.childNodeChanged(SegmentWriter.java:1352)
at 
org.apache.jackrabbit.oak.plugins.memory.ModifiedNodeState.compareAgainstBaseState(ModifiedNodeState.java:417)
at 
org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeNode(SegmentWriter.java:1341)
at 
org.apache.jackrabbit.oak.plugins.segment.SegmentWriter$2.childNodeChanged(SegmentWriter.java:1352)
at 
org.apache.jackrabbit.oak.plugins.memory.ModifiedNodeState.compareAgainstBaseState(ModifiedNodeState.java:417)
at 
org.apache.jackrabbit.oak.plugins.segment.SegmentWriter.writeNode(SegmentWriter.java:1341)
at 
org.apache.jackrabbit.oak.plugins.segment.SegmentNodeBuilder.getNodeState(SegmentNodeBuilder.java:100)
at 
org.apache.jackrabbit.oak.plugins.segment.SegmentNodeBuilder.updated(SegmentNodeBuilder.java:85)
at 
org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.updated(MemoryNodeBuilder.java:214)
at 
org.apache.jackrabbit.oak.plugins.segment.SegmentNodeBuilder.updated(SegmentNodeBuilder.java:81)
at 
org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.setChildNode(MemoryNodeBuilder.java:346)
at 
org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.setChildNode(MemoryNodeBuilder.java:337)
at 
org.apache.jackrabbit.oak.plugins.segment.SegmentCompactionIT$RandomWriter.addRandomNode(SegmentCompactionIT.java:550)
at 
org.apache.jackrabbit.oak.plugins.segment.SegmentCompactionIT$RandomWriter.modify(SegmentCompactionIT.java:501)
at 
org.apache.jackrabbit.oak.plugins.segment.SegmentCompactionIT$RandomWriter.access$4(SegmentCompactionIT.java:487)
at 
org.apache.jackrabbit.oak.plugins.segment.SegmentCompactionIT$RandomWriter$1.call(SegmentCompactionIT.java:466)
at 
org.apache.jackrabbit.oak.plugins.segment.SegmentCompactionIT$RandomWriter$1.call(SegmentCompactionIT.java:1)
at 
org.apache.jackrabbit.oak.plugins.segment.SegmentCompactionIT$RandomWriter.run(SegmentCompactionIT.java:446)
at 
org.apache.jackrabbit.oak.plugins.segment.SegmentCompactionIT$RandomWriter.call(Segmen

[jira] [Commented] (OAK-3612) The string/template cache of the SegmentWriter should be cleared before cleanup

2015-11-16 Thread Francesco Mari (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006499#comment-15006499
 ] 

Francesco Mari commented on OAK-3612:
-

I like your solution because it prevents me to make a change in oak-core, but 
on the other hand I think that it would be more correct to place that change 
where I initially proposed.

I think that the {{SegmentTracker}} should be responsible for the management of 
the {{SegmentWriter}} it owns. {{SegmentTracker#clearCache}} looks like a 
natural hook to make sure that every piece of cache directly or indirectly 
maintained by the {{SegmentTracker}} is cleared.

Moreover, the reference held by the {{SegmentWriter}} looks like a general 
problem of the Segment Store, and not a specific problem of the Standby Store. 
Thus, it would probably make more sense to make the fix in oak-core.

> The string/template cache of the SegmentWriter should be cleared before 
> cleanup
> ---
>
> Key: OAK-3612
> URL: https://issues.apache.org/jira/browse/OAK-3612
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: segmentmk
>Reporter: Francesco Mari
> Fix For: 1.4
>
> Attachments: OAK-3612-01.patch
>
>
> The string/template cache of the SegmentWriter 
> (org.apache.jackrabbit.oak.plugins.segment.SegmentWriter#records) is not 
> cleared before the cleanup phase. This might maintain in-memory references to 
> segments, thus preventing them to be cleaned up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-3612) The string/template cache of the SegmentWriter should be cleared before cleanup

2015-11-16 Thread Alex Parvulescu (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006379#comment-15006379
 ] 

Alex Parvulescu commented on OAK-3612:
--

right. I see what you mean now. I suggest you add the specific cleanup calls to 
the {{cleanup}} method on the standby [0]
this should be easy enough without the need to change anything on the oak-core 
level. ie:
{code}
if (delegate instanceof FileStore) {
  try {
delegate.getTracker().getWriter().dropCache();
tracker.clearCache();
((FileStore) delegate).cleanup();
  } catch (IOException e) {
{code}




[0] 
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-tarmk-standby/src/main/java/org/apache/jackrabbit/oak/plugins/segment/standby/store/StandbyStore.java#L242

> The string/template cache of the SegmentWriter should be cleared before 
> cleanup
> ---
>
> Key: OAK-3612
> URL: https://issues.apache.org/jira/browse/OAK-3612
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: segmentmk
>Reporter: Francesco Mari
> Fix For: 1.4
>
> Attachments: OAK-3612-01.patch
>
>
> The string/template cache of the SegmentWriter 
> (org.apache.jackrabbit.oak.plugins.segment.SegmentWriter#records) is not 
> cleared before the cleanup phase. This might maintain in-memory references to 
> segments, thus preventing them to be cleaned up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-3612) The string/template cache of the SegmentWriter should be cleared before cleanup

2015-11-12 Thread Francesco Mari (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002086#comment-15002086
 ] 

Francesco Mari commented on OAK-3612:
-

[~alex.parvulescu], the record cache in the {{SegmentWriter}} is actually 
cleaned up 
[here|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/segment/file/FileStore.java#L1434]
 on successful compaction.

Anyway, I can confirm that in the context of the Cold Standby it is important 
to have this change in place. I could convince myself by doing the following:

1. Generate garbage on the primary.
2. Take heap dump #1 on the standby.
3. Run compaction on the primary.
4. Take heap dump #2 on the standby.
5. Cleanup the standby using the JMX bean.
6. Take heap dump #3 on the standby.

I use the heap dump #1 to retrieve the {{SegmentId}} of the current root state, 
and search for that same {{SegmentId}} in the heap dump #2. This allows me to 
find who references the old root state before cleanup is called on the standby. 
In my tests, I usually have the following references:

- A reference from {{CacheLIRS}}.
- A weak reference from {{SegmentTracker}}. Being a weak reference, this can be 
ignored for the purpose of our conversation.
- A reference from {{SegmentWriter}}.

On the standby, the cleanup phase is called without executing the compaction 
phase first. The {{FileStore#cleanup}} method takes care of clearing 
{{CacheLIRS}} (through {{SegmentTracker#clearCache}}). The only missing 
(non-weak) reference, that nobody takes care of, is the one from 
{{SegmentWriter}}. If you reproduce the steps above, you will see from the heap 
dump #3 that the only reference to the old {{SegmentId}} is precisely the one 
from {{SegmentWriter}}.


> The string/template cache of the SegmentWriter should be cleared before 
> cleanup
> ---
>
> Key: OAK-3612
> URL: https://issues.apache.org/jira/browse/OAK-3612
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: segmentmk
>Reporter: Francesco Mari
> Fix For: 1.4
>
> Attachments: OAK-3612-01.patch
>
>
> The string/template cache of the SegmentWriter 
> (org.apache.jackrabbit.oak.plugins.segment.SegmentWriter#records) is not 
> cleared before the cleanup phase. This might maintain in-memory references to 
> segments, thus preventing them to be cleaned up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-3612) The string/template cache of the SegmentWriter should be cleared before cleanup

2015-11-10 Thread Alex Parvulescu (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998551#comment-14998551
 ] 

Alex Parvulescu commented on OAK-3612:
--

bq. cache of the SegmentWriter 
(org.apache.jackrabbit.oak.plugins.segment.SegmentWriter#records) is not 
cleared before the cleanup phase
it depends, we already clear the record cache on a successful compaction run 
[0], so this would only apply for the cases where compaction was skipped so I'm 
not exactly sure how much more efficient this will make the cleanup phase. any 
numbers to back this up?



[0] 
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/segment/file/FileStore.java#L1088

> The string/template cache of the SegmentWriter should be cleared before 
> cleanup
> ---
>
> Key: OAK-3612
> URL: https://issues.apache.org/jira/browse/OAK-3612
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: segmentmk
>Reporter: Francesco Mari
> Attachments: OAK-3612-01.patch
>
>
> The string/template cache of the SegmentWriter 
> (org.apache.jackrabbit.oak.plugins.segment.SegmentWriter#records) is not 
> cleared before the cleanup phase. This might maintain in-memory references to 
> segments, thus preventing them to be cleaned up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)