[ https://issues.apache.org/jira/browse/OAK-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15544731#comment-15544731 ]
Tomek Rękawek commented on OAK-4882:
------------------------------------

Thread dump, redacted by Chetan:

A read blocks on cache load (multiple threads reading the same value):
{noformat}
"X.X.X.X [1471910520681] GET /... HTTP/1.1" prio=10 tid=0x00007f9c9d296000 nid=0x6b80 waiting for monitor entry [0x00007f9d79058000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.jackrabbit.oak.cache.CacheLIRS$Segment.access(CacheLIRS.java:884)
    - waiting to lock <0x00000004030760a8> (a org.apache.jackrabbit.oak.cache.CacheLIRS$Segment)
    at org.apache.jackrabbit.oak.cache.CacheLIRS$Segment.get(CacheLIRS.java:867)
    at org.apache.jackrabbit.oak.cache.CacheLIRS.getIfPresent(CacheLIRS.java:362)
    at org.apache.jackrabbit.oak.plugins.document.persistentCache.NodeCache.getIfPresent(NodeCache.java:127)
    at org.apache.jackrabbit.oak.plugins.document.persistentCache.NodeCache.get(NodeCache.java:142)
    at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore.getNode(DocumentNodeStore.java:824)
    at org.apache.jackrabbit.oak.plugins.document.DocumentNodeState.getChildNode(DocumentNodeState.java:253)
{noformat}

The thread that is loading the value (it holds the segment lock <0x00000004030760a8>) is in turn blocked on eviction:
{noformat}
"X.X.X.X [1471910536795] GET /... HTTP/1.1" prio=10 tid=0x00007f9c9fafe800 nid=0x4437 waiting for monitor entry [0x00007f9dcddbf000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.jackrabbit.oak.plugins.document.persistentCache.async.CacheActionDispatcher.add(CacheActionDispatcher.java:87)
    - waiting to lock <0x000000040293ada0> (a org.apache.jackrabbit.oak.plugins.document.persistentCache.async.CacheActionDispatcher)
    at org.apache.jackrabbit.oak.plugins.document.persistentCache.async.CacheWriteQueue.addPut(CacheWriteQueue.java:78)
    at org.apache.jackrabbit.oak.plugins.document.persistentCache.NodeCache.evicted(NodeCache.java:221)
    at org.apache.jackrabbit.oak.plugins.document.DocumentMK$Builder$1.evicted(DocumentMK.java:986)
    at org.apache.jackrabbit.oak.plugins.document.DocumentMK$Builder$1.evicted(DocumentMK.java:982)
    at org.apache.jackrabbit.oak.cache.CacheLIRS.evicted(CacheLIRS.java:217)
    at org.apache.jackrabbit.oak.cache.CacheLIRS$Segment.evict(CacheLIRS.java:1214)
    at org.apache.jackrabbit.oak.cache.CacheLIRS$Segment.put(CacheLIRS.java:1131)
    - locked <0x00000004030760a8> (a org.apache.jackrabbit.oak.cache.CacheLIRS$Segment)
{noformat}

Here CacheActionDispatcher.add is synchronized and has 6 threads waiting on it.

The thread that holds the dispatcher lock <0x000000040293ada0> is itself busy with eviction, inside cleanTheQueue():
{noformat}
"X.X.X.X [1471909807710] POST /... HTTP/1.1" prio=10 tid=0x00007f9c8d65b800 nid=0x5937 runnable [0x00007f9c43645000]
   java.lang.Thread.State: RUNNABLE
    at java.util.HashMap.createEntry(HashMap.java:869)
    at java.util.HashMap.addEntry(HashMap.java:856)
    at java.util.HashMap.put(HashMap.java:484)
    at java.util.HashSet.add(HashSet.java:217)
    at org.apache.jackrabbit.oak.plugins.document.persistentCache.async.CacheWriteQueue.addInvalidate(CacheWriteQueue.java:61)
    - locked <0x0000000402e522c0> (a org.apache.jackrabbit.oak.plugins.document.persistentCache.async.CacheWriteQueue)
    at org.apache.jackrabbit.oak.plugins.document.persistentCache.async.CacheActionDispatcher.cleanTheQueue(CacheActionDispatcher.java:103)
    at org.apache.jackrabbit.oak.plugins.document.persistentCache.async.CacheActionDispatcher.add(CacheActionDispatcher.java:88)
    - locked <0x000000040293ada0> (a org.apache.jackrabbit.oak.plugins.document.persistentCache.async.CacheActionDispatcher)
    at org.apache.jackrabbit.oak.plugins.document.persistentCache.async.CacheWriteQueue.addPut(CacheWriteQueue.java:78)
    at org.apache.jackrabbit.oak.plugins.document.persistentCache.NodeCache.evicted(NodeCache.java:221)
    at org.apache.jackrabbit.oak.plugins.document.DocumentMK$Builder$1.evicted(DocumentMK.java:986)
    at org.apache.jackrabbit.oak.plugins.document.DocumentMK$Builder$1.evicted(DocumentMK.java:982)
    at org.apache.jackrabbit.oak.cache.CacheLIRS.evicted(CacheLIRS.java:217)
    at org.apache.jackrabbit.oak.cache.CacheLIRS$Segment.evict(CacheLIRS.java:1214)
    at org.apache.jackrabbit.oak.cache.CacheLIRS$Segment.put(CacheLIRS.java:1131)
    - locked <0x0000000402e55ad8> (a org.apache.jackrabbit.oak.cache.CacheLIRS$Segment)
    at org.apache.jackrabbit.oak.cache.CacheLIRS.put(CacheLIRS.java:258)
    at org.apache.jackrabbit.oak.cache.CacheLIRS.put(CacheLIRS.java:269)
    at org.apache.jackrabbit.oak.plugins.document.persistentCache.NodeCache.getIfPresent(NodeCache.java:133)
    at org.apache.jackrabbit.oak.plugins.document.persistentCache.NodeCache.get(NodeCache.java:142)
{noformat}

>
> Bottleneck in the asynchronous persistent cache
> -----------------------------------------------
>
>                 Key: OAK-4882
>                 URL: https://issues.apache.org/jira/browse/OAK-4882
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: cache, documentmk
>    Affects Versions: 1.5.10, 1.4.8
>            Reporter: Tomek Rękawek
>             Fix For: 1.6
>
>
> The class responsible for accepting new cache operations, which are then
> handled asynchronously, is
> [CacheActionDispatcher|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/persistentCache/async/CacheActionDispatcher.java].
> Under high load, when the queue is full (=1024 entries), the
> [add()|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/persistentCache/async/CacheActionDispatcher.java#L86]
> method removes the oldest 256 entries. However, we can't afford to lose the
> updates (as that may leave stale entries in the cache), so all the
> removed entries are compacted into one big invalidate action.
> The compaction
> ([CacheActionDispatcher#cleanTheQueue|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/persistentCache/async/CacheActionDispatcher.java#L97])
> runs while still holding the lock taken in the add() method, so threads that
> try to add something to the queue have to wait until cleanTheQueue() ends.
> Maybe we can optimise the CacheActionDispatcher#add->cleanTheQueue part, so
> that it doesn't hold the lock for the whole time.
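
To illustrate the optimisation suggested in the description, here is a minimal sketch of one possible approach: take the lock only long enough to remove the oldest entries, then build the compacted invalidate action outside the lock. This is not the Oak code and not necessarily the fix applied for this issue. The class BoundedActionQueue, its nested Action type and the snapshot() helper are hypothetical names made up for this example; only the numbers 1024 and 256 come from the description above. The sketch also assumes that an invalidate queued after a newer put for the same key is acceptable, since it can only cost an extra cache miss and cannot leave a stale entry.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

/** Hypothetical sketch of a dispatcher queue; not the Oak CacheActionDispatcher. */
final class BoundedActionQueue {

    static final int MAX_SIZE = 1024;         // queue capacity, as in the description
    static final int ACTIONS_TO_REMOVE = 256; // oldest entries dropped when full

    /** Hypothetical action: a put for one key, or an invalidate for many keys. */
    public static final class Action {
        public final boolean invalidate;
        public final Set<String> keys;

        Action(boolean invalidate, Set<String> keys) {
            this.invalidate = invalidate;
            this.keys = keys;
        }

        public static Action put(String key) {
            return new Action(false, Collections.singleton(key));
        }
    }

    private final Object lock = new Object();
    private final Queue<Action> queue = new ArrayDeque<>();

    public void add(Action action) {
        List<Action> removed = null;
        synchronized (lock) {
            if (queue.size() >= MAX_SIZE) {
                // Cheap part only: unlink the oldest entries while holding the lock.
                removed = new ArrayList<>(ACTIONS_TO_REMOVE);
                for (int i = 0; i < ACTIONS_TO_REMOVE; i++) {
                    removed.add(queue.poll());
                }
            }
            queue.add(action);
        }
        if (removed != null) {
            // Expensive part: merge the removed keys into one invalidate action.
            // Other threads can call add() concurrently while this runs.
            Set<String> keys = new HashSet<>();
            for (Action a : removed) {
                keys.addAll(a.keys);
            }
            Action compacted = new Action(true, keys);
            synchronized (lock) {
                // May push the queue slightly above MAX_SIZE; the next add() trims it.
                queue.add(compacted);
            }
        }
    }

    /** Copy of the current queue contents, oldest first (for inspection only). */
    public List<Action> snapshot() {
        synchronized (lock) {
            return new ArrayList<>(queue);
        }
    }

    public static void main(String[] args) {
        BoundedActionQueue q = new BoundedActionQueue();
        for (int i = 0; i < MAX_SIZE + 1; i++) {
            q.add(Action.put("k" + i));
        }
        System.out.println(q.snapshot().size()); // prints 770
    }
}
```

In this sketch the 1025th add() removes the 256 oldest puts, appends the new put, and then appends a single invalidate covering the 256 removed keys, which gives 1024 - 256 + 1 + 1 = 770 entries. The trade-off is that the invalidate is no longer enqueued atomically with the removal, so ordering relative to concurrent adds is relaxed.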