[jira] [Commented] (CASSANDRA-15153) Ensure Caffeine cache does not return stale entries
[ https://issues.apache.org/jira/browse/CASSANDRA-15153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17416868#comment-17416868 ] Ben Manes commented on CASSANDRA-15153: --- Sorry for the bug here. There are so many moving parts that I probably confused myself when writing that original, obviously bad code. I'll review the test cases to make sure this metadata is covered by tests and not just fixed. Please do try to keep up to date with recent versions for bug fixes. > Ensure Caffeine cache does not return stale entries > --- > > Key: CASSANDRA-15153 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15153 > Project: Cassandra > Issue Type: Bug > Components: Feature/Authorization >Reporter: Per Otterström >Assignee: Aleksei Zotov >Priority: Normal > Labels: security > Fix For: 4.0.2, 4.1 > > > Version 2.3.5 of the Caffeine cache that we're using in various places can > hand out stale entries in some cases. This seems to happen when an update > fails repeatedly, in which case Caffeine may return a previously loaded > value. For instance, the AuthCache may hand out permissions even though the > reload operation is failing, see CASSANDRA-15041. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-16154) OOM Error (Direct buffer memory) during intensive reading from large SSTables
[ https://issues.apache.org/jira/browse/CASSANDRA-16154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17240198#comment-17240198 ] Ben Manes edited comment on CASSANDRA-16154 at 11/29/20, 9:05 AM: -- Caffeine uses a write buffer to replay hash table modifications against the eviction policy. This avoids having every writer thread synchronize on a single lock to perform an eviction, which could become a bottleneck. Typically a small burst of writes can be absorbed into this buffer and immediately drained, costing roughly the same as an unbounded ConcurrentHashMap. If the buffer is filled, e.g. due to a stress test, then writers are descheduled by taking the eviction lock directly. This provides backpressure and avoids priority inversion. The cache may grow larger than the maximum size, up to the size of the write buffer. The buffer starts small but can grow up to 128 * pow2(NCPUs) entries. For an 8 core server that would be up to 1024 scheduled policy inserts. Since the write buffer is appended to after the hashmap operation, for an 8K entry cache to reach 50K entries would require a 512 core machine or, similarly, 40K threads all inserting concurrently and blocked on adding to the buffer. As with the GC, here too we assume that the JVM is not so severely memory limited that it cannot allow for some slack in order to cope with the system load. If the cache entries are of a reasonable size, here stated as 64kb, then that is 64mb of worst-case slack on an 8 core machine. This logic can be found in {{BoundedLocalCache#afterWrite}}, described in the class's implementation doc, and the {{Caffeine}} builder specifies: {{Note that the cache may evict an entry before this limit is exceeded or temporarily exceed the threshold while evicting.}} You can verify the limit is honored by running the [Stresser|https://github.com/ben-manes/caffeine/blob/master/caffeine/src/test/java/com/github/benmanes/caffeine/cache/Stresser.java] test case, which uses synthetic load to observe whether the cache has runaway growth. An {{OutOfMemoryError}} is a heisenbug in that future allocations fail due to behavior elsewhere consuming the resources. Instead of a stacktrace, a heap dump offers the best path to finding the root cause. was (Author: ben.manes): Caffeine uses a write buffer to replay hash table modifications against the eviction policy. This avoids having every writer thread synchronize on a single lock to perform an eviction, which could become a bottleneck. Typically a small burst of writes can be absorbed into this buffer and immediately drained, costing roughly the same as an unbounded ConcurrentHashMap. If the buffer is filled, e.g. due to a stress test, then writers are descheduled by taking the eviction lock directly. This provides backpressure and avoids priority inversion. The cache may grow larger than the maximum size, up to the size of the write buffer. The buffer starts small but can grow up to 128 * pow2(NCPUs) entries. For an 8 core server that would be up to 1024 scheduled policy inserts. Since the write buffer is appended to after the hashmap operation, for an 8K entry cache to reach 50K entries would require a 512 core machine or, similarly, 40K threads all inserting concurrently and blocked on adding to the buffer. As with the GC, here too we assume that the JVM is not so severely memory limited that it cannot allow for some slack in order to cope with the system load. 
If the cache entries are of a reasonable size, here stated as 64kb, then that is 64mb of worst-case slack on an 8 core machine. This logic can be found in {{BoundedLocalCache#afterWrite}}, described in the class's implementation doc, and the {{Caffeine}} builder specifies: {{Note that the cache may evict an entry before this limit is exceeded or temporarily exceed the threshold while evicting.}} You can verify the limit is honored by running the [Stresser|https://github.com/ben-manes/caffeine/blob/master/caffeine/src/test/java/com/github/benmanes/caffeine/cache/Stresser.java] test case, which uses synthetic load to observe whether the cache has runaway growth. An {{OutOfMemoryError}} is a heisenbug in that future allocations fail due to behavior elsewhere consuming the resources. Instead of a stacktrace, a heap dump offers the best path to finding the root cause. > OOM Error (Direct buffer memory) during intensive reading from large SSTables > - > > Key: CASSANDRA-16154 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16154 > Project: Cassandra > Issue Type: Bug >Reporter: Vygantas Gedgaudas >Priority: Normal > > Hello, > We have a certain database, from
[jira] [Commented] (CASSANDRA-16154) OOM Error (Direct buffer memory) during intensive reading from large SSTables
[ https://issues.apache.org/jira/browse/CASSANDRA-16154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17240198#comment-17240198 ] Ben Manes commented on CASSANDRA-16154: --- Caffeine uses a write buffer to replay hash table modifications against the eviction policy. This avoids having every writer thread synchronize on a single lock to perform an eviction, which could become a bottleneck. Typically a small burst of writes can be absorbed into this buffer and immediately drained, costing roughly the same as an unbounded ConcurrentHashMap. If the buffer is filled, e.g. due to a stress test, then writers are descheduled by taking the eviction lock directly. This provides backpressure and avoids priority inversion. The cache may grow larger than the maximum size, up to the size of the write buffer. The buffer starts small but can grow up to 128 * pow2(NCPUs) entries. For an 8 core server that would be up to 1024 scheduled policy inserts. Since the write buffer is appended to after the hashmap operation, for an 8K entry cache to reach 50K entries would require a 512 core machine or, similarly, 40K threads all inserting concurrently and blocked on adding to the buffer. As with the GC, here too we assume that the JVM is not so severely memory limited that it cannot allow for some slack in order to cope with the system load. If the cache entries are of a reasonable size, here stated as 64kb, then that is 64mb of worst-case slack on an 8 core machine. This logic can be found in {{BoundedLocalCache#afterWrite}}, described in the class's implementation doc, and the {{Caffeine}} builder specifies: {{Note that the cache may evict an entry before this limit is exceeded or temporarily exceed the threshold while evicting.}} You can verify the limit is honored by running the [Stresser|https://github.com/ben-manes/caffeine/blob/master/caffeine/src/test/java/com/github/benmanes/caffeine/cache/Stresser.java] test case, which uses synthetic load to observe whether the cache has runaway growth. An {{OutOfMemoryError}} is a heisenbug in that future allocations fail due to behavior elsewhere consuming the resources. Instead of a stacktrace, a heap dump offers the best path to finding the root cause. 
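As a rough, back-of-the-envelope illustration of the slack bound described above (the 128 multiplier and the power-of-two rounding follow the comment; the class and method names below are made up for the example, not Caffeine internals):
{code:java}
// Worst-case write-buffer slack estimate, mirroring the numbers in the comment above.
final class WriteBufferSlack {
  // Smallest power of two >= x (assumes 1 <= x <= 2^30).
  static int ceilingPowerOfTwo(int x) {
    return 1 << -Integer.numberOfLeadingZeros(x - 1);
  }

  public static void main(String[] args) {
    int ncpus = 8;                                          // assumed 8 core server
    int maxBufferedWrites = 128 * ceilingPowerOfTwo(ncpus); // 1024 pending policy inserts
    long entryBytes = 64 * 1024;                            // assumed 64kb per entry
    long worstCaseSlack = maxBufferedWrites * entryBytes;   // ~64mb of slack
    System.out.println(maxBufferedWrites + " buffered writes, ~"
        + (worstCaseSlack >> 20) + "mb worst-case slack");
  }
}
{code}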
> OOM Error (Direct buffer memory) during intensive reading from large SSTables > - > > Key: CASSANDRA-16154 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16154 > Project: Cassandra > Issue Type: Bug >Reporter: Vygantas Gedgaudas >Priority: Normal > > Hello, > We have a certain database, from when we are reading intensively leads to the > following OOM error: > {noformat} > java.lang.OutOfMemoryError: Direct buffer memory > at java.nio.Bits.reserveMemory(Bits.java:694) ~[na:1.8.0_212] > at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) > ~[na:1.8.0_212] > at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) ~[na:1.8.0_212] > at > org.apache.cassandra.utils.memory.BufferPool.allocate(BufferPool.java:110) > ~[apache-cassandra-3.11.0.jar:3.11.0] > at > org.apache.cassandra.utils.memory.BufferPool.access$1000(BufferPool.java:46) > ~[apache-cassandra-3.11.0.jar:3.11.0] > at > org.apache.cassandra.utils.memory.BufferPool$LocalPool.allocate(BufferPool.java:407) > ~[apache-cassandra-3.11.0.jar:3.11.0] > at > org.apache.cassandra.utils.memory.BufferPool$LocalPool.access$000(BufferPool.java:334) > ~[apache-cassandra-3.11.0.jar:3.11.0] > at > org.apache.cassandra.utils.memory.BufferPool.takeFromPool(BufferPool.java:122) > ~[apache-cassandra-3.11.0.jar:3.11.0] > at org.apache.cassandra.utils.memory.BufferPool.get(BufferPool.java:94) > ~[apache-cassandra-3.11.0.jar:3.11.0] > at org.apache.cassandra.cache.ChunkCache.load(ChunkCache.java:155) > ~[apache-cassandra-3.11.0.jar:3.11.0] > at org.apache.cassandra.cache.ChunkCache.load(ChunkCache.java:39) > ~[apache-cassandra-3.11.0.jar:3.11.0] > at > com.github.benmanes.caffeine.cache.BoundedLocalCache$BoundedLocalLoadingCache.lambda$new$0(BoundedLocalCache.java:2949) > ~[caffeine-2.2.6.jar:na] > at > com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$doComputeIfAbsent$15(BoundedLocalCache.java:1807) > ~[caffeine-2.2.6.jar:na] > at > java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853) > ~[na:1.8.0_212] > at > com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:1805) > ~[caffeine-2.2.6.jar:na] > at > com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:1788) > ~[caffeine-2.2.6.jar:na] > at > com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:97) > ~[caffeine-2.2.6.jar:na] > at >
[jira] [Commented] (CASSANDRA-15177) Reloading of auth caches happens on the calling thread
[ https://issues.apache.org/jira/browse/CASSANDRA-15177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871194#comment-16871194 ] Ben Manes commented on CASSANDRA-15177: --- Oh of course. There was a request at some point to not do that, so I was mentioning the trick in case you preferred the mixed behavior. > Reloading of auth caches happens on the calling thread > -- > > Key: CASSANDRA-15177 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15177 > Project: Cassandra > Issue Type: Bug > Components: Feature/Authorization >Reporter: Sam Tunnicliffe >Priority: Normal > > When Guava caches were replaced by their Caffeine equivalents in > CASSANDRA-10855, the async reloading of stale AuthCache entries was lost due > to the use of {{MoreExecutors.directExecutor()}} to provide the delegate > executor. Under normal conditions, we can expect these operations to be > relatively expensive, and in failure scenarios where replicas for the auth > data are DOWN this will greatly increase latency, so they shouldn’t be done > on threads servicing requests. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15177) Reloading of auth caches happens on the calling thread
[ https://issues.apache.org/jira/browse/CASSANDRA-15177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870788#comment-16870788 ] Ben Manes commented on CASSANDRA-15177: --- If you implement CacheLoader, you can override asyncReload to supply the refresh future tied to a different executor. > Reloading of auth caches happens on the calling thread > -- > > Key: CASSANDRA-15177 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15177 > Project: Cassandra > Issue Type: Bug > Components: Feature/Authorization >Reporter: Sam Tunnicliffe >Priority: Normal > > When Guava caches were replaced by their Caffeine equivalents in > CASSANDRA-10855, the async reloading of stale AuthCache entries was lost due > to the use of {{MoreExecutors.directExecutor()}} to provide the delegate > executor. Under normal conditions, we can expect these operations to be > relatively expensive, and in failure scenarios where replicas for the auth > data are DOWN this will greatly increase latency, so they shouldn’t be done > on threads servicing requests. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
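A minimal sketch of that suggestion, assuming the three-argument {{asyncReload(key, oldValue, executor)}} overload (the exact signature should be checked against the Caffeine version in use; the loader and executor names below are hypothetical, not Cassandra's APIs):
{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import com.github.benmanes.caffeine.cache.CacheLoader;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;

class AsyncReloadSketch {
  // Hypothetical dedicated pool so refreshes do not run on request threads.
  private static final ExecutorService refreshExecutor = Executors.newSingleThreadExecutor();

  static LoadingCache<String, String> build() {
    CacheLoader<String, String> loader = new CacheLoader<String, String>() {
      @Override public String load(String key) {
        return fetchPermissions(key); // synchronous load on a miss
      }
      @Override public CompletableFuture<String> asyncReload(String key, String oldValue, Executor executor) {
        // Supply the refresh future tied to our own executor instead of the calling thread.
        return CompletableFuture.supplyAsync(() -> fetchPermissions(key), refreshExecutor);
      }
    };
    return Caffeine.newBuilder()
        .refreshAfterWrite(30, TimeUnit.SECONDS)
        .build(loader);
  }

  private static String fetchPermissions(String role) {
    return "permissions-for-" + role; // placeholder for the expensive lookup
  }
}
{code}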
[jira] [Commented] (CASSANDRA-13445) validation executor thread is stuck
[ https://issues.apache.org/jira/browse/CASSANDRA-13445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968288#comment-15968288 ] Ben Manes commented on CASSANDRA-13445: --- Perhaps this was supposed to be {{!=}}, since {{reference()}} increases the count or returns {{null}} if it is zero?
{code}
do
    buf = cache.get(new Key(source, pageAlignedPos)).reference();
while (buf == null);
{code}
> validation executor thread is stuck > --- > > Key: CASSANDRA-13445 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13445 > Project: Cassandra > Issue Type: Bug > Components: Compaction > Environment: cassandra 3.10 >Reporter: Roland Otta > > we have the following issue on our 3.10 development cluster. > sometimes the repairs (it is a full repair in that case) hang because > of a stuck validation compaction. > nodetool compactionstats says > a1bb45c0-1fc6-11e7-81de-0fb0b3f5a345 Validation bds ad_event > 805955242 841258085 bytes 95.80% > and there is no more progress at this percentage. > i checked the logs on the affected node and could not find any > suspicious errors. > a thread dump shows that the validation executor threads is always repeating > stuff in > org.apache.cassandra.cache.ChunkCache$CachingRebufferer.rebuffer(ChunkCache.java:235) > here is the full stack trace > {noformat} > com.github.benmanes.caffeine.cache.BoundedLocalCache$$Lambda$64/2098345091.accept(Unknown > Source) > com.github.benmanes.caffeine.cache.BoundedBuffer$RingBuffer.drainTo(BoundedBuffer.java:104) > com.github.benmanes.caffeine.cache.StripedBuffer.drainTo(StripedBuffer.java:160) > com.github.benmanes.caffeine.cache.BoundedLocalCache.drainReadBuffer(BoundedLocalCache.java:964) > com.github.benmanes.caffeine.cache.BoundedLocalCache.maintenance(BoundedLocalCache.java:918) > com.github.benmanes.caffeine.cache.BoundedLocalCache.performCleanUp(BoundedLocalCache.java:903) > com.github.benmanes.caffeine.cache.BoundedLocalCache$PerformCleanupTask.run(BoundedLocalCache.java:2680) > com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) > com.github.benmanes.caffeine.cache.BoundedLocalCache.scheduleDrainBuffers(BoundedLocalCache.java:875) > com.github.benmanes.caffeine.cache.BoundedLocalCache.afterRead(BoundedLocalCache.java:748) > com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:1783) > com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:97) > com.github.benmanes.caffeine.cache.LocalLoadingCache.get(LocalLoadingCache.java:66) > org.apache.cassandra.cache.ChunkCache$CachingRebufferer.rebuffer(ChunkCache.java:235) > org.apache.cassandra.cache.ChunkCache$CachingRebufferer.rebuffer(ChunkCache.java:213) > org.apache.cassandra.io.util.RandomAccessReader.reBufferAt(RandomAccessReader.java:65) > org.apache.cassandra.io.util.RandomAccessReader.reBuffer(RandomAccessReader.java:59) > org.apache.cassandra.io.util.RebufferingInputStream.read(RebufferingInputStream.java:88) > org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:66) > org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:60) > org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:402) > org.apache.cassandra.db.marshal.AbstractType.readValue(AbstractType.java:420) > org.apache.cassandra.db.rows.Cell$Serializer.deserialize(Cell.java:245) > org.apache.cassandra.db.rows.UnfilteredSerializer.readSimpleColumn(UnfilteredSerializer.java:610) > 
org.apache.cassandra.db.rows.UnfilteredSerializer.lambda$deserializeRowBody$1(UnfilteredSerializer.java:575) > org.apache.cassandra.db.rows.UnfilteredSerializer$$Lambda$84/898489541.accept(Unknown > Source) > org.apache.cassandra.utils.btree.BTree.applyForwards(BTree.java:1222) > org.apache.cassandra.utils.btree.BTree.apply(BTree.java:1177) > org.apache.cassandra.db.Columns.apply(Columns.java:377) > org.apache.cassandra.db.rows.UnfilteredSerializer.deserializeRowBody(UnfilteredSerializer.java:571) > org.apache.cassandra.db.rows.UnfilteredSerializer.deserialize(UnfilteredSerializer.java:440) > org.apache.cassandra.io.sstable.SSTableSimpleIterator$CurrentFormatIterator.computeNext(SSTableSimpleIterator.java:95) > org.apache.cassandra.io.sstable.SSTableSimpleIterator$CurrentFormatIterator.computeNext(SSTableSimpleIterator.java:73) > org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) > org.apache.cassandra.io.sstable.SSTableIdentityIterator.hasNext(SSTableIdentityIterator.java:122) > org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:100) > org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32)
[jira] [Commented] (CASSANDRA-13445) validation executor thread is stuck
[ https://issues.apache.org/jira/browse/CASSANDRA-13445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968065#comment-15968065 ] Ben Manes commented on CASSANDRA-13445: --- If you can make a reproducible unit test that would help. The lambda should be non-blocking (the onAccess(node) method) as it only increments a counter and reorders in a linked list. Those data structures are not concurrent and have no blocking behavior. The other possibility is it is infinitely looping in BoundedBuffer because somehow the head overlapped the tail index (a single-consumer / multi-producer queue). But the loop breaks if it reads a null slot, assuming the entry isn't visible yet, so that race would be benign if discovered. So glancing at the code nothing stands out and a failing unit test would be very helpful. > validation executor thread is stuck > --- > > Key: CASSANDRA-13445 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13445 > Project: Cassandra > Issue Type: Bug > Components: Compaction > Environment: cassandra 3.10 >Reporter: Roland Otta > > we have the following issue on our 3.10 development cluster. > sometimes the repairs (it is a full repair in that case) hang because > of a stuck validation compaction. > nodetool compactionstats says > a1bb45c0-1fc6-11e7-81de-0fb0b3f5a345 Validation bds ad_event > 805955242 841258085 bytes 95.80% > and there is no more progress at this percentage. > i checked the logs on the affected node and could not find any > suspicious errors. > a thread dump shows that the validation executor threads is always repeating > stuff in > org.apache.cassandra.cache.ChunkCache$CachingRebufferer.rebuffer(ChunkCache.java:235) > here is the full stack trace > {noformat} > com.github.benmanes.caffeine.cache.BoundedLocalCache$$Lambda$64/2098345091.accept(Unknown > Source) > com.github.benmanes.caffeine.cache.BoundedBuffer$RingBuffer.drainTo(BoundedBuffer.java:104) > com.github.benmanes.caffeine.cache.StripedBuffer.drainTo(StripedBuffer.java:160) > com.github.benmanes.caffeine.cache.BoundedLocalCache.drainReadBuffer(BoundedLocalCache.java:964) > com.github.benmanes.caffeine.cache.BoundedLocalCache.maintenance(BoundedLocalCache.java:918) > com.github.benmanes.caffeine.cache.BoundedLocalCache.performCleanUp(BoundedLocalCache.java:903) > com.github.benmanes.caffeine.cache.BoundedLocalCache$PerformCleanupTask.run(BoundedLocalCache.java:2680) > com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) > com.github.benmanes.caffeine.cache.BoundedLocalCache.scheduleDrainBuffers(BoundedLocalCache.java:875) > com.github.benmanes.caffeine.cache.BoundedLocalCache.afterRead(BoundedLocalCache.java:748) > com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:1783) > com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:97) > com.github.benmanes.caffeine.cache.LocalLoadingCache.get(LocalLoadingCache.java:66) > org.apache.cassandra.cache.ChunkCache$CachingRebufferer.rebuffer(ChunkCache.java:235) > org.apache.cassandra.cache.ChunkCache$CachingRebufferer.rebuffer(ChunkCache.java:213) > org.apache.cassandra.io.util.RandomAccessReader.reBufferAt(RandomAccessReader.java:65) > org.apache.cassandra.io.util.RandomAccessReader.reBuffer(RandomAccessReader.java:59) > org.apache.cassandra.io.util.RebufferingInputStream.read(RebufferingInputStream.java:88) > org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:66) > 
org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:60) > org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:402) > org.apache.cassandra.db.marshal.AbstractType.readValue(AbstractType.java:420) > org.apache.cassandra.db.rows.Cell$Serializer.deserialize(Cell.java:245) > org.apache.cassandra.db.rows.UnfilteredSerializer.readSimpleColumn(UnfilteredSerializer.java:610) > org.apache.cassandra.db.rows.UnfilteredSerializer.lambda$deserializeRowBody$1(UnfilteredSerializer.java:575) > org.apache.cassandra.db.rows.UnfilteredSerializer$$Lambda$84/898489541.accept(Unknown > Source) > org.apache.cassandra.utils.btree.BTree.applyForwards(BTree.java:1222) > org.apache.cassandra.utils.btree.BTree.apply(BTree.java:1177) > org.apache.cassandra.db.Columns.apply(Columns.java:377) > org.apache.cassandra.db.rows.UnfilteredSerializer.deserializeRowBody(UnfilteredSerializer.java:571) > org.apache.cassandra.db.rows.UnfilteredSerializer.deserialize(UnfilteredSerializer.java:440) > org.apache.cassandra.io.sstable.SSTableSimpleIterator$CurrentFormatIterator.computeNext(SSTableSimpleIterator.java:95) > org.apache.cassandra.io.sstable.SSTableSimpleIterator$CurrentFormatIterator.computeNext(SSTableSimpleIterator.java:73) >
[jira] [Commented] (CASSANDRA-10855) Use Caffeine (W-TinyLFU) for on-heap caches
[ https://issues.apache.org/jira/browse/CASSANDRA-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15789855#comment-15789855 ] Ben Manes commented on CASSANDRA-10855: --- LGTM > Use Caffeine (W-TinyLFU) for on-heap caches > --- > > Key: CASSANDRA-10855 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10855 > Project: Cassandra > Issue Type: Improvement >Reporter: Ben Manes > Labels: performance > Attachments: CASSANDRA-10855.patch, CASSANDRA-10855.patch > > > Cassandra currently uses > [ConcurrentLinkedHashMap|https://code.google.com/p/concurrentlinkedhashmap] > for performance critical caches (key, counter) and Guava's cache for > non-critical (auth, metrics, security). All of these usages have been > replaced by [Caffeine|https://github.com/ben-manes/caffeine], written by the > author of the previously mentioned libraries. > The primary incentive is to switch from LRU policy to W-TinyLFU, which > provides [near optimal|https://github.com/ben-manes/caffeine/wiki/Efficiency] > hit rates. It performs particularly well in database and search traces, is > scan resistant, and as adds a very small time/space overhead to LRU. > Secondarily, Guava's caches never obtained similar > [performance|https://github.com/ben-manes/caffeine/wiki/Benchmarks] to CLHM > due to some optimizations not being ported over. This change results in > faster reads and not creating garbage as a side-effect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10855) Use Caffeine (W-TinyLFU) for on-heap caches
[ https://issues.apache.org/jira/browse/CASSANDRA-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15789277#comment-15789277 ] Ben Manes commented on CASSANDRA-10855: --- Good observation [~snazy]. Yes, RE is propagated as is and will never be wrapped. A checked exception will be wrapped with a CompletionException, the unchecked version of ExecutionException. (Guava uses its own UncheckedExecutionException) > Use Caffeine (W-TinyLFU) for on-heap caches > --- > > Key: CASSANDRA-10855 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10855 > Project: Cassandra > Issue Type: Improvement >Reporter: Ben Manes > Labels: performance > Attachments: CASSANDRA-10855.patch, CASSANDRA-10855.patch > > > Cassandra currently uses > [ConcurrentLinkedHashMap|https://code.google.com/p/concurrentlinkedhashmap] > for performance critical caches (key, counter) and Guava's cache for > non-critical (auth, metrics, security). All of these usages have been > replaced by [Caffeine|https://github.com/ben-manes/caffeine], written by the > author of the previously mentioned libraries. > The primary incentive is to switch from LRU policy to W-TinyLFU, which > provides [near optimal|https://github.com/ben-manes/caffeine/wiki/Efficiency] > hit rates. It performs particularly well in database and search traces, is > scan resistant, and as adds a very small time/space overhead to LRU. > Secondarily, Guava's caches never obtained similar > [performance|https://github.com/ben-manes/caffeine/wiki/Benchmarks] to CLHM > due to some optimizations not being ported over. This change results in > faster reads and not creating garbage as a side-effect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
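A small sketch of that behavior (the cache setup here is only for illustration): a {{RuntimeException}} thrown by the loader reaches the caller unwrapped, while a checked exception surfaces wrapped in {{CompletionException}}.
{code:java}
import java.io.IOException;
import java.util.concurrent.CompletionException;

import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;

class LoaderExceptionSketch {
  public static void main(String[] args) {
    LoadingCache<String, String> cache = Caffeine.newBuilder()
        .maximumSize(10)
        .build(key -> {
          if (key.startsWith("unchecked")) {
            throw new IllegalStateException("runtime exceptions propagate as-is");
          }
          throw new IOException("checked exceptions are wrapped");
        });

    try {
      cache.get("unchecked-key");
    } catch (IllegalStateException expected) {
      // propagated unchanged, never wrapped
    }

    try {
      cache.get("checked-key");
    } catch (CompletionException expected) {
      // the unchecked analogue of ExecutionException; Guava would throw UncheckedExecutionException
      System.out.println(expected.getCause() instanceof IOException); // true
    }
  }
}
{code}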
[jira] [Commented] (CASSANDRA-10855) Use Caffeine (W-TinyLFU) for on-heap caches
[ https://issues.apache.org/jira/browse/CASSANDRA-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15756924#comment-15756924 ] Ben Manes commented on CASSANDRA-10855: --- I think it's on your plate. https://github.com/snazy/cassandra/pull/1 > Use Caffeine (W-TinyLFU) for on-heap caches > --- > > Key: CASSANDRA-10855 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10855 > Project: Cassandra > Issue Type: Improvement >Reporter: Ben Manes > Labels: performance > Attachments: CASSANDRA-10855.patch, CASSANDRA-10855.patch > > > Cassandra currently uses > [ConcurrentLinkedHashMap|https://code.google.com/p/concurrentlinkedhashmap] > for performance critical caches (key, counter) and Guava's cache for > non-critical (auth, metrics, security). All of these usages have been > replaced by [Caffeine|https://github.com/ben-manes/caffeine], written by the > author of the previously mentioned libraries. > The primary incentive is to switch from LRU policy to W-TinyLFU, which > provides [near optimal|https://github.com/ben-manes/caffeine/wiki/Efficiency] > hit rates. It performs particularly well in database and search traces, is > scan resistant, and as adds a very small time/space overhead to LRU. > Secondarily, Guava's caches never obtained similar > [performance|https://github.com/ben-manes/caffeine/wiki/Benchmarks] to CLHM > due to some optimizations not being ported over. This change results in > faster reads and not creating garbage as a side-effect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10855) Use Caffeine (W-TinyLFU) for on-heap caches
[ https://issues.apache.org/jira/browse/CASSANDRA-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15642116#comment-15642116 ] Ben Manes commented on CASSANDRA-10855: --- The penalty is small. In CLHM (and Guava) we didn't have a system executor to exploit. The same approach of amortizing the maintenance work by buffering and replaying operations is used, instead of locking to perform it immediately. There is a slightly higher cost due to hashing for the {{CountMinSketch}}, but overall it's tiny. Using a direct executor should be fine. Please read the [HighScalability article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html] for an overview of the internals. > Use Caffeine (W-TinyLFU) for on-heap caches > --- > > Key: CASSANDRA-10855 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10855 > Project: Cassandra > Issue Type: Improvement >Reporter: Ben Manes > Labels: performance > Attachments: CASSANDRA-10855.patch, CASSANDRA-10855.patch > > > Cassandra currently uses > [ConcurrentLinkedHashMap|https://code.google.com/p/concurrentlinkedhashmap] > for performance critical caches (key, counter) and Guava's cache for > non-critical (auth, metrics, security). All of these usages have been > replaced by [Caffeine|https://github.com/ben-manes/caffeine], written by the > author of the previously mentioned libraries. > The primary incentive is to switch from LRU policy to W-TinyLFU, which > provides [near optimal|https://github.com/ben-manes/caffeine/wiki/Efficiency] > hit rates. It performs particularly well in database and search traces, is > scan resistant, and as adds a very small time/space overhead to LRU. > Secondarily, Guava's caches never obtained similar > [performance|https://github.com/ben-manes/caffeine/wiki/Benchmarks] to CLHM > due to some optimizations not being ported over. This change results in > faster reads and not creating garbage as a side-effect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10855) Use Caffeine (W-TinyLFU) for on-heap caches
[ https://issues.apache.org/jira/browse/CASSANDRA-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15642095#comment-15642095 ] Ben Manes commented on CASSANDRA-10855: --- {quote} I've added .executor(MoreExecutors.directExecutor()) - hope I got your suggestion right. {quote} This is good for unit tests to remove asynchronous behavior. My preference is to not use it in production, especially where latencies matter, by not penalizing callers with maintenance or removal notification work. Instead deferring that to FJP should help minimize response times, which I think would be your preference too. I'm not familiar enough with Cassandra's testing to know whether it's trivial to flag the executor. Usually it's pretty trivial, especially when DI like Guice is used. {quote} There are a couple of cache.asMap() calls. Would it be an option to eagerly create the AsMapView, Values and EntrySet instances in LocalAsyncLoadingCache to get around the ternaries in asMap(), values(), entrySet() and keySet? {quote} {{LocalAsyncLoadingCache}} isn't used by Cassandra (a cache that returns {{CompletableFuture}}). Given the ternaries are null checks to lazily create views, as is common in the Java Collections, I don't think it's a measurable penalty to keep. {quote} Do you have some micro-benchmarks in place to actually test against the previous implementation(s)? {quote} For concurrent throughput (JMH) see these [benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks]. They show a refinement over CLHM, with the primary benefit in writes. Since the cache now supports memoization, the Cassandra APIs might benefit from using a computation instead of racy _get-load-put_ calls. For hit rates see these [simulations|https://github.com/ben-manes/caffeine/wiki/Efficiency]. They show W-TinyLFU improves upon LRU by taking into account frequency. I'll send a PR to your branch when I get a chance to go through the rest of the comments. Thanks! > Use Caffeine (W-TinyLFU) for on-heap caches > --- > > Key: CASSANDRA-10855 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10855 > Project: Cassandra > Issue Type: Improvement >Reporter: Ben Manes > Labels: performance > Attachments: CASSANDRA-10855.patch, CASSANDRA-10855.patch > > > Cassandra currently uses > [ConcurrentLinkedHashMap|https://code.google.com/p/concurrentlinkedhashmap] > for performance critical caches (key, counter) and Guava's cache for > non-critical (auth, metrics, security). All of these usages have been > replaced by [Caffeine|https://github.com/ben-manes/caffeine], written by the > author of the previously mentioned libraries. > The primary incentive is to switch from LRU policy to W-TinyLFU, which > provides [near optimal|https://github.com/ben-manes/caffeine/wiki/Efficiency] > hit rates. It performs particularly well in database and search traces, is > scan resistant, and as adds a very small time/space overhead to LRU. > Secondarily, Guava's caches never obtained similar > [performance|https://github.com/ben-manes/caffeine/wiki/Benchmarks] to CLHM > due to some optimizations not being ported over. This change results in > faster reads and not creating garbage as a side-effect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
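A quick sketch of the memoization point above, contrasting the racy pattern with a single computation ({{loadChunk}} is a hypothetical loader for illustration, not a Cassandra API):
{code:java}
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

class MemoizationSketch {
  private final Cache<String, byte[]> cache = Caffeine.newBuilder()
      .maximumSize(1_000)
      .build();

  byte[] racyGetLoadPut(String key) {
    byte[] value = cache.getIfPresent(key);
    if (value == null) {
      value = loadChunk(key);   // several threads may load the same key concurrently
      cache.put(key, value);    // later writers clobber earlier ones
    }
    return value;
  }

  byte[] memoized(String key) {
    // The computation runs at most once per absent key; concurrent callers wait for it.
    return cache.get(key, this::loadChunk);
  }

  private byte[] loadChunk(String key) {
    return new byte[0]; // placeholder for the real load
  }
}
{code}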
[jira] [Commented] (CASSANDRA-10855) Use Caffeine (W-TinyLFU) for on-heap caches
[ https://issues.apache.org/jira/browse/CASSANDRA-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15621068#comment-15621068 ] Ben Manes commented on CASSANDRA-10855: --- Released 2.3.4 with JCTools fix. Please revisit. > Use Caffeine (W-TinyLFU) for on-heap caches > --- > > Key: CASSANDRA-10855 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10855 > Project: Cassandra > Issue Type: Improvement >Reporter: Ben Manes > Labels: performance > Attachments: CASSANDRA-10855.patch, CASSANDRA-10855.patch > > > Cassandra currently uses > [ConcurrentLinkedHashMap|https://code.google.com/p/concurrentlinkedhashmap] > for performance critical caches (key, counter) and Guava's cache for > non-critical (auth, metrics, security). All of these usages have been > replaced by [Caffeine|https://github.com/ben-manes/caffeine], written by the > author of the previously mentioned libraries. > The primary incentive is to switch from LRU policy to W-TinyLFU, which > provides [near optimal|https://github.com/ben-manes/caffeine/wiki/Efficiency] > hit rates. It performs particularly well in database and search traces, is > scan resistant, and as adds a very small time/space overhead to LRU. > Secondarily, Guava's caches never obtained similar > [performance|https://github.com/ben-manes/caffeine/wiki/Benchmarks] to CLHM > due to some optimizations not being ported over. This change results in > faster reads and not creating garbage as a side-effect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10855) Use Caffeine (W-TinyLFU) for on-heap caches
[ https://issues.apache.org/jira/browse/CASSANDRA-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602688#comment-15602688 ] Ben Manes commented on CASSANDRA-10855: --- The test failure might be due to delegating maintenance work (e.g. writes triggering an eviction) to an executor. CLHM and Guava amortized this on the calling threads, whereas Caffeine tries to hide it on ForkJoinPool to minimize user-facing latencies. By setting Caffeine.executor(Runnable::run) it will behave similarly to its predecessors, ideally set only in tests for predictability. Calling cache.cleanUp() prior to inspecting is another easy alternative. I don't want to delay this again, but would like to ensure Cassandra updates to the upcoming 2.3.4 release. There is a difficult-to-trigger [race|https://github.com/ben-manes/caffeine/issues/127] in JCTools that we fixed over the weekend. > Use Caffeine (W-TinyLFU) for on-heap caches > --- > > Key: CASSANDRA-10855 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10855 > Project: Cassandra > Issue Type: Improvement >Reporter: Ben Manes > Labels: performance > Attachments: CASSANDRA-10855.patch, CASSANDRA-10855.patch > > > Cassandra currently uses > [ConcurrentLinkedHashMap|https://code.google.com/p/concurrentlinkedhashmap] > for performance critical caches (key, counter) and Guava's cache for > non-critical (auth, metrics, security). All of these usages have been > replaced by [Caffeine|https://github.com/ben-manes/caffeine], written by the > author of the previously mentioned libraries. > The primary incentive is to switch from LRU policy to W-TinyLFU, which > provides [near optimal|https://github.com/ben-manes/caffeine/wiki/Efficiency] > hit rates. It performs particularly well in database and search traces, is > scan resistant, and as adds a very small time/space overhead to LRU. > Secondarily, Guava's caches never obtained similar > [performance|https://github.com/ben-manes/caffeine/wiki/Benchmarks] to CLHM > due to some optimizations not being ported over. This change results in > faster reads and not creating garbage as a side-effect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
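A sketch of that test-only setup, with the caveat from above that a direct executor is best confined to tests:
{code:java}
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

class DeterministicEvictionTest {
  public static void main(String[] args) {
    Cache<Integer, Integer> cache = Caffeine.newBuilder()
        .executor(Runnable::run)  // run maintenance on the calling thread for predictability
        .maximumSize(100)
        .build();

    for (int i = 0; i < 1_000; i++) {
      cache.put(i, i);
    }

    cache.cleanUp(); // flush any pending policy work before asserting on contents
    System.out.println("size after cleanUp: " + cache.estimatedSize()); // at most 100
  }
}
{code}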
[jira] [Commented] (CASSANDRA-10855) Use Caffeine (W-TinyLFU) for on-heap caches
[ https://issues.apache.org/jira/browse/CASSANDRA-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15569354#comment-15569354 ] Ben Manes commented on CASSANDRA-10855: --- Thanks. Fixed by updating the build.xml and removing the old jar. That explodes the patch due to the jars checked into the repo. The linked PR mirrors it. I'd also like to pair on evaluating TinyLFU for OHC, but would like to see this go in before we jump into that. > Use Caffeine (W-TinyLFU) for on-heap caches > --- > > Key: CASSANDRA-10855 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10855 > Project: Cassandra > Issue Type: Improvement >Reporter: Ben Manes > Labels: performance > Attachments: CASSANDRA-10855.patch, CASSANDRA-10855.patch > > > Cassandra currently uses > [ConcurrentLinkedHashMap|https://code.google.com/p/concurrentlinkedhashmap] > for performance critical caches (key, counter) and Guava's cache for > non-critical (auth, metrics, security). All of these usages have been > replaced by [Caffeine|https://github.com/ben-manes/caffeine], written by the > author of the previously mentioned libraries. > The primary incentive is to switch from LRU policy to W-TinyLFU, which > provides [near optimal|https://github.com/ben-manes/caffeine/wiki/Efficiency] > hit rates. It performs particularly well in database and search traces, is > scan resistant, and as adds a very small time/space overhead to LRU. > Secondarily, Guava's caches never obtained similar > [performance|https://github.com/ben-manes/caffeine/wiki/Benchmarks] to CLHM > due to some optimizations not being ported over. This change results in > faster reads and not creating garbage as a side-effect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10855) Use Caffeine (W-TinyLFU) for on-heap caches
[ https://issues.apache.org/jira/browse/CASSANDRA-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Manes updated CASSANDRA-10855: -- Attachment: CASSANDRA-10855.patch > Use Caffeine (W-TinyLFU) for on-heap caches > --- > > Key: CASSANDRA-10855 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10855 > Project: Cassandra > Issue Type: Improvement >Reporter: Ben Manes > Labels: performance > Attachments: CASSANDRA-10855.patch, CASSANDRA-10855.patch > > > Cassandra currently uses > [ConcurrentLinkedHashMap|https://code.google.com/p/concurrentlinkedhashmap] > for performance critical caches (key, counter) and Guava's cache for > non-critical (auth, metrics, security). All of these usages have been > replaced by [Caffeine|https://github.com/ben-manes/caffeine], written by the > author of the previously mentioned libraries. > The primary incentive is to switch from LRU policy to W-TinyLFU, which > provides [near optimal|https://github.com/ben-manes/caffeine/wiki/Efficiency] > hit rates. It performs particularly well in database and search traces, is > scan resistant, and as adds a very small time/space overhead to LRU. > Secondarily, Guava's caches never obtained similar > [performance|https://github.com/ben-manes/caffeine/wiki/Benchmarks] to CLHM > due to some optimizations not being ported over. This change results in > faster reads and not creating garbage as a side-effect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10855) Use Caffeine (W-TinyLFU) for on-heap caches
[ https://issues.apache.org/jira/browse/CASSANDRA-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15568181#comment-15568181 ] Ben Manes commented on CASSANDRA-10855: --- I rebased and updated the jar in the PR. It's the same as our previous discussion. The upgrade consists of maintenance improvements over the 2.2.6 version currently in use. > Use Caffeine (W-TinyLFU) for on-heap caches > --- > > Key: CASSANDRA-10855 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10855 > Project: Cassandra > Issue Type: Improvement >Reporter: Ben Manes > Labels: performance > Attachments: CASSANDRA-10855.patch > > > Cassandra currently uses > [ConcurrentLinkedHashMap|https://code.google.com/p/concurrentlinkedhashmap] > for performance critical caches (key, counter) and Guava's cache for > non-critical (auth, metrics, security). All of these usages have been > replaced by [Caffeine|https://github.com/ben-manes/caffeine], written by the > author of the previously mentioned libraries. > The primary incentive is to switch from LRU policy to W-TinyLFU, which > provides [near optimal|https://github.com/ben-manes/caffeine/wiki/Efficiency] > hit rates. It performs particularly well in database and search traces, is > scan resistant, and as adds a very small time/space overhead to LRU. > Secondarily, Guava's caches never obtained similar > [performance|https://github.com/ben-manes/caffeine/wiki/Benchmarks] to CLHM > due to some optimizations not being ported over. This change results in > faster reads and not creating garbage as a side-effect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10855) Use Caffeine (W-TinyLFU) for on-heap caches
[ https://issues.apache.org/jira/browse/CASSANDRA-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Manes updated CASSANDRA-10855: -- Attachment: CASSANDRA-10855.patch > Use Caffeine (W-TinyLFU) for on-heap caches > --- > > Key: CASSANDRA-10855 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10855 > Project: Cassandra > Issue Type: Improvement >Reporter: Ben Manes > Labels: performance > Attachments: CASSANDRA-10855.patch > > > Cassandra currently uses > [ConcurrentLinkedHashMap|https://code.google.com/p/concurrentlinkedhashmap] > for performance critical caches (key, counter) and Guava's cache for > non-critical (auth, metrics, security). All of these usages have been > replaced by [Caffeine|https://github.com/ben-manes/caffeine], written by the > author of the previously mentioned libraries. > The primary incentive is to switch from LRU policy to W-TinyLFU, which > provides [near optimal|https://github.com/ben-manes/caffeine/wiki/Efficiency] > hit rates. It performs particularly well in database and search traces, is > scan resistant, and as adds a very small time/space overhead to LRU. > Secondarily, Guava's caches never obtained similar > [performance|https://github.com/ben-manes/caffeine/wiki/Benchmarks] to CLHM > due to some optimizations not being ported over. This change results in > faster reads and not creating garbage as a side-effect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10855) Use Caffeine (W-TinyLFU) for on-heap caches
[ https://issues.apache.org/jira/browse/CASSANDRA-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Manes updated CASSANDRA-10855: -- Status: Patch Available (was: Open) > Use Caffeine (W-TinyLFU) for on-heap caches > --- > > Key: CASSANDRA-10855 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10855 > Project: Cassandra > Issue Type: Improvement >Reporter: Ben Manes > Labels: performance > > Cassandra currently uses > [ConcurrentLinkedHashMap|https://code.google.com/p/concurrentlinkedhashmap] > for performance critical caches (key, counter) and Guava's cache for > non-critical (auth, metrics, security). All of these usages have been > replaced by [Caffeine|https://github.com/ben-manes/caffeine], written by the > author of the previously mentioned libraries. > The primary incentive is to switch from LRU policy to W-TinyLFU, which > provides [near optimal|https://github.com/ben-manes/caffeine/wiki/Efficiency] > hit rates. It performs particularly well in database and search traces, is > scan resistant, and as adds a very small time/space overhead to LRU. > Secondarily, Guava's caches never obtained similar > [performance|https://github.com/ben-manes/caffeine/wiki/Benchmarks] to CLHM > due to some optimizations not being ported over. This change results in > faster reads and not creating garbage as a side-effect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10855) Use Caffeine (W-TinyLFU) for on-heap caches
[ https://issues.apache.org/jira/browse/CASSANDRA-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15381078#comment-15381078 ] Ben Manes commented on CASSANDRA-10855: --- Any remaining blockers? > Use Caffeine (W-TinyLFU) for on-heap caches > --- > > Key: CASSANDRA-10855 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10855 > Project: Cassandra > Issue Type: Improvement >Reporter: Ben Manes > Labels: performance > > Cassandra currently uses > [ConcurrentLinkedHashMap|https://code.google.com/p/concurrentlinkedhashmap] > for performance critical caches (key, counter) and Guava's cache for > non-critical (auth, metrics, security). All of these usages have been > replaced by [Caffeine|https://github.com/ben-manes/caffeine], written by the > author of the previously mentioned libraries. > The primary incentive is to switch from LRU policy to W-TinyLFU, which > provides [near optimal|https://github.com/ben-manes/caffeine/wiki/Efficiency] > hit rates. It performs particularly well in database and search traces, is > scan resistant, and as adds a very small time/space overhead to LRU. > Secondarily, Guava's caches never obtained similar > [performance|https://github.com/ben-manes/caffeine/wiki/Benchmarks] to CLHM > due to some optimizations not being ported over. This change results in > faster reads and not creating garbage as a side-effect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10855) Use Caffeine (W-TinyLFU) for on-heap caches
[ https://issues.apache.org/jira/browse/CASSANDRA-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275473#comment-15275473 ] Ben Manes commented on CASSANDRA-10855: --- [~blambov] [integrated | https://github.com/apache/cassandra/commit/30bb255ec9fb36ace2aab51474bd3bfb9bbd3bed] Caffeine (v2.2.6) for the chunk cache ([CASSANDRA-5863 | https://issues.apache.org/jira/browse/CASSANDRA-5863]). He included an analysis demonstrating good performance and hit rates. Thanks! Note that it was in version 2.2.7 that, thanks to [~blambov] and [~benedict], we added strong protection against HashDOS attacks. I am currently exploring an adaptive version of the policy that improves its hit rate for small, recency-skewed caches. This would also naturally resolve the attack without needing our protection scheme. The chunk cache uses a same-thread executor to delegate the maintenance work to. While there might be a performance gain from using ForkJoinPool#commonPool (the default), this was also a very wise choice. [Druid | http://druid.io/] was recently struck by [JDK-8078490 | https://bugs.openjdk.java.net/browse/JDK-8078490] where a race in 8u40 - 8u60 causes the pool to not execute the task. With [~blambov]'s work proving the benefits, can we move this forward for the remaining caches? Afterwards I'd like to work with [~snazy] on adding the policy to OHC to improve Cassandra's hit rates there too. > Use Caffeine (W-TinyLFU) for on-heap caches > --- > > Key: CASSANDRA-10855 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10855 > Project: Cassandra > Issue Type: Improvement >Reporter: Ben Manes > Labels: performance > > Cassandra currently uses > [ConcurrentLinkedHashMap|https://code.google.com/p/concurrentlinkedhashmap] > for performance critical caches (key, counter) and Guava's cache for > non-critical (auth, metrics, security). All of these usages have been > replaced by [Caffeine|https://github.com/ben-manes/caffeine], written by the > author of the previously mentioned libraries. > The primary incentive is to switch from LRU policy to W-TinyLFU, which > provides [near optimal|https://github.com/ben-manes/caffeine/wiki/Efficiency] > hit rates. It performs particularly well in database and search traces, is > scan resistant, and as adds a very small time/space overhead to LRU. > Secondarily, Guava's caches never obtained similar > [performance|https://github.com/ben-manes/caffeine/wiki/Benchmarks] to CLHM > due to some optimizations not being ported over. This change results in > faster reads and not creating garbage as a side-effect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache
[ https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261535#comment-15261535 ] Ben Manes commented on CASSANDRA-11452: --- Branimir, I assume you're referring to preferring the candidate on equality. It is probably my fault that Roy left it out, as I likely forgot to emphasize your observation. It does have a negative impact on the LIRS traces, such as halving the hit rate of glimpse (analytical) from 34% => 16%. Benedict, since I'm hesitant to start down the path of direct hash table access, it seems like a natural solution for OHC. There is always going to be a limit where being on-heap makes no sense, but it has been a nice place to explore algorithms. OHC uses a custom hash table, I believe because using CLHM with off-heap values had too much GC overhead in Cassandra's very large caches. I think the biggest win will come from leveraging what we've learned to improve OHC and the custom non-concurrent cache for Cassandra's thread-per-core redesign. Does anyone know what our next steps are for moving CASSANDRA-10855 forward? > Cache implementation using LIRS eviction for in-process page cache > -- > > Key: CASSANDRA-11452 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11452 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Branimir Lambov >Assignee: Branimir Lambov > > Following up from CASSANDRA-5863, to make best use of caching and to avoid > having to explicitly mark compaction accesses as non-cacheable, we need a > cache implementation that uses an eviction algorithm that can better handle > non-recurring accesses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache
[ https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15255723#comment-15255723 ] Ben Manes commented on CASSANDRA-11452: --- Chatted with Roy and below is his analysis. I tried simulating his gratuitous access idea, which had a positive impact on multi3, a negative impact on DS1 (database), and negligible impact on all other traces. It seems he dislikes the probability approach in the code above, while I dislike the additional space of (6). Longer term, I'd like to devise a way to adaptively size the admission window. I think if done right the strategy could help alleviate or resolve the attack. My current idea is to use a bloom filter that is reset with a small sample period, and captures the candidates that were rejected by TinyLFU. If multiple rejections for a key are observed in a sample, then it is admitted and the window size is increased. If not observed in a larger sample (e.g. 3x) then the window is decreased. This would follow some ARC-like equation with a min/max for the window. The admission of the candidate would resolve this attack, which would only result in making the window oversized for a workload. Here is my summary. Assumptions: 1) An oblivious adversary that does not see what all items in the cache are and does not control which items get in the cache; its only way of impacting what's in the cache is by issuing requests for items by itself. 2) Caffeine uses the hash code of an object as its identifier for the frequency histogram management, so if two items have the same hash code, they will access the same locations in CMS/CBF since the respective hash functions will be computed on the objects' hash code. Attack: The attacker can create two items O1 and O2 whose hash code is the same. It issues frequent requests for both objects until both are admitted to the cache. Once this happens, the attacker stops issuing requests for O2 but continues generating enough requests for O1 such that it remains one of the most frequent items. As O2 is no longer being requested, it gets demoted to the bottom of the LRU/SLRU eviction mechanism, so from some point on, it is always the chosen cache victim. However, due to the hash code collision and the high frequency count of O1, TinyLFU will never evict O2 and at that point the cache is completely nullified. Notice that the attacker needs to construct two objects whose hash code will collide, or at least whose CMS hash values will all collide. In the other direction, it is not enough to find two values whose CMS hash values collide – one should create objects whose hash code will match these. Possible solutions: 1. A domain-specific solution: Use another identifier, e.g., in a YouTube cache, one can use the video identifier, which Google ensures is unique. 2. Java enhancement solution: One of the main problems with Java hash codes is that they are only 32 bits long. It may be possible to enhance it to 64 bits, but this is not realistic to expect. 3. Jitter: Instead of always replacing the least recently used item, pick one of the lowest k. The problem with this solution is that in order not to hurt performance, k should be kept small. However, in this case, the attacker can nullify 1/k of the cache with no additional effort, or the entire cache with k times the effort. 4. Random eviction: As the name suggests, use random eviction instead of LRU. The problem is that in some workloads, this would hurt performance. 5.
Circular eviction: Each time, evict the next position in the LRU queue rather than the last one. Has the same impact as random eviction, but is faster to implement; the drawback is the same as with random eviction. 6. Ignore the filter on cache collisions: If there is a hash-code collision on the cache victim, we ignore TinyLFU. The impact on performance is probably marginal since the chance of an unintentional collision is small, and it would definitely not show up in the traces. But how do we detect this? Do we maintain a non-counting TinyTable for this, storing the entire hash code? That adds space. On the other hand, the cache probably maintains a map of all items in any case – so we can build on this, in which case it simply costs another lookup. I would not worry about false positives here since in a naïve trace the chances are very small, and the chances that both are heavy-hitters are negligible, so in the worst case we will suffer one extra miss on the entire trace. 7. Count limit: If TinyLFU rejects replacement for k consecutive attempts, ignore TinyLFU. But what should k be? Also, it incurs a small performance impact. 8. Ignoring TinyLFU with some small probability: Same as above. Also, the attacker could repeat his trick each time his item gets evicted, hurting performance. 9. Gratuitous access: If the cache victim wins too many times in the LFU filter test, we make a gratuitous access to this
[jira] [Commented] (CASSANDRA-10855) Use Caffeine (W-TinyLFU) for on-heap caches
[ https://issues.apache.org/jira/browse/CASSANDRA-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15244885#comment-15244885 ] Ben Manes commented on CASSANDRA-10855: --- We added protection against a HashDoS attack, thanks for Branimir's and Benedict's help. I upgraded the pull request to use this version. I think we're good to merge. > Use Caffeine (W-TinyLFU) for on-heap caches > --- > > Key: CASSANDRA-10855 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10855 > Project: Cassandra > Issue Type: Improvement >Reporter: Ben Manes > Labels: performance > > Cassandra currently uses > [ConcurrentLinkedHashMap|https://code.google.com/p/concurrentlinkedhashmap] > for performance critical caches (key, counter) and Guava's cache for > non-critical (auth, metrics, security). All of these usages have been > replaced by [Caffeine|https://github.com/ben-manes/caffeine], written by the > author of the previously mentioned libraries. > The primary incentive is to switch from LRU policy to W-TinyLFU, which > provides [near optimal|https://github.com/ben-manes/caffeine/wiki/Efficiency] > hit rates. It performs particularly well in database and search traces, is > scan resistant, and as adds a very small time/space overhead to LRU. > Secondarily, Guava's caches never obtained similar > [performance|https://github.com/ben-manes/caffeine/wiki/Benchmarks] to CLHM > due to some optimizations not being ported over. This change results in > faster reads and not creating garbage as a side-effect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache
[ https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15244463#comment-15244463 ] Ben Manes commented on CASSANDRA-11452: --- I checked in our fix and I'll release after we hear back from Roy. I'll forward that along here and might revisit the fix based on his feedback. When the release is out then I'll update my pull request and the notify everyone on the task so we can look at moving it forward. Thanks a lot for all your help on this =) > Cache implementation using LIRS eviction for in-process page cache > -- > > Key: CASSANDRA-11452 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11452 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Branimir Lambov >Assignee: Branimir Lambov > > Following up from CASSANDRA-5863, to make best use of caching and to avoid > having to explicitly marking compaction accesses as non-cacheable, we need a > cache implementation that uses an eviction algorithm that can better handle > non-recurring accesses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache
[ https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15244406#comment-15244406 ] Ben Manes commented on CASSANDRA-11452: --- It definitely would be nice to be able to reduce the per-entry cost, have access to the hash, and avoid lambdas by inlining into the code. I kept hoping Doug would take a stab at it and see what ideas he'd use. bq. It's a shame we don't have access to the CHM to do the sampling, as that would make it robust to scans since all the members of the LRU would have high frequencies. Sadly Ehcache3 did this for their randomly sampled LRU. CHM has a weak hashing function due to degrading into red-black trees, but they degraded it further for speed. This results in -20% hit rate over LRU by taking an MRU-heavy sample, surprisingly even in large caches. They also made it very slow, taking minutes instead of a few seconds. I'm now very wary of that idea because it can be done so horribly if naively handled. I think for now I'm most comfortable using the following. I think it's robust enough, low cost, and should be hard to exploit (especially for an external actor). If we discover it is not strong enough, we have a plethora of options now. :-)
{code:java}
boolean admit(K candidateKey, K victimKey) {
  int victimFreq = frequencySketch().frequency(victimKey);
  int candidateFreq = frequencySketch().frequency(candidateKey);
  if (candidateFreq > victimFreq) {
    return true;
  } else if (candidateFreq <= 5) {
    // Cold candidates are always rejected; only warm candidates may be randomly admitted
    return false;
  }
  // Admit with probability 1/128 (~0.8%) to add jitter against collision attacks
  int random = ThreadLocalRandom.current().nextInt();
  return ((random & 127) == 0);
}
{code}
> Cache implementation using LIRS eviction for in-process page cache > -- > > Key: CASSANDRA-11452 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11452 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Branimir Lambov >Assignee: Branimir Lambov > > Following up from CASSANDRA-5863, to make best use of caching and to avoid > having to explicitly marking compaction accesses as non-cacheable, we need a > cache implementation that uses an eviction algorithm that can better handle > non-recurring accesses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache
[ https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15244357#comment-15244357 ] Ben Manes edited comment on CASSANDRA-11452 at 4/16/16 7:46 PM: CLHM was always a decorator, but in 1.4 it embedded the CHMv8 backport. We did that to help improve performance for very large caches, like Cassandra's were, since JDK8 took a long time. That's probably what your remembering. I agree that reducing per-entry overhead is attractive, though a [rough calculation|https://github.com/ben-manes/caffeine/wiki/Memory-overhead] indicates it isn't a huge savings. My view is that it is a premature optimization and best left to the end after the implementation has matured, to re-evaluate if the impact is worth attempting a direct rewrite. Otherwise it adds greatly to the complexity budget from the get go and leading to less time focused on the unique problems of the domain (API, features, efficiency). For example there is more space savings by using TinyLFU over LIRS's ghost entries, but evaluating took effort that I might have been to overwhelmed to expend. It would also be interesting to see if pairing with [Apache Mnemonic|https://github.com/apache/incubator-mnemonic] could reduce the GC overhead by having off-heap without the serialization penalty. bq. Just to clarify those numbers are for small workloads? Yep. bq. ...it would still leave the gate open for an attacker to reduce the efficacy of the cache for items that have only moderate reuse likelihood. Since the frequency is reduced by half every sample period, my assumption was that this attack would be very difficult. Gil's response was to instead detect if TinyLFU had a large number of consecutive rejections, e.g. 80 (assuming 1:20 is admitted on average). That worked quite well, except on ARC's database trace (ds1) which had a negative impact. It makes sense that scans (db, analytics) will have a high rejection rate. What do you think about combining the approach, e.g. {{(candidateFreq <= 3) || (++unadmittedItems < 80)}}, as a guard prior to performing a 1% random admittance? was (Author: ben.manes): CLHM was always a decorator, but in 1.4 it embedded the CHMv8 backport. We did that to help improve performance for very large caches, like Cassandra's were, since JDK8 took a long time. That's probably what your remembering. I agree that reducing per-entry overhead is attractive, though a [rough calculation|https://github.com/ben-manes/caffeine/wiki/Memory-overhead] indicates it isn't a huge savings. My view is that it is a premature optimization and best left to the end after the implementation has matured, to re-evaluate if the impact is worth attempting a direct rewrite. Otherwise it adds greatly to the complexity budget from the get go and leading to less time focused on the unique problems of the domain (API, features, efficiency). For example there is more space savings by using TinyLFU over LIRS's ghost entries, but evaluating took effort that I might have been to overwhelmed to expend. It would also be interesting to see if pairing with [Apache Mnemonic|https://github.com/apache/incubator-mnemonic] could reduce the GC overhead by having off-heap without the serialization penalty. bq. Just to clarify those numbers are for small workloads? Yep. bq ...it would still leave the gate open for an attacker to reduce the efficacy of the cache for items that have only moderate reuse likelihood. 
Since the frequency is reduced by half every sample period, my assumption was that this attack would be very difficult. Gil's response was to instead detect if TinyLFU had a large number of consecutive rejections, e.g. 80 (assuming 1:20 is admitted on average). That worked quite well, except on ARC's database trace (ds1) which had a negative impact. It makes sense that scans (db, analytics) will have a high rejection rate. What do you think about combining the approach, e.g. {{(candidateFreq <= 3) || (++unadmittedItems < 80)}}, as a guard prior to performing a 1% random admittance? > Cache implementation using LIRS eviction for in-process page cache > -- > > Key: CASSANDRA-11452 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11452 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Branimir Lambov >Assignee: Branimir Lambov > > Following up from CASSANDRA-5863, to make best use of caching and to avoid > having to explicitly marking compaction accesses as non-cacheable, we need a > cache implementation that uses an eviction algorithm that can better handle > non-recurring accesses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
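A sketch of how the combined guard floated above might look, layered in front of the random admittance; {{unadmittedItems}} and the thresholds (3, 80, 1%) are the values proposed in the comment, not shipped code, and resetting the counter on admission is an assumption about what "consecutive" means here.
{code:java}
import java.util.concurrent.ThreadLocalRandom;

final class CombinedGuard {
  private int unadmittedItems;  // consecutive TinyLFU rejections

  boolean admit(int candidateFreq, int victimFreq) {
    if (candidateFreq > victimFreq) {
      unadmittedItems = 0;
      return true;
    }
    // Only consider bypassing TinyLFU for warm candidates after a long run of rejections
    if ((candidateFreq <= 3) || (++unadmittedItems < 80)) {
      return false;
    }
    unadmittedItems = 0;
    return ThreadLocalRandom.current().nextInt(100) == 0;  // ~1% random admittance
  }
}
{code}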
[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache
[ https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15244357#comment-15244357 ] Ben Manes commented on CASSANDRA-11452: --- CLHM was always a decorator, but in 1.4 it embedded the CHMv8 backport. We did that to help improve performance for very large caches, like Cassandra's were, since JDK8 took a long time. That's probably what you're remembering. I agree that reducing per-entry overhead is attractive, though a [rough calculation|https://github.com/ben-manes/caffeine/wiki/Memory-overhead] indicates it isn't a huge savings. My view is that it is a premature optimization and best left to the end, after the implementation has matured, to re-evaluate if the impact is worth attempting a direct rewrite. Otherwise it adds greatly to the complexity budget from the get-go, leading to less time focused on the unique problems of the domain (API, features, efficiency). For example there is more space savings by using TinyLFU over LIRS's ghost entries, but evaluating that took effort that I might have been too overwhelmed to expend. It would also be interesting to see if pairing with [Apache Mnemonic|https://github.com/apache/incubator-mnemonic] could reduce the GC overhead by having off-heap without the serialization penalty. bq. Just to clarify those numbers are for small workloads? Yep. bq. ...it would still leave the gate open for an attacker to reduce the efficacy of the cache for items that have only moderate reuse likelihood. Since the frequency is reduced by half every sample period, my assumption was that this attack would be very difficult. Gil's response was to instead detect if TinyLFU had a large number of consecutive rejections, e.g. 80 (assuming 1:20 is admitted on average). That worked quite well, except on ARC's database trace (ds1) which had a negative impact. It makes sense that scans (db, analytics) will have a high rejection rate. What do you think about combining the approach, e.g. {{(candidateFreq <= 3) || (++unadmittedItems < 80)}}, as a guard prior to performing a 1% random admittance? > Cache implementation using LIRS eviction for in-process page cache > -- > > Key: CASSANDRA-11452 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11452 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Branimir Lambov >Assignee: Branimir Lambov > > Following up from CASSANDRA-5863, to make best use of caching and to avoid > having to explicitly marking compaction accesses as non-cacheable, we need a > cache implementation that uses an eviction algorithm that can better handle > non-recurring accesses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache
[ https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243989#comment-15243989 ] Ben Manes commented on CASSANDRA-11452: --- The hash table trick isn't applicable since I didn't fork it for Caffeine or CLHM. I was opposed to Guava's decision to do that, other than for computation, as I feel the trade-off is sharply negative. The random walk has a mixed effect in small traces (512 entries). For most it's equivalent, for multi3 it's better, and negative otherwise. I think multi3 is better only because it's a mixed workload that TinyLFU struggles on (in comparison to LIRS). For larger workloads (database, search, oltp) it's equivalent, as we'd expect. (multi1: -4%, multi3: +3%, gli: -2.5%, cs: -2%) A 1% random admittance can have a similar 1-2% reduction, but that goes away at a lower rate like 0.4% (1/255). That also passes the collision test since it causes some jitter. It may not be enough in a more adversarial test. Branimir had noticed earlier that using _greater or equal to_ was a solution, but as noted it had a negative impact. However we care mostly about hot candidates being rejected by an artificially hot victim. Most candidates are very cold so the filter avoids polluting the cache. If we constrain the randomization so it only applies to warm candidates then we pass the test and don't see a degradation. I used a constraint of greater than 5, where the maximum frequency is 15 (4-bit counters). I lean towards large traces being more realistic and meaningful, so I am not overly worried either way. But I would like to keep the small traces in good standing as they are easy comparisons for understandable patterns. What are your thoughts on applying the randomness only when the candidate has at least a moderate frequency? > Cache implementation using LIRS eviction for in-process page cache > -- > > Key: CASSANDRA-11452 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11452 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Branimir Lambov >Assignee: Branimir Lambov > > Following up from CASSANDRA-5863, to make best use of caching and to avoid > having to explicitly marking compaction accesses as non-cacheable, we need a > cache implementation that uses an eviction algorithm that can better handle > non-recurring accesses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache
[ https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243798#comment-15243798 ] Ben Manes commented on CASSANDRA-11452: --- I was able to sneak in a little [coding|https://github.com/ben-manes/caffeine/tree/collisions] during my morning commute and a much less hectic Friday. The random walk nicely passes Branimir's test, but I have a few eviction tests that still need fixing due to the non-deterministic behavior. I'll try to work on that this weekend. Gil suggested ignoring TinyLFU for at a small probability, like 1%, to admit the candidate. This might have the benefit that an attacker can't use the maximum walking distance as the threshold of if they can break the protection. It also keeps the admission and eviction decoupled, e.g. making it easier to add the filter on top of {{LinkedHashMap}}. I could also see there being a benefit of using multiple strategies in tandem. Roy plans on analyzing the problem, proposed solutions, and detailing his recommendations. I think this will be a good topic for him during his long international flight tomorrow. I'll share his thoughts. > Cache implementation using LIRS eviction for in-process page cache > -- > > Key: CASSANDRA-11452 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11452 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Branimir Lambov >Assignee: Branimir Lambov > > Following up from CASSANDRA-5863, to make best use of caching and to avoid > having to explicitly marking compaction accesses as non-cacheable, we need a > cache implementation that uses an eviction algorithm that can better handle > non-recurring accesses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
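For illustration, the bare form of the 1% suggestion is just a coin flip in front of the admission decision; the method shape below is invented for this sketch, the 1% figure is the one mentioned above, and the variant that adds a warm-candidate guard appears elsewhere in this thread.
{code:java}
import java.util.concurrent.ThreadLocalRandom;

// Sketch of the plain probabilistic bypass: ignore TinyLFU's verdict ~1% of the time
boolean admitWithRandomBypass(int candidateFreq, int victimFreq) {
  if (candidateFreq > victimFreq) {
    return true;
  }
  return ThreadLocalRandom.current().nextInt(100) == 0;  // ~1% unconditional admit
}
{code}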
[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache
[ https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242647#comment-15242647 ] Ben Manes commented on CASSANDRA-11452: --- Oh sheesh, of course. I'm sorry that I'm so dense this week. Thanks for the patience as I fumble through this issue. It didn't click that we evaluate by the random walk, but evict the victim. This is perfect. > Cache implementation using LIRS eviction for in-process page cache > -- > > Key: CASSANDRA-11452 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11452 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Branimir Lambov >Assignee: Branimir Lambov > > Following up from CASSANDRA-5863, to make best use of caching and to avoid > having to explicitly marking compaction accesses as non-cacheable, we need a > cache implementation that uses an eviction algorithm that can better handle > non-recurring accesses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache
[ https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242600#comment-15242600 ] Ben Manes commented on CASSANDRA-11452: --- I chatted with Roy about this problem and he's going to ponder it as well. I thought about this and toyed with the code a little.
- You're right that my hash detection is very weak, and probably poor enough not to bother with.
- I do wish Java would add {{longHashCode}} to Object, finally. It's been a good improvement for other platforms.
- Your random walk requires that the attacker can exploit the full walking distance (the last 16 potential victims). The stepwise walk means it could exploit fewer (e.g. 8) to have an effect, but it would have to restart when the cache jitters out. If quantified it might prove the attack impractical.
- Moving the victim to the MRU position after an admission is rejected works at a small cost to the hit rate. This could be reduced by adding a smaller segment (e.g. bottom N items) so that we don't degrade the recency effects.
- A sampled bloom filter of past rejections, used to bypass TinyLFU when the candidate is present in it, would probably work well. I think it could also be used to adapt the window size or improve the hit rate by reducing mispredictions. That would require a lot more work to evaluate, so it is impractical for the immediate term.
My plan is to iterate on your random walk. I still think that the victim should be chosen by the random walk and I haven't grokked your reason not to. I'm tired so maybe it will seem obvious later. The long week of production issues is making me feel pretty slow on this one... > Cache implementation using LIRS eviction for in-process page cache > -- > > Key: CASSANDRA-11452 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11452 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Branimir Lambov >Assignee: Branimir Lambov > > Following up from CASSANDRA-5863, to make best use of caching and to avoid > having to explicitly marking compaction accesses as non-cacheable, we need a > cache implementation that uses an eviction algorithm that can better handle > non-recurring accesses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache
[ https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242133#comment-15242133 ] Ben Manes commented on CASSANDRA-11452: ---
{quote}I think it's better for the jitter to not affect the victim, since if there is a collision that doesn't get flushed out that would permit the cache efficiency to remain degraded indefinitely{quote}
I'd expect the collision would be flushed out by the eviction when we detect that the victim's and candidate's hash codes are equal. To me the victim means the item that the eviction policy selected, so the jittered LRU is selecting the guard. It might also be simpler code, as that method is already long from handling the various edge cases.
{quote}I don't recall that suggestion, and don't see a corresponding change in the codebase; remind me?{quote}
Sorry, this is existing code in the sketch, as suggested by Thomas Mueller (H2). That was to protect against hash collision attacks exploiting the hash function. I know this is a bit weak since Java originally tried that and switched to red-black tree bins instead. It provides a little unpredictability on the sketch, which might be a good thing.
{quote}There's a wealth of possible avenues to explore.{quote}
I'm really interested to see what other avenues people take to exploit sketches in a cache policy. The two citations of the original paper were dismissive. I think the revision has more weight due to the comparative analysis. There seem to be a lot of optimization tricks to explore. Unfortunately good traces are also hard to find. > Cache implementation using LIRS eviction for in-process page cache > -- > > Key: CASSANDRA-11452 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11452 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Branimir Lambov >Assignee: Branimir Lambov > > Following up from CASSANDRA-5863, to make best use of caching and to avoid > having to explicitly marking compaction accesses as non-cacheable, we need a > cache implementation that uses an eviction algorithm that can better handle > non-recurring accesses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
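To illustrate the random-seed protection discussed above: mixing a per-instance seed into the slot computation means an attacker cannot precompute which counters two keys share. This is a one-row toy, not Caffeine's actual {{FrequencySketch}}; the table size and spread function are arbitrary choices for the sketch.
{code:java}
import java.util.concurrent.ThreadLocalRandom;

// Sketch only: a seeded, single-row frequency counter to show the idea
final class SeededSketch {
  private final int[] table = new int[1 << 16];
  private final int seed = ThreadLocalRandom.current().nextInt() | 1;  // odd multiplier

  void increment(Object key) {
    table[index(key)]++;
  }

  int frequency(Object key) {
    return table[index(key)];
  }

  private int index(Object key) {
    int h = key.hashCode() * seed;  // the seed perturbs the slot choice per instance
    h ^= (h >>> 16);                // cheap avalanche of the high bits
    return h & (table.length - 1);
  }
}
{code}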
[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache
[ https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241872#comment-15241872 ] Ben Manes commented on CASSANDRA-11452: --- Thanks. I won't have the bandwidth to test this until the evening. Roy flew into SF for a conference (from Israel) so we're going to meet. If you have any questions for me to discuss with him I'll proxy. A quick glance shows your trick has a nice distribution. A 1M-iteration run into a multiset showed [0 x 750485, 1 x 186958, 2 x 46910, 3 x 11731, 4 x 2901, 5 x 776, 6 x 171, 7 x 49, 8 x 15, 9 x 3, 11]. I'd probably apply the jitter as the selection of the victim near the top of the loop and add a check to handle zero-weight entries. I'll take care of that part. It seems like we'd need both your jitter and the hash check added in the prior commit. It does sound like the combination would be an effective guard against this type of attack. Do you think the random seed used by the sketch is still a good addition? > Cache implementation using LIRS eviction for in-process page cache > -- > > Key: CASSANDRA-11452 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11452 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Branimir Lambov >Assignee: Branimir Lambov > > Following up from CASSANDRA-5863, to make best use of caching and to avoid > having to explicitly marking compaction accesses as non-cacheable, we need a > cache implementation that uses an eviction algorithm that can better handle > non-recurring accesses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache
[ https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241524#comment-15241524 ] Ben Manes commented on CASSANDRA-11452: --- Sorry if I'm being a bit obtuse. If you write a short snippet I can try applying that approach. > Cache implementation using LIRS eviction for in-process page cache > -- > > Key: CASSANDRA-11452 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11452 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Branimir Lambov >Assignee: Branimir Lambov > > Following up from CASSANDRA-5863, to make best use of caching and to avoid > having to explicitly marking compaction accesses as non-cacheable, we need a > cache implementation that uses an eviction algorithm that can better handle > non-recurring accesses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache
[ https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241521#comment-15241521 ] Ben Manes commented on CASSANDRA-11452: --- I assumed that it would be acceptable to reduce the penalty when a clash was detected. The current version ejects the victim so that the candidates flow through the probation space. I think that should be similar to your >= approach, without reducing the hit rate in the small traces. Can you review the [patch|https://github.com/ben-manes/caffeine/commit/22ce6339ec91fd7eadfb462fcb176aac69aeb47f]? > Cache implementation using LIRS eviction for in-process page cache > -- > > Key: CASSANDRA-11452 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11452 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Branimir Lambov >Assignee: Branimir Lambov > > Following up from CASSANDRA-5863, to make best use of caching and to avoid > having to explicitly marking compaction accesses as non-cacheable, we need a > cache implementation that uses an eviction algorithm that can better handle > non-recurring accesses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache
[ https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241420#comment-15241420 ] Ben Manes commented on CASSANDRA-11452: --- For large traces the difference is marginal, with s3 showing a 2% loss. For small traces the difference can be substantial:
{noformat}
db:     51.29 -> 51.52
s3:     51.10 -> 49.12
oltp:   37.91 -> 38.10
multi1: 55.59 -> 50.50
gli:    34.16 -> 16.11
cs:     30.31 -> 26.74
{noformat}
> Cache implementation using LIRS eviction for in-process page cache > -- > > Key: CASSANDRA-11452 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11452 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Branimir Lambov >Assignee: Branimir Lambov > > Following up from CASSANDRA-5863, to make best use of caching and to avoid > having to explicitly marking compaction accesses as non-cacheable, we need a > cache implementation that uses an eviction algorithm that can better handle > non-recurring accesses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache
[ https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240902#comment-15240902 ] Ben Manes commented on CASSANDRA-11452: --- The simple hack of recycling when the hash codes are equal seems to work well. This would be done in {{evictFromMain}} at the end, after the candidate was evicted. Since a weighted cache might evict multiple entries we have to reset the victim for the next loop.
{code}
// Recycle to guard against hash collision attacks
if (victimKey.hashCode() == candidateKey.hashCode()) {
  Node<K, V> nextVictim = victim.getNextInAccessOrder();
  accessOrderProbationDeque().moveToBack(victim);
  victim = nextVictim;
}
{code}
The LIRS paper's traces (short) indicate that the difference is noise.
{noformat}
multi1:  55.28 -> 55.40
multi2:  48.37 -> 48.42
multi3:  41.78 -> 42.00
gli:     34.15 -> 34.06
ps:      57.15 -> 57.17
sprite:  54.95 -> 55.33
cs:      30.19 -> 29.82
loop:    49.95 -> 49.90
2_pools: 52.02 -> 51.96
{noformat}
Tomorrow I'll check some of the ARC traces, clean up the patch, and convert Branimir's test into a unit test. Thoughts? > Cache implementation using LIRS eviction for in-process page cache > -- > > Key: CASSANDRA-11452 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11452 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Branimir Lambov >Assignee: Branimir Lambov > > Following up from CASSANDRA-5863, to make best use of caching and to avoid > having to explicitly marking compaction accesses as non-cacheable, we need a > cache implementation that uses an eviction algorithm that can better handle > non-recurring accesses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache
[ https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240860#comment-15240860 ] Ben Manes commented on CASSANDRA-11452: --- Yes, a custom interface to define a {{longHashCode()}} would work, but that doesn't seem to be a good general solution. I hope one day Java adds it onto Object. I was playing with a [randomized W-TinyLFU|https://github.com/ben-manes/caffeine/commit/92f92f7a79a991d148cc88c9e691030dcebba22b] earlier today. It works well. A random walk in Caffeine (as is) would be a little concerning since that is a linked list traversal. Cycling the victim through the probation passes Branimir's test and is my preference so far. I need to run more simulations to ensure it doesn't have any surprising results. > Cache implementation using LIRS eviction for in-process page cache > -- > > Key: CASSANDRA-11452 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11452 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Branimir Lambov >Assignee: Branimir Lambov > > Following up from CASSANDRA-5863, to make best use of caching and to avoid > having to explicitly marking compaction accesses as non-cacheable, we need a > cache implementation that uses an eviction algorithm that can better handle > non-recurring accesses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
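A sketch of what the custom-interface option could look like; {{LongHashable}} and {{LongHashes}} are invented names, not anything in Caffeine. Note that the fallback only spreads the existing 32 bits, so two objects with colliding {{hashCode}}s still collide, which is why this isn't a good general solution.
{code:java}
// Sketch of the custom-interface idea for 64-bit identities
interface LongHashable {
  long longHashCode();
}

final class LongHashes {
  static long longHashOf(Object key) {
    if (key instanceof LongHashable) {
      return ((LongHashable) key).longHashCode();  // caller supplies real 64-bit identity
    }
    return mix64(key.hashCode());  // colliding 32-bit hashCodes still collide here
  }

  /** SplitMix64-style finalizer, used purely to spread the bits. */
  private static long mix64(long z) {
    z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
    z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
    return z ^ (z >>> 31);
  }
}
{code}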
[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache
[ https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240841#comment-15240841 ] Ben Manes commented on CASSANDRA-11452: --- I did mention the attack vector during the revision, but that addition was rejected due to concern over the peer reviewer process. Unfortunately I never investigated it further and I'd appreciate help on that front. How would you go about obtaining a larger hash? > Cache implementation using LIRS eviction for in-process page cache > -- > > Key: CASSANDRA-11452 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11452 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Branimir Lambov >Assignee: Branimir Lambov > > Following up from CASSANDRA-5863, to make best use of caching and to avoid > having to explicitly marking compaction accesses as non-cacheable, we need a > cache implementation that uses an eviction algorithm that can better handle > non-recurring accesses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache
[ https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240829#comment-15240829 ] Ben Manes commented on CASSANDRA-11452: --- If you are comfortable with Caffeine regardless of the above, then I'd really appreciate the help in getting CASSANDRA-10855 merged. I am very much willing to work together to resolve any issues the Cassandra team discovers. > Cache implementation using LIRS eviction for in-process page cache > -- > > Key: CASSANDRA-11452 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11452 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Branimir Lambov >Assignee: Branimir Lambov > > Following up from CASSANDRA-5863, to make best use of caching and to avoid > having to explicitly marking compaction accesses as non-cacheable, we need a > cache implementation that uses an eviction algorithm that can better handle > non-recurring accesses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache
[ https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240823#comment-15240823 ] Ben Manes commented on CASSANDRA-11452: --- Neither the original nor revised papers are clear about that. At first I used {{>=}} assuming it would be better, but in small traces the change made a [large difference|https://github.com/ben-manes/caffeine/commit/3e83411c670ca61a859f0e1ed24e216b847ccd58]. That was prior to the window, so it may be less impactful now. The small traces are also very specific patterns, whereas the larger ones reflect real-world use, so that change might be even less noticeable. Another option might be to recycle the victim in the probation space or degrade to {{>=}} if we detect the clash. So basically I'd need to analyze it and play with your test (thanks!) to figure out a good strategy. An idea that I've wanted to try is whether we can detect mispredictions using a bloom filter. When a candidate is rejected it could be added to this sketch. If rejected again within a shorter sample period, then we bypass TinyLFU. This might mean that the window is too small and could be dynamically sized based on the feedback. I think it would also help mitigate this collision. I tried to protect against the most likely attack by using a random seed, since I saw that degradation was possible. But making it more resilient is definitely worthwhile, and your help here is great. > Cache implementation using LIRS eviction for in-process page cache > -- > > Key: CASSANDRA-11452 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11452 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Branimir Lambov >Assignee: Branimir Lambov > > Following up from CASSANDRA-5863, to make best use of caching and to avoid > having to explicitly marking compaction accesses as non-cacheable, we need a > cache implementation that uses an eviction algorithm that can better handle > non-recurring accesses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache
[ https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238285#comment-15238285 ] Ben Manes commented on CASSANDRA-11452: --- For the case of a 128mb cache with a 190mb data set, the degradation may be due to Caffeine's lazy initialization of the sketch. This is done when the cache is at least 50% full, to avoid the penalty when an artificially high bound is set. This change was suggested on the other Cassandra ticket to also avoid the unnecessary penalty when the cache exceeds the data size, e.g. due to expiration or a safety threshold that is never reachable. For a maximum weight the expected number of entries is unknown. This probably causes it to resize more frequently in order to keep the sketch's error rate constant, causing it to lose the history. When the cache is full the entry count won't differ all that much, so it begins to capture the frequency histogram, but the trace ends before it has made up for the poor start. This might be why the smaller sizes did well and the larger ones suffered. In the latest SNAPSHOT the {{initialCapacity}} setting will pre-size the sketch in addition to the hash table ([commit|https://github.com/ben-manes/caffeine/commit/9711f8bb75cd6feb060ac4f4a58ad27ea0065deb]). Then, if you know the average entry size, this problem won't occur. > Cache implementation using LIRS eviction for in-process page cache > -- > > Key: CASSANDRA-11452 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11452 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Branimir Lambov >Assignee: Branimir Lambov > > Following up from CASSANDRA-5863, to make best use of caching and to avoid > having to explicitly marking compaction accesses as non-cacheable, we need a > cache implementation that uses an eviction algorithm that can better handle > non-recurring accesses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
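For illustration, pre-sizing a weighted Caffeine cache so the sketch starts near the expected entry count might look like the sketch below; the 128MB cap and 4KB average entry size are placeholder figures, not measurements from this ticket.
{code:java}
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

static Cache<String, byte[]> buildWeightedCache() {
  long maximumBytes = 128L * 1024 * 1024;  // illustrative 128 MB cap
  int averageEntryBytes = 4 * 1024;        // assumed average entry size
  return Caffeine.newBuilder()
      .initialCapacity((int) (maximumBytes / averageEntryBytes))  // pre-sizes table and sketch
      .maximumWeight(maximumBytes)
      .weigher((String key, byte[] value) -> value.length)
      .build();
}
{code}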
[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache
[ https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237796#comment-15237796 ] Ben Manes commented on CASSANDRA-11452: ---
h5. Heavy CPU
This only occurs in synthetic stress tests where there are no spare CPU cycles for the maintenance thread. Cassandra's current LRU cache (via CLHM) has the same general design and we've never encountered an issue in practice. In my [stress test|https://github.com/ben-manes/caffeine/blob/master/caffeine/src/test/java/com/github/benmanes/caffeine/cache/Stresser.java] a simple {{Thread.yield()}} to mimic other system behavior showed this not to be a problem. Without seeing your benchmark my analysis is that it's not a real-world issue. I do plan on someday switching to a bounded array-based queue, which would throttle the writes and produce a little less garbage (no linked nodes), but it's been low priority given the data I have so far. Using a direct executor is fine, as it amortizes the penalty on the caller. That's what CLHM (and Guava) do. In JDK8 we have the shared FJP, so we can reduce the variability of response times by offloading this work. Unlike CLHM (and Guava), this setting would unfairly burden the evicting thread with processing removal notifications if a removal listener is defined (the others publish to a queue and try to share the penalty). A direct executor wouldn't fix that problem, as one thread would still be overwhelmed by the others; it only penalizes callers enough to reduce the insertion rate, so it could merely appear to help.
h5. Independent admission / eviction strategy
I'm not very clear on what you want here. If you want to reimplement some of this yourself then take a look at the [frequency sketch|https://github.com/ben-manes/caffeine/blob/master/caffeine/src/main/java/com/github/benmanes/caffeine/cache/FrequencySketch.java].
h5. Long-term stability of the cache
The reset interval is used for aging the sketch(es) and multiple hashes should reduce the error rate. The SLRU's probation space also allows mispredictions to quickly fade out. It also only compares the candidate and victim, so a hot entry would probably not be selected as a victim by the eviction policy. The sketch uses a random seed to protect against hash flooding attacks, which may be similar to your train of thought. You should also read the [updated paper|http://arxiv.org/pdf/1512.00727.pdf] which goes into much more depth. > Cache implementation using LIRS eviction for in-process page cache > -- > > Key: CASSANDRA-11452 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11452 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Branimir Lambov >Assignee: Branimir Lambov > > Following up from CASSANDRA-5863, to make best use of caching and to avoid > having to explicitly marking compaction accesses as non-cacheable, we need a > cache implementation that uses an eviction algorithm that can better handle > non-recurring accesses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
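A small usage sketch of the direct-executor trade-off described above: with {{Runnable::run}} the maintenance work, and any removal listener, runs on whichever calling thread triggered it. The key/value types and the size bound are arbitrary choices for the example.
{code:java}
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.RemovalCause;

static Cache<String, String> buildCallerRunsCache() {
  return Caffeine.newBuilder()
      .maximumSize(10_000)
      .executor(Runnable::run)  // run maintenance (and listener callbacks) on the calling thread
      .removalListener((String key, String value, RemovalCause cause) ->
          System.out.printf("evicted %s (%s)%n", key, cause))
      .build();
}
{code}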
[jira] [Commented] (CASSANDRA-10855) Use Caffeine (W-TinyLFU) for on-heap caches
[ https://issues.apache.org/jira/browse/CASSANDRA-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15234296#comment-15234296 ] Ben Manes commented on CASSANDRA-10855: --- When Cassandra moves to TPC, it should take less than a day to write a TinyLFU cache if you borrow my [4-bit CountMin sketch|https://github.com/ben-manes/caffeine/blob/master/caffeine/src/main/java/com/github/benmanes/caffeine/cache/FrequencySketch.java]. I might evolve that to include the doorkeeper and other tricks, but it hasn't been a priority yet. The simulator includes a variant using incremental reset (which would be my preference for a TPC) and shows a negligible difference in hit rates. A Go developer read the paper and wrote an implementation for fun in a few days, and I'm happy to review anyone's version. Caffeine tackled the research and concurrency side, but no reason to adopt it once TPC. Best to take ideas, leverage its simulator to fine tune, and share insights. > Use Caffeine (W-TinyLFU) for on-heap caches > --- > > Key: CASSANDRA-10855 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10855 > Project: Cassandra > Issue Type: Improvement >Reporter: Ben Manes > Labels: performance > > Cassandra currently uses > [ConcurrentLinkedHashMap|https://code.google.com/p/concurrentlinkedhashmap] > for performance critical caches (key, counter) and Guava's cache for > non-critical (auth, metrics, security). All of these usages have been > replaced by [Caffeine|https://github.com/ben-manes/caffeine], written by the > author of the previously mentioned libraries. > The primary incentive is to switch from LRU policy to W-TinyLFU, which > provides [near optimal|https://github.com/ben-manes/caffeine/wiki/Efficiency] > hit rates. It performs particularly well in database and search traces, is > scan resistant, and as adds a very small time/space overhead to LRU. > Secondarily, Guava's caches never obtained similar > [performance|https://github.com/ben-manes/caffeine/wiki/Benchmarks] to CLHM > due to some optimizations not being ported over. This change results in > faster reads and not creating garbage as a side-effect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
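As a rough sketch of what a hand-rolled TinyLFU admission filter for a single-threaded (TPC) cache might look like: record accesses into a small counter table, age it periodically, and compare candidate against victim on eviction. The real FrequencySketch uses 4-bit counters packed into longs and multiple hash functions; everything below is simplified for illustration and the sizes are arbitrary.
{code:java}
// Deliberately tiny stand-in, not Caffeine's FrequencySketch
final class NaiveTinyLfu {
  private final int[] counts = new int[1 << 14];
  private final int sampleSize = 10 * counts.length;
  private int additions;

  void recordAccess(Object key) {
    int slot = spread(key.hashCode());
    if (counts[slot] < 15) {          // saturate like a 4-bit counter
      counts[slot]++;
    }
    if (++additions == sampleSize) {  // age the histogram periodically
      for (int i = 0; i < counts.length; i++) {
        counts[i] >>>= 1;
      }
      additions /= 2;
    }
  }

  /** Admit the candidate only if it is estimated to be more frequent than the victim. */
  boolean admit(Object candidateKey, Object victimKey) {
    return counts[spread(candidateKey.hashCode())] > counts[spread(victimKey.hashCode())];
  }

  private int spread(int h) {
    h ^= (h >>> 16);
    return (h * 0x9e3779b1) & (counts.length - 1);
  }
}
{code}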
[jira] [Commented] (CASSANDRA-10855) Use Caffeine (W-TinyLFU) for on-heap caches
[ https://issues.apache.org/jira/browse/CASSANDRA-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15234277#comment-15234277 ] Ben Manes commented on CASSANDRA-10855: --- Some unfortunate duplication of work on the DataStax side, where a custom LIRS page cache was implemented for evaluation. HBase (and Accumulo) are evaluating a migration from their custom SLRU on-heap caches. That might add some insight that could aid in Cassandra's decision. > Use Caffeine (W-TinyLFU) for on-heap caches > --- > > Key: CASSANDRA-10855 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10855 > Project: Cassandra > Issue Type: Improvement >Reporter: Ben Manes > Labels: performance > > Cassandra currently uses > [ConcurrentLinkedHashMap|https://code.google.com/p/concurrentlinkedhashmap] > for performance critical caches (key, counter) and Guava's cache for > non-critical (auth, metrics, security). All of these usages have been > replaced by [Caffeine|https://github.com/ben-manes/caffeine], written by the > author of the previously mentioned libraries. > The primary incentive is to switch from LRU policy to W-TinyLFU, which > provides [near optimal|https://github.com/ben-manes/caffeine/wiki/Efficiency] > hit rates. It performs particularly well in database and search traces, is > scan resistant, and as adds a very small time/space overhead to LRU. > Secondarily, Guava's caches never obtained similar > [performance|https://github.com/ben-manes/caffeine/wiki/Benchmarks] to CLHM > due to some optimizations not being ported over. This change results in > faster reads and not creating garbage as a side-effect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11452) Cache implementation using LIRS eviction for in-process page cache
[ https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15234269#comment-15234269 ] Ben Manes commented on CASSANDRA-11452: --- [CASSANDRA-10855|https://issues.apache.org/jira/browse/CASSANDRA-10855] introduces [TinyLFU|https://github.com/ben-manes/caffeine/wiki/Efficiency] which matches or beats LIRS at much lower space usage and complexity. This is because it uses a frequency sketch instead of retaining a large number of ghost entries. See the [HighScalability article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html] for details. The patch uses [Caffeine|https://github.com/ben-manes/caffeine], the successor to Guava's cache and CLHM (used by Cassandra) which are LRU. Can you evaluate against that? > Cache implementation using LIRS eviction for in-process page cache > -- > > Key: CASSANDRA-11452 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11452 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Branimir Lambov >Assignee: Branimir Lambov > > Following up from CASSANDRA-5863, to make best use of caching and to avoid > having to explicitly marking compaction accesses as non-cacheable, we need a > cache implementation that uses an eviction algorithm that can better handle > non-recurring accesses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10855) Use Caffeine (W-TinyLFU) for on-heap caches
[ https://issues.apache.org/jira/browse/CASSANDRA-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178101#comment-15178101 ] Ben Manes commented on CASSANDRA-10855: --- It sounds like we all thought Weibull was a good choice. Another option is to use [YCSB|https://github.com/brianfrankcooper/YCSB] which handles correlated omission and is popular benchmark for comparing data stores. > Use Caffeine (W-TinyLFU) for on-heap caches > --- > > Key: CASSANDRA-10855 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10855 > Project: Cassandra > Issue Type: Improvement >Reporter: Ben Manes > Labels: performance > > Cassandra currently uses > [ConcurrentLinkedHashMap|https://code.google.com/p/concurrentlinkedhashmap] > for performance critical caches (key, counter) and Guava's cache for > non-critical (auth, metrics, security). All of these usages have been > replaced by [Caffeine|https://github.com/ben-manes/caffeine], written by the > author of the previously mentioned libraries. > The primary incentive is to switch from LRU policy to W-TinyLFU, which > provides [near optimal|https://github.com/ben-manes/caffeine/wiki/Efficiency] > hit rates. It performs particularly well in database and search traces, is > scan resistant, and as adds a very small time/space overhead to LRU. > Secondarily, Guava's caches never obtained similar > [performance|https://github.com/ben-manes/caffeine/wiki/Benchmarks] to CLHM > due to some optimizations not being ported over. This change results in > faster reads and not creating garbage as a side-effect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10855) Use Caffeine (W-TinyLFU) for on-heap caches
[ https://issues.apache.org/jira/browse/CASSANDRA-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136182#comment-15136182 ] Ben Manes commented on CASSANDRA-10855: --- I think Weibull should be fine, too. I rebased the pull request. I don't have permissions to schedule a run on cstar (nor do I think they should be granted). It would be nice if Robert or another team member ran another round of performance tests. My expectation is that in a scenario similar to a realistic workload there will be less I/O due to fewer cache misses. If there is a slight degradation (or gain) in other aspects of the new cache, that should have a negligible impact, as reduced I/O will be the dominant effect. This would then provide a good argument for OHC to revisit its eviction policy. > Use Caffeine (W-TinyLFU) for on-heap caches > --- > > Key: CASSANDRA-10855 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10855 > Project: Cassandra > Issue Type: Improvement >Reporter: Ben Manes > Labels: performance > > Cassandra currently uses > [ConcurrentLinkedHashMap|https://code.google.com/p/concurrentlinkedhashmap] > for performance critical caches (key, counter) and Guava's cache for > non-critical (auth, metrics, security). All of these usages have been > replaced by [Caffeine|https://github.com/ben-manes/caffeine], written by the > author of the previously mentioned libraries. > The primary incentive is to switch from LRU policy to W-TinyLFU, which > provides [near optimal|https://github.com/ben-manes/caffeine/wiki/Efficiency] > hit rates. It performs particularly well in database and search traces, is > scan resistant, and as adds a very small time/space overhead to LRU. > Secondarily, Guava's caches never obtained similar > [performance|https://github.com/ben-manes/caffeine/wiki/Benchmarks] to CLHM > due to some optimizations not being ported over. This change results in > faster reads and not creating garbage as a side-effect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10855) Use Caffeine (W-TinyLFU) for on-heap caches
[ https://issues.apache.org/jira/browse/CASSANDRA-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115737#comment-15115737 ] Ben Manes commented on CASSANDRA-10855: --- [2.1.0|https://github.com/ben-manes/caffeine/releases/tag/v2.1.0] was released which includes the above mentioned optimizations. So the cache should seem artificially better for an artificial workload =) I haven't had a chance to try to add more workload profiles to the Cassandra tool. It would be nice if we knew what real-world distributions were like, as Zipf-like is what researchers published. From the traces I've experimented with, I am fairly confident in a net positive result. P.S. An [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html] on HighScalability describes the overall design. > Use Caffeine (W-TinyLFU) for on-heap caches > --- > > Key: CASSANDRA-10855 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10855 > Project: Cassandra > Issue Type: Improvement >Reporter: Ben Manes > Labels: performance > > Cassandra currently uses > [ConcurrentLinkedHashMap|https://code.google.com/p/concurrentlinkedhashmap] > for performance critical caches (key, counter) and Guava's cache for > non-critical (auth, metrics, security). All of these usages have been > replaced by [Caffeine|https://github.com/ben-manes/caffeine], written by the > author of the previously mentioned libraries. > The primary incentive is to switch from LRU policy to W-TinyLFU, which > provides [near optimal|https://github.com/ben-manes/caffeine/wiki/Efficiency] > hit rates. It performs particularly well in database and search traces, is > scan resistant, and as adds a very small time/space overhead to LRU. > Secondarily, Guava's caches never obtained similar > [performance|https://github.com/ben-manes/caffeine/wiki/Benchmarks] to CLHM > due to some optimizations not being ported over. This change results in > faster reads and not creating garbage as a side-effect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10855) Use Caffeine (W-TinyLFU) for on-heap caches
[ https://issues.apache.org/jira/browse/CASSANDRA-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15087252#comment-15087252 ] Ben Manes commented on CASSANDRA-10855: --- The latest [snapshot jar|https://oss.sonatype.org/content/repositories/snapshots/com/github/ben-manes/caffeine/caffeine] includes two optimizations. Insertions now avoid an unnecessary lambda. I suspect that will have a negligible benefit, but it's always good to be more GC hygienic. A cache below 50% capacity will skip read policy work. That means it won't record the access in the ring buffers, which reduces contention. That also reduces how often the policy maintenance work is scheduled, as the buffers don't need to be drained. A write will still trigger a maintenance cycle, but that should be shorter by doing less. This results in throughput close to a raw ConcurrentHashMap, with the penalty incurred only once the threshold is crossed. That should improve _trades-fwd-lcs-nolz4_ and anyone else's usage where the cache is merely a safety threshold but isn't likely to grow close to the maximum. > Use Caffeine (W-TinyLFU) for on-heap caches > --- > > Key: CASSANDRA-10855 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10855 > Project: Cassandra > Issue Type: Improvement >Reporter: Ben Manes > Labels: performance > > Cassandra currently uses > [ConcurrentLinkedHashMap|https://code.google.com/p/concurrentlinkedhashmap] > for performance critical caches (key, counter) and Guava's cache for > non-critical (auth, metrics, security). All of these usages have been > replaced by [Caffeine|https://github.com/ben-manes/caffeine], written by the > author of the previously mentioned libraries. > The primary incentive is to switch from LRU policy to W-TinyLFU, which > provides [near optimal|https://github.com/ben-manes/caffeine/wiki/Efficiency] > hit rates. It performs particularly well in database and search traces, is > scan resistant, and as adds a very small time/space overhead to LRU. > Secondarily, Guava's caches never obtained similar > [performance|https://github.com/ben-manes/caffeine/wiki/Benchmarks] to CLHM > due to some optimizations not being ported over. This change results in > faster reads and not creating garbage as a side-effect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10855) Use Caffeine (W-TinyLFU) for on-heap caches
[ https://issues.apache.org/jira/browse/CASSANDRA-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15086242#comment-15086242 ] Ben Manes commented on CASSANDRA-10855: --- Happy new year. Anything I can do to help keep this moving? Ariel's comment explains the poor hit rate, as a uniform distribution will result in a fixed and low hit rate regardless of policy. An effective cache is often at around 85%, ideally in the high 90s to make reads the dominant case, but even 65% is useful. Even when the hit rate is maxed out, the effect of a better policy can be noticeable. In that case it reduces the TCO by being able to achieve the same performance with smaller, cheaper machines. Glancing at the uniform results the degredation is small enough to probably be within the margin of error where the run and other system effects dominate. In an update heavy workload the new cache should be faster due to synchronization having less penalty than CAS storms. But on the perf test's insertion heavy workload it is probably a little slower due to features incurring more complexity. Another set of eyes might uncover some improvements, so that's always welcome. [Zipf-like|http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.12.2253=1] distributions are considered the most common workload patterns. Ideally we could capture a production trace and simulate it, as the [database trace|https://github.com/ben-manes/caffeine/wiki/Efficiency#database] I use shows very promising results. > Use Caffeine (W-TinyLFU) for on-heap caches > --- > > Key: CASSANDRA-10855 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10855 > Project: Cassandra > Issue Type: Improvement >Reporter: Ben Manes > Labels: performance > > Cassandra currently uses > [ConcurrentLinkedHashMap|https://code.google.com/p/concurrentlinkedhashmap] > for performance critical caches (key, counter) and Guava's cache for > non-critical (auth, metrics, security). All of these usages have been > replaced by [Caffeine|https://github.com/ben-manes/caffeine], written by the > author of the previously mentioned libraries. > The primary incentive is to switch from LRU policy to W-TinyLFU, which > provides [near optimal|https://github.com/ben-manes/caffeine/wiki/Efficiency] > hit rates. It performs particularly well in database and search traces, is > scan resistant, and as adds a very small time/space overhead to LRU. > Secondarily, Guava's caches never obtained similar > [performance|https://github.com/ben-manes/caffeine/wiki/Benchmarks] to CLHM > due to some optimizations not being ported over. This change results in > faster reads and not creating garbage as a side-effect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10855) Use Caffeine (W-TinyLFU) for on-heap caches
[ https://issues.apache.org/jira/browse/CASSANDRA-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15086276#comment-15086276 ] Ben Manes commented on CASSANDRA-10855: --- cstar returns "Internal Server Error" so I can't look at the moment. I use YCSB's distributions for [synthetic|https://github.com/ben-manes/caffeine/blob/master/simulator/src/main/java/com/github/benmanes/caffeine/cache/simulator/Synthetic.java#L41] workloads. I'd expect adding those would be trivial. If the cache doesn't reach 50% capacity then Caffeine doesn't even initialize the frequency sketch, to save memory (e.g. someone sets the max to a ridiculous threshold as a worst-case bound). It could probably also avoid the LRU shuffling, which would reduce the maintenance penalty. > Use Caffeine (W-TinyLFU) for on-heap caches > --- > > Key: CASSANDRA-10855 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10855 > Project: Cassandra > Issue Type: Improvement >Reporter: Ben Manes > Labels: performance > > Cassandra currently uses > [ConcurrentLinkedHashMap|https://code.google.com/p/concurrentlinkedhashmap] > for performance critical caches (key, counter) and Guava's cache for > non-critical (auth, metrics, security). All of these usages have been > replaced by [Caffeine|https://github.com/ben-manes/caffeine], written by the > author of the previously mentioned libraries. > The primary incentive is to switch from LRU policy to W-TinyLFU, which > provides [near optimal|https://github.com/ben-manes/caffeine/wiki/Efficiency] > hit rates. It performs particularly well in database and search traces, is > scan resistant, and as adds a very small time/space overhead to LRU. > Secondarily, Guava's caches never obtained similar > [performance|https://github.com/ben-manes/caffeine/wiki/Benchmarks] to CLHM > due to some optimizations not being ported over. This change results in > faster reads and not creating garbage as a side-effect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
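A hedged sketch of the lazy-initialization idea mentioned above: the frequency sketch only allocates its table once the cache crosses half of its maximum size, so an over-provisioned bound used purely as a safety limit never pays for it. The FrequencySketch below is a simplified stand-in, not Caffeine's implementation.
{code:java}
// Illustrative stand-in for a TinyLFU-style frequency sketch that is only
// allocated once the cache is at least half full, so a very large maximum
// used only as a worst-case bound costs no sketch memory.
final class FrequencySketch<K> {
  private int[] table;   // null until ensureCapacity() is called

  boolean isNotInitialized() {
    return table == null;
  }

  void ensureCapacity(long maximumSize) {
    if (table != null) {
      return;
    }
    int size = Math.max(1, Integer.highestOneBit((int) Math.min(maximumSize, 1 << 30)));
    table = new int[size];
  }

  void increment(K key) {
    if (isNotInitialized()) {
      return;   // no-op while the cache is below the threshold
    }
    int index = Math.floorMod(key.hashCode(), table.length);
    table[index]++;   // a real sketch uses 4-bit counters and several hash functions
  }
}

// Caller side, e.g. after an insertion into the cache (illustrative names):
//   if (cache.size() >= (maximumSize / 2) && sketch.isNotInitialized()) {
//     sketch.ensureCapacity(maximumSize);
//   }
{code}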
[jira] [Commented] (CASSANDRA-10855) Use Caffeine (W-TinyLFU) for on-heap caches
[ https://issues.apache.org/jira/browse/CASSANDRA-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15071747#comment-15071747 ] Ben Manes commented on CASSANDRA-10855: --- Do you know why the hit rate is so poor? It being low regardless of policy is troubling. So it seems important to figure out why that is and what (if anything) can be done to improve that. An ineffective cache impacts the design assumptions by making writes (not reads) the common case. The slightly heavier policy, default delegation to FJP, etc. would all be more noticeable when writes are the dominant behavior. I'm not sure off-hand why it's worse, but that seems secondary to fixing the hit rate problem. > Use Caffeine (W-TinyLFU) for on-heap caches > --- > > Key: CASSANDRA-10855 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10855 > Project: Cassandra > Issue Type: Improvement >Reporter: Ben Manes > Labels: performance > > Cassandra currently uses > [ConcurrentLinkedHashMap|https://code.google.com/p/concurrentlinkedhashmap] > for performance critical caches (key, counter) and Guava's cache for > non-critical (auth, metrics, security). All of these usages have been > replaced by [Caffeine|https://github.com/ben-manes/caffeine], written by the > author of the previously mentioned libraries. > The primary incentive is to switch from LRU policy to W-TinyLFU, which > provides [near optimal|https://github.com/ben-manes/caffeine/wiki/Efficiency] > hit rates. It performs particularly well in database and search traces, is > scan resistant, and as adds a very small time/space overhead to LRU. > Secondarily, Guava's caches never obtained similar > [performance|https://github.com/ben-manes/caffeine/wiki/Benchmarks] to CLHM > due to some optimizations not being ported over. This change results in > faster reads and not creating garbage as a side-effect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10855) Use Caffeine (W-TinyLFU) for on-heap caches
[ https://issues.apache.org/jira/browse/CASSANDRA-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056445#comment-15056445 ] Ben Manes commented on CASSANDRA-10855: --- A little background from private discussions with Jonathan and Robert leading to this ticket. If the performance analysis is positive then it provides a strong motivation to integrate W-TinyLFU into OHC. As Robert described it, the on-heap caches are low hanging fruit where the key cache is performance critical. These caching projects have always been on my personal time as a research hobby. My original interest was on concurrency, since all other caches either use coarse locking or make sacrifices for throughput (e.g. lower hit rates, O(n) eviction). More recently I worked on improving hit rates, as described in that paper. I've tended to keep a narrow focus until solving a particularly hard problem and off-heap adds more complexity so it wasn't tackled. I'm a little disappointed that OHC didn't borrow ideas from CLHM, as the fundamentals are transferable, but perhaps we'll combine them all together in a future project. > Use Caffeine (W-TinyLFU) for on-heap caches > --- > > Key: CASSANDRA-10855 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10855 > Project: Cassandra > Issue Type: Improvement >Reporter: Ben Manes > Labels: performance > > Cassandra currently uses > [ConcurrentLinkedHashMap|https://code.google.com/p/concurrentlinkedhashmap] > for performance critical caches (key, counter) and Guava's cache for > non-critical (auth, metrics, security). All of these usages have been > replaced by [Caffeine|https://github.com/ben-manes/caffeine], written by the > author of the previously mentioned libraries. > The primary incentive is to switch from LRU policy to W-TinyLFU, which > provides [near optimal|https://github.com/ben-manes/caffeine/wiki/Efficiency] > hit rates. It performs particularly well in database and search traces, is > scan resistant, and as adds a very small time/space overhead to LRU. > Secondarily, Guava's caches never obtained similar > [performance|https://github.com/ben-manes/caffeine/wiki/Benchmarks] to CLHM > due to some optimizations not being ported over. This change results in > faster reads and not creating garbage as a side-effect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-10855) Use Caffeine (W-TinyLFU) for on-heap caches
Ben Manes created CASSANDRA-10855: - Summary: Use Caffeine (W-TinyLFU) for on-heap caches Key: CASSANDRA-10855 URL: https://issues.apache.org/jira/browse/CASSANDRA-10855 Project: Cassandra Issue Type: Improvement Reporter: Ben Manes Cassandra currently uses [ConcurrentLinkedHashMap|https://code.google.com/p/concurrentlinkedhashmap] for performance critical caches (key, counter) and Guava's cache for non-critical (auth, metrics, security). All of these usages have been replaced by [Caffeine|https://github.com/ben-manes/caffeine], written by the author of the previously mentioned libraries. The primary incentive is to switch from LRU policy to W-TinyLFU, which provides [near optimal|https://github.com/ben-manes/caffeine/wiki/Efficiency] hit rates. It performs particularly well in database and search traces, is scan resistant, and as adds a very small time/space overhead to LRU. Secondarily, Guava's caches never obtained similar [performance|https://github.com/ben-manes/caffeine/wiki/Benchmarks] to CLHM due to some optimizations not being ported over. This change results in faster reads and not creating garbage as a side-effect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
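For readers unfamiliar with the library, the switch is mostly mechanical at the call sites. A minimal sketch of how a size-bounded cache is built with Caffeine's public API; the key/value types and the bound here are placeholders, not Cassandra's actual cache configuration.
{code:java}
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

public final class KeyCacheExample {
  public static void main(String[] args) {
    // Size-bounded cache; eviction is governed by the W-TinyLFU policy.
    Cache<String, byte[]> cache = Caffeine.newBuilder()
        .maximumSize(100_000)   // placeholder bound
        .build();

    cache.put("sstable-1:partition-42", new byte[] {1, 2, 3});

    byte[] cached = cache.getIfPresent("sstable-1:partition-42");  // null if absent
    byte[] loaded = cache.get("sstable-1:partition-7",
        key -> new byte[0]);    // compute-if-absent style load
    System.out.println(cached.length + " " + loaded.length);
  }
}
{code}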
[jira] [Commented] (CASSANDRA-10855) Use Caffeine (W-TinyLFU) for on-heap caches
[ https://issues.apache.org/jira/browse/CASSANDRA-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055546#comment-15055546 ] Ben Manes commented on CASSANDRA-10855: --- See the [pull request|https://github.com/apache/cassandra/pull/59] for this task. > Use Caffeine (W-TinyLFU) for on-heap caches > --- > > Key: CASSANDRA-10855 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10855 > Project: Cassandra > Issue Type: Improvement >Reporter: Ben Manes > Labels: performance > > Cassandra currently uses > [ConcurrentLinkedHashMap|https://code.google.com/p/concurrentlinkedhashmap] > for performance critical caches (key, counter) and Guava's cache for > non-critical (auth, metrics, security). All of these usages have been > replaced by [Caffeine|https://github.com/ben-manes/caffeine], written by the > author of the previously mentioned libraries. > The primary incentive is to switch from LRU policy to W-TinyLFU, which > provides [near optimal|https://github.com/ben-manes/caffeine/wiki/Efficiency] > hit rates. It performs particularly well in database and search traces, is > scan resistant, and as adds a very small time/space overhead to LRU. > Secondarily, Guava's caches never obtained similar > [performance|https://github.com/ben-manes/caffeine/wiki/Benchmarks] to CLHM > due to some optimizations not being ported over. This change results in > faster reads and not creating garbage as a side-effect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-5661) Discard pooled readers for cold data
[ https://issues.apache.org/jira/browse/CASSANDRA-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13770072#comment-13770072 ] Ben Manes commented on CASSANDRA-5661: -- sounds good to me. Discard pooled readers for cold data Key: CASSANDRA-5661 URL: https://issues.apache.org/jira/browse/CASSANDRA-5661 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.2.1 Reporter: Jonathan Ellis Assignee: Pavel Yaskevich Fix For: 2.0.1 Attachments: CASSANDRA-5661-global-multiway-cache.patch, CASSANDRA-5661.patch, DominatorTree.png, Histogram.png Reader pooling was introduced in CASSANDRA-4942 but pooled RandomAccessReaders are never cleaned up until the SSTableReader is closed. So memory use is the worst case simultaneous RAR we had open for this file, forever. We should introduce a global limit on how much memory to use for RAR, and evict old ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5821) Test new ConcurrentLinkedHashMap implementation (1.4RC)
[ https://issues.apache.org/jira/browse/CASSANDRA-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767081#comment-13767081 ] Ben Manes commented on CASSANDRA-5821: -- yes, released as v1.4 Test new ConcurrentLinkedHashMap implementation (1.4RC) --- Key: CASSANDRA-5821 URL: https://issues.apache.org/jira/browse/CASSANDRA-5821 Project: Cassandra Issue Type: Test Reporter: Ryan McGuire Assignee: Ryan McGuire There are some [improvements being made to CLHM| https://code.google.com/p/concurrentlinkedhashmap/source/detail?r=888ad7cebe5b509e5e713b00836f5df9f50baf32] that we should test. Create a small enough dataset that it can fit in memory and devise a test that has a high key cache hit rate, and compare results to CLHM 1.3 we already use. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5661) Discard pooled readers for cold data
[ https://issues.apache.org/jira/browse/CASSANDRA-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753423#comment-13753423 ] Ben Manes commented on CASSANDRA-5661: -- Sorry that I haven't had the time to work on this for a while. I've been playing with writing a queue using a combining arena, similar to how the stack has an elimination arena, and how to incorporate thread locals to reduce contention. That made me think about the flat combining technique, so after a little digging I uncovered a conversation I had with Chris Vest. At the time he was starting on an object pool, which he's released as Stormpot (http://chrisvest.github.io/stormpot). The implementation has some excellent ideas that are worth borrowing and mixing in. While it is not multi-way, he might be inclined to add that capability after playing with my prototype. Discard pooled readers for cold data Key: CASSANDRA-5661 URL: https://issues.apache.org/jira/browse/CASSANDRA-5661 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.2.1 Reporter: Jonathan Ellis Assignee: Pavel Yaskevich Fix For: 2.0.1 Attachments: CASSANDRA-5661-global-multiway-cache.patch, CASSANDRA-5661.patch, DominatorTree.png, Histogram.png Reader pooling was introduced in CASSANDRA-4942 but pooled RandomAccessReaders are never cleaned up until the SSTableReader is closed. So memory use is the worst case simultaneous RAR we had open for this file, forever. We should introduce a global limit on how much memory to use for RAR, and evict old ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5821) Test new ConcurrentLinkedHashMap implementation (1.4RC)
[ https://issues.apache.org/jira/browse/CASSANDRA-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754272#comment-13754272 ] Ben Manes commented on CASSANDRA-5821: -- That makes sense, Chris. Currently I don't expose any v8 methods, so the new `long mappingCount()` method isn't available. You're probably not asking for the count itself, only for better internal handling, right? I plan on exposing more v8 style methods, but deferred that for this release because I found two issues with computeIfAbsent(). I might try re-implementing Guava Cache as a decorator on top of CLHM since it's 33x-50x faster under a synthetic load. Test new ConcurrentLinkedHashMap implementation (1.4RC) --- Key: CASSANDRA-5821 URL: https://issues.apache.org/jira/browse/CASSANDRA-5821 Project: Cassandra Issue Type: Test Reporter: Ryan McGuire Assignee: Ryan McGuire There are some [improvements being made to CLHM| https://code.google.com/p/concurrentlinkedhashmap/source/detail?r=888ad7cebe5b509e5e713b00836f5df9f50baf32] that we should test. Create a small enough dataset that it can fit in memory and devise a test that has a high key cache hit rate, and compare results to CLHM 1.3 we already use. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5821) Test new ConcurrentLinkedHashMap implementation (1.4RC)
[ https://issues.apache.org/jira/browse/CASSANDRA-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750940#comment-13750940 ] Ben Manes commented on CASSANDRA-5821: -- Thanks Ryan. This makes sense, as the goal of the previous version(s) was low overhead in real-world workloads. The newest version removes overhead in synthetic workloads as well. That may not translate into significant application gains as most of the time is hopefully spent elsewhere. The observable benefit might be less pressure on the young gen GC as reads no longer create garbage in the heap. Test new ConcurrentLinkedHashMap implementation (1.4RC) --- Key: CASSANDRA-5821 URL: https://issues.apache.org/jira/browse/CASSANDRA-5821 Project: Cassandra Issue Type: Test Reporter: Ryan McGuire Assignee: Ryan McGuire There are some [improvements being made to CLHM| https://code.google.com/p/concurrentlinkedhashmap/source/detail?r=888ad7cebe5b509e5e713b00836f5df9f50baf32] that we should test. Create a small enough dataset that it can fit in memory and devise a test that has a high key cache hit rate, and compare results to CLHM 1.3 we already use. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5661) Discard pooled readers for cold data
[ https://issues.apache.org/jira/browse/CASSANDRA-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13735727#comment-13735727 ] Ben Manes commented on CASSANDRA-5661: -- EBS is 28% faster than LTQ. There might be opportunities to make it slightly faster with some tuning. I could probably add blocking methods if there was interest in experimenting with it for CASSANDRA-4718. Discard pooled readers for cold data Key: CASSANDRA-5661 URL: https://issues.apache.org/jira/browse/CASSANDRA-5661 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.2.1 Reporter: Jonathan Ellis Assignee: Pavel Yaskevich Fix For: 2.0.1 Attachments: CASSANDRA-5661-global-multiway-cache.patch, CASSANDRA-5661.patch, DominatorTree.png, Histogram.png Reader pooling was introduced in CASSANDRA-4942 but pooled RandomAccessReaders are never cleaned up until the SSTableReader is closed. So memory use is the worst case simultaneous RAR we had open for this file, forever. We should introduce a global limit on how much memory to use for RAR, and evict old ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5661) Discard pooled readers for cold data
[ https://issues.apache.org/jira/browse/CASSANDRA-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13733830#comment-13733830 ] Ben Manes commented on CASSANDRA-5661: -- My benchmark had a bug and EBS may only be on par with LTQ performance wise. I need to investigate that again, though. I shifted focus to fixing the performance bottleneck in Guava's cache. The way we tracked usage history (e.g. LRU) was focused on common usage, but is a bottleneck on synthetic benchmarks. I made the fixes to CLHM (v1.4) and offered them upstream (issue 1487). I'll experiment with using CLHM instead to see if that removes the hotspot. Discard pooled readers for cold data Key: CASSANDRA-5661 URL: https://issues.apache.org/jira/browse/CASSANDRA-5661 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.2.1 Reporter: Jonathan Ellis Assignee: Pavel Yaskevich Fix For: 2.0.1 Attachments: CASSANDRA-5661-global-multiway-cache.patch, CASSANDRA-5661.patch, DominatorTree.png, Histogram.png Reader pooling was introduced in CASSANDRA-4942 but pooled RandomAccessReaders are never cleaned up until the SSTableReader is closed. So memory use is the worst case simultaneous RAR we had open for this file, forever. We should introduce a global limit on how much memory to use for RAR, and evict old ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5821) Test new ConcurrentLinkedHashMap implementation (1.4RC)
[ https://issues.apache.org/jira/browse/CASSANDRA-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13728484#comment-13728484 ] Ben Manes commented on CASSANDRA-5821: -- In addition to lower latencies and higher throughput, the GC impact is much smaller. Adaption to Guava is being evaluated.
-- CACHE BUILDER --
'CacheBuilder' Finished with 1 thread(s). Average time: 94 ms (young gc collections: 0, old gc collections: 0, Jit compilation: 1)
'CacheBuilder' Finished with 2 thread(s). Average time: 294 ms (young gc collections: 0, old gc collections: 0, Jit compilation: 0)
'CacheBuilder' Finished with 4 thread(s). Average time: 487 ms (young gc collections: 0, old gc collections: 0, Jit compilation: 0)
'CacheBuilder' Finished with 8 thread(s). Average time: 964 ms (young gc collections: 1, old gc collections: 0, Jit compilation: 0)
'CacheBuilder' Finished with 16 thread(s). Average time: 1504 ms (young gc collections: 1, old gc collections: 0, Jit compilation: 15)
'CacheBuilder' Finished with 32 thread(s). Average time: 9817 ms (young gc collections: 3, old gc collections: 1, Jit compilation: 74)
'CacheBuilder' Finished with 64 thread(s). Average time: 15467 ms (young gc collections: 11, old gc collections: 2, Jit compilation: 0)
[GC space histograms omitted; recoverable ranges: Young Space 0-275 mb, Survivor Space 0-137 mb, Old Space 0-504 mb, Perm Space 6 mb]
-- CLHM V1.3.2 --
'CLHM' Finished with 1 thread(s). Average time: 87 ms (young gc collections: 0, old gc collections: 0, Jit compilation: 0)
'CLHM' Finished with 2 thread(s). Average time: 243 ms (young gc collections: 1, old gc collections: 0, Jit compilation: 0)
'CLHM' Finished with 4 thread(s). Average time: 396 ms (young gc collections: 1, old gc collections: 0, Jit compilation: 0)
'CLHM' Finished with 8 thread(s). Average time: 944 ms (young gc collections: 3, old gc collections: 0, Jit compilation: 0)
'CLHM' Finished with 16 thread(s). Average time: 2344 ms (young gc collections: 8, old gc collections: 0, Jit compilation: 0)
'CLHM' Finished with 32 thread(s). Average time: 6873 ms (young gc collections: 9, old gc collections: 2, Jit compilation: 0)
'CLHM' Finished with 64 thread(s). Average time: 9522 ms 00:38:57.278 [main] INFO
[jira] [Commented] (CASSANDRA-5661) Discard pooled readers for cold data
[ https://issues.apache.org/jira/browse/CASSANDRA-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13714921#comment-13714921 ] Ben Manes commented on CASSANDRA-5661: -- Since the EBS version is still in progress, the code is shared below. It uses a Treiber stack with backoff to an elimination array, mixing in optimizations borrowed from j.u.c.Exchanger. It performs better than the queues by not having to honor FIFO ordering, which also makes cancellation easy to achieve. While all tests pass, I think that the time-to-idle policy is corrupted, as it assumed FIFO ordering. I can make it tolerant of running out-of-order. I may try writing an elimination queue (LTQ uses a dual queue design). https://github.com/ben-manes/multiway-pool/tree/elimination Discard pooled readers for cold data Key: CASSANDRA-5661 URL: https://issues.apache.org/jira/browse/CASSANDRA-5661 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.2.1 Reporter: Jonathan Ellis Assignee: Pavel Yaskevich Fix For: 2.0.1 Attachments: CASSANDRA-5661-global-multiway-cache.patch, CASSANDRA-5661.patch, DominatorTree.png, Histogram.png Reader pooling was introduced in CASSANDRA-4942 but pooled RandomAccessReaders are never cleaned up until the SSTableReader is closed. So memory use is the worst case simultaneous RAR we had open for this file, forever. We should introduce a global limit on how much memory to use for RAR, and evict old ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
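For context on the data structure named above, a Treiber stack is simply a CAS-maintained linked list used LIFO; the elimination-array backoff and Exchanger-style optimizations are omitted here. This is a minimal sketch of the general technique, not the multiway-pool code itself.
{code:java}
import java.util.concurrent.atomic.AtomicReference;

// Minimal Treiber stack: a lock-free LIFO maintained by CAS on the head.
// The elimination-array backoff discussed above is omitted for brevity.
final class TreiberStack<E> {
  private static final class Node<E> {
    final E item;
    Node<E> next;
    Node(E item) { this.item = item; }
  }

  private final AtomicReference<Node<E>> head = new AtomicReference<>();

  void push(E item) {
    Node<E> node = new Node<>(item);
    Node<E> current;
    do {
      current = head.get();
      node.next = current;
    } while (!head.compareAndSet(current, node));
  }

  E pop() {
    Node<E> current;
    Node<E> next;
    do {
      current = head.get();
      if (current == null) {
        return null;   // empty; an object pool would create a new resource here
      }
      next = current.next;
    } while (!head.compareAndSet(current, next));
    return current.item;
  }
}
{code}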
[jira] [Commented] (CASSANDRA-5661) Discard pooled readers for cold data
[ https://issues.apache.org/jira/browse/CASSANDRA-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13714389#comment-13714389 ] Ben Manes commented on CASSANDRA-5661: -- I replaced the external handle with directly providing the resource and maintaining the association in a ThreadLocal. This should better match your usage and resolve your concern above. The primary motivation was to reduce object churn, as a handle was created per borrow. This reduced the hot spot time from an average invocation time of 1001us to 704us, when summing up the worst offenders. This may remove the random spikes that you observed if they were caused by garbage collection. 98% of the overhead is now due to usage of other collections (Guava's Cache, LTQ, CLQ). Discard pooled readers for cold data Key: CASSANDRA-5661 URL: https://issues.apache.org/jira/browse/CASSANDRA-5661 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.2.1 Reporter: Jonathan Ellis Assignee: Pavel Yaskevich Fix For: 2.0.1 Attachments: CASSANDRA-5661-multiway-per-sstable.patch, CASSANDRA-5661.patch, CASSANDRA-5661-v2-global-multiway-per-sstable.patch, DominatorTree.png, Histogram.png Reader pooling was introduced in CASSANDRA-4942 but pooled RandomAccessReaders are never cleaned up until the SSTableReader is closed. So memory use is the worst case simultaneous RAR we had open for this file, forever. We should introduce a global limit on how much memory to use for RAR, and evict old ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
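A rough sketch of the handle-free approach described above: the borrowed resource is returned directly and the pool remembers which resource the calling thread holds in a ThreadLocal, so no per-borrow handle object is allocated. The names and the toy single-queue storage are illustrative, not the pool's actual design.
{code:java}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Illustrative only: avoids allocating a handle per borrow by tracking the
// resource borrowed by the calling thread in a ThreadLocal association.
final class ThreadLocalPool<R> {
  private final Deque<R> idle = new ArrayDeque<>();   // toy storage, guarded by synchronized
  private final ThreadLocal<R> borrowed = new ThreadLocal<>();
  private final Supplier<R> factory;

  ThreadLocalPool(Supplier<R> factory) {
    this.factory = factory;
  }

  synchronized R borrow() {
    R resource = idle.poll();
    if (resource == null) {
      resource = factory.get();
    }
    borrowed.set(resource);   // remember the thread-to-resource association
    return resource;          // the resource itself is handed out, no handle object
  }

  synchronized void release() {
    R resource = borrowed.get();
    if (resource != null) {
      borrowed.remove();
      idle.push(resource);
    }
  }
}
{code}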
[jira] [Commented] (CASSANDRA-5661) Discard pooled readers for cold data
[ https://issues.apache.org/jira/browse/CASSANDRA-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13714403#comment-13714403 ] Ben Manes commented on CASSANDRA-5661: -- Switching from LTQ to a custom elimination backoff stack appears to have dropped the 98% to 179us. The single threaded benchmark improves by 30ns. A significant gain was also observed when using an EBS instead of an array of CLQs in the time-to-idle policy. I'm surprised by how much of a gain occurs, so I'll have to experiment further to understand whether it's real. LTQ/CLQ are hindered by having to honor FIFO with j.u.c. interfaces, and LIFO elimination is the ideal strategy for an object pool. The more frequent successful exchanges may keep the garbage confined to eden-space GC, resulting in major net wins. That, or I'm prematurely believing that it's working correctly. Discard pooled readers for cold data Key: CASSANDRA-5661 URL: https://issues.apache.org/jira/browse/CASSANDRA-5661 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.2.1 Reporter: Jonathan Ellis Assignee: Pavel Yaskevich Fix For: 2.0.1 Attachments: CASSANDRA-5661-multiway-per-sstable.patch, CASSANDRA-5661.patch, CASSANDRA-5661-v2-global-multiway-per-sstable.patch, DominatorTree.png, Histogram.png Reader pooling was introduced in CASSANDRA-4942 but pooled RandomAccessReaders are never cleaned up until the SSTableReader is closed. So memory use is the worst case simultaneous RAR we had open for this file, forever. We should introduce a global limit on how much memory to use for RAR, and evict old ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5661) Discard pooled readers for cold data
[ https://issues.apache.org/jira/browse/CASSANDRA-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13714543#comment-13714543 ] Ben Manes commented on CASSANDRA-5661: -- Thanks Pavel. I'm not sure why it got worse recently, except that you did turn on recordStats() in the last few runs. That can incur significant overhead by maintaining multiple LongAdders. Since you did not turn it on in the FileCache patch, which would provide similar stats, it may be an unfair comparison. I'll try to wrap up my EBS prototype and push those changes soon. Those aren't on github yet. Discard pooled readers for cold data Key: CASSANDRA-5661 URL: https://issues.apache.org/jira/browse/CASSANDRA-5661 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.2.1 Reporter: Jonathan Ellis Assignee: Pavel Yaskevich Fix For: 2.0.1 Attachments: CASSANDRA-5661-global-multiway-cache.patch, CASSANDRA-5661.patch, DominatorTree.png, Histogram.png Reader pooling was introduced in CASSANDRA-4942 but pooled RandomAccessReaders are never cleaned up until the SSTableReader is closed. So memory use is the worst case simultaneous RAR we had open for this file, forever. We should introduce a global limit on how much memory to use for RAR, and evict old ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
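For reference, the statistics switch mentioned above is opt-in on Guava's builder and is what introduces the LongAdder bookkeeping. A hedged example; the bound and loader here are placeholders, not the benchmark's configuration.
{code:java}
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.CacheStats;
import com.google.common.cache.LoadingCache;

public final class StatsExample {
  public static void main(String[] args) {
    LoadingCache<String, Integer> cache = CacheBuilder.newBuilder()
        .maximumSize(1_000)    // placeholder bound
        .recordStats()         // maintains hit/miss counters, which adds overhead
        .build(new CacheLoader<String, Integer>() {
          @Override public Integer load(String key) {
            return key.length();   // placeholder loader
          }
        });

    cache.getUnchecked("example");
    cache.getUnchecked("example");

    CacheStats stats = cache.stats();   // all zeros unless recordStats() was set
    System.out.println("hit rate: " + stats.hitRate());
  }
}
{code}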
[jira] [Commented] (CASSANDRA-5661) Discard pooled readers for cold data
[ https://issues.apache.org/jira/browse/CASSANDRA-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13714610#comment-13714610 ] Ben Manes commented on CASSANDRA-5661: -- I was able to reduce the EBS version down to 120us. I probably won't have it on github until tomorrow, though. Discard pooled readers for cold data Key: CASSANDRA-5661 URL: https://issues.apache.org/jira/browse/CASSANDRA-5661 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.2.1 Reporter: Jonathan Ellis Assignee: Pavel Yaskevich Fix For: 2.0.1 Attachments: CASSANDRA-5661-global-multiway-cache.patch, CASSANDRA-5661.patch, DominatorTree.png, Histogram.png Reader pooling was introduced in CASSANDRA-4942 but pooled RandomAccessReaders are never cleaned up until the SSTableReader is closed. So memory use is the worst case simultaneous RAR we had open for this file, forever. We should introduce a global limit on how much memory to use for RAR, and evict old ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5661) Discard pooled readers for cold data
[ https://issues.apache.org/jira/browse/CASSANDRA-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13707974#comment-13707974 ] Ben Manes commented on CASSANDRA-5661: -- In a simple single-threaded benchmark, LTQ is relatively on par with the alternatives within the object pool. Currently I have a finalizer on the handle as a safety net, both to catch my bugs and usage mistakes. This includes a note on the performance impact, which appears to have added 2.5x overhead. I had intended to replace this with phantom references instead, though now I'm wondering if I should not put any safety net in whatsoever.
# Finalizer
queueType  ns
ABQ        489
SAQ        545
CLQ        535
LBQ        578
LTQ        555
LBD        490
# No finalizer
queueType  ns
ABQ        176
SAQ        159
CLQ        166
LBQ        210
LTQ        183
LBD        181
Discard pooled readers for cold data Key: CASSANDRA-5661 URL: https://issues.apache.org/jira/browse/CASSANDRA-5661 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.2.1 Reporter: Jonathan Ellis Assignee: Pavel Yaskevich Fix For: 2.0 Attachments: CASSANDRA-5661-multiway-per-sstable.patch, CASSANDRA-5661.patch, CASSANDRA-5661-v2-global-multiway-per-sstable.patch, DominatorTree.png, Histogram.png Reader pooling was introduced in CASSANDRA-4942 but pooled RandomAccessReaders are never cleaned up until the SSTableReader is closed. So memory use is the worst case simultaneous RAR we had open for this file, forever. We should introduce a global limit on how much memory to use for RAR, and evict old ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5661) Discard pooled readers for cold data
[ https://issues.apache.org/jira/browse/CASSANDRA-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13707990#comment-13707990 ] Ben Manes commented on CASSANDRA-5661: -- Yes, I understand that, and that was documented. It was correct to add it early on during prototyping, to help catch my bugs if there were race conditions. When looking at performance it became appropriate to remove it, as tests have baked the code. The only aspect I'm grudgingly punting on is that I prefer warning developers when they have resource leaks, when possible without overhead, instead of silently letting production environments crash. This can be done with phantom references, but I dislike having libraries spawn their own threads (e.g. MapMaker did) and prefer amortizing it (e.g. CacheBuilder). There's no free hook in my pool to tie into, so I'm not providing that warning given you don't need it atm. Discard pooled readers for cold data Key: CASSANDRA-5661 URL: https://issues.apache.org/jira/browse/CASSANDRA-5661 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.2.1 Reporter: Jonathan Ellis Assignee: Pavel Yaskevich Fix For: 2.0 Attachments: CASSANDRA-5661-multiway-per-sstable.patch, CASSANDRA-5661.patch, CASSANDRA-5661-v2-global-multiway-per-sstable.patch, DominatorTree.png, Histogram.png Reader pooling was introduced in CASSANDRA-4942 but pooled RandomAccessReaders are never cleaned up until the SSTableReader is closed. So memory use is the worst case simultaneous RAR we had open for this file, forever. We should introduce a global limit on how much memory to use for RAR, and evict old ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
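The leak-warning idea mentioned above (phantom references, no dedicated cleanup thread, the cost amortized onto callers) can be sketched roughly as follows. This illustrates the general technique only; the class and method names are made up and it is not the pool's code.
{code:java}
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustration: warn about leaked (never released) pooled resources using
// phantom references, amortizing the check onto borrow() calls instead of
// spawning a dedicated cleanup thread.
final class LeakDetector<R> {
  private final ReferenceQueue<R> queue = new ReferenceQueue<>();
  private final Set<Reference<R>> pending = ConcurrentHashMap.newKeySet();

  // Called when a resource is handed out; the returned reference is cleared on release.
  Reference<R> track(R resource) {
    PhantomReference<R> ref = new PhantomReference<>(resource, queue);
    pending.add(ref);
    return ref;
  }

  void released(Reference<R> ref) {
    pending.remove(ref);
    ref.clear();
  }

  // Called opportunistically from borrow(): anything found on the queue was
  // garbage collected without ever being released back to the pool.
  void drainWarnings() {
    for (Reference<? extends R> ref; (ref = queue.poll()) != null; ) {
      pending.remove(ref);
      System.err.println("Pooled resource was garbage collected without being released");
    }
  }
}
{code}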
[jira] [Commented] (CASSANDRA-5661) Discard pooled readers for cold data
[ https://issues.apache.org/jira/browse/CASSANDRA-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13707991#comment-13707991 ] Ben Manes commented on CASSANDRA-5661: -- anyways, this is now removed so hopefully your performance tests will see a favorable impact like mine do. Discard pooled readers for cold data Key: CASSANDRA-5661 URL: https://issues.apache.org/jira/browse/CASSANDRA-5661 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.2.1 Reporter: Jonathan Ellis Assignee: Pavel Yaskevich Fix For: 2.0 Attachments: CASSANDRA-5661-multiway-per-sstable.patch, CASSANDRA-5661.patch, CASSANDRA-5661-v2-global-multiway-per-sstable.patch, DominatorTree.png, Histogram.png Reader pooling was introduced in CASSANDRA-4942 but pooled RandomAccessReaders are never cleaned up until the SSTableReader is closed. So memory use is the worst case simultaneous RAR we had open for this file, forever. We should introduce a global limit on how much memory to use for RAR, and evict old ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5661) Discard pooled readers for cold data
[ https://issues.apache.org/jira/browse/CASSANDRA-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708114#comment-13708114 ] Ben Manes commented on CASSANDRA-5661: -- Can you test without time-to-idle? Most likely there are bursts of expirations and the penalty is now better spread out. Discard pooled readers for cold data Key: CASSANDRA-5661 URL: https://issues.apache.org/jira/browse/CASSANDRA-5661 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.2.1 Reporter: Jonathan Ellis Assignee: Pavel Yaskevich Fix For: 2.0 Attachments: CASSANDRA-5661-multiway-per-sstable.patch, CASSANDRA-5661.patch, CASSANDRA-5661-v2-global-multiway-per-sstable.patch, DominatorTree.png, Histogram.png Reader pooling was introduced in CASSANDRA-4942 but pooled RandomAccessReaders are never cleaned up until the SSTableReader is closed. So memory use is the worst case simultaneous RAR we had open for this file, forever. We should introduce a global limit on how much memory to use for RAR, and evict old ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5661) Discard pooled readers for cold data
[ https://issues.apache.org/jira/browse/CASSANDRA-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708209#comment-13708209 ] Ben Manes commented on CASSANDRA-5661: -- Profiled to reduce allocations, cutting out about 10ms in a caliper benchmark. The dominating factor in a profile is reads from the cache. Discard pooled readers for cold data Key: CASSANDRA-5661 URL: https://issues.apache.org/jira/browse/CASSANDRA-5661 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.2.1 Reporter: Jonathan Ellis Assignee: Pavel Yaskevich Fix For: 2.0 Attachments: CASSANDRA-5661-multiway-per-sstable.patch, CASSANDRA-5661.patch, CASSANDRA-5661-v2-global-multiway-per-sstable.patch, DominatorTree.png, Histogram.png Reader pooling was introduced in CASSANDRA-4942 but pooled RandomAccessReaders are never cleaned up until the SSTableReader is closed. So memory use is the worst case simultaneous RAR we had open for this file, forever. We should introduce a global limit on how much memory to use for RAR, and evict old ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5661) Discard pooled readers for cold data
[ https://issues.apache.org/jira/browse/CASSANDRA-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13707667#comment-13707667 ] Ben Manes commented on CASSANDRA-5661: -- I think I just fixed this issue in my last push. Sorry I didn't check my email earlier, as I found it when writing more test cases. The problem is that I forgot to default the lifecycle to a discarding instance if not used, after I made it an optional setting. Discard pooled readers for cold data Key: CASSANDRA-5661 URL: https://issues.apache.org/jira/browse/CASSANDRA-5661 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.2.1 Reporter: Jonathan Ellis Assignee: Pavel Yaskevich Fix For: 2.0 Attachments: CASSANDRA-5661-multiway-per-sstable.patch, CASSANDRA-5661.patch, CASSANDRA-5661-v2-global-multiway-per-sstable.patch, DominatorTree.png, Histogram.png Reader pooling was introduced in CASSANDRA-4942 but pooled RandomAccessReaders are never cleaned up until the SSTableReader is closed. So memory use is the worst case simultaneous RAR we had open for this file, forever. We should introduce a global limit on how much memory to use for RAR, and evict old ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5661) Discard pooled readers for cold data
[ https://issues.apache.org/jira/browse/CASSANDRA-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13707672#comment-13707672 ] Ben Manes commented on CASSANDRA-5661: -- Okay, fixed. Thanks for catching this. The tests no longer use raw keys, which should prevent this from occurring again. Discard pooled readers for cold data Key: CASSANDRA-5661 URL: https://issues.apache.org/jira/browse/CASSANDRA-5661 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.2.1 Reporter: Jonathan Ellis Assignee: Pavel Yaskevich Fix For: 2.0 Attachments: CASSANDRA-5661-multiway-per-sstable.patch, CASSANDRA-5661.patch, CASSANDRA-5661-v2-global-multiway-per-sstable.patch, DominatorTree.png, Histogram.png Reader pooling was introduced in CASSANDRA-4942 but pooled RandomAccessReaders are never cleaned up until the SSTableReader is closed. So memory use is the worst case simultaneous RAR we had open for this file, forever. We should introduce a global limit on how much memory to use for RAR, and evict old ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5661) Discard pooled readers for cold data
[ https://issues.apache.org/jira/browse/CASSANDRA-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13707689#comment-13707689 ] Ben Manes commented on CASSANDRA-5661: -- LTQ is best when you allow there to be some spin between producers and consumers, as it's optimized for message-passing scenarios. In your usage you don't allow any delay, so the likelihood of a successful transfer is low. When transfers are common, the overhead is less due to fewer contended CAS operations. If desired, I can parameterize the pool to take a supplier of queues so you can configure that as well. The pool will always be slower than the FileCacheService patch, since it does more. The decision is whether the performance degradation is acceptable and whether the rationale for the pool, providing a finer grained eviction policy, is still desired. Discard pooled readers for cold data Key: CASSANDRA-5661 URL: https://issues.apache.org/jira/browse/CASSANDRA-5661 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.2.1 Reporter: Jonathan Ellis Assignee: Pavel Yaskevich Fix For: 2.0 Attachments: CASSANDRA-5661-multiway-per-sstable.patch, CASSANDRA-5661.patch, CASSANDRA-5661-v2-global-multiway-per-sstable.patch, DominatorTree.png, Histogram.png Reader pooling was introduced in CASSANDRA-4942 but pooled RandomAccessReaders are never cleaned up until the SSTableReader is closed. So memory use is the worst case simultaneous RAR we had open for this file, forever. We should introduce a global limit on how much memory to use for RAR, and evict old ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
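The parameterization suggested above is a small change. A hedged sketch of what such a builder hook could look like; the class and method names are illustrative, not the actual pool API.
{code:java}
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedTransferQueue;
import java.util.function.Supplier;

// Illustrative builder hook: let callers choose the queue implementation used
// for idle resources, e.g. LinkedTransferQueue vs ConcurrentLinkedQueue.
final class MultiwayPoolBuilder<R> {
  private Supplier<Queue<R>> queueSupplier = ConcurrentLinkedQueue::new;

  MultiwayPoolBuilder<R> queueSupplier(Supplier<Queue<R>> supplier) {
    this.queueSupplier = supplier;
    return this;
  }

  Queue<R> newTransferQueue() {
    return queueSupplier.get();
  }
}

// Usage (hypothetical): new MultiwayPoolBuilder<Object>().queueSupplier(LinkedTransferQueue::new);
{code}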
[jira] [Commented] (CASSANDRA-5661) Discard pooled readers for cold data
[ https://issues.apache.org/jira/browse/CASSANDRA-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13707652#comment-13707652 ] Ben Manes commented on CASSANDRA-5661: -- I rewrote the time-to-idle policy, so it should be faster when enabled. Details (if interested) For prototyping purposes, I previously used a secondary Guava Cache to track idle resources. Unlike a cache's time-to-idle, which is reset when an entry is read, an object pool's concept of idle time is when a resource resides unused and ready to be borrowed. The use of a secondary Guava Cache meant that the resource had to be added and removed frequently, resulting in locking on the hashtable segments and incurring other maintenance overhead. In Guava's cache we observed that expiration policies mirrored maximum size policies, but time based. Thus time-to-live is a FIFO queue and time-to-idle is an LRU queue. That let us leverage the amortization technique in CLHM to be used for expiration with O(1) reorder costs. The new implementation strips off the unnecessary work by maintaining a time ordered queue that only supports adds and removals. For our definition of idle there is no need to reorder so it is effectively a FIFO. A tryLock guards the policy operations, draining a queue of pending operations if acquired. I decided to allow this to be proactively drained whenever possible, though if we see a need then we can buffer the operations for longer like the caches do. Discard pooled readers for cold data Key: CASSANDRA-5661 URL: https://issues.apache.org/jira/browse/CASSANDRA-5661 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.2.1 Reporter: Jonathan Ellis Assignee: Pavel Yaskevich Fix For: 2.0 Attachments: CASSANDRA-5661-multiway-per-sstable.patch, CASSANDRA-5661.patch, DominatorTree.png, Histogram.png Reader pooling was introduced in CASSANDRA-4942 but pooled RandomAccessReaders are never cleaned up until the SSTableReader is closed. So memory use is the worst case simultaneous RAR we had open for this file, forever. We should introduce a global limit on how much memory to use for RAR, and evict old ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
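A minimal sketch of the amortized idle-expiration structure described in the comment above: a time-ordered FIFO of idle resources plus a buffer of pending operations that whoever wins the tryLock drains. The names and the resource type are placeholders, not the pool's code.
{code:java}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantLock;

// Illustration of the idle policy described above: resources enter a
// time-ordered FIFO when returned to the pool, operations are buffered, and
// the thread that wins the tryLock drains the buffer and expires old entries.
final class IdlePolicy<R> {
  private static final class IdleEntry<R> {
    final R resource;
    final long idleSinceNanos;
    IdleEntry(R resource, long idleSinceNanos) {
      this.resource = resource;
      this.idleSinceNanos = idleSinceNanos;
    }
  }

  private final Queue<Runnable> pending = new ConcurrentLinkedQueue<>();
  private final Deque<IdleEntry<R>> idleFifo = new ArrayDeque<>();   // guarded by lock
  private final ReentrantLock lock = new ReentrantLock();
  private final long expireAfterNanos;

  IdlePolicy(long expireAfterNanos) {
    this.expireAfterNanos = expireAfterNanos;
  }

  void onRelease(R resource) {
    long now = System.nanoTime();
    pending.add(() -> idleFifo.addLast(new IdleEntry<>(resource, now)));
    tryDrain();
  }

  private void tryDrain() {
    if (!lock.tryLock()) {
      return;   // another thread is already performing the maintenance
    }
    try {
      for (Runnable op; (op = pending.poll()) != null; ) {
        op.run();
      }
      long now = System.nanoTime();
      for (IdleEntry<R> head; (head = idleFifo.peekFirst()) != null; ) {
        if (now - head.idleSinceNanos < expireAfterNanos) {
          break;   // FIFO order: everything behind the head is newer
        }
        idleFifo.pollFirst();   // expired; a real pool would also discard the resource
      }
    } finally {
      lock.unlock();
    }
  }
}
{code}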
[jira] [Commented] (CASSANDRA-5661) Discard pooled readers for cold data
[ https://issues.apache.org/jira/browse/CASSANDRA-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13704086#comment-13704086 ] Ben Manes commented on CASSANDRA-5661: -- I think part of the problem is that idle caching is not overly efficient in this version. That can be improved upon, but maximum size might be better to verify as a baseline first. Weights are supported as of the 7th. Discard pooled readers for cold data Key: CASSANDRA-5661 URL: https://issues.apache.org/jira/browse/CASSANDRA-5661 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.2.1 Reporter: Jonathan Ellis Assignee: Pavel Yaskevich Fix For: 2.0 Attachments: CASSANDRA-5661-multiway-per-sstable.patch, CASSANDRA-5661.patch, DominatorTree.png, Histogram.png Reader pooling was introduced in CASSANDRA-4942 but pooled RandomAccessReaders are never cleaned up until the SSTableReader is closed. So memory use is the worst case simultaneous RAR we had open for this file, forever. We should introduce a global limit on how much memory to use for RAR, and evict old ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5661) Discard pooled readers for cold data
[ https://issues.apache.org/jira/browse/CASSANDRA-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13701530#comment-13701530 ] Ben Manes commented on CASSANDRA-5661: -- Weights are trivial to add and I wanted to avoid adding non-critical features without more details. In your patch, it appears to be assumed that every queue has a single entry with the same size buffer, and privately Jonathan's description of the problem stated 128KB per CRAR. If the weight is constant then weights are merely a convenience mapping, as it really is the number of entries. Uncontended CAS is cheap, short lived allocations are trivial, and Doug Lea describes LTQ as lower overhead than CLQ (especially under load). The use of weak references was an easy way to avoid race conditions when flushing out the primary structure. It could be replaced with lazy clean-up passes, which is what I originally started with. At this point it seemed unwise to complicate things without more information so I simplified it. The number of queues is probably going to be quite small, on the order of dozens, so the reference cost in this case is quite small. You're trying to compare approaches, which is valid but better oriented towards discussing with Jonathan. The challenge presented to me is as described in the class's JavaDoc: a multiway object pool bounded by the total number of entries. I took a more general approach due to not knowing the trade-offs one could make given context about Cassandra's behavior. Discard pooled readers for cold data Key: CASSANDRA-5661 URL: https://issues.apache.org/jira/browse/CASSANDRA-5661 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.2.1 Reporter: Jonathan Ellis Assignee: Pavel Yaskevich Fix For: 1.2.7 Attachments: CASSANDRA-5661.patch, DominatorTree.png, Histogram.png Reader pooling was introduced in CASSANDRA-4942 but pooled RandomAccessReaders are never cleaned up until the SSTableReader is closed. So memory use is the worst case simultaneous RAR we had open for this file, forever. We should introduce a global limit on how much memory to use for RAR, and evict old ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5661) Discard pooled readers for cold data
[ https://issues.apache.org/jira/browse/CASSANDRA-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13701544#comment-13701544 ] Ben Manes commented on CASSANDRA-5661: -- I solved the LRU problem years ago, which gave you CLHM and Guava's Cache. It scales very well without degrading due to LRU management under higher thread count, limited primarily by the hash table usage. Previous approaches didn't scale to 4-8 threads, but 32+ is limited by the chosen hash table design. In neither approach will there be significant contention or overhead. The difference is about the level of granularity to bound the resources by and how to evict them. You seem to be focusing on tuning parameters, minute details, etc. for a class written in a few evenings as a favor, knowing that those things are trivial to change. There's not much of a point debating it with me as I don't care and have no stake or interest in what is decided. Especially when you're comparing it against a simplistic usage relying on another class I wrote much of, Guava's. In the end something I wrote will be used to solve this bug. ;) Discard pooled readers for cold data Key: CASSANDRA-5661 URL: https://issues.apache.org/jira/browse/CASSANDRA-5661 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.2.1 Reporter: Jonathan Ellis Assignee: Pavel Yaskevich Fix For: 1.2.7 Attachments: CASSANDRA-5661.patch, DominatorTree.png, Histogram.png Reader pooling was introduced in CASSANDRA-4942 but pooled RandomAccessReaders are never cleaned up until the SSTableReader is closed. So memory use is the worst case simultaneous RAR we had open for this file, forever. We should introduce a global limit on how much memory to use for RAR, and evict old ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5661) Discard pooled readers for cold data
[ https://issues.apache.org/jira/browse/CASSANDRA-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13701495#comment-13701495 ] Ben Manes commented on CASSANDRA-5661: -- "Rarely explicitly invalidated" is in regards to a cache, as Jonathan originally described the problem as a multimap cache instead of as an object pool. He also expressed concern with evicting a block of buffers at once when he conceived of the same model that you implemented. I am intimately familiar with Guava's cache as I designed the algorithms, ported and wrote code for it, and advised on the api. Unfortunately I am not familiar with Cassandra's needs and its code, so the pool was implemented based on a brief description of the problem and ideal behavior. It was a fun exercise for a long weekend. I'd recommend writing tests and benchmarks, which unfortunately appear to be missing from the patch in its current form. Of course, use whatever makes the most sense. Discard pooled readers for cold data Key: CASSANDRA-5661 URL: https://issues.apache.org/jira/browse/CASSANDRA-5661 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.2.1 Reporter: Jonathan Ellis Assignee: Pavel Yaskevich Fix For: 1.2.7 Attachments: CASSANDRA-5661.patch, DominatorTree.png, Histogram.png Reader pooling was introduced in CASSANDRA-4942 but pooled RandomAccessReaders are never cleaned up until the SSTableReader is closed. So memory use is the worst case simultaneous RAR we had open for this file, forever. We should introduce a global limit on how much memory to use for RAR, and evict old ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Comment Edited] (CASSANDRA-5661) Discard pooled readers for cold data
[ https://issues.apache.org/jira/browse/CASSANDRA-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13701495#comment-13701495 ] Ben Manes edited comment on CASSANDRA-5661 at 7/7/13 3:35 AM: -- "Rarely explicitly invalidated" is in regards to a cache, as Jonathan originally described the problem as a multimap cache instead of as an object pool. He also expressed concern with evicting a block of buffers at once when he conceived of the same model that you implemented. I am intimately familiar with Guava's cache as I designed the algorithms, ported and wrote code for it, and advised on the api. Unfortunately I am not familiar with Cassandra's needs and its code, so the pool was implemented based on a brief description of the problem and ideal behavior. It was a fun exercise for a long weekend. I'd recommend writing tests and benchmarks, which unfortunately appear to be missing from the patch in its current form. Of course, use whatever makes the most sense. was (Author: ben.manes): rarely explicitly invalidated is regards to a cache, as Jonathan originally described the problem as a multimap cache instead of as an object pool. He also expressed concern with evicting a block of buffers at once when he conceived of the same model that you implemented. I am intimately familiar with Guava's cache as I designed the algorithms, ported and wrote code for it, and advised on the api. Unfortunately I am not familiar with Cassandra's needs and its code, so the pool was implemented based on a brief description of the problem and ideal behavior. It was a fun exercise for a long weekend. I'd recommend writing tests and benchmarks, which unfortunately appears to be missing with the patch in its current form. Of couse use whatever makes the most sense. Discard pooled readers for cold data Key: CASSANDRA-5661 URL: https://issues.apache.org/jira/browse/CASSANDRA-5661 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.2.1 Reporter: Jonathan Ellis Assignee: Pavel Yaskevich Fix For: 1.2.7 Attachments: CASSANDRA-5661.patch, DominatorTree.png, Histogram.png Reader pooling was introduced in CASSANDRA-4942 but pooled RandomAccessReaders are never cleaned up until the SSTableReader is closed. So memory use is the worst case simultaneous RAR we had open for this file, forever. We should introduce a global limit on how much memory to use for RAR, and evict old ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2681) Upgrade ConcurrentLinkedHashMap (v1.2)
[ https://issues.apache.org/jira/browse/CASSANDRA-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13039262#comment-13039262 ] Ben Manes commented on CASSANDRA-2681: -- Yep, it's a hack. I added it to the list of pending tasks for considering in the next version of the CLHM. The only concern with allowing the backing map to be specified is if the caller begins operating on it directly. The decorator may actually be tolerant of that abuse, but it was never a consideration. Those manipulations would be obviously hacky, though, so it's probably not worth worrying about. Upgrade ConcurrentLinkedHashMap (v1.2) -- Key: CASSANDRA-2681 URL: https://issues.apache.org/jira/browse/CASSANDRA-2681 Project: Cassandra Issue Type: Task Components: Core Reporter: Ben Manes Assignee: Jonathan Ellis Priority: Minor Fix For: 1.0 This release has numerous performance improvements. See the change log for details. It also includes a few useful features that may be of interest, - Snapshot iteration in order of hotness (CASSANDRA-1966) - Optionally defer LRU maintenance penalty to a background executor (instead of amortized on caller threads) (Note that this setting is not advised if write storms are not rate limited, since it defers eviction until the executor runs.) http://code.google.com/p/concurrentlinkedhashmap/ http://code.google.com/p/concurrentlinkedhashmap/wiki/ExampleUsage Verified compatibility with NonBlockingHashMap. Cassandra may want to consider adding the java_util_concurrent_chm.jar to the bootclasspath to swap all CHM usages with NBHM (CLHM is a decorator on top of CHM). http://high-scale-lib.cvs.sourceforge.net/viewvc/high-scale-lib/high-scale-lib/README -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (CASSANDRA-2681) Upgrade ConcurrentLinkedHashMap (v1.2)
Upgrade ConcurrentLinkedHashMap (v1.2) -- Key: CASSANDRA-2681 URL: https://issues.apache.org/jira/browse/CASSANDRA-2681 Project: Cassandra Issue Type: Task Reporter: Ben Manes Priority: Minor This release has numerous performance improvements. See the change log for details. It also includes a few useful features that may be of interest, - Snapshot iteration in order of hotness (CASSANDRA-1966) - Optionally defer LRU maintenance penalty to a background executor (instead of amortized on caller threads) - Note that this setting is not advised if write storms are not rate limited, since it defers eviction until the executor runs. http://code.google.com/p/concurrentlinkedhashmap/ http://code.google.com/p/concurrentlinkedhashmap/wiki/ExampleUsage Verified compatibility with NonBlockingHashMap. Cassandra may want to consider adding the java_util_concurrent_chm.jar to the bootclasspath to swap all CHM usages with NBHM (CLHM is a decorator on top of CHM). http://high-scale-lib.cvs.sourceforge.net/viewvc/high-scale-lib/high-scale-lib/README -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-2681) Upgrade ConcurrentLinkedHashMap (v1.2)
[ https://issues.apache.org/jira/browse/CASSANDRA-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Manes updated CASSANDRA-2681: - Description: This release has numerous performance improvements. See the change log for details. It also includes a few useful features that may be of interest, - Snapshot iteration in order of hotness (CASSANDRA-1966) - Optionally defer LRU maintenance penalty to a background executor (instead of amortized on caller threads) (Note that this setting is not advised if write storms are not rate limited, since it defers eviction until the executor runs.) http://code.google.com/p/concurrentlinkedhashmap/ http://code.google.com/p/concurrentlinkedhashmap/wiki/ExampleUsage Verified compatibility with NonBlockingHashMap. Cassandra may want to consider adding the java_util_concurrent_chm.jar to the bootclasspath to swap all CHM usages with NBHM (CLHM is a decorator on top of CHM). http://high-scale-lib.cvs.sourceforge.net/viewvc/high-scale-lib/high-scale-lib/README was: This release has numerous performance improvements. See the change log for details. It also includes a few useful features that may be of interest, - Snapshot iteration in order of hotness (CASSANDRA-1966) - Optionally defer LRU maintenance penalty to a background executor (instead of amortized on caller threads) - Note that this setting is not advised if write storms are not rate limited, since it defers eviction until the executor runs. http://code.google.com/p/concurrentlinkedhashmap/ http://code.google.com/p/concurrentlinkedhashmap/wiki/ExampleUsage Verified compatibility with NonBlockingHashMap. Cassandra may want to consider adding the java_util_concurrent_chm.jar to the bootclasspath to swap all CHM usages with NBHM (CLHM is a decorator on top of CHM). http://high-scale-lib.cvs.sourceforge.net/viewvc/high-scale-lib/high-scale-lib/README Upgrade ConcurrentLinkedHashMap (v1.2) -- Key: CASSANDRA-2681 URL: https://issues.apache.org/jira/browse/CASSANDRA-2681 Project: Cassandra Issue Type: Task Reporter: Ben Manes Priority: Minor This release has numerous performance improvements. See the change log for details. It also includes a few useful features that may be of interest, - Snapshot iteration in order of hotness (CASSANDRA-1966) - Optionally defer LRU maintenance penalty to a background executor (instead of amortized on caller threads) (Note that this setting is not advised if write storms are not rate limited, since it defers eviction until the executor runs.) http://code.google.com/p/concurrentlinkedhashmap/ http://code.google.com/p/concurrentlinkedhashmap/wiki/ExampleUsage Verified compatibility with NonBlockingHashMap. Cassandra may want to consider adding the java_util_concurrent_chm.jar to the bootclasspath to swap all CHM usages with NBHM (CLHM is a decorator on top of CHM). http://high-scale-lib.cvs.sourceforge.net/viewvc/high-scale-lib/high-scale-lib/README -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (CASSANDRA-1966) Option to control how many items are read on cache load
[ https://issues.apache.org/jira/browse/CASSANDRA-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12980552#action_12980552 ] Ben Manes commented on CASSANDRA-1966: -- This could be added to CLHM if Cassandra decides it's valuable enough and opens a ticket against the library. The current implementation uses an unspecified order since this is cheap and the common case. Additional ordered API methods could be added. These would be more expensive, as they would require traversing the LRU chain to perform a copy and would be a blocking operation, but they would not affect read/write operations. This example would be a fair usage of, and justification for, ordered iteration. It's a trivial change, but it's an enhancement I've avoided eagerly performing until a project considers it a worthwhile feature. If Cassandra's devs think so, then they can add a feature request. Option to control how many items are read on cache load --- Key: CASSANDRA-1966 URL: https://issues.apache.org/jira/browse/CASSANDRA-1966 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Chris Burroughs CASSANDRA-1417 added an option to save the key and/or row cache keys, which is cool. However, for a large row cache it can take a long time to read all of the rows. For example, I have a 400,000 item row cache, and loading that on restart takes a little under an hour. In addition to configuring the size of the row cache, and how often it should be saved to disk, I propose an option to control how many items are loaded on startup (or alternatively only saving n items out of the full row cache to begin with). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
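To make the ordered-iteration idea concrete, here is a minimal sketch of saving only the hottest N keys using CLHM's hotness-ordered snapshot. The Builder, maximumWeightedCapacity, and descendingKeySetWithLimit names are recalled from the v1.x API and may differ slightly, and the RowKey/Row types are stand-ins rather than Cassandra's actual types.

{code:java}
import com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap;
import java.util.Set;

public class RowCacheSnapshotSketch {
  // Stand-ins for Cassandra's cache key and value types.
  static final class RowKey {}
  static final class Row {}

  private final ConcurrentLinkedHashMap<RowKey, Row> rowCache =
      new ConcurrentLinkedHashMap.Builder<RowKey, Row>()
          .maximumWeightedCapacity(400_000)     // matches the 400,000-item example above
          .build();

  /** Save only the hottest {@code limit} keys instead of the full cache. */
  Set<RowKey> keysToSave(int limit) {
    // Hottest-first snapshot; a blocking copy, but reads/writes are unaffected.
    return rowCache.descendingKeySetWithLimit(limit);
  }
}
{code}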
[jira] Commented: (CASSANDRA-975) explore upgrading CLHM
[ https://issues.apache.org/jira/browse/CASSANDRA-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12928130#action_12928130 ] Ben Manes commented on CASSANDRA-975: - Released. Sorry for the delay (and for my tired, incoherent ramblings in the last post ;-) v1.1 is available for JDK5 and JDK6 in the download section or through Maven. explore upgrading CLHM -- Key: CASSANDRA-975 URL: https://issues.apache.org/jira/browse/CASSANDRA-975 Project: Cassandra Issue Type: Task Reporter: Jonathan Ellis Assignee: Matthew F. Dennis Priority: Minor Fix For: 0.8 Attachments: 0001-trunk-975.patch, clhm_test_results.txt, insertarator.py, readarator.py The new version should be substantially better on large caches where many entries were read, which is exactly what you see in our row and key caches. http://code.google.com/p/concurrentlinkedhashmap/issues/detail?id=9 Hopefully we can get Digg to help test, since they could reliably break CLHM when it was buggy. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CASSANDRA-975) explore upgrading CLHM
[ https://issues.apache.org/jira/browse/CASSANDRA-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12926815#action_12926815 ] Ben Manes commented on CASSANDRA-975: - Let's give this another shot. I have finalized v1.1 and am working on providing the release JAR. As before, a stress test may not show improved performance, since it does not demonstrate real-world behavior. The improvements in the official release provide more consistent behavior, without the degradation scenarios seen previously. Unlike the Second Chance implementation currently in use, it supports non-blocking behavior for concurrent writes to different segments. The improvements to provide a stricter LRU may have a slight overhead, but due to the non-blocking nature it shouldn't be significant in practice. This version should also reduce the memory overhead compared to v1.0. I plan to take another stab at the LIRS eviction policy soon. This would again not be noticeable in an artificial stress test, but it would improve the hit rate (thereby improving average performance in real-world usage). Like LRU it can be performed cheaply in O(1) time, but it's more difficult to implement. I think I feel comfortable enough to give it another shot. Alternatively, Google Guava will provide a variant of this concurrent LRU algorithm in MapMaker (r08). It is currently being used internally by early adopters to wring out the bugs (admittedly we've had a few). The advantage is that it supports additional configuration options (expiration, soft references, memoization) and will be more widely supported. The disadvantage is that it does not provide dynamic resizing (which Cassandra exposes, I believe) and uses per-segment LRU chains. The per-segment nature has some interesting algorithmic trade-offs, but most likely that isn't of concern to users. It also has the backing of Google. explore upgrading CLHM -- Key: CASSANDRA-975 URL: https://issues.apache.org/jira/browse/CASSANDRA-975 Project: Cassandra Issue Type: Task Reporter: Jonathan Ellis Assignee: Matthew F. Dennis Priority: Minor Fix For: 0.8 Attachments: 0001-trunk-975.patch, clhm_test_results.txt, insertarator.py, readarator.py The new version should be substantially better on large caches where many entries were read, which is exactly what you see in our row and key caches. http://code.google.com/p/concurrentlinkedhashmap/issues/detail?id=9 Hopefully we can get Digg to help test, since they could reliably break CLHM when it was buggy. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
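For context on the MapMaker alternative mentioned above, here is a minimal sketch using Guava's later CacheBuilder API (which superseded MapMaker's cache features). The thresholds and the loader are illustrative placeholders, not a recommendation for Cassandra.

{code:java}
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.concurrent.TimeUnit;

public class GuavaCacheSketch {
  // Size bound, expiration, soft values, and memoization in one builder: roughly the
  // configuration options described for MapMaker r08. Note the size is fixed at build
  // time; there is no dynamic resizing, which is the disadvantage called out above.
  private final LoadingCache<String, byte[]> cache = CacheBuilder.newBuilder()
      .maximumSize(400_000)
      .expireAfterAccess(10, TimeUnit.MINUTES)
      .softValues()
      .build(new CacheLoader<String, byte[]>() {
        @Override public byte[] load(String key) {
          return readRowFromDisk(key);           // hypothetical miss handler
        }
      });

  private byte[] readRowFromDisk(String key) {
    return new byte[0];                          // placeholder for an SSTable read
  }
}
{code}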
[jira] Commented: (CASSANDRA-975) explore upgrading CLHM
[ https://issues.apache.org/jira/browse/CASSANDRA-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878723#action_12878723 ] Ben Manes commented on CASSANDRA-975: - In CHM, the concurrency level is the number of concurrent writer threads (sharded by segments), so 64 is quite high. In CLHM, there is a read queue per segment to buffer LRU reorder operations. The buffer is drained when it exceeds a threshold (64 entries) or a write occurs. This probably accounts for the additional memory overhead, as there is a maximum of 64x64 = 4,096 entries that may be buffered. The reason it is per segment is that reads are by far the common case, so if there are a large number of writers there should be far more readers. So the increased number of buffers avoids contending on the queue but adds memory overhead. A flaw in v1.0 is that only one segment's buffer is drained at a time. Ideally, when the lock is acquired (to drain the buffer, apply writes, and optionally evict) it should drain all of the read buffers. This would reduce memory and improve the hit rate, which would probably resolve the regression issues. This was fixed in /trunk, but it can be back-ported. I also think that I can improve the write performance by removing the lock. This is needed to maintain a strict write ordering on the write queue to match the segment's, but I believe that with a little magic I can remove that constraint. That would give a minor boost. I'd be happy to work on a v1.1-LRU release that focuses on performance tuning. explore upgrading CLHM -- Key: CASSANDRA-975 URL: https://issues.apache.org/jira/browse/CASSANDRA-975 Project: Cassandra Issue Type: Task Reporter: Jonathan Ellis Assignee: Matthew F. Dennis Priority: Minor Fix For: 0.8 Attachments: 0001-trunk-975.patch, clhm_test_results.txt, insertarator.py, readarator.py The new version should be substantially better on large caches where many entries were read, which is exactly what you see in our row and key caches. http://code.google.com/p/concurrentlinkedhashmap/issues/detail?id=9 Hopefully we can get Digg to help test, since they could reliably break CLHM when it was buggy. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
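As a rough illustration of the per-segment read buffer described above (a simplified sketch, not CLHM's actual code), each segment can record reads into a bounded buffer and replay them against the LRU chain under the eviction lock once a threshold is reached or a write occurs; the threshold of 64 mirrors the number quoted in the comment.

{code:java}
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Simplified sketch of a per-segment read buffer for deferring LRU reorders.
final class SegmentReadBuffer<K> {
  private static final int DRAIN_THRESHOLD = 64;

  private final ConcurrentLinkedQueue<K> buffer = new ConcurrentLinkedQueue<>();
  private final AtomicInteger pending = new AtomicInteger();
  private final ReentrantLock evictionLock = new ReentrantLock();

  /** Called after a successful read; cheap and non-blocking for the reader. */
  void recordRead(K key) {
    buffer.add(key);
    if (pending.incrementAndGet() >= DRAIN_THRESHOLD) {
      drain();
    }
  }

  /** Also invoked on writes, so a write always catches the policy up. */
  void drain() {
    if (evictionLock.tryLock()) {            // skip if another thread is already draining
      try {
        K key;
        while ((key = buffer.poll()) != null) {
          pending.decrementAndGet();
          moveToLruTail(key);                // reorder in the LRU chain (stubbed out)
        }
      } finally {
        evictionLock.unlock();
      }
    }
  }

  private void moveToLruTail(K key) {
    // Placeholder: a real implementation relinks the entry's LRU node here.
  }
}
{code}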
[jira] Commented: (CASSANDRA-975) explore upgrading CLHM
[ https://issues.apache.org/jira/browse/CASSANDRA-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878822#action_12878822 ] Ben Manes commented on CASSANDRA-975: - I added a ticket on my side to track this. Feel free to suggest optimization ideas. http://code.google.com/p/concurrentlinkedhashmap/issues/detail?id=17 explore upgrading CLHM -- Key: CASSANDRA-975 URL: https://issues.apache.org/jira/browse/CASSANDRA-975 Project: Cassandra Issue Type: Task Reporter: Jonathan Ellis Assignee: Matthew F. Dennis Priority: Minor Fix For: 0.8 Attachments: 0001-trunk-975.patch, clhm_test_results.txt, insertarator.py, readarator.py The new version should be substantially better on large caches where many entries were read, which is exactly what you see in our row and key caches. http://code.google.com/p/concurrentlinkedhashmap/issues/detail?id=9 Hopefully we can get Digg to help test, since they could reliably break CLHM when it was buggy. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (CASSANDRA-975) explore upgrading CLHM
[ https://issues.apache.org/jira/browse/CASSANDRA-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12877948#action_12877948 ] Ben Manes commented on CASSANDRA-975: - I've been thinking over the test and I don't think that the performance numbers should be given much weight. It is an excellent stress test, though, in that it shows the interleaving of operations does not cause a system failure. As a performance test, it primarily indicates that there is no added bottleneck. Writing a robust benchmark is extremely difficult, especially on the JVM. For now I've been punting by using JBoss' benchmark, but I haven't evaluated its correctness to determine how valid it is. At some point I should write my own, but that's a non-trivial undertaking. JBoss' test only shows the per-operation overhead, which is nearly equivalent to a ConcurrentHashMap (which I decorate). It does not take into account the system performance impact of a cache miss, so a higher hit rate but slower per-operation execution may result in much better system performance. Example concerns with this test from a benchmark perspective include: (1) The test environment does not reflect production (laptop). (2) JVM parameters and the OS are not tuned (e.g. GC algorithm). (3) A short-running test does not show whether there is degradation or failures over time. (4) The working set does not reflect production usage (random - should use trace data). (5) The hit rate has a dramatic impact (miss penalty = I/O), so the test may artificially favor one eviction policy. It would be nice to see how the hit rate compares between the two implementations. I suspect that in Matthew's test the SecondChance hit rate is a tad better, so the fewer I/O calls to a slow laptop disk can account for much of the difference. If LIRS were stable, it would probably show much faster system performance due to a 5-10% higher hit rate. explore upgrading CLHM -- Key: CASSANDRA-975 URL: https://issues.apache.org/jira/browse/CASSANDRA-975 Project: Cassandra Issue Type: Task Reporter: Jonathan Ellis Assignee: Matthew F. Dennis Priority: Minor Fix For: 0.8 Attachments: 0001-trunk-975.patch, clhm_test_results.txt, insertarator.py, readarator.py The new version should be substantially better on large caches where many entries were read, which is exactly what you see in our row and key caches. http://code.google.com/p/concurrentlinkedhashmap/issues/detail?id=9 Hopefully we can get Digg to help test, since they could reliably break CLHM when it was buggy. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
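As a small, hedged illustration of measuring hit rate rather than raw throughput, one could wrap any map-backed cache and count hits and misses while replaying a recorded key trace; the trace source, cache interface, and loader here are hypothetical and not tied to the attached test scripts.

{code:java}
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Function;

// Minimal hit-rate probe: replay a recorded key trace and report hits / lookups.
final class HitRateProbe<K, V> {
  private final LongAdder hits = new LongAdder();
  private final LongAdder lookups = new LongAdder();

  /** Replays the trace, loading through the (hypothetical) loader on a miss. */
  void replay(List<K> trace, Map<K, V> cache, Function<K, V> loader) {
    for (K key : trace) {
      lookups.increment();
      V value = cache.get(key);
      if (value != null) {
        hits.increment();
      } else {
        cache.put(key, loader.apply(key));   // the miss penalty path (e.g. disk I/O)
      }
    }
  }

  double hitRate() {
    long n = lookups.sum();
    return n == 0 ? 0.0 : (double) hits.sum() / n;
  }
}
{code}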
[jira] Commented: (CASSANDRA-975) explore upgrading CLHM
[ https://issues.apache.org/jira/browse/CASSANDRA-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12877618#action_12877618 ] Ben Manes commented on CASSANDRA-975: - That's understandable, but I was hoping it would be better. The new version has the advantage of avoiding degradation scenarios that could affect Cassandra due to its large caches. It also makes an improved eviction policy (LIRS) possible in the future, which could make up for this slightly lower performance by increasing the hit rate. That work is still experimental, though, and I've been too busy to put much effort into it. The older version is in SECOND_CHANCE mode, which does no work on a read (it sets a volatile flag per entry), but on a write it iterates over the FIFO queue to evict the first entry without that flag set. Any entry visited that had the flag set has it unset and is resubmitted to the tail of the queue. This allows reads to avoid locking, but it can result in flushing the queue if all the entries were marked. That's an O(n) operation and writes are blocking, but for small application caches (100-1000 avg) it isn't bad. For large caches (1M+), that could be noticeable and unacceptable. I would also suspect that the hit rate of the policy would degrade with the cache's size, given that it may not follow the rule of thumb that only 10-20% is hot, as is normal in applications. This may have been reported in Cassandra (http://comments.gmane.org/gmane.comp.db.cassandra.user/3039). The new design trades slightly higher memory usage and an amortized penalty for performing LRU reordering in exchange for (mostly) non-blocking writes without a degradation scenario. The memory usage and penalty could probably be improved by tweaking the heuristics (currently just magic numbers). In a dedicated server like Cassandra, we could also experiment with using ThreadLocal reorder buffers vs. per-segment ones, which would avoid contention on the buffer if that was found to be an issue. With help, I can probably bring it on par if there is concern about the slightly slower results. For now, it's probably a choice between potentially suffering a degradation scenario vs. slightly slower cache operations. explore upgrading CLHM -- Key: CASSANDRA-975 URL: https://issues.apache.org/jira/browse/CASSANDRA-975 Project: Cassandra Issue Type: Task Reporter: Jonathan Ellis Assignee: Matthew F. Dennis Priority: Minor Fix For: 0.8 Attachments: 0001-trunk-975.patch, clhm_test_results.txt, insertarator.py, readarator.py The new version should be substantially better on large caches where many entries were read, which is exactly what you see in our row and key caches. http://code.google.com/p/concurrentlinkedhashmap/issues/detail?id=9 Hopefully we can get Digg to help test, since they could reliably break CLHM when it was buggy. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
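For readers unfamiliar with the policy, here is a minimal, hedged sketch of the Second Chance eviction described above (an illustration of the algorithm, not the code actually in use): reads set a per-entry flag without locking, and a write scans the FIFO queue, giving flagged entries a second chance by re-queuing them.

{code:java}
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative Second Chance eviction over a FIFO queue of cache entries.
final class SecondChanceQueue<K> {
  static final class Node<K> {
    final K key;
    final AtomicBoolean referenced = new AtomicBoolean();   // the per-entry "volatile flag"
    Node(K key) { this.key = key; }
  }

  private final Queue<Node<K>> fifo = new ArrayDeque<>();

  /** Reads do no queue work; they only mark the entry as recently used. */
  void onRead(Node<K> node) {
    node.referenced.set(true);
  }

  void onInsert(Node<K> node) {
    fifo.add(node);
  }

  /**
   * Called on a write when over capacity; runs under the cache's write lock.
   * If every entry was marked, this flushes the whole queue, i.e. the O(n) case.
   */
  K evictOne() {
    Node<K> node;
    while ((node = fifo.poll()) != null) {
      if (node.referenced.compareAndSet(true, false)) {
        fifo.add(node);        // second chance: clear the flag and move to the tail
      } else {
        return node.key;       // victim: first entry found without the flag set
      }
    }
    return null;               // queue empty
  }
}
{code}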