[ https://issues.apache.org/jira/browse/SAMZA-873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yi Pan (Data Infrastructure) updated SAMZA-873:
-----------------------------------------------
    Assignee: Nicolas Maquet

> Avoid unnecessary flushes in CachedStore
> ----------------------------------------
>
>                 Key: SAMZA-873
>                 URL: https://issues.apache.org/jira/browse/SAMZA-873
>             Project: Samza
>          Issue Type: Improvement
>          Components: kv
>    Affects Versions: 0.10.0
>            Reporter: Nicolas Maquet
>            Assignee: Nicolas Maquet
>             Fix For: 0.10.1
>
>         Attachments: 0001-SAMZA-873-Fix-CachedStore-to-not-call-flush-unnecess.patch
>
>
> The class {{org.apache.samza.storage.kv.CachedStore}} currently calls
> {{store.flush()}} when evicting dirty entries. This in turn causes RocksDB to
> flush its memtables far more often than necessary, which slows it down.
>
> In a mixed put / get workload, e.g. 2 gets for 1 put with an object cache
> size of 1000, RocksDB will flush its memtable roughly every 333 calls to
> put(); that is, every time the eldest entry in the cache is dirty. In our
> benchmarks, this leads to a more than 20x drop in throughput.
>
> The attached patch fixes the issue as follows:
> - {{CachedStore.put()}} no longer flushes when evicting dirty entries.
>   It calls {{store.putAll()}} with all dirty entries and resets the dirty
>   list and count, but does not call {{store.flush()}}.
> - Likewise, {{CachedStore.cache.removeEldestEntry()}} no longer flushes when
>   evicting dirty entries.
>   It calls {{store.putAll()}} on all dirty entries and resets the dirty list
>   and count.
> - The behavior of {{CachedStore.flush()}} is unaffected.
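
The eviction behavior described in the issue can be illustrated with a minimal Java sketch. This is not the code from the attached patch or from Samza itself: {{CountingStore}}, {{SketchCachedStore}} and the {{flushOnEvict}} switch are hypothetical names introduced here only to contrast the old behavior (flush on every dirty eviction) with the patched one (write the dirty entries with putAll() and leave flushing to an explicit flush() call).

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the underlying store; it only counts calls. */
class CountingStore {
    final Map<String, String> data = new HashMap<>();
    int putAllCalls = 0;
    int flushCalls = 0;

    void putAll(Map<String, String> entries) {
        data.putAll(entries);
        putAllCalls++;
    }

    String get(String key) {
        return data.get(key);
    }

    // In a RocksDB-backed store this is the expensive call (memtable flush).
    void flush() {
        flushCalls++;
    }
}

/** Simplified write-back LRU cache; an illustration, not Samza's CachedStore. */
class SketchCachedStore {
    private final CountingStore store;
    private final boolean flushOnEvict; // true = pre-patch behavior (assumption)
    private final Map<String, String> dirty = new LinkedHashMap<>();
    private final LinkedHashMap<String, String> cache;

    SketchCachedStore(CountingStore store, int cacheSize, boolean flushOnEvict) {
        this.store = store;
        this.flushOnEvict = flushOnEvict;
        // Access-ordered map: the eldest entry is the least recently used one.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                boolean evict = size() > cacheSize;
                if (evict) {
                    beforeEviction(eldest.getKey());
                }
                return evict;
            }
        };
    }

    // Called just before the eldest entry is dropped from the cache.
    private void beforeEviction(String key) {
        if (dirty.containsKey(key)) {
            writeDirty();          // both variants persist the dirty entries
            if (flushOnEvict) {
                store.flush();     // only the pre-patch variant also flushes
            }
        }
    }

    // Hands all dirty entries to the store and resets the dirty set.
    private void writeDirty() {
        if (!dirty.isEmpty()) {
            store.putAll(new LinkedHashMap<>(dirty));
            dirty.clear();
        }
    }

    void put(String key, String value) {
        dirty.put(key, value);
        cache.put(key, value);
    }

    String get(String key) {
        String value = cache.get(key);
        if (value == null) {
            value = store.get(key);
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    // Explicit flush is unchanged: write dirty entries, then flush the store.
    void flush() {
        writeDirty();
        store.flush();
    }
}
```

With a cache size of 3 and four puts to distinct keys, the fourth put evicts a dirty entry. In this sketch, {{flushOnEvict = true}} records one flush() call on the store at that point, while {{flushOnEvict = false}} records one putAll() call and no flush() until flush() is called explicitly.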