-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20811/#review41764
-----------------------------------------------------------


samza-kv/src/main/scala/org/apache/samza/storage/kv/KeyValueStorageEngineFactory.scala
<https://reviews.apache.org/r/20811/#comment75344>

    My gut feeling (not backed by any data) is that a threshold of 1000 might be quite low for a default. If there is a lot of data in the store, the compaction itself may start taking a long time.

    Rather than an absolute number, how about setting the threshold in terms of the proportion of keys in the store? E.g. perform a compaction if more than (say) 20% of the keys in the store have been deleted? That way, if the store is big (= compaction is expensive), the threshold is automatically higher.

    (For purposes of that calculation I think it would be fine to simply count the number of put requests and the number of delete requests -- no need to track unique keys.)


- Martin Kleppmann


On April 28, 2014, 10:52 p.m., Chris Riccomini wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20811/
> -----------------------------------------------------------
> 
> (Updated April 28, 2014, 10:52 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> add javadocs, and reset deletion counter in compact
> 
> 
> make delete threshold configurable. add a performance test (takes 25s to run).
> 
> 
> make compaction lazy on read-side so we can take advantage of cached writes
> 
> 
> trigger compactions periodically to remove deleted keys from levels
> 
> 
> Diffs
> -----
> 
>   samza-kv/src/main/scala/org/apache/samza/storage/kv/KeyValueStorageEngineFactory.scala 81fe86165019f72a15be1ac9cfcfff0598b4b92b 
>   samza-kv/src/main/scala/org/apache/samza/storage/kv/LevelDbKeyValueStore.scala 8602a328673e6fa7d435366abcd9a96a99d9cd88 
>   samza-kv/src/test/scala/org/apache/samza/storage/kv/TestKeyValueStores.scala 85ba11a3362ad7cf4f84fbcbd944cd790e572cbe 
> 
> Diff: https://reviews.apache.org/r/20811/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Chris Riccomini
> 
>
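
For illustration, the proportional threshold suggested in the review comment could look roughly like the sketch below. It only counts put and delete requests, as proposed, and does not track unique keys. The class name, the method names, and the way the counters are adjusted after a compaction are all assumptions made for this sketch; none of them come from the patch or from Samza's API.

```scala
// Hypothetical helper, not part of the patch under review.
// It counts put and delete requests and signals a compaction once
// deletes exceed a configurable fraction of puts (20% by default).
final class DeleteRatioTracker(ratioThreshold: Double = 0.2) {
  require(ratioThreshold > 0.0 && ratioThreshold <= 1.0,
    "ratioThreshold must be in (0, 1]")

  private var puts = 0L
  private var deletes = 0L

  def recordPut(): Unit = puts += 1
  def recordDelete(): Unit = deletes += 1

  // True once the number of deletes is strictly greater than
  // ratioThreshold * puts. Counts are request counts, not unique keys.
  def shouldCompact: Boolean =
    deletes > 0 && deletes.toDouble > ratioThreshold * puts.toDouble

  // Assumption for this sketch: after a compaction the deleted entries are
  // gone, so the put count is reduced by the deletes and the delete count
  // starts again from zero.
  def onCompacted(): Unit = {
    puts = math.max(puts - deletes, 0L)
    deletes = 0L
  }
}
```

A store wrapper would call recordPut() and recordDelete() on each write, check shouldCompact at whatever point the compaction is triggered, and call onCompacted() afterwards. With this scheme a large store needs proportionally more deletes before a compaction is triggered.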
