Here is another half-baked idea. :-)

The key-value storage engine maintains values until they are overwritten or deleted. Here are two very common use cases that could potentially be optimized significantly by a storage engine that retains data only for a window of time:

1. Stream join. This is the case where you have two nearly aligned streams, like clicks and impressions, and you want to join them. You want to store clicks until a corresponding impression arrives, or impressions until a corresponding click arrives. Since not every impression has a click, you want to eventually time these entries out and discard them (see the sketch after this list).

2. Buffered sort. A fast key-value storage engine that supported range queries could be used to implement a buffered sort.
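To make the stream-join case concrete, here is a minimal sketch. The class and method names are made up for illustration, and two plain maps stand in for the windowed store; the real store would silently discard entries older than the join window, so the unmatched side times out with no explicit delete.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the click/impression join described above. The two maps stand in for
// the windowed key-value store: in the real thing, entries older than the join
// window would be discarded by the storage engine itself, so unmatched clicks and
// impressions time out with no explicit delete.
public class ClickImpressionJoin {
    private final Map<String, String> pendingImpressions = new HashMap<>();
    private final Map<String, String> pendingClicks = new HashMap<>();

    void onImpression(String id, String impression) {
        String click = pendingClicks.remove(id);
        if (click != null) {
            emit(id, impression, click);            // click arrived first, join now
        } else {
            pendingImpressions.put(id, impression); // buffer until the click shows up
        }
    }

    void onClick(String id, String click) {
        String impression = pendingImpressions.remove(id);
        if (impression != null) {
            emit(id, impression, click);            // impression arrived first, join now
        } else {
            pendingClicks.put(id, click);           // buffer until the impression shows up
        }
    }

    private void emit(String id, String impression, String click) {
        System.out.println("joined " + id + ": " + impression + " + " + click);
    }

    public static void main(String[] args) {
        ClickImpressionJoin join = new ClickImpressionJoin();
        join.onImpression("ad-42", "impression@t0");
        join.onClick("ad-42", "click@t1");  // matches the buffered impression
        join.onClick("ad-99", "click@t2");  // never matched; the window would discard it
    }
}
```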
The implementation in something like leveldb is actually quite simple: in addition to the normal garbage collection, you just discard data segments after a period of time. This collection mechanism works well with the changelogs too, since kafka already supports time-based retention. It gets rid of the need for garbage collection in the log and also gets rid of the need to log deletes. Ideally you would want the retention in the storage engine to be deterministic, and the kafka topic to retain more data than the storage engine, so that a complete restore is always possible.
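As a rough sketch of the segment-discarding idea (this is not leveldb's actual mechanism; the segment length, class names, and in-memory maps are all invented for illustration): writes go into time-bucketed segments, and retention is just dropping whole segments once they age out, so no per-key garbage collection or delete records are needed.

```java
import java.time.Duration;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Time-windowed store built from time-bucketed segments: expiring data is just
// dropping the oldest segment, with no per-key garbage collection and no delete
// records. Assumes timestamps passed to put() are roughly monotonic.
public class SegmentedWindowStore {
    private final long segmentMs;   // how much time one segment covers
    private final int maxSegments;  // retention window = segmentMs * maxSegments
    private final Deque<Segment> segments = new ArrayDeque<>();

    private static final class Segment {
        final long baseMs;                                // start of this segment's time bucket
        final Map<String, byte[]> data = new HashMap<>(); // stand-in for an on-disk segment file
        Segment(long baseMs) { this.baseMs = baseMs; }
    }

    public SegmentedWindowStore(Duration segmentLength, Duration retention) {
        this.segmentMs = segmentLength.toMillis();
        this.maxSegments = (int) Math.max(1, retention.toMillis() / segmentMs);
    }

    public void put(String key, byte[] value, long timestampMs) {
        long base = (timestampMs / segmentMs) * segmentMs;
        Segment current = segments.peekLast();
        if (current == null || current.baseMs != base) {
            current = new Segment(base);                  // roll a fresh segment
            segments.addLast(current);
        }
        current.data.put(key, value);
        while (segments.size() > maxSegments) {
            segments.removeFirst();                       // retention = drop whole segments
        }
    }

    public byte[] get(String key) {
        // Newest segment wins, mirroring normal overwrite semantics; expired keys
        // simply stop being visible once their segment is dropped.
        Iterator<Segment> it = segments.descendingIterator();
        while (it.hasNext()) {
            byte[] value = it.next().data.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }
}
```

Since kafka's log retention is also time-based, configuring the changelog topic to keep a bit more than segmentMs * maxSegments worth of data would keep a complete restore possible, as described above.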
-Jay