Hi Veeral, Two points: 1. Keys in our LSM index are not mutually exclusive between different components. 2. We have the concept of filtered LSM index in which we store (min-max) of a field(not necessarily the key) and we can use them then when answering queries to know which components to look at.
Hope this helps, Abdullah. On Mon, Mar 7, 2016 at 4:48 PM, Veeral Shah <[email protected]> wrote: > Had a small question around asterixdb LSMified secondry index structure > (Disclaimer : I dont have complete familiarity with the LSM file formats in > asterixdb - hence below also sort of validates my understanding), > > In a classic LSM implementation, files (aka SST files in leveldb) at level > L1 onwards (L1, L2, L3,...), are sorted based on keys (assuming key types > where linear order can be imposed) and the key ranges of any two SST files, > in a level, are mutually exclusive. Since the LSM files are immutable, a > file's key range at a level remains fixed (unless compaction happens). So > taking a case of a asterixdb LSMified secondary index, assuming the keys > are byte strings, all keys should be alphabetically sorted. > > If above is true, there may be a merit in creating an additional in-memory > "coarse-index" (for each level) which keeps info about key.max and key.min > for each file region (say an SST file is broken into multiple smaller > regions of fixed size). This would help in quickly figuring out where a key > could fall, in a given LSM level. (The coarse-index is to be > created/recreated during startup and compaction) > > Thanks and regards > Veeral Shah >
