Ivan Bessonov created IGNITE-16102:
--------------------------------------

             Summary: Store all RocksDB partitions in a single column family.
                 Key: IGNITE-16102
                 URL: https://issues.apache.org/jira/browse/IGNITE-16102
             Project: Ignite
          Issue Type: Improvement
    Affects Versions: 3.0.0-alpha3
            Reporter: Ivan Bessonov


The current storage implementation puts each partition in its own column 
family. This effectively means that every partition lives in its own database, 
sharing only the WAL and some in-memory resources. Given that each column 
family has multiple files for its LSM tree, the number of open file 
descriptors is bigger than it needs to be.

The idea is to have a single column family for all partitions within a table. 
We should also consider the possibility of storing several tables in the same 
RocksDB instance, for similar reasons. You can think of it as the equivalent 
of cache groups in Ignite 2.x.

There's also an "optimization" that is missing from the current code and still 
has to be implemented: using key hashes as prefixes.

h3. What should be implemented:

First of all, the code will be heavily refactored. This should lead to 
simplifications in many places.

Beyond that, I see the following list of goals to achieve:
 * the current implementation allows deriving the list of partitions from the 
list of column families. This will no longer be possible, so I suggest storing 
this list explicitly in the "meta" CF, in whatever format is convenient during 
the implementation
 * there should be a compact "tableId" representation. IgniteUUID or even UUID 
is too much, I think, but it might work as a basis. This problem should be 
discussed
 * the binary representation of keys should now include the following 
information:
 ** tableId - a fixed-length set of bytes to be used as a prefix
 ** partitionId - 2 bytes that follow the tableId. This layout allows making 
range queries for specific partitions of specific tables
 ** key hash - 4 bytes. This one is required to optimize comparison time for 
keys. Generally speaking, it's safe to assume that hashes will mostly differ 
for different keys, meaning that comparing hashes will usually be enough to 
determine key inequality
 ** the actual key payload goes after all these prefixes
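
The key layout above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed implementation: the class and method names are hypothetical, the 4-byte tableId width is a placeholder (its size is still to be discussed), and Arrays.hashCode stands in for whatever key hash is eventually chosen. Big-endian encoding is used so that a bytewise comparator, which is the RocksDB default, groups all keys of one partition of one table into a contiguous range.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Hypothetical helper illustrating the [tableId | partitionId | hash | payload] layout. */
final class PartitionKeyLayout {
    // Assumption: 4 bytes for tableId; the ticket leaves the actual width open.
    static final int TABLE_ID_BYTES = 4;
    static final int PARTITION_ID_BYTES = 2;
    static final int HASH_BYTES = 4;
    static final int PREFIX_BYTES = TABLE_ID_BYTES + PARTITION_ID_BYTES;

    /** Builds the full storage key for a row key that belongs to the given partition. */
    static byte[] encodeKey(int tableId, int partitionId, byte[] keyPayload) {
        checkPartitionId(partitionId);
        return ByteBuffer.allocate(PREFIX_BYTES + HASH_BYTES + keyPayload.length)
                .order(ByteOrder.BIG_ENDIAN)
                .putInt(tableId)
                .putShort((short) partitionId)
                // Assumption: any stable 4-byte hash of the payload would do here.
                .putInt(Arrays.hashCode(keyPayload))
                .put(keyPayload)
                .array();
    }

    /** Inclusive lower bound of the key range that holds one partition of one table. */
    static byte[] partitionPrefix(int tableId, int partitionId) {
        checkPartitionId(partitionId);
        return ByteBuffer.allocate(PREFIX_BYTES)
                .order(ByteOrder.BIG_ENDIAN)
                .putInt(tableId)
                .putShort((short) partitionId)
                .array();
    }

    /**
     * Exclusive upper bound for all keys that start with the given prefix:
     * the prefix incremented by one as an unsigned big-endian number.
     * Returns null if the prefix consists of 0xFF bytes only (no upper bound).
     */
    static byte[] exclusiveUpperBound(byte[] prefix) {
        byte[] bound = prefix.clone();
        for (int i = bound.length - 1; i >= 0; i--) {
            if (++bound[i] != 0) {
                return bound; // this byte did not overflow, so we are done
            }
        }
        return null;
    }

    private static void checkPartitionId(int partitionId) {
        if (partitionId < 0 || partitionId > 0xFFFF) {
            throw new IllegalArgumentException("partitionId must fit in 2 bytes: " + partitionId);
        }
    }
}
```

With such a layout, scanning one partition would amount to iterating from partitionPrefix(tableId, partitionId) up to, but not including, exclusiveUpperBound of that prefix. Two keys within the same partition that differ in their hash bytes are told apart without looking at the payload at all.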



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
