Igniters,

I'd like to start a discussion about a new storage format for Ignite. Our current approach is so-called *heap-organized* storage with a secondary index per partition. It has a number of drawbacks:
1) Slow scans (joins, OLAP workloads): data is written in arbitrary order, so iterating over the base index leads to multiple page reads and page locks.
2) Slow writes for OLTP workloads: every update touches multiple index and free-list pages (a kind of write amplification).
3) Duplicated PK index when SQL is enabled: our base index cannot be used for lookups or range scans. This makes the write amplification even worse.
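
To make the scan drawback concrete, here is a toy Python model. It is not Ignite code: the page capacity and the helper names (`build_heap`, `scan_page_reads`) are made up for illustration. Rows are placed on data pages in arrival order, the index only maps a key to its page, and a key-ordered scan counts how often it has to switch to a different data page.

```python
import random

ROWS_PER_PAGE = 4  # hypothetical page capacity, chosen only for illustration


def build_heap(keys):
    """Place rows on data pages in arrival order; return index: key -> page id."""
    return {key: pos // ROWS_PER_PAGE for pos, key in enumerate(keys)}


def scan_page_reads(index):
    """Scan in key order and count how often the scan switches data pages."""
    reads, current = 0, None
    for key in sorted(index):
        page = index[key]
        if page != current:  # the dereference lands on a different page
            reads += 1
            current = page
    return reads


keys = list(range(32))
# Rows arrived in key order: each page is read exactly once.
sequential_reads = scan_page_reads(build_heap(keys))

# Rows arrived in arbitrary order: the same scan jumps between pages.
random.Random(42).shuffle(keys)
scattered_reads = scan_page_reads(build_heap(keys))

print(sequential_reads, scattered_reads)
```

With 32 rows and 4 rows per page, the in-order case needs 8 page reads, which is the minimum. The shuffled case can never need fewer and in practice needs noticeably more, and every extra read is also an extra page lock.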

Many mature RDBMSs (for example, MySQL/InnoDB and SQL Server) use an alternative format by default: *index-organized* storage. In this case the leaf pages of the primary index are the data pages, and rows are sorted inside them. This gives:
- Blazingly fast scans (no dereference, fewer page reads, fewer evictions, fewer locks)
- Fast writes in OLTP workloads when the PK index column (e.g. ID) grows monotonically (you need to *update only one page* if there are no splits)
- Slower random writes compared to heap, due to index fragmentation

I propose to adopt this approach in two phases:
1) Optionally add data to leaf pages [1]. This should improve our ScanQuery dramatically.
2) Optionally have a single primary index instead of a per-partition index [2]. This should improve our updates and SQL scans at the cost of harder rebalance and recovery.

Thoughts?

[1] https://issues.apache.org/jira/browse/IGNITE-7026
[2] https://issues.apache.org/jira/browse/IGNITE-7027
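
P.S. A rough sketch of the index-organized idea, again as a toy Python model rather than a proposal for the actual page layout. The `LeafPages` class, its capacity, and the 50/50 split policy are assumptions made for illustration only. The leaf pages hold the rows themselves in key order, so a scan is a plain walk over the leaves, and an insert with a monotonically growing key always lands on the last leaf.

```python
import bisect
import random

LEAF_CAPACITY = 4  # hypothetical number of rows per leaf page


class LeafPages:
    """Toy index-organized store: sorted leaf pages that hold the rows."""

    def __init__(self):
        self.leaves = [[]]  # each leaf is a sorted list of keys (rows)
        self.splits = 0

    def insert(self, key):
        """Insert a key and return the position of the leaf that was touched."""
        # First keys of all leaves except the leftmost act as separators.
        firsts = [leaf[0] for leaf in self.leaves[1:]]
        i = bisect.bisect_right(firsts, key)
        leaf = self.leaves[i]
        bisect.insort(leaf, key)
        if len(leaf) > LEAF_CAPACITY:  # overflow: split the leaf in half
            mid = len(leaf) // 2
            self.leaves[i:i + 1] = [leaf[:mid], leaf[mid:]]
            self.splits += 1
        return i

    def scan(self):
        """Full scan: walk the leaves left to right, no dereference needed."""
        return [key for leaf in self.leaves for key in leaf]


# Monotonically growing keys: every insert targets the rightmost leaf.
monotonic = LeafPages()
targets = [monotonic.insert(k) for k in range(32)]

# The same keys in arbitrary order: inserts land all over the leaf level.
shuffled = list(range(32))
random.Random(42).shuffle(shuffled)
scattered = LeafPages()
for k in shuffled:
    scattered.insert(k)

print(monotonic.splits, scattered.splits)
```

Both stores return the keys in sorted order from `scan()`. The difference is in where the writes go: in the monotonic case `targets` only ever points at the current last leaf, while the shuffled inserts hit leaves throughout the structure.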