[ https://issues.apache.org/jira/browse/IGNITE-11433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vladimir Ozerov updated IGNITE-11433:
-------------------------------------
    Issue Type: Task  (was: Improvement)

> MVCC: Link entry versions at the Data Store layer.
> --------------------------------------------------
>
>                 Key: IGNITE-11433
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11433
>             Project: Ignite
>          Issue Type: Task
>          Components: mvcc, sql
>            Reporter: Igor Seliverstov
>            Priority: Major
>
> Currently, all tuple versions are stored inside index trees. CacheDataTree is
> used to link versions to one another (using their order inside an index page).
> Although this approach is easy to implement and was preferable as a first
> step, it has several disadvantages:
> 1) We need to iterate over tuple versions at update time under a read (or
> even write) lock on an index page, which blocks other write (read) operations
> for a relatively long period of time.
> 2) Write amplification affects not only the Data Store layer but the indexes
> as well, which makes read/lookup operations on indexes much slower.
> 3) We cannot implement several important improvements (data streamer
> optimizations) because having several versions of one key in an index page
> doesn't allow the use of Invoke operations.
> Linking versions at the Data Store layer only (as other vendors do) solves
> these issues or reduces their impact.
> So, the proposed changes are:
> 1) Change the data page layout by adding two fields to its header: {{link}}
> (a link to the next tuple in a version chain) and {{lock}} (a tx that holds a
> write lock on the HEAD of the chain). There are several possible
> optimizations: 1) leave the lock as is (in the cache index item); 2) use the
> max version as the lock version as well.
> 2) Do not save all versions of a tuple in indexes; this means removing the
> version from the key, so the newest version will overwrite the existing entry.
> There are two approaches to linking versions, each with pros and cons:
> 1) N2O (newer to older) - a reader (writer) gets the newest tuple version
> first and iterates over tuple versions from newer to older until it reaches a
> version whose min and max versions bracket its snapshot. This approach implies
> faster reads (more recent versions are retrieved first) and the need to update
> all involved indexes on each write operation, in other words slower writes
> (this may be optimized using logical pointers to the head of the version
> chain). Cooperative VAC (update operations remove tuple versions that are
> invisible to all readers) is possible.
> 2) O2N (older to newer) - a reader gets the oldest visible tuple version and
> iterates over versions until it gets a visible version. Indexes do not have to
> be updated on every write (only when an indexed value changes), so write
> operations become lighter. Cooperative VAC is almost impossible.
> We need to decide which approach to use depending on which load profile
> (OLTP/OLAP) is preferred.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
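
The N2O variant described in the issue can be illustrated with a minimal in-memory sketch. It is not Ignite code: every name here (`VersionChainSketch`, `RowVersion`, `minVer`, `maxVer`, `OPEN`, `update`, `read`, `vacuum`) is hypothetical, versions are modeled as plain increasing `long` counters, transaction commit state is ignored, and the `lock` field is shown only to mirror the proposed header layout. The `head` reference stands in for the single index entry that would point at the newest version.

```java
/** Hypothetical sketch of an N2O (newer to older) version chain; not Ignite code. */
final class VersionChainSketch {
    /** Assumed marker meaning "this version has not been superseded yet". */
    static final long OPEN = Long.MAX_VALUE;

    /** One tuple version; a heap object stands in for a row on a data page. */
    static final class RowVersion {
        final String val;
        final long minVer;   // version that created this tuple version
        long maxVer = OPEN;  // version that superseded it (OPEN while it is the head)
        RowVersion link;     // proposed header field: next (older) version in the chain
        long lock;           // proposed header field: tx holding the write lock; unused here

        RowVersion(String val, long minVer, RowVersion link) {
            this.val = val;
            this.minVer = minVer;
            this.link = link;
        }
    }

    /** Writes a new head: closes the old head's range and links the new version to it. */
    static RowVersion update(RowVersion head, String val, long ver) {
        if (head != null)
            head.maxVer = ver;
        return new RowVersion(val, ver, head);
    }

    /** Walks newer to older until the snapshot falls in [minVer, maxVer). */
    static String read(RowVersion head, long snapshot) {
        for (RowVersion v = head; v != null; v = v.link) {
            if (v.minVer <= snapshot && snapshot < v.maxVer)
                return v.val;
        }
        return null; // the key did not exist for this snapshot
    }

    /**
     * Cooperative cleanup: cuts off the tail of versions that no reader with a
     * snapshot >= oldestActiveSnapshot can see. Returns the number of versions dropped.
     */
    static int vacuum(RowVersion head, long oldestActiveSnapshot) {
        for (RowVersion v = head; v != null; v = v.link) {
            if (v.link != null && v.link.maxVer <= oldestActiveSnapshot) {
                int removed = 0;
                for (RowVersion dead = v.link; dead != null; dead = dead.link)
                    removed++;
                v.link = null; // maxVer only decreases down the chain, so the whole tail is dead
                return removed;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        RowVersion head = update(null, "a", 10);
        head = update(head, "b", 20);
        head = update(head, "c", 30);
        System.out.println(read(head, 25));   // prints "b"
        System.out.println(vacuum(head, 25)); // prints 1 (version "a" is dropped)
    }
}
```

In this sketch a reader at a recent snapshot stops at or near the head, which matches the "faster reads" claim for N2O, while every `update` produces a new head that the index entry must be repointed to, which is the write cost the issue mentions. An O2N chain would reverse the direction of `link` and leave the index entry on the oldest version.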