Roman Kondakov created IGNITE-9592:
--------------------------------------

             Summary: MVCC: Use linked lists to store multiple versions.
                 Key: IGNITE-9592
                 URL: https://issues.apache.org/jira/browse/IGNITE-9592
             Project: Ignite
          Issue Type: Improvement
          Components: mvcc
            Reporter: Roman Kondakov


Currently we store all versions of each row in primary index. It is not very 
efficient for several reasons:
 * We have to insert and remove tree item each time a new version is created or 
an old version deleted. This leads to a considerable tree operations number 
increasing as well as tree splits and merges operations.
 * Also this leads to a contention on leaf page write lock - each update 
operation has to obtain exclusive access to insert a new version of row. During 
this update no body on that leaf can not be able to update or even read data of 
neighbour keys.
 * Primary key tree consumes more space if it stores each version.
 * Other vendors do not store each version in primary indexes (as far as I 
know).

Possible solution - store only key and link to the newest version in the 
primary index. Instead of this {{CacheDataTree}} item
{{|           key part          |           |         |}}
{{|-----------------------------|  lockVer  |   link  |}}
{{| cache_id | hash |  mvccVer  |           |         |}}
 we'll have:
{{|   key part      |           |  link to the   |}}
{{|-----------------|  lockVer  |     newest     |}}
{{| cache_id | hash |           |    version     |}}
Note, we do not have mvccVer in tree item. Link to the newer version leads to 
the most recent inserted data item. To find older versions, each DatRow is 
provided with "prevLink" - the link to previous version of row. DataRow layout 
can be changed from

| header | xid_max | xid_min | KV bytes |

to the next one:

| header | xid_max | xid_min | *PREV_LINK* | KV bytes |

Where *PREV_LINK* field points to the previous version of the row.  Traversing 
this prevRow links we can iterate over all available versions without affecting 
primary key tree.

When the new version is created we just insert the new row in datastore, then 
do CAS on the link to the newest version in primary key tree in order it points 
to the new row. PrevLink in the new row should point on the previous one. That 
is all. We've just inserted a new version just with a long field CAS in the 
CacheDataTree. Without obtaining write lock and other headache.

Secondary indexes are handled in the same manner as before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to