[ 
https://issues.apache.org/jira/browse/IGNITE-9592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Ozerov closed IGNITE-9592.
-----------------------------------

> MVCC: Use linked lists to store multiple versions.
> --------------------------------------------------
>
>                 Key: IGNITE-9592
>                 URL: https://issues.apache.org/jira/browse/IGNITE-9592
>             Project: Ignite
>          Issue Type: Improvement
>          Components: mvcc
>            Reporter: Roman Kondakov
>            Priority: Major
>              Labels: performance
>
> Currently we store all versions of each row in primary index. It is not very 
> efficient for several reasons:
>  * We have to insert and remove tree item each time a new version is created 
> or an old version deleted. This leads to a considerable tree operations 
> number increasing as well as extra tree splits and merges operations.
>  * Also this leads to a contention on leaf page write lock - each update 
> operation has to obtain exclusive access to insert a new version or remove 
> old version of row. During this update no body on that leaf can not be able 
> to update or even read data of neighbour keys.
>  * Primary key tree consumes more space if it stores each version.
>  * Other vendors do not store each version in primary indexes (as far as I 
> know).
> Possible solution - store only key and link to the newest version in the 
> primary index. Instead of this {{CacheDataTree}} item
>  {{|           key part          |           |         |}}
>  {{|-----------------------------|  lockVer  |   link  |}}
>  {{| cache_id | hash |  mvccVer  |           |         |}}
>   we'll have:
>  {{|   key part      |           |  link to the   |}}
>  {{|-----------------|  lockVer  |     newest     |}}
>  {{| cache_id | hash |           |    version     |}}
>  Note, we do not have mvccVer in tree item. Link to the newer version leads 
> to the most recent inserted data item. To find older versions, each DatRow is 
> provided with "prevLink" - the link to previous version of row. DataRow 
> layout can be changed from
> |header|xid_max|xid_min |KV bytes|
> to the next one:
> |header|xid_max|xid_min|*PREV_LINK*|KV bytes|
> Where *PREV_LINK* field points to the previous version of the row.  Doing 
> traverse overt this prevRow links we can iterate over all available versions 
> without using and affecting the primary key tree.
> When the new version is created we just insert the new row in datastore, then 
> do CAS on the link to the newest version in primary key tree in order it 
> points to the new row. PrevLink in the new row should point on the previous 
> one. That is all. We've just inserted a new version just with a long field 
> CAS in the CacheDataTree. Without obtaining write lock and other headache.
> Secondary indexes are handled in the same manner as before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to