[ 
https://issues.apache.org/jira/browse/IGNITE-12295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953560#comment-16953560
 ] 

Sergey Kalashnikov commented on IGNITE-12295:
---------------------------------------------

The following changes were made to demonstrate the approach.

1) A special cursor {{PurgeCursor}} is introduced to B+Tree to clean up tree 
items according to a provided filter.
The filter is passed a page IO object and item index. An existing 
{{TreeRowClosure}} is re-used for the filter.

For the purpose of the index partition clearing the filter passed to the above 
cursor does the following:
- reads the link of the page item;
- extracts partition id from the link value;
- determines if the partition is requested for clearing.
This code is made a part of {{H2Tree}} class (which represents one segment of 
the index), method {{purge}}.

The purge cursor may delete multiple items from the page at once. To reflect 
that, the new physical delta WAL record {{PurgeRecord}} is introduced.
The new record contains the indexes of all removed items and the new count of 
items that remains in the page.

2) A new {{GridQueryIndexing}} method to provide the caller with a collection 
of runnables and futures each responsible for clearing a single index.
{{List<IgniteBiTuple<Runnable, IgniteInternalFuture<Void>>> 
purgeIndexPartitions(CacheGroupContext grp, Set<Integer> parts)}}

This specific choice of API hides the details of enumerating tables and indexes 
of cache group from the caller and allows to move actual processing to some 
other context.

The implementation is done in {{IgniteH2Indexing}} class.

3) Changes in pipeline of {{PartitionsEvictManager}}.

An API entry point for starting index partition eviction mechanism on a cache 
group and listening for single result.
{{public IgniteInternalFuture<Void> 
purgePartitionsExclusively(CacheGroupContext grp, List<GridDhtLocalPartition> 
parts) throws IgniteCheckedException}}

It makes sure that all the given partitions are not reserved and are not being 
cleared at the moment.
The {{clearFuture}} futures of every partition is reinitialized to prevent 
further clashes with regular clearing requests.
After all partitions are available for purge, the index purge runnables 
provided by {{GridQueryIndexing}} are converted to tasks that are put through 
usual queues/pipeline of {{PartitionsEvictManager}}.

Once all index purge tasks are completed, the additional tasks are run:
- to clean up all the CacheMapEntries for affected partitions;
- to clean up H2RowCache.

5) Limitations/applicability.
- It is assumed that no concurrent changes are made to affected partitions, 
otherwise there is no guarantee that index is free of rows of such partitions 
after iteration is completed.
In current design of file-based rebalancing, the concurrent changes are not 
applied to the data storage.
- Only applicable to indexes that use Ignite's internal B+Tree. Cache groups 
with full text indexes cannot be cleared that way.
- Per-entry event notifications are missing, since we are not clearing actual 
data store partition.


> Faster index eviction
> ---------------------
>
>                 Key: IGNITE-12295
>                 URL: https://issues.apache.org/jira/browse/IGNITE-12295
>             Project: Ignite
>          Issue Type: Sub-task
>            Reporter: Sergey Kalashnikov
>            Assignee: Sergey Kalashnikov
>            Priority: Major
>
> For the file-based rebalancing approach, it seems feasible to avoid iterating 
> the old partition data in order to clear the indexes.
> One can independently clear the shared index structures of all the rows 
> referencing entries from moving partitions by deducing partition id from the 
> links in the leaf pages.
> The proposed algorithm is simple and takes the set of integer partition ids 
> as an input:
> 1. Iterate over leaf pages of the index and remove items attributed to any of 
> indicated partitions, unless it is the only or the rightmost item on a page.
> 2. If the rightmost item (or the only item) on a page happens to belong to 
> any of the indicated partitions, employ a regular remove algorithm 
> (descending from the root) so that inner pages get correctly updated.
> Restart iteration from the leaf page where the removed item would be inserted 
> (descend from the root to find it).
> The use of such algorithm can be justified (as having performance advantage) 
> when the number of keys that'd be removed is bigger than the number of leaf 
> pages in the index.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to