Igniters,

My use case involves a scenario where it's necessary to iterate over a
large (many TBs) persistent cache, doing some calculation on the data read.

The basic solution is to iterate over the cache using ScanQuery.

This turns out to be slow because iterating over the cache involves a lot
of random disk access to read the data pages that leaf pages reference by
links.

This is especially true when data is stored on disks with slow random
access, like SAS disks. In my case, on a modern SAS disk array, the read
speed during iteration was several MB/sec, while the sequential read speed
in a perf test was on the order of GB/sec.

I was able to fix the issue by using ScanQuery with an explicit partition
set and running simple warmup code before each partition scan.

The code pins cold pages in memory in sequential order, thus eliminating
random disk access. The speedup was about 100x.
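
To illustrate why sequential preloading helps, here is a toy model in plain
Java. It is not Ignite code and not the warmup code mentioned above; the
class, the methods, the page count and the link pattern are all made up for
illustration. It assumes that leaf pages link to data pages in an order
unrelated to their on-disk position, and it counts a "seek" whenever a read
is not contiguous with the previous one.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model (hypothetical, not Ignite internals): seeks for a cold scan vs. a warmed-up scan. */
final class PartitionWarmupModel {
    /** Fake disk that counts page reads and non-contiguous reads ("seeks"). */
    static final class Disk {
        long reads;
        long seeks;
        private int lastPage = -2; // sentinel, so the very first read counts as a seek

        void read(int page) {
            reads++;
            if (page != lastPage + 1)
                seeks++;
            lastPage = page;
        }
    }

    /**
     * Links from leaf pages to data pages, in key order. The stride is assumed
     * to be coprime with pageCount, so every data page is referenced exactly once.
     */
    static int[] leafLinks(int pageCount, int stride) {
        int[] links = new int[pageCount];
        for (int i = 0; i < pageCount; i++)
            links[i] = (int) ((long) i * stride % pageCount);
        return links;
    }

    /** Warmup: read and pin every page that is not pinned yet, in on-disk order. */
    static void warmUp(int pageCount, Set<Integer> pinned, Disk disk) {
        for (int page = 0; page < pageCount; page++)
            if (pinned.add(page))
                disk.read(page);
    }

    /** Scan: follow the links in key order, touching the disk only for pages that are not pinned. */
    static void scan(int[] links, Set<Integer> pinned, Disk disk) {
        for (int page : links)
            if (pinned.add(page))
                disk.read(page);
    }

    /** Returns the number of seeks for one partition scan, with or without a warmup pass. */
    static long seeksForScan(int pageCount, int stride, boolean warmUpFirst) {
        Disk disk = new Disk();
        Set<Integer> pinned = new HashSet<>();
        if (warmUpFirst)
            warmUp(pageCount, pinned, disk);
        scan(leafLinks(pageCount, stride), pinned, disk);
        return disk.seeks;
    }

    public static void main(String[] args) {
        System.out.println("cold scan seeks:   " + seeksForScan(1000, 37, false)); // 1000
        System.out.println("warmed scan seeks: " + seeksForScan(1000, 37, true));  // 1
    }
}
```

Both runs read the same 1000 pages from disk; only the access pattern
differs. The model ignores page eviction, so it only applies while the
partition being scanned fits in memory.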

I suggest adding this improvement to the product's core by always
sequentially preloading pages for all internal partition iterations (cache
iterators, scan queries, SQL queries with a scan plan) if the partition is
cold (has a low number of pinned pages).

This should also speed up rebalancing from cold partitions.

Ignite JIRA ticket: [1]

Thoughts?

[1] https://issues.apache.org/jira/browse/IGNITE-8873

-- 

Best regards,
Alexei Scherbakov
