I would rather fix the scan than hack the scan. Is there any technical
reason for hacking it now instead of fixing it properly? Can some of the
experts in this thread provide an estimate of the complexity and the
difference in work required for each approach?

D.

On Mon, Sep 17, 2018 at 4:42 PM Alexey Goncharuk <alexey.goncha...@gmail.com>
wrote:

> I think it would be beneficial for some Ignite users if we added such a
> partition warmup method to the public API. The method should be
> well-documented and state that it may invalidate the existing page cache.
> It will be a very effective instrument until we add the proper scan
> ability that Vladimir was referring to.
>
> On Mon, Sep 17, 2018 at 13:05, Maxim Muzafarov <maxmu...@gmail.com> wrote:
>
> > Folks,
> >
> > Such warming up can be an effective technique for calculations that
> > require large cache data reads, but I think it is a single narrow use
> > case among all the ways the Ignite store is used. Like all other
> > powerful techniques, we should use it wisely. In the general case, I
> > think we should consider the other techniques mentioned by Vladimir,
> > and perhaps create something like `global statistics of cache data
> > usage` to choose the best technique in each case.
> >
> > For instance, it is not obvious which would take longer: one multi-block
> > read or 50 single-block reads issued sequentially. It strongly depends
> > on the underlying hardware and may also depend on the workload's system
> > resources (CPU-intensive calculations and I/O access). Such `statistics`
> > would help us choose the right way.
> >
> >
> > On Sun, 16 Sep 2018 at 23:59, Dmitriy Pavlov <dpavlov....@gmail.com> wrote:
> >
> > > Hi Alexei,
> > >
> > > I did not find any PRs associated with the ticket to check the code
> > > changes behind this idea. Are there any PRs?
> > >
> > > If we create some forward scan of pages, it should be a very
> > > intelligent algorithm that takes a lot of parameters into account (how
> > > much RAM is free, how likely the next page is to be needed, etc.). We
> > > had a private talk about this idea some time ago.
> > >
> > > In my experience, Linux already does this kind of read-ahead of file
> > > data (for file descriptors flagged as sequential), but some
> > > application-level prefetching of data may be useful for O_DIRECT file
> > > descriptors.
> > >
> > > One more concern from me is about selecting the right place in the
> > > system to do such a prefetch.
> > >
> > > Sincerely,
> > > Dmitriy Pavlov
> > >
> > > On Sun, Sep 16, 2018 at 19:54, Vladimir Ozerov <voze...@gridgain.com> wrote:
> > >
> > > > Hi Alex,
> > > >
> > > > It is good that you observed a speedup, but I do not think this
> > > > solution works for the product in the general case. The amount of RAM
> > > > is limited, and even a single partition may need more space than the
> > > > RAM available. Moving a lot of pages into page memory for a scan
> > > > means evicting a lot of other pages, which will ultimately hurt the
> > > > performance of subsequent queries and defeat the LRU algorithms that
> > > > are of great importance for good database performance.
> > > >
> > > > Database vendors take another approach: skip the B-trees, iterate
> > > > directly over data pages, read them in a multi-block fashion, and use
> > > > a separate scan buffer to avoid excessive eviction of other hot
> > > > pages. A corresponding ticket exists for SQL [1], but the idea is
> > > > common to all parts of the system that require scans.
> > > >
> > > > As for the proposed solution, it might be a good idea to add a
> > > > special API to "warm up" a partition, with a clear explanation of the
> > > > pros (fast scans after warmup) and cons (slowdown of any other
> > > > operations). But I think we should not make this approach part of
> > > > normal scans.
> > > >
> > > > Vladimir.
> > > >
> > > > [1] https://issues.apache.org/jira/browse/IGNITE-6057
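[Editor's note: the scan-buffer idea above can be illustrated outside of Ignite with a toy sketch. Everything below is an assumption for illustration — the class and method names are invented, and real page memory operates on off-heap pages, not Java maps. The point is the structure: point reads go through an access-ordered LRU region, while scan reads go through a small FIFO ring that can only evict other scan pages, so a full scan never displaces hot pages.]

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy sketch (not Ignite code) of a scan-resistant page cache. */
class ScanResistantPageCache {
    private final int lruCapacity;
    private final int scanCapacity;
    /** Access-ordered LRU region for "hot" pages. */
    private final LinkedHashMap<Long, byte[]> lru;
    /** Small FIFO ring used only by scans. */
    private final ArrayDeque<Long> scanRing;
    private final Map<Long, byte[]> scanPages = new HashMap<>();

    ScanResistantPageCache(int lruCapacity, int scanCapacity) {
        this.lruCapacity = lruCapacity;
        this.scanCapacity = scanCapacity;
        this.scanRing = new ArrayDeque<>(scanCapacity);
        // accessOrder=true: iteration order is least- to most-recently used.
        this.lru = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override protected boolean removeEldestEntry(Map.Entry<Long, byte[]> e) {
                return size() > ScanResistantPageCache.this.lruCapacity;
            }
        };
    }

    /** Normal (point) read: loads and promotes the page in the LRU region. */
    byte[] readPage(long pageId) {
        byte[] page = lru.get(pageId);
        if (page == null) {
            page = loadFromDisk(pageId);
            lru.put(pageId, page); // May evict the eldest hot page.
        }
        return page;
    }

    /** Scan read: goes through the ring buffer and never grows the LRU region. */
    byte[] scanPage(long pageId) {
        byte[] hot = lru.get(pageId);
        if (hot != null)
            return hot; // Already hot; no extra I/O, no ring traffic.
        byte[] page = scanPages.get(pageId);
        if (page == null) {
            if (scanRing.size() == scanCapacity)
                scanPages.remove(scanRing.pollFirst()); // Evict oldest scan page only.
            page = loadFromDisk(pageId);
            scanRing.addLast(pageId);
            scanPages.put(pageId, page);
        }
        return page;
    }

    boolean isHot(long pageId) { return lru.containsKey(pageId); }

    private static byte[] loadFromDisk(long pageId) {
        return new byte[] {(byte) pageId}; // Stand-in for a real page read.
    }
}
```

With this split, scanning a hundred cold pages leaves the previously hot pages untouched, whereas routing the same scan through `readPage` would have evicted them.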
> > > >
> > > >
> > > > On Sun, Sep 16, 2018 at 6:44 PM Alexei Scherbakov <
> > > > alexey.scherbak...@gmail.com> wrote:
> > > >
> > > > > Igniters,
> > > > >
> > > > > My use case involves a scenario where it is necessary to iterate
> > > > > over a large (many TBs) persistent cache, doing some calculation on
> > > > > the data that is read.
> > > > >
> > > > > The basic solution is to iterate cache using ScanQuery.
> > > > >
> > > > > This turns out to be slow because iterating over the cache involves
> > > > > a lot of random disk access to read the data pages referenced by
> > > > > links from the leaf pages.
> > > > >
> > > > > This is especially true when data is stored on disks with slow
> > > > > random access, like SAS disks. In my case, on a modern SAS disk
> > > > > array, the read speed was several MB/sec, while the sequential read
> > > > > speed in a perf test was about a GB/sec.
> > > > >
> > > > > I was able to fix the issue by using a ScanQuery with an explicit
> > > > > partition set and running simple warmup code before each partition
> > > > > scan.
> > > > >
> > > > > The code pins cold pages in memory in sequential order, thus
> > > > > eliminating random disk access. The speedup was on the order of
> > > > > 100x.
> > > > >
> > > > > I suggest adding the improvement to the product's core by always
> > > > > sequentially preloading pages for all internal partition iterations
> > > > > (cache iterators, scan queries, SQL queries with a scan plan) if
> > > > > the partition is cold (a low number of pinned pages).
> > > > >
> > > > > This should also speed up rebalancing from cold partitions.
> > > > >
> > > > > Ignite JIRA ticket [1]
> > > > >
> > > > > Thoughts ?
> > > > >
> > > > > [1] https://issues.apache.org/jira/browse/IGNITE-8873
> > > > >
> > > > > --
> > > > >
> > > > > Best regards,
> > > > > Alexei Scherbakov
> > > > >
> > > >
> > >
> > --
> > Maxim Muzafarov
> >
>
