Re[4]: Cache scan efficiency

Zhenya Stanilovsky Tue, 18 Sep 2018 23:59:36 -0700

Great, I hadn't thought about that.


>Wednesday, 19 September 2018, 9:40 +03:00 from Vladimir Ozerov <voze...@gridgain.com>:
>
>Pinning is an even worse thing, because you lose control over how data is
>moved within a single region. Instead, I would suggest using partition warmup
>plus a separate data region to achieve "pinning" semantics.
>
>On Wed, Sep 19, 2018 at 8:34 AM Zhenya Stanilovsky
>< arzamas...@mail.ru.invalid > wrote:
>
>> Hi, but how do we deal with the page replacement that Dmitriy Pavlov
>> mentioned? This approach would be efficient if all data fits into memory.
>> Maybe it would be better to have a method to pin some critical caches?
>>
>>
>> >Wednesday, 19 September 2018, 0:26 +03:00 from Dmitriy Pavlov <
>>  dpavlov....@gmail.com >:
>> >
>> >Even better, if RAM is exhausted, the page replacement process will start.
>> >
>> >https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Durable+Memory+-+under+the+hood#IgniteDurableMemory-underthehood-Pagereplacement(rotationwithdisk)
>> >
>> >The effect of preloading will still be noticeable, but not as pronounced as
>> >when everything fits into RAM. Later I can review or improve the javadocs if
>> >necessary.
>> >
>> >Wed, 19 Sep 2018 at 0:18, Denis Magda <  dma...@apache.org >:
>> >
>> >> Agree, it's just a matter of the documentation. If a user stores 100% in
>> >> RAM and on disk, and just wants to warm RAM up after a restart then he
>> >> knows everything will fit there. If during the preloading we detect that
>> >> the RAM is exhausted we can halt it and print out a warning.
>> >>
>> >> --
>> >> Denis
>> >>
>> >> On Tue, Sep 18, 2018 at 2:10 PM Dmitriy Pavlov <  dpavlov....@gmail.com >
>> >> wrote:
>> >>
>> >> > Hi,
>> >> >
>> >> > I totally support the idea of cache preload.
>> >> >
>> >> > IMO it can be expanded. We can iterate over the local partitions of the
>> >> > cache group and preload each one.
>> >> >
>> >> > But the methods should be clearly documented so that users are aware of
>> >> > the benefits of such a method (e.g. if the RAM region is big enough, etc.).
>> >> >
>> >> > Sincerely,
>> >> > Dmitriy Pavlov
>> >> >
>> >> > Tue, 18 Sep 2018 at 21:36, Denis Magda <  dma...@apache.org >:
>> >> >
>> >> > > Folks,
>> >> > >
>> >> > > Since we're adding a method that would preload a certain partition, can
>> >> > > we add one that will preload the whole cache? Ignite persistence users
>> >> > > I've been working with look puzzled once they realize there is no way
>> >> > > to warm up RAM after a restart. There are use cases that require this.
>> >> > >
>> >> > > Can the current optimizations be expanded to the cache preloading use
>> >> > > case?
>> >> > >
>> >> > > --
>> >> > > Denis
>> >> > >
>> >> > > On Tue, Sep 18, 2018 at 3:58 AM Alexei Scherbakov <
>> >> > >  alexey.scherbak...@gmail.com > wrote:
>> >> > >
>> >> > > > Summing up, I suggest adding a new public
>> >> > > > method IgniteCache.preloadPartition(partId).
>> >> > > >
>> >> > > > I will start preparing a PR for IGNITE-8873
>> >> > > > <  https://issues.apache.org/jira/browse/IGNITE-8873 > if no more
>> >> > > > objections follow.
>> >> > > >
>> >> > > >
>> >> > > >
>> >> > > > Tue, 18 Sep 2018 at 10:50, Alexey Goncharuk <
>> >> > > >  alexey.goncha...@gmail.com >:
>> >> > > >
>> >> > > > > Dmitriy,
>> >> > > > >
>> >> > > > > In my understanding, the proper fix for the scan query looks like a
>> >> > > > > big change, and it is unlikely that we will include it in Ignite 2.7.
>> >> > > > > On the other hand, the method suggested by Alexei is quite simple and
>> >> > > > > definitely fits Ignite 2.7, which will provide a better user
>> >> > > > > experience. Even with a proper scan query implemented, this method
>> >> > > > > can be useful in some specific scenarios, so we will not have to
>> >> > > > > deprecate it.
>> >> > > > >
>> >> > > > > --AG
>> >> > > > >
>> >> > > > > Mon, 17 Sep 2018 at 19:15, Dmitriy Pavlov <
>> >> > > > >  dpavlov....@gmail.com >:
>> >> > > > >
>> >> > > > > > As I understand it, this is not a hack; it is an advanced feature
>> >> > > > > > for warming up a partition. We can build warm-up of the whole cache
>> >> > > > > > by calling warm-up on its partitions. Users often ask about this
>> >> > > > > > feature and are not comfortable with our lazy loading.
>> >> > > > > >
>> >> > > > > > Please correct me if I misunderstood the idea.
>> >> > > > > >
>> >> > > > > > Mon, 17 Sep 2018 at 18:37, Dmitriy Setrakyan <
>> >> > > > > >  dsetrak...@apache.org >:
>> >> > > > > >
>> >> > > > > > > I would rather fix the scan than hack the scan. Is there any
>> >> > > > > > > technical reason for hacking it now instead of fixing it properly?
>> >> > > > > > > Can some of the experts in this thread provide an estimate of the
>> >> > > > > > > complexity and the difference in work that would be required for
>> >> > > > > > > each approach?
>> >> > > > > > >
>> >> > > > > > > D.
>> >> > > > > > >
>> >> > > > > > > On Mon, Sep 17, 2018 at 4:42 PM Alexey Goncharuk <
>> >> > > > > > >  alexey.goncha...@gmail.com >
>> >> > > > > > > wrote:
>> >> > > > > > >
>> >> > > > > > > > I think it would be beneficial for some Ignite users if we added
>> >> > > > > > > > such a partition warmup method to the public API. The method
>> >> > > > > > > > should be well-documented and state that it may invalidate the
>> >> > > > > > > > existing page cache. It will be a very effective instrument until
>> >> > > > > > > > we add the proper scan ability that Vladimir was referring to.
>> >> > > > > > > >
>> >> > > > > > > > Mon, 17 Sep 2018 at 13:05, Maxim Muzafarov <
>> >> > > > > > > >  maxmu...@gmail.com >:
>> >> > > > > > > >
>> >> > > > > > > > > Folks,
>> >> > > > > > > > >
>> >> > > > > > > > > Such warming up can be an effective technique for performing
>> >> > > > > > > > > calculations that require large cache data reads, but I think it
>> >> > > > > > > > > is a single narrow use case among all the ways the Ignite store
>> >> > > > > > > > > is used. Like all other powerful techniques, it should be used
>> >> > > > > > > > > wisely. In the general case, I think we should consider the other
>> >> > > > > > > > > techniques mentioned by Vladimir and perhaps create something
>> >> > > > > > > > > like `global statistics of cache data usage` to choose the best
>> >> > > > > > > > > technique in each case.
>> >> > > > > > > > >
>> >> > > > > > > > > For instance, it's not obvious what would take longer: multi-block
>> >> > > > > > > > > reads or 50 single-block reads issued sequentially. It strongly
>> >> > > > > > > > > depends on the hardware under the hood and might also depend on
>> >> > > > > > > > > the system resources the workload uses (CPU-intensive
>> >> > > > > > > > > calculations and I/O access). But `statistics` will help us
>> >> > > > > > > > > choose the right way.
>> >> > > > > > > > >
>> >> > > > > > > > >
>> >> > > > > > > > > On Sun, 16 Sep 2018 at 23:59 Dmitriy Pavlov <
>> >> > > > > > > > >  dpavlov....@gmail.com > wrote:
>> >> > > > > > > > >
>> >> > > > > > > > > > Hi Alexei,
>> >> > > > > > > > > >
>> >> > > > > > > > > > I did not find any PRs associated with the ticket, so I could not
>> >> > > > > > > > > > check the code changes behind this idea. Are there any PRs?
>> >> > > > > > > > > >
>> >> > > > > > > > > > If we create some forward scan of pages, it should be a very
>> >> > > > > > > > > > intelligent algorithm that takes a lot of parameters into account
>> >> > > > > > > > > > (how much RAM is free, how likely we are to need the next page,
>> >> > > > > > > > > > etc.). We had a private talk about such an idea some time ago.
>> >> > > > > > > > > >
>> >> > > > > > > > > > In my experience, Linux systems already do such read-ahead of
>> >> > > > > > > > > > file data (for file descriptors flagged as sequential), but some
>> >> > > > > > > > > > prefetching of data at the application level may be useful for
>> >> > > > > > > > > > O_DIRECT file descriptors.
>> >> > > > > > > > > >
>> >> > > > > > > > > > One more concern from me is about selecting the right place in
>> >> > > > > > > > > > the system to do such prefetching.
>> >> > > > > > > > > >
>> >> > > > > > > > > > Sincerely,
>> >> > > > > > > > > > Dmitriy Pavlov
>> >> > > > > > > > > >
>> >> > > > > > > > > > Sun, 16 Sep 2018 at 19:54, Vladimir Ozerov <
>> >> > > > > > > > > >  voze...@gridgain.com >:
>> >> > > > > > > > > >
>> >> > > > > > > > > > > Hi Alex,
>> >> > > > > > > > > > >
>> >> > > > > > > > > > > It is good that you observed a speedup. But I do not think this
>> >> > > > > > > > > > > solution works for the product in the general case. The amount of
>> >> > > > > > > > > > > RAM is limited, and even a single partition may need more space
>> >> > > > > > > > > > > than the RAM available. Moving a lot of pages to page memory for
>> >> > > > > > > > > > > a scan means that you evict a lot of other pages, which will
>> >> > > > > > > > > > > ultimately lead to bad performance of subsequent queries and
>> >> > > > > > > > > > > defeat LRU algorithms, which are of great importance for good
>> >> > > > > > > > > > > database performance.
>> >> > > > > > > > > > >
>> >> > > > > > > > > > > Database vendors choose another approach: skip BTrees, iterate
>> >> > > > > > > > > > > directly over data pages, read them in multi-block fashion, and
>> >> > > > > > > > > > > use a separate scan buffer to avoid excessive evictions of other
>> >> > > > > > > > > > > hot pages. A corresponding ticket for SQL exists [1], but the
>> >> > > > > > > > > > > idea is common to all parts of the system that require scans.
>> >> > > > > > > > > > >
>> >> > > > > > > > > > > As for the proposed solution, it might be a good idea to add a
>> >> > > > > > > > > > > special API to "warm up" a partition, with a clear explanation of
>> >> > > > > > > > > > > the pros (fast scan after warmup) and cons (slowdown of any other
>> >> > > > > > > > > > > operations). But I think we should not make this approach part of
>> >> > > > > > > > > > > normal scans.
>> >> > > > > > > > > > >
>> >> > > > > > > > > > > Vladimir.
>> >> > > > > > > > > > >
>> >> > > > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-6057
>> >> > > > > > > > > > >
>> >> > > > > > > > > > >
>> >> > > > > > > > > > > On Sun, Sep 16, 2018 at 6:44 PM Alexei Scherbakov <
>> >> > > > > > > > > > >  alexey.scherbak...@gmail.com > wrote:
>> >> > > > > > > > > > >
>> >> > > > > > > > > > > > Igniters,
>> >> > > > > > > > > > > >
>> >> > > > > > > > > > > > My use case involves a scenario where it's necessary to iterate
>> >> > > > > > > > > > > > over a large (many TBs) persistent cache, doing some calculation
>> >> > > > > > > > > > > > on the data read.
>> >> > > > > > > > > > > >
>> >> > > > > > > > > > > > The basic solution is to iterate over the cache using ScanQuery.
>> >> > > > > > > > > > > >
>> >> > > > > > > > > > > > This turns out to be slow because iteration over the cache
>> >> > > > > > > > > > > > involves a lot of random disk access for reading data pages
>> >> > > > > > > > > > > > referenced from leaf pages by links.
>> >> > > > > > > > > > > >
>> >> > > > > > > > > > > > This is especially true when data is stored on disks with slow
>> >> > > > > > > > > > > > random access, like SAS disks. In my case, on a modern SAS disk
>> >> > > > > > > > > > > > array, the reading speed was several MB/sec, while the sequential
>> >> > > > > > > > > > > > read speed in a perf test was on the order of GB/sec.
>> >> > > > > > > > > > > >
>> >> > > > > > > > > > > > I was able to fix the issue by using ScanQuery with an explicit
>> >> > > > > > > > > > > > partition set and running simple warmup code before each
>> >> > > > > > > > > > > > partition scan.
>> >> > > > > > > > > > > >
>> >> > > > > > > > > > > > The code pins cold pages in memory in sequential order, thus
>> >> > > > > > > > > > > > eliminating random disk access. The speedup was around x100.
>> >> > > > > > > > > > > >
>> >> > > > > > > > > > > > I suggest adding the improvement to the product's core by always
>> >> > > > > > > > > > > > sequentially preloading pages for all internal partition
>> >> > > > > > > > > > > > iterations (cache iterators, scan queries, SQL queries with a
>> >> > > > > > > > > > > > scan plan) if the partition is cold (low number of pinned pages).
>> >> > > > > > > > > > > >
>> >> > > > > > > > > > > > This also should speed up rebalancing from cold
>> >> > > partitions.
>> >> > > > > > > > > > > >
>> >> > > > > > > > > > > > Ignite JIRA ticket [1]
>> >> > > > > > > > > > > >
>> >> > > > > > > > > > > > Thoughts ?
>> >> > > > > > > > > > > >
>> >> > > > > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-8873
>> >> > > > > > > > > > > >
>> >> > > > > > > > > > > > --
>> >> > > > > > > > > > > >
>> >> > > > > > > > > > > > Best regards,
>> >> > > > > > > > > > > > Alexei Scherbakov
>> >> > > > > > > > > > > >
>> >> > > > > > > > > > >
>> >> > > > > > > > > >
>> >> > > > > > > > > --
>> >> > > > > > > > > --
>> >> > > > > > > > > Maxim Muzafarov
>> >> > > > > > > > >
>> >> > > > > > > >
>> >> > > > > > >
>> >> > > > > >
>> >> > > > >
>> >> > > >
>> >> > > >
>> >> > > > --
>> >> > > >
>> >> > > > Best regards,
>> >> > > > Alexei Scherbakov
>> >> > > >
>> >> > >
>> >> >
>> >>
>>
>>
>> --
>> Zhenya Stanilovsky
>>


-- 
Zhenya Stanilovsky
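
The eviction concern raised in the thread (a large warmup or scan pushing hot pages out of a shared region, versus keeping the scan in a separate region) can be illustrated with a toy model. This is only a sketch under stated assumptions: it uses a plain LRU built on java.util.LinkedHashMap, not Ignite's actual page replacement algorithm, and the names ScanPollutionSketch, LruPages, sharedRegion and separateRegions are hypothetical and not part of any Ignite API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of scan pollution in an LRU page cache (not Ignite code). */
public class ScanPollutionSketch {
    /** Scan page ids start here so they never collide with hot page ids. */
    private static final int SCAN_BASE = 1_000_000;

    /** Minimal LRU page set: evicts the least recently used page when full. */
    static final class LruPages extends LinkedHashMap<Integer, Boolean> {
        private final int capacity;

        LruPages(int capacity) {
            super(16, 0.75f, true); // accessOrder = true gives LRU ordering
            this.capacity = capacity;
        }

        /** Touches a page, loading it into the region if it is absent. */
        void touch(int pageId) {
            if (get(pageId) == null)
                put(pageId, Boolean.TRUE);
        }

        @Override protected boolean removeEldestEntry(Map.Entry<Integer, Boolean> eldest) {
            return size() > capacity;
        }
    }

    /** Counts how many hot pages [0, hotPages) are still resident. */
    private static int residentHot(LruPages region, int hotPages) {
        int resident = 0;
        for (int p = 0; p < hotPages; p++)
            if (region.containsKey(p))
                resident++;
        return resident;
    }

    /** The scan runs through the same region that holds the hot pages. */
    public static int sharedRegion(int capacity, int hotPages, int scanPages) {
        LruPages region = new LruPages(capacity);
        for (int p = 0; p < hotPages; p++)
            region.touch(p);
        for (int p = 0; p < scanPages; p++)
            region.touch(SCAN_BASE + p);
        return residentHot(region, hotPages);
    }

    /** The scan gets its own small region, so hot pages are never evicted by it. */
    public static int separateRegions(int hotCapacity, int scanCapacity, int hotPages, int scanPages) {
        LruPages hot = new LruPages(hotCapacity);
        LruPages scan = new LruPages(scanCapacity);
        for (int p = 0; p < hotPages; p++)
            hot.touch(p);
        for (int p = 0; p < scanPages; p++)
            scan.touch(SCAN_BASE + p);
        return residentHot(hot, hotPages);
    }

    public static void main(String[] args) {
        System.out.println("shared region:    " + sharedRegion(100, 80, 500) + " of 80 hot pages resident");
        System.out.println("separate regions: " + separateRegions(100, 20, 80, 500) + " of 80 hot pages resident");
    }
}
```

With these illustrative numbers, the shared region ends up with none of the 80 hot pages resident after the scan, while the separate-region variant keeps all 80. Real behaviour depends on the actual replacement policy and region sizes.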
