Re: Rebalancing - how to make it faster

Anton Vinogradov Mon, 09 Apr 2018 09:17:06 -0700

Ilya,

WAL should be automatically disabled at initial rebalancing.
It can't be disabled at regilar rebalancing, since you never ready to lose
data you protected by WAL.
Is there any need to have special param in that case?


2018-04-09 16:41 GMT+03:00 Ilya Lantukh <ilant...@gridgain.com>:

> Igniters,
>
> I am currently at the finish line of
> https://issues.apache.org/jira/browse/IGNITE-8017 ("Disable WAL during
> initial preloading") implementation. And I need that such behavior should
> be configurable. In my intermediate implementation I have parameter called
> "disableWalDuringRebalancing" in IgniteConfiguration. Do you thing such
> name is meaningful and self-explanatory? Do we need to ensure that it has
> the same value on every node? Should I make it configurable per cache
> rather than globally?
>
> Please share your thoughts.
>
> On Mon, Apr 9, 2018 at 4:32 PM, Ilya Lantukh <ilant...@gridgain.com>
> wrote:
>
> > Denis,
> >
> > Those ticket are rather complex, and so I don't know when I'll be able to
> > start working on them.
> >
> > On Fri, Mar 30, 2018 at 11:45 PM, Denis Magda <dma...@apache.org> wrote:
> >
> >> Ilya,
> >>
> >> Just came across the IEP put together by you:
> >> https://cwiki.apache.org/confluence/display/IGNITE/IEP-16%
> >> 3A+Optimization+of+rebalancing
> >>
> >> Excellent explanation, thanks for aggregating everything there.
> >>
> >> Two tickets below don't have a fixed version assigned:
> >> https://issues.apache.org/jira/browse/IGNITE-8020
> >> https://issues.apache.org/jira/browse/IGNITE-7935
> >>
> >> Do you plan to work on them in 2.6 time frame, right?
> >>
> >> --
> >> Denis
> >>
> >> On Tue, Mar 27, 2018 at 9:29 AM, Denis Magda <dma...@apache.org> wrote:
> >>
> >> > Ilya, granted you all the required permissions. Please let me know if
> >> you
> >> > still have troubles with the wiki.
> >> >
> >> > --
> >> > Denis
> >> >
> >> > On Tue, Mar 27, 2018 at 8:56 AM, Ilya Lantukh <ilant...@gridgain.com>
> >> > wrote:
> >> >
> >> >> Unfortunately, I don't have permission to create page for IEP on
> wiki.
> >> >> Denis, can you grant it? My username is ilantukh.
> >> >>
> >> >> On Mon, Mar 26, 2018 at 8:04 PM, Anton Vinogradov <a...@apache.org>
> >> wrote:
> >> >>
> >> >> > >> It is impossible to disable WAL only for certain partitions
> >> without
> >> >> > >> completely overhauling design of Ignite storage mechanism. Right
> >> now
> >> >> we
> >> >> > can
> >> >> > >> afford only to change WAL mode per cache group.
> >> >> >
> >> >> > Cache group rebalancing is a one cache rebalancing, and then this
> >> cache
> >> >> > ("cache group") can be presented as a set of virtual caches.
> >> >> > So, there is no issues for initial rebalancing.
> >> >> > Lets disable WAL on initial rebalancing.
> >> >> >
> >> >> > 2018-03-26 16:46 GMT+03:00 Ilya Lantukh <ilant...@gridgain.com>:
> >> >> >
> >> >> > > Dmitry,
> >> >> > > It is impossible to disable WAL only for certain partitions
> without
> >> >> > > completely overhauling design of Ignite storage mechanism. Right
> >> now
> >> >> we
> >> >> > can
> >> >> > > afford only to change WAL mode per cache group.
> >> >> > >
> >> >> > > The idea is to disable WAL when node doesn't have any partition
> in
> >> >> OWNING
> >> >> > > state, which means it doesn't have any consistent data and won't
> be
> >> >> able
> >> >> > to
> >> >> > > restore from WAL anyway. I don't see any potential use for WAL on
> >> such
> >> >> > > node, but we can keep a configurable parameter indicating can we
> >> >> > > automatically disable WAL in such case or not.
> >> >> > >
> >> >> > > On Fri, Mar 23, 2018 at 10:40 PM, Dmitry Pavlov <
> >> >> dpavlov....@gmail.com>
> >> >> > > wrote:
> >> >> > >
> >> >> > > > Denis, as I understood, there is and idea to exclude only
> >> rebalanced
> >> >> > > > partition(s) data. All other data will go to the WAL.
> >> >> > > >
> >> >> > > > Ilya, please correct me if I'm wrong.
> >> >> > > >
> >> >> > > > пт, 23 мар. 2018 г. в 22:15, Denis Magda <dma...@apache.org>:
> >> >> > > >
> >> >> > > > > Ilya,
> >> >> > > > >
> >> >> > > > > That's a decent boost (5-20%) even having WAL enabled. Not
> sure
> >> >> that
> >> >> > we
> >> >> > > > > should stake on the WAL "off" mode here because if the whole
> >> >> cluster
> >> >> > > goes
> >> >> > > > > down, it's then the data consistency is questionable. As an
> >> >> > architect,
> >> >> > > I
> >> >> > > > > wouldn't disable WAL for the sake of rebalancing; it's too
> >> risky.
> >> >> > > > >
> >> >> > > > > If you agree, then let's create the IEP. This way it will be
> >> >> easier
> >> >> > to
> >> >> > > > > track this endeavor. BTW, are you already ready to release
> any
> >> >> > > > > optimizations in 2.5 that is being discussed in a separate
> >> thread?
> >> >> > > > >
> >> >> > > > > --
> >> >> > > > > Denis
> >> >> > > > >
> >> >> > > > >
> >> >> > > > >
> >> >> > > > > On Fri, Mar 23, 2018 at 6:37 AM, Ilya Lantukh <
> >> >> ilant...@gridgain.com
> >> >> > >
> >> >> > > > > wrote:
> >> >> > > > >
> >> >> > > > > > Denis,
> >> >> > > > > >
> >> >> > > > > > > - Don't you want to aggregate the tickets under an IEP?
> >> >> > > > > > Yes, I think so.
> >> >> > > > > >
> >> >> > > > > > > - Does it mean we're going to update our B+Tree
> >> >> implementation?
> >> >> > Any
> >> >> > > > > ideas
> >> >> > > > > > how risky it is?
> >> >> > > > > > One of tickets that I created (
> >> >> > > > > > https://issues.apache.org/jira/browse/IGNITE-7935)
> involves
> >> >> B+Tree
> >> >> > > > > > modification, but I am not planning to do it in the nearest
> >> >> future.
> >> >> > > It
> >> >> > > > > > shouldn't affect existing tree operations, only introduce
> new
> >> >> ones
> >> >> > > > > (putAll,
> >> >> > > > > > invokeAll, removeAll).
> >> >> > > > > >
> >> >> > > > > > > - Any chance you had a prototype that shows performance
> >> >> > > optimizations
> >> >> > > > > the
> >> >> > > > > > approach you are suggesting to take?
> >> >> > > > > > I have a prototype for simplest improvements (
> >> >> > > > https://issues.apache.org/
> >> >> > > > > > jira/browse/IGNITE-8019 & https://issues.apache.org/
> >> >> > > > > > jira/browse/IGNITE-8018)
> >> >> > > > > > - together they increase throughput by 5-20%, depending on
> >> >> > > > configuration
> >> >> > > > > > and environment. Also, I've tested different WAL modes -
> >> >> switching
> >> >> > > from
> >> >> > > > > > LOG_ONLY to NONE gives over 100% boost - this is what I
> >> expect
> >> >> from
> >> >> > > > > > https://issues.apache.org/jira/browse/IGNITE-8017.
> >> >> > > > > >
> >> >> > > > > > On Thu, Mar 22, 2018 at 9:48 PM, Denis Magda <
> >> dma...@apache.org
> >> >> >
> >> >> > > > wrote:
> >> >> > > > > >
> >> >> > > > > > > Ilya,
> >> >> > > > > > >
> >> >> > > > > > > That's outstanding research and summary. Thanks for
> >> spending
> >> >> your
> >> >> > > > time
> >> >> > > > > on
> >> >> > > > > > > this.
> >> >> > > > > > >
> >> >> > > > > > > Not sure I have enough expertise to challenge your
> >> approach,
> >> >> but
> >> >> > it
> >> >> > > > > > sounds
> >> >> > > > > > > 100% reasonable to me. As side notes:
> >> >> > > > > > >
> >> >> > > > > > >    - Don't you want to aggregate the tickets under an
> IEP?
> >> >> > > > > > >    - Does it mean we're going to update our B+Tree
> >> >> > implementation?
> >> >> > > > Any
> >> >> > > > > > >    ideas how risky it is?
> >> >> > > > > > >    - Any chance you had a prototype that shows
> performance
> >> >> > > > > optimizations
> >> >> > > > > > of
> >> >> > > > > > >    the approach you are suggesting to take?
> >> >> > > > > > >
> >> >> > > > > > > --
> >> >> > > > > > > Denis
> >> >> > > > > > >
> >> >> > > > > > > On Thu, Mar 22, 2018 at 8:38 AM, Ilya Lantukh <
> >> >> > > ilant...@gridgain.com
> >> >> > > > >
> >> >> > > > > > > wrote:
> >> >> > > > > > >
> >> >> > > > > > > > Igniters,
> >> >> > > > > > > >
> >> >> > > > > > > > I've spent some time analyzing performance of
> rebalancing
> >> >> > > process.
> >> >> > > > > The
> >> >> > > > > > > > initial goal was to understand, what limits it's
> >> throughput,
> >> >> > > > because
> >> >> > > > > it
> >> >> > > > > > > is
> >> >> > > > > > > > significantly slower than network and storage device
> can
> >> >> > > > > theoretically
> >> >> > > > > > > > handle.
> >> >> > > > > > > >
> >> >> > > > > > > > Turns out, our current implementation has a number of
> >> issues
> >> >> > > caused
> >> >> > > > > by
> >> >> > > > > > a
> >> >> > > > > > > > single fundamental problem.
> >> >> > > > > > > >
> >> >> > > > > > > > During rebalance data is sent in batches called
> >> >> > > > > > > > GridDhtPartitionSupplyMessages. Batch size is
> >> configurable,
> >> >> > > > default
> >> >> > > > > > > value
> >> >> > > > > > > > is 512KB, which could mean thousands of key-value
> pairs.
> >> >> > However,
> >> >> > > > we
> >> >> > > > > > > don't
> >> >> > > > > > > > take any advantage over this fact and process each
> entry
> >> >> > > > > independently:
> >> >> > > > > > > > - checkpointReadLock is acquired multiple times for
> every
> >> >> > entry,
> >> >> > > > > > leading
> >> >> > > > > > > to
> >> >> > > > > > > > unnecessary contention - this is clearly a bug;
> >> >> > > > > > > > - for each entry we write (and fsync, if configuration
> >> >> assumes
> >> >> > > it)
> >> >> > > > a
> >> >> > > > > > > > separate WAL record - so, if batch contains N entries,
> we
> >> >> might
> >> >> > > end
> >> >> > > > > up
> >> >> > > > > > > > doing N fsyncs;
> >> >> > > > > > > > - adding every entry into CacheDataStore also happens
> >> >> > completely
> >> >> > > > > > > > independently. It means, we will traverse and modify
> each
> >> >> index
> >> >> > > > tree
> >> >> > > > > N
> >> >> > > > > > > > times, we will allocate space in FreeList N times and
> we
> >> >> will
> >> >> > > have
> >> >> > > > to
> >> >> > > > > > > > additionally store in WAL O(N*log(N)) page delta
> records.
> >> >> > > > > > > >
> >> >> > > > > > > > I've created a few tickets in JIRA with very different
> >> >> levels
> >> >> > of
> >> >> > > > > scale
> >> >> > > > > > > and
> >> >> > > > > > > > complexity.
> >> >> > > > > > > >
> >> >> > > > > > > > Ways to reduce impact of independent processing:
> >> >> > > > > > > > - https://issues.apache.org/jira/browse/IGNITE-8019 -
> >> >> > > > aforementioned
> >> >> > > > > > > bug,
> >> >> > > > > > > > causing contention on checkpointReadLock;
> >> >> > > > > > > > - https://issues.apache.org/jira/browse/IGNITE-8018 -
> >> >> > > inefficiency
> >> >> > > > > in
> >> >> > > > > > > > GridCacheMapEntry implementation;
> >> >> > > > > > > > - https://issues.apache.org/jira/browse/IGNITE-8017 -
> >> >> > > > automatically
> >> >> > > > > > > > disable
> >> >> > > > > > > > WAL during preloading.
> >> >> > > > > > > >
> >> >> > > > > > > > Ways to solve problem on more global level:
> >> >> > > > > > > > - https://issues.apache.org/jira/browse/IGNITE-7935 -
> a
> >> >> ticket
> >> >> > > to
> >> >> > > > > > > > introduce
> >> >> > > > > > > > batch modification;
> >> >> > > > > > > > - https://issues.apache.org/jira/browse/IGNITE-8020 -
> >> >> complete
> >> >> > > > > > redesign
> >> >> > > > > > > of
> >> >> > > > > > > > rebalancing process for persistent caches, based on
> file
> >> >> > > transfer.
> >> >> > > > > > > >
> >> >> > > > > > > > Everyone is welcome to criticize above ideas, suggest
> new
> >> >> ones
> >> >> > or
> >> >> > > > > > > > participate in implementation.
> >> >> > > > > > > >
> >> >> > > > > > > > --
> >> >> > > > > > > > Best regards,
> >> >> > > > > > > > Ilya
> >> >> > > > > > > >
> >> >> > > > > > >
> >> >> > > > > >
> >> >> > > > > >
> >> >> > > > > >
> >> >> > > > > > --
> >> >> > > > > > Best regards,
> >> >> > > > > > Ilya
> >> >> > > > > >
> >> >> > > > >
> >> >> > > >
> >> >> > >
> >> >> > >
> >> >> > >
> >> >> > > --
> >> >> > > Best regards,
> >> >> > > Ilya
> >> >> > >
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Best regards,
> >> >> Ilya
> >> >>
> >> >
> >> >
> >>
> >
> >
> >
> > --
> > Best regards,
> > Ilya
> >
>
>
>
> --
> Best regards,
> Ilya
>

Re: Rebalancing - how to make it faster

Reply via email to