Re: Unknown known issue on cache rebalancing delayed

Maxim Muzafarov Tue, 04 Sep 2018 03:09:51 -0700

Anton,

I agree with you, 20 runs is not enough. I've checked a single run of the
test class: it takes ~7 min per execution.
The CacheSuite8 total execution timeout is 210 min, so we can perform only 30
class executions in this suite. Our strategy here is to run the class
20 times within a single suite run and put 50 such runs into the TC queue.
That gives 1000 executions in total, ~7000 min or about 5 days.
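
For reference, the estimate above can be checked with a small sketch. It
assumes the queued runs execute one after another on a single agent (an
assumption; with parallel agents the wall-clock time would be shorter). The
class and method names are hypothetical and exist only for illustration.

```java
public class RunBudget {
    /** How many executions of one test class fit within the suite timeout. */
    static long maxRunsPerSuite(long suiteTimeoutMin, long minutesPerRun) {
        return suiteTimeoutMin / minutesPerRun;
    }

    /** Total minutes for queuedRuns suite runs, each executing the class runsPerSuite times. */
    static long totalMinutes(long runsPerSuite, long minutesPerRun, long queuedRuns) {
        return runsPerSuite * minutesPerRun * queuedRuns;
    }

    public static void main(String[] args) {
        long minutesPerRun = 7;   // ~7 min per test class execution (measured above)
        long suiteTimeout = 210;  // CacheSuite8 total execution timeout, in minutes
        long runsPerSuite = 20;   // executions of the class within one suite run
        long queuedRuns = 50;     // suite runs put into the TC queue

        long total = totalMinutes(runsPerSuite, minutesPerRun, queuedRuns);

        System.out.println("Max executions per suite run: " + maxRunsPerSuite(suiteTimeout, minutesPerRun));
        System.out.println("Total executions: " + runsPerSuite * queuedRuns);
        // Assumes strictly sequential execution on one agent.
        System.out.println("Total minutes: " + total + ", days: " + total / (60.0 * 24));
    }
}
```

With 20 executions per suite run we stay below the 30-execution ceiling
imposed by the 210 min timeout, which leaves some headroom for slower runs.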


I'm not sure that we should perform exactly 1000 executions; hopefully, we
will stop adding new tasks to the queue at some point.

On Tue, 4 Sep 2018 at 12:59 Anton Vinogradov <a...@apache.org> wrote:

> Maxim,
> 20 is not 1k :)
> Also, you forgot to check GridCacheRebalancingAsyncSelfTest
>
> I'm not sure we should have exactly 1k runs, but 20 is definitely not
> enough.
>
> Roman,
> I propose using the IDEA "run until failure" feature and running the test
> locally (on your PC) while you're not using it.
>
> On Tue, 4 Sep 2018 at 12:51, Maxim Muzafarov <maxmu...@gmail.com> wrote:
>
> > Roman, Anton,
> >
> > I've already created an additional PR [2] and run it on TC [1].
> > Please follow the results there.
> >
> > [1] https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_Cache8&tab=buildTypeStatusDiv&branch_IgniteTests24Java8=pull%2F4676%2Fhead
> > [2] https://github.com/apache/ignite/pull/4676/files
> >
> >
> > On Tue, 4 Sep 2018 at 12:46 Roman Shtykh <rsht...@yahoo.com.invalid>
> > wrote:
> >
> > > Anton,
> > > Thank you. I would like to recheck it. How can this (1_000 runs) be done
> > > in TC?
> > >
> > >
> > >     On Tuesday, September 4, 2018, 5:42:01 p.m. GMT+9, Anton Vinogradov
> > > <a...@apache.org> wrote:
> > >
> > >  Roman,
> > >
> > > I see you uncommented this line.
> > > I do not remember the deadlock details, but I remember it was an
> > > extremely rare case.
> > > I found and "fixed" it some days before the merge, when I had a 24x7
> > > sanity check week :)
> > >
> > > So, I propose to have at least 1_000 runs of these tests before keeping
> > > this uncommented.
> > >
> > >
> > >
> > > On Tue, 21 Aug 2018 at 11:08, Maxim Muzafarov <maxmu...@gmail.com> wrote:
> > >
> > > > Roman,
> > > >
> > > > I worked recently on rebalance improvements and haven't found any
> > > > problems with delayed cache rebalance.
> > > > I agree with you: let's uncomment this and remove the scary comment.
> > > > Will you create a ticket for it?
> > > >
> > > > In case of any problems we can easily detect a deadlock with the newly
> > > > configured `FailureHandler`.
> > > >
> > > > On Tue, 21 Aug 2018 at 03:49 Roman Shtykh <rsht...@yahoo.com> wrote:
> > > >
> > > > > Hi Maxim,
> > > > >
> > > > > I have some issues with a cluster with rebalance delay enabled, but
> > > > > I need to check more -- if I find it's related, I'll share.
> > > > > I just wanted to hear from someone working on rebalancing that it's
> > > > > not an issue anymore. We should remove that comment then, it looks
> > > > > scary :)
> > > > >
> > > > > --
> > > > > Roman Shtykh
> > > > >
> > > > >
> > > > > On Tuesday, August 21, 2018, 12:49:00 a.m. GMT+9, Maxim Muzafarov <
> > > > > maxmu...@gmail.com> wrote:
> > > > >
> > > > >
> > > > > Hello Roman,
> > > > >
> > > > > Have you faced a real issue with delayed rebalance, or are you
> > > > > asking just out of personal interest?
> > > > > If it is a real issue, please share the details and we will try to
> > > > > help you.
> > > > >
> > > > > As for this comment, I don't think it is still relevant. That change
> > > > > was made in 2015. Much has changed within the rebalance process since
> > > > > that time. I've uncommented it and rechecked with that cache
> > > > > configuration, and haven't seen any failed tests or issues.
> > > > >
> > > > > Probably, the problem was that a cache in SYNC mode does not start
> > > > > until it loads all data from other nodes. But currently delayed
> > > > > rebalance works the same way as IgniteCache#rebalance(), so you can
> > > > > set `setRebalanceDelay` to `-1` and call it manually to check.
> > > > >
> > > > > On Mon, 20 Aug 2018 at 11:19 Roman Shtykh <rsht...@yahoo.com.invalid>
> > > > > wrote:
> > > > >
> > > > > Igniters,
> > > > > I have found the comment "Known issue, possible deadlock in case of
> > > > > low priority cache rebalancing delayed" in
> > > > > GridCacheRebalancingSyncSelfTest#getConfiguration. Can you please
> > > > > explain when using rebalance delay can be an issue, and why?
> > > > >
> > > > > -- Roman
> > > > >
> > > > > --
> > > > > --
> > > > > Maxim Muzafarov
> > > > >
> > > > --
> > > > --
> > > > Maxim Muzafarov
> > > >
> >
> > --
> > --
> > Maxim Muzafarov
> >
>
-- 
--
Maxim Muzafarov
