Re: JCache 1.1: IGNITE-8715 Problems with Closeable objects from Factory

2018-06-21 Thread Александр Меньшиков
Hi, Kolay.

This solution looks a bit ugly for me. So I'm not sure about it.
Also, I have tried implementing it but for some reason, only 5 tests start
to pass (95 fails left).
So it's not so obvious where and how cache cleanup happens.

I have found the code which removes for some reason factories from config
at closing time (but doesn't close objects inside).
So maybe there is a raw version of the cleanup mechanism, which I miss.



2018-06-21 18:06 GMT+03:00 Nikolay Izhikov :

> Hello, Alex.
>
> Issue looks trivial from your description.
> Can you clarify - What is the issues with obvious decision you describe?
>
> В Чт, 21/06/2018 в 17:48 +0300, Александр Меньшиков пишет:
> > Hi, Igniters!
> > Can someone give me advice about IGNITE-8715 [1]?
> > It's a part of IEP-21 [2] about updating to JCache 1.1.
> >
> > According to JCache 1.0 and 1.1 specifications Cache#close() should clean
> > up all Closeable objects (CacheLoader, CacheWriter, CacheEntryListener,
> > ExpiryPolicy) created by factories. And in TCK 1.1 there are tests which
> > check it.
> >
> > As I see we haven't such functionality. Am I right?
> >
> > If so which solution will be the best one? I'm thinking about storing all
> > closeable objects in something like GridConcurrentHashSet and close all
> of
> > them when cache going to be closed.
> > Thoughts?
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-8715
> > [2]
> > https://cwiki.apache.org/confluence/display/IGNITE/IEP-
> 21:+JCache+1.1+support
> > <https://cwiki.apache.org/confluence/display/IGNITE/IEP-
> 21%3A+JCache+1.1+support>


JCache 1.1: IGNITE-8715 Problems with Closeable objects from Factory

2018-06-21 Thread Александр Меньшиков
Hi, Igniters!
Can someone give me advice about IGNITE-8715 [1]?
It's a part of IEP-21 [2] about updating to JCache 1.1.

According to JCache 1.0 and 1.1 specifications Cache#close() should clean
up all Closeable objects (CacheLoader, CacheWriter, CacheEntryListener,
ExpiryPolicy) created by factories. And in TCK 1.1 there are tests which
check it.

As I see we haven't such functionality. Am I right?

If so which solution will be the best one? I'm thinking about storing all
closeable objects in something like GridConcurrentHashSet and close all of
them when cache going to be closed.
Thoughts?

[1] https://issues.apache.org/jira/browse/IGNITE-8715
[2]
https://cwiki.apache.org/confluence/display/IGNITE/IEP-21:+JCache+1.1+support



Re: Upgrading Ignite to JCache 1.1

2018-06-15 Thread Александр Меньшиков
Denis, I think we can include it to a minor release. Because the network
protocol, API, binary compatibility will be saved. And all behavior
changes, in fact, make implementation closer to the documentation of JCache
1.0. Because TCK 1.1.0 in general fixes differents between documentation
and tests in 1.0.

2018-06-14 21:41 GMT+03:00 Denis Magda :

> Guys, are you targeting this for the next big Ignite release? Should be in
> 3 m from now.
>
> --
> Denis
>
> On Thu, Jun 14, 2018 at 2:58 AM Anton Vinogradov  wrote:
>
> > Corrected IEP URL:
> >
> > https://cwiki.apache.org/confluence/display/IGNITE/IEP-
> 21%3A+JCache+1.1+support
> >
> > чт, 14 июн. 2018 г. в 12:48, Александр Меньшиков :
> >
> > > Igniters,
> > >
> > > I've prepared IEP-21 [1] for this JCache updating task.
> > > It will help us to manage the issues and see the progress in one place.
> > > Also, we have finally added tests for TCK 1.1.0 [2] to our TC which can
> > be
> > > run on any branch.
> > > Both tests cases (for 1.0.1 and for 1.1.0) will coexist until IEP-21
> > > finish.
> > >
> > > [1]
> > https://cwiki.apache.org/confluence/display/IGNITE/IEP-21:+JCache+1.1
> > > [2]
> > >
> > >
> > https://ci.ignite.apache.org/viewType.html?buildTypeId=
> IgniteTests24Java8_JCacheTck11
> > >
> > > 2018-06-06 0:49 GMT+03:00 Denis Magda :
> > >
> > > > Agree, I see zero benefits of being compliant with both specification
> > > > versions. Let’s just focus on the latest one.
> > > >
> > > > Denis
> > > >
> > > > On Tuesday, June 5, 2018, Dmitriy Setrakyan 
> > > wrote:
> > > >
> > > > > Alex,
> > > > >
> > > > > I think it is OK to break TCK 1.0.1 tests in favor of TCK 1.1. Once
> > we
> > > > > finish the migration, I would remove the TCK 1.0.1 test suite
> > > altogether.
> > > > >
> > > > > D.
> > > > >
> > > > > On Tue, Jun 5, 2018 at 11:13 AM, Александр Меньшиков <
> > > > sharple...@gmail.com
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > Okay. There are tests results:
> > > > > >
> > > > > > https://ci.ignite.apache.org/viewLog.html?buildId=1361493;
> > > > > > tab=buildResultsDiv=IgniteTests24Java8_JCacheTck11
> > > > > >
> > > > > > It's the same as locally.
> > > > > >
> > > > > > Also, I have created sub-tasks for all problems we have:
> > > > > >
> > > > > > 1) CacheManagerTest.getUnsafeTypedCacheRequest failed.
> > > > > > https://issues.apache.org/jira/browse/IGNITE-8704
> > > > > >
> > > > > > 2) CacheMBStatisticsBeanTest.testClear failed.
> > > > > > https://issues.apache.org/jira/browse/IGNITE-8705
> > > > > >
> > > > > > 3) CacheManagerTest.close_cachesEmpty failed.
> > > > > > https://issues.apache.org/jira/browse/IGNITE-8708
> > > > > >
> > > > > > 4) CacheMBStatisticsBeanTest.testPutIfAbsent failed.
> > > > > > https://issues.apache.org/jira/browse/IGNITE-8709
> > > > > >
> > > > > > 5) CacheEntryEvent.getOldValue should be available.
> > > > > > Two tests fail because of it.
> > > > > >
> > > > > > Looks like a bug.
> > > > > >
> > > > > > https://issues.apache.org/jira/browse/IGNITE-8714
> > > > > >
> > > > > > 6) Problems with Closeable objects from Factory
> > > > > >
> > > > > > *98* tests fail because of it.
> > > > > >
> > > > > > https://issues.apache.org/jira/browse/IGNITE-8715
> > > > > >
> > > > > >
> > > > > > For first 4 problems, I already have PRs.
> > > > > > Problems 2) and 3) will break tests for TCK 1.0.1 because these
> > tests
> > > > > work
> > > > > > wrong in 1.0.1,
> > > > > > and were fixed in 1.1.0.
> > > > > >
> > > > > > 2018-06-05 14:37 GMT+03:00 Dmitry Pavlov  >:
> > > > > >
> > > > > > > Agree with Nikolay we should create build plan, and we can use
> > this
> > > > > 

Re: Upgrading Ignite to JCache 1.1

2018-06-14 Thread Александр Меньшиков
Igniters,

I've prepared IEP-21 [1] for this JCache updating task.
It will help us to manage the issues and see the progress in one place.
Also, we have finally added tests for TCK 1.1.0 [2] to our TC which can be
run on any branch.
Both tests cases (for 1.0.1 and for 1.1.0) will coexist until IEP-21 finish.

[1] https://cwiki.apache.org/confluence/display/IGNITE/IEP-21:+JCache+1.1
[2]
https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_JCacheTck11

2018-06-06 0:49 GMT+03:00 Denis Magda :

> Agree, I see zero benefits of being compliant with both specification
> versions. Let’s just focus on the latest one.
>
> Denis
>
> On Tuesday, June 5, 2018, Dmitriy Setrakyan  wrote:
>
> > Alex,
> >
> > I think it is OK to break TCK 1.0.1 tests in favor of TCK 1.1. Once we
> > finish the migration, I would remove the TCK 1.0.1 test suite altogether.
> >
> > D.
> >
> > On Tue, Jun 5, 2018 at 11:13 AM, Александр Меньшиков <
> sharple...@gmail.com
> > >
> > wrote:
> >
> > > Okay. There are tests results:
> > >
> > > https://ci.ignite.apache.org/viewLog.html?buildId=1361493;
> > > tab=buildResultsDiv=IgniteTests24Java8_JCacheTck11
> > >
> > > It's the same as locally.
> > >
> > > Also, I have created sub-tasks for all problems we have:
> > >
> > > 1) CacheManagerTest.getUnsafeTypedCacheRequest failed.
> > > https://issues.apache.org/jira/browse/IGNITE-8704
> > >
> > > 2) CacheMBStatisticsBeanTest.testClear failed.
> > > https://issues.apache.org/jira/browse/IGNITE-8705
> > >
> > > 3) CacheManagerTest.close_cachesEmpty failed.
> > > https://issues.apache.org/jira/browse/IGNITE-8708
> > >
> > > 4) CacheMBStatisticsBeanTest.testPutIfAbsent failed.
> > > https://issues.apache.org/jira/browse/IGNITE-8709
> > >
> > > 5) CacheEntryEvent.getOldValue should be available.
> > > Two tests fail because of it.
> > >
> > > Looks like a bug.
> > >
> > > https://issues.apache.org/jira/browse/IGNITE-8714
> > >
> > > 6) Problems with Closeable objects from Factory
> > >
> > > *98* tests fail because of it.
> > >
> > > https://issues.apache.org/jira/browse/IGNITE-8715
> > >
> > >
> > > For first 4 problems, I already have PRs.
> > > Problems 2) and 3) will break tests for TCK 1.0.1 because these tests
> > work
> > > wrong in 1.0.1,
> > > and were fixed in 1.1.0.
> > >
> > > 2018-06-05 14:37 GMT+03:00 Dmitry Pavlov :
> > >
> > > > Agree with Nikolay we should create build plan, and we can use this
> > build
> > > > plan in developement branch.
> > > >
> > > > Merge to master is not necessary before issue is ready.
> > > >
> > > > вт, 5 июн. 2018 г. в 14:04, Nikolay Izhikov :
> > > >
> > > >> Alex, please try to run this build plan for your branch
> > > >>
> > > >> https://ci.ignite.apache.org/viewType.html?buildTypeId=
> > > >> IgniteTests24Java8_JCacheTck11
> > > >>
> > > >> В Вт, 05/06/2018 в 13:56 +0300, Nikolay Izhikov пишет:
> > > >> > Guys, we had a private talk with Vyacheslav and Dmitriy Pavlov.
> > > >> > Here are its resulst:
> > > >> >
> > > >> > 1. I will create JCache1.1 build plan.
> > > >> > I will be able to run tests with Alex new profile enabled.
> > > >> >
> > > >> > So, Alex can run and share with community tests results both for
> > > jcache
> > > >> 1.0 and jcache 1.1.
> > > >> >
> > > >> > 2. Alex, please, create tickets for a JCache 1.1. issues.
> > > >> >
> > > >> >
> > > >> > В Вт, 05/06/2018 в 13:36 +0300, Vyacheslav Daradur пишет:
> > > >> > > AFAIK TeamCity is not able to create such build-plan on the fly.
> > > >> > > Moreover, we will not be able to test master branch in both
> > profiles
> > > >> > > in this case.
> > > >> > >
> > > >> > > Am I miss something?
> > > >> > >
> > > >> > > On Tue, Jun 5, 2018 at 1:31 PM, Nikolay Izhikov <
> > > nizhi...@apache.org>
> > > >> wrote:
> > > >> > > > Vyacheslav,
> > > >> > > >
> > > >> > > > Let's crea

Re: Upgrading Ignite to JCache 1.1

2018-06-05 Thread Александр Меньшиков
Okay. There are tests results:

https://ci.ignite.apache.org/viewLog.html?buildId=1361493=buildResultsDiv=IgniteTests24Java8_JCacheTck11

It's the same as locally.

Also, I have created sub-tasks for all problems we have:

1) CacheManagerTest.getUnsafeTypedCacheRequest failed.
https://issues.apache.org/jira/browse/IGNITE-8704

2) CacheMBStatisticsBeanTest.testClear failed.
https://issues.apache.org/jira/browse/IGNITE-8705

3) CacheManagerTest.close_cachesEmpty failed.
https://issues.apache.org/jira/browse/IGNITE-8708

4) CacheMBStatisticsBeanTest.testPutIfAbsent failed.
https://issues.apache.org/jira/browse/IGNITE-8709

5) CacheEntryEvent.getOldValue should be available.
Two tests fail because of it.

Looks like a bug.

https://issues.apache.org/jira/browse/IGNITE-8714

6) Problems with Closeable objects from Factory

*98* tests fail because of it.

https://issues.apache.org/jira/browse/IGNITE-8715


For first 4 problems, I already have PRs.
Problems 2) and 3) will break tests for TCK 1.0.1 because these tests work
wrong in 1.0.1,
and were fixed in 1.1.0.

2018-06-05 14:37 GMT+03:00 Dmitry Pavlov :

> Agree with Nikolay we should create build plan, and we can use this build
> plan in developement branch.
>
> Merge to master is not necessary before issue is ready.
>
> вт, 5 июн. 2018 г. в 14:04, Nikolay Izhikov :
>
>> Alex, please try to run this build plan for your branch
>>
>> https://ci.ignite.apache.org/viewType.html?buildTypeId=
>> IgniteTests24Java8_JCacheTck11
>>
>> В Вт, 05/06/2018 в 13:56 +0300, Nikolay Izhikov пишет:
>> > Guys, we had a private talk with Vyacheslav and Dmitriy Pavlov.
>> > Here are its resulst:
>> >
>> > 1. I will create JCache1.1 build plan.
>> > I will be able to run tests with Alex new profile enabled.
>> >
>> > So, Alex can run and share with community tests results both for jcache
>> 1.0 and jcache 1.1.
>> >
>> > 2. Alex, please, create tickets for a JCache 1.1. issues.
>> >
>> >
>> > В Вт, 05/06/2018 в 13:36 +0300, Vyacheslav Daradur пишет:
>> > > AFAIK TeamCity is not able to create such build-plan on the fly.
>> > > Moreover, we will not be able to test master branch in both profiles
>> > > in this case.
>> > >
>> > > Am I miss something?
>> > >
>> > > On Tue, Jun 5, 2018 at 1:31 PM, Nikolay Izhikov 
>> wrote:
>> > > > Vyacheslav,
>> > > >
>> > > > Let's create build plan on TC for this profile.
>> > > > Why we need to merge it in master now?
>> > > >
>> > > > В Вт, 05/06/2018 в 13:29 +0300, Vyacheslav Daradur пишет:
>> > > > > Nikolay, there isn't anything broken in PR.
>> > > > >
>> > > > > The PR is needed to add new build-plan on TC.
>> > > > >
>> > > > > We need tools to check that fixes for 1.1 don't break
>> compatibility with 1.0.
>> > > > >
>> > > > > On Tue, Jun 5, 2018 at 1:21 PM, Кузнецов Алексей Львович
>> > > > >  wrote:
>> > > > > > Hi
>> > > > > >
>> > > > > > After Alexander create separate tickets for failed tests,
>> everybody is free
>> > > > > > to fix them.
>> > > > > > So we can proceed faster issue resolving.
>> > > > > >
>> > > > > > > Hello, Igniters.
>> > > > > > >
>> > > > > > > Actually, I don't understand why we should merge in master
>> something
>> > > > > > > broken.
>> > > > > > > Currently, Ignite is not ready for JCache 1.1.
>> > > > > > >
>> > > > > > > Only change I see in PR is new profile [1].
>> > > > > > > Is it required to have it to continue jcache 1.1 support
>> implementation?
>> > > > > > >
>> > > > > > > I think Alexandex can proceed with current profile and change
>> it  to run
>> > > > > > > tests for JCache 1.1 his own branch.
>> > > > > > >
>> > > > > > > Am I miss something?
>> > > > > > >
>> > > > > > > [1] https://github.com/apache/ignite/pull/4114
>> > > > > > >
>> > > > > > > В Вт, 05/06/2018 в 12:50 +0300, Dmitry Pavlov пишет:
>> > > > > > > >
>> > > > > > > > Hi Alexander,
>> > > > > > > >
&

Re: Upgrading Ignite to JCache 1.1

2018-06-04 Thread Александр Меньшиков
Hi,
Igniters!

I have taken a look at the jcache 1.1 spec and TCK.
And I can write a brief summary of my plan to solve the task.

I have found 6 problems in current master with TCK 1.1 (104 failed tests).
Of course, we should run this TCK on CI to be absolutely sure we didn't
miss something.

So the first step is an adding TCK 1.1 suite to our Team City.
I have created sub-task [1] for it and prepared the PR [2].
I need someone with access to merge PR and add suite to Team City.
It going to be just a clone of the current jcache-tck suite, but with using
the new profile.
You can test new profile locally with the following command:

mvn test -P-release,jcache-tck-1.1 -pl :ignite-core -am


After that, I will start to add sub-task for every problem we have.

Nikolay, can you please help me with merging [1] and adding to the suite?

[1]JIRA: https://issues.apache.org/jira/browse/IGNITE-8687
[2]PR: https://github.com/apache/ignite/pull/4114/files
[3]CI:
https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_RunAll_IgniteTests24Java8=pull/4114/head=buildTypeStatusDiv

2018-05-23 14:31 GMT+03:00 Александр Меньшиков :

> Thanks, Slava. You are right.
>
> 2018-05-23 14:00 GMT+03:00 Vyacheslav Daradur :
>
>> Hi, Alex!
>>
>> Please have a look at maven profile named "jcache-tck".
>>
>> I believe this is what you are looking for.
>>
>> On Wed, May 23, 2018 at 11:50 AM, Александр Меньшиков
>>  wrote:
>> > Igniters,
>> > I think I can do it. As I see we already have JCache TCK tests in TC.
>> > Can I take somewhere settings/script which we are using?
>> >
>> > 2018-05-23 2:58 GMT+03:00 Dmitriy Setrakyan :
>> >
>> >> Igniters,
>> >>
>> >> It will be great if someone in the community would pick this up. The
>> amount
>> >> of changes are minimal and many of them only have to do with
>> clarifying the
>> >> documentation. However, removing JSR 107 license confusion in 1.1
>> would be
>> >> great for Ignite.
>> >>
>> >> D.
>> >>
>> >> On Tue, May 22, 2018 at 3:04 PM, Denis Magda 
>> wrote:
>> >>
>> >> > Here is a list of all changes:
>> >> > https://groups.google.com/forum/#!topic/jsr107/BC1qKqknzKU
>> >> >
>> >> > The primary argument for the migration is a license. JCache 1.0 is
>> >> licensed
>> >> > by Oracle that causes legal issues for some of the users. Once we
>> upgrade
>> >> > to JCache 1.1 the won't longer be a big deal.
>> >> >
>> >> > However, once we move to 1.1 we need to be sure that we comply with
>> the
>> >> > updated specification.
>> >> >
>> >> > --
>> >> > Denis
>> >> >
>> >> > On Tue, May 22, 2018 at 5:20 AM, Dmitry Pavlov <
>> dpavlov@gmail.com>
>> >> > wrote:
>> >> >
>> >> > > Hi Denis,
>> >> > >
>> >> > > What was improved in JCache 1.1?
>> >> > >
>> >> > > Would it be useful for product to change supported spec. version?
>> >> > >
>> >> > > Sincerely,
>> >> > > Dmitriy Pavlov
>> >> > >
>> >> > > пн, 21 мая 2018 г. в 20:12, Denis Magda :
>> >> > >
>> >> > > > Igniters,
>> >> > > >
>> >> > > > Eventually, JCache was relicensed to Apache 2.0 and released 1.1
>> >> > version:
>> >> > > > https://groups.google.com/forum/#!topic/jsr107/BC1qKqknzKU
>> >> > > >
>> >> > > > Is there anyone interested in upgrading Ignite to the new
>> version for
>> >> > the
>> >> > > > next release?
>> >> > > > https://issues.apache.org/jira/browse/IGNITE-8548
>> >> > > >
>> >> > > > --
>> >> > > > Denis
>> >> > > >
>> >> > >
>> >> >
>> >>
>>
>>
>>
>> --
>> Best Regards, Vyacheslav D.
>>
>
>


Re: IGNITE-8238 - Operation can fails with unexpected RuntimeException when node is stopping

2018-05-31 Thread Александр Меньшиков
Okay, I have changed the logging level to debug, updated master and rerun
TC tests [1].

[1]
https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_RunAll_IgniteTests24Java8=pull%2F3993%2Fhead=buildTypeStatusDiv

2018-05-30 16:13 GMT+03:00 Andrey Mashenkov :

> Hi,
>
> Yes, now you are on the right way.
>
> It is ok if checkpointReadLock() throws runtime exception and in your PR
> reason become obvious.
> Now, it is easy to handle exceptions thrown from checkpointReadLock()
> method on node stop if needed.
>
> TtlManager looks correctly ignore NodeStopping exception, but I'd avoid
> logging of warning message with stacktrace here.
> I've left comments in ticket.
>
> Normally, this change shouldn't affect any other behavior and you
> shouldn't be bother to catch this exception on every checkpointReadLock
> call.
> Those places where checkpointReadLock is called from, now fixed and will
> correctly handle exception with cause (NodeStoppingExcpetion) if they tried
> to do this before, of cause.
>
> checkpointReadLock() could throw IgniteException before your changes. So
> changing type RuntimeException -> IgniteException shouldn't have any
> affect (otherwise it is a bug of incorrect exception handling).
>
> Please run TC test and check if there is no new failures.
>
>
>
> I see nothing in GridCacheAdapter#getAllAsync0 that may be affected by
> your changes.
>
>
> On Tue, May 29, 2018 at 8:05 PM, Dmitriy Setrakyan 
> wrote:
>
>> As suggested before, please do not put blank ticket numbers into subjects
>> because no one understands ticket numbers. Please add titles or some other
>> context to the subject. This will improve the level of engagement from the
>> community.
>>
>> I have changed the subject of this thread, let's continue the discussion
>> here.
>>
>> D.
>>
>> On Tue, May 29, 2018 at 9:10 AM, Александр Меньшиков <
>> sharple...@gmail.com> wrote:
>>
>>> Hi, Andrew.
>>>
>>> You suggested continuing the discussion on dev-list about IGNITE-8238 [1]
>>> I have reworked my PR [2]. Have I moved in the right direction?
>>>
>>> I see 57 usages of `checkpointReadLock` in the core module (without
>>> tests).
>>> And it still unclear for me should I catch all exceptions from
>>> `checkpointReadLock` or not?
>>> Some places look okay like usage inside overrides of `GridWorker#body`.
>>> But for example `GridCacheAdapter#getAllAsync0` makes me worry.
>>>
>>> [1] https://issues.apache.org/jira/browse/IGNITE-8238
>>> [2] https://github.com/apache/ignite/pull/3993/files
>>>
>>
>>
>
>
> --
> Best regards,
> Andrey V. Mashenkov
>


IGNITE-8238

2018-05-29 Thread Александр Меньшиков
Hi, Andrew.

You suggested continuing the discussion on dev-list about IGNITE-8238 [1]
I have reworked my PR [2]. Have I moved in the right direction?

I see 57 usages of `checkpointReadLock` in the core module (without tests).
And it still unclear for me should I catch all exceptions from
`checkpointReadLock` or not?
Some places look okay like usage inside overrides of `GridWorker#body`.
But for example `GridCacheAdapter#getAllAsync0` makes me worry.

[1] https://issues.apache.org/jira/browse/IGNITE-8238
[2] https://github.com/apache/ignite/pull/3993/files


Re: Upgrading Ignite to JCache 1.1

2018-05-23 Thread Александр Меньшиков
Thanks, Slava. You are right.

2018-05-23 14:00 GMT+03:00 Vyacheslav Daradur <daradu...@gmail.com>:

> Hi, Alex!
>
> Please have a look at maven profile named "jcache-tck".
>
> I believe this is what you are looking for.
>
> On Wed, May 23, 2018 at 11:50 AM, Александр Меньшиков
> <sharple...@gmail.com> wrote:
> > Igniters,
> > I think I can do it. As I see we already have JCache TCK tests in TC.
> > Can I take somewhere settings/script which we are using?
> >
> > 2018-05-23 2:58 GMT+03:00 Dmitriy Setrakyan <dsetrak...@apache.org>:
> >
> >> Igniters,
> >>
> >> It will be great if someone in the community would pick this up. The
> amount
> >> of changes are minimal and many of them only have to do with clarifying
> the
> >> documentation. However, removing JSR 107 license confusion in 1.1 would
> be
> >> great for Ignite.
> >>
> >> D.
> >>
> >> On Tue, May 22, 2018 at 3:04 PM, Denis Magda <dma...@apache.org> wrote:
> >>
> >> > Here is a list of all changes:
> >> > https://groups.google.com/forum/#!topic/jsr107/BC1qKqknzKU
> >> >
> >> > The primary argument for the migration is a license. JCache 1.0 is
> >> licensed
> >> > by Oracle that causes legal issues for some of the users. Once we
> upgrade
> >> > to JCache 1.1 the won't longer be a big deal.
> >> >
> >> > However, once we move to 1.1 we need to be sure that we comply with
> the
> >> > updated specification.
> >> >
> >> > --
> >> > Denis
> >> >
> >> > On Tue, May 22, 2018 at 5:20 AM, Dmitry Pavlov <dpavlov@gmail.com
> >
> >> > wrote:
> >> >
> >> > > Hi Denis,
> >> > >
> >> > > What was improved in JCache 1.1?
> >> > >
> >> > > Would it be useful for product to change supported spec. version?
> >> > >
> >> > > Sincerely,
> >> > > Dmitriy Pavlov
> >> > >
> >> > > пн, 21 мая 2018 г. в 20:12, Denis Magda <dma...@apache.org>:
> >> > >
> >> > > > Igniters,
> >> > > >
> >> > > > Eventually, JCache was relicensed to Apache 2.0 and released 1.1
> >> > version:
> >> > > > https://groups.google.com/forum/#!topic/jsr107/BC1qKqknzKU
> >> > > >
> >> > > > Is there anyone interested in upgrading Ignite to the new version
> for
> >> > the
> >> > > > next release?
> >> > > > https://issues.apache.org/jira/browse/IGNITE-8548
> >> > > >
> >> > > > --
> >> > > > Denis
> >> > > >
> >> > >
> >> >
> >>
>
>
>
> --
> Best Regards, Vyacheslav D.
>


Re: Upgrading Ignite to JCache 1.1

2018-05-23 Thread Александр Меньшиков
Igniters,
I think I can do it. As I see we already have JCache TCK tests in TC.
Can I take somewhere settings/script which we are using?

2018-05-23 2:58 GMT+03:00 Dmitriy Setrakyan :

> Igniters,
>
> It will be great if someone in the community would pick this up. The amount
> of changes are minimal and many of them only have to do with clarifying the
> documentation. However, removing JSR 107 license confusion in 1.1 would be
> great for Ignite.
>
> D.
>
> On Tue, May 22, 2018 at 3:04 PM, Denis Magda  wrote:
>
> > Here is a list of all changes:
> > https://groups.google.com/forum/#!topic/jsr107/BC1qKqknzKU
> >
> > The primary argument for the migration is a license. JCache 1.0 is
> licensed
> > by Oracle that causes legal issues for some of the users. Once we upgrade
> > to JCache 1.1 the won't longer be a big deal.
> >
> > However, once we move to 1.1 we need to be sure that we comply with the
> > updated specification.
> >
> > --
> > Denis
> >
> > On Tue, May 22, 2018 at 5:20 AM, Dmitry Pavlov 
> > wrote:
> >
> > > Hi Denis,
> > >
> > > What was improved in JCache 1.1?
> > >
> > > Would it be useful for product to change supported spec. version?
> > >
> > > Sincerely,
> > > Dmitriy Pavlov
> > >
> > > пн, 21 мая 2018 г. в 20:12, Denis Magda :
> > >
> > > > Igniters,
> > > >
> > > > Eventually, JCache was relicensed to Apache 2.0 and released 1.1
> > version:
> > > > https://groups.google.com/forum/#!topic/jsr107/BC1qKqknzKU
> > > >
> > > > Is there anyone interested in upgrading Ignite to the new version for
> > the
> > > > next release?
> > > > https://issues.apache.org/jira/browse/IGNITE-8548
> > > >
> > > > --
> > > > Denis
> > > >
> > >
> >
>


Re: Tests coverage report of Ignite 2.5 core module

2018-05-14 Thread Александр Меньшиков
Hi Dmitry,
You are right this is not a sorting, this is a ranking by two parameters
where both of them equal in priority.
For such task in math, we have an idea of "dominating subset".
It simply too understand by example:

  P1, P2
E1  2, 2
E2  1, 2
E3  2, 1
E4  1, 1

Here element E1 dominate all other elements. E2 and E3 dominate E4, and E4
is a loser.
So we can do a ranking like that: E1 has rank 1, E2 and E3 have rank 2, and
E4 has rank 3.

I have done the same with JoCoCo results.
Where less coverage is better, and greate complexity is better (because we
are looking for classes with the problems).
In my list, there are all classes with rank 1 (except well covered, small,
and with a strange result of 0% coverage).

Finally, I have sorted this list by one value "coverage/complexity" -- just
for better viewing.

I have written R script for this, and if you think about some other
parameters or filtering -- let me know.


2018-05-14 20:54 GMT+03:00 Dmitry Pavlov <dpavlov@gmail.com>:

> Hi Alexander,
>
> I did not quite understand how this list of top tests was obtained. It does
> not look like sorted.
>
> Can you clarify?
>
> Sincerely,
> Dmitriy Pavlov
>
> пн, 14 мая 2018 г. в 20:48, Александр Меньшиков <sharple...@gmail.com>:
>
> > Vyacheslav,
> > I have made a simple ranking for your data for highlighting classes which
> > more than other needs testing.
> >
> > Please take a look at the list which I have done (36 classes):
> >
> > https://drive.google.com/file/d/1RJrtnN77MeEWLTShFzqaLInCdOSb9
> WdC/view?usp=sharing
> >
> >
> > Below you can read how did it in details:
> >
> > First I have removed all small classes (less than 200 lines), classes
> with
> > more than 80% coverage, and with 0% (too strange result).
> > After that, I have extracted subset which dominates other classes by two
> > parameters: coverage and complexity.
> > It means all other classes have not less coverage and not greater
> > complexity, and also better at least at one of these parameters.
> >
> > Finally, I have sorted this list by value "coverage/complexity".
> >
>


Re: Tests coverage report of Ignite 2.5 core module

2018-05-14 Thread Александр Меньшиков
Vyacheslav,
I have made a simple ranking for your data for highlighting classes which
more than other needs testing.

Please take a look at the list which I have done (36 classes):
https://drive.google.com/file/d/1RJrtnN77MeEWLTShFzqaLInCdOSb9WdC/view?usp=sharing


Below you can read how did it in details:

First I have removed all small classes (less than 200 lines), classes with
more than 80% coverage, and with 0% (too strange result).
After that, I have extracted subset which dominates other classes by two
parameters: coverage and complexity.
It means all other classes have not less coverage and not greater
complexity, and also better at least at one of these parameters.

Finally, I have sorted this list by value "coverage/complexity".


Re: Tests coverage report of Ignite 2.5 core module

2018-05-14 Thread Александр Меньшиков
Hi Vyacheslav,
Great job!

I specifically happy to see jacoco.csv file into the archive.
I think I can range classes by some complex criterion, like
coverage/complexity.
That classes could be the first in the queue for testing.

2018-05-14 17:49 GMT+03:00 Vyacheslav Daradur :

> TestSuites weren't used.
>
> Set of tests classes was created as result of finding not abstract
> classes which names contain '*Test*' except '*TestCase*' in
> '/src/test/java/org/apache/ignite' as Maven does [1].
>
> Each tests class was executed and covered separately.
>
> You can see used tests classes in the file 'tests-classes.txt' which
> was included in the archive.
>
>
> [1] http://maven.apache.org/surefire/maven-surefire-
> plugin/examples/inclusion-exclusion.html
>
>
>
> On Mon, May 14, 2018 at 5:30 PM, Dmitry Pavlov 
> wrote:
> > Ok, thank you. I missed that I should download the archive first.
> >
> > I've found for example that method truncate in FSyncWalManager is not
> > covered.
> >
> > Which test suites were executed?
> >
> > пн, 14 мая 2018 г. в 17:16, Vyacheslav Daradur :
> >
> >> Hi, Dmitry,
> >>
> >> Download and unpack the archive first:
> >>
> >> https://www.dropbox.com/s/rdgs1svvojm757x/ignite-2.5-
> core-module-tests-coverage-report-rev-d83f1ec.zip
> >>
> >>
> >> On Mon, May 14, 2018 at 5:12 PM, Dmitry Pavlov 
> >> wrote:
> >> > Hi Vyacheslav,
> >> >
> >> >
> >> https://www.dropbox.com/s/rdgs1svvojm757x/ignite-2.5-
> core-module-tests-coverage-report-rev-d83f1ec.zip?dl=0&
> file_subpath=%2Fcoverage-report%2Findex.html
> >> >
> >> > says error 400 for me.
> >> >
> >> > Sincerely,
> >> > Dmitriy Pavlov
> >> >
> >> > пн, 14 мая 2018 г. в 17:06, Vyacheslav Daradur :
> >> >
> >> >> Hi, Igniters!
> >> >>
> >> >> I’ve made tests coverage report of core module using Jacoco 0.8.1
> >> >>
> >> >> I’d like to share results with the community. Here is a link:
> >> >>
> >> >>
> >> https://www.dropbox.com/s/rdgs1svvojm757x/ignite-2.5-
> core-module-tests-coverage-report-rev-d83f1ec.zip?dl=0
> >> >>
> >> >> For viewing the report unpack the archive and open:
> >> >> coverage-report/index.html
> >> >>
> >> >> Also, there is ‘tests-classes.txt’ which includes names of tests
> >> >> classes which have been used.
> >> >>
> >> >> The report was generated from branch: ‘ignite-2-5’, commit:
> >> >>
> >> >>
> >> https://github.com/apache/ignite/tree/d83f1ec0a4d010e57fecc12ffd3d1f
> 8346ded61c
> >> >>
> >> >>
> >> >> --
> >> >> Best Regards, Vyacheslav D.
> >> >>
> >>
> >>
> >>
> >> --
> >> Best Regards, Vyacheslav D.
> >>
>
>
>
> --
> Best Regards, Vyacheslav D.
>


IGNITE-6430 (Done) CacheGroupsMetricsRebalanceTest.testRebalanceEstimateFinishTime test fails periodically

2018-05-10 Thread Александр Меньшиков
Hi,

I have fixed problem with the test
CacheGroupsMetricsRebalanceTest.testRebalanceEstimateFinishTime.

Please review and merge if okay.

P.S.
Aleksey Plekhanov already approved it (see Jira).


JIRA: https://issues.apache.org/jira/browse/IGNITE-6430
PR: https://github.com/apache/ignite/pull/3914
TC (with 120 passes):
https://ci.ignite.apache.org/viewLog.html?buildId=1246058=IgniteTests24Java8_Cache3=testsInfo
TC:
https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_BuildApacheIgnite_IgniteTests24Java8=pull%2F3914%2Fhead=buildTypeStatusDiv


Re: Optimize GridLongList serialization

2018-05-10 Thread Александр Меньшиков
Hi Dmitry,

It's 100% compatible. I just change one line for serializing only valuable
part of the array
 -- just use another method of a writer.

I think it's enough for now. We can return to a discussion about compact
GridLongList
 in future when the 3.0 release will start coming.

2018-05-08 19:33 GMT+03:00 Dmitry Pavlov <dpavlov@gmail.com>:

> Hi Alexander,
>
> Would new implementation of GridLongList be able to read value serialized
> by old implementation?
>
> Is it possible old implementation would not be able to read value from new?
>
> Sincerely,
> Dmitriy Pavlov
>
> пн, 7 мая 2018 г. в 11:30, Александр Меньшиков <sharple...@gmail.com>:
>
> > Hi.
> > I have finished this task. Just replaced
> >
> > `writer.writeLongArray("arr", arr)`
> > with
> > `writer.writeLongArray("arr", arr, idx)`
> >
> > Please review and merge if okay.
> >
> > Jira: https://issues.apache.org/jira/browse/IGNITE-8054
> > PR: https://github.com/apache/ignite/pull/3748
> > CI:
> > https://ci.ignite.apache.org/viewType.html?buildTypeId=
> IgniteTests24Java8_RunAll_IgniteTests24Java8=
> pull%2F3748%2Fhead=buildTypeStatusDiv
> >
> > 2018-03-27 11:50 GMT+03:00 Александр Меньшиков <sharple...@gmail.com>:
> >
> >> The ticket is created:
> >> https://issues.apache.org/jira/browse/IGNITE-8054
> >>
> >> 2018-03-27 10:04 GMT+03:00 Dmitriy Setrakyan <dsetrak...@apache.org>:
> >>
> >>> Sorry, Dmitiry, I meant Ignite, not GridGain (memories of pre-apache
> >>> coding). I am assuming that Alex Menshikov was referring to
> GridLongList
> >>> class in Ignite.
> >>>
> >>> D.
> >>>
> >>> On Mon, Mar 26, 2018 at 11:52 PM, Dmitry Pavlov <dpavlov@gmail.com
> >
> >>> wrote:
> >>>
> >>> > Hi Dmitriy, did you mean GridList?
> >>> >
> >>> > I don't understand what does it mean GridGain compress.
> >>> >
> >>> > вт, 27 мар. 2018 г. в 3:06, Dmitriy Setrakyan <dsetrak...@apache.org
> >:
> >>> >
> >>> > > Thanks, Alex.
> >>> > >
> >>> > > GridGain automatically compresses all the internal types. Somehow
> it
> >>> > looks
> >>> > > like the GridLongList may have been mixed. Can you please file a
> >>> ticket
> >>> > for
> >>> > > 2.5 release?
> >>> > >
> >>> > > D.
> >>> > >
> >>> > > On Mon, Mar 26, 2018 at 4:55 AM, Александр Меньшиков <
> >>> > sharple...@gmail.com
> >>> > > >
> >>> > > wrote:
> >>> > >
> >>> > > > I investigated network loading and found that a big part of
> >>> internal
> >>> > data
> >>> > > > inside messages is `GridLongList`.
> >>> > > > It is a part of `GridDhtTxFinishRequest`,
> >>> > > > `GridDhtAtomicDeferredUpdateResponse`,
> >>> `GridDhtAtomicUpdateRequest`,
> >>> > > > `GridNearAtomicFullUpdateRequest` and `NearCacheUpdates`.
> >>> > > >
> >>> > > > So I think it has the sense to optimize `GridLongList`
> >>> serialization.
> >>> > > >
> >>> > > >
> >>> > > > Here we serialize all elements and don't take into account `idx`
> >>> value:
> >>> > > >
> >>> > > > ```
> >>> > > >
> >>> > > > @Override public boolean writeTo(ByteBuffer buf, MessageWriter
> >>> writer)
> >>> > {
> >>> > > >
> >>> > > > writer.setBuffer(buf);
> >>> > > >
> >>> > > >
> >>> > > >
> >>> > > > if (!writer.isHeaderWritten()) {
> >>> > > >
> >>> > > > if (!writer.writeHeader(directType(),
> fieldsCount()))
> >>> > > >
> >>> > > > return false;
> >>> > > >
> >>> > > >
> >>> > > >
> >>> > > > writer.onHeaderWritten();
> >>> > > >
> >>> > > > }
> >>> > > >
> >>> > > >
> >>> > > >
> >>> >

Re: Ticket review checklist

2018-05-08 Thread Александр Меньшиков
 and
> > >> > separate
> > >> > > >
> > >> > > > PR.
> > >> > > > > If
> > >> > > > > > one have a time for refactoring, it should not be a problem
> > for
> > >> > him to
> > >> > > > > > spend several minutes on JIRA and GitHub.
> > >> > > > > >
> > >> > > > > > As far as documentation - what you describe is normal review
> > >> > process,
> > >> > > > >
> > >> > > > > when
> > >> > > > > > reviewer might want to ask contributor to fix something.
> > >> Checklist
> > >> > is a
> > >> > > > > > different thing - this is a set of rules which must be
> > followed
> > >> by
> > >> > > > >
> > >> > > > > anyone.
> > >> > > > > > I do not understand how you can define documentation in this
> > >> > checklist.
> > >> > > > > > Same problem with logging - what is "enough"?
> > >> > > > > >
> > >> > > > > > On Tue, Apr 24, 2018 at 4:51 PM, Eduard Shangareev <
> > >> > > > > > eduard.shangar...@gmail.com> wrote:
> > >> > > > > >
> > >> > > > > > > Igniters,
> > >> > > > > > >
> > >> > > > > > > I don't understand why you are so against refactoring.
> > >> > > > > > > Code already smells like hell. Methods 200+ line is
> normal.
> > >> > Exchange
> > >> > > > > >
> > >> > > > > > future
> > >> > > > > > > is asking to be separated on several one. Transaction code
> > >> could
> > >> > > > > >
> > >> > > > > > understand
> > >> > > > > > > few people.
> > >> > > > > > >
> > >> > > > > > > If we separate refactoring from development it would mean
> > that
> > >> > no one
> > >> > > > > >
> > >> > > > > > will
> > >> > > > > > > do it.
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > 2) Documentation.
> > >> > > > > > > Everything which was asked by reviewers to clarify idea
> > >> should be
> > >> > > > > >
> > >> > > > > > reflected
> > >> > > > > > > in the code.
> > >> > > > > > >
> > >> > > > > > > 3) Logging.
> > >> > > > > > > Logging should be enough to troubleshoot the problem if
> > >> someone
> > >> > comes
> > >> > > > >
> > >> > > > > to
> > >> > > > > > > user-list with an issue in the code.
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > On Fri, Apr 20, 2018 at 7:06 PM, Dmitry Pavlov <
> > >> > > >
> > >> > > > dpavlov@gmail.com>
> > >> > > > > > > wrote:
> > >> > > > > > >
> > >> > > > > > > > Hi Igniters,
> > >> > > > > > > >
> > >> > > > > > > > +1 to idea of checklist.
> > >> > > > > > > >
> > >> > > > > > > > +1 to refactoring and documenting code related to ticket
> > in
> > >> > +/-20
> > >> > > >
> > >> > > > LOC
> > >> > > > > > at
> > >> > > > > > > > least.
> > >> > > > > > > >
> > >> > > > > > > > If we start to do it as part of our regular
> contribution,
> > >> code
> > >> > will
> > >> > > > >
> > >> > > > > be
> > >> > > > > > > > better, it would became common practice and part of
> Apache
> > >> > Ignite
&g

Re: Optimize GridLongList serialization

2018-05-07 Thread Александр Меньшиков
Hi.
I have finished this task. Just replaced

`writer.writeLongArray("arr", arr)`
with
`writer.writeLongArray("arr", arr, idx)`

Please review and merge if okay.

Jira: https://issues.apache.org/jira/browse/IGNITE-8054
PR: https://github.com/apache/ignite/pull/3748
CI:
https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_RunAll_IgniteTests24Java8=pull%2F3748%2Fhead=buildTypeStatusDiv

2018-03-27 11:50 GMT+03:00 Александр Меньшиков <sharple...@gmail.com>:

> The ticket is created:
> https://issues.apache.org/jira/browse/IGNITE-8054
>
> 2018-03-27 10:04 GMT+03:00 Dmitriy Setrakyan <dsetrak...@apache.org>:
>
>> Sorry, Dmitiry, I meant Ignite, not GridGain (memories of pre-apache
>> coding). I am assuming that Alex Menshikov was referring to GridLongList
>> class in Ignite.
>>
>> D.
>>
>> On Mon, Mar 26, 2018 at 11:52 PM, Dmitry Pavlov <dpavlov@gmail.com>
>> wrote:
>>
>> > Hi Dmitriy, did you mean GridList?
>> >
>> > I don't understand what does it mean GridGain compress.
>> >
>> > вт, 27 мар. 2018 г. в 3:06, Dmitriy Setrakyan <dsetrak...@apache.org>:
>> >
>> > > Thanks, Alex.
>> > >
>> > > GridGain automatically compresses all the internal types. Somehow it
>> > looks
>> > > like the GridLongList may have been mixed. Can you please file a
>> ticket
>> > for
>> > > 2.5 release?
>> > >
>> > > D.
>> > >
>> > > On Mon, Mar 26, 2018 at 4:55 AM, Александр Меньшиков <
>> > sharple...@gmail.com
>> > > >
>> > > wrote:
>> > >
>> > > > I investigated network loading and found that a big part of internal
>> > data
>> > > > inside messages is `GridLongList`.
>> > > > It is a part of `GridDhtTxFinishRequest`,
>> > > > `GridDhtAtomicDeferredUpdateResponse`,
>> `GridDhtAtomicUpdateRequest`,
>> > > > `GridNearAtomicFullUpdateRequest` and `NearCacheUpdates`.
>> > > >
>> > > > So I think it has the sense to optimize `GridLongList`
>> serialization.
>> > > >
>> > > >
>> > > > Here we serialize all elements and don't take into account `idx`
>> value:
>> > > >
>> > > > ```
>> > > >
>> > > > @Override public boolean writeTo(ByteBuffer buf, MessageWriter
>> writer)
>> > {
>> > > >
>> > > > writer.setBuffer(buf);
>> > > >
>> > > >
>> > > >
>> > > > if (!writer.isHeaderWritten()) {
>> > > >
>> > > > if (!writer.writeHeader(directType(), fieldsCount()))
>> > > >
>> > > > return false;
>> > > >
>> > > >
>> > > >
>> > > > writer.onHeaderWritten();
>> > > >
>> > > > }
>> > > >
>> > > >
>> > > >
>> > > > switch (writer.state()) {
>> > > >
>> > > > case 0:
>> > > >
>> > > > if (!writer.writeLongArray("arr", arr))
>> > > >
>> > > > return false;
>> > > >
>> > > >
>> > > >
>> > > > writer.incrementState();
>> > > >
>> > > >
>> > > >
>> > > > case 1:
>> > > >
>> > > > if (!writer.writeInt("idx", idx))
>> > > >
>> > > > return false;
>> > > >
>> > > >
>> > > >
>> > > > writer.incrementState();
>> > > >
>> > > >
>> > > >
>> > > > }
>> > > >
>> > > >
>> > > >
>> > > > return true;
>> > > >
>> > > > }
>> > > >
>> > > > ```
>> > > >
>> > > >
>> > > >
>> > > > Which is not happening in another serialization method in the same
>> > class:
>> > > >
>> > > >
>> > > >
>> > > > ```
>> > > >
>> > > > public static void writeTo(DataOutput out, @Nullable GridLongList
>> list)
>> > > > throws IOException {
>> > > >
>> > > > out.writeInt(list != null ? list.idx : -1);
>> > > >
>> > > >
>> > > >
>> > > > if (list != null) {
>> > > >
>> > > > for (int i = 0; i < list.idx; i++)
>> > > >
>> > > > out.writeLong(list.arr[i]);
>> > > >
>> > > > }
>> > > >
>> > > > }
>> > > >
>> > > > ```
>> > > >
>> > > >
>> > > > So, we can simply reduce messages size by sending only a valuable
>> part
>> > of
>> > > > the array.
>> > > > If you don't mind I will create an issue in Jira for this.
>> > > >
>> > > >
>> > > > By the way, `long` is a huge type. As I see in most cases
>> > `GridLongList`
>> > > > uses for counters.
>> > > > And I have checked the possibility of compress `long` into smaller
>> > types
>> > > as
>> > > > `int`, `short` or `byte` in test
>> > > > `GridCacheInterceptorAtomicRebalanceTest` (took it by random).
>> > > > And found out that all `long` in`GridLongList` can be cast to `int`
>> and
>> > > 70%
>> > > > of them to shorts.
>> > > > Such conversion is quite fast about 1.1 (ns) per element (I have
>> > checked
>> > > it
>> > > > by JMH test).
>> > > >
>> > > >
>> > > >
>> > > > Of course, there are a lot of ways to compress data,
>> > > > but I know proprietary GridGain plug-in has different
>> `MessageWriter`
>> > > > implementation.
>> > > > So maybe it is unnecessary and some compression already exists in
>> this
>> > > > proprietary plug-in.
>> > > > Does someone know something about it?
>> > > >
>> > >
>> >
>>
>
>


Re: Looks like a bug in ServerImpl.joinTopology()

2018-04-28 Thread Александр Меньшиков
+ Anton Vinogradov

2018-04-27 17:40 GMT+03:00 Александр Меньшиков <sharple...@gmail.com>:

> Yakov, thank you for the advice.
>
> The thread.sleep is not enough, but some latch + future give me a way to
> the
> reproducer.
>
> I have created PR [1] into my master, for showing a test and modification
> of
> ServerImpl which help me to slow down execution inside a danger section.
>
> A code of test a bit long, but basically it about two parts:
>
> In the first part, I randomly start and stop nodes to get a moment when
> a server is starting to execute the dangerous code which I described in the
> first message.
>
> In the second part, I'm waiting while the first part produces this
> situation
> and after that, I call public method of ServerImpl which fails with an
> exception:
>
> java.lang.AssertionError: Invalid node order: TcpDiscoveryNode
> [id=f6bf048d-378b-4960-94cb-84e3d332, addrs=[127.0.0.1], sockAddrs=[/
> 127.0.0.1:47502], discPort=47502, order=0, intOrder=2, 
> lastExchangeTime=1524836605995,
> loc=false, ver=2.5.0#20180426-sha1:34e22396, isClient=false]
> at org.apache.ignite.spi.discovery.tcp.internal.
> TcpDiscoveryNodesRing$1.apply(TcpDiscoveryNodesRing.java:52)
> at org.apache.ignite.spi.discovery.tcp.internal.
> TcpDiscoveryNodesRing$1.apply(TcpDiscoveryNodesRing.java:49)
> at org.apache.ignite.internal.util.lang.GridFunc.isAll(
> GridFunc.java:2014)
> at org.apache.ignite.internal.util.IgniteUtils.arrayList(
> IgniteUtils.java:9679)
> at org.apache.ignite.internal.util.IgniteUtils.arrayList(
> IgniteUtils.java:9652)
> at org.apache.ignite.spi.discovery.tcp.internal.
> TcpDiscoveryNodesRing.nodes(TcpDiscoveryNodesRing.java:590)
> at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.
> visibleRemoteNodes(TcpDiscoveryNodesRing.java:164)
> at org.apache.ignite.spi.discovery.tcp.ServerImpl.
> getRemoteNodes(ServerImpl.java:304)
>
> As I told in the first message the problem arises because of the current
> code
> changes local node internal order and breaks sorting in
> TcpDiscoveryNodesRing.nodes collection.
>
> Is this reproducer convince enough?
>
> [1] Reproducer: https://github.com/SharplEr/ignite/pull/10/files
>
>
>
>
> 2018-02-13 20:17 GMT+03:00 Yakov Zhdanov <yzhda...@apache.org>:
>
>> Alex, you can alter ServerImpl and insert a latch or thread.sleep(xxx)
>> anywhere you like to show the incorrect behavior you describe.
>>
>> --Yakov
>>
>
>


Re: Looks like a bug in ServerImpl.joinTopology()

2018-04-27 Thread Александр Меньшиков
Yakov, thank you for the advice.

The thread.sleep is not enough, but some latch + future give me a way to the
reproducer.

I have created PR [1] into my master, for showing a test and modification of
ServerImpl which help me to slow down execution inside a danger section.

A code of test a bit long, but basically it about two parts:

In the first part, I randomly start and stop nodes to get a moment when
a server is starting to execute the dangerous code which I described in the
first message.

In the second part, I'm waiting while the first part produces this situation
and after that, I call public method of ServerImpl which fails with an
exception:

java.lang.AssertionError: Invalid node order: TcpDiscoveryNode
[id=f6bf048d-378b-4960-94cb-84e3d332, addrs=[127.0.0.1], sockAddrs=[/
127.0.0.1:47502], discPort=47502, order=0, intOrder=2,
lastExchangeTime=1524836605995, loc=false,
ver=2.5.0#20180426-sha1:34e22396, isClient=false]
at
org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing$1.apply(TcpDiscoveryNodesRing.java:52)
at
org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing$1.apply(TcpDiscoveryNodesRing.java:49)
at
org.apache.ignite.internal.util.lang.GridFunc.isAll(GridFunc.java:2014)
at
org.apache.ignite.internal.util.IgniteUtils.arrayList(IgniteUtils.java:9679)
at
org.apache.ignite.internal.util.IgniteUtils.arrayList(IgniteUtils.java:9652)
at
org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.nodes(TcpDiscoveryNodesRing.java:590)
at
org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.visibleRemoteNodes(TcpDiscoveryNodesRing.java:164)
at
org.apache.ignite.spi.discovery.tcp.ServerImpl.getRemoteNodes(ServerImpl.java:304)

As I told in the first message the problem arises because of the current
code
changes local node internal order and breaks sorting in
TcpDiscoveryNodesRing.nodes collection.

Is this reproducer convince enough?

[1] Reproducer: https://github.com/SharplEr/ignite/pull/10/files



2018-02-13 20:17 GMT+03:00 Yakov Zhdanov :

> Alex, you can alter ServerImpl and insert a latch or thread.sleep(xxx)
> anywhere you like to show the incorrect behavior you describe.
>
> --Yakov
>


Re: Ticket review checklist

2018-04-25 Thread Александр Меньшиков
+1 to Vladimir Ozerov

2018-04-25 11:44 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:

> Guys,
>
> The problem with in-place refactorings is that you increase affected scope.
> It is not uncommon to break compatibility or public contracts with even
> minor things. E.g. recently we decided drop org.jsr166 package in favor of
> Java 8 classes. Innocent change. Result - broken storage. Another problem
> is conflicts. It is not uncommon to have long-lived branches which we need
> to merge with master over and over again. And a lot of refactorings cause
> conflicts. It is much easier to resolve them if you know that logic was not
> affected as opposed to cases when you need to resolve both renames and
> method extractions along with business-logic changes.
>
> I'd like to repeat - if you have a time for refactoring then you definitely
> have a time to extract these changes to separate PR and submit a separate
> ticket. I am quite understand what "low priority" do you mean if you do
> refactorings on your own.
>
> On Tue, Apr 24, 2018 at 10:52 PM, Andrey Kuznetsov <stku...@gmail.com>
> wrote:
>
> > +1.
> >
> > Once again, I beg for "small refactoring permission" in a checklist. As
> of
> > today, separate tickets for small refactorings has lowest priority, since
> > they neither fix any flaw nor add new functionality. Also, the attempts
> to
> > make issue-related code safer / cleaner / more readable in "real" pull
> > requests are typically rejected, since they contradict our current
> > guidelines.
> >
> > I understand this will require a bit more effort from
> committer/maintainer,
> > but otherwise we will get constantly degrading code quality.
> >
> >
> > 2018-04-24 18:52 GMT+03:00 Eduard Shangareev <
> eduard.shangar...@gmail.com
> > >:
> >
> > > Vladimir,
> > >
> > > I am not talking about massive/sophisticated refactoring. But I believe
> > > that ask to extract some methods should be OK to do without an extra
> > > ticket.
> > >
> > > A checklist shouldn't be necessarily a set of certain rules but also it
> > > could include suggestion and reminders.
> > >
> > > On Tue, Apr 24, 2018 at 6:39 PM, Vladimir Ozerov <voze...@gridgain.com
> >
> > > wrote:
> > >
> > > > Ed,
> > > >
> > > > Refactoring is a separate task. If you would like to rework exchange
> > > future
> > > > - please do this in a ticket "Refactor exchange task", nobody would
> > > against
> > > > this. This is just a matter of creating separate ticket and separate
> > PR.
> > > If
> > > > one have a time for refactoring, it should not be a problem for him
> to
> > > > spend several minutes on JIRA and GitHub.
> > > >
> > > > As far as documentation - what you describe is normal review process,
> > > when
> > > > reviewer might want to ask contributor to fix something. Checklist
> is a
> > > > different thing - this is a set of rules which must be followed by
> > > anyone.
> > > > I do not understand how you can define documentation in this
> checklist.
> > > > Same problem with logging - what is "enough"?
> > > >
> > > > On Tue, Apr 24, 2018 at 4:51 PM, Eduard Shangareev <
> > > > eduard.shangar...@gmail.com> wrote:
> > > >
> > > > > Igniters,
> > > > >
> > > > > I don't understand why you are so against refactoring.
> > > > > Code already smells like hell. Methods 200+ line is normal.
> Exchange
> > > > future
> > > > > is asking to be separated on several one. Transaction code could
> > > > understand
> > > > > few people.
> > > > >
> > > > > If we separate refactoring from development it would mean that no
> one
> > > > will
> > > > > do it.
> > > > >
> > > > >
> > > > > 2) Documentation.
> > > > > Everything which was asked by reviewers to clarify idea should be
> > > > reflected
> > > > > in the code.
> > > > >
> > > > > 3) Logging.
> > > > > Logging should be enough to troubleshoot the problem if someone
> comes
> > > to
> > > > > user-list with an issue in the code.
> > > > >
> > > > >
> > > > > On Fri, Apr 20, 2018 at 7:06 PM, Dmitry Pavlov &l

Re: Ticket review checklist

2018-04-20 Thread Александр Меньшиков
4) Metrics.
partially +1

It makes sense to have some minimal code coverage for new code in PR. IMHO.

Also, we can limit the cyclomatic complexity of the new code in PR too.

6) Refactoring
-1

I understand why people want to refactor old code.
But I think refactoring should be always a separate task.
And it's better to remove all refactoring from PR, if it's not the sense of
the issue.


2018-04-20 16:54 GMT+03:00 Andrey Kuznetsov :

> What about adding the following item to the checklist: when the change adds
> new functionality, then unit tests should also be provided, if it's
> technically possible?
>
> As for refactorings, in fact they are strongly discouraged today for some
> unclear reason. Let's permit to make refactorings in the checklist being
> discussed. (Of cource, refactoring should relate to problem being solved.)
>
> 2018-04-20 16:16 GMT+03:00 Vladimir Ozerov :
>
> > Hi Ed,
> >
> > Unfortunately some of these points are not good candidates for the
> > checklist because of these:
> > - It must be clear and disallow *multiple interpretations*
> > - It must be *lightweight*, otherwise Ignite development would become a
> > nightmare
> >
> > We cannot have "nice to have" points here. Checklist should answer the
> > question "is ticket eligible to be merged?"
> >
> > >>> 1) Code style.
> > +1
> >
> > >>>  2) Documentation
> > -1, it is impossible to define what is "well-documented". A piece of code
> > could be obvious for one contributor, and non-obvious for another. In any
> > case this is not a blocker for merge. Instead, during review one can ask
> > implementer to add more docs, but it cannot be forced.
> >
> > >>>  3) Logging
> > -1, same problem - what is "enough logging?". Enough for whom? How to
> > understand whether it is enough or not?
> >
> > >>>  4) Metrics
> > -1, no clear boundaries, and decision on whether metrics are to be added
> or
> > not should be performed during design phase. As before, it is perfectly
> > valid to ask contributor to add metrics with clear explanation why, but
> > this is not part of the checklist.
> >
> > >>> 5) TC status
> > +1, already mentioned
> >
> > >>>  6) Refactoring
> > Strong -1. OOP is a slippery slope, there are no good and bad receipts
> for
> > all cases, hence it cannot be used in a checklist.
> >
> > We can borrow useful rules from p.2, p.3 and p.4 if you provide clear
> > definitions on how to measure them.
> >
> > Vladimir.
> >
> > On Fri, Apr 20, 2018 at 3:50 PM, Eduard Shangareev <
> > eduard.shangar...@gmail.com> wrote:
> >
> > > Also, I want to add some technical requirement. Let's discuss them.
> > >
> > > 1) Code style.
> > > The code needs to be formatted according to coding guidelines
> > >  >.
> > > The
> > > code must not contain TODOs without a ticket reference.
> > >
> > > It is highly recommended to make major formatting changes in existing
> > code
> > > as a separate commit, to make review process more practical.
> > >
> > > 2) Documentation.
> > > Added code should be well-documented. Any methods that raise questions
> > > regarding their code flow, invariants, synchronization, etc., must be
> > > documented with comprehensive javadoc. Any reviewer can request that a
> > > particular added method be documented. Also, it is a good practice to
> > > document old code in a 10-20 lines region around changed code.
> > >
> > > 3) Logging.
> > > Make sure that there are enough logging added in every category for
> > > possible diagnostic in field. Check that logging messages are properly
> > > spelled.
> > >
> > > 4) Metrics.
> > > Are there any metrics that need to be exposed to user?
> > >
> > > 5) TC status.
> > >
> > > Recheck that there are no new failing tests
> > >
> > > 6) Refactoring.
> > >
> > > The code should be better than before:
> > >
> > >- extract method from big one;
> > >- do anything else to make code clearer (don't forget about some
> > >OOP-practise, replace if-else hell with inheritance
> > >- split refactoring (renaming, code format) from actual changes by
> > >separate commit
> > >
> > >
> > >
> > >
> > > On Fri, Apr 20, 2018 at 3:23 PM, Eduard Shangareev <
> > > eduard.shangar...@gmail.com> wrote:
> > >
> > > > Hi, guys.
> > > >
> > > > I believe that we should update maintainers list before adding this
> > > review
> > > > requirement.
> > > > There should not be the situation when there is only one contributor
> > who
> > > > is responsible for a component.
> > > > We already have issues with review speed and response time.
> > > >
> > > > On Fri, Apr 20, 2018 at 2:17 PM, Anton Vinogradov 
> > wrote:
> > > >
> > > >> Vova,
> > > >>
> > > >> Everything you described sound good to me.
> > > >>
> > > >> I'd like to propose to create special page at AI Wiki and to
> describe
> > > >> checklist.
> > > >> In case we'll find something should be 

Re: Optimize GridLongList serialization

2018-03-27 Thread Александр Меньшиков
The ticket is created:
https://issues.apache.org/jira/browse/IGNITE-8054

2018-03-27 10:04 GMT+03:00 Dmitriy Setrakyan <dsetrak...@apache.org>:

> Sorry, Dmitiry, I meant Ignite, not GridGain (memories of pre-apache
> coding). I am assuming that Alex Menshikov was referring to GridLongList
> class in Ignite.
>
> D.
>
> On Mon, Mar 26, 2018 at 11:52 PM, Dmitry Pavlov <dpavlov@gmail.com>
> wrote:
>
> > Hi Dmitriy, did you mean GridList?
> >
> > I don't understand what does it mean GridGain compress.
> >
> > вт, 27 мар. 2018 г. в 3:06, Dmitriy Setrakyan <dsetrak...@apache.org>:
> >
> > > Thanks, Alex.
> > >
> > > GridGain automatically compresses all the internal types. Somehow it
> > looks
> > > like the GridLongList may have been mixed. Can you please file a ticket
> > for
> > > 2.5 release?
> > >
> > > D.
> > >
> > > On Mon, Mar 26, 2018 at 4:55 AM, Александр Меньшиков <
> > sharple...@gmail.com
> > > >
> > > wrote:
> > >
> > > > I investigated network loading and found that a big part of internal
> > data
> > > > inside messages is `GridLongList`.
> > > > It is a part of `GridDhtTxFinishRequest`,
> > > > `GridDhtAtomicDeferredUpdateResponse`, `GridDhtAtomicUpdateRequest`,
> > > > `GridNearAtomicFullUpdateRequest` and `NearCacheUpdates`.
> > > >
> > > > So I think it has the sense to optimize `GridLongList` serialization.
> > > >
> > > >
> > > > Here we serialize all elements and don't take into account `idx`
> value:
> > > >
> > > > ```
> > > >
> > > > @Override public boolean writeTo(ByteBuffer buf, MessageWriter
> writer)
> > {
> > > >
> > > > writer.setBuffer(buf);
> > > >
> > > >
> > > >
> > > > if (!writer.isHeaderWritten()) {
> > > >
> > > > if (!writer.writeHeader(directType(), fieldsCount()))
> > > >
> > > > return false;
> > > >
> > > >
> > > >
> > > > writer.onHeaderWritten();
> > > >
> > > > }
> > > >
> > > >
> > > >
> > > > switch (writer.state()) {
> > > >
> > > > case 0:
> > > >
> > > > if (!writer.writeLongArray("arr", arr))
> > > >
> > > > return false;
> > > >
> > > >
> > > >
> > > > writer.incrementState();
> > > >
> > > >
> > > >
> > > > case 1:
> > > >
> > > > if (!writer.writeInt("idx", idx))
> > > >
> > > > return false;
> > > >
> > > >
> > > >
> > > > writer.incrementState();
> > > >
> > > >
> > > >
> > > > }
> > > >
> > > >
> > > >
> > > > return true;
> > > >
> > > > }
> > > >
> > > > ```
> > > >
> > > >
> > > >
> > > > Which is not happening in another serialization method in the same
> > class:
> > > >
> > > >
> > > >
> > > > ```
> > > >
> > > > public static void writeTo(DataOutput out, @Nullable GridLongList
> list)
> > > > throws IOException {
> > > >
> > > > out.writeInt(list != null ? list.idx : -1);
> > > >
> > > >
> > > >
> > > > if (list != null) {
> > > >
> > > > for (int i = 0; i < list.idx; i++)
> > > >
> > > > out.writeLong(list.arr[i]);
> > > >
> > > > }
> > > >
> > > > }
> > > >
> > > > ```
> > > >
> > > >
> > > > So, we can simply reduce messages size by sending only a valuable
> part
> > of
> > > > the array.
> > > > If you don't mind I will create an issue in Jira for this.
> > > >
> > > >
> > > > By the way, `long` is a huge type. As I see in most cases
> > `GridLongList`
> > > > uses for counters.
> > > > And I have checked the possibility of compress `long` into smaller
> > types
> > > as
> > > > `int`, `short` or `byte` in test
> > > > `GridCacheInterceptorAtomicRebalanceTest` (took it by random).
> > > > And found out that all `long` in`GridLongList` can be cast to `int`
> and
> > > 70%
> > > > of them to shorts.
> > > > Such conversion is quite fast about 1.1 (ns) per element (I have
> > checked
> > > it
> > > > by JMH test).
> > > >
> > > >
> > > >
> > > > Of course, there are a lot of ways to compress data,
> > > > but I know proprietary GridGain plug-in has different `MessageWriter`
> > > > implementation.
> > > > So maybe it is unnecessary and some compression already exists in
> this
> > > > proprietary plug-in.
> > > > Does someone know something about it?
> > > >
> > >
> >
>


Optimize GridLongList serialization

2018-03-26 Thread Александр Меньшиков
I investigated network loading and found that a big part of internal data
inside messages is `GridLongList`.
It is a part of `GridDhtTxFinishRequest`,
`GridDhtAtomicDeferredUpdateResponse`, `GridDhtAtomicUpdateRequest`,
`GridNearAtomicFullUpdateRequest` and `NearCacheUpdates`.

So I think it has the sense to optimize `GridLongList` serialization.


Here we serialize all elements and don't take into account `idx` value:

```

@Override public boolean writeTo(ByteBuffer buf, MessageWriter writer) {

writer.setBuffer(buf);



if (!writer.isHeaderWritten()) {

if (!writer.writeHeader(directType(), fieldsCount()))

return false;



writer.onHeaderWritten();

}



switch (writer.state()) {

case 0:

if (!writer.writeLongArray("arr", arr))

return false;



writer.incrementState();



case 1:

if (!writer.writeInt("idx", idx))

return false;



writer.incrementState();



}



return true;

}

```



Which is not happening in another serialization method in the same class:



```

public static void writeTo(DataOutput out, @Nullable GridLongList list)
throws IOException {

out.writeInt(list != null ? list.idx : -1);



if (list != null) {

for (int i = 0; i < list.idx; i++)

out.writeLong(list.arr[i]);

}

}

```


So, we can simply reduce messages size by sending only a valuable part of
the array.
If you don't mind I will create an issue in Jira for this.


By the way, `long` is a huge type. As I see in most cases `GridLongList`
uses for counters.
And I have checked the possibility of compress `long` into smaller types as
`int`, `short` or `byte` in test
`GridCacheInterceptorAtomicRebalanceTest` (took it by random).
And found out that all `long` in`GridLongList` can be cast to `int` and 70%
of them to shorts.
Such conversion is quite fast about 1.1 (ns) per element (I have checked it
by JMH test).



Of course, there are a lot of ways to compress data,
but I know proprietary GridGain plug-in has different `MessageWriter`
implementation.
So maybe it is unnecessary and some compression already exists in this
proprietary plug-in.
Does someone know something about it?


Ignite-7640 Refactor DiscoveryDataClusterState to be immutable (Done)

2018-02-26 Thread Александр Меньшиков
Hi to all.

I have done issue ignite-7640. Please review.


JIRA: https://issues.apache.org/jira/browse/IGNITE-7640
PR: https://github.com/apache/ignite/pull/3515
TC:
https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8_IgniteTests24Java8=pull%2F3515%2Fhead
 CR: https://reviews.ignite.apache.org/ignite/review/IGNT-CR-492


Re: Wrapping [Ignite]Interrupted[Checked]Exception in benign exceptions

2018-02-15 Thread Александр Меньшиков
Ilya, I agree with you about reducing the number of unchecked exceptions in
public API. Because when you work with grid it can throw about 4 types of
different runtime exception. And there is no way except experiments to know
about this types.

2018-02-15 15:57 GMT+03:00 Dmitriy Setrakyan :

> Ilya, i have looked at the ticket. I am not sure I understand what you are
> suggesting. Can you provide a "before" and "after" example?
>
> D.
>
> On Thu, Feb 15, 2018 at 4:40 AM, Ilya Kasnacheev <
> ilya.kasnach...@gmail.com>
> wrote:
>
> > Hello Igniters.
> >
> > I have stumbled on the problem for which I have created a ticket
> > https://issues.apache.org/jira/browse/IGNITE-7719
> >
> > Basically it is an awful code smell. On thread interrupt, we wrap
> > InterruptedException with some unrelated exception type, which prevents
> it
> > from being handled properly by client code. Especially bad since we use
> > thread interruption for client code workflow, e.g. in service grid.
> >
> > Hope to hear your opinions on this,
> >
> > --
> > Ilya Kasnacheev
> >
>


Looks like a bug in ServerImpl.joinTopology()

2018-02-13 Thread Александр Меньшиков
Hello.

I saw such code in `ServerImpl.joinTopology()`


locNode.order(1);

locNode.internalOrder(1);

spi.gridStartTime = U.currentTimeMillis();

locNode.visible(true);

ring.clear();

ring.topologyVersion(1);



And it looks like a bug because the `locNode` is contained inside the
`ring` (`TcpDiscoveryNodesRing.locNode` which also be inside a `
TcpDiscoveryNodesRing.nodes` collection) and every operation with the `
TcpDiscoveryNodesRing.nodes` is executed under a read-write lock. And not
without a reason. `locNode.order` used inside the `
TcpDiscoveryNodesRing.nodes` for sorting (it's TreeSet) and such violation
of thread safety can destroy collection navigation.

The `TcpDiscoveryNode.internalOrder` is volatile and `ring.clear()` line
resets the`TcpDiscoveryNodesRing.nodes` collection, so that issue is
hidden. But if another thread would execute finding operation on the
collection after `locNode.internalOrder(1)`, but before `ring.clear()` the
issue will appear.

But it's hard to create fair reproducer for this situation.

Am I right about that and should create an issue in Jira or I just miss
something?


Re: IGNITE-4908 is ready for review (Ignite.reentrantLock looks much slower than IgniteCache.lock)

2018-02-12 Thread Александр Меньшиков
Hi, guys! I'm waiting (more than 2 months) for CR ignite-4908 (
https://issues.apache.org/jira/browse/IGNITE-4908 all links inside). It's
about Ignite.reentrantlock.

2017-11-29 12:35 GMT+03:00 Александр Меньшиков <sharple...@gmail.com>:

> Igniters,
>
> I’ve implemented new fast reentrant lock within an issue[1].
>
> Could someone review prepared PR [2, 3, 4]?
>
> Some details are described below.
>
> The main idea:
>
> Current locks implementation is based on shared states in a cache, while
> the new lock uses IgniteCache#invoke* methods to update shared state
> atomically. New lock implementation doesn’t use a continuous query, so the
> cache became atomic now.
>
> New lock implementation has two types: fair and unfair, split into
> different classes for performance reasons.
>
> Some benchmarks results (hardware: Core i5 (2 gen) + 6GB RAM):
>
> Speed up: single thread + fair:   21.9x (1 node), 3.4x (2 nodes), 9.9x (5
> nodes), 17.9x (10 nodes)
>
> Speed up: single thread + unfair: 22.4x (1 node), 3.2x (2 nodes), 8.0x (5
> nodes), 19.0x (10 nodes)
>
> Speed up: multi-threads + fair:   3.9x (1 n,2 t), 3.5x (1 n,10t), 13.5x (5
> n,2t), 15.0x (5n, 10t)
>
> Speed up: multi-threads + unfair: 33.5x (1 n,2t), 210x (1 n,10t), 318x  (5
> n,2t), 389x  (5n, 10t)
>
> Benchmarks’ summary:
>
> 1) The unfair lock has a local reentrant lock which is used for local
> synchronization as a guard before the shared state. This allows reaching
> performance close to a local reentrant lock.
>
> 2) One server node can be a primary one for the shared state, this gives
> us a performance boost on one node only.
>
> 3) Speedup grows with a number of nodes.
>
>
> [1] JIRA: https://issues.apache.org/jira/browse/IGNITE-4908
>
> [2] PR: https://github.com/apache/ignite/pull/2360
>
> [3] Upsource review: https://reviews.ignite.apache.
> org/ignite/review/IGNT-CR-248
>
> [4] Team City: https://ci.ignite.apache.org/project.html?projectId=
> Ignite20Tests_Ignite20Tests=pull%2F2360%2Fhead
>


IGNITE-4908 is ready for review (Ignite.reentrantLock looks much slower than IgniteCache.lock)

2017-11-29 Thread Александр Меньшиков
Igniters,

I’ve implemented new fast reentrant lock within an issue[1].

Could someone review prepared PR [2, 3, 4]?

Some details are described below.

The main idea:

Current locks implementation is based on shared states in a cache, while
the new lock uses IgniteCache#invoke* methods to update shared state
atomically. New lock implementation doesn’t use a continuous query, so the
cache became atomic now.

New lock implementation has two types: fair and unfair, split into
different classes for performance reasons.

Some benchmarks results (hardware: Core i5 (2 gen) + 6GB RAM):

Speed up: single thread + fair:   21.9x (1 node), 3.4x (2 nodes), 9.9x (5
nodes), 17.9x (10 nodes)

Speed up: single thread + unfair: 22.4x (1 node), 3.2x (2 nodes), 8.0x (5
nodes), 19.0x (10 nodes)

Speed up: multi-threads + fair:   3.9x (1 n,2 t), 3.5x (1 n,10t), 13.5x (5
n,2t), 15.0x (5n, 10t)

Speed up: multi-threads + unfair: 33.5x (1 n,2t), 210x (1 n,10t), 318x  (5
n,2t), 389x  (5n, 10t)

Benchmarks’ summary:

1) The unfair lock has a local reentrant lock which is used for local
synchronization as a guard before the shared state. This allows reaching
performance close to a local reentrant lock.

2) One server node can be a primary one for the shared state, this gives us
a performance boost on one node only.

3) Speedup grows with a number of nodes.


[1] JIRA: https://issues.apache.org/jira/browse/IGNITE-4908

[2] PR: https://github.com/apache/ignite/pull/2360

[3] Upsource review:
https://reviews.ignite.apache.org/ignite/review/IGNT-CR-248

[4] Team City:
https://ci.ignite.apache.org/project.html?projectId=Ignite20Tests_Ignite20Tests=pull%2F2360%2Fhead


IGNITE-4908: Ignite.reentrantLock looks much slower than IgniteCache.lock

2017-10-03 Thread Александр Меньшиков
Hi!

I need advice about the task:
https://issues.apache.org/jira/browse/IGNITE-4908

Better to read about the problem in Jira because it needs a context (2 last
messages).

The question in the last message. Please take a look.


IGNITE-5994: IgniteInternalCache.invokeAsync().get() can return null

2017-08-08 Thread Александр Меньшиков
Hello, I found one small bug.

JIRA: https://issues.apache.org/jira/browse/IGNITE-5994

The problem is IgniteInternalCache.invokeAsync().get() will return null if
EntryProcessor return null. But the IgniteInternalCache.invoke() will
return EntryProcessorResult in the same situation (with null inside). It
can be optimization, and I guess we have to change invoke()'s result in
that case. Or it's a bug, and we have to change the result of invokeAsync().

If EntryProcessor returns not null everything is okay.

It doesn't relate with IgniteCache.invoke*.


Brand new JMH and Yardstick benchmarks

2017-07-20 Thread Александр Меньшиков
Hi, guys. I have written benchmarks for comparing IgniteCache.lock() and
Ignite.reentrantLock() and they merged to master branch.

If someone new wants to make benchmarks look at the implementation. Don't
forget to write both JMH and Yardstick benchmarks.

In process of writing, I have found out that locks in Ignite have not any
optimization for the situation when locks are unnecessary. So you don't
need adding any operation between lock/unlock operations. Which is not true
for standard Java locks classes because JIT can remove locks with an empty
body. For check such situation you need to make three benchmarks: the first
only with locks but without the body, second with read/write operations but
without locks, and the other with locks and operations. And make sure
result is consistent. It's okay if the third benchmark works faster than
the first and the second in sum.

Classes:

1. JMH:
1)
org.apache.ignite.internal.benchmarks.jmh.cache.JmhCacheLocksBenchmark
2. Yardstick:
1) org.apache.ignite.yardstick.cache.IgniteLockBenchmark
2) org.apache.ignite.yardstick.cache.IgniteCacheLockBenchmark


Re: IGNITE-4365: Data grid in deadlock on stop by DataStreamerImpl

2017-06-29 Thread Александр Меньшиков
I don't have it. I got all information from thread dump which you added to
the ticket: one thread stuck in the DataStreamerImpl#doFlush() (which was
called by GridDistributedCacheAdapter.GlobalRemoveAllJob#localExecute()),
and the other in the GridCacheGateway#onStopped() (which was called by
GridCacheProcessor#onExchangeDone()).

I read about a problem with reproducing (Alexey Kuznetsov's first comment
in JIRA) and made the decision to look at the different view.
Code still looks dangerous, so I don't think the problem has resolved
itself.

In thread dump there are 2 tests:
1) GridCacheNearTxForceKeyTest
2) CrossCacheTxRandomOperationsTest

They all passed in a single running.

2017-06-29 15:31 GMT+03:00 Yakov Zhdanov <yzhda...@apache.org>:

> Alex, can you please share a test that demonstrates the hang?
>
> --Yakov
>
> 2017-06-29 14:27 GMT+03:00 Александр Меньшиков <sharple...@gmail.com>:
>
>> Hello,
>>
>> I want to make ticket IGNITE-4365
>> <https://issues.apache.org/jira/browse/IGNITE-4365>. The problem came
>> from DataStreamerImpl.
>> There are methods which use DataStreamerImpl under the lock
>> (GridCacheGateway), but the method DataStreamerImpl#doFlush() has a
>> "while(true)" loop. And in case when someone is calling the
>> GridCacheGateway#onStopped(), application can get stuck in the loop in
>> DataStreamerImpl#doFlush(), and in trying get a lock in
>> GridCacheGateway#onStopped().
>>
>> So I need an expert opinion about DataStreamerImpl#doFlush().
>> 1) Can I just drop unfinished futures in DataStreamerImpl#doFlush() when
>> someone is calling GridCacheGateway#onStopped()? I can track it by adding a
>> volatile boolean flag in the GridCacheGateway.
>> 2) Or better to modify a futures execution DataStreamerImpl#load0() to
>> use onDone with an exception or something like that?
>>
>> Methods which use or might use DataStreamerImpl under the lock:
>>
>> 1) GridCacheAdapter#localLoad()
>> 2) GridCacheAdapter#localLoadAndUpdate()
>> 3) GridCacheAdapter#localLoadCache()
>> 4) GridDistributedCacheAdapter.GlobalRemoveAllJob#localExecute() (it
>> exectly happen in thread dump in ticket)
>>
>
>


IGNITE-4365: Data grid in deadlock on stop by DataStreamerImpl

2017-06-29 Thread Александр Меньшиков
Hello,

I want to make ticket IGNITE-4365
. The problem came from
DataStreamerImpl.
There are methods which use DataStreamerImpl under the lock
(GridCacheGateway), but the method DataStreamerImpl#doFlush() has a
"while(true)" loop. And in case when someone is calling the
GridCacheGateway#onStopped(), application can get stuck in the loop in
DataStreamerImpl#doFlush(), and in trying get a lock in
GridCacheGateway#onStopped().

So I need an expert opinion about DataStreamerImpl#doFlush().
1) Can I just drop unfinished futures in DataStreamerImpl#doFlush() when
someone is calling GridCacheGateway#onStopped()? I can track it by adding a
volatile boolean flag in the GridCacheGateway.
2) Or better to modify a futures execution DataStreamerImpl#load0() to use
onDone with an exception or something like that?

Methods which use or might use DataStreamerImpl under the lock:

1) GridCacheAdapter#localLoad()
2) GridCacheAdapter#localLoadAndUpdate()
3) GridCacheAdapter#localLoadCache()
4) GridDistributedCacheAdapter.GlobalRemoveAllJob#localExecute() (it
exectly happen in thread dump in ticket)


IGNITE-4908

2017-06-20 Thread Александр Меньшиков
Hello, Andrew!

What do you think about create sub task for IGNITE-4908 about creating
benchmark which highlights performance problem?
I think it helps us to agree about use-case which we want to test.

Link on Jira: https://issues.apache.org/jira/browse/IGNITE-4908


Questions about IGNITE-4684

2017-06-20 Thread Александр Меньшиков
Hello, Igniters!

Can someone clarify the crux of ignite-4684 to me? I asked some questions
inside Jira's discussion. Link:
https://issues.apache.org/jira/browse/IGNITE-4684

In short:
1) What implementation of the IgniteCache we are interested in?
2) What locks we are interested in?


Re: IGNITE-1948 ClusterTopologyCheckedException can return null for retryReadyFuture()

2017-03-31 Thread Александр Меньшиков
Alexey, I just remind you about issue.

2017-02-20 16:42 GMT+03:00 Александр Меньшиков <sharple...@gmail.com>:

> Alexey,
>
> I'm ready to make some conclusions. You can see result immediately here:
> PR:  https://github.com/apache/ignite/pull/1537/files
> CI Tests:  http://ci.ignite.apache.org/project.html?projectId=
> IgniteTests=projectOverview_IgniteTests=pull/1537/head
>
> I fixed only "ClusterTopologyCheckedException", and didn't touched
> "ClusterTopologyException". "ClusterTopologyException" has a same problem
> like "ClusterTopologyCheckedException", but even now changes is huge (80
> files). So if you think fixing of "ClusterTopologyException" is necessary,
> you could add different issue in JIRA (or I can do that). And I will fix it
> in future. I can't implement your idea about using Runtime exception,
> because right now a lot of logic tied to the fact that this is
> "IgniteCheckedException". I real tried to mix it but I couldn't make it
> work.
>
> So before my changes we have 3 affected classes:
> 1. "ClusterTopologyCheckedException".
> 2. "ClusterTopologyServerNotFoundException".
> 3. "ClusterGroupEmptyCheckedException".
>
> Now we there are 5 affected classes:
>
> "ClusterTopologyCheckedException" split into 2 classes:
> 1. "ClusterTopologyCheckedException" (with future).
> 2. "ClusterTopologyLocalException" (without future, and it's parent for "
> ClusterTopologyCheckedException").
>
> "ClusterTopologyServerNotFoundException" rename to
> 3. "ClusterTopologyServerNotFoundLocalException" (For consistency of
> names, I didn't find any cases where "ClusterTopologyServerNotFoundException"
> need his future).
>
> "ClusterGroupEmptyCheckedException" split into 2 classes:
> 4. "ClusterGroupEmptyCheckedException".
> 5. "ClusterGroupEmptyLocalException".
>
> Also in code "ready future" is using for waiting, and don't using for get
> real value (just look at code, all values just ignored). In fact "ready
> future" has type "IgniteFuture". It suggests that result doesn't matter.
>
> 2017-02-10 11:06 GMT+03:00 Alexey Goncharuk <alexey.goncha...@gmail.com>:
>
>> Alexander,
>>
>> I do not see why you would need to overwrite the cause field.
>>
>> What I meant in the previous email is that instead of setting a retry
>> ready future in the checked exception, you would set a correct affinity
>> topology version here. Then, during a construction of CacheException
>> (unchecked, which is guaranteed to be thrown locally and will not be
>> serialized), you would create the retry ready future based on the topology
>> version set.
>>
>> Hope this helps,
>> AG
>>
>> 2017-02-07 17:18 GMT+03:00 Александр Меньшиков <sharple...@gmail.com>:
>>
>>> Alexey, I ran into some difficulties.
>>>
>>> Look at the method: GridNearTxFinishFuture.CheckBa
>>> ckupMiniFuture#onDhtFinishResponse(GridDhtTxFinishResponse res)
>>>
>>> *Throwable err* = res.checkCommittedError();
>>>
>>> if (*err* != null) {
>>> if (*err* instanceof IgniteCheckedException) {
>>> *ClusterTopologyCheckedException cause* =
>>> ((IgniteCheckedException)*err*).
>>> *getCause(ClusterTopologyCheckedException.class)*;
>>>
>>> if (*cause* != null)
>>> *cause.retryReadyFuture(*cctx.ne
>>> xtAffinityReadyFuture(tx.topologyVersion()));
>>> *   //^_Here update the readyFut in
>>> some first "cause". *
>>> }
>>>
>>> onDone(*err*);
>>> }
>>>
>>>
>>> So I can't rewrite "cause" field in exception without reflection. It
>>> means if I try to use two exception: one with "readyFut" and second without
>>> "readyFut", I see no way to do it in this place.
>>>
>>> Perhaps I misunderstood your original idea. Or maybe this code some kind
>>> of a crutch and should be removed. What do you think?
>>>
>>> 2017-01-30 16:54 GMT+03:00 Alexey Goncharuk <alexey.goncha...@gmail.com>
>>> :
>>>
>>>> In the example above there is no point of setting the retry future in
>>>> the
>>>> exception because it will be serialize

Re: Real size of a stored data in IgniteCache

2017-03-30 Thread Александр Меньшиков
Vyacheslav,
Objects have different size in different runtime (x86 or Power, compress
pointer on or off and so on). So only one possible way to see real size is
do it in runtime. You can see how many bytes was allocated in one thread by
using

import *com.sun.management*.ThreadMXBean;

ThreadMXBean bean = (ThreadMXBean) ManagementFactory.getThreadMXBean();
//it return java.*lang*.management.ThreadMXBean
System.out.println(bean.
*getThreadAllocatedBytes(Thread.currentThread().getId())*);

Or you can use JMH with args
*-prof gc.*
But please note it shows all the allocated bytes including temp objects,
not only the remaining.

If you know exactly which object contains your data, you can use *Java
Object Layout *( http://openjdk.java.net/projects/code-tools/jol/ ) which
show you everything even alignment. In this video (in Russian, but i know
it's not problem for you) you can see how to use it:
https://www.youtube.com/watch?v=r_bnfv-nlcs




2017-03-30 19:45 GMT+03:00 Dmitriy Setrakyan :

> On Thu, Mar 30, 2017 at 9:30 AM, Vyacheslav Daradur 
> wrote:
>
> > Dmitry, as I understand Ignite doesn't provide a possibility to show real
> > total size of objects in a IgniteCache.
> >
>
> Currently no, but the 2.0 release, which is fully off-heap, will provide a
> precise cache size.
>


Re: Renaming TcpDiscoverySpi.heartbeatsFrequency to TcpDiscoverySpi.metricsUpdateFrequency

2017-03-30 Thread Александр Меньшиков
Yakov, can I take this ticket or you have already started to implement it?
Ask because you are an assignee in JIRA.

2017-03-07 22:21 GMT+03:00 Denis Magda :

> Good, the ticket is ready:
> https://issues.apache.org/jira/browse/IGNITE-4799 <
> https://issues.apache.org/jira/browse/IGNITE-4799>
>
> —
> Denis
>
> > On Mar 7, 2017, at 1:47 AM, Yakov Zhdanov  wrote:
> >
> >>> What’s about compute engine load balancers then? They rely on the
> > metrics received from remote nodes.
> >
> > First of all, that was poor design decision - to make different
> components
> > dependant, possibly, through user-defined implementation of Discovery
> Spi.
> >
> > What you say makes sense to me. Let's remove HB frequency from Discovery
> > SPI API and adjust it accordingly to
> > IgniteConfiguraion.getMetricsUpdateFrequency.
> > You should also consider completely removing heartbeats from failure
> > detection process - it should rely only on internal ping packets.
> >
> > --Yakov
>
>


Re: Unused argument in DirectMessageWriter

2017-03-17 Thread Александр Меньшиков
Valentin, thank you.

Aleksey, yeah :-)

2017-03-16 15:51 GMT+03:00 ALEKSEY KUZNETSOV <alkuznetsov...@gmail.com>:

> watch him, he is gonna remove all the messages he doesn't like :)
>
> чт, 16 мар. 2017 г. в 14:56, Valentin Kulichenko <
> valentin.kuliche...@gmail.com>:
>
> > Hi,
> >
> > These classes are part of plugin framework. Even though 'name' argument
> is
> > not used by Ignite implementations, it can be used by others. Therefore
> it
> > should not be removed.
> >
> > -Val
> >
> > On Thu, Mar 16, 2017 at 10:36 AM, Александр Меньшиков <
> > sharple...@gmail.com>
> > wrote:
> >
> > > Hello, everyone!
> > >
> > > I found some strange thing. In 'MessageWriter' there are a lot of
> methods
> > > like 'write*(String name, * val)'. But in implementation
> > > 'DirectMessageWriter' argument 'name' isn't used in these methods. And
> > > there aren't other 'MessageWriter' implementations.
> > >
> > > Maybe we can remove 'name' from interface and implementation?
> > >
> >
> --
>
> *Best Regards,*
>
> *Kuznetsov Aleksey*
>


Unused argument in DirectMessageWriter

2017-03-16 Thread Александр Меньшиков
Hello, everyone!

I found some strange thing. In 'MessageWriter' there are a lot of methods
like 'write*(String name, * val)'. But in implementation
'DirectMessageWriter' argument 'name' isn't used in these methods. And
there aren't other 'MessageWriter' implementations.

Maybe we can remove 'name' from interface and implementation?


Re: Suggestion for remove ComputeJobContinuationAdapter

2017-03-14 Thread Александр Меньшиков
Okay, thank you.

2017-03-13 17:50 GMT+03:00 Valentin Kulichenko <
valentin.kuliche...@gmail.com>:

> This class can't be removed as it's part of public API. It's a convenience
> adapter that users can extend when using continuations.
>
> -Val
>
> On Mon, Mar 13, 2017 at 3:28 PM, Александр Меньшиков <sharple...@gmail.com
> >
> wrote:
>
> > The abstract class "ComputeJobContinuationAdapter" is unused. There
> aren't
> > any subclass. Looks we can remove it now.
> >
>


Suggestion for remove ComputeJobContinuationAdapter

2017-03-13 Thread Александр Меньшиков
The abstract class "ComputeJobContinuationAdapter" is unused. There aren't
any subclass. Looks we can remove it now.


Re: IGNITE-4713 : refactoring of GridFinishedFuture and GridFutureAdapter

2017-03-10 Thread Александр Меньшиков
I'm sorry I didn't notice pair "acquireShared" and "releaseShared". Okay,
well, there are no races. But we still can use "getState()" in "assert"
instead "resFlag", that will allow us use boolean type. But if you think it
will not make code cleaner, then I will close issue.

2017-03-01 14:55 GMT+03:00 Александр Меньшиков <sharple...@gmail.com>:

> First of all I want to replace the "byte resFlag" on the "bool haveResult"
> only for aesthetic reasons. I think that use a byte as a boolean flag looks
> like C in 1983. And constructions like
>
> return haveResult ? (R)res : null;
>
> more readable unlike
>
> return resFlag == RES ? (R)res : null;
>
> because you don't know how many values the "resFlag" can have.
> But it's not big deal, so if community disagrees it's okay.
>
> About volatile:
>
> Class "GridFutureAdapter" uses zero value of the "resFlag" in assert for
> check that didn't call the "get0" before the "onDone".
> So I suggested to use the "getState()" for this check. But of course it's
> work only if "get0" and "onDone" are called in a same thread.
> But if threads are different (as you said), we have bug any way because
> the "resFlag" is not volatile.
> In the GridFutureAdapter we haven't any sync between write operations to
> the "resFlag" in the "onDone" and the read operation in the "get0". So
> the "get0" in the line "assert resFlag != 0" can throw AssertionError,
> because a thread with the "get0" may not see the write operation in
> "onDone" in other thread.
>
> So I think in this case we need to add volatile in the
> GridFutureAdapter.resFlag, and replace the GridFinishedFuture.resFlag on the
> GridFinishedFuture.haveResult (or not if you don't like it).
>
>
> 2017-02-28 19:39 GMT+03:00 Yakov Zhdanov <yzhda...@apache.org>:
>
>> Alexander,
>>
>> Can you please clarify this?
>>
>> >I suppose "volatile" is not need here, because right now if
>> GridFutureAdapter#onDone()
>> called in another thread than GridFutureAdapter#get0, it will produce
>> AssertionError.
>>
>> 1) 99.999% of time GridFutureAdapter#onDone() is called in some other
>> thread than the one which is frozen on get()
>> 2) What is volatile now which should not be?
>>
>> Regarding the flag - boolean and byte takes 1 byte in the object. I
>> suppose
>> some time ago there were 3 options, but now there are only 2. I don't
>> think
>> we need to change it now.
>>
>>
>>
>> --Yakov
>>
>> 2017-02-28 17:32 GMT+03:00 Александр Меньшиков <sharple...@gmail.com>:
>>
>> > Sorry, I mean I want to do fast work on this issue (not make code
>> faster).
>> > Your code is strange. You can see my view in my local temp PR:
>> >
>> > https://github.com/SharplEr/ignite/pull/4/files
>> >
>> > This is what I'm meaning.
>> >
>> > I suppose "volatile" is not need here, because right now if
>> > GridFutureAdapter#onDone() called in another thread than
>> > GridFutureAdapter#get0, it will produce AssertionError.
>> >
>> >
>> >
>> > 2017-02-28 16:57 GMT+03:00 Vladislav Pyatkov <vldpyat...@gmail.com>:
>> >
>> > > Alexander,
>> > >
>> > > Are you sure, which it will be faster?
>> > > The condition
>> > >
>> > > resFlag == RES and resFlag == ERR
>> > >
>> > > should be more complicated... like something this:
>> > >
>> > > getState() != 0 &&  !resFlag and getState() != 0 &&  resFlag
>> > >
>> > > But unlike getState(), restFag is not volatile...
>> > >
>> > > On Tue, Feb 28, 2017 at 4:26 PM, Александр Меньшиков <
>> > sharple...@gmail.com
>> > > > wrote:
>> > >
>> > >> Alexey, Vladislav, are you agree with me, or not? I want to do it
>> faster
>> > >> and move on.
>> > >>
>> > >> 2017-02-17 17:36 GMT+03:00 Александр Меньшиков <sharple...@gmail.com
>> >:
>> > >>
>> > >>> We can check "onDone" method was called with using getState()
>> method.
>> > If
>> > >>> getState()!=0 then  resFlag!=0. Just look at code.
>> > >>>
>> > >>>
>> > >>> 2017-02-17 17:12 GMT+03:00 А

Re: IGNITE-603

2017-03-02 Thread Александр Меньшиков
Yakov, Okay. You can see result in JIRA
https://issues.apache.org/jira/browse/IGNITE-603

2017-02-28 19:31 GMT+03:00 Yakov Zhdanov <yzhda...@apache.org>:

> Denis, I can't say for sure, but the fact test was commented out for a long
> time is suspicious.
>
> Alexander, can you please uncomment, fix all the compilation errors and
> update the ticket with test run results. Make sure you have
> modules/ext-data projects built and all paths mentioned
> in modules/core/src/test/config/tests.properties are valid (you will need
> to set IGNITE_HOME env var most probably)
>
> --Yakov
>
> 2017-02-13 23:10 GMT+03:00 Denis Magda <dma...@apache.org>:
>
> > Hi Alexander,
> >
> > Looks like the test is commented for a while. GridDeploymentMode has
> > already been renamed to DeploymentMode.
> >
> > Yakov, is this test still relevant for the compute grid?
> >
> > —
> > Denis
> >
> > > On Feb 13, 2017, at 5:45 AM, Александр Меньшиков <sharple...@gmail.com
> >
> > wrote:
> > >
> > > Hello, everyone.
> > >
> > > I think the commented-out method "processSize2Test" can be removed.
> There
> > > aren't class "GridDeploymentMode" which used in arguments.
> > >
> > > Do you agree with me?
> >
> >
>


Re: IGNITE-4713 : refactoring of GridFinishedFuture and GridFutureAdapter

2017-03-01 Thread Александр Меньшиков
First of all I want to replace the "byte resFlag" on the "bool haveResult"
only for aesthetic reasons. I think that use a byte as a boolean flag looks
like C in 1983. And constructions like

return haveResult ? (R)res : null;

more readable unlike

return resFlag == RES ? (R)res : null;

because you don't know how many values the "resFlag" can have.
But it's not big deal, so if community disagrees it's okay.

About volatile:

Class "GridFutureAdapter" uses zero value of the "resFlag" in assert for
check that didn't call the "get0" before the "onDone".
So I suggested to use the "getState()" for this check. But of course it's
work only if "get0" and "onDone" are called in a same thread.
But if threads are different (as you said), we have bug any way because the
"resFlag" is not volatile.
In the GridFutureAdapter we haven't any sync between write operations to
the "resFlag" in the "onDone" and the read operation in the "get0". So the
"get0" in the line "assert resFlag != 0" can throw AssertionError, because
a thread with the "get0" may not see the write operation in "onDone" in
other thread.

So I think in this case we need to add volatile in the
GridFutureAdapter.resFlag, and replace the GridFinishedFuture.resFlag on the
GridFinishedFuture.haveResult (or not if you don't like it).


2017-02-28 19:39 GMT+03:00 Yakov Zhdanov <yzhda...@apache.org>:

> Alexander,
>
> Can you please clarify this?
>
> >I suppose "volatile" is not need here, because right now if
> GridFutureAdapter#onDone()
> called in another thread than GridFutureAdapter#get0, it will produce
> AssertionError.
>
> 1) 99.999% of time GridFutureAdapter#onDone() is called in some other
> thread than the one which is frozen on get()
> 2) What is volatile now which should not be?
>
> Regarding the flag - boolean and byte takes 1 byte in the object. I suppose
> some time ago there were 3 options, but now there are only 2. I don't think
> we need to change it now.
>
>
>
> --Yakov
>
> 2017-02-28 17:32 GMT+03:00 Александр Меньшиков <sharple...@gmail.com>:
>
> > Sorry, I mean I want to do fast work on this issue (not make code
> faster).
> > Your code is strange. You can see my view in my local temp PR:
> >
> > https://github.com/SharplEr/ignite/pull/4/files
> >
> > This is what I'm meaning.
> >
> > I suppose "volatile" is not need here, because right now if
> > GridFutureAdapter#onDone() called in another thread than
> > GridFutureAdapter#get0, it will produce AssertionError.
> >
> >
> >
> > 2017-02-28 16:57 GMT+03:00 Vladislav Pyatkov <vldpyat...@gmail.com>:
> >
> > > Alexander,
> > >
> > > Are you sure, which it will be faster?
> > > The condition
> > >
> > > resFlag == RES and resFlag == ERR
> > >
> > > should be more complicated... like something this:
> > >
> > > getState() != 0 &&  !resFlag and getState() != 0 &&  resFlag
> > >
> > > But unlike getState(), restFag is not volatile...
> > >
> > > On Tue, Feb 28, 2017 at 4:26 PM, Александр Меньшиков <
> > sharple...@gmail.com
> > > > wrote:
> > >
> > >> Alexey, Vladislav, are you agree with me, or not? I want to do it
> faster
> > >> and move on.
> > >>
> > >> 2017-02-17 17:36 GMT+03:00 Александр Меньшиков <sharple...@gmail.com
> >:
> > >>
> > >>> We can check "onDone" method was called with using getState() method.
> > If
> > >>> getState()!=0 then  resFlag!=0. Just look at code.
> > >>>
> > >>>
> > >>> 2017-02-17 17:12 GMT+03:00 Александр Меньшиков <sharple...@gmail.com
> >:
> > >>>
> > >>>> Like I said, if "resFlag==0" (of course 0 came from initialization)
> > >>>> means "you still haven't called the method onDone()", better make it
> > clear.
> > >>>>
> > >>>>
> > >>>>
> > >>>> 2017-02-17 14:48 GMT+03:00 Vladislav Pyatkov <vldpyat...@gmail.com
> >:
> > >>>>
> > >>>>> Alexander,
> > >>>>>
> > >>>>> I think, the resFlag will be initiated as 0 (new
> > GridFutureAdapter()),
> > >>>>> but
> > >>>>> 1 and 2 states will be acquired on live.
> > >>>>>
> > >>>>>
> 

Re: IGNITE-4713 : refactoring of GridFinishedFuture and GridFutureAdapter

2017-02-28 Thread Александр Меньшиков
Sorry, I mean I want to do fast work on this issue (not make code faster).
Your code is strange. You can see my view in my local temp PR:

https://github.com/SharplEr/ignite/pull/4/files

This is what I'm meaning.

I suppose "volatile" is not need here, because right now if
GridFutureAdapter#onDone() called in another thread than
GridFutureAdapter#get0, it will produce AssertionError.



2017-02-28 16:57 GMT+03:00 Vladislav Pyatkov <vldpyat...@gmail.com>:

> Alexander,
>
> Are you sure, which it will be faster?
> The condition
>
> resFlag == RES and resFlag == ERR
>
> should be more complicated... like something this:
>
> getState() != 0 &&  !resFlag and getState() != 0 &&  resFlag
>
> But unlike getState(), restFag is not volatile...
>
> On Tue, Feb 28, 2017 at 4:26 PM, Александр Меньшиков <sharple...@gmail.com
> > wrote:
>
>> Alexey, Vladislav, are you agree with me, or not? I want to do it faster
>> and move on.
>>
>> 2017-02-17 17:36 GMT+03:00 Александр Меньшиков <sharple...@gmail.com>:
>>
>>> We can check "onDone" method was called with using getState() method. If
>>> getState()!=0 then  resFlag!=0. Just look at code.
>>>
>>>
>>> 2017-02-17 17:12 GMT+03:00 Александр Меньшиков <sharple...@gmail.com>:
>>>
>>>> Like I said, if "resFlag==0" (of course 0 came from initialization)
>>>> means "you still haven't called the method onDone()", better make it clear.
>>>>
>>>>
>>>>
>>>> 2017-02-17 14:48 GMT+03:00 Vladislav Pyatkov <vldpyat...@gmail.com>:
>>>>
>>>>> Alexander,
>>>>>
>>>>> I think, the resFlag will be initiated as 0 (new GridFutureAdapter()),
>>>>> but
>>>>> 1 and 2 states will be acquired on live.
>>>>>
>>>>>
>>>>> On Fri, Feb 17, 2017 at 1:56 PM, Александр Меньшиков <
>>>>> sharple...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> > Alexey,
>>>>> >
>>>>> > I see only one place where writes in resFlag:
>>>>> >
>>>>> > if (err != null) {
>>>>> > resFlag = ERR;
>>>>> > this.res = err;
>>>>> > }
>>>>> > else {
>>>>> > resFlag = RES;
>>>>> > this.res = res;
>>>>> > }
>>>>> >
>>>>> > And the comparison with only two values: "ERR" and "RES". Except
>>>>> "assert
>>>>> > resFlag != 0;". So if this "assert" protect from call "get0" before
>>>>> call
>>>>> > "onDone" I think will be clearer to set some ready flag or use
>>>>> "enum" type.
>>>>> > And throw IllegalStateException if condition is false, because right
>>>>> now
>>>>> > developer will not get clear error massage.
>>>>> >
>>>>> > 17 февр. 2017 г. 11:34 пользователь "Alexey Goncharuk" <
>>>>> > alexey.goncha...@gmail.com> написал:
>>>>> >
>>>>> > Alexander,
>>>>> >
>>>>> > This change is not applicable for GridFutureAdapter because resFlag
>>>>> can
>>>>> > have 3 values there.
>>>>> >
>>>>> > 2017-02-16 19:58 GMT+03:00 Александр Меньшиков <sharple...@gmail.com
>>>>> >:
>>>>> >
>>>>> > > Hello.
>>>>> > >
>>>>> > > I propose to do refactoring of classes "GridFinishedFuture" and
>>>>> > > "GridFutureAdapter". There is field "resFlag" which can equals
>>>>> "ERR = 1"
>>>>> > or
>>>>> > > "RES = 2". So I can replace it to one "bool haveResult" field.
>>>>> > >
>>>>> > > If there are no objections, I'm ready to proceed. If you find more
>>>>> such
>>>>> > > classes, please write about them.
>>>>> > >
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Vladislav Pyatkov
>>>>>
>>>>
>>>>
>>>
>>
>
>
> --
> Vladislav Pyatkov
>


Re: IGNITE-824: [Test] GridRandomSelfTest # testPerformance was disabled.

2017-02-28 Thread Александр Меньшиков
Yakov, please look at PR. It's simple removing.

2017-02-21 19:29 GMT+03:00 Александр Меньшиков <sharple...@gmail.com>:

> Done.
>
> PR inside:
> https://issues.apache.org/jira/browse/IGNITE-824
>
>
> 2017-02-21 13:04 GMT+03:00 Yakov Zhdanov <yzhda...@apache.org>:
>
>> Alexander, you can remove the test and resolve the issue.
>>
>> --Yakov
>>
>> 2017-02-20 17:43 GMT+03:00 Александр Меньшиков <sharple...@gmail.com>:
>>
>> > Do we still need this test? Can I remove it?
>> >
>>
>
>


Re: IGNITE-4713 : refactoring of GridFinishedFuture and GridFutureAdapter

2017-02-28 Thread Александр Меньшиков
Alexey, Vladislav, are you agree with me, or not? I want to do it faster
and move on.

2017-02-17 17:36 GMT+03:00 Александр Меньшиков <sharple...@gmail.com>:

> We can check "onDone" method was called with using getState() method. If
> getState()!=0 then  resFlag!=0. Just look at code.
>
>
> 2017-02-17 17:12 GMT+03:00 Александр Меньшиков <sharple...@gmail.com>:
>
>> Like I said, if "resFlag==0" (of course 0 came from initialization) means
>> "you still haven't called the method onDone()", better make it clear.
>>
>>
>>
>> 2017-02-17 14:48 GMT+03:00 Vladislav Pyatkov <vldpyat...@gmail.com>:
>>
>>> Alexander,
>>>
>>> I think, the resFlag will be initiated as 0 (new GridFutureAdapter()),
>>> but
>>> 1 and 2 states will be acquired on live.
>>>
>>>
>>> On Fri, Feb 17, 2017 at 1:56 PM, Александр Меньшиков <
>>> sharple...@gmail.com>
>>> wrote:
>>>
>>> > Alexey,
>>> >
>>> > I see only one place where writes in resFlag:
>>> >
>>> > if (err != null) {
>>> > resFlag = ERR;
>>> > this.res = err;
>>> > }
>>> > else {
>>> > resFlag = RES;
>>> > this.res = res;
>>> > }
>>> >
>>> > And the comparison with only two values: "ERR" and "RES". Except
>>> "assert
>>> > resFlag != 0;". So if this "assert" protect from call "get0" before
>>> call
>>> > "onDone" I think will be clearer to set some ready flag or use "enum"
>>> type.
>>> > And throw IllegalStateException if condition is false, because right
>>> now
>>> > developer will not get clear error massage.
>>> >
>>> > 17 февр. 2017 г. 11:34 пользователь "Alexey Goncharuk" <
>>> > alexey.goncha...@gmail.com> написал:
>>> >
>>> > Alexander,
>>> >
>>> > This change is not applicable for GridFutureAdapter because resFlag can
>>> > have 3 values there.
>>> >
>>> > 2017-02-16 19:58 GMT+03:00 Александр Меньшиков <sharple...@gmail.com>:
>>> >
>>> > > Hello.
>>> > >
>>> > > I propose to do refactoring of classes "GridFinishedFuture" and
>>> > > "GridFutureAdapter". There is field "resFlag" which can equals "ERR
>>> = 1"
>>> > or
>>> > > "RES = 2". So I can replace it to one "bool haveResult" field.
>>> > >
>>> > > If there are no objections, I'm ready to proceed. If you find more
>>> such
>>> > > classes, please write about them.
>>> > >
>>> >
>>>
>>>
>>>
>>> --
>>> Vladislav Pyatkov
>>>
>>
>>
>


Re: IGNITE-824: [Test] GridRandomSelfTest # testPerformance was disabled.

2017-02-21 Thread Александр Меньшиков
Done.

PR inside:
https://issues.apache.org/jira/browse/IGNITE-824


2017-02-21 13:04 GMT+03:00 Yakov Zhdanov <yzhda...@apache.org>:

> Alexander, you can remove the test and resolve the issue.
>
> --Yakov
>
> 2017-02-20 17:43 GMT+03:00 Александр Меньшиков <sharple...@gmail.com>:
>
> > Do we still need this test? Can I remove it?
> >
>


IGNITE-824: [Test] GridRandomSelfTest # testPerformance was disabled.

2017-02-20 Thread Александр Меньшиков
Do we still need this test? Can I remove it?


Re: IGNITE-1948 ClusterTopologyCheckedException can return null for retryReadyFuture()

2017-02-20 Thread Александр Меньшиков
Alexey,

I'm ready to make some conclusions. You can see result immediately here:
PR:  https://github.com/apache/ignite/pull/1537/files
CI Tests:
http://ci.ignite.apache.org/project.html?projectId=IgniteTests=projectOverview_IgniteTests=pull/1537/head

I fixed only "ClusterTopologyCheckedException", and didn't touched
"ClusterTopologyException". "ClusterTopologyException" has a same problem
like "ClusterTopologyCheckedException", but even now changes is huge (80
files). So if you think fixing of "ClusterTopologyException" is necessary,
you could add different issue in JIRA (or I can do that). And I will fix it
in future. I can't implement your idea about using Runtime exception,
because right now a lot of logic tied to the fact that this is
"IgniteCheckedException". I real tried to mix it but I couldn't make it
work.

So before my changes we have 3 affected classes:
1. "ClusterTopologyCheckedException".
2. "ClusterTopologyServerNotFoundException".
3. "ClusterGroupEmptyCheckedException".

Now we there are 5 affected classes:

"ClusterTopologyCheckedException" split into 2 classes:
1. "ClusterTopologyCheckedException" (with future).
2. "ClusterTopologyLocalException" (without future, and it's parent for
"ClusterTopologyCheckedException").

"ClusterTopologyServerNotFoundException" rename to
3. "ClusterTopologyServerNotFoundLocalException" (For consistency of names,
I didn't find any cases where "ClusterTopologyServerNotFoundException" need
his future).

"ClusterGroupEmptyCheckedException" split into 2 classes:
4. "ClusterGroupEmptyCheckedException".
5. "ClusterGroupEmptyLocalException".

Also in code "ready future" is using for waiting, and don't using for get
real value (just look at code, all values just ignored). In fact "ready
future" has type "IgniteFuture". It suggests that result doesn't matter.

2017-02-10 11:06 GMT+03:00 Alexey Goncharuk <alexey.goncha...@gmail.com>:

> Alexander,
>
> I do not see why you would need to overwrite the cause field.
>
> What I meant in the previous email is that instead of setting a retry
> ready future in the checked exception, you would set a correct affinity
> topology version here. Then, during a construction of CacheException
> (unchecked, which is guaranteed to be thrown locally and will not be
> serialized), you would create the retry ready future based on the topology
> version set.
>
> Hope this helps,
> AG
>
> 2017-02-07 17:18 GMT+03:00 Александр Меньшиков <sharple...@gmail.com>:
>
>> Alexey, I ran into some difficulties.
>>
>> Look at the method: GridNearTxFinishFuture.CheckBa
>> ckupMiniFuture#onDhtFinishResponse(GridDhtTxFinishResponse res)
>>
>> *Throwable err* = res.checkCommittedError();
>>
>> if (*err* != null) {
>> if (*err* instanceof IgniteCheckedException) {
>> *ClusterTopologyCheckedException cause* =
>> ((IgniteCheckedException)*err*).
>> *getCause(ClusterTopologyCheckedException.class)*;
>>
>> if (*cause* != null)
>> *cause.retryReadyFuture(*cctx.ne
>> xtAffinityReadyFuture(tx.topologyVersion()));
>> *   //^_Here update the readyFut in
>> some first "cause". *
>> }
>>
>> onDone(*err*);
>> }
>>
>>
>> So I can't rewrite "cause" field in exception without reflection. It
>> means if I try to use two exception: one with "readyFut" and second without
>> "readyFut", I see no way to do it in this place.
>>
>> Perhaps I misunderstood your original idea. Or maybe this code some kind
>> of a crutch and should be removed. What do you think?
>>
>> 2017-01-30 16:54 GMT+03:00 Alexey Goncharuk <alexey.goncha...@gmail.com>:
>>
>>> In the example above there is no point of setting the retry future in the
>>> exception because it will be serialized and sent to a remote node.
>>>
>>> I see the only possible way to ensure this: make this be verified at
>>> compile time. This looks like a big change, but the draft may look like
>>> so:
>>> 1) Introduce new exception type, like CacheTopology*Checked*Exception
>>> which
>>> *must* take the minimum topology version to wait for
>>> 2) In all cases when a cache operation fails because of topology change,
>>> provide the appropriate exception
>>> 3) When CacheTopologyException is

Re: IGNITE-4713 : refactoring of GridFinishedFuture and GridFutureAdapter

2017-02-17 Thread Александр Меньшиков
We can check "onDone" method was called with using getState() method. If
getState()!=0 then  resFlag!=0. Just look at code.

2017-02-17 17:12 GMT+03:00 Александр Меньшиков <sharple...@gmail.com>:

> Like I said, if "resFlag==0" (of course 0 came from initialization) means
> "you still haven't called the method onDone()", better make it clear.
>
>
>
> 2017-02-17 14:48 GMT+03:00 Vladislav Pyatkov <vldpyat...@gmail.com>:
>
>> Alexander,
>>
>> I think, the resFlag will be initiated as 0 (new GridFutureAdapter()), but
>> 1 and 2 states will be acquired on live.
>>
>>
>> On Fri, Feb 17, 2017 at 1:56 PM, Александр Меньшиков <
>> sharple...@gmail.com>
>> wrote:
>>
>> > Alexey,
>> >
>> > I see only one place where writes in resFlag:
>> >
>> > if (err != null) {
>> > resFlag = ERR;
>> > this.res = err;
>> > }
>> > else {
>> > resFlag = RES;
>> > this.res = res;
>> > }
>> >
>> > And the comparison with only two values: "ERR" and "RES". Except "assert
>> > resFlag != 0;". So if this "assert" protect from call "get0" before call
>> > "onDone" I think will be clearer to set some ready flag or use "enum"
>> type.
>> > And throw IllegalStateException if condition is false, because right now
>> > developer will not get clear error massage.
>> >
>> > 17 февр. 2017 г. 11:34 пользователь "Alexey Goncharuk" <
>> > alexey.goncha...@gmail.com> написал:
>> >
>> > Alexander,
>> >
>> > This change is not applicable for GridFutureAdapter because resFlag can
>> > have 3 values there.
>> >
>> > 2017-02-16 19:58 GMT+03:00 Александр Меньшиков <sharple...@gmail.com>:
>> >
>> > > Hello.
>> > >
>> > > I propose to do refactoring of classes "GridFinishedFuture" and
>> > > "GridFutureAdapter". There is field "resFlag" which can equals "ERR =
>> 1"
>> > or
>> > > "RES = 2". So I can replace it to one "bool haveResult" field.
>> > >
>> > > If there are no objections, I'm ready to proceed. If you find more
>> such
>> > > classes, please write about them.
>> > >
>> >
>>
>>
>>
>> --
>> Vladislav Pyatkov
>>
>
>


Re: IGNITE-4713 : refactoring of GridFinishedFuture and GridFutureAdapter

2017-02-17 Thread Александр Меньшиков
Like I said, if "resFlag==0" (of course 0 came from initialization) means
"you still haven't called the method onDone()", better make it clear.


2017-02-17 14:48 GMT+03:00 Vladislav Pyatkov <vldpyat...@gmail.com>:

> Alexander,
>
> I think, the resFlag will be initiated as 0 (new GridFutureAdapter()), but
> 1 and 2 states will be acquired on live.
>
>
> On Fri, Feb 17, 2017 at 1:56 PM, Александр Меньшиков <sharple...@gmail.com
> >
> wrote:
>
> > Alexey,
> >
> > I see only one place where writes in resFlag:
> >
> > if (err != null) {
> > resFlag = ERR;
> > this.res = err;
> > }
> > else {
> > resFlag = RES;
> > this.res = res;
> > }
> >
> > And the comparison with only two values: "ERR" and "RES". Except "assert
> > resFlag != 0;". So if this "assert" protect from call "get0" before call
> > "onDone" I think will be clearer to set some ready flag or use "enum"
> type.
> > And throw IllegalStateException if condition is false, because right now
> > developer will not get clear error massage.
> >
> > 17 февр. 2017 г. 11:34 пользователь "Alexey Goncharuk" <
> > alexey.goncha...@gmail.com> написал:
> >
> > Alexander,
> >
> > This change is not applicable for GridFutureAdapter because resFlag can
> > have 3 values there.
> >
> > 2017-02-16 19:58 GMT+03:00 Александр Меньшиков <sharple...@gmail.com>:
> >
> > > Hello.
> > >
> > > I propose to do refactoring of classes "GridFinishedFuture" and
> > > "GridFutureAdapter". There is field "resFlag" which can equals "ERR =
> 1"
> > or
> > > "RES = 2". So I can replace it to one "bool haveResult" field.
> > >
> > > If there are no objections, I'm ready to proceed. If you find more such
> > > classes, please write about them.
> > >
> >
>
>
>
> --
> Vladislav Pyatkov
>


Re: IGNITE-4713 : refactoring of GridFinishedFuture and GridFutureAdapter

2017-02-17 Thread Александр Меньшиков
Alexey,

I see only one place where writes in resFlag:

if (err != null) {
resFlag = ERR;
this.res = err;
}
else {
resFlag = RES;
this.res = res;
}

And the comparison with only two values: "ERR" and "RES". Except "assert
resFlag != 0;". So if this "assert" protect from call "get0" before call
"onDone" I think will be clearer to set some ready flag or use "enum" type.
And throw IllegalStateException if condition is false, because right now
developer will not get clear error massage.

17 февр. 2017 г. 11:34 пользователь "Alexey Goncharuk" <
alexey.goncha...@gmail.com> написал:

Alexander,

This change is not applicable for GridFutureAdapter because resFlag can
have 3 values there.

2017-02-16 19:58 GMT+03:00 Александр Меньшиков <sharple...@gmail.com>:

> Hello.
>
> I propose to do refactoring of classes "GridFinishedFuture" and
> "GridFutureAdapter". There is field "resFlag" which can equals "ERR = 1"
or
> "RES = 2". So I can replace it to one "bool haveResult" field.
>
> If there are no objections, I'm ready to proceed. If you find more such
> classes, please write about them.
>


IGNITE-4713 : refactoring of GridFinishedFuture and GridFutureAdapter

2017-02-16 Thread Александр Меньшиков
Hello.

I propose to do refactoring of classes "GridFinishedFuture" and
"GridFutureAdapter". There is field "resFlag" which can equals "ERR = 1" or
"RES = 2". So I can replace it to one "bool haveResult" field.

If there are no objections, I'm ready to proceed. If you find more such
classes, please write about them.


IGNITE-1624: [Test failed] Ignite SPI: TcpClientDiscoverySpiSelfTest.testJoinError failed

2017-02-16 Thread Александр Меньшиков
Hello.

I try to fix test. But can't reproduce problem. I added reps in a loop and
ran test local 10 minutes. There aren't problem. After that I set 1500 reps
and ran on CI (my local machine can do 1900 reps in 5 minutes) and got
timeout, but not the origin problem from issue.

Anyone have any ideas on how to reproduce the problem?


Re: Sort nodes in the ring in order to minimize the number of reconnections

2017-02-15 Thread Александр Меньшиков
Need to do code review until February 17, if we want to get this feature in
version 1.9.

2017-02-08 22:14 GMT+03:00 Александр Меньшиков <sharple...@gmail.com>:

> Done. Please look.
>
> JIRA: https://issues.apache.org/jira/browse/IGNITE-4501
> PR: https://github.com/apache/ignite/pull/1436/files
> Tests: http://ci.ignite.apache.org/project.html?projectId=IgniteTes
> ts=projectOverview_IgniteTests=pull/1436/head
>


Re: IGNITE-817: [Test] Disabled tests of GridCacheOffHeapTest

2017-02-14 Thread Александр Меньшиков
Yep, i will do it immediately

2017-02-13 23:12 GMT+03:00 Denis Magda <dma...@apache.org>:

> Alexander, could you wipe out this test then making the code clean?
>
> —
> Denis
>
> > On Feb 12, 2017, at 11:24 PM, Semyon Boikov <sboi...@gridgain.com>
> wrote:
> >
> > Hi,
> >
> > GridCacheOffHeapTest is some very old class and I think it can be
> removed.
> > Currently all actual benchmarks are in 'benchmarks' and 'yardstick'
> modules.
> >
> > Thanks
> >
> > On Sat, Feb 11, 2017 at 3:18 AM, Denis Magda <dma...@apache.org> wrote:
> >
> >> Yakov, Sam?
> >>
> >> Please join and share your thoughts.
> >>
> >> —
> >> Denis
> >>
> >>> On Feb 10, 2017, at 7:08 AM, Александр Меньшиков <sharple...@gmail.com
> >
> >> wrote:
> >>>
> >>> Denis, thank you for information. I see JMH in pom file in benchmarks
> >>> module. So this must be removed or it's okay for micro-benchmarks?
> >>>
> >>> 2017-02-10 0:04 GMT+03:00 Denis Magda <dma...@apache.org>:
> >>>
> >>>> Alexander,
> >>>>
> >>>> We use Yardstick benchmarks for performance measurements of Ignite:
> >>>> https://github.com/apacheignite/yardstick-ignite <https://github.com/
> >>>> apacheignite/yardstick-ignite>
> >>>>
> >>>> In 1.9 it will be much easier to execute the benchmarks directly from
> >>>> Apache Ignite bundle:
> >>>> http://apache-ignite-developers.2346864.n4.nabble.
> >> com/IGNITE-4212-Ignite-
> >>>> Benchmarking-Simplification-and-Automation-td13079.html <
> >>>> http://apache-ignite-developers.2346864.n4.nabble.
> >> com/IGNITE-4212-Ignite-
> >>>> Benchmarking-Simplification-and-Automation-td13079.html>
> >>>>
> >>>> As for GridCacheOffHeapTest it looks like a candidate for removal.
> >> *Yakov,
> >>>> Sam*, do we still need this test?
> >>>>
> >>>>> On Feb 9, 2017, at 6:25 AM, Александр Меньшиков <
> sharple...@gmail.com>
> >>>> wrote:
> >>>>>
> >>>>> Hello, everyone.
> >>>>>
> >>>>> GridCacheOffHeapTest  is about the Off-heap benchmarks.
> >>>>>
> >>>>> As I understand it right now we haven't clear vision how better make
> >>>>> benchmarks in Ignite.
> >>>>>
> >>>>> If it's not true, please make me know.
> >>>>>
> >>>>> I think it's a good idea to make benchmarks with JMH framework. It
> can
> >> be
> >>>>> load with maven, right now it's part of OpenJDK project. But it will
> be
> >>>>> included in Java 9 like standart framework for benchmarks.
> >>>>>
> >>>>> So what do you think, Igniters?
> >>>>
> >>>>
> >>
> >>
>
>


IGNITE-603

2017-02-13 Thread Александр Меньшиков
Hello, everyone.

I think the commented-out method "processSize2Test" can be removed. There
aren't class "GridDeploymentMode" which used in arguments.

Do you agree with me?


Re: IGNITE-817: [Test] Disabled tests of GridCacheOffHeapTest

2017-02-10 Thread Александр Меньшиков
Denis, thank you for information. I see JMH in pom file in benchmarks
module. So this must be removed or it's okay for micro-benchmarks?

2017-02-10 0:04 GMT+03:00 Denis Magda <dma...@apache.org>:

> Alexander,
>
> We use Yardstick benchmarks for performance measurements of Ignite:
> https://github.com/apacheignite/yardstick-ignite <https://github.com/
> apacheignite/yardstick-ignite>
>
> In 1.9 it will be much easier to execute the benchmarks directly from
> Apache Ignite bundle:
> http://apache-ignite-developers.2346864.n4.nabble.com/IGNITE-4212-Ignite-
> Benchmarking-Simplification-and-Automation-td13079.html <
> http://apache-ignite-developers.2346864.n4.nabble.com/IGNITE-4212-Ignite-
> Benchmarking-Simplification-and-Automation-td13079.html>
>
> As for GridCacheOffHeapTest it looks like a candidate for removal. *Yakov,
> Sam*, do we still need this test?
>
> > On Feb 9, 2017, at 6:25 AM, Александр Меньшиков <sharple...@gmail.com>
> wrote:
> >
> > Hello, everyone.
> >
> > GridCacheOffHeapTest  is about the Off-heap benchmarks.
> >
> > As I understand it right now we haven't clear vision how better make
> > benchmarks in Ignite.
> >
> > If it's not true, please make me know.
> >
> > I think it's a good idea to make benchmarks with JMH framework. It can be
> > load with maven, right now it's part of OpenJDK project. But it will be
> > included in Java 9 like standart framework for benchmarks.
> >
> > So what do you think, Igniters?
>
>


IGNITE-817: [Test] Disabled tests of GridCacheOffHeapTest

2017-02-09 Thread Александр Меньшиков
Hello, everyone.

GridCacheOffHeapTest  is about the Off-heap benchmarks.

As I understand it right now we haven't clear vision how better make
benchmarks in Ignite.

If it's not true, please make me know.

I think it's a good idea to make benchmarks with JMH framework. It can be
load with maven, right now it's part of OpenJDK project. But it will be
included in Java 9 like standart framework for benchmarks.

So what do you think, Igniters?


Re: Test failures

2017-02-08 Thread Александр Меньшиков
+1 to Aleksey, Alexander and Vyacheslav.

I suppose that the best option is make issue for every master-failed-test.
And fix them all.

Floating tests should be normal. I think in most case we can just add
repeating.

All new test for some not ready future should be marked like "Should be fix
in IGNITE-".


2017-02-08 15:23 GMT+03:00 Vyacheslav Daradur :

> I vote for the master-branche without failed-tests)
>
> I understand that impossible to make it quickly.
>
> We shall aim at this approach.
>
> It will be more comfortable to us to develop.
>
> 2017-02-08 12:17 GMT+03:00 Alexander Fedotov  >:
>
> > Hi,
> >
> > I would agree with Aleksey.
> > From the CI perspective, failing tests should be the main concern,
> because
> > it prevents a durable development of new features. Also, as Aleksey has
> > noted, developers working on different features could end up fixing the
> > same regressions, chances are - in different ways, resulting in merge
> > conflicts.
> > Having failing base branch, it is not possible to determine the reason
> of a
> > failure from the first sight.
> > All these points impact as the feedback from tests, so the whole process
> > agility.
> > Ideally, no new feature development should be started based on the
> failing
> > branch.
> > I think we should adopt this approach, otherwise, with growing amount of
> > features,
> > we will eventually end up spending more time dealing with regressions,
> than
> > developing features.
> >
> > Regards,
> > Alexander
> >
> > 8 февр. 2017 г. 10:50 AM пользователь "ALEKSEY KUZNETSOV" <
> > alkuznetsov...@gmail.com> написал:
> >
> > How could they co-exist ? When you developing some ticket you are risking
> > introduce bug which is reproduced by already failed test(s).
> > Moreover its time consuming to look up new failed tests when your build
> has
> > completed.
> > The last one, committers who introduced new bugs is responsible for them
> > and have to fix them, not other ones.
> > There are number of tests which i have to execute to ensure they are
> flaky
> > or permanently failed. For example this one :
> > CacheJdbcPojoStoreTest.testLoadCache()
> >
> >
> >
> > вт, 7 февр. 2017 г. в 22:10, Denis Magda :
> >
> > > Aleksey,
> > >
> > > Bugs fixing and features development are two processes that usually
> > > co-exist.
> > >
> > > Some of the committer/contributors fix tests/functionality while the
> > > others add new functionality. Someone does both.
> > >
> > > You’re welcomed to start fixing the failing tests. Are there any
> specific
> > > that annoys you most?
> > >
> > > —
> > > Denis
> > >
> > > > On Feb 7, 2017, at 8:40 AM, ALEKSEY KUZNETSOV <
> > alkuznetsov...@gmail.com>
> > > wrote:
> > > >
> > > > We have a lot of failed tests, which is frustrating. Some of them are
> > > > flaky(floating status randomly goes from succesful to failed) which
> > adds
> > > to
> > > > frustration. Perhaps, we should fix all the tests in first place, and
> > > then
> > > > continue doing tickets ?
> > > > --
> > > >
> > > > *Best Regards,*
> > > >
> > > > *Kuznetsov Aleksey*
> > >
> > > --
> >
> > *Best Regards,*
> >
> > *Kuznetsov Aleksey*
> >
>


Re: IGNITE-1948 ClusterTopologyCheckedException can return null for retryReadyFuture()

2017-02-07 Thread Александр Меньшиков
Alexey, I ran into some difficulties.

Look at the method:
GridNearTxFinishFuture.CheckBackupMiniFuture#onDhtFinishResponse(GridDhtTxFinishResponse
res)

*Throwable err* = res.checkCommittedError();

if (*err* != null) {
if (*err* instanceof IgniteCheckedException) {
*ClusterTopologyCheckedException cause* =
((IgniteCheckedException)*err*).
*getCause(ClusterTopologyCheckedException.class)*;

if (*cause* != null)
*cause.retryReadyFuture(*
cctx.nextAffinityReadyFuture(tx.topologyVersion()));
*   //^_Here update the readyFut in
some first "cause". *
}

onDone(*err*);
}


So I can't rewrite "cause" field in exception without reflection. It means
if I try to use two exception: one with "readyFut" and second without
"readyFut", I see no way to do it in this place.

Perhaps I misunderstood your original idea. Or maybe this code some kind of
a crutch and should be removed. What do you think?

2017-01-30 16:54 GMT+03:00 Alexey Goncharuk <alexey.goncha...@gmail.com>:

> In the example above there is no point of setting the retry future in the
> exception because it will be serialized and sent to a remote node.
>
> I see the only possible way to ensure this: make this be verified at
> compile time. This looks like a big change, but the draft may look like so:
> 1) Introduce new exception type, like CacheTopology*Checked*Exception
> which
> *must* take the minimum topology version to wait for
> 2) In all cases when a cache operation fails because of topology change,
> provide the appropriate exception
> 3) When CacheTopologyException is being constructed, create a corresponding
> local future based on wait version and throw the exception.
>
> Even though this still does not give us 100% guarantee that we did not miss
> anything, it should cover a lot more cases than now.
>
> 2017-01-30 14:20 GMT+03:00 Александр Меньшиков <sharple...@gmail.com>:
>
> > Alexey,
> >
> > For GridCacheIoManager#sendNoRetry it's not necessary because exception
> > will be ignored (or will cast to String). Perhaps my message was unclear.
> > I try to say that three methods can throw "
> ClusterTopologyCheckedException"
> > without any problem.
> >
> > But you are right, in some methods I can't set "retryFuture". We must
> > ensure that exception doesn't escape away. The best option is to ignore
> it
> > (or cast to String).
> >
> > But any way, look here:
> >
> > IgniteTxHandler#sendReply(UUID nodeId, GridDhtTxFinishRequest req,
> boolean
> > committed, GridCacheVersion nearTxId)
> >
> > Inside you can found next lines:
> > __
> > ClusterTopologyCheckedException *cause* =
> > new ClusterTopologyCheckedException("Primary node left grid.");
> >
> > res.checkCommittedError(new IgniteTxRollbackCheckedException("Failed to
> > commit transaction " +
> > "(transaction has been rolled back on backup node): " +
> req.version(),
> > *cause*));
> > __
> >
> > How do you unsure that *"cause"* can't come to GridCacheUtils#
> retryTopologySafe(final
> > Callable c) ?
> >
> >
> > Perhaps I don't know some rules which you use to maintain it now?
> >
> >
> > 2017-01-30 12:24 GMT+03:00 Alexey Goncharuk <alexey.goncha...@gmail.com
> >:
> >
> >> Alexander,
> >>
> >> Do you have a use-case in mind which results in IgniteTopologyException
> >> thrown from public API with retryFuture unset?
> >>
> >> retryFuture makes sense only for certain cache operations and may be set
> >> only in certain places in the code where we already know what to wait
> for.
> >> I do not see how you can initialize this future, for example, in
> >>  GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage msg,
> >> byte
> >> plc)
> >>
> >> --AG
> >>
> >> 2017-01-27 15:45 GMT+03:00 Александр Меньшиков <sharple...@gmail.com>:
> >>
> >> > I found a lot of using "ClusterTopologyCheckedException" exception.
> In
> >> > most
> >> > use-case these exceptions were just ignored. It's easier to see if
> >> > issue-4612 will be applied (I'm waiting for code review by my team
> >> leader)
> >> > where I renamed all unused exceptions in the "ignored".
> >> >
> >&

Re: Unused variables in code

2017-01-30 Thread Александр Меньшиков
Done.

My PR:
https://github.com/apache/ignite/pull/1466/files
<https://github.com/apache/ignite/pull/1466>



2017-01-25 13:11 GMT+03:00 Александр Меньшиков <sharple...@gmail.com>:

> Okay, I created issue:
> https://issues.apache.org/jira/browse/IGNITE-4612
>
>
> 2017-01-24 18:19 GMT+03:00 Alexey Kuznetsov <akuznet...@apache.org>:
>
>> I think it is better to have such issue in JIRA "Minor code cleanup":
>>
>> 1) Cleanup unused imports.
>> 2) Rename e -> ignored
>> 3) Remove unused variables
>> 4)...
>> 5) PROFIT :)
>>
>> On Tue, Jan 24, 2017 at 9:46 PM, Александр Меньшиков <
>> sharple...@gmail.com>
>> wrote:
>>
>> > Hello, everyone!
>> >
>> > I found some unused variables in code. For example:
>> >
>> > Unnecessary argument in method
>> > MiniFuture#onResult(ClusterTopologyCheckedException e) in inner class
>> of
>> > GridDhtLockFuture.
>> >
>> > There isn't another onResult() method (in super class including) with
>> > another behavior. So argument can be safe delete.
>> >
>> > Also I found some ignored exceptions wasn't named "ignored". For example
>> > unused exception was named "e0" in method:
>> >
>> > GridDhtTransactionalCacheAdapter#onForceKeysError(UUID,
>> > GridDhtLockRequest,
>> > IgniteCheckedException)
>> >
>> >
>> > So I think i can go through code and cleanup thing like that.
>> >
>> > Need I create new issue in JIRA for cleanup or some issue for cleanup
>> > already exists? If that please give me a link.
>> >
>>
>>
>>
>> --
>> Alexey Kuznetsov
>>
>
>


Re: IGNITE-1948 ClusterTopologyCheckedException can return null for retryReadyFuture()

2017-01-30 Thread Александр Меньшиков
Okay, it seems good. So you want to remove field *readyFut* from
*ClusterTopologyCheckedException*  and add *CacheTopologyCheckedException*
like *ClusterTopologyCheckedException* but initialize *readyFut* in
constructor, yes?

2017-01-30 16:54 GMT+03:00 Alexey Goncharuk <alexey.goncha...@gmail.com>:

> In the example above there is no point of setting the retry future in the
> exception because it will be serialized and sent to a remote node.
>
> I see the only possible way to ensure this: make this be verified at
> compile time. This looks like a big change, but the draft may look like so:
> 1) Introduce new exception type, like CacheTopology*Checked*Exception
> which
> *must* take the minimum topology version to wait for
> 2) In all cases when a cache operation fails because of topology change,
> provide the appropriate exception
> 3) When CacheTopologyException is being constructed, create a corresponding
> local future based on wait version and throw the exception.
>
> Even though this still does not give us 100% guarantee that we did not miss
> anything, it should cover a lot more cases than now.
>
> 2017-01-30 14:20 GMT+03:00 Александр Меньшиков <sharple...@gmail.com>:
>
> > Alexey,
> >
> > For GridCacheIoManager#sendNoRetry it's not necessary because exception
> > will be ignored (or will cast to String). Perhaps my message was unclear.
> > I try to say that three methods can throw "
> ClusterTopologyCheckedException"
> > without any problem.
> >
> > But you are right, in some methods I can't set "retryFuture". We must
> > ensure that exception doesn't escape away. The best option is to ignore
> it
> > (or cast to String).
> >
> > But any way, look here:
> >
> > IgniteTxHandler#sendReply(UUID nodeId, GridDhtTxFinishRequest req,
> boolean
> > committed, GridCacheVersion nearTxId)
> >
> > Inside you can found next lines:
> > __
> > ClusterTopologyCheckedException *cause* =
> > new ClusterTopologyCheckedException("Primary node left grid.");
> >
> > res.checkCommittedError(new IgniteTxRollbackCheckedException("Failed to
> > commit transaction " +
> > "(transaction has been rolled back on backup node): " +
> req.version(),
> > *cause*));
> > __
> >
> > How do you unsure that *"cause"* can't come to GridCacheUtils#
> retryTopologySafe(final
> > Callable c) ?
> >
> >
> > Perhaps I don't know some rules which you use to maintain it now?
> >
> >
> > 2017-01-30 12:24 GMT+03:00 Alexey Goncharuk <alexey.goncha...@gmail.com
> >:
> >
> >> Alexander,
> >>
> >> Do you have a use-case in mind which results in IgniteTopologyException
> >> thrown from public API with retryFuture unset?
> >>
> >> retryFuture makes sense only for certain cache operations and may be set
> >> only in certain places in the code where we already know what to wait
> for.
> >> I do not see how you can initialize this future, for example, in
> >>  GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage msg,
> >> byte
> >> plc)
> >>
> >> --AG
> >>
> >> 2017-01-27 15:45 GMT+03:00 Александр Меньшиков <sharple...@gmail.com>:
> >>
> >> > I found a lot of using "ClusterTopologyCheckedException" exception.
> In
> >> > most
> >> > use-case these exceptions were just ignored. It's easier to see if
> >> > issue-4612 will be applied (I'm waiting for code review by my team
> >> leader)
> >> > where I renamed all unused exceptions in the "ignored".
> >> >
> >> > It means that in some case "retryReadyFuture" can left unset. But
> there
> >> are
> >> > code where exception is being getting from "get()" method in some
> >> "future"
> >> > class and don't ignored (exception is sending to method
> >> > "GridFutureAdapter#onDone()" for example).
> >> >
> >> > So I decided not to do a deep global analysis of code and make some
> >> simple
> >> > rules.
> >> > 1. If some method "X" throws ClusterTopologyCheckedException and
> >> ignores
> >> > it
> >> > (or converts to String, there are some variants to lose exception like
> >> > object) in all methods that call the method "X", then we can don't set
> >> > "retryReadyFuture" for optimi

Re: IGNITE-1948 ClusterTopologyCheckedException can return null for retryReadyFuture()

2017-01-30 Thread Александр Меньшиков
Alexey,

For GridCacheIoManager#sendNoRetry it's not necessary because exception
will be ignored (or will cast to String). Perhaps my message was unclear.
I try to say that three methods can throw "ClusterTopologyCheckedException"
without any problem.

But you are right, in some methods I can't set "retryFuture". We must
ensure that exception doesn't escape away. The best option is to ignore it
(or cast to String).

But any way, look here:

IgniteTxHandler#sendReply(UUID nodeId, GridDhtTxFinishRequest req, boolean
committed, GridCacheVersion nearTxId)

Inside you can found next lines:
__
ClusterTopologyCheckedException *cause* =
new ClusterTopologyCheckedException("Primary node left grid.");

res.checkCommittedError(new IgniteTxRollbackCheckedException("Failed to
commit transaction " +
"(transaction has been rolled back on backup node): " + req.version(),
*cause*));
__

How do you unsure that *"cause"* can't come to
GridCacheUtils#retryTopologySafe(final Callable c) ?


Perhaps I don't know some rules which you use to maintain it now?


2017-01-30 12:24 GMT+03:00 Alexey Goncharuk <alexey.goncha...@gmail.com>:

> Alexander,
>
> Do you have a use-case in mind which results in IgniteTopologyException
> thrown from public API with retryFuture unset?
>
> retryFuture makes sense only for certain cache operations and may be set
> only in certain places in the code where we already know what to wait for.
> I do not see how you can initialize this future, for example, in
>  GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage msg,
> byte
> plc)
>
> --AG
>
> 2017-01-27 15:45 GMT+03:00 Александр Меньшиков <sharple...@gmail.com>:
>
> > I found a lot of using "ClusterTopologyCheckedException" exception. In
> > most
> > use-case these exceptions were just ignored. It's easier to see if
> > issue-4612 will be applied (I'm waiting for code review by my team
> leader)
> > where I renamed all unused exceptions in the "ignored".
> >
> > It means that in some case "retryReadyFuture" can left unset. But there
> are
> > code where exception is being getting from "get()" method in some
> "future"
> > class and don't ignored (exception is sending to method
> > "GridFutureAdapter#onDone()" for example).
> >
> > So I decided not to do a deep global analysis of code and make some
> simple
> > rules.
> > 1. If some method "X" throws ClusterTopologyCheckedException and ignores
> > it
> > (or converts to String, there are some variants to lose exception like
> > object) in all methods that call the method "X", then we can don't set
> > "retryReadyFuture" for optimization. But this should be clarified in
> > javadoc.
> > 2. But if exception escapes at least once like object: we must set
> > "retryReadyFuture" in that method.
> >
> > A few times when call tree was simple, I went deeper.
> >
> > So I found only three methods which can throw
> > "ClusterTopologyCheckedException" without setting "retryReadyFuture":
> >
> > 1. IgfsFragmentizerManager#sendWithRetries(UUID nodeId,
> > IgfsCommunicationMessage msg)
> > 2. GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage
> msg,
> > byte plc)
> > 3. GridCacheIoManager#sendOrderedMessage(ClusterNode node, Object topic,
> > GridCacheMessage msg, byte plc, long timeout)
> >
> > In all other methods "ClusterTopologyCheckedException" escapes away too
> > far.
> >
> > What do you think about this approach to make compromise between
> > optimization and correctness?
> >
>


IGNITE-1948 ClusterTopologyCheckedException can return null for retryReadyFuture()

2017-01-27 Thread Александр Меньшиков
I found a lot of using "ClusterTopologyCheckedException" exception. In most
use-case these exceptions were just ignored. It's easier to see if
issue-4612 will be applied (I'm waiting for code review by my team leader)
where I renamed all unused exceptions in the "ignored".

It means that in some case "retryReadyFuture" can left unset. But there are
code where exception is being getting from "get()" method in some "future"
class and don't ignored (exception is sending to method
"GridFutureAdapter#onDone()" for example).

So I decided not to do a deep global analysis of code and make some simple
rules.
1. If some method "X" throws ClusterTopologyCheckedException and ignores it
(or converts to String, there are some variants to lose exception like
object) in all methods that call the method "X", then we can don't set
"retryReadyFuture" for optimization. But this should be clarified in
javadoc.
2. But if exception escapes at least once like object: we must set
"retryReadyFuture" in that method.

A few times when call tree was simple, I went deeper.

So I found only three methods which can throw
"ClusterTopologyCheckedException" without setting "retryReadyFuture":

1. IgfsFragmentizerManager#sendWithRetries(UUID nodeId,
IgfsCommunicationMessage msg)
2. GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage msg,
byte plc)
3. GridCacheIoManager#sendOrderedMessage(ClusterNode node, Object topic,
GridCacheMessage msg, byte plc, long timeout)

In all other methods "ClusterTopologyCheckedException" escapes away too far.

What do you think about this approach to make compromise between
optimization and correctness?


Re: Unused variables in code

2017-01-25 Thread Александр Меньшиков
Okay, I created issue:
https://issues.apache.org/jira/browse/IGNITE-4612


2017-01-24 18:19 GMT+03:00 Alexey Kuznetsov <akuznet...@apache.org>:

> I think it is better to have such issue in JIRA "Minor code cleanup":
>
> 1) Cleanup unused imports.
> 2) Rename e -> ignored
> 3) Remove unused variables
> 4)...
> 5) PROFIT :)
>
> On Tue, Jan 24, 2017 at 9:46 PM, Александр Меньшиков <sharple...@gmail.com
> >
> wrote:
>
> > Hello, everyone!
> >
> > I found some unused variables in code. For example:
> >
> > Unnecessary argument in method
> > MiniFuture#onResult(ClusterTopologyCheckedException e) in inner class of
> > GridDhtLockFuture.
> >
> > There isn't another onResult() method (in super class including) with
> > another behavior. So argument can be safe delete.
> >
> > Also I found some ignored exceptions wasn't named "ignored". For example
> > unused exception was named "e0" in method:
> >
> > GridDhtTransactionalCacheAdapter#onForceKeysError(UUID,
> > GridDhtLockRequest,
> > IgniteCheckedException)
> >
> >
> > So I think i can go through code and cleanup thing like that.
> >
> > Need I create new issue in JIRA for cleanup or some issue for cleanup
> > already exists? If that please give me a link.
> >
>
>
>
> --
> Alexey Kuznetsov
>


IGNITE-4364: DML: AssertionError: Row@ ... null 5 -> mapcol_asc_idx

2017-01-24 Thread Александр Меньшиков
Hello, Sergey and other.

I followed the instruction in issue description and I didn't get
AssertionError, but got exception looks like in issue in "Caused by" part.
I attached exception part of my logs in JIRA like file
"Ubuntu_OpenJDK_1.8.111.txt" (name is my configuration).

https://issues.apache.org/jira/browse/IGNITE-4364

So if this behavior is fine I think issue can be closed. Otherwise Could
you, Sergey, to lend me a hand and tell me does time out is invalid
behavior or some thing else?


Re: Sort nodes in the ring in order to minimize the number of reconnections

2017-01-23 Thread Александр Меньшиков
Igor, I have thought about approach what you are talking about. It need add
new field named like "sortedNodes" with custom ordering, which will have
the same items as "nodes" field, because "nodes" has being used with
default ordering in other methods. It have this advantages:

1. Method "nextNode" will look simpler.
2. Method "nextNode" will work faster, because using of method
TreeSet#higher() will be available. But that possibility had not been used
in original code. And I don't why.


But also have some disadvantages because new field "sortedNodes" will be
strongly connected with "nodes":
1. It need copy-paste all code, which modifies "nodes" in 4 other methods.
It will decrease maintainability.
2. Field "nodes" is being used with "copy-on-write" algorithm. So state of
"nodes" and "sortedNodes" can be inconsistent. Maybe it's okay, in fact I
just don't know. But any way in future it may become a problem.

So my opinion is that "presorted" approach can work a little bit faster
(number of nodes never can't be so big that O(log n) became more faster
than O(n)), but code complexity will been increased, because it will add
one logic connection inside the whole class "TcpDiscoveryNodesRing".

Yakov, can you settle our argument?

2017-01-20 16:30 GMT+03:00 Игорь Г :

> Alexander, maybe you should use presorted collection in
> TcpDiscoveryNodesRing.nextNode instead of iterating through unsorted one
> every time?
>


Re: Sort nodes in the ring in order to minimize the number of reconnections

2017-01-20 Thread Александр Меньшиков
Yakov, I changed the implementation. Please, look at my code again.

https://github.com/apache/ignite/pull/1436

2017-01-19 15:37 GMT+03:00 Yakov Zhdanov :

> Alexander, as far as I remember we talked about having cluster id set via
> TcpDiscoverySpi configuration and also via system property.
>
> What do you mean by "existence of sufficient comparator"? If we require
> that attribute is integer then we don't have any problems. If it is not
> integer then you should throw exception on start.
>
> I repeat this once again - we need to (1)prevent our users from falling
> into terrible discovery issues (which are hard to debug) if users provide
> inconsistent comparators on different nodes, but we also need to have
> (2)full flexibility and control over nodes ordering. I think if we don't
> have comparator on public API then we are still OK with both points above.
> Therefore, I would like you to implement this feature using the approach we
> agreed on before. Let me know if you want me to take a look at the code
> before you proceed.
>
> --Yakov
>


Re: Sort nodes in the ring in order to minimize the number of reconnections

2017-01-18 Thread Александр Меньшиков
Yakov, as I understand it we need add CLUSTER_REGION_ID for each nodes in
config file. And in fact using some kind of sort in nextNode method (the
search for extreme values to be exact). And the existence of valid
comparator is a sufficient condition to sort nodes to build new correct
ring. So I has thought we will not get any extra benefits (performance or
maintainability) if we close the ability for users to set their sort logic.
Code will a similar in two variants. I has thought if I show this variant
will be easier to see that variant is okay.
But if not, then I can fast change code.


Re: Sort nodes in the ring in order to minimize the number of reconnections

2017-01-18 Thread Александр Меньшиков
I done that things:

-- Add to TcpDiscoverySpi field Comparator nodeComparator
for load custom comparators from config file like bean.
-- Add implementation with old behavior: BaseNodeComparator
-- Add region id implementation: RegionNodeComparator which get map from IP
address to region ID in constructor.
-- Modified TcpDiscoveryNodesRing#nextNode for using nodeComparator for
find next node.

You can see that in PR: https://github.com/apache/ignite/pull/1436

Main question is: how to test it?

For my local test i just changed BaseNodeComparator with this odd
comparator:

new Comparator() {
@Override
public int compare(TcpDiscoveryNode t1, TcpDiscoveryNode
t2) {
//shuffle nodes
final int ans =
Long.compare((t1.internalOrder()*3L+13L)%4L,
(t2.internalOrder()*3L+13L)%4L);
return (ans==0)?t1.compareTo(t2):ans;
}
};

It's looking scary, but in fact it just consistently shuffle nodes. If you
have 4 nodes with topology versions 1, 2, 3 and 4, it will be ring: 1-4-3-2.

So I think if we just using in old test this shuffle comparator and nothing
gone wrong it's good enough.

But any way I don't know how to add that to tests.

And may be we need some test for custom comparators. But in fact comparators
just must be valid Java comparator and work the same on all nodes.

Any comments are welcome.


Re: IGNITE-4487 - NPE on query execution

2017-01-06 Thread Александр Меньшиков
http://ci.ignite.apache.org/project.html?projectId=IgniteTests=projectOverview_IgniteTests=pull%2F1388%2Fmerge

There are overview of tests. It seems okay. A little unclear because there
are tests which  failed also in master branch.

2016-12-29 19:49 GMT+03:00 Denis Magda <dma...@apache.org>:

> Alexander, thanks.
>
> Please move the ticket from “open” into “patch available” state in JIRA
> and run the tests on TeamCity. Refer to the details covered there
> https://cwiki.apache.org/confluence/display/IGNITE/How+
> to+Contribute#HowtoContribute-1.CreateGitHubpull-request <
> https://cwiki.apache.org/confluence/display/IGNITE/How+
> to+Contribute#HowtoContribute-1.CreateGitHubpull-request>
>
> —
> Denis
>
> > On Dec 29, 2016, at 3:45 AM, Александр Меньшиков <sharple...@gmail.com>
> wrote:
> >
> > Alexey, I'm already make pull request where throw exception in that
> place.
> >
> > https://github.com/apache/ignite/pull/1388/commits
> >
> > 2016-12-29 11:16 GMT+03:00 Alexey Goncharuk <alexey.goncha...@gmail.com
> >:
> >
> >> I think that If fallbacks(...) returns an empty nodes collection, then
> we
> >> should fail with an exception.
> >>
> >> 2016-12-28 22:06 GMT+03:00 Denis Magda <dma...@apache.org>:
> >>
> >>> Alexander, added you to the contributors list. Please check that you
> can
> >>> assign the ticket on yourself.
> >>>
> >>> —
> >>> Denis
> >>>
> >>>> On Dec 28, 2016, at 2:15 AM, Александр Меньшиков <
> sharple...@gmail.com
> >>>
> >>> wrote:
> >>>>
> >>>>
> >>>> Username: sharpler
> >>>>
> >>>> Full Name: Alexander Menshikov
> >>>>
> >>>>
> >>>>
> >>>> 2016-12-27 22:57 GMT+03:00 Denis Magda <dma...@apache.org  >>> dma...@apache.org>>:
> >>>> Alexander,
> >>>>
> >>>> I need to know your JIRA ID in order to add you to the contributors
> >> list.
> >>>>
> >>>> As for your questions, this situation might be caused by the race when
> >> a
> >>> cache is being stopped and there are still scan queries running in
> >>> parallel. So, in general it’s not about data loss.
> >>>>
> >>>> Sam, Alex G., could you share your thoughts in regards to the proper
> >> fix?
> >>>>
> >>>> —
> >>>> Denis
> >>>>
> >>>>> On Dec 26, 2016, at 2:43 AM, Александр Меньшиков <
> >> sharple...@gmail.com
> >>> <mailto:sharple...@gmail.com>> wrote:
> >>>>>
> >>>>> Hello everyone.
> >>>>>
> >>>>> I want to pick up *https://issues.apache.org/jira/browse/IGNITE-4487
> >> <
> >>> https://issues.apache.org/jira/browse/IGNITE-4487>
> >>>>> <https://issues.apache.org/jira/browse/IGNITE-4487 <
> >>> https://issues.apache.org/jira/browse/IGNITE-4487>>* as my
> >>>>> first issue.
> >>>>>
> >>>>> Please add me as contributor.
> >>>>>
> >>>>> I already found that: in inner class
> >>>>> 'GridCacheQueryAdapter.ScanQueryFallbackClosableIterator' in
> >>> constructor is
> >>>>> called with method 'init()', but method 'init()' cannot be called
> >> with
> >>> an
> >>>>> empty field 'nodes'. In source code it looks like:
> >>>>>
> >>>>> private ScanQueryFallbackClosableIterator(int part,
> >>> GridCacheQueryAdapter
> >>>>> qry,
> >>>>>   GridCacheQueryManager qryMgr, GridCacheContext cctx) {
> >>>>>   this.qry = qry;
> >>>>>   this.qryMgr = qryMgr;
> >>>>>   this.cctx = cctx;
> >>>>>   this.part = part;
> >>>>>
> >>>>>   nodes = fallbacks(cctx.discovery().topologyVersionEx());
> >>>>>   // !!! Here nodes.isEmpty()==true, and init() will fail in
> >>> the
> >>>>> future. !!!
> >>>>>   init();
> >>>>>   }
> >>>>>
> >>>>> I can fix it by adding some check in code, but i must know what
> >>> behavior
> >>>>> are best in this case? As I understand it, the list of nodes is empty
> >>> if
> >>>>> there are no nodes with the current partition, which means data loss,
> >>> and
> >>>>> either need to return a meaningful exception, or ignore this
> >>> situation. But
> >>>>> maybe I missed something.
> >>>>
> >>>>
> >>>
> >>>
> >>
>
>


Re: Sort nodes in the ring in order to minimize the number of reconnections

2016-12-29 Thread Александр Меньшиков
I think that in the weeks after the 'new year' holidays or sooner.

2016-12-29 13:28 GMT+03:00 Yakov Zhdanov :

> Guys, I have updated the ticket.
>
> Alexander Menshikov, when do you expect the implementation to be finished?
>


Re: IGNITE-4487 - NPE on query execution

2016-12-29 Thread Александр Меньшиков
Alexey, I'm already make pull request where throw exception in that place.

https://github.com/apache/ignite/pull/1388/commits

2016-12-29 11:16 GMT+03:00 Alexey Goncharuk <alexey.goncha...@gmail.com>:

> I think that If fallbacks(...) returns an empty nodes collection, then we
> should fail with an exception.
>
> 2016-12-28 22:06 GMT+03:00 Denis Magda <dma...@apache.org>:
>
> > Alexander, added you to the contributors list. Please check that you can
> > assign the ticket on yourself.
> >
> > —
> > Denis
> >
> > > On Dec 28, 2016, at 2:15 AM, Александр Меньшиков <sharple...@gmail.com
> >
> > wrote:
> > >
> > >
> > > Username: sharpler
> > >
> > > Full Name: Alexander Menshikov
> > >
> > >
> > >
> > > 2016-12-27 22:57 GMT+03:00 Denis Magda <dma...@apache.org  > dma...@apache.org>>:
> > > Alexander,
> > >
> > > I need to know your JIRA ID in order to add you to the contributors
> list.
> > >
> > > As for your questions, this situation might be caused by the race when
> a
> > cache is being stopped and there are still scan queries running in
> > parallel. So, in general it’s not about data loss.
> > >
> > > Sam, Alex G., could you share your thoughts in regards to the proper
> fix?
> > >
> > > —
> > > Denis
> > >
> > > > On Dec 26, 2016, at 2:43 AM, Александр Меньшиков <
> sharple...@gmail.com
> > <mailto:sharple...@gmail.com>> wrote:
> > > >
> > > > Hello everyone.
> > > >
> > > > I want to pick up *https://issues.apache.org/jira/browse/IGNITE-4487
> <
> > https://issues.apache.org/jira/browse/IGNITE-4487>
> > > > <https://issues.apache.org/jira/browse/IGNITE-4487 <
> > https://issues.apache.org/jira/browse/IGNITE-4487>>* as my
> > > > first issue.
> > > >
> > > > Please add me as contributor.
> > > >
> > > > I already found that: in inner class
> > > > 'GridCacheQueryAdapter.ScanQueryFallbackClosableIterator' in
> > constructor is
> > > > called with method 'init()', but method 'init()' cannot be called
> with
> > an
> > > > empty field 'nodes'. In source code it looks like:
> > > >
> > > > private ScanQueryFallbackClosableIterator(int part,
> > GridCacheQueryAdapter
> > > > qry,
> > > >GridCacheQueryManager qryMgr, GridCacheContext cctx) {
> > > >this.qry = qry;
> > > >this.qryMgr = qryMgr;
> > > >this.cctx = cctx;
> > > >this.part = part;
> > > >
> > > >nodes = fallbacks(cctx.discovery().topologyVersionEx());
> > > >// !!! Here nodes.isEmpty()==true, and init() will fail in
> > the
> > > > future. !!!
> > > >init();
> > > >}
> > > >
> > > > I can fix it by adding some check in code, but i must know what
> > behavior
> > > > are best in this case? As I understand it, the list of nodes is empty
> > if
> > > > there are no nodes with the current partition, which means data loss,
> > and
> > > > either need to return a meaningful exception, or ignore this
> > situation. But
> > > > maybe I missed something.
> > >
> > >
> >
> >
>


Re: IGNITE-4487 - NPE on query execution

2016-12-28 Thread Александр Меньшиков
Username: sharpler

Full Name: Alexander Menshikov



2016-12-27 22:57 GMT+03:00 Denis Magda <dma...@apache.org>:

> Alexander,
>
> I need to know your JIRA ID in order to add you to the contributors list.
>
> As for your questions, this situation might be caused by the race when a
> cache is being stopped and there are still scan queries running in
> parallel. So, in general it’s not about data loss.
>
> Sam, Alex G., could you share your thoughts in regards to the proper fix?
>
> —
> Denis
>
> > On Dec 26, 2016, at 2:43 AM, Александр Меньшиков <sharple...@gmail.com>
> wrote:
> >
> > Hello everyone.
> >
> > I want to pick up *https://issues.apache.org/jira/browse/IGNITE-4487
> > <https://issues.apache.org/jira/browse/IGNITE-4487>* as my
> > first issue.
> >
> > Please add me as contributor.
> >
> > I already found that: in inner class
> > 'GridCacheQueryAdapter.ScanQueryFallbackClosableIterator' in
> constructor is
> > called with method 'init()', but method 'init()' cannot be called with an
> > empty field 'nodes'. In source code it looks like:
> >
> > private ScanQueryFallbackClosableIterator(int part,
> GridCacheQueryAdapter
> > qry,
> >GridCacheQueryManager qryMgr, GridCacheContext cctx) {
> >this.qry = qry;
> >this.qryMgr = qryMgr;
> >this.cctx = cctx;
> >this.part = part;
> >
> >nodes = fallbacks(cctx.discovery().topologyVersionEx());
> >// !!! Here nodes.isEmpty()==true, and init() will fail in the
> > future. !!!
> >init();
> >}
> >
> > I can fix it by adding some check in code, but i must know what behavior
> > are best in this case? As I understand it, the list of nodes is empty if
> > there are no nodes with the current partition, which means data loss, and
> > either need to return a meaningful exception, or ignore this situation.
> But
> > maybe I missed something.
>
>


Re: Sort nodes in the ring in order to minimize the number of reconnections

2016-12-27 Thread Александр Меньшиков
Yes, i can. But someone needs to give me the rights of contributor in Jira.

2016-12-27 17:07 GMT+03:00 Vyacheslav Daradur :

> I have described a task: https://issues.apache.org/jira/browse/IGNITE-4501
>
> and linked a bug https://issues.apache.org/jira/browse/IGNITE-4499
>
> Alex Menshikov, maybe you will take her?
>
>
> 2016-12-27 13:32 GMT+03:00 Alexei Scherbakov  >:
>
> > 2016-12-27 10:42 GMT+03:00 Yakov Zhdanov :
> >
> > > >>
> > > My main concern here is code complexity. Yakov, how difficult it is to
> > > stick a new node in an arbitrary spot of a discovery ring?
> > > >>
> > >
> > > Dmitry, I think this is not hard. At least I don't see any issue now.
> > >
> > > >>
> > > I think the NodeComparator approach will work. User can chose how to
> sort
> > > nodes from one rack before nodes from another rack. Same goes for
> > subnets,
> > > or data centers.
> > > >>
> > >
> > > Dmitry, can you please explain why you enforce user to write code? This
> > > does not seem convenient to me at all. If user wants to write code then
> > he
> > > can do it for calculating proper arc_id.
> > >
> >
> > Yakov, where is no need to for user to write code. We can provide two
> > default Comparator implementations:
> > first based on IP address(default), and second based on node attribute.
> > User just plugs one of the implementations and adds node attribute to
> node
> > config in second case - let it be ARC_ID by default.
> >
> >
> > >
> > > Another point I already posted to this thread - this is very error
> prone.
> > >
> > > >>
> > > I am strongly against giving user an opportunity to point exact place
> in
> > > the ring with somewhat like this interface [int getIdex(Node newNode,
> > > List currentRing)]. This is very error prone and may require
> tricky
> > > consistency checks just to make sure that implementation of this
> > interface
> > > is consistent along the topology.
> > > With "arcs" approach user can automatically assign proper ids basing on
> > > physical network topology and network routes.
> > > >>
> > >
> > > I still think arc_id is better:
> > > 1. No code from user side. Only env variable or system property on a
> > > machine.
> > > 2. All code inside Ignite - easy to fix and change if required.
> > > 3. All benefits of comparator are still available.
> > >
> >
> > I suppose my approach is more generic and also matches listed
> requirements.
> >
> >
> > >
> > > Alex, I still don't get how you (and other guys as well) want to deal
> > with
> > > latencies here. I would like you explain how you solve this - you have
> > 1000
> > > IP addresses, and you need to sort them in your beloved latency order,
> > but
> > > please note that you need to get exactly the same ring on all of these
> > 1000
> > > machines.
> > >
> >
> > Calculating latencies are beyond scope of generic approach of nodes
> > ordering.
> > It's just of one of possible NodeComparator implementations.
> > Let's not bother this it right now.
> >
> >
> > >
> > > --Yakov
> > >
> >
> >
> >
> > --
> >
> > Best regards,
> > Alexei Scherbakov
> >
>


Re: Sort nodes in the ring in order to minimize the number of reconnections

2016-12-26 Thread Александр Меньшиков
> Can you please explain why this is better than arc approach?

We had a misunderstanding. Everything okay with arc approach. But we must
choose how nodes will determine "ARC_ID", and i think it can be calculated
from latency values. If users will be able to set "ARC_ID" in config file
then they can set different 'ARC_ID' on all nodes, and, in fact, point
exact place in the ring, what we would like to avoid.

2016-12-26 15:36 GMT+03:00 Vyacheslav Daradur :

> >>
> Vyacheslav, I agree that latency increase in the way you describe, but I
> still don't understand how we use this information in discovery. Latency
> may differ from time to time depending on many factors. I still think that
> arc approach is more intuitive for user and easier to implement.
> >>
>
> Way of latency increase is just a main idea.
>
> I suggest to connect new node on some priority.
> General approach:
> --
> if [ there are same host node ] then [ connect with it ]
> else if [ there are same subnet nodes] then [ connect with one of them ]
>  // how to choose node from a set of subnet? - choose with min latency each
> other
> else [ connect to remote nodes ] // how to choose node from a set of
> remotes? - choose with min latency each other
> --
> Maybe we can describe another intermediate steps.
>
>
> 2016-12-26 15:08 GMT+03:00 Yakov Zhdanov :
>
> > >>
> > I just want to understand which benefits we get when implement what we're
> > talking about. If major benefit is reduced latency of ring messages, then
> > the assignment 'ARC ID' in accordance with latency value is quite
> > enough. But if there are any hidden problems because of the large number
> of
> > reconnection (like I described in first message in this discussion), then
> > better to find a way to determine real physical location.
> > >>
> >
> > I suggest to solve ring building up and reducing number of reconnects
> > separately. If we have AxB-C-D-A then A will try to reconnect to B, then
> to
> > C, then to D. This is how discovery works now. I agree this should be
> fixed
> > and I have couple ideas on how we can do it but let's separate these
> ones.
> >
> > >>
> > Okey, then i think Vyacheslav's idea (using latency values) is quite
> enough
> > when we can't determine real physical location.
> > >>
> >
> > Can you please explain why this is better than arc approach?
> >
> > --Yakov
> >
>


Re: Sort nodes in the ring in order to minimize the number of reconnections

2016-12-26 Thread Александр Меньшиков
> I am afraid I did not understand this at all. Please elaborate.

I just want to understand which benefits we get when implement what we're
talking about. If major benefit is reduced latency of ring messages, then
the assignment 'ARC ID' in accordance with latency value is quite
enough. But if there are any hidden problems because of the large number of
reconnection (like I described in first message in this discussion), then
better to find a way to determine real physical location.

> And, yes, proper built ring should reduce latency of ring messages IMO.

Okey, then i think Vyacheslav's idea (using latency values) is quite enough
when we can't determine real physical location.

2016-12-26 13:03 GMT+03:00 Yakov Zhdanov :

> >Then, as I understand it, a lot of reconnection in the ring cannot create
> even temporary but major problems for performance. And in general this
> optimization will change practically nothing. Or am I missing some things?
>
> I am afraid I did not understand this at all. Please elaborate.
>
> I did not suggest any reconnections or ring rebuild. All I suggest is to
> control over ring building process with arcs. And, yes, proper built ring
> should reduce latency of ring messages IMO.
>
> --Yakov
>


IGNITE-4487 - NPE on query execution

2016-12-26 Thread Александр Меньшиков
Hello everyone.

I want to pick up *https://issues.apache.org/jira/browse/IGNITE-4487
* as my
first issue.

Please add me as contributor.

I already found that: in inner class
'GridCacheQueryAdapter.ScanQueryFallbackClosableIterator' in constructor is
called with method 'init()', but method 'init()' cannot be called with an
empty field 'nodes'. In source code it looks like:

private ScanQueryFallbackClosableIterator(int part, GridCacheQueryAdapter
qry,
GridCacheQueryManager qryMgr, GridCacheContext cctx) {
this.qry = qry;
this.qryMgr = qryMgr;
this.cctx = cctx;
this.part = part;

nodes = fallbacks(cctx.discovery().topologyVersionEx());
// !!! Here nodes.isEmpty()==true, and init() will fail in the
future. !!!
init();
}

I can fix it by adding some check in code, but i must know what behavior
are best in this case? As I understand it, the list of nodes is empty if
there are no nodes with the current partition, which means data loss, and
either need to return a meaningful exception, or ignore this situation. But
maybe I missed something.


Re: Sort nodes in the ring in order to minimize the number of reconnections

2016-12-26 Thread Александр Меньшиков
Thank you, Denis, for your explanation. Then, as I understand it, a lot of
reconnection in the ring cannot create even temporary but major problems
for performance. And in general this optimization will change practically
nothing. Or am I missing some things?

2016-12-26 12:20 GMT+03:00 Yakov Zhdanov :

> >>
> For example, ordering on latency:
> - nodes on one host = 1
> - nodes in one rack-blade = 2
> - nodes in one server-rack = 3
> - nodes in one physical cluster = 4
> - nodes in one subnet = 5
> - etc.
>
> Maybe it'll be better to use some metrics from ClusterMetrics interface.
>
> The algorithm of ordering can be implemented in a class such as Comparator
> and use it when we build a cluster or we select a place for a new node.
> >>
>
> Vyacheslav, please elaborate on how we can determine whether we are on the
> same rack. I am not sure this is possible in general case. Please see my
> suggestions below.
>
> >>
> However, here is the concern I have. Currently when a new node joins,
> coordinator assigns order number to this node (e.g. if we already have
> nodes 1,2 and 3, new node will have order 4). This node will then be the
> last one on the ring, i.e. nodes are always ordered in the ring by this
> order number (1->2->3->4->1). If we change this, we will basically allow a
> node to be placed anywhere else (smth like 1->2->4->3->1). I'm not 100%
> sure if this is going to cause issues, but sounds dangerous.
>
> Yakov, can you please chime in and share your thoughts on this?
> >>
>
> I don't think this may cause issues. Nodes ordering and placement is
> implemented in TcpDiscoveryNodesRing and I think that we will just need to
> alter org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing#
> nextNode(java.util.Collection discovery.tcp.internal.TcpDiscoveryNode>)
> logic.
>
> As far as design of this, I would suggest the following.
>
> 1.  User should have an ability to define ARC_ID for the node. I suggest
> "arc" for this since we are using "ring" concept. This will be the most
> honored characteristic for nodes placement. By default arc_id is 0 and
> possible to set with system property IGNITE_DISCO_ARC_ID or env variable or
> via TcpDiscoverySpi.setArcId() - new method.
> So, if I have nodes A, D, G with arc_id set to 1 and B, Z with arc_id set
> to 5 then ring should be built as follows: A->D->G->B->Z->A. Here arcs can
> represent different racks or data centers.
>
> I am strongly against giving user an opportunity to point exact place in
> the ring with somewhat like this interface [int getIdex(Node newNode,
> List currentRing)]. This is very error prone and may require tricky
> consistency checks just to make sure that implementation of this interface
> is consistent along the topology.
> With "arcs" approach user can automatically assign proper ids basing on
> physical network topology and network routes.
>
> 2. Subnet - 2nd honored parameter. Nodes on the same subnet should be
> placed side by side in the same arc.
>
> 3. Physical host - 3rd honored parameter. Nodes on the same physical host
> should be placed together automatically in the same arc.
>
> 4. New mode involving points 1-3 should become default and we should also
> provide ability to switch to current mode which should become legacy.
>
> --Yakov
>


Re: Sort nodes in the ring in order to minimize the number of reconnections

2016-12-23 Thread Александр Меньшиков
I in fact worried about the following situation:

Like i said we have ring A->F->B->E->C->D->A, and connection between A,B,C
and D,E,F was been broken. But nodes will detect the fact of the
unavailability of nodes not at the same time. And meanwhile the client will
perform transactional operations. Transactions may rollback many times in
the following sequence of events:

0. Everything is fine: A->F->B->E->C->D->A.
1. Connection between A,B,C and D,E,F is broken.
2. "A" sees "F" falls out of topology and reconnect to "B", all
transactions using the "F" are rolled back and begin with backup node ("B",
for example).
3. After that "B" sees "E" falls out of topology and reconnect to "C", all
transaction using "E" are rolled back and begin with backup node ("C", for
example).
4. After that "C" sees "D" falls out of topology and reconnect to "A", all
transaction using "D" are rolled back and begin with backup node ("A", for
example).

And we get 3 different set of rollbacks, instead one set of rollbacks.

2016-12-23 22:43 GMT+03:00 Valentin Kulichenko <
valentin.kuliche...@gmail.com>:

> Hi Vyacheslav,
>
> Discovery logic is incapsulated in TcpDiscoverySpi.
> TcpDiscoveryMulticastIpFinder in one of many implementations of IP finder.
> The only purpose of the IP finder is to provide list of addresses where a
> node can send initial join request, and the fact that it sends this initial
> request to node A doesn't actually mean that it will be connected to A
> within a ring. Having said that, I doubt that IP finder will be somehow
> affected in case the discussed change is implemented.
>
> Discovery protocol already maintains consistent information about the ring,
> so any node in topology already knows everything about other nodes,
> including ordering in the ring. So on discovery level it should not be very
> difficult to customize where a joining node is placed on the ring.
>
> However, here is the concern I have. Currently when a new node joins,
> coordinator assigns order number to this node (e.g. if we already have
> nodes 1,2 and 3, new node will have order 4). This node will then be the
> last one on the ring, i.e. nodes are always ordered in the ring by this
> order number (1->2->3->4->1). If we change this, we will basically allow a
> node to be placed anywhere else (smth like 1->2->4->3->1). I'm not 100%
> sure if this is going to cause issues, but sounds dangerous.
>
> Yakov, can you please chime in and share your thoughts on this?
>
> -Val
>
> On Fri, Dec 23, 2016 at 2:46 AM, Vyacheslav Daradur <daradu...@gmail.com>
> wrote:
>
> > Thanks for reply.
> >
> > I have some questions:
> >
> > 1. Where the logic of Ignite cluster building is realized? DiscoverySpi
> and
> > TcpDiscoveryMulticastIpFinder?
> >
> > 2. Which standart Ignite metrics you can recommend to use for
> > node-ordering?
> >
> > 2016-12-22 19:08 GMT+03:00 Dmitriy Setrakyan <dsetrak...@apache.org>:
> >
> > > I think having some user-defined ordering can be beneficial. However,
> we
> > > are only talking about node discovery protocol here to maintain the
> > > cluster. All other communication between nodes happens directly (does
> not
> > > go through the ring).
> > >
> > > D.
> > >
> > > On Thu, Dec 22, 2016 at 6:32 AM, Vyacheslav Daradur <
> daradu...@gmail.com
> > >
> > > wrote:
> > >
> > > > Hello, Alex!
> > > >
> > > > I think it is a great idea.
> > > >
> > > > I suggest to build communications between nodes on weight (or
> > priority).
> > > >
> > > > For example, ordering on latency:
> > > > - nodes on one host = 1
> > > > - nodes in one rack-blade = 2
> > > > - nodes in one server-rack = 3
> > > > - nodes in one physical cluster = 4
> > > > - nodes in one subnet = 5
> > > > - etc.
> > > >
> > > > Maybe it'll be better to use some metrics from ClusterMetrics
> > interface.
> > > >
> > > > The algorithm of ordering can be implemented in a class such as
> > > Comparator
> > > > and use it when we build a cluster or we select a place for a new
> node.
> > > >
> > > > --
> > > > With best regards,
> > > > Vyacheslav Daradur
> > > >
> > > > 2016-12-22 13:59 GMT+03:00 Александр Меньшиков <sharple...@gmail.com
>

Re: [jira] [Created] (IGNITE-4487) NPE on query execution

2016-12-23 Thread Александр Меньшиков
I found that: in inner class
'GridCacheQueryAdapter.ScanQueryFallbackClosableIterator' in constructor is
called with method 'init()', but method 'init()' cannot be called with an
empty field 'nodes'. In source code it looks like:


private ScanQueryFallbackClosableIterator(int part, GridCacheQueryAdapter
qry,
GridCacheQueryManager qryMgr, GridCacheContext cctx) {
this.qry = qry;
this.qryMgr = qryMgr;
this.cctx = cctx;
this.part = part;

nodes = fallbacks(cctx.discovery().topologyVersionEx());
// !!! Here nodes.isEmpty()==true, and init() will fail in the
future. !!!
init();
}


I can fix it by adding some check in code, but i must know what behavior
are best in this case? As I understand it, the list of nodes is empty if
there are no nodes with the current partition, which means data loss, and
either need to return a meaningful exception, or ignore this situation. But
maybe I missed something.


2016-12-23 14:47 GMT+03:00 Dmitry Karachentsev (JIRA) :

> Dmitry Karachentsev created IGNITE-4487:
> ---
>
>  Summary: NPE on query execution
>  Key: IGNITE-4487
>  URL: https://issues.apache.org/jira/browse/IGNITE-4487
>  Project: Ignite
>   Issue Type: Bug
>   Components: cache
> Affects Versions: 1.8
> Reporter: Dmitry Karachentsev
>  Fix For: 2.0
>
>
> NPE may be thrown when called destroyCache() and started querying.
>
> Attached example reproduces this case when 
> GridDiscoveryManager#removeCacheFilter
> called but cache state haven't been changed to STOPPED
> org.apache.ignite.internal.processors.cache.GridCacheGateway#onStopped.
> {code:none}
> javax.cache.CacheException: class org.apache.ignite.IgniteCheckedException:
> null
> at org.apache.ignite.internal.processors.cache.
> IgniteCacheProxy.query(IgniteCacheProxy.java:740)
> at com.intellica.evam.engine.event.future.FutureEventWorker.
> processFutureEvents(FutureEventWorker.java:117)
> at com.intellica.evam.engine.event.future.FutureEventWorker.run(
> FutureEventWorker.java:66)
> Caused by: class org.apache.ignite.IgniteCheckedException: null
> at org.apache.ignite.internal.processors.query.GridQueryProcessor.
> executeQuery(GridQueryProcessor.java:1693)
> at org.apache.ignite.internal.processors.cache.
> IgniteCacheProxy.query(IgniteCacheProxy.java:494)
> at org.apache.ignite.internal.processors.cache.
> IgniteCacheProxy.query(IgniteCacheProxy.java:732)
> ... 2 more
> Caused by: java.lang.NullPointerException
> at org.apache.ignite.internal.processors.cache.query.
> GridCacheQueryAdapter$ScanQueryFallbackClosableIterator.init(
> GridCacheQueryAdapter.java:712)
> at org.apache.ignite.internal.processors.cache.query.
> GridCacheQueryAdapter$ScanQueryFallbackClosableIterator.(
> GridCacheQueryAdapter.java:677)
> at org.apache.ignite.internal.processors.cache.query.
> GridCacheQueryAdapter$ScanQueryFallbackClosableIterator.(
> GridCacheQueryAdapter.java:628)
> at org.apache.ignite.internal.processors.cache.query.
> GridCacheQueryAdapter.executeScanQuery(GridCacheQueryAdapter.java:548)
> at org.apache.ignite.internal.processors.cache.
> IgniteCacheProxy$2.applyx(IgniteCacheProxy.java:497)
> at org.apache.ignite.internal.processors.cache.
> IgniteCacheProxy$2.applyx(IgniteCacheProxy.java:495)
> at org.apache.ignite.internal.util.lang.IgniteOutClosureX.
> apply(IgniteOutClosureX.java:36)
> at org.apache.ignite.internal.processors.query.GridQueryProcessor.
> executeQuery(GridQueryProcessor.java:1670)
> ... 4 more
> {code}
>
>
>
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>


Sort nodes in the ring in order to minimize the number of reconnections

2016-12-22 Thread Александр Меньшиков
Hello everyone,

As far as I know nodes are connected in a ring. For example if i have 6
nodes, with names A, B, C, D, E, and F they can connect in ring any
possible way: A-B-C-D-E-F-A, or A-F-B-E-C-D-A, etc. And if some node falls
out of topology neighboring nodes must reconnect. If nodes A,B and C
located in the same physical location, and D, E and F in another, and in
some time one physical location is not available in another, we can get
different number of reconnections. Best case scenario if we have ring like
A-B-CxD-E-FxA ('x' mean disconnect) -- then we get only one reconnect (C
reconnect to A or F reconnect to D -- depending on what part of the cluster
we leave alive). But now possible that case AxFxBxExCxDxA -- then we get a
lot of reconnections (A to B, B to C, C to A -- in general n/2
reconnections, where n -- number of nodes). And i think to add something to
ensure that we always have good sorting of nodes connections
(A-B-C-...-Z-A).

Of course in real world we can have multiple levels of physical closeness.

In my opinion enough to add one parameter of 'int' to configuration (with
name like 'ExtraNodeOrder') and to change the method of comparison nodes so
that it first compared the 'ExtraNodeOrder', and then according to the old
criterion (as far as I know Ignite use topology version). So if some users
have multiple levels of physical closeness, he can use different bits. For
example use 16 high bits for DC number, and low 16 bits for racks.

Alternatively, we can add array of ‘int’ to configuration and compare nodes
in sequence from the zero element to the last.


Re: Hello

2016-12-21 Thread Александр Меньшиков
Denis,

Thanks for your reply.

My question is not about data locality. As far as I know nodes are
connected in a ring. For example if i have 6 nodes, with names A, B, C, D,
E, and F they can connect in ring any possible way: A-B-C-D-E-F-A, or
A-F-B-E-C-D-A, etc. And if some node falls out of topology neighboring
nodes must reconnect. If nodes A,B and C located in the same physical
location, and D, E and F in another, and in some time one physical location
is not available in another, we can get different number of reconnections.
Best case scenario if we have ring like A-B-CxD-E-FxA ('x' mean disconnect)
-- then we get only one reconnect (C reconnect to A or F reconnect to D --
depending on what part of the cluster we leave alive). But now possible
that case AxFxBxExCxDxA -- then we get a lot of reconnections (A to B, B to
C, C to A -- in general n/2 reconnections, where n -- number of nodes). And
i think to add something to ensure that we always have good sorting of
nodes connections (A-B-C-...-Z-A).

2016-12-20 23:05 GMT+03:00 Denis Magda <dma...@apache.org>:

> Alexander,
>
> CC-ing the dev list to keep everyone in track.
>
> I hope the case you’re talking about can be already implemented by the
> usage of RendezvousAffinityFunction.setAffinityBackupFilter(…) method
> [1]. This method gives you a way to control a locality of your primary and
> backup nodes. Depending on the filter implementation specific nodes might
> end up in one physical location while the rest in the others.
>
> [1] https://ignite.apache.org/releases/1.7.0/javadoc/org/
> apache/ignite/cache/affinity/rendezvous/RendezvousAffinityFunction.html
>
> —
> Denis
>
> On Dec 20, 2016, at 7:02 AM, Александр Меньшиков <sharple...@gmail.com>
> wrote:
>
> I think in near time i will take some "newbie" tickets, but for future
> want to ask about ring topology in Ignite. In case where we have big
> physically stretched claster ring can be not optimal. For example: Node-1
> in 1-physical location, Node-2 in  2-physical location,  Node-3 in
> 1-physical location,  Node-4 in  2-physical location and ect. So if one
> physical location to fall out of the cluster, we get more breaks in ring
> than the theoretical minimum. I want to add an additional order in the
> process of constructing a ring so that ensure physically close nodes were
> close in topology.
>
> 2016-11-24 0:01 GMT+03:00 Denis Magda <dma...@apache.org>:
>
>> Here are some ticket you may want to start with
>> https://ignite.apache.org/community/contribute.html#pick-ticket
>>
>> —
>> Denis
>>
>> On Nov 23, 2016, at 5:33 AM, Dmitriy Setrakyan <dsetrak...@apache.org>
>> wrote:
>>
>> Hi Alex,
>>
>> Welcome to the Ignite community!
>>
>> Once you are familiar with the contribution process, please see if there
>> is
>> a specific Jira ticket you would like work on. We will help you get
>> started.
>>
>> D.
>>
>> On Wed, Nov 23, 2016 at 2:24 AM, Vladislav Pyatkov <vpyat...@gridgain.com
>> >
>> wrote:
>>
>> Please read https://ignite.apache.org/community/contribute.html#contribu
>> te
>>
>> Looking forward for your contributions!
>>
>> On Wed, Nov 23, 2016 at 1:15 PM, Александр Меньшиков <
>> sharple...@gmail.com
>>
>>
>> wrote:
>>
>> I want to contribute.
>>
>> 2016-11-23 13:04 GMT+03:00 Vladislav Pyatkov <vpyat...@gridgain.com>:
>>
>> Hi Alex,
>>
>> This is Apache Ignite development community.
>> Are you want to contribute to Ignite or you are have any suggestion?
>>
>> On Wed, Nov 23, 2016 at 12:41 PM, Александр Меньшиков <
>> sharple...@gmail.com>
>> wrote:
>>
>> Hello
>>
>>
>>
>>
>>
>>
>
>


Re: Hello

2016-11-23 Thread Александр Меньшиков
I want to contribute.

2016-11-23 13:04 GMT+03:00 Vladislav Pyatkov <vpyat...@gridgain.com>:

> Hi Alex,
>
> This is Apache Ignite development community.
> Are you want to contribute to Ignite or you are have any suggestion?
>
> On Wed, Nov 23, 2016 at 12:41 PM, Александр Меньшиков <
> sharple...@gmail.com>
> wrote:
>
> > Hello
> >
>