Re: Re[2]: [VOTE] Apache Ignite PMC Chair

2019-10-31 Thread Ilya Lantukh
+1 Alexey Goncharuk

On Thu, Oct 31, 2019 at 12:53 PM Sergey Antonov 
wrote:

> +1 Dmitry Pavlov
>
> Thu, Oct 31, 2019 at 13:59, Вячеслав Коптилин :
>
> > +1 Dmitry Pavlov
> >
> > Thanks,
> > S.
> >
> > Thu, Oct 31, 2019 at 12:31, Алексей Платонов :
> >
> > > +1 for Dmitry Pavlov
> > >
> > > Thu, Oct 31, 2019 at 01:23, Valentin Kulichenko <
> > > valentin.kuliche...@gmail.com>:
> > >
> > > > +1 Dmitry Pavlov (binding)
> > > >
> > > > On Wed, Oct 30, 2019 at 12:55 PM Alex Plehanov <
> > plehanov.a...@gmail.com>
> > > > wrote:
> > > >
> > > > > + 1 Dmitry Pavlov
> > > > >
> > > > > Wed, Oct 30, 2019 at 20:50, Pavel Kovalenko :
> > > > >
> > > > > > +1 for Dmitry Pavlov
> > > > > >
> > > > > > Wed, Oct 30, 2019 at 18:46, Alexei Scherbakov <
> > > > > > alexey.scherbak...@gmail.com
> > > > > > >:
> > > > > >
> > > > > > > +1 for Dmitry Pavlov
> > > > > > >
> > > > > > > Wed, Oct 30, 2019 at 18:22, aealexsandrov <
> > > aealexsand...@gmail.com
> > > > >:
> > > > > > >
> > > > > > > > +1 Alexey Goncharuk
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Sent from:
> > > http://apache-ignite-developers.2346864.n4.nabble.com/
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > Best regards,
> > > > > > > Alexei Scherbakov
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
> --
> BR, Sergey Antonov
>


Re: DataRegionConfiguration is a FINAL class but could we make it not final?

2019-03-21 Thread Ilya Lantukh
Hi,

I couldn't find any strong reason why this class needs to be final. So, if
that is a restriction for you, feel free to create a ticket and submit a
pull request.
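
If it helps, here is a rough sketch of the subclass described below, assuming
DataRegionConfiguration is made non-final (the class name is made up; DataSize
comes from spring-core 5.1+):

{code}
// Rough sketch only, assuming DataRegionConfiguration is made non-final.
// SpringDataRegionConfiguration is a made-up name; DataSize is Spring's data-size type.
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.springframework.util.unit.DataSize;

public class SpringDataRegionConfiguration extends DataRegionConfiguration {
    /** Accepts values like "1GB" bound by Spring Boot's properties conversion. */
    public SpringDataRegionConfiguration setMaxSize(DataSize maxSize) {
        super.setMaxSize(maxSize.toBytes());

        return this;
    }

    /** Same conversion for the initial size. */
    public SpringDataRegionConfiguration setInitialSize(DataSize initialSize) {
        super.setInitialSize(initialSize.toBytes());

        return this;
    }
}
{code}

With that in place, Spring Boot's properties conversion could bind a value like
"1GB" straight to the setter.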

On Fri, Mar 15, 2019 at 11:47 PM hlopez  wrote:

> Is there a very good reason why the DataRegionConfiguration is a *FINAL*
> class?
>
> I would like to be able to extend the DataRegionConfiguration.
>
> In my extended class I would like to add a new method for the
> *setMaxSize()*
> and *setInitialSize()* for example where the input is of the type
> *org.springframework.util.unit.DataSize* instead of *long*.
>
> This way I can do *super.setMaxSize(maxSize.toBytes())* which will convert
> a
> property set in my application.properties in a form like *"1GB"* to the
> corresponding bytes by Spring Boot automatically.
>
> see [1]  Properties Conversion
> <
> https://docs.spring.io/spring-boot/docs/current/reference/html/boot-features-external-config.html#boot-features-external-config-conversion-datasize>
>
>
> [1]
>
> https://docs.spring.io/spring-boot/docs/current/reference/html/boot-features-external-config.html#boot-features-external-config-conversion-datasize
>
> The only option I have now is to Wrap the class and delegate the methods,
> but that makes me create two beans instead of one for each
> DataRegionConfiguration.
>
>
>
>
>
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
>


Re: New committer: Ilya Lantukh

2019-02-15 Thread Ilya Lantukh
Thanks!

On Fri, Feb 15, 2019 at 3:03 PM Andrey Mashenkov 
wrote:

> Congrats, Ilya.
>
> On Fri, Feb 15, 2019 at 2:33 PM Anton Vinogradov  wrote:
>
> > Congrats!
> >
> > On Fri, Feb 15, 2019 at 2:23 PM Павлухин Иван 
> wrote:
> >
> > > Ilya,
> > >
> > > My congratulations! Hope to hear from you about hair-splitting in
> > > context of affinity and topology.
> > >
> > > > Fri, Feb 15, 2019 at 13:44, Dmitriy Pavlov :
> > > >
> > > > Congrats, Ilya. I'm glad that it eventually happened.
> > > >
> > > > Fri, Feb 15, 2019 at 11:04, Alexey Goncharuk <
> agoncha...@apache.org
> > >:
> > > >
> > > > > Dear Ignite Developers,
> > > > >
> > > > >
> > > > >
> > > > > The Project Management Committee (PMC) for Apache Ignite has
> invited
> > > Ilya
> > > > > Lantukh to become a committer and we are pleased to announce that
> he
> > > has
> > > > > accepted.
> > > > >
> > > > >
> > > > >
> > > > > Ilya has a long history contributing to Apache Ignite, including a
> > > > > lot of complex changes that allowed us to enable and support native
> > > > > persistence.
> > > > >
> > > > > Being a committer enables easier contribution to the project since
> > > there is
> > > > > no need to go via the patch submission process. This should enable
> > > better
> > > > > productivity.
> > > > >
> > > > >
> > > > >
> > > > > Please join me in welcoming Ilya, and congratulating him on the new
> > > role in
> > > > > the Apache Ignite Community.
> > > > >
> > > > > Best Regards,
> > > > >
> > > > > Alexey Goncharuk
> > > > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > > Ivan Pavlukhin
> > >
> >
>
>
> --
> Best regards,
> Andrey V. Mashenkov
>


Re: Default failure handler was changed for tests

2019-01-11 Thread Ilya Lantukh
Dmitry,

It doesn't make sense to run that test now because the root cause of its
failure has been fixed.

You should verify that getting an unhandled exception in any system thread
leads to the failure of the currently running test.
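
For reference, a minimal sketch of what such a check could look like
(hypothetical class, not the actual TestFailingFailureHandler wiring): the
handler only records the first critical failure, and the test harness fails
the currently running test afterwards if anything was recorded.

{code}
// Minimal sketch: record the first critical failure so the test harness can fail the test.
import java.util.concurrent.atomic.AtomicReference;
import org.apache.ignite.Ignite;
import org.apache.ignite.failure.FailureContext;
import org.apache.ignite.failure.FailureHandler;

public class RecordingFailureHandler implements FailureHandler {
    /** First critical failure observed in any system thread. */
    private final AtomicReference<FailureContext> failure = new AtomicReference<>();

    /** {@inheritDoc} */
    @Override public boolean onFailure(Ignite ignite, FailureContext failureCtx) {
        failure.compareAndSet(null, failureCtx);

        return false; // Don't invalidate the node; let the test decide what to do.
    }

    /** Throws if a critical failure was recorded, failing the calling test. */
    public void checkNoFailure() {
        FailureContext ctx = failure.get();

        if (ctx != null)
            throw new AssertionError("Critical failure in system thread: " + ctx, ctx.error());
    }
}
{code}

Calling checkNoFailure() from a place like GridAbstractTest's afterTest() hook
would turn the "Critical system error detected" log line into a real test
failure.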

On Fri, Jan 11, 2019 at 12:16 PM Dmitrii Ryabov 
wrote:

> Ilya, can you check your test on current implementation [1]?
>
> [1] https://github.com/apache/ignite/pull/5662
>
> On Dec 10, 2018 at 17:10, "Dmitriy Pavlov" 
> wrote:
>
> Reverted.
>
> https://issues.apache.org/jira/browse/IGNITE-8227 reopened
>
> Mon, Dec 10, 2018 at 16:23, Dmitriy Pavlov :
>
>
> > Anton, I was expecting that you revert, because you wanted to do it.
> >
> > Provided that I agree that fix could be reverted because of both
> > functional and style possible improvements, does not mean I believe it is
> > the only option and it should be reverted.
> >
> > Even if I agree to revert doesn't mean all community agrees, so reverting
> > just 1 minute after writing to dev list would be strange. I believe we
> > should be courteous enough to give a couple of days for people to come
> and
> > give feedback.
> >
> > So if you have a spare minute, please go ahead. If not, I can do it
> later.
> >
> > Mon, Dec 10, 2018 at 14:23, Anton Vinogradov :
> >
> >> Dmitriy,
> >>
> >> You confirmed that the fix should be reverted and reworked last Friday.
> >> Why is it still not reverted?
> >>
> >> On Mon, Dec 10, 2018 at 12:46 AM Dmitrii Ryabov 
> >> wrote:
> >>
> >> > Agree, it is reasonable to revert.
> >> > Fri, Dec 7, 2018 at 18:44, Dmitriy Pavlov :
> >> > >
> >> > > Hi Ilya,
> >> > >
> >> > > thank you for noticing.
> >> > >
> >> > > Calling to fail is equal to re-throw,
> >> > >
> >> > > throw new AssertionFailedError(message);
> >> > >
> >> > > So, yes, for now it is absolutely valid reason to revert and rework
> >> fix
> >> > >
> >> > > - as Nikolay suggested to reduce method override occurrences.
> >> > > - and with transferring this exception into GridAbstractTest and
> >> > > correctly failing test.
> >> > >
> >> > > Sincerely,
> >> > > Dmitriy Pavlov
> >> > >
> >> > >
> >> > > Fri, Dec 7, 2018 at 18:38, Ilya Lantukh :
> >> > >
> >> > > > Unfortunately, this FailureHandler doesn't seem to work. I wrote a
> >> test
> >> > > > that reproduces a bug and should fail. It prints the following
> text
> >> > into
> >> > > > log, but the test still passes "successfully":
> >> > > >
> >> > > > [2018-12-07
> >> > > >
> >> > > >
> >> >
> >>
> 18:28:23,800][ERROR][sys-stripe-1-#345%recovery.GridPointInTimeRecoveryCacheNoAffinityExchangeTest1%][IgniteTestResources]
> >> > > > Critical system error detected. Will be handled accordingly to
> >> > configured
> >> > > > handler [hnd=TestFailingFailureHandler [],
> failureCtx=FailureContext
> >> > > > [type=CRITICAL_ERROR, err=java.lang.IllegalStateException: Unable
> to
> >> > find
> >> > > > consistentId by UUID [nodeId=80dd2ec6-1913-4a5c-a839-630315c3,
> >> > > > topVer=AffinityTopologyVersion [topVer=12, minorTopVer=0
> >> > > > java.lang.IllegalStateException: Unable to find consistentId by
> UUID
> >> > > > [nodeId=80dd2ec6-1913-4a5c-a839-630315c3,
> >> > > > topVer=AffinityTopologyVersion [topVer=12, minorTopVer=0]]
> >> > > > at
> >> > > >
> >> > > >
> >> >
> >>
> org.apache.ignite.internal.managers.discovery.ConsistentIdMapper.mapToCompactId(ConsistentIdMapper.java:62)
> >> > > > at
> >> > > >
> >> > > >
> >> >
> >>
> org.apache.ignite.internal.managers.discovery.ConsistentIdMapper.mapToCompactIds(ConsistentIdMapper.java:123)
> >> > > > at
> >> > > >
> >> > > >
> >> >
> >>
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager.newTxRecord(IgniteTxManager.java:2

Re: Default failure handler was changed for tests

2018-12-07 Thread Ilya Lantukh
Unfortunately, this FailureHandler doesn't seem to work. I wrote a test
that reproduces a bug and should fail. It prints the following text into the
log, but the test still passes "successfully":

[2018-12-07
18:28:23,800][ERROR][sys-stripe-1-#345%recovery.GridPointInTimeRecoveryCacheNoAffinityExchangeTest1%][IgniteTestResources]
Critical system error detected. Will be handled accordingly to configured
handler [hnd=TestFailingFailureHandler [], failureCtx=FailureContext
[type=CRITICAL_ERROR, err=java.lang.IllegalStateException: Unable to find
consistentId by UUID [nodeId=80dd2ec6-1913-4a5c-a839-630315c3,
topVer=AffinityTopologyVersion [topVer=12, minorTopVer=0
java.lang.IllegalStateException: Unable to find consistentId by UUID
[nodeId=80dd2ec6-1913-4a5c-a839-630315c3,
topVer=AffinityTopologyVersion [topVer=12, minorTopVer=0]]
at
org.apache.ignite.internal.managers.discovery.ConsistentIdMapper.mapToCompactId(ConsistentIdMapper.java:62)
at
org.apache.ignite.internal.managers.discovery.ConsistentIdMapper.mapToCompactIds(ConsistentIdMapper.java:123)
at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager.newTxRecord(IgniteTxManager.java:2507)
at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager.logTxRecord(IgniteTxManager.java:2483)
at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.state(IgniteTxAdapter.java:1226)
at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.state(IgniteTxAdapter.java:1054)
at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.startRemoteTx(IgniteTxHandler.java:1836)
at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processDhtTxPrepareRequest(IgniteTxHandler.java:1180)
at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$400(IgniteTxHandler.java:118)
at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$5.apply(IgniteTxHandler.java:222)
at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$5.apply(IgniteTxHandler.java:220)
at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1059)
at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:584)
at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:383)
at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:309)
at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:100)
at
org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:299)
at
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1568)
at
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1196)
at
org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:127)
at
org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1092)
at
org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:505)
at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:748)


On Thu, Dec 6, 2018 at 4:01 PM Anton Vinogradov  wrote:

> >> We stop, for now, then you will chill a
> >> little bit, then you will have an absolutely fantastic weekend, and then
> on
> >> Monday, Dec 10 we will continue this discussion in a positive and
> >> constructive manner.
> Agree
>
> On Thu, Dec 6, 2018 at 3:55 PM Nikolay Izhikov 
> wrote:
>
> > Anton.
> >
> > I discussed this fix privately with Dmitriy Pavlov.
> >
> > 1. We had NoOpHandler for ALL tests before this merge.
> > 2. Dmitry Ryabov will remove all copypasted code soon.
> >
> > So, this fix makes things better.
> >
> > I think we shouldn't revert it.
> >
> > I think we should continue work to turn off NoOpHandler in all tests.
> >
> > Dmitriy Pavlov, can you do it, as a committer of this patch?
> >
> > On 12/6/18 3:02 PM, Anton Vinogradov wrote:
> > >>> I still hope Anton will do the first bunch of tests research to
> > > demonstrate
> > >>> the idea.
> > >
> > > Dmitriy,
> > > Just want to remind you that we already spend time here because of
> > > unacceptable code merge situation.
> > > Such merges should NEVER happen again.
> > > Please, next time make sure that code you merge has no massive
> > duplication
> > > and fixes without proper reason investigation.
> > > Committer always MUST be ready to explain each symbol inside code he
> > merged.
> > > The situation when you have no clue why it is written this way is
> > > unacceptable.
> > >
> > > Feel free to start a discussion at private in case you have some
> > objections.
> > > 

[jira] [Created] (IGNITE-10392) Client breaks the cluster it tries to connect to; server nodes are dropped by the handler

2018-11-23 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-10392:
-

 Summary: Client breaks the cluster it tries to connect to; server nodes 
are dropped by the handler
 Key: IGNITE-10392
 URL: https://issues.apache.org/jira/browse/IGNITE-10392
 Project: Ignite
  Issue Type: Bug
Reporter: Ilya Lantukh
Assignee: Ilya Lantukh


{noformat}
org.apache.ignite.IgniteException: Failed to resolve nodes topology 
[cacheGrp=N/A, topVer=AffinityTopologyVersion [topVer=133, minorTopVer=0], 
history=[AffinityTopologyVersion [topVer=35, minorTopVer=0], 
AffinityTopologyVersion [topVer=36, minorTopVer=0], AffinityTopologyVersion 
[topVer=37, minorTopVer=0], AffinityTopologyVersion [topVer=38, minorTopVer=0], 
AffinityTopologyVersion [topVer=39, minorTopVer=0], AffinityTopologyVersion 
[topVer=40, minorTopVer=0], AffinityTopologyVersion [topVer=41, minorTopVer=0], 
AffinityTopologyVersion [topVer=42, minorTopVer=0], AffinityTopologyVersion 
[topVer=43, minorTopVer=0], AffinityTopologyVersion [topVer=44, minorTopVer=0], 
AffinityTopologyVersion [topVer=45, minorTopVer=0], AffinityTopologyVersion 
[topVer=46, minorTopVer=0], AffinityTopologyVersion [topVer=47, minorTopVer=0], 
AffinityTopologyVersion [topVer=48, minorTopVer=0], AffinityTopologyVersion 
[topVer=49, minorTopVer=0], AffinityTopologyVersion [topVer=50, minorTopVer=0], 
AffinityTopologyVersion [topVer=51, minorTopVer=0], AffinityTopologyVersion 
[topVer=52, minorTopVer=0], AffinityTopologyVersion [topVer=53, minorTopVer=0], 
AffinityTopologyVersion [topVer=54, minorTopVer=0], AffinityTopologyVersion 
[topVer=55, minorTopVer=0], AffinityTopologyVersion [topVer=56, minorTopVer=0], 
AffinityTopologyVersion [topVer=57, minorTopVer=0], AffinityTopologyVersion 
[topVer=58, minorTopVer=0], AffinityTopologyVersion [topVer=59, minorTopVer=0], 
AffinityTopologyVersion [topVer=60, minorTopVer=0], AffinityTopologyVersion 
[topVer=61, minorTopVer=0], AffinityTopologyVersion [topVer=62, minorTopVer=0], 
AffinityTopologyVersion [topVer=63, minorTopVer=0], AffinityTopologyVersion 
[topVer=64, minorTopVer=0], AffinityTopologyVersion [topVer=65, minorTopVer=0], 
AffinityTopologyVersion [topVer=66, minorTopVer=0], AffinityTopologyVersion 
[topVer=67, minorTopVer=0], AffinityTopologyVersion [topVer=68, minorTopVer=0], 
AffinityTopologyVersion [topVer=69, minorTopVer=0], AffinityTopologyVersion 
[topVer=70, minorTopVer=0], AffinityTopologyVersion [topVer=71, minorTopVer=0], 
AffinityTopologyVersion [topVer=72, minorTopVer=0], AffinityTopologyVersion 
[topVer=73, minorTopVer=0], AffinityTopologyVersion [topVer=74, minorTopVer=0], 
AffinityTopologyVersion [topVer=75, minorTopVer=0], AffinityTopologyVersion 
[topVer=76, minorTopVer=0], AffinityTopologyVersion [topVer=77, minorTopVer=0], 
AffinityTopologyVersion [topVer=78, minorTopVer=0], AffinityTopologyVersion 
[topVer=79, minorTopVer=0], AffinityTopologyVersion [topVer=80, minorTopVer=0], 
AffinityTopologyVersion [topVer=81, minorTopVer=0], AffinityTopologyVersion 
[topVer=82, minorTopVer=0], AffinityTopologyVersion [topVer=83, minorTopVer=0], 
AffinityTopologyVersion [topVer=84, minorTopVer=0], AffinityTopologyVersion 
[topVer=85, minorTopVer=0], AffinityTopologyVersion [topVer=86, minorTopVer=0], 
AffinityTopologyVersion [topVer=87, minorTopVer=0], AffinityTopologyVersion 
[topVer=88, minorTopVer=0], AffinityTopologyVersion [topVer=89, minorTopVer=0], 
AffinityTopologyVersion [topVer=90, minorTopVer=0], AffinityTopologyVersion 
[topVer=91, minorTopVer=0], AffinityTopologyVersion [topVer=92, minorTopVer=0], 
AffinityTopologyVersion [topVer=93, minorTopVer=0], AffinityTopologyVersion 
[topVer=94, minorTopVer=0], AffinityTopologyVersion [topVer=95, minorTopVer=0], 
AffinityTopologyVersion [topVer=96, minorTopVer=0], AffinityTopologyVersion 
[topVer=97, minorTopVer=0], AffinityTopologyVersion [topVer=98, minorTopVer=0], 
AffinityTopologyVersion [topVer=99, minorTopVer=0], AffinityTopologyVersion 
[topVer=100, minorTopVer=0], AffinityTopologyVersion [topVer=101, 
minorTopVer=0], AffinityTopologyVersion [topVer=102, minorTopVer=0], 
AffinityTopologyVersion [topVer=103, minorTopVer=0], AffinityTopologyVersion 
[topVer=104, minorTopVer=0], AffinityTopologyVersion [topVer=105, 
minorTopVer=0], AffinityTopologyVersion [topVer=106, minorTopVer=0], 
AffinityTopologyVersion [topVer=107, minorTopVer=0], AffinityTopologyVersion 
[topVer=108, minorTopVer=0], AffinityTopologyVersion [topVer=109, 
minorTopVer=0], AffinityTopologyVersion [topVer=110, minorTopVer=0], 
AffinityTopologyVersion [topVer=111, minorTopVer=0], AffinityTopologyVersion 
[topVer=112, minorTopVer=0], AffinityTopologyVersion [topVer=113, 
minorTopVer=0], AffinityTopologyVersion [topVer=114, minorTopVer=0], 
AffinityTopologyVersion [topVer=115, minorTopVer=0], AffinityTopologyVersion 
[topVer=116, minorTopVer=0], AffinityTopologyVersion [topVer=117, 
minorTopVer=0], AffinityTopologyVersion [topVer=118

Re: [ANNOUNCE] Welcome Pavel Kovalenko as a new committer

2018-11-21 Thread Ilya Lantukh
Congratulations!

On Wed, Nov 21, 2018 at 12:30 PM Dmitriy Pavlov  wrote:

> Congrats, Pavel. Well deserved achievement.
>
> Wed, Nov 21, 2018, 12:09 Nikita Amelchev :
>
> > Congratulations, Pavel!
> > Wed, Nov 21, 2018 at 11:47, Vyacheslav Daradur :
> > >
> > > Congrats Pavel, I'm looking forward to future PME optimizations!
> > > On Wed, Nov 21, 2018 at 11:44 AM Alexey Goncharuk <
> agoncha...@apache.org>
> > wrote:
> > > >
> > > > The Apache Ignite Project Management Committee (PMC) has invited
> Pavel
> > > > Kovalenko to become a new committer and are happy to announce that he
> > has
> > > > accepted.
> > > >
> > > > Pavel was actively investigating and improving the speed of PME
> during
> > node
> > > > join/leave scenarios and achieved great progress in these tasks.
> > > >
> > > > Please join me in welcoming Pavel and congratulating him on his new
> > role in
> > > > the Apache Ignite Community.
> > > >
> > > > Good luck, Pavel!
> > > >
> > > > --AG
> > >
> > >
> > >
> > > --
> > > Best Regards, Vyacheslav D.
> >
> >
> >
> > --
> > Best wishes,
> > Amelchev Nikita
> >
>


-- 
Best regards,
Ilya


[jira] [Created] (IGNITE-10186) Revise ability to merge client-only exchanges

2018-11-08 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-10186:
-

 Summary: Revise ability to merge client-only exchanges
 Key: IGNITE-10186
 URL: https://issues.apache.org/jira/browse/IGNITE-10186
 Project: Ignite
  Issue Type: Improvement
Reporter: Ilya Lantukh


In IGNITE-9558 the ability to merge client exchanges was disabled because it 
could create complex technical problems. 

Now we need to decide whether to re-enable this functionality and fix all 
related issues, or to remove it completely, including the test 
CacheExchangeMergeTest.testMergeServerAndClientJoin.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-10046) MVCC: IgniteCachePrimaryNodeFailureRecoveryAbstractTest.testPessimisticPrimaryNodeFailureRollback hangs sometimes.

2018-10-29 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-10046:
-

 Summary: MVCC: 
IgniteCachePrimaryNodeFailureRecoveryAbstractTest.testPessimisticPrimaryNodeFailureRollback
 hangs sometimes.
 Key: IGNITE-10046
 URL: https://issues.apache.org/jira/browse/IGNITE-10046
 Project: Ignite
  Issue Type: Improvement
Reporter: Ilya Lantukh
 Fix For: 2.7
 Attachments: 
Ignite_Tests_2.4_Java_8_9_10_11_MVCC_Queries_1323.log(1).zip

The following exception can be found in the log before the hang:
{noformat}
[14:51:43]W: [org.apache.ignite:ignite-indexing] 
java.lang.NullPointerException
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.GridCacheMvccManager.future(GridCacheMvccManager.java:754)
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processPartitionCountersResponse(IgniteTxHandler.java:2204)
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$1100(IgniteTxHandler.java:120)
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$13.apply(IgniteTxHandler.java:276)
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$13.apply(IgniteTxHandler.java:274)
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1060)
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:585)
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:384)
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:310)
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:100)
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:299)
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1569)
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1197)
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:127)
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1093)
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:505)
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
java.lang.Thread.run(Thread.java:748)
{noformat}
Full log is attached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Applicability of term 'cache' to Apache Ignite

2018-10-18 Thread Ilya Lantukh
I thought that the current "caches" and "tables" have a 1-to-N relation. If
that's not a problem, then I also think that "table" is the best term.

On Thu, Oct 18, 2018 at 9:29 AM Vladimir Ozerov 
wrote:

> Well, I never thought about term “table” as a replacement for “cache”, but
> it appears to be good candidate.
>
> This is used by some major vendors whose underlying storage is indeed
> a kind of key-value data structure. Most well-known example is MySQL with
> its MyISAM engine. Table can be used for both fixed and flexible (e.g.
> JSON) schemas, as well as key-value access (hash map -> hash table, both
> are good).
>
> Another important thing - we already use term “table”, and it is always
> hard to explain our users how it relates to “cache”. If “cache” is dropped,
> then a single term “table” will be used everywhere.
>
> Last, but not least - “table” works well for both in-memory and persistent
> modes.
>
> So if we are really aim to rename “cache”, then “table” is the best
> candidate I’ve heard so far.
>
> Thu, Oct 18, 2018 at 8:40, Alexey Zinoviev :
>
> > Or we could extend our SQL commands by "GET BY KEY = X" and "PUT (x1, x2,
> > x3) BY KEY = X" and the IgniteTable could be correct.
> > Agree with Denis that each table in the 3rd normal form is like key-value
> > store. Key-value operations are only subset of rich SQL commands.
> >
> > The problem with IgniteData that it's too common. Also, it's difficult to
> > understand is it a plural or single object? For instance, the bunch of
> > IgniteTables could be IgniteData. But the set of IgniteData? IgniteDatum?
> >
> >
> >
> > Thu, Oct 18, 2018 at 4:18, Denis Magda :
> >
> > > Key-value calls are just primary key based calls. From a user
> > perspective,
> > > it's the same as "SELECT * FROM table WHERE primary_idx = X", just
> > > different API.
> > >
> > > --
> > > Denis
> > >
> > > On Wed, Oct 17, 2018 at 5:04 PM Dmitriy Setrakyan <
> dsetrak...@apache.org
> > >
> > > wrote:
> > >
> > > > On Wed, Oct 17, 2018 at 4:58 PM Denis Magda 
> wrote:
> > > >
> > > > > I've been calling everything "tables" instead of "caches" for a
> > while.
> > > > The
> > > > > main reason is the maturity of our SQL engine - seeing more SQL
> users
> > > and
> > > > > deployments which talk "tables" language.
> > > > >
> > > > >
> > > > I think "IgniteTable" only implies SQL, not key-value. We need both.
> > > >
> > >
> >
>


-- 
Best regards,
Ilya


Re: Applicability of term 'cache' to Apache Ignite

2018-10-17 Thread Ilya Lantukh
As I see, many people agree that the term *"cache"* is outdated, but
consider these changes too disruptive.

For me, keeping terminology up-to-date is an important part of project
development. If we replace some of our core terms with more relevant ones,
it indeed might cause confusion for current users, but in the long term it
will help new users understand what Ignite is and what it isn't. And most
short-term problems can easily be avoided by keeping a @Deprecated
IgniteCache.

On Wed, Oct 17, 2018 at 2:59 PM Ilya Lantukh  wrote:

> Unfortunately, we already use the word *"store"* for many other concepts,
> like CacheStore and PageStore. I'd prefer to avoid giving it one more
> meaning.
>
> As already mentioned, *"dataset"* has special meaning for ML folks.
>
> *"Bucket" *might give wrong association with bucket in a hash table.
>
> On Wed, Oct 17, 2018 at 1:49 PM Igor Sapego  wrote:
>
>> Well, the obvious term for me is a "Store" or "MemoryStore", as we already
>> have persistence store.
>>
>> Best Regards,
>> Igor
>>
>>
>> On Wed, Oct 17, 2018 at 1:19 PM Andrey Kuznetsov 
>> wrote:
>>
>> > I'm not an ML expert, so 'dataset' term just reminds me of various
>> client
>> > drivers to access tables from RDBM servers. For me, the only common
>> trait
>> > of all kinds of Ignite caches is their associativity. So if we rename
>> them
>> > I'd suggest something like KVStore.
>> >
>> > > Wed, Oct 17, 2018 at 12:56, Alexey Zinoviev :
>> >
>> > > From my perspective, the main goal is to make easy the explanation
>> what
>> > is
>> > > Ignite on conferences, marketing deals, in papers, in documentation.
>> And
>> > > the
>> > > /cache/ term really reduces the area of Ignite usage in users minds.
>> > >
>> > > I don't support the critical changes in code base, but I support all
>> > > changes
>> > > that helps the goal described above in this letter.
>> > >
>> > >
>> > >
>> > > --
>> > > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
>> > >
>> >
>> >
>> > --
>> > Best regards,
>> >   Andrey Kuznetsov.
>> >
>>
>
>
> --
> Best regards,
> Ilya
>


-- 
Best regards,
Ilya


Re: Applicability of term 'cache' to Apache Ignite

2018-10-17 Thread Ilya Lantukh
Unfortunately, we already use the word *"store"* for many other concepts,
like CacheStore and PageStore. I'd prefer to avoid giving it one more
meaning.

As already mentioned, *"dataset"* has special meaning for ML folks.

*"Bucket" *might give wrong association with bucket in a hash table.

On Wed, Oct 17, 2018 at 1:49 PM Igor Sapego  wrote:

> Well, the obvious term for me is a "Store" or "MemoryStore", as we already
> have persistence store.
>
> Best Regards,
> Igor
>
>
> On Wed, Oct 17, 2018 at 1:19 PM Andrey Kuznetsov 
> wrote:
>
> > I'm not an ML expert, so 'dataset' term just reminds me of various client
> > drivers to access tables from RDBM servers. For me, the only common trait
> > of all kinds of Ignite caches is their associativity. So if we rename them
> > I'd suggest something like KVStore.
> >
> > > Wed, Oct 17, 2018 at 12:56, Alexey Zinoviev :
> >
> > > From my perspective, the main goal is to make easy the explanation what
> > is
> > > Ignite on conferences, marketing deals, in papers, in documentation.
> And
> > > the
> > > /cache/ term really reduces the area of Ignite usage in users minds.
> > >
> > > I don't support the critical changes in code base, but I support all
> > > changes
> > > that helps the goal described above in this letter.
> > >
> > >
> > >
> > > --
> > > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> > >
> >
> >
> > --
> > Best regards,
> >   Andrey Kuznetsov.
> >
>


-- 
Best regards,
Ilya


Re: Abbreviation code-style requirement.

2018-10-17 Thread Ilya Lantukh
+ 1 from me to make abbreviations optional.

On Wed, Oct 17, 2018 at 1:00 PM Sergey Antonov 
wrote:

> + 1
>
> But, I think that we must leave abbreviations for common words with single
> meaning. For example, group -> grp, transaction -> tx, context -> ctx.
>
> Wed, Oct 17, 2018 at 12:46, Alexey Zinoviev :
>
> > + 1
> > I dislike the current list of abbreviations. It gives me a pain to
> support
> > code with unclear variables naming, also I agree that we should avoid
> crazy
> > Java camel long naming like
> > FactoryBuildingCrazyAffinityCallerForComibingInSpace but instead that we
> > make shorter clear concepts like /counter/, /vertex/, /collection/ and
> etc
> >
> >
> >
> > --
> > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> >
>
>
> --
> BR, Sergey Antonov
>


-- 
Best regards,
Ilya


Re: Applicability of term 'cache' to Apache Ignite

2018-10-16 Thread Ilya Lantukh
To me it seems that the use of the term *"cache"* restricts adoption of Apache
Ignite as a primary data store. If I didn't know anything about the internal
implementation, storing critical data in IgniteCache would make me feel
that I'm doing something wrong. Of course it's just my point of view, and
things might look different for other Ignite users - so I'd like to ask
community members to share their opinions.


On Tue, Oct 16, 2018 at 3:54 PM Vladimir Ozerov 
wrote:

> What is the ultimate goal of all these changes? While I agree that the term
> "cache" might be a bit outdated at the moment, there is nothing
> fundamentally wrong with it - data is still being cached in memory with an
> option to persist it on disk. We should remember, that legacy and previous
> user experience is of great importance for users. And disruptive changes
> such as rename of a basic concept may make adoption of a new versions
> harder for users, with very questionable benefits on the other side.
>
> As far as wrappers, personally I do not support this idea. Both "cache" and
> "sql" are access methods to some information ("space"), rather than
> wrappers around it. Moreover, it is hard to say whether we will have SQL
> API at all, because this is big effort with not very clear value, provided
> that there are industrial interfaces (JDBC, ODBC).
>
> On Tue, Oct 16, 2018 at 3:23 PM Stanislav Lukyanov  >
> wrote:
>
> > How about separating our JCache implementation from the core of the
> > product.
> >
> > Currently IgniteCache is the heart of Ignite. It is the basic storage
> unit.
> > At the same time, it is the direct implementation of the JCache API,
> > and some of the JCache features align somewhat awkwardly with Ignite
> > concepts.
> >
> > Would be nice to have something like IgniteSpace as our core component,
> > and have other components built on top of it as wrappers providing
> various
> > APIs.
> > For example
> > - IgniteSpace itself is a distributed storage unit, that is partitioned,
> > that has affinity, etc;
> > note that it doesn’t have to have ANY particular API to add data, even
> > key-value
> > - IgniteCache is a wrapper around IgniteSpace that allows to store
> > key-value pairs and implements JCache API
> > - IgniteSql (we’re doing it eventually, right?) is a wrapper around
> > IgniteSpace that allows to store SQL tables and implements ANSI SQL
> > - IgniteQueue is a wrapper that implements Queue
> > and so on.
> >
> > WDYT?
> >
> > Stan
> >
> > From: Ilya Lantukh
> > Sent: October 15, 2018, 14:49
> > To: dev@ignite.apache.org
> > Subject: Applicability of term 'cache' to Apache Ignite
> >
> > Hi Igniters,
> >
> > I would like to raise a question about how we use the term *'cache'* in
> > Ignite and how it corresponds to terminology in the IT industry in general.
> >
> > From wikipedia:
> > In computing <https://en.wikipedia.org/wiki/Computing>, a *cache* /kæʃ/
> > <https://en.wikipedia.org/wiki/Help:IPA/English> *kash*
> > <https://en.wikipedia.org/wiki/Help:Pronunciation_respelling_key>, is a
> > hardware or software component that stores data so that future requests
> for
> > that data can be served faster; the data stored in a cache might be the
> > result of an earlier computation or a copy of data stored elsewhere. [1]
> >
> > When the first version of Ignite was released, this term was correct. We
> > positioned Ignite mostly as an intermediate storage layer between
> > application and a database, designed to make data access faster.
> >
> > However, since addition of native persistence we started to call Ignite a
> > "memory-centric database", and as far as I know, some organizations now
> use
> > it as a primary data storage, without underlying database. In this case,
> > calling our storage unit a *'cache'* causes unnecessary confusion.
> >
> > Thus, I suggest to rename IgniteCache in Ignite 3.0 to something that
> would
> > fit both use-cases.
> > Personally I like the term IgniteSpace.
> >
> > [1] https://en.wikipedia.org/wiki/Cache_(computing)
> > --
> > Best regards,
> > Ilya
> >
> >
>


-- 
Best regards,
Ilya


Applicability of term 'cache' to Apache Ignite

2018-10-15 Thread Ilya Lantukh
Hi Igniters,

I would like to raise a question about how we use the term *'cache'* in Ignite
and how it corresponds to terminology in the IT industry in general.

From Wikipedia:
In computing, a *cache* (/kæʃ/ *kash*) is a hardware or software component
that stores data so that future requests for that data can be served faster;
the data stored in a cache might be the result of an earlier computation or a
copy of data stored elsewhere. [1]

When the first version of Ignite was released, this term was correct. We
positioned Ignite mostly as an intermediate storage layer between an
application and a database, designed to make data access faster.

However, since the addition of native persistence we have started to call
Ignite a "memory-centric database", and as far as I know, some organizations
now use it as their primary data store, without an underlying database. In
this case, calling our storage unit a *'cache'* causes unnecessary confusion.

Thus, I suggest renaming IgniteCache in Ignite 3.0 to something that would
fit both use cases.
Personally I like the term IgniteSpace.

[1] https://en.wikipedia.org/wiki/Cache_(computing)
-- 
Best regards,
Ilya


Re: The future of Affinity / Topology concepts and possible PME optimizations.

2018-09-18 Thread Ilya Lantukh
Thanks for the feedback!

I agree that we should start with the simplest optimizations, but it seems
that decoupling affinity/topology versions is necessary before we can make
any progress here, and this is a rather complex change all over the code.

If anyone wants to help, please contact me privately and we will discuss
how this work can be split.

Denis Magda, do you think we should create an IEP for these optimizations?

On Tue, Sep 18, 2018 at 5:59 PM, Maxim Muzafarov  wrote:

> Ilya,
>
>
> > 3. Start node in baseline: both affinity and topology versions should be
> incremented, but it might be possible to optimize PME for such case and
> avoid cluster-wide freeze. Partition assignments for such node are already
> calculated, so we can simply put them all into MOVING state. However, it
> might take significant effort to avoid race conditions and redesign our
> architecture.
>
> As you mentioned all assignments are already calculated. So as another
> suggestion,
> we can introduce a new `intermediate` state of such joined nodes. Being in
> this state
> node recovers all data from their local storage, preloads whole missed
> partition
> data from the cluster (probably on some point in time), creates and
> preloads missed
> in-memory and persistent caches. And only after these recovery such node
> will fire
> discovery join event and affinity\topology version may be incremented. I
> think this
> approach can help to reduce the further rebalance time.
> WDYT?
>
>
>
> On Tue, 18 Sep 2018 at 16:31 Alexey Goncharuk 
> wrote:
>
> > Ilya,
> >
> > This is a great idea, but before we can ultimately decouple the affinity
> > version from the topology version, we need to fix a few things with
> > baseline topology first. Currently the in-memory caches are not using the
> > baseline topology. We are going to fix this as a part of IEP-4 Phase II
> > (baseline auto-adjust). Once fixed, we can safely assume that
> > out-of-baseline node does not affect affinity distribution.
> >
> > Agree with Dmitriy that we should start with simpler optimizations first.
> >
> > > Thu, Sep 13, 2018 at 15:58, Ilya Lantukh :
> >
> > > Igniters,
> > >
> > > As most of you know, Ignite has a concept of AffinityTopologyVersion,
> > which
> > > is associated with nodes that are currently present in topology and a
> > > global cluster state (active/inactive, baseline topology, started
> > caches).
> > > Modification of either of them involves process called Partition Map
> > > Exchange (PME) and results in new AffinityTopologyVersion. At that
> moment
> > > all new cache and compute grid operations are globally "frozen". This
> > might
> > > lead to indeterminate cache downtimes.
> > >
> > > However, our recent changes (esp. introduction of Baseline Topology)
> > caused
> > > me to re-think those concept. Currently there are many cases when we
> > > trigger PME, but it isn't necessary. For example, adding/removing
> client
> > > node or server node not in BLT should never cause partition map
> > > modifications. Those events modify the *topology*, but *affinity* in
> > > unaffected. On the other hand, there are events that affect only
> > *affinity*
> > > - most straightforward example is CacheAffinityChange event, which is
> > > triggered after rebalance is finished to assign new primary/backup
> nodes.
> > > So the term *AffinityTopologyVersion* now looks weird - it tries to
> > "merge"
> > > two entities that aren't always related. To me it makes sense to
> > introduce
> > > separate *AffinityVersion *and *TopologyVersion*, review all events
> that
> > > currently modify AffinityTopologyVersion and split them into 3
> > categories:
> > > those that modify only AffinityVersion, only TopologyVersion and both.
> It
> > > will allow us to process such events using different mechanics and
> avoid
> > > redundant steps, and also reconsider mapping of operations - some will
> be
> > > mapped to topology, others - to affinity.
> > >
> > > Here is my view about how different event types theoretically can be
> > > optimized:
> > > 1. Client node start / stop: as stated above, no PME is needed, ticket
> > > https://issues.apache.org/jira/browse/IGNITE-9558 is already in
> > progress.
> > > 2. Server node start / stop not from baseline: should be similar to the
> > > previous case, since nodes outside of baseline cannot be partition
> > owners.
> > > 3. Start node in baseline: both a

The future of Affinity / Topology concepts and possible PME optimizations.

2018-09-13 Thread Ilya Lantukh
Igniters,

As most of you know, Ignite has a concept of AffinityTopologyVersion, which
is associated with the nodes that are currently present in the topology and a
global cluster state (active/inactive, baseline topology, started caches).
Modification of either of them involves a process called Partition Map
Exchange (PME) and results in a new AffinityTopologyVersion. At that moment
all new cache and compute grid operations are globally "frozen". This might
lead to indeterminate cache downtimes.

However, our recent changes (esp. the introduction of Baseline Topology) caused
me to re-think this concept. Currently there are many cases when we
trigger PME, but it isn't necessary. For example, adding/removing a client
node or a server node not in BLT should never cause partition map
modifications. Those events modify the *topology*, but *affinity* is
unaffected. On the other hand, there are events that affect only *affinity*
- the most straightforward example is the CacheAffinityChange event, which is
triggered after a rebalance finishes to assign new primary/backup nodes.
So the term *AffinityTopologyVersion* now looks weird - it tries to "merge"
two entities that aren't always related. To me it makes sense to introduce
separate *AffinityVersion* and *TopologyVersion*, review all events that
currently modify AffinityTopologyVersion and split them into 3 categories:
those that modify only AffinityVersion, only TopologyVersion, and both. It
will allow us to process such events using different mechanics and avoid
redundant steps, and also reconsider the mapping of operations - some will be
mapped to topology, others - to affinity.

Here is my view on how different event types could theoretically be
optimized:
1. Client node start / stop: as stated above, no PME is needed, ticket
https://issues.apache.org/jira/browse/IGNITE-9558 is already in progress.
2. Server node start / stop not from baseline: should be similar to the
previous case, since nodes outside of baseline cannot be partition owners.
3. Start node in baseline: both affinity and topology versions should be
incremented, but it might be possible to optimize PME for such case and
avoid cluster-wide freeze. Partition assignments for such node are already
calculated, so we can simply put them all into MOVING state. However, it
might take significant effort to avoid race conditions and redesign our
architecture.
4. Cache start / stop: starting or stopping one cache doesn't modify
partition maps for other caches. It should be possible to change this
procedure to skip PME and perform all necessary actions (compute affinity,
start/stop cache contexts on each node) in background, but it looks like a
very complex modification too.
5. Rebalance finish: it seems possible to design a "lightweight" PME for
this case as well. If there were no node failures (and if there were, PME
should be triggered and the rebalance should be cancelled anyway), all
partition states are already known by the coordinator. Furthermore, no new
MOVING or OWNING node for any partition is introduced, so all previous
mappings should still be valid.

For the latter, more complex cases it might be necessary to introduce an "is
compatible" relationship between affinity versions. An operation needs to be
remapped only if the new version isn't compatible with the previous one.
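
Purely as an illustration of the proposed split (these classes are
hypothetical, not part of Ignite), the two versions and the compatibility
check could look roughly like this:

{code}
// Illustrative only - no such classes exist in Ignite today.
public class VersionSplitSketch {
    /** Counts pure topology events (client joins/leaves etc.) that don't move partitions. */
    record TopologyVersion(long major, int minor) {}

    /**
     * Counts events that change partition assignments. lastDisruptiveVer marks the
     * last change that invalidated previous mappings (e.g. a baseline change).
     */
    record AffinityVersion(long ver, long lastDisruptiveVer) {
        /** An operation mapped to prev needs remapping only if a disruptive change happened since. */
        boolean compatibleWith(AffinityVersion prev) {
            return prev.ver() >= lastDisruptiveVer;
        }
    }
}
{code}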

Please share your thoughts.

-- 
Best regards,
Ilya


[jira] [Created] (IGNITE-9400) TC bot: add progress bar to history page

2018-08-28 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-9400:


 Summary: TC bot: add progress bar to history page
 Key: IGNITE-9400
 URL: https://issues.apache.org/jira/browse/IGNITE-9400
 Project: Ignite
  Issue Type: Improvement
Reporter: Ilya Lantukh


History page (like */all.html?branch=master*) takes a significant amount of time 
to load, and it would be helpful to replace the spinning wheel with a progress bar.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-9392) CacheAsyncOperationsFailoverTxTest hangs on TC

2018-08-27 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-9392:


 Summary: CacheAsyncOperationsFailoverTxTest hangs on TC
 Key: IGNITE-9392
 URL: https://issues.apache.org/jira/browse/IGNITE-9392
 Project: Ignite
  Issue Type: Bug
Reporter: Ilya Lantukh
Assignee: Ilya Lantukh






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Pessimistic Mode and Transactions and 2PC

2018-08-10 Thread Ilya Lantukh
Hi John,

1. There is no "begin prepare" record; nodes acquire locks on the TX keys and
then transition the local TX to the PREPARED state.
2. A TxRecord is logged when the local TX state changes to PREPARED, COMMITTED
or ROLLED_BACK, before sending a response to what you call the "coordinator"
(in Ignite we use the term "near node", because the term "coordinator" is
already used for another concept).
3. Yes.

Hope this helps.
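
To illustrate the ordering, here is a hedged sketch with made-up types
standing in for Ignite internals (only the TxRecord/DataRecord names mirror
the real WAL record concepts):

{code}
// Hedged sketch of the ordering above; all types here are stand-ins, not Ignite's API.
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TwoPhaseCommitSketch {
    enum TxState { PREPARED, COMMITTED, ROLLED_BACK }

    record TxRecord(long txId, TxState state) {}
    record DataRecord(Map<String, String> writes) {}

    /** Stand-in for a primary node's write-ahead log. */
    private final List<Object> wal = new ArrayList<>();

    /** Prepare phase on a primary node. */
    TxState onPrepare(long txId, Map<String, String> writes) {
        // 1. Acquire locks on the TX keys (omitted here).
        // 2. The PREPARED state change is logged to the WAL before the "Yes" vote
        //    goes back to the near node.
        wal.add(new TxRecord(txId, TxState.PREPARED));

        return TxState.PREPARED;
    }

    /** Commit phase: data and the COMMITTED marker are logged before the reply. */
    TxState onCommit(long txId, Map<String, String> writes) {
        wal.add(new DataRecord(writes));
        wal.add(new TxRecord(txId, TxState.COMMITTED));

        return TxState.COMMITTED;
    }
}
{code}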

On Wed, Aug 8, 2018 at 6:29 PM, John Wilson  wrote:

> No they are not. I just want to understand.
>
> On Wednesday, August 8, 2018, Dmitriy Pavlov 
> wrote:
>
> > Hi John,
> >
> > Are these questions related to some contribution?
> >
> > Sincerely,
> > Dmitriy Pavlov
> >
> > Wed, Aug 8, 2018 at 3:18, John Wilson :
> >
> > > Hi,
> > >
> > > Assume the following:
> > >
> > >
> > >- I have a transaction coordinator and two primary nodes with 0
> backup
> > >nodes.
> > >- Persistence store is enabled.
> > >- I'm running a transaction in pessimistic mode with serializable
> > >isolation.
> > >
> > > I have these questions:
> > >
> > >1. What exactly happens during the prepare phase? Only acquiring
> locks
> > >on the two primary nodes? Or do the primary nodes themselves, in
> > > addition
> > >to acquiring locks, write to their respective WAL a TxRecord with a
> > > "begin
> > >prepare" info?
> > >2. Assume locks have been acquired successfully, would the nodes
> then
> > >write a "prepared" TxRecord to WAL before returning a "Yes" vote to
> > >coordinator?
> > >3. When the coordinator sends a commit message, would each node
> write
> > >the key-values to the DataRecord and a commit to the TxRecord before
> > >returning to coordinator?
> > >
> > >
> > > Overall, I'm trying to understand what happens exactly during prepare
> and
> > > commit phases and when the key-values involved in the transaction are
> > > actually written; as well as the exact updates that are written to the
> > WAL
> > > files in each phase.
> > >
> > > appreciate your response.
> > >
> > > Thanks,
> > >
> >
>



-- 
Best regards,
Ilya


[jira] [Created] (IGNITE-9249) Tests hang when different threads try to start and stop nodes at the same time.

2018-08-10 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-9249:


 Summary: Tests hang when different threads try to start and stop 
nodes at the same time.
 Key: IGNITE-9249
 URL: https://issues.apache.org/jira/browse/IGNITE-9249
 Project: Ignite
  Issue Type: Bug
Reporter: Ilya Lantukh
Assignee: Ilya Lantukh


An example of such a test is 
GridCachePartitionedNearDisabledOptimisticTxNodeRestartTest.testRestartWithPutFourNodesOneBackupsOffheapEvict().

Hanged threads:
{code}
"restart-worker-1@63424" prio=5 tid=0x7f5e nid=NA waiting
  java.lang.Thread.State: WAITING
  at java.lang.Object.wait(Object.java:-1)
  at 
org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:949)
  at 
org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:389)
  at 
org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:2002)
  at 
org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:297)
  at 
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:916)
  at 
org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1754)
  at 
org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1050)
  at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020)
  at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725)
  - locked <0xfc36> (a 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance)
  at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153)
  at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:651)
  at 
org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:920)
  at 
org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:858)
  at 
org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:846)
  at 
org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:812)
  at 
org.apache.ignite.internal.processors.cache.distributed.GridCacheAbstractNodeRestartSelfTest.access$1000(GridCacheAbstractNodeRestartSelfTest.java:64)
  at 
org.apache.ignite.internal.processors.cache.distributed.GridCacheAbstractNodeRestartSelfTest$2.run(GridCacheAbstractNodeRestartSelfTest.java:665)
  at java.lang.Thread.run(Thread.java:748)

"restart-worker-0@63423" prio=5 tid=0x7f5d nid=NA waiting
  java.lang.Thread.State: WAITING
  at sun.misc.Unsafe.park(Unsafe.java:-1)
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
  at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
  at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
  at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
  at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
  at 
org.apache.ignite.internal.util.IgniteUtils.awaitQuiet(IgniteUtils.java:7584)
  at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.grid(IgnitionEx.java:1666)
  at 
org.apache.ignite.internal.IgnitionEx.allGrids(IgnitionEx.java:1284)
  at 
org.apache.ignite.internal.IgnitionEx.allGrids(IgnitionEx.java:1262)
  at org.apache.ignite.Ignition.allGrids(Ignition.java:502)
  at 
org.apache.ignite.testframework.junits.GridAbstractTest.awaitTopologyChange(GridAbstractTest.java:2258)
  at 
org.apache.ignite.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:1158)
  at 
org.apache.ignite.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:1133)
  at 
org.apache.ignite.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:1433)
  at 
org.apache.ignite.internal.processors.cache.distributed.GridCacheAbstractNodeRestartSelfTest.access$800(GridCacheAbstractNodeRestartSelfTest.java:64)
  at 
org.apache.ignite.internal.processors.cache.distributed.GridCacheAbstractNodeRestartSelfTest$2.run(GridCacheAbstractNodeRestartSelfTest.java:661)
  at java.lang.Thread.run(Thread.java:748)
{code}

Full thread dump:
{code}
"test-runner-#26488%dht.GridCachePartitionedNearDisabledOptimisticTxNodeRestartTest%@63124"
 prio=5 tid=0x7e6a nid=NA waiting
  java.lang.Thread.State: WAITING
  at sun.misc.Unsafe.park(Unsafe.java:-1)
  at java.util.concurrent.locks.L

[jira] [Created] (IGNITE-9236) Handshake timeout never completes in some tests (GridCacheReplicatedFailoverSelfTest in particular)

2018-08-08 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-9236:


 Summary: Handshake timeout never completes in some tests 
(GridCacheReplicatedFailoverSelfTest in particular)
 Key: IGNITE-9236
 URL: https://issues.apache.org/jira/browse/IGNITE-9236
 Project: Ignite
  Issue Type: Bug
Reporter: Ilya Lantukh
Assignee: Ilya Lantukh


In GridCacheReplicatedFailoverSelfTest one thread tries to establish a TCP 
connection and hangs on the handshake forever, holding a lock on the RebalanceFuture:
{code}
[11:51:55] : [Step 3/4] Locked synchronizers:
[11:51:55] : [Step 3/4] 
java.util.concurrent.ThreadPoolExecutor$Worker@5b17b883
[11:51:55] : [Step 3/4] Thread 
[name="sys-#68921%new-node-topology-change-thread-1%", id=77410, 
state=RUNNABLE, blockCnt=3, waitCnt=0]
[11:51:55] : [Step 3/4] at 
sun.nio.ch.FileDispatcherImpl.read0(Native Method)
[11:51:55] : [Step 3/4] at 
sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
[11:51:55] : [Step 3/4] at 
sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
[11:51:55] : [Step 3/4] at sun.nio.ch.IOUtil.read(IOUtil.java:197)
[11:51:55] : [Step 3/4] at 
sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
[11:51:55] : [Step 3/4] - locked java.lang.Object@23aaa756
[11:51:55] : [Step 3/4] at 
o.a.i.spi.communication.tcp.TcpCommunicationSpi.safeTcpHandshake(TcpCommunicationSpi.java:3647)
[11:51:55] : [Step 3/4] at 
o.a.i.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3293)
[11:51:55] : [Step 3/4] at 
o.a.i.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2967)
[11:51:55] : [Step 3/4] at 
o.a.i.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2850)
[11:51:55] : [Step 3/4] at 
o.a.i.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2693)
[11:51:55] : [Step 3/4] at 
o.a.i.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2652)
[11:51:55] : [Step 3/4] at 
o.a.i.i.managers.communication.GridIoManager.send(GridIoManager.java:1643)
[11:51:55] : [Step 3/4] at 
o.a.i.i.managers.communication.GridIoManager.sendOrderedMessage(GridIoManager.java:1750)
[11:51:55] : [Step 3/4] at 
o.a.i.i.processors.cache.GridCacheIoManager.sendOrderedMessage(GridCacheIoManager.java:1231)
[11:51:55] : [Step 3/4] at 
o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$RebalanceFuture.cleanupRemoteContexts(GridDhtPartitionDemander.java:)
[11:51:55] : [Step 3/4] at 
o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$RebalanceFuture.cancel(GridDhtPartitionDemander.java:1041)
[11:51:55] : [Step 3/4] - locked 
o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$RebalanceFuture@7e28f150
[11:51:55] : [Step 3/4] at 
o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander.lambda$null$2(GridDhtPartitionDemander.java:534)
[11:51:55] : [Step 3/4] at 
o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$$Lambda$41/603501511.run(Unknown
 Source)
[11:51:55] : [Step 3/4] at 
o.a.i.i.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6800)
[11:51:55] : [Step 3/4] at 
o.a.i.i.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:827)
[11:51:55] : [Step 3/4] at 
o.a.i.i.util.worker.GridWorker.run(GridWorker.java:110)
[11:51:55] : [Step 3/4] at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[11:51:55] : [Step 3/4] at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[11:51:55] : [Step 3/4] at java.lang.Thread.run(Thread.java:748)
{code}

Because of that, the exchange worker hangs forever while trying to acquire 
that lock:
{code}
[11:51:55] : [Step 3/4] Thread 
[name="exchange-worker-#68894%new-node-topology-change-thread-1%", id=77379, 
state=BLOCKED, blockCnt=11, waitCnt=7]
[11:51:55] : [Step 3/4] Lock 
[object=o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$RebalanceFuture@7e28f150,
 ownerName=sys-#68921%new-node-topology-change-thread-1%, ownerId=77410]
[11:51:55] : [Step 3/4] at 
o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$RebalanceFuture.cancel(GridDhtPartitionDemander.java:1033)
[11:51:55] : [Step 3/4] at 
o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander.addAssignments(GridDhtPartitionDemander.java:302)
[11:51

[jira] [Created] (IGNITE-9213) CacheLockReleaseNodeLeaveTest.testLockTopologyChange hangs sometimes, leading to TC timeout

2018-08-07 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-9213:


 Summary: CacheLockReleaseNodeLeaveTest.testLockTopologyChange 
hangs sometimes, leading to TC timeout
 Key: IGNITE-9213
 URL: https://issues.apache.org/jira/browse/IGNITE-9213
 Project: Ignite
  Issue Type: Bug
Reporter: Ilya Lantukh
Assignee: Ilya Lantukh


Probability is quite low, < 5%.

One thread gets stuck in GridCacheAdapter.lockAll(...), holding the gateway read
lock and waiting for a future that never completes. Another one cannot acquire
the gateway write lock.

{code}
"test-runner-#123405%distributed.CacheLockReleaseNodeLeaveTest%" #136172 prio=5 
os_prio=0 tid=0x7f20cd3d7000 nid=0x356f sleeping[0x7f1eae48b000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at 
org.apache.ignite.internal.util.IgniteUtils.sleep(IgniteUtils.java:7678)
at 
org.apache.ignite.internal.processors.cache.GridCacheGateway.onStopped(GridCacheGateway.java:318)
at 
org.apache.ignite.internal.processors.cache.GridCacheProcessor.blockGateways(GridCacheProcessor.java:970)
at org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:2195)
at org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:2082)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2595)
- locked <0xc2e69580> (a 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2558)
at org.apache.ignite.internal.IgnitionEx.stop(IgnitionEx.java:374)
at org.apache.ignite.Ignition.stop(Ignition.java:229)
at 
org.apache.ignite.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:1153)
at 
org.apache.ignite.testframework.junits.GridAbstractTest.stopAllGrids(GridAbstractTest.java:1196)
at 
org.apache.ignite.testframework.junits.GridAbstractTest.stopAllGrids(GridAbstractTest.java:1174)
at 
org.apache.ignite.internal.processors.cache.distributed.CacheLockReleaseNodeLeaveTest.testLockTopologyChange(CacheLockReleaseNodeLeaveTest.java:177)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at junit.framework.TestCase.runTest(TestCase.java:176)
at 
org.apache.ignite.testframework.junits.GridAbstractTest.runTestInternal(GridAbstractTest.java:2156)
at 
org.apache.ignite.testframework.junits.GridAbstractTest.access$000(GridAbstractTest.java:143)
at 
org.apache.ignite.testframework.junits.GridAbstractTest$5.run(GridAbstractTest.java:2071)
at java.lang.Thread.run(Thread.java:745)

"test-lock-thread-4" #136488 prio=5 os_prio=0 tid=0x7f208802a000 nid=0x36a5 
waiting on condition [0x7f1ea81c3000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
at 
org.apache.ignite.internal.processors.cache.GridCacheAdapter.lockAll(GridCacheAdapter.java:3405)
at 
org.apache.ignite.internal.processors.cache.CacheLockImpl.lock(CacheLockImpl.java:74)
at 
org.apache.ignite.internal.processors.cache.distributed.CacheLockReleaseNodeLeaveTest$3.run(CacheLockReleaseNodeLeaveTest.java:154)
at 
org.apache.ignite.testframework.GridTestUtils$6.call(GridTestUtils.java:1254)
at 
org.apache.ignite.testframework.GridTestThread.run(GridTestThread.java:86)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-9155) Exception during cluster state change terminates ExchangeWorker

2018-08-01 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-9155:


 Summary: Exception during cluster state change terminates 
ExchangeWorker
 Key: IGNITE-9155
 URL: https://issues.apache.org/jira/browse/IGNITE-9155
 Project: Ignite
  Issue Type: Bug
Reporter: Ilya Lantukh


After IGNITE-8311 we throw an exception in ExchangeFuture instead of swallowing it.

ClusterStateChangeProcessor has its own exception handling mechanism, which
doesn't require ExchangeWorker termination (and leaving the node in a broken state).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Quick question on data and index pages

2018-07-26 Thread Ilya Lantukh
Hi,

1. As the name implies, indirectCount is the count of indirect items, which
are references to direct items. According to our DataPage format, we keep
all items at the beginning of the page. Take a look at this diagram:
https://cwiki-test.apache.org/confluence/download/attachments/73632614/Part%206.%205.%20Page%20structure%20%281%29.png?version=1=1525443891000=v2.
If we remove It2, we will have to move It3 into its place. But we already
have external references to It3 by its index (3). So, to keep those
external references correct, we have to mark the item at index 3 as "indirect"
and make it point to index 2. In this case, such a page will have directCount
== 2 and indirectCount == 1.
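
For illustration, here is a toy model of that bookkeeping (hypothetical names
and layout, just to show the idea - not the real AbstractDataPageIO code):

{code}
import java.util.HashMap;
import java.util.Map;

class ItemTableSketch {
    private final Map<Integer, byte[]> direct = new HashMap<>();    // direct item index -> payload
    private final Map<Integer, Integer> indirect = new HashMap<>(); // external index -> direct index

    void putDirect(int idx, byte[] payload) {
        direct.put(idx, payload);
    }

    // Removing direct item 'removedIdx' compacts the item table: the payload of the
    // last direct item ('movedIdx') is moved into the freed slot, and an indirect
    // entry keeps the moved item's old external index valid.
    void removeAndCompact(int removedIdx, int movedIdx) {
        direct.put(removedIdx, direct.remove(movedIdx));
        indirect.put(movedIdx, removedIdx);
    }

    byte[] get(int externalIdx) {
        Integer redirected = indirect.get(externalIdx);
        return direct.get(redirected != null ? redirected : externalIdx);
    }

    int directCount()   { return direct.size(); }
    int indirectCount() { return indirect.size(); }
}
{code}

In the diagram's terms, removeAndCompact(2, 3) leaves directCount() == 2 and
indirectCount() == 1, and get(3) still resolves to the moved It3 payload.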

2. No, only index pages are organized in a B+ tree. Data pages are
organized in another data structure called FreeList - it stores how much
free space is available on each data page and provides fast access to pages
that have at least the specified amount of free space.
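
A toy model of that idea (again hypothetical, much simpler than the real paged
FreeList implementation) - pages are bucketed by how much free space they have,
so "give me a page with >= N free bytes" becomes a bucket lookup:

{code}
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class FreeListSketch {
    private static final int BUCKET_STEP = 256; // bytes of free space per bucket

    // buckets.get(b) holds ids of pages with at least b * BUCKET_STEP bytes free.
    private final List<Deque<Long>> buckets = new ArrayList<>();

    FreeListSketch(int bucketCnt) {
        for (int i = 0; i < bucketCnt; i++)
            buckets.add(new ArrayDeque<>());
    }

    void put(long pageId, int freeSpace) {
        buckets.get(Math.min(freeSpace / BUCKET_STEP, buckets.size() - 1)).addLast(pageId);
    }

    /** Returns a page id with at least {@code required} bytes free, or -1 if none is tracked. */
    long takePageFor(int required) {
        for (int b = (required + BUCKET_STEP - 1) / BUCKET_STEP; b < buckets.size(); b++) {
            Long pageId = buckets.get(b).pollFirst();

            if (pageId != null)
                return pageId;
        }

        return -1; // the caller would allocate a fresh data page
    }
}
{code}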

3. Yes. The most significant difference is that internal nodes need to
store links to nodes on the next level. Check classes BPlusInnerIO and
BPlusLeafIO (and their subclasses) if you are interested in more details.

On Thu, Jul 26, 2018 at 6:22 AM, John Wilson 
wrote:

> Hi,
>
> 1. What are direct and indirect count in data page header used for? What is
> the difference?
>
> [
> https://cwiki-test.apache.org/confluence/display/IGNITE/
> Ignite+Durable+Memory+-+under+the+hood#IgniteDurableMemory-
> underthehood-Freelists
> ]
>
> 2. Are data pages organized in a B+ tree structure or index pages only?
>
> 3. Is there any difference between internal and leaf nodes in the B+ tree
> structure?
>
>
> Thanks,
>



-- 
Best regards,
Ilya


Re: ConcurrentLinkedHashMap works incorrectly after clear()

2018-07-24 Thread Ilya Lantukh
Thanks for revealing this issue!

I don't understand why we should disallow calling clear().

One way how it can be re-implemented is:
1. acquire write locks on all segments;
2. clear them;
3. reset size to 0;
4. release locks.

Another approach is to calculate inside
ConcurrentLinkedHashMap.Segment.clear() how many entries you actually
deleted and then call size.addAndGet(...).

In both cases you'll have to replace LongAdder with AtomicLong.
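
A minimal sketch of the second approach (hypothetical names and a much simpler
structure than the real ConcurrentLinkedHashMap.Segment internals): clear()
reports how many entries it actually removed, and the map subtracts that from
its AtomicLong size counter, so sizex() stays correct after clear().

{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

class BoundedMapSketch<K, V> {
    private final AtomicLong size = new AtomicLong();
    private final List<Segment> segments = new ArrayList<>();

    BoundedMapSketch(int concurrency) {
        for (int i = 0; i < concurrency; i++)
            segments.add(new Segment());
    }

    private final class Segment {
        final ReentrantLock lock = new ReentrantLock();
        final Map<K, V> entries = new HashMap<>();

        // Returns the number of entries actually removed, so the caller can
        // keep the shared size counter consistent.
        int clearAndCount() {
            lock.lock();

            try {
                int removed = entries.size();

                entries.clear();

                return removed;
            }
            finally {
                lock.unlock();
            }
        }
    }

    public void clear() {
        for (Segment seg : segments)
            size.addAndGet(-seg.clearAndCount());
    }

    public long sizex() {
        return size.get();
    }
}
{code}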

On Tue, Jul 24, 2018 at 4:03 PM, Ilya Kasnacheev 
wrote:

> Hello igniters!
>
> So I was working on a fix for
> https://issues.apache.org/jira/browse/IGNITE-9056
> The reason for test flakiness turned out our ConcurrentLinkedHashMap (and
> its tautological cousin GridBoundedConcurrentLinkedHashMap) is broken :(
>
> When you do clear(). its size counter is not updated. So sizex() will
> return the old size after clear, and if there's maxCnt set, after several
> clear()s it will immediately evict entries after they are inserted,
> maintaining map size at 0.
>
> This is scary since indexing internals make intense use of
> ConcurrentLinkedHashMaps.
>
> My suggestion for this fix is to avoid ever calling clear(), making it
> throw UnsupportedOperationException and recreating/replacing map instead of
> clear()ing it. Unless somebody is going to stand up and fix
> ConcurrentLinkedHashMap.clear() properly. Frankly speaking I'm afraid of
> touching this code in any non-trivial way.
>
> --
> Ilya Kasnacheev
>



-- 
Best regards,
Ilya


[jira] [Created] (IGNITE-8877) PartitionsExchangeOnDiscoveryHistoryOverflowTest.testDynamicCacheCreation leads to OutOfMemoryError

2018-06-26 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-8877:


 Summary:  
PartitionsExchangeOnDiscoveryHistoryOverflowTest.testDynamicCacheCreation leads 
to OutOfMemoryError
 Key: IGNITE-8877
 URL: https://issues.apache.org/jira/browse/IGNITE-8877
 Project: Ignite
  Issue Type: Bug
Reporter: Ilya Lantukh






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Enabling Persistent Memory for Ignite

2018-06-21 Thread Ilya Lantukh
Hi,

Thank you for your willingness to contribute.

Please assign the Jira ticket to yourself and set its status to "In
Progress", so the Apache community can see that this feature is currently being
worked on.
Also, please open a GitHub pull request so we can review and give you
feedback in a more convenient way.

Thanks again.


On Wed, Jun 20, 2018 at 7:46 PM, Mammo, Mulugeta 
wrote:

> Hi,
>
> We have shared an experimental patch that optionally enables a 3DXPoint
> (aka Intel Optane DC Persistent Memory) persistent store for Ignite. The
> work is still in progress we welcome your code review and feedback.
>
> The patch is based on a Low Level Persistent Library (LLPL) we released
> earlier.
>
> Jira: https://issues.apache.org/jira/browse/IGNITE-6854
> LLPL: https://github.com/pmem/pcj/tree/master/LLPL.
>
> Thanks,
> Mulugeta
>
>


-- 
Best regards,
Ilya


[jira] [Created] (IGNITE-8839) Throw meaningful and user-friendly exception when Ignite Persistence files are unaccessible

2018-06-20 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-8839:


 Summary: Throw meaningful and user-friendly exception when Ignite 
Persistence files are unaccessible
 Key: IGNITE-8839
 URL: https://issues.apache.org/jira/browse/IGNITE-8839
 Project: Ignite
  Issue Type: Improvement
Reporter: Ilya Lantukh


Currently, not having read+write permissions for the persistence directory can
lead to an exception like this:
{noformat}
2018-06-19 16:27:51 ERROR GridDhtPartitionsExchangeFuture:495 - Failed to
activate node components [nodeId=7ff3a67c-ed86-452a-9097-2626a0252a6e,
client=false, topVer=AffinityTopologyVersion [topVer=2, minorTopVer=1]]
class org.apache.ignite.IgniteCheckedException: Error while creating file
page store
[file=/home/apache-ignite-fabric-2.4.0-bin/work/db/node02-119b5ab9-11ac-456b-8488-6ba8f6d51d69/cache-c091e548-b45a-49b4-b8ec-2cb5e27c7af6_ConversationMessages/part-95.bin]:
at
org.apache.ignite.internal.processors.cache.persistence.file.FileVersionCheckingFactory.createPageStore(FileVersionCheckingFactory.java:98)
at
org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.initDir(FilePageStoreManager.java:450)
at
org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.initForCache(FilePageStoreManager.java:411)
at
org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.initializeForCache(FilePageStoreManager.java:217)
at
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readCheckpointAndRestoreMemory(GridCacheDatabaseSharedManager.java:714)
at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onClusterStateChangeRequest(GridDhtPartitionsExchangeFuture.java:844)
at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:596)
at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2337)
at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.file.AccessDeniedException:
/home/bizruntime/ubiquitos/apache-ignite-fabric-2.4.0-bin/work/db/node02-119b5ab9-11ac-456b-8488-6ba8f6d51d69/cache-c091e548-b45a-49b4-b8ec-2cb5e27c7af6_ConversationMessages/part-95.bin
at
sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
at
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at
sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
at java.nio.channels.FileChannel.open(FileChannel.java:287)
at java.nio.channels.FileChannel.open(FileChannel.java:335)
at
org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIO.(RandomAccessFileIO.java:43)
at
org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory.create(RandomAccessFileIOFactory.java:42)
at
org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory.create(RandomAccessFileIOFactory.java:37)
at
org.apache.ignite.internal.processors.cache.persistence.file.FileVersionCheckingFactory.createPageStore(FileVersionCheckingFactory.java:78)
... 9 more
{noformat}
Or like this:
{noformat}
IgniteCheckedException: Error while creating file page store
[file=/home/apache-ignite-fabric-2.4.0-bin/work/db/node02-119b5ab9-11ac-456b-8488-6ba8f6d51d69/cache-c091e548-b45a-49b4-b8ec-2cb5e27c7af6_ConversationMessages/part-95.bin]:,
done=true, state=SRV, evtLatch=0,
remaining=[d62b8f0a-6ddc-4314-90f3-93ca24d44200], super=GridFutureAdapter
[ignoreInterrupts=false, state=DONE, res=java.lang.NullPointerException,
hash=812167532]]
class org.apache.ignite.IgniteCheckedException: null
at
org.apache.ignite.internal.util.IgniteUtils.cast(IgniteUtils.java:7244)
at
org.apache.ignite.internal.util.future.GridFutureAdapter.resolve(GridFutureAdapter.java:259)
at
org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:207)
at
org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:159)
at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2347)
at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyUpdate

[jira] [Created] (IGNITE-8725) Make IGNITE_DISABLE_WAL_DURING_REBALANCING enabled by default

2018-06-06 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-8725:


 Summary: Make IGNITE_DISABLE_WAL_DURING_REBALANCING enabled by 
default
 Key: IGNITE-8725
 URL: https://issues.apache.org/jira/browse/IGNITE-8725
 Project: Ignite
  Issue Type: Improvement
Reporter: Ilya Lantukh






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8693) SQL JOIN between PARTITIONED and REPLICATED cache fails

2018-06-04 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-8693:


 Summary: SQL JOIN between PARTITIONED and REPLICATED cache fails
 Key: IGNITE-8693
 URL: https://issues.apache.org/jira/browse/IGNITE-8693
 Project: Ignite
  Issue Type: Bug
Reporter: Ilya Lantukh
Assignee: Ilya Lantukh






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: I am sorry for causing confusion. IGNITE-6531

2018-05-28 Thread Ilya Lantukh
Welcome to Apache Ignite community!

The problem with your ticket is that it's not clear what value it adds to
the project and how it will improve the user experience in general. The problem
you described is rather narrow, and I am sure it's possible to find a
workaround without modifying Ignite code.
If you think that such an option can be helpful in more general cases, please
explain why and how.

Thanks!

On Mon, May 28, 2018 at 4:27 PM, JD Nam  wrote:

> Hello Igniter!
>
> First of all, I'm sorry. I was not aware of the ignite community process
> well, so I did not follow the guideline and caused the confusion to the
> community.
> My name is joungdal.nam and I made the issue as following link.
> https://issues.apache.org/jira/browse/IGNITE-6531
>
> Pleases allow me to introduce myself.
> I am a java developer working in e-Commerce in a small country in Asia.
> Currently, I am developing a Spring based framework (for our company domain
> only).
>
> We had a little problems in developing our program, especially Batch.
> Most java developers in the company for which I work are domain experts,
> they are not getting away from jar-hell
> There are a lot of problems associated with CPU and memory, etc., because
> many java batches that are starting and shutdown by the scheduler work like
> many beans of spring.
> It is not possible to separate a module (lib) because it takes too much
> time and resources.
>
> When I saw ignite I shouted Eureka.
> By applying Spring-boot & Ignite's ComputeTask, I thought I'd have to a
> Distribute Batch Platform with Daemon.
> The main concept of this platform is to assign a job to a node with the
> lowest load through the matrix information of the ignite cluster.
> Since all CRUD work is handled by Oracle, ignite's distributed computing
> seemed to be of great help (source can not be hdfs.)
> I do not want to shutdown manually, so, the client node is managed by
> initiating shutdown if the count of running nodes reaches the threshold
> value by using userAttribute. The version is named to jar file name and
> build time as postfix.
>
> Sorry. I'll get to the point. The reason why I created IGNITE-6531 is that
> the batch of my company has too many beans.
> Unfortunately, I am not able to tell which lib contains a particular bean.
> Moreover, I am not able to remove the lib even if I find the lib (the
> product may not be delivered to the customer)
> Certain beans only work in stg, prod in dev, qa, stg, prod (external
> communication issue or firewall problem)
> Due to the nature of our Traditional Business, instances of beans are
> created only under certain circumstances and cannot be injected in other
> environments.
>
> After all, the best way to do this is to treat it the same as the method of
> the spring framework.
> When injecting another spring bean in ComputeTask,  we could write it just
> like @AutoWired (required = false).
> I did not want to see the non-operation of the daemon because of a runtime
> exception due to a firewall problem.
> So, I thought “required = false” is required @SpringResource.
> Hope to hear your opinion on this.
> Once, again, I would like to apologize that I did not aware of the ignite
> process.
>
>
>
> I also have another question. I heard that there is an issue with
> configuration on the mailing list.
> How deploy node with different env, do you use it as below?
>
> In the java code,
> try (Ignite ignite = Ignition.start("/config/ignite.xml")) { // deault
> zone
> try (Ignite ignite = Ignition.start("/config/ignite-dev.xml")) { // dev?
> try (Ignite ignite = Ignition.start("/config/ignite-qa.xml")) { // qa?
> try (Ignite ignite = Ignition.start("/config/ignite-stg.xml")) { // stg??
>
> It will be a more user friendly settling by using yaml. Our project has
> over 300 properties
> If yaml is supported by ignite, it would be much more convenience. (If you
> do not have a plan to support spring-boot)
> I used ignite-default.yml, ignite-dev.yml, etc in my project. By
> distinguishing java -D options, codes are not revised and I only read
> configuration.
>
>
>
> I think there's a mistype in the document of the link below.
> It is the example of SqlFieldsQuery.
> https://apacheignite-sql.readme.io/docs/java-sql-api
> try (QueryCursor > cursor = cache.query (sql) {
>
> I think there should be one more ')' as below.
> = cache.query (sql))  {
>
> I'll try to read the contribute documentation to fix this issue.
> It's better if you fix it.
> Ignite is really cool, you guys are really great.
>



-- 
Best regards,
Ilya


Re: IEP-4, Phase 2. Using BL(A)T for in-memory caches.

2018-04-25 Thread Ilya Lantukh
Eduard,

I'm not sure I understand what you mean by "policy". Is it an interface
that will have a few default implementations, with users able to
create their own? If so, could you please write an example of such an
interface (how you see it) and describe how and when its methods will be invoked.
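
For instance, is it something along these lines? (A purely hypothetical
strawman just to make the question concrete - the names and signatures below
are mine, not a proposal.)

{code}
import java.util.Collection;
import java.util.UUID;

// Consulted on every topology change; decides whether (and after what delay)
// the affinity topology should be adjusted to the currently alive nodes.
interface TopologyChangePolicy {
    /** @return Delay in milliseconds before adjusting, or -1 to keep the current topology. */
    long onTopologyChanged(Collection<UUID> currentTopology, Collection<UUID> aliveNodes);
}

// A possible default: adjust automatically, but only after a grace period.
class DelayedAutoAdjustPolicy implements TopologyChangePolicy {
    private final long delayMs;

    DelayedAutoAdjustPolicy(long delayMs) {
        this.delayMs = delayMs;
    }

    @Override public long onTopologyChanged(Collection<UUID> currentTopology, Collection<UUID> aliveNodes) {
        boolean same = currentTopology.containsAll(aliveNodes) && aliveNodes.containsAll(currentTopology);

        return same ? -1 : delayMs;
    }
}
{code}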

On Wed, Apr 25, 2018 at 10:10 PM, Eduard Shangareev <
eduard.shangar...@gmail.com> wrote:

> Igniters,
> I have described the issue with current approach in "New definition for
> affinity node (issues with baseline)" topic[1].
>
> Now we have 2 different affinity topology (one for in-memory, another for
> persistent caches).
>
> It causes problems:
> - we lose (in general) co-location between different caches;
> - we can't avoid PME when non-BLAT node joins cluster;
> - implementation should consider 2 different approaches to affinity
> calculation.
>
> So, I suggest unifying behavior of in-memory and persistent caches.
> They should all use BLAT.
>
> Their behaviors were different because we couldn't guarantee the safety of
> in-memory data.
> It should be fixed by a new mechanism of BLAT changing policy which was
> already discussed there - "Triggering rebalancing on timeout or manually if
> the baseline topology is not reassembled" [2].
>
> And we should have a policy by default which similar to current one
> (add nodes, remove nodes automatically but after some reasonable delay
> [seconds]).
>
> After this change, we could stop using the term 'BLAT', Basline and so on.
> Because there would not be an alternative. So, it would be only one
> possible Affinity Topology.
>
>
> [1]
> http://apache-ignite-developers.2346864.n4.nabble.com/New-definition-for-
> affinity-node-issues-with-baseline-td29868.html
> [2]
> http://apache-ignite-developers.2346864.n4.nabble.
> com/Triggering-rebalancing-on-timeout-or-manually-if-the-
> baseline-topology-is-not-reassembled-td29299.html#none
>



-- 
Best regards,
Ilya


Re: Atomic caches

2018-04-23 Thread Ilya Lantukh
Val,

The logic you describe changed a while ago, with the introduction of persistent
storage and cache group mechanics. Currently there might exist multiple
special caches where data structures are stored. The main reason for that was
to give users the ability to put different structures into different cache
groups and configure which ones should be persistent and which shouldn't.
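
For example, something like this (a sketch; it assumes the
AtomicConfiguration.setGroupName(...) setter that came with the cache group
rework):

{code}
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteAtomicLong;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.AtomicConfiguration;

public class AtomicGroupsExample {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            // Atomics created with this configuration land in their own cache group,
            // which can then be configured independently of the default one
            // (e.g. placed in a persistent data region).
            AtomicConfiguration cfg = new AtomicConfiguration()
                .setGroupName("critical-atomics")
                .setBackups(2);

            IgniteAtomicLong counter = ignite.atomicLong("orderIdGen", cfg, 0, true);

            counter.incrementAndGet();
        }
    }
}
{code}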

On Wed, Apr 11, 2018 at 2:33 AM, Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:

> Guys,
>
> I also not sure I understand the purpose of methods like [1] that accept
> instance of AtomicConfiguration to create new atomic structure. Per my
> knowledge, all atomics are stored in a single cache which is configured by
> AtomicConfiguration provided on startup as part of IgniteConfiguration. If
> that's the case, providing another configuration for a particular atomic
> doesn't make sense because it will never be used.
>
> Any thoughts on this? Unless I'm missing something, I think we can
> deprecate these methods.
>
> [1]
> https://ignite.apache.org/releases/latest/javadoc/org/
> apache/ignite/Ignite.html#atomicLong-java.lang.String-org.apache.ignite.
> configuration.AtomicConfiguration-long-boolean-
>
> -Val
>
> On Tue, Apr 10, 2018 at 3:28 PM, Dmitriy Setrakyan 
> wrote:
>
> > On Tue, Apr 10, 2018 at 2:03 PM, akurbanov  wrote:
> >
> > > Dmitry,
> > >
> > > Sorry for confusing topic. I I'm pretty sure that configuration for
> > atomic
> > > caches is validated, will double-check this. I was referring only
> atomic
> > > data structures cache.
> > >
> >
> > Got it. We should definitely add validation for the atomic data
> structures
> > configuration as well.
> >
>



-- 
Best regards,
Ilya


Deprecate CacheRebalanceMode.NONE

2018-04-17 Thread Ilya Lantukh
Igniters,

While working on rebalancing optimizations, I've discovered that we have
very weird logic for CacheRebalanceMode.NONE. In this mode we always keep
partitions in the OWNING state even if their data is outdated, breaking our
internal invariants. In this mode every cache.get(...) leads to sending
ForceKeysRequests to the youngest owner - and it's the only case where
Ignite still needs the force-keys mechanics. To maintain data consistency in
this mode a user has to ensure that nodes never leave or fail, which is
impossible in the real world.

So, I suggest deprecating CacheRebalanceMode.NONE and removing it in Ignite
3.0 because:
a. It has no real use case. If a user wants control over the rebalancing
process, we provide other mechanics: baseline topology and manual
rebalancing (rebalanceDelay == -1); see the sketch below.
b. It adds too much complexity to our code and architecture.
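
For reference, the manual-rebalancing alternative from (a) looks roughly like
this (a minimal sketch using the existing CacheConfiguration knobs):

{code}
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheRebalanceMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class ManualRebalanceExample {
    public static void main(String[] args) {
        CacheConfiguration<Integer, String> ccfg = new CacheConfiguration<>("myCache");

        ccfg.setRebalanceMode(CacheRebalanceMode.ASYNC);
        ccfg.setRebalanceDelay(-1); // -1 disables automatic rebalancing on topology change

        try (Ignite ignite = Ignition.start(new IgniteConfiguration().setCacheConfiguration(ccfg))) {
            // Rebalancing is triggered explicitly, once the user decides the topology is stable.
            ignite.cache("myCache").rebalance().get();
        }
    }
}
{code}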

Please share your thoughts.

-- 
Best regards,
Ilya


Re: Rebalancing - how to make it faster

2018-04-09 Thread Ilya Lantukh
Igniters,

I am currently at the finish line of the
https://issues.apache.org/jira/browse/IGNITE-8017 ("Disable WAL during
initial preloading") implementation, and I need such behavior to be
configurable. In my intermediate implementation I have a parameter called
"disableWalDuringRebalancing" in IgniteConfiguration. Do you think such a
name is meaningful and self-explanatory? Do we need to ensure that it has
the same value on every node? Should I make it configurable per cache
rather than globally? A sketch of both shapes is below.
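
To make the question concrete, the two shapes would look roughly like this
(hypothetical setters - neither exists yet, this is exactly what I'm asking
about):

{code}
// Global variant: one flag for the whole node.
class IgniteConfigurationSketch {
    private boolean disableWalDuringRebalancing = true;

    IgniteConfigurationSketch setDisableWalDuringRebalancing(boolean disable) {
        this.disableWalDuringRebalancing = disable;

        return this;
    }

    boolean isDisableWalDuringRebalancing() {
        return disableWalDuringRebalancing;
    }
}

// Per-cache variant: the same flag, but on the cache configuration instead.
class CacheConfigurationSketch {
    private boolean disableWalDuringRebalancing;

    CacheConfigurationSketch setDisableWalDuringRebalancing(boolean disable) {
        this.disableWalDuringRebalancing = disable;

        return this;
    }

    boolean isDisableWalDuringRebalancing() {
        return disableWalDuringRebalancing;
    }
}
{code}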

Please share your thoughts.

On Mon, Apr 9, 2018 at 4:32 PM, Ilya Lantukh <ilant...@gridgain.com> wrote:

> Denis,
>
> Those ticket are rather complex, and so I don't know when I'll be able to
> start working on them.
>
> On Fri, Mar 30, 2018 at 11:45 PM, Denis Magda <dma...@apache.org> wrote:
>
>> Ilya,
>>
>> Just came across the IEP put together by you:
>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-16%
>> 3A+Optimization+of+rebalancing
>>
>> Excellent explanation, thanks for aggregating everything there.
>>
>> Two tickets below don't have a fixed version assigned:
>> https://issues.apache.org/jira/browse/IGNITE-8020
>> https://issues.apache.org/jira/browse/IGNITE-7935
>>
>> Do you plan to work on them in 2.6 time frame, right?
>>
>> --
>> Denis
>>
>> On Tue, Mar 27, 2018 at 9:29 AM, Denis Magda <dma...@apache.org> wrote:
>>
>> > Ilya, granted you all the required permissions. Please let me know if
>> you
>> > still have troubles with the wiki.
>> >
>> > --
>> > Denis
>> >
>> > On Tue, Mar 27, 2018 at 8:56 AM, Ilya Lantukh <ilant...@gridgain.com>
>> > wrote:
>> >
>> >> Unfortunately, I don't have permission to create page for IEP on wiki.
>> >> Denis, can you grant it? My username is ilantukh.
>> >>
>> >> On Mon, Mar 26, 2018 at 8:04 PM, Anton Vinogradov <a...@apache.org>
>> wrote:
>> >>
>> >> > >> It is impossible to disable WAL only for certain partitions
>> without
>> >> > >> completely overhauling design of Ignite storage mechanism. Right
>> now
>> >> we
>> >> > can
>> >> > >> afford only to change WAL mode per cache group.
>> >> >
>> >> > Cache group rebalancing is a one cache rebalancing, and then this
>> cache
>> >> > ("cache group") can be presented as a set of virtual caches.
>> >> > So, there is no issues for initial rebalancing.
>> >> > Lets disable WAL on initial rebalancing.
>> >> >
>> >> > 2018-03-26 16:46 GMT+03:00 Ilya Lantukh <ilant...@gridgain.com>:
>> >> >
>> >> > > Dmitry,
>> >> > > It is impossible to disable WAL only for certain partitions without
>> >> > > completely overhauling design of Ignite storage mechanism. Right
>> now
>> >> we
>> >> > can
>> >> > > afford only to change WAL mode per cache group.
>> >> > >
>> >> > > The idea is to disable WAL when node doesn't have any partition in
>> >> OWNING
>> >> > > state, which means it doesn't have any consistent data and won't be
>> >> able
>> >> > to
>> >> > > restore from WAL anyway. I don't see any potential use for WAL on
>> such
>> >> > > node, but we can keep a configurable parameter indicating can we
>> >> > > automatically disable WAL in such case or not.
>> >> > >
>> >> > > On Fri, Mar 23, 2018 at 10:40 PM, Dmitry Pavlov <
>> >> dpavlov@gmail.com>
>> >> > > wrote:
>> >> > >
>> >> > > > Denis, as I understood, there is and idea to exclude only
>> rebalanced
>> >> > > > partition(s) data. All other data will go to the WAL.
>> >> > > >
>> >> > > > Ilya, please correct me if I'm wrong.
>> >> > > >
>> >> > > > пт, 23 мар. 2018 г. в 22:15, Denis Magda <dma...@apache.org>:
>> >> > > >
>> >> > > > > Ilya,
>> >> > > > >
>> >> > > > > That's a decent boost (5-20%) even having WAL enabled. Not sure
>> >> that
>> >> > we
>> >> > > > > should stake on the WAL "off" mode here because if the whole
>> >> cluste

Re: Rebalancing - how to make it faster

2018-04-09 Thread Ilya Lantukh
Denis,

Those tickets are rather complex, and so I don't know when I'll be able to
start working on them.

On Fri, Mar 30, 2018 at 11:45 PM, Denis Magda <dma...@apache.org> wrote:

> Ilya,
>
> Just came across the IEP put together by you:
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-
> 16%3A+Optimization+of+rebalancing
>
> Excellent explanation, thanks for aggregating everything there.
>
> Two tickets below don't have a fixed version assigned:
> https://issues.apache.org/jira/browse/IGNITE-8020
> https://issues.apache.org/jira/browse/IGNITE-7935
>
> Do you plan to work on them in 2.6 time frame, right?
>
> --
> Denis
>
> On Tue, Mar 27, 2018 at 9:29 AM, Denis Magda <dma...@apache.org> wrote:
>
> > Ilya, granted you all the required permissions. Please let me know if you
> > still have troubles with the wiki.
> >
> > --
> > Denis
> >
> > On Tue, Mar 27, 2018 at 8:56 AM, Ilya Lantukh <ilant...@gridgain.com>
> > wrote:
> >
> >> Unfortunately, I don't have permission to create page for IEP on wiki.
> >> Denis, can you grant it? My username is ilantukh.
> >>
> >> On Mon, Mar 26, 2018 at 8:04 PM, Anton Vinogradov <a...@apache.org>
> wrote:
> >>
> >> > >> It is impossible to disable WAL only for certain partitions without
> >> > >> completely overhauling design of Ignite storage mechanism. Right
> now
> >> we
> >> > can
> >> > >> afford only to change WAL mode per cache group.
> >> >
> >> > Cache group rebalancing is a one cache rebalancing, and then this
> cache
> >> > ("cache group") can be presented as a set of virtual caches.
> >> > So, there is no issues for initial rebalancing.
> >> > Lets disable WAL on initial rebalancing.
> >> >
> >> > 2018-03-26 16:46 GMT+03:00 Ilya Lantukh <ilant...@gridgain.com>:
> >> >
> >> > > Dmitry,
> >> > > It is impossible to disable WAL only for certain partitions without
> >> > > completely overhauling design of Ignite storage mechanism. Right now
> >> we
> >> > can
> >> > > afford only to change WAL mode per cache group.
> >> > >
> >> > > The idea is to disable WAL when node doesn't have any partition in
> >> OWNING
> >> > > state, which means it doesn't have any consistent data and won't be
> >> able
> >> > to
> >> > > restore from WAL anyway. I don't see any potential use for WAL on
> such
> >> > > node, but we can keep a configurable parameter indicating can we
> >> > > automatically disable WAL in such case or not.
> >> > >
> >> > > On Fri, Mar 23, 2018 at 10:40 PM, Dmitry Pavlov <
> >> dpavlov@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > Denis, as I understood, there is and idea to exclude only
> rebalanced
> >> > > > partition(s) data. All other data will go to the WAL.
> >> > > >
> >> > > > Ilya, please correct me if I'm wrong.
> >> > > >
> >> > > > пт, 23 мар. 2018 г. в 22:15, Denis Magda <dma...@apache.org>:
> >> > > >
> >> > > > > Ilya,
> >> > > > >
> >> > > > > That's a decent boost (5-20%) even having WAL enabled. Not sure
> >> that
> >> > we
> >> > > > > should stake on the WAL "off" mode here because if the whole
> >> cluster
> >> > > goes
> >> > > > > down, it's then the data consistency is questionable. As an
> >> > architect,
> >> > > I
> >> > > > > wouldn't disable WAL for the sake of rebalancing; it's too
> risky.
> >> > > > >
> >> > > > > If you agree, then let's create the IEP. This way it will be
> >> easier
> >> > to
> >> > > > > track this endeavor. BTW, are you already ready to release any
> >> > > > > optimizations in 2.5 that is being discussed in a separate
> thread?
> >> > > > >
> >> > > > > --
> >> > > > > Denis
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > On Fri, Mar 23, 2018 at 6:37 AM, Ilya Lantukh <
> >> ilant...@gridgain.com
> >> > >
> >> > >

Re: Atomic caches

2018-04-09 Thread Ilya Lantukh
Anton,

Please do not use the term "atomic cache" for system caches that hold internal
data for atomic data structures. This is very confusing.

You are right, currently there is no logic that validates such cache
configurations. It definitely should be fixed. And having configuration
parameters encoded in the cache name, like it is currently implemented for
collections, is one of the most straightforward approaches.



On Mon, Apr 9, 2018 at 4:19 PM, akurbanov  wrote:

> I've attached .java file to https://issues.apache.org/
> jira/browse/IGNITE-2096
>
> Atomic references created with cacheMode=REPLICATED configuration are being
> lost when first references are created with cacheMode=PARTITIONED, no
> validation errors.
>
>
>
>
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
>



-- 
Best regards,
Ilya


Re: Rebalancing - how to make it faster

2018-03-27 Thread Ilya Lantukh
Unfortunately, I don't have permission to create a page for the IEP on the wiki.
Denis, can you grant it? My username is ilantukh.

On Mon, Mar 26, 2018 at 8:04 PM, Anton Vinogradov <a...@apache.org> wrote:

> >> It is impossible to disable WAL only for certain partitions without
> >> completely overhauling design of Ignite storage mechanism. Right now we
> can
> >> afford only to change WAL mode per cache group.
>
> Cache group rebalancing is a one cache rebalancing, and then this cache
> ("cache group") can be presented as a set of virtual caches.
> So, there is no issues for initial rebalancing.
> Lets disable WAL on initial rebalancing.
>
> 2018-03-26 16:46 GMT+03:00 Ilya Lantukh <ilant...@gridgain.com>:
>
> > Dmitry,
> > It is impossible to disable WAL only for certain partitions without
> > completely overhauling design of Ignite storage mechanism. Right now we
> can
> > afford only to change WAL mode per cache group.
> >
> > The idea is to disable WAL when node doesn't have any partition in OWNING
> > state, which means it doesn't have any consistent data and won't be able
> to
> > restore from WAL anyway. I don't see any potential use for WAL on such
> > node, but we can keep a configurable parameter indicating can we
> > automatically disable WAL in such case or not.
> >
> > On Fri, Mar 23, 2018 at 10:40 PM, Dmitry Pavlov <dpavlov@gmail.com>
> > wrote:
> >
> > > Denis, as I understood, there is and idea to exclude only rebalanced
> > > partition(s) data. All other data will go to the WAL.
> > >
> > > Ilya, please correct me if I'm wrong.
> > >
> > > пт, 23 мар. 2018 г. в 22:15, Denis Magda <dma...@apache.org>:
> > >
> > > > Ilya,
> > > >
> > > > That's a decent boost (5-20%) even having WAL enabled. Not sure that
> we
> > > > should stake on the WAL "off" mode here because if the whole cluster
> > goes
> > > > down, it's then the data consistency is questionable. As an
> architect,
> > I
> > > > wouldn't disable WAL for the sake of rebalancing; it's too risky.
> > > >
> > > > If you agree, then let's create the IEP. This way it will be easier
> to
> > > > track this endeavor. BTW, are you already ready to release any
> > > > optimizations in 2.5 that is being discussed in a separate thread?
> > > >
> > > > --
> > > > Denis
> > > >
> > > >
> > > >
> > > > On Fri, Mar 23, 2018 at 6:37 AM, Ilya Lantukh <ilant...@gridgain.com
> >
> > > > wrote:
> > > >
> > > > > Denis,
> > > > >
> > > > > > - Don't you want to aggregate the tickets under an IEP?
> > > > > Yes, I think so.
> > > > >
> > > > > > - Does it mean we're going to update our B+Tree implementation?
> Any
> > > > ideas
> > > > > how risky it is?
> > > > > One of tickets that I created (
> > > > > https://issues.apache.org/jira/browse/IGNITE-7935) involves B+Tree
> > > > > modification, but I am not planning to do it in the nearest future.
> > It
> > > > > shouldn't affect existing tree operations, only introduce new ones
> > > > (putAll,
> > > > > invokeAll, removeAll).
> > > > >
> > > > > > - Any chance you had a prototype that shows performance
> > optimizations
> > > > the
> > > > > approach you are suggesting to take?
> > > > > I have a prototype for simplest improvements (
> > > https://issues.apache.org/
> > > > > jira/browse/IGNITE-8019 & https://issues.apache.org/
> > > > > jira/browse/IGNITE-8018)
> > > > > - together they increase throughput by 5-20%, depending on
> > > configuration
> > > > > and environment. Also, I've tested different WAL modes - switching
> > from
> > > > > LOG_ONLY to NONE gives over 100% boost - this is what I expect from
> > > > > https://issues.apache.org/jira/browse/IGNITE-8017.
> > > > >
> > > > > On Thu, Mar 22, 2018 at 9:48 PM, Denis Magda <dma...@apache.org>
> > > wrote:
> > > > >
> > > > > > Ilya,
> > > > > >
> > > > > > That's outstanding research and summary. Thanks for spending your
> > > time
> > > > on
> > > > > > this.

Re: Rebalancing - how to make it faster

2018-03-26 Thread Ilya Lantukh
Dmitry,
It is impossible to disable WAL only for certain partitions without
completely overhauling the design of the Ignite storage mechanism. Right now we
can only afford to change the WAL mode per cache group.

The idea is to disable WAL when a node doesn't have any partition in the OWNING
state, which means it doesn't have any consistent data and won't be able to
restore from WAL anyway. I don't see any potential use for WAL on such a
node, but we can keep a configurable parameter indicating whether we can
automatically disable WAL in such a case or not.

On Fri, Mar 23, 2018 at 10:40 PM, Dmitry Pavlov <dpavlov@gmail.com>
wrote:

> Denis, as I understood, there is and idea to exclude only rebalanced
> partition(s) data. All other data will go to the WAL.
>
> Ilya, please correct me if I'm wrong.
>
> пт, 23 мар. 2018 г. в 22:15, Denis Magda <dma...@apache.org>:
>
> > Ilya,
> >
> > That's a decent boost (5-20%) even having WAL enabled. Not sure that we
> > should stake on the WAL "off" mode here because if the whole cluster goes
> > down, it's then the data consistency is questionable. As an architect, I
> > wouldn't disable WAL for the sake of rebalancing; it's too risky.
> >
> > If you agree, then let's create the IEP. This way it will be easier to
> > track this endeavor. BTW, are you already ready to release any
> > optimizations in 2.5 that is being discussed in a separate thread?
> >
> > --
> > Denis
> >
> >
> >
> > On Fri, Mar 23, 2018 at 6:37 AM, Ilya Lantukh <ilant...@gridgain.com>
> > wrote:
> >
> > > Denis,
> > >
> > > > - Don't you want to aggregate the tickets under an IEP?
> > > Yes, I think so.
> > >
> > > > - Does it mean we're going to update our B+Tree implementation? Any
> > ideas
> > > how risky it is?
> > > One of tickets that I created (
> > > https://issues.apache.org/jira/browse/IGNITE-7935) involves B+Tree
> > > modification, but I am not planning to do it in the nearest future. It
> > > shouldn't affect existing tree operations, only introduce new ones
> > (putAll,
> > > invokeAll, removeAll).
> > >
> > > > - Any chance you had a prototype that shows performance optimizations
> > the
> > > approach you are suggesting to take?
> > > I have a prototype for simplest improvements (
> https://issues.apache.org/
> > > jira/browse/IGNITE-8019 & https://issues.apache.org/
> > > jira/browse/IGNITE-8018)
> > > - together they increase throughput by 5-20%, depending on
> configuration
> > > and environment. Also, I've tested different WAL modes - switching from
> > > LOG_ONLY to NONE gives over 100% boost - this is what I expect from
> > > https://issues.apache.org/jira/browse/IGNITE-8017.
> > >
> > > On Thu, Mar 22, 2018 at 9:48 PM, Denis Magda <dma...@apache.org>
> wrote:
> > >
> > > > Ilya,
> > > >
> > > > That's outstanding research and summary. Thanks for spending your
> time
> > on
> > > > this.
> > > >
> > > > Not sure I have enough expertise to challenge your approach, but it
> > > sounds
> > > > 100% reasonable to me. As side notes:
> > > >
> > > >- Don't you want to aggregate the tickets under an IEP?
> > > >- Does it mean we're going to update our B+Tree implementation?
> Any
> > > >ideas how risky it is?
> > > >- Any chance you had a prototype that shows performance
> > optimizations
> > > of
> > > >the approach you are suggesting to take?
> > > >
> > > > --
> > > > Denis
> > > >
> > > > On Thu, Mar 22, 2018 at 8:38 AM, Ilya Lantukh <ilant...@gridgain.com
> >
> > > > wrote:
> > > >
> > > > > Igniters,
> > > > >
> > > > > I've spent some time analyzing performance of rebalancing process.
> > The
> > > > > initial goal was to understand, what limits it's throughput,
> because
> > it
> > > > is
> > > > > significantly slower than network and storage device can
> > theoretically
> > > > > handle.
> > > > >
> > > > > Turns out, our current implementation has a number of issues caused
> > by
> > > a
> > > > > single fundamental problem.
> > > > >
> > > > > During rebalance data is sent in batches called
> > > > > GridDhtPartitionSupplyMessages. Batch size i

Re: Rebalancing - how to make it faster

2018-03-23 Thread Ilya Lantukh
Denis,

> - Don't you want to aggregate the tickets under an IEP?
Yes, I think so.

> - Does it mean we're going to update our B+Tree implementation? Any ideas
how risky it is?
One of tickets that I created (
https://issues.apache.org/jira/browse/IGNITE-7935) involves B+Tree
modification, but I am not planning to do it in the nearest future. It
shouldn't affect existing tree operations, only introduce new ones (putAll,
invokeAll, removeAll).

> - Any chance you had a prototype that shows performance optimizations the
approach you are suggesting to take?
I have a prototype for simplest improvements (https://issues.apache.org/
jira/browse/IGNITE-8019 & https://issues.apache.org/jira/browse/IGNITE-8018)
- together they increase throughput by 5-20%, depending on configuration
and environment. Also, I've tested different WAL modes - switching from
LOG_ONLY to NONE gives over 100% boost - this is what I expect from
https://issues.apache.org/jira/browse/IGNITE-8017.

On Thu, Mar 22, 2018 at 9:48 PM, Denis Magda <dma...@apache.org> wrote:

> Ilya,
>
> That's outstanding research and summary. Thanks for spending your time on
> this.
>
> Not sure I have enough expertise to challenge your approach, but it sounds
> 100% reasonable to me. As side notes:
>
>- Don't you want to aggregate the tickets under an IEP?
>- Does it mean we're going to update our B+Tree implementation? Any
>ideas how risky it is?
>- Any chance you had a prototype that shows performance optimizations of
>the approach you are suggesting to take?
>
> --
> Denis
>
> On Thu, Mar 22, 2018 at 8:38 AM, Ilya Lantukh <ilant...@gridgain.com>
> wrote:
>
> > Igniters,
> >
> > I've spent some time analyzing performance of rebalancing process. The
> > initial goal was to understand, what limits it's throughput, because it
> is
> > significantly slower than network and storage device can theoretically
> > handle.
> >
> > Turns out, our current implementation has a number of issues caused by a
> > single fundamental problem.
> >
> > During rebalance data is sent in batches called
> > GridDhtPartitionSupplyMessages. Batch size is configurable, default
> value
> > is 512KB, which could mean thousands of key-value pairs. However, we
> don't
> > take any advantage over this fact and process each entry independently:
> > - checkpointReadLock is acquired multiple times for every entry, leading
> to
> > unnecessary contention - this is clearly a bug;
> > - for each entry we write (and fsync, if configuration assumes it) a
> > separate WAL record - so, if batch contains N entries, we might end up
> > doing N fsyncs;
> > - adding every entry into CacheDataStore also happens completely
> > independently. It means, we will traverse and modify each index tree N
> > times, we will allocate space in FreeList N times and we will have to
> > additionally store in WAL O(N*log(N)) page delta records.
> >
> > I've created a few tickets in JIRA with very different levels of scale
> and
> > complexity.
> >
> > Ways to reduce impact of independent processing:
> > - https://issues.apache.org/jira/browse/IGNITE-8019 - aforementioned
> bug,
> > causing contention on checkpointReadLock;
> > - https://issues.apache.org/jira/browse/IGNITE-8018 - inefficiency in
> > GridCacheMapEntry implementation;
> > - https://issues.apache.org/jira/browse/IGNITE-8017 - automatically
> > disable
> > WAL during preloading.
> >
> > Ways to solve problem on more global level:
> > - https://issues.apache.org/jira/browse/IGNITE-7935 - a ticket to
> > introduce
> > batch modification;
> > - https://issues.apache.org/jira/browse/IGNITE-8020 - complete redesign
> of
> > rebalancing process for persistent caches, based on file transfer.
> >
> > Everyone is welcome to criticize above ideas, suggest new ones or
> > participate in implementation.
> >
> > --
> > Best regards,
> > Ilya
> >
>



-- 
Best regards,
Ilya


Rebalancing - how to make it faster

2018-03-22 Thread Ilya Lantukh
Igniters,

I've spent some time analyzing the performance of the rebalancing process. The
initial goal was to understand what limits its throughput, because it is
significantly slower than what the network and storage device can theoretically
handle.

It turns out that our current implementation has a number of issues caused by a
single fundamental problem.

During rebalance, data is sent in batches called
GridDhtPartitionSupplyMessages. Batch size is configurable; the default value
is 512KB, which could mean thousands of key-value pairs. However, we don't
take any advantage of this fact and process each entry independently (see the
sketch after this list):
- checkpointReadLock is acquired multiple times for every entry, leading to
unnecessary contention - this is clearly a bug;
- for each entry we write (and fsync, if the configuration requires it) a
separate WAL record - so, if a batch contains N entries, we might end up
doing N fsyncs;
- adding every entry into CacheDataStore also happens completely
independently. It means we will traverse and modify each index tree N
times, allocate space in the FreeList N times, and additionally store
O(N*log(N)) page delta records in WAL.
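
A rough sketch of the direction these observations point to (a toy model with
hypothetical names, not the real GridDhtPartitionDemander/CacheDataStore code):
take the checkpoint read lock and log a WAL record once per supply message
instead of once per entry, and apply the whole batch to the data store in one
pass.

{code}
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class BatchedPreloaderSketch<K, V> {
    private final ReentrantReadWriteLock checkpointLock = new ReentrantReadWriteLock();
    private final Map<K, V> store;  // stands in for CacheDataStore
    private final List<String> wal; // stands in for the WAL manager

    BatchedPreloaderSketch(Map<K, V> store, List<String> wal) {
        this.store = store;
        this.wal = wal;
    }

    // Current behavior (as described in the list above): lock, WAL record and tree update per entry.
    void preloadEntryByEntry(Map<K, V> supplyBatch) {
        for (Map.Entry<K, V> e : supplyBatch.entrySet()) {
            checkpointLock.readLock().lock();

            try {
                wal.add("DataRecord " + e.getKey());
                store.put(e.getKey(), e.getValue());
            }
            finally {
                checkpointLock.readLock().unlock();
            }
        }
    }

    // Proposed direction: one lock acquisition and one batch WAL record per supply message.
    void preloadBatch(Map<K, V> supplyBatch) {
        checkpointLock.readLock().lock();

        try {
            wal.add("BatchDataRecord size=" + supplyBatch.size());
            store.putAll(supplyBatch); // stands in for a batch B+ tree update (putAll/invokeAll)
        }
        finally {
            checkpointLock.readLock().unlock();
        }
    }
}
{code}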

I've created a few tickets in JIRA with very different levels of scale and
complexity.

Ways to reduce impact of independent processing:
- https://issues.apache.org/jira/browse/IGNITE-8019 - aforementioned bug,
causing contention on checkpointReadLock;
- https://issues.apache.org/jira/browse/IGNITE-8018 - inefficiency in
GridCacheMapEntry implementation;
- https://issues.apache.org/jira/browse/IGNITE-8017 - automatically disable
WAL during preloading.

Ways to solve problem on more global level:
- https://issues.apache.org/jira/browse/IGNITE-7935 - a ticket to introduce
batch modification;
- https://issues.apache.org/jira/browse/IGNITE-8020 - complete redesign of
rebalancing process for persistent caches, based on file transfer.

Everyone is welcome to criticize above ideas, suggest new ones or
participate in implementation.

-- 
Best regards,
Ilya


[jira] [Created] (IGNITE-8020) Rebalancing for persistent cache should transfer file store over network instead of using existing supply/demand protocol

2018-03-22 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-8020:


 Summary: Rebalancing for persistent cache should transfer file 
store over network instead of using existing supply/demand protocol
 Key: IGNITE-8020
 URL: https://issues.apache.org/jira/browse/IGNITE-8020
 Project: Ignite
  Issue Type: Improvement
  Components: persistence
Reporter: Ilya Lantukh
Assignee: Ilya Lantukh






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8019) During rebalancing checkpoint read lock is acquired for each entry (twice).

2018-03-22 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-8019:


 Summary: During rebalancing checkpoint read lock is acquired for 
each entry (twice).
 Key: IGNITE-8019
 URL: https://issues.apache.org/jira/browse/IGNITE-8019
 Project: Ignite
  Issue Type: Bug
Reporter: Ilya Lantukh
Assignee: Ilya Lantukh


In GridDhtPartitionDemander.preloadEntry(...) checkpointReadLock is acquired 
twice, which is definitely a mistake. But even acquiring it once per entry 
causes unnecessary contention - it should be taken once per SupplyMessage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8018) Avoid unnecessary unswap(...) call in GridCacheMapEntry.initialValue(...)

2018-03-22 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-8018:


 Summary: Avoid unnecessary unswap(...) call in 
GridCacheMapEntry.initialValue(...)
 Key: IGNITE-8018
 URL: https://issues.apache.org/jira/browse/IGNITE-8018
 Project: Ignite
  Issue Type: Improvement
Reporter: Ilya Lantukh


The GridCacheMapEntry.initialValue(...) method executes unswap(...) to read the 
entry version from PageMemory, but then it also executes storeValue(...), which 
writes a new value if the entry is obsolete. It means that the same entry is 
searched for in PageMemory twice, which is not optimal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8017) Disable WAL during initial preloading

2018-03-22 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-8017:


 Summary: Disable WAL during initial preloading
 Key: IGNITE-8017
 URL: https://issues.apache.org/jira/browse/IGNITE-8017
 Project: Ignite
  Issue Type: Improvement
Reporter: Ilya Lantukh
Assignee: Ilya Lantukh






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

2018-03-22 Thread Ilya Lantukh
+1 for fixing LOG_ONLY. If the current implementation doesn't protect from data
corruption, it doesn't make sense.

On Wed, Mar 21, 2018 at 10:38 PM, Denis Magda  wrote:

> +1 for the fix of LOG_ONLY
>
> On Wed, Mar 21, 2018 at 11:23 AM, Alexey Goncharuk <
> alexey.goncha...@gmail.com> wrote:
>
> > +1 for fixing LOG_ONLY to enforce corruption safety given the provided
> > performance results.
> >
> > 2018-03-21 18:20 GMT+03:00 Vladimir Ozerov :
> >
> > > +1 for accepting drop in LOG_ONLY. 7% is not that much and not a drop
> at
> > > all, provided that we fixing a bug. I.e. should we implement it
> correctly
> > > in the first place we would never notice any "drop".
> > > I do not understand why someone would like to use current broken mode.
> > >
> > > On Wed, Mar 21, 2018 at 6:11 PM, Dmitry Pavlov 
> > > wrote:
> > >
> > > > Hi, I think option 1 is better. As Val said any mode that allows
> > > corruption
> > > > does not make much sense.
> > > >
> > > > What Ivan mentioned here as drop, in relation to old mode DEFAULT
> > (FSYNC
> > > > now), is still significant perfromance boost.
> > > >
> > > > Sincerely,
> > > > Dmitriy Pavlov
> > > >
> > > > ср, 21 мар. 2018 г. в 17:56, Ivan Rakov :
> > > >
> > > > > I've attached benchmark results to the JIRA ticket.
> > > > > We observe ~7% drop in "fair" LOG_ONLY_SAFE mode, independent of
> WAL
> > > > > compaction enabled flag. It's pretty significant drop: WAL
> compaction
> > > > > itself gives only ~3% drop.
> > > > >
> > > > > I see two options here:
> > > > > 1) Change LOG_ONLY behavior. That implies that we'll be ready to
> > > release
> > > > > AI 2.5 with 7% drop.
> > > > > 2) Introduce LOG_ONLY_SAFE, make it default, add release note to AI
> > 2.5
> > > > > that we added power loss durability in default mode, but user may
> > > > > fallback to previous LOG_ONLY in order to retain performance.
> > > > >
> > > > > Thoughts?
> > > > >
> > > > > Best Regards,
> > > > > Ivan Rakov
> > > > >
> > > > > On 20.03.2018 16:00, Ivan Rakov wrote:
> > > > > > Val,
> > > > > >
> > > > > >> If a storage is in
> > > > > >> corrupted state, does it mean that it needs to be completely
> > removed
> > > > and
> > > > > >> cluster needs to be restarted without data?
> > > > > >
> > > > > > Yes, there's a chance that in LOG_ONLY all local data will be
> lost,
> > > > > > but only in *power loss**/ OS crash* case.
> > > > > > kill -9, JVM crash, death of critical system thread and all other
> > > > > > cases that usually take place are variations of *process crash*.
> > All
> > > > > > WAL modes (except NONE, of course) ensure corruption-safety in
> case
> > > of
> > > > > > process crash.
> > > > > >
> > > > > >> If so, I'm not sure any mode
> > > > > >> that allows corruption makes much sense to me.
> > > > > > It depends on performance impact of enforcing power-loss
> corruption
> > > > > > safety. Price of full protection from power loss is high - FSYNC
> is
> > > > > > way slower (2-10 times) than other WAL modes. The question is
> > whether
> > > > > > ensuring weaker guarantees (corruption can't happen, but loss of
> > last
> > > > > > updates can) will affect performance as badly as strong
> guarantees.
> > > > > > I'll share benchmark results soon.
> > > > > >
> > > > > > Best Regards,
> > > > > > Ivan Rakov
> > > > > >
> > > > > > On 20.03.2018 5:09, Valentin Kulichenko wrote:
> > > > > >> Guys,
> > > > > >>
> > > > > >> What do we understand under "data corruption" here? If a storage
> > is
> > > in
> > > > > >> corrupted state, does it mean that it needs to be completely
> > removed
> > > > and
> > > > > >> cluster needs to be restarted without data? If so, I'm not sure
> > any
> > > > mode
> > > > > >> that allows corruption makes much sense to me. How am I supposed
> > to
> > > > > >> use a
> > > > > >> database, if virtually any failure can end with complete loss of
> > > data?
> > > > > >>
> > > > > >> In any case, this definitely should not be a default behavior.
> If
> > > > > >> user ever
> > > > > >> switches to corruption-unsafe mode, there should be a clear
> > warning
> > > > > >> about
> > > > > >> this.
> > > > > >>
> > > > > >> -Val
> > > > > >>
> > > > > >> On Fri, Mar 16, 2018 at 1:06 AM, Ivan Rakov <
> > ivan.glu...@gmail.com>
> > > > > >> wrote:
> > > > > >>
> > > > > >>> Ticket to track changes:
> > > > > >>> https://issues.apache.org/jira/browse/IGNITE-7754
> > > > > >>>
> > > > > >>> Best Regards,
> > > > > >>> Ivan Rakov
> > > > > >>>
> > > > > >>>
> > > > > >>> On 16.03.2018 10:58, Dmitriy Setrakyan wrote:
> > > > > >>>
> > > > >  On Fri, Mar 16, 2018 at 12:55 AM, Ivan Rakov <
> > > ivan.glu...@gmail.com
> > > > >
> > > > >  wrote:
> > > > > 
> > > > >  Vladimir,
> > > > > > Unlike BACKGROUND, LOG_ONLY provides strict write guarantees
> > > > > > unless power
> > > > > > loss has happened.
> > > > > > Seems like we need to 

[jira] [Created] (IGNITE-7961) Rebalance throughput requires optimization

2018-03-15 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-7961:


 Summary: Rebalance throughput requires optimization
 Key: IGNITE-7961
 URL: https://issues.apache.org/jira/browse/IGNITE-7961
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 2.4
Reporter: Ilya Lantukh
Assignee: Ilya Lantukh






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-7936) Make partition state transfer from OWNING state possible only within exchange.

2018-03-13 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-7936:


 Summary: Make partition state transfer from OWNING state possible 
only within exchange.
 Key: IGNITE-7936
 URL: https://issues.apache.org/jira/browse/IGNITE-7936
 Project: Ignite
  Issue Type: Improvement
Reporter: Ilya Lantukh


Currently a partition is moved from the OWNING to the RENTING state right after 
the exchange, which makes it necessary to handle races while mapping distributed 
operations. This can be improved, simplified and optimized by performing such a 
transfer within the exchange.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-7934) Do not checkpoint pages from MOVING partitions

2018-03-13 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-7934:


 Summary: Do not checkpoint pages from MOVING partitions
 Key: IGNITE-7934
 URL: https://issues.apache.org/jira/browse/IGNITE-7934
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 2.4
Reporter: Ilya Lantukh
Assignee: Ilya Lantukh


To remove the necessity to acquire the checkpoint lock and write records to WAL 
while processing supply messages, it might be beneficial to skip pages from 
MOVING partitions during checkpointing.

The only problem this might cause is index consistency, which can be avoided 
by splitting rebalancing into 2 steps:
1. data preloading;
2. index rebuild.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-7935) Add batch put operation to B+ tree implementation.

2018-03-13 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-7935:


 Summary: Add batch put operation to B+ tree implementation.
 Key: IGNITE-7935
 URL: https://issues.apache.org/jira/browse/IGNITE-7935
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 2.4
Reporter: Ilya Lantukh






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-7890) Node start with corrupted pds hangs indefinitely.

2018-03-06 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-7890:


 Summary: Node start with corrupted pds hangs indefinitely.
 Key: IGNITE-7890
 URL: https://issues.apache.org/jira/browse/IGNITE-7890
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.3
Reporter: Ilya Lantukh
Assignee: Ilya Lantukh


Starting node with corrupted PDS + WAL leads to cluster-wide hang-up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-7857) Extend IgniteWalFlushMultiNodeFailoverAbstractSelfTest to cover MMAP mode

2018-03-01 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-7857:


 Summary: Extend IgniteWalFlushMultiNodeFailoverAbstractSelfTest to 
cover MMAP mode
 Key: IGNITE-7857
 URL: https://issues.apache.org/jira/browse/IGNITE-7857
 Project: Ignite
  Issue Type: Improvement
Reporter: Ilya Lantukh
Assignee: Ilya Lantukh






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-7831) Throw Exceptions instead of AssertionErrors when reading from corrupted persistence

2018-02-27 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-7831:


 Summary: Throw Exceptions instead of AssertionErrors when reading 
from corrupted persistence
 Key: IGNITE-7831
 URL: https://issues.apache.org/jira/browse/IGNITE-7831
 Project: Ignite
  Issue Type: Improvement
Reporter: Ilya Lantukh






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-7767) Update WALMode configuration in .NET

2018-02-20 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-7767:


 Summary: Update WALMode configuration in .NET
 Key: IGNITE-7767
 URL: https://issues.apache.org/jira/browse/IGNITE-7767
 Project: Ignite
  Issue Type: Bug
Reporter: Ilya Lantukh
Assignee: Ilya Lantukh


In https://issues.apache.org/jira/browse/IGNITE-7594 we changed values for 
WALMode enum, but .NET code wasn't modified accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-7754) WAL in LOG_ONLY mode doesn't execute fsync on checkpoint begin

2018-02-19 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-7754:


 Summary: WAL in LOG_ONLY mode doesn't execute fsync on checkpoint 
begin
 Key: IGNITE-7754
 URL: https://issues.apache.org/jira/browse/IGNITE-7754
 Project: Ignite
  Issue Type: Bug
Reporter: Ilya Lantukh
Assignee: Ilya Lantukh


On checkpoint begin the method IgniteWriteAheadLogManager.fsync(WALPointer ptr) 
will be called, but it won't actually perform an fsync because the mode isn't FSYNC. 
This might lead to LFS corruption if the OS or hardware fails before the checkpoint 
has finished.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Apache Ignite 2.4 release

2018-02-16 Thread Ilya Lantukh
I am OK with either FSYNC or STRICT variant.

LOG_ONLY name means "log without fsync".

On Fri, Feb 16, 2018 at 4:05 PM, Dmitriy Setrakyan 
wrote:

> On Fri, Feb 16, 2018 at 7:02 AM, Ivan Rakov  wrote:
>
> > Why create a new term to define something that has already been defined?
> >>
> > That makes sense. I'm ok with FSYNC.
> > Anton, I don't understand why we should rename LOG_ONLY to SYNC. We
> > started this discussion with bad naming of DEFAULT, but this has nothing
> to
> > do with LOG_ONLY (even though it may be scientific - but SYNC sounds
> > scientific as well).
> >
>
> I agree with Ivan, we should not go wild with renaming. However, I would
> like to find out what is the meaning behind the LOG_ONLY name. Can someone
> explain?
>
> D.
>



-- 
Best regards,
Ilya


Re: Apache Ignite 2.4 release

2018-02-16 Thread Ilya Lantukh
Hi all,

I'd like to suggest to change default WALMode. Currently we have:
DEFAULT (write and fsync),
LOG_ONLY (write without fsync),
BACKGROUND,
NONE.

It turns out that fsyncs in the current DEFAULT mode significantly restrict
Ignite performance. Compared to LOG_ONLY, it offers the additional guarantee
that data won't be lost in case of an OS or hardware failure, but such a
guarantee isn't needed very often, and the tradeoff is too big.

I suggest renaming the current DEFAULT to STRICT and making LOG_ONLY the new
default mode. We can keep DEFAULT as @Deprecated and treat it as STRICT, so
that users with old configs will keep the same behaviour.
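
For reference, a minimal sketch of how the mode is picked explicitly with the
DataStorageConfiguration API (the example class name is arbitrary; which enum
constants exist will depend on the outcome of this discussion):

{noformat}
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.WALMode;

public class WalModeExample {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        DataStorageConfiguration storageCfg = new DataStorageConfiguration();

        // WAL mode only matters for persistent data regions.
        storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

        // Log WAL records on commit, but don't fsync on every commit.
        storageCfg.setWalMode(WALMode.LOG_ONLY);

        cfg.setDataStorageConfiguration(storageCfg);

        try (Ignite ignite = Ignition.start(cfg)) {
            ignite.cluster().active(true); // persistent clusters require activation
        }
    }
}
{noformat}

Users who do need the stronger guarantee would set the strict/fsync mode in the
same place.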

What do you think?

On Fri, Feb 16, 2018 at 12:35 AM, Denis Magda  wrote:

> Vladimir,
>
> I would suggest not to do this because we still need to spend time on
> testing, documentation, etc. If someone shows interest in this features
> they can assemble binaries from the master.
>
> --
> Denis
>
> On Thu, Feb 15, 2018 at 6:43 AM, Nikolay Izhikov 
> wrote:
>
> > +1
> >
> > В Чт, 15/02/2018 в 17:27 +0300, Vladimir Ozerov пишет:
> > > Igniters,
> > >
> > > AI 2.4 release was shifted a bit and over this time we implemented two
> > > important SQL features:
> > > 1) COPY command for fast file upload to the cluster [1]
> > > 2) Streaming mode for thin driver [2]
> > >
> > > Both commands are very important for fast data ingestion into Ignite
> > > through SQL. I would like to ask community to consider to include these
> > two
> > > features into AI 2.4 in *experimental* state because both of them will
> be
> > > improved in various ways in the nearest time. If we do so, we will be
> > able
> > > to collect some feedback from the users before AI 2.5 release. What do
> > you
> > > think?
> > >
> > > Vladimir.
> > >
> > > [1] https://issues.apache.org/jira/browse/IGNITE-6917
> > > [2] https://issues.apache.org/jira/browse/IGNITE-7253
> > >
> > > On Tue, Feb 13, 2018 at 1:22 AM, Dmitriy Setrakyan <
> > dsetrak...@apache.org>
> > > wrote:
> > >
> > > > On Mon, Feb 12, 2018 at 9:22 AM, Dmitry Pavlov <
> dpavlov@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Unfortunately, a quick fix did not give us too much performance
> > boost.
> > > > >
> > > > > I'm going to implement a complete algorithm change for storing the
> > page
> > > > > identifier. But this change is quite significant and will require
> > > > > re-testing. I suggest including
> > > > > https://issues.apache.org/jira/browse/IGNITE-7638 in the next
> > version,
> > > >
> > > > for
> > > > > example, to 2.5.
> > > > >
> > > > > Sincerely,
> > > > > Dmitriy Pavlov
> > > > >
> > > > >
> > > > >
> > > >
> > > > Dmitriy, thanks for the update! Are there other tickets that are
> > holding
> > > > the release at this point? I remember that there was a performance
> > > > degradation issue in FULL_SYNC mode, but I cannot find a ticket.
> > > >
> > > > D.
> > > >
> >
>



-- 
Best regards,
Ilya


Re: Author in header of class

2018-02-14 Thread Ilya Lantukh
I agree with Vyacheslav and Anton Vinogradov. Using the @author tag can be
misleading; Upsource will provide more up-to-date information.

On Wed, Feb 14, 2018 at 3:06 PM, Anton Vinogradov 
wrote:

> +1 to Upsource usage.
>
> Checked that code ownership provided by Upsource is correct.
>
> On Wed, Feb 14, 2018 at 2:22 PM, Vyacheslav Daradur 
> wrote:
>
> > Dmitry,
> >
> > > - easy find reviewer and (Patch available queue),
> > > - find test author / mainteiner (MTCGA)>.
> > Upsource suggests reviewers based on changed classes maintainers when
> > we create pull request review.
> > Moreover, it shows all authors of each class in PR, with showing a
> > percentage of contributing.
> >
> > IMO using @author tag will be easy-to-use only with "little" classes,
> > because often hard to identify maintainer of "big" classes because of
> > great number of changes.
> >
> > Also, it will be preferable to send a review request to dev-list, if
> > maintainer retired and doesn't subscribe dev-list anymore.
> >
> > On Wed, Feb 14, 2018 at 1:38 PM, Dmitry Pavlov 
> > wrote:
> > > Hi Folks,
> > >
> > > Listing from git annotate are not informative because there are many
> > > contributors change same place of code.
> > >
> > > Code author information can help us to solve 2 issues for new community
> > > members:
> > > - easy find reviewer and (Patch available queue),
> > > - find test author / mainteiner (MTCGA).
> > >
> > > So I strongly like Anton K. proposal. It's pity that author tag is
> banned
> > > by Apache.
> > >
> > > Let's think how we can solve these 2 issues without author tag.
> > >
> > > One more point: IMHO we should remove selection of 2+ mainterners or
> > > underline one (default) contact point for area. Necessity to choise may
> > > confuse newcomer. One main (default) mainteiner will redirect review.
> > >
> > > Sincererely,
> > > Dmitriy Pavlov
> > >
> > > ср, 14 февр. 2018 г. в 13:10, Anton Vinogradov <
> avinogra...@gridgain.com
> > >:
> > >
> > >> Anton,
> > >>
> > >> All listed info can be gained from git history.
> > >>
> > >> On Wed, Feb 14, 2018 at 1:01 PM, Дмитрий Рябов  >
> > >> wrote:
> > >>
> > >> > For the first profit you should see maintainer table [1].
> > >> >
> > >> > [1]
> > >> > https://cwiki.apache.org/confluence/display/IGNITE/How+
> > >> > to+Contribute#HowtoContribute-ReviewProcessandMaintainers
> > >> >
> > >> > 2018-02-14 11:42 GMT+03:00 Vyacheslav Daradur  >:
> > >> >
> > >> > > Hi Anton,
> > >> > >
> > >> > > Apache Ignite is licensed under the Apache License Version 2.0,
> > which
> > >> > > does not allow to use author tag in java code [1].
> > >> > >
> > >> > > [1] http://directory.apache.org/fortress/coding-standards.
> > >> > > html#classinterface-headers
> > >> > >
> > >> > > On Wed, Feb 14, 2018 at 11:23 AM, Антон Калашников <
> > kaa@yandex.ru>
> > >> > > wrote:
> > >> > > > Hello Ignite Community!
> > >> > > >
> > >> > > > My name is Anton. I joined to community some time ago and I want
> > to
> > >> > > contribute to Apache Ignite.
> > >> > > >
> > >> > > > I would be want to make my first proposal. I noticed that Ignite
> > >> don't
> > >> > > have author description in header of classes unlike many other
> > apache
> > >> > > projects.
> > >> > > > I propose to use javadoc tag @author in header of class when you
> > >> > created
> > >> > > it  and also add extra tag @author when you do many changes in
> this
> > >> > class.
> > >> > > >
> > >> > > > Profits of this aproach, in my opinion:
> > >> > > > 1) You always know who has knowledge of this class unlike git
> > >> annotate
> > >> > > which are sensetive in refactoring, moving, etc.
> > >> > > > 2) It will highed responsibilty for code quality because nobody
> > want
> > >> to
> > >> > > sign under bad code)
> > >> > > >
> > >> > > > Example:
> > >> > > > /**
> > >> > > >  * @author Anton Kalashnikov
> > >> > > >  * @author Other Author
> > >> > > >  */
> > >> > > >
> > >> > > > --
> > >> > > > Best Regards,
> > >> > > > Anton Kalashnikov
> > >> > > >
> > >> > >
> > >> > >
> > >> > >
> > >> > > --
> > >> > > Best Regards, Vyacheslav D.
> > >> > >
> > >> >
> > >>
> >
> >
> >
> > --
> > Best Regards, Vyacheslav D.
> >
>



-- 
Best regards,
Ilya


Re: IGNITE-6005 is ready for review

2018-02-12 Thread Ilya Lantukh
Hi Nikolay,

Thanks for contribution!

I will try to review your pull request by tomorrow.

On Mon, Feb 12, 2018 at 11:45 AM, Nikolay Izhikov <nizhi...@apache.org>
wrote:

> Hello, Igniters.
>
> I've fix some relatively small issue(100 loc) IGNITE-6005 [1], PR [2]
> The fix in DataStructuresProcessor.java
>
> As far as I can see from git log Ilya Lantukh maintains this piece of code.
>
> Ilya, can you please take a look at the fix?
>
> [1] https://issues.apache.org/jira/browse/IGNITE-6005
>
> [2] https://github.com/apache/ignite/pull/2773




-- 
Best regards,
Ilya


[jira] [Created] (IGNITE-7592) Cache with rebalanceDelay == -1 doesn't trigger late affinity assignment even after explicit rebalance is called on every node

2018-01-31 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-7592:


 Summary: Cache with rebalanceDelay == -1 doesn't trigger late 
affinity assignment even after explicit rebalance is called on every node
 Key: IGNITE-7592
 URL: https://issues.apache.org/jira/browse/IGNITE-7592
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.4
Reporter: Ilya Lantukh


Reproducer:
{noformat}
startGrids(NODE_COUNT);

IgniteEx ig = grid(0);

ig.cluster().active(true);

awaitPartitionMapExchange();

IgniteCache<Integer, Integer> cache =
ig.createCache(
new CacheConfiguration<Integer, Integer>()
.setName(CACHE_NAME)
.setCacheMode(PARTITIONED)
.setBackups(1)
.setPartitionLossPolicy(READ_ONLY_SAFE)
.setReadFromBackup(true)
.setWriteSynchronizationMode(FULL_SYNC)
.setRebalanceDelay(-1)
);

for (int i = 0; i < NODE_COUNT; i++)
grid(i).cache(CACHE_NAME).rebalance().get();

awaitPartitionMapExchange();
{noformat}

Sometimes this code will hang on awaitPartitionMapExchange(), though the 
probability that it will happen is rather low (<10%).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-7514) Affinity assignment isn't recalculated if PRIMARY node isn't OWNER

2018-01-24 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-7514:


 Summary: Affinity assignment isn't recalculated if PRIMARY node 
isn't OWNER
 Key: IGNITE-7514
 URL: https://issues.apache.org/jira/browse/IGNITE-7514
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.4
Reporter: Ilya Lantukh
Assignee: Ilya Lantukh
 Fix For: 2.4


It can happen after activation or recovery and leads to multiple exceptions / 
assertions during cache operations mapping.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-7502) Baseline topology should affect only persistent caches

2018-01-23 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-7502:


 Summary: Baseline topology should affect only persistent caches
 Key: IGNITE-7502
 URL: https://issues.apache.org/jira/browse/IGNITE-7502
 Project: Ignite
  Issue Type: Improvement
Reporter: Ilya Lantukh


Non-persistent caches shouldn't restrict affinity assignment to baseline nodes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Ignite Committership Bar Guidance

2017-12-26 Thread Ilya Lantukh
Denis,

I presume that the number of committers in Ignite doesn't grow as fast as
the amount of contributions, so it requires additional effort to keep track
of each individual's progress.

So yes, I'd like to propose my candidacy, since I'm both willing and able to
participate in the development of Apache Ignite and its community.

On Sun, Dec 24, 2017 at 6:04 AM, Denis Magda <dma...@apache.org> wrote:

> Hi Ilya,
>
> Basically one of committers who usually reviews and merges code should
> spot that it’s time to discuss a candidacy.
>
> Anyway, if a contributor feels that committers don’t keep an eye on
> his/her progress the contributor can remind about personal existence by
> sending a message to @dev. ;)
>
> —
> Denis
>
> > On Dec 23, 2017, at 3:58 AM, Ilya Lantukh <ilant...@gridgain.com> wrote:
> >
> > Hi Denis,
> > Who should take the initiative in promotion? As a contributor who meets
> > those criteria, should I send my application to the PMC or just wait?
> >
> > On Fri, Dec 22, 2017 at 8:24 PM, Denis Magda <dma...@apache.org> wrote:
> >
> >> Hi Vyacheslav,
> >>
> >> Some common sense, I presume. A committer who will be proposing a
> >> candidate can use JIRA to see a total number of resolved tickets and Git
> >> history to get insights on a complexity of every contribution.
> >>
> >> Do you have any suggestions on this?
> >>
> >> —
> >> Denis
> >>
> >>> On Dec 22, 2017, at 12:56 AM, Vyacheslav Daradur <daradu...@gmail.com>
> >> wrote:
> >>>
> >>> Hi, Denis, how will the complexity of tasks be estimated?
> >>>
> >>>
> >>> On Fri, Dec 22, 2017 at 3:16 AM, Denis Magda <dma...@apache.org>
> wrote:
> >>>> Igniters,
> >>>>
> >>>> We at Ignite PMC put together a page with rules-of-thumb on when it’s
> a
> >> right time to promote a contributor to a committer:
> >>>> https://cwiki.apache.org/confluence/display/IGNITE/
> >> Committership+Bar+Guidance <https://cwiki.apache.org/
> >> confluence/display/IGNITE/Committership+Bar+Guidance>
> >>>>
> >>>> It’s intended to make “contributor to committer” process more evident
> >> and simplify Ignite PMC voting process on this matter.
> >>>>
> >>>> Please use it as a reference, comments and suggestions are welcomed!
> >>>>
> >>>> —
> >>>> Denis
> >>>
> >>>
> >>>
> >>> --
> >>> Best Regards, Vyacheslav D.
> >>
> >>
> >
> >
> > --
> > Best regards,
> > Ilya
>
>


-- 
Best regards,
Ilya


Re: Runtime.availableProcessors() returns hardware's CPU count which is the issue with Ignite in Kubernetes

2017-12-26 Thread Ilya Lantukh
Hi Yakov,

I think that the property IGNITE_NODES_PER_HOST, as you suggested, would be
confusing, because users might want to reduce the amount of resources
available to an Ignite node not only because they run multiple nodes per
host, but also because they run other software. Also, in my opinion, the
different types of system resources (CPU, memory, network) shouldn't all be
scaled using the same value.

So I'd prefer to have IGNITE_CONCURRENCY_LEVEL or
IGNITE_AVAILABLE_PROCESSORS, as was originally suggested.
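
Roughly the behaviour I have in mind is sketched below. The property name
IGNITE_AVAILABLE_PROCESSORS is only the one proposed in this thread (i.e.
hypothetical for now); the pool-size setters are the existing
IgniteConfiguration API that currently has to be tuned by hand:

{noformat}
import org.apache.ignite.configuration.IgniteConfiguration;

public class ProcessorCountOverride {
    /** Resolves the CPU count that thread pools should be sized from. */
    static int availableProcessors() {
        // Proposed (hypothetical) override, checked before falling back to the JVM value.
        String override = System.getProperty("IGNITE_AVAILABLE_PROCESSORS");

        if (override == null)
            override = System.getenv("IGNITE_AVAILABLE_PROCESSORS");

        if (override != null && !override.isEmpty())
            return Integer.parseInt(override);

        // Current behaviour: hardware CPU count, which is misleading inside a limited container.
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        int procs = availableProcessors();

        // Today this sizing has to be done manually for every pool.
        IgniteConfiguration cfg = new IgniteConfiguration()
            .setPublicThreadPoolSize(procs)
            .setSystemThreadPoolSize(procs)
            .setStripedPoolSize(procs);
    }
}
{noformat}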

On Tue, Dec 26, 2017 at 4:05 PM, Yakov Zhdanov  wrote:

> Cross-posting to dev list.
>
> Guys,
>
> Suggestion below makes sense to me. Filed a ticket
> https://issues.apache.org/jira/browse/IGNITE-7310
>
> Perhaps, Arseny would like to provide a PR himself ;)
>
> --Yakov
>
> 2017-12-26 14:32 GMT+03:00 Arseny Kovalchuk :
>
> > Hi guys.
> >
> > Ignite configures all thread pools, selectors, etc. basing on
> Runtime.availableProcessors()
> > which seems not correct in containerized environment. In Kubernetes with
> > Docker that method returns CPU count of a Node/machine, which is 64 in
> our
> > particular case. But those 64 CPU and their timings are shared between
> > other stuff on the node like other Pods and services. Appropriate value
> of
> > available cores for Pod is usually configured as CPU Resource and
> estimated
> > basing on different things taking performance into account. General idea,
> > if you want to run several Pods on the same node, they all should request
> > less resources then the node provides. So, we give 4-8 cores for Ignite
> > instance in Kubernetes, but Ignite's thread pools are configured like
> they
> > get all 64 CPUs, and in turn we get a lot of threads for the Pod with 4-8
> > cores available.
> >
> > Now we manually set appropriate values for all available properties which
> > relate to thread pools.
> >
> > Would it be correct to have one environment variable, say
> > IGNITE_CONCURRENCY_LEVEL which will be used as a reference value for
> those
> > configurations and by default equals to Runtime.availableProcessors()?
> >
> > Thanks.
> >
> > ​
> > Arseny Kovalchuk
> >
> > Senior Software Engineer at Synesis
> > skype: arseny.kovalchuk
> > mobile: +375 (29) 666-16-16
> > ​LinkedIn Profile ​
> >
>



-- 
Best regards,
Ilya


Re: Ignite Committership Bar Guidance

2017-12-23 Thread Ilya Lantukh
Hi Denis,
Who should take the initiative in promotion? As a contributor who meets
those criteria, should I send my application to the PMC or just wait?

On Fri, Dec 22, 2017 at 8:24 PM, Denis Magda  wrote:

> Hi Vyacheslav,
>
> Some common sense, I presume. A committer who will be proposing a
> candidate can use JIRA to see a total number of resolved tickets and Git
> history to get insights on a complexity of every contribution.
>
> Do you have any suggestions on this?
>
> —
> Denis
>
> > On Dec 22, 2017, at 12:56 AM, Vyacheslav Daradur 
> wrote:
> >
> > Hi, Denis, how will the complexity of tasks be estimated?
> >
> >
> > On Fri, Dec 22, 2017 at 3:16 AM, Denis Magda  wrote:
> >> Igniters,
> >>
> >> We at Ignite PMC put together a page with rules-of-thumb on when it’s a
> right time to promote a contributor to a committer:
> >> https://cwiki.apache.org/confluence/display/IGNITE/
> Committership+Bar+Guidance  confluence/display/IGNITE/Committership+Bar+Guidance>
> >>
> >> It’s intended to make “contributor to committer” process more evident
> and simplify Ignite PMC voting process on this matter.
> >>
> >> Please use it as a reference, comments and suggestions are welcomed!
> >>
> >> —
> >> Denis
> >
> >
> >
> > --
> > Best Regards, Vyacheslav D.
>
>


-- 
Best regards,
Ilya


Re: Cluster auto activation design proposal

2017-11-04 Thread Ilya Lantukh
I assume that BaselineNode is intended to be used instead of ClusterNode
during affinity assignment calculation. However, that would require changing
the signatures of AffinityFunction and AffinityFunctionContext methods,
which are part of the public API. Can we actually do that?
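
To illustrate the concern, this is roughly the public contract that would have
to change (a simplified sketch from memory, not the exact source):

{noformat}
import java.io.Serializable;
import java.util.List;
import java.util.UUID;
import org.apache.ignite.cache.affinity.AffinityFunctionContext;
import org.apache.ignite.cluster.ClusterNode;

// Affinity is expressed in terms of ClusterNode. Replacing ClusterNode with
// BaselineNode here (or in AffinityFunctionContext.currentTopologySnapshot())
// would break every custom AffinityFunction compiled against the current API.
public interface AffinityFunction extends Serializable {
    void reset();

    int partitions();

    int partition(Object key);

    List<List<ClusterNode>> assignPartitions(AffinityFunctionContext affCtx);

    void removeNode(UUID nodeId);
}
{noformat}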

On Sat, Nov 4, 2017 at 5:56 PM, Dmitriy Setrakyan 
wrote:

> Alexey, what is the point of BaselineNode interface? Why not just have
> ClusterNode?
>
> On Fri, Nov 3, 2017 at 11:26 PM, Alexey Goncharuk <
> alexey.goncha...@gmail.com> wrote:
>
> > Guys,
> >
> > We are getting closer to the baseline topology finalization. As a first
> > step, I would like to request a review of the baseline topology
> management
> > API. The changes are summarized in [1]. In my opinion, changes are quite
> > simple and concise. Also, as a side note, I suggest moving cluster
> > activation methods to the IgniteCluter facade as well because the facade
> > itself looks like a good place for management API. Looks like the
> original
> > decision to place it on Ignite was wrong.
> >
> > Thanks!
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-5850
> >
> > 2017-09-04 17:46 GMT+03:00 Dmitriy Setrakyan :
> >
> > > On Mon, Sep 4, 2017 at 6:13 AM, Sergey Chugunov <
> > sergey.chugu...@gmail.com
> > > >
> > > wrote:
> > >
> > > > Dmitriy,
> > > >
> > > > I like the idea of ClusterActivator interface.
> > > >
> > > > From user perspective it provides the same functionality as the
> setter
> > > but
> > > > in more clear and intuitive way.
> > > >
> > >
> > > BTW, I made a naming mistake in the original email. The setter name
> > should
> > > be "setClusterActivator(...).
> > >
> > >
> > > >
> > > > Also it gives us a good place to put all the documentation about the
> > > > feature.
> > > >
> > >
> > > Agree. Another advantage is that users can now provide custom logic for
> > the
> > > initial cluster activation.
> > >
> > >
> > > >
> > > > Any other opinions?
> > > >
> > >
> > > Alexey Goncharuk, given that you provided the initial implementation of
> > the
> > > cluster activation, can you please take a look at this design and
> provide
> > > comments?
> > >
> > >
> > > >
> > > > On Fri, Sep 1, 2017 at 2:35 PM, Dmitriy Setrakyan <
> > dsetrak...@apache.org
> > > >
> > > > wrote:
> > > >
> > > > > How about this:
> > > > >
> > > > >
> > > > > > *interface ClusterActivator {*
> > > > > > *boolean activate(Collection nodes);**}*
> > > > >
> > > > >
> > > > > Out of the box, we can provide this implementation of the
> activation
> > > > > filter:
> > > > >
> > > > >
> > > > > >
> > > > > > *ClusterInitialActiveSet implements ClusterActivator { *
> > > > > > *InigeInitialActiveSet(String... addresses);**}*
> > > > >
> > > > >
> > > > > Then user configuration can look as follows:
> > > > >
> > > > > *IgniteConfiguration.setActivationFilter(new
> > > > > > ClusterInitialActiveSet("1.2.3.4", "4.3.2.1", etc));*
> > > > >
> > > > >
> > > > > Thoughts?
> > > > >
> > > > > D.
> > > > >
> > > > > On Fri, Sep 1, 2017 at 1:47 PM, Sergey Chugunov <
> > > > sergey.chugu...@gmail.com
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > Dmitriy,
> > > > > >
> > > > > > The idea is interesting however I cannot come up with a clear use
> > > case
> > > > > > which can be widely adopted.
> > > > > > I would give users a simple API at first to cover 80% of their
> > needs
> > > > and
> > > > > > then collect some feedback and start thinking about adding new
> > > > > > functionality.
> > > > > >
> > > > > > Makes sense?
> > > > > >
> > > > > > Sergey.
> > > > > >
> > > > > > On Thu, Aug 31, 2017 at 3:55 AM, Dmitriy Setrakyan <
> > > > > dsetrak...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Hm... Can we also ask user to optionally provide a predicate
> > which
> > > > will
> > > > > > > receive a collection of nodes started so far and return true if
> > the
> > > > > > > activation should happen? Will it be useful?
> > > > > > >
> > > > > > > On Wed, Aug 30, 2017 at 6:28 PM, Sergey Chugunov <
> > > > > > > sergey.chugu...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Nick,
> > > > > > > >
> > > > > > > > As I summed up in this thread above, calling setter for
> initial
> > > > > > > activation
> > > > > > > > nodes is not the only option:
> > > > > > > >
> > > > > > > >1. user starts up new cluster of desired number of nodes
> and
> > > > > > activates
> > > > > > > >it using existing API.
> > > > > > > >BLT is created with all nodes presented in the cluster at
> > the
> > > > > moment
> > > > > > > of
> > > > > > > >activation, no API is needed;
> > > > > > > >
> > > > > > > >2. user prepares BLT using web-console or visor CMD tools
> > and
> > > > sets
> > > > > > it
> > > > > > > to
> > > > > > > >the cluster. New API setter is needed:
> > > > > > > >Ignite.activation().setInitialActivationNodes(Collection<
> > > > > > ClusterNode>
> > > > > > > >nodes);
> > > > > 

Re: Monitoring of active transactions

2017-10-05 Thread Ilya Lantukh
On Thu, Oct 5, 2017 at 1:24 PM, Alexey Goncharuk <alexey.goncha...@gmail.com
> wrote:

> Guys,
>
> I think we should not limit this functionality to http-rest only. We
should
> add this information to one of the MBeans as the primary information
> source. Then this should be added as a client command both to http-rest
and
> binary-rest endpoints, so the information is available through different
> tools.
>
> Thoughts?

I agree with this approach.

On Thu, Oct 5, 2017 at 1:24 PM, Alexey Goncharuk <alexey.goncha...@gmail.com
> wrote:

> Guys,
>
> I think we should not limit this functionality to http-rest only. We should
> add this information to one of the MBeans as the primary information
> source. Then this should be added as a client command both to http-rest and
> binary-rest endpoints, so the information is available through different
> tools.
>
> Thoughts?
>
> 2017-09-28 13:35 GMT+03:00 Sasha Belyak <rtsfo...@gmail.com>:
>
> > It's very useful, but I often we need to get list of hang transaction
> when
> > exchange stopped by some reason and in this case utility, based on client
> > node won't help. I rewrite it as ComputeTask with default constructor,
> add
> > jar into ignite libs, activate http rest api and now it can be used from
> > console and no need to update cluster topology:
> >
> > curl '
> > http://127.0.0.1:8080/ignite?cmd=exe=org.apache.ignite.txviewer.
> > RestCollectTxInfoTask=false=100
> > '
> >
> > {"successStatus":0,"sessionToken":null,"error":null,"response":{"id":"~
> > 98391a83-3d76-4e5e-b0c3-185cf2bd4acd","finished":true,
> > "error":null,"result":[{"nearXidVersion":"GridCacheVersion
> > [topVer=118063514, order=1506583525449,
> > nodeOrder=2]","nodeId":"baa0237e-707c-4b69-abb6-
> > 555a2fc17762","nodeString":"TcpDiscoveryNode
> > [id=baa0237e-707c-4b69-abb6-555a2fc17762, addrs=[0:0:0:0:0:0:0:1%1,
> > 10.0.3.1, 10.38.176.253, 10.42.1.107, 127.0.0.1, 172.17.0.1],
> sockAddrs=[/
> > 127.0.0.1:0, /10.42.1.107:0, /0:0:0:0:0:0:0:1%1:0, /172.17.0.1:0, /
> > 10.38.176.253:0, /10.0.3.1:0], discPort=0, order=2, intOrder=0,
> > lastExchangeTime=1506583525683, loc=true, ver=2.1.5#20170922-sha1:
> > 6452201d,
> > isClient=true]","threadId":1,"startTime":"Thu Sep 28 14:25:27 NOVT
> > 2017","entries":[{"cache":"txCache","key":"1","value":"1"
> > ,"operation":"CREATE"}]}]}}
> >
> > Even better if this tool can use binary rest too. I mean that we should
> be
> > able to run this collecting task from:
> > 1) http rest api by curl/wget (but must deploy class somehow before, by
> > peerClassLoad with Continues mode or by adding it to application
> classpash)
> > Good for admin's console scripting.
> > 2) binary rest api by some java tool (with instant peerClassLoading).
> Good
> > for investigation on any grid configuration.
> > 3) maybe, by client node as it implemented now (can't see any adwantages)
> >
> >
> > 2017-09-16 5:35 GMT+07:00 Dmitry Pavlov <dpavlov@gmail.com>:
> >
> > > Hi Ilya,
> > >
> > > I can help with including this utility into build/release, I've recenty
> > > finished same steps for PDS WAL analysing tool for converting records
> to
> > > human readable format.
> > > Please feel free to contact me.
> > >
> > > Sincerely,
> > > Dmitriy Pavlov
> > >
> > > пт, 15 сент. 2017 г. в 6:37, Dmitriy Setrakyan <dsetrak...@apache.org
> >:
> > >
> > > > It seems that the community (including me) really would like to see
> > this
> > > > feature in Ignite.
> > > >
> > > > Ilya, can you create a ticket and submit it for review?
> > > >
> > > > D.
> > > >
> > > > On Fri, Sep 8, 2017 at 7:15 AM, Anton Vinogradov <a...@apache.org>
> > wrote:
> > > >
> > > > > Ilya,
> > > > >
> > > > > We extremely need this!
> > > > >
> > > > > Txs and Locks info should be collected on each cluster hang.
> > > > > We already have an issue related to this problem -
> > > > > https://issues.apache.org/jira/browse/IGNITE-4937
> > > > >
> > > > > Nikolay,
> > > > >
> > > > > Good point,
> > > > >

[jira] [Created] (IGNITE-6384) Introduce API to monitor pending locks and transactions

2017-09-14 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-6384:


 Summary: Introduce API to monitor pending locks and transactions
 Key: IGNITE-6384
 URL: https://issues.apache.org/jira/browse/IGNITE-6384
 Project: Ignite
  Issue Type: Improvement
Reporter: Ilya Lantukh
Assignee: Ilya Lantukh






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-6340) Attempting to create the same cache from multiple threads leads to creation of client cache on server node

2017-09-11 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-6340:


 Summary: Attempting to create the same cache from multiple threads 
leads to creation of client cache on server node
 Key: IGNITE-6340
 URL: https://issues.apache.org/jira/browse/IGNITE-6340
 Project: Ignite
  Issue Type: Bug
  Components: general
Affects Versions: 2.1
Reporter: Ilya Lantukh
Assignee: Ilya Lantukh






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Monitoring of active transactions

2017-09-08 Thread Ilya Lantukh
Igniters,

According to our current design and implementation, an unclosed transaction or
an unreleased lock can hang an Ignite cluster forever. This is logical, and with
correct usage of those mechanics such an issue should never happen, but in the
real world developers can make mistakes and leave a transaction open. We have a
"transaction timeout" feature, but it turns out it doesn't work in all cases
(see https://issues.apache.org/jira/browse/IGNITE-6181). Even if all known
issues are fixed, there is still a lot of room for mistakes and incorrect
usage.

To make it possible for Ignite users to discover such a problem and trace it
to a particular part of the code, I've created a very simple utility that
collects and prints information about long-running transactions for the
whole cluster. It is available here:
https://github.com/ilantukh/IgniteTxViewer.

One might expect such monitoring utilities to be included in the Ignite
codebase. Personally, I think that such information should be available
from a public API, without using additional applications or diving into
Ignite internals.
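
For illustration, the gist of what the utility collects can be expressed as a
compute broadcast. This is only a sketch: it assumes some way of listing the
transactions started on the local node (localActiveTransactions() below may not
be available in every version), and the class name is made up:

{noformat}
import org.apache.ignite.Ignite;
import org.apache.ignite.lang.IgniteCallable;
import org.apache.ignite.resources.IgniteInstanceResource;
import org.apache.ignite.transactions.Transaction;

public class LongTxReport implements IgniteCallable<String> {
    /** Report transactions running longer than one minute. */
    private static final long THRESHOLD_MS = 60_000;

    @IgniteInstanceResource
    private Ignite ignite;

    @Override public String call() {
        StringBuilder sb = new StringBuilder();

        for (Transaction tx : ignite.transactions().localActiveTransactions()) {
            long durationMs = System.currentTimeMillis() - tx.startTime();

            if (durationMs > THRESHOLD_MS)
                sb.append(tx.xid()).append(" state=").append(tx.state())
                    .append(" durationMs=").append(durationMs).append('\n');
        }

        return sb.toString();
    }
}

// Usage: Collection<String> reports = ignite.compute().broadcast(new LongTxReport());
{noformat}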

What do you think?

-- 
Best regards,
Ilya


[jira] [Created] (IGNITE-6254) Assertion error in IgniteTxHandler.processDhtTxFinishRequest(...)

2017-09-04 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-6254:


 Summary: Assertion error in 
IgniteTxHandler.processDhtTxFinishRequest(...)
 Key: IGNITE-6254
 URL: https://issues.apache.org/jira/browse/IGNITE-6254
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.1
Reporter: Ilya Lantukh
Assignee: Ilya Lantukh
 Fix For: 2.3


AssertionError is thrown in the end of the method:
{noformat}
assert req.txState() != null || (ctx.tm().tx(req.version()) == null && 
ctx.tm().nearTx(req.version()) == null);
{noformat}
This could happen only if the results of calls to IgniteTxManager changed after 
method execution started. We should re-use the already acquired values:
- replace the ctx.tm().tx(...) call with dhtTx;
- replace the ctx.tm().nearTx(...) call with nearTx.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-6232) Introduce WAL records for partition map exchange start/finish

2017-08-31 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-6232:


 Summary: Introduce WAL records for partition map exchange 
start/finish
 Key: IGNITE-6232
 URL: https://issues.apache.org/jira/browse/IGNITE-6232
 Project: Ignite
  Issue Type: Improvement
Reporter: Ilya Lantukh
Assignee: Ilya Lantukh


These events mean that all pending cache operations have been finished and so 
they can be used to optimize WAL replay in certain scenarios.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-6212) Assertion error: Invalid node2part

2017-08-29 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-6212:


 Summary: Assertion error: Invalid node2part
 Key: IGNITE-6212
 URL: https://issues.apache.org/jira/browse/IGNITE-6212
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.1
Reporter: Ilya Lantukh
Assignee: Ilya Lantukh
Priority: Critical
 Fix For: 2.2


Reproduced by IgniteServiceDynamicCachesSelfTest with ~10% probability. Leads 
to hang-up.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: Service versioning

2017-08-18 Thread Ilya Lantukh
Sergey,

I think the ticket covers a single atomic piece of functionality.

I'd prefer not to address flaws in the current service upgrade process
separately, because this process has never been designed, implemented and
supported. Yes, in the current ServiceGrid implementation you can (technically)
replace an existing service with a newer implementation, and there are a number
of potential problems you might encounter while doing so. But it doesn't
make much sense to me to fix flaws in functionality that officially doesn't
exist. Those issues should be addressed in the scope of service versioning.

The use cases that you described are interesting and we should try to support
them, but I don't think that we should implement functionality for all
possible cases in a single step. I'll think about how we can technically
support them in Ignite, and I'm open to suggestions from you (and anyone else).

I don't think I understand your concept of ServiceRegistry in the same way
you do. Could you please describe it in more detail?

Thanks.

On Wed, Aug 16, 2017 at 6:06 PM, Sergey Chugunov <sergey.chugu...@gmail.com>
wrote:

> Ilya,
>
> I think the proposed change definitely makes sense.
>
> At the same time I think the scope of the ticket is too broad. It covers at
> least two different aspects of service grid functionality:
>
>- Very specific issue of service upgrade process which is caused by
>current design flaw (although it's not a bug).
>- Very high-level suggestions about improving upgrade process in
> general.
>
>
> I think it is better to separate these things. Lets firstly address item#1
> and then reiterate on #2.
>
> Managing services deployment in distributed system is a non-trivial task
> with a lot of options and different requirements; all changes and
> improvements must be designed thoughtfully.
>
> E.g. for the requirement from the ticket that all services under the same
> name must be of the same version: what if a user wants to test a new
> implementation transparently to the clients of the cluster?
> It would be convenient for that user to be able to redeploy only one
> instance of the service to the newer version (without touching other
> instances) and see if any client reports a problem.
>
> Also I'm familiar with an example when a system with service-like entities
> connected to it supported both upgrades and downgrades: it was possible to
> connect new versions of services to the "parent" system, send some work to
> them but in case of any issues with newer version it was always possible to
> quickly fall back to the previous one known to be stable.
> This use case isn't covered in your ticket at all although I think it is
> kind of obvious feature.
>
> From my point of view Ignite just doesn't have necessary internal
> abstractions to support rich service functionality with managing versions,
> upgrades/downgrades and so on.
> Something like distributed ServiceRegistry concept with managing API to the
> user. We don't have anything like this now.
>
> So I think we should focus on addressing existing issue described in item#1
> and start a new discussion about ways to improve our service grid
> capabilities in a separate thread.
> Item#2 will be addressed automatically as part of that discussion.
>
> Makes sense?
>
> Thanks,
> Sergey.
>
> On Wed, Aug 16, 2017 at 2:22 AM, Denis Magda <dma...@apache.org> wrote:
>
> > Ilya,
> >
> > Every Service Grid user will welcome this feature. Fully for it.
> >
> > Mentioned in the ticket that it has to be documented before the ticket is
> > closed.
> >
> > —
> > Denis
> >
> > > On Aug 15, 2017, at 8:18 AM, Ilya Lantukh <ilant...@gridgain.com>
> wrote:
> > >
> > > Igniters,
> > >
> > > Our current ServiceGrid implementation lacks several important
> features,
> > > one of which is ability to upgrade Service without downtime.
> > >
> > > I've created ticket to add this feature:
> > > https://issues.apache.org/jira/browse/IGNITE-6069.
> > >
> > > To implement it, the following problems must be solved:
> > > - currently we can store only one BinaryMetadata instance for given
> > > classname - if you change Service class implementation and then try to
> > > deploy it, you might get an exception like
> > > "org.apache.ignite.binary.BinaryObjectException: Binary type has
> > different
> > > field types". It can be solved by making (className, version) unique
> > > metadata identifier. We can use serialVersionUuid as version.
> > > - ServiceProxy must be aware of actual version and know if service is
> > > currently being upgraded. I

[jira] [Created] (IGNITE-6097) GridCacheDatabaseSharedManager.restorePartitionState(...) should not access PageMemory directly

2017-08-17 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-6097:


 Summary: GridCacheDatabaseSharedManager.restorePartitionState(...) 
should not access PageMemory directly
 Key: IGNITE-6097
 URL: https://issues.apache.org/jira/browse/IGNITE-6097
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.1
Reporter: Ilya Lantukh
Assignee: Ilya Lantukh
 Fix For: 2.2


It breaks encapsulation and leads to subtle non-trivial problems (like deadlock 
in IGNITE-6096). We should update state by calling methods of 
GridDhtLocalPartition instead.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-6096) Race between partition eviction and re-creation

2017-08-17 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-6096:


 Summary: Race between partition eviction and re-creation
 Key: IGNITE-6096
 URL: https://issues.apache.org/jira/browse/IGNITE-6096
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.1
Reporter: Ilya Lantukh
Assignee: Ilya Lantukh
 Fix For: 2.2


There are a number of cases that aren't handled correctly, leading to assertion 
errors, grid hang-ups and data loss:
- PageMemoryImpl.refreshOutdatedPage(...) - if the refreshed page is currently 
scheduled for checkpoint, it will be stored in the FileStore filled with zeroes. 
Reading this page later will fail.
- GridCacheDatabaseSharedManager.restorePartitionState(...) - 
grp.offheap().onPartitionInitialCounterUpdated(...) is called under the meta page 
write lock. If the DataStore requires initialization, it will try to acquire the 
write lock for the meta page again and hang up.
- GridDhtPartitionTopologyImpl.createPartition(...) and .localPartition0(...) - 
if the partition is present but has EVICTED state, we will try to create a new 
partition instance. However, the DataStore for the old partition might still be 
present, and we will get an AssertionError when we attempt to create a new DataStore.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Service versioning

2017-08-15 Thread Ilya Lantukh
Igniters,

Our current ServiceGrid implementation lacks several important features,
one of which is the ability to upgrade a Service without downtime.

I've created a ticket to add this feature:
https://issues.apache.org/jira/browse/IGNITE-6069.

To implement it, the following problems must be solved:
- currently we can store only one BinaryMetadata instance for a given
classname - if you change the Service class implementation and then try to
deploy it, you might get an exception like
"org.apache.ignite.binary.BinaryObjectException: Binary type has different
field types". This can be solved by making (className, version) the unique
metadata identifier. We can use serialVersionUuid as the version.
- ServiceProxy must be aware of the actual version and know whether the
service is currently being upgraded. If a user tries to execute any method
during an upgrade, the proxy should implicitly wait until the new version is
ready. We already have similar logic in IgniteCacheProxyImpl.
- To effectively and conveniently migrate service state to a newer version,
we should add a method to the Service interface that will be executed before
start if an older version was present. To keep the API compatible, I suggest
introducing an interface UpgradableService (extends Service) - see the rough
sketch below.
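
A rough sketch of what I mean (the method name and the exact callback signature
below are only an illustration, not a final proposal):

{noformat}
import java.io.Serializable;
import org.apache.ignite.services.Service;

public interface UpgradableService extends Service {
    /**
     * Called on each node before the new version of the service is started,
     * if an older version of the same service was deployed there. Gives the
     * new version a chance to migrate the old state.
     *
     * @param oldVerState State handed over by the previous version (the exact
     *      form of this argument is an open question).
     */
    public void onUpgrade(Serializable oldVerState);
}
{noformat}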

Any critique and suggestions are welcome.

-- 
Best regards,
Ilya


[jira] [Created] (IGNITE-6069) Service versioning

2017-08-15 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-6069:


 Summary: Service versioning
 Key: IGNITE-6069
 URL: https://issues.apache.org/jira/browse/IGNITE-6069
 Project: Ignite
  Issue Type: New Feature
  Components: general
Affects Versions: 2.1
Reporter: Ilya Lantukh
Assignee: Ilya Lantukh






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-6010) ZookeeperIpFinderTest.testFourNodesKillRestartZookeeper fails sometimes

2017-08-09 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-6010:


 Summary: ZookeeperIpFinderTest.testFourNodesKillRestartZookeeper 
fails sometimes
 Key: IGNITE-6010
 URL: https://issues.apache.org/jira/browse/IGNITE-6010
 Project: Ignite
  Issue Type: Bug
  Components: zookeeper
Affects Versions: 2.1
Reporter: Ilya Lantukh
 Fix For: 2.2


{noformat}
junit.framework.AssertionFailedError: null
at junit.framework.Assert.fail(Assert.java:55)
at junit.framework.Assert.assertTrue(Assert.java:22)
at junit.framework.Assert.assertTrue(Assert.java:31)
at junit.framework.TestCase.assertTrue(TestCase.java:201)
at 
org.apache.ignite.spi.discovery.tcp.ipfinder.zk.ZookeeperIpFinderTest.testFourNodesKillRestartZookeeper(ZookeeperIpFinderTest.java:365)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-6009) CacheExamplesSelfTest.testCacheSemaphoreExample fails sometimes due to timeout

2017-08-09 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-6009:


 Summary: CacheExamplesSelfTest.testCacheSemaphoreExample fails 
sometimes due to timeout
 Key: IGNITE-6009
 URL: https://issues.apache.org/jira/browse/IGNITE-6009
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.1
Reporter: Ilya Lantukh
 Fix For: 2.2






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-6008) CacheRemoveAllSelfTest.testRemoveAll fails sometimes

2017-08-09 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-6008:


 Summary: CacheRemoveAllSelfTest.testRemoveAll fails sometimes
 Key: IGNITE-6008
 URL: https://issues.apache.org/jira/browse/IGNITE-6008
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.1
Reporter: Ilya Lantukh
 Fix For: 2.2


{noformat}
[2017-08-09 01:18:15,172][ERROR][main][root] Test failed.
junit.framework.AssertionFailedError: Local size: 58
On heap: 58
Off heap: 58
Primary: 58
Backup: 14 expected:<0> but was:<58>
at junit.framework.Assert.fail(Assert.java:57)
at junit.framework.Assert.failNotEquals(Assert.java:329)
at junit.framework.Assert.assertEquals(Assert.java:78)
at junit.framework.Assert.assertEquals(Assert.java:234)
at junit.framework.TestCase.assertEquals(TestCase.java:401)
at 
org.apache.ignite.internal.processors.cache.CacheRemoveAllSelfTest.testRemoveAll(CacheRemoveAllSelfTest.java:71)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at junit.framework.TestCase.runTest(TestCase.java:176)
at 
org.apache.ignite.testframework.junits.GridAbstractTest.runTestInternal(GridAbstractTest.java:2000)
at 
org.apache.ignite.testframework.junits.GridAbstractTest.access$000(GridAbstractTest.java:132)
at 
org.apache.ignite.testframework.junits.GridAbstractTest$5.run(GridAbstractTest.java:1915)
at java.lang.Thread.run(Thread.java:745)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-6007) GridCacheBinaryObjectMetadataExchangeMultinodeTest.testSequentialUpdatesNoConflicts fails sometimes

2017-08-09 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-6007:


 Summary: 
GridCacheBinaryObjectMetadataExchangeMultinodeTest.testSequentialUpdatesNoConflicts
 fails sometimes
 Key: IGNITE-6007
 URL: https://issues.apache.org/jira/browse/IGNITE-6007
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.1
Reporter: Ilya Lantukh
 Fix For: 2.2


{noformat}
java.lang.NullPointerException
at 
org.apache.ignite.internal.processors.cache.binary.GridCacheBinaryObjectMetadataExchangeMultinodeTest.testSequentialUpdatesNoConflicts(GridCacheBinaryObjectMetadataExchangeMultinodeTest.java:289)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at junit.framework.TestCase.runTest(TestCase.java:176)
at 
org.apache.ignite.testframework.junits.GridAbstractTest.runTestInternal(GridAbstractTest.java:2000)
at 
org.apache.ignite.testframework.junits.GridAbstractTest.access$000(GridAbstractTest.java:132)
at 
org.apache.ignite.testframework.junits.GridAbstractTest$5.run(GridAbstractTest.java:1915)
at java.lang.Thread.run(Thread.java:745)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-5990) IgniteCachePutAllRestartTest.testStopOriginatingNode fails on TC sometimes

2017-08-08 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-5990:


 Summary:  IgniteCachePutAllRestartTest.testStopOriginatingNode 
fails on TC sometimes
 Key: IGNITE-5990
 URL: https://issues.apache.org/jira/browse/IGNITE-5990
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.1
Reporter: Ilya Lantukh


Could not reproduce locally. Test fails due to timeout. Thread dump attached.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-5962) Increase max length of index name

2017-08-07 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-5962:


 Summary: Increase max length of index name
 Key: IGNITE-5962
 URL: https://issues.apache.org/jira/browse/IGNITE-5962
 Project: Ignite
  Issue Type: Improvement
  Components: general, sql
Affects Versions: 2.1
Reporter: Ilya Lantukh


In https://issues.apache.org/jira/browse/IGNITE-5941 max index name length was 
reduced from 768 to 256 bytes. If we need to support longer names, we need to 
change format of metastore data pages.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-5794) BinaryMetadata is lost if all server nodes have been restarted

2017-07-20 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-5794:


 Summary: BinaryMetadata is lost if all server nodes have been 
restarted
 Key: IGNITE-5794
 URL: https://issues.apache.org/jira/browse/IGNITE-5794
 Project: Ignite
  Issue Type: Bug
  Components: binary, general
Affects Versions: 2.1
Reporter: Ilya Lantukh


Steps to reproduce are described here: 
http://apache-ignite-users.70518.x6.nabble.com/Problem-with-Messages-after-client-reconnect-tt15127.html

This problem occurs because the client has the metadata in its local metadata 
cache and decides that the server node should have it too. But the server lost 
it after restart.

Possible fixes:
- Make client nodes re-send all contents of the local metadata cache after 
reconnect.
- If a node doesn't have the metadata to process an incoming message, it should 
request it from the sender node.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-5694) IgniteAtomicLongChangingTopologySelfTest.testClientQueueCreateCloseFailover() fails sometimes

2017-07-04 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-5694:


 Summary: 
IgniteAtomicLongChangingTopologySelfTest.testClientQueueCreateCloseFailover() 
fails sometimes
 Key: IGNITE-5694
 URL: https://issues.apache.org/jira/browse/IGNITE-5694
 Project: Ignite
  Issue Type: Bug
  Components: data structures
Affects Versions: 2.1
Reporter: Ilya Lantukh
Assignee: Ilya Lantukh






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-5690) JavaEmbeddedIgniteRDDSelfTest sometimes hangs on TC

2017-07-04 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-5690:


 Summary: JavaEmbeddedIgniteRDDSelfTest sometimes hangs on TC
 Key: IGNITE-5690
 URL: https://issues.apache.org/jira/browse/IGNITE-5690
 Project: Ignite
  Issue Type: Bug
  Components: spark
Affects Versions: 2.1
Reporter: Ilya Lantukh


Test hangs when IgniteContext.close(...) is called:
{noformat}
[12:20:47]W: [org.apache.ignite:ignite-spark] Thread 
[name="test-runner-#1%spark.JavaEmbeddedIgniteRDDSelfTest%", id=77, 
state=WAITING, blockCnt=59, waitCnt=10012]
[12:20:47]W: [org.apache.ignite:ignite-spark] Lock 
[object=scala.concurrent.impl.Promise$CompletionLatch@77d938ab, ownerName=null, 
ownerId=-1]
[12:20:47]W: [org.apache.ignite:ignite-spark] at 
sun.misc.Unsafe.park(Native Method)
[12:20:47]W: [org.apache.ignite:ignite-spark] at 
java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
[12:20:47]W: [org.apache.ignite:ignite-spark] at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
[12:20:47]W: [org.apache.ignite:ignite-spark] at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
[12:20:47]W: [org.apache.ignite:ignite-spark] at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
[12:20:47]W: [org.apache.ignite:ignite-spark] at 
scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
[12:20:47]W: [org.apache.ignite:ignite-spark] at 
scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
[12:20:47]W: [org.apache.ignite:ignite-spark] at 
scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153)
[12:20:47]W: [org.apache.ignite:ignite-spark] at 
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:619)
[12:20:47]W: [org.apache.ignite:ignite-spark] at 
org.apache.spark.SparkContext.runJob(SparkContext.scala:1918)
[12:20:47]W: [org.apache.ignite:ignite-spark] at 
org.apache.spark.SparkContext.runJob(SparkContext.scala:1931)
[12:20:47]W: [org.apache.ignite:ignite-spark] at 
org.apache.spark.SparkContext.runJob(SparkContext.scala:1944)
[12:20:47]W: [org.apache.ignite:ignite-spark] at 
org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
[12:20:47]W: [org.apache.ignite:ignite-spark] at 
org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:925)
[12:20:47]W: [org.apache.ignite:ignite-spark] at 
org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:923)
[12:20:47]W: [org.apache.ignite:ignite-spark] at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
[12:20:47]W: [org.apache.ignite:ignite-spark] at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
[12:20:47]W: [org.apache.ignite:ignite-spark] at 
org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
[12:20:47]W: [org.apache.ignite:ignite-spark] at 
org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:923)
[12:20:47]W: [org.apache.ignite:ignite-spark] at 
o.a.i.spark.IgniteContext.close(IgniteContext.scala:168)
[12:20:47]W: [org.apache.ignite:ignite-spark] at 
o.a.i.spark.JavaIgniteContext.close(JavaIgniteContext.scala:68)
[12:20:47]W: [org.apache.ignite:ignite-spark] at 
o.a.i.spark.JavaEmbeddedIgniteRDDSelfTest.testReadDataFromIgnite(JavaEmbeddedIgniteRDDSelfTest.java:181)
[12:20:47]W: [org.apache.ignite:ignite-spark] at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[12:20:47]W: [org.apache.ignite:ignite-spark] at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
[12:20:47]W: [org.apache.ignite:ignite-spark] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[12:20:47]W: [org.apache.ignite:ignite-spark] at 
java.lang.reflect.Method.invoke(Method.java:606)
[12:20:47]W: [org.apache.ignite:ignite-spark] at 
junit.framework.TestCase.runTest(TestCase.java:176)
[12:20:47]W: [org.apache.ignite:ignite-spark] at 
o.a.i.testframework.junits.GridAbstractTest.runTestInternal(GridAbstractTest.java:1997)
[12:20:47]W: [org.apache.ignite:ignite-spark] at 
o.a.i.testframework.junits.GridAbstractTest.access$000(GridAbstractTest.java:132)
[

[jira] [Created] (IGNITE-5689) IgniteCommunicationBalanceTest hangs on Windows

2017-07-04 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-5689:


 Summary: IgniteCommunicationBalanceTest hangs on Windows
 Key: IGNITE-5689
 URL: https://issues.apache.org/jira/browse/IGNITE-5689
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.1
Reporter: Ilya Lantukh


Test runner is waiting for ComputeTask to finish:
{noformat}
"test-runner-#40602%communication.IgniteCommunicationBalanceMultipleConnectionsTest%"
 prio=6 tid=0x766d6000 nid=0x7934 waiting on condition 
[0xeacfe000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:315)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:176)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:139)
at 
org.apache.ignite.internal.AsyncSupportAdapter.saveOrGet(AsyncSupportAdapter.java:112)
at 
org.apache.ignite.internal.IgniteComputeImpl.call(IgniteComputeImpl.java:785)
at 
org.apache.ignite.internal.managers.communication.IgniteCommunicationBalanceTest$1.apply(IgniteCommunicationBalanceTest.java:158)
at 
org.apache.ignite.testframework.GridTestUtils.waitForCondition(GridTestUtils.java:1603)
at 
org.apache.ignite.internal.managers.communication.IgniteCommunicationBalanceTest.testBalance1(IgniteCommunicationBalanceTest.java:148)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at junit.framework.TestCase.runTest(TestCase.java:176)
at 
org.apache.ignite.testframework.junits.GridAbstractTest.runTestInternal(GridAbstractTest.java:1997)
at 
org.apache.ignite.testframework.junits.GridAbstractTest.access$000(GridAbstractTest.java:132)
at 
org.apache.ignite.testframework.junits.GridAbstractTest$5.run(GridAbstractTest.java:1912)
at java.lang.Thread.run(Thread.java:745)
{noformat}

While thread that tried to send job result got stuck in 
IgniteUtils.reachable(...):
{noformat}
"pub-#40657%communication.IgniteCommunicationBalanceMultipleConnectionsTest5%" 
prio=6 tid=0x11b8 nid=0x3868 runnable [0xeecfe000]
   java.lang.Thread.State: RUNNABLE
at java.net.Inet4AddressImpl.isReachable0(Native Method)
at java.net.Inet4AddressImpl.isReachable(Inet4AddressImpl.java:70)
at java.net.InetAddress.isReachable(InetAddress.java:475)
at java.net.InetAddress.isReachable(InetAddress.java:434)
at 
org.apache.ignite.internal.util.IgniteUtils.reachable(IgniteUtils.java:2113)
at 
org.apache.ignite.internal.util.IgniteUtils.filterReachable(IgniteUtils.java:1877)
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2955)
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2763)
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2655)
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2516)
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2480)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1643)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:1715)
at 
org.apache.ignite.internal.processors.job.GridJobWorker.finishJob(GridJobWorker.java:928)
at 
org.apache.ignite.internal.processors.job.GridJobWorker.finishJob(GridJobWorker.java:773)
at 
org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:625)
at 
org.apache.ignite.internal.processors.job.GridJobWorker.body(GridJobWorker.java:489)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
at 
org.apache.ignite.internal.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:1181)
at 
org.apache.ignite.internal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:1908)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(Gr

[jira] [Created] (IGNITE-5529) Assertion Error in GridDhtPartitionsExchangeFuture : msg == null

2017-06-17 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-5529:


 Summary: Assertion Error in GridDhtPartitionsExchangeFuture : msg 
== null
 Key: IGNITE-5529
 URL: https://issues.apache.org/jira/browse/IGNITE-5529
 Project: Ignite
  Issue Type: Bug
  Components: general
Reporter: Ilya Lantukh
 Fix For: 2.1


{noformat}
[14:56:31]W: [org.apache.ignite:ignite-core] 
java.lang.AssertionError: GridDhtPartitionsExchangeFuture [dummy=false, 
forcePreload=false, reassign=false, discoEvt=DiscoveryCustomEvent 
[customMsg=null, affTopVer=AffinityTopologyVersion [topVer=18, minorTopVer=2], 
super=DiscoveryEvent [evtNode=TcpDiscoveryNode 
[id=48b4ad89-eb6c-4d5b-85b5-c964fd33, addrs=[127.0.0.1], 
sockAddrs=[/127.0.0.1:47503], discPort=47503, order=8, intOrder=6, 
lastExchangeTime=1497700591967, loc=true, ver=2.1.0#19700101-sha1:, 
isClient=false], topVer=18, nodeId8=48b4ad89, msg=null, 
type=DISCOVERY_CUSTOM_EVT, tstamp=1497700591967]], crd=TcpDiscoveryNode 
[id=48b4ad89-eb6c-4d5b-85b5-c964fd33, addrs=[127.0.0.1], 
sockAddrs=[/127.0.0.1:47503], discPort=47503, order=8, intOrder=6, 
lastExchangeTime=1497700591967, loc=true, ver=2.1.0#19700101-sha1:, 
isClient=false], exchId=GridDhtPartitionExchangeId 
[topVer=AffinityTopologyVersion [topVer=18, minorTopVer=2], nodeId=48b4ad89, 
evt=DISCOVERY_CUSTOM_EVT], added=true, initFut=GridFutureAdapter 
[ignoreInterrupts=false, state=DONE, res=false, hash=79503565], init=false, 
lastVer=null, partReleaseFut=null, exchActions=null, affChangeMsg=null, 
skipPreload=false, clientOnlyExchange=false, initTs=1497700591967, 
centralizedAff=false, changeGlobalStateE=null, 
exchangeOnChangeGlobalState=false, forcedRebFut=null, done=true, evtLatch=0, 
remaining=[20d897e3-729f-404e-a4b5-56513711, 
afc48273-8741-48b5-a1c9-e9c011b0, 38c88372-509a-45b2-9b0d-39ccc012], 
super=GridFutureAdapter [ignoreInterrupts=false, state=DONE, res=class 
org.apache.ignite.internal.IgniteInterruptedCheckedException: Node is stopping: 
cache.IgniteCacheCreateRestartSelfTest3, hash=551729273]]
[14:56:31]W: [org.apache.ignite:ignite-core]at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:570)
[14:56:31]W: [org.apache.ignite:ignite-core]at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1858)
[14:56:31]W: [org.apache.ignite:ignite-core]at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
[14:56:31]W: [org.apache.ignite:ignite-core]at 
java.lang.Thread.run(Thread.java:745)
{noformat}

IgniteCacheCreateRestartSelfTest reproduces this issue.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: contention on DataStructure creation/removing

2017-06-01 Thread Ilya Lantukh
Alexey,

I think that, from a user's perspective, data structures look more like a
library that works on top of caches, not something completely separate and
independent. Most properties from CacheConfiguration make sense for Atomics
and Collections (for example, properties related to rebalancing). We can
always validate the cache config in the methods that create a data structure and
throw an exception if it is incorrect.

But simply adding groupName to AtomicConfiguration is fine too.
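
To make the two options concrete, here is a rough sketch of what user code
could look like under each of them. Both the (cacheName, dsName) overload and
AtomicConfiguration#setGroupName are proposals / assumed names rather than
existing API, so please treat this strictly as an illustration:

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteAtomicLong;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.AtomicConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class DataStructurePlacementSketch {
    public static void main(String[] args) {
        AtomicConfiguration atomicCfg = new AtomicConfiguration();
        atomicCfg.setBackups(1);

        // Option 2 (suggested above): only add a group name to AtomicConfiguration.
        // setGroupName is a proposed/assumed method, it does not exist yet.
        atomicCfg.setGroupName("atomicsGroup");

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setAtomicConfiguration(atomicCfg);

        Ignite ignite = Ignition.start(cfg);

        // Existing call, unchanged by option 2:
        IgniteAtomicLong cnt = ignite.atomicLong("myCounter", 0L, true);
        cnt.incrementAndGet();

        // Option 1 (the PR above): address the structure by (cacheName, dsName)
        // and keep it in a user-defined cache. Hypothetical overload:
        // IgniteAtomicLong cnt2 = ignite.atomicLong("myCounter", "myUserCache", 0L, true);
    }
}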

On Thu, Jun 1, 2017 at 5:16 PM, Yakov Zhdanov <yzhda...@apache.org> wrote:

> Agree with Alex.
>
> --Yakov
>
> 2017-06-01 17:10 GMT+03:00 Alexey Goncharuk <alexey.goncha...@gmail.com>:
>
>> I do not like this change - we intentionally separated a few properties
>> in AtomicConfiguration that make sense for Atomics, there is not need to
>> get back to cache configuration again. In my understanding, we only need to
>> add groupName to Atomics and Collection configuration.
>>
>> Thoughts?
>>
>> 2017-06-01 16:32 GMT+03:00 Ilya Lantukh <ilant...@gridgain.com>:
>>
>>> This is how I see API to create data structures in user-defined caches:
>>> https://github.com/apache/ignite/pull/2058
>>>
>>> On Thu, Jun 1, 2017 at 2:56 PM, Ilya Lantukh <ilant...@gridgain.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> There are other problems with current data structures implementation,
>>>> which are related to new persistence mechanics. For example, take a look at
>>>> this ticket: https://issues.apache.org/jira/browse/IGNITE-5324
>>>> I think the best approach is to store data structures not in special
>>>> system cache, but in user defined ones. All API methods to access data
>>>> structures will have cacheName parameter, and unique identifier will be a
>>>> pair (cacheName, dsName). In this case we won't need a single place to
>>>> store all data structure metadata.
>>>> What do you think?
>>>>
>>>>
>>>> On Thu, Jun 1, 2017 at 1:32 PM, Mikhail Cherkasov <
>>>> mcherka...@gridgain.com> wrote:
>>>>
>>>>> On Thu, Jun 1, 2017 at 1:29 AM, Dmitriy Setrakyan <
>>>>> dsetrak...@apache.org>
>>>>> wrote:
>>>>>
>>>>> > Won't it be confusing from a user standpoint to have multiple data
>>>>> > structures with the same name?
>>>>> >
>>>>>
>>>>> I won't change this, optionally I can allow to have the same name for
>>>>> different data structures' types,
>>>>> but it's better to keep single namespace for all data structures as
>>>>> it's
>>>>> implemented now.
>>>>>
>>>>>
>>>>> > Also, what is the performance impact of this
>>>>> > change?
>>>>> >
>>>>>
>>>>> I'm working on a benchmark.
>>>>>
>>>>>
>>>>> --
>>>>> Thanks,
>>>>> Mikhail.
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>> Ilya
>>>>
>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Ilya
>>>
>>
>>
>


-- 
Best regards,
Ilya


Re: contention on DataStructure creation/removing

2017-06-01 Thread Ilya Lantukh
This is how I see API to create data structures in user-defined caches:
https://github.com/apache/ignite/pull/2058

On Thu, Jun 1, 2017 at 2:56 PM, Ilya Lantukh <ilant...@gridgain.com> wrote:

> Hi all,
>
> There are other problems with current data structures implementation,
> which are related to new persistence mechanics. For example, take a look at
> this ticket: https://issues.apache.org/jira/browse/IGNITE-5324
> I think the best approach is to store data structures not in special
> system cache, but in user defined ones. All API methods to access data
> structures will have cacheName parameter, and unique identifier will be a
> pair (cacheName, dsName). In this case we won't need a single place to
> store all data structure metadata.
> What do you think?
>
>
> On Thu, Jun 1, 2017 at 1:32 PM, Mikhail Cherkasov <mcherka...@gridgain.com
> > wrote:
>
>> On Thu, Jun 1, 2017 at 1:29 AM, Dmitriy Setrakyan <dsetrak...@apache.org>
>> wrote:
>>
>> > Won't it be confusing from a user standpoint to have multiple data
>> > structures with the same name?
>> >
>>
>> I won't change this, optionally I can allow to have the same name for
>> different data structures' types,
>> but it's better to keep single namespace for all data structures as it's
>> implemented now.
>>
>>
>> > Also, what is the performance impact of this
>> > change?
>> >
>>
>> I'm working on a benchmark.
>>
>>
>> --
>> Thanks,
>> Mikhail.
>>
>
>
>
> --
> Best regards,
> Ilya
>



-- 
Best regards,
Ilya


Re: contention on DataStructure creation/removing

2017-06-01 Thread Ilya Lantukh
Hi all,

There are other problems with the current data structures implementation, which
are related to the new persistence mechanics. For example, take a look at this
ticket: https://issues.apache.org/jira/browse/IGNITE-5324
I think the best approach is to store data structures not in a special system
cache, but in user-defined ones. All API methods that access data structures
will take a cacheName parameter, and the unique identifier will be the pair
(cacheName, dsName). In that case we won't need a single place to store all
data structure metadata.
What do you think?


On Thu, Jun 1, 2017 at 1:32 PM, Mikhail Cherkasov 
wrote:

> On Thu, Jun 1, 2017 at 1:29 AM, Dmitriy Setrakyan 
> wrote:
>
> > Won't it be confusing from a user standpoint to have multiple data
> > structures with the same name?
> >
>
> I won't change this, optionally I can allow to have the same name for
> different data structures' types,
> but it's better to keep single namespace for all data structures as it's
> implemented now.
>
>
> > Also, what is the performance impact of this
> > change?
> >
>
> I'm working on a benchmark.
>
>
> --
> Thanks,
> Mikhail.
>



-- 
Best regards,
Ilya


Re: Message.directType

2017-05-19 Thread Ilya Lantukh
Yury,

All message type IDs are listed in the GridIoMessageFactory#create(...) method,
in its switch block. You can check there which IDs are still free.
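
For reference, a minimal skeleton might look like the following. The direct
type value 200 is just an example (verify it is absent from the
GridIoMessageFactory switch for your Ignite version), and the new type
typically also has to be registered via a MessageFactory plugin extension so
the receiving node can instantiate it:

import java.nio.ByteBuffer;
import org.apache.ignite.plugin.extensions.communication.Message;
import org.apache.ignite.plugin.extensions.communication.MessageReader;
import org.apache.ignite.plugin.extensions.communication.MessageWriter;

public class MyCustomMessage implements Message {
    /** Example type ID; assumed to be free -- check GridIoMessageFactory#create(...) first. */
    public static final short DIRECT_TYPE = 200;

    /** {@inheritDoc} */
    @Override public short directType() {
        return DIRECT_TYPE;
    }

    /** {@inheritDoc} */
    @Override public byte fieldsCount() {
        return 0; // this trivial sketch carries no fields
    }

    /** {@inheritDoc} */
    @Override public boolean writeTo(ByteBuffer buf, MessageWriter writer) {
        writer.setBuffer(buf);

        // Write the standard message header; real messages write their fields after it.
        if (!writer.isHeaderWritten()) {
            if (!writer.writeHeader(directType(), fieldsCount()))
                return false;

            writer.onHeaderWritten();
        }

        return true;
    }

    /** {@inheritDoc} */
    @Override public boolean readFrom(ByteBuffer buf, MessageReader reader) {
        reader.setBuffer(buf);

        // Real messages read their fields here, mirroring writeTo().
        return true;
    }

    /** {@inheritDoc} */
    @Override public void onAckReceived() {
        // No-op for this sketch.
    }
}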

On Fri, May 19, 2017 at 6:44 PM, Yury Babak  wrote:

> Hi all!
>
> I want to create a new custom implementation of
> org.apache.ignite.plugin.extensions.communication.Message and I want to avoid
> conflicts in message types. So do we have a list of all reserved types?
>
> Thanks,
> Yury
>
>
>
> --
> View this message in context: http://apache-ignite-
> developers.2346864.n4.nabble.com/Message-directType-tp17828.html
> Sent from the Apache Ignite Developers mailing list archive at Nabble.com.
>



-- 
Best regards,
Ilya


[jira] [Created] (IGNITE-5068) Redesign usage of GridDhtPartitionTopologyImpl.part2node map to store only diff from affinity assignment

2017-04-24 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-5068:


 Summary: Redesign usage of GridDhtPartitionTopologyImpl.part2node 
map to store only diff from affinity assignment
 Key: IGNITE-5068
 URL: https://issues.apache.org/jira/browse/IGNITE-5068
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 2.0
Reporter: Ilya Lantukh
Assignee: Ilya Lantukh


This map can become very large on big topologies, and rebuilding it on each 
update is also costly. Some beneficial changes were made in the scope of 
IGNITE-4626, but further improvement requires a complete redesign.
This map always stores the affinity nodes plus some additional "temporary owners". 
Those owners are only needed to complete rebalancing, and they will evict the 
partition when rebalancing is finished. It seems that storing only those 
non-affinity owners can greatly reduce the memory required by this map (it will be 
empty on a stable topology) and the effort needed to keep it consistent with 
node2part.
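
A rough sketch of the proposed representation (names such as diffFromAffinity 
and affinityOwners are placeholders for illustration, not the actual fields):
{noformat}
import java.util.*;

// Illustration only: keep just the owners that are NOT in the affinity
// assignment, so the structure is empty on stable topology.
class DiffFromAffinitySketch {
    private final Map<Integer, Set<UUID>> diffFromAffinity = new HashMap<>();

    // partition -> owners according to the current affinity assignment.
    private final Map<Integer, Set<UUID>> affinityOwners = new HashMap<>();

    // Full owner set is rebuilt on demand: affinity owners + temporary owners.
    Set<UUID> owners(int part) {
        Set<UUID> res = new HashSet<>(affinityOwners.getOrDefault(part, Collections.emptySet()));
        res.addAll(diffFromAffinity.getOrDefault(part, Collections.emptySet()));
        return res;
    }

    // Called when a temporary owner evicts the partition after rebalancing.
    void onPartitionEvicted(int part, UUID node) {
        Set<UUID> extra = diffFromAffinity.get(part);
        if (extra != null && extra.remove(node) && extra.isEmpty())
            diffFromAffinity.remove(part);
    }
}
{noformat}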



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: IGNITE-4535 : Add option to store deserialized values on-heap

2017-04-05 Thread Ilya Lantukh
Dmitry,

> 1. Setting an eviction policy should not be a mechanism to enable the
> on-heap cache. We already have eviction policies off-heap as well, and they
> don't enable anything. On top of that, the eviction policy should not be a
> requirement for the on-heap cache. User should still be able to enable the
> on-heap cache, even if it grows indefinitely without evictions. We should
> have a more intuitive flag here.


It doesn't make any sense to me to enable the on-heap cache without evictions:
it would result in having all data in both on-heap and off-heap memory. If
we want to support such a use case, we should implement a separate on-heap-only
mode with page memory disabled.
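
To illustrate the difference, here is a sketch of the two configuration styles
being discussed; setOnheapCacheEnabled stands in for the explicit flag and is
an assumed name, not an agreed API:

import org.apache.ignite.cache.eviction.lru.LruEvictionPolicy;
import org.apache.ignite.configuration.CacheConfiguration;

public class OnHeapCacheConfigSketch {
    public static void main(String[] args) {
        // Variant A (current patch): a non-null eviction policy both enables
        // and bounds the on-heap cache.
        CacheConfiguration<Integer, String> cfgA = new CacheConfiguration<>("cacheA");
        cfgA.setEvictionPolicy(new LruEvictionPolicy<>(10_000));

        // Variant B (requested alternative): an explicit flag enables the
        // on-heap cache; an eviction policy becomes optional, so the cache may
        // grow without bound if none is set. The flag name is an assumption.
        CacheConfiguration<Integer, String> cfgB = new CacheConfiguration<>("cacheB");
        cfgB.setOnheapCacheEnabled(true);
        cfgB.setEvictionPolicy(new LruEvictionPolicy<>(10_000)); // optional bound
    }
}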


On Tue, Apr 4, 2017 at 7:15 PM, Dmitriy Setrakyan <dsetrak...@apache.org>
wrote:

> Ilya, I looked at the Semyon's comments in the ticket, and I think I agree
> with him on all counts.
>
> 1. Setting an eviction policy should not be a mechanism to enable the
> on-heap cache. We already have eviction policies off-heap as well, and they
> don't enable anything. On top of that, the eviction policy should not be a
> requirement for the on-heap cache. User should still be able to enable the
> on-heap cache, even if it grows indefinitely without evictions. We should
> have a more intuitive flag here.
>
> 2. As far as the tests go, they should examine all the tests and adapt them
> to the new behavior. I think we should have a replica of all off-heap tests
> to test the scenario with on-heap caches.
>
> D.
>
> On Tue, Apr 4, 2017 at 8:12 AM, Ilya Lantukh <ilant...@gridgain.com>
> wrote:
>
> > Hi Igniters,
> >
> > Since review of IGNITE-4535
> > <https://issues.apache.org/jira/browse/IGNITE-4535> implementation
> caused
> > some misunderstandings, I'd like to open a discussion here and see if
> > everyone agrees with the chosen approach or can suggest a better one.
> >
> > We are going to re-use existing EvictionPolicy mechanics to decide when
> > entry is going to be evicted from on-heap cache. If evictionPolicy ==
> null,
> > we assume that there is no on-heap cache. One of suggested alternatives
> was
> > to have a separate boolean parameter that will enable on-heap cache.
> >
> > Another questionable decision was to remove tests for memory mode
> > variations. For example, we had GridCacheContinuousQueryAtomicSelfTest,
> > GridCacheContinuousQueryAtomicOffheapTieredSelfTest and
> > GridCacheContinuousQueryAtomicOffheapValuesSelfTest that were testing
> the
> > same functionality for ONHEAP_TIERED, OFFHEAP_TIERED and OFFHEAP_VALUES
> > modes, respectively. Since those memory modes were removed, only
> > GridCacheContinuousQueryAtomicSelfTest was left and it now runs in
> > off-heap
> > mode without on-heap cache. One of suggestions was to add a new subclass
> to
> > this test (and all other tests) that will run the same test case with
> > on-heap cache enabled. In my opinion, functionality that is specific for
> > on-heap cache should be tested in completely separate tests (which we
> > already have), and there is no need to run generic tests with every
> > possible configuration.
> >
> > What do you think?
> >
> > --
> > Best regards,
> > Ilya
> >
>



-- 
Best regards,
Ilya


IGNITE-4535 : Add option to store deserialized values on-heap

2017-04-04 Thread Ilya Lantukh
Hi Igniters,

Since the review of the IGNITE-4535
<https://issues.apache.org/jira/browse/IGNITE-4535> implementation caused
some misunderstandings, I'd like to open a discussion here and see if
everyone agrees with the chosen approach or can suggest a better one.

We are going to re-use the existing EvictionPolicy mechanics to decide when
an entry is going to be evicted from the on-heap cache. If evictionPolicy == null,
we assume that there is no on-heap cache. One of the suggested alternatives was
to have a separate boolean parameter that would enable the on-heap cache.

Another questionable decision was to remove tests for memory mode
variations. For example, we had GridCacheContinuousQueryAtomicSelfTest,
GridCacheContinuousQueryAtomicOffheapTieredSelfTest and
GridCacheContinuousQueryAtomicOffheapValuesSelfTest, which tested the
same functionality for the ONHEAP_TIERED, OFFHEAP_TIERED and OFFHEAP_VALUES
modes, respectively. Since those memory modes were removed, only
GridCacheContinuousQueryAtomicSelfTest was left, and it now runs in off-heap
mode without an on-heap cache. One of the suggestions was to add a new subclass to
this test (and all other tests) that would run the same test case with
the on-heap cache enabled. In my opinion, functionality that is specific to
the on-heap cache should be tested in completely separate tests (which we
already have), and there is no need to run generic tests with every
possible configuration.

What do you think?

-- 
Best regards,
Ilya


Re: IGNITE-4536 metrics of new offheap storage

2017-03-17 Thread Ilya Lantukh
Denis, Sergey,

The changes in https://issues.apache.org/jira/browse/IGNITE-4535 won't be that
far-reaching. I am not going to replace the older mechanics, but rather re-enable and
adjust them to work with PageMemory. We will still use the same distributed
hash table and evict entries based on the existing EvictionPolicy API. I think
the 'Dht' methods, like getDhtEvictQueueCurrentSize(), are still relevant and
important.
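
As a side note on the histogram-vs-average question discussed below, here is an
illustrative-only snippet (plain Java, not Ignite code) that reuses Sergey's two
examples and shows why a single average hides fragmentation that a per-bucket
breakdown exposes:

import java.util.Arrays;

public class FreeListHistogramExample {
    /** Average free space, in percent of page size. */
    static double avg(int[] freePct) {
        int sum = 0;

        for (int f : freePct)
            sum += f;

        return (double)sum / freePct.length;
    }

    /** Page counts per free-space bucket: [0-25%), [25-50%), [50-75%), [75-100%]. */
    static int[] hist(int[] freePct) {
        int[] buckets = new int[4];

        for (int f : freePct)
            buckets[Math.min(f / 25, 3)]++;

        return buckets;
    }

    public static void main(String[] args) {
        int[] case1 = {40, 40, 40, 40}; // four pages, each 60% full (the fragmented case)
        int[] case2 = {10, 10, 50, 90}; // pages 90%, 90%, 50% and 10% full

        // Both print 40.0 -- the average free space cannot tell the cases apart...
        System.out.println(avg(case1) + " / " + avg(case2));

        // ...but the histograms differ: [0, 4, 0, 0] vs [2, 0, 1, 1].
        System.out.println(Arrays.toString(hist(case1)) + " / " + Arrays.toString(hist(case2)));
    }
}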

On Fri, Mar 17, 2017 at 12:08 PM, Sergey Chugunov  wrote:

> Dmitriy,
>
> My main goal was to add a metric to estimate FreeList space fragmentation
> and "hist" was the first thing I came up with.
>
> Let's consider one case: we placed into a cache 4 entities 60% page size
> each.
> After that we'll have 4 pages in FreeList each with a hole of 40% of its
> size.
> Utilization of FreeList will be 60% but with big fragmentation.
>
> Let's consider another case: we have added and removed a bunch of entries
> much smaller than a page. After that we have two pages 90% full, one page
> 50% full and one page 10% full.
> Utilization of FreeList is 60% again, very simple math, but fragmentation
> is much smaller.
>
> So, when we calculate only a simple average we lose a lot of information;
> and this information may be very useful to make a decision about best page
> size configuration.
>
> Thanks,
> Sergey.
>
>
> On Thu, Mar 16, 2017 at 10:22 PM, Dmitriy Setrakyan  >
> wrote:
>
> > As far as the percentage of the free page space, why do we need to
> provide
> > 3 ranges: 0 -> 16, 16 -> 32, 32 -> 64, etc? Why not just provide average
> > free bytes percentage as one value?
> >
> > Am I misunderstanding something?
> >
> > On Thu, Mar 16, 2017 at 11:04 AM, Denis Magda  wrote:
> >
> > > Sergey,
> > >
> > > Considering that the swap tier will no longer be supported in 2.0 all
> the
> > > methods that start with ‘getSwap…’ are no longer relevant and have to
> be
> > > removed from metrics. For instance, the swap functionality has already
> > been
> > > wiped out from .NET:
> > > https://issues.apache.org/jira/browse/IGNITE-4736
> > >
> > > Next, I’m also confused with the metrics that include ‘Dht’ in its
> name.
> > > The on-heap tier we have in 1.x will be replaced with on-heap cache:
> > > https://issues.apache.org/jira/browse/IGNITE-4535 <
> > > https://issues.apache.org/jira/browse/IGNITE-4535>
> > > Does it me that ‘Dht’ methods are still relevant or they need to be
> > > replaced with something more meaningful? *Alex G.*, please chime in.
> > >
> > > Finally, personally I don’t like the API for these 3 methods
> > >
> > > >
> > > >public float getPagesPercentage_8_16_freeBytes();
> > > >public float getPagesPercentage_16_64_freeBytes();
> > > >public float getPagesPercentage_64_256_freeBytes();
> > >
> > > Wouldn’t it better to have a single method like this?
> > >
> > > public float[] getPagesFreeBytesPercentage();
> > >
> > > where
> > >
> > > float[0] - 0 to 16 free bytes.
> > > float[1] - 16 to 32 free bytes.
> > > float[2] - 32 to 64 free bytes.
> > > …..
> > > float[N] - page_size - 16 to page size free bytes.
> > >
> > > —
> > > Denis
> > >
> > > > On Mar 16, 2017, at 10:22 AM, Sergey Chugunov <
> > sergey.chugu...@gmail.com>
> > > wrote:
> > > >
> > > > Denis,
> > > >
> > > > Here is a version of CacheMetrics interface with all changes how I
> see
> > > them
> > > > (pretty long list :)).
> > > >
> > > > public interface CacheMetrics {
> > > >
> > > >   public long getCacheHits();
> > > >
> > > >   public float getCacheHitPercentage();
> > > >
> > > >   public long getCacheMisses();
> > > >
> > > >   public float getCacheMissPercentage();
> > > >
> > > >   public long getCacheGets();
> > > >
> > > >   public long getCachePuts();
> > > >
> > > >   public long getCacheRemovals();
> > > >
> > > >   public long getCacheEvictions();
> > > >
> > > >   public float getAverageGetTime();
> > > >
> > > >   public float getAveragePutTime();
> > > >
> > > >   public float getAverageRemoveTime();
> > > >
> > > >   public float getAverageTxCommitTime();
> > > >
> > > >   public float getAverageTxRollbackTime();
> > > >
> > > >   public long getCacheTxCommits();
> > > >
> > > >   public long getCacheTxRollbacks();
> > > >
> > > >   public String name();
> > > >
> > > >   public long getOverflowSize();
> > > >
> > > >   public long getOffHeapGets();
> > > >
> > > >   public long getOffHeapPuts();//removing as it duplicates cachePuts
> > > >
> > > >   public long getOffHeapRemovals();
> > > >
> > > >   public long getOffHeapEvictions();
> > > >
> > > >   public long getOffHeapHits();
> > > >
> > > >   public float getOffHeapHitPercentage();
> > > >
> > > >   public long getOffHeapMisses();//removing as it duplicates
> > cacheMisses
> > > >
> > > >   public float getOffHeapMissPercentage();//removing as it
> duplicates
> > > > cacheMissPercentage
> > > >
> > > >   public long getOffHeapEntriesCount();
> > > >
> > > >   public long 

[jira] [Created] (IGNITE-3950) Deadlock when exchange happens with pending explicit lock

2016-09-21 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-3950:


 Summary: Deadlock when exchange happens with pending explicit lock
 Key: IGNITE-3950
 URL: https://issues.apache.org/jira/browse/IGNITE-3950
 Project: Ignite
  Issue Type: Bug
Reporter: Ilya Lantukh


Reproduced by IgniteCacheMultiTxLockSelfTest#testExplicitLockManyKeysWithClient 
(hangs with ~10% probability).

Exchange worker waits for lock to be released:
{noformat}
Thread [name="exchange-worker-#155%dht.IgniteCacheMultiTxLockSelfTest3%", 
id=195, state=TIMED_WAITING, blockCnt=0, waitCnt=44]
Lock 
[object=o.a.i.i.processors.cache.GridCacheMvccManager$FinishLockFuture@2638011, 
ownerName=null, ownerId=-1]
at sun.misc.Unsafe.park(Native Method)
at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
at 
o.a.i.i.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:187)
at o.a.i.i.util.future.GridFutureAdapter.get(GridFutureAdapter.java:137)
at 
o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.waitPartitionRelease(GridDhtPartitionsExchangeFuture.java:835)
at 
o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:763)
at 
o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:516)
at 
o.a.i.i.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1464)
at o.a.i.i.util.worker.GridWorker.run(GridWorker.java:110)
at java.lang.Thread.run(Thread.java:745)
{noformat}
while thread that holds lock cannot finish cache operation:
{noformat}
"Thread-9@3645" prio=5 tid=0x11a nid=NA waiting
  java.lang.Thread.State: WAITING
  at sun.misc.Unsafe.park(Unsafe.java:-1)
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
  at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
  at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
  at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
  at 
org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:157)
  at 
org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:117)
  at 
org.apache.ignite.internal.processors.cache.GridCacheAdapter$45.inOp(GridCacheAdapter.java:2849)
  at 
org.apache.ignite.internal.processors.cache.GridCacheAdapter$SyncInOp.op(GridCacheAdapter.java:5303)
  at 
org.apache.ignite.internal.processors.cache.GridCacheAdapter.syncOp(GridCacheAdapter.java:4351)
  at 
org.apache.ignite.internal.processors.cache.GridCacheAdapter.putAll(GridCacheAdapter.java:2847)
  at 
org.apache.ignite.internal.processors.cache.GridCacheProxyImpl.putAll(GridCacheProxyImpl.java:838)
  at 
org.apache.ignite.internal.processors.cache.distributed.dht.IgniteCacheMultiTxLockSelfTest$1.run(IgniteCacheMultiTxLockSelfTest.java:218)
{noformat}
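
The pattern that leads to this hang can be sketched as follows (hypothetical 
reproducer, not the actual test code; assumes a TRANSACTIONAL cache named "txCache"):
{noformat}
import java.util.concurrent.locks.Lock;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

public class ExplicitLockExchangeHangSketch {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();
        IgniteCache<Integer, Integer> cache = ignite.cache("txCache");

        Lock lock = cache.lock(1); // explicit lock, requires a TRANSACTIONAL cache
        lock.lock();

        try {
            // If a node joins or leaves here, the new exchange waits for all
            // explicit locks to be released (waitPartitionRelease)...
            cache.put(2, 2); // ...while this operation waits for the new topology.
        }
        finally {
            lock.unlock();
        }
    }
}
{noformat}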




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (IGNITE-3014) Optimize GridDhtPartitionTopologyImpl#localPartition()

2016-04-15 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-3014:


 Summary: Optimize GridDhtPartitionTopologyImpl#localPartition()
 Key: IGNITE-3014
 URL: https://issues.apache.org/jira/browse/IGNITE-3014
 Project: Ignite
  Issue Type: Improvement
Reporter: Ilya Lantukh
Assignee: Ilya Lantukh


This method is called at least once for every cache operation on each node.
It was partially optimized in https://issues.apache.org/jira/browse/IGNITE-2948.
It seems that we can reduce the time spent in that method even further by removing 
the excessive RW locks and using volatile reads/writes instead.
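
Illustration of the intended pattern (simplified, not the actual code): replace the 
read-lock-protected lookup on the hot path with a volatile read, and fall back to 
locking only when the partition has to be created:
{noformat}
import java.util.concurrent.atomic.AtomicReferenceArray;

class LocalPartitionLookupSketch {
    private static final int PARTS = 1024;

    /** Placeholder for GridDhtLocalPartition. */
    static class Partition {
        final int id;
        Partition(int id) { this.id = id; }
    }

    /** Partitions stored in a volatile array: reads on the hot path take no lock. */
    private final AtomicReferenceArray<Partition> parts = new AtomicReferenceArray<>(PARTS);

    Partition localPartition(int p, boolean create) {
        Partition part = parts.get(p); // volatile read, no RW lock

        if (part != null || !create)
            return part;

        synchronized (this) { // slow path: create the partition under a lock
            part = parts.get(p);

            if (part == null)
                parts.set(p, part = new Partition(p));

            return part;
        }
    }
}
{noformat}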



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

