Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

2019-11-18 Thread Ivan Pavlukhin
Hi,

Thank you for digging deeper! I have no good ideas yet about what could go
wrong in curCrd.local().

Have you tried to reproduce the leak by starting and stopping a large number
of client nodes?
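If it helps, the start/stop churn can be scripted. Below is a hypothetical harness (the cycleClients helper and the stand-in starter are mine, not Ignite API); against a real cluster the starter would be something like Ignition.start(new IgniteConfiguration().setClientMode(true)) on 2.7.x, but the sketch runs as-is without a cluster:

```java
import java.util.function.Supplier;

public class ClientChurn {
    /**
     * Starts and stops a node n times. Each cycle joins and then leaves
     * the topology, which is what drives the topVer churn in this thread.
     */
    static int cycleClients(int n, Supplier<AutoCloseable> starter) throws Exception {
        int cycles = 0;
        for (int i = 0; i < n; i++) {
            try (AutoCloseable node = starter.get()) {
                cycles++; // node is in topology here; close() makes it leave
            }
        }
        return cycles;
    }

    public static void main(String[] args) throws Exception {
        // Against a real cluster (assumed Ignite 2.7.x API):
        //   cycleClients(1000, () -> Ignition.start(
        //       new IgniteConfiguration().setClientMode(true)));
        // Stand-in below so the sketch runs without a cluster:
        System.out.println(cycleClients(3, () -> () -> {}));
    }
}
```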

вс, 17 нояб. 2019 г. в 13:16, mvkarp :
>
> Only other thing I can think of if it's through onDiscovery() is that
> curCrd.local() somehow is returning true. However I am unable to find
> exactly how local() is determined since there appears to be a big chain.
>
> I know that the node uuid on the leaking server is on a different physical
> node as well as has a completely different node ID
> (b-b-b-b-b-b) to what the MVCC coordinator is
> (mvccCrd=a--a-a-a)
>
> Is there any way that the curCrd.local() could be returning True on the
> leaking server JVM? I am trying to investigate how local() is determined and
> what could cause it to be true.
>
>
> Ivan Pavlukhin wrote
> > But currently I suspect that you faced a leak in
> > MvccProcessorImpl.onDiscovery on non MVCC coordinator nodes. Do you
> > think that there is other reason in you case?
>
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/



-- 
Best regards,
Ivan Pavlukhin


Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

2019-11-17 Thread mvkarp
The only other thing I can think of, if it goes through onDiscovery(), is that
curCrd.local() is somehow returning true. However, I am unable to find out
exactly how local() is determined, since there appears to be a long call chain.

I know that the leaking server sits on a different physical host and has a
completely different node ID
(b-b-b-b-b-b) from the MVCC coordinator's
(mvccCrd=a--a-a-a)

Is there any way curCrd.local() could be returning true on the leaking server
JVM? I am trying to investigate how local() is determined and what could cause
it to be true.
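For what it's worth, if local() ultimately boils down to comparing the coordinator's node ID with the local node's ID (an assumption on my part, not confirmed from the source), then differing UUIDs should indeed make it false:

```java
import java.util.UUID;

public class LocalCheck {
    /**
     * Toy model of the check (assumed semantics, not the actual Ignite
     * source): the coordinator is "local" iff its node ID equals the
     * local node's ID.
     */
    static boolean isLocalCoordinator(UUID mvccCrdNodeId, UUID locNodeId) {
        return mvccCrdNodeId.equals(locNodeId);
    }

    public static void main(String[] args) {
        UUID crd = UUID.randomUUID(); // the a-...-a coordinator in the thread
        UUID loc = UUID.randomUUID(); // the b-...-b leaking server
        System.out.println(isLocalCoordinator(crd, loc)); // different IDs
        System.out.println(isLocalCoordinator(crd, crd)); // same node
    }
}
```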


Ivan Pavlukhin wrote
> But currently I suspect that you faced a leak in
> MvccProcessorImpl.onDiscovery on non MVCC coordinator nodes. Do you
> think that there is other reason in you case?





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

2019-11-16 Thread Ivan Pavlukhin
> But the population might be through processRecoveryFinishedMessage() - which 
> does not do any check for isLocal() and goes straight to processing message 
> since waitForCoordinatorInit() on a MvccRecoveryFinishedMessage always 
> returns false?

All nodes send messages intended for handling in
processRecoveryFinishedMessage() to the MVCC coordinator node, so we
assume that only the coordinator receives such messages. If you are
interested, you can find the sending side in
IgniteTxManager.NodeFailureTimeoutObject, where the "recovery finished"
messages are issued.

But currently I suspect that you have hit a leak in
MvccProcessorImpl.onDiscovery on non-MVCC-coordinator nodes. Do you
think there is another reason in your case?

сб, 16 нояб. 2019 г. в 12:30, mvkarp :
>
> 1 & 2. Actually, looking at latest Master on the release of 2.7.5 and the
> current Master version, it is 'pickMvccCoordinator' function which returns
> the coordinator (this is same function that selects node that is not Client
> and Ignite version >= 2.7). curCrd is then assigned the return variable of
> pickMvccCoordinator, which becomes the active Mvcc coordinator. So looks
> like it does become active, but not sure the effect of that yet.
>
> 3. Assuming it is then active, looks like there are two entry points into
> recoveryBallotBoxes. Through  onDiscovery() and via
> processRecoveryFinishedMessage().
>
> Is it possible that onDiscovery() does not populate recoveryBallotBoxes as
> there is curCrd0.local() check - so processing will only be done if MVCC
> coordinator is local - thus a node that is actually a MVCC coordinator will
> clear out the recoveryBallotBoxes (which is the explicit check that you
> mentioned).
>
> But the population might be through processRecoveryFinishedMessage() - which
> does not do any check for isLocal() and goes straight to processing message
> since waitForCoordinatorInit() on a MvccRecoveryFinishedMessage always
> returns false?
>
>
> Ivan Pavlukhin wrote
> > 1. MVCC coordinator should not do anything when there is no MVCC
> > caches, actually it should not be active in such case. Basically, MVCC
> > coordinator is needed to have a consistent order between transactions.
> > 2. In 2.7.5 "assigned coordinator" is always selected but it does not
> > mean that it is active. MvccProcessorImpl.curCrd variable corresponds
> > to active MVCC coordinator.
> > 3. If that statement is true, then it should be rather easy to
> > reproduce the problem by starting and stopping client nodes
> > frequently. recoveryBallotBoxes was not assumed to be populated on
> > nodes other than MVCC coordinator. If it happens than we found a bug.
> > Actually, the code in master is different and has an explicit check
> > that recoveryBallotBoxes are populated only on MVCC coordinator.
> >
> > чт, 14 нояб. 2019 г. в 15:42, mvkarp <
>
> > liquid_ninja2k@
>
> > >:
> >>
> >> Hi, after investigating I have few questions regarding this issue.
> >>
> >> 1. Having lack of knowledge in what MVCC coordinator is used for, are you
> >> able to shed some light on the role and purpose of the MVCC coordinator?
> >> What does the MVCC coordinator do, why is one selected? Should an MVCC
> >> coordinator be selected regardless of MVCC being disabled? (i.e. is it
> >> used
> >> for any other base features and is it just the way Ignite is meant to
> >> work)
> >>
> >> 2. Following on from this, after looking at the code of the
> >> MvccProcessorImpl.java class in Ignite 2.7.5 Github, it looks like an
> >> MVCC
> >> coordinator is ALWAYS selected and assigns one of the server nodes as the
> >> MVCC coordinator, regardless of having TRANSACTIONAL_SNAPSHOT cache or
> >> not
> >> (mvccEnabled can be false but a MVCC coordinator is still be selected).
> >>
> >> https://github.com/apache/ignite/blob/ignite-2.7.5/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/mvcc/MvccProcessorImpl.java
> >>
> >> On Line 861, in assignMvccCoordinator method, it loops through all nodes
> >> in
> >> the cluster with only these two conditions.
> >>
> >> *if (!node.isClient() && supportsMvcc(node))*
> >>
> >> It only checks if the node is not a client, and that is supportsMvcc
> >> (which
> >> is true for all versions > 2.7). It does not check mvccEnabled at all.
> >>
> >>
> >> Can you confirm the above is intentional/expected or if there is another
> >> piece of code I am missing?
> >>
> >>
> >> 3. As extra information, the node that happens to be selected as MVCC
> >> coordinator does not get the leak. But every other client/server gets the
> >> leak.
> >>
> >>
> >>
> >> Ivan Pavlukhin wrote
> >> > Hi,
> >> >
> >> > I suspect a following here. Some node treats itself as a MVCC
> >> > coordinator and creates a new RecoveryBallotBox when each client node
> >> > leaves. Some (may be all) other nodes think that MVCC is disabled and
> >> > do not send a vote (assumed for aforementioned ballot box) to MVCC
> >> > coordinator. Consequently a memory leak.
> >> >
> >> 

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

2019-11-16 Thread mvkarp
1 & 2. Actually, looking at master as of the 2.7.5 release and at the current
master, it is the 'pickMvccCoordinator' function that returns the coordinator
(the same function that selects a node that is not a client and runs Ignite
>= 2.7). curCrd is then assigned the return value of pickMvccCoordinator and
becomes the active MVCC coordinator. So it looks like it does become active,
but I am not sure of the effect of that yet.

3. Assuming it is then active, there look to be two entry points into
recoveryBallotBoxes: onDiscovery() and processRecoveryFinishedMessage().

Is it possible that onDiscovery() does not populate recoveryBallotBoxes
because of the curCrd0.local() check, so processing is only done if the MVCC
coordinator is local? Thus a node that actually is the MVCC coordinator will
clear out recoveryBallotBoxes (which is the explicit check that you
mentioned).

But the population might happen through processRecoveryFinishedMessage(),
which does no isLocal() check and goes straight to processing the message,
since waitForCoordinatorInit() on a MvccRecoveryFinishedMessage always
returns false?
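To make the asymmetry concrete, here is a toy model of the two entry points (my paraphrase of the hypothesis above, not the actual Ignite source; the names are borrowed for readability). If the message path really has no local() guard, a stray recovery-finished message would populate the map on a non-coordinator node, and nothing would ever clear it there:

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the two entry points into recoveryBallotBoxes. */
public class BallotBoxModel {
    final Map<Long, Object> recoveryBallotBoxes = new HashMap<>();
    final boolean localCoordinator; // models curCrd.local()

    BallotBoxModel(boolean localCoordinator) {
        this.localCoordinator = localCoordinator;
    }

    /** onDiscovery path: guarded, so only the coordinator populates. */
    void onDiscovery(long topVer) {
        if (localCoordinator)
            recoveryBallotBoxes.computeIfAbsent(topVer, v -> new Object());
    }

    /** processRecoveryFinishedMessage path: no local() guard in this model. */
    void onRecoveryFinished(long topVer) {
        recoveryBallotBoxes.computeIfAbsent(topVer, v -> new Object());
    }

    public static void main(String[] args) {
        BallotBoxModel nonCrd = new BallotBoxModel(false);
        nonCrd.onDiscovery(1L);        // guarded: map stays empty
        nonCrd.onRecoveryFinished(1L); // unguarded: map grows -> the leak
        System.out.println(nonCrd.recoveryBallotBoxes.size()); // prints 1
    }
}
```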


Ivan Pavlukhin wrote
> 1. MVCC coordinator should not do anything when there is no MVCC
> caches, actually it should not be active in such case. Basically, MVCC
> coordinator is needed to have a consistent order between transactions.
> 2. In 2.7.5 "assigned coordinator" is always selected but it does not
> mean that it is active. MvccProcessorImpl.curCrd variable corresponds
> to active MVCC coordinator.
> 3. If that statement is true, then it should be rather easy to
> reproduce the problem by starting and stopping client nodes
> frequently. recoveryBallotBoxes was not assumed to be populated on
> nodes other than MVCC coordinator. If it happens than we found a bug.
> Actually, the code in master is different and has an explicit check
> that recoveryBallotBoxes are populated only on MVCC coordinator.
> 
> чт, 14 нояб. 2019 г. в 15:42, mvkarp <

> liquid_ninja2k@

> >:
>>
>> Hi, after investigating I have few questions regarding this issue.
>>
>> 1. Having lack of knowledge in what MVCC coordinator is used for, are you
>> able to shed some light on the role and purpose of the MVCC coordinator?
>> What does the MVCC coordinator do, why is one selected? Should an MVCC
>> coordinator be selected regardless of MVCC being disabled? (i.e. is it
>> used
>> for any other base features and is it just the way Ignite is meant to
>> work)
>>
>> 2. Following on from this, after looking at the code of the
>> MvccProcessorImpl.java class in Ignite 2.7.5 Github, it looks like an
>> MVCC
>> coordinator is ALWAYS selected and assigns one of the server nodes as the
>> MVCC coordinator, regardless of having TRANSACTIONAL_SNAPSHOT cache or
>> not
>> (mvccEnabled can be false but a MVCC coordinator is still be selected).
>>
>> https://github.com/apache/ignite/blob/ignite-2.7.5/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/mvcc/MvccProcessorImpl.java
>>
>> On Line 861, in assignMvccCoordinator method, it loops through all nodes
>> in
>> the cluster with only these two conditions.
>>
>> *if (!node.isClient() && supportsMvcc(node))*
>>
>> It only checks if the node is not a client, and that is supportsMvcc
>> (which
>> is true for all versions > 2.7). It does not check mvccEnabled at all.
>>
>>
>> Can you confirm the above is intentional/expected or if there is another
>> piece of code I am missing?
>>
>>
>> 3. As extra information, the node that happens to be selected as MVCC
>> coordinator does not get the leak. But every other client/server gets the
>> leak.
>>
>>
>>
>> Ivan Pavlukhin wrote
>> > Hi,
>> >
>> > I suspect a following here. Some node treats itself as a MVCC
>> > coordinator and creates a new RecoveryBallotBox when each client node
>> > leaves. Some (may be all) other nodes think that MVCC is disabled and
>> > do not send a vote (assumed for aforementioned ballot box) to MVCC
>> > coordinator. Consequently a memory leak.
>> >
>> > A following could be done:
>> > 1. Figure out why some node treats itself MVCC coordinator and others
>> > think that MVCC is disabled.
>> > 2. Try to introduce some defensive matters in Ignite code to protect
>> > from the leak in a long running cluster.
>> >
>> > As a last chance workaround I can suggest writing custom code, which
>> > cleans recoveryBallotBoxes map from time to time (most likely using
>> > reflection).
>> >
>> > пн, 11 нояб. 2019 г. в 08:53, mvkarp <
>>
>> > liquid_ninja2k@
>>
>> > >:
>> >>
>> >> We have frequently stopping and starting clients in short lived client
>> >> JVM
>> >> processes as required for our purposes, this seems to lead to a huge
>> >> bunch
>> >> of PME (but no rebalancing) and topology changes (topVer=300,000+)
>> >>
>> >> Still can not figure out why this map won't clear (there are no
>> >> exceptions
>> >> or err at all in the entire log)
>> >>
>> >>
>> >>
>> >> --
>> >> Sent from: http://apach

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

2019-11-15 Thread Ivan Pavlukhin
1. The MVCC coordinator should not do anything when there are no MVCC
caches; actually, it should not be active in that case. Basically, the MVCC
coordinator is needed to establish a consistent order between transactions.
2. In 2.7.5 an "assigned coordinator" is always selected, but that does not
mean it is active. The MvccProcessorImpl.curCrd variable corresponds to the
active MVCC coordinator.
3. If that statement is true, then it should be rather easy to reproduce the
problem by starting and stopping client nodes frequently.
recoveryBallotBoxes was never meant to be populated on nodes other than the
MVCC coordinator; if it happens, then we have found a bug. Actually, the
code in master is different and has an explicit check so that
recoveryBallotBoxes is populated only on the MVCC coordinator.

чт, 14 нояб. 2019 г. в 15:42, mvkarp :
>
> Hi, after investigating I have few questions regarding this issue.
>
> 1. Having lack of knowledge in what MVCC coordinator is used for, are you
> able to shed some light on the role and purpose of the MVCC coordinator?
> What does the MVCC coordinator do, why is one selected? Should an MVCC
> coordinator be selected regardless of MVCC being disabled? (i.e. is it used
> for any other base features and is it just the way Ignite is meant to work)
>
> 2. Following on from this, after looking at the code of the
> MvccProcessorImpl.java class in Ignite 2.7.5 Github, it looks like an MVCC
> coordinator is ALWAYS selected and assigns one of the server nodes as the
> MVCC coordinator, regardless of having TRANSACTIONAL_SNAPSHOT cache or not
> (mvccEnabled can be false but a MVCC coordinator is still be selected).
>
> https://github.com/apache/ignite/blob/ignite-2.7.5/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/mvcc/MvccProcessorImpl.java
>
> On Line 861, in assignMvccCoordinator method, it loops through all nodes in
> the cluster with only these two conditions.
>
> *if (!node.isClient() && supportsMvcc(node))*
>
> It only checks if the node is not a client, and that is supportsMvcc (which
> is true for all versions > 2.7). It does not check mvccEnabled at all.
>
>
> Can you confirm the above is intentional/expected or if there is another
> piece of code I am missing?
>
>
> 3. As extra information, the node that happens to be selected as MVCC
> coordinator does not get the leak. But every other client/server gets the
> leak.
>
>
>
> Ivan Pavlukhin wrote
> > Hi,
> >
> > I suspect a following here. Some node treats itself as a MVCC
> > coordinator and creates a new RecoveryBallotBox when each client node
> > leaves. Some (may be all) other nodes think that MVCC is disabled and
> > do not send a vote (assumed for aforementioned ballot box) to MVCC
> > coordinator. Consequently a memory leak.
> >
> > A following could be done:
> > 1. Figure out why some node treats itself MVCC coordinator and others
> > think that MVCC is disabled.
> > 2. Try to introduce some defensive matters in Ignite code to protect
> > from the leak in a long running cluster.
> >
> > As a last chance workaround I can suggest writing custom code, which
> > cleans recoveryBallotBoxes map from time to time (most likely using
> > reflection).
> >
> > пн, 11 нояб. 2019 г. в 08:53, mvkarp <
>
> > liquid_ninja2k@
>
> > >:
> >>
> >> We have frequently stopping and starting clients in short lived client
> >> JVM
> >> processes as required for our purposes, this seems to lead to a huge
> >> bunch
> >> of PME (but no rebalancing) and topology changes (topVer=300,000+)
> >>
> >> Still can not figure out why this map won't clear (there are no
> >> exceptions
> >> or err at all in the entire log)
> >>
> >>
> >>
> >> --
> >> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
> >
> >
> >
> > --
> > Best regards,
> > Ivan Pavlukhin
>
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/



-- 
Best regards,
Ivan Pavlukhin


Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

2019-11-14 Thread mvkarp
Hi, after investigating I have a few questions regarding this issue.

1. Lacking knowledge of what the MVCC coordinator is used for, could you
shed some light on its role and purpose? What does the MVCC coordinator do,
and why is one selected? Should an MVCC coordinator be selected even when
MVCC is disabled (i.e. is it used for any other base features, and is this
just the way Ignite is meant to work)?

2. Following on from this, after looking at the MvccProcessorImpl.java
class on the Ignite 2.7.5 GitHub, it looks like an MVCC coordinator is
ALWAYS selected: one of the server nodes is assigned as the MVCC coordinator
regardless of whether a TRANSACTIONAL_SNAPSHOT cache exists (mvccEnabled can
be false but an MVCC coordinator is still selected).

https://github.com/apache/ignite/blob/ignite-2.7.5/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/mvcc/MvccProcessorImpl.java

On line 861, in the assignMvccCoordinator method, it loops through all
nodes in the cluster with only these two conditions:

*if (!node.isClient() && supportsMvcc(node))*

It only checks that the node is not a client and that it supportsMvcc (which
is true for all versions > 2.7). It does not check mvccEnabled at all.
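As a paraphrase of that selection (a sketch of the logic as I read it, not the actual Ignite source; the Node type and protoVer field are stand-ins of mine), the first non-client, MVCC-capable node in topology order wins, and no mvccEnabled check appears anywhere:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class CoordinatorPick {
    static class Node {
        final String id;
        final boolean client;
        final int protoVer; // stand-in for "Ignite version supports MVCC"

        Node(String id, boolean client, int protoVer) {
            this.id = id;
            this.client = client;
            this.protoVer = protoVer;
        }
    }

    // Models supportsMvcc(node): true for 2.7+ in the thread's terms.
    static boolean supportsMvcc(Node n) {
        return n.protoVer >= 27;
    }

    /** First non-client node supporting MVCC; note: no mvccEnabled check. */
    static Optional<Node> pickCoordinator(List<Node> topology) {
        return topology.stream()
            .filter(n -> !n.client && supportsMvcc(n))
            .findFirst();
    }

    public static void main(String[] args) {
        List<Node> top = Arrays.asList(
            new Node("client-1", true, 27),
            new Node("server-1", false, 27));
        System.out.println(pickCoordinator(top).get().id); // prints server-1
    }
}
```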


Can you confirm that the above is intentional/expected, or is there another
piece of code I am missing?


3. As extra information, the node that happens to be selected as MVCC
coordinator does not get the leak, but every other client/server does.



Ivan Pavlukhin wrote
> Hi,
> 
> I suspect a following here. Some node treats itself as a MVCC
> coordinator and creates a new RecoveryBallotBox when each client node
> leaves. Some (may be all) other nodes think that MVCC is disabled and
> do not send a vote (assumed for aforementioned ballot box) to MVCC
> coordinator. Consequently a memory leak.
> 
> A following could be done:
> 1. Figure out why some node treats itself MVCC coordinator and others
> think that MVCC is disabled.
> 2. Try to introduce some defensive matters in Ignite code to protect
> from the leak in a long running cluster.
> 
> As a last chance workaround I can suggest writing custom code, which
> cleans recoveryBallotBoxes map from time to time (most likely using
> reflection).
> 
> пн, 11 нояб. 2019 г. в 08:53, mvkarp <

> liquid_ninja2k@

> >:
>>
>> We have frequently stopping and starting clients in short lived client
>> JVM
>> processes as required for our purposes, this seems to lead to a huge
>> bunch
>> of PME (but no rebalancing) and topology changes (topVer=300,000+)
>>
>> Still can not figure out why this map won't clear (there are no
>> exceptions
>> or err at all in the entire log)
>>
>>
>>
>> --
>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
> 
> 
> 
> -- 
> Best regards,
> Ivan Pavlukhin





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

2019-11-11 Thread Ivan Pavlukhin
Hi,

My first thought is deploying a service [1] (either dynamically via
Ignite.services().deploy() or statically via
IgniteConfiguration.setServiceConfiguration()) that clears the problematic
map periodically.

[1] https://apacheignite.readme.io/docs/service-grid
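A minimal sketch of the cleaning logic such a service could run (assumptions: the generic reflection helper, the Holder stand-in, and reaching the real map through Ignite internals are mine; clearing internal state this way is unsupported and racy, so treat it strictly as a last resort):

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

public class BallotBoxCleaner {
    /** Clears a private Map field on the given object via reflection;
     *  returns the number of entries removed. */
    static int clearMapField(Object target, String fieldName) throws Exception {
        Field f = target.getClass().getDeclaredField(fieldName);
        f.setAccessible(true); // the real field is private
        Map<?, ?> map = (Map<?, ?>) f.get(target);
        int removed = map.size();
        map.clear();
        return removed;
    }

    /** Stand-in for MvccProcessorImpl, used only to demonstrate the call. */
    static class Holder {
        private final Map<Long, Object> recoveryBallotBoxes = new HashMap<>();
    }

    public static void main(String[] args) throws Exception {
        Holder h = new Holder();
        h.recoveryBallotBoxes.put(1L, new Object());
        h.recoveryBallotBoxes.put(2L, new Object());
        // In an Ignite service's execute() loop this would run periodically.
        System.out.println(clearMapField(h, "recoveryBallotBoxes")); // prints 2
        System.out.println(h.recoveryBallotBoxes.size());            // prints 0
    }
}
```

In a real deployment the service's execute() would sleep and repeat, and the target object would be the node's MVCC processor instance rather than the Holder stand-in.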

пн, 11 нояб. 2019 г. в 13:20, mvkarp :
>
> Hi,
>
> Would you have any suggestion on how to implement a last chance workaround
> for this issue for the server JVM?
>
>
> Ivan Pavlukhin wrote
> > Hi,
> >
> > I suspect a following here. Some node treats itself as a MVCC
> > coordinator and creates a new RecoveryBallotBox when each client node
> > leaves. Some (may be all) other nodes think that MVCC is disabled and
> > do not send a vote (assumed for aforementioned ballot box) to MVCC
> > coordinator. Consequently a memory leak.
> >
> > A following could be done:
> > 1. Figure out why some node treats itself MVCC coordinator and others
> > think that MVCC is disabled.
> > 2. Try to introduce some defensive matters in Ignite code to protect
> > from the leak in a long running cluster.
> >
> > As a last chance workaround I can suggest writing custom code, which
> > cleans recoveryBallotBoxes map from time to time (most likely using
> > reflection).
> >
> > пн, 11 нояб. 2019 г. в 08:53, mvkarp <
>
> > liquid_ninja2k@
>
> > >:
> >>
> >> We have frequently stopping and starting clients in short lived client
> >> JVM
> >> processes as required for our purposes, this seems to lead to a huge
> >> bunch
> >> of PME (but no rebalancing) and topology changes (topVer=300,000+)
> >>
> >> Still can not figure out why this map won't clear (there are no
> >> exceptions
> >> or err at all in the entire log)
> >>
> >>
> >>
> >> --
> >> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
> >
> >
> >
> > --
> > Best regards,
> > Ivan Pavlukhin
>
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/



-- 
Best regards,
Ivan Pavlukhin


Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

2019-11-11 Thread mvkarp
Hi,

Would you have any suggestions on how to implement a last-chance workaround
for this issue on the server JVM?


Ivan Pavlukhin wrote
> Hi,
> 
> I suspect a following here. Some node treats itself as a MVCC
> coordinator and creates a new RecoveryBallotBox when each client node
> leaves. Some (may be all) other nodes think that MVCC is disabled and
> do not send a vote (assumed for aforementioned ballot box) to MVCC
> coordinator. Consequently a memory leak.
> 
> A following could be done:
> 1. Figure out why some node treats itself MVCC coordinator and others
> think that MVCC is disabled.
> 2. Try to introduce some defensive matters in Ignite code to protect
> from the leak in a long running cluster.
> 
> As a last chance workaround I can suggest writing custom code, which
> cleans recoveryBallotBoxes map from time to time (most likely using
> reflection).
> 
> пн, 11 нояб. 2019 г. в 08:53, mvkarp <

> liquid_ninja2k@

> >:
>>
>> We have frequently stopping and starting clients in short lived client
>> JVM
>> processes as required for our purposes, this seems to lead to a huge
>> bunch
>> of PME (but no rebalancing) and topology changes (topVer=300,000+)
>>
>> Still can not figure out why this map won't clear (there are no
>> exceptions
>> or err at all in the entire log)
>>
>>
>>
>> --
>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
> 
> 
> 
> -- 
> Best regards,
> Ivan Pavlukhin





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

2019-11-10 Thread Ivan Pavlukhin
Hi,

I suspect the following here. Some node treats itself as the MVCC
coordinator and creates a new RecoveryBallotBox each time a client node
leaves. Some (maybe all) other nodes think that MVCC is disabled and do not
send a vote (expected for the aforementioned ballot box) to the MVCC
coordinator. Consequently, a memory leak.

The following could be done:
1. Figure out why some node treats itself as the MVCC coordinator while
others think that MVCC is disabled.
2. Try to introduce some defensive measures into the Ignite code to protect
against the leak in a long-running cluster.

As a last-chance workaround I can suggest writing custom code that cleans
the recoveryBallotBoxes map from time to time (most likely using
reflection).

пн, 11 нояб. 2019 г. в 08:53, mvkarp :
>
> We have frequently stopping and starting clients in short lived client JVM
> processes as required for our purposes, this seems to lead to a huge bunch
> of PME (but no rebalancing) and topology changes (topVer=300,000+)
>
> Still can not figure out why this map won't clear (there are no exceptions
> or err at all in the entire log)
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/



-- 
Best regards,
Ivan Pavlukhin


Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

2019-11-10 Thread mvkarp
We frequently stop and start clients in short-lived client JVM processes as
required for our purposes; this seems to lead to a huge number of PMEs (but
no rebalancing) and topology changes (topVer=300,000+).

I still cannot figure out why this map won't clear (there are no exceptions
or errors at all in the entire log).



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

2019-11-10 Thread mvkarp
Hi,

There are no more exceptions or errors in the logs, only hundreds of
thousands of entries like the ones below; heap usage is still increasing
steeply and the leak is still present.

[13:46:17,632][INFO][disco-event-worker-#102][GridDiscoveryManager] Topology
snapshot [ver=366003, locNode=6a9db3c2, servers=2, clients=17, state=ACTIVE,
CPUs=64, offheap=960.0GB, heap=46.0GB]
[13:46:17,632][INFO][disco-event-worker-#102][GridDiscoveryManager]   ^--
Baseline [id=0, size=2, online=0, offline=2]
[13:46:17,683][INFO][exchange-worker-#103][time] Started exchange init
[topVer=AffinityTopologyVersion [topVer=366003, minorTopVer=0],
mvccCrd=MvccCoordinator [nodeId=99624746-b624-49d6-9e36-bb6d648e9c3b,
crdVer=1571956920778, topVer=AffinityTopologyVersion [topVer=315751,
minorTopVer=0]], mvccCrdChange=false, crd=false, evt=NODE_LEFT,
evtNode=824dca07-a847-4fd7-81a5-ac0aa8644b26, customEvt=null,
allowMerge=true]
[13:46:17,685][INFO][exchange-worker-#103][GridDhtPartitionsExchangeFuture]
Finish exchange future [startVer=AffinityTopologyVersion [topVer=366003,
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=366003,
minorTopVer=0], err=null]
[13:46:17,708][INFO][exchange-worker-#103][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=6a9db3c2-08df-4bc2-8a26-13df50b86207,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=366003, minorTopVer=0], evt=NODE_LEFT, evtNode=TcpDiscoveryNode
[id=824dca07-a847-4fd7-81a5-ac0aa8644b26, addrs=[10.16.1.47, 127.0.0.1],
sockAddrs=[/127.0.0.1:0, x.com.au/10.16.1.47:0], discPort=0,
order=365983, intOrder=183032, lastExchangeTime=1573393534700, loc=false,
ver=2.7.5#20190603-sha1:be4f2a15, isClient=true], done=true],
topVer=AffinityTopologyVersion [topVer=366003, minorTopVer=0],
durationFromInit=21]
[13:46:17,708][INFO][exchange-worker-#103][time] Finished exchange init
[topVer=AffinityTopologyVersion [topVer=366003, minorTopVer=0], crd=false]
[13:46:17,770][INFO][exchange-worker-#103][GridCachePartitionExchangeManager]
Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
[topVer=366003, minorTopVer=0], force=false, evt=NODE_LEFT,
node=824dca07-a847-4fd7-81a5-ac0aa8644b26]
[13:46:18,620][INFO][disco-event-worker-#102][GridDiscoveryManager] Added
new node to topology: TcpDiscoveryNode
[id=1115b6b7-7caf-4737-9c61-930e193468f6, addrs=[10.16.1.43, 127.0.0.1],
sockAddrs=[/127.0.0.1:0, x/10.16.1.43:0], discPort=0, order=366004,
intOrder=183041, lastExchangeTime=1573393578569, loc=false,
ver=2.7.5#20190603-sha1:be4f2a15, isClient=true]


*Lots of these warnings:*
[13:49:04,673][WARNING][jvm-pause-detector-worker][IgniteKernal] Possible
too long JVM pause: 798 milliseconds.

*and sometimes this:*
[13:49:04,863][INFO][exchange-worker-#103][GridCachePartitionExchangeManager]
Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
[topVer=366038, minorTopVer=0], force=false, evt=NODE_JOINED,
node=7b25d879-b674-4e7d-b5f7-d1c6619e0091]
[13:49:05,677][INFO][grid-nio-worker-tcp-comm-0-#72][TcpCommunicationSpi]
Accepted incoming communication connection [locAddr=/10.16.1.47:47101,
rmtAddr=/10.16.1.48:50550]
[13:49:05,706][INFO][tcp-disco-srvr-#3][TcpDiscoverySpi] TCP discovery
accepted incoming connection [rmtAddr=/10.16.1.47, rmtPort=53836]
[13:49:05,706][INFO][tcp-disco-srvr-#3][TcpDiscoverySpi] TCP discovery
spawning a new thread for connection [rmtAddr=/10.16.1.47, rmtPort=53836]
[13:49:05,707][INFO][tcp-disco-sock-reader-#25013][TcpDiscoverySpi] Started
serving remote node connection [rmtAddr=/10.16.1.47:53836, rmtPort=53836]



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

2019-11-08 Thread Ilya Kasnacheev
Hello!

You seem to have an awful lot of errors related to connectivity problems
between nodes, such as:

Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to
connect to address [addr=ult-s2-svr1.dataprocessors.com.au/10.16.1.47:47106,
err=Connection refused]

Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to
connect to address [addr=ult-s2-svr3/10.16.1.43:47102, err=Remote node ID
is not as expected [expected=d97b5e5d-fb46-4b5b-91ad-79a69fce738f,
rcvd=1dc23ebb-0997-4858-9433-d5d30c9b643e]]

I recommend figuring those errors out: it is possible that you have nodes in
your cluster that are not reachable via communication from the server
node(s) but are still present in discovery. Such nodes will cause all kinds
of problems in the cluster.

Regards,
-- 
Ilya Kasnacheev


пт, 8 нояб. 2019 г. в 17:12, mvkarp :

> Ok, there are no exceptions in the ignite logs for the client JVMs but I've
> attached the log for one of the problem servers. Looks like a few errors
> but
> I am unable to determine the root cause.
> ignite-46073e05.zip
> <
> http://apache-ignite-users.70518.x6.nabble.com/file/t2658/ignite-46073e05.zip>
>
>
>
> ilya.kasnacheev wrote
> > Hello!
> >
> > This is very strange, since we expect this collection to be cleared on
> > exchange.
> >
> > Please make sure you don't have any stray exceptions during exchange in
> > your logs.
> >
> > Regards,
> > --
> > Ilya Kasnacheev
> >
> >
> > пт, 8 нояб. 2019 г. в 12:49, mvkarp <
>
> > liquid_ninja2k@
>
> > >:
> >
> >> Hi,
> >>


Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

2019-11-08 Thread mvkarp
OK, there are no exceptions in the Ignite logs for the client JVMs, but I've
attached the log for one of the problem servers. There are a few errors, but
I am unable to determine the root cause.
ignite-46073e05.zip


ilya.kasnacheev wrote
> Hello!
> 
> This is very strange, since we expect this collection to be cleared on
> exchange.
> 
> Please make sure you don't have any stray exceptions during exchange in
> your logs.
> 
> Regards,
> -- 
> Ilya Kasnacheev





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

2019-11-08 Thread Ilya Kasnacheev
Hello!

This is very strange, since we expect this collection to be cleared on
exchange.

Please make sure you don't have any stray exceptions during exchange in
your logs.

Regards,
-- 
Ilya Kasnacheev


Fri, Nov 8, 2019 at 12:49, mvkarp wrote:

> Hi,
>
> This is not the case. Always only a maximum total of two server nodes. One
> JVM server on each. However there are many client JVMs that start and stop
> caches with setClientMode=true. It looks like one of the server instances
> is
> immune to the issue, whilst the most newly created one gets the leak, with
> a
> lot of partition exchanges happening for EVT_NODE_JOINED and EVT_NODE_LEFT
> (one of the nodes don't get any of these partition exchanges, however the
> exact server node that gets this can alternate so its not linked to one
> node
> in particular but seems to be linked to the most newly launched server).


Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

2019-11-08 Thread mvkarp
Hi, 

This is not the case. There is always a maximum of two server nodes in total,
one server JVM on each. However, there are many client JVMs that start and
stop caches with setClientMode=true. One of the server instances appears
immune to the issue, while the most recently created one gets the leak, with
a lot of partition exchanges happening for EVT_NODE_JOINED and EVT_NODE_LEFT
(one of the nodes doesn't get any of these partition exchanges; the exact
server node affected can alternate, so it's not linked to one node in
particular but seems to be linked to the most recently launched server).
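
The client churn described above can be sketched as a minimal reproducer
configuration, assuming ignite-core 2.7.x on the classpath and a reachable
server node; the cache name, instance names, and loop count below are
illustrative, not taken from the actual deployment:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class ClientChurn {
    public static void main(String[] args) {
        // Each iteration simulates one short-lived client JVM joining and
        // leaving the topology, which triggers EVT_NODE_JOINED/EVT_NODE_LEFT
        // partition exchanges on the server nodes.
        for (int i = 0; i < 100; i++) {
            IgniteConfiguration cfg = new IgniteConfiguration()
                .setIgniteInstanceName("client-" + i) // illustrative name
                .setClientMode(true);                 // same setClientMode=true as in the report

            try (Ignite client = Ignition.start(cfg)) {
                IgniteCache<Integer, Integer> cache =
                    client.getOrCreateCache("some-cache"); // illustrative cache
                cache.put(i, i);
            } // close() makes the client node leave the topology
        }
    }
}
```

Running something like this against a two-server cluster while watching the
servers' heaps would show whether recoveryBallotBoxes growth tracks client
joins/leaves.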


ilya.kasnacheev wrote
> Hello!
> 
> How many nodes do you have in your cluster?
> 
> From the dump it seems that the number of server nodes is in thousands. Is
> this the case?
> 
> Regards,
> -- 
> Ilya Kasnacheev







Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

2019-11-08 Thread Ilya Kasnacheev
Hello!

How many nodes do you have in your cluster?

From the dump it seems that the number of server nodes is in the thousands. Is
this the case?

Regards,
-- 
Ilya Kasnacheev


Fri, Nov 8, 2019 at 10:26, mvkarp wrote:

> Let me know if these help or if you need anything more specific.
> recoveryBallotBoxes.zip
> <http://apache-ignite-users.70518.x6.nabble.com/file/t2658/recoveryBallotBoxes.zip>
>


Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

2019-11-07 Thread mvkarp
Let me know if these help or if you need anything more specific.
recoveryBallotBoxes.zip


ilya.kasnacheev wrote
> Hello!
> 
> Can you please check whether there are any especially large objects inside
> recoveryBallotBoxes object graph? Sorting by retained heap may help in
> determining this. It would be nice to know what is the type histogram of
> what's inside recoveryBallotBoxes and where the bulk of heap usage
> resides.
> 
> Regards,
> -- 
> Ilya Kasnacheev







Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

2019-11-07 Thread Ilya Kasnacheev
Hello!

Can you please check whether there are any especially large objects inside
recoveryBallotBoxes object graph? Sorting by retained heap may help in
determining this. It would be nice to know what is the type histogram of
what's inside recoveryBallotBoxes and where the bulk of heap usage resides.

Regards,
-- 
Ilya Kasnacheev


Thu, Nov 7, 2019 at 06:23, mvkarp wrote:

> I've attached another set of screenshots, might be more clear.
> heap.zip
> <http://apache-ignite-users.70518.x6.nabble.com/file/t2658/heap.zip>
>


Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

2019-11-06 Thread mvkarp
I've attached another set of screenshots, might be more clear.
heap.zip


mvkarp wrote
> I've attached some extra screenshots showing what is inside these records
> and path to GC roots. heap.zip







Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

2019-11-06 Thread mvkarp
I've created a ticket, though I'm not sure how to go about creating a
reproducer for this: https://issues.apache.org/jira/browse/IGNITE-12350

I've attached some extra screenshots showing what is inside these records
and the path to GC roots: heap.zip







Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

2019-11-06 Thread Ilya Kasnacheev
Hello!

Can you please show contents of some of these records, as well as their
referential path to MvccProcessorImpl?

Regards,
-- 
Ilya Kasnacheev


Fri, Nov 1, 2019 at 03:25, mvkarp wrote:

> <http://apache-ignite-users.70518.x6.nabble.com/file/t2658/heapanalysisMAT.jpg>
>
> I've attached an Eclipse MAT heap analysis. As you can see MVCC is disabled
> (there are no TRANSACTIONAL_SNAPSHOT caches in the cluster)


Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

2019-11-01 Thread Ivan Pavlukhin
Hi,

Sounds like a bug. Would be great to have a ticket with a reproducer.

Fri, Nov 1, 2019 at 03:25, mvkarp wrote:
>
> 
>
> I've attached an Eclipse MAT heap analysis. As you can see MVCC is disabled
> (there are no TRANSACTIONAL_SNAPSHOT caches in the cluster)
>



-- 
Best regards,
Ivan Pavlukhin


Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

2019-10-31 Thread mvkarp
<http://apache-ignite-users.70518.x6.nabble.com/file/t2658/heapanalysisMAT.jpg>

I've attached an Eclipse MAT heap analysis. As you can see, MVCC is disabled
(there are no TRANSACTIONAL_SNAPSHOT caches in the cluster).





Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

2019-10-30 Thread Denis Magda
Please try to capture heap dumps, which will show where the leak is. Share
the dumps with us if the leak is not caused by the application code.
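
One way to capture such a dump programmatically from inside the affected
JVM, using only the JDK's standard HotSpot diagnostic MXBean (equivalent to
`jmap -dump:live`); this is a generic sketch, and the output file name is
illustrative:

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;

import com.sun.management.HotSpotDiagnosticMXBean;

public class HeapDump {
    /**
     * Writes a .hprof heap dump to the given path. With liveOnly=true the JVM
     * runs a GC first and dumps only reachable objects, which keeps the file
     * focused on actual leaks. Fails if the file already exists.
     */
    public static void dump(String path, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
            ManagementFactory.getPlatformMBeanServer(),
            "com.sun.management:type=HotSpotDiagnostic",
            HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(path, liveOnly);
    }

    public static void main(String[] args) throws IOException {
        // Illustrative output file; open the result in Eclipse MAT.
        dump("leaky-node.hprof", true);
    }
}
```

The resulting .hprof file can be opened directly in Eclipse MAT for the
retained-heap and dominator-tree analysis discussed in this thread.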

-
Denis


On Tue, Oct 29, 2019 at 11:59 PM mvkarp  wrote:

> Hi team, any update/clarification on this? It is quite critical bug in
> production environment as it is taking 100% CPU usage and leads to OOM /
> crashes.
>
> As more information, this is also affecting the ignite server JVMs causing
> them to crash, and it seems to be assigning a mvcc coordinator node
> regardless of not having a single TRANSACTIONAL_SNAPSHOT atomicity cache.
>
> If MVCC is disabled, should there be no MVCC coordinator node in the first
> place? Nor should there be anything being populated in the Mvcc classes
> (otherwise they never get processed and this leads to a memory leak).
>
> Furthermore, is there a way to disable MVCC completely?


Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

2019-10-30 Thread Ilya Kasnacheev
Hello!

Since you are the first to report such a problem, I recommend you try to make
a reproducer project and/or file an issue against Ignite JIRA. Then somebody
will check it.

Regards,
-- 
Ilya Kasnacheev


Wed, Oct 30, 2019 at 06:59, mvkarp wrote:

> Hi team, any update/clarification on this? It is quite critical bug in
> production environment as it is taking 100% CPU usage and leads to OOM /
> crashes.
>
> As more information, this is also affecting the ignite server JVMs causing
> them to crash, and it seems to be assigning a mvcc coordinator node
> regardless of not having a single TRANSACTIONAL_SNAPSHOT atomicity cache.
>
> If MVCC is disabled, should there be no MVCC coordinator node in the first
> place? Nor should there be anything being populated in the Mvcc classes
> (otherwise they never get processed and this leads to a memory leak).
>
> Furthermore, is there a way to disable MVCC completely?


Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

2019-10-29 Thread mvkarp
Hi team, any update/clarification on this? It is quite a critical bug in our
production environment, as it causes 100% CPU usage and leads to OOM /
crashes.

As additional information, this is also affecting the Ignite server JVMs,
causing them to crash, and an MVCC coordinator node seems to be assigned
despite there not being a single cache with TRANSACTIONAL_SNAPSHOT atomicity.

If MVCC is disabled, should there be no MVCC coordinator node in the first
place? Nor should there be anything populated in the Mvcc classes (otherwise
it never gets processed, which leads to a memory leak).

Furthermore, is there a way to disable MVCC completely?





recoveryBallotBoxes in MvccProcessorImpl memory leak?

2019-10-25 Thread mvkarp
Hi, I am on Ignite 2.7.5 with MVCC disabled for all caches (every cache's
CacheAtomicityMode is ATOMIC).

After analysing a few heaps on the CLIENT node JVM using Eclipse MAT, there
is a 'recoveryBallotBoxes' ConcurrentHashMap in MvccProcessorImpl that grows
on the heap at a constant rate and cannot be garbage collected. After 10
hours the HashMap takes 600MB of the heap, and the only solution thus far
has been to restart the JVM.

Would any of you know what might cause this leak, what recoveryBallotBoxes
is used for when MVCC is disabled, and how to prevent it from growing
permanently?

https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/mvcc/MvccProcessorImpl.java
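
The setup described above can be sketched as the following configuration
fragment, assuming ignite-core 2.7.x on the classpath; the cache name is
illustrative. Since TRANSACTIONAL_SNAPSHOT is the only atomicity mode that
activates MVCC, a purely ATOMIC configuration like this one should leave
MVCC inactive:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class AtomicCacheSetup {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            // Every cache in the cluster uses ATOMIC mode; no cache is
            // configured with CacheAtomicityMode.TRANSACTIONAL_SNAPSHOT,
            // so no MVCC state is expected to accumulate.
            CacheConfiguration<Integer, String> ccfg =
                new CacheConfiguration<Integer, String>("example-cache")
                    .setAtomicityMode(CacheAtomicityMode.ATOMIC);

            ignite.getOrCreateCache(ccfg).put(1, "value");
        }
    }
}
```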


