Re: "Adding entry to partition that is concurrently evicted" error

2020-02-03 Thread Abhishek Gupta (BLOOMBERG/ 919 3RD A)
Thanks Andrei. Looking at my exception (see below), it seems to be related 
to https://issues.apache.org/jira/browse/IGNITE-11620, in that it occurred while 
expiration was going on. 

1. As a workaround, would it be valid to increase my TTL to reduce the 
likelihood of this occurring? 
2. My worry about using "NoOpFailureHandler" is that the error would still have 
occurred, and it might leave the node in a bad state that is just as bad as, 
or worse than, killing the node. 

If you can confirm 1. is a valid line of defense (albeit not air-tight), that 
would be great.
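
For reference, a minimal sketch of what raising the TTL (point 1) could look like in Java. The cache name is taken from grp=mainCache in the log and may not match the real cache name; the key/value types and the 24-hour value are placeholders, since the actual settings are not shown in this thread. Whether a longer TTL really lowers the chance of hitting this race is not confirmed here.

```java
import java.util.concurrent.TimeUnit;
import javax.cache.expiry.CreatedExpiryPolicy;
import javax.cache.expiry.Duration;
import org.apache.ignite.configuration.CacheConfiguration;

public class TtlConfigSketch {
    /** Builds a cache configuration with a longer creation-based TTL. */
    public static CacheConfiguration<Long, String> mainCacheConfig() {
        // Assumption: cache name and <Long, String> types are illustrative only.
        CacheConfiguration<Long, String> cfg = new CacheConfiguration<>("mainCache");

        // Entries expire 24 hours after creation (placeholder value).
        // A longer TTL means fewer entries expire during any given rebalance,
        // but it does not remove the underlying race.
        cfg.setExpiryPolicyFactory(
            CreatedExpiryPolicy.factoryOf(new Duration(TimeUnit.HOURS, 24)));

        return cfg;
    }
}
```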

Thanks,
Abhishek

P.S. My exception is below. Note that it occurs in 'expire()', with a stack 
trace similar to the one in IGNITE-11620.


[ERROR] ttl-cleanup-worker-#159 - Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=class o.a.i.i.processors.cache.distributed.dht.topology.GridDhtInvalidPartitionException [part=1013, msg=Adding entry to partition that is concurrently evicted [grp=mainCache, part=1013, shouldBeMoving=, belongs=false, topVer=AffinityTopologyVersion [topVer=1978, minorTopVer=1], curTopVer=AffinityTopologyVersion [topVer=1978, minorTopVer=1]
org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtInvalidPartitionException: Adding entry to partition that is concurrently evicted [grp=mainCache, part=1013, shouldBeMoving=, belongs=false, topVer=AffinityTopologyVersion [topVer=1978, minorTopVer=1], curTopVer=AffinityTopologyVersion [topVer=1978, minorTopVer=1]]
    at org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.localPartition0(GridDhtPartitionTopologyImpl.java:950) ~[ignite-core-2.7.5-0-2.jar:2.7.5]
    at org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.localPartition(GridDhtPartitionTopologyImpl.java:825) ~[ignite-core-2.7.5-0-2.jar:2.7.5]
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridCachePartitionedConcurrentMap.localPartition(GridCachePartitionedConcurrentMap.java:70) ~[ignite-core-2.7.5-0-2.jar:2.7.5]
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridCachePartitionedConcurrentMap.putEntryIfObsoleteOrAbsent(GridCachePartitionedConcurrentMap.java:89) ~[ignite-core-2.7.5-0-2.jar:2.7.5]
    at org.apache.ignite.internal.processors.cache.GridCacheAdapter.entryEx(GridCacheAdapter.java:1008) ~[ignite-core-2.7.5-0-2.jar:2.7.5]
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.entryEx(GridDhtCacheAdapter.java:544) ~[ignite-core-2.7.5-0-2.jar:2.7.5]
    at org.apache.ignite.internal.processors.cache.GridCacheAdapter.entryEx(GridCacheAdapter.java:999) ~[ignite-core-2.7.5-0-2.jar:2.7.5]
    at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.expireInternal(IgniteCacheOffheapManagerImpl.java:1403) ~[ignite-core-2.7.5-0-2.jar:2.7.5]
    at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.expire(IgniteCacheOffheapManagerImpl.java:1347) ~[ignite-core-2.7.5-0-2.jar:2.7.5]
    at org.apache.ignite.internal.processors.cache.GridCacheTtlManager.expire(GridCacheTtlManager.java:207) ~[ignite-core-2.7.5-0-2.jar:2.7.5]
    at org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager$CleanupWorker.body(GridCacheSharedTtlCleanupManager.java:139) [ignite-core-2.7.5-0-2.jar:2.7.5]
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) [ignite-core-2.7.5-0-2.jar:2.7.5]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]

From: user@ignite.apache.org At: 01/31/20 05:11:57 To: user@ignite.apache.org
Subject: Re: "Adding entry to partition that is concurrently evicted" error

  
Hi,

The current problem should be solved in ignite-2.8. I am not sure why
this fix isn't a part of ignite-2.7.6.

https://issues.apache.org/jira/browse/IGNITE-11127

Your cluster was stopped by the failure handler.

https://apacheignite.readme.io/docs/critical-failures-handling#section-failure-handling

I am not sure about possible workarounds here (you can probably set
the NoOpFailureHandler). You can also try starting a thread on the
developer mailing list:

http://apache-ignite-developers.2346864.n4.nabble.com/Apache-Ignite-2-7-release-td34076i40.html

BR,
Andrei

On 1/29/2020 1:58 AM, Abhishek Gupta (BLOOMBERG/ 919 3RD A) wrote:

Hello! I've got a 6-node Ignite 2.7.5 grid. I had this strange issue where
multiple nodes hit the following exception - [ERROR] [sys-stripe-53-#54]
GridCacheIoManager - Failed to process message
[senderI

Re: "Adding entry to partition that is concurrently evicted" error

2020-01-31 Thread Andrei Aleksandrov

Hi,

The current problem should be solved in ignite-2.8. I am not sure why this 
fix isn't a part of ignite-2.7.6.

https://issues.apache.org/jira/browse/IGNITE-11127

Your cluster was stopped by the failure handler.

https://apacheignite.readme.io/docs/critical-failures-handling#section-failure-handling

I am not sure about possible workarounds here (you can probably set the 
NoOpFailureHandler). You can also try starting a thread on the developer 
mailing list:


http://apache-ignite-developers.2346864.n4.nabble.com/Apache-Ignite-2-7-release-td34076i40.html
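
As a rough illustration of the NoOpFailureHandler suggestion, the sketch below shows one way it could be set programmatically. The class and method names wrapping the configuration are hypothetical. Note that this only changes how the node reacts to the failure: the ttl-cleanup-worker would presumably still terminate on the same exception, so this is not a fix for the underlying problem.

```java
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.failure.NoOpFailureHandler;

public class FailureHandlerSketch {
    /** Builds a node configuration that ignores critical failures. */
    public static IgniteConfiguration configure() {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Replaces the handler seen in the log (StopNodeOrHaltFailureHandler).
        // With NoOpFailureHandler the JVM is not halted on a critical failure,
        // but the failed system worker is not restarted either.
        cfg.setFailureHandler(new NoOpFailureHandler());

        return cfg;
    }
}
```

The resulting configuration would then be passed to Ignition.start(cfg) in place of the existing one.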

BR,
Andrei

On 1/29/2020 1:58 AM, Abhishek Gupta (BLOOMBERG/ 919 3RD A) wrote:
Hello! I've got a 6-node Ignite 2.7.5 grid. I had this strange issue 
where multiple nodes hit the following exception -

[ERROR] [sys-stripe-53-#54] GridCacheIoManager - Failed to process message [senderId=f4a736b6-cfff-4548-a8b4-358d54d19ac6, messageType=class o.a.i.i.processors.cache.distributed.near.GridNearGetRequest]
org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtInvalidPartitionException: Adding entry to partition that is concurrently evicted [grp=mainCache, part=733, shouldBeMoving=, belongs=false, topVer=AffinityTopologyVersion [topVer=1978, minorTopVer=1], curTopVer=AffinityTopologyVersion [topVer=1978, minorTopVer=1]]

and then died after

2020-01-27 13:30:19.849 [ERROR] [ttl-cleanup-worker-#159] - JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=class o.a.i.i.processors.cache.distributed.dht.topology.GridDhtInvalidPartitionException [part=1013, msg=Adding entry to partition that is concurrently evicted [grp=mainCache, part=1013, shouldBeMoving=, belongs=false, topVer=AffinityTopologyVersion [topVer=1978, minorTopVer=1], curTopVer=AffinityTopologyVersion [topVer=1978, minorTopVer=1]

The sequence of events was simply the following -

One of the nodes (let's call it node 1) was down for 2.5 hours and 
restarted. After a configured delay of 20 mins, it started to 
rebalance from the other 5 nodes. No other nodes joined or left in 
this period. 40 minutes into the rebalance, the above errors started 
showing up on the other nodes and they just bounced, and therefore 
there was data loss.

I found a few links related to this, but nothing that explained the 
root cause or what my workaround could be -

* http://apache-ignite-users.70518.x6.nabble.com/Adding-entry-to-partition-that-is-concurrently-evicted-td24782.html#a24786
* https://issues.apache.org/jira/browse/IGNITE-9803
* https://issues.apache.org/jira/browse/IGNITE-11620

Thanks,
Abhishek