[jira] [Updated] (IGNITE-11148) PartitionCountersNeighborcastFuture blocks partition map exchange

2019-03-10 Thread Vladimir Ozerov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Ozerov updated IGNITE-11148:
-
Labels: Faillover Hanging Transactions  (was: Faillover Hanging 
Transactions mvcc_stabilization_stage_1)

>  PartitionCountersNeighborcastFuture blocks partition map exchange
> --
>
> Key: IGNITE-11148
> URL: https://issues.apache.org/jira/browse/IGNITE-11148
> Project: Ignite
> Issue Type: Bug
> Components: mvcc
> Reporter: Stepachev Maksim
> Assignee: Ivan Pavlukhin
> Priority: Major
> Labels: Faillover, Hanging, Transactions
> Fix For: 2.8
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> We investigated an "execution timeout" failure in Continuous Query 2 for
> *CacheContinuousQueryAsyncFailoverMvccTxSelfTest.testMultiThreadedFailover*.
> The investigation showed an MVCC problem: the test blocks at *getAndPut*
> because at some point the following incorrect behavior occurred:
> {code:java}
> [16:02:56] :     [Step 4/5] [2019-01-30 13:02:56,923][INFO 
> ][sys-stripe-6-#9%continuous.CacheContinuousQueryAsyncFailoverMvccTxSelfTest0%][IgniteTxManager]
>  Finishing prepared transaction [commit=false, tx=GridDhtTxRemote 
> [nearNodeId=6a8546ab-f09d-4b0c-91c1-5fcf5b94, 
> rmtFutId=95bfade9861-4f5107b4-70e5-44ef-96d4-1b18cd6b57e4, 
> nearXidVer=GridCacheVersion [topVer=16078, order=1548853376060, 
> nodeOrder=5], storeWriteThrough=false, super=GridDistributedTxRemoteAdapter 
> [explicitVers=null, started=true, commitAllowed=0, 
> txState=IgniteTxRemoteStateImpl [readMap=EmptyMap {}, 
> writeMap=ConcurrentLinkedHashMap {}], txLbl=null, super=IgniteTxAdapter 
> [xidVer=GridCacheVersion [topVer=16078, order=1548853376061, 
> nodeOrder=3], writeVer=GridCacheVersion [topVer=16078, 
> order=1548853376062, nodeOrder=3], implicit=false, loc=false, threadId=21, 
> startTime=1548853376731, nodeId=3e6881c0-1e96-42a9-8bd1-55d344c2, 
> startVer=GridCacheVersion [topVer=16078, order=1548853376060, 
> nodeOrder=1], endVer=null, isolation=REPEATABLE_READ, 
> concurrency=PESSIMISTIC, timeout=0, sysInvalidate=false, sys=false, plc=2, 
> commitVer=GridCacheVersion [topVer=16078, order=1548853376061, 
> nodeOrder=3], finalizing=NONE, invalidParts=null, state=PREPARED, 
> timedOut=false, topVer=AffinityTopologyVersion [topVer=7, minorTopVer=0], 
> mvccSnapshot=MvccSnapshotWithoutTxs [crdVer=1548853371043, cntr=207, 
> cleanupVer=204, opCntr=0], skipCompletedVers=false, parentTx=null, 
> duration=191ms, onePhaseCommit=false{code}
> and after that:
> {code:java}
> [16:02:56] :     [Step 4/5] [2019-01-30 13:02:56,931][INFO 
> ][sys-stripe-6-#9%continuous.CacheContinuousQueryAsyncFailoverMvccTxSelfTest0%][recovery]
>  Starting delivery partition countres to remote nodes [txId=GridCacheVersion 
> [topVer=16078, order=1548853376060, nodeOrder=5], 
> futId=82cfade9861-4f5107b4-70e5-44ef-96d4-1b18cd6b57e4{code}
> _IMPORTANT: we are dealing with PartitionCountersNeighborcastFuture, which
> *doesn't provide status information* (monitoring)._
> One possible location of the problem:
> PartitionCountersNeighborcastFuture.onNodeLeft
> As a result, we have a transaction in *state=PREPARED* with
> *completionTime=0* that never completes:
>  
> {code:java}
> [16:03:16]W: [org.apache.ignite:ignite-indexing] [2019-01-30 
> 13:03:16,776][WARN 
> ][exchange-worker-#40%continuous.CacheContinuousQueryAsyncFailoverMvccTxSelfTest0%][diagnostic]
>  Failed to wait for partition release future [topVer=AffinityTopologyVersion 
> [topVer=8, minorTopVer=0], node=18519119-475a-448f-8c02-ff1f6490]
> LocalTxReleaseFuture [
>  topVer=AffinityTopologyVersion [topVer=8, minorTopVer=0], 
>  futures=[
>  TxFinishFuture [ 
>  tx=GridDhtTxRemote [
>  nearNodeId=6a8546ab-f09d-4b0c-91c1-5fcf5b94, 
> rmtFutId=95bfade9861-4f5107b4-70e5-44ef-96d4-1b18cd6b57e4, 
>  nearXidVer=GridCacheVersion [topVer=16078, order=1548853376060, 
> nodeOrder=5], storeWriteThrough=false, super=GridDistributedTxRemoteAdapter 
> [explicitVers=null, started=true, commitAllowed=0, 
> txState=IgniteTxRemoteStateImpl [readMap=EmptyMap {}, 
> writeMap=ConcurrentLinkedHashMap {}], txLbl=null, super=IgniteTxAdapter [
>  xidVer=GridCacheVersion [topVer=16078, order=1548853376061, 
> nodeOrder=3], 
>  writeVer=GridCacheVersion [topVer=16078, order=1548853376062, 
> nodeOrder=3], implicit=false, loc=false, threadId=21, 
> startTime=1548853376731, nodeId=3e6881c0-1e96-42a9-8bd1-55d344c2, 
> startVer=GridCacheVersion [topVer=16078, order=1548853376060, 
> nodeOrder=1], endVer=null, isolation=REPEATABLE_READ, 
> concurrency=PESSIMISTIC, timeout=0, sysInvalidate=false, sys=false, plc=2, 
> commitVer=GridCacheVersion [topVer=16078, o

[jira] [Updated] (IGNITE-11148) PartitionCountersNeighborcastFuture blocks partition map exchange

2019-01-30 Thread Andrew Mashenkov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Mashenkov updated IGNITE-11148:
--
Labels: Faillover Hanging Transactions mvcc_stabilization_stage_1  (was: 
mvcc_stabilization_stage_1)

>  PartitionCountersNeighborcastFuture blocks partition map exchange
> --
>
> Key: IGNITE-11148
> URL: https://issues.apache.org/jira/browse/IGNITE-11148
> Project: Ignite
> Issue Type: Bug
> Components: mvcc
> Reporter: Stepachev Maksim
> Priority: Major
> Labels: Faillover, Hanging, Transactions, mvcc_stabilization_stage_1
> Fix For: 2.8
>
>

[jira] [Updated] (IGNITE-11148) PartitionCountersNeighborcastFuture blocks partition map exchange

2019-01-30 Thread Andrew Mashenkov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Mashenkov updated IGNITE-11148:
--
Fix Version/s: 2.8

>  PartitionCountersNeighborcastFuture blocks partition map exchange
> --
>
> Key: IGNITE-11148
> URL: https://issues.apache.org/jira/browse/IGNITE-11148
> Project: Ignite
> Issue Type: Bug
> Components: mvcc
> Reporter: Stepachev Maksim
> Priority: Major
> Labels: mvcc_stabilization_stage_1
> Fix For: 2.8
>
>

[jira] [Updated] (IGNITE-11148) PartitionCountersNeighborcastFuture blocks partition map exchange

2019-01-30 Thread Andrew Mashenkov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Mashenkov updated IGNITE-11148:
--
Ignite Flags:   (was: Docs Required)

>  PartitionCountersNeighborcastFuture blocks partition map exchange
> --
>
> Key: IGNITE-11148
> URL: https://issues.apache.org/jira/browse/IGNITE-11148
> Project: Ignite
> Issue Type: Bug
> Components: mvcc
> Reporter: Stepachev Maksim
> Priority: Major
>

[jira] [Updated] (IGNITE-11148) PartitionCountersNeighborcastFuture blocks partition map exchange

2019-01-30 Thread Andrew Mashenkov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Mashenkov updated IGNITE-11148:
--
Labels: mvcc_stabilization_stage_1  (was: )

>  PartitionCountersNeighborcastFuture blocks partition map exchange
> --
>
> Key: IGNITE-11148
> URL: https://issues.apache.org/jira/browse/IGNITE-11148
> Project: Ignite
> Issue Type: Bug
> Components: mvcc
> Reporter: Stepachev Maksim
> Priority: Major
> Labels: mvcc_stabilization_stage_1
>

[jira] [Updated] (IGNITE-11148) PartitionCountersNeighborcastFuture blocks partition map exchange

2019-01-30 Thread Stepachev Maksim (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stepachev Maksim updated IGNITE-11148:
--
Description: 
We investigated an "execution timeout" failure in Continuous Query 2 for
*CacheContinuousQueryAsyncFailoverMvccTxSelfTest.testMultiThreadedFailover*.
The investigation showed an MVCC problem: the test blocks at *getAndPut*
because at some point the following incorrect behavior occurred:
{code:java}
[16:02:56] :     [Step 4/5] [2019-01-30 13:02:56,923][INFO 
][sys-stripe-6-#9%continuous.CacheContinuousQueryAsyncFailoverMvccTxSelfTest0%][IgniteTxManager]
 Finishing prepared transaction [commit=false, tx=GridDhtTxRemote 
[nearNodeId=6a8546ab-f09d-4b0c-91c1-5fcf5b94, 
rmtFutId=95bfade9861-4f5107b4-70e5-44ef-96d4-1b18cd6b57e4, 
nearXidVer=GridCacheVersion [topVer=16078, order=1548853376060, 
nodeOrder=5], storeWriteThrough=false, super=GridDistributedTxRemoteAdapter 
[explicitVers=null, started=true, commitAllowed=0, 
txState=IgniteTxRemoteStateImpl [readMap=EmptyMap {}, 
writeMap=ConcurrentLinkedHashMap {}], txLbl=null, super=IgniteTxAdapter 
[xidVer=GridCacheVersion [topVer=16078, order=1548853376061, nodeOrder=3], 
writeVer=GridCacheVersion [topVer=16078, order=1548853376062, nodeOrder=3], 
implicit=false, loc=false, threadId=21, startTime=1548853376731, 
nodeId=3e6881c0-1e96-42a9-8bd1-55d344c2, startVer=GridCacheVersion 
[topVer=16078, order=1548853376060, nodeOrder=1], endVer=null, 
isolation=REPEATABLE_READ, concurrency=PESSIMISTIC, timeout=0, 
sysInvalidate=false, sys=false, plc=2, commitVer=GridCacheVersion 
[topVer=16078, order=1548853376061, nodeOrder=3], finalizing=NONE, 
invalidParts=null, state=PREPARED, timedOut=false, 
topVer=AffinityTopologyVersion [topVer=7, minorTopVer=0], 
mvccSnapshot=MvccSnapshotWithoutTxs [crdVer=1548853371043, cntr=207, 
cleanupVer=204, opCntr=0], skipCompletedVers=false, parentTx=null, 
duration=191ms, onePhaseCommit=false{code}
and after that:
{code:java}
[16:02:56] :     [Step 4/5] [2019-01-30 13:02:56,931][INFO 
][sys-stripe-6-#9%continuous.CacheContinuousQueryAsyncFailoverMvccTxSelfTest0%][recovery]
 Starting delivery partition countres to remote nodes [txId=GridCacheVersion 
[topVer=16078, order=1548853376060, nodeOrder=5], 
futId=82cfade9861-4f5107b4-70e5-44ef-96d4-1b18cd6b57e4{code}
_IMPORTANT: we are dealing with PartitionCountersNeighborcastFuture, which
*doesn't provide status information* (monitoring)._

One possible location of the problem:
PartitionCountersNeighborcastFuture.onNodeLeft
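
To illustrate why a node-left handler matters here, below is a minimal sketch of a future that waits for acknowledgements from a set of remote nodes. It is not the Ignite implementation: the class AckCollectingFuture and its methods are hypothetical names, and the sketch only assumes that the real future tracks remote nodes in a similar way. If the node-left path removed the node without re-checking for completion, the future would stay pending forever, which matches the symptom reported in this issue. The toString override shows one way such a future could expose its pending nodes for monitoring.

```java
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a future that waits for acks from remote nodes. */
final class AckCollectingFuture extends CompletableFuture<Void> {
    /** IDs of nodes that have neither acknowledged nor left yet. */
    private final Set<UUID> pending = ConcurrentHashMap.newKeySet();

    AckCollectingFuture(Set<UUID> nodes) {
        pending.addAll(nodes);

        // An empty node set must not leave the future hanging.
        completeIfDone();
    }

    /** Called when a remote node acknowledges the message. */
    void onAck(UUID nodeId) {
        pending.remove(nodeId);

        completeIfDone();
    }

    /** Called when a remote node leaves the topology: it will never ack. */
    void onNodeLeft(UUID nodeId) {
        pending.remove(nodeId);

        // Skipping this check is the kind of omission that would block waiters.
        completeIfDone();
    }

    /** Completes the future once no node is pending; safe to call repeatedly. */
    private void completeIfDone() {
        if (pending.isEmpty())
            complete(null);
    }

    /** Exposes pending nodes so diagnostic dumps show what is being waited for. */
    @Override public String toString() {
        return "AckCollectingFuture [done=" + isDone() + ", pending=" + pending + ']';
    }
}
```

With two nodes, calling onAck for one and onNodeLeft for the other completes the future; before that, toString lists the node that is still awaited.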

As a result, we have a transaction in *state=PREPARED* with
*completionTime=0* that never completes:

 
{code:java}
[16:03:16]W: [org.apache.ignite:ignite-indexing] [2019-01-30 13:03:16,776][WARN 
][exchange-worker-#40%continuous.CacheContinuousQueryAsyncFailoverMvccTxSelfTest0%][diagnostic]
 Failed to wait for partition release future [topVer=AffinityTopologyVersion 
[topVer=8, minorTopVer=0], node=18519119-475a-448f-8c02-ff1f6490]
LocalTxReleaseFuture [
 topVer=AffinityTopologyVersion [topVer=8, minorTopVer=0], 
 futures=[
 TxFinishFuture [ 
 tx=GridDhtTxRemote [
 nearNodeId=6a8546ab-f09d-4b0c-91c1-5fcf5b94, 
rmtFutId=95bfade9861-4f5107b4-70e5-44ef-96d4-1b18cd6b57e4, 
 nearXidVer=GridCacheVersion [topVer=16078, order=1548853376060, 
nodeOrder=5], storeWriteThrough=false, super=GridDistributedTxRemoteAdapter 
[explicitVers=null, started=true, commitAllowed=0, 
txState=IgniteTxRemoteStateImpl [readMap=EmptyMap {}, 
writeMap=ConcurrentLinkedHashMap {}], txLbl=null, super=IgniteTxAdapter [
 xidVer=GridCacheVersion [topVer=16078, order=1548853376061, nodeOrder=3], 
 writeVer=GridCacheVersion [topVer=16078, order=1548853376062, 
nodeOrder=3], implicit=false, loc=false, threadId=21, startTime=1548853376731, 
nodeId=3e6881c0-1e96-42a9-8bd1-55d344c2, startVer=GridCacheVersion 
[topVer=16078, order=1548853376060, nodeOrder=1], endVer=null, 
isolation=REPEATABLE_READ, concurrency=PESSIMISTIC, timeout=0, 
sysInvalidate=false, sys=false, plc=2, commitVer=GridCacheVersion 
[topVer=16078, order=1548853376061, nodeOrder=3], 
finalizing=RECOVERY_FINISH, invalidParts=null, state=PREPARED, timedOut=false, 
topVer=AffinityTopologyVersion [topVer=7, minorTopVer=0], 
mvccSnapshot=MvccSnapshotWithoutTxs [crdVer=1548853371043, cntr=207, 
cleanupVer=204, opCntr=0], skipCompletedVers=false, parentTx=null, 
duration=20048ms, onePhaseCommit=false]]], completionTime=0, duration=20048]
{code}
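
The warning above suggests that the exchange worker waits for the release future in bounded intervals and dumps the future each time the wait times out. The following is a rough sketch of that pattern under that assumption; ReleaseWaitSketch and awaitWithDiagnostics are hypothetical names, and the actual Ignite exchange logic is not shown in this thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical illustration of a bounded wait that reports a stuck future. */
final class ReleaseWaitSketch {
    /**
     * Waits for the future in slices of timeoutMs and prints its state after
     * every slice that times out.
     *
     * @return Number of timed-out slices before the future finished, or -1 if
     *      it was still pending after maxAttempts slices or the wait was interrupted.
     */
    static int awaitWithDiagnostics(CompletableFuture<?> fut, long timeoutMs, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                fut.get(timeoutMs, TimeUnit.MILLISECONDS);

                return attempt;
            }
            catch (TimeoutException e) {
                // The dump is only as useful as the future's toString().
                System.err.println("Failed to wait for future [attempt=" + (attempt + 1) +
                    ", fut=" + fut + ']');
            }
            catch (ExecutionException e) {
                // Finished exceptionally: the waiter is released as well.
                return attempt;
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();

                return -1;
            }
        }

        return -1;
    }
}
```

A future that never completes makes every slice time out, so the caller only ever sees repeated warnings. That is why a future without readable status information is hard to diagnose from logs alone.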
 

 

  was:
We investigated an "execution timeout" failure in Continuous Query 2 for
*CacheContinuousQueryAsyncFailoverMvccTxSelfTest.testMultiThreadedFailover*.
The investigation showed an MVCC problem: the test blocks at getAndPut
because at some point the following incorrect behavior occurred:
{code:java}
[16:02:56] :     [Step 4/5] [2019-01-30 13:02:56,923][INFO 
][sys-stripe-6-#9%continuous

[jira] [Updated] (IGNITE-11148) PartitionCountersNeighborcastFuture blocks partition map exchange

2019-01-30 Thread Stepachev Maksim (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stepachev Maksim updated IGNITE-11148:
--
Description: 
We investigated an "execution timeout" failure in Continuous Query 2 for
*CacheContinuousQueryAsyncFailoverMvccTxSelfTest.testMultiThreadedFailover*.
The investigation showed an MVCC problem: the test blocks at getAndPut
because at some point the following incorrect behavior occurred:
{code:java}
[16:02:56] :     [Step 4/5] [2019-01-30 13:02:56,923][INFO 
][sys-stripe-6-#9%continuous.CacheContinuousQueryAsyncFailoverMvccTxSelfTest0%][IgniteTxManager]
 Finishing prepared transaction [commit=false, tx=GridDhtTxRemote 
[nearNodeId=6a8546ab-f09d-4b0c-91c1-5fcf5b94, 
rmtFutId=95bfade9861-4f5107b4-70e5-44ef-96d4-1b18cd6b57e4, 
nearXidVer=GridCacheVersion [topVer=16078, order=1548853376060, 
nodeOrder=5], storeWriteThrough=false, super=GridDistributedTxRemoteAdapter 
[explicitVers=null, started=true, commitAllowed=0, 
txState=IgniteTxRemoteStateImpl [readMap=EmptyMap {}, 
writeMap=ConcurrentLinkedHashMap {}], txLbl=null, super=IgniteTxAdapter 
[xidVer=GridCacheVersion [topVer=16078, order=1548853376061, nodeOrder=3], 
writeVer=GridCacheVersion [topVer=16078, order=1548853376062, nodeOrder=3], 
implicit=false, loc=false, threadId=21, startTime=1548853376731, 
nodeId=3e6881c0-1e96-42a9-8bd1-55d344c2, startVer=GridCacheVersion 
[topVer=16078, order=1548853376060, nodeOrder=1], endVer=null, 
isolation=REPEATABLE_READ, concurrency=PESSIMISTIC, timeout=0, 
sysInvalidate=false, sys=false, plc=2, commitVer=GridCacheVersion 
[topVer=16078, order=1548853376061, nodeOrder=3], finalizing=NONE, 
invalidParts=null, state=PREPARED, timedOut=false, 
topVer=AffinityTopologyVersion [topVer=7, minorTopVer=0], 
mvccSnapshot=MvccSnapshotWithoutTxs [crdVer=1548853371043, cntr=207, 
cleanupVer=204, opCntr=0], skipCompletedVers=false, parentTx=null, 
duration=191ms, onePhaseCommit=false{code}
and after that:
{code:java}
[16:02:56] :     [Step 4/5] [2019-01-30 13:02:56,931][INFO 
][sys-stripe-6-#9%continuous.CacheContinuousQueryAsyncFailoverMvccTxSelfTest0%][recovery]
 Starting delivery partition countres to remote nodes [txId=GridCacheVersion 
[topVer=16078, order=1548853376060, nodeOrder=5], 
futId=82cfade9861-4f5107b4-70e5-44ef-96d4-1b18cd6b57e4{code}
_IMPORTANT: we are dealing with PartitionCountersNeighborcastFuture, which
doesn't provide status information (monitoring)._

One possible location of the problem:
PartitionCountersNeighborcastFuture.onNodeLeft

As a result, we have a transaction in state=PREPARED with completionTime=0
that never completes:

 
{code:java}
[16:03:16]W: [org.apache.ignite:ignite-indexing] [2019-01-30 13:03:16,776][WARN 
][exchange-worker-#40%continuous.CacheContinuousQueryAsyncFailoverMvccTxSelfTest0%][diagnostic]
 Failed to wait for partition release future [topVer=AffinityTopologyVersion 
[topVer=8, minorTopVer=0], node=18519119-475a-448f-8c02-ff1f6490]
LocalTxReleaseFuture [
 topVer=AffinityTopologyVersion [topVer=8, minorTopVer=0], 
 futures=[
 TxFinishFuture [ 
 tx=GridDhtTxRemote [
 nearNodeId=6a8546ab-f09d-4b0c-91c1-5fcf5b94, 
rmtFutId=95bfade9861-4f5107b4-70e5-44ef-96d4-1b18cd6b57e4, 
 nearXidVer=GridCacheVersion [topVer=16078, order=1548853376060, 
nodeOrder=5], storeWriteThrough=false, super=GridDistributedTxRemoteAdapter 
[explicitVers=null, started=true, commitAllowed=0, 
txState=IgniteTxRemoteStateImpl [readMap=EmptyMap {}, 
writeMap=ConcurrentLinkedHashMap {}], txLbl=null, super=IgniteTxAdapter [
 xidVer=GridCacheVersion [topVer=16078, order=1548853376061, nodeOrder=3], 
 writeVer=GridCacheVersion [topVer=16078, order=1548853376062, 
nodeOrder=3], implicit=false, loc=false, threadId=21, startTime=1548853376731, 
nodeId=3e6881c0-1e96-42a9-8bd1-55d344c2, startVer=GridCacheVersion 
[topVer=16078, order=1548853376060, nodeOrder=1], endVer=null, 
isolation=REPEATABLE_READ, concurrency=PESSIMISTIC, timeout=0, 
sysInvalidate=false, sys=false, plc=2, commitVer=GridCacheVersion 
[topVer=16078, order=1548853376061, nodeOrder=3], 
finalizing=RECOVERY_FINISH, invalidParts=null, state=PREPARED, timedOut=false, 
topVer=AffinityTopologyVersion [topVer=7, minorTopVer=0], 
mvccSnapshot=MvccSnapshotWithoutTxs [crdVer=1548853371043, cntr=207, 
cleanupVer=204, opCntr=0], skipCompletedVers=false, parentTx=null, 
duration=20048ms, onePhaseCommit=false]]], completionTime=0, duration=20048]
{code}
 

 

  was:
We investigated an "execution timeout" failure in the Continuous Query 2 for
*CacheContinuousQueryAsyncFailoverMvccTxSelfTest.testMultiThreadedFailover*.
The investigation showed an MVCC problem: the test blocks at getAndPut
because at some point the following incorrect behavior occurred:
{code:java}
[16:02:56] :     [Step 4/5] [2019-01-30 13:02:56,923][INFO 
][sys-stripe-6-#9%continuous.Cac