Hi Tom,

In a geo-cluster scenario (Boron-SR3), after the leader changes completed and
all shards were reported ready, transactions still could not be committed:
they failed with "Failed to prepare transaction" errors caused by ask timeouts
(AskTimeoutException).

All nodes are reachable among themselves.

Have you seen this before? Any insights would be highly appreciated.




More details from the karaf log are below.
------------------------------------------------------------------------------------------------------------

Role-change messages where node-3 (member-3) becomes leader:



2017-11-01 18:50:26,379 | INFO  | lt-dispatcher-33 | Shard | 163 - org.opendaylight.controller.sal-akka-raft - 1.4.3.Boron-SR3 | member-3-shard-default-config (PreLeader) :- Switching from behavior PreLeader to Leader, election term: 780
2017-11-01 18:50:26,379 | INFO  | lt-dispatcher-41 | RoleChangeNotifier | 162 - org.opendaylight.controller.sal-clustering-commons - 1.4.3.Boron-SR3 | RoleChangeNotifier for member-3-shard-default-config , received role change from PreLeader to Leader
2017-11-01 18:50:26,379 | INFO  | lt-dispatcher-20 | ShardManager | 168 - org.opendaylight.controller.sal-distributed-datastore - 1.4.3.Boron-SR3 | shard-manager-config: Received role changed for member-3-shard-default-config from PreLeader to Leader
2017-11-01 18:50:26,379 | INFO  | lt-dispatcher-20 | ShardManager | 168 - org.opendaylight.controller.sal-distributed-datastore - 1.4.3.Boron-SR3 | shard-manager-config: All Shards are ready - data store config is ready, available count is 0




Immediately afterwards, several transactions (50063, 50064, 50065, ...)
failed due to ask timeouts.


2017-11-01 18:50:35,293 | ERROR | lt-dispatcher-20 | LocalThreePhaseCommitCohort | 168 - org.opendaylight.controller.sal-distributed-datastore - 1.4.3.Boron-SR3 | Failed to prepare transaction member-3-datastore-config-fe-51-txn-50063 on backend
akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka://opendaylight-cluster-data/), Path(/user/shardmanager-config/member-3-shard-default-config#-1977680713)]] after [30000 ms]. Sender[null] sent message of type "org.opendaylight.controller.cluster.datastore.messages.ReadyLocalTransaction".
	at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)[150:com.typesafe.akka.actor:2.4.7]
	at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)[150:com.typesafe.akka.actor:2.4.7]
	at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)[146:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
	at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)[146:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
	at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)[146:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
	at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:331)[150:com.typesafe.akka.actor:2.4.7]
	at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:282)[150:com.typesafe.akka.actor:2.4.7]
	at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:286)[150:com.typesafe.akka.actor:2.4.7]
	at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:238)[150:com.typesafe.akka.actor:2.4.7]
	at java.lang.Thread.run(Thread.java:745)[:1.8.0_101]
2017-11-01 18:50:35,293 | WARN  | lt-dispatcher-16 | ConcurrentDOMDataBroker | 168 - org.opendaylight.controller.sal-distributed-datastore - 1.4.3.Boron-SR3 | Tx: DOM-50120 Error during phase CAN_COMMIT, starting Abort
akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka://opendaylight-cluster-data/), Path(/user/shardmanager-config/member-3-shard-default-config#-1977680713)]] after [30000 ms]. Sender[null] sent message of type "org.opendaylight.controller.cluster.datastore.messages.ReadyLocalTransaction".
	at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)[150:com.typesafe.akka.actor:2.4.7]
	at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)[150:com.typesafe.akka.actor:2.4.7]
	at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)[146:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
	at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)[146:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
	at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)[146:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
	at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:331)[150:com.typesafe.akka.actor:2.4.7]
	at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:282)[150:com.typesafe.akka.actor:2.4.7]
	at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:286)[150:com.typesafe.akka.actor:2.4.7]
	at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:238)[150:com.typesafe.akka.actor:2.4.7]
	at java.lang.Thread.run(Thread.java:745)[:1.8.0_101]
2017-11-01 18:50:36,122 | ERROR | lt-dispatcher-14 | LocalThreePhaseCommitCohort | 168 - org.opendaylight.controller.sal-distributed-datastore - 1.4.3.Boron-SR3 | Failed to prepare transaction member-3-datastore-config-fe-51-txn-50064 on backend
akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka://opendaylight-cluster-data/), Path(/user/shardmanager-config/member-3-shard-default-config#-1977680713)]] after [30000 ms]. Sender[null] sent message of type "org.opendaylight.controller.cluster.datastore.messages.ReadyLocalTransaction".
	at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)[150:com.typesafe.akka.actor:2.4.7]
	at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)[150:com.typesafe.akka.actor:2.4.7]
	at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)[146:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
	at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)[146:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
	at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)[146:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
	at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:331)[150:com.typesafe.akka.actor:2.4.7]
	at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:282)[150:com.typesafe.akka.actor:2.4.7]
	at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:286)





2017-11-01 18:50:27,773 | WARN  | ult-dispatcher-3 | ShardDataTree | 168 - org.opendaylight.controller.sal-distributed-datastore - 1.4.3.Boron-SR3 | member-3-shard-default-config: Current transaction member-3-datastore-config-fe-51-txn-50063 has timed out after 15000 ms in state COMMIT_PENDING
2017-11-01 18:50:27,774 | WARN  | ult-dispatcher-3 | ShardDataTree | 168 - org.opendaylight.controller.sal-distributed-datastore - 1.4.3.Boron-SR3 | member-3-shard-default-config: Transaction member-3-datastore-config-fe-51-txn-50063 is still committing, cannot abort
2017-11-01 18:50:35,293 | ERROR | lt-dispatcher-20 | LocalThreePhaseCommitCohort | 168 - org.opendaylight.controller.sal-distributed-datastore - 1.4.3.Boron-SR3 | Failed to prepare transaction member-3-datastore-config-fe-51-txn-50063 on backend
2017-11-01 18:50:47,772 | WARN  | lt-dispatcher-35 | ShardDataTree | 168 - org.opendaylight.controller.sal-distributed-datastore - 1.4.3.Boron-SR3 | member-3-shard-default-config: Current transaction member-3-datastore-config-fe-51-txn-50063 has timed out after 15000 ms in state COMMIT_PENDING
2017-11-01 18:50:47,772 | WARN  | lt-dispatcher-35 | ShardDataTree | 168 - org.opendaylight.controller.sal-distributed-datastore - 1.4.3.Boron-SR3 | member-3-shard-default-config: Transaction member-3-datastore-config-fe-51-txn-50063 is still committing, cannot abort
2017-11-01 18:51:02,773 | WARN  | lt-dispatcher-23 | ShardDataTree | 168 - org.opendaylight.controller.sal-distributed-datastore - 1.4.3.Boron-SR3 | member-3-shard-default-config: Current transaction member-3-datastore-config-fe-51-txn-50063 has timed out after 15000 ms in state COMMIT_PENDING
2017-11-01 18:51:02,773 | WARN  | lt-dispatcher-23 | ShardDataTree | 168 - org.opendaylight.controller.sal-distributed-datastore - 1.4.3.Boron-SR3 | member-3-shard-default-config: Transaction member-3-datastore-config-fe-51-txn-50063 is still committing, cannot abort
.....................................
....................................
.....................................
2017-11-01 20:51:12,773 | WARN  | lt-dispatcher-38 | ShardDataTree | 168 - org.opendaylight.controller.sal-distributed-datastore - 1.4.3.Boron-SR3 | member-3-shard-default-config: Current transaction member-3-datastore-config-fe-51-txn-50063 has timed out after 15000 ms in state COMMIT_PENDING
2017-11-01 20:51:12,773 | WARN  | lt-dispatcher-38 | ShardDataTree | 168 - org.opendaylight.controller.sal-distributed-datastore - 1.4.3.Boron-SR3 | member-3-shard-default-config: Transaction member-3-datastore-config-fe-51-txn-50063 is still committing, cannot abort
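The same transaction (txn-50063) keeps being reported as stuck in COMMIT_PENDING for about two hours. To measure this across the whole log, a small scan can group those warnings by transaction ID. This is only a hypothetical sketch: StuckTxnScanner and its Seen holder are not part of OpenDaylight, and it assumes each karaf.log entry sits on a single line (not wrapped as in this mail).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical log-analysis helper (not part of OpenDaylight). Groups the
 * ShardDataTree "has timed out ... in state COMMIT_PENDING" warnings by
 * transaction ID and records when each ID was first and last reported.
 * Assumes every karaf.log entry is on one line.
 */
public final class StuckTxnScanner {

    /** First/last timestamp and number of timeout warnings for one transaction. */
    public static final class Seen {
        public String firstSeen;
        public String lastSeen;
        public int count;
    }

    // Leading karaf timestamp, e.g. "2017-11-01 18:50:27,773".
    private static final Pattern TIMESTAMP =
            Pattern.compile("(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})");

    // Transaction ID inside the COMMIT_PENDING timeout warning.
    private static final Pattern STUCK = Pattern.compile(
            "transaction (\\S+) has timed out after \\d+ ms in state COMMIT_PENDING");

    private StuckTxnScanner() {
    }

    /** Returns transaction ID -> Seen, in order of first appearance. */
    public static Map<String, Seen> scan(List<String> lines) {
        Map<String, Seen> result = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher ts = TIMESTAMP.matcher(line);
            Matcher stuck = STUCK.matcher(line);
            // Skip anything that is not a timestamped COMMIT_PENDING timeout warning.
            if (!ts.lookingAt() || !stuck.find()) {
                continue;
            }
            String txn = stuck.group(1);
            Seen seen = result.get(txn);
            if (seen == null) {
                seen = new Seen();
                seen.firstSeen = ts.group(1);
                result.put(txn, seen);
            }
            seen.lastSeen = ts.group(1);
            seen.count++;
        }
        return result;
    }
}
```

Fed with the lines of karaf.log (for example via java.nio.file.Files.readAllLines), it would report txn-50063 from 18:50:27 until at least 20:51:12, together with how many times the warning fired.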



A sample of the full ERROR/WARN entries:
2017-11-01 20:46:31,732 | ERROR | ult-dispatcher-4 | LocalThreePhaseCommitCohort | 168 - org.opendaylight.controller.sal-distributed-datastore - 1.4.3.Boron-SR3 | Failed to prepare transaction member-3-datastore-config-fe-51-txn-326101 on backend
akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka://opendaylight-cluster-data/), Path(/user/shardmanager-config/member-3-shard-default-config#-1977680713)]] after [30000 ms]. Sender[null] sent message of type "org.opendaylight.controller.cluster.datastore.messages.ReadyLocalTransaction".
	at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)[150:com.typesafe.akka.actor:2.4.7]
	at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)[150:com.typesafe.akka.actor:2.4.7]
	at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)[146:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
	at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)[146:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
	at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)[146:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
	at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:331)[150:com.typesafe.akka.actor:2.4.7]
	at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:282)[150:com.typesafe.akka.actor:2.4.7]
	at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:286)[150:com.typesafe.akka.actor:2.4.7]
	at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:238)[150:com.typesafe.akka.actor:2.4.7]
	at java.lang.Thread.run(Thread.java:745)[:1.8.0_101]
2017-11-01 20:46:31,732 | WARN  | lt-dispatcher-36 | ConcurrentDOMDataBroker | 168 - org.opendaylight.controller.sal-distributed-datastore - 1.4.3.Boron-SR3 | Tx: DOM-326186 Error during phase CAN_COMMIT, starting Abort
akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka://opendaylight-cluster-data/), Path(/user/shardmanager-config/member-3-shard-default-config#-1977680713)]] after [30000 ms]. Sender[null] sent message of type "org.opendaylight.controller.cluster.datastore.messages.ReadyLocalTransaction".
	at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)[150:com.typesafe.akka.actor:2.4.7]
	at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)[150:com.typesafe.akka.actor:2.4.7]
	at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)[146:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
	at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)[146:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
	at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)[146:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
	at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:331)[150:com.typesafe.akka.actor:2.4.7]
	at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:282)[150:com.typesafe.akka.actor:2.4.7]
	at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:286)[150:com.typesafe.akka.actor:2.4.7]
	at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:238)[150:com.typesafe.akka.actor:2.4.7]
	at java.lang.Thread.run(Thread.java:745)[:1.8.0_101]



Regards,
Sai
_______________________________________________
controller-dev mailing list
controller-dev@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/controller-dev
