Richard-yyf opened a new issue #695:
URL: https://github.com/apache/servicecomb-pack/issues/695
## Testing scenarios
I was testing how ServiceComb behave when the global Tx is timed out.
There were 3 services:
- order service
- account service'
- storage service
The order service calls the account service and the storage service in turn.
1. The order service creates an order locally
2. The account service deducts the corresponding amount of money of the
corresponding user
3. The storage service deducts the inventory of the corresponding count of
the corresponding product
The testing scenarios is as below:
1. Order Service set the global Tx timout to `10s` via `@SagaStart(timeout =
10)`
2. storage service will simulate a timeout of 10 seconds via `Thread.sleep()`
### code
#### order serivce
```java
@SagaStart(timeout = 10)
@Transactional(rollbackFor = Exception.class)
public boolean create(Order order) {
// 1. 创建订单
orderLocalService.createLocal(order);
// 扣钱
String result = accountClient.decrease(order.getUserId(),
order.getMoney());
if (Objects.equals(result, "fail")) {
log.error("扣库存失败,回滚本地事务");
throw new RuntimeException("扣钱失败");
}
// 扣库存
result = storageClient.decrease(order.getProductId(),
order.getCount());
if (Objects.equals(result, "fail")) {
log.error("扣库存失败,回滚本地事务");
throw new RuntimeException("扣库存失败");
}
return true;
}
```
#### account service
```java
@Override
@Transactional(rollbackFor = Exception.class)
@Compensable(compensationMethod = "compensate")
public void decrease(Long userId, BigDecimal money) {
accountMapper.decrease(userId, money);
}
```
#### storage service
```java
@Transactional(rollbackFor = Exception.class)
@Compensable(compensationMethod = "compensate")
public void decrease(Long productId, Integer count) {
log.info("执行 storage 正常流程, 全局事务ID: {}, 分支事务ID: {}",
omegaContext.globalTxId(), omegaContext.localTxId());
storageMapper.decrease(productId, count);
// 11s time out
try {
Thread.sleep(11000);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
```
## Problem Description
In the above test case, when the global transaction times out, alpha-server
will continuously throw NPE exceptions. The exception log is as follows:
```
2021-02-09 14:24:36.871 INFO ruichadeMacBook.local --- [t-dispatcher-14]
o.a.s.p.a.f.SagaActor : recovery
SagaStartedEvent{timeout=10}
{"@type":"SagaStartedEvent","serviceName":"order-server","instanceId":"order-server-172.20.160.108","globalTxId":"20fe99d8-1339-4bda-9a50-285b66eeab3f","parentTxId":"20fe99d8-1339-4bda-9a50-285b66eeab3f","localTxId":"20fe99d8-1339-4bda-9a50-285b66eeab3f","createTime":1612851846898,"timeout":10,"type":"SagaStartedEvent"}
2021-02-09 14:24:36.871 INFO ruichadeMacBook.local --- [t-dispatcher-14]
o.a.s.p.a.f.SagaActor : recovery
{"@type":"TxStartedEvent","serviceName":"account-server","instanceId":"account-server-172.20.160.108","globalTxId":"20fe99d8-1339-4bda-9a50-285b66eeab3f","parentTxId":"20fe99d8-1339-4bda-9a50-285b66eeab3f","localTxId":"dc5e50b7-260e-4f8a-9e42-ead972d6ab28","createTime":1612851846912,"compensationMethod":"public
void
com.ricstudio.account.service.AccountServiceImpl.compensate(java.lang.Long,java.math.BigDecimal)","payloads":"AQMJAgEAamF2YS5tYXRoLkJpZ0RlY2ltYewBAmQA","retryMethod":"","forwardRetries":0,"forwardTimeout":0,"reverseRetries":0,"reverseTimeout":0,"retryDelayInMilliseconds":5,"type":"TxStartedEvent"}
2021-02-09 14:24:36.871 INFO ruichadeMacBook.local --- [t-dispatcher-14]
o.a.s.p.a.f.SagaActor : recovery
{"@type":"TxEndedEvent","serviceName":"account-server","instanceId":"account-server-172.20.160.108","globalTxId":"20fe99d8-1339-4bda-9a50-285b66eeab3f","parentTxId":"20fe99d8-1339-4bda-9a50-285b66eeab3f","localTxId":"dc5e50b7-260e-4f8a-9e42-ead972d6ab28","createTime":1612851846944,"type":"TxEndedEvent"}
2021-02-09 14:24:36.871 INFO ruichadeMacBook.local --- [t-dispatcher-12]
o.a.s.p.a.f.SagaActor : recovery
{"@type":"TxStartedEvent","serviceName":"storage-server","instanceId":"storage-server-172.20.160.108","globalTxId":"20fe99d8-1339-4bda-9a50-285b66eeab3f","parentTxId":"20fe99d8-1339-4bda-9a50-285b66eeab3f","localTxId":"2d815537-efa5-43ca-a2e4-3078679c8903","createTime":1612851846950,"compensationMethod":"public
void
com.ricstudio.storage.service.StorageServiceImpl.compensate(java.lang.Long,java.lang.Integer)","payloads":"AQMJAgIU","retryMethod":"public
void
com.ricstudio.storage.service.StorageServiceImpl.decrease(java.lang.Long,java.lang.Integer)","forwardRetries":3,"forwardTimeout":0,"reverseRetries":0,"reverseTimeout":0,"retryDelayInMilliseconds":5,"type":"TxStartedEvent"}
2021-02-09 14:24:36.871 INFO ruichadeMacBook.local --- [t-dispatcher-12]
o.a.s.p.a.f.SagaActor : recovery null
2021-02-09 14:24:36.871 INFO ruichadeMacBook.local --- [t-dispatcher-14]
o.a.s.p.a.f.SagaActor : recovery completed
[20fe99d8-1339-4bda-9a50-285b66eeab3f] state=SUSPENDED
2021-02-09 14:24:36.871 ERROR ruichadeMacBook.local --- [t-dispatcher-14]
o.a.s.p.a.f.SagaActor : stop
[20fe99d8-1339-4bda-9a50-285b66eeab3f] fail
2021-02-09 14:24:36.871 ERROR ruichadeMacBook.local --- [t-dispatcher-14]
a.a.OneForOneStrategy : null
java.lang.NullPointerException
at
org.apache.servicecomb.pack.alpha.core.fsm.repository.model.SagaSubTransaction$Builder.build(SagaSubTransaction.java:103)
at
org.apache.servicecomb.pack.alpha.fsm.spring.integration.akka.SagaDataExtension$SagaDataExt.lambda$stopSagaData$0(SagaDataExtension.java:76)
at java.util.LinkedHashMap.forEach(LinkedHashMap.java:684)
at
org.apache.servicecomb.pack.alpha.fsm.model.TxEntities.forEach(TxEntities.java:32)
at
org.apache.servicecomb.pack.alpha.fsm.spring.integration.akka.SagaDataExtension$SagaDataExt.stopSagaData(SagaDataExtension.java:69)
at
org.apache.servicecomb.pack.alpha.fsm.SagaActor.beforeStop(SagaActor.java:387)
at
org.apache.servicecomb.pack.alpha.fsm.SagaActor.lambda$new$31(SagaActor.java:325)
at
akka.persistence.fsm.japi.pf.FSMStateFunctionBuilder$2.apply(FSMStateFunctionBuilder.java:84)
at
akka.persistence.fsm.japi.pf.FSMStateFunctionBuilder$2.apply(FSMStateFunctionBuilder.java:81)
at akka.japi.pf.CaseStatement.apply(CaseStatements.scala:18)
at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
at akka.japi.pf.CaseStatement.applyOrElse(CaseStatements.scala:13)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:168)
at
akka.persistence.fsm.PersistentFSMBase.processEvent(PersistentFSMBase.scala:459)
at
akka.persistence.fsm.PersistentFSMBase.processEvent$(PersistentFSMBase.scala:456)
at
akka.persistence.fsm.AbstractPersistentFSMBase.processEvent(PersistentFSMBase.scala:639)
at
akka.persistence.fsm.PersistentFSMBase.akka$persistence$fsm$PersistentFSMBase$$processMsg(PersistentFSMBase.scala:453)
at
akka.persistence.fsm.PersistentFSMBase$$anonfun$receive$1.applyOrElse(PersistentFSMBase.scala:448)
at akka.actor.Actor.aroundReceive(Actor.scala:517)
at akka.actor.Actor.aroundReceive$(Actor.scala:515)
at
akka.persistence.fsm.AbstractPersistentFSM.akka$persistence$Eventsourced$$super$aroundReceive(PersistentFSM.scala:441)
at
akka.persistence.Eventsourced$$anon$1.stateReceive(Eventsourced.scala:743)
at akka.persistence.Eventsourced.aroundReceive(Eventsourced.scala:222)
at akka.persistence.Eventsourced.aroundReceive$(Eventsourced.scala:221)
at
akka.persistence.fsm.AbstractPersistentFSM.aroundReceive(PersistentFSM.scala:441)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:588)
at akka.actor.ActorCell.invoke(ActorCell.scala:557)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
at akka.dispatch.Mailbox.run(Mailbox.scala:225)
at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2021-02-09 14:24:36.871 INFO ruichadeMacBook.local --- [t-dispatcher-29]
o.a.s.p.a.f.SagaActor : recovery
SagaStartedEvent{timeout=10}
{"@type":"SagaStartedEvent","serviceName":"order-server","instanceId":"order-server-172.20.160.108","globalTxId":"20fe99d8-1339-4bda-9a50-285b66eeab3f","parentTxId":"20fe99d8-1339-4bda-9a50-285b66eeab3f","localTxId":"20fe99d8-1339-4bda-9a50-285b66eeab3f","createTime":1612851846898,"timeout":10,"type":"SagaStartedEvent"}
2021-02-09 14:24:36.871 INFO ruichadeMacBook.local --- [t-dispatcher-29]
o.a.s.p.a.f.SagaActor : recovery
{"@type":"TxStartedEvent","serviceName":"account-server","instanceId":"account-server-172.20.160.108","globalTxId":"20fe99d8-1339-4bda-9a50-285b66eeab3f","parentTxId":"20fe99d8-1339-4bda-9a50-285b66eeab3f","localTxId":"dc5e50b7-260e-4f8a-9e42-ead972d6ab28","createTime":1612851846912,"compensationMethod":"public
void
com.ricstudio.account.service.AccountServiceImpl.compensate(java.lang.Long,java.math.BigDecimal)","payloads":"AQMJAgEAamF2YS5tYXRoLkJpZ0RlY2ltYewBAmQA","retryMethod":"","forwardRetries":0,"forwardTimeout":0,"reverseRetries":0,"reverseTimeout":0,"retryDelayInMilliseconds":5,"type":"TxStartedEvent"}
2021-02-09 14:24:36.871 INFO ruichadeMacBook.local --- [t-dispatcher-29]
o.a.s.p.a.f.SagaActor : recovery
{"@type":"TxEndedEvent","serviceName":"account-server","instanceId":"account-server-172.20.160.108","globalTxId":"20fe99d8-1339-4bda-9a50-285b66eeab3f","parentTxId":"20fe99d8-1339-4bda-9a50-285b66eeab3f","localTxId":"dc5e50b7-260e-4f8a-9e42-ead972d6ab28","createTime":1612851846944,"type":"TxEndedEvent"}
2021-02-09 14:24:36.871 INFO ruichadeMacBook.local --- [t-dispatcher-29]
o.a.s.p.a.f.SagaActor : recovery
{"@type":"TxStartedEvent","serviceName":"storage-server","instanceId":"storage-server-172.20.160.108","globalTxId":"20fe99d8-1339-4bda-9a50-285b66eeab3f","parentTxId":"20fe99d8-1339-4bda-9a50-285b66eeab3f","localTxId":"2d815537-efa5-43ca-a2e4-3078679c8903","createTime":1612851846950,"compensationMethod":"public
void
com.ricstudio.storage.service.StorageServiceImpl.compensate(java.lang.Long,java.lang.Integer)","payloads":"AQMJAgIU","retryMethod":"public
void
com.ricstudio.storage.service.StorageServiceImpl.decrease(java.lang.Long,java.lang.Integer)","forwardRetries":3,"forwardTimeout":0,"reverseRetries":0,"reverseTimeout":0,"retryDelayInMilliseconds":5,"type":"TxStartedEvent"}
2021-02-09 14:24:36.871 INFO ruichadeMacBook.local --- [t-dispatcher-29]
o.a.s.p.a.f.SagaActor : recovery null
2021-02-09 14:24:36.871 INFO ruichadeMacBook.local --- [t-dispatcher-29]
o.a.s.p.a.f.SagaActor : recovery completed
[20fe99d8-1339-4bda-9a50-285b66eeab3f] state=SUSPENDED
2021-02-09 14:24:36.872 ERROR ruichadeMacBook.local --- [t-dispatcher-29]
o.a.s.p.a.f.SagaActor : stop
[20fe99d8-1339-4bda-9a50-285b66eeab3f] fail
2021-02-09 14:24:36.872 ERROR ruichadeMacBook.local --- [t-dispatcher-29]
a.a.OneForOneStrategy : null
java.lang.NullPointerException
at
org.apache.servicecomb.pack.alpha.core.fsm.repository.model.SagaSubTransaction$Builder.build(SagaSubTransaction.java:103)
at
org.apache.servicecomb.pack.alpha.fsm.spring.integration.akka.SagaDataExtension$SagaDataExt.lambda$stopSagaData$0(SagaDataExtension.java:76)
at java.util.LinkedHashMap.forEach(LinkedHashMap.java:684)
at
org.apache.servicecomb.pack.alpha.fsm.model.TxEntities.forEach(TxEntities.java:32)
at
org.apache.servicecomb.pack.alpha.fsm.spring.integration.akka.SagaDataExtension$SagaDataExt.stopSagaData(SagaDataExtension.java:69)
at
org.apache.servicecomb.pack.alpha.fsm.SagaActor.beforeStop(SagaActor.java:387)
at
org.apache.servicecomb.pack.alpha.fsm.SagaActor.lambda$new$31(SagaActor.java:325)
at
akka.persistence.fsm.japi.pf.FSMStateFunctionBuilder$2.apply(FSMStateFunctionBuilder.java:84)
at
akka.persistence.fsm.japi.pf.FSMStateFunctionBuilder$2.apply(FSMStateFunctionBuilder.java:81)
at akka.japi.pf.CaseStatement.apply(CaseStatements.scala:18)
at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
at akka.japi.pf.CaseStatement.applyOrElse(CaseStatements.scala:13)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:168)
at
akka.persistence.fsm.PersistentFSMBase.processEvent(PersistentFSMBase.scala:459)
at
akka.persistence.fsm.PersistentFSMBase.processEvent$(PersistentFSMBase.scala:456)
at
akka.persistence.fsm.AbstractPersistentFSMBase.processEvent(PersistentFSMBase.scala:639)
at
akka.persistence.fsm.PersistentFSMBase.akka$persistence$fsm$PersistentFSMBase$$processMsg(PersistentFSMBase.scala:453)
at
akka.persistence.fsm.PersistentFSMBase$$anonfun$receive$1.applyOrElse(PersistentFSMBase.scala:448)
at akka.actor.Actor.aroundReceive(Actor.scala:517)
at akka.actor.Actor.aroundReceive$(Actor.scala:515)
at
akka.persistence.fsm.AbstractPersistentFSM.akka$persistence$Eventsourced$$super$aroundReceive(PersistentFSM.scala:441)
at
akka.persistence.Eventsourced$$anon$1.stateReceive(Eventsourced.scala:743)
at akka.persistence.Eventsourced.aroundReceive(Eventsourced.scala:222)
at akka.persistence.Eventsourced.aroundReceive$(Eventsourced.scala:221)
at
akka.persistence.fsm.AbstractPersistentFSM.aroundReceive(PersistentFSM.scala:441)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:588)
at akka.actor.ActorCell.invoke(ActorCell.scala:557)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
at akka.dispatch.Mailbox.run(Mailbox.scala:225)
at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
```
The alpha UI show the suspended tx reaches 45.46K+...

## My exploration of the Problem
I debug the alpha in my local env and make the break point according to the
error stacktrace. Then I found what cause the probleam.
when alpha found the global tx is timed out, the global tx state will be
set to `SUSPEND`,and will stop the sagaActor.

before stopping, it will save the GlobalTransaction to the es. The NPE
happens when building the subTransaction.

although the global tx is `SUSPEND`, the sub tx is still active and its
endtime is null, and then cause NPE here.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]