vqianxiao opened a new issue, #15881:
URL: https://github.com/apache/dubbo/issues/15881

   ### Pre-check
   
   - [x] I am sure that all the content I provide is in English.
   
   
   ### Search before asking
   
   - [x] I had searched in the 
[issues](https://github.com/apache/dubbo/issues?q=is%3Aissue) and found no 
similar issues.
   
   
   ### Apache Dubbo Component
   
   Java SDK (apache/dubbo)
   
   ### Dubbo Version
   
   Dubbo Java 3.2.15, OpenJDK 21-2024.03, Debian GNU/Linux 11 (bullseye)
   
   ### Steps to reproduce this issue
   
   1. In a high-concurrency scenario, the list of service providers is updated frequently.
   2. Multiple consumer threads execute routing decisions at the same time.
   3. When the routing chain is updated while a routing decision is executing, a "Reject to route" error occurs.
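
The three steps can be replayed deterministically with a simplified model. The sketch below is not Dubbo code: `MiniChain` and `replayInterleaving` are hypothetical names, and the interleaving is the one this report suspects, not one confirmed against the Dubbo source.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of the suspected race; not Dubbo's real classes. */
public class RouteRaceSketch {

    /** Stands in for a router chain that remembers the list it was built for. */
    static final class MiniChain {
        private volatile List<String> invokers = new ArrayList<>();

        void setInvokers(List<String> newInvokers) {
            this.invokers = newInvokers;
        }

        List<String> route(List<String> availableInvokers) {
            // Identity check, modeled on the guard quoted later in this issue.
            if (invokers != availableInvokers) {
                throw new IllegalStateException("reject to route, because the invokers has changed.");
            }
            return availableInvokers;
        }
    }

    /** Replays one unlucky interleaving on a single thread; returns true if routing was rejected. */
    static boolean replayInterleaving() {
        MiniChain chain = new MiniChain();
        List<String> oldList = List.of("provider-a:20880");
        chain.setInvokers(oldList);

        // Step 1: a consumer thread takes its snapshot of the available invokers.
        List<String> snapshot = oldList;

        // Step 2: a registry notification swaps in a new list before routing runs.
        chain.setInvokers(List.of("provider-a:20880", "provider-b:20880"));

        // Step 3: the consumer thread routes with the now stale snapshot.
        try {
            chain.route(snapshot);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected = " + replayInterleaving()); // prints "rejected = true"
    }
}
```

In a real consumer, steps 1 to 3 run on different threads, so the window only opens occasionally, which matches the observation that the error appears under frequent provider updates.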
   
   ### What you expected to happen
   
   During the release period, requests should be sent to the newly released provider servers as usual.
   
   Actual behavior:
   Monitoring shows that the call volume dropped to 0.
   Here are some logs:
   ```
   [DUBBO] Failed to execute router: nacos://xxxx:8848/org.apache.dubbo.registry.RegistryService?REGISTRY_CLUSTER=default&application=adx-web&check=false&dubbo=2.0.2&executor-management-mode=isolation&file-cache=true&password=xxx!&pid=7&qos.enable=false&register-mode=instance&release=3.2.15&timestamp=1766047589451&username=nacos, cause: reject to route, because the invokers has changed., dubbo version: 3.2.15, current host: xxx, error code: 2-1. This may be caused by , go to https://dubbo.apache.org/faq/2/1 to find instructions.
   java.lang.IllegalStateException: reject to route, because the invokers has changed.
        at org.apache.dubbo.rpc.cluster.SingleRouterChain.route(SingleRouterChain.java:147)
        at org.apache.dubbo.registry.integration.DynamicDirectory.doList(DynamicDirectory.java:214)
        at org.apache.dubbo.rpc.cluster.directory.AbstractDirectory.list(AbstractDirectory.java:235)
        at org.apache.dubbo.rpc.cluster.support.AbstractClusterInvoker.list(AbstractClusterInvoker.java:452)
        at org.apache.dubbo.rpc.cluster.support.AbstractClusterInvoker.invoke(AbstractClusterInvoker.java:355)
        at org.apache.dubbo.rpc.cluster.router.RouterSnapshotFilter.invoke(RouterSnapshotFilter.java:46)
        at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:349)
        at org.apache.dubbo.monitor.support.MonitorFilter.invoke(MonitorFilter.java:108)
        at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:349)
        at org.apache.dubbo.rpc.cluster.filter.support.MetricsClusterFilter.invoke(MetricsClusterFilter.java:57)
        at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:349)
        at org.apache.dubbo.rpc.protocol.dubbo.filter.FutureFilter.invoke(FutureFilter.java:52)
        at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:349)
        at org.apache.dubbo.rpc.cluster.filter.support.ObservationSenderFilter.invoke(ObservationSenderFilter.java:62)
        at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:349)
        at org.apache.dubbo.spring.security.filter.ContextHolderParametersSelectedTransferFilter.invoke(ContextHolderParametersSelectedTransferFilter.java:40)
        at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:349)
   ```
   ```
   [DUBBO] No provider available after connectivity filter for the service cn.geo.api.remoteservice.RemoteIpAreaService All routed invokers' size: 0 from registry ServiceDiscoveryRegistryDirectory(registry: xxxx:8848, subscribed key: [geo-service])-Directory(invokers: 130[xxxx:20880, xxxx:20880, xxxx:20880], validInvokers: 130[xxxx:20880, xxxx:20880, xxxx:20880], invokersToReconnect: 0[]) on the consumer xxxx using the dubbo version 3.2.15., dubbo version: 3.2.15, current host: xxxx, error code: 2-2. This may be caused by provider server or registry center crashed, go to https://dubbo.apache.org/faq/2/2 to find instructions.
   ```
   
   <img width="331" height="296" alt="Image" src="https://github.com/user-attachments/assets/6b9f3673-fd9d-4591-a472-504bab2e9483" />
   
   The monitoring chart is shown above. After the consumer cluster was restarted, calls recovered.
   I think there is a concurrency issue with the `invokers` variable in the `org.apache.dubbo.rpc.cluster.SingleRouterChain#route` method, which causes the check at line 140 to evaluate to true and the method to throw `new IllegalStateException("reject to route, because the invokers has changed.")`:
   
   ```java
   public List<Invoker<T>> route(URL url, BitList<Invoker<T>> availableInvokers, Invocation invocation) {
       if (invokers.getOriginList() != availableInvokers.getOriginList()) {
           logger.error(
                   INTERNAL_ERROR,
                   "",
                   "Router's invoker size: " + invokers.getOriginList().size() + " Invocation's invoker size: "
                           + availableInvokers.getOriginList().size(),
                   "Reject to route, because the invokers has changed.");
           throw new IllegalStateException("reject to route, because the invokers has changed.");
       }
       if (RpcContext.getServiceContext().isNeedPrintRouterSnapshot()) {
           return routeAndPrint(url, availableInvokers, invocation);
       } else {
           return simpleRoute(url, availableInvokers, invocation);
       }
   }
   ```
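
Note that the guard compares with `!=`, which is reference identity rather than content equality. Assuming `getOriginList()` returns a reference to the underlying list (an assumption; its implementation is not shown in this issue), a refreshed list with identical providers would still be rejected. A minimal illustration with plain `java.util` lists, where `sameInstance` is a hypothetical helper:

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: reference identity versus content equality for lists. */
public class IdentityCheckSketch {

    /** Same kind of comparison as the guard above: true only for the very same object. */
    static boolean sameInstance(List<String> routerView, List<String> callerView) {
        return routerView == callerView;
    }

    public static void main(String[] args) {
        List<String> before = new ArrayList<>(List.of("10.0.0.1:20880", "10.0.0.2:20880"));
        List<String> after = new ArrayList<>(before); // same content, new list object

        System.out.println(before.equals(after));         // true
        System.out.println(sameInstance(before, after));  // false
        System.out.println(sameInstance(before, before)); // true
    }
}
```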
   
   ### Anything else
   
   I think this issue is inevitable under high-frequency calls.
   
   Root cause analysis:
   - The consistency of the available invokers is guaranteed by `invokerRefreshLock`.
   - The lock in `routerChain` is mainly used for switching and selecting routing chains.
   - The two locks protect different scopes, which leads to inconsistent state during concurrent updates.
   
   Current design issue:
   - The read lock of `routerChain` is released prematurely, right after `singleChain` is obtained.
   - The routing decision (`doList`) is made after the `routerChain` read lock has been released.
   - As a result, `routerChain` can be modified by other threads while routing is executing.
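
One possible direction, sketched under assumptions: keep a single read lock held across both the snapshot and the routing check, and swap the invoker list and the chain's view together under the write lock. `GuardedRoutingSketch`, `refresh`, `list`, and `stress` are hypothetical names for illustration; this is not Dubbo's implementation or an agreed fix.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch: one lock covers both the snapshot and the routing check. */
public class GuardedRoutingSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private List<String> invokers = List.of();     // what the directory hands out
    private List<String> chainInvokers = invokers; // what the router chain was built for

    /** Registry notification: both views are swapped atomically under the write lock. */
    void refresh(List<String> newInvokers) {
        lock.writeLock().lock();
        try {
            invokers = newInvokers;
            chainInvokers = newInvokers;
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Consumer call: the read lock is held until the routing check has finished. */
    List<String> list() {
        lock.readLock().lock();
        try {
            List<String> snapshot = invokers;
            if (chainInvokers != snapshot) {
                throw new IllegalStateException("reject to route, because the invokers has changed.");
            }
            return snapshot;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Runs one refreshing thread against one routing thread; returns the number of rejections. */
    static int stress(int rounds) {
        GuardedRoutingSketch dir = new GuardedRoutingSketch();
        AtomicInteger failures = new AtomicInteger();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < rounds; i++) {
                dir.refresh(List.of("provider-" + i + ":20880"));
            }
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < rounds; i++) {
                try {
                    dir.list();
                } catch (IllegalStateException e) {
                    failures.incrementAndGet();
                }
            }
        });
        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return failures.get();
    }

    public static void main(String[] args) {
        System.out.println("rejections = " + stress(10_000));
    }
}
```

The trade-off is that refreshes have to wait for in-flight routing decisions. An alternative would be to release the lock early as today and retry with a fresh snapshot when the check fails, rather than failing the request.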
   
   
   ### Are you willing to submit a pull request to fix on your own?
   
   - [x] Yes I am willing to submit a pull request on my own!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

