AlinsRan commented on issue #2774:
URL: 
https://github.com/apache/apisix-ingress-controller/issues/2774#issuecomment-4666087725

   **Thanks for the detailed report — we've found the root cause.**
   
   ### Root cause
   
   It's a stale-cache problem in the long-lived ADC sidecar, triggered by 
leader election in a **multi-replica** deployment. A leader switch **doesn't 
restart the pods**, so ADC's in-memory cache survives on the standby pod.
   
   1. ADC keeps a per-`cacheKey` baseline in memory: last-synced content + the 
`conf_version` it generated.
   2. Only the **leader** runs the sync loop; a standby pod never refreshes its 
local ADC cache.
   3. Meanwhile the current leader pushes new config and bumps `*_conf_version` 
in APISIX.
   4. When leadership switches **back**, the controller rebuilds the correct 
state from the API server, but the ADC cache on that pod is still **frozen at 
the snapshot from its previous term as leader** (old `conf_version`).
   5. ADC diffs the current state against the stale baseline, reuses the 
**older** `conf_version`, and APISIX rejects the push because that version is 
lower than the one it already holds:
      ```
      upstreams_conf_version must be greater than or equal to (1779434128737)
      ```
   
   This matches the known workarounds: restarting a pod wipes the in-memory 
state, and scaling the backend changes the content enough that ADC stops 
reusing the stale version.
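
The sequence above can be replayed with a toy model. This is only a sketch 
under stated assumptions, not ADC's or APISIX's actual code: `FakeApisix`, 
`FakeAdc`, `replay`, the `"gw"` cache key, and the integer timestamps are all 
hypothetical names and values chosen for illustration. It assumes ADC reuses 
the cached `conf_version` when the content matches its baseline, and that the 
content has returned to what pod A last synced.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


class StaleVersionError(Exception):
    """The pushed conf_version is lower than the one the gateway already holds."""


@dataclass
class FakeApisix:
    """Hypothetical stand-in for APISIX; it only enforces version monotonicity."""
    upstreams_conf_version: int = 0

    def push(self, conf_version: int) -> None:
        if conf_version < self.upstreams_conf_version:
            raise StaleVersionError(
                "upstreams_conf_version must be greater than or equal to "
                f"({self.upstreams_conf_version})"
            )
        self.upstreams_conf_version = conf_version


@dataclass
class FakeAdc:
    """Hypothetical per-pod ADC sidecar with an in-memory baseline per cacheKey."""
    cache: Dict[str, Tuple[str, int]] = field(default_factory=dict)

    def sync(self, cache_key: str, content: str, now: int, apisix: FakeApisix) -> int:
        baseline = self.cache.get(cache_key)
        if baseline is not None and baseline[0] == content:
            version = baseline[1]  # content matches the baseline: reuse its version
        else:
            version = now  # content differs or no baseline: generate a new version
        apisix.push(version)  # raises if the version went backwards
        self.cache[cache_key] = (content, version)
        return version


def replay() -> str:
    """Replay the leader switch A -> B -> A and return the outcome of A's last sync."""
    apisix = FakeApisix()
    pod_a, pod_b = FakeAdc(), FakeAdc()
    pod_a.sync("gw", "upstream-v1", now=100, apisix=apisix)  # A leads
    pod_b.sync("gw", "upstream-v2", now=200, apisix=apisix)  # B leads, bumps version
    try:
        pod_a.sync("gw", "upstream-v1", now=300, apisix=apisix)  # A leads again
    except StaleVersionError as err:
        return str(err)
    return "accepted"
```

In this model a pod restart corresponds to replacing `pod_a` with a new 
`FakeAdc()`, and scaling the backend corresponds to passing content that no 
longer matches the baseline. Both paths generate a new version, which is 
consistent with the two workarounds.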
   
   ### Fix direction
   
   The stale state lives **inside ADC**, so the controller can't fully fix it 
alone. ADC needs to expose one of these, called when a pod **(re)acquires 
leadership**:
   
   1. **Clear cache** (reset by `cacheKey`) — next sync is a full push with 
fresh `conf_version`. Simple and robust, but re-pushes everything on each 
switch.
   2. **Update cache without syncing** — re-align the in-memory baseline to the 
current state without touching APISIX, so later syncs diff against fresh data. 
Avoids the full re-push, but needs a bit more work on both sides.
   
   Plan: start with **(1)** to fix the bug, then add **(2)** as an 
optimization. We'll update here once a PR is ready.
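
As a rough illustration of the two options, here is a minimal sketch. 
Everything in it is hypothetical: `AdcCache`, `pick_conf_version`, and 
`on_leadership_acquired` are not existing ADC or controller APIs, and how the 
controller would obtain the current baseline for option 2 is left open.

```python
from typing import Dict, Optional, Tuple

Baseline = Tuple[str, int]  # (last-synced content, conf_version)


class AdcCache:
    """Toy per-cacheKey baseline store; not ADC's real data structure."""

    def __init__(self) -> None:
        self._baselines: Dict[str, Baseline] = {}

    def get(self, cache_key: str) -> Optional[Baseline]:
        return self._baselines.get(cache_key)

    def record(self, cache_key: str, content: str, conf_version: int) -> None:
        self._baselines[cache_key] = (content, conf_version)

    def clear(self, cache_key: str) -> None:
        # Option 1: drop the baseline so the next sync is a full push.
        self._baselines.pop(cache_key, None)


def pick_conf_version(cache: AdcCache, cache_key: str, content: str, now: int) -> int:
    """Reuse the cached version only when the content matches the baseline."""
    baseline = cache.get(cache_key)
    if baseline is not None and baseline[0] == content:
        return baseline[1]
    return now


def on_leadership_acquired(
    cache: AdcCache, cache_key: str, current: Optional[Baseline] = None
) -> None:
    """Hypothetical hook run when a pod (re)acquires leadership."""
    if current is None:
        cache.clear(cache_key)  # option 1: clear cache
    else:
        cache.record(cache_key, *current)  # option 2: re-align without syncing
```

With option 1 the stale version can no longer be reused because there is no 
baseline left to match. With option 2 the baseline carries whatever version 
the caller supplies, so its correctness depends on that value being current.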
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
