bzp2010 commented on issue #2514:
URL: 
https://github.com/apache/apisix-ingress-controller/issues/2514#issuecomment-3587751484

   Hi, @johannes-engler-mw. Apologies for the delay. The issue covers a lot of 
ground, so I had to devote considerably more time to it, and even then I 
cannot guarantee that I have made every point perfectly clear.
   
   I would like to address the point you raised about "Full Configuration Push 
on Updates".
   
   **First, the conclusion: the full configuration is pushed, but it is not 
implemented the way you assume.**
   
   If you have delved into the AIC 2.0 codebase, you will be aware that we 
employ ADC as the core adapter to support different modes (etcd or stateless), 
with these modes corresponding respectively to the `apisix` and 
`apisix-standalone` backends [1] within ADC.
   
   Given your interest in AIC's stateless mode, we shall focus primarily on the 
`apisix-standalone` backend and look at the optimisations we have made to 
synchronisation efficiency and to APISIX data plane efficiency respectively.
   
   
   
   We have implemented numerous optimisation patches to enhance synchronisation 
speed, including but not limited to: ADC remote configuration caching 
(exclusively for apisix-standalone), reduction in the frequency of differential 
checks, and ADC server mode.
   
   1. ADC remote configuration caching: I have implemented caching within ADC 
to minimise the network round trips required to fetch configurations from the 
APISIX Admin API, which helps reduce synchronisation latency.
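
   To illustrate the idea only (the type and function names below are 
hypothetical, not the actual ADC internals, and the Admin API usage is 
simplified): a cached copy of the remote configuration is reused on subsequent 
syncs instead of being fetched again.

   ```typescript
   // Illustrative sketch only; names and the exact Admin API usage are assumptions.
   interface RemoteConfig {
     routes: unknown[];
     services: unknown[];
   }

   let cachedConfig: RemoteConfig | undefined;

   // Hypothetical fetch of the remote configuration from one Admin API endpoint.
   async function fetchRemoteConfig(adminEndpoint: string, adminKey: string): Promise<RemoteConfig> {
     const res = await fetch(`${adminEndpoint}/apisix/admin/routes`, {
       headers: { 'X-API-KEY': adminKey },
     });
     const body = await res.json();
     // A real implementation would also fetch services, upstreams, SSLs, etc.
     return { routes: body.list ?? [], services: [] };
   }

   // Subsequent synchronisations reuse the cache and skip the network round trip.
   async function getRemoteConfig(adminEndpoint: string, adminKey: string): Promise<RemoteConfig> {
     if (cachedConfig !== undefined) return cachedConfig;
     cachedConfig = await fetchRemoteConfig(adminEndpoint, adminKey);
     return cachedConfig;
   }
   ```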
   
   2. Reduced frequency of differential checks:
   
   In the standard ADC CLI mode, users push a series of changes to a single 
endpoint. This works because APISIX instances backed by etcd share a 
centralised store: regardless of which instance's Admin API we write to, the 
data is reliably propagated to all data planes. This approach is comparatively 
efficient.
   APISIX Standalone, however, is a special case. There is no centralised 
storage, so configurations must be sent to the Admin API endpoint of each 
APISIX instance. In CLI mode, we would have to execute the full "pull, 
differential check, push" synchronisation process for every instance, so the 
cost grows linearly with the number of APISIX instances. Our team soon 
realised this was unsustainable: an unpredictable number of APISIX instances 
combined with CPU-intensive differential checks risked slowing down the entire 
system, which was unacceptable.
   We therefore implemented an optimisation: regardless of the number of 
APISIX instances, the differences are calculated only once per 
synchronisation, and the patched, up-to-date configuration is then pushed to 
each APISIX instance. The time spent on CPU-intensive work no longer grows 
linearly.
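
   A rough sketch of the shape of this optimisation (all names are 
hypothetical and the diff is deliberately trivialised; the real ADC diff is 
far more involved):

   ```typescript
   // Illustrative sketch only; not the real ADC code.
   type Config = Record<string, unknown>;

   // Trivialised stand-in for the CPU-intensive differential check:
   // collect the top-level keys whose values changed.
   function computeDiff(desired: Config, cached: Config): string[] {
     return Object.keys(desired).filter(
       (k) => JSON.stringify(desired[k]) !== JSON.stringify(cached[k]),
     );
   }

   // Hypothetical push of the already-patched configuration to one instance.
   async function pushConfig(endpoint: string, config: Config): Promise<void> {
     await fetch(endpoint, { method: 'PUT', body: JSON.stringify(config) });
   }

   async function syncStandalone(
     desired: Config,
     cached: Config,
     endpoints: string[],
   ): Promise<Config> {
     // The diff is computed exactly once per synchronisation,
     // no matter how many APISIX instances there are ...
     const changedKeys = computeDiff(desired, cached);
     const next: Config = { ...cached };
     for (const key of changedKeys) next[key] = desired[key];

     // ... while only the comparatively cheap push fans out to every endpoint.
     await Promise.all(endpoints.map((ep) => pushConfig(ep, next)));
     return next; // this becomes the new cached configuration
   }
   ```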
   
   As I mentioned earlier, I have created a caching layer. This optimisation 
will work closely with the cache to ensure its effectiveness. It is a 
systematic optimisation involving every part of the AIC, ADC, and APISIX data 
plane. I shall endeavour to explain it briefly.
   
   > When AIC performs its initial synchronisation, the cache does not yet 
exist. ADC will attempt to establish the cache by fetching the configuration 
from any APISIX Admin API. The key modification introduced in APISIX 3.14 will 
prove significant here: the APISIX Admin API will return a timestamp indicating 
when it last received a configuration update. ADC will retrieve the latest 
configuration from this point (this is to minimise performance fluctuations in 
the data plane, which I shall explain shortly).
   >
   > For example, if you have several APISIX instances that have been running 
for some time, the ADC will endeavour to obtain the latest configuration from 
them. Should there be newly started APISIX instances among them, these will 
only be considered as backups; configurations already applied in the APISIX 
instances with the latest configuration will take precedence. Conversely, if 
all your APISIX instances are brand new, the cache will not be established from 
this point, and the ADC will perform a full rebuild.
   >
   > Assuming ADC has successfully established the cache from the latest 
configuration, it then performs a differential check between the latest 
gateway configuration that AIC derives from the Kubernetes cluster and the 
cached configuration. This produces a set of differences, and ADC applies 
these patches sequentially to the cached configuration. The resulting updated 
configuration is then pushed to the relevant APISIX instances.
   > The latest configuration then replaces the previously cached version. Any 
subsequent change is always compared against the cache, eliminating the need 
to fetch the configuration from APISIX again.
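
   As a hedged sketch of the cache-establishment step described above (the 
endpoint path, the `last_update_ts` field name, and the response shape are my 
assumptions for illustration, not the documented Admin API):

   ```typescript
   // Illustrative sketch; endpoint and response shape are assumptions.
   interface InstanceSnapshot {
     endpoint: string;
     lastUpdateTs: number; // 0 for a brand-new instance with no configuration yet
     config: Record<string, unknown>;
   }

   // Hypothetical: ask one instance for its configuration and the timestamp of
   // its last configuration update (the APISIX 3.14 behaviour mentioned above).
   async function inspectInstance(endpoint: string): Promise<InstanceSnapshot> {
     const res = await fetch(`${endpoint}/apisix/admin/configs`);
     const body = await res.json();
     return { endpoint, lastUpdateTs: body?.last_update_ts ?? 0, config: body ?? {} };
   }

   // Seed the cache from the instance reporting the newest configuration; if
   // every instance is brand new, return undefined and fall back to a full rebuild.
   async function establishCache(
     endpoints: string[],
   ): Promise<Record<string, unknown> | undefined> {
     const snapshots = await Promise.all(endpoints.map(inspectInstance));
     const newest = snapshots.reduce((a, b) => (b.lastUpdateTs > a.lastUpdateTs ? b : a));
     return newest.lastUpdateTs > 0 ? newest.config : undefined;
   }
   ```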
   
   3. ADC server mode:
   
   ADC is developed in TypeScript and runs on Node.js, so it cannot be called 
directly from the Go code that AIC is written in; we must therefore use other 
means of interaction.
   Initially, ADC was invoked in CLI mode: each time AIC synchronised, a new 
process was launched to execute the synchronisation and then exited. This was 
the implementation approach in the earlier rc versions.
   
   We observed high CPU utilisation. Upon examining the performance profile 
(via flame graphs), we discovered that every time Node.js was launched to run 
the ADC logic, it had to repeat the entire code loading, parsing, and start-up 
process, which is CPU-intensive. Furthermore, we recognised that to leverage 
the V8 VM's highly optimised JIT, we needed to keep ADC running for extended 
periods whenever possible.
   
   We therefore decided to build a server mode for ADC. It listens on a Unix 
socket, and AIC invokes the API exposed on it. This ensures that code parsing 
occurs only once, at ADC server startup, while subsequent operations benefit 
from the V8 JIT's ongoing optimisation. It is designed specifically for AIC 
and is only used in conjunction with AIC. This is the purpose of the 
adc-server container you now see in rc5.
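
   A minimal sketch of what "server mode on a Unix socket" means in practice 
(the socket path, route, and payload here are illustrative only and do not 
reflect the real adc-server API):

   ```typescript
   // Minimal sketch: a long-lived Node.js process listening on a Unix domain
   // socket, so code loading/parsing happens once and the V8 JIT stays warm.
   // Socket path, route, and payload are illustrative, not the real API.
   import { createServer } from 'node:http';

   const SOCKET_PATH = '/tmp/adc.sock'; // hypothetical path

   const server = createServer(async (req, res) => {
     if (req.method === 'PUT' && req.url === '/sync') {
       let body = '';
       for await (const chunk of req) body += chunk;
       // ... run the same diff-and-push logic the CLI would, but in-process ...
       res.writeHead(200, { 'Content-Type': 'application/json' });
       res.end(JSON.stringify({ status: 'ok', receivedBytes: body.length }));
       return;
     }
     res.writeHead(404);
     res.end();
   });

   // AIC (the Go side) dials this socket for every synchronisation instead of
   // spawning a new Node.js process each time.
   server.listen(SOCKET_PATH, () => {
     console.log(`adc-server (sketch) listening on ${SOCKET_PATH}`);
   });
   ```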
   
   The major improvements to the ADC have already been discussed above (though 
I have omitted some of the finer details). Next, I shall address the 
enhancements made to the APISIX data plane.
   
   1. How does APISIX load configurations pushed by ADC?
   
   When a configuration generated by ADC is sent to APISIX, it undergoes basic 
checks, such as verifying that the payload is valid JSON and that no part of 
the configuration would roll back to an older version.
   Once these checks pass, APISIX replaces its currently running configuration 
entirely, applying the change to each Nginx worker. It then rebuilds the 
routing tree and the LRU caches as required, after which the updated 
configuration takes effect on the APISIX data plane.
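
   Expressed in TypeScript purely for illustration (APISIX itself is written 
in Lua, and the `conf_version` field name and version semantics below are my 
simplifying assumptions), the acceptance checks look roughly like this:

   ```typescript
   // Conceptual sketch of the checks; not APISIX source code.
   interface PushedConfig {
     conf_version?: number; // assumed name for the version carried with the push
     routes?: unknown[];
     services?: unknown[];
     upstreams?: unknown[];
   }

   function acceptPushedConfig(rawBody: string, currentVersion: number): PushedConfig {
     // 1. The payload must be valid JSON.
     let parsed: PushedConfig;
     try {
       parsed = JSON.parse(rawBody);
     } catch {
       throw new Error('rejected: body is not valid JSON');
     }

     // 2. The pushed configuration must not roll back to an older version.
     if ((parsed.conf_version ?? 0) < currentVersion) {
       throw new Error('rejected: configuration version rollback');
     }

     // On success, the running configuration is replaced entirely, the change
     // is applied to each Nginx worker, and the routing tree / LRU caches are
     // rebuilt only as required.
     return parsed;
   }
   ```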
   
   2. What optimisations have we implemented?
   
   I mentioned routing-tree and LRU-cache rebuilding earlier; these are 
arguably the most crucial parts of the optimisation.
   
   APISIX uses a radix tree to implement its efficient URI-based routing. As 
the name suggests, it is a tree-based data structure. Whenever routes change, 
or other configuration that may affect routing changes, the routing tree is 
rebuilt, and this rebuild is a CPU-intensive task.
   
   Broadly, rebuilds are triggered by changes to routes and services. More 
specifically, APISIX employs a dual versioning system to track configuration 
changes: an overall resource-type-level version and individual resource-level 
versions (e.g. per route). In etcd mode, etcd's modifiedIndex serves as the 
version identifier, but standalone mode lacks this capability.
   Thus, in the traditional file-based standalone mode, updating the 
configuration discards all version numbers. This invalidates the routing tree 
and every LRU cache, potentially causing transient performance fluctuations. 
Given that configurations in this mode are essentially static, the issue is 
minor and acceptable.
   This coarse-grained approach is unsuitable for API-driven standalone 
configurations, as configuration changes within Kubernetes can occur with 
considerable frequency. For instance, scaling adjustments to a deployment may 
alter the Service endpoint, necessitating immediate updates to APISIX's 
upstream configuration. Such modifications are commonplace in Kubernetes 
environments, rendering the complete invalidation of the entire cache structure 
with each change an untenable burden.
   
   Therefore, ADC uses its caching mechanism to maintain a timestamp-based 
version numbering system. When a resource is modified, ADC updates only the 
version numbers in its cache that pertain to that resource. APISIX has 
likewise been updated to accommodate this pattern, so that only the caches 
associated with resources in the modified version range are invalidated. If a 
change does not affect routing at all, the routing tree is left untouched and 
is not rebuilt.
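
   A small sketch of the timestamp-as-version idea (all names are 
hypothetical, not the ADC or APISIX internals):

   ```typescript
   // Illustrative sketch; names are hypothetical, not the ADC/APISIX internals.
   interface VersionedResource<T> {
     value: T;
     version: number; // a timestamp-based version in standalone mode
   }

   type ResourceMap<T> = Map<string, VersionedResource<T>>;

   // Only the resource that actually changed receives a new version; every
   // other resource keeps its old version, so caches can be invalidated selectively.
   function updateResource<T>(
     resources: ResourceMap<T>,
     id: string,
     value: T,
     now: number = Date.now(),
   ): void {
     resources.set(id, { value, version: now });
   }

   // APISIX-side idea (again, only a sketch): a cache entry whose resource
   // version is outside the modified range is left untouched; if nothing that
   // affects routing changed, the routing tree is not rebuilt at all.
   function needsInvalidation(entryVersion: number, modifiedSince: number): boolean {
     return entryVersion >= modifiedSince;
   }
   ```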
   
   Additionally, ADC previously always inlined the upstream within the 
service, so every upstream update triggered a service update and changed the 
service's version number. Because of how APISIX handles service changes 
internally, a service change may in turn cause the routing tree to be rebuilt. 
We have optimised this behaviour: during synchronisation, ADC separates the 
upstream from the service and treats it as a distinct upstream resource, and 
the service references this upstream via its upstream ID. As a result, changes 
to upstream nodes (Pod IPs) no longer necessitate a rebuild of the routing 
tree.
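
   A simplified before/after illustration (object shapes trimmed to the 
relevant fields; the field names follow common APISIX conventions such as 
`nodes` and `upstream_id`, everything else is illustrative):

   ```typescript
   // Before: the upstream is inlined in the service, so every Pod IP change
   // bumps the service version and can trigger a routing-tree rebuild.
   const serviceWithInlineUpstream = {
     id: 'svc-a',
     upstream: {
       type: 'roundrobin',
       nodes: { '10.0.0.1:8080': 1, '10.0.0.2:8080': 1 },
     },
   };

   // After: the upstream is a separate resource and the service only references
   // it by ID, so endpoint churn touches the upstream resource alone.
   const upstreamA = {
     id: 'upstream-a',
     type: 'roundrobin',
     nodes: { '10.0.0.1:8080': 1, '10.0.0.2:8080': 1 },
   };

   const serviceReferencingUpstream = {
     id: 'svc-a',
     upstream_id: 'upstream-a',
   };
   ```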
   
   
   
   The above explains the significant improvements we have implemented; 
judging by the above, your concerns appear to be unfounded.
   Our team is also conducting performance testing to ensure acceptable 
performance in common scenarios. Should we encounter any unexpected performance 
degradation, we will investigate and rectify it. This research is scheduled to 
conclude prior to the 2.0 General Availability release – indeed, we are 
currently undertaking this work.
   Should you encounter specific performance-related issues, please provide a 
reproducible scenario and your configuration details. We will examine it and, 
where a problem exists, resolve it.
   
   ---
   
   **[1]** ADC adopts a design concept borrowed from LLVM: it is divided into 
frontend, intermediate representation, and backend components. Users feed 
YAML or JSON configuration into ADC; after processing, the chosen backend 
issues a batch of API calls to push the configuration to the target system, 
i.e. APISIX (etcd-based) or APISIX in API-driven standalone mode. This allows 
us to achieve modularisation and layering within the project.

