Superskyyy opened a new issue, #10408: URL: https://github.com/apache/skywalking/issues/10408
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/skywalking/issues?q=is%3Aissue) and found no similar feature requirement.

### Description

Currently, our Python agent implements its data reporters with the `threading` module. As the agent has grown into a fully capable one, it requires more resources than when only tracing was supported (we start many threads, and gRPC itself creates even more threads when streaming). Casual testing, using WRK to benchmark FastAPI and Uvicorn with the agent attached, shows that the agent considerably reduces throughput.

Background: In Python, the Global Interpreter Lock (GIL), at least before Python 3.12 (there is hope that the GIL will be removed in the somewhat near future), ensures that **at any given time only one thread can execute Python code**, with the exception of time spent in I/O and in C libraries. Since our data reporting is mostly I/O bound, threads will not block each other, but they waste a lot of work switching between threads to check whether their I/O tasks have completed.

*Asyncio*: asyncio is a built-in Python library that provides cooperative multitasking through async/await coroutines. Each of the protocols we use (gRPC, HTTP, Kafka) has mature support for asyncio-based clients. This will eliminate the thread-switching cost within the agent scope and give us finer control over I/O waits.

*Uvloop*: uvloop is also a mature library; Uvicorn uses it as a drop-in replacement for the built-in Python event loop. It can provide a 2-4x speedup over the native event loop.

The plan is to deprecate the current data reporters (Trace/Log/Meter) or provide an alternative implementation of them, and maybe do the same for the profilers. The alternative implementation should also work for gRPC/HTTP/Kafka using the corresponding async clients.
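As a rough illustration of what an asyncio-based reporter could look like, here is a minimal sketch using only the standard library. Everything in it is an assumption for discussion, not existing agent code: `AsyncReporter`, `fake_send`, and the batch and queue sizes are hypothetical names and values, and the real transport would be one of the async gRPC/HTTP/Kafka clients.

```python
import asyncio


class AsyncReporter:
    """Hypothetical reporter: buffers items in an asyncio.Queue and sends them in batches."""

    def __init__(self, send, max_batch=10, max_queue=1000):
        self._send = send            # async callable taking a list of items (assumed transport)
        self._max_batch = max_batch
        self._max_queue = max_queue
        self._queue = None           # created in start(), inside the running event loop
        self._task = None

    async def start(self):
        self._queue = asyncio.Queue(maxsize=self._max_queue)
        self._task = asyncio.create_task(self._run())

    def report(self, item):
        """Non-blocking enqueue; returns False and drops the item when the queue is full."""
        try:
            self._queue.put_nowait(item)
            return True
        except asyncio.QueueFull:
            return False

    async def _run(self):
        while True:
            # Wait for one item, then drain whatever else is already queued, up to max_batch.
            batch = [await self._queue.get()]
            while len(batch) < self._max_batch and not self._queue.empty():
                batch.append(self._queue.get_nowait())
            try:
                await self._send(batch)
            except Exception:
                pass  # a real reporter would log and possibly retry
            finally:
                for _ in batch:
                    self._queue.task_done()

    async def stop(self):
        await self._queue.join()     # wait until every queued item has been handled
        self._task.cancel()
        try:
            await self._task
        except asyncio.CancelledError:
            pass


async def demo():
    sent = []

    async def fake_send(batch):
        await asyncio.sleep(0)       # stand-in for network I/O
        sent.append(list(batch))

    reporter = AsyncReporter(fake_send, max_batch=3)
    await reporter.start()
    for i in range(7):
        reporter.report(f"segment-{i}")
    await reporter.stop()
    return sent


batches = asyncio.run(demo())
```

A single consumer task replaces the per-reporter threads, so the only switching happens at `await` points. The code does not depend on which event loop runs it, so swapping in uvloop should not require changes here.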
- gRPC: the asyncio API, https://grpc.github.io/grpc/python/grpc_asyncio.html
  To replace: the sync API
- HTTP: aiohttp or HTTPX. I personally prefer aiohttp since it's more mature, but both should be okay and easily swappable (why not try both and see which is better).
  To replace: Requests
- Kafka: Confluent (seems better, yet its asyncio support is a bit shaky), https://www.confluent.io/blog/kafka-python-asyncio-integration/, or aiokafka (may not be that actively maintained), https://github.com/aio-libs/aiokafka
  To replace: kafka-python (plus it's unmaintained)

Important consideration: An event loop cannot survive a fork, so be careful to postpone the agent start if a fork can be predicted (for example, so that gunicorn with Uvicorn workers works properly). Since FastAPI/ASGI-based web frameworks now dominate the Python web stack, and direct fork usage is very rare (the event loop can safely use `ProcessPoolExecutor`), this can happen almost without breaking any user application.

The old reporters should be kept as a fallback for a release or two and then gradually deprecated.

### Use case

Make the I/O overhead that the Python agent introduces to user applications much lower.

### Related issues

There could also be overhead in non-optimized span creation, yet the I/O overhead should be addressed first as the primary target.

### Are you willing to submit a PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
