x42005e1f commented on issue #50185: URL: https://github.com/apache/airflow/issues/50185#issuecomment-2928285691
By the way, below is an excerpt from my unwritten section for the aiologic documentation, which illustrates just how expensive context switching can actually be. It does not show the whole picture, but it may be useful for understanding the seldom-mentioned problems that asynchronous programming has, especially in [lock-free and wait-free](https://concurrencyfreaks.blogspot.com/2013/05/lock-free-and-wait-free-definition-and.html) contexts.

<details>
<summary>challenges.md</summary>

Challenges
==========

Bridging between concurrency libraries is not the only thing that aiologic was designed to do. The purpose of this section is to show some of the problems considered in its design, in the hope that the interested reader will be able to make the best use of this library by clearly understanding its ideas.

A world full of squares
-----------------------

How much time are you willing to spend to get all your threads up and running? This may seem like a strange question, but it is not as simple as it appears at first glance. Living in the world of data, we tend to think about the time complexity of the algorithms we know. But what about the asynchronous world? We are used to treating it as a black box, forgetting that it is built on the same algorithms, albeit at a level that is not always visible to us. And that comes at a price.

Suppose we want to launch N threads to perform some long-running work. Whether it is for parallel processing of NumPy arrays, for network operations, or for simulating game processes - it does not matter. Here is an example that models our task:

```python
import threading
import time

N = ...

stopped = False

def work(i):
    global stopped

    if i == N - 1:  # the last thread
        stopped = True  # stop the work

    while not stopped:
        time.sleep(0)  # do some work

for i in range(N):
    threading.Thread(target=work, args=[i]).start()
```

In this example, we run the work in separate threads until the last thread starts. Let's see what happens if we set different N's.
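To make the measurement concrete, here is a minimal sketch of how such timings can be collected. This harness is my illustration, not the exact code behind the numbers that follow; `measure_startup` and the use of `time.perf_counter` are assumptions:

```python
import threading
import time

def measure_startup(n):
    """Time how long it takes until all n worker threads have been started."""
    stopped = False

    def work(i):
        nonlocal stopped
        if i == n - 1:  # the last thread has started
            stopped = True  # stop the work
        while not stopped:
            time.sleep(0)  # busy-wait, yielding the CPU to other threads

    threads = []
    start = time.perf_counter()
    for i in range(n):
        t = threading.Thread(target=work, args=[i])
        t.start()
        threads.append(t)
    elapsed = time.perf_counter() - start

    for t in threads:  # clean up: all workers exit once stopped is True
        t.join()
    return elapsed

if __name__ == "__main__":
    for n in (50, 100, 200):
        print(f"N={n}: {measure_startup(n):.2f} seconds")
```

Absolute numbers will differ between machines and Python versions; what matters is how the elapsed time grows as N increases.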
* N=100: 0.17 seconds
* N=200: 0.55 seconds
* N=300: 1.19 seconds
* N=400: 2.14 seconds
* N=500: 3.14 seconds
* N=600: 4.69 seconds
* N=700: 6.38 seconds
* N=800: 8.11 seconds
* N=900: 10.48 seconds
* N=1000: 12.95 seconds

Whoa! Increasing the number of threads by only 10 times increased the time by more than 70 times! We can clearly see that the dependence of execution time on the number of threads is not linear - in fact, it is quadratic. Why does this happen?

Starting a thread in Python involves two main operations:

1. Asking the operating system to start a thread.
2. Waiting for that thread to start (e.g. to detect memory leaks).

Starting each new thread forces the main thread to perform a context switch. However, the operating system needs to emulate the concurrent execution of all threads, so the picture will usually not look like ping-pong between the main thread and the newly created thread - the scheduler also needs to give CPU time to the already running threads. With a fair scheduling strategy, that might look something like this:

1. main thread
2. thread 1 -> main thread
3. thread 1 -> thread 2 -> main thread
4. thread 1 -> thread 2 -> thread 3 -> main thread
5. ...

With each new thread, the number of context switches required to start the next one increases. We see a triangle, which becomes a *square* when the constant is discarded - that is where the quadratic complexity comes from!

</details>

Again, this is not specific to Apache Airflow, but as learning material it may have some value - just a footnote on how non-trivial the impact of approaches that rely on context switching can be.
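As a footnote to the excerpt above: the triangle it describes is just the sum 1 + 2 + ... + N, whose closed form N(N+1)/2 is quadratic in N. A quick sketch of this idealized model (the switch counts below follow the fair-scheduling picture in the excerpt, not measured values):

```python
def switches_to_start_all(n):
    """Total context switches in the idealized fair-scheduling model:
    starting thread k requires k switches (threads 1..k-1, then main)."""
    return sum(range(1, n + 1))

# The closed form is the triangular number n * (n + 1) / 2 - quadratic in n.
assert switches_to_start_all(1000) == 1000 * 1001 // 2

# Doubling n roughly quadruples the total, as expected for O(n^2) growth.
print(switches_to_start_all(500))   # 125250
print(switches_to_start_all(1000))  # 500500
```

This matches the shape of the measurements: going from N=100 to N=1000 multiplies the predicted switch count by roughly 100, while the observed time grew by roughly 76, the gap being constant overheads that dominate at small N.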
