[GitHub] [airflow] dimberman commented on issue #5788: [POC] multi-threading using asyncio
dimberman commented on issue #5788: [POC] multi-threading using asyncio
URL: https://github.com/apache/airflow/pull/5788#issuecomment-570361064

@a

> > @ash recently made some changes to the LocalExecutor to use os.fork instead of processes, this makes the multithreading in the LocalExecutor much faster and lighter weight.
>
> That's curious. AFAIK, `os.fork` creates a child process, not a thread. In Python, the GIL effectively makes all threads run with preemptive scheduling, as though they were sharing a single thread. The performance gains are almost none, and sometimes threading can hurt performance because the system threads start thrashing to get hold of the GIL. The threads do not run in parallel because of the GIL, so parallel performance requires multiprocessing with a process pool. I recently delved into a good article on this topic, with a lot of good references for more details:
>
> * https://realpython.com/async-io-python
> * https://realpython.com/python-gil/
> * plus links to talks by David Beazley and the gilectomy etc.

@ashb perhaps you could speak to this better than I can?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services
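To make the process-versus-thread point in the quoted reply concrete, here is a minimal sketch (POSIX only) showing that `os.fork` produces a separate process with its own PID. The helper name `pid_seen_in_forked_child` is made up for this illustration; this is not the LocalExecutor code, only a demonstration of what `os.fork` does.

```python
import os


def pid_seen_in_forked_child():
    """Fork once and return (parent_pid, pid_from_fork, pid_reported_by_child).

    Hypothetical helper for illustration; requires a POSIX system.
    """
    read_fd, write_fd = os.pipe()
    parent_pid = os.getpid()

    pid = os.fork()
    if pid == 0:
        # Child process: report our own PID through the pipe, then exit
        # immediately without running any of the parent's cleanup handlers.
        os.close(read_fd)
        try:
            os.write(write_fd, str(os.getpid()).encode())
        finally:
            os._exit(0)

    # Parent process: close the write end so read() sees EOF once the
    # child has exited, then collect the child's report and reap it.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        reported = int(reader.read())
    os.waitpid(pid, 0)
    return parent_pid, pid, reported
```

The child runs in its own interpreter state with its own GIL, which is why forked workers can use several cores while threads inside one process cannot run Python bytecode in parallel.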
[GitHub] [airflow] dimberman commented on issue #5788: [POC] multi-threading using asyncio
dimberman commented on issue #5788: [POC] multi-threading using asyncio
URL: https://github.com/apache/airflow/pull/5788#issuecomment-569744458

Is this necessary? Asyncio doesn't do multi-threading. It uses a single thread and switches which coroutine runs on that thread based on readiness. This is actually a pretty important distinction when it comes to parallel processing in Python.

@ash recently made some changes to the LocalExecutor to use os.fork instead of processes; this makes the multithreading in the LocalExecutor much faster and lighter weight. I don't think there's really that much blocking in our operators, outside of specific ones that are making SQL calls etc.

I don't think this would be more performant than the current LocalExecutor, but I would be glad to discuss potential benefits I'm not seeing, or any speed comparisons you've done.

On Sun, Dec 29, 2019 at 5:56 PM, Darren Weber wrote:

> Tagging this as related to
>
> * https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-28%3A+Add+AsyncExecutor+option
> * https://issues.apache.org/jira/browse/AIRFLOW-6395
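As a small illustration of the single-thread point made in this comment, the sketch below runs two coroutines concurrently and records which OS thread each step executes on. The names `worker` and `run_demo` are hypothetical and chosen only for this example; it says nothing about how the PR itself is structured.

```python
import asyncio
import threading


async def worker(name, delay, events):
    # Record the thread id before and after yielding to the event loop.
    events.append((name, "start", threading.get_ident()))
    await asyncio.sleep(delay)  # hands the thread to whichever coroutine is ready
    events.append((name, "end", threading.get_ident()))


async def main():
    events = []
    # Both workers are in flight at the same time, interleaved on one thread.
    await asyncio.gather(
        worker("a", 0.05, events),
        worker("b", 0.01, events),
    )
    return events


def run_demo():
    """Return the recorded (name, phase, thread_id) tuples."""
    return asyncio.run(main())
```

Every recorded thread id is the same, and "b" finishes before "a" even though "a" started first: the loop switches coroutines when one is waiting, but nothing runs in parallel. CPU-bound work inside a coroutine would still block the others.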