[GitHub] [airflow] dimberman commented on issue #5788: [POC] multi-threading using asyncio

2020-01-02 Thread GitBox
dimberman commented on issue #5788: [POC] multi-threading using asyncio
URL: https://github.com/apache/airflow/pull/5788#issuecomment-570361064
 
 
   @a
   
   > > @ash recently made some changes to the LocalExecutor to use os.fork instead of processes, this makes the multithreading in the LocalExecutor much faster and lighter weight.
   > 
   > That's curious. AFAIK, `os.fork` creates a child process, not a thread. In Python, the GIL effectively makes all threads run with preemptive scheduling as though they were running in a single thread, so for CPU-bound code the performance gains are almost none. Sometimes threading can even hurt performance, because the OS threads start thrashing to get hold of the GIL. The threads do not run in parallel because of the GIL, so parallel performance requires multiprocessing with a process pool. I recently delved into a good article on this topic, with a lot of good references for more details:
   > 
   > * https://realpython.com/async-io-python
   > * https://realpython.com/python-gil/
   > * plus links to talks by David Beazley, the Gilectomy, etc.
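As a rough illustration of the distinction drawn above, here is a minimal POSIX-only sketch. A thread lives inside its parent process and sees the same PID. `os.fork` creates a separate child process with its own PID, its own copy of memory, and its own interpreter and GIL. This is not Airflow's LocalExecutor code, and the helper names `pid_seen_by_thread` and `run_in_forked_child` are hypothetical.

```python
import os
import threading


def pid_seen_by_thread() -> int:
    """Start a thread and return the PID it observes (threads share their parent's PID)."""
    seen: list[int] = []
    t = threading.Thread(target=lambda: seen.append(os.getpid()))
    t.start()
    t.join()
    return seen[0]


def run_in_forked_child(counter: list[int]) -> tuple[int, int, int]:
    """Fork a child that appends to its *copy* of `counter` and reports back over a pipe.

    Returns (child_pid, len(counter) as seen by the child, child exit code).
    POSIX only: os.fork does not exist on Windows.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # in the child process
        exit_code = 1
        try:
            os.close(read_fd)
            counter.append(1)  # modifies the child's copy only; the parent's list is untouched
            os.write(write_fd, f"{os.getpid()} {len(counter)}".encode())
            exit_code = 0
        finally:
            os._exit(exit_code)  # never fall through into the parent's code path
    # in the parent process
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        payload = reader.read()  # reads until EOF, i.e. until the child's write end is closed
    _, status = os.waitpid(pid, 0)
    child_pid, child_len = (int(x) for x in payload.split())
    return child_pid, child_len, os.WEXITSTATUS(status)


if __name__ == "__main__":
    print("parent pid:", os.getpid(), "thread sees pid:", pid_seen_by_thread())
    shared: list[int] = []
    print("child (pid, len, exit):", run_in_forked_child(shared), "parent list:", shared)
```

Because the child works on a copy of the parent's memory, results have to come back explicitly (here via a pipe and the exit status), which is the usual trade-off of process-based parallelism versus threads.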
   
   @ashb, perhaps you could speak to this better than I can?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] dimberman commented on issue #5788: [POC] multi-threading using asyncio

2019-12-30 Thread GitBox
dimberman commented on issue #5788: [POC] multi-threading using asyncio
URL: https://github.com/apache/airflow/pull/5788#issuecomment-569744458
 
 
   Is this necessary?
   
   Asyncio doesn’t do multi-threading. It uses a single thread and switches which coroutine runs on that thread based on readiness (i.e. which coroutines are no longer waiting on I/O or a timer). This is actually a pretty important distinction when it comes to parallel processing in Python. @ash recently made some changes to the LocalExecutor to use os.fork instead of processes; this makes the multithreading in the LocalExecutor much faster and lighter weight.
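A minimal sketch of the single-thread behaviour described above, using only the standard library: two coroutines interleave at their `await` points, yet every step runs on the same thread. The `worker`/`main` names and the delays are illustrative choices, not anything from Airflow.

```python
import asyncio
import threading


async def worker(name: str, delay: float, log: list[tuple[str, int]]) -> None:
    """Record two steps, yielding control to the event loop between them."""
    for step in range(2):
        log.append((f"{name}{step}", threading.get_ident()))
        await asyncio.sleep(delay)  # suspend; the loop resumes whichever coroutine is ready


async def main() -> list[tuple[str, int]]:
    log: list[tuple[str, int]] = []
    # Different delays make the interleaving deterministic: a0, b0, a1, b1.
    await asyncio.gather(worker("a", 0.01, log), worker("b", 0.03, log))
    return log


if __name__ == "__main__":
    events = asyncio.run(main())
    print([name for name, _ in events])                           # ['a0', 'b0', 'a1', 'b1']
    print({tid for _, tid in events} == {threading.get_ident()})  # True
```

The steps interleave, but there is only one thread ID in the log: concurrency comes from switching between coroutines, not from parallel execution.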
   
   I don’t think there’s really that much blocking in our operators outside of specific ones that make SQL calls, etc. I don’t think this would be more performant than the current LocalExecutor, but I would be glad to discuss any potential benefits I’m not seeing, or any speed comparisons you’ve done.
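One practical consequence of the point about blocking calls: in an asyncio-based executor, any synchronous call made inside a coroutine (such as a blocking SQL call in an operator) stalls every other coroutine on that event loop until it returns. The minimal sketch below assumes a hypothetical `blocking_query` that stands in for such a call. It compares running the call directly against handing it to a worker thread with `asyncio.to_thread` (Python 3.9+), counting how often a heartbeat coroutine gets to run meanwhile.

```python
import asyncio
import time


def blocking_query(seconds: float) -> str:
    """Hypothetical stand-in for a synchronous call, e.g. a DB-API query."""
    time.sleep(seconds)
    return "rows"


async def heartbeat(ticks: list[float], stop: asyncio.Event) -> None:
    """Tick every 50 ms for as long as the event loop lets it run."""
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.05)


async def ticks_during_query(offload: bool) -> int:
    """Count heartbeat ticks that happen while a 0.3 s blocking query runs."""
    ticks: list[float] = []
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, stop))
    await asyncio.sleep(0)  # let the heartbeat start
    before = len(ticks)
    if offload:
        await asyncio.to_thread(blocking_query, 0.3)  # runs in a worker thread
    else:
        blocking_query(0.3)  # runs on the event-loop thread and freezes it
    during = len(ticks) - before
    stop.set()
    await hb
    return during


if __name__ == "__main__":
    print("blocking call: ", asyncio.run(ticks_during_query(offload=False)))  # 0
    print("to_thread call:", asyncio.run(ticks_during_query(offload=True)))   # several; timing-dependent
```

Offloading to a thread only helps because `time.sleep` (like most blocking I/O) releases the GIL. CPU-bound work handed to `to_thread` would still contend for the GIL.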
   
   On Sun, Dec 29, 2019 at 5:56 PM, Darren Weber wrote:
   Tagging this as related to:
   
   * https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-28%3A+Add+AsyncExecutor+option
   * https://issues.apache.org/jira/browse/AIRFLOW-6395
   

