Thanks a bundle, Jesse!

On Wed, Dec 16, 2015 at 4:24 PM, A. Jesse Jiryu Davis <je...@emptysquare.net> wrote:
> Committed here:
>
> https://github.com/python/asyncio/commit/39c135baf73762830148236da622787052efba19
>
> Yury, would you please update the CPython repo when the time is right?
>
> On Wednesday, December 9, 2015 at 6:05:46 PM UTC-5, A. Jesse Jiryu Davis wrote:
>>
>> Done:
>>
>> https://github.com/python/asyncio/pull/302
>>
>> On Wednesday, December 9, 2015 at 2:58:07 PM UTC-5, Guido van Rossum wrote:
>>>
>>> It'll go much quicker if you send a PR to the asyncio github project. Thanks!
>>>
>>> --Guido (mobile)
>>>
>>> On Dec 9, 2015 11:56 AM, "A. Jesse Jiryu Davis" <je...@emptysquare.net> wrote:
>>>
>>>> Thanks for your kind words about the crawler chapter. =) I think we didn't hit this problem there because the crawler only looked up one domain, or a few of them. On Mac, subsequent calls are cached at the OS layer for a few minutes, so getaddrinfo is very fast, even though it's serialized by the getaddrinfo lock. Besides, I don't think the crawler has a connection timeout, so if it *were* looking up hundreds of domains it would wait as long as necessary for the lock.
>>>>
>>>> In this bug report, on the other hand, there are hundreds of coroutines all waiting to look up different domains, and some of those domains take several seconds to resolve. Resolving hundreds of different domains, plus the getaddrinfo lock, plus Motor's 20-second connection timeout, all combine to create this bad behavior.
>>>>
>>>> I like option #4 also; that gives me the freedom to either cache lookups in Motor, or to treat "localhost" specially, or at least to prevent a slow getaddrinfo from appearing to be a connection timeout. I haven't decided which is the best approach, but #4 allows me flexibility. Would you like me to write the patch or will you?
>>>>
>>>> I'm curious about #5 too, but I think that's a longer-term project.
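The numeric-address check behind option #4 (defined in the quoted message below) can be sketched roughly as follows. This is an illustration, not the committed asyncio patch: the helper name `numeric_addr_info` and the SOCK_STREAM/TCP defaults are assumptions.

```python
import socket

# Rough sketch of option #4: if `host` is already a numeric IPv4/IPv6
# address, synthesize the tuple socket.getaddrinfo() would return and
# skip the real call (and the lock that serializes it).
def numeric_addr_info(host, port,
                      type=socket.SOCK_STREAM, proto=socket.IPPROTO_TCP):
    for af in (socket.AF_INET, socket.AF_INET6):
        try:
            socket.inet_pton(af, host)  # raises if not numeric for this family
        except OSError:
            continue
        sockaddr = (host, port) if af == socket.AF_INET else (host, port, 0, 0)
        return [(af, type, proto, '', sockaddr)]
    return None  # not numeric; caller falls back to socket.getaddrinfo()
```

A caller would try this first and only take the slow, lock-serialized path on a `None` result.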
>>>> On Wednesday, December 9, 2015 at 12:57:53 PM UTC-5, Guido van Rossum wrote:
>>>>>
>>>>> 4. Modify asyncio's getaddrinfo() so that if you pass it something that looks like a numerical address (IPv4 or IPv6 syntax, looking exactly like what getaddrinfo() returns) it skips calling socket.getaddrinfo(). I've wanted this for a while but hadn't run into this queue issue, so this is my favorite.
>>>>>
>>>>> 5. Do the research needed to prove that socket.getaddrinfo() on OS X (or perhaps on sufficiently recent versions of OS X) is thread-safe, and submit a patch that avoids the getaddrinfo lock on those versions of OS X.
>>>>>
>>>>> FWIW, IIRC my crawler example (which your editing made so much better!) calls getaddrinfo() for every connection and I hadn't experienced any slowness there. But maybe in the Motor example you're hitting a slow DNS server? (I'm guessing there are lots of system configuration parameters that may make the system's getaddrinfo() slower or faster, and in your setup it may well be slower.)
>>>>>
>>>>> On Wed, Dec 9, 2015 at 7:05 AM, A. Jesse Jiryu Davis <je...@emptysquare.net> wrote:
>>>>>
>>>>>> Thanks Guido, this all makes sense.
>>>>>>
>>>>>> One problem, though, is that even if I call getaddrinfo myself in Motor and wrap a cache around it, I *still* can't use the event loop's create_connection() call. create_connection always calls getaddrinfo, and even though it "should be quick", that doesn't matter in this scenario: the time is spent waiting for the getaddrinfo lock. So I would need one of:
>>>>>>
>>>>>> 1. Copy the body of create_connection into Motor so I can customize the getaddrinfo call. I especially don't like this because I'd have to call the loop's private _create_connection_transport() from Motor's customized create_connection(). Perhaps we could make _create_connection_transport public, or otherwise make a public API for separating getaddrinfo from actually establishing the connection?
>>>>>>
>>>>>> 2. Make getaddrinfo customizable in asyncio (https://github.com/python/asyncio/issues/160). This isn't ideal, since it requires Motor users on Mac / BSD to change configuration for the whole event loop just so Motor's specific create_connection calls behave correctly.
>>>>>>
>>>>>> 3. Back to the original proposal: add a connection timeout parameter to create_connection. =)
>>>>>>
>>>>>> On Tuesday, December 8, 2015 at 4:30:04 PM UTC-5, Guido van Rossum wrote:
>>>>>>>
>>>>>>> On Tue, Dec 8, 2015 at 7:13 AM, A. Jesse Jiryu Davis <je...@emptysquare.net> wrote:
>>>>>>>
>>>>>>>> Hi, a Motor user began an interesting discussion on the MongoDB-user list:
>>>>>>>>
>>>>>>>> https://groups.google.com/d/topic/mongodb-user/2oK6C3BrVKI/discussion
>>>>>>>>
>>>>>>>> The summary is this: he's fetching hundreds of URLs concurrently and inserting the results into MongoDB with Motor. Motor throws lots of connection-timeout errors. The problem is getaddrinfo: on Mac, Python only allows one getaddrinfo call at a time. With hundreds of HTTP fetches in progress, there's a long queue waiting for the getaddrinfo lock. Whenever Motor wants to grow its connection pool it has to call getaddrinfo on "localhost", and it spends so long waiting for that call, it times out and thinks it can't reach MongoDB.
>>>>>>>
>>>>>>> If it's really looking up "localhost" over and over, maybe wrap a cache around getaddrinfo()?
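The "wrap a cache around getaddrinfo()" idea might look something like this sketch (in modern async/await syntax rather than the era's `yield from`). The module-level dict and the fixed TTL are assumptions; real DNS records carry their own TTLs, which this ignores.

```python
import asyncio
import socket

# Illustrative cache around the event loop's getaddrinfo(). A fresh
# cache hit skips socket.getaddrinfo() -- and the lock serializing it.
_addr_cache = {}

async def cached_getaddrinfo(host, port, *, ttl=60.0,
                             family=0, type=socket.SOCK_STREAM, proto=0):
    loop = asyncio.get_running_loop()
    key = (host, port, family, type, proto)
    now = loop.time()
    hit = _addr_cache.get(key)
    if hit is not None and now - hit[0] < ttl:
        return hit[1]  # fresh cached result: no lock, no resolver
    result = await loop.getaddrinfo(host, port, family=family,
                                    type=type, proto=proto)
    _addr_cache[key] = (now, result)
    return result
```

As the follow-up message notes, this alone doesn't help Motor, because create_connection() still calls getaddrinfo internally.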
>>>>>>>> Motor's connection-timeout implementation in asyncio is sort of wrong:
>>>>>>>>
>>>>>>>>     coro = asyncio.open_connection(host, port)
>>>>>>>>     sock = yield from asyncio.wait_for(coro, timeout)
>>>>>>>>
>>>>>>>> The timer runs during the call to getaddrinfo, as well as the call to the loop's sock_connect(). This isn't the intention: the timeout should apply only to the connection.
>>>>>>>>
>>>>>>>> A philosophical digression: the "connection timeout" is a heuristic. "If I've waited N seconds and haven't established the connection, I probably never will. Give up." Based on what they know about their own networks, users can tweak the connection timeout. In a fast network, a server that hasn't responded in 20ms is probably down; but on a global network, 10 seconds might be reasonable. Regardless, the heuristic only applies to the actual TCP connection. Waiting for getaddrinfo is not related; that's up to the operating system.
>>>>>>>>
>>>>>>>> In a multithreaded client like PyMongo we distinguish the two phases:
>>>>>>>>
>>>>>>>>     for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
>>>>>>>>         af, socktype, proto, dummy, sa = res
>>>>>>>>         sock = socket.socket(af, socktype, proto)
>>>>>>>>         try:
>>>>>>>>             sock.settimeout(connect_timeout)
>>>>>>>>
>>>>>>>>             # THE TIMEOUT ONLY APPLIES HERE.
>>>>>>>>             sock.connect(sa)
>>>>>>>>             sock.settimeout(None)
>>>>>>>>             return sock
>>>>>>>>         except socket.error as e:
>>>>>>>>             # Connection refused, or not established within the timeout.
>>>>>>>>             sock.close()
>>>>>>>>
>>>>>>>> Here, the call to getaddrinfo isn't timed at all, and each distinct attempt to connect on a different address is timed separately. So this kind of code matches the idea of a "connect timeout" as a heuristic for deciding whether the server is down.
>>>>>>>>
>>>>>>>> Two questions:
>>>>>>>>
>>>>>>>> 1. Should asyncio.open_connection support a connection timeout that acts like the blocking version above? That is, a connection timeout that does not include getaddrinfo, and restarts for each address we attempt to connect to?
>>>>>>>
>>>>>>> Hm, I don't really like adding timeouts to every API. As you describe, everyone has different needs. IMO if you don't want the timeout to cover the getaddrinfo() call, call getaddrinfo() yourself and pass the host address into the create_connection() call. That way you also have control over whether to e.g. implement "happy eyeballs". (It will still call socket.getaddrinfo(), but it should be quick -- it's not going to a DNS server or even /etc/hosts to discover that 127.0.0.1 maps to 127.0.0.1.)
>>>>>>>
>>>>>>>> 2. Why does Python lock around getaddrinfo on Mac and Windows anyway? The code comment says these are "systems on which getaddrinfo() is believed to not be thread-safe". Has this belief ever been confirmed?
>>>>>>>>
>>>>>>>> https://hg.python.org/cpython/file/d2b8354e87f5/Modules/socketmodule.c#l185
>>>>>>>
>>>>>>> I don't know -- the list of ifdefs seems to indicate this is a generic BSD issue, which is OS X's heritage. Maybe someone can do an experiment, or review the source code used by Apple (if it's still open source)? While I agree that if this really isn't an issue we shouldn't bother with the lock, I'd also much rather be safe than sorry when it comes to races in core Python.
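In the spirit of "resolve first, then pass the address in", a connect timeout that excludes resolution and restarts per address could be sketched like this (async/await syntax; the function name is invented, and as Guido notes, open_connection() on a numeric host will still do a quick internal getaddrinfo):

```python
import asyncio
import socket

# Hypothetical sketch, not asyncio's API: resolve first, untimed, then
# apply `connect_timeout` separately to each address attempt, mirroring
# the blocking PyMongo loop above.
async def open_with_connect_timeout(host, port, connect_timeout):
    loop = asyncio.get_running_loop()
    infos = await loop.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    last_exc = None
    for family, _type, _proto, _canon, sockaddr in infos:
        try:
            # The timeout covers only this one attempt; like
            # sock.settimeout(), it restarts for the next address.
            return await asyncio.wait_for(
                asyncio.open_connection(sockaddr[0], sockaddr[1]),
                connect_timeout)
        except (OSError, asyncio.TimeoutError) as exc:
            last_exc = exc
    raise last_exc if last_exc else OSError('no addresses for %r' % host)
```

This keeps the "connect timeout" heuristic pointed at the TCP handshake alone, leaving slow resolution (or the getaddrinfo lock) out of the measurement.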
>>>>>>>
>>>>>>> --
>>>>>>> --Guido van Rossum (python.org/~guido)

--
--Guido van Rossum (python.org/~guido)