Thanks for your kind words about the crawler chapter. =) I think we didn't 
hit this problem there because the crawler only looked up one domain, or a 
few of them. On Mac, subsequent calls are cached at the OS layer for a few 
minutes, so getaddrinfo is very fast, even though it's serialized by the 
getaddrinfo lock. Besides I don't think the crawler has a connection 
timeout, so if it *were *looking up hundreds of domains it would wait as 
long as necessary for the lock.

In this bug report, on the other hand, there are hundreds of coroutines all 
waiting to look up different domains, and some of those domains take 
several seconds to resolve. Resolving hundreds of different domains, plus 
the getaddrinfo lock, plus Motor's 20-second connection timeout, all 
combine to create this bad behavior.

I like option #4 also; that gives me the freedom to either cache lookups in 
Motor, or to treat "localhost" specially, or at least to prevent a slow 
getaddrinfo from appearing to be a connection timeout. I haven't decided 
which is the best approach, but #4 allows me flexibility. Would you like me 
to write the patch or will you?

I'm curious about #5 too, but I think that's a longer term project.

On Wednesday, December 9, 2015 at 12:57:53 PM UTC-5, Guido van Rossum wrote:
>
> 4. Modify asyncio's getaddrinfo() so that if you pass it something that 
> looks like a numerical address (IPv4 or IPv6 syntax, looking exactly like 
> what getaddrinfo() returns) it skips calling socket.getaddrinfo(). I've 
> wanted this for a while but hadn't run into this queue issue, so this is my 
> favorite.
>
> 5. Do the research needed to prove that socket.getaddrinfo() on OS X (or 
> perhaps on sufficiently recent versions of OS X) is thread-safe and submit 
> a patch that avoids the getaddrinfo lock on those versions of OS X.
>
> FWIW, IIRC my crawler example (which your editing made so much better!) 
> calls getaddrinfo() for every connection and I hadn't experienced any 
> slowness there. But maybe in the Motor example you're hitting a slow DNS 
> server? (I'm guessing there are lots of system configuration parameters 
> that may make the system's getaddrinfo() slower or faster, and in your 
> setup it may well be slower.)
>
> On Wed, Dec 9, 2015 at 7:05 AM, A. Jesse Jiryu Davis <
> je...@emptysquare.net <javascript:>> wrote:
>
>> Thanks Guido, this all makes sense.
>>
>> One problem, though, is that even if I call getaddrinfo myself in Motor 
>> and wrap a cache around it, I *still* can't use the event 
>> loop's create_connection() call. create_connection always 
>> calls getaddrinfo, and even though it "should be quick", that doesn't 
>> matter in this scenario: the time is spent waiting for the getaddrinfo 
>> lock. So I would need one of:
>>
>> 1. Copy the body of create_connection into Motor so I can customize the 
>> getaddrinfo call. I especially don't like this because I'd have to call the 
>> loop's private _create_connection_transport() from Motor's 
>> customized create_connection(). Perhaps we could 
>> make _create_connection_transport public, or otherwise make a public API 
>> for separating getaddrinfo from actually establishing the connection?
>>
>> 2. Make getaddrinfo customizable in asyncio (
>> https://github.com/python/asyncio/issues/160). This isn't ideal, since 
>> it requires Motor users on Mac / BSD to change configuration for the whole 
>> event loop just so Motor's specific create_connection calls behave 
>> correctly.
>>
>> 3. Back to the original proposal: add a connection timeout parameter to 
>> create_connection. =)
>>
>> On Tuesday, December 8, 2015 at 4:30:04 PM UTC-5, Guido van Rossum wrote:
>>
>>> On Tue, Dec 8, 2015 at 7:13 AM, A. Jesse Jiryu Davis <
>>> je...@emptysquare.net> wrote:
>>>
>>>> Hi, a Motor user began an interesting discussion on the MongoDB-user 
>>>> list:
>>>>
>>>> https://groups.google.com/d/topic/mongodb-user/2oK6C3BrVKI/discussion
>>>>
>>>> The summary is this: he's fetching hundreds of URLs concurrently and 
>>>> inserting the results into MongoDB with Motor. Motor throws lots of 
>>>> connection-timeout errors. The problem is getaddrinfo: on Mac, Python only 
>>>> allows one getaddrinfo call at a time. With hundreds of HTTP fetches in 
>>>> progress, there's a long queue waiting for the getaddrinfo lock. Whenever 
>>>> Motor wants to grow its connection pool it has to call getaddrinfo on 
>>>> "localhost", and it spends so long waiting for that call, it times out and 
>>>> thinks it can't reach MongoDB.
>>>>
>>>
>>> If it's really looking up "localhost" over and over, maybe wrap a cache 
>>> around getaddrinfo()?
>>>  
>>>
>>>> Motor's connection-timeout implementation in asyncio is sort of wrong:
>>>>
>>>>     coro = asyncio.open_connection(host, port)
>>>>     sock = yield from asyncio.wait_for(coro, timeout)
>>>>
>>>> The timer runs during the call to getaddrinfo, as well as the call to 
>>>> the loop's sock_connect(). This isn't the intention: the timeout should 
>>>> apply only to the connection.
>>>>
>>>> A philosophical digression: The "connection timeout" is a heuristic. 
>>>> "If I've waited N seconds and haven't established the connection, I 
>>>> probably never will. Give up." Based on what they know about their own 
>>>> networks, users can tweak the connection timeout. In a fast network, a 
>>>> server that hasn't responded in 20ms is probably down; but on a global 
>>>> network, 10 seconds might be reasonable. Regardless, the heuristic only 
>>>> applies to the actual TCP connection. Waiting for getaddrinfo is not 
>>>> related; that's up to the operating system.
>>>>
>>>> In a multithreaded client like PyMongo we distinguish the two phases:
>>>>
>>>>     for res in socket.getaddrinfo(host, port, family, 
>>>> socket.SOCK_STREAM):
>>>>         af, socktype, proto, dummy, sa = res
>>>>         sock = socket.socket(af, socktype, proto)
>>>>         try:
>>>>             sock.settimeout(connect_timeout)
>>>>             
>>>>             # THE TIMEOUT ONLY APPLIES HERE.
>>>>             sock.connect(sa)
>>>>             sock.settimeout(None)
>>>>             return sock
>>>>         except socket.error as e:
>>>>             # Connection refused, or not established within the timeout.
>>>>             sock.close()
>>>>
>>>> Here, the call to getaddrinfo isn't timed at all, and each distinct 
>>>> attempt to connect on a different address is timed separately. So this 
>>>> kind 
>>>> of code matches the idea of a "connect timeout" as a heuristic for 
>>>> deciding 
>>>> whether the server is down.
>>>>
>>>> Two questions:
>>>>
>>>> 1. Should asyncio.open_connection support a connection timeout that 
>>>> acts like the blocking version above? That is, a connection timeout that 
>>>> does not include getaddrinfo, and restarts for each address we attempt to 
>>>> connect to?
>>>>
>>>
>>> Hm, I don't really like adding timeouts to every API. As you describe 
>>> everyone has different needs. IMO if you don't want the timeout to cover 
>>> the getaddrinfo() call, call getaddrinfo() yourself and pass the host 
>>> address into the create_connection() call. That way you also have control 
>>> over whether to e.g. implement "happy eyeballs". (It will still call 
>>> socket.getaddrinfo(), but it should be quick -- it's not going to a DNS 
>>> server or even /etc/hosts to discover that 127.0.0.1 maps to 127.0.0.1.)
>>>  
>>>
>>>>
>>>> 2. Why does Python lock around getaddrinfo on Mac and Windows anyway? 
>>>> The code comment says these are "systems on which getaddrinfo() is 
>>>> believed 
>>>> to not be thread-safe". Has this belief ever been confirmed?
>>>>
>>>>
>>>> https://hg.python.org/cpython/file/d2b8354e87f5/Modules/socketmodule.c#l185
>>>>
>>>
>>> I don't know -- the list of ifdefs seems to indicate this is a generic 
>>> BSD issue, which is OS X's heritage. Maybe someone can do an experiment, or 
>>> review the source code used by Apple (if it's still open source)? While I 
>>> agree that if this really isn't an issue we shouldn't bother with the lock, 
>>> I'd also much rather be safe than sorry when it comes to races in core 
>>> Python.
>>>
>>> -- 
>>> --Guido van Rossum (python.org/~guido)
>>>
>>
>
>
> -- 
> --Guido van Rossum (python.org/~guido)
>

Reply via email to