End-user troubleshooting of bad c-ares interaction with router

2024-01-17 Thread Nicholas Chammas via c-ares
Hello,

I am trying to troubleshoot a problem as an end-user of c-ares. I use a library
(Apache Spark Connect) that uses gRPC, which in turn uses c-ares. I am two levels
removed from c-ares itself and am a little out of my depth.

I have a little Python script that connects to a remote Apache Spark cluster 
via Spark Connect and runs a test query. When I run this script on my home 
network, it takes over 20 seconds to run. When I tether my workstation to my 
phone (which is connected via LTE), the same script runs in a second or two. In 
both cases the script runs successfully.

I enabled some gRPC debug flags which print out a lot of information. This led 
me to c-ares, as I believe the difference in runtime is related somehow to DNS.

I’ve extracted the log lines output by gRPC related to c-ares. (Be sure
to scroll down to see both files; there is one for home and one for LTE.) The
gRPC codebase is hosted on GitHub, where you can find the grpc_ares_wrapper.cc
file mentioned in the log files.

I tried changing the DNS servers configured in my home router but that didn’t 
seem to help. Interestingly, however, if I set the same DNS servers already 
configured in my home router directly on the network interface I’m using, the 
20 second delay disappears:

```
# Set DNS servers directly on the "My Network" service instead of relying on the router
networksetup -setdnsservers "My Network" 1.1.1.1 1.0.0.1
```

But this setting doesn’t persist across restarts, and only Spark Connect seems 
to have this problem. It seems there is some kind of bad interaction between 
c-ares and my router.

How can I dig deeper to understand what’s going wrong with my home network? I
checked the c-ares docs, but I don’t see a way for an end-user to enable debug
output from c-ares, e.g. via an environment variable.

Any suggestions? I’m running macOS 14.2.1. The router is an Apple AirPort.

Nick

-- 
c-ares mailing list
c-ares@lists.haxx.se
https://lists.haxx.se/mailman/listinfo/c-ares


Re: End-user troubleshooting of bad c-ares interaction with router

2024-01-19 Thread Nicholas Chammas via c-ares

> On Jan 17, 2024, at 3:38 PM, Brad House wrote:
> What version of c-ares is installed?
> 
Sorry about the delay in responding. Answering this question is more difficult 
than I expected.

I know that Spark Connect is running gRPC 1.60.0. Looking through the gRPC
repo, I see mention of c-ares 1.13.0, but I don’t know how that translates to
my runtime. Homebrew tells me I have c-ares 1.25.0 installed, but again, I’m
not sure if that’s what I’m actually running.

Is there a way I can directly query the version of c-ares being run via Spark
Connect / gRPC? I asked this question on the gRPC forum but have had no
response yet.

For the record, I know that c-ares is involved because if I tell gRPC not to
use it (via GRPC_DNS_RESOLVER=native), then my problem disappears.
> What DNS servers are configured on your macOS system when it's not operating 
> properly?  The output of "scutil --dns" would be helpful here.
> 
Here’s that output.

I believe 192.168.1.1 is just my local router, and that is where I have the
default DNS servers set to 1.1.1.1 and 1.0.0.1.



Re: End-user troubleshooting of bad c-ares interaction with router

2024-01-22 Thread Nicholas Chammas via c-ares
Here’s the output of adig and ahost, both with and without the DNS servers set
directly on the network interface (vs. just on the router).

I also learned that gRPC 1.60.0 may be using c-ares 1.19.1, though again
that’s just from looking at the gRPC source and not from some runtime query.


> On Jan 21, 2024, at 7:34 AM, Brad House wrote:
> 
> I think Homebrew distributes the 'adig' and 'ahost' utilities from c-ares.  
> Can you try using those to do the same lookup so we can see the results?
> 



Re: End-user troubleshooting of bad c-ares interaction with router

2024-01-23 Thread Nicholas Chammas via c-ares
Thank you for all the troubleshooting help, Brad.

I am using gRPC via Apache Spark Connect (a Python library), so I am two levels 
removed from c-ares itself. Looking in the Python virtual environment where 
gRPC is installed, I’m not sure what file to run otool on. The only seemingly 
relevant file I could find is called cygrpc.cpython-311-darwin.so, and otool 
didn’t turn up anything interesting on it.

I will take this issue up with the gRPC folks.

I see in several places that the gRPC folks are using ares_gethostbyname:
https://github.com/grpc/grpc/blob/v1.60.0/src/core/lib/event_engine/ares_resolver.cc#L287-L293
https://github.com/grpc/grpc/blob/v1.60.0/src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc#L748-L758
https://github.com/grpc/grpc/blob/v1.60.0/src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc#L1075-L1086


> On Jan 22, 2024, at 1:39 PM, Brad House wrote:
> 
> Are you using gRPC installed via Homebrew or is it bundled with something 
> else?  Usually package maintainers like Homebrew will dynamically link to the 
> system versions of dependencies so they can be updated independently.  You 
> might be able to run otool -L on gRPC to see what c-ares library it's picking 
> up (and if none are listed, it might be compiled in statically).
> 
> That said, according to your gRPC logs, it appears that gRPC may itself be 
> performing both A and AAAA queries and expecting responses to both of them.  
> I see the "A" reply come back, but the "AAAA" reply never comes and it bails 
> at that point.  Many years ago c-ares didn't have a way to request both A and 
> AAAA records with one query, but it does these days via ares_getaddrinfo(), 
> and that was recently enhanced with logic to assist in the exact scenario you 
> are seeing: basically, it will stop retrying when at least one address family 
> is returned. 
> 
> You might need to escalate this to the gRPC folks.
> 



Re: End-user troubleshooting of bad c-ares interaction with router

2024-01-23 Thread Nicholas Chammas via c-ares
To close the loop on this discussion, I’ve filed the following issue with the 
gRPC folks:

https://github.com/grpc/grpc/issues/35638

Thank you again for all of your help. I would not have been able to understand 
what’s going on without it.


> On Jan 23, 2024, at 11:43 AM, Brad House wrote:
> 
> Yeah, it does clearly show them enqueuing IPv4 and IPv6 requests separately.  
> So either they need to add logic similar to c-ares has internally with 
> https://github.com/c-ares/c-ares/pull/551 or just use ares_getaddrinfo() 
> instead of ares_gethostbyname() with address family AF_UNSPEC and let c-ares 
> do the right thing.