Happy to report that the issue has been fixed by setting a custom DNS policy on the blackbox exporter pods, so that they skip the cluster DNS and point to an external DNS server instead.
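
For anyone who finds this thread later, here is a minimal sketch of what a pod-level DNS override of this kind can look like in a Deployment's pod template. It is an illustration, not the exact manifest used here: the nameserver address 192.0.2.53 is a placeholder, and the search domain and ndots value are assumptions you would adapt to your own environment.

```yaml
# Fragment of a Deployment manifest (pod template only).
spec:
  template:
    spec:
      # "None" tells Kubernetes to ignore the cluster DNS settings
      # and build /etc/resolv.conf only from dnsConfig below.
      dnsPolicy: "None"
      dnsConfig:
        nameservers:
          - 192.0.2.53        # placeholder: your external DNS server
        searches:
          - xyz.com           # assumption: only the external search domain is needed
        options:
          - name: ndots
            value: "1"        # assumption: avoid walking the cluster search list
```

After the pods restart, `cat /etc/resolv.conf` inside the container should show the external nameserver rather than the cluster DNS address.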

On Wednesday, November 25, 2020 at 8:29:18 AM UTC-5 Chris Paulraj wrote:

> I tried a different build that includes network tools, but I am unable to
> figure out why the lookup fails. I also tried a blackbox-exporter image
> from Docker Hub, which resulted in the same issue, although it ran for
> 8 hours without error. It does look like this is an environmental issue
> with my setup. Would you be able to help me with how I can increase the
> DNS lookup timeout for HTTP probes? Where can I increase the timeout for
> "probe_dns_lookup_time_seconds"? Thank you.
>
> On Monday, November 23, 2020 at 11:04:12 AM UTC-5 Chris Paulraj wrote:
>
>> I created the image using RHEL 7, and I could see that DNS is delegated
>> to the OpenShift node hosting this pod. I was also able to run a curl
>> command from within the pod, which was successful. But as you point out,
>> the issue could very well be within the image I built; I will try to
>> gather more information when it happens again. I updated Prometheus and
>> Alertmanager to the most recent versions and restarted the pods, keeping
>> my fingers crossed. Thank you for your help.
>>
>> sh-4.2$ cat /etc/resolv.conf
>> nameserver 10.244.60.18
>> search prometheus-custom.svc.cluster.local svc.cluster.local cluster.local localdomain xyz.com
>> options ndots:5
>> sh-4.2$
>>
>> On Monday, November 23, 2020 at 10:32:54 AM UTC-5 [email protected]
>> wrote:
>>
>>> The OS that the host is running makes no difference; the question is
>>> what OS the container is built from. You'll see this in the Dockerfile
>>> used to build the container.
>>>
>>> If you are using the off-the-shelf docker container for
>>> blackbox_exporter then it will be this Dockerfile
>>> <https://github.com/prometheus/blackbox_exporter/blob/master/Dockerfile>
>>> which builds from quay.io/prometheus/busybox-linux-amd64:latest
>>> This in turn appears to come from here
>>> <https://github.com/prometheus/busybox>, which in turn is based on
>>> debian:buster
>>> <https://github.com/prometheus/busybox/blob/master/uclibc/Dockerfile>
>>> or debian:buster-slim
>>> <https://github.com/prometheus/busybox/blob/master/glibc/Dockerfile>.
>>> I think those are systemd-based.
>>>
>>> I think you should docker exec into the running container, and see if
>>> systemd-resolved is running, and/or if /etc/resolv.conf points to
>>> 127.0.0.53. If so, the systemd bug I pointed to is relevant.
>>>
>>> If not, then you can try resolving host
>>> arp-executor-sy-shra-arp-p.icl1p.xyz.com yourself to see if it resolves
>>> or not. Ultimately, this problem isn't with blackbox-exporter, it's a
>>> case of debugging why DNS isn't resolving. Intermittent DNS resolution
>>> can also be caused by problems with your authoritative DNS.

