[Dnsmasq-discuss] [PATCH] Fix buffer overflow in TCP requests
Hello, This patch fixes a buffer overflow in TCP requests. Since the read is not actually being retried, the byte written by the child can be left in the pipe. When that happens, cache_recv_insert() reads the length of the name, which is now multiplied by 256 due to the extra 0 byte (8 bit shift) and results in daemon->namebuff being overflowed. Namebuff is immediately before the daemon struct in memory so it ends up corrupting the beginning of the daemon struct. diff --git a/src/dnsmasq.c b/src/dnsmasq.c index 6481de8..457dea3 100644 --- a/src/dnsmasq.c +++ b/src/dnsmasq.c @@ -1887,7 +1887,7 @@ static void check_dns_listeners(time_t now) single byte comes back up the pipe, which is sent by the child after it has closed the netlink socket. */ - retry_send(read(pipefd[0], &a, 1)); + while (retry_send(read(pipefd[0], &a, 1))); #endif break; } @@ -1928,7 +1928,7 @@ static void check_dns_listeners(time_t now) #ifdef HAVE_LINUX_NETWORK /* See comment above re netlink socket. */ close(daemon->netlinkfd); - retry_send(write(pipefd[1], &a, 1)); + while (retry_send(write(pipefd[1], &a, 1))); #endif } Frank ___ Dnsmasq-discuss mailing list Dnsmasq-discuss@lists.thekelleys.org.uk http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
Re: [Dnsmasq-discuss] [PATCH] Fix buffer overflow in TCP requests
Hey Frank, dear list-members, thanks for your proposed fix. The Pi-hole team adopted dnsmasq v2.81 early on and we're seeing reports for mysterious crashes scattered all over the dnsmasq code since the very first days of releasing our latest version. Crashes reports show several locations, e.g., in_zone(), iface_check() and even main() in a few places. TCP forks were always happening sometime (not necessarily immediately) before a crash. Running dnsmasq in debug mode (prevents forking) stopped any crashes. So far, we haven't had the time to dig into this ourselves properly, however, your description perfectly matches with my preliminary conclusion for one week ago: > the dangling pointer is in **all** bug reports known so far in the daemon pointer (https://github.com/pi-hole/FTL/issues/805#issuecomment-643157367) The addition of TCP forks being able to add cache replies in dnsmasq v2.81 is what is causing this code (which has been like this for along time) to become problematic. I recently looked at the same code but missed the chance that the byte may be still in the pipe. This is a very good catch! I'm writing this mail to raise awareness that this appears to be a high severity bug as it can easily lead to anything between indeterminable behavior to fatal failures (typically the case). Even when the bug can only be triggered under certain conditions, it recently caused over 200 posts on the Pi-hole issue ticket system (Github). Crashes due to this have been reported independently by several (more than 20) individual users. Best Dominik On Wed, 2020-06-17 at 19:52 -0700, Frank wrote: > Hello, > > This patch fixes a buffer overflow in TCP requests. Since the read is > not actually being retried, the byte written by the child can be left > in the pipe. When that happens, cache_recv_insert() reads the length > of the name, which is now multiplied by 256 due to the extra 0 byte > (8 bit shift) and results in daemon->namebuff being overflowed. > Namebuff is immediately before the daemon struct in memory so it ends > up corrupting the beginning of the daemon struct. > > diff --git a/src/dnsmasq.c b/src/dnsmasq.c > index 6481de8..457dea3 100644 > --- a/src/dnsmasq.c > +++ b/src/dnsmasq.c > @@ -1887,7 +1887,7 @@ static void check_dns_listeners(time_t now) >single byte comes back up the pipe, which >is sent by the child after it has closed > the >netlink socket. */ > - retry_send(read(pipefd[0], &a, 1)); > + while (retry_send(read(pipefd[0], &a, 1))); > #endif > break; > } > @@ -1928,7 +1928,7 @@ static void check_dns_listeners(time_t now) > #ifdef HAVE_LINUX_NETWORK > /* See comment above re netlink socket. */ > close(daemon->netlinkfd); > - retry_send(write(pipefd[1], &a, 1)); > + while (retry_send(write(pipefd[1], &a, 1))); > #endif > } > > Frank > > > ___ > Dnsmasq-discuss mailing list > Dnsmasq-discuss@lists.thekelleys.org.uk > http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss ___ Dnsmasq-discuss mailing list Dnsmasq-discuss@lists.thekelleys.org.uk http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
Re: [Dnsmasq-discuss] [PATCH] Fix buffer overflow in TCP requests
That's a nasty bug, and could explain what pi-hole users are seeing. If I understand things correctly, this bug will only manifest itself when the write() or read() syscalls return EINTR ir EAGAIN, which is possible, but not common, hence the bugs wasn't detected earlier. Frank, did you find a way to reproduce this on demand, or just capture sporadic instances? I've applied Frank's patch, but immediately superseded it with a more general clean-up which should have the same effect. I've tagged a 2.82test1 release with this. If that fixes the problem for pi-hole, I plan to go to 2.82 fairly quickly: I don't want this hanging around. It's slightly ironic that the original change in 2.81 is to fix a theoretical hang which I've never actually seen and cannot reproduce, but it introduced a really crash bug. Cheers, Simon. On 18/06/2020 22:42, Dominik wrote: > Hey Frank, dear list-members, > > thanks for your proposed fix. The Pi-hole team adopted dnsmasq v2.81 > early on and we're seeing reports for mysterious crashes scattered all > over the dnsmasq code since the very first days of releasing our latest > version. Crashes reports show several locations, e.g., in_zone(), > iface_check() and even main() in a few places. TCP forks were always > happening sometime (not necessarily immediately) before a crash. > Running dnsmasq in debug mode (prevents forking) stopped any crashes. > > So far, we haven't had the time to dig into this ourselves properly, > however, your description perfectly matches with my preliminary > conclusion for one week ago: >> the dangling pointer is in **all** bug reports known so far in the > daemon pointer > (https://github.com/pi-hole/FTL/issues/805#issuecomment-643157367) > > The addition of TCP forks being able to add cache replies in dnsmasq > v2.81 is what is causing this code (which has been like this for along > time) to become problematic. I recently looked at the same code but > missed the chance that the byte may be still in the pipe. This is a > very good catch! > > I'm writing this mail to raise awareness that this appears to be a high > severity bug as it can easily lead to anything between indeterminable > behavior to fatal failures (typically the case). Even when the bug can > only be triggered under certain conditions, it recently caused over 200 > posts on the Pi-hole issue ticket system (Github). Crashes due to this > have been reported independently by several (more than 20) individual > users. > Best > Dominik > > On Wed, 2020-06-17 at 19:52 -0700, Frank wrote: >> Hello, >> >> This patch fixes a buffer overflow in TCP requests. Since the read is >> not actually being retried, the byte written by the child can be left >> in the pipe. When that happens, cache_recv_insert() reads the length >> of the name, which is now multiplied by 256 due to the extra 0 byte >> (8 bit shift) and results in daemon->namebuff being overflowed. >> Namebuff is immediately before the daemon struct in memory so it ends >> up corrupting the beginning of the daemon struct. >> >> diff --git a/src/dnsmasq.c b/src/dnsmasq.c >> index 6481de8..457dea3 100644 >> --- a/src/dnsmasq.c >> +++ b/src/dnsmasq.c >> @@ -1887,7 +1887,7 @@ static void check_dns_listeners(time_t now) >>single byte comes back up the pipe, which >>is sent by the child after it has closed >> the >>netlink socket. */ >> - retry_send(read(pipefd[0], &a, 1)); >> + while (retry_send(read(pipefd[0], &a, 1))); >> #endif >> break; >> } >> @@ -1928,7 +1928,7 @@ static void check_dns_listeners(time_t now) >> #ifdef HAVE_LINUX_NETWORK >> /* See comment above re netlink socket. */ >> close(daemon->netlinkfd); >> - retry_send(write(pipefd[1], &a, 1)); >> + while (retry_send(write(pipefd[1], &a, 1))); >> #endif >> } >> >> Frank >> >> >> ___ >> Dnsmasq-discuss mailing list >> Dnsmasq-discuss@lists.thekelleys.org.uk >> http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss > > > ___ > Dnsmasq-discuss mailing list > Dnsmasq-discuss@lists.thekelleys.org.uk > http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss > ___ Dnsmasq-discuss mailing list Dnsmasq-discuss@lists.thekelleys.org.uk http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
Re: [Dnsmasq-discuss] [PATCH] Fix buffer overflow in TCP requests
Resending this because I realized I sent it to Simon rather than the list: Hi Simon, This bug is fairly easy to reproduce. It can take 10 mins or more to reproduce a crash so I suggest checking the length you get in cache_recv_insert. If the domains being used are > 4 characters, this will catch the bug the first time it occurs. That usually happens within a minute. Something like this: if (m > MAXDNAME) abort(); Then bombard dnsmasq with TCP queries: docker run --rm -v /root/dns_queries:/dns_queries aiys0211/flamethrower_v0.10.2 -f /dns_queries/queryfile-example-100thousand -l 1000 -c 20 -q 10 -d 300 -p 53 -P tcp I used the first 100 thousand entries from here: https://www.dns-oarc.net/files/dnsperf/data/queryfile-example-10million-201202.gz Frank On Sun, Jun 28, 2020 at 1:58 PM Simon Kelley wrote: > > That's a nasty bug, and could explain what pi-hole users are seeing. > > If I understand things correctly, this bug will only manifest itself > when the write() or read() syscalls return EINTR ir EAGAIN, which is > possible, but not common, hence the bugs wasn't detected earlier. > > Frank, did you find a way to reproduce this on demand, or just capture > sporadic instances? > > I've applied Frank's patch, but immediately superseded it with a more > general clean-up which should have the same effect. I've tagged a > 2.82test1 release with this. If that fixes the problem for pi-hole, I > plan to go to 2.82 fairly quickly: I don't want this hanging around. > > It's slightly ironic that the original change in 2.81 is to fix a > theoretical hang which I've never actually seen and cannot reproduce, > but it introduced a really crash bug. > > > Cheers, > > Simon. > > > > > On 18/06/2020 22:42, Dominik wrote: > > Hey Frank, dear list-members, > > > > thanks for your proposed fix. The Pi-hole team adopted dnsmasq v2.81 > > early on and we're seeing reports for mysterious crashes scattered all > > over the dnsmasq code since the very first days of releasing our latest > > version. Crashes reports show several locations, e.g., in_zone(), > > iface_check() and even main() in a few places. TCP forks were always > > happening sometime (not necessarily immediately) before a crash. > > Running dnsmasq in debug mode (prevents forking) stopped any crashes. > > > > So far, we haven't had the time to dig into this ourselves properly, > > however, your description perfectly matches with my preliminary > > conclusion for one week ago: > >> the dangling pointer is in **all** bug reports known so far in the > > daemon pointer > > (https://github.com/pi-hole/FTL/issues/805#issuecomment-643157367) > > > > The addition of TCP forks being able to add cache replies in dnsmasq > > v2.81 is what is causing this code (which has been like this for along > > time) to become problematic. I recently looked at the same code but > > missed the chance that the byte may be still in the pipe. This is a > > very good catch! > > > > I'm writing this mail to raise awareness that this appears to be a high > > severity bug as it can easily lead to anything between indeterminable > > behavior to fatal failures (typically the case). Even when the bug can > > only be triggered under certain conditions, it recently caused over 200 > > posts on the Pi-hole issue ticket system (Github). Crashes due to this > > have been reported independently by several (more than 20) individual > > users. > > Best > > Dominik > > > > On Wed, 2020-06-17 at 19:52 -0700, Frank wrote: > >> Hello, > >> > >> This patch fixes a buffer overflow in TCP requests. Since the read is > >> not actually being retried, the byte written by the child can be left > >> in the pipe. When that happens, cache_recv_insert() reads the length > >> of the name, which is now multiplied by 256 due to the extra 0 byte > >> (8 bit shift) and results in daemon->namebuff being overflowed. > >> Namebuff is immediately before the daemon struct in memory so it ends > >> up corrupting the beginning of the daemon struct. > >> > >> diff --git a/src/dnsmasq.c b/src/dnsmasq.c > >> index 6481de8..457dea3 100644 > >> --- a/src/dnsmasq.c > >> +++ b/src/dnsmasq.c > >> @@ -1887,7 +1887,7 @@ static void check_dns_listeners(time_t now) > >>single byte comes back up the pipe, which > >>is sent by the child after it has closed > >> the > >>netlink socket. */ > >> - retry_send(read(pipefd[0], &a, 1)); > >> + while (retry_send(read(pipefd[0], &a, 1))); > >> #endif > >> break; > >> } > >> @@ -1928,7 +1928,7 @@ static void check_dns_listeners(time_t now) > >> #ifdef HAVE_LINUX_NETWORK > >> /* See comment above re netlink socket. */ > >> close(daemon->netlinkfd); > >> - retry_send(write(pipefd[1], &a, 1)); > >> + while (retry_send(write(pipefd[1], &a, 1))); > >> #endif > >>
Re: [Dnsmasq-discuss] [PATCH] Fix buffer overflow in TCP requests
On 30/06/2020 20:22, Frank wrote: > Resending this because I realized I sent it to Simon rather than the list: > > Hi Simon, > > This bug is fairly easy to reproduce. It can take 10 mins or more to > reproduce a crash so I suggest checking the length you get in > cache_recv_insert. If the domains being used are > 4 characters, this > will catch the bug the first time it occurs. That usually happens > within a minute. Something like this: > > if (m > MAXDNAME) abort(); > OK, that makes sense. EINTR or EAGAIN are possible, but rare. Does the fix in 2.82test1 work for you? If so I think I'll work towards a 2.82 release before doing anything else, just to nail this. Simon. > Then bombard dnsmasq with TCP queries: > > docker run --rm -v /root/dns_queries:/dns_queries > aiys0211/flamethrower_v0.10.2 -f > /dns_queries/queryfile-example-100thousand -l 1000 -c 20 -q 10 -d 300 > -p 53 -P tcp > > I used the first 100 thousand entries from here: > https://www.dns-oarc.net/files/dnsperf/data/queryfile-example-10million-201202.gz > > Frank > > On Sun, Jun 28, 2020 at 1:58 PM Simon Kelley wrote: >> >> That's a nasty bug, and could explain what pi-hole users are seeing. >> >> If I understand things correctly, this bug will only manifest itself >> when the write() or read() syscalls return EINTR ir EAGAIN, which is >> possible, but not common, hence the bugs wasn't detected earlier. >> >> Frank, did you find a way to reproduce this on demand, or just capture >> sporadic instances? >> >> I've applied Frank's patch, but immediately superseded it with a more >> general clean-up which should have the same effect. I've tagged a >> 2.82test1 release with this. If that fixes the problem for pi-hole, I >> plan to go to 2.82 fairly quickly: I don't want this hanging around. >> >> It's slightly ironic that the original change in 2.81 is to fix a >> theoretical hang which I've never actually seen and cannot reproduce, >> but it introduced a really crash bug. >> >> >> Cheers, >> >> Simon. >> >> >> >> >> On 18/06/2020 22:42, Dominik wrote: >>> Hey Frank, dear list-members, >>> >>> thanks for your proposed fix. The Pi-hole team adopted dnsmasq v2.81 >>> early on and we're seeing reports for mysterious crashes scattered all >>> over the dnsmasq code since the very first days of releasing our latest >>> version. Crashes reports show several locations, e.g., in_zone(), >>> iface_check() and even main() in a few places. TCP forks were always >>> happening sometime (not necessarily immediately) before a crash. >>> Running dnsmasq in debug mode (prevents forking) stopped any crashes. >>> >>> So far, we haven't had the time to dig into this ourselves properly, >>> however, your description perfectly matches with my preliminary >>> conclusion for one week ago: the dangling pointer is in **all** bug reports known so far in the >>> daemon pointer >>> (https://github.com/pi-hole/FTL/issues/805#issuecomment-643157367) >>> >>> The addition of TCP forks being able to add cache replies in dnsmasq >>> v2.81 is what is causing this code (which has been like this for along >>> time) to become problematic. I recently looked at the same code but >>> missed the chance that the byte may be still in the pipe. This is a >>> very good catch! >>> >>> I'm writing this mail to raise awareness that this appears to be a high >>> severity bug as it can easily lead to anything between indeterminable >>> behavior to fatal failures (typically the case). Even when the bug can >>> only be triggered under certain conditions, it recently caused over 200 >>> posts on the Pi-hole issue ticket system (Github). Crashes due to this >>> have been reported independently by several (more than 20) individual >>> users. >>> Best >>> Dominik >>> >>> On Wed, 2020-06-17 at 19:52 -0700, Frank wrote: Hello, This patch fixes a buffer overflow in TCP requests. Since the read is not actually being retried, the byte written by the child can be left in the pipe. When that happens, cache_recv_insert() reads the length of the name, which is now multiplied by 256 due to the extra 0 byte (8 bit shift) and results in daemon->namebuff being overflowed. Namebuff is immediately before the daemon struct in memory so it ends up corrupting the beginning of the daemon struct. diff --git a/src/dnsmasq.c b/src/dnsmasq.c index 6481de8..457dea3 100644 --- a/src/dnsmasq.c +++ b/src/dnsmasq.c @@ -1887,7 +1887,7 @@ static void check_dns_listeners(time_t now) single byte comes back up the pipe, which is sent by the child after it has closed the netlink socket. */ - retry_send(read(pipefd[0], &a, 1)); + while (retry_send(read(pipefd[0], &a, 1))); #endif break; } @@ -1928,7 +1928,7 @@ static void check_