Re: Having ten thousands of mount bind causes various processes to go into loops
On 19/06/2024 16:27, Julien Petit wrote:
>> Does it have some logic to avoid descending into bind mounts? Maybe I am
>> wrong with my expectation that it does not use anything besides st_dev
>> from the stat result. It may be a promising case to demonstrate the issue
>> in a way independent of systemd and sandboxing. You can obtain command
>> line arguments. Attach to its mount namespace and inspect the content of
>> its /proc/<pid>/mounts or mountinfo. The next step would be to profile,
>> or at least to trace, a process.
> I'm not sure i understand you there.

It was intended to express my surprise that "find" is affected. I might expect some bugs in udisksd or PHP related to the number of entries in the "mounts" or "mountinfo" /proc files, but find is much simpler and likely more convenient for debugging the issue. (Actually I am even more surprised by the presence of udisksd on a cloud platform that shares files among virtual users.)

On 20/06/2024 04:18, Julien Petit wrote:
>> However do you need shared subtrees?
> I'm gonna test the effect of setting them to private. This doesn't seem
> to fix the problem either.

Sorry, but without any details of what you have actually tried, it adds nothing beyond the following kind of summary: "Despite there being enough projects that actively use bind mounts, some person faced some obscure issue. The tool might have been used in a wrong way." User and mount namespaces have caused some challenges with respect to bind mounts. Personally, I am not convinced that changes in the kernel contain a regression. However, nowadays bind mounts should perhaps be treated with more care. It seems nobody on this list is motivated enough to actively participate in debugging starting from the script you posted. You may ask in other communities.
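The inspection steps described above can be sketched in shell. PID 1234 would be the looping process; since nsenter and strace need root, those lines are shown commented, and the sketch uses its own PID so the unprivileged part actually runs:

```shell
#!/bin/bash
# Sketch of the inspection steps above. "$pid" would normally be the PID
# of the looping process; we use our own PID here so this runs unprivileged.
pid=$$

# Command line arguments of the process:
tr '\0' ' ' < "/proc/$pid/cmdline"; echo

# Number of mount entries visible in that process's mount namespace:
wc -l < "/proc/$pid/mounts"

# With root, enter its mount namespace, or trace/profile it:
# nsenter -t "$pid" -m wc -l /proc/self/mountinfo
# strace -c -p "$pid"
```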
Re: Having ten thousands of mount bind causes various processes to go into loops
Julien Petit wrote:
> How Linux is supposed to be used? That's why i'm here. There wasn't
> until kernel 4.19 an official limit to the number of mounts in the
> documentation. Even though we use mounts a lot, we're still far from
> the official limit. Did we get lucky for 15 years and we should change
> the way we do things or is it a bug? I will now take this to the
> kernel team and see what they have to say about it.

I take it you have read https://docs.kernel.org/filesystems/sharedsubtree.html which says "A shared mount can be replicated to as many mountpoints and all the replicas continue to be exactly same" and seems to be trying to say your use case is valid. I'd be interested to follow your discussion with the kernel devs if you could post a link.
Re: Re: Having ten thousands of mount bind causes various processes to go into loops
> This can be solved with ACLs. Instead of creating a bind mount, this process
> that allows the user to share the directory can set an ACL and create a
> symlink.

For a few users maybe, but not that easy when you have many thousands of users (who on top do not have local accounts). We'd probably hit another ACL limitation. Then again, this thread was not about finding new ways of doing what we do but about knowing the reason it stopped working. Is it a new limitation or a bug?

> PS: It would be better if you used a mailer that correctly sets mail headers
> References and/or In-Reply-To so that your replies are properly threaded.

Sorry about that, i use the link provided on the list for mails i don't receive in my mailbox directly and gmail doesn't seem to be good about it...
Re: Having ten thousands of mount bind causes various processes to go into loops
> PS: if you maintain your own software and aren't able to find a way for your
> users to do shares - especially while systems that most likely have such
> functionality built in out of the box surely exist, think Nextcloud etc -
> that is covered by how Linux is supposed to be used, by definition it's
> pretty much out of support.

Nextcloud doesn't offer sftp or rsync access to users that i know of. The specifications are much simpler because they only deal with web access (the web interface and the webdav server written in PHP).

How Linux is supposed to be used? That's why i'm here. There wasn't, until kernel 4.19, an official limit to the number of mounts in the documentation. Even though we use mounts a lot, we're still far from the official limit. Did we get lucky for 15 years and should we change the way we do things, or is it a bug? I will now take this to the kernel team and see what they have to say about it.

> Especially if you keep insisting on using a way that was never officially
> supported, just because you got away with it for 15 years.

That's the very question i guess! How much mount is too much mount ;) Thanks again for your help.
Re: Having ten thousands of mount bind causes various processes to go into loops
> At this point, I kinda doubt this issue has anything to do with Debian
> itself, but will most likely be an issue/limitation of the Linux Kernel
> itself.

From my latest tests, it seems to point that way. Kernel 5.4 came with a new mount API and it seems to break since then. During my search, i also found that since kernel 4.19, there is a default limit on the number of mounts, set to 100 000 to avoid DoS. We're still far from it.
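For reference, the limit mentioned is exposed as the fs.mount-max sysctl (default 100 000), so both the cap and the current number of mounts can be checked directly:

```shell
# Per-namespace cap on the number of mounts (default 100000):
cat /proc/sys/fs/mount-max

# Number of mounts currently visible in this namespace:
wc -l < /proc/self/mounts
```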
Re: Having ten thousands of mount bind causes various processes to go into loops
PS: if you maintain your own software and aren't able to find a way for your users to do shares - especially while systems that most likely have such functionality built in out of the box surely exist, think Nextcloud etc - that is covered by how Linux is supposed to be used, by definition it's pretty much out of support. Especially if you keep insisting on using a way that was never officially supported, just because you got away with it for 15 years.

On Thu, 20 Jun 2024 at 00:06, Julien Petit wrote:
> We're the maintainers of our software so it's not out of support :)
> I'm here because we'd like to save a few trees reducing that cpu usage
> down :D
> Thanks again for your time!
Re: Having ten thousands of mount bind causes various processes to go into loops
Software is only tested to a certain degree. So mounts are tested up to a sensible number; if you move beyond it, you have to bet on luck as to whether it's supported or not. At this point, I kinda doubt this issue has anything to do with Debian itself; it will most likely be an issue/limitation of the Linux Kernel itself. So the biggest chance to get this fixed is to compile the Kernel yourself ([1] is a great guide to do so with little to no effort, enabling and disabling all the same features Debian uses, minus any potential additional patches). If it still occurs, you know it can't be a Debian problem. Try with both the sources of the Kernel version you use and the latest stable sources - 6.9.5 as of writing this. One thing though: replace make deb-pkg from the guide with make bindeb-pkg, and with -j# set a sensible number of concurrent jobs. If the issue still appears, head over to [2], see if someone else has reported a similar issue and, if not, create a new bug report. This may be the only place with a chance of ever getting a fix done, short of hiring a service firm like Collabora etc. and paying them for this specific thing.

Richard

[1]: https://www.debian.org/doc//manuals/debian-handbook/sect.kernel-compilation.html
[2]: https://bugzilla.kernel.org/

On Thu, 20 Jun 2024 at 00:06, Julien Petit wrote:
> You're thinking of a traditional file server in a business. Our
> solution is a cloud platform. We don't know ahead how our customers
> are going to manage their files and shares. And we don't need to.
> As i said to Eduardo, it doesn't really matter where folders/mounts
> are. Users can share any directory (and subdirectories) in their home
> directory with any other user. The shared folder is mounted in the
> special directory "Shared with me" of the recipient home directory.
> I.e. John/Sales/Invoices is mounted in Alice/Shared with me/Invoices.
> The shares can be read/write or read-only.
Re: Having ten thousands of mount bind causes various processes to go into loops
On 19/06/2024 19:06, Julien Petit wrote:
>>> It doesn't really matter where folders/mounts are. Users can share any
>>> directory (and subdirectories) in their home directory with any other
>>> user. The shared folder is mounted in the special directory "Shared
>>> with me" of the recipient home directory. I.e. John/Sales/Invoices is
>>> mounted in Alice/Shared with me/Invoices.
>> Can be done with symlinks. I imagine there's some process that creates
>> these bind mounts, so the process could create symlinks.
> Symlinks are no good since the user sharing his directory can decide to
> share it read/write to one user but read only to another.

This can be solved with ACLs. Instead of creating a bind mount, the process that allows the user to share the directory can set an ACL and create a symlink.

PS: It would be better if you used a mailer that correctly sets the mail headers References and/or In-Reply-To so that your replies are properly threaded.

-- 
Go placidly amid the noise and waste, and remember what value there may be in owning a piece thereof. -- National Lampoon, "Deteriorata"

Eduardo M KALINOWSKI
edua...@kalinowski.com.br
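A minimal sketch of that ACL-plus-symlink approach, with hypothetical paths under /tmp. setfacl comes from the acl package and needs an ACL-enabled filesystem, and the current user stands in for the real recipient so the sketch runs unprivileged:

```shell
#!/bin/bash
# Sketch: replace a read-only bind mount with an ACL plus a symlink.
# Paths are hypothetical; in production you would name the real recipient
# user instead of $(id -un).
set -e
base=/tmp/acl-share-demo
rm -rf "$base"
mkdir -p "$base/john/Sales/Invoices" "$base/alice/Shared with me"

# Grant the recipient read-only access via an ACL instead of a ro bind mount.
# Guarded: setfacl may be missing or the filesystem may lack ACL support.
if command -v setfacl >/dev/null 2>&1; then
    setfacl -m "u:$(id -un):rX" "$base/john/Sales/Invoices" || true
fi

# Expose the share in the recipient's home with a symlink instead of a mount.
ln -s "$base/john/Sales/Invoices" "$base/alice/Shared with me/Invoices"

ls -l "$base/alice/Shared with me/"
```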
Re: Having ten thousands of mount bind causes various processes to go into loops
> For this, probably the easiest is to set up a common directory/a few common
> directories, set up proper permissions through use of groups and worst case
> create some symlinks from the user's home directories, if these directories
> really need to be accessible from within their home directories. That's
> pretty much how shared directories are always done. As this would be a one
> time effort, it would be doable.

You're thinking of a traditional file server in a business. Our solution is a cloud platform. We don't know ahead of time how our customers are going to manage their files and shares. And we don't need to. As i said to Eduardo, it doesn't really matter where folders/mounts are. Users can share any directory (and subdirectories) in their home directory with any other user. The shared folder is mounted in the special directory "Shared with me" of the recipient home directory. I.e. John/Sales/Invoices is mounted in Alice/Shared with me/Invoices. The shares can be read/write or read-only.

> But at this point, you should really think about paying some company with
> deep knowledge of Linux that can come up with a sustainable plan. Because
> obviously, your way of doing things isn't anything that could (or on that
> note should) be a long-term solution. And maybe think about rewriting the
> ancient software that causes this setup in the first place.

We came up with that solution in 2009 and it's been working until now (and still is, but eating away CPUs). So i guess that makes it a proven long-term solution ;) Is there a better way to do it now? Maybe. But not as easy as setting up a few symlinks and some permissions. Anyway, that's off topic. My request is simpler than knowing how our solution works. Mounting many thousands of folders wasn't an issue before and it is now (i haven't nailed down on which Debian update it broke yet). As i said in my request, i know that this is a heavy use of mounts, but it worked perfectly for many years.
I'm just trying to understand the cause behind it.

> Desperately trying to cling to something that has been out of support for
> decades is just not sustainable, not on any OS.

We're the maintainers of our software so it's not out of support :) I'm here because we'd like to save a few trees reducing that cpu usage down :D Thanks again for your time!
Re: Re: Having ten thousands of mount bind causes various processes to go into loops
> Does it really have to be in the home directory? Can't the software (and/or
> the users) open files in, say, /shared/accounting?

It doesn't really matter where folders/mounts are. Users can share any directory (and subdirectories) in their home directory with any other user. The shared folder is mounted in the special directory "Shared with me" of the recipient home directory. I.e. John/Sales/Invoices is mounted in Alice/Shared with me/Invoices.

> If it really needs to be under /home: symlinks.

Symlinks are no good since the user sharing his directory can decide to share it read/write to one user but read only to another.
Re: Re: Having ten thousands of mount bind causes various processes to go into loops
>> However do you need shared subtrees?
> I'm gonna test the effect of setting them to private.

This doesn't seem to fix the problem either.
Re: Having ten thousands of mount bind causes various processes to go into loops
For this, probably the easiest is to set up a common directory/a few common directories, set up proper permissions through use of groups and worst case create some symlinks from the user's home directories, if these directories really need to be accessible from within their home directories. That's pretty much how shared directories are always done. As this would be a one time effort, it would be doable. But at this point, you should really think about paying some company with deep knowledge of Linux that can come up with a sustainable plan. Because obviously, your way of doing things isn't anything that could (or on that note should) be a long-term solution. And maybe think about rewriting the ancient software that causes this setup in the first place. Desperately trying to cling to something that has been out of support for decades is just not sustainable, not on any OS.

Richard

On Wed, 19 Jun 2024 at 15:13, Julien Petit wrote:
> As i said to Richard, rights are not the challenge here. It's to be
> able to share a directory across multiple users. For instance you
> would have: /users/bob/accounting shared with Alice and accessible in
> her home directory /users/alice/accounting
Re: Having ten thousands of mount bind causes various processes to go into loops
On 19/06/2024 05:46, Julien Petit wrote:
> Rights are not the challenge here. It's to be able to share a directory
> across multiple users. For instance you would have: /users/bob/accounting
> shared with Alice and accessible in her home directory
> /users/alice/accounting

Does it really have to be in the home directory? Can't the software (and/or the users) open files in, say, /shared/accounting?

If it really needs to be under /home: symlinks.

-- 
"The following is not for the weak of heart or Fundamentalists." -- Dave Barry

Eduardo M KALINOWSKI
edua...@kalinowski.com.br
Re: Having ten thousands of mount bind causes various processes to go into loops
> Does it have some logic to avoid descending into bind mounts? Maybe I am
> wrong with my expectation that it does not use anything besides st_dev from
> the stat result. It may be a promising case to demonstrate the issue in a
> way independent of systemd and sandboxing. You can obtain command line
> arguments. Attach to its mount namespace and inspect the content of its
> /proc/<pid>/mounts or mountinfo. The next step would be to profile or at
> least to trace a process.

I'm not sure i understand you there.

> I have not figured out from your description what problem you solved by
> using bind mounts, but bubblewrap (so flatpak and snap) and firejail rely
> on bind mounts as well. Perhaps you have some unique factors.

Mounts are used as a way of sharing folders in different users' home directories. For instance you would have: /users/bob/accounting shared with Alice and accessible in her home directory /users/alice/accounting

Thanks for your help :)
Re: Having ten thousands of mount bind causes various processes to go into loops
> Just to learn about it.
> What about using acl rather than bind mounts? What should be the
> problem in this solution?

As i said to Richard, rights are not the challenge here. It's to be able to share a directory across multiple users. For instance you would have: /users/bob/accounting shared with Alice and accessible in her home directory /users/alice/accounting
Re: Having ten thousands of mount bind causes various processes to go into loops
> If there's a better way should be judged on what exactly that app expects.
> For the web interface, maybe the http server - or whatever makes the web
> interface accessible to the users - can limit permissions. For the rest of
> the use cases it would be interesting which circumstances would need to be
> fulfilled for a user to be able to change permissions on a file they own.
> And if they could even change the permissions through sftp, webdav or
> rsync. Because if not, the simplest fix would be a cron job that
> periodically sets the permissions on the directory, so you don't need a
> dedicated mount. But maybe you want to create a separate topic where you
> describe exactly what the basic requirements are and ask for suggestions
> what the best solution may be. Maybe something like AppArmor rules or
> other methods that aren't known by your typical user could be a better
> solution.

Rights are not the challenge here. It's to be able to share a directory across multiple users. For instance you would have: /users/bob/accounting shared with Alice and accessible in her home directory /users/alice/accounting

> If you haven't already, remember to create a bug report and include as much
> detail and logs as you can gather, as people will need to be able to tell
> what the actual issue is. Maybe it's a limitation of the file system, of
> the hardware or something else.

I haven't already. I want to test using private mounts first. Thanks again for your input.
Re: Having ten thousands of mount bind causes various processes to go into loops
On 14/06/2024 16:30, Julien Petit wrote:
>> What processes are CPU hungry?
> [...] udisksd,

This one does not use a mount namespace, for the obvious reason. However, it tends to generate unnecessary activity. Perhaps it needs optimizations for your case.

> (fstrim)

There were some bugs, including the sandboxing settings in its unit file, but perhaps that is irrelevant.

> find

Does it have some logic to avoid descending into bind mounts? Maybe I am wrong with my expectation that it does not use anything besides st_dev from the stat result. It may be a promising case to demonstrate the issue in a way independent of systemd and sandboxing. You can obtain command line arguments. Attach to its mount namespace and inspect the content of its /proc/<pid>/mounts or mountinfo. The next step would be to profile or at least to trace a process.

> It seems to happen with all processes accessing mounts. And since
> disabling sandboxing with php fixed the problem for the php process,
> it looks like it is linked to sandboxing.

From my point of view PHP is more complex than find.

> We only use mount bind to share an initial folder with other users
> with different access rights (rw or ro).

I have not figured out from your description what problem you solved by using bind mounts, but bubblewrap (so flatpak and snap) and firejail rely on bind mounts as well. Perhaps you have some unique factors.
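The st_dev expectation above is easy to check without root: two directories on the same filesystem report the same device number, and a bind mount shares st_dev with its source, which is why a plain stat-based check (such as find's -xdev) cannot tell a bind mount apart from an ordinary directory on the same filesystem:

```shell
#!/bin/bash
# Two fresh directories on the same filesystem (/tmp) share st_dev.
# A bind mount of one onto the other would report that same device number,
# so nothing that looks only at st_dev can detect the bind mount.
dir1=$(mktemp -d)
dir2=$(mktemp -d)
stat -c '%d %n' "$dir1" "$dir2"
```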
Re: Having ten thousands of mount bind causes various processes to go into loops
On Fri, 14 Jun 2024 11:30:50 +0200, Julien Petit wrote the following:
> > What processes are CPU hungry?
>
> On a vanilla debian 11: udisksd, gvfs-udisks2-vo, (fstrim), find
>
> > Perhaps it is not a Debian-specific bug, just more active usage of
> > sandboxing in systemd. If some applications have troubles parsing
> > /proc/mounts then bugs should be filed against them.
>
> It seems to happen with all processes accessing mounts. And since
> disabling sandboxing with php fixed the problem for the php process,
> it looks like it is linked to sandboxing.
>
> > However do you need shared subtrees? It may cause exponential
> > growth of the number of mountpoints, see
>
> We only use mount bind to share an initial folder with other users
> with different access rights (rw or ro). So we probably don't need
> shared subtrees (as long as mount bind doesn't rely on it). I'm not
> really familiar with subtrees though. In my understanding, it is used
> for chroot or containers and that's something we don't use. When i
> list our mounts, it seems they are by default in shared mode. If the
> default before was "private", it might be why it used to work and it
> stopped.
> I'm gonna test the effect of setting them to private.
>
> Thanks for your help

Just to learn about it: what about using acl rather than bind mounts? What would be the problem with that solution? Thanks.
Re: Having ten thousands of mount bind causes various processes to go into loops
On 14.06.24 11:38, Julien Petit wrote:
> We use the mounts to share an initial folder with either rw or ro rights
> in a user directory. The user directory is then accessible through a web
> interface, sftp, webdav and rsync. There are probably better ways to do
> that now but that's a legacy app (2009) that we'd rather leave alone :)

Whether there's a better way should be judged on what exactly that app expects. For the web interface, maybe the http server - or whatever makes the web interface accessible to the users - can limit permissions. For the rest of the use cases, it would be interesting which circumstances would need to be fulfilled for a user to be able to change permissions on a file they own. And whether they could even change the permissions through sftp, webdav or rsync. Because if not, the simplest fix would be a cron job that periodically sets the permissions on the directory, so you don't need a dedicated mount. But maybe you want to create a separate topic where you describe exactly what the basic requirements are and ask for suggestions on what the best solution may be. Maybe something like AppArmor rules or other methods that aren't known by your typical user could be a better solution.

> Yes, not urgent and very specific. I'm going to try to set the mounts to
> private as Max suggested and see how it goes. Thanks for your help.

If you haven't already, remember to create a bug report and include as much detail and logs as you can gather, as people will need to be able to tell what the actual issue is. Maybe it's a limitation of the file system, of the hardware or something else.

Richard
Re: Having ten thousands of mount bind causes various processes to go into loops
> Best question probably is: what exactly are you needing 14.000 mounts for?
> Even snaps shouldn't be that ridiculous. So what's your use case? Maybe
> there's a better solution to what you are doing. If it's just about having
> a place that is rw only without execution permissions, just create a
> separate partition, mount it somewhere - e.g. /home/test/mounts - and tell
> mount/fstab to use the option noexec. No need for your script. Or if it's
> a more advanced file system like btrfs you may be able to simply create a
> subvolume with the same capabilities, no need to tinker around with
> partitions.

We use the mounts to share an initial folder with either rw or ro rights in a user directory. The user directory is then accessible through a web interface, sftp, webdav and rsync. There are probably better ways to do that now but that's a legacy app (2009) that we'd rather leave alone :)

> It's true this issue should be looked into, but it doesn't look urgent as
> long as there are alternatives.

Yes, not urgent and very specific. I'm going to try to set the mounts to private as Max suggested and see how it goes. Thanks for your help.
Re: Re: Having ten thousands of mount bind causes various processes to go into loops
> What processes are CPU hungry?

On a vanilla debian 11: udisksd, gvfs-udisks2-vo, (fstrim), find

> Perhaps it is not a Debian-specific bug, just more active usage of
> sandboxing in systemd. If some applications have troubles parsing
> /proc/mounts then bugs should be filed against them.

It seems to happen with all processes accessing mounts. And since disabling sandboxing with php fixed the problem for the php process, it looks like it is linked to sandboxing.

> However do you need shared subtrees? It may cause exponential growth of
> the number of mountpoints, see

We only use mount bind to share an initial folder with other users with different access rights (rw or ro). So we probably don't need shared subtrees (as long as mount bind doesn't rely on them). I'm not really familiar with subtrees though. In my understanding, they are used for chroot or containers and that's something we don't use. When i list our mounts, it seems they are by default in shared mode. If the default before was "private", it might be why it used to work and then stopped. I'm gonna test the effect of setting them to private.

Thanks for your help
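Propagation can be inspected without root via findmnt; switching a mount (or a whole tree) to private needs root, so those lines are commented, with paths mirroring the test script:

```shell
# Show the propagation mode (shared/private/slave) of each mount;
# this works unprivileged:
findmnt -o TARGET,PROPAGATION | head -n 5

# With root, switch one bind mount, or a whole tree recursively, to private:
# mount --make-private /home/test/mounts/dir_1
# mount --make-rprivate /home/test/mounts
```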
Re: Having ten thousands of mount bind causes various processes to go into loops
Best question probably is: what exactly are you needing 14.000 mounts for? Even snaps shouldn't be that ridiculous. So what's your use case? Maybe there's a better solution to what you are doing. If it's just about having a place that is rw only without execution permissions, just create a separate partition, mount it somewhere - e.g. /home/test/mounts - and tell mount/fstab to use the option noexec. No need for your script. Or if it's a more advanced file system like btrfs you may be able to simply create a subvolume with the same capabilities, no need to tinker around with partitions. It's true this issue should be looked into, but it doesn't look urgent as long as there are alternatives.

Richard

On Wed, 12 Jun 2024 at 16:33, Julien Petit wrote:
> Dear,
>
> Not sure i should report a bug so here is a report first. For more
> than 10 years now, we've been using mount binds to create shares rw or
> ro. It's been working perfectly under older Debian. A few months ago,
> we migrated to Ubuntu Jammy and started having processes running 100%
> non stop. While examining the processes in question, we could see the
> same thing: it seemed to be reading all the mounts indefinitely.
> It started with the phpsessionclean.service. We managed to fix it
> editing /lib/systemd/system/phpsessionclean.service and disabling
> sandboxing entries. But then it started to happen with other
> processes.
> Anything related to systemd seems affected in a way. For instance, we
> cannot start haproxy if the mounts are mounted.
> We tested with the last Debian and it is affected too.
>
> We understand that 14 000 mounts is a lot. So maybe our usage will be
> questioned. But this has been working for ages so why not now?
>
> The problem can be very easily reproduced:
>
> 1. Launch the latest Debian stable
> 2. Execute the following script to create mounts:
>
> #!/bin/bash
> mkdir /home/test/directories
> mkdir /home/test/mounts
>
> for i in {1..14000}
> do
>     echo "Mounting dir $i"
>     mkdir "/home/test/directories/dir_$i"
>     mkdir "/home/test/mounts/dir_$i"
>     mount --bind -o rw "/home/test/directories/dir_$i" "/home/test/mounts/dir_$i"
> done
>
> After that, the "top" command will show processes getting stuck using
> 100% of CPU never ending.
>
> Has anyone a clue if this is fixable? Should i report a bug?
> Thanks for your help.
Re: Having ten thousands of mount bind causes various processes to go into loops
On 12/06/2024 17:02, Julien Petit wrote:
> for i in {1..14000}
> do
>     echo "Mounting dir $i"
>     mkdir "/home/test/directories/dir_$i"
>     mkdir "/home/test/mounts/dir_$i"
>     mount --bind -o rw "/home/test/directories/dir_$i" "/home/test/mounts/dir_$i"
> done
>
> After that, the "top" command will show processes getting stuck using
> 100% of CPU never ending.

What processes are CPU hungry?

> Has anyone a clue if this is fixable? Should i report a bug?

Perhaps it is not a Debian-specific bug, just more active usage of sandboxing in systemd. If some applications have trouble parsing /proc/mounts then bugs should be filed against them.

However, do you need shared subtrees? They may cause exponential growth of the number of mountpoints, see
https://docs.kernel.org/filesystems/sharedsubtree.html
https://manpages.debian.org/bookworm/mount/mount.8.en.html#Shared_subtree_operations
Having ten thousands of mount bind causes various processes to go into loops
Dear,

Not sure i should report a bug so here is a report first. For more than 10 years now, we've been using mount binds to create shares rw or ro. It's been working perfectly under older Debian. A few months ago, we migrated to Ubuntu Jammy and started having processes running 100% non stop. While examining the processes in question, we could see the same thing: it seemed to be reading all the mounts indefinitely. It started with the phpsessionclean.service. We managed to fix it editing /lib/systemd/system/phpsessionclean.service and disabling sandboxing entries. But then it started to happen with other processes. Anything related to systemd seems affected in a way. For instance, we cannot start haproxy if the mounts are mounted. We tested with the last Debian and it is affected too.

We understand that 14 000 mounts is a lot. So maybe our usage will be questioned. But this has been working for ages so why not now?

The problem can be very easily reproduced:

1. Launch the latest Debian stable
2. Execute the following script to create mounts:

#!/bin/bash
mkdir /home/test/directories
mkdir /home/test/mounts

for i in {1..14000}
do
    echo "Mounting dir $i"
    mkdir "/home/test/directories/dir_$i"
    mkdir "/home/test/mounts/dir_$i"
    mount --bind -o rw "/home/test/directories/dir_$i" "/home/test/mounts/dir_$i"
done

After that, the "top" command will show processes getting stuck using 100% of CPU, never ending.

Has anyone a clue if this is fixable? Should i report a bug? Thanks for your help.
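As an aside, rather than editing the shipped unit file under /lib, the sandboxing entries can be cleared with a drop-in override that survives package upgrades. The directive names below are illustrative assumptions, since the exact sandboxing options phpsessionclean.service sets may vary by release; clear whichever ones the shipped unit actually contains:

```ini
# /etc/systemd/system/phpsessionclean.service.d/no-sandbox.conf
# Create with "systemctl edit phpsessionclean.service", then run
# "systemctl daemon-reload". Directive names are illustrative; match
# whatever sandboxing options the shipped unit actually sets.
[Service]
PrivateTmp=false
ProtectSystem=no
ProtectHome=no
```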
Re: [SOLVED?] Re: BIND: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out
On Tue, Mar 14, 2023 at 08:05:55PM +, Darac Marjal wrote: > On 13/03/2023 23:23, Greg Wooledge wrote: > > I have not to this day figured out what "vendor preset" means here. > It would appear to be > https://www.freedesktop.org/software/systemd/man/systemd.preset.html. If I'm > reading the introduction correctly, this is systemd's equivalent to Debian's > policy-rc.d, inasmuch as it's a place to define whether a service starts (or > not) _before_ installing the package. Hmm. Well, on my system there's a /lib/systemd/system-preset/90-systemd.preset file which contains, among other things: enable systemd-resolved.service I'm guessing this file came straight from upstream, and hasn't been modified by Debian. I'm also guessing this file is read by "systemctl status" (or something acting on its behalf) to generate that "vendor preset: enabled" verbiage. But because Debian doesn't actually *adhere* to all of the upstream semantics, this file and the "vendor preset" verbiage are not correct. They don't reflect reality on a Debian system. Which is fine. I'll just continue ignoring the "vendor preset" section as I have been doing. I mean, it would be *nice* if it actually showed the Debian system defaults rather than the upstream defaults, but I can live without that.
Re: [SOLVED?] Re: BIND: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out
On 13/03/2023 23:23, Greg Wooledge wrote:
> On Tue, Mar 14, 2023 at 07:04:02AM +0800, Jeremy Ardley wrote:
>> I replicated your test above and it seems your listing has been
>> accidentally truncated...
> Pipe it through cat to avoid the "left/right scrolling" crap. If you
> want to do this regularly, you can set SYSTEMD_PAGER=cat
>
>> jeremy@testldap:~$ systemctl status systemd-resolved
>> ● systemd-resolved.service - Network Name Resolution
>>      Loaded: loaded (/lib/systemd/system/systemd-resolved.service; disabled; vendor preset: enabled)
>>      Active: inactive (dead)
>>        Docs: man:systemd-resolved.service(8)
>>              man:org.freedesktop.resolve1(5)
>>              https://www.freedesktop.org/wiki/Software/systemd/writing-network-configuration-managers
>>              https://www.freedesktop.org/wiki/Software/systemd/writing-resolver-clients
>>
>> It would seem the debian default is enabled? See vendor preset below.
> I have not to this day figured out what "vendor preset" means here.

It would appear to be https://www.freedesktop.org/software/systemd/man/systemd.preset.html. If I'm reading the introduction correctly, this is systemd's equivalent to Debian's policy-rc.d, inasmuch as it's a place to define whether a service starts (or not) _before_ installing the package.

> Mine shows the same as yours -- "disabled; vendor preset: enabled".
> All I care about is the part that says "disabled". That's the actual state.
Re: [SOLVED?] Re: BIND: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out
Mar 13, 2023, 23:33 by jer...@ardley.org: > You may be happy to learn you can't even install it as a separate package any > more. > > apt install --reinstall systemd-resolved > Reading package lists... Done > Building dependency tree... Done > Reading state information... Done > Package systemd-resolved is not available, but is referred to by another > package. > This may mean that the package is missing, has been obsoleted, or > is only available from another source > ... > > So the mystery is how it gets onto a system using a standard install and > which package it comes from now and what is done with any presets > On Debian 12 Bookworm it can be done: # aptitude show systemd-resolved Package: systemd-resolved Version: 252.6-1 New: yes State: not installed ... # aptitude install systemd-resolved --simulate The following NEW packages will be installed: libnss-myhostname{a} libnss-resolve{a} systemd-resolved 0 packages upgraded, 3 newly installed, 0 to remove and 0 not upgraded. Need to get 484 kB of archives. After unpacking 1,234 kB will be used. Note: Using 'Simulate' mode. Do you want to continue? [Y/n/?] n Abort. Regards,
Re: [SOLVED?] Re: BIND: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out
On Tue, Mar 14, 2023 at 07:33:00AM +0800, Jeremy Ardley wrote: > So the mystery is how it gets onto a system using a standard install and > which package it comes from now and what is done with any presets unicorn:~$ dpkg -S systemd-resolved systemd: /usr/share/man/man8/systemd-resolved.8.gz systemd: /lib/systemd/systemd-resolved systemd: /lib/systemd/system/systemd-resolved.service systemd: /usr/share/man/man8/systemd-resolved.service.8.gz It's installed by default, but it's not ENABLED by default.
Re: [SOLVED?] Re: BIND: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out
On 14/3/23 07:23, Greg Wooledge wrote: I have not to this day figured out what "vendor preset" means here. Mine shows the same as yours -- "disabled; vendor preset: enabled". All I care about is the part that says "disabled". That's the actual state. You may be happy to learn you can't even install it as a separate package any more. apt install --reinstall systemd-resolved Reading package lists... Done Building dependency tree... Done Reading state information... Done Package systemd-resolved is not available, but is referred to by another package. This may mean that the package is missing, has been obsoleted, or is only available from another source It's on the test VM I built recently, so it comes from somewhere. The general documentation says that if you install it as a package then it will rewrite various config files to take over the machine's DNS. So the mystery is how it gets onto a system using a standard install, which package it comes from now, and what is done with any presets. -- Jeremy (Lists)
Re: [SOLVED?] Re: BIND: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out
On Tue, Mar 14, 2023 at 07:04:02AM +0800, Jeremy Ardley wrote: > I replicated your test above and it seems your listing has been accidentally > truncated... Pipe it through cat to avoid the "left/right scrolling" crap. > jeremy@testldap:~$ systemctl status systemd-resolved > ● systemd-resolved.service - Network Name Resolution > Loaded: loaded (/lib/systemd/system/systemd-resolved.service; disabled; > vendor preset: enabled) > Active: inactive (dead) > Docs: man:systemd-resolved.service(8) > man:org.freedesktop.resolve1(5) > https://www.freedesktop.org/wiki/Software/systemd/writing-network-configuration-managers > https://www.freedesktop.org/wiki/Software/systemd/writing-resolver-clients > > It would seem the debian default is enabled? See vendor preset below. I have not to this day figured out what "vendor preset" means here. Mine shows the same as yours -- "disabled; vendor preset: enabled". All I care about is the part that says "disabled". That's the actual state.
Re: [SOLVED?] Re: BIND: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out
On 14/3/23 06:34, Greg Wooledge wrote: On Tue, Mar 14, 2023 at 06:23:09AM +0800, Jeremy Ardley wrote: FYI systemd-resolved is the inbuilt debian caching DNS server which may be enabled by default. It is NOT enabled by default. unicorn:~$ systemctl status systemd-resolved ● systemd-resolved.service - Network Name Resolution Loaded: loaded (/lib/systemd/system/systemd-resolved.service; disabled; ve> Active: inactive (dead) Docs: man:systemd-resolved.service(8) man:org.freedesktop.resolve1(5) https://www.freedesktop.org/wiki/Software/systemd/writing-network-> https://www.freedesktop.org/wiki/Software/systemd/writing-resolver> I replicated your test above and it seems your listing has been accidentally truncated... jeremy@testldap:~$ systemctl status systemd-resolved ● systemd-resolved.service - Network Name Resolution Loaded: loaded (/lib/systemd/system/systemd-resolved.service; disabled; vendor preset: enabled) Active: inactive (dead) Docs: man:systemd-resolved.service(8) man:org.freedesktop.resolve1(5) https://www.freedesktop.org/wiki/Software/systemd/writing-network-configuration-managers https://www.freedesktop.org/wiki/Software/systemd/writing-resolver-clients It would seem the debian default is enabled? See vendor preset below. Loaded: loaded (/lib/systemd/system/systemd-resolved.service; disabled; vendor preset: enabled) -- Jeremy (Lists)
Re: [SOLVED?] Re: BIND: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out
On 14/3/23 06:34, Greg Wooledge wrote: On Tue, Mar 14, 2023 at 06:23:09AM +0800, Jeremy Ardley wrote: FYI systemd-resolved is the inbuilt debian caching DNS server which may be enabled by default. It is NOT enabled by default. It is if you are using NetworkManager -- Jeremy (Lists)
Re: [SOLVED?] Re: BIND: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out
On 14/3/23 06:23, Jeremy Ardley wrote: I had a signed DNS error in a similar configuration using a bind authoritative and caching server. It turned out it was systemd-resolved interfering and/or replacing part of the DNS chain. FYI systemd-resolved is the inbuilt debian caching DNS server which may be enabled by default. If you run that you don't need a bind9 caching name server. What does this report? systemctl status systemd-resolved If there is anything there at all, check logs. You may find something. Also FYI you can run bind9 and systemd-resolved at the same time and set bind9 to use systemd-resolved as forwarder: options { directory "/var/cache/bind"; // Use systemd-resolved as a DNS resolver forwarders { 127.0.0.53 port 53; }; dnssec-validation auto; auth-nxdomain no; # conform to RFC1035 ... It's probably a good idea to not be too keen on dnssec validation - as above. -- Jeremy (Lists)
Re: [SOLVED?] Re: BIND: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out
On Tue, Mar 14, 2023 at 06:23:09AM +0800, Jeremy Ardley wrote: > FYI systemd-resolved is the inbuilt debian caching DNS server which may be > enabled by default. It is NOT enabled by default. unicorn:~$ systemctl status systemd-resolved ● systemd-resolved.service - Network Name Resolution Loaded: loaded (/lib/systemd/system/systemd-resolved.service; disabled; ve> Active: inactive (dead) Docs: man:systemd-resolved.service(8) man:org.freedesktop.resolve1(5) https://www.freedesktop.org/wiki/Software/systemd/writing-network-> https://www.freedesktop.org/wiki/Software/systemd/writing-resolver>
Re: [SOLVED?] Re: BIND: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out
On Mon, Mar 13, 2023 at 11:14:20PM +0100, local10 wrote: > Strangely, the issue resolved itself without me having to do anything. Am > really puzzled as to what it was. Perhaps the internet provider suddenly > started to block DNS queries but then allowed them again? If so, why did > dig's message say that there was "communications error to 127.0.0.1#53: timed > out"? It really gives an impression that dig was failing to connect 127.0.0.1 > port 53, on which bind was running. > > # dig www.yahoo.com <http://www.yahoo.com> > ;; communications error to 127.0.0.1#53: timed out > ;; communications error to 127.0.0.1#53: timed out > ... > > Maybe someone will shed some light on this. UDP doesn't have a "connection". The client sends a datagram (a one-way message) to the UDP service, and then waits to receive a reply. If the UDP service in turn sends a datagram to a third party, and waits for a reply, but never receives one... and thus never responds to the original client... then all the client knows is that it never got a response. It doesn't know why.
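[Editor's note: Greg's point - that a UDP client which hears nothing back only knows "no reply within the timeout", not *why* - can be shown in a few lines. A minimal sketch: wait on a UDP socket that nobody will ever send to; the timeout carries no information about the cause, which is exactly the symptom dig reported.]

```python
import socket

# Bind a UDP socket to an OS-assigned loopback port. No one will ever
# send a datagram to it, so recvfrom() can only time out -- and the
# timeout says nothing about WHY no reply arrived.
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.bind(("127.0.0.1", 0))
s.settimeout(0.2)
try:
    s.recvfrom(512)
    outcome = "got a reply"
except socket.timeout:
    outcome = "timed out"  # same bare symptom dig reports, no cause attached
finally:
    s.close()

print(outcome)
```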
Re: [SOLVED?] Re: BIND: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out
On 14/3/23 06:14, local10 wrote: Strangely, the issue resolved itself without me having to do anything. Am really puzzled as to what it was. Perhaps the internet provider suddenly started to block DNS queries but then allowed them again? If so, why did dig's message say that there was "communications error to 127.0.0.1#53: timed out"? It really gives the impression that dig was failing to connect to 127.0.0.1 port 53, on which bind was running. # dig www.yahoo.com ;; communications error to 127.0.0.1#53: timed out ;; communications error to 127.0.0.1#53: timed out ... Maybe someone will shed some light on this. Thanks to everyone who responded. I had a signed DNS error in a similar configuration using a bind authoritative and caching server. It turned out it was systemd-resolved interfering and/or replacing part of the DNS chain. FYI systemd-resolved is the inbuilt debian caching DNS server which may be enabled by default. If you run that you don't need a bind9 caching name server. What does this report? systemctl status systemd-resolved If there is anything there at all, check logs. You may find something. -- Jeremy (Lists)
Re: [SOLVED?] BIND: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out
> On Mar 13, 2023, at 4:14 PM, local10 wrote: > > Mar 13, 2023, 21:42 by recovery...@enotuniq.net: > >> Well, it was worth to check it. >> >> >> Next idea is somewhat more complicated. >> >> Install tcpdump. >> Run: >> tcpdump -pni any -s0 -w /tmp/dns.pcap -c 30 udp port 53 or tcp port 53 >> Bounce BIND, wait for a minute at least. >> Do some DNS queries. One or two will do. >> Interrupt tcpdump unless it completes by itself. >> Post dns.pcap. >> > > > Strangely, the issue resolved itself without me having to do anything. Am > really puzzled as to what it was. Perhaps the internet provider suddenly > started to block DNS queries but then allowed them again? Hard to tell without further data, but it's possible. > If so, why did dig's message say that there was "communications error to > 127.0.0.1#53: timed out"? It really gives an impression that dig was failing > to connect 127.0.0.1 port 53, on which bind was running. > > # dig www.yahoo.com <http://www.yahoo.com> > ;; communications error to 127.0.0.1#53: timed out > ;; communications error to 127.0.0.1#53: timed out > ... > > Maybe someone will shed some light on this. This one is a little misleading. The fact is that BIND tries really hard to resolve your name, trying all sorts of alternate servers and fallbacks to account for timeouts, DNSSEC validation failures, and more. Sometimes that can take a really long time. In one of the outputs you provided previously, you showed "timed out" followed by SERVFAIL. Those are symptoms of this behavior: first query times out with resolver trying things and second query returned the cached (SERVFAIL) failure. Casey
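[Editor's note: Casey's "timed out, then SERVFAIL" pattern can be modelled with a toy negative cache. This is illustrative only - the timings and cache policy here are made up, not BIND's actual internals: the first lookup burns its whole budget retrying (the client sees a timeout), the failure is cached, and a repeat lookup fails fast with SERVFAIL.]

```python
# Toy negative cache: the first query exhausts its retry budget and the
# client experiences a timeout; the failure is recorded, so a second
# query for the same name returns the cached SERVFAIL immediately.
servfail_cache = {}

def resolve(name, now, budget=5.0, ttl=30.0):
    gave_up_at = servfail_cache.get(name)
    if gave_up_at is not None and now - gave_up_at < ttl:
        return "SERVFAIL (cached)"       # fast failure on repeat queries
    # ... imagine `budget` seconds of retries against alternate servers here ...
    servfail_cache[name] = now + budget  # record when the attempt gave up
    return "timed out"                   # what the first client experiences

first = resolve("www.yahoo.com", now=0.0)
second = resolve("www.yahoo.com", now=6.0)
print(first, "/", second)
```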
[SOLVED?] Re: BIND: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out
Mar 13, 2023, 21:42 by recovery...@enotuniq.net: > Well, it was worth to check it. > > > Next idea is somewhat more complicated. > > Install tcpdump. > Run: > tcpdump -pni any -s0 -w /tmp/dns.pcap -c 30 udp port 53 or tcp port 53 > Bounce BIND, wait for a minute at least. > Do some DNS queries. One or two will do. > Interrupt tcpdump unless it completes by itself. > Post dns.pcap. > Strangely, the issue resolved itself without me having to do anything. Am really puzzled as to what it was. Perhaps the internet provider suddenly started to block DNS queries but then allowed them again? If so, why did dig's message say that there was "communications error to 127.0.0.1#53: timed out"? It really gives an impression that dig was failing to connect 127.0.0.1 port 53, on which bind was running. # dig www.yahoo.com <http://www.yahoo.com> ;; communications error to 127.0.0.1#53: timed out ;; communications error to 127.0.0.1#53: timed out ... Maybe someone will shed some light on this. Thanks to everyone who responded.
Re: BIND: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out
Hi. On Mon, Mar 13, 2023 at 08:53:35PM +0100, local10 wrote: > Mar 13, 2023, 12:06 by recovery...@enotuniq.net: > > > Looks correct, assuming that the contents of the key start with AwEAAaz > > and end with V74bU=. > > > > > > Look at /usr/share/dns/root.key. Compare its contents with > > /etc/bind/bind.keys. Replace the latter if needed. > > > > "dpkg-reconfigure -plow bind9" is probably more preferred way of doing > > it. > > The keys in the "/etc/bind/bind.keys" and "/usr/share/dns/root.key" are > identical: Well, it was worth checking. The next idea is somewhat more complicated. Install tcpdump. Run: tcpdump -pni any -s0 -w /tmp/dns.pcap -c 30 udp port 53 or tcp port 53 Bounce BIND, wait for a minute at least. Do some DNS queries. One or two will do. Interrupt tcpdump unless it completes by itself. Post dns.pcap. Reco
Re: BIND: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out
Mar 13, 2023, 11:50 by mv...@free.fr: > Did you check memory and disk space as suggested by jeremy ? > There's plenty of free RAM (4GB) and disk space (hundreds of GBs). Regards,
Re: BIND: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out
Mar 13, 2023, 14:11 by ca...@deccio.net: > Based on what I saw in the logs, your resolver is having trouble reaching the > internet. It shows problems with both the priming query (./NS) and the trust > query (./DNSKEY). Could you try running the following? > > $ dig +norec @198.41.0.4 . NS > $ dig +norec @2001:503:ba3e::2:30 . NS > $ dig +norec @198.41.0.4 . DNSKEY > $ dig +norec @2001:503:ba3e::2:30 . DNSKEY > > These manually send the same queries to the internet that your resolver is > attempting. > > Cheers, > Casey > $ dig +norec @198.41.0.4 . NS ; <<>> DiG 9.18.12-1-Debian <<>> +norec @198.41.0.4 . NS ; (1 server found) ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 19016 ;; flags: qr aa; QUERY: 1, ANSWER: 13, AUTHORITY: 0, ADDITIONAL: 27 ;; OPT PSEUDOSECTION: ; EDNS: version: 0, flags:; udp: 4096 ;; QUESTION SECTION: ;. IN NS ;; ANSWER SECTION: . 518400 IN NS e.root-servers.net. . 518400 IN NS h.root-servers.net. . 518400 IN NS l.root-servers.net. . 518400 IN NS i.root-servers.net. . 518400 IN NS a.root-servers.net. . 518400 IN NS d.root-servers.net. . 518400 IN NS c.root-servers.net. . 518400 IN NS b.root-servers.net. . 518400 IN NS j.root-servers.net. . 518400 IN NS k.root-servers.net. . 518400 IN NS g.root-servers.net. . 518400 IN NS m.root-servers.net. . 518400 IN NS f.root-servers.net. ;; ADDITIONAL SECTION: e.root-servers.net. 518400 IN A 192.203.230.10 e.root-servers.net. 518400 IN AAAA 2001:500:a8::e h.root-servers.net. 518400 IN A 198.97.190.53 h.root-servers.net. 518400 IN AAAA 2001:500:1::53 l.root-servers.net. 518400 IN A 199.7.83.42 l.root-servers.net. 518400 IN AAAA 2001:500:9f::42 i.root-servers.net. 518400 IN A 192.36.148.17 i.root-servers.net. 518400 IN AAAA 2001:7fe::53 a.root-servers.net. 518400 IN A 198.41.0.4 a.root-servers.net. 518400 IN AAAA 2001:503:ba3e::2:30 d.root-servers.net. 518400 IN A 199.7.91.13 d.root-servers.net. 518400 IN AAAA 2001:500:2d::d c.root-servers.net. 518400 IN A 192.33.4.12 c.root-servers.net. 518400 IN AAAA 2001:500:2::c b.root-servers.net. 518400 IN A 199.9.14.201 b.root-servers.net. 518400 IN AAAA 2001:500:200::b j.root-servers.net. 518400 IN A 192.58.128.30 j.root-servers.net. 518400 IN AAAA 2001:503:c27::2:30 k.root-servers.net. 518400 IN A 193.0.14.129 k.root-servers.net. 518400 IN AAAA 2001:7fd::1 g.root-servers.net. 518400 IN A 192.112.36.4 g.root-servers.net. 518400 IN AAAA 2001:500:12::d0d m.root-servers.net. 518400 IN A 202.12.27.33 m.root-servers.net. 518400 IN AAAA 2001:dc3::35 f.root-servers.net. 518400 IN A 192.5.5.241 f.root-servers.net. 518400 IN AAAA 2001:500:2f::f ;; Query time: 43 msec ;; SERVER: 198.41.0.4#53(198.41.0.4) (UDP) ;; WHEN: Mon Mar 13 15:54:28 EDT 2023 ;; MSG SIZE rcvd: 811 # Note that I'm running bind with "-4" option, that is, IPv4 only $ dig +norec @2001:503:ba3e::2:30 . NS ;; UDP setup with 2001:503:ba3e::2:30#53(2001:503:ba3e::2:30) for . failed: network unreachable. ;; UDP setup with 2001:503:ba3e::2:30#53(2001:503:ba3e::2:30) for . failed: network unreachable. ;; UDP setup with 2001:503:ba3e::2:30#53(2001:503:ba3e::2:30) for . failed: network unreachable. $ dig +norec @198.41.0.4 . DNSKEY ; <<>> DiG 9.18.12-1-Debian <<>> +norec @198.41.0.4 . DNSKEY ; (1 server found) ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 60299 ;; flags: qr aa; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1 ;; OPT PSEUDOSECTION: ; EDNS: version: 0, flags:; udp: 1472 ;; QUESTION SECTION: ;. IN DNSKEY ;; ANSWER SECTION: . 172800 IN DNSKEY 256 3 8 AwEAAcVnO2jZFx4756Rb/yAhJnsl72eemsObU43nclmXwqdJlp+kC5WQ jGYkqLT5xkaUCPhkr4NKLLrIBZXeSGazc6gx/yrrMtUpXcQvax6kfDJP Tu974OmeEbtjyyP7ZG5tUfSwNWt/4EuxDNmZTESG8jU0ZLjYIB11pK0g SXQbMVPyIyGtFGHMPx6UxWn6zUzpECWRFbqEvkA6Y13zeJ1jG2Rki/zs 7a/o13FTl/kI9013Eh6l6Kc2zxbc14GS8fpM0/xQIrZZyeiXj/2C4Rcs PeqWuNj9m0qSQrbrCZtLHb20U8x1uue4iwSX9y7LpwZd6vjYd1d
Re: BIND: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out
Mar 13, 2023, 12:06 by recovery...@enotuniq.net: > Looks correct, assuming that the contents of the key start with AwEAAaz > and end with V74bU=. > > > Look at /usr/share/dns/root.key. Compare its contents with > /etc/bind/bind.keys. Replace the latter if needed. > > "dpkg-reconfigure -plow bind9" is probably more preferred way of doing > it. > The keys in the "/etc/bind/bind.keys" and "/usr/share/dns/root.key" are identical: # cat /etc/bind/bind.keys ... trust-anchors { # This key (20326) was published in the root zone in 2017. . initial-key 257 3 8 "AwEAAaz/tAm8yTn4Mfeh5eyI96WSVexTBAvkMgJzkKTOiW1vkIbzxeF3 +/4RgWOq7HrxRixHlFlExOLAJr5emLvN7SWXgnLh4+B5xQlNVz8Og8kv ArMtNROxVQuCaSnIDdD5LKyWbRd2n9WGe2R8PzgCmr3EgVLrjyBxWezF 0jLHwVN8efS3rCj/EWgvIWgb9tarpVUDK/b58Da+sqqls3eNbuv7pr+e oZG+SrDK6nWeL3c6H5Apxz7LjVc1uTIdsIXxuOLYA4/ilBmSVIzuDWfd RUfhHdY6+cn8HFRm+2hM8AnXGXws9555KrUB5qihylGa8subX2Nn6UwN R1AkUTV74bU="; }; Regards,
Re: BIND: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out
> On Mar 13, 2023, at 12:08 AM, local10 wrote: > > I have a local caching DNS server that was working fine for a long time but > today, all of a sudden, it stopped resolving queries. > > More info: https://pastebin.com/iW5YeXgS > > Any ideas? Thanks Based on what I saw in the logs, your resolver is having trouble reaching the internet. It shows problems with both the priming query (./NS) and the trust query (./DNSKEY). Could you try running the following? $ dig +norec @198.41.0.4 . NS $ dig +norec @2001:503:ba3e::2:30 . NS $ dig +norec @198.41.0.4 . DNSKEY $ dig +norec @2001:503:ba3e::2:30 . DNSKEY These manually send the same queries to the internet that your resolver is attempting. Cheers, Casey
Re: BIND: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out
On Mon, Mar 13, 2023 at 12:29:44PM +0100, local10 wrote: > Mar 13, 2023, 10:57 by recovery...@enotuniq.net: > > > And now to the serious stuff. > > > > First things first, the log. > > > > Mar 13 05:03:18 tst named[52836]: 13-Mar-2023 05:03:18.963 queries: info: > > client @0x7f7812816d68 127.0.0.1#38800 (www.yahoo.com > > <http://www.yahoo.com>): query: > > www.yahoo.com <http://www.yahoo.com> IN A +E(0)K (127.0.0.1) > > Mar 13 05:03:21 tst named[52836]: 13-Mar-2023 05:03:21.631 dnssec: warning: > > managed-keys-zone: Unable to fetch DNSKEY set '.': timed out > > > > The keyword here is not "managed-keys-zone", it's "dnssec". > > > > Second, to put it bluntly, if you force bind9 to do DNSSEC validation > > (which is enabled by default), bind9 won't be able to lookup anything > > unless it is trusting root DNSSEC key. Like, for your own security and > > stuff :) > > > > Third, as every DNSSEC key, root zone keys have their expiration. > > Meaning, you did not have to change anything to break your setup, every > > time you deal with DNSSEC you're dealing with a ticking bomb anyway. > > > > Fourth, Debian packaging helpfully forces bind9 to depend on dns-root-data, > > which should provide a current DNSSEC root key (KSK to be precise), but > > bind9 could also take said key from /etc/bind/bind.keys. > > > > > > In conclusion: > > > > 1) Check the contents of your /etc/bind/bind.keys, update if needed. > > 2) Check the version of your dns-root-data, versions above and including > > 2021011101 (aka ksk id 20326) are good. > > 3) Set "dnssec-validation no;" at named.conf.options as a last resort. > > 4) If you intend to troubleshoot DNS queries then consider installing > > tcpdump. The thing helps. > > > Very interesting, thanks. in the "bind.keys" I have only one entry: > > trust-anchors { > # This key (20326) was published in the root zone in 2017. > . 
initial-key 257 3 8 "...."; > }; Looks correct, assuming that the contents of the key start with AwEAAaz and end with V74bU=. > But in "/etc/bind/named.conf.options" I have "dnssec-validation > auto;", which, as I understand it should force bind to use the > built-in root key, no? Not exactly. "dnssec-validation auto;" should point BIND at /etc/bind/bind.keys. And bind.keys should be created (or updated) by debconf. At least debconf did it for me back in 2021 during buster->bullseye upgrade. > Anyhow, how would I know if an update of /etc/bind/bind.keys is needed (it's > not obvious just by looking at the key) Obviously you cannot know that ;) Luckily "Root KSK Rollovers", as they call it, are rare. Last one was in 2018, and the key (aka ksk id 20326) in question was released in 2017. > and, if so, how do I update it? Look at /usr/share/dns/root.key. Compare its contents with /etc/bind/bind.keys. Replace the latter if needed. "dpkg-reconfigure -plow bind9" is probably more preferred way of doing it. Reco
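[Editor's note: Reco's advice to compare /usr/share/dns/root.key with /etc/bind/bind.keys means comparing the base64 key material despite the two files using different layouts. A hedged sketch of that comparison follows; the embedded snippets only mimic the two formats shown in the thread, with "AwEAAaz/EXAMPLE KEYMATERIALV74bU=" standing in as a hypothetical shortened key, and real files on disk will differ in layout.]

```python
import re

# Sample content mimicking the two formats from the thread (the key
# material here is a made-up placeholder, not the real KSK):
bind_keys = '''trust-anchors {
    . initial-key 257 3 8 "AwEAAaz/EXAMPLE
        KEYMATERIALV74bU=";
};'''
root_key = '. 172800 IN DNSKEY 257 3 8 AwEAAaz/EXAMPLE KEYMATERIALV74bU='

def key_material(text):
    # Grab everything after the "257 3 8" flags/protocol/algorithm fields,
    # then drop quotes and whitespace so only the base64 blob remains.
    m = re.search(r'257\s+3\s+8\s+(.*?)(?:;|$)', text, re.S)
    return re.sub(r'[\s"]', '', m.group(1))

same = key_material(bind_keys) == key_material(root_key)
print("identical" if same else "DIFFERENT")
```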
Re: BIND: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out
On 13 March 2023, local wrote: > Sure, I could have used some public DNS server and I may have to do that if I > can't get this issue resolved. Still, I'd like to understand why BIND > suddenly stopped working[1] for me and how to fix it. > > Regards, > > 1. It was working fine yesterday and I haven't done any config changes since. Did you check memory and disk space as suggested by Jeremy?
Re: BIND: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out
Mar 13, 2023, 11:24 by g...@wooledge.org: > For the record: > > unicorn:~$ sudo ss -ntlp | grep :53 > [sudo] password for greg: > LISTEN 0 20 0.0.0.0:53 0.0.0.0:* > users:(("dnscache",pid=664,fd=4)) > > In general, ss replaces netstat for this kind of query. I don't know > all the options, so you may need to read the manual if this isn't > enough. > I see, thanks. The following is what I have: # ss -ntlp | grep :53 LISTEN 0 10 127.0.0.1:53 0.0.0.0:* users:(("named",pid=6233,fd=19)) LISTEN 0 10 127.0.0.1:53 0.0.0.0:* users:(("named",pid=6233,fd=20)) LISTEN 0 10 xxx.xxx.xxx.xxx:53 0.0.0.0:* users:(("named",pid=6233,fd=25)) LISTEN 0 10 xxx.xxx.xxx.xxx:53 0.0.0.0:* users:(("named",pid=6233,fd=26)) Regards,
Re: BIND: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out
Mar 13, 2023, 10:57 by recovery...@enotuniq.net: > And now to the serious stuff. > > First things first, the log. > > Mar 13 05:03:18 tst named[52836]: 13-Mar-2023 05:03:18.963 queries: info: > client @0x7f7812816d68 127.0.0.1#38800 (www.yahoo.com > <http://www.yahoo.com>): query: > www.yahoo.com <http://www.yahoo.com> IN A +E(0)K (127.0.0.1) > Mar 13 05:03:21 tst named[52836]: 13-Mar-2023 05:03:21.631 dnssec: warning: > managed-keys-zone: Unable to fetch DNSKEY set '.': timed out > > The keyword here is not "managed-keys-zone", it's "dnssec". > > Second, to put it bluntly, if you force bind9 to do DNSSEC validation > (which is enabled by default), bind9 won't be able to lookup anything > unless it is trusting root DNSSEC key. Like, for your own security and > stuff :) > > Third, as every DNSSEC key, root zone keys have their expiration. > Meaning, you did not have to change anything to break your setup, every > time you deal with DNSSEC you're dealing with a ticking bomb anyway. > > Fourth, Debian packaging helpfully forces bind9 to depend on dns-root-data, > which should provide a current DNSSEC root key (KSK to be precise), but > bind9 could also take said key from /etc/bind/bind.keys. > > > In conclusion: > > 1) Check the contents of your /etc/bind/bind.keys, update if needed. > 2) Check the version of your dns-root-data, versions above and including > 2021011101 (aka ksk id 20326) are good. > 3) Set "dnssec-validation no;" at named.conf.options as a last resort. > 4) If you intend to troubleshoot DNS queries then consider installing > tcpdump. The thing helps. > > Reco > Very interesting, thanks. in the "bind.keys" I have only one entry: trust-anchors { # This key (20326) was published in the root zone in 2017. . initial-key 257 3 8 ""; }; But in "/etc/bind/named.conf.options" I have "dnssec-validation auto;", which, as I understand it should force bind to use the built-in root key, no? 
Anyhow, how would I know if an update of /etc/bind/bind.keys is needed (it's not obvious just by looking at the key) and, if so, how do I update it? Regards,
Re: BIND: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out
On Mon, Mar 13, 2023 at 09:19:41AM +0100, local10 wrote: > Mar 13, 2023, 07:25 by jer...@ardley.org: > > > Try > > > > netstat -tulpnW | grep 53 > > > > and see what's listening > > > > Bind seems to be listening on 127.0.0.1 port 53. > > I don't have netstat installed and can't easily install it as aptitude can't > resolve Debian server's name to an IP, so the following is what I tried: For the record: unicorn:~$ sudo ss -ntlp | grep :53 [sudo] password for greg: LISTEN 0 20 0.0.0.0:53 0.0.0.0:* users:(("dnscache",pid=664,fd=4)) In general, ss replaces netstat for this kind of query. I don't know all the options, so you may need to read the manual if this isn't enough.
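[Editor's note: pulling the listener name and port out of `ss -ntlp` output is a common follow-up step. A small hedged sketch, using a sample line copied from this thread; the column layout can vary between ss versions, so treat this as illustrative parsing, not a robust tool.]

```python
import re

# One `ss -ntlp` line as posted in the thread (ss output, simplified spacing).
line = 'LISTEN 0 10 127.0.0.1:53 0.0.0.0:* users:(("named",pid=6233,fd=19))'

def listener(l):
    fields = l.split()
    addr = fields[3]                        # "Local Address:Port" column
    port = int(addr.rsplit(":", 1)[1])      # rsplit copes with IPv6 colons too
    proc = re.search(r'\("([^"]+)"', l).group(1)  # name inside users:(("..."))
    return proc, port

print(listener(line))
```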
Re: BIND: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out
Hi. On Mon, Mar 13, 2023 at 10:57:48AM +0100, local10 wrote: > Mar 13, 2023, 09:32 by jer...@ardley.org: > > > My next best option is simply to remove your bind caching server (it sounds > > like it's not really necessary in your application) > > > > Backup /etc/bind and /var/cache/bind > > then > > systemctl remove bind9 > > systemctl purge bind9 LOL. > > And then edit /etc/resolv.conf to > > > > nameserver 8.8.8.8 > > nameserver 8.8.4.4 And redirect all your DNS queries to Google. I mean, people, if you suggest using a public DNS you could at least consider suggesting a privacy-respecting one, like 9.9.9.9. > Sure, I could have used some public DNS server and I may have to do that if I > can't get this issue resolved. Still, I'd like to understand why BIND > suddenly stopped working[1] for me and how to fix it. And now to the serious stuff. First things first, the log. Mar 13 05:03:18 tst named[52836]: 13-Mar-2023 05:03:18.963 queries: info: client @0x7f7812816d68 127.0.0.1#38800 (www.yahoo.com <http://www.yahoo.com>): query: www.yahoo.com <http://www.yahoo.com> IN A +E(0)K (127.0.0.1) Mar 13 05:03:21 tst named[52836]: 13-Mar-2023 05:03:21.631 dnssec: warning: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out The keyword here is not "managed-keys-zone", it's "dnssec". Second, to put it bluntly, if you force bind9 to do DNSSEC validation (which is enabled by default), bind9 won't be able to lookup anything unless it is trusting root DNSSEC key. Like, for your own security and stuff :) Third, as every DNSSEC key, root zone keys have their expiration. Meaning, you did not have to change anything to break your setup, every time you deal with DNSSEC you're dealing with a ticking bomb anyway. Fourth, Debian packaging helpfully forces bind9 to depend on dns-root-data, which should provide a current DNSSEC root key (KSK to be precise), but bind9 could also take said key from /etc/bind/bind.keys. 
In conclusion: 1) Check the contents of your /etc/bind/bind.keys, update if needed. 2) Check the version of your dns-root-data, versions above and including 2021011101 (aka ksk id 20326) are good. 3) Set "dnssec-validation no;" at named.conf.options as a last resort. 4) If you intend to troubleshoot DNS queries then consider installing tcpdump. The thing helps. Reco
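[Editor's note: item 2 of Reco's checklist - dns-root-data versions at or above 2021011101 are good - amounts to comparing date-based serials (YYYYMMDDNN). A small sketch of that check; the idea that a Debian revision suffix like "+deb12u1" may trail the serial is an assumption for illustration.]

```python
# Check whether a dns-root-data version is new enough to carry KSK id 20326,
# per Reco's threshold of 2021011101 (YYYYMMDDNN date serial).
GOOD_FROM = 2021011101

def root_data_ok(version):
    # Keep only the leading digits, in case of a packaging suffix
    # such as "2023010101+deb12u1" (suffix format is an assumption).
    digits = ""
    for ch in version:
        if ch.isdigit():
            digits += ch
        else:
            break
    return bool(digits) and int(digits) >= GOOD_FROM

print(root_data_ok("2023010101"), root_data_ok("2019052802"))
```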
Re: BIND: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out
Mar 13, 2023, 09:32 by jer...@ardley.org: > My next best option is simply to remove your bind caching server (it sounds > like it's not really necessary in your application) > > Backup /etc/bind and /var/cache/bind > > then > > systemctl remove bind9 > > systemctl purge bind9 > > And then edit /etc/resolv.conf to > > nameserver 8.8.8.8 > nameserver 8.8.4.4 > > and with luck everything will work O.K. > > You can do variants on that to use your ISP DNS servers instead > ... > Sure, I could have used some public DNS server and I may have to do that if I can't get this issue resolved. Still, I'd like to understand why BIND suddenly stopped working[1] for me and how to fix it. Regards, 1. It was working fine yesterday and I haven't done any config changes since.
Re: BIND: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out
On 13/3/23 17:12, local10 wrote: "debug 1;" doesn't seem to be a valid option, couldn't start BIND with it. Anyhow, the following is what I get when running "dig www.yahoo.com" Mar 13 05:03:11 tst systemd[1]: Started named.service - BIND Domain Name Server. Mar 13 05:03:11 tst named[52836]: 13-Mar-2023 05:03:11.639 general: notice: running Mar 13 05:03:18 tst named[52836]: 13-Mar-2023 05:03:18.963 queries: info: client @0x7f7812816d68 127.0.0.1#38800 (www.yahoo.com <http://www.yahoo.com>): query: www.yahoo.com <http://www.yahoo.com> IN A +E(0)K (127.0.0.1) Mar 13 05:03:21 tst named[52836]: 13-Mar-2023 05:03:21.631 dnssec: warning: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out Mar 13 05:03:21 tst named[52836]: 13-Mar-2023 05:03:21.711 resolver: info: resolver priming query complete: timed out Mar 13 05:03:23 tst named[52836]: 13-Mar-2023 05:03:23.966 queries: info: client @0x7f7812817b68 127.0.0.1#51554 (www.yahoo.com <http://www.yahoo.com>): query: www.yahoo.com <http://www.yahoo.com> IN A +E(0)K (127.0.0.1) Mar 13 05:03:28 tst named[52836]: 13-Mar-2023 05:03:28.970 queries: info: client @0x7f78c9eb1168 127.0.0.1#42404 (www.yahoo.com <http://www.yahoo.com>): query: www.yahoo.com <http://www.yahoo.com> IN A +E(0)K (127.0.0.1) Mar 13 05:03:30 tst named[52836]: 13-Mar-2023 05:03:30.970 resolver: info: shut down hung fetch while resolving 'www.yahoo.com/A <http://www.yahoo.com/A>' Mar 13 05:03:30 tst named[52836]: 13-Mar-2023 05:03:30.970 query-errors: info: client @0x7f78c9eb1168 127.0.0.1#42404 (www.yahoo.com <http://www.yahoo.com>): query failed (operation canceled) for www.yahoo.com/IN/A <http://www.yahoo.com/IN/A> at query.c:7775 Mar 13 05:03:30 tst named[52836]: 13-Mar-2023 05:03:30.970 query-errors: info: client @0x7f7812816d68 127.0.0.1#38800 (www.yahoo.com <http://www.yahoo.com>): query failed (operation canceled) for www.yahoo.com/IN/A <http://www.yahoo.com/IN/A> at query.c:7775 Mar 13 05:03:30 tst named[52836]: 13-Mar-2023 05:03:30.970 
query-errors: info: client @0x7f7812817b68 127.0.0.1#51554 (www.yahoo.com): query failed (operation canceled) for www.yahoo.com/IN/A at query.c:7775 Mar 13 05:03:38 tst named[52836]: 13-Mar-2023 05:03:38.966 resolver: info: resolver priming query complete: timed out My next best option is simply to remove your bind caching server (it sounds like it's not really necessary in your application). Backup /etc/bind and /var/cache/bind then systemctl remove bind9 systemctl purge bind9 And then edit /etc/resolv.conf to nameserver 8.8.8.8 nameserver 8.8.4.4 and with luck everything will work O.K. You can do variants on that to use your ISP DNS servers instead. You have to be careful in systemd about network processes overwriting /etc/resolv.conf, e.g. if you get a DHCP address, or if your system is somehow configured to use systemd-resolved, which I know to have problems. Actually, before you start anything do systemctl status systemd-resolved and if it's not installed things should be fine. You may get systemctl status systemd-resolved ● systemd-resolved.service - Network Name Resolution Loaded: loaded (/lib/systemd/system/systemd-resolved.service; disabled; vendor preset: enabled) Active: inactive (dead) Docs: man:systemd-resolved.service(8) man:org.freedesktop.resolve1(5) https://www.freedesktop.org/wiki/Software/systemd/writing-network-configuration-managers https://www.freedesktop.org/wiki/Software/systemd/writing-resolver-clients which is fine also. In any case research its configuration with man systemd-resolved. I recall it uses a local address 127.0.0.53 to receive DNS queries. -- Jeremy (Lists)
Re: BIND: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out
Mar 13, 2023, 08:31 by jer...@ardley.org: > Sorry. Last message was garbled. Try this in /etc/bind/named.conf.options > > options { > // other configuration options ... > debug 1; > logging { > channel debug_log { > file "/var/log/bind9/debug.log" versions 3 size 5m; > severity dynamic; > print-time yes; > print-severity yes; > print-category yes; > }; > category default { > debug_log; > }; > }; > }; > > also try setting /etc/resolv.conf to your ISP DNS servers - at least to get > software updates > "debug 1;" doesn't seem to be a valid option, couldn't start BIND with it. Anyhow, the following is what I get when running "dig www.yahoo.com" Mar 13 05:03:11 tst systemd[1]: Started named.service - BIND Domain Name Server. Mar 13 05:03:11 tst named[52836]: 13-Mar-2023 05:03:11.639 general: notice: running Mar 13 05:03:18 tst named[52836]: 13-Mar-2023 05:03:18.963 queries: info: client @0x7f7812816d68 127.0.0.1#38800 (www.yahoo.com): query: www.yahoo.com IN A +E(0)K (127.0.0.1) Mar 13 05:03:21 tst named[52836]: 13-Mar-2023 05:03:21.631 dnssec: warning: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out Mar 13 05:03:21 tst named[52836]: 13-Mar-2023 05:03:21.711 resolver: info: resolver priming query complete: timed out Mar 13 05:03:23 tst named[52836]: 13-Mar-2023 05:03:23.966 queries: info: client @0x7f7812817b68 127.0.0.1#51554 (www.yahoo.com): query: www.yahoo.com IN A +E(0)K (127.0.0.1) Mar 13 05:03:28 tst named[52836]: 13-Mar-2023 05:03:28.970 queries: info: client @0x7f78c9eb1168 127.0.0.1#42404 (www.yahoo.com): query: www.yahoo.com IN A +E(0)K (127.0.0.1) Mar 13 05:03:30 tst named[52836]: 13-Mar-2023 05:03:30.970 resolver: info: shut down hung fetch while resolving 'www.yahoo.com/A' Mar 13 05:03:30 tst named[52836]: 13-Mar-2023 05:03:30.970 query-errors: info: client @0x7f78c9eb1168 
127.0.0.1#42404 (www.yahoo.com): query failed (operation canceled) for www.yahoo.com/IN/A at query.c:7775 Mar 13 05:03:30 tst named[52836]: 13-Mar-2023 05:03:30.970 query-errors: info: client @0x7f7812816d68 127.0.0.1#38800 (www.yahoo.com): query failed (operation canceled) for www.yahoo.com/IN/A at query.c:7775 Mar 13 05:03:30 tst named[52836]: 13-Mar-2023 05:03:30.970 query-errors: info: client @0x7f7812817b68 127.0.0.1#51554 (www.yahoo.com): query failed (operation canceled) for www.yahoo.com/IN/A at query.c:7775 Mar 13 05:03:38 tst named[52836]: 13-Mar-2023 05:03:38.966 resolver: info: resolver priming query complete: timed out Regards,
Re: BIND: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out
On 13/3/23 16:19, local10 wrote: Mar 13, 2023, 07:25 by jer...@ardley.org: Try netstat -tulpnW | grep 53 and see what's listening Bind seems to be listening on 127.0.0.1 port 53. I don't have netstat installed and can't easily install it as aptitude can't resolve Debian server's name to an IP, so the following is what I tried: # telnet -4 127.0.0.1 53 Trying 127.0.0.1... Connected to 127.0.0.1. Escape character is '^]'. ^] telnet> quit Connection closed. # # # systemctl stop named.service # # # telnet -4 127.0.0.1 53 Trying 127.0.0.1... telnet: Unable to connect to remote host: Connection refused # # # systemctl restart named.service # # # telnet -4 127.0.0.1 53 Trying 127.0.0.1... Connected to 127.0.0.1. Escape character is '^]'. ^] telnet> quit Connection closed. # Sorry. Last message was garbled. Try this in /etc/bind/named.conf.options options { // other configuration options ... debug 1; logging { channel debug_log { file "/var/log/bind9/debug.log" versions 3 size 5m; severity dynamic; print-time yes; print-severity yes; print-category yes; }; category default { debug_log; }; }; }; also try setting /etc/resolv.conf to your ISP DNS servers - at least to get software updates -- Jeremy (Lists)
Re: BIND: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out
On 13/3/23 16:19, local10 wrote: Bind seems to be listening on 127.0.0.1 port 53. I don't have netstat installed and can't easily install it as aptitude can't resolve Debian server's name to an IP, so the following is what I tried: # telnet -4 127.0.0.1 53 Trying 127.0.0.1... Connected to 127.0.0.1. Escape character is '^]'. ^] telnet> quit Connection closed. # # # systemctl stop named.service # # # telnet -4 127.0.0.1 53 Trying 127.0.0.1... telnet: Unable to connect to remote host: Connection refused # # # systemctl restart named.service # # # telnet -4 127.0.0.1 53 Trying 127.0.0.1... Connected to 127.0.0.1. Escape character is '^]'. ^] telnet> quit Connection closed. # At this stage I'd suggest wireshark but that won't be an option. Perhaps tcpdump is available? Another option might be to set up a forwarder such as 8.8.8.8 or 1.1.1.1. You can also edit debug options into /etc/bind/named.conf.options options { // other configuration options ... // Debug Options debug 1; logging { channel debug_log { file "/var/log/bind9/debug.log" versions 3 size 5m; severity dynamic; print-time yes; print-severity yes; print-category yes; }; category default { debug_log; }; }; // End debug options }; -- Jeremy (Lists)
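[Editor's note: two problems with the snippet above, which match the OP's later report that named refused to start with it. "debug 1;" is not a valid options statement, and in BIND the logging block is a top-level statement in named.conf, not something nested inside options {}. A hedged sketch of what was probably intended, with the file path and channel name taken from the message:]

```
// Top-level statement (a sibling of options {}, not inside it).
logging {
    channel debug_log {
        file "/var/log/bind9/debug.log" versions 3 size 5m;
        // "dynamic" follows the server's current debug level, which can be
        // raised at runtime with "rndc trace <level>" instead of "debug 1;".
        severity dynamic;
        print-time yes;
        print-severity yes;
        print-category yes;
    };
    category default { debug_log; };
};
```

The log directory (/var/log/bind9 here) must exist and be writable by the bind user, and on Debian it must also be permitted by the named AppArmor profile.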
Re: BIND: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out
Mar 13, 2023, 07:25 by jer...@ardley.org: > Try > > netstat -tulpnW | grep 53 > > and see what's listening > Bind seems to be listening on 127.0.0.1 port 53. I don't have netstat installed and can't easily install it as aptitude can't resolve Debian server's name to an IP, so the following is what I tried: # telnet -4 127.0.0.1 53 Trying 127.0.0.1... Connected to 127.0.0.1. Escape character is '^]'. ^] telnet> quit Connection closed. # # # systemctl stop named.service # # # telnet -4 127.0.0.1 53 Trying 127.0.0.1... telnet: Unable to connect to remote host: Connection refused # # # systemctl restart named.service # # # telnet -4 127.0.0.1 53 Trying 127.0.0.1... Connected to 127.0.0.1. Escape character is '^]'. ^] telnet> quit Connection closed. # Regards,
Re: BIND: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out
On 13/3/23 14:34, local10 wrote: Mar 13, 2023, 06:19 by jer...@ardley.org: The contents of /etc/resolv.conf are always of interest. There's really not much there: # cat /etc/resolv.conf nameserver 127.0.0.1 That and /etc/nsswitch.conf a/etc/hosts # cat /etc/nsswitch.conf # /etc/nsswitch.conf # # Example configuration of GNU Name Service Switch functionality. # If you have the `glibc-doc-reference' and `info' packages installed, try: # `info libc "Name Service Switch"' for information about this file. passwd: files group: files shadow: files gshadow: files hosts: files mdns4_minimal [NOTFOUND=return] dns networks: files protocols: db files services: db files ethers: db files rpc: db files netgroup: nis # cat /etc/hosts 127.0.0.1 localhost You should also check if there are any new firewall issues, and that you haven't run out of space somewhere. Finally, you may have forwarder(s) in your bind. It's best to check if that is working No changes were made to the firewall and there are no firewall issues I'm aware of. The forwarder's section in the "/etc/bind/named.conf.options" is commented out so there are no forwarders: // forwarders { // 0.0.0.0; // }; # aptitude show bind9 Package: bind9 Version: 1:9.18.12-1 Try netstat -tulpnW | grep 53 and see what's listening -- Jeremy (Lists)
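[Editor's note: since the OP reports netstat is not installed, the equivalent check with ss from iproute2, which is normally present on modern Debian, may be easier. A sketch:]

```shell
# List listening TCP/UDP sockets with owning process, filtered for port 53.
# (-p needs root to show processes owned by other users; "|| true" keeps the
# pipeline from failing when nothing is listening.)
ss -tulpn | grep ':53 ' || true

# Sanity check that ss itself works (prints the column header at minimum):
ss -tuln | head -n 1
```

This answers the same question as "netstat -tulpnW | grep 53" without needing net-tools installed.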
Re: BIND: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out
Mar 13, 2023, 06:19 by jer...@ardley.org: > The contents of /etc/resolv.conf are always of interest. > There's really not much there: # cat /etc/resolv.conf nameserver 127.0.0.1 > That and /etc/nsswitch.conf a/etc/hosts > # cat /etc/nsswitch.conf # /etc/nsswitch.conf # # Example configuration of GNU Name Service Switch functionality. # If you have the `glibc-doc-reference' and `info' packages installed, try: # `info libc "Name Service Switch"' for information about this file. passwd: files group: files shadow: files gshadow: files hosts: files mdns4_minimal [NOTFOUND=return] dns networks: files protocols: db files services: db files ethers: db files rpc: db files netgroup: nis # cat /etc/hosts 127.0.0.1 localhost > You should also check if there are any new firewall issues, and that you > haven't run out of space somewhere. > > Finally, you may have forwarder(s) in your bind. It's best to check if that > is working > No changes were made to the firewall and there are no firewall issues I'm aware of. The forwarder's section in the "/etc/bind/named.conf.options" is commented out so there are no forwarders: // forwarders { // 0.0.0.0; // }; # aptitude show bind9 Package: bind9 Version: 1:9.18.12-1 Regards,
Re: BIND: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out
On 13/3/23 14:08, local10 wrote: Hi, I have a local caching DNS server that was working fine for a long time but today, all of a sudden, it stopped resolving queries. More info: https://pastebin.com/iW5YeXgS Any ideas? Thanks The contents of /etc/resolv.conf are always of interest. That and /etc/nsswitch.conf and /etc/hosts You should also check if there are any new firewall issues, and that you haven't run out of space somewhere. Finally, you may have forwarder(s) in your bind. It's best to check if that is working -- Jeremy (Lists)
BIND: managed-keys-zone: Unable to fetch DNSKEY set '.': timed out
Hi, I have a local caching DNS server that was working fine for a long time but today, all of a sudden, it stopped resolving queries. More info: https://pastebin.com/iW5YeXgS Any ideas? Thanks
Re: Name or Service not known - bind
On Thu, Jan 19, 2023 at 09:12:19PM +0100, Maurizio Caloro wrote: > # host -t A pluto.sternbild.m 127.0.0.1 > Using domain server: > Name: 127.0.0.1 > Address: 127.0.0.1#53 > Aliases: > > Host pluto.sternbild.m not found: 3(NXDOMAIN) Hmm. In your previous message, you have: > # cat /etc/bind/named.conf.local > // > // Do any local configuration here > // > > zone "ns1.sternbild.m" { > type master; > file "/var/cache/bind/db.sternbild.m"; >}; > zone "D.C.B.in-addr.arpa" { > type master; > file "/var/cache/bind/db.reverse.sternbild.m"; > allow-query { any; }; >}; The most obvious issue here is that you don't have a "sternbild.m" zone definition here. You've got "ns1.sternbild.m" as a zone, but that's a hostname. Try changing that to zone "sternbild.m". Other issues: It seems strange that one zone has the "allow-query { any; };" line while the other does not. Either both zones should need it, or neither one should need it, I would think. Your "master" zones should be in /etc/bind/ rather than /var/cache/bind/ according to the README.Debian file. The /var/cache/bind/ directory should only contain information that can be recreated (e.g. secondary zones that can be re-pulled from the primary server). I doubt that's actually causing a problem, but it's something you should probably clean up eventually.
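[Editor's note: putting Greg's three suggestions together, the stanza might look like the sketch below. The zone name is corrected to the domain, allow-query is applied consistently, and the master zone files are moved under /etc/bind per README.Debian; the file names are carried over from the post and the reverse zone name keeps its redacted placeholder.]

```
// /etc/bind/named.conf.local -- sketch, not a drop-in replacement
zone "sternbild.m" {
    type master;
    file "/etc/bind/db.sternbild.m";
    allow-query { any; };
};

zone "D.C.B.in-addr.arpa" {
    type master;
    file "/etc/bind/db.reverse.sternbild.m";
    allow-query { any; };
};
```

After editing, "named-checkconf" and "named-checkzone sternbild.m /etc/bind/db.sternbild.m" verify the result before an "rndc reload".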
Re: Name or Service not known - bind
Am 19.01.2023 um 20:24 schrieb Greg Wooledge: On Thu, Jan 19, 2023 at 07:45:34PM +0100, Maurizio Caloro wrote: Let's start here. Why do you have multiple nameserver lines here? Which one is the bind9 server that you're configuring? That should be the only line. If the bind9 server is 127.0.0.1 then just keep that line, and remove the other two (or comment them out with semicolons). When you've got multiple nameserver lines, the resolver library will use whichever one it wants to use first. If that one returns NXDOMAIN, then it stops there. It doesn't look at the other lines. It will only look at the other lines if there's no response at all from the first nameserver it tries. The ping command is not the best choice for debugging DNS server setups. The major issue here is that you don't know which nameserver was used to get this result. There are dedicated tools for debugging DNS, including "host" and "dig" in the dnsutils package (on Debian 10 which you're using, or in bind9-host in Debian 11). Start with those. dig @127.0.0.1 A pluto.sternbild.m host -t A pluto.sternbild.m 127.0.0.1 Either of these commands will request the "A" record for pluto.sternbild.m from the DNS resolver at 127.0.0.1. I'm guessing that's the one you're trying to use and debug. You can try both and see which one you like better. Of the two commands, dig is the more feature-rich one, should you need to go into more detail. Since you have two other nameserver lines, you don't know which one(s) are returning the NXDOMAIN error, you might want to probe all three with dig or host. First, let me thank you for your quick answer, thanks! 
I see and understood. I now have only "search sternbild.m" and nameserver 127.0.0.1 in resolv.conf; please see my result # dig @127.0.0.1 A pluto.sternbild.m ; <<>> DiG 9.11.5-P4-5.1+deb10u8-Debian <<>> @127.0.0.1 A pluto.sternbild.m ; (1 server found) ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 7559 ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1 ;; OPT PSEUDOSECTION: ; EDNS: version: 0, flags:; udp: 4096 ; COOKIE: 46f08b6124e3fe216e3fe97663c99d8e691938f0921a7d42 (good) ;; QUESTION SECTION: ;pluto.sternbild.m. IN A ;; AUTHORITY SECTION: . 10699 IN SOA a.root-servers.net. nstld.verisign-grs.com. 2023011901 1800 900 604800 86400 ;; Query time: 1 msec ;; SERVER: 127.0.0.1#53(127.0.0.1) ;; WHEN: Thu Jan 19 20:44:14 CET 2023 ;; MSG SIZE rcvd: 146 - # host -t A pluto.sternbild.m 127.0.0.1 Using domain server: Name: 127.0.0.1 Address: 127.0.0.1#53 Aliases: Host pluto.sternbild.m not found: 3(NXDOMAIN) - # cat /etc/resolv.conf search sternbild.m nameserver 127.0.0.1 - OK, I see that systemd-resolved was running; OK, stopped it! 
# netstat -plnt | grep ':53' tcp 0 0 0.0.0.0:5355 0.0.0.0:* LISTEN 32075/systemd-resol tcp 0 0 127.0.0.1:53 0.0.0.0:* LISTEN 17207/named tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN 32075/systemd-resol tcp6 0 0 :::5355 :::* LISTEN 32075/systemd-resol tcp6 0 0 :::53 :::* LISTEN 17207/named # systemctl stop systemd-resolved.service # netstat -plnt | grep ':53' tcp 0 0 127.0.0.1:53 0.0.0.0:* LISTEN 17207/named tcp6 0 0 :::53 :::* LISTEN 17207/named - bind is restarted and running # systemctl status bind9 ● bind9.service - BIND Domain Name Server Loaded: loaded (/lib/systemd/system/bind9.service; enabled; vendor preset: enabled) Active: active (running) since Thu 2023-01-19 21:09:12 CET; 4s ago Docs: man:named(8) Process: 17455 ExecStart=/usr/sbin/named $OPTIONS (code=exited, status=0/SUCCESS) Main PID: 17456 (named) Tasks: 5 (limit: 2358) Memory: 12.3M - but still no answer # ping pluto ping: pluto: Name or service not known # ping pluto.sternbild.m ping: pluto.sternbild.m: Name or service not known --
Re: Name or Service not known - bind
On Thu, Jan 19, 2023 at 07:45:34PM +0100, Maurizio Caloro wrote: > fighting little with bind9, on Debian 10.13, in my opinion appair right, but > # cat /etc/resolv.conf > search sternbild.m > nameserver 127.0.0.1 > nameserver A.B.C.D -> other Nameservers > nameserver A.B.C.D -> other Nameservers Let's start here. Why do you have multiple nameserver lines here? Which one is the bind9 server that you're configuring? That should be the only line. If the bind9 server is 127.0.0.1 then just keep that line, and remove the other two (or comment them out with semicolons). When you've got multiple nameserver lines, the resolver library will use whichever one it wants to use first. If that one returns NXDOMAIN, then it stops there. It doesn't look at the other lines. It will only look at the other lines if there's no response at all from the first nameserver it tries. > # ping pluto.sternbild.m > ping: pluto.sternbild.m: Name or service not known The ping command is not the best choice for debugging DNS server setups. The major issue here is that you don't know which nameserver was used to get this result. There are dedicated tools for debugging DNS, including "host" and "dig" in the dnsutils package (on Debian 10 which you're using, or in bind9-host in Debian 11). Start with those. dig @127.0.0.1 A pluto.sternbild.m host -t A pluto.sternbild.m 127.0.0.1 Either of these commands will request the "A" record for pluto.sternbild.m from the DNS resolver at 127.0.0.1. I'm guessing that's the one you're trying to use and debug. You can try both and see which one you like better. Of the two commands, dig is the more feature-rich one, should you need to go into more detail. > # ping ns1.sternbild.m > ping: ns1.sternbild.m: Name or service not known > > # ping ns1 > ping: ns1: Name or service not known Since you have two other nameserver lines, you don't know which one(s) are returning the NXDOMAIN error, you might want to probe all three with dig or host.
Name or Service not known - bind
Hello, I am fighting a little with bind9 on Debian 10.13. In my opinion it appears right, but it isn't possible to ping local/inside the client that I have added to my config. For information, this machine runs in a VPS environment. Also, the checks are positive # named-checkzone sternbild.m /var/cache/bind/db.sternbild.m zone sternbild.m/IN: loaded serial 2023230217 OK # named-checkzone sternbild.m /var/cache/bind/db.reverse.sternbild.m zone sternbild.m/IN: loaded serial 2023230817 OK - # ping pluto.sternbild.m ping: pluto.sternbild.m: Name or service not known # ping ns1.sternbild.m ping: ns1.sternbild.m: Name or service not known # ping ns1 ping: ns1: Name or service not known - # cat /etc/resolv.conf search sternbild.m nameserver 127.0.0.1 nameserver A.B.C.D -> other Nameservers nameserver A.B.C.D -> other Nameservers - # /var/cache/bind# cat db.sternbild.m ; ; BIND data file for broadcast zone ; $TTL 3600 @ IN SOA ns1.sternbild.m. root.sternbild.m. ( 2023230217 ; Serial 3600 ; Refresh 600 ; Retry 86400 ; Expire 600 ) ; Negative Cache TTL ; @ IN NS ns1.sternbild.m. @ IN A 127.0.0.1 @ IN ::1 ns1 IN A 37.B.C.D pluto IN A 37.B.C.D - # cat db.reverse.sternbild.m ; ; BIND reverse data file for broadcast zone ; $TTL 3600 @ IN SOA ns1.sternbild.m. root.sternbild.m. ( 2023230817 ; Serial 3600 ; Refresh 600 ; Retry 86400 ; Expire 600 ) ; Negative Cache TTL ; @ IN NS ns1. 188 IN PTR ns1.sternbild.m ; @ IN A 127.0.0.1 ; @ IN ::1 188 IN PTR ns1.sternbild.m. 188 IN PTR pluto.sternbild.m. - # cat /etc/bind/named.conf.local // // Do any local configuration here // zone "ns1.sternbild.m" { type master; file "/var/cache/bind/db.sternbild.m"; }; zone "D.C.B.in-addr.arpa" { type master; file "/var/cache/bind/db.reverse.sternbild.m"; allow-query { any; }; }; include "/etc/bind/zones.rfc1918";
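[Editor's note: besides the zone-name issue discussed in the replies, the posted zone files contain a few record-level problems worth noting: "@ IN ::1" is missing the AAAA record type, the reverse zone's NS record points at "ns1." (a top-level name) rather than "ns1.sternbild.m.", and one PTR target lacks a trailing dot, so it would be expanded relative to the zone origin. A sketch of a cleaned-up forward zone, keeping the serial and the redacted address placeholders from the post:]

```
; db.sternbild.m -- sketch; 37.B.C.D is the redacted address from the post
$TTL 3600
@       IN SOA  ns1.sternbild.m. root.sternbild.m. (
                2023230217 ; Serial
                3600       ; Refresh
                600        ; Retry
                86400      ; Expire
                600 )      ; Negative Cache TTL
@       IN NS   ns1.sternbild.m.
@       IN A    127.0.0.1
@       IN AAAA ::1
ns1     IN A    37.B.C.D
pluto   IN A    37.B.C.D
```

named-checkzone accepts surprisingly sloppy files, so a clean check does not guarantee every record means what was intended.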
Re: Q 2 Bind?
On Wed, Jun 22, 2022 at 09:13:20PM +0200, Maurizio Caloro wrote: > pls, why this key arnt found? the file are here in this folder and write > *root:bind* it's set > > Jun 22 21:03:56 nmail named[27607]: zone 127.in-addr.arpa/IN: loaded serial > 1 > Jun 22 21:03:56 nmail named[27607]: zone 190.120.37.in-addr.arpa/IN: loaded > serial 1 > Jun 22 21:03:56 nmail named[27607]: zone caloro.nmail/IN: > sig-re-signing-interval less than 3 * refresh. > Jun 22 21:03:56 nmail named[27607]: zone caloro.nmail/IN: loaded serial 1 > (DNSSEC signed) > Jun 22 21:03:56 nmail named[27607]: all zones loaded > Jun 22 21:03:56 nmail named[27607]: running > Jun 22 21:03:56 nmail named[27607]: zone caloro.nmail/IN: reconfiguring zone > keys > Jun 22 21:03:56 nmail named[27607]: *dns_dnssec_keylistfromrdataset: error > reading Kcaloro.nmail.+008+29553.private: file not found* > Jun 22 21:03:56 nmail named[27607]: *dns_dnssec_keylistfromrdataset: error > reading Kcaloro.nmail.+008+46817.private: file not found* > Jun 22 21:03:56 nmail named[27607]: zone caloro.nmail/IN: next key event: > 22-Jun-2022 22:03:56.654 The file names have no path (are relative). Are you sure the program is looking "here in this folder" (wherever that is)? Cheers -- t signature.asc Description: PGP signature
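[Editor's note: Tomás's point can be made explicit. named resolves those relative key-file names against its working directory unless told otherwise, and the location can be pinned with the key-directory zone option. A sketch, with both paths assumed for illustration; the directory must be readable by the bind user and permitted by AppArmor on Debian:]

```
zone "caloro.nmail" {
    type master;
    file "/etc/bind/db.caloro.nmail";        // path assumed for illustration
    // Directory holding the Kcaloro.nmail.+008+*.key / *.private files;
    // without this, named looks in its working directory:
    key-directory "/etc/bind/keys";
};
```

Checking where the working directory actually points (the "directory" option in named.conf.options, /var/cache/bind on stock Debian) is the quickest way to confirm the diagnosis.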
Q 2 Bind?
Please, why are these keys not found? The files are here in this folder, and ownership *root:bind* is set Jun 22 21:03:56 nmail named[27607]: zone 127.in-addr.arpa/IN: loaded serial 1 Jun 22 21:03:56 nmail named[27607]: zone 190.120.37.in-addr.arpa/IN: loaded serial 1 Jun 22 21:03:56 nmail named[27607]: zone caloro.nmail/IN: sig-re-signing-interval less than 3 * refresh. Jun 22 21:03:56 nmail named[27607]: zone caloro.nmail/IN: loaded serial 1 (DNSSEC signed) Jun 22 21:03:56 nmail named[27607]: all zones loaded Jun 22 21:03:56 nmail named[27607]: running Jun 22 21:03:56 nmail named[27607]: zone caloro.nmail/IN: reconfiguring zone keys Jun 22 21:03:56 nmail named[27607]: *dns_dnssec_keylistfromrdataset: error reading Kcaloro.nmail.+008+29553.private: file not found* Jun 22 21:03:56 nmail named[27607]: *dns_dnssec_keylistfromrdataset: error reading Kcaloro.nmail.+008+46817.private: file not found* Jun 22 21:03:56 nmail named[27607]: zone caloro.nmail/IN: next key event: 22-Jun-2022 22:03:56.654 thanks
Re: WARNING: debian11 + bind-9.16.15 + dnssec-policy in options{} = crashes
On Mon, 16 Aug 2021, raf wrote: > If like me, you've been eagerly awaiting debian11 to > get bind-9.16.15, which finally lets you implement > DNSSEC extremely easily on debian stable, I have a > warning. And I have another: make sure your system clock is correct. DNSSEC will fail if system time is too far off. There is a chicken-and-egg problem between NTP and DNSSEC if your first time sync depends on DNS to resolve the ntp server address *and* the system does not have a (correct) real-time clock. -- Henrique Holschuh
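[Editor's note: a quick way to rule out the clock problem Henrique describes before digging into DNSSEC itself. timedatectl is the systemd tool and its output fields vary by version, so the check is deliberately loose:]

```shell
# Print the system clock in UTC; compare it by eye against a trusted source
# (a phone, another machine). DNSSEC validation fails if this is far off.
date -u

# On systemd machines, show clock/NTP sync status; tolerate its absence:
timedatectl 2>/dev/null || true
```

If the clock is wrong and NTP cannot sync because DNS is down, setting the time once by hand (or using an IP address for the NTP server) breaks the chicken-and-egg loop.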
WARNING: debian11 + bind-9.16.15 + dnssec-policy in options{} = crashes
Hi, If like me, you've been eagerly awaiting debian11 to get bind-9.16.15, which finally lets you implement DNSSEC extremely easily on debian stable, I have a warning. Bind has a dnssec-policy {} stanza for defining your own policy if you're feeling adventurous, but there's also a default policy. And there's a dnssec-policy usage directive to specify which dnssec-policy should be applied to zones. Bind's documentation says that the dnssec-policy usage directive can either appear in the options {} stanza, so as to apply to all zones, or it can appear in individual zone {} stanzas. My advice is: DO NOT PUT DNSSEC-POLICY IN THE OPTIONS {} STANZA. ONLY PUT DNSSEC-POLICY IN THE ZONE {} STANZAS. I put it in the options {} stanza, not realising that "all zones" doesn't just mean all of *my* authoritative zones. It really means ALL zones. That means every zone /etc/bind/named.conf.local (i.e. my zones), as well as every zone in /etc/bind/named.conf.default-zones i.e.: localhost 127.in-addr.arpa 0.in-addr.arpa 255.in-addr.arpa And, if you uncomment the include "/etc/bind/zones.rfc1918" in /etc/bind/named.conf.local, then it also means all of those zones as well: 16.172.in-addr.arpa 17.172.in-addr.arpa ... 31.172.in-addr.arpa 168.192.in-addr.arpa What happens next is that bind tries and fails to create .jnl files in /etc/bind for these zones. Apparmor or the directory permissions prevents it. 
This sort of thing appears in the logs: general: error: /etc/bind/db.empty.jnl: create: permission denied general: error: /etc/bind/db.255.jnl: create: permission denied Then bind gets an assertion failure and exits: general: notice: all zones loaded general: notice: running general: critical: rbtdb.c:6780: REQUIRE(((rbtnode->nsec == DNS_RBT_NSEC_NSEC3 && (rdataset->type == ((dns_rdatatype_t)dns_rdatatype_nsec3) || rdataset->covers == ((dns_rdatatype_t)dns_rdatatype_nsec3))) || (rbtnode->nsec != DNS_RBT_NSEC_NSEC3 && rdataset->type != ((dns_rdatatype_t)dns_rdatatype_nsec3) && rdataset->covers != ((dns_rdatatype_t)dns_rdatatype_nsec3 failed, back trace general: critical: #0 0x558ce49ffeed in ?? general: critical: #1 0x7fd079be6d9a in ?? general: critical: #2 0x7fd079d7f73c in ?? general: critical: #3 0x7fd079e45680 in ?? general: critical: #4 0x7fd079c1b720 in ?? general: critical: #5 0x7fd079c20f52 in ?? general: critical: #6 0x7fd07995cea7 in ?? general: critical: #7 0x7fd079590def in ?? general: critical: exiting (due to assertion failure) This repeats again and again until you work out what happened, clean everything up, remove the dnssec-policy from the options {} stanza, and restart bind. And, unless I went temporarily insane, it even managed somehow to overwrite my source zonefiles with signed versions, and I had to restore them from backup. When it works properly, it puts the signed versions in separate files. However, if you put the dnssec-policy usage directive in the zone {} stanzas instead, it's absolutely brilliant. So, go nuts. DNSSEC all the zones! Well, not *ALL* the zones. You know what I mean. :-) cheers, raf
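[Editor's note: following raf's advice, the safe per-zone form looks like the sketch below; the zone name and file path are placeholders, not taken from the post.]

```
zone "example.org" {
    type master;
    // Signing creates .signed/.jnl files next to the zone file, so it
    // should live somewhere named can write (and AppArmor permits),
    // e.g. /var/lib/bind on Debian rather than /etc/bind:
    file "/var/lib/bind/db.example.org";
    // Apply DNSSEC policy here, per zone -- not in options {}:
    dnssec-policy default;
};
```

Keeping the writable zone files out of /etc/bind also sidesteps the "create: permission denied" journal errors shown above.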
Re: sshd fails to bind to port to IP on boot
solved issue ... thank u On Fri, Sep 27, 2019 at 11:55 AM Greg Wooledge wrote: > On Fri, Sep 27, 2019 at 11:44:25AM -0400, yoda woya wrote: > > The public interface is listed defined as > > > > # The public network interface > > allow-hotplug eno1 > > iface eno1 inet static > > address x.x.x.x > > > > > > But I have that same configuration on another server and it works fine. > > Replace allow-hotplug with auto. > >
Re: sshd fails to bind to port to IP on boot
On Fri, Sep 27, 2019 at 11:44:25AM -0400, yoda woya wrote: > The public interface is listed defined as > > # The public network interface > allow-hotplug eno1 > iface eno1 inet static > address x.x.x.x > > > But I have that same configuration on another server and it works fine. Replace allow-hotplug with auto.
Re: sshd fails to bind to port to IP on boot
Hi. Please do not top-post. On Fri, Sep 27, 2019 at 11:51:08AM -0400, yoda woya wrote: > How can I use to solve the problem: > > "ssh.service has "After=network.target", and network.target only waits > for interfaces marked as "auto" to come up." You have this in your /etc/network/interfaces: The public interface is listed defined as # The public network interface allow-hotplug eno1 iface eno1 inet static address x.x.x.x What Greg is telling you is that you should have this instead: The public interface is listed defined as # The public network interface auto eno1 iface eno1 inet static address x.x.x.x Reco
Re: sshd fails to bind to port to IP on boot
How can I use this to solve the problem: "ssh.service has "After=network.target", and network.target only waits for interfaces marked as "auto" to come up." On Fri, Sep 27, 2019 at 11:26 AM Greg Wooledge wrote: > On Fri, Sep 27, 2019 at 11:16:51AM -0400, yoda woya wrote: > > Below is the error I get. However the service works at boot if > > InternetAddress is commented out or set to 0.0.0.0. The service works > > manually ( /etc/init.d/ssh start) > > -- Subject: A start job for unit ssh.service has begun execution > > -- A start job for unit ssh.service has begun execution. > > Sep 27 10:52:31 nat6pub sshd[690]: error: Bind to port 2022 on x.x.x.x > > failed: Cannot assign requested address. > > Sep 27 10:52:31 nat6pub sshd[690]: fatal: Cannot bind any address. > > Sep 27 10:52:31 nat6pub systemd[1]: ssh.service: Main process exited, > > code=exited, status=255/EXCEPTION > > Sounds like the x.x.x.x address doesn't exist at the time ssh.service > is trying to run. The most likely reasons for this are that your x.x.x.x > address is assigned to an interface that's configured later in the boot > process (e.g. a wireless interface), or that it's assigned to an > interface which is marked as "allow-hotplug" rather than "auto" in > /etc/network/interfaces. > > ssh.service has "After=network.target", and network.target only waits > for interfaces marked as "auto" to come up. If that. > >
Re: sshd fails to bind to port to IP on boot
The public interface is defined as # The public network interface allow-hotplug eno1 iface eno1 inet static address x.x.x.x But I have that same configuration on another server and it works fine. On Fri, Sep 27, 2019 at 11:42 AM yoda woya wrote: > # The public network interface > allow-hotplug eno1 > iface eno1 inet static > address 128.59.176.101 > > On Fri, Sep 27, 2019 at 11:25 AM Dan Ritter wrote: > >> yoda woya wrote: >> > Below is the error I get. However the service works at boot if >> > InternetAddress is commented out or set to 0.0.0.0. The service works >> > manually ( /etc/init.d/ssh start) >> > -- Subject: A start job for unit ssh.service has begun execution >> > -- A start job for unit ssh.service has begun execution. >> > Sep 27 10:52:31 nat6pub sshd[690]: error: Bind to port 2022 on x.x.x.x >> > failed: Cannot assign requested address. >> >> >> Do you have an existing interface with x.x.x.x assigned to it? >> >> -dsr- >> >
Re: sshd fails to bind to port to IP on boot
# The public network interface allow-hotplug eno1 iface eno1 inet static address 128.59.176.101 On Fri, Sep 27, 2019 at 11:25 AM Dan Ritter wrote: > yoda woya wrote: > > Below is the error I get. However the service works at boot if > > InternetAddress is commented out or set to 0.0.0.0. The service works > > manually ( /etc/init.d/ssh start) > > -- Subject: A start job for unit ssh.service has begun execution > > -- A start job for unit ssh.service has begun execution. > > Sep 27 10:52:31 nat6pub sshd[690]: error: Bind to port 2022 on x.x.x.x > > failed: Cannot assign requested address. > > > Do you have an existing interface with x.x.x.x assigned to it? > > -dsr- >
Re: sshd fails to bind to port to IP on boot
On Fri, Sep 27, 2019 at 11:16:51AM -0400, yoda woya wrote: > Below is the error I get. However the service works at boot if > InternetAddress is commented out or set to 0.0.0.0. The service works > manually ( /etc/init.d/ssh start) > -- Subject: A start job for unit ssh.service has begun execution > -- A start job for unit ssh.service has begun execution. > Sep 27 10:52:31 nat6pub sshd[690]: error: Bind to port 2022 on x.x.x.x > failed: Cannot assign requested address. > Sep 27 10:52:31 nat6pub sshd[690]: fatal: Cannot bind any address. > Sep 27 10:52:31 nat6pub systemd[1]: ssh.service: Main process exited, > code=exited, status=255/EXCEPTION Sounds like the x.x.x.x address doesn't exist at the time ssh.service is trying to run. The most likely reasons for this are that your x.x.x.x address is assigned to an interface that's configured later in the boot process (e.g. a wireless interface), or that it's assigned to an interface which is marked as "allow-hotplug" rather than "auto" in /etc/network/interfaces. ssh.service has "After=network.target", and network.target only waits for interfaces marked as "auto" to come up. If that.
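[Editor's note: besides switching the interface from allow-hotplug to auto, another approach consistent with Greg's explanation is to make ssh.service wait for full network configuration with a drop-in (created via "systemctl edit ssh.service"). A sketch:]

```
# /etc/systemd/system/ssh.service.d/wait-online.conf
[Unit]
Wants=network-online.target
After=network-online.target
```

Note that network-online.target is only meaningful if a wait-online service is active for your network stack (ifupdown's networking.service, systemd-networkd-wait-online, or NetworkManager-wait-online), so the "auto" fix is the simpler and more reliable one on an ifupdown system.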
Re: sshd fails to bind to port to IP on boot
yoda woya wrote: > Below is the error I get. However the service works at boot if > InternetAddress is commented out or set to 0.0.0.0. The service works > manually ( /etc/init.d/ssh start) > -- Subject: A start job for unit ssh.service has begun execution > -- A start job for unit ssh.service has begun execution. > Sep 27 10:52:31 nat6pub sshd[690]: error: Bind to port 2022 on x.x.x.x > failed: Cannot assign requested address. Do you have an existing interface with x.x.x.x assigned to it? -dsr-
Re: sshd fails to bind to port to IP on boot
Below is the error I get. However the service works at boot if InternetAddress is commented out or set to 0.0.0.0. The service works manually ( /etc/init.d/ssh start) -- Subject: A start job for unit ssh.service has begun execution -- A start job for unit ssh.service has begun execution. Sep 27 10:52:31 nat6pub sshd[690]: error: Bind to port 2022 on x.x.x.x failed: Cannot assign requested address. Sep 27 10:52:31 nat6pub sshd[690]: fatal: Cannot bind any address. Sep 27 10:52:31 nat6pub systemd[1]: ssh.service: Main process exited, code=exited, status=255/EXCEPTION -- An ExecStart= process belonging to unit ssh.service has exited. Sep 27 10:52:31 nat6pub systemd[1]: ssh.service: Failed with result 'exit-code'. -- The unit ssh.service has entered the 'failed' state with result 'exit-code'. -- Subject: A start job for unit ssh.service has failed -- A start job for unit ssh.service has finished with a failure. -- Subject: A start job for unit ssh.service has begun execution -- A start job for unit ssh.service has begun execution. -- Subject: A start job for unit ssh.service has finished successfully -- A start job for unit ssh.service has finished successfully. On Thu, Sep 26, 2019 at 6:23 PM Roberto C. Sánchez wrote: > On Thu, Sep 26, 2019 at 05:34:02PM -0400, yoda woya wrote: > >when I use this, the binding fails: > >Port 2022 > >#AddressFamily any > >ListenAddress x.x.x.x > >#ListenAddress :: > >but if I do , it binds it to the ip on boot > >Port 2022 > >#AddressFamily any > >#ListenAddress x.x.x > >#ListenAddress :: > >How can i fix this. I want sshd to run only on this one IP > > What is the exact error message when it fails? > > Regards, > > -Roberto > -- > Roberto C. Sánchez > >
Re: sshd fails to bind to port to IP on boot
On Thu, Sep 26, 2019 at 05:34:02PM -0400, yoda woya wrote: > when I use this, the binding fails: > Port 2022 > #AddressFamily any > ListenAddress x.x.x.x > #ListenAddress :: > > but if I do , it binds it to the ip on boot > Port 2022 > #AddressFamily any > #ListenAddress x.x.x > #ListenAddress :: > > > How can i fix this. I want sshd to run only on this one IP Are you sure that specific interface is up at the time sshd starts? To double check that, you could try to restart sshd manually (check with your init system's instructions) once the machine is up: does it then succeed in binding? Cheers -- t
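[Editor's note] If the interface really is not up when sshd starts, a commonly suggested systemd-side workaround (a sketch only; nothing in this thread confirms it is the cause) is to order the unit after network-online.target via a drop-in, or to allow binding to a not-yet-configured address:

```shell
# Hypothetical drop-in, /etc/systemd/system/ssh.service.d/wait-online.conf,
# created with "systemctl edit ssh.service" (run as root):
#
#   [Unit]
#   Wants=network-online.target
#   After=network-online.target
#
# followed by:
#   systemctl daemon-reload
#   systemctl restart ssh.service
#
# Alternative, if the address is assigned late (e.g. by DHCP): allow
# processes to bind to a not-yet-configured local address:
#   sysctl -w net.ipv4.ip_nonlocal_bind=1
```

Either approach trades a slightly later sshd start for a reliable bind; verify which failure mode applies before choosing.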
Re: sshd fails to bind to port to IP on boot
On Thu, Sep 26, 2019 at 05:34:02PM -0400, yoda woya wrote: >when I use this, the binding fails: >Port 2022 >#AddressFamily any >ListenAddress x.x.x.x >#ListenAddress :: >but if I do , it binds it to the ip on boot >Port 2022 >#AddressFamily any >#ListenAddress x.x.x >#ListenAddress :: >How can i fix this. I want sshd to run only on this one IP What is the exact error message when it fails? Regards, -Roberto -- Roberto C. Sánchez
sshd fails to bind to port to IP on boot
when I use this, the binding fails: Port 2022 #AddressFamily any ListenAddress x.x.x.x #ListenAddress :: but if I do, it binds to the IP on boot: Port 2022 #AddressFamily any #ListenAddress x.x.x #ListenAddress :: How can I fix this? I want sshd to run only on this one IP.
Re: bind9 startup problems: /var/cache /bind
I tested my suspicion that bind9-resolvconf was somehow implicated in the bind9 start problems by returning bind9-resolvconf to its original, disabled, state and restarting the system. Unfortunately, it didn't help: May 25 19:05:34 barley named[804]: /etc/bind/named.conf.options:2: change directory to '/var/cache/bind' failed: file not found But at least one theory has been eliminated. I also reviewed permissions and ownership of /var/cache/bind and the "directory" directive in named.conf.options for consistency with Debian post-install scripts and packaged files. There weren't any differences. Ross
Re: bind9 startup problems: /var/cache /bind
On Wed, May 22, 2019 at 2:47 PM Richard Hector wrote: > > RequiresMountsFor=/absolute/path/of/mount > > .. to go in the unit file - or IIRC running: > > sudo systemctl edit bind9.service > > ... and putting in: > > ---8< > [Unit] > RequiresMountsFor=/var > ---8< > > ... followed by: > sudo systemctl daemon-reload > Thank you for the very clear instructions. I wish README.Debian for systemd said how you're supposed to handle /etc, since it's somewhat non-standard. I tried it. Unfortunately, it didn't work. After rebooting I still get ------ May 22 18:38:49 barley named[829]: loading configuration from '/etc/bind/named.conf' May 22 18:38:49 barley named[829]: /etc/bind/named.conf.options:2: change directory to '/var/cache/bind' failed: file not found May 22 18:38:49 barley named[829]: /etc/bind/named.conf.options:2: parsing failed: file not found May 22 18:38:49 barley named[829]: loading configuration: file not found May 22 18:38:49 barley named[829]: exiting (due to fatal error) Again, when I restart the service it succeeds. As a check to see if my options took, systemctl show bind9 does have the lines (leaving in some resolvconf stuff because of my suspicions): - Requires=sysinit.target -.mount var.mount system.slice Wants=nss-lookup.target bind9-resolvconf.service ConsistsOf=bind9-resolvconf.service Before=bind9-resolvconf.service multi-user.target shutdown.target nss-lookup.target After=basic.target system.slice -.mount systemd-journald.socket sysinit.target var.mount network.target RequiresMountsFor=/var --- Finally, ls -ld /var/cache/bind drwxrwxr-x 2 root bind 4096 May 22 16:50 /var/cache/bind The manual restart of bind began just before 18:50 local time. Ross
Re: bind9 startup problems: /var/cache /bind
On 23/05/19 9:08 AM, Ross Boylan wrote: > /var is a separate file system, and like / it's encrypted, so it might > take a bit of time to activate it. Whether it's available when > needed, I don't know, though the error suggests it might not be. > Could systemd be launching services while some of the mounts (and the > required decryption) are still to be done? > > Is there some systemd way to ensure the file system is mounted before > launching bind? But I'd think if /var weren't available, bind > wouldn't be the only one with a problem. Well, I don't see anything in bind9.service to prevent it starting. I'm not sure what the best dependency is ... A bit of web searching finds: https://unix.stackexchange.com/questions/246935/set-systemd-service-to-execute-after-fstab-mount ... which suggests: RequiresMountsFor=/absolute/path/of/mount .. to go in the unit file - or IIRC running: sudo systemctl edit bind9.service ... and putting in: ---8< [Unit] RequiresMountsFor=/var ---8< ... followed by: sudo systemctl daemon-reload Not tested. Cheers, Richard
Re: bind9 startup problems: /var/cache /bind
/var is a separate file system, and like / it's encrypted, so it might take a bit of time to activate it. Whether it's available when needed, I don't know, though the error suggests it might not be. Could systemd be launching services while some of the mounts (and the required decryption) are still to be done? Is there some systemd way to ensure the file system is mounted before launching bind? But I'd think if /var weren't available, bind wouldn't be the only one with a problem. Ross
Re: bind9 startup problems: /var/cache /bind
On 23/05/19 8:00 AM, Ross Boylan wrote: > At system start, bind9 fails to start on a recently created buster > system. Some of the local bind is based on configuration from an > earlier bind. The logs show > /etc/bind/named.conf.options:2: change directory to '/var/cache/bind' > failed: file not found > > But if I then start it manually via systemctl, it starts. But then I > need to fix up other services which were counting on working name > resolution when they started. Is /var/cache (or /var) a separate filesystem, that might not be mounted in time at boot? Richard
bind9 startup problems: /var/cache /bind
At system start, bind9 fails to start on a recently created buster system. Some of the local bind is based on configuration from an earlier bind. The logs show /etc/bind/named.conf.options:2: change directory to '/var/cache/bind' failed: file not found But if I then start it manually via systemctl, it starts. But then I need to fix up other services which were counting on working name resolution when they started. /var/cache/bind exists (at least right now, with bind running). Somewhat oddly it has a recent timestamp that coincided with a cron.hourly run, but not much other activity I see in the log. I started experiencing this problem when I activated the bind9-resolvconf service, though it's very simple and I don't see how it could matter. Internet search turned up https://serverfault.com/questions/404219/bind-9-8-not-loading-var-cache-bind-failed-file-not-found with the response "make the directory". Except I have the directory. Also, README.Debian says "The working directory for named is now /var/cache/bind." So it seems like something the package should have created on installation, or at least dynamically as it starts. Double-checked that apparmor seems to have entries that match. Unless the trailing slash is a problem? /var/cache/bind/** lrw, /var/cache/bind/ rw, That is, the program is trying to open /var/cache/bind, but the pattern is /var/cache/bind/. Of course, if it were an apparmor problem then my later restarts would have failed too, and they didn't. Some kind of race condition? The bind9 daemon is running as the bind user. Ideas? Thanks. Ross Boylan
Re: bind gets permission errors in buster--systemd-related?
On Wed, May 15, 2019 at 10:39 AM Sven Joachim wrote: > I am not really familiar with apparmor or resolvconf, but in > /etc/apparmor.d/usr.sbin.named I found the following: > > , > | # support for resolvconf > | /{,var/}run/named/named.options r, > ` > > which suggests that the standard way would be to use > /run/named/named.options rather than /run/named/named.resolvers. > Alternatively, you may put the following line into > /etc/apparmor.d/local/usr.sbin.named: > > /{,var/}run/named/named.resolvers r, Yep. Not only that, but just below that is # some people like to put logs in /var/log/named/ instead of having # syslog do the heavy lifting. /var/log/named/** rw, /var/log/named/ rw, so if I switch my logs to there (and rename the directory), instead of /var/log/bind, the logging should work too. Or I could add apparmor entries for /var/log/bind. I'm still trying to figure out what, if anything, is necessary for revised apparmor settings to take effect. Thanks.
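[Editor's note] On the open question of what is necessary for revised apparmor settings to take effect: the standard tools can reload a profile into the kernel without a reboot. A sketch, assuming the stock Debian profile path:

```shell
# After editing a profile or its local/ override, reload it into the
# kernel (no reboot needed), then restart the confined service so it
# runs under the refreshed profile (run as root):
#   apparmor_parser -r /etc/apparmor.d/usr.sbin.named
#   systemctl restart bind9
# To reload every profile at once instead:
#   systemctl reload apparmor
```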
Re: bind gets permission errors in buster--systemd-related?
I also have a similar problem accessing /run/named. bind can't create the directory or any files in it. The error messages: couldn't mkdir '//run/named': Permission denied could not create //run/named/session.key Apparmor problems can be fixed by running aa-logprof and selecting the best "fix" for your system. I have done that if needed over the months since apparmor was installed. The other problem is that /run is a tmpfs, recreated at each boot, so any manual fixes are lost after a reboot. I also have the same problem for the apt-cacher-ng program. Since this machine is my router for my home network it is rarely rebooted, so I have a temporary fix by running the following script manually: cd /run mkdir named chown bind.bind named systemctl restart bind9 mkdir apt-cacher-ng chown apt-cacher-ng.apt-cacher-ng apt-cacher-ng systemctl restart apt-cacher-ng My /etc/bind config directory has no reference to /run. I do see a /run/resolvconf directory which has resolv.conf in it pointing to localhost and search domain. This seems correct since bind is listening on localhost and you want to actually use bind to get and cache DNS requests. My bind is version 9.11.5.P4+dfsg-5. -- *...Bob*
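[Editor's note] A boot-persistent alternative to the manual script above is a systemd-tmpfiles rule, which recreates the directories on every boot. A sketch; the fragment filenames are suggestions:

```shell
# Field order is: d <path> <mode> <user> <group> <age>
#
# /etc/tmpfiles.d/named.conf:
#   d /run/named 0775 bind bind -
# /etc/tmpfiles.d/apt-cacher-ng.conf:
#   d /run/apt-cacher-ng 0755 apt-cacher-ng apt-cacher-ng -
#
# Apply immediately (as root) instead of waiting for the next boot:
#   systemd-tmpfiles --create
```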
Re: bind gets permission errors in buster--systemd-related?
On 2019-05-15 09:33 -0700, Ross Boylan wrote: > Sven, thanks for the tip about AppArmor. Yet another presumably > complicated system I've avoided learning about til now. I guess it's > time. > > As to why bind is trying to open /run/named/named.resolvers: that is a > customized integration with resolvconf. It is not the default, but it > is something I want to work. Or I need an alternate way to achieve > the same functionality, which is that when resolvconf gets info on > nameservers it passes that on to bind. I am not really familiar with apparmor or resolvconf, but in /etc/apparmor.d/usr.sbin.named I found the following: , | # support for resolvconf | /{,var/}run/named/named.options r, ` which suggests that the standard way would be to use /run/named/named.options rather than /run/named/named.resolvers. Alternatively, you may put the following line into /etc/apparmor.d/local/usr.sbin.named: /{,var/}run/named/named.resolvers r, Cheers, Sven
Re: bind gets permission errors in buster--systemd-related?
On Wed, May 15, 2019 at 12:11:58PM -0400, Lee wrote: > The way I fixed my permission problems after telling bind to log to a > file instead of syslog was > su - > to become root > su bind > which didn't work because > # grep bind /etc/passwd > bind:x:116:119::/var/cache/bind:/bin/false > so edit /etc/passwd and change '/bin/false' to '/bin/sh' > su bind > then worked, so > /usr/sbin/named -g > to see all the errors. Adjust permissions, start bind as a daemon and > edit /etc/passwd to change '/bin/sh' back to '/bin/false' If sudo is installed, you can simply do sudo -u bind -s to start a shell as that user despite what /etc/passwd says. Of course, you need permission to use sudo.
Re: bind gets permission errors in buster--systemd-related?
Sven, thanks for the tip about AppArmor. Yet another presumably complicated system I've avoided learning about til now. I guess it's time. As to why bind is trying to open /run/named/named.resolvers: that is a customized integration with resolvconf. It is not the default, but it is something I want to work. Or I need an alternate way to achieve the same functionality, which is that when resolvconf gets info on nameservers it passes that on to bind. Lee, I don't think this is a vanilla permission problem. As I thought the comments in the original indicated, ownership and permissions look as if they should be good for the bind user, and I even su'd to bind and was able to access one of the files that the bind daemon said it couldn't access. Ross
Re: bind gets permission errors in buster--systemd-related?
On 5/15/19, Ross Boylan wrote: > I have a new buster system with a bind setup based on (much) older* > systems, on which it worked fine. On buster, it doesn't. > In two different places in my configuration I referred to files or > directories that were outside of bind proper, and in both cases this > failed with permission problems. > I'm pretty sure bind is running under systemd, and have seen various > references to systemd limiting access to the file system. However, I > don't see anything that appears to be requesting such limits for > bind9, or in general. /var is a different partition from /, and I > configured bind to run as an ordinary user. > > Any ideas what's going on, or what I can do to fix it? You're not showing file or directory permissions, so it's hard to guess. The way I fixed my permission problems after telling bind to log to a file instead of syslog was su - to become root su bind which didn't work because # grep bind /etc/passwd bind:x:116:119::/var/cache/bind:/bin/false so edit /etc/passwd and change '/bin/false' to '/bin/sh' su bind then worked, so /usr/sbin/named -g to see all the errors. Adjust permissions, start bind as a daemon and edit /etc/passwd to change '/bin/sh' back to '/bin/false' Regards, Lee > > // RB modified resolv.conf with custom > /etc/resolvconf/update.d/bind9 to create this file. > //include "/run/named/named.resolvers"; > /* Error was > May 11 12:46:27 barley named[15935]: loading configuration from > '/etc/bind/named.conf' > May 11 12:46:27 barley named[15935]: /etc/bind/named.conf.options:18: > open: /run/named/named.resolvers: permission denied > May 11 12:46:27 barley named[15935]: loading configuration: permission > denied > May 11 12:46:27 barley named[15935]: exiting (due to fatal error) > > The script clearly starts as the bind user, and when I su to bind I > can cat the file. > */ > > Second, I had a bunch of logging directives like > logging { > /* permission problems opening the log files. Not sure why. 
> channel update_debug{ > file "/var/log/bind/dnsupdate.log"; > severity debug 3; > print-category yes; > print-severity yes; > print-time yes; > }; > */ > /var/log/bind is owned by bind. > > For now I just commented the problems out, but I'd like it to work. > For one thing, my network configuration is not static. > > Thanks. > Ross > > *Specifically bind9 (1:9.8.4.dfsg.P1-6+nmu2+deb7u20) wheezy-security > >
Re: bind gets permission errors in buster--systemd-related?
On 2019-05-14 21:50 -0700, Ross Boylan wrote: > I have a new buster system with a bind setup based on (much) older* > systems, on which it worked fine. On buster, it doesn't. > In two different places in my configuration I referred to files or > directories that were outside of bind proper, and in both cases this > failed with permission problems. > I'm pretty sure bind is running under systemd, and have seen various > references to systemd limiting access to the file system. However, I > don't see anything that appears to be requesting such limits for > bind9, or in general. /var is a different partition from /, and I > configured bind to run as an ordinary user. > > Any ideas what's going on, or what I can do to fix it? Most likely this has nothing to do with systemd, rather it's apparmor which denies access to /run/named/named.resolvers. > // RB modified resolv.conf with custom > /etc/resolvconf/update.d/bind9 to create this file. > //include "/run/named/named.resolvers"; > /* Error was > May 11 12:46:27 barley named[15935]: loading configuration from > '/etc/bind/named.conf' > May 11 12:46:27 barley named[15935]: /etc/bind/named.conf.options:18: > open: /run/named/named.resolvers: permission denied The question is why your /etc/bind/named.conf.options file tries to open /run/named/named.resolvers. Certainly this is not done by default, and you probably want to fix that. Cheers, Sven
bind gets permission errors in buster--systemd-related?
I have a new buster system with a bind setup based on (much) older* systems, on which it worked fine. On buster, it doesn't. In two different places in my configuration I referred to files or directories that were outside of bind proper, and in both cases this failed with permission problems. I'm pretty sure bind is running under systemd, and have seen various references to systemd limiting access to the file system. However, I don't see anything that appears to be requesting such limits for bind9, or in general. /var is a different partition from /, and I configured bind to run as an ordinary user. Any ideas what's going on, or what I can do to fix it? // RB modified resolv.conf with custom /etc/resolvconf/update.d/bind9 to create this file. //include "/run/named/named.resolvers"; /* Error was May 11 12:46:27 barley named[15935]: loading configuration from '/etc/bind/named.conf' May 11 12:46:27 barley named[15935]: /etc/bind/named.conf.options:18: open: /run/named/named.resolvers: permission denied May 11 12:46:27 barley named[15935]: loading configuration: permission denied May 11 12:46:27 barley named[15935]: exiting (due to fatal error) The script clearly starts as the bind user, and when I su to bind I can cat the file. */ Second, I had a bunch of logging directives like logging { /* permission problems opening the log files. Not sure why. channel update_debug{ file "/var/log/bind/dnsupdate.log"; severity debug 3; print-category yes; print-severity yes; print-time yes; }; */ /var/log/bind is owned by bind. For now I just commented the problems out, but I'd like it to work. For one thing, my network configuration is not static. Thanks. Ross *Specifically bind9 (1:9.8.4.dfsg.P1-6+nmu2+deb7u20) wheezy-security
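[Editor's note] A first diagnostic step for this kind of failure (generic apparmor triage, not specific to this report) is to check whether apparmor, rather than ordinary permissions, is refusing the access:

```shell
# AppArmor denials appear as kernel audit messages (run as root):
#   dmesg | grep -i 'apparmor="DENIED"'
#   journalctl -k | grep -i denied
# A DENIED line names the profile and the exact path refused, which
# distinguishes a profile problem from a filesystem-permission problem.
```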
[SOLVED] Re: Bind: A caching local server caches but not for long
Sep 16, 2018, 6:40 PM by pas...@plouf.fr.eu.org: > old.reddit.com. 300 IN CNAME reddit.map.fastly.net. > reddit.map.fastly.net. 30 IN A 151.101.121.140 > > These DNS records have short TTL, less than 8 minutes. > It is expected behaviour that a cached record is discarded when its TTL > expires. > Check if BIND has options to force a minimal TTL on cached records. > BIND, as I understand it, does not have a setting that allows changing the minimum TTL, but unbound and dnsmasq (in later versions) do. Instead of installing unbound or dnsmasq I decided to simply update the /etc/hosts file with the IPs of the non-critical sites I use most often. Works well enough. Regards,
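[Editor's note] For the record, the settings referred to above are configuration fragments like the following (verify against the versions you have installed; dnsmasq caps its value at 3600 seconds at compile time):

```shell
# dnsmasq (/etc/dnsmasq.conf):
#   min-cache-ttl=600
#
# unbound (/etc/unbound/unbound.conf, inside the server: block):
#   server:
#       cache-min-ttl: 600
```

Note that overriding upstream TTLs can serve stale addresses for sites that move quickly, which is presumably why the poster restricted the /etc/hosts workaround to non-critical sites.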
Re: Bind: A caching local server caches but not for long
On Mon, Sep 17, 2018 at 12:20:51AM +0200, local10 wrote: ;; ANSWER SECTION: old.reddit.com. 241 IN CNAME reddit.map.fastly.net. reddit.map.fastly.net. 8 IN A 151.101.21.140 this number ^^^ is the TTL/"time to live" in seconds. It is set by the server distributing the records and is typically set to something fairly long, like a day or more, for domains which don't change much. For domains which do change often, or if a move is planned in the near future, the TTL may be set low so that old records aren't cached long after they're obsolete. In this case the TTL in your cache shows 241 seconds left for old.reddit.com and 8 seconds remaining for reddit.map.fastly.net. By checking the authoritative nameservers directly it seems that the former is never more than 5 minutes while the latter is set to only 30 seconds. So your caching server should be updating fairly often. Mike Stone
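[Editor's note] A quick way to observe this behaviour (commands only; output depends on the moment you run them) is to query the cache twice and watch the TTL column shrink, then ask an authoritative server for the zone's original value:

```shell
# Query the local cache; the number after the name is the TTL still
# remaining in the cache:
#   dig +noall +answer old.reddit.com @127.0.0.1
#   sleep 10
#   dig +noall +answer old.reddit.com @127.0.0.1    # TTL ~10 lower now
# Ask an authoritative server for the full, un-decremented TTL:
#   dig +noall +answer reddit.map.fastly.net @ns1.fastly.net
```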
Re: Bind: A caching local server caches but not for long
On 17/09/2018 at 00:20, local10 wrote: Hi, So I set up a local caching server with bind. It seems to work, kind of, the problem is that cached results do not stay in cache for long, if they are placed in cache at all. For example, in the example below bind caches the result for "old.reddit.com" but 8 minutes later tries to look up "old.reddit.com" again when it's supposed to have the result in cache. old.reddit.com. 300 IN CNAME reddit.map.fastly.net. reddit.map.fastly.net. 30 IN A 151.101.121.140 These DNS records have short TTL, less than 8 minutes. It is expected behaviour that a cached record is discarded when its TTL expires. Check if BIND has options to force a minimal TTL on cached records.
Bind: A caching local server caches but not for long
Hi, So I set up a local caching server with bind. It seems to work, kind of, the problem is that cached results do not stay in cache for long, if they are placed in cache at all. For example, in the example below bind caches the result for "old.reddit.com" but 8 minutes later tries to look up "old.reddit.com" again when it's supposed to have the result in cache. Any ideas? Thanks # aptitude show bind9 Package: bind9 ... Version: 1:9.8.4.dfsg.P1-6+nmu2+deb7u20 # less named.conf.options ... options { directory "/var/cache/bind"; listen-on port 53 { our-nets; }; allow-query { our-nets; }; allow-query-cache { our-nets; }; recursion yes; allow-recursion { our-nets; }; auth-nxdomain no; # conform to RFC1035 blackhole { bogusnets; }; }; # rndc dumpdb --cache # cat /var/cache/bind/named_dump.db ; Dump complete # cat db.127 ; ; BIND reverse data file for local loopback interface ; $TTL 604800 @ IN SOA localhost. root.localhost. ( 2018091900 ; Serial 28800 ; Refresh 7200 ; Retry 604800 ; Expire 86400 ; Negative Cache TTL ); @ IN NS localhost. 1.0.0 IN PTR localhost. # dig old.reddit.com ; <<>> DiG 9.8.4-rpz2+rl005.12-P1 <<>> old.reddit.com ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45712 ;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 4, ADDITIONAL: 4 ;; QUESTION SECTION: ;old.reddit.com. IN A ;; ANSWER SECTION: old.reddit.com. 241 IN CNAME reddit.map.fastly.net. reddit.map.fastly.net. 8 IN A 151.101.21.140 ;; AUTHORITY SECTION: fastly.net. 862 IN NS ns4.fastly.net. fastly.net. 862 IN NS ns1.fastly.net. fastly.net. 862 IN NS ns2.fastly.net. fastly.net. 862 IN NS ns3.fastly.net. ;; ADDITIONAL SECTION: ns1.fastly.net. 83935 IN A 23.235.32.32 ns2.fastly.net. 83935 IN A 104.156.80.32 ns3.fastly.net. 83935 IN A 23.235.36.32 ns4.fastly.net. 
83935 IN A 104.156.84.32 ;; Query time: 3 msec ;; SERVER: 127.0.0.1#53(127.0.0.1) ;; WHEN: Sun Sep 16 17:41:59 2018 ;; MSG SIZE rcvd: 219 # dig old.reddit.com ; <<>> DiG 9.8.4-rpz2+rl005.12-P1 <<>> old.reddit.com ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28790 ;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 4, ADDITIONAL: 4 ;; QUESTION SECTION: ;old.reddit.com. IN A ;; ANSWER SECTION: old.reddit.com. 234 IN CNAME reddit.map.fastly.net. reddit.map.fastly.net. 1 IN A 151.101.21.140 ;; AUTHORITY SECTION: fastly.net. 855 IN NS ns4.fastly.net. fastly.net. 855 IN NS ns3.fastly.net. fastly.net. 855 IN NS ns1.fastly.net. fastly.net. 855 IN NS ns2.fastly.net. ;; ADDITIONAL SECTION: ns1.fastly.net. 83928 IN A 23.235.32.32 ns2.fastly.net. 83928 IN A 104.156.80.32 ns3.fastly.net. 83928 IN A 23.235.36.32 ns4.fastly.net. 83928 IN A 104.156.84.32 ;; Query time: 0 msec ;; SERVER: 127.0.0.1#53(127.0.0.1) ;; WHEN: Sun Sep 16 17:42:06 2018 ;; MSG SIZE rcvd: 219 # dig old.reddit.com ; <<>> DiG 9.8.4-rpz2+rl005.12-P1 <<>> old.reddit.com ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 5572 ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0 ;; QUESTION SECTION: ;old.reddit.com. IN A ;; Query time: 2537 msec ;; SERVER: 127.0.0.1#53(127.0.0.1) ;; WHEN: Sun Sep 16 17:50:08 2018 ;; MSG SIZE rcvd: 32
Bind bug
Hi there, See: https://kb.isc.org/article/AA-01639/0 I don't think not using deny-answer-aliases is really an option. Regards, Rob
Re: BIND and iptables config
On Fri 16 Feb 2018 at 08:53:27 (-0500), Henning Follmann wrote: > On Fri, Feb 16, 2018 at 04:26:14AM +0100, Rodary Jacques wrote: > > On Thursday 15 February 2018 at 11:44:36 CET, Henning Follmann wrote: > > > On Thu, Feb 15, 2018 at 05:01:52PM +0100, Rodary Jacques wrote: > > > > With NetworkManager, /etc/network/interfaces has only the loopback > > > > interface, and I can't use wicd which can't deal with two wired > > > > interfaces. And, Henning Follmann, my English is too poor to explain > > > > clearly my setup which is the standard one when your ISP gives you one > > > > routable address and you want your home LAN to have access to internet. > > > > Thanks for your interest anyway. > > > > Jacques > > > > > > > > > > Hello, > > > no your English was good enough to describe your setup. And I would say > > > that 90% of "us" have a form of "dialup" with one routable IP address and a > > > NAT setup. > > > First bind is not "standard" in this kind of situation and makes things > > > overly complicated. I would recommend dnsmasq instead. It is much more > > > straightforward for a NAT box to set up. It will also provide you with a > > > DHCP server. > > > And in your situation you also want to disable/avoid the NetworkManager. > > I told before that wicd can't deal with two wired interfaces. > > That is not true, but let's ignore this for now. I would be interested to know how you do this. I can't even see a way to make wicd make connections on two interfaces at the same time where one is wired and the other wireless. As soon as you select one interface, the other gets disconnected. Do you have some CLI magic that makes it keep the first connection going? Cheers, David.
Re: Re: BIND and iptables config
Because when I did, when I had just installed Jessie in April 2016, my mailbox, which is dedicated to debian-user, was flooded with useless or even stupid posts. Sorry for my fellow countrymen. Cheers. Jacques