On Mon, Aug 12, 2013 at 02:44:35PM -0600, Bob Proulx wrote:
> I don't know anything about why you are having system crashes. But no
> one else responded and so I decided to jump in.

Thank you for doing so. I actually went ahead and opened bugs against
openswan and bind9 after getting no responses here in almost 24 hours.
I was somewhat reluctant to do that, but since nobody here seemed to
have ideas on how to further troubleshoot this, I figured the people
who build those packages, and who are probably more familiar with how
they work than general users, would have ideas on how to proceed.

> I run a handful of VMs full time and let me assure you that they are
> stable and reliable and don't crash. Your crashes are not
> intrinsically part of the Linux kernel, Debian, or anything else.
> They are something unique to your environment. And they should not be
> happening.

Yes, but I figured that if at least one of these programs works fine on
Fedora 16 in the same type of environment, then there must be something
in how wheezy interacts with that environment which is causing this.
So, while it is fair to say the problem is unique to my environment, I
also think it's fair, and more precise, to say that it has to do with
how wheezy specifically interacts with that environment.

> Very bizarre! I can't guess as to any reason why. But I can't
> believe the problem is related to bind code itself. It is simply a
> user space program the same as any other. The problem is in the
> kernel.

Yes, that occurred to me as well. However, given that only two packages
out of a bunch are doing this so far, I thought it would be better and
more obvious to focus on those first, until I can actually trace the
problem to the kernel itself.

> I would contact your VPS provider support. If you are paying for the
> service and it isn't working then you should get help to get it going.

Yes, I plan to do that once I've verified, as far as possible, that the
problem isn't exclusively on my end of things. Perhaps I've reached
that point already.

> Since all of your crashes appear to be network related I imagine the
> problem is in the kernel network driver stack somewhere.

I've thought of that as well, especially since research indicates that
the virtio_net module has had problems in the past. In fact, the most
recent batch of these seems to have been fixed earlier this month in
Linux 3.4.56 (more on that below). On the other hand, if it's something
in the network stack, why am I able, for example, to query my VPS
provider's servers for the same domains without crashes? If it were in
the network stack, then I think it's reasonable to conclude I'd be
seeing crashes regardless of which name servers I queried for those
domains. Right?

> To me it "feels" like an interaction between your very new Linux
> kernel version 3.9 and your quite old qemu version 0.9.1. I would try
> the *oldest* stock Debian kernel you can find that still supports your
> libc and other libs and see if that fixes things. (At some point your
> old kernel won't support the newer userland. I don't know where the
> compatibility lines are drawn though.)

I actually did do something along the same lines. I tried Linux 3.10
from unstable, and then my own build of Linux 3.10.5. Same results as
with 3.9 from wheezy-backports. I then tried my own builds of 3.4.56
and 3.0.89, and 2.6.32 from squeeze. My builds were done using the
sources from kernel.org. I was really hoping that 3.4.56 would be the
magic fix, because of the virtio_net fixes I mentioned above that went
into it. Everything from 3.4.56 down behaved the same way as 3.2.0 in
wheezy (i.e. crashes during boot when starting bind9, and crashes on
resolving the domains that make it crash). The exception was 2.6.32
from squeeze, which crashed the machine when I attempted to query my
local bind even for the domains that work on higher kernels. So, I
didn't go lower than that.

There is one thing that's been bothering me on and off through all
this, which I forgot to mention in my original post.
On the Fedora machine with the same VPS provider, I noticed there is no
virtio_ring.ko module; it simply doesn't exist on that machine. All the
kernels I tried have virtio_ring built as a module, and I couldn't find
a .config option to disable it anywhere when I was doing my build of
3.10.5. I did a bit of research, but couldn't find a clear answer on
what exactly virtio_ring does. I keep wondering on and off what would
happen if I could find a way to blacklist it in the initrd image. Would
all this suddenly go away, or would I end up with an unbootable system
because virtio_blk couldn't load with virtio_ring blacklisted? I would
prefer not to risk the second alternative, so it would be best if I can
simply find a Debian kernel, or build my own, without virtio_ring
altogether.

> I would get your VPS support involved. If there are no other ideas
> then I would have them move you to a host with a newer qemu 1.x
> installed. The VPS provider should be able to do this relatively
> easily. Hopefully that will work better with the newer Linux kernel.

Yeah, it looks like I did as much as I can to troubleshoot things on my
end. I'll contact them I guess.

> Good luck!
> Bob

Thanks again for your reply and suggestions! Two or more heads are
better than one, on the Debian angle so far anyway.

Greg

--
web site: http://www.gregn..net
gpg public key: http://www.gregn..net/pubkey.asc
skype: gregn1 (authorization required, add me to your contacts list first)
--
Free domains: http://www.eu.org/ or mail dns-mana...@eu.org

Archive: http://lists.debian.org/20130812234219.ga6...@gregn.net
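
On the missing virtio_ring.ko question: one possible explanation, not verified for that Fedora machine, is that virtio_ring is compiled into the kernel image there instead of being built as a loadable module. If so, there would be no .ko file to find and nothing to blacklist. Below is a minimal sketch in Python for checking this on each machine. It assumes the running kernel ships the usual modules.builtin and modules.dep files under /lib/modules/<release>/; the function names classify_module and read_text are made up for illustration.

```python
import os
import platform


def classify_module(name, builtin_text, dep_text):
    """Return 'builtin', 'loadable', or 'absent' for a kernel module name."""
    # Module file names may use '-' where the module name uses '_'.
    wanted = name.replace("-", "_")

    def stems(paths):
        for path in paths:
            base = os.path.basename(path.strip())
            # Drop ".ko" and any compression suffix such as ".ko.xz".
            stem = base.split(".ko")[0]
            if stem:
                yield stem.replace("-", "_")

    # modules.builtin has one path per line for code compiled into the kernel.
    if wanted in set(stems(builtin_text.splitlines())):
        return "builtin"

    # modules.dep lines look like "path/to/mod.ko: dep1.ko dep2.ko".
    dep_targets = (
        line.split(":", 1)[0] for line in dep_text.splitlines() if ":" in line
    )
    if wanted in set(stems(dep_targets)):
        return "loadable"
    return "absent"


def read_text(path):
    """Read a file, returning an empty string if it cannot be opened."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    moddir = os.path.join("/lib/modules", platform.release())
    builtin = read_text(os.path.join(moddir, "modules.builtin"))
    dep = read_text(os.path.join(moddir, "modules.dep"))
    for module in ("virtio_ring", "virtio_net", "virtio_blk"):
        print(module, classify_module(module, builtin, dep))
```

If virtio_ring shows up as "builtin" on the Fedora machine and "loadable" on wheezy, the two systems are running the same code, just packaged differently, and removing it would probably not be the fix. An "absent" result only means the name is in neither file, which can also happen if those files are missing.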