Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount
On 23 Jun 2013, at 03:15, Jeremy Chadwick j...@koitsu.org wrote: On Sun, Jun 23, 2013 at 02:41:27AM +0200, Willem Jan Withagen wrote: On 19-6-2013 17:04, Jeremy Chadwick wrote: - Adam runs 9.1-RELEASE because of business needs pertaining to freebsd-update and binary updates. (I ask more about this for benefits of readers below, however -- because this situation comes up a lot and I want to know what real-world admins do) The bug is very specifically present in 9.1-RELEASE because I got bit by it before the release of 9.1. I discussed it with avg@, and the fix did not make it into the release but was submitted only about 2 weeks later. So in that case you can probably stop looking. The fix should be in the code for just about any 9.1-STABLE after that. I'm not sure why so many people (so far) seem to think that this problem is always the same issue -- it isn't. There are multiple things that have historically (and/or presently) caused this issue. Here's the list I composed only a few days ago, and it is far from thorough: http://lists.freebsd.org/pipermail/freebsd-stable/2013-June/073863.html Being in software for over 30 years I assume very little about its correctness. But I assume that it could be me, causing pilot errors. So what I was trying to say: Several of the bugs in this range were fixed shortly after the 9.1 release, so the first step I'd like to suggest is to get beyond this point in the release stream. And test again. My reasoning was more the other way around: unless you have gone to a release with at least these fixes, you cannot tell whether it is already fixed or not. Until then, a lot of the debugging may not be fully useful. --WjW ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount
On 19-6-2013 15:41, Steven Hartland wrote: - Original Message - From: Ronald Klop ronald-freeb...@klop.yi.org On Wed, 19 Jun 2013 14:53:19 +0200, Adam Strohl adams-free...@ateamsystems.com wrote: On 6/19/2013 19:21, Jeremy Chadwick wrote: On Wed, Jun 19, 2013 at 06:35:57PM +0700, Adam Strohl wrote: Hello -STABLE@, So I've seen this situation seemingly randomly on a number of both physical 9.1 boxes as well as VMs for I would say 6-9 months at least. I finally have a physical box here that reproduces it consistently that I can reboot easily (ie; not a production/client server). Hi, My home computer had the same symptom (not rebooting after 'all buffers flushed' message) a couple of months ago. But I follow 9-STABLE and the problem is gone for a while now. avg@ did a lot of work on the ZFS vfs locking which fixed at least one hang on reboot for ZFS. I don't believe this is in 9.1-RELEASE, so you should test a stable/9 or 8.4-RELEASE (which is newer than 9.1-RELEASE) kernel. I was one of the victims of this bug a while back. I patched and ran a lot of diffs from avg@, and even tried some stuff he wrote for -CURRENT, but we got things working. In the end it was integrated into -STABLE. So I'm running an unpatched -STABLE, and I have not had the problem since. Currently running: 9.1-STABLE FreeBSD 9.1-STABLE #172 r250288: Mon May 6 06:49:36 CEST 2013 And the system was rebooted only just this week for some maintenance. So either take a look at the -CURRENT code to see if more changes have been made, or perhaps ask avg@ for suggestions. --WjW
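Much of this thread turns on which branch a kernel was built from (9.1-RELEASE-p4 on Adam's box, 9.1-STABLE on WjW's). As a rough sketch, the release string printed by `uname -r` can be sorted by branch like this. The function name `branch_of` is made up for illustration and is not part of FreeBSD:

```shell
#!/bin/sh
# Hypothetical helper: classify a FreeBSD release string such as
# "9.1-RELEASE-p4" or "9.1-STABLE" by the branch it was built from.
branch_of() {
    case "$1" in
        *-RELEASE*) echo release ;;
        *-STABLE*)  echo stable ;;
        *-CURRENT*) echo current ;;
        *)          echo unknown ;;
    esac
}

# Typical use on a live system:
#   branch_of "$(uname -r)"
```

A result of `release` would mean fixes committed to stable/9 after the release was cut are not necessarily present.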
Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount
On 19-6-2013 17:04, Jeremy Chadwick wrote: - Adam runs 9.1-RELEASE because of business needs pertaining to freebsd-update and binary updates. (I ask more about this for benefits of readers below, however -- because this situation comes up a lot and I want to know what real-world admins do) The bug is very specifically present in 9.1-RELEASE because I got bit by it before the release of 9.1. I discussed it with avg@, and the fix did not make it into the release but was submitted only about 2 weeks later. So in that case you can probably stop looking. The fix should be in the code for just about any 9.1-STABLE after that. --WjW
Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount
On Sun, Jun 23, 2013 at 02:41:27AM +0200, Willem Jan Withagen wrote: On 19-6-2013 17:04, Jeremy Chadwick wrote: - Adam runs 9.1-RELEASE because of business needs pertaining to freebsd-update and binary updates. (I ask more about this for benefits of readers below, however -- because this situation comes up a lot and I want to know what real-world admins do) The bug is very specifically present in 9.1-RELEASE because I got bit by it before the release of 9.1. I discussed it with avg@, and the fix did not make it into the release but was submitted only about 2 weeks later. So in that case you can probably stop looking. The fix should be in the code for just about any 9.1-STABLE after that. I'm not sure why so many people (so far) seem to think that this problem is always the same issue -- it isn't. There are multiple things that have historically (and/or presently) caused this issue. Here's the list I composed only a few days ago, and it is far from thorough: http://lists.freebsd.org/pipermail/freebsd-stable/2013-June/073863.html My point is that the shutdown -r issue might manifest itself in the same fashion for everyone, but the **root cause** often differs. I.e. what fixed it for you may not fix it for Adam. We must wait and see (he's in the process of getting a system to try stable/9 on). -- | Jeremy Chadwick j...@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Making life hard for others since 1977. PGP 4BD6C0CB |
Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount
On 19/06/2013 12:35, Adam Strohl wrote: Hello -STABLE@, So I've seen this situation seemingly randomly on a number of both physical 9.1 boxes as well as VMs for I would say 6-9 months at least. I finally have a physical box here that reproduces it consistently that I can reboot easily (ie; not a production/client server). No matter what I do: reboot shutdown -p shutdown -r This specific server will stop at All buffers synced and not actually power down or reboot. KB input seems to be ignored. This server is a ZFS NAS (with GMIRROR for boot blocks) but the other boxes which show this are using GMIRRORs for root/swap/boot (no ZFS). Here is what happens on the console: http://i.imgur.com/1H8JMyB.jpg Hi, Just to add a 'me too'. I see this on two different boxes, both currently running recentish 9.1-STABLE, and it has definitely been an issue for me since at least 9.0-RELEASE. One of the boxes is a Dell R210 II with a single WD HDD - dmesg: http://daniel.thekeelecentre.com/dmesg.txt I've tried booting/rebooting without the USB KVM dongle attached too. Notes - does not run moused and no OpenLDAP. The second host I have the issue with is a home-build using a Tyan Toledo i3210W (S5211) and two Seagate HDDs - dmesg: http://daniel.thekeelecentre.com/dmesg-daffy.txt (yes, a disk has failed, but the reboot issue pre-dated this). Note - does not run moused, but did run slapd. I saw the same DB corruption as the OP. I can play with the latter box as it is no longer in use and will try the following suggestions from Jeremy later this evening: 4. Does sysctl hw.usb.no_shutdown_wait=1 help you? 5. Does sysctl hw.acpi.handle_reboot=1 help you? 6. Does sysctl hw.acpi.disable_on_reboot=1 help you? Regards, Richard ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount
Hello -STABLE@, So I've seen this situation seemingly randomly on a number of both physical 9.1 boxes as well as VMs for I would say 6-9 months at least. I finally have a physical box here that reproduces it consistently that I can reboot easily (ie; not a production/client server). No matter what I do: reboot shutdown -p shutdown -r This specific server will stop at All buffers synced and not actually power down or reboot. KB input seems to be ignored. This server is a ZFS NAS (with GMIRROR for boot blocks) but the other boxes which show this are using GMIRRORs for root/swap/boot (no ZFS). Here is what happens on the console: http://i.imgur.com/1H8JMyB.jpg When I reset the server it appears that disks were not dismounted cleanly ... on this ZFS box it comes back quick because ZFS is good like that but on the other servers with GMIRROR roots rebuilding the GMIRROR and fscking at the same time is murder on the disk/performance until it finishes. Another interesting thing is that this particular server runs slapd (OpenLDAP) which, when it comes back up, has a corrupted DB (easily fixed with db_recover, but still). This might be because FS commits aren't happening at the end. I can even manually stop slapd (service slapd stop) then run sync(8) (I assume this does something for ZFS too) and it still comes back as hosed if I reboot shortly after. If I start/stop slapd it's fine. So I feel like there is an FS/dismount thing going on here. Additional information: I also have some boxes which will reboot (ie; they don't freeze like some do at the end) but they don't dismount cleanly either and have to rebuild both GMIRROR and fsck. This might be a different issue, too. Anyone have any thoughts? Let me know if I can provide more details etc. -- Adam Strohl http://www.ateamsystems.com/ ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount
On Wed, Jun 19, 2013 at 06:35:57PM +0700, Adam Strohl wrote: Hello -STABLE@, So I've seen this situation seemingly randomly on a number of both physical 9.1 boxes as well as VMs for I would say 6-9 months at least. I finally have a physical box here that reproduces it consistently that I can reboot easily (ie; not a production/client server). No matter what I do: reboot shutdown -p shutdown -r This specific server will stop at All buffers synced and not actually power down or reboot. KB input seems to be ignored. This server is a ZFS NAS (with GMIRROR for boot blocks) but the other boxes which show this are using GMIRRORs for root/swap/boot (no ZFS). Here is what happens on the console: http://i.imgur.com/1H8JMyB.jpg When I reset the server it appears that disks were not dismounted cleanly ... on this ZFS box it comes back quick because ZFS is good like that but on the other servers with GMIRROR roots rebuilding the GMIRROR and fscking at the same time is murder on the disk/performance until it finishes. 1. You mention as well as VMs. Anything under a virtual machine or under a hypervisor is going to be very, very, **VERY** different than bare metal. So I hope the issues you're talking about above are on bare metal -- I will assume so. 2. We need to know what version of 9.1 you're using, i.e. 9.1-RELEASE. If you use stable/9 (RELENG_9) we need to see uname -a output (you can hide the machine name if you want). 3. Can we please have dmesg from this machine? The controller and some other hardware details matter. 4. Does sysctl hw.usb.no_shutdown_wait=1 help you? 5. Does sysctl hw.acpi.handle_reboot=1 help you? 6. Does sysctl hw.acpi.disable_on_reboot=1 help you? 7. If none of the above helps, can you please boot verbose mode and then when the system locks up on shutdown -r now take a picture of the VGA console? 8. Does the machine run moused(8) (check the process list please, do not rely on rc.conf) ? 
Another interesting thing is that this particular server runs slapd (OpenLDAP) which, when it comes back up, has a corrupted DB (easily fixed with db_recover, but still). This might be because FS commits aren't happening at the end. I can even manually stop slapd (service slapd stop) then run sync(8) (I assume this does something for ZFS too) and it still comes back as hosed if I reboot shortly after. If I start/stop slapd it's fine. So I feel like there is an FS/dismount thing going on here. sync(8) does not do what you think it does. Please read (not skim) this entire thread starting here: http://lists.freebsd.org/pipermail/freebsd-fs/2013-April/thread.html#16982 http://lists.freebsd.org/pipermail/freebsd-fs/2013-April/016982.html Your problem is related to unclean shutdown; fix that and your issues go away. Additional information: I also have some boxes which will reboot (ie; they don't freeze like some do at the end) but they don't dismount cleanly either and have to rebuild both GMIRROR and fsck. This might be a different issue, too. Every issue needs to be handled/treated separately. -- | Jeremy Chadwick j...@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Making life hard for others since 1977. PGP 4BD6C0CB |
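Items 4 to 6 of the numbered list in this message can be tried one at a time. A minimal sketch, assuming the three sysctl names quoted in the thread exist on the kernel in question. The helper name and the dry-run behaviour are assumptions added here: the script only prints the commands, and nothing changes until they are run by hand as root.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the sysctl commands for the three
# tunables suggested in the thread, so each can be tried on its own
# before a test reboot.
print_shutdown_sysctls() {
    for oid in hw.usb.no_shutdown_wait \
               hw.acpi.handle_reboot \
               hw.acpi.disable_on_reboot; do
        echo "sysctl ${oid}=1"
    done
}

print_shutdown_sysctls
```

Setting one value, rebooting, and only then moving to the next keeps each result attributable to a single change.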
Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount
OS version? - Original Message - From: Adam Strohl adams-free...@ateamsystems.com To: freebsd-stable@freebsd.org Sent: Wednesday, June 19, 2013 12:35 PM Subject: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount Hello -STABLE@, So I've seen this situation seemingly randomly on a number of both physical 9.1 boxes as well as VMs for I would say 6-9 months at least. I finally have a physical box here that reproduces it consistently that I can reboot easily (ie; not a production/client server). No matter what I do: reboot shutdown -p shutdown -r This specific server will stop at All buffers synced and not actually power down or reboot. KB input seems to be ignored. This server is a ZFS NAS (with GMIRROR for boot blocks) but the other boxes which show this are using GMIRRORs for root/swap/boot (no ZFS). Here is what happens on the console: http://i.imgur.com/1H8JMyB.jpg When I reset the server it appears that disks were not dismounted cleanly ... on this ZFS box it comes back quick because ZFS is good like that but on the other servers with GMIRROR roots rebuilding the GMIRROR and fscking at the same time is murder on the disk/performance until it finishes. Another interesting thing is that this particular server runs slapd (OpenLDAP) which, when it comes back up, has a corrupted DB (easily fixed with db_recover, but still). This might be because FS commits aren't happening at the end. I can even manually stop slapd (service slapd stop) then run sync(8) (I assume this does something for ZFS too) and it still comes back as hosed if I reboot shortly after. If I start/stop slapd it's fine. So I feel like there is an FS/dismount thing going on here. Additional information: I also have some boxes which will reboot (ie; they don't freeze like some do at the end) but they don't dismount cleanly either and have to rebuild both GMIRROR and fsck. This might be a different issue, too. Anyone have any thoughts? Let me know if I can provide more details etc. 
-- Adam Strohl http://www.ateamsystems.com/
Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount
On 6/19/2013 19:21, Jeremy Chadwick wrote: On Wed, Jun 19, 2013 at 06:35:57PM +0700, Adam Strohl wrote: Hello -STABLE@, So I've seen this situation seemingly randomly on a number of both physical 9.1 boxes as well as VMs for I would say 6-9 months at least. I finally have a physical box here that reproduces it consistently that I can reboot easily (ie; not a production/client server). No matter what I do: reboot shutdown -p shutdown -r This specific server will stop at "All buffers synced" and not actually power down or reboot. KB input seems to be ignored. This server is a ZFS NAS (with GMIRROR for boot blocks) but the other boxes which show this are using GMIRRORs for root/swap/boot (no ZFS). Here is what happens on the console: http://i.imgur.com/1H8JMyB.jpg When I reset the server it appears that disks were not dismounted cleanly ... on this ZFS box it comes back quick because ZFS is good like that but on the other servers with GMIRROR roots rebuilding the GMIRROR and fscking at the same time is murder on the disk/performance until it finishes. 1. You mention "as well as VMs". Anything under a virtual machine or under a hypervisor is going to be very, very, **VERY** different than bare metal. So I hope the issues you're talking about above are on bare metal -- I will assume so. Nope, I see basically the same thing sometimes under ESXi 5.0 Hypervisor (and yes, the implications of something so broad worry me). Those units I just haven't been able to isolate on a server which isn't critical. Let's focus on this server for now though, per your suggestion below. 2. We need to know what version of 9.1 you're using, i.e. 9.1-RELEASE. If you use stable/9 (RELENG_9) we need to see uname -a output (you can hide the machine name if you want). Sorry, this ZFS box is 9.1-R P4 (kernel built today): FreeBSD ilos.dsn 9.1-RELEASE-p4 FreeBSD 9.1-RELEASE-p4 #6: Wed Jun 19 15:31:12 ICT 2013 root@hostname:/usr/obj/usr/src/sys/ATEAMSYSTEMS amd64 3.
Can we please have dmesg from this machine? The controller and some other hardware details matter. Sure take a look at the full log here: http://pastebin.com/k55gVVuU This includes a boot, then a reboot as I describe (you can see it logs the All Buffers Synced, etc) then powering back on. 4. Does sysctl hw.usb.no_shutdown_wait=1 help you? Weirdly this allowed it to reboot on the first try (without needing to be reset), but not the second. The Starting background file system checks in 60 seconds message appeared ... that only happens when something is dirty, right? So the second try with just this I could ctrl alt del it and it responded .. kind of: http://i.imgur.com/POAIaNg.jpg Still had to reset it though. 5. Does sysctl hw.acpi.handle_reboot=1 help you? No change, still responded to a ctrl alt del like above, but like that still needs to be reset and comes back dirty. 6. Does sysctl hw.acpi.disable_on_reboot=1 help you? No change. Same as above, ctrl alt del responds but needs a hard reset still. 7. If none of the above helps, can you please boot verbose mode and then when the system locks up on shutdown -r now take a picture of the VGA console? Lots of debug on boot obviously but not much different on shutdown/hang: http://i.imgur.com/SgzSsoP.jpg 8. Does the machine run moused(8) (check the process list please, do not rely on rc.conf) ? ps -auxww | grep moused reveals nothing running (which is how I have things set). Another interesting thing is that this particular server runs slapd (OpenLDAP) which, when it comes back up, has a corrupted DB (easily fixed with db_recover, but still). This might be because FS commits aren't happening at the end. I can even manually stop slapd (service slapd stop) then run sync(8) (I assume this does something for ZFS too) and it still comes back as hosed if I reboot shortly after. If I start/stop slapd it's fine. So I feel like there is an FS/dismount thing going on here. sync(8) does not do what you think it does. 
Please read (not skim) this entire thread starting here: http://lists.freebsd.org/pipermail/freebsd-fs/2013-April/thread.html#16982 http://lists.freebsd.org/pipermail/freebsd-fs/2013-April/016982.html Grokking this now .. Your problem is related to unclean shutdown; fix that and your issues go away. Yeah that is my feeling as well. Additional information: I also have some boxes which will reboot (ie; they don't freeze like some do at the end) but they don't dismount cleanly either and have to rebuild both GMIRROR and fsck. This might be a different issue, too. Every issue needs to be handled/treated separately. Sure, I had just run across some threads about that but will focus on this ZFS box (and see if anything that fixes it here does anything for the others once I can reliably reproduce it out of production). -- Adam Strohl http://www.ateamsystems.com/
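For item 8, a `ps -auxww | grep moused` pipeline can match the grep process itself. A small sketch using pgrep(1) instead; the wrapper name `is_running` is an assumption for illustration:

```shell
#!/bin/sh
# Hypothetical wrapper: succeed only if a process with exactly this name
# exists. pgrep -x matches the whole process name, not a substring.
is_running() {
    pgrep -x "$1" >/dev/null 2>&1
}

if is_running moused; then
    echo "moused is running"
else
    echo "moused is not running"
fi
```

This checks the live process list, which is what the question asks for, rather than what rc.conf says should be running.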
Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount
On 6/19/2013 19:53, Adam Strohl wrote: sync(8) does not do what you think it does. Please read (not skim) this entire thread starting here: http://lists.freebsd.org/pipermail/freebsd-fs/2013-April/thread.html#16982 http://lists.freebsd.org/pipermail/freebsd-fs/2013-April/016982.html Grokking this now .. Epic. So basically mount -u -o ro FS is really what I (and probably everyone else) want, and the man page needs a major overhaul + disclaimer (and possibly a recommendation to use mount -u -o ro FS instead). -- Adam Strohl http://www.ateamsystems.com/
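The conclusion above can be sketched as a short sequence: stop the writer, then remount its file system read-only before rebooting. This is a dry-run illustration only. The helper name and the mount point are placeholders, `mount -u -o ro` is the form quoted in the message, and whether it behaves the same on ZFS is not something this thread establishes.

```shell
#!/bin/sh
# Hypothetical helper: print the commands to quiesce a service and remount
# its file system read-only. Dry-run only: nothing is executed.
quiesce_plan() {
    svc="$1"   # rc.d service name, e.g. slapd
    fs="$2"    # mount point holding its data (placeholder)
    echo "service ${svc} stop"
    echo "mount -u -o ro ${fs}"
}

quiesce_plan slapd /path/to/ldap/data
```

Printing the plan first makes it easy to review the mount point before anything is remounted on a box that is hard to reach.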
Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount
On Wed, 19 Jun 2013 14:53:19 +0200, Adam Strohl adams-free...@ateamsystems.com wrote: On 6/19/2013 19:21, Jeremy Chadwick wrote: On Wed, Jun 19, 2013 at 06:35:57PM +0700, Adam Strohl wrote: Hello -STABLE@, So I've seen this situation seemingly randomly on a number of both physical 9.1 boxes as well as VMs for I would say 6-9 months at least. I finally have a physical box here that reproduces it consistently that I can reboot easily (ie; not a production/client server). Hi, My home computer had the same symptom (not rebooting after 'all buffers flushed' message) a couple of months ago. But I follow 9-STABLE and the problem is gone for a while now. Ronald. No matter what I do: reboot shutdown -p shutdown -r This specific server will stop at All buffers synced and not actually power down or reboot. KB input seems to be ignored. This server is a ZFS NAS (with GMIRROR for boot blocks) but the other boxes which show this are using GMIRRORs for root/swap/boot (no ZFS). Here is what happens on the console: http://i.imgur.com/1H8JMyB.jpg When I reset the server it appears that disks were not dismounted cleanly ... on this ZFS box it comes back quick because ZFS is good like that but on the other servers with GMIRROR roots rebuilding the GMIRROR and fscking at the same time is murder on the disk/performance until it finishes. 1. You mention as well as VMs. Anything under a virtual machine or under a hypervisor is going to be very, very, **VERY** different than bare metal. So I hope the issues you're talking about above are on bare metal -- I will assume so. Nope, I see basically the same thing sometimes under ESXi 5.0 Hypervisor (and yes it worries me the implications of something so broad). Those unites I just haven't been able to isolate on a server which isn't critical. Lets focus on this server for now though per your suggestion below. 2. We need to know what version of 9.1 you're using, i.e. 9.1-RELEASE. 
If you use stable/9 (RELENG_9) we need to see uname -a output (you can hide the machine name if you want). Sorry, this ZFS box is 9.1-R P4 (kernel built today): FreeBSD ilos.dsn 9.1-RELEASE-p4 FreeBSD 9.1-RELEASE-p4 #6: Wed Jun 19 15:31:12 ICT 2013 root@hostname:/usr/obj/usr/src/sys/ATEAMSYSTEMS amd64 3. Can we please have dmesg from this machine? The controller and some other hardware details matter. Sure take a look at the full log here: http://pastebin.com/k55gVVuU This includes a boot, then a reboot as I describe (you can see it logs the All Buffers Synced, etc) then powering back on. 4. Does sysctl hw.usb.no_shutdown_wait=1 help you? Weirdly this allowed it to reboot on the first try (without needing to be reset), but not the second. The Starting background file system checks in 60 seconds message appeared ... that only happens when something is dirty, right? So the second try with just this I could ctrl alt del it and it responded .. kind of: http://i.imgur.com/POAIaNg.jpg Still had to reset it though. 5. Does sysctl hw.acpi.handle_reboot=1 help you? No change, still responded to a ctrl alt del like above, but like that still needs to be reset and comes back dirty. 6. Does sysctl hw.acpi.disable_on_reboot=1 help you? No change. Same as above, ctrl alt del responds but needs a hard reset still. 7. If none of the above helps, can you please boot verbose mode and then when the system locks up on shutdown -r now take a picture of the VGA console? Lots of debug on boot obviously but not much different on shutdown/hang: http://i.imgur.com/SgzSsoP.jpg 8. Does the machine run moused(8) (check the process list please, do not rely on rc.conf) ? ps -auxww | grep moused reveals nothing running (which is how I have things set). Another interesting thing is that this particular server runs slapd (OpenLDAP) which, when it comes back up, has a corrupted DB (easily fixed with db_recover, but still). This might be because FS commits aren't happening at the end. 
I can even manually stop slapd (service slapd stop) then run sync(8) (I assume this does something for ZFS too) and it still comes back as hosed if I reboot shortly after. If I start/stop slapd it's fine. So I feel like there is an FS/dismount thing going on here. sync(8) does not do what you think it does. Please read (not skim) this entire thread starting here: http://lists.freebsd.org/pipermail/freebsd-fs/2013-April/thread.html#16982 http://lists.freebsd.org/pipermail/freebsd-fs/2013-April/016982.html Groking this now .. Your problem is related to unclean shutdown; fix that and your issues go away. Yeah that is my feeling as well. Additional information: I also have some boxes which will reboot (ie; they don't freeze like some do at the end) but they don't dismount cleanly either and have to rebuild both GMIRROR and fsck. This might be a different issue, too. Every issue needs to be handled/treated separately. Sure, I
Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount
On Wed, Jun 19, 2013 at 07:53:19PM +0700, Adam Strohl wrote: On 6/19/2013 19:21, Jeremy Chadwick wrote: On Wed, Jun 19, 2013 at 06:35:57PM +0700, Adam Strohl wrote: Hello -STABLE@, So I've seen this situation seemingly randomly on a number of both physical 9.1 boxes as well as VMs for I would say 6-9 months at least. I finally have a physical box here that reproduces it consistently that I can reboot easily (ie; not a production/client server). No matter what I do: reboot shutdown -p shutdown -r This specific server will stop at All buffers synced and not actually power down or reboot. KB input seems to be ignored. This server is a ZFS NAS (with GMIRROR for boot blocks) but the other boxes which show this are using GMIRRORs for root/swap/boot (no ZFS). Here is what happens on the console: http://i.imgur.com/1H8JMyB.jpg When I reset the server it appears that disks were not dismounted cleanly ... on this ZFS box it comes back quick because ZFS is good like that but on the other servers with GMIRROR roots rebuilding the GMIRROR and fscking at the same time is murder on the disk/performance until it finishes. 1. You mention as well as VMs. Anything under a virtual machine or under a hypervisor is going to be very, very, **VERY** different than bare metal. So I hope the issues you're talking about above are on bare metal -- I will assume so. Nope, I see basically the same thing sometimes under ESXi 5.0 Hypervisor (and yes it worries me the implications of something so broad). Those unites I just haven't been able to isolate on a server which isn't critical. Lets focus on this server for now though per your suggestion below. I'm sorry but I don't understand your first sentence -- the first part of your sentence says nope (I have to assume in reply to my on bare metal part), but then says I see basically the same thing sometimes under ESXi which implies an alternate environment in comparison (i.e. we *are* talking about bare metal). Consider me confused. :-) 2. 
We need to know what version of 9.1 you're using, i.e. 9.1-RELEASE. If you use stable/9 (RELENG_9) we need to see uname -a output (you can hide the machine name if you want). Sorry, this ZFS box is 9.1-R P4 (kernel built today): FreeBSD ilos.dsn 9.1-RELEASE-p4 FreeBSD 9.1-RELEASE-p4 #6: Wed Jun 19 15:31:12 ICT 2013 root@hostname:/usr/obj/usr/src/sys/ATEAMSYSTEMS amd64 I suggest trying stable/9 (and staying with it, for that matter). 3. Can we please have dmesg from this machine? The controller and some other hardware details matter. Sure, take a look at the full log here: http://pastebin.com/k55gVVuU This includes a boot, then a reboot as I describe (you can see it logs the All Buffers Synced, etc) then powering back on. Thanks. I was mainly interested in the storage controller being used (in this case ahci(4)) and the disks being used (notorious ST3000DM001, known for excessively parking heads). AFAIK this isn't one of the controllers that was known for weird quirky issues pertaining to flushing data to disk on shutdown. I have to ask: is this FreeBSD box running under a HV? If it *is not* running under a HV, could we please get the exact motherboard model and version (including BIOS version)? Sometimes (not always) you can get this from kenv | grep smbios. I can also see you're running your own kernel. We'll get to that in a moment. 4. Does sysctl hw.usb.no_shutdown_wait=1 help you? Weirdly this allowed it to reboot on the first try (without needing to be reset), but not the second. I'm not surprised. Please retry with stable/9; Hans has been constantly working on the USB stack and fixing major bugs. The "Starting background file system checks in 60 seconds" message appeared ... that only happens when something is dirty, right? No it does not. That message is always printed when you use background fsck, which is the default.
I do not advocate using background fsck, because it has been known (and may still do this -- I do not care to find out, I do not have time for unreliable filesystem nonsense) to not always fix all filesystem problems. Meaning: people using background fsck have been known to boot into single-user and issue fsck manually and find issues. Place background_fsck=no in /etc/rc.conf. If the machine does not have a clean filesystem on boot-up, you'll know because the system will immediately begin fsck (in the foreground actively). You'll recognise that output if it happens, trust me. So the second try with just this I could ctrl alt del it and it responded .. kind of: http://i.imgur.com/POAIaNg.jpg Still had to reset it though. This looks like a chicken-and-egg problem -- you're probably fighting with background fsck, as the messages there indicate some processes would not die. I'm just taking a guess though. I am now going to ask you for more information: 1. gpart show -p xxx where xxx is each disk you have in the system 2.
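The rc.conf change recommended in this message amounts to a single line. The quoted, upper-case value shown here is an assumption about the usual rc.conf style, not something stated in the thread:

```shell
# /etc/rc.conf
# Disable background fsck so a dirty file system is checked in the
# foreground at boot, where the output is visible on the console.
background_fsck="NO"
```

With this set, a boot that goes straight into a foreground fsck is itself the sign that the previous shutdown was not clean.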
Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount
----- Original Message -----
From: Ronald Klop ronald-freeb...@klop.yi.org

> On Wed, 19 Jun 2013 14:53:19 +0200, Adam Strohl adams-free...@ateamsystems.com wrote:
>> On 6/19/2013 19:21, Jeremy Chadwick wrote:
>>> On Wed, Jun 19, 2013 at 06:35:57PM +0700, Adam Strohl wrote:
>>>> Hello -STABLE@,
>>>>
>>>> So I've seen this situation seemingly randomly on a number of both physical 9.1 boxes as well as VMs for I would say 6-9 months at least. I finally have a physical box here that reproduces it consistently that I can reboot easily (i.e., not a production/client server).
>
> Hi,
>
> My home computer had the same symptom (not rebooting after 'all buffers flushed' message) a couple of months ago. But I follow 9-STABLE and the problem has been gone for a while now.

avg@ did a lot of work on the ZFS vfs locking which fixed at least one hang on reboot for ZFS. I don't believe this is in 9.1-RELEASE, so you should test a stable/9 or 8.4-RELEASE (which is newer than 9.1-RELEASE) kernel.

Regards
Steve

This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmas...@multiplay.co.uk.
Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount
On 6/19/2013 20:35, Jeremy Chadwick wrote: Nope, I see basically the same thing sometimes under ESXi 5.0 Hypervisor (and yes it worries me the implications of something so broad). Those unites I just haven't been able to isolate on a server which isn't critical. Lets focus on this server for now though per your suggestion below. I'm sorry but I don't understand your first sentence -- the first part of your sentence says nope (I have to assume in reply to my on bare metal part), but then says I see basically the same thing sometimes under ESXi which implies an alternate environment in comparison (i.e. we *are* talking about bare metal). Consider me confused. :-) Basically: The issue is extremely similar if not the same root cause, be it a native or virtual server. This server though is native. 2. We need to know what version of 9.1 you're using, i.e. 9.1-RELEASE. If you use stable/9 (RELENG_9) we need to see uname -a output (you can hide the machine name if you want). Sorry, this ZFS box is 9.1-R P4 (kernel built today): FreeBSD ilos.dsn 9.1-RELEASE-p4 FreeBSD 9.1-RELEASE-p4 #6: Wed Jun 19 15:31:12 ICT 2013 root@hostname:/usr/obj/usr/src/sys/ATEAMSYSTEMS amd64 I suggest trying stable/9 (and staying with it, for that matter). The issue is no binary updates and we have a large deploy base, so we've stuck with -R and use it internally because it's what we deploy. 3. Can we please have dmesg from this machine? The controller and some other hardware details matter. Sure take a look at the full log here: http://pastebin.com/k55gVVuU This includes a boot, then a reboot as I describe (you can see it logs the All Buffers Synced, etc) then powering back on. Thanks. I was mainly interested in the storage controller being used (in this case ahci(4)) and the disks being used (notorious ST3000DM001, known for excessively parking heads). Yeah, was not my first choice but then again ... 
RAIDZ-2 :) HD supply chain here (Thailand) is weird considering how many are made here (and can't buy). Smartd screams about them possibly needing a firmware update (they don't according to Seagate). Had no issues aside from a failure a month or so again (it's an HD ... it happens). AFAIK this isn't one of the controllers that was known for weird quirky issues pertaining to flushing data to disk on shutdown. I have to ask: is this FreeBSD box running under a HV? No, native/direct for sure on this one. If it *is not* running under a HV, could we please get exact motherboard model and version (including BIOS version)? Sometimes (not always) you can get this from kenv | grep smbios. No problem I built this one personally: Asus P8B-X BIOS revision 6103 I can also see you're running your own kernel. We'll get to that in a moment. It's GENERIC with the following added to the end: # -- Add Support for nicer console # options VESA options SC_PIXEL_MODE # -- PF Support # device pf device pflog device pfsync # -- Core temperature reporting # device coretemp # For Intel CPUs device smbios 4. Does sysctl hw.usb.no_shutdown_wait=1 help you? Weirdly this allowed it to reboot on the first try (without needing to be reset), but not the second. I'm not surprised. Pleas re-try with stable/9; Hans has been constantly working on the USB stack and fixing major bugs. Got it but probably not going to go this route as it means no more binary upgrades. While I can reboot it, it is the office NAS here and so 'testing out' -STABLE I think probably isn't going to happen. The Starting background file system checks in 60 seconds message appeared ... that only happens when something is dirty, right? No it does not. That message is always printed when you use background fsck, which is the default. Got it. 
I do not advocate using background fsck, because it has been known (and may still do this -- I do not care to find out, I do not have time for unreliable filesystem nonsense) to not always fix all filesystem problems. Meaning: people using background fsck have been known to boot into single-user and issue fsck manually and find issues. Place background_fsck=no in /etc/rc.conf. If the machine does not have a clean filesystem on boot-up, you'll know because the system will immediately begin fsck (in the foreground actively). You'll recognise that output if it happens, trust me. Preaching to the choir, we set this on all servers this one somehow did not have it set (I think due to ZFS making it unique and not copying our rc.conf template over properly). So the second try with just this I could ctrl alt del it and it responded .. kind of: http://i.imgur.com/POAIaNg.jpg Still had to reset it though. This looks like a chicken-and-egg problem -- you're probably fighting with background fsck, as the message there indicate some processes would not die. I'm just taking a guess though. Yeah. Even with no background fsck though I still
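Since the setting above was missed on this one machine because a template was not copied over, a small idempotent check may help. This is a minimal sketch, not anyone's actual tooling: the function name ensure_no_background_fsck is hypothetical, the file path is passed in so it can be tried on a copy first, and it writes the value as "NO" on the assumption that rc.conf treats yes/no values case-insensitively.

```shell
#!/bin/sh
# Hypothetical helper: make sure an rc.conf-style file disables background fsck.
# Usage: ensure_no_background_fsck /path/to/rc.conf
ensure_no_background_fsck() {
    conf="$1"
    if grep -q '^background_fsck=' "$conf" 2>/dev/null; then
        # A line already exists: rewrite it via a temp file (portable sed usage).
        sed 's/^background_fsck=.*/background_fsck="NO"/' "$conf" > "${conf}.tmp" &&
            mv "${conf}.tmp" "$conf"
    else
        # No line yet: append one.
        echo 'background_fsck="NO"' >> "$conf"
    fi
}
```

Running it twice leaves a single background_fsck line, so it should be safe to include in a provisioning script. Note that replacing the file via mv may change its ownership or mode, so check those afterwards.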
Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount
----- Original Message -----
From: Adam Strohl adams-free...@ateamsystems.com
To: Jeremy Chadwick j...@koitsu.org
Cc: freebsd-stable@freebsd.org
Sent: Wednesday, June 19, 2013 3:15 PM
Subject: Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount

> On 6/19/2013 20:35, Jeremy Chadwick wrote:
>>> Nope, I see basically the same thing sometimes under ESXi 5.0 Hypervisor (and yes it worries me the implications of something so broad). Those units I just haven't been able to isolate on a server which isn't critical. Let's focus on this server for now though per your suggestion below.
>>
>> I'm sorry but I don't understand your first sentence -- the first part of your sentence says "nope" (I have to assume in reply to my "on bare metal" part), but then says "I see basically the same thing sometimes under ESXi" which implies an alternate environment in comparison (i.e. we *are* talking about bare metal). Consider me confused. :-)
>
> Basically: The issue is extremely similar if not the same root cause, be it a native or virtual server. This server though is native.
>
>>>> 2. We need to know what version of 9.1 you're using, i.e. 9.1-RELEASE. If you use stable/9 (RELENG_9) we need to see uname -a output (you can hide the machine name if you want).
>>>
>>> Sorry, this ZFS box is 9.1-R P4 (kernel built today):
>>> FreeBSD ilos.dsn 9.1-RELEASE-p4 FreeBSD 9.1-RELEASE-p4 #6: Wed Jun 19 15:31:12 ICT 2013 root@hostname:/usr/obj/usr/src/sys/ATEAMSYSTEMS amd64
>>
>> I suggest trying stable/9 (and staying with it, for that matter).
>
> The issue is no binary updates and we have a large deploy base, so we've stuck with -R and use it internally because it's what we deploy.

You still need to test whether stable/9 fixes your issue, though; otherwise you don't know whether the issue you're seeing has already been fixed. If it's the old known ZFS vfs hang on shutdown, it has.

Regards
Steve
Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount
On 6/19/2013 21:21, Steven Hartland wrote:
> You still need to test if stable/9 fixes your issue though, as otherwise you don't know if the issue you're seeing has already been fixed, and if it's the old known ZFS vfs hang on shutdown, it has.

Thanks Steve, understood, but probably not going to happen with this box. I can reboot this thing, but it's our NAS and not a test bed.

This problem on this machine isn't a big deal because it's a server and not rebooted often (and easy to bring back). But I was more hoping it would let me easily test solutions to the issue, since the other servers showing the issue are in client production, keeping in mind that the VMs which do not use ZFS also show a similar/identical issue.

My gut says it appeared in/with 9.1 (we never saw this with 9.0 servers). It is also possible this is a different issue from those other servers and VMs.

How far away is 9.2? ;-P

Depending on how things go with Jeremy I'll probably have to wait this out unless I can get a test machine or VM where I can reproduce the issue AND upgrade it to -STABLE (again assuming it's even the same issue).
Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount
----- Original Message -----
From: Adam Strohl adams-free...@ateamsystems.com
To: Steven Hartland kill...@multiplay.co.uk
Cc: Jeremy Chadwick j...@koitsu.org; freebsd-stable@freebsd.org
Sent: Wednesday, June 19, 2013 3:29 PM
Subject: Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount

> On 6/19/2013 21:21, Steven Hartland wrote:
>> You still need to test if stable/9 fixes your issue though, as otherwise you don't know if the issue you're seeing has already been fixed, and if it's the old known ZFS vfs hang on shutdown, it has.
>
> Thanks Steve, understood, but probably not going to happen with this box. I can reboot this thing, but it's our NAS and not a test bed.
>
> This problem on this machine isn't a big deal because it's a server and not rebooted often (and easy to bring back). But I was more hoping it would let me easily test solutions to the issue, since the other servers showing the issue are in client production, keeping in mind that the VMs which do not use ZFS also show a similar/identical issue.
>
> My gut says it appeared in/with 9.1 (we never saw this with 9.0 servers). It is also possible this is a different issue from those other servers and VMs.
>
> How far away is 9.2? ;-P
>
> Depending on how things go with Jeremy I'll probably have to wait this out unless I can get a test machine or VM where I can reproduce the issue AND upgrade it to -STABLE (again assuming it's even the same issue).

Don't rule out there being more than one issue at play.

Regards
Steve
Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount
On Wed, Jun 19, 2013 at 09:15:18PM +0700, Adam Strohl wrote:
> On 6/19/2013 20:35, Jeremy Chadwick wrote:

I've snipped out portions which aren't relevant at this point in the convo. I'm trying to be as terse as possible here (honest).

To recap for readers/mailing list:

- Adam sees the same behaviour on systems on bare metal, as well as FreeBSD guests running under VMware ESXi 5.0 hypervisor. However, as I stated on the list just yesterday about lock-ups on shutdown, every situation may be different and there is a well-established history of this problem on FreeBSD where the root causes (bugs) were completely different from one another.

- The system we're discussing at this point in the thread is on bare metal -- specifically an Asus P8B-X motherboard, with BIOS version 6103, driven entirely by on-board Intel AHCI (not BIOS-level RAID).

- Adam runs 9.1-RELEASE because of business needs pertaining to freebsd-update and binary updates. (I ask more about this for benefits of readers below, however -- because this situation comes up a lot and I want to know what real-world admins do)

>> Thanks. I was mainly interested in the storage controller being used (in this case ahci(4)) and the disks being used (notorious ST3000DM001, known for excessively parking heads).
>
> Yeah, was not my first choice but then again ... RAIDZ-2 :) HD supply chain here (Thailand) is weird considering how many are made here (and can't buy). Smartd screams about them possibly needing a firmware update (they don't according to Seagate). Had no issues aside from a failure a month or so ago (it's an HD ... it happens).

Absolutely understood -- and FYI, in case you need backup, your thought process/conclusion here is spot on (re: it's a MHDD, failures happen).
Irrelevant to your shutdown problem: as for smartmontools bitching about the firmware: no vendors disclose what actual changes go into their drive firmware updates (vendors if you are reading this: I will have your souls...), so I have to read a bunch of end-user forums where nobody knows what they're talking about, and then of course find this highly educational *cough* article from Adaptec: http://ask.adaptec.com/app/answers/detail/a_id/17241/~/known-issues-with-seagate-barracuda-7200.14-desktop-drives The problem here is that there have been *so many* firmware bugs with Seagate's drives in the past 2 years or so that it's impossible for me to know which fixes what. You buy what you buy because that's what you buy, and that's cool -- but I avoid their stuff like the plague. unrelated Readers: if any of you have a ST[123]000DM001 drive running the CC24 firmware, and can confirm high head parking counts (SMART attribute 193), and are willing to upgrade your drive firmware to the latest then see if the LCC increments stop (or at least settle down to normal levels), I'd love to hear from you. I have been socially boycotting these models of drives because of that idiotic firmware design choice for quite some time now (not to mention the parking on those drives is audibly loud in a normal living room), and if the F/W actually inhibits the excessive parking then I have some drives to consider upgrading. :-) /unrelated I can also see you're running your own kernel. We'll get to that in a moment. It's GENERIC with the following added to the end: # -- Add Support for nicer console # options VESA options SC_PIXEL_MODE Can you try removing VESA and SC_PIXEL_MODE please? I know that sounds crazy (what on earth would that have to do with it?), but please try it. I can explain the justification if need be -- I'm being extra paranoid of something that got discovered here on -stable only a few days ago. It's a stretch, but I can see potential relevance. 
I can provide details/links later. 4. Does sysctl hw.usb.no_shutdown_wait=1 help you? Weirdly this allowed it to reboot on the first try (without needing to be reset), but not the second. I'm not surprised. Pleas re-try with stable/9; Hans has been constantly working on the USB stack and fixing major bugs. Got it but probably not going to go this route as it means no more binary upgrades. While I can reboot it, it is the office NAS here and so 'testing out' -STABLE I think probably isn't going to happen. I understand. I have a question relating to this below. Place background_fsck=no in /etc/rc.conf. If the machine does not have a clean filesystem on boot-up, you'll know because the system will immediately begin fsck (in the foreground actively). You'll recognise that output if it happens, trust me. Preaching to the choir, we set this on all servers this one somehow did not have it set (I think due to ZFS making it unique and not copying our rc.conf template over properly). Where should I send my bill for services rendered? (Totally kidding -- just had some breakfast so feeling chipper :-) ) So the second try with just this I could ctrl alt
Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount
On Wed, Jun 19, 2013 at 08:04:14AM -0700 I heard the voice of Jeremy Chadwick, and lo! it spake thus:
> <unrelated>
> Readers: if any of you have a ST[123]000DM001 drive running the CC24 firmware, and can confirm high head parking counts (SMART attribute 193), and are willing to upgrade your drive firmware to the latest then see if the LCC increments stop (or at least settle down to normal levels), I'd love to hear from you. I have been socially boycotting these models of drives because of that idiotic firmware design choice for quite some time now (not to mention the parking on those drives is audibly loud in a normal living room), and if the F/W actually inhibits the excessive parking then I have some drives to consider upgrading. :-)
> </unrelated>

I dunno about firmware, but you can smack 'em with a big hammer...

/etc/rc.local:

    for i in 0 1; do
        /sbin/camcontrol cmd ada${i} -a "EF 85 00 00 00 00 00 00 00 00 00 00"
    done

x-ref: http://lists.freebsd.org/pipermail/freebsd-stable/2009-November/052997.html

LCC was somewhere in the upper 400's (I wanna say 480-some?) a year and change ago when I dropped that in. It's 506/493 now on the two drives.

-- 
Matthew Fuller (MF4839) | fulle...@over-yonder.net
Systems/Network Administrator | http://www.over-yonder.net/~fullermd/
On the Internet, nobody can hear you scream.
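For anyone wanting to track the load cycle count (SMART attribute 193) discussed above over time, something like the following may be enough. It is a minimal sketch under assumptions: the function name load_cycle_count is hypothetical, it expects the output of smartctl -A on stdin, and it assumes smartmontools' usual attribute table layout with the ID in the first column and the raw value in the tenth.

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl -A` output on stdin and print the raw
# value of SMART attribute 193 (Load_Cycle_Count).
# Assumes the attribute ID is column 1 and the raw value is column 10.
load_cycle_count() {
    awk '$1 == 193 { print $10 }'
}
```

Typical use would be smartctl -A /dev/ada0 | load_cycle_count, logged once a day; a count that keeps climbing quickly suggests the parking behaviour is still active.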
Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount
On Wed, Jun 19, 2013 at 10:53:46AM -0500, Matthew D. Fuller wrote:
> On Wed, Jun 19, 2013 at 08:04:14AM -0700 I heard the voice of Jeremy Chadwick, and lo! it spake thus:
>> <unrelated>
>> Readers: if any of you have a ST[123]000DM001 drive running the CC24 firmware, and can confirm high head parking counts (SMART attribute 193), and are willing to upgrade your drive firmware to the latest then see if the LCC increments stop (or at least settle down to normal levels), I'd love to hear from you. I have been socially boycotting these models of drives because of that idiotic firmware design choice for quite some time now (not to mention the parking on those drives is audibly loud in a normal living room), and if the F/W actually inhibits the excessive parking then I have some drives to consider upgrading. :-)
>> </unrelated>
>
> I dunno about firmware, but you can smack 'em with a big hammer...
>
> /etc/rc.local:
>
>     for i in 0 1; do
>         /sbin/camcontrol cmd ada${i} -a "EF 85 00 00 00 00 00 00 00 00 00 00"
>     done
>
> x-ref: http://lists.freebsd.org/pipermail/freebsd-stable/2009-November/052997.html
>
> LCC was somewhere in the upper 400's (I wanna say 480-some?) a year and change ago when I dropped that in. It's 506/493 now on the two drives.

The above CDB + subcommand disables APM entirely. There is a lot more to APM than just parking heads (and in all honesty, APM should have nothing to do with parking heads). Disabling APM can actually have drastic effects on drive temperature (meaning there are certain chip and/or motor operations that said feature controls *in addition* to head parking), and other firmware-level features that aren't documented.

Furthermore, that CDB does not work for all drives. There are Seagate drives -- I know because I bought some and returned them when the APM trick did not work -- that lack the LCC-disable tie-in to APM. Some drives rejected the CDB (ATA status code error returned), while others accepted it but nothing reported by 0xec (IDENTIFY) changed.
The only model of drive I know that reliably works with this method is the WD Green/-GP drive, and the drive temperatures do increase. No idea on the Blues. (Another reason I recommend the Reds...)

What *should* have happened is that a new 0xef subcommand should have been created for this. Subs range from 0x00-0xff. T13 spec shows that a huge number of them (I'd say 30% or more) are marked Reserved and an additional 30% or so are marked Obsolete. And finally, 0x56-0x5c, 0xd6-0xdc and 0xe0 are Vendor Specific.

But looking at this from a more general view, the real issue is that these types of features should not have been introduced to begin with. The vendors introduced this problem, and now are marketing drives with said feature disabled, claiming "we fixed the problem that annoys so many of you!" -- the same problem **they introduced without asking anyone**.

I will have -- and eat -- their souls.

-- 
| Jeremy Chadwick                                   j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
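To make the byte string in the rc.local loop less opaque, here is a minimal dry-run sketch. The function name apm_disable_commands is hypothetical, the disk names are assumptions, and the quoting around the byte string is an assumption about how camcontrol expects its -a argument; the bytes themselves are the ones quoted in this thread. It only prints the commands, given the caveats above about temperature and about drives that reject or ignore the request.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the camcontrol invocation quoted in this
# thread for each named disk.
#   EF = ATA SET FEATURES command
#   85 = SET FEATURES subcommand "disable APM feature set"
# The remaining ten bytes are the zeroed register fields from the quoted command.
apm_disable_commands() {
    for disk in "$@"; do
        echo "/sbin/camcontrol cmd ${disk} -a \"EF 85 00 00 00 00 00 00 00 00 00 00\""
    done
}

# Disk names are assumptions; review the output before running any of it.
apm_disable_commands ada0 ada1
```

Printing first keeps the decision to actually send the command with the admin, which seems prudent when the effect differs from drive to drive.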
Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount
On Wed, Jun 19, 2013 at 09:16:35AM -0700 I heard the voice of Jeremy Chadwick, and lo! it spake thus:
> The above CDB + subcommand disables APM entirely. There is a lot more to APM than just parking heads (and in all honesty, APM should have nothing to do with parking heads). Disabling APM can actually have drastic effects on drive temperature (meaning there are certain chip and/or motor operations that said feature controls *in addition* to head parking), and other firmware-level features that aren't documented.

True enough, in concept. With all the drives sitting behind ventilation perfectly capable of dealing with 15kRPM drives, I don't worry about what that might do to the 7200's though...

> Furthermore, that CDB does not work for all drives. There are Seagate drives -- I know because I bought some and returned them when the APM trick did not work -- that lack the LCC-disable tie-in to APM. Some drives rejected the CDB (ATA status code error returned), while others accepted it but nothing reported by 0xec (IDENTIFY) changed.

Well, I haven't seen it with these. Several of

    ada0: <ST1000DM003-9YN162 CC4D> ATA-8 SATA 3.x device

and some systems with CC4C too.

> I will have -- and eat -- their souls.

The problem with that is that the undigestible bits of soul just get passed right back into the ecosystem, and in a more concentrated form. Some might suggest that's already happened, and is what got us here in the first place 8-}

-- 
Matthew Fuller (MF4839) | fulle...@over-yonder.net
Systems/Network Administrator | http://www.over-yonder.net/~fullermd/
On the Internet, nobody can hear you scream.
Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount
On Wed, Jun 19, 2013 at 11:34:39AM -0500, Matthew D. Fuller wrote: On Wed, Jun 19, 2013 at 09:16:35AM -0700 I heard the voice of Jeremy Chadwick, and lo! it spake thus: The above CDB + subcommand disables APM entirely. There is a lot more to APM than just parking heads (and in all honesty, APM should have nothing to do with parking heads). Disabling APM can actually have drastic effects on drive temperature (meaning there are certain chip and/or motor operations that said feature controls *in addition* to head parking), and other firmware-level features that aren't documented. True enough, in concept. With all the drives sitting behind ventilation perfectly capable of dealing with 15kRPM drives, I don't worry about what that might do to the 7200's though... Justified in your environment, but not in mine -- where most of my systems (at home) are extremely quiet (1000-1200rpm fans, lots of noise dampening material, etc.). A 10C increase *during idle* is enough to make me wary. I also have extremely sensitive hearing, so drives clicking is something I can hear from quite a distance -- I guess working with them for so long over the years has made me sensitive to 'em. Furthermore, that CDB does not work for all drives. There are Seagate drives -- I know because I bought some and returned them when the APM trick did not work -- that lack the LCC-disable tie-in to APM. The drive either rejected the CDB (ATA status code error returned), while others accepted it but nothing in 0xec (IDENTIFY) reported as got changed. Well, I haven't seen it with these. Several of ada0: ST1000DM003-9YN162 CC4D ATA-8 SATA 3.x device and some systems with CC4C too. The drives I was testing were STx000DM001. I don't remember if I had a DM002. I also don't remember the firmware version they had on them, but I do remember there were no updates available from Seagate at that time. 
On the other hand, their forum was *filled* with post after post about the issue, including one fellow whose drive in something like 3 months was almost reaching MTBF head park/reload count.

But my point is this: 3.5" drives do not need this feature in 95% of environments. In desktop systems it's worthless -- in consumer desktops it accomplishes nothing but noise and annoyance and impacts I/O, and in business desktop environments it serves no purpose because most places have their desktops go into sleep mode (so drive standby/sleep gets used). And in the server environment it's pure 100% worthless.

With 2.5" drives I can see it being more useful, but only if the drive is used in a laptop. There are NASes (and now servers too!) which use 2.5" drives, and I sure as hell wouldn't want that happening there.

So really it's just a bad feature all around that should be specific to one environment demographic; the vendors should have made a 2.5" drive dedicated for laptops that had this feature enabled, while disabled on all other drives (2.5" and 3.5"). What we got was nearly the opposite.

>> I will have -- and eat -- their souls.
>
> The problem with that is that the undigestible bits of soul just get passed right back into the ecosystem, and in a more concentrated form. Some might suggest that's already happened, and is what got us here in the first place 8-}

If you had what I do (moderate-to-severe IBS), you'd know that it definitely doesn't get passed back in a more concentrated form. First joke I've been able to make about my health condition, yeah!

"Ha! I kill me!" -- Alf

-- 
| Jeremy Chadwick                                   j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount
On Wed, Jun 19, 2013 at 09:52:00AM -0700 I heard the voice of Jeremy Chadwick, and lo! it spake thus:
> Justified in your environment, but not in mine -- where most of my systems (at home) are extremely quiet (1000-1200rpm fans, lots of noise dampening material, etc.). A 10C increase *during idle* is enough to make me wary.

Mmm. Well, some of them are in 1U cases, and so behind very loud little fans (but that's in a datacenter where *I* don't have to hear it). But the ones sitting beside me are behind 1kRPM fans (80 and 120 mm), and are around 28-30C (which is a tad high; the filters are overdue for cleaning). And ambient is probably 24-25. I'd be seriously creeped out if an *active* drive were 10 over ambient, much less if flipping some config setting moved anything 10.

(this is also why I _hate_ laptops...)

> On the other hand, their forum was *filled* with post after post about the issue, including one fellow whose drive in something like 3 months was almost reaching MTBF head park/reload count.

Oh, sure. If you don't get the stupid things to stop, you can measure their life with an egg timer. The 400-some these drives got before I turned APM off happened in, like, an afternoon.

> If you had what I do (moderate-to-severe IBS), you'd know that it definitely doesn't get passed back in a more concentrated form. First joke I've been able to make about my health condition, yeah!

Well, if your diet consists of hard drive manufacturers' souls, it's no wonder your system got all screwed up! You gotta find something to eat with more moral fiber! ;p

-- 
Matthew Fuller (MF4839) | fulle...@over-yonder.net
Systems/Network Administrator | http://www.over-yonder.net/~fullermd/
On the Internet, nobody can hear you scream.
Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount
On 6/19/2013 22:04, Jeremy Chadwick wrote: On Wed, Jun 19, 2013 at 09:15:18PM +0700, Adam Strohl wrote: On 6/19/2013 20:35, Jeremy Chadwick wrote: I've snipped out portions which aren't relevant at this point in the convo. I'm trying to be terse as much as possible here (honest). To recap for readers/mailing list: - Adam seems the same behaviour on systems on bare metal, as well as FreeBSD guests running under VMware ESXi 5.0 hypervisor. However, as I stated on the list just yesterday about lock-ups on shutdown, every situation may be different and there is a well-established history of this problem on FreeBSD where each root cause (bugs) were completely different from one another. - The system we're discussing at this point in the thread is on bare metal -- specifically an Asus P8B-X motherboard, with BIOS version 6103, driven entirely by on-board Intel AHCI (not BIOS-level RAID). - Adam runs 9.1-RELEASE because of business needs pertaining to freebsd-update and binary updates. (I ask more about this for benefits of readers below, however -- because this situation comes up a lot and I want to know what real-world admins do) This is all correct. Thanks. I was mainly interested in the storage controller being used (in this case ahci(4)) and the disks being used (notorious ST3000DM001, known for excessively parking heads). Yeah, was not my first choice but then again ... RAIDZ-2 :) HD supply chain here (Thailand) is weird considering how many are made here (and can't buy). Smartd screams about them possibly needing a firmware update (they don't according to Seagate). Had no issues aside from a failure a month or so again (it's an HD ... it happens). Absolutely understood -- and FYI, in case you need backup, your thought process/conclusion here is spot on (re: it's a MHDD, failures happen). 
Indeed :-D Irrelevant to your shutdown problem: as for smartmontools bitching about the firmware: no vendors disclose what actual changes go into their drive firmware updates (vendors if you are reading this: I will have your souls...), so I have to read a bunch of end-user forums where nobody knows what they're talking about, and then of course find this highly educational *cough* article from Adaptec: http://ask.adaptec.com/app/answers/detail/a_id/17241/~/known-issues-with-seagate-barracuda-7200.14-desktop-drives Yeah I agree .. I tried to firmware upgrade them when I was building the system but it said they didn't qualify when using the boot ISO. I just checked the site and it says no firmware update available too when using their search by serial # tool. At this point I'm leery about updating given that I've got data on it anyway. I do occasionally (maybe once a week or two and they're in the same room as me/my office) hear one parking. I see nothing wrong in smart though, no dmesg errors and have noticed no issues with the array and it bench tests at around 850 MB/sec. Too bad 10 Gbit equipment isn't cheaper. Also when I bought the 6 for this array I got a 7th as a cold spare :P The problem here is that there have been *so many* firmware bugs with Seagate's drives in the past 2 years or so that it's impossible for me to know which fixes what. You buy what you buy because that's what you buy, and that's cool -- but I avoid their stuff like the plague. Yeah. I'd prefer WD myself but this place is swimming in green and now red drives. uhgl. Snipping out the unrelated parts ... Can you try removing VESA and SC_PIXEL_MODE please? I know that sounds crazy (what on earth would that have to do with it?), but please try it. I can explain the justification if need be -- I'm being extra paranoid of something that got discovered here on -stable only a few days ago. It's a stretch, but I can see potential relevance. I can provide details/links later. 
No change unfortunately. 4. Does sysctl hw.usb.no_shutdown_wait=1 help you? Weirdly this allowed it to reboot on the first try (without needing to be reset), but not the second. I'm not surprised. Pleas re-try with stable/9; Hans has been constantly working on the USB stack and fixing major bugs. Got it but probably not going to go this route as it means no more binary upgrades. While I can reboot it, it is the office NAS here and so 'testing out' -STABLE I think probably isn't going to happen. I understand. I have a question relating to this below. Place background_fsck=no in /etc/rc.conf. If the machine does not have a clean filesystem on boot-up, you'll know because the system will immediately begin fsck (in the foreground actively). You'll recognise that output if it happens, trust me. Preaching to the choir, we set this on all servers this one somehow did not have it set (I think due to ZFS making it unique and not copying our rc.conf template over properly). Where should I send my bill for services rendered? (Totally kidding -- just had some breakfast