Re: bad sector in gmirror HDD
On 2011-Aug-19 20:24:38 -0700, Jeremy Chadwick wrote:
>The reallocated LBA cannot be dealt with aside from re-creating the
>filesystem and telling it not to use the LBA. I see no flags in
>newfs(8) that indicate a way to specify LBAs to avoid. And we don't
>know what LBA it is so we can't refer to it right now anyway.
>
>As I said previously, I have no idea how UFS/FFS deals with this.

It doesn't. UFS/FFS and ZFS expect and assume "perfect" media. It's up to the drive to transparently remap faulty sectors. UFS used to have support for visible bad sectors (and Solaris UFS still reserves space for this, though I don't know if it still works) but the code was removed from FreeBSD long ago. AFAIR, wd(4) supported bad sectors but it was removed long ago.

--
Peter Jeremy
Re: Unknown Re0 Hardware version
On Sun, Aug 21, 2011 at 04:01:10PM +0200, Willem Jan Withagen wrote:
> Hi,
>
> I'm assembling a few systems with an ASUS P8 H161-MLE motherboard
> which was supposed to have a 'Realtek® 8112L, 1 x Gigabit LAN
> Controller(s)' onboard.
>
> And to be honest I never expected that version not to be supported.
> Just booted 8.2-RELEASE on it, and the Installer crashed when I wanted
> it to config the ethernet.
>
> Rebooted, and re0 kicks in. But gives a HW revision not supported.
> It claims HW revision 0x2c80.
>
> Is this supported in later 8.2-Stable??? Or in 9.x??
>
> I'm willing to tinker with the code to recompile the re0 driver.
>

Your controller looks like an RTL8168E VL, and support for that controller was added after 8.2-RELEASE. Either update your source to stable/8 or patch your source tree with the back-ported re(4) driver for 8.2-RELEASE, as follows.

1. Fetch http://people.freebsd.org/~yongari/re/8.2R/if_re.c and copy it to the /usr/src/sys/dev/re directory.
2. Fetch http://people.freebsd.org/~yongari/re/8.2R/if_rlreg.h and copy it to the /usr/src/sys/pci directory.

Then rebuild your kernel, and your controller should be recognized on the next boot.

> --WjW

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Unknown Re0 Hardware version
On 2011-08-22 1:01, YongHyeon PYUN wrote:
> On Sun, Aug 21, 2011 at 04:01:10PM +0200, Willem Jan Withagen wrote:
>> Hi,
>>
>> I'm assembling a few systems with an ASUS P8 H161-MLE motherboard
>> which was supposed to have a 'Realtek® 8112L, 1 x Gigabit LAN
>> Controller(s)' onboard.
>>
>> And to be honest I never expected that version not to be supported.
>> Just booted 8.2-RELEASE on it, and the Installer crashed when I wanted
>> it to config the ethernet.
>>
>> Rebooted, and re0 kicks in. But gives a HW revision not supported.
>> It claims HW revision 0x2c80.
>>
>> Is this supported in later 8.2-Stable??? Or in 9.x??
>>
>> I'm willing to tinker with the code to recompile the re0 driver.
>
> Your controller looks like an RTL8168E VL, and support for that
> controller was added after 8.2-RELEASE. Either update your source to
> stable/8 or patch your source tree with the back-ported re(4) driver
> for 8.2-RELEASE, as follows.
>
> 1. Fetch http://people.freebsd.org/~yongari/re/8.2R/if_re.c and
>    copy it to the /usr/src/sys/dev/re directory.
> 2. Fetch http://people.freebsd.org/~yongari/re/8.2R/if_rlreg.h and
>    copy it to the /usr/src/sys/pci directory.
>
> Then rebuild your kernel, and your controller should be recognized on
> the next boot.

Hi YongHyeon PYUN,

OK, that would mean I temporarily have to insert another Ethernet card to get things on board. Or use the sneaker network. :)

I did check the 9.x stuff, but there the revision number was not in /usr/src/sys/pci/if_rlreg.h. And you are right, they are in 8.2-STABLE.

Thanks for the files and pointers.

--WjW
Re: Serial multiport error Oxford/Startech PEX2S952
On Sun, Aug 21, 2011 at 09:44:41PM +0100, David Wood wrote:

> I wrote and contributed the support code for the OXPCIe95x serial chips
> - and just happened to notice your report.

Thanks for the response.

> In message <20110821154249.ge92...@core.byshenk.net>, Greg Byshenk
> writes
> >I'm having a problem with a StarTech PEX2S952 dual-port serial
> >card.
> >
> >I believe that it should be supported, as it has this entry in
> >pucdata.c
> >
> >[...]
> >    {   0x1415, 0xc158, 0x, 0,
> >        "Oxford Semiconductor OXPCIe952 UARTs",
> >        DEFAULT_RCLK * 0x22,
> >        PUC_PORT_NONSTANDARD, 0x10, 0, -1,
> >        .config_function = puc_config_oxford_pcie
> >    },
> >[...]
>
> It should be supported. The OXPCIe952 is more awkward to support than
> the OXPCIe954 and OXPCIe958 because it can be configured in so many
> different ways by the board manufacturer. However, 0xc158 is a
> configuration that is identical in arrangement to the larger chips, so
> is the configuration I'm most confident of. I've just double-checked the
> data sheets, and can't see any relevant differences between 0xc158
> OXPCIe952 and the OXPCIe954 I tested the code with.
>
> I use my OXPCIe954 board on FreeBSD 8.2, and have had success reports
> from other OXPCIe954 and OXPCIe958 board users (including someone with a
> 16 port board based on dual OXPCIe958s). I have yet to try FreeBSD 9.x
> on my hardware.
>
> >And, while it is recognized at boot -- after adding
> >
> >    device puc
> >    options COM_MULTIPORT
>
> I'm 99% certain that "options COM_MULTIPORT" relates to the old sio(4)
> code - I certainly don't need it on 8.x. Does it make any difference if
> you delete that line and just leave "device puc"?

I will rebuild my kernel and try.

> >to my kernel, it doesn't seem to be working. The devices '/dev/cuau2'
> >and '/dev/cuau3' show up, and I can connect to them, but they don't
> >seem to pass any traffic. If I connect to the serial console of
> >another machine (one that I know for certain is working), I get
> >nothing at all.
>
> Have you remembered to set the speed (and other relevant options) on the
> .init devices? This is a feature (or is it a quirk) of the uart(4)
> driver that catches many people out. Setting options on the base device
> is normally a no-op.
>
> For example, if the remote device on /dev/cuau2 operates at 115200 bps
> with hardware handshaking, try:
>
> stty -f /dev/cuau2.init speed 115200 crtscts

Interestingly, it -is- a no-op on the device, which I hadn't noticed. But trying to set it on the .init fails:

    # stty -f /dev/cuau2.init speed 115200 crtscts
    stty: /dev/cuau2.init isn't a terminal
    #

> One frustrating aspect of adding puc(4) support for many devices is that
> you can't be certain of the clock rate multiplier - the same device can
> crop up on a different manufacturer's board with a different multiplier.
> This problem doesn't occur with the OXPCIe95x devices as they derive
> their 62.5MHz UART clock from the PCI Express clock. Consequently, the
> problem can't be that your board is inadvertently operating the UARTs at
> the wrong speed.
>
> >I suspect (?) that it may not be recognized as the proper card. Boot
> >and pciconf messages are:
> >
> >puc0: mem
> >0xf9dfc000-0xf9df,0xfa00-0xfa1f,0xf9e0-0xf9ff irq
> >30 at device 0.0 on pci4
>
> That is correct. Are there any more lines afterwards - especially one
> giving the number of UARTs detected? That line is crucial, as, on these
> chips, the number of UARTs has to be read from configuration space
> because you can slave two chips together.
>
> My OXPCIe954 board is recognised thus (FreeBSD 8.2 amd64):
>
> puc0: mem
> 0xd5efc000-0xd5ef,0xd5c0-0xd5df,0xd5a0-0xd5bf irq 18
> at device 0.0 on pci8
> puc0: 4 UARTs detected
> puc0: [FILTER]
> uart2: <16950 or compatible> on puc0
> uart2: [FILTER]
> uart3: <16950 or compatible> on puc0
> uart3: [FILTER]
> uart4: <16950 or compatible> on puc0
> uart4: [FILTER]
> uart5: <16950 or compatible> on puc0
> uart5: [FILTER]

    puc0: mem 0xf9dfc000-0xf9df,0xfa00-0xfa1f,0xf9e0-0xf9ff irq 30 at device 0.0 on pci4
    puc0: 2 UARTs detected
    uart2: <16950 or compatible> at port 1 on puc0
    uart3: <16950 or compatible> at port 2 on puc0

> >puc0@pci0:4:0:0:class=0x070002 card=0xc1581415 chip=0xc1581415
> >rev=0x00 hdr=0x00
> >    vendor = 'Oxford Semiconductor Ltd'
> >    class = simple comms
> >    subclass = UART
> >    bar [10] = type Memory, range 32, base 0xf9dfc000, size 16384, enabled
> >    bar [14] = type Memory, range 32, base 0xfa00, size 2097152,
> >    enabled
> >    bar [18] = type Memory, range 32, base 0xf9e0, size 2097152,
> >    enabled
>
> That is correct.
>
> >The kernel is actually FreeBSD 9.0-BETA1 amd64, which is not quite
> >'STABLE' yet, but I don't think that this should matt
Re: Serial multiport error Oxford/Startech PEX2S952
Hi Greg,

I wrote and contributed the support code for the OXPCIe95x serial chips - and just happened to notice your report.

In message <20110821154249.ge92...@core.byshenk.net>, Greg Byshenk writes
> I'm having a problem with a StarTech PEX2S952 dual-port serial card.
>
> I believe that it should be supported, as it has this entry in pucdata.c
>
> [...]
>     {   0x1415, 0xc158, 0x, 0,
>         "Oxford Semiconductor OXPCIe952 UARTs",
>         DEFAULT_RCLK * 0x22,
>         PUC_PORT_NONSTANDARD, 0x10, 0, -1,
>         .config_function = puc_config_oxford_pcie
>     },
> [...]

It should be supported. The OXPCIe952 is more awkward to support than the OXPCIe954 and OXPCIe958 because it can be configured in so many different ways by the board manufacturer. However, 0xc158 is a configuration that is identical in arrangement to the larger chips, so is the configuration I'm most confident of. I've just double-checked the data sheets, and can't see any relevant differences between 0xc158 OXPCIe952 and the OXPCIe954 I tested the code with.

I use my OXPCIe954 board on FreeBSD 8.2, and have had success reports from other OXPCIe954 and OXPCIe958 board users (including someone with a 16 port board based on dual OXPCIe958s). I have yet to try FreeBSD 9.x on my hardware.

> And, while it is recognized at boot -- after adding
>
>     device puc
>     options COM_MULTIPORT

I'm 99% certain that "options COM_MULTIPORT" relates to the old sio(4) code - I certainly don't need it on 8.x. Does it make any difference if you delete that line and just leave "device puc"?

> to my kernel, it doesn't seem to be working. The devices '/dev/cuau2' and '/dev/cuau3' show up, and I can connect to them, but they don't seem to pass any traffic. If I connect to the serial console of another machine (one that I know for certain is working), I get nothing at all.

Have you remembered to set the speed (and other relevant options) on the .init devices? This is a feature (or is it a quirk) of the uart(4) driver that catches many people out. Setting options on the base device is normally a no-op.

For example, if the remote device on /dev/cuau2 operates at 115200 bps with hardware handshaking, try:

    stty -f /dev/cuau2.init speed 115200 crtscts

One frustrating aspect of adding puc(4) support for many devices is that you can't be certain of the clock rate multiplier - the same device can crop up on a different manufacturer's board with a different multiplier. This problem doesn't occur with the OXPCIe95x devices as they derive their 62.5MHz UART clock from the PCI Express clock. Consequently, the problem can't be that your board is inadvertently operating the UARTs at the wrong speed.

> I suspect (?) that it may not be recognized as the proper card. Boot and pciconf messages are:
>
> puc0: mem 0xf9dfc000-0xf9df,0xfa00-0xfa1f,0xf9e0-0xf9ff irq 30 at device 0.0 on pci4

That is correct. Are there any more lines afterwards - especially one giving the number of UARTs detected? That line is crucial, as, on these chips, the number of UARTs has to be read from configuration space because you can slave two chips together.

My OXPCIe954 board is recognised thus (FreeBSD 8.2 amd64):

    puc0: mem 0xd5efc000-0xd5ef,0xd5c0-0xd5df,0xd5a0-0xd5bf irq 18 at device 0.0 on pci8
    puc0: 4 UARTs detected
    puc0: [FILTER]
    uart2: <16950 or compatible> on puc0
    uart2: [FILTER]
    uart3: <16950 or compatible> on puc0
    uart3: [FILTER]
    uart4: <16950 or compatible> on puc0
    uart4: [FILTER]
    uart5: <16950 or compatible> on puc0
    uart5: [FILTER]

> puc0@pci0:4:0:0:class=0x070002 card=0xc1581415 chip=0xc1581415 rev=0x00 hdr=0x00
>     vendor = 'Oxford Semiconductor Ltd'
>     class = simple comms
>     subclass = UART
>     bar [10] = type Memory, range 32, base 0xf9dfc000, size 16384, enabled
>     bar [14] = type Memory, range 32, base 0xfa00, size 2097152, enabled
>     bar [18] = type Memory, range 32, base 0xf9e0, size 2097152, enabled

That is correct.

> The kernel is actually FreeBSD 9.0-BETA1 amd64, which is not quite 'STABLE' yet, but I don't think that this should matter.
>
> Any advice would be much appreciated. The machine is still in test phase, so I can mess around with it as necessary.

Hopefully this gets your Startech board working. I look forward to your feedback.

If all else fails, the board I'm using is Lindy 51189. It's an OXPCIe954 board, offering four ports via a breakout cable, and is normally pretty cheap direct from lindy.com (quite possibly cheaper than your two port Startech board!). However, this recommendation comes with the proviso that I haven't yet tried it with FreeBSD 9.x.

With best wishes,

David
--
David Wood
da...@wood2.org.uk
Re: debugging frequent kernel panics on 8.2-RELEASE
On Sat, 20 Aug 2011, Steven Hartland wrote:
> Are you seeing a double fault panic?

We're seeing both. At least one double (or more) fault finishing with "Fatal Trap 12: page fault while in kernel mode". Subsequent panics have been single fault (all visible on the IPMI console) "Fatal Trap 9: general protection fault while in kernel mode". Could well be unrelated. The system is undergoing hardware diags now.

Roger Marquis
Re: debugging frequent kernel panics on 8.2-RELEASE
On 08/21/11 05:01, Steven Hartland wrote:
> ----- Original Message -----
> From: "Jamie Gritton"
>> The problem isn't with the conditional locking of tpr in prison_deref. That locking is actually correct, and there's no race condition.
>
> Are you sure? I do think that unlocking the mtx half way through the call allows the above scenario to create a race condition, albeit very briefly, when ignoring the overriding issue.
>
> In addition, if the code were changed so that the pr_uref++ also maintained the parent's uref, this would definitely lead to potential problems in my mind, especially if you had more than one child prison, of a given parent, entering the dying state at any one time.
>
> In this case I believe you would have to acquire the locks of all the parent prisons before it would be safe to proceed.

Lock order requires that I unlock the child if I want to lock the parent. While that does allow periods where neither is locked, it's safe in this case. There may be multiple processes dying in one jail, or in multiple children of a single jail. But as long as a parent jail is locked while decrementing pr_uref, then only one of these simultaneous prison_deref calls would set pr_uref to zero and continue in the loop to that prison's parent. This might be mixed with pr_uref being incremented elsewhere, but that's not a problem either as long as the jail in question is locked.

>> The trouble lies in the resurrection of dead jails, as Andriy has noted (though not just attaching, but also by setting its persist flag causes the same problem).
>
> I'm not sure that persistent prisons actually suffer from this in any different way tbh, as they have an additional uref increment so would never hit this case unless they have been actively removed and hence unpersisted first.

Right - both the attach and persist cases are only a problem when a jail has disappeared. There are various ways for a jail to be removed, potentially to be kept around but in the dying state, but only two related ways for it to be resurrected: attaching a new process or setting the persist flag, both via jail_set with the JAIL_DYING flag passed.

>> There are two possible fixes to this. One is the patch you've given, which only decrements a parent jail's pr_uref when the child jail completely goes away (as opposed to when it loses its last uref). This provides symmetry with the current way pr_uref is incremented on the parent, which is only when a jail is created.
>>
>> The other fix is to increment a parent's pr_uref when a jail is resurrected, which will match the current logic in prison_deref. I like the external semantics of this solution: a jail isn't visible if it is not persistent and has no processes and no *visible* sub-jails, as opposed to having no sub-jails at all. But this solution ends up pretty complicated - there are a few places where pr_uref is incremented, where I might need to increment parent jails' pr_uref as well, much like the current tpr loop in prison_deref decrements them.
>
> Ahh yes, in the hierarchical case my patch would indeed mean that non-persistent parent jails would remain visible even when their last child jail is in a dying state.
>
> As you say, making this not the case would likely require replacing all instances of pr_uref++ with a prison_uref method that implements the opposite of the loop in prison_deref should the prison's pr_uref be 0 when called.

Yes, that's the problem. Maybe not all instances, but at least most have enough times a jail is unlocked that we can't assume the pr_uref hasn't been set to zero somewhere else, and so we need to do that loop.

>> Your solution removes code instead of adding it, which is generally a good thing. While it does change the semantics of pr_uref in the hierarchical case at least from what I thought it was, those semantics haven't been working properly anyway.
>
> Good to know my interpretation was correct, even if I was missing the visibility factor in the hierarchical case :)
>
>> Bjoern, I'm adding you to the CC list for this because the whole pr_uref thing was your idea (though it was pr_nprocs at the time), so you might care about the hierarchical semantics of it - or you may not. Also, this is a panic-inducing bug in current and may interest you for that reason.
>
> From an admin perspective the current jail dying state does cause confusion when you're not aware of its existence. You ask a jail to stop and it appears to have completed that request, but really hasn't, and generally due to just a lingering tcp connection.
>
> With the introduction of hierarchical jails that gets a little worse, where a whole series of jails could disappear from normal view only to be resurrected shortly after. Something to bear in mind when deciding which solution of the two presented to use.

The good news is that a jail (or perhaps a whole set of jails) can only come back from the dead when the administrator makes a concerted effort to do so. So it at least shouldn't surprise the administrator w
Serial multiport error Oxford/Startech PEX2S952
Not sure if -stable is the right place for this, but I'll give it a shot; if it's not, then a pointer in the right direction would be much appreciated.

I'm having a problem with a StarTech PEX2S952 dual-port serial card.

I believe that it should be supported, as it has this entry in pucdata.c

    [...]
    {   0x1415, 0xc158, 0x, 0,
        "Oxford Semiconductor OXPCIe952 UARTs",
        DEFAULT_RCLK * 0x22,
        PUC_PORT_NONSTANDARD, 0x10, 0, -1,
        .config_function = puc_config_oxford_pcie
    },
    [...]

And, while it is recognized at boot -- after adding

    device puc
    options COM_MULTIPORT

to my kernel, it doesn't seem to be working. The devices '/dev/cuau2' and '/dev/cuau3' show up, and I can connect to them, but they don't seem to pass any traffic. If I connect to the serial console of another machine (one that I know for certain is working), I get nothing at all.

I suspect (?) that it may not be recognized as the proper card. Boot and pciconf messages are:

    puc0: mem 0xf9dfc000-0xf9df,0xfa00-0xfa1f,0xf9e0-0xf9ff irq 30 at device 0.0 on pci4

    puc0@pci0:4:0:0:class=0x070002 card=0xc1581415 chip=0xc1581415 rev=0x00 hdr=0x00
        vendor = 'Oxford Semiconductor Ltd'
        class = simple comms
        subclass = UART
        bar [10] = type Memory, range 32, base 0xf9dfc000, size 16384, enabled
        bar [14] = type Memory, range 32, base 0xfa00, size 2097152, enabled
        bar [18] = type Memory, range 32, base 0xf9e0, size 2097152, enabled

The kernel is actually FreeBSD 9.0-BETA1 amd64, which is not quite 'STABLE' yet, but I don't think that this should matter.

Any advice would be much appreciated. The machine is still in test phase, so I can mess around with it as necessary.

Thanks.

--
greg byshenk - free...@byshenk.net - Leiden, NL
Unknown Re0 Hardware version
Hi,

I'm assembling a few systems with an ASUS P8 H161-MLE motherboard which was supposed to have a 'Realtek® 8112L, 1 x Gigabit LAN Controller(s)' onboard.

And to be honest I never expected that version not to be supported. Just booted 8.2-RELEASE on it, and the Installer crashed when I wanted it to config the ethernet.

Rebooted, and re0 kicks in. But gives a HW revision not supported. It claims HW revision 0x2c80.

Is this supported in later 8.2-Stable??? Or in 9.x??

I'm willing to tinker with the code to recompile the re0 driver.

--WjW
Re: debugging frequent kernel panics on 8.2-RELEASE
----- Original Message -----
From: "Jamie Gritton"

>> In essence I think we can get the following flow
>> where 1# = process1 and 2# = process2
>> 1#1. prison1.pr_uref = 1 (single process jail)
>> 1#2. prison_deref( prison1,...
>> 1#3. prison1.pr_uref-- (prison1.pr_uref = 0)
>> 1#3. prison1.mtx_unlock <-- this now allows others to change prison1.pr_uref
>> 1#3. prison0.pr_uref--
>> 2#1. process2.attach( prison1 ) (prison1.pr_uref = 1)
>> 2#2. process2.exit
>> 2#3. prison_deref( prison1,...
>> 2#4. prison1.pr_uref-- (prison1.pr_uref = 0)
>> 2#5. prison1.mtx_unlock <-- this now allows others to change prison1.pr_uref
>> 2#5. prison0.pr_uref-- (prison0.pr_uref has now been decremented twice by prison1)

First off, thanks for the feedback Jamie, most appreciated :)

> The problem isn't with the conditional locking of tpr in prison_deref. That locking is actually correct, and there's no race condition.

Are you sure? I do think that unlocking the mtx half way through the call allows the above scenario to create a race condition, albeit very briefly, when ignoring the overriding issue.

In addition, if the code were changed so that the pr_uref++ also maintained the parent's uref, this would definitely lead to potential problems in my mind, especially if you had more than one child prison, of a given parent, entering the dying state at any one time.

In this case I believe you would have to acquire the locks of all the parent prisons before it would be safe to proceed.

> The trouble lies in the resurrection of dead jails, as Andriy has noted (though not just attaching, but also by setting its persist flag causes the same problem).

I'm not sure that persistent prisons actually suffer from this in any different way tbh, as they have an additional uref increment so would never hit this case unless they have been actively removed and hence unpersisted first.

> There are two possible fixes to this. One is the patch you've given, which only decrements a parent jail's pr_uref when the child jail completely goes away (as opposed to when it loses its last uref). This provides symmetry with the current way pr_uref is incremented on the parent, which is only when a jail is created.
>
> The other fix is to increment a parent's pr_uref when a jail is resurrected, which will match the current logic in prison_deref. I like the external semantics of this solution: a jail isn't visible if it is not persistent and has no processes and no *visible* sub-jails, as opposed to having no sub-jails at all. But this solution ends up pretty complicated - there are a few places where pr_uref is incremented, where I might need to increment parent jails' pr_uref as well, much like the current tpr loop in prison_deref decrements them.

Ahh yes, in the hierarchical case my patch would indeed mean that non-persistent parent jails would remain visible even when their last child jail is in a dying state.

As you say, making this not the case would likely require replacing all instances of pr_uref++ with a prison_uref method that implements the opposite of the loop in prison_deref should the prison's pr_uref be 0 when called.

> Your solution removes code instead of adding it, which is generally a good thing. While it does change the semantics of pr_uref in the hierarchical case at least from what I thought it was, those semantics haven't been working properly anyway.

Good to know my interpretation was correct, even if I was missing the visibility factor in the hierarchical case :)

> Bjoern, I'm adding you to the CC list for this because the whole pr_uref thing was your idea (though it was pr_nprocs at the time), so you might care about the hierarchical semantics of it - or you may not. Also, this is a panic-inducing bug in current and may interest you for that reason.

From an admin perspective the current jail dying state does cause confusion when you're not aware of its existence. You ask a jail to stop and it appears to have completed that request, but really hasn't, and generally due to just a lingering tcp connection.

With the introduction of hierarchical jails that gets a little worse, where a whole series of jails could disappear from normal view only to be resurrected shortly after. Something to bear in mind when deciding which solution of the two presented to use.

Regards
Steve
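The double decrement described in the flow above can be reproduced in a small single-threaded model. This is only an illustrative sketch, not kernel code: `toy_prison`, `toy_deuref`, `toy_attach` and `toy_double_decrement` are hypothetical names, locking is left out entirely, and the starting count of 3 on the parent is an arbitrary assumption (one reference standing for the child jail plus two unrelated ones).

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct prison: only the user refcount and the parent. */
struct toy_prison {
	int uref;
	struct toy_prison *parent;
};

/*
 * Modelled on the PD_DEUREF loop quoted in the thread: drop one user
 * reference and keep walking to the parent while a count reaches zero.
 */
static void
toy_deuref(struct toy_prison *pr)
{
	struct toy_prison *tpr;

	for (tpr = pr; tpr != NULL; tpr = tpr->parent) {
		if (--tpr->uref > 0)
			break;
	}
}

/*
 * "Resurrection" as discussed: a process attaches to a dying jail and
 * only that jail's own count is raised; the parent is left untouched.
 */
static void
toy_attach(struct toy_prison *pr)
{
	pr->uref++;
}

/* Replays the flow and returns the parent's final user refcount. */
static int
toy_double_decrement(void)
{
	struct toy_prison prison0 = { 3, NULL };	/* assumed starting count */
	struct toy_prison prison1 = { 1, &prison0 };

	toy_deuref(&prison1);	/* process1 exits:   prison1 1->0, prison0 3->2 */
	toy_attach(&prison1);	/* process2 attaches: prison1 0->1 */
	toy_deuref(&prison1);	/* process2 exits:   prison1 1->0, prison0 2->1 */

	return (prison0.uref);
}
```

In this model the parent ends at 1 although only one child reference was ever taken on it, so it is one lower than it should be. Either fix discussed in the thread (decrementing the parent only when the child goes away completely, or incrementing the parent again on resurrection) would restore the balance.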
Re: bad sector in gmirror HDD
Jeremy Chadwick wrote:
> On Sun, Aug 21, 2011 at 02:00:33AM -0700, per...@pluto.rain.com
> wrote:
> > Jeremy Chadwick wrote:
> > > ... using dd to find the bad LBAs is the only choice he has.
> > or sysutils/diskcheckd ...
> That software has a major problem where it runs constantly, rather
> than periodically.

Even in light of the discussion below, I would not think that a problem for the particular purpose under discussion, where it's presumably going to be terminated after completing a single pass. The "dd" approach is also going to soak the drive for the duration.

> I know because I'm the one who opened the PR on it:
> http://www.freebsd.org/cgi/query-pr.cgi?pr=ports/115853
> There's a discussion about this port/issue from a few days ago
> (how sweet!):
> http://lists.freebsd.org/pipermail/freebsd-ports/2011-August/069276.html
> With comments from you stating that the software is behaving as
> designed and that I misread the man page, but also stating point
> blank that "either way the software runs continuously" (which is
> what the PR was about in the first place):
> http://lists.freebsd.org/pipermail/freebsd-ports/2011-August/069321.html
> ...
> Back to my PR.
> I state that I set up diskcheckd.conf using the option you
> describe as "a length of time over which to spread each pass",
> yet what happened was that it did as much I/O as it could
> (read the entire disk in 45 minutes) then proceeded to do
> it again (no sleep()) ...

Agreed, that is not what is supposed to happen. What I see as a misreading of the manpage is reflected in your assertion, in the closing comment on 7/1/2008, that "the code does not do what the manpage says (or vice-versa)." Having looked at both the code and the manpage, I don't agree with that assessment. As I read it, the manpage sentence

    Naturally, it would be contradictory to specify both the frequency
    and the rate, so only one of these should be specified.

has to mean that the "days" (frequency) setting is simply an alternative way of specifying the rate. Is there some other interpretation that I'm missing?

Based on the code, it looks to me as if diskcheckd is supposed to read 64KB checking for errors, then sleep for a calculated length of time before reading the next 64KB, so as to average out to the (directly or indirectly) specified rate. Thus it is intended to run "continuously" in the sense that its I/O load is supposed to be as uniform as possible, consistent with reading 64KB at a time, rather than imposing a heavier load for some period of time and then pausing for the balance of the specified number of days. This is entirely consistent with my understanding of the manpage.

Given that 115853 was closed (which AFAIK is supposed to mean "no longer considered a problem"), and seemed to have involved a misunderstanding of how diskcheckd was intended to operate, I decided to investigate the open 143566 instead -- and 143566 explicitly stated that "diskcheckd runs fine when gmirror is not involved ..." So I've been running diskcheckd on a gmirrored system and it seems to be working.

As to what is actually going on:

Earlier this evening I started looking into the failure to call updateproctitle() as mentioned in 115853's closing comment, which I had also noticed in my own testing, and it seems that this _is_ related to the now-clarified problem of diskcheckd running flat-out instead of pausing between each 64KB read.

When the specified or calculated rate exceeds 64KB/sec, the required sleep interval between 64KB chunks is less than one second. Since diskcheckd calculates the interval in whole seconds -- because it calls sleep() rather than usleep() or nanosleep() -- an interval of less than one second is calculated as zero. That zero "interval" gets passed to sleep(), which dutifully returns immediately or nearly so, and the same zero is also used to "increment" the counter that is supposed to cause updateproctitle() to be called every 300 seconds.

I suspect the fix will be to calculate in microseconds, and call usleep() instead of sleep(). And yes, I am planning to fix it -- and clarify the manpage -- but not tonight.

> ... and besides, such a utility really shouldn't be a daemon
> anyway but a periodic(8)-called utility with appropriate locks put
> in place to ensure more than one instance can't be run at once.

I suppose that can be argued either way. It's not obvious to me that using, say, 7x as much bandwidth for one day and then taking 6 days off is somehow better than spreading the testing over an entire week. Furthermore, using periodic(8) could get _really_ messy if checking multiple drives using different frequencies -- unless one wanted to run a separate instance of the program for each drive (and then we would have to prevent multiple simultaneous instances for any one drive, while allowing simultaneous checking of multiple drives).
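The truncation described above is easy to see in isolation. The following is a minimal sketch, not diskcheckd's actual code: `interval_seconds` and `interval_usec` are hypothetical helper names, and the only facts taken from the message are the 64KB chunk size and the idea of computing the pause in microseconds instead of whole seconds.

```c
#include <assert.h>
#include <stdint.h>

#define CHUNK_BYTES	(64 * 1024)	/* 64KB read per step, per the message */

/*
 * Pause between chunks in whole seconds. Integer division truncates,
 * so any rate above 64KB/sec yields 0 and sleep(0) returns at once.
 */
static unsigned int
interval_seconds(uint64_t rate_bytes_per_sec)
{
	assert(rate_bytes_per_sec > 0);
	return ((unsigned int)(CHUNK_BYTES / rate_bytes_per_sec));
}

/*
 * Same pause in microseconds, as the message proposes. The result
 * stays non-zero for any rate below 64KB per microsecond.
 */
static uint64_t
interval_usec(uint64_t rate_bytes_per_sec)
{
	assert(rate_bytes_per_sec > 0);
	return ((uint64_t)CHUNK_BYTES * 1000000ULL / rate_bytes_per_sec);
}
```

At 1MB/sec, for example, the whole-second calculation gives 0 while the microsecond one gives 62500. One caveat when turning this into a fix: POSIX allows usleep() to reject arguments of one million or more, so a portable version would probably pass the value to nanosleep(), or split off the whole seconds first.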
Re: bad sector in gmirror HDD
On 20.08.2011 19:34, Dan Langille wrote:

> This is an older system. I suspect insufficient ventilation. I'll look at
> getting
> a new case fan, if not some HDD fans.

The answer is quite simple: get new drives. They have run for some 24000 hours, IOW at least 3 years (assuming 24x7), and at around 50 °C, so they're worn. After three years, at the slightest hitch, replace drives before Something Bad[tm] happens. You'll get faster replacements anyhow :)

On a related note, since this is about gmirror: Linux has a similar subsystem in place, the md (multiple devices) driver, with the user-space tool mdadm. The whole rig (kernel + user space) supports various RAID levels through modules, the gmirror equivalent being raid1 -- and that module somewhat recently acquired an interesting *feature:* it can automatically rewrite broken sectors. Meaning that when it sees a read error on one drive, it will read the block from the intact other drive and re-write it on the faulty drive so that it gets reallocated (assuming nobody turned the drive's AWRE feature off).

Perhaps that's a useful feature for gmirror, too.

> 2848980992 bytes transferred in 127.128503 secs (22410246 bytes/sec)

Eek, someone should fix dd to use proper units and not confuse seconds (s) with the secant function (sec).

Anyways, that's pretty low by today's standards. My I/O speeds even on lowly Samsung 5400/min drives are in excess of 100 MBytes/s, and that's talking about drives made in 2009.
Re: debugging frequent kernel panics on 8.2-RELEASE
On 08/20/11 19:19, Steven Hartland wrote:
- Original Message - From: "Andriy Gapon"
on 20/08/2011 23:24 Steven Hartland said the following:
- Original Message - From: "Steven Hartland"

Looking through the code I believe I may have noticed a scenario which could trigger the problem.

Given the following code:-

static void
prison_deref(struct prison *pr, int flags)
{
	struct prison *ppr, *tpr;
	int vfslocked;

	if (!(flags & PD_LOCKED))
		mtx_lock(&pr->pr_mtx);
	/* Decrement the user references in a separate loop. */
	if (flags & PD_DEUREF) {
		for (tpr = pr;; tpr = tpr->pr_parent) {
			if (tpr != pr)
				mtx_lock(&tpr->pr_mtx);
			if (--tpr->pr_uref > 0)
				break;
			KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
			mtx_unlock(&tpr->pr_mtx);
		}
		/* Done if there were only user references to remove. */
		if (!(flags & PD_DEREF)) {
			mtx_unlock(&tpr->pr_mtx);
			if (flags & PD_LIST_SLOCKED)
				sx_sunlock(&allprison_lock);
			else if (flags & PD_LIST_XLOCKED)
				sx_xunlock(&allprison_lock);
			return;
		}
		if (tpr != pr) {
			mtx_unlock(&tpr->pr_mtx);
			mtx_lock(&pr->pr_mtx);
		}
	}

Take the scenario of a simple one-level prison setup running a single process, where the prison has just been stopped. In the above code pr_uref of the process's prison is decremented. As this is the last process, pr_uref will hit 0 and the loop continues instead of breaking early.

Now at the end of the loop iteration the mtx is unlocked, so other processes can now manipulate the jail; this is where I think the problem may be.

If another process now comes in and attaches to the jail but then instantly exits, it may allow another kernel thread to hit this same bit of code, so two processes for the same prison get into the section which decrements prison0's pr_uref, instead of only one.

In essence I think we can get the following flow, where 1# = process1 and 2# = process2:

1#1. prison1.pr_uref = 1 (single process jail)
1#2. prison_deref( prison1,...
1#3. prison1.pr_uref-- (prison1.pr_uref = 0)
1#4. prison1.mtx_unlock <-- this now allows others to change prison1.pr_uref
1#5. prison0.pr_uref--
2#1. process2.attach( prison1 ) (prison1.pr_uref = 1)
2#2. process2.exit
2#3. prison_deref( prison1,...
2#4. prison1.pr_uref-- (prison1.pr_uref = 0)
2#5. prison1.mtx_unlock <-- this now allows others to change prison1.pr_uref
2#6. prison0.pr_uref-- (prison0.pr_uref has now been decremented twice by prison1)

It seems like the action on the parent prison to decrement the pr_uref is happening too early, while the jail can still be used and without the lock on the child jail's mtx, so causing a race condition.

I think the fix is to move the decrement of the parent prisons' pr_uref down so it only takes place if the jail is "really" being removed. Either that, or change the locking semantics so that once the lock is acquired in prison_deref it's not unlocked until the function completes.

What do people think?

After reviewing the changes to prison_deref in the commit which added hierarchical jails, the release of the lock on the passed-in prison by the initial loop may be unintentional.

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/kern_jail.c.diff?r1=1.101;r2=1.102;f=h

If so the following may be all that's needed to fix this issue:-

diff -u sys/kern/kern_jail.c.orig sys/kern/kern_jail.c
--- sys/kern/kern_jail.c.orig	2011-08-20 21:17:14.856618854 +0100
+++ sys/kern/kern_jail.c	2011-08-20 21:18:35.307201425 +0100
@@ -2455,7 +2455,8 @@
 			if (--tpr->pr_uref > 0)
 				break;
 			KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
-			mtx_unlock(&tpr->pr_mtx);
+			if (tpr != pr)
+				mtx_unlock(&tpr->pr_mtx);
 		}
 		/* Done if there were only user references to remove. */
 		if (!(flags & PD_DEREF)) {

Not sure if this would fly as is - please double check the later block where pr->pr_mtx is re-locked.

You're right, and it's actually more complex than that. Although changing it to not unlock in the middle of prison_deref fixes that race condition, it doesn't prevent pr_uref being incorrectly decremented each time the jail gets into the dying state, which is really the problem we are seeing.

If hierarchical prisons are used there seems to be an additional problem: the counters of all prisons in the hierarchy are decremented, but as far as I can tell only the immediate parent is ever incremented, so there is another reference problem there as well, I think.

The following patch I believe fixes both of these issues. I've tested with debug added and confirmed prison0's pr_uref is maintained correctly even when a jail hits dying state multiple times.

It essentially reverts the changes to the "if (flags & PD_DEUREF)" block made by 192895 and moves the decrement to after the jail has actually been removed.

diff -u sys/kern/kern_jail.c.orig sys/kern/kern_jail.c
--- sys/kern/kern_jail.c.orig	2011-08-20 21:17:14.856618854 +0100
+++ sys/kern/kern_jail.c	2011-08-21 01:56:58.429894825 +0100
@@ -2449,27 +2449,16 @@
 		mtx_lock(&pr->pr_mtx);
 	/* Decrement the user references in a separate loop. */
 	if (flags & PD_DEUREF) {
-		for