date:20110821

Re: bad sector in gmirror HDD

2011-08-21 Thread Peter Jeremy

On 2011-Aug-19 20:24:38 -0700, Jeremy Chadwick wrote:
>The reallocated LBA cannot be dealt with aside from re-creating the
>filesystem and telling it not to use the LBA.  I see no flags in
>newfs(8) that indicate a way to specify LBAs to avoid.  And we don't
>know what LBA it is so we can't refer to it right now anyway.
>
>As I said previously, I have no idea how UFS/FFS deals with this.

It doesn't.  UFS/FFS and ZFS expect and assume "perfect" media.  It's
up to the drive to transparently remap faulty sectors.  UFS used to
have support for visible bad sectors (and Solaris UFS still reserves
space for this, though I don't know if it still works) but the code
was removed from FreeBSD long ago.

AFAIR, wd(4) supported bad sectors but it was removed long ago.

-- 
Peter Jeremy


Re: Unknown Re0 Hardware version

2011-08-21 Thread YongHyeon PYUN

On Sun, Aug 21, 2011 at 04:01:10PM +0200, Willem Jan Withagen wrote:
> Hi,
> 
> I'm assembling a few systems with an ASUS P8 H161-MLE motherboard
> which was supposed to have a 'Realtek® 8112L, 1 x Gigabit LAN
> Controller(s)' onboard.
> 
> And to be honest, I never expected that version not to be supported.
> I just booted 8.2-RELEASE on it, and the installer crashed when I wanted
> it to configure the Ethernet.
> 
> I rebooted, and re0 kicks in, but reports that the HW revision is not
> supported. It claims HW revision 0x2c80.
> 
> Is this supported in a later 8.2-STABLE, or in 9.x?
> 
> I'm willing to tinker with the code to recompile the re0 driver.
> 

Your controller looks like an RTL8168E VL, and support for that
controller was added after 8.2-RELEASE.
Either update your source to stable/8 or patch your source tree
with the back-ported re(4) driver for 8.2-RELEASE, as follows.

1. Fetch http://people.freebsd.org/~yongari/re/8.2R/if_re.c and
   copy it to the /usr/src/sys/dev/re directory.
2. Fetch http://people.freebsd.org/~yongari/re/8.2R/if_rlreg.h and
   copy it to the /usr/src/sys/pci directory.

Then rebuild your kernel; your controller should be recognized on
the next boot.

> --WjW
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: Unknown Re0 Hardware version

2011-08-21 Thread Willem Jan Withagen


On 2011-08-22 1:01, YongHyeon PYUN wrote:
> On Sun, Aug 21, 2011 at 04:01:10PM +0200, Willem Jan Withagen wrote:
> > Hi,
> >
> > I'm assembling a few systems with an ASUS P8 H161-MLE motherboard
> > which was supposed to have a 'Realtek® 8112L, 1 x Gigabit LAN
> > Controller(s)' onboard.
> >
> > And to be honest, I never expected that version not to be supported.
> > I just booted 8.2-RELEASE on it, and the installer crashed when I wanted
> > it to configure the Ethernet.
> >
> > I rebooted, and re0 kicks in, but reports that the HW revision is not
> > supported. It claims HW revision 0x2c80.
> >
> > Is this supported in a later 8.2-STABLE, or in 9.x?
> >
> > I'm willing to tinker with the code to recompile the re0 driver.
>
> Your controller looks like an RTL8168E VL, and support for that
> controller was added after 8.2-RELEASE.
> Either update your source to stable/8 or patch your source tree
> with the back-ported re(4) driver for 8.2-RELEASE, as follows.
>
> 1. Fetch http://people.freebsd.org/~yongari/re/8.2R/if_re.c and
>    copy it to the /usr/src/sys/dev/re directory.
> 2. Fetch http://people.freebsd.org/~yongari/re/8.2R/if_rlreg.h and
>    copy it to the /usr/src/sys/pci directory.
>
> Then rebuild your kernel; your controller should be recognized on
> the next boot.


Hi YongHyeon PYUN,

OK, that means I will temporarily have to install another Ethernet card
to get the files onto the machine, or use sneakernet. :)

I did check the 9.x sources, but the revision number was not in
/usr/src/sys/pci/if_rlreg.h there.

And you are right, it is there in 8.2-STABLE.

Thanks for the files and pointers.

--WjW




Re: Serial multiport error Oxford/Startech PEX2S952

2011-08-21 Thread Greg Byshenk

On Sun, Aug 21, 2011 at 09:44:41PM +0100, David Wood wrote:
 
> I wrote and contributed the support code for the OXPCIe95x serial chips 
> - and just happened to notice your report.

Thanks for the response.


> In message <20110821154249.ge92...@core.byshenk.net>, Greg Byshenk writes
> >I'm having a problem with a StarTech PEX2S952 dual-port serial
> >card.
> >
> >I believe that it should be supported, as it has this entry in
> >pucdata.c
> >
> >[...]
> >   {   0x1415, 0xc158, 0xffff, 0,
> >   "Oxford Semiconductor OXPCIe952 UARTs",
> >   DEFAULT_RCLK * 0x22,
> >   PUC_PORT_NONSTANDARD, 0x10, 0, -1,
> >   .config_function = puc_config_oxford_pcie
> >   },
> >[...]
> 
> It should be supported. The OXPCIe952 is more awkward to support than 
> the OXPCIe954 and OXPCIe958 because it can be configured in so many 
> different ways by the board manufacturer. However, 0xc158 is a
> configuration that is identical in arrangement to the larger chips, so
> it is the configuration I'm most confident of. I've just double-checked the
> data sheets, and can't see any relevant differences between 0xc158 
> OXPCIe952 and the OXPCIe954 I tested the code with.
> 
> I use my OXPCIe954 board on FreeBSD 8.2, and have had success reports 
> from other OXPCIe954 and OXPCIe958 board users (including someone with a 
> 16 port board based on dual OXPCIe958s). I have yet to try FreeBSD 9.x 
> on my hardware.
> 
> 
> >And, while it is recognized at boot -- after adding
> >
> >  device  puc
> >  options COM_MULTIPORT
> 
> I'm 99% certain that "options COM_MULTIPORT" relates to the old sio(4) 
> code - I certainly don't need it on 8.x. Does it make any difference if 
> you delete that line and just leave "device puc"?

I will rebuild my kernel and try.
 
 
> >to my kernel, it doesn't seem to be working. The devices '/dev/cuau2'
> >and '/dev/cuau3' show up, and I can connect to them, but they don't
> >seem to pass any traffic. If I connect to the serial console of
> >another machine (one that I know for certain is working), I get
> >nothing at all.
> 
> Have you remembered to set the speed (and other relevant options) on the 
> .init devices? This is a feature (or is it a quirk) of the uart(4) 
> driver that catches many people out. Setting options on the base device 
> is normally a no-op.
> 
> For example, if the remote device on /dev/cuau2 operates at 115200 bps 
> with hardware handshaking, try:
> 
> stty -f /dev/cuau2.init speed 115200 crtscts

Interestingly, it -is- a no-op on the device, which I hadn't noticed.
But trying to set it on the .init fails:

# stty -f /dev/cuau2.init speed 115200 crtscts
stty: /dev/cuau2.init isn't a terminal
# 

 
> One frustrating aspect of adding puc(4) support for many devices is that 
> you can't be certain of the clock rate multiplier - the same device can 
> crop up on a different manufacturer's board with a different multiplier. 
> This problem doesn't occur with the OXPCIe95x devices as they derive 
> their 62.5MHz UART clock from the PCI Express clock. Consequently, the 
> problem can't be that your board is inadvertently operating the UARTs at
> the wrong speed.
> 
> 
> >I suspect (?) that it may not be recognized as the proper card. Boot
> >and pciconf messages are:
> >
> >puc0:  mem 
> >0xf9dfc000-0xf9dfffff,0xfa000000-0xfa1fffff,0xf9e00000-0xf9ffffff irq
> >30 at device 0.0 on pci4
> 
> That is correct. Are there any more lines afterwards - especially one 
> giving the number of UARTs detected? That line is crucial, as, on these 
> chips, the number of UARTs has to be read from configuration space 
> because you can slave two chips together.
> 
> My OXPCIe954 board is recognised thus (FreeBSD 8.2 amd64):
> 
> puc0:  mem 
> 0xd5efc000-0xd5efffff,0xd5c00000-0xd5dfffff,0xd5a00000-0xd5bfffff irq 18
> at device 0.0 on pci8
> puc0: 4 UARTs detected
> puc0: [FILTER]
> uart2: <16950 or compatible> on puc0
> uart2: [FILTER]
> uart3: <16950 or compatible> on puc0
> uart3: [FILTER]
> uart4: <16950 or compatible> on puc0
> uart4: [FILTER]
> uart5: <16950 or compatible> on puc0
> uart5: [FILTER]

puc0:  mem 
0xf9dfc000-0xf9dfffff,0xfa000000-0xfa1fffff,0xf9e00000-0xf9ffffff irq 30 at
device 0.0 on pci4
puc0: 2 UARTs detected
uart2: <16950 or compatible> at port 1 on puc0
uart3: <16950 or compatible> at port 2 on puc0

 
> >puc0@pci0:4:0:0: class=0x070002 card=0xc1581415 chip=0xc1581415
> >rev=0x00 hdr=0x00
> >   vendor = 'Oxford Semiconductor Ltd'
> >   class  = simple comms
> >   subclass   = UART
> >   bar   [10] = type Memory, range 32, base 0xf9dfc000, size 16384, enabled
> >   bar   [14] = type Memory, range 32, base 0xfa000000, size 2097152, enabled
> >   bar   [18] = type Memory, range 32, base 0xf9e00000, size 2097152, enabled
> 
> That is correct.
> 
> >The kernel is actually FreeBSD 9.0-BETA1 amd64, which is not quite
> >'STABLE' yet, but I don't think that this should matt

Re: Serial multiport error Oxford/Startech PEX2S952

2011-08-21 Thread David Wood


Hi Greg,

I wrote and contributed the support code for the OXPCIe95x serial chips 
- and just happened to notice your report.



In message <20110821154249.ge92...@core.byshenk.net>, Greg Byshenk writes
> I'm having a problem with a StarTech PEX2S952 dual-port serial
> card.
>
> I believe that it should be supported, as it has this entry in
> pucdata.c
>
> [...]
>    {   0x1415, 0xc158, 0xffff, 0,
>    "Oxford Semiconductor OXPCIe952 UARTs",
>    DEFAULT_RCLK * 0x22,
>    PUC_PORT_NONSTANDARD, 0x10, 0, -1,
>    .config_function = puc_config_oxford_pcie
>    },
> [...]


It should be supported. The OXPCIe952 is more awkward to support than 
the OXPCIe954 and OXPCIe958 because it can be configured in so many 
different ways by the board manufacturer. However, 0xc158 is a
configuration that is identical in arrangement to the larger chips, so
it is the configuration I'm most confident of. I've just double-checked the
data sheets, and can't see any relevant differences between 0xc158 
OXPCIe952 and the OXPCIe954 I tested the code with.


I use my OXPCIe954 board on FreeBSD 8.2, and have had success reports 
from other OXPCIe954 and OXPCIe958 board users (including someone with a 
16 port board based on dual OXPCIe958s). I have yet to try FreeBSD 9.x 
on my hardware.




> And, while it is recognized at boot -- after adding
>
>   device  puc
>   options COM_MULTIPORT


I'm 99% certain that "options COM_MULTIPORT" relates to the old sio(4) 
code - I certainly don't need it on 8.x. Does it make any difference if 
you delete that line and just leave "device puc"?




> to my kernel, it doesn't seem to be working. The devices '/dev/cuau2'
> and '/dev/cuau3' show up, and I can connect to them, but they don't
> seem to pass any traffic. If I connect to the serial console of
> another machine (one that I know for certain is working), I get
> nothing at all.


Have you remembered to set the speed (and other relevant options) on the 
.init devices? This is a feature (or is it a quirk) of the uart(4) 
driver that catches many people out. Setting options on the base device 
is normally a no-op.


For example, if the remote device on /dev/cuau2 operates at 115200 bps 
with hardware handshaking, try:


stty -f /dev/cuau2.init speed 115200 crtscts


One frustrating aspect of adding puc(4) support for many devices is that 
you can't be certain of the clock rate multiplier - the same device can 
crop up on a different manufacturer's board with a different multiplier. 
This problem doesn't occur with the OXPCIe95x devices as they derive 
their 62.5MHz UART clock from the PCI Express clock. Consequently, the 
problem can't be that your board is inadvertently operating the UARTs at
the wrong speed.
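
To put rough numbers on the clock discussion, here is a small sketch in Python. It assumes that DEFAULT_RCLK is 1843200 Hz (the classic 1.8432 MHz UART reference; that value is not quoted anywhere in this thread) and that the UART uses the standard 16x oversampling, so treat the figures as illustrative rather than as what the driver actually computes. The function names are made up for the example.

```python
# Assumption: DEFAULT_RCLK is the classic 1.8432 MHz UART reference clock.
DEFAULT_RCLK = 1_843_200
# From the pucdata.c entry quoted above: DEFAULT_RCLK * 0x22.
TABLE_RCLK = DEFAULT_RCLK * 0x22
# Clock the chip derives from the PCI Express clock, per the text.
CHIP_RCLK = 62_500_000
# Assumption: standard 16x oversampling.
OVERSAMPLE = 16


def divisor(rclk: int, baud: int) -> int:
    """Nearest integer divisor for the requested baud rate."""
    return round(rclk / (OVERSAMPLE * baud))


def actual_baud(rclk: int, div: int) -> float:
    """Baud rate actually produced by a given divisor."""
    return rclk / (OVERSAMPLE * div)


if __name__ == "__main__":
    div = divisor(TABLE_RCLK, 115200)
    real = actual_baud(CHIP_RCLK, div)
    error_pct = (real - 115200) / 115200 * 100
    print(f"table clock: {TABLE_RCLK} Hz, divisor for 115200: {div}")
    print(f"baud on a 62.5 MHz clock: {real:.0f} ({error_pct:+.2f}%)")
```

Under those assumptions the table clock and the real 62.5 MHz clock differ by roughly a quarter of a percent, which is far less than the few percent that asynchronous serial framing is generally said to tolerate. That fits the point that the clock rate is unlikely to be the culprit here.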




> I suspect (?) that it may not be recognized as the proper card. Boot
> and pciconf messages are:
>
> puc0:  mem
> 0xf9dfc000-0xf9dfffff,0xfa000000-0xfa1fffff,0xf9e00000-0xf9ffffff irq
> 30 at device 0.0 on pci4


That is correct. Are there any more lines afterwards - especially one 
giving the number of UARTs detected? That line is crucial, as, on these 
chips, the number of UARTs has to be read from configuration space 
because you can slave two chips together.



My OXPCIe954 board is recognised thus (FreeBSD 8.2 amd64):

puc0:  mem 
0xd5efc000-0xd5efffff,0xd5c00000-0xd5dfffff,0xd5a00000-0xd5bfffff irq 18
at device 0.0 on pci8

puc0: 4 UARTs detected
puc0: [FILTER]
uart2: <16950 or compatible> on puc0
uart2: [FILTER]
uart3: <16950 or compatible> on puc0
uart3: [FILTER]
uart4: <16950 or compatible> on puc0
uart4: [FILTER]
uart5: <16950 or compatible> on puc0
uart5: [FILTER]


> puc0@pci0:4:0:0: class=0x070002 card=0xc1581415 chip=0xc1581415
> rev=0x00 hdr=0x00
>    vendor = 'Oxford Semiconductor Ltd'
>    class  = simple comms
>    subclass   = UART
>    bar   [10] = type Memory, range 32, base 0xf9dfc000, size 16384, enabled
>    bar   [14] = type Memory, range 32, base 0xfa000000, size 2097152, enabled
>    bar   [18] = type Memory, range 32, base 0xf9e00000, size 2097152, enabled


That is correct.



> The kernel is actually FreeBSD 9.0-BETA1 amd64, which is not quite
> 'STABLE' yet, but I don't think that this should matter.
>
> Any advice would be much appreciated. The machine is still in
> test phase, so I can mess around with it as necessary.


Hopefully this gets your Startech board working. I look forward to your 
feedback.



If all else fails, the board I'm using is a Lindy 51189. It's an OXPCIe954
board, offering four ports via a breakout cable, and is normally pretty 
cheap direct from lindy.com (quite possibly cheaper than your two port 
Startech board!). However, this recommendation comes with the proviso 
that I haven't yet tried it with FreeBSD 9.x.




With best wishes,




David
--
David Wood
da...@wood2.org.uk

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-21 Thread Roger Marquis


On Sat, 20 Aug 2011, Steven Hartland wrote:

> Are you seeing a double fault panic?


We're seeing both.  At least one double (or more) fault finishing with
"Fatal Trap 12: page fault while in kernel mode".  Subsequent panics have
been single fault (all visible on the IPMI console) "Fatal Trap 9:
general protection fault while in kernel mode".

Could well be unrelated.  The system is undergoing hardware diags now.

Roger Marquis

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-21 Thread Jamie Gritton


On 08/21/11 05:01, Steven Hartland wrote:

> ----- Original Message ----- From: "Jamie Gritton"
>
> > The problem isn't with the conditional locking of tpr in prison_deref.
> > That locking is actually correct, and there's no race condition.
>
> Are you sure? I do think that unlocking the mtx halfway through the
> call allows the above scenario to create a race condition, albeit
> very briefly, when ignoring the overriding issue.
>
> In addition, if the code were changed so that the pr_uref++ also
> maintained the parent's uref, this would definitely lead to potential
> problems in my mind, especially if you had more than one child prison
> of a given parent entering the dying state at any one time.
>
> In this case I believe you would have to acquire the locks of all
> the parent prisons before it would be safe to proceed.


Lock order requires that I unlock the child if I want to lock the
parent. While that does allow periods where neither is locked, it's safe
in this case. There may be multiple processes dying in one jail, or in
multiple children of a single jail. But as long as a parent jail is
locked while decrementing pr_uref, then only one of these simultaneous
prison_deref calls would set pr_uref to zero and continue in the loop to
that prison's parent. This might be mixed with pr_uref being incremented
elsewhere, but that's not a problem either as long as the jail in
question is locked.


> > The trouble lies in the resurrection of dead jails, as Andriy has noted
> > (though not just attaching; setting its persist flag also causes
> > the same problem).
>
> I'm not sure that persistent prisons actually suffer from this in any
> different way tbh, as they have an additional uref increment so would
> never hit this case unless they have been actively removed and hence
> unpersisted first.


Right - both the attach and persist cases are only a problem when a jail
has disappeared. There are various ways for a jail to be removed,
potentially to be kept around but in the dying state, but only two
related ways for it to be resurrected: attaching a new process or
setting the persist flag, both via jail_set with the JAIL_DYING flag passed.


> > There are two possible fixes to this. One is the patch you've given,
> > which only decrements a parent jail's pr_uref when the child jail
> > completely goes away (as opposed to when it loses its last uref). This
> > provides symmetry with the current way pr_uref is incremented on the
> > parent, which is only when a jail is created.
> >
> > The other fix is to increment a parent's pr_uref when a jail is
> > resurrected, which will match the current logic in prison_deref. I like
> > the external semantics of this solution: a jail isn't visible if it is
> > not persistent and has no processes and no *visible* sub-jails, as
> > opposed to having no sub-jails at all. But this solution ends up pretty
> > complicated - there are a few places where pr_uref is incremented, where
> > I might need to increment parent jails' pr_uref as well, much like the
> > current tpr loop in prison_deref decrements them.
>
> Ahh yes, in the hierarchical case my patch would indeed mean that
> non-persistent parent jails would remain visible even when their last
> child jail is in a dying state.
>
> As you say, making this not the case would likely require replacing all
> instances of pr_uref++ with a prison_uref method that implements the
> opposite of the loop in prison_deref should the prison's pr_uref be 0 when
> called.


Yes, that's the problem. Maybe not all instances, but at least most have
enough times a jail is unlocked that we can't assume the pr_uref hasn't
been set to zero somewhere else, and so we need to do that loop.
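
The loop discussed here could look roughly like the following model. This is a sketch in Python, not kernel code: `Prison`, `uref_hold` and `uref_drop` are made-up names, locking is ignored entirely, and the working assumption is that a jail holds one uref on its parent for as long as its own pr_uref is non-zero.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prison:
    """Toy stand-in for a jail: only the fields this sketch needs."""
    name: str
    parent: Optional["Prison"] = None
    uref: int = 0


def uref_hold(pr: Optional[Prison]) -> None:
    """Take a user reference; a 0 -> 1 transition also re-references the parent chain."""
    while pr is not None:
        pr.uref += 1
        if pr.uref != 1:
            # The jail was already alive, so its ancestors already count it.
            break
        pr = pr.parent


def uref_drop(pr: Optional[Prison]) -> None:
    """Drop a user reference; a 1 -> 0 transition also releases the parent chain."""
    while pr is not None:
        pr.uref -= 1
        if pr.uref != 0:
            break
        pr = pr.parent


# prison0 starts with one reference of its own (an assumption of this model).
prison0 = Prison("prison0", uref=1)
prison1 = Prison("prison1", parent=prison0)

uref_hold(prison1)   # first process appears in prison1
uref_drop(prison1)   # it exits; prison1 is now dying
uref_hold(prison1)   # prison1 is resurrected
uref_drop(prison1)   # and dies again
print(prison0.uref, prison1.uref)
```

In this model the hold and drop paths mirror each other, so repeated death and resurrection of a child leaves the parent's count where it started. The real difficulty described above, taking the locks in a safe order while walking up the chain, is exactly the part this sketch leaves out.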


> > Your solution removes code instead of adding it, which is generally a
> > good thing. While it does change the semantics of pr_uref in the
> > hierarchical case at least from what I thought it was, those semantics
> > haven't been working properly anyway.
>
> Good to know my interpretation was correct, even if I was missing the
> visibility factor in the hierarchical case :)
>
> > Bjoern, I'm adding you to the CC list for this because the whole pr_uref
> > thing was your idea (though it was pr_nprocs at the time), so you might
> > care about the hierarchical semantics of it - or you may not. Also, this
> > is a panic-inducing bug in current and may interest you for that reason.
>
> From an admin perspective the current jail dying state does cause
> confusion when you're not aware of its existence. You ask a jail to stop;
> it appears to have completed that request but really hasn't, generally
> due to just a lingering TCP connection.
>
> With the introduction of hierarchical jails that gets a little worse,
> as a whole series of jails could disappear from normal view only to
> be resurrected shortly after. Something to bear in mind when deciding
> which of the two presented solutions to use.


The good news is that a jail (or perhaps a whole set of
jails) can only come back from the dead when the administrator makes a
concerted effort to do so. So it at least shouldn't surprise the
administrator w

Serial multiport error Oxford/Startech PEX2S952

2011-08-21 Thread Greg Byshenk

Not sure if -stable is the right place for this, but I'll give it
a shot; if it's not, then a pointer in the right direction would
be much appreciated.

I'm having a problem with a StarTech PEX2S952 dual-port serial
card.

I believe that it should be supported, as it has this entry in
pucdata.c

[...]
{   0x1415, 0xc158, 0xffff, 0,
"Oxford Semiconductor OXPCIe952 UARTs",
DEFAULT_RCLK * 0x22,
PUC_PORT_NONSTANDARD, 0x10, 0, -1,
.config_function = puc_config_oxford_pcie
},
[...]

And, while it is recognized at boot -- after adding

device  puc
options COM_MULTIPORT

to my kernel, it doesn't seem to be working. The devices '/dev/cuau2'
and '/dev/cuau3' show up, and I can connect to them, but they don't
seem to pass any traffic. If I connect to the serial console of
another machine (one that I know for certain is working), I get 
nothing at all.

I suspect (?) that it may not be recognized as the proper card. Boot
and pciconf messages are:

puc0:  mem 
0xf9dfc000-0xf9dfffff,0xfa000000-0xfa1fffff,0xf9e00000-0xf9ffffff irq 30 at
device 0.0 on pci4

puc0@pci0:4:0:0: class=0x070002 card=0xc1581415 chip=0xc1581415 rev=0x00 hdr=0x00
    vendor = 'Oxford Semiconductor Ltd'
    class  = simple comms
    subclass   = UART
    bar   [10] = type Memory, range 32, base 0xf9dfc000, size 16384, enabled
    bar   [14] = type Memory, range 32, base 0xfa000000, size 2097152, enabled
    bar   [18] = type Memory, range 32, base 0xf9e00000, size 2097152, enabled

The kernel is actually FreeBSD 9.0-BETA1 amd64, which is not quite
'STABLE' yet, but I don't think that this should matter.

Any advice would be much appreciated. The machine is still in
test phase, so I can mess around with it as necessary.

Thanks.

-- 
greg byshenk  -  free...@byshenk.net  -  Leiden, NL

Unknown Re0 Hardware version

2011-08-21 Thread Willem Jan Withagen


Hi,

I'm assembling a few systems with an ASUS P8 H161-MLE motherboard
which was supposed to have a 'Realtek® 8112L, 1 x Gigabit LAN
Controller(s)' onboard.

And to be honest, I never expected that version not to be supported.
I just booted 8.2-RELEASE on it, and the installer crashed when I wanted
it to configure the Ethernet.

I rebooted, and re0 kicks in, but reports that the HW revision is not
supported. It claims HW revision 0x2c80.

Is this supported in a later 8.2-STABLE, or in 9.x?

I'm willing to tinker with the code to recompile the re0 driver.

--WjW




Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-21 Thread Steven Hartland

- Original Message - 
From: "Jamie Gritton" 

In essence I think we can get the following flow where 1# = process1
and 2# = process2
1#1. prison1.pr_uref = 1 (single process jail)
1#2. prison_deref( prison1,...
1#3. prison1.pr_uref-- (prison1.pr_uref = 0)
1#4. prison1.mtx_unlock <-- this now allows others to change
prison1.pr_uref
1#5. prison0.pr_uref--
2#1. process1.attach( prison1 ) (prison1.pr_uref = 1)
2#2. process1.exit
2#3. prison_deref( prison1,...
2#4. prison1.pr_uref-- (prison1.pr_uref = 0)
2#5. prison1.mtx_unlock <-- this now allows others to change
prison1.pr_uref
2#6. prison0.pr_uref-- (prison0.pr_uref has now been decremented
twice by prison1)
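
The flow above can be replayed with a small model. This is a sketch in Python, not the kernel code: the class and function names are invented for illustration, locking is left out, and it assumes prison0 starts with two urefs (one of its own and one held on behalf of prison1).

```python
class Prison:
    """Toy stand-in for a jail, tracking only the user reference count."""

    def __init__(self, name, parent=None, uref=0):
        self.name = name
        self.parent = parent
        self.uref = uref


def prison_deref_model(pr):
    """Drop one uref; when a count reaches zero, continue to the parent."""
    while pr is not None:
        pr.uref -= 1
        if pr.uref > 0:
            break
        pr = pr.parent


def attach_model(pr):
    """Attach to a jail: bumps only that jail's uref, not the parent's."""
    pr.uref += 1


# Assumption: prison0 holds its own reference plus one for prison1.
prison0 = Prison("prison0", uref=2)
prison1 = Prison("prison1", parent=prison0, uref=1)

prison_deref_model(prison1)   # last process exits: prison1 drops to 0, prison0 to 1
attach_model(prison1)         # the dying jail is resurrected: prison1 back to 1
prison_deref_model(prison1)   # it exits again: prison0 is decremented a second time
print(prison0.uref, prison1.uref)
```

In this model prison0 ends up with no references left even though it only ever had one child, which is the double decrement described in the last step of the flow.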


First off, thanks for the feedback Jamie, most appreciated :)

> The problem isn't with the conditional locking of tpr in prison_deref.
> That locking is actually correct, and there's no race condition.

Are you sure? I do think that unlocking the mtx halfway through the
call allows the above scenario to create a race condition, albeit
very briefly, when ignoring the overriding issue.

In addition, if the code were changed so that the pr_uref++ also
maintained the parent's uref, this would definitely lead to potential
problems in my mind, especially if you had more than one child prison
of a given parent entering the dying state at any one time.

In this case I believe you would have to acquire the locks of all
the parent prisons before it would be safe to proceed.


> The trouble lies in the resurrection of dead jails, as Andriy has noted
> (though not just attaching; setting its persist flag also causes
> the same problem).

I'm not sure that persistent prisons actually suffer from this in any
different way tbh, as they have an additional uref increment so would
never hit this case unless they have been actively removed and hence
unpersisted first.



> There are two possible fixes to this. One is the patch you've given,
> which only decrements a parent jail's pr_uref when the child jail
> completely goes away (as opposed to when it loses its last uref). This
> provides symmetry with the current way pr_uref is incremented on the
> parent, which is only when a jail is created.
>
> The other fix is to increment a parent's pr_uref when a jail is
> resurrected, which will match the current logic in prison_deref. I like
> the external semantics of this solution: a jail isn't visible if it is
> not persistent and has no processes and no *visible* sub-jails, as
> opposed to having no sub-jails at all. But this solution ends up pretty
> complicated - there are a few places where pr_uref is incremented, where
> I might need to increment parent jails' pr_uref as well, much like the
> current tpr loop in prison_deref decrements them.

Ahh yes, in the hierarchical case my patch would indeed mean that
non-persistent parent jails would remain visible even when their last
child jail is in a dying state.

As you say, making this not the case would likely require replacing all
instances of pr_uref++ with a prison_uref method that implements the
opposite of the loop in prison_deref should the prison's pr_uref be 0 when
called.


> Your solution removes code instead of adding it, which is generally a
> good thing. While it does change the semantics of pr_uref in the
> hierarchical case at least from what I thought it was, those semantics
> haven't been working properly anyway.

Good to know my interpretation was correct, even if I was missing the
visibility factor in the hierarchical case :)

> Bjoern, I'm adding you to the CC list for this because the whole pr_uref
> thing was your idea (though it was pr_nprocs at the time), so you might
> care about the hierarchical semantics of it - or you may not. Also, this
> is a panic-inducing bug in current and may interest you for that reason.

From an admin perspective the current jail dying state does cause
confusion when you're not aware of its existence. You ask a jail to stop;
it appears to have completed that request but really hasn't, generally
due to just a lingering TCP connection.

With the introduction of hierarchical jails that gets a little worse,
as a whole series of jails could disappear from normal view only to
be resurrected shortly after. Something to bear in mind when deciding
which of the two presented solutions to use.

   Regards
   Steve




Re: bad sector in gmirror HDD

2011-08-21 Thread perryh

Jeremy Chadwick wrote:
> On Sun, Aug 21, 2011 at 02:00:33AM -0700, per...@pluto.rain.com wrote:
> > Jeremy Chadwick wrote:
> > > ... using dd to find the bad LBAs is the only choice he has.
> > or sysutils/diskcheckd ...
> That software has a major problem where it runs constantly, rather
> than periodically.

Even in light of the discussion below, I would not think that a
problem for the particular purpose under discussion, where it's
presumably going to be terminated after completing a single pass.
The "dd" approach is also going to soak the drive for the duration.

> I know because I'm the one who opened the PR on it:
> http://www.freebsd.org/cgi/query-pr.cgi?pr=ports/115853
> There's a discussion about this port/issue from a few days ago
> (how sweet!):
> http://lists.freebsd.org/pipermail/freebsd-ports/2011-August/069276.html
> With comments from you stating that the software is behaving as
> designed and that I misread the man page, but also stating point
> blank that "either way the software runs continuously" (which is
> what the PR was about in the first place):
> http://lists.freebsd.org/pipermail/freebsd-ports/2011-August/069321.html
> ...
> Back to my PR.
> I state that I set up diskcheckd.conf using the option you
> describe as "a length of time over which to spread each pass",
> yet what happened was that it did as much I/O as it could
> (read the entire disk in 45 minutes) then proceeded to do
> it again (no sleep()) ...

Agreed, that is not what is supposed to happen.

What I see as a misreading of the manpage is reflected in your
assertion, in the closing comment on 7/1/2008, that "the code does
not do what the manpage says (or vice-versa)."  Having looked at
both the code and the manpage, I don't agree with that assessment.

As I read it, the manpage sentence

Naturally, it would be contradictory to specify both the
frequency and the rate, so only one of these should be
specified.

has to mean that the "days" (frequency) setting is simply an
alternative way of specifying the rate.  Is there some other
interpretation that I'm missing?

Based on the code, it looks to me as if diskcheckd is supposed to
read 64KB checking for errors, then sleep for a calculated length
of time before reading the next 64KB, so as to average out to the
(directly or indirectly) specified rate.  Thus it is intended to
run "continuously" in the sense that its I/O load is supposed to
be as uniform as possible, consistent with reading 64KB at a time,
rather than imposing a heavier load for some period of time and
then pausing for the balance of the specified number of days.
This is entirely consistent with my understanding of the manpage.

Given that 115853 was closed (which AFAIK is supposed to mean
"no longer considered a problem"), and seemed to have involved
a misunderstanding of how diskcheckd was intended to operate,
I decided to investigate the open 143566 instead -- and 143566
explicitly stated that "diskcheckd runs fine when gmirror is not
involved ..."  So I've been running diskcheckd on a gmirrored
system and it seems to be working.

As to what is actually going on:  Earlier this evening I started
looking into the failure to call updateproctitle() as mentioned
in 115853's closing comment, which I had also noticed in my own
testing, and it seems that this _is_ related to the now-clarified
problem of diskcheckd running flat-out instead of pausing between
each 64KB read.  When the specified or calculated rate exceeds
64KB/sec, the required sleep interval between 64KB chunks is less
than one second.  Since diskcheckd calculates the interval in
whole seconds -- because it calls sleep() rather than usleep() or
nanosleep() -- an interval of less than one second is calculated as
zero.  That zero "interval" gets passed to sleep(), which dutifully
returns immediately or nearly so, and the same zero is also used to
"increment" the counter that is supposed to cause updateproctitle()
to be called every 300 seconds.

I suspect the fix will be to calculate in microseconds, and call
usleep() instead of sleep().  And yes, I am planning to fix it --
and clarify the manpage -- but not tonight.
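
The truncation described above is easy to see in a small model. This is a sketch in Python, not the diskcheckd source: the function names are invented for illustration, and the only figure taken from the discussion is the 64KB read size.

```python
# diskcheckd reads 64KB at a time, per the discussion above.
CHUNK_BYTES = 64 * 1024


def sleep_whole_seconds(rate_bytes_per_sec: int) -> int:
    """Pause between chunks when computed in whole seconds, as sleep() needs."""
    return CHUNK_BYTES // rate_bytes_per_sec


def sleep_microseconds(rate_bytes_per_sec: int) -> int:
    """Pause between chunks in microseconds, as a usleep()-style call would take."""
    return CHUNK_BYTES * 1_000_000 // rate_bytes_per_sec


for rate_kb in (16, 64, 128, 1024):
    rate = rate_kb * 1024
    print(rate_kb, "KB/s:",
          sleep_whole_seconds(rate), "s vs",
          sleep_microseconds(rate), "us")
```

Any rate above 64KB/sec gives a whole-second interval of zero, so the model "sleeps" for no time at all, while the microsecond calculation still yields a usable pause. Intervals of a second or more would need some care in a real fix, since POSIX allows usleep() to reject values of 1,000,000 or greater; nanosleep() does not have that limit.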

> ... and besides, such a utility really shouldn't be a daemon
> anyway but a periodic(8)-called utility with appropriate locks put
> in place to ensure more than one instance can't be run at once.

I suppose that can be argued either way.  It's not obvious to me
that using, say, 7x as much bandwidth for one day and then taking
6 days off is somehow better than spreading the testing over an
entire week.  Furthermore, using periodic(8) could get _really_
messy if checking multiple drives using different frequencies --
unless one wanted to run a separate instance of the program for
each drive (and then we would have to prevent multiple simultaneous
instances for any one drive, while allowing simultaneous checking
of multiple drives).

Re: bad sector in gmirror HDD

2011-08-21 Thread Matthias Andree

On 20.08.2011 19:34, Dan Langille wrote:
> This is an older system.  I suspect insufficient ventilation.  I'll look at
> getting a new case fan, if not some HDD fans.

The answer is quite simple: get new drives.

They have run for some 24000 hours, IOW close to 3 years (assuming
24x7), and at around 50 °C, they're worn.  After three years, at the
slightest hitch, replace drives, before Something Bad[tm] happens.
You'll get faster replacements anyhow :)


On a related note, since this is about gmirror:

Linux has a similar subsystem in place called md (multiple devices),
with the user-space tool mdadm.  The whole rig (kernel + user space)
supports various RAID levels through modules, the gmirror equivalent
being raid1 -- and that module somewhat recently acquired an interesting
*feature:* it can automatically rewrite broken sectors.  Meaning that
when it sees a read error on one drive, it will read the block from the
intact other drive and re-write it on the faulty drive so that it gets
reallocated (assuming nobody turned the drive's AWRE feature off).
Perhaps that's a useful feature for gmirror, too.
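
A rough sketch of that read-repair idea, using two in-memory "disks" rather than real devices.  This is neither md's nor gmirror's code; every name here (struct disk, disk_read, disk_write, mirror_read) is hypothetical, and a successful write is simply assumed to make the block readable again, standing in for the drive reallocating the sector.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NBLOCKS    8
#define BLOCK_SIZE 16

/* Hypothetical in-memory stand-in for one mirror member. */
struct disk {
    unsigned char data[NBLOCKS][BLOCK_SIZE];
    bool bad[NBLOCKS];  /* true: reads of this block fail until rewritten */
};

/* Returns 0 on success, -1 on a read error. */
int
disk_read(const struct disk *d, int blk, unsigned char *buf)
{
    assert(d != NULL && buf != NULL);
    if (blk < 0 || blk >= NBLOCKS || d->bad[blk])
        return -1;
    memcpy(buf, d->data[blk], BLOCK_SIZE);
    return 0;
}

/* A successful write clears the bad flag (assumed sector remap). */
int
disk_write(struct disk *d, int blk, const unsigned char *buf)
{
    assert(d != NULL && buf != NULL);
    if (blk < 0 || blk >= NBLOCKS)
        return -1;
    memcpy(d->data[blk], buf, BLOCK_SIZE);
    d->bad[blk] = false;
    return 0;
}

/*
 * Read from the preferred member.  On a read error, fetch the block
 * from the other member and write it back to the failing one.
 */
int
mirror_read(struct disk *preferred, struct disk *other, int blk,
    unsigned char *buf)
{
    if (disk_read(preferred, blk, buf) == 0)
        return 0;
    if (disk_read(other, blk, buf) != 0)
        return -1;                          /* both copies unreadable */
    (void)disk_write(preferred, blk, buf);  /* repair attempt */
    return 0;
}
```

The caller still gets its data even if the repair write fails, since the good copy is already in buf; a real implementation would presumably also log the event and count it against the failing drive.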

> 2848980992 bytes transferred in 127.128503 secs (22410246 bytes/sec)

Eek, someone should fix dd to use proper units and not confuse seconds
(s) with the secant function (sec).

Anyways, that's pretty low by today's standards.  My I/O speeds even on
lowly Samsung 5400/min drives are in excess of 100 MBytes/s, and that's
talking about drives made in 2009.

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-21 Thread Jamie Gritton


On 08/20/11 19:19, Steven Hartland wrote:

- Original Message - From: "Andriy Gapon" 


on 20/08/2011 23:24 Steven Hartland said the following:

- Original Message - From: "Steven Hartland"

Looking through the code I believe I may have noticed a scenario which
could trigger the problem.

Given the following code:-

static void
prison_deref(struct prison *pr, int flags)
{
	struct prison *ppr, *tpr;
	int vfslocked;

	if (!(flags & PD_LOCKED))
		mtx_lock(&pr->pr_mtx);
	/* Decrement the user references in a separate loop. */
	if (flags & PD_DEUREF) {
		for (tpr = pr;; tpr = tpr->pr_parent) {
			if (tpr != pr)
				mtx_lock(&tpr->pr_mtx);
			if (--tpr->pr_uref > 0)
				break;
			KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
			mtx_unlock(&tpr->pr_mtx);
		}
		/* Done if there were only user references to remove. */
		if (!(flags & PD_DEREF)) {
			mtx_unlock(&tpr->pr_mtx);
			if (flags & PD_LIST_SLOCKED)
				sx_sunlock(&allprison_lock);
			else if (flags & PD_LIST_XLOCKED)
				sx_xunlock(&allprison_lock);
			return;
		}
		if (tpr != pr) {
			mtx_unlock(&tpr->pr_mtx);
			mtx_lock(&pr->pr_mtx);
		}
	}

If you take a scenario of a simple one-level prison setup running a
single process, where a prison has just been stopped:

In the above code pr_uref of the process's prison is decremented. As
this is the last process, pr_uref will hit 0 and the loop continues
instead of breaking early.

Now at the end of the loop iteration the mtx is unlocked, so other
processes can now manipulate the jail; this is where I think the
problem may be.

If we now have another process come in and attach to the jail but then
instantly exit, this process may allow another kernel thread to hit
this same bit of code, and so two processes for the same prison get
into the section which decrements prison0's pr_uref, instead of only
one.

In essence I think we can get the following flow, where 1# = process1
and 2# = process2:
1#1. prison1.pr_uref = 1 (single process jail)
1#2. prison_deref( prison1,...
1#3. prison1.pr_uref-- (prison1.pr_uref = 0)
1#4. prison1.mtx_unlock <-- this now allows others to change
     prison1.pr_uref
1#5. prison0.pr_uref--
2#1. process2.attach( prison1 ) (prison1.pr_uref = 1)
2#2. process2.exit
2#3. prison_deref( prison1,...
2#4. prison1.pr_uref-- (prison1.pr_uref = 0)
2#5. prison1.mtx_unlock <-- this now allows others to change
     prison1.pr_uref
2#6. prison0.pr_uref-- (prison0.pr_uref has now been decremented
     twice by prison1)

It seems like the action on the parent prison to decrement the pr_uref
is happening too early, while the jail can still be used and without
the lock on the child jail's mtx, so causing a race condition.

I think the fix is to move the decrement of the parent prison's pr_uref
down so it only takes place if the jail is "really" being removed.
Either that or to change the locking semantics so that once the lock is
acquired in this prison_deref it's not unlocked until the function
completes.

What do people think?


After reviewing the changes to prison_deref in the commit which added
hierarchical jails, the removal of the lock by the initial loop on the
passed in prison may be unintentional.
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/kern_jail.c.diff?r1=1.101;r2=1.102;f=h



If so the following may be all that's needed to fix this issue:-

diff -u sys/kern/kern_jail.c.orig sys/kern/kern_jail.c
--- sys/kern/kern_jail.c.orig 2011-08-20 21:17:14.856618854 +0100
+++ sys/kern/kern_jail.c 2011-08-20 21:18:35.307201425 +0100
@@ -2455,7 +2455,8 @@
 			if (--tpr->pr_uref > 0)
 				break;
 			KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
-			mtx_unlock(&tpr->pr_mtx);
+			if (tpr != pr)
+				mtx_unlock(&tpr->pr_mtx);
 		}
 		/* Done if there were only user references to remove. */
 		if (!(flags & PD_DEREF)) {


Not sure if this would fly as is - please double check the later block
where pr->pr_mtx is re-locked.


You're right, and it's actually more complex than that. Although
changing it to not unlock in the middle of prison_deref fixes that race
condition, it doesn't prevent pr_uref being incorrectly decremented each
time the jail gets into the dying state, which is really the problem we
are seeing.
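
To make the counting problem concrete, here is a deliberately simplified, single-threaded model of just the reference arithmetic.  It is not the kernel code: struct prison_model, deuref_walk and attach_model are hypothetical names, there is no locking at all, the loop stops at a NULL parent instead of asserting, and the assumption that an attach only raises the jail's own pr_uref is taken from the flow described earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: only the user reference count and parent link. */
struct prison_model {
    int pr_uref;
    struct prison_model *pr_parent;
};

/*
 * Models the PD_DEUREF loop: walk up the hierarchy, decrementing each
 * count, and stop at the first one that stays above zero.
 */
void
deuref_walk(struct prison_model *pr)
{
    struct prison_model *tpr;

    assert(pr != NULL);
    for (tpr = pr; tpr != NULL; tpr = tpr->pr_parent) {
        if (--tpr->pr_uref > 0)
            break;
    }
}

/*
 * Models a process attaching to the jail: only the jail's own count
 * goes up, the parent's count is left alone (assumption, see above).
 */
void
attach_model(struct prison_model *pr)
{
    assert(pr != NULL);
    pr->pr_uref++;
}
```

Starting with a parent count of 2 and a child count of 1, one deuref_walk() on the child leaves the parent at 1.  An attach_model() followed by a second deuref_walk() then takes the parent to 0, even though the child only ever accounted for one of the parent's references.  No concurrency is needed for that, which is why holding the lock alone would not be enough.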

If hierarchical prisons are used there seems to be an additional problem
where the counters of all prisons in the hierarchy are decremented, but as
far as I can tell only the immediate parent is ever incremented, so another
reference problem there as well I think.

The following patch I believe fixes both of these issues.

I've tested with debug added and confirmed prison0's pr_uref is maintained
correctly even when a jail hits dying state multiple times.

It essentially reverts the changes to the "if (flags & PD_DEUREF)" by
192895 and moves it to after the jail has been actually removed.

diff -u sys/kern/kern_jail.c.orig sys/kern/kern_jail.c
--- sys/kern/kern_jail.c.orig 2011-08-20 21:17:14.856618854 +0100
+++ sys/kern/kern_jail.c 2011-08-21 01:56:58.429894825 +0100
@@ -2449,27 +2449,16 @@
mtx_lock(&pr->pr_mtx);
/* Decrement the user references in a separate loop. */
if (flags & PD_DEUREF) {
- for
