Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-21 Thread Jamie Gritton

On 08/20/11 19:19, Steven Hartland wrote:

- Original Message - From: Andriy Gapon a...@freebsd.org


on 20/08/2011 23:24 Steven Hartland said the following:

- Original Message - From: Steven Hartland

Looking through the code, I believe I may have noticed a scenario
which could trigger the problem.

Given the following code:-

static void
prison_deref(struct prison *pr, int flags)
{
	struct prison *ppr, *tpr;
	int vfslocked;

	if (!(flags & PD_LOCKED))
		mtx_lock(&pr->pr_mtx);
	/* Decrement the user references in a separate loop. */
	if (flags & PD_DEUREF) {
		for (tpr = pr;; tpr = tpr->pr_parent) {
			if (tpr != pr)
				mtx_lock(&tpr->pr_mtx);
			if (--tpr->pr_uref > 0)
				break;
			KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
			mtx_unlock(&tpr->pr_mtx);
		}
		/* Done if there were only user references to remove. */
		if (!(flags & PD_DEREF)) {
			mtx_unlock(&tpr->pr_mtx);
			if (flags & PD_LIST_SLOCKED)
				sx_sunlock(&allprison_lock);
			else if (flags & PD_LIST_XLOCKED)
				sx_xunlock(&allprison_lock);
			return;
		}
		if (tpr != pr) {
			mtx_unlock(&tpr->pr_mtx);
			mtx_lock(&pr->pr_mtx);
		}
	}
	/* ... remainder of prison_deref() not quoted ... */
}

Take the scenario of a simple one-level prison setup running a single
process, where the prison has just been stopped.

In the above code, pr_uref of the process's prison is decremented. As
this is the last process, pr_uref hits 0 and the loop continues instead
of breaking early.

At the end of the loop iteration the mtx is unlocked, so other
processes can now manipulate the jail; this is where I think the
problem may be.

If another process now comes in, attaches to the jail and then
immediately exits, another kernel thread can hit this same bit of
code, so two processes for the same prison get into the section that
decrements prison0's pr_uref, instead of only one.

In essence I think we can get the following flow, where 1# = process1
and 2# = process2:
1#1. prison1.pr_uref = 1 (single-process jail)
1#2. prison_deref( prison1,...
1#3. prison1.pr_uref-- (prison1.pr_uref = 0)
1#4. prison1.mtx_unlock -- this now allows others to change prison1.pr_uref
1#5. prison0.pr_uref--
2#1. process2.attach( prison1 ) (prison1.pr_uref = 1)
2#2. process2.exit
2#3. prison_deref( prison1,...
2#4. prison1.pr_uref-- (prison1.pr_uref = 0)
2#5. prison1.mtx_unlock -- this now allows others to change prison1.pr_uref
2#6. prison0.pr_uref-- (prison0.pr_uref has now been decremented twice
     on behalf of prison1)

It seems that the decrement of the parent prison's pr_uref happens too
early, while the jail can still be used and without the child jail's
mtx held, so causing a race condition.
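To make the accounting problem concrete, here is a minimal single-threaded user-space model; it is not kern_jail.c. The names struct model_prison, model_deuref() and model_resurrect() are hypothetical, locking is left out entirely, and it assumes (as discussed in this thread) that resurrecting a dying jail increments only that jail's own pr_uref.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct prison: only what the uref loop touches. */
struct model_prison {
	int uref;                     /* models pr_uref */
	struct model_prison *parent;  /* models pr_parent; NULL above prison0 */
};

/*
 * Models the PD_DEUREF loop of prison_deref(): drop one user reference
 * and, whenever a prison's count reaches zero, carry on to its parent.
 * Locking is deliberately omitted; only the counting is modelled.
 */
void
model_deuref(struct model_prison *pr)
{
	struct model_prison *tpr;

	for (tpr = pr; tpr != NULL; tpr = tpr->parent)
		if (--tpr->uref > 0)
			break;
}

/* Assumed resurrection path: only the child's own count goes back up. */
void
model_resurrect(struct model_prison *pr)
{
	pr->uref++;
}
```

Starting with prison0 at 2 (assumed: one reference of its own plus one on behalf of prison1) and prison1 at 1, calling model_deuref(), model_resurrect() and model_deuref() on prison1 leaves prison0 at 0 instead of 1. The parent has been charged twice for a single child, which is exactly the condition the KASSERT on prison0 in the loop guards against.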

I think the fix is to move the decrement of the parent prison's pr_uref
down, so that it only takes place if the jail is really being removed.
Either that, or change the locking semantics so that once the lock is
acquired in prison_deref it is not released until the function
completes.

What do people think?


After reviewing the changes to prison_deref in the commit that added
hierarchical jails, I suspect that the unlocking of the passed-in prison
by the initial loop may be unintentional.
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/kern_jail.c.diff?r1=1.101;r2=1.102;f=h



If so the following may be all that's needed to fix this issue:-

diff -u sys/kern/kern_jail.c.orig sys/kern/kern_jail.c
--- sys/kern/kern_jail.c.orig 2011-08-20 21:17:14.856618854 +0100
+++ sys/kern/kern_jail.c 2011-08-20 21:18:35.307201425 +0100
@@ -2455,7 +2455,8 @@
 			if (--tpr->pr_uref > 0)
 				break;
 			KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
-			mtx_unlock(&tpr->pr_mtx);
+			if (tpr != pr)
+				mtx_unlock(&tpr->pr_mtx);
 		}
 		/* Done if there were only user references to remove. */
 		if (!(flags & PD_DEREF)) {


Not sure if this would fly as is - please double check the later block
where pr->pr_mtx is re-locked.


You're right, and it's actually more complex than that. Although changing
it to not unlock in the middle of prison_deref fixes that race condition,
it doesn't prevent pr_uref being incorrectly decremented each time the
jail enters the dying state, which is really the problem we are seeing.

If hierarchical prisons are used, there seems to be an additional problem:
the counters of all prisons in the hierarchy are decremented, but as far
as I can tell only the immediate parent's is ever incremented, so I think
there is another reference problem there as well.

The following patch I believe fixes both of these issues.

I've tested with debugging output added and confirmed that prison0's
pr_uref is maintained correctly, even when a jail enters the dying state
multiple times.

It essentially reverts the changes made to the if (flags & PD_DEUREF)
block by 192895 and moves it to after the jail has actually been removed.

diff -u sys/kern/kern_jail.c.orig sys/kern/kern_jail.c
--- sys/kern/kern_jail.c.orig 2011-08-20 21:17:14.856618854 +0100
+++ sys/kern/kern_jail.c 2011-08-21 01:56:58.429894825 +0100
@@ -2449,27 +2449,16 @@
 		mtx_lock(&pr->pr_mtx);
 	/* Decrement the user references in a separate loop. */
 	if (flags & PD_DEUREF) {
-		for (tpr = pr;; tpr = tpr->pr_parent) 

Re: bad sector in gmirror HDD

2011-08-21 Thread Matthias Andree
Am 20.08.2011 19:34, schrieb Dan Langille:
 This is an older system.  I suspect insufficient ventilation.  I'll look
 at getting a new case fan, if not some HDD fans.

The answer is quite simple: get new drives.

They have been running for some 24000 hours, IOW, nearly three years
(assuming 24x7), and at around 50 °C, they're worn.  After three years,
at the slightest hitch, replace drives, before Something Bad[tm] happens.
You'll get faster replacements anyhow :)


On a related note, since this is about gmirror:

Linux has a similar subsystem called md (multiple devices), with the
user-space tool mdadm.  The whole rig (kernel + user space) supports
various RAID levels through modules, the gmirror equivalent being
raid1 -- and that module somewhat recently acquired an interesting
*feature:* it can automatically rewrite broken sectors.  That is, when
it sees a read error on one drive, it reads the block from the intact
other drive and rewrites it on the faulty drive so that the sector gets
reallocated (assuming nobody turned the drive's AWRE feature off).
Perhaps that's a useful feature for gmirror, too.

 2848980992 bytes transferred in 127.128503 secs (22410246 bytes/sec)

Eek, someone should fix dd to use proper units and not confuse seconds
(s) with the secant function (sec).

Anyway, that's pretty low by today's standards.  My I/O speeds, even on
lowly Samsung 5400 rpm drives, are in excess of 100 MBytes/s, and that's
talking about drives made in 2009.
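For scale, the dd figure quoted above works out as follows (MB = 10^6 bytes, MiB = 2^20 bytes), compared with the 100 MB/s mentioned:

```latex
\frac{2\,848\,980\,992\ \text{bytes}}{127.128503\ \text{s}}
  \approx 22\,410\,246\ \text{bytes/s}
  \approx 22.4\ \text{MB/s} \approx 21.4\ \text{MiB/s},
\qquad
\frac{100\ \text{MB/s}}{22.4\ \text{MB/s}} \approx 4.5
```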
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: bad sector in gmirror HDD

2011-08-21 Thread perryh
Jeremy Chadwick free...@jdc.parodius.com wrote:
 On Sun, Aug 21, 2011 at 02:00:33AM -0700, per...@pluto.rain.com
 wrote:
  Jeremy Chadwick free...@jdc.parodius.com wrote:
   ... using dd to find the bad LBAs is the only choice he has.
  or sysutils/diskcheckd ...
 That software has a major problem where it runs constantly, rather
 than periodically.

Even in light of the discussion below, I would not think that a
problem for the particular purpose under discussion, where it's
presumably going to be terminated after completing a single pass.
The dd approach is also going to soak the drive for the duration.

 I know because I'm the one who opened the PR on it:
 http://www.freebsd.org/cgi/query-pr.cgi?pr=ports/115853
 There's a discussion about this port/issue from a few days ago
 (how sweet!):
 http://lists.freebsd.org/pipermail/freebsd-ports/2011-August/069276.html
 With comments from you stating that the software is behaving as
 designed and that I misread the man page, but also stating point
 blank that either way the software runs continuously (which is
 what the PR was about in the first place):
 http://lists.freebsd.org/pipermail/freebsd-ports/2011-August/069321.html
 ...
 Back to my PR.
 I state that I set up diskcheckd.conf using the option you
 describe as a length of time over which to spread each pass,
 yet what happened was that it did as much I/O as it could
 (read the entire disk in 45 minutes) then proceeded to do
 it again (no sleep()) ...

Agreed, that is not what is supposed to happen.

What I see as a misreading of the manpage is reflected in your
assertion, in the closing comment on 7/1/2008, that the code does
not do what the manpage says (or vice-versa).  Having looked at
both the code and the manpage, I don't agree with that assessment.

As I read it, the manpage sentence

Naturally, it would be contradictory to specify both the
frequency and the rate, so only one of these should be
specified.

has to mean that the days (frequency) setting is simply an
alternative way of specifying the rate.  Is there some other
interpretation that I'm missing?

Based on the code, it looks to me as if diskcheckd is supposed to
read 64KB checking for errors, then sleep for a calculated length
of time before reading the next 64KB, so as to average out to the
(directly or indirectly) specified rate.  Thus it is intended to
run continuously in the sense that its I/O load is supposed to
be as uniform as possible, consistent with reading 64KB at a time,
rather than imposing a heavier load for some period of time and
then pausing for the balance of the specified number of days.
This is entirely consistent with my understanding of the manpage.
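A minimal sketch of the pacing arithmetic as described here, not diskcheckd's actual source: the helper names are hypothetical, the 64KB chunk and the equivalence between the days setting and a rate come from the text above, and the time spent on the read itself is ignored.

```c
#include <stdint.h>

#define CHUNK_BYTES 65536.0   /* bytes read per step (64KB, per the text) */

/* Express a "whole disk every N days" setting as an average rate in bytes/s. */
double
rate_from_days(uint64_t disk_bytes, double days)
{
	return (double)disk_bytes / (days * 86400.0);
}

/*
 * Seconds to pause after each 64KB read so that the long-run average
 * matches the rate (the read time itself is ignored in this sketch).
 */
double
pause_per_chunk(double rate_bytes_per_sec)
{
	return CHUNK_BYTES / rate_bytes_per_sec;
}
```

For example, checking a 10^12-byte disk once a week works out to roughly 1.65 MB/s, i.e. a pause of about 0.04 s after each 64KB read, which is already well under one second.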

Given that 115853 was closed (which AFAIK is supposed to mean
no longer considered a problem), and seemed to have involved
a misunderstanding of how diskcheckd was intended to operate,
I decided to investigate the open 143566 instead -- and 143566
explicitly stated that diskcheckd runs fine when gmirror is not
involved ...  So I've been running diskcheckd on a gmirrored
system and it seems to be working.

As to what is actually going on:  Earlier this evening I started
looking into the failure to call updateproctitle() as mentioned
in 115853's closing comment, which I had also noticed in my own
testing, and it seems that this _is_ related to the now-clarified
problem of diskcheckd running flat-out instead of pausing between
each 64KB read.  When the specified or calculated rate exceeds
64KB/sec, the required sleep interval between 64KB chunks is less
than one second.  Since diskcheckd calculates the interval in
whole seconds -- because it calls sleep() rather than usleep() or
nanosleep() -- an interval of less than one second is calculated as
zero.  That zero interval gets passed to sleep(), which dutifully
returns immediately or nearly so, and the same zero is also used to
increment the counter that is supposed to cause updateproctitle()
to be called every 300 seconds.

I suspect the fix will be to calculate in microseconds, and call
usleep() instead of sleep().  And yes, I am planning to fix it --
and clarify the manpage -- but not tonight.
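The truncation described above, and the suggested microsecond fix, can be sketched as follows. This is an illustration rather than a patch against diskcheckd: the function names are hypothetical, only the 64KB chunk and the 300-second title interval come from the text, and whether the result is passed to usleep() or nanosleep() is left open.

```c
#include <stdint.h>

#define CHUNK_BYTES   65536u   /* bytes per read, per the text */
#define TITLE_PERIOD  300u     /* seconds between updateproctitle() calls */

/* Behaviour as described: interval in whole seconds, as sleep() wants. */
unsigned
interval_whole_seconds(uint64_t rate_bytes_per_sec)
{
	/* Integer division: becomes 0 as soon as the rate exceeds 64KB/s. */
	return (unsigned)(CHUNK_BYTES / rate_bytes_per_sec);
}

/* Suggested fix: keep sub-second precision by working in microseconds. */
uint64_t
interval_microseconds(uint64_t rate_bytes_per_sec)
{
	return (uint64_t)CHUNK_BYTES * 1000000u / rate_bytes_per_sec;
}

/*
 * How many chunks pass before an elapsed-time counter, bumped by
 * increment_us per chunk, reaches TITLE_PERIOD seconds.
 * Returns 0 to mean "never", which is what a zero increment gives.
 */
uint64_t
chunks_until_title_update(uint64_t increment_us)
{
	if (increment_us == 0)
		return 0;
	return ((uint64_t)TITLE_PERIOD * 1000000u + increment_us - 1) /
	    increment_us;
}
```

At 128KB/s, for instance, interval_whole_seconds() yields 0 while interval_microseconds() yields 500000, so the title counter is reached after 600 chunks instead of never.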

 ... and besides, such a utility really shouldn't be a daemon
 anyway but a periodic(8)-called utility with appropriate locks put
 in place to ensure more than one instance can't be run at once.

I suppose that can be argued either way.  It's not obvious to me
that using, say, 7x as much bandwidth for one day and then taking
6 days off is somehow better than spreading the testing over an
entire week.  Furthermore, using periodic(8) could get _really_
messy if checking multiple drives using different frequencies --
unless one wanted to run a separate instance of the program for
each drive (and then we would have to prevent multiple simultaneous
instances for any one drive, while allowing simultaneous checking
of multiple drives).
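For the per-drive case, one lock file per drive is enough to forbid two simultaneous checks of the same drive while still allowing different drives to be checked in parallel. A minimal sketch using flock(2) with LOCK_NB; the lock directory and the "<drive>.lock" naming are hypothetical, and nothing here comes from diskcheckd itself.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Take an exclusive, non-blocking lock for one drive, e.g. "ada0".
 * Returns a descriptor that holds the lock until it is closed (or the
 * process exits), or -1 if another instance holds it or on error.
 */
int
lock_drive(const char *lockdir, const char *drive)
{
	char path[1024];
	int fd, n, saved;

	n = snprintf(path, sizeof(path), "%s/%s.lock", lockdir, drive);
	if (n < 0 || (size_t)n >= sizeof(path)) {
		errno = ENAMETOOLONG;
		return -1;
	}
	fd = open(path, O_RDWR | O_CREAT, 0644);
	if (fd == -1)
		return -1;
	if (flock(fd, LOCK_EX | LOCK_NB) == -1) {
		saved = errno;          /* EWOULDBLOCK: drive already being checked */
		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}
```

Locking "ada0" twice fails the second time, while "ada0" and "ada1" can be held at once.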

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-21 Thread Steven Hartland
- Original Message - 
From: Jamie Gritton ja...@freebsd.org

In essence I think we can get the following flow, where 1# = process1
and 2# = process2:
1#1. prison1.pr_uref = 1 (single-process jail)
1#2. prison_deref( prison1,...
1#3. prison1.pr_uref-- (prison1.pr_uref = 0)
1#4. prison1.mtx_unlock -- this now allows others to change prison1.pr_uref
1#5. prison0.pr_uref--
2#1. process2.attach( prison1 ) (prison1.pr_uref = 1)
2#2. process2.exit
2#3. prison_deref( prison1,...
2#4. prison1.pr_uref-- (prison1.pr_uref = 0)
2#5. prison1.mtx_unlock -- this now allows others to change prison1.pr_uref
2#6. prison0.pr_uref-- (prison0.pr_uref has now been decremented twice
     on behalf of prison1)


First off, thanks for the feedback, Jamie; most appreciated :)


The problem isn't with the conditional locking of tpr in prison_deref.
That locking is actually correct, and there's no race condition.


Are you sure? I do think that unlocking the mtx halfway through the
call allows the above scenario to create a race condition, albeit
very briefly, when ignoring the overriding issue.

In addition, if the code were changed so that pr_uref++ also
maintained the parent's uref, this would definitely lead to potential
problems in my mind, especially if you had more than one child prison
of a given parent entering the dying state at the same time.

In this case I believe you would have to acquire the locks of all
the parent prisons before it would be safe to proceed.


The trouble lies in the resurrection of dead jails, as Andriy has noted
(and not just by attaching: setting its persist flag causes the same
problem).


I'm not sure that persistent prisons actually suffer from this in any
different way tbh, as they have an additional uref increment, so they
would never hit this case unless they had been actively removed, and
hence unpersisted, first.



There are two possible fixes to this. One is the patch you've given,
which only decrements a parent jail's pr_uref when the child jail
completely goes away (as opposed to when it loses its last uref). This
provides symmetry with the current way pr_uref is incremented on the
parent, which is only when a jail is created.
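A counting-only sketch of that first option, under the same simplifications as any user-space model (hypothetical names, no locking) and assuming only the immediate parent is adjusted, matching the single create-time increment: losing the last user reference no longer touches the parent, and the parent's reference is dropped only when the child actually goes away.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct prison. */
struct model_prison {
	int uref;                     /* models pr_uref */
	struct model_prison *parent;  /* models pr_parent; NULL for prison0 */
};

/* Last process leaves: only the jail's own count drops, no cascade. */
void
model_deuref(struct model_prison *pr)
{
	pr->uref--;
}

/* Resurrection (attach, or persist on a dying jail): own count only. */
void
model_resurrect(struct model_prison *pr)
{
	pr->uref++;
}

/*
 * The jail really goes away: release the reference its parent has held
 * since creation, mirroring the single increment done at create time.
 */
void
model_remove(struct model_prison *pr)
{
	if (pr->parent != NULL)
		pr->parent->uref--;
}
```

However many times the child dies and is resurrected, the parent's count is untouched until model_remove() runs once. The trade-off is that a non-persistent parent stays visible while its only child is dying.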

The other fix is to increment a parent's pr_uref when a jail is
resurrected, which will match the current logic in prison_deref. I like
the external semantics of this solution: a jail isn't visible if it is
not persistent and has no processes and no *visible* sub-jails, as
opposed to having no sub-jails at all. But this solution ends up pretty
complicated - there are a few places where pr_uref is incremented, where
I might need to increment parent jails' pr_uref as well, much like the
current tpr loop in prison_deref decrements them.


Ahh yes, in the hierarchical case my patch would indeed mean that
non-persistent parent jails would remain visible even when their last
child jail is in the dying state.

As you say, making this not the case would likely require replacing all
instances of pr_uref++ with a prison_uref method that implements the
opposite of the loop in prison_deref when the prison's pr_uref is 0 at
the time of the call.
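A rough sketch of what such a prison_uref might look like, modelled in user space with pthread mutexes standing in for pr_mtx. The function and struct names are hypothetical and a real kernel version would differ; only the counting and the unlock-child-before-locking-parent pattern from prison_deref are shown. A reference that takes a prison from 0 to 1 must also be taken on its parent, and so on up the tree.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct prison, with a mutex for pr_mtx. */
struct model_prison {
	pthread_mutex_t mtx;          /* models pr_mtx */
	int uref;                     /* models pr_uref */
	struct model_prison *parent;  /* models pr_parent; NULL above prison0 */
};

/*
 * Hypothetical prison_uref(): the mirror image of the PD_DEUREF loop.
 * Take one user reference; whenever that takes a prison from 0 to 1
 * (it becomes visible again), its parent needs a reference as well.
 * As in prison_deref(), the child is unlocked before the parent is locked.
 */
void
model_uref(struct model_prison *pr)
{
	struct model_prison *tpr;
	int was_zero;

	for (tpr = pr; tpr != NULL; tpr = tpr->parent) {
		pthread_mutex_lock(&tpr->mtx);
		was_zero = (tpr->uref++ == 0);
		pthread_mutex_unlock(&tpr->mtx);
		if (!was_zero)
			break;    /* already visible: ancestors are unaffected */
	}
}
```

Every pr_uref++ site would then have to call something like this instead of incrementing directly, which is where the extra complexity comes from.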


Your solution removes code instead of adding it, which is generally a
good thing. While it does change the semantics of pr_uref in the
hierarchical case at least from what I thought it was, those semantics
haven't been working properly anyway.


Good to know my interpretation was correct, even if I was missing the
visibility factor in the hierarchical case :)


Bjoern, I'm adding you to the CC list for this because the whole pr_uref
thing was your idea (though it was pr_nprocs at the time), so you might
care about the hierarchical semantics of it - or you may not. Also, this
is a panic-inducing bug in current and may interest you for that reason.



From an admin perspective, the current jail dying state does cause
confusion when you're not aware of its existence. You ask a jail to stop,
it appears to have completed that request, but it really hasn't,
generally due to just a lingering TCP connection.

With the introduction of hierarchical jails that gets a little worse
where a whole series of jails could disappear from normal view only to
be resurrected shortly after. Something to bear in mind when deciding
which solution of the two presented to use.

   Regards
   Steve



Unknown Re0 Hardware version

2011-08-21 Thread Willem Jan Withagen

Hi,

I'm assembling a few systems with an ASUS P8 H161-MLE motherboard,
which was supposed to have a 'Realtek® 8112L, 1 x Gigabit LAN
Controller(s)' onboard.


And to be honest, I never expected that version not to be supported.
I just booted 8.2-RELEASE on it, and the installer crashed when I
wanted it to configure the ethernet.


Rebooted, and re0 kicks in, but reports the HW revision as not
supported. It claims HW revision 0x2c80.

Is this supported in a later 8-STABLE? Or in 9.x?

I'm willing to tinker with the code and recompile the re0 driver.

--WjW





Serial multiport error Oxford/Startech PEX2S952

2011-08-21 Thread Greg Byshenk
Not sure if -stable is the right place for this, but I'll give it
a shot; if it's not, then a pointer in the right direction would
be much appreciated.

I'm having a problem with a StarTech PEX2S952 dual-port serial
card.

I believe that it should be supported, as it has this entry in
pucdata.c

[...]
	{   0x1415, 0xc158, 0xffff, 0,
	    "Oxford Semiconductor OXPCIe952 UARTs",
	    DEFAULT_RCLK * 0x22,
	    PUC_PORT_NONSTANDARD, 0x10, 0, -1,
	    .config_function = puc_config_oxford_pcie
	},
[...]

And, while it is recognized at boot -- after adding

device  puc
options COM_MULTIPORT

to my kernel, it doesn't seem to be working. The devices '/dev/cuau2'
and '/dev/cuau3' show up, and I can connect to them, but they don't
seem to pass any traffic. If I connect to the serial console of
another machine (one that I know for certain is working), I get 
nothing at all.

I suspect (?) that it may not be recognized as the proper card. Boot
and pciconf messages are:

puc0: <Oxford Semiconductor OXPCIe952 UARTs> mem 0xf9dfc000-0xf9dfffff,0xfa000000-0xfa1fffff,0xf9e00000-0xf9ffffff irq 30 at device 0.0 on pci4

puc0@pci0:4:0:0:	class=0x070002 card=0xc1581415 chip=0xc1581415 rev=0x00 hdr=0x00
    vendor     = 'Oxford Semiconductor Ltd'
    class      = simple comms
    subclass   = UART
    bar   [10] = type Memory, range 32, base 0xf9dfc000, size 16384, enabled
    bar   [14] = type Memory, range 32, base 0xfa000000, size 2097152, enabled
    bar   [18] = type Memory, range 32, base 0xf9e00000, size 2097152, enabled

The kernel is actually FreeBSD 9.0-BETA1 amd64, which is not quite
'STABLE' yet, but I don't think that this should matter.

Any advice would be much appreciated. The machine is still in
test phase, so I can mess around with it as necessary.

Thanks.

-- 
greg byshenk  -  free...@byshenk.net  -  Leiden, NL


Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-21 Thread Jamie Gritton

On 08/21/11 05:01, Steven Hartland wrote:

- Original Message - From: Jamie Gritton ja...@freebsd.org

The problem isn't with the conditional locking of tpr in prison_deref.
That locking is actually correct, and there's no race condition.


Are you sure? I do think that unlocking the mtx halfway through the
call allows the above scenario to create a race condition, albeit
very briefly, when ignoring the overriding issue.

In addition, if the code were changed so that pr_uref++ also
maintained the parent's uref, this would definitely lead to potential
problems in my mind, especially if you had more than one child prison
of a given parent entering the dying state at the same time.

In this case I believe you would have to acquire the locks of all
the parent prisons before it would be safe to proceed.


Lock order requires that I unlock the child if I want to lock the
parent. While that does allow periods where neither is locked, it's safe
in this case. There may be multiple processes dying in one jail, or in
multiple children of a single jail. But as long as a parent jail is
locked while decrementing pr_uref, then only one of these simultaneous
prison_deref calls would set pr_uref to zero and continue in the loop to
that prison's parent. This might be mixed with pr_uref being incremented
elsewhere, but that's not a problem either as long as the jail in
question is locked.
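The argument can be exercised with a small user-space model: pthread mutexes stand in for pr_mtx, struct model_prison and model_deuref_locked() are hypothetical, and each thread plays an exiting process. The child is unlocked before the parent is locked, yet because the decrement and the zero test happen under the prison's own lock, exactly one caller per child carries on to the parent.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct prison, with a mutex for pr_mtx. */
struct model_prison {
	pthread_mutex_t mtx;          /* models pr_mtx */
	int uref;                     /* models pr_uref */
	struct model_prison *parent;  /* models pr_parent; NULL above prison0 */
};

/*
 * Models the PD_DEUREF loop with its locking: each prison is decremented
 * and tested under its own lock, and the child is unlocked before the
 * parent is locked. Only the caller that brings a count to zero moves on.
 */
void
model_deuref_locked(struct model_prison *pr)
{
	struct model_prison *tpr;
	int remaining;

	for (tpr = pr; tpr != NULL; tpr = tpr->parent) {
		pthread_mutex_lock(&tpr->mtx);
		remaining = --tpr->uref;
		pthread_mutex_unlock(&tpr->mtx);
		if (remaining > 0)
			break;
	}
}

/* Thread body: one "process" leaving the prison passed as arg. */
void *
exiting_process(void *arg)
{
	model_deuref_locked(arg);
	return NULL;
}
```

Starting a parent at 3 (assumed: one reference of its own plus one per child) and two children at 8 each, sixteen concurrent callers should always leave both children at 0 and the parent at exactly 1.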


The trouble lies in the resurrection of dead jails, as Andriy has noted
(and not just by attaching: setting its persist flag causes the same
problem).


I'm not sure that persistent prisons actually suffer from this in any
different way tbh, as they have an additional uref increment, so they
would never hit this case unless they had been actively removed, and
hence unpersisted, first.


Right - both the attach and persist cases are only a problem when a jail
has disappeared. There are various ways for a jail to be removed,
potentially to be kept around but in the dying state, but only two
related ways for it to be resurrected: attaching a new process or
setting the persist flag, both via jail_set with the JAIL_DYING flag passed.


There are two possible fixes to this. One is the patch you've given,
which only decrements a parent jail's pr_uref when the child jail
completely goes away (as opposed to when it loses its last uref). This
provides symmetry with the current way pr_uref is incremented on the
parent, which is only when a jail is created.

The other fix is to increment a parent's pr_uref when a jail is
resurrected, which will match the current logic in prison_deref. I like
the external semantics of this solution: a jail isn't visible if it is
not persistent and has no processes and no *visible* sub-jails, as
opposed to having no sub-jails at all. But this solution ends up pretty
complicated - there are a few places where pr_uref is incremented, where
I might need to increment parent jails' pr_uref as well, much like the
current tpr loop in prison_deref decrements them.


Ahh yes, in the hierarchical case my patch would indeed mean that
non-persistent parent jails would remain visible even when their last
child jail is in the dying state.

As you say, making this not the case would likely require replacing all
instances of pr_uref++ with a prison_uref method that implements the
opposite of the loop in prison_deref when the prison's pr_uref is 0 at
the time of the call.


Yes, that's the problem. Maybe not all instances, but most of them have
enough points where the jail is unlocked that we can't assume pr_uref
hasn't been set to zero somewhere else, and so we need to do that loop.


Your solution removes code instead of adding it, which is generally a
good thing. While it does change the semantics of pr_uref in the
hierarchical case at least from what I thought it was, those semantics
haven't been working properly anyway.


Good to know my interpretation was correct, even if I was missing the
visibility factor in the hierarchical case :)


Bjoern, I'm adding you to the CC list for this because the whole pr_uref
thing was your idea (though it was pr_nprocs at the time), so you might
care about the hierarchical semantics of it - or you may not. Also, this
is a panic-inducing bug in current and may interest you for that reason.


From an admin perspective, the current jail dying state does cause
confusion when you're not aware of its existence. You ask a jail to stop,
it appears to have completed that request, but it really hasn't,
generally due to just a lingering TCP connection.

With the introduction of hierarchical jails that gets a little worse
where a whole series of jails could disappear from normal view only to
be resurrected shortly after. Something to bear in mind when deciding
which solution of the two presented to use.


The good news is that a jail (or perhaps a whole set of jails) can
only come back from the dead when the administrator makes a
concerted effort to do so. So it at least shouldn't surprise the

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-21 Thread Roger Marquis

On Sat, 20 Aug 2011, Steven Hartland wrote:

Are you seeing a double fault panic?


We're seeing both.  At least one double (or more) fault finishing with
Fatal Trap 12: page fault while in kernel mode.  Subsequent panics have
been single fault (all visible on the IPMI console) Fatal Trap 9:
general protection fault while in kernel mode.

Could well be unrelated.  The system is undergoing hardware diags now.

Roger Marquis


Re: Serial multiport error Oxford/Startech PEX2S952

2011-08-21 Thread David Wood

Hi Greg,

I wrote and contributed the support code for the OXPCIe95x serial chips 
- and just happened to notice your report.



In message 20110821154249.ge92...@core.byshenk.net, Greg Byshenk 
free...@byshenk.net writes

I'm having a problem with a StarTech PEX2S952 dual-port serial
card.

I believe that it should be supported, as it has this entry in
pucdata.c

[...]
   {   0x1415, 0xc158, 0xffff, 0,
       "Oxford Semiconductor OXPCIe952 UARTs",
       DEFAULT_RCLK * 0x22,
       PUC_PORT_NONSTANDARD, 0x10, 0, -1,
       .config_function = puc_config_oxford_pcie
   },
[...]


It should be supported. The OXPCIe952 is more awkward to support than 
the OXPCIe954 and OXPCIe958 because it can be configured in so many 
different ways by the board manufacturer. However, 0xc158 is a
configuration identical in arrangement to the larger chips, so it
is the configuration I'm most confident of. I've just double-checked the 
data sheets, and can't see any relevant differences between 0xc158 
OXPCIe952 and the OXPCIe954 I tested the code with.


I use my OXPCIe954 board on FreeBSD 8.2, and have had success reports 
from other OXPCIe954 and OXPCIe958 board users (including someone with a 
16 port board based on dual OXPCIe958s). I have yet to try FreeBSD 9.x 
on my hardware.




And, while it is recognized at boot -- after adding

  device  puc
  options COM_MULTIPORT


I'm 99% certain that options COM_MULTIPORT relates to the old sio(4) 
code - I certainly don't need it on 8.x. Does it make any difference if 
you delete that line and just leave device puc?




to my kernel, it doesn't seem to be working. The devices '/dev/cuau2'
and '/dev/cuau3' show up, and I can connect to them, but they don't
seem to pass any traffic. If I connect to the serial console of
another machine (one that I know for certain is working), I get
nothing at all.


Have you remembered to set the speed (and other relevant options) on the 
.init devices? This is a feature (or is it a quirk) of the uart(4) 
driver that catches many people out. Setting options on the base device 
is normally a no-op.


For example, if the remote device on /dev/cuau2 operates at 115200 bps 
with hardware handshaking, try:


stty -f /dev/cuau2.init speed 115200 crtscts
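The same settings can be applied from a program through termios. A minimal sketch, reusing the device path from the example above; the helper names are hypothetical, and CRTSCTS is a BSD/Linux extension rather than part of POSIX.

```c
#define _DEFAULT_SOURCE   /* glibc: expose CRTSCTS; harmless elsewhere */
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

/* Fill in 115200 bps with RTS/CTS handshaking, as in the stty example. */
int
set_115200_crtscts(struct termios *t)
{
	if (cfsetispeed(t, B115200) == -1 || cfsetospeed(t, B115200) == -1)
		return -1;
	t->c_cflag |= CRTSCTS;
	return 0;
}

/*
 * Read-modify-write the settings of an initial-state device such as
 * "/dev/cuau2.init". Returns 0 on success, -1 on failure.
 */
int
configure_init_device(const char *path)
{
	struct termios t;
	int fd, rc = -1;

	fd = open(path, O_RDWR | O_NONBLOCK);
	if (fd == -1)
		return -1;
	if (tcgetattr(fd, &t) == 0 && set_115200_crtscts(&t) == 0)
		rc = tcsetattr(fd, TCSANOW, &t);
	close(fd);
	return rc;
}
```

configure_init_device("/dev/cuau2.init") then does the same job as the stty command, failing with -1 if the node cannot be opened or is not a terminal.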


One frustrating aspect of adding puc(4) support for many devices is that 
you can't be certain of the clock rate multiplier - the same device can 
crop up on a different manufacturer's board with a different multiplier. 
This problem doesn't occur with the OXPCIe95x devices as they derive 
their 62.5MHz UART clock from the PCI Express clock. Consequently, the 
problem can't be that your board is inadvertently operating the UARTs at
the wrong speed.
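For reference, assuming DEFAULT_RCLK is the conventional 1.8432 MHz UART reference clock, the multiplier in the pucdata.c entry quoted in this thread gives a nominal clock close to the 62.5 MHz figure, an error of about 0.27%, small compared with the tolerance normally allowed on asynchronous serial links:

```latex
\texttt{DEFAULT\_RCLK} \times \texttt{0x22}
  = 1\,843\,200\ \text{Hz} \times 34
  = 62\,668\,800\ \text{Hz} \approx 62.67\ \text{MHz},
\qquad
\frac{62\,668\,800 - 62\,500\,000}{62\,500\,000} \approx 0.27\%
```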




I suspect (?) that it may not be recognized as the proper card. Boot
and pciconf messages are:

puc0: <Oxford Semiconductor OXPCIe952 UARTs> mem 0xf9dfc000-0xf9dfffff,0xfa000000-0xfa1fffff,0xf9e00000-0xf9ffffff irq 30 at device 0.0 on pci4


That is correct. Are there any more lines afterwards - especially one 
giving the number of UARTs detected? That line is crucial, as, on these 
chips, the number of UARTs has to be read from configuration space 
because you can slave two chips together.



My OXPCIe954 board is recognised thus (FreeBSD 8.2 amd64):

puc0: <Oxford Semiconductor OXPCIe954 UARTs> mem 0xd5efc000-0xd5efffff,0xd5c00000-0xd5dfffff,0xd5a00000-0xd5bfffff irq 18 at device 0.0 on pci8
puc0: 4 UARTs detected
puc0: [FILTER]
uart2: <16950 or compatible> on puc0
uart2: [FILTER]
uart3: <16950 or compatible> on puc0
uart3: [FILTER]
uart4: <16950 or compatible> on puc0
uart4: [FILTER]
uart5: <16950 or compatible> on puc0
uart5: [FILTER]


puc0@pci0:4:0:0:	class=0x070002 card=0xc1581415 chip=0xc1581415 rev=0x00 hdr=0x00
    vendor     = 'Oxford Semiconductor Ltd'
    class      = simple comms
    subclass   = UART
    bar   [10] = type Memory, range 32, base 0xf9dfc000, size 16384, enabled
    bar   [14] = type Memory, range 32, base 0xfa000000, size 2097152, enabled
    bar   [18] = type Memory, range 32, base 0xf9e00000, size 2097152, enabled


That is correct.



The kernel is actually FreeBSD 9.0-BETA1 amd64, which is not quite
'STABLE' yet, but I don't think that this should matter.

Any advice would be much appreciated. The machine is still in
test phase, so I can mess around with it as necessary.


Hopefully this gets your Startech board working. I look forward to your 
feedback.



If all else fails, the board I'm using is Lindy 51189. It's an OXPCIe954 
board, offering four ports via a breakout cable, and is normally pretty 
cheap direct from lindy.com (quite possibly cheaper than your two port 
Startech board!). However, this recommendation comes with the proviso 
that I haven't yet tried it with FreeBSD 9.x.




With best wishes,




David
--
David Wood
da...@wood2.org.uk

Re: Serial multiport error Oxford/Startech PEX2S952

2011-08-21 Thread Greg Byshenk
On Sun, Aug 21, 2011 at 09:44:41PM +0100, David Wood wrote:
 
 I wrote and contributed the support code for the OXPCIe95x serial chips 
 - and just happened to notice your report.

Thanks for the response.


 In message 20110821154249.ge92...@core.byshenk.net, Greg Byshenk 
 free...@byshenk.net writes
 I'm having a problem with a StarTech PEX2S952 dual-port serial
 card.
 
 I believe that it should be supported, as it has this entry in
 pucdata.c
 
 [...]
 	{   0x1415, 0xc158, 0xffff, 0,
 	    "Oxford Semiconductor OXPCIe952 UARTs",
 	    DEFAULT_RCLK * 0x22,
 	    PUC_PORT_NONSTANDARD, 0x10, 0, -1,
 	    .config_function = puc_config_oxford_pcie
 	},
 [...]
 
 It should be supported. The OXPCIe952 is more awkward to support than 
 the OXPCIe954 and OXPCIe958 because it can be configured in so many 
 different ways by the board manufacturer. However, 0xc158 is a
 configuration identical in arrangement to the larger chips, so it
 is the configuration I'm most confident of. I've just double-checked the 
 data sheets, and can't see any relevant differences between 0xc158 
 OXPCIe952 and the OXPCIe954 I tested the code with.
 
 I use my OXPCIe954 board on FreeBSD 8.2, and have had success reports 
 from other OXPCIe954 and OXPCIe958 board users (including someone with a 
 16 port board based on dual OXPCIe958s). I have yet to try FreeBSD 9.x 
 on my hardware.
 
 
 And, while it is recognized at boot -- after adding
 
   device  puc
   options COM_MULTIPORT
 
 I'm 99% certain that options COM_MULTIPORT relates to the old sio(4) 
 code - I certainly don't need it on 8.x. Does it make any difference if 
 you delete that line and just leave device puc?

I will rebuild my kernel and try.
 
 
 to my kernel, it doesn't seem to be working. The devices '/dev/cuau2'
 and '/dev/cuau3' show up, and I can connect to them, but they don't
 seem to pass any traffic. If I connect to the serial console of
 another machine (one that I know for certain is working), I get
 nothing at all.
 
 Have you remembered to set the speed (and other relevant options) on the 
 .init devices? This is a feature (or is it a quirk?) of the uart(4) 
 driver that catches many people out. Setting options on the base device 
 is normally a no-op.
 
 For example, if the remote device on /dev/cuau2 operates at 115200 bps 
 with hardware handshaking, try:
 
 stty -f /dev/cuau2.init speed 115200 crtscts
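For anyone who would rather make the same change from a program than with stty(1), here is a minimal C sketch, assuming POSIX termios, of roughly what `stty -f /dev/cuau2.init speed 115200 crtscts` asks for. The helper names configure_line() and apply_to_device() are hypothetical, and the device path is only an example taken from this thread; whether a particular .init node accepts these calls is up to the driver.

```c
/* Needed for CRTSCTS on glibc in strict standard modes; harmless on FreeBSD. */
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: set both directions to `speed` and enable or
 * disable RTS/CTS hardware flow control in an in-memory termios struct. */
int configure_line(struct termios *t, speed_t speed, int hw_flow)
{
    if (cfsetispeed(t, speed) != 0 || cfsetospeed(t, speed) != 0)
        return -1;
    if (hw_flow)
        t->c_cflag |= CRTSCTS;
    else
        t->c_cflag &= ~CRTSCTS;
    return 0;
}

/* Hypothetical helper: read the current settings of a device node such as
 * /dev/cuau2.init, change speed and flow control, and write them back.
 * Returns 0 on success, -1 if the open or any termios call fails
 * (for example with ENOTTY if the node does not act as a terminal). */
int apply_to_device(const char *path, speed_t speed, int hw_flow)
{
    struct termios t;
    int fd = open(path, O_RDWR | O_NOCTTY | O_NONBLOCK);

    if (fd < 0)
        return -1;
    if (tcgetattr(fd, &t) != 0 ||
        configure_line(&t, speed, hw_flow) != 0 ||
        tcsetattr(fd, TCSANOW, &t) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

For example, apply_to_device("/dev/cuau2.init", B115200, 1) attempts the same change as the stty command above; on a -1 return, errno indicates which step failed.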

Interestingly, it -is- a no-op on the device, which I hadn't noticed.
But trying to set it on the .init fails:

# stty -f /dev/cuau2.init speed 115200 crtscts
stty: /dev/cuau2.init isn't a terminal
# 

 
 One frustrating aspect of adding puc(4) support for many devices is that 
 you can't be certain of the clock rate multiplier - the same device can 
 crop up on a different manufacturer's board with a different multiplier. 
 This problem doesn't occur with the OXPCIe95x devices as they derive 
 their 62.5MHz UART clock from the PCI Express clock. Consequently, the 
 problem can't be that your board is inadvertently operating the UARTs at 
 the wrong speed.
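As a rough arithmetic check of the clock point above, the sketch below assumes that DEFAULT_RCLK is the classic 1,843,200 Hz UART rate and models the UART with a plain 16x-oversampling divisor (the 16950 has more flexible clocking, which this ignores). Under those assumptions, the DEFAULT_RCLK * 0x22 declared in the pucdata.c entry quoted earlier comes to 62,668,800 Hz, within about 0.3% of 62.5 MHz, and both clocks select the same divisor (34) for 115200 bps. With the real 62.5 MHz clock that divisor gives about 114,889 bps, roughly 0.27% slow, which is small compared with the few percent of mismatch usually tolerated on asynchronous serial links.

```c
/* Assumption: DEFAULT_RCLK is the classic 1.8432 MHz UART crystal rate. */
#define ASSUMED_DEFAULT_RCLK 1843200UL

/* The reference clock declared by the pucdata.c entry: DEFAULT_RCLK * 0x22. */
unsigned long declared_rclk(void)
{
    return ASSUMED_DEFAULT_RCLK * 0x22; /* 62,668,800 Hz */
}

/* Generic 16x-oversampling divisor, approximately rounded to nearest.
 * This is an assumed, simplified 16550-style model, not uart(4) code. */
unsigned long uart_divisor(unsigned long rclk, unsigned long baud)
{
    return (rclk / (baud * 8) + 1) / 2;
}

/* Baud rate actually produced by a divisor (truncated to whole bps). */
unsigned long actual_baud(unsigned long rclk, unsigned long divisor)
{
    return rclk / (divisor * 16);
}
```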
 
 
 I suspect (?) that it may not be recognized as the proper card. Boot
 and pciconf messages are:
 
 puc0: Oxford Semiconductor OXPCIe952 UARTs mem 
 0xf9dfc000-0xf9dfffff,0xfa000000-0xfa1fffff,0xf9e00000-0xf9ffffff irq 
 30 at device 0.0 on pci4
 
 That is correct. Are there any more lines afterwards - especially one 
 giving the number of UARTs detected? That line is crucial, as, on these 
 chips, the number of UARTs has to be read from configuration space 
 because you can slave two chips together.
 
 My OXPCIe954 board is recognised thus (FreeBSD 8.2 amd64):
 
 puc0: Oxford Semiconductor OXPCIe954 UARTs mem 
 0xd5efc000-0xd5efffff,0xd5c00000-0xd5dfffff,0xd5a00000-0xd5bfffff irq 18 
 at device 0.0 on pci8
 puc0: 4 UARTs detected
 puc0: [FILTER]
 uart2: 16950 or compatible on puc0
 uart2: [FILTER]
 uart3: 16950 or compatible on puc0
 uart3: [FILTER]
 uart4: 16950 or compatible on puc0
 uart4: [FILTER]
 uart5: 16950 or compatible on puc0
 uart5: [FILTER]

puc0: Oxford Semiconductor OXPCIe952 UARTs mem 
0xf9dfc000-0xf9dfffff,0xfa000000-0xfa1fffff,0xf9e00000-0xf9ffffff irq 30 at 
device 0.0 on pci4
puc0: 2 UARTs detected
uart2: 16950 or compatible at port 1 on puc0
uart3: 16950 or compatible at port 2 on puc0
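Greg's output does include the count line David asked about, and it reports the two UARTs expected of a dual-port OXPCIe952. As a hypothetical aid for checking saved boot messages, the small C sketch below extracts the count from a line of the form `pucX: N UARTs detected`. The format is assumed from the messages quoted in this thread, not taken from the puc(4) source, and the function name is illustrative.

```c
#include <stdio.h>

/* Hypothetical helper: return N from a boot message of the form
 * "pucX: N UARTs detected", or -1 if the line has any other form. */
int parse_uarts_detected(const char *line)
{
    int count = 0;
    int consumed = -1;

    /* %n records a position only if every earlier directive,
     * including the trailing literal text, matched. */
    if (sscanf(line, "puc%*d: %d UARTs detected%n", &count, &consumed) != 1)
        return -1;
    if (consumed < 0)
        return -1;
    return count;
}
```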

 
 puc0@pci0:4:0:0:	class=0x070002 card=0xc1581415 chip=0xc1581415 rev=0x00 hdr=0x00
     vendor     = 'Oxford Semiconductor Ltd'
     class      = simple comms
     subclass   = UART
     bar   [10] = type Memory, range 32, base 0xf9dfc000, size 16384, enabled
     bar   [14] = type Memory, range 32, base 0xfa000000, size 2097152, enabled
     bar   [18] = type Memory, range 32, base 0xf9e00000, size 2097152, enabled
 
 That is correct.
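One way to see why this card is matched: pciconf -l packs each ID pair into a single 32-bit value, with the device (or subdevice) ID in the upper 16 bits and the vendor (or subvendor) ID in the lower 16 bits. So chip=0xc1581415 means vendor 0x1415 (Oxford Semiconductor) and device 0xc158, which are the first two fields of the pucdata.c entry quoted earlier; card= carries the same pair as subvendor/subdevice. A minimal sketch of that split follows, with hypothetical type and function names.

```c
#include <stdint.h>

/* Hypothetical type: one vendor/device ID pair. */
struct pci_id_pair {
    uint16_t vendor;
    uint16_t device;
};

/* Split a pciconf-style packed value such as chip=0xc1581415:
 * upper 16 bits are the device ID, lower 16 bits the vendor ID. */
struct pci_id_pair split_pciconf_id(uint32_t packed)
{
    struct pci_id_pair p;

    p.vendor = (uint16_t)(packed & 0xffffu);
    p.device = (uint16_t)(packed >> 16);
    return p;
}

/* Compare against the vendor and device fields of a table entry
 * such as { 0x1415, 0xc158, ... }. */
int matches_entry(uint32_t chip, uint16_t vendor, uint16_t device)
{
    struct pci_id_pair p = split_pciconf_id(chip);

    return p.vendor == vendor && p.device == device;
}
```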
 
 The kernel is actually FreeBSD 9.0-BETA1 amd64, which is not quite
 'STABLE' yet, but I don't think that this should matter.
 
 Any advice would be much 

Re: Unknown Re0 Hardware version

2011-08-21 Thread Willem Jan Withagen

On 2011-08-22 1:01, YongHyeon PYUN wrote:

On Sun, Aug 21, 2011 at 04:01:10PM +0200, Willem Jan Withagen wrote:

Hi,

I'm assembling a few systems with an ASUS P8 H161-MLE motherboard
which was supposed to have a 'Realtek® 8112L, 1 x Gigabit LAN
Controller(s)' onboard.

And to be honest, I never expected that version not to be supported.
I just booted 8.2-RELEASE on it, and the installer crashed when I wanted
it to configure the Ethernet interface.

I rebooted, and re0 kicks in, but it reports that the HW revision is not supported.
It claims HW revision 0x2c80.

Is this supported in later 8.2-STABLE? Or in 9.x?

I'm willing to tinker with the code to recompile the re0 driver.



Your controller looks like an RTL8168E VL, and support for that
controller was added after 8.2-RELEASE.
Either update your source to stable/8 or patch your source tree
with the back-ported re(4) driver for 8.2-RELEASE as follows.

1. Fetch http://people.freebsd.org/~yongari/re/8.2R/if_re.c and
copy it to the /usr/src/sys/dev/re directory.
2. Fetch http://people.freebsd.org/~yongari/re/8.2R/if_rlreg.h and
copy it to the /usr/src/sys/pci directory.
Then rebuild your kernel; your controller should be recognized on the
next boot.


Hi YongHyeon PYUN,

OK, that means I'll temporarily have to insert another Ethernet card
to get the files onto the machine. Or use the sneaker network. :)

I did check the 9.x sources, but the revision number was not in 
/usr/src/sys/pci/if_rlreg.h there.

And you are right, they are in 8.2-STABLE.

Thanks for the files and pointers.

--WjW



___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Unknown Re0 Hardware version

2011-08-21 Thread YongHyeon PYUN
On Sun, Aug 21, 2011 at 04:01:10PM +0200, Willem Jan Withagen wrote:
 Hi,
 
 I'm assembling a few systems with an ASUS P8 H161-MLE motherboard
 which was supposed to have a 'Realtek® 8112L, 1 x Gigabit LAN 
 Controller(s)' onboard.
 
 And to be honest, I never expected that version not to be supported.
 I just booted 8.2-RELEASE on it, and the installer crashed when I wanted 
 it to configure the Ethernet interface.
 
 I rebooted, and re0 kicks in, but it reports that the HW revision is not supported.
 It claims HW revision 0x2c80.
 
 Is this supported in later 8.2-STABLE? Or in 9.x?
 
 I'm willing to tinker with the code to recompile the re0 driver.
 

Your controller looks like an RTL8168E VL, and support for that
controller was added after 8.2-RELEASE.
Either update your source to stable/8 or patch your source tree
with the back-ported re(4) driver for 8.2-RELEASE as follows.

1. Fetch http://people.freebsd.org/~yongari/re/8.2R/if_re.c and
   copy it to the /usr/src/sys/dev/re directory.
2. Fetch http://people.freebsd.org/~yongari/re/8.2R/if_rlreg.h and
   copy it to the /usr/src/sys/pci directory.
Then rebuild your kernel; your controller should be recognized on the
next boot.

 --WjW