Re: important NFS client patch for FreeBSD8.n

2011-01-11 Thread Jeremy Chadwick
On Mon, Jan 10, 2011 at 11:40:37PM -0800, Chris H wrote:
 Greetings, and thank you for the heads up.
 On Mon, January 10, 2011 2:22 pm, Rick Macklem wrote:
  I just committed a patch (r217242) to head. Anyone who is using client
  side NFS on FreeBSD8.n should apply this patch. It is also available at:
  http://people.freebsd.org/~rmacklem/krpc.patch
 
 
  It fixes a problem where the kernel RPC code assumes that 4 bytes of data
  exist in the first mbuf without checking. If the data straddles multiple
  mbufs, it uses garbage, and a typical case will then wedge for a minute or
  so until it times out and establishes a new TCP connection. It also
  replaces m_pullup() with m_copydata(), since m_pullup() can fail in rare
  cases even when data is available. (m_pullup() uses MGET(, M_DONTWAIT,),
  which can fail when mbuf allocation is constrained, for example.)
 
  Thanks to john.gemignani at isilon.com for spotting this problem, rick
 
 I just fired a message off to @amd64 && @net because I am seeing messages
 like:
 
 nfe0: tx v2 error 0x6204<UNDERFLOW>
 
 on a recent 8.1/amd64 install which is connected to an 8.0/i386 via NFS.
 They both run NFS client and server, and they both utilize mount points
 on each other. They are only 2 of several interconnected servers. The
 others are all 7.x/i386. But I only see these messages on the 8.1/amd64,
 and only when connected to, and utilizing mounts on, the 8.0/i386, and even
 then, only when the data exceeds ~1.5Mb.
 I guess I'm asking whether the messages I'm receiving are related to the
 corrections your patch provides, or whether I should keep looking for the
 cause of the messages I am seeing.

The above message is coming from the nfe(4) NIC driver, not from NFS.
It's possible that NFS tickles some kind of I/O throughput quirk in
drivers such as nfe(4), given that they're intended for cheap desktops.

CC'ing Yong-Hyeon Pyun to assist in debugging/explaining the above
error.

In the interim, can you please provide output from the following
commands:

# uname -a
# dmesg           (please include relevant nfe details and miibus)
# pciconf -lvcb   (please only include nfe-related output)
# netstat -ind    (you can XX-out MACs and/or IPs)
# ifconfig -a     (you can XX-out MACs and/or IPs)
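
The XX-ing out of MACs and IPs mentioned above can be done mechanically before posting. What follows is only a minimal sketch, assuming a sed that accepts -E (recent FreeBSD sed and GNU sed both do); the function name redact is made up for illustration and is not part of any tool named in this thread. It masks MAC and IPv4 addresses only, so IPv6 link-local addresses, which can embed the MAC, would still need manual attention.

```shell
#!/bin/sh
# Mask MAC and IPv4 addresses read from stdin before posting command output.
# "redact" is a hypothetical helper name. IPv6 addresses are left untouched.
redact() {
    sed -E \
        -e 's/([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}/XX:XX:XX:XX:XX:XX/g' \
        -e 's/([0-9]{1,3}\.){3}[0-9]{1,3}/XXX.XXX.XXX.XXX/g'
}

# Example usage on the machine being debugged:
#   ifconfig -a  | redact
#   netstat -ind | redact
```

Note that the IPv4 pattern is deliberately broad: it also masks loopback addresses and anything else shaped like a dotted quad, which errs on the side of hiding too much.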

Thanks.

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.   PGP 4BD6C0CB |

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: important NFS client patch for FreeBSD8.n

2011-01-11 Thread Chris H
Hello Jeremy, and thank you for your reply.
On Tue, January 11, 2011 12:17 am, Jeremy Chadwick wrote:
 On Mon, Jan 10, 2011 at 11:40:37PM -0800, Chris H wrote:

 Greetings, and thank you for the heads up.
 On Mon, January 10, 2011 2:22 pm, Rick Macklem wrote:

 I just committed a patch (r217242) to head. Anyone who is using client
 side NFS on FreeBSD8.n should apply this patch. It is also available at:
 http://people.freebsd.org/~rmacklem/krpc.patch



 It fixes a problem where the kernel RPC code assumes that 4 bytes of data
 exist in the first mbuf without checking. If the data straddles multiple
 mbufs, it uses garbage, and a typical case will then wedge for a minute or
 so until it times out and establishes a new TCP connection. It also
 replaces m_pullup() with m_copydata(), since m_pullup() can fail in rare
 cases even when data is available. (m_pullup() uses MGET(, M_DONTWAIT,),
 which can fail when mbuf allocation is constrained, for example.)

 Thanks to john.gemignani at isilon.com for spotting this problem, rick


 I just fired a message off to @amd64 && @net because I am seeing messages
 like:


 nfe0: tx v2 error 0x6204<UNDERFLOW>


 on a recent 8.1/amd64 install which is connected to an 8.0/i386 via NFS.
 They both run NFS client and server, and they both utilize mount points
 on each other. They are only 2 of several interconnected servers. The
 others are all 7.x/i386. But I only see these messages on the 8.1/amd64,
 and only when connected to, and utilizing mounts on, the 8.0/i386, and even
 then, only when the data exceeds ~1.5Mb.
 I guess I'm asking whether the messages I'm receiving are related to the
 corrections your patch provides, or whether I should keep looking for the
 cause of the messages I am seeing.

 The above message is coming from the nfe(4) NIC driver, not from NFS.
 It's possible that NFS tickles some kind of I/O throughput quirk in
 drivers such as nfe(4), given that they're intended for cheap desktops.

Well, I'd argue that point given I'm happily running an AM3 XIII 6-core
4GHz motherboard that is military grade, which /also/ sports the nfe(4).
Oh, and it wasn't cheap. :)

However, the one I'm working with here is only an AM2 with a 2-core.


 CC'ing Yong-Hyeon Pyun to assist in debugging/explaining the above
 error.

Yong-Hyeon Pyun kindly responded to my message to @amd64 || @net, and
requested much the same info - which I provided. I /assumed/ that it
was an amd64 issue, as this box is the only amd64 of the lot; that, or
because it was the only 8.1 - the others are all <= 8.0. After posting/
responding @amd64 && @net, I noticed the NFS patch in @stable, and
figured it worth asking about.


 In the interim, can you please provide output from the following
 commands:


 # uname -a

 # dmesg   (please include relevant nfe details and miibus)
SEE ATTACHED FILE: dmesg.boot.udns0
 # pciconf -lvcb   (please only include nfe-related output)
nfe0@pci0:0:10:0:   class=0x068000 card=0x73101462 chip=0x005710de rev=0xf3 hdr=0x00
    vendor     = 'NVIDIA Corporation'
    device     = 'NVIDIA Network Bus Enumerator (CK804)'
    class      = bridge
    bar   [10] = type Memory, range 32, base 0xf9ffb000, size 4096, enabled
    bar   [14] = type I/O Port, range 32, base 0xc080, size  8, enabled
    cap 01[44] = powerspec 2  supports D0 D1 D2 D3  current D0
 # netstat -ind    (you can XX-out MACs and/or IPs)
Name    Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs  Coll Drop
nfe0   1500 Link#1        00:19:db:22:74:87   729801     0     0   529029   182     0    0
nfe0   1500 XXX.XXX.XXX.0 XXX.XXX.XXX.26      695750     -     -   631781     -     -    -
nfe0   1500 fe80:1::219:d fe80:1::219:dbff:        0     -     -        6     -     -    -
plip0  1500 Link#2                                 0     0     0        0     0     0    0
lo0   16384 Link#3                               315     0     0      315     0     0    0
lo0   16384 127.0.0.0/8   127.0.0.1              313     -     -      313     -     -    -
lo0   16384 ::1/128       ::1                      0     -     -        2     -     -    -
lo0   16384 fe80:3::1/64  fe80:3::1                0     -     -        0     -     -    -
 # ifconfig -a     (you can XX-out MACs and/or IPs)
nfe0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8010b<RXCSUM,TXCSUM,VLAN_MTU,TSO4,LINKSTATE>
        ether 00:19:db:22:74:87
        inet XXX.XXX.XXX.26 netmask 0xffe0 broadcast XXX.XXX.XXX.31
        inet6 fe80::219:dbff:fe22:7487%nfe0 prefixlen 64 scopeid 0x1
        nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
        media: Ethernet autoselect (100baseTX <half-duplex>)
        status: active
plip0: flags=8810<POINTOPOINT,SIMPLEX,MULTICAST> metric 0 mtu 1500
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=3<RXCSUM,TXCSUM>
        inet 127.0.0.1 netmask 0xff00
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
nd6 

Re: ZFS - hot spares : automatic or not?

2011-01-11 Thread Miroslav Lachman



Dan Langille wrote:

On 1/4/2011 11:52 AM, John Hawkes-Reed wrote:


[...]


As far as our testing could discover, it's not automatic.

I wrote some Ugly Perl that's called by devd when it spots a drive-fail
event, which seemed to DTRT when simulating a failure by pulling a drive.


Without such a script, what is the value in creating hot spares?


IMHO hot spares are totally useless in the current state (in FreeBSD).

I think there should be a strong warning somewhere (in man zpool?).
Otherwise some users may be misled.


Miroslav Lachman


Re: ZFS - hot spares : automatic or not?

2011-01-11 Thread John Hawkes-Reed

On 11/01/2011 03:38, Dan Langille wrote:

On 1/4/2011 11:52 AM, John Hawkes-Reed wrote:

On 04/01/2011 03:08, Dan Langille wrote:

Hello folks,

I'm trying to discover if ZFS under FreeBSD will automatically pull in a
hot spare if one is required.

This raised the issue back in March 2010, and refers to a PR opened in
May 2009

* http://lists.freebsd.org/pipermail/freebsd-fs/2010-March/007943.html
* http://www.freebsd.org/cgi/query-pr.cgi?pr=134491

In turn, the PR refers to this March 2010 post referring to using devd
to accomplish this task.

http://lists.freebsd.org/pipermail/freebsd-stable/2010-March/055686.html

Does the above represent the current state?

I ask because I just ordered two more HDD to use as spares. Whether they
sit on the shelf or in the box is open to discussion.


As far as our testing could discover, it's not automatic.

I wrote some Ugly Perl that's called by devd when it spots a drive-fail
event, which seemed to DTRT when simulating a failure by pulling a drive.


Without such a script, what is the value in creating hot spares?


We went through that loop in the office.

We're used to the way the Netapps work here, where often one's first 
notice of a failed disk is a visit from the courier with a replacement. 
(I'm only half joking)


In the end, writing enough perl to swap in the spare disk made much more 
sense than paging the relevant admin on disk-fail and expecting them to 
be able to type straight at 4AM.


Our thinking is that having a hot spare allows us to do the physical 
disk-swap in office hours, rather than (for instance) running in a 
degraded state over a long weekend.


If it's of interest, I'll see if I can share the code.

--
JH-R


Re: Enabling DDB prevent kernel from panicing

2011-01-11 Thread Mark Saad
On Mon, Jan 10, 2011 at 10:29 PM, Mark Saad nones...@longcount.org wrote:
 On Mon, Jan 10, 2011 at 9:13 PM, Jeremy Chadwick
 free...@jdc.parodius.com wrote:
 On Mon, Jan 10, 2011 at 07:42:21PM -0500, Mark Saad wrote:
 On Mon, Jan 10, 2011 at 6:59 PM,  nickolas...@gmail.com wrote:
  Hello, Mark
 
  2011/1/11 Mark Saad nones...@longcount.org:
  All
  This was originally posted to hackers@
 
  I have a good question that I can't find an answer for. I believe I
  found a kernel bug in 7.3-RELEASE that prevents me from booting 64-bit
  kernels on HP's DL360 G4p. The kernel dies with "Fatal trap 12: page
  fault while in kernel mode". The hardware works fine in 7.2-RELEASE
  amd64, 7.1-RELEASE amd64, and 6.4-RELEASE amd64.
 
  In 7.3-RELEASE amd64 I cannot boot from CD or PXE correctly using the
  stock 7.3-RELEASE amd64 kernel; however, i386 works fine. To see if this
  issue was somehow fixed in 7.3-RELEASE-p4 amd64, I rebuilt a GENERIC
  kernel using patched sources and tried to boot, and I got the same
  crash.
 
   Next I rebuilt the kernel with KDB and DDB to see if I could get a
  core-dump of the system. I also set loader.conf to
 
  kernel=kernel.DEBUG
  kern.dumpdev=/dev/da0s1b
 
  Next I pxebooted  the box and the system does not crash on boot up, it
  will easily load a nfs root and work fine. So I copied my debug
  kernel, and loader.conf to the local disk and rebooted and it boots
  fine from the local disk .
 
  Looks like a race condition.
  Well, you don't need to compile KDB and DDB, just add
 
  makeoptions DEBUG=-g
 
  into your kernel config file and rebuild kernel.
 
  Then, after you get a crash dump, you can easily debug it (see the FreeBSD
  Developers' Handbook):
  http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-gdb.html
 
 
  wbr,
  Nickolas
 

   Sorry, let me clarify the issue. When you install a generic
 7.3-RELEASE amd64 on some of the HP servers I use, the kernel panics
 during boot when it probes the sio driver. Here is part of my
 dmesg.boot file:

 atkbd0: [ITHREAD]
 psm0: <PS/2 Mouse> irq 12 on atkbdc0
 psm0: [GIANT-LOCKED]
 psm0: [ITHREAD]
 psm0: model Generic PS/2 mouse, device ID 0
 sio0: configured irq 4 not in bitmap of probed irqs 0
 sio0: port may not be enabled
 sio0: configured irq 4 not in bitmap of probed irqs 0
 sio0: port may not be enabled
 sio0: <Standard PC COM port> port 0x3f8-0x3ff irq 4 on acpi0
 sio0: type 16550A
 sio0: [FILTER]
 Right about here in the boot-up is where the box crashes with the
 above-noted error.

 If I then boot the same box off a 7.1-RELEASE amd64 netboot server,
 mount the local disks of the 7.3-RELEASE install, and edit
 /boot/device.hints to comment out the sio hints like this:

 hint.vga.0.at="isa"
 hint.sc.0.at="isa"
 hint.sc.0.flags="0x100"
 #hint.sio.0.at="isa"
 #hint.sio.0.port="0x3F8"
 #hint.sio.0.flags="0x10"
 #hint.sio.0.irq="4"
 #hint.sio.1.at="isa"
 #hint.sio.1.port="0x2F8"
 #hint.sio.1.irq="3"
 #hint.sio.2.at="isa"
 #hint.sio.2.disabled="1"
 #hint.sio.2.port="0x3E8"
 #hint.sio.2.irq="5"
 #hint.sio.3.at="isa"
 #hint.sio.3.disabled="1"
 #hint.sio.3.port="0x2E8"
 #hint.sio.3.irq="9"
 hint.ppc.0.at="isa"
 hint.ppc.0.irq="7"

 and then boot the server off the local disks, the server boots correctly.

 The odd thing was, I rebuilt a debug 7.3-RELEASE amd64 kernel on
 another working server, installed it on the broken server, and
 booted it off the local disks without any changes to the hints file.
 The server booted correctly and I was able to manually break out
 into the debugger, but nothing looked wrong.

 The sio(4) driver has been deprecated in RELENG_8, which uses uart(4).
 uart(4) is better in a lot of regards, and should also be available for
 use on RELENG_7 but you'll need to adjust /etc/ttys to refer to the new
 device names (ttyuX vs. ttydX), plus add the uart entries to
 /boot/device.hints.

 I found that too, and I was thinking about the change, but it's going to
 require a source build of the kernel to fix that, along with a bunch of
 manual work on my side that I would rather not do.

 I'm mentioning this as a workaround.

 Also worth considering is that the sio(4) ISA probe may be touching
 something Bad(tm) as a result, so you might try adding the following
 lines to your loader.conf (not a typo) to disable sio(4) entries
 entirely:

 hint.sio.0.disabled=1
 hint.sio.1.disabled=1

 And see if that improves things.  If it does, remove the sio.1.disabled
 entry and see if that suffices.

 I'll try the hint disabling, but how is that different from removing
 the hint outright?

So adding the hint to loader.conf worked. My understanding of
how the loader's Forth bits work makes me believe
we can use either file for this hint. But I am still unsure why
the stock hint breaks the box, whereas no hint works
and disabling the port via a hint works. The other thing is that the port
works in its intended way with no hint or with a disabled hint.
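
For anyone wanting to apply the same workaround repeatably, here is a rough sketch of adding the two hints to loader.conf without duplicating them on a second run. The helper name add_loader_setting is hypothetical, and the quoted variable="value" form is assumed from the format documented in loader.conf(5); the hint names themselves are the ones suggested earlier in this thread.

```shell
#!/bin/sh
# Append a setting to a loader.conf-style file unless the identical line
# is already there. "add_loader_setting" is a hypothetical helper name.
#   $1 = path to the file, $2 = the complete line to add
add_loader_setting() {
    conf=$1
    line=$2
    touch "$conf" || return 1
    # -x matches whole lines only, -F treats the line as a fixed string.
    if ! grep -qxF -- "$line" "$conf"; then
        printf '%s\n' "$line" >> "$conf"
    fi
}

# Example (as root on the affected machine; /boot/loader.conf is the
# usual location):
#   add_loader_setting /boot/loader.conf 'hint.sio.0.disabled="1"'
#   add_loader_setting /boot/loader.conf 'hint.sio.1.disabled="1"'
```

Running it twice leaves a single copy of each line, which makes it safe to put in a provisioning script.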


 So to sum this up, there is something broken in 7.3-RELEASE, but I can't
 figure out what. This server works with 

sed is broken under freebsd?

2011-01-11 Thread Oliver Pinter
hi all!

The FreeBSD version of sed contains a bug/regression when
substituting a \n character; gsed is not affected by this bug:

FreeBSD xxx 8.1-RELEASE FreeBSD 8.1-RELEASE #0: Mon Jul 19 02:55:53
UTC 2010 r...@almeida.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC
 i386
a...@xxx ~ echo axa | sed s/x/\n/g
ana
a...@xxx ~ echo axa | sed s/x/'\n'/g
ana
This is the FreeBSD 8.1 base system's sed.

a...@centaur:~$ uname -a
Linux centaur 2.6.18-6-686 #1 SMP Thu Aug 20 21:56:59 UTC 2009 i686 GNU/Linux
a...@centaur:~$ echo axa | sed s/x/\n/g
ana
a...@centaur:~$ echo axa | sed s/x/'\n'/g
a
a
a...@centaur:~$ sed --version
GNU sed version 4.1.5



ural2:~$ uname -a
SunOS ural2 5.8 Generic_117350-62 sun4u sparc
ural2:~$ echo axa | sed s/x/\n/g
ana
ural2:~$ echo axa | sed s/x/'\n'/g
a
a
ural2:~$ sed --version
GNU sed version 4.2.1


Re: Supermicro Bladeserver

2011-01-11 Thread Steven Hartland

Out of interest what change was that?

- Original Message - 
From: Vogel, Jack jack.vo...@intel.com

To: TAKAHASHI Yoshihiro n...@freebsd.org; jfvo...@gmail.com
Cc: freebsd-...@freebsd.org; freebsd-stable@freebsd.org
Sent: Monday, January 10, 2011 9:17 PM
Subject: RE: Supermicro Bladeserver


We attempted to repro this problem with the 82566DM (ich8 btw) in house and 
failed, it worked correctly for my testers.

Oh, and just so the mailing lists have an update, the SM Blade problem was not an issue in the driver, it was a local change in 
the loader.conf that caused the problem.







Re: Supermicro Bladeserver

2011-01-11 Thread Robin Sommer

On Wed, Jan 12, 2011 at 03:13 -, you wrote:

 Out of interest what change was that?

As what seems to have been a left-over from a debugging session a
long time ago, I had MSI disabled in loader.conf. That's not
supported by the driver. So simply reenabling that solved my
problem.

Robin

-- 
Robin Sommer * Phone +1 (510) 722-6541 * ro...@icir.org
ICSI/LBNL* Fax   +1 (510) 666-2956 *   www.icir.org


Re: sed is broken under freebsd?

2011-01-11 Thread Clifton Royston
On Wed, Jan 12, 2011 at 02:32:52AM +0100, Oliver Pinter wrote:
 hi all!
 
 The FreeBSD version of sed contains a bug/regression when
 substituting a \n character; gsed is not affected by this bug:
 
 FreeBSD xxx 8.1-RELEASE FreeBSD 8.1-RELEASE #0: Mon Jul 19 02:55:53
 UTC 2010 r...@almeida.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC
  i386
 a...@xxx ~ echo axa | sed s/x/\n/g
 ana
 a...@xxx ~ echo axa | sed s/x/'\n'/g
 ana

Different than GNU is not a bug.

I have 7.3 here.  It behaves as the above, which is how the man page says it
should work.  The following is how the man page specifies you can substitute
a newline, by prefacing a quoted actual newline with a backslash:

$ echo axa | sed 's/x/\
/g'
a
a

  That's how I remember classic sed behaving (Unix v7 or thereabouts.)
  -- Clifton
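
The backslash-newline form described above can be wrapped up for scripts that need to behave the same with BSD and GNU sed. This is only a sketch: the function names are made up for illustration, and the tr variant is an alternative that applies only when the thing being replaced is a single character.

```shell
#!/bin/sh
# Replace every "x" with a newline using a backslash followed by a
# literal newline in the replacement, rather than relying on "\n".
# Function names here are hypothetical.
split_on_x() {
    sed 's/x/\
/g'
}

# Alternative for single-character delimiters: tr interprets \n itself.
split_on_x_tr() {
    tr 'x' '\n'
}

echo axa | split_on_x
echo axa | split_on_x_tr
```

Both calls print "a" on two separate lines. The continuation line must start at column one, since any leading whitespace there would become part of the replacement text.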

-- 
Clifton Royston  --  clift...@iandicomputing.com / clift...@lava.net
   President  - I and I Computing * http://www.iandicomputing.com/
 Custom programming, network design, systems and network consulting services