Re: important NFS client patch for FreeBSD8.n

2011-01-10 Thread Chris H
Greetings, and thank you for the "heads up".
On Mon, January 10, 2011 2:22 pm, Rick Macklem wrote:
> I just committed a patch (r217242) to head. Anyone who is using client
> side NFS on FreeBSD8.n should apply this patch. It is also available at:
> http://people.freebsd.org/~rmacklem/krpc.patch
>
> It fixes a problem where the kernel rpc assumes that 4 bytes of data
> exist in the first mbuf without checking. If the data straddles multiple
> mbufs, it uses garbage and then a typical case will wedge for a minute
> or so until it times out and establishes a new TCP connection. It also
> replaces m_pullup() with m_copydata(), since m_pullup() can fail for
> rare cases when there is data available. (m_pullup() uses MGET(, M_DONTWAIT,)
> which can fail when mbuf allocation is constrained, for example.)
>
> Thanks to john.gemignani at isilon.com for spotting this problem, rick

I just fired a message off to @amd64 && @net because I am seeing messages like:

nfe0: tx v2 error 0x6204

on a recent 8.1/amd64 install which is connected to an 8.0/i386 via NFS.
They both run NFS client && server, and they both utilize mount points
on each other. They are only 2 of several interconnected servers. The
others are all 7x/i386. But I only see these messages on the 8.1/amd64,
and only when connected to, and utilizing mounts on the 8.0/i386, and even
then, only when the data exceeds ~1.5Mb.
I guess I'm asking whether the messages I'm receiving are related to the
corrections your patch provides, or whether I should keep looking for
the cause of the messages I am seeing.

Thank you for all your time and consideration.

--Chris

> ___
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
>
>


-- 


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: ZFS - hot spares : automatic or not?

2011-01-10 Thread Dan Langille

On 1/4/2011 11:52 AM, John Hawkes-Reed wrote:
> On 04/01/2011 03:08, Dan Langille wrote:
>> Hello folks,
>>
>> I'm trying to discover if ZFS under FreeBSD will automatically pull in a
>> hot spare if one is required.
>>
>> This raised the issue back in March 2010, and refers to a PR opened in
>> May 2009
>>
>> * http://lists.freebsd.org/pipermail/freebsd-fs/2010-March/007943.html
>> * http://www.freebsd.org/cgi/query-pr.cgi?pr=134491
>>
>> In turn, the PR refers to this March 2010 post referring to using devd
>> to accomplish this task.
>>
>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-March/055686.html
>>
>> Does the above represent the current state?
>>
>> I ask because I just ordered two more HDD to use as spares. Whether they
>> sit on the shelf or in the box is open to discussion.
>
> As far as our testing could discover, it's not automatic.
>
> I wrote some Ugly Perl that's called by devd when it spots a drive-fail
> event, which seemed to DTRT when simulating a failure by pulling a drive.


Without such a script, what is the value in creating hot spares?
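
For illustration only, a devd-triggered helper of the kind mentioned in
this thread might boil down to issuing a "zpool replace". The Python
sketch below is not the Perl script referred to above. The pool and
device names are hypothetical, and how devd reports the failed device is
an assumption left out of the sketch; it only builds (and optionally
runs) the command.

```python
import subprocess


def build_replace_command(pool, failed_dev, spare_dev):
    """Return the zpool(8) command that swaps a failed device for a spare."""
    for name in (pool, failed_dev, spare_dev):
        if not name or " " in name:
            raise ValueError("invalid pool or device name: %r" % (name,))
    return ["zpool", "replace", pool, failed_dev, spare_dev]


def activate_spare(pool, failed_dev, spare_dev, dry_run=True):
    """Print the command in dry-run mode; otherwise run it and return its exit status."""
    cmd = build_replace_command(pool, failed_dev, spare_dev)
    if dry_run:
        print(" ".join(cmd))
        return 0
    return subprocess.call(cmd)


if __name__ == "__main__":
    # Hypothetical names: pool "tank", failed disk "da3", hot spare "da7".
    activate_spare("tank", "da3", "da7")
```

The dry-run default is deliberate: a script that rewrites pool layout on
a possibly spurious event is safer to test by printing what it would do.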

--
Dan Langille - http://langille.org/


Re: Enabling DDB prevent kernel from panicing

2011-01-10 Thread Mark Saad
On Mon, Jan 10, 2011 at 9:13 PM, Jeremy Chadwick
 wrote:
> On Mon, Jan 10, 2011 at 07:42:21PM -0500, Mark Saad wrote:
>> On Mon, Jan 10, 2011 at 6:59 PM,   wrote:
>> > Hello, Mark
>> >
>> > 2011/1/11 Mark Saad :
>> >> All
>> >> This was originally posted to hackers@
>> >>
>> >> I have a question that I can't find an answer for. I believe I
>> >> found a kernel bug in 7.3-RELEASE that prevents me from booting 64-bit
>> >> kernels on HP's DL360 G4p. The kernel dies with "Fatal trap 12: page
>> >> fault while in kernel mode". The hardware works fine in 7.2-RELEASE
>> >> amd64, 7.1-RELEASE amd64, and 6.4-RELEASE amd64.
>> >>
>> >> In 7.3-RELEASE amd64 I cannot boot from CD or PXE correctly using the
>> >> stock 7.3-RELEASE amd64 kernel; however, i386 works fine. To see if this
>> >> issue was somehow fixed in 7.3-RELEASE-p4 amd64, I rebuilt a GENERIC
>> >> kernel using patched sources and tried to boot, and I got the same
>> >> crash.
>> >>
>> >>  Next I rebuilt the kernel with KDB and DDB to see if I could get a
>> >> core-dump of the system. I also set loader.conf to
>> >>
>> >> kernel="kernel.DEBUG"
>> >> kern.dumpdev="/dev/da0s1b"
>> >>
>> >> Next I pxebooted  the box and the system does not crash on boot up, it
>> >> will easily load a nfs root and work fine. So I copied my debug
>> >> kernel, and loader.conf to the local disk and rebooted and it boots
>> >> fine from the local disk .
>> >
>> > Looks like a race condition.
>> > Well, you don't need to compile KDB and DDB, just add
>> >
>> > makeoptions DEBUG=-g
>> >
>> > into your kernel config file and rebuild kernel.
>> >
>> > Then after you get a crash dump you can easily debug it (see the FreeBSD
>> > Developers' Handbook):
>> > http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-gdb.html
>> >
>> >
>> > wbr,
>> > Nickolas
>> >
>>
>>   Sorry let me clarify the issue, When you install a generic
>> 7.3-RELEASE amd64 on some of the HP servers I use, the kernel panics
>> in boot up
>> when it probes the sio driver . Here is a part of my dmesg.boot file
>>
>> atkbd0: [ITHREAD]
>> psm0:  irq 12 on atkbdc0
>> psm0: [GIANT-LOCKED]
>> psm0: [ITHREAD]
>> psm0: model Generic PS/2 mouse, device ID 0
>> sio0: configured irq 4 not in bitmap of probed irqs 0
>> sio0: port may not be enabled
>> sio0: configured irq 4 not in bitmap of probed irqs 0
>> sio0: port may not be enabled
>> sio0:  port 0x3f8-0x3ff irq 4 on acpi0
>> sio0: type 16550A
>> sio0: [FILTER]
>> Say about here in the boot up , is where the box crashes with the
>> above noted error.
>>
>> If I then boot the same box off a 7.1-RELEASE amd64 netboot server ,
>> mount the local disks of the 7.3-RELEASE install and edit the
>> /boot/device.hints and comment out the sio hints like this
>>
>> hint.vga.0.at="isa"
>> hint.sc.0.at="isa"
>> hint.sc.0.flags="0x100"
>> #hint.sio.0.at="isa"
>> #hint.sio.0.port="0x3F8"
>> #hint.sio.0.flags="0x10"
>> #hint.sio.0.irq="4"
>> #hint.sio.1.at="isa"
>> #hint.sio.1.port="0x2F8"
>> #hint.sio.1.irq="3"
>> #hint.sio.2.at="isa"
>> #hint.sio.2.disabled="1"
>> #hint.sio.2.port="0x3E8"
>> #hint.sio.2.irq="5"
>> #hint.sio.3.at="isa"
>> #hint.sio.3.disabled="1"
>> #hint.sio.3.port="0x2E8"
>> #hint.sio.3.irq="9"
>> hint.ppc.0.at="isa"
>> hint.ppc.0.irq="7"
>>
>> then boot the server off the local disks , the server boots correctly.
>>
>> The odd thing was, I rebuilt a debug 7.3-RELEASE amd64 kernel on
>> another working server, and installed it on the broken server and
>> booted it off the local disks, with out any changes to the hints file
>> and the server booted correctly and I was able to manually break out
>> into the debugger , but nothing looked wrong .
>
> The sio(4) driver has been deprecated in RELENG_8, which uses uart(4).
> uart(4) is better in a lot of regards, and should also be available for
> use on RELENG_7 but you'll need to adjust /etc/ttys to refer to the new
> device names (ttyuX vs. ttydX), plus add the uart entries to
> /boot/device.hints.
>
I found that too, and I was thinking about the change, but it's going to
require a source build of the kernel, along with a bunch of manual work
on my side that I would rather not do.

> I'm mentioning this as a workaround.
>
> Also worth considering is that the sio(4) ISA probe may be touching
> something Bad(tm) as a result, so you might try adding the following
> lines to your loader.conf (not a typo) to disable sio(4) entries
> entirely:
>
> hint.sio.0.disabled="1"
> hint.sio.1.disabled="1"
>
> And see if that improves things.  If it does, remove the sio.1.disabled
> entry and see if that suffices.

I'll try disabling the hints, but how is that different from removing
the hints outright?

>
>> So to sum this up there is something broken in 7.3-RELEASE but I cant
>> figure out what. This server works with a generic install of
>> 7.1-RELEASE 7.2-RELEASE , 6.1-RELEASE, 6.2-RELEASE and 6.4-RELEASE in
>> both amd64 and i386 , but not 7.3-RELEASE in amd64 . It also worked in
>> 7.4-RC1 .
>>
>> avg recomme

Re: Enabling DDB prevent kernel from panicing

2011-01-10 Thread Jeremy Chadwick
On Mon, Jan 10, 2011 at 07:42:21PM -0500, Mark Saad wrote:
> On Mon, Jan 10, 2011 at 6:59 PM,   wrote:
> > Hello, Mark
> >
> > 2011/1/11 Mark Saad :
> >> All
> >> This was originally posted to hackers@
> >>
> > >> I have a question that I can't find an answer for. I believe I
> > >> found a kernel bug in 7.3-RELEASE that prevents me from booting 64-bit
> > >> kernels on HP's DL360 G4p. The kernel dies with "Fatal trap 12: page
> > >> fault while in kernel mode". The hardware works fine in 7.2-RELEASE
> > >> amd64, 7.1-RELEASE amd64, and 6.4-RELEASE amd64.
> > >>
> > >> In 7.3-RELEASE amd64 I cannot boot from CD or PXE correctly using the
> > >> stock 7.3-RELEASE amd64 kernel; however, i386 works fine. To see if this
> > >> issue was somehow fixed in 7.3-RELEASE-p4 amd64, I rebuilt a GENERIC
> > >> kernel using patched sources and tried to boot, and I got the same
> > >> crash.
> >>
> >>  Next I rebuilt the kernel with KDB and DDB to see if I could get a
> >> core-dump of the system. I also set loader.conf to
> >>
> >> kernel="kernel.DEBUG"
> >> kern.dumpdev="/dev/da0s1b"
> >>
> >> Next I pxebooted  the box and the system does not crash on boot up, it
> >> will easily load a nfs root and work fine. So I copied my debug
> >> kernel, and loader.conf to the local disk and rebooted and it boots
> >> fine from the local disk .
> >
> > Looks like a race condition.
> > Well, you don't need to compile KDB and DDB, just add
> >
> > makeoptions DEBUG=-g
> >
> > into your kernel config file and rebuild kernel.
> >
> > > Then after you get a crash dump you can easily debug it (see the FreeBSD
> > > Developers' Handbook):
> > http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-gdb.html
> >
> >
> > wbr,
> > Nickolas
> >
> 
>   Sorry let me clarify the issue, When you install a generic
> 7.3-RELEASE amd64 on some of the HP servers I use, the kernel panics
> in boot up
> when it probes the sio driver . Here is a part of my dmesg.boot file
> 
> atkbd0: [ITHREAD]
> psm0:  irq 12 on atkbdc0
> psm0: [GIANT-LOCKED]
> psm0: [ITHREAD]
> psm0: model Generic PS/2 mouse, device ID 0
> sio0: configured irq 4 not in bitmap of probed irqs 0
> sio0: port may not be enabled
> sio0: configured irq 4 not in bitmap of probed irqs 0
> sio0: port may not be enabled
> sio0:  port 0x3f8-0x3ff irq 4 on acpi0
> sio0: type 16550A
> sio0: [FILTER]
> Say about here in the boot up , is where the box crashes with the
> above noted error.
> 
> If I then boot the same box off a 7.1-RELEASE amd64 netboot server ,
> mount the local disks of the 7.3-RELEASE install and edit the
> /boot/device.hints and comment out the sio hints like this
> 
> hint.vga.0.at="isa"
> hint.sc.0.at="isa"
> hint.sc.0.flags="0x100"
> #hint.sio.0.at="isa"
> #hint.sio.0.port="0x3F8"
> #hint.sio.0.flags="0x10"
> #hint.sio.0.irq="4"
> #hint.sio.1.at="isa"
> #hint.sio.1.port="0x2F8"
> #hint.sio.1.irq="3"
> #hint.sio.2.at="isa"
> #hint.sio.2.disabled="1"
> #hint.sio.2.port="0x3E8"
> #hint.sio.2.irq="5"
> #hint.sio.3.at="isa"
> #hint.sio.3.disabled="1"
> #hint.sio.3.port="0x2E8"
> #hint.sio.3.irq="9"
> hint.ppc.0.at="isa"
> hint.ppc.0.irq="7"
> 
> then boot the server off the local disks , the server boots correctly.
> 
> The odd thing was, I rebuilt a debug 7.3-RELEASE amd64 kernel on
> another working server, and installed it on the broken server and
> booted it off the local disks, with out any changes to the hints file
> and the server booted correctly and I was able to manually break out
> into the debugger , but nothing looked wrong .

The sio(4) driver has been deprecated in RELENG_8, which uses uart(4).
uart(4) is better in a lot of regards, and should also be available for
use on RELENG_7 but you'll need to adjust /etc/ttys to refer to the new
device names (ttyuX vs. ttydX), plus add the uart entries to
/boot/device.hints.

I'm mentioning this as a workaround.

Also worth considering is that the sio(4) ISA probe may be touching
something Bad(tm) as a result, so you might try adding the following
lines to your loader.conf (not a typo) to disable sio(4) entries
entirely:

hint.sio.0.disabled="1"
hint.sio.1.disabled="1"

And see if that improves things.  If it does, remove the sio.1.disabled
entry and see if that suffices.

> So to sum this up there is something broken in 7.3-RELEASE but I cant
> figure out what. This server works with a generic install of
> 7.1-RELEASE 7.2-RELEASE , 6.1-RELEASE, 6.2-RELEASE and 6.4-RELEASE in
> both amd64 and i386 , but not 7.3-RELEASE in amd64 . It also worked in
> 7.4-RC1 .
> 
> avg recommended I see what changed from r212964  to r212994 I am
> currently looking into this . Has anyone seen this before ?

If the server works fine with 7.4-PRERELEASE/RC1, why are you caring
about 7.3?  Upgrade.  :-)

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977. 

Re: Enabling DDB prevent kernel from panicing

2011-01-10 Thread Mark Saad
On Mon, Jan 10, 2011 at 6:59 PM,   wrote:
> Hello, Mark
>
> 2011/1/11 Mark Saad :
>> All
>> This was originally posted to hackers@
>>
>> I have a question that I can't find an answer for. I believe I
>> found a kernel bug in 7.3-RELEASE that prevents me from booting 64-bit
>> kernels on HP's DL360 G4p. The kernel dies with "Fatal trap 12: page
>> fault while in kernel mode". The hardware works fine in 7.2-RELEASE
>> amd64, 7.1-RELEASE amd64, and 6.4-RELEASE amd64.
>>
>> In 7.3-RELEASE amd64 I cannot boot from CD or PXE correctly using the
>> stock 7.3-RELEASE amd64 kernel; however, i386 works fine. To see if this
>> issue was somehow fixed in 7.3-RELEASE-p4 amd64, I rebuilt a GENERIC
>> kernel using patched sources and tried to boot, and I got the same
>> crash.
>>
>>  Next I rebuilt the kernel with KDB and DDB to see if I could get a
>> core-dump of the system. I also set loader.conf to
>>
>> kernel="kernel.DEBUG"
>> kern.dumpdev="/dev/da0s1b"
>>
>> Next I pxebooted  the box and the system does not crash on boot up, it
>> will easily load a nfs root and work fine. So I copied my debug
>> kernel, and loader.conf to the local disk and rebooted and it boots
>> fine from the local disk .
>
> Looks like a race condition.
> Well, you don't need to compile KDB and DDB, just add
>
> makeoptions DEBUG=-g
>
> into your kernel config file and rebuild kernel.
>
> Then after you get a crash dump you can easily debug it (see the FreeBSD
> Developers' Handbook):
> http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-gdb.html
>
>
> wbr,
> Nickolas
>

Sorry, let me clarify the issue. When you install a generic
7.3-RELEASE amd64 on some of the HP servers I use, the kernel panics
during boot when it probes the sio driver. Here is part of my
dmesg.boot file:

atkbd0: [ITHREAD]
psm0:  irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: [ITHREAD]
psm0: model Generic PS/2 mouse, device ID 0
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0:  port 0x3f8-0x3ff irq 4 on acpi0
sio0: type 16550A
sio0: [FILTER]
About here in the boot is where the box crashes with the
above-noted error.

If I then boot the same box off a 7.1-RELEASE amd64 netboot server,
mount the local disks of the 7.3-RELEASE install, edit
/boot/device.hints and comment out the sio hints like this

hint.vga.0.at="isa"
hint.sc.0.at="isa"
hint.sc.0.flags="0x100"
#hint.sio.0.at="isa"
#hint.sio.0.port="0x3F8"
#hint.sio.0.flags="0x10"
#hint.sio.0.irq="4"
#hint.sio.1.at="isa"
#hint.sio.1.port="0x2F8"
#hint.sio.1.irq="3"
#hint.sio.2.at="isa"
#hint.sio.2.disabled="1"
#hint.sio.2.port="0x3E8"
#hint.sio.2.irq="5"
#hint.sio.3.at="isa"
#hint.sio.3.disabled="1"
#hint.sio.3.port="0x2E8"
#hint.sio.3.irq="9"
hint.ppc.0.at="isa"
hint.ppc.0.irq="7"

and then boot the server off the local disks, the server boots correctly.
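
The manual edit shown above (prefixing every active hint.sio.* line with
"#") can be scripted. The following is a minimal Python sketch of that
transformation, not a tool from this thread. The function name and the
sample text are hypothetical, and it works on a string rather than
touching /boot/device.hints directly.

```python
def comment_out_hints(text, driver="sio"):
    """Comment out every active hint line for the given driver.

    Lines that are already commented out, or that belong to other
    drivers, are passed through unchanged.
    """
    prefix = "hint.%s." % driver
    out = []
    for line in text.splitlines():
        if line.startswith(prefix):
            out.append("#" + line)
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical excerpt in the style of /boot/device.hints.
SAMPLE = """hint.sc.0.at="isa"
hint.sio.0.at="isa"
hint.sio.0.port="0x3F8"
#hint.sio.1.at="isa"
hint.ppc.0.at="isa"
"""

if __name__ == "__main__":
    print(comment_out_hints(SAMPLE), end="")
```

Working on a copy of the file and diffing the result against the
original before installing it keeps the change easy to revert.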

The odd thing was, I rebuilt a debug 7.3-RELEASE amd64 kernel on
another working server, installed it on the broken server, and
booted it off the local disks without any changes to the hints file.
The server booted correctly and I was able to manually break out
into the debugger, but nothing looked wrong.

So to sum this up, there is something broken in 7.3-RELEASE but I can't
figure out what. This server works with a generic install of
7.1-RELEASE, 7.2-RELEASE, 6.1-RELEASE, 6.2-RELEASE and 6.4-RELEASE in
both amd64 and i386, but not 7.3-RELEASE in amd64. It also worked in
7.4-RC1.

avg recommended I see what changed from r212964 to r212994. I am
currently looking into this. Has anyone seen this before?



-- 

mark saad | nones...@longcount.org


Re: Enabling DDB prevent kernel from panicing

2011-01-10 Thread nickolasbug
Hello, Mark

2011/1/11 Mark Saad :
> All
> This was originally posted to hackers@
>
> I have a question that I can't find an answer for. I believe I
> found a kernel bug in 7.3-RELEASE that prevents me from booting 64-bit
> kernels on HP's DL360 G4p. The kernel dies with "Fatal trap 12: page
> fault while in kernel mode". The hardware works fine in 7.2-RELEASE
> amd64, 7.1-RELEASE amd64, and 6.4-RELEASE amd64.
>
> In 7.3-RELEASE amd64 I cannot boot from CD or PXE correctly using the
> stock 7.3-RELEASE amd64 kernel; however, i386 works fine. To see if this
> issue was somehow fixed in 7.3-RELEASE-p4 amd64, I rebuilt a GENERIC
> kernel using patched sources and tried to boot, and I got the same
> crash.
>
>  Next I rebuilt the kernel with KDB and DDB to see if I could get a
> core-dump of the system. I also set loader.conf to
>
> kernel="kernel.DEBUG"
> kern.dumpdev="/dev/da0s1b"
>
> Next I pxebooted  the box and the system does not crash on boot up, it
> will easily load a nfs root and work fine. So I copied my debug
> kernel, and loader.conf to the local disk and rebooted and it boots
> fine from the local disk .

Looks like a race condition.
Well, you don't need to compile KDB and DDB, just add

makeoptions DEBUG=-g

into your kernel config file and rebuild kernel.

Then after you get a crash dump you can easily debug it (see the FreeBSD
Developers' Handbook):
http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-gdb.html


wbr,
Nickolas


Enabling DDB prevent kernel from panicing

2011-01-10 Thread Mark Saad
All
This was originally posted to hackers@

I have a question that I can't find an answer for. I believe I
found a kernel bug in 7.3-RELEASE that prevents me from booting 64-bit
kernels on HP's DL360 G4p. The kernel dies with "Fatal trap 12: page
fault while in kernel mode". The hardware works fine in 7.2-RELEASE
amd64, 7.1-RELEASE amd64, and 6.4-RELEASE amd64.

In 7.3-RELEASE amd64 I cannot boot from CD or PXE correctly using the
stock 7.3-RELEASE amd64 kernel; however, i386 works fine. To see if this
issue was somehow fixed in 7.3-RELEASE-p4 amd64, I rebuilt a GENERIC
kernel using patched sources and tried to boot, and I got the same
crash.

Next I rebuilt the kernel with KDB and DDB to see if I could get a
core dump of the system. I also set loader.conf to

kernel="kernel.DEBUG"
kern.dumpdev="/dev/da0s1b"

Next I PXE-booted the box and the system did not crash on boot; it
easily loads an NFS root and works fine. So I copied my debug
kernel and loader.conf to the local disk and rebooted, and it boots
fine from the local disk.

Rebooting the server and running off the local disks and debug kernel,
I can't find any issues.

Rebooting the box into a GENERIC 7.3-RELEASE-p4 kernel crashes
with this error:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x0
fault code = supervisor write data, page not present
instruction pointer = 0x8:0x800070fa
stack pointer= 0x10:0x8153cbe0
frame pointer= 0x10:0x8153cc50
code segment  = base 0x0, limit 0xf, type 0x1b
 = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags  = interrupt enabled, resume, IOPL = 0
current process   = 0 (swapper)
[thread pid 0 tid 10 ]
Stopped at  bzero+0xa: repe stosq   %es:(%rdi)
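
As a reading aid for the trap output above: a fault virtual address of
0x0 combined with a "supervisor write" fault code and a stop inside
bzero suggests that bzero was handed a NULL pointer. The Python sketch
below merely extracts those three fields from text in this format. The
field names in the returned dictionary and the 4096-byte "first page"
threshold are assumptions made for illustration, and the sketch is no
substitute for a real backtrace.

```python
import re

# Excerpt of the trap message, in the same format as the output above.
TRAP_TEXT = """\
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x0
fault code = supervisor write data, page not present
Stopped at  bzero+0xa: repe stosq   %es:(%rdi)
"""


def summarize_trap(text):
    """Pull the fault address, access type and stopping function out of a trap message."""
    addr = re.search(r"fault virtual address\s*=\s*(0x[0-9a-fA-F]+)", text)
    code = re.search(r"fault code\s*=\s*(.+)", text)
    stop = re.search(r"Stopped at\s+(\w+)\+0x[0-9a-fA-F]+", text)
    return {
        "address": int(addr.group(1), 16) if addr else None,
        "is_write": bool(code and "write" in code.group(1)),
        "function": stop.group(1) if stop else None,
    }


def looks_like_null_write(summary, page_size=4096):
    """True when the fault is a write landing in the first page of memory."""
    address = summary["address"]
    return address is not None and address < page_size and summary["is_write"]


if __name__ == "__main__":
    info = summarize_trap(TRAP_TEXT)
    print(info["function"], hex(info["address"]), looks_like_null_write(info))
```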


It was recommended that I comment out the sio hints in /boot/device.hints.
I did this and I can properly boot a GENERIC 7.3-RELEASE kernel.

I reran this same test using 7.4-RC1, and the system boots without any
changes to anything.

So my question: does anyone know what changed in stable/7 after the
creation of 7.3-RELEASE that could have fixed this, or does anyone know
what could be causing this issue? The sio code does not look like it's
been changed in a long while. Do we still need the hints for the sio
ports anyway? Does omitting them from the hints file cause any major
issues? I can use the serial port for a console and to connect to other
serial devices without any issues.

-- 

mark saad | nones...@longcount.org


Re: tmpfs regression in recent -STABLE

2011-01-10 Thread Jeremy Chadwick
On Mon, Jan 10, 2011 at 11:14:24PM +0100, Ulrich Spörlein wrote:
> On Mon, 10.01.2011 at 16:49:14 -0500, John Baldwin wrote:
> > On Monday, January 10, 2011 4:40:04 pm Ulrich Spörlein wrote:
> > > Hey,
> > > 
> > > the following line in fstab used to work just fine for my /tmp:
> > > 
> > > tmpfs   /tmp    tmpfs   rw,size=1g,mode=1777    0 0
> > 
> > I thought there was a thread recently about tmpfs not supporting things
> > like "1g" for size?
> 
> Nah, this must be some leak of another kind. Luckily I could bandaid
> this by unionfs mounting an mfs disk over /tmp so programs continue to
> run.
> 
> But, tmpfs really is out of resources, as I cannot create new tmpfs's
> for example:
> 
> r...@elmar: ~# mount -t tmpfs tmpfs /media
> mount: tmpfs : No space left on device
> 
> And besides, the /tmp mount comes up fine and shows enough free space (I
> checked this the last time, after I had rebooted the box).

Are you using ZFS on the same machine?  If so, ZFS and tmpfs don't play
well together, don't use tmpfs.  Please search the below page for "tmpfs
runs out of space" for all relevant posts:

http://lists.freebsd.org/pipermail/freebsd-stable/2011-January/thread.html

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.   PGP 4BD6C0CB |



important NFS client patch for FreeBSD8.n

2011-01-10 Thread Rick Macklem
I just committed a patch (r217242) to head. Anyone who is using client
side NFS on FreeBSD8.n should apply this patch. It is also available at:
   http://people.freebsd.org/~rmacklem/krpc.patch

It fixes a problem where the kernel rpc assumes that 4 bytes of data
exist in the first mbuf without checking. If the data straddles multiple
mbufs, it uses garbage and then a typical case will wedge for a minute
or so until it times out and establishes a new TCP connection. It also
replaces m_pullup() with m_copydata(), since m_pullup() can fail for
rare cases when there is data available. (m_pullup() uses MGET(, M_DONTWAIT,)
which can fail when mbuf allocation is constrained, for example.)

Thanks to john.gemignani at isilon.com for spotting this problem, rick
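
To make the straddling case concrete, here is a small user-space
simulation in Python. It is not the kernel patch. A list of byte strings
stands in for an mbuf chain, copydata() is a hypothetical helper loosely
modelled on what m_copydata() does, and treating the 4 bytes as a
big-endian RPC record mark is an assumption made for illustration, since
the message above does not say which 4 bytes are involved.

```python
import struct


def copydata(chain, offset, length):
    """Copy `length` bytes starting at `offset` out of a list of buffers,
    crossing buffer boundaries as needed."""
    out = bytearray()
    for buf in chain:
        if offset >= len(buf):
            # The requested range starts in a later buffer; skip this one.
            offset -= len(buf)
            continue
        out += buf[offset:offset + (length - len(out))]
        offset = 0
        if len(out) == length:
            return bytes(out)
    raise ValueError("chain holds fewer bytes than requested")


def read_header(chain):
    """Read a 4-byte big-endian header wherever the buffer boundaries fall."""
    (value,) = struct.unpack("!I", copydata(chain, 0, 4))
    return value


if __name__ == "__main__":
    contiguous = [b"\x80\x00\x00\x10payload"]
    straddling = [b"\x80\x00", b"\x00\x10payload"]
    # Looking only at the first buffer yields just 2 of the 4 header bytes;
    # in C the remaining bytes would be whatever happens to follow in memory.
    print(len(straddling[0][:4]))
    print(hex(read_header(contiguous)), hex(read_header(straddling)))
```

Copying into a caller-supplied buffer needs no allocation, which is why
it avoids the allocation-failure case described above for m_pullup().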


Re: tmpfs regression in recent -STABLE

2011-01-10 Thread Ulrich Spörlein
On Mon, 10.01.2011 at 16:49:14 -0500, John Baldwin wrote:
> On Monday, January 10, 2011 4:40:04 pm Ulrich Spörlein wrote:
> > Hey,
> > 
> > the following line in fstab used to work just fine for my /tmp:
> > 
> > tmpfs   /tmp    tmpfs   rw,size=1g,mode=1777    0 0
> 
> I thought there was a thread recently about tmpfs not supporting things like 
> "1g" for size?

Nah, this must be some leak of another kind. Luckily I could bandaid
this by unionfs mounting an mfs disk over /tmp so programs continue to
run.

But, tmpfs really is out of resources, as I cannot create new tmpfs's
for example:

r...@elmar: ~# mount -t tmpfs tmpfs /media
mount: tmpfs : No space left on device

And besides, the /tmp mount comes up fine and shows enough free space (I
checked this the last time, after I had rebooted the box).

Cheers,
Uli


Re: NFS performance

2011-01-10 Thread Rick Macklem
> >
> > So, did the patch get rid of the 1min + stalls you reported earlier?
> >
> Yes. The stalls (and the "server not responding" log messages) are
> gone. Thanks! -- George
> 
Ok, that's a start anyhow. Maybe someday we can explain the slow read
rates you are still observing.

Thanks for letting us know, rick


Re: tmpfs regression in recent -STABLE

2011-01-10 Thread John Baldwin
On Monday, January 10, 2011 4:40:04 pm Ulrich Spörlein wrote:
> Hey,
> 
> the following line in fstab used to work just fine for my /tmp:
> 
> tmpfs   /tmp    tmpfs   rw,size=1g,mode=1777    0 0

I thought there was a thread recently about tmpfs not supporting things like 
"1g" for size?

-- 
John Baldwin


RE: Supermicro Bladeserver

2011-01-10 Thread Vogel, Jack
We attempted to repro this problem with the 82566DM (ich8 btw) in house and 
failed, it worked correctly for my testers.

Oh, and just so the mailing lists have an update, the SM Blade problem was not 
an issue in the driver, it was a local change in the loader.conf that caused 
the problem.

Regards,

Jack


-Original Message-
From: TAKAHASHI Yoshihiro [mailto:n...@freebsd.org] 
Sent: Friday, January 07, 2011 7:40 PM
To: jfvo...@gmail.com
Cc: freebsd-...@freebsd.org; freebsd-stable@freebsd.org; Vogel, Jack
Subject: Re: Supermicro Bladeserver

In article 
Jack Vogel  writes:

> I am trying to track down a problem being experienced at icir.org using
> SuperMicro bladeservers, the SERDES 82575 interfaces are having
> connectivity or perhaps autoneg problems, resulting in link transitions
> and watchdog resets.
> 
> The closest hardware my org at Intel has is a Fujitsu server whose blades
> also have this device, but testing on that has failed to repro the problem.
> 
> I was wondering if anyone else out there has this hardware, if so could you
> let me know your experience, have you had problems or not, etc etc?

My machine has the following em(4) device and it has an autoneg
problem.  With an 8-stable kernel as of 2010/11/01 there was no
problem.  But after I updated to 8-stable as of 2010/12/01, the link
only comes up at 10M.


e...@pci0:0:25:0:class=0x02 card=0x13d510cf chip=0x104a8086 rev=0x02 
hdr=0x00
 vendor = 'Intel Corporation'
 device = '82566DM Gigabit Network Connection'
 class  = network
 subclass   = ethernet

---
TAKAHASHI Yoshihiro 


Re: New ZFSv28 patchset for 8-STABLE

2011-01-10 Thread Attila Nagy

On 12/16/2010 01:44 PM, Martin Matuska wrote:
> Hi everyone,
>
> following the announcement of Pawel Jakub Dawidek (p...@freebsd.org) I am
> providing a ZFSv28 testing patch for 8-STABLE.
>
> Link to the patch:
>
> http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz
>
> Link to mfsBSD ISO files for testing (i386 and amd64):
>  http://mfsbsd.vx.sk/iso/zfs-v28/8.2-beta-zfsv28-amd64.iso
>  http://mfsbsd.vx.sk/iso/zfs-v28/8.2-beta-zfsv28-i386.iso
>
> The root password for the ISO files: "mfsroot"
> The ISO files work on real systems and in virtualbox.
> They contain a full install of FreeBSD 8.2-PRERELEASE with ZFS v28,
> simply use the provided "zfsinstall" script.
>
> The patch is against FreeBSD 8-STABLE as of 2010-12-15.
>
> When applying the patch be sure to use correct options for patch(1)
> and make sure the file sys/cddl/compat/opensolaris/sys/sysmacros.h gets
> deleted:
>
>  # cd /usr/src
>  # fetch http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz
>  # xz -d stable-8-zfsv28-20101215.patch.xz
>  # patch -E -p0 < stable-8-zfsv28-20101215.patch
>  # rm sys/cddl/compat/opensolaris/sys/sysmacros.h

I've just got a panic:
http://people.fsn.hu/~bra/freebsd/20110101-zfsv28-fbsd/IMAGE_006.jpg

The panic line for google:
panic: solaris assert: task->ost_magic == TASKQ_MAGIC, file: /usr/src/sys/modules/zfs/../../cddl/compat/opensolaris/kern/opensolaris_taskq.c, line: 150


I hope this is enough for debugging, if it's not yet otherwise known. If
not, I will try to catch it again and make a dump.


Thanks,


tmpfs regression in recent -STABLE

2011-01-10 Thread Ulrich Spörlein
Hey,

the following line in fstab used to work just fine for my /tmp:

tmpfs   /tmp    tmpfs   rw,size=1g,mode=1777    0 0

But since I upgraded to 8.2-PRERELEASE, /tmp will soon run out of space
(usually after leaving the box overnight).

% df /tmp
Filesystem 1K-blocks Used Avail Capacity  Mounted on
tmpfs             12   12     0   100%    /tmp


Yes, what you see here is not "stuff" filling up the /tmp partition,
*BUT* the /tmp partition shrinking to a ridiculous size. /tmp only has
the usual stuff on it, and I can no longer create temporary files
there:

% du /tmp
4   /tmp/.X11-unix
0   /tmp/.XIM-unix
0   /tmp/.ICE-unix
0   /tmp/.font-unix
4   /tmp/ssh-tEgl0QxQHp
4   /tmp/ksocket-uqs
12  /tmp/kde-uqs
4   /tmp/fam-uqs
8   /tmp/.vbox-uqs-ipc
0   /tmp/worker-uqs
44  /tmp
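
To put numbers on the mismatch described above, the Python sketch below
compares the size requested in fstab (size=1g) with the total that df
reports in 1K blocks. It is only arithmetic on the text shown in this
message. The helper names are hypothetical, the df sample is retyped
from the output above, and nothing here reflects how tmpfs itself parses
the size= option.

```python
def parse_size(text):
    """Convert a size such as '1g' or '512m' to bytes (binary multiples)."""
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    text = text.strip().lower()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


def df_total_bytes(df_output, mount_point):
    """Return the total size in bytes that df reports for mount_point.

    Assumes the second column is the size in 1K blocks and the last
    column is the mount point.
    """
    for line in df_output.splitlines()[1:]:
        fields = line.split()
        if fields and fields[-1] == mount_point:
            return int(fields[1]) * 1024
    raise LookupError(mount_point)


DF_OUTPUT = """Filesystem 1K-blocks Used Avail Capacity  Mounted on
tmpfs             12   12     0   100%    /tmp
"""

if __name__ == "__main__":
    requested = parse_size("1g")
    reported = df_total_bytes(DF_OUTPUT, "/tmp")
    print(requested, reported, reported < requested)
```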


Anything I could try?

Uli


Re: nfsd stuck in *rc_lock state

2011-01-10 Thread Rick Macklem
> Hello Rick,
> 
> Am 11.11.2010 23:54, schrieb Rick Macklem:
> > That patch is "self contained", so I think it should be fine to
> > apply it
> > to an 8.0 server.
> >
> > You might also want
> > 
> > http://people.freebsd.org/~rmacklem/freebsd8.0-patches/freebsd8-svc-mbufleak.patch
> > which plugged an mbuf leak in the regular FreeBSD8.0 server.
> >
> > Good luck with it, rick
> 
> the patch fixes the 100% cpu utilization, but we now had two times the
> issue, that all boxes lost connection to the nfs server (/home not
> responding), but nfsd was at about 1%.
> 
> Top did not show a strange behaviour here:
> 
> 
> PID USERNAME PRI NICE  SIZE   RES STATE  C   TIME   WCPU COMMAND
> 703 root      55    0 4772K 1384K RUN    5 329:12  1.37% {nfsd: service}
> 703 root      56    0 4772K 1384K rpcsvc 0 326:41  0.59% {nfsd: service}
> 703 root      52    0 4772K 1384K rpcsvc 6 326:28  0.29% {nfsd: service}
> 703 root      60    0 4772K 1384K rpcsvc 5 328:42  0.00% {nfsd: master}
> 703 root      54    0 4772K 1384K rpcsvc 0 327:44  0.00% {nfsd: service}
> 703 root      53    0 4772K 1384K rpcsvc 1 327:37  0.00% {nfsd: service}
> 703 root      54    0 4772K 1384K rpcsvc 6 326:51  0.00% {nfsd: service}
> 703 root      57    0 4772K 1384K rpcsvc 2 326:44  0.00% {nfsd: service}
> 703 root      50    0 4772K 1384K rpcsvc 1 326:20  0.00% {nfsd: service}
> 703 root      71    0 4772K 1384K rpcsvc 2 323:11  0.00% {nfsd: service}
> 703 root      47    0 4772K 1384K rpcsvc 7 321:11  0.00% {nfsd: service}
> 703 root      46    0 4772K 1384K tx->tx 2 320:00  0.00% {nfsd: service}
> 
> There was nothing special in the logfiles either.
> How do I debug such a situation?
> 
First off, I hope you don't mind me adding the mailing
list as a cc. I'd like this stuff captured in the archive
for others to see. (If people don't like the noise, I'll
take the heat:-)

Ok, I'm sure others have better techniques, but here's how
I would start trying to resolve the above, done when the
server is stuck.
1 - Make sure the network is still functioning for other
things like ssh.
2 - Do a "ps axHlww" and look at all the nfsd threads. I
am primarily interested in the MWCHAN field.
If it is:
rpcsvc - the thread is just waiting for an RPC-->normal
ufs or zfs - waiting for a vnode lock on the underlying
file system
anything else - I need to look in the kernel sources for
the "sleep" with that argument.
If I can't easily explain what all the nfsd threads are
waiting for, wading through a "procstat -ka" is my next
step. (I find this rather painful, so I tend to delay doing
this as long as possible.:-)
3 - Do a "nfsstat -s" repeatedly and see if any of the counters
are increasing.
4 - Fire up a "tcpdump" and see if there is any NFS traffic.
(If there is, I'll capture it and put it in wireshark.)
5 - Do a "vmstat -z | fgrep mbuf" and look at the mbuf allocation.
(If the machine is running out of mbufs, all sorts of quirky
 behaviour is possible.)
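
The MWCHAN triage from step 2 could be scripted roughly as follows. This is
only a sketch in Python, not a tested tool: the function names and the sample
text are made up for illustration, and it assumes the header line of the
"ps axHlww" output contains a column named MWCHAN, which is located by name
rather than by a fixed position.

```python
# Sketch only: sort nfsd thread wait channels into the three buckets
# described in the message (rpcsvc, ufs/zfs, anything else).

def classify_wchan(wchan):
    """Return a short verdict for one MWCHAN value."""
    if wchan == "rpcsvc":
        return "idle: waiting for an RPC (normal)"
    if wchan in ("ufs", "zfs"):
        return "waiting for a vnode lock on the underlying file system"
    return "unexplained: look up the sleep in the kernel sources"


def nfsd_wchans(ps_output):
    """Yield (mwchan, verdict) for every nfsd line in `ps axHlww` output."""
    lines = ps_output.strip().splitlines()
    # Find the MWCHAN column from the header instead of hard-coding it.
    col = lines[0].split().index("MWCHAN")
    for line in lines[1:]:
        fields = line.split()
        if "nfsd" not in line or len(fields) <= col:
            continue
        yield fields[col], classify_wchan(fields[col])


# Hypothetical sample, loosely modelled on the top output quoted above.
PS_SAMPLE = """\
UID PID PPID CPU PRI NI  VSZ  RSS MWCHAN STAT TT TIME      COMMAND
  0 703  702   0  55  0 4772 1384 rpcsvc S    ?? 329:12.00 nfsd: service (nfsd)
  0 703  702   0  56  0 4772 1384 ufs    D    ?? 326:41.00 nfsd: service (nfsd)
  0 703  702   0  46  0 4772 1384 tx->tx D    ?? 320:00.00 nfsd: service (nfsd)
"""
```

Feeding it real output (for example, saved to a file while the server is
stuck) and printing the pairs gives a quick list of the threads that are not
simply idle in rpcsvc.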

What top shows above isn't much, although I'd wonder what mbuf
usage looks like? If you haven't applied the patch mentioned
in the above message, you should do that.

I don't know if this helps, but... rick


Re: Hang in VOP_LOCK1_APV on 8-STABLE with NFS.

2011-01-10 Thread Rick Macklem
> >
> > Hi,
> >
> > I have got the first steps set up. No solution yet.
> > 1. With the patch OpenOffice opens my homedir (yeah!), but it gives
> > an I/O error when saving a file and everything hangs after that.
> 
> Hmm, I don't think you mentioned what server you were using. It
> wouldn't happen to be a FreeBSD one exporting ZFS? If so, make
> sure you have this patch in it:
> http://people.freebsd.org/~rmacklem/freebsd8.0-patches/freebsd8-nfsserver-estale.patch
> (With it a stale file handle can result in EIO from a server exporting
Oops, I meant "Without the patch a stale file handle...", rick


Re: Hang in VOP_LOCK1_APV on 8-STABLE with NFS.

2011-01-10 Thread Rick Macklem
> 
> Hi,
> 
> I have got the first steps set up. No solution yet.
> 1. With the patch OpenOffice opens my homedir (yeah!), but it gives an
> I/O error when saving a file and everything hangs after that.

Hmm, I don't think you mentioned what server you were using. It
wouldn't happen to be a FreeBSD one exporting ZFS? If so, make
sure you have this patch in it:
  
http://people.freebsd.org/~rmacklem/freebsd8.0-patches/freebsd8-nfsserver-estale.patch
(Without it, a stale file handle can result in EIO from a server exporting
 ZFS, and that can make the client loop around, retrying the RPC.)
  
> 2. I have dumps and stuff. I will mail some links in private e-mail.

I'll take a look at some point.

> 3. Didn't work. It mounts, but ls -l /home gives "Operation not
> permitted".
> 
It should work. This hints at a server issue.

Anyhow, I'll look at the dumps at some point, rick


Re: Hang in VOP_LOCK1_APV on 8-STABLE with NFS.

2011-01-10 Thread Ronald Klop
On Fri, 07 Jan 2011 20:52:57 +0100, Kostik Belousov wrote:



On Fri, Jan 07, 2011 at 02:37:25PM -0500, Rick Macklem wrote:

> Hi,
>
> OpenOffice hangs on NFS when I try to save a file or even when I try to
> open the save dialog in this case.
>
>
> $ 17:25:35 ron...@ronald [~]
> procstat -kk 85575
> PID TID COMM TDNAME KSTACK
> 85575 100322 soffice.bin initial thread mi_switch+0x176
> sleepq_wait+0x3b __lockmgr_args+0x655 vop_stdlock+0x39
> VOP_LOCK1_APV+0x46
> _vn_lock+0x44 vget+0x67 vfs_hash_get+0xeb nfs_nget+0xa8
> nfs_lookup+0x65e
> VOP_LOOKUP_APV+0x40 lookup+0x48a namei+0x518 kern_statat_vnhook+0x82
> kern_statat+0x15 lstat+0x22 syscallenter+0x186 syscall+0x40
> 85575 100502 soffice.bin - mi_switch+0x176
> sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _sleep+0x1a0
> do_cv_wait+0x639 __umtx_op_cv_wait+0x51 syscallenter+0x186
> syscall+0x40
> Xfast_syscall+0xe2
> 85575 100576 soffice.bin - mi_switch+0x176
> sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _sleep+0x1a0
> do_cv_wait+0x639 __umtx_op_cv_wait+0x51 syscallenter+0x186
> syscall+0x40
> Xfast_syscall+0xe2
> 85575 100577 soffice.bin - mi_switch+0x176
> sleepq_catch_signals+0x309 sleepq_wait_sig+0xc _sleep+0x25d
> kern_accept+0x19c accept+0xfe syscallenter+0x186 syscall+0x40
> Xfast_syscall+0xe2
> 85575 100578 soffice.bin - mi_switch+0x176
> sleepq_catch_signals+0x309 sleepq_wait_sig+0xc _cv_wait_sig+0x10e
> seltdwait+0xed poll+0x457 syscallenter+0x186 syscall+0x40
> Xfast_syscall+0xe2
> 85575 100579 soffice.bin - mi_switch+0x176
> sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12
> _cv_timedwait_sig+0x11d seltdwait+0x79 poll+0x457 syscallenter+0x186
> syscall+0x40 Xfast_syscall+0xe2
>
> $ 17:25:35 ron...@ronald [~]
> uname -a
> FreeBSD ronald.office.base.nl 8.2-PRERELEASE FreeBSD 8.2-PRERELEASE
> #6:
> Mon Dec 27 23:49:30 CET 2010
> r...@ronald.office.base.nl:/usr/obj/usr/src/sys/GENERIC amd64
>
I think all the above tells us is that the thread is waiting for
a vnode lock. The question then becomes "what is holding a lock
on that vnode and why?".
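
As a rough aid for spotting such threads in a larger "procstat -kk" or
"procstat -ka" capture, something like the Python sketch below could be used.
It is an illustration only: the function name is made up, the set of frame
names is simply taken from the stack quoted above, and it assumes the capture
has one unwrapped line per thread (unlike the mail-wrapped text here).

```python
import re

# A kernel stack frame is printed as "symbol+0xoffset".
FRAME = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\+0x[0-9a-f]+")

# Frames seen in the quoted stack while the thread sleeps on a vnode lock.
# This list is an assumption for illustration, not an exhaustive one.
VNODE_LOCK_FRAMES = {"VOP_LOCK1_APV", "__lockmgr_args", "_vn_lock"}


def threads_blocked_on_vnode_lock(procstat_output):
    """Return the TIDs whose kernel stack passes through the vnode lock path."""
    blocked = []
    for line in procstat_output.splitlines():
        fields = line.split()
        # Skip the header and anything that does not start with "PID TID".
        if len(fields) < 3 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        if set(FRAME.findall(line)) & VNODE_LOCK_FRAMES:
            blocked.append(int(fields[1]))
    return blocked


# Two threads from the report above, shortened and re-joined to one line each.
PROCSTAT_SAMPLE = """\
PID TID COMM TDNAME KSTACK
85575 100322 soffice.bin initial thread mi_switch+0x176 sleepq_wait+0x3b __lockmgr_args+0x655 vop_stdlock+0x39 VOP_LOCK1_APV+0x46 _vn_lock+0x44 vget+0x67
85575 100577 soffice.bin - mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_wait_sig+0xc _sleep+0x25d kern_accept+0x19c accept+0xfe
"""
```

This only finds the waiters. Finding the thread that holds the lock still
needs the full "procstat -ka" output or WITNESS, as discussed in this thread.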

> It is not possible to exit or kill soffice.bin. I had a slightly different
> procstat stack before, but that was fixed a couple of days ago.

Yeah, it will be in an uninterruptible sleep when waiting for a vnode lock.


> Any thoughts? Enabling local locks in NFS doesn't fix it.

Here's some things you could try:
1 - apply the attached patch. It fixes a known problem w.r.t. the
client side of the krpc. Not likely to fix this, but I can hope:-)

1a - Look around for other processes in the uninterruptible sleep state;
quite possibly one of them owns the lock that OpenOffice is waiting
for. Also see
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html

Of particular interest are the witness output and the backtraces for
all threads that witness reports as owning vnode locks.


2 - If #1 doesn't fix the problem:
- before making it hang, start capturing packets via:
# tcpdump -s 0 -w xxx host server
- then make it hang, kill the above and
# procstat -ka
# ps axHlww
and capture the output of both of these. Hopefully these 2 commands
will indicate what is holding the vnode lock and maybe, why. The
"xxx" file can be looked at in wireshark to see what/if any NFS
traffic is happening.
If you aren't comfortable looking at the above, you can email them
to me and I'll take a stab at them someday.
3 - Try the experimental client to see if it behaves differently. The
mount command is:
# mount -t newnfs -o nfsv3,  
server:/path /mntpath
(This might identify if the regular client has an infrequently executed
 code path that forgets to unlock the vnode, since it uses a somewhat
 different RPC layer. The buffer cache handling etc. are almost the same,
 but the RPC stuff is fairly different.)

> The nfs server is an up-to-date Linux Debian 5 with kernel 2.6.26.
>
I'm afraid I can't blame Linux (at least not until we have more info;-).

> If more info is needed, I can easily reproduce this.

See above #2.

Good luck with it and let us know how it goes, rick


Hi,

I have got the first steps set up. No solution yet.
1. With the patch OpenOffice opens my homedir (yeah!), but it gives an I/O  
error when saving a file and everything hangs after that.

2. I have dumps and stuff. I will mail some links in private e-mail.
3. Didn't work. It mounts, but ls -l /home gives "Operation not permitted".

I didn't see other processes in an uninterruptible state. But maybe you guys
see more than I do.


If you don't see anything in wireshark I will try WITNESS and friends
later this week. I have already spent 2 hours on this during work hours.


Ronald.


Re: New ZFSv28 patchset for 8-STABLE

2011-01-10 Thread Attila Nagy

On 01/10/2011 09:57 AM, Pawel Jakub Dawidek wrote:
> On Sun, Jan 09, 2011 at 12:52:56PM +0100, Attila Nagy wrote:
> [...]
> > I've finally found the time to read the v28 patch and figured out the
> > problem: vfs.zfs.l2arc_noprefetch was changed to 1, so it doesn't use
> > the prefetched data on the L2ARC devices.
> > This is a major hit in my case. Enabling this again restored the
> > previous hit rates and lowered the load on the hard disks significantly.
>
> Well, not storing prefetched data on L2ARC vdevs is the default in
> Solaris. For some reason it was changed by kmacy@ in r205231. Not sure
> why and we can't ask him now, I'm afraid. I just sent an e-mail to

What happened to him?

> Brendan Gregg from Oracle, who originally implemented L2ARC in ZFS, asking
> why this is turned off by default. Once I get an answer we can think about
> turning it on again.

I think it makes some sense as a stupid form of preferring random IO in
the L2ARC over sequential IO. But if I rely on auto tuning and leave
prefetch enabled, even a busy mailserver will prefetch a lot of blocks,
and I think that's a fine example of random IO (also, it makes the
system unusable, but that's another story).

Having this choice is good, and in this case enabling this makes sense
for me. I don't know of any reason why you wouldn't use all of your
L2ARC space (apart from sparing the quickly wearing flash and moving
disk heads instead), but I'm sure Brendan made this choice for a good
reason.

If you get an answer, please tell us. :)

Thanks,


Re: New ZFSv28 patchset for 8-STABLE

2011-01-10 Thread Attila Nagy

On 01/10/2011 10:02 AM, Pawel Jakub Dawidek wrote:
> On Sun, Jan 09, 2011 at 12:49:27PM +0100, Attila Nagy wrote:
> > No, it's not related. One of the disks in the RAIDZ2 pool went bad:
> > (da4:arcmsr0:0:4:0): READ(6). CDB: 8 0 2 10 10 0
> > (da4:arcmsr0:0:4:0): CAM status: SCSI Status Error
> > (da4:arcmsr0:0:4:0): SCSI status: Check Condition
> > (da4:arcmsr0:0:4:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read
> > error)
> > and it seems it froze the whole zpool. Removing the disk by hand solved
> > the problem.
> > I've seen this previously on other machines with ciss.
> > I wonder why ZFS didn't throw it out of the pool.
>
> Such hangs happen when I/O never returns. ZFS doesn't time out I/O
> requests on its own; this is the driver's responsibility. It is still
> strange that the driver didn't pass the I/O error up to ZFS. It might as
> well be a ZFS bug, but I don't think so.

Indeed, it may be a controller/driver bug. The newly released (last
December) firmware says something about a similar problem. I've
upgraded; we'll see whether it helps the next time a drive goes awry.
I've only seen these errors in dmesg, not in zpool status; there
everything was clean (all zeroes).


BTW, I've swapped those bad drives (da4, which reported the above
errors, and da16, which didn't report anything to the OS; it was just
plain bad according to the controller firmware, and after its deletion
I could offline da4, so it seems it's the real cause, see my previous
e-mail). I ran zpool replace for da4 first, but after some seconds of
thinking all IO on all disks ceased.
After waiting some minutes it was still the same, so I rebooted.
Then I noticed that a scrub was running, so I stopped it.
After that the zpool replace of da4 went fine and it started to resilver
the disk. But another zpool replace (for da16) causes the same error: some
seconds of IO, then nothing, and it gets stuck there.


Has anybody tried replacing two drives simultaneously with the zfs v28 
patch? (this is a stripe of two raidz2s and da4 and da16 are in 
different raidz2)



Re: NFS performance

2011-01-10 Thread george+freebsd
>
> So, did the patch get rid of the 1min + stalls you reported earlier?
>
Yes.  The stalls (and the "server not responding" log messages) are
gone.  Thanks!  -- George



Re: 8.2-PRERELEASE: live deadlock, almost all processes in "pfault" state

2011-01-10 Thread Ivan Voras

On 08/01/2011 20:42, Lev Serebryakov wrote:

Hello, Kostik.
You wrote on 8 January 2011, 22:02:32:



If I am guessing right, this creature has a classic deadlock when
bio processing requires memory allocation. It seems that tid 100079
is sleeping not even due to the free page shortage, but due to address
space exhaustion. As result, read/write requests are stalled.

   I want to say, that ZFS, for example, could allocate much more
memory, and, yes, it had problems on i386 with this, but not on amd64,
AFAIK...

   So, I'm (geom_raid5) doing something wrong...


geom_raid5 (I'm assuming you're talking about the module that was
written some time ago by an external developer) does several things
wrong - that's why it wasn't included in FreeBSD. IIRC, one of those
things is that it aggressively caches writes below the file system
layer, which is a no-no.





Re: 8.2-PRERELEASE: live deadlock, almost all processes in "pfault" state

2011-01-10 Thread Ivan Voras

On 08/01/2011 23:06, Lev Serebryakov wrote:



   I need to look how raid3 and vinum/raid5 lives with that situation.


One other standard solution is to spawn a thread and offload the job to
that thread, instead of doing it within GEOM start(). This is what most
current complex GEOM classes do.
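
To illustrate the general shape of that pattern (and only the shape), here is
a small user-space sketch in Python. It is not GEOM or kernel code: the class,
its method names and the toy "processing" are all made up for illustration.
The point is that start() only queues the request and returns, while a
dedicated worker thread does the work that may need to sleep or allocate.

```python
import queue
import threading


class OffloadingClass:
    """Toy stand-in for a class whose start() must not block."""

    def __init__(self):
        self._requests = queue.Queue()
        self.completed = []
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def start(self, request):
        # Runs in the caller's context: just hand the request over.
        self._requests.put(request)

    def _run(self):
        # Runs in the worker thread: free to sleep or allocate here.
        while True:
            request = self._requests.get()
            if request is None:  # shutdown marker
                break
            self.completed.append(self._process(request))

    def _process(self, request):
        # Placeholder for the real work (parity calculation, etc.).
        return request.upper()

    def shutdown(self):
        # Queue the marker behind any pending requests and wait for the worker.
        self._requests.put(None)
        self._worker.join()
```

With a single worker and a FIFO queue, requests complete in the order in which
they were submitted, e.g. start("read") followed by start("write") and then
shutdown() leaves ["READ", "WRITE"] in completed.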






Re: New ZFSv28 patchset for 8-STABLE

2011-01-10 Thread Pawel Jakub Dawidek
On Sat, Dec 18, 2010 at 10:00:11AM +0100, Krzysztof Dajka wrote:
> Hi,
> I applied patch against evening 2010-12-16 STABLE. I did what Martin asked:
> 
> On Thu, Dec 16, 2010 at 1:44 PM, Martin Matuska  wrote:
> >    # cd /usr/src
> >    # fetch
> > http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz
> >    # xz -d stable-8-zfsv28-20101215.patch.xz
> >    # patch -E -p0 < stable-8-zfsv28-20101215.patch
> >    # rm sys/cddl/compat/opensolaris/sys/sysmacros.h
> >
> Patch applied cleanly.
> 
> #make buildworld
> #make buildkernel
> #make installkernel
> Reboot into single user mode.
> #mergemaster -p
> #make installworld
> #mergemaster
> Reboot.
> 
> 
> Rebooting with old world and new kernel went fine. But after reboot
> with new world I got:
> ZFS: zfs_alloc()/zfs_free() mismatch
> Just before loading kernel modules, after that my system hangs.

Could you tell me more about your pool configuration?
'zpool status' output might be helpful.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




8.2-BETA1 / 8.2-RC1 ACPI and other errors in dmesg after upgrade from 7.2

2011-01-10 Thread Miroslav Lachman

Hi,

I have a few Sun Fire X2100 M2 machines. I upgraded them from FreeBSD 7.2 to
8.2-BETA1 and 8.2-RC1 and now I see the following errors in dmesg:



acpi0:  on motherboard
ACPI Error: Invalid type (Alias) for target of Scope operator [CPU1] 
(Cannot override) (20101013/dswload-324)
ACPI Exception: AE_AML_OPERAND_TYPE, During name lookup/catalog 
(20101013/psloop-326)

.
acpi0: Power Button (fixed)
acpi0: reservation of 0, a (3) failed
acpi0: reservation of 10, 7ff0 (3) failed
.
.
uhub_reattach_port: port 1 reset failed, error=USB_ERR_TIMEOUT
uhub_reattach_port: device problem (USB_ERR_TIMEOUT), disabling port 1


And then in /var/log/messages
pid 3802 (sshd) is using legacy pty devices - not logging anymore
pid 38796 (try), uid 0: exited on signal 10 (core dumped)


Apart from these messages, the system and services are running fine, so this
is just a report of something suspicious.



Full dmesg:

Copyright (c) 1992-2010 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.2-RC1 #0: Wed Dec 22 17:34:20 UTC 2010
r...@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Dual-Core AMD Opteron(tm) Processor 1210 (1811.10-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0x40f33  Family = f  Model = 43 
Stepping = 3


Features=0x178bfbff
  Features2=0x2001
  AMD Features=0xea500800
  AMD Features2=0x1f
real memory  = 4294967296 (4096 MB)
avail memory = 4114534400 (3923 MB)
ACPI APIC Table: 
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
ioapic0  irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0:  on motherboard
ACPI Error: Invalid type (Alias) for target of Scope operator [CPU1] 
(Cannot override) (20101013/dswload-324)
ACPI Exception: AE_AML_OPERAND_TYPE, During name lookup/catalog 
(20101013/psloop-326)

acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of 0, a (3) failed
acpi0: reservation of 10, dff0 (3) failed
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x2008-0x200b on acpi0
cpu0:  on acpi0
cpu1:  on acpi0
pcib0:  port 0xcf8-0xcff on acpi0
pci0:  on pcib0
pci0:  at device 0.0 (no driver attached)
isab0:  at device 1.0 on pci0
isa0:  on isab0
pci0:  at device 1.1 (no driver attached)
ohci0:  mem 0xfcffb000-0xfcffbfff 
irq 21 at device 2.0 on pci0

ohci0: [ITHREAD]
usbus0:  on ohci0
ehci0:  mem 
0xfcffac00-0xfcffacff irq 22 at device 2.1 on pci0

ehci0: [ITHREAD]
usbus1: EHCI version 1.0
usbus1:  on ehci0
atapci0:  port 
0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 4.0 on pci0

ata0:  on atapci0
ata0: [ITHREAD]
ata1:  on atapci0
ata1: [ITHREAD]
atapci1:  port 
0xd480-0xd487,0xd400-0xd403,0xd080-0xd087,0xd000-0xd003,0xcc00-0xcc0f 
mem 0xfcff9000-0xfcff9fff irq 23 at device 5.0 on pci0

atapci1: [ITHREAD]
ata2:  on atapci1
ata2: [ITHREAD]
ata3:  on atapci1
ata3: [ITHREAD]
pcib1:  at device 6.0 on pci0
pci1:  on pcib1
vgapci0:  port 0xec00-0xec7f mem 
0xfd00-0xfd7f,0xfdee-0xfdef irq 16 at device 5.0 on pci1
nfe0:  port 0xc880-0xc887 mem 
0xfcff8000-0xfcff8fff,0xfcffa800-0xfcffa8ff,0xfcffa400-0xfcffa40f irq 20 
at device 8.0 on pci0

miibus0:  on nfe0
e1000phy0:  PHY 2 on miibus0
e1000phy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow

nfe0: Ethernet address: 00:1b:24:bd:e2:0f
nfe0: [FILTER]
nfe0: [FILTER]
nfe0: [FILTER]
nfe0: [FILTER]
nfe0: [FILTER]
nfe0: [FILTER]
nfe0: [FILTER]
nfe0: [FILTER]
nfe1:  port 0xc800-0xc807 mem 
0xfcff7000-0xfcff7fff,0xfcffa000-0xfcffa0ff,0xfcff6c00-0xfcff6c0f irq 21 
at device 9.0 on pci0

miibus1:  on nfe1
e1000phy1:  PHY 3 on miibus1
e1000phy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow

nfe1: Ethernet address: 00:1b:24:bd:e2:10
nfe1: [FILTER]
nfe1: [FILTER]
nfe1: [FILTER]
nfe1: [FILTER]
nfe1: [FILTER]
nfe1: [FILTER]
nfe1: [FILTER]
nfe1: [FILTER]
pcib2:  at device 10.0 on pci0
pci2:  on pcib2
pcib3:  at device 11.0 on pci0
pci3:  on pcib3
pcib4:  at device 12.0 on pci0
pci4:  on pcib4
pcib5:  at device 13.0 on pci0
pci5:  on pcib5
pcib6:  at device 0.0 on pci5
pci6:  on pcib6
bge0: 0x009003> mem 0xfdff-0xfdff,0xfdfe-0xfdfe irq 17 at 
device 4.0 on pci6

bge0: CHIP ID 0x9003; ASIC REV 0x09; CHIP REV 0x90; PCI-X
miibus2:  on bge0
brgphy0:  PHY 1 on miibus2
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow

bge0: Ethernet address: 00:1b:24:bd:e2:0d
bge0: [ITHREAD]
bge1: 0x009003> mem 0xfdfc-0xfdfc,0xfdfb-0xfdfb irq 18 at 
device 4.1 o

Re: New ZFSv28 patchset for 8-STABLE

2011-01-10 Thread Pawel Jakub Dawidek
On Sun, Jan 09, 2011 at 12:52:56PM +0100, Attila Nagy wrote:
[...]
> I've finally found the time to read the v28 patch and figured out the 
> problem: vfs.zfs.l2arc_noprefetch was changed to 1, so it doesn't use 
> the prefetched data on the L2ARC devices.
> This is a major hit in my case. Enabling this again restored the 
> previous hit rates and lowered the load on the hard disks significantly.

Well, not storing prefetched data on L2ARC vdevs is the default in
Solaris. For some reason it was changed by kmacy@ in r205231. Not sure
why and we can't ask him now, I'm afraid. I just sent an e-mail to
Brendan Gregg from Oracle, who originally implemented L2ARC in ZFS, asking
why this is turned off by default. Once I get an answer we can think about
turning it on again.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: New ZFSv28 patchset for 8-STABLE

2011-01-10 Thread Pawel Jakub Dawidek
On Sun, Jan 09, 2011 at 12:49:27PM +0100, Attila Nagy wrote:
> No, it's not related. One of the disks in the RAIDZ2 pool went bad:
> (da4:arcmsr0:0:4:0): READ(6). CDB: 8 0 2 10 10 0
> (da4:arcmsr0:0:4:0): CAM status: SCSI Status Error
> (da4:arcmsr0:0:4:0): SCSI status: Check Condition
> (da4:arcmsr0:0:4:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read 
> error)
> and it seems it froze the whole zpool. Removing the disk by hand solved 
> the problem.
> I've seen this previously on other machines with ciss.
> I wonder why ZFS didn't throw it out of the pool.

Such hangs happen when I/O never returns. ZFS doesn't time out I/O
requests on its own; this is the driver's responsibility. It is still
strange that the driver didn't pass the I/O error up to ZFS. It might as
well be a ZFS bug, but I don't think so.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




classes and kernel_cookie was Re: Specifying root mount options on diskless boot.

2011-01-10 Thread Daniel Braniss
...
> I note that the response to your message from "danny" offers the ability 
> to pass arguments to the nfs mount command, but also seems to offer a fix 
> for the fact that "classes" are not supported under PXE:
> 
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/90368
> 
> I hope "danny" will offer a patch to mainline code - it would be an 
> important improvement (and already promised in the documentation).
...
I'm willing to try and add the missing pieces, but I need a better
explanation of what they are. For example, I have no clue what the
kernel_cookie is used for, nor what the ${class} is all about.
BTW, it would be kind if this line in pxeboot(8):
As PXE is still in its infancy ...
could be changed :-)

"danny"

