Re: Panic during kernel boot, igb-init related? (8.3-RELEASE)

2012-11-01 Thread Eugene Grosbein
31.10.2012 23:58, Charles Owens wrote:
 Hello,
 
 We're seeing boot-time panics in about 4% of cases when upgrading from 
 FreeBSD 8.1 to 8.3-RELEASE (i386).  This problem is subtle enough that 
 it escaped detection during our regular testing cycle... now with over 
 100 systems upgraded we're convinced there's a real issue.  Our kernel 
 config is essentially PAE (ie. static modules ... with a few drivers 
 added/removed).  The hardware is Intel Server System SR1625UR.
 
 This appears to match a finding discussed in these threads, having to do 
 with timing of initialization of the igb(4)-based NICs (if I'm 
 understanding it properly):
 
 http://lists.freebsd.org/pipermail/freebsd-stable/2011-May/062596.html
 http://lists.freebsd.org/pipermail/freebsd-stable/2011-June/062949.html
 http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063867.html
 http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063958.html
 
 
 These threads include some potential patches and possibility of 
 commit/MFC... but it isn't clear that there was ever final resolution 
 (and MFC to 8-stable).  I've cc'd a few folks from back then.
 
 A real challenge here is the frequency of occurrence. As mentioned, it 
 only hits a fraction of our systems.  When it _does_ hit, the system 
 may enter a reboot loop for days and then mysteriously break out of 
 it... and thereafter seem to work fine.
 
 I'd be very grateful for any help.  Some questions:
 
   * Was there ever a final blessed patch?
   o if so, will it apply to RELENG_8_3?
   * Is there anything that could be said that might help us with
 reproducing-the-problem / testing / validating-a-fix?
 
 
 Panic message is --
 
 panic: m_getzone: m_getjcl: invalid cluster type
 cpuid = 0
 KDB: stack backtrace:
 #0 0xc059c717 at kdb_backtrace+0x47
 #1 0xc056caf7 at panic+0x117
 #2 0xc03c979e at igb_refresh_mbufs+0x25e
 #3 0xc03c9f98 at igb_rxeof+0x638
 #4 0xc03ca135 at igb_msix_que+0x105
 #5 0xc0541e2b at intr_event_execute_handlers+0x13b
 #6 0xc05434eb at ithread_loop+0x6b
 #7 0xc053efb7 at fork_exit+0x97
 #8 0xc0806744 at fork_trampoline+0x8
 
 Thanks very much,
 
 Charles

Take a look at http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/172113, which
contains the fix and, in a follow-up message, a simple workaround that does
not involve any patching.

Eugene Grosbein



___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org

RE: 9-Stable panic: resource_list_unreserve: can't find resource

2012-11-01 Thread Tom Lislegaard


 -----Original Message-----
 From: Andriy Gapon [mailto:a...@freebsd.org]
 Sent: 31 October 2012 19:51
 To: Tom Lislegaard
 Cc: 'freebsd-stable@freebsd.org'
 Subject: Re: 9-Stable panic: resource_list_unreserve: can't find resource
 
 on 31/10/2012 12:14 Tom Lislegaard said the following:
  Hi
 
  I'm running
  FreeBSD stingray 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #3: Mon Oct 29 
  16:11:35 CET 2012
 tl@stingray:/usr/obj/usr/src/sys/stingray  amd64
  on a new Dell laptop and keep getting these panics (typically once or twice 
  per day)
 
  (kgdb) set pagination off
  (kgdb) bt
  #0  doadump (textdump=Variable textdump is not available.
  ) at pcpu.h:229
  #1  0x80425e64 in kern_reboot (howto=260) at 
  /usr/src/sys/kern/kern_shutdown.c:448
  #2  0x8042634c in panic (fmt=0x1 Address 0x1 out of bounds) at
 /usr/src/sys/kern/kern_shutdown.c:636
  #3  0x8045773e in resource_list_unreserve (rl=Variable rl is not 
  available.
  ) at /usr/src/sys/kern/subr_bus.c:3338
  #4  0x802c3ee4 in acpi_delete_resource (bus=0xfe00052c1100, 
  child=0xfe00052c1500,
 type=4, rid=3323) at /usr/src/sys/dev/acpica/acpi.c:1405
  #5  0x802c62bc in acpi_bus_alloc_gas (dev=0xfe00052c1500, 
  type=0xfe00052b786c,
 rid=0xfe00052b7978, gas=Variable gas is not available.
  ) at /usr/src/sys/dev/acpica/acpi.c:1450
  #6  0x802d1663 in acpi_PkgGas (dev=0xfe00052c1500, res=Variable 
  res is not available.
  ) at /usr/src/sys/dev/acpica/acpi_package.c:120
  #7  0x802cbf6b in acpi_cpu_cx_cst (sc=0xfe00052b7800) at
 /usr/src/sys/dev/acpica/acpi_cpu.c:782
  #8  0x802cc3a4 in acpi_cpu_notify (h=Variable h is not available.
  ) at /usr/src/sys/dev/acpica/acpi_cpu.c:1050
  #9  0x802a3fca in AcpiEvNotifyDispatch (Context=0x0) at
 /usr/src/sys/contrib/dev/acpica/events/evmisc.c:283
  #10 0x802c26c3 in acpi_task_execute (context=0xfe00051d6800, 
  pending=Variable pending
 is not available.
  ) at /usr/src/sys/dev/acpica/Osd/OsdSchedule.c:134
  #11 0x804683c4 in taskqueue_run_locked (queue=0xfe00052bc100) at
 /usr/src/sys/kern/subr_taskqueue.c:308
  #12 0x80469366 in taskqueue_thread_loop (arg=Variable arg is not 
  available.
  ) at /usr/src/sys/kern/subr_taskqueue.c:497
  #13 0x803f762f in fork_exit (callout=0x80469320 
  taskqueue_thread_loop,
 arg=0x80a20cc8, frame=0xff80002cdb00) at 
 /usr/src/sys/kern/kern_fork.c:992
  #14 0x806be6be in fork_trampoline () at 
  /usr/src/sys/amd64/amd64/exception.S:602
 
 Could you please provide *sc from frame 7?
 
 --
 Andriy Gapon

(kgdb) up 7
#7  0x802cbf6b in acpi_cpu_cx_cst (sc=0xfe00052b7800) at 
/usr/src/sys/dev/acpica/acpi_cpu.c:782
782 acpi_PkgGas(sc->cpu_dev, pkg, 0, &cx_ptr->res_type, 
&sc->cpu_rid,
(kgdb) print *sc
$1 = {cpu_dev = 0xfe00052c1500, cpu_handle = 0xfe00052e7a80, cpu_pcpu = 
0x80aa6a80, cpu_acpi_id = 1, cpu_p_blk = 1040, cpu_p_blk_len = 6, 
cpu_cx_states = {{p_lvlx = 0xfe0196f0e380, type = 1, trans_lat = 1, power = 
1000, res_type = 4}, {p_lvlx = 0x0, type = 3, trans_lat = 87, power = 200, 
res_type = 4}, {p_lvlx = 0x0, type = 3, trans_lat = 87, power = 200, res_type = 
4}, {p_lvlx = 0x0, type = 0, trans_lat = 0, power = 0, res_type = 0}, {p_lvlx = 
0x0, type = 0, trans_lat = 0, power = 0, res_type = 0}, {p_lvlx = 0x0, type = 
0, trans_lat = 0, power = 0, res_type = 0}, {p_lvlx = 0x0, type = 0, trans_lat 
= 0, power = 0, res_type = 0}, {p_lvlx = 0x0, type = 0, trans_lat = 0, power = 
0, res_type = 0}}, cpu_cx_count = 2, cpu_prev_sleep = 619, cpu_features = 31, 
cpu_non_c3 = 1, cpu_cx_stats = {390, 0, 0, 0, 0, 0, 0, 0}, cpu_sysctl_ctx = 
{tqh_first = 0xfe00088931a0, tqh_last = 0xfe0008893228}, 
cpu_sysctl_tree = 0x0, cpu_cx_lowest = 0, cpu_cx_lowest_lim = 0, 
cpu_cx_supported = "C1/1 C2/59 C3/87", '\0' <repeats 47 times>, cpu_rid = 3323}

-tom



Re: make release fails on find

2012-11-01 Thread Andreas Nilsson
On Wed, Oct 31, 2012 at 3:12 PM, Glen Barber g...@freebsd.org wrote:

 On Wed, Oct 31, 2012 at 08:30:29AM +0100, Andreas Nilsson wrote:
  On a wishlist topic: I'd really appreciate it if .zfs dirs were
  excluded from the tarballs.
 

 Hmm, I didn't realize this was happening.

 So I can verify my change works for all environments, are you using any
 local zfs dataset properties, specifically unhiding the snapshot
 directory?

 Glen

Yes, I have the following:
tank/cvs/9.1/src  snapdir  visible  inherited from tank/cvs

Andreas


Re: ZFS corruption due to lack of space?

2012-11-01 Thread Steven Hartland

After destroying and re-creating the pool and then writing
zeros to the disk in multiple files without filling the fs,
I've managed to reproduce the corruption again, so we can
rule out a full disk as the cause.

I'm now testing different scenarios to try to identify the
culprit; the first test is removing the SSD ZIL and cache disks.

Suspects: HW issues (memory, cables, MB, disks), driver issue
(I have not used mfi on tbolt 2208 based cards before).

   Regards
   Steve




9.1 stability/robustness?

2012-11-01 Thread Brett Glass
I need to build up a few servers and routers, and am wondering how
FreeBSD 9.1 is shaping up. Will it be likely to be more stable and
robust than 9.0-RELEASE? Are there issues that will have to wait
until 9.2-RELEASE to be fixed? Opinions welcome.

--Brett Glass


Re: 9-Stable panic: resource_list_unreserve: can't find resource

2012-11-01 Thread Andriy Gapon
on 01/11/2012 11:45 Tom Lislegaard said the following:
 
 
 -----Original Message-----
 From: Andriy Gapon [mailto:a...@freebsd.org]
 Sent: 31 October 2012 19:51
 To: Tom Lislegaard
 Cc: 'freebsd-stable@freebsd.org'
 Subject: Re: 9-Stable panic: resource_list_unreserve: can't find resource

 on 31/10/2012 12:14 Tom Lislegaard said the following:
 Hi

 I'm running
 FreeBSD stingray 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #3: Mon Oct 29 
 16:11:35 CET 2012
 tl@stingray:/usr/obj/usr/src/sys/stingray  amd64
 on a new Dell laptop and keep getting these panics (typically once or twice 
 per day)

 (kgdb) set pagination off
 (kgdb) bt
 #0  doadump (textdump=Variable textdump is not available.
 ) at pcpu.h:229
 #1  0x80425e64 in kern_reboot (howto=260) at 
 /usr/src/sys/kern/kern_shutdown.c:448
 #2  0x8042634c in panic (fmt=0x1 Address 0x1 out of bounds) at
 /usr/src/sys/kern/kern_shutdown.c:636
 #3  0x8045773e in resource_list_unreserve (rl=Variable rl is not 
 available.
 ) at /usr/src/sys/kern/subr_bus.c:3338
 #4  0x802c3ee4 in acpi_delete_resource (bus=0xfe00052c1100, 
 child=0xfe00052c1500,
 type=4, rid=3323) at /usr/src/sys/dev/acpica/acpi.c:1405
 #5  0x802c62bc in acpi_bus_alloc_gas (dev=0xfe00052c1500, 
 type=0xfe00052b786c,
 rid=0xfe00052b7978, gas=Variable gas is not available.
 ) at /usr/src/sys/dev/acpica/acpi.c:1450
 #6  0x802d1663 in acpi_PkgGas (dev=0xfe00052c1500, res=Variable 
 res is not available.
 ) at /usr/src/sys/dev/acpica/acpi_package.c:120
 #7  0x802cbf6b in acpi_cpu_cx_cst (sc=0xfe00052b7800) at
 /usr/src/sys/dev/acpica/acpi_cpu.c:782
 #8  0x802cc3a4 in acpi_cpu_notify (h=Variable h is not available.
 ) at /usr/src/sys/dev/acpica/acpi_cpu.c:1050
 #9  0x802a3fca in AcpiEvNotifyDispatch (Context=0x0) at
 /usr/src/sys/contrib/dev/acpica/events/evmisc.c:283
 #10 0x802c26c3 in acpi_task_execute (context=0xfe00051d6800, 
 pending=Variable pending
 is not available.
 ) at /usr/src/sys/dev/acpica/Osd/OsdSchedule.c:134
 #11 0x804683c4 in taskqueue_run_locked (queue=0xfe00052bc100) at
 /usr/src/sys/kern/subr_taskqueue.c:308
 #12 0x80469366 in taskqueue_thread_loop (arg=Variable arg is not 
 available.
 ) at /usr/src/sys/kern/subr_taskqueue.c:497
 #13 0x803f762f in fork_exit (callout=0x80469320 
 taskqueue_thread_loop,
 arg=0x80a20cc8, frame=0xff80002cdb00) at 
 /usr/src/sys/kern/kern_fork.c:992
 #14 0x806be6be in fork_trampoline () at 
 /usr/src/sys/amd64/amd64/exception.S:602

 Could you please provide *sc from frame 7?
 
 (kgdb) up 7
 #7  0x802cbf6b in acpi_cpu_cx_cst (sc=0xfe00052b7800) at 
 /usr/src/sys/dev/acpica/acpi_cpu.c:782
 782   acpi_PkgGas(sc->cpu_dev, pkg, 0, &cx_ptr->res_type, 
 &sc->cpu_rid,
 (kgdb) print *sc
 $1 = {cpu_dev = 0xfe00052c1500, cpu_handle = 0xfe00052e7a80, cpu_pcpu 
 = 0x80aa6a80, cpu_acpi_id = 1, cpu_p_blk = 1040, cpu_p_blk_len = 6, 
 cpu_cx_states = {{p_lvlx = 0xfe0196f0e380, type = 1, trans_lat = 1, power 
 = 1000, res_type = 4}, {p_lvlx = 0x0, type = 3, trans_lat = 87, power = 200, 
 res_type = 4}, {p_lvlx = 0x0, type = 3, trans_lat = 87, power = 200, res_type 
 = 4}, {p_lvlx = 0x0, type = 0, trans_lat = 0, power = 0, res_type = 0}, 
 {p_lvlx = 0x0, type = 0, trans_lat = 0, power = 0, res_type = 0}, {p_lvlx = 
 0x0, type = 0, trans_lat = 0, power = 0, res_type = 0}, {p_lvlx = 0x0, type = 
 0, trans_lat = 0, power = 0, res_type = 0}, {p_lvlx = 0x0, type = 0, 
 trans_lat = 0, power = 0, res_type = 0}}, cpu_cx_count = 2, cpu_prev_sleep = 
 619, cpu_features = 31, cpu_non_c3 = 1, cpu_cx_stats = {390, 0, 0, 0, 0, 0, 
 0, 0}, cpu_sysctl_ctx = {tqh_first = 0xfe00088931a0, tqh_last = 
 0xfe0008893228}, cpu_sysctl_tree = 0x0, cpu_cx_lowest = 0, 
 cpu_cx_lowest_lim = 0, cpu_cx_supported = "C1/1 C2/59 C3/87", 
 '\0' <repeats 47 times>, cpu_rid = 3323}

Thank you.
Did this crash occur at the time when you plugged in or unplugged the AC line?
Do you plug and unplug the line often?
Do you think that the line could have any problems, such as flaky contacts?


-- 
Andriy Gapon


Re: ZFS corruption due to lack of space?

2012-11-01 Thread Peter Jeremy
On 2012-Nov-01 13:29:34 -, Steven Hartland kill...@multiplay.co.uk wrote:
After destroying and re-creating the pool and then writing
zeros to the disk in multiple files without filling the fs,
I've managed to reproduce the corruption again, so we can
rule out a full disk as the cause.

Many years ago, I wrote a simple utility that fills a raw disk with
a pseudo-random sequence and then verifies it.  This sort of tool
can be useful for detecting the presence of silent data corruption
(or disk address wraparound).
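
A minimal sketch of that kind of fill-and-verify check, in Python. This is not the utility mentioned above; the names block_pattern, fill and verify are hypothetical, and SHAKE-128 is used here only as a convenient deterministic generator. Each block's content depends on its block index, so a write that lands on the wrong address shows up as a mismatch at the block it overwrote.

```python
import hashlib

def block_pattern(seed, index, block_size=512):
    # Deterministic pseudo-random bytes derived from (seed, block index).
    return hashlib.shake_128(f"{seed}:{index}".encode()).digest(block_size)

def fill(dev, nblocks, seed=0, block_size=512):
    # Write the pattern for every block; dev is any seekable binary file object.
    for i in range(nblocks):
        dev.seek(i * block_size)
        dev.write(block_pattern(seed, i, block_size))

def verify(dev, nblocks, seed=0, block_size=512):
    # Return the indices of blocks whose content no longer matches the pattern.
    bad = []
    for i in range(nblocks):
        dev.seek(i * block_size)
        if dev.read(block_size) != block_pattern(seed, i, block_size):
            bad.append(i)
    return bad
```

Running fill() and then verify() against a scratch disk opened in binary read/write mode destroys its contents, so this is only suitable for disks holding no data.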

Suspects: HW issues (memory, cables, MB, disks), driver issue
(not used mfi on tbolt 2208 based cards before).

There has been a recent thread about various strange behaviours from
LSI controllers and it has been stated that (at least for the 2008)
the card firmware _must_ match the FreeBSD driver version.  See
http://lists.freebsd.org/pipermail/freebsd-stable/2012-August/069205.html

-- 
Peter Jeremy




Re: mfi corrupts JBOD disks >2TB due to LBA overflow (was: ZFS corruption due to lack of space?)

2012-11-01 Thread Steven Hartland

OK, after revisiting all the facts and spotting that
the corruption only seemed to happen after my zpool
was nearly full, I came up with a wild idea: could
the corruption be caused by writes beyond 2TB?

A few command lines later, this was confirmed:
writes to the 3TB disks under mfi are wrapping at
2TB!

Steps to prove:
1. zero out the first block on the disk
dd if=/dev/zero bs=512 count=1 of=/dev/mfisyspd0
1+0 records in
1+0 records out
512 bytes transferred in 0.000728 secs (703171 bytes/sec)

2. confirm the first block is zeros
dd if=/dev/mfisyspd0 bs=512 count=1 | hexdump -C
1+0 records in
1+0 records out
512 bytes transferred in 0.000250 secs (2047172 bytes/sec)
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000200

3. write 1 block of random data just past the 2TB boundary (sector 2^32)
dd if=/dev/random bs=512 count=1 of=/dev/mfisyspd0 oseek=4294967296
1+0 records in
1+0 records out
512 bytes transferred in 0.000717 secs (714162 bytes/sec)

4. first block of the disk now contains random data
dd if=/dev/mfisyspd0 bs=512 count=8 | hexdump -C
00000000  9c d1 d2 1d 9f 2c fc 30  ab 09 7a f7 64 16 2a 58  |.....,.0..z.d.*X|
00000010  18 27 9d 1f ae 4d 27 53  1a 50 e7 c1 b1 3a 9b e4  |.'...M'S.P...:..|
00000020  c3 7c d0 25 83 e2 bd 85  33 f2 33 8e 71 55 70 7c  |.|.%....3.3.qUp||
00000030  8c 15 af 55 f6 88 8d 6e  40 1c f3 1a 5c e7 80 4b  |...U...n@...\..K|
...
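
The four steps above can be sketched as one small Python function. This is an illustration only, not part of the original test: lba_wraps and far_block are hypothetical names, and the function assumes a Unix-like system where os.pread and os.pwrite are available. It is destructive, since it overwrites the first block and the target block of whatever path it is given.

```python
import os

def lba_wraps(path, far_block, block_size=512):
    """Return True if a write at far_block changes block 0 of path.

    DESTRUCTIVE: overwrites block 0 and block far_block of the target.
    """
    zeros = bytes(block_size)
    payload = os.urandom(block_size)
    fd = os.open(path, os.O_RDWR)
    try:
        # Steps 1 and 2: zero the first block and confirm it reads back as zeros.
        os.pwrite(fd, zeros, 0)
        if os.pread(fd, block_size, 0) != zeros:
            raise RuntimeError("first block did not read back as zeros")
        # Step 3: write random data at the far block.
        os.pwrite(fd, payload, far_block * block_size)
        os.fsync(fd)
        # Step 4: if the address wrapped, block 0 is no longer all zeros.
        return os.pread(fd, block_size, 0) != zeros
    finally:
        os.close(fd)
```

With block_size=512, far_block=4294967296 corresponds to the oseek value used in step 3, that is, a byte offset of 2^32 * 512 = 2 TiB.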

Looking at the driver code, the problem is that IO on syspd
disks, aka JBOD, is always done using 10 byte CDB commands
in mfi_build_syspdio. A 10 byte CDB carries only a 32-bit LBA,
so this is clearly a serious problem: it results in total
corruption on disks > 2^32 sectors when sectors above 2^32
are accessed.
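
To illustrate why a 10 byte CDB cannot address these sectors, here is a hedged Python sketch that packs a SCSI WRITE(10) and a WRITE(16) command. It is not the mfi driver code; the helper names are hypothetical, and the mask in write10_cdb only mimics what happens when a 64-bit LBA is stored into the 32-bit field.

```python
import struct

WRITE_10 = 0x2A  # SCSI WRITE(10) opcode
WRITE_16 = 0x8A  # SCSI WRITE(16) opcode

def write10_cdb(lba, nblocks):
    # 10 byte CDB: opcode, flags, 32-bit LBA, group, 16-bit length, control.
    # The mask models truncation of a larger LBA to the 32-bit field.
    return struct.pack(">BBIBHB", WRITE_10, 0, lba & 0xFFFFFFFF, 0, nblocks, 0)

def write16_cdb(lba, nblocks):
    # 16 byte CDB: opcode, flags, 64-bit LBA, 32-bit length, group, control.
    return struct.pack(">BBQIBB", WRITE_16, 0, lba, nblocks, 0, 0)

def lba_of(cdb):
    # Read the LBA field back out; it starts at byte 2 in both layouts.
    fmt = ">I" if len(cdb) == 10 else ">Q"
    return struct.unpack_from(fmt, cdb, 2)[0]

if __name__ == "__main__":
    sector = 4294967296  # 2^32, the oseek value from step 3
    print(lba_of(write10_cdb(sector, 1)))  # LBA as seen via the 10 byte CDB
    print(lba_of(write16_cdb(sector, 1)))  # LBA as seen via the 16 byte CDB
```

Sector 2^32 truncates to LBA 0 in the 10 byte form, which matches the first block being overwritten in step 4, while the 16 byte form keeps the full value.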

The fix doesn't seem too hard and I think I've already
got a basic version working; it just needs more testing.

The bug also affects the kernel's mfi_dump_blocks, but that's
less likely to trigger due to how it's used.

Will create PR when I've finished testing and am happy
with the patch, but wanted to let others know in the
mean time given how serious the bug is.

   Regards
   Steve




Re: FreeBSD 9.1 stability/robustness?

2012-11-01 Thread Doug Hardie

On 1 November 2012, at 19:14, Brett Glass wrote:

 I need to build up a few servers and routers, and am wondering how
 FreeBSD 9.1 is shaping up. Will it be likely to be more stable and
 robust than 9.0-RELEASE?

It appears to be for me.  I had problems with 9.0 not reading CDs and frequently 
rebooting with no error messages.  I have upgraded to 9.1-RC2 and it now reads 
CDs just fine, and has not rebooted.  However, the uptimes with 9.0 ranged from 
about 2 hours to 30 days.  I have only had 9.1-RC2 running for a couple of weeks, 
so I have not declared victory yet.  It has already been running for longer than 
most of those uptimes.


 Are there issues that will have to wait
 until 9.2-RELEASE to be fixed? Opinions welcome.

I have no information on this.
