Re: Panic during kernel boot, igb-init related? (8.3-RELEASE)
On 31.10.2012 23:58, Charles Owens wrote:

> Hello,
>
> We're seeing boot-time panics in about 4% of cases when upgrading from FreeBSD 8.1 to 8.3-RELEASE (i386). This problem is subtle enough that it escaped detection during our regular testing cycle... now, with over 100 systems upgraded, we're convinced there's a real issue.
>
> Our kernel config is essentially PAE (i.e. static modules ... with a few drivers added/removed). The hardware is Intel Server System SR1625UR.
>
> This appears to match a finding discussed in these threads, having to do with timing of initialization of the igb(4)-based NICs (if I'm understanding it properly):
>
> http://lists.freebsd.org/pipermail/freebsd-stable/2011-May/062596.html
> http://lists.freebsd.org/pipermail/freebsd-stable/2011-June/062949.html
> http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063867.html
> http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063958.html
>
> These threads include some potential patches and the possibility of a commit/MFC... but it isn't clear that there was ever a final resolution (and MFC to 8-stable). I've cc'd a few folks from back then.
>
> A real challenge here is the frequency of occurrence. As mentioned, it only hits a fraction of our systems. When it _does_ hit, the system may enter a reboot loop for days and then mysteriously break out of it... and thereafter seem to work fine.
>
> I'd be very grateful for any help. Some questions:
>
> * Was there ever a final blessed patch?
>   o If so, will it apply to RELENG_8_3?
> * Is there anything that could be said that might help us with reproducing the problem, testing, or validating a fix?
>
> Panic message is --
>
> panic: m_getzone: m_getjcl: invalid cluster type
> cpuid = 0
> KDB: stack backtrace:
> #0 0xc059c717 at kdb_backtrace+0x47
> #1 0xc056caf7 at panic+0x117
> #2 0xc03c979e at igb_refresh_mbufs+0x25e
> #3 0xc03c9f98 at igb_rxeof+0x638
> #4 0xc03ca135 at igb_msix_que+0x105
> #5 0xc0541e2b at intr_event_execute_handlers+0x13b
> #6 0xc05434eb at ithread_loop+0x6b
> #7 0xc053efb7 at fork_exit+0x97
> #8 0xc0806744 at fork_trampoline+0x8
>
> Thanks very much,
> Charles

Take a look at http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/172113, which contains a simple workaround (in a follow-up message, not involving any patching) as well as the fix.

Eugene Grosbein

___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
RE: 9-Stable panic: resource_list_unreserve: can't find resource
-----Original Message-----
From: Andriy Gapon [mailto:a...@freebsd.org]
Sent: 31 October 2012 19:51
To: Tom Lislegaard
Cc: 'freebsd-stable@freebsd.org'
Subject: Re: 9-Stable panic: resource_list_unreserve: can't find resource

> on 31/10/2012 12:14 Tom Lislegaard said the following:
> > Hi
> >
> > I'm running
> >
> > FreeBSD stingray 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #3: Mon Oct 29 16:11:35 CET 2012 tl@stingray:/usr/obj/usr/src/sys/stingray amd64
> >
> > on a new Dell laptop and keep getting these panics (typically once or twice per day)
> >
> > (kgdb) set pagination off
> > (kgdb) bt
> > #0 doadump (textdump=Variable "textdump" is not available.) at pcpu.h:229
> > #1 0x80425e64 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:448
> > #2 0x8042634c in panic (fmt=0x1 <Address 0x1 out of bounds>) at /usr/src/sys/kern/kern_shutdown.c:636
> > #3 0x8045773e in resource_list_unreserve (rl=Variable "rl" is not available.) at /usr/src/sys/kern/subr_bus.c:3338
> > #4 0x802c3ee4 in acpi_delete_resource (bus=0xfe00052c1100, child=0xfe00052c1500, type=4, rid=3323) at /usr/src/sys/dev/acpica/acpi.c:1405
> > #5 0x802c62bc in acpi_bus_alloc_gas (dev=0xfe00052c1500, type=0xfe00052b786c, rid=0xfe00052b7978, gas=Variable "gas" is not available.) at /usr/src/sys/dev/acpica/acpi.c:1450
> > #6 0x802d1663 in acpi_PkgGas (dev=0xfe00052c1500, res=Variable "res" is not available.) at /usr/src/sys/dev/acpica/acpi_package.c:120
> > #7 0x802cbf6b in acpi_cpu_cx_cst (sc=0xfe00052b7800) at /usr/src/sys/dev/acpica/acpi_cpu.c:782
> > #8 0x802cc3a4 in acpi_cpu_notify (h=Variable "h" is not available.) at /usr/src/sys/dev/acpica/acpi_cpu.c:1050
> > #9 0x802a3fca in AcpiEvNotifyDispatch (Context=0x0) at /usr/src/sys/contrib/dev/acpica/events/evmisc.c:283
> > #10 0x802c26c3 in acpi_task_execute (context=0xfe00051d6800, pending=Variable "pending" is not available.) at /usr/src/sys/dev/acpica/Osd/OsdSchedule.c:134
> > #11 0x804683c4 in taskqueue_run_locked (queue=0xfe00052bc100) at /usr/src/sys/kern/subr_taskqueue.c:308
> > #12 0x80469366 in taskqueue_thread_loop (arg=Variable "arg" is not available.) at /usr/src/sys/kern/subr_taskqueue.c:497
> > #13 0x803f762f in fork_exit (callout=0x80469320 <taskqueue_thread_loop>, arg=0x80a20cc8, frame=0xff80002cdb00) at /usr/src/sys/kern/kern_fork.c:992
> > #14 0x806be6be in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:602
>
> Could you please provide *sc from frame 7?
>
> --
> Andriy Gapon

(kgdb) up 7
#7 0x802cbf6b in acpi_cpu_cx_cst (sc=0xfe00052b7800) at /usr/src/sys/dev/acpica/acpi_cpu.c:782
782 acpi_PkgGas(sc->cpu_dev, pkg, 0, &cx_ptr->res_type, &sc->cpu_rid,
(kgdb) print *sc
$1 = {cpu_dev = 0xfe00052c1500, cpu_handle = 0xfe00052e7a80, cpu_pcpu = 0x80aa6a80, cpu_acpi_id = 1, cpu_p_blk = 1040, cpu_p_blk_len = 6,
  cpu_cx_states = {
    {p_lvlx = 0xfe0196f0e380, type = 1, trans_lat = 1, power = 1000, res_type = 4},
    {p_lvlx = 0x0, type = 3, trans_lat = 87, power = 200, res_type = 4},
    {p_lvlx = 0x0, type = 3, trans_lat = 87, power = 200, res_type = 4},
    {p_lvlx = 0x0, type = 0, trans_lat = 0, power = 0, res_type = 0},
    {p_lvlx = 0x0, type = 0, trans_lat = 0, power = 0, res_type = 0},
    {p_lvlx = 0x0, type = 0, trans_lat = 0, power = 0, res_type = 0},
    {p_lvlx = 0x0, type = 0, trans_lat = 0, power = 0, res_type = 0},
    {p_lvlx = 0x0, type = 0, trans_lat = 0, power = 0, res_type = 0}},
  cpu_cx_count = 2, cpu_prev_sleep = 619, cpu_features = 31, cpu_non_c3 = 1,
  cpu_cx_stats = {390, 0, 0, 0, 0, 0, 0, 0},
  cpu_sysctl_ctx = {tqh_first = 0xfe00088931a0, tqh_last = 0xfe0008893228},
  cpu_sysctl_tree = 0x0, cpu_cx_lowest = 0, cpu_cx_lowest_lim = 0,
  cpu_cx_supported = "C1/1 C2/59 C3/87", '\0' <repeats 47 times>, cpu_rid = 3323}

-tom
Re: make release fails on find
On Wed, Oct 31, 2012 at 3:12 PM, Glen Barber g...@freebsd.org wrote:

> On Wed, Oct 31, 2012 at 08:30:29AM +0100, Andreas Nilsson wrote:
> > On a more wishlist topic: I'd really appreciate it if .zfs dirs would be excluded from the tarballs.
>
> Hmm, I didn't realize this was happening. So that I can verify my change works for all environments, are you using any local zfs dataset properties, specifically unhiding the snapshot directory?
>
> Glen

Yes, I have the following:

tank/cvs/9.1/src  snapdir  visible  inherited from tank/cvs

Andreas
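As an illustration of the request above (not taken from the thread, and not how the FreeBSD release scripts build their tarballs), one way to keep `.zfs` snapshot directories out of an archive is to filter members by path component. This is a minimal Python sketch; `skip_zfs_snapdir` and `make_tarball` are hypothetical names chosen for the example, and it assumes a plain gzip-compressed tar of a source tree.

```python
import os
import tarfile


def skip_zfs_snapdir(tarinfo):
    """tarfile filter: drop any member that is, or lives under, a .zfs directory."""
    if ".zfs" in tarinfo.name.split("/"):
        return None  # returning None excludes the member (and stops recursion into it)
    return tarinfo


def make_tarball(src_dir, out_path):
    """Archive src_dir into out_path (tar.gz), leaving out .zfs directories."""
    top = os.path.basename(os.path.normpath(src_dir))
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(src_dir, arcname=top, filter=skip_zfs_snapdir)
```

Filtering on the path component rather than on a dataset property means the result does not depend on whether `snapdir` is set to visible or hidden.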
Re: ZFS corruption due to lack of space?
After destroying and re-creating the pool and then writing zeros to the disk in multiple files without filling the fs, I've managed to reproduce the corruption again, so we can rule out a full disk as the cause.

I'm now testing different scenarios to try and identify the culprit; the first test is removing the SSD ZIL and cache disks.

Suspects: HW issues (memory, cables, MB, disks), driver issue (not used mfi on tbolt 2208 based cards before).

Regards
Steve

This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmas...@multiplay.co.uk.
Re: 9-Stable panic: resource_list_unreserve: can't find resource
on 01/11/2012 11:45 Tom Lislegaard said the following:

> [...]

Thank you.
Did this crash occur at the time when you plugged or unplugged the AC line?
Do you plug and unplug the line often?
Do you think that the line could have any problems like flaky contacts or some such?

--
Andriy Gapon
Re: ZFS corruption due to lack of space?
On 2012-Nov-01 13:29:34 -, Steven Hartland kill...@multiplay.co.uk wrote:

> After destroying and re-creating the pool and then writing zeros to the disk in multiple files without filling the fs, I've managed to reproduce the corruption again, so we can rule out a full disk as the cause.

Many years ago, I wrote a simple utility that fills a raw disk with a pseudo-random sequence and then verifies it. This sort of tool can be useful for detecting the presence of silent data corruption (or disk address wraparound).

> Suspects: HW issues (memory, cables, MB, disks), driver issue (not used mfi on tbolt 2208 based cards before).

There has been a recent thread about various strange behaviours from LSI controllers, and it has been stated that (at least for the 2008) the card firmware _must_ match the FreeBSD driver version. See http://lists.freebsd.org/pipermail/freebsd-stable/2012-August/069205.html

--
Peter Jeremy
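The kind of fill-and-verify tool described above can be sketched as follows. This is not Peter's utility, only a minimal Python illustration under stated assumptions: each 512-byte block is derived from a seed and its own block number (here via SHA-256, an arbitrary choice), so a block that ends up at the wrong address fails verification. The names `block_pattern`, `fill` and `verify` are hypothetical, and the demo runs against an in-memory buffer rather than a raw disk.

```python
import hashlib
import io

BLOCK_SIZE = 512


def block_pattern(seed: bytes, block_no: int) -> bytes:
    """Deterministic pseudo-random block that depends on the block number."""
    out = b""
    counter = 0
    while len(out) < BLOCK_SIZE:
        material = seed + block_no.to_bytes(8, "big") + counter.to_bytes(4, "big")
        out += hashlib.sha256(material).digest()
        counter += 1
    return out[:BLOCK_SIZE]


def fill(dev, nblocks: int, seed: bytes = b"example-seed") -> None:
    """Write the expected pattern to blocks 0..nblocks-1 of a seekable file object."""
    dev.seek(0)
    for n in range(nblocks):
        dev.write(block_pattern(seed, n))


def verify(dev, nblocks: int, seed: bytes = b"example-seed") -> list:
    """Return the block numbers whose contents differ from the expected pattern."""
    bad = []
    dev.seek(0)
    for n in range(nblocks):
        if dev.read(BLOCK_SIZE) != block_pattern(seed, n):
            bad.append(n)
    return bad


if __name__ == "__main__":
    dev = io.BytesIO()  # stand-in for a raw device opened in binary mode
    fill(dev, 32)
    print(verify(dev, 32))
```

Because the pattern is keyed by block number, an address wraparound shows up as a mismatch at the low block that was overwritten, which plain zero-filling would not reveal.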
Re: mfi corrupts JBOD disks >2TB due to LBA overflow (was: ZFS corruption due to lack of space?)
OK, after revisiting all the facts and spotting that the corruption only seemed to happen after my zpool was nearly full, I came up with a wild idea: could the corruption be caused by writes after 2TB?

A few command lines later and this was confirmed: writes to the 3TB disks under mfi are wrapping at 2TB!!!

Steps to prove:-

1. zero out block 1 on the disk

    dd if=/dev/zero bs=512 count=1 of=/dev/mfisyspd0
    1+0 records in
    1+0 records out
    512 bytes transferred in 0.000728 secs (703171 bytes/sec)

2. confirm the first block is zeros

    dd if=/dev/mfisyspd0 bs=512 count=1 | hexdump -C
    1+0 records in
    1+0 records out
    512 bytes transferred in 0.000250 secs (2047172 bytes/sec)
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ||
    *
    0200

3. write 1 block of random data just past the 2TB boundary

    dd if=/dev/random bs=512 count=1 of=/dev/mfisyspd0 oseek=4294967296
    1+0 records in
    1+0 records out
    512 bytes transferred in 0.000717 secs (714162 bytes/sec)

4. first block of the disk now contains random data

    dd if=/dev/mfisyspd0 bs=512 count=8 | hexdump -C
    9c d1 d2 1d 9f 2c fc 30 ab 09 7a f7 64 16 2a 58 |.,.0..z.d.*X|
    0010 18 27 9d 1f ae 4d 27 53 1a 50 e7 c1 b1 3a 9b e4 |.'...M'S.P...:..|
    0020 c3 7c d0 25 83 e2 bd 85 33 f2 33 8e 71 55 70 7c |.|.%3.3.qUp||
    0030 8c 15 af 55 f6 88 8d 6e 40 1c f3 1a 5c e7 80 4b |...U...n@...\..K|
    ...

Looking at the driver code, the problem is that IO on syspd disks (aka JBOD) is always done using 10 byte CDB commands in mfi_build_syspdio.

This is clearly a serious problem, as it results in total corruption on disks > 2^32 sectors when sectors above 2^32 are accessed.

The fix doesn't seem too hard and I think I've already got a basic version working; it just needs more testing.

The bug also affects the kernel's mfi_dump_blocks, but that's less likely to trigger due to how it's used.

Will create a PR when I've finished testing and am happy with the patch, but wanted to let others know in the meantime given how serious the bug is.
Regards
Steve
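The arithmetic behind the report above can be checked with a short sketch. A 10-byte READ/WRITE CDB carries a 32-bit LBA and a 16-bit transfer length, so with 512-byte sectors it can address at most 2^32 sectors, which is 2 TiB. The Python below is only an illustration: modelling the wrap as "keep the low 32 bits" is an assumption that matches the dd experiment (a write at sector 4294967296 landing on sector 0), not a reading of the mfi driver source, and the function names are hypothetical.

```python
SECTOR_SIZE = 512        # bytes per sector, as used with dd bs=512 in the thread
LBA32_LIMIT = 1 << 32    # a 10-byte CDB carries a 32-bit LBA field
MAX_BLOCKS_10 = 0xFFFF   # and a 16-bit transfer length


def lba_after_32bit_truncation(lba: int) -> int:
    """LBA that results if only the low 32 bits are kept (assumed model of the wrap)."""
    return lba & (LBA32_LIMIT - 1)


def needs_16_byte_cdb(lba: int, nblocks: int) -> bool:
    """True if the request cannot be expressed in a 10-byte CDB."""
    last_lba = lba + nblocks - 1
    return last_lba >= LBA32_LIMIT or nblocks > MAX_BLOCKS_10


if __name__ == "__main__":
    target = 4294967296  # the oseek value from step 3
    print(target * SECTOR_SIZE)                # byte offset of that sector
    print(lba_after_32bit_truncation(target))  # sector the write lands on under this model
    print(needs_16_byte_cdb(target, 1))
```

A check along the lines of `needs_16_byte_cdb` is one plausible shape for a fix: requests that reach sector 2^32 or beyond would be issued with a 16-byte CDB, which has a 64-bit LBA field.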
FreeBSD 9.1 stability/robustness?
I need to build up a few servers and routers, and am wondering how FreeBSD 9.1 is shaping up. Will it be likely to be more stable and robust than 9.0-RELEASE? Are there issues that will have to wait until 9.2-RELEASE to be fixed? Opinions welcome.

--Brett Glass
Re: FreeBSD 9.1 stability/robustness?
On 1 November 2012, at 19:14, Brett Glass wrote:

> I need to build up a few servers and routers, and am wondering how FreeBSD 9.1 is shaping up. Will it be likely to be more stable and robust than 9.0-RELEASE?

It appears to be for me. I had problems with 9.0 not reading CDs and frequently rebooting with no error messages. I have upgraded to 9.1-RC2 and it now reads CDs just fine, and has not rebooted. However, the uptimes with 9.0 ranged from about 2 hours to 30 days. I have only had 9.1-RC2 running for a couple of weeks, so I have not declared victory yet. It has already been running longer than most of those uptimes.

> Are there issues that will have to wait until 9.2-RELEASE to be fixed? Opinions welcome.

I have no information on this.