Re: Amd64 Unstable Areca
Phillip N. wrote:
> I'm getting the following on RELENG_6 from Apr 4. Is this related to the areca driver? Thanks.

It's not directly coming from the areca driver. It could be that there is some memory or disk corruption that is triggering these panics, but that's just a wild guess.

Scott
___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: Amd64 Unstable Areca
I'm getting the following on RELENG_6 from Apr 4. Is this related to the areca driver? Thanks.

[EMAIL PROTECTED] /usr/obj/usr/src/sys/WORM]# kgdb kernel.debug /var/crash/vmcore.0
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd".

Unread portion of the kernel message buffer:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x38
fault code = supervisor write data, page not present
instruction pointer = 0x8:0x8046c33f
stack pointer = 0x10:0xb776c790
frame pointer = 0x10:0xa0cf1358
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 1793 (bmon)
trap number = 12
panic: page fault
cpuid = 0
Uptime: 1d19h58m47s
Dumping 2047 MB (2 chunks)
chunk 0: 1MB (156 pages) ...
ok
chunk 1: 2047MB (524016 pages) 2031 2015 1999 1983 1967 1951 1935 1919 1903 1887 1871 1855 1839 1823 1807 1791 1775 1759 1743 1727 1711 1695 1679 1663 1647 1631 1615 1599 1583 1567

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address = 0xa2
fault code = supervisor write data, page not present
instruction pointer = 0x8:0x805c67a0
stack pointer = 0x10:0xb499d970
frame pointer = 0x10:0xff0005e301f0
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 40 (syncer)
trap number = 12

1551 1535 1519 1503 1487 1471 1455 1439 1423 1407 1391 1375 1359 1343 1327 1311 1295 1279 1263 1247 1231 1215 1199 1183 1167 1151 1135 1119 1103 1087 1071 1055 1039 1023 1007 991 975 959 943 927 911 895 879 863 847 831 815 799 783 767 751 735 719 703 687 671 655 639 623 607 591 575 559 543 527 511 495 479 463 447 431 415 399 383 367 351 335 319 303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15

#0 doadump () at pcpu.h:172
172 __asm __volatile("movq %%gs:0,%0" : "=r" (td));
(kgdb) bt
#0 doadump () at pcpu.h:172
#1 0x0004 in ?? ()
#2 0x804156d7 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#3 0x80415d71 in panic (fmt=0xff0072a78980 "X�Kl") at /usr/src/sys/kern/kern_shutdown.c:565
#4 0x8063a0cf in trap_fatal (frame=0xff0072a78980, eva=18446742976014820184) at /usr/src/sys/amd64/amd64/trap.c:668
#5 0x8063a44c in trap_pfault (frame=0xb776c6e0, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:580
#6 0x8063a703 in trap (frame=
    {tf_rdi = -1597041832, tf_rsi = -1097588045440, tf_rdx = -1097405784824, tf_rcx = 4, tf_r8 = -1098265798384, tf_r9 = -1098676372624, tf_rax = 4, tf_rbx = 4, tf_rbp = -1597041832, tf_r10 = -1097492439552, tf_r11 = -1097588045440, tf_r12 = 0, tf_r13 = -1597041832, tf_r14 = -1098676373072, tf_r15 = -1597041832, tf_trapno = 12, tf_addr = 56, tf_flags = -2142769062, tf_err = 2, tf_rip = -2142846145, tf_cs = 8, tf_rflags = 66050, tf_rsp = -1216952416, tf_ss = 16}) at /usr/src/sys/amd64/amd64/trap.c:353
#7 0x80622dab in calltrap () at /usr/src/sys/amd64/amd64/exception.S:168
#8 0x8046c33f in vfs_setdirty (bp=0xa0cf1358) at atomic.h:139
#9 0x80470773 in bdwrite (bp=0xa0cf1358) at /usr/src/sys/kern/vfs_bio.c:963
---Type to continue, or q to quit---
#10 0x805a3514 in ffs_write (ap=0xb776ca30) at /usr/src/sys/ufs/ffs/ffs_vnops.c:772
#11 0x80691c4b in VOP_WRITE_APV (vop=0x808e5900, a=0xb776ca30) at vnode_if.c:698
#12 0x8049137a in vn_write (fp=0xff004b7a7708, uio=0xb776cb50, active_cred=0xff007d849d08, flags=0, td=0xff0072a78980) at vnode_if.h:372
#13 0x80440d67 in dofilewrite (td=0xff0072a78980, fd=3, fp=0xff004b7a7708, auio=0xb776cb50, offset=-1098265798384, flags=0) at file.h:253
#14 0x804410d0 in kern_writev (td=0xff0072a78980, fd=3, auio=0xb776cb50) at /usr/src/sys/kern/sys_generic.c:402
#15 0x804411c8 in write (td=0xa0cf1358, uap=0xff0072a78980) at /usr/src/sys/kern/sys_generic.c:326
#16 0x8063af81 in syscall (frame=
    {tf_rdi = 3, tf_rsi = 5382144, tf_rdx = 4096, tf_rcx
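A note on reading the trap frame above: kgdb prints the saved registers as signed decimal integers, which makes them hard to match against the hexadecimal values in the panic message. The following is a minimal sketch (the helper name `to_u64_hex` is made up for illustration) that reinterprets two of the quoted values as unsigned 64-bit numbers. The result for `tf_rip` ends in 8046c33f, the instruction pointer reported in the panic; the archived message seems to have lost the leading `ffffffff` of kernel addresses, though that is an assumption about the archive, not something the thread states.

```python
MASK64 = (1 << 64) - 1  # two's-complement mask for a 64-bit register


def to_u64_hex(value: int) -> str:
    """Render a signed 64-bit integer as an unsigned hexadecimal string."""
    return hex(value & MASK64)


# Values copied from the trap frame in frame #6 of the backtrace above.
tf_rip = -2142846145
tf_addr = 56

print("tf_rip  =", to_u64_hex(tf_rip))   # faulting instruction (vfs_setdirty)
print("tf_addr =", to_u64_hex(tf_addr))  # faulting virtual address
```

A fault address as small as 0x38 is what one would typically expect from a write through a NULL structure pointer at a small member offset, which fits Scott's guess that corrupted data is being handed to otherwise healthy code.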
Re: Amd64 Unstable Areca
On Fri, 30-03-2007 at 17:09 -0500, Nikolas Britton wrote:
> Have you tried making it crash?
>

Well, I have the same process running in cron as I did before, and it's not crashing. raidtest seems to run just fine [1]. Maybe you could recommend a better stress case?

[1] raid5 device:
Read 5 requests from raidtest.data.
Number of READ requests: 24900.
Number of WRITE requests: 25100.
Number of bytes to transmit: 3287726080.
Number of processes: 10.
Bytes per second: 9979088
Requests per second: 151

Thanks!
--
KillFill <[EMAIL PROTECTED]>
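For a sense of how light this raidtest run was, the reported figures can be cross-checked with a little arithmetic. This is only a sketch using the numbers quoted above; the variable names are illustrative and nothing here comes from raidtest itself.

```python
# Figures copied from the raidtest summary quoted above.
reads = 24900
writes = 25100
total_bytes = 3287726080
bytes_per_second = 9979088

total_requests = reads + writes
duration_s = total_bytes / bytes_per_second           # wall-clock length of the run
avg_request_kib = total_bytes / total_requests / 1024  # mean request size
requests_per_second = total_requests / duration_s

print(f"requests: {total_requests}")
print(f"duration: {duration_s:.1f} s")
print(f"average request: {avg_request_kib:.1f} KiB")
print(f"throughput: {bytes_per_second / 2**20:.1f} MiB/s, "
      f"{requests_per_second:.0f} requests/s")
```

The run lasts only a few minutes at roughly 10 MB/s, so it may simply not stress the controller as hard as the cron job that preceded the earlier crashes.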
Re: Amd64 Unstable Areca
Nikolas Britton wrote:
> On 3/30/07, Scott Long <[EMAIL PROTECTED]> wrote:
>> Nikolas Britton wrote:
>>> On 3/30/07, Phillip Neumann <[EMAIL PROTECTED]> wrote:
>>>> On Sun, 25-03-2007 at 10:11 -0600, Scott Long wrote:
>>>>> Please try the following patch against the latest 6-STABLE driver
>>>>> sources: http://people.freebsd.org/~scottl/arcmsr.simq.diff.
>>>>>
>>>>> Scott
>>>>
>>>> Just in case you mind, the problem does not seem to be present with the
>>>> patch.
>>>>
>>>> Haven't had a crash in days.
>>>>
>>>> (When one occurs, I'll notify.)
>>>>
>>>> Thanks!!
>>>
>>> Have you tried making it crash?
>>
>> Erich Chen pointed out a problem with the patch I generated, but I think it's
>> mostly harmless. Good to know that it seems to be helping the problem.
>>
>> Scott
>
> And??? Don't leave us hanging, what's the problem?

My patch added a line with "bus_dmamap_unload(...)" that isn't needed. I don't believe that it'll cause problems, though. The whole line will be removed when I commit the patch to CVS.

Scott
Re: Amd64 Unstable Areca
On 3/30/07, Scott Long <[EMAIL PROTECTED]> wrote:
> Nikolas Britton wrote:
>> On 3/30/07, Phillip Neumann <[EMAIL PROTECTED]> wrote:
>>> On Sun, 25-03-2007 at 10:11 -0600, Scott Long wrote:
>>>> Please try the following patch against the latest 6-STABLE driver
>>>> sources: http://people.freebsd.org/~scottl/arcmsr.simq.diff.
>>>>
>>>> Scott
>>>
>>> Just in case you mind, the problem does not seem to be present with the
>>> patch.
>>>
>>> Haven't had a crash in days.
>>>
>>> (When one occurs, I'll notify.)
>>>
>>> Thanks!!
>>
>> Have you tried making it crash?
>
> Erich Chen pointed out a problem with the patch I generated, but I think it's
> mostly harmless. Good to know that it seems to be helping the problem.
>
> Scott

And??? Don't leave us hanging, what's the problem?
Re: Amd64 Unstable Areca
Nikolas Britton wrote:
> On 3/30/07, Phillip Neumann <[EMAIL PROTECTED]> wrote:
>> On Sun, 25-03-2007 at 10:11 -0600, Scott Long wrote:
>>> Please try the following patch against the latest 6-STABLE driver
>>> sources: http://people.freebsd.org/~scottl/arcmsr.simq.diff.
>>>
>>> Scott
>>
>> Just in case you mind, the problem does not seem to be present with the
>> patch.
>>
>> Haven't had a crash in days.
>>
>> (When one occurs, I'll notify.)
>>
>> Thanks!!
>
> Have you tried making it crash?

Erich Chen pointed out a problem with the patch I generated, but I think it's mostly harmless. Good to know that it seems to be helping the problem.

Scott
Re: Amd64 Unstable Areca
On 3/30/07, Phillip Neumann <[EMAIL PROTECTED]> wrote:
> On Sun, 25-03-2007 at 10:11 -0600, Scott Long wrote:
>> Please try the following patch against the latest 6-STABLE driver
>> sources: http://people.freebsd.org/~scottl/arcmsr.simq.diff.
>>
>> Scott
>
> Just in case you mind, the problem does not seem to be present with the
> patch.
>
> Haven't had a crash in days.
>
> (When one occurs, I'll notify.)
>
> Thanks!!

Have you tried making it crash?
Re: Amd64 Unstable Areca
On Sun, 25-03-2007 at 10:11 -0600, Scott Long wrote:
> Please try the following patch against the latest 6-STABLE driver
> sources: http://people.freebsd.org/~scottl/arcmsr.simq.diff.
>
> Scott

Just in case you mind, the problem does not seem to be present with the patch.

Haven't had a crash in days.

(When one occurs, I'll notify.)

Thanks!!
--
Phillip Neumann <[EMAIL PROTECTED]>
Re: Amd64 Unstable Areca
On Sun, 25-03-2007 at 10:11 -0600, Scott Long wrote:
> Nikolas Britton wrote:
> > Yeah, our hardware is nearly identical. I don't remember what I did to
> > my custom driver, I know I fixed some syntax errors and merged in
> > changes that were made on top of the 1.20.00.02 code base. I'm not
> > sure but I think most of those changes were thrown out with the import
> > of 1.20.00.13 and 14.
> >
> > Yes ..0.13 is the driver from 6.2-RELEASE-p2 and no I don't see any
> > g_vfs errors, I do see a crap load of httpd errors that someone needs
> > to investigate, lucky me. :-/ It's not likely I'd see them anyhow
> > because the business slows way down during this time of year:
>
> Please try the following patch against the latest 6-STABLE driver
> sources: http://people.freebsd.org/~scottl/arcmsr.simq.diff.
>
> Scott

Scott, unfortunately the box crashed again yesterday with your patches applied. I don't know if the problem is FS related, though. I still haven't done what Jan suggested, i.e. reformat the partitions. I may do this tonight.

Thanks!

Unread portion of the kernel message buffer:
<6>pid 94219 (sh), uid 0: exited on signal 11
<6>pid 94241 (sh), uid 0: exited on signal 11

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x38
fault code = supervisor write data, page not present
instruction pointer = 0x8:0x805bda49
stack pointer = 0x10:0xb7aab9a0
frame pointer = 0x10:0x4
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 94245 (sh)
trap number = 12
panic: page fault
cpuid = 0
Uptime: 5h19m40s
Dumping 2047 MB (2 chunks)
chunk 0: 1MB (156 pages) ...
ok
chunk 1: 2047MB (524016 pages) 2031 2015 1999 1983 1967 1951 1935 1919 1903 1887 1871 1855 1839 1823 1807 1791 1775 1759 1743 1727 1711 1695 1679 1663 1647 1631 1615 1599 1583 1567 1551 1535 1519 1503 1487 1471 1455 1439 1423 1407 1391 1375 1359 1343 1327 1311 1295 1279 1263 1247 1231 1215 1199 1183 1167 1151 1135 1119 1103 1087 1071 1055 1039 1023 1007 991 975 959 943 927 911 895 879 863 847 831 815 799 783 767 751 735 719 703 687 671 655 639 623 607 591 575 559 543 527 511 495 479 463 447 431 415 399 383 367 351 335 319 303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15

#0 doadump () at pcpu.h:172
172 __asm __volatile("movq %%gs:0,%0" : "=r" (td));
(kgdb) bt
#0 doadump () at pcpu.h:172
#1 0x0004 in ?? ()
#2 0x80409447 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#3 0x80409ae1 in panic (fmt=0xff003b32e260 "�\226�A") at /usr/src/sys/kern/kern_shutdown.c:565
#4 0x8062b17f in trap_fatal (frame=0xff003b32e260, eva=18446742975299032752) at /usr/src/sys/amd64/amd64/trap.c:668
#5 0x8062b4fc in trap_pfault (frame=0xb7aab8f0, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:580
#6 0x8062b7b3 in trap (frame=
    {tf_rdi = -1097399663144, tf_rsi = 130, tf_rdx = -2140101835, tf_rcx = -1099185419872, tf_r8 = -2137937120, tf_r9 = -1098518437280, tf_rax = -1098518437280, tf_rbx = 0, tf_rbp = 4, tf_r10 = -2137799976, tf_r11 = -1098518437280, tf_r12 = -1097399663144, tf_r13 = -2140101835, tf_r14 = -1213547824, tf_r15 = -1097492989984, tf_trapno = 12, tf_addr = 56, tf_flags = -1098518437280, tf_err = 2, tf_rip = -2141463991, tf_cs = 8, tf_rflags = 66050, tf_rsp = -1213548112, tf_ss = 0}) at /usr/src/sys/amd64/amd64/trap.c:353
#7 0x806167eb in calltrap () at /usr/src/sys/amd64/amd64/exception.S:168
#8 0x805bda49 in vm_page_sleep_if_busy (m=0xff007de205d8, also_m_busy=130, msg=0x8070a335 "vmpfw") at atomic.h:139
#9 0x805b09e7 in vm_fault (map=0xff0013532440, vaddr=140737488338944, fault_type=2 '\002', fault_flags=8) at /usr/src/sys/vm/vm_fault.c:397
#10 0x8062b3b7 in trap_pfault (frame=0xb7aabc40, usermode=1) at /usr/src/sys/amd64/amd64/trap.c:557
#11 0x8062b943 in trap (frame=
    {tf_rdi = 5358048, tf_rsi = 0, tf_rdx = 1, tf_rcx = 34369249980, tf_r8 = -1098728259232, tf_r9 = 140737488343064, tf_rax = 0, tf_rbx = 140737488343408, tf_rbp = 5423104, tf_r10 = -2035732464, tf_r11 = 0, tf_r12 = 0, tf_r13 = 2, tf_r14 = 1, tf_r15 = 5399624, tf_trapno = 12, tf_addr = 140737488343016, tf_flags = 12, tf_err = 7, tf_rip = 4254692, tf_cs = 43, tf_rflags = 66118, tf_rsp = 140737488343024, tf_ss = 35}) at /usr/src/sys/amd64/amd64/trap.c:283
---Type to continue, or q to quit---
#12 0x806167eb in calltrap () at /usr/src/sys/amd64/amd64/exception.S:168
#13 0x0040ebe4 in ?? ()
Previous frame inner to this frame (corrupt stack?)
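The two trap frames in this backtrace carry different `tf_err` values (2 in the kernel-mode fault, 7 in the user-mode fault that preceded it). As a minimal sketch, the low three bits of the x86 page-fault error code can be decoded as shown below; the function name and dictionary keys are illustrative, not taken from the FreeBSD sources.

```python
def decode_pf_error(code: int) -> dict:
    """Decode the low bits of an x86 page-fault error code.

    bit 0: 1 = protection violation on a present page, 0 = page not present
    bit 1: 1 = the access was a write, 0 = a read
    bit 2: 1 = the fault happened in user mode, 0 = supervisor mode
    """
    return {
        "page_present": bool(code & 0x1),
        "write": bool(code & 0x2),
        "user_mode": bool(code & 0x4),
    }


# tf_err values copied from frames #6 and #11 of the backtrace above.
for tf_err in (2, 7):
    print(tf_err, decode_pf_error(tf_err))
```

The value 2 decodes to a supervisor write to a non-present page, matching the "supervisor write data, page not present" line in the panic message. The value 7 is a user-mode write to a present page, which looks like an ordinary user fault being serviced when the kernel then tripped over the bad pointer in `vm_page_sleep_if_busy`.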
Re: Amd64 Unstable Areca
On 3/25/07, Scott Long <[EMAIL PROTECTED]> wrote:
> Nikolas Britton wrote:
> > Yeah, our hardware is nearly identical. I don't remember what I did to
> > my custom driver, I know I fixed some syntax errors and merged in
> > changes that were made on top of the 1.20.00.02 code base. I'm not
> > sure but I think most of those changes were thrown out with the import
> > of 1.20.00.13 and 14.
> >
> > Yes ..0.13 is the driver from 6.2-RELEASE-p2 and no I don't see any
> > g_vfs errors, I do see a crap load of httpd errors that someone needs
> > to investigate, lucky me. :-/ It's not likely I'd see them anyhow
> > because the business slows way down during this time of year:
>
> Please try the following patch against the latest 6-STABLE driver
> sources: http://people.freebsd.org/~scottl/arcmsr.simq.diff.

Thanks Scott... Unfortunately, or fortunately depending on your outlook, spring break ends today and I've got a crap load of studying to do, so no time to play guinea pig... But because I'm such a nice guy I've compiled i386 and amd64 binaries so the others can play along. I compiled everything against 6.2-RELEASE-p2... Download them here:

http://www.nbritton.org/uploads/areca/fb62/

Do you have any testing scripts so the group can do repeatable comparison tests using the old and new kernel modules? Thanks.
Oh, and here's a file list, but Gmail is probably going to mutilate it:

arcmsr.c.1824: ASCII C program text
arcmsr.h.1142: ISO-8859 C program text
arcmsr.kld.fb62.i386.032607: ELF 32-bit LSB relocatable, Intel 80386, version 1 (FreeBSD), not stripped
arcmsr.ko.debug.fb62.amd64.032607: ELF 64-bit LSB relocatable, AMD x86-64, version 1 (FreeBSD), not stripped
arcmsr.ko.fb62.amd64.032607: ELF 64-bit LSB relocatable, AMD x86-64, version 1 (FreeBSD), not stripped
arcmsr.ko.fb62.i386.032607: ELF 32-bit LSB shared object, Intel 80386, version 1 (FreeBSD), not stripped
arcmsr.o.fb62.amd64.032607: ELF 64-bit LSB relocatable, AMD x86-64, version 1 (FreeBSD), not stripped
arcmsr.o.fb62.i386.032607: ELF 32-bit LSB relocatable, Intel 80386, version 1 (FreeBSD), not stripped
i386_build_log.txt: ASCII text, with very long lines
Re: Amd64 Unstable Areca
Nikolas Britton wrote:
> Yeah, our hardware is nearly identical. I don't remember what I did to
> my custom driver, I know I fixed some syntax errors and merged in
> changes that were made on top of the 1.20.00.02 code base. I'm not
> sure but I think most of those changes were thrown out with the import
> of 1.20.00.13 and 14.
>
> Yes ..0.13 is the driver from 6.2-RELEASE-p2 and no I don't see any
> g_vfs errors, I do see a crap load of httpd errors that someone needs
> to investigate, lucky me. :-/ It's not likely I'd see them anyhow
> because the business slows way down during this time of year:

Please try the following patch against the latest 6-STABLE driver sources: http://people.freebsd.org/~scottl/arcmsr.simq.diff.

Scott
Re: Amd64 Unstable Areca
On 3/24/07, Jan Mikkelsen <[EMAIL PROTECTED]> wrote:
> Nikolas Britton wrote:
> > On 3/24/07, Jan Mikkelsen <[EMAIL PROTECTED]> wrote:
> > > Hi,
> > >
> > > Nikolas Britton wrote:
> > > > If that doesn't work move back down to 1.20.00.12:
> > > > http://www.nbritton.org/uploads/areca/
> > >
> > > I could consistently make 1.20.00.12 corrupt data. If you are going to go
> > > back, that's probably a bad choice. 1.20.00.02 didn't seem to have
> > > corruption problems.
> > >
> >
> > The 1.20.00.12 driver I pointed to was a custom hack I did for my
> > servers. It worked fine for the 7 months I was using it... I'm
> > assuming we're talking about I/O load, the servers rarely see high cpu
> > loads... the hardware:
> >
> > http://www.supermicro.com/products/motherboard/Xeon1333/5000P/X7DBE.cfm
> >
> > arcmsr0: ARECA RAID ADAPTER0: Driver Version 1.20.00.13 2006-8-18
> > ARECA RAID ADAPTER0: FIRMWARE VERSION V1.41 2006-5-24
> > pass1 at arcmsr0 bus 0 target 16 lun 0
> > pass1: Fixed Processor SCSI-0 device
> > da0 at arcmsr0 bus 0 target 0 lun 0
> > da0: Fixed Direct Access SCSI-3 device
> > da0: 166.666MB/s transfers (83.333MHz, offset 32, 16bit), Tagged
> > Queueing Enabled
> > da0: 1430511MB (2929687040 512 byte sectors: 255H 63S/T 182364C)
> >
> > I just upgraded them for the DST change. :-/
>
> Interesting. How did you modify the 1.20.00.12 driver from Areca?
>
> Is the 1.20.00.13 driver mentioned above the one from 6.2-RELEASE? If so, do
> you ever see g_vfs_done errors on this machine when it is under heavy I/O
> load?
>
> From the machine I used to test the corruption issue (currently running
> 6-STABLE):
>
> arcmsr0: mem 0xc850-0xc8500fff,0xc8c0-0xc8ff irq 16 at device 14.0 on pci10
> ARECA RAID ADAPTER0: Driver Version 1.20.00.14 2007-2-05
> ARECA RAID ADAPTER0: FIRMWARE VERSION V1.42 2006-10-13
> pass4 at arcmsr0 bus 0 target 16 lun 0
> pass4: Fixed Processor SCSI-0 device
>
> This is an ARC-1220 in a Supermicro X7DB8 based machine.

Yeah, our hardware is nearly identical. I don't remember what I did to my custom driver. I know I fixed some syntax errors and merged in changes that were made on top of the 1.20.00.02 code base. I'm not sure, but I think most of those changes were thrown out with the import of 1.20.00.13 and 14.

Yes, ..0.13 is the driver from 6.2-RELEASE-p2, and no, I don't see any g_vfs errors. I do see a crap load of httpd errors that someone needs to investigate, lucky me. :-/ It's not likely I'd see them anyhow because the business slows way down during this time of year:

uptime:
11:53PM up 3 days, 8:56, 1 user, load averages: 0.00, 0.00, 0.00

I remember seeing that g_vfs error one time when one of the SATA cables came loose. All hell broke loose and FreeBSD had a major brain fart. I've had several other SATA cable 'incidents', but that's the only time FreeBSD croaked (with an Areca card). The cable incidents happen with all our SATA RAID gear and are not specific to Areca products. I've come to the conclusion that the SATA cables simply vibrate loose. I'm debating at the moment whether we should move to SAS multi-lane backplanes. The biggest problem I foresee is one of the multi-lane cables coming loose; if that happened the array would be toast... anyhow...

I simply rebooted the server, rebuilt the array, verified the array, and ran fsck. The file system corruption was typical of a power failure during a disk write.
RE: Amd64 Unstable Areca
Nikolas Britton wrote:
> On 3/24/07, Jan Mikkelsen <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > Nikolas Britton wrote:
> > > If that doesn't work move back down to 1.20.00.12:
> > > http://www.nbritton.org/uploads/areca/
> >
> > I could consistently make 1.20.00.12 corrupt data. If you are going to go
> > back, that's probably a bad choice. 1.20.00.02 didn't seem to have
> > corruption problems.
> >
>
> The 1.20.00.12 driver I pointed to was a custom hack I did for my
> servers. It worked fine for the 7 months I was using it... I'm
> assuming we're talking about I/O load, the servers rarely see high cpu
> loads... the hardware:
>
> http://www.supermicro.com/products/motherboard/Xeon1333/5000P/X7DBE.cfm
>
> arcmsr0: ARECA RAID ADAPTER0: Driver Version 1.20.00.13 2006-8-18
> ARECA RAID ADAPTER0: FIRMWARE VERSION V1.41 2006-5-24
> pass1 at arcmsr0 bus 0 target 16 lun 0
> pass1: Fixed Processor SCSI-0 device
> da0 at arcmsr0 bus 0 target 0 lun 0
> da0: Fixed Direct Access SCSI-3 device
> da0: 166.666MB/s transfers (83.333MHz, offset 32, 16bit), Tagged
> Queueing Enabled
> da0: 1430511MB (2929687040 512 byte sectors: 255H 63S/T 182364C)
>
> I just upgraded them for the DST change. :-/

Interesting. How did you modify the 1.20.00.12 driver from Areca?

Is the 1.20.00.13 driver mentioned above the one from 6.2-RELEASE? If so, do you ever see g_vfs_done errors on this machine when it is under heavy I/O load?

From the machine I used to test the corruption issue (currently running 6-STABLE):

arcmsr0: mem 0xc850-0xc8500fff,0xc8c0-0xc8ff irq 16 at device 14.0 on pci10
ARECA RAID ADAPTER0: Driver Version 1.20.00.14 2007-2-05
ARECA RAID ADAPTER0: FIRMWARE VERSION V1.42 2006-10-13
pass4 at arcmsr0 bus 0 target 16 lun 0
pass4: Fixed Processor SCSI-0 device

This is an ARC-1220 in a Supermicro X7DB8 based machine.

Jan.
Re: Amd64 Unstable Areca
On 3/24/07, Jan Mikkelsen <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Nikolas Britton wrote:
> > If that doesn't work move back down to 1.20.00.12:
> > http://www.nbritton.org/uploads/areca/
>
> I could consistently make 1.20.00.12 corrupt data. If you are going to go
> back, that's probably a bad choice. 1.20.00.02 didn't seem to have
> corruption problems.

The 1.20.00.12 driver I pointed to was a custom hack I did for my servers. It worked fine for the 7 months I was using it... I'm assuming we're talking about I/O load, the servers rarely see high cpu loads... the hardware:

http://www.supermicro.com/products/motherboard/Xeon1333/5000P/X7DBE.cfm

arcmsr0: Fixed Processor SCSI-0 device
da0 at arcmsr0 bus 0 target 0 lun 0
da0: Fixed Direct Access SCSI-3 device
da0: 166.666MB/s transfers (83.333MHz, offset 32, 16bit), Tagged Queueing Enabled
da0: 1430511MB (2929687040 512 byte sectors: 255H 63S/T 182364C)

I just upgraded them for the DST change. :-/
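The `da0` probe line above packs three descriptions of the same array into one line: a size in MB, a sector count, and a synthetic CHS geometry. A short sketch, using only the numbers quoted in that line, shows how they relate; the variable names are illustrative.

```python
# Figures copied from the da0 probe line quoted above.
sectors = 2929687040
sector_size = 512
heads, sectors_per_track, cylinders = 255, 63, 182364

size_bytes = sectors * sector_size
size_mib = size_bytes // (1024 * 1024)  # the "MB" in the probe line is 2**20 bytes

# The CHS geometry is rounded down to a whole number of cylinders.
chs_sectors = heads * sectors_per_track * cylinders
leftover = sectors - chs_sectors

print(f"{size_mib} MB ({size_bytes / 1e12:.2f} TB decimal)")
print(f"CHS covers {chs_sectors} sectors, {leftover} left over")
```

So the volume is about 1.5 TB in decimal units, and the geometry falls short of the true size by less than one cylinder, which is the expected rounding rather than a sign of trouble.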
RE: Amd64 Unstable Areca
KillFill wrote:
> Are you suggesting to newfs the filesystems?

Yes, using the 1.20.00.14 driver (i.e. from 6-STABLE).

I don't know whether the corruption is coming from the current driver, or from blocks the previous driver wrote. Doing a newfs will help if the corrupt blocks came from the previous driver.

Regards,

Jan.
RE: Amd64 Unstable Areca
Hi,

Nikolas Britton wrote:
> If that doesn't work move back down to 1.20.00.12:
> http://www.nbritton.org/uploads/areca/

I could consistently make 1.20.00.12 corrupt data. If you are going to go back, that's probably a bad choice. 1.20.00.02 didn't seem to have corruption problems.

Regards,

Jan.
Re: Amd64 Unstable Areca
KillFill wrote:
> On Fri, 23-03-2007 at 21:01 -0500, Nikolas Britton wrote:
>> A newer version of the driver has been released to fix this problem (I
>> think):
>> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/arcmsr/
>>
>> If that doesn't work move back down to 1.20.00.12:
>> http://www.nbritton.org/uploads/areca/
>>
>> Added erich and scott to the cc list.
>
> Well, you may notice I'm using RELENG_6. I'm using the latest versions for
> RELENG_6:
>
> * $FreeBSD: src/sys/dev/arcmsr/arcmsr.c,v 1.8.2.3
> * $FreeBSD: src/sys/dev/arcmsr/arcmsr.h,v 1.1.4.2
>
> Unfortunately I cannot move the box to -CURRENT to test arcmsr.c 1.20 and
> arcmsr.h 1.4.
>
> Thanks.

It's possible that I might have botched the merge from -CURRENT. Your panics definitely point to data corruption, which implicates the driver. I'll take a look.

Scott
Re: Amd64 Unstable Areca
On Fri, 23-03-2007 at 21:01 -0500, Nikolas Britton wrote:
> A newer version of the driver has been released to fix this problem (I
> think):
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/arcmsr/
>
> If that doesn't work move back down to 1.20.00.12:
> http://www.nbritton.org/uploads/areca/
>
> Added erich and scott to the cc list.
>

Well, you may notice I'm using RELENG_6. I'm using the latest versions for RELENG_6:

* $FreeBSD: src/sys/dev/arcmsr/arcmsr.c,v 1.8.2.3
* $FreeBSD: src/sys/dev/arcmsr/arcmsr.h,v 1.1.4.2

Unfortunately I cannot move the box to -CURRENT to test arcmsr.c 1.20 and arcmsr.h 1.4.

Thanks.
--
KillFill <[EMAIL PROTECTED]>
Re: Amd64 Unstable Areca
A newer version of the driver has been released to fix this problem (I think):
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/arcmsr/

If that doesn't work move back down to 1.20.00.12:
http://www.nbritton.org/uploads/areca/

Added erich and scott to the cc list.

On 3/22/07, Phillip Neumann <[EMAIL PROTECTED]> wrote:

Dear FreeBSD-stable...

My amd64 box is not very stable. In its hardware list, you can see there is an areca 1210 card, which suffers from the 6.2-RELEASE erratum (crash under high load).

Last week or so, I saw a commit where the areca bugs were fixed, so I updated the system.

I can still see the machine crashing under load. Attached are dmesg -a and a simple 'bt' from kgdb.

Sometimes (under load) I see this message:
Interrupt storm detected on "swi2:"; throttling interrupt source

I get the same behaviour with sched_bsd or sched_ule.

Is this info useful to determine where the problem is? If so, where is it? :-)
Has this something to do with the fs? Or the scheduler?

Thanks!

killfill.

[EMAIL PROTECTED] /usr/obj/usr/src/sys/WORM]# kgdb kernel.debug /var/crash/vmcore.1
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd".

Unread portion of the kernel message buffer:
panic: handle_workitem_remove: lost inodedep
cpuid = 0
Uptime: 2h19m9s
Dumping 2047 MB (2 chunks)
chunk 0: 1MB (156 pages) ...
ok
chunk 1: 2047MB (524016 pages) 2031 2015 1999 1983 1967 1951 1935 1919 1903 1887 1871 1855 1839 1823 1807 1791 1775 1759 1743 1727 1711 1695 1679 1663 1647 1631 1615 1599 1583 1567 1551 1535 1519 1503 1487 1471 1455 1439 1423 1407 1391 1375 1359 1343 1327 1311 1295 1279 1263 1247 1231 1215 1199 1183 1167 1151 1135 1119 1103 1087 1071 1055 1039 1023 1007 991 975 959 943 927 911 895 879 863 847 831 815 799 783 767 751 735 719 703 687 671 655 639 623 607 591 575 559 543 527 511 495 479 463 447 431 415 399 383 367 351 335 319 303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15

#0 doadump () at pcpu.h:172
172 __asm __volatile("movq %%gs:0,%0" : "=r" (td));
(kgdb) bt
#0 doadump () at pcpu.h:172
#1 0x0004 in ?? ()
#2 0x804085e7 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#3 0x80408c81 in panic (fmt=0xff007a9cd980 "�VTx") at /usr/src/sys/kern/kern_shutdown.c:565
#4 0x80589fc6 in handle_workitem_remove (dirrem=0xff001a6c7b40, xp=0x0) at /usr/src/sys/ufs/ffs/ffs_softdep.c:3599
#5 0x8058a3dc in process_worklist_item (mp=0xff0031482630, flags=0) at /usr/src/sys/ufs/ffs/ffs_softdep.c:962
#6 0x8058fd3d in softdep_process_worklist (mp=0xff0031482630, full=0) at /usr/src/sys/ufs/ffs/ffs_softdep.c:851
#7 0x80590031 in softdep_flush () at /usr/src/sys/ufs/ffs/ffs_softdep.c:762
#8 0x803ed647 in fork_exit (callout=0x8058fee0 , arg=0x0, frame=0xb49abc50) at /usr/src/sys/kern/kern_fork.c:821
#9 0x80615b0e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:394
#10 0x in ?? ()
#11 0x in ?? ()
#12 0x0001 in ?? ()
#13 0x in ?? ()
#14 0x in ?? ()
#15 0x in ?? ()
#16 0x in ?? ()
#17 0x in ?? ()
#18 0x in ?? ()
#19 0x in ?? ()
#20 0x in ?? ()
#21 0x in ?? ()
#22 0x in ?? ()
#23 0x in ?? ()
#24 0x in ?? ()
#25 0x in ?? ()
#26 0x in ?? ()
#27 0x in ?? ()
#28 0x in ?? ()
#29 0x in ?? ()
#30 0x in ?? ()
#31 0x in ?? ()
#32 0x in ?? ()
---Type to continue, or q to quit---
#33 0x in ?? ()
#34 0x in ?? ()
#35 0x in ?? ()
#36 0x in ?? ()
#37 0x in ?? ()
#38 0x in ?? ()
#39 0x in ?? ()
#40 0x in ?? ()
#41 0x in ?? ()
#42 0x00b7d000 in ?? ()
#43 0x in ?? ()
#44 0xff002d080600 in ?? ()
#45 0x in ?? ()
#46 0xff00785456b0 in ?? ()
#47 0xff007a9f2000 in ?? ()
#48 0xb49ab878 in ?? ()
#49 0xff007a9cd980 in ?? ()
#50 0x8041ed56 in sched_switch (td=0x0, newtd=0x0, flags=1) at /usr/src/sys/kern/sched_4bsd.c:973
#51 0x in ?? ()
#52 0x in ?? ()
#53 0x in ?? ()
#54 0x00
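Most of the frames in a dump like this one are unresolved `?? ()` entries below `fork_trampoline`, and only the first handful say anything about the panic. As a rough sketch of how one might filter a pasted backtrace down to the named frames, the snippet below uses a regular expression; the pattern, the function name `named_frames`, and the abbreviated sample (with the unprintable `fmt` string replaced by "...") are illustrative assumptions, not part of kgdb.

```python
import re

# "#N [0xADDR in ]function (" ; the address part is optional because frame #0
# is printed without one, and may be empty where the archive stripped digits.
FRAME_RE = re.compile(r"#(\d+)\s+(?:0x[0-9a-fA-F]*\s+in\s+)?(\S+)\s+\(")


def named_frames(backtrace: str) -> list:
    """Return (frame number, function name) pairs, skipping unresolved '??' frames."""
    frames = []
    for match in FRAME_RE.finditer(backtrace):
        number, func = int(match.group(1)), match.group(2)
        if func != "??":
            frames.append((number, func))
    return frames


# Abbreviated sample taken from the backtrace above.
sample = """
#0 doadump () at pcpu.h:172
#1 0x0004 in ?? ()
#2 0x804085e7 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#3 0x80408c81 in panic (fmt=0xff007a9cd980 "...") at /usr/src/sys/kern/kern_shutdown.c:565
#4 0x80589fc6 in handle_workitem_remove (dirrem=0xff001a6c7b40, xp=0x0) at /usr/src/sys/ufs/ffs/ffs_softdep.c:3599
#10 0x in ?? ()
"""

for number, func in named_frames(sample):
    print(f"#{number} {func}")
```

Applied to the full trace, the named frames all sit in the soft updates code (`softdep_flush` down to `handle_workitem_remove`), which is consistent with on-disk filesystem metadata being inconsistent rather than with a fault inside the arcmsr driver itself.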
RE: Amd64 Unstable Areca
Hello...

On Fri, 23-03-2007 at 11:12 +1100, Jan Mikkelsen wrote:
> Hi,
>
> Phillip Neumann wrote:
> > My amd64 box is not very stable.
> > In its hardware list, you can see there is an areca 1210 card, which
> > suffers from the 6.2-RELEASE erratum (crash under high load)
> >
> > Last week or so, I saw a commit where the areca bugs were fixed, so I
> > updated the system.
> >
> > I can still see the machine crashing under load
>
> How heavy is the load? I can't make 6-STABLE crash, but I could make
> 6.2-RELEASE crash. My guess is that you have filesystem corruption
> introduced with the earlier driver which is now causing problems even
> though the driver now works.
>

I'm attaching a new backtrace. When the system crashed, iostat reported a load that was actually not very high:

00 21.80 171 3.64 0.00 0 0.00 16.00 2 0.03 6 0 3 0 91
00 25.00 92 2.24 0.00 0 0.00 0.00 0 0.00 7 0 6 0 87
00 13.99 143 1.95 8.00 5 0.04 0.00 0 0.00 3 0 7 0 90
00 18.25 331 5.89 0.00 0 0.00 16.00 1 0.02 0 0 1 1 97
00 9.84 766 7.36 9.00 8 0.07 23.32 74 1.68 1 0 3 1 95
00 5.05 551 2.72 9.00 8 0.07 11.31 113 1.25 4 0 1 0 94

When it crashed, a jailed Apache was the busiest process in the system. The real load is caused by tinderbox, which uses the disks and one of the two CPUs present in the system.

Are you suggesting to newfs the filesystems? Actually, I used plain 6.2 install disks to do that...

> Have you done an fsck in single user mode, not a background fsck?
>

Yes, sometimes (after a panic) I need to fsck in single user mode...

> > sometimes (under load) I see this message:
> > Interrupt storm detected on "swi2:"; throttling interrupt source
>
> I see this too. It seems to be benign.
>

Okay.

> Regards,
>
> Jan Mikkelsen

How would I get more info? Any tips are welcome.

Thanks!
-- 
KillFill <[EMAIL PROTECTED]>

[EMAIL PROTECTED] /usr/obj/usr/src/sys/WORM]# kgdb kernel.debug /var/crash/vmcore.0
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd".

Unread portion of the kernel message buffer:
/usr/local: bad dir ino 363554 at offset 512: mangled entry
panic: ufs_dirbad: bad dir
cpuid = 0
Uptime: 19h46m33s
Dumping 2047 MB (2 chunks)
chunk 0: 1MB (156 pages) ... ok
chunk 1: 2047MB (524016 pages) 2031 2015 1999 1983 1967 1951 1935 1919 1903 1887 1871 1855 1839 1823 1807 1791 1775 1759 1743 1727 1711 1695 1679 1663 1647 1631 1615 1599 1583 1567 1551 1535 1519 1503 1487 1471 1455 1439 1423 1407 1391 1375 1359 1343 1327 1311 1295 1279 1263 1247 1231 1215 1199 1183 1167 1151 1135 1119 1103 1087 1071 1055 1039 1023 1007 991 975 959 943 927 911 895 879 863 847 831 815 799 783 767 751 735 719 703 687 671 655 639 623 607 591 575 559 543 527 511 495 479 463 447 431 415 399 383 367 351 335 319 303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15

#0 doadump () at pcpu.h:172
172 __asm __volatile("movq %%gs:0,%0" : "=r" (td));
(kgdb) bt
#0 doadump () at pcpu.h:172
#1 0x0004 in ?? ()
#2 0x804085e7 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#3 0x80408c81 in panic (fmt=0xff005e027000 "\b�L^") at /usr/src/sys/kern/kern_shutdown.c:565
#4 0x8059b230 in ufs_dirbad (ip=0x0, offset=0, how=0x0) at /usr/src/sys/ufs/ufs/ufs_lookup.c:599
#5 0x8059b657 in ufs_lookup (ap=0xb76867a0) at /usr/src/sys/ufs/ufs/ufs_lookup.c:287
#6 0x806807fa in VOP_CACHEDLOOKUP_APV (vop=0x0, a=0x0) at vnode_if.c:150
#7 0x80465bd5 in vfs_cache_lookup (ap=0x0) at vnode_if.h:82
#8 0x8068153d in VOP_LOOKUP_APV (vop=0x808d4540, a=0xb7686890) at vnode_if.c:99
#9 0x8046a555 in lookup (ndp=0xb7686990) at vnode_if.h:56
#10 0x8046b285 in namei (ndp=0xb7686990) at /usr/src/sys/kern/vfs_lookup.c:216
#11 0x8047c4b4 in kern_lstat (td=0xff005e027000, path=0x0, pathseg=UIO_USERSPACE, sbp=0xb7686af0) at /usr/src/sys/kern/vfs_syscalls.c:2141
#12 0x8047c9a7 in lstat (td=0x0, uap=0xb7686bc0) at /usr/src/sys/kern/vfs_syscalls.c:2124
#13 0x8062ae91 in syscall (frame= {tf_rdi = 5275880, tf_rsi = 5275760, tf_rdx = 0, tf_rcx = 0, tf_r8 = -140737483074239, tf_r9 = 128, tf_rax = 190, tf_rbx = 5275648, tf_rbp = 5275760, tf_r10 = 0, tf_r11 = 0, tf_r12 = 5259264, tf_r13 = 0, tf_r14 = 5271552, tf_r15 = 0, tf_trapno = 12, tf_addr = 5275744, tf_flags = 0, tf_
RE: Amd64 Unstable Areca
Hi,

Phillip Neumann wrote:
> My amd64 box is not very stable.
> In its hardware list, you can see there is an areca 1210 card, which
> suffers from the 6.2-RELEASE errata (high load crash).
>
> Last week or so, I saw a commit where the areca bugs were fixed, so I
> updated the system.
>
> I can still see the machine crashing under load.

How heavy is the load? I can't make 6-STABLE crash, but I could make 6.2-RELEASE crash. My guess is that you have filesystem corruption introduced with the earlier driver which is now causing problems even though the driver now works.

Have you done an fsck in single user mode, not a background fsck?

> sometimes (under load) I see this message:
> Interrupt storm detected on "swi2:"; throttling interrupt source

I see this too. It seems to be benign.

Regards,

Jan Mikkelsen