Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.
Hello,

> [ ... ]
> > I will test again with "#define PDC_MAXLASTSGSIZE 32*4" (just to see
> > if that makes a difference)
>
> One thing to try is to lose any geom raid; if raid is needed, use ataraid
> instead.

Nope: I did a "newfs ad6" (the disk on the Promise TX4), and an rsync onto it
then panics the same way as the geom_concat case did.

> I'm shuffling boards and controllers here to try to reproduce, so far
> no luck, it "just works(tm)"; it seems to depend quite heavily on the
> "right" combination of possibly marginal HW

Rather than marginal HW, for me it seems closely related to the MB/BIOS as
well (Alexander apparently has about the same setup as I have for this test).
A while ago (using releng_6) I tried the same setup on three different MBs:
ahd-controller + scsi-boot-disk, plus a TX4 with three disks in geom_mirror.
Results:

- on an ASUS A8? board (I have used plenty of them for years without the
  slightest problem; not really expensive, but not marginal IMHO): just look
  at it and it would crash (g_vfs_done)
- on a Tyan S28??: rock stable, unable to crash it however hard I tried
- on some MSI K8 (which I usually run Vista on for testing; this one I really
  bought "as cheap as possible"): would run OK, even under rather heavy load,
  but when pushed really hard it finally delivers the lovely g_vfs_done ...

I vaguely remember from another PR that the Promise card does something with
PCI bursting which fbsd does not detect and/or handle correctly (this is
beyond my simple skills as a dumb tester, but maybe the linux sources contain
a clue about that as well).

Regards, and thanx for your efforts,

Arno
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.
Alexander Sabourenkov <[EMAIL PROTECTED]> writes:

> Arno J. Klaassen wrote:
> > definitely an improvement, but not sufficient (for my setup):
> >
> > amd64-releng_6 on an ASUS A8V UP (box ran rock-stable
> > for years i386-releng_5 with same hardware apart TX4 and
> > drives)
> >
> > from dmesg:
> >
>
> Setup is identical to mine, except for the drives.
> http://lxnt.info/tx4/freebsd/dmesg.text
>
> > Improvement: I now can fsck /dev/concat/data without
> > ad6 being detached
>
> It was that bad? wow.

Yep (often even beyond repair ...)

> > Persistent problem: when I rsync an nfs-mounted disk to /dev/concat/data,
> > I get after about some Gigs of data have been transferred:
> >
>
> That's strange. Are you sure cables, PSU and line power are ok?
> Back in October upgrading PSU halved the error count for me (under linux).

I could try, but I don't believe in it: it is just three disks and an extra
controller instead of the two disks it used to run with ...

> > I will test again with "#define PDC_MAXLASTSGSIZE 32*4" (just to see
> > if that makes a difference)
> >
>
> Please do.

Well, it does: no more scary messages about DMA SETFEATURES etc., though it
now ends in a panic ...
The end of my /var/log/messages (I turned on your printf as well):

Nov 2 22:59:11 charlotte kernel: splitting trailing PRD of 2048 (limit 128)
Nov 2 22:59:11 charlotte last message repeated 15 times
Nov 2 22:59:11 charlotte kernel: splitting trailing PRD of 4096 (limit 128)
Nov 2 22:59:11 charlotte kernel: splitting trailing PRD of 2048 (limit 128)
Nov 2 22:59:11 charlotte kernel: splitting trailing PRD of 2048 (limit 128)
Nov 2 22:59:11 charlotte kernel: splitting trailing PRD of 4096 (limit 128)
Nov 2 22:59:11 charlotte last message repeated 11 times
Nov 2 22:59:11 charlotte kernel: splitting trailing PRD of 2048 (limit 128)
Nov 2 22:59:11 charlotte kernel: splitting trailing PRD of 4096 (limit 128)
Nov 2 23:01:18 charlotte syslogd: kernel boot file is /boot/kernel/kernel
Nov 2 23:01:18 charlotte kernel: splitting trailing PRD of 4096 (limit 128)
Nov 2 23:01:18 charlotte last message repeated 17 times
Nov 2 23:01:18 charlotte kernel: Copyright (c) 1992-2007 The FreeBSD Project.
Nov 2 23:01:18 charlotte kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994

And for the panic:

panic: ffs_clusteralloc: map mismatch
Uptime: 35m27s
Dumping 1023 MB (2 chunks)
  chunk 0: 1MB (159 pages) ... ok
  chunk 1: 1023MB (261808 pages) 1007 991 975 959 943 927 911 895 879 863 847 831 815 799 783 767 751 735 719 703 687 671 655 639 623 607 591 575 559 543 527 511 495 479 463 447 431 415 399 383 367 351 335 319 303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15

#0 doadump () at pcpu.h:172
172 pcpu.h: No such file or directory.
    in pcpu.h
(kgdb) where
#0 doadump () at pcpu.h:172
#1 0x0004 in ?? ()
#2 0x8025e233 in boot (howto=260)
    at /files/bsd/src6/sys/kern/kern_shutdown.c:409
#3 0x8025e836 in panic (fmt=0xff00305bebe0 "")
    at /files/bsd/src6/sys/kern/kern_shutdown.c:565
#4 0x8037ab26 in ffs_clusteralloc (ip=0xff00241ae900, cg=9425, bpref=0, len=5)
    at /files/bsd/src6/sys/ufs/ffs/ffs_alloc.c:1663
#5 0x803769a8 in ffs_hashalloc (ip=0xff00241ae900, cg=395, pref=0, size=5, allocator=0x8037a650 )
    at /files/bsd/src6/sys/ufs/ffs/ffs_alloc.c:1281
#6 0x8037841a in ffs_reallocblks (ap=0x0)
    at /files/bsd/src6/sys/ufs/ffs/ffs_alloc.c:778
#7 0x8042496d in VOP_REALLOCBLKS_APV (vop=0x0, a=0x0) at vnode_if.c:2056
#8 0x802bd70c in cluster_write (vp=0xff0015904ba0, bp=0x9e74ea10, filesize=81920, seqcount=17)
    at vnode_if.h:1052
#9 0x8039662f in ffs_write (ap=0xad243a30)
    at /files/bsd/src6/sys/ufs/ffs/ffs_vnops.c:763
#10 0x804251fb in VOP_WRITE_APV (vop=0x805ad880, a=0xad243a30) at vnode_if.c:698
#11 0x802d9bca in vn_write (fp=0xff002e86da50, uio=0xad243b50, active_cred=0x0, flags=0, td=0xff00305bebe0)
    at vnode_if.h:372
#12 0x802894d7 in dofilewrite (td=0xff00305bebe0, fd=1, fp=0xff002e86da50, auio=0xad243b50, offset=0, flags=0)
    at file.h:253
#13 0x80289840 in kern_writev (td=0xff00305bebe0, fd=1, auio=0xad243b50)
    at /files/bsd/src6/sys/kern/sys_generic.c:402
#14 0x80289938 in write (td=0x0, uap=0x0)
    at /files/bsd/src6/sys/kern/sys_generic.c:326
#15 0x803e0b21 in syscall (frame=
      {tf_rdi = 1, tf_rsi = 277012480, tf_rdx = 262144, tf_rcx = 262144, tf_r8 = 262144, tf_r9 = 3219503195, tf_rax = 4, tf_rbx = 277012480, tf_rbp = 32768, tf_r10 = 1669914800, tf_r11 = 2860306816, tf_r12 = 0, tf_r
Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.
Hello,

Alexander Sabourenkov <[EMAIL PROTECTED]> writes:

> Hello.
>
> I have ported the workaround for the hardware bug that causes data
> corruption on Promise SATA300 TX4 cards to RELENG_7.
>
> Bug description:
> SATA300 TX4 hardware chokes if last PRD entry (in a dma transfer) is
> larger than 164 bytes. This was found while analysing vendor-supplied
> linux driver.
>
> Workaround:
> Split trailing PRD entry if it's larger than 164 bytes.
>
> Two supplied patches do fix problem on my machine.

Definitely an improvement, but not sufficient (for my setup):

amd64-releng_6 on an ASUS A8V UP (the box ran rock-stable for years on
i386-releng_5 with the same hardware apart from the TX4 and the drives)

from dmesg:

atapci0: port 0xe000-0xe07f,0xd800-0xd8ff mem 0xfbb0-0xfbb00fff,0xfba0-0xfba1 irq 18 at device 13.0 on pci0
ata2: on atapci0
ata3: on atapci0
ata4: on atapci0
ata5: on atapci0
atapci1: port 0xd400-0xd407,0xd000-0xd003,0xc800-0xc807,0xc400-0xc403,0xc000-0xc00f,0xb800-0xb8ff irq 20 at device 15.0 on pci0
ata6: on atapci1
ata7: on atapci1
atapci2: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xfc00-0xfc0f at device 15.1 on pci0
ata0: on atapci2
ata1: on atapci2
[ ... ]
ad0: 38166MB at ata0-master UDMA100
ad6: 476940MB at ata3-master SATA300
ad12: 305245MB at ata6-master SATA150

booting from ad0, with a simple gconcat over ad6 and ad12.
Improvement: I can now fsck /dev/concat/data without ad6 being detached.

Persistent problem: when I rsync an nfs-mounted disk to /dev/concat/data,
after some gigabytes of data have been transferred I get:

Nov 2 16:39:55 charlotte kernel: ad6: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=268435392
Nov 2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Nov 2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Nov 2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
Nov 2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
Nov 2 16:40:50 charlotte kernel: ad6: WARNING - SET_MULTI taskqueue timeout - completing request directly
Nov 2 16:40:50 charlotte kernel: ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435392
Nov 2 16:40:50 charlotte kernel: ad6: FAILURE - WRITE_DMA status=ff error=ff LBA=268435392
Nov 2 16:40:50 charlotte kernel: g_vfs_done():concat/data[WRITE(offset=137438920704, length=131072)]error = 5
Nov 2 16:40:50 charlotte kernel: ad6: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=268435648
Nov 2 16:40:50 charlotte kernel: ad6: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=268435648
Nov 2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Nov 2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Nov 2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
Nov 2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
Nov 2 16:40:50 charlotte kernel: ad6: WARNING - SET_MULTI taskqueue timeout - completing request directly
Nov 2 16:40:50 charlotte kernel: ad6: FAILURE - WRITE_DMA48 timed out LBA=268435648
Nov 2 16:40:50 charlotte kernel: g_vfs_done():concat/data[WRITE(offset=137439051776, length=131072)]error = 5
...

I will test again with "#define PDC_MAXLASTSGSIZE 32*4" (just to see if that
makes a difference).

Regards,

Arno
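
For readers following the thread, the workaround quoted in this message (split the trailing PRD entry when it is larger than the limit) can be sketched in plain user-space C. This is only an illustration under stated assumptions, not the code from the supplied patches: the struct prd_entry layout, the function name split_trailing_prd and the choice to carve the tail off into a new final entry are all hypothetical. Only the name PDC_MAXLASTSGSIZE and the value 32*4 come from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value tried in the thread; parenthesised here so it is safe in expressions. */
#define PDC_MAXLASTSGSIZE (32 * 4)

/* Hypothetical PRD entry: a bus address and a byte count (assumed layout). */
struct prd_entry {
	uint32_t addr;
	uint32_t count;
};

/*
 * If the last of the n entries is larger than PDC_MAXLASTSGSIZE, shorten it
 * and append a new final entry that covers the last PDC_MAXLASTSGSIZE bytes.
 * Returns the new number of entries, or -1 if the table has no free slot.
 */
static int
split_trailing_prd(struct prd_entry *prd, size_t n, size_t max_entries)
{
	struct prd_entry *last;

	assert(prd != NULL);
	if (n == 0)
		return (0);		/* nothing to split */
	last = &prd[n - 1];
	if (last->count <= PDC_MAXLASTSGSIZE)
		return ((int)n);	/* already small enough */
	if (n >= max_entries)
		return (-1);		/* no room for the extra entry */

	/* The new final entry takes the tail of the old one. */
	prd[n].addr = last->addr + last->count - PDC_MAXLASTSGSIZE;
	prd[n].count = PDC_MAXLASTSGSIZE;
	last->count -= PDC_MAXLASTSGSIZE;
	return ((int)n + 1);
}
```

With a two-entry table whose last entry is 2048 bytes long, one call yields three entries: the old last entry shrinks to 1920 bytes and the new final one is 128 bytes, which would correspond to a "splitting trailing PRD of 2048 (limit 128)" message. The total transfer length is unchanged; only the number of entries grows by one.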
"indefinite" wait buffer patch
[ ... ]

Regards,

Arno

Index: sys/vm/swap_pager.c
===================================================================
RCS file: /home/ncvs/src/sys/vm/swap_pager.c,v
retrieving revision 1.295
diff -u -r1.295 swap_pager.c
--- sys/vm/swap_pager.c	5 Aug 2007 21:04:32 -0000	1.295
+++ sys/vm/swap_pager.c	1 Nov 2007 18:59:18 -0000
@@ -941,6 +941,10 @@
 	vm_page_t mreq;
 	int i;
 	int j;
+	int retry = 0;
+#define TIMO_CHUNK 2
+#define TIMO_START 1 /* set low to force quick first timeout */
+	static int timo_secs = TIMO_START;
 	daddr_t blk;
 
 	mreq = m[reqpage];
@@ -1066,16 +1070,28 @@
 	 */
 	VM_OBJECT_LOCK(object);
 	while ((mreq->oflags & VPO_SWAPINPROG) != 0) {
+	    if (retry == 0) {
 		mreq->oflags |= VPO_WANTED;
 		vm_page_lock_queues();
 		vm_page_flag_set(mreq, PG_REFERENCED);
 		vm_page_unlock_queues();
 		PCPU_INC(cnt.v_intrans);
-		if (msleep(mreq, VM_OBJECT_MTX(object), PSWP, "swread", hz*20)) {
+	    }
+		if (msleep(mreq, VM_OBJECT_MTX(object), PSWP, "swread", hz*timo_secs)) {
 			printf(
-"swap_pager: indefinite wait buffer: bufobj: %p, blkno: %jd, size: %ld\n",
-			    bp->b_bufobj, (intmax_t)bp->b_blkno, bp->b_bcount);
+"swap_pager: wait buffer timeout %d (%d secs): bufobj: %p, blkno: %jd, size: %ld\n",
+			    ++retry, timo_secs, bp->b_bufobj, (intmax_t)bp->b_blkno, bp->b_bcount);
+			if (retry*TIMO_CHUNK > timo_secs) {
+				timo_secs = retry*TIMO_CHUNK;
+			}
+		} else {
+			if (retry > 0) {
+				printf(
+"swap_pager: wait buffer completed (%d retry): bufobj: %p, blkno: %jd, size: %ld\n",
+				    retry, bp->b_bufobj, (intmax_t)bp->b_blkno, bp->b_bcount);
+			}
 		}
+
 	}
 
 	/*
@@ -1553,6 +1569,7 @@
 swp_pager_force_pagein(vm_object_t object, vm_pindex_t pindex)
 {
 	vm_page_t m;
+	int ret;
 
 	vm_object_pip_add(object, 1);
 	m = vm_page_grab(object, pindex, VM_ALLOC_NORMAL|VM_ALLOC_RETRY);
@@ -1567,8 +1584,18 @@
 		return;
 	}
 
-	if (swap_pager_getpages(object, &m, 1, 0) != VM_PAGER_OK)
-		panic("swap_pager_force_pagein: read from swap failed");/*XXX*/
+	if ((ret=swap_pager_getpages(object, &m, 1, 0)) != VM_PAGER_OK) {
+		if (ret == VM_PAGER_FAIL) {
+			printf("swp_pager_force_pagein: VM_PAGER_FAIL\n");
+		}
+		else {
+			if (ret == VM_PAGER_ERROR) {
+printf("swp_pager_force_pagein: VM_PAGER_ERROR\n");
+			}
+			else
+				panic("swap_pager_force_pagein: read from swap failed");/*XXX*/
+		}
+	}
 	vm_object_pip_subtract(object, 1);
 	vm_page_lock_queues();
 	vm_page_dirty(m);

-- 
Arno J. Klaassen
SCITO S.A.
8 rue des Haies
F-75020 Paris, France
http://scito.com
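
The timeout handling in the swap_pager.c hunk above may be easier to follow as a small user-space sketch. It only models the arithmetic on retry and timo_secs from the patch; the msleep() call is replaced by adding up the seconds that would have been slept, and the function name simulate_wait is hypothetical. Since timo_secs is static in the patch, the grown timeout carries over to later calls, which the sketch reproduces with a file-scope variable.

```c
#define TIMO_CHUNK 2
#define TIMO_START 1	/* set low to force quick first timeout */

/* Shared across calls, like the static variable in the patch. */
static int timo_secs = TIMO_START;

/*
 * Model `timeouts` consecutive msleep() timeouts for one page-in request
 * and return the total number of seconds that would have been slept.
 */
static int
simulate_wait(int timeouts)
{
	int retry = 0;
	int waited = 0;

	while (retry < timeouts) {
		waited += timo_secs;	/* msleep(..., hz*timo_secs) timed out */
		++retry;
		if (retry * TIMO_CHUNK > timo_secs)
			timo_secs = retry * TIMO_CHUNK;
	}
	return (waited);
}
```

Starting from TIMO_START, three timeouts in a row sleep for 1, 2 and 4 seconds and leave timo_secs at 6. A later request then starts with a 6 second timeout, because the value is never reset.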
early attaching tap0
Hello,

ehmm, I hesitate to write to this list since I'm really not a hacker, but I
have a problem I seemingly cannot resolve: I would like the "device tap" entry
of my kernel config to create and open an ethernet device, rather than just
initialising the necessary kernel structures.

I made the following diff (it needs "COPTFLAGS = -O -pipe -DTAP_INIT_ETHER",
since I do not know how to implement a kernel option correctly ...):

Index: net/if_tap.c
===================================================================
RCS file: /home/ncvs/src/sys/net/if_tap.c,v
retrieving revision 1.42
diff -r1.42 if_tap.c
125a126,132
> #if defined (TAP_INIT_ETHER)
> #include
> static int tap_need_init = 1;
> dev_t tap0dev;
> void tapether(void);
> #endif /* defined (TAP_INIT_ETHER) */
>
140c147
< 	int s;
---
> 	intrmask_t s;
156a164,168
> #if defined (TAP_INIT_ETHER)
> 	tap0dev = make_dev(&tap_cdevsw, 0, UID_ROOT, GID_WHEEL, 0600,
> 	    "tap0");
> 	tapcreate (tap0dev);
> #endif /* defined (TAP_INIT_ETHER) */
264c276
< static void
---
> void
315a328,336
> #if defined (TAP_INIT_ETHER)
> 	if (tap_need_init == 1) {
> 		ifp->if_type = IFT_ETHER;
> 		ifp->if_baudrate = 1200;
> 		tp->tap_flags |= TAP_OPEN;
> 		tp->tap_pid = 0; /* curproc->p_pid; */
> 		tap_need_init = 0;
> 	}
> #endif
325a347
>
892a915
>

A kernel with this patch boots OK in single-user mode, but when going
multi-user it panics randomly (sorry Thomas for blaming atapicam; I thought I
had double-checked by commenting out COPTFLAGS before contacting you, but I'm
not sure I did everything well ...)

I made similar changes to -stable and RELENG_5_2, which work OK (i.e. no
panics, stable system), but against -current I am lost.

If someone has a quick idea about what I am doing wrong, thanx a lot in
advance.

Regards,

Arno