Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.

2007-11-02 Thread Arno J. Klaassen
Hello,

>  [ ... ]
> > I will test again with "#define PDC_MAXLASTSGSIZE 32*4" (just to see
> > if that makes a difference)
> >
> One thing to try is to lose any geom raid; if raid is needed, use ataraid
> instead.

Nope: I did a "newfs ad6" (the disk on the Promise TX4), and an
rsync onto it then panics the same way as the geom_concat case did.


> I'm shuffling boards and controllers here to try to reproduce; so far
> no luck, it "just works(tm)". It seems to depend quite heavily on the
> "right" combination of possibly marginal HW

Rather than marginal HW, it seems to me to be closely related to the
MB/BIOS as well (Alexander apparently has about the same setup as I
have for this test).

A while ago (using releng_6) I tried the same setup on three different
MBs: ahd controller + SCSI boot disk, plus the TX4 with three disks in
geom_mirror. Results:

  - on an ASUS A8? board (I have used plenty of them for years without
    the slightest problem; not really expensive, but not marginal IMHO):
    just look at it and it would crash (g_vfs_done)

  - on a Tyan S28??: rock stable, unable to crash it however
    hard I tried

  - on some MSI K8 (which I usually run Vista on for testing; this one I
    really bought "as cheap as possible"): would run OK, even
    under rather heavy load, but when pushed really hard it
    finally delivers the lovely g_vfs_done ...

I vaguely remember from another PR that the Promise card does
something with PCI bursting which FreeBSD does not detect and/or
handle correctly (that is beyond my simple skills as a dumb tester, but
maybe the Linux sources contain a clue about that as well).

Regards, and thanks for your efforts,

Arno
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.

2007-11-02 Thread Arno J. Klaassen
Alexander Sabourenkov <[EMAIL PROTECTED]> writes:

> Arno J. Klaassen wrote:
> > Definitely an improvement, but not sufficient (for my setup):
> > 
> > amd64-releng_6 on an ASUS A8V UP (the box ran rock-stable
> > for years on i386-releng_5 with the same hardware apart from the TX4
> > and drives)
> > 
> > from dmesg :
> > 
> 
> Setup is identical to mine, except for the drives.
> http://lxnt.info/tx4/freebsd/dmesg.text
> 
> > 
> > Improvement : I now can fsck /dev/concat/data without
> > ad6 being detached
> 
> It was that bad? wow.


Yep (often even beyond repair ...)

> > Persistent problem: when I rsync an NFS-mounted disk to /dev/concat/data,
> > I get, after a few gigabytes of data have been transferred:
> > 
> 
> That's strange. Are you sure cables, PSU and line power are ok?
> Back in October upgrading PSU halved the error count for me (under linux).

I could try, but I don't believe in it: it is just three disks and an extra
controller instead of the two disks it used to run with ...
> > 
> > I will test again with "#define PDC_MAXLASTSGSIZE 32*4" (just to see
> > if that makes a difference)
> > 
> 
> Please do.

Well, it does: no more scary messages about DMA, SETFEATURES etc., though
it now ends in a panic ...

The end of my /var/log/messages (I turned on your printf as well):

Nov  2 22:59:11 charlotte kernel: splitting trailing PRD of 2048 (limit 128)
Nov  2 22:59:11 charlotte last message repeated 15 times
Nov  2 22:59:11 charlotte kernel: splitting trailing PRD of 4096 (limit 128)
Nov  2 22:59:11 charlotte kernel: splitting trailing PRD of 2048 (limit 128)
Nov  2 22:59:11 charlotte kernel: splitting trailing PRD of 2048 (limit 128)
Nov  2 22:59:11 charlotte kernel: splitting trailing PRD of 4096 (limit 128)
Nov  2 22:59:11 charlotte last message repeated 11 times
Nov  2 22:59:11 charlotte kernel: splitting trailing PRD of 2048 (limit 128)
Nov  2 22:59:11 charlotte kernel: splitting trailing PRD of 4096 (limit 128)
Nov  2 23:01:18 charlotte syslogd: kernel boot file is /boot/kernel/kernel
Nov  2 23:01:18 charlotte kernel: splitting trailing PRD of 4096 (limit 128)
Nov  2 23:01:18 charlotte last message repeated 17 times
Nov  2 23:01:18 charlotte kernel: Copyright (c) 1992-2007 The FreeBSD Project.
Nov  2 23:01:18 charlotte kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994


And for the panic:

panic: ffs_clusteralloc: map mismatch
Uptime: 35m27s
Dumping 1023 MB (2 chunks)
  chunk 0: 1MB (159 pages) ... ok
  chunk 1: 1023MB (261808 pages) 1007 991 975 959 943 927 911 895 879 863 847 831 815 799 783 767 751 735 719 703 687 671 655 639 623 607 591 575 559 543 527 511 495 479 463 447 431 415 399 383 367 351 335 319 303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15

#0  doadump () at pcpu.h:172
172 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) where
#0  doadump () at pcpu.h:172
#1  0x0004 in ?? ()
#2  0x8025e233 in boot (howto=260)
    at /files/bsd/src6/sys/kern/kern_shutdown.c:409
#3  0x8025e836 in panic (fmt=0xff00305bebe0 "")
    at /files/bsd/src6/sys/kern/kern_shutdown.c:565
#4  0x8037ab26 in ffs_clusteralloc (ip=0xff00241ae900, cg=9425,
    bpref=0, len=5) at /files/bsd/src6/sys/ufs/ffs/ffs_alloc.c:1663
#5  0x803769a8 in ffs_hashalloc (ip=0xff00241ae900, cg=395,
    pref=0, size=5, allocator=0x8037a650 )
    at /files/bsd/src6/sys/ufs/ffs/ffs_alloc.c:1281
#6  0x8037841a in ffs_reallocblks (ap=0x0)
    at /files/bsd/src6/sys/ufs/ffs/ffs_alloc.c:778
#7  0x8042496d in VOP_REALLOCBLKS_APV (vop=0x0, a=0x0)
    at vnode_if.c:2056
#8  0x802bd70c in cluster_write (vp=0xff0015904ba0,
    bp=0x9e74ea10, filesize=81920, seqcount=17) at vnode_if.h:1052
#9  0x8039662f in ffs_write (ap=0xad243a30)
    at /files/bsd/src6/sys/ufs/ffs/ffs_vnops.c:763
#10 0x804251fb in VOP_WRITE_APV (vop=0x805ad880,
    a=0xad243a30) at vnode_if.c:698
#11 0x802d9bca in vn_write (fp=0xff002e86da50,
    uio=0xad243b50, active_cred=0x0, flags=0, td=0xff00305bebe0)
    at vnode_if.h:372
#12 0x802894d7 in dofilewrite (td=0xff00305bebe0, fd=1,
    fp=0xff002e86da50, auio=0xad243b50, offset=0, flags=0)
    at file.h:253
#13 0x80289840 in kern_writev (td=0xff00305bebe0, fd=1,
    auio=0xad243b50) at /files/bsd/src6/sys/kern/sys_generic.c:402
#14 0x80289938 in write (td=0x0, uap=0x0)
    at /files/bsd/src6/sys/kern/sys_generic.c:326
#15 0x803e0b21 in syscall (frame=
      {tf_rdi = 1, tf_rsi = 277012480, tf_rdx = 262144, tf_rcx = 262144,
       tf_r8 = 262144, tf_r9 = 3219503195, tf_rax = 4, tf_rbx = 277012480,
       tf_rbp = 32768, tf_r10 = 1669914800, tf_r11 = 2860306816,
       tf_r12 = 0, tf_r

Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.

2007-11-02 Thread Arno J. Klaassen
Hello,

Alexander Sabourenkov <[EMAIL PROTECTED]> writes:

> Hello.
> 
> I have ported the workaround for the hardware bug that causes data
> corruption on Promise SATA300 TX4 cards to RELENG_7.
> 
> Bug description:
> SATA300 TX4 hardware chokes if the last PRD entry (in a DMA transfer) is
> larger than 164 bytes. This was found while analysing the vendor-supplied
> Linux driver.
> 
> Workaround:
> Split the trailing PRD entry if it is larger than 164 bytes.
> 
> The two supplied patches do fix the problem on my machine.
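
The workaround described above can be sketched in a few lines of C. This
is only an illustration of the idea, not the code from the patches: the
struct prd_entry layout, the function name prd_split_trailing() and the
"cap" parameter are all hypothetical, and the limit is passed in as an
argument (164 in the bug description; the log output elsewhere in this
thread shows a run with limit 128).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical PRD (physical region descriptor) entry: one DMA segment. */
struct prd_entry {
    uint64_t addr;   /* bus address of the segment */
    uint32_t count;  /* length of the segment in bytes */
};

/*
 * If the last of the n entries is larger than `limit` bytes, shrink it and
 * append a new final entry that carries the last `limit` bytes.
 * `cap` is the number of entries the table can hold.
 * Returns the new number of entries, or 0 if there is no room to split.
 */
size_t
prd_split_trailing(struct prd_entry *prd, size_t n, size_t cap, uint32_t limit)
{
    assert(limit > 0);

    if (n == 0 || prd[n - 1].count <= limit)
        return n;                       /* nothing to split */
    if (n >= cap)
        return 0;                       /* no room for one more entry */

    struct prd_entry *last = &prd[n - 1];
    uint32_t head = last->count - limit; /* bytes that stay in the old entry */

    prd[n].addr = last->addr + head;    /* tail starts right after the head */
    prd[n].count = limit;
    last->count = head;
    return n + 1;
}
```

With this sketch, a trailing entry of 2048 bytes and a limit of 128 would
become a 1920-byte entry followed by a 128-byte one; the total transfer
length is unchanged.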


Definitely an improvement, but not sufficient (for my setup):

amd64-releng_6 on an ASUS A8V UP (the box ran rock-stable
for years on i386-releng_5 with the same hardware apart from the TX4
and drives)

from dmesg :

atapci0:  port 0xe000-0xe07f,0xd800-0xd8ff mem 0xfbb0-0xfbb00fff,0xfba0-0xfba1 irq 18 at device 13.0 on pci0
ata2:  on atapci0
ata3:  on atapci0
ata4:  on atapci0
ata5:  on atapci0
atapci1:  port 0xd400-0xd407,0xd000-0xd003,0xc800-0xc807,0xc400-0xc403,0xc000-0xc00f,0xb800-0xb8ff irq 20 at device 15.0 on pci0
ata6:  on atapci1
ata7:  on atapci1
atapci2:  port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xfc00-0xfc0f at device 15.1 on pci0
ata0:  on atapci2
ata1:  on atapci2

[ ... ]

ad0: 38166MB  at ata0-master UDMA100
ad6: 476940MB  at ata3-master SATA300
ad12: 305245MB  at ata6-master SATA150

Booting from ad0, with a simple gconcat over ad6 and ad12.

Improvement: I can now fsck /dev/concat/data without
ad6 being detached.

Persistent problem: when I rsync an NFS-mounted disk to /dev/concat/data,
I get, after a few gigabytes of data have been transferred:

Nov  2 16:39:55 charlotte kernel: ad6: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=268435392
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SET_MULTI taskqueue timeout - completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435392
Nov  2 16:40:50 charlotte kernel: ad6: FAILURE - WRITE_DMA status=ff error=ff LBA=268435392
Nov  2 16:40:50 charlotte kernel: g_vfs_done():concat/data[WRITE(offset=137438920704, length=131072)]error = 5
Nov  2 16:40:50 charlotte kernel: ad6: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=268435648
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=268435648
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SET_MULTI taskqueue timeout - completing request directly
Nov  2 16:40:50 charlotte kernel: ad6: FAILURE - WRITE_DMA48 timed out LBA=268435648
Nov  2 16:40:50 charlotte kernel: g_vfs_done():concat/data[WRITE(offset=137439051776, length=131072)]error = 5
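
The numbers in the log above can be cross-checked with a little
arithmetic. The sketch below assumes 512-byte sectors and that the
concat offset maps directly onto ad6 (both are assumptions, although the
offsets divided by 512 do equal the reported LBAs); the helper names are
made up for illustration. It only shows that the first failing write
starts just below LBA 2^28 and ends above it; whether that is related to
the failure is not established here.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE  512u                 /* assumed sector size in bytes */
#define LBA28_LIMIT  (UINT64_C(1) << 28)  /* first LBA beyond 28-bit addressing */

/* Convert a byte offset on the provider to a sector number. */
static uint64_t
offset_to_lba(uint64_t offset)
{
    return offset / SECTOR_SIZE;
}

/* True if the request [lba, lba + nsect) starts below 2^28 and ends above it. */
static int
crosses_lba28(uint64_t lba, uint64_t nsect)
{
    assert(nsect > 0);
    return lba < LBA28_LIMIT && lba + nsect > LBA28_LIMIT;
}
```

For the first g_vfs_done line, offset 137438920704 gives LBA 268435392,
and the 131072-byte length is 256 sectors, so the request runs up to
LBA 268435647, past 2^28 = 268435456. The second request starts at
LBA 268435648, already above that boundary, and is issued as WRITE_DMA48.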

...

I will test again with "#define PDC_MAXLASTSGSIZE 32*4" (just to see
if that makes a difference)
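
One thing to keep in mind with a define written as 32*4: the
preprocessor substitutes the text as is, so the result depends on how the
macro is used. I have not checked how the patch uses PDC_MAXLASTSGSIZE;
in a plain comparison such as "size > PDC_MAXLASTSGSIZE" it makes no
difference. The sketch below uses two made-up macro names to show a case
where it would matter.

```c
#include <assert.h>

/* As written in the mail: expands textually, without grouping. */
#define PDC_MAXLASTSGSIZE_RAW   32*4
/* Parenthesized variant (hypothetical name): always evaluates as 128. */
#define PDC_MAXLASTSGSIZE_SAFE  (32*4)

/* len % 32*4 parses as (len % 32) * 4, which is not len % 128. */
static unsigned
remainder_raw(unsigned len)
{
    return len % PDC_MAXLASTSGSIZE_RAW;
}

static unsigned
remainder_safe(unsigned len)
{
    return len % PDC_MAXLASTSGSIZE_SAFE;
}

/* Small self-check showing the two results differ for len = 200. */
static void
macro_demo(void)
{
    assert(remainder_raw(200) == 32);   /* (200 % 32) * 4 = 8 * 4 */
    assert(remainder_safe(200) == 72);  /* 200 % 128 */
}
```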

Regards, Arno


"indefinite" wait buffer patch

2007-11-01 Thread Arno J. Klaassen
[ ... ]

Regards,

Arno


Index: sys/vm/swap_pager.c
===================================================================
RCS file: /home/ncvs/src/sys/vm/swap_pager.c,v
retrieving revision 1.295
diff -u -r1.295 swap_pager.c
--- sys/vm/swap_pager.c	5 Aug 2007 21:04:32 -	1.295
+++ sys/vm/swap_pager.c	1 Nov 2007 18:59:18 -
@@ -941,6 +941,10 @@
 	vm_page_t mreq;
 	int i;
 	int j;
+	int retry = 0;
+#define TIMO_CHUNK 2
+#define TIMO_START 1 /* set low to force quick first timeout */
+	static int timo_secs = TIMO_START;
 	daddr_t blk;
 
 	mreq = m[reqpage];
@@ -1066,16 +1070,28 @@
 	 */
 	VM_OBJECT_LOCK(object);
 	while ((mreq->oflags & VPO_SWAPINPROG) != 0) {
+	  if (retry == 0) {
 		mreq->oflags |= VPO_WANTED;
 		vm_page_lock_queues();
 		vm_page_flag_set(mreq, PG_REFERENCED);
 		vm_page_unlock_queues();
 		PCPU_INC(cnt.v_intrans);
-		if (msleep(mreq, VM_OBJECT_MTX(object), PSWP, "swread", hz*20)) {
+	  }
+		if (msleep(mreq, VM_OBJECT_MTX(object), PSWP, "swread", hz*timo_secs)) {
 			printf(
-"swap_pager: indefinite wait buffer: bufobj: %p, blkno: %jd, size: %ld\n",
-			bp->b_bufobj, (intmax_t)bp->b_blkno, bp->b_bcount);
+"swap_pager: wait buffer timeout %d (%d secs): bufobj: %p, blkno: %jd, size: %ld\n",
+			++retry, timo_secs, bp->b_bufobj, (intmax_t)bp->b_blkno, bp->b_bcount);
+			if (retry*TIMO_CHUNK > timo_secs) {
+			  timo_secs = retry*TIMO_CHUNK;
+			}
+		} else {
+			if (retry > 0) {
+			  printf(
+"swap_pager: wait buffer completed (%d retry): bufobj: %p, blkno: %jd, size: %ld\n",
+			  retry, bp->b_bufobj, (intmax_t)bp->b_blkno, bp->b_bcount);
+			}
 		}
+
 	}
 
 	/*
@@ -1553,6 +1569,7 @@
 swp_pager_force_pagein(vm_object_t object, vm_pindex_t pindex)
 {
 	vm_page_t m;
+	int ret;
 
 	vm_object_pip_add(object, 1);
 	m = vm_page_grab(object, pindex, VM_ALLOC_NORMAL|VM_ALLOC_RETRY);
@@ -1567,8 +1584,18 @@
 		return;
 	}
 
-	if (swap_pager_getpages(object, &m, 1, 0) != VM_PAGER_OK)
-		panic("swap_pager_force_pagein: read from swap failed");/*XXX*/
+	if ((ret = swap_pager_getpages(object, &m, 1, 0)) != VM_PAGER_OK) {
+		if (ret == VM_PAGER_FAIL) {
+			printf("swp_pager_force_pagein: VM_PAGER_FAIL\n");
+		}
+		else {
+			if (ret == VM_PAGER_ERROR) {
+				printf("swp_pager_force_pagein: VM_PAGER_ERROR\n");
+			}
+			else
+				panic("swap_pager_force_pagein: read from swap failed");/*XXX*/
+		}
+	}
 	vm_object_pip_subtract(object, 1);
 	vm_page_lock_queues();
 	vm_page_dirty(m);
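
The timeout handling in the first hunks of the patch above can be looked
at in isolation. The following is a user-space sketch of just that
arithmetic, not kernel code: TIMO_CHUNK, TIMO_START and timo_secs are
taken from the patch, while the function next_timeout() is a made-up
wrapper for illustration.

```c
#include <assert.h>

#define TIMO_CHUNK 2
#define TIMO_START 1    /* set low to force a quick first timeout */

/* Persists across calls, like the `static int timo_secs` in the patch. */
static int timo_secs = TIMO_START;

/*
 * Called after a wait has timed out; `retry` is the number of timeouts
 * seen so far for the current request (already incremented).
 * Returns the timeout in seconds to use for the next wait.
 */
static int
next_timeout(int retry)
{
    assert(retry > 0);
    if (retry * TIMO_CHUNK > timo_secs)
        timo_secs = retry * TIMO_CHUNK;
    return timo_secs;
}
```

A single slow request therefore waits 1, 2, 4, 6, ... seconds between
messages. Because timo_secs is static and never lowered, a later request
starts with the largest timeout reached so far rather than with
TIMO_START.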

-- 

  Arno J. Klaassen

  SCITO S.A.
  8 rue des Haies
  F-75020 Paris, France
  http://scito.com


early attaching tap0

2004-05-23 Thread Arno J. Klaassen
Hello,

Ehm, I hesitate to write to this list since I'm really
not a hacker, but I have a problem I seemingly cannot
resolve:

I would like the "device tap" entry of my kernel config
to create and open an Ethernet device, rather
than just initialising the necessary kernel structures.

I made the following diff (it needs "COPTFLAGS = -O -pipe -DTAP_INIT_ETHER"
since I do not know how to implement
a kernel option correctly ...):

Index: net/if_tap.c
===================================================================
RCS file: /home/ncvs/src/sys/net/if_tap.c,v
retrieving revision 1.42
diff -r1.42 if_tap.c
125a126,132
> #if defined (TAP_INIT_ETHER)
> #include 
> static int tap_need_init = 1;
> dev_t tap0dev;
> void tapether(void);
> #endif /* defined (TAP_INIT_ETHER) */
> 
140c147
< 	int			 s;
---
> 	intrmask_t		 s;
156a164,168
> #if defined (TAP_INIT_ETHER)
> 		tap0dev =  make_dev(&tap_cdevsw, 0, UID_ROOT, GID_WHEEL, 0600,
> 		"tap0");
> 		tapcreate (tap0dev);
> #endif /* defined (TAP_INIT_ETHER) */
264c276
< static void
---
> void
315a328,336
> #if defined (TAP_INIT_ETHER)
> 	if (tap_need_init == 1) {
> 		ifp->if_type = IFT_ETHER;
> 		ifp->if_baudrate = 1200;
> 		tp->tap_flags |= TAP_OPEN;
> 		tp->tap_pid = 0; /* curproc->p_pid; */
> 		tap_need_init = 0;
> 	}
> #endif
325a347
> 
892a915
> 

A kernel with this patch boots OK in single-user mode, but when going
multi-user it panics randomly
(sorry Thomas for blaming atapicam; I thought I had double-checked by
commenting out COPTFLAGS before contacting you, but I'm not sure
I did everything well ...)

I made similar changes to -stable and RELENG_5_2 which work
OK (i.e. no panics, stable system), but against -current
I am lost.
If someone has a quick idea about what I am doing wrong,
thanks a lot in advance.

Regards, Arno
