Re: FreeBSD 9.1-RC1 Available...

2012-08-28 Thread Arno J. Klaassen
Jim Pingle li...@pingle.org writes:

 On 8/23/2012 11:43 AM, Ian Lepore wrote:
 On Thu, 2012-08-23 at 11:17 -0400, Ken Menzel wrote:

 I found two good primers:
 http://mebsd.com/configure-freebsd-servers/update-freebsd-source-tree-using-subversion-svn.html
 http://www.freebsd.org/doc/en/articles/committers-guide/article.html#SUBVERSION-PRIMER

 The second primer in the committer handbook seems to indicate that it
 is difficult to run an SVN mirror. This appears to me to be the
 biggest drawback.  I have been using CVS and perforce for years,  but
 subversion is new to me. 
 
 It may be difficult to run an svn mirror that allows you to commit
 locally and get those changes back to the project, but running a
 read-only mirror is trivial.  The script I run nightly from cron to sync
 my local mirror is:
 
 #!/bin/sh
 #
 # svnsync to pull in changes from FreeBSD to my local mirror.
 #
 svnsync sync file:///local/vc/svn/base
 
 I can't remember how I initially created and populated the mirror, but
 it's likely I grabbed a snapshot of the mirror at work and brought it
 home on a thumb drive (just to avoid initial network DL time).
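
For readers starting from scratch, creating and populating such a read-only mirror can be sketched roughly as below. This is only an outline, not the procedure Ian actually used (he says he no longer remembers it). The `mirror_init` and `run` helpers, the `DRY_RUN` switch and both arguments are hypothetical names introduced for illustration; `svnadmin create`, `svnsync init` and `svnsync sync` are the standard Subversion commands.

```shell
#!/bin/sh
# Hypothetical helper: run a command, or only print it when DRY_RUN is set.
run() { if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi; }

# mirror_init REPO_DIR SOURCE_URL
# Create an empty repository and register SOURCE_URL as its sync source.
mirror_init() {
    repo=$1 src=$2
    hook=$repo/hooks/pre-revprop-change
    run svnadmin create "$repo" || return 1
    # svnsync keeps its bookkeeping in revision properties, so the
    # destination repository needs a hook that permits revprop changes.
    if [ -z "$DRY_RUN" ]; then
        printf '#!/bin/sh\nexit 0\n' > "$hook" && chmod +x "$hook" || return 1
    fi
    run svnsync init "file://$repo" "$src" || return 1
    # The first sync copies the whole history and can take a long time.
    run svnsync sync "file://$repo"
}

# Example (prints the commands only; the source URL is a placeholder):
# DRY_RUN=1 mirror_init /local/vc/svn/base svn://svn.example.org/base
```

After the initial run, the nightly cron script above only needs the final `svnsync sync` step.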

 I spent a little time today setting up an SVN mirror after reading this
 thread and wrote up a how-to for those looking to do the same.

 http://www.pingle.org/2012/08/24/freebsd-svn-mirror

 Comments/Flames/Corrections welcome...

thanx; works out of the box for me (using the svnserve_enable path).

That said: I glanced at a diff between stable/8 checkouts from the
/home/ncvs repo and from the new /home/freebsd-svn one, and saw a (maybe
well-known ...) 'feature':

  diff ./src/contrib/amd/include/am_defs.h /raid1/bsd/8/src/contrib/amd/include/am_defs.h

  42c42
  <  * $FreeBSD: stable/8/contrib/amd/include/am_defs.h 174299 2007-12-05 16:03:52Z obrien $
  ---
  >  * $FreeBSD: src/contrib/amd/include/am_defs.h,v 1.15.2.1 2009/08/03 08:13:06 kensmith Exp $


I wondered why the date (and committer ...) in the expansion were
different (from the svn log):

  
  r196045 | kensmith | 2009-08-03 10:13:06 +0200 (Mon, 03 Aug 2009) | 4 lines

  Copy head to stable/8 as part of 8.0 Release cycle.

  Approved by: re (Implicit)

  
  r174299 | obrien | 2007-12-05 17:03:52 +0100 (Wed, 05 Dec 2007) | 3 lines


So the 'Copy head' step does not update the $FreeBSD$ tag, whereas the
subsequent svn-to-cvs chain does.
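
To compare the two expansions without eyeballing them, the revision, date and author can be pulled out of either form of the tag. The sketch below assumes the two field layouts seen in the diff above (svn: path, revision, date, time, author; cvs: path, revision, date, time, author, state); `ident_fields` is a hypothetical helper name.

```shell
#!/bin/sh
# Print "<revision> <date> <author>" for a line holding a $FreeBSD$ tag,
# for both the svn-style and the cvs-style expansion.
ident_fields() {
    printf '%s\n' "$1" | awk '{
        # locate the keyword; the fields that follow are positional
        for (i = 1; i <= NF; i++) if ($i ~ /^\$FreeBSD:$/) break
        date = $(i + 3)
        gsub("/", "-", date)      # cvs writes 2009/08/03, svn 2007-12-05
        print $(i + 2), date, $(i + 5)
    }'
}

ident_fields ' * $FreeBSD: stable/8/contrib/amd/include/am_defs.h 174299 2007-12-05 16:03:52Z obrien $'
ident_fields ' * $FreeBSD: src/contrib/amd/include/am_defs.h,v 1.15.2.1 2009/08/03 08:13:06 kensmith Exp $'
```

For the two lines from the diff this prints `174299 2007-12-05 obrien` and `1.15.2.1 2009-08-03 kensmith`, which makes the mismatch in date and committer easy to spot.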

FYI, Arno



 Jim
 ___
 freebsd-stable@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-stable
 To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org



Re: nfs-bug when server for 9-Stable becomes client as well ?

2012-07-09 Thread Arno J. Klaassen
Vincent Hoffman vi...@unsane.co.uk writes:

 On 06/07/2012 18:51, Arno J. Klaassen wrote:
 Vincent Hoffman vi...@unsane.co.uk writes:

 On 06/07/2012 14:19, Arno J. Klaassen wrote:
 Hello,

 looks like I discovered a probable bug in the NFS code, very
 easy to reproduce in my setup:


Machine-1 : Today's 9-stable, exporting /files (ufs) and /z2 (zfs)

Machine-2 : 8-stable as of April the 10th exporting /raid1

 On Machine-1 I mount /raid1 (rw,nfsv3,intr,tcp,rsize=32768,wsize=32768)
 and start a script on this mount looping something like :

   dd if=/dev/random of=BIG bs=1048576 count=${SIZE}
   cp -fp BIG BIG2
   cmp -x BIG BIG2

 I let this run for 24 hours (from time to time stressing Machine-1 with
 other scripts, including provoking heavy swapping), no problem at all.

 However, then I mount /z2 (rw,nfsv3,intr,tcp,rsize=32768,wsize=32768)
 on Machine-2, and *immediately* the above loop on Machine-1 fails :

   Copying file ...cp: BIG: Permission denied

 No console messages this time, last time I got 

   kernel: nfs_getpages: error 13
   kernel: vm_fault: pager read error, pid 87803 (cmp)

 on Machine-1.

 I repeated this scenario by replacing Machine-2 with a good old
 6-4-stable one, same outcome.

 Please tell me what I could do to nail this down a bit more.
 It's possible (although not definite) that you have hit a mountd bug
 as documented in PRs

 kern/131342
 kern/136865
 especially kern/131342 looks similar and quite old; funny I never hit
 this before, I have basically been running the same tests for 'ages' on
 each new box. Could be that a faster network/CPU reveals some race
 condition; I notice as well that this server is the first (IIRC) that
 uses 3 different IRQs for network interrupts (em(4) Intel(R) PRO/1000).
 Certainly possible and seems reasonable enough.

just my $0.02: I glanced at kern/131342, and it looks like the culprit is
something like a non-atomic operation between invalidating the old
/etc/exports and validating the new /etc/exports.
I wonder whether just checking that /var/run/mountd.pid is newer than
/etc/exports, and skipping that operation if so, would be an acceptable
band-aid (if I understood correctly, a rewrite of mountd correcting this
(amongst others) is close to hitting -current (?))
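
The timestamp comparison suggested above could look roughly like this in shell. It is only a sketch of the idea, not a patch to mountd: `exports_changed` is a hypothetical helper, and it assumes the pid file's modification time reflects when mountd last (re)read its exports, which may only be true right after startup.

```shell
#!/bin/sh
# exports_changed PIDFILE EXPORTS
# Return 0 when EXPORTS was modified after PIDFILE (or PIDFILE is missing),
# i.e. when a reload would actually be needed.
exports_changed() {
    pidfile=$1 exports=$2
    [ ! -e "$pidfile" ] && return 0
    # -nt ("newer than") is widely supported by sh test(1) implementations
    [ "$exports" -nt "$pidfile" ]
}

if exports_changed /var/run/mountd.pid /etc/exports; then
    echo "exports changed (or mountd not running): reload needed"
else
    echo "exports unchanged: reload could be skipped"
fi
```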

 I've recently asked on -CURRENT about this and had a patch to try from
 Rick. I'm testing it now, but it doesn't seem to fix it for me, just
 improve it, although I'm trying to get enough runs to be a valid sample.
 (see
 http://docs.freebsd.org/cgi/getmsg.cgi?fetch=377627+0+archive/2012/freebsd-current/20120701.freebsd-current
 )

 What I did for my production NAS was edit mount.c so it didn't send a
 SIGHUP to mountd, as suggested by Rick, as it was easy to do and
 non-intrusive.
 hmm, this means I should patch each fbsd-client, no? Might be easier to
 patch mountd to ignore SIGHUP and use some non-standard signal to force
 re-init?
 No, just patch /sbin/mount on the NFS server so it doesn't send the SIGHUP
 to mountd.

[In my case] it's the mount on a client which causes the server to fail,
so I don't see how patching /sbin/mount on the NFS server would fix this.
I don't remember whether it's possible to distinguish a SIGHUP (kill -1)
sent by a process from one sent from a terminal; if it is, another
band-aid would be to ignore the ones sent by a process altogether?

Merci

Arno


 you can manually HUP mountd if needed.

 Arno


 Vince

 Thanx in advance,

 Best, Arno




-- 

  Arno J. Klaassen

  SCITO S.A.
  8 rue des Haies
  F-75020 Paris, France
  http://scito.com


nfs-bug when server for 9-Stable becomes client as well ?

2012-07-06 Thread Arno J. Klaassen

Hello,

looks like I discovered a probable bug in the NFS code, very
easy to reproduce in my setup:


   Machine-1 : Today's 9-stable, exporting /files (ufs) and /z2 (zfs)

   Machine-2 : 8-stable as of April the 10th exporting /raid1

On Machine-1 I mount /raid1 (rw,nfsv3,intr,tcp,rsize=32768,wsize=32768)
and start a script on this mount looping something like :

  dd if=/dev/random of=BIG bs=1048576 count=${SIZE}
  cp -fp BIG BIG2
  cmp -x BIG BIG2

I let this run for 24 hours (from time to time stressing Machine-1 with
other scripts, including provoking heavy swapping), no problem at all.
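
For anyone wanting to reproduce this, the loop can be wrapped into a small script along these lines. This is a sketch under a few assumptions: `stress_loop` and its three arguments are hypothetical, it reads `/dev/urandom` and uses plain `cmp` for portability (the loop above uses `/dev/random` and FreeBSD's `cmp -x`), and it stops at the first failure instead of running unbounded.

```shell
#!/bin/sh
# stress_loop WORKDIR SIZE_MB ROUNDS
# Write SIZE_MB of random data, copy it, compare the copy; repeat ROUNDS times.
stress_loop() {
    workdir=$1 size=$2 rounds=$3
    i=0
    while [ "$i" -lt "$rounds" ]; do
        dd if=/dev/urandom of="$workdir/BIG" bs=1048576 count="$size" 2>/dev/null || return 1
        cp -fp "$workdir/BIG" "$workdir/BIG2" || return 1
        # any difference between original and copy aborts the run
        cmp "$workdir/BIG" "$workdir/BIG2" || return 1
        i=$((i + 1))
    done
    echo "ok: $rounds rounds"
}

# Example, with WORKDIR on the NFS mount (path and sizes are placeholders):
# stress_loop /raid1/tmp 1024 100
```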

However, then I mount /z2 (rw,nfsv3,intr,tcp,rsize=32768,wsize=32768)
on Machine-2, and *immediately* the above loop on Machine-1 fails :

  Copying file ...cp: BIG: Permission denied

No console messages this time, last time I got 

  kernel: nfs_getpages: error 13
  kernel: vm_fault: pager read error, pid 87803 (cmp)

on Machine-1.

I repeated this scenario by replacing Machine-2 with a good old
6-4-stable one, same outcome.

Please tell me what I could do to nail this down a bit more.

Thanx in advance,

Best, Arno



Re: nfs-bug when server for 9-Stable becomes client as well ?

2012-07-06 Thread Arno J. Klaassen

Vincent Hoffman vi...@unsane.co.uk writes:

 On 06/07/2012 14:19, Arno J. Klaassen wrote:
 Hello,

 looks like I discovered a probable bug in the NFS code, very
 easy to reproduce in my setup:


Machine-1 : Today's 9-stable, exporting /files (ufs) and /z2 (zfs)

Machine-2 : 8-stable as of April the 10th exporting /raid1

 On Machine-1 I mount /raid1 (rw,nfsv3,intr,tcp,rsize=32768,wsize=32768)
 and start a script on this mount looping something like :

   dd if=/dev/random of=BIG bs=1048576 count=${SIZE}
   cp -fp BIG BIG2
   cmp -x BIG BIG2

 I let this run for 24 hours (from time to time stressing Machine-1 with
 other scripts, including provoking heavy swapping), no problem at all.

 However, then I mount /z2 (rw,nfsv3,intr,tcp,rsize=32768,wsize=32768)
 on Machine-2, and *immediately* the above loop on Machine-1 fails :

   Copying file ...cp: BIG: Permission denied

 No console messages this time, last time I got 

   kernel: nfs_getpages: error 13
   kernel: vm_fault: pager read error, pid 87803 (cmp)

 on Machine-1.

 I repeated this scenario by replacing Machine-2 with a good old
 6-4-stable one, same outcome.

 Please tell me what I could do to nail this down a bit more.
 It's possible (although not definite) that you have hit a mountd bug
 as documented in PRs

 kern/131342
 kern/136865

especially kern/131342 looks similar and quite old; funny I never hit
this before, I have basically been running the same tests for 'ages' on
each new box. Could be that a faster network/CPU reveals some race
condition; I notice as well that this server is the first (IIRC) that
uses 3 different IRQs for network interrupts (em(4) Intel(R) PRO/1000).

 I've recently asked on -CURRENT about this and had a patch to try from
 Rick. I'm testing it now, but it doesn't seem to fix it for me, just
 improve it, although I'm trying to get enough runs to be a valid sample.
 (see
 http://docs.freebsd.org/cgi/getmsg.cgi?fetch=377627+0+archive/2012/freebsd-current/20120701.freebsd-current
 )

 What I did for my production NAS was edit mount.c so it didn't send a
 SIGHUP to mountd, as suggested by Rick, as it was easy to do and
 non-intrusive.

hmm, this means I should patch each fbsd-client, no? Might be easier to
patch mountd to ignore SIGHUP and use some non-standard signal to force
re-init?

Arno


 Vince


 Thanx in advance,

 Best, Arno





9-STABLE and Iphone modem (tethering), anyone succeed ?

2012-03-21 Thread Arno J. Klaassen

Hello,


has anyone succeeded in using an iPhone as a modem on 9-STABLE (sources
as of March 16)?

I follow the instructions from
 'http://forums.freebsd.org/showthread.php?t=19995' using 'usbmuxd' and
'libimobiledevice' from ports.

When I start 'usbmuxd' I indeed see in dmesg(1) :

  ipheth0: Apple Inc. iPhone, class 0/0, rev 2.00/0.01, addr 3 on usbus1
  ue0: USB Ethernet on ipheth0
  ue0: bpf attached
  ue0: Ethernet address: XXX


I did not find 'ipheth-pair' (or something equivalent) in ports, so I built
it from the sources as indicated in the forum post, but it fails with:


  # ./ipheth-pair -v
  ./ipheth-pair: -14: cannot get lockdown

The corresponding log from 'usbmuxd -v -v' says (stripped):


  [16:29:20.490][3] usbmuxd v1.0.7 starting up
  [16:29:20.491][4] Creating socket
  [16:29:20.491][5] client_init
  [16:29:20.491][5] device_init
  [16:29:20.491][4] Initializing USB
  [16:29:20.491][5] usb_init for linux / libusb 1.0
  [16:29:20.491][4] Found new device with v/p 05ac:1297 at 1-3
  [16:29:20.491][4] Found interface 1 with endpoints 04/85 for device 1-3
  [16:29:20.495][4] Using wMaxPacketSize=512 for device 1-3
  [16:29:20.495][3] Connecting to new device on location 0x10003 as ID 1
  [16:29:20.495][4] 1 device detected
  [16:29:20.495][3] Initialization complete
  [16:29:20.495][5] usb polling enable: 0
  [16:29:20.496][3] Connected to v1.0 device 1 on location 0x10003 with serial number XXX
  [16:29:20.496][5] client_device_add: id 1, location 0x10003, serial XXX
  [16:29:46.428][4] New client on fd 9
  [16:29:46.428][5] Client command in fd 9 len 16 ver 0 msg 3 tag 1
  [16:29:46.428][5] send_pkt fd 9 tag 1 msg 1 payload_length 4
  [16:29:46.428][5] Client 9 now LISTENING
  [16:29:46.428][5] Enlarging client 9 reply buffer 1024 -> 1308 to make space for device notifications
  [16:29:46.428][5] send_pkt fd 9 tag 0 msg 4 payload_length 268
  [16:29:47.437][4] Client 9 connection closed
  [16:29:47.437][4] Disconnecting client fd 9
  [16:29:47.437][4] New client on fd 9
  [16:29:47.437][5] Client command in fd 9 len 24 ver 0 msg 2 tag 2
  [16:29:47.437][5] Client 9 connection request to device 1 port 62078
  [16:29:47.437][5] [OUT] dev=1 sport=1 dport=62078 seq=0 ack=0 flags=0x2 window=131072[512] len=0
  [16:29:47.439][5] [IN] dev=1 sport=62078 dport=1 seq=0 ack=1 flags=0x12 window=131072[512] len=0
  [16:29:47.439][5] [OUT] dev=1 sport=1 dport=62078 seq=1 ack=1 flags=0x10 window=131072[512] len=0
  [16:29:47.440][5] send_pkt fd 9 tag 2 msg 1 payload_length 4
  [16:29:47.440][5] Client 9 switching to CONNECTED state
  [16:29:47.442][5] [OUT] dev=1 sport=1 dport=62078 seq=1 ack=1 flags=0x10 window=131072[512] len=4
  ... (all having 'flags=0x10')
  [16:29:47.499][5] [IN] dev=1 sport=62078 dport=1 seq=3502 ack=14410 flags=0x10 window=131072[512] len=279
  [16:29:47.501][5] [IN] dev=1 sport=62078 dport=1 seq=3781 ack=14410 flags=0x4 window=0[0] len=32
  [16:29:47.501][5] RST reason: 
  [16:29:47.501][4] Connection reset by device 1 (1-62078)
  [16:29:47.501][5] connection_teardown dev 1 sport 1 dport 62078
  [16:29:47.501][4] Disconnecting client fd 9
  [16:29:47.501][4] client_process: fd 9 not found in client list
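
To make the log easier to read: the `flags=` values appear to follow the usual TCP flag bits, which matches the handshake pattern and the "RST reason" line above. A small decoder, assuming that mapping holds for usbmuxd's mux packets (`decode_flags` is a hypothetical helper):

```shell
#!/bin/sh
# decode_flags VALUE
# Translate a flags= value into TCP-style flag names.
decode_flags() {
    f=$(($1))
    out=""
    [ $((f & 0x01)) -ne 0 ] && out="$out FIN"
    [ $((f & 0x02)) -ne 0 ] && out="$out SYN"
    [ $((f & 0x04)) -ne 0 ] && out="$out RST"
    [ $((f & 0x08)) -ne 0 ] && out="$out PSH"
    [ $((f & 0x10)) -ne 0 ] && out="$out ACK"
    echo "${out# }"
}

for f in 0x2 0x12 0x10 0x4; do
    echo "$f: $(decode_flags "$f")"
done
```

Read this way, the connection to port 62078 opens normally (SYN, SYN+ACK, ACK), exchanges data, and is then reset by the phone (RST) with an empty reason string.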


I hope someone reading this has had more success ;-).

Thanx, Arno

NB 1, iPhone not 'jailbroken' 
NB 2, yes 'it works' under Windows 


9-stable: one-device ZFS fails [was: 9-stable : geli + one-disk ZFS fails]

2012-02-19 Thread Arno J. Klaassen

a followup to myself

 Hello,

 Martin Simmons mar...@lispworks.com writes:

 Some random ideas:

 1) Can you dd the whole of ada0s3.eli without errors?

 2) If you scrub a few more times, does it find the same number of errors each
 time and are they always in that XNAT.tar file?

 3) Can you try zfs without geli?


 yeah, and it seems to rule out geli :

 [ split original /dev/ada0s3 into equally sized /dev/ada0s3 and
 /dev/ada0s4 ]

  geli init /dev/ada0s3
  geli attach /dev/ada0s3

  zpool create zgeli /dev/ada0s3.eli

  zfs create zgeli/home
  zfs create zgeli/home/arno
  zfs create zgeli/home/arno/.priv
  zfs create zgeli/home/arno/.scito
  zfs set copies=2 zgeli/home/arno/.priv
  zfs set atime=off zgeli


 [put some files on it, wait a little : ]


[root@cc ~]# zpool status -v
pool: zgeli
   state: ONLINE
  status: One or more devices has experienced an error resulting in data
  corruption.  Applications may be affected.
  action: Restore the file in question if possible.  Otherwise restore the
  entire pool from backup.
 see: http://www.sun.com/msg/ZFS-8000-8A
scan: scrub in progress since Sat Feb 18 17:46:54 2012
  425M scanned out of 2.49G at 85.0M/s, 0h0m to go
  0 repaired, 16.64% done
  config: 
  
          NAME          STATE     READ WRITE CKSUM
          zgeli         ONLINE       0     0     1
            ada0s3.eli  ONLINE       0     0     2

  errors: Permanent errors have been detected in the following files:

 /zgeli/home/arno/8.0-CURRENT-200902-amd64-livefs.iso
  [root@cc ~]# zpool scrub -s zgeli
  [root@cc ~]# 


 [then idem directly on next partition ]

  zpool create zgpart /dev/ada0s4

  zfs create zgpart/home
  zfs create zgpart/home/arno
  zfs create zgpart/home/arno/.priv
  zfs create zgpart/home/arno/.scito
  zfs set copies=2 zgpart/home/arno/.priv
  zfs set atime=off zgpart

 [put some files on it, wait a little : ]

pool: zgpart
   state: ONLINE
  status: One or more devices has experienced an error resulting in data
  corruption.  Applications may be affected.
  action: Restore the file in question if possible.  Otherwise restore the
  entire pool from backup.
 see: http://www.sun.com/msg/ZFS-8000-8A
scan: scrub repaired 0 in 0h0m with 1 errors on Sat Feb 18 18:04:45 2012
  config:

          NAME      STATE     READ WRITE CKSUM
          zgpart    ONLINE       0     0     1
            ada0s4  ONLINE       0     0     2

  errors: Permanent errors have been detected in the following files:

  /zgpart/home/arno/.scito/ 
  [root@cc ~]# 


I tested a bit more this afternoon :


  - zpool create zgpart /dev/ada0s4d  => KO

  - split ada0s4 into two equally sized partitions and then

      zpool create zgpart mirror /dev/ada0s4d /dev/ada0s4e  => works like a charm.

   ( [root@cc /zgpart]# zpool status -v zgpart
       pool: zgpart
      state: ONLINE
       scan: scrub repaired 0 in 0h36m with 0 errors on Sun Feb 19 17:20:34 2012
     config:

             NAME         STATE     READ WRITE CKSUM
             zgpart       ONLINE       0     0     0
               mirror-0   ONLINE       0     0     0
                 ada0s4d  ONLINE       0     0     0
                 ada0s4e  ONLINE       0     0     0

     errors: No known data errors )
  

FYI, best, Arno



 I still do not particularly suspect the disk since I cannot reproduce
 similar behaviour on UFS.

 That said, this disk is supposed to be 'hybrid-SSD', maybe something
 special ZFS doesn't like ??? :


  ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
  ada0: ST95005620AS SD23 ATA-8 SATA 2.x device
  ada0: Serial Number 5YX0J5YD
  ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
  ada0: Command Queueing enabled
  ada0: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C)
  ada0: Previously was known as ad4
  GEOM: new disk ada0


 Please let me know what further information to provide.

 Best,

 Arno




 4) Is the slice/partition layout definitely correct?

 __Martin


 On Mon, 13 Feb 2012 23:39:06 +0100, Arno J Klaassen said:
 
 hello,
 
 to eventually gain interest in this issue :
 
  I updated to today's -stable, tested with vfs.zfs.debug=1
  and vfs.zfs.prefetch_disable=0, no difference.
 
  I also tested to read the raw partition :
 
   [root@cc /usr/ports]# dd if=/dev/ada0s3 of=/dev/null bs=4096  conv=noerror
   103746636+0 records in
   103746636+0 records out
   424946221056 bytes transferred in 13226.346738 secs (32128768 bytes/sec)
   [root@cc /usr/ports]#
 
  Disk is brand new, looks ok, either my setup is not good or there is
  a bug somewhere; I can play around with this box for some more time,
  please feel free to provide me with some hints what to do to be useful
  for you.
 
 Best,
 
 Arno
 
 
 Arno J. Klaassen a...@heho.snv.jussieu.fr writes:
 
  Hello,
 
 
  I finally decided to 'play' a bit with ZFS on a notebook, some years
  old, but I installed a brand new disk and memtest

Re: 9-stable : geli + one-disk ZFS fails

2012-02-18 Thread Arno J. Klaassen

Hello,

Martin Simmons mar...@lispworks.com writes:

 Some random ideas:

 1) Can you dd the whole of ada0s3.eli without errors?

 2) If you scrub a few more times, does it find the same number of errors each
 time and are they always in that XNAT.tar file?

 3) Can you try zfs without geli?


yeah, and it seems to rule out geli :

[ split original /dev/ada0s3 into equally sized /dev/ada0s3 and
/dev/ada0s4 ]

 geli init /dev/ada0s3
 geli attach /dev/ada0s3

 zpool create zgeli /dev/ada0s3.eli

 zfs create zgeli/home
 zfs create zgeli/home/arno
 zfs create zgeli/home/arno/.priv
 zfs create zgeli/home/arno/.scito
 zfs set copies=2 zgeli/home/arno/.priv
 zfs set atime=off zgeli


[put some files on it, wait a little : ]


   [root@cc ~]# zpool status -v
   pool: zgeli
  state: ONLINE
 status: One or more devices has experienced an error resulting in data
 corruption.  Applications may be affected.
 action: Restore the file in question if possible.  Otherwise restore the
 entire pool from backup.
see: http://www.sun.com/msg/ZFS-8000-8A
   scan: scrub in progress since Sat Feb 18 17:46:54 2012
 425M scanned out of 2.49G at 85.0M/s, 0h0m to go
 0 repaired, 16.64% done
 config: 
 
         NAME          STATE     READ WRITE CKSUM
         zgeli         ONLINE       0     0     1
           ada0s3.eli  ONLINE       0     0     2

 errors: Permanent errors have been detected in the following files:

/zgeli/home/arno/8.0-CURRENT-200902-amd64-livefs.iso
 [root@cc ~]# zpool scrub -s zgeli
 [root@cc ~]# 


[then idem directly on next partition ]

 zpool create zgpart /dev/ada0s4

 zfs create zgpart/home
 zfs create zgpart/home/arno
 zfs create zgpart/home/arno/.priv
 zfs create zgpart/home/arno/.scito
 zfs set copies=2 zgpart/home/arno/.priv
 zfs set atime=off zgpart

[put some files on it, wait a little : ]

   pool: zgpart
  state: ONLINE
 status: One or more devices has experienced an error resulting in data
 corruption.  Applications may be affected.
 action: Restore the file in question if possible.  Otherwise restore the
 entire pool from backup.
see: http://www.sun.com/msg/ZFS-8000-8A
   scan: scrub repaired 0 in 0h0m with 1 errors on Sat Feb 18 18:04:45 2012
 config:

         NAME      STATE     READ WRITE CKSUM
         zgpart    ONLINE       0     0     1
           ada0s4  ONLINE       0     0     2

 errors: Permanent errors have been detected in the following files:

 /zgpart/home/arno/.scito/ 
 [root@cc ~]# 


I still do not particularly suspect the disk since I cannot reproduce
similar behaviour on UFS.

That said, this disk is supposed to be 'hybrid-SSD', maybe something
special ZFS doesn't like ??? :


 ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
 ada0: ST95005620AS SD23 ATA-8 SATA 2.x device
 ada0: Serial Number 5YX0J5YD
 ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
 ada0: Command Queueing enabled
 ada0: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C)
 ada0: Previously was known as ad4
 GEOM: new disk ada0


Please let me know what further information to provide.

Best,

Arno




 4) Is the slice/partition layout definitely correct?

 __Martin


 On Mon, 13 Feb 2012 23:39:06 +0100, Arno J Klaassen said:
 
 hello,
 
 to eventually gain interest in this issue :
 
  I updated to today's -stable, tested with vfs.zfs.debug=1
  and vfs.zfs.prefetch_disable=0, no difference.
 
  I also tested to read the raw partition :
 
   [root@cc /usr/ports]# dd if=/dev/ada0s3 of=/dev/null bs=4096  conv=noerror
   103746636+0 records in
   103746636+0 records out
   424946221056 bytes transferred in 13226.346738 secs (32128768 bytes/sec)
   [root@cc /usr/ports]#
 
  Disk is brand new, looks ok, either my setup is not good or there is
  a bug somewhere; I can play around with this box for some more time,
  please feel free to provide me with some hints what to do to be useful
  for you.
 
 Best,
 
 Arno
 
 
 Arno J. Klaassen a...@heho.snv.jussieu.fr writes:
 
  Hello,
 
 
  I finally decided to 'play' a bit with ZFS on a notebook, some years
  old, but I installed a brand new disk and memtest passes OK.
 
  I installed base+ports on partition 2, using 'classical' UFS.
 
  I crypted partition 3 and created a single zpool on it containing
  4 Z-file-systems :
 
   [root@cc ~]# zfs list
   NAME                      USED  AVAIL  REFER  MOUNTPOINT
   zfiles                   10.7G   377G   152K  /zfiles
   zfiles/home              10.6G   377G   119M  /zfiles/home
   zfiles/home/arno         10.5G   377G  2.35G  /zfiles/home/arno
   zfiles/home/arno/.priv    192K   377G   192K  /zfiles/home/arno/.priv
   zfiles/home/arno/.scito  8.18G   377G  8.18G  /zfiles/home/arno/.scito
 
 
  I export the ZFS's via nfs and rsynced on the other machine some backup
  of my current note-book (geli + UFS, (almost) same 9-stable version, no
  problem) to the ZFS's.
 
 
  Quite fast, I see on the notebook :
 
 
   [root@cc /usr/temp

Re: 9-stable : geli + one-disk ZFS fails

2012-02-15 Thread Arno J. Klaassen

Hello,

Martin Simmons mar...@lispworks.com writes:

 Some random ideas:

 1) Can you dd the whole of ada0s3.eli without errors?

[root@cc ~]# dd if=/dev/ada0s3.eli of=/dev/null bs=4096 conv=noerror
103746635+0 records in
103746635+0 records out
424946216960 bytes transferred in 18773.796016 secs (22635072 bytes/sec)
[root@cc ~]# 


 2) If you scrub a few more times, does it find the same number of errors each
 time and are they always in that XNAT.tar file?


Looks like each scrub worsens the situation :


[root@cc ~]# zpool status -v
  pool: zfiles
 state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
  scan: scrub repaired 148K in 0h14m with 26 errors on Mon Feb 13 18:54:33 2012
config:

        NAME          STATE     READ WRITE CKSUM
        zfiles        ONLINE       0     0    26
          ada0s3.eli  ONLINE       0     0    87

errors: Permanent errors have been detected in the following files:

 [ 11 files ]

[root@cc ~]# zpool status -v
  pool: zfiles
 state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
  scan: scrub in progress since Wed Feb 15 14:36:52 2012
17.7G scanned out of 28.7G at 72.1M/s, 0h2m to go
0 repaired, 61.56% done
config:

        NAME          STATE     READ WRITE CKSUM
        zfiles        ONLINE       0     0    54
          ada0s3.eli  ONLINE       0     0   143

errors: Permanent errors have been detected in the following files:

  [ 11 files ]

[root@cc ~]# zpool status -v
  pool: zfiles
 state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
  scan: scrub repaired 4K in 0h7m with 70 errors on Wed Feb 15 14:43:57 2012
config:

        NAME          STATE     READ WRITE CKSUM
        zfiles        ONLINE       0     0    96
          ada0s3.eli  ONLINE       0     0   228

errors: Permanent errors have been detected in the following files:

  [ 25 files (I cannot quickly see whether they include all of the old 11 files) ] 

[root@cc ~]# 

[root@cc ~]# zpool status -v
  pool: zfiles
 state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 0h6m with 70 errors on Wed Feb 15 15:19:28 2012
config:

        NAME          STATE     READ WRITE CKSUM
        zfiles        ONLINE       0     0   166
          ada0s3.eli  ONLINE       0     0   368

errors: Permanent errors have been detected in the following files:

  [ 25 files  ] 

[root@cc ~]# 
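
To track how the counters grow from one scrub to the next (26, 54, 96, 166 for the pool and 87, 143, 228, 368 for the device in the outputs above), the CKSUM column can be extracted from saved status output. A minimal sketch, assuming the usual five-column config table; `cksum_counts` is a hypothetical helper name:

```shell
#!/bin/sh
# Read `zpool status` output on stdin and print "<name> <cksum>" for each
# ONLINE row of the config table (five whitespace-separated columns).
cksum_counts() {
    awk '$2 == "ONLINE" && NF == 5 { print $1, $5 }'
}

cksum_counts <<'EOF'
  pool: zfiles
 state: ONLINE
config:

        NAME          STATE     READ WRITE CKSUM
        zfiles        ONLINE       0     0    26
          ada0s3.eli  ONLINE       0     0    87
EOF
```

On a live system the same function could be fed with `zpool status zfiles | cksum_counts` after each scrub.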


 3) Can you try zfs without geli?

 4) Is the slice/partition layout definitely correct?

 __Martin


 On Mon, 13 Feb 2012 23:39:06 +0100, Arno J Klaassen said:
 
 hello,
 
 to eventually gain interest in this issue :
 
  I updated to today's -stable, tested with vfs.zfs.debug=1
  and vfs.zfs.prefetch_disable=0, no difference.
 
  I also tested to read the raw partition :
 
   [root@cc /usr/ports]# dd if=/dev/ada0s3 of=/dev/null bs=4096  conv=noerror
   103746636+0 records in
   103746636+0 records out
   424946221056 bytes transferred in 13226.346738 secs (32128768 bytes/sec)
   [root@cc /usr/ports]#
 
  Disk is brand new, looks ok, either my setup is not good or there is
  a bug somewhere; I can play around with this box for some more time,
  please feel free to provide me with some hints what to do to be useful
  for you.
 
 Best,
 
 Arno
 
 
 Arno J. Klaassen a...@heho.snv.jussieu.fr writes:
 
  Hello,
 
 
  I finally decided to 'play' a bit with ZFS on a notebook, some years
  old, but I installed a brand new disk and memtest passes OK.
 
  I installed base+ports on partition 2, using 'classical' UFS.
 
  I crypted partition 3 and created a single zpool on it containing
  4 Z-file-systems :
 
   [root@cc ~]# zfs list
   NAME                      USED  AVAIL  REFER  MOUNTPOINT
   zfiles                   10.7G   377G   152K  /zfiles
   zfiles/home              10.6G   377G   119M  /zfiles/home
   zfiles/home/arno         10.5G   377G  2.35G  /zfiles/home/arno
   zfiles/home/arno/.priv    192K   377G   192K  /zfiles/home/arno/.priv
   zfiles/home/arno/.scito  8.18G   377G  8.18G  /zfiles/home/arno/.scito
 
 
  I export the ZFS's via nfs and rsynced on the other machine some backup
  of my

Re: 9-stable : geli + one-disk ZFS fails

2012-02-14 Thread Arno J. Klaassen

Hallo Aleksandr,

  Hello, Arno J. Klaassen!

 On Sat, Feb 11, 2012 at 04:53:10PM +0100
 a...@heho.snv.jussieu.fr wrote about 9-stable : geli + one-disk ZFS fails:
 
 Hello,
 
 
 I finally decided to 'play' a bit with ZFS on a notebook, some years
 old, but I installed a brand new disk and memtest passes OK.
 
 I installed base+ports on partition 2, using 'classical' UFS.
 
 I crypted partition 3 and created a single zpool on it containing
 4 Z-file-systems :
 
  [root@cc ~]# zfs list
  NAME                      USED  AVAIL  REFER  MOUNTPOINT
  zfiles                   10.7G   377G   152K  /zfiles
  zfiles/home              10.6G   377G   119M  /zfiles/home
  zfiles/home/arno         10.5G   377G  2.35G  /zfiles/home/arno
  zfiles/home/arno/.priv    192K   377G   192K  /zfiles/home/arno/.priv
  zfiles/home/arno/.scito  8.18G   377G  8.18G  /zfiles/home/arno/.scito
 
 
 I export the ZFS's via nfs and rsynced on the other machine some backup
 of my current note-book (geli + UFS, (almost) same 9-stable version, no
 problem) to the ZFS's.
 
 
 Quite fast, I see on the notebook :
 
 
  [root@cc /usr/temp]# zpool status -v
pool: zfiles
   state: ONLINE
  status: One or more devices has experienced an error resulting in data
  corruption.  Applications may be affected.
  action: Restore the file in question if possible.  Otherwise restore the
  entire pool from backup.
 see: http://www.sun.com/msg/ZFS-8000-8A
    scan: scrub repaired 0 in 0h1m with 11 errors on Sat Feb 11 14:55:34 2012
  config: 
  
          NAME          STATE     READ WRITE CKSUM
          zfiles        ONLINE       0     0    11
            ada0s3.eli  ONLINE       0     0    23
 
  errors: Permanent errors have been detected in the following files:
 
  /zfiles/home/arno/.scito/contrib/XNAT.tar
  [root@cc /usr/temp]# md5 /zfiles/home/arno/.scito/contrib/XNAT.tar
  md5: /zfiles/home/arno/.scito/contrib/XNAT.tar: Input/output error
  [root@cc /usr/temp]#
 
 
 As said, memtest is OK, nothing is logged to the console, UFS on the
 same disk works OK (I did some tests copying and comparing random data)
 and smartctl as well seems to trust the disk :
 
  SMART Self-test log structure revision number 1
  Num  Test_Description    Status                  Remaining  LifeTime(hours)
  # 1  Extended offline    Completed without error       00%       388
  # 2  Short offline       Completed without error       00%       387
 
 
 Am I doing something wrong? Please let me know what extra info I could
 provide to help solve this (dmesg.boot at the end of this mail).
 
 Thanx a lot in advance,
 
 best, Arno

 Arno, you forgot to say how you created the geli partition.
 It is important.


  geli init /dev/ada0s3  (should I have used ' -s 4096 ' ???) 

I added later :

  geli  attach -k /tmp/ifmemoryfails.key1 -p /dev/ada0s3


In fact, on my regular laptop on which I now use UFS on top of GELI
I use /dev/ada0s3f, not the whole partition 

Hope this helps ;-)

thanx, best, Arno


Re: 9-stable : geli + one-disk ZFS fails

2012-02-14 Thread Arno J. Klaassen

Hi,

Martin Simmons mar...@lispworks.com writes:

 Some random ideas:

 1) Can you dd the whole of ada0s3.eli without errors?

I just started it; will take some hours 

 2) If you scrub a few more times, does it find the same number of errors each
 time and are they always in that XNAT.tar file?

I deleted XNAT.tar. I also copied files with 'ssh tar -c | tar -xp' to
rule out NFS and got the same type of errors. Multiple scrubs seem to
report the same files but not the same number of checksum errors (to be
confirmed).

 3) Can you try zfs without geli?

Sure, I will split the space into one partition with geli and one without.

 4) Is the slice/partition layout definitely correct?


I (still ???) use sysinstall to do the dirty computations in my place.

This is what gpart says (looks OK to me ...):


[root@cc ~]# gpart list ada0
Geom name: ada0
modified: false
state: OK
fwheads: 16
fwsectors: 63
last: 976773167
first: 63
entries: 4
scheme: MBR
Providers:
1. Name: ada0s1
   Mediasize: 40802001408 (38G)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 32256
   Mode: r0w0e0
   rawtype: 7
   length: 40802001408
   offset: 32256
   type: ntfs
   index: 1
   end: 79691471
   start: 63
2. Name: ada0s2
   Mediasize: 34359607296 (32G)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 2147328000
   Mode: r3w3e5
   attrib: active
   rawtype: 165
   length: 34359607296
   offset: 40802033664
   type: freebsd
   index: 2
   end: 146800079
   start: 79691472
3. Name: ada0s3
   Mediasize: 424946221056 (395G)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 2147196928
   Mode: r1w1e1
   rawtype: 165
   length: 424946221056
   offset: 75161640960
   type: freebsd
   index: 3
   end: 976773167
   start: 146800080
Consumers:
1. Name: ada0
   Mediasize: 500107862016 (465G)
   Sectorsize: 512
   Mode: r4w4e10
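
One way to double-check question 4 by hand is to redo the arithmetic on the
start/end sectors that gpart prints. The following is only a sketch: the
sector numbers are copied from the listing above, and the 4096-byte alignment
target is an assumption (gpart reports 512-byte sectors and says nothing about
the physical sector size of this disk). It also says nothing about the sector
size geli itself was initialised with.

```shell
#!/bin/sh
# Sketch: sanity-check MBR slice boundaries copied by hand from the
# "gpart list ada0" output. ALIGN=4096 is an assumed target, not a value
# reported by gpart.

SECTOR=512
ALIGN=4096

# is_aligned START_SECTOR: succeeds if the slice starts on an ALIGN boundary.
is_aligned() {
    [ $(( ($1 * SECTOR) % ALIGN )) -eq 0 ]
}

# is_contiguous PREV_END NEXT_START: succeeds if there is no gap or overlap.
is_contiguous() {
    [ $(( $1 + 1 )) -eq "$2" ]
}

# check_slice NAME START END: print size in bytes and alignment status.
check_slice() {
    if is_aligned "$2"; then a=aligned; else a=unaligned; fi
    echo "$1: start=$2 end=$3 size=$(( ($3 - $2 + 1) * SECTOR )) $a"
}

check_slice ada0s1 63 79691471
check_slice ada0s2 79691472 146800079
check_slice ada0s3 146800080 976773167
```

The sizes printed this way can be compared with the Mediasize lines in the
listing; a mismatch would point at a layout problem.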

  

Merci,

Arno


 __Martin


 On Mon, 13 Feb 2012 23:39:06 +0100, Arno J Klaassen said:
 
 hello,
 
 in the hope of raising some interest in this issue:
 
  I updated to today's -stable, tested with vfs.zfs.debug=1
  and vfs.zfs.prefetch_disable=0, no difference.
 
  I also tested to read the raw partition :
 
   [root@cc /usr/ports]# dd if=/dev/ada0s3 of=/dev/null bs=4096  conv=noerror
   103746636+0 records in
   103746636+0 records out
   424946221056 bytes transferred in 13226.346738 secs (32128768 bytes/sec)
   [root@cc /usr/ports]#
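
A quick cross-check of the dd figures above, as a sketch only: if dd read the
whole slice without short reads, records times block size should equal the
Mediasize that gpart reports for ada0s3 (424946221056 bytes). The helper names
are made up for this illustration.

```shell
#!/bin/sh
# Sketch: verify that a dd run covered the whole provider.
# dd_bytes and mib_per_sec are hypothetical helper names.

# dd_bytes RECORDS BLOCKSIZE: total bytes read.
dd_bytes() {
    echo $(( $1 * $2 ))
}

# mib_per_sec BYTES SECONDS: average throughput, rounded down to whole MiB/s.
mib_per_sec() {
    echo $(( $1 / $2 / 1048576 ))
}

dd_bytes 103746636 4096          # compare with Mediasize of ada0s3
mib_per_sec 424946221056 13226   # rough average read speed
```

This only shows that the raw slice is readable end to end; it does not
exercise the geli layer, which is what reading ada0s3.eli would do.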
 
  Disk is brand new, looks ok, either my setup is not good or there is
  a bug somewhere; I can play around with this box for some more time,
  please feel free to provide me with some hints what to do to be useful
  for you.
 
 Best,
 
 Arno
 
 
 Arno J. Klaassen a...@heho.snv.jussieu.fr writes:
 
  Hello,
 
 
  I finally decided to 'play' a bit with ZFS on a notebook, some years
  old, but I installed a brand new disk and memtest passes OK.
 
  I installed base+ports on partition 2, using 'classical' UFS.
 
  I crypted partition 3 and created a single zpool on it containing
  4 Z-file-systems :
 
   [root@cc ~]# zfs list
   NAME                      USED  AVAIL  REFER  MOUNTPOINT
   zfiles                   10.7G   377G   152K  /zfiles
   zfiles/home              10.6G   377G   119M  /zfiles/home
   zfiles/home/arno         10.5G   377G  2.35G  /zfiles/home/arno
   zfiles/home/arno/.priv    192K   377G   192K  /zfiles/home/arno/.priv
   zfiles/home/arno/.scito  8.18G   377G  8.18G  /zfiles/home/arno/.scito
 
 
  I export the ZFS's via nfs and rsynced on the other machine some backup
  of my current note-book (geli + UFS, (almost) same 9-stable version, no
  problem) to the ZFS's.
 
 
  Quite fast, I see on the notebook :
 
 
   [root@cc /usr/temp]# zpool status -v
     pool: zfiles
    state: ONLINE
   status: One or more devices has experienced an error resulting in data
           corruption.  Applications may be affected.
   action: Restore the file in question if possible.  Otherwise restore the
           entire pool from backup.
      see: http://www.sun.com/msg/ZFS-8000-8A
     scan: scrub repaired 0 in 0h1m with 11 errors on Sat Feb 11 14:55:34 2012
   config:

           NAME          STATE     READ WRITE CKSUM
           zfiles        ONLINE       0     0    11
             ada0s3.eli  ONLINE       0     0    23
 
   errors: Permanent errors have been detected in the following files:
 
   /zfiles/home/arno/.scito/contrib/XNAT.tar
   [root@cc /usr/temp]# md5 /zfiles/home/arno/.scito/contrib/XNAT.tar
   md5: /zfiles/home/arno/.scito/contrib/XNAT.tar: Input/output error
   [root@cc /usr/temp]#
 
 
  As said, memtest is OK, nothing is logged to the console, UFS on the
  same disk works OK (I did some tests copying and comparing random data)
  and smartctl as well seems to trust the disk :
 
   SMART Self-test log structure revision number 1
   Num  Test_Description    Status                   Remaining  LifeTime(hours)
   # 1  Extended offline    Completed without error        00%       388
   # 2  Short offline       Completed without error        00%       387

9-stable : geli + one-disk ZFS fails

2012-02-11 Thread Arno J. Klaassen

Hello,


I finally decided to 'play' a bit with ZFS on a notebook, some years
old, but I installed a brand new disk and memtest passes OK.

I installed base+ports on partition 2, using 'classical' UFS.

I encrypted partition 3 and created a single zpool on it containing
4 ZFS file systems:

 [root@cc ~]# zfs list
 NAME                      USED  AVAIL  REFER  MOUNTPOINT
 zfiles                   10.7G   377G   152K  /zfiles
 zfiles/home              10.6G   377G   119M  /zfiles/home
 zfiles/home/arno         10.5G   377G  2.35G  /zfiles/home/arno
 zfiles/home/arno/.priv    192K   377G   192K  /zfiles/home/arno/.priv
 zfiles/home/arno/.scito  8.18G   377G  8.18G  /zfiles/home/arno/.scito


I exported the ZFS file systems via NFS and, from the other machine,
rsynced a backup of my current notebook (geli + UFS, (almost) the same
9-stable version, no problems) onto them.


Quite soon, I saw on the notebook:


 [root@cc /usr/temp]# zpool status -v
   pool: zfiles
  state: ONLINE
 status: One or more devices has experienced an error resulting in data
         corruption.  Applications may be affected.
 action: Restore the file in question if possible.  Otherwise restore the
         entire pool from backup.
    see: http://www.sun.com/msg/ZFS-8000-8A
   scan: scrub repaired 0 in 0h1m with 11 errors on Sat Feb 11 14:55:34 2012
 config:

         NAME          STATE     READ WRITE CKSUM
         zfiles        ONLINE       0     0    11
           ada0s3.eli  ONLINE       0     0    23

 errors: Permanent errors have been detected in the following files:

 /zfiles/home/arno/.scito/contrib/XNAT.tar
 [root@cc /usr/temp]# md5 /zfiles/home/arno/.scito/contrib/XNAT.tar
 md5: /zfiles/home/arno/.scito/contrib/XNAT.tar: Input/output error
 [root@cc /usr/temp]#


As said, memtest is OK, nothing is logged to the console, UFS on the
same disk works OK (I did some tests copying and comparing random data)
and smartctl as well seems to trust the disk :

 SMART Self-test log structure revision number 1
 Num  Test_Description    Status                   Remaining  LifeTime(hours)
 # 1  Extended offline    Completed without error        00%       388
 # 2  Short offline       Completed without error        00%       387
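
For readers who want to repeat the "copying and comparing random data" test
mentioned above, a minimal sketch could look like the following. It is not the
exact procedure used in this report; the function name, the file names and the
block count are assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch: write random data into a directory on the file system under test,
# copy it, and compare source and copy byte for byte.
# copy_and_compare is a hypothetical helper name.

# copy_and_compare DIR COUNT: COUNT is the number of 1 KiB blocks to write.
# Returns 0 if the copy matches, 1 on mismatch, 2 if writing or copying failed.
copy_and_compare() {
    dir=$1
    count=$2
    src="$dir/random.src"
    dst="$dir/random.dst"
    dd if=/dev/urandom of="$src" bs=1024 count="$count" 2>/dev/null || return 2
    cp "$src" "$dst" || return 2
    if cmp -s "$src" "$dst"; then
        echo "OK: copy matches"
        rc=0
    else
        echo "MISMATCH: copy differs from source"
        rc=1
    fi
    rm -f "$src" "$dst"
    return $rc
}

# Example: 1 MiB of random data in /tmp (or the directory given as $1).
copy_and_compare "${1:-/tmp}" 1024
```

With files this small the comparison is likely served from cache, so a
meaningful test would need data sets larger than RAM or a remount in between.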


Am I doing something wrong? Please let me know what extra info I could
provide to help solve this (dmesg.boot is at the end of this mail).

Thanx a lot in advance,

best, Arno



### dmesg.boot ###

Table 'FACP' at 0xbdd90200
Table 'APIC' at 0xbdd90390
APIC: Found table at 0xbdd90390
APIC: Using the MADT enumerator.
MADT: Found CPU APIC ID 0 ACPI ID 1: enabled
SMP: Added CPU 0 (AP)
MADT: Found CPU APIC ID 1 ACPI ID 2: enabled
SMP: Added CPU 1 (AP)
MADT: Found CPU APIC ID 130 ACPI ID 3: disabled
MADT: Found CPU APIC ID 131 ACPI ID 4: disabled
Copyright (c) 1992-2012 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 9.0-STABLE #0: Fri Feb  3 22:48:57 CET 2012
toor@cc:/usr/obj/raid1/bsd/9/src/sys/VR603 amd64
Preloaded elf kernel /boot/kernel/kernel at 0x80bba000.
Preloaded /boot/zfs/zpool.cache /boot/zfs/zpool.cache at 0x80bba200.
Calibrating TSC clock ... TSC clock: 2161296371 Hz
CPU: Intel(R) Pentium(R) Dual  CPU  T3400  @ 2.16GHz (2161.30-MHz K8-class CPU)
  Origin = GenuineIntel  Id = 0x6fd  Family = 6  Model = f  Stepping = 13
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0xe39d<SSE3,DTES64,MON,DS_CPL,EST,TM2,SSSE3,CX16,xTPR,PDCM>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant, performance statistics
real memory  = 3221225472 (3072 MB)
Physical memory chunk(s):
0x1000 - 0x00095fff, 610304 bytes (149 pages)
0x0010 - 0x001f, 1048576 bytes (256 pages)
0x00be9000 - 0xb8402fff, 3078725632 bytes (751642 pages)
avail memory = 3057152000 (2915 MB)
Event timer LAPIC quality 400
ACPI APIC Table: <MSI_NB MEGABOOK>
INTR: Adding local APIC 1 as a target
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
x86bios:  IVT 0x00-0x0004ff at 0xfe00
x86bios: SSEG 0x001000-0x001fff at 0xff800021
x86bios: EBDA 0x099000-0x09 at 0xfe099000
x86bios:  ROM 0x0a-0x0fefff at 0xfe0a
APIC: CPU 0 has ACPI ID 1
APIC: CPU 1 has ACPI ID 2
ULE: setup cpu 0
ULE: setup cpu 1
ACPI: RSDP 0xf9420 00014 (v00 ACPIAM)
ACPI: RSDT 0xbdd9 00048 (v01 MSI_NB MEGABOOK 20091013 MSFT 0097)
ACPI: FACP 0xbdd90200 00084 (v01 MSI_NB MEGABOOK 20091013 MSFT 0097)
ACPI: DSDT 0xbdd905c0 072D3 (v01  1ADTS 1ADTS012 0012 INTL 20051117)
ACPI: FACS 0xbdd9e000 00040
ACPI: APIC 0xbdd90390 0006C (v01 

Re: 8.2-PRERELEASE freezing on reboot (-current OK)

2010-12-14 Thread Arno J. Klaassen
Andriy Gapon a...@freebsd.org writes:

 on 14/12/2010 02:38 Jeremy Chadwick said the following:
 1) 
 [snip]
 Also try dropping to the
 debugger via serial console (serial break) or VGA (Ctrl-Alt-Esc).

 This is a good advice.

Maybe ;-) but the box really freezes: there is no way to drop into the
debugger, neither via serial (ALT_BREAK_TO_DEBUGGER is compiled in as well)
nor via VGA/PS2 ... really frozen.

Arno


Re: 8.2-PRERELEASE freezing on reboot (-current OK)

2010-12-14 Thread Arno J. Klaassen
Jeremy Chadwick free...@jdc.parodius.com writes:

 On Tue, Dec 14, 2010 at 11:24:52PM +0100, Arno J. Klaassen wrote:
 Andriy Gapon a...@freebsd.org writes:
 
  on 14/12/2010 02:38 Jeremy Chadwick said the following:
  1) 
  [snip]
  Also try dropping to the
  debugger via serial console (serial break) or VGA (Ctrl-Alt-Esc).
 
  This is a good advice.
 
 may be ;-) but the box realy freezes, no way to drop into the debugger
 nor via serial (ALT_BREAK_TO_DEBUGGER compiled as well) nor
 via VGA/PS2 ... realy frozen

 Bummer.  It sounds like it's a regression of some kind, since your
 original mail stated this problem has been happening for you on RELENG_8
 (8.x) for quite some time.  (I have to assume it didn't happen for you
 at all on 7.3 or prior).

Yep, I built this box with some 7.x; it's a spare/development/test machine
that gets neither much attention when there are problems nor frequent reboots.
I noticed this problem at least some months ago (I track -stable on it
when I have the time and inclination) and adopted the anti-Murphy attitude
(wait for -RELEASE and recheck).

 Would it be possible for you to dedicate some time narrowing down when
 the problem was introduced?

 A good starting point might be to try 8.0-RELEASE and then 8.1-RELEASE
 (just download + burn + boot livefs images and try that).  If it happens
 on 8.1-RELEASE but not 8.0, then we've at least narrowed down the
 timeframe.

I answered Attilio Rao atti...@freebsd.org in private about a patch he
sent me to at least get more info. I will test it and let you and the
list know the results.


 It might also be worth trying the same with 7.3-RELEASE vs. 7.4-BETA1
 (which just came out) see if the same bug was introduced there as result
 of an MFC.

 Otherwise, numerous csups with different date= strings in your supfiles
 and multiple buildworld/kernel/installs would be required.  I would
 probably pick intervals of 4 months at first.

I am quite sure many 8-XXX worlds worked fine on this box, reboots included;
I will find the date window in which things went wrong.
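
Finding that date window amounts to a bisection over source snapshot dates.
The sketch below only illustrates the search itself: is_bad is a hypothetical
stand-in for "fetch the sources as of this date, build, install, reboot and
see whether the box hangs", and the cut-off date inside it is invented so that
the example can run anywhere.

```shell
#!/bin/sh
# Sketch: bisect an ascending list of snapshot dates (YYYYMMDD) for the
# first one that shows the problem.

# is_bad DATE: hypothetical placeholder for the real build-and-reboot test.
# Here it pretends the regression appeared on 2010-06-15.
is_bad() {
    [ "$1" -ge 20100615 ]
}

# first_bad DATE...: assumes the last date is known bad and that the problem,
# once introduced, stays. Prints the earliest bad date.
first_bad() {
    lo=1
    hi=$#
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi) / 2 ))
        eval "d=\${$mid}"
        if is_bad "$d"; then
            hi=$mid
        else
            lo=$(( mid + 1 ))
        fi
    done
    eval "echo \${$lo}"
}

# Four-month intervals, as suggested above.
first_bad 20091101 20100301 20100701 20101101
```

Each halving costs one buildworld/buildkernel cycle, so starting with coarse
intervals and refining only the suspicious one keeps the number of builds low.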

Merci, Arno



Re: 8.2-PRERELEASE freezing on reboot (-current OK)

2010-12-13 Thread Arno J. Klaassen
Hello,

Jeremy Chadwick free...@jdc.parodius.com writes:

 On Fri, Dec 10, 2010 at 10:37:32AM +0100, Arno J. Klaassen wrote:
 just FYI that on an 8-way Tyan S3992-E based box, a reboot under
 8.2-PRERELEASE (in fact, 8-stable since quite a while) makes the box
 freeze, whilst the same thing under -current works OK.

 Try toggling these two sysctls on the 8.2-PRERELEASE box.  Be sure to
 check what the defaults are before toggling them, and only mess with one
 at a time.

 hw.acpi.handle_reboot
 hw.acpi.disable_on_reboot

Nope, no difference. The defaults are 0 for both sysctls, on both -current
and 8-stable.

I noticed that -current prints 'cpu_reset: Stopping other CPUs' at the
very end, whereas 8-stable doesn't.

Thanx for your answer, best, Arno


8.2-PRERELEASE freezing on reboot (-current OK)

2010-12-10 Thread Arno J. Klaassen

Hello,

just FYI that on an 8-way Tyan S3992-E based box, a reboot under
8.2-PRERELEASE (in fact, 8-stable since quite a while) makes the box
freeze, whilst the same thing under -current works OK.

For info, below is the end of the console output in both cases, as well as
dmesg.boot for -current.

Feel free to contact me for more info or test patches.

Best, Arno


### console log ###

-current :

[r...@siamesetwins ~]# reboot
Dec 10 10:12:03 siamesetwins reboot: rebooted by toor
Dec 10 10:12:03 siamesetwins syslogd: exiting on signal 15
Waiting (max 60 seconds) for system process `vnlru' to stop...done
ts_to_ct(1291972331.482314452) = [2010-12-10 09:12:11]
Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining...3 3 3 2 2 1 0 0 0 done
All buffers synced.
Swap device aacd0s1b removed.
Uptime: 7d13h4m34s
bge0: link DOWN
pcib1: wake_prep disabled wake for \_SB_.PCI0.P0P1 (S5)
unknown: wake_prep disabled wake for \_SB_.PCI0.P0P1.P1P2.SL2X (S5)
unknown: wake_prep disabled wake for \_SB_.PCI0.P0P1.P1P2.SL3X (S5)
unknown: wake_prep disabled wake for \_SB_.PCI0.USB0 (S5)
unknown: wake_prep disabled wake for \_SB_.PCI0.USB1 (S5)
unknown: wake_prep disabled wake for \_SB_.PCI0.USB2 (S5)
atkbdc0: wake_prep disabled wake for \_SB_.PCI0.SBRG.PS2K (S5)
psmcpnp0: wake_prep disabled wake for \_SB_.PCI0.SBRG.PS2M (S5)
pcib3: wake_prep disabled wake for \_SB_.PCI0.BR14 (S5)
unknown: wake_prep disabled wake for \_SB_.PCI0.BR14.SL4X (S5)
pcib4: wake_prep disabled wake for \_SB_.PCI0.BR1E (S5)
bge0: wake_prep disabled wake for \_SB_.PCI0.BR1E.GBE1 (S5)
bge1: wake_prep disabled wake for \_SB_.PCI0.BR1E.GBE2 (S5)
pcib5: wake_prep disabled wake for \_SB_.PCI0.BR28 (S5)
pcib8: wake_prep disabled wake for \_SB_.PCI0.BR32 (S5)
pcib9: wake_prep disabled wake for \_SB_.PCI0.BR3C (S5)
unknown: wake_prep disabled wake for \_SB_.PCI0.SL1X (S5)
unknown: wake_prep disabled wake for \_SB_.PCI0.MBE1 (S5)
aac0: shutting down controller...done
Rebooting...
cpu_reset: Stopping other CPUs


8.2-PRE :

# reboot
Dec 10 10:18:21 siamesetwins reboot: rebooted by root
Dec 10 10:18:21 siamesetwins syslogd: exiting on signal 15
Waiting (max 60 seconds) for system process `vnlru' to stop...done

Waiting (max 60 seconds) for system process `syncer' to stop...Syncing disks, 
vnodes remaining...1 1 0 1 0 0 0 0 done
Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
All buffers synced.
lock order reversal:
 1st 0xff004b2747e8 ufs (ufs) @ /raid1/bsd/8/src/sys/kern/vfs_mount.c:1204
 2nd 0xff004b27e308 syncer (syncer) @ /raid1/bsd/8/src/sys/kern/vfs_subr.c:2231
KDB: stack backtrace:
db_trace_self_wrapper() at 0x801d623a = db_trace_self_wrapper+0x2a
kdb_backtrace() at 0x802e4d27 = kdb_backtrace+0x37
_witness_debugger() at 0x802f8645 = _witness_debugger+0x65
witness_checkorder() at 0x802f98f3 = witness_checkorder+0x833
__lockmgr_args() at 0x8029cd05 = __lockmgr_args+0xd75
vop_stdlock() at 0x803386c9 = vop_stdlock+0x39
VOP_LOCK1_APV() at 0x8054a38b = VOP_LOCK1_APV+0x9b
_vn_lock() at 0x80355308 = _vn_lock+0x68
vputx() at 0x8034b595 = vputx+0x315
dounmount() at 0x80340adb = dounmount+0x2ab
vfs_unmountall() at 0x8034851c = vfs_unmountall+0x4c
boot() at 0x802b2fd6 = boot+0x7b6
reboot() at 0x802b32f8 = reboot+0x68
syscallenter() at 0x802f190f = syscallenter+0xef
syscall() at 0x804fc230 = syscall+0x60
Xfast_syscall() at 0x804e4312 = Xfast_syscall+0xe2
--- syscall (55, FreeBSD ELF64, reboot), rip = 0x80078db3c, rsp = 0x7fffecf8, rbp = 0 ---
lock order reversal:
 1st 0xff004b2747e8 ufs (ufs) @ /raid1/bsd/8/src/sys/kern/vfs_mount.c:1204
 2nd 0xff0007faf7e8 devfs (devfs) @ /raid1/bsd/8/src/sys/ufs/ffs/ffs_vfsops.c:1244
KDB: stack backtrace:
db_trace_self_wrapper() at 0x801d623a = db_trace_self_wrapper+0x2a
kdb_backtrace() at 0x802e4d27 = kdb_backtrace+0x37
_witness_debugger() at 0x802f8645 = _witness_debugger+0x65
witness_checkorder() at 0x802f98f3 = witness_checkorder+0x833
__lockmgr_args() at 0x8029cd05 = __lockmgr_args+0xd75
vop_stdlock() at 0x803386c9 = vop_stdlock+0x39
VOP_LOCK1_APV() at 0x8054a38b = VOP_LOCK1_APV+0x9b
_vn_lock() at 0x80355308 = _vn_lock+0x68
ffs_flushfiles() at 0x804a29e5 = ffs_flushfiles+0xc5
ffs_unmount() at 0x804a33ec = ffs_unmount+0x6c
dounmount() at 0x80340b16 = dounmount+0x2e6
vfs_unmountall() at 0x8034851c = vfs_unmountall+0x4c
boot() at 0x802b2fd6 = boot+0x7b6
reboot() at 0x802b32f8 = reboot+0x68
syscallenter() at 0x802f190f = syscallenter+0xef
syscall() at 0x804fc230 = syscall+0x60
Xfast_syscall() at 0x804e4312 = Xfast_syscall+0xe2
--- syscall (55, FreeBSD ELF64, reboot), rip = 0x80078db3c, rsp = 0x7fffecf8, rbp = 0 ---
Swap device 

Re: 7.1-PRERELEASE: asus M3A / Phenom X4 / powerd freeze

2008-12-13 Thread Arno J. Klaassen
Jung-uk Kim j...@freebsd.org writes:

 On Friday 12 December 2008 04:26 pm, Jung-uk Kim wrote:
  On Friday 12 December 2008 03:36 pm, Arno J. Klaassen wrote:
   cpghost cpgh...@cordula.ws writes:
On Fri, Dec 12, 2008 at 12:01:29AM +0100, Arno J. Klaassen 
 wrote:
 yet another powerd SOS : on an ASUS M3A78-EM MB with
 Phenom 9750 and 8 gig memory, starting powerd freezes
 the box after slowing down a bit cpu frequency.
   
(... snip ...)
   

 I forgot there is a PR with the latest driver:
 
 http://www.freebsd.org/cgi/query-pr.cgi?pr=128575

yes, this works better :

kldload cpufreq :

  hwpstate0: Cool`n'Quiet 2.0 on cpu0
  hwpstate0: SVI mode
  hwpstate0: you have 2 P-state.
  hwpstate0: freq=2400MHz volts=1300mV
  hwpstate0: freq=1200MHz volts=1050mV
  hwpstate0: Now P0-state.
  hwpstate1: Cool`n'Quiet 2.0 on cpu1
  hwpstate1: SVI mode
  hwpstate1: you have 2 P-state.
  hwpstate1: freq=2400MHz volts=1300mV
  hwpstate1: freq=1200MHz volts=1050mV
  hwpstate1: Now P0-state.
  hwpstate2: Cool`n'Quiet 2.0 on cpu2
  hwpstate2: SVI mode
  hwpstate2: you have 2 P-state.
  hwpstate2: freq=2400MHz volts=1300mV
  hwpstate2: freq=1200MHz volts=1050mV
  hwpstate2: Now P0-state.
  hwpstate3: Cool`n'Quiet 2.0 on cpu3
  hwpstate3: SVI mode
  hwpstate3: you have 2 P-state.
  hwpstate3: freq=2400MHz volts=1300mV
  hwpstate3: freq=1200MHz volts=1050mV
  hwpstate3: Now P0-state.

However, I need to disable acpi_throttle;
by default, I get:

  [r...@m34 ~]# sysctl dev.cpu.0.freq_levels
  dev.cpu.0.freq_levels: 2398/-1 2098/-1 1798/-1 1498/-1 1199/-1 899/-1 599/-1 299/-1
  [r...@m34 ~]# kldload cpufreq
  [r...@m34 ~]# sysctl dev.cpu.0.freq_levels
  dev.cpu.0.freq_levels: 2400/-1 2100/-1 1800/-1 1500/-1 1200/-1 1050/-1 900/-1 750/-1 600/-1 450/-1 300/-1 150/-1
  [r...@m34 ~]# powerd -v
  powerd: unable to determine AC line status
  idle time  90%, decreasing clock speed from 2398 MHz to -1 MHz
  powerd: error setting CPU frequency -1: Invalid argument
  idle time  90%, decreasing clock speed from 2398 MHz to -1 MHz
  powerd: error setting CPU frequency -1: Invalid argument

rebooting with hint.acpi_throttle.0.disabled=1 gives :


  [r...@m34 ~]# sysctl dev.cpu.0.freq_levels
  sysctl: unknown oid 'dev.cpu.0.freq_levels'
  [r...@m34 ~]# kldload cpufreq
  [r...@m34 ~]# sysctl dev.cpu.0.freq_levels
  dev.cpu.0.freq_levels: 2400/-1 1200/-1
  [r...@m34 ~]# powerd -v
  powerd: unable to determine AC line status
  idle time  90%, decreasing clock speed from 2400 MHz to 1200 MHz

Thanx, Arno




Re: 7.1-PRERELEASE: asus M3A / Phenom X4 / powerd freeze

2008-12-12 Thread Arno J. Klaassen

cpghost cpgh...@cordula.ws writes:

 On Fri, Dec 12, 2008 at 12:01:29AM +0100, Arno J. Klaassen wrote:
  yet another powerd SOS : on an ASUS M3A78-EM MB with
  Phenom 9750 and 8 gig memory, starting powerd freezes
  the box after slowing down a bit cpu frequency.
 
 (... snip ...)
 
  dev.cpu.0.freq_levels: 2398/-1 2098/-1 1798/-1 1498/-1 1199/-1 899/-1 
  599/-1 299/-1
  
  further :
  
   - I set debug.cpufreq.lowest superior to 1500 : system remains
 up but only when pushing really slightly
  
   -  I set debug.cpufreq.lowest inferior to 1100 : freeze
  garantueed
 
 Same here. Running with
   debug.cpufreq.lowest=1240
 in /boot/loader.conf to prevent freezes.
 
 This is a FreeBSD 7.1-PRERELEASE #0: Sat Nov  8 14:18:05 CET 2008
   r...@textbox:/usr/obj/usr/src/sys/GENERIC
 running in amd64 and i386 mode with ACPI enabled (default):
 
 CPU: AMD Phenom(tm) 9350e Quad-Core Processor (2000.08-MHz K8-class CPU)
   Origin = AuthenticAMD  Id = 0x100f23  Stepping = 3
   Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,
   PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
   Features2=0x802009<SSE3,MON,CX16,<b23>>
   AMD Features=0xee500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM,
   3DNow!+,3DNow!>
   AMD Features2=0x7ff<LAHF,CMP,SVM,ExtAPIC,CR8,<b5>,<b6>,<b7>,
   Prefetch,<b9>,<b10>>
   Cores per package: 4
 
 using an MSI board with SB600 chipset and newest BIOS.
 
 No idea why the system freezes below approx 1200 MHz. But apparently,
 this bug is quite common and affects a lot of systems with Phenoms. :(

Do Phenoms not support powernow? I am a bit puzzled by the
difference with two X2 boards I have around here:

  FreeBSD 7.1-PRERELEASE #0: Tue Dec  2 20:09:28 
  ...
  CPU: AMD Athlon(tm) 64 X2 Dual Core Processor 6000+ (2992.52-MHz K8-class CPU)
  Origin = AuthenticAMD  Id = 0x40f33  Stepping = 3
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x2001<SSE3,CX16>
  AMD Features=0xea500800<SYSCALL,NX,MMX+,FFXSR,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x1f<LAHF,CMP,SVM,ExtAPIC,CR8>
  Cores per package: 2
  ...
  cpu0: ACPI CPU on acpi0
  powernow0: PowerNow! K8 on cpu0
  cpu1: ACPI CPU on acpi0
  powernow1: PowerNow! K8 on cpu1


  FreeBSD 7.1-PRERELEASE #1: Mon Nov 17 14:40:26
  ...
  CPU: AMD Turion(tm) 64 X2 Mobile Technology TL-62 (2109.70-MHz K8-class CPU)
  Origin = AuthenticAMD  Id = 0x60f82  Stepping = 2
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x2001<SSE3,CX16>
  AMD Features=0xea500800<SYSCALL,NX,MMX+,FFXSR,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x11f<LAHF,CMP,SVM,ExtAPIC,CR8,Prefetch>
  Cores per package: 2
  ...
  cpu0: ACPI CPU on acpi0
  acpi_throttle0: ACPI CPU Throttling on cpu0
  powernow0: PowerNow! K8 on cpu0
  cpu1: ACPI CPU on acpi0
  acpi_throttle1: ACPI CPU Throttling on cpu1
  acpi_throttle1: failed to attach P_CNT
  device_attach: acpi_throttle1 attach returned 6
  powernow1: PowerNow! K8 on cpu1


whereas the Phenom says :

  CPU: AMD Phenom(tm) 9750 Quad-Core Processor (2410.66-MHz K8-class CPU)
  Origin = AuthenticAMD  Id = 0x100f23  Stepping = 3
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x802009<SSE3,MON,CX16,<b23>>
  AMD Features=0xee500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow
  AMD Features2=0x7ff<LAHF,CMP,SVM,ExtAPIC,CR8,<b5>,<b6>,<b7>,Prefetch,<b9>,<b1
  ...
  cpu0: ACPI CPU on acpi0
  acpi_throttle0: ACPI CPU Throttling on cpu0
  cpu1: ACPI CPU on acpi0
  cpu2: ACPI CPU on acpi0
  cpu3: ACPI CPU on acpi0

My conclusion: acpi_throttle attaches on the X4 (why not) but not on the
X2s (though the Turion seems to detect it, it fails to attach), and
powernow does not seem to attach on the X4 ...

Best regards, Arno


 
   - I define hint.acpi_throttle.0.disabled=1 in loader.conf
 then no dev.cpu.0.freq is showing up ... (as if
 only acpi_throttle is attaching and not powernow)
  
  Let me know what I can test further.
  
  Best, Arno
 
 Regards,
 -cpghost.
 
 -- 
 Cordula's Web. http://www.cordula.ws/


Re: 7.1-PRERELEASE: asus M3A / Phenom X4 / powerd freeze

2008-12-12 Thread Arno J. Klaassen
Jung-uk Kim j...@freebsd.org writes:

 On Friday 12 December 2008 04:26 pm, Jung-uk Kim wrote:
  On Friday 12 December 2008 03:36 pm, Arno J. Klaassen wrote:
   cpghost cpgh...@cordula.ws writes:
On Fri, Dec 12, 2008 at 12:01:29AM +0100, Arno J. Klaassen 
 wrote:
   
[ .. stuff deleted .. ]
   
   do Phenoms not support powernow?
 
  [SNIP]
 
  Phenom is 10H family processor and it has Cool`n'Quiet 2.0. 
  Someone wrote a driver for it and it was posted on freebsd-current
  in September:
 
  http://lists.freebsd.org/pipermail/freebsd-current/2008-September/088330.html
  http://lists.freebsd.org/pipermail/freebsd-current/2008-September/088803.html
  http://lists.freebsd.org/pipermail/freebsd-current/2008-September/088806.html
 
 I forgot there is a PR with the latest driver:
 
 http://www.freebsd.org/cgi/query-pr.cgi?pr=128575

ah, I see. Thank you very much.
I'll give it a try this WE

Best, Arno


7.1-PRERELEASE: asus M3A / Phenom X4 / powerd freeze

2008-12-11 Thread Arno J. Klaassen

hello,

Yet another powerd SOS: on an ASUS M3A78-EM motherboard with a
Phenom 9750 and 8 GB of memory, starting powerd freezes
the box once the CPU frequency has been lowered a bit.

[IMHO] useful bits of info:

FreeBSD m34.scito.local 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #0: Thu Dec 11 14:24:39 CET 2008 r...@m34.scito.local:/usr/obj/raid1/bsd/src7/sys/M3A78-EM amd64


CPU: AMD Phenom(tm) 9750 Quad-Core Processor (2410.66-MHz K8-class CPU)
  Origin = AuthenticAMD  Id = 0x100f23  Stepping = 3
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x802009<SSE3,MON,CX16,<b23>>
  AMD Features=0xee500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x7ff<LAHF,CMP,SVM,ExtAPIC,CR8,<b5>,<b6>,<b7>,Prefetch,<b9>,<b10>>
  Cores per package: 4
usable memory = 8547172352 (8151 MB)
avail memory  = 8268722176 (7885 MB)
ACPI APIC Table: 102408 APIC2239
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3
ioapic0 Version 2.1 irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0: 102408 XSDT2239 on motherboard

...

cpu0: ACPI CPU on acpi0
acpi_throttle0: ACPI CPU Throttling on cpu0
cpu1: ACPI CPU on acpi0
cpu2: ACPI CPU on acpi0
cpu3: ACPI CPU on acpi0
acpi_hpet0: High Precision Event Timer iomem 0xfed0-0xfed003ff on acpi0
Timecounter HPET frequency 14318180 Hz quality 900

dev.cpu.0.freq_levels: 2398/-1 2098/-1 1798/-1 1498/-1 1199/-1 899/-1 599/-1 299/-1


further :

 - If I set debug.cpufreq.lowest above 1500: the system stays
   up, but only when pushed very lightly

 - If I set debug.cpufreq.lowest below 1100: freeze guaranteed

 - If I define hint.acpi_throttle.0.disabled=1 in loader.conf,
   then no dev.cpu.0.freq shows up ... (as if
   only acpi_throttle attaches and not powernow)
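
To see which of the reported levels remain once debug.cpufreq.lowest is set,
the freq_levels string can be filtered by hand. This is only a sketch: it
assumes the tunable simply hides levels below the given frequency, and
usable_levels is a made-up helper name. Each entry in freq_levels is a
frequency/power pair, with -1 meaning the power draw is unknown.

```shell
#!/bin/sh
# Sketch: list the frequencies from a dev.cpu.0.freq_levels string that are
# at or above a chosen floor (the assumed effect of debug.cpufreq.lowest).

# usable_levels LOWEST FREQ/POWER...: prints the surviving frequencies in MHz.
usable_levels() {
    lowest=$1
    shift
    out=""
    for pair in "$@"; do
        freq=${pair%%/*}          # strip the "/power" part
        if [ "$freq" -ge "$lowest" ]; then
            out="${out:+$out }$freq"
        fi
    done
    echo "$out"
}

# Levels as reported on this box, with the floor cpghost uses (1240).
usable_levels 1240 2398/-1 2098/-1 1798/-1 1498/-1 1199/-1 899/-1 599/-1 299/-1
```

With a floor of 1240 only the four highest levels survive, which matches the
observation that the box stays up as long as it never drops below roughly
1200 MHz.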

Let me know what I can test further.

Best, Arno



Re: 7.1-PRERELEASE : bad network performance (nfe0)

2008-09-30 Thread Arno J. Klaassen

Robert Watson [EMAIL PROTECTED] writes:

 On Mon, 29 Sep 2008, Arno J. Klaassen wrote:
 
  However, the request/respones tests are awfull for my notebook
  (test repeated on the notebook for the sake of conviction) :
 
 Is it possible to rerun these tests with a 7.0 kernel of the same
 general configuration?  That would help us determine if it's a
 regression between 7.0 and 7.1,

A 7.0-RELEASE-p4 kernel (with 7.1 world) as well as the 7.0-RELEASE
live CD give the same results: great streaming, very poor
request/response.

 or perhaps a more general issue
 between 6.x and 7.x.

nve(4) does not recognise this chip.

If someone has a bootable 6-stable .iso with
a backported nfe(4) ... or can email if_nfe.ko to me,
I will test under 6-stable.

For now I will test the patches Pyun and Luigi sent me
and let you know.

Best, arno


Re: 7.1-PRERELEASE : bad network performance (nfe0)

2008-09-29 Thread Arno J. Klaassen

Dear Pyun,


thanx for your prompt answer (as usual).

Pyun YongHyeon [EMAIL PROTECTED] writes:

 On Sat, Sep 27, 2008 at 11:21:00PM +0200, Arno J. Klaassen wrote:
   
   
   Hello,
   
   I've serious network performance problems on a HP Turion X2
   based brand new notebook; I only used a 7-1Beta CD and
   7-STABLE on this thing.
   
   Scp-ing ports.tgz from a rock-stable 7-STABLE server to it gives :
   
 # scp -p ports.tgz [EMAIL PROTECTED]:/tmp/
   ports.tgz 100%   98MB  88.7KB/s   18:49 
   
   (doing the same thing by copying from an NFS-mounted disk even
 takes more than an hour ...)
   
   
   Doing a top(1) aside, just shows the box 100% idle :
   
     PID USERNAME  PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
      12 root      171 ki31     0K    16K CPU0   0  38:55 100.00% idle: cpu0
      11 root      171 ki31     0K    16K RUN    1  38:55 100.00% idle: cpu1
      13 root      -32    -     0K    16K WAIT   0   0:02  0.00% swi4: clock sio
      29 root      -68    -     0K    16K -      0   0:00  0.00% nfe0 taskq
      34 root      -64    -     0K    16K WAIT   1   0:00  0.00% irq23: atapci1
    1853 root        8    0  7060K  1920K wait   0   0:00  0.00% sh
     878 nono       44    0  8112K  2288K CPU1   1   0:00  0.00% top
     884 root        8    -     0K    16K -      1   0:00  0.00% nfsiod 0
       4 root       -8    -     0K    16K -      1   0:00  0.00% g_down
      16 root      -16    -     0K    16K -      1   0:00  0.00% yarrow
      46 root       20    -     0K    16K syncer 0   0:00  0.00% syncer
       3 root       -8    -     0K    16K -      0   0:00  0.00% g_up
      30 root      -68    -     0K    16K -      0   0:00  0.00% fw0_taskq
   
   
   I tested :
   
 Update Bios
 ULE /4BSD
 PREEMPTION on/off
 PREEMPTION + IPI_PREEMPTION
 hw.nfe.msi[x]_disable=1
  ^^^
  This has no effect as MCP65 lacks MSI/MSI-X capability. 
   
   All don't seem to matter to the problem.
   
   I put two tcpdumps (server and client during another scp(1) ) on 
 http://bare.snv.jussieu.fr/temp/tcpdump-s1518.server
 http://bare.snv.jussieu.fr/temp/tcpdump-s1518.client
   
   I'm far from an expert on TCP/IP, but wireshark expert info shows
   lots of sequences like :
   
 TCP Previous segment lost
 TCP Duplicate ACK 1
 TCP Window update
 TCP Duplicate ACK 2
 TCP Duplicate ACK 3
 TCP Duplicate ACK 4
 TCP Duplicate ACK 5
 TCP Fast retransmission (suspected)
 TCP ...
 TCP Out-of-Order segment
 TCP ...
   
   
   As usual, feel free to contact me for further info/tests.
   
 
 AFAIK you're the first one to report a poor performance
 issue with MCP65.


someone must be ;) no kidding, I am not convinced this is (only)
a driver issue (cf. the bad NFS/UDP performance thread on -hackers).

I just have no experience with this notebook, so I can't say it worked
great before, and the only other 7-stable-amd64 box I have does not
show the problem; it has a cheap re0 *and* is UP.


 MCP65 has no checksum offload/TSO
 capability so nfe(4) never tries to take advantage of the hardware
 capability. So you should have no checksum offload/TSO related
 issue here. 
 Also note, checking network performance with scp(1) wouldn't
 show real numbers as scp(1) may involve other system activities.
 Use one of the network benchmark programs in ports (e.g.
 benchmarks/netperf) to measure network performance.

quite funny (to be taken with lots of salt, since the LAN is used
for normal work in parallel as well, but the differences are
rather significant) :

I test against the same server (7-stable-amd64 from Jun  7, using
nfe0 as well btw, but another chip), either from a
6-stable-x86 (Jul 14, sk0) or the notebook (7-stable-x64 below), using

 for i in SOME-TESTS  ; do
echo $i; /usr/local/bin/netperf -H push -i 4,2 -I 95,10 -t $i;  echo;
 done

streaming results are OK for both :

TCP_STREAM
  Throughput  
  10^6bits/sec  

6-stable-x86  349.57   
7-stable-x64  939.47 

UDP_STREAM
  Throughput
  10^6bits/sec

6-stable-x86  388.45
7-stable-x64  947.89


However, the request/response tests are awful for my notebook 
(test repeated on the notebook for the sake of conviction) :

TCP_RR
  Trans.
  Rate  
  per sec

6-stable-x86  9801.58
7-stable-x64   137.61
7-stable-x64    89.35
7-stable-x64   102.29

TCP_CRR
  Trans.
  Rate
  per sec   

6-stable-x86  4520.98
7-stable-x64     7.00
7-stable-x64     8.10
7-stable-x64    18.49


UDP_RR
  Trans.
  Rate
  per sec   

6-stable-x86  9473.20
7-stable-x64     9.60
7-stable-x64     0.90
7-stable-x64     0.10


I can send you complete results if wanted.
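
To put the request/response figures in perspective: in their default mode the netperf *_RR tests keep one transaction in flight at a time, so the mean time per transaction is roughly the reciprocal of the reported rate. A minimal Python sketch of that arithmetic, using the rates as read from the results above (the dictionary layout and the function name are illustrative only, not part of netperf):

```python
# Convert netperf *_RR transaction rates (transactions per second) into an
# approximate mean time per transaction, assuming a single outstanding
# transaction at any time (the default *_RR behaviour).
RATES = {
    "TCP_RR": {"6-stable-x86": [9801.58], "7-stable-x64": [137.61, 89.35, 102.29]},
    "TCP_CRR": {"6-stable-x86": [4520.98], "7-stable-x64": [7.00, 8.10, 18.49]},
    "UDP_RR": {"6-stable-x86": [9473.20], "7-stable-x64": [9.60, 0.90, 0.10]},
}


def ms_per_transaction(rate_per_sec):
    """Approximate mean time per transaction in milliseconds."""
    return 1000.0 / rate_per_sec


for test, hosts in RATES.items():
    for host, rates in hosts.items():
        for rate in rates:
            ms = ms_per_transaction(rate)
            print(f"{test:8s} {host:13s} {rate:9.2f}/s  ~{ms:10.3f} ms")
```

Read this way, the notebook needs milliseconds to seconds per round trip where the other client needs about a tenth of a millisecond, while bulk streaming is unaffected.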
 
 Other possible cause of issue could be link

7.1-PRERELEASE : bad network performance (nfe0)

2008-09-27 Thread Arno J. Klaassen


Hello,

I've serious network performance problems on a HP Turion X2
based brand new notebook; I only used a 7-1Beta CD and
7-STABLE on this thing.

Scp-ing ports.tgz from a rock-stable 7-STABLE server to it gives :

  # scp -p ports.tgz [EMAIL PROTECTED]:/tmp/
ports.tgz 100%   98MB  88.7KB/s   18:49 

(doing the same thing by copying from an nfs-mounted disk even
 takes more than an hour ...)


Doing a top(1) aside, just shows the box 100% idle :

  PID USERNAME PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
   12 root     171 ki31     0K    16K CPU0   0  38:55 100.00% idle: cpu0
   11 root     171 ki31     0K    16K RUN    1  38:55 100.00% idle: cpu1
   13 root     -32    -     0K    16K WAIT   0   0:02  0.00% swi4: clock sio
   29 root     -68    -     0K    16K -      0   0:00  0.00% nfe0 taskq
   34 root     -64    -     0K    16K WAIT   1   0:00  0.00% irq23: atapci1
 1853 root       8    0  7060K  1920K wait   0   0:00  0.00% sh
  878 nono      44    0  8112K  2288K CPU1   1   0:00  0.00% top
  884 root       8    -     0K    16K -      1   0:00  0.00% nfsiod 0
    4 root      -8    -     0K    16K -      1   0:00  0.00% g_down
   16 root     -16    -     0K    16K -      1   0:00  0.00% yarrow
   46 root      20    -     0K    16K syncer 0   0:00  0.00% syncer
    3 root      -8    -     0K    16K -      0   0:00  0.00% g_up
   30 root     -68    -     0K    16K -      0   0:00  0.00% fw0_taskq


I tested :

  Update Bios
  ULE /4BSD
  PREEMPTION on/off
  PREEMPTION + IPI_PREEMPTION
  hw.nfe.msi[x]_disable=1

None of these seems to matter to the problem.

I put two tcpdumps (server and client during another scp(1) ) on 
  http://bare.snv.jussieu.fr/temp/tcpdump-s1518.server
  http://bare.snv.jussieu.fr/temp/tcpdump-s1518.client

I'm far from an expert on TCP/IP, but wireshark expert info shows
lots of sequences like :

  TCP Previous segment lost
  TCP Duplicate ACK 1
  TCP Window update
  TCP Duplicate ACK 2
  TCP Duplicate ACK 3
  TCP Duplicate ACK 4
  TCP Duplicate ACK 5
  TCP Fast retransmission (suspected)
  TCP ...
  TCP Out-of-Order segment
  TCP ...


As usual, feel free to contact me for further info/tests.

Thanx, Arno

# uname -a
FreeBSD mv 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #0: Fri Sep 26 15:06:07 CEST 
2008 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/PAVILLON  amd64

# pciconf -lcv (bits)
[EMAIL PROTECTED]:0:6:0:class=0x02 card=0x30cf103c chip=0x045010de 
rev=0xa3 hdr=0x00
vendor = 'Nvidia Corp'
device = 'MCP65 Ethernet'
class  = network
subclass   = ethernet
cap 01[44] = powerspec 2  supports D0 D1 D2 D3  current D0


# dmesg -a

Copyright (c) 1992-2008 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.1-PRERELEASE #0: Fri Sep 26 15:06:07 CEST 2008
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/PAVILLON
Timecounter i8254 frequency 1193250 Hz quality 0
CPU: AMD Turion(tm) 64 X2 Mobile Technology TL-62 (2109.70-MHz K8-class CPU)
  Origin = AuthenticAMD  Id = 0x60f82  Stepping = 2
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x2001<SSE3,CX16>
  AMD Features=0xea500800<SYSCALL,NX,MMX+,FFXSR,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x11f<LAHF,CMP,SVM,ExtAPIC,CR8,Prefetch>
  Cores per package: 2
usable memory = 3210813440 (3062 MB)
avail memory  = 3104542720 (2960 MB)
ACPI APIC Table: HP APIC  
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
ioapic0 Version 1.1 irqs 0-23 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
acpi0: HPQOEM SLIC-MPC on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
ACPI Error (dsopcode-0671): Field [I9MN] at 544 exceeds Buffer [IORT] size 464 
(bits) [20070320]
ACPI Error (psparse-0626): Method parse/execution failed 
[\\_SB_.PCI0.LPC0.PMIO._CRS] (Node 0xff00011f50a0), AE_AML_BUFFER_LIMIT
ACPI Error (uteval-0309): Method execution failed [\\_SB_.PCI0.LPC0.PMIO._CRS] 
(Node 0xff00011f50a0), AE_AML_BUFFER_LIMIT
can't fetch resources for \\_SB_.PCI0.LPC0.PMIO - AE_AML_BUFFER_LIMIT
Timecounter ACPI-fast frequency 3579545 Hz quality 1000
acpi_timer0: 24-bit timer at 3.579545MHz port 0x1008-0x100b on acpi0
acpi_ec0: Embedded Controller: GPE 0x10 port 0x62,0x66 on acpi0
acpi_hpet0: High Precision Event Timer iomem 0xfed0-0xfed003ff on acpi0
Timecounter HPET frequency 2500 Hz quality 900
acpi_acad0: AC Adapter on acpi0
battery0: ACPI Control Method Battery on acpi0
acpi_lid0: Control Method Lid Switch on acpi0
pcib0: ACPI Host-PCI bridge port 0xcf8-0xcff on acpi0
pci0: ACPI PCI bus on pcib0
pci0: memory, RAM at device 0.0 (no driver attached)
isab0: PCI-ISA bridge port 

cpufreq for Opteron quad-core (2354)

2008-06-07 Thread Arno J. Klaassen

Hello,

apparently powernow on Opteron quad-core is not recognised; when
I kldload cpufreq (leaving it out of kernel) I get :

pci0: driver added
pci1: driver added
pci2: driver added
pci3: driver added
pci4: driver added
pci5: driver added
pci6: driver added
found- vendor=0x9005, dev=0x0285, revid=0x00
domain=0, bus=6, slot=14, func=0
class=01-04-00, hdrtype=0x00, mfdev=0
cmdreg=0x0196, statreg=0x0230, cachelnsz=16 (dwords)
lattimer=0xf8 (7440 ns), mingnt=0x01 (250 ns), maxlat=0x01 (250 ns)
intpin=a, irq=28
powerspec 2  supports D0 D1 D3  current D0
MSI supports 2 messages, 64 bit
pci0:6:14:0: reprobing on driver added
pci7: driver added
pci8: driver added
pci9: driver added
pci10: driver added


but no dev.cpu.0.freq* showing up.

When I dig up my beloved good old acpi_ppc, it says :

cpu0: Px state: P0, 2200MHz, 28000mW, 19us, 19us
cpu0: Px state: P1, 2000MHz, 26250mW, 19us, 19us
cpu0: Px state: P2, 1700MHz, 23750mW, 19us, 19us
cpu0: Px state: P3, 1400MHz, 21250mW, 19us, 19us
cpu0: Px state: P4, 1100MHz, 18750mW, 19us, 19us
cpu0: Px method: Unknown, disabled

This box will probably stay at my office for a while and I'd be glad to
provide more information.

Best, Arno

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


[nvidia | shared irq] umass disconnects [was: panic dd-ing from a USB disk ]

2008-06-06 Thread Arno J. Klaassen
Mikhail Teterin [EMAIL PROTECTED] writes:

 Hello!
 
 I had some troubles mounting the filesystem from:
 
   da0 at umass-sim0 bus 0 target 0 lun 0
   da0: MATSHITA DMC-FX12 0100 Removable Direct Access SCSI-2 device 
   da0: 1.000MB/s transfers
   da0: 3886MB (7959552 512 byte sectors: 255H 63S/T 495C)
 
 and decided to just dd the entire da0 to a file, so that the camera
 can be disconnected:
 
   dd if=/dev/da0 of=/home/mi/da0.dd bs=16384
 
 The dd-ing was proceeding slowly (600Kb/s) and I stopped watching it...
 
 The machine panicked about an hour later (at 0:52). The timestamp on
 /home/mi/da0.dd was 23:45, it was only about 500Mb in size.
 
 The stack is below. Would anybody like to look at the complete
 vmcore dump?
 
 The hardware is a quad Opteron with 8Gb RAM. Only 4Gb of these are
 used, because it runs 7.x/i386 from April 5th (without PAE) -- for
 the sake of NVidia's card.


I can easily produce a similar panic on a dual Opteron 185 with
3G of RAM and running 7-stable-amd64 on a (cheap) nvidia-based MB.
It runs gmirror on atapci1 and I attach a geli-encrypted
disk via usb. Both share irq 23.

Under heavy load (periodic security is enough ) it panics after
having disconnected umass0 ( kgdb trace below ) :

Unread portion of the kernel message buffer:
umass0: at uhub1 port 1 (addr 2) disconnected
(da1:umass-sim0:0:0:0): lost device
(pass1:umass-sim0:0:0:0): lost device
(pass1:umass-sim0:0:0:0): removing device entry


I'd be happy to provide more info.

Best, Arno


 Please, advise. Thanks!
 
   -mi
 
 [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: 
 Undefined symbol ps_pglobal_lookup]
 GNU gdb 6.1.1 [FreeBSD]
 Copyright 2004 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain conditions.
 Type show copying to see the conditions.
 There is absolutely no warranty for GDB.  Type show warranty for details.
 This GDB was configured as i386-marcel-freebsd.
 There is no member named pathname.
 Reading symbols from /boot/modules/nvidia.ko...done.
 Loaded symbols for /boot/modules/nvidia.ko
 Reading symbols from /opt/modules/fuse.ko...done.
 Loaded symbols for /opt/modules/fuse.ko
 
 Unread portion of the kernel message buffer:
 umass0: at uhub0 port 6 (addr 2) disconnected
 (da0:umass-sim0:0:0:0): lost device
 
 
 Fatal trap 12: page fault while in kernel mode
 cpuid = 3; apic id = 03
 fault virtual address = 0x0
 fault code= supervisor write, page not present
 instruction pointer   = 0x20:0xc0449702
 stack pointer = 0x28:0xeb74b8bc
 frame pointer = 0x28:0xeb74b8dc
 code segment  = base 0x0, limit 0xf, type 0x1b
   = DPL 0, pres 1, def32 1, gran 1
 processor eflags  = interrupt enabled, resume, IOPL = 0
 current process   = 13989 (dd)
 trap number   = 12
 panic: page fault
 cpuid = 3
 Uptime: 12d10h52m16s
 (da0:dead_sim0:0:0:0): Synchronize cache failed, status == 0x34, scsi status 
 == 0xc8
 Physical memory: 3054 MB
 Dumping 334 MB: (CTRL-C to abort)  (CTRL-C to abort)  (CTRL-C to abort)  
 (CTRL-C to abort)  (CTRL-C to abort)  (CTRL-C to abort)  (CTRL-C to abort)  
 (CTRL-C to abort)  319 303 287 271 255 239 223 207 191 175 159 143 127 111 95 
 79 63 47 31 15
 
 #0  doadump () at pcpu.h:195
 195   __asm __volatile(movl %%fs:0,%0 : =r (td));
 (kgdb) #0  doadump () at pcpu.h:195
 #1  0xc0599f7b in boot (howto=260) at /ibm/src/sys/kern/kern_shutdown.c:418
 #2  0xc059a449 in panic (fmt=0x104 Address 0x104 out of bounds)
 at /ibm/src/sys/kern/kern_shutdown.c:572
 #3  0xc077f60d in trap_fatal (frame=0xeb74b87c, eva=40)
 at /ibm/src/sys/i386/i386/trap.c:899
 #4  0xc077f9aa in trap_pfault (frame=0xeb74b87c, usermode=0, eva=0)
 at /ibm/src/sys/i386/i386/trap.c:812
 #5  0xc078035c in trap (frame=0xeb74b87c) at /ibm/src/sys/i386/i386/trap.c:490
 #6  0xc076637b in calltrap () at /ibm/src/sys/i386/i386/exception.s:139
 #7  0xc0449702 in xpt_done (done_ccb=0xc690a000)
 at /ibm/src/sys/cam/cam_xpt.c:4856
 #8  0xc044b15c in xpt_action (start_ccb=0xc690a000)
 at /ibm/src/sys/cam/cam_xpt.c:3057
 #9  0xc04462b6 in cam_periph_runccb (ccb=0xc690a000, error_routine=0, 
 camflags=CAM_FLAG_NONE, sense_flags=1, ds=0xc6aea690)
 at /ibm/src/sys/cam/cam_periph.c:878
 #10 0xc0453aa1 in daclose (dp=0xcc862600)
 at /ibm/src/sys/cam/scsi/scsi_da.c:714
 #11 0xc0549b2e in g_disk_access (pp=0xc7e12680, r=0, w=0, e=Variable e is 
 not available.)
 at /ibm/src/sys/geom/geom_disk.c:152
 #12 0xc054ec4d in g_access (cp=0xc8a90380, dcr=-1, dcw=0, dce=0)
 at /ibm/src/sys/geom/geom_subr.c:748
 #13 0xc05490f3 in g_dev_close (dev=0xca1dad00, flags=Variable flags is not 
 available.)
 at /ibm/src/sys/geom/geom_dev.c:217
 #14 0xc0531f69 in devfs_close (ap=0xeb74ba94)
 at /ibm/src/sys/fs/devfs/devfs_vnops.c:372
 #15 0xc0623e86 in 

nfs buildworld blocked by rpc.lockd ?

2008-05-28 Thread Arno J. Klaassen

Hello,

my buildworld on a 7-stable-amd64 blocks on the following line :

TERM=dumb TERMCAP=dumb: ex - /files/bsd/src7/share/termcap/termcap.src < /files/bsd/src7/share/termcap/reorder

ex(1) stays in lockd state, and is unkillable, either by Ctl-C or
kill -9

/files/bsd is nfs-mounted as follows :

  push:/raid1/bsd  /files/bsd  nfs  rw,bg,soft,nfsv3,intr,noconn,noauto,-r=32768,-w=32768  0  0

I can provide tcpdumps on server and client if helpful.

Thanx, Arno


Re: nfs-server silent data corruption

2008-05-01 Thread Arno J. Klaassen

Hello,


 [ .. stuff deleted .. ]

  I have recompiled the kernel with ULE, and it seems fine as well.  I
  ran 160 iterations of a 300MB file and there was no corruption.  Same
  process - copy a junk random file over nfs mount, unmount the nfs
  mount, remount it copy it back, compare the files.
 
 
 Let me summarise my investigations till now :
 
 [ .. more stuff deleted .. ]
 
 - it does *not* seem to depend on :
 
- the interface : I could produce it using nfe0, nfe1 and 
  re0 using some netgear pci-card
 
- the distribution of the 4Gig memory : installing 4G at 
  CPU1 or 1G at CPU1 and 2G at CPU2 produces same results
  (NB, all memory passed a complete memtest.iso run in both
   situations)
 
- the frequency control method : easier to produce with
  cpufreq/powerd, but in the end I can reproduce the corruption
  as well using acpi_ppc
 
- the nfs-client and options (not exhaustively tested, but the different
  tests include i386-releng6, amd64-releng6 and linux, and quite
  a set of different try-and-see mount_nfs options)
 
 I am testing right now with a fixed frequency of 1Ghz.


I cannot reproduce it at a fixed cpu-frequency with cpufreq loaded (I
ran my test for three days without a problem; normally a couple of hours
was enough).

But I looked again at the corrupted copies :

# for i in raid5/xps/SAVE/1 raid5/pxe/SAVE/1 raid5/pxe/SAVE/2 raid5/pxe/SAVE/3 
raid5/blockhead/SAVE/1 scsi/pxe/SAVE/1 scsi/blockhead/SAVE/1
 scsi/blockhead/SAVE/2 scsi/blockhead/SAVE/3 scsi/blockhead/SAVE/4; do
  ls -l $i/BIG;   cmp -x $i/BIG $i/BIG2;   echo; done

-rw-r--r--  1 root  wheel  144703488 Apr 26 16:06 raid5/xps/SAVE/1/BIG
004fd908 18 00
02c9e6c8 11 00
034ab6c8 90 00
037e4648 09 00
039e85c8 91 01
04484408 00 09
06115cc8 00 81
06e5d148 01 91
07016048 18 00
074307c8 08 19
07aa45c8 29 20
080bfb88 00 11

-rw-r--r--  1 root  wheel  144703488 Apr 20 14:07 raid5/pxe/SAVE/1/BIG
03869a48 09 00

-rw-r--r--  1 root  wheel  144703488 Apr 20 14:47 raid5/pxe/SAVE/2/BIG
05209d88 09 00

-rw-r--r--  1 root  wheel  39845888 Apr 20 15:17 raid5/pxe/SAVE/3/BIG
01777148 09 00

-rw-r--r--  1 root  wheel  144703488 Apr 20 14:54 raid5/blockhead/SAVE/1/BIG
00f10f88 09 00

-rw-r--r--  1 root  wheel  39845888 Apr 20 16:08 scsi/pxe/SAVE/1/BIG
01f4c4c8 11 00

-rw-r--r--  1 root  wheel  144703488 Apr 20 15:38 scsi/blockhead/SAVE/1/BIG
06c3d6c8 11 00

-rw-r--r--  1 root  wheel  144703488 Apr 20 16:11 scsi/blockhead/SAVE/2/BIG
0725ca48 18 00

-rw-r--r--  1 root  wheel  144703488 Apr 20 17:32 scsi/blockhead/SAVE/3/BIG
01608008 09 00

-rw-r--r--  1 root  wheel  144703488 Apr 23 19:26 scsi/blockhead/SAVE/4/BIG
00f3b888 18 00


The output from raid5/xps/SAVE/1/BIG was produced after installing the
box at a lab with, without doubt, more sophisticated switches than I use.
It is the first case I was able to produce with more than just one byte
corrupted, but still with the same pattern :

it looks like the position always is 2^3 * 'some odd number'

(e.g. factor(hex2dec('00f10f88')) =  2  2  2  809  2441

  factor(hex2dec('01f4c4c8')) =  2  2  2  317  12941 )
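
The same check can be run over every offset listed in the cmp -x output above rather than over two samples. The following Python sketch is only an illustration (the helper name is made up here); it counts how many times 2 divides each offset and, as an extra, prints the offsets modulo 64:

```python
# Byte offsets of the corrupted bytes, copied from the cmp -x output above.
OFFSETS_HEX = """
004fd908 02c9e6c8 034ab6c8 037e4648 039e85c8 04484408 06115cc8 06e5d148
07016048 074307c8 07aa45c8 080bfb88 03869a48 05209d88 01777148 00f10f88
01f4c4c8 06c3d6c8 0725ca48 01608008 00f3b888
""".split()


def two_adic_valuation(n):
    """Return how many times 2 divides n (n must be positive)."""
    count = 0
    while n % 2 == 0:
        n //= 2
        count += 1
    return count


offsets = [int(h, 16) for h in OFFSETS_HEX]
valuations = {two_adic_valuation(o) for o in offsets}
residues_mod_64 = {o % 64 for o in offsets}

print("powers of two dividing the offsets:", sorted(valuations))
print("offsets modulo 64:", sorted(residues_mod_64))
```

For the offsets listed here both sets contain a single value, so the pattern is stricter than "2^3 times an odd number": every offset has the same remainder modulo 64. Whether that points at a 64-byte unit somewhere in the data path is a guess, not something established in this thread.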


and the corruption is one out of the following half-byte transitions :

1 -> 0
8 -> 0
9 -> 0
0 -> 1
0 -> 8
0 -> 9
8 -> 9
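
The transition list can be derived mechanically from the byte pairs in the cmp -x output above. A small Python sketch, for illustration only (names are made up here; "first" and "second" simply mean the two columns printed by cmp, without assuming which file holds the good data):

```python
# (first-column byte, second-column byte) pairs copied from the cmp -x
# output above; duplicates are listed once.
PAIRS_HEX = [
    ("18", "00"), ("11", "00"), ("90", "00"), ("09", "00"), ("91", "01"),
    ("00", "09"), ("00", "81"), ("01", "91"), ("08", "19"), ("29", "20"),
    ("00", "11"),
]


def nibbles(byte):
    """Split a byte into its (high, low) 4-bit halves."""
    return (byte >> 4, byte & 0x0F)


transitions = set()
flipped_bits = 0
for first_hex, second_hex in PAIRS_HEX:
    first, second = int(first_hex, 16), int(second_hex, 16)
    flipped_bits |= first ^ second  # accumulate every bit that ever differs
    for n_first, n_second in zip(nibbles(first), nibbles(second)):
        if n_first != n_second:
            transitions.add((n_first, n_second))

print("half-byte transitions:", sorted(transitions))
print("bits that ever differ: 0x%02x" % flipped_bits)
```

The accumulated XOR mask shows which bit positions are ever affected; with the pairs above only bits 0 and 3 of each half-byte change.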

Maybe this gives a hint to someone ... 

Best, Arno





Re: nfs-server silent data corruption

2008-04-26 Thread Arno J. Klaassen


Hello,


Mike Tancsa [EMAIL PROTECTED] writes:

 At 02:35 PM 4/22/2008, Arno J. Klaassen wrote:
 
   Also, you are using ULE or the 4BSD scheduler ?  I
   still have 4BSD on the box I am testing on.
 
 Interesting, this is with ULE. I didn't really test 4BSD on this
 box (I believed those who said SMP needs ULE *and* am quite
 satisfied with overall performance). I'll try 4BSD though time
 is getting short; I promised to deliver this box next thursday but will
 still have some days for on-site testing.
 
 
 I have recompiled the kernel with ULE, and it seems fine as well.  I
 ran 160 iterations of a 300MB file and there was no corruption.  Same
 process - copy a junk random file over nfs mount, unmount the nfs
 mount, remount it copy it back, compare the files.
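
The test loop described above (random file, copy to the NFS mount, remount, copy back, compare) could be scripted roughly as follows. This is a sketch under assumptions, not the script actually used in this thread: the remount step is a hypothetical caller-supplied hook (it would run umount and mount on a real setup), and the file names BIG and BIG2 are borrowed from the cmp commands elsewhere in the thread.

```python
import os
import shutil
import tempfile


def differing_bytes(path_a, path_b, chunk=1 << 20):
    """Yield (offset, byte_a, byte_b) for each differing byte, like `cmp -x`.

    Offsets are zero-based; a difference in file length is not reported.
    """
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a, b = fa.read(chunk), fb.read(chunk)
            if not a and not b:
                break
            for i, (x, y) in enumerate(zip(a, b)):
                if x != y:
                    yield offset + i, x, y
            offset += max(len(a), len(b))


def round_trip(size_bytes, nfs_dir, local_dir, remount=lambda: None):
    """Write random data, copy it to nfs_dir, remount, copy back, compare."""
    original = os.path.join(local_dir, "BIG")
    remote = os.path.join(nfs_dir, "BIG")
    returned = os.path.join(local_dir, "BIG2")
    with open(original, "wb") as f:
        f.write(os.urandom(size_bytes))
    shutil.copyfile(original, remote)
    remount()  # hypothetical hook: umount + mount, to bypass the client cache
    shutil.copyfile(remote, returned)
    return list(differing_bytes(original, returned))


if __name__ == "__main__":
    # Demo against two scratch directories; a real run would pass an NFS mount.
    with tempfile.TemporaryDirectory() as nfs, tempfile.TemporaryDirectory() as loc:
        for offset, x, y in round_trip(1 << 20, nfs, loc):
            print("%08x %02x %02x" % (offset, x, y))
```

An empty result means the file survived the round trip; any line printed has the same layout as the cmp -x output quoted in this thread.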


Let me summarise my investigations till now :


- in all failing cases just *one* byte is corrupted, 4 or all 8 bits
  set to zero *and* the original value of each zeroed half-byte is one
  out of the limited subset {1, 8, 9} 

  here is the output of `cmp -x $i/BIG $i/BIG2` for some failing
  cases I saved :


  03869a48 09 00
  05209d88 09 00
  01777148 09 00
  00f10f88 09 00
  01f4c4c8 11 00
  06c3d6c8 11 00
  0725ca48 18 00
  01608008 09 00
  00f3b888 18 00

  07aa45c8 29 20


- it does *not* seem to depend on :

   - the interface : I could produce it using nfe0, nfe1 and 
 re0 using some netgear pci-card

   - the distribution of the 4Gig memory : installing 4G at 
 CPU1 or 1G at CPU1 and 2G at CPU2 produces same results
 (NB, all memory passed a complete memtest.iso run in both
  situations)

   - the frequency control method : easier to produce with
 cpufreq/powerd, but in the end I can reproduce the corruption
 as well using acpi_ppc

   - the nfs-client and options (not exhaustively tested, but the different
 tests include i386-releng6, amd64-releng6 and linux, and quite
 a set of different try-and-see mount_nfs options)

I am testing right now with a fixed frequency of 1Ghz.

I am not so inclined to test 4BSD, since reboot possibilities are
limited for me now on this box, but I set up next week a similar
board (S3992e) (iff I can find quad-core socket F over here ...)
and in a certain sense hope I can reproduce it on that board as well.

Best, Arno


Re: nfs-server silent data corruption

2008-04-22 Thread Arno J. Klaassen

Hello,

Mike Tancsa [EMAIL PROTECTED] writes:

 At 05:57 PM 4/21/2008, Arno J. Klaassen wrote:
   Hi,
   How long does it take for the problem to show up ?
 
 
 Less than an hour in general (running the same client script
 simultaneously on a 100Mbps linux box and 1Gbps bsd6-x86)
 
 I am running my nic at gig speeds only...   I recompiled the kernel
 this morning to include cpufreq as well as made sure the coolquiet
 was enabled in the BIOS.
 
 
 
 for info, I test with args '38 999' (38M, try 999 times) on linux
 (slightly adapted script BTW) and '138 999' on bsd. The best 'score' I
 got was 'still 871 iterations to go'
 
 
 So far I have done 150 loops with an 80MB file and no issues and 200
 loops with a 160MB file.  My nfe nic does not support MSI and has its
 own interrupt
 
 # vmstat -i
 interrupt                          total       rate
 irq1: atkbd0                           5          0
 irq4: sio0                          3049          1
 irq16: twe0                       327046        164
 irq19: bge0                       385147        194
 irq21: atapci1                    976355        492
 irq23: nfe0                     11876726       5986
 cpu0: timer                      3966420       1999
 cpu1: timer                      3964392       1998
 

# vmstat -i
interrupt                          total       rate
irq1: atkbd0                           4          0
irq14: ata0                           69          0
irq20: nfe0                     11650955       5283
irq24: atapci1                        94          0
irq28: atapci2                       178          0
irq29: ahd0                       355704        161
cpu0: timer                      4409020       1999
cpu1: timer                      4391646       1991
cpu2: timer                      4391643       1991
cpu3: timer                      4391641       1991
 
 I have powerd started up with
 powerd_enable=YES
 powerd_flags=-a adaptive -b adaptive -n adaptive


slightly different, I mostly use -b adaptive -i 90 -n adaptive -r 80
but the problem shows up without flags as well.

 
 With the sleep in my test script, powerd does seem to be fiddling
 with frequencies as well during the inactivity.

I most often provoke slight swapping to randomize frequency changes,
and run a burnK7 or similar to push the frequency up and down by hand
 
 # sysctl dev. | grep -i fre
 dev.cpu.0.freq: 1800
 dev.cpu.0.freq_levels: 2200/11 2000/105600 1800/89100 1000/49000
 dev.powernow.0.freq_settings: 2200/11 2000/105600 1800/89100 1000/49000
 dev.powernow.1.freq_settings: 2200/11 2000/105600 1800/89100 1000/49000
 dev.cpufreq.0.%driver: cpufreq
 dev.cpufreq.0.%parent: cpu0
 dev.cpufreq.1.%driver: cpufreq
 dev.cpufreq.1.%parent: cpu1

funny, when I do that :

# sysctl dev. | grep -i fre
dev.cpu.0.freq: 995
dev.cpu.0.freq_levels: 2587/95000 2388/90300 2189/76200 1990/63800 1791/53200 
995/36100
dev.powernow.0.freq_settings: 2587/95000 2388/90300 2189/76200 1990/63800 
1791/53200 995/36100
dev.powernow.1.freq_settings: 2587/95000 2388/90300 2189/76200 1990/63800 
1791/53200 995/36100
dev.powernow.2.freq_settings: 2587/95000 2388/90300 2189/76200 1990/63800 
1791/53200 995/36100
dev.powernow.3.freq_settings: 6747/95000 6228/90300 5709/76200 5190/63800 
4671/53200 2595/36100
dev.cpufreq.0.%driver: cpufreq
dev.cpufreq.0.%parent: cpu0
dev.cpufreq.1.%driver: cpufreq
dev.cpufreq.1.%parent: cpu1
dev.cpufreq.2.%driver: cpufreq
dev.cpufreq.2.%parent: cpu2
dev.cpufreq.3.%driver: cpufreq
dev.cpufreq.3.%parent: cpu3

especially the  dev.powernow.3.freq_settings look weird ...
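
One way to see what is odd about dev.powernow.3 is to compare it level by level with the other CPUs. The Python sketch below is illustrative only; it parses the frequency/power pairs copied from the sysctl output above (MHz/mW, as cpufreq reports them) and computes the exact ratio per level:

```python
from fractions import Fraction

# freq_settings strings copied from the sysctl output above.
CPU0 = "2587/95000 2388/90300 2189/76200 1990/63800 1791/53200 995/36100"
CPU3 = "6747/95000 6228/90300 5709/76200 5190/63800 4671/53200 2595/36100"


def parse_levels(settings):
    """Return a list of (frequency, power) integer pairs."""
    return [tuple(int(v) for v in level.split("/")) for level in settings.split()]


levels0 = parse_levels(CPU0)
levels3 = parse_levels(CPU3)

# Exact ratio between the frequency reported for cpu3 and for cpu0, per level.
ratios = [Fraction(f3, f0) for (f3, _p3), (f0, _p0) in zip(levels3, levels0)]

for (f0, _), (f3, _), ratio in zip(levels0, levels3, ratios):
    print(f"{f0:5d} MHz -> {f3:5d} MHz  ratio {float(ratio):.4f}")
print("all ratios identical:", len(set(ratios)) == 1)
```

With these values the power figures match and every frequency on dev.powernow.3 is the corresponding cpu0 frequency scaled by one common factor of about 2.608. That would be consistent with a wrong base frequency being used for that core rather than random garbage, though this is an inference and not confirmed anywhere in the thread.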

that said, I once more dug up the old acpi_ppc.c and slightly
adapted it for fbsd7 (basically some name changes and using
read_cpu_time() instead of cp_time) and the problem disappears ...

the algo of acpi_ppc makes it somewhat harder to push up frequencies,
though I doubt that matters.

I tried as well with hint.acpi_throttle.0.disabled=1 in loader.conf
with no luck (using powerd).

I'm out of office tomorrow but will try to find time tomorrow evening
to test with another NIC.

Best, Arno


Re: nfs-server silent data corruption

2008-04-22 Thread Arno J. Klaassen

Hello,

Peter Jeremy [EMAIL PROTECTED] writes:

 On Mon, Apr 21, 2008 at 08:30:48PM +0200, Arno J. Klaassen wrote:
 NB, (CC to kris@ for this) why is memtest86 port marked as i386-only?
 
 Basically because it's a bootable i386 binary image.

yop, but building it could be allowed on more archs (at least amd64 imho)

but no hard feelings! just a thought

 
Best, Arno


Re: nfs-server silent data corruption

2008-04-22 Thread Arno J. Klaassen
Mike Tancsa [EMAIL PROTECTED] writes:

 At 01:38 PM 4/22/2008, Arno J. Klaassen wrote:
 
 I'm out of office tomorrow but will try to find time tomorrow evening
 to test with another NIC.
 
 
 Are you using the latest RELENG_7, or at least the latest version of
 nfe that's in RELENG_7 ?


Think so :

# cvs status if_nfe.c
===
File: if_nfe.c  Status: Up-to-date

   Working revision:    1.21.2.5  Sat Apr 19 14:27:41 2008
   Repository revision: 1.21.2.5  /home/ncvs/src/sys/dev/nfe/if_nfe.c,v
   Sticky Tag:  RELENG_7 (branch: 1.21.2)
   Sticky Date: (none)
   Sticky Options:  (none)

++, Arno


PS, finally the memory seems not involved : populating 4G in CPU1 or
2G in CPU1 and 2G in CPU2 does not make a difference


Re: nfs-server silent data corruption

2008-04-22 Thread Arno J. Klaassen
re,

Mike Tancsa [EMAIL PROTECTED] writes:

 At 02:00 PM 4/22/2008, Arno J. Klaassen wrote:
  
   Are you using the latest RELENG_7, or at least the latest version of
   nfe that's in RELENG_7 ?
 
 
 Think so :
 
 OK, and it is the latest RELENG_7 ?

from saturday (but I didn't see any RELENG_7 commit possibly related to
this since)

 Also, you are using ULE or the 4BSD scheduler ?  I
 still have 4BSD on the box I am testing on.

Interesting, this is with ULE. I didn't really test 4BSD on this
box (I believed those who said SMP needs ULE *and* am quite
satisfied with overall performance). I'll try 4BSD though time
is getting short; I promised to deliver this box next thursday but will
still have some days for on-site testing.

++, Arno


Re: nfs-server silent data corruption

2008-04-21 Thread Arno J. Klaassen
Kris Kennaway [EMAIL PROTECTED] writes:

 On Mon, Apr 21, 2008 at 01:02:33AM +0200, Arno J. Klaassen wrote:
 
  I didn't stress-test this MB for a while, but last time I did was
  with 7-PRELEASE/RC?/CANTremember-exactly-but-close-to-release
  and all worked great
  
  I did add 2G ECC to the 2nd CPU since, though I doubt that interferes
  with NFS.
 
 Uh, you're getting server-side data corruption, it could definitely be
 because of the memory you added.

yop, though I'm still not convinced the memory is bad (the very same
Kingston ECC as the 2*1G in use for about half a year already) :

I added it directly to the 2nd CPU (diagram on page 9 of
 http://www.tyan.com/manuals/m_s2895_101.pdf) and the problem
seems to be the interaction between nfe0 and powerd  :

 - if I stop powerd, problems go away 
 - if I let powerd run but turn off txcsum and tso4 on the interface,
   the problem is a lot harder to produce (in case this gives
   a hint to anyone)

Device is :

[EMAIL PROTECTED]:0:10:0:   class=0x068000 card=0x289510f1 chip=0x005710de 
rev=0xa3 hdr=0x00
vendor = 'Nvidia Corp'
device = 'nForce4 Ultra NVidia Network Bus Enumerator'
class  = bridge
cap 01[44] = powerspec 2  supports D0 D1 D2 D3  current D0

(this is with the default BIOS setting  LAN Bridge Enabled, disabling
 that setting makes pciconf say class = network but does not influence
 my problem)

I will restart my tests now by populating all 4G to only CPU1 and
say whether that matters.

Best, Arno
 


Re: nfs-server silent data corruption

2008-04-21 Thread Arno J. Klaassen

Hello,

Jeremy Chadwick [EMAIL PROTECTED] writes:

 On Mon, Apr 21, 2008 at 04:52:55PM +0200, Arno J. Klaassen wrote:
  Kris Kennaway [EMAIL PROTECTED] writes:
   Uh, you're getting server-side data corruption, it could definitely be
   because of the memory you added.

[ .. stuff deleted; I'll answer in more detail later ..] 
 
 Can you boot the machine in verbose mode, and put the dmesg up
 somewhere?


attached.

More in a moment.

Best, Arno

Copyright (c) 1992-2008 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.0-STABLE #1: Sun Apr 20 19:17:47 CEST 2008
[EMAIL PROTECTED]:/usr/obj/files/here/bsd/src7/sys/S2895
Preloaded elf kernel /boot/kernel/kernel at 0x807dc000.
Preloaded elf obj module /boot/kernel/iicsmb.ko at 0x807dc210.
Preloaded elf obj module /boot/kernel/iicbus.ko at 0x807dc738.
Preloaded elf obj module /boot/kernel/smbus.ko at 0x807dcbe0.
Preloaded elf obj module /boot/kernel/smb.ko at 0x807dd048.
Preloaded elf obj module /boot/kernel/nfsmb.ko at 0x807dd4f0.
Calibrating clock(s) ... i8254 clock: 1193107 Hz
CLK_USE_I8254_CALIBRATION not specified - using default frequency
Timecounter i8254 frequency 1193182 Hz quality 0
Calibrating TSC clock ... TSC clock: 2612050515 Hz
CPU: Dual Core AMD Opteron(tm) Processor 285 (2612.05-MHz K8-class CPU)
  Origin = AuthenticAMD  Id = 0x20f12  Stepping = 2
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x1<SSE3>
  AMD Features=0xe2500800<SYSCALL,NX,MMX+,FFXSR,LM,3DNow!+,3DNow!>
  AMD Features2=0x3<LAHF,CMP>
  Cores per package: 2
L1 2MB data TLB: 8 entries, fully associative
L1 2MB instruction TLB: 8 entries, fully associative
L1 4KB data TLB: 32 entries, fully associative
L1 4KB instruction TLB: 32 entries, fully associative
L1 data cache: 64 kbytes, 64 bytes/line, 1 lines/tag, 2-way associative
L1 instruction cache: 64 kbytes, 64 bytes/line, 1 lines/tag, 2-way associative
L2 2MB unified TLB: 0 entries, disabled/not present
L2 4KB data TLB: 512 entries, 4-way associative
L2 4KB instruction TLB: 512 entries, 4-way associative
L2 unified cache: 1024 kbytes, 64 bytes/line, 1 lines/tag, 16-way associative
usable memory = 4285255680 (4086 MB)
Physical memory chunk(s):
0x1000 - 0x00099fff, 626688 bytes (153 pages)
0x008dc000 - 0x761a5fff, 1972150272 bytes (481482 pages)
0x8000 - 0xaff7, 804782080 bytes (196480 pages)
0x0001 - 0x00014ffe, 1342111744 bytes (327664 pages)
avail memory  = 4108218368 (3917 MB)
ACPI APIC Table: PTLTD  APIC  
INTR: Adding local APIC 1 as a target
INTR: Adding local APIC 2 as a target
INTR: Adding local APIC 3 as a target
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3
APIC: CPU 0 has ACPI ID 0
APIC: CPU 1 has ACPI ID 1
APIC: CPU 2 has ACPI ID 2
APIC: CPU 3 has ACPI ID 3
ULE: setup cpu group 0
ULE: setup cpu 0
ULE: adding cpu 0 to group 0: cpus 1 mask 0x1
ULE: setup cpu group 1
ULE: setup cpu 1
ULE: adding cpu 1 to group 1: cpus 1 mask 0x2
ULE: setup cpu group 2
ULE: setup cpu 2
ULE: adding cpu 2 to group 2: cpus 1 mask 0x4
ULE: setup cpu group 3
ULE: setup cpu 3
ULE: adding cpu 3 to group 3: cpus 1 mask 0x8
ACPI: RSDP @ 0x0xf78c0/0x0014 (v  0 PTLTD )
ACPI: RSDT @ 0x0x7ff8b110/0x003C (v  1 PTLTDRSDT   0x0604  LTP 
0x)
ACPI: FACP @ 0x0x7ff909c2/0x0074 (v  1 NVIDIA CK8S 0x0604 PTL_ 
0x000F4240)
ACPI: DSDT @ 0x0x7ff8b14c/0x5876 (v  1 NVIDIA  CK8 0x0604 MSFT 
0x010E)
ACPI: FACS @ 0x0x7ff91fc0/0x0040
ACPI: SPCR @ 0x0x7ff90a36/0x0050 (v  1 PTLTD  $UCRTBL$ 0x0604 PTL  
0x0001)
ACPI: MCFG @ 0x0x7ff90a86/0x003C (v  1 PTLTDMCFG   0x0604  
0x)
ACPI: APIC @ 0x0x7ff90ac2/0x009E (v  1 PTLTD APIC   0x0604  LTP 
0x)
ACPI: BOOT @ 0x0x7ff90b60/0x0028 (v  1 PTLTD  $SBFTBL$ 0x0604  LTP 
0x0001)
ACPI: SSDT @ 0x0x7ff90b88/0x0478 (v  1 PTLTD  POWERNOW 0x0604  LTP 
0x0001)
MADT: Found IO APIC ID 4, Interrupt 0 at 0xfec0
ioapic0: Routing external 8259A's - intpin 0
MADT: Found IO APIC ID 5, Interrupt 24 at 0xd000
MADT: Found IO APIC ID 6, Interrupt 28 at 0xd0001000
MADT: Found IO APIC ID 7, Interrupt 32 at 0xd0a0
MADT: Interrupt override: source 9, irq 9
ioapic0: intpin 9 trigger: level
ioapic0: intpin 9 polarity: low
lapic0: Routing NMI - LINT1
lapic0: LINT1 trigger: edge
lapic0: LINT1 polarity: high
lapic1: Routing NMI - LINT1
lapic1: LINT1 trigger: edge
lapic1: LINT1 polarity: high
lapic2: Routing NMI - LINT1
lapic2: LINT1 trigger: edge
lapic2: LINT1 polarity: high
lapic3: Routing NMI - LINT1
lapic3: LINT1 trigger: edge
lapic3: LINT1 polarity

Re: nfs-server silent data corruption

2008-04-21 Thread Arno J. Klaassen


yet another quick partial answer :

Jeremy Chadwick [EMAIL PROTECTED] writes:

 On Mon, Apr 21, 2008 at 04:52:55PM +0200, Arno J. Klaassen wrote:
  Kris Kennaway [EMAIL PROTECTED] writes:
   Uh, you're getting server-side data corruption, it could definitely be
   because of the memory you added.
  
  yop, though I'm still not convinced the memory is bad (the very same
  Kingston ECC as the 2*1G in use for about half a year already) :
 
 Can you download and run memtest86 on this system, with the added 2G ECC
 installed?  memtest86 doesn't guarantee showing signs of memory problems,
 but in most cases it'll start spewing errors almost immediately.

It's running for 15 minutes now without any warning; I'll let it run
while cooking a meal [ with 2*1G mem for each CPU to be clear ].

NB, (CC to kris@ for this) why is memtest86 port marked as i386-only?
It only seems to install floppy.bin and memtest.iso, but alas
(maybe I should leave one box dedicated to freebsd-i386 for things
like this ;) )

Best, Arno

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: nfs-server silent data corruption

2008-04-21 Thread Arno J. Klaassen

re,


Jeremy Chadwick [EMAIL PROTECTED] writes:

 On Mon, Apr 21, 2008 at 04:52:55PM +0200, Arno J. Klaassen wrote:
  Kris Kennaway [EMAIL PROTECTED] writes:
   Uh, you're getting server-side data corruption, it could definitely be
   because of the memory you added.
  
  yop, though I'm still not convinced the memory is bad (the very same
  Kingston ECC as the 2*1G in use for about half a year already) :
 
 Can you download and run memtest86 on this system, with the added 2G ECC
 installed?  memtest86 doesn't guarantee showing signs of memory problems,
 but in most cases it'll start spewing errors almost immediately.


it finished in a bit less than 3 hours without a single error/warning

I feel pretty confident all memory is fine
 
 One thing I did notice in the motherboard manual below is something
 called Hammer Configuration.  It appears to default to 800MHz, but
 there's an Auto choice.  Does using Auto fix anything?

Nope

  I added it directly to the 2nd CPU (diagram on page 9 of
   http://www.tyan.com/manuals/m_s2895_101.pdf) and the problem
  seems to be the interaction between nfe0 and powerd  :
 
 That board is the weirdest thing I've seen in years.


;) I agree, I raised my eyebrows the first time I saw that
diagram


 Two separate CPUs using a single (shared) memory controller, two
 separate (and different!) nVidia chipsets, a SMSC I/O controller
 probably used for serial and parallel I/O, two separate nVidia NICs with
 Marvell PHYs (yet somehow you can bridge the two NICs and PHYs?), two
 separate PCI-e busses (each associated with a separate nVidia chipset),
 two separate PCI-X busses... the list continues.

some may say it's just four wheels, an engine and a steering wheel; she
just looks different compared to most others
 

 I know you don't need opinions at this point, but what a behemoth.  I
 can't imagine that thing running reliably.

though it does ;) (till the day I decided she deserved a -stable upgrade
and 2 more gigs ...)
 
   - if I stop powerd, problems go away
 
 This would imply that clock frequency stepping is somehow contributing
 to the corruption.  I don't see any BIOS options for controlling
 things related to AMD's Cool-n-Quiet or PowerNow! feature, which is
 usually what handles this.

you can turn it on/off; anyway, the problem *seems* easy to reproduce
when freq drops quickly from 2600 MHz to 1000 MHz.
I just inspected a few corrupted copies, but out of 10-200 MBytes
just 1 byte was 0 instead of \t
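
To see exactly which bytes differ between an original and a corrupted
copy, something like the following could be used. It is only a sketch:
the function names are made up for illustration, and it relies on nothing
beyond POSIX cmp -l, which prints one line per differing byte (decimal
offset, then both byte values in octal).

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from the thread).

# Print "offset original copy" for every differing byte; cmp -l reports
# the 1-based offset in decimal and both byte values in octal.
diff_bytes() {
    cmp -l "$1" "$2" | awk '{ print $1, $2, $3 }'
}

# Print only the number of differing bytes.
count_diff_bytes() {
    cmp -l "$1" "$2" | awk 'END { print NR }'
}
```

For a copy in which a single tab (octal 11) was replaced by a NUL byte,
"diff_bytes BIG BIG2" would print one line ending in "11 0".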

   - I let powerd run but turn off txcsum and tso4 on the interface,
 the problem is a lot harder to produce (if ever this gives
 a hint to anyone)
 
 Possibly shared interrupts are causing problems?


don't think so; I first had two Promise TX4 cards in this box instead of
the Marvell 8port card; since I had problems with TX4 some time
ago I first suspected them. The board is still running memtest86, but
from the dmesg I posted I don't see a shared irq.

  MSI/MSI-X doing
 something odd?  Have you tried disabling MSI/MSI-X and see if it makes a
 difference?


MSI is disabled as is PCI-e Error reporting (or something like
that)

 
 I think you mean MAC LAN Bridge, according to the motherboard manual.
 I'm not even sure what that really does; somehow trunks the two NICs
 together to give you the equivalent of 2000mbit of traffic?  I don't
 know.

probably; I never tried ;) I need the second NIC for a separate
subnet
 
 Does the corruption you see go away if you install a separate NIC (e.g.
 an Intel NIC) in a PCI or PCI-e slot, and disable the onboard NICs
 (should be MAC LAN: Disable on both the primary and slave)?

Don't have one available right now (for a 2U server).
I will test if I do not find another solution.

Thanx, Arno


Re: nfs-server silent data corruption

2008-04-21 Thread Arno J. Klaassen

Hello,

Mike Tancsa [EMAIL PROTECTED] writes:

 At 10:52 AM 4/21/2008, Arno J. Klaassen wrote:
 
 Device is :
 
  [EMAIL PROTECTED]:0:10:0:   class=0x068000 card=0x289510f1
  chip=0x005710de rev=0xa3 hdr=0x00
  vendor = 'Nvidia Corp'
  device = 'nForce4 Ultra NVidia Network Bus Enumerator'
  class  = bridge
  cap 01[44] = powerspec 2  supports D0 D1 D2 D3  current D0
 
 (this is with the default BIOS setting  LAN Bridge Enabled, disabling
   that setting makes pciconf say class = network but does not influence
   my problem)
 
 I will restart my tests now by populating all 4G to only CPU1 and
 say whether that matters.
 
 Hi,
 How long does it take for the problem to show up ?


Less than an hour in general (running the same client script
simultaneously on a 100Mbps linux box and 1Gbps bsd6-x86)

 I have what appears
 to be a very similar Tyan board (I have a Socket 939 X2 cpu) with the
 same NIC, but this one is running RELENG_7 from April 17th.  There
 have been a few fixes for the nfe driver since 7.0
 
 I am running this small script below on a nfs client (em nic) against
 the server (nfe) ( mount options on the client 192.168.245.1:/backup
 /backup nfs rw,-r=32768,-w=32768,tcp,noauto )
 
 #!/bin/sh
 i=0
 while true
 do
   i=`expr $i + 1`
   # create an 80 MByte random file locally
   dd if=/dev/urandom of=/tmp/junk.txt bs=1024 count=81920 > /dev/null 2>&1
   cp -p /tmp/junk.txt /backup/
   orig=`md5 -q /tmp/junk.txt`
   # remount so the copy is read back from the server, not from cache
   umount /backup
   sleep 2
   mount /backup
   copy=`md5 -q /backup/junk.txt`
   echo "$orig and $copy on $i"
   if [ "$orig" != "$copy" ]; then
     echo "\a copy not ok on $i"
     exit 255
   fi
 done


quite the same as what I do (apart from the umount/sleep/mount and I 
use same partition for write and copy) :

SIZE=$1                 # size of the test file in MBytes

COUNTER=${2:-20}        # number of iterations (default 20)

until [ $COUNTER -lt 1 ]; do
echo " Still $COUNTER iterations to go *** "
echo
echo -n "Creating random file of $SIZE MBytes ..."
dd if=/dev/random of=BIG bs=1048576 count=${SIZE} > /dev/null 2>&1
echo Done
echo -n "Calculating md5 checksum ..."
CS1=`md5 -q BIG`
echo Done
echo -n "Copying file ..."
cp -fp BIG BIG2
echo Done
echo -n "Calculating md5 checksum ..."
CS2=`md5 -q BIG2`
echo Done
if [ "${CS1}" != "${CS2}" ]; then
 echo "CHECKSUM MISMATCH"
 exit 1
else
 echo
fi
COUNTER=$((COUNTER - 1))
done


for info, I test with args '38 999' (38M, try 999 times) on linux
(slightly adapted script BTW) and '138 999' on bsd. The best 'score' I
got was 'still 871 iterations to go'

 On the server, I have
 
 [EMAIL PROTECTED]:0:10:0:   class=0x068000 card=0x286510f1 chip=0x005710de
 rev=0xa3 hdr=0x00
  vendor = 'Nvidia Corp'
  device = 'nForce4 Ultra NVidia Network Bus Enumerator'
  class  = bridge
  cap 01[44] = powerspec 2  supports D0 D1 D2 D3  current D0


idem

 # ifconfig nfe0
 nfe0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
  options=10b<RXCSUM,TXCSUM,VLAN_MTU,TSO4>
  ether 00:e0:81:58:91:6a
  inet 192.168.245.1 netmask 0xff00 broadcast 192.168.245.255
  media: Ethernet autoselect (1000baseTX <full-duplex,flag0,flag1>)
  status: active

idem
 
 How long does it take for the problem to come up ?

as said : approximately half an hour; never more than 4 hours


Best, Arno


nfs-server silent data corruption

2008-04-20 Thread Arno J. Klaassen

Hello,

I've a strange problem with a box I'm setting up as nfs-server
under 7-stable :

 - tyan S2895 MB, 2*285Dualcore Opteron, 4G-ECC, ahd-scsi, nfe-network
 - stripped GENERIC as kernel
 - sources as of last saturday afternoon (European time)

I removed everything from /boot/loader.conf and /etc/sysctl.conf; still,
I easily get data corruption when exporting ahd-scsi over nfs
(NB exporting geom_raid5 gives the same data corruption)

Testing with the following pseudo code :

  while checksum1 == checksum2 do
   create random file of $1 MBytes
   calculate md5 checksum1
   copy
   calculate md5 checksum2 on copy


Tested on both (as nfs-client) a 6-stable-i386 from a couple of weeks
ago as well as a linux 2.6.15-gentoo-r1 of about two years ago :
within half an hour the copy will be different  ;(
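
The pseudo code above could be turned into a runnable loop roughly as
follows. This is a sketch under assumptions: the function names are
invented for illustration, /dev/urandom is used as the random source, and
the md5 wrapper falls back to md5sum(1) where md5(1) is not available (as
on most Linux clients). It is meant to be run from a directory on the NFS
mount.

```shell
#!/bin/sh
# Sketch of the test loop; function names are illustrative.

# Print the md5 of a file: FreeBSD ships md5(1), most Linux systems md5sum(1).
file_md5() {
    if command -v md5 >/dev/null 2>&1; then
        md5 -q "$1"
    else
        md5sum "$1" | awk '{ print $1 }'
    fi
}

# Copy $1 to $2 and succeed only if both checksums exist and match.
copy_and_verify() {
    cs1=$(file_md5 "$1")
    cp -fp "$1" "$2" || return 2
    cs2=$(file_md5 "$2")
    [ -n "$cs1" ] && [ "$cs1" = "$cs2" ]
}

# $1 = file size in MBytes, $2 = maximum number of iterations (default 20).
# Prints the number of iterations that passed before the first mismatch and
# returns non-zero if the loop stopped early.
run_copy_test() {
    size=$1
    max=${2:-20}
    n=0
    while [ "$n" -lt "$max" ]; do
        dd if=/dev/urandom of=BIG bs=1048576 count="$size" >/dev/null 2>&1 || return 2
        copy_and_verify BIG BIG2 || break
        n=$((n + 1))
    done
    echo "$n"
    [ "$n" -eq "$max" ]
}
```

For example, "run_copy_test 138 999" would correspond to a 138 MByte file
and at most 999 iterations.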

I played with nfs-options on client side (nfs[23], conn, intr, [udp|tcp],
-r=, -w= ) but none seem to matter.

Starting/stopping rpc.lockd/rpc.statd on server/client just provoked some :

 cp: utimes: BIG2: No such file or directory
 cp: chown: BIG2: Stale NFS file handle
 cp: chmod: BIG2: Stale NFS file handle
 cp: chflags: BIG2: Operation not supported
 cp: BIG2: Stale NFS file handle
 cp: setting permissions for `BIG2': Stale NFS file handle
 cp: closing `BIG2': Stale NFS file handle

[and then the while loop continued ... as if the NFS handle were not
 that stale ..]

Anyway, I'll try to nail this down more (e.g. nfs-write performance
is horrible ... (nfsd falling down to 0% cpu and then after while
'wake up' and be at around 3-6% again))

I didn't stress-test this MB for a while, but last time I did was
with 7-PRELEASE/RC?/CANTremember-exactly-but-close-to-release
and all worked great

I did add 2G ECC to the 2nd CPU since, though I doubt that interferes
with NFS.

In short, if anyone has a suggestion, I'd welcome it (I will try downgrading
to RELENG_7_0 iff no one has a new suggestion for RELENG_7, but I'd like
to go forward and test some maybe-suspect recent MFC or other
suggestion)

Thanx in advance,

best, Arno
 


   


more cpufreq woes

2007-12-20 Thread Arno J. Klaassen

Hi,

I once again have a freeze with cpufreq, this time on a Tyan S3950 MB + 
X2 BE 2400 proc;

dev.cpu.0.freq_levels: 2277/10 2178/91708 1980/76426 1782/62805 990/30193

Same proc works OK with Asus M2N32 WS Pro ...

Same Tyan MB works OK with X2 BE 2350 which shows

dev.cpu.0.freq_levels: 2079/10 1980/91311 1782/75334 990/40013


With 'sysctl debug.cpufreq.lowest=1000' it works OK, but that's not
really what I'd like to do.
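
To illustrate what that floor excludes: the freq_levels strings above are
lists of frequency/power pairs (per cpufreq(4), MHz and milliwatts). A
small sketch for picking them apart follows; the function names are
hypothetical and it only parses the string, it does not touch sysctl.

```shell
#!/bin/sh
# Hypothetical helpers for a dev.cpu.N.freq_levels string such as
# "2277/10 2178/91708 1980/76426 1782/62805 990/30193".

# Print one frequency (MHz) per line, dropping the "/power" part.
list_freqs() {
    for level in $1; do
        echo "${level%%/*}"
    done
}

# Print the lowest level that is still >= the given floor in MHz,
# i.e. the lowest level left once debug.cpufreq.lowest is set to it.
lowest_allowed_freq() {
    list_freqs "$1" | awk -v lo="$2" '
        ($1 + 0) >= (lo + 0) && (!found || ($1 + 0) < min) { min = $1 + 0; found = 1 }
        END { if (found) print min }'
}
```

With the BE 2400 levels quoted above and a floor of 1000, this leaves 1782
as the lowest level; only the 990 level is excluded.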

This is on RELENG_6. 

Best, Arno


comconsole trouble on ASUS A8VE-deluxe

2007-09-18 Thread Arno J. Klaassen

Hello,

I can't seem to get comconsole to work on an ASUS A8VE-Deluxe MB :

 - I get the boot-menu, can escape to loader prompt
   and type, but no output once kernel starts booting

 - I tried (almost) all possible combinations of hint.sio.0.flags
   but no change, though 0x8 to recover sooner from lost output
   interrupts, *sometimes* gives blurbs of output

 - even pulling out the graphics card does not help

 - when in multi-user a good old kermit over cuad0 works OK

 - these are the relevant dmesg lines :

   sio0: configured irq 4 not in bitmap of probed irqs 0
   sio0: port may not be enabled
   sio0: irq maps: 0 0 0 0
   sio0: 16550A-compatible COM port port 0x3f8-0x3ff irq 4 flags 0x30038 on 
acpi0
   sio0: type ST16650A, console
   ioapic0: routing intpin 4 (ISA IRQ 4) to vector 55

Anyone an idea of what to try next?
I tried uart(4) instead of sio(4) and hint.uart.0.flags=0x10, no change,
but I'm not quite sure this is supposed to work on amd64-stable.

Thanx a lot in advance.

Arno




Re: [summary] Re: burncd 'blank' not terminating ?

2006-12-26 Thread Arno J. Klaassen
Luigi Rizzo [EMAIL PROTECTED] writes:

 summary: there was some discussion on how to
 fix the problem, in 6.x, with burncd -f /dev/acd0 -v blank getting
 stuck with this message
 
   blanking CD, please wait..
 
 This used to work on 4.x.
 [ .. stuff deleted .. ]
 
 Patches below (to be improved to make CDIOCRESET unconditional).
 Does this satisfy all ?

great! Works for me.
(and even cdrecord now works).

Thanx a lot.

Arno

P.S. this fixes, for real, http://www.freebsd.org/cgi/query-pr.cgi?pr=94426


witness_checkorder panic

2006-12-02 Thread Arno J. Klaassen

Hello,

I just got this on a box I'm testing before installation.
It has clean RELENG_6 from about two weeks ago with only
some small if_bge.c-patches Bruce Evans sent me for testing
performance/hang problems.
Since I doubt this panic is related to that, I just post
it here in case someone is interested in more info :

[sorry, no serial console attached ... just copy-paste from
screen, but I will leave the box in the debugger for the
weekend ]

  struct mount mtx (struct mount mtx) @ 
/files/bsd/src6/sys/ufs/ufs/ufs_vnops.c:138
  KDB: stack backtrace :
  witness_checkorder()
  _mtx_lock_flags()
  ufs_itimes()
  ufs_getattr()
  VOP_GETATTR_APV()
  filt_vfsread()
  knote()
  VOP_WRITE_APV()
  vn_write()
  dofilewrite()
  kern_writev()
  write()
  syscall()
  Xfast_syscall()
  --- syscall (4, FreeBSD ELF64, write), rip = 0x4363dc, rsp = 0X7fffdd78, 
rbp = 0x2f6 ---
  KDB: enter: witness_checkorder
  [thread pid 3987 tid 100133 ]

Kernel config is stripped GENERIC +

  options AHC_ALLOW_MEMIO
  options TCP_DROP_SYNFIN
  options KDB
  options KDB_TRACE
  options DDB
  options KTRACE
  options INVARIANTS
  options INVARIANT_SUPPORT
  options DDB_NUMSYM
  options BREAK_TO_DEBUGGER
  options INVARIANTS
  options INVARIANT_SUPPORT
  options WITNESS
  options WITNESS_KDB
  options DEBUG_LOCKS
  options DEBUG_VFS_LOCKS
  options DIAGNOSTIC
  options MUTEX_PROFILING
  options MUTEX_DEBUG
  options SLEEPQUEUE_PROFILING
  options TURNSTILE_PROFILING
  options DEBUG_MEMGUARD

The box was doing (/usr/src nfs-mounted):

  nohup time make -j 2 -DNO_CLEAN buildworld > /tmp/bw_alone.log 2>&1 &

It panicked shortly after I started 'tail -f /tmp/bw_alone.log' in another
window, and /tmp is mfs.

Arno

-- 

  Arno J. Klaassen

  SCITO S.A.
  8 rue des Haies
  F-75020 Paris, France
  http://scito.com



Re: Watchdog Timeout - bge device - 6.2-PRERELEASE

2006-11-03 Thread Arno J. Klaassen

John Marshall [EMAIL PROTECTED] writes:

 rwsrv05 dmesg | grep bge
 bge0: Broadcom BCM5705 A3, ASIC rev. 0x3003 mem 0xe820-0xe820
 irq 17 at device 4.0 on pci4
 miibus1: MII bus on bge0
 bge0: Ethernet address: 00:0b:cd:e7:70:19
 bge0: link state changed to UP
 bge0: watchdog timeout -- resetting

I have a Tyan S2850 with the same (dual) LAN-chip; I increased
BGE_TIMEOUT to 50 (due to reboot problems on a good-old
3com 100Mbps-hub which occasionally gave me :

 bge1: firmware handshake timed out
 bge1: RX CPU self-diagnostics failed! )

This box occasionally freezes under heavy load; with the above change
AND compiling in DEVICE_POLLING but not enabling it, I do not have
any problem for the time being (though the freeze is very hard to
reproduce).

Arno


Re: nfs-client reveals MFC-if_re-probs (or vice-versa) ?

2006-07-29 Thread Arno J. Klaassen


/me wrote:

 I have a curious problem which at first sight seems related to the
 end-June MFC of if_re :
 
  - I 'mount -o nfsv3,intr,noconn,-r=32768,-w=32768
-stable-server:/files/bsd /files/bsd '
 
  - (/usr/ports and /usr/src are symlinks to /files/bsd/*) quickly
after a portinstall/portversion etc. I get : 
 nfs server -stable-server: not responding
 (and the corresponding process stuck in 'bo_wwa' according to
 top(1) )

for info: #define RE_CSUM_FEATURES  0 in otherwise up to date if_re.c
solves the problem. 

Best regards, Arno


nfs-client reveals MFC-if_re-probs (or vice-versa) ?

2006-07-27 Thread Arno J. Klaassen

Hello,

I have a curious problem which at first sight seems related to the
end-June MFC of if_re :

 - I 'mount -o nfsv3,intr,noconn,-r=32768,-w=32768
   -stable-server:/files/bsd /files/bsd '

 - (/usr/ports and /usr/src are symlinks to /files/bsd/*) quickly
   after a portinstall/portversion etc. I get : 
nfs server -stable-server: not responding
(and the corresponding process stuck in 'bo_wwa' according to
top(1) )

 - though I still can 'ping -stable-server' and even 'ssh
   me@-stable-server-IP'

 - -stable-server works ok with two other -stable clients (using
   if_bge) and all are compiled from the very same source-base (and
   -stable-server works fine as well with a linux-client) which
   seems to exclude nfsd-probs

 - a kernel from June the 11th works ok

 - downgrading if_re.c to revision 1.46.2.14 and if_rlreg.h to
   revision 1.51.2.3 makes the problem disappear

 - this is on my demo-notebook, I can test network stuff without much
   limitations; I just use nfs on it for upgrading world and ports.
   NB, same behaviour on amd64-stable and i386-stable (multi-boot same
   hardware)

I can file a PR if requested or feel free to contact me for further
testing.

Best regards,

Arno


PS: relevant pciconf info :

[EMAIL PROTECTED]:8:0:   class=0x02 card=0x47011558 chip=0x816910ec 
rev=0x10 hdr=0x00
vendor   = 'Realtek Semiconductor'
device   = 'RTL8169 Gigabit Ethernet Adapter'
class= network
subclass = ethernet

otherwise standard kernel conf with stripped unneeded drivers and
extra :

device cpufreq
device atapicam
device sound
options TCP_DROP_SYNFIN (hint??)




NFS : mount option update is unknown

2006-05-30 Thread Arno J. Klaassen
Hello,

I updated today two amd64-servers to -stable as of today, I
now get the following dmesg when mounting nfs :


  mount option update is unknown
  mount option update is unknown
  mount option update is unknown
  mount option update is unknown
  mount option update is unknown
  May 31 01:54:18 accuracy mountd[443]: can't delete exports for 
/users/angora/u4: Invalid argument
  May 31 01:54:18 accuracy mountd[443]: can't delete exports for 
/data/angora/d1: Invalid argument
  May 31 01:54:18 accuracy mountd[443]: can't delete exports for 
/data/tabarnac/d2: Invalid argument
  May 31 01:54:18 accuracy mountd[443]: can't delete exports for 
/data/angora/db: Invalid argument
 May 31 01:54:18 accuracy mountd[443]: can't delete exports for 
/data/charlotte/da: Invalid argument

They seem harmless and are maybe related to MFC: 1.208 of ./kern/vfs_mount.c,
though I don't understand the mountd messages : all mentioned filesystems
are NFS-client filesystems, and though /etc/exports exists, it only has one
local fs which isn't mounted anywhere else anyway (while testing).

FYI, Arno




Re: kmem leak in tmpmfs?

2006-05-26 Thread Arno J. Klaassen
Hello,

thanx to all who responded.
Setting tmpmfs_flags="-S -o async" survived a locate script started
at night and a day of intensive 'normal' load.

YMMV, but again, merci!

Arno


kmem leak in tmpmfs?

2006-05-25 Thread Arno J. Klaassen
Hello,

I get a very easy to reproduce panic on 6.1-STABLE :

/etc/periodic/weekly/310.locate panics with

  panic: kmem_malloc(4096): kmem_map too small: 335544320 total allocated


  (kgdb) where
  #0  doadump () at pcpu.h:165
  #1  0xc0577574 in boot (howto=260)
  at /files/bsd/src6/sys/kern/kern_shutdown.c:409
  #2  0xc05778a6 in panic (
  fmt=0xc078dc1d kmem_malloc(%ld): kmem_map too small: %ld total 
allocated)
  at /files/bsd/src6/sys/kern/kern_shutdown.c:565
  #3  0xc06df1ab in kmem_malloc (map=0xc10430c0, size=4096, flags=258)
  at /files/bsd/src6/sys/vm/vm_kern.c:299
  #4  0xc06d49a7 in page_alloc (zone=0xc1035700, bytes=0, pflag=0x0, wait=0)
  at /files/bsd/src6/sys/vm/uma_core.c:958
  #5  0xc06d43db in slab_zalloc (zone=0xc1035700, wait=258)
  at /files/bsd/src6/sys/vm/uma_core.c:823
  #6  0xc06d60f6 in uma_zone_slab (zone=0xc1035700, flags=2)
  at /files/bsd/src6/sys/vm/uma_core.c:2025
  #7  0xc06d635f in uma_zalloc_bucket (zone=0xc1035700, flags=2)
  at /files/bsd/src6/sys/vm/uma_core.c:2134
  #8  0xc06d5f39 in uma_zalloc_arg (zone=0xc1035700, udata=0x0, flags=2)
  at /files/bsd/src6/sys/vm/uma_core.c:1942
  #9  0xc05d17ff in cache_enter (dvp=0xc8bf1110, vp=0xc8dd4110, cnp=0xfe14bbbc)
  at uma.h:275
  #10 0xc06c77c4 in ufs_lookup (ap=0xfe14ba40)
  at /files/bsd/src6/sys/ufs/ufs/ufs_lookup.c:583
  #11 0xc0756073 in VOP_CACHEDLOOKUP_APV (vop=0x0, a=0x0) at vnode_if.c:150
  #12 0xc05d1dfa in vfs_cache_lookup (ap=0x0) at vnode_if.h:82
  #13 0xc0755fe8 in VOP_LOOKUP_APV (vop=0xc07c8a60, a=0xfe14baec)
  at vnode_if.c:99
  #14 0xc05d71fb in lookup (ndp=0xfe14bb94) at vnode_if.h:56
  #15 0xc05d6998 in namei (ndp=0xfe14bb94)
  at /files/bsd/src6/sys/kern/vfs_lookup.c:203
  #16 0xc05e865f in kern_lstat (td=0xc6b29780, path=0x0, pathseg=UIO_USERSPACE, 
  sbp=0x0) at /files/bsd/src6/sys/kern/vfs_syscalls.c:2125
  #17 0xc05e85df in lstat (td=0x0, uap=0xfe14bd04)
  at /files/bsd/src6/sys/kern/vfs_syscalls.c:2109
  #18 0xc073e672 in syscall (frame=
{tf_fs = 59, tf_es = 59, tf_ds = 59, tf_edi = 134664008, tf_esi = 
134663936, tf_ebp = -1077941544, tf_isp = -32195228, tf_ebx = 672511016, tf_edx 
= 134663936, tf_ecx = 134561792, tf_eax = 190, tf_trapno = 0, tf_err = 2, 
tf_eip = 672396855, tf_cs = 51, tf_eflags = 582, tf_esp = -1077941700, tf_ss = 
59})
  at /files/bsd/src6/sys/i386/i386/trap.c:981
  #19 0xc072b21f in Xint0x80_syscall ()
  at /files/bsd/src6/sys/i386/i386/exception.s:200
  #20 0x0033 in ?? ()
  Previous frame inner to this frame (corrupt stack?)
  (kgdb) 

This box has nothing particular, apart from maybe a large number
of stamp-file based test-databases (with a lot of zero-sized
files named .key=value).
Producing this bug is easy :

 - set tmpmfs=YES and set tmpsize greater than around 220m
 - start /etc/periodic/weekly/310.locate (and nothing else!)
 - wait two-three hours and bang

Last test is with tmpsize=1024m and I monitored df -h /tmp and
vmstat -zm every minute; when the system panics, last output is :

  Filesystem    Size    Used   Avail Capacity  Mounted on
  /dev/md0      989M    219M    691M    24%    /var/tmp

  vmstat -zm | fgrep md0
  md0: 512,0,  453257, 15,   453437

I'm not quite an expert, but it looks to me as if md0 use stays
almost 100% in kmem and is never swapped (as it is supposed to do
by default according to the man-page).
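
A quick way to check that impression against the numbers is to multiply
the item size by the number of items in use for the md0 zone. The sketch
below assumes the comma-separated vmstat -z layout quoted above, with the
columns in the order ITEM: SIZE, LIMIT, USED, FREE, REQUESTS; the function
names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; they only do arithmetic on one line of text.

# Print the bytes held by a zone: item size times items in use.
zone_bytes() {
    echo "$1" | awk -F'[:,]' '{ printf "%d\n", $2 * $4 }'
}

# Same value in whole MBytes (truncated).
zone_mbytes() {
    echo "$1" | awk -F'[:,]' '{ printf "%d\n", ($2 * $4) / 1048576 }'
}
```

For the line quoted above, 512 * 453257 = 232067584 bytes, about 221 MB,
which is close to the 219M that df reports as used.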

While here, and being struck as well by the nfsd-bug, at least
vfs_lookup.c seems common to both problems.

Full vmstat-zm logs available.
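
Logs of this kind could be collected with a loop along these lines. It is
a sketch only: the function names and the log layout are assumptions, not
the script that produced the logs mentioned here.

```shell
#!/bin/sh
# Hypothetical helpers for periodic sampling.

# Append a timestamped header and the output of a command to a log file.
# $1 = log file, remaining arguments = command to run.
log_sample() {
    log=$1
    shift
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') $*"
        "$@" 2>&1
    } >> "$log"
}

# Sample "df -h /tmp" and "vmstat -zm" into $1 every $2 seconds, $3 times.
monitor_tmp() {
    mon_log=$1
    mon_interval=${2:-60}
    mon_count=${3:-1}
    while [ "$mon_count" -gt 0 ]; do
        log_sample "$mon_log" df -h /tmp
        log_sample "$mon_log" vmstat -zm
        mon_count=$((mon_count - 1))
        if [ "$mon_count" -gt 0 ]; then
            sleep "$mon_interval"
        fi
    done
}
```

Since the box panics, the log would have to live on a filesystem that
survives the reboot; appending after every sample keeps the last complete
sample on disk.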

Thanx, Arno





RELENG_6 linux emulation problem on amd64

2005-10-29 Thread Arno J. Klaassen
Hello,

I get an easy to reproduce panic on recent RELENG_6/amd64 :

  -su-2.05b# /compat/linux/bin/bash
  bash-2.05b# cd /dev
  bash-2.05b# ls

  panic : kmem_malloc: entry not found or misaligned

Setup is as follows :

  /dev/ad0s3d mounted on /
  /dev/ad0s4d mount on /files
  /usr is a symlink to /files/amd64/usr if ever that might be of
  importance (the rest of ad0s3 is RELENG_5/i386)

uname -a :
  FreeBSD demo 6.0-RC1 FreeBSD 6.0-RC1 #1: Sat Oct 29 17:04:50 CEST 2005 
[EMAIL PROTECTED]:/files/amd64/obj/files/bsd/src6/sys/D470K  amd64

generic config-file with unneeded drivers commented out and extra options :

  device cpufreq
  device tap
  device atapicam
  device sound
  device smbus
  device iicbus
  device iicsmb

  options NTFS
  options TCP_DROP_SYNFIN

linux_base-8-8.0_7 installed.

NB, please respond preferentially to list; i still need
a good solution to filter important email from my flooding
misc procmail-filter output ;(


Arno

# kgdb trace :  

(kgdb) where
#0  doadump () at /files/bsd/src6/sys/kern/kern_shutdown.c:234
#1  0x8030c10b in boot (howto=260)
at /files/bsd/src6/sys/kern/kern_shutdown.c:399
#2  0x8030c5de in panic (
fmt=0x805cdea8 kmem_malloc: entry not found or misaligned)
at /files/bsd/src6/sys/kern/kern_shutdown.c:555
#3  0x804ed2cf in kmem_malloc (map=0xff003e0b0160, size=0, 
flags=258) at /files/bsd/src6/sys/vm/vm_kern.c:382
#4  0x804e00a2 in page_alloc (zone=0x0, bytes=0, 
pflag=0xa7aba5e7 \002\200\202®-, wait=258)
at /files/bsd/src6/sys/vm/uma_core.c:957
#5  0x804e3bbb in uma_large_malloc (size=0, wait=258)
at /files/bsd/src6/sys/vm/uma_core.c:2711
#6  0x802fc503 in malloc (size=0, mtp=0x80706880, flags=258)
at /files/bsd/src6/sys/kern/kern_malloc.c:327
#7  0x802fc6fe in realloc (addr=0x0, size=18446744073709549576, 
mtp=0x80706880, flags=258)
at /files/bsd/src6/sys/kern/kern_malloc.c:416
#8  0x80398412 in vfs_read_dirent (ap=0xa7aba790, 
dp=0xffe16298, off=0) at /files/bsd/src6/sys/kern/vfs_subr.c:3877
#9  0x80290f56 in devfs_readdir (ap=0xa7aba790)
at /files/bsd/src6/sys/fs/devfs/devfs_vnops.c:828
#10 0x805815ec in VOP_READDIR_APV (vop=0x806fc480, 
a=0xa7aba790) at vnode_if.c:1427
#11 0x8056f559 in VOP_READDIR (vp=0xff0002f6e000, 
uio=0xa7abaab0, cred=0xff002e93c700, 
eofflag=0xa7aba854, ncookies=0xa7aba834, 
cookies=0xa7aba840) at vnode_if.h:747
#12 0x8056efe6 in getdents_common (td=0xff002ff22be0, 
args=0xa7abab90, is64bit=1)
at /files/bsd/src6/sys/compat/linux/linux_file.c:328
#13 0x8056f612 in linux_getdents64 (td=0xff002ff22be0, 
args=0xa7abab90)
at /files/bsd/src6/sys/compat/linux/linux_file.c:476
#14 0x80564f54 in ia32_syscall (frame=
  {tf_rdi = 3, tf_rsi = 0, tf_rdx = 4096, tf_rcx = 134598592, tf_r8 = 0, 
tf_r9 = 0, tf_rax = 220, tf_rbx = 3, tf_rbp = 4294958168, tf_r10 = 0, tf_r11 = 
0, tf_r12 = 0, tf_r13 = 0, tf_r14 = 0, tf_r15 = 0, tf_trapno = 12, tf_addr = 
134602692, tf_flags = 0, tf_err = 2, tf_rip = 672250937, tf_cs = 27, tf_rflags 
= 582, tf_rsp = 4294958092, tf_ss = 35})
at /files/bsd/src6/sys/amd64/ia32/ia32_syscall.c:186
#15 0x8050c1ad in Xint0x80_syscall () at ia32_exception.S:64
#16 0x2811bc39 in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) 





Re: Memory requirements between releases

2005-08-12 Thread Arno J. Klaassen
hello,

 The installation notes for 5.4 and 6 (the floppies README.TXT) say
 FreeBSD for the i386 requires ...at least 24 MB of RAM.
 [ .. ] 
 I have on old tosh 110CT laptop with 24mb memory I want to set up as a
 wireless router/NAT box but would prefer to use 6 or 5.4. 

I've run 5.X for about a year on a Pentium60 with 16M as ethernet
router/NAT; flawless, excellent perf (until it died a couple of
weeks ago).

net-booting via PXE though, no idea whether you can *install* with
less than 24M, running only seems OK

Arno



Re: TIMEOUT - WRITE_DMA - A possible FIX! turn off ACPI

2004-12-28 Thread Arno J. Klaassen
Joe Koberg [EMAIL PROTECTED] writes:

 Zsolt Kúti wrote:
 
 My system produces these messages that I already know well from this
 list (as well ;):
 ad4: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=213249674
 
 
 Like many people I was confronted with TIMEOUT - READ_DMA
 and TIMEOUT - WRITE_DMA errors on my drives. I was frustrated.
 But I found a workaround: Turning off ACPI.

dunno, I'd more suspect ACPI-APIC issues : until now
I only had problems on nForce based systems, but today I
installed a brand new VIA based A7VT mini-server and
here are the XXX_DMA errors again (and the accompanying severe
system slow-down).
(Disk swapped from the old PII-233 minimalist-server, where it worked
OK; disabling APIC (in BIOS and/or config and/or hints)
made the XXX_DMA messages disappear (and gave me my network
connection back ;) ) while ACPI was still enabled).

FYI, Arno



Re: Continuing ahc problems - also cause fxp failure

2001-07-31 Thread Arno J. Klaassen


 I see an identical problem with and without this diff applied on an
 ASUS motherboard with onboard SCSI.  No onboard Ethernet.

same here; ASUS MB with onboard SCSI, offboard xl0 Ethernet.
kernel 4.3-STABLE #0: Wed Jul 11

Arno

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-stable in the body of the message



Re: stable status.... still hosed (no more)

2000-08-31 Thread Arno J. Klaassen

Arnout Boer [EMAIL PROTECTED] writes:

 On Thu, Aug 31, 2000 at 10:07:12AM +0100, Steve O'Hara-Smith wrote:
   Checking procedure is simple: load kernel, boot, then telnet from outside.
  ssh from outside will do it too (as I discovered this morning).
 
 Any network connection will do - even samba!

make sure you cvsup sys/kern/uipc_socket2.c version 1.55.2.6 committed
this morning, and everything works fine again -- even Samba.
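
One way to check that the fetched file really is at the expected revision
is to read its $FreeBSD$ keyword line. A sketch, assuming the CVS-style
expansion "$FreeBSD: path,v REVISION date time author state $"; the
function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the revision from the first expanded
# $FreeBSD$ keyword found in the file given as $1.
freebsd_rev() {
    sed -n 's/.*\$FreeBSD: [^ ]*,v \([0-9.]*\) .*/\1/p' "$1" | head -n 1
}

# Usage (path is an assumption about where the source tree lives):
#   freebsd_rev /usr/src/sys/kern/uipc_socket2.c
```

If the keyword is missing or not expanded, the function prints nothing, so
an empty result should be treated as "unknown" rather than as a mismatch.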

A Ciao, Arno


