Re: ZFS boot inside on the second partition inside a slice

2011-06-21 Thread Henri Hennebert

On 06/20/2011 15:51, John Baldwin wrote:

On Saturday, June 18, 2011 5:04:07 am Henri Hennebert wrote:

On 06/17/2011 19:37, John Baldwin wrote:

On Friday, June 17, 2011 1:06:22 pm Henri Hennebert wrote:

On 06/16/2011 19:35, John Baldwin wrote:

On Thursday, June 16, 2011 8:45:41 am Zhihao Yuan wrote:

Exactly. The MFCed ZFSv28 is different from any patch maintained by
mm@. Maybe some untested changes involved.


Can you try reverting this change:

Author: jhb
Date: Thu Apr 28 17:44:24 2011
New Revision: 221177
URL: http://svn.freebsd.org/changeset/base/221177

Log:
Due to space constraints, the UFS boot2 and boot1 use an evil hack where
boot2 calls back into boot1 to perform disk reads.  The ZFS MBR boot blocks
do not have the same space constraints, so remove this hack for ZFS.
While here, remove commented out code to support C/H/S addressing from
zfsldr.  The ZFS and GPT bootstraps always just use EDD LBA addressing.

MFC after: 2 weeks

Modified:
head/sys/boot/i386/boot2/Makefile
head/sys/boot/i386/common/drv.c
head/sys/boot/i386/zfsboot/Makefile
head/sys/boot/i386/zfsboot/zfsldr.S


I tried with this revision (221177) reverted, to no avail:
same error - 'read error'


Hmm, ok.  No other ideas off the top of my head.


I ran the same test under VirtualBox and got:

A critical error has occurred while running the virtual machine and the
machine execution has been stopped.

I attach VBox.log.

PS - the message 'ZFS: supported version 28' comes from my patch:

Index: sys/boot/zfs/zfsimpl.c
===
--- sys/boot/zfs/zfsimpl.c  (revision 212549)
+++ sys/boot/zfs/zfsimpl.c  (working copy)
@@ -61,6 +61,8 @@
     STAILQ_INIT(&zfs_vdevs);
     STAILQ_INIT(&zfs_pools);
 
+    printf("ZFS: supported version %u\n", (unsigned) SPA_VERSION);
+
     zfs_temp_buf = malloc(TEMP_SIZE);
     zfs_temp_end = zfs_temp_buf + TEMP_SIZE;
     zfs_temp_ptr = zfs_temp_buf;


Hmm, can you add printfs and narrow down where the hang happens (or which
reads are failing)?  The VBOX log seems to make no sense.  It shows the
CPU trying to call into the BIOS from within protected mode in the loader
but that shouldn't ever happen (note a cs of 0x2b (which is the loader's
%cs selector) but an eip that looks like a cs:ip of a BIOS routine).

I just tried adding printfs, but I get only 'Read error' without any of my
printf output.


Previously, even my printf in zfs_init didn't show up on the console of
my netbook. Under VBox it was printed.


Maybe printf is not allowed so early in zfsboot?

For the record, I write the bootcode with these 2 commands after booting
with mfsbsd (from mm@) and fetching zfsboot into /tmp:


dd if=/tmp/zfsboot of=/dev/ad0s2a bs=512 count=1
dd if=/tmp/zfsboot of=/dev/ad0s2a bs=512 skip=1 seek=1024


My debugging patch in zfsboot.c:

[root@morzine zfsboot]# svn diff zfsboot.c
Index: zfsboot.c
===
--- zfsboot.c   (revision 223081)
+++ zfsboot.c   (working copy)
@@ -447,10 +447,16 @@
     off_t off;
     struct dsk *dsk;
 
+    printf("==trying to boot\n");
+
     dmadat = (void *)(roundup2(__base + (int32_t)_end, 0x1) - __base);
 
+    printf("==about to call bios_getmem()\n");
+
     bios_getmem();
 
+    printf("==bios_getmem() completed\n");
+
     if (high_heap_size > 0) {
         heap_end = PTOV(high_heap_base + high_heap_size);
         heap_next = PTOV(high_heap_base);
@@ -482,6 +488,8 @@
 
     autoboot = 1;
 
+    printf("==about to call zfs_init()\n");
+
     zfs_init();
 
     /*


Henri
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: ZFS boot inside on the second partition inside a slice

2011-06-21 Thread John Baldwin
On Tuesday, June 21, 2011 5:51:22 am Henri Hennebert wrote:
 I just try to put printf but I get only 'Read error' without any of my 
 printf.
 
 Previously event my printf in zfs_init don't show up on the console of 
 my netbook. Under VBox it was printed.
 
 Maybe printf is not allowed so soon in zfsboot ?

Rather, it may be that zfsldr.S is what is emitting 'Read error' and you are
not getting into the zfsboot.c code itself.  You can try this patch which
should display the error code the BIOS returns when it fails:

Index: zfsldr.S
===
--- zfsldr.S    (revision 223339)
+++ zfsldr.S    (working copy)
@@ -234,9 +234,12 @@ nread.1:   xor %ecx,%ecx   # Get
callw read  # Read from disk
lea 0x10(%bp),%sp   # Clear stack
jnc return  # If success, return
-   mov $msg_read,%si   # Otherwise, set the error
-   #  message and fall through to
-   #  the error routine
+   mov %ah,%al # Format
+   mov $read_err,%di   #  error
+   call hex8   #  code
+   mov $msg_read,%si   # Set the error message and
+   #  fall through to the error
+   #  routine
 /*
  * Print out the error message pointed to by %ds:(%si) followed
  * by a prompt, wait for a keypress, and then reboot the machine.
@@ -296,12 +299,28 @@ read.1:   mov $msg_chs,%si
            jmp error
 msg_chs:   .asciz "CHS not supported"
 
+/*
+ * Convert AL to hex, saving the result to [EDI].
+ */
+hex8:      push %ax                # Save
+           shrb $0x4,%al           # Do upper
+           call hex8.1             #  4
+           pop %ax                 # Restore
+hex8.1:    andb $0xf,%al           # Get lower 4
+           cmpb $0xa,%al           # Convert
+           sbbb $0x69,%al          #  to hex
+           das                     #  digit
+           orb $0x20,%al           # To lower case
+           stosb                   # Save char
+           ret                     # (Recursive)
+
 /* 

Re: ZFS boot inside on the second partition inside a slice

2011-06-21 Thread Henri Hennebert

On 06/21/2011 15:01, John Baldwin wrote:

Index: zfsldr.S
===
--- zfsldr.S    (revision 223339)
+++ zfsldr.S    (working copy)
@@ -234,9 +234,12 @@ nread.1:   xor %ecx,%ecx   # Get
callw read  # Read from disk
lea 0x10(%bp),%sp   # Clear stack
jnc return  # If success, return
-   mov $msg_read,%si   # Otherwise, set the error
-   #  message and fall through to
-   #  the error routine
+   mov %ah,%al # Format
+   mov $read_err,%di   #  error
+   call hex8   #  code
+   mov $msg_read,%si   # Set the error message and
+   #  fall through to the error
+   #  routine
  /*
   * Print out the error message pointed to by %ds:(%si) followed
   * by a prompt, wait for a keypress, and then reboot the machine.
@@ -296,12 +299,28 @@ read.1:   mov $msg_chs,%si
            jmp error
 msg_chs:   .asciz "CHS not supported"
 
+/*
+ * Convert AL to hex, saving the result to [EDI].
+ */
+hex8:      push %ax                # Save
+           shrb $0x4,%al           # Do upper
+           call hex8.1             #  4
+           pop %ax                 # Restore
+hex8.1:    andb $0xf,%al           # Get lower 4
+           cmpb $0xa,%al           # Convert
+           sbbb $0x69,%al          #  to hex
+           das                     #  digit
+           orb $0x20,%al           # To lower case
+           stosb                   # Save char
+           ret                     # (Recursive)
+
 /* Messages */
 
-msg_read:  .asciz "Read"
-msg_part:  .asciz "Boot"
+msg_read:  .ascii "Read error: "
+read_err:  .asciz "XX"
+msg_part:  .asciz "Boot error"
 
-prompt:    .asciz " error\r\n"
+prompt:    .asciz "\r\n"

.org PRT_OFF,0x90


I get

Read error: 01

Henri



Re: ZFS boot inside on the second partition inside a slice

2011-06-21 Thread John Baldwin
On Tuesday, June 21, 2011 10:50:14 am Henri Hennebert wrote:
 I get
 
 Read error: 01

Hmm, that would be 'invalid parameter'.

Can you add a 'foo: jmp foo' infinite loop and move it around to figure out
which read call is failing?

-- 
John Baldwin


Re: ZFS boot inside on the second partition inside a slice

2011-06-21 Thread Henri Hennebert

On 06/21/2011 17:55, John Baldwin wrote:

Hmm, that would be 'invalid parameter'.

Can you add a 'foo: jmp foo' infinite loop and move it around to figure out
which read call is failing?


main.5: mov %dx,MEM_ARG         # Save args
        movb $NSECT,%dh         # Sector count
        movl $1024,%eax         # Offset to boot2
        callw nread.1           # Read disk

foo:    jmp foo

After this one I get

'Read error: 01'

Henri


Re: ZFS boot inside on the second partition inside a slice

2011-06-21 Thread John Baldwin
On Tuesday, June 21, 2011 12:15:58 pm Henri Hennebert wrote:
 main.5: mov %dx,MEM_ARG # Save args
  movb $NSECT,%dh # Sector count
  movl $1024,%eax # Offset to boot2
  callw nread.1   # Read disk
 
 foo:jmp foo
 
 After this one I get
 
 'Read error: 01'

Hmm, ok.  NSECT changed in the MFC (it is now larger).  Try this patch.  It 
changes the code to read zfsboot in one sector at a time:

Index: zfsldr.S
===
--- zfsldr.S(revision 223365)
+++ zfsldr.S(working copy)
@@ -16,7 +16,6 @@
  */
 
 /* Memory Locations */
-   .set MEM_REL,0x700  # Relocation address
.set MEM_ARG,0x900  # Arguments
.set MEM_ORG,0x7c00 # Origin
.set MEM_BUF,0x8000 # Load area
@@ -91,26 +90,19 @@ main:   cld # String ops inc
mov %cx,%ss # Set up
mov $start,%sp  #  stack
 /*
- * Relocate ourself to MEM_REL.  Since %cx == 0, the inc %ch sets
- * %cx == 0x100.
- */
-   mov %sp,%si # Source
-   mov $MEM_REL,%di# Destination
-   incb %ch# Word count
-   rep # Copy
-   movsw   #  code
-/*
  * If we are on a hard drive, then load the MBR and look for the first
  * FreeBSD slice.  We use the fake partition entry below that points to
  * the MBR when we call nread.  The first pass looks for the first active
  * FreeBSD slice.  The second pass looks for the first non-active FreeBSD
  * slice if the first one fails.
  */
-   mov $part4,%si  # Partition
+   mov $part4,%si  # Dummy partition
cmpb $0x80,%dl  # Hard drive?
jb main.4   # No
-   movb $0x1,%dh   # Block count
-   callw nread # Read MBR
+   xor %eax,%eax   # Read MBR from
+   movw $MEM_BUF,%bx   #  first sector
+  

Re: MFC: graid(8) (RAID GEOM) support

2011-06-21 Thread Doug Ambrisko
Jeremy Chadwick writes:
| Sorry for the cross-post, but I thought both lists would want to know
| about this.
| 
| Looks like mav@ just committed this ~17 hours ago:
| http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/geom/raid/g_raid.c
| 
| Those who have historically wanted to use Intel MatrixRAID (now called
| Intel RST (Rapid Storage Technology)), but haven't due to the severe
| issues/risks with ataraid(4), will probably be very interested in
| this commit.  I know I am!
| 
| I plan on stress-testing the Intel support on a 2-disk system with
| RAID-1 enabled, and will document my experiences, procedures, etc...

We definitely want people to help test this out.  It was designed from 
the start to be robust and do recovery for RAID 1 which is our use.
We had previously hacked enhanced support into ataraid(4) and ata(4) for 
use in-house. 

Doug A.


Re: ZFS boot inside on the second partition inside a slice

2011-06-21 Thread Henri Hennebert

On 06/21/2011 19:51, John Baldwin wrote:


Hmm, ok.  NSECT changed in the MFC (it is now larger).  Try this patch.  It
changes the code to read zfsboot in one sector at a time:



I encountered 2 problems - see inline in your patch

Henri



Index: zfsldr.S
===
--- zfsldr.S(revision 223365)
+++ zfsldr.S(working copy)
@@ -16,7 +16,6 @@
   */

  /* Memory Locations */
-   .set MEM_REL,0x700  # Relocation address
.set MEM_ARG,0x900  # Arguments
.set MEM_ORG,0x7c00 # Origin
.set MEM_BUF,0x8000 # Load area
@@ -91,26 +90,19 @@ main:   cld # String ops inc
mov %cx,%ss # Set up
mov $start,%sp  #  stack
  /*
- * Relocate ourself to MEM_REL.  Since %cx == 0, the inc %ch sets
- * %cx == 0x100.
- */
-   mov %sp,%si # Source
-   mov $MEM_REL,%di# Destination
-   incb %ch# Word count
-   rep # Copy
-   movsw   #  code
-/*
   * If we are on a hard drive, then load the MBR and look for the first
   * FreeBSD slice.  We use the fake partition entry below that points to
   * the MBR when we call nread.  The first pass looks for the first active
   * FreeBSD slice.  The second pass looks for the first non-active FreeBSD
   * slice if the first one fails.
   */
-   mov $part4,%si  # Partition
+   mov $part4,%si  # Dummy partition
cmpb $0x80,%dl  # Hard drive?
jb main.4   # No
-   movb $0x1,%dh   # Block count
-   callw nread # Read MBR
+   xor %eax,%eax   # 

Re: MFC: graid(8) (RAID GEOM) support

2011-06-21 Thread Jeremy Chadwick
On Fri, Jun 17, 2011 at 05:51:24PM -0700, Jeremy Chadwick wrote:
 Sorry for the cross-post, but I thought both lists would want to know
 about this.
 
 Looks like mav@ just committed this ~17 hours ago:
 http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/geom/raid/g_raid.c
 
 Those who have historically wanted to use Intel MatrixRAID (now called
 Intel RST (Rapid Storage Technology)), but haven't due to the severe
 issues/risks with ataraid(4), will probably be very interested in
 this commit.  I know I am!
 
 I plan on stress-testing the Intel support on a 2-disk system with
 RAID-1 enabled, and will document my experiences, procedures, etc...
 
 Thanks, mav@ and imp@ !
 
 I'll be sending another mail momentarily asking about USB memory stick
 image building, since to accomplish the above, I want to do a
 bare-bones install on our test system (e.g. enable Intel RAID, set up
 2 disks in a RAID-1 mirror, boot a USB memory stick that contains this
 latest RELENG_8 build, and do sysinstall, etc.. the normal way).
 
 
 =
 MFC r219974, r220209, r220210, r220790:
 Add new RAID GEOM class, that is going to replace ataraid(4) in supporting
 various BIOS-based software RAIDs. Unlike ataraid(4) this implementation
 does not depend on legacy ata(4) subsystem and can be used with any disk
 drivers, including new CAM-based ones (ahci(4), siis(4), mvs(4), ata(4)
 with `options ATA_CAM`). To make code more readable and extensible, this
 implementation follows modular design, including core part and two sets
 of modules, implementing support for different metadata formats and RAID
 levels.
 
 Support for such popular metadata formats is now implemented:
 Intel, JMicron, NVIDIA, Promise (also used by AMD/ATI) and SiliconImage.
 
 Such RAID levels are now supported:
 RAID0, RAID1, RAID1E, RAID10, SINGLE, CONCAT.
 
 For all of these RAID levels and metadata formats this class supports
 full cycle of volume operations: reading, writing, creation, deletion,
 disk removal and insertion, rebuilding, dirty shutdown detection
 and resynchronization, bad sector recovery, faulty disks tracking,
 hot-spare disks. For Intel and Promise formats there is support multiple
 volumes per disk set.
 
 Look graid(8) manual page for additional details.
 
 Co-authored by: imp
 Sponsored by:   Cisco Systems, Inc. and iXsystems, Inc.
 =

By the way, it doesn't look like the graid(8) man page is being brought
in to the base system on either of the two RELENG_8 systems I've rebuilt
in the past few days.

I'm thinking /usr/src/sbin/geom/class/raid/graid.8 isn't being noticed
as a man page.

/usr/src/sbin/geom/class/raid/Makefile doesn't have MAN8=graid.8 in it,
is that the problem?

-- 
| Jeremy Chadwick                        jdc at parodius.com |
| Parodius Networking               http://www.parodius.com/ |
| UNIX Systems Administrator           Mountain View, CA, US |
| Making life hard for others since 1977.       PGP 4BD6C0CB |



Re: ZFS boot inside on the second partition inside a slice

2011-06-21 Thread John Baldwin
On Tuesday, June 21, 2011 3:02:28 pm Henri Hennebert wrote:
 
 I encounter 2 problems - see in you patch
 
 Henri
 
 

Re: ZFS boot inside on the second partition inside a slice

2011-06-21 Thread Henri Hennebert

On 06/21/2011 21:25, John Baldwin wrote:


Re: ZFS boot inside on the second partition inside a slice

2011-06-21 Thread John Baldwin
On Tuesday, June 21, 2011 4:13:20 pm Henri Hennebert wrote:
 On 06/21/2011 21:25, John Baldwin wrote:
 and I get:
 
 Read error: 04

Hmm, that is the error for an invalid sector.  Try this patch.  It reshuffles
a few more things and adds code to dump the low 32 bits of the LBA on an
error:

Index: zfsldr.S
===
--- zfsldr.S(revision 223365)
+++ zfsldr.S(working copy)
@@ -16,7 +16,6 @@
  */
 
 /* Memory Locations */
-   .set MEM_REL,0x700  # Relocation address
.set MEM_ARG,0x900  # Arguments
.set MEM_ORG,0x7c00 # Origin
.set MEM_BUF,0x8000 # Load area
@@ -91,26 +90,18 @@ main:   cld # String ops inc
mov %cx,%ss # Set up
mov $start,%sp  #  stack
 /*
- * Relocate ourself to MEM_REL.  Since %cx == 0, the inc %ch sets
- * %cx == 0x100.
- */
-   mov %sp,%si # Source
-   mov $MEM_REL,%di# Destination
-   incb %ch# Word count
-   rep # Copy
-   movsw   #  code
-/*
  * If we are on a hard drive, then load the MBR and look for the first
  * FreeBSD slice.  We use the fake partition entry below that points to
  * the MBR when we call nread.  The first pass looks for the first active
  * FreeBSD slice.  The second pass looks for the first non-active FreeBSD
  * slice if the first one fails.
  */
-   mov $part4,%si  # Partition
+   mov $part4,%si  # Dummy partition
cmpb $0x80,%dl  # Hard drive?
jb main.4   # No
-   movb $0x1,%dh   # Block count
-   callw nread # Read MBR
+   xor %eax,%eax   # Read MBR
+   movw $MEM_BUF,%bx   #  from first
+   callw nread #  sector
mov $0x1,%cx# Two passes
 main.1:mov $MEM_BUF+PRT_OFF,%si# Partition table
movb $0x1,%dh   # Partition
@@ -161,10 +152,16 @@ main.4:   xor %dx,%dx # Partition:drive
  * area and target area do not overlap.
  */
 main.5:mov %dx,MEM_ARG # Save args
-   movb $NSECT,%dh # Sector count
+   mov $NSECT,%cx  # Sector count
movl $1024,%eax # Offset to boot2
-   callw nread.1   # Read disk
-main.6:mov $MEM_BUF,%si# BTX (before reloc)
+   mov $MEM_BUF,%bx# Destination buffer
+main.6:pushal  # Save params
+   callw nread # Read disk
+   popal   # Restore
+   incl %eax   # Update for
+   add $SIZ_SEC,%bx#  next sector
+   loop main.6 # If not last, read another
+   mov $MEM_BUF,%si# BTX (before reloc)
mov 0xa(%si),%bx# Get BTX length and set
mov $NSECT*SIZ_SEC-1,%di# Size of load area (less one)
mov %di,%si # End of load
@@ -214,29 +211,35 @@ seta20.3: sti # Enable interrupts
  * packet on the stack and passes it to read.
  *
  * %eax- int - LBA to read in relative to partition start
+ * %es:%bx - ptr - destination address
  * %dl - byte- drive to read from
- * %dh - byte- num sectors to read
  * %si - ptr - MBR partition entry
  */
-nread: xor %eax,%eax   # Sector offset in partition
-nread.1:   xor %ecx,%ecx   # Get
+nread: xor %ecx,%ecx   # Get
addl 0x8(%si),%eax  #  LBA
adc $0,%ecx
pushl %ecx  # Starting absolute block
pushl %eax  #  block number
push %es# Address of
-   push $MEM_BUF   #  transfer buffer
-   xor %ax,%ax # Number of
-   movb %dh,%al#  blocks to
-   push %ax#  transfer
+   push %bx#  transfer buffer
+   push $0x1   # Read 1 sector
push $0x10  # Size of packet
mov %sp,%bp 
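
For readers following the patch, here is a minimal C sketch of what the reworked nread appears to build on the stack, assuming the standard 16-byte EDD disk address packet layout. The names edd_packet, absolute_lba, make_packet and plan_reads are hypothetical and do not exist in zfsldr.S; only SIZ_SEC, the relative offset 1024 and MEM_BUF = 0x8000 are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SIZ_SEC 512u    /* sector size, as in zfsldr.S */

/*
 * EDD disk address packet in memory order. The patched nread pushes the
 * fields in reverse, so the last push ($0x10) ends up at the lowest address.
 */
struct edd_packet {
    uint8_t  size;      /* 0x10: size of this packet */
    uint8_t  reserved;  /* 0 (high byte of the pushed $0x10 word) */
    uint16_t count;     /* sectors to transfer; always 1 after the patch */
    uint16_t buf_off;   /* %bx: offset of the transfer buffer */
    uint16_t buf_seg;   /* %es: segment of the transfer buffer */
    uint64_t lba;       /* absolute starting block (%ecx:%eax) */
};

/*
 * Mirrors "addl 0x8(%si),%eax; adc $0,%ecx": add the partition's 32-bit
 * start LBA to the relative LBA and keep the carry in the high half.
 */
static uint64_t absolute_lba(uint32_t part_start, uint32_t rel_lba)
{
    return (uint64_t)part_start + (uint64_t)rel_lba;
}

/* Fill one packet describing a single-sector read. */
static struct edd_packet make_packet(uint32_t part_start, uint32_t rel_lba,
                                     uint16_t seg, uint16_t off)
{
    struct edd_packet p;

    assert(sizeof p == 16);     /* layout assumption of this sketch */
    memset(&p, 0, sizeof p);
    p.size = 0x10;
    p.count = 1;
    p.buf_off = off;
    p.buf_seg = seg;
    p.lba = absolute_lba(part_start, rel_lba);
    return p;
}

/*
 * Hypothetical stand-in for the main.6 loop: one packet per sector, with the
 * relative LBA incremented and the buffer offset advanced by SIZ_SEC
 * (wrapping at 16 bits, like "add $SIZ_SEC,%bx"). Assumes nsect >= 1.
 */
static unsigned plan_reads(struct edd_packet *out, unsigned nsect,
                           uint32_t part_start, uint32_t rel_lba,
                           uint16_t seg, uint16_t off)
{
    unsigned i;

    for (i = 0; i < nsect; i++)
        out[i] = make_packet(part_start, rel_lba + i, seg,
                             (uint16_t)(off + i * SIZ_SEC));
    return i;
}
```

As an illustration, plan_reads(v, 3, 63, 1024, 0, 0x8000) would describe three single-sector reads starting at absolute LBA 1087 and landing at offsets 0x8000, 0x8200 and 0x8400; the partition start of 63 and the count of 3 are made-up values, not ones from this thread.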

SOLVED (was: re0 died last night; here's how I half-revived it)

2011-06-21 Thread Kirk Strauser
I found the problem: sometime between the May 8 kernel I'd been using  
and the new one (latest build: 15:02:36 CST today), my system decided  
to devour socket buffers. I set kern.ipc.maxsockbuf=16777216 and have  
over an hour of stable multi-user uptime, which is a vast improvement!


On Jun 9, 2011, at 9:37 AM, Kirk Strauser wrote:

I have a FreeBSD 8-STABLE system that's been running stably since I  
last upgraded and rebooted on May 8. Yesterday, I updated /usr/src  
to get ZFS v28 and also seem to have gotten rid of my nice, solid  
re0 network interface:


re0: RealTek 8168/8111 B/C/CP/D/DP/E PCIe Gigabit Ethernet port  
0xb000-0xb0ff mem 0xea21-0xea210fff,0xea20-0xea20 irq 16  
at device 0.0 on pci5

re0: Using 1 MSI-X message
re0: Chip rev. 0x3c00
re0: MAC rev. 0x0040
miibus0: MII bus on re0

I'm too tired from lack of sleep due to getting the system back up  
and running to remember all the details, but the summary is that it  
started autodetecting its media as 10baseT/UTP. Almost immediately  
after boot - sometimes while still playing in single-user mode - I'd  
start seeing "no buffer space available" error messages all over the
place.


Forcing media to 1000baseTX/full-duplex fixed the problem for a few  
minutes, but it wouldn't stay in that state and would shortly start  
throwing "no buffer space available" errors again. Until I've gotten
some sleep and have more mental energy to figure out exactly what's  
going on, I've found that forcing the media to 100baseTX keeps it  
solidly chugging along (if a little slowly).


Anyway, that's where I'm at now. If your re NIC is giving you fits  
this morning, try setting it to 100baseTX and see if that'll get you  
running until a better fix comes along.


- Kirk

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org 





Re: em0 watchdog timeouts on 8-STABLE

2011-06-21 Thread Joshua Boyd
If needed, I can reproduce this on demand. Just need to know what sort of
statistics are needed when the problem is occurring. I've had to turn off my
weekly scrubs until I can figure out how to fix this problem.

On Wed, Jun 15, 2011 at 8:37 PM, Joshua Boyd boy...@jbip.net wrote:

 In the kernel. Here's my kernel configuration:

 http://pastebin.com/raw.php?i=4JL814m3

 On Wed, Jun 15, 2011 at 8:20 PM, Jack Vogel jfvo...@gmail.com wrote:

 I have hardware now, am working on reproducing this. Just curious, do you have
 the em driver defined in the kernel, or as a module?

 Jack


 On Wed, Jun 15, 2011 at 2:09 AM, Joshua Boyd boy...@jbip.net wrote:

 On Wed, Jun 15, 2011 at 3:57 AM, Jeremy Chadwick
 free...@jdc.parodius.com wrote:

  On Wed, Jun 15, 2011 at 03:14:43AM -0400, Joshua Boyd wrote:
   I recently updated my server to the latest 8-STABLE, and upgraded to v28
   ZFS. I have not had these problems on any other version of 8-STABLE or
   7-STABLE, which this box was upgraded from some time ago.
  
   Now, during my weekly scrub, I get the following messages and em0 is
   unresponsive:
  
   Jun 12 03:07:58 foghornleghorn kernel: em0: Watchdog timeout -- resetting
   Jun 12 03:07:58 foghornleghorn kernel: em0: link state changed to DOWN
   Jun 12 03:08:01 foghornleghorn kernel: em0: link state changed to UP
   Jun 12 03:08:47 foghornleghorn kernel: em0: Watchdog timeout -- resetting
   Jun 12 03:08:47 foghornleghorn kernel: em0: link state changed to DOWN
   Jun 12 03:08:50 foghornleghorn kernel: em0: link state changed to UP
  
   My scrub is scheduled to start at 03:00:00, so it looks like watchdog
   timeouts start occurring pretty quickly once I/O ramps up.
  
   Here's some possibly relevant information, let me know if anything else
   would be helpful to troubleshoot.
  
   FreeBSD foghornleghorn.res.openband.net 8.2-STABLE FreeBSD 8.2-STABLE #17:
   Mon Jun  6 19:40:19 EDT 2011
   r...@foghornleghorn.res.openband.net:/usr/obj/usr/src/sys/FOGHORNLEGHORN
   amd64
  
   em0: Intel(R) PRO/1000 Legacy Network Connection 1.0.3 port 0xe800-0xe83f
   mem 0xfebe-0xfebf,0xfebc-0xfebd irq 20 at device 5.0 on pci7
  
   em0@pci0:7:5:0: class=0x02 card=0x13768086 chip=0x107c8086 rev=0x05
   hdr=0x00
   vendor = 'Intel Corporation'
   device = 'Gigabit Ethernet Controller (Copper) rev 5 (82541PI)'
   class  = network
   subclass   = ethernet
  
   And, the SAS cards:
  
   dev.mpt.0.%desc: LSILogic SAS/SATA Adapter
   dev.mpt.0.%driver: mpt
   dev.mpt.0.%location: slot=0 function=0
   dev.mpt.0.%pnpinfo: vendor=0x1000 device=0x0058 subvendor=0x15d9
   subdevice=0xa580 class=0x01
   dev.mpt.0.%parent: pci1
   dev.mpt.0.debug: 3
   dev.mpt.0.role: 1
   dev.mpt.1.%desc: LSILogic SAS/SATA Adapter
   dev.mpt.1.%driver: mpt
   dev.mpt.1.%location: slot=0 function=0
   dev.mpt.1.%pnpinfo: vendor=0x1000 device=0x0058 subvendor=0x15d9
   subdevice=0xa580 class=0x01
   dev.mpt.1.%parent: pci2
   dev.mpt.1.debug: 3
   dev.mpt.1.role: 1
   dev.mpt.2.%desc: LSILogic SAS/SATA Adapter
   dev.mpt.2.%driver: mpt
   dev.mpt.2.%location: slot=0 function=0
   dev.mpt.2.%pnpinfo: vendor=0x1000 device=0x0058 subvendor=0x1000
   subdevice=0x30a0 class=0x01
   dev.mpt.2.%parent: pci6
   dev.mpt.2.debug: 3
   dev.mpt.2.role: 1
 
  Please provide output from the following commands (as root):
 
  # pciconf -lvcb
 

 hostb0@pci0:0:0:0: class=0x06 card=0x59561002 chip=0x59561002 rev=0x00 hdr=0x00
    vendor = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
    device = 'RD790 GFX Dual Slot'
    class  = bridge
    subclass   = HOST-PCI
 pcib1@pci0:0:2:0: class=0x060400 card=0x59561002 chip=0x59781002 rev=0x00 hdr=0x01
    vendor = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
    device = 'RD790 PCI to PCI bridge (external gfx0 port A)'
    class  = bridge
    subclass   = PCI-PCI
 pcib2@pci0:0:3:0: class=0x060400 card=0x59561002 chip=0x59791002 rev=0x00 hdr=0x01
    vendor = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
    device = 'RD790 PCI to PCI bridge (external gfx0 port B)'
    class  = bridge
    subclass   = PCI-PCI
 pcib3@pci0:0:4:0: class=0x060400 card=0x59561002 chip=0x597a1002 rev=0x00 hdr=0x01
    vendor = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
    device = 'RD790 PCI to PCI bridge (PCIe gpp port A)'
    class  = bridge
    subclass   = PCI-PCI
 pcib4@pci0:0:6:0: class=0x060400 card=0x59561002 chip=0x597c1002 rev=0x00 hdr=0x01
    vendor = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
    device = 'RD790 PCI to PCI bridge (PCIe gpp port C)'
    class  = bridge
    subclass   = PCI-PCI
 pcib5@pci0:0:7:0: class=0x060400 card=0x59561002 chip=0x597d1002 rev=0x00 hdr=0x01
    vendor = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
    device = 'RD790 PCI to PCI bridge (PCIe gpp port D)'
class  = 

Re: em0 watchdog timeouts on 8-STABLE

2011-06-21 Thread Jack Vogel
I cannot repro this, I used your kernel config, this is on a Dell 1850 btw,
I ran netperf stress from 3 clients, and have seen no watchdogs :(

Jack


On Tue, Jun 21, 2011 at 7:59 PM, Joshua Boyd boy...@jbip.net wrote:

 If needed, I can reproduce this on demand. Just need to know what sort of
 statistics are needed when the problem is occurring. I've had to turn off my
 weekly scrubs until I can figure out how to fix this problem.

