I experimented with changing the spin loop to use DELAY() (as the Linux
driver does), but this didn't prevent the controller from either "just"
locking up or taking the whole machine down with it. Changing various other
places in a similar manner (for example, replacing the bcopy() in
amr_quartz_get_work() with code similar to the Linux driver's, which waits
for 0xFF to clear) didn't do the trick either.

However, when I forced the driver not to use the full number of concurrent
commands returned by the firmware, I seem to have finally found the one
change that made the difference. The Linux code sets a hard limit
(MAX_COMMANDS there, AMR_MAXCMD here) of 127, whereas my controller, a 466,
returned 254. The Linux code also says the value can be tweaked between 0
and 253, not 254. So clamping sc->amr_maxio to AMR_MAXCMD in
amr_query_controller(), whenever the firmware reports a larger value, might
cost some performance, but it made the code *significantly* more stable than
before. I have now done two make worlds on the RAID without a single hiccup.
Before, I wasn't even able to copy the system over to the RAID without the
machine rebooting.
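
For illustration, the clamp can be written as a standalone function. This is
only a sketch, not the driver code: clamp_maxio() and EXAMPLE_MAXCMD are
made-up names standing in for the logic in amr_query_controller() and for
AMR_MAXCMD.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver's AMR_MAXCMD; 127 matches the patch. */
#define EXAMPLE_MAXCMD 127

/*
 * Return the number of concurrent commands to use: the value reported by
 * the firmware, capped at `limit`. Prints a note when the cap applies.
 */
static int
clamp_maxio(int fw_maxio, int limit)
{
        assert(limit > 0);
        if (fw_maxio > limit) {
                printf("reducing maxio from %d to %d\n", fw_maxio, limit);
                return (limit);
        }
        return (fw_maxio);
}
```

With the values mentioned above, clamp_maxio(254, EXAMPLE_MAXCMD) gives 127,
while a controller reporting 100 would keep 100.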

Possible explanation: people who added debugging statements slowed down the
feeding of new commands to the controller, so the controller never used up
the full set of concurrent commands. The lockup happens when too many
concurrent commands are outstanding. (I haven't tried setting the limit to
253; I'm just glad things finally work :-).)

Hope this helps,
Markus
-- 
KPNQwest Switzerland Ltd
P.O. Box 9470, Zweierstrasse 35, CH-8036 Zuerich
Tel: +41-1-298-6030, Fax: +41-1-291-4642
Markus Wild, Manager Engineering, e-mail: [EMAIL PROTECTED]
Index: amr.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/amr/amr.c,v
retrieving revision 1.8
diff -c -r1.8 amr.c
*** amr.c       2000/03/20 10:44:03     1.8
--- amr.c       2000/03/23 19:20:03
***************
*** 699,704 ****
--- 702,712 ----
        }
        sc->amr_maxdrives = 8;
        sc->amr_maxio = ae->ae_adapter.aa_maxio;
+       if (sc->amr_maxio > AMR_MAXCMD) {
+         device_printf(sc->amr_dev, "reducing maxio from %d to %d\n", 
+                       sc->amr_maxio, AMR_MAXCMD);
+         sc->amr_maxio = AMR_MAXCMD;
+       }
        for (i = 0; i < ae->ae_ldrv.al_numdrives; i++) {
            sc->amr_drive[i].al_size = ae->ae_ldrv.al_size[i];
            sc->amr_drive[i].al_state = ae->ae_ldrv.al_state[i];
***************
*** 853,859 ****
        ac->ac_private = bp;
        ac->ac_data = bp->b_data;
        ac->ac_length = bp->b_bcount;
!       if (bp->b_iocmd == BIO_READ) {
            ac->ac_flags |= AMR_CMD_DATAIN;
            cmd = AMR_CMD_LREAD;
        } else {
--- 861,868 ----
        ac->ac_private = bp;
        ac->ac_data = bp->b_data;
        ac->ac_length = bp->b_bcount;
! /*    if (bp->b_iocmd == BIO_READ) { */
!       if (bp->b_flags & B_READ) {
            ac->ac_flags |= AMR_CMD_DATAIN;
            cmd = AMR_CMD_LREAD;
        } else {
Index: amrvar.h
===================================================================
RCS file: /home/ncvs/src/sys/dev/amr/amrvar.h,v
retrieving revision 1.2
diff -c -r1.2 amrvar.h
*** amrvar.h    1999/10/26 23:18:57     1.2
--- amrvar.h    2000/03/23 19:20:04
***************
*** 37,43 ****
  #define AMR_CFG_SIG   0xa0
  #define AMR_SIGNATURE 0x3344
  
! #define AMR_MAXCMD    255             /* ident = 0 not allowed */
  #define AMR_MAXLD             40
  
  #define AMR_BLKSIZE   512
--- 37,44 ----
  #define AMR_CFG_SIG   0xa0
  #define AMR_SIGNATURE 0x3344
  
! /*#define AMR_MAXCMD  255*/           /* ident = 0 not allowed */
! #define AMR_MAXCMD    127             /* ident = 0 not allowed */
  #define AMR_MAXLD             40
  
  #define AMR_BLKSIZE   512
