Re: camcontrol stop / restart broken

2001-05-06 Thread Kenneth D. Merry

On Sun, May 06, 2001 at 11:19:53 +0200, J Wunsch wrote:
> [F'up changed to freebsd-scsi]
> 
> "Kenneth D. Merry" <[EMAIL PROTECTED]> wrote:
> 
> > This should be fixed as of rev 1.22 of scsi_all.c.  There was an errant
> > search and replace that caused the 'start' bit in the start/stop unit to
> > always be set to 0 (stop).  So automatic spinups wouldn't work, and
> > 'camcontrol start' wouldn't work.
> 
> I've got:
> 
> uriah # cvs stat /sys/cam/scsi/scsi_all.c
> ===
> File: scsi_all.c        Status: Up-to-date
> 
>    Working revision:    1.24    Result of merge
>    Repository revision: 1.24    /home/ncvs/src/sys/cam/scsi/scsi_all.c,v
>    Sticky Tag:          (none)
>    Sticky Date:         (none)
>    Sticky Options:      (none)
> 
> and still have the problem that the "camcontrol start" doesn't
> work.  It returns immediately to the caller, claiming a "unit started
> successfully", while the drive hasn't started at all.

camcontrol uses scsi_start_stop() to build the start unit CDB.
scsi_start_stop() is in libcam, which compiles a number of kernel files in
userland.  (including scsi_all.c)  So you need to rebuild world to fix
camcontrol.

> Issuing a "camcontrol command daX -c '1b 0 0 0 1 0'" works.

That bypasses the CDB builder function (scsi_start_stop()), so I would
expect it to work.
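
For readers following along, the raw bytes in that command can be read as a 6-byte START STOP UNIT CDB: opcode 0x1b in byte 0 and the START bit (bit 0) in byte 4.  Here is a minimal sketch in C that parses such a string and reports whether it asks for a start or a stop.  The names parse_cdb() and cdb_is_start() are made up for this illustration and are not part of camcontrol or libcam.

```c
#include <stdint.h>
#include <stdlib.h>

#define START_STOP_UNIT 0x1b   /* SCSI opcode for START STOP UNIT */
#define SSS_START       0x01   /* START bit: byte 4, bit 0 */

/*
 * Hypothetical helper: parse up to `max` whitespace-separated hex bytes
 * from `text` into `cdb`.  Returns the number of bytes parsed, or -1 if
 * a value does not fit in one byte.
 */
int parse_cdb(const char *text, uint8_t *cdb, int max)
{
    int n = 0;
    char *end;

    while (n < max) {
        unsigned long v = strtoul(text, &end, 16);
        if (end == text)        /* no further hex digits */
            break;
        if (v > 0xff)           /* not a single byte */
            return -1;
        cdb[n++] = (uint8_t)v;
        text = end;
    }
    return n;
}

/*
 * Hypothetical helper: returns 1 if the CDB is a START STOP UNIT with
 * START set, 0 if it is a START STOP UNIT requesting a stop, and -1 if
 * it is not a 6-byte START STOP UNIT at all.
 */
int cdb_is_start(const uint8_t *cdb, int len)
{
    if (len != 6 || cdb[0] != START_STOP_UNIT)
        return -1;
    return (cdb[4] & SSS_START) ? 1 : 0;
}
```

Feeding "1b 0 0 0 1 0" through parse_cdb() and then cdb_is_start() gives 1 (start), which matches the observation that the hand-written command spins the drive up.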

> I didn't try whether the kernel-implied startup on disk access would
> work, though, since it would IMHO risk a hanging kernel and controller
> timeout.

That should work now.

Ken
-- 
Kenneth Merry
[EMAIL PROTECTED]

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: camcontrol stop / restart broken

2001-05-06 Thread J Wunsch

[F'up changed to freebsd-scsi]

"Kenneth D. Merry" <[EMAIL PROTECTED]> wrote:

> This should be fixed as of rev 1.22 of scsi_all.c.  There was an errant
> search and replace that caused the 'start' bit in the start/stop unit to
> always be set to 0 (stop).  So automatic spinups wouldn't work, and
> 'camcontrol start' wouldn't work.

I've got:

uriah # cvs stat /sys/cam/scsi/scsi_all.c
===
File: scsi_all.c        Status: Up-to-date

   Working revision:    1.24    Result of merge
   Repository revision: 1.24    /home/ncvs/src/sys/cam/scsi/scsi_all.c,v
   Sticky Tag:          (none)
   Sticky Date:         (none)
   Sticky Options:      (none)

and still have the problem that the "camcontrol start" doesn't
work.  It returns immediately to the caller, claiming a "unit started
successfully", while the drive hasn't started at all.

Issuing a "camcontrol command daX -c '1b 0 0 0 1 0'" works.

I didn't try whether the kernel-implied startup on disk access would
work, though, since it would IMHO risk a hanging kernel and controller
timeout.

-- 
cheers, J"org   .-.-.   --... ...--   -.. .  DL8DTL

http://www.sax.de/~joerg/                        NIC: JW11-RIPE
Never trust an operating system you don't have sources for. ;-)




Re: camcontrol stop / restart broken

2001-05-01 Thread Kenneth D. Merry

On Tue, May 01, 2001 at 22:03:37 +0300, Tomi Vainio - Sun Finland - wrote:
> Kenneth D. Merry writes:
>  > 
>  > Hmm.  Well, I definitely haven't seen this before.  The only thing I can
>  > figure is that we got into some sort of infinite rescan loop.  I don't know
>  > how spinning up the disk (or trying to) would trigger a rescan.
>  > 
> My system has been up and running for 21 hours since the world rebuild
> and reboot. During this time I have stopped and started the disk
> multiple times, and these errors are history now.

Ahh, cool.  Well, if you see them again, let me know.

Ken
-- 
Kenneth Merry
[EMAIL PROTECTED]




Re: camcontrol stop / restart broken

2001-05-01 Thread Tomi Vainio - Sun Finland -

Kenneth D. Merry writes:
 > 
 > Hmm.  Well, I definitely haven't seen this before.  The only thing I can
 > figure is that we got into some sort of infinite rescan loop.  I don't know
 > how spinning up the disk (or trying to) would trigger a rescan.
 > 
My system has been up and running for 21 hours since the world rebuild
and reboot. During this time I have stopped and started the disk
multiple times, and these errors are history now.

  Tomppa
-- 
SUN Microsystems Oy PL 112, Lars Sonckin kaari 12, 02601 ESPOO, Finland
Tomi Vainio (System Support Engineer) +358 9 52556300 hotline
email: [EMAIL PROTECTED]              +358 9 52556252 fax




Re: camcontrol stop / restart broken

2001-05-01 Thread Kenneth D. Merry

On Mon, Apr 30, 2001 at 21:22:01 +0300, Tomi Vainio - Sun Finland - wrote:
> Kenneth D. Merry writes:
>  > 
>  > This should be fixed as of rev 1.22 of scsi_all.c.  There was an errant
>  > search and replace that caused the 'start' bit in the start/stop unit to
>  > always be set to 0 (stop).  So automatic spinups wouldn't work, and
>  > 'camcontrol start' wouldn't work.
>  >
> Thanks, I'll test this soon.
> 
>  > I'd still like to know when these messages are cropping up.
>  >
> I scanned the messages files, and it seems to start ~2 hours after I first
> tried to spin up the disk.
> 
> Apr 28 23:01:40 cat /boot/kernel/kernel: (da1:ahc0:0:2:0): Invalidating pack
> Apr 28 23:08:10 cat /boot/kernel/kernel: (da1:ahc0:0:2:0): Invalidating pack
> Apr 29 00:49:42 cat /boot/kernel/kernel: (noperiph:ahc0:0:2:0): xpt_scan_lun: can't allocate CCB, can't continue
> 
> Apr 29 14:40:00 cat /boot/kernel/kernel: (da1:ahc0:0:2:0): Invalidating pack
> Apr 29 14:44:31 cat /boot/kernel/kernel: (da1:ahc0:0:2:0): Invalidating pack
> Apr 29 16:34:04 cat /boot/kernel/kernel: (noperiph:ahc0:0:2:0): xpt_scan_lun: can't allocate path, can't continue
> 

Hmm.  Well, I definitely haven't seen this before.  The only thing I can
figure is that we got into some sort of infinite rescan loop.  I don't know
how spinning up the disk (or trying to) would trigger a rescan.

If it happens again, could you try to drop into the debugger and get a
stack trace?  If the stack trace doesn't show anything, perhaps setting a
breakpoint in xpt_scan_lun would work.  (You may want to have remote gdb
set up for that.)

Ken
-- 
Kenneth Merry
[EMAIL PROTECTED]




Re: camcontrol stop / restart broken

2001-04-30 Thread Tomi Vainio - Sun Finland -

Kenneth D. Merry writes:
 > 
 > This should be fixed as of rev 1.22 of scsi_all.c.  There was an errant
 > search and replace that caused the 'start' bit in the start/stop unit to
 > always be set to 0 (stop).  So automatic spinups wouldn't work, and
 > 'camcontrol start' wouldn't work.
 >
Thanks, I'll test this soon.

 > I'd still like to know when these messages are cropping up.
 >
I scanned the messages files, and it seems to start ~2 hours after I first
tried to spin up the disk.

Apr 28 23:01:40 cat /boot/kernel/kernel: (da1:ahc0:0:2:0): Invalidating pack
Apr 28 23:08:10 cat /boot/kernel/kernel: (da1:ahc0:0:2:0): Invalidating pack
Apr 29 00:49:42 cat /boot/kernel/kernel: (noperiph:ahc0:0:2:0): xpt_scan_lun: can't allocate CCB, can't continue

Apr 29 14:40:00 cat /boot/kernel/kernel: (da1:ahc0:0:2:0): Invalidating pack
Apr 29 14:44:31 cat /boot/kernel/kernel: (da1:ahc0:0:2:0): Invalidating pack
Apr 29 16:34:04 cat /boot/kernel/kernel: (noperiph:ahc0:0:2:0): xpt_scan_lun: can't allocate path, can't continue

  Tomppa
-- 
SUN Microsystems Oy PL 112, Lars Sonckin kaari 12, 02601 ESPOO, Finland
Tomi Vainio (System Support Engineer) +358 9 52556300 hotline
email: [EMAIL PROTECTED]              +358 9 52556252 fax




Re: camcontrol stop / restart broken

2001-04-30 Thread Kenneth D. Merry

On Sun, Apr 29, 2001 at 14:47:47 +0300, Tomi Vainio - Sun Finland - wrote:
> Kenneth D. Merry writes:
>  > 
>  > Can you do the following:
>  > 
>  > camcontrol stop da1
>  > camcontrol tur da1 -v
>  > [ then you can start it back up with camcontrol start ]
>  > 
>  > What I want to see here is the sense information coming back from the drive
>  > when it is spun down.
>  > 
>  > The new error recovery code should be doing the same thing as the old error
>  > recovery code -- sending a start unit.  For some reason it isn't doing the
>  > right thing, though.
>  > 
> cat:~(10)# camcontrol stop da1
> Unit stopped successfully
> cat:~(11)# camcontrol tur da1 -v
> Unit is not ready
> (pass1:ahc0:0:2:0): TEST UNIT READY. CDB: 0 0 0 0 0 0
> (pass1:ahc0:0:2:0): CAM Status: SCSI Status Error
> (pass1:ahc0:0:2:0): SCSI Status: Check Condition
> (pass1:ahc0:0:2:0): NOT READY asc:4,2
> (pass1:ahc0:0:2:0): Logical unit not ready, initializing cmd. required field replaceable unit: 2

This should be fixed as of rev 1.22 of scsi_all.c.  There was an errant
search and replace that caused the 'start' bit in the start/stop unit to
always be set to 0 (stop).  So automatic spinups wouldn't work, and
'camcontrol start' wouldn't work.
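
To illustrate the symptom (this is a sketch only, not the code from scsi_all.c, and the thread does not show what the errant replacement actually looked like): a START STOP UNIT CDB builder sets bit 0 of byte 4 when a start is requested.  If that bit is never set, every request goes out as a stop, whatever the caller asked for.  Both function names below are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define START_STOP_UNIT 0x1b   /* SCSI opcode for START STOP UNIT */
#define SSS_IMMED       0x01   /* byte 1: return before the operation completes */
#define SSS_START       0x01   /* byte 4: 1 = start (spin up), 0 = stop */
#define SSS_LOEJ        0x02   /* byte 4: load/eject the medium */

/* Hypothetical builder: fills in a 6-byte START STOP UNIT CDB. */
void build_start_stop_cdb(uint8_t cdb[6], int start, int load_eject,
                          int immediate)
{
    memset(cdb, 0, 6);
    cdb[0] = START_STOP_UNIT;
    if (immediate)
        cdb[1] |= SSS_IMMED;
    if (start)
        cdb[4] |= SSS_START;
    if (load_eject)
        cdb[4] |= SSS_LOEJ;
}

/*
 * Hypothetical reproduction of the symptom: the caller's `start`
 * argument is ignored, so the START bit always stays 0 (stop).
 */
void build_start_stop_cdb_buggy(uint8_t cdb[6], int start, int load_eject,
                                int immediate)
{
    (void)start;                 /* request is silently dropped */
    build_start_stop_cdb(cdb, 0, load_eject, immediate);
}
```

With start = 1 the first function yields the bytes 1b 00 00 00 01 00, while the second yields 1b 00 00 00 00 00.  The drive accepts the second one without complaint, since a stop sent to an already stopped unit is not an error, which would explain a "started successfully" message with no spin-up.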

> Also, the messages file is full of these:
> 
> Apr 29 00:55:42 cat /boot/kernel/kernel: (noperiph:ahc0:0:2:0): xpt_scan_lun: can't allocate path, can't continue
> Apr 29 00:55:43 cat last message repeated 26 times
> Apr 29 00:57:43 cat last message repeated 359 times
> Apr 29 01:07:43 cat last message repeated 1793 times
> Apr 29 01:17:43 cat last message repeated 1794 times
> Apr 29 01:27:43 cat last message repeated 1793 times
> Apr 29 01:34:13 cat last message repeated 1122 times
> Apr 29 01:34:13 cat /boot/kernel/kernel: (noperiph:ahc0:0:2:0): xpt_scan_lun: can't allocate path, can't continue
> Apr 29 01:34:13 cat last message repeated 43 times
> Apr 29 01:36:02 cat last message repeated 322 times

I'd still like to know when these messages are cropping up.

Ken
-- 
Kenneth Merry
[EMAIL PROTECTED]




Re: camcontrol stop / restart broken

2001-04-29 Thread Doug Russell


On Sun, 29 Apr 2001, Kenneth D. Merry wrote:

> (pass1:ahc0:0:2:0): TEST UNIT READY. CDB: 0 0 0 0 0 0
> (pass1:ahc0:0:2:0): CAM Status: SCSI Status Error
> (pass1:ahc0:0:2:0): SCSI Status: Check Condition
> (pass1:ahc0:0:2:0): NOT READY asc:4,2
> (pass1:ahc0:0:2:0): Logical unit not ready, initializing cmd. required field replaceable unit: 2
> cat:~(12)# mount /f
> mount: /dev/da1s1e: Input/output error

Just curious, but, does it still bomb if you manually

camcontrol stop da1
camcontrol start da1
camcontrol start da1    (do it two or three times to be sure...)
camcontrol start da1
mount whatever

Does it still stick whenever the drive is manually powered up, or does it
only stick when FreeBSD should be automatically restarting a stopped drive?

I'm assuming the camcontrol start da1 just always returns an error? No?

Later.. 





Re: camcontrol stop / restart broken

2001-04-29 Thread Kenneth D. Merry

On Sun, Apr 29, 2001 at 14:47:47 +0300, Tomi Vainio - Sun Finland - wrote:
> Kenneth D. Merry writes:
>  > 
>  > Can you do the following:
>  > 
>  > camcontrol stop da1
>  > camcontrol tur da1 -v
>  > [ then you can start it back up with camcontrol start ]
>  > 
>  > What I want to see here is the sense information coming back from the drive
>  > when it is spun down.
>  > 
>  > The new error recovery code should be doing the same thing as the old error
>  > recovery code -- sending a start unit.  For some reason it isn't doing the
>  > right thing, though.
>  > 
> cat:~(10)# camcontrol stop da1
> Unit stopped successfully
> cat:~(11)# camcontrol tur da1 -v
> Unit is not ready
> (pass1:ahc0:0:2:0): TEST UNIT READY. CDB: 0 0 0 0 0 0
> (pass1:ahc0:0:2:0): CAM Status: SCSI Status Error
> (pass1:ahc0:0:2:0): SCSI Status: Check Condition
> (pass1:ahc0:0:2:0): NOT READY asc:4,2
> (pass1:ahc0:0:2:0): Logical unit not ready, initializing cmd. required field replaceable unit: 2
> cat:~(12)# mount /f
> mount: /dev/da1s1e: Input/output error
> cat:~(13)# camcontrol tur da1 -v
> Unit is not ready
> (pass1:ahc0:0:2:0): TEST UNIT READY. CDB: 0 0 0 0 0 0
> (pass1:ahc0:0:2:0): CAM Status: SCSI Status Error
> (pass1:ahc0:0:2:0): SCSI Status: Check Condition
> (pass1:ahc0:0:2:0): NOT READY asc:4,2
> (pass1:ahc0:0:2:0): Logical unit not ready, initializing cmd. required field replaceable unit: 2

That's the normal error message, so I'm not sure what's going on here.

This will probably have to wait 'till tomorrow when I can get on a -current
test box.  There's definitely something odd going on.
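
For anyone reading the sense output quoted above: "NOT READY asc:4,2" is sense key 0x02 with additional sense code 0x04 and qualifier 0x02, which the SCSI spec describes as "logical unit not ready, initializing command required".  That is the case where sending a start unit is the expected recovery.  A minimal sketch of that decision in C, assuming fixed-format sense data; the struct, enum and function names are invented for this example and are not the CAM error recovery code.

```c
#include <stddef.h>
#include <stdint.h>

#define SENSE_KEY_NOT_READY 0x02   /* sense key: NOT READY */

/* Hypothetical recovery choices for this sketch. */
enum recovery_action { RECOVERY_NONE, RECOVERY_START_UNIT };

struct sense_info {
    uint8_t key;    /* sense key */
    uint8_t asc;    /* additional sense code */
    uint8_t ascq;   /* additional sense code qualifier */
};

/*
 * Pull key/ASC/ASCQ out of fixed-format sense data (response code
 * 0x70 or 0x71).  Returns 0 on success, -1 if the buffer is too short
 * or not in fixed format.
 */
int decode_fixed_sense(const uint8_t *sense, size_t len,
                       struct sense_info *out)
{
    uint8_t code;

    if (len < 14)
        return -1;
    code = sense[0] & 0x7f;
    if (code != 0x70 && code != 0x71)
        return -1;
    out->key  = sense[2] & 0x0f;   /* low nibble of byte 2 */
    out->asc  = sense[12];
    out->ascq = sense[13];
    return 0;
}

/* NOT READY, 04h/02h means the unit wants an initializing command. */
enum recovery_action choose_recovery(const struct sense_info *si)
{
    if (si->key == SENSE_KEY_NOT_READY &&
        si->asc == 0x04 && si->ascq == 0x02)
        return RECOVERY_START_UNIT;
    return RECOVERY_NONE;
}
```

For the sense data in the tur output, choose_recovery() returns RECOVERY_START_UNIT, so the drive is reporting the condition one would expect after a stop.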

> cat:~(15)# camcontrol start da1
> Unit started successfully
> cat:~(16)# mount /f
> mount: /dev/da1s1e: Input/output error

At this point the pack has probably already been invalidated, so it won't
let you mount the drive.

> cat:~(17)# camcontrol devlist
>   at scbus0 target 0 lun 0 (pass0,da0)
> at scbus1 target 2 lun 0 (probe0,pass1,da1)
> 
> 
> 
> Also, the messages file is full of these:
> 
> Apr 29 00:55:42 cat /boot/kernel/kernel: (noperiph:ahc0:0:2:0): xpt_scan_lun: can't allocate path, can't continue
> Apr 29 00:55:43 cat last message repeated 26 times
> Apr 29 00:57:43 cat last message repeated 359 times
> Apr 29 01:07:43 cat last message repeated 1793 times
> Apr 29 01:17:43 cat last message repeated 1794 times
> Apr 29 01:27:43 cat last message repeated 1793 times
> Apr 29 01:34:13 cat last message repeated 1122 times
> Apr 29 01:34:13 cat /boot/kernel/kernel: (noperiph:ahc0:0:2:0): xpt_scan_lun: can't allocate path, can't continue
> Apr 29 01:34:13 cat last message repeated 43 times
> Apr 29 01:36:02 cat last message repeated 322 times

That's not good; it means malloc is failing.  Did this happen right after
boot, or after a 'camcontrol rescan' or what?
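
As a rough aid for reading the syslog excerpt quoted above, the "last message repeated N times" lines can be turned into a rate.  This small C sketch does that from two timestamps and a repeat count; the function names are made up for the example, and it assumes both timestamps fall on the same day.

```c
#include <stdio.h>

/*
 * Hypothetical helper: convert a syslog "HH:MM:SS" time to seconds
 * since midnight.  Returns -1 if the string does not parse.
 */
int hms_to_seconds(const char *hms)
{
    int h, m, s;

    if (sscanf(hms, "%d:%d:%d", &h, &m, &s) != 3)
        return -1;
    return h * 3600 + m * 60 + s;
}

/*
 * Hypothetical helper: average repeats per second between two
 * timestamps on the same day.  Returns -1.0 on bad input or if `to`
 * is not later than `from` (midnight wrap is not handled).
 */
double repeats_per_second(const char *from, const char *to, int repeats)
{
    int a = hms_to_seconds(from);
    int b = hms_to_seconds(to);

    if (a < 0 || b <= a)
        return -1.0;
    return (double)repeats / (double)(b - a);
}
```

For the 1794 repeats logged between 01:07:43 and 01:17:43 this works out to roughly three failed xpt_scan_lun attempts per second, sustained over ten-minute windows.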

Ken
-- 
Kenneth Merry
[EMAIL PROTECTED]




Re: camcontrol stop / restart broken

2001-04-29 Thread Tomi Vainio - Sun Finland -

Kenneth D. Merry writes:
 > 
 > Can you do the following:
 > 
 > camcontrol stop da1
 > camcontrol tur da1 -v
 > [ then you can start it back up with camcontrol start ]
 > 
 > What I want to see here is the sense information coming back from the drive
 > when it is spun down.
 > 
 > The new error recovery code should be doing the same thing as the old error
 > recovery code -- sending a start unit.  For some reason it isn't doing the
 > right thing, though.
 > 
cat:~(10)# camcontrol stop da1
Unit stopped successfully
cat:~(11)# camcontrol tur da1 -v
Unit is not ready
(pass1:ahc0:0:2:0): TEST UNIT READY. CDB: 0 0 0 0 0 0
(pass1:ahc0:0:2:0): CAM Status: SCSI Status Error
(pass1:ahc0:0:2:0): SCSI Status: Check Condition
(pass1:ahc0:0:2:0): NOT READY asc:4,2
(pass1:ahc0:0:2:0): Logical unit not ready, initializing cmd. required field replaceable unit: 2
cat:~(12)# mount /f
mount: /dev/da1s1e: Input/output error
cat:~(13)# camcontrol tur da1 -v
Unit is not ready
(pass1:ahc0:0:2:0): TEST UNIT READY. CDB: 0 0 0 0 0 0
(pass1:ahc0:0:2:0): CAM Status: SCSI Status Error
(pass1:ahc0:0:2:0): SCSI Status: Check Condition
(pass1:ahc0:0:2:0): NOT READY asc:4,2
(pass1:ahc0:0:2:0): Logical unit not ready, initializing cmd. required field replaceable unit: 2
cat:~(15)# camcontrol start da1
Unit started successfully
cat:~(16)# mount /f
mount: /dev/da1s1e: Input/output error
cat:~(17)# camcontrol devlist
  at scbus0 target 0 lun 0 (pass0,da0)
at scbus1 target 2 lun 0 (probe0,pass1,da1)



Also, the messages file is full of these:

Apr 29 00:55:42 cat /boot/kernel/kernel: (noperiph:ahc0:0:2:0): xpt_scan_lun: can't allocate path, can't continue
Apr 29 00:55:43 cat last message repeated 26 times
Apr 29 00:57:43 cat last message repeated 359 times
Apr 29 01:07:43 cat last message repeated 1793 times
Apr 29 01:17:43 cat last message repeated 1794 times
Apr 29 01:27:43 cat last message repeated 1793 times
Apr 29 01:34:13 cat last message repeated 1122 times
Apr 29 01:34:13 cat /boot/kernel/kernel: (noperiph:ahc0:0:2:0): xpt_scan_lun: can't allocate path, can't continue
Apr 29 01:34:13 cat last message repeated 43 times
Apr 29 01:36:02 cat last message repeated 322 times


  Tomppa
-- 
SUN Microsystems Oy PL 112, Lars Sonckin kaari 12, 02601 ESPOO, Finland
Tomi Vainio (System Support Engineer) +358 9 52556300 hotline
email: [EMAIL PROTECTED]              +358 9 52556252 fax




Re: camcontrol stop / restart broken

2001-04-29 Thread J Wunsch

Tomi Vainio - Sun Finland - <[EMAIL PROTECTED]> wrote:

> My source disk is quite noisy, so I normally stop it after building the
> world and restart it once a week.  For the past couple of weeks this
> restart hasn't worked as before.

Oh, now that you mention it, i've seen that before, too.  I didn't pay
too much attention then since i initially thought that particular disk
wasn't quite OK together with my new controller (the Tekram BIOS also
has problems if the drive isn't spun-up at boot time).  But now that
i've seen the same kind of behaviour for other devices (like a scanner
that has been power-cycled), i see the correlation to the CAM error
handling changes here as well.

If you don't get a reply here, please repost to [EMAIL PROTECTED]

If you look into the archives of that list, you can see some other
discussions about the new error handling.  Note that it's most likely
not the rewritten error handling itself, but rather that the rewrite
uncovered bugs in the top-layer SCSI drivers.

> The only way to start the disk again is to reboot.

Disconnecting (or powering off the drive), "camcontrol rescan",
reconnect, "camcontrol rescan" would probably work as well.

-- 
cheers, J"org   .-.-.   --... ...--   -.. .  DL8DTL

http://www.sax.de/~joerg/                        NIC: JW11-RIPE
Never trust an operating system you don't have sources for. ;-)




Re: camcontrol stop / restart broken

2001-04-28 Thread Kenneth D. Merry

On Sat, Apr 28, 2001 at 23:09:07 +0300, Tomi Vainio - Sun Finland - wrote:
> Hi,
> 
> My source disk is quite noisy, so I normally stop it after building the
> world and restart it once a week.  For the past couple of weeks this
> restart hasn't worked as before.  The only way to start the disk again
> is to reboot.
> 
> camcontrol stop 1:2:0
> Unit stopped successfully
> mount /f
> mount: /dev/da1s1e: Input/output error
> tail /var/log/messages
> Apr 28 23:01:40 cat /boot/kernel/kernel: (da1:ahc0:0:2:0): Invalidating pack
> Apr 28 23:01:40 cat /boot/kernel/kernel: da1: reading primary partition table: error reading fsbn 0
> camcontrol start 1:2:0
> Unit started successfully
> mount /f
> mount: /dev/da1s1e: Input/output error

Hmm, that's not good. 

Can you do the following:

camcontrol stop da1
camcontrol tur da1 -v
[ then you can start it back up with camcontrol start ]

What I want to see here is the sense information coming back from the drive
when it is spun down.

The new error recovery code should be doing the same thing as the old error
recovery code -- sending a start unit.  For some reason it isn't doing the
right thing, though.

Ken
-- 
Kenneth Merry
[EMAIL PROTECTED]




camcontrol stop / restart broken

2001-04-28 Thread Tomi Vainio - Sun Finland -

Hi,

My source disk is quite noisy, so I normally stop it after building the
world and restart it once a week.  For the past couple of weeks this
restart hasn't worked as before.  The only way to start the disk again
is to reboot.

camcontrol stop 1:2:0
Unit stopped successfully
mount /f
mount: /dev/da1s1e: Input/output error
tail /var/log/messages
Apr 28 23:01:40 cat /boot/kernel/kernel: (da1:ahc0:0:2:0): Invalidating pack
Apr 28 23:01:40 cat /boot/kernel/kernel: da1: reading primary partition table: error reading fsbn 0
camcontrol start 1:2:0
Unit started successfully
mount /f
mount: /dev/da1s1e: Input/output error

ahc0:  port 0xd400-0xd4ff mem 0xefffe000-0xefffefff irq 11 at device 9.0 on pci0
aic7895C: Wide Channel A, SCSI Id=7, 32/255 SCBs
ahc1:  port 0xdc00-0xdcff mem 0xe000-0xefff irq 11 at device 9.1 on pci0
aic7895C: Wide Channel B, SCSI Id=7, 32/255 SCBs

da1 at ahc0 bus 0 target 2 lun 0
da1:  Fixed Direct Access SCSI-2 device
da1: 11.626MB/s transfers (5.813MHz, offset 8, 16bit), Tagged Queueing Enabled
da1: 2048MB (4194995 512 byte sectors: 255H 63S/T 261C)
da0 at ahc1 bus 0 target 0 lun 0
da0:  Fixed Direct Access SCSI-2 device
da0: 5.813MB/s transfers (5.813MHz, offset 15), Tagged Queueing Enabled
da0: 4357MB (8925000 512 byte sectors: 255H 63S/T 555C)

-- 
SUN Microsystems Oy PL 112, Lars Sonckin kaari 12, 02601 ESPOO, Finland
Tomi Vainio (System Support Engineer) +358 9 52556300 hotline
email: [EMAIL PROTECTED]              +358 9 52556252 fax
