Re: Time to increase MAXPHYS?

2017-06-05 Thread Kenneth D. Merry
On Sun, Jun 04, 2017 at 09:52:36 +0200, Hans Petter Selasky wrote:
> On 06/04/17 09:39, Tomoaki AOKI wrote:
> > Hi
> > 
> > One possibility would be to make it MD build-time OPTIONS,
> > defaulting 1M on regular systems and 128k on smaller systems.
> > 
> > Of course I guess making it a tunable (or sysctl) would be best,
> > though.
> > 
> 
> Hi,
> 
> A tunable sysctl would be fine, but beware that commonly used firmware 
> out there produced in the millions might hang in a non-recoverable way 
> if you exceed their "internal limits". Conditionally lowering this 
> definition is fine, but increasing it needs to be carefully verified.
> 
> For example many USB devices are only tested with OS'es like Windows and 
> MacOS and if these have any kind of limitation on the SCSI transfer 
> sizes, it is very likely many devices out there do not support any 
> larger transfer sizes either.

I agree that I'd like to see a tunable.  We've been using a MAXPHYS value
slightly larger than 1MB at Spectra for years with no problems, but then
again, we're only running on newer hardware.

If we keep DFLTPHYS the same (64K) or come up with another constant that is
defined to 64K, the way the da(4) and sa(4) drivers handle things will keep
most older controllers working properly.  Here is what da(4) does:

	if (cpi.maxio == 0)
		softc->maxio = DFLTPHYS;	/* traditional default */
	else if (cpi.maxio > MAXPHYS)
		softc->maxio = MAXPHYS;		/* for safety */
	else
		softc->maxio = cpi.maxio;
	softc->disk->d_maxsize = softc->maxio;

cpi is the XPT_PATH_INQ CCB.  The maxio field was added later, so older,
unmodified drivers that haven't set the maxio field default to a 64K I/O
size.
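The clamp can be tried outside the kernel.  What follows is a minimal
sketch, assuming DFLTPHYS is 64K and MAXPHYS is the stock 128K; the
function clamp_maxio() and the SKETCH_ constants are names made up for
this illustration and are not part of da(4):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; the real ones come from sys/param.h. */
#define SKETCH_DFLTPHYS	(64u * 1024u)
#define SKETCH_MAXPHYS	(128u * 1024u)

/*
 * Same decision as the da(4) snippet: a controller that reports no
 * maxio gets the traditional default, and nothing may exceed MAXPHYS.
 */
static uint32_t
clamp_maxio(uint32_t cpi_maxio)
{
	assert(SKETCH_DFLTPHYS <= SKETCH_MAXPHYS);

	if (cpi_maxio == 0)
		return (SKETCH_DFLTPHYS);	/* driver never set maxio */
	if (cpi_maxio > SKETCH_MAXPHYS)
		return (SKETCH_MAXPHYS);	/* kernel limit wins */
	return (cpi_maxio);			/* controller limit wins */
}
```

With these assumed constants, an unmodified driver (maxio of 0) ends up
at 64K, and a controller advertising roughly 5MB is held to 128K.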

Drivers for some of the more common SAS and FC hardware set maxio to a
value that is correct for the hardware.  (e.g. mpt(4), mps(4), mpr(4),
and isp(4) all set it correctly.)

As Warner pointed out, the way ahci(4) works is that it sets its maximum
I/O size to MAXPHYS.  The question is, does all AHCI hardware support
arbitrary transfer sizes?  Is there a way to figure out what the hardware
supports?  If not, we should probably default it to 128K instead of
MAXPHYS.

Tape drives are another related issue.  Tape block sizes up to 1MB are
pretty common.  LTFS allows for blocksizes up to 1MB.  You can't currently
read a tape with a 1MB blocksize on FreeBSD without bumping MAXPHYS and
having a controller and tape drive that can handle the larger blocksize.

The sa(4) driver has the same logic as the da(4) driver for limiting
transfer sizes to the smaller of MAXPHYS and cpi.maxio.

The sa(4) driver gives the user some tools for figuring things out:

{sm4u-1-mgmt:/root:!:1} mt status -v
Drive: sa0:  Serial Number: 101500520A
-
Mode  Density  Blocksize  bpi  Compression
Current:  0x58:LTO-5   variable   384607   enabled (0x1)
-
Current Driver State: at rest.
-
Partition:   0  Calc File Number:   0 Calc Record Number: 0
Residual:0  Reported File Number:   0 Reported Record Number: 0
Flags: BOP
-
Tape I/O parameters:
  Maximum I/O size allowed by driver and controller (maxio): 1048576 bytes
  Maximum I/O size reported by controller (cpi_maxio): 5197824 bytes
  Maximum block size supported by tape drive and media (max_blk): 8388608 bytes
  Minimum block size supported by tape drive and media (min_blk): 1 bytes
  Block granularity supported by tape drive and media (blk_gran): 0 bytes
  Maximum possible I/O size (max_effective_iosize): 1048576 bytes

On this particular FreeBSD/head machine, I have MAXPHYS set to 1MB.  The
controller (isp(4)) supports ~5MB I/O sizes and the drive (IBM LTO-5)
supports ~8MB I/O, but MAXPHYS is set to 1MB, so that is the limit.
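The relationship between the numbers in that mt(1) output can be written
down as a small calculation.  This is only a sketch of the arithmetic, not
the sa(4) code; max_effective_iosize() and min_u64() are hypothetical
helper names, and the sketch assumes the controller reported a nonzero
maxio:

```c
#include <assert.h>
#include <stdint.h>

static uint64_t
min_u64(uint64_t a, uint64_t b)
{
	return (a < b ? a : b);
}

/*
 * Largest single tape I/O: bounded by the kernel (maxphys), the
 * controller (cpi_maxio) and the drive plus media (max_blk).
 */
static uint64_t
max_effective_iosize(uint64_t maxphys, uint64_t cpi_maxio, uint64_t max_blk)
{
	uint64_t maxio;

	assert(maxphys > 0 && cpi_maxio > 0 && max_blk > 0);

	maxio = min_u64(maxphys, cpi_maxio);	/* "maxio" in mt status */
	return (min_u64(maxio, max_blk));	/* "max_effective_iosize" */
}
```

Plugging in the values shown (1048576, 5197824, 8388608) gives 1048576.
If MAXPHYS were raised to 8MB on the same hardware, the controller's
5197824 bytes would become the limit instead.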

I have considered changing the sa(4) driver to not use physio(9), and
instead use a custom allocator to allow reading and writing tapes with
blocksizes up to what the hardware (combination of tape drive and
controller) allows.  I haven't gotten around to it yet, because bumping
MAXPHYS works well enough in most cases.  It also has a nice side effect of
allowing unmapped I/O.

The pass(4) driver limits I/O sizes in the same way as the da(4) and sa(4)
drivers for CCBs sent via the blocking (CAMIOCOMMAND) ioctl, but for CCBs
sent via the asynchronous API, the only limit is the controller (cpi.maxio)
limit.  The latter is because the buffers for the asynchronous interface
are malloced.  If it were possible to send arbitrary sized, unmapped S/G
lists, then we could convert the asynchronous pass(4) interface to do
unmapped I/O.

Ken
-- 
Kenneth Merry
k...@freebsd.org
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Heads Up: struct disk KBI change

2016-06-21 Thread Kenneth D. Merry

This will break binary compatibility for loadable modules that depend on
struct disk.  DISK_VERSION has been bumped, and I bumped __FreeBSD_version
in a subsequent change.

So, if you have a module that uses struct disk, you'll need to recompile
against the latest version of head.
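For readers unfamiliar with why a version bump forces a rebuild, here is
a simplified illustration of the general technique: the module carries the
version it was compiled against, and the consumer refuses anything that
does not match.  Everything in it (struct sketch_disk,
SKETCH_DISK_VERSION, sketch_disk_register()) is hypothetical and is not
the geom_disk code, whose actual check may differ:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the version constant in the header. */
#define SKETCH_DISK_VERSION	6

struct sketch_disk {
	int		d_version;	/* version the module was built against */
	const char	*d_name;
};

/* Returns 0 on success, -1 if the module was built against another layout. */
static int
sketch_disk_register(const struct sketch_disk *dp)
{
	assert(dp != NULL);

	if (dp->d_version != SKETCH_DISK_VERSION)
		return (-1);	/* stale binary: refuse rather than misread fields */
	return (0);
}
```

A module rebuilt against the new header picks up the new constant
automatically, which is why recompiling is all that is needed.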

Ken

- Forwarded message from "Kenneth D. Merry" <k...@freebsd.org> -

Date: Tue, 21 Jun 2016 20:18:19 +0000 (UTC)
From: "Kenneth D. Merry" <k...@freebsd.org>
To: src-committ...@freebsd.org, svn-src-...@freebsd.org,
svn-src-h...@freebsd.org
Subject: svn commit: r302069 - head/sys/geom

Author: ken
Date: Tue Jun 21 20:18:19 2016
New Revision: 302069
URL: https://svnweb.freebsd.org/changeset/base/302069

Log:
  Fix a bug that caused da(4) instances to hang around after the underlying
  device is gone.
  
  The problem was that when disk_gone() is called, if the GEOM disk
  creation process has not yet happened, the withering process
  couldn't start.
  
  We didn't record any state in the GEOM disk code, and so the d_gone()
  callback to the da(4) driver never happened.
  
  The solution is to track the state of the creation process, and
  initiate the withering process from g_disk_create() if the disk is
  being created.
  
  This change does add fields to struct disk, and so I have bumped
  DISK_VERSION.
  
  geom_disk.c:  Track where we are in the disk creation process,
                and check to see whether our underlying disk has
                gone away or not.
  
                In disk_gone(), set a new d_goneflag variable that
                g_disk_create() can check to see if it needs to
                clean up the disk instance.
  
  geom_disk.h:  Add a mutex to struct disk (for internal use), a disk
                init level, and a gone flag.
  
                Bump DISK_VERSION because the size of struct disk has
                changed and fields have been added at the beginning.
  
  Sponsored by: Spectra Logic
  Approved by:  re (marius)

Modified:
  head/sys/geom/geom_disk.c
  head/sys/geom/geom_disk.h

- End forwarded message -

-- 
Kenneth Merry
k...@freebsd.org


Re: Recognizing SMR HDDs

2016-05-26 Thread Kenneth D. Merry
On Thu, May 26, 2016 at 16:42:10 +0200, Gary Jennejohn wrote:
> On Thu, 26 May 2016 10:10:14 -0400
> "Kenneth D. Merry" <k...@freebsd.org> wrote:
> 
> > On Thu, May 26, 2016 at 16:00:41 +0200, Gary Jennejohn wrote:
> > > protocol  ATA/ATAPI-9 SATA 3.x
> > > device model  ST8000AS0002-1NA17Z
> > > firmware revision AR13  
> > 
> > The firmware is old, the current version is AR17.  You should really ask
> > Seagate for updated firmware.
> > 
> 
> The Download Finder on the Seagate site claims that there is no newer
> firmware.
> 
> So the question is, how to get the latest AR17 version from Seagate
> as a simple consumer?

I would contact Seagate support and ask.

By the way, I've been able to download firmware for Seagate SATA drives via
camcontrol when they're attached via SATA and SAS controllers.  I've never
tried it with USB.

I think camcontrol will identify it as a SCSI protocol drive and, as a
result, may not let you download firmware because it doesn't recognize
vendor "ST8000AS".

So, assuming you get firmware from them, I would suggest upgrading it using
whatever Windows or Linux tool they give you.  (I'll brick drives in my
lab at work, but I'd hate for you to brick your own drive.)

If you want to use camcontrol to do it, take it out of the USB enclosure
and hook it directly to a SATA or SAS controller.

Ken
-- 
Kenneth Merry
k...@freebsd.org


Re: Recognizing SMR HDDs

2016-05-26 Thread Kenneth D. Merry
On Thu, May 26, 2016 at 15:02:24 +0100, Igor Mozolevsky wrote:
> On 26 May 2016 at 14:41, Kenneth D. Merry <k...@freebsd.org> wrote:
> 
> > On Thu, May 26, 2016 at 15:29:21 +0200, Gary Jennejohn wrote:
> > > On Thu, 26 May 2016 08:34:45 -0400
> > > "Kenneth D. Merry" <k...@freebsd.org> wrote:
> > >
> > > > On Thu, May 26, 2016 at 08:42:53 +0200, Gary Jennejohn wrote:
> > > > What kind of drive is it?
> > > >
> > >
> > > ST8000AS 0002-1NA17Z 0X03
> >
> 
> [snip]
> 
> 
> > Yes.  There is something slightly odd about the Inquiry data you pasted
> > above.  Seagate didn't set the bits in the ATA identify data to mark it as
> > a Drive Managed drive, so I put in a quirk entry to mark it as Drive
> > Managed.
> >
> > Unfortunately with Drive Managed drives that is really all you know.  You
> > don't know the zone boundaries or states.  But, it is useful to know that
> > you really should write sequentially for good performance.  (True of any
> > drive, but especially true with SMR drives.)
> >
> 
> The drive is supposed to have Word 69 set to 0x0001 and support ZAC MGMT
> IN/OUT -
> http://www.seagate.com/www-content/product-content/hdd-fam/seagate-archive-hdd/en-us/docs/100795782a.pdf
> at pg. 24 and 28.

That is a different drive.  He has ST8000AS0002, which is a Drive Managed
drive.  The doc above is for ST8000AS0022, which is a Host Aware drive.

> Incidentally AR17 firmware is a new batch, perhaps Seagate did what they
> did with -DL003 drives where the early models reported 512n sector size (so
> as not to confuse computers) and the later models properly reported 4kn
> sector size?

Yes, AR17 is the latest firmware.  He really needs to upgrade, there are
bugs with older versions.

AR17 firmware reports the same thing in terms of sector size.  For
instance, from one of mine:

protocol  ATA/ATAPI-9 SATA 3.x
device model  ST8000AS0002-1NA17Z
firmware revision AR17
serial number Z8409926
WWN   5000c50086f84017
cylinders 16383
heads 16
sectors/track 63
sector size   logical 512, physical 4096, offset 0
LBA supported 268435455 sectors
LBA48 supported   15628053168 sectors
PIO supported PIO4
DMA supported WDMA2 UDMA6 
media RPM 5980

Ken
-- 
Kenneth Merry
k...@freebsd.org


Re: Recognizing SMR HDDs

2016-05-26 Thread Kenneth D. Merry
On Thu, May 26, 2016 at 16:00:41 +0200, Gary Jennejohn wrote:
> On Thu, 26 May 2016 09:41:20 -0400
> "Kenneth D. Merry" <k...@freebsd.org> wrote:
> 
> > On Thu, May 26, 2016 at 15:29:21 +0200, Gary Jennejohn wrote:
> > > On Thu, 26 May 2016 08:34:45 -0400
> > > "Kenneth D. Merry" <k...@freebsd.org> wrote:
> > >   
> > > > On Thu, May 26, 2016 at 08:42:53 +0200, Gary Jennejohn wrote:
> > > > What kind of drive is it?
> > > >   
> > > 
> > > ST8000AS 0002-1NA17Z 0X03  
> > 
> > Can you send the output of 'camcontrol inquiry daX -v' and
> > 'camcontrol identify daX -v'?
> > 
> > There is a quirk for that particular drive to identify it as Drive Managed.
> > When attached behind a SAS controller it looks like this:
> > 
> > # camcontrol inquiry da12 -v
> > pass12:  Fixed Direct Access SPC-4 SCSI device
> > pass12: Serial Number Z8407Y52
> > pass12: 600.000MB/s transfers, Command Queueing Enabled
> > 
> 
> Thanks for the info.
> 
> Here the requested output:
> 
> camcontrol inquiry da0 -v
> pass5:  Fixed Direct Access SPC-4 SCSI device
> pass5: Serial Number 
> pass5: 400.000MB/s transfers

Okay.  Looks like the USB to SATA chip is perhaps mangling the model
number.  I'm guessing that is the "standard" way to do it, but it is
unfortunate.

> camcontrol identify da0 -v
> camcontrol: sending ATA ATA_IDENTIFY via pass_16 with timeout of 3 msecs
> pass5: Raw identify data:
>0: 0c5a 3fff c837 0010   003f 
>8:   2020 2020 2020 2020 2020 2020
>   16: 5a38 3430 3339 4738  8000  4152
>   24: 3133 2020 2020 5354 3830 3030 4153 3030
>   32: 3032 2d31 4e41 3137 5a20 2020 2020 2020
>   40: 2020 2020 2020 2020 2020 2020 2020 8010
>   48: 4000 2f00 4000 0200 0200 0007 3fff 0010
>   56: 003f fc10 00fb 5c10  0fff  0007
>   64: 0003 0078 0078 0078 0078   
>   72:    001f 8d0e 0004 00cc 0040
>   80: 03f0 001f 346b 7d61 6163 3469 bc41 6163
>   88: 407f 81e7 81e7  fffe  fe00 
>   96:     2ab0 a381 0003 
>  104:   6003  5000 c500 7b0e 5cbe
>  112:        40dc
>  120: 409c       
>  128: 0021 2ab0 a381 2ab0 a381 2020 0002 0140
>  136: 0108 5000 3c06 3c0a  003c  0008
>  144:   bdff 0280   0008 
>  152:    8000  0184 8b00 8008
>  160:        
>  168:        
>  176:        
>  184:        
>  192:        
>  200:       30a5 
>  208:  4000      
>  216:  175c     107f 
>  224:        
>  232:        
>  240:        
>  248:        6aa5
> 
> camcontrol: sending ATA READ_NATIVE_MAX_ADDRESS48 via pass_16 with timeout of 
> 1000 msecs
> pass5: Raw native max data:
>0:      
> error = 0x00, sector_count = 0x, device = 0x00, status = 0x00
> pass5:  ACS-2 ATA SATA 3.x device
> pass5: 400.000MB/s transfers
> 
> protocol  ATA/ATAPI-9 SATA 3.x
> device model  ST8000AS0002-1NA17Z
> firmware revision AR13

The firmware is old, the current version is AR17.  You should really ask
Seagate for updated firmware.
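The model and firmware strings printed by camcontrol can be read straight
out of the raw identify dump quoted above: ATA identify strings pack two
ASCII characters into each 16-bit word, high byte first, with the firmware
revision in words 23-26 and the model in words 27-46.  A minimal decoding
sketch follows; ata_id_string() and the sketch_ arrays are names invented
for this example, and the arrays hold only the words that are legible in
the dump:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Unpack an ATA identify string into buf, trim the trailing space
 * padding, and return the resulting length.
 */
static size_t
ata_id_string(const uint16_t *words, size_t nwords, char *buf, size_t buflen)
{
	size_t i, len = 0;

	assert(buflen >= nwords * 2 + 1);

	for (i = 0; i < nwords; i++) {
		buf[len++] = (char)(words[i] >> 8);	/* first character */
		buf[len++] = (char)(words[i] & 0xff);	/* second character */
	}
	while (len > 0 && buf[len - 1] == ' ')
		len--;
	buf[len] = '\0';
	return (len);
}

/* Words 23-26 (firmware revision) as they appear in the dump. */
static const uint16_t sketch_fw[] = { 0x4152, 0x3133, 0x2020, 0x2020 };

/* Words 27-36 (the non-blank start of the model field). */
static const uint16_t sketch_model[] = {
	0x5354, 0x3830, 0x3030, 0x4153, 0x3030,
	0x3032, 0x2d31, 0x4e41, 0x3137, 0x5a20
};
```

Decoding sketch_fw yields "AR13" and sketch_model yields
"ST8000AS0002-1NA17Z", so the drive reports its full model number over
ATA even though the SCSI inquiry string from the bridge looks mangled.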

> serial number Z84039G8
> WWN   5000c5007b0e5cbe
> cylinders 16383
> heads 16
> sectors/track 63
> sector size   logical 512, physical 4096, offset 0
> LBA supported 268435455 sectors
> LBA48 supported   15628053168 sectors
> PIO supported PIO4
> DMA supported WDMA2 UDMA6
> media RPM 5980
> 
> Feature  Support  Enabled   Value   Vendor
> read ahead yes  yes
> write cacheyes  yes
> flush cacheyes  yes
> overlapno
> Tagged Command Queuing (TCQ)   no   no
> Native Command Queuing (NCQ)   yes  32 tags
> NCQ Queue Management   no
> NCQ Streaming  no
> Receive & Send FPDMA Queuedno
> SMART  yes  yes
> microcode download yes  yes
> security   yes  no
> power management   yes  yes
> advanced powe

Re: Recognizing SMR HDDs

2016-05-26 Thread Kenneth D. Merry
On Thu, May 26, 2016 at 15:29:21 +0200, Gary Jennejohn wrote:
> On Thu, 26 May 2016 08:34:45 -0400
> "Kenneth D. Merry" <k...@freebsd.org> wrote:
> 
> > On Thu, May 26, 2016 at 08:42:53 +0200, Gary Jennejohn wrote:
> > What kind of drive is it?
> > 
> 
> ST8000AS 0002-1NA17Z 0X03

Can you send the output of 'camcontrol inquiry daX -v' and
'camcontrol identify daX -v'?

There is a quirk for that particular drive to identify it as Drive Managed.
When attached behind a SAS controller it looks like this:

# camcontrol inquiry da12 -v
pass12:  Fixed Direct Access SPC-4 SCSI device
pass12: Serial Number Z8407Y52
pass12: 600.000MB/s transfers, Command Queueing Enabled

> > Here are some things you can do on any disk to see what it is:
> > 
> > diskinfo -v /dev/daX
> > 
> 
> I don't have the new versions of these utilities installed, so I can't
> get any of this neat diskinfo/zonectl information.
> 
> > # sysctl kern.cam.da.19
> > kern.cam.da.19.sort_io_queue: -1
> > kern.cam.da.19.rotating: 1
> > kern.cam.da.19.unmapped_io: 1
> > kern.cam.da.19.error_inject: 0
> > [ begin SMR fields ]
> > kern.cam.da.19.max_seq_zones: 0
> > kern.cam.da.19.optimal_nonseq_zones: 0
> > kern.cam.da.19.optimal_seq_zones: 0
> > kern.cam.da.19.zone_support: None
> > kern.cam.da.19.zone_mode: Drive Managed
> > [ end SMR fields ]
> > kern.cam.da.19.minimum_cmd_size: 6
> > kern.cam.da.19.delete_max: 262144
> > kern.cam.da.19.delete_method: NONE
> > 
> 
> My drive shows this:
> sysctl kern.cam.da.0
> kern.cam.da.0.sort_io_queue: -1
> kern.cam.da.0.rotating: 1
> kern.cam.da.0.unmapped_io: 0
> kern.cam.da.0.error_inject: 0
> kern.cam.da.0.max_seq_zones: 0
> kern.cam.da.0.optimal_nonseq_zones: 0
> kern.cam.da.0.optimal_seq_zones: 0
> kern.cam.da.0.zone_support: None
> kern.cam.da.0.zone_mode: Not Zoned <== I guess it can't be managed
> kern.cam.da.0.minimum_cmd_size: 10
> kern.cam.da.0.delete_max: 131072
> kern.cam.da.0.delete_method: NONE
> 
> In fact, the output for every one of the 4 drives in the enclosure is
> the same, even though the other three are non-SMR SATA drives.

Yes.  There is something slightly odd about the Inquiry data you pasted
above.  Seagate didn't set the bits in the ATA identify data to mark it as
a Drive Managed drive, so I put in a quirk entry to mark it as Drive
Managed.

Unfortunately with Drive Managed drives that is really all you know.  You
don't know the zone boundaries or states.  But, it is useful to know that
you really should write sequentially for good performance.  (True of any
drive, but especially true with SMR drives.)

Ken
-- 
Kenneth Merry
k...@freebsd.org


Re: Recognizing SMR HDDs

2016-05-26 Thread Kenneth D. Merry
On Thu, May 26, 2016 at 08:42:53 +0200, Gary Jennejohn wrote:
> Now that ken@ has checked in the SMR code I'm wondering how I can see
> whether it's having any effect.
> 
> I have a 8TB SMR disk in a USB3 enclosure.  Does the kernel emit any
> sort of trace to indicate that it sees the drive as SMR and takes
> that into account?

There is nothing extra emitted in the dmesg to tell you it is an SMR drive;
you have to look.

> I have the probe trace enabled in my kernel config, but I don't see
> anything special pop out when I turn the drive on.

You'll see extra states in the probe compared to a standard drive if it 
is Host Aware or Host Managed.  You won't see those states if it is Drive
Managed.

> Does the fact that the drive appears as a /dev/daX play any role?

It shouldn't matter.  I put changes in both the da(4) and ada(4) drivers to
support SMR drives.  And the changes should work even when you have an ATA
protocol drive attached via a SCSI transport.  Which is likely the case
with your drive.  What kind of drive is it?

Here are some things you can do on any disk to see what it is:

diskinfo -v /dev/daX

For example:

# diskinfo -v /dev/da18
/dev/da18
        512             # sectorsize
        8001563222016   # mediasize in bytes (7.3T)
        15628053168     # mediasize in sectors
        4096            # stripesize
        0               # stripeoffset
        972801          # Cylinders according to firmware.
        255             # Heads according to firmware.
        63              # Sectors according to firmware.
        Z84003SK        # Disk ident.
        id1,enc@n5003048001f311fd/type@0/slot@13/elmdesc@Slot_19        # Physical path
        Host_Aware      # Zone Mode

So this is a Host Aware drive.
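The size fields in that diskinfo output are related by simple arithmetic,
which makes a quick sanity check when comparing drives.  A small sketch,
with helper names (mediasize_bytes(), logical_per_stripe()) chosen only
for this example:

```c
#include <assert.h>
#include <stdint.h>

/* mediasize in bytes = mediasize in sectors * sectorsize */
static uint64_t
mediasize_bytes(uint64_t sectors, uint32_t sectorsize)
{
	assert(sectorsize != 0);
	return (sectors * sectorsize);
}

/* Number of logical sectors covered by one stripe (physical sector). */
static uint32_t
logical_per_stripe(uint32_t stripesize, uint32_t sectorsize)
{
	assert(sectorsize != 0 && stripesize % sectorsize == 0);
	return (stripesize / sectorsize);
}
```

For da18, 15628053168 sectors of 512 bytes is 8001563222016 bytes, and the
4096-byte stripesize corresponds to 8 logical sectors, matching the
"logical 512, physical 4096" that camcontrol identify reports for these
drives.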

zonectl -c params -d /dev/daX

# zonectl -c params -d /dev/da18
Zone Mode: Host Aware
Command support: Report Zones, Open, Close, Finish, Reset Write Pointer
Unrestricted Read in Sequential Write Required Zone (URSWRZ): No
Optimal Number of Open Sequential Write Preferred Zones: 128
Optimal Number of Non-Sequentially Written Sequential Write Preferred Zones: 8
Maximum Number of Open Sequential Write Required Zones: Unlimited

If I issue the same command on a drive managed SMR drive:

# zonectl -c params -d /dev/da19
Zone Mode: Drive Managed
Command support: None
Unrestricted Read in Sequential Write Required Zone (URSWRZ): No
Optimal Number of Open Sequential Write Preferred Zones: Not Set
Optimal Number of Non-Sequentially Written Sequential Write Preferred Zones: Not Set
Maximum Number of Open Sequential Write Required Zones: Not Set

sysctl kern.cam.da.X

# sysctl kern.cam.da.18
kern.cam.da.18.sort_io_queue: -1
kern.cam.da.18.rotating: 1
kern.cam.da.18.unmapped_io: 1
kern.cam.da.18.error_inject: 0
[ begin SMR fields ]
kern.cam.da.18.max_seq_zones: 4294967295
kern.cam.da.18.optimal_nonseq_zones: 8
kern.cam.da.18.optimal_seq_zones: 128
kern.cam.da.18.zone_support: Report Zones, Open, Close, Finish, Reset Write Pointer
kern.cam.da.18.zone_mode: Host Aware
[ end SMR fields ]
kern.cam.da.18.minimum_cmd_size: 6
kern.cam.da.18.delete_max: 262144
kern.cam.da.18.delete_method: NONE

# sysctl kern.cam.da.19
kern.cam.da.19.sort_io_queue: -1
kern.cam.da.19.rotating: 1
kern.cam.da.19.unmapped_io: 1
kern.cam.da.19.error_inject: 0
[ begin SMR fields ]
kern.cam.da.19.max_seq_zones: 0
kern.cam.da.19.optimal_nonseq_zones: 0
kern.cam.da.19.optimal_seq_zones: 0
kern.cam.da.19.zone_support: None
kern.cam.da.19.zone_mode: Drive Managed
[ end SMR fields ]
kern.cam.da.19.minimum_cmd_size: 6
kern.cam.da.19.delete_max: 262144
kern.cam.da.19.delete_method: NONE

If you have a Host Aware or Host Managed drive, you can get the list of
zones and their status, reset the write pointer, etc.  

Ask the drive (via camcontrol(8)) to list all zones on a Host Aware drive 
(but truncate the output to 10 lines):

# camcontrol zone da18 -v -c rz |head -10
29809 zones, Maximum LBA 0x3a3812aaf (15628053167)
Zone lengths and types may vary
  Start LBA  Length   WP LBA     Zone Type  Condition  Sequential            Reset
          0, 524288,     0x8,  Conventional,      NWP, Sequential,  No Reset Needed
        0x8, 524288,    0x10,  Conventional,      NWP, Sequential,  No Reset Needed
       0x10, 524288,    0x18,  Conventional,      NWP, Sequential,  No Reset Needed
       0x18, 524288,    0x20,  Conventional,      NWP, Sequential,  No Reset Needed
       0x20, 524288,    0x28,  Conventional,      NWP, Sequential,  No Reset Needed
       0x28, 524288,    0x30,  Conventional,      NWP, Sequential,  No Reset Needed
       0x30, 524288,    0x38,  Conventional,      NWP, Sequential,  No Reset Needed
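The zone count in the first line of that output is consistent with the
zone length and maximum LBA shown.  The sketch below assumes every zone
except possibly the last is 524288 sectors long (the output itself warns
that zone lengths may vary, so treat this as a rough check only);
zone_count() and zone_bytes() are hypothetical helpers:

```c
#include <assert.h>
#include <stdint.h>

/* Number of zones needed to cover LBAs 0..max_lba, rounding up. */
static uint64_t
zone_count(uint64_t max_lba, uint64_t zone_len)
{
	uint64_t total = max_lba + 1;	/* LBAs are numbered from 0 */

	assert(zone_len != 0);
	return ((total + zone_len - 1) / zone_len);
}

/* Size of one zone in bytes for a given logical sector size. */
static uint64_t
zone_bytes(uint64_t zone_len, uint32_t sectorsize)
{
	return (zone_len * sectorsize);
}
```

With a maximum LBA of 15628053167 and 524288-sector zones this gives
29809 zones (29808 full zones plus one short zone at the end), and each
full zone is 256MB with 512-byte sectors.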

Ask the drive (via zonectl(8)) to report zones that are in the Full state:

# zonectl -d /dev/da18 -c rz -o full |head -10
192 zones, Maximum LBA 0x3a3812aaf (15628053167)
Zone lengths and types may vary
  

Re: AHCI/ADA regression?

2016-05-25 Thread Kenneth D. Merry
On Wed, May 25, 2016 at 14:36:59 +0200, Gary Jennejohn wrote:
> On Wed, 25 May 2016 08:15:11 +0200
> Gary Jennejohn <gljennj...@gmail.com> wrote:
> 
> > On Tue, 24 May 2016 15:10:41 -0400
> > "Kenneth D. Merry" <k...@freebsd.org> wrote:
> > 
> > > Can you send full dmesg output from the working kernel?
> > >   
> > 
> > I'll give it a try and hope that the mail server doesn't strip it ==>
> > dmesg.boot.gz.
> > 
> > > It looks like you have some ATAPI devices on your machine (signature
> > > eb14).
> > > They would likely be attaching to the da(4) driver if they are disks, and
> > > that is a different code path.
> > >   
> > 
> > The one and only ATAPI device is cd0.
> > 
> 
> OK, it appears that one of the ATA fixes ken@ recently committed
> fixed my problem also.

Great!  I'm glad it's working!

> I'm now at r300677 and booting succeeds.
> 
> I guess the ATAPI DVD drive was the culprit.

It was most likely the Samsung hard drive.  This drive is the exact same
model that Alex Petrov also had problems with:

ada1 at ahcich2 bus 0 scbus2 target 0 lun 0
ada1:  ATA8-ACS SATA 2.x device
ada1: Serial Number S0MUJ1KP317818
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 476938MB (976771055 512 byte sectors)

It claims to support Read Log, but actually doesn't.

The change I checked in in revision 300640 will only send a Read Log (and
additional SMR probe steps) to drives that claim they're SMR drives.  Any
non-SMR drives should get the same probe as before.
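The gating described here can be pictured as a small decision function.
This is a sketch of the idea only, not the ada(4) probe code; the struct,
the enum and sketch_next_probe_step() are all hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what a drive reports about itself. */
struct sketch_drive {
	bool	claims_read_log;	/* may be wrong, as with the Samsung drive */
	bool	claims_smr;		/* identify data marks the drive as SMR */
};

enum sketch_probe_next {
	SKETCH_PROBE_DONE,	/* finish the probe as before */
	SKETCH_PROBE_READ_LOG	/* continue with Read Log and SMR steps */
};

static enum sketch_probe_next
sketch_next_probe_step(const struct sketch_drive *d)
{
	assert(d != NULL);

	/* Gate on the SMR claim; a Read Log claim alone is not trusted. */
	if (d->claims_smr)
		return (SKETCH_PROBE_READ_LOG);
	return (SKETCH_PROBE_DONE);
}
```

A drive like the Samsung above (claims Read Log, does not claim SMR) goes
straight to SKETCH_PROBE_DONE and never sees the command it mishandles.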

Ken
-- 
Kenneth Merry
k...@freebsd.org


Re: ATA? related trouble with r300299

2016-05-24 Thread Kenneth D. Merry
On Wed, May 25, 2016 at 07:35:06 +0700, Alex V. Petrov wrote:
> 
> 
> 25.05.16 03:18, Kenneth D. Merry wrote:
> > On Tue, May 24, 2016 at 21:59:53 +0700, Alex V. Petrov wrote:
> >> 24.05.16 20:21, Kenneth D. Merry wrote:
> >>> On Tue, May 24, 2016 at 08:04:21 +0300, Oleg V. Nauman wrote:
> >>>> On Monday 23 May 2016 19:08:16 Kenneth D. Merry wrote:
> >>>>> On Tue, May 24, 2016 at 00:59:34 +0300, Oleg V. Nauman wrote:
> >>>>>> On Monday 23 May 2016 17:30:45 you wrote:
> >>>>>>> On Tue, May 24, 2016 at 00:15:25 +0300, Oleg V. Nauman wrote:
> >>>>>>>> On Monday 23 May 2016 17:11:34 Kenneth D. Merry wrote:
> >>>>>>>>> On Tue, May 24, 2016 at 00:05:49 +0300, Oleg V. Nauman wrote:
> >>>>>>>>>> On Monday 23 May 2016 16:53:55 Kenneth D. Merry wrote:
> >>>>>>>>>>> On Mon, May 23, 2016 at 23:21:32 +0300, Oleg V. Nauman wrote:
> >>>>>>>>>>>> On Monday 23 May 2016 15:25:39 Kenneth D. Merry wrote:
> >>>>>>>>>>>>> On Sat, May 21, 2016 at 09:30:35 +0300, Oleg V. Nauman 
> >>>> wrote:
> >>>>>>>>>>>>>>  I have faced the issue with fresh CURRENT stopped to boot
> >>>>>>>>>>>>>>  on
> >>>>>>>>>>>>>>  my
> >>>>>>>>>>>>>>  old
> >>>>>>>>>>>>>>  desktop
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> after update to r300299
> >>>>>>>>>>>>>> Verbose boot shows the endless cycle of
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> ata2: SATA reset: ports status=0x05
> >>>>>>>>>>>>>> ata2: reset tp1 mask=03 ostat0=50 ostat1=50
> >>>>>>>>>>>>>> ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
> >>>>>>>>>>>>>> ata2: stat1=0x50 err=0x01 lsb=0x00 msb=0x00
> >>>>>>>>>>>>>> ata2: reset tp2 stat0=50 stat1=50 devices=0x3
> >>>>>>>>>>>>>> messages logged to console.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Below is the relevant portion of ATA controller/devices
> >>>>>>>>>>>>>> probed/attached
> >>>>>>>>>>>>>> during the boot:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> atapci0:  port
> >>>>>>>>>>>>>> 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at
> >>>>>>>>>>>>>> device
> >>>>>>>>>>>>>> 31.1
> >>>>>>>>>>>>>> on
> >>>>>>>>>>>>>> pci0
> >>>>>>>>>>>>>> ata0:  at channel 0 on atapci0
> >>>>>>>>>>>>>> atapci1:  port
> >>>>>>>>>>>>>> 0xd080-0xd087,
> >>>>>>>>>>>>>> 0xd000-0xd003,
> >>>>>>>>>>>>>> 0xcc00-0xcc07,0xc880-0xc883,0xc800-0xc80f irq 19 at device
> >>>>>>>>>>>>>> 31.2 on
> >>>>>>>>>>>>>> pci0
> >>>>>>>>>>>>>> ata2:  at channel 0 on atapci1
> >>>>>>>>>>>>>> ata3:  at channel 1 on atapci1
> >>>>>>>>>>>>>> ada0 at ata2 bus 0 scbus1 target 0 lun 0
> >>>>>>>>>>>>>> ada0:  ATA-7 SATA 2.x device
> >>>>>>>>>>>>>> ada1 at ata2 bus 0 scbus1 target 1 lun 0
> >>>>>>>>>>>>>> ada1:  ATA8-ACS SATA 3.x device
> >>>>>>>>>>>>>> cd0 at ata0 bus 0 scbus0 target 0 lun 0
> >>>>>>>>>>>>>> cd0: <_NEC DVD_RW ND-3570A 1.11> Removable CD-ROM SCSI
> >>>>>>>>>>>>>> device

Re: ATA? related trouble with r300299

2016-05-24 Thread Kenneth D. Merry
On Tue, May 24, 2016 at 23:54:09 +0300, Oleg V. Nauman wrote:
> On Tuesday 24 May 2016 16:17:33 you wrote:
> > Okay, I've got a basic idea of what may be going on.  The resets that are
> > getting sent are triggering another probe, which then triggers a reset,
> > which triggers a probe...and so on.
> > 
> > So here is another patch that should work for you:
> > 
> > https://people.freebsd.org/~ken/cam_smr_ada_patch.20160524.2.txt
> > 
> > I have commented out the quirk for this drive, and the driver will now only
> > start the SMR probe on drives that claim to be SMR-capable.  So, for the
> > vast majority of drives out there right now, it won't even start the extra
> > probe steps.
> 
>  It fixes this issue. I was able to boot with your latest patch.

Great!  I'll check it in with that fix as well as a quirk entry.  That way,
if we have other reasons later on to issue a read log, we'll know that
it doesn't work for those drives.

Ken
-- 
Kenneth Merry
k...@freebsd.org


Re: ATA? related trouble with r300299

2016-05-24 Thread Kenneth D. Merry
On Tue, May 24, 2016 at 21:59:53 +0700, Alex V. Petrov wrote:
> 24.05.16 20:21, Kenneth D. Merry wrote:
> > On Tue, May 24, 2016 at 08:04:21 +0300, Oleg V. Nauman wrote:
> >> On Monday 23 May 2016 19:08:16 Kenneth D. Merry wrote:
> >>> On Tue, May 24, 2016 at 00:59:34 +0300, Oleg V. Nauman wrote:
> >>>> On Monday 23 May 2016 17:30:45 you wrote:
> >>>>> On Tue, May 24, 2016 at 00:15:25 +0300, Oleg V. Nauman wrote:
> >>>>>> On Monday 23 May 2016 17:11:34 Kenneth D. Merry wrote:
> >>>>>>> On Tue, May 24, 2016 at 00:05:49 +0300, Oleg V. Nauman wrote:
> >>>>>>>> On Monday 23 May 2016 16:53:55 Kenneth D. Merry wrote:
> >>>>>>>>> On Mon, May 23, 2016 at 23:21:32 +0300, Oleg V. Nauman wrote:
> >>>>>>>>>> On Monday 23 May 2016 15:25:39 Kenneth D. Merry wrote:
> >>>>>>>>>>> On Sat, May 21, 2016 at 09:30:35 +0300, Oleg V. Nauman 
> >> wrote:
> >>>>>>>>>>>>  I have faced the issue with fresh CURRENT stopped to boot
> >>>>>>>>>>>>  on
> >>>>>>>>>>>>  my
> >>>>>>>>>>>>  old
> >>>>>>>>>>>>  desktop
> >>>>>>>>>>>>
> >>>>>>>>>>>> after update to r300299
> >>>>>>>>>>>> Verbose boot shows the endless cycle of
> >>>>>>>>>>>>
> >>>>>>>>>>>> ata2: SATA reset: ports status=0x05
> >>>>>>>>>>>> ata2: reset tp1 mask=03 ostat0=50 ostat1=50
> >>>>>>>>>>>> ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
> >>>>>>>>>>>> ata2: stat1=0x50 err=0x01 lsb=0x00 msb=0x00
> >>>>>>>>>>>> ata2: reset tp2 stat0=50 stat1=50 devices=0x3
> >>>>>>>>>>>> messages logged to console.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Below is the relevant portion of ATA controller/devices
> >>>>>>>>>>>> probed/attached
> >>>>>>>>>>>> during the boot:
> >>>>>>>>>>>>
> >>>>>>>>>>>> atapci0:  port
> >>>>>>>>>>>> 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at
> >>>>>>>>>>>> device
> >>>>>>>>>>>> 31.1
> >>>>>>>>>>>> on
> >>>>>>>>>>>> pci0
> >>>>>>>>>>>> ata0:  at channel 0 on atapci0
> >>>>>>>>>>>> atapci1:  port
> >>>>>>>>>>>> 0xd080-0xd087,
> >>>>>>>>>>>> 0xd000-0xd003,
> >>>>>>>>>>>> 0xcc00-0xcc07,0xc880-0xc883,0xc800-0xc80f irq 19 at device
> >>>>>>>>>>>> 31.2 on
> >>>>>>>>>>>> pci0
> >>>>>>>>>>>> ata2:  at channel 0 on atapci1
> >>>>>>>>>>>> ata3:  at channel 1 on atapci1
> >>>>>>>>>>>> ada0 at ata2 bus 0 scbus1 target 0 lun 0
> >>>>>>>>>>>> ada0:  ATA-7 SATA 2.x device
> >>>>>>>>>>>> ada1 at ata2 bus 0 scbus1 target 1 lun 0
> >>>>>>>>>>>> ada1:  ATA8-ACS SATA 3.x device
> >>>>>>>>>>>> cd0 at ata0 bus 0 scbus0 target 0 lun 0
> >>>>>>>>>>>> cd0: <_NEC DVD_RW ND-3570A 1.11> Removable CD-ROM SCSI
> >>>>>>>>>>>> device
> >>>>>>>>>>>
> >>>>>>>>>>> I'm not entirely sure what is causing the problem with your
> >>>>>>>>>>> system,
> >>>>>>>>>>> but
> >>>>>>>>>>> hopefully we can narrow it down a bit.
> >>>>>>>>>>>
> >>>>>>>>>>> There is a bug that came in with my SMR changes in revision
> >>>>>>>>>>> 300207

Re: ATA? related trouble with r300299

2016-05-24 Thread Kenneth D. Merry
On Tue, May 24, 2016 at 20:46:33 +0300, Oleg V. Nauman wrote:
> On Tuesday 24 May 2016 13:13:29 Kenneth D. Merry wrote:
> > On Tue, May 24, 2016 at 18:21:19 +0300, Oleg V. Nauman wrote:
> > > On Tuesday 24 May 2016 10:02:09 you wrote:
> > > > On Tue, May 24, 2016 at 16:38:40 +0300, Oleg V. Nauman wrote:
> > > > > On Tuesday 24 May 2016 09:21:17 Kenneth D. Merry wrote:
> > > > > > On Tue, May 24, 2016 at 08:04:21 +0300, Oleg V. Nauman wrote:
> > > > > > > On Monday 23 May 2016 19:08:16 Kenneth D. Merry wrote:
> > > > > > > > On Tue, May 24, 2016 at 00:59:34 +0300, Oleg V. Nauman wrote:
> > > > > > > > > On Monday 23 May 2016 17:30:45 you wrote:
> > > > > > > > > > On Tue, May 24, 2016 at 00:15:25 +0300, Oleg V. Nauman 
> wrote:
> > > > > > > > > > > On Monday 23 May 2016 17:11:34 Kenneth D. Merry wrote:
> > > > > > > > > > > > On Tue, May 24, 2016 at 00:05:49 +0300, Oleg V. Nauman
> > > 
> > > wrote:
> > > > > > > > > > > > > On Monday 23 May 2016 16:53:55 Kenneth D. Merry wrote:
> > > > > > > > > > > > > > On Mon, May 23, 2016 at 23:21:32 +0300, Oleg V.
> > > > > > > > > > > > > > Nauman
> > > > > 
> > > > > wrote:
> > > > > > > > > > > > > > > On Monday 23 May 2016 15:25:39 Kenneth D. Merry 
> wrote:
> > > > > > > > > > > > > > > > On Sat, May 21, 2016 at 09:30:35 +0300, Oleg V.
> > > > > > > > > > > > > > > > Nauman
> > > > > > > 
> > > > > > > wrote:
> > > > > > > > > > > > > > > > >  I have faced the issue with fresh CURRENT
> > > > > > > > > > > > > > > > >  stopped
> > > > > > > > > > > > > > > > >  to
> > > > > > > > > > > > > > > > >  boot
> > > > > > > > > > > > > > > > >  on
> > > > > > > > > > > > > > > > >  my
> > > > > > > > > > > > > > > > >  old
> > > > > > > > > > > > > > > > >  desktop
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > after update to r300299
> > > > > > > > > > > > > > > > > Verbose boot shows the endless cycle of
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > ata2: SATA reset: ports status=0x05
> > > > > > > > > > > > > > > > > ata2: reset tp1 mask=03 ostat0=50 ostat1=50
> > > > > > > > > > > > > > > > > ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
> > > > > > > > > > > > > > > > > ata2: stat1=0x50 err=0x01 lsb=0x00 msb=0x00
> > > > > > > > > > > > > > > > > ata2: reset tp2 stat0=50 stat1=50 devices=0x3
> > > > > > > > > > > > > > > > > messages logged to console.
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > Below is the relevant portion of ATA
> > > > > > > > > > > > > > > > > controller/devices
> > > > > > > > > > > > > > > > > probed/attached
> > > > > > > > > > > > > > > > > during the boot:
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > atapci0:  port
> > > > > > > > > > > > > > > > > 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xf
> > > > > > > > > > > > > > > > > faf
> > > > > > > > > > > > > > > > > at
> > > > > > >

Re: AHCI/ADA regression?

2016-05-24 Thread Kenneth D. Merry
On Tue, May 24, 2016 at 20:00:51 +0200, Gary Jennejohn wrote:
> On Tue, 24 May 2016 10:41:25 -0400
> "Kenneth D. Merry" <k...@freebsd.org> wrote:
> 
> > > The question in my mind is - why are "empty" multiplier ports being
> > > probed with the new code but not with the old code?  
> > 
> > If the HBA says that it supports port multipliers, the kernel should always
> > look for them.  It probes the port multiplier first, before moving on to
> > look for regular targets.
> > 
> > So, from that standpoint, it should not be any different.  It sounds like
> > we're either getting further in the port multiplier probe process, or there
> > is something different about the way things are behaving.
> > 
> > If you can determine which commands are timing out, that may give us an
> > idea about where it is in the probe process.
> > 
> > Here is one way we may be able to track things down...  Build a kernel with
> > these options:
> > 
> > options CAMDEBUG
> > options CAM_DEBUG_FLAGS=CAM_DEBUG_PROBE
> > 
> > If you build a kernel before and after the change with those options, it
> > will hopefully allow us to compare the probe sequence and get a clue about
> > where to look for the problem.
> > 
> 
> OK, both the old and new kernel versions do an extremely fast initial
> probe with these results (note: obtained with grep over dmesg.boot):
> 
> (aprobe0:ahcich0:0:15:0): Probe started
> (aprobe0:ahcich0:0:15:0): Probe PROBE_INVALID to PROBE_RESET
> (aprobe1:ahcich1:0:15:0): Probe started
> (aprobe1:ahcich1:0:15:0): Probe PROBE_INVALID to PROBE_RESET
> (aprobe2:ahcich2:0:15:0): Probe started
> (aprobe2:ahcich2:0:15:0): Probe PROBE_INVALID to PROBE_RESET
> (aprobe3:ahcich3:0:15:0): Probe started
> (aprobe3:ahcich3:0:15:0): Probe PROBE_INVALID to PROBE_RESET
> (aprobe4:ahcich4:0:15:0): Probe started
> (aprobe4:ahcich4:0:15:0): Probe PROBE_INVALID to PROBE_RESET
> (aprobe5:ahcich5:0:15:0): Probe started
> (aprobe5:ahcich5:0:15:0): Probe PROBE_INVALID to PROBE_RESET
> (aprobe4:ahcich4:0:15:0): Probe PROBE_RESET to PROBE_RESET
> (aprobe4:ahcich4:0:15:0): Probe PROBE_RESET to PROBE_INVALID
> (aprobe4:ahcich4:0:15:0): Probe completed
> (aprobe4:ahcich4:0:0:0): Probe started
> (aprobe4:ahcich4:0:0:0): Probe PROBE_INVALID to PROBE_RESET
> (aprobe4:ahcich4:0:0:0): Probe PROBE_RESET to PROBE_INVALID
> (aprobe4:ahcich4:0:0:0): Probe completed
> (aprobe0:ahcich0:0:15:0): Probe PROBE_RESET to PROBE_RESET
> (aprobe2:ahcich2:0:15:0): Probe PROBE_RESET to PROBE_RESET
> (aprobe0:ahcich0:0:15:0): SIGNATURE: 
> (aprobe0:ahcich0:0:15:0): Probe PROBE_RESET to PROBE_INVALID
> (aprobe0:ahcich0:0:15:0): Probe completed
> (aprobe2:ahcich2:0:15:0): SIGNATURE: 
> (aprobe2:ahcich2:0:15:0): Probe PROBE_RESET to PROBE_INVALID
> (aprobe2:ahcich2:0:15:0): Probe completed
> (aprobe0:ahcich0:0:0:0): Probe started
> (aprobe0:ahcich0:0:0:0): Probe PROBE_INVALID to PROBE_RESET
> (aprobe2:ahcich2:0:0:0): Probe started
> (aprobe2:ahcich2:0:0:0): Probe PROBE_INVALID to PROBE_RESET
> (aprobe3:ahcich3:0:15:0): Probe PROBE_RESET to PROBE_RESET
> (aprobe5:ahcich5:0:15:0): Probe PROBE_RESET to PROBE_RESET
> (aprobe0:ahcich0:0:0:0): SIGNATURE: 
> (aprobe0:ahcich0:0:0:0): Probe PROBE_RESET to PROBE_IDENTIFY
> (aprobe2:ahcich2:0:0:0): SIGNATURE: 
> (aprobe2:ahcich2:0:0:0): Probe PROBE_RESET to PROBE_IDENTIFY
> (aprobe1:ahcich1:0:15:0): Probe PROBE_RESET to PROBE_RESET
> (aprobe3:ahcich3:0:15:0): SIGNATURE: 
> (aprobe3:ahcich3:0:15:0): Probe PROBE_RESET to PROBE_INVALID
> (aprobe3:ahcich3:0:15:0): Probe completed
> (aprobe5:ahcich5:0:15:0): SIGNATURE: 
> (aprobe5:ahcich5:0:15:0): Probe PROBE_RESET to PROBE_INVALID
> (aprobe5:ahcich5:0:15:0): Probe completed
> (aprobe1:ahcich1:0:15:0): SIGNATURE: eb14
> (aprobe1:ahcich1:0:15:0): Probe PROBE_RESET to PROBE_INVALID
> (aprobe1:ahcich1:0:15:0): Probe completed
> (aprobe1:ahcich3:0:0:0): Probe started
> (aprobe1:ahcich3:0:0:0): Probe PROBE_INVALID to PROBE_RESET
> (aprobe3:ahcich5:0:0:0): Probe started
> (aprobe3:ahcich5:0:0:0): Probe PROBE_INVALID to PROBE_RESET
> (aprobe4:ahcich1:0:0:0): Probe started
> (aprobe4:ahcich1:0:0:0): Probe PROBE_INVALID to PROBE_RESET
> (aprobe1:ahcich3:0:0:0): SIGNATURE: 
> (aprobe1:ahcich3:0:0:0): Probe PROBE_RESET to PROBE_IDENTIFY
> (aprobe3:ahcich5:0:0:0): SIGNATURE: 
> (aprobe3:ahcich5:0:0:0): Probe PROBE_RESET to PROBE_IDENTIFY
> (aprobe4:ahcich1:0:0:0): SIGNATURE: eb14
> (aprobe4:ahcich1:0:0:0): Probe PROBE_RESET to PROBE_IDENTIFY
> (aprobe0:ahcich0:0:0:0): Probe PROBE_IDENTIFY to PROBE_SETMODE
> (aprobe1:ahcich3:0:0:0): Probe PROBE_IDENTIFY to PROBE_SETMODE
> (aprobe3:ahcic

Re: ATA? related trouble with r300299

2016-05-24 Thread Kenneth D. Merry
On Tue, May 24, 2016 at 18:21:19 +0300, Oleg V. Nauman wrote:
> On Tuesday 24 May 2016 10:02:09 you wrote:
> > On Tue, May 24, 2016 at 16:38:40 +0300, Oleg V. Nauman wrote:
> > > On Tuesday 24 May 2016 09:21:17 Kenneth D. Merry wrote:
> > > > On Tue, May 24, 2016 at 08:04:21 +0300, Oleg V. Nauman wrote:
> > > > > On Monday 23 May 2016 19:08:16 Kenneth D. Merry wrote:
> > > > > > On Tue, May 24, 2016 at 00:59:34 +0300, Oleg V. Nauman wrote:
> > > > > > > On Monday 23 May 2016 17:30:45 you wrote:
> > > > > > > > On Tue, May 24, 2016 at 00:15:25 +0300, Oleg V. Nauman wrote:
> > > > > > > > > On Monday 23 May 2016 17:11:34 Kenneth D. Merry wrote:
> > > > > > > > > > On Tue, May 24, 2016 at 00:05:49 +0300, Oleg V. Nauman 
> wrote:
> > > > > > > > > > > On Monday 23 May 2016 16:53:55 Kenneth D. Merry wrote:
> > > > > > > > > > > > On Mon, May 23, 2016 at 23:21:32 +0300, Oleg V. Nauman
> > > 
> > > wrote:
> > > > > > > > > > > > > On Monday 23 May 2016 15:25:39 Kenneth D. Merry wrote:
> > > > > > > > > > > > > > On Sat, May 21, 2016 at 09:30:35 +0300, Oleg V.
> > > > > > > > > > > > > > Nauman
> > > > > 
> > > > > wrote:
> > > > > > > > > > > > > > >  I have faced the issue with fresh CURRENT stopped
> > > > > > > > > > > > > > >  to
> > > > > > > > > > > > > > >  boot
> > > > > > > > > > > > > > >  on
> > > > > > > > > > > > > > >  my
> > > > > > > > > > > > > > >  old
> > > > > > > > > > > > > > >  desktop
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > after update to r300299
> > > > > > > > > > > > > > > Verbose boot shows the endless cycle of
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > ata2: SATA reset: ports status=0x05
> > > > > > > > > > > > > > > ata2: reset tp1 mask=03 ostat0=50 ostat1=50
> > > > > > > > > > > > > > > ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
> > > > > > > > > > > > > > > ata2: stat1=0x50 err=0x01 lsb=0x00 msb=0x00
> > > > > > > > > > > > > > > ata2: reset tp2 stat0=50 stat1=50 devices=0x3
> > > > > > > > > > > > > > > messages logged to console.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Below is the relevant portion of ATA
> > > > > > > > > > > > > > > controller/devices
> > > > > > > > > > > > > > > probed/attached
> > > > > > > > > > > > > > > during the boot:
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > atapci0:  port
> > > > > > > > > > > > > > > 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf
> > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > device
> > > > > > > > > > > > > > > 31.1
> > > > > > > > > > > > > > > on
> > > > > > > > > > > > > > > pci0
> > > > > > > > > > > > > > > ata0:  at channel 0 on atapci0
> > > > > > > > > > > > > > > atapci1:  port
> > > > > > > > > > > > > > > 0xd080-0xd087,
> > > > > > > > > > > > > > > 0xd000-0xd003,
> > > > > > > > > > > > > > > 0xcc00-0xcc07,0xc880-0xc883,0xc800-0xc80f irq 19
> > > > > > > > > > > > > > > at
> > > > > > > > > > >

Re: AHCI/ADA regression?

2016-05-24 Thread Kenneth D. Merry
On Tue, May 24, 2016 at 15:58:28 +0200, Gary Jennejohn wrote:
> On Mon, 23 May 2016 13:51:05 -0400
> "Kenneth D. Merry" <k...@freebsd.org> wrote:
> 
> > On Sat, May 21, 2016 at 10:09:49 +0200, Gary Jennejohn wrote:
> > > There appears to be a regression in AHCI/ADA behavior since r300207.
> > > 
> > > Starting a test kernel at r300293 results in extremely long timeouts
> > > probing ahcich2 for non-existent multiplier ports.
> > > 
> > > Here is some kernel output:
> > 
> > Is this dmesg output with or without the problem?
> > 
> 
> Actually it's the same with and without the problem.  The only real
> difference is the timeouts.

Ahh.

> > > ahci0: 
> > > port 0xff00-0xff07,0xfe00-0xfe03,0xfd00-0xfd07,0xfc00-0xfc03,
> > > 0xfb00-0xfb0f mem 0xfe02f000-0xfe02f3ff irq 22 at device 17.0 on pci0
> > > 
> > > ahci0: AHCI v1.10 with 6 3Gbps ports, Port Multiplier supported  
> > 
> > Has the controller always claimed support for Port Multipliers?
> > 
> 
> Yes, this from today's dmesg.boot:
> ahci0: AHCI v1.10 with 6 3Gbps ports, Port Multiplier supported
> 
> > > ahcich2:  at channel 2 on ahci0
> > > 
> > > ada1 at ahcich2 bus 0 scbus2 target 0 lun 0
> > > 
> > > /dev/ada1p1 on /home (ufs, local, journaled soft-updates)
> > > 
> > > An older kernel at r299170 does not exhibit this peculiar behavior and
> > > mounts /home with no delays.  
> > 
> > Are you able to send dmesg output before and after?
> > 
> 
> Before is easy, since I have a dmesg.boot.  After - I'll have to
> copy it from the screen since I don't have the patience to wait
> for booting to complete.  The above is pretty much after, but I
> can copy down the timeout messages which arise when the
> multiplier ports are probed (no disk is present on any of them,
> so it takes forever).
> 
> The question in my mind is - why are "empty" multiplier ports being
> probed with the new code but not with the old code?

If the HBA says that it supports port multipliers, the kernel should always
look for them.  It probes the port multiplier first, before moving on to
look for regular targets.

So, from that standpoint, it should not be any different.  It sounds like
we're either getting further in the port multiplier probe process, or there
is something different about the way things are behaving.

If you can determine which commands are timing out, that may give us an
idea about where it is in the probe process.

Here is one way we may be able to track things down...  Build a kernel with
these options:

options CAMDEBUG
options CAM_DEBUG_FLAGS=CAM_DEBUG_PROBE

If you build a kernel before and after the change with those options, it
will hopefully allow us to compare the probe sequence and get a clue about
where to look for the problem.
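
Once both dmesg captures exist, the comparison could be done mechanically.
The following is only a rough Python sketch, assuming the debug lines look
like "(aprobe0:ahcich0:0:15:0): Probe PROBE_INVALID to PROBE_RESET"; the
function names are made up for illustration and are not part of any FreeBSD
tool. It groups the state transitions by device path and lists the paths
whose sequences differ between the two boots.

```python
import re
from collections import defaultdict

# Assumed line format (CAM_DEBUG_PROBE output), for example:
#   (aprobe0:ahcich0:0:15:0): Probe PROBE_INVALID to PROBE_RESET
TRANSITION = re.compile(
    r"\(aprobe\d+:(?P<path>[^)]+)\): Probe (?P<src>PROBE_\w+) to (?P<dst>PROBE_\w+)"
)


def probe_sequences(dmesg_text):
    """Map each device path (e.g. 'ahcich0:0:15:0') to its ordered transitions."""
    sequences = defaultdict(list)
    for line in dmesg_text.splitlines():
        match = TRANSITION.search(line)
        if match:
            sequences[match["path"]].append((match["src"], match["dst"]))
    return dict(sequences)


def differing_paths(old_text, new_text):
    """Return the sorted device paths whose transition sequences differ."""
    old, new = probe_sequences(old_text), probe_sequences(new_text)
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))
```

Feeding it the contents of the old and new dmesg.boot files would give a
short list of channels to look at first. The probe instance number (aprobeN)
is ignored on purpose, since it can differ from boot to boot.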

Thanks,

Ken
-- 
Kenneth Merry
k...@freebsd.org
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: ATA? related trouble with r300299

2016-05-24 Thread Kenneth D. Merry
On Tue, May 24, 2016 at 16:38:40 +0300, Oleg V. Nauman wrote:
> On Tuesday 24 May 2016 09:21:17 Kenneth D. Merry wrote:
> > On Tue, May 24, 2016 at 08:04:21 +0300, Oleg V. Nauman wrote:
> > > On Monday 23 May 2016 19:08:16 Kenneth D. Merry wrote:
> > > > On Tue, May 24, 2016 at 00:59:34 +0300, Oleg V. Nauman wrote:
> > > > > On Monday 23 May 2016 17:30:45 you wrote:
> > > > > > On Tue, May 24, 2016 at 00:15:25 +0300, Oleg V. Nauman wrote:
> > > > > > > On Monday 23 May 2016 17:11:34 Kenneth D. Merry wrote:
> > > > > > > > On Tue, May 24, 2016 at 00:05:49 +0300, Oleg V. Nauman wrote:
> > > > > > > > > On Monday 23 May 2016 16:53:55 Kenneth D. Merry wrote:
> > > > > > > > > > On Mon, May 23, 2016 at 23:21:32 +0300, Oleg V. Nauman 
> wrote:
> > > > > > > > > > > On Monday 23 May 2016 15:25:39 Kenneth D. Merry wrote:
> > > > > > > > > > > > On Sat, May 21, 2016 at 09:30:35 +0300, Oleg V. Nauman
> > > 
> > > wrote:
> > > > > > > > > > > > >  I have faced the issue with fresh CURRENT stopped to
> > > > > > > > > > > > >  boot
> > > > > > > > > > > > >  on
> > > > > > > > > > > > >  my
> > > > > > > > > > > > >  old
> > > > > > > > > > > > >  desktop
> > > > > > > > > > > > > 
> > > > > > > > > > > > > after update to r300299
> > > > > > > > > > > > > Verbose boot shows the endless cycle of
> > > > > > > > > > > > > 
> > > > > > > > > > > > > ata2: SATA reset: ports status=0x05
> > > > > > > > > > > > > ata2: reset tp1 mask=03 ostat0=50 ostat1=50
> > > > > > > > > > > > > ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
> > > > > > > > > > > > > ata2: stat1=0x50 err=0x01 lsb=0x00 msb=0x00
> > > > > > > > > > > > > ata2: reset tp2 stat0=50 stat1=50 devices=0x3
> > > > > > > > > > > > > messages logged to console.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Below is the relevant portion of ATA
> > > > > > > > > > > > > controller/devices
> > > > > > > > > > > > > probed/attached
> > > > > > > > > > > > > during the boot:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > atapci0:  port
> > > > > > > > > > > > > 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at
> > > > > > > > > > > > > device
> > > > > > > > > > > > > 31.1
> > > > > > > > > > > > > on
> > > > > > > > > > > > > pci0
> > > > > > > > > > > > > ata0:  at channel 0 on atapci0
> > > > > > > > > > > > > atapci1:  port
> > > > > > > > > > > > > 0xd080-0xd087,
> > > > > > > > > > > > > 0xd000-0xd003,
> > > > > > > > > > > > > 0xcc00-0xcc07,0xc880-0xc883,0xc800-0xc80f irq 19 at
> > > > > > > > > > > > > device
> > > > > > > > > > > > > 31.2 on
> > > > > > > > > > > > > pci0
> > > > > > > > > > > > > ata2:  at channel 0 on atapci1
> > > > > > > > > > > > > ata3:  at channel 1 on atapci1
> > > > > > > > > > > > > ada0 at ata2 bus 0 scbus1 target 0 lun 0
> > > > > > > > > > > > > ada0:  ATA-7 SATA 2.x device
> > > > > > > > > > > > > ada1 at ata2 bus 0 scbus1 target 1 lun 0
> > > > > > > > > > > > > ada1:  ATA8-ACS SATA 3.x
> > > > > > > > > > > > > device
> > > > > > > > > > > > > cd0 at ata0 bus 0 scbus0 t

Re: ATA? related trouble with r300299

2016-05-24 Thread Kenneth D. Merry
On Tue, May 24, 2016 at 08:04:21 +0300, Oleg V. Nauman wrote:
> On Monday 23 May 2016 19:08:16 Kenneth D. Merry wrote:
> > On Tue, May 24, 2016 at 00:59:34 +0300, Oleg V. Nauman wrote:
> > > On Monday 23 May 2016 17:30:45 you wrote:
> > > > On Tue, May 24, 2016 at 00:15:25 +0300, Oleg V. Nauman wrote:
> > > > > On Monday 23 May 2016 17:11:34 Kenneth D. Merry wrote:
> > > > > > On Tue, May 24, 2016 at 00:05:49 +0300, Oleg V. Nauman wrote:
> > > > > > > On Monday 23 May 2016 16:53:55 Kenneth D. Merry wrote:
> > > > > > > > On Mon, May 23, 2016 at 23:21:32 +0300, Oleg V. Nauman wrote:
> > > > > > > > > On Monday 23 May 2016 15:25:39 Kenneth D. Merry wrote:
> > > > > > > > > > On Sat, May 21, 2016 at 09:30:35 +0300, Oleg V. Nauman 
> wrote:
> > > > > > > > > > >  I have faced the issue with fresh CURRENT stopped to boot
> > > > > > > > > > >  on
> > > > > > > > > > >  my
> > > > > > > > > > >  old
> > > > > > > > > > >  desktop
> > > > > > > > > > > 
> > > > > > > > > > > after update to r300299
> > > > > > > > > > > Verbose boot shows the endless cycle of
> > > > > > > > > > > 
> > > > > > > > > > > ata2: SATA reset: ports status=0x05
> > > > > > > > > > > ata2: reset tp1 mask=03 ostat0=50 ostat1=50
> > > > > > > > > > > ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
> > > > > > > > > > > ata2: stat1=0x50 err=0x01 lsb=0x00 msb=0x00
> > > > > > > > > > > ata2: reset tp2 stat0=50 stat1=50 devices=0x3
> > > > > > > > > > > messages logged to console.
> > > > > > > > > > > 
> > > > > > > > > > > Below is the relevant portion of ATA controller/devices
> > > > > > > > > > > probed/attached
> > > > > > > > > > > during the boot:
> > > > > > > > > > > 
> > > > > > > > > > > atapci0:  port
> > > > > > > > > > > 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at
> > > > > > > > > > > device
> > > > > > > > > > > 31.1
> > > > > > > > > > > on
> > > > > > > > > > > pci0
> > > > > > > > > > > ata0:  at channel 0 on atapci0
> > > > > > > > > > > atapci1:  port
> > > > > > > > > > > 0xd080-0xd087,
> > > > > > > > > > > 0xd000-0xd003,
> > > > > > > > > > > 0xcc00-0xcc07,0xc880-0xc883,0xc800-0xc80f irq 19 at device
> > > > > > > > > > > 31.2 on
> > > > > > > > > > > pci0
> > > > > > > > > > > ata2:  at channel 0 on atapci1
> > > > > > > > > > > ata3:  at channel 1 on atapci1
> > > > > > > > > > > ada0 at ata2 bus 0 scbus1 target 0 lun 0
> > > > > > > > > > > ada0:  ATA-7 SATA 2.x device
> > > > > > > > > > > ada1 at ata2 bus 0 scbus1 target 1 lun 0
> > > > > > > > > > > ada1:  ATA8-ACS SATA 3.x device
> > > > > > > > > > > cd0 at ata0 bus 0 scbus0 target 0 lun 0
> > > > > > > > > > > cd0: <_NEC DVD_RW ND-3570A 1.11> Removable CD-ROM SCSI
> > > > > > > > > > > device
> > > > > > > > > > 
> > > > > > > > > > I'm not entirely sure what is causing the problem with your
> > > > > > > > > > system,
> > > > > > > > > > but
> > > > > > > > > > hopefully we can narrow it down a bit.
> > > > > > > > > > 
> > > > > > > > > > There is a bug that came in with my SMR changes in revision
> > > > > > > > > > 300207
> > > > > > > > > > that
> > > > > > > > > > broke the quirk functionality in the ada(4) driver.  I don'

Re: ATA? related trouble with r300299

2016-05-23 Thread Kenneth D. Merry
On Tue, May 24, 2016 at 00:59:34 +0300, Oleg V. Nauman wrote:
> On Monday 23 May 2016 17:30:45 you wrote:
> > On Tue, May 24, 2016 at 00:15:25 +0300, Oleg V. Nauman wrote:
> > > On Monday 23 May 2016 17:11:34 Kenneth D. Merry wrote:
> > > > On Tue, May 24, 2016 at 00:05:49 +0300, Oleg V. Nauman wrote:
> > > > > On Monday 23 May 2016 16:53:55 Kenneth D. Merry wrote:
> > > > > > On Mon, May 23, 2016 at 23:21:32 +0300, Oleg V. Nauman wrote:
> > > > > > > On Monday 23 May 2016 15:25:39 Kenneth D. Merry wrote:
> > > > > > > > On Sat, May 21, 2016 at 09:30:35 +0300, Oleg V. Nauman wrote:
> > > > > > > > >  I have faced the issue with fresh CURRENT stopped to boot on
> > > > > > > > >  my
> > > > > > > > >  old
> > > > > > > > >  desktop
> > > > > > > > > 
> > > > > > > > > after update to r300299
> > > > > > > > > Verbose boot shows the endless cycle of
> > > > > > > > > 
> > > > > > > > > ata2: SATA reset: ports status=0x05
> > > > > > > > > ata2: reset tp1 mask=03 ostat0=50 ostat1=50
> > > > > > > > > ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
> > > > > > > > > ata2: stat1=0x50 err=0x01 lsb=0x00 msb=0x00
> > > > > > > > > ata2: reset tp2 stat0=50 stat1=50 devices=0x3
> > > > > > > > > messages logged to console.
> > > > > > > > > 
> > > > > > > > > Below is the relevant portion of ATA controller/devices
> > > > > > > > > probed/attached
> > > > > > > > > during the boot:
> > > > > > > > > 
> > > > > > > > > atapci0:  port
> > > > > > > > > 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device
> > > > > > > > > 31.1
> > > > > > > > > on
> > > > > > > > > pci0
> > > > > > > > > ata0:  at channel 0 on atapci0
> > > > > > > > > atapci1:  port 0xd080-0xd087,
> > > > > > > > > 0xd000-0xd003,
> > > > > > > > > 0xcc00-0xcc07,0xc880-0xc883,0xc800-0xc80f irq 19 at device
> > > > > > > > > 31.2 on
> > > > > > > > > pci0
> > > > > > > > > ata2:  at channel 0 on atapci1
> > > > > > > > > ata3:  at channel 1 on atapci1
> > > > > > > > > ada0 at ata2 bus 0 scbus1 target 0 lun 0
> > > > > > > > > ada0:  ATA-7 SATA 2.x device
> > > > > > > > > ada1 at ata2 bus 0 scbus1 target 1 lun 0
> > > > > > > > > ada1:  ATA8-ACS SATA 3.x device
> > > > > > > > > cd0 at ata0 bus 0 scbus0 target 0 lun 0
> > > > > > > > > cd0: <_NEC DVD_RW ND-3570A 1.11> Removable CD-ROM SCSI device
> > > > > > > > 
> > > > > > > > I'm not entirely sure what is causing the problem with your
> > > > > > > > system,
> > > > > > > > but
> > > > > > > > hopefully we can narrow it down a bit.
> > > > > > > > 
> > > > > > > > There is a bug that came in with my SMR changes in revision
> > > > > > > > 300207
> > > > > > > > that
> > > > > > > > broke the quirk functionality in the ada(4) driver.  I don't
> > > > > > > > think
> > > > > > > > that
> > > > > > > > is
> > > > > > > > the problem you're seeing, though.
> > > > > > > > 
> > > > > > > > Can you try out this patch:
> > > > > > > > 
> > > > > > > > https://people.freebsd.org/~ken/cam_smr_ada_patch.20160523.1.txt
> > > > > > > > 
> > > > > > > > In /boot/loader.conf, put the following:
> > > > > > > > 
> > > > > > > > kern.cam.ada.0.quirks="0x04"
> > > > > > > > kern.cam.ada.1.quirks="0x04"
> > > > > > > > 
> > > > > > > > If you're

Re: ATA? related trouble with r300299

2016-05-23 Thread Kenneth D. Merry
On Tue, May 24, 2016 at 00:15:25 +0300, Oleg V. Nauman wrote:
> On Monday 23 May 2016 17:11:34 Kenneth D. Merry wrote:
> > On Tue, May 24, 2016 at 00:05:49 +0300, Oleg V. Nauman wrote:
> > > On Monday 23 May 2016 16:53:55 Kenneth D. Merry wrote:
> > > > On Mon, May 23, 2016 at 23:21:32 +0300, Oleg V. Nauman wrote:
> > > > > On Monday 23 May 2016 15:25:39 Kenneth D. Merry wrote:
> > > > > > On Sat, May 21, 2016 at 09:30:35 +0300, Oleg V. Nauman wrote:
> > > > > > >  I have faced the issue with fresh CURRENT stopped to boot on my
> > > > > > >  old
> > > > > > >  desktop
> > > > > > > 
> > > > > > > after update to r300299
> > > > > > > Verbose boot shows the endless cycle of
> > > > > > > 
> > > > > > > ata2: SATA reset: ports status=0x05
> > > > > > > ata2: reset tp1 mask=03 ostat0=50 ostat1=50
> > > > > > > ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
> > > > > > > ata2: stat1=0x50 err=0x01 lsb=0x00 msb=0x00
> > > > > > > ata2: reset tp2 stat0=50 stat1=50 devices=0x3
> > > > > > > messages logged to console.
> > > > > > > 
> > > > > > > Below is the relevant portion of ATA controller/devices
> > > > > > > probed/attached
> > > > > > > during the boot:
> > > > > > > 
> > > > > > > atapci0:  port
> > > > > > > 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 31.1
> > > > > > > on
> > > > > > > pci0
> > > > > > > ata0:  at channel 0 on atapci0
> > > > > > > atapci1:  port 0xd080-0xd087,
> > > > > > > 0xd000-0xd003,
> > > > > > > 0xcc00-0xcc07,0xc880-0xc883,0xc800-0xc80f irq 19 at device 31.2 on
> > > > > > > pci0
> > > > > > > ata2:  at channel 0 on atapci1
> > > > > > > ata3:  at channel 1 on atapci1
> > > > > > > ada0 at ata2 bus 0 scbus1 target 0 lun 0
> > > > > > > ada0:  ATA-7 SATA 2.x device
> > > > > > > ada1 at ata2 bus 0 scbus1 target 1 lun 0
> > > > > > > ada1:  ATA8-ACS SATA 3.x device
> > > > > > > cd0 at ata0 bus 0 scbus0 target 0 lun 0
> > > > > > > cd0: <_NEC DVD_RW ND-3570A 1.11> Removable CD-ROM SCSI device
> > > > > > 
> > > > > > I'm not entirely sure what is causing the problem with your system,
> > > > > > but
> > > > > > hopefully we can narrow it down a bit.
> > > > > > 
> > > > > > There is a bug that came in with my SMR changes in revision 300207
> > > > > > that
> > > > > > broke the quirk functionality in the ada(4) driver.  I don't think
> > > > > > that
> > > > > > is
> > > > > > the problem you're seeing, though.
> > > > > > 
> > > > > > Can you try out this patch:
> > > > > > 
> > > > > > https://people.freebsd.org/~ken/cam_smr_ada_patch.20160523.1.txt
> > > > > > 
> > > > > > In /boot/loader.conf, put the following:
> > > > > > 
> > > > > > kern.cam.ada.0.quirks="0x04"
> > > > > > kern.cam.ada.1.quirks="0x04"
> > > > > > 
> > > > > > If you're able to boot with those quirk entries in the loader.conf,
> > > > > > try
> > > > > > taking one of them out, and reboot.  If that works, try taking the
> > > > > > other
> > > > > > one out and reboot.
> > > > > > 
> > > > > > What I'm trying to figure out here is where the problem lies:
> > > > > > 
> > > > > > > 1. The bug with the ada(4) driver (in where it loaded the quirks).
> > > > > > > 2. The extra probe steps in the ada(4) driver might be causing a
> > > > > > >    problem with ada0 (Samsung drive).
> > > > > > > 3. The extra probe steps in the ada(4) driver might be causing a
> > > > > > >    problem with ada1 (Seagate drive).
> > > > > > 
> >

Re: ATA? related trouble with r300299

2016-05-23 Thread Kenneth D. Merry
On Tue, May 24, 2016 at 00:05:49 +0300, Oleg V. Nauman wrote:
> On Monday 23 May 2016 16:53:55 Kenneth D. Merry wrote:
> > On Mon, May 23, 2016 at 23:21:32 +0300, Oleg V. Nauman wrote:
> > > On Monday 23 May 2016 15:25:39 Kenneth D. Merry wrote:
> > > > On Sat, May 21, 2016 at 09:30:35 +0300, Oleg V. Nauman wrote:
> > > > >  I have faced the issue with fresh CURRENT stopped to boot on my old
> > > > >  desktop
> > > > > 
> > > > > after update to r300299
> > > > > Verbose boot shows the endless cycle of
> > > > > 
> > > > > ata2: SATA reset: ports status=0x05
> > > > > ata2: reset tp1 mask=03 ostat0=50 ostat1=50
> > > > > ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
> > > > > ata2: stat1=0x50 err=0x01 lsb=0x00 msb=0x00
> > > > > ata2: reset tp2 stat0=50 stat1=50 devices=0x3
> > > > > messages logged to console.
> > > > > 
> > > > > Below is the relevant portion of ATA controller/devices
> > > > > probed/attached
> > > > > during the boot:
> > > > > 
> > > > > atapci0:  port
> > > > > 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 31.1 on
> > > > > pci0
> > > > > ata0:  at channel 0 on atapci0
> > > > > atapci1:  port 0xd080-0xd087,
> > > > > 0xd000-0xd003,
> > > > > 0xcc00-0xcc07,0xc880-0xc883,0xc800-0xc80f irq 19 at device 31.2 on
> > > > > pci0
> > > > > ata2:  at channel 0 on atapci1
> > > > > ata3:  at channel 1 on atapci1
> > > > > ada0 at ata2 bus 0 scbus1 target 0 lun 0
> > > > > ada0:  ATA-7 SATA 2.x device
> > > > > ada1 at ata2 bus 0 scbus1 target 1 lun 0
> > > > > ada1:  ATA8-ACS SATA 3.x device
> > > > > cd0 at ata0 bus 0 scbus0 target 0 lun 0
> > > > > cd0: <_NEC DVD_RW ND-3570A 1.11> Removable CD-ROM SCSI device
> > > > 
> > > > I'm not entirely sure what is causing the problem with your system, but
> > > > hopefully we can narrow it down a bit.
> > > > 
> > > > There is a bug that came in with my SMR changes in revision 300207 that
> > > > broke the quirk functionality in the ada(4) driver.  I don't think that
> > > > is
> > > > the problem you're seeing, though.
> > > > 
> > > > Can you try out this patch:
> > > > 
> > > > https://people.freebsd.org/~ken/cam_smr_ada_patch.20160523.1.txt
> > > > 
> > > > In /boot/loader.conf, put the following:
> > > > 
> > > > kern.cam.ada.0.quirks="0x04"
> > > > kern.cam.ada.1.quirks="0x04"
> > > > 
> > > > If you're able to boot with those quirk entries in the loader.conf, try
> > > > taking one of them out, and reboot.  If that works, try taking the other
> > > > one out and reboot.
> > > > 
> > > > What I'm trying to figure out here is where the problem lies:
> > > > 
> > > > > 1. The bug with the ada(4) driver (in where it loaded the quirks).
> > > > > 2. The extra probe steps in the ada(4) driver might be causing a problem
> > > > >    with ada0 (Samsung drive).
> > > > > 3. The extra probe steps in the ada(4) driver might be causing a problem
> > > > >    with ada1 (Seagate drive).
> > > > > 4. Something else.
> > > > 
> > > > So, if you can try the patch and try to eliminate a few possibilities,
> > > > we
> > > > may be able to narrow it down.
> > >  
> > >  I was able to boot after applying the patch ;
> > > 
> > > kern.cam.ada.0.quirks="0x04"
> > > was the quirk in effect. It is quirk for my Samsung HD200HJ KF100-06 hard
> > > drive.
> > 
> > Okay.  Just so we can narrow it down a little more, can you try this:
> > 
> > First, let's try getting an ATA Log directory using the PIO version of the
> > command:
> > 
> > camcontrol cmd ada0 -v -a "2f 0 0 0 0 0 0 0 0 0 1 0" -i 512 - |hd
> > 
> > If that works (you should get hexdump output), try the DMA version of the
> > command:
> > 
> > camcontrol cmd ada0 -v -d -a "47 0 0 0 0 0 0 0 0 0 1 0" -i 512 - |hd
> 
> "Expecting a character pointer argument." error for both commands.

Did the double quotes make it onto the command line?  Both of those work
for me...
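
To show why the quotes matter, here is a small Python sketch that uses
shlex to mimic POSIX shell word splitting. It only illustrates how the
argument to -a is split with and without the quotes; it does not model
camcontrol's own parsing, and the helper name is hypothetical.

```python
import shlex

QUOTED = 'camcontrol cmd ada0 -v -a "2f 0 0 0 0 0 0 0 0 0 1 0" -i 512 -'
UNQUOTED = QUOTED.replace('"', "")


def argument_after(flag, command_line):
    """Return the single word that follows `flag` after shell-style splitting."""
    words = shlex.split(command_line)
    return words[words.index(flag) + 1]
```

With the quotes, argument_after("-a", QUOTED) is the whole twelve-byte
string; without them it is just "2f", and the remaining bytes become
separate words on the command line.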

Ken
-- 
Kenneth Merry
k...@freebsd.org


Re: ATA? related trouble with r300299

2016-05-23 Thread Kenneth D. Merry
On Mon, May 23, 2016 at 23:21:32 +0300, Oleg V. Nauman wrote:
> On Monday 23 May 2016 15:25:39 Kenneth D. Merry wrote:
> > On Sat, May 21, 2016 at 09:30:35 +0300, Oleg V. Nauman wrote:
> > >  I have faced the issue with fresh CURRENT stopped to boot on my old
> > >  desktop
> > > 
> > > after update to r300299
> > > Verbose boot shows the endless cycle of
> > > 
> > > ata2: SATA reset: ports status=0x05
> > > ata2: reset tp1 mask=03 ostat0=50 ostat1=50
> > > ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
> > > ata2: stat1=0x50 err=0x01 lsb=0x00 msb=0x00
> > > ata2: reset tp2 stat0=50 stat1=50 devices=0x3
> > > messages logged to console.
> > > 
> > > Below is the relevant portion of ATA controller/devices probed/attached
> > > during the boot:
> > > 
> > > atapci0:  port
> > > 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 31.1 on pci0
> > > ata0:  at channel 0 on atapci0
> > > atapci1:  port 0xd080-0xd087,
> > > 0xd000-0xd003,
> > > 0xcc00-0xcc07,0xc880-0xc883,0xc800-0xc80f irq 19 at device 31.2 on pci0
> > > ata2:  at channel 0 on atapci1
> > > ata3:  at channel 1 on atapci1
> > > ada0 at ata2 bus 0 scbus1 target 0 lun 0
> > > ada0:  ATA-7 SATA 2.x device
> > > ada1 at ata2 bus 0 scbus1 target 1 lun 0
> > > ada1:  ATA8-ACS SATA 3.x device
> > > cd0 at ata0 bus 0 scbus0 target 0 lun 0
> > > cd0: <_NEC DVD_RW ND-3570A 1.11> Removable CD-ROM SCSI device
> > 
> > I'm not entirely sure what is causing the problem with your system, but
> > hopefully we can narrow it down a bit.
> > 
> > There is a bug that came in with my SMR changes in revision 300207 that
> > broke the quirk functionality in the ada(4) driver.  I don't think that is
> > the problem you're seeing, though.
> > 
> > Can you try out this patch:
> > 
> > https://people.freebsd.org/~ken/cam_smr_ada_patch.20160523.1.txt
> > 
> > In /boot/loader.conf, put the following:
> > 
> > kern.cam.ada.0.quirks="0x04"
> > kern.cam.ada.1.quirks="0x04"
> > 
> > If you're able to boot with those quirk entries in the loader.conf, try
> > taking one of them out, and reboot.  If that works, try taking the other
> > one out and reboot.
> > 
> > What I'm trying to figure out here is where the problem lies:
> > 
> > > 1. The bug with the ada(4) driver (in where it loaded the quirks).
> > > 2. The extra probe steps in the ada(4) driver might be causing a problem
> > >    with ada0 (Samsung drive).
> > > 3. The extra probe steps in the ada(4) driver might be causing a problem
> > >    with ada1 (Seagate drive).
> > > 4. Something else.
> > 
> > So, if you can try the patch and try to eliminate a few possibilities, we
> > may be able to narrow it down.
> 
> I was able to boot after applying the patch;
> kern.cam.ada.0.quirks="0x04"
> was the quirk in effect.  It is the quirk for my Samsung HD200HJ KF100-06
> hard drive.

Okay.  Just so we can narrow it down a little more, can you try this:

First, let's try getting an ATA Log directory using the PIO version of the
command:

camcontrol cmd ada0 -v -a "2f 0 0 0 0 0 0 0 0 0 1 0" -i 512 - |hd

If that works (you should get hexdump output), try the DMA version of the
command:

camcontrol cmd ada0 -v -d -a "47 0 0 0 0 0 0 0 0 0 1 0" -i 512 - |hd

My hope is that we can confirm whether or not this is what is causing the
Samsung drive to have issues.  It is certainly possible to put in a quirk,
but I'd rather not make it unnecessarily broad.
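
For anyone reading the two commands above, here is a rough sketch of how the
twelve hex bytes after -a can be read.  It assumes the bytes follow the
register order of struct ata_cmd starting at the command register (worth
checking against camcontrol(8)); struct ata_regs and the helper names are
made up for illustration only.  Under that reading, both commands ask for one
512-byte sector of log address 0, the log directory.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Assumed order of the 12 bytes given to "camcontrol cmd -a" (taken to
 * follow the register order of struct ata_cmd; please verify).
 */
struct ata_regs {
	uint8_t command, features;
	uint8_t lba_low, lba_mid, lba_high, device;
	uint8_t lba_low_exp, lba_mid_exp, lba_high_exp, features_exp;
	uint8_t sector_count, sector_count_exp;
};

/* Parse twelve space-separated hex bytes; return 0 on success, -1 on error. */
int
parse_ata_args(const char *s, struct ata_regs *r)
{
	unsigned int b[12];
	int i, n;

	n = sscanf(s, "%x %x %x %x %x %x %x %x %x %x %x %x",
	    &b[0], &b[1], &b[2], &b[3], &b[4], &b[5],
	    &b[6], &b[7], &b[8], &b[9], &b[10], &b[11]);
	if (n != 12)
		return (-1);
	for (i = 0; i < 12; i++)
		if (b[i] > 0xff)
			return (-1);

	r->command = (uint8_t)b[0];
	r->features = (uint8_t)b[1];
	r->lba_low = (uint8_t)b[2];
	r->lba_mid = (uint8_t)b[3];
	r->lba_high = (uint8_t)b[4];
	r->device = (uint8_t)b[5];
	r->lba_low_exp = (uint8_t)b[6];
	r->lba_mid_exp = (uint8_t)b[7];
	r->lba_high_exp = (uint8_t)b[8];
	r->features_exp = (uint8_t)b[9];
	r->sector_count = (uint8_t)b[10];
	r->sector_count_exp = (uint8_t)b[11];
	return (0);
}

/* 16-bit sector count built from the low byte and the "exp" (high) byte. */
unsigned int
ata_sector_count(const struct ata_regs *r)
{
	return (((unsigned int)r->sector_count_exp << 8) | r->sector_count);
}

/* Name the two opcodes used above; anything else is reported as unknown. */
const char *
ata_cmd_name(uint8_t command)
{
	switch (command) {
	case 0x2f:
		return ("READ LOG EXT");
	case 0x47:
		return ("READ LOG DMA EXT");
	default:
		return ("unknown");
	}
}
```

Feeding it "2f 0 0 0 0 0 0 0 0 0 1 0" gives the PIO log read with a sector
count of 1, which matches the -i 512 on the command line.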

Thanks,

Ken
-- 
Kenneth Merry
k...@freebsd.org
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: ATA? related trouble with r300299

2016-05-23 Thread Kenneth D. Merry
On Sat, May 21, 2016 at 09:30:35 +0300, Oleg V. Nauman wrote:
> 
> I have run into an issue where a fresh CURRENT stopped booting on my old
> desktop after updating to r300299.
> Verbose boot shows an endless cycle of
> 
> ata2: SATA reset: ports status=0x05
> ata2: reset tp1 mask=03 ostat0=50 ostat1=50
> ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
> ata2: stat1=0x50 err=0x01 lsb=0x00 msb=0x00
> ata2: reset tp2 stat0=50 stat1=50 devices=0x3
> messages logged to console.
> 
> Below is the relevant portion of ATA controller/devices probed/attached
> during the boot:
> 
> atapci0:  port 
> 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 31.1 on pci0
> ata0:  at channel 0 on atapci0
> atapci1:  port 0xd080-0xd087, 0xd000-0xd003, 
> 0xcc00-0xcc07,0xc880-0xc883,0xc800-0xc80f irq 19 at device 31.2 on pci0
> ata2:  at channel 0 on atapci1
> ata3:  at channel 1 on atapci1
> ada0 at ata2 bus 0 scbus1 target 0 lun 0
> ada0:  ATA-7 SATA 2.x device
> ada1 at ata2 bus 0 scbus1 target 1 lun 0
> ada1:  ATA8-ACS SATA 3.x device
> cd0 at ata0 bus 0 scbus0 target 0 lun 0
> cd0: <_NEC DVD_RW ND-3570A 1.11> Removable CD-ROM SCSI device

I'm not entirely sure what is causing the problem with your system, but
hopefully we can narrow it down a bit.

There is a bug that came in with my SMR changes in revision 300207 that
broke the quirk functionality in the ada(4) driver.  I don't think that is
the problem you're seeing, though.

Can you try out this patch:

https://people.freebsd.org/~ken/cam_smr_ada_patch.20160523.1.txt

In /boot/loader.conf, put the following:

kern.cam.ada.0.quirks="0x04"
kern.cam.ada.1.quirks="0x04"

If you're able to boot with those quirk entries in the loader.conf, try
taking one of them out, and reboot.  If that works, try taking the other
one out and reboot.

What I'm trying to figure out here is where the problem lies:

1. The bug in the ada(4) driver (in the code that loads the quirks).
2. The extra probe steps in the ada(4) driver might be causing a problem
   with ada0 (Samsung drive).
3. The extra probe steps in the ada(4) driver might be causing a problem
   with ada1 (Seagate drive).
4. Something else.

So, if you can try the patch and try to eliminate a few possibilities, we
may be able to narrow it down.

Thanks,

Ken
-- 
Kenneth Merry
k...@freebsd.org


Re: AHCI/ADA regression?

2016-05-23 Thread Kenneth D. Merry
On Sat, May 21, 2016 at 10:09:49 +0200, Gary Jennejohn wrote:
> There appears to be a regression in AHCI/ADA behavior since r300207.
> 
> Starting a test kernel at r300293 results in extremely long timeouts
> probing ahcich2 for non-existent multiplier ports.
> 
> Here is some kernel output:

Is this dmesg output with or without the problem?

> ahci0: 
> port 0xff00-0xff07,0xfe00-0xfe03,0xfd00-0xfd07,0xfc00-0xfc03,
> 0xfb00-0xfb0f mem 0xfe02f000-0xfe02f3ff irq 22 at device 17.0 on pci0
> 
> ahci0: AHCI v1.10 with 6 3Gbps ports, Port Multiplier supported

Has the controller always claimed support for Port Multipliers?

> ahcich2:  at channel 2 on ahci0
> 
> ada1 at ahcich2 bus 0 scbus2 target 0 lun 0
> 
> /dev/ada1p1 on /home (ufs, local, journaled soft-updates)
> 
> An older kernel at r299170 does not exhibit this peculiar behavior and
> mounts /home with no delays.

Are you able to send dmesg output before and after?

Ken
-- 
Kenneth Merry
k...@freebsd.org


Re: CAM Shingled Disk support patches available

2016-03-02 Thread Kenneth D. Merry
On Tue, Mar 01, 2016 at 20:07:19 -0700, Scott Long wrote:
> Hi Ken,
> 
> I'm against changing the function signature of scsi_ata_pass_16().  Even
> if you manage to get things right with symbol versioning, it still leads to
> problems of code compatibility.  Maybe pre-existing binaries will work, but
> source code will forever have to include an #if __FreeBSD_version <
> xx bit of nonsense.

Good point, that would be annoying.

> I agree that it was incorrect for dxferlen to be declared as a uint16_t.
> However, the function already contains a sector count argument pair.  In
> theory the sector count multiplied by the sector length, both of which the
> application should know in order to arrive at a sensible dxferlen, can
> substitute for the dxferlen argument.  If so, then we can just ignore that
> argument and declare that sector_count has logical priority.

Okay.  That will probably work for the most part.
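
To make that concrete, here is a small sketch of one way to read the
suggestion: the sector count wins whenever it describes more data than a
16-bit dxfer_len can hold.  effective_dxfer_len() and its parameters are
hypothetical, not an existing CAM function, and it ignores any ATA special
cases for the count field.

```c
#include <stdint.h>

/*
 * Hypothetical helper: pick the transfer length, letting the sector count
 * take priority whenever it describes more than a 16-bit dxfer_len can hold.
 */
uint32_t
effective_dxfer_len(uint16_t dxfer_len, uint16_t sector_count,
    uint32_t sector_size)
{
	uint64_t by_sectors = (uint64_t)sector_count * sector_size;

	if (by_sectors > UINT32_MAX)
		return (UINT32_MAX);	/* saturate rather than wrap */
	if (by_sectors > UINT16_MAX)
		return ((uint32_t)by_sectors);
	return (dxfer_len);
}
```

With 256 sectors of 512 bytes this returns 131072 regardless of what was
passed in dxfer_len; for small transfers the caller's dxfer_len is used
unchanged.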

> Really though, I think that scsi_ata_pass_16() is a crummy function.  If its
> purpose is to implement SAT-3 12.2.2, it does an incredibly poor job at it:
> 
> - By my count, it only covers 12 of the available 13 registers.
> 
> - It has no 12 byte, opcode 0xa1 variant.
> 
> - It doesn't make any allowance for providing the response registers to the
> caller on completion.  Well, maybe it kinda does through a sense descriptor,
> but... it's kinda open to vague interpretation.
> 
> - Its use of the registers is clunky, assuming for example that you'll only
> want to fill the six LBA registers with a host-ordered 64-bit number.  There
> are plenty of commands that re-use sub-parts of the LBA, features, and/or
> sector count registers for different things.
> 
> I know you stated that you didn't want to do this, but I think it's better
> to start over with a better function that has a better signature and a new
> name.  In fact, I think it's better to use the existing ata_cmd and ata_res
> structures from sys/cam/ata/ata_all.h, provide accessors for the multi-byte
> registers if needed, provide 12-byte compatibility, and simplify the
> signature.  Something like this:
> 
> void scsi_ata_pass(struct ccb_scsiio *csio, u_int32_t retries,
>   void (*cbfcnp)(struct cam_periph *, union ccb *),
>   u_int32_t flags, u_int8_t tag_action,
>   struct ata_cmd *cmd, struct ata_res *res,
>   u_int8_t *data_ptr, u_int32_t dxfer_len,
>   u_int8_t *data_ptr, u_int16_t dxfer_len,

I assume you only intended one line there, not two. :)

>   u_int8_t sense_len, u_int32_t timeout);
> 
> To differentiate between the 12 and 16 byte variants, you'd look at the
> AP_EXTEND flag in the protocol field.  Btw, the handling of that flag is
> inconsistent in the implementation of the existing scsi_ata_pass_16().  If
> the caller provides an ata_res pointer then it gets filled on completion,
> otherwise the caller does its best to look at 12.2.2.6 and extract what it
> can from the sense descriptor.
> 
> So my proposal is to create a new scsi_ata_pass and deprecate but not remove
> scsi_ata_pass_16.  Tell people that if they need to use it, dxfer_len is
> going to have lower priority than sector_count/sector_count_exp if the latter
> multiply to more than 65535.

In general I think that's a reasonable idea, but we should probably go
further.

While we're at it, we should figure out what we need to do to add the
Auxiliary register to struct ata_cmd.  We'll need that to do the NCQ
versions of the various SMR commands, as well as TRIM.

The obvious challenge is that probably means changing the existing struct
ccb_ataio CCB and bumping the CAM version.  At least that will be source
compatible, but will require ifdefs if people want to compile on older
versions of FreeBSD.  But in that case, they'll also be faced with no
support for sending the NCQ versions of the commands, anyway.  No way
around that, though, since we have to follow the changing specs.

Ken
-- 
Kenneth Merry
k...@freebsd.org


Re: CAM Shingled Disk support patches available

2016-03-01 Thread Kenneth D. Merry
I have a new set of SMR patches available.  (See the original message below
for a more detailed description of what these patches do.)

The primary change is to add library versioning to libcam so that we can
change the function prototype of scsi_ata_pass_16() in a way that won't
break existing binaries.

If someone more familiar with library versioning wants to review this, I'd
appreciate it.

The patches are here:

FreeBSD/head, as of SVN revision 296278

https://people.freebsd.org/~ken/cam_smr.head.20160301.1.txt

stable/10, as of SVN revision 296248

https://people.freebsd.org/~ken/cam_smr.stable10.20160301.1.txt

(Note that although there is a stable/10 version of the patches, I'm not
planning to merge them to stable/10 because of the change to struct bio.  I
can't really figure out a good way to make that backward compatible.  If
there is consensus that breaking it is fine because it isn't a user API,
then that may be another story.)

The problem is that the existing, in-tree version of scsi_ata_pass_16() has
a dxfer_len argument that is a uint16_t.  That restricts transfer sizes to
64KB.  So, we need to update it to allow larger than 64K transfers.  I
could just create a new function, but I'd rather just retire the broken
version.
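
To illustrate the problem, here is a tiny sketch of what a 16-bit length
argument does to a request larger than it can represent.  The two functions
are stand-ins I made up for this message, not the real libcam prototypes;
they only model the width of the dxfer_len parameter.

```c
#include <stdint.h>

/* Stand-in for the old prototype: the length is narrowed to 16 bits. */
uint32_t
len_seen_by_old_api(uint16_t dxfer_len)
{
	return (dxfer_len);
}

/* Stand-in for the proposed prototype: the full 32-bit length survives. */
uint32_t
len_seen_by_new_api(uint32_t dxfer_len)
{
	return (dxfer_len);
}
```

A 128KB request (131072 bytes) converted to uint16_t becomes 0, and 66048
bytes becomes 512, so the callee silently sees the wrong length.  The largest
value that survives intact is 65535.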

The intent here is that:

1. Binaries built against the old version of libcam, before versioning was
turned on, will get the old version of the scsi_ata_pass_16() function with
a uint16_t dxfer_len.

2. Binaries built against the new version of libcam will get the new
version of the scsi_ata_pass_16() function with a uint32_t dxfer_len.

I've tested this, and it appears to work, but I'm not 100% certain this is
all correct.  I looked at Dan Eischen's description of symbol versioning
here:

https://people.freebsd.org/~deischen/symver/freebsd_versioning.txt

And it looks like the actual implementation is a little different than what
is described there.  I looked around the tree, and didn't see anything that
is obviously exactly like what I'm trying to do here.

So, what I did is as follows:

1. For the kernel, the only change is to switch the dxfer_len argument from
a uint16_t to a uint32_t.

2. For userland, in scsi_all.c, there are now two versions of
scsi_ata_pass_16 -- _ver1 and _ver2.  _ver1 is aliased to
scsi_ata_pass_16() for FBSD_1.3 using __sym_compat().  _ver2 is aliased to
scsi_ata_pass_16() for FBSD_1.4 using __sym_default().

3. In lib/libcam/Versions.def, I defined FBSD_1.3 and FBSD_1.4, which
depends on FBSD_1.3.

4. In lib/libcam/Symbol.map, I pulled out all of the functions defined in
libcam, sorted them, and defined them in FBSD_1.3.  I moved
scsi_ata_pass_16() to FBSD_1.4.  (According to the freebsd_versioning.txt
paper linked above, I should have been able to have scsi_ata_pass_16() in
both FBSD_1.3 and FBSD_1.4, but that isn't the case in practice.)

In testing an old binary (linked against libcam without symbol versioning)
against a new libcam (with symbol versioning), the old version of the
function appears to be used.  With a new binary, the new version of the
function appears to be used.

So it looks like things work as intended, but I don't fully trust my
understanding here.  So, if someone could take a look at the changes, I'd
appreciate it.

In particular, I have a few questions:

1. If this change to scsi_ata_pass_16() gets merged to stable/10 (apart
from the larger SMR changes), what should be done with the libcam library
version?

2. Are 1.3 and 1.4 the proper versions to use?

3. If we make additional CAM helper function library changes, when do the
versions get bumped?  i.e., is this an opportunity to look for other
library functions with issues and make changes if possible?

4. When you're going from an unversioned library to a versioned library,
which version of a function gets linked in to a binary linked to the
unversioned library when you run it against a versioned library?  In other
words, what is supposed to happen in the test scenario I tried above, and
am I really seeing what is supposed to happen?

Thanks,

Ken

On Mon, Jan 18, 2016 at 17:37:04 -0500, Kenneth D. Merry wrote:
> I have a new set of SMR patches available.  See below for the full
> explanation.
> 
> The primary change here is that I have added SMR support to the ada(4)
> driver.  I spent some time considering whether to try to make the da(4) and
> ada(4) probe infrastructure somewhat common, but in the end concluded it
> would be too involved with not enough code reduction (if any) in the end.
> 
> So, although the ideas are similar, the probe logic is separate.
> 
> Note that NCQ support for SMR commands (Report Zones, Reset Write Pointer,
> etc.) for SATA protocol shingled drives isn't active.  For both the da(4)
> and ada(4) driver this is for lack of a way to plumb the ATA Auxiliary
> register down to the drive.
> 
> In the ada(4) case, we need to add the register to struct ccb_ataio and
> add support in one 

Re: CAM Shingled Disk support patches available

2016-01-19 Thread Kenneth D. Merry
On Mon, Jan 18, 2016 at 16:50:34 -0800, Warner Losh wrote:
> 
> > On Jan 18, 2016, at 2:37 PM, Kenneth D. Merry <k...@freebsd.org> wrote:
> > 
> > I have a new set of SMR patches available.  See below for the full
> > explanation.
> > 
> > The primary change here is that I have added SMR support to the ada(4)
> > driver.  I spent some time considering whether to try to make the da(4) and
> > ada(4) probe infrastructure somewhat common, but in the end concluded it
> > would be too involved with not enough code reduction (if any) in the end.
> > 
> > So, although the ideas are similar, the probe logic is separate.
> > 
> > Note that NCQ support for SMR commands (Report Zones, Reset Write Pointer,
> > etc.) for SATA protocol shingled drives isn't active.  For both the da(4)
> > and ada(4) driver this is for lack of a way to plumb the ATA Auxiliary
> > register down to the drive.
> 
> I've plumbed it down, but in a gross, kludgy way to make NCQ Trim work
> where the only value in the Auxiliary register needs to be 1. It only takes
> up one bit, but it doesn't change the size of the CCB. If the NCQ Trim
> work wasn't based on the I/O scheduler, I'd have pushed it into head
> and would be happy to share code.

Yeah, for SMR, we'll need to pass the full register down.  That is how you
specify the service action (open, close, finish, reset write pointer,
report zones).

> AHCI can send it, but it turns out that LSI's drivers (mpt, mps, etc)
> can't do it due to firmware inadequacies. The ability to send a FIS
> in these firmwares looked promising, but it requires a full draining of
> other requests, which kind of defeats the purpose of NCQ.

Yeah, that would kinda defeat the purpose.  I'm sending a SCSI command
(ATA PASS-THROUGH) to get the SATA zone commands down there.  Those are
treated like an ordered tag by the LSI firmware as well.  Which is just as
well, since there is no way to specify the Auxiliary register via that SCSI
command, and so we can't do NCQ anyway.

LSI/Avago said they're planning to support the zone commands in the SAT
layer in the HBAs in the 12Gb boards.  Phase 10 doesn't have it from what I
understand, but hopefully that'll show up soon.  The translation is in the
latest SAT draft, and it is very straightforward to map from one to the
other, because the SCSI and ATA commands and semantics are pretty much
identical.

> > In the ada(4) case, we need to add the register to struct ccb_ataio and
> > add support in one or more of the underlying SATA drivers, e.g. ahci(4).
> 
> I believe that changes the size of the CCB, so I tried to avoid
> that since I didn't want to force a recompile of camcontrol(8).
> Adding it to the atacmd structure wasn't so bad, and the CCB size
> didn't completely change. The problem was that the atacmd changed
> size and pushed all the other fields.

Yes.  In order to do it, we'll need to add it to struct atacmd, and add
compatibility shims.  I don't see another way to do it unfortunately.

> > In the da(4) case, it will require an update of the T-10 SAT spec to
> > provide a way to pass the Auxiliary register down via the SCSI ATA
> > PASS-THROUGH command, and then a subsequent update of the SAT layer in
> > various vendors' SAS controller firmware.  At that point, there may be
> > an official mapping of the SCSI ZBC commands to the ATA ZAC commands, and
> > we may be able to just issue the SCSI version of the commands instead of
> > composing ATA commands in the da(4) driver.  (We'll still need to keep the
> > ATA passthrough version for a while at least to support controllers that
> > don't have the updated translation code.)
> 
> I looked to implement things here, but didn't want to invent something that
> the T-10 would later reinvent.

Yeah.  Is NCQ trim a new thing?  Is that why you were looking at sending it
down via a FIS?

If so, it is likely that LSI will add it to the SCSI Unmap translation in
the firmware.  Of course if it isn't already in there, they're only going
to put it in their 12Gb controllers and not in the 6Gb controllers at this
point.

Since the SAT spec has the mapping for the SCSI ZBC -> ZAC commands, it sounds
like that'll make it into the LSI 12Gb firmware at some point.

> > FreeBSD/head as of SVN revision 294105:
> > 
> > https://people.freebsd.org/~ken/cam_smr.head.20160118.1.txt
> > 
> > FreeBSD stable/10 as of SVN revision 294100:
> > 
> > https://people.freebsd.org/~ken/cam_smr.stable10.20160118.1.txt
> > 
> > Testing and comments are welcome.
> 
> So having said all that, I'm totally open to something better.

I think that for the ATA side, we'll just have to add the register to the
CCB, bump the version and add com

Re: CAM Shingled Disk support patches available

2016-01-19 Thread Kenneth D. Merry
On Tue, Jan 19, 2016 at 14:45:23 +0300, Slawa Olhovchenkov wrote:
> On Mon, Jan 18, 2016 at 05:37:04PM -0500, Kenneth D. Merry wrote:
> 
> > I have a new set of SMR patches available.  See below for the full
> > explanation.
> > 
> > The primary change here is that I have added SMR support to the ada(4)
> > driver.  I spent some time considering whether to try to make the da(4) and
> > ada(4) probe infrastructure somewhat common, but in the end concluded it
> > would be too involved with not enough code reduction (if any) in the end.
> > 
> > So, although the ideas are similar, the probe logic is separate.
> > 
> > Note that NCQ support for SMR commands (Report Zones, Reset Write Pointer,
> > etc.) for SATA protocol shingled drives isn't active.  For both the da(4)
> > and ada(4) driver this is for lack of a way to plumb the ATA Auxiliary
> > register down to the drive.
> > 
> > In the ada(4) case, we need to add the register to struct ccb_ataio and
> > add support in one or more of the underlying SATA drivers, e.g. ahci(4).
> > 
> > In the da(4) case, it will require an update of the T-10 SAT spec to
> > provide a way to pass the Auxiliary register down via the SCSI ATA
> > PASS-THROUGH command, and then a subsequent update of the SAT layer in
> > various vendors' SAS controller firmware.  At that point, there may be 
> > an official mapping of the SCSI ZBC commands to the ATA ZAC commands, and
> > we may be able to just issue the SCSI version of the commands instead of
> > composing ATA commands in the da(4) driver.  (We'll still need to keep the
> > ATA passthrough version for a while at least to support controllers that
> > don't have the updated translation code.)
> 
> Please check my understanding: do SCSI devices currently lack SMR support
> at the [hardware] vendor level?  Are only SATA controllers currently
> compatible with SMR (at the command level)?  (I am not asking about FreeBSD
> support; the question is about the general state of things.)

No, there are SAS/SCSI SMR drives in development, and there is the SCSI ZBC
spec that defines the command set.  I don't know whether any vendors are
shipping SAS/SCSI SMR drives yet.

You can use SATA drives (SMR or not) with either a SATA controller or a SAS
controller.  But the way you talk to a SATA drive through a SAS controller
is with SCSI commands.  There is a SCSI spec (SAT) that defines the mapping
of SCSI commands to ATA commands.  It has recently been updated to support
mapping SMR commands from SCSI to ATA, but most (all?) SAS controllers
have not caught up with the spec.

So to use a SATA SMR drive with a SAS controller that doesn't know how to
map SMR commands from SCSI to ATA, you have to send the ATA SMR commands
through the SCSI ATA PASS-THROUGH command.  That just bypasses the usual
translations, and allows sending ATA commands in something like their
native form.
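
As a rough illustration of what "something like their native form" means,
here is a sketch that packs ATA registers into a 16-byte ATA PASS-THROUGH
CDB.  The byte layout is my reading of the SAT spec and should be checked
against it; build_ata_pass_16() is a made-up name for this example and is
not the libcam scsi_ata_pass_16() function.

```c
#include <stdint.h>
#include <string.h>

#define	SK_ATA_PASS_16	0x85	/* ATA PASS-THROUGH (16) opcode */
#define	SK_EXTEND	0x01	/* 48-bit command: "exp" register bytes valid */

/*
 * Fill a 16-byte CDB from ATA register values.  "protocol" is the 4-bit
 * SAT protocol code and "flags" is byte 2 (transfer direction and length
 * encoding), passed through untouched.
 */
void
build_ata_pass_16(uint8_t cdb[16], uint8_t protocol, uint8_t flags,
    uint16_t features, uint16_t sector_count, uint64_t lba,
    uint8_t device, uint8_t command)
{
	memset(cdb, 0, 16);
	cdb[0] = SK_ATA_PASS_16;
	cdb[1] = (uint8_t)(((protocol & 0x0f) << 1) | SK_EXTEND);
	cdb[2] = flags;
	cdb[3] = (uint8_t)(features >> 8);
	cdb[4] = (uint8_t)(features & 0xff);
	cdb[5] = (uint8_t)(sector_count >> 8);
	cdb[6] = (uint8_t)(sector_count & 0xff);
	/* The 48-bit LBA is interleaved: high ("exp") byte, then low byte. */
	cdb[7] = (uint8_t)((lba >> 24) & 0xff);
	cdb[8] = (uint8_t)(lba & 0xff);
	cdb[9] = (uint8_t)((lba >> 32) & 0xff);
	cdb[10] = (uint8_t)((lba >> 8) & 0xff);
	cdb[11] = (uint8_t)((lba >> 40) & 0xff);
	cdb[12] = (uint8_t)((lba >> 16) & 0xff);
	cdb[13] = device;
	cdb[14] = command;
	/* cdb[15] is the SCSI CONTROL byte, left at zero. */
}
```

Note that there is no slot for the Auxiliary register anywhere in those 16
bytes, which is the limitation discussed elsewhere in this thread.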

Ken
-- 
Kenneth Merry
k...@freebsd.org


Re: CAM Shingled Disk support patches available

2016-01-19 Thread Kenneth D. Merry
On Tue, Jan 19, 2016 at 20:02:52 +0300, Slawa Olhovchenkov wrote:
> On Tue, Jan 19, 2016 at 11:38:31AM -0500, Kenneth D. Merry wrote:
> 
> > On Tue, Jan 19, 2016 at 14:45:23 +0300, Slawa Olhovchenkov wrote:
> > > On Mon, Jan 18, 2016 at 05:37:04PM -0500, Kenneth D. Merry wrote:
> > > 
> > > > I have a new set of SMR patches available.  See below for the full
> > > > explanation.
> > > > 
> > > > The primary change here is that I have added SMR support to the ada(4)
> > > > driver.  I spent some time considering whether to try to make the da(4)
> > > > and ada(4) probe infrastructure somewhat common, but in the end
> > > > concluded it would be too involved with not enough code reduction (if
> > > > any) in the end.
> > > > 
> > > > So, although the ideas are similar, the probe logic is separate.
> > > > 
> > > > Note that NCQ support for SMR commands (Report Zones, Reset Write
> > > > Pointer, etc.) for SATA protocol shingled drives isn't active.  For both
> > > > the da(4) and ada(4) driver this is for lack of a way to plumb the ATA
> > > > Auxiliary register down to the drive.
> > > > 
> > > > In the ada(4) case, we need to add the register to struct ccb_ataio and
> > > > add support in one or more of the underlying SATA drivers, e.g. ahci(4).
> > > > 
> > > > In the da(4) case, it will require an update of the T-10 SAT spec to
> > > > provide a way to pass the Auxiliary register down via the SCSI ATA
> > > > PASS-THROUGH command, and then a subsequent update of the SAT layer in
> > > > various vendors' SAS controller firmware.  At that point, there may be
> > > > an official mapping of the SCSI ZBC commands to the ATA ZAC commands,
> > > > and we may be able to just issue the SCSI version of the commands
> > > > instead of composing ATA commands in the da(4) driver.  (We'll still
> > > > need to keep the ATA passthrough version for a while at least to support
> > > > controllers that don't have the updated translation code.)
> > > 
> > > > Please check my understanding: do SCSI devices currently lack SMR
> > > > support at the [hardware] vendor level?  Are only SATA controllers
> > > > currently compatible with SMR (at the command level)?  (I am not asking
> > > > about FreeBSD support; the question is about the general state of
> > > > things.)
> > 
> > No, there are SAS/SCSI SMR drives in development, and there is the SCSI ZBC
> > spec that defines the command set.  I don't know whether any vendors are
> > shipping SAS/SCSI SMR drives yet.
> > 
> > You can use SATA drives (SMR or not) with either a SATA controller or a SAS
> > controller.  But the way you talk to a SATA drive through a SAS controller
> > is with SCSI commands.  There is a SCSI spec (SAT) that defines the mapping
> > of SCSI commands to ATA commands.  It has recently been updated to support
> > mapping SMR commands from SCSI to ATA, but most (all?) SAS controllers
> > have not caught up with the spec.
> > 
> > So to use a SATA SMR drive with a SAS controller that doesn't know how to
> > map SMR commands from SCSI to ATA, you have to send the ATA SMR commands
> > through the SCSI ATA PASS-THROUGH command.  That just bypasses the usual
> > translations, and allows sending ATA commands in something like their
> > native form.
> 
> What about expanders and port replicators (with SATA drives and SAS HBA
> controllers, of course)?  Does the expander need to be compatible with SMR,
> or is any expander with SATA support automatically compatible?

Expanders and port replicators shouldn't matter.  The place where you need
to know about SMR is the place where the native ATA or SCSI drive commands
are generated.  Expanders and port replicators typically just pass commands
through.

Ken
-- 
Kenneth Merry
k...@freebsd.org


Re: CAM Shingled Disk support patches available

2016-01-18 Thread Kenneth D. Merry
I have a new set of SMR patches available.  See below for the full
explanation.

The primary change here is that I have added SMR support to the ada(4)
driver.  I spent some time considering whether to try to make the da(4) and
ada(4) probe infrastructure somewhat common, but in the end concluded it
would be too involved with not enough code reduction (if any) in the end.

So, although the ideas are similar, the probe logic is separate.

Note that NCQ support for SMR commands (Report Zones, Reset Write Pointer,
etc.) for SATA protocol shingled drives isn't active.  For both the da(4)
and ada(4) driver this is for lack of a way to plumb the ATA Auxiliary
register down to the drive.

In the ada(4) case, we need to add the register to struct ccb_ataio and
add support in one or more of the underlying SATA drivers, e.g. ahci(4).

In the da(4) case, it will require an update of the T-10 SAT spec to
provide a way to pass the Auxiliary register down via the SCSI ATA
PASS-THROUGH command, and then a subsequent update of the SAT layer in
various vendors' SAS controller firmware.  At that point, there may be 
an official mapping of the SCSI ZBC commands to the ATA ZAC commands, and
we may be able to just issue the SCSI version of the commands instead of
composing ATA commands in the da(4) driver.  (We'll still need to keep the
ATA passthrough version for a while at least to support controllers that
don't have the updated translation code.)

FreeBSD/head as of SVN revision 294105:

https://people.freebsd.org/~ken/cam_smr.head.20160118.1.txt

FreeBSD stable/10 as of SVN revision 294100:

https://people.freebsd.org/~ken/cam_smr.stable10.20160118.1.txt

Testing and comments are welcome.

Ken

On Wed, Nov 18, 2015 at 12:13:09 -0500, Kenneth D. Merry wrote:
> 
> I have work in progress patches to add SMR (Shingled Magnetic Recording)
> support to CAM and GEOM here:
> 
> FreeBSD/head as of SVN revision 290997:
> 
> https://people.freebsd.org/~ken/cam_smr.head.20151117.1.txt
> 
> FreeBSD stable/10 as of SVN revision 290995:
> 
> https://people.freebsd.org/~ken/cam_smr.stable10.20151117.1.txt
> 
> This includes support for Host Managed, Host Aware and Drive Managed SMR
> drives that are either SCSI (ZBC) or ATA (ZAC) attached via a SAS
> controller.  This does not include support for SMR ATA drives attached via
> an ATA controller.  Also, I have not yet figured out how to properly detect
> a Host Managed ATA drive, so this code won't do that.
> 
> The big drive vendors are moving to SMR for at least some of their drives.
> The primary challenge with SMR is that it requires writing a relatively
> large zone sequentially starting at the beginning of the zone.  The usual
> zone size is 256MB.  It is conceptually almost like having a 256MB sector
> size.
> 
> We (Spectra Logic) are working on ZFS changes that will use this CAM and
> GEOM infrastructure to make ZFS play well with SMR drives.  Those changes
> aren't yet done.
> 
> The patches linked above include:
>  o A new 'camcontrol zone' command that allows displaying and managing
>    drive zones via SCSI/ATA passthrough.
>  o A new zonectl(8) utility that uses the new DIOCZONECMD ioctl to display
>    and manage zones via the da(4) (and later ada(4)) driver.
>  o Changes to diskinfo -v to display the zone mode of a drive.
>  o A new disk zone API, sys/sys/disk_zone.h.
>  o A new bio type, BIO_ZONE, and modifications to GEOM to support it.  This
>    new bio will allow filesystems to query zone support in a drive and
>    manage zoned drives.
>  o Extensive modifications to the da(4) driver to handle probing SCSI and
>    SATA behind SAS SMR drives.
>  o Additional CAM CDB building functions for zone commands.
> 
> The current issues that need to be addressed are:
>  o The da(4) driver now has 6 additional probe states, 5 of which are
>    needed for probing ATA drives behind SAS controllers.  I have not yet
>    added support for BIO_ZONE bios to ada(4), but it will be very similar
>    to the da(4) driver version.  The ATA probe code needs to be pulled
>    out of the da(4) driver and changed into a form that will allow it to
>    work for either the ada(4) or da(4) driver.  Otherwise we'll have a fair
>    amount of code duplication between the two drivers.
> 
>  o There is a reasonable amount of code duplication between 'camcontrol zone'
>    and zonectl(8).  This was done for speed / expediency's sake, but it may
>    be possible to make more of the code common there.
> 
>  o In order to add the new BIO_ZONE bio command, I had to change the bio_cmd
>    field in struct bio from a uint8_t to a uint16_t.  This will cause
>    binary compatibility problems with any 3rd party loadable modules.
>    Advice on how to handle this would be welcome.
> 
>  o In the process of developing these changes, 

Re: CAM Shingled Disk support patches available

2015-11-19 Thread Kenneth D. Merry
On Thu, Nov 19, 2015 at 12:48:41 -0600, Matthew D. Fuller wrote:
> On Wed, Nov 18, 2015 at 12:13:09PM -0500 I heard the voice of
> Kenneth D. Merry, and lo! it spake thus:
> > 
> > Testing and comments are welcome.
> 
> GELI does explicit handling of each BIO type, so will need to be
> updated to pass it through (possibly in the form of inverting the
> default handling?) or it'll just EOPNOTSUPP it, whether the underlying
> layer does or not.  I wouldn't be surprised if there were other geom
> layers that did similar things.
> 
> Not meant to be read as some kind of "you need to"; just a comment on
> a possible [lack of] impact.

You're correct.  For GEOM classes like GELI that don't change the layout on
disk, passing the BIO_ZONE bio through would be the right thing to do.

For those that change the layout (i.e. the lba you write on the virtual
disk doesn't match what goes down to the physical disk), like graid or
gstripe, I think all we really need to do is just make sure they return
EOPNOTSUPP.  If someone wants to modify that code to handle shingled disks,
they can certainly do that.

Ken
-- 
Kenneth Merry
k...@freebsd.org
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


CAM Shingled Disk support patches available

2015-11-18 Thread Kenneth D. Merry

I have work in progress patches to add SMR (Shingled Magnetic Recording)
support to CAM and GEOM here:

FreeBSD/head as of SVN revision 290997:

https://people.freebsd.org/~ken/cam_smr.head.20151117.1.txt

FreeBSD stable/10 as of SVN revision 290995:

https://people.freebsd.org/~ken/cam_smr.stable10.20151117.1.txt

This includes support for Host Managed, Host Aware and Drive Managed SMR
drives that are either SCSI (ZBC) or ATA (ZAC) attached via a SAS
controller.  This does not include support for SMR ATA drives attached via
an ATA controller.  Also, I have not yet figured out how to properly detect
a Host Managed ATA drive, so this code won't do that.

The big drive vendors are moving to SMR for at least some of their drives.
The primary challenge with SMR is that it requires writing a relatively
large zone sequentially starting at the beginning of the zone.  The usual
zone size is 256MB.  It is conceptually almost like having a 256MB sector
size.
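
To make the sequential-write constraint concrete, here is a minimal sketch
in C.  It assumes 512-byte sectors and a uniform 256MB zone size, and the
ex_zone structure and function names are hypothetical; real drives report
their zone layout and write pointers themselves.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_SECTOR_SIZE	512ULL				/* assumed */
#define EX_ZONE_BYTES	(256ULL * 1024 * 1024)		/* the usual zone size */
#define EX_ZONE_SECTORS	(EX_ZONE_BYTES / EX_SECTOR_SIZE)

/* Hypothetical per-zone state: where the zone starts and the next writable LBA. */
struct ex_zone {
	uint64_t start_lba;
	uint64_t write_pointer;
};

/* Index of the zone that contains a given LBA, assuming uniform zones. */
static uint64_t
ex_zone_index(uint64_t lba)
{
	return (lba / EX_ZONE_SECTORS);
}

/*
 * A write is acceptable only if it starts exactly at the write pointer
 * and does not run past the end of the zone.
 */
static bool
ex_zone_write_ok(const struct ex_zone *z, uint64_t lba, uint64_t nsectors)
{
	uint64_t zone_end = z->start_lba + EX_ZONE_SECTORS;

	return (lba == z->write_pointer && nsectors <= zone_end - lba);
}
```

With these assumptions a zone holds 524288 sectors, so a write at LBA 8 to
an empty zone starting at LBA 0 is refused, while a write at LBA 0 is fine.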

We (Spectra Logic) are working on ZFS changes that will use this CAM and
GEOM infrastructure to make ZFS play well with SMR drives.  Those changes
aren't yet done.

The patches linked above include:
 o A new 'camcontrol zone' command that allows displaying and managing
   drive zones via SCSI/ATA passthrough.
 o A new zonectl(8) utility that uses the new DIOCZONECMD ioctl to display
   and manage zones via the da(4) (and later ada(4)) driver.
 o Changes to diskinfo -v to display the zone mode of a drive.
 o A new disk zone API, sys/sys/disk_zone.h.
 o A new bio type, BIO_ZONE, and modifications to GEOM to support it.  This
   new bio will allow filesystems to query zone support in a drive and
   manage zoned drives.
 o Extensive modifications to the da(4) driver to handle probing SCSI and
   SATA behind SAS SMR drives.
 o Additional CAM CDB building functions for zone commands.

The current issues that need to be addressed are:
 o The da(4) driver now has 6 additional probe states, 5 of which are
   needed for probing ATA drives behind SAS controllers.  I have not yet
   added support for BIO_ZONE bios to ada(4), but it will be very similar
   to the da(4) driver version.  The ATA probe code needs to be pulled
   out of the da(4) driver and changed into a form that will allow it to
   work for either the ada(4) or da(4) driver.  Otherwise we'll have a fair
   amount of code duplication between the two drivers.

 o There is a reasonable amount of code duplication between 'camcontrol zone'
   and zonectl(8).  This was done for speed / expediency's sake, but it may
   be possible to make more of the code common there.

 o In order to add the new BIO_ZONE bio command, I had to change the bio_cmd
   field in struct bio from a uint8_t to a uint16_t.  This will cause
   binary compatibility problems with any 3rd party loadable modules.
   Advice on how to handle this would be welcome.

 o In the process of developing these changes, I discovered that the
   dxfer_len parameter for scsi_ata_pass_16() was too small (uint16_t, and
   it needed to be uint32_t).  I increased it, but that will potentially
   cause a binary incompatibility problem with any existing applications
   that use the current API via libcam.  Advice on how to handle that
   would be welcome.
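
As an illustration of why a 16-bit length parameter is too small, the
sketch below shows what happens to a transfer length that no longer fits.
The two functions are hypothetical and only model the parameter width;
they are not the libcam prototypes.

```c
#include <stdint.h>

/* Hypothetical old-style interface: the length parameter is only 16 bits. */
static uint32_t
ex_old_len(uint16_t dxfer_len)
{
	return (dxfer_len);	/* anything above 65535 was already truncated */
}

/* Hypothetical widened interface: the full 32-bit length survives. */
static uint32_t
ex_new_len(uint32_t dxfer_len)
{
	return (dxfer_len);
}
```

A caller asking for 128 sectors of 512 bytes (65536 bytes) through the
narrow version ends up with a length of 0, because the value wraps modulo
65536.  Widening the parameter fixes that, but callers compiled against
the old prototype pass the argument with the old type, which is the
compatibility concern raised above.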

If you look through the code, you'll notice that the disk_zone.h API is
separate from the SCSI and ATA APIs.  The intent is to allow filesystems
and other consumers of the API to just talk to the disk zone API without
dealing with the SCSI and ATA specifics.  Another reason behind all of this
is that even though the SCSI ZBC and ATA ZAC specs were developed in
concert, and are intended to be functionally identical, they are still SCSI
and ATA.  As usual, SCSI is big endian and ATA is little endian.  So to
present a common API to the filesystem, we give all of the zone data back
in native byte order, regardless of the underlying device protocol.
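
A small sketch of the byte order point: the same 64-bit value arrives in
opposite byte orders from the two protocols and is decoded into one native
integer.  The helper names are made up for this example (FreeBSD has its
own endian helpers; these are spelled out so the sketch is self-contained).

```c
#include <stdint.h>

/* Decode a 64-bit big-endian field, as found in SCSI (ZBC) data. */
static uint64_t
ex_be64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return (v);
}

/* Decode a 64-bit little-endian field, as found in ATA (ZAC) data. */
static uint64_t
ex_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return (v);
}
```

Both decoders return a host-order integer, so a consumer of a common zone
API never needs to know which protocol the bytes came from.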

Another thing to note is the extensive use of ATA passthrough in the da(4)
driver.  This is necessary because the SCSI SAT (SCSI to ATA Translation)
specification has not yet caught up with translating SCSI zone commands
(ZBC) to ATA zone commands (ZAC).  So, until the spec is updated and LSI
and other vendors update their SCSI to ATA translation layers, we'll have
to use the ATA version of the commands when talking to ATA drives via SAS
controllers.

I have only tested the code so far with Seagate SATA Drive Managed and Host
Aware drives.  I would appreciate testing with any drives.  (And testing to
make sure that the patches don't cause problems with existing hardware.)
Right now, all you can really do is manage the zones manually using
camcontrol(8) or zonectl(8).  Automatic management will come with the ZFS
changes.  (Or changes to other filesystems if people want to do it.)

If you have a SATA Host Aware drive, in theory camcontrol(8) should allow
you to manage the drive if you have it attached to a SATA controller.

Here is an example of some of the commands.

Get 

Re: async pass(4) patches available

2015-11-18 Thread Kenneth D. Merry

I have updated the asynchronous pass(4) changes, and fixed a number of bugs
in camdd(8).

The new patches are here:

FreeBSD/head as of SVN revision 290970:

http://people.freebsd.org/~ken/async_pass.head.20151117.1.txt

FreeBSD stable/10 as of SVN revision 290899:

http://people.freebsd.org/~ken/async_pass.stable10.20151117.1.txt

And a description / draft commit message, this time updated to include all
the files that have changed:

http://people.freebsd.org/~ken/async_pass_commitmsg.20151118.txt

I have also attached the description to this email.

At this point I think I've fixed enough bugs and it is stable enough to go
into the tree.  That will allow others to more easily use the code and add
enhancements.

Ken

On Mon, Mar 30, 2015 at 16:23:58 -0600, Kenneth D. Merry wrote:
> 
> I have put patches to add an asynchronous interface to the pass(4) driver
> and add a new camdd(8) utility here:
> 
> FreeBSD/head as of SVN revision 280857:
> 
> http://people.freebsd.org/~ken/async_pass.head.20150330.1.txt
> 
> FreeBSD stable/10 as of SVN revision 280856:
> 
> http://people.freebsd.org/~ken/async_pass.stable_10.20150330.1.txt
> 
> And the description / draft commit message:
> 
> http://people.freebsd.org/~ken/async_pass_commitmsg.20150330.txt
> 
> I have also attached the description and draft commit message to this
> email.
> 
> The asynchronous changes to the pass(4) driver allow queueing and fetching
> CAM CCBs via two new ioctls.  Notification of completed I/O can come via
> kqueue(2), poll(2), select(2), etc.
> 
> The camdd(8) utility is intended as a simple data transfer utility,
> benchmark, and an in-tree example of how to use the asynchronous pass(4)
> interface.
> 
> camdd(8) is still a work in progress.  It needs to be cleaned up a bit and
> streamlined.
> 
> There is one known arrival and departure bug with the pass(4) driver
> changes.  We've reproduced it with our tests at Spectra, but I haven't yet
> tracked it down.
> 
> There are many more arrival and departure bugs in FreeBSD/head, however.
> We have fixed quite a few in our local tree, but the test (called devad2)
> that triggers all of the problems uses the asynchronous pass(4) interface.
> So this is a prerequisite for fixing/verifying those bugs.
> 
> Comments and testing are welcome!  As I said, camdd(8) in particular is a
> work in progress.  It could use some cleanup and there are some more useful
> features that could be added there.
> 
> Part of the reason for camdd(8) was as a test facility for the new
> interface.  But, it also serves as a useful demonstration of the
> asynchronous pass(4) functionality, given that the original application
> that used the API doesn't make sense to go into FreeBSD.  (It is
> Spectra-specific, and not generally useful.)
> 
> Ken
> -- 
> Kenneth Merry
> k...@freebsd.org

> Add asynchronous command support to the pass(4) driver, and the new
> camdd(8) utility.
> 
> CCBs may be queued to the driver via the new CAMIOQUEUE ioctl, and
> completed CCBs may be retrieved via the CAMIOGET ioctl.  User
> processes can use poll(2) or kevent(2) to get notification when
> I/O has completed.
> 
> While the existing CAMIOCOMMAND blocking ioctl interface only
> supports user virtual data pointers in a CCB (generally only
> one per CCB), the new CAMIOQUEUE ioctl supports user virtual and
> physical address pointers, as well as user virtual and physical
> scatter/gather lists.  This allows user applications to have more
> flexibility in their data handling operations.
> 
> Kernel memory for data transferred via the queued interface is 
> allocated from the zone allocator in MAXPHYS sized chunks, and user
> data is copied in and out.  This is likely faster than the
> vmapbuf()/vunmapbuf() method used by the CAMIOCOMMAND ioctl in
> configurations with many processors (there are more TLB shootdowns
> caused by the mapping/unmapping operation) but may not be as fast
> as running with unmapped I/O.
> 
> The new memory handling model for user requests also allows
> applications to send CCBs with request sizes that are larger than
> MAXPHYS.  The pass(4) driver now limits queued requests to the I/O
> size listed by the SIM driver in the maxio field in the Path
> Inquiry (XPT_PATH_INQ) CCB.
> 
> There are some things that would be good to add:
> 
> 1. Come up with a way to do unmapped I/O on multiple buffers.
>    Currently the unmapped I/O interface operates on a struct bio,
>    which includes only one address and length.  It would be nice
>    to be able to send an unmapped scatter/gather list down to
>    busdma.  This would allow eliminating the copy we currently do
>    for data.
> 
> 2. Add

Re: sa(4) driver changes available for test

2015-08-24 Thread Kenneth D. Merry
On Mon, Aug 24, 2015 at 17:24:22 -0400, Dan Langille wrote:
 
  On Mar 2, 2015, at 12:26 PM, Kenneth D. Merry k...@freebsd.org wrote:
  
  On Mon, Mar 02, 2015 at 11:43:15 -0500, Dan Langille wrote:
  
  On Mar 1, 2015, at 9:06 PM, Kenneth D. Merry k...@freebsd.org wrote:
  
  On Sun, Mar 01, 2015 at 19:40:40 -0500, Dan Langille wrote:
  
  On Mar 1, 2015, at 7:36 PM, Kenneth D. Merry k...@freebsd.org wrote:
  
  On Sun, Mar 01, 2015 at 19:28:37 -0500, Dan Langille wrote:
  
  On Mar 1, 2015, at 7:18 PM, Kenneth D. Merry k...@freebsd.org wrote:
  
  On Sun, Mar 01, 2015 at 17:06:24 -0500, Dan Langille wrote:
  
  On Feb 13, 2015, at 7:32 PM, Kenneth D. Merry k...@freebsd.org 
  wrote:
  
  
  I have a fairly large set of changes to the sa(4) driver and mt(1) driver
  that I'm planning to commit in the near future.
  
  A description of the changes is here and below in this message.
  
  If you have tape hardware and the inclination, I'd appreciate testing and
  feedback.
  
  
  Rough draft commit message:
  
  http://people.freebsd.org/~ken/sa_changes_commitmsg.20150213.3.txt
  
  The patches against FreeBSD/head as of SVN revision 278706:
  
  http://people.freebsd.org/~ken/sa_changes.20150213.3.txt
  
  And (untested) patches against FreeBSD stable/10 as of SVN revision 278721.
  
  http://people.freebsd.org/~ken/sa_changes.stable_10.20150213.3.txt
  
  
  The intent is to get the tape infrastructure more up to date, so we can
  support LTFS and more modern tape drives:
  
  http://www.ibm.com/systems/storage/tape/ltfs/
  
  I have ported IBM's LTFS Single Drive Edition to FreeBSD.  The port depends
  on the patches linked above.  It isn't fully cleaned up and ready for
  redistribution.  If you're interested, though, let me know and I'll tell
  you when it is ready to go out.  You need an IBM LTO-5, LTO-6, TS1140 or
  TS1150 tape drive.  HP drives aren't supported by IBM's LTFS, and older
  drives don't have the necessary features to support LTFS.
  
  The commit message below outlines most of the changes.
  
  A few comments:
  
  1. I'm planning to commit the XPT_DEV_ADVINFO changes separately.
  
  2. The XML output is similar to what GEOM and CTL do.  It would be nice to
     figure out how to put a standard schema on it so that standard tools
     could read it.  I don't know how feasible that is, since I haven't
     time to dig into it.  If anyone has suggestions on whether that is
     feasible or advisable, I'd appreciate feedback.
  
  3. I have tested with a reasonable amount of tape hardware (see below for a
     list), but more testing and feedback would be good.
  
  4. Standard 'mt status' output looks like this:
  
  # mt -f /dev/nsa3 status  -v
  Drive: sa3: IBM ULTRIUM-HH6 E4J1 Serial Number: 101500520A
  -
  Mode      Density      Blocksize  bpi     Compression
  Current:  0x5a:LTO-6   variable   384607  enabled (0xff)
  -
  Current Driver State: at rest.
  -
  Partition:   0      Calc File Number:   0     Calc Record Number: 0
  Residual:    0  Reported File Number:   0 Reported Record Number: 0
  Flags: BOP
  
  5. 'mt status -v' looks like this:
  
  # mt -f /dev/nsa3 status  -v
  Drive: sa3: IBM ULTRIUM-HH6 E4J1 Serial Number: 101500520A
  -
  Mode      Density      Blocksize  bpi     Compression
  Current:  0x5a:LTO-6   variable   384607  enabled (0xff)
  -
  Current Driver State: at rest.
  -
  Partition:   0      Calc File Number:   0     Calc Record Number: 0
  Residual:    0  Reported File Number:   0 Reported Record Number: 0
  Flags: BOP
  -
  Tape I/O parameters:
    Maximum I/O size allowed by driver and controller (maxio): 1081344 bytes
    Maximum I/O size reported by controller (cpi_maxio): 5197824 bytes
    Maximum block size supported by tape drive and media (max_blk): 8388608 bytes
    Minimum block size supported by tape drive and media (min_blk): 1 bytes
    Block granularity supported by tape drive and media (blk_gran): 0 bytes
    Maximum possible I/O size (max_effective_iosize): 1081344 bytes
  
  
  # mtx -f /dev/pass0 status
  Storage Changer /dev/pass0:2 Drives, 10 Slots ( 0 Import/Export )
  Data Transfer Element 0:Empty
  Data Transfer Element 1:Empty
Storage Element 1:Empty
Storage Element 2:Empty
Storage Element 3:Empty
Storage Element 4:Full :VolumeTag=FAI260  
Storage Element 5:Full :VolumeTag=FAI261  
Storage Element 6:Full :VolumeTag=FAI262  
Storage Element 7:Full :VolumeTag=FAI263  
Storage Element 8:Empty
Storage Element 9:Empty
Storage Element 10:Empty
  
  
  It was at this point I spent

Re: Why shoud we cause panic in scsi_da.c?

2015-07-14 Thread Kenneth D. Merry
On Mon, Jul 13, 2015 at 18:29:36 +0300, Alexander Motin wrote:
 Hi.
 
 On 13.07.2015 11:51, Kohji Okuno wrote:
  On 07/13/15 10:11, Kohji Okuno wrote:
  Could you comment on my question?
 
  I found panic() in scsi_da.c. Please find the following.
  I think we should return with error without panic().
  What do you think about this?
 
  scsi_da.c:
  3018 } else if (bp != NULL) {
  3019 if ((done_ccb->ccb_h.status & CAM_DEV_QFRZN) != 0)
  3020 panic("REQ_CMP with QFRZN");
 
 
  It looks to me more like an KASSERT() is appropriate here.
 
 As I can see, this panic() call was added by ken@ about 15 years ago.
 I've added him to CC in case he has some idea why it was done. In my
 personal opinion, I don't see much reason to allow CAM_DEV_QFRZN to be
 returned only together with an error. While it may make little sense in
 the case of successful command completion, I don't think it should be
 treated as an error. Simply removing this panic is probably a bad idea,
 since if it happens the device will just remain frozen forever, which
 will be difficult to diagnose, but I would rather just drop the device
 freeze in that case, the same as in the case of completion with an error.

I put it there because it indicates a software error.  The queue shouldn't
be frozen if the command is successful.  The reason for freezing the queue
is to allow error recovery to happen.  The queue will get unfrozen after
error recovery completes.

We could alternately just print a diagnostic message, unfreeze the queue
and move on, but the idea is to allow the driver writer to detect and
correct his error immediately.
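
The invariant being enforced can be written down as a small check.  This
is a sketch only: the EX_ constants are illustrative values modeled on a
CAM-style status word (a status code in the low bits plus a "queue frozen"
flag), and the numeric values are assumptions, not copied from the CAM
headers.

```c
#include <stdint.h>

/* Illustrative status-word layout; the values are assumptions. */
#define EX_STATUS_MASK	0x3fu	/* low bits hold the completion code */
#define EX_REQ_CMP	0x01u	/* completed without error */
#define EX_DEV_QFRZN	0x40u	/* device queue was frozen */

enum ex_verdict { EX_OK, EX_NEEDS_RECOVERY, EX_SOFTWARE_BUG };

/*
 * Success with a frozen queue means some component froze the queue
 * without reporting an error: flag that as a software bug.
 */
static enum ex_verdict
ex_check_status(uint32_t status)
{
	int completed = (status & EX_STATUS_MASK) == EX_REQ_CMP;
	int frozen = (status & EX_DEV_QFRZN) != 0;

	if (completed && frozen)
		return (EX_SOFTWARE_BUG);
	if (completed)
		return (EX_OK);
	return (EX_NEEDS_RECOVERY);	/* error path; a frozen queue is expected */
}
```

Whether the EX_SOFTWARE_BUG case should panic, trip a KASSERT(), or log
and unfreeze the queue is the policy question discussed in this thread.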

As for the original poster's problem, he has uncovered a bug that needs to
be fixed.  (And I don't mean in the da(4) driver.  The bug is in the
component that left the queue frozen.  Most likely in the USB driver, but
it will take a little more investigation.)  The panic worked as intended. :)

Ken
-- 
Kenneth Merry
k...@freebsd.org


camcontrol(8) attrib patches available

2015-05-20 Thread Kenneth D. Merry

I have put patches to camcontrol(8) to implement the attrib subcommand
here:

FreeBSD/head as of SVN revision 283160:

http://people.freebsd.org/~ken/camcontrol_attrib.20150520.1.txt

FreeBSD stable/10 as of SVN revision 283161:

http://people.freebsd.org/~ken/camcontrol_attrib.stable10.20150520.1.txt

The patches also add libcam support for handling SCSI READ ATTRIBUTE data,
and add a new sbuf_hexdump(3)/(9) routine.

The SCSI READ ATTRIBUTE command is used to read Medium Auxiliary Memory on
SCSI devices.  This is usually found in the small (4KB-16KB) flash chips on
LTO and other similar tapes.
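
For readers unfamiliar with the data format, here is a rough sketch of
parsing one attribute out of READ ATTRIBUTE data.  It assumes the usual
layout of a 2-byte identifier, a byte holding the read-only bit (bit 7)
and format code (low two bits), a 2-byte length, then the value, all big
endian.  The structure and function names are hypothetical and are not
the libcam API added by the patches.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of one MAM attribute. */
struct ex_mam_attr {
	uint16_t	 id;
	int		 read_only;
	uint8_t		 format;	/* 0 = binary, 1 = ASCII, 2 = text */
	uint16_t	 length;
	const uint8_t	*value;		/* points into the caller's buffer */
};

/* Parse one attribute header; returns 0 on success, -1 if buf is too short. */
static int
ex_parse_attr(const uint8_t *buf, size_t buflen, struct ex_mam_attr *out)
{
	if (buflen < 5)
		return (-1);
	out->id = (uint16_t)((buf[0] << 8) | buf[1]);
	out->read_only = (buf[2] & 0x80) != 0;
	out->format = buf[2] & 0x03;
	out->length = (uint16_t)((buf[3] << 8) | buf[4]);
	if (buflen - 5 < out->length)
		return (-1);
	out->value = buf + 5;
	return (0);
}

/* Interpret a binary attribute of up to 8 bytes as a big-endian integer. */
static uint64_t
ex_attr_binary(const struct ex_mam_attr *a)
{
	uint64_t v = 0;

	for (uint16_t i = 0; i < a->length && i < 8; i++)
		v = (v << 8) | a->value[i];
	return (v);
}
```

Fed the 13 bytes of an 8-byte, read-only attribute 0x0003 whose value is
29, this yields the fields behind a line such as "Load Count
(0x0003)[8](RO): 29" in the output below.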

I have not yet implemented attribute writing support.

Here is an abbreviated example of the output:

==
[root@black-pearl ~]# camcontrol attrib sa0 -r attr_val
Remaining Capacity in Partition (0x0000)[8](RO): 35048 MB
Maximum Capacity in Partition (0x0001)[8](RO): 35060 MB
TapeAlert Flags (0x0002)[8](RO): 0x0
Load Count (0x0003)[8](RO): 29
MAM Space Remaining (0x0004)[8](RO): 2321 bytes
Assigning Organization (0x0005)[8](RO): LTO-CVE
Format Density Code (0x0006)[1](RO): 0x5a
Initialization Count (0x0007)[2](RO): 20
Volume Change Reference (0x0009)[4](RO): 0x47
Device Vendor/Serial at Last Load (0x020a)[40](RO): IBM 1068022701
Device Vendor/Serial at Last Load - 1 (0x020b)[40](RO): IBM 1068022701
Device Vendor/Serial at Last Load - 2 (0x020c)[40](RO): IBM 1068022701
Device Vendor/Serial at Last Load - 3 (0x020d)[40](RO): IBM 1068022701
Total MB Written in Medium Life (0x0220)[8](RO): 40009 MB
Total MB Read in Medium Life (0x0221)[8](RO): 3149 MB
Total MB Written in Current/Last Load (0x0222)[8](RO): 0 MB
Total MB Read in Current/Last Load (0x0223)[8](RO): 12 MB
Logical Position of First Encrypted Block (0x0224)[8](RO): 18446744073709551615
Logical Position of First Unencrypted Block after First Encrypted Block (0x0225)[8](RO): 18446744073709551615
Medium Manufacturer (0x0400)[8](RO): HP
Medium Serial Number (0x0401)[32](RO): AE46TCFD0U
Medium Length (0x0402)[4](RO): 846 m
Medium Width (0x0403)[4](RO): 12.7 mm
Assigning Organization (0x0404)[8](RO): LTO-CVE
Medium Density Code (0x0405)[1](RO): 0x5a
Medium Manufacture Date (0x0406)[8](RO): 20130506
MAM Capacity (0x0407)[8](RO): 16384 bytes
Medium Type (0x0408)[1](RO): 0x0
Medium Type Information (0x0409)[2](RO): 0x0
Application Vendor (0x0800)[8](RW): IBM
Application Name (0x0801)[32](RW): LTFS
Application Version (0x0802)[8](RW): 1.3.0.2
User Medium Text Label (0x0803)[160](RW):
Text Localization Identifier (0x0805)[1](RW): 0x81
Barcode (0x0806)[32](RW):
Application Format Version (0x080b)[16](RW): 2.2.0
Volume Coherency Information (0x080c)[70](RW):
Volume Change Reference Value: 0x45
Volume Coherency Count: 1
Volume Coherency Set Identifier: 0x5
Application Client Specific Information: LTFS
LTFS UUID: 28076791-d64e-4cd7-bc43-fa51ec097d83
LTFS Version: 1
==

Testing and comments (on the camcontrol changes or the library changes) are
welcome.

Thanks,

Ken
-- 
Kenneth Merry
k...@freebsd.org


Re: async pass(4) patches available

2015-04-07 Thread Kenneth D. Merry
On Tue, Apr 07, 2015 at 13:16:04 +0200, Fabian Keil wrote:
 Kenneth D. Merry k...@freebsd.org wrote:
 
  On Mon, Apr 06, 2015 at 15:39:56 +0200, Fabian Keil wrote:
   Kenneth D. Merry k...@freebsd.org wrote:
   
I have put patches to add an asynchronous interface to the pass(4)
driver and add a new camdd(8) utility here:

FreeBSD/head as of SVN revision 280857:

http://people.freebsd.org/~ken/async_pass.head.20150330.1.txt
   [...]
Comments and testing are welcome!  As I said, camdd(8) in particular
is a work in progress.  It could use some cleanup and there are some
more useful features that could be added there.
   
   I've been using the patch for a couple of days on an amd64 system
   based on 11.0-CURRENT r280952 and didn't notice any obvious
   regressions using the system as usual.
 [...] 
   I also tried to test camdd, but didn't get it to work.
   Some failed attempts:
   
   [fk@kendra ~]$ sudo camdd -i pass=da0,bs=65536 -o file=blafsel.img
   (pass2:umass-sim0:0:0:0): READ(6). CDB: 08 00 00 00 80 00 
   (pass2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
   13 bytes read from pass2
   13 bytes written to blafsel.img
   20.3203 seconds elapsed
   0.00 MB/sec
   [fk@kendra ~]$ sudo hd blafsel.img 
     55 53 42 53 d9 02 00 00  00 00 01 00 01   |USBS.|
   000d
   [fk@kendra ~]$ sudo dd if=/dev/da0 bs=1k count=1  | hd | head -n 1
   1+0 records in
   1+0 records out
   1024 bytes transferred in 0.000603 secs (1697756 bytes/sec)
     fc 31 c0 8e c0 8e d8 8e  d0 bc 00 0e be 1a 7c bf  |.1|.|
  
  One possibility is that the device doesn't support 6-byte read/write
  requests.  The da(4) driver has quirk entries and code to figure that out
  and default to 10-byte read/write requests, but camdd(8) doesn't have
  anything like that yet.
  
  I've attached patches to camdd that allow you to specify a minimum
  command size.  So, apply the patches, rebuild camdd, and try this:
  
  # sudo camdd -i pass=da0,bs=65536,mcs=10 -o file=blafsel.img
  
  We'll see if that helps.  I'm not sure why you were even able to get 13
  bytes back.  That is very strange.
 
 With the patch, reading from da0 seems to work until the end,
 but again only 13 bytes are written out when writing to a file:
 
 [fk@kendra ~]$ sudo camdd -i pass=da0,bs=65536,mcs=10 -o file=blafsel.img
 (pass2:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 78 a8 00 00 00 00 00
 (pass2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
 4048551936 bytes read from pass2
 13 bytes written to blafsel.img
 127.6488 seconds elapsed
 0.00 MB/sec

Did the file exist before running that command?  If so, camdd will look at
the file size and not write any more than the current file size.  If the
file doesn't exist, it'll stop writing when it reaches the end of the
input, or it gets a write error, or it reaches the specified I/O limit (-m
argument).

It also looks like there is a bug; the command above is attempting to read
0 bytes starting from one sector beyond the last logical block.
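
The failing CDB from the log can be decoded by hand to confirm this.  The
sketch below is a standalone helper written for this explanation (the
names are made up); it only relies on the standard READ(10) layout of a
4-byte big-endian LBA in bytes 2-5 and a 2-byte transfer length in
bytes 7-8.

```c
#include <stdint.h>

/* Decoded fields of a READ(10) CDB. */
struct ex_read10 {
	uint32_t lba;		/* starting logical block */
	uint16_t blocks;	/* transfer length in blocks */
};

/* Returns 0 on success, -1 if the opcode is not READ(10) (0x28). */
static int
ex_decode_read10(const uint8_t cdb[10], struct ex_read10 *out)
{
	if (cdb[0] != 0x28)
		return (-1);
	out->lba = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
	    ((uint32_t)cdb[4] << 8) | cdb[5];
	out->blocks = (uint16_t)((cdb[7] << 8) | cdb[8]);
	return (0);
}
```

For "28 00 00 78 a8 00 00 00 00 00" this gives LBA 0x0078a800 = 7907328
and a transfer length of 0 blocks.  The diskinfo output quoted in this
message reports 7907328 sectors, so the last valid LBA is 7907327.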


 [fk@kendra ~]$ diskinfo -v /dev/da0
 /dev/da0
 512 # sectorsize
 4048551936  # mediasize in bytes (3.8G)
 7907328 # mediasize in sectors
 0   # stripesize
 0   # stripeoffset
 492 # Cylinders according to firmware.
 255 # Heads according to firmware.
 63  # Sectors according to firmware.
 AA000958# Disk ident.
 
 It works as expected when writing to stdout, though, so this is
 probably just a camdd-internal issue:
 
 [fk@kendra ~]$ sudo camdd -i pass=da0,bs=65536,mcs=10 -o file=- > /dpool/scratch/blafasel.img
 (pass2:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 78 a8 00 00 00 00 00
 (pass2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
 4048551936 bytes read from pass2
 4048551936 bytes written to -
 128.7222 seconds elapsed
 29.99 MB/sec

Ahh, yes, that is what I would expect.

 [fk@kendra ~]$ sudo dd if=/dev/da0 bs=65536 of=/dpool/scratch/blafasel-dd.img
 61776+0 records in
 61776+0 records out
 4048551936 bytes transferred in 134.993030 secs (29990822 bytes/sec)
 
 [fk@kendra ~]$ sha1 /dpool/scratch/blafasel*.img
 SHA1 (/dpool/scratch/blafasel-dd.img) = 12d1d9e82f840a6c6485ffcdb1fbf780266ed266
 SHA1 (/dpool/scratch/blafasel.img) = 12d1d9e82f840a6c6485ffcdb1fbf780266ed266
 
 Looks good to me.

Great!  I'll see if I can fix the bug that is causing the zero length read
at the end.

Thank you for testing it!

Ken
-- 
Kenneth Merry
k...@freebsd.org


Re: async pass(4) patches available

2015-04-06 Thread Kenneth D. Merry
On Mon, Apr 06, 2015 at 15:39:56 +0200, Fabian Keil wrote:
 Kenneth D. Merry k...@freebsd.org wrote:
 
  I have put patches to add an asynchronous interface to the pass(4) driver
  and add a new camdd(8) utility here:
  
  FreeBSD/head as of SVN revision 280857:
  
  http://people.freebsd.org/~ken/async_pass.head.20150330.1.txt
 [...]
  Comments and testing are welcome!  As I said, camdd(8) in particular is a
  work in progress.  It could use some cleanup and there are some more
  useful features that could be added there.
 
 I've been using the patch for a couple of days on an amd64 system
 based on 11.0-CURRENT r280952 and didn't notice any obvious
 regressions using the system as usual.
 
 Scrubbing a pool once revealed checksum errors which I haven't
 seen before:
 
 [fk@kendra ~]$ zpool status -v dpool
   pool: dpool
  state: ONLINE
 status: One or more devices has experienced an unrecoverable error.  An
 attempt was made to correct the error.  Applications are unaffected.
 action: Determine if the device needs to be replaced, and clear the errors
 using 'zpool clear' or replace the device with 'zpool replace'.
see: http://illumos.org/msg/ZFS-8000-9P
   scan: scrub repaired 0 in 1h52m with 0 errors on Thu Apr  2 13:01:44 2015
 config:
 
 NAME  STATE READ WRITE CKSUM
 dpool ONLINE   0 0 0
   gpt/dpool-ada0.eli  ONLINE   0 0 6
 
 errors: No known data errors
 
 Apr  2 12:31:34 kendra kernel: (ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 30 17 61 55 40 31 00 00 00 00 00
 Apr  2 12:31:34 kendra kernel: (ada0:ahcich0:0:0:0): CAM status: ATA Status Error
 Apr  2 12:31:34 kendra kernel: (ada0:ahcich0:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
 Apr  2 12:31:34 kendra kernel: (ada0:ahcich0:0:0:0): RES: 51 40 3e 61 55 40 31 00 00 00 00
 Apr  2 12:31:34 kendra kernel: (ada0:ahcich0:0:0:0): Error 5, Retries exhausted
 Apr  2 12:31:34 kendra kernel: GEOM_ELI: g_eli_read_done() failed gpt/dpool-ada0.eli[READ(offset=414970949120, length=24576)]
 
 However the issue doesn't seem to be (easily) reproducible
 and could be unrelated.

It is unlikely that this is related to the pass(4) driver patches.  Possible,
but highly unlikely.  camdd(8) doesn't support ATA passthrough yet, so the
only way to access it with camdd is with the file I/O method.

 I also tried to test camdd, but didn't get it to work.
 Some failed attempts:
 
 [fk@kendra ~]$ sudo camdd -i pass=da0,bs=65536 -o file=blafsel.img
 (pass2:umass-sim0:0:0:0): READ(6). CDB: 08 00 00 00 80 00 
 (pass2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
 13 bytes read from pass2
 13 bytes written to blafsel.img
 20.3203 seconds elapsed
 0.00 MB/sec
 [fk@kendra ~]$ sudo hd blafsel.img 
   55 53 42 53 d9 02 00 00  00 00 01 00 01   |USBS.|
 000d
 [fk@kendra ~]$ sudo dd if=/dev/da0 bs=1k count=1  | hd | head -n 1
 1+0 records in
 1+0 records out
 1024 bytes transferred in 0.000603 secs (1697756 bytes/sec)
   fc 31 c0 8e c0 8e d8 8e  d0 bc 00 0e be 1a 7c bf  |.1|.|

One possibility is that the device doesn't support 6-byte read/write
requests.  The da(4) driver has quirk entries and code to figure that out
and default to 10-byte read/write requests, but camdd(8) doesn't have
anything like that yet.

I've attached patches to camdd that allow you to specify a minimum command
size.  So, apply the patches, rebuild camdd, and try this:

# sudo camdd -i pass=da0,bs=65536,mcs=10 -o file=blafsel.img

We'll see if that helps.  I'm not sure why you were even able to get 13
bytes back.  That is very strange.
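
To show what a minimum command size changes, here is a minimal sketch of
choosing between READ(6) and READ(10).  It is not the camdd or libcam
code; the function name and the exact selection rule are assumptions
based on the standard CDB layouts (READ(6) has a 21-bit LBA and a 1-byte
length).

```c
#include <stdint.h>
#include <string.h>

/*
 * Fill cdb (at least 10 bytes) with a READ(6) or READ(10) command.
 * READ(6) is used only when min_cmd_size allows it and the request fits.
 * Returns the CDB length used, or 0 if the request fits neither form.
 */
static int
ex_build_read(uint8_t *cdb, uint32_t lba, uint32_t blocks, int min_cmd_size)
{
	memset(cdb, 0, 10);
	if (min_cmd_size <= 6 && lba <= 0x1fffff &&
	    blocks >= 1 && blocks <= 255) {
		cdb[0] = 0x08;			/* READ(6) */
		cdb[1] = (lba >> 16) & 0x1f;
		cdb[2] = (lba >> 8) & 0xff;
		cdb[3] = lba & 0xff;
		cdb[4] = (uint8_t)blocks;
		return (6);
	}
	if (blocks > 0xffff)
		return (0);
	cdb[0] = 0x28;				/* READ(10) */
	cdb[2] = (lba >> 24) & 0xff;
	cdb[3] = (lba >> 16) & 0xff;
	cdb[4] = (lba >> 8) & 0xff;
	cdb[5] = lba & 0xff;
	cdb[7] = (blocks >> 8) & 0xff;
	cdb[8] = blocks & 0xff;
	return (10);
}
```

With a 65536-byte block size on 512-byte sectors (128 blocks) at LBA 0,
a minimum size of 6 produces "08 00 00 00 80 00", the CDB that failed
above, while a minimum size of 10 produces a READ(10) for the same range.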

 Trying the block size suggested in the manual results in:
 
 [fk@kendra ~]$ sudo camdd -i pass=da0,bs=1M -o file=blafsel.img
 camdd: camdd_pass_run: error sending CAMIOQUEUE ioctl to pass2: Invalid argument
 camdd: camdd_pass_run: CCB address is 0x80250e420: Invalid argument
 0 bytes read from pass2
 0 bytes written to blafsel.img
 0.0007 seconds elapsed
 0.00 MB/sec
 
 Apr  5 19:08:20 kendra kernel: (pass2:umass-sim0:0:0:0): passmemsetup: data length 1048576  max allowed 65536 bytes
 

Yes.  By default, if you don't specify a blocksize, camdd(8) should limit
the I/O size to the controller's maximum or 128K, whichever is smaller.  If
you specify an I/O size, it will try to use that.
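
The size selection described above amounts to the following.  This is a
sketch of the rule as stated in this message, with made-up names, and is
not the camdd(8) source.

```c
#include <stdint.h>

#define EX_DEFAULT_CAP	(128u * 1024u)	/* 128K default ceiling */

/*
 * Pick the I/O size: an explicit request (non-zero) is used as-is;
 * otherwise use the smaller of the controller maximum and 128K.
 */
static uint32_t
ex_pick_io_size(uint32_t requested, uint32_t ctrl_maxio)
{
	if (requested != 0)
		return (requested);
	return (ctrl_maxio < EX_DEFAULT_CAP ? ctrl_maxio : EX_DEFAULT_CAP);
}

/* The driver-side limit: a request larger than maxio is refused. */
static int
ex_size_allowed(uint32_t io_size, uint32_t ctrl_maxio)
{
	return (io_size <= ctrl_maxio);
}
```

With the 65536-byte limit from the kernel message above, an explicit
bs=1M is passed through unchanged and then refused, while leaving the
block size unset would have selected 65536.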

Thanks for testing the code, I really appreciate it!

Let me know how the patch works!

Ken
-- 
Kenneth Merry
k...@freebsd.org
 //depot/users/kenm/FreeBSD-test2/usr.sbin/camdd/camdd.8#1 - /usr/home/kenm/perforce4/kenm/FreeBSD-test2/usr.sbin/camdd/camdd.8 
*** /tmp/tmp.54366.13   Mon Apr  6 21:56:38 2015
--- /usr/home/kenm/perforce4/kenm/FreeBSD-test2/usr.sbin/camdd/camdd.8  Mon Apr  6 21:23:29 2015
***************
*** 31,37 ****
  .\" 
  .\" $FreeBSD$
  .\"
! .Dd March 13, 2015
  .Dt CAMDD 8
  .Os
  .Sh NAME
--- 31,37 ----
  .\" 
  .\" $FreeBSD$
  .\"
! .Dd April 6, 2015
  .Dt CAMDD 8
  .Os

Re: async pass(4) patches available

2015-03-31 Thread Kenneth D. Merry
On Tue, Mar 31, 2015 at 03:49:12 +0300, Konstantin Belousov wrote:
 On Mon, Mar 30, 2015 at 04:23:58PM -0600, Kenneth D. Merry wrote:
  Kernel memory for data transferred via the queued interface is 
  allocated from the zone allocator in MAXPHYS sized chunks, and user
  data is copied in and out.  This is likely faster than the
  vmapbuf()/vunmapbuf() method used by the CAMIOCOMMAND ioctl in
  configurations with many processors (there are more TLB shootdowns
  caused by the mapping/unmapping operation) but may not be as fast
  as running with unmapped I/O.
 cam_periph_mapmem() uses vmapbuf() with an indicator to always map the
 user pages mostly because I do not know CAM code and wanted to make
 the least intrusive changes there.  It is not inherently impossible
 to pass unmapped pages down from cam_periph_mapmem(), but might
 require some more plumbing for driver to indicate that it is acceptable.

I think that would probably not be too difficult to change.  That API isn't
one that is exposed, so changing it shouldn't be a problem.  The only
reason not to do unmapped I/O there is just if the underlying controller
doesn't support it.  The lower parts of the stack shouldn't be trying to
sniff the data that is read or written to the device, although that has
happened in the past.  We'd have to audit a couple of the drivers to
make sure they aren't trying to access the data.

  The new memory handling model for user requests also allows
  applications to send CCBs with request sizes that are larger than
  MAXPHYS.  The pass(4) driver now limits queued requests to the I/O
  size listed by the SIM driver in the maxio field in the Path
  Inquiry (XPT_PATH_INQ) CCB.
  
  There are some things that would be good to add:
  
  1. Come up with a way to do unmapped I/O on multiple buffers.
 Currently the unmapped I/O interface operates on a struct bio,
 which includes only one address and length.  It would be nice
 to be able to send an unmapped scatter/gather list down to
 busdma.  This would allow eliminating the copy we currently do
 for data.
 Only because nothing more was needed.  The struct bio does not use an
 address/length pair when unmapped; it passes the list of physical
 pages, see the bio_ma array pointer.  It is indeed tailored to be a pointer
 to struct buf's b_pages, but it does not have to be.
 
 The busdma unmapped non-specific interface is bus_dmamap_load_ma(),
 which again takes array of pages to load.  If you want some additional
 helper, suitable for your goals, please provide the desired interface
 definition.
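
To make the page-list point concrete, here is a minimal user-space sketch of
the arithmetic behind describing a buffer as a run of pages plus an offset
into the first page, rather than as a single address/length pair.  The
EX_PAGE_SIZE constant and all ex_ names are assumptions made up for this
illustration; the kernel keeps the real values in the bio_ma, bio_ma_n and
bio_ma_offset fields of struct bio.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_PAGE_SIZE 4096u	/* assumed page size, for illustration only */

/* Hypothetical description of a buffer in page-list terms. */
struct ex_page_span {
	size_t first_offset;	/* byte offset into the first page */
	size_t npages;		/* number of pages the buffer touches */
};

/* Compute how many pages a buffer of len bytes starting at addr spans. */
static struct ex_page_span
ex_span_for(uintptr_t addr, size_t len)
{
	struct ex_page_span s;

	assert(len > 0);
	s.first_offset = (size_t)(addr % EX_PAGE_SIZE);
	/* Round up: a partial first or last page still counts as a page. */
	s.npages = (s.first_offset + len + EX_PAGE_SIZE - 1) / EX_PAGE_SIZE;
	return (s);
}
```

A 32-byte buffer that starts 16 bytes before a page boundary touches two
pages, which is why a single address/length pair is not enough once the
pages are no longer mapped contiguously.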

What I'd like to be able to do is pass down a CCB with a user virtual
S/G list (CAM_DATA_SG, but with user virtual pointers) and have busdma deal
with it.

The trouble would likely be figuring out a flag to use to indicate that the
S/G list in question contains user virtual pointers.  (Backwards/binary
compatibility is always an issue with CCB flags, since they have all been
used.)

But that is essentially what is needed.  

Ken
-- 
Kenneth Merry
k...@freebsd.org
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


async pass(4) patches available

2015-03-30 Thread Kenneth D. Merry

I have put patches to add an asynchronous interface to the pass(4) driver
and add a new camdd(8) utility here:

FreeBSD/head as of SVN revision 280857:

http://people.freebsd.org/~ken/async_pass.head.20150330.1.txt

FreeBSD stable/10 as of SVN revision 280856:

http://people.freebsd.org/~ken/async_pass.stable_10.20150330.1.txt

And the description / draft commit message:

http://people.freebsd.org/~ken/async_pass_commitmsg.20150330.txt

I have also attached the description and draft commit message to this
email.

The asynchronous changes to the pass(4) driver allow queueing and fetching
CAM CCBs via two new ioctls.  Notification of completed I/O can come via
kqueue(2), poll(2), select(2), etc.

The camdd(8) utility is intended as a simple data transfer utility,
benchmark, and an in-tree example of how to use the asynchronous pass(4)
interface.

camdd(8) is still a work in progress.  It needs to be cleaned up a bit and
streamlined.

There is one known arrival and departure bug with the pass(4) driver
changes.  We've reproduced it with our tests at Spectra, but I haven't yet
tracked it down.

There are many more arrival and departure bugs in FreeBSD/head, however.
We have fixed quite a few in our local tree, but the test (called devad2)
that triggers all of the problems uses the asynchronous pass(4) interface.
So this is a prerequisite for fixing/verifying those bugs.

Comments and testing are welcome!  As I said, camdd(8) in particular is a
work in progress.  It could use some cleanup and there are some more useful
features that could be added there.

Part of the reason for camdd(8) was as a test facility for the new
interface.  But, it also serves as a useful demonstration of the
asynchronous pass(4) functionality, given that the original application
that used the API doesn't make sense to go into FreeBSD.  (It is
Spectra-specific, and not generally useful.)

Ken
-- 
Kenneth Merry
k...@freebsd.org
Add asynchronous command support to the pass(4) driver, and the new
camdd(8) utility.

CCBs may be queued to the driver via the new CAMIOQUEUE ioctl, and
completed CCBs may be retrieved via the CAMIOGET ioctl.  User
processes can use poll(2) or kevent(2) to get notification when
I/O has completed.
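
As a rough illustration of the notification side, the sketch below waits
for a descriptor to become readable with poll(2).  It assumes that a
completed CCB shows up as read readiness on the pass(4) descriptor, after
which the application would fetch it with CAMIOGET; the ex_wait_ready name
is hypothetical and no pass(4) ioctl is issued here, so the helper works on
any descriptor.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms milliseconds for fd to become readable.
 * Returns 1 if it is readable, 0 on timeout, -1 on error.
 * With pass(4), fd would be the open pass device and a return of 1
 * would be the cue to fetch completed CCBs.
 */
static int
ex_wait_ready(int fd, int timeout_ms)
{
	struct pollfd pfd;
	int r;

	assert(fd >= 0);
	pfd.fd = fd;
	pfd.events = POLLIN;
	pfd.revents = 0;

	r = poll(&pfd, 1, timeout_ms);
	if (r < 0)
		return (-1);
	if (r == 0)
		return (0);
	/* Anything other than POLLIN (e.g. POLLERR) is treated as an error. */
	return ((pfd.revents & POLLIN) ? 1 : -1);
}
```

A kevent(2) based loop would follow the same shape: register the descriptor
once, block until an event arrives, then drain completions until none are
left.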

While the existing CAMIOCOMMAND blocking ioctl interface only
supports user virtual data pointers in a CCB (generally only
one per CCB), the new CAMIOQUEUE ioctl supports user virtual and
physical address pointers, as well as user virtual and physical
scatter/gather lists.  This allows user applications to have more
flexibility in their data handling operations.

Kernel memory for data transferred via the queued interface is 
allocated from the zone allocator in MAXPHYS sized chunks, and user
data is copied in and out.  This is likely faster than the
vmapbuf()/vunmapbuf() method used by the CAMIOCOMMAND ioctl in
configurations with many processors (there are more TLB shootdowns
caused by the mapping/unmapping operation) but may not be as fast
as running with unmapped I/O.

The new memory handling model for user requests also allows
applications to send CCBs with request sizes that are larger than
MAXPHYS.  The pass(4) driver now limits queued requests to the I/O
size listed by the SIM driver in the maxio field in the Path
Inquiry (XPT_PATH_INQ) CCB.
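
Since each queued request is capped at maxio, an application that wants to
move more data than that would presumably split the transfer itself.  The
sketch below shows only that arithmetic; the ex_ function names are
hypothetical and the maxio value would come from the Path Inquiry data, not
from anything in this code.

```c
#include <assert.h>
#include <stddef.h>

/* Number of requests needed to move total bytes, each at most maxio. */
static size_t
ex_num_requests(size_t total, size_t maxio)
{
	assert(maxio > 0);
	return ((total + maxio - 1) / maxio);
}

/* Length of the i-th request (0-based); 0 if i is past the end. */
static size_t
ex_request_len(size_t total, size_t maxio, size_t i)
{
	size_t off;

	assert(maxio > 0);
	off = i * maxio;
	if (off >= total)
		return (0);
	return ((total - off < maxio) ? total - off : maxio);
}
```

With a maxio of 1081344 bytes, a 4 MiB transfer would need four requests:
three full ones and a final request of 950272 bytes.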

There are some things that would be good to add:

1. Come up with a way to do unmapped I/O on multiple buffers.
   Currently the unmapped I/O interface operates on a struct bio,
   which includes only one address and length.  It would be nice
   to be able to send an unmapped scatter/gather list down to
   busdma.  This would allow eliminating the copy we currently do
   for data.

2. Add an ioctl to list currently outstanding CCBs in the various
   queues.

3. Add an ioctl to cancel a request, or use the XPT_ABORT CCB to do
   that.

4. Test physical address support.  Virtual pointers and virtual
   scatter/gather lists have been tested, but I have not yet tested
   physical addresses or physical scatter/gather lists.

5. Investigate multiple queue support.  At the moment there is one
   queue of commands per pass(4) device.  If multiple processes
   open the device, they will submit I/O into the same queue and 
   get events for the same completions.  This is probably the right
   model for most applications, but it would be good to make sure
   that there is not really a case for multiple queues before
   pushing this code upstream.

Also, add a new utility, camdd(8) that uses the asynchronous pass(4)
driver interface.

This utility is intended to be a basic data transfer/copy utility,
a simple benchmark utility, and an example of how to use the
asynchronous pass(4) interface.

It can copy data to and from pass(4) devices using any target queue
depth, starting offset and blocksize for the input and output devices.
It currently only supports SCSI devices, but could be easily extended
to support ATA devices.

It 

Re: SLR140 with new mt(1) [Was: Re: sa(4) driver changes available for test]

2015-03-11 Thread Kenneth D. Merry
On Wed, Mar 11, 2015 at 20:26:49 +0100, Harald Schmalzbauer wrote:
  Regarding Kenneth D. Merry's message of 28.02.2015 01:08 (localtime):
  Still just works fine ! :-) (stable_10.20150218.1-patchset with LTO2,
  LTO3 and DDS5)
  With DDS5, density is reported as unknown. If I remember correctly,
  you have your DDS4 reporting DDS4?
  That means that we need to add DDS5 to the density table in libmt.  Can
  you send the output of 'mt status -v'?  It would actually be helpful for
  all three drives.
 
 Hello,
 
 I'd like to present some test results.
 All tests were done with 10-stable-r273923 and Ken's
 sa_driver_changes-patchset, reduced by the committed scsi-sys-code.

Thank you for testing all of these drives and media!  I really appreciate
it!

 Unfortunately, there's a problem with appending files to any SLRtape. I
 can write the first file, but trying to open a second file for writing,
 results in end of device message. This problem doesn't exist for other
 drives (tested on VXA-2 (also SCSI-2) and DAT72 (SCSI-3)) with exactly
 same environment (all currently connected SCSI drives (7) are on one
 mpt(4) bus).
 After the first end of device message, consecutive write attempts lead
 to Operation not permitted.
 
 According to the datasheet
 (http://www.tandbergdata.ru/products/files/SLR140_DS_605_ENG.pdf), the
 drive should speak SCSI-3, but camcontrol shows SCSI-2.
 
 ##
 TandbergData SLR140 Drive
 ##
 camcontrol inq $TAPE -v
 pass3: TANDBERG SLR140 0605 Removable Sequential Access SCSI-2 device
 pass3: Serial Number SN140253489
 pass3: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Command
 Queueing Enabled

This sounds like it could be an End Of Tape (EOT) model issue.  There is a
quirk entry in the driver for other SLR drives, but it probably won't match
this particular drive because of a leading space in the INQUIRY data:

{
	{ T_SEQUENTIAL, SIP_MEDIA_REMOVABLE, "TANDBERG",
	  " SLR*", "*"}, SA_QUIRK_1FM, 0
},

So, try doing a 'mt geteotmodel' on that drive.  It is probably set to 2
filemarks.  If it is, do:

mt seteotmodel 1

Obviously, if it is set to 1, try 2 and see what happens.

If that is the case, we can adjust the quirk to match that drive.
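
To show why a leading space in the pattern matters, here is a minimal
wildcard matcher.  It is a simplified stand-in written for this example
(ex_wild_match is a hypothetical name); it only understands '*', whereas
the kernel's own matcher handles more pattern syntax and the fixed-width
padding of INQUIRY fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if str matches pat, where '*' in pat matches any run of
 * characters (including none) and every other character must match
 * exactly.
 */
static bool
ex_wild_match(const char *str, const char *pat)
{
	assert(str != NULL && pat != NULL);

	for (; *pat != '\0'; str++, pat++) {
		if (*pat == '*') {
			/* Collapse consecutive '*' into one. */
			while (*pat == '*')
				pat++;
			if (*pat == '\0')
				return (true);
			/* Try the rest of the pattern at every position. */
			for (; *str != '\0'; str++) {
				if (ex_wild_match(str, pat))
					return (true);
			}
			return (false);
		}
		if (*str != *pat)
			return (false);
	}
	return (*str == '\0');
}
```

Under this sketch a product string of "SLR140" does not match the pattern
" SLR*", because the first character compared is 'S' against ' ', while a
string that really starts with a space, such as " SLR100", does match.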

 Density 0x36 = ALRF-6, 186000 bpi, SLR140 drive + SLR140tape:
 SLRtape140 (8mm DualReel, 70GB native, 505.9m length, 5.5MiB/s)

Do you have any source documentation for the BPI data?  Any information
on the number of tracks or the other fields that might go in the mt(1) man
page?  (We can obviously put it in with that, it's just nice to put it all
in if we have it.)

 mt status -v
 Drive: sa3: TANDBERG SLR140 0605 Serial Number: SN140253489
 -
 Mode Density Blocksize bpi Compression
 Current: 0x36:UNKNOWN variable 0 enabled (0x3)
 -
 Current Driver State: at rest.
 -
 Partition: 0 Calc File Number: 0 Calc Record Number: 0
 Residual: 0 Reported File Number: -1 Reported Record Number: -1
 Flags: None
 -
 Tape I/O parameters:
 Maximum I/O size allowed by driver and controller (maxio): 131072 bytes
 Maximum I/O size reported by controller (cpi_maxio): 131072 bytes
 Maximum block size supported by tape drive and media (max_blk): 262144 bytes
 Minimum block size supported by tape drive and media (min_blk): 1 bytes
 Block granularity supported by tape drive and media (blk_gran): 0 bytes
 Maximum possible I/O size (max_effective_iosize): 131072 bytes
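
The relationship between these numbers can be sketched in a few lines of C.
This is an assumption based on the reported values, not a copy of the
driver: maxio is taken to be the controller's cpi_maxio clamped to MAXPHYS
(falling back to 64K when the controller reports nothing), and
max_effective_iosize is taken to be the smaller of maxio and max_blk.  All
ex_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define EX_DFLTPHYS (64u * 1024u)	/* assumed traditional 64K default */

/* Clamp the controller's reported maximum to the system-wide limit. */
static uint32_t
ex_maxio(uint32_t cpi_maxio, uint32_t maxphys)
{
	assert(maxphys > 0);
	if (cpi_maxio == 0)
		return (EX_DFLTPHYS);	/* controller did not report a size */
	if (cpi_maxio > maxphys)
		return (maxphys);
	return (cpi_maxio);
}

/* The usable I/O size cannot exceed what the drive and media accept. */
static uint32_t
ex_effective_iosize(uint32_t maxio, uint32_t max_blk)
{
	return ((max_blk < maxio) ? max_blk : maxio);
}
```

With the values listed above (cpi_maxio 131072, max_blk 262144, and a 128K
MAXPHYS assumed), both helpers return 131072, which agrees with the
reported max_effective_iosize.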
 
 Minimum blocksize to reach highest throughput, thus sustained write of
 incompressible data (from /dev/random): 24k@5.5MiB/s

That's pretty good!

 mt status
 -
 Current Driver State: at rest.
 -
 Partition: 0 Calc File Number: 1 Calc Record Number: 0
 Residual: 0 Reported File Number: -1 Reported Record Number: -1
 Flags: None
 
 short READ POSITION
 camcontrol cmd $TAPE -v -c 34 0 0 0 0 0 0 0 0 0 -i 20 - | hd
  30 00 00 00 00 00 06 83 00 00 00 00 00 00 00 00 |0...|
 0010 00 00 00 00 ||
 0014
 vendor READ POSITION
 camcontrol cmd $TAPE -v -c 34 1 0 0 0 0 0 0 0 0 -i 20 - | hd
 camcontrol: error sending command
 (pass3:mpt1:0:13:0): READ POSITION. CDB: 34 01 00 00 00 00 00 00 00 00
 (pass3:mpt1:0:13:0): CAM status: SCSI Status Error
 (pass3:mpt1:0:13:0): SCSI status: Check Condition
 (pass3:mpt1:0:13:0): SCSI sense: ILLEGAL REQUEST asc:24,0 (Invalid field
 in CDB)
 (pass3:mpt1:0:13:0): Command byte 1 bit 0 is invalid
 long READ POSITION
 camcontrol cmd $TAPE -v -c 34 6 0 0 0 0 0 0 0 0 -i 32 - |hd
 camcontrol: error sending command
 (pass3:mpt1:0:13:0): READ POSITION. CDB: 34 06 00 00 00 00 00 00 00 00
 (pass3:mpt1:0:13:0): CAM status: SCSI Status Error
 (pass3:mpt1:0:13:0): SCSI status: Check Condition
 (pass3:mpt1:0:13:0): SCSI sense: ILLEGAL REQUEST asc:24,0 (Invalid field
 in CDB)
 (pass3:mpt1:0:13:0): 

Re: sa(4) driver changes available for test

2015-03-11 Thread Kenneth D. Merry
On Sat, Mar 07, 2015 at 14:30:26 +0100, Harald Schmalzbauer wrote:
  Regarding Kenneth D. Merry's message of 19.02.2015 01:13 (localtime):
  I have updated the patches.
 
  I have removed the XPT_DEV_ADVINFO changes from the patches to head, since
  I committed those separately.
 
  I have (hopefully) fixed the build for the stable/10 patches by MFCing
  dependencies.  (One of them mav did for me, thanks!)
 
  Rough draft commit message:
 
  http://people.freebsd.org/~ken/sa_changes_commitmsg.20150218.1.txt
 
  The patches against FreeBSD/head as of SVN revision 278975:
 
  http://people.freebsd.org/~ken/sa_changes.20150218.1.txt
 
  And (untested) patches against FreeBSD stable/10 as of SVN revision 278974:
 
  http://people.freebsd.org/~ken/sa_changes.stable_10.20150218.1.txt
 
 Hello,
 
 on 26/02/2015, r278964 seems to be part of the sa_changes patchset.
 Do you have a sa_changes.stable_10.20150226 ready?

I haven't done it yet, sorry.

 Or is it just a matter of excluding all parts committed with r278964
 from the patchset?
 I've done so in the meanwhile:
 ftp://ftp.omnilan.de/pub/FreeBSD/OmniLAN/misc/sa_changes.stable_10.20150226.fudge.patch
 

Thanks!

 Noticed that in sys/dev/mps/mps_sas.c 'cdai.flags' conditionally gets
 (#if __FreeBSD_version >= 1100061) the new CDAI_FLAG_NONE, while in
 sbin/camcontrol/camcontrol.c, this is unconditionally used. Haven't
 really looked at the code, mostly because my skills wouldn't allow me to
 answer this question myself, but is that version check in mps_sas.c still
 valid with the rest of the sa_driver-changes?

Yes, that's intentional.  The mps(4) and mpr(4) drivers are also maintained
outside the tree by LSI/Avago.  I usually try to put version checks in
there, so that things work when they try to compile against earlier
releases.  Otherwise they'd be putting in the same checks themselves.  It
is easier to do them when the changes go in the tree.

camcontrol(8), on the other hand, is only maintained in the FreeBSD tree.
So it only ever needs to build against the FreeBSD branch it is in.
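
For anyone unfamiliar with the pattern, a version-guarded assignment looks
roughly like the sketch below.  The struct, the EX_ macros and the function
name are stand-ins invented for this example so that it compiles anywhere;
only the idea of keying on __FreeBSD_version and the 1100061 threshold come
from the discussion above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__FreeBSD__)
#include <sys/param.h>		/* provides __FreeBSD_version */
#endif

/* Treat a missing version macro as "very old" so the fallback is used. */
#ifdef __FreeBSD_version
#define EX_OSVERSION __FreeBSD_version
#else
#define EX_OSVERSION 0
#endif

#define EX_CDAI_FLAG_NONE 0u	/* hypothetical stand-in for the new flag */

/* Hypothetical stand-in for the CCB whose flags field is being set. */
struct ex_advinfo {
	uint32_t flags;
};

static void
ex_init_advinfo(struct ex_advinfo *cdai)
{
	assert(cdai != NULL);
#if EX_OSVERSION >= 1100061
	cdai->flags = EX_CDAI_FLAG_NONE;	/* newer trees: named constant */
#else
	cdai->flags = 0;			/* older trees: plain zero */
#endif
}
```

Either branch produces the same value here; the guard exists so that the
same source file still compiles against a tree that does not define the
newer name.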

Ken
-- 
Kenneth Merry
k...@freebsd.org


Re: sa(4) driver changes available for test

2015-03-02 Thread Kenneth D. Merry
On Mon, Mar 02, 2015 at 16:34:34 -0500, Dan Langille wrote:
 
  On Mar 2, 2015, at 2:47 PM, Dan Langille d...@langille.org wrote:
  
  
  On Mar 2, 2015, at 2:07 PM, Dan Langille d...@langille.org wrote:
  
  
  On Mar 2, 2015, at 12:28 PM, Kenneth D. Merry k...@freebsd.org wrote:
  
  On Mon, Mar 02, 2015 at 11:44:09 -0500, Dan Langille wrote:
  
  On Mar 2, 2015, at 11:31 AM, Kenneth D. Merry k...@freebsd.org wrote:
  
  On Mon, Mar 02, 2015 at 11:09:57 -0500, Dan Langille wrote:
  
  On Mar 1, 2015, at 9:29 PM, Kenneth D. Merry k...@freebsd.org wrote:
  
  On Sun, Mar 01, 2015 at 19:41:07 -0500, Dan Langille wrote:
  
  On Mar 1, 2015, at 7:31 PM, Kenneth D. Merry k...@freebsd.org 
  wrote:
  
  On Sun, Mar 01, 2015 at 19:15:05 -0500, Dan Langille wrote:
  
  On Feb 17, 2015, at 1:36 PM, Kenneth D. Merry k...@freebsd.org 
  wrote:
  
  On Sat, Feb 14, 2015 at 18:22:43 -0500, Dan Langille wrote:
  
  On Feb 13, 2015, at 7:32 PM, Kenneth D. Merry 
  k...@freebsd.org wrote:
  
  
  I have a fairly large set of changes to the sa(4) driver and 
  mt(1) driver
  that I'm planning to commit in the near future.
  
  A description of the changes is here and below in this message.
  
  If you have tape hardware and the inclination, I'd appreciate 
  testing and
  feedback.
  
  I have a DLT 8000 and an SDLT 220.
  
  I don't have anything running current, but I have a spare 
  machine which I could use for testing.
  
  Do you see any value in tests with that hardware? I'd be testing 
  it via Bacula.
  
  disclosure: I'm the sysutils/bacula-* maintainer and a Bacula 
  committer.
  
  
  Actually, yes.  Bacula is a bit tricky to configure, so your 
  trying it out
  would be helpful if you have the time.
  
  In looking at the manuals for both the SDLT 220 and the DLT 8000, 
  they both
  claim to support long position information for the SCSI READ 
  POSITION
  command.
  
  You can see what I'm talking about by doing:
  
  mt eod
  mt status
  
  On my DDS-4 tape drive, this shows:
  
  # mt -f /dev/nsa3 status
  Drive: sa3: SEAGATE DAT06240-XXX 8071 Serial Number: HJ00YWY
  -
  Mode      Density      Blocksize    bpi      Compression
  Current:  0x26:DDS-4   1024 bytes   97000    enabled (DCLZ)
  -
  Current Driver State: at rest.
  -
  Partition:   0  Calc File Number:      -1  Calc Record Number:     -1
  Residual:    0  Reported File Number:  -1  Reported Record Number: -1
  Flags: None
  
  But on an LTO-5, which will give long position information, I get:
  
  [root@doc ~]# mt status
  Drive: sa0: IBM ULTRIUM-HH5 E4J1
  -
  Mode      Density      Blocksize    bpi      Compression
  Current:  0x58:LTO-5   variable     384607   enabled (0x1)
  -
  Current Driver State: at rest.
  -
  Partition:   0  Calc File Number:       2  Calc Record Number:     -1
  Residual:    0  Reported File Number:   2  Reported Record Number: 32373
  Flags: None
  
  That, in combination with the changes I made to the position 
  information
  code in the driver, mean that even the old MTIOCGET ioctl should 
  return an
  accurate file number at end of data.  e.g., on the LTO-5:
  
  [root@doc ~]# mt ostatus
  Mode  Density  Blocksize  bpi  Compression
  Current:  0x58:LTO-5   variable   384607   0x1
  -available modes-
  0:0x58:LTO-5   variable   384607   0x1
  1:0x58:LTO-5   variable   384607   0x1
  2:0x58:LTO-5   variable   384607   0x1
  3:0x58:LTO-5   variable   384607   0x1
  -
  Current Driver State: at rest.
  -
  File Number: 2  Record Number: -1   Residual Count -1
  
  So the thing to try, in addition to just making sure that Bacula 
  continues
  to work properly, is to try setting this for the tape drive in
  bacula-sd.conf:
  
  Hardware End of Medium = yes
  
  It looks like the Bacula tape program (btape) has a test mode, 
  and it would
  be good to run through the tests on one of the tape drives and 
  see whether
  they work, and whether the results are different before and after 
  the
  changes.  I'm not sure how to enable the test mode.
  
  I have this in /usr/local/etc/bacula/bacula-sd.conf
  
  Device {
  Name= DLT
  Description = QUANTUM DLT7000 1624
  Media Type  = DLT
  Archive Device  = /dev/nsa1
  
  Autochanger = YES
  Drive Index = 0
  
  Offline On Unmount  = no
  Hardware End of Medium  = yes
  BSF at EOM  = yes
  Backward Space Record   = no
  Fast Forward Space File = no
  TWO EOF = yes
  }
  
  FYI, http://www.freebsddiary.org/digital-tl891

Re: sa(4) driver changes available for test

2015-03-02 Thread Kenneth D. Merry
On Mon, Mar 02, 2015 at 11:09:57 -0500, Dan Langille wrote:
 
  On Mar 1, 2015, at 9:29 PM, Kenneth D. Merry k...@freebsd.org wrote:
  
  On Sun, Mar 01, 2015 at 19:41:07 -0500, Dan Langille wrote:
  
  On Mar 1, 2015, at 7:31 PM, Kenneth D. Merry k...@freebsd.org wrote:
  
  On Sun, Mar 01, 2015 at 19:15:05 -0500, Dan Langille wrote:
  
  On Feb 17, 2015, at 1:36 PM, Kenneth D. Merry k...@freebsd.org wrote:
  
  On Sat, Feb 14, 2015 at 18:22:43 -0500, Dan Langille wrote:
  
  On Feb 13, 2015, at 7:32 PM, Kenneth D. Merry k...@freebsd.org 
  wrote:
  
  
  I have a fairly large set of changes to the sa(4) driver and mt(1) 
  driver
  that I'm planning to commit in the near future.
  
  A description of the changes is here and below in this message.
  
  If you have tape hardware and the inclination, I'd appreciate testing 
  and
  feedback.
  
  I have a DLT 8000 and an SDLT 220.
  
  I don't have anything running current, but I have a spare machine 
  which I could use for testing.
  
  Do you see any value in tests with that hardware? I'd be testing it 
  via Bacula.
  
  disclosure: I'm the sysutils/bacula-* maintainer and a Bacula 
  committer.
  
  
  Actually, yes.  Bacula is a bit tricky to configure, so your trying it 
  out
  would be helpful if you have the time.
  
  In looking at the manuals for both the SDLT 220 and the DLT 8000, they 
  both
  claim to support long position information for the SCSI READ POSITION
  command.
  
  You can see what I'm talking about by doing:
  
  mt eod
  mt status
  
  On my DDS-4 tape drive, this shows:
  
  # mt -f /dev/nsa3 status
  Drive: sa3: SEAGATE DAT06240-XXX 8071 Serial Number: HJ00YWY
  -
  Mode  Density  Blocksize  bpi  Compression
  Current:  0x26:DDS-4   1024 bytes 97000enabled (DCLZ)
  -
  Current Driver State: at rest.
  -
  Partition:   0  Calc File Number:  -1 Calc Record Number: -1
  Residual:0  Reported File Number:  -1 Reported Record Number: -1
  Flags: None
  
  But on an LTO-5, which will give long position information, I get:
  
  [root@doc ~]# mt status
  Drive: sa0: IBM ULTRIUM-HH5 E4J1
  -
  Mode  Density  Blocksize  bpi  Compression
  Current:  0x58:LTO-5   variable   384607   enabled (0x1)
  -
  Current Driver State: at rest.
  -
  Partition:   0  Calc File Number:   2 Calc Record Number: -1
  Residual:0  Reported File Number:   2 Reported Record Number: 32373
  Flags: None
  
  That, in combination with the changes I made to the position information
  code in the driver, mean that even the old MTIOCGET ioctl should return 
  an
  accurate file number at end of data.  e.g., on the LTO-5:
  
  [root@doc ~]# mt ostatus
  Mode  Density  Blocksize  bpi  Compression
  Current:  0x58:LTO-5   variable   384607   0x1
  -available modes-
  0:0x58:LTO-5   variable   384607   0x1
  1:0x58:LTO-5   variable   384607   0x1
  2:0x58:LTO-5   variable   384607   0x1
  3:0x58:LTO-5   variable   384607   0x1
  -
  Current Driver State: at rest.
  -
  File Number: 2  Record Number: -1   Residual Count -1
  
  So the thing to try, in addition to just making sure that Bacula 
  continues
  to work properly, is to try setting this for the tape drive in
  bacula-sd.conf:
  
  Hardware End of Medium = yes
  
  It looks like the Bacula tape program (btape) has a test mode, and it 
  would
  be good to run through the tests on one of the tape drives and see 
  whether
  they work, and whether the results are different before and after the
  changes.  I'm not sure how to enable the test mode.
  
  I have this in /usr/local/etc/bacula/bacula-sd.conf
  
  Device {
  Name= DLT
  Description = QUANTUM DLT7000 1624
  Media Type  = DLT
  Archive Device  = /dev/nsa1
  
  Autochanger = YES
  Drive Index = 0
  
  Offline On Unmount  = no
  Hardware End of Medium  = yes
  BSF at EOM  = yes
  Backward Space Record   = no
  Fast Forward Space File = no
  TWO EOF = yes
  }
  
  FYI, http://www.freebsddiary.org/digital-tl891.php (from 2006) has a 
  btape test on this same model.
  
  Here's the test I ran tonight:
  
  [root@cuppy:/usr/home/dan] # btape -c 
  /usr/local/etc/bacula/bacula-sd.conf /dev/nsa1   
   
  Tape block granularity is 1024 bytes.
  btape: butil.c:287-0 Using device: /dev/nsa1 for writing.
  btape: btape.c:469-0 open device DLT (/dev/nsa1): OK
  *test
  
  === Write

Re: sa(4) driver changes available for test

2015-03-02 Thread Kenneth D. Merry
On Mon, Mar 02, 2015 at 11:43:15 -0500, Dan Langille wrote:
 
  On Mar 1, 2015, at 9:06 PM, Kenneth D. Merry k...@freebsd.org wrote:
  
  On Sun, Mar 01, 2015 at 19:40:40 -0500, Dan Langille wrote:
  
  On Mar 1, 2015, at 7:36 PM, Kenneth D. Merry k...@freebsd.org wrote:
  
  On Sun, Mar 01, 2015 at 19:28:37 -0500, Dan Langille wrote:
  
  On Mar 1, 2015, at 7:18 PM, Kenneth D. Merry k...@freebsd.org wrote:
  
  On Sun, Mar 01, 2015 at 17:06:24 -0500, Dan Langille wrote:
  
  On Feb 13, 2015, at 7:32 PM, Kenneth D. Merry k...@freebsd.org 
  wrote:
  
  
  I have a fairly large set of changes to the sa(4) driver and mt(1) 
  driver
  that I'm planning to commit in the near future.
  
  A description of the changes is here and below in this message.
  
  If you have tape hardware and the inclination, I'd appreciate testing 
  and
  feedback.
  
  
  Rough draft commit message:
  
  http://people.freebsd.org/~ken/sa_changes_commitmsg.20150213.3.txt
  
  The patches against FreeBSD/head as of SVN revision 278706:
  
  http://people.freebsd.org/~ken/sa_changes.20150213.3.txt
  
  And (untested) patches against FreeBSD stable/10 as of SVN revision 
  278721.
  
  http://people.freebsd.org/~ken/sa_changes.stable_10.20150213.3.txt
  
  
  The intent is to get the tape infrastructure more up to date, so we 
  can
  support LTFS and more modern tape drives:
  
  http://www.ibm.com/systems/storage/tape/ltfs/
  
  I have ported IBM's LTFS Single Drive Edition to FreeBSD.  The port 
  depends
  on the patches linked above.  It isn't fully cleaned up and ready for
  redistribution.  If you're interested, though, let me know and I'll 
  tell
  you when it is ready to go out.  You need an IBM LTO-5, LTO-6, TS1140 
  or
  TS1150 tape drive.  HP drives aren't supported by IBM's LTFS, and 
  older
  drives don't have the necessary features to support LTFS.
  
  The commit message below outlines most of the changes.
  
  A few comments:
  
  1. I'm planning to commit the XPT_DEV_ADVINFO changes separately.
  
  2. The XML output is similar to what GEOM and CTL do.  It would be 
  nice to
  figure out how to put a standard schema on it so that standard tools
  could read it.  I don't know how feasible that is, since I haven't
  time to dig into it.  If anyone has suggestions on whether that is
  feasible or advisable, I'd appreciate feedback.
  
  3. I have tested with a reasonable amount of tape hardware (see below 
  for a
  list), but more testing and feedback would be good.
  
  4. Standard 'mt status' output looks like this:
  
  # mt -f /dev/nsa3 status  -v
  Drive: sa3: IBM ULTRIUM-HH6 E4J1 Serial Number: 101500520A
  -
  Mode  Density  Blocksize  bpi  Compression
  Current:  0x5a:LTO-6   variable   384607   enabled (0xff)
  -
  Current Driver State: at rest.
  -
  Partition:   0  Calc File Number:   0 Calc Record Number: 0
  Residual:0  Reported File Number:   0 Reported Record Number: 0
  Flags: BOP
  
  5. 'mt status -v' looks like this:
  
  # mt -f /dev/nsa3 status  -v
  Drive: sa3: IBM ULTRIUM-HH6 E4J1 Serial Number: 101500520A
  -
  Mode  Density  Blocksize  bpi  Compression
  Current:  0x5a:LTO-6   variable   384607   enabled (0xff)
  -
  Current Driver State: at rest.
  -
  Partition:   0  Calc File Number:   0 Calc Record Number: 0
  Residual:0  Reported File Number:   0 Reported Record Number: 0
  Flags: BOP
  -
  Tape I/O parameters:
  Maximum I/O size allowed by driver and controller (maxio): 1081344 bytes
  Maximum I/O size reported by controller (cpi_maxio): 5197824 bytes
  Maximum block size supported by tape drive and media (max_blk): 8388608 bytes
  Minimum block size supported by tape drive and media (min_blk): 1 bytes
  Block granularity supported by tape drive and media (blk_gran): 0 bytes
  Maximum possible I/O size (max_effective_iosize): 1081344 bytes
  
  
  # mtx -f /dev/pass0 status
  Storage Changer /dev/pass0:2 Drives, 10 Slots ( 0 Import/Export )
  Data Transfer Element 0:Empty
  Data Transfer Element 1:Empty
 Storage Element 1:Empty
 Storage Element 2:Empty
 Storage Element 3:Empty
 Storage Element 4:Full :VolumeTag=FAI260  
 Storage Element 5:Full :VolumeTag=FAI261  
 Storage Element 6:Full :VolumeTag=FAI262  
 Storage Element 7:Full :VolumeTag=FAI263  
 Storage Element 8:Empty
 Storage Element 9:Empty
 Storage Element 10:Empty
  
  
  It was at this point I spent the next 90 minutes trying to get the tape 
  drive out of the tape library to free a stuck tape.  Some of this was 
  spent
  attempting

Re: sa(4) driver changes available for test

2015-03-02 Thread Kenneth D. Merry
On Mon, Mar 02, 2015 at 11:44:09 -0500, Dan Langille wrote:
 
  On Mar 2, 2015, at 11:31 AM, Kenneth D. Merry k...@freebsd.org wrote:
  
  On Mon, Mar 02, 2015 at 11:09:57 -0500, Dan Langille wrote:
  
  On Mar 1, 2015, at 9:29 PM, Kenneth D. Merry k...@freebsd.org wrote:
  
  On Sun, Mar 01, 2015 at 19:41:07 -0500, Dan Langille wrote:
  
  On Mar 1, 2015, at 7:31 PM, Kenneth D. Merry k...@freebsd.org wrote:
  
  On Sun, Mar 01, 2015 at 19:15:05 -0500, Dan Langille wrote:
  
  On Feb 17, 2015, at 1:36 PM, Kenneth D. Merry k...@freebsd.org 
  wrote:
  
  On Sat, Feb 14, 2015 at 18:22:43 -0500, Dan Langille wrote:
  
  On Feb 13, 2015, at 7:32 PM, Kenneth D. Merry k...@freebsd.org 
  wrote:
  
  
  I have a fairly large set of changes to the sa(4) driver and mt(1) 
  driver
  that I'm planning to commit in the near future.
  
  A description of the changes is here and below in this message.
  
  If you have tape hardware and the inclination, I'd appreciate 
  testing and
  feedback.
  
  I have a DLT 8000 and an SDLT 220.
  
  I don't have anything running current, but I have a spare machine 
  which I could use for testing.
  
  Do you see any value in tests with that hardware? I'd be testing it 
  via Bacula.
  
  disclosure: I'm the sysutils/bacula-* maintainer and a Bacula 
  committer.
  
  
  Actually, yes.  Bacula is a bit tricky to configure, so your trying 
  it out
  would be helpful if you have the time.
  
  In looking at the manuals for both the SDLT 220 and the DLT 8000, 
  they both
  claim to support long position information for the SCSI READ POSITION
  command.
  
  You can see what I'm talking about by doing:
  
  mt eod
  mt status
  
  On my DDS-4 tape drive, this shows:
  
  # mt -f /dev/nsa3 status
  Drive: sa3: SEAGATE DAT06240-XXX 8071 Serial Number: HJ00YWY
  -
  Mode  Density  Blocksize  bpi  Compression
  Current:  0x26:DDS-4   1024 bytes 97000enabled (DCLZ)
  -
  Current Driver State: at rest.
  -
  Partition:   0  Calc File Number:  -1 Calc Record Number: -1
  Residual:0  Reported File Number:  -1 Reported Record Number: -1
  Flags: None
  
  But on an LTO-5, which will give long position information, I get:
  
  [root@doc ~]# mt status
  Drive: sa0: IBM ULTRIUM-HH5 E4J1
  -
  Mode      Density      Blocksize    bpi      Compression
  Current:  0x58:LTO-5   variable     384607   enabled (0x1)
  -
  Current Driver State: at rest.
  -
  Partition:   0  Calc File Number:       2  Calc Record Number:     -1
  Residual:    0  Reported File Number:   2  Reported Record Number: 32373
  Flags: None
  
  That, in combination with the changes I made to the position 
  information
  code in the driver, mean that even the old MTIOCGET ioctl should 
  return an
  accurate file number at end of data.  e.g., on the LTO-5:
  
  [root@doc ~]# mt ostatus
  Mode  Density  Blocksize  bpi  Compression
  Current:  0x58:LTO-5   variable   384607   0x1
  -available modes-
  0:0x58:LTO-5   variable   384607   0x1
  1:0x58:LTO-5   variable   384607   0x1
  2:0x58:LTO-5   variable   384607   0x1
  3:0x58:LTO-5   variable   384607   0x1
  -
  Current Driver State: at rest.
  -
  File Number: 2  Record Number: -1   Residual Count -1
  
  So the thing to try, in addition to just making sure that Bacula 
  continues
  to work properly, is to try setting this for the tape drive in
  bacula-sd.conf:
  
  Hardware End of Medium = yes
  
  It looks like the Bacula tape program (btape) has a test mode, and it 
  would
  be good to run through the tests on one of the tape drives and see 
  whether
  they work, and whether the results are different before and after the
  changes.  I'm not sure how to enable the test mode.
  
  I have this in /usr/local/etc/bacula/bacula-sd.conf
  
  Device {
  Name= DLT
  Description = QUANTUM DLT7000 1624
  Media Type  = DLT
  Archive Device  = /dev/nsa1
  
  Autochanger = YES
  Drive Index = 0
  
  Offline On Unmount  = no
  Hardware End of Medium  = yes
  BSF at EOM  = yes
  Backward Space Record   = no
  Fast Forward Space File = no
  TWO EOF = yes
  }
  
  FYI, http://www.freebsddiary.org/digital-tl891.php (from 2006) has a 
  btape test on this same model.
  
  Here's the test I ran tonight:
  
  [root@cuppy:/usr/home/dan] # btape -c 
  /usr/local/etc/bacula/bacula-sd.conf /dev/nsa1 
 
  Tape block granularity

Re: sa(4) driver changes available for test

2015-03-02 Thread Kenneth D. Merry
On Mon, Mar 02, 2015 at 11:45:59 -0500, Dan Langille wrote:
 
  On Feb 13, 2015, at 7:32 PM, Kenneth D. Merry k...@freebsd.org wrote:
  
  
  I have a fairly large set of changes to the sa(4) driver and mt(1) driver
  that I'm planning to commit in the near future.
  
  A description of the changes is here and below in this message.
  
  If you have tape hardware and the inclination, I'd appreciate testing and
  feedback.
 
 This came to me today via the Bacula mailing lists.
 
 http://marc.info/?l=bacula-users&m=142531236722693&w=2
 
  As far as I can tell ltfs support on linux sits on top of the standard
  mt-st stuff as a userspace (fuse) filesystem

  I'd hope it's much the same with BSD. Removing the standard interface
  would be counterproductive overall
 
 Can you answer that and I'll relay please?

Sure.  In short, the current interface will stay in place.  I have added
additional ioctls that provide more features and information, but I don't
see any issue with leaving the current ioctls in place.

The MTIOCGET ioctl even gets an improvement in behavior when the tape drive
supports long position information -- it will report the file number after
a 'mt eod'.

IBM's LTFS sits on top of their own Linux tape driver, and operates with
a combination of tape driver ioctls (e.g. the standard MTIOCTOP ioctls)
and SCSI passthrough.

When I ported it to FreeBSD, I ran into several areas where we needed
more information out of the tape driver.  So that was the primary
motivation behind adding the additional features.  (Other features I
implemented using SCSI passthrough.)

He is correct that it runs with FUSE, although it can be linked into an
application as a library as well.

Ken
-- 
Kenneth Merry
k...@freebsd.org
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: sa(4) driver changes available for test

2015-03-01 Thread Kenneth D. Merry
On Sun, Mar 01, 2015 at 19:15:05 -0500, Dan Langille wrote:
 
  On Feb 17, 2015, at 1:36 PM, Kenneth D. Merry k...@freebsd.org wrote:
  
  On Sat, Feb 14, 2015 at 18:22:43 -0500, Dan Langille wrote:
  
  On Feb 13, 2015, at 7:32 PM, Kenneth D. Merry k...@freebsd.org wrote:
  
  
  I have a fairly large set of changes to the sa(4) driver and mt(1) driver
  that I'm planning to commit in the near future.
  
  A description of the changes is here and below in this message.
  
  If you have tape hardware and the inclination, I'd appreciate testing and
  feedback.
  
  I have a DLT 8000 and an SDLT 220.
  
  I don't have anything running current, but I have a spare machine which I 
  could use for testing.
  
  Do you see any value in tests with that hardware? I'd be testing it via
  Bacula.
  
  disclosure: I'm the sysutils/bacula-* maintainer and a Bacula committer.
  
  
  Actually, yes.  Bacula is a bit tricky to configure, so your trying it out
  would be helpful if you have the time.
  
  In looking at the manuals for both the SDLT 220 and the DLT 8000, they both
  claim to support long position information for the SCSI READ POSITION
  command.
  
  You can see what I'm talking about by doing:
  
  mt eod
  mt status
  
  On my DDS-4 tape drive, this shows:
  
  # mt -f /dev/nsa3 status
  Drive: sa3: SEAGATE DAT06240-XXX 8071 Serial Number: HJ00YWY
  -
  Mode  Density  Blocksize  bpi  Compression
  Current:  0x26:DDS-4   1024 bytes 97000enabled (DCLZ)
  -
  Current Driver State: at rest.
  -
  Partition:   0  Calc File Number:  -1 Calc Record Number: -1
  Residual:0  Reported File Number:  -1 Reported Record Number: -1
  Flags: None
  
  But on an LTO-5, which will give long position information, I get:
  
  [root@doc ~]# mt status
  Drive: sa0: IBM ULTRIUM-HH5 E4J1
  -
  Mode  Density  Blocksize  bpi  Compression
  Current:  0x58:LTO-5   variable   384607   enabled (0x1)
  -
  Current Driver State: at rest.
  -
  Partition:   0  Calc File Number:   2 Calc Record Number: -1
  Residual:0  Reported File Number:   2 Reported Record Number: 32373
  Flags: None
  
  That, in combination with the changes I made to the position information
  code in the driver, mean that even the old MTIOCGET ioctl should return an
  accurate file number at end of data.  e.g., on the LTO-5:
  
  [root@doc ~]# mt ostatus
  Mode  Density  Blocksize  bpi  Compression
  Current:  0x58:LTO-5   variable   384607   0x1
  -available modes-
  0:0x58:LTO-5   variable   384607   0x1
  1:0x58:LTO-5   variable   384607   0x1
  2:0x58:LTO-5   variable   384607   0x1
  3:0x58:LTO-5   variable   384607   0x1
  -
  Current Driver State: at rest.
  -
  File Number: 2  Record Number: -1   Residual Count -1
  
  So the thing to try, in addition to just making sure that Bacula continues
  to work properly, is to try setting this for the tape drive in
  bacula-sd.conf:
  
   Hardware End of Medium = yes
  
  It looks like the Bacula tape program (btape) has a test mode, and it would
  be good to run through the tests on one of the tape drives and see whether
  they work, and whether the results are different before and after the
  changes.  I'm not sure how to enable the test mode.
 
 I have this in /usr/local/etc/bacula/bacula-sd.conf
 
 Device {
   Name= DLT
   Description = QUANTUM DLT7000 1624
   Media Type  = DLT
   Archive Device  = /dev/nsa1
 
   Autochanger = YES
   Drive Index = 0
 
   Offline On Unmount  = no
   Hardware End of Medium  = yes
   BSF at EOM  = yes
   Backward Space Record   = no
   Fast Forward Space File = no
   TWO EOF = yes
 }
 
 FYI, http://www.freebsddiary.org/digital-tl891.php (from 2006) has a btape 
 test on this same model.
 
 Here's the test I ran tonight:
 
 [root@cuppy:/usr/home/dan] # btape -c /usr/local/etc/bacula/bacula-sd.conf 
 /dev/nsa1 

 Tape block granularity is 1024 bytes.
 btape: butil.c:287-0 Using device: /dev/nsa1 for writing.
 btape: btape.c:469-0 open device DLT (/dev/nsa1): OK
 *test
 
 === Write, rewind, and re-read test ===
 
 I'm going to write 1 records and an EOF
 then write 1 records and an EOF, then rewind,
 and re-read the data to verify that it is correct.
 
 This is an *essential* feature ...
 
 btape: btape.c:1152-0 Wrote 1 blocks of 64412 bytes.
 btape: btape.c:604-0 Wrote 1 EOF to DLT

Re: sa(4) driver changes available for test

2015-03-01 Thread Kenneth D. Merry
On Sun, Mar 01, 2015 at 19:40:40 -0500, Dan Langille wrote:
 
  On Mar 1, 2015, at 7:36 PM, Kenneth D. Merry k...@freebsd.org wrote:
  
  On Sun, Mar 01, 2015 at 19:28:37 -0500, Dan Langille wrote:
  
  On Mar 1, 2015, at 7:18 PM, Kenneth D. Merry k...@freebsd.org wrote:
  
  On Sun, Mar 01, 2015 at 17:06:24 -0500, Dan Langille wrote:
  
  On Feb 13, 2015, at 7:32 PM, Kenneth D. Merry k...@freebsd.org wrote:
  
  
  I have a fairly large set of changes to the sa(4) driver and mt(1) 
  driver
  that I'm planning to commit in the near future.
  
  A description of the changes is here and below in this message.
  
  If you have tape hardware and the inclination, I'd appreciate testing 
  and
  feedback.
  
  
  Rough draft commit message:
  
  http://people.freebsd.org/~ken/sa_changes_commitmsg.20150213.3.txt
  
  The patches against FreeBSD/head as of SVN revision 278706:
  
  http://people.freebsd.org/~ken/sa_changes.20150213.3.txt
  
  And (untested) patches against FreeBSD stable/10 as of SVN revision 
  278721.
  
  http://people.freebsd.org/~ken/sa_changes.stable_10.20150213.3.txt
  
  
  The intent is to get the tape infrastructure more up to date, so we can
  support LTFS and more modern tape drives:
  
  http://www.ibm.com/systems/storage/tape/ltfs/
  
  I have ported IBM's LTFS Single Drive Edition to FreeBSD.  The port 
  depends
  on the patches linked above.  It isn't fully cleaned up and ready for
  redistribution.  If you're interested, though, let me know and I'll tell
  you when it is ready to go out.  You need an IBM LTO-5, LTO-6, TS1140 or
  TS1150 tape drive.  HP drives aren't supported by IBM's LTFS, and older
  drives don't have the necessary features to support LTFS.
  
  The commit message below outlines most of the changes.
  
  A few comments:
  
  1. I'm planning to commit the XPT_DEV_ADVINFO changes separately.
  
  2. The XML output is similar to what GEOM and CTL do.  It would be nice 
  to
  figure out how to put a standard schema on it so that standard tools
  could read it.  I don't know how feasible that is, since I haven't
  time to dig into it.  If anyone has suggestions on whether that is
  feasible or advisable, I'd appreciate feedback.
  
  3. I have tested with a reasonable amount of tape hardware (see below 
  for a
  list), but more testing and feedback would be good.
  
  4. Standard 'mt status' output looks like this:
  
  # mt -f /dev/nsa3 status  -v
  Drive: sa3: IBM ULTRIUM-HH6 E4J1 Serial Number: 101500520A
  -
  Mode  Density  Blocksize  bpi  Compression
  Current:  0x5a:LTO-6   variable   384607   enabled (0xff)
  -
  Current Driver State: at rest.
  -
  Partition:   0  Calc File Number:   0 Calc Record Number: 0
  Residual:0  Reported File Number:   0 Reported Record Number: 0
  Flags: BOP
  
  5. 'mt status -v' looks like this:
  
  # mt -f /dev/nsa3 status  -v
  Drive: sa3: IBM ULTRIUM-HH6 E4J1 Serial Number: 101500520A
  -
  Mode  Density  Blocksize  bpi  Compression
  Current:  0x5a:LTO-6   variable   384607   enabled (0xff)
  -
  Current Driver State: at rest.
  -
  Partition:   0  Calc File Number:   0 Calc Record Number: 0
  Residual:0  Reported File Number:   0 Reported Record Number: 0
  Flags: BOP
  -
  Tape I/O parameters:
  Maximum I/O size allowed by driver and controller (maxio): 1081344 bytes
  Maximum I/O size reported by controller (cpi_maxio): 5197824 bytes
  Maximum block size supported by tape drive and media (max_blk): 8388608 
  bytes
  Minimum block size supported by tape drive and media (min_blk): 1 bytes
  Block granularity supported by tape drive and media (blk_gran): 0 bytes
  Maximum possible I/O size (max_effective_iosize): 1081344 bytes
  
  
  # mtx -f /dev/pass0 status
  Storage Changer /dev/pass0:2 Drives, 10 Slots ( 0 Import/Export )
  Data Transfer Element 0:Empty
  Data Transfer Element 1:Empty
  Storage Element 1:Empty
  Storage Element 2:Empty
  Storage Element 3:Empty
  Storage Element 4:Full :VolumeTag=FAI260  
  Storage Element 5:Full :VolumeTag=FAI261  
  Storage Element 6:Full :VolumeTag=FAI262  
  Storage Element 7:Full :VolumeTag=FAI263  
  Storage Element 8:Empty
  Storage Element 9:Empty
  Storage Element 10:Empty
  
  
  It was at this point I spent the next 90 minutes trying to get the tape
  drive out of the tape library to free a stuck tape.  Some of this was
  spent attempting, and failing, to undo a stripped screw.  I stopped the
  attempt when I noticed the screw did not need to be removed.  :/
  
  Thanks for all of the effort

Re: sa(4) driver changes available for test

2015-03-01 Thread Kenneth D. Merry
On Sun, Mar 01, 2015 at 17:06:24 -0500, Dan Langille wrote:
 
  On Feb 13, 2015, at 7:32 PM, Kenneth D. Merry k...@freebsd.org wrote:
  
  
  I have a fairly large set of changes to the sa(4) driver and mt(1) driver
  that I'm planning to commit in the near future.
  
  A description of the changes is here and below in this message.
  
  If you have tape hardware and the inclination, I'd appreciate testing and
  feedback.
  
  
  Rough draft commit message:
  
  http://people.freebsd.org/~ken/sa_changes_commitmsg.20150213.3.txt
  
  The patches against FreeBSD/head as of SVN revision 278706:
  
  http://people.freebsd.org/~ken/sa_changes.20150213.3.txt
  
  And (untested) patches against FreeBSD stable/10 as of SVN revision 278721.
  
  http://people.freebsd.org/~ken/sa_changes.stable_10.20150213.3.txt
  
  
  The intent is to get the tape infrastructure more up to date, so we can
  support LTFS and more modern tape drives:
  
  http://www.ibm.com/systems/storage/tape/ltfs/
  
  I have ported IBM's LTFS Single Drive Edition to FreeBSD.  The port depends
  on the patches linked above.  It isn't fully cleaned up and ready for
  redistribution.  If you're interested, though, let me know and I'll tell
  you when it is ready to go out.  You need an IBM LTO-5, LTO-6, TS1140 or
  TS1150 tape drive.  HP drives aren't supported by IBM's LTFS, and older
  drives don't have the necessary features to support LTFS.
  
  The commit message below outlines most of the changes.
  
  A few comments:
  
  1. I'm planning to commit the XPT_DEV_ADVINFO changes separately.
  
  2. The XML output is similar to what GEOM and CTL do.  It would be nice to
figure out how to put a standard schema on it so that standard tools
could read it.  I don't know how feasible that is, since I haven't
time to dig into it.  If anyone has suggestions on whether that is
feasible or advisable, I'd appreciate feedback.
  
  3. I have tested with a reasonable amount of tape hardware (see below for a
list), but more testing and feedback would be good.
  
  4. Standard 'mt status' output looks like this:
  
  # mt -f /dev/nsa3 status  -v
  Drive: sa3: IBM ULTRIUM-HH6 E4J1 Serial Number: 101500520A
  -
  Mode  Density  Blocksize  bpi  Compression
  Current:  0x5a:LTO-6   variable   384607   enabled (0xff)
  -
  Current Driver State: at rest.
  -
  Partition:   0  Calc File Number:   0 Calc Record Number: 0
  Residual:0  Reported File Number:   0 Reported Record Number: 0
  Flags: BOP
  
  5. 'mt status -v' looks like this:
  
  # mt -f /dev/nsa3 status  -v
  Drive: sa3: IBM ULTRIUM-HH6 E4J1 Serial Number: 101500520A
  -
  Mode  Density  Blocksize  bpi  Compression
  Current:  0x5a:LTO-6   variable   384607   enabled (0xff)
  -
  Current Driver State: at rest.
  -
  Partition:   0  Calc File Number:   0 Calc Record Number: 0
  Residual:0  Reported File Number:   0 Reported Record Number: 0
  Flags: BOP
  -
  Tape I/O parameters:
   Maximum I/O size allowed by driver and controller (maxio): 1081344 bytes
   Maximum I/O size reported by controller (cpi_maxio): 5197824 bytes
   Maximum block size supported by tape drive and media (max_blk): 8388608 
  bytes
   Minimum block size supported by tape drive and media (min_blk): 1 bytes
   Block granularity supported by tape drive and media (blk_gran): 0 bytes
   Maximum possible I/O size (max_effective_iosize): 1081344 bytes
 
 
 # mtx -f /dev/pass0 status
   Storage Changer /dev/pass0:2 Drives, 10 Slots ( 0 Import/Export )
 Data Transfer Element 0:Empty
 Data Transfer Element 1:Empty
   Storage Element 1:Empty
   Storage Element 2:Empty
   Storage Element 3:Empty
   Storage Element 4:Full :VolumeTag=FAI260  
   Storage Element 5:Full :VolumeTag=FAI261  
   Storage Element 6:Full :VolumeTag=FAI262  
   Storage Element 7:Full :VolumeTag=FAI263  
   Storage Element 8:Empty
   Storage Element 9:Empty
   Storage Element 10:Empty
 
 
 It was at this point I spent the next 90 minutes trying to get the tape
 drive out of the tape library to free a stuck tape.  Some of this was spent
 attempting, and failing, to undo a stripped screw.  I stopped the attempt when
 I noticed the screw did not need to be removed.  :/

Thanks for all of the effort!  Looks like it is paying off! :)

 When I do this command, I hear the drive move a bit, to read the tape:
 
 # mt -f /dev/nsa1 status
 Drive: sa1: DEC TZ89 (C) DEC 2561 Serial Number: CXA09S1340
 -
 Mode  DensityBlocksize  bpi

Re: sa(4) driver changes available for test

2015-03-01 Thread Kenneth D. Merry
On Sun, Mar 01, 2015 at 19:28:37 -0500, Dan Langille wrote:
 
  On Mar 1, 2015, at 7:18 PM, Kenneth D. Merry k...@freebsd.org wrote:
  
  On Sun, Mar 01, 2015 at 17:06:24 -0500, Dan Langille wrote:
  
  On Feb 13, 2015, at 7:32 PM, Kenneth D. Merry k...@freebsd.org wrote:
  
  
  I have a fairly large set of changes to the sa(4) driver and mt(1) driver
  that I'm planning to commit in the near future.
  
  A description of the changes is here and below in this message.
  
  If you have tape hardware and the inclination, I'd appreciate testing and
  feedback.
  
  
  Rough draft commit message:
  
  http://people.freebsd.org/~ken/sa_changes_commitmsg.20150213.3.txt
  
  The patches against FreeBSD/head as of SVN revision 278706:
  
  http://people.freebsd.org/~ken/sa_changes.20150213.3.txt
  
  And (untested) patches against FreeBSD stable/10 as of SVN revision 
  278721.
  
  http://people.freebsd.org/~ken/sa_changes.stable_10.20150213.3.txt
  
  
  The intent is to get the tape infrastructure more up to date, so we can
  support LTFS and more modern tape drives:
  
  http://www.ibm.com/systems/storage/tape/ltfs/
  
  I have ported IBM's LTFS Single Drive Edition to FreeBSD.  The port 
  depends
  on the patches linked above.  It isn't fully cleaned up and ready for
  redistribution.  If you're interested, though, let me know and I'll tell
  you when it is ready to go out.  You need an IBM LTO-5, LTO-6, TS1140 or
  TS1150 tape drive.  HP drives aren't supported by IBM's LTFS, and older
  drives don't have the necessary features to support LTFS.
  
  The commit message below outlines most of the changes.
  
  A few comments:
  
  1. I'm planning to commit the XPT_DEV_ADVINFO changes separately.
  
  2. The XML output is similar to what GEOM and CTL do.  It would be nice to
   figure out how to put a standard schema on it so that standard tools
   could read it.  I don't know how feasible that is, since I haven't
   time to dig into it.  If anyone has suggestions on whether that is
   feasible or advisable, I'd appreciate feedback.
  
  3. I have tested with a reasonable amount of tape hardware (see below for 
  a
   list), but more testing and feedback would be good.
  
  4. Standard 'mt status' output looks like this:
  
  # mt -f /dev/nsa3 status  -v
  Drive: sa3: IBM ULTRIUM-HH6 E4J1 Serial Number: 101500520A
  -
  Mode  Density  Blocksize  bpi  Compression
  Current:  0x5a:LTO-6   variable   384607   enabled (0xff)
  -
  Current Driver State: at rest.
  -
  Partition:   0  Calc File Number:   0 Calc Record Number: 0
  Residual:0  Reported File Number:   0 Reported Record Number: 0
  Flags: BOP
  
  5. 'mt status -v' looks like this:
  
  # mt -f /dev/nsa3 status  -v
  Drive: sa3: IBM ULTRIUM-HH6 E4J1 Serial Number: 101500520A
  -
  Mode  Density  Blocksize  bpi  Compression
  Current:  0x5a:LTO-6   variable   384607   enabled (0xff)
  -
  Current Driver State: at rest.
  -
  Partition:   0  Calc File Number:   0 Calc Record Number: 0
  Residual:0  Reported File Number:   0 Reported Record Number: 0
  Flags: BOP
  -
  Tape I/O parameters:
  Maximum I/O size allowed by driver and controller (maxio): 1081344 bytes
  Maximum I/O size reported by controller (cpi_maxio): 5197824 bytes
  Maximum block size supported by tape drive and media (max_blk): 8388608 
  bytes
  Minimum block size supported by tape drive and media (min_blk): 1 bytes
  Block granularity supported by tape drive and media (blk_gran): 0 bytes
  Maximum possible I/O size (max_effective_iosize): 1081344 bytes
  
  
  # mtx -f /dev/pass0 status
   Storage Changer /dev/pass0:2 Drives, 10 Slots ( 0 Import/Export )
  Data Transfer Element 0:Empty
  Data Transfer Element 1:Empty
   Storage Element 1:Empty
   Storage Element 2:Empty
   Storage Element 3:Empty
   Storage Element 4:Full :VolumeTag=FAI260  
   Storage Element 5:Full :VolumeTag=FAI261  
   Storage Element 6:Full :VolumeTag=FAI262  
   Storage Element 7:Full :VolumeTag=FAI263  
   Storage Element 8:Empty
   Storage Element 9:Empty
   Storage Element 10:Empty
  
  
  It was at this point I spent the next 90 minutes trying to get the tape
  drive out of the tape library to free a stuck tape.  Some of this was spent
  attempting, and failing, to undo a stripped screw.  I stopped the attempt
  when I noticed the screw did not need to be removed.  :/
  
  Thanks for all of the effort!  Looks like it is paying off! :)
  
  When I do this command, I hear the drive move a bit, to read the tape:
  
  # mt -f /dev/nsa1 status

Re: sa(4) driver changes available for test

2015-03-01 Thread Kenneth D. Merry
On Sun, Mar 01, 2015 at 19:41:07 -0500, Dan Langille wrote:
 
  On Mar 1, 2015, at 7:31 PM, Kenneth D. Merry k...@freebsd.org wrote:
  
  On Sun, Mar 01, 2015 at 19:15:05 -0500, Dan Langille wrote:
  
  On Feb 17, 2015, at 1:36 PM, Kenneth D. Merry k...@freebsd.org wrote:
  
  On Sat, Feb 14, 2015 at 18:22:43 -0500, Dan Langille wrote:
  
  On Feb 13, 2015, at 7:32 PM, Kenneth D. Merry k...@freebsd.org wrote:
  
  
  I have a fairly large set of changes to the sa(4) driver and mt(1) 
  driver
  that I'm planning to commit in the near future.
  
  A description of the changes is here and below in this message.
  
  If you have tape hardware and the inclination, I'd appreciate testing 
  and
  feedback.
  
  I have a DLT 8000 and an SDLT 220.
  
  I don't have anything running current, but I have a spare machine which 
  I could use for testing.
  
  Do you see any value in tests with that hardware? I'd be testing it via
  Bacula.
  
  disclosure: I'm the sysutils/bacula-* maintainer and a Bacula committer.
  
  
  Actually, yes.  Bacula is a bit tricky to configure, so your trying it out
  would be helpful if you have the time.
  
  In looking at the manuals for both the SDLT 220 and the DLT 8000, they 
  both
  claim to support long position information for the SCSI READ POSITION
  command.
  
  You can see what I'm talking about by doing:
  
  mt eod
  mt status
  
  On my DDS-4 tape drive, this shows:
  
  # mt -f /dev/nsa3 status
  Drive: sa3: SEAGATE DAT06240-XXX 8071 Serial Number: HJ00YWY
  -
  Mode  Density  Blocksize  bpi  Compression
  Current:  0x26:DDS-4   1024 bytes 97000enabled (DCLZ)
  -
  Current Driver State: at rest.
  -
  Partition:   0  Calc File Number:  -1 Calc Record Number: -1
  Residual:0  Reported File Number:  -1 Reported Record Number: -1
  Flags: None
  
  But on an LTO-5, which will give long position information, I get:
  
  [root@doc ~]# mt status
  Drive: sa0: IBM ULTRIUM-HH5 E4J1
  -
  Mode  Density  Blocksize  bpi  Compression
  Current:  0x58:LTO-5   variable   384607   enabled (0x1)
  -
  Current Driver State: at rest.
  -
  Partition:   0  Calc File Number:   2 Calc Record Number: -1
  Residual:0  Reported File Number:   2 Reported Record Number: 32373
  Flags: None
  
  That, in combination with the changes I made to the position information
  code in the driver, mean that even the old MTIOCGET ioctl should return an
  accurate file number at end of data.  e.g., on the LTO-5:
  
  [root@doc ~]# mt ostatus
  Mode  Density  Blocksize  bpi  Compression
  Current:  0x58:LTO-5   variable   384607   0x1
  -available modes-
  0:0x58:LTO-5   variable   384607   0x1
  1:0x58:LTO-5   variable   384607   0x1
  2:0x58:LTO-5   variable   384607   0x1
  3:0x58:LTO-5   variable   384607   0x1
  -
  Current Driver State: at rest.
  -
  File Number: 2  Record Number: -1   Residual Count -1
  
  So the thing to try, in addition to just making sure that Bacula continues
  to work properly, is to try setting this for the tape drive in
  bacula-sd.conf:
  
  Hardware End of Medium = yes
  
  It looks like the Bacula tape program (btape) has a test mode, and it 
  would
  be good to run through the tests on one of the tape drives and see whether
  they work, and whether the results are different before and after the
  changes.  I'm not sure how to enable the test mode.
  
  I have this in /usr/local/etc/bacula/bacula-sd.conf
  
  Device {
   Name= DLT
   Description = QUANTUM DLT7000 1624
   Media Type  = DLT
   Archive Device  = /dev/nsa1
  
   Autochanger = YES
   Drive Index = 0
  
   Offline On Unmount  = no
   Hardware End of Medium  = yes
   BSF at EOM  = yes
   Backward Space Record   = no
   Fast Forward Space File = no
   TWO EOF = yes
  }
  
  FYI, http://www.freebsddiary.org/digital-tl891.php (from 2006) has a btape 
  test on this same model.
  
  Here's the test I ran tonight:
  
  [root@cuppy:/usr/home/dan] # btape -c /usr/local/etc/bacula/bacula-sd.conf 
  /dev/nsa1  

  Tape block granularity is 1024 bytes.
  btape: butil.c:287-0 Using device: /dev/nsa1 for writing.
  btape: btape.c:469-0 open device DLT (/dev/nsa1): OK
  *test
  
  === Write, rewind, and re-read test ===
  
  I'm going to write 1 records and an EOF
  then write 1 records and an EOF, then rewind,
  and re-read

Re: sa(4) driver changes available for test

2015-02-28 Thread Kenneth D. Merry
On Sat, Feb 28, 2015 at 17:29:48 -0500, Dan Langille wrote:
 
  On Feb 18, 2015, at 7:13 PM, Kenneth D. Merry k...@freebsd.org wrote:
  
  
  I have updated the patches.
  
  I have removed the XPT_DEV_ADVINFO changes from the patches to head, since
  I committed those separately.
  
  I have (hopefully) fixed the build for the stable/10 patches by MFCing
  dependencies.  (One of them mav did for me, thanks!)
  
  Rough draft commit message:
  
  http://people.freebsd.org/~ken/sa_changes_commitmsg.20150218.1.txt
 
 
 I have current installed and running with Bacula, but I have not tried the 
 tape drive yet.
 

Thanks for all your work on this!

 It seems like your changes are in there from about 5 days ago.

Yes, that is correct.

 Having solved my server hardware issues, I'm now having issues with the 
 autochanger mechanism 
 of the tape library.  

Does it work with chio(1)?

Does it look like hardware or software?  (If it is software, I can help
with that.)

Ken
-- 
Kenneth Merry
k...@freebsd.org


Re: sa(4) driver changes available for test

2015-02-27 Thread Kenneth D. Merry
On Fri, Feb 27, 2015 at 17:56:42 -0500, Dan Langille wrote:
 
  On Feb 17, 2015, at 1:36 PM, Kenneth D. Merry k...@freebsd.org wrote:
  
  On Sat, Feb 14, 2015 at 18:22:43 -0500, Dan Langille wrote:
  
  On Feb 13, 2015, at 7:32 PM, Kenneth D. Merry k...@freebsd.org wrote:
  
  
  I have a fairly large set of changes to the sa(4) driver and mt(1) driver
  that I'm planning to commit in the near future.
  
  A description of the changes is here and below in this message.
  
  If you have tape hardware and the inclination, I'd appreciate testing and
  feedback.
  
  I have a DLT 8000 and an SDLT 220.
  
  I don't have anything running current, but I have a spare machine which I 
  could use for testing.
  
  Do you see any value in tests with that hardware? I'd be testing it via
  Bacula.
  
  disclosure: I'm the sysutils/bacula-* maintainer and a Bacula committer.
  
  
  Actually, yes.  Bacula is a bit tricky to configure, so your trying it out
  would be helpful if you have the time.
 
 I have been unable to test yet.  I've encountered time and hardware issues.

I know how that goes!  (On both counts.)

 I may be able to try tomorrow.

So I have tested building it and it does build at least.  If you're able to
figure out some of the answers below, that would be great!

  
  In looking at the manuals for both the SDLT 220 and the DLT 8000, they both
  claim to support long position information for the SCSI READ POSITION
  command.
  
  You can see what I'm talking about by doing:
  
  mt eod
  mt status
  
  On my DDS-4 tape drive, this shows:
  
  # mt -f /dev/nsa3 status
  Drive: sa3: SEAGATE DAT06240-XXX 8071 Serial Number: HJ00YWY
  -
  Mode  Density  Blocksize  bpi  Compression
  Current:  0x26:DDS-4   1024 bytes 97000enabled (DCLZ)
  -
  Current Driver State: at rest.
  -
  Partition:   0  Calc File Number:  -1 Calc Record Number: -1
  Residual:0  Reported File Number:  -1 Reported Record Number: -1
  Flags: None
  
  But on an LTO-5, which will give long position information, I get:
  
  [root@doc ~]# mt status
  Drive: sa0: IBM ULTRIUM-HH5 E4J1
  -
  Mode  Density  Blocksize  bpi  Compression
  Current:  0x58:LTO-5   variable   384607   enabled (0x1)
  -
  Current Driver State: at rest.
  -
  Partition:   0  Calc File Number:   2 Calc Record Number: -1
  Residual:0  Reported File Number:   2 Reported Record Number: 32373
  Flags: None
  
  That, in combination with the changes I made to the position information
  code in the driver, mean that even the old MTIOCGET ioctl should return an
  accurate file number at end of data.  e.g., on the LTO-5:
  
  [root@doc ~]# mt ostatus
  Mode  Density  Blocksize  bpi  Compression
  Current:  0x58:LTO-5   variable   384607   0x1
  -available modes-
  0:0x58:LTO-5   variable   384607   0x1
  1:0x58:LTO-5   variable   384607   0x1
  2:0x58:LTO-5   variable   384607   0x1
  3:0x58:LTO-5   variable   384607   0x1
  -
  Current Driver State: at rest.
  -
  File Number: 2  Record Number: -1   Residual Count -1
  
  So the thing to try, in addition to just making sure that Bacula continues
  to work properly, is to try setting this for the tape drive in
  bacula-sd.conf:
  
   Hardware End of Medium = yes
  
  It looks like the Bacula tape program (btape) has a test mode, and it would
  be good to run through the tests on one of the tape drives and see whether
  they work, and whether the results are different before and after the
  changes.  I'm not sure how to enable the test mode.
  
  I'll let the other Bacula devs know about this.  They deal with the 
  hardware.  I work on PostgreSQL.
  
  
  Thanks!  If there are additional features they would like out of the tape
  driver, I'm happy to talk about it.  (Or help if they'd like to use the new
  status reporting ioctl, MTIOCEXTGET or any of the other new ioctls.)
  
  Ken
  -- 
  Kenneth Merry
  k...@freebsd.org
 
 --
 Dan Langille
 http://langille.org/
 
 
 
 
 

Ken
-- 
Kenneth Merry
k...@freebsd.org


Re: sa(4) driver changes available for test

2015-02-27 Thread Kenneth D. Merry
On Fri, Feb 27, 2015 at 20:05:05 +0100, Harald Schmalzbauer wrote:
  Regarding Kenneth D. Merry's message of 26.02.2015 23:42 (localtime):
 
 ...
  And (untested) patches against FreeBSD stable/10 as of SVN revision 
  278974:
 
  http://people.freebsd.org/~ken/sa_changes.stable_10.20150218.1.txt
 ...
 
  I'm glad it is working well for you!  You can do larger I/O sizes with the
  Adaptec by changing your MAXPHYS and DFLTPHYS values in your kernel config
  file.  e.g.:
 
  options MAXPHYS=(1024*1024)
  options DFLTPHYS=(1024*1024)
 
  If you set those values larger, you won't be able to do more than 132K with
  the sym(4) driver on an x86 box.  (It limits the maximum I/O size to 33
  segments * PAGE_SIZE.)
 
 Thanks for the hint! I wasn't aware that kern.cam.sa.N.maxio has driver
 limitations corresponding to the system's MAXPHYS/DFLTPHYS. I thought only
 silicon limitations defined its value.

It depends on the driver.  I thought that the Adaptec drivers go off of
MAXPHYS (because that's what the driver author told me last week :), but
in looking at the code, they actually have a hard-coded value that can be
increased.  You can bump AHC_MAXPHYS or AHD_MAXPHYS in aic7xxx_osm.h or
aic79xx_osm.h, respectively.  In order to make any difference, though, you
would have to bump MAXPHYS/DFLTPHYS (so the sa(4) driver will use that
value) or change the ahc(4)/ahd(4) driver to set the maxio field in the
path inquiry CCB.
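
As a rough illustration of the interaction described above, the sketch below
models how a peripheral driver might pick its maximum I/O size from the
controller's reported maxio and the kernel's MAXPHYS/DFLTPHYS.  The function
names, the Python rendering, and the example values are assumptions made for
illustration (the real logic is C code in the kernel), and the 4K page size
used for the sym(4) figure is also an assumption.

```python
# Illustrative model only; the kernel implements this selection in C.
KIB = 1024


def pick_maxio(cpi_maxio, maxphys, dfltphys):
    """Choose the largest I/O size a peripheral driver will issue.

    cpi_maxio is the value the controller driver placed in the path
    inquiry CCB; 0 means the controller driver never set it.
    """
    if cpi_maxio == 0:
        return dfltphys   # no limit reported: fall back to DFLTPHYS
    if cpi_maxio > maxphys:
        return maxphys    # never exceed the kernel-wide MAXPHYS
    return cpi_maxio


def segment_limit(segments, page_size=4096):
    """I/O limit for a controller capped at a number of page-sized segments."""
    return segments * page_size


if __name__ == "__main__":
    # Controller driver that leaves maxio unset: only DFLTPHYS matters.
    print(pick_maxio(0, maxphys=1024 * KIB, dfltphys=64 * KIB))    # 65536
    print(pick_maxio(0, maxphys=1024 * KIB, dfltphys=1024 * KIB))  # 1048576
    # 33 segments of an assumed 4K page, expressed in KiB.
    print(segment_limit(33) // KIB)                                # 132
```

In this model, a controller driver that leaves maxio unset only benefits from
a larger DFLTPHYS, while one that reports a large maxio is capped by MAXPHYS.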

 But in order to have a best matching pre-production test-environment, I
 nevertheless replaced it, now using mpt(4) instead of ahc(4)/ahc_pci on
 PCI-X@S3210 (for parallel tape drives I consistently have mpt(4)@PCIe,
 which is the same LSI(53c1020) chip but with on-board PCI-X-PCIe bridge).

Okay.  That should work.

 Still just works fine! :-) (stable_10.20150218.1-patchset with LTO2,
 LTO3 and DDS5)
 With DDS5, density is reported as unknown. If I remember correctly,
 you have your DDS4 reporting DDS4?

That means that we need to add DDS5 to the density table in libmt.  Can
you send the output of 'mt status -v'?  It would actually be helpful for
all three drives.

Also, do any of your drives give a full report for 'mt getdensity'?  If so,
can you send that as well?  (By full report, I mean more than one line.)

We don't have density codes for DDS-5/DAT 72, DAT 160 or DAT 320 yet.  It
looks like DDS-5 should be 0x47.
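
A density table is essentially a lookup from SCSI density code to a name.
The sketch below is a hypothetical Python model of such a lookup, not the
libmt source.  It contains only the codes that appear in this thread, and
the 0x47 entry for DDS-5 is the tentative value mentioned above.

```python
# Hypothetical model of a density-code lookup; not the libmt implementation.
DENSITY_NAMES = {
    0x26: "DDS-4",
    0x58: "LTO-5",
    0x5A: "LTO-6",
}


def density_name(code, table=DENSITY_NAMES):
    """Return the name for a density code, or "unknown" if it is not listed."""
    return table.get(code, "unknown")


# Adding the tentative DDS-5 entry turns the "unknown" result into a name.
with_dds5 = dict(DENSITY_NAMES)
with_dds5[0x47] = "DDS-5"

if __name__ == "__main__":
    print(density_name(0x47))             # unknown
    print(density_name(0x47, with_dds5))  # DDS-5
```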

   therefore I'd like to point to the new port misc/vdmfec
 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197950
  That looks cool. :)  I'm not a ports committer, but hopefully one of them
  will pick it up.
 
 Cool it is indeed, but whether it's really useful or not is beyond my
 expertise. I couldn't collect much MT experience yet.
 I know that LTO and similar modern MT technology do their own ECC (in
 the meaning of erasure code, mostly Reed-Solomon).
 What I don't know (but want to be best prepared for) is how arbitrary
 LTO drives behave if the one (1) in 10^17 bits is detected to be
 uncorrectable.
 If it isn't detected, the post erasure code (vdmfec in that case) would
 help for sure.
 But if the drive just cuts the output, or stops streaming at all, would
 vdmfec be useless?

There is a difference between the uncorrectable bit error rate and the
undetectable bit error rate.  The uncorrectable bit error rate for LTO-6 is
1 in 10^17.  It is 1 in 10^19 for Oracle T1 C/D drives, and 1 in 10^20
for IBM TS1150.  Seagate Enterprise drives claim to have an uncorrectable
bit error rate of 1 sector per 10^15 bits read.

See:

http://www.oracle.com/us/products/servers-storage/storage/tape-storage/t1c-reliability-wp-409919.pdf

http://www.spectralogic.com/index.cfm?fuseaction=home.displayFileDocID=2513

http://www.seagate.com/www-content/product-content/enterprise-hdd-fam/enterprise-capacity-3-5-hdd/constellation-es-4/en-us/docs/enterprise-capacity-3-5-hdd-ds1791-8-1410us.pdf

The second white paper claims that tape has an undetectable bit error rate
of 1 in 1.6x10^33 bits.  I assume it is referring to TS1150, but I don't
know for sure.

It is far more likely that your tape or tape drive will break than it is
that you would get a bad bit back from the drive.
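
To put those rates in perspective, here is a back-of-the-envelope conversion from "1 in 10^N bits" to the amount of data read per expected uncorrectable error.  The helper name is made up for illustration, and it assumes errors are spread uniformly, which real media won't honor:

```c
/*
 * Expected number of bytes read per uncorrectable error, given a quoted
 * rate of 1 error in 10^exponent bits.
 */
static double
bytes_per_error(int exponent)
{
	double bits = 1.0;

	for (int i = 0; i < exponent; i++)
		bits *= 10.0;
	return (bits / 8.0);
}
```

For the LTO-6 figure, bytes_per_error(17) is 1.25e16 bytes, i.e. about 12.5 PB (decimal) read per expected uncorrectable error.  The 10^15 disk figure works out to about 125 TB.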

 According to excerpts of Study of Perpendicular AME Media in a Linear
 Tape Drive, LTO-4 has a soft read error rate of 1 in 10^6 bits and DDS
 has 1 in 10^4 bits (!!!, according to HP C1537A DDS 3 - ACT/Apricot). So
 with DDS, _every_ single block pax(1) writes to tape needs to be
 internally corrected! Of course, nobody wants to write zfs send output to
 DDS, it's much too slow/small, but it's worth mentioning.
 
 For archives of zfs streams, I don't feel safe relying on the tape
 drives' FEC, which was designed for backup solutions that do their own
 blocking+checksumming, so that the very rare uncorrectable read
 error would at worst lead to a few unrecoverable files, and even in
 the case of database files those are most likely recoverable afterwards.
 But with one flipped bit in the zfs stream, you'd lose hundreds of
 

Re: sa(4) driver changes available for test

2015-02-26 Thread Kenneth D. Merry
On Thu, Feb 26, 2015 at 10:57:50 +0100, Harald Schmalzbauer wrote:
  Regarding Kenneth D. Merry's message of 19.02.2015 01:13 (localtime):
  I have updated the patches.
 
  I have removed the XPT_DEV_ADVINFO changes from the patches to head, since
  I committed those separately.
 
  I have (hopefully) fixed the build for the stable/10 patches by MFCing
  dependencies.  (One of them mav did for me, thanks!)
 
  Rough draft commit message:
 
  http://people.freebsd.org/~ken/sa_changes_commitmsg.20150218.1.txt
 
  The patches against FreeBSD/head as of SVN revision 278975:
 
  http://people.freebsd.org/~ken/sa_changes.20150218.1.txt
 
  And (untested) patches against FreeBSD stable/10 as of SVN revision 278974:
 
  http://people.freebsd.org/~ken/sa_changes.stable_10.20150218.1.txt
 
 Ken,
 
 thank you very much for your work!
 Last sa(4) overhaul (with 10.0 I guess) was a great success and I highly
 appreciate your work on tape support for FreeBSD!
 I compiled your 10-stable patchset for one machine with LTO2 and DDS5
 drives, but haven't done much testing since I'll replace the Adaptec
 (39160) because its maxio is limited to 64k (while 53c1020 has 128k).
 sa(4) seems to work just fine with both drives, mt(1) showing Reported
 File/Record Number :-) No EOM tests done so far?

I'm glad it is working well for you!  You can do larger I/O sizes with the
Adaptec by changing your MAXPHYS and DFLTPHYS values in your kernel config
file.  e.g.:

options MAXPHYS=(1024*1024)
options DFLTPHYS=(1024*1024)

If you set those values larger, you won't be able to do more than 132K with
the sym(4) driver on an x86 box.  (It limits the maximum I/O size to 33
segments * PAGE_SIZE.)
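
The 132K figure is just the segment limit times the page size.  A quick sketch of the arithmetic, assuming 4K pages as on x86 (the SKETCH_* names are illustrative, not taken from the sym(4) source):

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096	/* assumption: 4K pages, as on x86 */
#define SKETCH_MAX_SEGS  33	/* segment limit quoted above for sym(4) */

/* Largest single I/O, in bytes, under the segment limit above. */
static size_t
sym_max_io_bytes(void)
{
	return ((size_t)SKETCH_MAX_SEGS * SKETCH_PAGE_SIZE);
}
```

That is 135168 bytes, or 132K.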

 I'll archive zfs streams, therefore I needed some kind of forward error
 correction. People following this thread have probably found that they
 need this too, so I'd like to point to the new port misc/vdmfec
 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197950
 Perhaps someone wants to take this bug report.

That looks cool. :)  I'm not a ports committer, but hopefully one of them
will pick it up.

Ken
-- 
Kenneth Merry
k...@freebsd.org
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: sa(4) driver changes available for test

2015-02-18 Thread Kenneth D. Merry

I have updated the patches.

I have removed the XPT_DEV_ADVINFO changes from the patches to head, since
I committed those separately.

I have (hopefully) fixed the build for the stable/10 patches by MFCing
dependencies.  (One of them mav did for me, thanks!)

Rough draft commit message:

http://people.freebsd.org/~ken/sa_changes_commitmsg.20150218.1.txt

The patches against FreeBSD/head as of SVN revision 278975:

http://people.freebsd.org/~ken/sa_changes.20150218.1.txt

And (untested) patches against FreeBSD stable/10 as of SVN revision 278974:

http://people.freebsd.org/~ken/sa_changes.stable_10.20150218.1.txt

Thanks,

Ken

On Fri, Feb 13, 2015 at 17:32:32 -0700, Kenneth D. Merry wrote:
 
 I have a fairly large set of changes to the sa(4) driver and mt(1) driver
 that I'm planning to commit in the near future.
 
 A description of the changes is here and below in this message.
 
 If you have tape hardware and the inclination, I'd appreciate testing and
 feedback.
 
 
 Rough draft commit message:
 
 http://people.freebsd.org/~ken/sa_changes_commitmsg.20150213.3.txt
 
 The patches against FreeBSD/head as of SVN revision 278706:
 
 http://people.freebsd.org/~ken/sa_changes.20150213.3.txt
 
 And (untested) patches against FreeBSD stable/10 as of SVN revision 278721.
 
 http://people.freebsd.org/~ken/sa_changes.stable_10.20150213.3.txt
 
 
 The intent is to get the tape infrastructure more up to date, so we can
 support LTFS and more modern tape drives:
 
 http://www.ibm.com/systems/storage/tape/ltfs/
 
 I have ported IBM's LTFS Single Drive Edition to FreeBSD.  The port depends
 on the patches linked above.  It isn't fully cleaned up and ready for
 redistribution.  If you're interested, though, let me know and I'll tell
 you when it is ready to go out.  You need an IBM LTO-5, LTO-6, TS1140 or
 TS1150 tape drive.  HP drives aren't supported by IBM's LTFS, and older
 drives don't have the necessary features to support LTFS.
 
 The commit message below outlines most of the changes.
 
 A few comments:
 
 1. I'm planning to commit the XPT_DEV_ADVINFO changes separately.
 
 2. The XML output is similar to what GEOM and CTL do.  It would be nice to
figure out how to put a standard schema on it so that standard tools
could read it.  I don't know how feasible that is, since I haven't
time to dig into it.  If anyone has suggestions on whether that is
feasible or advisable, I'd appreciate feedback.
 
 3. I have tested with a reasonable amount of tape hardware (see below for a
list), but more testing and feedback would be good.
 
 4. Standard 'mt status' output looks like this:
 
 # mt -f /dev/nsa3 status
 Drive: sa3: IBM ULTRIUM-HH6 E4J1 Serial Number: 101500520A
 -
 Mode  Density  Blocksize  bpi  Compression
 Current:  0x5a:LTO-6   variable   384607   enabled (0xff)
 -
 Current Driver State: at rest.
 -
 Partition:   0  Calc File Number:   0 Calc Record Number: 0
 Residual:0  Reported File Number:   0 Reported Record Number: 0
 Flags: BOP
 
 5. 'mt status -v' looks like this:
 
 # mt -f /dev/nsa3 status  -v
 Drive: sa3: IBM ULTRIUM-HH6 E4J1 Serial Number: 101500520A
 -
 Mode  Density  Blocksize  bpi  Compression
 Current:  0x5a:LTO-6   variable   384607   enabled (0xff)
 -
 Current Driver State: at rest.
 -
 Partition:   0  Calc File Number:   0 Calc Record Number: 0
 Residual:0  Reported File Number:   0 Reported Record Number: 0
 Flags: BOP
 -
 Tape I/O parameters:
   Maximum I/O size allowed by driver and controller (maxio): 1081344 bytes
   Maximum I/O size reported by controller (cpi_maxio): 5197824 bytes
   Maximum block size supported by tape drive and media (max_blk): 8388608 bytes
   Minimum block size supported by tape drive and media (min_blk): 1 bytes
   Block granularity supported by tape drive and media (blk_gran): 0 bytes
   Maximum possible I/O size (max_effective_iosize): 1081344 bytes
 
 6. Existing applications should work without changes.  If not, please let
me know.  Hopefully they will move over time to the new interfaces.
 
 7. There are lots of additional features that could be added later.
Append-only support, encryption, more log pages, etc.
 
 8. I have SCSI READ ATTRIBUTE changes for camcontrol(8) that will go in
separately.  These changes allow displaying the contents of the MAM
(Medium Auxiliary Memory) chips on LTO, TS and other modern tape drives.
These are good, and a future possible direction is adding attributes 
to the status XML from the sa(4) driver.
 
 
 Significant upgrades to sa(4) and mt(1).
 
 The primary focus of these changes is to modernize FreeBSD's
 tape infrastructure

Re: sa(4) driver changes available for test

2015-02-17 Thread Kenneth D. Merry
On Sat, Feb 14, 2015 at 18:22:43 -0500, Dan Langille wrote:
 
  On Feb 13, 2015, at 7:32 PM, Kenneth D. Merry k...@freebsd.org wrote:
  
  
  I have a fairly large set of changes to the sa(4) driver and mt(1) driver
  that I'm planning to commit in the near future.
  
  A description of the changes is here and below in this message.
  
  If you have tape hardware and the inclination, I'd appreciate testing and
  feedback.
 
 I have a DLT 8000 and an SDLT 220.
 
 I don't have anything running current, but I have a spare machine which I 
 could use for testing.
 
 Do you see any value in testing with that hardware? I'd be testing it via
 Bacula.
 
 disclosure: I'm the sysutils/bacula-* maintainer and a Bacula committer.
 

Actually, yes.  Bacula is a bit tricky to configure, so your trying it out
would be helpful if you have the time.

In looking at the manuals for both the SDLT 220 and the DLT 8000, they both
claim to support long position information for the SCSI READ POSITION
command.

You can see what I'm talking about by doing:

mt eod
mt status

On my DDS-4 tape drive, this shows:

# mt -f /dev/nsa3 status
Drive: sa3: SEAGATE DAT06240-XXX 8071 Serial Number: HJ00YWY
-
Mode  Density  Blocksize  bpi  Compression
Current:  0x26:DDS-4   1024 bytes 97000enabled (DCLZ)
-
Current Driver State: at rest.
-
Partition:   0  Calc File Number:  -1 Calc Record Number: -1
Residual:0  Reported File Number:  -1 Reported Record Number: -1
Flags: None

But on an LTO-5, which will give long position information, I get:

[root@doc ~]# mt status
Drive: sa0: IBM ULTRIUM-HH5 E4J1
-
Mode  Density  Blocksize  bpi  Compression
Current:  0x58:LTO-5   variable   384607   enabled (0x1)
-
Current Driver State: at rest.
-
Partition:   0  Calc File Number:   2 Calc Record Number: -1
Residual:0  Reported File Number:   2 Reported Record Number: 32373
Flags: None

That, in combination with the changes I made to the position information
code in the driver, means that even the old MTIOCGET ioctl should return an
accurate file number at end of data.  e.g., on the LTO-5:

[root@doc ~]# mt ostatus
Mode  Density  Blocksize  bpi  Compression
Current:  0x58:LTO-5   variable   384607   0x1
-available modes-
0:0x58:LTO-5   variable   384607   0x1
1:0x58:LTO-5   variable   384607   0x1
2:0x58:LTO-5   variable   384607   0x1
3:0x58:LTO-5   variable   384607   0x1
-
Current Driver State: at rest.
-
File Number: 2  Record Number: -1   Residual Count -1
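
For application authors, here is a rough sketch of reading that file number through the legacy MTIOCGET ioctl.  The helper name and its error handling are illustrative assumptions.  Only struct mtget, its mt_fileno field, and MTIOCGET itself come from <sys/mtio.h>:

```c
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/mtio.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: fetch the file number reported by the legacy
 * MTIOCGET ioctl.  Returns 0 on success, -1 if the device cannot be
 * opened or queried.
 */
static int
tape_file_number(const char *dev, long *fileno_out)
{
	struct mtget mt;
	int fd, error;

	fd = open(dev, O_RDONLY);
	if (fd < 0)
		return (-1);
	error = ioctl(fd, MTIOCGET, &mt);
	close(fd);
	if (error < 0)
		return (-1);
	*fileno_out = (long)mt.mt_fileno;
	return (0);
}
```

Called as tape_file_number("/dev/nsa0", &n) after 'mt eod', this should report the same file number that 'mt ostatus' prints, assuming the drive supports long position information.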

So the thing to try, in addition to just making sure that Bacula continues
to work properly, is to try setting this for the tape drive in
bacula-sd.conf:

  Hardware End of Medium = yes

It looks like the Bacula tape program (btape) has a test mode, and it would
be good to run through the tests on one of the tape drives and see whether
they work, and whether the results are different before and after the
changes.  I'm not sure how to enable the test mode.

 I'll let the other Bacula devs know about this.  They deal with the hardware. 
  I work on PostgreSQL.
 

Thanks!  If there are additional features they would like out of the tape
driver, I'm happy to talk about it.  (Or help if they'd like to use the new
status reporting ioctl, MTIOCEXTGET or any of the other new ioctls.)

Ken
-- 
Kenneth Merry
k...@freebsd.org


sa(4) driver changes available for test

2015-02-13 Thread Kenneth D. Merry

I have a fairly large set of changes to the sa(4) driver and mt(1) driver
that I'm planning to commit in the near future.

A description of the changes is here and below in this message.

If you have tape hardware and the inclination, I'd appreciate testing and
feedback.


Rough draft commit message:

http://people.freebsd.org/~ken/sa_changes_commitmsg.20150213.3.txt

The patches against FreeBSD/head as of SVN revision 278706:

http://people.freebsd.org/~ken/sa_changes.20150213.3.txt

And (untested) patches against FreeBSD stable/10 as of SVN revision 278721.

http://people.freebsd.org/~ken/sa_changes.stable_10.20150213.3.txt


The intent is to get the tape infrastructure more up to date, so we can
support LTFS and more modern tape drives:

http://www.ibm.com/systems/storage/tape/ltfs/

I have ported IBM's LTFS Single Drive Edition to FreeBSD.  The port depends
on the patches linked above.  It isn't fully cleaned up and ready for
redistribution.  If you're interested, though, let me know and I'll tell
you when it is ready to go out.  You need an IBM LTO-5, LTO-6, TS1140 or
TS1150 tape drive.  HP drives aren't supported by IBM's LTFS, and older
drives don't have the necessary features to support LTFS.

The commit message below outlines most of the changes.

A few comments:

1. I'm planning to commit the XPT_DEV_ADVINFO changes separately.

2. The XML output is similar to what GEOM and CTL do.  It would be nice to
   figure out how to put a standard schema on it so that standard tools
   could read it.  I don't know how feasible that is, since I haven't had
   time to dig into it.  If anyone has suggestions on whether that is
   feasible or advisable, I'd appreciate feedback.

3. I have tested with a reasonable amount of tape hardware (see below for a
   list), but more testing and feedback would be good.

4. Standard 'mt status' output looks like this:

# mt -f /dev/nsa3 status
Drive: sa3: IBM ULTRIUM-HH6 E4J1 Serial Number: 101500520A
-
Mode  Density  Blocksize  bpi  Compression
Current:  0x5a:LTO-6   variable   384607   enabled (0xff)
-
Current Driver State: at rest.
-
Partition:   0  Calc File Number:   0 Calc Record Number: 0
Residual:0  Reported File Number:   0 Reported Record Number: 0
Flags: BOP

5. 'mt status -v' looks like this:

# mt -f /dev/nsa3 status  -v
Drive: sa3: IBM ULTRIUM-HH6 E4J1 Serial Number: 101500520A
-
Mode  Density  Blocksize  bpi  Compression
Current:  0x5a:LTO-6   variable   384607   enabled (0xff)
-
Current Driver State: at rest.
-
Partition:   0  Calc File Number:   0 Calc Record Number: 0
Residual:0  Reported File Number:   0 Reported Record Number: 0
Flags: BOP
-
Tape I/O parameters:
  Maximum I/O size allowed by driver and controller (maxio): 1081344 bytes
  Maximum I/O size reported by controller (cpi_maxio): 5197824 bytes
  Maximum block size supported by tape drive and media (max_blk): 8388608 bytes
  Minimum block size supported by tape drive and media (min_blk): 1 bytes
  Block granularity supported by tape drive and media (blk_gran): 0 bytes
  Maximum possible I/O size (max_effective_iosize): 1081344 bytes

6. Existing applications should work without changes.  If not, please let
   me know.  Hopefully they will move over time to the new interfaces.

7. There are lots of additional features that could be added later.
   Append-only support, encryption, more log pages, etc.

8. I have SCSI READ ATTRIBUTE changes for camcontrol(8) that will go in
   separately.  These changes allow displaying the contents of the MAM
   (Medium Auxiliary Memory) chips on LTO, TS and other modern tape drives.
   These are good, and a future possible direction is adding attributes 
   to the status XML from the sa(4) driver.
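
One reading of the "Tape I/O parameters" in item 5, written out as code.  This is an assumed relationship that happens to match the numbers shown, not the driver's actual implementation: maxio is the controller's limit clamped to the kernel's MAXPHYS, and the effective I/O size is the smaller of maxio and the drive's maximum block size.  The MAXPHYS value of 1081344 is inferred from the output, and block granularity is ignored here:

```c
#include <stdint.h>

static uint64_t
min_u64(uint64_t a, uint64_t b)
{
	return (a < b ? a : b);
}

/*
 * Hypothetical helper: derive the largest usable I/O size from the
 * controller limit (cpi_maxio), the kernel limit (maxphys) and the
 * drive/media limit (max_blk).
 */
static uint64_t
effective_iosize(uint64_t cpi_maxio, uint64_t maxphys, uint64_t max_blk)
{
	uint64_t maxio = min_u64(cpi_maxio, maxphys);

	return (min_u64(maxio, max_blk));
}
```

Plugging in the values from the output (5197824, 1081344, 8388608) gives 1081344 bytes, matching max_effective_iosize.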


Significant upgrades to sa(4) and mt(1).

The primary focus of these changes is to modernize FreeBSD's
tape infrastructure so that we can take advantage of some of the
features of modern tape drives and allow support for LTFS.

Significant changes and new features include:

 o sa(4) driver status and parameter information is now exported via an
   XML structure.  This will allow for changes and improvements later
   on that will not break userland applications.  The old MTIOCGET
   status ioctl remains, so applications using the existing interface
   will not break.

 o 'mt status' now reports drive-reported tape position information
   as well as the previously available calculated tape position
   information.  These numbers will be different at times, because
   the drive-reported block numbers are relative to BOP (Beginning
   of Partition), but the block numbers calculated previously via
   sa(4) (and still 

Re: LSI SAS2008 mps(4) 4TB disk only shows 2TB on CURRENT r255089

2013-09-02 Thread Kenneth D. Merry
On Sat, Aug 31, 2013 at 13:07:53 -0400, Sam Fourman Jr. wrote:
 Hello list
 
 I have two issues that may in fact both be related to the LSI SAS2008 card
 or the mps(4) driver.
 
 This server is running FreeBSD 10.0-CURRENT #0 r255089.
 
 
 1) All of the SSD disks are showing up as SATA2 at 300MB/s,
 but the card is in fact a 6Gb/s SATA3 card.
 
 2) A Western Digital 4TB disk only shows 2TB (connected to the LSI
 controller).
 
 full dmesg here
 https://gist.github.com/sfourman/6399419

Both problems occur because the drives in question are plugged into an mpt(4)
controller, not the mps(4) controller in the system.

The first is, of course, because mpt(4) controllers only support up to
3Gb/s.  The second is because mpt(4) controllers don't support SATA drives
over 2TB, or more precisely, don't let you access the capacity beyond 2TB.

Both problems should go away if you plug them into the mps(4) controller.

From the dmesg:

mps0: LSI SAS2008 port 0xe000-0xe0ff mem 0xfeb3c000-0xfeb3,0xfeb4-0xfeb7 irq 24 at device 0.0 on pci9
mps0: Firmware: 15.00.00.00, Driver: 16.00.00.00-fbsd
mps0: IOCCapabilities: 1285cScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc
pcib2: ACPI PCI-PCI bridge irq 52 at device 3.0 on pci0
pci8: ACPI PCI bus on pcib2
mpt0: LSILogic SAS/SATA Adapter port 0xd000-0xd0ff mem 0xfe7ec000-0xfe7e,0xfe7f-0xfe7f irq 28 at device 0.0 on pci8
mpt0: MPI Version=1.5.20.0
[ ...]
da0 at mpt0 bus 0 scbus1 target 1 lun 0
da0: ATA INTEL SSDSC2BB12 0350 Fixed Direct Access SCSI-5 device 
da0: 300.000MB/s transfers
da0: Command Queueing enabled
da0: 114473MB (234441648 512 byte sectors: 255H 63S/T 14593C)
cd0 at ata0 bus 0 scbus6 target 1 lun 0
cd0: MATSHITA DVD-ROM UJ8B0AC 1.00 Removable CD-ROM SCSI-0 device 
cd0: 150.000MB/s transfers (SATA, UDMA5, ATAPI 12bytes, PIO 8192bytes)
cd0: Attempt to query device size failed: NOT READY, Medium not present - tray closed
da1 at mpt0 bus 0 scbus1 target 2 lun 0
da1: ATA INTEL SSDSC2BB12 0350 Fixed Direct Access SCSI-5 device 
da1: 300.000MB/s transfers
da1: Command Queueing enabled
da1: 114473MB (234441648 512 byte sectors: 255H 63S/T 14593C)
da2 at mpt0 bus 0 scbus1 target 3 lun 0
da2: ATA INTEL SSDSC2BA80 0250 Fixed Direct Access SCSI-5 device 
da2: 300.000MB/s transfers
da2: Command Queueing enabled
da2: 763097MB (1562824368 512 byte sectors: 255H 63S/T 97281C)
da3 at mpt0 bus 0 scbus1 target 4 lun 0
da3: ATA INTEL SSDSC2BA80 0250 Fixed Direct Access SCSI-5 device 
da3: 300.000MB/s transfers
da3: Command Queueing enabled
da3: 763097MB (1562824368 512 byte sectors: 255H 63S/T 97281C)
da4 at mpt0 bus 0 scbus1 target 5 lun 0
da4: ATA INTEL SSDSC2BA80 0250 Fixed Direct Access SCSI-5 device 
da4: 300.000MB/s transfers
da4: Command Queueing enabled
da4: 763097MB (1562824368 512 byte sectors: 255H 63S/T 97281C)
da5 at mpt0 bus 0 scbus1 target 6 lun 0
da5: ATA INTEL SSDSC2BA80 0250 Fixed Direct Access SCSI-5 device 
da5: 300.000MB/s transfers
da5: Command Queueing enabled
da5: 763097MB (1562824368 512 byte sectors: 255H 63S/T 97281C)
da6 at mpt0 bus 0 scbus1 target 8 lun 0
da6: ATA WDC WD4000FYYZ-0 1K01 Fixed Direct Access SCSI-5 device 
da6: 300.000MB/s transfers
da6: Command Queueing enabled
da6: 2097151MB (4294967294 512 byte sectors: 255H 63S/T 267349C)
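
The da6 line fits a 32-bit sector count limit: 4294967294 is 2^32 - 2, and at 512 bytes per sector that comes to just under 2TB.  A small sketch of the conversion dmesg appears to be doing (the helper name is illustrative):

```c
#include <stdint.h>

/* Convert a sector count to whole megabytes (2^20 bytes), rounding down. */
static uint64_t
sectors_to_mb(uint64_t sectors, uint32_t sector_size)
{
	return (sectors * sector_size / (1024 * 1024));
}
```

sectors_to_mb(4294967294, 512) gives 2097151, the size reported for the 4TB WD drive, while the SSDs (234441648 sectors) come out at 114473MB as shown.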


Ken
-- 
Kenneth Merry
k...@freebsd.org


Re: Removing an SDHC card causes a kernel panic on -current

2012-06-27 Thread Kenneth D. Merry
On Wed, Jun 27, 2012 at 10:22:59 -0400, Michael Butler wrote:
 On 06/26/12 22:29, Kenneth D. Merry wrote:
  On Tue, Jun 26, 2012 at 19:41:07 -0400, Benjamin Kaduk wrote:
  On Tue, 26 Jun 2012, Michael Butler wrote:
 
  As follows, in g_disk_providergone, a NULL pointer reference?:
 
  g_disk_providergone() is new in r237518 (by ken); ken cc'd.
  
  Can you try the attached patch to sys/geom/geom_disk.c?
 
 This fixes the panic :-)

Great!  I just committed it.

  Also, do you have full dmesg information for when the panic happened?
  
  It looks like disk_destroy() has already been called in this case, and I
  suppose that's likely to happen for any of the users of the GEOM disk class
  that haven't been updated with the reference count changes I made in da(4).
  (i.e. all of the rest of them.)
  
  Let me know whether this works for you.
 
 All I have is the following leading up to my removal of the card (and
 the restart afterwards):

Thanks!

Ken
-- 
Kenneth Merry
k...@freebsd.org


Re: Removing an SDHC card causes a kernel panic on -current

2012-06-26 Thread Kenneth D. Merry
On Tue, Jun 26, 2012 at 19:41:07 -0400, Benjamin Kaduk wrote:
 On Tue, 26 Jun 2012, Michael Butler wrote:
 
 As follows, in g_disk_providergone, a NULL pointer reference?:
 
 g_disk_providergone() is new in r237518 (by ken); ken cc'd.

Can you try the attached patch to sys/geom/geom_disk.c?

Also, do you have full dmesg information for when the panic happened?

It looks like disk_destroy() has already been called in this case, and I
suppose that's likely to happen for any of the users of the GEOM disk class
that haven't been updated with the reference count changes I made in da(4).
(i.e. all of the rest of them.)

Let me know whether this works for you.

Thanks,

Ken
-- 
Kenneth Merry
k...@freebsd.org
==== //depot/users/kenm/FreeBSD-test2/sys/geom/geom_disk.c#7 - /usr/home/kenm/perforce4/kenm/FreeBSD-test2/sys/geom/geom_disk.c ====
*** /tmp/tmp.75357.20   Tue Jun 26 20:25:44 2012
--- /usr/home/kenm/perforce4/kenm/FreeBSD-test2/sys/geom/geom_disk.c    Tue Jun 26 20:25:29 2012
***************
*** 502,507 ****
--- 502,515 ----
  	struct g_disk_softc *sc;
  
  	sc = (struct g_disk_softc *)pp->geom->softc;
+ 
+ 	/*
+ 	 * If the softc is already NULL, then we've probably been through
+ 	 * g_disk_destroy already; there is nothing for us to do anyway.
+ 	 */
+ 	if (sc == NULL)
+ 		return;
+ 
  	dp = sc->dp;
  
  	if (dp->d_gone != NULL)

Re: minor GEOM disk API change coming

2012-06-23 Thread Kenneth D. Merry
On Fri, Jun 22, 2012 at 19:50:01 +0300, Alexander Motin wrote:
 Hi.
 
 I understand the problem you are trying to fix, and I think your patch should
 do it. What I don't much like is the addition of a new GEOM method. Currently
 GEOM doesn't need one, because all internal open/close operations and provider
 destructions there are protected by the topology SX lock. Unfortunately, that
 lock doesn't cover g_wither_provider(), which disk_gone() calls while
 holding the CAM SIM lock. If not for that SIM lock, it would be enough to just
 grab and drop the GEOM topology lock to ensure that no new open() calls
 follow. An indirect way to do it could be to post a GEOM event that would
 drop the reference as soon as it is handled and can obtain the
 topology lock. Unfortunately, that uses malloc() for event storage and can
 also be unreliable if called with the SIM mutex held. So it seems many
 things would be much easier if it were possible to drop the SIM lock inside
 the periph invalidate method, but for now that is unsafe.
 
 That is not an objection, just some thoughts on the matter.

Yeah, there are things in CAM (and GEOM) that need to be cleaned up.  I
wouldn't have added a GEOM method if there were a reasonable way around it,
but as you pointed out, there isn't right now.

I committed the patch, and plan to merge it to stable/9.

Ken
-- 
Kenneth Merry
k...@freebsd.org


Re: minor GEOM disk API change coming

2012-06-21 Thread Kenneth D. Merry
On Thu, Jun 21, 2012 at 19:53:03 +0400, Andrey V. Elsukov wrote:
 On 21.06.2012 08:29, Kenneth D. Merry wrote:
  Fix a bug which causes a panic in daopen(). The panic is caused by
  a da(4) instance going away while GEOM is still probing it.
  
  In this case, the GEOM disk class instance has been created by
  disk_create(), and the taste of the disk is queued in the GEOM
  event queue.
  
  While that event is queued, the da(4) instance goes away.  When the
  open call comes into the da(4) driver, it dereferences the freed
  (but non-NULL) peripheral pointer provided by GEOM, which results
  in a panic.
 
 I think this situation is very specific to the GEOM_DISK class, and
 this callback will be less useful for other classes.
 Can't g_cancel_event() help you prevent tasting?

Calling g_cancel_event(), for instance from disk_gone(), would not
completely close the race condition.  It can't cancel an event that is
already in progress, and it is possible for the peripheral to go away while
the event is marked in progress but before the taste gets far enough into
daopen() to acquire a reference to the peripheral.

Ken
-- 
Kenneth Merry
k...@freebsd.org


Re: minor GEOM disk API change coming

2012-06-21 Thread Kenneth D. Merry
On Thu, Jun 21, 2012 at 23:58:10 +0400, Andrey V. Elsukov wrote:
 On 21.06.2012 20:48, Kenneth D. Merry wrote:
In this case, the GEOM disk class instance has been created by
disk_create(), and the taste of the disk is queued in the GEOM
event queue.

While that event is queued, the da(4) instance goes away.  When the
open call comes into the da(4) driver, it dereferences the freed
(but non-NULL) peripheral pointer provided by GEOM, which results
in a panic.
 
  I think this situation is very specific to the GEOM_DISK class, and
  this callback will be less useful for other classes.
  Can't g_cancel_event() help you prevent tasting?
  
  Calling g_cancel_event(), for instance from disk_gone(), would not
  completely close the race condition.  It can't cancel an event that is
  already in progress, and it is possible for the peripheral to go away while
  the event is marked in progress but before the taste gets far enough into
  daopen() to acquire a reference to the peripheral.
 
 If I understand your patch correctly, you acquire a reference to the
 periph and release it when g_destroy_provider() has finished. What if you
 queue some custom event from disk_gone() that calls
 cddiskgonecb()? Would that close the race? This event would be executed
 after the taste completes.

That still would not close the race.  It would still be possible for
another context to come along and open the device at any point.

Ken
-- 
Kenneth Merry
k...@freebsd.org


minor GEOM disk API change coming

2012-06-20 Thread Kenneth D. Merry
Hi folks,

I have attached some patches that fix an object lifetime issue between CAM
and GEOM.

Fixing the bug required adding a callback to the GEOM disk code, and adding
a callback that a GEOM class can register to get notified when a provider
is destroyed.

The probable commit message is below.

If I don't hear any objections, I will commit it on Friday, June 22nd.


Fix a bug which causes a panic in daopen(). The panic is caused by
a da(4) instance going away while GEOM is still probing it.

In this case, the GEOM disk class instance has been created by
disk_create(), and the taste of the disk is queued in the GEOM
event queue.

While that event is queued, the da(4) instance goes away.  When the
open call comes into the da(4) driver, it dereferences the freed
(but non-NULL) peripheral pointer provided by GEOM, which results
in a panic.

The solution is to add a callback to the GEOM disk code that is
called when all of its resources are cleaned up.  This is
implemented inside GEOM by adding an optional callback that is
called when all consumers have detached from a provider, and the
provider is about to be deleted.

scsi_cd.c,
scsi_da.c:      In the register routine for the cd(4) and da(4)
                routines, acquire a reference to the CAM peripheral
                instance just before we call disk_create().

                Use the new GEOM disk d_gone() callback to register
                a callback (dadiskgonecb()/cddiskgonecb()) that
                decrements the peripheral reference count once GEOM
                has finished cleaning up its resources.

                In the cd(4) driver, clean up open and close
                behavior slightly.  GEOM makes sure we only get one
                open() and one close call, so there is no need to
                set an open flag and decrement the reference count
                if we are not the first open.

                In the cd(4) driver, use cam_periph_release_locked()
                in a couple of error scenarios to avoid extra mutex
                calls.

geom.h:         Add a new, optional, providergone callback that
                is called when a provider is about to be deleted.

geom_disk.h:    Add a new d_gone() callback to the GEOM disk
                interface.

                Bump the DISK_VERSION to version 2.  This probably
                should have been done after a couple of previous
                changes, especially the addition of the d_getattr()
                callback.

geom_disk.c:    Add a providergone callback for the disk class,
                g_disk_providergone(), that calls the user's
                d_gone() callback if it exists.

                Bump the DISK_VERSION to 2.

geom_subr.c:    In g_destroy_provider(), call the providergone
                callback if it has been provided.

                In g_new_geomf(), propagate the class's
                providergone callback to the new geom instance.

disk.9:         Update the disk(9) man page to include information
                on the new d_gone() callback, as well as the
                previously added d_getattr() callback, d_descr
                field, and HBA PCI ID fields.
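
For anyone who wants the lifetime rule in miniature, here is a user-space sketch of the reference-counting pattern the commit message describes.  Every name below (sketch_periph and the sketch_* functions) is a hypothetical stand-in and none of it is the real CAM or GEOM code.  The point is only that the peripheral cannot be freed until both the driver and GEOM have dropped their references:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a CAM peripheral; not struct cam_periph. */
struct sketch_periph {
	int  refcount;
	bool freed;
};

static void
sketch_acquire(struct sketch_periph *p)
{
	p->refcount++;
}

static void
sketch_release(struct sketch_periph *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0)
		p->freed = true;	/* a real driver would free its softc here */
}

/* Register routine: take an extra reference just before disk_create(). */
static void
sketch_register(struct sketch_periph *p)
{
	sketch_acquire(p);	/* the driver's own reference */
	sketch_acquire(p);	/* held on GEOM's behalf until d_gone() */
}

/* Device departure: the driver drops its own reference. */
static void
sketch_invalidate(struct sketch_periph *p)
{
	sketch_release(p);
}

/* d_gone() callback: GEOM has finished cleanup, so drop its reference. */
static void
sketch_diskgonecb(struct sketch_periph *p)
{
	sketch_release(p);
}
```

If the device goes away while a taste event is still queued, sketch_invalidate() alone leaves the count at one, so a late open still sees valid memory.  Only the later d_gone() callback lets the count reach zero.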


Ken
-- 
Kenneth Merry
k...@freebsd.org
==== //depot/users/kenm/FreeBSD-test/share/man/man9/disk.9#1 - /usr/home/kenm/perforce4/kenm/FreeBSD-test/share/man/man9/disk.9 ====
*** /tmp/tmp.81866.21   Wed Jun 20 22:19:20 2012
--- /usr/home/kenm/perforce4/kenm/FreeBSD-test/share/man/man9/disk.9    Wed Jun 20 21:30:45 2012
***************
*** 145,150 ****
--- 145,160 ----
  .Xr dumpon 8 ,
  this function is invoked from a very restricted system state after a
  kernel panic to record a copy of the system RAM to the disk.
+ .It Vt disk_getattr_t * Va d_getattr
+ Optional: if this method is provided, it gives the disk driver the
+ opportunity to override the default GEOM response to BIO_GETATTR requests.
+ This function should return -1 if the attribute is not handled, 0 if the
+ attribute is handled, or an errno to be passed to g_io_deliver().
+ .It Vt disk_gone_t * Va d_gone
+ Optional: if this method is provided, it will be called after disk_gone()
+ is called, once GEOM has finished its cleanup process.
+ Once this callback is called, it is safe for the disk driver to free all of
+ its resources, as it will not be receiving further calls from GEOM.
  .El
  .Ss Mandatory Media Properties
  The following fields identify the size 

Re: LSI supported mps(4) driver available

2012-03-27 Thread Kenneth D. Merry
On Tue, Mar 27, 2012 at 23:50:31 +1030, Matt Thyer wrote:
 On 26 March 2012 23:55, Gary Palmer gpal...@freebsd.org wrote:
 
  On Mon, Mar 26, 2012 at 08:05:59PM +1030, Matt Thyer wrote:
   On Mar 26, 2012 3:43 AM, Garrett Cooper yaneg...@gmail.com wrote:
   
On Sun, Mar 25, 2012 at 5:16 AM, Matt Thyer matt.th...@gmail.com
  wrote:
 Has this driver been MFC to 8-STABLE yet ?

 I'm asking because I updated my NAS on the 4th of March from 8-STABLE
 r225723 to r232477 and am now seeing 157,000 interrupts per second on
   irq
 16 where my SuperMicro AOC-USAS2-L8i resides (this card uses the LSI
 SAS2008 chip).
 
 
 [snip]
 
 
   After encountering this problem I updated my firmware from phase 7 to
  phase
   11 but this did not fix things.
  
   My question is: Is the LSI driver even in 8-STABLE yet?
  
   If not I'll upgrade to 9-STABLE to get the new driver.
  
   If it is, then I want to downgrade to just before it came in to see if
  this
   high interrupt rate problem is fixed.
 
  I'm no expert in svn, however:
 
  http://svnweb.freebsd.org/base?view=revision&revision=230922
 
  would appear to suggest that the new driver is in 8-Stable
 
  Gary
 
 
 It's painful to take this system back to r230921 due to intolerance for
 downtime from its users so I'd like to investigate the cause of the
 problem and try patches/sysctls/whatever first.
 
 The drives I'm using are 7 x WDC WD20EARS-00M (3 are AB50, 4 are AB51) and
 1 x WD20EARX-00P AB51.
 The WD20EARX-00P AB51 is a SATA 3 (6 Gbps) drive but the others are all
 SATA 2 (3 Gbps).
 
 I know the driver doesn't like mixed speeds in IR mode but I'm flashed with
 IT firmware as ZFS is doing my RAID (raidz2).
 
 I was having problems with the WD20EARX-00P AB51 drive being faulted by ZFS
 until I updated the firmware to 11 and now ZFS is happy (I've also done a
 full extended drive SMART test and the drive is fine).
 
 So what do people suggest (before reversion to r230921) ?

If you're going to prove that it's the new LSI driver, you will probably
have to go back to the old driver.

You don't have to back out your entire tree, you can just back out the
driver itself if you have an SVN tree.  You can go into sys/dev/mps and do:

svn update -r 230714

And then edit sys/conf/files and comment out these three lines:

dev/mps/mps_config.c	optional mps
dev/mps/mps_mapping.c	optional mps
dev/mps/mps_sas_lsi.c	optional mps

Then you should be able to rebuild your kernel with the old driver and see
if the problem occurs again.

Ken
-- 
Kenneth Merry
k...@freebsd.org
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: LSI supported mps(4) driver available

2012-02-07 Thread Kenneth D. Merry
On Tue, Feb 07, 2012 at 21:00:28 +0530, Desai, Kashyap wrote:
 Can you try to reproduce the issue with the changes mentioned below?
 
 In mps.c 
 
 mps_get_tunables(struct mps_softc *sc)
 {
 char tmpstr[80];
  
 /* XXX default to some debugging for now */
 sc->mps_debug = MPS_FAULT;
 
 Instead of the above line, use:
 sc->mps_debug = 0xd;

You can also put the following in /boot/loader.conf:

hw.mps.debug_level=0xd
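
For reference, the value is a bit mask: 0xd is binary 1101, i.e. bits 0, 2
and 3 are set.  Which debug categories those bits select is defined in the
driver's headers and is not asserted here.  A small sketch of the
decomposition (debug_bits_set() is a hypothetical helper, not part of the
driver):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Store the positions of the set bits of mask in bits[] (lowest first)
 * and return how many were found.  bits[] must hold at least 32 entries.
 */
static size_t
debug_bits_set(uint32_t mask, unsigned bits[32])
{
	size_t n = 0;

	assert(bits != NULL);
	for (unsigned i = 0; i < 32; i++) {
		if (mask & (UINT32_C(1) << i))
			bits[n++] = i;
	}
	return (n);
}
```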

Ken
-- 
Kenneth Merry
k...@freebsd.org


Re: LSI supported mps(4) driver available

2012-01-25 Thread Kenneth D. Merry
On Wed, Jan 25, 2012 at 20:47:37 -0800, Dennis Glatting wrote:
 On Fri, 2012-01-20 at 13:44 -0700, Kenneth D. Merry wrote:
  The LSI-supported version of the mps(4) driver that supports their 6Gb SAS
  HBAs as well as WarpDrive controllers, is available here:
  
  http://people.freebsd.org/~ken/lsi/mps_lsi.20120120.1.txt
  
  I plan to check it in to head next week, and then MFC it into stable/9 a
  week after that most likely.
  
  Please test it out and let me know if you run into any problems.
  
  In addition to supporting WarpDrive, the driver also supports Integrated
  RAID.
  
  Thanks to LSI for doing the work on this driver!
  
 
 Does this include the SAS2008 series chips? I have two systems, one a
 Tyan FT48-B8812 with a S8812 MB and Interlagos chips, where I am
 interested in using a driver under 9.0 amd64.

Yes.  The driver in 9.0 supports the 2008 as well.

Ken
-- 
Kenneth Merry
k...@freebsd.org


Re: Firewire disk/tape access stopped working after recent CAM commit

2012-01-24 Thread Kenneth D. Merry
On Tue, Jan 24, 2012 at 00:03:56 -0600, Richard Todd wrote:
 On Mon, Jan 23, 2012 at 11:16:05AM -0700, Kenneth D. Merry wrote:
  If you can, please try the attached patch and see if it has any impact on
  the problem.  There is a bug in that commit in that we shouldn't be
  invalidating all LUNs on a target when we get a status of
  CAM_DEV_NOT_THERE.
 
 Just applied the patch, built new kernel, and rebooted, and all the FW
 drives are showing up now.  Thanks!

Great!

  It may be that we need to do a more thorough audit of how various SIM
  drivers are using the CAM_DEV_NOT_THERE status.
 
 So I take it the layers for the different hardware (SCSI, FW, USB,
 ATA/AHCI) are handling this status differently, so that's why this bug only
 showed up on the Firewire buses but not on ATA/AHCI, USB, or (on my other 
 machine) SCSI buses? 

Yes.  Some drivers report a selection timeout, some report that the device
isn't there, and they may be using those status values in different
situations.

Ken
-- 
Kenneth Merry
k...@freebsd.org


Re: Firewire disk/tape access stopped working after recent CAM commit

2012-01-23 Thread Kenneth D. Merry
On Sun, Jan 22, 2012 at 20:52:38 -0600, Richard Todd wrote:
 Hi.  I tried upgrading my amd64 10-CURRENT box to the most recent -CURRENT code
 and found that the new kernel couldn't find my two disks and tape drive that
 are on a Firewire bus.  All the USB and AHCI-attached hardware still showed
 up okay, it's just the Firewire stuff that failed to show up properly on boot.
 Spent today doing binary search to find the responsible commit and it looks
 to be this one: 
 
   r23 | ken | 2012-01-11 18:41:48 -0600 (Wed, 11 Jan 2012) | 72 lines
 
   Fix a race condition in CAM peripheral free handling, locking
   in the CAM XPT bus traversal code, and a number of other periph level
   issues.
 
 Not sure what in this commit triggers the problem, or why it just hits 
 Firewire and not the rest of the system.   I've built kernels both right
 before and right after the r23 commit, with CAM debugging turned on real
 high on the firewire bus in question, bus 0 (hardwired to that number in
 device.hints, if that matters)
 
  options CAMDEBUG
  options CAM_DEBUG_BUS=0
  options CAM_DEBUG_TARGET=-1
  options CAM_DEBUG_LUN=-1
  options CAM_DEBUG_FLAGS=CAM_DEBUG_INFO|CAM_DEBUG_TRACE|CAM_DEBUG_CDB
 
 and got dmesgs of both the bad (r23) and good (pre-r23) kernels,
 which I've put online at http://ln.servalan.com/rmtodd/bug1/dmesg.bad and
 http://ln.servalan.com/rmtodd/bug1/dmesg.good, respectively.  They're a bit
 lengthy, what with all that debug info.  Grepping out the info for one of
 the targets (disk 0, sbp0:0:0:0) and just looking at the lines for that one,
 we see that the good kernel does a lot more with that target, starting
 with the (noperiph:sbp0:0:0:0): xpt_compile_path bit, that the bad
 kernel doesn't do, as seen in the diff below. 
 
 Not sure what's going on here, but if anyone has suggestions on more things
 I can test/debug code I can add to track this down further, let me know.

Thanks for testing this out, and for sending all of the debugging output!

If you can, please try the attached patch and see if it has any impact on
the problem.  There is a bug in that commit in that we shouldn't be
invalidating all LUNs on a target when we get a status of
CAM_DEV_NOT_THERE.

It may be that we need to do a more thorough audit of how various SIM
drivers are using the CAM_DEV_NOT_THERE status.
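
The rule the patch applies can be modeled in userland as follows.  This is
a sketch only: the enum, the wildcard constant and the function name are
hypothetical stand-ins, and the real status codes and lun_id_t live in the
CAM headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real values are defined by CAM. */
enum mock_status { MOCK_SEL_TIMEOUT, MOCK_DEV_NOT_THERE };
#define MOCK_LUN_WILDCARD	UINT32_MAX

/*
 * Decide which LUN(s) to invalidate.  A selection timeout means the
 * whole target is gone, so every LUN is invalidated (wildcard).
 * "Device not there" only concerns the LUN named in the original path.
 */
static uint32_t
mock_lun_to_invalidate(enum mock_status status, uint32_t path_lun)
{
	assert(path_lun != MOCK_LUN_WILDCARD);
	if (status == MOCK_DEV_NOT_THERE)
		return (path_lun);
	return (MOCK_LUN_WILDCARD);
}
```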

Thanks,

Ken
-- 
Kenneth Merry
k...@freebsd.org
==== //depot/users/kenm/FreeBSD-test2/sys/cam/cam_periph.c#7 - /usr/home/kenm/perforce4/kenm/FreeBSD-test2/sys/cam/cam_periph.c ====
*** /tmp/tmp.87992.13	Mon Jan 23 11:11:36 2012
--- /usr/home/kenm/perforce4/kenm/FreeBSD-test2/sys/cam/cam_periph.c	Mon Jan 23 10:53:13 2012
***************
*** 1864,1876 ****
  	case CAM_DEV_NOT_THERE:
  	{
  		struct cam_path *newpath;
  
  		error = ENXIO;
  		/* Should we do more if we can't create the path?? */
  		if (xpt_create_path(&newpath, periph,
  				    xpt_path_path_id(ccb->ccb_h.path),
  				    xpt_path_target_id(ccb->ccb_h.path),
! 				    CAM_LUN_WILDCARD) != CAM_REQ_CMP) 
  			break;
  
  		/*
--- 1864,1889 ----
  	case CAM_DEV_NOT_THERE:
  	{
  		struct cam_path *newpath;
+ 		lun_id_t lun_id;
  
  		error = ENXIO;
+ 
+ 		/*
+ 		 * For a selection timeout, we consider all of the LUNs on
+ 		 * the target to be gone.  If the status is CAM_DEV_NOT_THERE,
+ 		 * then we only get rid of the device(s) specified by the
+ 		 * path in the original CCB.
+ 		 */
+ 		if (status == CAM_DEV_NOT_THERE)
+ 			lun_id = xpt_path_lun_id(ccb->ccb_h.path);
+ 		else
+ 			lun_id = CAM_LUN_WILDCARD;
+ 
  		/* Should we do more if we can't create the path?? */
  		if (xpt_create_path(&newpath, periph,
  				    xpt_path_path_id(ccb->ccb_h.path),
  				    xpt_path_target_id(ccb->ccb_h.path),
! 				    lun_id) != CAM_REQ_CMP) 
  			break;
  
  		/*

LSI supported mps(4) driver available

2012-01-20 Thread Kenneth D. Merry

The LSI-supported version of the mps(4) driver that supports their 6Gb SAS
HBAs as well as WarpDrive controllers, is available here:

http://people.freebsd.org/~ken/lsi/mps_lsi.20120120.1.txt

I plan to check it in to head next week, and then MFC it into stable/9 a
week after that most likely.

Please test it out and let me know if you run into any problems.

In addition to supporting WarpDrive, the driver also supports Integrated
RAID.

Thanks to LSI for doing the work on this driver!

I have added a number of other infrastructure changes that are necessary
for the driver, and here is a brief summary:

 - A new Advanced Information buffer is now added to the EDT for drives
   that support READ CAPACITY (16).  The da(4) driver updates this buffer
   when it grabs new read capacity data from a drive.
 - The mps(4) driver will look for Advanced Information state change async
   events, and updates its table of drives with protection information
   turned on accordingly.
 - The size of struct scsi_read_capacity_data_long has been bumped up to
   the amount specified in the latest SBC-3 draft.  The hope is to avoid
   some future structure size bumps with that change.  The API for
   scsi_read_capacity_16() has been changed to add a length argument.
   Hopefully this will future-proof it somewhat.
 - __FreeBSD_version bumped for the addition of the Advanced Information
   buffer with the read capacity information.  The mps(4) driver has a
   kludgy way of getting the information on versions of FreeBSD without
   this change.
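
To illustrate why a length argument matters for READ CAPACITY (16): the
CDB carries an allocation length that tells the drive how much parameter
data the initiator can accept, so a larger data structure needs a larger
value there.  A minimal sketch of building that CDB; the function name
build_read_capacity_16() is hypothetical and is not the CAM
scsi_read_capacity_16() API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RC16_CDB_LEN	16

/*
 * Fill cdb[] with a READ CAPACITY (16) command, i.e. SERVICE ACTION
 * IN (16), opcode 0x9E, with service action 0x10.  alloc_len is the
 * number of bytes of parameter data the caller's buffer can accept.
 */
static void
build_read_capacity_16(uint8_t cdb[RC16_CDB_LEN], uint32_t alloc_len)
{
	assert(cdb != NULL);
	memset(cdb, 0, RC16_CDB_LEN);
	cdb[0] = 0x9E;				/* SERVICE ACTION IN (16) */
	cdb[1] = 0x10;				/* READ CAPACITY (16) */
	/* Bytes 2-9 (logical block address) stay zero. */
	cdb[10] = (uint8_t)(alloc_len >> 24);	/* allocation length, MSB */
	cdb[11] = (uint8_t)(alloc_len >> 16);
	cdb[12] = (uint8_t)(alloc_len >> 8);
	cdb[13] = (uint8_t)alloc_len;		/* allocation length, LSB */
}
```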

I believe that the CAM API changes are mild enough and beneficial enough
for a merge into stable/9, but they are intertwined with the unmap changes
in the da(4) driver, so those changes will have to go back to stable/9 as
well in order to MFC the full set of changes.

Otherwise it'll just be the driver that gets merged into stable/9, and
it'll use the kludgy method of getting the read capacity data for each
drive.

A couple of notes about issues with this driver:

 - Unlike the current mps(4) driver, it probes sequentially.  If you have a
   lot of drives in your system, it will take a while to probe them all.
 - You may see warning messages like this:

_mapping_add_new_device: failed to add the device with handle 0x0019 to persistent table because there is no free space available
_mapping_add_new_device: failed to add the device with handle 0x001a to persistent table because there is no free space available

 - The driver is not endian safe.  (It assumes a little endian machine.)
   This is not new, the driver in the tree has the same issue.

The LSI folks know about these issues.  The driver has passed their testing
process.

Many thanks to LSI for going through the effort to support FreeBSD.

Ken
-- 
Kenneth Merry
k...@freebsd.org


Re: LSI supported mps(4) driver available

2012-01-20 Thread Kenneth D. Merry
On Fri, Jan 20, 2012 at 12:53:04 -0800, Freddie Cash wrote:
 On Fri, Jan 20, 2012 at 12:44 PM, Kenneth D. Merry k...@freebsd.org wrote:
  The LSI-supported version of the mps(4) driver that supports their 6Gb SAS
  HBAs as well as WarpDrive controllers, is available here:
 
 Just to clarify, this will replace the existing mps(4) driver in
 FreeBSD 10-CURRENT and 9-STABLE?

That is correct.

 So there won't be mps(4) (FreeBSD driver) and mpslsi(4) (LSI driver)
 anymore?  Just mps(4)?

Right.  Just mps(4), which will be the LSI driver.

Ken
-- 
Kenneth Merry
k...@freebsd.org


Re: LSI supported mps(4) driver available

2012-01-20 Thread Kenneth D. Merry
On Fri, Jan 20, 2012 at 23:14:20 -, Steven Hartland wrote:
 - Original Message - 
 From: Kenneth D. Merry k...@freebsd.org
 To: freebsd-s...@freebsd.org; freebsd-current@freebsd.org
 Sent: Friday, January 20, 2012 8:44 PM
 Subject: LSI supported mps(4) driver available
 
 
 
 The LSI-supported version of the mps(4) driver that supports their 6Gb SAS
 HBAs as well as WarpDrive controllers, is available here:
 
 http://people.freebsd.org/~ken/lsi/mps_lsi.20120120.1.txt
 
 I plan to check it in to head next week, and then MFC it into stable/9 a
 week after that most likely.
 
 Great to see this being done, thanks to everyone! Be even better to see
 this MFC'ed to 8.x as well if all goes well. Do you think this will be
 possible?

Yes, that should be doable as well.  It's unlikely that all of the CAM
changes will get merged back, but the driver itself shouldn't be a problem.

Ken
-- 
Kenneth Merry
k...@freebsd.org


Re: ctlstat not building with clang

2012-01-12 Thread Kenneth D. Merry
On Thu, Jan 12, 2012 at 14:59:11 -0600, Dan McGregor wrote:
 Building world with clang now (as of r229997) no longer compiles because
 ctlstat was imported into the tree.  The error is:
 
 clang -O2 -pipe  -I/usr/src/usr.bin/ctlstat/../../sys -std=gnu99
 -fstack-protector -Wsystem-headers -Werror -Wall -Wno-format-y2k -W
 -Wno-unused-parameter -Wstrict-prototypes -Wmissing-prototypes
 -Wpointer-arith -Wreturn-type -Wcast-qual -Wwrite-strings -Wswitch -Wshadow
 -Wunused-parameter -Wcast-align -Wchar-subscripts -Winline -Wnested-externs
 -Wredundant-decls -Wold-style-definition -Wno-pointer-sign -c
 /usr/src/usr.bin/ctlstat/ctlstat.c
 /usr/src/usr.bin/ctlstat/ctlstat.c:149:35: error: format string is not a
 string literal (potentially insecure)
   [-Werror,-Wformat-security]
 fprintf(error ? stderr : stdout, ctlstat_usage);
  ^
 1 error generated.
 *** Error code 1
 
 Stop in /usr/src/usr.bin/ctlstat
 
 How do people feel about the attached patch that turns a call to fprintf to
 fputs?

Looks fine, I just committed it.
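
For the archives, the shape of the fix is sketched below.  fprintf()
interprets its second argument as a format string, which is what clang's
-Wformat-security complains about when that argument is not a literal;
fputs() writes the string verbatim.  The usage text here is a placeholder,
not the actual ctlstat message.

```c
#include <assert.h>
#include <stdio.h>

/* Placeholder text; not the actual ctlstat usage message. */
static const char usage_text[] = "usage: example [-h]\n";

/*
 * Print the usage message to out.  fputs() does no format processing,
 * so a '%' in the text could never be misread as a conversion.
 * Returns the fputs() result (EOF on error).
 */
static int
print_usage(FILE *out)
{
	assert(out != NULL);
	return (fputs(usage_text, out));
}
```

fprintf(out, "%s", usage_text) would silence the warning just as well.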

Thanks,

Ken
-- 
Kenneth Merry
k...@freebsd.org


Re: CAM Target Layer available

2012-01-11 Thread Kenneth D. Merry
On Wed, Jan 04, 2012 at 21:53:11 -0700, Kenneth D. Merry wrote:
 
 The CAM Target Layer (CTL) is now available for testing.  I am planning to
 commit it to head next week, barring any major objections.
 
 CTL is a disk and processor device emulation subsystem originally written
 for Copan Systems under Linux starting in 2003.  It has been shipping in
 Copan (now SGI) products since 2005.
 
 It was ported to FreeBSD in 2008, and thanks to an agreement between SGI
 (who acquired Copan's assets in 2010) and Spectra Logic in 2010, CTL is
 available under a BSD-style license.  The intent behind the agreement was
 that Spectra would work to get CTL into the FreeBSD tree.
 
 The patches are against FreeBSD/head as of SVN change 229516 and are
 located here:
 
 http://people.freebsd.org/~ken/ctl/ctl_diffs.20120104.4.txt.gz
 
 The code is not perfect (few pieces of software are), but is in good
 shape from a functional standpoint.  My intent is to get it out there for
 other folks to use, and perhaps help with improvements.
 
 There are a few other CAM changes included with these diffs, some of which
 will be committed separately from CTL, some concurrently.  This is a quick
 summary:
 
  - Fix a panic in the da(4) driver when a drive disappears on boot.
  - Fix locking in the CAM EDT traversal code.
  - Add an optional sysctl/tunable (disabled by default) to suppress
duplicate devices.  This most frequently shows up with dual ported SAS
drives.
  - Add some very basic error injection into the da(4) driver.
  - Bump the length field in the SCSI INQUIRY CDB to 2 bytes to line up with
more recent SCSI specs.
 
 CTL Features:
 
 
  - Disk and processor device emulation.
  - Tagged queueing
  - SCSI task attribute support (ordered, head of queue, simple tags)
  - SCSI implicit command ordering support.  (e.g. if a read follows a mode
select, the read will be blocked until the mode select completes.)
  - Full task management support (abort, LUN reset, target reset, etc.)
  - Support for multiple ports
  - Support for multiple simultaneous initiators
  - Support for multiple simultaneous backing stores
  - Persistent reservation support
  - Mode sense/select support
  - Error injection support
  - High Availability support (1)
  - All I/O handled in-kernel, no userland context switch overhead.
 
 (1) HA Support is just an API stub, and needs much more to be fully
 functional.  See the to-do list below.
 
 Configuring and Running CTL:
 ===
 
  - After applying the CTL patchset to your tree, build world and install it
on your target system.
 
  - Add 'device ctl' to your kernel configuration file.
 
  - If you're running with an 8Gb or 4Gb Qlogic FC board, add
'options ISP_TARGET_MODE' to your kernel config file.  'device ispfw'
or loading the ispfw module is also recommended.
 
  - Rebuild and install a new kernel.
 
  - Reboot with the new kernel.
 
  - To add a LUN with the RAM disk backend:
 
   ctladm create -b ramdisk -s 10485760
   ctladm port -o on
 
  - You should now see the CTL disk LUN through camcontrol devlist:
 
 scbus6 on ctl2cam0 bus 0:
 <FREEBSD CTLDISK 0001>             at scbus6 target 1 lun 0 (da24,pass32)
 <>                                 at scbus6 target -1 lun -1 ()
 
This is visible through the CTL CAM SIM.  This allows using CTL without
any physical hardware.  You should be able to issue any normal SCSI
commands to the device via the pass(4)/da(4) devices.
 
If any target-capable HBAs are in the system (e.g. isp(4)), and have
target mode enabled, you should now also be able to see the CTL LUNs via
that target interface.
 
Note that all CTL LUNs are presented to all frontends.  There is no
LUN masking, or separate, per-port configuration.
 
  - Note that the ramdisk backend is a fake ramdisk.  That is, it is
backed by a small amount of RAM that is used for all I/O requests.  This
is useful for performance testing, but not for any data integrity tests.
 
  - To add a LUN with the block/file backend:
 
   truncate -s +1T myfile
   ctladm create -b block -o file=myfile
   ctladm port -o on
 
  - You can also see a list of LUNs and their backends like this:
 
 # ctladm devlist
 LUN Backend       Size (Blocks)   BS Serial Number    Device ID
   0 block            2147483648  512 MYSERIAL   0     MYDEVID   0
   1 block            2147483648  512 MYSERIAL   1     MYDEVID   1
   2 block            2147483648  512 MYSERIAL   2     MYDEVID   2
   3 block            2147483648  512 MYSERIAL   3     MYDEVID   3
   4 block            2147483648  512 MYSERIAL   4     MYDEVID   4
   5 block            2147483648  512 MYSERIAL   5     MYDEVID   5
   6 block            2147483648  512 MYSERIAL   6     MYDEVID   6
   7 block            2147483648  512 MYSERIAL   7     MYDEVID   7
   8 block            2147483648  512 MYSERIAL   8

CAM Target Layer available

2012-01-04 Thread Kenneth D. Merry

The CAM Target Layer (CTL) is now available for testing.  I am planning to
commit it to head next week, barring any major objections.

CTL is a disk and processor device emulation subsystem originally written
for Copan Systems under Linux starting in 2003.  It has been shipping in
Copan (now SGI) products since 2005.

It was ported to FreeBSD in 2008, and thanks to an agreement between SGI
(who acquired Copan's assets in 2010) and Spectra Logic in 2010, CTL is
available under a BSD-style license.  The intent behind the agreement was
that Spectra would work to get CTL into the FreeBSD tree.

The patches are against FreeBSD/head as of SVN change 229516 and are
located here:

http://people.freebsd.org/~ken/ctl/ctl_diffs.20120104.4.txt.gz

The code is not perfect (few pieces of software are), but is in good
shape from a functional standpoint.  My intent is to get it out there for
other folks to use, and perhaps help with improvements.

There are a few other CAM changes included with these diffs, some of which
will be committed separately from CTL, some concurrently.  This is a quick
summary:

 - Fix a panic in the da(4) driver when a drive disappears on boot.
 - Fix locking in the CAM EDT traversal code.
 - Add an optional sysctl/tunable (disabled by default) to suppress
   duplicate devices.  This most frequently shows up with dual ported SAS
   drives.
 - Add some very basic error injection into the da(4) driver.
 - Bump the length field in the SCSI INQUIRY CDB to 2 bytes to line up with
   more recent SCSI specs.
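
As background for the INQUIRY change: the CDB stays 6 bytes long, but newer
specs use bytes 3 and 4 together as a 16-bit big-endian allocation length,
where older ones used byte 4 alone.  A minimal sketch of the layout;
build_inquiry() is a hypothetical helper, not the CAM scsi_inquiry()
function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define INQUIRY_CDB_LEN	6

/*
 * Fill cdb[] with a standard INQUIRY command (EVPD clear, page code 0)
 * that uses a two-byte allocation length.
 */
static void
build_inquiry(uint8_t cdb[INQUIRY_CDB_LEN], uint16_t alloc_len)
{
	assert(cdb != NULL);
	memset(cdb, 0, INQUIRY_CDB_LEN);
	cdb[0] = 0x12;				/* INQUIRY opcode */
	cdb[3] = (uint8_t)(alloc_len >> 8);	/* allocation length, MSB */
	cdb[4] = (uint8_t)(alloc_len & 0xff);	/* allocation length, LSB */
	/* cdb[5] is the CONTROL byte and is left at zero. */
}
```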

CTL Features:


 - Disk and processor device emulation.
 - Tagged queueing
 - SCSI task attribute support (ordered, head of queue, simple tags)
 - SCSI implicit command ordering support.  (e.g. if a read follows a mode
   select, the read will be blocked until the mode select completes.)
 - Full task management support (abort, LUN reset, target reset, etc.)
 - Support for multiple ports
 - Support for multiple simultaneous initiators
 - Support for multiple simultaneous backing stores
 - Persistent reservation support
 - Mode sense/select support
 - Error injection support
 - High Availability support (1)
 - All I/O handled in-kernel, no userland context switch overhead.

(1) HA Support is just an API stub, and needs much more to be fully
functional.  See the to-do list below.

Configuring and Running CTL:
===

 - After applying the CTL patchset to your tree, build world and install it
   on your target system.

 - Add 'device ctl' to your kernel configuration file.

 - If you're running with an 8Gb or 4Gb Qlogic FC board, add
   'options ISP_TARGET_MODE' to your kernel config file.  'device ispfw'
   or loading the ispfw module is also recommended.

 - Rebuild and install a new kernel.

 - Reboot with the new kernel.

 - To add a LUN with the RAM disk backend:

ctladm create -b ramdisk -s 10485760
ctladm port -o on

 - You should now see the CTL disk LUN through camcontrol devlist:

scbus6 on ctl2cam0 bus 0:
<FREEBSD CTLDISK 0001>             at scbus6 target 1 lun 0 (da24,pass32)
<>                                 at scbus6 target -1 lun -1 ()

   This is visible through the CTL CAM SIM.  This allows using CTL without
   any physical hardware.  You should be able to issue any normal SCSI
   commands to the device via the pass(4)/da(4) devices.

   If any target-capable HBAs are in the system (e.g. isp(4)), and have
   target mode enabled, you should now also be able to see the CTL LUNs via
   that target interface.

   Note that all CTL LUNs are presented to all frontends.  There is no
   LUN masking, or separate, per-port configuration.

 - Note that the ramdisk backend is a fake ramdisk.  That is, it is
   backed by a small amount of RAM that is used for all I/O requests.  This
   is useful for performance testing, but not for any data integrity tests.

 - To add a LUN with the block/file backend:

truncate -s +1T myfile
ctladm create -b block -o file=myfile
ctladm port -o on

 - You can also see a list of LUNs and their backends like this:

# ctladm devlist
LUN Backend       Size (Blocks)   BS Serial Number    Device ID
  0 block            2147483648  512 MYSERIAL   0     MYDEVID   0
  1 block            2147483648  512 MYSERIAL   1     MYDEVID   1
  2 block            2147483648  512 MYSERIAL   2     MYDEVID   2
  3 block            2147483648  512 MYSERIAL   3     MYDEVID   3
  4 block            2147483648  512 MYSERIAL   4     MYDEVID   4
  5 block            2147483648  512 MYSERIAL   5     MYDEVID   5
  6 block            2147483648  512 MYSERIAL   6     MYDEVID   6
  7 block            2147483648  512 MYSERIAL   7     MYDEVID   7
  8 block            2147483648  512 MYSERIAL   8     MYDEVID   8
  9 block            2147483648  512 MYSERIAL   9     MYDEVID   9
 10 block            2147483648  512 MYSERIAL  10     MYDEVID  10
 11 block 

Re: SCSI descriptor sense changes, testing needed

2011-10-03 Thread Kenneth D. Merry

This has been committed to head, and the plan is to get it into stable/9
in time for 9.0.

Please let me know if you run into any problems with the changes.

Thanks,

Ken

On Fri, Sep 30, 2011 at 23:19:14 -0600, Kenneth D. Merry wrote:
 
 I have attached a new version of the patches, with a number of changes.
 
 One issue that has cropped up is that the previous sense code and my new
 descriptor sense changes never paid any attention to the actual length of
 the sense data returned by the controller.
 
 I have changed all of the error recovery code and sense printing code to
 honor the sense data length in the CAM CCB.
 
 One other problem related to that is that many controller drivers don't set
 the sense residual field in struct ccb_scsiio properly, or don't set it at
 all.  This patch includes changes to the isp, mps, mpt, umass, and ciss
 drivers to set the sense_resid field properly.
 
 There are lots of other drivers in the system, however, that haven't been
 audited, and may or may not set the sense residual correctly.
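
The length rule described above can be sketched as follows: the number of
valid sense bytes is the requested length minus the residual, and no field
beyond that point should be read.  Both helper names are hypothetical;
this is a userland model, not the CAM code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Number of sense bytes actually returned.  A residual larger than the
 * requested length is treated as "no valid data" instead of trusted.
 */
static size_t
sense_valid_len(size_t sense_len, size_t sense_resid)
{
	if (sense_resid > sense_len)
		return (0);
	return (sense_len - sense_resid);
}

/*
 * Return the sense byte at offset, or fallback if that byte lies
 * beyond the data the controller actually returned.
 */
static uint8_t
sense_byte_or(const uint8_t *sense, size_t valid_len, size_t offset,
    uint8_t fallback)
{
	assert(sense != NULL);
	return (offset < valid_len ? sense[offset] : fallback);
}
```

A driver that never sets the residual leaves it at whatever the CCB held,
which is why the unaudited drivers are a concern.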
 
 I also fixed an issue reported by Fabian Keil that showed up with the ahci
 driver.  In reverting a change I have in my local tree to switch to a 2
 byte length field in the SCSI inquiry CDB, I accidentally shortened the CDB
 to 5 bytes.  Oops.
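
For reference, a small sketch of a 6-byte INQUIRY CDB with a 2-byte allocation length, following the SPC-3 layout (opcode 0x12, EVPD bit, page code, 16-bit allocation length, control byte). The build_inquiry_cdb() helper and its constants are made up for this example and are not the CAM API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define INQUIRY_OPCODE	0x12
#define INQUIRY_CDB_LEN	6	/* the CDB is 6 bytes, not 5 */

/* Fill in a 6-byte INQUIRY CDB with a big-endian 16-bit allocation length. */
static void
build_inquiry_cdb(uint8_t cdb[INQUIRY_CDB_LEN], int evpd, uint8_t page,
    uint16_t alloc_len)
{
	assert(cdb != NULL);
	memset(cdb, 0, INQUIRY_CDB_LEN);
	cdb[0] = INQUIRY_OPCODE;
	cdb[1] = evpd ? 0x01 : 0x00;		/* EVPD bit */
	cdb[2] = evpd ? page : 0;		/* page code, only with EVPD */
	cdb[3] = (uint8_t)(alloc_len >> 8);	/* allocation length, MSB */
	cdb[4] = (uint8_t)(alloc_len & 0xff);	/* allocation length, LSB */
	cdb[5] = 0;				/* control byte */
}
```

Sending only the first 5 bytes drops the control byte, which is the kind of truncation described above.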
 
 I'd really appreciate more feedback; Fabian is the only person to report
 testing the previous patch.
 
 Thanks,
 
 Ken
 
 On Thu, Sep 22, 2011 at 13:33:05 -0600, Kenneth D. Merry wrote:
  
  I have attached a set of patches against head that implement SCSI
  descriptor sense support for CAM.
  
  Descriptor sense is a new sense (SCSI error) format introduced in the SPC-3
  spec in 2006.  FreeBSD doesn't currently support it.
  
  Seagate's new 3TB SAS drives come with descriptor sense enabled by default,
  and it's possible that other newer drives do as well.  Because all the
  sense key, additional sense code, and additional sense code qualifier
  fields are in different places, the CAM error recovery code will not do the
  right thing when it gets descriptor sense.
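
As a rough illustration of why the field locations matter, the sketch below pulls the sense key, ASC and ASCQ out of both formats, using the byte offsets from SPC-3 (fixed format: response codes 0x70/0x71, descriptor format: 0x72/0x73). The sense_decode() helper and struct sense_triple are hypothetical names for this example and are not part of the patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sense_triple {
	uint8_t key;	/* sense key */
	uint8_t asc;	/* additional sense code */
	uint8_t ascq;	/* additional sense code qualifier */
};

/*
 * Decode key/ASC/ASCQ from either sense format.
 * Returns 0 on success, -1 for short data or an unknown response code.
 */
static int
sense_decode(const uint8_t *buf, size_t len, struct sense_triple *out)
{
	assert(buf != NULL && out != NULL);
	if (len < 1)
		return (-1);
	switch (buf[0] & 0x7f) {
	case 0x70:
	case 0x71:		/* fixed format */
		if (len < 14)
			return (-1);
		out->key = buf[2] & 0x0f;
		out->asc = buf[12];
		out->ascq = buf[13];
		return (0);
	case 0x72:
	case 0x73:		/* descriptor format */
		if (len < 4)
			return (-1);
		out->key = buf[1] & 0x0f;
		out->asc = buf[2];
		out->ascq = buf[3];
		return (0);
	default:
		return (-1);
	}
}
```

Code that only knows the fixed layout would read byte 2 of descriptor sense as the sense key, when that byte actually holds the ASC.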
  
  These patches do bump up the size of struct scsi_sense_data, and so I have
  incremented CAM_VERSION as well.  I have discussed this with re@, and it
  looks like we'll be putting the changes in before 9.0, so it ships with
  support for newer SCSI devices.
  
  A number of things have changed in these patches, but in particular, it
  would be good to test the following:
  
   - The sa(4) (SCSI tape) driver.  The residual handling code, which looks
     at the sense data, has changed.
   - The Playstation 3 CDROM driver.
   - Firewire target mode.
   - umass devices with the NO_INQUIRY_EVPD quirk.
  
  Also, please let me know if you see any anomalies with the sense printing
  code.  In the common cases the output should look identical to the old
  code, but in some cases it will be a little different.  e.g.:
  
  # camcontrol inquiry da40 -v
  pass47: SEAGATE ST33000650SS 0002 Fixed Direct Access SCSI-6 device
  pass47: Serial Number 9XK0GAJ7S125XDNU
  pass47: 300.000MB/s transfers, Command Queueing Enabled
  
  (Seagate 3TB drive)
  
  # camcontrol modepage da40 -m 10 |grep D_SENSE
  D_SENSE:  1
  
  (Descriptor sense is enabled)
  
  # camcontrol modepage da40 -m 15 -v
  (pass47:mps1:0:47:0): MODE SENSE(6). CDB: 1a 0 4f 0 ff 0 
  (pass47:mps1:0:47:0): CAM status: SCSI Status Error
  (pass47:mps1:0:47:0): SCSI status: Check Condition
  (pass47:mps1:0:47:0): SCSI sense: ILLEGAL REQUEST asc:24,0 (Invalid field 
  in CDB)
  (pass47:mps1:0:47:0): Field Replaceable Unit: 1
  (pass47:mps1:0:47:0): Command byte 2 bit 5 is invalid
  (pass47:mps1:0:47:0): Descriptor 0x80: 00 00 00 00 00 00 00 00 00 00 00 00 
  00 00
  camcontrol: error sending mode sense command
  
  (The FRU and Sense Key Specific entries are on separate lines, and a
  vendor-specific sense descriptor is printed out in hex format.)
  
  Anyway, I'd appreciate any testing and feedback on these changes.  As I
  said, they will probably be in 9.0, so if there are any issues it would be
  better to find them now. :)
  
  Thanks,
  
  Ken
-- 
Kenneth Merry
k...@freebsd.org
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: SCSI descriptor sense changes, testing needed

2011-09-30 Thread Kenneth D. Merry
On Tue, Sep 27, 2011 at 21:46:03 +0200, Fabian Keil wrote:
 Kenneth D. Merry k...@freebsd.org wrote:
 
  On Sat, Sep 24, 2011 at 21:27:22 +0200, Fabian Keil wrote:
   Kenneth D. Merry k...@freebsd.org wrote:
   
I have attached a set of patches against head that implement SCSI
descriptor sense support for CAM.
   
Anyway, I'd appreciate any testing and feedback on these changes.  As I
said, they will probably be in 9.0, so if there are any issues it would
be better to find them now. :)
   
   I've been using the patch on a ThinkPad R500 since yesterday and
   just reverted it today again to get my kernel closer to HEAD before
   looking into some (probably unrelated) panics.
   
   I didn't notice it while using the patch, but it looks like the
   kernel wasn't able to pick up cd0 anymore:
  
  Hmm.  I don't think any of the changes would have caused this, but
  evidently something did...
  
  Let's see if we can debug it...
  
  I have attached a patch to add some debugging output, and I see at least
  one interesting thing in the logs below.
  
  Can you re-apply the descriptor sense patch, and then try the attached
  debugging patch as well?
 
 Sure.

I believe this is fixed with my latest set of patches.  Can you try them
and let me know?

Thanks,

Ken
-- 
Kenneth Merry
k...@freebsd.org


Re: SCSI descriptor sense changes, testing needed

2011-09-26 Thread Kenneth D. Merry
On Sat, Sep 24, 2011 at 21:27:22 +0200, Fabian Keil wrote:
 Kenneth D. Merry k...@freebsd.org wrote:
 
  I have attached a set of patches against head that implement SCSI
  descriptor sense support for CAM.
 
  Anyway, I'd appreciate any testing and feedback on these changes.  As I
  said, they will probably be in 9.0, so if there are any issues it would
  be better to find them now. :)
 
 I've been using the patch on a ThinkPad R500 since yesterday and
 just reverted it today again to get my kernel closer to HEAD before
 looking into some (probably unrelated) panics.
 
 I didn't notice it while using the patch, but it looks like the
 kernel wasn't able to pick up cd0 anymore:

Hmm.  I don't think any of the changes would have caused this, but
evidently something did...

Let's see if we can debug it...

I have attached a patch to add some debugging output, and I see at least
one interesting thing in the logs below.

Can you re-apply the descriptor sense patch, and then try the attached
debugging patch as well?

 fk@r500 ~ $grep -h "new dis" /var/log/messages /var/log/messages.[123] | sort
 Sep 21 23:40:23 r500 kernel: GEOM: new disk da0
 Sep 21 23:40:30 r500 kernel: GEOM: new disk da1
 Sep 21 23:45:21 r500 kernel: GEOM: new disk ada0
 Sep 21 23:45:21 r500 kernel: GEOM: new disk cd0
 Sep 21 23:45:21 r500 kernel: GEOM: new disk da0
 Sep 21 23:45:21 r500 kernel: GEOM: new disk da1
 Sep 21 23:52:44 r500 kernel: GEOM: new disk ada0
 Sep 21 23:52:44 r500 kernel: GEOM: new disk cd0
 Sep 21 23:53:14 r500 kernel: GEOM: new disk da0
 Sep 21 23:56:23 r500 kernel: GEOM: new disk da1
 Sep 22 21:14:17 r500 kernel: GEOM: new disk ada0
 Sep 22 21:14:17 r500 kernel: GEOM: new disk cd0
 Sep 22 22:10:20 r500 kernel: GEOM: new disk da0
 [patch applied]
 Sep 22 23:29:45 r500 kernel: GEOM: new disk da0
 Sep 23 14:38:31 r500 kernel: GEOM: new disk ada0
 Sep 23 17:19:40 r500 kernel: GEOM: new disk da0
 Sep 23 19:20:21 r500 kernel: GEOM: new disk da0
 Sep 23 19:20:42 r500 kernel: GEOM: new disk da1
 Sep 23 22:58:56 r500 kernel: GEOM: new disk da0
 Sep 24 09:31:02 r500 kernel: GEOM: new disk ada0
 Sep 24 14:17:22 r500 kernel: GEOM: new disk da0
 Sep 24 14:44:03 r500 kernel: GEOM: new disk ada0
 Sep 24 14:44:03 r500 kernel: GEOM: new disk da0
 Sep 24 14:53:30 r500 kernel: GEOM: new disk ada0
 Sep 24 15:03:24 r500 kernel: GEOM: new disk da0
 Sep 24 15:06:03 r500 kernel: GEOM: new disk da0
 Sep 24 15:13:57 r500 kernel: GEOM: new disk ada0
 Sep 24 15:14:16 r500 kernel: GEOM: new disk da0
 Sep 24 15:27:11 r500 kernel: GEOM: new disk ada0
 Sep 24 15:28:05 r500 kernel: GEOM: new disk da0
 Sep 24 15:32:10 r500 kernel: GEOM: new disk ada0
 Sep 24 15:32:10 r500 kernel: GEOM: new disk da0
 Sep 24 15:38:16 r500 kernel: GEOM: new disk ada0
 Sep 24 15:38:16 r500 kernel: GEOM: new disk da0
 Sep 24 15:43:33 r500 kernel: GEOM: new disk ada0
 Sep 24 15:43:33 r500 kernel: GEOM: new disk da0
 Sep 24 15:49:30 r500 kernel: GEOM: new disk ada0
 [patch reverted]
 Sep 24 19:32:51 r500 kernel: GEOM: new disk ada0
 Sep 24 19:32:51 r500 kernel: GEOM: new disk cd0
 Sep 24 19:32:51 r500 kernel: GEOM: new disk da0
 Sep 24 19:38:07 r500 kernel: GEOM: new disk ada0
 Sep 24 19:38:07 r500 kernel: GEOM: new disk cd0
 
 Without the patch I'm used to getting the following kernel
 messages when booting (without a disc in cd0):
 
 Sep 24 19:32:51 r500 kernel: ahcich0: AHCI reset: device ready after 100ms
 Sep 24 19:32:51 r500 kernel: (aprobe0:ahcich0:0:0:0): SIGNATURE: 
 Sep 24 19:32:51 r500 kernel: ahcich1: AHCI reset: device ready after 100ms
 Sep 24 19:32:51 r500 kernel: (aprobe1:ahcich1:0:0:0): SIGNATURE: eb14
 Sep 24 19:32:51 r500 kernel: GEOM: new disk cd0
 Sep 24 19:32:51 r500 kernel: pass0 at ahcich0 bus 0 scbus0 target 0 lun 0
 Sep 24 19:32:51 r500 kernel: pass0: HITACHI HTS543225L9SA00 FBEZC4EC ATA-8 
 SATA 1.x device
 Sep 24 19:32:51 r500 kernel: pass0: Serial Number 090509FB2F32LLEY6D8A
 Sep 24 19:32:51 r500 kernel: pass0: 150.000MB/s transfers (SATA 1.x, UDMA5, 
 PIO 8192bytes)
 Sep 24 19:32:51 r500 kernel: pass0: Command Queueing enabled
 Sep 24 19:32:51 r500 kernel: pass1 at ahcich1 bus 0 scbus1 target 0 lun 0
 Sep 24 19:32:51 r500 kernel: pass1: HL-DT-ST DVDRAM GSA-T50N RX05 Removable 
 CD-ROM SCSI-0 device 
 Sep 24 19:32:51 r500 kernel: pass1: Serial Number M2R96NC0647
 Sep 24 19:32:51 r500 kernel: pass1: 150.000MB/s transfers (SATA 1.x, UDMA6, 
 ATAPI 12bytes, PIO 8192bytes)
 Sep 24 19:32:51 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status error
 Sep 24 19:32:51 r500 kernel: (cd0:ahcich1:0:0:0): READ CAPACITY. CDB: 25 0 0 
 0 0 0 0 0 0 0 
 Sep 24 19:32:51 r500 kernel: (cd0:ahcich1:0:0:0): CAM status: SCSI Status 
 Error
 Sep 24 19:32:51 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status: Check Condition
 Sep 24 19:32:51 r500 kernel: (cd0:ahcich1:0:0:0): SCSI sense: UNIT ATTENTION 
 asc:29,0 (Power on, reset, or bus device reset occurred)
 Sep 24 19:32:51 r500 kernel: (cd0:ahcich1:0:0:0): Retrying command (per sense 
 data)
 Sep 24 19:32:51 r500 kernel: ada0

Re: Exactly that commit (was Re: Latest -current 100% hang at the late boot stage)

2011-06-22 Thread Kenneth D. Merry
On Wed, Jun 22, 2011 at 08:13:25 +0400, Andrey Chernov wrote:
 On Tue, Jun 21, 2011 at 09:54:04PM -0600, Kenneth D. Merry wrote:
  These two are interesting:
  
   http://img825.imageshack.us/img825/1249/21062011014m.jpg
   http://img839.imageshack.us/img839/3791/21062011015.jpg
  
  It looks like the GEOM event thread is stuck inside the cd(4) driver.  The
  cd(4) driver is trying to acquire the peripheral lock, and is sleeping
  until it gets it.
  
  What isn't clear is who is holding it.  The ps output shows an idle thread
  running on CPU 1, and thread 100014 (taskq) running on CPU 0.
  Unfortunately I don't see a stack trace for that.  (I might have missed
  it.)
  
  Do you happen to have the image with the stack trace for that thread?
 
 I don't have the image because no disks are mounted at that stage and the 
 swap slice is not attached. But I can issue more specific DDB commands to 
 narrow it down, just say what you need in detail.
 
 BTW, the machine has 2 DVD drives, both attached to a Marvell IDE plain
 ATA interface; they always worked before.
 
 Are you sure that something is holding the lock? 'show lock' shows
 absolutely nothing, it is empty.

Well, after looking at the code a little more, it looks like the lock
that is being held is the periph lock, which is really just a flag.
So 'show lock' wouldn't show anything relevant.  Here's cam_periph_hold():

int
cam_periph_hold(struct cam_periph *periph, int priority)
{
    int error;

    /*
     * Increment the reference count on the peripheral
     * while we wait for our lock attempt to succeed
     * to ensure the peripheral doesn't disappear out
     * from under us while we sleep.
     */

    if (cam_periph_acquire(periph) != CAM_REQ_CMP)
        return (ENXIO);

    mtx_assert(periph->sim->mtx, MA_OWNED);
    while ((periph->flags & CAM_PERIPH_LOCKED) != 0) {
        periph->flags |= CAM_PERIPH_LOCK_WANTED;
        if ((error = mtx_sleep(periph, periph->sim->mtx, priority,
            "caplck", 0)) != 0) {
            cam_periph_release_locked(periph);
            return (error);
        }
    }

    periph->flags |= CAM_PERIPH_LOCKED;
    return (0);
}

The GEOM event thread is stuck sleeping in the mtx_sleep() call above.  So
that tells me that one of several things is going on:

 - There is a path in the cd(4) driver where it can call cam_periph_hold()
   but not cam_periph_unhold().

 - There is another thread in the system that has called cam_periph_hold(),
   and has gotten stuck before it can call cam_periph_unhold().

 - The hold/unhold logic is broken, and there is a case where a thread
   waiting for the lock can miss the wakeup.  After looking at the code, I
   don't think this is the case, but I may have missed something.

So it is probably one of the first two cases.  From the dmesg, I only see
cd1 listed, not cd0.  So it is possible that cd0 is stuck in the probe code
somewhere, and the geom code just gets stuck trying to open it when the
probe hasn't completed.
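
To make the hold/unhold protocol concrete, here is a userland sketch of a flag-based lock of this kind, with a pthread mutex and condition variable standing in for the SIM mutex and the sleep channel. All names are hypothetical, and this is only a model of the pattern, not the CAM code.

```c
#include <assert.h>
#include <pthread.h>

#define PERIPH_LOCKED		0x01u
#define PERIPH_LOCK_WANTED	0x02u

struct periph {
	pthread_mutex_t mtx;	/* plays the role of the SIM mutex */
	pthread_cond_t cv;	/* plays the role of the sleep channel */
	unsigned flags;
};

static void
periph_init(struct periph *p)
{
	pthread_mutex_init(&p->mtx, NULL);
	pthread_cond_init(&p->cv, NULL);
	p->flags = 0;
}

/* Take the flag lock.  The caller must hold p->mtx. */
static void
periph_hold(struct periph *p)
{
	while ((p->flags & PERIPH_LOCKED) != 0) {
		p->flags |= PERIPH_LOCK_WANTED;
		/* Drops p->mtx while asleep, retakes it on wakeup. */
		pthread_cond_wait(&p->cv, &p->mtx);
	}
	p->flags |= PERIPH_LOCKED;
}

/* Drop the flag lock and wake any waiters.  The caller must hold p->mtx. */
static void
periph_unhold(struct periph *p)
{
	assert((p->flags & PERIPH_LOCKED) != 0);
	p->flags &= ~PERIPH_LOCKED;
	if ((p->flags & PERIPH_LOCK_WANTED) != 0) {
		p->flags &= ~PERIPH_LOCK_WANTED;
		pthread_cond_broadcast(&p->cv);
	}
}
```

In this model, the first two cases in the list above both amount to a periph_hold() that is never followed by periph_unhold(): every later caller then sleeps in the wait loop forever, and since the lock is only a flag, a lock-inspection tool has nothing to report.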

Seeing the stack trace for the taskq thread that is running on CPU 0
(process 100014) might be enlightening, it's hard to say.  That may or may
not show the issue.

It's possible that this issue is directly related to the commit in
question; perhaps there is an error being returned that wasn't returned
before and it isn't being handled right in the cd(4) driver.  (The cd(4)
driver wasn't touched in the commit.)

It's also possible that the commit in question just changed the timing and
your system is hitting a race that was there previously.

Ken
-- 
Kenneth Merry
k...@freebsd.org


Re: Exactly that commit (was Re: Latest -current 100% hang at the late boot stage)

2011-06-21 Thread Kenneth D. Merry
On Wed, Jun 22, 2011 at 00:49:34 +0400, Andrey Chernov wrote:
 On Tue, Jun 21, 2011 at 10:17:19AM -0600, Kenneth D. Merry wrote:
  ps
  alltrace
  show locks
  show msgbuf
  
  Hopefully that will give us something to start looking at...
  
  This would really work a lot better if there is any way to get a serial
  console on the machine.  The above will produce a good bit of output, and
  would likely need a lot of pictures.
  
  Since we can't reproduce the problem here, some debugging help would be
  greatly appreciated.
 
 Sorry, I have no serial console. Here are the photos. I removed the very
 similar-looking USB parts from 'ps' and 'alltrace', and the very general
 parts of 'alltrace' that are always there. I hope the remaining info will
 be enough. USB hotplugging works at this stage, so there is no reason to
 look there. If it is not enough, I'll upload the whole series.

Thanks for uploading all of the photos.  That's a lot of work, but they are
helpful...

I think I see part of the problem, but not the whole problem:

 'show lock' outputs nothing, it means no locks just sleep somewhere 
 forever.
 
 'ps':
 http://img43.imageshack.us/img43/1424/21062011001j.jpg
 http://img835.imageshack.us/img835/6607/21062011002.jpg
 http://img841.imageshack.us/img841/5401/21062011003.jpg
 
 'alltrace':
 http://img864.imageshack.us/img864/6757/21062011004ya.jpg
 http://img542.imageshack.us/img542/4857/21062011005.jpg
 http://img828.imageshack.us/img828/823/21062011006.jpg
 http://img5.imageshack.us/img5/910/21062011007.jpg
 http://img7.imageshack.us/img7/4704/21062011008.jpg
 http://img848.imageshack.us/img848/5487/21062011009.jpg
 http://img641.imageshack.us/img641/2/21062011010.jpg
 http://img7.imageshack.us/img7/7946/21062011011.jpg
 http://img860.imageshack.us/img860/8185/21062011012.jpg
 http://img696.imageshack.us/img696/5276/21062011013.jpg

These two are interesting:

 http://img825.imageshack.us/img825/1249/21062011014m.jpg
 http://img839.imageshack.us/img839/3791/21062011015.jpg

It looks like the GEOM event thread is stuck inside the cd(4) driver.  The
cd(4) driver is trying to acquire the peripheral lock, and is sleeping
until it gets it.

What isn't clear is who is holding it.  The ps output shows an idle thread
running on CPU 1, and thread 100014 (taskq) running on CPU 0.
Unfortunately I don't see a stack trace for that.  (I might have missed
it.)

Do you happen to have the image with the stack trace for that thread?

 http://img594.imageshack.us/img594/1773/21062011016.jpg
 http://img109.imageshack.us/img109/9937/21062011017.jpg
 http://img51.imageshack.us/img51/6047/21062011018l.jpg
 
 'show msgbuf':
 http://img59.imageshack.us/img59/46/21062011019.jpg
 http://img189.imageshack.us/img189/483/21062011020.jpg
 http://img19.imageshack.us/img19/8163/21062011021.jpg
 http://img683.imageshack.us/img683/3171/21062011022.jpg
 http://img819.imageshack.us/img819/5923/21062011023.jpg
 http://img692.imageshack.us/img692/3789/21062011024.jpg
 http://img580.imageshack.us/img580/1550/21062011025.jpg
 http://img560.imageshack.us/img560/7478/21062011026.jpg
 http://img94.imageshack.us/img94/9371/21062011027.jpg
 http://img857.imageshack.us/img857/5185/21062011028.jpg

Thanks,

Ken
-- 
Kenneth Merry
k...@freebsd.org


Re: Exactly that commit (was Re: Latest -current 100% hang at the late boot stage)

2011-06-21 Thread Kenneth D. Merry
On Mon, Jun 20, 2011 at 15:46:56 +0400, Andrey Chernov wrote:
 On Mon, Jun 20, 2011 at 11:01:46AM +0300, Kostik Belousov wrote:
  On Mon, Jun 20, 2011 at 11:02:22AM +0400, Andrey Chernov wrote:
   On Sun, Jun 19, 2011 at 08:15:43PM -0600, Justin T. Gibbs wrote:
On 6/19/11 6:19 PM, Andrey Chernov wrote:
 Exactly that commit is responsible for boot hang.
 Please fix.
 BTW, I have MBR on SATA disk (CAM emulated), ICH9.

Since it works for me, you'll need to provide more information.  Can you
at least drop into kdb to determine the likely source of the hang by
getting a stack trace of all processes to see where they are sleeping
and dumping lock information?
   
   I dropped into DDB and put a photo of the 'bt' console output in the very
   first message of this thread; nothing unusual is seen in the main stack.
   Could you please specify the exact DDB commands you want me to issue? No
   dump can be provided since nothing is mounted yet, including swap.
   
   BTW, I remember seeing previously unseen warnings with post Jun 14 kernels:
   xpt_action_default: CCB type 0xe not supported
   
   'ps' inside DDB shows [xpt_thrd] in the ccb_scan wmesg state, [g_event]
   in the caplck wmesg state, and [kernel] in the g_waitid state.
   I don't even know if it matters.
  
  Just in case, please try r223277.
 
 As the second message in the thread states, I first tried even 223296, with
 the same hang and the same
 xpt_action_default: CCB type 0xe not supported
 As I see it, DDB's 'ps' indicates that the kernel waits for something from
 geom and geom waits for something from ccb_scan forever; just a raw guess.
 I will be glad to issue more specific DDB commands and upload the
 corresponding photos.
 BTW, plugging and unplugging USB devices works at that stage.

Can you do the following when the hang happens:

ps
alltrace
show locks
show msgbuf

Hopefully that will give us something to start looking at...

This would really work a lot better if there is any way to get a serial
console on the machine.  The above will produce a good bit of output, and
would likely need a lot of pictures.

Since we can't reproduce the problem here, some debugging help would be
greatly appreciated.

Thanks,

Ken
-- 
Kenneth Merry
k...@freebsd.org


Re: message buffer scrambling fix

2011-05-31 Thread Kenneth D. Merry
On Sat, May 28, 2011 at 11:26:50 -0700, Julian Elischer wrote:
 On 5/27/11 3:45 PM, Kenneth D. Merry wrote:
 Hey folks,
 
 I have attached some patches to the kernel message buffer code (this
 affects dmesg(8) output as well as kernel messages that go to the syslog)
 to address log scrambling.
 
 This fixes the same issue that 'options PRINTF_BUFR_SIZE=128' fixes for the
 console.
 
 The problem is that you can have multiple kernel threads writing to the
 message buffer at the same time, and so their characters will get
 interleaved.  All of the characters will get in there, because they're
 written with atomic operations, but the output might look scrambled.
 
 So the fix is to use the same stack buffer that is used for the console
 output (so the stack size doesn't increase), and use a spin lock instead of
 atomic operations to insert the string into the message buffer.
 
 The result is that dmesg and syslog output should look the same as the
 console output.  As long as individual kernel prints fit in the printf
 buffer size, they will be put into the message buffer atomically.
 
 I also fixed a couple of other long-standing issues.  putcons() (in
 subr_prf.c) was adding a carriage return before calling cnputs().  But
 cnputs() calls cnputc(), which adds a carriage return before every newline.
 So much of the console output (the part that came from putcons() at least)
 had two carriage returns at the end.
 
 The other issue was that log_console() was inserting a newline for any
 console write that didn't already have one at the end.  The issue with that
 can be seen if you do a 'dmesg -a' and compare that to the console output.
 
 You'll see something like this on the console:
 
 Updating motd:.
 
 But this in dmesg -a:
 
 Updating motd:
 .
 
 That is because "Updating motd:" is written first, log_console() appends a
 newline, and then ".\n" is written.
 
 I added a loader tunable and sysctl to turn the old behavior back on
 (kern.log_console_add_linefeed) if you want the old behavior, but I think
 we should be able to safely remove it.
 
 Also, the new msgbuf_addstr() function allows the caller to optionally ask
 for carriage returns to be stripped out.  However, in my testing I haven't
 seen any carriage returns to strip.
 
 Let me know if you have any comments.  I'm planning to check this into head
 next week.
 
 looks good.. as long as we don't end up with the behaviour that I think I
 see on Linux (it's hard to tell sometimes) where the last message (the one
 you really want to see) doesn't make it out.

Everything passed into the kernel printf() call should make it out to the
console, message buffer, etc. before the printf call completes.  The only
way that wouldn't happen is if spin locks break for some reason.

One thing I forgot to mention is that I think the PRINTF_BUFR_SIZE option
should be made non-optional.  Even on smaller embedded machines, I think we
should be able to afford the 128 bytes of stack space to keep messages from
getting scrambled.

Ken
-- 
Kenneth Merry
k...@freebsd.org


message buffer scrambling fix

2011-05-27 Thread Kenneth D. Merry
Hey folks,

I have attached some patches to the kernel message buffer code (this
affects dmesg(8) output as well as kernel messages that go to the syslog)
to address log scrambling.

This fixes the same issue that 'options PRINTF_BUFR_SIZE=128' fixes for the
console.

The problem is that you can have multiple kernel threads writing to the
message buffer at the same time, and so their characters will get
interleaved.  All of the characters will get in there, because they're
written with atomic operations, but the output might look scrambled.

So the fix is to use the same stack buffer that is used for the console
output (so the stack size doesn't increase), and use a spin lock instead of
atomic operations to insert the string into the message buffer.

The result is that dmesg and syslog output should look the same as the
console output.  As long as individual kernel prints fit in the printf
buffer size, they will be put into the message buffer atomically.
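
A rough userland model of the idea, for illustration only: a pthread mutex stands in for the kernel spin lock, and the names (struct mb_sketch, mb_init(), mb_addstr()) are made up for this example rather than taken from the patch. The whole string is copied into the circular buffer while the lock is held, so two writers cannot interleave their characters.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define MB_SIZE	64

struct mb_sketch {
	pthread_mutex_t lock;	/* stands in for the kernel spin lock */
	char buf[MB_SIZE];	/* circular message buffer */
	unsigned wseq;		/* write sequence; position is wseq % MB_SIZE */
};

static void
mb_init(struct mb_sketch *mb)
{
	pthread_mutex_init(&mb->lock, NULL);
	memset(mb->buf, 0, sizeof(mb->buf));
	mb->wseq = 0;
}

/* Append a NUL-terminated string as one unit. */
static void
mb_addstr(struct mb_sketch *mb, const char *s)
{
	assert(mb != NULL && s != NULL);
	pthread_mutex_lock(&mb->lock);
	for (; *s != '\0'; s++) {
		mb->buf[mb->wseq % MB_SIZE] = *s;
		mb->wseq++;
	}
	pthread_mutex_unlock(&mb->lock);
}
```

With per-character atomic updates, "abc" and "xyz" from two threads could land as "axbycz"; with the lock held across the whole string, the result is always one string followed by the other.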

I also fixed a couple of other long-standing issues.  putcons() (in
subr_prf.c) was adding a carriage return before calling cnputs().  But
cnputs() calls cnputc(), which adds a carriage return before every newline.
So much of the console output (the part that came from putcons() at least)
had two carriage returns at the end.

The other issue was that log_console() was inserting a newline for any
console write that didn't already have one at the end.  The issue with that
can be seen if you do a 'dmesg -a' and compare that to the console output.

You'll see something like this on the console:

Updating motd:.

But this in dmesg -a:

Updating motd:
.

That is because "Updating motd:" is written first, log_console() appends a
newline, and then ".\n" is written.

I added a loader tunable and sysctl to turn the old behavior back on
(kern.log_console_add_linefeed) if you want the old behavior, but I think
we should be able to safely remove it.

Also, the new msgbuf_addstr() function allows the caller to optionally ask
for carriage returns to be stripped out.  However, in my testing I haven't
seen any carriage returns to strip.

Let me know if you have any comments.  I'm planning to check this into head
next week.

Thanks,

Ken
-- 
Kenneth Merry
k...@freebsd.org
Index: sys/kern/subr_msgbuf.c
===
--- sys/kern/subr_msgbuf.c  (revision 222390)
+++ sys/kern/subr_msgbuf.c  (working copy)
@@ -31,8 +31,16 @@
 
 #include <sys/param.h>
 #include <sys/systm.h>
+#include <sys/lock.h>
+#include <sys/mutex.h>
 #include <sys/msgbuf.h>
 
+/*
+ * Maximum number conversion buffer length: uintmax_t in base 2, plus <>
+ * around the priority, and a terminating NUL.
+ */
+#define MAXPRIBUF   (sizeof(intmax_t) * NBBY + 3)
+
 /* Read/write sequence numbers are modulo a multiple of the buffer size. */
 #define SEQMOD(size) ((size) * 16)
 
@@ -51,6 +59,9 @@
    mbp->msg_seqmod = SEQMOD(size);
    msgbuf_clear(mbp);
    mbp->msg_magic = MSG_MAGIC;
+   mbp->msg_lastpri = -1;
+   mbp->msg_needsnl = 0;
+   mtx_init(&mbp->msg_lock, "msgbuf", NULL, MTX_SPIN);
 }
 
 /*
@@ -80,6 +91,11 @@
    }
    msgbuf_clear(mbp);
    }
+
+   mbp->msg_lastpri = -1;
+   /* Assume that the old message buffer didn't end in a newline. */
+   mbp->msg_needsnl = 1;
+   mtx_init(&mbp->msg_lock, "msgbuf", NULL, MTX_SPIN);
 }
 
 /*
@@ -110,28 +126,143 @@
 }
 
 /*
- * Append a character to a message buffer.  This function can be
- * considered fully reentrant so long as the number of concurrent
- * callers is less than the number of characters in the buffer.
- * However, the message buffer is only guaranteed to be consistent
- * for reading when there are no callers in this function.
+ * Add a character into the message buffer, and update the checksum and
+ * sequence number.
+ *
+ * The caller should hold the message buffer spinlock.
  */
+static inline void
+msgbuf_do_addchar(struct msgbuf *mbp, u_int *seq, int c)
+{
+   u_int pos;
+
+   /* Make sure we properly wrap the sequence number. */
+   pos = MSGBUF_SEQ_TO_POS(mbp, *seq);
+
+   mbp->msg_cksum += (u_int)c -
+       (u_int)(u_char)mbp->msg_ptr[pos];
+
+   mbp->msg_ptr[pos] = c;
+
+   *seq = MSGBUF_SEQNORM(mbp, *seq + 1);
+}
+
+/*
+ * Append a character to a message buffer.
+ */
 void
 msgbuf_addchar(struct msgbuf *mbp, int c)
 {
-   u_int new_seq, pos, seq;
+   mtx_lock_spin(&mbp->msg_lock);
 
-   do {
-       seq = mbp->msg_wseq;
-       new_seq = MSGBUF_SEQNORM(mbp, seq + 1);
-   } while (atomic_cmpset_rel_int(&mbp->msg_wseq, seq, new_seq) == 0);
-   pos = MSGBUF_SEQ_TO_POS(mbp, seq);
-   atomic_add_int(&mbp->msg_cksum, (u_int)(u_char)c -
-       (u_int)(u_char)mbp->msg_ptr[pos]);
-   mbp->msg_ptr[pos] = c;
+   msgbuf_do_addchar(mbp, &mbp->msg_wseq, c);
+
+   mtx_unlock_spin(&mbp->msg_lock);
 }
 
 /*

Re: multiple issues with devstat_*(9)

2011-04-11 Thread Kenneth D. Merry
On Thu, Apr 07, 2011 at 13:59:35 +0300, Alexander Motin wrote:
 Alexander Best wrote:
  On Fri Apr  1 11, John Baldwin wrote:
  On Thursday, March 31, 2011 6:33:39 pm Alexander Best wrote:
  i think there are multiple issues with devstat. i found the following in
  devicestat.h:
 
 ...
 
  funny thing is i found the following in scsi_pass.c:
 
  softc->device_stats = devstat_new_entry("pass",
      periph->unit_number, 0,
      DEVSTAT_NO_BLOCKSIZE
      | (no_tags ? DEVSTAT_NO_ORDERED_TAGS : 0),
      softc->pd_type |
      DEVSTAT_TYPE_IF_SCSI |
      DEVSTAT_TYPE_PASS,
      DEVSTAT_PRIORITY_PASS);
 
  ...so pass* *should* show up under iostat -t scsi.
 
 As I can see, this is a bug (or feature) of the libdevstat /
 devstat_selectdevs(). If you specify any -t, then pass devices will be
 reported only if you request pass specifically.
 
  Hmm, pass devices for adaX should not be SCSI though, they should be ide I
  think.
  
  i think the situation with ATA_CAM should be discussed further. still
  besides this issue there are many more with devstat(3).
  
  i'll try to track all the devstat_new_entry() occurrences and see if some
  issues can be fixed. maybe only the proper DEVSTAT_* args were forgotten.
 
 Assuming that SCSI and IDE in -t option means transport type, and
 assuming that we count everything except ATA and SATA as SCSI, I've made
 following patch, that should fix issues from the CAM side:
 http://people.freebsd.org/~mav/cam.devstat.patch
 
 Any objections? Or SCSI/IDE there expected to mean command set?

For what it's worth, I think the above patch is the right approach.  The
device type stuff in devstat has been broken since GEOM went in, so I'm
glad to see you step up to fix it!

Ken
-- 
Kenneth Merry
k...@freebsd.org


Re: multiple issues with devstat_*(9)

2011-04-11 Thread Kenneth D. Merry
On Sun, Apr 10, 2011 at 23:19:31 +0300, Alexander Motin wrote:
 Alexander Best wrote:
  On Thu Apr  7 11, Alexander Motin wrote:
  Alexander Best wrote:
  On Fri Apr  1 11, John Baldwin wrote:
  On Thursday, March 31, 2011 6:33:39 pm Alexander Best wrote:
  i think there are multiple issues with devstat. i found the following in
  devicestat.h:
  ...
 
  funny thing is i found the following in scsi_pass.c:
 
  softc->device_stats = devstat_new_entry("pass",
      periph->unit_number, 0,
      DEVSTAT_NO_BLOCKSIZE
      | (no_tags ? DEVSTAT_NO_ORDERED_TAGS : 0),
      softc->pd_type |
      DEVSTAT_TYPE_IF_SCSI |
      DEVSTAT_TYPE_PASS,
      DEVSTAT_PRIORITY_PASS);
 
  ...so pass* *should* show up under iostat -t scsi.
  As I can see, this is a bug (or feature) of the libdevstat /
  devstat_selectdevs(). If you specify any -t, then pass devices will be
  reported only if you request pass specifically.
 
  Hmm, pass devices for adaX should not be SCSI though, they should be ide
  I think.
  i think the situation with ATA_CAM should be discussed further. still
  besides this issue there are many more with devstat(3).
 
  i'll try to track all the devstat_new_entry() occurrences and see if some
  issues can be fixed. maybe only the proper DEVSTAT_* args were forgotten.
  Assuming that SCSI and IDE in -t option means transport type, and
  assuming that we count everything except ATA and SATA as SCSI, I've made
  following patch, that should fix issues from the CAM side:
  http://people.freebsd.org/~mav/cam.devstat.patch
  
  with your patch i get the following output:
  
  otaku% iostat -t ide
         tty            ada0             ada1             cpu
   tin  tout  KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
     6   144 14.21   6  0.09  20.46  40  0.81   2  0  3  0 95
  otaku% iostat -t scsi
         tty             cd0             cpu
   tin  tout  KB/t tps  MB/s  us ni sy in id
     6   146  2.32   0  0.00   2  0  3  0 95
  otaku% iostat -t pass
         tty           pass0            pass1            pass2             cpu
   tin  tout  KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
     6   147  0.36   0  0.00   0.36   0  0.00   0.00   0  0.00   2  0  3  0 95
  otaku% iostat -t da
         tty            ada0             ada1             cpu
   tin  tout  KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
     6   147 14.21   6  0.08  20.46  37  0.75   1  0  3  0 95
  otaku% iostat -t cd
         tty             cd0             cpu
   tin  tout  KB/t tps  MB/s  us ni sy in id
     7   147  2.32   0  0.00   1  0  3  0 95
  otaku% iostat -t other
         tty            cpu
   tin  tout us ni sy in id
     7   149  1  0  3  0 95
  otaku% iostat -n 100
         tty            ada0             ada1              cd0            pass0            pass1            pass2             cpu
   tin  tout  KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
     6   135 14.21   5  0.07  20.44  32  0.64   2.32   0  0.00   0.36   0  0.00   0.36   0  0.00   0.00   0  0.00   1  0  3  0 96
  
  the remaining issues imho are:
  
  1) ada* and cd* are SATA/ATA devices. so i think they should show up together
     either under ide *or* scsi. i don't have any *real* scsi devices.
 
 I've just retested the patch and haven't reproduced your problem:
 %iostat -d
        da0             ada0              da1              cd0
   KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s
   0.01   0  0.00   3.27   1  0.00   2.65   1  0.00   0.00   0  0.00
 %iostat -d -t ide
        da0             ada0              cd0
   KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s
   0.01   0  0.00   3.27   1  0.00   0.00   0  0.00
 %iostat -d -t scsi
        da1
   KB/t tps  MB/s
   2.65   1  0.00
 %iostat -d -t pass
       pass0            pass1            pass2            pass3
   KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s
   0.00   0  0.00   0.00   0  0.00   0.00   0  0.00   0.00   0  0.00
 %iostat -d -t ide,pass
       pass0            pass1            pass2
   KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s
   0.00   0  0.00   0.00   0  0.00   0.00   0  0.00
 %iostat -d -t scsi,pass
       pass3
   KB/t tps  MB/s
   0.00   0  0.00
 
 da0 is a PATA ATAPI ZIP, da1 - USB floppy, ada0 - SATA HDD, cd0 - PATA
 ATAPI CD-ROM.
 
 Just an idea, aren't you using legacy ata(4) + atapicam for your
 cd0? atapicam lies that its buses are SPI (SCSI).
 
  2) the pass* devices still don't show up under ide/scsi/other. that's ok, but
     then the src comments and manual pages need to be changed accordingly.
 
 As I said, it is a bug/feature of libdevstat. It should not be
 difficult to fix, if it is really a bug.

Re: multiple issues with devstat_*(9)

2011-04-11 Thread Kenneth D. Merry
On Mon, Apr 11, 2011 at 19:09:00 +0300, Alexander Motin wrote:
 On 11.04.2011 18:43, Kenneth D. Merry wrote:
 On Sun, Apr 10, 2011 at 23:19:31 +0300, Alexander Motin wrote:
 Alexander Best wrote:
 2) the pass* devices still don't show up under ide/scsi/other. that's ok, but
 then the src comments and manual pages need to be changed accordingly.
 
 As I said, it is a bug/feature of libdevstat. It should not be
 difficult to fix, if it is really a bug.
 
 That was intentional, if I can remember what I intended in 1997/1998.
 
 The reason was that since there is one passthrough device created for every
 device that CAM manages, you don't want to show pass(4) devices when the
 user says 'iostat -t scsi'.  Otherwise he might get all pass(4) devices,
 depending on the order of devices in the system.
 
 But, if it's pass(4) devices you want, you can ask for them specifically,
 or for all SCSI/IDE pass(4) devices, as mav did above.
 
 But it is impossible to get, for example, all SCSI devices including
 pass. Either only non-pass, or pass only.
 
 It is strange that if I don't specify -t (most probable for
 inexperienced users), I'll get all devices including pass, but if I
 specify -t scsi (as a more advanced user who knows what to ask for),
 I'll get only non-pass. It is at least inconsistent.

Perhaps it is somewhat inconsistent, and we should do some filtering by
default to not show pass(4) devices.

The idea was that in most cases, people will not want to see the pass(4)
devices.  That is not where most of the I/O typically happens.  If they
want to see the pass(4) devices, they can ask for them specifically by type
or by name.

When I have a system full of drives and I want to look at one particular
pass(4) device, I always specify it manually, e.g.:  'iostat -d pass4 1'
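
The selection rule discussed in this thread can be modeled roughly as follows.
This is a simplified sketch, not the libdevstat implementation: struct dev,
select_dev(), and the IF_*/TYPE_PASS constants are made up for illustration and
only mimic the idea behind the DEVSTAT_TYPE_* flags.

```c
#include <stdbool.h>

/* Illustrative flag values only; the real ones live in sys/devicestat.h. */
enum {
	IF_SCSI   = 0x010,	/* interface: SCSI */
	IF_IDE    = 0x020,	/* interface: IDE/ATA */
	IF_MASK   = 0x0f0,
	TYPE_PASS = 0x100	/* passthrough device */
};

struct dev {
	const char *name;
	int flags;		/* interface bits, optionally | TYPE_PASS */
};

/*
 * Decide whether a device is shown.
 *   want_if   - interface requested with -t (0 = no interface filter)
 *   want_pass - true if "pass" was requested explicitly
 *   filtered  - true if any -t argument was given at all
 */
bool
select_dev(const struct dev *d, int want_if, bool want_pass, bool filtered)
{
	bool is_pass = (d->flags & TYPE_PASS) != 0;

	if (!filtered)
		return (true);		/* no -t: everything is shown */
	if (is_pass != want_pass)
		return (false);		/* pass devices only on request */
	if (want_if != 0 && (d->flags & IF_MASK) != want_if)
		return (false);
	return (true);
}
```

With this model, "-t scsi" yields only non-pass SCSI devices and "-t scsi,pass"
yields only SCSI pass devices, which is the either/or behavior described above.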

Ken
-- 
Kenneth Merry
k...@freebsd.org
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


LSI 6Gb SAS driver committed

2010-09-10 Thread Kenneth D. Merry

I sent this out to the -scsi list earlier today.  Testers would be
appreciated for the 6Gb LSI SAS driver.

Please follow up to me or the -scsi list.

Thanks,

Ken

- Forwarded message from Kenneth D. Merry k...@freebsd.org -

Date: Fri, 10 Sep 2010 09:04:38 -0600
From: Kenneth D. Merry k...@freebsd.org
To: s...@freebsd.org
Subject: LSI 6Gb SAS driver committed

Hey folks,

I have committed the mps driver (LSI Logic 6Gb SAS controller driver) to the
FreeBSD perforce server (//depot/projects/mps/...) and FreeBSD-current.

The driver works with SAS and SATA drives, directly attached or attached
through expanders.  Basic error recovery works as well (i.e. timeouts and
aborts).

There are some known issues, including:

 - No support for integrated RAID (IR) arrays.

 - Devices tend to disappear and come back in one of my configurations.  I
   also see some phantom devices, and events that don't make sense:

mps0: Unhandled event 0x0
mps0: Unhandled event 0x0
mps0: Unhandled event 0x0
mps0: Unhandled event 0x0
mps0: Unhandled event 0x0
mps0: Unhandled event 0x0
(da2:mps0:0:6:0): SCSI command timeout on device handle 0x0017 SMID 90
mps0: mpssas_abort_complete: abort request on handle 0x17 SMID 90 complete
mps0: Unhandled event 0x0
(probe2:mps0:0:2:0): AutoSense failed
mps0: Unhandled event 0x0
(da10:mps0:0:0:0): unsupportable block size 0
(da10:mps0:0:0:0): lost device
(da10:mps0:0:0:0): removing device entry
(da2:mps0:0:6:0): lost device
(da2:mps0:0:6:0): removing device entry
da2 at mps0 bus 0 scbus0 target 6 lun 0
da2: ATA ST3160023AS 3.05 Fixed Direct Access SCSI-5 device
da2: 150.000MB/s transfers
da2: Command Queueing enabled
da2: 152627MB (312581808 512 byte sectors: 255H 63S/T 19457C)
mps0: Unhandled event 0x0
mps0: Unhandled event 0x0
mps0: Unhandled event 0x0
mps0: Unhandled event 0x0
mps0: Unhandled event 0x0
mps0: Unhandled event 0x0
mps0: Unhandled event 0x0
mps0: Unhandled event 0x0
mps0: Unhandled event 0x0
mps0: Unhandled event 0x0
mps0: Unhandled event 0x0
mps0: Unhandled event 0x0


 - Sometimes you'll run into a device that fails part of the probe on boot,
   and you'll end up running into the run_interrupt_driven_config_hooks
   timeout.  You see some aborts during probe, and then the 5 minute probe
   timeout kicks in and panics the kernel.  For instance:

(probe4:mps0:0:20:0): SCSI command timeout on device handle 0x0012 SMID 81
mps0: mpssas_abort_complete: abort request on handle 0x12 SMID 81 complete
run_interrupt_driven_hooks: still waiting after 60 seconds for xpt_config
(probe4:mps0:0:20:0): SCSI command timeout on device handle 0x0012 SMID 214
mps0: mpssas_abort_complete: abort request on handle 0x12 SMID 214 complete
run_interrupt_driven_hooks: still waiting after 120 seconds for xpt_config
run_interrupt_driven_hooks: still waiting after 180 seconds for xpt_config
(probe4:mps0:0:20:0): SCSI command timeout on device handle 0x0012 SMID 281
mps0: mpssas_abort_complete: abort request on handle 0x12 SMID 281 complete
run_interrupt_driven_hooks: still waiting after 240 seconds for xpt_config
(probe4:mps0:0:20:0): SCSI command timeout on device handle 0x0012 SMID 348
mps0: mpssas_abort_complete: abort request on handle 0x12 SMID 348 complete
run_interrupt_driven_hooks: still waiting after 300 seconds for xpt_config
(probe4:mps0:0:20:0): SCSI command timeout on device handle 0x0012 SMID 415
mps0: mpssas_abort_complete: abort request on handle 0x12 SMID 415 complete
panic: run_interrupt_driven_config_hooks: waited too long
cpuid = 0
KDB: enter: panic
[ thread pid 0 tid 10 ]
Stopped at  kdb_enter+0x3d: movq    $0,0x4c70b0(%rip)
db>

 - ioctl support isn't complete, and there is no userland utility.

 - There is no man page.
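
The boot-time probe timeout in the third item above looks, from the log, like a
wait loop that warns once a minute and gives up after five minutes.  A rough
model of that behavior follows.  The 60-second interval and 300-second limit
are read off the log, and the function and constant names are hypothetical,
not the kernel's.

```c
#include <stdio.h>

#define WARN_INTERVAL_SECS 60	/* assumed from the log timestamps */
#define MAX_WAIT_SECS      300	/* assumed: panic follows the 300 s warning */

/*
 * Simulate waiting for a config hook that needs `needed_secs` to finish.
 * Returns the number of warnings printed, or -1 where the kernel would panic.
 */
int
wait_for_config_hooks(int needed_secs)
{
	int waited = 0, warnings = 0;

	for (;;) {
		/* The hook completes during this sleep interval. */
		if (needed_secs - waited <= WARN_INTERVAL_SECS)
			return (warnings);
		waited += WARN_INTERVAL_SECS;
		if (waited > MAX_WAIT_SECS) {
			printf("panic: waited too long\n");
			return (-1);
		}
		printf("still waiting after %d seconds\n", waited);
		warnings++;
	}
}
```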

The driver is in the tree at this point to allow people to test it out,
report any problems, and hopefully contribute bug fixes.

LSI has some developers working on this driver, and we hope to get them to
put some of their work-in-progress in the FreeBSD Perforce repo.  So, in
view of that, if you make any changes to the driver, please make them in
the FreeBSD Perforce repository first (in //depot/projects/mps/...) and
then merge them into FreeBSD-current.

Thanks to Scott Long for writing the driver, and to Yahoo and Spectra Logic
for sponsoring the work.

Ken
-- 
Kenneth Merry
k...@freebsd.org

- End forwarded message -

-- 
Kenneth Merry
k...@freebsd.org


HEADS UP: CAM error recovery change

2003-10-27 Thread Kenneth D. Merry

I checked in a change to the CAM error recovery code that will hopefully
have a positive effect on systems with CDROM drives that were taking a
while to probe.

Anyway, try this out and let me know if there are any regressions.

Thanks,

Ken

- Forwarded message from Kenneth D. Merry [EMAIL PROTECTED] -

From: Kenneth D. Merry [EMAIL PROTECTED]
Date: Sun, 26 Oct 2003 22:15:55 -0800 (PST)
To: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: cvs commit: src/sys/cam cam_periph.c src/sys/cam/scsi scsi_cd.c

ken 2003/10/26 22:15:55 PST

  FreeBSD src repository

  Modified files:
sys/cam  cam_periph.c 
sys/cam/scsi scsi_cd.c 
  Log:
  In camperiphdone(), make sure we check for fatal errors and bail out
  instead of retrying them blindly.
  
  This should fix some of the problems people have been having with cdrom
  drives taking a long time to probe.  This should also eliminate the need
  for the initial TUR in cdsize().
  
  cam_periph.c:   Don't keep retrying if the error we get back is a fatal
                  error.  This should help us detect the transition from
                  "Logical unit not ready, cause not reportable" to "Medium
                  not present" in the "TUR many" handler.  (The TUR many
                  handler gets triggered for "Logical unit not ready, cause
                  not reportable" errors.)
  
  scsi_cd.c:      Remove the initial test unit ready in cdsize().  Hopefully
                  it isn't necessary after the above change.
  
  Submitted by:   gibbs (mostly)
  Tested by:  peter
  MFC After:  2 weeks
  
  Revision  ChangesPath
  1.55  +17 -2 src/sys/cam/cam_periph.c
  1.88  +0 -14 src/sys/cam/scsi/scsi_cd.c

- End forwarded message -
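
The cam_periph.c change amounts to checking whether an error is fatal before
spending another retry on it.  A minimal sketch of that policy, with made-up
status classes and a scripted stand-in for the device rather than the actual
CAM types:

```c
enum cmd_status { CMD_OK, CMD_RETRYABLE, CMD_FATAL };

typedef enum cmd_status (*issue_fn)(void *arg);

/*
 * Issue a command up to 1 + max_retries times, but stop immediately on a
 * fatal error (for example "medium not present") instead of retrying.
 * Stores the number of attempts made in *attempts.
 */
enum cmd_status
run_with_retries(issue_fn issue, void *arg, int max_retries, int *attempts)
{
	enum cmd_status st = CMD_RETRYABLE;

	*attempts = 0;
	while (*attempts <= max_retries) {
		st = issue(arg);
		(*attempts)++;
		if (st != CMD_RETRYABLE)
			break;		/* success or fatal: stop now */
	}
	return (st);
}

/* Demo helper: replays a fixed list of results, repeating the last one. */
struct script {
	const enum cmd_status *steps;
	int nsteps;
	int pos;
};

enum cmd_status
scripted_issue(void *arg)
{
	struct script *s = arg;
	int i = s->pos < s->nsteps ? s->pos : s->nsteps - 1;

	s->pos++;
	return (s->steps[i]);
}
```

A device that reports "not ready" twice and then "medium not present" is given
up on after three attempts, rather than after the full retry budget.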

-- 
Kenneth Merry
[EMAIL PROTECTED]


Re: cd0 errors during probe?

2003-10-12 Thread Kenneth D. Merry
On Sun, Oct 12, 2003 at 10:26:54 -0700, Steve Kargl wrote:
 Can I assume that the following error messages are
 erroneous because cd0 appears to function without
 any problems?  There is a CD in the drive.
 
 cd0 at ahc0 bus 0 target 4 lun 0
 cd0: TOSHIBA CD-ROM XM-6401TA 1001 Removable CD-ROM SCSI-2 device 
 cd0: 10.000MB/s transfers (10.000MHz, offset 15)
 cd0: cd present [129875 x 2048 byte records]
 (cd0:ahc0:0:4:0): READ(10). CDB: 28 0 0 0 0 0 0 0 1 0 
 (cd0:ahc0:0:4:0): CAM Status: SCSI Status Error
 (cd0:ahc0:0:4:0): SCSI Status: Check Condition
 (cd0:ahc0:0:4:0): BLANK CHECK asc:64,0
 (cd0:ahc0:0:4:0): Illegal mode for this track
 (cd0:ahc0:0:4:0): Retrying Command (per Sense Data)
 (cd0:ahc0:0:4:0): READ(10). CDB: 28 0 0 0 0 0 0 0 1 0 
 (cd0:ahc0:0:4:0): CAM Status: SCSI Status Error
 (cd0:ahc0:0:4:0): SCSI Status: Check Condition
 (cd0:ahc0:0:4:0): BLANK CHECK asc:64,0
 (cd0:ahc0:0:4:0): Illegal mode for this track
 (cd0:ahc0:0:4:0): Retrying Command (per Sense Data)
 (cd0:ahc0:0:4:0): READ(10). CDB: 28 0 0 0 0 0 0 0 1 0 
 (cd0:ahc0:0:4:0): CAM Status: SCSI Status Error
 (cd0:ahc0:0:4:0): SCSI Status: Check Condition
 (cd0:ahc0:0:4:0): BLANK CHECK asc:64,0
 (cd0:ahc0:0:4:0): Illegal mode for this track
 (cd0:ahc0:0:4:0): Retrying Command (per Sense Data)

Looks like GEOM is trying to read the first sector of the CD, but since
it's likely an audio CD, it doesn't quite work.

Ken
-- 
Kenneth Merry
[EMAIL PROTECTED]


Re: cd0 errors during probe?

2003-10-12 Thread Kenneth D. Merry
On Sun, Oct 12, 2003 at 14:29:38 -0700, Steve Kargl wrote:
 On Sun, Oct 12, 2003 at 02:51:50PM -0600, Kenneth D. Merry wrote:
  On Sun, Oct 12, 2003 at 10:26:54 -0700, Steve Kargl wrote:
   Can I assume that the following error messages are
   erroneous because cd0 appears to function without
   any problems?  There is a CD in the drive.
   
   cd0 at ahc0 bus 0 target 4 lun 0
   cd0: TOSHIBA CD-ROM XM-6401TA 1001 Removable CD-ROM SCSI-2 device 
   cd0: 10.000MB/s transfers (10.000MHz, offset 15)
   cd0: cd present [129875 x 2048 byte records]
   (cd0:ahc0:0:4:0): READ(10). CDB: 28 0 0 0 0 0 0 0 1 0 
   (cd0:ahc0:0:4:0): CAM Status: SCSI Status Error
   (cd0:ahc0:0:4:0): SCSI Status: Check Condition
   (cd0:ahc0:0:4:0): BLANK CHECK asc:64,0
   (cd0:ahc0:0:4:0): Illegal mode for this track
   (cd0:ahc0:0:4:0): Retrying Command (per Sense Data)
  
  Looks like GEOM is trying to read the first sector of the CD, but since
  it's likely an audio CD, it doesn't quite work.
  
 
 Yes, it is an audio CD.  I suspected that it was a 
 transient GEOM/CAM issue, but wanted to make sure
 before I needlessly replaced the cdrom drive.

There's nothing wrong with your drive, most likely.

I suppose it plays audio CDs and reads data CDs okay?  If so, it's nothing
to worry about.

Back when the cd(4) driver used the old slice code, it had a function,
cdfirsttrackisdata(), that figured out whether the first track was an audio
or data track.  It would set the flags in the disk structure accordingly to
tell the slice code whether or not to attempt to read a disklabel from the
CD.

The code in -stable still works that way.

My guess is that we need something similar again to tell GEOM not to
attempt to read the first sector of the CD when it's not a data CD.
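
As a rough illustration of the kind of check cdfirsttrackisdata() performed: in
a CD table of contents, each track entry carries a 4-bit control field, and bit
0x04 of that field marks a data track.  The struct and function below are
simplified stand-ins, not the driver's code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOC_CTRL_DATA 0x04	/* control bit 2: track holds data, not audio */

struct toc_entry {		/* hypothetical, trimmed-down TOC entry */
	uint8_t track;
	uint8_t control;	/* low 4 bits: Q sub-channel control field */
};

/* Return true if the first TOC entry describes a data track. */
bool
first_track_is_data(const struct toc_entry *toc, int nentries)
{
	if (toc == NULL || nentries < 1)
		return (false);
	return ((toc[0].control & TOC_CTRL_DATA) != 0);
}
```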

Ken
-- 
Kenneth Merry
[EMAIL PROTECTED]


Re: ath(4) driver problems with WEP...

2003-09-19 Thread Kenneth D. Merry
On Fri, Sep 19, 2003 at 08:22:13 -0700, Sam Leffler wrote:
  Hmm.  One other thing I'm seeing is that when I configure a 128 bit key
  with ifconfig or wicontrol (wicontrol shows all 28 characters -- 0x plus
  26 hex characters), ifconfig still thinks it is a 104 bit key.  This is
  because ireq.i_len is 13.
  
 
 You must not have the up to date ifconfig.  This was the behaviour from
 before.  I believe the mechanism that wicontrol uses to fetch keys does not
 handle 104-bit keys so you see the zero-padded key string.

I rebuilt ifconfig from the sources you checked in, although I didn't do
a full buildworld.  Does it depend on a library change somewhere?

The key I got from wicontrol wasn't zero-padded.  It showed all 26 hex
characters.

   In a separate issue, the ath(4) driver can't see the 802.11a side of
   the wireless router at all when it is running in 108Mbps turbo mode.
   If I drop it down to 54Mbps, it sees it.  (Works fine in Windows.)
   
   Is the ath(4) driver supposed to support the 108Mbps turbo mode?
  
  I was able to associate with an Atheros AP with turbo mode enabled but
  didn't get any higher throughput.  I'm investigating this.
  
  FWIW I enabled turbo mode with:
  
  ifconfig ath0 mediaopt turbo
  
  I had to also set the mode to 11a before it wanted to accept the turbo
  option.  Otherwise I got:
  
  ifconfig: SIOCSIFMEDIA (mediaopt): Device not configured
  
 
 Ah, yes.  Turbo mode is orthogonal to 11a/b/g.  You can use it with 11g too
 so you need to specify 11a or 11g.  I was already locked in 11a mode.
 (But note that 11g+turbo is not yet supported by the driver.)

Ahh.  I take it there's no way for the driver/card to autodetect that
there's a turbo network around and attach to it?  (The Windows driver seems
to find it..)

  Then I typed:
  
 # ifconfig ath0 mode 11a mediaopt turbo
  atalk 0.0 range 0-0 phase 2
  
  Does it think I'm doing appletalk or something?
 
 Hmm, didn't see this, will have to check.
 
  
  It seems to see the base station in turbo mode now, but I'm still getting
  the authentication failed (reason 13) errors.
 
 Are you using WEP?  As I explained WEP doesn't work right now.

Yeah, I know.  I need to see if I can get an IPSec tunnel running to the
router.  My guess is that it will probably only want to talk IPSec over
its internet port.

Ken
-- 
Kenneth Merry
[EMAIL PROTECTED]


Re: ath(4) driver problems with WEP...

2003-09-18 Thread Kenneth D. Merry
On Wed, Sep 17, 2003 at 12:43:08 -0700, Sam Leffler wrote:
  I've got a Netgear WAG511 (Atheros 5212-based card) and a Netgear FWAG114
  wireless router.
  
  I've been trying to get the card and the router talking under FreeBSD.
  (Both 802.11a and 802.11g work fine under Windows on the same machine.)
  
  I'm using -current from September 15th.
  
  Anyway, whenever I try to get the card talking to the router, which is
  running WEP (128 bit keys) on both the a and b/g sides, I get:
  
  ath0: authentication failed (reason 13) for [ base station MAC address ]
  ath0: authentication failed (reason 13) for [ base station MAC address ]
  ath0: authentication failed (reason 13) for [ base station MAC address ]
  ath0: authentication failed (reason 13) for [ base station MAC address ]
  ath0: authentication failed (reason 13) for [ base station MAC address ]
  
  Here's what the ifconfig looks like:
  
  ath0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
  ether [ card mac address ]
  media: IEEE 802.11 Wireless Ethernet autoselect mode 11a
  (OFDM/6Mbps) status: no carrier
  ssid [my ssid] 1:[my ssid]
  channel -1 authmode OPEN powersavemode OFF powersavesleep 100
  wepmode MIXED weptxkey 1
  wepkey 1:128-bit wepkey 2:128-bit wepkey 3:128-bit wepkey
  4:128-bit
  
  I've verified and re-verified, via cut-and-paste from the router setup
  screen, that the WEP key is correct.
  
 
 Good news+bad news.  I just committed a fix to ifconfig to correctly handle
 128-bit WEP keys.  I'm not sure how you thought you were setting your key
 up but ifconfig was barfing on anything more than 104 bits.  FWIW ifconfig
 wrongly indicated keys 5 bytes (40 bits) were 128-bit keys; I also fixed
 that so ifconfig now indicates keys are 40-, 104-, or 128-bit according to
 their length.  Beware also that wicontrol displays WEP keys longer than 104
 bits zero-padded; I believe this is because of limitations in the RID API
 for fetching keys.  Someone else may want to investigate that issue.
 
 The bad news is that with 128-bit keys installed I'm getting decryption
 errors at the AP.  Actually, I'm seeing errors for any length key so it's
 likely a botch in the WEP frame construction in the driver.  I've run out
 of time to look at this right now and will have to investigate later.

Hmm.  One other thing I'm seeing is that when I configure a 128 bit key
with ifconfig or wicontrol (wicontrol shows all 28 characters -- 0x plus 26
hex characters), ifconfig still thinks it is a 104 bit key.  This is
because ireq.i_len is 13.
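
For reference, the arithmetic behind that length: a key typed as 0x followed by
26 hex digits decodes to 13 bytes, which is 104 bits of key material.  Many
vendors label such keys "128-bit" by counting the 24-bit IV as well, which may
be part of the confusion here.  A small sketch (the helper name is made up)
that derives the bit count from such a string:

```c
#include <ctype.h>
#include <string.h>

/*
 * Return the key length in bits for a hex key string such as "0x0123...",
 * or -1 if the string is not a well-formed hex key.
 */
int
wep_key_bits(const char *s)
{
	size_t i, ndigits;

	if (strncmp(s, "0x", 2) != 0)
		return (-1);
	s += 2;
	ndigits = strlen(s);
	if (ndigits == 0 || ndigits % 2 != 0)
		return (-1);
	for (i = 0; i < ndigits; i++)
		if (!isxdigit((unsigned char)s[i]))
			return (-1);
	return ((int)(ndigits / 2) * 8);	/* two hex digits per byte */
}
```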

  Anyway, I can't get the ath(4) driver to talk to the router when it is
  running WEP.  I have been able to get it to talk 802.11g to the router
  without WEP enabled, though.
  
  I tried setting the authmode to shared via ifconfig, but from looking at
  ieee80211_ioctl.c:
  
 # if 0
  case IEEE80211_IOC_AUTHMODE:
  sc->wi_authmode = ireq->i_val;
  break;
 # endif
  
  i.e. I get EINVAL back.
  
  Is WEP supposed to work in -current?
  
 
 authmode is not relevant.  WEP worked at one time; I seem to have broken
 it.  As I said above I will have to look at it later.

Okay.

  In a separate issue, the ath(4) driver can't see the 802.11a side of the
  wireless router at all when it is running in 108Mbps turbo mode.  If I
  drop it down to 54Mbps, it sees it.  (Works fine in Windows.)
  
  Is the ath(4) driver supposed to support the 108Mbps turbo mode?
 
 I was able to associate with an Atheros AP with turbo mode enabled but
 didn't get any higher throughput.  I'm investigating this.
 
 FWIW I enabled turbo mode with:
 
 ifconfig ath0 mediaopt turbo

I had to also set the mode to 11a before it wanted to accept the turbo
option.  Otherwise I got:

ifconfig: SIOCSIFMEDIA (mediaopt): Device not configured

Then I typed:

# ifconfig ath0 mode 11a mediaopt turbo
atalk 0.0 range 0-0 phase 2

Does it think I'm doing appletalk or something?

It seems to see the base station in turbo mode now, but I'm still getting
the authentication failed (reason 13) errors.

 I verified turbo mode was in use by disabling it on either station or AP
 side and with things mismatched the station/AP couldn't see each other.
 With turbo mode enabled on each side I was able to associate and
 communicate as normal; but netperf throughput was identical to the
 non-turbo setup.  I'm asking Atheros folks for clarification on this--I may
 need to do some additional setup work to enable turbo operation.  This is
 actually the first time I've tried turbo mode...

Ahh.

Thanks!

Ken
-- 
Kenneth Merry
[EMAIL PROTECTED]


ath(4) driver problems with WEP...

2003-09-16 Thread Kenneth D. Merry

I've got a Netgear WAG511 (Atheros 5212-based card) and a Netgear FWAG114
wireless router.

I've been trying to get the card and the router talking under FreeBSD.
(Both 802.11a and 802.11g work fine under Windows on the same machine.)

I'm using -current from September 15th.

Anyway, whenever I try to get the card talking to the router, which is
running WEP (128 bit keys) on both the a and b/g sides, I get:

ath0: authentication failed (reason 13) for [ base station MAC address ]
ath0: authentication failed (reason 13) for [ base station MAC address ]
ath0: authentication failed (reason 13) for [ base station MAC address ]
ath0: authentication failed (reason 13) for [ base station MAC address ]
ath0: authentication failed (reason 13) for [ base station MAC address ]

Here's what the ifconfig looks like:

ath0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
ether [ card mac address ]
media: IEEE 802.11 Wireless Ethernet autoselect mode 11a (OFDM/6Mbps)
status: no carrier
ssid [my ssid] 1:[my ssid]
channel -1 authmode OPEN powersavemode OFF powersavesleep 100
wepmode MIXED weptxkey 1
wepkey 1:128-bit wepkey 2:128-bit wepkey 3:128-bit wepkey 4:128-bit

I've verified and re-verified, via cut-and-paste from the router setup
screen, that the WEP key is correct.

Anyway, I can't get the ath(4) driver to talk to the router when it is
running WEP.  I have been able to get it to talk 802.11g to the router
without WEP enabled, though.

I tried setting the authmode to shared via ifconfig, but from looking at
ieee80211_ioctl.c:

#if 0
case IEEE80211_IOC_AUTHMODE:
sc->wi_authmode = ireq->i_val;
break;
#endif

i.e. I get EINVAL back.

Is WEP supposed to work in -current?

In a separate issue, the ath(4) driver can't see the 802.11a side of the
wireless router at all when it is running in 108Mbps turbo mode.  If I
drop it down to 54Mbps, it sees it.  (Works fine in Windows.)

Is the ath(4) driver supposed to support the 108Mbps turbo mode?

Thanks,

Ken
-- 
Kenneth Merry
[EMAIL PROTECTED]


Re: scsi_cd or atapicam crash in current.

2003-09-12 Thread Kenneth D. Merry
On Fri, Sep 12, 2003 at 08:57:22 -0700, Kevin Oberman wrote:
 I am seeing a peculiar, possibly timing sensitive, crash that looks
 like it is probably in either atapicam or scsi_cd. The system is
 CURRENT as of yesterday morning.
 
 The crash happens frequently when nautilus starts up. It does not
 always crash, but does so fairly frequently and leaves my laptop
 locked in X with no access to the console. If nautilus starts, the
 system continues without problems until X is terminated and restarted.
 
 I managed to get a panic printout by switching back to vty0 (console)
 while the X startup was in progress and I am entering the panic by
 hand. Slight chance of a typo, but I have checked it a couple of
 times.
 
 For some reason I can't explain, I didn't get a crash dump, but I
 probably can get one after a future crash. The easy fix is to remove
 the DVD/CD-RW drive. FWIW, the system is an IBM T30 and it happens
 with either APM or ACPI. I am attaching the dmesg and the config
 file. Hopefully the mailer won't strip them.
 
 Fatal trap 18: integer divide fault while in kernel mode
 instruction pointer = 0x8:0xc0139a8b
 stack pointer   = 0x10:0xdd5b6a38
 frame pointer   = 0x10:0xdd5b6a80
 code segment= base 0x0, limit 0x, type 0x1b
 = DPL 0, pres 1, def32 1, gran 1
 processor eflags= Interrupt enabled, resume, IOPL = 0
 current process = 737 (nautilus)
 kernel: type 18 trap, code 0
 Stopped at  cdstart+0xcb:   divl    0x30(%ebx), %eax
 db> tr
 cdstart(c419d500,c4192000,1,c407cc30,c407cc00) at cdstart+0xcb
 xpt_run_dev_allocq(c40b8c00,c407cc08,1,c418d800,c419d500) at xpt_run_dev_allocq+0xab
 xpt_schedule(c419d500,1,0,ce54ec78,dd5b6c70) at xpt_schedule+0xca
 cdstrategy(ce54ec78,0,0,0,d439f000) at cdstrategy+0x88
 physio(c4197700,dd5b6c70,10,dd5b6b78,c03f4900) at physio+0x2df
 spec_read(dd5b6bd0,dd5b6c20,c02b35e3,dd5b6bd0,1020002) at spec_read+0x19a
 spec_vnoperate(dd5b6bd0,1020002,c470c850,0,dd5b6c70) at spec_vnoperate+0x18
 vn_read(c489d8c4,dd5b6c70,c478ee00,0,c470c850) at vn_read+0x1a3
 dofileread(c470c850,c489d8c4,12,bfbfeb40,800) at dofileread+0xd9
 read(c470c850,dd5b6d10,c,c,3) at read+0x6b
 syscall(2f,2f,2f,80cb000,0) at syscall+0x2b0
 Xint0x80_syscall() at Xint0x80_syscall+0x1d
 --- syscall (3, FreeBSD ELF32, read), eip = 0x28da2b5f, esp = 0xbfbfeadc, ebp = 0xbfbfeb08 ---
 db>

Other folks have reported seeing bogus values returned from read capacity
for atapicam-attached drives.

I've seen it on my laptop as well (which runs -current).  (Only since the
ATAng code went in.  It worked fine before.)

cdstart() uses the blocksize to try to figure out the LBA to pass to the
SCSI read or write commands, so that's likely what's causing the integer
divide fault.
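
To illustrate the failure mode: converting a byte offset to an LBA divides by
the block size, so a block size of zero from a bogus read capacity result
traps.  A sketch of a guarded conversion (names are illustrative; this is not
the cdstart() code):

```c
#include <stdint.h>

/*
 * Convert a byte offset into a logical block address.
 * Returns 0 on success, -1 if the block size is unusable.
 */
int
offset_to_lba(uint64_t offset, uint32_t blksize, uint64_t *lba)
{
	if (blksize == 0)
		return (-1);	/* would otherwise be an integer divide fault */
	*lba = offset / blksize;
	return (0);
}
```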

What does dmesg say about the size of the disk in the drive?  Do you have a
CD in the drive?

What happens when you do:

camcontrol cmd cd0 -v -c "25 0 0 0 0 0 0 0 0 0" -i 8 "i4 i4"

That should give you the media size and blocksize of the CD in the drive,
or an error if you don't have any media.
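
The eight bytes that READ CAPACITY(10) returns are two big-endian 32-bit
values: the address of the last logical block, then the block length in bytes.
A sketch of decoding them by hand (the struct and function names are made up
for illustration):

```c
#include <stdint.h>

struct read_cap {
	uint32_t last_lba;	/* address of the last block; count is this + 1 */
	uint32_t blksize;	/* block length in bytes */
};

/* Read a big-endian 32-bit value from four bytes. */
static uint32_t
be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	    (uint32_t)p[2] << 8 | (uint32_t)p[3]);
}

/* Decode the 8-byte READ CAPACITY(10) parameter data. */
struct read_cap
decode_read_capacity(const uint8_t buf[8])
{
	struct read_cap rc;

	rc.last_lba = be32(buf);
	rc.blksize = be32(buf + 4);
	return (rc);
}
```

A block length of 0, or a capacity reported with no media in the drive, is the
kind of bogus result that points at the ATAPI or atapicam layer.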

If you're getting bogus values for the media/blocksize, or if it says
there's a disk there when there isn't one, then you've got a problem either
with the ATAPI or atapicam code.

Ken
-- 
Kenneth Merry
[EMAIL PROTECTED]


Re: wi0: cardbus card activation failed

2003-09-04 Thread Kenneth D. Merry
On Thu, Sep 04, 2003 at 10:02:10 -0600, M. Warner Losh wrote:
 In message: [EMAIL PROTECTED]
 [EMAIL PROTECTED] writes:
 : Hello,
 : I have same problem as 
 http://lists.freebsd.org/pipermail/freebsd-current/2003-August/008948.html,
 :  but with other PCMCIA card - new Proxim Orinoco.
 : When I plug it in, I get:
 : 
 : cardbus0: network, ethernet at device 0.0 (no driver attached)
 : cbb0: CardBus card activation failed
 : 
 : I use kernel from yesterday, pccbb.c ver 1.95.
 
 You need a driver.  This message says that none attached.  Maybe
 FreeBSD doesn't support this chip yet?  Maybe it is supported by the
 atheros driver (ath).

I have the same problem with my fxp card, although it isn't because I don't
have a driver or I don't have the card attached.

If I pull it out and re-insert it, it probes.

When I boot:

cardbus0: Resource not specified in CIS: id=10, size=2000
cardbus0: network, ethernet at device 0.0 (no driver attached)
cbb0: CardBus card activation failed

After re-inserting it:

fxp0: Intel 82559ER Pro/100 Ethernet port 0x1000-0x103f mem 
0xf602-0xf603,0xf604-0xf6040fff irq 11 at device 0.0 on cardbus0
fxp0: Ethernet address 00:03:47:49:82:2b
miibus0: MII bus on fxp0
inphy0: i82555 10/100 media interface on miibus0
inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto

(This is -current from August 16th.)

Ken
-- 
Kenneth Merry
[EMAIL PROTECTED]


Re: make buildworld errors (libcam)

2003-09-03 Thread Kenneth D. Merry
On Wed, Sep 03, 2003 at 03:03:04 -0700, Don Lewis wrote:
 On  3 Sep, Michael Bretterklieber wrote:
  Hi,
  
  buildworld fails (cvsup some minutes ago):
  In file included from /usr/src/sys/cam/scsi/scsi_da.c:51:
  /usr/src/sys/sys/taskqueue.h:33:2: #error no user-servicable parts
  inside
  mkdep: compile failed
 
 The following patch works for me:

Ack!  Sorry about that!  Pass the pointy hat...

It's fixed now.

Ken
-- 
Kenneth Merry
[EMAIL PROTECTED]


Re: need some debugging help

2003-09-01 Thread Kenneth D. Merry
On Sun, Aug 31, 2003 at 12:52:47 +0200, Poul-Henning Kamp wrote:
 In message [EMAIL PROTECTED], Kenneth D. Merry writes:
 
 Anyway, I got some debugging output, and I've attached dmesg output.  Let
 me know whether anything in there looks suspicious or points to a possible
 problem.
 
 There's nothing which jumps out at me, and I guess the best strategy is
 hunting down the devbuf thing by changing all users of M_DEVBUF until
 something trips...

Thanks.  That did the trick.

As it turns out, it was a one-line problem in the da(4) patches that was
causing the problem.

Anyway, that's fixed, and things seem to work fine.  I've attached a new
version of the patches.  I'll try to come up with a -stable version that'll
fix things there as well.

If anyone wants to take a look at the way I'm using mutexes here,
especially in the new taskqueue thread, I'd appreciate it.

In particular, I went through some interesting permutations in
taskqueue_kthread() to make things work right:

 - I tried holding Giant when calling tsleep, but it complained that I
   didn't own Giant.

 - I tried not holding a mutex at all when calling tsleep, but ran into
   this assert in msleep():

KASSERT(timo != 0 || mtx_owned(&Giant) || mtx != NULL,
    ("sleeping without a mutex"));

 - I tried just holding a mutex all the time, but obviously you can't
   malloc while holding a mutex (except Giant), and the sysctl code does a
   number of mallocs.  (The original cause of this problem -- M_WAITOK
   mallocs.)

So in the end, I just acquire a mutex, drop it for taskqueue_run(),
re-acquire it and pass it into the msleep call so that it can drop it
and re-acquire it for me.  There's no other reason for it.  The taskqueue
stuff already has its own mutex that isn't exposed to taskqueue_run(), and
it shouldn't be held anyway when the task's function is called.
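
The locking pattern described above, translated into a userland sketch with
POSIX threads.  This is an analogy, assuming pthread_cond_wait() as a stand-in
for msleep() (both drop the mutex while sleeping and re-acquire it before
returning).  The workq type and function names are made up; this is not the
taskqueue code.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define WORKQ_MAX 16

typedef void (*task_fn)(void *);

struct workq {
	pthread_mutex_t mtx;
	pthread_cond_t  cv;
	task_fn fn[WORKQ_MAX];
	void   *arg[WORKQ_MAX];
	int     ntasks;
	bool    stop;		/* set under mtx, then signal cv, to end the worker */
};

void
workq_init(struct workq *q)
{
	pthread_mutex_init(&q->mtx, NULL);
	pthread_cond_init(&q->cv, NULL);
	q->ntasks = 0;
	q->stop = false;
}

/* Queue a task and wake the worker.  Returns false if the queue is full. */
bool
workq_enqueue(struct workq *q, task_fn fn, void *arg)
{
	bool ok = false;

	pthread_mutex_lock(&q->mtx);
	if (q->ntasks < WORKQ_MAX) {
		q->fn[q->ntasks] = fn;
		q->arg[q->ntasks] = arg;
		q->ntasks++;
		ok = true;
		pthread_cond_signal(&q->cv);
	}
	pthread_mutex_unlock(&q->mtx);
	return (ok);
}

/*
 * Run all queued tasks.  Called with q->mtx held; the mutex is dropped
 * around each task so the task may block or allocate, then re-acquired.
 */
void
workq_run_locked(struct workq *q)
{
	while (q->ntasks > 0) {
		task_fn fn = q->fn[0];
		void *arg = q->arg[0];
		int i;

		for (i = 1; i < q->ntasks; i++) {
			q->fn[i - 1] = q->fn[i];
			q->arg[i - 1] = q->arg[i];
		}
		q->ntasks--;

		pthread_mutex_unlock(&q->mtx);
		fn(arg);
		pthread_mutex_lock(&q->mtx);
	}
}

/* Worker thread body, suitable for pthread_create(). */
void *
workq_thread(void *context)
{
	struct workq *q = context;

	pthread_mutex_lock(&q->mtx);
	while (!q->stop) {
		workq_run_locked(q);
		if (q->stop)
			break;
		/* Sleeps with the mutex released; holds it again on return. */
		pthread_cond_wait(&q->cv, &q->mtx);
	}
	pthread_mutex_unlock(&q->mtx);
	return (NULL);
}

/* Demo task: increments the int that arg points to. */
void
count_task(void *arg)
{
	(*(int *)arg)++;
}
```

The point of the design is that the mutex only protects the queue and the
sleep; it is never held while a task's function runs.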

I also put code in the sysctl functions in the cd(4) and da(4) drivers to
acquire Giant, since I'm assuming that the sysctl code needs it.

Comments are welcome.

Thanks,

Ken
-- 
Kenneth Merry
[EMAIL PROTECTED]
==== //depot/FreeBSD-ken/src/sys/cam/scsi/scsi_cd.c#39 - /usr/home/ken/perforce2/FreeBSD-ken/src/sys/cam/scsi/scsi_cd.c ====
*** /tmp/tmp.319.0  Mon Sep  1 00:33:39 2003
--- /usr/home/ken/perforce2/FreeBSD-ken/src/sys/cam/scsi/scsi_cd.c  Mon Sep  1 00:21:23 2003
***************
*** 62,67 ****
--- 62,68 ----
  #include <sys/dvdio.h>
  #include <sys/devicestat.h>
  #include <sys/sysctl.h>
+ #include <sys/taskqueue.h>
  
  #include <cam/cam.h>
  #include <cam/cam_ccb.h>
***************
*** 154,159 ****
--- 155,161 ----
      eventhandler_tag clonetag;
      int minimum_command_size;
      int outstanding_cmds;
+     struct task sysctl_task;
      struct sysctl_ctx_list sysctl_ctx;
      struct sysctl_oid *sysctl_tree;
      STAILQ_HEAD(, cd_mode_params) mode_queue;
***************
*** 598,603 ****
--- 600,642 ----
      }
  }
  
+ static void
+ cdsysctlinit(void *context, int pending)
+ {
+     struct cam_periph *periph;
+     struct cd_softc *softc;
+     char tmpstr[80], tmpstr2[80];
+ 
+     periph = (struct cam_periph *)context;
+     softc = (struct cd_softc *)periph->softc;
+ 
+     snprintf(tmpstr, sizeof(tmpstr), "CAM CD unit %d", periph->unit_number);
+     snprintf(tmpstr2, sizeof(tmpstr2), "%d", periph->unit_number);
+ 
+     mtx_lock(&Giant);
+ 
+     sysctl_ctx_init(&softc->sysctl_ctx);
+     softc->sysctl_tree = SYSCTL_ADD_NODE(&softc->sysctl_ctx,
+         SYSCTL_STATIC_CHILDREN(_kern_cam_cd), OID_AUTO,
+         tmpstr2, CTLFLAG_RD, 0, tmpstr);
+ 
+     if (softc->sysctl_tree == NULL) {
+         printf("cdsysctlinit: unable to allocate sysctl tree\n");
+         return;
+     }
+ 
+     /*
+      * Now register the sysctl handler, so the user can change the
+      * value on the fly.
+      */
+     SYSCTL_ADD_PROC(&softc->sysctl_ctx,SYSCTL_CHILDREN(softc->sysctl_tree),
+         OID_AUTO, "minimum_cmd_size", CTLTYPE_INT | CTLFLAG_RW,
+         &softc->minimum_command_size, 0, cdcmdsizesysctl, "I",
+         "Minimum CDB size");
+ 
+     mtx_unlock(&Giant);
+ }
+ 
  /*
   * We have a handler function for this so we can check the values when the
   * user sets them, instead of every time we look at them.
***************
*** 642,648 ****
      struct ccb_setasync csa;
      struct ccb_pathinq cpi;
      struct ccb_getdev *cgd;
!     char tmpstr[80], tmpstr2[80];
      caddr_t match;
  
      cgd = (struct ccb_getdev *)arg;
--- 681,687 ----
      struct ccb_setasync csa;
      struct ccb_pathinq cpi;
      struct ccb_getdev *cgd;
!     char tmpstr[80];
      caddr_t match;
  
      cgd = (struct ccb_getdev *)arg;
***************
*** 696,712 ****
      if (cpi.ccb_h.status == CAM_REQ_CMP && (cpi.hba_misc & PIM_NO_6_BYTE))
          softc->quirks |= CD_Q_10_BYTE_ONLY;
  
!     snprintf(tmpstr, sizeof

Re: need some debugging help

2003-09-01 Thread Kenneth D. Merry
On Mon, Sep 01, 2003 at 02:23:18 +0200, Pawel Jakub Dawidek wrote:
> On Mon, Sep 01, 2003 at 02:13:45AM +0200, Pawel Jakub Dawidek wrote:
> +> I was getting the same panics while I was working on GEOM Gate.
> +> After many hours of debugging I tracked this down - I had initialized
> +> a mutex, but I hadn't destroyed it.
> +> 
> +> I suspect you're loading cd(4) as a kld module?
> 
> No, you don't need to load it as a kld module, because you initialize
> this mutex on every function call (and the mutex is locally allocated too),
> so try putting mtx_destroy() at the end of this function; this should help.
> (I hope there is no problem with calling msleep(9) with a mutex from the stack)

Well, keep in mind that this function, taskqueue_kthread(), is only called
once, when the kthread is forked off.  It then runs in an infinite loop
forever.

So far it doesn't seem like there's any problem with calling msleep() with
a mutex allocated on the stack.

The problem I was having turned out to be that I forgot to dereference
periph->softc in dasysctlinit().

Ken
-- 
Kenneth Merry
[EMAIL PROTECTED]

