Re: Time to increase MAXPHYS?
On Sun, Jun 04, 2017 at 09:52:36 +0200, Hans Petter Selasky wrote:
> On 06/04/17 09:39, Tomoaki AOKI wrote:
> > Hi
> >
> > One possibility would be to make it MD build-time OPTIONS,
> > defaulting 1M on regular systems and 128k on smaller systems.
> >
> > Of course I guess making it a tunable (or sysctl) would be best,
> > though.
> >
>
> Hi,
>
> A tunable sysctl would be fine, but beware that commonly used firmware
> out there produced in the millions might hang in a non-recoverable way
> if you exceed their "internal limits". Conditionally lowering this
> definition is fine, but increasing it needs to be carefully verified.
>
> For example many USB devices are only tested with OS'es like Windows and
> MacOS and if these have any kind of limitation on the SCSI transfer
> sizes, it is very likely many devices out there do not support any
> larger transfer sizes either.

I agree that I'd like to see a tunable.

We've been using a MAXPHYS value slightly larger than 1MB at Spectra for
years with no problems, but then again, we're only running on newer
hardware.

If we keep DFLTPHYS the same (64K) or come up with another constant that is
defined to 64K, the way the da(4) and sa(4) drivers handle things will keep
most older controllers working properly.

Here is what da(4) does:

        if (cpi.maxio == 0)
                softc->maxio = DFLTPHYS;        /* traditional default */
        else if (cpi.maxio > MAXPHYS)
                softc->maxio = MAXPHYS;         /* for safety */
        else
                softc->maxio = cpi.maxio;
        softc->disk->d_maxsize = softc->maxio;

cpi is the XPT_PATH_INQ CCB. The maxio field was added later, so older,
unmodified drivers that haven't set the maxio field default to a 64K I/O
size.

Drivers for some of the more common SAS and FC hardware set maxio to a
value that is correct for the hardware. (e.g. mpt(4), mps(4), mpr(4), and
isp(4) all set it correctly.)

As Warner pointed out, the way ahci(4) works is that it sets its maximum
I/O size to MAXPHYS. The question is, does all AHCI hardware support
arbitrary transfer sizes?
Is there a way to figure out what the hardware supports? If not, we should
probably default it to 128K instead of MAXPHYS.

Tape drives are another related issue. Tape block sizes up to 1MB are
pretty common. LTFS allows for blocksizes up to 1MB. You can't currently
read a tape with a 1MB blocksize on FreeBSD without bumping MAXPHYS and
having a controller and tape drive that can handle the larger blocksize.

The sa(4) driver has the same logic as the da(4) driver for limiting
transfer sizes to the smaller of MAXPHYS and cpi.maxio.

The sa(4) driver gives the user some tools for figuring things out:

{sm4u-1-mgmt:/root:!:1} mt status -v
Drive: sa0: Serial Number: 101500520A
---------------------------------
Mode      Density              Blocksize      bpi      Compression
Current:  0x58:LTO-5           variable       384607   enabled (0x1)
---------------------------------
Current Driver State: at rest.
---------------------------------
Partition:   0      Calc File Number:   0     Calc Record Number: 0
Residual:    0  Reported File Number:   0 Reported Record Number: 0
Flags: BOP
---------------------------------
Tape I/O parameters:
  Maximum I/O size allowed by driver and controller (maxio): 1048576 bytes
  Maximum I/O size reported by controller (cpi_maxio): 5197824 bytes
  Maximum block size supported by tape drive and media (max_blk): 8388608 bytes
  Minimum block size supported by tape drive and media (min_blk): 1 bytes
  Block granularity supported by tape drive and media (blk_gran): 0 bytes
  Maximum possible I/O size (max_effective_iosize): 1048576 bytes

On this particular FreeBSD/head machine, I have MAXPHYS set to 1MB. The
controller (isp(4)) supports ~5MB I/O sizes and the drive (IBM LTO-5)
supports ~8MB I/O, but MAXPHYS is set to 1MB, so that is the limit.

I have considered changing the sa(4) driver to not use physio(9), and
instead use a custom allocator to allow reading and writing tapes with
blocksizes up to what the hardware (combination of tape drive and
controller) allows. I haven't gotten around to it yet, because bumping
MAXPHYS works well enough in most cases. It also has a nice side effect of
allowing unmapped I/O.
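For anyone who wants to experiment with the numbers, the clamping described
above can be modeled in a few lines of Python. This is only a sketch, not
kernel code: driver_maxio() follows the da(4) fragment quoted earlier, while
max_effective_iosize() assumes the final limit is simply the smaller of the
driver limit and the drive's max_blk (the function names and that rule are
illustrative assumptions, and block granularity is ignored).

```python
DFLTPHYS = 64 * 1024   # traditional 64K default named in the text
MAXPHYS = 1024 * 1024  # assumption: 1MB, matching the machine shown above


def driver_maxio(cpi_maxio, maxphys=MAXPHYS, dfltphys=DFLTPHYS):
    """Python rendering of the da(4) clamp quoted earlier."""
    if cpi_maxio == 0:
        return dfltphys   # controller driver never filled in maxio
    if cpi_maxio > maxphys:
        return maxphys    # clamp to MAXPHYS for safety
    return cpi_maxio


def max_effective_iosize(cpi_maxio, max_blk, maxphys=MAXPHYS):
    """Assumed rule: the smaller of the driver limit and the drive limit."""
    return min(driver_maxio(cpi_maxio, maxphys), max_blk)


# Values taken from the 'mt status -v' output above.
print(driver_maxio(5197824))                   # isp(4) reports ~5MB
print(max_effective_iosize(5197824, 8388608))  # LTO-5 allows ~8MB blocks
print(driver_maxio(0))                         # unmodified older driver
```

With the figures from the tape drive above, both the driver limit and the
effective limit come out at MAXPHYS, which matches what mt(1) reports.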

The pass(4) driver limits I/O sizes in the same way as the da(4) and sa(4)
drivers for CCBs sent via the blocking (CAMIOCOMMAND) ioctl, but for CCBs
sent via the asynchronous API, the only limit is the controller (cpi.maxio)
limit. The latter is because the buffers for the asynchronous interface
are malloced.

If it were possible to send arbitrary sized, unmapped S/G lists, then we
could convert the asynchronous pass(4) interface to do unmapped I/O.

Ken
-- 
Kenneth Merry
k...@freebsd.org
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to
Heads Up: struct disk KBI change
This will break binary compatibility for loadable modules that depend on
struct disk. DISK_VERSION has been bumped, and I bumped __FreeBSD_version
in a subsequent change.

So, if you have a module that uses struct disk, you'll need to recompile
against the latest version of head.

Ken

----- Forwarded message from "Kenneth D. Merry" <k...@freebsd.org> -----

Date: Tue, 21 Jun 2016 20:18:19 +0000 (UTC)
From: "Kenneth D. Merry" <k...@freebsd.org>
To: src-committ...@freebsd.org, svn-src-...@freebsd.org,
    svn-src-h...@freebsd.org
Subject: svn commit: r302069 - head/sys/geom

Author: ken
Date: Tue Jun 21 20:18:19 2016
New Revision: 302069
URL: https://svnweb.freebsd.org/changeset/base/302069

Log:
  Fix a bug that caused da(4) instances to hang around after the underlying
  device is gone.

  The problem was that when disk_gone() is called, if the GEOM disk creation
  process has not yet happened, the withering process couldn't start.

  We didn't record any state in the GEOM disk code, and so the d_gone()
  callback to the da(4) driver never happened.

  The solution is to track the state of the creation process, and initiate
  the withering process from g_disk_create() if the disk is being created.

  This change does add fields to struct disk, and so I have bumped
  DISK_VERSION.

  geom_disk.c:  Track where we are in the disk creation process,
                and check to see whether our underlying disk has
                gone away or not.

                In disk_gone(), set a new d_goneflag variable that
                g_disk_create() can check to see if it needs to
                clean up the disk instance.

  geom_disk.h:  Add a mutex to struct disk (for internal use), a disk
                init level, and a gone flag.

                Bump DISK_VERSION because the size of struct disk has
                changed and fields have been added at the beginning.

  Sponsored by: Spectra Logic
  Approved by:  re (marius)

Modified:
  head/sys/geom/geom_disk.c
  head/sys/geom/geom_disk.h

----- End forwarded message -----

-- 
Kenneth Merry
k...@freebsd.org
Re: Recognizing SMR HDDs
On Thu, May 26, 2016 at 16:42:10 +0200, Gary Jennejohn wrote:
> On Thu, 26 May 2016 10:10:14 -0400
> "Kenneth D. Merry" <k...@freebsd.org> wrote:
>
> > On Thu, May 26, 2016 at 16:00:41 +0200, Gary Jennejohn wrote:
> > > protocol ATA/ATAPI-9 SATA 3.x
> > > device model ST8000AS0002-1NA17Z
> > > firmware revision AR13
> >
> > The firmware is old, the current version is AR17. You should really ask
> > Seagate for updated firmware.
> >
>
> The Download Finder on the Seagate site claims that there is no newer
> firmware.
>
> So the question is, how to get the latest AR17 version from Seagate
> as a simple consumer?

I would contact Seagate support and ask.

By the way, I've been able to download firmware for Seagate SATA drives via
camcontrol when they're attached via SATA and SAS controllers. I've never
tried it with USB. I think camcontrol will identify it as a SCSI protocol
drive and as a result may not let you download firmware, because it doesn't
recognize vendor "ST8000AS".

So, assuming you get firmware from them, I would suggest upgrading it using
whatever Windows or Linux tool they give you. (I'll brick drives in my lab
at work, but I'd hate for you to brick your own drive.)

If you want to use camcontrol to do it, take it out of the USB enclosure
and hook it directly to a SATA or SAS controller.

Ken
-- 
Kenneth Merry
k...@freebsd.org
Re: Recognizing SMR HDDs
On Thu, May 26, 2016 at 15:02:24 +0100, Igor Mozolevsky wrote:
> On 26 May 2016 at 14:41, Kenneth D. Merry <k...@freebsd.org> wrote:
>
> > On Thu, May 26, 2016 at 15:29:21 +0200, Gary Jennejohn wrote:
> > > On Thu, 26 May 2016 08:34:45 -0400
> > > "Kenneth D. Merry" <k...@freebsd.org> wrote:
> > >
> > > > On Thu, May 26, 2016 at 08:42:53 +0200, Gary Jennejohn wrote:
> > > > What kind of drive is it?
> > > >
> > >
> > > ST8000AS 0002-1NA17Z 0X03
> >
> > [snip]
> >
> > Yes. There is something slightly odd about the Inquiry data you pasted
> > above. Seagate didn't set the bits in the ATA identify data to mark it as
> > a Drive Managed drive, so I put in a quirk entry to mark it as Drive
> > Managed.
> >
> > Unfortunately with Drive Managed drives that is really all you know. You
> > don't know the zone boundaries or states. But, it is useful to know that
> > you really should write sequentially for good performance. (True of any
> > drive, but especially true with SMR drives.)
> >
>
> The drive is supposed to have Word 69 set to 0x0001 and support ZAC MGMT
> IN/OUT -
> http://www.seagate.com/www-content/product-content/hdd-fam/seagate-archive-hdd/en-us/docs/100795782a.pdf
> at pg. 24 and 28.

That is a different drive. He has ST8000AS0002, which is a Drive Managed
drive. The doc above is for ST8000AS0022, which is a Host Aware drive.

> Incidentally AR17 firmware is a new batch, perhaps Seagate did what they
> did with -DL003 drives where the early models reported 512n sector size (so
> as not to confuse computers) and the later models properly reported 4kn
> sector size?

Yes, AR17 is the latest firmware. He really needs to upgrade; there are
bugs with older versions.

AR17 firmware reports the same thing in terms of sector size.
For instance, from one of mine:

protocol              ATA/ATAPI-9 SATA 3.x
device model          ST8000AS0002-1NA17Z
firmware revision     AR17
serial number         Z8409926
WWN                   5000c50086f84017
cylinders             16383
heads                 16
sectors/track         63
sector size           logical 512, physical 4096, offset 0
LBA supported         268435455 sectors
LBA48 supported       15628053168 sectors
PIO supported         PIO4
DMA supported         WDMA2 UDMA6
media RPM             5980

Ken
-- 
Kenneth Merry
k...@freebsd.org
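
For readers wondering what "Word 69 set to 0x0001" means in practice, here
is a small Python sketch of decoding that field. It assumes the two low bits
of IDENTIFY DEVICE word 69 form the zoned-capability field as described in
the ACS-4 drafts (01 = Host Aware, 10 = Device Managed); the table and
function names are illustrative only and not taken from any FreeBSD source.

```python
# Assumed meaning of IDENTIFY DEVICE word 69, bits 1:0 (per ACS-4 drafts).
ZONED_FIELD = {
    0b00: "not reported",
    0b01: "Host Aware",
    0b10: "Device Managed",
    0b11: "reserved",
}


def zoned_capability(word69):
    """Return the zoned capability encoded in the low two bits of word 69."""
    return ZONED_FIELD[word69 & 0x3]


print(zoned_capability(0x0001))  # the value expected for a Host Aware drive
print(zoned_capability(0x0000))  # a drive that does not set the bits
```

A drive that leaves these bits clear decodes as "not reported", which is the
situation a quirk entry has to cover for a Drive Managed disk that does not
identify itself.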
Re: Recognizing SMR HDDs
On Thu, May 26, 2016 at 16:00:41 +0200, Gary Jennejohn wrote:
> On Thu, 26 May 2016 09:41:20 -0400
> "Kenneth D. Merry" <k...@freebsd.org> wrote:
>
> > On Thu, May 26, 2016 at 15:29:21 +0200, Gary Jennejohn wrote:
> > > On Thu, 26 May 2016 08:34:45 -0400
> > > "Kenneth D. Merry" <k...@freebsd.org> wrote:
> > >
> > > > On Thu, May 26, 2016 at 08:42:53 +0200, Gary Jennejohn wrote:
> > > > What kind of drive is it?
> > > >
> > >
> > > ST8000AS 0002-1NA17Z 0X03
> >
> > Can you send the output of 'camcontrol inquiry daX -v' and
> > 'camcontrol identify daX -v'?
> >
> > There is a quirk for that particular drive to identify it as Drive Managed.
> > When attached behind a SAS controller it looks like this:
> >
> > # camcontrol inquiry da12 -v
> > pass12: Fixed Direct Access SPC-4 SCSI device
> > pass12: Serial Number Z8407Y52
> > pass12: 600.000MB/s transfers, Command Queueing Enabled
> >
>
> Thanks for the info.
>
> Here the requested output:
>
> camcontrol inquiry da0 -v
> pass5: Fixed Direct Access SPC-4 SCSI device
> pass5: Serial Number
> pass5: 400.000MB/s transfers

Okay. Looks like the USB to SATA chip is perhaps mangling the model number.
I'm guessing that is the "standard" way to do it, but it is unfortunate.

> camcontrol identify da0 -v
> camcontrol: sending ATA ATA_IDENTIFY via pass_16 with timeout of 3 msecs
> pass5: Raw identify data:
>    0: 0c5a 3fff c837 0010 003f
>    8: 2020 2020 2020 2020 2020 2020
>   16: 5a38 3430 3339 4738 8000 4152
>   24: 3133 2020 2020 5354 3830 3030 4153 3030
>   32: 3032 2d31 4e41 3137 5a20 2020 2020 2020
>   40: 2020 2020 2020 2020 2020 2020 2020 8010
>   48: 4000 2f00 4000 0200 0200 0007 3fff 0010
>   56: 003f fc10 00fb 5c10 0fff 0007
>   64: 0003 0078 0078 0078 0078
>   72: 001f 8d0e 0004 00cc 0040
>   80: 03f0 001f 346b 7d61 6163 3469 bc41 6163
>   88: 407f 81e7 81e7 fffe fe00
>   96: 2ab0 a381 0003
>  104: 6003 5000 c500 7b0e 5cbe
>  112: 40dc
>  120: 409c
>  128: 0021 2ab0 a381 2ab0 a381 2020 0002 0140
>  136: 0108 5000 3c06 3c0a 003c 0008
>  144: bdff 0280 0008
>  152: 8000 0184 8b00 8008
>  160:
>  168:
>  176:
>  184:
>  192:
>  200: 30a5
>  208: 4000
>  216: 175c 107f
>  224:
>  232:
>  240:
>  248: 6aa5
>
> camcontrol: sending ATA READ_NATIVE_MAX_ADDRESS48 via pass_16 with timeout of
> 1000 msecs
> pass5: Raw native max data:
>    0:
> error = 0x00, sector_count = 0x, device = 0x00, status = 0x00
> pass5: ACS-2 ATA SATA 3.x device
> pass5: 400.000MB/s transfers
>
> protocol ATA/ATAPI-9 SATA 3.x
> device model ST8000AS0002-1NA17Z
> firmware revision AR13

The firmware is old; the current version is AR17. You should really ask
Seagate for updated firmware.
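
The raw identify words quoted above can be turned back into the strings that
camcontrol prints. The Python sketch below relies on the ATA convention that
each 16-bit identify word carries two ASCII characters, high byte first. The
word values are copied by value from the quoted dump rather than looked up
by index, and the helper name identify_string() is made up for illustration.

```python
def identify_string(words):
    """Join 16-bit identify words into text (two ASCII bytes per word,
    high byte first) and strip the space padding."""
    raw = b"".join(w.to_bytes(2, "big") for w in words)
    return raw.decode("ascii").strip()


# Word values as they appear in the quoted raw identify data.
serial = identify_string([0x5A38, 0x3430, 0x3339, 0x4738])
firmware = identify_string([0x4152, 0x3133, 0x2020, 0x2020])
model = identify_string([0x5354, 0x3830, 0x3030, 0x4153, 0x3030,
                         0x3032, 0x2D31, 0x4E41, 0x3137, 0x5A20])

# The WWN is just the four words printed back to back in hex.
wwn = "".join(f"{w:04x}" for w in [0x5000, 0xC500, 0x7B0E, 0x5CBE])

print(model, firmware, serial, wwn)
```

The decoded model, firmware revision, serial number and WWN agree with the
human-readable fields in the same identify output, so the ATA identify data
itself is intact even though the SCSI inquiry string looks mangled.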

> serial number Z84039G8
> WWN 5000c5007b0e5cbe
> cylinders 16383
> heads 16
> sectors/track 63
> sector size logical 512, physical 4096, offset 0
> LBA supported 268435455 sectors
> LBA48 supported 15628053168 sectors
> PIO supported PIO4
> DMA supported WDMA2 UDMA6
> media RPM 5980
>
> Feature                      Support  Enabled   Value           Vendor
> read ahead                     yes      yes
> write cache                    yes      yes
> flush cache                    yes      yes
> overlap                        no
> Tagged Command Queuing (TCQ)   no       no
> Native Command Queuing (NCQ)   yes                32 tags
> NCQ Queue Management           no
> NCQ Streaming                  no
> Receive & Send FPDMA Queued    no
> SMART                          yes      yes
> microcode download             yes      yes
> security                       yes      no
> power management               yes      yes
> advanced powe
Re: Recognizing SMR HDDs
On Thu, May 26, 2016 at 15:29:21 +0200, Gary Jennejohn wrote:
> On Thu, 26 May 2016 08:34:45 -0400
> "Kenneth D. Merry" <k...@freebsd.org> wrote:
>
> > On Thu, May 26, 2016 at 08:42:53 +0200, Gary Jennejohn wrote:
> > What kind of drive is it?
> >
>
> ST8000AS 0002-1NA17Z 0X03

Can you send the output of 'camcontrol inquiry daX -v' and
'camcontrol identify daX -v'?

There is a quirk for that particular drive to identify it as Drive Managed.
When attached behind a SAS controller it looks like this:

# camcontrol inquiry da12 -v
pass12: Fixed Direct Access SPC-4 SCSI device
pass12: Serial Number Z8407Y52
pass12: 600.000MB/s transfers, Command Queueing Enabled

> > Here are some things you can do on any disk to see what it is:
> >
> > diskinfo -v /dev/daX
> >
>
> I don't have the new versions of these utilities installed, so I can't
> get any of this neat diskinfo/zonectl information.
>
> > # sysctl kern.cam.da.19
> > kern.cam.da.19.sort_io_queue: -1
> > kern.cam.da.19.rotating: 1
> > kern.cam.da.19.unmapped_io: 1
> > kern.cam.da.19.error_inject: 0
> > [ begin SMR fields ]
> > kern.cam.da.19.max_seq_zones: 0
> > kern.cam.da.19.optimal_nonseq_zones: 0
> > kern.cam.da.19.optimal_seq_zones: 0
> > kern.cam.da.19.zone_support: None
> > kern.cam.da.19.zone_mode: Drive Managed
> > [ end SMR fields ]
> > kern.cam.da.19.minimum_cmd_size: 6
> > kern.cam.da.19.delete_max: 262144
> > kern.cam.da.19.delete_method: NONE
> >
>
> My drive shows this;
> sysctl kern.cam.da.0
> kern.cam.da.0.sort_io_queue: -1
> kern.cam.da.0.rotating: 1
> kern.cam.da.0.unmapped_io: 0
> kern.cam.da.0.error_inject: 0
> kern.cam.da.0.max_seq_zones: 0
> kern.cam.da.0.optimal_nonseq_zones: 0
> kern.cam.da.0.optimal_seq_zones: 0
> kern.cam.da.0.zone_support: None
> kern.cam.da.0.zone_mode: Not Zoned <== I guess it can't be managed
> kern.cam.da.0.minimum_cmd_size: 10
> kern.cam.da.0.delete_max: 131072
> kern.cam.da.0.delete_method: NONE
>
> In fact, the output for every one of the 4 drives in the enclosure is
> the same, even though the other three are non-SMR SATA drives.

Yes. There is something slightly odd about the Inquiry data you pasted
above. Seagate didn't set the bits in the ATA identify data to mark it as
a Drive Managed drive, so I put in a quirk entry to mark it as Drive
Managed.

Unfortunately with Drive Managed drives that is really all you know. You
don't know the zone boundaries or states. But, it is useful to know that
you really should write sequentially for good performance. (True of any
drive, but especially true with SMR drives.)

Ken
-- 
Kenneth Merry
k...@freebsd.org
Re: Recognizing SMR HDDs
On Thu, May 26, 2016 at 08:42:53 +0200, Gary Jennejohn wrote:
> Now that ken@ has checked in the SMR code I'm wondering how I can see
> whether it's having any effect.
>
> I have a 8TB SMR disk in a USB3 enclosure. Does the kernel emit any
> sort of trace to indicate that it sees the drive as SMR and takes
> that into account?

There is nothing extra emitted in the dmesg to tell you it is an SMR drive;
you have to look.

> I have the probe trace enabled in my kernel config, but I don't see
> anything special pop out when I turn the drive on.

You'll see extra states in the probe compared to a standard drive if it is
Host Aware or Host Managed. You won't see those states if it is Drive
Managed.

> Does the fact that the drive appears as a /dev/daX play any role?

It shouldn't matter. I put changes in both the da(4) and ada(4) drivers to
support SMR drives. And the changes should work even when you have an ATA
protocol drive attached via a SCSI transport, which is likely the case with
your drive.

What kind of drive is it?

Here are some things you can do on any disk to see what it is:

diskinfo -v /dev/daX

For example:

# diskinfo -v /dev/da18
/dev/da18
        512             # sectorsize
        8001563222016   # mediasize in bytes (7.3T)
        15628053168     # mediasize in sectors
        4096            # stripesize
        0               # stripeoffset
        972801          # Cylinders according to firmware.
        255             # Heads according to firmware.
        63              # Sectors according to firmware.
        Z84003SK        # Disk ident.
        id1,enc@n5003048001f311fd/type@0/slot@13/elmdesc@Slot_19 # Physical path
        Host_Aware      # Zone Mode

So this is a Host Aware drive.
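
As a quick sanity check on the diskinfo figures above, the arithmetic can be
reproduced in Python. This is only a worked example using the numbers from
that output; it assumes the "7.3T" shown by diskinfo is a binary (2^40)
unit, which is what the result suggests.

```python
sectors = 15628053168   # "mediasize in sectors"
sectorsize = 512        # logical sector size in bytes
stripesize = 4096       # physical sector size reported as the stripe size

mediasize = sectors * sectorsize
tib = mediasize / 2**40                 # assumed binary terabytes
logical_per_physical = stripesize // sectorsize

print(mediasize)                        # bytes
print(f"{tib:.1f}T")                    # human-readable size
print(logical_per_physical)             # logical sectors per physical sector
```

The product matches the "mediasize in bytes" line, and the 4096-byte
stripesize corresponds to eight 512-byte logical sectors per physical
sector.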

zonectl -c params -d /dev/daX

# zonectl -c params -d /dev/da18
Zone Mode: Host Aware
Command support: Report Zones, Open, Close, Finish, Reset Write Pointer
Unrestricted Read in Sequential Write Required Zone (URSWRZ): No
Optimal Number of Open Sequential Write Preferred Zones: 128
Optimal Number of Non-Sequentially Written Sequential Write Preferred Zones: 8
Maximum Number of Open Sequential Write Required Zones: Unlimited

If I issue the same command on a drive managed SMR drive:

# zonectl -c params -d /dev/da19
Zone Mode: Drive Managed
Command support: None
Unrestricted Read in Sequential Write Required Zone (URSWRZ): No
Optimal Number of Open Sequential Write Preferred Zones: Not Set
Optimal Number of Non-Sequentially Written Sequential Write Preferred Zones: Not Set
Maximum Number of Open Sequential Write Required Zones: Not Set

sysctl kern.cam.da.X

# sysctl kern.cam.da.18
kern.cam.da.18.sort_io_queue: -1
kern.cam.da.18.rotating: 1
kern.cam.da.18.unmapped_io: 1
kern.cam.da.18.error_inject: 0
[ begin SMR fields ]
kern.cam.da.18.max_seq_zones: 4294967295
kern.cam.da.18.optimal_nonseq_zones: 8
kern.cam.da.18.optimal_seq_zones: 128
kern.cam.da.18.zone_support: Report Zones, Open, Close, Finish, Reset Write Pointer
kern.cam.da.18.zone_mode: Host Aware
[ end SMR fields ]
kern.cam.da.18.minimum_cmd_size: 6
kern.cam.da.18.delete_max: 262144
kern.cam.da.18.delete_method: NONE

# sysctl kern.cam.da.19
kern.cam.da.19.sort_io_queue: -1
kern.cam.da.19.rotating: 1
kern.cam.da.19.unmapped_io: 1
kern.cam.da.19.error_inject: 0
[ begin SMR fields ]
kern.cam.da.19.max_seq_zones: 0
kern.cam.da.19.optimal_nonseq_zones: 0
kern.cam.da.19.optimal_seq_zones: 0
kern.cam.da.19.zone_support: None
kern.cam.da.19.zone_mode: Drive Managed
[ end SMR fields ]
kern.cam.da.19.minimum_cmd_size: 6
kern.cam.da.19.delete_max: 262144
kern.cam.da.19.delete_method: NONE

If you have a Host Aware or Host Managed drive, you can get the list of
zones and their status, reset the write pointer, etc.
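
If you want to check the zone mode of many disks from a script, the sysctl
text above is easy to parse. The Python sketch below works on captured
output rather than calling sysctl itself. The helper names are hypothetical,
and reading 4294967295 as "Unlimited" is an inference from comparing the
zonectl and sysctl output for da18 above, not something taken from the
driver source.

```python
SAMPLE = """\
kern.cam.da.18.rotating: 1
[ begin SMR fields ]
kern.cam.da.18.max_seq_zones: 4294967295
kern.cam.da.18.optimal_nonseq_zones: 8
kern.cam.da.18.optimal_seq_zones: 128
kern.cam.da.18.zone_support: Report Zones, Open, Close, Finish, Reset Write Pointer
kern.cam.da.18.zone_mode: Host Aware
[ end SMR fields ]
"""


def parse_sysctl(text, prefix):
    """Map 'prefix.name: value' lines to {name: value}; skip other lines."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep and name.startswith(prefix + "."):
            fields[name[len(prefix) + 1:]] = value.strip()
    return fields


def max_seq_zones_label(value):
    """Assumed reading: the all-ones 32-bit value means 'Unlimited'."""
    n = int(value)
    return "Unlimited" if n == 0xFFFFFFFF else str(n)


fields = parse_sysctl(SAMPLE, "kern.cam.da.18")
print(fields["zone_mode"])
print(max_seq_zones_label(fields["max_seq_zones"]))
print(fields["zone_support"].split(", "))
```

The bracketed marker lines carry no "name: value" pair and are skipped, so
the same function works on the full sysctl output for any daX unit.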

Ask the drive (via camcontrol(8)) to list all zones on a Host Aware drive
(but truncate the output to 10 lines):

# camcontrol zone da18 -v -c rz |head -10
29809 zones, Maximum LBA 0x3a3812aaf (15628053167)
Zone lengths and types may vary
  Start LBA  Length    WP LBA     Zone Type      Condition  Sequential  Reset
          0, 524288,   0x80000,   Conventional,  NWP,       Sequential, No Reset Needed
    0x80000, 524288,   0x100000,  Conventional,  NWP,       Sequential, No Reset Needed
   0x100000, 524288,   0x180000,  Conventional,  NWP,       Sequential, No Reset Needed
   0x180000, 524288,   0x200000,  Conventional,  NWP,       Sequential, No Reset Needed
   0x200000, 524288,   0x280000,  Conventional,  NWP,       Sequential, No Reset Needed
   0x280000, 524288,   0x300000,  Conventional,  NWP,       Sequential, No Reset Needed
   0x300000, 524288,   0x380000,  Conventional,  NWP,       Sequential, No Reset Needed

Ask the drive (via zonectl(8)) to report zones that are in the Full state:

# zonectl -d /dev/da18 -c rz -o full |head -10
192 zones, Maximum LBA 0x3a3812aaf (15628053167)
Zone lengths and types may vary
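
The zone count reported by camcontrol can be cross-checked against the zone
length with a little arithmetic. The Python sketch below assumes 512-byte
logical sectors (as diskinfo reported for this drive) and that every zone
except possibly the last is 524288 blocks long; the tool itself warns that
zone lengths may vary, so treat this as a consistency check only.

```python
SECTOR_SIZE = 512          # logical sector size, assumed from diskinfo
max_lba = 0x3A3812AAF      # "Maximum LBA" from the report
zone_len = 524288          # blocks per zone in the listed rows

sectors = max_lba + 1
zones = -(-sectors // zone_len)               # ceiling division
last_zone_len = sectors - (zones - 1) * zone_len
zone_mib = zone_len * SECTOR_SIZE // 2**20

print(sectors)         # total logical blocks
print(zones)           # number of zones needed to cover the disk
print(last_zone_len)   # blocks left over for the final, shorter zone
print(zone_mib)        # size of a full zone in MiB
```

Under those assumptions the count comes out to the 29809 zones the drive
reports, with full zones of 256 MiB and a shorter final zone.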
Re: AHCI/ADA regression?
On Wed, May 25, 2016 at 14:36:59 +0200, Gary Jennejohn wrote:
> On Wed, 25 May 2016 08:15:11 +0200
> Gary Jennejohn <gljennj...@gmail.com> wrote:
>
> > On Tue, 24 May 2016 15:10:41 -0400
> > "Kenneth D. Merry" <k...@freebsd.org> wrote:
> >
> > > Can you send full dmesg output from the working kernel?
> > >
> >
> > I'll give it a try and hope that the mail server doesn't strip it ==>
> > dmesg.boot.gz.
> >
> > > It looks like you have some ATAPI devices on your machine (signature
> > > eb14).
> > > They would likely be attaching to the da(4) driver if they are disks, and
> > > that is a different code path.
> > >
> >
> > The one and only ATAPI device is cd0.
> >
>
> OK, it appears that one of the ATA fixes ken@ recently committed
> fixed my problem also.

Great! I'm glad it's working!

> I'm now at r300677 and booting succeeds.
>
> I guess the ATAPI DVD drive was the culprit.

It was most likely the Samsung hard drive. This drive is the exact same
model that Alex Petrov also had problems with:

ada1 at ahcich2 bus 0 scbus2 target 0 lun 0
ada1: ATA8-ACS SATA 2.x device
ada1: Serial Number S0MUJ1KP317818
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 476938MB (976771055 512 byte sectors)

It claims to support Read Log, but actually doesn't.

The change I checked in in revision 300640 will only send a Read Log (and
additional SMR probe steps) to drives that claim they're SMR drives. Any
non-SMR drives should get the same probe as before.

Ken
-- 
Kenneth Merry
k...@freebsd.org
Re: ATA? related trouble with r300299
On Wed, May 25, 2016 at 07:35:06 +0700, Alex V. Petrov wrote: > > > 25.05.16 03:18, Kenneth D. Merry ??: > > On Tue, May 24, 2016 at 21:59:53 +0700, Alex V. Petrov wrote: > >> 24.05.16 20:21, Kenneth D. Merry ??: > >>> On Tue, May 24, 2016 at 08:04:21 +0300, Oleg V. Nauman wrote: > >>>> On Monday 23 May 2016 19:08:16 Kenneth D. Merry wrote: > >>>>> On Tue, May 24, 2016 at 00:59:34 +0300, Oleg V. Nauman wrote: > >>>>>> On Monday 23 May 2016 17:30:45 you wrote: > >>>>>>> On Tue, May 24, 2016 at 00:15:25 +0300, Oleg V. Nauman wrote: > >>>>>>>> On Monday 23 May 2016 17:11:34 Kenneth D. Merry wrote: > >>>>>>>>> On Tue, May 24, 2016 at 00:05:49 +0300, Oleg V. Nauman wrote: > >>>>>>>>>> On Monday 23 May 2016 16:53:55 Kenneth D. Merry wrote: > >>>>>>>>>>> On Mon, May 23, 2016 at 23:21:32 +0300, Oleg V. Nauman wrote: > >>>>>>>>>>>> On Monday 23 May 2016 15:25:39 Kenneth D. Merry wrote: > >>>>>>>>>>>>> On Sat, May 21, 2016 at 09:30:35 +0300, Oleg V. Nauman > >>>> wrote: > >>>>>>>>>>>>>> I have faced the issue with fresh CURRENT stopped to boot > >>>>>>>>>>>>>> on > >>>>>>>>>>>>>> my > >>>>>>>>>>>>>> old > >>>>>>>>>>>>>> desktop > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> after update to r300299 > >>>>>>>>>>>>>> Verbose boot shows the endless cycle of > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> ata2: SATA reset: ports status=0x05 > >>>>>>>>>>>>>> ata2: reset tp1 mask=03 ostat0=50 ostat1=50 > >>>>>>>>>>>>>> ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00 > >>>>>>>>>>>>>> ata2: stat1=0x50 err=0x01 lsb=0x00 msb=0x00 > >>>>>>>>>>>>>> ata2: reset tp2 stat0=50 stat1=50 devices=0x3 > >>>>>>>>>>>>>> messages logged to console. 
> >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Below is the relevant portion of ATA controller/devices > >>>>>>>>>>>>>> probed/attached > >>>>>>>>>>>>>> during the boot: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> atapci0: port > >>>>>>>>>>>>>> 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at > >>>>>>>>>>>>>> device > >>>>>>>>>>>>>> 31.1 > >>>>>>>>>>>>>> on > >>>>>>>>>>>>>> pci0 > >>>>>>>>>>>>>> ata0: at channel 0 on atapci0 > >>>>>>>>>>>>>> atapci1: port > >>>>>>>>>>>>>> 0xd080-0xd087, > >>>>>>>>>>>>>> 0xd000-0xd003, > >>>>>>>>>>>>>> 0xcc00-0xcc07,0xc880-0xc883,0xc800-0xc80f irq 19 at device > >>>>>>>>>>>>>> 31.2 on > >>>>>>>>>>>>>> pci0 > >>>>>>>>>>>>>> ata2: at channel 0 on atapci1 > >>>>>>>>>>>>>> ata3: at channel 1 on atapci1 > >>>>>>>>>>>>>> ada0 at ata2 bus 0 scbus1 target 0 lun 0 > >>>>>>>>>>>>>> ada0: ATA-7 SATA 2.x device > >>>>>>>>>>>>>> ada1 at ata2 bus 0 scbus1 target 1 lun 0 > >>>>>>>>>>>>>> ada1: ATA8-ACS SATA 3.x device > >>>>>>>>>>>>>> cd0 at ata0 bus 0 scbus0 target 0 lun 0 > >>>>>>>>>>>>>> cd0: <_NEC DVD_RW ND-3570A 1.11> Removable CD-ROM SCSI > >>>>>>>>>>>>>> device > >>>>>&
Re: ATA? related trouble with r300299
On Tue, May 24, 2016 at 23:54:09 +0300, Oleg V. Nauman wrote:
> On Tuesday 24 May 2016 16:17:33 you wrote:
> > Okay, I've got a basic idea of what may be going on. The resets that are
> > getting sent are triggering another probe, which then triggers a reset,
> > which triggers a probe...and so on.
> >
> > So here is another patch that should work for you:
> >
> > https://people.freebsd.org/~ken/cam_smr_ada_patch.20160524.2.txt
> >
> > I have commented out the quirk for this drive, and the driver will now only
> > start the SMR probe on drives that claim to be SMR-capable. So, for the
> > vast majority of drives out there right now, it won't even start the extra
> > probe steps.
>
> It fixes this issue. I was able to boot with your latest patch.

Great! I'll check it in with that fix as well as a quirk entry. That way,
if we have other reasons later on to issue a read log, we'll know that it
doesn't work for those drives.

Ken
-- 
Kenneth Merry
k...@freebsd.org
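
The gating described in this message can be pictured with a toy model. The
Python sketch below is not the ada(4) code: the step names and the
probe_plan() helper are invented for illustration, and it only captures the
two rules stated above, namely that the extra probe starts only for drives
that claim to be SMR-capable, and that a quirk can mark a drive whose Read
Log does not work.

```python
def probe_plan(claims_smr, quirk_no_read_log=False):
    """Return the (hypothetical) list of probe steps for one drive."""
    steps = ["IDENTIFY"]                  # every drive gets the normal probe
    if claims_smr and not quirk_no_read_log:
        steps += ["READ_LOG", "ZONE_PROBE"]   # extra SMR-only steps
    return steps


# An ordinary drive is probed exactly as before the SMR changes.
print(probe_plan(claims_smr=False))

# A drive that identifies itself as SMR gets the extra steps.
print(probe_plan(claims_smr=True))

# A quirked drive never sees Read Log, whatever it claims.
print(probe_plan(claims_smr=True, quirk_no_read_log=True))
```

Because a non-SMR drive never receives the Read Log in this model, it can no
longer fail on it and provoke the reset and re-probe cycle described above.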
Re: ATA? related trouble with r300299
On Tue, May 24, 2016 at 21:59:53 +0700, Alex V. Petrov wrote: > 24.05.16 20:21, Kenneth D. Merry ??: > > On Tue, May 24, 2016 at 08:04:21 +0300, Oleg V. Nauman wrote: > >> On Monday 23 May 2016 19:08:16 Kenneth D. Merry wrote: > >>> On Tue, May 24, 2016 at 00:59:34 +0300, Oleg V. Nauman wrote: > >>>> On Monday 23 May 2016 17:30:45 you wrote: > >>>>> On Tue, May 24, 2016 at 00:15:25 +0300, Oleg V. Nauman wrote: > >>>>>> On Monday 23 May 2016 17:11:34 Kenneth D. Merry wrote: > >>>>>>> On Tue, May 24, 2016 at 00:05:49 +0300, Oleg V. Nauman wrote: > >>>>>>>> On Monday 23 May 2016 16:53:55 Kenneth D. Merry wrote: > >>>>>>>>> On Mon, May 23, 2016 at 23:21:32 +0300, Oleg V. Nauman wrote: > >>>>>>>>>> On Monday 23 May 2016 15:25:39 Kenneth D. Merry wrote: > >>>>>>>>>>> On Sat, May 21, 2016 at 09:30:35 +0300, Oleg V. Nauman > >> wrote: > >>>>>>>>>>>> I have faced the issue with fresh CURRENT stopped to boot > >>>>>>>>>>>> on > >>>>>>>>>>>> my > >>>>>>>>>>>> old > >>>>>>>>>>>> desktop > >>>>>>>>>>>> > >>>>>>>>>>>> after update to r300299 > >>>>>>>>>>>> Verbose boot shows the endless cycle of > >>>>>>>>>>>> > >>>>>>>>>>>> ata2: SATA reset: ports status=0x05 > >>>>>>>>>>>> ata2: reset tp1 mask=03 ostat0=50 ostat1=50 > >>>>>>>>>>>> ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00 > >>>>>>>>>>>> ata2: stat1=0x50 err=0x01 lsb=0x00 msb=0x00 > >>>>>>>>>>>> ata2: reset tp2 stat0=50 stat1=50 devices=0x3 > >>>>>>>>>>>> messages logged to console. 
> >>>>>>>>>>>> > >>>>>>>>>>>> Below is the relevant portion of ATA controller/devices > >>>>>>>>>>>> probed/attached > >>>>>>>>>>>> during the boot: > >>>>>>>>>>>> > >>>>>>>>>>>> atapci0: port > >>>>>>>>>>>> 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at > >>>>>>>>>>>> device > >>>>>>>>>>>> 31.1 > >>>>>>>>>>>> on > >>>>>>>>>>>> pci0 > >>>>>>>>>>>> ata0: at channel 0 on atapci0 > >>>>>>>>>>>> atapci1: port > >>>>>>>>>>>> 0xd080-0xd087, > >>>>>>>>>>>> 0xd000-0xd003, > >>>>>>>>>>>> 0xcc00-0xcc07,0xc880-0xc883,0xc800-0xc80f irq 19 at device > >>>>>>>>>>>> 31.2 on > >>>>>>>>>>>> pci0 > >>>>>>>>>>>> ata2: at channel 0 on atapci1 > >>>>>>>>>>>> ata3: at channel 1 on atapci1 > >>>>>>>>>>>> ada0 at ata2 bus 0 scbus1 target 0 lun 0 > >>>>>>>>>>>> ada0: ATA-7 SATA 2.x device > >>>>>>>>>>>> ada1 at ata2 bus 0 scbus1 target 1 lun 0 > >>>>>>>>>>>> ada1: ATA8-ACS SATA 3.x device > >>>>>>>>>>>> cd0 at ata0 bus 0 scbus0 target 0 lun 0 > >>>>>>>>>>>> cd0: <_NEC DVD_RW ND-3570A 1.11> Removable CD-ROM SCSI > >>>>>>>>>>>> device > >>>>>>>>>>> > >>>>>>>>>>> I'm not entirely sure what is causing the problem with your > >>>>>>>>>>> system, > >>>>>>>>>>> but > >>>>>>>>>>> hopefully we can narrow it down a bit. > >>>>>>>>>>> > >>>>>>>>>>> There is a bug that came in with my SMR changes in revision > >>>>>>>>>>> 300207 > >>>>>>>&g
Re: ATA? related trouble with r300299
On Tue, May 24, 2016 at 20:46:33 +0300, Oleg V. Nauman wrote: > On Tuesday 24 May 2016 13:13:29 Kenneth D. Merry wrote: > > On Tue, May 24, 2016 at 18:21:19 +0300, Oleg V. Nauman wrote: > > > On Tuesday 24 May 2016 10:02:09 you wrote: > > > > On Tue, May 24, 2016 at 16:38:40 +0300, Oleg V. Nauman wrote: > > > > > On Tuesday 24 May 2016 09:21:17 Kenneth D. Merry wrote: > > > > > > On Tue, May 24, 2016 at 08:04:21 +0300, Oleg V. Nauman wrote: > > > > > > > On Monday 23 May 2016 19:08:16 Kenneth D. Merry wrote: > > > > > > > > On Tue, May 24, 2016 at 00:59:34 +0300, Oleg V. Nauman wrote: > > > > > > > > > On Monday 23 May 2016 17:30:45 you wrote: > > > > > > > > > > On Tue, May 24, 2016 at 00:15:25 +0300, Oleg V. Nauman > wrote: > > > > > > > > > > > On Monday 23 May 2016 17:11:34 Kenneth D. Merry wrote: > > > > > > > > > > > > On Tue, May 24, 2016 at 00:05:49 +0300, Oleg V. Nauman > > > > > > wrote: > > > > > > > > > > > > > On Monday 23 May 2016 16:53:55 Kenneth D. Merry wrote: > > > > > > > > > > > > > > On Mon, May 23, 2016 at 23:21:32 +0300, Oleg V. > > > > > > > > > > > > > > Nauman > > > > > > > > > > wrote: > > > > > > > > > > > > > > > On Monday 23 May 2016 15:25:39 Kenneth D. Merry > wrote: > > > > > > > > > > > > > > > > On Sat, May 21, 2016 at 09:30:35 +0300, Oleg V. 
> > > > > > > > > > > > > > > > Nauman > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > I have faced the issue with fresh CURRENT > > > > > > > > > > > > > > > > > stopped > > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > boot > > > > > > > > > > > > > > > > > on > > > > > > > > > > > > > > > > > my > > > > > > > > > > > > > > > > > old > > > > > > > > > > > > > > > > > desktop > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > after update to r300299 > > > > > > > > > > > > > > > > > Verbose boot shows the endless cycle of > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ata2: SATA reset: ports status=0x05 > > > > > > > > > > > > > > > > > ata2: reset tp1 mask=03 ostat0=50 ostat1=50 > > > > > > > > > > > > > > > > > ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00 > > > > > > > > > > > > > > > > > ata2: stat1=0x50 err=0x01 lsb=0x00 msb=0x00 > > > > > > > > > > > > > > > > > ata2: reset tp2 stat0=50 stat1=50 devices=0x3 > > > > > > > > > > > > > > > > > messages logged to console. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Below is the relevant portion of ATA > > > > > > > > > > > > > > > > > controller/devices > > > > > > > > > > > > > > > > > probed/attached > > > > > > > > > > > > > > > > > during the boot: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > atapci0: port > > > > > > > > > > > > > > > > > 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xf > > > > > > > > > > > > > > > > > faf > > > > > > > > > > > > > > > > > at > > > > > > >
Re: AHCI/ADA regression?
On Tue, May 24, 2016 at 20:00:51 +0200, Gary Jennejohn wrote: > On Tue, 24 May 2016 10:41:25 -0400 > "Kenneth D. Merry" <k...@freebsd.org> wrote: > > > > The question in my mind is - why are "empty" multiplier ports being > > > probed with the new code but not with the old code? > > > > If the HBA says that it supports port multipliers, the kernel should always > > look for them. It probes the port multiplier first, before moving on to > > look for regular targets. > > > > So, from that standpoint, it should not be any different. It sounds like > > we're either getting further in the port multiplier probe process, or there > > is something different about the way things are behaving. > > > > If you can determine which commands are timing out, that may give us an > > idea about where it is in the probe process. > > > > Here is one way we may be able to track things down... Build a kernel with > > these options: > > > > options CAMDEBUG > > options CAM_DEBUG_FLAGS=CAM_DEBUG_PROBE > > > > If you build a kernel before and after the change with those options, it > > will hopefully allow us to compare the probe sequence and get a clue about > > where to look for the problem. 
> > > > OK, both the old and new kernel versions do an extremely fast intial > probe with these results (note: obtained with grep over dmesg.boot): > > (aprobe0:ahcich0:0:15:0): Probe started > (aprobe0:ahcich0:0:15:0): Probe PROBE_INVALID to PROBE_RESET > (aprobe1:ahcich1:0:15:0): Probe started > (aprobe1:ahcich1:0:15:0): Probe PROBE_INVALID to PROBE_RESET > (aprobe2:ahcich2:0:15:0): Probe started > (aprobe2:ahcich2:0:15:0): Probe PROBE_INVALID to PROBE_RESET > (aprobe3:ahcich3:0:15:0): Probe started > (aprobe3:ahcich3:0:15:0): Probe PROBE_INVALID to PROBE_RESET > (aprobe4:ahcich4:0:15:0): Probe started > (aprobe4:ahcich4:0:15:0): Probe PROBE_INVALID to PROBE_RESET > (aprobe5:ahcich5:0:15:0): Probe started > (aprobe5:ahcich5:0:15:0): Probe PROBE_INVALID to PROBE_RESET > (aprobe4:ahcich4:0:15:0): Probe PROBE_RESET to PROBE_RESET > (aprobe4:ahcich4:0:15:0): Probe PROBE_RESET to PROBE_INVALID > (aprobe4:ahcich4:0:15:0): Probe completed > (aprobe4:ahcich4:0:0:0): Probe started > (aprobe4:ahcich4:0:0:0): Probe PROBE_INVALID to PROBE_RESET > (aprobe4:ahcich4:0:0:0): Probe PROBE_RESET to PROBE_INVALID > (aprobe4:ahcich4:0:0:0): Probe completed > (aprobe0:ahcich0:0:15:0): Probe PROBE_RESET to PROBE_RESET > (aprobe2:ahcich2:0:15:0): Probe PROBE_RESET to PROBE_RESET > (aprobe0:ahcich0:0:15:0): SIGNATURE: > (aprobe0:ahcich0:0:15:0): Probe PROBE_RESET to PROBE_INVALID > (aprobe0:ahcich0:0:15:0): Probe completed > (aprobe2:ahcich2:0:15:0): SIGNATURE: > (aprobe2:ahcich2:0:15:0): Probe PROBE_RESET to PROBE_INVALID > (aprobe2:ahcich2:0:15:0): Probe completed > (aprobe0:ahcich0:0:0:0): Probe started > (aprobe0:ahcich0:0:0:0): Probe PROBE_INVALID to PROBE_RESET > (aprobe2:ahcich2:0:0:0): Probe started > (aprobe2:ahcich2:0:0:0): Probe PROBE_INVALID to PROBE_RESET > (aprobe3:ahcich3:0:15:0): Probe PROBE_RESET to PROBE_RESET > (aprobe5:ahcich5:0:15:0): Probe PROBE_RESET to PROBE_RESET > (aprobe0:ahcich0:0:0:0): SIGNATURE: > (aprobe0:ahcich0:0:0:0): Probe PROBE_RESET to PROBE_IDENTIFY 
> (aprobe2:ahcich2:0:0:0): SIGNATURE: > (aprobe2:ahcich2:0:0:0): Probe PROBE_RESET to PROBE_IDENTIFY > (aprobe1:ahcich1:0:15:0): Probe PROBE_RESET to PROBE_RESET > (aprobe3:ahcich3:0:15:0): SIGNATURE: > (aprobe3:ahcich3:0:15:0): Probe PROBE_RESET to PROBE_INVALID > (aprobe3:ahcich3:0:15:0): Probe completed > (aprobe5:ahcich5:0:15:0): SIGNATURE: > (aprobe5:ahcich5:0:15:0): Probe PROBE_RESET to PROBE_INVALID > (aprobe5:ahcich5:0:15:0): Probe completed > (aprobe1:ahcich1:0:15:0): SIGNATURE: eb14 > (aprobe1:ahcich1:0:15:0): Probe PROBE_RESET to PROBE_INVALID > (aprobe1:ahcich1:0:15:0): Probe completed > (aprobe1:ahcich3:0:0:0): Probe started > (aprobe1:ahcich3:0:0:0): Probe PROBE_INVALID to PROBE_RESET > (aprobe3:ahcich5:0:0:0): Probe started > (aprobe3:ahcich5:0:0:0): Probe PROBE_INVALID to PROBE_RESET > (aprobe4:ahcich1:0:0:0): Probe started > (aprobe4:ahcich1:0:0:0): Probe PROBE_INVALID to PROBE_RESET > (aprobe1:ahcich3:0:0:0): SIGNATURE: > (aprobe1:ahcich3:0:0:0): Probe PROBE_RESET to PROBE_IDENTIFY > (aprobe3:ahcich5:0:0:0): SIGNATURE: > (aprobe3:ahcich5:0:0:0): Probe PROBE_RESET to PROBE_IDENTIFY > (aprobe4:ahcich1:0:0:0): SIGNATURE: eb14 > (aprobe4:ahcich1:0:0:0): Probe PROBE_RESET to PROBE_IDENTIFY > (aprobe0:ahcich0:0:0:0): Probe PROBE_IDENTIFY to PROBE_SETMODE > (aprobe1:ahcich3:0:0:0): Probe PROBE_IDENTIFY to PROBE_SETMODE > (aprobe3:ahcic
Re: ATA? related trouble with r300299
On Tue, May 24, 2016 at 18:21:19 +0300, Oleg V. Nauman wrote: > On Tuesday 24 May 2016 10:02:09 you wrote: > > On Tue, May 24, 2016 at 16:38:40 +0300, Oleg V. Nauman wrote: > > > On Tuesday 24 May 2016 09:21:17 Kenneth D. Merry wrote: > > > > On Tue, May 24, 2016 at 08:04:21 +0300, Oleg V. Nauman wrote: > > > > > On Monday 23 May 2016 19:08:16 Kenneth D. Merry wrote: > > > > > > On Tue, May 24, 2016 at 00:59:34 +0300, Oleg V. Nauman wrote: > > > > > > > On Monday 23 May 2016 17:30:45 you wrote: > > > > > > > > On Tue, May 24, 2016 at 00:15:25 +0300, Oleg V. Nauman wrote: > > > > > > > > > On Monday 23 May 2016 17:11:34 Kenneth D. Merry wrote: > > > > > > > > > > On Tue, May 24, 2016 at 00:05:49 +0300, Oleg V. Nauman > wrote: > > > > > > > > > > > On Monday 23 May 2016 16:53:55 Kenneth D. Merry wrote: > > > > > > > > > > > > On Mon, May 23, 2016 at 23:21:32 +0300, Oleg V. Nauman > > > > > > wrote: > > > > > > > > > > > > > On Monday 23 May 2016 15:25:39 Kenneth D. Merry wrote: > > > > > > > > > > > > > > On Sat, May 21, 2016 at 09:30:35 +0300, Oleg V. 
> > > > > > > > > > > > > > Nauman > > > > > > > > > > wrote: > > > > > > > > > > > > > > > I have faced the issue with fresh CURRENT stopped > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > boot > > > > > > > > > > > > > > > on > > > > > > > > > > > > > > > my > > > > > > > > > > > > > > > old > > > > > > > > > > > > > > > desktop > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > after update to r300299 > > > > > > > > > > > > > > > Verbose boot shows the endless cycle of > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ata2: SATA reset: ports status=0x05 > > > > > > > > > > > > > > > ata2: reset tp1 mask=03 ostat0=50 ostat1=50 > > > > > > > > > > > > > > > ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00 > > > > > > > > > > > > > > > ata2: stat1=0x50 err=0x01 lsb=0x00 msb=0x00 > > > > > > > > > > > > > > > ata2: reset tp2 stat0=50 stat1=50 devices=0x3 > > > > > > > > > > > > > > > messages logged to console. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Below is the relevant portion of ATA > > > > > > > > > > > > > > > controller/devices > > > > > > > > > > > > > > > probed/attached > > > > > > > > > > > > > > > during the boot: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > atapci0: port > > > > > > > > > > > > > > > 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf > > > > > > > > > > > > > > > at > > > > > > > > > > > > > > > device > > > > > > > > > > > > > > > 31.1 > > > > > > > > > > > > > > > on > > > > > > > > > > > > > > > pci0 > > > > > > > > > > > > > > > ata0: at channel 0 on atapci0 > > > > > > > > > > > > > > > atapci1: port > > > > > > > > > > > > > > > 0xd080-0xd087, > > > > > > > > > > > > > > > 0xd000-0xd003, > > > > > > > > > > > > > > > 0xcc00-0xcc07,0xc880-0xc883,0xc800-0xc80f irq 19 > > > > > > > > > > > > > > > at > > > > > > > > > > &g
Re: AHCI/ADA regression?
On Tue, May 24, 2016 at 15:58:28 +0200, Gary Jennejohn wrote:
> On Mon, 23 May 2016 13:51:05 -0400
> "Kenneth D. Merry" <k...@freebsd.org> wrote:
>
> > On Sat, May 21, 2016 at 10:09:49 +0200, Gary Jennejohn wrote:
> > > There appears to be a regression in AHCI/ADA behavior since r300207.
> > >
> > > Starting a test kernel at r300293 results in extremely long timeouts
> > > probing ahcich2 for non-existent multiplier ports.
> > >
> > > Here some kernel output:
> >
> > Is this dmesg output with or without the problem?
> >
>
> Actually it's the same with and without the problem. The only real
> difference is the timeouts.

Ahh.

> > > ahci0:
> > > port 0xff00-0xff07,0xfe00-0xfe03,0xfd00-0xfd07,0xfc00-0xfc03,
> > > 0xfb00-0xfb0f mem 0xfe02f000-0xfe02f3ff irq 22 at device 17.0 on pci0
> > >
> > > ahci0: AHCI v1.10 with 6 3Gbps ports, Port Multiplier supported
> >
> > Has the controller always claimed support for Port Multipliers?
> >
>
> Yes, this from today's dmesg.boot:
> ahci0: AHCI v1.10 with 6 3Gbps ports, Port Multiplier supported
>
> > > ahcich2: at channel 2 on ahci0
> > >
> > > ada1 at ahcich2 bus 0 scbus2 target 0 lun 0
> > >
> > > /dev/ada1p1 on /home (ufs, local, journaled soft-updates)
> > >
> > > An older kernel at r299170 does not exhibit this peculiar behavior and
> > > mounts /home with no delays.
> >
> > Are you able to send dmesg output before and after?
> >
>
> Before is easy, since I have a dmesg.boot. After - I'll have to
> copy it from the screen since I don't have the patience to wait
> for booting to complete. The above is pretty much after, but I
> can copy down the timeout messages which arise when the
> multiplier ports are probed (no disk is present on any of them,
> so it takes forever).
>
> The question in my mind is - why are "empty" multiplier ports being
> probed with the new code but not with the old code?

If the HBA says that it supports port multipliers, the kernel should always
look for them. It probes the port multiplier first, before moving on to
look for regular targets.

So, from that standpoint, it should not be any different. It sounds like
we're either getting further in the port multiplier probe process, or there
is something different about the way things are behaving.

If you can determine which commands are timing out, that may give us an
idea about where it is in the probe process.

Here is one way we may be able to track things down... Build a kernel with
these options:

options CAMDEBUG
options CAM_DEBUG_FLAGS=CAM_DEBUG_PROBE

If you build a kernel before and after the change with those options, it
will hopefully allow us to compare the probe sequence and get a clue about
where to look for the problem.

Thanks,

Ken
--
Kenneth Merry
k...@freebsd.org
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
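Editorial sketch, not from the thread: one possible way to do the before/after comparison suggested above is to pull only the probe-trace lines out of each saved boot log and diff them. The function names and the idea of saving each boot's dmesg to a file are assumptions for illustration. It also assumes the CAM_DEBUG_PROBE lines look like "(aprobeN:ahcichN:b:t:l): Probe ...", and it drops the aprobeN unit because that number is not tied to a particular channel.

```shell
#!/bin/sh
# Print the CAM probe-trace lines from one saved boot log, normalised for diffing.
probe_lines() {
    # Keep only "(aprobeN:..." / "(probeN:..." lines, drop the probe unit
    # number, then group by device with a stable sort so that each
    # device's own transition order is preserved.
    grep -E '^\(a?probe[0-9]+:' "$1" |
        sed -E 's/^\(a?probe[0-9]+:/(/' |
        LC_ALL=C sort -s -t ')' -k 1,1
}

# Show where the probe sequences of two boots diverge (hypothetical helper).
compare_probes() {
    old_tmp=$(mktemp) || return 1
    new_tmp=$(mktemp) || { rm -f "$old_tmp"; return 1; }
    probe_lines "$1" > "$old_tmp"
    probe_lines "$2" > "$new_tmp"
    diff -u "$old_tmp" "$new_tmp"
    status=$?
    rm -f "$old_tmp" "$new_tmp"
    return "$status"
}
```

Usage would be along the lines of `compare_probes dmesg.old dmesg.new`, where both file names are placeholders. A device whose state transitions stop early in the new kernel's output would be a reasonable place to start looking.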
Re: ATA? related trouble with r300299
On Tue, May 24, 2016 at 16:38:40 +0300, Oleg V. Nauman wrote: > On Tuesday 24 May 2016 09:21:17 Kenneth D. Merry wrote: > > On Tue, May 24, 2016 at 08:04:21 +0300, Oleg V. Nauman wrote: > > > On Monday 23 May 2016 19:08:16 Kenneth D. Merry wrote: > > > > On Tue, May 24, 2016 at 00:59:34 +0300, Oleg V. Nauman wrote: > > > > > On Monday 23 May 2016 17:30:45 you wrote: > > > > > > On Tue, May 24, 2016 at 00:15:25 +0300, Oleg V. Nauman wrote: > > > > > > > On Monday 23 May 2016 17:11:34 Kenneth D. Merry wrote: > > > > > > > > On Tue, May 24, 2016 at 00:05:49 +0300, Oleg V. Nauman wrote: > > > > > > > > > On Monday 23 May 2016 16:53:55 Kenneth D. Merry wrote: > > > > > > > > > > On Mon, May 23, 2016 at 23:21:32 +0300, Oleg V. Nauman > wrote: > > > > > > > > > > > On Monday 23 May 2016 15:25:39 Kenneth D. Merry wrote: > > > > > > > > > > > > On Sat, May 21, 2016 at 09:30:35 +0300, Oleg V. Nauman > > > > > > wrote: > > > > > > > > > > > > > I have faced the issue with fresh CURRENT stopped to > > > > > > > > > > > > > boot > > > > > > > > > > > > > on > > > > > > > > > > > > > my > > > > > > > > > > > > > old > > > > > > > > > > > > > desktop > > > > > > > > > > > > > > > > > > > > > > > > > > after update to r300299 > > > > > > > > > > > > > Verbose boot shows the endless cycle of > > > > > > > > > > > > > > > > > > > > > > > > > > ata2: SATA reset: ports status=0x05 > > > > > > > > > > > > > ata2: reset tp1 mask=03 ostat0=50 ostat1=50 > > > > > > > > > > > > > ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00 > > > > > > > > > > > > > ata2: stat1=0x50 err=0x01 lsb=0x00 msb=0x00 > > > > > > > > > > > > > ata2: reset tp2 stat0=50 stat1=50 devices=0x3 > > > > > > > > > > > > > messages logged to console. 
> > > > > > > > > > > > > > > > > > > > > > > > > > Below is the relevant portion of ATA > > > > > > > > > > > > > controller/devices > > > > > > > > > > > > > probed/attached > > > > > > > > > > > > > during the boot: > > > > > > > > > > > > > > > > > > > > > > > > > > atapci0: port > > > > > > > > > > > > > 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at > > > > > > > > > > > > > device > > > > > > > > > > > > > 31.1 > > > > > > > > > > > > > on > > > > > > > > > > > > > pci0 > > > > > > > > > > > > > ata0: at channel 0 on atapci0 > > > > > > > > > > > > > atapci1: port > > > > > > > > > > > > > 0xd080-0xd087, > > > > > > > > > > > > > 0xd000-0xd003, > > > > > > > > > > > > > 0xcc00-0xcc07,0xc880-0xc883,0xc800-0xc80f irq 19 at > > > > > > > > > > > > > device > > > > > > > > > > > > > 31.2 on > > > > > > > > > > > > > pci0 > > > > > > > > > > > > > ata2: at channel 0 on atapci1 > > > > > > > > > > > > > ata3: at channel 1 on atapci1 > > > > > > > > > > > > > ada0 at ata2 bus 0 scbus1 target 0 lun 0 > > > > > > > > > > > > > ada0: ATA-7 SATA 2.x device > > > > > > > > > > > > > ada1 at ata2 bus 0 scbus1 target 1 lun 0 > > > > > > > > > > > > > ada1: ATA8-ACS SATA 3.x > > > > > > > > > > > > > device > > > > > > > > > > > > > cd0 at ata0 bus 0 scbus0 t
Re: ATA? related trouble with r300299
On Tue, May 24, 2016 at 08:04:21 +0300, Oleg V. Nauman wrote: > On Monday 23 May 2016 19:08:16 Kenneth D. Merry wrote: > > On Tue, May 24, 2016 at 00:59:34 +0300, Oleg V. Nauman wrote: > > > On Monday 23 May 2016 17:30:45 you wrote: > > > > On Tue, May 24, 2016 at 00:15:25 +0300, Oleg V. Nauman wrote: > > > > > On Monday 23 May 2016 17:11:34 Kenneth D. Merry wrote: > > > > > > On Tue, May 24, 2016 at 00:05:49 +0300, Oleg V. Nauman wrote: > > > > > > > On Monday 23 May 2016 16:53:55 Kenneth D. Merry wrote: > > > > > > > > On Mon, May 23, 2016 at 23:21:32 +0300, Oleg V. Nauman wrote: > > > > > > > > > On Monday 23 May 2016 15:25:39 Kenneth D. Merry wrote: > > > > > > > > > > On Sat, May 21, 2016 at 09:30:35 +0300, Oleg V. Nauman > wrote: > > > > > > > > > > > I have faced the issue with fresh CURRENT stopped to boot > > > > > > > > > > > on > > > > > > > > > > > my > > > > > > > > > > > old > > > > > > > > > > > desktop > > > > > > > > > > > > > > > > > > > > > > after update to r300299 > > > > > > > > > > > Verbose boot shows the endless cycle of > > > > > > > > > > > > > > > > > > > > > > ata2: SATA reset: ports status=0x05 > > > > > > > > > > > ata2: reset tp1 mask=03 ostat0=50 ostat1=50 > > > > > > > > > > > ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00 > > > > > > > > > > > ata2: stat1=0x50 err=0x01 lsb=0x00 msb=0x00 > > > > > > > > > > > ata2: reset tp2 stat0=50 stat1=50 devices=0x3 > > > > > > > > > > > messages logged to console. 
> > > > > > > > > > > > > > > > > > > > > > Below is the relevant portion of ATA controller/devices > > > > > > > > > > > probed/attached > > > > > > > > > > > during the boot: > > > > > > > > > > > > > > > > > > > > > > atapci0: port > > > > > > > > > > > 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at > > > > > > > > > > > device > > > > > > > > > > > 31.1 > > > > > > > > > > > on > > > > > > > > > > > pci0 > > > > > > > > > > > ata0: at channel 0 on atapci0 > > > > > > > > > > > atapci1: port > > > > > > > > > > > 0xd080-0xd087, > > > > > > > > > > > 0xd000-0xd003, > > > > > > > > > > > 0xcc00-0xcc07,0xc880-0xc883,0xc800-0xc80f irq 19 at device > > > > > > > > > > > 31.2 on > > > > > > > > > > > pci0 > > > > > > > > > > > ata2: at channel 0 on atapci1 > > > > > > > > > > > ata3: at channel 1 on atapci1 > > > > > > > > > > > ada0 at ata2 bus 0 scbus1 target 0 lun 0 > > > > > > > > > > > ada0: ATA-7 SATA 2.x device > > > > > > > > > > > ada1 at ata2 bus 0 scbus1 target 1 lun 0 > > > > > > > > > > > ada1: ATA8-ACS SATA 3.x device > > > > > > > > > > > cd0 at ata0 bus 0 scbus0 target 0 lun 0 > > > > > > > > > > > cd0: <_NEC DVD_RW ND-3570A 1.11> Removable CD-ROM SCSI > > > > > > > > > > > device > > > > > > > > > > > > > > > > > > > > I'm not entirely sure what is causing the problem with your > > > > > > > > > > system, > > > > > > > > > > but > > > > > > > > > > hopefully we can narrow it down a bit. > > > > > > > > > > > > > > > > > > > > There is a bug that came in with my SMR changes in revision > > > > > > > > > > 300207 > > > > > > > > > > that > > > > > > > > > > broke the quirk functionality in the ada(4) driver. I don'
Re: ATA? related trouble with r300299
On Tue, May 24, 2016 at 00:59:34 +0300, Oleg V. Nauman wrote: > On Monday 23 May 2016 17:30:45 you wrote: > > On Tue, May 24, 2016 at 00:15:25 +0300, Oleg V. Nauman wrote: > > > On Monday 23 May 2016 17:11:34 Kenneth D. Merry wrote: > > > > On Tue, May 24, 2016 at 00:05:49 +0300, Oleg V. Nauman wrote: > > > > > On Monday 23 May 2016 16:53:55 Kenneth D. Merry wrote: > > > > > > On Mon, May 23, 2016 at 23:21:32 +0300, Oleg V. Nauman wrote: > > > > > > > On Monday 23 May 2016 15:25:39 Kenneth D. Merry wrote: > > > > > > > > On Sat, May 21, 2016 at 09:30:35 +0300, Oleg V. Nauman wrote: > > > > > > > > > I have faced the issue with fresh CURRENT stopped to boot on > > > > > > > > > my > > > > > > > > > old > > > > > > > > > desktop > > > > > > > > > > > > > > > > > > after update to r300299 > > > > > > > > > Verbose boot shows the endless cycle of > > > > > > > > > > > > > > > > > > ata2: SATA reset: ports status=0x05 > > > > > > > > > ata2: reset tp1 mask=03 ostat0=50 ostat1=50 > > > > > > > > > ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00 > > > > > > > > > ata2: stat1=0x50 err=0x01 lsb=0x00 msb=0x00 > > > > > > > > > ata2: reset tp2 stat0=50 stat1=50 devices=0x3 > > > > > > > > > messages logged to console. 
> > > > > > > > > > > > > > > > > > Below is the relevant portion of ATA controller/devices > > > > > > > > > probed/attached > > > > > > > > > during the boot: > > > > > > > > > > > > > > > > > > atapci0: port > > > > > > > > > 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device > > > > > > > > > 31.1 > > > > > > > > > on > > > > > > > > > pci0 > > > > > > > > > ata0: at channel 0 on atapci0 > > > > > > > > > atapci1: port 0xd080-0xd087, > > > > > > > > > 0xd000-0xd003, > > > > > > > > > 0xcc00-0xcc07,0xc880-0xc883,0xc800-0xc80f irq 19 at device > > > > > > > > > 31.2 on > > > > > > > > > pci0 > > > > > > > > > ata2: at channel 0 on atapci1 > > > > > > > > > ata3: at channel 1 on atapci1 > > > > > > > > > ada0 at ata2 bus 0 scbus1 target 0 lun 0 > > > > > > > > > ada0: ATA-7 SATA 2.x device > > > > > > > > > ada1 at ata2 bus 0 scbus1 target 1 lun 0 > > > > > > > > > ada1: ATA8-ACS SATA 3.x device > > > > > > > > > cd0 at ata0 bus 0 scbus0 target 0 lun 0 > > > > > > > > > cd0: <_NEC DVD_RW ND-3570A 1.11> Removable CD-ROM SCSI device > > > > > > > > > > > > > > > > I'm not entirely sure what is causing the problem with your > > > > > > > > system, > > > > > > > > but > > > > > > > > hopefully we can narrow it down a bit. > > > > > > > > > > > > > > > > There is a bug that came in with my SMR changes in revision > > > > > > > > 300207 > > > > > > > > that > > > > > > > > broke the quirk functionality in the ada(4) driver. I don't > > > > > > > > think > > > > > > > > that > > > > > > > > is > > > > > > > > the problem you're seeing, though. > > > > > > > > > > > > > > > > Can you try out this patch: > > > > > > > > > > > > > > > > https://people.freebsd.org/~ken/cam_smr_ada_patch.20160523.1.txt > > > > > > > > > > > > > > > > In /boot/loader.conf, put the following: > > > > > > > > > > > > > > > > kern.cam.ada.0.quirks="0x04" > > > > > > > > kern.cam.ada.1.quirks="0x04" > > > > > > > > > > > > > > > > If you're
Re: ATA? related trouble with r300299
On Tue, May 24, 2016 at 00:15:25 +0300, Oleg V. Nauman wrote: > On Monday 23 May 2016 17:11:34 Kenneth D. Merry wrote: > > On Tue, May 24, 2016 at 00:05:49 +0300, Oleg V. Nauman wrote: > > > On Monday 23 May 2016 16:53:55 Kenneth D. Merry wrote: > > > > On Mon, May 23, 2016 at 23:21:32 +0300, Oleg V. Nauman wrote: > > > > > On Monday 23 May 2016 15:25:39 Kenneth D. Merry wrote: > > > > > > On Sat, May 21, 2016 at 09:30:35 +0300, Oleg V. Nauman wrote: > > > > > > > I have faced the issue with fresh CURRENT stopped to boot on my > > > > > > > old > > > > > > > desktop > > > > > > > > > > > > > > after update to r300299 > > > > > > > Verbose boot shows the endless cycle of > > > > > > > > > > > > > > ata2: SATA reset: ports status=0x05 > > > > > > > ata2: reset tp1 mask=03 ostat0=50 ostat1=50 > > > > > > > ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00 > > > > > > > ata2: stat1=0x50 err=0x01 lsb=0x00 msb=0x00 > > > > > > > ata2: reset tp2 stat0=50 stat1=50 devices=0x3 > > > > > > > messages logged to console. 
> > > > > > > > > > > > > > Below is the relevant portion of ATA controller/devices > > > > > > > probed/attached > > > > > > > during the boot: > > > > > > > > > > > > > > atapci0: port > > > > > > > 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 31.1 > > > > > > > on > > > > > > > pci0 > > > > > > > ata0: at channel 0 on atapci0 > > > > > > > atapci1: port 0xd080-0xd087, > > > > > > > 0xd000-0xd003, > > > > > > > 0xcc00-0xcc07,0xc880-0xc883,0xc800-0xc80f irq 19 at device 31.2 on > > > > > > > pci0 > > > > > > > ata2: at channel 0 on atapci1 > > > > > > > ata3: at channel 1 on atapci1 > > > > > > > ada0 at ata2 bus 0 scbus1 target 0 lun 0 > > > > > > > ada0: ATA-7 SATA 2.x device > > > > > > > ada1 at ata2 bus 0 scbus1 target 1 lun 0 > > > > > > > ada1: ATA8-ACS SATA 3.x device > > > > > > > cd0 at ata0 bus 0 scbus0 target 0 lun 0 > > > > > > > cd0: <_NEC DVD_RW ND-3570A 1.11> Removable CD-ROM SCSI device > > > > > > > > > > > > I'm not entirely sure what is causing the problem with your system, > > > > > > but > > > > > > hopefully we can narrow it down a bit. > > > > > > > > > > > > There is a bug that came in with my SMR changes in revision 300207 > > > > > > that > > > > > > broke the quirk functionality in the ada(4) driver. I don't think > > > > > > that > > > > > > is > > > > > > the problem you're seeing, though. > > > > > > > > > > > > Can you try out this patch: > > > > > > > > > > > > https://people.freebsd.org/~ken/cam_smr_ada_patch.20160523.1.txt > > > > > > > > > > > > In /boot/loader.conf, put the following: > > > > > > > > > > > > kern.cam.ada.0.quirks="0x04" > > > > > > kern.cam.ada.1.quirks="0x04" > > > > > > > > > > > > If you're able to boot with those quirk entries in the loader.conf, > > > > > > try > > > > > > taking one of them out, and reboot. If that works, try taking the > > > > > > other > > > > > > one out and reboot. 
> > > > > > > > > > > > What I'm trying to figure out here is where the problem lies: > > > > > > > > > > > > 1. The bug with the ada(4) driver (in where it loaded the quirks). > > > > > > 2. The extra probe steps in the ada(4) driver might be causing a > > > > > > problem > > > > > > > > > > > >with ada0 (Samsung drive). > > > > > > > > > > > > 3. The extra probe steps in the ada(4) driver might be causing a > > > > > > problem > > > > > > > > > > > >with ada1 (Seagate drive). > > > > > > > >
Re: ATA? related trouble with r300299
On Tue, May 24, 2016 at 00:05:49 +0300, Oleg V. Nauman wrote: > On Monday 23 May 2016 16:53:55 Kenneth D. Merry wrote: > > On Mon, May 23, 2016 at 23:21:32 +0300, Oleg V. Nauman wrote: > > > On Monday 23 May 2016 15:25:39 Kenneth D. Merry wrote: > > > > On Sat, May 21, 2016 at 09:30:35 +0300, Oleg V. Nauman wrote: > > > > > I have faced the issue with fresh CURRENT stopped to boot on my old > > > > > desktop > > > > > > > > > > after update to r300299 > > > > > Verbose boot shows the endless cycle of > > > > > > > > > > ata2: SATA reset: ports status=0x05 > > > > > ata2: reset tp1 mask=03 ostat0=50 ostat1=50 > > > > > ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00 > > > > > ata2: stat1=0x50 err=0x01 lsb=0x00 msb=0x00 > > > > > ata2: reset tp2 stat0=50 stat1=50 devices=0x3 > > > > > messages logged to console. > > > > > > > > > > Below is the relevant portion of ATA controller/devices > > > > > probed/attached > > > > > during the boot: > > > > > > > > > > atapci0: port > > > > > 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 31.1 on > > > > > pci0 > > > > > ata0: at channel 0 on atapci0 > > > > > atapci1: port 0xd080-0xd087, > > > > > 0xd000-0xd003, > > > > > 0xcc00-0xcc07,0xc880-0xc883,0xc800-0xc80f irq 19 at device 31.2 on > > > > > pci0 > > > > > ata2: at channel 0 on atapci1 > > > > > ata3: at channel 1 on atapci1 > > > > > ada0 at ata2 bus 0 scbus1 target 0 lun 0 > > > > > ada0: ATA-7 SATA 2.x device > > > > > ada1 at ata2 bus 0 scbus1 target 1 lun 0 > > > > > ada1: ATA8-ACS SATA 3.x device > > > > > cd0 at ata0 bus 0 scbus0 target 0 lun 0 > > > > > cd0: <_NEC DVD_RW ND-3570A 1.11> Removable CD-ROM SCSI device > > > > > > > > I'm not entirely sure what is causing the problem with your system, but > > > > hopefully we can narrow it down a bit. > > > > > > > > There is a bug that came in with my SMR changes in revision 300207 that > > > > broke the quirk functionality in the ada(4) driver. 
I don't think that > > > > is > > > > the problem you're seeing, though. > > > > > > > > Can you try out this patch: > > > > > > > > https://people.freebsd.org/~ken/cam_smr_ada_patch.20160523.1.txt > > > > > > > > In /boot/loader.conf, put the following: > > > > > > > > kern.cam.ada.0.quirks="0x04" > > > > kern.cam.ada.1.quirks="0x04" > > > > > > > > If you're able to boot with those quirk entries in the loader.conf, try > > > > taking one of them out, and reboot. If that works, try taking the other > > > > one out and reboot. > > > > > > > > What I'm trying to figure out here is where the problem lies: > > > > > > > > 1. The bug with the ada(4) driver (in where it loaded the quirks). > > > > 2. The extra probe steps in the ada(4) driver might be causing a problem > > > > > > > >with ada0 (Samsung drive). > > > > > > > > 3. The extra probe steps in the ada(4) driver might be causing a problem > > > > > > > >with ada1 (Seagate drive). > > > > > > > > 4. Something else. > > > > > > > > So, if you can try the patch and try to eliminate a few possibilities, > > > > we > > > > may be able to narrow it down. > > > > > > I was able to boot after applying the patch ; > > > > > > kern.cam.ada.0.quirks="0x04" > > > was the quirk in effect. It is quirk for my Samsung HD200HJ KF100-06 hard > > > drive. > > > > Okay. Just so we can narrow it down a little more, can you try this: > > > > First, let's try getting an ATA Log directory using the PIO version of the > > command: > > > > camcontrol cmd ada0 -v -a "2f 0 0 0 0 0 0 0 0 0 1 0" -i 512 - |hd > > > > If that works (you should get hexdump output), try the DMA version of the > > command: > > > > camcontrol cmd ada0 -v -d -a "47 0 0 0 0 0 0 0 0 0 1 0" -i 512 - |hd > > "Expecting a character pointer argument." error for both commands. Did the double quotes make it onto the command line? Both of those work for me... 
Ken -- Kenneth Merry k...@freebsd.org
Re: ATA? related trouble with r300299
On Mon, May 23, 2016 at 23:21:32 +0300, Oleg V. Nauman wrote: > On Monday 23 May 2016 15:25:39 Kenneth D. Merry wrote: > > On Sat, May 21, 2016 at 09:30:35 +0300, Oleg V. Nauman wrote: > > > I have faced the issue with fresh CURRENT stopped to boot on my old > > > desktop > > > > > > after update to r300299 > > > Verbose boot shows the endless cycle of > > > > > > ata2: SATA reset: ports status=0x05 > > > ata2: reset tp1 mask=03 ostat0=50 ostat1=50 > > > ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00 > > > ata2: stat1=0x50 err=0x01 lsb=0x00 msb=0x00 > > > ata2: reset tp2 stat0=50 stat1=50 devices=0x3 > > > messages logged to console. > > > > > > Below is the relevant portion of ATA controller/devices probed/attached > > > during the boot: > > > > > > atapci0: port > > > 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 31.1 on pci0 > > > ata0: at channel 0 on atapci0 > > > atapci1: port 0xd080-0xd087, > > > 0xd000-0xd003, > > > 0xcc00-0xcc07,0xc880-0xc883,0xc800-0xc80f irq 19 at device 31.2 on pci0 > > > ata2: at channel 0 on atapci1 > > > ata3: at channel 1 on atapci1 > > > ada0 at ata2 bus 0 scbus1 target 0 lun 0 > > > ada0: ATA-7 SATA 2.x device > > > ada1 at ata2 bus 0 scbus1 target 1 lun 0 > > > ada1: ATA8-ACS SATA 3.x device > > > cd0 at ata0 bus 0 scbus0 target 0 lun 0 > > > cd0: <_NEC DVD_RW ND-3570A 1.11> Removable CD-ROM SCSI device > > > > I'm not entirely sure what is causing the problem with your system, but > > hopefully we can narrow it down a bit. > > > > There is a bug that came in with my SMR changes in revision 300207 that > > broke the quirk functionality in the ada(4) driver. I don't think that is > > the problem you're seeing, though. 
> > > > Can you try out this patch: > > > > https://people.freebsd.org/~ken/cam_smr_ada_patch.20160523.1.txt > > > > In /boot/loader.conf, put the following: > > > > kern.cam.ada.0.quirks="0x04" > > kern.cam.ada.1.quirks="0x04" > > > > If you're able to boot with those quirk entries in the loader.conf, try > > taking one of them out, and reboot. If that works, try taking the other > > one out and reboot. > > > > What I'm trying to figure out here is where the problem lies: > > > > 1. The bug with the ada(4) driver (in where it loaded the quirks). > > 2. The extra probe steps in the ada(4) driver might be causing a problem > >with ada0 (Samsung drive). > > 3. The extra probe steps in the ada(4) driver might be causing a problem > >with ada1 (Seagate drive). > > 4. Something else. > > > > So, if you can try the patch and try to eliminate a few possibilities, we > > may be able to narrow it down. > > I was able to boot after applying the patch ; > kern.cam.ada.0.quirks="0x04" > was the quirk in effect. It is quirk for my Samsung HD200HJ KF100-06 hard > drive. Okay. Just so we can narrow it down a little more, can you try this: First, let's try getting an ATA Log directory using the PIO version of the command: camcontrol cmd ada0 -v -a "2f 0 0 0 0 0 0 0 0 0 1 0" -i 512 - |hd If that works (you should get hexdump output), try the DMA version of the command: camcontrol cmd ada0 -v -d -a "47 0 0 0 0 0 0 0 0 0 1 0" -i 512 - |hd My hope is that we can confirm whether or not this is what is causing the Samsung drive to have issues. It is certainly possible to put in a quirk, but I'd rather not make it unnecessarily broad. Thanks, Ken -- Kenneth Merry k...@freebsd.org
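Editorial sketch, not from the thread: the two camcontrol commands above differ only in the first byte of the `-a` string and the `-d` flag. As far as I understand camcontrol(8), the twelve bytes after `-a` are the ATA command block registers in the order used below. On that reading, 0x2f is READ LOG EXT and 0x47 is READ LOG DMA EXT, lba_low 0 selects the log directory, and sector_count 1 matches the `-i 512` read length. The helper name is made up, and the register order should be checked against the man page on the system in question.

```shell
#!/bin/sh
# Label the 12 register bytes given to "camcontrol cmd -a".
# The name order follows the list in camcontrol(8); it is an assumption here.
decode_ata_regs() {
    names="command features lba_low lba_mid lba_high device"
    names="$names lba_low_exp lba_mid_exp lba_high_exp features_exp"
    names="$names sector_count sector_count_exp"
    # Split the quoted byte string into positional parameters.
    set -- $1
    if [ "$#" -ne 12 ]; then
        echo "expected 12 register bytes, got $#" >&2
        return 1
    fi
    for name in $names; do
        printf '%s=0x%s\n' "$name" "$1"
        shift
    done
}
```

For example, `decode_ata_regs "2f 0 0 0 0 0 0 0 0 0 1 0"` lists each register name next to its byte, which makes it easier to see that the PIO and DMA requests ask for the same single 512-byte page.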
Re: ATA? related trouble with r300299
On Sat, May 21, 2016 at 09:30:35 +0300, Oleg V. Nauman wrote: > > I have faced the issue with fresh CURRENT stopped to boot on my old desktop > after update to r300299 > Verbose boot shows the endless cycle of > > ata2: SATA reset: ports status=0x05 > ata2: reset tp1 mask=03 ostat0=50 ostat1=50 > ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00 > ata2: stat1=0x50 err=0x01 lsb=0x00 msb=0x00 > ata2: reset tp2 stat0=50 stat1=50 devices=0x3 > messages logged to console. > > Below is the relevant portion of ATA controller/devices probed/attached > during > the boot: > > atapci0: port > 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 31.1 on pci0 > ata0: at channel 0 on atapci0 > atapci1: port 0xd080-0xd087, 0xd000-0xd003, > 0xcc00-0xcc07,0xc880-0xc883,0xc800-0xc80f irq 19 at device 31.2 on pci0 > ata2: at channel 0 on atapci1 > ata3: at channel 1 on atapci1 > ada0 at ata2 bus 0 scbus1 target 0 lun 0 > ada0: ATA-7 SATA 2.x device > ada1 at ata2 bus 0 scbus1 target 1 lun 0 > ada1: ATA8-ACS SATA 3.x device > cd0 at ata0 bus 0 scbus0 target 0 lun 0 > cd0: <_NEC DVD_RW ND-3570A 1.11> Removable CD-ROM SCSI device I'm not entirely sure what is causing the problem with your system, but hopefully we can narrow it down a bit. There is a bug that came in with my SMR changes in revision 300207 that broke the quirk functionality in the ada(4) driver. I don't think that is the problem you're seeing, though. Can you try out this patch: https://people.freebsd.org/~ken/cam_smr_ada_patch.20160523.1.txt In /boot/loader.conf, put the following: kern.cam.ada.0.quirks="0x04" kern.cam.ada.1.quirks="0x04" If you're able to boot with those quirk entries in the loader.conf, try taking one of them out, and reboot. If that works, try taking the other one out and reboot. What I'm trying to figure out here is where the problem lies: 1. The bug with the ada(4) driver (in where it loaded the quirks). 2. 
The extra probe steps in the ada(4) driver might be causing a problem with ada0 (Samsung drive). 3. The extra probe steps in the ada(4) driver might be causing a problem with ada1 (Seagate drive). 4. Something else. So, if you can try the patch and try to eliminate a few possibilities, we may be able to narrow it down. Thanks, Ken -- Kenneth Merry k...@freebsd.org ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: AHCI/ADA regression?
On Sat, May 21, 2016 at 10:09:49 +0200, Gary Jennejohn wrote:
> There appears to be a regression in AHCI/ADA behavior since r300207.
>
> Starting a test kernel at r300293 results in extremely long timeouts
> probing ahcich2 for non-existent multiplier ports.
>
> Here some kernel output:

Is this dmesg output with or without the problem?

> ahci0:
> port 0xff00-0xff07,0xfe00-0xfe03,0xfd00-0xfd07,0xfc00-0xfc03,
> 0xfb00-0xfb0f mem 0xfe02f000-0xfe02f3ff irq 22 at device 17.0 on pci0
>
> ahci0: AHCI v1.10 with 6 3Gbps ports, Port Multiplier supported

Has the controller always claimed support for Port Multipliers?

> ahcich2: at channel 2 on ahci0
>
> ada1 at ahcich2 bus 0 scbus2 target 0 lun 0
>
> /dev/ada1p1 on /home (ufs, local, journaled soft-updates)
>
> An older kernel at r299170 does not exhibit this peculiar behavior and
> mounts /home with no delays.

Are you able to send dmesg output before and after?

Ken
--
Kenneth Merry
k...@freebsd.org
Re: CAM Shingled Disk support patches available
On Tue, Mar 01, 2016 at 20:07:19 -0700, Scott Long wrote:
> Hi Ken,
>
> I'm against changing the function signature of scsi_ata_pass_16(). Even
> if you manage to get things right with symbol versioning, it still leads to
> problems of code compatibility. Maybe pre-existing binaries will work, but
> source code will forever have to include an #if __FreeBSD_version <
> xx bit of nonsense.

Good point, that would be annoying.

> I agree that it was incorrect for dxferlen to be declared as a uint16_t.
> However, the function already contains a sector count argument pair. In
> theory the sector count multiplied by the sector length, both of which the
> application should know in order to arrive at a sensible dxferlen, can
> substitute for the dxferlen argument. If so, then we can just ignore that
> argument and declare that sector_count has logical priority.

Okay. That will probably work for the most part.

> Really though, I think that scsi_ata_pass_16() is a crummy function. If its
> purpose is to implement SAT-3 12.2.2, it does an incredibly poor job at it:
>
> - By my count, it only covers 12 of the available 13 registers.
>
> - It has no 12 byte, opcode 0xa1 variant.
>
> - It doesn't make any allowance for providing the response registers to the
>   caller on completion. Well, maybe it kinda does through a sense descriptor,
>   but... it's kinda open to vague interpretation.
>
> - Its use of the registers is clunky, assuming for example that you'll only
>   want to fill the six LBA registers with a host-ordered 64-bit number.
>   There are plenty of commands that re-use sub-parts of the LBA, features,
>   and/or sector count registers for different things.
>
> I know you stated that you didn't want to do this, but I think it's
> better to start over with a better function that has a better signature
> and a new name. In fact, I think it's better to use the existing ata_cmd
> and ata_res structures from sys/cam/ata/ata_all.h, provide accessors for
> the multi-byte registers if needed, provide a 12-byte compatibility, and
> simplify the signature. Something like this:
>
> void scsi_ata_pass(struct ccb_scsiio *csio, u_int32_t retries,
>                    void (*cbfcnp)(struct cam_periph *, union ccb *),
>                    u_int32_t flags, u_int8_t tag_action,
>                    struct ata_cmd *cmd, struct ata_res *res,
>                    u_int8_t *data_ptr, u_int32_t dxfer_len,
>                    u_int8_t *data_ptr, u_int16_t dxfer_len,

I assume you only intended one line there, not two. :)

>                    u_int8_t sense_len, u_int32_t timeout);
>
> To differentiate between the 12 and 16 byte variants, you'd look at the
> AP_EXTEND flag in the protocol field. Btw, the handling of that flag is
> inconsistent in the implementation of the existing scsi_ata_pass_16(). If
> the caller provides an ata_res pointer then it gets filled on completion,
> otherwise the caller does its best to look at 12.2.2.6 and extract what it
> can from the sense descriptor.
>
> So my proposal is to create a new scsi_ata_pass and deprecate but not remove
> scsi_ata_pass_16. Tell people that if they need to use it, dxfer_len is
> going to have lower priority than sector_count/sector_count_exp if the
> latter multiply to more than 65535.

In general I think that's a reasonable idea, but we should probably go
further.

While we're at it, we should figure out what we need to do to add the
Auxiliary register to struct ata_cmd. We'll need that to do the NCQ
versions of the various SMR commands, as well as TRIM.

The obvious challenge is that probably means changing the existing struct
ccb_ataio CCB and bumping the CAM version. At least that will be source
compatible, but will require ifdefs if people want to compile on older
versions of FreeBSD. But in that case, they'll also be faced with no
support for sending the NCQ versions of the commands, anyway. No way
around that, though, since we have to follow the changing specs.

Ken
--
Kenneth Merry
k...@freebsd.org
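
The arithmetic behind the dxfer_len discussion above can be sketched in a few lines of C. This is a minimal illustration, not libcam code: the helper names and the split of the sector count into two 8-bit halves are assumptions made for the example. It shows how a 16-bit length silently drops anything at or above 64K, and how the sector count could take priority as proposed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustration only: these helpers are hypothetical and are not part of
 * libcam. They model the arithmetic discussed in the thread.
 */

/* What a 16-bit dxfer_len parameter receives when handed a larger length. */
static uint16_t
dxfer_len_as_u16(uint32_t requested)
{
	return ((uint16_t)requested);	/* keeps only the low 16 bits */
}

/*
 * Transfer length derived from the sector count register pair.
 * sector_count_exp is assumed to hold bits 15:8 of the count.
 */
static uint32_t
dxfer_len_from_sectors(uint8_t sector_count_exp, uint8_t sector_count,
    uint32_t sector_size)
{
	uint32_t count = ((uint32_t)sector_count_exp << 8) | sector_count;

	assert(sector_size != 0);
	return (count * sector_size);
}

/* Prefer the sector-derived length when it exceeds what 16 bits can hold. */
static uint32_t
effective_dxfer_len(uint16_t dxfer_len, uint8_t sector_count_exp,
    uint8_t sector_count, uint32_t sector_size)
{
	uint32_t from_sectors = dxfer_len_from_sectors(sector_count_exp,
	    sector_count, sector_size);

	return (from_sectors > UINT16_MAX ? from_sectors : dxfer_len);
}
```

With 512-byte sectors, a 256-sector request is 128K; passed through a uint16_t it becomes 0, while the sector-derived value still yields 131072.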
Re: CAM Shingled Disk support patches available
I have a new set of SMR patches available. (See the original message below for a more detailed description of what these patches do.) The primary change is to add library versioning to libcam so that we can change the function prototype of scsi_ata_pass_16() in a way that won't break existing binaries. If someone more familiar with library versioning wants to review this, I'd appreciate it. The patches are here: FreeBSD/head, as of SVN revision 296278 https://people.freebsd.org/~ken/cam_smr.head.20160301.1.txt stable/10, as of SVN revision 296248 https://people.freebsd.org/~ken/cam_smr.stable10.20160301.1.txt (Note that although there is a stable/10 version of the patches, I'm not planning to merge them to stable/10 because of the change to struct bio. I can't really figure out a good way to make that backward compatible. If there is consensus that breaking it is fine because it isn't a user API, then that may be another story.) The problem is that the existing, in-tree version of scsi_ata_pass_16() has a dxfer_len argument that is a uint16_t. That restricts transfer sizes to 64KB. So, we need to update it to allow larger than 64K transfers. I could just create a new function, but I'd rather just retire the broken version. The intent here is that: 1. Binaries built against the old version of libcam, before versioning was turned on, will get the old version of the scsi_ata_pass_16() function with a uint16_t dxfer_len. 2. Binaries built against the new version of libcam will get the new version of the scsi_ata_pass_16() function with a uint32_t dxfer_len. I've tested this, and it appears to work, but I'm not 100% certain this is all correct. I looked at Dan Eischen's description of symbol versioning here: https://people.freebsd.org/~deischen/symver/freebsd_versioning.txt And it looks like the actual implementation is a little different than what is described there. 
I looked around the tree, and didn't see anything that is obviously exactly like what I'm trying to do here. So, what I did is as follows: 1. For the kernel, the only change is to switch the dxfer_len argument from a uint16_t to a uint32_t. 2. For userland, in scsi_all.c, there are now two versions of scsi_ata_pass_16 -- _ver1 and _ver2. _ver1 is aliased to scsi_ata_pass_16() for FBSD_1.3 using __sym_compat(). _ver2 is aliased to scsi_ata_pass_16() for FBSD_1.4 using __sym_default(). 3. In lib/libcam/Versions.def, I defined FBSD_1.3 and FBSD_1.4, which depends on FBSD_1.3. 4. In lib/libcam/Symbol.map, I pulled out all of the functions defined in libcam, sorted them, and defined them in FBSD_1.3. I moved scsi_ata_pass_16() to FBSD_1.4. (According to the freebsd_versioning.txt paper linked above, I should have been able to have scsi_ata_pass_16() in both FBSD_1.3 and FBSD_1.4, but that isn't the case in practice.) In testing an old binary (linked against libcam without symbol versioning) against a new libcam (with symbol versioning), the old version of the function appears to be used. With a new binary, the new version of the function appears to be used. So it looks like things work as intended, but I don't fully trust my understanding here. So, if someone could take a look at the changes, I'd appreciate it. In particular, I have a few questions: 1. If this change to scsi_ata_pass_16() gets merged to stable/10 (apart from the larger SMR changes), what should be done with the libcam library version? 2. Are 1.3 and 1.4 the proper versions to use? 3. If we make additional CAM helper function library changes, when do the versions get bumped? i.e., is this an opportunity to look for other library functions with issues and make changes if possible? 4. When you're going from an unversioned library to a versioned library, which version of a function gets linked in to a binary linked to the unversioned library when you run it against a versioned library? 
In other words, what is supposed to happen in the test scenario I tried above, and am I really seeing what is supposed to happen? Thanks, Ken On Mon, Jan 18, 2016 at 17:37:04 -0500, Kenneth D. Merry wrote: > I have a new set of SMR patches available. See below for the full > explanation. > > The primary change here is that I have added SMR support to the ada(4) > driver. I spent some time considering whether to try to make the da(4) and > ada(4) probe infrastructure somewhat common, but in the end concluded it > would be too involved with not enough code reduction (if any) in the end. > > So, although the ideas are similar, the probe logic is separate. > > Note that NCQ support for SMR commands (Report Zones, Reset Write Pointer, > etc.) for SATA protocol shingled drives isn't active. For both the da(4) > and ada(4) driver this is for lack of a way to plumb the ATA Auxiliary > register down to the drive. > > In the ada(4) case, we need to add the register to struct ccb_ataio and > add support in one
Re: CAM Shingled Disk support patches available
On Mon, Jan 18, 2016 at 16:50:34 -0800, Warner Losh wrote:
>
> > On Jan 18, 2016, at 2:37 PM, Kenneth D. Merry <k...@freebsd.org> wrote:
> >
> > I have a new set of SMR patches available. See below for the full
> > explanation.
> >
> > The primary change here is that I have added SMR support to the ada(4)
> > driver. I spent some time considering whether to try to make the da(4) and
> > ada(4) probe infrastructure somewhat common, but in the end concluded it
> > would be too involved with not enough code reduction (if any) in the end.
> >
> > So, although the ideas are similar, the probe logic is separate.
> >
> > Note that NCQ support for SMR commands (Report Zones, Reset Write Pointer,
> > etc.) for SATA protocol shingled drives isn't active. For both the da(4)
> > and ada(4) driver this is for lack of a way to plumb the ATA Auxiliary
> > register down to the drive.
>
> I've plumbed it down, but in a gross, kludgy way to make NCQ Trim work
> where the only value in the Auxiliary register needs to be 1. It only takes
> up one bit, but it doesn't change the size of the CCB. If the NCQ Trim
> work wasn't based on the I/O scheduler, I'd have pushed it into head
> and would be happy to share code.

Yeah, for SMR, we'll need to pass the full register down. That is how you
specify the service action (open, close, finish, reset write pointer,
report zones).

> AHCI can send it, but it turns out that LSI's drivers (mpt, mps, etc)
> can't do it due to firmware inadequacies. The ability to send a FIS
> in these firmwares looked promising, but it requires a full draining of
> other requests, which kind of defeats the purpose of NCQ.

Yeah, that would kinda defeat the purpose. I'm sending a SCSI command
(ATA PASS-THROUGH) to get the SATA zone commands down there. Those are
treated like an ordered tag by the LSI firmware as well. Which is just as
well, since there is no way to specify the Auxiliary register via that SCSI
command, and so we can't do NCQ anyway.

LSI/Avago said they're planning to support the zone commands in the SAT
layer in the HBAs in the 12Gb boards. Phase 10 doesn't have it from what I
understand, but hopefully that'll show up soon. The translation is in the
latest SAT draft, and it is very straightforward to map from one to the
other, because the SCSI and ATA commands and semantics are pretty much
identical.

> > In the ada(4) case, we need to add the register to struct ccb_ataio and
> > add support in one or more of the underlying SATA drivers, e.g. ahci(4).
>
> I believe that changes the size of the CCB, so I tried to avoid
> that since I didn't want to force a recompile of camcontrol(8).
> Adding it to the atacmd structure wasn't so bad, and the CCB size
> didn't completely change. The problem was that the atacmd changed
> size and pushed all the other fields.

Yes. In order to do it, we'll need to add it to struct atacmd, and add
compatibility shims. I don't see another way to do it unfortunately.

> > In the da(4) case, it will require an update of the T-10 SAT spec to
> > provide a way to pass the Auxiliary register down via the SCSI ATA
> > PASS-THROUGH command, and then a subsequent update of the SAT layer in
> > various vendors' SAS controller firmware. At that point, there may be
> > an official mapping of the SCSI ZBC commands to the ATA ZAC commands, and
> > we may be able to just issue the SCSI version of the commands instead of
> > composing ATA commands in the da(4) driver. (We'll still need to keep the
> > ATA passthrough version for a while at least to support controllers that
> > don't have the updated translation code.)
>
> I looked to implement things here, but didn't want to invent something that
> the T-10 would later reinvent.

Yeah. Is NCQ trim a new thing? Is that why you were looking at sending it
down via a FIS? If so, it is likely that LSI will add it to the SCSI Unmap
translation in the firmware. Of course if it isn't already in there,
they're only going to put it in their 12Gb controllers and not in the 6Gb
controllers at this point.

Since the SAT spec has the mapping for the SCSI ZBC -> ZAC commands, it
sounds like that'll make it into the LSI 12Gb firmware at some point.

> > FreeBSD/head as of SVN revision 294105:
> >
> > https://people.freebsd.org/~ken/cam_smr.head.20160118.1.txt
> >
> > FreeBSD stable/10 as of SVN revision 294100:
> >
> > https://people.freebsd.org/~ken/cam_smr.stable10.20160118.1.txt
> >
> > Testing and comments are welcome.
>
> So having said all that, I'm totally open to something better.

I think that for the ATA side, we'll just have to add the register to the
CCB, bump the version and add com
Re: CAM Shingled Disk support patches available
On Tue, Jan 19, 2016 at 14:45:23 +0300, Slawa Olhovchenkov wrote:
> On Mon, Jan 18, 2016 at 05:37:04PM -0500, Kenneth D. Merry wrote:
>
> > I have a new set of SMR patches available. See below for the full
> > explanation.
> >
> > The primary change here is that I have added SMR support to the ada(4)
> > driver. I spent some time considering whether to try to make the da(4) and
> > ada(4) probe infrastructure somewhat common, but in the end concluded it
> > would be too involved with not enough code reduction (if any) in the end.
> >
> > So, although the ideas are similar, the probe logic is separate.
> >
> > Note that NCQ support for SMR commands (Report Zones, Reset Write Pointer,
> > etc.) for SATA protocol shingled drives isn't active. For both the da(4)
> > and ada(4) driver this is for lack of a way to plumb the ATA Auxiliary
> > register down to the drive.
> >
> > In the ada(4) case, we need to add the register to struct ccb_ataio and
> > add support in one or more of the underlying SATA drivers, e.g. ahci(4).
> >
> > In the da(4) case, it will require an update of the T-10 SAT spec to
> > provide a way to pass the Auxiliary register down via the SCSI ATA
> > PASS-THROUGH command, and then a subsequent update of the SAT layer in
> > various vendors' SAS controller firmware. At that point, there may be
> > an official mapping of the SCSI ZBC commands to the ATA ZAC commands, and
> > we may be able to just issue the SCSI version of the commands instead of
> > composing ATA commands in the da(4) driver. (We'll still need to keep the
> > ATA passthrough version for a while at least to support controllers that
> > don't have the updated translation code.)
>
> Please, check me: currently SMR lack of support in SCSI devices? On
> [hardware] vendor level? Currently only SATA controllers compatible
> with SMR (on command level)? (I am not talking about FreeBSD support,
> question about common state).

No, there are SAS/SCSI SMR drives in development, and there is the SCSI ZBC
spec that defines the command set. I don't know whether any vendors are
shipping SAS/SCSI SMR drives yet.

You can use SATA drives (SMR or not) with either a SATA controller or a SAS
controller. But the way you talk to a SATA drive through a SAS controller
is with SCSI commands. There is a SCSI spec (SAT) that defines the mapping
of SCSI commands to ATA commands. It has recently been updated to support
mapping SMR commands from SCSI to ATA, but most (all?) SAS controllers
have not caught up with the spec.

So to use a SATA SMR drive with a SAS controller that doesn't know how to
map SMR commands from SCSI to ATA, you have to send the ATA SMR commands
through the SCSI ATA PASS-THROUGH command. That just bypasses the usual
translations, and allows sending ATA commands in something like their
native form.

Ken
--
Kenneth Merry
k...@freebsd.org
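
To make "something like their native form" concrete, here is a rough C sketch of how ATA registers are laid out inside a 16-byte SCSI ATA PASS-THROUGH CDB, following the layout in the SAT specification as I understand it. The struct and function names are hypothetical and are not the CAM struct ata_cmd or the libcam API. Note that the 16-byte layout leaves no field for the Auxiliary register, which matches the limitation described elsewhere in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register set for illustration; not the CAM struct ata_cmd. */
struct ata_regs48 {
	uint16_t features;
	uint16_t count;
	uint64_t lba;		/* only the low 48 bits are used */
	uint8_t  device;
	uint8_t  command;
};

#define ATA_PASS_16_OPCODE	0x85

/*
 * Fill a 16-byte ATA PASS-THROUGH CDB from 48-bit ATA registers.
 * "protocol" is the 4-bit SAT protocol value; "flags" is placed in byte 2
 * unchanged (OFF_LINE, CK_COND, T_DIR, BYTE_BLOCK, T_LENGTH bits).
 */
static void
build_ata_pass_16(uint8_t cdb[16], const struct ata_regs48 *r,
    uint8_t protocol, uint8_t flags)
{
	assert(cdb != NULL && r != NULL);
	memset(cdb, 0, 16);
	cdb[0] = ATA_PASS_16_OPCODE;
	cdb[1] = (uint8_t)(((protocol & 0x0f) << 1) | 0x01); /* EXTEND = 1 */
	cdb[2] = flags;
	cdb[3] = (uint8_t)(r->features >> 8);	/* FEATURES (15:8) */
	cdb[4] = (uint8_t)(r->features & 0xff);	/* FEATURES (7:0) */
	cdb[5] = (uint8_t)(r->count >> 8);	/* COUNT (15:8) */
	cdb[6] = (uint8_t)(r->count & 0xff);	/* COUNT (7:0) */
	cdb[7] = (uint8_t)((r->lba >> 24) & 0xff);	/* LBA (31:24) */
	cdb[8] = (uint8_t)(r->lba & 0xff);		/* LBA (7:0) */
	cdb[9] = (uint8_t)((r->lba >> 32) & 0xff);	/* LBA (39:32) */
	cdb[10] = (uint8_t)((r->lba >> 8) & 0xff);	/* LBA (15:8) */
	cdb[11] = (uint8_t)((r->lba >> 40) & 0xff);	/* LBA (47:40) */
	cdb[12] = (uint8_t)((r->lba >> 16) & 0xff);	/* LBA (23:16) */
	cdb[13] = r->device;
	cdb[14] = r->command;
	cdb[15] = 0;				/* CONTROL */
}
```

The interleaved ordering of the LBA bytes (previous/current register pairs) is the part most easily gotten wrong, which is one reason a helper that takes whole registers is easier to use than hand-built CDBs.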
Re: CAM Shingled Disk support patches available
On Tue, Jan 19, 2016 at 20:02:52 +0300, Slawa Olhovchenkov wrote:
> On Tue, Jan 19, 2016 at 11:38:31AM -0500, Kenneth D. Merry wrote:
>
> > On Tue, Jan 19, 2016 at 14:45:23 +0300, Slawa Olhovchenkov wrote:
> > > On Mon, Jan 18, 2016 at 05:37:04PM -0500, Kenneth D. Merry wrote:
> > >
> > > > I have a new set of SMR patches available. See below for the full
> > > > explanation.
> > > >
> > > > The primary change here is that I have added SMR support to the
> > > > ada(4) driver. I spent some time considering whether to try to make
> > > > the da(4) and ada(4) probe infrastructure somewhat common, but in
> > > > the end concluded it would be too involved with not enough code
> > > > reduction (if any) in the end.
> > > >
> > > > So, although the ideas are similar, the probe logic is separate.
> > > >
> > > > Note that NCQ support for SMR commands (Report Zones, Reset Write
> > > > Pointer, etc.) for SATA protocol shingled drives isn't active. For
> > > > both the da(4) and ada(4) driver this is for lack of a way to plumb
> > > > the ATA Auxiliary register down to the drive.
> > > >
> > > > In the ada(4) case, we need to add the register to struct ccb_ataio
> > > > and add support in one or more of the underlying SATA drivers, e.g.
> > > > ahci(4).
> > > >
> > > > In the da(4) case, it will require an update of the T-10 SAT spec to
> > > > provide a way to pass the Auxiliary register down via the SCSI ATA
> > > > PASS-THROUGH command, and then a subsequent update of the SAT layer
> > > > in various vendors' SAS controller firmware. At that point, there
> > > > may be an official mapping of the SCSI ZBC commands to the ATA ZAC
> > > > commands, and we may be able to just issue the SCSI version of the
> > > > commands instead of composing ATA commands in the da(4) driver.
> > > > (We'll still need to keep the ATA passthrough version for a while at
> > > > least to support controllers that don't have the updated translation
> > > > code.)
> > >
> > > Please, check me: currently SMR lack of support in SCSI devices? On
> > > [hardware] vendor level? Currently only SATA controllers compatible
> > > with SMR (on command level)? (I am not talking about FreeBSD support,
> > > question about common state).
> >
> > No, there are SAS/SCSI SMR drives in development, and there is the SCSI ZBC
> > spec that defines the command set. I don't know whether any vendors are
> > shipping SAS/SCSI SMR drives yet.
> >
> > You can use SATA drives (SMR or not) with either a SATA controller or a SAS
> > controller. But the way you talk to a SATA drive through a SAS controller
> > is with SCSI commands. There is a SCSI spec (SAT) that defines the mapping
> > of SCSI commands to ATA commands. It has recently been updated to support
> > mapping SMR commands from SCSI to ATA, but most (all?) SAS controllers
> > have not caught up with the spec.
> >
> > So to use a SATA SMR drive with a SAS controller that doesn't know how to
> > map SMR commands from SCSI to ATA, you have to send the ATA SMR commands
> > through the SCSI ATA PASS-THROUGH command. That just bypasses the usual
> > translations, and allows sending ATA commands in something like their
> > native form.
>
> What about the case of expanders and port replicators (SATA drives and HBA
> SAS controllers, of course)? Does the expander need to be compatible with
> SMR? Or is any expander with SATA support automatically compatible?

Expanders and port replicators shouldn't matter. The place where you need
to know about SMR is the place where the native ATA or SCSI drive commands
are generated. Expanders and port replicators typically just pass commands
through.

Ken
--
Kenneth Merry
k...@freebsd.org
Re: CAM Shingled Disk support patches available
I have a new set of SMR patches available. See below for the full
explanation.

The primary change here is that I have added SMR support to the ada(4)
driver. I spent some time considering whether to try to make the da(4) and
ada(4) probe infrastructure somewhat common, but in the end concluded it
would be too involved with not enough code reduction (if any) in the end.

So, although the ideas are similar, the probe logic is separate.

Note that NCQ support for SMR commands (Report Zones, Reset Write Pointer,
etc.) for SATA protocol shingled drives isn't active. For both the da(4)
and ada(4) driver this is for lack of a way to plumb the ATA Auxiliary
register down to the drive.

In the ada(4) case, we need to add the register to struct ccb_ataio and
add support in one or more of the underlying SATA drivers, e.g. ahci(4).

In the da(4) case, it will require an update of the T-10 SAT spec to
provide a way to pass the Auxiliary register down via the SCSI ATA
PASS-THROUGH command, and then a subsequent update of the SAT layer in
various vendors' SAS controller firmware. At that point, there may be
an official mapping of the SCSI ZBC commands to the ATA ZAC commands, and
we may be able to just issue the SCSI version of the commands instead of
composing ATA commands in the da(4) driver. (We'll still need to keep the
ATA passthrough version for a while at least to support controllers that
don't have the updated translation code.)

FreeBSD/head as of SVN revision 294105:

https://people.freebsd.org/~ken/cam_smr.head.20160118.1.txt

FreeBSD stable/10 as of SVN revision 294100:

https://people.freebsd.org/~ken/cam_smr.stable10.20160118.1.txt

Testing and comments are welcome.

Ken

On Wed, Nov 18, 2015 at 12:13:09 -0500, Kenneth D. Merry wrote:
>
> I have work in progress patches to add SMR (Shingled Magnetic Recording)
> support to CAM and GEOM here:
>
> FreeBSD/head as of SVN revision 290997:
>
> https://people.freebsd.org/~ken/cam_smr.head.20151117.1.txt
>
> FreeBSD stable/10 as of SVN revision 290995:
>
> https://people.freebsd.org/~ken/cam_smr.stable10.20151117.1.txt
>
> This includes support for Host Managed, Host Aware and Drive Managed SMR
> drives that are either SCSI (ZBC) or ATA (ZAC) attached via a SAS
> controller. This does not include support for SMR ATA drives attached via
> an ATA controller. Also, I have not yet figured out how to properly detect
> a Host Managed ATA drive, so this code won't do that.
>
> The big drive vendors are moving to SMR for at least some of their drives.
> The primary challenge with SMR is that it requires writing a relatively
> large zone sequentially starting at the beginning of the zone. The usual
> zone size is 256MB. It is conceptually almost like having a 256MB sector
> size.
>
> We (Spectra Logic) are working on ZFS changes that will use this CAM and
> GEOM infrastructure to make ZFS play well with SMR drives. Those changes
> aren't yet done.
>
> The patches linked above include:
> o A new 'camcontrol zone' command that allows displaying and managing
>   drive zones via SCSI/ATA passthrough.
> o A new zonectl(8) utility that uses the new DIOCZONECMD ioctl to display
>   and manage zones via the da(4) (and later ada(4)) driver.
> o Changes to diskinfo -v to display the zone mode of a drive.
> o A new disk zone API, sys/sys/disk_zone.h.
> o A new bio type, BIO_ZONE, and modifications to GEOM to support it. This
>   new bio will allow filesystems to query zone support in a drive and
>   manage zoned drives.
> o Extensive modifications to the da(4) driver to handle probing SCSI and
>   SATA behind SAS SMR drives.
> o Additional CAM CDB building functions for zone commands.
> > The current issues that need to be addressed are: > o The da(4) driver now has 6 additional probe states, 5 of which are >needed for probing ATA drives behind SAS controllers. I have not yet >added support for BIO_ZONE bios to ada(4), but it will be very similar >to the da(4) driver version. The ATA probe code needs to be pulled >out of the da(4) driver and changed into a form that will allow it to >work for either the ada(4) or da(4) driver. Otherwise we'll have a fair >amount of code duplication between the two drivers. > > o There is a reasonable amount of code duplication between 'camcontrol zone' >and zonectl(8). This was done for speed / expediency's sake, but it may >be possible to make more of the code common there. > > o In order to add the new BIO_ZONE bio command, I had to change the bio_cmd >field in struct bio from a uint8_t to a uint16_t. This will cause >binary compatibility problems with any 3rd party loadable modules. >Advice on how to handle this would be welcome. > > o In the process of developing these changes,
Re: CAM Shingled Disk support patches available
On Thu, Nov 19, 2015 at 12:48:41 -0600, Matthew D. Fuller wrote:
> On Wed, Nov 18, 2015 at 12:13:09PM -0500 I heard the voice of
> Kenneth D. Merry, and lo! it spake thus:
> >
> > Testing and comments are welcome.
>
> GELI does explicit handling of each BIO type, so will need to be
> updated to pass it through (possibly in the form of inverting the
> default handling?) or it'll just EOPNOTSUPP it, whether the underlying
> layer does or not. I wouldn't be surprised if there were other geom
> layers that did similar things.
>
> Not meant to be read as some kind of "you need to"; just a comment on
> a possible [lack of] impact.

You're correct. For GEOM classes like GELI that don't change the layout on
disk, passing the BIO_ZONE bio through would be the right thing to do.

For those that change the layout (i.e. the lba you write on the virtual
disk doesn't match what goes down to the physical disk), like graid or
gstripe, I think all we really need to do is just make sure they return
EOPNOTSUPP. If someone wants to modify that code to handle shingled disks,
they can certainly do that.

Ken
--
Kenneth Merry
k...@freebsd.org
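
The policy described in the reply above (pass zone requests through when the class keeps the on-disk layout, refuse them when it remaps LBAs) can be modeled with a small userland C sketch. Everything here is a toy stand-in: the enum, struct, and function names are hypothetical, and real GEOM classes work with struct bio and their start routines in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; the real command codes live in the kernel's sys/bio.h. */
enum toy_bio_cmd { TOY_BIO_READ, TOY_BIO_WRITE, TOY_BIO_ZONE };

struct toy_geom_class {
	const char *name;
	bool remaps_lba;	/* true if virtual LBAs differ from physical */
};

/*
 * Returns 0 if the request should be forwarded to the provider below,
 * or an errno value if the class must refuse it.
 */
static int
toy_geom_start(const struct toy_geom_class *gc, enum toy_bio_cmd cmd)
{
	assert(gc != NULL);
	switch (cmd) {
	case TOY_BIO_READ:
	case TOY_BIO_WRITE:
		return (0);
	case TOY_BIO_ZONE:
		/* Zone addresses only make sense if the layout is unchanged. */
		return (gc->remaps_lba ? EOPNOTSUPP : 0);
	}
	return (EOPNOTSUPP);	/* unknown command: refuse by default */
}
```

A class modeled with remaps_lba set to false (the GELI case in the thread) forwards TOY_BIO_ZONE, while one with it set to true (the gstripe case) returns EOPNOTSUPP.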
CAM Shingled Disk support patches available
I have work-in-progress patches to add SMR (Shingled Magnetic Recording) support to CAM and GEOM here:

FreeBSD/head as of SVN revision 290997:
https://people.freebsd.org/~ken/cam_smr.head.20151117.1.txt

FreeBSD stable/10 as of SVN revision 290995:
https://people.freebsd.org/~ken/cam_smr.stable10.20151117.1.txt

This includes support for Host Managed, Host Aware and Drive Managed SMR drives that are either SCSI (ZBC) or ATA (ZAC) attached via a SAS controller. This does not include support for SMR ATA drives attached via an ATA controller. Also, I have not yet figured out how to properly detect a Host Managed ATA drive, so this code won't do that.

The big drive vendors are moving to SMR for at least some of their drives. The primary challenge with SMR is that it requires writing a relatively large zone sequentially starting at the beginning of the zone. The usual zone size is 256MB. It is conceptually almost like having a 256MB sector size.

We (Spectra Logic) are working on ZFS changes that will use this CAM and GEOM infrastructure to make ZFS play well with SMR drives. Those changes aren't yet done.

The patches linked above include:

o A new 'camcontrol zone' command that allows displaying and managing drive zones via SCSI/ATA passthrough.

o A new zonectl(8) utility that uses the new DIOCZONECMD ioctl to display and manage zones via the da(4) (and later ada(4)) driver.

o Changes to diskinfo -v to display the zone mode of a drive.

o A new disk zone API, sys/sys/disk_zone.h.

o A new bio type, BIO_ZONE, and modifications to GEOM to support it. This new bio will allow filesystems to query zone support in a drive and manage zoned drives.

o Extensive modifications to the da(4) driver to handle probing SCSI and SATA behind SAS SMR drives.

o Additional CAM CDB building functions for zone commands.

The current issues that need to be addressed are:

o The da(4) driver now has 6 additional probe states, 5 of which are needed for probing ATA drives behind SAS controllers. I have not yet added support for BIO_ZONE bios to ada(4), but it will be very similar to the da(4) driver version. The ATA probe code needs to be pulled out of the da(4) driver and changed into a form that will allow it to work for either the ada(4) or da(4) driver. Otherwise we'll have a fair amount of code duplication between the two drivers.

o There is a reasonable amount of code duplication between 'camcontrol zone' and zonectl(8). This was done for speed / expediency's sake, but it may be possible to make more of the code common there.

o In order to add the new BIO_ZONE bio command, I had to change the bio_cmd field in struct bio from a uint8_t to a uint16_t. This will cause binary compatibility problems with any 3rd party loadable modules. Advice on how to handle this would be welcome.

o In the process of developing these changes, I discovered that the dxfer_len parameter for scsi_ata_pass_16() was too small (uint16_t, and it needed to be uint32_t). I increased it, but that will potentially cause a binary incompatibility problem with any existing applications that use the current API via libcam. Advice on how to handle that would be welcome.

If you look through the code, you'll notice that the disk_zone.h API is separate from the SCSI and ATA APIs. The intent is to allow filesystems and other consumers of the API to just talk to the disk zone API without dealing with the SCSI and ATA specifics.

Another reason behind all of this is that even though the SCSI ZBC and ATA ZAC specs were developed in concert, and are intended to be functionally identical, they are still SCSI and ATA. As usual, SCSI is big endian and ATA is little endian. So to present a common API to the filesystem, we give all of the zone data back in native byte order, regardless of the underlying device protocol.

Another thing to note is the extensive use of ATA passthrough in the da(4) driver. This is necessary because the SCSI SAT (SCSI to ATA Translation) specification has not yet caught up with translating SCSI zone commands (ZBC) to ATA zone commands (ZAC). So, until the spec is updated and LSI and other vendors update their SCSI to ATA translation layers, we'll have to use the ATA version of the commands when talking to ATA drives via SAS controllers.

I have only tested the code so far with Seagate SATA Drive Managed and Host Aware drives. I would appreciate testing with any drives. (And testing to make sure that the patches don't cause problems with existing hardware.)

Right now, all you can really do is manage the zones manually using camcontrol(8) or zonectl(8). Automatic management will come with the ZFS changes. (Or changes to other filesystems if people want to do it.)

If you have a SATA Host Aware drive, in theory camcontrol(8) should allow you to manage the drive if you have it attached to a SATA controller.

Here is an example of some of the commands. Get
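The byte-order point made in the message above (SCSI data is big endian, ATA data is little endian, and the zone API returns native byte order) can be illustrated with a small sketch. This is not code from the patches; the helper names zone_get_be64() and zone_get_le64() are hypothetical, and the example value is simply a 256MB zone expressed in 512-byte sectors.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode a 64-bit field stored most significant
 * byte first, as SCSI (ZBC) data is laid out.
 */
static uint64_t
zone_get_be64(const uint8_t *p)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return (v);
}

/*
 * Hypothetical helper: decode a 64-bit field stored least significant
 * byte first, as ATA (ZAC) data is laid out.
 */
static uint64_t
zone_get_le64(const uint8_t *p)
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return (v);
}
```

Both helpers return the same native integer for the same logical value, which is the property a protocol-independent zone API would rely on.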
Re: async pass(4) patches available
I have updated the asynchronous pass(4) changes, and fixed a number of bugs in camdd(8). The new patches are here: FreeBSD/head as of SVN revision 290970: http://people.freebsd.org/~ken/async_pass.head.20151117.1.txt FreeBSD stable/10 as of SVN revision 290899: http://people.freebsd.org/~ken/async_pass.stable10.20151117.1.txt And a description / draft commit message, this time updated to include all the files that have changed: http://people.freebsd.org/~ken/async_pass_commitmsg.20151118.txt I have also attached the description to this email. At this point I think I've fixed enough bugs and it is stable enough to go into the tree. That will allow others to more easily use the code and add enhancements. Ken On Mon, Mar 30, 2015 at 16:23:58 -0600, Kenneth D. Merry wrote: > > I have put patches to add an asynchronous interface to the pass(4) driver > and add a new camdd(8) utility here: > > FreeBSD/head as of SVN revision 280857: > > http://people.freebsd.org/~ken/async_pass.head.20150330.1.txt > > FreeBSD stable/10 as of SVN revision 280856: > > http://people.freebsd.org/~ken/async_pass.stable_10.20150330.1.txt > > And the description / draft commit message: > > http://people.freebsd.org/~ken/async_pass_commitmsg.20150330.txt > > I have also attached the description and draft commit message to this > email. > > The asynchronous changes to the pass(4) driver allow queueing and fetching > CAM CCBs via two new ioctls. Notification of completed I/O can come via > kqueue(2), poll(2), select(2), etc. > > The camdd(8) utility is intended as a simple data transfer utility, > benchmark, and an in-tree example of how to use the asynchronous pass(4) > interface. > > camdd(8) is still a work in progress. It needs to be cleaned up a bit and > streamlined. > > There is one known arrival and departure bug with the pass(4) driver > changes. We've reproduced it with our tests at Spectra, but I haven't yet > tracked it down. 
> > There are many more arrival and departure bugs in FreeBSD/head, however. > We have fixed quite a few in our local tree, but the test (called devad2) > that triggers all of the problems uses the asynchronous pass(4) interface. > So this is a prerequisite for fixing/verifying those bugs. > > Comments and testing are welcome! As I said, camdd(8) in particular is a > work in progress. It could use some cleanup and there are some more useful > features that could be added there. > > Part of the reason for camdd(8) was as a test facility for the new > interface. But, it also serves as a useful demonstration of the > asynchronous pass(4) functionality, given that the original application > that used the API doesn't make sense to go into FreeBSD. (It is > Spectra-specific, and not generally useful.) > > Ken > -- > Kenneth Merry > k...@freebsd.org > Add asynchronous command support to the pass(4) driver, and the new > camdd(8) utility. > > CCBs may be queued to the driver via the new CAMIOQUEUE ioctl, and > completed CCBs may be retrieved via the CAMIOGET ioctl. User > processes can use poll(2) or kevent(2) to get notification when > I/O has completed. > > While the existing CAMIOCOMMAND blocking ioctl interface only > supports user virtual data pointers in a CCB (generally only > one per CCB), the new CAMIOQUEUE ioctl supports user virtual and > physical address pointers, as well as user virtual and physical > scatter/gather lists. This allows user applications to have more > flexibility in their data handling operations. > > Kernel memory for data transferred via the queued interface is > allocated from the zone allocator in MAXPHYS sized chunks, and user > data is copied in and out. This is likely faster than the > vmapbuf()/vunmapbuf() method used by the CAMIOCOMMAND ioctl in > configurations with many processors (there are more TLB shootdowns > caused by the mapping/unmapping operation) but may not be as fast > as running with unmapped I/O. 
>
> The new memory handling model for user requests also allows
> applications to send CCBs with request sizes that are larger than
> MAXPHYS. The pass(4) driver now limits queued requests to the I/O
> size listed by the SIM driver in the maxio field in the Path
> Inquiry (XPT_PATH_INQ) CCB.
>
> There are some things that would be good to add:
>
> 1. Come up with a way to do unmapped I/O on multiple buffers.
>    Currently the unmapped I/O interface operates on a struct bio,
>    which includes only one address and length. It would be nice
>    to be able to send an unmapped scatter/gather list down to
>    busdma. This would allow eliminating the copy we currently do
>    for data.
>
> 2. Add
Re: sa(4) driver changes available for test
On Mon, Aug 24, 2015 at 17:24:22 -0400, Dan Langille wrote: On Mar 2, 2015, at 12:26 PM, Kenneth D. Merry k...@freebsd.org wrote: On Mon, Mar 02, 2015 at 11:43:15 -0500, Dan Langille wrote: On Mar 1, 2015, at 9:06 PM, Kenneth D. Merry k...@freebsd.org wrote: On Sun, Mar 01, 2015 at 19:40:40 -0500, Dan Langille wrote: On Mar 1, 2015, at 7:36 PM, Kenneth D. Merry k...@freebsd.org wrote: On Sun, Mar 01, 2015 at 19:28:37 -0500, Dan Langille wrote: On Mar 1, 2015, at 7:18 PM, Kenneth D. Merry k...@freebsd.org wrote: On Sun, Mar 01, 2015 at 17:06:24 -0500, Dan Langille wrote: On Feb 13, 2015, at 7:32 PM, Kenneth D. Merry k...@freebsd.org wrote: I have a fairly large set of changes to the sa(4) driver and mt(1) driver that I'm planning to commit in the near future. A description of the changes is here and below in this message. If you have tape hardware and the inclination, I'd appreciate testing and feedback. Rough draft commit message: http://people.freebsd.org/~ken/sa_changes_commitmsg.20150213.3.txt The patches against FreeBSD/head as of SVN revision 278706: http://people.freebsd.org/~ken/sa_changes.20150213.3.txt And (untested) patches against FreeBSD stable/10 as of SVN revision 278721. http://people.freebsd.org/~ken/sa_changes.stable_10.20150213.3.txt The intent is to get the tape infrastructure more up to date, so we can support LTFS and more modern tape drives: http://www.ibm.com/systems/storage/tape/ltfs/ I have ported IBM's LTFS Single Drive Edition to FreeBSD. The port depends on the patches linked above. It isn't fully cleaned up and ready for redistribution. If you're interested, though, let me know and I'll tell you when it is ready to go out. You need an IBM LTO-5, LTO-6, TS1140 or TS1150 tape drive. HP drives aren't supported by IBM's LTFS, and older drives don't have the necessary features to support LTFS. The commit message below outlines most of the changes. A few comments: 1. I'm planning to commit the XPT_DEV_ADVINFO changes separately. 2. 
The XML output is similar to what GEOM and CTL do. It would be nice to figure out how to put a standard schema on it so that standard tools could read it. I don't know how feasible that is, since I haven't time to dig into it. If anyone has suggestions on whether that is feasible or advisable, I'd appreciate feedback. 3. I have tested with a reasonable amount of tape hardware (see below for a list), but more testing and feedback would be good. 4. Standard 'mt status' output looks like this: # mt -f /dev/nsa3 status -v Drive: sa3: IBM ULTRIUM-HH6 E4J1 Serial Number: 101500520A - Mode Density Blocksize bpi Compression Current: 0x5a:LTO-6 variable 384607 enabled (0xff) - Current Driver State: at rest. - Partition: 0 Calc File Number: 0 Calc Record Number: 0 Residual:0 Reported File Number: 0 Reported Record Number: 0 Flags: BOP 5. 'mt status -v' looks like this: # mt -f /dev/nsa3 status -v Drive: sa3: IBM ULTRIUM-HH6 E4J1 Serial Number: 101500520A - Mode Density Blocksize bpi Compression Current: 0x5a:LTO-6 variable 384607 enabled (0xff) - Current Driver State: at rest. 
- Partition: 0 Calc File Number: 0 Calc Record Number: 0 Residual:0 Reported File Number: 0 Reported Record Number: 0 Flags: BOP - Tape I/O parameters: Maximum I/O size allowed by driver and controller (maxio): 1081344 bytes Maximum I/O size reported by controller (cpi_maxio): 5197824 bytes Maximum block size supported by tape drive and media (max_blk): 8388608 bytes Minimum block size supported by tape drive and media (min_blk): 1 bytes Block granularity supported by tape drive and media (blk_gran): 0 bytes Maximum possible I/O size (max_effective_iosize): 1081344 bytes # mtx -f /dev/pass0 status Storage Changer /dev/pass0:2 Drives, 10 Slots ( 0 Import/Export ) Data Transfer Element 0:Empty Data Transfer Element 1:Empty Storage Element 1:Empty Storage Element 2:Empty Storage Element 3:Empty Storage Element 4:Full :VolumeTag=FAI260 Storage Element 5:Full :VolumeTag=FAI261 Storage Element 6:Full :VolumeTag=FAI262 Storage Element 7:Full :VolumeTag=FAI263 Storage Element 8:Empty Storage Element 9:Empty Storage Element 10:Empty It was at this point I spent
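The relationship between the "Tape I/O parameters" values shown above can be sketched roughly as follows. This is an assumption-laden illustration, not the sa(4) code: the function names are hypothetical, the clamping mirrors the da(4) logic quoted earlier in this thread, the driver-side limit of 1081344 bytes is inferred from the output rather than stated, and block granularity is ignored.

```c
#include <assert.h>
#include <stdint.h>

#define SK_DFLTPHYS	(64u * 1024u)	/* traditional 64K default */

/*
 * Hypothetical: clamp the controller-reported limit (cpi_maxio) to the
 * driver-side limit.  A zero cpi_maxio means the controller driver
 * never set the field, so fall back to the 64K default.
 */
static uint32_t
sk_maxio(uint32_t cpi_maxio, uint32_t driver_limit)
{
	if (cpi_maxio == 0)
		return (SK_DFLTPHYS);
	if (cpi_maxio > driver_limit)
		return (driver_limit);
	return (cpi_maxio);
}

/*
 * Hypothetical: the largest usable I/O is bounded both by maxio and by
 * the largest block the drive and media accept (max_blk).
 */
static uint32_t
sk_max_effective_iosize(uint32_t maxio, uint32_t max_blk)
{
	return (maxio < max_blk ? maxio : max_blk);
}
```

With the values from the output above (cpi_maxio 5197824, max_blk 8388608) and the inferred driver limit, both functions yield 1081344 bytes, matching the reported maxio and max_effective_iosize.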
Re: Why shoud we cause panic in scsi_da.c?
On Mon, Jul 13, 2015 at 18:29:36 +0300, Alexander Motin wrote:
Hi.

On 13.07.2015 11:51, Kohji Okuno wrote:
On 07/13/15 10:11, Kohji Okuno wrote:
Could you comment on my question? I found panic() in scsi_da.c. Please find the following. I think we should return with error without panic(). What do you think about this?

scsi_da.c:
    } else if (bp != NULL) {
        if ((done_ccb->ccb_h.status & CAM_DEV_QFRZN) != 0)
            panic("REQ_CMP with QFRZN");

It looks to me more like a KASSERT() is appropriate here.

As I can see, this panic() call was added by ken@ about 15 years ago. I've added him to CC in case he has some idea why it was done. In my personal opinion, I don't see much reason to allow CAM_DEV_QFRZN to be returned only together with an error. While it may make little sense in the case of successful command completion, I don't think it should be treated as an error. Simply removing this panic is probably a bad idea, since if it happens the device will just remain frozen forever, which will be difficult to diagnose, but I would rather just drop the device freeze in that case, the same as in the case of completion with error.

I put it there because it indicates a software error. The queue shouldn't be frozen if the command is successful. The reason for freezing the queue is to allow error recovery to happen. The queue will get unfrozen after error recovery completes.

We could alternately just print a diagnostic message, unfreeze the queue and move on, but the idea is to allow the driver writer to detect and correct his error immediately.

As for the original poster's problem, he has uncovered a bug that needs to be fixed. (And I don't mean in the da(4) driver. The bug is in the component that left the queue frozen. Most likely in the USB driver, but it will take a little more investigation.)

The panic worked as intended.
:) Ken -- Kenneth Merry k...@freebsd.org ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
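The status check discussed in this thread can be sketched in user space as below. This is only an illustration of the reasoning (a successful completion should never carry the queue-frozen flag); the names are hypothetical, and the constant values are believed to mirror sys/cam/cam.h but should be treated as assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values; assumed to match the CAM status encoding. */
#define SK_CAM_REQ_CMP		0x01u	/* command completed without error */
#define SK_CAM_STATUS_MASK	0x3fu	/* low bits hold the status code */
#define SK_CAM_DEV_QFRZN	0x40u	/* device queue was frozen */

enum sk_action {
	SK_DONE,		/* success, nothing else to do */
	SK_ERROR_RECOVERY,	/* error; recovery runs, then the queue is released */
	SK_SOFTWARE_BUG		/* success with a frozen queue: someone forgot to unfreeze */
};

/* Hypothetical classifier for a completed CCB status word. */
static enum sk_action
sk_classify(uint32_t status)
{
	int ok = (status & SK_CAM_STATUS_MASK) == SK_CAM_REQ_CMP;
	int frozen = (status & SK_CAM_DEV_QFRZN) != 0;

	if (ok && frozen)
		return (SK_SOFTWARE_BUG);
	if (!ok)
		return (SK_ERROR_RECOVERY);
	return (SK_DONE);
}
```

Whether the SK_SOFTWARE_BUG case should panic, trip an assertion, or log and release the queue is exactly the policy question debated above; the sketch only shows how the case is detected.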
camcontrol(8) attrib patches available
I have put patches to camcontrol(8) to implement the attrib subcommand here:

FreeBSD/head as of SVN revision 283160:
http://people.freebsd.org/~ken/camcontrol_attrib.20150520.1.txt

FreeBSD stable/10 as of SVN revision 283161:
http://people.freebsd.org/~ken/camcontrol_attrib.stable10.20150520.1.txt

The patches also add libcam support for handling SCSI READ ATTRIBUTE data, and add a new sbuf_hexdump(3)/(9) routine.

The SCSI READ ATTRIBUTE command is used to read Medium Auxiliary Memory on SCSI devices. This is usually found in the small (4KB-16KB) flash chips on LTO and other similar tapes.

I have not yet implemented attribute writing support.

Here is an abbreviated example of the output:

==
[root@black-pearl ~]# camcontrol attrib sa0 -r attr_val
Remaining Capacity in Partition (0x0000)[8](RO): 35048 MB
Maximum Capacity in Partition (0x0001)[8](RO): 35060 MB
TapeAlert Flags (0x0002)[8](RO): 0x0
Load Count (0x0003)[8](RO): 29
MAM Space Remaining (0x0004)[8](RO): 2321 bytes
Assigning Organization (0x0005)[8](RO): LTO-CVE
Format Density Code (0x0006)[1](RO): 0x5a
Initialization Count (0x0007)[2](RO): 20
Volume Change Reference (0x0009)[4](RO): 0x47
Device Vendor/Serial at Last Load (0x020a)[40](RO): IBM 1068022701
Device Vendor/Serial at Last Load - 1 (0x020b)[40](RO): IBM 1068022701
Device Vendor/Serial at Last Load - 2 (0x020c)[40](RO): IBM 1068022701
Device Vendor/Serial at Last Load - 3 (0x020d)[40](RO): IBM 1068022701
Total MB Written in Medium Life (0x0220)[8](RO): 40009 MB
Total MB Read in Medium Life (0x0221)[8](RO): 3149 MB
Total MB Written in Current/Last Load (0x0222)[8](RO): 0 MB
Total MB Read in Current/Last Load (0x0223)[8](RO): 12 MB
Logical Position of First Encrypted Block (0x0224)[8](RO): 18446744073709551615
Logical Position of First Unencrypted Block after First Encrypted Block (0x0225)[8](RO): 18446744073709551615
Medium Manufacturer (0x0400)[8](RO): HP
Medium Serial Number (0x0401)[32](RO): AE46TCFD0U
Medium Length (0x0402)[4](RO): 846 m
Medium Width (0x0403)[4](RO): 12.7 mm
Assigning Organization (0x0404)[8](RO): LTO-CVE
Medium Density Code (0x0405)[1](RO): 0x5a
Medium Manufacture Date (0x0406)[8](RO): 20130506
MAM Capacity (0x0407)[8](RO): 16384 bytes
Medium Type (0x0408)[1](RO): 0x0
Medium Type Information (0x0409)[2](RO): 0x0
Application Vendor (0x0800)[8](RW): IBM
Application Name (0x0801)[32](RW): LTFS
Application Version (0x0802)[8](RW): 1.3.0.2
User Medium Text Label (0x0803)[160](RW):
Text Localization Identifier (0x0805)[1](RW): 0x81
Barcode (0x0806)[32](RW):
Application Format Version (0x080b)[16](RW): 2.2.0
Volume Coherency Information (0x080c)[70](RW):
    Volume Change Reference Value: 0x45
    Volume Coherency Count: 1
    Volume Coherency Set Identifier: 0x5
    Application Client Specific Information: LTFS
    LTFS UUID: 28076791-d64e-4cd7-bc43-fa51ec097d83
    LTFS Version: 1
==

Testing and comments (on the camcontrol changes or the library changes) are welcome.

Thanks,

Ken
--
Kenneth Merry
k...@freebsd.org
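Each line of the output above shows an attribute identifier, a length in brackets, and a read-only or read-write marker. As a rough sketch of where those three fields come from, the code below parses one attribute from a READ ATTRIBUTE data buffer. The header layout used here (2-byte big-endian identifier, a flags byte whose top bit marks read-only, 2-byte big-endian length, then the value) is my understanding of the SPC attribute format and should be checked against the specification; the struct and function names are hypothetical and are not the libcam API from the patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parsed view of one MAM attribute. */
struct sk_mam_attr {
	uint16_t id;		/* attribute identifier, e.g. 0x0003 */
	int read_only;		/* nonzero if the attribute is (RO) */
	uint16_t len;		/* value length in bytes */
	const uint8_t *value;	/* points into the caller's buffer */
};

/* Parse one attribute header; returns 0 on success, -1 if truncated. */
static int
sk_mam_parse(const uint8_t *buf, size_t buflen, struct sk_mam_attr *out)
{
	if (buflen < 5)
		return (-1);
	out->id = (uint16_t)((buf[0] << 8) | buf[1]);
	out->read_only = (buf[2] & 0x80) != 0;
	out->len = (uint16_t)((buf[3] << 8) | buf[4]);
	if (buflen - 5 < out->len)
		return (-1);
	out->value = buf + 5;
	return (0);
}

/* Interpret a binary attribute of up to 8 bytes as a big-endian integer. */
static uint64_t
sk_mam_binary(const struct sk_mam_attr *a)
{
	uint64_t v = 0;
	uint16_t i;

	for (i = 0; i < a->len && i < 8; i++)
		v = (v << 8) | a->value[i];
	return (v);
}
```

Fed a buffer encoding identifier 0x0003 with an 8-byte value of 29, the sketch yields the fields that the "Load Count (0x0003)[8](RO): 29" line above displays.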
Re: async pass(4) patches available
On Tue, Apr 07, 2015 at 13:16:04 +0200, Fabian Keil wrote: Kenneth D. Merry k...@freebsd.org wrote: On Mon, Apr 06, 2015 at 15:39:56 +0200, Fabian Keil wrote: Kenneth D. Merry k...@freebsd.org wrote: I have put patches to add an asynchronous interface to the pass(4) driver and add a new camdd(8) utility here: FreeBSD/head as of SVN revision 280857: http://people.freebsd.org/~ken/async_pass.head.20150330.1.txt [...] Comments and testing are welcome! As I said, camdd(8) in particular is a work in progress. It could use some cleanup and there are some more useful features that could be added there. I've been using the patch for a couple of days on an amd64 system based on 11.0-CURRENT r280952 and didn't notice any obvious regressions using the system as usual. [...] I also tried to test camdd, but didn't get it to work. Some failed attempts: [fk@kendra ~]$ sudo camdd -i pass=da0,bs=65536 -o file=blafsel.img (pass2:umass-sim0:0:0:0): READ(6). CDB: 08 00 00 00 80 00 (pass2:umass-sim0:0:0:0): CAM status: CCB request completed with an error 13 bytes read from pass2 13 bytes written to blafsel.img 20.3203 seconds elapsed 0.00 MB/sec [fk@kendra ~]$ sudo hd blafsel.img 55 53 42 53 d9 02 00 00 00 00 01 00 01 |USBS.| 000d [fk@kendra ~]$ sudo dd if=/dev/da0 bs=1k count=1 | hd | head -n 1 1+0 records in 1+0 records out 1024 bytes transferred in 0.000603 secs (1697756 bytes/sec) fc 31 c0 8e c0 8e d8 8e d0 bc 00 0e be 1a 7c bf |.1|.| One possibility is that the device doesn't support 6-byte read/write requests. The da(4) driver has quirk entries and code to figure that out and default to 10-byte read/write requests, but camdd(8) doesn't have anything like that yet. I've attached patches to camdd that allow you to specify a minimum command size. So, apply the patches, rebuild camdd, and try this: # sudo camdd -i pass=da0,bs=65536,mcs=10 -o file=blafsel.img We'll see if that helps. I'm not sure why you were even able to get 13 bytes back. That is very strange. 
With the patch, reading from da0 seems to work until the end, but again only 13 bytes are written out when writing to a file: [fk@kendra ~]$ sudo camdd -i pass=da0,bs=65536,mcs=10 -o file=blafsel.img (pass2:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 78 a8 00 00 00 00 00 (pass2:umass-sim0:0:0:0): CAM status: CCB request completed with an error 4048551936 bytes read from pass2 13 bytes written to blafsel.img 127.6488 seconds elapsed 0.00 MB/sec Did the file exist before running that command? If so, camdd will look at the file size and not write any more than the current file size. If the file doesn't exist, it'll stop writing when it reaches the end of the input, or it gets a write error, or it reaches the specified I/O limit (-m argument). It also looks like there is a bug; the command above is attempting to read 0 bytes starting from one sector beyond the last logical block. [fk@kendra ~]$ diskinfo -v /dev/da0 /dev/da0 512 # sectorsize 4048551936 # mediasize in bytes (3.8G) 7907328 # mediasize in sectors 0 # stripesize 0 # stripeoffset 492 # Cylinders according to firmware. 255 # Heads according to firmware. 63 # Sectors according to firmware. AA000958# Disk ident. It works as expected when writing to stdout, though, so this is probably just a camdd-internal issue: [fk@kendra ~]$ sudo camdd -i pass=da0,bs=65536,mcs=10 -o file=- /dpool/scratch/blafasel.img (pass2:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 78 a8 00 00 00 00 00 (pass2:umass-sim0:0:0:0): CAM status: CCB request completed with an error 4048551936 bytes read from pass2 4048551936 bytes written to - 128.7222 seconds elapsed 29.99 MB/sec Ahh, yes, that is what I would expect. 
[fk@kendra ~]$ sudo dd if=/dev/da0 bs=65536 of=/dpool/scratch/blafasel-dd.img
61776+0 records in
61776+0 records out
4048551936 bytes transferred in 134.993030 secs (29990822 bytes/sec)
[fk@kendra ~]$ sha1 /dpool/scratch/blafasel*.img
SHA1 (/dpool/scratch/blafasel-dd.img) = 12d1d9e82f840a6c6485ffcdb1fbf780266ed266
SHA1 (/dpool/scratch/blafasel.img) = 12d1d9e82f840a6c6485ffcdb1fbf780266ed266

Looks good to me.

Great! I'll see if I can fix the bug that is causing the zero length read at the end.

Thank you for testing it!

Ken
--
Kenneth Merry
k...@freebsd.org
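The CDB bytes printed in the error messages in this thread can be decoded by hand to confirm the diagnosis of a zero-length read one sector past the end of the device. The sketch below does that decoding; the field positions are the standard READ(6) and READ(10) layouts, while the struct and function names are hypothetical and not part of camdd(8).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded view of a read command. */
struct sk_rw {
	uint32_t lba;		/* starting logical block address */
	uint32_t blocks;	/* transfer length in blocks */
};

/* READ(6): 21-bit LBA in bytes 1-3, length in byte 4 (0 means 256). */
static struct sk_rw
sk_decode_read6(const uint8_t cdb[6])
{
	struct sk_rw r;

	r.lba = ((uint32_t)(cdb[1] & 0x1f) << 16) |
	    ((uint32_t)cdb[2] << 8) | cdb[3];
	r.blocks = (cdb[4] == 0) ? 256 : cdb[4];
	return (r);
}

/* READ(10): 32-bit big-endian LBA in bytes 2-5, length in bytes 7-8. */
static struct sk_rw
sk_decode_read10(const uint8_t cdb[10])
{
	struct sk_rw r;

	r.lba = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
	    ((uint32_t)cdb[4] << 8) | cdb[5];
	r.blocks = ((uint32_t)cdb[7] << 8) | cdb[8];
	return (r);
}
```

For the failing READ(10) CDB 28 00 00 78 a8 00 00 00 00 00, the LBA decodes to 7907328, which equals the "mediasize in sectors" reported by diskinfo (so the last valid LBA is 7907327), and the transfer length decodes to 0. The earlier READ(6) CDB 08 00 00 00 80 00 decodes to 128 blocks at LBA 0, which is 65536 bytes at a 512-byte sector size and matches the requested bs=65536.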
Re: async pass(4) patches available
On Mon, Apr 06, 2015 at 15:39:56 +0200, Fabian Keil wrote: Kenneth D. Merry k...@freebsd.org wrote: I have put patches to add an asynchronous interface to the pass(4) driver and add a new camdd(8) utility here: FreeBSD/head as of SVN revision 280857: http://people.freebsd.org/~ken/async_pass.head.20150330.1.txt [...] Comments and testing are welcome! As I said, camdd(8) in particular is a work in progress. It could use some cleanup and there are some more useful features that could be added there. I've been using the patch for a couple of days on an amd64 system based on 11.0-CURRENT r280952 and didn't notice any obvious regressions using the system as usual. Scrubbing a pool once revealed checksum errors which I haven't seen before: [fk@kendra ~]$ zpool status -v dpool pool: dpool state: ONLINE status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'. see: http://illumos.org/msg/ZFS-8000-9P scan: scrub repaired 0 in 1h52m with 0 errors on Thu Apr 2 13:01:44 2015 config: NAME STATE READ WRITE CKSUM dpool ONLINE 0 0 0 gpt/dpool-ada0.eli ONLINE 0 0 6 errors: No known data errors Apr 2 12:31:34 kendra kernel: (ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 30 17 61 55 40 31 00 00 00 00 00 Apr 2 12:31:34 kendra kernel: (ada0:ahcich0:0:0:0): CAM status: ATA Status Error Apr 2 12:31:34 kendra kernel: (ada0:ahcich0:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC ) Apr 2 12:31:34 kendra kernel: (ada0:ahcich0:0:0:0): RES: 51 40 3e 61 55 40 31 00 00 00 00 Apr 2 12:31:34 kendra kernel: (ada0:ahcich0:0:0:0): Error 5, Retries exhausted Apr 2 12:31:34 kendra kernel: GEOM_ELI: g_eli_read_done() failed gpt/dpool-ada0.eli[READ(offset=414970949120, length=24576)] However the issue doesn't seem to be (easily) reproducible and could be unrelated. 
It is unlikely that this is related to the pass(4) driver patches. Possible, but highly unlikely. camdd(8) doesn't support ATA passthrough yet, so the only way to access it with camdd is with the file I/O method. I also tried to test camdd, but didn't get it to work. Some failed attempts: [fk@kendra ~]$ sudo camdd -i pass=da0,bs=65536 -o file=blafsel.img (pass2:umass-sim0:0:0:0): READ(6). CDB: 08 00 00 00 80 00 (pass2:umass-sim0:0:0:0): CAM status: CCB request completed with an error 13 bytes read from pass2 13 bytes written to blafsel.img 20.3203 seconds elapsed 0.00 MB/sec [fk@kendra ~]$ sudo hd blafsel.img 55 53 42 53 d9 02 00 00 00 00 01 00 01 |USBS.| 000d [fk@kendra ~]$ sudo dd if=/dev/da0 bs=1k count=1 | hd | head -n 1 1+0 records in 1+0 records out 1024 bytes transferred in 0.000603 secs (1697756 bytes/sec) fc 31 c0 8e c0 8e d8 8e d0 bc 00 0e be 1a 7c bf |.1|.| One possibility is that the device doesn't support 6-byte read/write requests. The da(4) driver has quirk entries and code to figure that out and default to 10-byte read/write requests, but camdd(8) doesn't have anything like that yet. I've attached patches to camdd that allow you to specify a minimum command size. So, apply the patches, rebuild camdd, and try this: # sudo camdd -i pass=da0,bs=65536,mcs=10 -o file=blafsel.img We'll see if that helps. I'm not sure why you were even able to get 13 bytes back. That is very strange. Trying the block size suggested in the manual result in: [fk@kendra ~]$ sudo camdd -i pass=da0,bs=1M -o file=blafsel.img camdd: camdd_pass_run: error sending CAMIOQUEUE ioctl to pass2: Invalid argument camdd: camdd_pass_run: CCB address is 0x80250e420: Invalid argument 0 bytes read from pass2 0 bytes written to blafsel.img 0.0007 seconds elapsed 0.00 MB/sec Apr 5 19:08:20 kendra kernel: (pass2:umass-sim0:0:0:0): passmemsetup: data length 1048576 max allowed 65536 bytes Yes. 
By default, if you don't specify a blocksize, camdd(8) should limit the I/O size to the controller's maximum or 128K, whichever is smaller. If you specify an I/O size, it will try to use that. Thanks for testing the code, I really appreciate it! Let me know how the patch works! Ken -- Kenneth Merry k...@freebsd.org //depot/users/kenm/FreeBSD-test2/usr.sbin/camdd/camdd.8#1 - /usr/home/kenm/perforce4/kenm/FreeBSD-test2/usr.sbin/camdd/camdd.8 *** /tmp/tmp.54366.13 Mon Apr 6 21:56:38 2015 --- /usr/home/kenm/perforce4/kenm/FreeBSD-test2/usr.sbin/camdd/camdd.8 Mon Apr 6 21:23:29 2015 *** *** 31,37 .\ .\ $FreeBSD$ .\ ! .Dd March 13, 2015 .Dt CAMDD 8 .Os .Sh NAME --- 31,37 .\ .\ $FreeBSD$ .\ ! .Dd April 6, 2015 .Dt CAMDD 8 .Os
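The block size behaviour described in this message can be summarized in a short sketch: without an explicit block size, use the smaller of the controller maximum and 128K; with an explicit size larger than the controller maximum, the request is rejected, as the "passmemsetup: data length 1048576 max allowed 65536 bytes" log line and the "Invalid argument" error show. The function names below are hypothetical and this is not the camdd(8) or pass(4) code; maxio is assumed to be nonzero.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SK_CAMDD_DEFAULT_MAX	(128u * 1024u)	/* 128K cap from the message */

/* Hypothetical: default block size when the user gives none. */
static uint32_t
sk_default_blocksize(uint32_t maxio)
{
	return (maxio < SK_CAMDD_DEFAULT_MAX ? maxio : SK_CAMDD_DEFAULT_MAX);
}

/*
 * Hypothetical: validate a requested transfer length against the
 * controller maximum, returning EINVAL when it is too large.
 */
static int
sk_check_length(uint32_t len, uint32_t maxio)
{
	return (len > maxio ? EINVAL : 0);
}
```

With the umass controller limit of 65536 bytes seen in the log, the default would be 64K, and the 1M request from the manual page example fails the length check.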
Re: async pass(4) patches available
On Tue, Mar 31, 2015 at 03:49:12 +0300, Konstantin Belousov wrote: On Mon, Mar 30, 2015 at 04:23:58PM -0600, Kenneth D. Merry wrote: Kernel memory for data transferred via the queued interface is allocated from the zone allocator in MAXPHYS sized chunks, and user data is copied in and out. This is likely faster than the vmapbuf()/vunmapbuf() method used by the CAMIOCOMMAND ioctl in configurations with many processors (there are more TLB shootdowns caused by the mapping/unmapping operation) but may not be as fast as running with unmapped I/O. cam_periph_mapmem() uses vmapbuf() with an indicator to always map the user pages mostly because I do not know CAM code and wanted to make the least intrusive changes there. It is not inherently impossible to pass unmapped pages down from cam_periph_mapmem(), but might require some more plumbing for driver to indicate that it is acceptable. I think that would probably not be too difficult to change. That API isn't one that is exposed, so changing it shouldn't be a problem. The only reason not to do unmapped I/O there is just if the underlying controller doesn't support it. The lower parts of the stack shouldn't be trying to sniff the data that is read or written to the device, although that has happened in the past. We'd have to audit a couple of the drivers to make sure they aren't trying to access the data. The new memory handling model for user requests also allows applications to send CCBs with request sizes that are larger than MAXPHYS. The pass(4) driver now limits queued requests to the I/O size listed by the SIM driver in the maxio field in the Path Inquiry (XPT_PATH_INQ) CCB. There are some things things would be good to add: 1. Come up with a way to do unmapped I/O on multiple buffers. Currently the unmapped I/O interface operates on a struct bio, which includes only one address and length. It would be nice to be able to send an unmapped scatter/gather list down to busdma. 
This would allow eliminating the copy we currently do for data.

Only because nothing more was needed. The struct bio does not use an address/length pair when unmapped; it passes the list of physical pages, see the bio_ma array pointer. It is indeed tailored to be a pointer to struct buf's b_pages, but it does not have to be. The busdma unmapped non-specific interface is bus_dmamap_load_ma(), which again takes an array of pages to load. If you want some additional helper, suitable for your goals, please provide the desired interface definition.

What I'd like to be able to do is pass down a CCB with a user virtual S/G list (CAM_DATA_SG, but with user virtual pointers) and have busdma deal with it. The trouble would likely be figuring out a flag to use to indicate that the S/G list in question contains user virtual pointers. (Backwards/binary compatibility is always an issue with CCB flags, since they have all been used.) But that is essentially what is needed.

Ken
--
Kenneth Merry
k...@freebsd.org
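To make the page-list representation mentioned above more concrete: a buffer described as an array of pages needs, besides the pages themselves, the byte offset into the first page and the number of pages spanned. The sketch below computes those two values for one virtual address range. It is a user-space illustration with hypothetical names and an assumed 4096-byte page size, not kernel code, and it does not touch struct bio or busdma.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_SIZE	4096u	/* assumed page size for the illustration */

/* Hypothetical description of how a buffer maps onto whole pages. */
struct sk_span {
	uint32_t first_offset;	/* byte offset into the first page */
	uint64_t npages;	/* number of pages the buffer touches */
};

static struct sk_span
sk_page_span(uint64_t addr, uint64_t len)
{
	struct sk_span s;

	s.first_offset = (uint32_t)(addr & (SK_PAGE_SIZE - 1));
	if (len == 0)
		s.npages = 0;
	else
		s.npages = (s.first_offset + len + SK_PAGE_SIZE - 1) /
		    SK_PAGE_SIZE;
	return (s);
}
```

A scatter/gather list with several entries would need one such computation per entry, which is part of why a single address and length (or a single page array) does not cover the multi-buffer case discussed in this thread.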
async pass(4) patches available
I have put patches to add an asynchronous interface to the pass(4) driver and add a new camdd(8) utility here: FreeBSD/head as of SVN revision 280857: http://people.freebsd.org/~ken/async_pass.head.20150330.1.txt FreeBSD stable/10 as of SVN revision 280856: http://people.freebsd.org/~ken/async_pass.stable_10.20150330.1.txt And the description / draft commit message: http://people.freebsd.org/~ken/async_pass_commitmsg.20150330.txt I have also attached the description and draft commit message to this email. The asynchronous changes to the pass(4) driver allow queueing and fetching CAM CCBs via two new ioctls. Notification of completed I/O can come via kqueue(2), poll(2), select(2), etc. The camdd(8) utility is intended as a simple data transfer utility, benchmark, and an in-tree example of how to use the asynchronous pass(4) interface. camdd(8) is still a work in progress. It needs to be cleaned up a bit and streamlined. There is one known arrival and departure bug with the pass(4) driver changes. We've reproduced it with our tests at Spectra, but I haven't yet tracked it down. There are many more arrival and departure bugs in FreeBSD/head, however. We have fixed quite a few in our local tree, but the test (called devad2) that triggers all of the problems uses the asynchronous pass(4) interface. So this is a prerequisite for fixing/verifying those bugs. Comments and testing are welcome! As I said, camdd(8) in particular is a work in progress. It could use some cleanup and there are some more useful features that could be added there. Part of the reason for camdd(8) was as a test facility for the new interface. But, it also serves as a useful demonstration of the asynchronous pass(4) functionality, given that the original application that used the API doesn't make sense to go into FreeBSD. (It is Spectra-specific, and not generally useful.) Ken -- Kenneth Merry k...@freebsd.org Add asynchronous command support to the pass(4) driver, and the new camdd(8) utility. 
CCBs may be queued to the driver via the new CAMIOQUEUE ioctl, and completed CCBs may be retrieved via the CAMIOGET ioctl. User processes can use poll(2) or kevent(2) to get notification when I/O has completed. While the existing CAMIOCOMMAND blocking ioctl interface only supports user virtual data pointers in a CCB (generally only one per CCB), the new CAMIOQUEUE ioctl supports user virtual and physical address pointers, as well as user virtual and physical scatter/gather lists. This allows user applications to have more flexibility in their data handling operations. Kernel memory for data transferred via the queued interface is allocated from the zone allocator in MAXPHYS sized chunks, and user data is copied in and out. This is likely faster than the vmapbuf()/vunmapbuf() method used by the CAMIOCOMMAND ioctl in configurations with many processors (there are more TLB shootdowns caused by the mapping/unmapping operation) but may not be as fast as running with unmapped I/O. The new memory handling model for user requests also allows applications to send CCBs with request sizes that are larger than MAXPHYS. The pass(4) driver now limits queued requests to the I/O size listed by the SIM driver in the maxio field in the Path Inquiry (XPT_PATH_INQ) CCB. There are some things things would be good to add: 1. Come up with a way to do unmapped I/O on multiple buffers. Currently the unmapped I/O interface operates on a struct bio, which includes only one address and length. It would be nice to be able to send an unmapped scatter/gather list down to busdma. This would allow eliminating the copy we currently do for data. 2. Add an ioctl to list currently outstanding CCBs in the various queues. 3. Add an ioctl to cancel a request, or use the XPT_ABORT CCB to do that. 4. Test physical address support. Virtual pointers and scatter gather lists have been tested, but I have not yet tested physical addresses or scatter/gather lists. 5. Investigate multiple queue support. 
At the moment there is one queue of commands per pass(4) device. If multiple processes open the device, they will submit I/O into the same queue and get events for the same completions. This is probably the right model for most applications, but it would be good to make sure that there is not really a case for multiple queues before pushing this code upstream.

Also, add a new utility, camdd(8), that uses the asynchronous pass(4) driver interface. This utility is intended to be a basic data transfer/copy utility, a simple benchmark utility, and an example of how to use the asynchronous pass(4) interface. It can copy data to and from pass(4) devices using any target queue depth, starting offset and blocksize for the input and output devices. It currently only supports SCSI devices, but could be easily extended to support ATA devices. It
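The commit message above says kernel memory for queued requests is allocated in MAXPHYS sized chunks, which is what lets a request exceed MAXPHYS. A rough sketch of the arithmetic follows; the function name is hypothetical and the 128K value for MAXPHYS is an assumption for the example, since the value is a kernel build-time constant and differs between systems.

```c
#include <assert.h>
#include <stdint.h>

#define SK_MAXPHYS	(128u * 1024u)	/* assumed value, for illustration only */

/* Hypothetical: number of fixed-size chunks needed to hold len bytes. */
static uint64_t
sk_chunks_needed(uint64_t len, uint32_t chunk)
{
	return ((len + chunk - 1) / chunk);
}
```

Under that assumption a 1MB request would be backed by eight chunks, each filled or drained by copying user data in or out.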
Re: SLR140 with new mt(1) [Was: Re: sa(4) driver changes available for test]
On Wed, Mar 11, 2015 at 20:26:49 +0100, Harald Schmalzbauer wrote:

Regarding Kenneth D. Merry's message of 28.02.2015 01:08 (localtime):

Still just works fine! :-) (stable_10.20150218.1-patchset with LTO2, LTO3 and DDS5)
With DDS5, density is reported as unknown.

If I remember correctly, you have your DDS4 reporting DDS4? That means that we need to add DDS5 to the density table in libmt. Can you send the output of 'mt status -v'? It would actually be helpful for all three drives.

Hello, I'd like to present some test results. All tests were done with 10-stable-r273923 and Ken's sa_driver_changes-patchset, reduced by the committed scsi-sys-code.

Thank you for testing all of these drives and media! I really appreciate it!

Unfortunately, there's a problem with appending files to any SLR tape. I can write the first file, but trying to open a second file for writing results in an end of device message. This problem doesn't exist for other drives (tested on VXA-2 (also SCSI-2) and DAT72 (SCSI-3)) with exactly the same environment (all 7 currently connected SCSI drives are on one mpt(4) bus). After the first end of device message, consecutive write attempts lead to "Operation not permitted". According to the datasheet (http://www.tandbergdata.ru/products/files/SLR140_DS_605_ENG.pdf), the drive should speak SCSI-3, but camcontrol shows SCSI-2.

## TandbergData SLR140 Drive ##

    camcontrol inq $TAPE -v
    pass3: TANDBERG SLR140 0605 Removable Sequential Access SCSI-2 device
    pass3: Serial Number SN140253489
    pass3: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Command Queueing Enabled

This sounds like it could be an End Of Tape (EOT) model issue. There is a quirk entry in the driver for other SLR drives, but it probably won't match this particular drive because of a leading space in the INQUIRY data:

    { { T_SEQUENTIAL, SIP_MEDIA_REMOVABLE, "TANDBERG", " SLR*", "*"}, SA_QUIRK_1FM, 0 },

So, try doing a 'mt geteotmodel' on that drive. It is probably set to 2 filemarks. If it is, do:

    mt seteotmodel 1

Obviously, if it is set to 1, try 2 and see what happens. If that is the case, we can adjust the quirk to match that drive.

Density 0x36 = ALRF-6, 186000 bpi, SLR140 drive + SLR140 tape: SLRtape140 (8mm DualReel, 70GB native, 505.9m length, 5.5MiB/s)

Do you have any source documentation for the BPI data? Any information on the number of tracks or the other fields that might go in the mt(1) man page? (We can obviously put it in with that, it's just nice to put it all in if we have it.)

    mt status -v
    Drive: sa3: TANDBERG SLR140 0605 Serial Number: SN140253489
    ---------------------------------
    Mode      Density          Blocksize      bpi      Compression
    Current:  0x36:UNKNOWN     variable       0        enabled (0x3)
    ---------------------------------
    Current Driver State: at rest.
    ---------------------------------
    Partition:   0      Calc File Number:   0     Calc Record Number:  0
    Residual:    0  Reported File Number:  -1 Reported Record Number: -1
    Flags: None
    ---------------------------------
    Tape I/O parameters:
      Maximum I/O size allowed by driver and controller (maxio): 131072 bytes
      Maximum I/O size reported by controller (cpi_maxio): 131072 bytes
      Maximum block size supported by tape drive and media (max_blk): 262144 bytes
      Minimum block size supported by tape drive and media (min_blk): 1 bytes
      Block granularity supported by tape drive and media (blk_gran): 0 bytes
      Maximum possible I/O size (max_effective_iosize): 131072 bytes

Minimum blocksize to reach the highest throughput for sustained writes of incompressible data (from /dev/random): 24k@5.5MiB/s

That's pretty good!

    mt status
    ---------------------------------
    Current Driver State: at rest.
    ---------------------------------
    Partition:   0      Calc File Number:   1     Calc Record Number:  0
    Residual:    0  Reported File Number:  -1 Reported Record Number: -1
    Flags: None

short READ POSITION

    camcontrol cmd $TAPE -v -c "34 0 0 0 0 0 0 0 0 0" -i 20 - | hd
    00000000  30 00 00 00 00 00 06 83  00 00 00 00 00 00 00 00  |0...............|
    00000010  00 00 00 00                                       |....|
    00000014

vendor READ POSITION

    camcontrol cmd $TAPE -v -c "34 1 0 0 0 0 0 0 0 0" -i 20 - | hd
    camcontrol: error sending command
    (pass3:mpt1:0:13:0): READ POSITION. CDB: 34 01 00 00 00 00 00 00 00 00
    (pass3:mpt1:0:13:0): CAM status: SCSI Status Error
    (pass3:mpt1:0:13:0): SCSI status: Check Condition
    (pass3:mpt1:0:13:0): SCSI sense: ILLEGAL REQUEST asc:24,0 (Invalid field in CDB)
    (pass3:mpt1:0:13:0): Command byte 1 bit 0 is invalid

long READ POSITION

    camcontrol cmd $TAPE -v -c "34 6 0 0 0 0 0 0 0 0" -i 32 - | hd
    camcontrol: error sending command
    (pass3:mpt1:0:13:0): READ POSITION. CDB: 34 06 00 00 00 00 00 00 00 00
    (pass3:mpt1:0:13:0): CAM status: SCSI Status Error
    (pass3:mpt1:0:13:0): SCSI status: Check Condition
    (pass3:mpt1:0:13:0): SCSI sense: ILLEGAL REQUEST asc:24,0 (Invalid field in CDB)
    (pass3:mpt1:0:13:0):
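For anyone who wants to read the 20 bytes returned by the short-form READ POSITION above without counting hex digits, here is a minimal decoding sketch. The field layout and flag bit values are my understanding of the SSC short-form response (flags in byte 0, partition in byte 1, big-endian first and last block locations in bytes 4-7 and 8-11); they are assumptions, not something stated in this thread, and the function name parse_short_read_position is made up for illustration.

```python
import struct

# Flag bits in byte 0 of the short-form response (assumed SSC layout).
BOP = 0x80   # at beginning of partition
EOP = 0x40   # at end of partition
BCU = 0x20   # number of blocks in buffer unknown
BYCU = 0x10  # number of bytes in buffer unknown
BPU = 0x04   # block position unknown
PERR = 0x02  # position error


def parse_short_read_position(data: bytes) -> dict:
    """Decode a 20-byte short-form READ POSITION response (hypothetical helper)."""
    if len(data) < 20:
        raise ValueError("short-form READ POSITION data should be 20 bytes")
    flags, partition = data[0], data[1]
    # Bytes 4-7 and 8-11 are big-endian 32-bit block locations.
    first_block, last_block = struct.unpack(">II", data[4:12])
    return {
        "bop": bool(flags & BOP),
        "eop": bool(flags & EOP),
        "bcu": bool(flags & BCU),
        "bycu": bool(flags & BYCU),
        "position_known": not (flags & BPU),
        "perr": bool(flags & PERR),
        "partition": partition,
        "first_block": first_block,
        "last_block": last_block,
    }


# The 20 bytes from the hd output of the short READ POSITION above.
raw = bytes.fromhex("3000000000000683" "0000000000000000" "00000000")
pos = parse_short_read_position(raw)
print(pos)
```

Under those assumptions the SLR140 is reporting a known block position (first block location 0x683) in partition 0, with the buffer block and byte counts flagged as unknown.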
Re: sa(4) driver changes available for test
On Sat, Mar 07, 2015 at 14:30:26 +0100, Harald Schmalzbauer wrote:

Regarding Kenneth D. Merry's message of 19.02.2015 01:13 (localtime):

I have updated the patches. I have removed the XPT_DEV_ADVINFO changes from the patches to head, since I committed those separately. I have (hopefully) fixed the build for the stable/10 patches by MFCing dependencies. (One of them mav did for me, thanks!)

Rough draft commit message:
http://people.freebsd.org/~ken/sa_changes_commitmsg.20150218.1.txt

The patches against FreeBSD/head as of SVN revision 278975:
http://people.freebsd.org/~ken/sa_changes.20150218.1.txt

And (untested) patches against FreeBSD stable/10 as of SVN revision 278974:
http://people.freebsd.org/~ken/sa_changes.stable_10.20150218.1.txt

Hello, on 26/02/2015, r278964 seems to be part of the sa_changes patchset. Do you have a sa_changes.stable_10.20150226 ready?

I haven't done it yet, sorry.

Or is it just a matter of excluding all parts committed with r278964 from the patchset? I've done so in the meantime:
ftp://ftp.omnilan.de/pub/FreeBSD/OmniLAN/misc/sa_changes.stable_10.20150226.fudge.patch

Thanks!

Noticed that in sys/dev/mps/mps_sas.c 'cdai.flags' gets the new CDAI_FLAG_NONE conditionally (#if __FreeBSD_version >= 1100061), while in sbin/camcontrol/camcontrol.c it is used unconditionally. I haven't really looked at the code, mostly because my skills wouldn't allow me to answer this question myself, but is that version check in mps_sas.c still valid with the rest of the sa_driver-changes?

Yes, that's intentional. The mps(4) and mpr(4) drivers are also maintained outside the tree by LSI/Avago. I usually try to put version checks in there, so that things work when they try to compile against earlier releases. Otherwise they'd be putting in the same checks themselves. It is easier to do them when the changes go in the tree.

camcontrol(8), on the other hand, is only maintained in the FreeBSD tree. So it only ever needs to build against the FreeBSD branch it is in.

Ken
--
Kenneth Merry
k...@freebsd.org
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: sa(4) driver changes available for test
On Mon, Mar 02, 2015 at 11:45:59 -0500, Dan Langille wrote:
On Feb 13, 2015, at 7:32 PM, Kenneth D. Merry k...@freebsd.org wrote:

I have a fairly large set of changes to the sa(4) driver and mt(1) utility that I'm planning to commit in the near future. A description of the changes is here and below in this message. If you have tape hardware and the inclination, I'd appreciate testing and feedback.

This came to me today via the Bacula mailing lists.
http://marc.info/?l=bacula-users&m=142531236722693&w=2

"As far as I can tell ltfs support on linux sits on top of the standard mt-st stuff as a userspace (fuse) filesystem. I'd hope it's much the same with BSD. Removing the standard interface would be counterproductive overall."

Can you answer that and I'll relay please?

Sure. In short, the current interface will stay in place. I have added additional ioctls that provide more features and information, but I don't see any issue with leaving the current ioctls in place. The MTIOCGET ioctl even gets an improvement in behavior when the tape drive supports long position information -- it will report the file number after a 'mt eod'.

IBM's LTFS sits on top of their own Linux tape driver, and operates with a combination of tape driver ioctls (e.g. the standard MTIOCTOP ioctls) and SCSI passthrough. When I ported it to FreeBSD, I ran into several areas where we needed more information out of the tape driver. So that was the primary motivation behind adding the additional features. (Other features I implemented using SCSI passthrough.)

He is correct that it runs with FUSE, although it can be linked into an application as a library as well.

Ken
--
Kenneth Merry
k...@freebsd.org
Re: sa(4) driver changes available for test
On Sun, Mar 01, 2015 at 19:15:05 -0500, Dan Langille wrote:
On Feb 17, 2015, at 1:36 PM, Kenneth D. Merry k...@freebsd.org wrote:
On Sat, Feb 14, 2015 at 18:22:43 -0500, Dan Langille wrote:
On Feb 13, 2015, at 7:32 PM, Kenneth D. Merry k...@freebsd.org wrote:

I have a fairly large set of changes to the sa(4) driver and mt(1) utility that I'm planning to commit in the near future. A description of the changes is here and below in this message. If you have tape hardware and the inclination, I'd appreciate testing and feedback.

I have a DLT 8000 and an SDLT 220. I don't have anything running current, but I have a spare machine which I could use for testing. Do you see any value in tests with that hardware? I'd be testing it via Bacula.

disclosure: I'm the sysutils/bacula-* maintainer and a Bacula committer.

Actually, yes. Bacula is a bit tricky to configure, so your trying it out would be helpful if you have the time.

In looking at the manuals for both the SDLT 220 and the DLT 8000, they both claim to support long position information for the SCSI READ POSITION command. You can see what I'm talking about by doing:

    mt eod
    mt status

On my DDS-4 tape drive, this shows:

    # mt -f /dev/nsa3 status
    Drive: sa3: SEAGATE DAT06240-XXX 8071 Serial Number: HJ00YWY
    ---------------------------------
    Mode      Density        Blocksize      bpi      Compression
    Current:  0x26:DDS-4     1024 bytes     97000    enabled (DCLZ)
    ---------------------------------
    Current Driver State: at rest.
    ---------------------------------
    Partition:   0      Calc File Number:  -1     Calc Record Number: -1
    Residual:    0  Reported File Number:  -1 Reported Record Number: -1
    Flags: None

But on an LTO-5, which will give long position information, I get:

    [root@doc ~]# mt status
    Drive: sa0: IBM ULTRIUM-HH5 E4J1
    ---------------------------------
    Mode      Density        Blocksize      bpi      Compression
    Current:  0x58:LTO-5     variable       384607   enabled (0x1)
    ---------------------------------
    Current Driver State: at rest.
    ---------------------------------
    Partition:   0      Calc File Number:   2     Calc Record Number: -1
    Residual:    0  Reported File Number:   2 Reported Record Number: 32373
    Flags: None

That, in combination with the changes I made to the position information code in the driver, means that even the old MTIOCGET ioctl should return an accurate file number at end of data. e.g., on the LTO-5:

    [root@doc ~]# mt ostatus
    Mode      Density        Blocksize      bpi      Compression
    Current:  0x58:LTO-5     variable       384607   0x1
    ---------available modes---------
    0:        0x58:LTO-5     variable       384607   0x1
    1:        0x58:LTO-5     variable       384607   0x1
    2:        0x58:LTO-5     variable       384607   0x1
    3:        0x58:LTO-5     variable       384607   0x1
    ---------------------------------
    Current Driver State: at rest.
    ---------------------------------
    File Number: 2  Record Number: -1  Residual Count -1

So the thing to try, in addition to just making sure that Bacula continues to work properly, is to try setting this for the tape drive in bacula-sd.conf:

    Hardware End of Medium = yes

It looks like the Bacula tape program (btape) has a test mode, and it would be good to run through the tests on one of the tape drives and see whether they work, and whether the results are different before and after the changes.

I'm not sure how to enable the test mode. I have this in /usr/local/etc/bacula/bacula-sd.conf

    Device {
      Name = DLT
      Description = QUANTUM DLT7000 1624
      Media Type = DLT
      Archive Device = /dev/nsa1
      Autochanger = YES
      Drive Index = 0
      Offline On Unmount = no
      Hardware End of Medium = yes
      BSF at EOM = yes
      Backward Space Record = no
      Fast Forward Space File = no
      TWO EOF = yes
    }

FYI, http://www.freebsddiary.org/digital-tl891.php (from 2006) has a btape test on this same model.

Here's the test I ran tonight:

    [root@cuppy:/usr/home/dan] # btape -c /usr/local/etc/bacula/bacula-sd.conf /dev/nsa1
    Tape block granularity is 1024 bytes.
    btape: butil.c:287-0 Using device: /dev/nsa1 for writing.
    btape: btape.c:469-0 open device DLT (/dev/nsa1): OK
    *test

    === Write, rewind, and re-read test ===

    I'm going to write 1 records and an EOF
    then write 1 records and an EOF, then rewind,
    and re-read the data to verify that it is correct.

    This is an *essential* feature ...

    btape: btape.c:1152-0 Wrote 1 blocks of 64412 bytes.
    btape: btape.c:604-0 Wrote 1 EOF to DLT
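If you want to check a batch of drives for this without eyeballing each 'mt status', something along these lines might do. It is only a sketch: it scrapes the human-readable text shown above, the helper names are made up, and treating a Reported File Number of -1 as "the drive did not hand back long position information" is my reading of the two outputs in this message, not a documented rule.

```python
import re


def reported_position(status_text):
    """Return (file_number, record_number) as reported by the drive, or None.

    Works on the text printed by 'mt status' as quoted in this thread.
    """
    m_file = re.search(r"Reported File Number:\s*(-?\d+)", status_text)
    m_rec = re.search(r"Reported Record Number:\s*(-?\d+)", status_text)
    if not m_file or not m_rec:
        return None
    return int(m_file.group(1)), int(m_rec.group(1))


def has_long_position(status_text):
    """Assumption: a non-negative reported file number means long position data."""
    pos = reported_position(status_text)
    return pos is not None and pos[0] >= 0


# Position lines copied from the DDS-4 and LTO-5 outputs above.
DDS4 = ("Partition: 0 Calc File Number: -1 Calc Record Number: -1 "
        "Residual: 0 Reported File Number: -1 Reported Record Number: -1")
LTO5 = ("Partition: 0 Calc File Number: 2 Calc Record Number: -1 "
        "Residual: 0 Reported File Number: 2 Reported Record Number: 32373")

for name, text in (("DDS-4", DDS4), ("LTO-5", LTO5)):
    print(name, reported_position(text), has_long_position(text))
```

Run after 'mt eod', a drive that prints a real file number here is one where "Hardware End of Medium = yes" is worth trying in bacula-sd.conf.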
Re: sa(4) driver changes available for test
On Sun, Mar 01, 2015 at 19:40:40 -0500, Dan Langille wrote:
On Mar 1, 2015, at 7:36 PM, Kenneth D. Merry k...@freebsd.org wrote:
On Sun, Mar 01, 2015 at 19:28:37 -0500, Dan Langille wrote:
On Mar 1, 2015, at 7:18 PM, Kenneth D. Merry k...@freebsd.org wrote:
On Sun, Mar 01, 2015 at 17:06:24 -0500, Dan Langille wrote:
On Feb 13, 2015, at 7:32 PM, Kenneth D. Merry k...@freebsd.org wrote:

I have a fairly large set of changes to the sa(4) driver and mt(1) utility that I'm planning to commit in the near future. A description of the changes is here and below in this message. If you have tape hardware and the inclination, I'd appreciate testing and feedback.

Rough draft commit message:
http://people.freebsd.org/~ken/sa_changes_commitmsg.20150213.3.txt

The patches against FreeBSD/head as of SVN revision 278706:
http://people.freebsd.org/~ken/sa_changes.20150213.3.txt

And (untested) patches against FreeBSD stable/10 as of SVN revision 278721:
http://people.freebsd.org/~ken/sa_changes.stable_10.20150213.3.txt

The intent is to get the tape infrastructure more up to date, so we can support LTFS and more modern tape drives:
http://www.ibm.com/systems/storage/tape/ltfs/

I have ported IBM's LTFS Single Drive Edition to FreeBSD. The port depends on the patches linked above. It isn't fully cleaned up and ready for redistribution. If you're interested, though, let me know and I'll tell you when it is ready to go out. You need an IBM LTO-5, LTO-6, TS1140 or TS1150 tape drive. HP drives aren't supported by IBM's LTFS, and older drives don't have the necessary features to support LTFS.

The commit message below outlines most of the changes. A few comments:

1. I'm planning to commit the XPT_DEV_ADVINFO changes separately.

2. The XML output is similar to what GEOM and CTL do. It would be nice to figure out how to put a standard schema on it so that standard tools could read it. I don't know how feasible that is, since I haven't had time to dig into it. If anyone has suggestions on whether that is feasible or advisable, I'd appreciate feedback.

3. I have tested with a reasonable amount of tape hardware (see below for a list), but more testing and feedback would be good.

4. Standard 'mt status' output looks like this:

    # mt -f /dev/nsa3 status -v
    Drive: sa3: IBM ULTRIUM-HH6 E4J1 Serial Number: 101500520A
    ---------------------------------
    Mode      Density        Blocksize      bpi      Compression
    Current:  0x5a:LTO-6     variable       384607   enabled (0xff)
    ---------------------------------
    Current Driver State: at rest.
    ---------------------------------
    Partition:   0      Calc File Number:   0     Calc Record Number:  0
    Residual:    0  Reported File Number:   0 Reported Record Number:  0
    Flags: BOP

5. 'mt status -v' looks like this:

    # mt -f /dev/nsa3 status -v
    Drive: sa3: IBM ULTRIUM-HH6 E4J1 Serial Number: 101500520A
    ---------------------------------
    Mode      Density        Blocksize      bpi      Compression
    Current:  0x5a:LTO-6     variable       384607   enabled (0xff)
    ---------------------------------
    Current Driver State: at rest.
    ---------------------------------
    Partition:   0      Calc File Number:   0     Calc Record Number:  0
    Residual:    0  Reported File Number:   0 Reported Record Number:  0
    Flags: BOP
    ---------------------------------
    Tape I/O parameters:
      Maximum I/O size allowed by driver and controller (maxio): 1081344 bytes
      Maximum I/O size reported by controller (cpi_maxio): 5197824 bytes
      Maximum block size supported by tape drive and media (max_blk): 8388608 bytes
      Minimum block size supported by tape drive and media (min_blk): 1 bytes
      Block granularity supported by tape drive and media (blk_gran): 0 bytes
      Maximum possible I/O size (max_effective_iosize): 1081344 bytes

    # mtx -f /dev/pass0 status
    Storage Changer /dev/pass0:2 Drives, 10 Slots ( 0 Import/Export )
    Data Transfer Element 0:Empty
    Data Transfer Element 1:Empty
          Storage Element 1:Empty
          Storage Element 2:Empty
          Storage Element 3:Empty
          Storage Element 4:Full :VolumeTag=FAI260
          Storage Element 5:Full :VolumeTag=FAI261
          Storage Element 6:Full :VolumeTag=FAI262
          Storage Element 7:Full :VolumeTag=FAI263
          Storage Element 8:Empty
          Storage Element 9:Empty
          Storage Element 10:Empty

It was at this point that I spent the next 90 minutes trying to get the tape drive out of the tape library to free a stuck tape. Some of this was spent attempting, and failing, to undo a stripped screw. I stopped the attempt when I noticed the screw did not need to be removed. :/

Thanks for all of the effort
Re: sa(4) driver changes available for test
On Sun, Mar 01, 2015 at 17:06:24 -0500, Dan Langille wrote: On Feb 13, 2015, at 7:32 PM, Kenneth D. Merry k...@freebsd.org wrote: I have a fairly large set of changes to the sa(4) driver and mt(1) driver that I'm planning to commit in the near future. A description of the changes is here and below in this message. If you have tape hardware and the inclination, I'd appreciate testing and feedback. Rough draft commit message: http://people.freebsd.org/~ken/sa_changes_commitmsg.20150213.3.txt The patches against FreeBSD/head as of SVN revision 278706: http://people.freebsd.org/~ken/sa_changes.20150213.3.txt And (untested) patches against FreeBSD stable/10 as of SVN revision 278721. http://people.freebsd.org/~ken/sa_changes.stable_10.20150213.3.txt The intent is to get the tape infrastructure more up to date, so we can support LTFS and more modern tape drives: http://www.ibm.com/systems/storage/tape/ltfs/ I have ported IBM's LTFS Single Drive Edition to FreeBSD. The port depends on the patches linked above. It isn't fully cleaned up and ready for redistribution. If you're interested, though, let me know and I'll tell you when it is ready to go out. You need an IBM LTO-5, LTO-6, TS1140 or TS1150 tape drive. HP drives aren't supported by IBM's LTFS, and older drives don't have the necessary features to support LTFS. The commit message below outlines most of the changes. A few comments: 1. I'm planning to commit the XPT_DEV_ADVINFO changes separately. 2. The XML output is similar to what GEOM and CTL do. It would be nice to figure out how to put a standard schema on it so that standard tools could read it. I don't know how feasible that is, since I haven't time to dig into it. If anyone has suggestions on whether that is feasible or advisable, I'd appreciate feedback. 3. I have tested with a reasonable amount of tape hardware (see below for a list), but more testing and feedback would be good. 4. 
Standard 'mt status' output looks like this: # mt -f /dev/nsa3 status -v Drive: sa3: IBM ULTRIUM-HH6 E4J1 Serial Number: 101500520A - Mode Density Blocksize bpi Compression Current: 0x5a:LTO-6 variable 384607 enabled (0xff) - Current Driver State: at rest. - Partition: 0 Calc File Number: 0 Calc Record Number: 0 Residual:0 Reported File Number: 0 Reported Record Number: 0 Flags: BOP 5. 'mt status -v' looks like this: # mt -f /dev/nsa3 status -v Drive: sa3: IBM ULTRIUM-HH6 E4J1 Serial Number: 101500520A - Mode Density Blocksize bpi Compression Current: 0x5a:LTO-6 variable 384607 enabled (0xff) - Current Driver State: at rest. - Partition: 0 Calc File Number: 0 Calc Record Number: 0 Residual:0 Reported File Number: 0 Reported Record Number: 0 Flags: BOP - Tape I/O parameters: Maximum I/O size allowed by driver and controller (maxio): 1081344 bytes Maximum I/O size reported by controller (cpi_maxio): 5197824 bytes Maximum block size supported by tape drive and media (max_blk): 8388608 bytes Minimum block size supported by tape drive and media (min_blk): 1 bytes Block granularity supported by tape drive and media (blk_gran): 0 bytes Maximum possible I/O size (max_effective_iosize): 1081344 bytes # mtx -f /dev/pass0 status Storage Changer /dev/pass0:2 Drives, 10 Slots ( 0 Import/Export ) Data Transfer Element 0:Empty Data Transfer Element 1:Empty Storage Element 1:Empty Storage Element 2:Empty Storage Element 3:Empty Storage Element 4:Full :VolumeTag=FAI260 Storage Element 5:Full :VolumeTag=FAI261 Storage Element 6:Full :VolumeTag=FAI262 Storage Element 7:Full :VolumeTag=FAI263 Storage Element 8:Empty Storage Element 9:Empty Storage Element 10:Empty It was at this point I spent the next 90 minute trying to get the tape drive out of the tape library to free a stuck tape. Some of this was spent attempting, and failing, to undo a stripped screw. I stopped the attempt when I noticed the screw did need to be removed. :/ Thanks for all of the effort! 
Looks like it is paying off! :)

When I do this command, I hear the drive move a bit, to read the tape:

    # mt -f /dev/nsa1 status
    Drive: sa1: DEC TZ89 (C) DEC 2561 Serial Number: CXA09S1340
    -
    Mode Density Blocksize bpi
Re: sa(4) driver changes available for test
On Sun, Mar 01, 2015 at 19:28:37 -0500, Dan Langille wrote: On Mar 1, 2015, at 7:18 PM, Kenneth D. Merry k...@freebsd.org wrote: On Sun, Mar 01, 2015 at 17:06:24 -0500, Dan Langille wrote: On Feb 13, 2015, at 7:32 PM, Kenneth D. Merry k...@freebsd.org wrote: I have a fairly large set of changes to the sa(4) driver and mt(1) driver that I'm planning to commit in the near future. A description of the changes is here and below in this message. If you have tape hardware and the inclination, I'd appreciate testing and feedback. Rough draft commit message: http://people.freebsd.org/~ken/sa_changes_commitmsg.20150213.3.txt The patches against FreeBSD/head as of SVN revision 278706: http://people.freebsd.org/~ken/sa_changes.20150213.3.txt And (untested) patches against FreeBSD stable/10 as of SVN revision 278721. http://people.freebsd.org/~ken/sa_changes.stable_10.20150213.3.txt The intent is to get the tape infrastructure more up to date, so we can support LTFS and more modern tape drives: http://www.ibm.com/systems/storage/tape/ltfs/ I have ported IBM's LTFS Single Drive Edition to FreeBSD. The port depends on the patches linked above. It isn't fully cleaned up and ready for redistribution. If you're interested, though, let me know and I'll tell you when it is ready to go out. You need an IBM LTO-5, LTO-6, TS1140 or TS1150 tape drive. HP drives aren't supported by IBM's LTFS, and older drives don't have the necessary features to support LTFS. The commit message below outlines most of the changes. A few comments: 1. I'm planning to commit the XPT_DEV_ADVINFO changes separately. 2. The XML output is similar to what GEOM and CTL do. It would be nice to figure out how to put a standard schema on it so that standard tools could read it. I don't know how feasible that is, since I haven't time to dig into it. If anyone has suggestions on whether that is feasible or advisable, I'd appreciate feedback. 3. 
I have tested with a reasonable amount of tape hardware (see below for a list), but more testing and feedback would be good. 4. Standard 'mt status' output looks like this: # mt -f /dev/nsa3 status -v Drive: sa3: IBM ULTRIUM-HH6 E4J1 Serial Number: 101500520A - Mode Density Blocksize bpi Compression Current: 0x5a:LTO-6 variable 384607 enabled (0xff) - Current Driver State: at rest. - Partition: 0 Calc File Number: 0 Calc Record Number: 0 Residual:0 Reported File Number: 0 Reported Record Number: 0 Flags: BOP 5. 'mt status -v' looks like this: # mt -f /dev/nsa3 status -v Drive: sa3: IBM ULTRIUM-HH6 E4J1 Serial Number: 101500520A - Mode Density Blocksize bpi Compression Current: 0x5a:LTO-6 variable 384607 enabled (0xff) - Current Driver State: at rest. - Partition: 0 Calc File Number: 0 Calc Record Number: 0 Residual:0 Reported File Number: 0 Reported Record Number: 0 Flags: BOP - Tape I/O parameters: Maximum I/O size allowed by driver and controller (maxio): 1081344 bytes Maximum I/O size reported by controller (cpi_maxio): 5197824 bytes Maximum block size supported by tape drive and media (max_blk): 8388608 bytes Minimum block size supported by tape drive and media (min_blk): 1 bytes Block granularity supported by tape drive and media (blk_gran): 0 bytes Maximum possible I/O size (max_effective_iosize): 1081344 bytes # mtx -f /dev/pass0 status Storage Changer /dev/pass0:2 Drives, 10 Slots ( 0 Import/Export ) Data Transfer Element 0:Empty Data Transfer Element 1:Empty Storage Element 1:Empty Storage Element 2:Empty Storage Element 3:Empty Storage Element 4:Full :VolumeTag=FAI260 Storage Element 5:Full :VolumeTag=FAI261 Storage Element 6:Full :VolumeTag=FAI262 Storage Element 7:Full :VolumeTag=FAI263 Storage Element 8:Empty Storage Element 9:Empty Storage Element 10:Empty It was at this point I spent the next 90 minute trying to get the tape drive out of the tape library to free a stuck tape. Some of this was spent attempting, and failing, to undo a stripped screw. 
I stopped the attempt when I noticed the screw did need to be removed. :/ Thanks for all of the effort! Looks like it is paying off! :) When I do this command, I hear the drive move a bit, to read the tape: # mt -f /dev/nsa1 status
Re: sa(4) driver changes available for test
On Sun, Mar 01, 2015 at 19:41:07 -0500, Dan Langille wrote: On Mar 1, 2015, at 7:31 PM, Kenneth D. Merry k...@freebsd.org wrote: On Sun, Mar 01, 2015 at 19:15:05 -0500, Dan Langille wrote: On Feb 17, 2015, at 1:36 PM, Kenneth D. Merry k...@freebsd.org wrote: On Sat, Feb 14, 2015 at 18:22:43 -0500, Dan Langille wrote: On Feb 13, 2015, at 7:32 PM, Kenneth D. Merry k...@freebsd.org wrote: I have a fairly large set of changes to the sa(4) driver and mt(1) driver that I'm planning to commit in the near future. A description of the changes is here and below in this message. If you have tape hardware and the inclination, I'd appreciate testing and feedback. I have a DLT 8000 and an SDLT 220. I don't have anything running current, but I have a spare machine which I could use for testing. Do you see any value is tests with that hardware? I'd be testing it via Bacula. disclosure: I'm the sysutils/bacula-* maintainer and a Bacula committer. Actually, yes. Bacula is a bit tricky to configure, so your trying it out would be helpful if you have the time. In looking at the manuals for both the SDLT 220 and the DLT 8000, they both claim to support long position information for the SCSI READ POSITION command. You can see what I'm talking about by doing: mt eod mt status On my DDS-4 tape drive, this shows: # mt -f /dev/nsa3 status Drive: sa3: SEAGATE DAT06240-XXX 8071 Serial Number: HJ00YWY - Mode Density Blocksize bpi Compression Current: 0x26:DDS-4 1024 bytes 97000enabled (DCLZ) - Current Driver State: at rest. - Partition: 0 Calc File Number: -1 Calc Record Number: -1 Residual:0 Reported File Number: -1 Reported Record Number: -1 Flags: None But on an LTO-5, which will give long position information, I get: [root@doc ~]# mt status Drive: sa0: IBM ULTRIUM-HH5 E4J1 - Mode Density Blocksize bpi Compression Current: 0x58:LTO-5 variable 384607 enabled (0x1) - Current Driver State: at rest. 
- Partition: 0 Calc File Number: 2 Calc Record Number: -1 Residual:0 Reported File Number: 2 Reported Record Number: 32373 Flags: None

That, in combination with the changes I made to the position information code in the driver, means that even the old MTIOCGET ioctl should return an accurate file number at end of data. e.g., on the LTO-5:

    [root@doc ~]# mt ostatus
    Mode Density Blocksize bpi Compression
    Current: 0x58:LTO-5 variable 384607 0x1
    -available modes-
    0:0x58:LTO-5 variable 384607 0x1
    1:0x58:LTO-5 variable 384607 0x1
    2:0x58:LTO-5 variable 384607 0x1
    3:0x58:LTO-5 variable 384607 0x1
    -
    Current Driver State: at rest.
    -
    File Number: 2 Record Number: -1 Residual Count -1

So the thing to try, in addition to just making sure that Bacula continues to work properly, is to try setting this for the tape drive in bacula-sd.conf:

    Hardware End of Medium = yes

It looks like the Bacula tape program (btape) has a test mode, and it would be good to run through the tests on one of the tape drives and see whether they work, and whether the results are different before and after the changes. I'm not sure how to enable the test mode.

I have this in /usr/local/etc/bacula/bacula-sd.conf

    Device {
      Name = DLT
      Description = QUANTUM DLT7000 1624
      Media Type = DLT
      Archive Device = /dev/nsa1
      Autochanger = YES
      Drive Index = 0
      Offline On Unmount = no
      Hardware End of Medium = yes
      BSF at EOM = yes
      Backward Space Record = no
      Fast Forward Space File = no
      TWO EOF = yes
    }

FYI, http://www.freebsddiary.org/digital-tl891.php (from 2006) has a btape test on this same model.

Here's the test I ran tonight:

    [root@cuppy:/usr/home/dan] # btape -c /usr/local/etc/bacula/bacula-sd.conf /dev/nsa1
    Tape block granularity is 1024 bytes.
    btape: butil.c:287-0 Using device: /dev/nsa1 for writing.
    btape: btape.c:469-0 open device DLT (/dev/nsa1): OK
    *test

    === Write, rewind, and re-read test ===

    I'm going to write 1 records and an EOF
    then write 1 records and an EOF, then rewind,
    and re-read
Re: sa(4) driver changes available for test
On Sat, Feb 28, 2015 at 17:29:48 -0500, Dan Langille wrote: On Feb 18, 2015, at 7:13 PM, Kenneth D. Merry k...@freebsd.org wrote: I have updated the patches. I have removed the XPT_DEV_ADVINFO changes from the patches to head, since I committed those separately. I have (hopefully) fixed the build for the stable/10 patches by MFCing dependencies. (One of them mav did for me, thanks!) Rough draft commit message: http://people.freebsd.org/~ken/sa_changes_commitmsg.20150218.1.txt I have current installed and running with Bacula, but I have not tried the tape drive yet. Thanks for all your work on this! It seems like your changes are in there from about 5 days ago. Yes, that is correct. Having solved my server hardware issues, I'm now having issues with the autochanger mechanism of the tape library. Does it work with chio(1)? Does it look like hardware or software? (If it is software, I can help with that.) Ken -- Kenneth Merry k...@freebsd.org ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: sa(4) driver changes available for test
On Fri, Feb 27, 2015 at 17:56:42 -0500, Dan Langille wrote: On Feb 17, 2015, at 1:36 PM, Kenneth D. Merry k...@freebsd.org wrote: On Sat, Feb 14, 2015 at 18:22:43 -0500, Dan Langille wrote: On Feb 13, 2015, at 7:32 PM, Kenneth D. Merry k...@freebsd.org wrote: I have a fairly large set of changes to the sa(4) driver and mt(1) driver that I'm planning to commit in the near future. A description of the changes is here and below in this message. If you have tape hardware and the inclination, I'd appreciate testing and feedback. I have a DLT 8000 and an SDLT 220. I don't have anything running current, but I have a spare machine which I could use for testing. Do you see any value is tests with that hardware? I'd be testing it via Bacula. disclosure: I'm the sysutils/bacula-* maintainer and a Bacula committer. Actually, yes. Bacula is a bit tricky to configure, so your trying it out would be helpful if you have the time. I have been unable to test yet. I've encountered time and hardware issues. I know how that goes! (On both counts.) I may be able to try tomorrow. So I have tested building it and it does build at least. If you're able to figure out some of the answers below, that would be great! In looking at the manuals for both the SDLT 220 and the DLT 8000, they both claim to support long position information for the SCSI READ POSITION command. You can see what I'm talking about by doing: mt eod mt status On my DDS-4 tape drive, this shows: # mt -f /dev/nsa3 status Drive: sa3: SEAGATE DAT06240-XXX 8071 Serial Number: HJ00YWY - Mode Density Blocksize bpi Compression Current: 0x26:DDS-4 1024 bytes 97000enabled (DCLZ) - Current Driver State: at rest. 
- Partition: 0 Calc File Number: -1 Calc Record Number: -1 Residual:0 Reported File Number: -1 Reported Record Number: -1 Flags: None But on an LTO-5, which will give long position information, I get: [root@doc ~]# mt status Drive: sa0: IBM ULTRIUM-HH5 E4J1 - Mode Density Blocksize bpi Compression Current: 0x58:LTO-5 variable 384607 enabled (0x1) - Current Driver State: at rest. - Partition: 0 Calc File Number: 2 Calc Record Number: -1 Residual:0 Reported File Number: 2 Reported Record Number: 32373 Flags: None That, in combination with the changes I made to the position information code in the driver, mean that even the old MTIOCGET ioctl should return an accurate file number at end of data. e.g., on the LTO-5: [root@doc ~]# mt ostatus Mode Density Blocksize bpi Compression Current: 0x58:LTO-5 variable 384607 0x1 -available modes- 0:0x58:LTO-5 variable 384607 0x1 1:0x58:LTO-5 variable 384607 0x1 2:0x58:LTO-5 variable 384607 0x1 3:0x58:LTO-5 variable 384607 0x1 - Current Driver State: at rest. - File Number: 2 Record Number: -1 Residual Count -1 So the thing to try, in addition to just making sure that Bacula continues to work properly, is to try setting this for the tape drive in bacula-sd.conf: Hardware End of Medium = yes It looks like the Bacula tape program (btape) has a test mode, and it would be good to run through the tests on one of the tape drives and see whether they work, and whether the results are different before and after the changes. I'm not sure how to enable the test mode. I'll let the other Bacula devs know about this. They deal with the hardware. I work on PostgreSQL. Thanks! If there are additional features they would like out of the tape driver, I'm happy to talk about it. (Or help if they'd like to use the new status reporting ioctl, MTIOCEXTGET or any of the other new ioctls.) Ken -- Kenneth Merry k...@freebsd.org ? 
Dan Langille http://langille.org/ Ken -- Kenneth Merry k...@freebsd.org
Re: sa(4) driver changes available for test
On Fri, Feb 27, 2015 at 20:05:05 +0100, Harald Schmalzbauer wrote:

Regarding Kenneth D. Merry's message of 26.02.2015 23:42 (localtime):

...
And (untested) patches against FreeBSD stable/10 as of SVN revision 278974:
http://people.freebsd.org/~ken/sa_changes.stable_10.20150218.1.txt
...

I'm glad it is working well for you! You can do larger I/O sizes with the Adaptec by changing your MAXPHYS and DFLTPHYS values in your kernel config file. e.g.:

    options MAXPHYS=(1024*1024)
    options DFLTPHYS=(1024*1024)

If you set those values larger, you won't be able to do more than 132K with the sym(4) driver on an x86 box. (It limits the maximum I/O size to 33 segments * PAGE_SIZE.)

Thanks for the hint! I wasn't aware that kern.cam.sa.N.maxio has driver limitations corresponding to the system's MAXPHYS/DFLTPHYS. I thought only silicon limitations defined its value.

It depends on the driver. I thought that the Adaptec drivers go off of MAXPHYS (because that's what the driver author told me last week :), but in looking at the code, they actually have a hard-coded value that can be increased. You can bump AHC_MAXPHYS or AHD_MAXPHYS in aic7xxx_osm.h or aic79xx_osm.h, respectively. In order to make any difference, though, you would have to bump MAXPHYS/DFLTPHYS (so the sa(4) driver will use that value) or change the ahc(4)/ahd(4) driver to set the maxio field in the path inquiry CCB.

But in order to have the best-matching pre-production test environment, I nevertheless replaced it, and am now using mpt(4) instead of ahc(4)/ahc_pci on PCI-X@S3210 (for parallel tape drives I consistently have mpt(4)@PCIe, which is the same LSI (53c1020) chip but with an on-board PCI-X-PCIe bridge).

Okay. That should work.

Still just works fine! :-) (stable_10.20150218.1 patchset with LTO2, LTO3 and DDS5)

With DDS5, density is reported as unknown. If I remember correctly, you have your DDS4 reporting DDS4?

That means that we need to add DDS5 to the density table in libmt. Can you send the output of 'mt status -v'? It would actually be helpful for all three drives.

Also, do any of your drives give a full report for 'mt getdensity'? If so, can you send that as well? (By full report, I mean more than one line.)

We don't have density codes for DDS-5/DAT 72, DAT 160 or DAT 320 yet. It looks like DDS-5 should be 0x47.

therefore I'd like to point to the new port misc/vdmfec
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197950

That looks cool. :) I'm not a ports committer, but hopefully one of them will pick it up.

Cool it is indeed, but whether it's really useful or not is beyond my expertise. I haven't been able to collect much MT experience yet. I know that LTO and similar modern MT technologies do their own ECC (in the sense of an erasure code, mostly Reed-Solomon). What I don't know (but want to be best prepared for) is how arbitrary LTO drives behave if the one (1) in 10^17 bits is detected to be uncorrectable. If it isn't detected, the post erasure code (vdmfec in this case) would help for sure. But if the drive just cuts the output, or stops streaming altogether, would vdmfec be useless?

There is a difference between the uncorrectable bit error rate and the undetectable bit error rate.

The uncorrectable bit error rate for LTO-6 is 1 in 10^17. It is 1 in 10^19 for Oracle T1 C/D drives, and 1 in 10^20 for IBM TS1150. Seagate Enterprise drives claim to have an uncorrectable bit error rate of 1 sector per 10^15 bits read. See:

http://www.oracle.com/us/products/servers-storage/storage/tape-storage/t1c-reliability-wp-409919.pdf
http://www.spectralogic.com/index.cfm?fuseaction=home.displayFileDocID=2513
http://www.seagate.com/www-content/product-content/enterprise-hdd-fam/enterprise-capacity-3-5-hdd/constellation-es-4/en-us/docs/enterprise-capacity-3-5-hdd-ds1791-8-1410us.pdf

The second white paper claims that tape has an undetectable bit error rate of 1 in 1.6x10^33 bits. I assume it is referring to TS1150, but I don't know for sure.

It is far more likely that your tape or tape drive will break than it is that you would get a bad bit back from the drive.

According to excerpts of Study of Perpendicular AME Media in a Linear Tape Drive, LTO-4 has a soft read error rate of 1 in 10^6 bits and DDS has 1 in 10^4 bits (!!!, according to HP C1537A DDS 3 - ACT/Apricot). So with DDS, _every_ single block pax(1) writes to tape needs to be internally corrected! Of course, nobody wants to stream zfs send output to DDS, since it's much too slow/small, but I mention it for completeness.

For archives of zfs streams, I don't feel safe relying on the tape drives' FEC, which was designed for backup solutions that do their own blocking+checksumming, so that the very rare uncorrectable read error would at worst lead to a few (or single) unrecoverable files, and even database files would most likely be recoverable afterwards. But with one flipped bit in the zfs stream, you'd lose hundreds of
Re: sa(4) driver changes available for test
On Thu, Feb 26, 2015 at 10:57:50 +0100, Harald Schmalzbauer wrote:

Regarding Kenneth D. Merry's message of 19.02.2015 01:13 (localtime):

I have updated the patches. I have removed the XPT_DEV_ADVINFO changes from the patches to head, since I committed those separately. I have (hopefully) fixed the build for the stable/10 patches by MFCing dependencies. (One of them mav did for me, thanks!)

Rough draft commit message:
http://people.freebsd.org/~ken/sa_changes_commitmsg.20150218.1.txt

The patches against FreeBSD/head as of SVN revision 278975:
http://people.freebsd.org/~ken/sa_changes.20150218.1.txt

And (untested) patches against FreeBSD stable/10 as of SVN revision 278974:
http://people.freebsd.org/~ken/sa_changes.stable_10.20150218.1.txt

Ken, thank you very much for your work! The last sa(4) overhaul (with 10.0, I guess) was a great success, and I highly appreciate your work on tape support for FreeBSD!

I compiled your 10-stable patchset for one machine with LTO2 and DDS5 drives, but haven't done much testing, since I'll replace the Adaptec (39160) because its maxio is limited to 64k (while the 53c1020 has 128k). sa(4) seems to work just fine with both drives, with mt(1) showing Reported File/Record Number :-) No EOM tests done so far...

I'm glad it is working well for you! You can do larger I/O sizes with the Adaptec by changing your MAXPHYS and DFLTPHYS values in your kernel config file. e.g.:

    options MAXPHYS=(1024*1024)
    options DFLTPHYS=(1024*1024)

If you set those values larger, you won't be able to do more than 132K with the sym(4) driver on an x86 box. (It limits the maximum I/O size to 33 segments * PAGE_SIZE.)

I'll archive zfs streams, therefore I needed some kind of forward error correction. People following this thread have probably also found that they need this, so I'd like to point to the new port misc/vdmfec:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197950
Perhaps someone wants to take this bug report.

That looks cool.
:) I'm not a ports committer, but hopefully one of them will pick it up. Ken -- Kenneth Merry k...@freebsd.org
Re: sa(4) driver changes available for test
I have updated the patches. I have removed the XPT_DEV_ADVINFO changes from the patches to head, since I committed those separately. I have (hopefully) fixed the build for the stable/10 patches by MFCing dependencies. (One of them mav did for me, thanks!) Rough draft commit message: http://people.freebsd.org/~ken/sa_changes_commitmsg.20150218.1.txt The patches against FreeBSD/head as of SVN revision 278975: http://people.freebsd.org/~ken/sa_changes.20150218.1.txt And (untested) patches against FreeBSD stable/10 as of SVN revision 278974: http://people.freebsd.org/~ken/sa_changes.stable_10.20150218.1.txt Thanks, Ken On Fri, Feb 13, 2015 at 17:32:32 -0700, Kenneth D. Merry wrote: I have a fairly large set of changes to the sa(4) driver and mt(1) driver that I'm planning to commit in the near future. A description of the changes is here and below in this message. If you have tape hardware and the inclination, I'd appreciate testing and feedback. Rough draft commit message: http://people.freebsd.org/~ken/sa_changes_commitmsg.20150213.3.txt The patches against FreeBSD/head as of SVN revision 278706: http://people.freebsd.org/~ken/sa_changes.20150213.3.txt And (untested) patches against FreeBSD stable/10 as of SVN revision 278721. http://people.freebsd.org/~ken/sa_changes.stable_10.20150213.3.txt The intent is to get the tape infrastructure more up to date, so we can support LTFS and more modern tape drives: http://www.ibm.com/systems/storage/tape/ltfs/ I have ported IBM's LTFS Single Drive Edition to FreeBSD. The port depends on the patches linked above. It isn't fully cleaned up and ready for redistribution. If you're interested, though, let me know and I'll tell you when it is ready to go out. You need an IBM LTO-5, LTO-6, TS1140 or TS1150 tape drive. HP drives aren't supported by IBM's LTFS, and older drives don't have the necessary features to support LTFS. The commit message below outlines most of the changes. A few comments: 1. 
I'm planning to commit the XPT_DEV_ADVINFO changes separately. 2. The XML output is similar to what GEOM and CTL do. It would be nice to figure out how to put a standard schema on it so that standard tools could read it. I don't know how feasible that is, since I haven't time to dig into it. If anyone has suggestions on whether that is feasible or advisable, I'd appreciate feedback. 3. I have tested with a reasonable amount of tape hardware (see below for a list), but more testing and feedback would be good. 4. Standard 'mt status' output looks like this: # mt -f /dev/nsa3 status -v Drive: sa3: IBM ULTRIUM-HH6 E4J1 Serial Number: 101500520A - Mode Density Blocksize bpi Compression Current: 0x5a:LTO-6 variable 384607 enabled (0xff) - Current Driver State: at rest. - Partition: 0 Calc File Number: 0 Calc Record Number: 0 Residual:0 Reported File Number: 0 Reported Record Number: 0 Flags: BOP 5. 'mt status -v' looks like this: # mt -f /dev/nsa3 status -v Drive: sa3: IBM ULTRIUM-HH6 E4J1 Serial Number: 101500520A - Mode Density Blocksize bpi Compression Current: 0x5a:LTO-6 variable 384607 enabled (0xff) - Current Driver State: at rest. - Partition: 0 Calc File Number: 0 Calc Record Number: 0 Residual:0 Reported File Number: 0 Reported Record Number: 0 Flags: BOP - Tape I/O parameters: Maximum I/O size allowed by driver and controller (maxio): 1081344 bytes Maximum I/O size reported by controller (cpi_maxio): 5197824 bytes Maximum block size supported by tape drive and media (max_blk): 8388608 bytes Minimum block size supported by tape drive and media (min_blk): 1 bytes Block granularity supported by tape drive and media (blk_gran): 0 bytes Maximum possible I/O size (max_effective_iosize): 1081344 bytes 6. Existing applications should work without changes. If not, please let me know. Hopefully they will move over time to the new interfaces. 7. There are lots of additional features that could be added later. Append-only support, encryption, more log pages, etc. 8. 
I have SCSI READ ATTRIBUTE changes for camcontrol(8) that will go in separately. These changes allow displaying the contents of the MAM (Medium Auxiliary Memory) chips on LTO, TS and other modern tape drives. These are good, and a future possible direction is adding attributes to the status XML from the sa(4) driver. Significant upgrades to sa(4) and mt(1). The primary focus of these changes is to modernize FreeBSD's tape infrastructure
Re: sa(4) driver changes available for test
On Sat, Feb 14, 2015 at 18:22:43 -0500, Dan Langille wrote: On Feb 13, 2015, at 7:32 PM, Kenneth D. Merry k...@freebsd.org wrote: I have a fairly large set of changes to the sa(4) driver and mt(1) driver that I'm planning to commit in the near future. A description of the changes is here and below in this message. If you have tape hardware and the inclination, I'd appreciate testing and feedback. I have a DLT 8000 and an SDLT 220. I don't have anything running current, but I have a spare machine which I could use for testing. Do you see any value is tests with that hardware? I'd be testing it via Bacula. disclosure: I'm the sysutils/bacula-* maintainer and a Bacula committer. Actually, yes. Bacula is a bit tricky to configure, so your trying it out would be helpful if you have the time. In looking at the manuals for both the SDLT 220 and the DLT 8000, they both claim to support long position information for the SCSI READ POSITION command. You can see what I'm talking about by doing: mt eod mt status On my DDS-4 tape drive, this shows: # mt -f /dev/nsa3 status Drive: sa3: SEAGATE DAT06240-XXX 8071 Serial Number: HJ00YWY - Mode Density Blocksize bpi Compression Current: 0x26:DDS-4 1024 bytes 97000enabled (DCLZ) - Current Driver State: at rest. - Partition: 0 Calc File Number: -1 Calc Record Number: -1 Residual:0 Reported File Number: -1 Reported Record Number: -1 Flags: None But on an LTO-5, which will give long position information, I get: [root@doc ~]# mt status Drive: sa0: IBM ULTRIUM-HH5 E4J1 - Mode Density Blocksize bpi Compression Current: 0x58:LTO-5 variable 384607 enabled (0x1) - Current Driver State: at rest. - Partition: 0 Calc File Number: 2 Calc Record Number: -1 Residual:0 Reported File Number: 2 Reported Record Number: 32373 Flags: None That, in combination with the changes I made to the position information code in the driver, mean that even the old MTIOCGET ioctl should return an accurate file number at end of data. 
e.g., on the LTO-5:

    [root@doc ~]# mt ostatus
    Mode Density Blocksize bpi Compression
    Current: 0x58:LTO-5 variable 384607 0x1
    -available modes-
    0:0x58:LTO-5 variable 384607 0x1
    1:0x58:LTO-5 variable 384607 0x1
    2:0x58:LTO-5 variable 384607 0x1
    3:0x58:LTO-5 variable 384607 0x1
    -
    Current Driver State: at rest.
    -
    File Number: 2 Record Number: -1 Residual Count -1

So the thing to try, in addition to just making sure that Bacula continues to work properly, is to try setting this for the tape drive in bacula-sd.conf:

    Hardware End of Medium = yes

It looks like the Bacula tape program (btape) has a test mode, and it would be good to run through the tests on one of the tape drives and see whether they work, and whether the results are different before and after the changes. I'm not sure how to enable the test mode.

I'll let the other Bacula devs know about this. They deal with the hardware. I work on PostgreSQL.

Thanks! If there are additional features they would like out of the tape driver, I'm happy to talk about it. (Or help if they'd like to use the new status reporting ioctl, MTIOCEXTGET or any of the other new ioctls.)

Ken -- Kenneth Merry k...@freebsd.org
sa(4) driver changes available for test
I have a fairly large set of changes to the sa(4) driver and mt(1) driver that I'm planning to commit in the near future. A description of the changes is here and below in this message. If you have tape hardware and the inclination, I'd appreciate testing and feedback.

Rough draft commit message:

http://people.freebsd.org/~ken/sa_changes_commitmsg.20150213.3.txt

The patches against FreeBSD/head as of SVN revision 278706:

http://people.freebsd.org/~ken/sa_changes.20150213.3.txt

And (untested) patches against FreeBSD stable/10 as of SVN revision 278721:

http://people.freebsd.org/~ken/sa_changes.stable_10.20150213.3.txt

The intent is to get the tape infrastructure more up to date, so we can support LTFS and more modern tape drives:

http://www.ibm.com/systems/storage/tape/ltfs/

I have ported IBM's LTFS Single Drive Edition to FreeBSD. The port depends on the patches linked above. It isn't fully cleaned up and ready for redistribution. If you're interested, though, let me know and I'll tell you when it is ready to go out. You need an IBM LTO-5, LTO-6, TS1140 or TS1150 tape drive. HP drives aren't supported by IBM's LTFS, and older drives don't have the necessary features to support LTFS.

The commit message below outlines most of the changes. A few comments:

1. I'm planning to commit the XPT_DEV_ADVINFO changes separately.

2. The XML output is similar to what GEOM and CTL do. It would be nice to figure out how to put a standard schema on it so that standard tools could read it. I don't know how feasible that is, since I haven't had time to dig into it. If anyone has suggestions on whether that is feasible or advisable, I'd appreciate feedback.

3. I have tested with a reasonable amount of tape hardware (see below for a list), but more testing and feedback would be good.

4. Standard 'mt status' output looks like this:

    # mt -f /dev/nsa3 status
    Drive: sa3: IBM ULTRIUM-HH6 E4J1 Serial Number: 101500520A
    ---------------------------------
    Mode      Density              Blocksize      bpi      Compression
    Current:  0x5a:LTO-6           variable       384607   enabled (0xff)
    ---------------------------------
    Current Driver State: at rest.
    ---------------------------------
    Partition:   0      Calc File Number:   0     Calc Record Number: 0
    Residual:    0  Reported File Number:   0 Reported Record Number: 0
    Flags: BOP

5. 'mt status -v' looks like this:

    # mt -f /dev/nsa3 status -v
    Drive: sa3: IBM ULTRIUM-HH6 E4J1 Serial Number: 101500520A
    ---------------------------------
    Mode      Density              Blocksize      bpi      Compression
    Current:  0x5a:LTO-6           variable       384607   enabled (0xff)
    ---------------------------------
    Current Driver State: at rest.
    ---------------------------------
    Partition:   0      Calc File Number:   0     Calc Record Number: 0
    Residual:    0  Reported File Number:   0 Reported Record Number: 0
    Flags: BOP
    ---------------------------------
    Tape I/O parameters:
      Maximum I/O size allowed by driver and controller (maxio): 1081344 bytes
      Maximum I/O size reported by controller (cpi_maxio): 5197824 bytes
      Maximum block size supported by tape drive and media (max_blk): 8388608 bytes
      Minimum block size supported by tape drive and media (min_blk): 1 bytes
      Block granularity supported by tape drive and media (blk_gran): 0 bytes
      Maximum possible I/O size (max_effective_iosize): 1081344 bytes

6. Existing applications should work without changes. If not, please let me know. Hopefully they will move over time to the new interfaces.

7. There are lots of additional features that could be added later. Append-only support, encryption, more log pages, etc.

8. I have SCSI READ ATTRIBUTE changes for camcontrol(8) that will go in separately. These changes allow displaying the contents of the MAM (Medium Auxiliary Memory) chips on LTO, TS and other modern tape drives. These are good, and a future possible direction is adding attributes to the status XML from the sa(4) driver.

Significant upgrades to sa(4) and mt(1).
The primary focus of these changes is to modernize FreeBSD's tape infrastructure so that we can take advantage of some of the features of modern tape drives and allow support for LTFS.

Significant changes and new features include:

 o sa(4) driver status and parameter information is now exported via an XML structure. This will allow for changes and improvements later on that will not break userland applications. The old MTIOCGET status ioctl remains, so applications using the existing interface will not break.

 o 'mt status' now reports drive-reported tape position information as well as the previously available calculated tape position information. These numbers will be different at times, because the drive-reported block numbers are relative to BOP (Beginning of Partition), but the block numbers calculated previously via sa(4) (and still
Re: LSI SAS2008 mps(4) 4TB disk only shows 2TB on CURRENT r255089
On Sat, Aug 31, 2013 at 13:07:53 -0400, Sam Fourman Jr. wrote:

Hello list

I have two issues that may in fact both be related to the LSI SAS2008 card or the mps(4) driver. This server is running FreeBSD 10.0-CURRENT #0 r255089.

1) All of the SSD disks are showing up as SATA2 at 300MB/s, but the card is in fact a 6Gb/s SATA3 card.

2) A Western Digital 4TB disk only shows 2TB (connected to the LSI controller).

Full dmesg here: https://gist.github.com/sfourman/6399419

Both problems are because the drives in question are plugged into an mpt(4) controller, not the mps(4) controller in the system.

The first one is because mpt(4) controllers only support up to 3Gb/s of course, but the second one is because mpt(4) controllers don't support SATA drives over 2TB. Or more precisely, they don't let you access the capacity over 2TB.

Both problems should go away if you plug them into the mps(4) controller.

From the dmesg:

    mps0: LSI SAS2008 port 0xe000-0xe0ff mem 0xfeb3c000-0xfeb3,0xfeb4-0xfeb7 irq 24 at device 0.0 on pci9
    mps0: Firmware: 15.00.00.00, Driver: 16.00.00.00-fbsd
    mps0: IOCCapabilities: 1285cScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc
    pcib2: ACPI PCI-PCI bridge irq 52 at device 3.0 on pci0
    pci8: ACPI PCI bus on pcib2
    mpt0: LSILogic SAS/SATA Adapter port 0xd000-0xd0ff mem 0xfe7ec000-0xfe7e,0xfe7f-0xfe7f irq 28 at device 0.0 on pci8
    mpt0: MPI Version=1.5.20.0
    [ ...]
    da0 at mpt0 bus 0 scbus1 target 1 lun 0
    da0: ATA INTEL SSDSC2BB12 0350 Fixed Direct Access SCSI-5 device
    da0: 300.000MB/s transfers
    da0: Command Queueing enabled
    da0: 114473MB (234441648 512 byte sectors: 255H 63S/T 14593C)
    cd0 at ata0 bus 0 scbus6 target 1 lun 0
    cd0: MATSHITA DVD-ROM UJ8B0AC 1.00 Removable CD-ROM SCSI-0 device
    cd0: 150.000MB/s transfers (SATA, UDMA5, ATAPI 12bytes, PIO 8192bytes)
    cd0: Attempt to query device size failed: NOT READY, Medium not present - tray closed
    da1 at mpt0 bus 0 scbus1 target 2 lun 0
    da1: ATA INTEL SSDSC2BB12 0350 Fixed Direct Access SCSI-5 device
    da1: 300.000MB/s transfers
    da1: Command Queueing enabled
    da1: 114473MB (234441648 512 byte sectors: 255H 63S/T 14593C)
    da2 at mpt0 bus 0 scbus1 target 3 lun 0
    da2: ATA INTEL SSDSC2BA80 0250 Fixed Direct Access SCSI-5 device
    da2: 300.000MB/s transfers
    da2: Command Queueing enabled
    da2: 763097MB (1562824368 512 byte sectors: 255H 63S/T 97281C)
    da3 at mpt0 bus 0 scbus1 target 4 lun 0
    da3: ATA INTEL SSDSC2BA80 0250 Fixed Direct Access SCSI-5 device
    da3: 300.000MB/s transfers
    da3: Command Queueing enabled
    da3: 763097MB (1562824368 512 byte sectors: 255H 63S/T 97281C)
    da4 at mpt0 bus 0 scbus1 target 5 lun 0
    da4: ATA INTEL SSDSC2BA80 0250 Fixed Direct Access SCSI-5 device
    da4: 300.000MB/s transfers
    da4: Command Queueing enabled
    da4: 763097MB (1562824368 512 byte sectors: 255H 63S/T 97281C)
    da5 at mpt0 bus 0 scbus1 target 6 lun 0
    da5: ATA INTEL SSDSC2BA80 0250 Fixed Direct Access SCSI-5 device
    da5: 300.000MB/s transfers
    da5: Command Queueing enabled
    da5: 763097MB (1562824368 512 byte sectors: 255H 63S/T 97281C)
    da6 at mpt0 bus 0 scbus1 target 8 lun 0
    da6: ATA WDC WD4000FYYZ-0 1K01 Fixed Direct Access SCSI-5 device
    da6: 300.000MB/s transfers
    da6: Command Queueing enabled
    da6: 2097151MB (4294967294 512 byte sectors: 255H 63S/T 267349C)

Ken
--
Kenneth Merry k...@freebsd.org
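The 2TB ceiling shows up in the da6 line of the dmesg: the reported sector count is 4294967294, which is just under 2^32, and at 512 bytes per sector that works out to the 2097151MB figure printed. A minimal sketch of that arithmetic follows; the helper name is made up for illustration and is not taken from da(4) or mpt(4).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from any FreeBSD driver): convert a sector
 * count and a sector size into whole megabytes (2^20 bytes), which is
 * the unit the dmesg capacity lines above appear to use.
 */
static uint64_t
capacity_mb(uint64_t sectors, uint32_t sector_size)
{
	assert(sector_size != 0);
	/* 64-bit math, so sectors * sector_size cannot wrap at 4GB. */
	return ((sectors * (uint64_t)sector_size) / (1024 * 1024));
}
```

Feeding in the da0 and da2 sector counts from the same dmesg gives the 114473MB and 763097MB values shown there, so the sector counts and the MB figures are consistent with each other; only da6 is pinned near the 32-bit limit.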
Re: Removing an SDHC card causes a kernel panic on -current
On Wed, Jun 27, 2012 at 10:22:59 -0400, Michael Butler wrote:

On 06/26/12 22:29, Kenneth D. Merry wrote:

On Tue, Jun 26, 2012 at 19:41:07 -0400, Benjamin Kaduk wrote:

On Tue, 26 Jun 2012, Michael Butler wrote:

As follows, in g_disk_providergone, a NULL pointer reference?:

g_disk_providergone() is new in r237518 (by ken); ken cc'd.

Can you try the attached patch to sys/geom/geom_disk.c?

This fixes the panic :-)

Great! I just committed it.

Also, do you have full dmesg information for when the panic happened? It looks like disk_destroy() has already been called in this case, and I suppose that's likely to happen for any of the users of the GEOM disk class that haven't been updated with the reference count changes I made in da(4). (i.e. all of the rest of them.)

Let me know whether this works for you.

All I have is the following leading up to my removal of the card (and the restart afterwards):

Thanks!

Ken
--
Kenneth Merry k...@freebsd.org
Re: Removing an SDHC card causes a kernel panic on -current
On Tue, Jun 26, 2012 at 19:41:07 -0400, Benjamin Kaduk wrote:

On Tue, 26 Jun 2012, Michael Butler wrote:

As follows, in g_disk_providergone, a NULL pointer reference?:

g_disk_providergone() is new in r237518 (by ken); ken cc'd.

Can you try the attached patch to sys/geom/geom_disk.c?

Also, do you have full dmesg information for when the panic happened? It looks like disk_destroy() has already been called in this case, and I suppose that's likely to happen for any of the users of the GEOM disk class that haven't been updated with the reference count changes I made in da(4). (i.e. all of the rest of them.)

Let me know whether this works for you.

Thanks,

Ken
--
Kenneth Merry k...@freebsd.org

//depot/users/kenm/FreeBSD-test2/sys/geom/geom_disk.c#7 - /usr/home/kenm/perforce4/kenm/FreeBSD-test2/sys/geom/geom_disk.c
*** /tmp/tmp.75357.20	Tue Jun 26 20:25:44 2012
--- /usr/home/kenm/perforce4/kenm/FreeBSD-test2/sys/geom/geom_disk.c	Tue Jun 26 20:25:29 2012
***************
*** 502,507 ****
--- 502,515 ----
  	struct g_disk_softc *sc;
  
  	sc = (struct g_disk_softc *)pp->geom->softc;
+ 
+ 	/*
+ 	 * If the softc is already NULL, then we've probably been through
+ 	 * g_disk_destroy already; there is nothing for us to do anyway.
+ 	 */
+ 	if (sc == NULL)
+ 		return;
+ 
  	dp = sc->dp;
  
  	if (dp->d_gone != NULL)
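The shape of the fix in the patch is an early return when the softc pointer has already been cleared. A minimal standalone sketch of that guard is below. Every type and function name in it is a toy stand-in invented for illustration (the real structures live in sys/geom and are not reproduced here), and the return value and the extra check on the disk pointer are additions so the behaviour can be observed in userland.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; not the real GEOM structures. */
struct toy_disk {
	void (*d_gone)(struct toy_disk *);
};

struct toy_softc {
	struct toy_disk *dp;
};

struct toy_geom {
	struct toy_softc *softc;
};

/* Counts callback invocations so the effect can be checked. */
static int toy_gone_calls;

static void
toy_gone_cb(struct toy_disk *dp)
{
	(void)dp;
	toy_gone_calls++;
}

/*
 * Follows the shape of the patched function: if the softc was already
 * cleared by an earlier destroy path, there is nothing to do.
 * Returns 1 if the d_gone callback ran, 0 otherwise (the real function
 * returns void; the return value here is an assumption for testing).
 */
static int
toy_providergone(struct toy_geom *gp)
{
	struct toy_softc *sc;

	assert(gp != NULL);
	sc = gp->softc;
	if (sc == NULL)
		return (0);
	if (sc->dp != NULL && sc->dp->d_gone != NULL) {
		sc->dp->d_gone(sc->dp);
		return (1);
	}
	return (0);
}
```

Without the NULL check, the first dereference of sc would fault in exactly the situation described in the thread, where the destroy path has already run.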
Re: minor GEOM disk API change coming
On Fri, Jun 22, 2012 at 19:50:01 +0300, Alexander Motin wrote:

Hi.

I understand the problem you are going to fix, and I think your patch should do it. What I don't much like is the addition of a new GEOM method. Right now GEOM doesn't need it, because all internal open/close operations and provider destructions there are protected by the topology SX lock. Unluckily, that lock doesn't cover g_wither_provider(), called by disk_gone() while holding the CAM SIM lock. If not for that SIM lock, it would be enough to just grab and drop the GEOM topology lock to ensure that no new open() calls will follow. An indirect way to do it could be to post a GEOM event that would drop the reference as soon as it is handled and can obtain the topology lock. Unluckily, it uses malloc() for event storage and can also be unreliable if called from under the SIM mutex lock. So it seems many things would be much easier if it were possible to drop the SIM lock inside the periph invalidate method, but for now that is unsafe.

That is not an objection, just some thoughts.

Yeah, there are things in CAM (and GEOM) that need to be cleaned up. I wouldn't have added a GEOM method if there were a reasonable way around it, but as you pointed out, there isn't right now.

I committed the patch, and plan to merge it to stable/9.

Ken
--
Kenneth Merry k...@freebsd.org
Re: minor GEOM disk API change coming
On Thu, Jun 21, 2012 at 19:53:03 +0400, Andrey V. Elsukov wrote:

On 21.06.2012 08:29, Kenneth D. Merry wrote:

Fix a bug which causes a panic in daopen(). The panic is caused by a da(4) instance going away while GEOM is still probing it.

In this case, the GEOM disk class instance has been created by disk_create(), and the taste of the disk is queued in the GEOM event queue.

While that event is queued, the da(4) instance goes away. When the open call comes into the da(4) driver, it dereferences the freed (but non-NULL) peripheral pointer provided by GEOM, which results in a panic.

I think this situation is very specific to the GEOM_DISK class, and this callback will be less useful for other classes. Can't g_cancel_event() help you prevent tasting?

Calling g_cancel_event(), for instance from disk_gone(), would not completely close the race condition. It can't cancel an event that is already in progress, and it is possible for the peripheral to go away while the event is marked in progress but before the taste gets far enough into daopen() to acquire a reference to the peripheral.

Ken
--
Kenneth Merry k...@freebsd.org
Re: minor GEOM disk API change coming
On Thu, Jun 21, 2012 at 23:58:10 +0400, Andrey V. Elsukov wrote:

On 21.06.2012 20:48, Kenneth D. Merry wrote:

In this case, the GEOM disk class instance has been created by disk_create(), and the taste of the disk is queued in the GEOM event queue.

While that event is queued, the da(4) instance goes away. When the open call comes into the da(4) driver, it dereferences the freed (but non-NULL) peripheral pointer provided by GEOM, which results in a panic.

I think this situation is very specific to the GEOM_DISK class, and this callback will be less useful for other classes. Can't g_cancel_event() help you prevent tasting?

Calling g_cancel_event(), for instance from disk_gone(), would not completely close the race condition. It can't cancel an event that is already in progress, and it is possible for the peripheral to go away while the event is marked in progress but before the taste gets far enough into daopen() to acquire a reference to the peripheral.

If I understand your patch correctly, you acquire a reference to the periph and release it when g_destroy_provider has finished. What if you queue some custom event from disk_gone() that will call cddiskgonecb()? Does that close the race? This event will be executed after the taste completes.

That still would not close the race. It would still be possible for another context to come along and open the device at any point.

Ken
--
Kenneth Merry k...@freebsd.org
minor GEOM disk API change coming
Hi folks,

I have attached some patches that fix an object lifetime issue between CAM and GEOM.

Fixing the bug required adding a callback to the GEOM disk code, and adding a callback that a GEOM class can register to get notified when a provider is destroyed.

The probable commit message is below. If I don't hear any objections, I will commit it on Friday, June 22nd.

Fix a bug which causes a panic in daopen(). The panic is caused by a da(4) instance going away while GEOM is still probing it.

In this case, the GEOM disk class instance has been created by disk_create(), and the taste of the disk is queued in the GEOM event queue.

While that event is queued, the da(4) instance goes away. When the open call comes into the da(4) driver, it dereferences the freed (but non-NULL) peripheral pointer provided by GEOM, which results in a panic.

The solution is to add a callback to the GEOM disk code that is called when all of its resources are cleaned up. This is implemented inside GEOM by adding an optional callback that is called when all consumers have detached from a provider, and the provider is about to be deleted.

scsi_cd.c,
scsi_da.c:
    In the register routine for the cd(4) and da(4) routines, acquire a reference to the CAM peripheral instance just before we call disk_create().

    Use the new GEOM disk d_gone() callback to register a callback (dadiskgonecb()/cddiskgonecb()) that decrements the peripheral reference count once GEOM has finished cleaning up its resources.

    In the cd(4) driver, clean up open and close behavior slightly. GEOM makes sure we only get one open() and one close() call, so there is no need to set an open flag and decrement the reference count if we are not the first open.

    In the cd(4) driver, use cam_periph_release_locked() in a couple of error scenarios to avoid extra mutex calls.

geom.h:
    Add a new, optional, providergone callback that is called when a provider is about to be deleted.

geom_disk.h:
    Add a new d_gone() callback to the GEOM disk interface.

    Bump the DISK_VERSION to version 2. This probably should have been done after a couple of previous changes, especially the addition of the d_getattr() callback.

geom_disk.c:
    Add a providergone callback for the disk class, g_disk_providergone(), that calls the user's d_gone() callback if it exists.

    Bump the DISK_VERSION to 2.

geom_subr.c:
    In g_destroy_provider(), call the providergone callback if it has been provided.

    In g_new_geomf(), propagate the class's providergone callback to the new geom instance.

disk.9:
    Update the disk(9) man page to include information on the new d_gone() callback, as well as the previously added d_getattr() callback, d_descr field, and HBA PCI ID fields.

Ken
--
Kenneth Merry k...@freebsd.org

//depot/users/kenm/FreeBSD-test/share/man/man9/disk.9#1 - /usr/home/kenm/perforce4/kenm/FreeBSD-test/share/man/man9/disk.9
*** /tmp/tmp.81866.21	Wed Jun 20 22:19:20 2012
--- /usr/home/kenm/perforce4/kenm/FreeBSD-test/share/man/man9/disk.9	Wed Jun 20 21:30:45 2012
***************
*** 145,150 ****
--- 145,160 ----
  .Xr dumpon 8 ,
  this function is invoked from a very restricted system state after a kernel
  panic to record a copy of the system RAM to the disk.
+ .It Vt disk_getattr_t * Va d_getattr
+ Optional: if this method is provided, it gives the disk driver the
+ opportunity to override the default GEOM response to BIO_GETATTR requests.
+ This function should return -1 if the attribute is not handled, 0 if the
+ attribute is handled, or an errno to be passed to g_io_deliver().
+ .It Vt disk_gone_t * Va d_gone
+ Optional: if this method is provided, it will be called after disk_gone()
+ is called, once GEOM has finished its cleanup process.
+ Once this callback is called, it is safe for the disk driver to free all of
+ its resources, as it will not be receiving further calls from GEOM.
  .El
  .Ss Mandatory Media Properties
  The following fields identify the size
Re: LSI supported mps(4) driver available
On Tue, Mar 27, 2012 at 23:50:31 +1030, Matt Thyer wrote:

On 26 March 2012 23:55, Gary Palmer gpal...@freebsd.org wrote:

On Mon, Mar 26, 2012 at 08:05:59PM +1030, Matt Thyer wrote:

On Mar 26, 2012 3:43 AM, Garrett Cooper yaneg...@gmail.com wrote:

On Sun, Mar 25, 2012 at 5:16 AM, Matt Thyer matt.th...@gmail.com wrote:

Has this driver been MFC'd to 8-STABLE yet?

I'm asking because I updated my NAS on the 4th of March from 8-STABLE r225723 to r232477 and am now seeing 157,000 interrupts per second on irq 16 where my SuperMicro AOC-USAS2-L8i resides (this card uses the LSI SAS2008 chip).

[snip]

After encountering this problem I updated my firmware from phase 7 to phase 11 but this did not fix things.

My question is: Is the LSI driver even in 8-STABLE yet? If not, I'll upgrade to 9-STABLE to get the new driver. If it is, then I want to downgrade to just before it came in to see if this high interrupt rate problem is fixed.

I'm no expert in svn, however:

http://svnweb.freebsd.org/base?view=revision&revision=230922

would appear to suggest that the new driver is in 8-Stable

Gary

It's painful to take this system back to r230921 due to intolerance for downtime from its users, so I'd like to investigate the cause of the problem and try patches/sysctls/whatever first.

The drives I'm using are 7 x WDC WD20EARS-00M (3 are AB50, 4 are AB51) and 1 x WD20EARX-00P AB51. The WD20EARX-00P AB51 is a SATA 3 (6 Gbps) drive but the others are all SATA 2 (3 Gbps). I know the driver doesn't like mixed speeds in IR mode but I'm flashed with IT firmware as ZFS is doing my RAID (raidz2).

I was having problems with the WD20EARX-00P AB51 drive being faulted by ZFS until I updated the firmware to 11 and now ZFS is happy (I've also done a full extended drive SMART test and the drive is fine).

So what do people suggest (before reversion to r230921)?

If you're going to prove that it's the new LSI driver, you will probably have to go back to the old driver.

You don't have to back out your entire tree, you can just back out the driver itself if you have an SVN tree.

You can go into sys/dev/mps and do:

    svn update -r 230714

And then edit sys/conf/files and comment out these three lines:

    dev/mps/mps_config.c          optional mps
    dev/mps/mps_mapping.c         optional mps
    dev/mps/mps_sas_lsi.c         optional mps

Then you should be able to rebuild your kernel with the old driver and see if the problem occurs again.

Ken
--
Kenneth Merry k...@freebsd.org
Re: LSI supported mps(4) driver available
On Tue, Feb 07, 2012 at 21:00:28 +0530, Desai, Kashyap wrote:

Can you reproduce the issue with the changes mentioned below?

In mps.c:

    mps_get_tunables(struct mps_softc *sc)
    {
            char tmpstr[80];

            /* XXX default to some debugging for now */
            sc->mps_debug = MPS_FAULT;

Instead of the above line, make it:

            sc->mps_debug = 0xd;

You can also put the following in /boot/loader.conf:

    hw.mps.debug_level=0xd

Ken
--
Kenneth Merry k...@freebsd.org
Re: LSI supported mps(4) driver available
On Wed, Jan 25, 2012 at 20:47:37 -0800, Dennis Glatting wrote:

On Fri, 2012-01-20 at 13:44 -0700, Kenneth D. Merry wrote:

The LSI-supported version of the mps(4) driver that supports their 6Gb SAS HBAs as well as WarpDrive controllers, is available here:

http://people.freebsd.org/~ken/lsi/mps_lsi.20120120.1.txt

I plan to check it in to head next week, and then MFC it into stable/9 a week after that most likely.

Please test it out and let me know if you run into any problems.

In addition to supporting WarpDrive, the driver also supports Integrated RAID.

Thanks to LSI for doing the work on this driver!

Does this include the SAS2008 series chips? I have two systems, one a Tyan FT48-B8812 with a S8812 MB and Interlagos chips, where I am interested in using a driver under 9.0 amd64.

Yes. The driver in 9.0 supports the 2008 as well.

Ken
--
Kenneth Merry k...@freebsd.org
Re: Firewire disk/tape access stopped working after recent CAM commit
On Tue, Jan 24, 2012 at 00:03:56 -0600, Richard Todd wrote:

On Mon, Jan 23, 2012 at 11:16:05AM -0700, Kenneth D. Merry wrote:

If you can, please try the attached patch and see if it has any impact on the problem. There is a bug in that commit in that we shouldn't be invalidating all LUNs on a target when we get a status of CAM_DEV_NOT_THERE.

Just applied the patch, built new kernel, and rebooted, and all the FW drives are showing up now. Thanks!

Great!

It may be that we need to do a more thorough audit of how various SIM drivers are using the CAM_DEV_NOT_THERE status.

So I take it the layers for the different hardware (SCSI, FW, USB, ATA/AHCI) are handling this status differently, so that's why this bug only showed up on the Firewire buses but not on ATA/AHCI, USB, or (on my other machine) SCSI buses?

Yes. Some drivers report a selection timeout, some report that the device isn't there, and they may be using those status values in different situations.

Ken
--
Kenneth Merry k...@freebsd.org
Re: Firewire disk/tape access stopped working after recent CAM commit
On Sun, Jan 22, 2012 at 20:52:38 -0600, Richard Todd wrote:

Hi. I tried upgrading my amd64 10-CURRENT box to the most recent -CURRENT code and found that the new kernel couldn't find my two disks and tape drive that are on a Firewire bus. All the USB and AHCI-attached hardware still showed up okay, it's just the Firewire stuff that failed to show up properly on boot. Spent today doing binary search to find the responsible commit and it looks to be this one:

r23 | ken | 2012-01-11 18:41:48 -0600 (Wed, 11 Jan 2012) | 72 lines

Fix a race condition in CAM peripheral free handling, locking in the CAM XPT bus traversal code, and a number of other periph level issues.

Not sure what in this commit triggers the problem, or why it just hits Firewire and not the rest of the system. I've built kernels both right before and right after the r23 commit, with CAM debugging turned on real high on the firewire bus in question, bus 0 (hardwired to that number in device.hints, if that matters)

    options CAMDEBUG
    options CAM_DEBUG_BUS=0
    options CAM_DEBUG_TARGET=-1
    options CAM_DEBUG_LUN=-1
    options CAM_DEBUG_FLAGS=CAM_DEBUG_INFO|CAM_DEBUG_TRACE|CAM_DEBUG_CDB

and got dmesgs of both the bad (r23) and good (pre-r23) kernels, which I've put online at http://ln.servalan.com/rmtodd/bug1/dmesg.bad and http://ln.servalan.com/rmtodd/bug1/dmesg.good, respectively. They're a bit lengthy, what with all that debug info. Grepping out the info for one of the targets (disk 0, sbp0:0:0:0) and just looking at the lines for that one, we see that the good kernel does a lot more with that target, starting with the (noperiph:sbp0:0:0:0): xpt_compile_path bit, that the bad kernel doesn't do, as seen in the diff below. Not sure what's going on here, but if anyone has suggestions on more things I can test/debug code I can add to track this down further, let me know.

Thanks for testing this out, and for sending all of the debugging output!

If you can, please try the attached patch and see if it has any impact on the problem. There is a bug in that commit in that we shouldn't be invalidating all LUNs on a target when we get a status of CAM_DEV_NOT_THERE.

It may be that we need to do a more thorough audit of how various SIM drivers are using the CAM_DEV_NOT_THERE status.

Thanks,

Ken
--
Kenneth Merry k...@freebsd.org

//depot/users/kenm/FreeBSD-test2/sys/cam/cam_periph.c#7 - /usr/home/kenm/perforce4/kenm/FreeBSD-test2/sys/cam/cam_periph.c
*** /tmp/tmp.87992.13	Mon Jan 23 11:11:36 2012
--- /usr/home/kenm/perforce4/kenm/FreeBSD-test2/sys/cam/cam_periph.c	Mon Jan 23 10:53:13 2012
***************
*** 1864,1876 ****
  	case CAM_DEV_NOT_THERE:
  	{
  		struct cam_path *newpath;
  
  		error = ENXIO;
  		/* Should we do more if we can't create the path?? */
  		if (xpt_create_path(&newpath, periph,
  				    xpt_path_path_id(ccb->ccb_h.path),
  				    xpt_path_target_id(ccb->ccb_h.path),
! 				    CAM_LUN_WILDCARD) != CAM_REQ_CMP)
  			break;
  
  		/*
--- 1864,1889 ----
  	case CAM_DEV_NOT_THERE:
  	{
  		struct cam_path *newpath;
+ 		lun_id_t lun_id;
  
  		error = ENXIO;
+ 
+ 		/*
+ 		 * For a selection timeout, we consider all of the LUNs on
+ 		 * the target to be gone.  If the status is CAM_DEV_NOT_THERE,
+ 		 * then we only get rid of the device(s) specified by the
+ 		 * path in the original CCB.
+ 		 */
+ 		if (status == CAM_DEV_NOT_THERE)
+ 			lun_id = xpt_path_lun_id(ccb->ccb_h.path);
+ 		else
+ 			lun_id = CAM_LUN_WILDCARD;
+ 
  		/* Should we do more if we can't create the path?? */
  		if (xpt_create_path(&newpath, periph,
  				    xpt_path_path_id(ccb->ccb_h.path),
  				    xpt_path_target_id(ccb->ccb_h.path),
! 				    lun_id) != CAM_REQ_CMP)
  			break;
  
  		/*
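The decision the patch adds can be isolated into a few lines: a "device not there" status only affects the LUN named in the original CCB, while a selection timeout still takes out every LUN on the target. The sketch below restates that rule in standalone form. The type, constant and enum names are toy stand-ins chosen for illustration; the real definitions are in the CAM headers and are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins; not the real CAM types or constants. */
typedef uint32_t toy_lun_id_t;
#define TOY_LUN_WILDCARD ((toy_lun_id_t)~0U)

enum toy_status {
	TOY_SEL_TIMEOUT,	/* nothing answered selection on the target */
	TOY_DEV_NOT_THERE	/* this particular device is absent */
};

/*
 * Pick which LUN(s) to invalidate for a failed command.
 * A selection timeout covers the whole target (wildcard); a
 * device-not-there status covers only the LUN from the failed CCB.
 */
static toy_lun_id_t
toy_lun_to_invalidate(enum toy_status status, toy_lun_id_t ccb_lun)
{
	assert(status == TOY_SEL_TIMEOUT || status == TOY_DEV_NOT_THERE);
	if (status == TOY_DEV_NOT_THERE)
		return (ccb_lun);
	return (TOY_LUN_WILDCARD);
}
```

Whether a given SIM driver reports one status or the other for a missing LUN is exactly the audit question raised in the message, so the sketch says nothing about which drivers fall on which side.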
LSI supported mps(4) driver available
The LSI-supported version of the mps(4) driver that supports their 6Gb SAS HBAs as well as WarpDrive controllers, is available here:

http://people.freebsd.org/~ken/lsi/mps_lsi.20120120.1.txt

I plan to check it in to head next week, and then MFC it into stable/9 a week after that most likely.

Please test it out and let me know if you run into any problems.

In addition to supporting WarpDrive, the driver also supports Integrated RAID.

Thanks to LSI for doing the work on this driver!

I have added a number of other infrastructure changes that are necessary for the driver, and here is a brief summary:

 - A new Advanced Information buffer is now added to the EDT for drives that support READ CAPACITY (16). The da(4) driver updates this buffer when it grabs new read capacity data from a drive.

 - The mps(4) driver will look for Advanced Information state change async events, and updates its table of drives with protection information turned on accordingly.

 - The size of struct scsi_read_capacity_data_long has been bumped up to the amount specified in the latest SBC-3 draft. The hope is to avoid some future structure size bumps with that change. The API for scsi_read_capacity_16() has been changed to add a length argument. Hopefully this will future-proof it somewhat.

 - __FreeBSD_version bumped for the addition of the Advanced Information buffer with the read capacity information. The mps(4) driver has a kludgy way of getting the information on versions of FreeBSD without this change.

I believe that the CAM API changes are mild enough and beneficial enough for a merge into stable/9, but they are intertwined with the unmap changes in the da(4) driver, so those changes will have to go back to stable/9 as well in order to MFC the full set of changes.

Otherwise it'll just be the driver that gets merged into stable/9, and it'll use the kludgy method of getting the read capacity data for each drive.

A couple of notes about issues with this driver:

 - Unlike the current mps(4) driver, it probes sequentially. If you have a lot of drives in your system, it will take a while to probe them all.

 - You may see warning messages like this:

    _mapping_add_new_device: failed to add the device with handle 0x0019 to persistent table because there is no free space available
    _mapping_add_new_device: failed to add the device with handle 0x001a to persistent table because there is no free space available

 - The driver is not endian safe. (It assumes a little endian machine.) This is not new, the driver in the tree has the same issue.

The LSI folks know about these issues. The driver has passed their testing process.

Many thanks to LSI for going through the effort to support FreeBSD.

Ken
--
Kenneth Merry k...@freebsd.org
Re: LSI supported mps(4) driver available
On Fri, Jan 20, 2012 at 12:53:04 -0800, Freddie Cash wrote:

On Fri, Jan 20, 2012 at 12:44 PM, Kenneth D. Merry k...@freebsd.org wrote:

The LSI-supported version of the mps(4) driver that supports their 6Gb SAS HBAs as well as WarpDrive controllers, is available here:

Just to clarify, this will replace the existing mps(4) driver in FreeBSD 10-CURRENT and 9-STABLE?

That is correct.

So there won't be mps(4) (FreeBSD driver) and mpslsi(4) (LSI driver) anymore? Just mps(4)?

Right. Just mps(4), which will be the LSI driver.

Ken
--
Kenneth Merry k...@freebsd.org
Re: LSI supported mps(4) driver available
On Fri, Jan 20, 2012 at 23:14:20 -, Steven Hartland wrote:
> - Original Message -
> From: Kenneth D. Merry k...@freebsd.org
> To: freebsd-s...@freebsd.org; freebsd-current@freebsd.org
> Sent: Friday, January 20, 2012 8:44 PM
> Subject: LSI supported mps(4) driver available
>
> > The LSI-supported version of the mps(4) driver that supports their 6Gb
> > SAS HBAs as well as WarpDrive controllers, is available here:
> >
> > http://people.freebsd.org/~ken/lsi/mps_lsi.20120120.1.txt
> >
> > I plan to check it in to head next week, and then MFC it into stable/9
> > a week after that most likely.
>
> Great to see this being done, thanks to everyone!
>
> Be even better to see this MFC'ed to 8.x as well if all goes well. Do you
> think this will be possible?

Yes, that should be doable as well. It's unlikely that all of the CAM
changes will get merged back, but the driver itself shouldn't be a problem.

Ken
--
Kenneth Merry
k...@freebsd.org
Re: ctlstat not building with clang
On Thu, Jan 12, 2012 at 14:59:11 -0600, Dan McGregor wrote:
> Building world with clang now (as of r229997) no longer compiles because
> ctlstat was imported into the tree. The error is:
>
> clang -O2 -pipe -I/usr/src/usr.bin/ctlstat/../../sys -std=gnu99 -fstack-protector -Wsystem-headers -Werror -Wall -Wno-format-y2k -W -Wno-unused-parameter -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wreturn-type -Wcast-qual -Wwrite-strings -Wswitch -Wshadow -Wunused-parameter -Wcast-align -Wchar-subscripts -Winline -Wnested-externs -Wredundant-decls -Wold-style-definition -Wno-pointer-sign -c /usr/src/usr.bin/ctlstat/ctlstat.c
> /usr/src/usr.bin/ctlstat/ctlstat.c:149:35: error: format string is not a string literal (potentially insecure) [-Werror,-Wformat-security]
>         fprintf(error ? stderr : stdout, ctlstat_usage);
>                                          ^
> 1 error generated.
> *** Error code 1
>
> Stop in /usr/src/usr.bin/ctlstat
>
> How do people feel about the attached patch that turns a call to fprintf
> to fputs?

Looks fine, I just committed it.

Thanks,

Ken
--
Kenneth Merry
k...@freebsd.org
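For readers who have not hit this warning before, the change discussed above can be sketched roughly as follows. This is not the committed patch, which is not shown in this thread; usage_text and print_usage() are hypothetical stand-ins for the ctlstat usage string and the function that prints it.

```c
#include <stdio.h>

/* Hypothetical stand-in for ctlstat's usage string. */
static const char usage_text[] = "Usage: example [-h]\n";

/*
 * Print the usage text to the given stream.
 *
 * The rejected form was: fprintf(fp, usage_text);
 * There the string is used as the format, so any '%' in it would be
 * parsed as a conversion, which is what -Wformat-security complains
 * about. fputs() writes the string verbatim and takes no format.
 */
int
print_usage(FILE *fp)
{
	return (fputs(usage_text, fp));
}
```

An equivalent fix that keeps fprintf() is fprintf(fp, "%s", usage_text); fputs() simply avoids the format machinery altogether.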
Re: CAM Target Layer available
On Wed, Jan 04, 2012 at 21:53:11 -0700, Kenneth D. Merry wrote:
> The CAM Target Layer (CTL) is now available for testing. I am planning to
> commit it to head next week, barring any major objections.
>
> CTL is a disk and processor device emulation subsystem originally written
> for Copan Systems under Linux starting in 2003. It has been shipping in
> Copan (now SGI) products since 2005.
>
> It was ported to FreeBSD in 2008, and thanks to an agreement between SGI
> (who acquired Copan's assets in 2010) and Spectra Logic in 2010, CTL is
> available under a BSD-style license. The intent behind the agreement was
> that Spectra would work to get CTL into the FreeBSD tree.
>
> The patches are against FreeBSD/head as of SVN change 229516 and are
> located here:
>
> http://people.freebsd.org/~ken/ctl/ctl_diffs.20120104.4.txt.gz
>
> The code is not perfect (few pieces of software are), but is in good
> shape from a functional standpoint. My intent is to get it out there for
> other folks to use, and perhaps help with improvements.
>
> There are a few other CAM changes included with these diffs, some of
> which will be committed separately from CTL, some concurrently. This is a
> quick summary:
>
>  - Fix a panic in the da(4) driver when a drive disappears on boot.
>  - Fix locking in the CAM EDT traversal code.
>  - Add an optional sysctl/tunable (disabled by default) to suppress
>    duplicate devices. This most frequently shows up with dual ported SAS
>    drives.
>  - Add some very basic error injection into the da(4) driver.
>  - Bump the length field in the SCSI INQUIRY CDB to 2 bytes to line up
>    with more recent SCSI specs.
>
> CTL Features:
>
>  - Disk and processor device emulation.
>  - Tagged queueing
>  - SCSI task attribute support (ordered, head of queue, simple tags)
>  - SCSI implicit command ordering support. (e.g. if a read follows a mode
>    select, the read will be blocked until the mode select completes.)
>  - Full task management support (abort, LUN reset, target reset, etc.)
>  - Support for multiple ports
>  - Support for multiple simultaneous initiators
>  - Support for multiple simultaneous backing stores
>  - Persistent reservation support
>  - Mode sense/select support
>  - Error injection support
>  - High Availability support (1)
>  - All I/O handled in-kernel, no userland context switch overhead.
>
> (1) HA Support is just an API stub, and needs much more to be fully
>     functional. See the to-do list below.
>
> Configuring and Running CTL:
> ===========================
>
>  - After applying the CTL patchset to your tree, build world and install
>    it on your target system.
>  - Add 'device ctl' to your kernel configuration file.
>  - If you're running with an 8Gb or 4Gb Qlogic FC board, add
>    'options ISP_TARGET_MODE' to your kernel config file. 'device ispfw'
>    or loading the ispfw module is also recommended.
>  - Rebuild and install a new kernel.
>  - Reboot with the new kernel.
>  - To add a LUN with the RAM disk backend:
>
>        ctladm create -b ramdisk -s 10485760
>        ctladm port -o on
>
>  - You should now see the CTL disk LUN through camcontrol devlist:
>
>        scbus6 on ctl2cam0 bus 0:
>        <FREEBSD CTLDISK 0001>    at scbus6 target 1 lun 0 (da24,pass32)
>        <>                        at scbus6 target -1 lun -1 ()
>
>    This is visible through the CTL CAM SIM. This allows using CTL without
>    any physical hardware. You should be able to issue any normal SCSI
>    commands to the device via the pass(4)/da(4) devices.
>
>    If any target-capable HBAs are in the system (e.g. isp(4)), and have
>    target mode enabled, you should now also be able to see the CTL LUNs
>    via that target interface.
>
>    Note that all CTL LUNs are presented to all frontends. There is no LUN
>    masking, or separate, per-port configuration.
>
>  - Note that the ramdisk backend is a fake ramdisk. That is, it is backed
>    by a small amount of RAM that is used for all I/O requests. This is
>    useful for performance testing, but not for any data integrity tests.
>
>  - To add a LUN with the block/file backend:
>
>        truncate -s +1T myfile
>        ctladm create -b block -o file=myfile
>        ctladm port -o on
>
>  - You can also see a list of LUNs and their backends like this:
>
>        # ctladm devlist
>        LUN Backend  Size (Blocks)   BS Serial Number  Device ID
>          0 block       2147483648  512 MYSERIAL   0   MYDEVID   0
>          1 block       2147483648  512 MYSERIAL   1   MYDEVID   1
>          2 block       2147483648  512 MYSERIAL   2   MYDEVID   2
>          3 block       2147483648  512 MYSERIAL   3   MYDEVID   3
>          4 block       2147483648  512 MYSERIAL   4   MYDEVID   4
>          5 block       2147483648  512 MYSERIAL   5   MYDEVID   5
>          6 block       2147483648  512 MYSERIAL   6   MYDEVID   6
>          7 block       2147483648  512 MYSERIAL   7   MYDEVID   7
>          8 block       2147483648  512 MYSERIAL   8
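The sizes in the commands and the devlist output above can be cross-checked with a little arithmetic. The sketch below assumes that the size given to 'ctladm create -s' is in bytes and that the "Size (Blocks)" column counts blocks of the size shown in the BS column; the function names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: total LUN size in bytes from block count and size. */
uint64_t
lun_size_bytes(uint64_t blocks, uint32_t block_size)
{
	return (blocks * (uint64_t)block_size);
}

/* Hypothetical helper: number of whole blocks that fit in a byte size. */
uint64_t
lun_size_blocks(uint64_t bytes, uint32_t block_size)
{
	return (bytes / block_size);
}
```

With those assumptions, 2147483648 blocks of 512 bytes is 2^40 bytes, which matches the 1T file created with truncate, and the 10485760-byte ramdisk LUN would be 10 MiB, or 20480 blocks of 512 bytes.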
Re: SCSI descriptor sense changes, testing needed
This has been committed to head, and the plan is to get it into stable/9 in
time for 9.0.

Please let me know if you run into any problems with the changes.

Thanks,

Ken

On Fri, Sep 30, 2011 at 23:19:14 -0600, Kenneth D. Merry wrote:
> I have attached a new version of the patches, with a number of changes.
>
> One issue that has cropped up is that the previous sense code and my new
> descriptor sense changes never paid any attention to the actual length of
> the sense data returned by the controller. I have changed all of the
> error recovery code and sense printing code to honor the sense data
> length in the CAM CCB.
>
> One other problem related to that is that many controller drivers don't
> set the sense residual field in struct ccb_scsiio properly, or don't set
> it at all. This patch includes changes to the isp, mps, mpt, umass, and
> ciss drivers to set the sense_resid field properly. There are lots of
> other drivers in the system, however, that haven't been audited, and may
> or may not set the sense residual correctly.
>
> I also fixed an issue reported by Fabian Keil that showed up with the
> ahci driver. In reverting a change I have in my local tree to switch to a
> 2 byte length field in the SCSI inquiry CDB, I accidentally shortened the
> CDB to 5 bytes. Oops.
>
> I'd really appreciate more feedback; Fabian is the only person to report
> testing the previous patch.
>
> Thanks,
>
> Ken
>
> On Thu, Sep 22, 2011 at 13:33:05 -0600, Kenneth D. Merry wrote:
> > I have attached a set of patches against head that implement SCSI
> > descriptor sense support for CAM.
> >
> > Descriptor sense is a new sense (SCSI error) format introduced in the
> > SPC-3 spec in 2006. FreeBSD doesn't currently support it.
> >
> > Seagate's new 3TB SAS drives come with descriptor sense enabled by
> > default, and it's possible that other newer drives do as well. Because
> > all the sense key, additional sense code, and additional sense code
> > qualifier fields are in different places, the CAM error recovery code
> > will not do the right thing when it gets descriptor sense.
> >
> > These patches do bump up the size of struct scsi_sense_data, and so I
> > have incremented CAM_VERSION as well. I have discussed this with re@,
> > and it looks like we'll be putting the changes in before 9.0, so it
> > ships with support for newer SCSI devices.
> >
> > A number of things have changed in these patches, but in particular,
> > it would be good to test the following:
> >
> >  - The sa(4) (SCSI tape) driver. The residual handling code, which
> >    looks at the sense data, has changed.
> >  - The Playstation 3 CDROM driver.
> >  - Firewire target mode.
> >  - umass devices with the NO_INQUIRY_EVPD quirk.
> >
> > Also, please let me know if you see any anomalies with the sense
> > printing code. In the common cases the output should look identical to
> > the old code, but in some cases it will be a little different. e.g.:
> >
> > # camcontrol inquiry da40 -v
> > pass47: <SEAGATE ST33000650SS 0002> Fixed Direct Access SCSI-6 device
> > pass47: Serial Number 9XK0GAJ7S125XDNU
> > pass47: 300.000MB/s transfers, Command Queueing Enabled
> >
> > (Seagate 3TB drive)
> >
> > # camcontrol modepage da40 -m 10 |grep D_SENSE
> > D_SENSE: 1
> >
> > (Descriptor sense is enabled)
> >
> > # camcontrol modepage da40 -m 15 -v
> > (pass47:mps1:0:47:0): MODE SENSE(6). CDB: 1a 0 4f 0 ff 0
> > (pass47:mps1:0:47:0): CAM status: SCSI Status Error
> > (pass47:mps1:0:47:0): SCSI status: Check Condition
> > (pass47:mps1:0:47:0): SCSI sense: ILLEGAL REQUEST asc:24,0 (Invalid field in CDB)
> > (pass47:mps1:0:47:0): Field Replaceable Unit: 1
> > (pass47:mps1:0:47:0): Command byte 2 bit 5 is invalid
> > (pass47:mps1:0:47:0): Descriptor 0x80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > camcontrol: error sending mode sense command
> >
> > (The FRU and Sense Key Specific entries are on separate lines, and a
> > vendor-specific sense descriptor is printed out in hex format.)
> >
> > Anyway, I'd appreciate any testing and feedback on these changes. As I
> > said, they will probably be in 9.0, so if there are any issues it would
> > be better to find them now. :)
> >
> > Thanks,
> >
> > Ken
> > --
> > Kenneth Merry
> > k...@freebsd.org
>
> --
> Kenneth Merry
> k...@freebsd.org

--
Kenneth Merry
k...@freebsd.org
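To make the "fields are in different places" point concrete, here is a minimal sketch of pulling the sense key, ASC and ASCQ out of either format while honoring the number of valid sense bytes. It is not the CAM code from the patches; struct sense_info, sense_valid_len() and sense_decode() are hypothetical names, and the byte offsets are the ones SPC-3 gives for fixed format (response codes 0x70/0x71) and descriptor format (0x72/0x73).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result structure for this sketch. */
struct sense_info {
	int	descriptor;	/* 1 if descriptor format, 0 if fixed */
	uint8_t	key;		/* sense key */
	uint8_t	asc;		/* additional sense code */
	uint8_t	ascq;		/* additional sense code qualifier */
};

/* Bytes actually returned: allocated length minus the residual. */
size_t
sense_valid_len(size_t sense_len, size_t sense_resid)
{
	return (sense_resid > sense_len ? 0 : sense_len - sense_resid);
}

/* Returns 0 on success, -1 on an unknown format or too little data. */
int
sense_decode(const uint8_t *buf, size_t valid_len, struct sense_info *out)
{
	if (valid_len < 1)
		return (-1);

	switch (buf[0] & 0x7f) {	/* response code */
	case 0x70:			/* fixed, current error */
	case 0x71:			/* fixed, deferred error */
		if (valid_len < 14)	/* this sketch requires ASC/ASCQ */
			return (-1);
		out->descriptor = 0;
		out->key = buf[2] & 0x0f;
		out->asc = buf[12];
		out->ascq = buf[13];
		return (0);
	case 0x72:			/* descriptor, current error */
	case 0x73:			/* descriptor, deferred error */
		if (valid_len < 4)
			return (-1);
		out->descriptor = 1;
		out->key = buf[1] & 0x0f;
		out->asc = buf[2];
		out->ascq = buf[3];
		return (0);
	default:
		return (-1);
	}
}
```

Code that only knows the fixed layout would read byte 2 of a descriptor-format buffer as the sense key, when that byte actually holds the ASC, which is why error recovery goes wrong without format-aware parsing. Rejecting short fixed-format data outright is a simplification made for this sketch.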
Re: SCSI descriptor sense changes, testing needed
On Tue, Sep 27, 2011 at 21:46:03 +0200, Fabian Keil wrote:
> Kenneth D. Merry k...@freebsd.org wrote:
> > On Sat, Sep 24, 2011 at 21:27:22 +0200, Fabian Keil wrote:
> > > Kenneth D. Merry k...@freebsd.org wrote:
> > > > I have attached a set of patches against head that implement SCSI
> > > > descriptor sense support for CAM.
> > > >
> > > > Anyway, I'd appreciate any testing and feedback on these changes.
> > > > As I said, they will probably be in 9.0, so if there are any issues
> > > > it would be better to find them now. :)
> > >
> > > I've been using the patch on a ThinkPad R500 since yesterday and just
> > > reverted it today again to get my kernel closer to HEAD before
> > > looking into some (probably unrelated) panics.
> > >
> > > I didn't notice it while using the patch, but it looks like the
> > > kernel wasn't able to pick up cd0 anymore:
> >
> > Hmm. I don't think any of the changes would have caused this, but
> > evidently something did...
> >
> > Let's see if we can debug it... I have attached a patch to add some
> > debugging output, and I see at least one interesting thing in the logs
> > below.
> >
> > Can you re-apply the descriptor sense patch, and then try the attached
> > debugging patch as well?
>
> Sure.

I believe this is fixed with my latest set of patches. Can you try them and
let me know?

Thanks,

Ken
--
Kenneth Merry
k...@freebsd.org
Re: SCSI descriptor sense changes, testing needed
On Sat, Sep 24, 2011 at 21:27:22 +0200, Fabian Keil wrote:
> Kenneth D. Merry k...@freebsd.org wrote:
> > I have attached a set of patches against head that implement SCSI
> > descriptor sense support for CAM.
> >
> > Anyway, I'd appreciate any testing and feedback on these changes. As I
> > said, they will probably be in 9.0, so if there are any issues it would
> > be better to find them now. :)
>
> I've been using the patch on a ThinkPad R500 since yesterday and just
> reverted it today again to get my kernel closer to HEAD before looking
> into some (probably unrelated) panics.
>
> I didn't notice it while using the patch, but it looks like the kernel
> wasn't able to pick up cd0 anymore:

Hmm. I don't think any of the changes would have caused this, but evidently
something did...

Let's see if we can debug it... I have attached a patch to add some
debugging output, and I see at least one interesting thing in the logs
below.

Can you re-apply the descriptor sense patch, and then try the attached
debugging patch as well?

> fk@r500 ~ $grep -h "new dis" /var/log/messages /var/log/messages.[123] | sort
> Sep 21 23:40:23 r500 kernel: GEOM: new disk da0
> Sep 21 23:40:30 r500 kernel: GEOM: new disk da1
> Sep 21 23:45:21 r500 kernel: GEOM: new disk ada0
> Sep 21 23:45:21 r500 kernel: GEOM: new disk cd0
> Sep 21 23:45:21 r500 kernel: GEOM: new disk da0
> Sep 21 23:45:21 r500 kernel: GEOM: new disk da1
> Sep 21 23:52:44 r500 kernel: GEOM: new disk ada0
> Sep 21 23:52:44 r500 kernel: GEOM: new disk cd0
> Sep 21 23:53:14 r500 kernel: GEOM: new disk da0
> Sep 21 23:56:23 r500 kernel: GEOM: new disk da1
> Sep 22 21:14:17 r500 kernel: GEOM: new disk ada0
> Sep 22 21:14:17 r500 kernel: GEOM: new disk cd0
> Sep 22 22:10:20 r500 kernel: GEOM: new disk da0
> [patch applied]
> Sep 22 23:29:45 r500 kernel: GEOM: new disk da0
> Sep 23 14:38:31 r500 kernel: GEOM: new disk ada0
> Sep 23 17:19:40 r500 kernel: GEOM: new disk da0
> Sep 23 19:20:21 r500 kernel: GEOM: new disk da0
> Sep 23 19:20:42 r500 kernel: GEOM: new disk da1
> Sep 23 22:58:56 r500 kernel: GEOM: new disk da0
> Sep 24 09:31:02 r500 kernel: GEOM: new disk ada0
> Sep 24 14:17:22 r500 kernel: GEOM: new disk da0
> Sep 24 14:44:03 r500 kernel: GEOM: new disk ada0
> Sep 24 14:44:03 r500 kernel: GEOM: new disk da0
> Sep 24 14:53:30 r500 kernel: GEOM: new disk ada0
> Sep 24 15:03:24 r500 kernel: GEOM: new disk da0
> Sep 24 15:06:03 r500 kernel: GEOM: new disk da0
> Sep 24 15:13:57 r500 kernel: GEOM: new disk ada0
> Sep 24 15:14:16 r500 kernel: GEOM: new disk da0
> Sep 24 15:27:11 r500 kernel: GEOM: new disk ada0
> Sep 24 15:28:05 r500 kernel: GEOM: new disk da0
> Sep 24 15:32:10 r500 kernel: GEOM: new disk ada0
> Sep 24 15:32:10 r500 kernel: GEOM: new disk da0
> Sep 24 15:38:16 r500 kernel: GEOM: new disk ada0
> Sep 24 15:38:16 r500 kernel: GEOM: new disk da0
> Sep 24 15:43:33 r500 kernel: GEOM: new disk ada0
> Sep 24 15:43:33 r500 kernel: GEOM: new disk da0
> Sep 24 15:49:30 r500 kernel: GEOM: new disk ada0
> [patch reverted]
> Sep 24 19:32:51 r500 kernel: GEOM: new disk ada0
> Sep 24 19:32:51 r500 kernel: GEOM: new disk cd0
> Sep 24 19:32:51 r500 kernel: GEOM: new disk da0
> Sep 24 19:38:07 r500 kernel: GEOM: new disk ada0
> Sep 24 19:38:07 r500 kernel: GEOM: new disk cd0
>
> Without the patch I'm used to getting the following kernel messages when
> booting (without a disc in cd0):
>
> Sep 24 19:32:51 r500 kernel: ahcich0: AHCI reset: device ready after 100ms
> Sep 24 19:32:51 r500 kernel: (aprobe0:ahcich0:0:0:0): SIGNATURE:
> Sep 24 19:32:51 r500 kernel: ahcich1: AHCI reset: device ready after 100ms
> Sep 24 19:32:51 r500 kernel: (aprobe1:ahcich1:0:0:0): SIGNATURE: eb14
> Sep 24 19:32:51 r500 kernel: GEOM: new disk cd0
> Sep 24 19:32:51 r500 kernel: pass0 at ahcich0 bus 0 scbus0 target 0 lun 0
> Sep 24 19:32:51 r500 kernel: pass0: <HITACHI HTS543225L9SA00 FBEZC4EC> ATA-8 SATA 1.x device
> Sep 24 19:32:51 r500 kernel: pass0: Serial Number 090509FB2F32LLEY6D8A
> Sep 24 19:32:51 r500 kernel: pass0: 150.000MB/s transfers (SATA 1.x, UDMA5, PIO 8192bytes)
> Sep 24 19:32:51 r500 kernel: pass0: Command Queueing enabled
> Sep 24 19:32:51 r500 kernel: pass1 at ahcich1 bus 0 scbus1 target 0 lun 0
> Sep 24 19:32:51 r500 kernel: pass1: <HL-DT-ST DVDRAM GSA-T50N RX05> Removable CD-ROM SCSI-0 device
> Sep 24 19:32:51 r500 kernel: pass1: Serial Number M2R96NC0647
> Sep 24 19:32:51 r500 kernel: pass1: 150.000MB/s transfers (SATA 1.x, UDMA6, ATAPI 12bytes, PIO 8192bytes)
> Sep 24 19:32:51 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status error
> Sep 24 19:32:51 r500 kernel: (cd0:ahcich1:0:0:0): READ CAPACITY. CDB: 25 0 0 0 0 0 0 0 0 0
> Sep 24 19:32:51 r500 kernel: (cd0:ahcich1:0:0:0): CAM status: SCSI Status Error
> Sep 24 19:32:51 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status: Check Condition
> Sep 24 19:32:51 r500 kernel: (cd0:ahcich1:0:0:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> Sep 24 19:32:51 r500 kernel: (cd0:ahcich1:0:0:0): Retrying command (per sense data)
> Sep 24 19:32:51 r500 kernel: ada0
Re: Exactly that commit (was Re: Latest -current 100% hang at the late boot stage)
On Wed, Jun 22, 2011 at 08:13:25 +0400, Andrey Chernov wrote:
> On Tue, Jun 21, 2011 at 09:54:04PM -0600, Kenneth D. Merry wrote:
> > These two are interesting:
> >
> > > http://img825.imageshack.us/img825/1249/21062011014m.jpg
> > > http://img839.imageshack.us/img839/3791/21062011015.jpg
> >
> > It looks like the GEOM event thread is stuck inside the cd(4) driver.
> > The cd(4) driver is trying to acquire the peripheral lock, and is
> > sleeping until it gets it.
> >
> > What isn't clear is who is holding it.
> >
> > The ps output shows an idle thread running on CPU 1, and thread 100014
> > (taskq) running on CPU 0. Unfortunately I don't see a stack trace for
> > that. (I might have missed it.)
> >
> > Do you happen to have the image with the stack trace for that thread?
>
> I don't have the image because no disks are mounted at that stage and the
> swap slice is not attached. But I can issue more specific DDB commands to
> narrow it down, just say what you need in detail.
>
> BTW, the machine has 2 DVD drives, both attached to a Marvell IDE plain
> ATA interface; they always worked before.
>
> Are you sure that something is holding the lock? 'show lock' shows
> absolutely nothing, it is empty.

Well, after looking at the code a little more, it looks like the lock that
is being held is the periph lock, which is really just a flag. So 'show
lock' wouldn't show anything relevant.

Here's cam_periph_hold():

int
cam_periph_hold(struct cam_periph *periph, int priority)
{
	int error;

	/*
	 * Increment the reference count on the peripheral
	 * while we wait for our lock attempt to succeed
	 * to ensure the peripheral doesn't disappear out
	 * from under us while we sleep.
	 */
	if (cam_periph_acquire(periph) != CAM_REQ_CMP)
		return (ENXIO);

	mtx_assert(periph->sim->mtx, MA_OWNED);
	while ((periph->flags & CAM_PERIPH_LOCKED) != 0) {
		periph->flags |= CAM_PERIPH_LOCK_WANTED;
		if ((error = mtx_sleep(periph, periph->sim->mtx, priority,
		    "caplck", 0)) != 0) {
			cam_periph_release_locked(periph);
			return (error);
		}
	}

	periph->flags |= CAM_PERIPH_LOCKED;
	return (0);
}

The GEOM event thread is stuck sleeping in the mtx_sleep() call above. So
that tells me that one of several things is going on:

 - There is a path in the cd(4) driver where it can call cam_periph_hold()
   but not cam_periph_unhold().
 - There is another thread in the system that has called cam_periph_hold(),
   and has gotten stuck before it can call cam_periph_unhold().
 - The hold/unhold logic is broken, and there is a case where a thread
   waiting for the lock can miss the wakeup. After looking at the code, I
   don't think this is the case, but I may have missed something.

So it is probably one of the first two cases.

From the dmesg, I only see cd1 listed, not cd0. So it is possible that cd0
is stuck in the probe code somewhere, and the geom code just gets stuck
trying to open it when the probe hasn't completed.

Seeing the stack trace for the taskq thread that is running on CPU 0
(process 100014) might be enlightening, it's hard to say. That may or may
not show the issue.

It's possible that this issue is directly related to the commit in
question; perhaps there is an error being returned that wasn't returned
before and it isn't being handled right in the cd(4) driver. (The cd(4)
driver wasn't touched in the commit.)

It's also possible that the commit in question just changed the timing and
your system is hitting a race that was there previously.

Ken
--
Kenneth Merry
k...@freebsd.org
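The "lock that is really just a flag" pattern described above can be sketched in userland with a mutex and a condition variable. This is an analogue for illustration only, not the kernel code: struct periph and the periph_*() functions are hypothetical, the mutex plays the role of the SIM mutex, and the condition variable stands in for the mtx_sleep()/wakeup() channel.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userland stand-in for a CAM peripheral. */
struct periph {
	pthread_mutex_t	mtx;	/* plays the role of the SIM mutex */
	pthread_cond_t	cv;	/* stands in for the sleep/wakeup channel */
	bool		locked;	/* analogous to CAM_PERIPH_LOCKED */
	bool		wanted;	/* analogous to CAM_PERIPH_LOCK_WANTED */
};

void
periph_init(struct periph *p)
{
	pthread_mutex_init(&p->mtx, NULL);
	pthread_cond_init(&p->cv, NULL);
	p->locked = false;
	p->wanted = false;
}

/* Caller must hold p->mtx. Sleeps until the flag is clear, then sets it. */
void
periph_hold(struct periph *p)
{
	while (p->locked) {
		p->wanted = true;
		/* Drops p->mtx while asleep, reacquires it on wakeup. */
		pthread_cond_wait(&p->cv, &p->mtx);
	}
	p->locked = true;
}

/* Caller must hold p->mtx. Clears the flag and wakes any waiters. */
void
periph_unhold(struct periph *p)
{
	p->locked = false;
	if (p->wanted) {
		p->wanted = false;
		pthread_cond_broadcast(&p->cv);
	}
}
```

Because the mutex is released while a waiter sleeps, a lock-inspection tool sees no lock held by anyone; only the flag records ownership. A holder that never reaches periph_unhold() therefore leaves every later periph_hold() caller asleep with nothing to point at the culprit, which matches the first two cases listed above.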
Re: Exactly that commit (was Re: Latest -current 100% hang at the late boot stage)
On Wed, Jun 22, 2011 at 00:49:34 +0400, Andrey Chernov wrote:
> On Tue, Jun 21, 2011 at 10:17:19AM -0600, Kenneth D. Merry wrote:
> > ps
> > alltrace
> > show locks
> > show msgbuf
> >
> > Hopefully that will give us something to start looking at...
> >
> > This would really work a lot better if there is any way to get a serial
> > console on the machine. The above will produce a good bit of output,
> > and would likely need a lot of pictures.
> >
> > Since we can't reproduce the problem here, some debugging help would be
> > greatly appreciated.
>
> Sorry, I have no serial console. Here are the photos. I removed very
> similar looking USB parts from 'ps' and 'alltrace', and very general
> parts of 'alltrace' that have always been there. I hope the remaining
> info will be enough. USB hotplugging works at this stage, so no reason to
> look there. If it is not enough, I'll upload the whole series.

Thanks for uploading all of the photos. That's a lot of work, but they are
helpful...

I think I see part of the problem, but not the whole problem:

> 'show lock' outputs nothing, it means no locks, just sleep somewhere
> forever.
>
> 'ps':
> http://img43.imageshack.us/img43/1424/21062011001j.jpg
> http://img835.imageshack.us/img835/6607/21062011002.jpg
> http://img841.imageshack.us/img841/5401/21062011003.jpg
>
> 'alltrace':
> http://img864.imageshack.us/img864/6757/21062011004ya.jpg
> http://img542.imageshack.us/img542/4857/21062011005.jpg
> http://img828.imageshack.us/img828/823/21062011006.jpg
> http://img5.imageshack.us/img5/910/21062011007.jpg
> http://img7.imageshack.us/img7/4704/21062011008.jpg
> http://img848.imageshack.us/img848/5487/21062011009.jpg
> http://img641.imageshack.us/img641/2/21062011010.jpg
> http://img7.imageshack.us/img7/7946/21062011011.jpg
> http://img860.imageshack.us/img860/8185/21062011012.jpg
> http://img696.imageshack.us/img696/5276/21062011013.jpg

These two are interesting:

> http://img825.imageshack.us/img825/1249/21062011014m.jpg
> http://img839.imageshack.us/img839/3791/21062011015.jpg

It looks like the GEOM event thread is stuck inside the cd(4) driver. The
cd(4) driver is trying to acquire the peripheral lock, and is sleeping
until it gets it.

What isn't clear is who is holding it.

The ps output shows an idle thread running on CPU 1, and thread 100014
(taskq) running on CPU 0. Unfortunately I don't see a stack trace for that.
(I might have missed it.)

Do you happen to have the image with the stack trace for that thread?

> http://img594.imageshack.us/img594/1773/21062011016.jpg
> http://img109.imageshack.us/img109/9937/21062011017.jpg
> http://img51.imageshack.us/img51/6047/21062011018l.jpg
>
> 'show msgbuf':
> http://img59.imageshack.us/img59/46/21062011019.jpg
> http://img189.imageshack.us/img189/483/21062011020.jpg
> http://img19.imageshack.us/img19/8163/21062011021.jpg
> http://img683.imageshack.us/img683/3171/21062011022.jpg
> http://img819.imageshack.us/img819/5923/21062011023.jpg
> http://img692.imageshack.us/img692/3789/21062011024.jpg
> http://img580.imageshack.us/img580/1550/21062011025.jpg
> http://img560.imageshack.us/img560/7478/21062011026.jpg
> http://img94.imageshack.us/img94/9371/21062011027.jpg
> http://img857.imageshack.us/img857/5185/21062011028.jpg

Thanks,

Ken
--
Kenneth Merry
k...@freebsd.org
Re: Exactly that commit (was Re: Latest -current 100% hang at the late boot stage)
On Mon, Jun 20, 2011 at 15:46:56 +0400, Andrey Chernov wrote:
> On Mon, Jun 20, 2011 at 11:01:46AM +0300, Kostik Belousov wrote:
> > On Mon, Jun 20, 2011 at 11:02:22AM +0400, Andrey Chernov wrote:
> > > On Sun, Jun 19, 2011 at 08:15:43PM -0600, Justin T. Gibbs wrote:
> > > > On 6/19/11 6:19 PM, Andrey Chernov wrote:
> > > > > Exactly that commit is responsible for boot hang. Please fix.
> > > > > BTW, I have MBR on SATA disk (CAM emulated), ICH9.
> > > >
> > > > Since it works for me, you'll need to provide more information.
> > > > Can you at least drop into kdb to determine the likely source of
> > > > the hang by getting a stack trace of all processes to see where
> > > > they are sleeping and dumping lock information?
> > >
> > > I drop into DDB and put 'bt' console photo in the very first message
> > > of this thread - nothing unusual seen in the main stack. Could you
> > > please specify exact DDB commands you want to be issued by me? No
> > > dump can be provided since nothing is mounted yet including swap.
> > >
> > > BTW, I remember I saw previously unseen warnings with post Jun 14
> > > kernels:
> > >
> > > xpt_action_default: CCB type 0xe not supported
> > >
> > > 'ps' inside DDB shows [xpt_thrd] at ccb_scan wmesg state and
> > > [g_event] at caplck wmesg state, [kernel] at g_waitid state. Even
> > > don't know, if it matters.
> >
> > Just in case, please try r223277.
>
> As the second message in the thread states, I try first even 223296 with
> the same hang and the same
>
> xpt_action_default: CCB type 0xe not supported
>
> As I think, DDB's 'ps' indicates that kernel waits something from geom
> and geom waits something from ccb_scan forever, just raw guess. I will be
> glad to issue more specific DDB commands and upload corresponding photos.
>
> BTW, plugging and unplugging USB devices works in that stage.

Can you do the following when the hang happens:

ps
alltrace
show locks
show msgbuf

Hopefully that will give us something to start looking at...

This would really work a lot better if there is any way to get a serial console on the machine. The above will produce a good bit of output, and would likely need a lot of pictures.

Since we can't reproduce the problem here, some debugging help would be greatly appreciated.

Thanks,

Ken
--
Kenneth Merry
k...@freebsd.org
Re: message buffer scrambling fix
On Sat, May 28, 2011 at 11:26:50 -0700, Julian Elischer wrote: On 5/27/11 3:45 PM, Kenneth D. Merry wrote: Hey folks, I have attached some patches to the kernel message buffer code (this affects dmesg(8) output as well as kernel messages that go to the syslog) to address log scrambling. This fixes the same issue that 'options PRINTF_BUFR_SIZE=128' fixes for the console. The problem is that you can have multiple kernel threads writing to the message buffer at the same time, and so their characters will get interleaved. All of the characters will get in there, because they're written with atomic operations, but the output might looked scrambled. So the fix is to use the same stack buffer that is used for the console output (so the stack size doesn't increase), and use a spin lock instead of atomic operations to insert the string into the message buffer. The result is that dmesg and syslog output should look the same as the console output. As long as individual kernel prints fit in the printf buffer size, they will be put into the message buffer atomically. I also fixed a couple of other long-standing issues. putcons() (in subr_prf.c) was adding a carriage return before calling cnputs(). But cnputs() calls cnputc(), which adds a carriage return before every newline. So much of the console output (the part that came from putcons() at least) had two carriage returns at the end. The other issue was that log_console() was inserting a newline for any console write that didn't already have one at the end. The issue with that can be seen if you do a 'dmesg -a' and compare that to the console output. You'll see something like this on the console: Updating motd:. But this in dmesg -a: Updating motd: . That is because Updating motd: is written first, log_console() appends a newline, and then .\n is written. 
I added a loader tunable and sysctl to turn the old behavior back on (kern.log_console_add_linefeed) if you want the old behavior, but I think we should be able to safely remove it. Also, the new msgbuf_addstr() function allows the caller to optionally ask for carriage returns to be stripped out. However, in my testing I haven't seen any carriage returns to strip. Let me know if you have any comments. I'm planning to check this into head next week. looks good.. as long as we don't end up with the behaviour that I think I see on Linux (it's hard to tell sometimes) where the last message (the one you really want to see) doesn't make it out. Everything passed into the kernel printf() call should make it out to the console, message buffer, etc. before the printf call completes. The only way that wouldn't happen is if spin locks break for some reason. One thing I forgot to mention is that I think the PRINTF_BUFR_SIZE option should be made non-optional. Even on smaller embedded machines, I think we should be able to afford the 128 bytes of stack space to keep messages from getting scrambled. Ken -- Kenneth Merry k...@freebsd.org ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
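The locking change described above can be sketched in userland C. This is only an illustration under stated assumptions, not the kernel patch: a pthread mutex stands in for the MTX_SPIN mutex, the ring has a fixed size, and every name here (struct ring, ring_init, ring_do_addchar, ring_addstr) is hypothetical. The point it tries to show is that holding one lock across the whole string keeps the characters of a single message contiguous, while the checksum is updated by subtracting the byte being overwritten.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical ring size; a power of two so that unsigned wraparound of
 * the sequence number stays consistent with the buffer position. */
#define RING_SIZE 64u

struct ring {
	pthread_mutex_t lock;	/* stand-in for the kernel spin mutex */
	unsigned int wseq;	/* write sequence number */
	unsigned int cksum;	/* running sum of the bytes in buf */
	char buf[RING_SIZE];
};

static void
ring_init(struct ring *r)
{
	assert(r != NULL);
	pthread_mutex_init(&r->lock, NULL);
	r->wseq = 0;
	r->cksum = 0;
	memset(r->buf, 0, sizeof(r->buf));
}

/* Add one character and update checksum and sequence number.
 * The caller must hold r->lock. */
static void
ring_do_addchar(struct ring *r, int c)
{
	unsigned int pos = r->wseq % RING_SIZE;

	/* Add the new byte, subtract the byte it overwrites. */
	r->cksum += (unsigned int)(unsigned char)c -
	    (unsigned int)(unsigned char)r->buf[pos];
	r->buf[pos] = (char)c;
	r->wseq++;
}

/* Append a whole NUL-terminated string under a single lock hold, so
 * concurrent writers cannot interleave characters within it. */
static void
ring_addstr(struct ring *r, const char *s)
{
	pthread_mutex_lock(&r->lock);
	for (; *s != '\0'; s++)
		ring_do_addchar(r, *s);
	pthread_mutex_unlock(&r->lock);
}
```

With per-character atomic updates, two writers can each get all of their characters in, but in mixed order; taking the lock once per string is what makes each message land as a unit.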
message buffer scrambling fix
Hey folks, I have attached some patches to the kernel message buffer code (this affects dmesg(8) output as well as kernel messages that go to the syslog) to address log scrambling. This fixes the same issue that 'options PRINTF_BUFR_SIZE=128' fixes for the console. The problem is that you can have multiple kernel threads writing to the message buffer at the same time, and so their characters will get interleaved. All of the characters will get in there, because they're written with atomic operations, but the output might looked scrambled. So the fix is to use the same stack buffer that is used for the console output (so the stack size doesn't increase), and use a spin lock instead of atomic operations to insert the string into the message buffer. The result is that dmesg and syslog output should look the same as the console output. As long as individual kernel prints fit in the printf buffer size, they will be put into the message buffer atomically. I also fixed a couple of other long-standing issues. putcons() (in subr_prf.c) was adding a carriage return before calling cnputs(). But cnputs() calls cnputc(), which adds a carriage return before every newline. So much of the console output (the part that came from putcons() at least) had two carriage returns at the end. The other issue was that log_console() was inserting a newline for any console write that didn't already have one at the end. The issue with that can be seen if you do a 'dmesg -a' and compare that to the console output. You'll see something like this on the console: Updating motd:. But this in dmesg -a: Updating motd: . That is because Updating motd: is written first, log_console() appends a newline, and then .\n is written. I added a loader tunable and sysctl to turn the old behavior back on (kern.log_console_add_linefeed) if you want the old behavior, but I think we should be able to safely remove it. 
Also, the new msgbuf_addstr() function allows the caller to optionally ask for carriage returns to be stripped out. However, in my testing I haven't seen any carriage returns to strip.

Let me know if you have any comments. I'm planning to check this into head next week.

Thanks,

Ken
--
Kenneth Merry
k...@freebsd.org

Index: sys/kern/subr_msgbuf.c
===================================================================
--- sys/kern/subr_msgbuf.c	(revision 222390)
+++ sys/kern/subr_msgbuf.c	(working copy)
@@ -31,8 +31,16 @@
 
 #include <sys/param.h>
 #include <sys/systm.h>
+#include <sys/lock.h>
+#include <sys/mutex.h>
 #include <sys/msgbuf.h>
 
+/*
+ * Maximum number conversion buffer length: uintmax_t in base 2, plus <>
+ * around the priority, and a terminating NUL.
+ */
+#define	MAXPRIBUF	(sizeof(intmax_t) * NBBY + 3)
+
 /* Read/write sequence numbers are modulo a multiple of the buffer size. */
 #define SEQMOD(size) ((size) * 16)
 
@@ -51,6 +59,9 @@
 	mbp->msg_seqmod = SEQMOD(size);
 	msgbuf_clear(mbp);
 	mbp->msg_magic = MSG_MAGIC;
+	mbp->msg_lastpri = -1;
+	mbp->msg_needsnl = 0;
+	mtx_init(&mbp->msg_lock, "msgbuf", NULL, MTX_SPIN);
 }
 
 /*
@@ -80,6 +91,11 @@
 		}
 		msgbuf_clear(mbp);
 	}
+
+	mbp->msg_lastpri = -1;
+	/* Assume that the old message buffer didn't end in a newline. */
+	mbp->msg_needsnl = 1;
+	mtx_init(&mbp->msg_lock, "msgbuf", NULL, MTX_SPIN);
 }
 
 /*
@@ -110,28 +126,143 @@
 }
 
 /*
- * Append a character to a message buffer.  This function can be
- * considered fully reentrant so long as the number of concurrent
- * callers is less than the number of characters in the buffer.
- * However, the message buffer is only guaranteed to be consistent
- * for reading when there are no callers in this function.
+ * Add a character into the message buffer, and update the checksum and
+ * sequence number.
+ *
+ * The caller should hold the message buffer spinlock.
  */
+static inline void
+msgbuf_do_addchar(struct msgbuf *mbp, u_int *seq, int c)
+{
+	u_int pos;
+
+	/* Make sure we properly wrap the sequence number. */
+	pos = MSGBUF_SEQ_TO_POS(mbp, *seq);
+
+	mbp->msg_cksum += (u_int)c -
+	    (u_int)(u_char)mbp->msg_ptr[pos];
+
+	mbp->msg_ptr[pos] = c;
+
+	*seq = MSGBUF_SEQNORM(mbp, *seq + 1);
+}
+
+/*
+ * Append a character to a message buffer.
+ */
 void
 msgbuf_addchar(struct msgbuf *mbp, int c)
 {
-	u_int new_seq, pos, seq;
+	mtx_lock_spin(&mbp->msg_lock);
 
-	do {
-		seq = mbp->msg_wseq;
-		new_seq = MSGBUF_SEQNORM(mbp, seq + 1);
-	} while (atomic_cmpset_rel_int(&mbp->msg_wseq, seq, new_seq) == 0);
-	pos = MSGBUF_SEQ_TO_POS(mbp, seq);
-	atomic_add_int(&mbp->msg_cksum, (u_int)(u_char)c -
-	    (u_int)(u_char)mbp->msg_ptr[pos]);
-	mbp->msg_ptr[pos] = c;
+	msgbuf_do_addchar(mbp, &mbp->msg_wseq, c);
+
+	mtx_unlock_spin(&mbp->msg_lock);
 }
 
 /*
+ * Append a NUL-terminated string with a priority to a message
Re: multiple issues with devstat_*(9)
On Thu, Apr 07, 2011 at 13:59:35 +0300, Alexander Motin wrote: Alexander Best wrote: On Fri Apr 1 11, John Baldwin wrote: On Thursday, March 31, 2011 6:33:39 pm Alexander Best wrote:

i think there are multiple issues with devstat. i found the following in devicestat.h: ...

funny thing is i found the following in scsi_pass.c:

softc->device_stats = devstat_new_entry("pass",
    periph->unit_number, 0,
    DEVSTAT_NO_BLOCKSIZE | (no_tags ? DEVSTAT_NO_ORDERED_TAGS : 0),
    softc->pd_type | DEVSTAT_TYPE_IF_SCSI | DEVSTAT_TYPE_PASS,
    DEVSTAT_PRIORITY_PASS);

...so pass* *should* show up under iostat -t scsi.

As I can see, this is a bug (or feature) of the libdevstat / devstat_selectdevs(). If you specify any -t, then pass devices will be reported only if you request pass specifically.

Hmm, pass devices for adaX should not be SCSI though, they should be ide I think.

i think the situation with ATA_CAM should be discussed further. still besides this issue there are many more with devstat(3). i'll try to track all the devstat_new_entry() occurrences and see if some issues can be fixed. maybe only the proper DEVSTAT_* args were forgotten.

Assuming that SCSI and IDE in -t option means transport type, and assuming that we count everything except ATA and SATA as SCSI, I've made following patch, that should fix issues from the CAM side: http://people.freebsd.org/~mav/cam.devstat.patch

Any objections? Or SCSI/IDE there expected to mean command set?

For what it's worth, I think the above patch is the right approach. The device type stuff in devstat has been broken since GEOM went in, so I'm glad to see you step up to fix it!

Ken
--
Kenneth Merry
k...@freebsd.org
Re: multiple issues with devstat_*(9)
On Sun, Apr 10, 2011 at 23:19:31 +0300, Alexander Motin wrote: Alexander Best wrote: On Thu Apr 7 11, Alexander Motin wrote: Alexander Best wrote: On Fri Apr 1 11, John Baldwin wrote: On Thursday, March 31, 2011 6:33:39 pm Alexander Best wrote:

i think there are multiple issues with devstat. i found the following in devicestat.h: ...

funny thing is i found the following in scsi_pass.c:

softc->device_stats = devstat_new_entry("pass",
    periph->unit_number, 0,
    DEVSTAT_NO_BLOCKSIZE | (no_tags ? DEVSTAT_NO_ORDERED_TAGS : 0),
    softc->pd_type | DEVSTAT_TYPE_IF_SCSI | DEVSTAT_TYPE_PASS,
    DEVSTAT_PRIORITY_PASS);

...so pass* *should* show up under iostat -t scsi.

As I can see, this is a bug (or feature) of the libdevstat / devstat_selectdevs(). If you specify any -t, then pass devices will be reported only if you request pass specifically.

Hmm, pass devices for adaX should not be SCSI though, they should be ide I think.

i think the situation with ATA_CAM should be discussed further. still besides this issue there are many more with devstat(3). i'll try to track all the devstat_new_entry() occurrences and see if some issues can be fixed. maybe only the proper DEVSTAT_* args were forgotten.

Assuming that SCSI and IDE in -t option means transport type, and assuming that we count everything except ATA and SATA as SCSI, I've made following patch, that should fix issues from the CAM side: http://people.freebsd.org/~mav/cam.devstat.patch

with your patch i get the following output:

otaku% iostat -t ide
       tty      ada0            ada1            cpu
 tin  tout  KB/t tps  MB/s  KB/t tps  MB/s  us ni sy in id
   6   144 14.21   6  0.09 20.46  40  0.81   2  0  3  0 95

otaku% iostat -t scsi
       tty      cd0             cpu
 tin  tout  KB/t tps  MB/s  us ni sy in id
   6   146  2.32   0  0.00   2  0  3  0 95

otaku% iostat -t pass
       tty      pass0           pass1           pass2           cpu
 tin  tout  KB/t tps  MB/s  KB/t tps  MB/s  KB/t tps  MB/s  us ni sy in id
   6   147  0.36   0  0.00  0.36   0  0.00  0.00   0  0.00   2  0  3  0 95

otaku% iostat -t da
       tty      ada0            ada1            cpu
 tin  tout  KB/t tps  MB/s  KB/t tps  MB/s  us ni sy in id
   6   147 14.21   6  0.08 20.46  37  0.75   1  0  3  0 95

otaku% iostat -t cd
       tty      cd0             cpu
 tin  tout  KB/t tps  MB/s  us ni sy in id
   7   147  2.32   0  0.00   1  0  3  0 95

otaku% iostat -t other
       tty      cpu
 tin  tout  us ni sy in id
   7   149   1  0  3  0 95

otaku% iostat -n 100
       tty      ada0            ada1            cd0             pass0           pass1           pass2           cpu
 tin  tout  KB/t tps  MB/s  KB/t tps  MB/s  KB/t tps  MB/s  KB/t tps  MB/s  KB/t tps  MB/s  KB/t tps  MB/s  us ni sy in id
   6   135 14.21   5  0.07 20.44  32  0.64  2.32   0  0.00  0.36   0  0.00  0.36   0  0.00  0.00   0  0.00   1  0  3  0 96

the remaining issues imho are:

1) ada* and cd* are SATA/ATA devices. so i think they should show up together either under ide *or* scsi. i don't have any *real* scsi devices.

I've just retested the patch and haven't reproduced your problem:

%iostat -d
      da0             ada0            da1             cd0
  KB/t tps  MB/s  KB/t tps  MB/s  KB/t tps  MB/s  KB/t tps  MB/s
  0.01   0  0.00  3.27   1  0.00  2.65   1  0.00  0.00   0  0.00

%iostat -d -t ide
      da0             ada0            cd0
  KB/t tps  MB/s  KB/t tps  MB/s  KB/t tps  MB/s
  0.01   0  0.00  3.27   1  0.00  0.00   0  0.00

%iostat -d -t scsi
      da1
  KB/t tps  MB/s
  2.65   1  0.00

%iostat -d -t pass
      pass0           pass1           pass2           pass3
  KB/t tps  MB/s  KB/t tps  MB/s  KB/t tps  MB/s  KB/t tps  MB/s
  0.00   0  0.00  0.00   0  0.00  0.00   0  0.00  0.00   0  0.00

%iostat -d -t ide,pass
      pass0           pass1           pass2
  KB/t tps  MB/s  KB/t tps  MB/s  KB/t tps  MB/s
  0.00   0  0.00  0.00   0  0.00  0.00   0  0.00

%iostat -d -t scsi,pass
      pass3
  KB/t tps  MB/s
  0.00   0  0.00

da0 is a PATA ATAPI ZIP, da1 - USB floppy, ada0 - SATA HDD, cd0 - PATA ATAPI CD-ROM.

Just an idea, aren't you using legacy ata(4) + atapicam for your cd0? atapicam lies that its buses are SPI (SCSI).

2) the pass* devices still don't show up under ide/scsi/other. that's ok, but then the src comments and manual pages need to be changed accordingly.

As I have told - it is a bug/feature of libdevstat. It should not be difficult to fix, if it is really a bug.
Re: multiple issues with devstat_*(9)
On Mon, Apr 11, 2011 at 19:09:00 +0300, Alexander Motin wrote: On 11.04.2011 18:43, Kenneth D. Merry wrote: On Sun, Apr 10, 2011 at 23:19:31 +0300, Alexander Motin wrote: Alexander Best wrote: 2) the pass* devices still don't show up under ide/scsi/other. that's ok, but then the src comments and manual pages need to be changed accordingly. As I have told - it is a bug/feature of libdevstat. It should not be difficult to fix, if it is really a bug. That was intentional, if I can remember what I intended in 1997/1998. The reason was that since there is one passthrough device created for every device that CAM manages, you don't want to show pass(4) devices when the user says 'iostat -t scsi'. Otherwise he might get all pass(4) devices, depending on the order of devices in the system. But, if it's pass(4) devices you want, you can ask for them specifically, or for all SCSI/IDE pass(4) devices, as mav did above. But it is impossible to get, for example, all SCSI devices including pass. Either only non-pass, or pass only. It is strange that if I won't specify -t (most probable for inexperienced users), I'll gel all devices including pass, but if specify -t scsi (as more advanced user who knows what to ask), I'll get only non-pass. It is at least inconsistent. Perhaps it is somewhat inconsistent, and we should do some filtering by default to not show pass(4) devices. The idea was that in most cases, people will not want to see the pass(4) devices. That is not where most of the I/O typically happens. If they want to see the pass(4) devices, they can ask for them specifically by type or by name. When I have a system full of drives and I want to look at one particular pass(4) device, I always specify it manually, e.g.: 'iostat -d pass4 1' Ken -- Kenneth Merry k...@freebsd.org ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
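The selection behaviour discussed in this thread (pass(4) devices are matched only when pass is requested, and then only pass devices match) can be written down as a small predicate. The following is a hedged sketch of that rule as described in the messages, not the libdevstat implementation; the flag names and values and the function dev_type_matches() are hypothetical and chosen only for illustration.

```c
#include <stdbool.h>

/* Hypothetical type flags, loosely modelled on the idea of an
 * interface field plus a separate "passthrough" bit. */
enum {
	X_IF_SCSI = 0x010,
	X_IF_IDE  = 0x020,
	X_IF_MASK = 0x0f0,
	X_PASS    = 0x100
};

/*
 * Decide whether a device with type bits `dev` is selected by a
 * requested type `want` (what one -t argument would translate to).
 * This only models the case where a -t argument was given; with no
 * -t at all, everything is shown.
 */
static bool
dev_type_matches(unsigned int dev, unsigned int want)
{
	/* If an interface was requested, it has to match exactly. */
	if ((want & X_IF_MASK) != 0 &&
	    (dev & X_IF_MASK) != (want & X_IF_MASK))
		return (false);

	/* The pass bit must agree: "-t scsi" gives non-pass devices
	 * only, "-t scsi,pass" gives pass devices only. */
	return ((dev & X_PASS) == (want & X_PASS));
}
```

Under this rule there is no single request that returns both the SCSI peripherals and their pass(4) twins, which is the inconsistency raised above.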
LSI 6Gb SAS driver committed
I sent this out to the -scsi list earlier today. Testers would be appreciated for the 6Gb LSI SAS driver. Please follow up to me or the -scsi list.

Thanks,

Ken

----- Forwarded message from Kenneth D. Merry k...@freebsd.org -----

Date: Fri, 10 Sep 2010 09:04:38 -0600
From: Kenneth D. Merry k...@freebsd.org
To: s...@freebsd.org
Subject: LSI 6Gb SAS driver committed

Hey folks,

I have committed the mps driver (LSI Logic 6Gb SAS controller driver) to the FreeBSD Perforce server (//depot/projects/mps/...) and FreeBSD-current.

The driver works with SAS and SATA drives, directly attached or attached through expanders. Basic error recovery works as well (i.e. timeouts and aborts).

There are some known issues, including:

- No support for integrated RAID (IR) arrays.

- Devices tend to disappear and come back in one of my configurations. I also see some phantom devices, and events that don't make sense:

mps0: Unhandled event 0x0
mps0: Unhandled event 0x0
mps0: Unhandled event 0x0
mps0: Unhandled event 0x0
mps0: Unhandled event 0x0
mps0: Unhandled event 0x0
(da2:mps0:0:6:0): SCSI command timeout on device handle 0x0017 SMID 90
mps0: mpssas_abort_complete: abort request on handle 0x17 SMID 90 complete
mps0: Unhandled event 0x0
(probe2:mps0:0:2:0): AutoSense failed
mps0: Unhandled event 0x0
(da10:mps0:0:0:0): unsupportable block size 0
(da10:mps0:0:0:0): lost device
(da10:mps0:0:0:0): removing device entry
(da2:mps0:0:6:0): lost device
(da2:mps0:0:6:0): removing device entry
da2 at mps0 bus 0 scbus0 target 6 lun 0
da2: <ATA ST3160023AS 3.05> Fixed Direct Access SCSI-5 device
da2: 150.000MB/s transfers
da2: Command Queueing enabled
da2: 152627MB (312581808 512 byte sectors: 255H 63S/T 19457C)
mps0: Unhandled event 0x0
mps0: Unhandled event 0x0
mps0: Unhandled event 0x0
mps0: Unhandled event 0x0
mps0: Unhandled event 0x0
mps0: Unhandled event 0x0
mps0: Unhandled event 0x0
mps0: Unhandled event 0x0
mps0: Unhandled event 0x0
mps0: Unhandled event 0x0
mps0: Unhandled event 0x0
mps0: Unhandled event 0x0

- Sometimes you'll run into a device that fails part of the probe on boot, and you'll end up running into the run_interrupt_driven_config_hooks timeout. You see some aborts during probe, and then the 5 minute probe timeout kicks in and panics the kernel. For instance:

(probe4:mps0:0:20:0): SCSI command timeout on device handle 0x0012 SMID 81
mps0: mpssas_abort_complete: abort request on handle 0x12 SMID 81 complete
run_interrupt_driven_hooks: still waiting after 60 seconds for xpt_config
(probe4:mps0:0:20:0): SCSI command timeout on device handle 0x0012 SMID 214
mps0: mpssas_abort_complete: abort request on handle 0x12 SMID 214 complete
run_interrupt_driven_hooks: still waiting after 120 seconds for xpt_config
run_interrupt_driven_hooks: still waiting after 180 seconds for xpt_config
(probe4:mps0:0:20:0): SCSI command timeout on device handle 0x0012 SMID 281
mps0: mpssas_abort_complete: abort request on handle 0x12 SMID 281 complete
run_interrupt_driven_hooks: still waiting after 240 seconds for xpt_config
(probe4:mps0:0:20:0): SCSI command timeout on device handle 0x0012 SMID 348
mps0: mpssas_abort_complete: abort request on handle 0x12 SMID 348 complete
run_interrupt_driven_hooks: still waiting after 300 seconds for xpt_config
(probe4:mps0:0:20:0): SCSI command timeout on device handle 0x0012 SMID 415
mps0: mpssas_abort_complete: abort request on handle 0x12 SMID 415 complete
panic: run_interrupt_driven_config_hooks: waited too long
cpuid = 0
KDB: enter: panic
[ thread pid 0 tid 10 ]
Stopped at kdb_enter+0x3d: movq $0,0x4c70b0(%rip)
db>

- ioctl support isn't complete, and there is no userland utility.

- There is no man page.

The driver is in the tree at this point to allow people to test it out, report any problems, and hopefully contribute bug fixes.

LSI has some developers working on this driver, and we hope to get them to put some of their work-in-progress in the FreeBSD Perforce repo. So, in view of that, if you make any changes to the driver, please make them in the FreeBSD Perforce repository first (in //depot/projects/mps/...) and then merge them into FreeBSD-current.

Thanks to Scott Long for writing the driver, and to Yahoo and Spectra Logic for sponsoring the work.

Ken
--
Kenneth Merry
k...@freebsd.org

----- End forwarded message -----

--
Kenneth Merry
k...@freebsd.org
HEADS UP: CAM error recovery change
I checked in a change to the CAM error recovery code that will hopefully have a positive effect on systems with CDROM drives that were taking a while to probe. Anyway, try this out and let me know if there are any regressions. Thanks, Ken - Forwarded message from Kenneth D. Merry [EMAIL PROTECTED] - From: Kenneth D. Merry [EMAIL PROTECTED] Date: Sun, 26 Oct 2003 22:15:55 -0800 (PST) To: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED] Subject: cvs commit: src/sys/cam cam_periph.c src/sys/cam/scsi scsi_cd.c ken 2003/10/26 22:15:55 PST FreeBSD src repository Modified files: sys/cam cam_periph.c sys/cam/scsi scsi_cd.c Log: In camperiphdone(), make sure we check for fatal errors and bail out instead of retrying them blindly. This should fix some of the problems people have been having with cdrom drives taking a long time to probe. This should also eliminate the need for the initial TUR in cdsize(). cam_periph.c: Don't keep retrying if the error we get back is a fatal error. This should help us detect the transition from Logical unit not ready, cause not reportable to Medium not present in the TUR many handler. (The TUR many handler gets triggered for Logical unit not ready, cause not reportable errors.) scsi_cd.c: Remove the initial test unit ready in cdsize(). Hopefully it isn't necessary after the above change. Submitted by: gibbs (mostly) Tested by: peter MFC After: 2 weeks Revision ChangesPath 1.55 +17 -2 src/sys/cam/cam_periph.c 1.88 +0 -14 src/sys/cam/scsi/scsi_cd.c - End forwarded message - -- Kenneth Merry [EMAIL PROTECTED] ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to [EMAIL PROTECTED]
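The idea in the commit log above (stop retrying once the error is known to be fatal, rather than burning through the retry count) can be illustrated with a short, self-contained sketch. This is not the camperiphdone() code; the status enum, the scripted device, and issue_with_retries() are all hypothetical names invented for the example.

```c
#include <stddef.h>

/* Hypothetical outcome of one command attempt. */
enum io_status { IO_OK, IO_RETRYABLE, IO_FATAL };

/* A fake device that replays a fixed script of outcomes, so the retry
 * loop can be exercised without real hardware. */
struct scripted_dev {
	const enum io_status *results;
	size_t nresults;
	size_t attempts;	/* number of commands issued so far */
};

static enum io_status
scripted_issue(struct scripted_dev *d)
{
	size_t i = d->attempts++;

	/* Past the end of the script, report a fatal error. */
	return (i < d->nresults ? d->results[i] : IO_FATAL);
}

/*
 * Issue a command, retrying at most `retries` times, but only while
 * the error is retryable. A fatal error (think "medium not present")
 * ends the loop at once instead of being retried blindly.
 */
static enum io_status
issue_with_retries(struct scripted_dev *d, int retries)
{
	enum io_status st;

	for (;;) {
		st = scripted_issue(d);
		if (st != IO_RETRYABLE)
			return (st);	/* success or fatal: stop here */
		if (retries-- <= 0)
			return (st);	/* retry budget exhausted */
	}
}
```

In the slow-probe case described above, a drive that moves from a "not ready" style error to a fatal one would, under this scheme, stop consuming retries as soon as the fatal status appears.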
Re: cd0 errors during probe?
On Sun, Oct 12, 2003 at 10:26:54 -0700, Steve Kargl wrote:
> Can I assume that the following error messages are erroneous
> because cd0 appears to function without any problems?
> There is a CD in the drive.
>
> cd0 at ahc0 bus 0 target 4 lun 0
> cd0: <TOSHIBA CD-ROM XM-6401TA 1001> Removable CD-ROM SCSI-2 device
> cd0: 10.000MB/s transfers (10.000MHz, offset 15)
> cd0: cd present [129875 x 2048 byte records]
> (cd0:ahc0:0:4:0): READ(10). CDB: 28 0 0 0 0 0 0 0 1 0
> (cd0:ahc0:0:4:0): CAM Status: SCSI Status Error
> (cd0:ahc0:0:4:0): SCSI Status: Check Condition
> (cd0:ahc0:0:4:0): BLANK CHECK asc:64,0
> (cd0:ahc0:0:4:0): Illegal mode for this track
> (cd0:ahc0:0:4:0): Retrying Command (per Sense Data)
> (cd0:ahc0:0:4:0): READ(10). CDB: 28 0 0 0 0 0 0 0 1 0
> (cd0:ahc0:0:4:0): CAM Status: SCSI Status Error
> (cd0:ahc0:0:4:0): SCSI Status: Check Condition
> (cd0:ahc0:0:4:0): BLANK CHECK asc:64,0
> (cd0:ahc0:0:4:0): Illegal mode for this track
> (cd0:ahc0:0:4:0): Retrying Command (per Sense Data)
> (cd0:ahc0:0:4:0): READ(10). CDB: 28 0 0 0 0 0 0 0 1 0
> (cd0:ahc0:0:4:0): CAM Status: SCSI Status Error
> (cd0:ahc0:0:4:0): SCSI Status: Check Condition
> (cd0:ahc0:0:4:0): BLANK CHECK asc:64,0
> (cd0:ahc0:0:4:0): Illegal mode for this track
> (cd0:ahc0:0:4:0): Retrying Command (per Sense Data)

Looks like GEOM is trying to read the first sector of the CD, but since it's likely an audio CD, it doesn't quite work.

Ken
--
Kenneth Merry
[EMAIL PROTECTED]
Re: cd0 errors during probe?
On Sun, Oct 12, 2003 at 14:29:38 -0700, Steve Kargl wrote: On Sun, Oct 12, 2003 at 02:51:50PM -0600, Kenneth D. Merry wrote: On Sun, Oct 12, 2003 at 10:26:54 -0700, Steve Kargl wrote: Can I assume that the following error messages are erronous because cd0 appears to function without any problems? There is a CD in the drive. cd0 at ahc0 bus 0 target 4 lun 0 cd0: TOSHIBA CD-ROM XM-6401TA 1001 Removable CD-ROM SCSI-2 device cd0: 10.000MB/s transfers (10.000MHz, offset 15) cd0: cd present [129875 x 2048 byte records] (cd0:ahc0:0:4:0): READ(10). CDB: 28 0 0 0 0 0 0 0 1 0 (cd0:ahc0:0:4:0): CAM Status: SCSI Status Error (cd0:ahc0:0:4:0): SCSI Status: Check Condition (cd0:ahc0:0:4:0): BLANK CHECK asc:64,0 (cd0:ahc0:0:4:0): Illegal mode for this track (cd0:ahc0:0:4:0): Retrying Command (per Sense Data) Looks like GEOM is trying to read the first sector of the CD, but since it's likely an audio CD, it doesn't quite work. Yes, it is an audio CD. I suspected that it was a transient GEOM/CAM issue, but wanted to make sure before I needlessly replaced the cdrom drive. There's nothing wrong with your drive, most likely. I suppose it plays audio CDs and reads data CDs okay? If so, it's nothing to worry about. Back when the cd(4) driver used the old slice code, it had a function, cdfirsttrackisdata(), that figured out whether the first track was an audio or data track. It would set the flags in the disk structure accordingly to tell the slice code whether or not to attempt to read a disklabel from the CD. The code in -stable still works that way. My guess is that we need something similar again to tell GEOM not to attempt to read the first sector of the CD when it's not a data CD. Ken -- Kenneth Merry [EMAIL PROTECTED] ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to [EMAIL PROTECTED]
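A check along the lines of the old cdfirsttrackisdata() could look roughly like the sketch below. It is an assumption-laden illustration, not the driver's code: it presumes the table of contents has already been read into a simple array, and the struct, macro, and function names are hypothetical. The only hardware detail it relies on is that bit 0x04 of a TOC entry's control field marks a data track.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Control-field bit that marks a track as data rather than audio. */
#define TOC_CONTROL_DATA	0x04u

/* Hypothetical, simplified TOC entry: just the fields needed here. */
struct toc_entry {
	uint8_t track;		/* track number */
	uint8_t control;	/* control field from the TOC */
};

/*
 * Return true if the first track in the TOC is a data track, i.e. it
 * makes sense for an upper layer to try reading sector 0.
 * An empty TOC is treated as "not data".
 */
static bool
first_track_is_data(const struct toc_entry *toc, size_t nentries)
{
	if (toc == NULL || nentries == 0)
		return (false);
	return ((toc[0].control & TOC_CONTROL_DATA) != 0);
}
```

A consumer could then skip the first-sector read when this returns false, which would avoid the READ(10) errors shown for the audio CD.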
Re: ath(4) driver problems with WEP...
On Fri, Sep 19, 2003 at 08:22:13 -0700, Sam Leffler wrote: Hmm. One other thing I'm seeing is that when I configure a 128 bit key with ifconfig or wicontrol (wicontrol shows all 28 characters -- 0x plus 26 hex characters), ifconfig still thinks it is a 104 bit key. This is because ireq.i_len is 13. You must not have the up to date ifconfig. This was the behaviour from before. I believe the mechanism that wicontrol uses to fetch keys does not handle 104-bit keys so you see the zero-padded key string. I rebuilt ifconfig from the sources you checked in, although I didn't do a full buildworld. Does it depend on a library change somewhere? The key I got from wicontrol wasn't zero-padded. It showed all 26 hex characters. In a separate issue, the ath(4) driver can't see the 802.11a side of the wireless router at all when it is running in 108Mbps turbo mode. If I drop it down to 54Mbps, it sees it. (Works fine in Windows.) Is the ath(4) driver supposed to support the 108Mbps turbo mode? I was able to associate with an Atheros AP with turbo mode enabled but didn't get any higher throughput. I'm investigating this. FWIW I enabled turbo mode with: ifconfig ath0 mediaopt turbo I had to also set the mode to 11a before it wanted to accept the turbo option. Otherwise I got: ifconfig: SIOCSIFMEDIA (mediaopt): Device not configured Ah, yes. Turbo mode is orthogonal to 11a/b/g. You can use it with 11g too so you need to specifiy 11a or 11g. I was already locked in 11a mode. (But note that 11g+turbo is not yet supported by the driver.) Ahh. I take it there's no way for the driver/card to autodetect that there's a turbo network around and attach to it? (The Windows driver seems to find it..) Then I typed: # ifconfig ath0 mode 11a mediaopt turbo atalk 0.0 range 0-0 phase 2 Does it think I'm doing appletalk or something? Hmm, didn't see this, will have to check. It seems to see the base station in turbo mode now, but I'm still getting the authentication failed (reason 13) errors. 
Are you using WEP? As I explained WEP doesn't work right now. Yeah, I know. I need to see if I can get an IPSec tunnel running to the router. My guess is that it will probably only want to talk IPSec over its internet port. Ken -- Kenneth Merry [EMAIL PROTECTED] ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: ath(4) driver problems with WEP...
On Wed, Sep 17, 2003 at 12:43:08 -0700, Sam Leffler wrote:
> > I've got a Netgear WAG511 (Atheros 5212-based card) and a Netgear
> > FWAG114 wireless router. I've been trying to get the card and the
> > router talking under FreeBSD. (Both 802.11a and 802.11g work fine
> > under Windows on the same machine.)
> >
> > I'm using -current from September 15th.
> >
> > Anyway, whenever I try to get the card talking to the router, which is
> > running WEP (128 bit keys) on both the a and b/g sides, I get:
> >
> > ath0: authentication failed (reason 13) for [ base station MAC address ]
> > ath0: authentication failed (reason 13) for [ base station MAC address ]
> > ath0: authentication failed (reason 13) for [ base station MAC address ]
> > ath0: authentication failed (reason 13) for [ base station MAC address ]
> > ath0: authentication failed (reason 13) for [ base station MAC address ]
> >
> > Here's what the ifconfig looks like:
> >
> > ath0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> >         ether [ card mac address ]
> >         media: IEEE 802.11 Wireless Ethernet autoselect mode 11a (OFDM/6Mbps)
> >         status: no carrier
> >         ssid [my ssid] 1:[my ssid]
> >         channel -1 authmode OPEN powersavemode OFF powersavesleep 100
> >         wepmode MIXED weptxkey 1
> >         wepkey 1:128-bit wepkey 2:128-bit wepkey 3:128-bit wepkey 4:128-bit
> >
> > I've verified and re-verified, via cut-and-paste from the router setup
> > screen, that the WEP key is correct.
>
> Good news+bad news. I just committed a fix to ifconfig to correctly
> handle 128-bit WEP keys. I'm not sure how you thought you were setting
> your key up but ifconfig was barfing on anything more than 104 bits.
> FWIW ifconfig wrongly indicated keys 5 bytes (40 bits) were 128-bit
> keys; I also fixed that so ifconfig now indicates keys are 40-, 104-, or
> 128-bit according to their length. Beware also that wicontrol displays
> WEP keys longer than 104 bits zero-padded; I believe this is because of
> limitations in the RID API for fetching keys. Someone else may want to
> investigate that issue.
>
> The bad news is that with 128-bit keys installed I'm getting decryption
> errors at the AP. Actually, I'm seeing errors for any length key so
> it's likely a botch in the WEP frame construction in the driver. I've
> run out of time to look at this right now and will have to investigate
> later.

Hmm. One other thing I'm seeing is that when I configure a 128 bit key
with ifconfig or wicontrol (wicontrol shows all 28 characters -- 0x plus
26 hex characters), ifconfig still thinks it is a 104 bit key. This is
because ireq.i_len is 13.

> > Anyway, I can't get the ath(4) driver to talk to the router when it is
> > running WEP. I have been able to get it to talk 802.11g to the router
> > without WEP enabled, though.
> >
> > I tried setting the authmode to shared via ifconfig, but from looking
> > at ieee80211_ioctl.c:
> >
> > #if 0
> >         case IEEE80211_IOC_AUTHMODE:
> >                 sc->wi_authmode = ireq->i_val;
> >                 break;
> > #endif
> >
> > i.e. I get EINVAL back.
> >
> > Is WEP supposed to work in -current?
>
> authmode is not relevant. WEP worked at one time; I seem to have broken
> it. As I said above I will have to look at it later.

Okay.

> > In a separate issue, the ath(4) driver can't see the 802.11a side of
> > the wireless router at all when it is running in 108Mbps turbo mode.
> > If I drop it down to 54Mbps, it sees it. (Works fine in Windows.)
> >
> > Is the ath(4) driver supposed to support the 108Mbps turbo mode?
>
> I was able to associate with an Atheros AP with turbo mode enabled but
> didn't get any higher throughput. I'm investigating this. FWIW I
> enabled turbo mode with:
>
> ifconfig ath0 mediaopt turbo

I had to also set the mode to 11a before it wanted to accept the turbo
option. Otherwise I got:

ifconfig: SIOCSIFMEDIA (mediaopt): Device not configured

Then I typed:

# ifconfig ath0 mode 11a mediaopt turbo
atalk 0.0 range 0-0 phase 2

Does it think I'm doing appletalk or something?

It seems to see the base station in turbo mode now, but I'm still getting
the authentication failed (reason 13) errors.

> I verified turbo mode was in use by disabling it on either station or AP
> side and with things mismatched the station/AP couldn't see each other.
> With turbo mode enabled on each side I was able to associate and
> communicate as normal; but netperf throughput was identical to the
> non-turbo setup. I'm asking Atheros folks for clarification on this--I
> may need to do some additional setup work to enable turbo operation.
> This is actually the first time I've tried turbo mode...

Ahh. Thanks!

Ken
-- 
Kenneth Merry
[EMAIL PROTECTED]
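A note on the key-length arithmetic discussed above: 26 hex characters encode 13 bytes, which is 104 bits of key material, and that matches the ireq.i_len of 13. Vendors commonly label such a key "128-bit WEP" because a 24-bit IV is added per frame. The following is only a small C sketch of that mapping, assuming the 5/13/16-byte classification Sam describes; the function names wep_hex_key_bytes() and wep_key_label() are made up for illustration and are not taken from ifconfig.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: number of key bytes encoded by a hex string such
 * as "0x0123...".  Returns 0 if the string is empty, has an odd number
 * of digits, or contains a non-hex character.
 */
static size_t
wep_hex_key_bytes(const char *s)
{
	size_t n;

	if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X'))
		s += 2;		/* skip the optional 0x prefix */
	n = strlen(s);
	if (n == 0 || n % 2 != 0 ||
	    strspn(s, "0123456789abcdefABCDEF") != n)
		return (0);
	return (n / 2);		/* two hex digits per byte */
}

/*
 * Hypothetical helper: label a key by its length in bytes, following
 * the 40-/104-/128-bit classification described in the message above.
 */
static const char *
wep_key_label(size_t len)
{
	switch (len) {
	case 5:
		return ("40-bit");
	case 13:
		return ("104-bit");
	case 16:
		return ("128-bit");
	default:
		return ("unknown");
	}
}
```

With these helpers, a string of 0x plus 26 hex characters yields 13 bytes and the label "104-bit", which is consistent with what ifconfig reported.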
ath(4) driver problems with WEP...
I've got a Netgear WAG511 (Atheros 5212-based card) and a Netgear FWAG114
wireless router. I've been trying to get the card and the router talking
under FreeBSD. (Both 802.11a and 802.11g work fine under Windows on the
same machine.)

I'm using -current from September 15th.

Anyway, whenever I try to get the card talking to the router, which is
running WEP (128 bit keys) on both the a and b/g sides, I get:

ath0: authentication failed (reason 13) for [ base station MAC address ]
ath0: authentication failed (reason 13) for [ base station MAC address ]
ath0: authentication failed (reason 13) for [ base station MAC address ]
ath0: authentication failed (reason 13) for [ base station MAC address ]
ath0: authentication failed (reason 13) for [ base station MAC address ]

Here's what the ifconfig looks like:

ath0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        ether [ card mac address ]
        media: IEEE 802.11 Wireless Ethernet autoselect mode 11a (OFDM/6Mbps)
        status: no carrier
        ssid [my ssid] 1:[my ssid]
        channel -1 authmode OPEN powersavemode OFF powersavesleep 100
        wepmode MIXED weptxkey 1
        wepkey 1:128-bit wepkey 2:128-bit wepkey 3:128-bit wepkey 4:128-bit

I've verified and re-verified, via cut-and-paste from the router setup
screen, that the WEP key is correct.

Anyway, I can't get the ath(4) driver to talk to the router when it is
running WEP. I have been able to get it to talk 802.11g to the router
without WEP enabled, though.

I tried setting the authmode to shared via ifconfig, but from looking at
ieee80211_ioctl.c:

#if 0
        case IEEE80211_IOC_AUTHMODE:
                sc->wi_authmode = ireq->i_val;
                break;
#endif

i.e. I get EINVAL back.

Is WEP supposed to work in -current?

In a separate issue, the ath(4) driver can't see the 802.11a side of the
wireless router at all when it is running in 108Mbps turbo mode. If I
drop it down to 54Mbps, it sees it. (Works fine in Windows.)

Is the ath(4) driver supposed to support the 108Mbps turbo mode?

Thanks,

Ken
-- 
Kenneth Merry
[EMAIL PROTECTED]
Re: scsi_cd or atapicam crash in current.
On Fri, Sep 12, 2003 at 08:57:22 -0700, Kevin Oberman wrote:
> I am seeing a peculiar, possibly timing sensitive, crash that looks like
> it is probably in either atapicam or scsi_cd. The system is CURRENT as
> of yesterday morning.
>
> The crash happens frequently when nautilus starts up. It does not always
> crash, but does so fairly frequently and leaves my laptop locked in X
> with no access to the console. If nautilus starts, the system continues
> without problems until X is terminated and restarted.
>
> I managed to get a panic printout by switching back to vty0 (console)
> while the X startup was in progress and I am entering the panic by hand.
> Slight chance of a typo, but I have checked it a couple of times. For
> some reason I can't explain, I didn't get a crash dump, but I probably
> can get one after a future crash.
>
> The easy fix is to remove the DVD/CD-RW drive.
>
> FWIW, the system is an IBM T30 and it happens with either APM or ACPI.
> I am attaching the dmesg and the config file. Hopefully the mailer won't
> strip them.
>
> Fatal trap 18: integer divide fault while in kernel mode
> instruction pointer     = 0x8:0xc0139a8b
> stack pointer           = 0x10:0xdd5b6a38
> frame pointer           = 0x10:0xdd5b6a80
> code segment            = base 0x0, limit 0x, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = Interrupt enabled, resume, IOPL = 0
> current process         = 737 (nautilus)
> kernel: type 18 trap, code 0
> Stopped at      cdstart+0xcb:   divl    0x30(%ebx),%eax
> db> tr
> cdstart(c419d500,c4192000,1,c407cc30,c407cc00) at cdstart+0xcb
> xpt_run_dev_allocq(c40b8c00,c407cc08,1,c418d800,c419d500) at xpt_run_dev_allocq+0xab
> xpt_schedule(c419d500,1,0,ce54ec78,dd5b6c70) at xpt_schedule+0xca
> cdstrategy(ce54ec78,0,0,0,d439f000) at cdstrategy+0x88
> physio(c4197700,dd5b6c70,10,dd5b6b78,c03f4900) at physio+0x2df
> spec_read(dd5b6bd0,dd5b6c20,c02b35e3,dd5b6bd0,1020002) at spec_read+0x19a
> spec_vnoperate(dd5b6bd0,1020002,c470c850,0,dd5b6c70) at spec_vnoperate+0x18
> vn_read(c489d8c4,dd5b6c70,c478ee00,0,c470c850) at vn_read+0x1a3
> dofileread(c470c850,c489d8c4,12,bfbfeb40,800) at dofileread+0xd9
> read(c470c850,dd5b6d10,c,c,3) at read+0x6b
> syscall(2f,2f,2f,80cb000,0) at syscall+0x2b0
> Xint0x80_syscall() at Xint0x80_syscall+0x1d
> --- syscall (3, FreeBSD ELF32, read), eip = 0x28da2b5f, esp = 0xbfbfeadc, ebp = 0xbfbfeb08 ---
> db>

Other folks have reported seeing bogus values returned from read capacity
for atapicam-attached drives. I've seen it on my laptop as well (which
runs -current). (Only since the ATAng code went in. It worked fine
before.)

cdstart() uses the blocksize to try to figure out the LBA to pass to the
SCSI read or write commands, so that's likely what's causing the integer
divide fault: a blocksize of zero would make that division trap.

What does dmesg say about the size of the disk in the drive? Do you have
a CD in the drive?

What happens when you do:

camcontrol cmd cd0 -v -c "25 0 0 0 0 0 0 0 0 0" -i 8 "i4 i4"

That should give you the media size and blocksize of the CD in the drive,
or an error if you don't have any media.

If you're getting bogus values for the media/blocksize, or if it says
there's a disk there when there isn't one, then you've got a problem
either with the ATAPI or atapicam code.

Ken
-- 
Kenneth Merry
[EMAIL PROTECTED]
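For reference, the 8 bytes returned by that READ CAPACITY command (opcode 0x25) are two big-endian 32-bit integers: the last logical block address and the block length in bytes. The sketch below, in plain userland C, decodes such a buffer and shows the kind of zero check that would avoid a divide fault when converting a byte offset to an LBA. It is an illustration under those assumptions only; the struct and function names are hypothetical and this is not the code in cdstart().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the two fields of READ CAPACITY(10) data. */
struct read_capacity {
	uint32_t last_lba;	/* address of the last logical block */
	uint32_t block_len;	/* block length in bytes */
};

/* Decode a big-endian 32-bit integer from four bytes. */
static uint32_t
be32(const uint8_t *p)
{
	return (((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3]);
}

/* Split the 8-byte response into its two fields. */
static struct read_capacity
parse_read_capacity(const uint8_t data[8])
{
	struct read_capacity rc;

	rc.last_lba = be32(data);
	rc.block_len = be32(data + 4);
	return (rc);
}

/*
 * Convert a byte offset to an LBA.  Returns 0 on success, or -1 if the
 * device reported a block length of zero, in which case the division
 * must not be attempted.
 */
static int
offset_to_lba(uint64_t byte_offset, uint32_t block_len, uint64_t *lba)
{
	if (block_len == 0)
		return (-1);
	*lba = byte_offset / block_len;
	return (0);
}
```

A drive that reports a block length of 0 makes offset_to_lba() return -1 instead of dividing, which is the failure mode the trace above points at.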
Re: wi0: cardbus card activation failed
On Thu, Sep 04, 2003 at 10:02:10 -0600, M. Warner Losh wrote:
> In message: [EMAIL PROTECTED]
>             [EMAIL PROTECTED] writes:
> : Hello,
> : I have same problem as
> : http://lists.freebsd.org/pipermail/freebsd-current/2003-August/008948.html,
> : but with other PCMCIA card - new Proxim Orinoco.
> : When I plug it in, I get:
> :
> : cardbus0: <network, ethernet> at device 0.0 (no driver attached)
> : cbb0: CardBus card activation failed
> :
> : I use kernel from yesterday, pccbb.c ver 1.95.
>
> You need a driver. This message says that none attached. Maybe
> FreeBSD doesn't support this chip yet? Maybe it is supported by the
> atheros driver (ath).

I have the same problem with my fxp card, although it isn't because I
don't have a driver or I don't have the card attached. If I pull it out
and re-insert it, it probes.

When I boot:

cardbus0: Resource not specified in CIS: id=10, size=2000
cardbus0: <network, ethernet> at device 0.0 (no driver attached)
cbb0: CardBus card activation failed

After re-inserting it:

fxp0: <Intel 82559ER Pro/100 Ethernet> port 0x1000-0x103f mem 0xf602-0xf603,0xf604-0xf6040fff irq 11 at device 0.0 on cardbus0
fxp0: Ethernet address 00:03:47:49:82:2b
miibus0: <MII bus> on fxp0
inphy0: <i82555 10/100 media interface> on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto

(This is -current from August 16th.)

Ken
-- 
Kenneth Merry
[EMAIL PROTECTED]
Re: make buildworld errors (libcam)
On Wed, Sep 03, 2003 at 03:03:04 -0700, Don Lewis wrote:
> On 3 Sep, Michael Bretterklieber wrote:
> > Hi,
> >
> > buildworld fails (cvsup some minutes ago):
> > In file included from /usr/src/sys/cam/scsi/scsi_da.c:51:
> > /usr/src/sys/sys/taskqueue.h:33:2: #error "no user-servicable parts inside"
> > mkdep: compile failed
>
> The following patch works for me:

Ack! Sorry about that! Pass the pointy hat...

It's fixed now.

Ken
-- 
Kenneth Merry
[EMAIL PROTECTED]
Re: need some debugging help
On Sun, Aug 31, 2003 at 12:52:47 +0200, Poul-Henning Kamp wrote:
> In message [EMAIL PROTECTED], Kenneth D. Merry writes:
> > Anyway, I got some debugging output, and I've attached dmesg output.
> > Let me know whether anything in there looks suspicious or points to a
> > possible problem.
>
> There's nothing which jumps out at me, and I guess the best strategy
> is hunting down the devbuf thing by changing all users of M_DEVBUF
> until something trips...

Thanks. That did the trick. As it turns out, it was a one-line problem
in the da(4) patches that was causing the problem.

Anyway, that's fixed, and things seem to work fine. I've attached a new
version of the patches. I'll try to come up with a -stable version
that'll fix things there as well.

If anyone wants to take a look at the way I'm using mutexes here,
especially in the new taskqueue thread, I'd appreciate it. In
particular, I went through some interesting permutations in
taskqueue_kthread() to make things work right:

 - I tried holding Giant when calling tsleep, but it complained that I
   didn't own Giant.

 - I tried not holding a mutex at all when calling tsleep, but ran into
   this assert in msleep():

   KASSERT(timo != 0 || mtx_owned(&Giant) || mtx != NULL,
       ("sleeping without a mutex"));

 - I tried just holding a mutex all the time, but obviously you can't
   malloc while holding a mutex (except Giant), and the sysctl code does
   a number of mallocs. (The original cause of this problem -- M_WAITOK
   mallocs.)

So in the end, I just acquire a mutex, drop it for taskqueue_run(),
re-acquire it and pass it into the msleep call so that it can drop it
and re-acquire it for me. There's no other reason for it. The taskqueue
stuff already has its own mutex that isn't exposed to taskqueue_run(),
and it shouldn't be held anyway when the task's function is called.

I also put code in the sysctl functions in the cd(4) and da(4) drivers to
acquire Giant, since I'm assuming that the sysctl code needs it.

Comments are welcome.
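To make the locking pattern easier to discuss, here is a userland analogy using POSIX threads. It is only a sketch, not the kernel code in the attached patch: pthread_cond_wait() stands in for msleep() (both release the mutex while asleep and re-take it on wakeup), and struct taskq, taskq_thread(), taskq_enqueue() and run_demo() are names invented for this example.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userland stand-in for a kernel task. */
struct task {
	void (*func)(void *arg);
	void *arg;
	struct task *next;
};

struct taskq {
	pthread_mutex_t lock;	/* protects head, tail and shutdown */
	pthread_cond_t wakeup;	/* signalled on enqueue and on shutdown */
	struct task *head, *tail;
	bool shutdown;
};

static void
taskq_enqueue(struct taskq *q, struct task *t)
{
	t->next = NULL;
	pthread_mutex_lock(&q->lock);
	if (q->tail != NULL)
		q->tail->next = t;
	else
		q->head = t;
	q->tail = t;
	pthread_cond_signal(&q->wakeup);
	pthread_mutex_unlock(&q->lock);
}

static void *
taskq_thread(void *arg)
{
	struct taskq *q = arg;
	struct task *t;

	pthread_mutex_lock(&q->lock);
	for (;;) {
		while ((t = q->head) != NULL) {
			q->head = t->next;
			if (q->head == NULL)
				q->tail = NULL;
			/* Drop the lock so the task may block or allocate. */
			pthread_mutex_unlock(&q->lock);
			t->func(t->arg);
			pthread_mutex_lock(&q->lock);
		}
		if (q->shutdown)
			break;
		/* Releases the lock while asleep, re-takes it on wakeup. */
		pthread_cond_wait(&q->wakeup, &q->lock);
	}
	pthread_mutex_unlock(&q->lock);
	return (NULL);
}

static void
count_task(void *arg)
{
	(*(int *)arg)++;	/* only the worker thread touches this */
}

/* Queue up to 16 tasks, let the worker drain them, return how many ran. */
static int
run_demo(int ntasks)
{
	struct taskq q = { .head = NULL, .tail = NULL, .shutdown = false };
	struct task tasks[16];
	pthread_t tid;
	int ran = 0;

	if (ntasks > 16)
		ntasks = 16;
	pthread_mutex_init(&q.lock, NULL);
	pthread_cond_init(&q.wakeup, NULL);
	pthread_create(&tid, NULL, taskq_thread, &q);
	for (int i = 0; i < ntasks; i++) {
		tasks[i].func = count_task;
		tasks[i].arg = &ran;
		taskq_enqueue(&q, &tasks[i]);
	}
	pthread_mutex_lock(&q.lock);
	q.shutdown = true;
	pthread_cond_signal(&q.wakeup);
	pthread_mutex_unlock(&q.lock);
	pthread_join(tid, NULL);	/* worker drains the queue, then exits */
	pthread_cond_destroy(&q.wakeup);
	pthread_mutex_destroy(&q.lock);
	return (ran);
}
```

The point of the shape is that the lock is never held while a task function runs, so a task that sleeps in an allocator does not do so with the mutex held, yet the thread still always sleeps with a mutex to hand.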
Thanks,

Ken
-- 
Kenneth Merry
[EMAIL PROTECTED]

//depot/FreeBSD-ken/src/sys/cam/scsi/scsi_cd.c#39 - /usr/home/ken/perforce2/FreeBSD-ken/src/sys/cam/scsi/scsi_cd.c
*** /tmp/tmp.319.0	Mon Sep  1 00:33:39 2003
--- /usr/home/ken/perforce2/FreeBSD-ken/src/sys/cam/scsi/scsi_cd.c	Mon Sep  1 00:21:23 2003
***************
*** 62,67 ****
--- 62,68 ----
  #include <sys/dvdio.h>
  #include <sys/devicestat.h>
  #include <sys/sysctl.h>
+ #include <sys/taskqueue.h>
  
  #include <cam/cam.h>
  #include <cam/cam_ccb.h>
***************
*** 154,159 ****
--- 155,161 ----
  	eventhandler_tag	clonetag;
  	int			minimum_command_size;
  	int			outstanding_cmds;
+ 	struct task		sysctl_task;
  	struct sysctl_ctx_list	sysctl_ctx;
  	struct sysctl_oid	*sysctl_tree;
  	STAILQ_HEAD(, cd_mode_params)	mode_queue;
***************
*** 598,603 ****
--- 600,642 ----
  	}
  }
  
+ static void
+ cdsysctlinit(void *context, int pending)
+ {
+ 	struct cam_periph *periph;
+ 	struct cd_softc *softc;
+ 	char tmpstr[80], tmpstr2[80];
+ 
+ 	periph = (struct cam_periph *)context;
+ 	softc = (struct cd_softc *)periph->softc;
+ 
+ 	snprintf(tmpstr, sizeof(tmpstr), "CAM CD unit %d", periph->unit_number);
+ 	snprintf(tmpstr2, sizeof(tmpstr2), "%d", periph->unit_number);
+ 
+ 	mtx_lock(&Giant);
+ 
+ 	sysctl_ctx_init(&softc->sysctl_ctx);
+ 	softc->sysctl_tree = SYSCTL_ADD_NODE(&softc->sysctl_ctx,
+ 		SYSCTL_STATIC_CHILDREN(_kern_cam_cd), OID_AUTO,
+ 		tmpstr2, CTLFLAG_RD, 0, tmpstr);
+ 
+ 	if (softc->sysctl_tree == NULL) {
+ 		printf("cdsysctlinit: unable to allocate sysctl tree\n");
+ 		return;
+ 	}
+ 
+ 	/*
+ 	 * Now register the sysctl handler, so the user can the value on
+ 	 * the fly.
+ 	 */
+ 	SYSCTL_ADD_PROC(&softc->sysctl_ctx,SYSCTL_CHILDREN(softc->sysctl_tree),
+ 		OID_AUTO, "minimum_cmd_size", CTLTYPE_INT | CTLFLAG_RW,
+ 		&softc->minimum_command_size, 0, cdcmdsizesysctl, "I",
+ 		"Minimum CDB size");
+ 
+ 	mtx_unlock(&Giant);
+ }
+ 
  /*
   * We have a handler function for this so we can check the values when the
   * user sets them, instead of every time we look at them.
***************
*** 642,648 ****
  	struct ccb_setasync csa;
  	struct ccb_pathinq cpi;
  	struct ccb_getdev *cgd;
! 	char tmpstr[80], tmpstr2[80];
  	caddr_t match;
  
  	cgd = (struct ccb_getdev *)arg;
--- 681,687 ----
  	struct ccb_setasync csa;
  	struct ccb_pathinq cpi;
  	struct ccb_getdev *cgd;
! 	char tmpstr[80];
  	caddr_t match;
  
  	cgd = (struct ccb_getdev *)arg;
***************
*** 696,712 ****
  	if (cpi.ccb_h.status == CAM_REQ_CMP && (cpi.hba_misc & PIM_NO_6_BYTE))
  		softc->quirks |= CD_Q_10_BYTE_ONLY;
! 	snprintf(tmpstr, sizeof
Re: need some debugging help
On Mon, Sep 01, 2003 at 02:23:18 +0200, Pawel Jakub Dawidek wrote:
> On Mon, Sep 01, 2003 at 02:13:45AM +0200, Pawel Jakub Dawidek wrote:
> + I was getting the same panics while I was working on GEOM Gate.
> + After many hours of debugging I tracked this down - I had initialized
> + a mutex, but I hadn't destroyed it.
> +
> + As I suspect you're loading cd(4) as a kld module?
>
> No, you don't need to load it as a kld module, because you initialize
> this mutex on every function call (and the mutex is locally allocated
> too), so try putting mtx_destroy() at the end of this function; this
> should help.
> (I hope there is no problem with calling msleep(9) with a mutex from
> the stack.)

Well, keep in mind that this function, taskqueue_kthread(), is only
called once, when the kthread is forked off. It then runs in an infinite
loop forever.

So far it doesn't seem like there's any problem with calling msleep()
with a mutex allocated on the stack.

The problem I was having turned out to be that I forgot to dereference
periph->softc in dasysctlinit().

Ken
-- 
Kenneth Merry
[EMAIL PROTECTED]