Re: Any objections/comments on axing out old ATA stack?
On 20.04.2013 23:29, Jeremy Chadwick wrote:
> > My feeling is that the stalls are mostly from the error handler and the overall time the drive is frozen gets shorter. If it had not _felt_ faster, I'd not have left that in sysctl.conf in the first place.
>
> Your understanding of what that sysctl does is wrong, or I'm misunderstanding what you're saying (very possible!).

What I am saying is a high-level view of the situation. If I leave the default slot timeout set, then whenever the computer gets into an episode of stalls, it becomes unusable (all I/O is stalled, so anything that needs disk I/O will hang) for so long that it is much faster to press the reset button, reboot, force fsck, and retry. This usually entails hand-holding and manually cleaning up debris, such as b0rked .o files from a buildworld, or similar.

These stalls happen in the middle of a buildworld, under heavy I/O, so I'd dispute that excessive head unloading or drive spindown is the issue. The computer (and its fans in particular) is generally very quiet, with no VGA board (just the fanless onboard Radeon HD 3300), so I could hear re-spinups or parking heads. I don't hear anything like that.

I don't know how the rescheduling of commands that timed out happens overall.

> How I interpret what you're saying: that the sysctl somehow decreases stall times during I/O operations that fail. This is incorrect.

That may not be the intention of the sysctl, but it is the high-level outcome.

> What that sysctl does is define the number of seconds that transpire ***before*** the CAM layer says "Okay, I didn't get a response to the ATA CDB I sent the disk", and then re-submits the same CDB to the disk.

The other question (to Alexander Motin) then is why I see the timeouts for the related slots roughly $timeout seconds apart.

Alexander, is there any way we can make the kernel dump the entire set of pending NCQ queue entries, including submitted timestamp or timeout values, so that we can see how much workload is queued?

Note also that the CRC count has not increased since I put the smartctl output online, it's still at 14 -- I would have to see CRC errors and their consequences in Linux or Windows, too. Linux's smartd 5.41 never mailed about an increase of the CRC value, and I told it not to mail temperature changes.

> Rephrased: in the case of a disk stalling on an I/O request, you will experience the effects of that stall no matter what that sysctl is set to. A lower value in that sysctl will result in CAM spitting out nasties on the console + hitting the CDB retry submission scenario sooner, which if the drive is awake/responsive by that time will go smoothly. That's all it does.

That's how you have explained it and how I have understood it on the queue-slot level (microscopic), but at a larger scale, I do not observe that the shorter timeout sysctl value makes these stall episodes happen more often (as should be the consequence if spindown were the cause of the stalls); only recovery is faster.

> Thus a value of 5 indicates a device/drive did not respond to a CDB within 5 seconds, and a value of 30 indicates a device/drive did not respond to a CDB within 30 seconds. Regardless, those lengths of time are VERY long for an I/O operation on a mechanical HDD.

Indeed they are, and because /usr is on the offending drive, I lowered the value to 5 s, which I still deem conservative. I know that an older ATA standard edition permitted longer completion times for flushing HDD internal write caches to platters (15 s IIRC).

> Oh look, it's the Samsung SpinPoint series, especially the EcoGreen (EG) series. No joke: ~60% of the problem reports I deal with when it comes to weird wonky problems stem from this drive series. I have no idea why, but they're a common pain point for me.

I know they are, especially the larger siblings 1.5 G up.

> Politely, your analysis of the drive ("looks sane to me") is an indicator of why SMART output needs to be interpreted by a person who is familiar with the information. That drive *does not* look sane to me. :-)

14 CRC errors on a drive that moved through computers that got modified over time, that does not run the whole day, and that was first attached to a computer whose controller (VIA garbage) could only talk to 1.5 Gb/s ATA drives but not 3 Gb/s ones is not something I care about.

> Key points about these errors:
> [...]
> - These are conditions that short, long, select (LBA range scan), and conveyance SMART tests would probably not detect.
>
> Like I said: it seems to be all over the board.

I agree that it is more likely to be a communications issue between FreeBSD and the drive's logic, with all components, hard- and software, involved.

> Bernd Walter responded indicating that his experience indicated that the issue related to NCQ compatibility. This would not surprise me.

Neither would it surprise me, but Linux should suffer, too, then. It does use NCQ, too.

FreeBSD can be booted
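The two positions in this message (the per-command meaning of the timeout versus the observation that recovery is faster with a lower value) can be put side by side in a toy model. This is a minimal sketch and not a description of what CAM actually does: it assumes, purely for illustration, that a command sent while the drive is unresponsive is lost, that the host resubmits it every `timeout_s` seconds, and that service time is negligible. The function name `recovery_time` and both parameters are hypothetical.

```python
import math


def recovery_time(stall_s, timeout_s):
    """Seconds until a command completes under the toy model.

    ASSUMPTIONS (not taken from the CAM sources): the drive ignores
    everything for `stall_s` seconds, the original command is lost, and
    the host resubmits at timeout_s, 2*timeout_s, ... until one
    resubmission lands at or after the end of the stall.
    """
    if stall_s <= 0:
        return 0
    retries_needed = math.ceil(stall_s / timeout_s)
    return retries_needed * timeout_s


if __name__ == "__main__":
    for stall in (7, 31):
        for timeout in (30, 5):
            print(f"stall={stall:>2}s timeout={timeout:>2}s "
                  f"-> completes after {recovery_time(stall, timeout)}s")
```

Under these assumptions the stall itself is the same length for both settings, yet the command completes sooner with the lower timeout because the first successful resubmission happens earlier. If the drive instead answers the original command late rather than losing it, the timeout value would not shorten anything.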
Re: Any objections/comments on axing out old ATA stack?
On 21.04.2013 00:29, Jeremy Chadwick wrote:
> - The ATA commands which lead up to the error also vary. Many are for write requests, and from some entries I can see that the OS was doing NCQ writes (WRITE FPDMA QUEUED) and then suddenly decided to do a classic 28-bit LBA write (WRITE DMA). I'm not sure why an OS would do this (there's nothing optimal about it) unless there were conditions occurring where the OS/ATA driver said "this NCQ write isn't working (timeout, etc.), let me retry with a classic 28-bit LBA write".

The ATA disk driver in CAM inserts a non-queued command every few seconds of continuous load to limit possible command starvation inside the disk. The SCSI driver does a similar thing, but instead of issuing a different command it sets the ordered command flag, which does not exist in SATA.

-- 
Alexander Motin
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Any objections/comments on axing out old ATA stack?
On Sun, Apr 21, 2013 at 02:11:04PM +0300, Alexander Motin wrote:
> On 21.04.2013 00:29, Jeremy Chadwick wrote:
> > - The ATA commands which lead up to the error also vary. Many are for write requests, and from some entries I can see that the OS was doing NCQ writes (WRITE FPDMA QUEUED) and then suddenly decided to do a classic 28-bit LBA write (WRITE DMA). I'm not sure why an OS would do this (there's nothing optimal about it) unless there were conditions occurring where the OS/ATA driver said "this NCQ write isn't working (timeout, etc.), let me retry with a classic 28-bit LBA write".
>
> ATA disk driver in CAM inserts non-queued command every several seconds of continuous load to limit possible command starvation inside the disk. SCSI driver does alike things, but inserts ordered command flag, that does not exist in SATA, instead of different command.

Thanks for the insights Alexander, greatly appreciated.

I'm a little confused by your description, because if I'm reading it right, it sounds like it conflicts with what the ACS-2 spec states. Quoting T13/2015-D rev 3 (I'm aware it's a working draft), section 4.16.1:

  "If the device receives a command that is not an NCQ command while NCQ commands are in the queue, then the device shall return command aborted for the new command and for all of the NCQ commands that are in the queue."

I assume this means ABRT status is returned to the host controller; if so (and by design of course), how do we differentiate between that condition and any other I/O condition that induces ABRT?

Possibly the answer is in this admission: I should probably get around to reading ATA8-AST sometime. :-)

-- 
| Jeremy Chadwick  j...@koitsu.org |
| UNIX Systems Administrator  http://jdc.koitsu.org/ |
| Mountain View, CA, US |
| Making life hard for others since 1977.  PGP 4BD6C0CB |
Re: Any objections/comments on axing out old ATA stack?
ATA controller drivers delay conflicting commands, which avoids the conflict in the device.

On 21.04.2013 14:32, Jeremy Chadwick j...@koitsu.org wrote:
> On Sun, Apr 21, 2013 at 02:11:04PM +0300, Alexander Motin wrote:
> > ATA disk driver in CAM inserts non-queued command every several seconds of continuous load to limit possible command starvation inside the disk. SCSI driver does alike things, but inserts ordered command flag, that does not exist in SATA, instead of different command.
>
> [...] I'm a little confused by your description, because if I'm reading it right, it sounds like it conflicts with what the ACS-2 spec states. Quoting T13/2015-D rev 3 (I'm aware it's a working draft), section 4.16.1:
>
>   "If the device receives a command that is not an NCQ command while NCQ commands are in the queue, then the device shall return command aborted for the new command and for all of the NCQ commands that are in the queue."
>
> I assume this means ABRT status is returned to the host controller; if so (and by design of course), how do we differentiate between that condition and any other I/O condition that induces ABRT?
> [...]
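The idea in the reply above (the host side holds back a conflicting command so the device never sees a non-NCQ command while NCQ commands are queued) can be illustrated with a toy scheduler. This is only a sketch under stated assumptions and not the FreeBSD driver logic: the function `batch_commands`, the `(name, is_ncq)` tuples and the batch model are hypothetical, and real tags complete individually rather than in whole batches.

```python
def batch_commands(commands, max_tags=32):
    """Group commands into batches that may be outstanding together.

    `commands` is a list of (name, is_ncq) tuples in submission order.
    NCQ commands share a batch (up to `max_tags` of them); a non-NCQ
    command is delayed until the current batch has drained and then
    runs alone, so the two kinds never overlap in the device.
    """
    batches = []
    current = []  # NCQ commands that would be in flight together
    for name, is_ncq in commands:
        if is_ncq:
            if len(current) == max_tags:
                batches.append(current)
                current = []
            current.append(name)
        else:
            if current:
                batches.append(current)  # drain queued commands first
                current = []
            batches.append([name])       # non-queued command runs alone
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    stream = [("W1", True), ("W2", True), ("WRITE DMA", False), ("W3", True)]
    print(batch_commands(stream))
```

With the sample stream, the two queued writes form one batch, WRITE DMA is issued on its own once they are done, and queued operation resumes afterwards, so the abort condition quoted from ACS-2 is never triggered in this model.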
Re: Any objections/comments on axing out old ATA stack?
On Thu, Apr 04, 2013 at 12:15:32AM +0200, Matthias Andree wrote:
> I have just sent more information to the PR at http://www.freebsd.org/cgi/query-pr.cgi?pr=157397
>
> The short summary (more info in the PR) is:
>
> - limiting tags to 31 does not help
> - disabling NCQ appears to help in initial testing, but warrants more testing
> - error happens during WRITE_FPDMA_QUEUED,
> - File system in question is SU+J UFS2 mounted on /usr, and I can for instance rm -rf /usr/obj or just log into GNOME and try to open a gnome-terminal to trigger stalls;
> - Linux uses 31 tags (for different reason) and has no drive quirks, but a controller quirk; for Jeremy's topic #6, regarding the ATI/AMD SB7x0 that I am using, it might be worthwhile investigating the AHCI_HFLAG_IGN_SERR_INTERNAL flag - it gets set by Linux on the SB700 that my computer is using, see ahci_error_intr() in libahci.h - I am not going to interpret that for lack of expertise, but it does affect error handling and appears to ignore a certain condition.
>
> Why only my Samsung HDD drive triggers this but not the WD drive, I do not know yet.

I have had data corruption with a Samsung drive and CAM connected to an onboard Intel AHCI controller. The system was known good running an older FreeBSD version and was brought back into service for another use case with a fresh installation. Regularly, on major filesystem write activity, we got random FS corruption and panics. My assumption was broken NCQ firmware on the drive, but I have nothing to prove this assumption. We switched to the old ata driver and lived with this until we replaced the whole machine. I don't know if the machine still exists somewhere.

-- 
B.Walter be...@bwct.de http://www.bwct.de
Modbus/TCP Ethernet I/O modules, ARM-based FreeBSD computers, and much more.
Re: Any objections/comments on axing out old ATA stack?
On Thu, Apr 04, 2013 at 10:00:18AM +0200, Matthias Andree wrote:
> On 04.04.2013 03:05, Jeremy Chadwick wrote:

{ snipping stuff I have no comment on. reference thread: }
{ http://lists.freebsd.org/pipermail/freebsd-stable/2013-April/073036.html }

> > One piece of evidence that refutes my theory is that if Windows and/or Linux partition are something you boot into and use often, I would imagine NCQ would be used in both of those environments and would suffer from the same issue. Although Windows tends to hide all sorts of transient errors from the user (sigh), Linux tends to be like FreeBSD with regards to such issues (on the console anyway; you wouldn't see such messages normally inside of X).
>
> Now, the FreeBSD slice is the only partition on that disk that would likely see concurrent write accesses (think make -j8 on a quadcore computer) which is more prone to ferret out such alignment contention. The NTFS partition is aligned on a multi-MB boundary, so wouldn't hit the problem anyways. The Linux partition is in ext4 format for mostly sequential access to files usually in excess of 10 MB each.
>
> Linux's ext4 jumps through several hoops to end up with bulk writes, like extents, delayed allocations (to avoid fragmentation), reordering of data and metadata writes, serialized log writes and all that stuff, and it would appear I am permitting it to cache writes -- Linux uses write barriers to enforce proper ordering of journal/meta-data writes. It would be rather hard to hit ATA taskfile timeouts, the expected rate with which the drive needs to do a partial write is orders of magnitude lower.
>
> Any good concurrent write exercise tools for Unix that I could run on the Linux ext4 partition that you would propose?

The only tool I'm familiar with is bonnie++.

But I don't think this (partition alignment) is what matters now. Your smartctl output has shed some light on your situation.

> > > - I am running with kern.cam.ada.default_timeout=5 which makes the computer recover faster
> >
> > I can definitely imagine cases where a drive using NCQ but doing writes to a non-aligned partition could take longer than 5 seconds to respond to an ATA CDB (this is different than a SATA or AHCI layer timeout). I am not telling you to change this back to 30, but it might not be helping your situation at all given my above theory.
>
> My feeling is that the stalls are mostly from the error handler and the overall time the drive is frozen gets shorter. If it had not _felt_ faster, I'd not have left that in sysctl.conf in the first place.

Your understanding of what that sysctl does is wrong, or I'm misunderstanding what you're saying (very possible!).

How I interpret what you're saying: that the sysctl somehow decreases stall times during I/O operations that fail. This is incorrect.

What that sysctl does is define the number of seconds that transpire ***before*** the CAM layer says "Okay, I didn't get a response to the ATA CDB I sent the disk", and then re-submits the same CDB to the disk.

Rephrased: in the case of a disk stalling on an I/O request, you will experience the effects of that stall no matter what that sysctl is set to. A lower value in that sysctl will result in CAM spitting out nasties on the console + hitting the CDB retry submission scenario sooner, which if the drive is awake/responsive by that time will go smoothly. That's all it does.

Thus a value of 5 indicates a device/drive did not respond to a CDB within 5 seconds, and a value of 30 indicates a device/drive did not respond to a CDB within 30 seconds. Regardless, those lengths of time are VERY long for an I/O operation on a mechanical HDD.

When you get to the bottom of my Email, you'll understand why I screamed at you about adjusting that sysctl.

> > Finally: could you please provide output from smartctl -x /dev/ada1? I would like to rule out any possibility of your drive having some other kind of issue that might cause it to go catatonic. Thanks.
>
> I have fetched the data with Linux this time (should not make a difference as it's all drive internal data, not host OS stuff). Looks sane to me, http://people.freebsd.org/~mandree/smartctl.log. I'll be happy to refetch this data with a more current smartctl version under FreeBSD if required.

Oh look, it's the Samsung SpinPoint series, especially the EcoGreen (EG) series. No joke: ~60% of the problem reports I deal with when it comes to weird wonky problems stem from this drive series. I have no idea why, but they're a common pain point for me.

First, about the shown sector size: smartmontools 5.41 was the first release to show the sector sizes per ATA IDENTIFY. I assume they got this right from the get-go. So as of this moment I'm going to assume that this drive really is a 512-byte sector drive.

Politely, your analysis of the drive ("looks sane to me") is an indicator of why SMART output needs to be interpreted by a person who is familiar with
Re: Any objections/comments on axing out old ATA stack?
On 04/04/2013 09:00, Matthias Andree wrote:
> Any good concurrent write exercise tools for Unix that I could run on the Linux ext4 partition that you would propose?

benchmarks/fio is good for that.

-- 
Bruce Cran
Re: Any objections/comments on axing out old ATA stack?
On 04.04.2013 03:05, Jeremy Chadwick wrote:
> > > Please provide gpart show -p ada1 output, both here and in the PR, if you could.
> >
> > =>        63  1953525105    ada1  MBR  (931G)
> >           63   209714337  ada1s1  freebsd  [active]  (100G)
> >    209714400         800          - free -  (400k)
> >    209715200    71680000  ada1s2  ntfs  (34G)
> >    281395200       15405          - free -  (7.5M)
> >    281410605   488263545  ada1s3  linux-data  (232G)
> >    769674150  1183851018          - free -  (564G)

Thanks for all the useful information provided so far (including further down). I know some of that already, but am not going to complain because it is very useful in the logs.

> The problem here is that I cannot guarantee you that alignment is the problem. The performance impact of writes to partitions which are non-aligned is quite high, and NCQ just exacerbates this problem. I would love to tell you "switch to GPT and follow Warren Block's document***" but if your NTFS partition is Windows and is a Windows version older than Windows 7, GPT is not supported.

I am happy to make that realign-and-use-GPT experiment. My Windows is 7 Professional 64-bit. It will take me a few days because this is spare-time stuff.

> One piece of evidence that refutes my theory is that if Windows and/or Linux partition are something you boot into and use often, I would imagine NCQ would be used in both of those environments and would suffer from the same issue. Although Windows tends to hide all sorts of transient errors from the user (sigh), Linux tends to be like FreeBSD with regards to such issues (on the console anyway; you wouldn't see such messages normally inside of X).

Now, the FreeBSD slice is the only partition on that disk that would likely see concurrent write accesses (think make -j8 on a quadcore computer), which is more prone to ferret out such alignment contention. The NTFS partition is aligned on a multi-MB boundary, so it wouldn't hit the problem anyway. The Linux partition is in ext4 format for mostly sequential access to files usually in excess of 10 MB each.

Linux's ext4 jumps through several hoops to end up with bulk writes, like extents, delayed allocations (to avoid fragmentation), reordering of data and metadata writes, serialized log writes and all that stuff, and it would appear I am permitting it to cache writes -- Linux uses write barriers to enforce proper ordering of journal/meta-data writes. It would be rather hard to hit ATA taskfile timeouts; the expected rate at which the drive needs to do a partial write is orders of magnitude lower.

Any good concurrent write exercise tools for Unix that I could run on the Linux ext4 partition that you would propose?

> If you have the time and want to put forth the effort, I would recommend backing up all your data on ada1, zero the first and last 1MByte of the drive, and then try following Warren Block's guide. I'd just recommend doing this:
>
>   gpart create -s gpt ada1
>   gpart add -t freebsd-ufs -b 2m ada1
>   newfs -U -j /dev/ada1p1
>
> (or remove -j if you don't want to use SUJ)

Will do.

> > - I am running with kern.cam.ada.default_timeout=5 which makes the computer recover faster
>
> I can definitely imagine cases where a drive using NCQ but doing writes to a non-aligned partition could take longer than 5 seconds to respond to an ATA CDB (this is different than a SATA or AHCI layer timeout). I am not telling you to change this back to 30, but it might not be helping your situation at all given my above theory.

My feeling is that the stalls are mostly from the error handler and the overall time the drive is frozen gets shorter. If it had not _felt_ faster, I'd not have left that in sysctl.conf in the first place.

> Finally: could you please provide output from smartctl -x /dev/ada1? I would like to rule out any possibility of your drive having some other kind of issue that might cause it to go catatonic. Thanks.

I have fetched the data with Linux this time (should not make a difference as it's all drive internal data, not host OS stuff). Looks sane to me, http://people.freebsd.org/~mandree/smartctl.log. I'll be happy to refetch this data with a more current smartctl version under FreeBSD if required.

> ** -- http://www.seagate.com/files/www-content/support-content/documentation/samsung/tech-specs/eco_greenf2.pdf
> *** -- http://www.wonkity.com/~wblock/docs/html/ssd.html
Re: Any objections/comments on axing out old ATA stack?
On 02.04.2013 21:39, Matthias Andree wrote:
> On 31.03.2013 23:02, Scott Long wrote:
> > So what I hear you and Matthias saying, I believe, is that it should be easier to force disks to fall back to non-NCQ mode, and/or have a more responsive black-list for problematic controllers. Would this help the situation? It's hard to justify holding back overall forward progress because of some bad controllers; we do several Tbps off of AHCI controllers with NCQ enabled on FreeBSD 9.x, enough to make up a sizable percentage of the internet's traffic, and we see no problems. How can we move forward but also take care of you guys with problematic hardware?
>
> Well, I am running the driver fine off of my WD Caviar RE3 disk, and the problematic drive also works just fine with Windows and Linux, so it must be something between the problematic drive and the FreeBSD driver.
>
> I would like to see any of this, in decreasing order of precedence:
>
> - debugged driver
>
> - assistance/instructions on helping how to debug the driver/trace NCQ stuff/... (as in Jeremy Chadwick's followup in this same thread - this helps, I will attempt to procure the required information; back then, reducing the number of tags to 31 was ineffective, including an error message and getting a value of 32 when reading the setting back)

Unfortunately, I don't know how to debug that. Command timeouts reported on the lists before are the kind of errors that are most difficult to diagnose, since the controller gives no information to do that. We just see that sent commands are no longer completing. Maybe it is some incompatibility between specific drive and HBA firmwares, triggered by some innocent specifics of our ATA stack, GEOM or filesystem implementations. All I can propose is to try to identify such cases and add some quirks to work around them, like disabling NCQ or limiting the number of tags. I am not sure what else we can do about it without a controlled lab environment with the affected hardware and a SATA analyzer.

> - user-space contingency features, such as letting camcontrol limit the number of open NCQ tags, or disable NCQ, either on a per-drive basis

I merged support for that to 8/9-STABLE about 9 months ago: `camcontrol tags ada0 -v -N X` should change the number of simultaneously used tags, and `camcontrol negotiate ada0 -T (en|dis)able` should enable/disable use of NCQ. I just did some tests on HEAD and these commands seem to work. If you can reproduce the problem, it would be nice to collect information on how these changes affect it.

> I am capable of debugging C - mostly with gdb command-line, and graphical Windows IDEs - but am unfamiliar with FreeBSD kernel debugging. If necessary, I can pull up a second console, but the PC that is affected is legacy-free, so serial port only works through a serial/USB converter.

-- 
Alexander Motin
Re: Any objections/comments on axing out old ATA stack?
I have just sent more information to the PR at http://www.freebsd.org/cgi/query-pr.cgi?pr=157397

The short summary (more info in the PR) is:

- limiting tags to 31 does not help

- disabling NCQ appears to help in initial testing, but warrants more testing

- error happens during WRITE_FPDMA_QUEUED,

- File system in question is SU+J UFS2 mounted on /usr, and I can for instance rm -rf /usr/obj or just log into GNOME and try to open a gnome-terminal to trigger stalls;

- Linux uses 31 tags (for different reason) and has no drive quirks, but a controller quirk; for Jeremy's topic #6, regarding the ATI/AMD SB7x0 that I am using, it might be worthwhile investigating the AHCI_HFLAG_IGN_SERR_INTERNAL flag - it gets set by Linux on the SB700 that my computer is using, see ahci_error_intr() in libahci.h - I am not going to interpret that for lack of expertise, but it does affect error handling and appears to ignore a certain condition.

Why only my Samsung HDD drive triggers this but not the WD drive, I do not know yet.

Hope that helps a bit.
Re: Any objections/comments on axing out old ATA stack?
On Thu, Apr 04, 2013 at 12:15:32AM +0200, Matthias Andree wrote:
> I have just sent more information to the PR at http://www.freebsd.org/cgi/query-pr.cgi?pr=157397
>
> The short summary (more info in the PR) is:
>
> - limiting tags to 31 does not help
>
> - disabling NCQ appears to help in initial testing, but warrants more testing
>
> - error happens during WRITE_FPDMA_QUEUED,

This is an NCQ-based write LBA request. There are many non-NCQ equivalents of this, ATA-protocol-wise (too many to list here), but the most likely non-NCQ ATA command you'd see is WRITE_DMA48.

> - File system in question is SU+J UFS2 mounted on /usr, and I can for instance rm -rf /usr/obj or just log into GNOME and try to open a gnome-terminal to trigger stalls;
>
> - Linux uses 31 tags (for different reason) and has no drive quirks, but a controller quirk; for Jeremy's topic #6, regarding the ATI/AMD SB7x0 that I am using, it might be worthwhile investigating the AHCI_HFLAG_IGN_SERR_INTERNAL flag - it gets set by Linux on the SB700 that my computer is using, see ahci_error_intr() in libahci.h - I am not going to interpret that for lack of expertise, but it does affect error handling and appears to ignore a certain condition.

Alexander could expand on this, but the name of the flag implies that there are certain conditions where the SATA-level SERR condition gets ignored (IGN).

While skimming Linux libata code and commits in the past, the only glaringly obvious bug/issue I see is with SB600/SB700 chipsets (the hardware revision apparently matters) and port multiplier (PMP) support and soft resets. Are you using a port multiplier? I doubt it, but I have to ask.

> Why only my Samsung HDD drive triggers this but not the WD drive, I do not know yet.

Please provide gpart show -p ada1 output, both here and in the PR, if you could. I have a gut feeling I know what the issue is (and if it is what I think it is, it's actually happening all the time, just that NCQ exacerbates it given how command queueing works), but I won't know for sure until I see the output.

Thanks.

-- 
| Jeremy Chadwick  j...@koitsu.org |
| UNIX Systems Administrator  http://jdc.koitsu.org/ |
| Mountain View, CA, US |
| Making life hard for others since 1977.  PGP 4BD6C0CB |
Re: Any objections/comments on axing out old ATA stack?
On 04.04.2013 01:38, Jeremy Chadwick wrote:
> ...
> While skimming Linux libata code and commits in the past, the only glaringly obvious bug/issue I see is with SB600/SB700 chipsets (the hardware revision apparently matters) and port multiplier (PMP) support and soft resets. Are you using a port multiplier? I doubt it, but I have to ask.

I am not using a PMP as far as I know (unless one is buried on my Asus M4A78T-E main board). It would seem the drives are directly attached to the south bridge's SATA ports.

> > Why only my Samsung HDD drive triggers this but not the WD drive, I do not know yet.
>
> Please provide gpart show -p ada1 output, both here and in the PR, if you could.

=>        63  1953525105    ada1  MBR  (931G)
          63   209714337  ada1s1  freebsd  [active]  (100G)
   209714400         800          - free -  (400k)
   209715200    71680000  ada1s2  ntfs  (34G)
   281395200       15405          - free -  (7.5M)
   281410605   488263545  ada1s3  linux-data  (232G)
   769674150  1183851018          - free -  (564G)

HTH

Best regards
Matthias
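Whether the partition starts listed in the gpart output above fall on 4096-byte boundaries can be checked with a few lines of arithmetic. This is a minimal sketch under stated assumptions: logical sectors are 512 bytes (as the drive reports), the 4096-byte physical sector size is only the suspected value and not a confirmed one, and the start of ada1s2 is taken as 209714400 + 800 = 209715200. The names `is_aligned` and `alignment_report` are illustrative only.

```python
# Start sectors in 512-byte logical sectors, read from the gpart output.
# ada1s2's start is derived from the preceding free block: 209714400 + 800.
PARTITION_STARTS = {
    "ada1s1 (freebsd)": 63,
    "ada1s2 (ntfs)": 209715200,
    "ada1s3 (linux-data)": 281410605,
}

LOGICAL_SECTOR = 512     # bytes, as reported by the drive
PHYSICAL_SECTOR = 4096   # bytes, ASSUMPTION: suspected, not confirmed


def is_aligned(start_lba, boundary=PHYSICAL_SECTOR):
    """True if the partition's byte offset is a multiple of `boundary`."""
    return (start_lba * LOGICAL_SECTOR) % boundary == 0


def alignment_report(starts):
    """Map each partition name to whether it starts on a 4 KiB boundary."""
    return {name: is_aligned(lba) for name, lba in starts.items()}


if __name__ == "__main__":
    for name, ok in alignment_report(PARTITION_STARTS).items():
        print(f"{name}: {'aligned' if ok else 'NOT aligned'} to 4 KiB")
```

By this check the freebsd and linux-data slices start off a 4 KiB boundary, while the ntfs slice starts exactly 100 GiB into the disk and is aligned. A start of 2 MiB, as used by `gpart add -b 2m`, corresponds to sector 4096 and passes the same test.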
Re: Any objections/comments on axing out old ATA stack?
On Thu, Apr 04, 2013 at 02:19:16AM +0200, Matthias Andree wrote: Am 04.04.2013 01:38, schrieb Jeremy Chadwick: ... While skimming Linux libata code and commits in the past, the only glaringly obvious bug/issue I see is with SB600/SB700 chipsets (the hardware revision apparently matters) and port multiplier (PMP) support and soft resets. Are you using a port multiplier? I doubt it, but I have to ask. I am not using a PMP as far as I know (unless one is buried on my Asus M4A78T-E main board). It would seem the drives are directly attached to the south bridge's SATA ports. Then the answer is nope, you're not using a PM. Details: http://www.serialata.org/technology/port_multipliers.asp http://en.wikipedia.org/wiki/Port_multiplier Why only my Samsung HDD drive triggers this but not the WD drive, I do not know yet. Please provide gpart show -p ada1 output, both here and in the PR, if you could. =63 1953525105ada1 MBR (931G) 63 209714337 ada1s1 freebsd [active] (100G) 209714400 800 - free - (400k) 2097152007168 ada1s2 ntfs (34G) 281395200 15405 - free - (7.5M) 281410605 488263545 ada1s3 linux-data (232G) 769674150 1183851018 - free - (564G) This is what I was worried about. Referring to your camcontrol identify output: device model SAMSUNG HD103SI sector size logical 512, physical 512, offset 0 Hear me out entirely on this one. My theory is that your hard disk actually uses 4096-byte sectors but is too old to provide ATA IDENTIFY semantics to delineate between logical vs. physical sector size. In other words, only logical is provided, thus logical=physical in the eyes of all software; smartctl will show you the exact same thing too. There are drives like this in the wild, both SSDs as well as MHDDs. For example, the Intel 320-series SSD behaves this way too (providing only logical size). Do not let the capacity/size of the drive be the deciding factor; your drive is 1TB, but I also have many 1TB MHDDs that use 4096-byte sectors. 
Seagate/Samsung's specification** for the HD103SI states, and I quote: Byte per Sensor: 512 bytes. Yes, it says Sensor. Whether or not this documentation is correct/accurate is unknown, and when vendors have typos in their own specification docs, I cannot help but to honour the possibility of the information being wrong. So I'm unsure if this drive uses 512-byte sectors or 4096-byte sectors. That said: in your gpart show ada1 output, none of your partitions (FreeBSD, NTFS, nor Linux) appear to be aligned to 4096-byte boundaries. Ideally you'd want to have these aligned to 1MB or 2MByte boundaries in the case you ever move to an SSD. You're also using the MBR scheme, which does not tend to play well with alignment. Comparatively, your WD5002ABYS drive **does** use 512-byte sectors (I know this for a fact). The problem here is that I cannot guarantee you that alignment is the problem. The performance impact of writes to partitions which are non-aligned is quite high, and NCQ just exacerbates this problem. I would love to tell you switch to GPT and follow Warren Block's document*** but if your NTFS partition is Windows and is a Windows version older than Windows 7 GPT is not supported. One piece of evidence that refutes my theory is that if Windows and/or Linux partition are something you boot into and use often, I would imagine NCQ would be used in both of those environments and would suffer from the same issue. Although Windows tends to hide all sorts of transient errors from the user (sigh), Linux tends to be like FreeBSD with regards to such issues (on the console anyway; you wouldn't see such messages normally inside of X). If you have the time and want to put forth the effort, I would recommend backing up all your data on ada1, zero the first and last 1MByte of the drive, and then try following Warren Block's guide. 
I'd just recommend doing this:

gpart create -s gpt ada1
gpart add -t freebsd-ufs -b 2m ada1
newfs -U -j /dev/ada1p1

(or remove -j if you don't want to use SUJ)

I picked an alignment value of 2MBytes since it's both 4K-aligned and is generally safe for things like newer SSDs that have larger NAND erase block size (I am not going to get into a discussion about that here, so please stay focused. :-) )

If the problem is gone after that (it should be easy to induce by writing tons and tons of data to the drive), then we can safely say that the drive uses 4096-byte sectors and need to add it to the quirks list in ata_da.c. If the problem remains after that, then further investigation is needed, and we can safely rule out alignment. Welcome to all the pain/effort one has to go through when troubleshooting things like this. :-)

Another thing: in your PR you state:

- I am running with kern.cam.ada.default_timeout=5 which makes the computer recover faster

I can definitely imagine cases where
Re: Any objections/comments on axing out old ATA stack?
Am 31.03.2013 23:02, schrieb Scott Long: So what I hear you and Matthias saying, I believe, is that it should be easier to force disks to fall back to non-NCQ mode, and/or have a more responsive black-list for problematic controllers. Would this help the situation? It's hard to justify holding back overall forward progress because of some bad controllers; we do several Tbps off of AHCI controllers with NCQ enabled on FreeBSD 9.x, enough to make up a sizable percentage of the internet's traffic, and we see no problems. How can we move forward but also take care of you guys with problematic hardware?

Well, I am running the driver fine off of my WD Caviar RE3 disk, and the problematic drive also works just fine with Windows and Linux, so it must be something between the problematic drive and the FreeBSD driver.

I would like to see any of this, in decreasing order of precedence:

- debugged driver
- assistance/instructions on helping how to debug the driver/trace NCQ stuff/... (as in Jeremy Chadwick's followup in this same thread - this helps, I will attempt to procure the required information; back then, reducing the number of tags to 31 was ineffective, including an error message and getting a value of 32 when reading the setting back)
- user-space contingency features, such as letting camcontrol limit the number of open NCQ tags, or disable NCQ, either on a per-drive basis

I am capable of debugging C - mostly with gdb command-line, and graphical Windows IDEs - but am unfamiliar with FreeBSD kernel debugging. If necessary, I can pull up a second console, but the PC that is affected is legacy-free, so serial port only works through a serial/USB converter.
Re: Any objections/comments on axing out old ATA stack?
Am 01.04.2013 17:07, schrieb Stefan Esser: Am 01.04.2013 15:14, schrieb Victor Balada Diaz: Being able to configure quirks from loader.conf for disks AND controllers would be great and is not hard to do. If you want i can do a patch in two weeks and send it to you. That way it's easy to test disabling NCQ and/or other things in case of hitting a bug. Also being able to modify the configuration without a kernel recompile would be a big improvement because we could still use freebsd-update to keep systems updated. Something like: kern.cam.ada.0.quirks=1 to force 4KB sectors? No need to implement that, it is in -CURRENT (did not check -STABLE). But there is no quirk, that disables NCQ, currently, although it is easy to implement. See the places where ADA_FLAG_CAN_NCQ is set and make that value depend on a new quirk flag being unset ... But instead of setting that flag in the loader, it would be good to collect drive signatures that need it and to add quirk entries for them in ata_da.c ... Before we can do that, we need to know if it's really the drive's fault or if the driver is wrong. We need to debug that. If we have relevant parameters exposed through the CAM interface (rather than loader variables), that would also help expedite the debugging.
Re: Any objections/comments on axing out old ATA stack?
On Sun, Mar 31, 2013 at 03:02:09PM -0600, Scott Long wrote: On Mar 31, 2013, at 7:04 AM, Victor Balada Diaz vic...@bsdes.net wrote: On Wed, Mar 27, 2013 at 11:22:14PM +0200, Alexander Motin wrote: Hi. Since FreeBSD 9.0 we are successfully running on the new CAM-based ATA stack, using only some controller drivers of old ata(4) by having `options ATA_CAM` enabled in all kernels by default. I have a wish to drop non-ATA_CAM ata(4) code, unused since that time from the head branch to allow further ATA code cleanup. Does any one here still uses legacy ATA stack (kernel explicitly built without `options ATA_CAM`) for some reason, for example as workaround for some regression? Does anybody have good ideas why we should not drop it now? Hello, At my previous job we had troubles with NCQ on some controllers. It caused failures and silent data corruption. As old ata code didn't use NCQ we just used it. I reported some of the problems on 8.2[1] but the problem existed with 8.3. I no longer have access to those systems, so i don't know if the problem still exists or have been fixed on newer versions. Regards. Victor. So what I hear you and Matthias saying, I believe, is that it should be easier to force disks to fall back to non-NCQ mode, and/or have a more responsive black-list for problematic controllers. Would this help the situation? It's hard to justify holding back overall forward progress because of some bad controllers; we do several Tbps off of AHCI controllers with NCQ enabled on FreeBSD 9.x, enough to make up a sizable percentage of the internet's traffic, and we see no problems. How can we move forward but also take care of you guys with problematic hardware? Scott Being able to configure quirks from loader.conf for disks AND controllers would be great and is not hard to do. If you want i can do a patch in two weeks and send it to you. That way it's easy to test disabling NCQ and/or other things in case of hitting a bug. 
Also being able to modify the configuration without a kernel recompile would be a big improvement because we could still use freebsd-update to keep systems updated. Anyway, my comment was not against dropping old ata code, but more on the comments on regressions on the new one. Regards. Victor. -- The surest proof that intelligent life exists on other planets is that they have not tried to contact us. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Any objections/comments on axing out old ATA stack?
Am 01.04.2013 15:14, schrieb Victor Balada Diaz: Being able to configure quirks from loader.conf for disks AND controllers would be great and is not hard to do. If you want i can do a patch in two weeks and send it to you. That way it's easy to test disabling NCQ and/or other things in case of hitting a bug. Also being able to modify the configuration without a kernel recompile would be a big improvement because we could still use freebsd-update to keep systems updated. Something like: kern.cam.ada.0.quirks=1 to force 4KB sectors? No need to implement that, it is in -CURRENT (did not check -STABLE). But there is no quirk, that disables NCQ, currently, although it is easy to implement. See the places where ADA_FLAG_CAN_NCQ is set and make that value depend on a new quirk flag being unset ... But instead of setting that flag in the loader, it would be good to collect drive signatures that need it and to add quirk entries for them in ata_da.c ... Regards, STefan
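The suggestion above (a per-unit quirks tunable, plus making ADA_FLAG_CAN_NCQ depend on a quirk bit being unset) can be modelled outside the kernel. This is only a sketch in Python of the logic, not driver code: the bit values are placeholders chosen for illustration rather than copied from ata_da.c, `QUIRK_NO_NCQ` is the hypothetical new quirk, and `parse_quirks_tunable` and `compute_flags` are invented names.

```python
# Placeholder bit values for illustration; they are not taken from ata_da.c.
ADA_FLAG_CAN_NCQ = 0x0008
QUIRK_4K = 0x01
QUIRK_NO_NCQ = 0x02   # hypothetical quirk that does not exist in the thread's kernels


def parse_quirks_tunable(line):
    """Parse a loader.conf-style line such as 'kern.cam.ada.0.quirks=1'.

    Returns (unit, quirks_mask). Raises ValueError on anything else.
    """
    name, sep, value = line.partition("=")
    parts = name.strip().split(".")
    if not sep or len(parts) != 5 or parts[:3] != ["kern", "cam", "ada"] or parts[4] != "quirks":
        raise ValueError(f"not an ada quirks tunable: {line!r}")
    # Base 0 accepts both decimal ("1") and hex ("0x3") values.
    return int(parts[3]), int(value.strip().strip('"'), 0)


def compute_flags(drive_supports_ncq, quirks, flags=0):
    """Set the NCQ capability flag only if the drive offers NCQ and no quirk forbids it."""
    if drive_supports_ncq and not (quirks & QUIRK_NO_NCQ):
        flags |= ADA_FLAG_CAN_NCQ
    else:
        flags &= ~ADA_FLAG_CAN_NCQ
    return flags


unit, quirks = parse_quirks_tunable("kern.cam.ada.0.quirks=0x3")
print(unit, hex(quirks), bool(compute_flags(True, quirks) & ADA_FLAG_CAN_NCQ))
```

The point of routing both the built-in quirk table and the loader tunable through one mask is that the NCQ decision stays in a single place, whichever source set the bit.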
Re: Any objections/comments on axing out old ATA stack?
On Mon, Apr 01, 2013 at 05:07:20PM +0200, Stefan Esser wrote: Am 01.04.2013 15:14, schrieb Victor Balada Diaz: Being able to configure quirks from loader.conf for disks AND controllers would be great and is not hard to do. If you want i can do a patch in two weeks and send it to you. That way it's easy to test disabling NCQ and/or other things in case of hitting a bug. Also being able to modify the configuration without a kernel recompile would be a big improvement because we could still use freebsd-update to keep systems updated. Something like: kern.cam.ada.0.quirks=1 to force 4KB sectors? No need to implement that, it is in -CURRENT (did not check -STABLE). But there is no quirk, that disables NCQ, currently, although it is easy to implement. See the places where ADA_FLAG_CAN_NCQ is set and make that value depend on a new quirk flag being unset ... But instead of setting that flag in the loader, it would be good to collect drive signatures that need it and to add quirk entries for them in ata_da.c ... Regards, STefan

Yep, something like that but also for controllers. Looking here[1] I don't see it implemented for controllers on current. I agree that we should collect drive and controller signatures and add those quirks to the OS, but being able to play with quirks from the loader is still useful. If your FreeBSD version doesn't yet have the quirks needed for the disk/controller that you're using, you'd need to patch and rebuild a custom kernel. Having a loader tunable makes maintaining old FreeBSD versions easier. Regards. Victor.

[1]: http://fxr.watson.org/fxr/source/dev/ahci/ahci.c

-- The surest proof that intelligent life exists on other planets is that they have not tried to contact us.
Re: Any objections/comments on axing out old ATA stack?
On 31.03.2013 08:13, Ian Smith wrote: On Sat, 30 Mar 2013 21:00:24 -0700, Peter Wemm wrote: On Sat, Mar 30, 2013 at 4:29 PM, Matthias Andree mand...@freebsd.org wrote: Am 27.03.2013 22:22, schrieb Alexander Motin: Hi. Since FreeBSD 9.0 we are successfully running on the new CAM-based ATA stack, using only some controller drivers of old ata(4) by having `options ATA_CAM` enabled in all kernels by default. I have a wish to drop non-ATA_CAM ata(4) code, unused since that time from the head branch to allow further ATA code cleanup. Does any one here still uses legacy ATA stack (kernel explicitly built without `options ATA_CAM`) for some reason, for example as workaround for some regression? Does anybody have good ideas why we should not drop it now? Alexander, The regression in http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/157397 where the SATA NCQ slots stall for some Samsung drives in the new stack, and consequently hang the computer for prolonged episodes where it is in the NCQ error handling, disallows removal of the old driver. (Last checked with 9.1-RELEASE at current patchlevel.) We're talking about 10.x, so if you want it fixed, you need update with 10.x information. Please put 10.x diagnostics in the PR. Given Alexander also posted this to -stable, just for clarity, are we _only_ talking about 10.x here, or might this change get MFC'd to 9? Yes, I am only going to drop it from 10.x, but bug reports from 9-STABLE users are welcome, as at some point they will become 10.x users. -- Alexander Motin
Re: Any objections/comments on axing out old ATA stack?
Am 31.03.2013 06:00, schrieb Peter Wemm: We're talking about 10.x, so if you want it fixed, you need update with 10.x information. Please put 10.x diagnostics in the PR.

I will not. The PR was filed four months before 10-CURRENT branched; I have no reason to assume it is no longer pertinent (no MFCs, no PR followups). According to http://www.freebsd.org/doc/en/books/porters-handbook/freebsd-versions.html, 10-CURRENT appeared on 2011-09-26.
Re: Any objections/comments on axing out old ATA stack?
On Wed, Mar 27, 2013 at 11:22:14PM +0200, Alexander Motin wrote: Hi. Since FreeBSD 9.0 we are successfully running on the new CAM-based ATA stack, using only some controller drivers of old ata(4) by having `options ATA_CAM` enabled in all kernels by default. I have a wish to drop non-ATA_CAM ata(4) code, unused since that time from the head branch to allow further ATA code cleanup. Does any one here still uses legacy ATA stack (kernel explicitly built without `options ATA_CAM`) for some reason, for example as workaround for some regression? Does anybody have good ideas why we should not drop it now?

Hello, At my previous job we had troubles with NCQ on some controllers. It caused failures and silent data corruption. As the old ata code didn't use NCQ, we just used it. I reported some of the problems on 8.2[1] but the problem existed with 8.3. I no longer have access to those systems, so I don't know if the problem still exists or has been fixed in newer versions. Regards. Victor.

[1]: https://groups.google.com/forum/#!topic/muc.lists.freebsd.stable/dAMf028CtXM

-- The surest proof that intelligent life exists on other planets is that they have not tried to contact us.
Re: Any objections/comments on axing out old ATA stack?
On Mar 31, 2013, at 7:04 AM, Victor Balada Diaz vic...@bsdes.net wrote: On Wed, Mar 27, 2013 at 11:22:14PM +0200, Alexander Motin wrote: Hi. Since FreeBSD 9.0 we are successfully running on the new CAM-based ATA stack, using only some controller drivers of old ata(4) by having `options ATA_CAM` enabled in all kernels by default. I have a wish to drop non-ATA_CAM ata(4) code, unused since that time from the head branch to allow further ATA code cleanup. Does any one here still uses legacy ATA stack (kernel explicitly built without `options ATA_CAM`) for some reason, for example as workaround for some regression? Does anybody have good ideas why we should not drop it now? Hello, At my previous job we had troubles with NCQ on some controllers. It caused failures and silent data corruption. As old ata code didn't use NCQ we just used it. I reported some of the problems on 8.2[1] but the problem existed with 8.3. I no longer have access to those systems, so i don't know if the problem still exists or have been fixed on newer versions. Regards. Victor. So what I hear you and Matthias saying, I believe, is that it should be easier to force disks to fall back to non-NCQ mode, and/or have a more responsive black-list for problematic controllers. Would this help the situation? It's hard to justify holding back overall forward progress because of some bad controllers; we do several Tbps off of AHCI controllers with NCQ enabled on FreeBSD 9.x, enough to make up a sizable percentage of the internet's traffic, and we see no problems. How can we move forward but also take care of you guys with problematic hardware? Scott
Re: Any objections/comments on axing out old ATA stack?
On Sun, Mar 31, 2013 at 03:02:09PM -0600, Scott Long wrote: On Mar 31, 2013, at 7:04 AM, Victor Balada Diaz vic...@bsdes.net wrote: On Wed, Mar 27, 2013 at 11:22:14PM +0200, Alexander Motin wrote: Hi. Since FreeBSD 9.0 we are successfully running on the new CAM-based ATA stack, using only some controller drivers of old ata(4) by having `options ATA_CAM` enabled in all kernels by default. I have a wish to drop non-ATA_CAM ata(4) code, unused since that time from the head branch to allow further ATA code cleanup. Does any one here still uses legacy ATA stack (kernel explicitly built without `options ATA_CAM`) for some reason, for example as workaround for some regression? Does anybody have good ideas why we should not drop it now? Hello, At my previous job we had troubles with NCQ on some controllers. It caused failures and silent data corruption. As old ata code didn't use NCQ we just used it. I reported some of the problems on 8.2[1] but the problem existed with 8.3. I no longer have access to those systems, so i don't know if the problem still exists or have been fixed on newer versions. So what I hear you and Matthias saying, I believe, is that it should be easier to force disks to fall back to non-NCQ mode, and/or have a more responsive black-list for problematic controllers. Would this help the situation? It's hard to justify holding back overall forward progress because of some bad controllers; we do several Tbps off of AHCI controllers with NCQ enabled on FreeBSD 9.x, enough to make up a sizable percentage of the internet's traffic, and we see no problems. How can we move forward but also take care of you guys with problematic hardware? I've read a referenced PR (157397) except there really isn't enough technical troubleshooting/detail to determine what the root cause is. That isn't the fault of the reporter either -- the reporter needs to be told what information they need to provide / how to troubleshoot it. 
Meaning: kernel folks who are in-the-know need to step up and help. That PR is soon-to-be 2 years old and is missing tons of information that even *I*, as a non-kernel guy, would find useful:

1. Output from:
- camcontrol tags ada1 -v
- camcontrol identify ada1
- What sorts of filesystems are on ada1; if UFS, tunefs -p output would be greatly appreciated
- If the timeouts happen during heavy I/O load, and if so, during what kinds of I/O load (reads or writes).

2. Does camcontrol tags ada1 -N 31 help? I mention this because stated here: http://lists.freebsd.org/pipermail/freebsd-stable/2013-March/072985.html ...there are statements which imply decreasing queue length may solve the issue. What confuses me, however, is that the queue length on my own systems (with different models of disks, as well as an SSD) all have a limit of 32. I dug through the kernel source for a while but could not easily find where this number comes from. (I have very little familiarity with command queuing at the protocol level)

3. Why not find out why Linux (probably libata) has a 32 (or 31?) queue limit? They have commit logs, and there is the LKML where you could ask. While I understand reluctance to add something just because Linux does it, it doesn't appear anyone's stepped up to the plate to ask them why; I pray this is not caused by anti-Linux sentiment.

4. The ada1 device in the PR is a Samsung Spinpoint EcoGreen F2 hard drive (1TB, 5400rpm, 32MB cache). Possibly the drive has firmware bugs relating to its NCQ implementation, or possibly it's going into some power-saving mode (it is an EcoGreen model). I've always been wary of the EcoGreen disks since reading about the F4 EcoGreen firmware fiasco (even though the same page says the F1 and F3 EcoGreen had no issue): http://sourceforge.net/apps/trac/smartmontools/wiki/SamsungF4EGBadBlocks

5. We really need to have some way to print active quirks for devices, even if it's only at boot-up, e.g.:

ada3: quirks=0x0003<4K,NO_NCQ>

I'd be happy to write the code for this (basing it on how we do CPU flags), but as I've said in the past, kernel-land is scary to me.

6. The controller referenced is an ATI IXP700. I cannot tell you how many times on the mailing lists I've seen weird issues reported by people using that controller. I am in no way/shape/form saying the issue is with the controller or with AHCI compatibility (FreeBSD vs. ATI), because I have no proof. I just find it very unnerving that so many issues have been reported where that controller is involved, and often across all sorts of different device/disk models.

All that said: I agree a loader tunable to inhibit command queueing would be nice. sysctl would be even more convenient (easier for real-time testing) but I don't know the implications of turning CQ off in the middle of any pending I/O requests.

-- Jeremy Chadwick j...@koitsu.org
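The "print active quirks" idea from item 5 is essentially a bitmask-to-names rendering, in the `0x...<NAME,NAME>` layout that the CPU feature lines in dmesg use. A rough sketch of that formatting in Python follows; it is not kernel code, and the bit assignments and the names `QUIRK_NAMES` and `format_quirks` are assumptions made up for the example.

```python
# Hypothetical bit-to-name table; the real quirk bits live in the driver source.
QUIRK_NAMES = (
    (0x01, "4K"),
    (0x02, "NO_NCQ"),
)


def format_quirks(device, quirks):
    """Render a quirks mask as 'dev: quirks=0xNNNN<NAME,...>'."""
    names = [name for bit, name in QUIRK_NAMES if quirks & bit]
    return f"{device}: quirks=0x{quirks:04x}<{','.join(names)}>"


print(format_quirks("ada3", 0x0003))   # ada3: quirks=0x0003<4K,NO_NCQ>
```

Keeping the names in one table next to the bit definitions means a newly added quirk shows up in the boot message without touching the printing code.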
Re: Any objections/comments on axing out old ATA stack?
Am 27.03.2013 22:22, schrieb Alexander Motin: Hi. Since FreeBSD 9.0 we are successfully running on the new CAM-based ATA stack, using only some controller drivers of old ata(4) by having `options ATA_CAM` enabled in all kernels by default. I have a wish to drop non-ATA_CAM ata(4) code, unused since that time from the head branch to allow further ATA code cleanup. Does any one here still uses legacy ATA stack (kernel explicitly built without `options ATA_CAM`) for some reason, for example as workaround for some regression? Does anybody have good ideas why we should not drop it now?

Alexander,

The regression in http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/157397 where the SATA NCQ slots stall for some Samsung drives in the new stack, and consequently hang the computer for prolonged episodes where it is in the NCQ error handling, disallows removal of the old driver. (Last checked with 9.1-RELEASE at current patchlevel.)

Chances are that limiting the open queue slots to 31 might help, but that is hearsay from what Linux would be doing.

Unless we get a fix, if you want to drop the old driver, you'll need to add features so that

1. the new driver lets users (down-)configure the max. number of tagged openings
2. the new driver allows disabling NCQ altogether for individual drives
3. the relevant Samsung drives are listed in some quirks data base so that we avoid the stalls while permitting users to open it up to 32 NCQ slots.

So unless these are all addressed, I'd veto removal of the old ATA driver - sorry!

Best regards
Matthias
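The three requests above fit together as one lookup: a quirks table supplies a per-model default, and the user may override it in either direction. A minimal sketch in Python under stated assumptions: the table entry, the cap of 31 (which the message itself calls hearsay), the convention that 0 means "NCQ off", and the names `TAG_QUIRKS` and `effective_tags` are all hypothetical, and `fnmatchcase` merely stands in for whatever pattern matching a driver would use.

```python
from fnmatch import fnmatchcase

MAX_NCQ_TAGS = 32

# Hypothetical quirks table: (model pattern, default tag cap).
TAG_QUIRKS = (
    ("SAMSUNG HD103SI*", 31),
)


def effective_tags(model, user_limit=None):
    """Return the number of NCQ tags to open for a drive.

    The quirks table supplies a default per model; an explicit user_limit
    wins over it, so a user can lower the count, disable NCQ with 0, or
    open a quirked drive back up to the full 32.
    """
    limit = MAX_NCQ_TAGS
    for pattern, cap in TAG_QUIRKS:
        if fnmatchcase(model, pattern):
            limit = cap
            break
    if user_limit is not None:
        if not 0 <= user_limit <= MAX_NCQ_TAGS:
            raise ValueError(f"tag limit out of range: {user_limit}")
        limit = user_limit
    return limit


print(effective_tags("SAMSUNG HD103SI"), effective_tags("WD5002ABYS"))
```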
Re: Any objections/comments on axing out old ATA stack?
Am 28.03.2013 16:31, schrieb Scott Long: On Mar 28, 2013, at 8:00 AM, Ian Lepore i...@freebsd.org wrote: On Thu, 2013-03-28 at 09:17 +0200, Alexander Motin wrote: On 28.03.2013 02:43, Adrian Chadd wrote: My main concern with the new stuff is that it requires CAM and that's reasonably big compared to the standalone ATA code. It'd be nice if we could slim down the CAM stack a bit first; it makes embedding it on the smaller devices really freaking painful. Are there many boards now with ATA, but without USB? But I agree, it should be checked. It's not necessarily what the boards have but how they're used. We use industrial SBCs at work that have ata compact flash sockets on the board which we do use, and usb interfaces which we don't use. I've never tested the new ata+cam stuff on some of these boards, most based on Cyrix, Via, Geode, and VortexD86 chipsets. The older ata code works, but not always very well -- for example, we usually have to set hw.ata.ata_dma=0 for absolutely no reason we've ever been able to figure out except that if we leave it enabled we get DMA errors and panics on some CF cards and not on others. I have no idea whether to expect such things to be better, worse, or no different by changing to the ata+cam way of doing things (but I don't really have time to do extensive testing right now either). The legacy ATA code was hard to maintain, very buggy (as you point out), and is essentially unmaintained. Also, IIRC, the legacy stack simply cannot support NCQ tagged queueing. ...which is exactly why it currently is the only way to get certain Samsung drives to cooperate reliably, without stalling the kernel for prolonged times (minutes) making the computer essentially unusable once it gets under I/O load (such as make -C /usr/src -j4 buildworld) - as the new ahci+ata+cam+... would. Details including PR reference in my other message in this thread.
Re: Any objections/comments on axing out old ATA stack?
On Sat, Mar 30, 2013 at 4:29 PM, Matthias Andree mand...@freebsd.org wrote: Am 27.03.2013 22:22, schrieb Alexander Motin: Hi. Since FreeBSD 9.0 we are successfully running on the new CAM-based ATA stack, using only some controller drivers of old ata(4) by having `options ATA_CAM` enabled in all kernels by default. I have a wish to drop non-ATA_CAM ata(4) code, unused since that time from the head branch to allow further ATA code cleanup. Does any one here still uses legacy ATA stack (kernel explicitly built without `options ATA_CAM`) for some reason, for example as workaround for some regression? Does anybody have good ideas why we should not drop it now? Alexander, The regression in http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/157397 where the SATA NCQ slots stall for some Samsung drives in the new stack, and consequently hang the computer for prolonged episodes where it is in the NCQ error handling, disallows removal of the old driver. (Last checked with 9.1-RELEASE at current patchlevel.) We're talking about 10.x, so if you want it fixed, you need update with 10.x information. Please put 10.x diagnostics in the PR. -- Peter Wemm - pe...@wemm.org; pe...@freebsd.org; pe...@yahoo-inc.com; KI6FJV bitcoin:188ZjyYLFJiEheQZw4UtU27e2FMLmuRBUE
Re: Any objections/comments on axing out old ATA stack?
On Sat, 30 Mar 2013 21:00:24 -0700, Peter Wemm wrote: On Sat, Mar 30, 2013 at 4:29 PM, Matthias Andree mand...@freebsd.org wrote: Am 27.03.2013 22:22, schrieb Alexander Motin: Hi. Since FreeBSD 9.0 we are successfully running on the new CAM-based ATA stack, using only some controller drivers of old ata(4) by having `options ATA_CAM` enabled in all kernels by default. I have a wish to drop non-ATA_CAM ata(4) code, unused since that time from the head branch to allow further ATA code cleanup. Does any one here still uses legacy ATA stack (kernel explicitly built without `options ATA_CAM`) for some reason, for example as workaround for some regression? Does anybody have good ideas why we should not drop it now? Alexander, The regression in http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/157397 where the SATA NCQ slots stall for some Samsung drives in the new stack, and consequently hang the computer for prolonged episodes where it is in the NCQ error handling, disallows removal of the old driver. (Last checked with 9.1-RELEASE at current patchlevel.) We're talking about 10.x, so if you want it fixed, you need update with 10.x information. Please put 10.x diagnostics in the PR. Given Alexander also posted this to -stable, just for clarity, are we _only_ talking about 10.x here, or might this change get MFC'd to 9? cheers, Ian (dropping -current as I'm not subscribed so would only get bounced)
Re: Any objections/comments on axing out old ATA stack?
On 28.03.2013 02:43, Adrian Chadd wrote: My main concern with the new stuff is that it requires CAM and that's reasonably big compared to the standalone ATA code. It'd be nice if we could slim down the CAM stack a bit first; it makes embedding it on the smaller devices really freaking painful. Are there many boards now with ATA, but without USB? But I agree, it should be checked. -- Alexander Motin
Re: Any objections/comments on axing out old ATA stack?
Alexander Motin wrote this message on Thu, Mar 28, 2013 at 09:17 +0200: On 28.03.2013 02:43, Adrian Chadd wrote: My main concern with the new stuff is that it requires CAM and that's reasonably big compared to the standalone ATA code. It'd be nice if we could slim down the CAM stack a bit first; it makes embedding it on the smaller devices really freaking painful. Are there many boards now with ATA, but without USB? But I agree, it should be checked. The net4501 board has ATA but no USB.. Also, depending upon use, you might choose to not include USB, but use ATA, or not use umass, but the rest of USB... Someone on a list was talking about trying to get FreeBSD down on a really small system, 16MB ram... /me thinks of the old wd driver. -- John-Mark Gurney Voice: +1 415 225 5579 All that I will do, has been done, All that I have, has not.
Re: Any objections/comments on axing out old ATA stack?
On Thu, 2013-03-28 at 09:17 +0200, Alexander Motin wrote: On 28.03.2013 02:43, Adrian Chadd wrote: My main concern with the new stuff is that it requires CAM and that's reasonably big compared to the standalone ATA code. It'd be nice if we could slim down the CAM stack a bit first; it makes embedding it on the smaller devices really freaking painful. Are there many boards now with ATA, but without USB? But I agree, it should be checked. It's not necessarily what the boards have but how they're used. We use industrial SBCs at work that have ata compact flash sockets on the board which we do use, and usb interfaces which we don't use. I've never tested the new ata+cam stuff on some of these boards, most based on Cyrix, Via, Geode, and VortexD86 chipsets. The older ata code works, but not always very well -- for example, we usually have to set hw.ata.ata_dma=0 for absolutely no reason we've ever been able to figure out except that if we leave it enabled we get DMA errors and panics on some CF cards and not on others. I have no idea whether to expect such things to be better, worse, or no different by changing to the ata+cam way of doing things (but I don't really have time to do extensive testing right now either). -- Ian
Re: Any objections/comments on axing out old ATA stack?
On Wed, 27 Mar 2013 17:43:07 -0700 Adrian Chadd adr...@freebsd.org wrote: My main concern with the new stuff is that it requires CAM and that's reasonably big compared to the standalone ATA code. It'd be nice if we could slim down the CAM stack a bit first; it makes embedding it on the smaller devices really freaking painful. /me never seen embedded devices with ATA/SATA and less than 64MB of RAM. (i386/i486 old machines does not count :) ) I'm missing something? Thanks, adrian -- Aleksandr Rybalko r...@ddteam.net
Re: Any objections/comments on axing out old ATA stack?
On Mar 27, 2013, at 6:43 PM, Adrian Chadd adr...@freebsd.org wrote: My main concern with the new stuff is that it requires CAM and that's reasonably big compared to the standalone ATA code. From a code execution standpoint? No, it's not. It'd be nice if we could slim down the CAM stack a bit first; it makes embedding it on the smaller devices really freaking painful. From a code segment size standpoint, there's definitely some stuff that should be made modular and optional. Scott
Re: Any objections/comments on axing out old ATA stack?
On Mar 28, 2013, at 8:00 AM, Ian Lepore i...@freebsd.org wrote:
> On Thu, 2013-03-28 at 09:17 +0200, Alexander Motin wrote:
>> On 28.03.2013 02:43, Adrian Chadd wrote:
>>> My main concern with the new stuff is that it requires CAM and that's reasonably big compared to the standalone ATA code. It'd be nice if we could slim down the CAM stack a bit first; it makes embedding it on the smaller devices really freaking painful.
>>
>> Are there many boards now with ATA, but without USB? But I agree, it should be checked.
>
> It's not necessarily what the boards have but how they're used. We use industrial SBCs at work that have ata compact flash sockets on the board which we do use, and usb interfaces which we don't use.
>
> I've never tested the new ata+cam stuff on some of these boards, most based on Cyrix, Via, Geode, and VortexD86 chipsets. The older ata code works, but not always very well -- for example, we usually have to set hw.ata.ata_dma=0 for absolutely no reason we've ever been able to figure out except that if we leave it enabled we get DMA errors and panics on some CF cards and not on others. I have no idea whether to expect such things to be better, worse, or no different by changing to the ata+cam way of doing things (but I don't really have time to do extensive testing right now either).

The legacy ATA code was hard to maintain, very buggy (as you point out), and is essentially unmaintained. Also, IIRC, the legacy stack simply cannot support NCQ tagged queueing. I think that Alexander has done a superb job with both developing and supporting the CAM_ATA stack.

Scott
Re: Any objections/comments on axing out old ATA stack?
Hello, Aleksandr.

You wrote on 28 March 2013, 18:09:53:

>> It'd be nice if we could slim down the CAM stack a bit first; it makes embedding it on the smaller devices really freaking painful.
AR> /me never seen embedded devices with ATA/SATA and less than 64MB of RAM.
AR> (i386/i486 old machines does not count :) )
AR> I'm missing something?

Yes: USB UMASS. It uses CAM too, and is useful for very small systems, like 4MiB FLASH and 16MiB RAM (yes, the whole system image, kernel and all, should be packed into 4MiB).

Please note, Adrian speaks about CAM, not only CAM + ATA.

-- // Black Lion AKA Lev Serebryakov l...@freebsd.org
Re: Any objections/comments on axing out old ATA stack?
On 28 March 2013 09:05, Lev Serebryakov l...@freebsd.org wrote:
> Yes: USB UMASS. It uses CAM too, and is useful for very small systems, like 4MiB FLASH and 16MiB RAM (yes, the whole system image, kernel and all, should be packed into 4MiB).
>
> Please note, Adrian speaks about CAM, not only CAM + ATA.

And I'm not at all saying we should keep the old ATA driver around. I'm just pointing out a set of use cases that most FreeBSD developers aren't involved with and I'd like to find a way to squeeze it more efficiently into embedded platforms.

I've never had any noticeable performance issues with CAM on my embedded MIPS boards because it's typically pushing packets. It's just the resultant binary size of the whole stack that's a problem.

adrian@freefall:~/public_html/ath$ cat AP121-nodebug.txt | grep scsi
   text    data     bss     dec     hex filename
  49372   10672      80   60124    eadc scsi_all.o
  21200    2576      16   23792    5cf0 scsi_da.o
  23288    1488      16   24792    60d8 scsi_xpt.o

adrian@freefall:~/public_html/ath$ cat AP121-nodebug.txt | grep cam
   text    data     bss     dec     hex filename
   3824      96      16    3936     f60 cam.o
  13552     144      16   13712    3590 cam_periph.o
   2344     144       0    2488     9b8 cam_queue.o
    640      48       0     688     2b0 cam_sim.o
  40684     752     192   41628    a29c cam_xpt.o

adrian@freefall:~/public_html/ath$ cat AP121-nodebug.txt | grep umass
   text    data     bss     dec     hex filename
  22592    1072      16   23680    5c80 umass.o

adrian@freefall:~/public_html/ath$ cat AP121-nodebug.txt | egrep '(cam_|umass|scsi_)'
  13552     144      16   13712    3590 cam_periph.o
   2344     144       0    2488     9b8 cam_queue.o
    640      48       0     688     2b0 cam_sim.o
  40684     752     192   41628    a29c cam_xpt.o
  49372   10672      80   60124    eadc scsi_all.o
  21200    2576      16   23792    5cf0 scsi_da.o
  23288    1488      16   24792    60d8 scsi_xpt.o
  22592    1072      16   23680    5c80 umass.o

adrian@freefall:~/public_html/ath$ cat AP121-nodebug.txt | egrep '(cam_|umass|scsi_)' | awk '{a+=$4} END {print a}'
190904

It doesn't seem like a lot, but it does add up..

Adrian
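The awk one-liner in the message above sums the fourth (dec) column of the size output. As a rough sketch of the same arithmetic, the Python below recomputes each object's footprint as text + data + bss and shows its share of the total. The row values are copied from the figures quoted in the message; the names SIZE_ROWS and footprint are illustrative only and not part of any FreeBSD tooling.

```python
# Rows copied from the size output quoted above: (text, data, bss, filename).
# cam.o is left out because the egrep pattern in the message matches "cam_" only.
SIZE_ROWS = [
    (13552, 144, 16, "cam_periph.o"),
    (2344, 144, 0, "cam_queue.o"),
    (640, 48, 0, "cam_sim.o"),
    (40684, 752, 192, "cam_xpt.o"),
    (49372, 10672, 80, "scsi_all.o"),
    (21200, 2576, 16, "scsi_da.o"),
    (23288, 1488, 16, "scsi_xpt.o"),
    (22592, 1072, 16, "umass.o"),
]


def footprint(rows):
    """Return ({filename: text + data + bss}, grand total in bytes)."""
    per_object = {name: text + data + bss for text, data, bss, name in rows}
    return per_object, sum(per_object.values())


per_object, total = footprint(SIZE_ROWS)

# Largest contributors first, with each object's share of the total.
for name, size in sorted(per_object.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<14} {size:>7} bytes  {100 * size / total:5.1f}%")
print(f"{'total':<14} {total:>7} bytes")
```

Sorting by size makes it easy to see which objects dominate; on these figures scsi_all.o and cam_xpt.o together account for a little over half of the total.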
Re: Any objections/comments on axing out old ATA stack?
.. and before you ask - yes, there are embedded boards with limited RAM that also have ATA ports. :-)

Adrian
Re: Any objections/comments on axing out old ATA stack?
In message CAJ-Vmo=qATZHubkKZ2heiJ3528e__JG4RLru7LU9rwP5_EwT=g...@mail.gmail.com, Adrian Chadd writes:
> On 28 March 2013 09:05, Lev Serebryakov l...@freebsd.org wrote:
>
> adrian@freefall:~/public_html/ath$ cat AP121-nodebug.txt | egrep '(cam_|umass|scsi_)' | awk '{a+=$4} END {print a}'
> 190904
>
> It doesn't seem like a lot, but it does add up..

Isn't there some kernel compile-time option to eliminate the huge tables used for error messages etc?

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
Re: Any objections/comments on axing out old ATA stack?
On Thu, 28 Mar 2013, Ian Lepore wrote:
> On Thu, 2013-03-28 at 09:17 +0200, Alexander Motin wrote:
>> On 28.03.2013 02:43, Adrian Chadd wrote:
>>> My main concern with the new stuff is that it requires CAM and that's reasonably big compared to the standalone ATA code. It'd be nice if we could slim down the CAM stack a bit first; it makes embedding it on the smaller devices really freaking painful.
>>
>> Are there many boards now with ATA, but without USB? But I agree, it should be checked.
>
> It's not necessarily what the boards have but how they're used. We use industrial SBCs at work that have ata compact flash sockets on the board which we do use, and usb interfaces which we don't use.
>
> I've never tested the new ata+cam stuff on some of these boards, most based on Cyrix, Via, Geode, and VortexD86 chipsets. The older ata code works, but not always very well -- for example, we usually have to set hw.ata.ata_dma=0 for absolutely no reason we've ever been able to figure out except that if we leave it enabled we get DMA errors and panics on some CF cards and not on others. I have no idea whether to expect such things to be better, worse, or no different by changing to the ata+cam way of doing things (but I don't really have time to do extensive testing right now either).

Woa, I have to set hw.ata.ata_dma=0 also in order to get FreeBSD to boot on a PC104 board. I think ours is a Cyrix or Via also.

-- DE
Re: Any objections/comments on axing out old ATA stack?
On 28 March 2013 10:26, Poul-Henning Kamp p...@phk.freebsd.dk wrote:
> Isn't there some kernel compile-time option to eliminate the huge tables used for error messages etc?

Yup. It doesn't save all that much in the grand scheme of things.

Doubly so since my secondary size constraint is an 896k partition that I lzma compress the kernel to fit into. Those strings don't add much to the final lzma image size.

Adrian
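The point above, that repetitive message strings cost little once the kernel is LZMA-compressed, can be sketched with Python's standard lzma module. Everything here is an assumption for illustration: the input bytes are synthetic stand-ins rather than a real kernel, "896k" is taken to mean 896 KiB, and the default xz container is used because the thread does not say which tool or format produces the image. The helper name fits_partition is hypothetical.

```python
import lzma
import os

PARTITION_LIMIT = 896 * 1024  # the 896k partition from the thread, assumed to be KiB


def fits_partition(image: bytes, limit: int = PARTITION_LIMIT):
    """Compress image with LZMA and return (compressed_size, fits_in_limit)."""
    compressed = lzma.compress(image)
    return len(compressed), len(compressed) <= limit


# Stand-in data, not a real kernel: repeated text models string tables,
# random bytes model code that compresses poorly.
strings = b"Illustrative sense-code description\n" * 2000
code = os.urandom(64 * 1024)

with_strings, fits_with = fits_partition(code + strings)
without_strings, fits_without = fits_partition(code)

print("raw size of strings:          ", len(strings))
print("extra compressed bytes needed:", with_strings - without_strings)
print("fits with strings:            ", fits_with)
```

Real string tables are less repetitive than this stand-in, so the saving on an actual kernel would be smaller; the sketch only shows how one might measure it.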
Re: Any objections/comments on axing out old ATA stack?
On Wed, Mar 27, 2013 at 11:22:14PM +0200, Alexander Motin wrote:
> Hi.
>
> Since FreeBSD 9.0 we are successfully running on the new CAM-based ATA stack, using only some controller drivers of old ata(4) by having `options ATA_CAM` enabled in all kernels by default. I have a wish to drop non-ATA_CAM ata(4) code, unused since that time from the head branch to allow further ATA code cleanup.
>
> Does anyone here still use the legacy ATA stack (kernel explicitly built without `options ATA_CAM`) for some reason, for example as a workaround for some regression?

Yes, I use the legacy ATA stack.

> Does anybody have good ideas why we should not drop it now?

Because it works?

-- Steve
Re: Any objections/comments on axing out old ATA stack?
On 27.03.2013 23:32, Steve Kargl wrote:
> On Wed, Mar 27, 2013 at 11:22:14PM +0200, Alexander Motin wrote:
>> Hi.
>>
>> Since FreeBSD 9.0 we are successfully running on the new CAM-based ATA stack, using only some controller drivers of old ata(4) by having `options ATA_CAM` enabled in all kernels by default. I have a wish to drop non-ATA_CAM ata(4) code, unused since that time from the head branch to allow further ATA code cleanup.
>>
>> Does anyone here still use the legacy ATA stack (kernel explicitly built without `options ATA_CAM`) for some reason, for example as a workaround for some regression?
>
> Yes, I use the legacy ATA stack.

On 9.x or HEAD where new one is default?

>> Does anybody have good ideas why we should not drop it now?
>
> Because it works?

Any problems with new one?

-- Alexander Motin
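The question of whether a kernel was "explicitly built without `options ATA_CAM`" can be checked mechanically from the kernel configuration file. The Python below is a minimal sketch under stated assumptions: it scans config text for `options` and `nooptions` directives naming ATA_CAM and reports the last one seen. It does not follow `include` lines, so a file that only includes another config reports False even if the included file enables the option. The function name uses_ata_cam and both sample configs are hypothetical, and the LEGACY sample only demonstrates the option check; it is not claimed to be a complete working legacy configuration.

```python
def uses_ata_cam(config_text: str) -> bool:
    """Return True if the last directive naming ATA_CAM in the text enables it.

    Limitation: include directives are not followed.
    """
    enabled = False
    for raw in config_text.splitlines():
        # Drop trailing comments, then split into directive and argument list.
        fields = raw.split("#", 1)[0].split(None, 1)
        if len(fields) != 2:
            continue
        directive, rest = fields
        # A directive may list several comma-separated names, possibly NAME=value.
        names = [item.split("=", 1)[0].strip() for item in rest.split(",")]
        if "ATA_CAM" not in names:
            continue
        if directive == "options":
            enabled = True
        elif directive in ("nooption", "nooptions"):
            enabled = False
    return enabled


# Hypothetical sample configs for illustration only.
LEGACY = """
include GENERIC
ident LEGACYATA
nooptions ATA_CAM   # ask for the legacy ata(4) stack
"""

NEW = """
ident SKETCH
options ATA_CAM
"""

print("LEGACY uses ATA_CAM:", uses_ata_cam(LEGACY))
print("NEW uses ATA_CAM:   ", uses_ata_cam(NEW))
```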
Re: Any objections/comments on axing out old ATA stack?
On Wed, Mar 27, 2013 at 2:32 PM, Steve Kargl s...@troutmask.apl.washington.edu wrote:
> On Wed, Mar 27, 2013 at 11:22:14PM +0200, Alexander Motin wrote:
>> Hi.
>>
>> Since FreeBSD 9.0 we are successfully running on the new CAM-based ATA stack, using only some controller drivers of old ata(4) by having `options ATA_CAM` enabled in all kernels by default. I have a wish to drop non-ATA_CAM ata(4) code, unused since that time from the head branch to allow further ATA code cleanup.
>>
>> Does anyone here still use the legacy ATA stack (kernel explicitly built without `options ATA_CAM`) for some reason, for example as a workaround for some regression?
>
> Yes, I use the legacy ATA stack.

You're missing the reason why you're running the old ATA stack. Do you have hardware that doesn't work with ATA_CAM? Have you not tried ATA_CAM on that box? Some other reason?

-- Freddie Cash fjwc...@gmail.com
Re: Any objections/comments on axing out old ATA stack?
On Wed, Mar 27, 2013 at 11:35:35PM +0200, Alexander Motin wrote:
> On 27.03.2013 23:32, Steve Kargl wrote:
>> On Wed, Mar 27, 2013 at 11:22:14PM +0200, Alexander Motin wrote:
>>> Hi.
>>>
>>> Since FreeBSD 9.0 we are successfully running on the new CAM-based ATA stack, using only some controller drivers of old ata(4) by having `options ATA_CAM` enabled in all kernels by default. I have a wish to drop non-ATA_CAM ata(4) code, unused since that time from the head branch to allow further ATA code cleanup.
>>>
>>> Does anyone here still use the legacy ATA stack (kernel explicitly built without `options ATA_CAM`) for some reason, for example as a workaround for some regression?
>>
>> Yes, I use the legacy ATA stack.
>
> On 9.x or HEAD where new one is default?

Head.

>>> Does anybody have good ideas why we should not drop it now?
>>
>> Because it works?
>
> Any problems with new one?

Last time I tested the new one, and this was several months ago, the system (a Dell Latitude D530 laptop) would not boot.

-- Steve
Re: Any objections/comments on axing out old ATA stack?
On 28.03.2013 00:05, Steve Kargl wrote:
> On Wed, Mar 27, 2013 at 11:35:35PM +0200, Alexander Motin wrote:
>> On 27.03.2013 23:32, Steve Kargl wrote:
>>> On Wed, Mar 27, 2013 at 11:22:14PM +0200, Alexander Motin wrote:
>>>> Hi.
>>>>
>>>> Since FreeBSD 9.0 we are successfully running on the new CAM-based ATA stack, using only some controller drivers of old ata(4) by having `options ATA_CAM` enabled in all kernels by default. I have a wish to drop non-ATA_CAM ata(4) code, unused since that time from the head branch to allow further ATA code cleanup.
>>>>
>>>> Does anyone here still use the legacy ATA stack (kernel explicitly built without `options ATA_CAM`) for some reason, for example as a workaround for some regression?
>>>
>>> Yes, I use the legacy ATA stack.
>>
>> On 9.x or HEAD where new one is default?
>
> Head.
>
>>>> Does anybody have good ideas why we should not drop it now?
>>>
>>> Because it works?
>>
>> Any problems with new one?
>
> Last time I tested the new one, and this was several months ago, the system (a Dell Latitude D530 laptop) would not boot.

Probably we should just fix that. Any more info?

-- Alexander Motin
Re: Any objections/comments on axing out old ATA stack?
On Thu, Mar 28, 2013 at 12:22:11AM +0200, Alexander Motin wrote:
> On 28.03.2013 00:05, Steve Kargl wrote:
>> Last time I tested the new one, and this was several months ago, the system (a Dell Latitude D530 laptop) would not boot.
>
> Probably we should just fix that. Any more info?

I can't remember all the details. I intended to try again as work was being done on the new code at the time. I never got around to it as my laptop worked fine with the old code and unfortunately I got busy with work and family.

Reading the freebsd-current mailing lists suggests that now is not the time to be a hero.

-- Steve
Re: Any objections/comments on axing out old ATA stack?
My main concern with the new stuff is that it requires CAM and that's reasonably big compared to the standalone ATA code.

It'd be nice if we could slim down the CAM stack a bit first; it makes embedding it on the smaller devices really freaking painful.

Thanks,

adrian