On 31.08.2011, 21:12, Andrew Wiley <wiley.andre...@gmail.com> wrote:

Yes, but the disk should be able to execute read and write requests out of
order (which SCSI was designed to allow), so this should never be an issue.
My understanding is that most SCSI implementations allow this (and SCSI is
in just about every storage protocol out there), with the exception of
Bulk Only Transport, which is used with USB hard drives and flash drives.
That will soon be replaced by USB Attached SCSI
(http://en.wikipedia.org/wiki/USB_Attached_SCSI), which specifically allows
out-of-order execution.
The idea behind much of SCSI is that the disk designers know far better
than the OS developers what is fast and what is not, and if we need to
worry about it at the application level, something has gone seriously
wrong.
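
(Inline illustration: a minimal sketch, assuming POSIX AIO (link with -lrt
on Linux/glibc) and a placeholder file name, of keeping several requests
queued at once so the lower layers are free to reorder them. Note that
glibc implements POSIX AIO with user-space threads, so the actual
reordering happens in the kernel's elevator and, with TCQ/NCQ, in the
drive itself.)

/* Queue several reads at once instead of one at a time; the drive
 * may then service them in whatever order suits its head position. */
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

enum { NREQ = 4, CHUNK = 64 * 1024 };

int main(void)
{
    int fd = open("bigfile.dat", O_RDONLY);   /* placeholder name */
    if (fd < 0) { perror("open"); return 1; }

    static char buf[NREQ][CHUNK];
    struct aiocb cb[NREQ];
    const struct aiocb *list[NREQ];

    for (int i = 0; i < NREQ; i++) {
        memset(&cb[i], 0, sizeof cb[i]);
        cb[i].aio_fildes = fd;
        cb[i].aio_buf    = buf[i];
        cb[i].aio_nbytes = CHUNK;
        cb[i].aio_offset = (off_t)i * CHUNK;
        if (aio_read(&cb[i]) != 0) { perror("aio_read"); return 1; }
        list[i] = &cb[i];
    }

    /* Block until every request has completed, regardless of the
     * completion order the lower layers chose. */
    int done = 0;
    while (done < NREQ) {
        aio_suspend(list, NREQ, NULL);
        done = 0;
        for (int i = 0; i < NREQ; i++)
            if (aio_error(&cb[i]) != EINPROGRESS)
                done++;
    }

    for (int i = 0; i < NREQ; i++)
        printf("request %d: %zd bytes\n", i, aio_return(&cb[i]));
    close(fd);
    return 0;
}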

SCSI has internal COPY commands; under optimal conditions the data never
touches main memory, and ideally that is what would be used. But here we
have an application that wants to read a chunk from a file, say 64 KB,
while the OS quickly detects the linear read pattern and reads ahead a
larger block of, say, 512 KB. The chunks are written out piecewise, and if
all goes well the kernel merges the writes so they become a 512 KB block
of their own. These are sent to a disk with an internal 8 MB cache, which
now has to decide, at this low level, whether to write the whole 8 MB at
once or to allow some reads in between, which keeps applications
responsive but increases seek time. Depending on the scenario, either
choice is preferable. In the file-copy scenario it is better to flush the
whole cache and then switch back to reading. If, on the other hand, you
are saving a video and click on the 'start' menu, which has to load a lot
of small icons, then the long write should yield to the reads. The disk
cannot make an informed decision here.

I think the basic conflict is that we have desktop and server environments
that are heavily multi-threaded, and HDDs that are inherently
single-threaded. They work best when you perform one continuous operation,
because every "context switch" costs a seek. If the problem were easy to
solve at the low level, there wouldn't be three full-blown I/O schedulers
in the Linux kernel (noop, deadline, and CFQ), IMO.

I don't worry about it at the application level, because I know the kernel
and disk handle most use cases well, without real worst-case behavior. But
if using an HDD in a multi-threaded way actually yielded higher throughput
than the naive read-first-write-later approach (see the sketch below), I
would be seriously surprised. If I were to bet, I'd say it is at least 5%
slower with buffering and an I/O scheduler enabled, and a real disaster
without them. :D
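
For reference, the naive read-first-write-later loop I mean is just the
following; a minimal sketch assuming plain POSIX calls, placeholder file
names, and the 64 KB chunk size from above:

/* The naive copy: read a 64 KB chunk, write it back out, repeat.
 * The kernel's readahead and write merging turn this into the
 * large sequential transfers described above. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

enum { CHUNK = 64 * 1024 };

int main(void)
{
    static char buf[CHUNK];
    int in  = open("src.dat", O_RDONLY);
    int out = open("dst.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (in < 0 || out < 0) { perror("open"); return 1; }

    /* Hint that we read sequentially so the kernel may enlarge its
     * readahead window (it usually detects the pattern anyway). */
    posix_fadvise(in, 0, 0, POSIX_FADV_SEQUENTIAL);

    ssize_t n;
    while ((n = read(in, buf, CHUNK)) > 0) {
        for (ssize_t off = 0; off < n; ) {    /* handle short writes */
            ssize_t w = write(out, buf + off, n - off);
            if (w < 0) { perror("write"); return 1; }
            off += w;
        }
    }
    if (n < 0) perror("read");

    close(in);
    close(out);
    return n < 0 ? 1 : 0;
}

Everything clever (readahead sizing, write merging, scheduling) happens
below this loop, which is exactly why I doubt application-level threading
buys anything here.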

- Marco
