On Saturday 13 March 2010 18:02:10 Stephan von Krawczynski wrote:
> On Fri, 12 Mar 2010 17:00:08 +0100
> Hubert Kario <h...@qbs.com.pl> wrote:
> > > Even on true
> > > spinning disks your assumption is wrong for relocated sectors.
> >
> > Which we don't have to worry about, because if the drive has fewer than 5
> > of 'em, the impact of hitting them is marginal, and if there are more,
> > the user has a much more pressing problem than the performance of the
> > drive or FS.
>
> Are you really sure that a drive firmware tells you about the true number
> of relocated sectors? I mean if it makes the product look better in
> comparison to another product, are you really sure that the firmware will
> not tell you what you expect to see only to make you content and happy
> with your drive?

Because Joe Sixpack doesn't read SMART values, and even if he does, he will be
much angrier when a drive that has no or few relocations fails than when a
drive that reports it's failing fails. If a drive arrives with bad sectors, it
goes back where it came from the same day, provided it meets an IT guy worth
his salt. Any IT guy knows that some HDDs develop bad sectors regardless of
make and model, and when they do, you replace them. And as the Google disk
survey showed, SMART has a very high percentage of Type I errors, but very few
Type II errors.

But we're off-topic here.

> > > Which
> > > basically means that every disk controller firmware fiddles around with
> > > the physical layout since decades. Please accept that you cannot do a
> > > disks' job in FS. The more advanced technology gets the more disks
> > > become black boxes with a defined software interface. Use this
> > > interface and drop the idea of having inside knowledge of such a
> > > device. That's other peoples' work. If you want to design smart SSD
> > > controllers hire at a company that builds those.
> >
> > And I don't think that doing the disks' job in the FS is a good idea, but
> > I think that we should be able to minimise the impact of the translation
> > layer.
> >
> > The way to do this is to treat the device as a block device with
> > sectors the size of erase-blocks. That's nothing too fancy, don't you
> > think?
>
> I don't believe anyone is able to tell the size of erase-blocks of some
> device - current and future - for sure.

Well, if the engineer who designed it doesn't know this, I don't know how he
got his degree. Just because it isn't publicised now doesn't mean it won't be
in the near future. Besides, detecting the erase-block size is easy if it has
any impact on performance; and if it has no impact (whatever the reason),
tuning for it is pointless anyway.
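
To illustrate the point that the erase-block size can be detected whenever it
affects performance, here is a minimal Python sketch. It assumes a toy cost
model (hypothetical, not measured on any real device) in which every erase
block touched by a write costs one read-erase-program cycle, and it assumes a
power-of-two erase-block size. A real probe would time aligned and shifted
O_DIRECT writes instead of calling a model. The names erase_blocks_touched and
guess_erase_block are made up for this example.

```python
from typing import Callable, Optional


def erase_blocks_touched(offset: int, length: int, erase_block: int) -> int:
    """Count erase blocks overlapped by the byte range [offset, offset + length)."""
    if length <= 0:
        return 0
    first = offset // erase_block
    last = (offset + length - 1) // erase_block
    return last - first + 1


def guess_erase_block(cost: Callable[[int, int], float],
                      min_size: int = 4096,
                      max_size: int = 64 * 1024 * 1024) -> Optional[int]:
    """Probe power-of-two write sizes and return the first size whose write,
    shifted by half its length, costs more than the same write at offset 0.

    Returns None when alignment never matters, i.e. when tuning for an
    erase-block size would be pointless.
    """
    size = min_size
    while size <= max_size:
        aligned = cost(0, size)
        shifted = cost(size // 2, size)
        if shifted > aligned:
            return size
        size *= 2
    return None


if __name__ == "__main__":
    # Hypothetical device with 512 KiB erase blocks (toy model, not real data).
    erase = 512 * 1024

    def model(off: int, length: int) -> int:
        return erase_blocks_touched(off, length, erase)

    print(guess_erase_block(model))                    # 524288
    # A device where alignment has no effect on cost.
    print(guess_erase_block(lambda off, length: 1.0))  # None
```

Shifting each probe by half its length is the design choice that makes this
work in the model: power-of-two probes smaller than the erase block still fit
inside one block, larger ones stay block-aligned, so only the probe equal to
the erase-block size pays for an extra erase.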
> I do believe that making this
> guess only reduces the future design options for new devices - if its
> creators care at all about your guess.

Did I, or anyone else, say that we want to hardwire a specific erase-block
size into the design of the FS?! That would be utter stupidity!

> Why not let the fs designer take his creative options in fs layer and let
> the device designer use his brain on the device level and all meet at the
> predefined software interface in between - and nowhere _else_.

We (well, at least Gordon and I) just want a "stripe_width" option added to
mkfs.btrfs, just like the one that exists for ext2/3/4, reiserfs, xfs and jfs,
to name a few. It would need very few additional tweaks to make it SSD
friendly, hardly any considering how -o ssd or -o ssd_spread already work.

You're forgetting there's an elephant in the room that won't talk to devices
whose sectors aren't 512 B in size. If not for it, there wouldn't even _be_
SSDs with 512 B sectors; that's not how flash memory works. The 512 B
abstraction is there for compatibility, to work with one current OS, not
because it better describes how flash memory works or because it is the best
way to address the data on the device itself. There are already consumer HDDs
with a 4 KiB sector size, so the situation is getting better. We can only hope
that in a few years' time SSDs will have sectors the size of erase-blocks. In
the meantime, stripe_width would be enough.

Besides, the stripe_width option would be useful not only for SSDs but also in
environments where btrfs sits on a device that is a RAID5/6 array
(reconfiguring a server with many virtual machines is far from easy, and
sometimes simply can't be done because of heterogeneous virtualised OSs that
need the data protection provided by lower layers).

--
Hubert Kario
QBS - Quality Business Software
ul. Ksawerów 30/85
02-656 Warszawa
POLAND
tel.
+48 (22) 646-61-51, 646-74-24
fax +48 (22) 646-61-50
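
For the RAID5/6 case mentioned above, the value a stripe_width option would
take can be computed the way the mke2fs man page describes it for ext4
(-E stride=,stripe_width=): stride is the RAID chunk size divided by the
filesystem block size, and stripe_width is stride times the number of
data-bearing disks. The Python sketch below assumes that convention and adds a
hypothetical align_up helper showing how an allocator could round extent
starts up to a stripe boundary; for an SSD the same helper would take the
erase-block size as the "stripe". None of this is an existing mkfs.btrfs
interface, and the function names are made up for illustration.

```python
from typing import Tuple

PARITY_DISKS = {5: 1, 6: 2}  # parity-bearing disks per stripe for RAID5/6


def raid_stripe_geometry(level: int, disks: int, chunk_bytes: int,
                         fs_block: int = 4096) -> Tuple[int, int]:
    """Return (stride, stripe_width), both in filesystem blocks."""
    if level not in PARITY_DISKS:
        raise ValueError("only RAID5 and RAID6 are handled here")
    data_disks = disks - PARITY_DISKS[level]
    if data_disks < 1:
        raise ValueError("not enough disks for this RAID level")
    if chunk_bytes % fs_block:
        raise ValueError("chunk size must be a multiple of the block size")
    stride = chunk_bytes // fs_block        # blocks per disk per stripe
    return stride, stride * data_disks      # blocks per full data stripe


def align_up(offset: int, stripe_bytes: int) -> int:
    """Round a byte offset up to the next multiple of stripe_bytes."""
    return -(-offset // stripe_bytes) * stripe_bytes


if __name__ == "__main__":
    # Hypothetical 6-disk RAID6 with 64 KiB chunks and 4 KiB blocks.
    stride, width = raid_stripe_geometry(6, 6, 64 * 1024)
    print(stride, width)                       # 16 64
    stripe_bytes = width * 4096                # 256 KiB of data per stripe
    print(align_up(300 * 1024, stripe_bytes))  # 524288
    # Same helper with a hypothetical 512 KiB SSD erase block as the "stripe".
    print(align_up(100_000, 512 * 1024))       # 524288
```

Writing whole, aligned stripes avoids the read-modify-write cycle that partial
stripe writes trigger on RAID5/6 parity, which is the same reason aligning to
erase-blocks helps on SSDs.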