On Saturday 13 March 2010 17:43:59 Stephan von Krawczynski wrote:
> On Thu, 11 Mar 2010 13:00:17 -0500
> Chris Mason <chris.ma...@oracle.com> wrote:
> > On Thu, Mar 11, 2010 at 06:35:06PM +0100, Stephan von Krawczynski wrote:
> > > On Thu, 11 Mar 2010 15:39:05 +0100
> > > Sander <san...@humilis.net> wrote:
> > > > Stephan von Krawczynski wrote (ao):
> > > > > Honestly I would just drop the idea of an SSD option simply because
> > > > > the vendors implement all kinds of neat strategies in their
> > > > > devices. So in the end you cannot really tell if the option does
> > > > > something constructive and not destructive in combination with a
> > > > > SSD controller.
> > > > 
> > > > My understanding of the ssd mount option is also that the fs doesn't
> > > > try to do all kinds of smart (and potentially expensive) things which
> > > > make sense on rotating media to reduce seeks and the like.
> > > > 
> > > >         Sander
> > > 
> > > Such an optimization sounds valid at first sight. But think it through:
> > > how does the fs really know about the seeks needed during some operation?
> > 
> > Well the FS makes a few assumptions (in the non-ssd case).  First it
> > assumes the storage is not a memory device.  If things would fit in
> > memory we wouldn't need filesystems in the first place.
> 
> Ok, here is the bad news. This assumption can be anything from right to
> completely wrong, and you cannot really tell which is the mainstream case.
> Two examples from opposite parts of the technology world:
> - History: way back in the 80's there was third-party hardware for the C=1541
> (floppy drive for the C=64) that read in the complete floppy and served all
> incoming requests from the RAM buffer. So your assumption can already be
> wrong for a trivial floppy drive from ancient times.

Such an assumption doesn't make the FS work slower on such a device.

> - Nowadays: if you run a linux installation today, chances are that the
> matrix has you. Quite a lot of installations are virtualized. So your storage
> is virtual too, which means it is likely backed by an fs buffer of the
> host system, i.e. RAM.

Buffers use read-ahead and are smaller than the underlying device; still, such
an assumption doesn't make the FS perform worse in this situation.
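
To illustrate what I mean (just a sketch, not btrfs internals; the file name
below is made up): the same read-ahead machinery can even be steered from
user space, and it helps rather than hurts when the "disk" is really the
host's RAM:

  #define _POSIX_C_SOURCE 200112L
  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
      /* "/mnt/data/big.log" is a made-up example file */
      int fd = open("/mnt/data/big.log", O_RDONLY);
      if (fd < 0) {
          perror("open");
          return 1;
      }

      /* hint a sequential scan; the kernel may enlarge the
       * read-ahead window for this file in response */
      int err = posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
      if (err)
          fprintf(stderr, "posix_fadvise: %s\n", strerror(err));

      /* ... read() the file sequentially here ... */
      return 0;
  }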

> And sorry to say: "if things would fit in memory" you probably still need a
> fs simply because there is no actual way to organize data (be it
> executable or not) in RAM without a fs layer. You can't save data without
> an abstract file data type. To have one accessible you need a fs.

Yes, that's why there is tmpfs; btrfs isn't meant to be the be-all and end-all
as far as FSs go.
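
And if everything really does fit in RAM, you just mount one (a minimal
sketch; the mount point is made up and you need root):

  #include <stdio.h>
  #include <sys/mount.h>

  int main(void)
  {
      /* "/mnt/scratch" is a made-up, already-existing directory */
      if (mount("tmpfs", "/mnt/scratch", "tmpfs", 0, "size=64m") != 0) {
          perror("mount");
          return 1;
      }
      /* files created under /mnt/scratch now live in the page
       * cache (RAM, spilling to swap), with no block device at all */
      return 0;
  }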

> Btw the other way round is as interesting: there is currently no fs for
> linux that knows how to execute in place. Meaning if you really had only
> RAM and a fs to organize your data, it would be only logical to have ways
> to _not_ load data (into other parts of the RAM), but to use it in its
> original (RAM) storage space.

At least ext2 does support XIP on platforms that support it...
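
A sketch of how that looks (the device and mount point names are made up;
XIP also needs a kernel with CONFIG_EXT2_FS_XIP and a device the kernel can
map directly):

  #include <stdio.h>
  #include <sys/mount.h>

  int main(void)
  {
      /* "/dev/xipdev" and "/mnt/rom" are made up for the example;
       * the block device must support direct access for "xip" */
      if (mount("/dev/xipdev", "/mnt/rom", "ext2", MS_RDONLY, "xip") != 0) {
          perror("mount");
          return 1;
      }
      /* with xip, mmap()ed files (e.g. executables) are used in
       * place, without copying the pages into the page cache */
      return 0;
  }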

> 
> > Then it assumes that adjacent blocks are cheap to read and blocks that
> > are far away are expensive to read.  Given expensive raid controllers,
> > cache, and everything else, you're correct that sometimes this
> > assumption is wrong.
> 
> As already mentioned, this assumption may be completely wrong even without a
> raid controller, e.g. within a virtual environment. Even far-away blocks
> can be one byte away in the next fs buffer of the underlying host fs
> (assuming your device is in fact a file on the host ;-).

And again, such an assumption doesn't reduce performance.

> 
> >  But, on average seeking hurts.  Really a lot.
> 
> Yes, seeking hurts. But there is no way to know if there is seeking at all.
> On the other hand, if your storage is a netblock device seeking on the
> server is probably your smallest problem, compared to the network latency
> in between.

And because of that, there's read-ahead and support for big packets at the TCP
level, so the assumption does make the FS perform better with it than without
it.
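
E.g. on a network block device you can crank the read-ahead up yourself, so
one round trip fetches much more data (a sketch; "/dev/nbd0" just stands in
for whatever the server exports, and it needs root):

  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <linux/fs.h>

  int main(void)
  {
      /* "/dev/nbd0" is just an example device node */
      int fd = open("/dev/nbd0", O_RDONLY);
      if (fd < 0) {
          perror("open");
          return 1;
      }
      /* set read-ahead to 1024 sectors (512 KiB); the same thing
       * `blockdev --setra 1024 /dev/nbd0` does */
      if (ioctl(fd, BLKRASET, 1024) != 0)
          perror("ioctl(BLKRASET)");
      return 0;
  }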


It's one of the assumptions that you _have_ to make, just like the assumption
that the computer counts in binary, or that there's more disk space than RAM.
But those assumptions _don't_ make performance (much) worse when they don't
hold true for known devices that can impersonate rotating magnetic media.

> > We try to organize files such that files that are likely to be read
> > together are found together on disk.  Btrfs is fairly good at this
> > during file creation, and not as good as ext*/xfs when files are
> > overwritten and modified again and again (due to cow).
> 
> You are basically saying that btrfs perfectly organizes write-once devices
> ;-)
> 
> > If you turn mount -o ssd on for your drive and do a test, you might not
> > notice much difference right away.  ssds tend to be pretty good right
> > out of the box.  Over time it tends to help, but it is a very hard thing
> > to benchmark in general.
> 
> Honestly, this sounds like "I give up" to me ;-)
> You just said that generally it is "very hard to benchmark". Which means
> "nobody can see or feel it in the real world" in non-tech language.

No, it's not that. When an SSD is fresh, the underlying wear leveling has many
free blocks to choose from, so it's blazing fast. The same holds true when the
test uses a small amount of data (relative to the SSD's size).

"very hard to benchmark" means just that -- the benchmark is much more 
complicated, must take into account much more variables and takes much more 
time compared to rotating magnetic media benchmark.

To test SSD performance you need to benchmark both the speed of the flash
memory _and_ the speed and behaviour of the wear leveling algorithm (because
it rears its ugly head only after specific workloads or when all blocks are
allocated), and that's non-trivial to say the least. Add an FS on top of it
and you have a nice dissertation right there.
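
To give an idea of just one of those variables, here is a rough sketch
(nowhere near a real benchmark) that times batches of random direct writes;
on a fresh SSD the early batches are fast, and once the wear leveling runs
out of pre-erased blocks the times usually start to climb. "/dev/sdX" is a
placeholder, and the program destroys whatever is on it:

  #define _GNU_SOURCE               /* for O_DIRECT */
  #include <fcntl.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>
  #include <unistd.h>

  #define BLK   4096
  #define COUNT 1024                /* writes per timed batch */

  int main(void)
  {
      /* WARNING: overwrites the device -- "/dev/sdX" is a placeholder */
      int fd = open("/dev/sdX", O_WRONLY | O_DIRECT);
      if (fd < 0) { perror("open"); return 1; }

      void *buf;
      if (posix_memalign(&buf, BLK, BLK))  /* O_DIRECT needs alignment */
          return 1;

      off_t blocks = lseek(fd, 0, SEEK_END) / BLK;

      for (int batch = 0; batch < 64; batch++) {
          struct timespec t0, t1;
          clock_gettime(CLOCK_MONOTONIC, &t0);
          for (int i = 0; i < COUNT; i++) {
              off_t blk = rand() % blocks;      /* random target block */
              if (pwrite(fd, buf, BLK, blk * BLK) != BLK)
                  perror("pwrite");
          }
          fsync(fd);
          clock_gettime(CLOCK_MONOTONIC, &t1);
          printf("batch %2d: %.3f s\n", batch,
                 (t1.tv_sec - t0.tv_sec) +
                 (t1.tv_nsec - t0.tv_nsec) / 1e9);
      }
      free(buf);
      close(fd);
      return 0;
  }

And even this toy ignores the FS layer, TRIM, the drive's cache, and the
state the previous run left the blocks in -- hence the dissertation.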

-- 
Hubert Kario
QBS - Quality Business Software
ul. Ksawerów 30/85
02-656 Warszawa
POLAND
tel. +48 (22) 646-61-51, 646-74-24
fax +48 (22) 646-61-50