Mike Benoit wrote:
> On Fri, 2006-07-21 at 16:06 -0500, David Masover wrote:
>> Mike Benoit wrote:

>>> Tuning fsync will fix the last wart on Reiser4 as far as benchmarks are
>>> concerned, won't it? Right now Reiser4 looks excellent on the benchmarks
>>> that don't use fsync often (mongo?), but last I recall the fsync
>>> performance was so poor that it overshadowed the rest of the performance.
>>> It would also probably be more useful to a much wider audience, especially
>>> if Namesys decides to charge for the repacker.
>> If Namesys does decide to charge for the repacker, I'll have to consider whether it's worth paying for or whether to use XFS instead. Reiser4 tends to become much more fragmented than most other Linux FSes -- purely subjective, but probably true.


> I would like to see some actual data on this. I haven't used Reiser4 for
> over a year, and when I did it was only to benchmark it. But Reiser4
> allocates on flush, so in theory this should decrease fragmentation, not
> increase it. Because of this, I question what you are _really_ seeing, or
> whether it is perhaps a bug in the allocator. Why would XFS or any other
> multi-purpose file system resist fragmentation noticeably more than
> Reiser4 does?

Maybe not XFS, but in any case, Reiser4 fragments more because of how its journaling works. It's the wandering logs.

Basically, when most Linux filesystems allocate space, they try to allocate it contiguously, and the data generally stays in the same place afterward. With ext3, if you write to the middle of a file, or overwrite the entire file, your writes are generally written once to the journal and then again to the place where the file originally was.

Similarly, if you delete and then create a bunch of small files, you're generally going to see the new files created in the same place the old files were.

With Reiser4, wandering logs mean that rather than writing to a journal, a write to the middle of a file puts that chunk somewhere else on the disk, and the commit comes down to one atomic operation that simply points the file at the new location. Which means that if you have a filesystem physically laid out on disk like this (for simplicity, assume it holds only a single file):

# is data
* is also data
- is free space

######*****########--------------

When you try to write in the middle (the '*' chars) -- let's say we're changing them to '%' chars -- this happens:

######*****########%%%%%---------

Once that's done, the file is updated so that the middle of it points to the fragment in the new location, and the old location is freed:

######-----########%%%%%---------
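
In case the ASCII art doesn't get the idea across, here's a rough Python sketch of the same overwrite, modelled as "write the new copy elsewhere, swap one pointer, then free the old copy". To be clear, this is a toy model I made up to illustrate the ordering; the names, block numbers, and allocator policy are invented and aren't anything from the actual Reiser4 source.

disk = {}                      # block number -> contents
free_blocks = set(range(33))   # 33 blocks, the width of the diagrams above


def allocate(n):
    """Grab n free blocks, preferring the lowest-numbered ones."""
    picked = sorted(free_blocks)[:n]
    for b in picked:
        free_blocks.discard(b)
    return picked


def wandering_overwrite(file_extents, index, new_data):
    """Overwrite one extent: new copy first, pointer swap, then free the old blocks."""
    old_blocks = file_extents[index]
    new_blocks = allocate(len(new_data))
    for b, byte in zip(new_blocks, new_data):
        disk[b] = byte                    # step 1: write the new data somewhere else
    file_extents[index] = new_blocks      # step 2: the one atomic pointer swap (the commit)
    for b in old_blocks:                  # step 3: only now is the old location freed
        del disk[b]
        free_blocks.add(b)


# Lay out the single file from the diagram: ######*****########
extents = [allocate(6), allocate(5), allocate(8)]
for ext, ch in zip(extents, "#*#"):
    for b in ext:
        disk[b] = ch

wandering_overwrite(extents, 1, "%%%%%")
print("".join(disk.get(b, "-") for b in range(33)))   # ######-----########%%%%%---------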



Keep in mind, because of lazy writes, it's much more likely for the whole change to happen at once. Here's another example:

#####------------

Let's say we just want to overwrite the file with another one of the same length:

#####%%%%%-------

then, commit the transaction:

-----%%%%%-------

You see the problem? You've now split the free space in half. Realistically, of course, it wouldn't be by halves, but you're basically punching random air holes all over the place, and your FS becomes more like foam, eating into the free space until you can no longer make good use of it. In the above example, if we then have to write some huge file, it looks like this:

*****%%%%%*******

Split right in half. Now imagine this effect multiplied by hundreds or thousands of files, over time...
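
If you want to watch the foam build up, here's a quick-and-dirty simulation of the same thing: a pile of small files rewritten over and over, with the new copy always allocated *before* the old copy is freed, so it can never land back in the old spot. Again, the allocator policy and file sizes here are made up for illustration; it's not how Reiser4 actually allocates, just the ordering constraint.

import random

DISK = 1000
free = [True] * DISK          # True = the block is free
files = {}                    # file id -> (start, length) of its one extent

def alloc_contiguous(n):
    """First-fit: claim and return the start of the first free run of length n."""
    run = 0
    for b in range(DISK):
        run = run + 1 if free[b] else 0
        if run == n:
            start = b - n + 1
            for x in range(start, start + n):
                free[x] = False
            return start
    raise RuntimeError("no contiguous run big enough")

def rewrite(fid, n):
    """Wandering-style rewrite: allocate the new copy first, then free the old one."""
    new_start = alloc_contiguous(n)
    if fid in files:
        old_start, old_len = files[fid]
        for x in range(old_start, old_start + old_len):
            free[x] = True
    files[fid] = (new_start, n)

def free_runs():
    """Lengths of every contiguous run of free blocks."""
    runs, run = [], 0
    for b in range(DISK):
        if free[b]:
            run += 1
        elif run:
            runs.append(run)
            run = 0
    if run:
        runs.append(run)
    return runs

random.seed(0)
for fid in range(40):                      # create 40 small files
    rewrite(fid, random.randint(3, 10))
print("free runs:", len(free_runs()), " largest:", max(free_runs()))

for _ in range(2000):                      # then keep rewriting them
    rewrite(random.randrange(40), random.randint(3, 10))
print("free runs:", len(free_runs()), " largest:", max(free_runs()))

The total amount of free space never changes, but it ends up scattered across more and more holes, and the largest contiguous run shrinks -- which is exactly the foam.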

This is why Reiser4 needs a repacker. Larger files are fine -- I believe that past a certain size, Reiser4 will write twice instead -- so, looking at our first example:


######*****########--------------

Write to a new, temporary place:

######*****########%%%%%---------

Write back to the original place:

######%%%%%########%%%%%---------

Complete the transaction and free the temporary space:

######%%%%%########--------------


This technique is what other journaling filesystems use, and it also means that writing is literally twice as slow as on a non-journaling filesystem, or on one with wandering logs like Reiser4. But it's a practical necessity when you're dealing with, say, a 300 gig MySQL database of which only small 10k chunks are changing. Taking twice as long on a 10k chunk won't kill anyone, but fragmenting your 300 gig database (on a 320 gig partition) will kill your performance, and the result will be very difficult to defragment.
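
For contrast, here's the same overwrite done the conventional write-twice way, in the same toy-model style (again something I made up for illustration, not any real filesystem's code). The data ends up exactly where it started, so nothing fragments, but every byte gets written twice:

def journaled_overwrite(disk, journal, file_blocks, new_data):
    """Overwrite file_blocks in place, surviving a crash at any point."""
    journal[:] = list(new_data)            # write 1: the journal/scratch copy
    # -- if we crash here, the original data is still intact on disk --
    for b, byte in zip(file_blocks, new_data):
        disk[b] = byte                     # write 2: the real, in-place update
    # -- if we crash here, the journal can be replayed to finish the update --
    journal.clear()                        # commit: discard the scratch copy


disk = {b: "#" for b in range(19)}         # the ######*****######## file again
for b in range(6, 11):
    disk[b] = "*"
journal = []

journaled_overwrite(disk, journal, range(6, 11), "%%%%%")
print("".join(disk[b] for b in sorted(disk)))   # ######%%%%%######## -- same place as before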

But on smaller files, it would be very beneficial if we could allow the FS to slowly fragment (to foam-ify, if you will) and defrag once a week. The amount of speed gained in each write -- and read, if it's not getting too awful during that week -- definitely makes up for having to spend an hour or so defragmenting, especially if the FS can be online at the time.

And you can probably figure out an optimal time to wait before defragmenting, since your biggest fragmentation problems happen when the chunk of contiguous space at the end of the disk disappears, and all of your free space is scattered (fragmented) throughout the disk.
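
One crude way to put that in code, purely as a sketch (the metric and the 25% threshold are numbers I pulled out of the air): track what fraction of the free space still sits in one big contiguous run, and kick off the repacker once that fraction drops too low.

def contiguity(free_runs):
    """Fraction of all free space that sits in the single largest run."""
    total = sum(free_runs)
    return max(free_runs) / total if total else 1.0

def should_repack(free_runs, threshold=0.25):
    return contiguity(free_runs) < threshold

print(should_repack([600]))                 # fresh FS: one big run   -> False
print(should_repack([40, 12, 7, 3] * 10))   # foamy FS: many holes    -> True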

Anyway, that's why. If you disable the wandering-log behavior, your write performance is cut in half. If you don't have a repacker, your FS becomes very fragmented, very fast.

I apologize for my poor ASCII art, especially if I'm dead wrong...


> No Linux file system that I'm aware of has a defragmenter, but they DO
> become fragmented, just not nearly as badly as FAT32 used to back when MS
> created their defragmenter. The highest "non-contiguous" percentage I've
> seen with EXT3 is about 12%; with FAT32 I have seen over 50%, and NTFS
> over 30%. [...]

I'd like to see some numbers on Reiser4, then. Maybe a formal fragmentation benchmark?
