Mike Benoit wrote:
Your detailed explanation is appreciated, David, and while I'm far from a
file system expert, I believe you've overstated the negative effects
somewhat.

It sounds to me like you've gotten Reiser4's allocation process with
regard to wandering logs correct, from what I've read anyway, but I
think you've overstated its fragmentation disadvantage when compared
against other file systems.

I think the thing we need to keep in mind here is that fragmentation
isn't always a net loss. Depending on the workload, fragmentation (or at
least not tightly packing data) could actually be a gain. In cases where

defragmented != tightly packed.

you have files (like log files or database files) that constantly grow
over a long period of time, packing them tightly at regularly scheduled
intervals (or at all?) could cause more harm than good.

This is true...

Consider this scenario of two MySQL tables having rows inserted to each
one simultaneously, and let's also assume that the two tables were
tightly packed before we started the insert process.

1 = Data for Table1
2 = Data for Table2
Tightly packed:

111111111111222222222222----------------------------

Simultaneous inserts start:

1111111111112222222222221122112211221122------------


Allocate on flush alone would probably help this scenario immensely.

Yes, it would.  You'd end up with

1111111111112222222222221111111122222222------------

assuming they both fit into RAM. And of course they could later be repacked.
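Just to make the mechanism concrete, here's a toy model of that layout difference (purely illustrative -- nothing to do with the actual Reiser4 code): with allocate on flush, blocks queued for the same file get laid out together when the flush happens, instead of in strict arrival order.

```python
# Toy model of block allocation: interleaved 2-block writes from two
# "tables", with and without allocate-on-flush style batching.
# This is a simplification for illustration, not real filesystem code.

def allocate(writes, batch):
    """Lay out blocks in arrival order; with batching, queued blocks
    are grouped by file before they hit the disk (allocate on flush)."""
    if batch:
        # The flush groups everything queued for the same file together
        # (sorted() is stable, so per-file write order is preserved).
        writes = sorted(writes, key=lambda w: w[0])
    layout = []
    for owner, nblocks in writes:
        layout.extend(owner * nblocks)
    return "".join(layout)

# Each tuple is (file id, blocks written), in arrival order.
interleaved = [("1", 2), ("2", 2)] * 4

print(allocate(interleaved, batch=False))  # 1122112211221122
print(allocate(interleaved, batch=True))   # 1111111122222222
```

The two output strings are exactly the tails of the layouts above: interleaved writes land as `11221122...`, while batching them until flush yields the contiguous `11111111222222...` runs.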

By the way, this is the NTFS approach to avoiding fragmentation -- try to avoid fragmenting anything below a certain block size. I, for one, would be perfectly happy if my large files were split up every 50 or 100 megs or so.

The problem is when you get tons of tiny files and metadata stored so horribly inefficiently that things like Native Command Queuing actually become a huge performance boost.

The other thing you need to keep in mind is that database files are like
their own little mini-file system. They have their own fragmentation
issues to deal with (especially PostgreSQL).

I'd rather not add to that. This is one reason to hate virtualization, by the way -- it's bad enough to have a fragmented NTFS on your Windows installation, but worse if the disk itself is a fragmented sparse file on Linux.

So in cases like you
described where you are overwriting data in the middle of a file,
Reiser4 may be poor at doing this specific operation compared to other
file systems, but just because you overwrite a row that appears to be in
the middle of a table doesn't mean that the data itself is actually in
the middle of the table. If your original row is 1K, and you try to
overwrite it with 4K of data, it most likely will be put at the end of
the file anyways, and the original 1K of data will be marked for
overwriting later on. Isn't this what myisampack is for?

If what you say is true, isn't myisampack also an issue here? Surely it doesn't write out an entirely separate copy of the file?

Anyway, the most common usage I can see for MySQL would be overwriting a 1K row with another 1K row, or dropping a row, or adding a wholly new row. I may be a bit naive here...

But then, isn't there also some metadata somewhere which says things like how many rows you have in a given table?
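The "grown rows go to the end" behavior being described can be sketched like this (a toy model under my own assumptions -- this is NOT MySQL's real on-disk format): an overwrite that doesn't fit in place appends a new copy and marks the old slot dead, and a later compaction pass, in the spirit of what myisampack would do, squeezes the holes out.

```python
# Toy record file: rows live in slots in file order; a grown row is
# appended at the end and its old slot is marked dead until compaction.

class ToyTable:
    def __init__(self):
        self.slots = []  # list of [live, payload], in file order

    def insert(self, payload):
        self.slots.append([True, payload])
        return len(self.slots) - 1  # row id = slot index

    def overwrite(self, rowid, payload):
        live, old = self.slots[rowid]
        if len(payload) <= len(old):
            self.slots[rowid][1] = payload  # fits in place, reuse slot
            return rowid
        self.slots[rowid][0] = False        # mark the old slot dead
        return self.insert(payload)         # new copy goes at the end

    def dead_bytes(self):
        return sum(len(p) for live, p in self.slots if not live)

    def compact(self):
        # The "repack" step: drop dead slots, closing the holes.
        self.slots = [s for s in self.slots if s[0]]

t = ToyTable()
r = t.insert(b"x" * 1024)        # a 1K row
r = t.overwrite(r, b"y" * 4096)  # 4K overwrite lands at the end
print(t.dead_bytes())            # 1024 -- the hole left behind
t.compact()
print(t.dead_bytes())            # 0 after repacking
```

So even before the filesystem gets involved, the database file itself accumulates holes that need their own repacking pass.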

And it's not just databases. Consider BitTorrent. The usual BitTorrent way of doing things is to create a sparse file, then fill it in randomly as you receive data. Only if you decide to allocate the whole file right away, instead of making it sparse, you gain nothing on Reiser4, since writes will be just as fragmented as if it was sparse.

Personally, I'd rather leave it as sparse, but repack everything later.
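For reference, a sparse file is nothing exotic at the syscall level: set the logical size up front, then write pieces at whatever offsets they arrive. A quick sketch (the exact block accounting depends on the filesystem; unwritten regions only stay unallocated where sparse files are supported):

```python
import os
import tempfile

# BitTorrent-style sparse file: truncate to the full payload size,
# then write one "piece" somewhere in the middle as it arrives.
fd, path = tempfile.mkstemp()
os.close(fd)

size = 16 * 1024 * 1024            # 16 MiB torrent payload
piece = b"\xff" * 16384            # one 16 KiB piece

with open(path, "r+b") as f:
    f.truncate(size)               # logical size; no data blocks yet
    f.seek(8 * 1024 * 1024)        # a piece arrives mid-file
    f.write(piece)

st = os.stat(path)
print(st.st_size)                  # 16777216 -- the full logical size
print(st.st_blocks * 512)          # typically far less: only written
                                   # regions are actually allocated
os.unlink(path)
```

The point being: only the regions you've actually written consume blocks, so the fragmentation question is entirely about where those writes land, not about the file's logical size.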

So while I think what you described is ultimately correct, I believe
extreme negative effects from it to be a corner case, and probably not
representative of the norm. I also believe that other Reiser4
improvements would outweigh this drawback to wandering logs, again in
average workloads.

Depends on your definition of average. I'm also speaking from experience. On Gentoo, /usr/portage started out being insanely fast on Reiser4, because it barely had to seek at all -- despite being about 145,000 small files. I think it was maybe half that when I first put it on r4, but it's more than twice as slow now, and you can hear it thrashing.

Now, the wandering logs did make the rsync process pretty fast -- the entire thing gets rsync'd against one of the Gentoo mirrors. For anyone using Debian, this is the equivalent of "apt-get update".

Only now, this rsync process is not only entirely disk-bound, it's something like 10x as slow. I have a gig of RAM, so at least it's fast once it's cached, but it's obviously horrendously fragmented. I am not sure if it's individual files or directories, but it could REALLY use a repack.

From what I remember of v3, it was never quite this bad, but it never started out as fast as it did on Reiser4.

This is why I'm curious to see some benchmarks, by the way -- all of this is subjective, and from memory.

Like you mentioned, if Reiser4 performance gets so poor without the
repacker, and Hans decides to charge for it, I think that will turn away
a lot of potential users, as they could feel that this is a type of
extortion. Get them hooked on something that only performs well for a
certain amount of time, then charge them money to keep it up. I also
think the community would write their own repacker pretty quickly in
response.

Depends. Unfortunately, it's far more likely that the community would go "fsck this" and use XFS instead. Or JFS. Or any of the other filesystems that Linux has which don't need a repacker.

It would eventually get done by the community, but if it's taking the Namesys guys this long, and if they really expect to be able to make money off of it, it must not be as trivial as I think it is.

A much better approach in my opinion would be to have Reiser4 perform
well in the majority of cases without the repacker, and sell the
repacker to people who need that extra bit of performance. If I'm not
mistaken, this is actually Hans's intent.

Hans?

If Reiser4 does turn out to
perform much worse over time, I would expect Hans would consider it a
bug or design flaw and try to correct the problem however possible.

Or a design constraint...

But I guess only time will tell if this is true or not. ;)

I'll tell you now it's true.

To be fair, I'm not entirely up to date, but I've had a Reiser4 root partition for over a year now. It seems pretty decent for most things, but I've definitely noticed that anywhere like /usr/portage -- lots of files changing, lots staying the same, over time -- ends up pretty badly fragmented. Other examples would be games, especially Steam games and MMOs, played using Wine.

And I'd like some benchmarks, but I strongly suspect that this problem is pretty bad -- and that the more you'd think a particular workload is suited for Reiser4, the better the benchmarks are initially, the worse it will degrade if there's any writing going on.
