Mike Benoit wrote:
> Your detailed explanation is appreciated David and while I'm far from a
> file system expert, I believe you've overstated the negative effects
> somewhat.
>
> It sounds to me like you've gotten Reiser4's allocation process in
> regards to wandering logs correct, from what I've read anyways, but I
> think you've overstated its fragmentation disadvantage when compared
> against other file systems.
>
> I think the thing we need to keep in mind here is that fragmentation
> isn't always a net loss. Depending on the workload, fragmentation (or at
> least not tightly packing data) could actually be a gain.

defragmented != tightly packed.

> In cases where you have files (like log files or database files) that
> constantly grow over a long period of time, packing them tightly at
> regularly scheduled intervals (or at all?) could cause more harm than
> good.

This is true...
> Consider this scenario of two MySQL tables having rows inserted to each
> one simultaneously, and let's also assume that the two tables were
> tightly packed before we started the insert process.
>
> 1 = Data for Table1
> 2 = Data for Table2
>
> Tightly packed:
> 111111111111222222222222----------------------------
>
> Simultaneous inserts start:
> 1111111111112222222222221122112211221122------------
>
> Allocate on flush alone would probably help this scenario immensely.
Yes, it would. You'd end up with
1111111111112222222222221111111122222222------------
assuming they both fit into RAM. And of course they could later be
repacked.
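The allocate-on-flush effect can be shown with a toy simulation: writes that get batched before allocation come out grouped per file. Everything here (the block counts, the batch sizes, the per-flush sort) is invented for illustration; it is not Reiser4's actual allocator.

```python
# Toy simulation of the layout above: two "tables" receiving interleaved
# appends, allocated either immediately (per write) or on flush (per batch).

def allocate(writes, batch_size):
    """Lay out (table_id, n_blocks) writes on a fresh region.
    batch_size=1 models immediate allocation; a large batch models
    allocate-on-flush, which lets each file's dirty blocks go out
    contiguously."""
    disk = []
    for start in range(0, len(writes), batch_size):
        batch = writes[start:start + batch_size]
        # Within one flush, group the dirty blocks by file before writing.
        for table, blocks in sorted(batch):
            disk.extend([table] * blocks)
    return "".join(str(t) for t in disk)

# Eight alternating one-block inserts into tables 1 and 2:
writes = [(1, 1), (2, 1)] * 4

print(allocate(writes, batch_size=1))  # immediate: 12121212
print(allocate(writes, batch_size=8))  # on flush:  11112222
```

With immediate allocation the two tables interleave block by block; with everything held until one flush, each table's blocks land together, which is the repacked layout above.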
By the way, this is the NTFS approach to avoiding fragmentation -- try
to avoid fragmenting anything below a certain block size. I, for one,
would be perfectly happy if my large files were split up every 50 or 100
megs or so.
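That policy (fragments are fine as long as each piece stays above some floor) can be sketched as a toy placer. The 100 MiB floor and the greedy strategy are my invention for illustration, not NTFS's real allocation logic.

```python
# Sketch of the "don't fragment below a threshold" idea: a file may be
# split across free regions, but never into pieces smaller than MIN_EXTENT.

MIN_EXTENT = 100 * 2**20  # invented 100 MiB floor per fragment

def place(file_size, free_regions):
    """Greedily place a file into free regions (sizes in bytes),
    refusing any fragment smaller than MIN_EXTENT."""
    extents, remaining = [], file_size
    for region in free_regions:
        if remaining <= 0:
            break
        piece = min(region, remaining)
        # A fragment below the floor isn't worth the extra seek: skip
        # this region unless it can hold the whole remainder.
        if piece < MIN_EXTENT and piece < remaining:
            continue
        extents.append(piece)
        remaining -= piece
    return extents if remaining == 0 else None  # None: no acceptable layout

# A 1 GiB file over scattered free space: the 10 MiB hole is passed over.
layout = place(1024 * 2**20, [300 * 2**20, 10 * 2**20, 800 * 2**20])
print([e // 2**20 for e in layout])  # [300, 724]
```

The file still fragments, but into two large sequential runs instead of dozens of small ones, so the seek cost stays bounded.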
The problem is when you get tons of tiny files and metadata stored so
horribly inefficiently that things like Native Command Queuing are
actually a huge performance boost.
> The other thing you need to keep in mind is that database files are like
> their own little mini-file system. They have their own fragmentation
> issues to deal with (especially PostgreSQL).
I'd rather not add to that. This is one reason to hate virtualization,
by the way -- it's bad enough to have a fragmented NTFS on your Windows
installation, but worse if the disk itself is a fragmented sparse file
on Linux.
> So in cases like you described where you are overwriting data in the
> middle of a file, Reiser4 may be poor at doing this specific operation
> compared to other file systems, but just because you overwrite a row
> that appears to be in the middle of a table doesn't mean that the data
> itself is actually in the middle of the table. If your original row is
> 1K, and you try to overwrite it with 4K of data, it most likely will be
> put at the end of the file anyways, and the original 1K of data will be
> marked for overwriting later on. Isn't this what myisampack is for?
If what you say is true, isn't myisampack also an issue here? Surely it
doesn't write out an entirely separate copy of the file?
Anyway, the most common usage I can see for mysql would be overwriting a
1K row with another 1K row, or dropping a row, or adding a wholly new
row. I may be a bit naive here...
But then, isn't there also some metadata somewhere which says things
like how many rows you have in a given table?
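What Mike describes is essentially overwrite-by-append with a tombstone, which a toy model makes concrete. This is a simplification I'm inventing to illustrate the pattern; it is not MySQL's actual row format.

```python
# Toy "table" that updates a row by appending the new version and
# tombstoning the old slot, rather than rewriting in place.

class ToyTable:
    def __init__(self):
        self.slots = []        # (row_id, payload), or None for a tombstone
        self.index = {}        # row_id -> slot position

    def insert(self, row_id, payload):
        self.index[row_id] = len(self.slots)
        self.slots.append((row_id, payload))

    def update(self, row_id, payload):
        old = self.index[row_id]
        self.slots[old] = None          # dead space, reclaimable by a repack
        self.insert(row_id, payload)    # new version goes at the end

t = ToyTable()
t.insert("a", b"x" * 1024)
t.insert("b", b"y" * 1024)
t.update("a", b"z" * 4096)   # 4K won't fit in the 1K slot: append instead
print([s[0] if s else "-" for s in t.slots])  # ['-', 'b', 'a']
```

The "middle of the table" row ends up at the end of the file, and the hole it left behind is exactly the kind of dead space a pack pass would squeeze out.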
And it's not just databases. Consider BitTorrent. The usual BitTorrent
way of doing things is to create a sparse file, then fill it in randomly
as you receive data. And even if you decide to allocate the whole file
right away instead of making it sparse, you gain nothing on Reiser4,
since the writes will be just as fragmented as if it were sparse.
Personally, I'd rather leave it as sparse, but repack everything later.
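For the curious, the sparse pattern looks roughly like this; the piece size, piece count, and path are all invented for the sketch.

```python
# Sketch of the BitTorrent-style write pattern: set the file's logical
# size up front, then fill pieces in whatever order they arrive.
import os
import random
import tempfile

PIECE = 256 * 1024                       # made-up piece size
pieces = list(range(16))                 # a 4 MiB "torrent"
random.shuffle(pieces)                   # pieces arrive out of order

path = os.path.join(tempfile.mkdtemp(), "download.bin")
with open(path, "wb") as f:
    f.truncate(PIECE * len(pieces))      # sparse: full size, no blocks yet
    for p in pieces:
        f.seek(p * PIECE)                # each write can land anywhere
        f.write(bytes([p]) * PIECE)

# The logical size is the full torrent; physically, most filesystems
# allocated the blocks in arrival order, i.e. in shuffled order.
print(os.path.getsize(path))             # 4194304
```

The file reads back correctly either way; the question is only where the filesystem put the blocks, which is what a later repack would fix.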
> So while I think what you described is ultimately correct, I believe
> extreme negative effects from it to be a corner case, and probably not
> representative of the norm. I also believe that other Reiser4
> improvements would outweigh this drawback to wandering logs, again in
> average workloads.
Depends on your definition of average. I'm also speaking from
experience. On Gentoo, /usr/portage started out being insanely fast on
Reiser4, because it barely had to seek at all -- despite being about
145,000 small files. I think it was maybe half that when I first put it
on r4, but it's more than twice as slow now, and you can hear it thrashing.
Now, the wandering logs did make the rsync process pretty fast -- the
entire thing gets rsync'd against one of the Gentoo mirrors. For anyone
using Debian, this is the equivalent of "apt-get update".
Only now, this rsync process is not only entirely disk-bound, it's
something like 10x as slow. I have a gig of RAM, so at least it's fast
once it's cached, but it's obviously horrendously fragmented. I am not
sure if it's individual files or directories, but it could REALLY use a
repack.
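In its crudest userspace form, a repack is just rewriting each file sequentially so the allocator gets a fresh chance to lay it out contiguously. A minimal sketch, assuming POSIX semantics; a real repacker would also preserve xattrs, handle hard links, and deal with directories and metadata, which is presumably where the hard part lies.

```python
# Naive userspace "repack": copy a file in one sequential pass, then
# atomically swap the copy into place.
import os
import shutil
import tempfile

def repack_file(path):
    """Rewrite path sequentially and rename the copy over the original."""
    dir_ = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dir_)
    os.close(fd)
    shutil.copy2(path, tmp)   # one sequential rewrite: new, hopefully
                              # contiguous, allocation
    os.replace(tmp, path)     # atomic rename over the original

# Demo on a throwaway file; the contents survive the swap.
demo = os.path.join(tempfile.mkdtemp(), "frag.dat")
with open(demo, "wb") as f:
    f.write(b"abc" * 1000)
repack_file(demo)
print(open(demo, "rb").read() == b"abc" * 1000)  # True
```

This is the same trick as defragmenting by copying a partition off and back on, just per file, and it says nothing about packing unrelated small files near each other, which is what /usr/portage really needs.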
From what I remember of v3, it was never quite this bad, but then it
never started out as fast as Reiser4 did, either.
This is why I'm curious to see some benchmarks, by the way -- all of
this is subjective, and from memory.
> Like you mentioned, if Reiser4 performance gets so poor without the
> repacker, and Hans decides to charge for it, I think that will turn away
> a lot of potential users, as they could feel that this is a type of
> extortion: get them hooked on something that only performs well for a
> certain amount of time, then charge them money to keep it up. I also
> think the community would write their own repacker pretty quickly in
> response.
Depends. Unfortunately, it's far more likely that the community would
go "fsck this" and use XFS instead. Or JFS. Or any of the other
filesystems that Linux has which don't need a repacker.
It would eventually get done by the community, but if it's taking the
Namesys guys this long, and if they really expect to be able to make
money off of it, it must not be as trivial as I think it is.
> A much better approach in my opinion would be to have Reiser4 perform
> well in the majority of cases without the repacker, and sell the
> repacker to people who need that extra bit of performance. If I'm not
> mistaken this is actually Hans' intent.
Hans?
> If Reiser4 does turn out to perform much worse over time, I would
> expect Hans would consider it a bug or design flaw and try to correct
> the problem however possible.
Or a design constraint...
> But I guess only time will tell if this is true or not. ;)
I'll tell you now it's true.
To be fair, I'm not entirely up to date, but I've had a Reiser4 root
partition for over a year now. It seems pretty decent for most things,
but I've definitely noticed that anywhere like /usr/portage -- lots of
files changing, lots staying the same, over time -- ends up pretty badly
fragmented. Other examples would be games, especially Steam games and
MMOs, played using Wine.
And I'd like some benchmarks, but I strongly suspect that this problem
is pretty bad -- and that the more you'd think a particular workload is
suited for Reiser4, the better the benchmarks are initially, the worse
it will degrade if there's any writing going on.