Ben, et al.,


Your explanation has corrected some misconceptions I had regarding journaling
filesystems. Thanks. I think I've gleaned that the journal is an "add before you subtract"
kind of system, meaning you never put at risk information you don't have a copy of
squirreled away somewhere else (just in case). Somehow, this reminds me of my
workshop; only I do more adding than subtracting ;-)


I did read up on the ext3 implementation a bit. It's basically ext2 with a journal file
(/.journal). That is kind of neat and helps minimize the potential (as you point out below)
for new bugs, as much of the code is reused. An ext3 filesystem can even be mounted as ext2
if it is unmounted cleanly. Also, there are tools to add a journal to an ext2 filesystem,
essentially converting it to ext3. Not to denigrate other journaling filesystems, but it
would seem ext3 is a nice way to go if you're already comfortable with ext2 and you want
journaling.
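
(For the record, I believe the conversion tool is just tune2fs: running something
like "tune2fs -j /dev/hdXN" on the ext2 filesystem adds the journal, after which
it can be mounted with -t ext3. The device name there is obviously a placeholder;
check the man page before trying it on anything you care about.)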


--
____    __
| 0|___||.   Andrew Gaunt *nix Sys. Admin, etc. Lucent Technologies
_| _| : : }   [EMAIL PROTECTED] - http://www-cde.mv.lucent.com/~quantum
-(O)-==-o\   [EMAIL PROTECTED] - http://www.gaunt.org



[EMAIL PROTECTED] wrote:

On Wed, 30 Jul 2003, at 8:35am, [EMAIL PROTECTED] wrote:


Very cool, that was revealing. Perhaps this discussion can evolve into how
journalling (e.g. ext3, etc.) works and why it is good/bad. Anybody?



If a system crashes (software, hardware, power, whatever) in the middle of
a write transaction, then it is likely that the filesystem will be left in an
inconsistent state.  For that reason, many OSes will run a consistency check
on a filesystem that was not unmounted cleanly before mounting it again.
Most everyone here has probably seen "fsck" run after a crash for this
reason.


 That consistency check can take quite a long time, especially on a large
filesystem.  If the filesystem is sufficiently large, the check time can be
hours.  Worse still, if the crash happened at just the right (or wrong) time,
it can cause logical filesystem damage (e.g., a corrupt directory), causing
additional data loss.

 To solve this problem, one can use a journaling filesystem.  A
journaling filesystem does not simply write changes to the disk.  First, it
writes the changes to a journal (sometimes called a "transaction log" or
just "log").  Then it writes the actual changes to the disk (sometimes
called "committing").  Finally, it updates the journal to note that the
changes were successfully written (sometimes called "checkpointing").

 Now, if the system crashes in the middle of a transaction, upon re-mount,
the system just has to look at the journal.  If a complete transaction is
present in the journal, but has not been checkpointed, the journal is
"played back" to ensure the filesystem is made consistent.  If an incomplete
transaction is present in the journal, it was never committed, and thus can
be discarded.
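
 To make that logged/committed/checkpointed sequence concrete, here is a
toy sketch in Python.  Everything in it (the file names, the JSON record
format) is my own invention for illustration; a real filesystem journals
fixed-size disk blocks, not text records:

    # Toy write-ahead journal -- illustrative only, not how ext3 does it.
    import json
    import os

    JOURNAL = "journal.log"   # stand-in for the on-disk journal
    DATAFILE = "data.txt"     # stand-in for the filesystem proper

    def apply_changes(lines):
        # Stand-in for writing the real blocks of the filesystem.
        with open(DATAFILE, "a") as f:
            for line in lines:
                f.write(line + "\n")
            f.flush()
            os.fsync(f.fileno())

    def write_transaction(txid, lines):
        # 1. Write the intended changes to the journal first, and make
        #    sure they are physically on disk before going any further.
        with open(JOURNAL, "a") as j:
            j.write(json.dumps({"txid": txid, "lines": lines,
                                "state": "logged"}) + "\n")
            j.flush()
            os.fsync(j.fileno())
        # 2. Apply the changes to the real data ("commit").
        apply_changes(lines)
        # 3. Mark the transaction finished ("checkpoint").
        with open(JOURNAL, "a") as j:
            j.write(json.dumps({"txid": txid, "state": "done"}) + "\n")
            j.flush()
            os.fsync(j.fileno())

    def recover():
        # Run at "mount" time after a crash: replay transactions that
        # were fully logged but never checkpointed; discard the rest.
        if not os.path.exists(JOURNAL):
            return
        logged = {}
        done = set()
        with open(JOURNAL) as j:
            for line in j:
                try:
                    rec = json.loads(line)
                except ValueError:
                    break  # torn write at the tail: incomplete, discard
                if rec["state"] == "logged":
                    logged[rec["txid"]] = rec["lines"]
                else:
                    done.add(rec["txid"])
        for txid, lines in logged.items():
            if txid not in done:
                apply_changes(lines)  # "play back" the journal

 One caveat on the analogy: a real journal replay is safe to repeat,
because it rewrites fixed block addresses.  This toy appends, so replaying
a transaction that had already partly committed would duplicate data.
Real implementations take care to make replay idempotent.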

 Of course, none of this guarantees you won't lose data.  If a program was
in the middle of writing data to a file when the system crashed, chances
are, that file is now scrambled.  Journaling protects the filesystem itself
from damage, and avoids the need for a consistency check after a crash.

 It is also important to understand the difference between journaling
*all* writes to a filesystem, and journaling just *metadata* writes.  The
term "metadata" means "data about data".  Things such as a file's name,
size, time it was last modified, the specific blocks on disk used to store
it, that sort of thing, are all metadata.  The metadata is critical, because
corruption of a small amount of metadata can lead to the loss of large
amounts of file data.
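
 If you want to see metadata directly, most of it comes back from the
stat() system call.  For instance, from Python (the path is just an
example; any file works):

    import os
    import time

    st = os.stat("/etc/hosts")                 # read the inode fields
    print("size :", st.st_size)                # length in bytes
    print("mtime:", time.ctime(st.st_mtime))   # last-modified time
    print("inode:", st.st_ino)                 # inode number
    print("links:", st.st_nlink)               # hard-link count

The file's name actually lives in the directory entry, and the block
addresses stay inside the filesystem, but the rest sits right in the inode.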

 Some journaling filesystems journal just metadata.  This keeps the
filesystem itself from becoming inconsistent in a crash, but may leave the
file data itself corrupted.  ReiserFS does this.  Why journal just metadata?
Because journaling everything can cause a big performance hit, and, as
noted above, if the system crashed in the middle of a write, there is a good
chance you've already lost data anyway.


 Other filesystems journal all writes, or at least give you the option to.
EXT3 is one such filesystem.  This can prevent file corruption in the case
where an "atomic" write of the file data was buffered in memory and being
written to disk when the crash occurred.
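
 (With EXT3, if I recall correctly, the choice is a mount option:
"data=journal" journals everything; "data=ordered", the default, journals
only metadata but writes file data out before committing the metadata that
points to it; and "data=writeback" is pure metadata journaling.)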


 About the only real drawback to a journaling filesystem is the
performance hit.  You have to write everything to disk *twice*: Once to the
journal, and once to the actual filesystem.

 There are other drawbacks:  Journaling filesystems are more complex, so
are statistically more likely to have bugs in the implementation.  But a
non-journaling filesystem can have bugs, too, so I think the best answer is
just more thorough code review and more testing.  The journal also uses some
space on the disk.  But as the space used by the journal is typically
megabytes on a multi-gigabyte filesystem, the overhead is insignificant.

 Finally, a journaling filesystem does not eliminate the need for "fsck"
and similar programs.  Inconsistencies can be introduced into a filesystem
in other ways (such as bugs in the filesystem code or hardware problems).
Since, with a journaling filesystem, "fsck" will normally *never* be run
automatically by the system, it becomes a good idea to run an fsck on a
periodic basis, "just in case".  EXT2/3 even has a feature that will cause
the filesystem to be automatically checked every X days or every Y mounts.
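
 (On EXT2/3 those knobs are set with tune2fs; something like
"tune2fs -c 30 -i 180d /dev/hdXN" asks for a check every 30 mounts or
every 180 days, whichever comes first.  The device name is a placeholder;
see the man page for the exact syntax.)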


Hope this helps,





_______________________________________________
gnhlug-discuss mailing list
[EMAIL PROTECTED]
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
