On Wed, 30 Jul 2003, at 8:35am, [EMAIL PROTECTED] wrote:
> Very cool, that was revealing. Perhaps this discussion can evolve into how
> journalling (e.g. ext3, etc.) works and why it is good/bad. Anybody?

If a system crashes (software, hardware, power, whatever) in the middle
of a write transaction, then it is likely that the filesystem will be
left in an inconsistent state. For that reason, many OSes will run a
consistency check on a filesystem that was not unmounted cleanly before
mounting it again. Most everyone here has probably seen "fsck" run
after a crash for this reason.

That consistency check can take quite a long time, especially on a
large filesystem. If the filesystem is sufficiently large, the check
can take hours. Worse still, if the crash happened at just the right
(or wrong) time, it can cause logical filesystem damage (e.g., a
corrupt directory), causing additional data loss.

To solve this problem, one can use a journaling filesystem. A
journaling filesystem does not simply write changes to the disk.
First, it writes the changes to a journal (sometimes called a
"transaction log" or just "log"), and marks the transaction as complete
once all of it is in the journal (sometimes called "committing"). Then
it writes the actual changes to their real location on the disk.
Finally, it updates the journal to note that the changes were
successfully written. These last two steps are sometimes called
"checkpointing".

Now, if the system crashes in the middle of a transaction, upon
re-mount, the system just has to look at the journal. If a complete
transaction is present in the journal, but has not been checkpointed,
the journal is "played back" to ensure the filesystem is made
consistent. If an incomplete transaction is present in the journal, it
was never committed, and thus can be discarded.

Of course, none of this guarantees you won't lose data. If a program
was in the middle of writing data to a file when the system crashed,
chances are, that file is now scrambled. Journaling protects the
filesystem itself from damage, and avoids the need for a full
consistency check after a crash.

It is also important to understand the difference between journaling
*all* writes to a filesystem, and journaling just *metadata* writes.
The term "metadata" means "data about data".
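
As an aside, a toy sketch may make the journal sequence described
earlier in this message easier to follow: log the changes, mark the
transaction complete, write to the real location, note that it is
done, and play the journal back after a crash. This is only an
illustration in Python. The class name, the record names ("begin",
"change", "end", "done"), and the crash_after parameter are all made up
for the example, and no real filesystem is implemented this way.

```python
# Toy model of a journaled write. Every name here is hypothetical and
# chosen for illustration; real filesystems journal disk blocks, not dicts.

class ToyJournaledFS:
    def __init__(self):
        self.disk = {}       # block number -> contents (the "real" filesystem)
        self.journal = []    # append-only list of journal records
        self.next_txid = 1

    def write(self, changes, crash_after=None):
        """Write `changes` (block -> contents) as one transaction.

        crash_after simulates a crash right after a step:
          "log"     - changes are journaled, but no "end" record yet
          "end"     - transaction is complete in the journal, disk untouched
          "partial" - only the first block reached its real location
        Returns True if the transaction ran to completion.
        """
        txid = self.next_txid
        self.next_txid += 1

        # Step 1: record the intended changes in the journal.
        self.journal.append(("begin", txid))
        for block, data in changes.items():
            self.journal.append(("change", txid, block, data))
        if crash_after == "log":
            return False
        self.journal.append(("end", txid))
        if crash_after == "end":
            return False

        # Step 2: write the changes to their real location.
        for i, (block, data) in enumerate(changes.items()):
            self.disk[block] = data
            if crash_after == "partial" and i == 0:
                return False

        # Step 3: note in the journal that the changes reached the disk.
        self.journal.append(("done", txid))
        return True

    def recover(self):
        """Run at mount time after a crash. Returns the replayed txids."""
        complete = {rec[1] for rec in self.journal if rec[0] == "end"}
        finished = {rec[1] for rec in self.journal if rec[0] == "done"}
        to_replay = complete - finished
        for rec in self.journal:
            if rec[0] == "change" and rec[1] in to_replay:
                _, _, block, data = rec
                self.disk[block] = data   # replaying twice is harmless
        # Transactions with no "end" record were incomplete: drop them.
        self.journal.clear()
        return sorted(to_replay)


if __name__ == "__main__":
    fs = ToyJournaledFS()
    fs.write({1: "inode A", 2: "dir entry A"})
    fs.write({1: "inode B", 2: "dir entry B"}, crash_after="partial")
    print("after crash:   ", fs.disk)   # block 1 is new, block 2 is old
    print("replayed txids:", fs.recover())
    print("after recovery:", fs.disk)
```

The point of the sketch is that recovery only has to read the journal,
not walk the whole filesystem, which is why it is quick compared to a
full check.
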
Things such as a file's name, size, time it was last modified, and the
specific blocks on disk used to store it are all metadata. The metadata
is critical, because corruption of a small amount of metadata can lead
to the loss of large amounts of file data.

Some journaling filesystems journal just metadata. This keeps the
filesystem itself from becoming inconsistent in a crash, but may leave
the file data itself corrupted. ReiserFS does this. Why journal just
metadata? Because journaling everything can cause a big performance
hit, and, as noted above, if the system crashed in the middle of a
write, there is a good chance you've already lost data anyway.

Other filesystems journal all writes, or at least give you the option
to. EXT3 is one such filesystem. This can prevent file corruption in
the case where an "atomic" write of the file data was buffered in
memory and being written to disk when the crash occurred.

About the only real drawback to a journaling filesystem is the
performance hit. You have to write everything that is journaled to disk
*twice*: once to the journal, and once to the actual filesystem.

There are other, smaller drawbacks. Journaling filesystems are more
complex, so they are statistically more likely to have bugs in the
implementation. But a non-journaling filesystem can have bugs, too, so
I think the best answer is just more thorough code review and more
testing.

The journal also uses some space on the disk. But as the space used by
the journal is typically megabytes on a multi-gigabyte filesystem, the
overhead is insignificant.

Finally, a journaling filesystem does not eliminate the need for "fsck"
and similar programs. Inconsistencies can be introduced into a
filesystem in other ways (such as bugs in the filesystem code or
hardware problems). Since, with a journaling filesystem, "fsck" will
normally *never* be run automatically by the system after a crash, it
becomes a good idea to run an fsck on a periodic basis, "just in case".
EXT2/3 even has a feature that will cause the filesystem to be
automatically checked every X days or every Y mounts (see the "-i" and
"-c" options to tune2fs).

Hope this helps,

-- 
Ben Scott <[EMAIL PROTECTED]>
| The opinions expressed in this message are those of the author and do  |
| not represent the views or policy of any other person or organization. |
| All information is provided without warranty of any kind.              |

_______________________________________________
gnhlug-discuss mailing list
[EMAIL PROTECTED]
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss