Re: where is the journal kept?
David Masover wrote:

> Shawn Rutledge wrote:
>
>> On 8/16/05, Hans Reiser <[EMAIL PROTECTED]> wrote:
>>
>>> Reiser4 does a very nice job of packing the tree tightly, which is
>>> independent of seeks. Ditto for compression plugin.
>>> He merely needs to ignore some code, he is not harmed by it. If he
>>> wants to write a new block allocator, sure, why not, we have allocator
>>> plugins yes? His will just be simpler.
>>
>> But the most important thing is to reduce the number of writes as low
>> as possible.
>
> Something Reiser4 does very well. If you have enough RAM, it's possible
> to avoid any reads/writes at all -- given enough RAM, it behaves as a
> ramdisk, which is why I wish I knew how to tell Gentoo to *not* mount
> tmpfs over /dev.
>
> One other thing you might try is disabling the write-twice behavior.
> Currently, if you've got a huge, fairly well-sorted file that you're
> making lots of tiny writes to, such as a database, it makes sense to
> write twice to keep the file from getting fragmented. But
> fragmentation isn't nearly as much an issue on truly random-access
> media, so you'd want the default small-file behavior to be used
> everywhere -- first write the data to the new location, then atomically
> update the pointer to it as you deallocate the old location.
>
> Am I right about this? I'm not feeling very lucid today...

Yes, but there is not a lot of write-twice for most usage patterns.

Hans
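Hans's "not a lot of write-twice" point rests on the update scheme David describes: write the new data to a fresh location first, then atomically flip the pointer and reclaim the old block. A minimal, purely illustrative sketch (not reiser4 code; all names are invented):

```python
# Toy block store with an indirection table, modeling the
# "write once, then flip the pointer" small-file update path.

class BlockStore:
    def __init__(self):
        self.blocks = {}     # physical block number -> data
        self.pointer = {}    # logical block -> physical block
        self.next_free = 0
        self.writes = 0      # count of physical writes

    def write(self, logical, data):
        # First write the data to a *new* physical location...
        new_phys = self.next_free
        self.next_free += 1
        self.blocks[new_phys] = data
        self.writes += 1
        # ...then atomically update the pointer and free the old block.
        old_phys = self.pointer.get(logical)
        self.pointer[logical] = new_phys
        if old_phys is not None:
            del self.blocks[old_phys]

    def read(self, logical):
        return self.blocks[self.pointer[logical]]

store = BlockStore()
store.write(0, b"v1")
store.write(0, b"v2")   # each update costs exactly one physical write
assert store.read(0) == b"v2"
assert store.writes == 2
```

Because the pointer flip is the last step, a crash at any point leaves either the old version or the new one fully intact — which is why this path needs no second, journaled copy of the data.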
Re: where is the journal kept?
On 8/16/05, David Masover <[EMAIL PROTECTED]> wrote:

> Shawn Rutledge wrote:
>
>> On 8/16/05, David Masover <[EMAIL PROTECTED]> wrote:
>>
>>> Something Reiser4 does very well. If you have enough RAM, it's possible
>>> to avoid any reads/writes at all -- given enough RAM, it behaves as a
>>
>> Well that's cool if it's true. But IMO for this application it ought
>> to have a deadline - after the writes have been pending for 10 seconds
>> or so, go ahead and commit all of them, in case I forget to unmount
>> before I remove the card.
>
> Not that it'd be a bad feature, but if you forget to unmount before you
> remove the card, you're going to lose something. Even Windows now has a
> form of unmounting -- you have to click "safely remove this device"
> before you take a USB keychain out.

And people often don't bother. For one thing, half the time it fails with some sort of error (some process still has a file open -- gee, what do I do about that? I didn't ask it to hold that file open! Well, never mind...). It doesn't have to be that way.

That's probably why DOS didn't usually have any write caching at all on floppies, so that you could always safely remove a floppy whenever the access LED was not lit. This made it OK for floppy drives to have mechanical eject buttons. Apple came up with the alternative idea of making every kind of media slot motorized, so that you must use the software to do the eject, because it's impossible to do it mechanically (without a paperclip, at least). This was a good idea, but the existing CF slots are more problematic than floppies or CDs -- they have no motorized eject, no access LED, and no lock. (Although I had a good laugh when I discovered that some PowerBooks do have motorized PCMCIA slots!)

So I think having a hard "write deadline" for removable media is a reasonable compromise, isn't it? It reduces the chances of corruption.
Maybe it could even automatically unmount after the writes are committed and nobody has any files open; but cache the metadata so you can still see the files, and automatically remount if anybody opens a file. If the card is removed, hopefully the kernel can detect it and send a message to the FS implementation to flush the cache so that you no longer see the metadata. Supermount tried to achieve some of this, I think. The idea of having to umount stuff yourself is one of the biggest usability problems in general, on both Windows and Linux, and it just doesn't have to be that way.

>>> One other thing you might try is disabling the write-twice behavior.
>>> Currently, if you've got a huge, fairly well-sorted file that you're
>>> making lots of tiny writes to, such as a database, it makes sense to
>>> write twice to keep the file from getting fragmented. But,
>>> fragmentation isn't nearly as much an issue on truly random-access
>>> media, so you'd want the default small-file behavior to be used
>>> everywhere -- first write the data to the new location, then atomically
>>> update the pointer to it as you deallocate the old location.
>>
>> What would you change to do that?
>
> Because you want to write less. Normally, when there's a big file,

I was asking how. What files do you change? Is it a source change, or just a parameter somewhere?

> everything gets written twice, once to the "journal", then once back to
> the file, so that the file stays in the same place on disk and doesn't
> get fragmented.
>
> But if we're talking about a flash device, fragmentation doesn't matter
> so much, and you want to minimize the number of writes, so if you have a
> small write in the middle of a big file, you write that once to the
> "journal", and that's it -- once it's been successfully written, that
> chunk of "journal" becomes the new location for that chunk of the file.
Yeah, that would be a good approach as long as it doesn't cost too much efficiency. (What if you have changed it multiple times? Are there multiple journal entries that override each other?)

> Obviously, you wouldn't have this behavior turned on by default. On
> desktop machines, you want to let the FS decide when to write twice.
> But on a flash device, at least as a mount option, you want to force
> everything to be written at most once unless something fails.

Yep. Maybe there is a way to detect what kind of media it is, too, so you don't have to use the mount parameter most of the time. The default should be the correct one for the kind of media it is.
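On the "detect what kind of media it is" idea: modern Linux kernels (well after this 2005 thread) expose a per-device rotational flag in sysfs, which a mount helper could consult. A hedged sketch — the policy names and the fallback behavior are invented here:

```python
# Sketch: pick a write policy from the sysfs rotational flag.
# /sys/block/<dev>/queue/rotational is "1" for spinning disks,
# "0" for flash/SSD-like media on reasonably recent kernels.

def is_rotational(device: str) -> bool:
    """Return True for spinning disks, False for flash-like media."""
    path = f"/sys/block/{device}/queue/rotational"
    try:
        with open(path) as f:
            return f.read().strip() == "1"
    except OSError:
        return True   # unknown media: fall back to the conservative default

def default_write_policy(device: str) -> str:
    # Spinning media: write twice to keep big files unfragmented.
    # Flash media: write once and update the pointer.
    return "write-twice" if is_rotational(device) else "write-once"
```

A udev rule or mount wrapper could use this to set the flash-friendly option automatically, so the user never has to remember a mount parameter.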
Re: where is the journal kept?
Shawn Rutledge wrote:

> On 8/16/05, David Masover <[EMAIL PROTECTED]> wrote:
>
>> Something Reiser4 does very well. If you have enough RAM, it's possible
>> to avoid any reads/writes at all -- given enough RAM, it behaves as a
>> ramdisk, which is why I wish I knew how to tell Gentoo to *not* mount
>> tmpfs over /dev.
>
> Well that's cool if it's true. But IMO for this application it ought
> to have a deadline - after the writes have been pending for 10 seconds
> or so, go ahead and commit all of them, in case I forget to unmount
> before I remove the card.

Not that it'd be a bad feature, but if you forget to unmount before you remove the card, you're going to lose something. Even Windows now has a form of unmounting -- you have to click "safely remove this device" before you take a USB keychain out.

>> One other thing you might try is disabling the write-twice behavior.
>> Currently, if you've got a huge, fairly well-sorted file that you're
>> making lots of tiny writes to, such as a database, it makes sense to
>> write twice to keep the file from getting fragmented. But,
>> fragmentation isn't nearly as much an issue on truly random-access
>> media, so you'd want the default small-file behavior to be used
>> everywhere -- first write the data to the new location, then atomically
>> update the pointer to it as you deallocate the old location.
>
> What would you change to do that?

Because you want to write less. Normally, when there's a big file, everything gets written twice: once to the "journal", then once back to the file, so that the file stays in the same place on disk and doesn't get fragmented.

But if we're talking about a flash device, fragmentation doesn't matter so much, and you want to minimize the number of writes, so if you have a small write in the middle of a big file, you write that once to the "journal", and that's it -- once it's been successfully written, that chunk of "journal" becomes the new location for that chunk of the file.

Obviously, you wouldn't have this behavior turned on by default. On desktop machines, you want to let the FS decide when to write twice. But on a flash device, at least as a mount option, you want to force everything to be written at most once unless something fails.
Re: where is the journal kept?
On 8/16/05, David Masover <[EMAIL PROTECTED]> wrote:

> Something Reiser4 does very well. If you have enough RAM, it's possible
> to avoid any reads/writes at all -- given enough RAM, it behaves as a

Well that's cool if it's true. But IMO for this application it ought to have a deadline - after the writes have been pending for 10 seconds or so, go ahead and commit all of them, in case I forget to unmount before I remove the card.

> ramdisk, which is why I wish I knew how to tell Gentoo to *not* mount
> tmpfs over /dev.
>
> One other thing you might try is disabling the write-twice behavior.
> Currently, if you've got a huge, fairly well-sorted file that you're
> making lots of tiny writes to, such as a database, it makes sense to
> write twice to keep the file from getting fragmented. But,
> fragmentation isn't nearly as much an issue on truly random-access
> media, so you'd want the default small-file behavior to be used
> everywhere -- first write the data to the new location, then atomically
> update the pointer to it as you deallocate the old location.

What would you change to do that?
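Shawn's 10-second deadline can be modeled as a write-back cache whose periodic tick commits anything that has been dirty too long. This is only an illustrative sketch with an invented API, not kernel code:

```python
import time

DEADLINE = 10.0  # seconds a dirty item may stay uncommitted

class DeadlineCache:
    """Write-back cache that force-commits anything older than DEADLINE."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.dirty = {}        # key -> (data, first_dirtied_at)
        self.committed = {}    # what has reached "disk"

    def write(self, key, data):
        # Keep the original dirty timestamp, so rewriting the same key
        # over and over cannot postpone its commit forever.
        _, first = self.dirty.get(key, (None, self.clock()))
        self.dirty[key] = (data, first)

    def tick(self):
        # Called periodically: commit everything past its deadline.
        now = self.clock()
        expired = [k for k, (_, t) in self.dirty.items()
                   if now - t >= DEADLINE]
        for key in expired:
            data, _ = self.dirty.pop(key)
            self.committed[key] = data
```

Note the design choice of keeping the *first* dirty timestamp: with a sliding deadline instead, a steadily rewritten file could stay uncommitted indefinitely, which is exactly the failure mode Shawn is worried about on removable media.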
Re: where is the journal kept?
Shawn Rutledge wrote:

> On 8/16/05, Hans Reiser <[EMAIL PROTECTED]> wrote:
>
>> Reiser4 does a very nice job of packing the tree tightly, which is
>> independent of seeks. Ditto for compression plugin.
>> He merely needs to ignore some code, he is not harmed by it. If he
>> wants to write a new block allocator, sure, why not, we have allocator
>> plugins yes? His will just be simpler.
>
> But the most important thing is to reduce the number of writes as low
> as possible.

Something Reiser4 does very well. If you have enough RAM, it's possible to avoid any reads/writes at all -- given enough RAM, it behaves as a ramdisk, which is why I wish I knew how to tell Gentoo to *not* mount tmpfs over /dev.

One other thing you might try is disabling the write-twice behavior. Currently, if you've got a huge, fairly well-sorted file that you're making lots of tiny writes to, such as a database, it makes sense to write twice to keep the file from getting fragmented. But fragmentation isn't nearly as much an issue on truly random-access media, so you'd want the default small-file behavior to be used everywhere -- first write the data to the new location, then atomically update the pointer to it as you deallocate the old location.

Am I right about this? I'm not feeling very lucid today...
Re: where is the journal kept?
On 8/16/05, Hans Reiser <[EMAIL PROTECTED]> wrote:

> Reiser4 does a very nice job of packing the tree tightly, which is
> independent of seeks. Ditto for compression plugin.
> He merely needs to ignore some code, he is not harmed by it. If he
> wants to write a new block allocator, sure, why not, we have allocator
> plugins yes? His will just be simpler.

But the most important thing is to reduce the number of writes as low as possible.
Re: where is the journal kept?
Shawn Rutledge wrote:

> On 8/15/05, Vladimir V. Saveliev <[EMAIL PROTECTED]> wrote:
>
>> I must be misunderstanding something. Half of reiser4's complexity is
>> about making disk writes and reads more sequential to decrease the
>> number of hard disk head seeks.
>> IMHO, a filesystem for storage devices which do not have moving
>> mechanical parts can go better without such code.

You exaggerate. Reiser4 does a very nice job of packing the tree tightly, which is independent of seeks. Ditto for compression plugin.

He merely needs to ignore some code; he is not harmed by it. If he wants to write a new block allocator, sure, why not, we have allocator plugins, yes? His will just be simpler.

> Well, so far the choices are FAT and EXT2, both of which have
> well-known limitations. I'm trying to get set up to compile things on
> the Zaurus, and installing gcc requires symlinks, so I had to quit
> using FAT.
>
> JFFS2 is used for the internal flash but is overkill for a CF card (as
> well as not designed for it, and thus would require a lot of changes
> to make it possible).
>
> What I want is an FS that never loses data because of a crash (writes
> should be atomic) and doesn't ever need to be fsck'd, yet still is
> very fast and caches writes and accumulates writes, so that repeated
> writes to the same file result in only one physical write.
Re: where is the journal kept?
On 8/15/05, Vladimir V. Saveliev <[EMAIL PROTECTED]> wrote:

> I must be misunderstanding something. Half of reiser4's complexity is
> about making disk writes and reads more sequential to decrease the
> number of hard disk head seeks.
> IMHO, a filesystem for storage devices which do not have moving
> mechanical parts can go better without such code.

Well, so far the choices are FAT and EXT2, both of which have well-known limitations. I'm trying to get set up to compile things on the Zaurus, and installing gcc requires symlinks, so I had to quit using FAT.

JFFS2 is used for the internal flash but is overkill for a CF card (as well as not designed for it, and thus would require a lot of changes to make it possible).

What I want is an FS that never loses data because of a crash (writes should be atomic) and doesn't ever need to be fsck'd, yet still is very fast and caches writes and accumulates writes, so that repeated writes to the same file result in only one physical write.
Re: where is the journal kept?
Hello

Shawn Rutledge wrote:

> On 8/15/05, Vladimir V. Saveliev <[EMAIL PROTECTED]> wrote:
>
>> Pat Double wrote: ...
>>
>> reiser4 has no filesystem area dedicated for journal.
>> Instead it allocates log (journal) records dynamically.
>>
>>> it is transactional. If that's the case, why this option in
>>> debugfs.reiser4:
>>>
>>> Usage: /sbin/debugfs.reiser4 [ options ] FILE
>>> Print options:
>>>   -j, --print-journal  prints journal.
>>
>> yes, this prints reiser4 "wandering logs".
>
> Suppose a new file was being created; when the journal entry for the
> new file actually gets converted to an inode, can it be done without
> rewriting the whole file? What if the program was only writing part of
> the file? Is journalling involved in that? If so, does it duplicate
> the whole file? It doesn't duplicate it more than once, does it?
> (first into the journal, then re-write it on the filesystem later)?

When one modifies a filesystem (creates a file, writes to a file, or anything else), all blocks are modified in memory. All related changes get into one atom. When changes are old enough to be flushed to disk, atom commit is called. Atom commit starts with "flushing". This procedure decides whether a changed block gets into the relocate or the overwrite set. Blocks of the relocate set are written to disk once only. Blocks of the overwrite set are written twice: to the log and in place.

> What I'm getting at is whether Reiser4 is any better for a flash-based
> filesystem than Reiser3. I was interested in using it on a CF card in
> my Zaurus, but the guys on the oe list are saying that journalling
> filesystems are bad in general on flash, because they involve
> rewriting the same areas of the flash (where the journal is) over and
> over. I was thinking that CF and SD cards both use smart block
> allocation so that actually the writes are getting spread around on
> the physical flash memory; but it's still bad if every time you write
> a file, it's getting written 2 or more times.
>
> If you have gotten it down to just writing the main part of the file
> once, and then re-writing some sort of header from the journal into
> the "real" fs, then the reduction of the life of the flash might be
> tolerable.

I must be misunderstanding something. Half of reiser4's complexity is about making disk writes and reads more sequential to decrease the number of hard disk head seeks. IMHO, a filesystem for storage devices which do not have moving mechanical parts can go better without such code.

> The other problem with using it on the Z is that busybox's version of
> mount does not support ReiserFS, so I'd have to compile another
> version for it. (Not a big problem, but I wonder why busybox doesn't
> support it in the first place.)
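Vladimir's relocate/overwrite split is what determines the write cost of a commit. A toy model (an assumption-laden sketch, not actual reiser4 logic) that just counts physical writes per atom commit:

```python
# Flush sorts each dirty block into the relocate set (written once,
# at a new location) or the overwrite set (written twice: a copy to
# the log, then in place).  We only count the resulting device writes.

def commit_atom(dirty_blocks, choose_relocate):
    """Return the number of physical writes this commit costs."""
    writes = 0
    for block in dirty_blocks:
        if choose_relocate(block):
            writes += 1      # relocate: one write at the block's new home
        else:
            writes += 2      # overwrite: log copy, then write in place
    return writes

# On flash, you would want flush to relocate everything:
assert commit_atom(range(100), lambda b: True) == 100
# On a spinning disk, overwriting in place keeps big files contiguous,
# at the price of doubling the writes:
assert commit_atom(range(100), lambda b: False) == 200
```

This makes the flash argument in the thread concrete: a block-allocator or flush policy that always chooses the relocate set halves the write traffic, which is exactly what a wear-limited medium wants.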
Re: where is the journal kept?
On 8/15/05, Vladimir V. Saveliev <[EMAIL PROTECTED]> wrote:

> Pat Double wrote: ...
>
> reiser4 has no filesystem area dedicated for journal.
> Instead it allocates log (journal) records dynamically.
>
>> it is transactional. If that's the case, why this option in
>> debugfs.reiser4:
>>
>> Usage: /sbin/debugfs.reiser4 [ options ] FILE
>> Print options:
>>   -j, --print-journal  prints journal.
>
> yes, this prints reiser4 "wandering logs".

Suppose a new file was being created; when the journal entry for the new file actually gets converted to an inode, can it be done without rewriting the whole file? What if the program was only writing part of the file? Is journalling involved in that? If so, does it duplicate the whole file? It doesn't duplicate it more than once, does it? (first into the journal, then re-write it on the filesystem later)?

What I'm getting at is whether Reiser4 is any better for a flash-based filesystem than Reiser3. I was interested in using it on a CF card in my Zaurus, but the guys on the oe list are saying that journalling filesystems are bad in general on flash, because they involve rewriting the same areas of the flash (where the journal is) over and over. I was thinking that CF and SD cards both use smart block allocation so that actually the writes are getting spread around on the physical flash memory; but it's still bad if every time you write a file, it's getting written 2 or more times.

If you have gotten it down to just writing the main part of the file once, and then re-writing some sort of header from the journal into the "real" fs, then the reduction of the life of the flash might be tolerable.

The other problem with using it on the Z is that busybox's version of mount does not support ReiserFS, so I'd have to compile another version for it. (Not a big problem, but I wonder why busybox doesn't support it in the first place.)
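The "smart block allocation" Shawn mentions is controller-level wear leveling. Real CF/SD controllers are proprietary; the toy model below only shows the general idea that repeated writes to one logical block get spread across many physical blocks:

```python
# Toy wear leveler: every logical write is remapped to the currently
# least-worn physical block, so no single cell takes all the erases.

class WearLeveler:
    def __init__(self, n_physical):
        self.erase_counts = [0] * n_physical
        self.mapping = {}             # logical block -> physical block

    def write(self, logical):
        # Pick the least-worn physical block for this write.
        phys = min(range(len(self.erase_counts)),
                   key=lambda p: self.erase_counts[p])
        self.mapping[logical] = phys
        self.erase_counts[phys] += 1

wl = WearLeveler(4)
for _ in range(100):
    wl.write(0)                       # hammer a single logical block
# The wear is spread evenly across all physical blocks:
assert wl.erase_counts == [25, 25, 25, 25]
```

This is why a fixed-location journal is less lethal on CF/SD than it sounds — but, as Shawn notes, leveling only spreads the wear; it does not reduce the total number of writes, so write-twice behavior still halves the card's life.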
Re: where is the journal kept?
Vladimir V. Saveliev wrote:

> Hello
>
> Pat Double wrote:
>
>> Stupid question. I thought that reiser4 had no journal,
>
> reiser4 has no filesystem area dedicated for journal.
> Instead it allocates log (journal) records dynamically.
>
>> it is transactional. If that's the case, why this option in
>> debugfs.reiser4:
>>
>> Usage: /sbin/debugfs.reiser4 [ options ] FILE
>> Print options:
>>   -j, --print-journal  prints journal.
>
> yes, this prints reiser4 "wandering logs".

Not to beat a dead horse, but it might make things go faster if people thought of the journal as an invisible, inaccessible file, because everyone knows that files can be fragmented, and everyone also knows that fragmented files can be reassembled and serialized (cat file). The only possible difference is, I don't know if the reiser4 log comes out of debugfs in any particular order. It's an oversimplification, but probably a useful one.
Re: where is the journal kept?
Hello

Pat Double wrote:

> Stupid question. I thought that reiser4 had no journal,

reiser4 has no filesystem area dedicated for journal. Instead it allocates log (journal) records dynamically.

> it is transactional. If that's the case, why this option in
> debugfs.reiser4:
>
> Usage: /sbin/debugfs.reiser4 [ options ] FILE
> Print options:
>   -j, --print-journal  prints journal.

yes, this prints reiser4 "wandering logs".

> On Monday 15 August 2005 07:58 am, Vladimir V. Saveliev wrote:
>
>> Hello
>>
>> Payal Rathod wrote:
>>
>>> On Mon, Aug 15, 2005 at 04:25:37PM +0400, Vladimir V. Saveliev wrote:
>>>
>>>> Each journaling filesystem keeps its journal in its own way.
>>>> In reiserfs, by default, the journal is kept in 8192 blocks (4096
>>>> bytes each) statically pre-allocated at mkfs time, starting from
>>>> the 18th block.
>>>
>>> Is it kept in some sort of file?
>>
>> No. That area of the filesystem does not belong to any files stored on
>> that filesystem. You can read it from the device directly:
>> dd if=/dev/hda1 bs=4096 skip=18 count=8192. I am not sure that it can
>> be of any interest, because you will see just binary data.
>>
>>> I mean, can I see the file contents using normal UNIX tools? If yes,
>>> how do I do it?
>>
>> You can use
>>   debugreiserfs -j /dev/hda1 /dev/hda1
>> to see journal content. This will decode binary data into human
>> readable form.
>>
>>> With warm regards,
>>> -Payal
Re: where is the journal kept?
Stupid question. I thought that reiser4 had no journal, it is transactional. If that's the case, why this option in debugfs.reiser4:

Usage: /sbin/debugfs.reiser4 [ options ] FILE
Print options:
  -j, --print-journal  prints journal.

On Monday 15 August 2005 07:58 am, Vladimir V. Saveliev wrote:

> Hello
>
> Payal Rathod wrote:
>
>> On Mon, Aug 15, 2005 at 04:25:37PM +0400, Vladimir V. Saveliev wrote:
>>
>>> Each journaling filesystem keeps its journal in its own way.
>>> In reiserfs, by default, the journal is kept in 8192 blocks (4096
>>> bytes each) statically pre-allocated at mkfs time, starting from the
>>> 18th block.
>>
>> Is it kept in some sort of file?
>
> No. That area of the filesystem does not belong to any files stored on
> that filesystem. You can read it from the device directly:
> dd if=/dev/hda1 bs=4096 skip=18 count=8192. I am not sure that it can
> be of any interest, because you will see just binary data.
>
>> I mean, can I see the file contents using normal UNIX tools? If yes,
>> how do I do it?
>
> You can use
>   debugreiserfs -j /dev/hda1 /dev/hda1
> to see journal content. This will decode binary data into human
> readable form.
>
>> With warm regards,
>> -Payal

--
Pat Double, [EMAIL PROTECTED]
"In the beginning God created the heaven and the earth."
Re: where is the journal kept?
Hello

Payal Rathod wrote:

> On Mon, Aug 15, 2005 at 04:25:37PM +0400, Vladimir V. Saveliev wrote:
>
>> Each journaling filesystem keeps its journal in its own way.
>> In reiserfs, by default, the journal is kept in 8192 blocks (4096
>> bytes each) statically pre-allocated at mkfs time, starting from the
>> 18th block.
>
> Is it kept in some sort of file?

No. That area of the filesystem does not belong to any files stored on that filesystem. You can read it from the device directly: dd if=/dev/hda1 bs=4096 skip=18 count=8192. I am not sure that it can be of any interest, because you will see just binary data.

> I mean, can I see the file contents using normal UNIX tools? If yes,
> how do I do it?

You can use

  debugreiserfs -j /dev/hda1 /dev/hda1

to see journal content. This will decode binary data into human readable form.

> With warm regards,
> -Payal
Re: where is the journal kept?
On Mon, Aug 15, 2005 at 04:25:37PM +0400, Vladimir V. Saveliev wrote:

> Each journaling filesystem keeps its journal in its own way.
> In reiserfs, by default, the journal is kept in 8192 blocks (4096
> bytes each) statically pre-allocated at mkfs time, starting from the
> 18th block.

Is it kept in some sort of file? I mean, can I see the file contents using normal UNIX tools? If yes, how do I do it?

With warm regards,
-Payal
Re: where is the journal kept?
Hello

Payal Rathod wrote:

> Hi, I have been reading about journalling filesystems but still don't
> get where exactly the journal is kept physically on the hard disk. Can
> anyone help me with the basics of this?

Each journaling filesystem keeps its journal in its own way. In reiserfs, by default, the journal is kept in 8192 blocks (4096 bytes each) statically pre-allocated at mkfs time, starting from the 18th block.
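That default geometry pins down exactly where the reiserfs journal lives on disk. A quick sanity check of the arithmetic (assuming the block numbers are relative to the start of the partition):

```python
# Default reiserfs journal geometry as stated above:
# 8192 blocks of 4096 bytes, starting at block 18.

BLOCK_SIZE = 4096
JOURNAL_START_BLOCK = 18
JOURNAL_BLOCKS = 8192

offset_bytes = JOURNAL_START_BLOCK * BLOCK_SIZE   # where dd should skip to
size_bytes = JOURNAL_BLOCKS * BLOCK_SIZE          # how much to read

assert offset_bytes == 73728                      # i.e. skip=18 with bs=4096
assert size_bytes == 32 * 1024 * 1024             # the journal is 32 MiB
```

So reading the raw journal area with dd corresponds to `bs=4096 skip=18 count=8192`, for a total of 32 MiB of binary log data.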