Re: where is the journal kept?

2005-08-16 Thread Hans Reiser
David Masover wrote:

> Shawn Rutledge wrote:
>
> >On 8/16/05, Hans Reiser <[EMAIL PROTECTED]> wrote:
>
> >>Reiser4 does a very nice job of packing the tree tightly, which is
> >>independent of seeks.  Ditto for compression plugin.
> >>He merely needs to ignore some code, he is not harmed by it.  If he
> >>wants to write a new block allocator, sure, why not, we have allocator
> >>plugins yes?   His will just be simpler.
>
>
> >But the most important thing is to reduce the number of writes as low
> >as possible.
>
>
> Something Reiser4 does very well.  If you have enough RAM, it's possible
> to avoid any reads/writes at all -- given enough RAM, it behaves as a
> ramdisk, which is why I wish I knew how to tell Gentoo to *not* mount
> tmpfs over /dev.
>
> One other thing you might try is disabling the write-twice behavior.
> Currently, if you've got a huge, fairly well-sorted file that you're
> making lots of tiny writes to, such as a database, it makes sense to
> write twice to keep the file from getting fragmented.  But,
> fragmentation isn't nearly as much an issue on truly random-access
> media, so you'd want the default small-file behavior to be used
> everywhere -- first write the data to the new location, then atomically
> update the pointer to it as you deallocate the old location.
>
> Am I right about this?  I'm not feeling very lucid today...

Yes, but there is not a lot of write-twice for most usage patterns.

Hans


Re: where is the journal kept?

2005-08-16 Thread Shawn Rutledge
On 8/16/05, David Masover <[EMAIL PROTECTED]> wrote:
> Shawn Rutledge wrote:
> > On 8/16/05, David Masover <[EMAIL PROTECTED]> wrote:
> >>Something Reiser4 does very well.  If you have enough RAM, it's possible
> >>to avoid any reads/writes at all -- given enough RAM, it behaves as a
> >
> > Well that's cool if it's true.  But IMO for this application it ought
> > to have a deadline - after the writes have been pending for 10 seconds
> > or so, go ahead and commit all of them, in case I forget to unmount
> > before I remove the card.
> 
> Not that it'd be a bad feature, but if you forget to unmount before you
> remove the card, you're going to lose something.  Even Windows now has a
> form of unmounting -- you have to click "safely remove this device"
> before you take a USB keychain out.

And people often don't bother.  For one thing, half the time it fails
with some sort of error (some process still has a file open - gee,
what do I do about that?  I didn't ask it to hold that file open!
Well, never mind...)

It doesn't have to be that way.  That's probably why DOS didn't
usually have any write caching at all on floppies, so that you could
always safely remove a floppy whenever the access LED was not lit. 
This made it OK for floppy drives to have mechanical eject buttons.
Apple came up with the alternative idea of having every kind of media
slot motorized, so that you must use the software to do the eject
because it's impossible to do it mechanically (without a paperclip, at
least).  This was a good idea but the existing CF slots are more
problematic than floppies or CDs - they have no motorized eject, no
access LED and no lock.  (Although I had a good laugh when I
discovered that some Powerbooks do have motorized PCMCIA slots!)

So I think having a hard "write deadline" for removable media is a
reasonable compromise, isn't it?  It reduces the chances of
corruption.
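
A minimal sketch of the "write deadline" idea in Python, assuming a toy
write-back cache in front of the card. The class name, the dict standing in
for the card, and the injectable clock are all hypothetical; this is not how
any real filesystem implements it, only an illustration of the policy:
repeated writes collapse in memory, and everything is committed once the
oldest pending write has waited 10 seconds.

```python
import time


class DeadlineWriteCache:
    """Toy write-back cache that commits once the oldest pending write is too old."""

    def __init__(self, backend, deadline_s=10.0, clock=time.monotonic):
        self.backend = backend        # dict standing in for the card (assumption)
        self.deadline_s = deadline_s  # the "10 seconds or so" from the text
        self.clock = clock            # injectable so the policy can be tested
        self.pending = {}             # block number -> latest data
        self.oldest = None            # time of the oldest uncommitted write

    def write(self, block, data):
        # Repeated writes to the same block collapse into one pending entry.
        if not self.pending:
            self.oldest = self.clock()
        self.pending[block] = data

    def tick(self):
        """Call periodically; commits everything once the deadline has passed."""
        if self.pending and self.clock() - self.oldest >= self.deadline_s:
            self.flush()
            return True
        return False

    def flush(self):
        # One physical write per dirty block, however often it was modified.
        self.backend.update(self.pending)
        self.pending.clear()
        self.oldest = None
```

The deadline is measured from the oldest pending write rather than the most
recent one, so a steady stream of small writes cannot postpone the commit
indefinitely.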

Maybe it could even automatically unmount after the writes are
committed and nobody has any files open; but cache the metadata so you
can still see the files, and automatically remount if anybody opens a
file.  If the card is removed, hopefully the kernel can detect it and
send a message to the FS implementation to flush the cache so that you
no longer see the metadata.  Supermount tried to achieve some of this,
I think.  The idea of having to umount stuff yourself is one of the
biggest usability problems in general, on both Windows and Linux, and
it just doesn't have to be that way.

> >>One other thing you might try is disabling the write-twice behavior.
> >>Currently, if you've got a huge, fairly well-sorted file that you're
> >>making lots of tiny writes to, such as a database, it makes sense to
> >>write twice to keep the file from getting fragmented.  But,
> >>fragmentation isn't nearly as much an issue on truly random-access
> >>media, so you'd want the default small-file behavior to be used
> >>everywhere -- first write the data to the new location, then atomically
> >>update the pointer to it as you deallocate the old location.
> >
> >
> > What would you change to do that?
> 
> Because you want to write less.  Normally, when there's a big file,

I was asking how.  What files do you change?  Is it a source change,
or just a parameter somewhere?

> everything gets written twice, once to the "journal", then once back to
> the file, so that the file stays in the same place on disk and doesn't
> get fragmented.
> 
> But if we're talking about a flash device, fragmentation doesn't matter
> so much, and you want to minimize the number of writes, so if you have a
> small write in the middle of a big file, you write that once to the
> "journal", and that's it -- once it's been successfully written, that
> chunk of "journal" becomes the new location for that chunk of the file.

Yeah that would be a good approach as long as it doesn't cost too much
efficiency (what if you have changed it multiple times?  Are there
multiple journal entries that override each other?)

> Obviously, you wouldn't have this behavior turned on by default.  On
> desktop machines, you want to let the FS decide when to write twice.
> But on a flash device, at least as a mount option, you want to force
> everything to be written at most once unless something fails.

Yep.

Maybe there is a way to detect what kind of media it is, too, so you
don't have to use the mount parameter most of the time.  The default
should be the correct one for the kind of media that it is.
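
One possible way to pick such a default, sketched in Python. It assumes a
Linux kernel that exposes the /sys/block/<device>/queue/rotational hint,
which is newer than this thread, and the hint may not be reliable for every
card reader. The policy names returned here are hypothetical labels, not
actual reiser4 mount options.

```python
from pathlib import Path


def is_rotational(device, sysfs_root="/sys/block"):
    """Return True/False from the kernel's rotational hint, or None if unknown."""
    hint = Path(sysfs_root) / device / "queue" / "rotational"
    try:
        return hint.read_text().strip() == "1"
    except OSError:
        # Attribute missing or unreadable: we simply do not know.
        return None


def default_write_policy(device, sysfs_root="/sys/block"):
    """Pick a default; the returned names are hypothetical, for illustration."""
    if is_rotational(device, sysfs_root) is False:
        return "write-once"   # flash-like media: minimize writes
    return "fs-decides"       # spinning or unknown media: keep the usual default
```

Falling back to the existing behavior when the hint is missing keeps the
detection from ever making things worse than an explicit mount parameter.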


Re: where is the journal kept?

2005-08-16 Thread David Masover



Shawn Rutledge wrote:
> On 8/16/05, David Masover <[EMAIL PROTECTED]> wrote:
>
>> Something Reiser4 does very well.  If you have enough RAM, it's possible
>> to avoid any reads/writes at all -- given enough RAM, it behaves as a
>
> Well that's cool if it's true.  But IMO for this application it ought
> to have a deadline - after the writes have been pending for 10 seconds
> or so, go ahead and commit all of them, in case I forget to unmount
> before I remove the card.

Not that it'd be a bad feature, but if you forget to unmount before you
remove the card, you're going to lose something.  Even Windows now has a
form of unmounting -- you have to click "safely remove this device"
before you take a USB keychain out.

>> ramdisk, which is why I wish I knew how to tell Gentoo to *not* mount
>> tmpfs over /dev.
>>
>> One other thing you might try is disabling the write-twice behavior.
>> Currently, if you've got a huge, fairly well-sorted file that you're
>> making lots of tiny writes to, such as a database, it makes sense to
>> write twice to keep the file from getting fragmented.  But,
>> fragmentation isn't nearly as much an issue on truly random-access
>> media, so you'd want the default small-file behavior to be used
>> everywhere -- first write the data to the new location, then atomically
>> update the pointer to it as you deallocate the old location.
>
> What would you change to do that?


Because you want to write less.  Normally, when there's a big file, 
everything gets written twice, once to the "journal", then once back to 
the file, so that the file stays in the same place on disk and doesn't 
get fragmented.


But if we're talking about a flash device, fragmentation doesn't matter 
so much, and you want to minimize the number of writes, so if you have a 
small write in the middle of a big file, you write that once to the 
"journal", and that's it -- once it's been successfully written, that 
chunk of "journal" becomes the new location for that chunk of the file.


Obviously, you wouldn't have this behavior turned on by default.  On 
desktop machines, you want to let the FS decide when to write twice. 
But on a flash device, at least as a mount option, you want to force 
everything to be written at most once unless something fails.
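
The difference between the two behaviors can be sketched in Python. This is
a toy model under stated assumptions, not reiser4 code: a file is a list of
pointers into a flat array of device blocks, ToyFile and its methods are
hypothetical names, and only data-block writes are counted (the pointer
update itself is ignored).

```python
class ToyFile:
    """A file as a list of pointers into a flat array of device blocks."""

    def __init__(self, device_blocks, data_blocks):
        self.device = [None] * device_blocks
        self.free = list(range(device_blocks))
        self.pointers = []
        self.physical_writes = 0
        for data in data_blocks:
            self.pointers.append(self._write_new(data))

    def _write_new(self, data):
        # Allocate a free block and write the data there (one physical write).
        location = self.free.pop(0)
        self.device[location] = data
        self.physical_writes += 1
        return location

    def update_write_once(self, index, data):
        # Write the data to a new location, then switch the pointer and
        # release the old location: one data write, but the file moves.
        old = self.pointers[index]
        self.pointers[index] = self._write_new(data)
        self.device[old] = None
        self.free.append(old)

    def update_write_twice(self, index, data):
        # Write to a temporary "journal" block, then copy back in place:
        # two data writes, but the file keeps its layout.
        journal = self._write_new(data)
        self.device[self.pointers[index]] = data
        self.physical_writes += 1
        self.device[journal] = None
        self.free.append(journal)

    def read(self):
        return [self.device[p] for p in self.pointers]
```

For example, after `f = ToyFile(8, ["a", "b", "c"])`, calling
`f.update_write_once(1, "B")` costs one data write and moves block 1 to a
new location, while `f.update_write_twice(2, "C")` costs two data writes and
leaves block 2 where it was.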




Re: where is the journal kept?

2005-08-16 Thread Shawn Rutledge
On 8/16/05, David Masover <[EMAIL PROTECTED]> wrote:
> Something Reiser4 does very well.  If you have enough RAM, it's possible
> to avoid any reads/writes at all -- given enough RAM, it behaves as a

Well that's cool if it's true.  But IMO for this application it ought
to have a deadline - after the writes have been pending for 10 seconds
or so, go ahead and commit all of them, in case I forget to unmount
before I remove the card.

> ramdisk, which is why I wish I knew how to tell Gentoo to *not* mount
> tmpfs over /dev.
> 
> One other thing you might try is disabling the write-twice behavior.
> Currently, if you've got a huge, fairly well-sorted file that you're
> making lots of tiny writes to, such as a database, it makes sense to
> write twice to keep the file from getting fragmented.  But,
> fragmentation isn't nearly as much an issue on truly random-access
> media, so you'd want the default small-file behavior to be used
> everywhere -- first write the data to the new location, then atomically
> update the pointer to it as you deallocate the old location.

What would you change to do that?


Re: where is the journal kept?

2005-08-16 Thread David Masover
Shawn Rutledge wrote:
> On 8/16/05, Hans Reiser <[EMAIL PROTECTED]> wrote:
> 
>>Reiser4 does a very nice job of packing the tree tightly, which is
>>independent of seeks.  Ditto for compression plugin.
>>He merely needs to ignore some code, he is not harmed by it.  If he
>>wants to write a new block allocator, sure, why not, we have allocator
>>plugins yes?   His will just be simpler.
> 
> 
> But the most important thing is to reduce the number of writes as low
> as possible.

Something Reiser4 does very well.  If you have enough RAM, it's possible
to avoid any reads/writes at all -- given enough RAM, it behaves as a
ramdisk, which is why I wish I knew how to tell Gentoo to *not* mount
tmpfs over /dev.

One other thing you might try is disabling the write-twice behavior.
Currently, if you've got a huge, fairly well-sorted file that you're
making lots of tiny writes to, such as a database, it makes sense to
write twice to keep the file from getting fragmented.  But,
fragmentation isn't nearly as much an issue on truly random-access
media, so you'd want the default small-file behavior to be used
everywhere -- first write the data to the new location, then atomically
update the pointer to it as you deallocate the old location.

Am I right about this?  I'm not feeling very lucid today...


Re: where is the journal kept?

2005-08-16 Thread Shawn Rutledge
On 8/16/05, Hans Reiser <[EMAIL PROTECTED]> wrote:
> Reiser4 does a very nice job of packing the tree tightly, which is
> independent of seeks.  Ditto for compression plugin.
> He merely needs to ignore some code, he is not harmed by it.  If he
> wants to write a new block allocator, sure, why not, we have allocator
> plugins yes?   His will just be simpler.

But the most important thing is to reduce the number of writes as low
as possible.


Re: where is the journal kept?

2005-08-16 Thread Hans Reiser
Shawn Rutledge wrote:

>On 8/15/05, Vladimir V. Saveliev <[EMAIL PROTECTED]> wrote:
>
>>I must be misunderstanding something. Half of reiser4's complexity is
>>about making disk writes and reads more sequential, to decrease the
>>number of hard disk head seeks.
>>IMHO, a filesystem for storage devices which do not have moving
>>mechanical parts can go better without such code.

You exaggerate.

Reiser4 does a very nice job of packing the tree tightly, which is
independent of seeks.  Ditto for compression plugin.
He merely needs to ignore some code, he is not harmed by it.  If he
wants to write a new block allocator, sure, why not, we have allocator
plugins yes?   His will just be simpler.

>
>Well so far the choices are FAT and EXT2, both of which have
>well-known limitations.  I'm trying to get set up to compile things on
>the zaurus, and installing gcc requires symlinks so I had to quit
>using FAT.
>
>JFFS2 is used for the internal flash but is overkill for a CF card (as
>well as not designed for it, and thus would require a lot of changes
>to make it possible).
>
>What I want is an FS that never loses data because of a crash (writes
>should be atomic) and doesn't ever need to be fsck'd, yet still is
>very fast and caches writes and accumulates writes, so that repeated
>writes to the same file result in only one physical write.



Re: where is the journal kept?

2005-08-15 Thread Shawn Rutledge
On 8/15/05, Vladimir V. Saveliev <[EMAIL PROTECTED]> wrote:
> I must be misunderstanding something. Half of reiser4's complexity is
> about making disk writes and reads more sequential, to decrease the
> number of hard disk head seeks.
> IMHO, a filesystem for storage devices which do not have moving
> mechanical parts can go better without such code.

Well so far the choices are FAT and EXT2, both of which have
well-known limitations.  I'm trying to get set up to compile things on
the zaurus, and installing gcc requires symlinks so I had to quit
using FAT.

JFFS2 is used for the internal flash but is overkill for a CF card (as
well as not designed for it, and thus would require a lot of changes
to make it possible).

What I want is an FS that never loses data because of a crash (writes
should be atomic) and doesn't ever need to be fsck'd, yet still is
very fast and caches writes and accumulates writes, so that repeated
writes to the same file result in only one physical write.


Re: where is the journal kept?

2005-08-15 Thread Vladimir V. Saveliev

Hello

Shawn Rutledge wrote:
> On 8/15/05, Vladimir V. Saveliev <[EMAIL PROTECTED]> wrote:
>> Pat Double wrote:
> ...
>> reiser4 has no filesystem area dedicated for journal.
>> Instead it allocates log (journal) records dynamically.
>>
>>> it is transactional.
>>> If that's the case, why this option in debug.reiser4 :
>>>
>>> Usage: /sbin/debugfs.reiser4 [ options ] FILE
>>> Print options:
>>>   -j, --print-journal   prints journal.
>>
>> yes, this prints reiser4 "wandering logs".
>
> Suppose a new file was being created; when the journal entry for the
> new file actually gets converted to an inode, can it be done without
> rewriting the whole file?
>
> What if the program was only writing part of the file?  Is journalling
> involved in that?  If so, does it duplicate the whole file?  It
> doesn't duplicate it more than once does it?  (first into the journal,
> then re-write it on the filesystem later)?

When one modifies a filesystem (creates a file, writes to a file, or
anything else), all blocks are modified in memory. All related changes
go into one atom.
When the changes are old enough to be flushed to disk, atom commit is called.
Atom commit starts with "flushing". This procedure decides whether each
changed block goes to the relocate set or the overwrite set.
Blocks of the relocate set are written to disk only once. Blocks of the
overwrite set are written twice: to the log and in place.
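
As a rough illustration of the commit step, here is a Python sketch that
splits an atom's dirty blocks into the two sets and counts the resulting
disk writes. The Atom class and the should_relocate callback are
hypothetical; the callback merely stands in for the real flush heuristics,
which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    """Dirty blocks gathered in memory until commit."""
    dirty: dict = field(default_factory=dict)  # block id -> new contents

    def modify(self, block, data):
        # Repeated changes to one block stay in memory as a single entry.
        self.dirty[block] = data

    def commit(self, should_relocate):
        """Split dirty blocks into relocate/overwrite sets and count writes.

        should_relocate is a caller-supplied stand-in for the flush decision.
        """
        relocate = {b for b in self.dirty if should_relocate(b)}
        overwrite = set(self.dirty) - relocate
        # Relocate set: written once. Overwrite set: once to the log,
        # once in place.
        writes = len(relocate) + 2 * len(overwrite)
        self.dirty.clear()
        return relocate, overwrite, writes
```

Under this model, forcing every block into the relocate set (a callback
that always returns True) gives exactly one write per dirty block, which is
the behavior one would want on flash.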


> What I'm getting at is whether Reiser4 is any better for a flash-based
> filesystem than Reiser3?  I was interested in using it on a CF card in
> my Zaurus, but the guys on the oe list are saying that journalling
> filesystems are bad in general on flash, because they involve
> rewriting the same areas of the flash (where the journal is) over and
> over.  I was thinking that CF and SD cards both use smart block
> allocation so that actually the writes are getting spread around on
> the physical flash memory; but it's still bad if every time you write
> a file, it's getting written 2 or more times. If you have gotten it
> down to just writing the main part of the file once, and then
> re-writing some sort of header from the journal into the "real" fs,
> then the reduction of the life of the flash might be tolerable.



I must be misunderstanding something. Half of reiser4's complexity is
about making disk writes and reads more sequential, to decrease the
number of hard disk head seeks.
IMHO, a filesystem for storage devices which do not have moving
mechanical parts can go better without such code.


> The other problem with using it on the Z is that busybox's version of
> mount does not support ReiserFS, so I'd have to compile another
> version for it.  (Not a big problem, but I wonder why busybox doesn't
> support it in the first place.)


Re: where is the journal kept?

2005-08-15 Thread Shawn Rutledge
On 8/15/05, Vladimir V. Saveliev <[EMAIL PROTECTED]> wrote:
> Pat Double wrote:
...
> reiser4 has no filesystem area dedicated for journal.
> Instead it allocates log (journal) records dynamically.
>
> > it is transactional.
> > If that's the case, why this option in debug.reiser4 :
> >
> > Usage: /sbin/debugfs.reiser4 [ options ] FILE
> > Print options:
> >    -j, --print-journal   prints journal.
>
> yes, this prints reiser4 "wandering logs".

Suppose a new file was being created; when the journal entry for the
new file actually gets converted to an inode, can it be done without
rewriting the whole file?

What if the program was only writing part of the file?  Is journalling
involved in that?  If so, does it duplicate the whole file?  It
doesn't duplicate it more than once does it?  (first into the journal,
then re-write it on the filesystem later)?

What I'm getting at is whether Reiser4 is any better for a flash-based
filesystem than Reiser3?  I was interested in using it on a CF card in
my Zaurus, but the guys on the oe list are saying that journalling
filesystems are bad in general on flash, because they involve
rewriting the same areas of the flash (where the journal is) over and
over.  I was thinking that CF and SD cards both use smart block
allocation so that actually the writes are getting spread around on
the physical flash memory; but it's still bad if every time you write
a file, it's getting written 2 or more times. If you have gotten it
down to just writing the main part of the file once, and then
re-writing some sort of header from the journal into the "real" fs,
then the reduction of the life of the flash might be tolerable.

The other problem with using it on the Z is that busybox's version of
mount does not support ReiserFS, so I'd have to compile another
version for it.  (Not a big problem, but I wonder why busybox doesn't
support it in the first place.)


Re: where is the journal kept?

2005-08-15 Thread David Masover


Vladimir V. Saveliev wrote:
> Hello
>
> Pat Double wrote:
>> Stupid question. I thought that reiser4 had no journal,
>
> reiser4 has no filesystem area dedicated for journal.
> Instead it allocates log (journal) records dynamically.
>
>> it is transactional.
>> If that's the case, why this option in debug.reiser4 :
>>
>> Usage: /sbin/debugfs.reiser4 [ options ] FILE
>> Print options:
>>    -j, --print-journal   prints journal.
>
> yes, this prints reiser4 "wandering logs".



Not to beat a dead horse, but it might make things go faster if people 
thought of the journal as an invisible, inaccessible file, because 
everyone knows that files can be fragmented, and everyone also knows 
that fragmented files can be reassembled and serialized (cat file). 
Only possible difference is, I don't know if the reiser4 log comes out 
of debugfs in any particular order.


It's an oversimplification, but probably a useful one.



Re: where is the journal kept?

2005-08-15 Thread Vladimir V. Saveliev

Hello

Pat Double wrote:
> Stupid question. I thought that reiser4 had no journal,

reiser4 has no filesystem area dedicated for journal.
Instead it allocates log (journal) records dynamically.

> it is transactional.
> If that's the case, why this option in debug.reiser4 :
>
> Usage: /sbin/debugfs.reiser4 [ options ] FILE
> Print options:
>    -j, --print-journal   prints journal.

yes, this prints reiser4 "wandering logs".

> On Monday 15 August 2005 07:58 am, Vladimir V. Saveliev wrote:
>> Hello
>>
>> Payal Rathod wrote:
>>> On Mon, Aug 15, 2005 at 04:25:37PM +0400, Vladimir V. Saveliev wrote:
>>>> Each journaling filesystem keeps its journal in its own way.
>>>> In reiserfs by default the journal is kept in 8192 blocks (4096 bytes
>>>> each), statically pre-allocated at mkfs time, starting from the 18th
>>>> block.
>>>
>>> Is it kept in some sort of file?
>>
>> No. That area of filesystem does not belong to any files stored on that
>> filesystem. You can read it from device directly: dd if=/dev/hda1 bs=4096
>> count=8192. I am not sure that it can be of any interest, because you will
>> see just binary data.
>>
>>> I mean can I see the file contents
>>> using normal UNIX tools? If yes how do I do it?
>>
>> You can use
>> debugreiserfs -j /dev/hda1 /dev/hda1
>> to see journal content. This will decode binary data into human readable
>> form.
>>
>>> With warm regards,
>>> -Payal


Re: where is the journal kept?

2005-08-15 Thread Pat Double
Stupid question. I thought that reiser4 had no journal, it is transactional.
If that's the case, why this option in debug.reiser4 :

Usage: /sbin/debugfs.reiser4 [ options ] FILE
Print options:
   -j, --print-journal   prints journal.


On Monday 15 August 2005 07:58 am, Vladimir V. Saveliev wrote:
> Hello
>
> Payal Rathod wrote:
> > On Mon, Aug 15, 2005 at 04:25:37PM +0400, Vladimir V. Saveliev wrote:
> >>Each journaling filesystem keeps its journal in its own way.
> >>In reiserfs by default the journal is kept in 8192 blocks (4096 bytes
> >>each), statically pre-allocated at mkfs time, starting from the 18th
> >>block.
> >
> > Is it kept in some sort of file?
>
> No. That area of filesystem does not belong to any files stored on that
> filesystem. You can read it from device directly: dd if=/dev/hda1 bs=4096
> count=8192. I am not sure that it can be of any interest, because you will
> see just binary data.
>
> > I mean can I see the file contents
> > using normal UNIX tools? If yes how do I do it?
>
> You can use
> debugreiserfs -j /dev/hda1 /dev/hda1
> to see journal content. This will decode binary data into human readable
> form.
>
> > With warm regards,
> > -Payal

-- 
Pat Double, [EMAIL PROTECTED]
"In the beginning God created the heaven and the earth."


Re: where is the journal kept?

2005-08-15 Thread Vladimir V. Saveliev

Hello

Payal Rathod wrote:
> On Mon, Aug 15, 2005 at 04:25:37PM +0400, Vladimir V. Saveliev wrote:
>> Each journaling filesystem keeps its journal in its own way.
>> In reiserfs by default the journal is kept in 8192 blocks (4096 bytes
>> each), statically pre-allocated at mkfs time, starting from the 18th
>> block.
>
> Is it kept in some sort of file?

No. That area of filesystem does not belong to any files stored on that
filesystem.
You can read it from device directly: dd if=/dev/hda1 bs=4096 count=8192.
I am not sure that it can be of any interest, because you will see just
binary data.

> I mean can I see the file contents
> using normal UNIX tools? If yes how do I do it?

You can use
debugreiserfs -j /dev/hda1 /dev/hda1
to see journal content. This will decode binary data into human readable form.
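
For reference, the arithmetic behind those numbers, as a small Python
sketch. It assumes the default layout described in this thread and reads
"starting from 18-th block" as block index 18 counted from zero; the
function names are hypothetical. Note that the dd command above starts
reading at block 0, so under that assumption adding skip=18 would be needed
to line it up with the journal area.

```python
BLOCK_SIZE = 4096      # bytes per block, as stated in the thread
JOURNAL_START = 18     # first journal block (assumed zero-based index)
JOURNAL_BLOCKS = 8192  # number of journal blocks in the default layout


def journal_extent(block_size=BLOCK_SIZE, start=JOURNAL_START,
                   blocks=JOURNAL_BLOCKS):
    """Return (byte offset, byte length) of the journal area."""
    return start * block_size, blocks * block_size


def read_journal(image_path):
    """Read the raw journal area from a device or image file."""
    offset, length = journal_extent()
    with open(image_path, "rb") as image:
        image.seek(offset)
        # Returns fewer bytes if the image is shorter than the journal area.
        return image.read(length)
```

With the defaults, journal_extent() gives an offset of 73728 bytes and a
length of 33554432 bytes (32 MiB). The result is still raw binary data;
debugreiserfs is what decodes it.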


> With warm regards,
> -Payal


Re: where is the journal kept?

2005-08-15 Thread Payal Rathod
On Mon, Aug 15, 2005 at 04:25:37PM +0400, Vladimir V. Saveliev wrote:
> Each journaling filesystem keeps its journal in its own way.
> In reiserfs by default the journal is kept in 8192 blocks (4096 bytes
> each), statically pre-allocated at mkfs time, starting from the 18th block.

Is it kept in some sort of file? I mean can I see the file contents 
using normal UNIX tools? If yes how do I do it?

With warm regards,
-Payal


Re: where is the journal kept?

2005-08-15 Thread Vladimir V. Saveliev

Hello

Payal Rathod wrote:
> Hi,
> I have been reading about journalling filesystems but still don't get
> where exactly the journal is kept physically on the harddisk. Can anyone
> help me with basics of this?

Each journaling filesystem keeps its journal in its own way.
In reiserfs by default the journal is kept in 8192 blocks (4096 bytes each),
statically pre-allocated at mkfs time, starting from the 18th block.