Nikita Danilov wrote:

> Xuan Baldauf writes:
>  >
>  >
>  > Nikita Danilov wrote:
>  >
>  > > Xuan Baldauf writes:
>  > >  > Hello Hans,
>  > >  >
>  > >  > are you considering pingpong-journaling for reiser4?
>  > >  >
>  > >  > Ping-pong journaling means that, when you know the blocks
>  > >  > you are writing will be overwritten by outstanding requests
>  > >  > or future transactions, you do not write the invalidation
>  > >  > block of the old transaction until the affected blocks are
>  > >  > overwritten by the new transaction.
>  > >  > Journaling this way brings the number of writes required
>  > >  > with journaling down to the number required without it
>  > >  > (1 changed block, 1 write, instead of 1 changed block,
>  > >  > 1 write to the journal and 1 write to the real location),
>  > >  > saving half the journal-related writes in the ideal case.
>  > >  > The superblock is a good candidate for this feature.
>  > >
>  > > In the reiserfs journalling written by Chris, transactions are
>  > > "batched": a sequence of transactions is accumulated in memory and
>  > > then dumped to the journal area at once. If a block was modified
>  > > several times during a batch, it is written only once, so, in some
>  > > sense, reiserfs already has ping-pong journalling.
>  >
>  > Yes, but I don't think that this is the same. Batching makes parallel
>  > writes of different processes more efficient, pingpong-journaling also
>  > makes serial writes of the same process more efficient (e.g. a "mv" over
>  > a large directory, or an rpm install, etc.). Usually, metadata changes
>  > should have been synced to disk before the filesystem call returns.
>
> Now that I am confused, I'll just describe how reiserfs (3.6) journalling works:
>
> Transaction batch starts
> call 1 starts
>  tr1 involves blocks into transaction (no io)
> call 1 ends
> call 2 starts
>  tr2 involves blocks into transaction (no io)
> call 2 ends
> ...
> call N starts
>  trN involves blocks into transaction (no io)
> call N ends
> At that point either some predefined amount of time has elapsed, or
> transactions have involved more than JOURNAL_MAX_BATCH (==900) blocks.
> All blocks are dumped into journal area.
> Wait until io completes.
> Write commit block into journal area.
> Mark all blocks dirty so they will be flushed to real locations by
> normal means.
> Next transaction batch starts.
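
A minimal sketch of the batch cycle described above, as a toy Python model
rather than the actual reiserfs code. The class and method names are
hypothetical, and the timer is stood in for by an explicit flush_batch() call;
only JOURNAL_MAX_BATCH (900) comes from the description.

```python
JOURNAL_MAX_BATCH = 900  # block limit quoted in the description above


class BatchJournal:
    """Toy model of batched journalling; all names here are illustrative."""

    def __init__(self, max_batch=JOURNAL_MAX_BATCH):
        self.max_batch = max_batch
        self.pending = {}        # block number -> latest contents, in memory
        self.journal_writes = 0  # blocks written to the journal area
        self.commit_blocks = 0   # commit blocks written
        self.dirty = set()       # blocks left for the normal flush to disk

    def transaction(self, blocks):
        """Join blocks to the current batch; no I/O happens here."""
        self.pending.update(blocks)
        if len(self.pending) > self.max_batch:
            self.flush_batch()

    def flush_batch(self):
        """Dump the batch to the journal, write the commit block, mark dirty."""
        if not self.pending:
            return
        self.journal_writes += len(self.pending)  # each block written once
        self.commit_blocks += 1                   # after the dump completes
        self.dirty.update(self.pending)
        self.pending = {}


journal = BatchJournal()
for call in range(5):
    # Every call modifies block 7 (assumed shared) plus one block of its own.
    journal.transaction({7: f"v{call}", 100 + call: "data"})
journal.flush_batch()  # stands in for the predefined time elapsing
print(journal.journal_writes, journal.commit_blocks)  # prints: 6 1
```

Block 7 is modified five times but reaches the journal once, which is the
"written only once per batch" behaviour. Note that no call waits for I/O in
this model.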

So, while metadata changes are atomic, they are not synchronous?

>
>
> How does ping-pong journalling differ from this?

It is only for the synchronous case, where applications wait until the changes
reach the disk (maybe due to fsync and friends). So if the asynchronous case is
the normal case (seems so), additional pingpong-journaling does not seem to have a
large impact.
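
To make the synchronous case concrete, here is a rough Python sketch that
replays the two call sequences quoted further down and counts the block
writes. The function names are made up for illustration, and extending the
pattern beyond two calls (odd calls write to the journal, even calls write to
the real location) is my assumption, not something any existing
implementation does.

```python
def plain_steps(calls):
    """Synchronous journaling without ping-pong: every call does both writes."""
    steps = []
    for n in range(1, calls + 1):
        steps += [
            f"transaction {n} write to journal",
            f"transaction {n} commit",
            f"transaction {n} write to real location",
            f"transaction {n} invalidate",
        ]
    return steps


def pingpong_steps(calls):
    """Ping-pong: invalidation is delayed until the next call overwrites."""
    steps = []
    for n in range(1, calls + 1):
        if n % 2 == 1:
            # Odd call: the journal copy is the only on-disk copy for now.
            steps += [f"transaction {n} write to journal",
                      f"transaction {n} commit"]
        else:
            # Even call: overwrite in place, then retire the old journal entry.
            steps += [f"transaction {n} write to real location",
                      f"transaction {n - 1} invalidate"]
    return steps


def count_block_writes(steps):
    """Count only block writes, as the totals in the quoted example do."""
    return sum("write to" in step for step in steps)


print(count_block_writes(plain_steps(2)), count_block_writes(pingpong_steps(2)))
# prints: 4 2
```

Commit and invalidation records are deliberately left out of the count, since
the quoted totals (4 versus 2) do not count them either.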

When intensive data journaling is needed (which is about to be enabled in
reiser4?), the impact may become greater due to the higher frequency of
commit-blocks.

>
>
> Nikita.
>
>  > Without pingpong-journaling, the situation is as follows (maybe actually
>  > in reiserfs, it is slightly different, I do not know):
>  >
>  >   call 1 start
>  >   transaction 1 start
>  >   transaction 1 write to journal
>  >   transaction 1 commit
>  >   transaction 1 write to real location
>  >   transaction 1 invalidate
>  >   call 1 end
>  >
>  >   [a short time elapses while the application decides what to do next]
>  >
>  >   call 2 start
>  >   transaction 2 start
>  >   transaction 2 write to journal
>  >   transaction 2 commit
>  >   transaction 2 write to real location
>  >   transaction 2 invalidate
>  >   call 2 end
>  >
>  > That totals 4 writes. (The write to the real location may be deferred
>  > until "transaction 2 start", so filesystem calls can return
>  > sooner.)
>  >
>  > With pingpong journaling, you can do it as follows:
>  >
>  >   call 1 start
>  >   transaction 1 start
>  >   transaction 1 write to journal
>  >   transaction 1 commit
>  >   call 1 end
>  >
>  >   [a short time elapses while the application decides what to do next]
>  >
>  >   call 2 start
>  >   transaction 2 write to real location
>  >   transaction 1 invalidate
>  >   call 2 end
>  >
>  > That totals 2 writes.
>  >
>  > Basically, it just delays the transaction invalidation until some
>  > later point. This may require additional memory to be pinned, so the
>  > user should be able to set an upper limit on the memory used for
>  > this purpose, to choose the speed/memory trade-off.
>  >
>  > Batching cannot speed up serial transactions of the same process (or
>  > can it?), while pingpong-journaling can.
>  >
>  > Actually, it is somewhat trickier, because the set of blocks changed
>  > by transaction 2 may not be a subset of the set of blocks changed by
>  > transaction 1. So there must be some "transaction invalidation
>  > transaction" which, when committed, not only invalidates the
>  > transactions that previously changed the view of the blocks now
>  > changed again, but also changes the view of the blocks that are changed
>  > now but were not changed by the now-invalidated transactions.
>  >
>  > >
>  > >  >
>  > >  > I think that heavily loaded servers with parallel disk
>  > >  > writes will be able to see a considerable speedup.
>  > >  >
>  > >  > Xuân.
>  > >
>  > > Nikita.
>  >
>  > Xuân.

Xuân.

