On Thu, Jun 20, 2013 at 04:26:09PM +0200, Benoît Canet wrote:
> ---
>  docs/specs/qcow2.txt |   42 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 42 insertions(+)
> 
> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
> index 36a559d..a4ffc85 100644
> --- a/docs/specs/qcow2.txt
> +++ b/docs/specs/qcow2.txt
> @@ -350,3 +350,45 @@ Snapshot table entry:
>          variable:   Unique ID string for the snapshot (not null terminated)
>  
>          variable:   Name of the snapshot (not null terminated)
> +
> +== Journal ==
> +
> +QCOW2 can use one or more instance of a metadata journal.

s/instance/instances/

Is there a reason to use multiple journals rather than a single journal
for all entry types?  The single journal area avoids seeks.

> +
> +A journal is a sequential log of journal entries appended on a previously
> +allocated and reseted area.

I think the correct form is "previously reset area" rather than
"reseted".  Another option is "initialized area".

> +A journal is designed like a linked list with each entry pointing to the next
> +so it's easy to iterate over entries.
> +
> +A journal uses the following constants to denote the type of each entry
> +
> +TYPE_NONE = 0xFF      default value of any bytes in a reseted journal
> +TYPE_END  = 1         the entry ends a journal cluster and point to the next
> +                      cluster
> +TYPE_HASH = 2         the entry contains a deduplication hash
> +
> +QCOW2 journal entry:
> +
> +    Byte 0         :    Size of the entry: size = 2 + n with size <= 254

This is not clear.  I'm wondering if the +2 is included in the byte
value or not.  I'm also wondering what a byte value of zero means and
what a byte value of 255 means.

Please include an example to illustrate how this field works.
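
To make the question concrete, here is how I would read the field under
one possible interpretation: the byte stores the total entry length,
header included, so an entry with no payload has size 2.  This is only
my guess at the intent, and the helper name decode_entry is made up for
illustration.

```python
TYPE_NONE = 0xFF  # value of every byte in a reset journal
TYPE_END = 1
TYPE_HASH = 2

HEADER_SIZE = 2  # size byte + type byte


def decode_entry(buf, offset=0):
    """Decode one entry, ASSUMING byte 0 is the total length (2 + n).

    Returns (size, entry_type, payload).
    """
    size = buf[offset]
    if size == 0xFF:
        # Reset area: no entry has been written here yet.
        return size, TYPE_NONE, b""
    if size < HEADER_SIZE:
        raise ValueError("size %d is smaller than the 2-byte header" % size)
    entry_type = buf[offset + 1]
    payload = bytes(buf[offset + HEADER_SIZE:offset + size])
    if len(payload) != size - HEADER_SIZE:
        raise ValueError("truncated entry")
    return size, entry_type, payload


# A hash entry carrying n = 4 payload bytes is stored with size = 6.
raw = bytes([6, TYPE_HASH, 0xDE, 0xAD, 0xBE, 0xEF])
size, entry_type, payload = decode_entry(raw)
```

Under the other reading (the byte stores n only), the same six bytes
would describe an 8-byte entry and a byte value of 0 would be a valid
empty entry, which is why the spec needs to say which one it is.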

> +
> +         1         :    Type of the entry
> +
> +         2 - size  :    The optional n bytes structure carried by entry
> +
> +A journal is divided into clusters and no journal entry can be spilled on two
> +clusters. This avoid having to read more than one cluster to get a single entry.
> +
> +For this purpose an entry with the end type is added at the end of a journal
> +cluster before starting to write in the next cluster.
> +The size of such an entry is set so the entry points to the next cluster.
> +
> +As any journal cluster must be ended with an end entry the size of regular
> +journal entries is limited to 254 bytes in order to always left room for an end
> +entry which mimimal size is two bytes.
> +
> +The only cases where size > 254 are none entries where size = 255.
> +
> +The replay of a journal stop when the first end none entry is reached.

s/stop/stops/

> +The journal cluster size is 4096 bytes.
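
For reference, this is how I understand replay from the text above.  It
is a rough sketch with several assumptions that the patch does not
state: the size byte is the total entry length, an end entry means the
next entry starts at the next cluster boundary, and the journal clusters
are contiguous in the buffer.  The function name replay is hypothetical.

```python
CLUSTER_SIZE = 4096
TYPE_NONE = 0xFF
TYPE_END = 1
TYPE_HASH = 2


def replay(journal):
    """Yield (offset, type, payload) for each regular entry.

    ASSUMPTIONS (not stated in the patch): the size byte is the total
    entry length, an end entry sends the reader to the next cluster
    boundary, and the journal clusters are contiguous in `journal`.
    """
    if len(journal) % CLUSTER_SIZE:
        raise ValueError("journal is not a whole number of clusters")
    offset = 0
    while offset < len(journal):
        size = journal[offset]
        if size == 0xFF:  # none entry: untouched area, stop replaying
            return
        if size < 2:
            raise ValueError("bad entry size at offset %d" % offset)
        cluster_end = (offset // CLUSTER_SIZE + 1) * CLUSTER_SIZE
        if offset + size > cluster_end:
            raise ValueError("entry at offset %d crosses a cluster" % offset)
        entry_type = journal[offset + 1]
        if entry_type == TYPE_END:
            offset = cluster_end  # jump to the start of the next cluster
            continue
        yield offset, entry_type, bytes(journal[offset + 2:offset + size])
        offset += size


# Example: 16 entries of 254 bytes fill 4064 bytes of the first cluster,
# an end entry of size 32 covers the remainder, one more entry follows.
journal = bytearray([0xFF]) * (2 * CLUSTER_SIZE)
for i in range(16):
    journal[i * 254:i * 254 + 2] = bytes([254, TYPE_HASH])
journal[4064:4066] = bytes([32, TYPE_END])
journal[4096:4102] = bytes([6, TYPE_HASH, 1, 2, 3, 4])
entries = list(replay(journal))
```

If this matches the intended behaviour, it would help to spell the loop
out in the spec.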

Questions about this layout:

1. Journal entries have no integrity mechanism.  This matters most when
   an entry spans physical sectors, since cheap disks may perform a
   partial write and leave a corrupt journal.  If the last bytes of each
   entry are a checksum, you get some confidence that the entry was
   fully written and is valid.

   Did I miss something?

2. Byte granularity means that appending an entry requires a
   read-modify-write of the sector holding the journal tail.  A failure
   during that write could therefore destroy previously committed
   entries.

   Any ideas how existing journals handle this?
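
To illustrate point 1, here is a minimal sketch of a per-entry checksum.
Everything about the layout is an assumption on my side: a 4-byte
big-endian CRC32 trailer counted in the size byte, and the helper names
seal_entry and entry_is_valid.  Other checksums would work equally well.

```python
import struct
import zlib

CRC_SIZE = 4  # hypothetical 4-byte CRC32 trailer


def seal_entry(entry_type, payload):
    """Build size | type | payload | crc32 (hypothetical layout)."""
    size = 2 + len(payload) + CRC_SIZE
    if size > 254:
        raise ValueError("entry too large")
    body = bytes([size, entry_type]) + bytes(payload)
    return body + struct.pack(">I", zlib.crc32(body) & 0xFFFFFFFF)


def entry_is_valid(buf, offset=0):
    """Return True if the entry at `offset` is complete and its CRC matches."""
    if offset >= len(buf):
        return False
    size = buf[offset]
    if size == 0xFF or size < 2 + CRC_SIZE or offset + size > len(buf):
        return False
    crc_start = offset + size - CRC_SIZE
    body = bytes(buf[offset:crc_start])
    (stored,) = struct.unpack(">I", bytes(buf[crc_start:offset + size]))
    return stored == (zlib.crc32(body) & 0xFFFFFFFF)


entry = seal_entry(2, b"\x01" * 32)
# Simulate a damaged sector by flipping one payload byte.
damaged = entry[:5] + bytes([entry[5] ^ 0xFF]) + entry[6:]
ok = entry_is_valid(entry)
bad = entry_is_valid(damaged)
```

Replay would then stop at the first entry that fails validation instead
of trusting whatever bytes happen to be on disk.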
