On 6/11/2018 3:19 PM, Stefan Beller wrote:
Hi Derrick,
On Thu, Jun 7, 2018 at 7:03 AM Derrick Stolee <sto...@gmail.com> wrote:
The multi-pack-index (MIDX) feature generalizes the existing pack-
index (IDX) feature by indexing objects across multiple pack-files.

Describe the basic file format, using a 12-byte header followed by
a lookup table for a list of "chunks" which will be described later.
The file ends with a footer containing a checksum using the hash
algorithm.

The header allows later versions to create breaking changes by
advancing the version number. We can also change the hash algorithm
using a different version value.

We will add the individual chunk format information as we introduce
the code that writes that information.

Signed-off-by: Derrick Stolee <dsto...@microsoft.com>
---
  Documentation/technical/pack-format.txt | 49 +++++++++++++++++++++++++
  1 file changed, 49 insertions(+)

diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index 70a99fd142..17666b4bfc 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -252,3 +252,52 @@ Pack file entry: <+
      corresponding packfile.

      20-byte SHA-1-checksum of all of the above.
+
+== midx-*.midx files have the following format:
+
+The meta-index files refer to multiple pack-files and loose objects.
So is it meta or multi?

Good catch. We were calling this the "meta-index" internally before changing to "multi-pack-index" (which conveniently keeps the acronym unchanged).


+In order to allow extensions that add extra data to the MIDX, we organize
+the body into "chunks" and provide a lookup table at the beginning of the
+body. The header includes certain length values, such as the number of packs,
+the number of base MIDX files, hash lengths and types.
+
+All 4-byte numbers are in network order.
+
+HEADER:
+
+       4-byte signature:
+           The signature is: {'M', 'I', 'D', 'X'}
+
+       1-byte version number:
+           Git only writes or recognizes version 1
+
+       1-byte Object Id Version
+           Git only writes or recognizes verion 1 (SHA-1)
s/verion/version/

+       1-byte number (C) of "chunks"
+
+       1-byte number (I) of base multi-pack-index files:
+           This value is currently always zero.
Oh? Are meta-index and multi-index files different things?

Not intended to be different things, but this number is related to making the feature incremental.


+       4-byte number (P) of pack files
+
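For readers following the layout, the 12-byte header described above can be decoded roughly as follows. This is only a sketch of the draft format quoted in this patch, not Git's actual reader; the names `MidxHeader` and `parse_midx_header` are made up for illustration.

```python
import struct
from typing import NamedTuple

# Big-endian ("network order"): 4-byte signature, four 1-byte fields,
# then a 4-byte pack count. Total size is 12 bytes.
HEADER = struct.Struct(">4sBBBBI")


class MidxHeader(NamedTuple):
    # Hypothetical field names; the draft only describes the fields in prose.
    signature: bytes
    version: int
    oid_version: int
    num_chunks: int      # "C" in the draft
    num_base_midx: int   # "I" in the draft, currently always zero
    num_packs: int       # "P" in the draft


def parse_midx_header(buf: bytes) -> MidxHeader:
    """Decode and sanity-check the header from the start of buf."""
    if len(buf) < HEADER.size:
        raise ValueError("buffer too short for a 12-byte header")
    hdr = MidxHeader(*HEADER.unpack_from(buf, 0))
    if hdr.signature != b"MIDX":
        raise ValueError("bad signature")
    if hdr.version != 1:
        raise ValueError("unsupported version")
    if hdr.oid_version != 1:
        raise ValueError("unsupported object id version")
    return hdr
```

Rejecting any version other than 1 mirrors the "Git only writes or recognizes version 1" wording in the draft.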
+CHUNK LOOKUP:
+
+       (C + 1) * 12 bytes providing the chunk offsets:
+           First 4 bytes describe chunk id. Value 0 is a terminating label.
+           Other 8 bytes provide offset in current file for chunk to start.
+           (Chunks are provided in file-order, so you can infer the length
+           using the next chunk position if necessary.)
It is so nice to have the header also have 12 bytes, so it fits right into the
lookup table. So an alternative point of view:

   If a chunk needs to store more than 8 bytes, we'll have an offset after
   the first 4 bytes that describe the chunk, otherwise you can store the 8 bytes
   of information directly after the 4 bytes.
    "MIDX" is a special chunk and must come first (does it?) and only once
   as it contains the version number.

This sounds feasible, but unnecessarily complicated. I don't think any other chunk will be this small.

+       The remaining data in the body is described one chunk at a time, and
+       these chunks may be given in any order. Chunks are required unless
+       otherwise specified.
+
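A rough sketch of reading the chunk lookup table as described above, inferring each chunk's length from the next entry's offset. It assumes the entries appear in ascending offset order (as the parenthetical in the draft states) and that the terminating entry's offset marks the end of the last chunk; the draft does not spell out the latter, so treat it as an assumption. The function name and the chunk ids used with it are hypothetical.

```python
import struct

# Each lookup entry: 4-byte chunk id followed by an 8-byte file offset,
# both big-endian, 12 bytes per entry.
ENTRY = struct.Struct(">IQ")


def parse_chunk_lookup(buf: bytes, num_chunks: int, start: int = 12):
    """Return a list of (chunk_id, offset, length) for the C real chunks.

    The table holds C + 1 entries; the last one must have id 0.
    Assumption: the terminator's offset is where the last chunk ends.
    """
    entries = [
        ENTRY.unpack_from(buf, start + i * ENTRY.size)
        for i in range(num_chunks + 1)
    ]
    if entries[-1][0] != 0:
        raise ValueError("missing terminating entry")
    chunks = []
    for (chunk_id, offset), (_, next_offset) in zip(entries, entries[1:]):
        if chunk_id == 0:
            raise ValueError("terminating id before end of table")
        if next_offset < offset:
            raise ValueError("chunk offsets are not in file order")
        chunks.append((chunk_id, offset, next_offset - offset))
    return chunks
```

With C = 2, the table occupies (2 + 1) * 12 = 36 bytes after the header, so the first chunk can start no earlier than byte 48.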
+CHUNK DATA:
+
+       (This section intentionally left incomplete.)
+
+TRAILER:
+
+       H-byte HASH-checksum of all of the above.
This means we have to rehash the whole file for updating its contents.
okay.
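
To illustrate the point about rehashing, here is a minimal sketch of checking the trailer, assuming Object Id Version 1 (SHA-1, so H = 20). The function name is made up for illustration and this is not how Git itself implements the check.

```python
import hashlib


def midx_checksum_ok(data: bytes, hash_len: int = 20) -> bool:
    """Check that the last hash_len bytes are the SHA-1 of everything before.

    Assumes SHA-1 (H = 20), matching Object Id Version 1 in the draft.
    """
    if len(data) <= hash_len:
        return False
    body, trailer = data[:-hash_len], data[-hash_len:]
    return hashlib.sha1(body).digest() == trailer
```

Because the hash covers every preceding byte, changing any part of the body means recomputing the trailer over the whole file.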
