Re: [PATCH 05/23] midx: write header information to lockfile

2018-06-19 Thread Derrick Stolee

On 6/19/2018 10:59 AM, Duy Nguyen wrote:

On Tue, Jun 19, 2018 at 2:54 PM Derrick Stolee  wrote:

On 6/12/2018 11:00 AM, Duy Nguyen wrote:

On Thu, Jun 7, 2018 at 7:01 PM Derrick Stolee  wrote:

diff --git a/midx.c b/midx.c
index 616af66b13..3e55422a21 100644
--- a/midx.c
+++ b/midx.c
@@ -1,9 +1,62 @@
   #include "git-compat-util.h"
   #include "cache.h"
   #include "dir.h"
+#include "csum-file.h"
+#include "lockfile.h"
   #include "midx.h"

+#define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
+#define MIDX_VERSION 1
+#define MIDX_HASH_VERSION 1 /* SHA-1 */

...

+static size_t write_midx_header(struct hashfile *f,
+   unsigned char num_chunks,
+   uint32_t num_packs)
+{
+   char byte_values[4];
+   hashwrite_be32(f, MIDX_SIGNATURE);
+   byte_values[0] = MIDX_VERSION;
+   byte_values[1] = MIDX_HASH_VERSION;

Quoting from "State of NewHash work, future directions, and discussion" [1]

* If you need to serialize an algorithm identifier into your data
format, use the format_id field of struct git_hash_algo.  It's
designed specifically for that purpose.

[1] 
https://public-inbox.org/git/20180612024252.ga141...@aiede.svl.corp.google.com/T/#m5fdd09dcaf31266c45343fb6c0beaaa3e928bc60

Thanks! I'll also use the_hash_algo->rawsz to infer the length of the
hash function.

BTW, since you're the author of commit-graph.c and may notice it has
the same problem. Don't touch that code. Brian already has some WIP
changes [1]. We just make sure new code does not add extra work for
him. I expect he'll send all those patches out soon.

[1] 
https://github.com/bk2204/git/commit/3f9031e06cfb21534eb7dfff7b54e7598ac1149f


Thanks for the link. It seems he is creating an oid_version() method 
that returns a 1-byte version for the hash version instead of the 4-byte 
signature of the_hash_algo->format_id. I look forward to incorporating 
that into the MIDX format. I'll keep my macros for now, as we work out 
the other details, and while Brain's patch is cooking.


Thanks,
-Stolee


Re: [PATCH 05/23] midx: write header information to lockfile

2018-06-19 Thread Duy Nguyen
On Tue, Jun 19, 2018 at 2:54 PM Derrick Stolee  wrote:
>
> On 6/12/2018 11:00 AM, Duy Nguyen wrote:
> > On Thu, Jun 7, 2018 at 7:01 PM Derrick Stolee  wrote:
> >> diff --git a/midx.c b/midx.c
> >> index 616af66b13..3e55422a21 100644
> >> --- a/midx.c
> >> +++ b/midx.c
> >> @@ -1,9 +1,62 @@
> >>   #include "git-compat-util.h"
> >>   #include "cache.h"
> >>   #include "dir.h"
> >> +#include "csum-file.h"
> >> +#include "lockfile.h"
> >>   #include "midx.h"
> >>
> >> +#define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
> >> +#define MIDX_VERSION 1
> >> +#define MIDX_HASH_VERSION 1 /* SHA-1 */
> > ...
> >> +static size_t write_midx_header(struct hashfile *f,
> >> +   unsigned char num_chunks,
> >> +   uint32_t num_packs)
> >> +{
> >> +   char byte_values[4];
> >> +   hashwrite_be32(f, MIDX_SIGNATURE);
> >> +   byte_values[0] = MIDX_VERSION;
> >> +   byte_values[1] = MIDX_HASH_VERSION;
> > Quoting from "State of NewHash work, future directions, and discussion" [1]
> >
> > * If you need to serialize an algorithm identifier into your data
> >format, use the format_id field of struct git_hash_algo.  It's
> >designed specifically for that purpose.
> >
> > [1] 
> > https://public-inbox.org/git/20180612024252.ga141...@aiede.svl.corp.google.com/T/#m5fdd09dcaf31266c45343fb6c0beaaa3e928bc60
>
> Thanks! I'll also use the_hash_algo->rawsz to infer the length of the
> hash function.

BTW, since you're the author of commit-graph.c and may notice it has
the same problem. Don't touch that code. Brian already has some WIP
changes [1]. We just make sure new code does not add extra work for
him. I expect he'll send all those patches out soon.

[1] 
https://github.com/bk2204/git/commit/3f9031e06cfb21534eb7dfff7b54e7598ac1149f

-- 
Duy


Re: [PATCH 05/23] midx: write header information to lockfile

2018-06-19 Thread Derrick Stolee

On 6/12/2018 11:00 AM, Duy Nguyen wrote:

On Thu, Jun 7, 2018 at 7:01 PM Derrick Stolee  wrote:

diff --git a/midx.c b/midx.c
index 616af66b13..3e55422a21 100644
--- a/midx.c
+++ b/midx.c
@@ -1,9 +1,62 @@
  #include "git-compat-util.h"
  #include "cache.h"
  #include "dir.h"
+#include "csum-file.h"
+#include "lockfile.h"
  #include "midx.h"

+#define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
+#define MIDX_VERSION 1
+#define MIDX_HASH_VERSION 1 /* SHA-1 */

...

+static size_t write_midx_header(struct hashfile *f,
+   unsigned char num_chunks,
+   uint32_t num_packs)
+{
+   char byte_values[4];
+   hashwrite_be32(f, MIDX_SIGNATURE);
+   byte_values[0] = MIDX_VERSION;
+   byte_values[1] = MIDX_HASH_VERSION;

Quoting from "State of NewHash work, future directions, and discussion" [1]

* If you need to serialize an algorithm identifier into your data
   format, use the format_id field of struct git_hash_algo.  It's
   designed specifically for that purpose.

[1] 
https://public-inbox.org/git/20180612024252.ga141...@aiede.svl.corp.google.com/T/#m5fdd09dcaf31266c45343fb6c0beaaa3e928bc60


Thanks! I'll also use the_hash_algo->rawsz to infer the length of the 
hash function.


Re: [PATCH 05/23] midx: write header information to lockfile

2018-06-12 Thread Duy Nguyen
On Thu, Jun 7, 2018 at 7:01 PM Derrick Stolee  wrote:
> diff --git a/midx.c b/midx.c
> index 616af66b13..3e55422a21 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -1,9 +1,62 @@
>  #include "git-compat-util.h"
>  #include "cache.h"
>  #include "dir.h"
> +#include "csum-file.h"
> +#include "lockfile.h"
>  #include "midx.h"
>
> +#define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
> +#define MIDX_VERSION 1
> +#define MIDX_HASH_VERSION 1 /* SHA-1 */
...
> +static size_t write_midx_header(struct hashfile *f,
> +   unsigned char num_chunks,
> +   uint32_t num_packs)
> +{
> +   char byte_values[4];
> +   hashwrite_be32(f, MIDX_SIGNATURE);
> +   byte_values[0] = MIDX_VERSION;
> +   byte_values[1] = MIDX_HASH_VERSION;

Quoting from "State of NewHash work, future directions, and discussion" [1]

* If you need to serialize an algorithm identifier into your data
  format, use the format_id field of struct git_hash_algo.  It's
  designed specifically for that purpose.

[1] 
https://public-inbox.org/git/20180612024252.ga141...@aiede.svl.corp.google.com/T/#m5fdd09dcaf31266c45343fb6c0beaaa3e928bc60

> +   byte_values[2] = num_chunks;
> +   byte_values[3] = 0; /* unused */
> +   hashwrite(f, byte_values, sizeof(byte_values));
> +   hashwrite_be32(f, num_packs);
> +
> +   return MIDX_HEADER_SIZE;
> +}
-- 
Duy


Re: [PATCH 05/23] midx: write header information to lockfile

2018-06-07 Thread Duy Nguyen
On Thu, Jun 7, 2018 at 4:03 PM, Derrick Stolee  wrote:
> +static char *get_midx_filename(const char *object_dir)
> +{
> +   struct strbuf midx_name = STRBUF_INIT;
> +   strbuf_addstr(_name, object_dir);
> +   strbuf_addstr(_name, "/pack/multi-pack-index");
> +   return strbuf_detach(_name, NULL);
> +}

I think this whole function can be written as
xstrfmt("%s/pack/multi-pack-index", object_dir);

> +
> +static size_t write_midx_header(struct hashfile *f,
> +   unsigned char num_chunks,
> +   uint32_t num_packs)
> +{
> +   char byte_values[4];

unsigned char just to be on the safe side? 'char' is signed on ARM if
I remember correctly.
-- 
Duy


[PATCH 05/23] midx: write header information to lockfile

2018-06-07 Thread Derrick Stolee
As we begin writing the multi-pack-index format to disk, start with
the basics: the 12-byte header and the 20-byte checksum footer. Start
with these basics so we can add the rest of the format in small
increments.

As we implement the format, we will use a technique to check that our
computed offsets within the multi-pack-index file match what we are
actually writing. Each method that writes to the hashfile will return
the number of bytes written, and we will track that those values match
our expectations.

Currently, write_midx_header() returns 12, but is not checked. We will
check the return value in a later commit.

Signed-off-by: Derrick Stolee 
---
 midx.c  | 53 +
 t/t5319-midx.sh |  5 +++--
 2 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/midx.c b/midx.c
index 616af66b13..3e55422a21 100644
--- a/midx.c
+++ b/midx.c
@@ -1,9 +1,62 @@
 #include "git-compat-util.h"
 #include "cache.h"
 #include "dir.h"
+#include "csum-file.h"
+#include "lockfile.h"
 #include "midx.h"
 
+#define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
+#define MIDX_VERSION 1
+#define MIDX_HASH_VERSION 1 /* SHA-1 */
+#define MIDX_HEADER_SIZE 12
+
+static char *get_midx_filename(const char *object_dir)
+{
+   struct strbuf midx_name = STRBUF_INIT;
+   strbuf_addstr(_name, object_dir);
+   strbuf_addstr(_name, "/pack/multi-pack-index");
+   return strbuf_detach(_name, NULL);
+}
+
+static size_t write_midx_header(struct hashfile *f,
+   unsigned char num_chunks,
+   uint32_t num_packs)
+{
+   char byte_values[4];
+   hashwrite_be32(f, MIDX_SIGNATURE);
+   byte_values[0] = MIDX_VERSION;
+   byte_values[1] = MIDX_HASH_VERSION;
+   byte_values[2] = num_chunks;
+   byte_values[3] = 0; /* unused */
+   hashwrite(f, byte_values, sizeof(byte_values));
+   hashwrite_be32(f, num_packs);
+
+   return MIDX_HEADER_SIZE;
+}
+
 int write_midx_file(const char *object_dir)
 {
+   unsigned char num_chunks = 0;
+   uint32_t num_packs = 0;
+   char *midx_name;
+   struct hashfile *f;
+   struct lock_file lk;
+
+   midx_name = get_midx_filename(object_dir);
+   if (safe_create_leading_directories(midx_name)) {
+   UNLEAK(midx_name);
+   die_errno(_("unable to create leading directories of %s"),
+ midx_name);
+   }
+
+   hold_lock_file_for_update(, midx_name, LOCK_DIE_ON_ERROR);
+   f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);
+   FREE_AND_NULL(midx_name);
+
+   write_midx_header(f, num_chunks, num_packs);
+
+   finalize_hashfile(f, NULL, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
+   commit_lock_file();
+
return 0;
 }
diff --git a/t/t5319-midx.sh b/t/t5319-midx.sh
index a590137af7..80f9389837 100755
--- a/t/t5319-midx.sh
+++ b/t/t5319-midx.sh
@@ -3,8 +3,9 @@
 test_description='multi-pack-indexes'
 . ./test-lib.sh
 
-test_expect_success 'write midx with no pakcs' '
-   git midx --object-dir=. write
+test_expect_success 'write midx with no packs' '
+   git midx --object-dir=. write &&
+   test_path_is_file pack/multi-pack-index
 '
 
 test_done
-- 
2.18.0.rc1