Re: [PATCH v5 04/13] csum-file: add CSUM_KEEP_OPEN flag

2018-03-14 Thread Junio C Hamano
Derrick Stolee  writes:

>>  close_commit_graph();
>>
>> And after writing all data out (oh by the way, why aren't we passing
>> commit_graph instance around and instead relying on a file-scope
>> static global?)...
>
> Yeah, we should remove the global dependence. Is this a blocker for
> the series?

I do not think it is such a big deal.  It was just that I found it a
bit curious while reading it through, knowing that you are already
familiar with the work being done in that "the_repository" area.

>> I _think_ the word "close" in the name hashclose() is about closing
>> the (virtual) stream for the hashing that is overlayed on top of the
>> underlying file descriptor, and being able to choose between closing
>> and not closing the underlying file descriptor when "closing" the
>> hashing layer sort of makes sense.  So I won't complain too much
>> about hashclose() that takes optional CSUM_CLOSE flag.
>
> I agree this "close" word is incorrect. We really want
> "finalize_hashfile()" which may include closing the file.

Yeah, that is much better.  I do not think I'd mind seeing a prelim
step at the very beginning of the series to just rename the function
before the series starts to change anything else (there aren't that
many callers and I do not think there is any topic in flight that
changes these existing callsites).  Or we can leave it for clean-up
after the dust settles.  Either is fine as long as we know that we
eventually get there.

> My new solution works this way. The only caveat is that existing
> callers end up with this diff:
>
> - hashclose(f, _, CSUM_FSYNC);
> + hashclose(f, _, CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE);

I think I am fine with that.  It feels a bit nonsensical for a
caller to ask fsync when it is not asking fd to be closed, as I'd
imagine that the typical reason why the caller wants the fd left
open is because the caller still wants to do something to it
(e.g. write some more things into it) and a caller who would care
about fsync would want to do so _after_ finishing its own writing,
but that may be just me.

>> And then we can keep the "FSYNC means fsync and then close" the
>> current set of callers rely on.  I dunno if that is a major issue,
>> but I do think "close this, or no, keep it open" is far worse than
>> "do we want the resulting hash in the stream?"
>
> I'm not happy with this solution of needing an extra call like this
> in-between, especially since hashclose() knows how to FSYNC.

I guess we are repeating the same as above ;-)  As I said, I do not
care too deeply either way.

>> An alternative design of the above is without making
>> CSUM_HASH_IN_STREAM a new flag bit.  I highly suspect that the
>> calling codepath _knows_ whether the resulting final hash will be
>> written out at the end of the stream or not when it wraps an fd with
>> a hashfile structure, so "struct hashfile" could gain a bit to tell
>> hashclose() whether the resulting hash need to be written (or not).
>> That would be a bit larger change than what I outlined above, and I
>> do not know if it is worth doing, though.
>
> This certainly seems trickier to get right, but if we think it is the
> right solution I'll spend the time pairing struct creations with
> stream closings.

I still do not think of a compelling reason why such an alternative
approach would be worth taking, and do prefer the approach to let
the caller choose when finalize function is called via a flag bit.

Thanks.



Re: [PATCH v5 04/13] csum-file: add CSUM_KEEP_OPEN flag

2018-03-13 Thread Derrick Stolee

On 3/13/2018 5:42 PM, Junio C Hamano wrote:

Derrick Stolee  writes:


On 2/26/2018 9:32 PM, Derrick Stolee wrote:

This patch is new to the series due to the interactions with the lockfile API
and the hashfile API. I need to ensure the hashfile writes the hash value at
the end of the file, but keep the file descriptor open so the lock is valid.

I welcome any susggestions to this patch or to the way I use it in the commit
that follows.

-- >8 --

I haven't gotten any feedback on this step of the patch. Could someone
take a look and let me know what you think?


Let me just say that I appreciate the level of detail you provided in 
answering this question. The discussion below is very illuminating.



Let's follow the commit-graph writing codepath to see what happens:

fd = hold_lock_file_for_update(, graph_name, 0);
...
f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);

The caller creates a lockfile, and then wraps its file descriptor in
a hashfile.

hashwrite_be32(f, GRAPH_SIGNATURE);
...

Then it goes on writing to the hashfile, growing the lockfile.

 ...
write_graph_chunk_large_edges(f, commits.list, commits.nr);

close_commit_graph();

And after writing all data out (oh by the way, why aren't we passing
commit_graph instance around and instead relying on a file-scope
static global?)...


Yeah, we should remove the global dependence. Is this a blocker for the 
series?


After this lands, I plan to add a one-commit patch that moves all global 
"commit_graph" state to the_repository in the spirit of Stefan's 
patches. For now, this is equivalent to "close_all_packs()".



hashclose(f, final_hash, CSUM_CLOSE | CSUM_FSYNC | CSUM_KEEP_OPEN);

We ask for the final hash value to be written to the file (and also
returned to us---although you do not seem to use that at all).  See
a comment on this, though, at the end.


I'm not using the final_hash anymore, since we are using the 
"$GIT_DIR/objects/info/commit-graph" filename always. If we make the 
file incremental, then we can use "final_hash" to name the split files. 
For now, I'll remove final_hash in favor of NULL.



commit_lock_file();

And then, we put the lockfile to its final place, while closing its
file descriptor.

The overall API sounds sensible, from the above.

However.

The function whose name is hashclose() that takes a flag word whose
possible bit value includes "Please close this thing" feels strange
enough (does it mean the hashclose() function does not close it if
CSUM_CLOSE is not given?), but adding another to the mix that lets
us say "Please close this (with or without FSYNC), oh by the way
please leave it open" feels a bit borderline to insanity.

I _think_ the word "close" in the name hashclose() is about closing
the (virtual) stream for the hashing that is overlayed on top of the
underlying file descriptor, and being able to choose between closing
and not closing the underlying file descriptor when "closing" the
hashing layer sort of makes sense.  So I won't complain too much
about hashclose() that takes optional CSUM_CLOSE flag.


I agree this "close" word is incorrect. We really want 
"finalize_hashfile()" which may include closing the file.



But then what does it mean to give KEEP_OPEN and CLOSE together?


This should have been a red flag that something was wrong.


The new caller (which is the only one that wants the nominally
nonsensical CLOSE|KEEP_OPEN combination, which is shown above) wants
the final checksum of the data sent over the (virtual) stream
computed and written, and the file descriptor fsync'ed, but the file
descriptor kept open.  As we _DO_ want to keep the verbs in flags
CSUM_CLOSE and CSUM_FSYNC to be about the underlying file
descriptor, I think your new code for KEEP_OPEN that is inside the
if() block that is for CSUM_CLOSE is an ugly hack, and your asking
for improvements is very much appreciated.

Let's step back and see what different behaviours the existing code
wants to support before your patch:

 - hashclose() is always about finializing the hash computation
   over the data sent through the struct hashfile (i.e. the
   virtual stream opened by hashfd()).  The optional *result can
   be used to receive this hash value, even when the caller does
   not want to write that hash value to the output stream.

 - when CSUM_CLOSE is given, however, the hash value is written
   out as the trailing record to the output stream and the stream
   is closed.  CSUM_FSYNC can instead be used to ensure that the
   data hits the disk platter when the output stream is closed.

 - when CSUM_CLOSE nor CSUM_FSYNC is not given, hash value is not
   written to the output stream (the caller takes responsibility
   of using *result), and the output stream is left open.

I think the first mistake in the existing code is to associate
"close the underlying stream" and "write 

Re: [PATCH v5 04/13] csum-file: add CSUM_KEEP_OPEN flag

2018-03-13 Thread Junio C Hamano
Derrick Stolee  writes:

> On 2/26/2018 9:32 PM, Derrick Stolee wrote:
>> This patch is new to the series due to the interactions with the lockfile API
>> and the hashfile API. I need to ensure the hashfile writes the hash value at
>> the end of the file, but keep the file descriptor open so the lock is valid.
>>
>> I welcome any susggestions to this patch or to the way I use it in the commit
>> that follows.
>>
>> -- >8 --
>
> I haven't gotten any feedback on this step of the patch. Could someone
> take a look and let me know what you think?

Let's follow the commit-graph writing codepath to see what happens:

fd = hold_lock_file_for_update(, graph_name, 0);
...
f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);

The caller creates a lockfile, and then wraps its file descriptor in
a hashfile.

hashwrite_be32(f, GRAPH_SIGNATURE);
...

Then it goes on writing to the hashfile, growing the lockfile.

...
write_graph_chunk_large_edges(f, commits.list, commits.nr);

close_commit_graph();

And after writing all data out (oh by the way, why aren't we passing
commit_graph instance around and instead relying on a file-scope
static global?)...

hashclose(f, final_hash, CSUM_CLOSE | CSUM_FSYNC | CSUM_KEEP_OPEN);

We ask for the final hash value to be written to the file (and also
returned to us---although you do not seem to use that at all).  See
a comment on this, though, at the end.

commit_lock_file();

And then, we put the lockfile to its final place, while closing its
file descriptor.

The overall API sounds sensible, from the above.

However.

The function whose name is hashclose() that takes a flag word whose
possible bit value includes "Please close this thing" feels strange
enough (does it mean the hashclose() function does not close it if
CSUM_CLOSE is not given?), but adding another to the mix that lets
us say "Please close this (with or without FSYNC), oh by the way
please leave it open" feels a bit borderline to insanity.

I _think_ the word "close" in the name hashclose() is about closing
the (virtual) stream for the hashing that is overlayed on top of the
underlying file descriptor, and being able to choose between closing
and not closing the underlying file descriptor when "closing" the
hashing layer sort of makes sense.  So I won't complain too much
about hashclose() that takes optional CSUM_CLOSE flag.

But then what does it mean to give KEEP_OPEN and CLOSE together?

The new caller (which is the only one that wants the nominally
nonsensical CLOSE|KEEP_OPEN combination, which is shown above) wants
the final checksum of the data sent over the (virtual) stream
computed and written, and the file descriptor fsync'ed, but the file
descriptor kept open.  As we _DO_ want to keep the verbs in flags
CSUM_CLOSE and CSUM_FSYNC to be about the underlying file
descriptor, I think your new code for KEEP_OPEN that is inside the
if() block that is for CSUM_CLOSE is an ugly hack, and your asking
for improvements is very much appreciated.

Let's step back and see what different behaviours the existing code
wants to support before your patch:

- hashclose() is always about finializing the hash computation
  over the data sent through the struct hashfile (i.e. the
  virtual stream opened by hashfd()).  The optional *result can
  be used to receive this hash value, even when the caller does
  not want to write that hash value to the output stream.

- when CSUM_CLOSE is given, however, the hash value is written
  out as the trailing record to the output stream and the stream
  is closed.  CSUM_FSYNC can instead be used to ensure that the
  data hits the disk platter when the output stream is closed.

- when CSUM_CLOSE nor CSUM_FSYNC is not given, hash value is not
  written to the output stream (the caller takes responsibility
  of using *result), and the output stream is left open.

I think the first mistake in the existing code is to associate
"close the underlying stream" and "write the hash out to the
underlying stream" more closely than it should.  It should be
possible to "close the underlying steam" without first writing the
hash out to the underlying stream", and vice versa.

IOW, I think

hashclose() {
hashflush();
the_hash_algo->final_fn();
if (result) 
hashcpy(result, f->buffer);
+   if (flags & CSUM_HASH_IN_STREAM)
+   flush(f, f->buffer, the_hash_algo->rawsz);
+   if (flags & CSUM_FSYNC)
+   fsync_or_die();
if (flags & (CSUM_CLOSE | CSUM_FSYNC)) {
-   flush();
-   if (flags & CSUM_FSYNC)
-   fsync_or_die();
if (close(f->fd))
die_errno();

Re: [PATCH v5 04/13] csum-file: add CSUM_KEEP_OPEN flag

2018-03-12 Thread Derrick Stolee

On 2/26/2018 9:32 PM, Derrick Stolee wrote:

This patch is new to the series due to the interactions with the lockfile API
and the hashfile API. I need to ensure the hashfile writes the hash value at
the end of the file, but keep the file descriptor open so the lock is valid.

I welcome any susggestions to this patch or to the way I use it in the commit
that follows.

-- >8 --


I haven't gotten any feedback on this step of the patch. Could someone 
take a look and let me know what you think?


Thanks,
-Stolee


If we want to use a hashfile on the temporary file for a lockfile, then
we need hashclose() to fully write the trailing hash but also keep the
file descriptor open.

Signed-off-by: Derrick Stolee 
---
  csum-file.c | 10 +++---
  csum-file.h |  1 +
  2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/csum-file.c b/csum-file.c
index 5eda7fb..302e6ae 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -66,9 +66,13 @@ int hashclose(struct hashfile *f, unsigned char *result, 
unsigned int flags)
flush(f, f->buffer, the_hash_algo->rawsz);
if (flags & CSUM_FSYNC)
fsync_or_die(f->fd, f->name);
-   if (close(f->fd))
-   die_errno("%s: sha1 file error on close", f->name);
-   fd = 0;
+   if (flags & CSUM_KEEP_OPEN)
+   fd = f->fd;
+   else {
+   if (close(f->fd))
+   die_errno("%s: sha1 file error on close", 
f->name);
+   fd = 0;
+   }
} else
fd = f->fd;
if (0 <= f->check_fd) {
diff --git a/csum-file.h b/csum-file.h
index 992e5c0..b7c0e48 100644
--- a/csum-file.h
+++ b/csum-file.h
@@ -29,6 +29,7 @@ extern int hashfile_truncate(struct hashfile *, struct 
hashfile_checkpoint *);
  /* hashclose flags */
  #define CSUM_CLOSE1
  #define CSUM_FSYNC2
+#define CSUM_KEEP_OPEN 4
  
  extern struct hashfile *hashfd(int fd, const char *name);

  extern struct hashfile *hashfd_check(const char *name);




[PATCH v5 04/13] csum-file: add CSUM_KEEP_OPEN flag

2018-02-26 Thread Derrick Stolee
This patch is new to the series due to the interactions with the lockfile API
and the hashfile API. I need to ensure the hashfile writes the hash value at
the end of the file, but keep the file descriptor open so the lock is valid.

I welcome any susggestions to this patch or to the way I use it in the commit
that follows.

-- >8 --

If we want to use a hashfile on the temporary file for a lockfile, then
we need hashclose() to fully write the trailing hash but also keep the
file descriptor open.

Signed-off-by: Derrick Stolee 
---
 csum-file.c | 10 +++---
 csum-file.h |  1 +
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/csum-file.c b/csum-file.c
index 5eda7fb..302e6ae 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -66,9 +66,13 @@ int hashclose(struct hashfile *f, unsigned char *result, 
unsigned int flags)
flush(f, f->buffer, the_hash_algo->rawsz);
if (flags & CSUM_FSYNC)
fsync_or_die(f->fd, f->name);
-   if (close(f->fd))
-   die_errno("%s: sha1 file error on close", f->name);
-   fd = 0;
+   if (flags & CSUM_KEEP_OPEN)
+   fd = f->fd;
+   else {
+   if (close(f->fd))
+   die_errno("%s: sha1 file error on close", 
f->name);
+   fd = 0;
+   }
} else
fd = f->fd;
if (0 <= f->check_fd) {
diff --git a/csum-file.h b/csum-file.h
index 992e5c0..b7c0e48 100644
--- a/csum-file.h
+++ b/csum-file.h
@@ -29,6 +29,7 @@ extern int hashfile_truncate(struct hashfile *, struct 
hashfile_checkpoint *);
 /* hashclose flags */
 #define CSUM_CLOSE 1
 #define CSUM_FSYNC 2
+#define CSUM_KEEP_OPEN 4
 
 extern struct hashfile *hashfd(int fd, const char *name);
 extern struct hashfile *hashfd_check(const char *name);
-- 
2.7.4