Re: [PATCH v4 08/13] commit-graph: implement --delete-expired

2018-02-23 Thread Junio C Hamano
Derrick Stolee  writes:

> My thought was that using a fixed name
> (e.g. .git/objects/info/commit-graph) would block making the graph
> incremental. Upon thinking again, this is not the case. That feature
> could be designed with a fixed name for the small, frequently-updated
> file and use .../info/graph-.graph names for the "base" graph
> files.

I guess that is in line with how the split-index folks did their
thing ;-)


Re: [PATCH v4 08/13] commit-graph: implement --delete-expired

2018-02-23 Thread Derrick Stolee

On 2/23/2018 2:33 PM, Junio C Hamano wrote:

> Derrick Stolee  writes:
>
>> The (unlikely, but possible) race condition involves two processes (P1
>> and P2):
>>
>> 1. P1 reads from graph-latest to see commit graph file F1.
>> 2. P2 updates graph-latest to point to F2 and deletes F1.
>> 3. P1 tries to read F1 and fails.
>>
>> I could explicitly mention this condition in the message, or we can
>> just let P2 fail by deleting all files other than the one referenced
>> by 'graph-latest'. Thoughts?
>
> The established way we do this is not to have -latest pointer, I
> would think, and instead, make -latest be the actual thing.  That is
> how $GIT_DIR/index is updated, for example, by first writing into a
> temporary file and then moving it to the final destination.  Is
> there a reason why the same pattern cannot be used?


My thought was that using a fixed name (e.g. 
.git/objects/info/commit-graph) would block making the graph 
incremental. Upon thinking again, this is not the case. That feature 
could be designed with a fixed name for the small, frequently-updated 
file and use .../info/graph-.graph names for the "base" graph files.


I'll spend some time thinking about the ramifications of this fixed-name 
concept. At the moment, it would remove two commits from this patch 
series, which is nice.


Thanks,
-Stolee


Re: [PATCH v4 08/13] commit-graph: implement --delete-expired

2018-02-23 Thread Junio C Hamano
Derrick Stolee  writes:

> The (unlikely, but possible) race condition involves two processes (P1
> and P2):
>
> 1. P1 reads from graph-latest to see commit graph file F1.
> 2. P2 updates graph-latest to point to F2 and deletes F1.
> 3. P1 tries to read F1 and fails.
>
> I could explicitly mention this condition in the message, or we can
> just let P2 fail by deleting all files other than the one referenced
> by 'graph-latest'. Thoughts?

The established way we do this is not to have -latest pointer, I
would think, and instead, make -latest be the actual thing.  That is
how $GIT_DIR/index is updated, for example, by first writing into a
temporary file and then moving it to the final destination.  Is
there a reason why the same pattern cannot be used?
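
The write-to-temporary-then-move pattern mentioned above can be sketched
in a few lines of C. This is only an illustration under POSIX assumptions,
not Git's own code (Git goes through its lockfile machinery, which does
considerably more). The function name replace_file_atomically() and the
".tmp" suffix are made up for this example, and a fixed suffix like this
does not protect two concurrent writers from each other.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: replace `path` with `len` bytes from `data` by
 * writing a temporary sibling file first and renaming it into place.
 * Returns 0 on success and -1 on failure.
 */
int replace_file_atomically(const char *path, const void *data, size_t len)
{
	char tmp[4096];
	FILE *fp;
	int n;

	assert(path && data);

	/* Illustrative temporary name: "<path>.tmp". */
	n = snprintf(tmp, sizeof(tmp), "%s.tmp", path);
	if (n < 0 || (size_t)n >= sizeof(tmp))
		return -1;

	fp = fopen(tmp, "wb");
	if (!fp)
		return -1;

	/* Write everything and push it to disk before publishing. */
	if (fwrite(data, 1, len, fp) != len ||
	    fflush(fp) == EOF ||
	    fsync(fileno(fp)) < 0) {
		fclose(fp);
		unlink(tmp);
		return -1;
	}
	if (fclose(fp) == EOF) {
		unlink(tmp);
		return -1;
	}

	/*
	 * On POSIX, rename() swaps the name atomically: a reader opening
	 * `path` gets either the old file or the new one.
	 */
	if (rename(tmp, path) < 0) {
		unlink(tmp);
		return -1;
	}
	return 0;
}
```

With this shape there is no separate pointer file to go stale: a reader
opens the fixed name and keeps whatever version it got.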


Re: [PATCH v4 08/13] commit-graph: implement --delete-expired

2018-02-23 Thread Derrick Stolee

On 2/22/2018 1:48 PM, Junio C Hamano wrote:

> Derrick Stolee  writes:
>
>> Teach git-commit-graph to delete the .graph files that are siblings of a
>> newly-written graph file, except for the file referenced by 'graph-latest'
>> at the beginning of the process and the newly-written file. If we fail to
>> delete a graph file, only report a warning because another git process may
>> be using that file. In a multi-process environment, we expect the previous
>> graph file to be used by a concurrent process, so we do not delete it to
>> avoid race conditions.
>
> I do not understand the latter part of the above.  On some operating
> systems, you actually can remove a file that is open by another
> process without any ill effect.  There are systems that do not allow
> removing a file that is in use, and an attempt to unlink it may
> fail.  The need to handle such a failure gracefully is not limited
> to the case of removing a commit graph file---we need to deal with
> it when removing a file of _any_ type.


My thought is that we should _warn_ when we fail to delete a .graph file 
that we think should be safe to delete. However, if we are warning for a 
file that is currently being accessed (as is the case on Windows, at 
least), then we will add a lot of noise. This is especially true when 
using IDEs that run 'status' or 'fetch' in the background, frequently.
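
For reference, the behaviour being discussed (remove sibling .graph files,
keep the previous and the new one, and only warn when a removal fails)
could look roughly like the sketch below. This is not the
do_delete_expired() from the patch, whose body is not shown in this
thread. The name delete_expired_sketch(), its return convention and the
warning text are assumptions made for illustration, and it uses plain
POSIX directory calls rather than Git's helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Return 1 if `s` ends with `suffix`, 0 otherwise. */
static int has_suffix(const char *s, const char *suffix)
{
	size_t n = strlen(s), m = strlen(suffix);
	return n >= m && !strcmp(s + n - m, suffix);
}

/*
 * Hypothetical sketch: unlink every "*.graph" entry in `dir` except
 * `keep_old` (may be NULL) and `keep_new`. A failed removal only
 * produces a warning. Returns the number of files that could not be
 * removed, or -1 if the directory cannot be read.
 */
int delete_expired_sketch(const char *dir, const char *keep_old,
			  const char *keep_new)
{
	DIR *d;
	struct dirent *e;
	int failed = 0;

	assert(dir && keep_new);

	d = opendir(dir);
	if (!d)
		return -1;

	while ((e = readdir(d)) != NULL) {
		char path[4096];
		int n;

		if (!has_suffix(e->d_name, ".graph"))
			continue;
		if (!strcmp(e->d_name, keep_new) ||
		    (keep_old && !strcmp(e->d_name, keep_old)))
			continue;

		n = snprintf(path, sizeof(path), "%s/%s", dir, e->d_name);
		if (n < 0 || (size_t)n >= sizeof(path)) {
			fprintf(stderr, "warning: path too long for '%s'\n",
				e->d_name);
			failed++;
		} else if (unlink(path) < 0) {
			/* Possibly still in use elsewhere: warn, keep going. */
			fprintf(stderr, "warning: unable to remove '%s': %s\n",
				path, strerror(errno));
			failed++;
		}
	}
	closedir(d);
	return failed;
}
```

Whether the warning is printed at all, or only for failures that are not
"file in use", is exactly the noise trade-off raised above.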



> Especially the last sentence "we do not delete it to avoid race
> conditions" I find problematic.  If a system does not allow removing
> a file in use and we detect a failure after an attempt to do so, it
> is not "we do not delete it" --- even if you do, you won't succeed
> anyway, so there is no point saying that.  And on systems that do
> allow safe removal of a file in use (i.e. they allow an open file to
> be used by processes that have open filehandles to it after its
> removal), there is no point refraining from deleting it "to avoid race
> conditions", either---in fact it is unlikely that you would even know
> somebody else had it open and was using it.


The (unlikely, but possible) race condition involves two processes (P1 
and P2):


1. P1 reads from graph-latest to see commit graph file F1.
2. P2 updates graph-latest to point to F2 and deletes F1.
3. P1 tries to read F1 and fails.

I could explicitly mention this condition in the message, or we can just 
let P2 fail by deleting all files other than the one referenced by 
'graph-latest'. Thoughts?
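
The three steps can be replayed deterministically in a single process to
see the failure mode. The sketch below is only a simulation under POSIX
assumptions: the file names are supplied by the caller, "P1" and "P2" are
just labelled sections of one function, and none of the helper names
(write_text, read_pointer, replay_race) come from the patch series.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Write a NUL-terminated string to `path`; returns 0 on success. */
int write_text(const char *path, const char *text)
{
	FILE *fp;
	int bad;

	assert(path && text);
	fp = fopen(path, "w");
	if (!fp)
		return -1;
	bad = (fputs(text, fp) == EOF);
	if (fclose(fp) == EOF)
		bad = 1;
	return bad ? -1 : 0;
}

/* Read the file name stored in the pointer file; returns 0 on success. */
int read_pointer(const char *pointer, char *name, size_t len)
{
	FILE *fp = fopen(pointer, "r");
	char *ok;

	if (!fp)
		return -1;
	ok = fgets(name, (int)len, fp);
	fclose(fp);
	return ok ? 0 : -1;
}

/*
 * Replay the race. Returns the errno of P1's failed open (ENOENT is
 * expected), 0 if the open unexpectedly succeeded, -1 if setup failed.
 */
int replay_race(const char *pointer, const char *f1, const char *f2)
{
	char seen[256];
	FILE *fp;
	int err = 0;

	/* Initial state: the pointer file names F1. */
	if (write_text(f1, "graph 1") || write_text(pointer, f1))
		return -1;

	/* 1. P1 reads the pointer and learns about F1. */
	if (read_pointer(pointer, seen, sizeof(seen)))
		return -1;

	/* 2. P2 writes F2, repoints the pointer, and deletes F1. */
	if (write_text(f2, "graph 2") || write_text(pointer, f2) ||
	    unlink(f1) < 0)
		return -1;

	/* 3. P1 tries to open the name it saw earlier. */
	fp = fopen(seen, "r");
	if (fp)
		fclose(fp);
	else
		err = errno;

	unlink(f2);
	unlink(pointer);
	return err;
}
```

Keeping the previous graph file around for one more generation closes this
particular window, at the cost of leaving one extra file behind.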



> In any case, I do not think '--delete-expired' option that can be
> given only when you are writing makes much sense as an API.  An
> 'expire' command, just like 'set-latest' command, that is a separate
> command from 'write',  may make sense, though.


In another message, I proposed dropping the argument and assuming 
expires happen on every write.


Thanks,
-Stolee


Re: [PATCH v4 08/13] commit-graph: implement --delete-expired

2018-02-23 Thread Derrick Stolee

On 2/21/2018 4:34 PM, Stefan Beller wrote:

> On Mon, Feb 19, 2018 at 10:53 AM, Derrick Stolee  wrote:
>
>>          graph_name = write_commit_graph(opts.obj_dir);
>>
>>          if (graph_name) {
>>                  if (opts.set_latest)
>>                          set_latest_file(opts.obj_dir, graph_name);
>>
>> +                if (opts.delete_expired)
>> +                        do_delete_expired(opts.obj_dir,
>> +                                          old_graph_name,
>> +                                          graph_name);
>> +
>
> So this only allows deleting expired things and setting the latest
> when writing a new graph. Would we ever envision a user to produce
> a new graph (e.g. via obtaining a graph that they got from a server) and
> then manually rerouting the latest to that new graph file without writing
> that graph file in the same process? The same for expired.
>
> I guess these operations are just available via editing the
> latest or deleting files manually, which slightly contradicts
> e.g. "git update-ref", which in olden times was just a fancy way
> of rewriting the refs file manually. (though it claims to be less
> prone to errors as it takes lock files)


I imagine these alternatives for placing a new, latest commit graph file 
would want Git to handle rewriting the "graph-latest" file. Given such a 
use case, we could consider extending the 'commit-graph' interface, but 
I don't want to plan for it now.





>>  extern char *get_graph_latest_filename(const char *obj_dir);
>> +extern char *get_graph_latest_contents(const char *obj_dir);
>
> Did
> https://public-inbox.org/git/20180208213806.ga6...@sigill.intra.peff.net/
> ever make it into tree? (It is sort of new, but I feel we'd want to
> strive for consistency in the code base, eventually.)


Thank you for the reminder. I've removed the externs from 'commit-graph.h'.

Should I also remove the externs from other methods I introduce even 
though their surrounding definitions include 'extern'?


Thanks,
-Stolee


Re: [PATCH v4 08/13] commit-graph: implement --delete-expired

2018-02-22 Thread Junio C Hamano
Derrick Stolee  writes:

> Teach git-commit-graph to delete the .graph files that are siblings of a
> newly-written graph file, except for the file referenced by 'graph-latest'
> at the beginning of the process and the newly-written file. If we fail to
> delete a graph file, only report a warning because another git process may
> be using that file. In a multi-process environment, we expect the previous
> graph file to be used by a concurrent process, so we do not delete it to
> avoid race conditions.

I do not understand the latter part of the above.  On some operating
systems, you actually can remove a file that is open by another
process without any ill effect.  There are systems that do not allow
removing a file that is in use, and an attempt to unlink it may
fail.  The need to handle such a failure gracefully is not limited
to the case of removing a commit graph file---we need to deal with
it when removing a file of _any_ type.

Especially the last sentence "we do not delete it to avoid race
conditions" I find problematic.  If a system does not allow removing
a file in use and we detect a failure after an attempt to do so, it
is not "we do not delete it" --- even if you do, you won't succeed
anyway, so there is no point saying that.  And on systems that do
allow safe removal of a file in use (i.e. they allow an open file to
be used by processes that have open filehandles to it after its
removal), there is no point refraining from deleting it "to avoid race
conditions", either---in fact it is unlikely that you would even know
somebody else had it open and was using it.
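
The "remove a file that is open" behaviour is easy to observe on a POSIX
system. The sketch below assumes POSIX unlink() semantics and does both
the opening and the removing in one process for brevity. The function name
read_after_unlink() is made up for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Create `path` containing `msg`, unlink it while it is still open, and
 * read the contents back through the open descriptor. Returns the
 * number of bytes read into `buf`, or -1 on error.
 */
ssize_t read_after_unlink(const char *path, const char *msg,
			  char *buf, size_t buflen)
{
	size_t len;
	ssize_t n = -1;
	int fd;

	assert(path && msg && buf);
	len = strlen(msg);

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	if (write(fd, msg, len) != (ssize_t)len) {
		unlink(path);
		close(fd);
		return -1;
	}

	/* Drop the name; the data stays reachable through fd. */
	if (unlink(path) == 0 && lseek(fd, 0, SEEK_SET) == 0)
		n = read(fd, buf, buflen);

	close(fd);
	return n;
}
```

On such systems the process that removes the file gets no signal that
anybody still had it open, which is the point made above. On systems that
refuse to remove an open file, the unlink() step is the one that would
fail instead.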

In any case, I do not think '--delete-expired' option that can be
given only when you are writing makes much sense as an API.  An
'expire' command, just like 'set-latest' command, that is a separate
command from 'write',  may make sense, though.


Re: [PATCH v4 08/13] commit-graph: implement --delete-expired

2018-02-21 Thread Stefan Beller
On Mon, Feb 19, 2018 at 10:53 AM, Derrick Stolee  wrote:

>          graph_name = write_commit_graph(opts.obj_dir);
>
>          if (graph_name) {
>                  if (opts.set_latest)
>                          set_latest_file(opts.obj_dir, graph_name);
>
> +                if (opts.delete_expired)
> +                        do_delete_expired(opts.obj_dir,
> +                                          old_graph_name,
> +                                          graph_name);
> +

So this only allows deleting expired things and setting the latest
when writing a new graph. Would we ever envision a user to produce
a new graph (e.g. via obtaining a graph that they got from a server) and
then manually rerouting the latest to that new graph file without writing
that graph file in the same process? The same for expired.

I guess these operations are just available via editing the
latest or deleting files manually, which slightly contradicts
e.g. "git update-ref", which in olden times was just a fancy way
of rewriting the refs file manually. (though it claims to be less
prone to errors as it takes lock files)

>
>  extern char *get_graph_latest_filename(const char *obj_dir);
> +extern char *get_graph_latest_contents(const char *obj_dir);

Did
https://public-inbox.org/git/20180208213806.ga6...@sigill.intra.peff.net/
ever make it into tree? (It is sort of new, but I feel we'd want to
strive for consistency in the code base, eventually.)