Re: [PATCH v4 08/13] commit-graph: implement --delete-expired

2018-02-23 Thread Junio C Hamano
Derrick Stolee  writes:

> My thought was that using a fixed name
> (e.g. .git/objects/info/commit-graph) would block making the graph
> incremental. Upon thinking again, this is not the case. That feature
> could be designed with a fixed name for the small, frequently-updated
> file and use .../info/graph-.graph names for the "base" graph
> files.

I guess that is in line with the way how split-index folks did their
thing ;-)


Re: [PATCH v4 08/13] commit-graph: implement --delete-expired

2018-02-23 Thread Derrick Stolee

On 2/23/2018 2:33 PM, Junio C Hamano wrote:

Derrick Stolee  writes:


The (unlikely, but possible) race condition involves two processes (P1
and P2):

1. P1 reads from graph-latest to see commit graph file F1.
2. P2 updates graph-latest to point to F2 and deletes F1.
3. P1 tries to read F1 and fails.

I could explicitly mention this condition in the message, or we can
just let P2 fail by deleting all files other than the one referenced
by 'graph-latest'. Thoughts?

The established way we do this is not to have -latest pointer, I
would think, and instead, make -latest be the actual thing.  That is
how $GIT_DIR/index is updated, for example, by first writing into a
temporary file and then moving it to the final destination.  Is
there a reason why the same pattern cannot be used?


My thought was that using a fixed name (e.g. 
.git/objects/info/commit-graph) would block making the graph 
incremental. Upon thinking again, this is not the case. That feature 
could be designed with a fixed name for the small, frequently-updated 
file and use .../info/graph-.graph names for the "base" graph files.


I'll spend some time thinking about the ramifications of this fixed-name 
concept. At the moment, it would remove two commits from this patch 
series, which is nice.


Thanks,
-Stolee


Re: [PATCH v4 08/13] commit-graph: implement --delete-expired

2018-02-23 Thread Junio C Hamano
Derrick Stolee  writes:

> The (unlikely, but possible) race condition involves two processes (P1
> and P2):
>
> 1. P1 reads from graph-latest to see commit graph file F1.
> 2. P2 updates graph-latest to point to F2 and deletes F1.
> 3. P1 tries to read F1 and fails.
>
> I could explicitly mention this condition in the message, or we can
> just let P2 fail by deleting all files other than the one referenced
> by 'graph-latest'. Thoughts?

The established way we do this is not to have -latest pointer, I
would think, and instead, make -latest be the actual thing.  That is
how $GIT_DIR/index is updated, for example, by first writing into a
temporary file and then moving it to the final destination.  Is
there a reason why the same pattern cannot be used?


Re: [PATCH v4 08/13] commit-graph: implement --delete-expired

2018-02-23 Thread Derrick Stolee

On 2/22/2018 1:48 PM, Junio C Hamano wrote:

Derrick Stolee  writes:


Teach git-commit-graph to delete the .graph files that are siblings of a
newly-written graph file, except for the file referenced by 'graph-latest'
at the beginning of the process and the newly-written file. If we fail to
delete a graph file, only report a warning because another git process may
be using that file. In a multi-process environment, we expect the previoius
graph file to be used by a concurrent process, so we do not delete it to
avoid race conditions.

I do not understand the later part of the above.  On some operating
systems, you actually can remove a file that is open by another
process without any ill effect.  There are systems that do not allow
removing a file that is in use, and an attempt to unlink it may
fail.  The need to handle such a failure gracefully is not limited
to the case of removing a commit graph file---we need to deal with
it when removing file of _any_ type.


My thought is that we should _warn_ when we fail to delete a .graph file 
that we think should be safe to delete. However, if we are warning for a 
file that is currently being accessed (as is the case on Windows, at 
least), then we will add a lot of noise. This is especially true when 
using IDEs that run 'status' or 'fetch' in the background, frequently.



Especially the last sentence "we do not delete it to avoid race
conditions" I find problematic.  If a system does not allow removing
a file in use and we detect a failure after an attempt to do so, it
is not "we do not delete it" --- even if you do, you won't succeed
anyway, so there is no point saying that.  And on systems that do
allow safe removal of a file in use (i.e. they allow an open file to
be used by processes that have open filehandles to it after its
removal), there is no point refraining to delete it "to avoid race
conditions", either---in fact it is unlikely that you would even know
somebody else had it open and was using it.


The (unlikely, but possible) race condition involves two processes (P1 
and P2):


1. P1 reads from graph-latest to see commit graph file F1.
2. P2 updates graph-latest to point to F2 and deletes F1.
3. P1 tries to read F1 and fails.

I could explicitly mention this condition in the message, or we can just 
let P2 fail by deleting all files other than the one referenced by 
'graph-latest'. Thoughts?



In any case, I do not think '--delete-expired' option that can be
given only when you are writing makes much sense as an API.  An
'expire' command, just like 'set-latest' command, that is a separate
command from 'write',  may make sense, though.


In another message, I proposed dropping the argument and assuming 
expires happen on every write.


Thanks,
-Stolee


Re: [PATCH v4 08/13] commit-graph: implement --delete-expired

2018-02-23 Thread Derrick Stolee

On 2/21/2018 4:34 PM, Stefan Beller wrote:

On Mon, Feb 19, 2018 at 10:53 AM, Derrick Stolee  wrote:


 graph_name = write_commit_graph(opts.obj_dir);

 if (graph_name) {
 if (opts.set_latest)
 set_latest_file(opts.obj_dir, graph_name);

+   if (opts.delete_expired)
+   do_delete_expired(opts.obj_dir,
+ old_graph_name,
+ graph_name);
+

So this only allows to delete expired things and setting the latest
when writing a new graph. Would we ever envision a user to produce
a new graph (e.g. via obtaining a graph that they got from a server) and
then manually rerouting the latest to that new graph file without writing
that graph file in the same process? The same for expired.

I guess these operations are just available via editing the
latest or deleting files manually, which slightly contradicts
e.g. "git update-ref", which in olden times was just a fancy way
of rewriting the refs file manually. (though it claims to be less
prone to errors as it takes lock files)


I imagine these alternatives for placing a new, latest commit graph file 
would want Git to handle rewriting the "graph-latest" file. Given such a 
use case, we could consider extending the 'commit-graph' interface, but 
I don't want to plan for it now.





  extern char *get_graph_latest_filename(const char *obj_dir);
+extern char *get_graph_latest_contents(const char *obj_dir);

Did
https://public-inbox.org/git/20180208213806.ga6...@sigill.intra.peff.net/
ever make it into tree? (It is sort of new, but I feel we'd want to
strive for consistency in the code base, eventually.)


Thank you for the reminder. I've removed the externs from 'commit-graph.h'.

Should I also remove the externs from other methods I introduce even 
though their surrounding definitions include 'extern'?


Thanks,
-Stolee


Re: [PATCH v4 08/13] commit-graph: implement --delete-expired

2018-02-22 Thread Junio C Hamano
Derrick Stolee  writes:

> Teach git-commit-graph to delete the .graph files that are siblings of a
> newly-written graph file, except for the file referenced by 'graph-latest'
> at the beginning of the process and the newly-written file. If we fail to
> delete a graph file, only report a warning because another git process may
> be using that file. In a multi-process environment, we expect the previoius
> graph file to be used by a concurrent process, so we do not delete it to
> avoid race conditions.

I do not understand the later part of the above.  On some operating
systems, you actually can remove a file that is open by another
process without any ill effect.  There are systems that do not allow
removing a file that is in use, and an attempt to unlink it may
fail.  The need to handle such a failure gracefully is not limited
to the case of removing a commit graph file---we need to deal with
it when removing file of _any_ type.

Especially the last sentence "we do not delete it to avoid race
conditions" I find problematic.  If a system does not allow removing
a file in use and we detect a failure after an attempt to do so, it
is not "we do not delete it" --- even if you do, you won't succeed
anyway, so there is no point saying that.  And on systems that do
allow safe removal of a file in use (i.e. they allow an open file to
be used by processes that have open filehandles to it after its
removal), there is no point refraining to delete it "to avoid race
conditions", either---in fact it is unlikely that you would even know
somebody else had it open and was using it.

In any case, I do not think '--delete-expired' option that can be
given only when you are writing makes much sense as an API.  An
'expire' command, just like 'set-latest' command, that is a separate
command from 'write',  may make sense, though.


Re: [PATCH v4 08/13] commit-graph: implement --delete-expired

2018-02-21 Thread Stefan Beller
On Mon, Feb 19, 2018 at 10:53 AM, Derrick Stolee  wrote:

> graph_name = write_commit_graph(opts.obj_dir);
>
> if (graph_name) {
> if (opts.set_latest)
> set_latest_file(opts.obj_dir, graph_name);
>
> +   if (opts.delete_expired)
> +   do_delete_expired(opts.obj_dir,
> + old_graph_name,
> + graph_name);
> +

So this only allows to delete expired things and setting the latest
when writing a new graph. Would we ever envision a user to produce
a new graph (e.g. via obtaining a graph that they got from a server) and
then manually rerouting the latest to that new graph file without writing
that graph file in the same process? The same for expired.

I guess these operations are just available via editing the
latest or deleting files manually, which slightly contradicts
e.g. "git update-ref", which in olden times was just a fancy way
of rewriting the refs file manually. (though it claims to be less
prone to errors as it takes lock files)

>
>  extern char *get_graph_latest_filename(const char *obj_dir);
> +extern char *get_graph_latest_contents(const char *obj_dir);

Did
https://public-inbox.org/git/20180208213806.ga6...@sigill.intra.peff.net/
ever make it into tree? (It is sort of new, but I feel we'd want to
strive for consistency in the code base, eventually.)


[PATCH v4 08/13] commit-graph: implement --delete-expired

2018-02-19 Thread Derrick Stolee
Teach git-commit-graph to delete the .graph files that are siblings of a
newly-written graph file, except for the file referenced by 'graph-latest'
at the beginning of the process and the newly-written file. If we fail to
delete a graph file, only report a warning because another git process may
be using that file. In a multi-process environment, we expect the previoius
graph file to be used by a concurrent process, so we do not delete it to
avoid race conditions.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-commit-graph.txt | 11 +--
 builtin/commit-graph.c | 61 --
 commit-graph.c | 23 ++
 commit-graph.h |  1 +
 t/t5318-commit-graph.sh|  7 +++--
 5 files changed, 96 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-commit-graph.txt 
b/Documentation/git-commit-graph.txt
index dc948c5..b9b4031 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -37,6 +37,11 @@ resulting filename.
 +
 With `--set-latest` option, update the graph-latest file to point
 to the written graph file.
++
+With the `--delete-expired` option, delete the graph files in the pack
+directory that are not referred to by the graph-latest file. To avoid race
+conditions, do not delete the file previously referred to by the
+graph-latest file if it is updated by the `--set-latest` option.
 
 'read'::
 
@@ -56,11 +61,11 @@ EXAMPLES
 $ git commit-graph write
 
 
-* Write a graph file for the packed commits in your local .git folder
-* and update graph-latest.
+* Write a graph file for the packed commits in your local .git folder,
+* update graph-latest, and delete stale graph files.
 +
 
-$ git commit-graph write --set-latest
+$ git commit-graph write --set-latest --delete-expired
 
 
 * Read basic information from a graph file.
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index bf86172..fd99169 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -8,7 +8,7 @@
 static char const * const builtin_commit_graph_usage[] = {
N_("git commit-graph [--object-dir ]"),
N_("git commit-graph read [--object-dir ] [--file=]"),
-   N_("git commit-graph write [--object-dir ] [--set-latest]"),
+   N_("git commit-graph write [--object-dir ] [--set-latest] 
[--delete-expired]"),
NULL
 };
 
@@ -18,7 +18,7 @@ static const char * const builtin_commit_graph_read_usage[] = 
{
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-   N_("git commit-graph write [--object-dir ] [--set-latest]"),
+   N_("git commit-graph write [--object-dir ] [--set-latest] 
[--delete-expired]"),
NULL
 };
 
@@ -26,6 +26,7 @@ static struct opts_commit_graph {
const char *obj_dir;
const char *graph_file;
int set_latest;
+   int delete_expired;
 } opts;
 
 static int graph_read(int argc, const char **argv)
@@ -98,9 +99,56 @@ static void set_latest_file(const char *obj_dir, const char 
*graph_file)
commit_lock_file();
 }
 
+/*
+ * To avoid race conditions and deleting graph files that are being
+ * used by other processes, look inside a pack directory for all files
+ * of the form "graph-.graph" that do not match the old or new
+ * graph hashes and delete them.
+ */
+static void do_delete_expired(const char *obj_dir,
+ const char *old_graph_name,
+ const char *new_graph_name)
+{
+   DIR *dir;
+   struct dirent *de;
+   int dirnamelen;
+   struct strbuf path = STRBUF_INIT;
+
+   strbuf_addf(, "%s/info", obj_dir);
+   dir = opendir(path.buf);
+   if (!dir) {
+   if (errno != ENOENT)
+   error_errno("unable to open object pack directory: %s",
+   obj_dir);
+   return;
+   }
+
+   strbuf_addch(, '/');
+   dirnamelen = path.len;
+   while ((de = readdir(dir)) != NULL) {
+   size_t base_len;
+
+   if (is_dot_or_dotdot(de->d_name))
+   continue;
+
+   strbuf_setlen(, dirnamelen);
+   strbuf_addstr(, de->d_name);
+
+   base_len = path.len;
+   if (strip_suffix_mem(path.buf, _len, ".graph") &&
+   strcmp(new_graph_name, de->d_name) &&
+   (!old_graph_name || strcmp(old_graph_name, de->d_name)) &&
+   remove_path(path.buf))
+   die("failed to remove path %s", path.buf);
+   }
+
+   strbuf_release();
+}
+
 static int graph_write(int argc, const char **argv)
 {
char *graph_name;
+   char *old_graph_name;
 
static struct option builtin_commit_graph_write_options[] = {