Re: [PATCHv4 7/7] builtin/describe.c: describe a blob

2017-11-16 Thread Junio C Hamano
Junio C Hamano  writes:

> Stefan Beller  writes:
>
>> grep "fatal: test-blob-1 is neither a commit nor blob" actual
>
> OK, that might be somewhat unsatisfying from end-user's point of
> view (logically "test-blob-1" is already a name based on the 'graph
> relations' that is satisfactory).

... but that is correct and I think we need any further updates to
address it.  If a user starts with a tag object, the user already
has one usable human-readable name.  By "unsatisfying", I didn't
mean that it is something we need to spend more cycles.



Re: [PATCHv4 7/7] builtin/describe.c: describe a blob

2017-11-15 Thread Junio C Hamano
Stefan Beller  writes:

> grep "fatal: test-blob-1 is neither a commit nor blob" actual

OK, that might be somewhat unsatisfying from end-user's point of
view (logically "test-blob-1" is already a name based on the 'graph
relations' that is satisfactory).

[side note] should that grep become test_i18ngrep?

If this machinery is to find an object that is *bad* (in the sense
in one of your earlier messages), I suspect it may be quite easy to
use the new mechanism added by the series to also locate a tree
object and report "that tree appears in this commit at this path"
and it would be equally useful as the feature to find a blob.

Thanks.



Re: [PATCHv4 7/7] builtin/describe.c: describe a blob

2017-11-15 Thread Stefan Beller
>
> Give an object a human readable name based on an available ref
>
> or something like that?

will use

> Or a sentence in BUGS section.

will add.

> A case (or two) I find more interesting is to see how the code
> behaves against these:
>
> git tag -a -m "annotated blob" a-blob HEAD:Makefile
> git tag -a -m "annotated tree" a-tree HEAD:t
> git describe a-blob a-tree

Glad I added a test for exactly this (well only the a-blob case,
but a-tree will be the same):

test_expect_success 'describe tag object' '
 git tag test-blob-1 -a -m msg unique-file:file &&
test_must_fail git describe test-blob-1 2>actual &&
grep "fatal: test-blob-1 is neither a commit nor blob" actual
 '


Re: [PATCHv4 7/7] builtin/describe.c: describe a blob

2017-11-15 Thread Stefan Beller
On Tue, Nov 14, 2017 at 5:52 PM, Jonathan Tan  wrote:
> On Tue, 14 Nov 2017 16:30:43 -0800
> Stefan Beller  wrote:
>
>> The walking is performed in reverse order to show the introduction of a
>> blob rather than its last occurrence.
>
> The code as implemented here does not do this - it instead shows the last
> occurrence.

fixed to show the first occurrence.

>
>>  NAME
>>  
>> -git-describe - Describe a commit using the most recent tag reachable from it
>> +git-describe - Describe a commit or blob using the graph relations
>
> I would write "Describe a commit or blob using a tag reachable from it".

using a ref, as we also can use refs.
I think 'the graph' is technically correct here, but may be too confusing.

>
>> +If the given object refers to a blob, it will be described
>> +as `:`, such that the blob can be found
>> +at `` in the ``. Note, that the commit is likely
>> +not the commit that introduced the blob, but the one that was found
>> +first; to find the commit that introduced the blob, you need to find
>> +the commit that last touched the path, e.g.
>> +`git log  -- `.
>> +As blobs do not point at the commits they are contained in,
>> +describing blobs is slow as we have to walk the whole graph.
>
> I think some of this needs to be updated?

fixed.

>
>> +static void process_object(struct object *obj, const char *path, void *data)
>> +{
>> + struct process_commit_data *pcd = data;
>> +
>> + if (!oidcmp(>looking_for, >oid) && !pcd->dst->len) {
>> + reset_revision_walk();
>> + describe_commit(>current_commit, pcd->dst);
>> + strbuf_addf(pcd->dst, ":%s", path);
>> + pcd->revs->max_count = 0;
>> + }
>> +}
>
> Setting max_count to 0 does not work when reverse is used, because the
> commits are first buffered into revs->commits (see get_revision() in
> revision.c). There doesn't seem to be a convenient way to terminate the
> traversal immediately - I think setting revs->commits to NULL should
> work (but I didn't check). Remember to free revs->commits (using
> free_commit_list) first.

This does work indeed.

>
>> +test_expect_success 'describe a blob at a tag' '
>> + echo "make it a unique blob" >file &&
>> + git add file && git commit -m "content in file" &&
>> + git tag -a -m "latest annotated tag" unique-file &&
>> + git describe HEAD:file >actual &&
>> + echo "unique-file:file" >expect &&
>> + test_cmp expect actual
>> +'
>
> This is probably better named "describe a blob at a directly tagged
> commit".

ok

>  (Should we also test the case where a blob is directly
> tagged?)

We do a bad job at describing tags that point at a blob currently:

  git tag test-blob HEAD:Makefile
  git describe test-blob
error: object cd75985991f4535c45e2589222a9e6a38fb1d613 is a blob, not a commit
fatal: test-blob is not a valid 'commit' object

This series changes this to

  git describe test-blob
  v2.15.0-rc0-43-g54bd705a95:Makefile

which might not be expected (you'd expect "test-blob"),
so I think I can write a test telling that this is suboptimal
behavior?

>
>> +test_expect_success 'describe a blob with its last introduction' '
>> + git commit --allow-empty -m "empty commit" &&
>> + git rm file &&
>> + git commit -m "delete blob" &&
>> + git revert HEAD &&
>> + git commit --allow-empty -m "empty commit" &&
>> + git describe HEAD:file >actual &&
>> + grep unique-file-3-g actual
>> +'
>
> The description is not true: firstly, this shows the last occurrence,
> not the last introduction (you can verify this by adding another commit
> and noticing that the contents of "actual" changes), and what we want is
> not the last introduction anyway, but the first one.

fixed.


Re: [PATCHv4 7/7] builtin/describe.c: describe a blob

2017-11-14 Thread Jonathan Tan
On Tue, 14 Nov 2017 16:30:43 -0800
Stefan Beller  wrote:

> The walking is performed in reverse order to show the introduction of a
> blob rather than its last occurrence.

The code as implemented here does not do this - it instead shows the last
occurrence.

>  NAME
>  
> -git-describe - Describe a commit using the most recent tag reachable from it
> +git-describe - Describe a commit or blob using the graph relations

I would write "Describe a commit or blob using a tag reachable from it".

> +If the given object refers to a blob, it will be described
> +as `:`, such that the blob can be found
> +at `` in the ``. Note, that the commit is likely
> +not the commit that introduced the blob, but the one that was found
> +first; to find the commit that introduced the blob, you need to find
> +the commit that last touched the path, e.g.
> +`git log  -- `.
> +As blobs do not point at the commits they are contained in,
> +describing blobs is slow as we have to walk the whole graph.

I think some of this needs to be updated?

> +static void process_object(struct object *obj, const char *path, void *data)
> +{
> + struct process_commit_data *pcd = data;
> +
> + if (!oidcmp(>looking_for, >oid) && !pcd->dst->len) {
> + reset_revision_walk();
> + describe_commit(>current_commit, pcd->dst);
> + strbuf_addf(pcd->dst, ":%s", path);
> + pcd->revs->max_count = 0;
> + }
> +}

Setting max_count to 0 does not work when reverse is used, because the
commits are first buffered into revs->commits (see get_revision() in
revision.c). There doesn't seem to be a convenient way to terminate the
traversal immediately - I think setting revs->commits to NULL should
work (but I didn't check). Remember to free revs->commits (using
free_commit_list) first.

> +test_expect_success 'describe a blob at a tag' '
> + echo "make it a unique blob" >file &&
> + git add file && git commit -m "content in file" &&
> + git tag -a -m "latest annotated tag" unique-file &&
> + git describe HEAD:file >actual &&
> + echo "unique-file:file" >expect &&
> + test_cmp expect actual
> +'

This is probably better named "describe a blob at a directly tagged
commit". (Should we also test the case where a blob is directly
tagged?)

> +test_expect_success 'describe a blob with its last introduction' '
> + git commit --allow-empty -m "empty commit" &&
> + git rm file &&
> + git commit -m "delete blob" &&
> + git revert HEAD &&
> + git commit --allow-empty -m "empty commit" &&
> + git describe HEAD:file >actual &&
> + grep unique-file-3-g actual
> +'

The description is not true: firstly, this shows the last occurrence,
not the last introduction (you can verify this by adding another commit
and noticing that the contents of "actual" changes), and what we want is
not the last introduction anyway, but the first one.


[PATCHv4 7/7] builtin/describe.c: describe a blob

2017-11-14 Thread Stefan Beller
Sometimes users are given a hash of an object and they want to
identify it further (ex.: Use verify-pack to find the largest blobs,
but what are these? or [1])

When describing commits, we try to anchor them to tags or refs, as these
are conceptually on a higher level than the commit. And if there is no ref
or tag that matches exactly, we're out of luck.  So we employ a heuristic
to make up a name for the commit. These names are ambiguous, there might
be different tags or refs to anchor to, and there might be different
path in the DAG to travel to arrive at the commit precisely.

When describing a blob, we want to describe the blob from a higher layer
as well, which is a tuple of (commit, deep/path) as the tree objects
involved are rather uninteresting.  The same blob can be referenced by
multiple commits, so how we decide which commit to use?  This patch
implements a rather naive approach on this: As there are no back pointers
from blobs to commits in which the blob occurs, we'll start walking from
any tips available, listing the blobs in-order of the commit and once we
found the blob, we'll take the first commit that listed the blob. For
example

  git describe --tags v0.99:Makefile
  conversion-901-g7672db20c2:Makefile

tells us the Makefile as it was in v0.99 was introduced in commit 7672db20.

The walking is performed in reverse order to show the introduction of a
blob rather than its last occurrence.

[1] https://stackoverflow.com/questions/223678/which-commit-has-this-blob

Signed-off-by: Stefan Beller 
---
 Documentation/git-describe.txt | 13 -
 builtin/describe.c | 66 ++
 t/t6120-describe.sh| 19 
 3 files changed, 92 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-describe.txt b/Documentation/git-describe.txt
index c924c945ba..a25443ca91 100644
--- a/Documentation/git-describe.txt
+++ b/Documentation/git-describe.txt
@@ -3,7 +3,7 @@ git-describe(1)
 
 NAME
 
-git-describe - Describe a commit using the most recent tag reachable from it
+git-describe - Describe a commit or blob using the graph relations
 
 
 SYNOPSIS
@@ -11,6 +11,7 @@ SYNOPSIS
 [verse]
 'git describe' [--all] [--tags] [--contains] [--abbrev=] [...]
 'git describe' [--all] [--tags] [--contains] [--abbrev=] --dirty[=]
+'git describe' 
 
 DESCRIPTION
 ---
@@ -24,6 +25,16 @@ By default (without --all or --tags) `git describe` only 
shows
 annotated tags.  For more information about creating annotated tags
 see the -a and -s options to linkgit:git-tag[1].
 
+If the given object refers to a blob, it will be described
+as `:`, such that the blob can be found
+at `` in the ``. Note, that the commit is likely
+not the commit that introduced the blob, but the one that was found
+first; to find the commit that introduced the blob, you need to find
+the commit that last touched the path, e.g.
+`git log  -- `.
+As blobs do not point at the commits they are contained in,
+describing blobs is slow as we have to walk the whole graph.
+
 OPTIONS
 ---
 ...::
diff --git a/builtin/describe.c b/builtin/describe.c
index 9e9a5ed5d4..a2a5fdc48d 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -3,6 +3,7 @@
 #include "lockfile.h"
 #include "commit.h"
 #include "tag.h"
+#include "blob.h"
 #include "refs.h"
 #include "builtin.h"
 #include "exec_cmd.h"
@@ -11,8 +12,9 @@
 #include "hashmap.h"
 #include "argv-array.h"
 #include "run-command.h"
+#include "revision.h"
+#include "list-objects.h"
 
-#define SEEN   (1u << 0)
 #define MAX_TAGS   (FLAG_BITS - 1)
 
 static const char * const describe_usage[] = {
@@ -434,6 +436,57 @@ static void describe_commit(struct object_id *oid, struct 
strbuf *dst)
strbuf_addstr(dst, suffix);
 }
 
+struct process_commit_data {
+   struct object_id current_commit;
+   struct object_id looking_for;
+   struct strbuf *dst;
+   struct rev_info *revs;
+};
+
+static void process_commit(struct commit *commit, void *data)
+{
+   struct process_commit_data *pcd = data;
+   pcd->current_commit = commit->object.oid;
+}
+
+static void process_object(struct object *obj, const char *path, void *data)
+{
+   struct process_commit_data *pcd = data;
+
+   if (!oidcmp(>looking_for, >oid) && !pcd->dst->len) {
+   reset_revision_walk();
+   describe_commit(>current_commit, pcd->dst);
+   strbuf_addf(pcd->dst, ":%s", path);
+   pcd->revs->max_count = 0;
+   }
+}
+
+static void describe_blob(struct object_id oid, struct strbuf *dst)
+{
+   struct rev_info revs;
+   struct argv_array args = ARGV_ARRAY_INIT;
+   struct process_commit_data pcd = { null_oid, oid, dst, };
+
+   argv_array_pushl(, "internal: The first arg is not parsed",
+   "--all", "--reflog", /* as many starting points as possible */
+   /* NEEDSWORK: --all is incompatible with worktrees for now: