[PATCH] git-cat-file: fix output when format string contains no variables

2013-11-05 Thread Sven Brauch
>From 2e7b5aed771faeff654a447346bb0b57570d9569 Mon Sep 17 00:00:00 2001
From: Sven Brauch 
Date: Tue, 5 Nov 2013 20:06:21 +0100
Subject: [PATCH] git-cat-file: fix output when format string contains no
 variables

When the format string for git-cat-object --batch-check contained no
variables, the function would not actually look for the object on disk,
but just verify that the hash is correct. Thus it would report no error
if asking for objects which did not actually exist on disk if the SHA hash
looked ok.

Example of buggy behaviour:
echo "XYZ" | git hash-object --stdin | git cat-file --batch-check="found"
would report "found" although the manpage claims it would report an error.

Signed-off-by: Sven Brauch 
---

Notes:
This fixes a bug where git-cat-file --batch-check would erroneously tell
that objects exist while they did in fact not in case the argument to
--batch-check was just a constant strig (i.e. no %(...) variables).
The reason was that calling sha1_object_info_extended while requesting no
properties of the object would not even verify this object existed, or more
exactly, sha1_loose_object_info would not do that.

I'm entirely unfamiliar with the git codebase; the suggested fix ensures
that always at least one property is requested. If there's a better way
to fix this issue, please let me know.

 builtin/cat-file.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index b2ca775..32d0b63 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -238,6 +238,15 @@ static int batch_one_object(const char *obj_name, struct 
batch_options *opt,
return 0;
}
 
+   if (!data->info.disk_sizep && !data->info.sizep && !data->info.typep) {
+   /*
+* None of the info fields was requested, so we request the 
cheapest
+* one (disk_sizep) to force sha1_object_info_extended to 
actually
+* look for the file.
+*/
+   data->info.disk_sizep = &data->disk_size;
+   }
+
if (sha1_object_info_extended(data->sha1, &data->info) < 0) {
printf("%s missing\n", obj_name);
fflush(stdout);
-- 
1.8.4.2


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] git-cat-file: fix output when format string contains no variables

2013-11-06 Thread Junio C Hamano
Sven Brauch  writes:

> From 2e7b5aed771faeff654a447346bb0b57570d9569 Mon Sep 17 00:00:00 2001
> From: Sven Brauch 
> Date: Tue, 5 Nov 2013 20:06:21 +0100
> Subject: [PATCH] git-cat-file: fix output when format string contains no
>  variables

Thanks; first some procedural issues:

 - Omit the first line "From 2e7b..."; that does not belong to any
   patch submission (it is a separator used between messages in the
   mailbox formatted file "format-patch --stdout" produces).

 - The second line "From: Sven..." records exactly the same address
   as the e-mail you are sending out, so it should be omitted as
   well.

 - The third line "Date: ..." is not the time we the general public
   sees your fix for the first time, which is what "Date: " of your
   e-mail header already records, so we do not need it either.

 - And the last one "Subject: ..." is redundant; we can see it in
   your e-mail header.

In general, the latter three lines are produced by format-patch to
help you fill header fields in the MUA of your choice by cutting
(not copying) and pasting. Unless there is a valid reason to have
values different from what recipients would see in the e-mail header
(and there often isn't, unless you are forwarding somebody else's
patch, in which case you may want to use "From: ", or you are
responding to an ongoing discussion with a patch, in which case you
may want to use "Subject: "), please remove them after copying them
out to your e-mail header.

> When the format string for git-cat-object --batch-check contained no
> variables, the function would not actually look for the object on disk,
> but just verify that the hash is correct. Thus it would report no error
> if asking for objects which did not actually exist on disk if the SHA hash
> looked ok.
>
> Example of buggy behaviour:
> echo "XYZ" | git hash-object --stdin | git cat-file --batch-check="found"
> would report "found" although the manpage claims it would report an error.

An excellent log message.  It would have been even better to add a
new test to t1006 based on this reproduction recipe.

> Signed-off-by: Sven Brauch 
> ---
>
> Notes:
> This fixes a bug where git-cat-file --batch-check would erroneously tell
> that objects exist while they did in fact not in case the argument to
> --batch-check was just a constant strig (i.e. no %(...) variables).
> The reason was that calling sha1_object_info_extended while requesting no
> properties of the object would not even verify this object existed, or 
> more
> exactly, sha1_loose_object_info would not do that.
> 
> I'm entirely unfamiliar with the git codebase; the suggested fix ensures
> that always at least one property is requested. If there's a better way
> to fix this issue, please let me know.

I think the real problem is that sha1_loose_object_info() is called
by sha1_object_info_extended(), when it does not find a cached or a
packed object, and the callee assumes that it is asked to fill in
only the requested pieces of information while the caller does not
even bother to check if such an object actually exists.

How about doing it like the attached instead?

-- >8 --
Subject: sha1_loose_object_info(): do not return success on missing object

Since 052fe5ea (sha1_loose_object_info: make type lookup optional,
2013-07-12), sha1_loose_object_info() returns happily without
checking if the object in question exists, which is not what the the
caller sha1_object_info_extended() expects; the caller does not even
bother checking the existence of the object itself.

Noticed-by: Sven Brauch 
Signed-off-by: Junio C Hamano 
---

  Oh, by the way, there is this one iffy bit in batch_one_object():

  if (get_sha1(obj_name, data->sha1)) {
  printf("%s missing\n", obj_name);
  fflush(stdout);
  return 0;
  }

  At this point, the object _may_ be missing, but the obj_name may be
  malformed, so saying "missing" is not strictly correct.  If, for
  example, you misspelled the name of the master branch, you would get
  this:

  $ echo mastre | git cat-file --batch-check=foo
  mastre missing

  I however doubt that it is a good idea to reword this message by
  adding a logic to tell misspelled object name and missing object
  name apart.  The users of "cat-file --batch-check" are not
  expecting to be able to distinguish these two classes of errors
  anyway.

 sha1_file.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 7dadd04..00220a4 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2486,12 +2486,11 @@ static int sha1_loose_object_info(const unsigned char 
*sha1,
 * need to look 

Re: [PATCH] git-cat-file: fix output when format string contains no variables

2013-11-06 Thread Jeff King
On Wed, Nov 06, 2013 at 10:00:57AM -0800, Junio C Hamano wrote:

> I think the real problem is that sha1_loose_object_info() is called
> by sha1_object_info_extended(), when it does not find a cached or a
> packed object, and the callee assumes that it is asked to fill in
> only the requested pieces of information while the caller does not
> even bother to check if such an object actually exists.

Yes, exactly. This is over-optimization on the part of my 052fe5ea. The
caller asked for "nothing", so we happily optimize the request to do
nothing. But I forgot there is always an implicit "does it even exist"
query in the call. I do not see any point in adding an "exists" query
field to "struct object_info". :)

> -- >8 --
> Subject: sha1_loose_object_info(): do not return success on missing object
> 
> Since 052fe5ea (sha1_loose_object_info: make type lookup optional,
> 2013-07-12), sha1_loose_object_info() returns happily without
> checking if the object in question exists, which is not what the the
> caller sha1_object_info_extended() expects; the caller does not even
> bother checking the existence of the object itself.
> 
> Noticed-by: Sven Brauch 
> Signed-off-by: Junio C Hamano 

This is definitely the right fix. Commit message and patch look good to
me. A few extra bits are in the patch below.

>   Oh, by the way, there is this one iffy bit in batch_one_object():
> 
>   if (get_sha1(obj_name, data->sha1)) {
>   printf("%s missing\n", obj_name);
>   fflush(stdout);
>   return 0;
>   }
> 
>   At this point, the object _may_ be missing, but the obj_name may be
>   malformed, so saying "missing" is not strictly correct.  If, for
>   example, you misspelled the name of the master branch, you would get
>   this:
> 
>   $ echo mastre | git cat-file --batch-check=foo
>   mastre missing
> 
>   I however doubt that it is a good idea to reword this message by
>   adding a logic to tell misspelled object name and missing object
>   name apart.  The users of "cat-file --batch-check" are not
>   expecting to be able to distinguish these two classes of errors
>   anyway.

Yes, I noticed that while doing the original series, but left it intact
for the exact reason you mention. Note that it is going to stdout and is
part of the actual data stream (so there is a good chance scripts are
parsing it).

> diff --git a/sha1_file.c b/sha1_file.c
> index 7dadd04..00220a4 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -2486,12 +2486,11 @@ static int sha1_loose_object_info(const unsigned char 
> *sha1,
>* need to look inside the object at all.
>*/
>   if (!oi->typep && !oi->sizep) {
> - if (oi->disk_sizep) {
> - struct stat st;
> - if (stat_sha1_file(sha1, &st) < 0)
> - return -1;
> + struct stat st;
> + if (stat_sha1_file(sha1, &st) < 0)
> + return -1;
> + if (oi->disk_sizep)
>   *oi->disk_sizep = st.st_size;
> - }
>   return 0;

Here's a squashable patch which expands the comment above the code
you're tweaking, and adds a test to t1006.

I notice that t1006 could use some modernization, whitespace-fixing, and
typo-fixing, but I left that out of the patch for clarity.

Thanks both of you for working on this.

-Peff

---
diff --git a/sha1_file.c b/sha1_file.c
index 3474dca..10676ba 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2489,7 +2489,11 @@ static int sha1_loose_object_info(const unsigned char 
*sha1,
 
/*
 * If we don't care about type or size, then we don't
-* need to look inside the object at all.
+* need to look inside the object at all. Note that we
+* do not optimize out the stat call, even if the
+* caller doesn't care about the disk-size, since our
+* return value implicitly indicates whether the
+* object even exists.
 */
if (!oi->typep && !oi->sizep) {
struct stat st;
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index a420742..8a1bc5c 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -194,6 +194,12 @@ test_expect_success "--batch-check for an emtpy line" '
 test " missing" = "$(echo | git cat-file --batch-check)"
 '
 
+test_expect_success 'empty --batch-check notices missing object' '
+   echo "$_z40 missing" >expect &&
+   echo "$_z40" | git cat-file --batch-check="" >actual &&
+   test_cmp expect actual
+'
+
 batch_input="$hello_sha1
 $commit_sha1
 $tag_sha1
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] git-cat-file: fix output when format string contains no variables

2013-11-06 Thread Sven Brauch
On Wednesday 06 November 2013 10:00:57 Junio C Hamano wrote:
> Thanks; first some procedural issues:
Thanks, I will take care of the mentioned points for future submissions.

> I think the real problem is that sha1_loose_object_info() is called
> by sha1_object_info_extended(), when it does not find a cached or a
> packed object, and the callee assumes that it is asked to fill in
> only the requested pieces of information while the caller does not
> even bother to check if such an object actually exists.
> 
> How about doing it like the attached instead?
Yes; this seems more like a proper fix. I would prefer it over my suggestion.
It is illogical that sha1_loose_object_info sometimes returns an error if the 
object does not exist and sometimes not, depending on which properties are 
requested.

Sven
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html