Re: [PATCH v9 1/5] sha1_file: support reading from a loose object of unknown type

2015-04-29 Thread Junio C Hamano
Karthik Nayak karthik@gmail.com writes:

 Update sha1_loose_object_info() to optionally allow it to read
 from a loose object file of unknown/bogus type; as the function
 usually returns the type of the object it read in the form of enum
 for known types, add an optional typename field to receive the
 name of the type in textual form and a flag to indicate the reading
 of a loose object file of unknown/bogus type.

 Add parse_sha1_header_extended() which acts as a wrapper around
 parse_sha1_header() allowing more information to be obtained.

Thanks.  This mostly looks good modulo a nit.

 diff --git a/sha1_file.c b/sha1_file.c
 index 980ce6b..9a15634 100644
 --- a/sha1_file.c
 +++ b/sha1_file.c
 @@ -1564,6 +1564,33 @@ int unpack_sha1_header(git_zstream *stream, unsigned 
 char *map, unsigned long ma
   return git_inflate(stream, 0);
  }
  
 +static int unpack_sha1_header_to_strbuf(git_zstream *stream, unsigned char 
 *map,
 + unsigned long mapsize, void *buffer,
 + unsigned long bufsiz, struct strbuf 
 *header)

This function in this round looks somewhat tricky.

 +{
 + unsigned char *cp;
 + int status;
 +
 + status = unpack_sha1_header(stream, map, mapsize, buffer, bufsiz);

Here, we unpack into the caller-supplied buffer, without touching
the caller-supplied header strbuf.

 + /*
 +  * Check if entire header is unpacked in the first iteration.
 +  */
 + if (memchr(buffer, '\0', stream-next_out - (unsigned char *)buffer))
 + return 0;

And we return the object header in the buffer without ever touching
the header strbuf.  We expect that the caller would know that the
buffer it gave us would contain the full object header line.

 + strbuf_add(header, buffer, stream-next_out - (unsigned char *)buffer);
 + do {
 + status = git_inflate(stream, 0);
 + strbuf_add(header, buffer, stream-next_out - (unsigned char 
 *)buffer);
 + if (memchr(buffer, '\0', stream-next_out - (unsigned char 
 *)buffer))
 + return 0;

However, here, we return the object header in the caller-supplied
header strbuf, while using the caller-supplied buffer as a scratch
area.  We expect that the caller would know that the header strbuf
is what it has to use to find the object header.

Which is good in the sense that there is no unnecessary copies, but
the caller needs to be careful to do something like:

if (!unpack_to_strbuf(... buffer, sizeof(buffer), header)) {
if (header.len)
object_header = header.buf;
else
object_header = buffer;
} else {
error(cannot unpack the object header);
}

It is a good trade-off between complexity and efficiency.  The
complexity is isolated as the function is file-scope-static and it
is perfectly fine to force the callers to be extra careful.

But this suggests that the patch to add tests should try at least
two, preferably three, kinds of test input.  A bogus type that needs
a header longer than the caller's fixed buffer, a bogus type whose
header would fit within the fixed buffer, and optionally a correct
type whose header should always fit within the fixed buffer.

 @@ -1614,27 +1641,38 @@ static void *unpack_sha1_rest(git_zstream *stream, 
 void
 ...
 + /*
 +  * Set type to 0 if its an unknown object and
 +  * we're obtaining the type using '--allow-unkown-type'
 +  * option.
 +  */
 + if ((flags  LOOKUP_UNKNOWN_OBJECT)  (type == -1))
 + type = 0;
 + else if (type == -1)
 + die(invalid object type);

Would type == -2 or any other negative value, if existed, mean
something different?  I do not think so, and hence I would prefer
these two checks be done with type  0 instead.

 @@ -2522,13 +2572,15 @@ struct packed_git *find_sha1_pack(const unsigned char 
 *sha1,
  }
  
  static int sha1_loose_object_info(const unsigned char *sha1,
 -   struct object_info *oi)
 +   struct object_info *oi,
 +   int flags)
  {
 - int status;
 - unsigned long mapsize, size;
 + int status = 0;
 + unsigned long mapsize;
   void *map;
   git_zstream stream;
   char hdr[32];
 + struct strbuf hdrbuf = STRBUF_INIT;
 ...
 + if ((flags  LOOKUP_UNKNOWN_OBJECT)) {
 + if (unpack_sha1_header_to_strbuf(stream, map, mapsize, hdr, 
 sizeof(hdr), hdrbuf)  0)
 + status = error(unable to unpack %s header with 
 --allow-unknown-type,
 +sha1_to_hex(sha1));
 + } else if (unpack_sha1_header(stream, map, mapsize, hdr, sizeof(hdr)) 
  0)
   status = error(unable to unpack %s header,
  sha1_to_hex(sha1));
 - else if ((status = parse_sha1_header(hdr, size))  0)
 + if 

Re: [PATCH v9 1/5] sha1_file: support reading from a loose object of unknown type

2015-04-29 Thread karthik nayak


On 04/30/2015 01:05 AM, Junio C Hamano wrote:

karthik nayak karthik@gmail.com writes:

 On 04/29/2015 08:19 PM, Junio C Hamano wrote:
 Karthik Nayak karthik@gmail.com writes:

 Update sha1_loose_object_info() to optionally allow it to read
 from a loose object file of unknown/bogus type; as the function
 usually returns the type of the object it read in the form of enum
 for known types, add an optional typename field to receive the
 name of the type in textual form and a flag to indicate the reading
 of a loose object file of unknown/bogus type.

 Add parse_sha1_header_extended() which acts as a wrapper around
 parse_sha1_header() allowing more information to be obtained.

 Thanks.  This mostly looks good modulo a nit.

 Sorry didn't get what you meant by modulo a nit..

nit as in Nit-pick; a small imperfection that may need to be
corrected (such as the what if we saw failure earlier and 'status'
already had a value? issue).

Thanks for clearing that out.


 It is a good trade-off between complexity and efficiency.  The
 complexity is isolated as the function is file-scope-static and it
 is perfectly fine to force the callers to be extra careful.

 But this suggests that the patch to add tests should try at least
 two, preferably three, kinds of test input.  A bogus type that needs
 a header longer than the caller's fixed buffer, a bogus type whose
 header would fit within the fixed buffer, and optionally a correct
 type whose header should always fit within the fixed buffer.

 Yes it is a tradeoff, and it is complex as in the user has to check
 the strbuf provided to see if its been used. But this like you said I
 guess its a good tradeoff.
 About the three tests, My patch checks a bogus type whose header
 would fit within the fixed buffer and correct type whose header
 should always fit within the fixed buffer but will write a test to
 check A bogus type that needs a header longer than the caller's fixed
 buffer

Yup. Please do so; that would make the test coverage more complete.


Yup will do :)
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html