Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

2015-04-21 Thread karthik nayak



On 04/21/2015 12:21 AM, Jeff King wrote:

On Tue, Apr 21, 2015 at 12:13:30AM +0530, karthik nayak wrote:


+static int unpack_sha1_header_to_strbuf(git_zstream *stream, unsigned char
*map,
+   unsigned long mapsize, void *buffer,
+   unsigned long bufsiz, struct strbuf
*header)
+{
+   unsigned char *cp;
+   int status;
+   int i = 0;
+
+   status = unpack_sha1_header(stream, map, mapsize, buffer, bufsiz);


I wonder if we would feel comfortable just running this NUL-check as
part of unpack_sha1_header (i.e., in all code paths). It _shouldn't_
trigger in normal use, but I wonder if there would be any downsides
(e.g., maliciously crafted objects getting us to allocate memory or
something; I think it is fairly easy to convince git to allocate memory,
though).


But why would we want it to be a part of unpack_sha1_header?


+   for (cp = buffer; cp  stream-next_out; cp++)
+   if (!*cp) {
+   /* Found the NUL at the end of the header */
+   return 0;
+   }


I think we can spell this as:

   if (memchr(buffer, '\0', stream-next_out - buffer))
return 0;

which is shorter and possibly more efficient.

Noted. Thanks :)


In theory we could also just start trying to parse the type/size header,
and notice there when we don't find the NUL. That's probably not worth
doing, though. The parsing is separated from the unpacking here, so it
would require combining those two operations in a single function. And
the extra NUL search here is likely not very expensive.

Yes, even I though about doing that, but wasn't keen on combining those 
two functions, they're meant to do two different things.

-Peff


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

2015-04-21 Thread Jeff King
On Tue, Apr 21, 2015 at 04:56:08PM +0530, karthik nayak wrote:

 +   status = unpack_sha1_header(stream, map, mapsize, buffer, bufsiz);
 
 I wonder if we would feel comfortable just running this NUL-check as
 part of unpack_sha1_header (i.e., in all code paths). It _shouldn't_
 trigger in normal use, but I wonder if there would be any downsides
 (e.g., maliciously crafted objects getting us to allocate memory or
 something; I think it is fairly easy to convince git to allocate memory,
 though).
 
 But why would we want it to be a part of unpack_sha1_header?

Just to reduce the number of functions and the complexity of the caller.
But I guess it doesn't help that much if the caller would then need to
speculatively pass in a strbuf. So it's probably not a good idea.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

2015-04-20 Thread karthik nayak



On 04/18/2015 02:40 AM, Junio C Hamano wrote:

Jeff King p...@peff.net writes:


On Fri, Apr 17, 2015 at 09:21:31AM -0700, Junio C Hamano wrote:


Jeff King p...@peff.net writes:


If there _is_ a performance implication to worry about here, I think it
would be that we are doing an extra malloc/free.


Thanks for reminding me; yes, that also worried me.


As an aside, I worried about the extra allocation for reading the header
in the first place. But it looks like we only do this on the --literally
code path (and otherwise use the normal unpack_sha1_header).  Still, I
wonder if we could make this work automagically.  That is, speculatively
unpack the first N bytes, assuming we hit the end-of-header. If not,
then go to a strbuf as the slow path. Then it would be fine to cover all
cases; the normal ones would be fast, and only ridiculous things would
incur the extra allocation.


Yes, that was what I was hoping to see eventually ;-)



Sorry for the delay, So after reading what Jeff said I tried to 
implement it, this might not be a final version of the change, more of a 
RFC version. What do you'll think?


@@ -1564,6 +1564,36 @@ int unpack_sha1_header(git_zstream *stream, 
unsigned char *map, unsigned long ma

return git_inflate(stream, 0);
 }

+static int unpack_sha1_header_to_strbuf(git_zstream *stream, unsigned 
char *map,

+   unsigned long mapsize, void *buffer,
+   unsigned long bufsiz, struct 
strbuf *header)

+{
+   unsigned char *cp;
+   int status;
+   int i = 0;
+
+   status = unpack_sha1_header(stream, map, mapsize, buffer, bufsiz);
+
+   for (cp = buffer; cp  stream-next_out; cp++)
+   if (!*cp) {
+   /* Found the NUL at the end of the header */
+   return 0;
+   }
+
+   strbuf_add(header, buffer, stream-next_out - (unsigned char 
*)buffer);

+   do {
+   status = git_inflate(stream, 0);
+   strbuf_add(header, buffer, stream-next_out - (unsigned 
char *)buffer);

+   for (cp = buffer; cp  stream-next_out; cp++)
+   if (!*cp)
+   /* Found the NUL at the end of the header */
+   return 0;
+   stream-next_out = buffer;
+   stream-avail_out = bufsiz;
+   } while (status != Z_STREAM_END);
+   return -1;
+}
+


@@ -2555,17 +2610,24 @@ static int sha1_loose_object_info(const unsigned 
char *sha1,

return -1;
if (oi-disk_sizep)
*oi-disk_sizep = mapsize;
-   if (unpack_sha1_header(stream, map, mapsize, hdr, sizeof(hdr))  0)
+   if ((flags  LOOKUP_LITERALLY)) {
+   if (unpack_sha1_header_to_strbuf(stream, map, mapsize, 
hdr, sizeof(hdr), hdrbuf)  0)
+   status = error(unable to unpack %s header with 
--literally,

+  sha1_to_hex(sha1));
+   } else if (unpack_sha1_header(stream, map, mapsize, hdr, 
sizeof(hdr))  0)

status = error(unable to unpack %s header,
   sha1_to_hex(sha1));
-   else if ((status = parse_sha1_header(hdr, size))  0)
+   if (hdrbuf.len) {
+   if ((status = parse_sha1_header_extended(hdrbuf.buf, oi, 
flags))  0)
+   status = error(unable to parse %s header with 
--literally,

+  sha1_to_hex(sha1));
+   } else if ((status = parse_sha1_header_extended(hdr, oi, flags)) 
 0)
status = error(unable to parse %s header, 
sha1_to_hex(sha1));



and
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

2015-04-20 Thread Jeff King
On Tue, Apr 21, 2015 at 12:13:30AM +0530, karthik nayak wrote:

 +static int unpack_sha1_header_to_strbuf(git_zstream *stream, unsigned char
 *map,
 +   unsigned long mapsize, void *buffer,
 +   unsigned long bufsiz, struct strbuf
 *header)
 +{
 +   unsigned char *cp;
 +   int status;
 +   int i = 0;
 +
 +   status = unpack_sha1_header(stream, map, mapsize, buffer, bufsiz);

I wonder if we would feel comfortable just running this NUL-check as
part of unpack_sha1_header (i.e., in all code paths). It _shouldn't_
trigger in normal use, but I wonder if there would be any downsides
(e.g., maliciously crafted objects getting us to allocate memory or
something; I think it is fairly easy to convince git to allocate memory,
though).

 +   for (cp = buffer; cp  stream-next_out; cp++)
 +   if (!*cp) {
 +   /* Found the NUL at the end of the header */
 +   return 0;
 +   }

I think we can spell this as:

  if (memchr(buffer, '\0', stream-next_out - buffer))
return 0;

which is shorter and possibly more efficient.

In theory we could also just start trying to parse the type/size header,
and notice there when we don't find the NUL. That's probably not worth
doing, though. The parsing is separated from the unpacking here, so it
would require combining those two operations in a single function. And
the extra NUL search here is likely not very expensive.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

2015-04-18 Thread karthik nayak



On 04/18/2015 12:19 AM, Jeff King wrote:

On Sat, Apr 18, 2015 at 12:15:28AM +0530, karthik nayak wrote:


But now we use type_from_string_gently, which can accept a length[1]. So
we could just count the bytes to the first space and pass the original
buffer along with that length, no?


Yes, we could, that would eliminate  struct strbuf typename =
STRBUF_INIT.

Something like this perhaps :


Yeah, this is exactly what I had in mind.


   {
-   char type[10];
-   int i;
+   const char *buf = hdr;
  unsigned long size;
+   int type, len = 0;


Maybe switch the names of buf and len to type_buf and type_len
to make their purpose more clear?

-Peff



Yes, will change the names.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

2015-04-18 Thread karthik nayak



On 04/18/2015 12:53 AM, Junio C Hamano wrote:

karthik nayak karthik@gmail.com writes:


+   type = type_from_string_gently(buf, len, 1);
+   if (oi-typename) {
+   strbuf_add(oi-typename, buf, len);
+   strbuf_addch(oi-typename, '\0');


add() has setlen() at the end so you do not have to NUL terminate it
yourself.  Doing addch() is actively wrong, as oi-typename-len now
counts the terminating NUL as part of the string, no?



Yes. was speculative of that. thanks for clearing it up.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

2015-04-18 Thread karthik nayak



On 04/18/2015 05:01 AM, Eric Sunshine wrote:

On Wed, Apr 15, 2015 at 12:59 PM, Karthik Nayak karthik@gmail.com wrote:

Update sha1_loose_object_info() to optionally allow it to read
from a loose object file of unknown/bogus type; as the function
usually returns the type of the object it read in the form of enum
for known types, add an optional typename field to receive the
name of the type in textual form and a flag to indicate the reading
of a loose object file of unknown/bogus type.
[...]
---
diff --git a/sha1_file.c b/sha1_file.c
index 980ce6b..267399d 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2522,13 +2575,15 @@ struct packed_git *find_sha1_pack(const unsigned char 
*sha1,
  }

  static int sha1_loose_object_info(const unsigned char *sha1,
- struct object_info *oi)
+ struct object_info *oi,
+ int flags)
  {
-   int status;
-   unsigned long mapsize, size;
+   int status = 0;
+   unsigned long mapsize;
 void *map;
 git_zstream stream;
 char hdr[32];
+   struct strbuf hdrbuf = STRBUF_INIT;

 if (oi-delta_base_sha1)
 hashclr(oi-delta_base_sha1);
@@ -2555,17 +2610,26 @@ static int sha1_loose_object_info(const unsigned char 
*sha1,
 return -1;
 if (oi-disk_sizep)
 *oi-disk_sizep = mapsize;
-   if (unpack_sha1_header(stream, map, mapsize, hdr, sizeof(hdr))  0)
-   status = error(unable to unpack %s header,
-  sha1_to_hex(sha1));
-   else if ((status = parse_sha1_header(hdr, size))  0)
-   status = error(unable to parse %s header, sha1_to_hex(sha1));
-   else if (oi-sizep)
-   *oi-sizep = size;
+   if ((flags  LOOKUP_LITERALLY)) {
+   if (unpack_sha1_header_to_strbuf(stream, map, mapsize, hdrbuf) 
 0)
+   status = error(unable to unpack %s header with 
--literally,
+  sha1_to_hex(sha1));
+   else if ((status = parse_sha1_header_extended(hdrbuf.buf, oi, 
flags))  0)
+   status = error(unable to parse %s header with 
--literally,
+  sha1_to_hex(sha1));
+   } else {
+   if (unpack_sha1_header(stream, map, mapsize, hdr, sizeof(hdr)) 
 0)
+   status = error(unable to unpack %s header,
+  sha1_to_hex(sha1));
+   else if ((status = parse_sha1_header_extended(hdr, oi, flags)) 
 0)
+   status = error(unable to parse %s header, 
sha1_to_hex(sha1));
+   }
 git_inflate_end(stream);
 munmap(map, mapsize);
-   if (oi-typep)
+   if (status  oi-typep)
 *oi-typep = status;
+   if (hdrbuf.buf)
+   strbuf_release(hdrbuf);


Why is strbuf_release() protected by a conditional rather than being
called unconditionally? Am I missing something obvious?

No, you're right.



 return 0;
  }


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

2015-04-17 Thread Jeff King
On Wed, Apr 15, 2015 at 06:18:24PM -0400, Jeff King wrote:

  This _might_ have some performance impact in that strbuf_addch()
  involves strbuf_grow(*, 1), which does does it overflow to
  increment sb-len by one?; I would say it should be unmeasurable
  because the function is expected to be used only on loose objects
  and you shouldn't have very many of them without packing in your
  repository in the first place.
  
  I guess Peff's c1822d4f (strbuf: add an optimized 1-character
  strbuf_grow, 2015-04-04) may want to teach strbuf_addch() to use his
  new strbuf_grow_ch(), and once that happens the performance worry
  would disappear without this code to be changed at all.
 
 I haven't re-rolled that series yet, but the discussion there showed
 that strbuf_grow_ch() is unnecessary; call-sites can just check:
 
   if (!strbuf_avail(sb))
   strbuf_grow(sb, 1);
 
 to get the fast inline check. Since we go to the trouble to inline
 strbuf_addch, we should probably teach it the same trick. It would be
 nice to identify a place with a tight strbuf_addch() loop that could
 demonstrate the speed increase, though.

Just for reference, I did teach strbuf_addch this trick. And the config
code is the tight loop to test it with. Results are here:

  http://article.gmane.org/gmane.comp.version-control.git/267266

In this code path, we are typically only seeing short strings (the
original code used a 10-byte static buffer). So I doubt it makes all
that much difference.

If there _is_ a performance implication to worry about here, I think it
would be that we are doing an extra malloc/free. I'm not sure I
understand why we are copying it at all. The original code copied from
the hdr into type[10] so that we could NUL-terminate it, which was
required for type_from_string().

But now we use type_from_string_gently, which can accept a length[1]. So
we could just count the bytes to the first space and pass the original
buffer along with that length, no?

-Peff

[1] Not related to your patch, but it looks like type_from_string_gently
is overly lax. It does a strncmp() with the length of the candidate
name, but does not check that we consumed all of the matching name.
So tr would match tree, comm would match commit, and so
forth.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

2015-04-17 Thread karthik nayak



On 04/17/2015 07:53 PM, Jeff King wrote:

On Wed, Apr 15, 2015 at 06:18:24PM -0400, Jeff King wrote:

 This _might_ have some performance impact in that strbuf_addch()
 involves strbuf_grow(*, 1), which does does it overflow to
 increment sb-len by one?; I would say it should be unmeasurable
 because the function is expected to be used only on loose objects
 and you shouldn't have very many of them without packing in your
 repository in the first place.

 I guess Peff's c1822d4f (strbuf: add an optimized 1-character
 strbuf_grow, 2015-04-04) may want to teach strbuf_addch() to use his
 new strbuf_grow_ch(), and once that happens the performance worry
 would disappear without this code to be changed at all.

 I haven't re-rolled that series yet, but the discussion there showed
 that strbuf_grow_ch() is unnecessary; call-sites can just check:

if (!strbuf_avail(sb))
 strbuf_grow(sb, 1);

 to get the fast inline check. Since we go to the trouble to inline
 strbuf_addch, we should probably teach it the same trick. It would be
 nice to identify a place with a tight strbuf_addch() loop that could
 demonstrate the speed increase, though.

Just for reference, I did teach strbuf_addch this trick. And the config
code is the tight loop to test it with. Results are here:

   http://article.gmane.org/gmane.comp.version-control.git/267266

In this code path, we are typically only seeing short strings (the
original code used a 10-byte static buffer). So I doubt it makes all
that much difference.

If there _is_ a performance implication to worry about here, I think it
would be that we are doing an extra malloc/free. I'm not sure I
understand why we are copying it at all. The original code copied from
the hdr into type[10] so that we could NUL-terminate it, which was
required for type_from_string().

But now we use type_from_string_gently, which can accept a length[1]. So
we could just count the bytes to the first space and pass the original
buffer along with that length, no?


Yes, we could, that would eliminate  struct strbuf typename =
STRBUF_INIT.

Something like this perhaps :

@@ -1614,27 +1642,40 @@ static void *unpack_sha1_rest(git_zstream
*stream, void *buffer, unsigned long s
   * too permissive for what we want to check. So do an anal
   * object header parse by hand.
   */
-int parse_sha1_header(const char *hdr, unsigned long *sizep)
+static int parse_sha1_header_extended(const char *hdr, struct
object_info *oi,
+  unsigned int flags)
  {
-   char type[10];
-   int i;
+   const char *buf = hdr;
 unsigned long size;
+   int type, len = 0;

 /*
-* The type can be at most ten bytes (including the
-* terminating '\0' that we add), and is followed by
+* The type can be of any size but is followed by
  * a space.
  */
-   i = 0;
 for (;;) {
 char c = *hdr++;
 if (c == ' ')
 break;
-   type[i++] = c;
-   if (i = sizeof(type))
-   return -1;
+   len++;
 }
-   type[i] = 0;
+
+   type = type_from_string_gently(buf, len, 1);
+   if (oi-typename) {
+   strbuf_add(oi-typename, buf, len);
+   strbuf_addch(oi-typename, '\0');
+   }
+   /*
+* Set type to 0 if its an unknown object and
+* we're obtaining the type using '--literally'
+* option.
+*/
+   if ((flags  LOOKUP_LITERALLY)  (type == -1))
+   type = 0;
+   else if (type == -1)
+   die(invalid object type);
+   if (oi-typep)
+   *oi-typep = type;

 /*
  * The length must follow immediately, and be in canonical



-Peff

[1] Not related to your patch, but it looks like type_from_string_gently
 is overly lax. It does a strncmp() with the length of the candidate
 name, but does not check that we consumed all of the matching name.
 So tr would match tree, comm would match commit, and so
 forth.


Thanks for the patch re-roll on strbuf_addch() and the patch on
type_from_string_gently().
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

2015-04-17 Thread Jeff King
On Sat, Apr 18, 2015 at 12:15:28AM +0530, karthik nayak wrote:

 But now we use type_from_string_gently, which can accept a length[1]. So
 we could just count the bytes to the first space and pass the original
 buffer along with that length, no?
 
 Yes, we could, that would eliminate  struct strbuf typename =
 STRBUF_INIT.
 
 Something like this perhaps :

Yeah, this is exactly what I had in mind.

   {
 -   char type[10];
 -   int i;
 +   const char *buf = hdr;
  unsigned long size;
 +   int type, len = 0;

Maybe switch the names of buf and len to type_buf and type_len
to make their purpose more clear?

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

2015-04-17 Thread Junio C Hamano
karthik nayak karthik@gmail.com writes:

 +   type = type_from_string_gently(buf, len, 1);
 +   if (oi-typename) {
 +   strbuf_add(oi-typename, buf, len);
 +   strbuf_addch(oi-typename, '\0');

add() has setlen() at the end so you do not have to NUL terminate it
yourself.  Doing addch() is actively wrong, as oi-typename-len now
counts the terminating NUL as part of the string, no?
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

2015-04-17 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 If there _is_ a performance implication to worry about here, I think it
 would be that we are doing an extra malloc/free.

Thanks for reminding me; yes, that also worried me.

 I'm not sure I
 understand why we are copying it at all. The original code copied from
 the hdr into type[10] so that we could NUL-terminate it, which was
 required for type_from_string().

Sounds like a good plan.


 But now we use type_from_string_gently, which can accept a length[1]. So
 we could just count the bytes to the first space and pass the original
 buffer along with that length, no?

 -Peff

 [1] Not related to your patch, but it looks like type_from_string_gently
 is overly lax. It does a strncmp() with the length of the candidate
 name, but does not check that we consumed all of the matching name.
 So tr would match tree, comm would match commit, and so
 forth.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

2015-04-17 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 On Fri, Apr 17, 2015 at 09:21:31AM -0700, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  If there _is_ a performance implication to worry about here, I think it
  would be that we are doing an extra malloc/free.
 
 Thanks for reminding me; yes, that also worried me.

 As an aside, I worried about the extra allocation for reading the header
 in the first place. But it looks like we only do this on the --literally
 code path (and otherwise use the normal unpack_sha1_header).  Still, I
 wonder if we could make this work automagically.  That is, speculatively
 unpack the first N bytes, assuming we hit the end-of-header. If not,
 then go to a strbuf as the slow path. Then it would be fine to cover all
 cases; the normal ones would be fast, and only ridiculous things would
 incur the extra allocation.

Yes, that was what I was hoping to see eventually ;-)
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

2015-04-17 Thread Jeff King
On Fri, Apr 17, 2015 at 09:21:31AM -0700, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  If there _is_ a performance implication to worry about here, I think it
  would be that we are doing an extra malloc/free.
 
 Thanks for reminding me; yes, that also worried me.

As an aside, I worried about the extra allocation for reading the header
in the first place. But it looks like we only do this on the --literally
code path (and otherwise use the normal unpack_sha1_header).  Still, I
wonder if we could make this work automagically.  That is, speculatively
unpack the first N bytes, assuming we hit the end-of-header. If not,
then go to a strbuf as the slow path. Then it would be fine to cover all
cases; the normal ones would be fast, and only ridiculous things would
incur the extra allocation.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

2015-04-17 Thread Eric Sunshine
On Wed, Apr 15, 2015 at 12:59 PM, Karthik Nayak karthik@gmail.com wrote:
 Update sha1_loose_object_info() to optionally allow it to read
 from a loose object file of unknown/bogus type; as the function
 usually returns the type of the object it read in the form of enum
 for known types, add an optional typename field to receive the
 name of the type in textual form and a flag to indicate the reading
 of a loose object file of unknown/bogus type.
 [...]
 ---
 diff --git a/sha1_file.c b/sha1_file.c
 index 980ce6b..267399d 100644
 --- a/sha1_file.c
 +++ b/sha1_file.c
 @@ -2522,13 +2575,15 @@ struct packed_git *find_sha1_pack(const unsigned char 
 *sha1,
  }

  static int sha1_loose_object_info(const unsigned char *sha1,
 - struct object_info *oi)
 + struct object_info *oi,
 + int flags)
  {
 -   int status;
 -   unsigned long mapsize, size;
 +   int status = 0;
 +   unsigned long mapsize;
 void *map;
 git_zstream stream;
 char hdr[32];
 +   struct strbuf hdrbuf = STRBUF_INIT;

 if (oi-delta_base_sha1)
 hashclr(oi-delta_base_sha1);
 @@ -2555,17 +2610,26 @@ static int sha1_loose_object_info(const unsigned char 
 *sha1,
 return -1;
 if (oi-disk_sizep)
 *oi-disk_sizep = mapsize;
 -   if (unpack_sha1_header(stream, map, mapsize, hdr, sizeof(hdr))  0)
 -   status = error(unable to unpack %s header,
 -  sha1_to_hex(sha1));
 -   else if ((status = parse_sha1_header(hdr, size))  0)
 -   status = error(unable to parse %s header, 
 sha1_to_hex(sha1));
 -   else if (oi-sizep)
 -   *oi-sizep = size;
 +   if ((flags  LOOKUP_LITERALLY)) {
 +   if (unpack_sha1_header_to_strbuf(stream, map, mapsize, 
 hdrbuf)  0)
 +   status = error(unable to unpack %s header with 
 --literally,
 +  sha1_to_hex(sha1));
 +   else if ((status = parse_sha1_header_extended(hdrbuf.buf, oi, 
 flags))  0)
 +   status = error(unable to parse %s header with 
 --literally,
 +  sha1_to_hex(sha1));
 +   } else {
 +   if (unpack_sha1_header(stream, map, mapsize, hdr, 
 sizeof(hdr))  0)
 +   status = error(unable to unpack %s header,
 +  sha1_to_hex(sha1));
 +   else if ((status = parse_sha1_header_extended(hdr, oi, 
 flags))  0)
 +   status = error(unable to parse %s header, 
 sha1_to_hex(sha1));
 +   }
 git_inflate_end(stream);
 munmap(map, mapsize);
 -   if (oi-typep)
 +   if (status  oi-typep)
 *oi-typep = status;
 +   if (hdrbuf.buf)
 +   strbuf_release(hdrbuf);

Why is strbuf_release() protected by a conditional rather than being
called unconditionally? Am I missing something obvious?

 return 0;
  }

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

2015-04-15 Thread Junio C Hamano
Karthik Nayak karthik@gmail.com writes:

 Update sha1_loose_object_info() to optionally allow it to read
 from a loose object file of unknown/bogus type; as the function
 usually returns the type of the object it read in the form of enum
 for known types, add an optional typename field to receive the
 name of the type in textual form and a flag to indicate the reading
 of a loose object file of unknown/bogus type.

 Add parse_sha1_header_extended() which acts as a wrapper around
 parse_sha1_header() allowing more information to be obtained.

 Add unpack_sha1_header_to_strbuf() to unpack sha1 headers of
 unknown/corrupt objects which have a unknown sha1 header size to
 a strbuf structure. This was written by Junio C Hamano but tested
 by me.

 Helped-by: Junio C Hamano gits...@pobox.com
 Helped-by: Eric Sunshine sunsh...@sunshineco.com
 Signed-off-by: Karthik Nayak karthik@gmail.com
 ---

I see that you made the type parsing to happen earlier than the
previous round (which used to do the size first and then type).

Not a problem, though.  Just something I noticed...

 @@ -1614,27 +1642,40 @@ static void *unpack_sha1_rest(git_zstream *stream, 
 void *buffer, unsigned long s
   * too permissive for what we want to check. So do an anal
   * object header parse by hand.
   */
 -int parse_sha1_header(const char *hdr, unsigned long *sizep)
 +int parse_sha1_header_extended(const char *hdr, struct object_info *oi,
 +unsigned int flags)
  {
 - char type[10];
 - int i;
 + struct strbuf typename = STRBUF_INIT;
   unsigned long size;
 + int type;
  
   /*
* The type can be at most ten bytes (including the
* terminating '\0' that we add), and is followed by
* a space.
*/
 - i = 0;
   for (;;) {
   char c = *hdr++;
   if (c == ' ')
   break;
 - type[i++] = c;
 - if (i = sizeof(type))
 - return -1;
 + strbuf_addch(typename, c);
   }
 - type[i] = 0;

This _might_ have some performance impact in that strbuf_addch()
involves strbuf_grow(*, 1), which does does it overflow to
increment sb-len by one?; I would say it should be unmeasurable
because the function is expected to be used only on loose objects
and you shouldn't have very many of them without packing in your
repository in the first place.

I guess Peff's c1822d4f (strbuf: add an optimized 1-character
strbuf_grow, 2015-04-04) may want to teach strbuf_addch() to use his
new strbuf_grow_ch(), and once that happens the performance worry
would disappear without this code to be changed at all.

 @@ -2541,7 +2596,7 @@ static int sha1_loose_object_info(const unsigned char 
 *sha1,
* return value implicitly indicates whether the
* object even exists.
*/
 - if (!oi-typep  !oi-sizep) {
 + if (!oi-typep  !oi-typename  !oi-sizep) {

You didn't have this check in the earlier round, and this new one is
correct, I think.  Good eyes to spot this potential problem.

Thanks.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

2015-04-15 Thread Jeff King
On Wed, Apr 15, 2015 at 01:21:15PM -0700, Junio C Hamano wrote:

  -   type[i++] = c;
  -   if (i = sizeof(type))
  -   return -1;
  +   strbuf_addch(typename, c);
  }
  -   type[i] = 0;
 
 This _might_ have some performance impact in that strbuf_addch()
 involves strbuf_grow(*, 1), which does does it overflow to
 increment sb-len by one?; I would say it should be unmeasurable
 because the function is expected to be used only on loose objects
 and you shouldn't have very many of them without packing in your
 repository in the first place.
 
 I guess Peff's c1822d4f (strbuf: add an optimized 1-character
 strbuf_grow, 2015-04-04) may want to teach strbuf_addch() to use his
 new strbuf_grow_ch(), and once that happens the performance worry
 would disappear without this code to be changed at all.

I haven't re-rolled that series yet, but the discussion there showed
that strbuf_grow_ch() is unnecessary; call-sites can just check:

  if (!strbuf_avail(sb))
strbuf_grow(sb, 1);

to get the fast inline check. Since we go to the trouble to inline
strbuf_addch, we should probably teach it the same trick. It would be
nice to identify a place with a tight strbuf_addch() loop that could
demonstrate the speed increase, though.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html