Re: [PATCH 08/67] add reentrant variants of sha1_to_hex and find_unique_abbrev

2015-09-16 Thread Jeff King
On Wed, Sep 16, 2015 at 10:06:10AM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > On Wed, Sep 16, 2015 at 10:15:02AM +0200, Johannes Schindelin wrote:
> >
> >> Maybe we should stick to the established practice of the many, many
> >> reentrant POSIX functions following the *_r() naming convention? I.e.
> >> the reentrant version of localtime() is called localtime_r(), the
> >> reentrant version of random() is called random_r() etc.
> >> 
> >> So I could see myself not needing an explanation if I had read
> >> sha1_to_hex_r(...).
> >
> > I like this suggestion. By itself, the "_r" does not communicate as much
> > as "_to" to me, but as long as the reader knows the "_r" idiom, it
> > communicates much more.
> >
> > I'll switch to this unless there is any objection.
> 
> Fine by me.  Thanks.

I started on this, but realized something interesting as I was updating
the docstrings. sha1_to_hex_r is truly reentrant. But
find_unique_abbrev_r is not, as it calls has_sha1_file(), which touches
lots of global data for the object lookup.

I think it's probably OK, as long as I don't make the claim that it is
truly reentrant.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/67] add reentrant variants of sha1_to_hex and find_unique_abbrev

2015-09-16 Thread Junio C Hamano
Jeff King  writes:

> On Wed, Sep 16, 2015 at 10:15:02AM +0200, Johannes Schindelin wrote:
>
>> Maybe we should stick to the established practice of the many, many
>> reentrant POSIX functions following the *_r() naming convention? I.e.
>> the reentrant version of localtime() is called localtime_r(), the
>> reentrant version of random() is called random_r() etc.
>> 
>> So I could see myself not needing an explanation if I had read
>> sha1_to_hex_r(...).
>
> I like this suggestion. By itself, the "_r" does not communicate as much
> as "_to" to me, but as long as the reader knows the "_r" idiom, it
> communicates much more.
>
> I'll switch to this unless there is any objection.

Fine by me.  Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/67] add reentrant variants of sha1_to_hex and find_unique_abbrev

2015-09-16 Thread Jeff King
On Wed, Sep 16, 2015 at 10:15:02AM +0200, Johannes Schindelin wrote:

> Maybe we should stick to the established practice of the many, many
> reentrant POSIX functions following the *_r() naming convention? I.e.
> the reentrant version of localtime() is called localtime_r(), the
> reentrant version of random() is called random_r() etc.
> 
> So I could see myself not needing an explanation if I had read
> sha1_to_hex_r(...).

I like this suggestion. By itself, the "_r" does not communicate as much
as "_to" to me, but as long as the reader knows the "_r" idiom, it
communicates much more.

I'll switch to this unless there is any objection.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/67] add reentrant variants of sha1_to_hex and find_unique_abbrev

2015-09-16 Thread Johannes Schindelin
Hi Junio, Jeff & Ramsay,

On 2015-09-16 03:32, Junio C Hamano wrote:
> Jeff King  writes:
> 
>>> Hmm, I haven't read any other patches yet (including those which use these
>>> new '_to' functions), but I can't help feeling they should be named 
>>> something
>>> like 'sha1_to_hex_str()' and 'find_unique_abbrev_str()' instead.  i.e. I 
>>> don't get
>>> the '_to' thing - not that I'm any good at naming things ...
>>
>> I meant it as a contrast with their original. sha1_to_hex() formats into
>> an internal buffer and returns it. But sha1_to_hex_to() formats "to" a
>> buffer of your choice.
> 
> I think that naming makes perfect sense.

I have to admit that I needed the explanation.

Maybe we should stick to the established practice of the many, many reentrant 
POSIX functions following the *_r() naming convention? I.e. the reentrant 
version of localtime() is called localtime_r(), the reentrant version of 
random() is called random_r() etc.

So I could see myself not needing an explanation if I had read 
sha1_to_hex_r(...).

Ciao,
Dscho
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/67] add reentrant variants of sha1_to_hex and find_unique_abbrev

2015-09-15 Thread Junio C Hamano
Jeff King  writes:

>> Hmm, I haven't read any other patches yet (including those which use these
>> new '_to' functions), but I can't help feeling they should be named something
>> like 'sha1_to_hex_str()' and 'find_unique_abbrev_str()' instead.  i.e. I 
>> don't get
>> the '_to' thing - not that I'm any good at naming things ...
>
> I meant it as a contrast with their original. sha1_to_hex() formats into
> an internal buffer and returns it. But sha1_to_hex_to() formats "to" a
> buffer of your choice.

I think that naming makes perfect sense.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/67] add reentrant variants of sha1_to_hex and find_unique_abbrev

2015-09-15 Thread Jeff King
On Tue, Sep 15, 2015 at 05:55:55PM +0100, Ramsay Jones wrote:

> On 15/09/15 16:26, Jeff King wrote:
> > The sha1_to_hex and find_unique_abbrev functions always
> > write into reusable static buffers. There are a few problems
> > with this:
> >
> >   - future calls overwrite our result. This is especially
> > annoying with find_unique_abbrev, which does not have a
> > ring of buffers, so you cannot even printf() a result
> > that has two abbreviated sha1s.
> >
> >   - if you want to put the result into another buffer, we
> > often strcpy, which looks suspicious when auditing for
> > overflows.
> >
> > This patch introduces sha1_to_hex_to and find_unique_abbrev_to,
> > which write into a user-provided buffer. Of course this is
> > just punting on the overflow-auditing, as the buffer
> > obviously needs to be GIT_SHA1_HEXSZ + 1 bytes. But it is
> > much easier to audit, since that is a well-known size.
> 
> Hmm, I haven't read any other patches yet (including those which use these
> new '_to' functions), but I can't help feeling they should be named something
> like 'sha1_to_hex_str()' and 'find_unique_abbrev_str()' instead.  i.e. I 
> don't get
> the '_to' thing - not that I'm any good at naming things ...

I meant it as a contrast with their original. sha1_to_hex() formats into
an internal buffer and returns it. But sha1_to_hex_to() formats "to" a
buffer of your choice.

I'm happy to switch the names to something else, but I don't think
_str() conveys the difference. If I were starting from scratch, I would
probably have just called my variant sha1_to_hex(), and called the
original sha1_to_hex_unsafe(). :)

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/67] add reentrant variants of sha1_to_hex and find_unique_abbrev

2015-09-15 Thread Ramsay Jones


On 15/09/15 16:26, Jeff King wrote:
> The sha1_to_hex and find_unique_abbrev functions always
> write into reusable static buffers. There are a few problems
> with this:
>
>   - future calls overwrite our result. This is especially
> annoying with find_unique_abbrev, which does not have a
> ring of buffers, so you cannot even printf() a result
> that has two abbreviated sha1s.
>
>   - if you want to put the result into another buffer, we
> often strcpy, which looks suspicious when auditing for
> overflows.
>
> This patch introduces sha1_to_hex_to and find_unique_abbrev_to,
> which write into a user-provided buffer. Of course this is
> just punting on the overflow-auditing, as the buffer
> obviously needs to be GIT_SHA1_HEXSZ + 1 bytes. But it is
> much easier to audit, since that is a well-known size.

Hmm, I haven't read any other patches yet (including those which use these
new '_to' functions), but I can't help feeling they should be named something
like 'sha1_to_hex_str()' and 'find_unique_abbrev_str()' instead.  i.e. I don't 
get
the '_to' thing - not that I'm any good at naming things ...

ATB,
Ramsay Jones

>
> We retain the non-reentrant forms, which just become thin
> wrappers around the reentrant ones. This patch also adds a
> strbuf variant of find_unique_abbrev, which will be handy in
> later patches.
>
> Signed-off-by: Jeff King 
> ---
> If we wanted to be really meticulous, these functions could
> take a size for the output buffer, and complain if it is not
> GIT_SHA1_HEXSZ+1 bytes. But that would bloat every call
> like:
>
>   sha1_to_hex_to(buf, sizeof(buf), sha1);
>
>  cache.h | 27 ++-
>  hex.c   | 13 +
>  sha1_name.c | 16 +++-
>  strbuf.c|  9 +
>  strbuf.h|  8 
>  5 files changed, 63 insertions(+), 10 deletions(-)
>
> diff --git a/cache.h b/cache.h
> index e231e47..cc59aba 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -785,7 +785,21 @@ extern char *sha1_pack_name(const unsigned char *sha1);
>   */
>  extern char *sha1_pack_index_name(const unsigned char *sha1);
>  
> -extern const char *find_unique_abbrev(const unsigned char *sha1, int);
> +/*
> + * Return an abbreviated sha1 unique within this repository's object 
> database.
> + * The result will be at least `len` characters long, and will be NUL
> + * terminated.
> + *
> + * The non-`_to` version returns a static buffer which will be overwritten by
> + * subsequent calls.
> + *
> + * The `_to` variant writes to a buffer supplied by the caller, which must be
> + * at least `GIT_SHA1_HEXSZ + 1` bytes. The return value is the number of 
> bytes
> + * written (excluding the NUL terminator).
> + */
> +extern const char *find_unique_abbrev(const unsigned char *sha1, int len);
> +extern int find_unique_abbrev_to(char *hex, const unsigned char *sha1, int 
> len);
> +
>  extern const unsigned char null_sha1[GIT_SHA1_RAWSZ];
>  
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char 
> *sha2)
> @@ -1067,6 +1081,17 @@ extern int for_each_abbrev(const char *prefix, 
> each_abbrev_fn, void *);
>  extern int get_sha1_hex(const char *hex, unsigned char *sha1);
>  extern int get_oid_hex(const char *hex, struct object_id *sha1);
>  
> +/*
> + * Convert a binary sha1 to its hex equivalent. The `_to` variant writes
> + * the NUL-terminated output to the buffer `out`, which must be at least
> + * `GIT_SHA1_HEXSZ + 1` bytes, and returns a pointer to out for convenience.
> + *
> + * The non-`_to` variant returns a static buffer, but uses a ring of 4
> + * buffers, making it safe to make multiple calls for a single statement, 
> like:
> + *
> + *   printf("%s -> %s", sha1_to_hex(one), sha1_to_hex(two));
> + */
> +extern char *sha1_to_hex_to(char *out, const unsigned char *sha1);
>  extern char *sha1_to_hex(const unsigned char *sha1); /* static buffer 
> result! */
>  extern char *oid_to_hex(const struct object_id *oid);/* same static 
> buffer as sha1_to_hex */
>  
> diff --git a/hex.c b/hex.c
> index 899b74a..004fdea 100644
> --- a/hex.c
> +++ b/hex.c
> @@ -61,12 +61,10 @@ int get_oid_hex(const char *hex, struct object_id *oid)
>   return get_sha1_hex(hex, oid->hash);
>  }
>  
> -char *sha1_to_hex(const unsigned char *sha1)
> +char *sha1_to_hex_to(char *buffer, const unsigned char *sha1)
>  {
> - static int bufno;
> - static char hexbuffer[4][GIT_SHA1_HEXSZ + 1];
>   static const char hex[] = "0123456789abcdef";
> - char *buffer = hexbuffer[3 & ++bufno], *buf = buffer;
> + char *buf = buffer;
>   int i;
>  
>   for (i = 0; i < GIT_SHA1_RAWSZ; i++) {
> @@ -79,6 +77,13 @@ char *sha1_to_hex(const unsigned char *sha1)
>   return buffer;
>  }
>  
> +char *sha1_to_hex(const unsigned char *sha1)
> +{
> + static int bufno;
> + static char hexbuffer[4][GIT_SHA1_HEXSZ + 1];
> + return sha1_to_hex_to(hexbuffer[3 & ++bufno], sha1);
> +}
> +
>  char *oid_to_hex(const struct object_id *oid)
>  {
>

[PATCH 08/67] add reentrant variants of sha1_to_hex and find_unique_abbrev

2015-09-15 Thread Jeff King
The sha1_to_hex and find_unique_abbrev functions always
write into reusable static buffers. There are a few problems
with this:

  - future calls overwrite our result. This is especially
annoying with find_unique_abbrev, which does not have a
ring of buffers, so you cannot even printf() a result
that has two abbreviated sha1s.

  - if you want to put the result into another buffer, we
often strcpy, which looks suspicious when auditing for
overflows.

This patch introduces sha1_to_hex_to and find_unique_abbrev_to,
which write into a user-provided buffer. Of course this is
just punting on the overflow-auditing, as the buffer
obviously needs to be GIT_SHA1_HEXSZ + 1 bytes. But it is
much easier to audit, since that is a well-known size.

We retain the non-reentrant forms, which just become thin
wrappers around the reentrant ones. This patch also adds a
strbuf variant of find_unique_abbrev, which will be handy in
later patches.

Signed-off-by: Jeff King 
---
If we wanted to be really meticulous, these functions could
take a size for the output buffer, and complain if it is not
GIT_SHA1_HEXSZ+1 bytes. But that would bloat every call
like:

  sha1_to_hex_to(buf, sizeof(buf), sha1);

 cache.h | 27 ++-
 hex.c   | 13 +
 sha1_name.c | 16 +++-
 strbuf.c|  9 +
 strbuf.h|  8 
 5 files changed, 63 insertions(+), 10 deletions(-)

diff --git a/cache.h b/cache.h
index e231e47..cc59aba 100644
--- a/cache.h
+++ b/cache.h
@@ -785,7 +785,21 @@ extern char *sha1_pack_name(const unsigned char *sha1);
  */
 extern char *sha1_pack_index_name(const unsigned char *sha1);
 
-extern const char *find_unique_abbrev(const unsigned char *sha1, int);
+/*
+ * Return an abbreviated sha1 unique within this repository's object database.
+ * The result will be at least `len` characters long, and will be NUL
+ * terminated.
+ *
+ * The non-`_to` version returns a static buffer which will be overwritten by
+ * subsequent calls.
+ *
+ * The `_to` variant writes to a buffer supplied by the caller, which must be
+ * at least `GIT_SHA1_HEXSZ + 1` bytes. The return value is the number of bytes
+ * written (excluding the NUL terminator).
+ */
+extern const char *find_unique_abbrev(const unsigned char *sha1, int len);
+extern int find_unique_abbrev_to(char *hex, const unsigned char *sha1, int 
len);
+
 extern const unsigned char null_sha1[GIT_SHA1_RAWSZ];
 
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
@@ -1067,6 +1081,17 @@ extern int for_each_abbrev(const char *prefix, 
each_abbrev_fn, void *);
 extern int get_sha1_hex(const char *hex, unsigned char *sha1);
 extern int get_oid_hex(const char *hex, struct object_id *sha1);
 
+/*
+ * Convert a binary sha1 to its hex equivalent. The `_to` variant writes
+ * the NUL-terminated output to the buffer `out`, which must be at least
+ * `GIT_SHA1_HEXSZ + 1` bytes, and returns a pointer to out for convenience.
+ *
+ * The non-`_to` variant returns a static buffer, but uses a ring of 4
+ * buffers, making it safe to make multiple calls for a single statement, like:
+ *
+ *   printf("%s -> %s", sha1_to_hex(one), sha1_to_hex(two));
+ */
+extern char *sha1_to_hex_to(char *out, const unsigned char *sha1);
 extern char *sha1_to_hex(const unsigned char *sha1);   /* static buffer 
result! */
 extern char *oid_to_hex(const struct object_id *oid);  /* same static buffer 
as sha1_to_hex */
 
diff --git a/hex.c b/hex.c
index 899b74a..004fdea 100644
--- a/hex.c
+++ b/hex.c
@@ -61,12 +61,10 @@ int get_oid_hex(const char *hex, struct object_id *oid)
return get_sha1_hex(hex, oid->hash);
 }
 
-char *sha1_to_hex(const unsigned char *sha1)
+char *sha1_to_hex_to(char *buffer, const unsigned char *sha1)
 {
-   static int bufno;
-   static char hexbuffer[4][GIT_SHA1_HEXSZ + 1];
static const char hex[] = "0123456789abcdef";
-   char *buffer = hexbuffer[3 & ++bufno], *buf = buffer;
+   char *buf = buffer;
int i;
 
for (i = 0; i < GIT_SHA1_RAWSZ; i++) {
@@ -79,6 +77,13 @@ char *sha1_to_hex(const unsigned char *sha1)
return buffer;
 }
 
+char *sha1_to_hex(const unsigned char *sha1)
+{
+   static int bufno;
+   static char hexbuffer[4][GIT_SHA1_HEXSZ + 1];
+   return sha1_to_hex_to(hexbuffer[3 & ++bufno], sha1);
+}
+
 char *oid_to_hex(const struct object_id *oid)
 {
return sha1_to_hex(oid->hash);
diff --git a/sha1_name.c b/sha1_name.c
index da6874c..416e408 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -368,14 +368,13 @@ int for_each_abbrev(const char *prefix, each_abbrev_fn 
fn, void *cb_data)
return ds.ambiguous;
 }
 
-const char *find_unique_abbrev(const unsigned char *sha1, int len)
+int find_unique_abbrev_to(char *hex, const unsigned char *sha1, int len)
 {
int status, exists;
-   static char hex[41];
 
-   memcpy(hex, sha1_to_hex(sha1), 40);
+   sha1_to_hex_to(hex, sha1);
if (len == 40 || !