Re: [PATCH 08/67] add reentrant variants of sha1_to_hex and find_unique_abbrev
On Wed, Sep 16, 2015 at 10:06:10AM -0700, Junio C Hamano wrote: > Jeff King writes: > > > On Wed, Sep 16, 2015 at 10:15:02AM +0200, Johannes Schindelin wrote: > > > >> Maybe we should stick to the established practice of the many, many > >> reentrant POSIX functions following the *_r() naming convention? I.e. > >> the reentrant version of localtime() is called localtime_r(), the > >> reentrant version of random() is called random_r() etc. > >> > >> So I could see myself not needing an explanation if I had read > >> sha1_to_hex_r(...). > > > > I like this suggestion. By itself, the "_r" does not communicate as much > > as "_to" to me, but as long as the reader knows the "_r" idiom, it > > communicates much more. > > > > I'll switch to this unless there is any objection. > > Fine by me. Thanks. I started on this, but realized something interesting as I was updating the docstrings. sha1_to_hex_r is truly reentrant. But find_unique_abbrev_r is not, as it calls has_sha1_file(), which touches lots of global data for the object lookup. I think it's probably OK, as long as I don't make the claim that it is truly reentrant. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/67] add reentrant variants of sha1_to_hex and find_unique_abbrev
Jeff King writes: > On Wed, Sep 16, 2015 at 10:15:02AM +0200, Johannes Schindelin wrote: > >> Maybe we should stick to the established practice of the many, many >> reentrant POSIX functions following the *_r() naming convention? I.e. >> the reentrant version of localtime() is called localtime_r(), the >> reentrant version of random() is called random_r() etc. >> >> So I could see myself not needing an explanation if I had read >> sha1_to_hex_r(...). > > I like this suggestion. By itself, the "_r" does not communicate as much > as "_to" to me, but as long as the reader knows the "_r" idiom, it > communicates much more. > > I'll switch to this unless there is any objection. Fine by me. Thanks. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/67] add reentrant variants of sha1_to_hex and find_unique_abbrev
On Wed, Sep 16, 2015 at 10:15:02AM +0200, Johannes Schindelin wrote: > Maybe we should stick to the established practice of the many, many > reentrant POSIX functions following the *_r() naming convention? I.e. > the reentrant version of localtime() is called localtime_r(), the > reentrant version of random() is called random_r() etc. > > So I could see myself not needing an explanation if I had read > sha1_to_hex_r(...). I like this suggestion. By itself, the "_r" does not communicate as much as "_to" to me, but as long as the reader knows the "_r" idiom, it communicates much more. I'll switch to this unless there is any objection. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/67] add reentrant variants of sha1_to_hex and find_unique_abbrev
Hi Junio, Jeff & Ramsay, On 2015-09-16 03:32, Junio C Hamano wrote: > Jeff King writes: > >>> Hmm, I haven't read any other patches yet (including those which use these >>> new '_to' functions), but I can't help feeling they should be named >>> something >>> like 'sha1_to_hex_str()' and 'find_unique_abbrev_str()' instead. i.e. I >>> don't get >>> the '_to' thing - not that I'm any good at naming things ... >> >> I meant it as a contrast with their original. sha1_to_hex() formats into >> an internal buffer and returns it. But sha1_to_hex_to() formats "to" a >> buffer of your choice. > > I think that naming makes perfect sense. I have to admit that I needed the explanation. Maybe we should stick to the established practice of the many, many reentrant POSIX functions following the *_r() naming convention? I.e. the reentrant version of localtime() is called localtime_r(), the reentrant version of random() is called random_r() etc. So I could see myself not needing an explanation if I had read sha1_to_hex_r(...). Ciao, Dscho -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/67] add reentrant variants of sha1_to_hex and find_unique_abbrev
Jeff King writes: >> Hmm, I haven't read any other patches yet (including those which use these >> new '_to' functions), but I can't help feeling they should be named something >> like 'sha1_to_hex_str()' and 'find_unique_abbrev_str()' instead. i.e. I >> don't get >> the '_to' thing - not that I'm any good at naming things ... > > I meant it as a contrast with their original. sha1_to_hex() formats into > an internal buffer and returns it. But sha1_to_hex_to() formats "to" a > buffer of your choice. I think that naming makes perfect sense. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/67] add reentrant variants of sha1_to_hex and find_unique_abbrev
On Tue, Sep 15, 2015 at 05:55:55PM +0100, Ramsay Jones wrote: > On 15/09/15 16:26, Jeff King wrote: > > The sha1_to_hex and find_unique_abbrev functions always > > write into reusable static buffers. There are a few problems > > with this: > > > > - future calls overwrite our result. This is especially > > annoying with find_unique_abbrev, which does not have a > > ring of buffers, so you cannot even printf() a result > > that has two abbreviated sha1s. > > > > - if you want to put the result into another buffer, we > > often strcpy, which looks suspicious when auditing for > > overflows. > > > > This patch introduces sha1_to_hex_to and find_unique_abbrev_to, > > which write into a user-provided buffer. Of course this is > > just punting on the overflow-auditing, as the buffer > > obviously needs to be GIT_SHA1_HEXSZ + 1 bytes. But it is > > much easier to audit, since that is a well-known size. > > Hmm, I haven't read any other patches yet (including those which use these > new '_to' functions), but I can't help feeling they should be named something > like 'sha1_to_hex_str()' and 'find_unique_abbrev_str()' instead. i.e. I > don't get > the '_to' thing - not that I'm any good at naming things ... I meant it as a contrast with their original. sha1_to_hex() formats into an internal buffer and returns it. But sha1_to_hex_to() formats "to" a buffer of your choice. I'm happy to switch the names to something else, but I don't think _str() conveys the difference. If I were starting from scratch, I would probably have just called my variant sha1_to_hex(), and called the original sha1_to_hex_unsafe(). :) -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/67] add reentrant variants of sha1_to_hex and find_unique_abbrev
On 15/09/15 16:26, Jeff King wrote: > The sha1_to_hex and find_unique_abbrev functions always > write into reusable static buffers. There are a few problems > with this: > > - future calls overwrite our result. This is especially > annoying with find_unique_abbrev, which does not have a > ring of buffers, so you cannot even printf() a result > that has two abbreviated sha1s. > > - if you want to put the result into another buffer, we > often strcpy, which looks suspicious when auditing for > overflows. > > This patch introduces sha1_to_hex_to and find_unique_abbrev_to, > which write into a user-provided buffer. Of course this is > just punting on the overflow-auditing, as the buffer > obviously needs to be GIT_SHA1_HEXSZ + 1 bytes. But it is > much easier to audit, since that is a well-known size. Hmm, I haven't read any other patches yet (including those which use these new '_to' functions), but I can't help feeling they should be named something like 'sha1_to_hex_str()' and 'find_unique_abbrev_str()' instead. i.e. I don't get the '_to' thing - not that I'm any good at naming things ... ATB, Ramsay Jones > > We retain the non-reentrant forms, which just become thin > wrappers around the reentrant ones. This patch also adds a > strbuf variant of find_unique_abbrev, which will be handy in > later patches. > > Signed-off-by: Jeff King > --- > If we wanted to be really meticulous, these functions could > take a size for the output buffer, and complain if it is not > GIT_SHA1_HEXSZ+1 bytes. But that would bloat every call > like: > > sha1_to_hex_to(buf, sizeof(buf), sha1); > > cache.h | 27 ++- > hex.c | 13 + > sha1_name.c | 16 +++- > strbuf.c| 9 + > strbuf.h| 8 > 5 files changed, 63 insertions(+), 10 deletions(-) > > diff --git a/cache.h b/cache.h > index e231e47..cc59aba 100644 > --- a/cache.h > +++ b/cache.h > @@ -785,7 +785,21 @@ extern char *sha1_pack_name(const unsigned char *sha1); > */ > extern char *sha1_pack_index_name(const unsigned char *sha1); > > -extern const char *find_unique_abbrev(const unsigned char *sha1, int); > +/* > + * Return an abbreviated sha1 unique within this repository's object > database. > + * The result will be at least `len` characters long, and will be NUL > + * terminated. > + * > + * The non-`_to` version returns a static buffer which will be overwritten by > + * subsequent calls. > + * > + * The `_to` variant writes to a buffer supplied by the caller, which must be > + * at least `GIT_SHA1_HEXSZ + 1` bytes. The return value is the number of > bytes > + * written (excluding the NUL terminator). > + */ > +extern const char *find_unique_abbrev(const unsigned char *sha1, int len); > +extern int find_unique_abbrev_to(char *hex, const unsigned char *sha1, int > len); > + > extern const unsigned char null_sha1[GIT_SHA1_RAWSZ]; > > static inline int hashcmp(const unsigned char *sha1, const unsigned char > *sha2) > @@ -1067,6 +1081,17 @@ extern int for_each_abbrev(const char *prefix, > each_abbrev_fn, void *); > extern int get_sha1_hex(const char *hex, unsigned char *sha1); > extern int get_oid_hex(const char *hex, struct object_id *sha1); > > +/* > + * Convert a binary sha1 to its hex equivalent. The `_to` variant writes > + * the NUL-terminated output to the buffer `out`, which must be at least > + * `GIT_SHA1_HEXSZ + 1` bytes, and returns a pointer to out for convenience. > + * > + * The non-`_to` variant returns a static buffer, but uses a ring of 4 > + * buffers, making it safe to make multiple calls for a single statement, > like: > + * > + * printf("%s -> %s", sha1_to_hex(one), sha1_to_hex(two)); > + */ > +extern char *sha1_to_hex_to(char *out, const unsigned char *sha1); > extern char *sha1_to_hex(const unsigned char *sha1); /* static buffer > result! */ > extern char *oid_to_hex(const struct object_id *oid);/* same static > buffer as sha1_to_hex */ > > diff --git a/hex.c b/hex.c > index 899b74a..004fdea 100644 > --- a/hex.c > +++ b/hex.c > @@ -61,12 +61,10 @@ int get_oid_hex(const char *hex, struct object_id *oid) > return get_sha1_hex(hex, oid->hash); > } > > -char *sha1_to_hex(const unsigned char *sha1) > +char *sha1_to_hex_to(char *buffer, const unsigned char *sha1) > { > - static int bufno; > - static char hexbuffer[4][GIT_SHA1_HEXSZ + 1]; > static const char hex[] = "0123456789abcdef"; > - char *buffer = hexbuffer[3 & ++bufno], *buf = buffer; > + char *buf = buffer; > int i; > > for (i = 0; i < GIT_SHA1_RAWSZ; i++) { > @@ -79,6 +77,13 @@ char *sha1_to_hex(const unsigned char *sha1) > return buffer; > } > > +char *sha1_to_hex(const unsigned char *sha1) > +{ > + static int bufno; > + static char hexbuffer[4][GIT_SHA1_HEXSZ + 1]; > + return sha1_to_hex_to(hexbuffer[3 & ++bufno], sha1); > +} > + > char *oid_to_hex(const struct object_id *oid) > { >
[PATCH 08/67] add reentrant variants of sha1_to_hex and find_unique_abbrev
The sha1_to_hex and find_unique_abbrev functions always write into reusable static buffers. There are a few problems with this: - future calls overwrite our result. This is especially annoying with find_unique_abbrev, which does not have a ring of buffers, so you cannot even printf() a result that has two abbreviated sha1s. - if you want to put the result into another buffer, we often strcpy, which looks suspicious when auditing for overflows. This patch introduces sha1_to_hex_to and find_unique_abbrev_to, which write into a user-provided buffer. Of course this is just punting on the overflow-auditing, as the buffer obviously needs to be GIT_SHA1_HEXSZ + 1 bytes. But it is much easier to audit, since that is a well-known size. We retain the non-reentrant forms, which just become thin wrappers around the reentrant ones. This patch also adds a strbuf variant of find_unique_abbrev, which will be handy in later patches. Signed-off-by: Jeff King --- If we wanted to be really meticulous, these functions could take a size for the output buffer, and complain if it is not GIT_SHA1_HEXSZ+1 bytes. But that would bloat every call like: sha1_to_hex_to(buf, sizeof(buf), sha1); cache.h | 27 ++- hex.c | 13 + sha1_name.c | 16 +++- strbuf.c| 9 + strbuf.h| 8 5 files changed, 63 insertions(+), 10 deletions(-) diff --git a/cache.h b/cache.h index e231e47..cc59aba 100644 --- a/cache.h +++ b/cache.h @@ -785,7 +785,21 @@ extern char *sha1_pack_name(const unsigned char *sha1); */ extern char *sha1_pack_index_name(const unsigned char *sha1); -extern const char *find_unique_abbrev(const unsigned char *sha1, int); +/* + * Return an abbreviated sha1 unique within this repository's object database. + * The result will be at least `len` characters long, and will be NUL + * terminated. + * + * The non-`_to` version returns a static buffer which will be overwritten by + * subsequent calls. + * + * The `_to` variant writes to a buffer supplied by the caller, which must be + * at least `GIT_SHA1_HEXSZ + 1` bytes. The return value is the number of bytes + * written (excluding the NUL terminator). + */ +extern const char *find_unique_abbrev(const unsigned char *sha1, int len); +extern int find_unique_abbrev_to(char *hex, const unsigned char *sha1, int len); + extern const unsigned char null_sha1[GIT_SHA1_RAWSZ]; static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2) @@ -1067,6 +1081,17 @@ extern int for_each_abbrev(const char *prefix, each_abbrev_fn, void *); extern int get_sha1_hex(const char *hex, unsigned char *sha1); extern int get_oid_hex(const char *hex, struct object_id *sha1); +/* + * Convert a binary sha1 to its hex equivalent. The `_to` variant writes + * the NUL-terminated output to the buffer `out`, which must be at least + * `GIT_SHA1_HEXSZ + 1` bytes, and returns a pointer to out for convenience. + * + * The non-`_to` variant returns a static buffer, but uses a ring of 4 + * buffers, making it safe to make multiple calls for a single statement, like: + * + * printf("%s -> %s", sha1_to_hex(one), sha1_to_hex(two)); + */ +extern char *sha1_to_hex_to(char *out, const unsigned char *sha1); extern char *sha1_to_hex(const unsigned char *sha1); /* static buffer result! */ extern char *oid_to_hex(const struct object_id *oid); /* same static buffer as sha1_to_hex */ diff --git a/hex.c b/hex.c index 899b74a..004fdea 100644 --- a/hex.c +++ b/hex.c @@ -61,12 +61,10 @@ int get_oid_hex(const char *hex, struct object_id *oid) return get_sha1_hex(hex, oid->hash); } -char *sha1_to_hex(const unsigned char *sha1) +char *sha1_to_hex_to(char *buffer, const unsigned char *sha1) { - static int bufno; - static char hexbuffer[4][GIT_SHA1_HEXSZ + 1]; static const char hex[] = "0123456789abcdef"; - char *buffer = hexbuffer[3 & ++bufno], *buf = buffer; + char *buf = buffer; int i; for (i = 0; i < GIT_SHA1_RAWSZ; i++) { @@ -79,6 +77,13 @@ char *sha1_to_hex(const unsigned char *sha1) return buffer; } +char *sha1_to_hex(const unsigned char *sha1) +{ + static int bufno; + static char hexbuffer[4][GIT_SHA1_HEXSZ + 1]; + return sha1_to_hex_to(hexbuffer[3 & ++bufno], sha1); +} + char *oid_to_hex(const struct object_id *oid) { return sha1_to_hex(oid->hash); diff --git a/sha1_name.c b/sha1_name.c index da6874c..416e408 100644 --- a/sha1_name.c +++ b/sha1_name.c @@ -368,14 +368,13 @@ int for_each_abbrev(const char *prefix, each_abbrev_fn fn, void *cb_data) return ds.ambiguous; } -const char *find_unique_abbrev(const unsigned char *sha1, int len) +int find_unique_abbrev_to(char *hex, const unsigned char *sha1, int len) { int status, exists; - static char hex[41]; - memcpy(hex, sha1_to_hex(sha1), 40); + sha1_to_hex_to(hex, sha1); if (len == 40 || !