Re: [PATCH 3/3] introduce format date-mode
On Tue, Jun 30, 2015 at 9:26 AM, Jeff King p...@peff.net wrote: On Mon, Jun 29, 2015 at 06:22:47PM -0400, Eric Sunshine wrote: Clients of strbuf rightly expect the buffer to grow as needed in order to complete the requested operation. It is, therefore, both weird and expectation-breaking for strbuf_addftime() to lack this behavior. Worse, it doesn't even signal when the format has failed due to insufficient buffer space. How about taking this approach (or something similar), instead, which grows the strbuf as needed? Here's a patch, on top of jk/date-mode-format (I think it would also be fine to just squash into the tip commit; the explanation in the commit message is sufficiently mirrored in the code comment). While cleaning up old local branches, I noticed that, although the jk/date-mode-format topic[1] made it into 'next' (and will be merged to 'master' according to What's cooking[2]), the below follow-on patch[3] which improves strbuf_addftime() never got picked up. Was this omission intentional? Based upon the discussion[4], I was under the impression that the patch was considered reasonably acceptable (and did not worsen problems with bogus format strings -- which are bogus anyway). [1]: http://thread.gmane.org/gmane.comp.version-control.git/272658/focus=272695 [2]: http://news.gmane.org/gmane.comp.version-control.git [3]: http://article.gmane.org/gmane.comp.version-control.git/273061 [4]: http://thread.gmane.org/gmane.comp.version-control.git/272658/focus=273026 -- 8 -- Subject: [PATCH] strbuf: make strbuf_addftime more robust The return value of strftime is poorly designed; when it returns 0, the caller cannot tell if the buffer was not large enough, or if the output was actually 0 bytes. In the original implementation of strbuf_addftime, we simply punted and guessed that our 128-byte hint would be large enough. We can do better, though, if we're willing to treat strftime like less of a black box. We can munge the incoming format to make sure that it never produces 0-length output, and then fix the resulting output. That lets us reliably grow the buffer based on strftime's return value. Clever-idea-by: Eric Sunshine sunsh...@sunshineco.com Signed-off-by: Jeff King p...@peff.net --- strbuf.c| 38 +- t/t6300-for-each-ref.sh | 10 ++ 2 files changed, 31 insertions(+), 17 deletions(-) diff --git a/strbuf.c b/strbuf.c index a7ba028..e5e7370 100644 --- a/strbuf.c +++ b/strbuf.c @@ -712,29 +712,33 @@ char *xstrfmt(const char *fmt, ...) void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm) { + size_t hint = 128; size_t len; - /* -* strftime reports 0 if it could not fit the result in the buffer. -* Unfortunately, it also reports 0 if the requested time string -* takes 0 bytes. So if we were to probe and grow, we have to choose -* some arbitrary cap beyond which we guess that the format probably -* just results in a 0-length output. Since we have to choose some -* reasonable cap anyway, and since it is not that big, we may -* as well just grow to their in the first place. -*/ - strbuf_grow(sb, 128); + if (!*fmt) + return; + + strbuf_grow(sb, hint); len = strftime(sb-buf + sb-len, sb-alloc - sb-len, fmt, tm); if (!len) { /* -* Either we failed, or the format actually produces a 0-length -* output. There's not much we can do, so we leave it blank. -* However, the output array is left in an undefined state, so -* we must re-assert our NUL terminator. +* strftime reports 0 if it could not fit the result in the buffer. +* Unfortunately, it also reports 0 if the requested time string +* takes 0 bytes. So our strategy is to munge the format so that the +* output contains at least one character, and then drop the extra +* character before returning. */ - sb-buf[sb-len] = '\0'; - } else { - sb-len += len; + struct strbuf munged_fmt = STRBUF_INIT; + strbuf_addf(munged_fmt, %s , fmt); + while (!len) { + hint *= 2; + strbuf_grow(sb, hint); + len = strftime(sb-buf + sb-len, sb-alloc - sb-len, + munged_fmt.buf, tm); + } + strbuf_release(munged_fmt); + len--; /* drop munged space */ } + strbuf_setlen(sb, sb-len + len); } diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh index c7f368c..7c9bec7 100755 --- a/t/t6300-for-each-ref.sh +++ b/t/t6300-for-each-ref.sh @@ -235,6 +235,16 @@
Re: [PATCH 3/3] introduce format date-mode
On Mon, Jul 20, 2015 at 08:41:08PM -0400, Eric Sunshine wrote: Here's a patch, on top of jk/date-mode-format (I think it would also be fine to just squash into the tip commit; the explanation in the commit message is sufficiently mirrored in the code comment). While cleaning up old local branches, I noticed that, although the jk/date-mode-format topic[1] made it into 'next' (and will be merged to 'master' according to What's cooking[2]), the below follow-on patch[3] which improves strbuf_addftime() never got picked up. Was this omission intentional? Based upon the discussion[4], I was under the impression that the patch was considered reasonably acceptable (and did not worsen problems with bogus format strings -- which are bogus anyway). Thanks for noticing. I do think the patch you quoted (to loop and grow the strbuf) is a good change. The original code would easily bite somebody with a really large date format, whereas this should work sanely everywhere, short of malformed inputs. And even then, I'd expect reasonable behavior on most systems. The obvious thing to worry about is a system where feeding a malformed % causes strftime to return 0, no matter what, and we reallocated and loop forever. But: 1. I don't even know if such a system exists. 2. We probably would blow up on malloc() eventually, so it wouldn't even be a real infinite loop. So I think the worst case is probably that we get a report later on from somebody on an arcane system that says I fed crap to --date=format, and my git died with an out-of-memory error, and then we figure out exactly _how_ their system is weird and deal with it then. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] introduce format date-mode
On Mon, Jun 29, 2015 at 06:22:47PM -0400, Eric Sunshine wrote: Clients of strbuf rightly expect the buffer to grow as needed in order to complete the requested operation. It is, therefore, both weird and expectation-breaking for strbuf_addftime() to lack this behavior. Worse, it doesn't even signal when the format has failed due to insufficient buffer space. How about taking this approach (or something similar), instead, which grows the strbuf as needed? Here's a patch, on top of jk/date-mode-format (I think it would also be fine to just squash into the tip commit; the explanation in the commit message is sufficiently mirrored in the code comment). -- 8 -- Subject: [PATCH] strbuf: make strbuf_addftime more robust The return value of strftime is poorly designed; when it returns 0, the caller cannot tell if the buffer was not large enough, or if the output was actually 0 bytes. In the original implementation of strbuf_addftime, we simply punted and guessed that our 128-byte hint would be large enough. We can do better, though, if we're willing to treat strftime like less of a black box. We can munge the incoming format to make sure that it never produces 0-length output, and then fix the resulting output. That lets us reliably grow the buffer based on strftime's return value. Clever-idea-by: Eric Sunshine sunsh...@sunshineco.com Signed-off-by: Jeff King p...@peff.net --- strbuf.c| 38 +- t/t6300-for-each-ref.sh | 10 ++ 2 files changed, 31 insertions(+), 17 deletions(-) diff --git a/strbuf.c b/strbuf.c index a7ba028..e5e7370 100644 --- a/strbuf.c +++ b/strbuf.c @@ -712,29 +712,33 @@ char *xstrfmt(const char *fmt, ...) void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm) { + size_t hint = 128; size_t len; - /* -* strftime reports 0 if it could not fit the result in the buffer. -* Unfortunately, it also reports 0 if the requested time string -* takes 0 bytes. So if we were to probe and grow, we have to choose -* some arbitrary cap beyond which we guess that the format probably -* just results in a 0-length output. Since we have to choose some -* reasonable cap anyway, and since it is not that big, we may -* as well just grow to their in the first place. -*/ - strbuf_grow(sb, 128); + if (!*fmt) + return; + + strbuf_grow(sb, hint); len = strftime(sb-buf + sb-len, sb-alloc - sb-len, fmt, tm); if (!len) { /* -* Either we failed, or the format actually produces a 0-length -* output. There's not much we can do, so we leave it blank. -* However, the output array is left in an undefined state, so -* we must re-assert our NUL terminator. +* strftime reports 0 if it could not fit the result in the buffer. +* Unfortunately, it also reports 0 if the requested time string +* takes 0 bytes. So our strategy is to munge the format so that the +* output contains at least one character, and then drop the extra +* character before returning. */ - sb-buf[sb-len] = '\0'; - } else { - sb-len += len; + struct strbuf munged_fmt = STRBUF_INIT; + strbuf_addf(munged_fmt, %s , fmt); + while (!len) { + hint *= 2; + strbuf_grow(sb, hint); + len = strftime(sb-buf + sb-len, sb-alloc - sb-len, + munged_fmt.buf, tm); + } + strbuf_release(munged_fmt); + len--; /* drop munged space */ } + strbuf_setlen(sb, sb-len + len); } diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh index c7f368c..7c9bec7 100755 --- a/t/t6300-for-each-ref.sh +++ b/t/t6300-for-each-ref.sh @@ -235,6 +235,16 @@ test_expect_success 'Check format of strftime date fields' ' test_cmp expected actual ' +test_expect_success 'exercise strftime with odd fields' ' + echo expected + git for-each-ref --format=%(authordate:format:) refs/heads actual + test_cmp expected actual + long=long format -- $_z40$_z40$_z40$_z40$_z40$_z40$_z40 + echo $long expected + git for-each-ref --format=%(authordate:format:$long) refs/heads actual + test_cmp expected actual +' + cat expected \EOF refs/heads/master refs/remotes/origin/master -- 2.5.0.rc0.336.g8460790 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] introduce format date-mode
On Mon, Jun 29, 2015 at 06:22:47PM -0400, Eric Sunshine wrote: Clients of strbuf rightly expect the buffer to grow as needed in order to complete the requested operation. It is, therefore, both weird and expectation-breaking for strbuf_addftime() to lack this behavior. Worse, it doesn't even signal when the format has failed due to insufficient buffer space. Agreed on all points. --- 8 --- void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm) { size_t len; struct strbuf f = STRBUF_INIT; /* * This is a bit tricky since strftime returns 0 if the result did not * fit in the supplied buffer, as well as when the formatted time has * zero length. In the former case, we need to grow the buffer and try * again. To distinguish between the two cases, we supply strftime with * a format string one character longer than what the client supplied, * which ensures that a successful format will have non-zero length, * and then drop the extra character from the formatted time before * returning. */ strbuf_addf(f, %s , fmt); Basically I was trying to avoid making any assumptions about exactly how strftime works. But presumably stick a space in the format is a universally reasonable thing to do. It's a hack, but it's contained to the function. do { strbuf_grow(sb, 128); len = strftime(sb-buf + sb-len, sb-alloc - sb-len, f.buf, tm); } while (!len); I think we need to keep growing this 128 ourselves, or else each loop iteration will just say yup, we have 128 bytes available; no need to grow. [...] If this is performance critical code, then the augmented format string can be constructed with less expensive functions than strbuf_addf(). This does get called a lot (e.g., once per commit). One extra allocation would probably not kill us there, but I think we could fairly trivially put this on the unlikely path: size_t hint = 128; size_t len; /* optimize out obvious 0-length case */ if (!*fmt) return; strbuf_grow(sb, hint); len = strftime(sb-buf + sb-len, sb-alloc - sb-len, fmt, tm); /* maybe not enough room, or maybe 0-length output */ if (!len) { struct strbuf f = STRBUF_INIT; strbuf_addf(f, %s , fmt); while (!len) { hint *= 2; strbuf_grow(sb, hint); len = strftime(sb-buf + sb-len, sb-alloc - sb-len, f.buf, tm); } } I'd guess most cases will fit in 128 bytes and never even hit this code path. You could also get fancier and start the buffer smaller, but only do the fmt hack when we cross a threshold. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] introduce format date-mode
Jeff King p...@peff.net writes: strbuf_addf(f, %s , fmt); Basically I was trying to avoid making any assumptions about exactly how strftime works. But presumably stick a space in the format is a universally reasonable thing to do. It's a hack, but it's contained to the function. Why can't I shake this feeling that ( %s, fmt), i.e. prepend not append, is the safer thing to do than to append? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] introduce format date-mode
On Tue, Jun 30, 2015 at 1:58 PM, Jeff King p...@peff.net wrote: On Tue, Jun 30, 2015 at 12:58:33PM -0400, Eric Sunshine wrote: Beyond the extra allocation, I was also concerned about the sledgehammer approach of %s to append a single character when there are much less expensive ways to do so. I don't think there's any other way. We have to feed a contiguous buffer to strftime, and we don't own the buffer, so we have to make a new copy. Sorry, I meant that the interpolation expense of %s . A cheaper (but more verbose) alternative might be: size_t n = strlen(fmt); const char *f = xmalloc(n + 2); strcpy(f, fmt); f[n] = ' '; f[n + 1] = '\0'; ... free(f); or something similar. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] introduce format date-mode
On Tue, Jun 30, 2015 at 09:22:18AM -0700, Junio C Hamano wrote: Jeff King p...@peff.net writes: strbuf_addf(f, %s , fmt); Basically I was trying to avoid making any assumptions about exactly how strftime works. But presumably stick a space in the format is a universally reasonable thing to do. It's a hack, but it's contained to the function. Why can't I shake this feeling that ( %s, fmt), i.e. prepend not append, is the safer thing to do than to append? Because then removing the extra space involves `memmove` of the buffer, rather than just shortening the length by one. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] introduce format date-mode
On Tue, Jun 30, 2015 at 6:20 AM, Jeff King p...@peff.net wrote: On Mon, Jun 29, 2015 at 06:22:47PM -0400, Eric Sunshine wrote: void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm) { size_t len; struct strbuf f = STRBUF_INIT; /* * This is a bit tricky since strftime returns 0 if the result did not * fit in the supplied buffer, as well as when the formatted time has * zero length. In the former case, we need to grow the buffer and try * again. To distinguish between the two cases, we supply strftime with * a format string one character longer than what the client supplied, * which ensures that a successful format will have non-zero length, * and then drop the extra character from the formatted time before * returning. */ strbuf_addf(f, %s , fmt); Basically I was trying to avoid making any assumptions about exactly how strftime works. But presumably stick a space in the format is a universally reasonable thing to do. It's a hack, but it's contained to the function. I don't think we're making any assumptions about strftime(). POSIX states: The format string consists of zero or more conversion specifications and ordinary characters. [...] All ordinary characters (including the terminating NUL character) are copied unchanged into the array. So, we seem to be on solid footing with this approach (even though it's a localized hack). do { strbuf_grow(sb, 128); len = strftime(sb-buf + sb-len, sb-alloc - sb-len, f.buf, tm); } while (!len); I think we need to keep growing this 128 ourselves, or else each loop iteration will just say yup, we have 128 bytes available; no need to grow. Yeah, I toyed with the idea of increasing the requested amount each iteration but wanted to keep the example simple, thus left it out. However, for some reason, I was thinking that strbuf_grow() was unconditionally expanding the buffer by the requested amount rather than merely ensuring that that amount was availabile, so the amount clearly needs to be increased on each iteration. Thanks for pointing that out. If this is performance critical code, then the augmented format string can be constructed with less expensive functions than strbuf_addf(). This does get called a lot (e.g., once per commit). One extra allocation would probably not kill us there [...] Beyond the extra allocation, I was also concerned about the sledgehammer approach of %s to append a single character when there are much less expensive ways to do so. [...] but I think we could fairly trivially put this on the unlikely path: size_t hint = 128; size_t len; /* optimize out obvious 0-length case */ if (!*fmt) return; strbuf_grow(sb, hint); len = strftime(sb-buf + sb-len, sb-alloc - sb-len, fmt, tm); /* maybe not enough room, or maybe 0-length output */ if (!len) { struct strbuf f = STRBUF_INIT; strbuf_addf(f, %s , fmt); while (!len) { hint *= 2; strbuf_grow(sb, hint); len = strftime(sb-buf + sb-len, sb-alloc - sb-len, f.buf, tm); } } I'd guess most cases will fit in 128 bytes and never even hit this code path. You could also get fancier and start the buffer smaller, but only do the fmt hack when we cross a threshold. Yep, I toyed with other looping constructs as well before settling upon do-while in the example for its simplicity. If called often enough, this sort of optimization seems reasonable enough. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] introduce format date-mode
Jeff King p...@peff.net writes: This does get called a lot (e.g., once per commit). One extra allocation would probably not kill us there, but I think we could fairly trivially put this on the unlikely path: size_t hint = 128; size_t len; /* optimize out obvious 0-length case */ if (!*fmt) return; strbuf_grow(sb, hint); len = strftime(sb-buf + sb-len, sb-alloc - sb-len, fmt, tm); /* maybe not enough room, or maybe 0-length output */ if (!len) { struct strbuf f = STRBUF_INIT; strbuf_addf(f, %s , fmt); while (!len) { hint *= 2; strbuf_grow(sb, hint); len = strftime(sb-buf + sb-len, sb-alloc - sb-len, f.buf, tm); } } I'd guess most cases will fit in 128 bytes and never even hit this code path. You could also get fancier and start the buffer smaller, but only do the fmt hack when we cross a threshold. I'd assume that the hint thing will persist across calls somehow? In an invocation of git log --date=format:some format for millions of commits, it is likely that the length of the formatted date string will stay the same or close to the same (yeah, I know Wednesday would be longer than Monday). Answering myself to my earlier question, the reason is because I was worried what happens when given fmt is a malformed strftime format specifier. Perhaps it ends with a lone % and % may format to something unexpected, or something. Are we checking an error from strftime(3)? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] introduce format date-mode
On Tue, Jun 30, 2015 at 9:26 AM, Jeff King p...@peff.net wrote: On Mon, Jun 29, 2015 at 06:22:47PM -0400, Eric Sunshine wrote: Clients of strbuf rightly expect the buffer to grow as needed in order to complete the requested operation. It is, therefore, both weird and expectation-breaking for strbuf_addftime() to lack this behavior. Worse, it doesn't even signal when the format has failed due to insufficient buffer space. How about taking this approach (or something similar), instead, which grows the strbuf as needed? Here's a patch, on top of jk/date-mode-format (I think it would also be fine to just squash into the tip commit; the explanation in the commit message is sufficiently mirrored in the code comment). As a logical change in itself, I could also see introduction of strbuf_addftime() split out into its own patch (with this patch squashed in). Either way, it's a welcome improvement over the original. -- 8 -- Subject: [PATCH] strbuf: make strbuf_addftime more robust The return value of strftime is poorly designed; when it returns 0, the caller cannot tell if the buffer was not large enough, or if the output was actually 0 bytes. In the original implementation of strbuf_addftime, we simply punted and guessed that our 128-byte hint would be large enough. We can do better, though, if we're willing to treat strftime like less of a black box. We can munge the incoming format to make sure that it never produces 0-length output, and then fix the resulting output. That lets us reliably grow the buffer based on strftime's return value. Clever-idea-by: Eric Sunshine sunsh...@sunshineco.com Signed-off-by: Jeff King p...@peff.net --- strbuf.c| 38 +- t/t6300-for-each-ref.sh | 10 ++ 2 files changed, 31 insertions(+), 17 deletions(-) diff --git a/strbuf.c b/strbuf.c index a7ba028..e5e7370 100644 --- a/strbuf.c +++ b/strbuf.c @@ -712,29 +712,33 @@ char *xstrfmt(const char *fmt, ...) void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm) { + size_t hint = 128; size_t len; - /* -* strftime reports 0 if it could not fit the result in the buffer. -* Unfortunately, it also reports 0 if the requested time string -* takes 0 bytes. So if we were to probe and grow, we have to choose -* some arbitrary cap beyond which we guess that the format probably -* just results in a 0-length output. Since we have to choose some -* reasonable cap anyway, and since it is not that big, we may -* as well just grow to their in the first place. -*/ - strbuf_grow(sb, 128); + if (!*fmt) + return; + + strbuf_grow(sb, hint); len = strftime(sb-buf + sb-len, sb-alloc - sb-len, fmt, tm); if (!len) { /* -* Either we failed, or the format actually produces a 0-length -* output. There's not much we can do, so we leave it blank. -* However, the output array is left in an undefined state, so -* we must re-assert our NUL terminator. +* strftime reports 0 if it could not fit the result in the buffer. +* Unfortunately, it also reports 0 if the requested time string +* takes 0 bytes. So our strategy is to munge the format so that the +* output contains at least one character, and then drop the extra +* character before returning. */ - sb-buf[sb-len] = '\0'; - } else { - sb-len += len; + struct strbuf munged_fmt = STRBUF_INIT; + strbuf_addf(munged_fmt, %s , fmt); + while (!len) { + hint *= 2; + strbuf_grow(sb, hint); + len = strftime(sb-buf + sb-len, sb-alloc - sb-len, + munged_fmt.buf, tm); + } + strbuf_release(munged_fmt); + len--; /* drop munged space */ } + strbuf_setlen(sb, sb-len + len); } diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh index c7f368c..7c9bec7 100755 --- a/t/t6300-for-each-ref.sh +++ b/t/t6300-for-each-ref.sh @@ -235,6 +235,16 @@ test_expect_success 'Check format of strftime date fields' ' test_cmp expected actual ' +test_expect_success 'exercise strftime with odd fields' ' + echo expected + git for-each-ref --format=%(authordate:format:) refs/heads actual + test_cmp expected actual + long=long format -- $_z40$_z40$_z40$_z40$_z40$_z40$_z40 + echo $long expected + git for-each-ref --format=%(authordate:format:$long) refs/heads actual + test_cmp expected actual +' + cat expected \EOF refs/heads/master
Re: [PATCH 3/3] introduce format date-mode
On Tue, Jun 30, 2015 at 12:58:33PM -0400, Eric Sunshine wrote: Basically I was trying to avoid making any assumptions about exactly how strftime works. But presumably stick a space in the format is a universally reasonable thing to do. It's a hack, but it's contained to the function. I don't think we're making any assumptions about strftime(). POSIX states: The format string consists of zero or more conversion specifications and ordinary characters. [...] All ordinary characters (including the terminating NUL character) are copied unchanged into the array. So, we seem to be on solid footing with this approach (even though it's a localized hack). Yeah, sorry I wasn't more clear. I had originally been thinking of making assumptions like well, %c cannot ever be blank. But your solution does not suffer from that level of knowledge. I think it is reasonably clever. Yeah, I toyed with the idea of increasing the requested amount each iteration but wanted to keep the example simple, thus left it out. However, for some reason, I was thinking that strbuf_grow() was unconditionally expanding the buffer by the requested amount rather than merely ensuring that that amount was availabile, so the amount clearly needs to be increased on each iteration. Thanks for pointing that out. FWIW, I had to look at it to double-check. I've often made the same mistake. Beyond the extra allocation, I was also concerned about the sledgehammer approach of %s to append a single character when there are much less expensive ways to do so. I don't think there's any other way. We have to feed a contiguous buffer to strftime, and we don't own the buffer, so we have to make a new copy. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] introduce format date-mode
On Tue, Jun 30, 2015 at 1:05 PM, Junio C Hamano gits...@pobox.com wrote: Answering myself to my earlier question, the reason is because I was worried what happens when given fmt is a malformed strftime format specifier. Perhaps it ends with a lone % and % may format to something unexpected, or something. That's a good point. I had considered prepending the extra character (space) rather than appending it but eventually rejected it to avoid the expense of shifting the characters down by one before returning the formatted string. However, is it our responsibility to guard against a malformed format? POSIX doesn't state the behavior of % , so I don't think we are any worse off by appending space to a malformed format ending with % since the malformed format could wreak havoc even without our transformation. Are we checking an error from strftime(3)? According to the BUGS section in POSIX: If the output string would exceed max bytes, errno is not set. This makes it impossible to distinguish this error case from cases where the format string legitimately produces a zero-length output string. POSIX.1-2001 does not specify any errno settings for strftime(). So, there does not seem to be a point in checking 'errno'. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] introduce format date-mode
On Tue, Jun 30, 2015 at 10:05:33AM -0700, Junio C Hamano wrote: I'd guess most cases will fit in 128 bytes and never even hit this code path. You could also get fancier and start the buffer smaller, but only do the fmt hack when we cross a threshold. I'd assume that the hint thing will persist across calls somehow? In an invocation of git log --date=format:some format for millions of commits, it is likely that the length of the formatted date string will stay the same or close to the same (yeah, I know Wednesday would be longer than Monday). I hadn't thought about that. It could persist, but I don't think this is necessarily the right place to do it. For two reasons: 1. You have no idea in strbuf_addftime if it's the same fmt being added over and over. This is the wrong place to make that optimization. 2. If you are interested in efficiency in a loop, then you should be reusing the same strbuf over and over, and avoiding the extra allocation in the first place. And that is indeed what we do for git log --date, as we will always use the same static-local buffer in show_date(). Answering myself to my earlier question, the reason is because I was worried what happens when given fmt is a malformed strftime format specifier. Perhaps it ends with a lone % and % may format to something unexpected, or something. Are we checking an error from strftime(3)? POSIX doesn't define any errno values for strftime (and in fact says No errors are defined). The return value description for POSIX (and the glibc manpage) talk about only whether or not the output fits. However, POSIX does say If a conversion specifier is not one of the above, the behavior is undefined. So certainly I could imagine an implementation that returns 0 when you feed it a bogus value. If you (as a user) feed us crap to give to strftime, I am not particularly concerned with whether you get crap out. My main concern is that it would return 0 and we would loop forever. OTOH, I think any sane implementation would simply copy unknown placeholders out (certainly glibc does that). So I think we could simply consider it a quality of implementation issue, and deal with any particular crappy implementations if and when they get reported. We could add something tricky (like --date=format:%) to the test suite to make it likelier to catch such a thing. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] introduce format date-mode
On Tue, Jun 30, 2015 at 02:13:53PM -0400, Eric Sunshine wrote: Sorry, I meant that the interpolation expense of %s . A cheaper (but more verbose) alternative might be: size_t n = strlen(fmt); const char *f = xmalloc(n + 2); strcpy(f, fmt); f[n] = ' '; f[n + 1] = '\0'; ... free(f); or something similar. I think you're probably getting into premature optimization here. Using strbuf_vaddf should be within the same order of magnitude of instructions (and I think we should almost never hit this code path anyway, because we'll be reading into a static strbuf generally which will come pre-sized to hold a date). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] introduce format date-mode
On Thu, Jun 25, 2015 at 12:55:45PM -0400, Jeff King wrote: This feeds the format directly to strftime. Besides being a little more flexible, the main advantage is that your system strftime may know more about your locale's preferred format (e.g., how to spell the days of the week). Signed-off-by: Jeff King p...@peff.net --- diff --git a/strbuf.c b/strbuf.c index 0d4f4e5..a7ba028 100644 --- a/strbuf.c +++ b/strbuf.c @@ -709,3 +709,32 @@ char *xstrfmt(const char *fmt, ...) +void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm) +{ + size_t len; + + /* + * strftime reports 0 if it could not fit the result in the buffer. + * Unfortunately, it also reports 0 if the requested time string + * takes 0 bytes. So if we were to probe and grow, we have to choose + * some arbitrary cap beyond which we guess that the format probably + * just results in a 0-length output. Since we have to choose some + * reasonable cap anyway, and since it is not that big, we may + * as well just grow to their in the first place. + */ + strbuf_grow(sb, 128); + len = strftime(sb-buf + sb-len, sb-alloc - sb-len, fmt, tm); + + if (!len) { + /* + * Either we failed, or the format actually produces a 0-length + * output. There's not much we can do, so we leave it blank. + * However, the output array is left in an undefined state, so + * we must re-assert our NUL terminator. + */ + sb-buf[sb-len] = '\0'; + } else { + sb-len += len; + } +} Clients of strbuf rightly expect the buffer to grow as needed in order to complete the requested operation. It is, therefore, both weird and expectation-breaking for strbuf_addftime() to lack this behavior. Worse, it doesn't even signal when the format has failed due to insufficient buffer space. How about taking this approach (or something similar), instead, which grows the strbuf as needed? --- 8 --- void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm) { size_t len; struct strbuf f = STRBUF_INIT; /* * This is a bit tricky since strftime returns 0 if the result did not * fit in the supplied buffer, as well as when the formatted time has * zero length. In the former case, we need to grow the buffer and try * again. To distinguish between the two cases, we supply strftime with * a format string one character longer than what the client supplied, * which ensures that a successful format will have non-zero length, * and then drop the extra character from the formatted time before * returning. */ strbuf_addf(f, %s , fmt); do { strbuf_grow(sb, 128); len = strftime(sb-buf + sb-len, sb-alloc - sb-len, f.buf, tm); } while (!len); strbuf_setlen(sb, sb-len + len - 1); strbuf_release(f); } --- 8 --- If this is performance critical code, then the augmented format string can be constructed with less expensive functions than strbuf_addf(). -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] introduce format date-mode
This feeds the format directly to strftime. Besides being a little more flexible, the main advantage is that your system strftime may know more about your locale's preferred format (e.g., how to spell the days of the week). Signed-off-by: Jeff King p...@peff.net --- Documentation/rev-list-options.txt | 5 + builtin/blame.c| 3 +++ cache.h| 2 ++ date.c | 9 - gettext.c | 1 + strbuf.c | 29 + strbuf.h | 5 + t/t6300-for-each-ref.sh| 8 8 files changed, 61 insertions(+), 1 deletion(-) diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt index 77ac439..a9b808f 100644 --- a/Documentation/rev-list-options.txt +++ b/Documentation/rev-list-options.txt @@ -727,6 +727,11 @@ format, often found in email messages. + `--date=raw` shows the date in the internal raw Git format `%s %z` format. + +`--date=format:...` feeds the format `...` to your system `strftime`. +Use `--date=format:%c` to show the date in your system locale's +preferred format. See the `strftime` manual for a complete list of +format placeholders. ++ `--date=default` shows timestamps in the original time zone (either committer's or author's). diff --git a/builtin/blame.c b/builtin/blame.c index 43e8473..e2e3e75 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -2596,6 +2596,9 @@ parse_done: case DATE_NORMAL: blame_date_width = sizeof(Thu Oct 19 16:00:04 2006 -0700); break; + case DATE_STRFTIME: + blame_date_width = strlen(show_date(0, 0, blame_date_mode)) + 1; /* add the null */ + break; } blame_date_width -= 1; /* strip the null */ diff --git a/cache.h b/cache.h index 1759011..bb63a58 100644 --- a/cache.h +++ b/cache.h @@ -1105,8 +1105,10 @@ struct date_mode { DATE_ISO8601, DATE_ISO8601_STRICT, DATE_RFC2822, + DATE_STRFTIME, DATE_RAW } type; + const char *strftime_fmt; }; /* diff --git a/date.c b/date.c index cdad4db..8f91569 100644 --- a/date.c +++ b/date.c @@ -163,6 +163,8 @@ void show_date_relative(unsigned long time, int tz, struct date_mode *date_mode_from_type(enum date_mode_type type) { static struct date_mode mode; + if (type == DATE_STRFTIME) + die(BUG: cannot create anonymous strftime date_mode struct); mode.type = type; return mode; } @@ -221,6 +223,8 @@ const char *show_date(unsigned long time, int tz, const struct date_mode *mode) weekday_names[tm-tm_wday], tm-tm_mday, month_names[tm-tm_mon], tm-tm_year + 1900, tm-tm_hour, tm-tm_min, tm-tm_sec, tz); + else if (mode-type == DATE_STRFTIME) + strbuf_addftime(timebuf, mode-strftime_fmt, tm); else strbuf_addf(timebuf, %.3s %.3s %d %02d:%02d:%02d %d%c%+05d, weekday_names[tm-tm_wday], @@ -787,7 +791,10 @@ void parse_date_format(const char *format, struct date_mode *mode) mode-type = DATE_NORMAL; else if (!strcmp(format, raw)) mode-type = DATE_RAW; - else + else if (skip_prefix(format, format:, format)) { + mode-type = DATE_STRFTIME; + mode-strftime_fmt = xstrdup(format); + } else die(unknown date format %s, format); } diff --git a/gettext.c b/gettext.c index 7378ba2..a268a2c 100644 --- a/gettext.c +++ b/gettext.c @@ -162,6 +162,7 @@ void git_setup_gettext(void) podir = GIT_LOCALE_PATH; bindtextdomain(git, podir); setlocale(LC_MESSAGES, ); + setlocale(LC_TIME, ); init_gettext_charset(git); textdomain(git); } diff --git a/strbuf.c b/strbuf.c index 0d4f4e5..a7ba028 100644 --- a/strbuf.c +++ b/strbuf.c @@ -709,3 +709,32 @@ char *xstrfmt(const char *fmt, ...) return ret; } + +void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm) +{ + size_t len; + + /* +* strftime reports 0 if it could not fit the result in the buffer. +* Unfortunately, it also reports 0 if the requested time string +* takes 0 bytes. So if we were to probe and grow, we have to choose +* some arbitrary cap beyond which we guess that the format probably +* just results in a 0-length output. Since we have to choose some +* reasonable cap anyway, and since it is not that big, we may +* as well just grow to their in the first place. +*/ + strbuf_grow(sb, 128); + len = strftime(sb-buf + sb-len, sb-alloc - sb-len, fmt, tm); + + if (!len) { + /* +* Either we failed, or the format