Re: [PATCH 3/3] introduce format date-mode

2015-07-20 Thread Eric Sunshine
On Tue, Jun 30, 2015 at 9:26 AM, Jeff King p...@peff.net wrote:
 On Mon, Jun 29, 2015 at 06:22:47PM -0400, Eric Sunshine wrote:
 Clients of strbuf rightly expect the buffer to grow as needed in
 order to complete the requested operation. It is, therefore, both
 weird and expectation-breaking for strbuf_addftime() to lack this
 behavior. Worse, it doesn't even signal when the format has failed
 due to insufficient buffer space.

 How about taking this approach (or something similar), instead, which
 grows the strbuf as needed?

 Here's a patch, on top of jk/date-mode-format (I think it would also be
 fine to just squash into the tip commit; the explanation in the commit
 message is sufficiently mirrored in the code comment).

While cleaning up old local branches, I noticed that, although the
jk/date-mode-format topic[1] made it into 'next' (and will be merged
to 'master' according to What's cooking[2]), the below follow-on
patch[3] which improves strbuf_addftime() never got picked up. Was
this omission intentional? Based upon the discussion[4], I was under
the impression that the patch was considered reasonably acceptable
(and did not worsen problems with bogus format strings -- which are
bogus anyway).

[1]: http://thread.gmane.org/gmane.comp.version-control.git/272658/focus=272695
[2]: http://news.gmane.org/gmane.comp.version-control.git
[3]: http://article.gmane.org/gmane.comp.version-control.git/273061
[4]: http://thread.gmane.org/gmane.comp.version-control.git/272658/focus=273026

 -- 8 --
 Subject: [PATCH] strbuf: make strbuf_addftime more robust

 The return value of strftime is poorly designed; when it
 returns 0, the caller cannot tell if the buffer was not
 large enough, or if the output was actually 0 bytes. In the
 original implementation of strbuf_addftime, we simply punted
 and guessed that our 128-byte hint would be large enough.

 We can do better, though, if we're willing to treat strftime
 like less of a black box. We can munge the incoming format
 to make sure that it never produces 0-length output, and
 then fix the resulting output.  That lets us reliably grow
 the buffer based on strftime's return value.

 Clever-idea-by: Eric Sunshine sunsh...@sunshineco.com
 Signed-off-by: Jeff King p...@peff.net
 ---
  strbuf.c| 38 +-
  t/t6300-for-each-ref.sh | 10 ++
  2 files changed, 31 insertions(+), 17 deletions(-)

 diff --git a/strbuf.c b/strbuf.c
 index a7ba028..e5e7370 100644
 --- a/strbuf.c
 +++ b/strbuf.c
 @@ -712,29 +712,33 @@ char *xstrfmt(const char *fmt, ...)

  void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm)
  {
 +   size_t hint = 128;
 size_t len;

 -   /*
 -* strftime reports 0 if it could not fit the result in the buffer.
 -* Unfortunately, it also reports 0 if the requested time string
 -* takes 0 bytes. So if we were to probe and grow, we have to choose
 -* some arbitrary cap beyond which we guess that the format probably
 -* just results in a 0-length output. Since we have to choose some
 -* reasonable cap anyway, and since it is not that big, we may
 -* as well just grow to their in the first place.
 -*/
 -   strbuf_grow(sb, 128);
 +   if (!*fmt)
 +   return;
 +
 +   strbuf_grow(sb, hint);
 len = strftime(sb-buf + sb-len, sb-alloc - sb-len, fmt, tm);

 if (!len) {
 /*
 -* Either we failed, or the format actually produces a 
 0-length
 -* output. There's not much we can do, so we leave it blank.
 -* However, the output array is left in an undefined state, so
 -* we must re-assert our NUL terminator.
 +* strftime reports 0 if it could not fit the result in the 
 buffer.
 +* Unfortunately, it also reports 0 if the requested time 
 string
 +* takes 0 bytes. So our strategy is to munge the format so 
 that the
 +* output contains at least one character, and then drop the 
 extra
 +* character before returning.
  */
 -   sb-buf[sb-len] = '\0';
 -   } else {
 -   sb-len += len;
 +   struct strbuf munged_fmt = STRBUF_INIT;
 +   strbuf_addf(munged_fmt, %s , fmt);
 +   while (!len) {
 +   hint *= 2;
 +   strbuf_grow(sb, hint);
 +   len = strftime(sb-buf + sb-len, sb-alloc - sb-len,
 +  munged_fmt.buf, tm);
 +   }
 +   strbuf_release(munged_fmt);
 +   len--; /* drop munged space */
 }
 +   strbuf_setlen(sb, sb-len + len);
  }
 diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
 index c7f368c..7c9bec7 100755
 --- a/t/t6300-for-each-ref.sh
 +++ b/t/t6300-for-each-ref.sh
 @@ -235,6 +235,16 @@ 

Re: [PATCH 3/3] introduce format date-mode

2015-07-20 Thread Jeff King
On Mon, Jul 20, 2015 at 08:41:08PM -0400, Eric Sunshine wrote:

  Here's a patch, on top of jk/date-mode-format (I think it would also be
  fine to just squash into the tip commit; the explanation in the commit
  message is sufficiently mirrored in the code comment).
 
 While cleaning up old local branches, I noticed that, although the
 jk/date-mode-format topic[1] made it into 'next' (and will be merged
 to 'master' according to What's cooking[2]), the below follow-on
 patch[3] which improves strbuf_addftime() never got picked up. Was
 this omission intentional? Based upon the discussion[4], I was under
 the impression that the patch was considered reasonably acceptable
 (and did not worsen problems with bogus format strings -- which are
 bogus anyway).

Thanks for noticing. I do think the patch you quoted (to loop and grow
the strbuf) is a good change. The original code would easily bite
somebody with a really large date format, whereas this should work
sanely everywhere, short of malformed inputs. And even then, I'd expect
reasonable behavior on most systems. The obvious thing to worry about is
a system where feeding a malformed %  causes strftime to return 0, no
matter what, and we reallocated and loop forever. But:

  1. I don't even know if such a system exists.

  2. We probably would blow up on malloc() eventually, so it wouldn't
 even be a real infinite loop.

So I think the worst case is probably that we get a report later on from
somebody on an arcane system that says I fed crap to --date=format, and
my git died with an out-of-memory error, and then we figure out exactly
_how_ their system is weird and deal with it then.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] introduce format date-mode

2015-06-30 Thread Jeff King
On Mon, Jun 29, 2015 at 06:22:47PM -0400, Eric Sunshine wrote:

 Clients of strbuf rightly expect the buffer to grow as needed in
 order to complete the requested operation. It is, therefore, both
 weird and expectation-breaking for strbuf_addftime() to lack this
 behavior. Worse, it doesn't even signal when the format has failed
 due to insufficient buffer space.
 
 How about taking this approach (or something similar), instead, which
 grows the strbuf as needed?

Here's a patch, on top of jk/date-mode-format (I think it would also be
fine to just squash into the tip commit; the explanation in the commit
message is sufficiently mirrored in the code comment).

-- 8 --
Subject: [PATCH] strbuf: make strbuf_addftime more robust

The return value of strftime is poorly designed; when it
returns 0, the caller cannot tell if the buffer was not
large enough, or if the output was actually 0 bytes. In the
original implementation of strbuf_addftime, we simply punted
and guessed that our 128-byte hint would be large enough.

We can do better, though, if we're willing to treat strftime
like less of a black box. We can munge the incoming format
to make sure that it never produces 0-length output, and
then fix the resulting output.  That lets us reliably grow
the buffer based on strftime's return value.

Clever-idea-by: Eric Sunshine sunsh...@sunshineco.com
Signed-off-by: Jeff King p...@peff.net
---
 strbuf.c| 38 +-
 t/t6300-for-each-ref.sh | 10 ++
 2 files changed, 31 insertions(+), 17 deletions(-)

diff --git a/strbuf.c b/strbuf.c
index a7ba028..e5e7370 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -712,29 +712,33 @@ char *xstrfmt(const char *fmt, ...)
 
 void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm)
 {
+   size_t hint = 128;
size_t len;
 
-   /*
-* strftime reports 0 if it could not fit the result in the buffer.
-* Unfortunately, it also reports 0 if the requested time string
-* takes 0 bytes. So if we were to probe and grow, we have to choose
-* some arbitrary cap beyond which we guess that the format probably
-* just results in a 0-length output. Since we have to choose some
-* reasonable cap anyway, and since it is not that big, we may
-* as well just grow to their in the first place.
-*/
-   strbuf_grow(sb, 128);
+   if (!*fmt)
+   return;
+
+   strbuf_grow(sb, hint);
len = strftime(sb-buf + sb-len, sb-alloc - sb-len, fmt, tm);
 
if (!len) {
/*
-* Either we failed, or the format actually produces a 0-length
-* output. There's not much we can do, so we leave it blank.
-* However, the output array is left in an undefined state, so
-* we must re-assert our NUL terminator.
+* strftime reports 0 if it could not fit the result in the 
buffer.
+* Unfortunately, it also reports 0 if the requested time 
string
+* takes 0 bytes. So our strategy is to munge the format so 
that the
+* output contains at least one character, and then drop the 
extra
+* character before returning.
 */
-   sb-buf[sb-len] = '\0';
-   } else {
-   sb-len += len;
+   struct strbuf munged_fmt = STRBUF_INIT;
+   strbuf_addf(munged_fmt, %s , fmt);
+   while (!len) {
+   hint *= 2;
+   strbuf_grow(sb, hint);
+   len = strftime(sb-buf + sb-len, sb-alloc - sb-len,
+  munged_fmt.buf, tm);
+   }
+   strbuf_release(munged_fmt);
+   len--; /* drop munged space */
}
+   strbuf_setlen(sb, sb-len + len);
 }
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index c7f368c..7c9bec7 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -235,6 +235,16 @@ test_expect_success 'Check format of strftime date fields' 
'
test_cmp expected actual
 '
 
+test_expect_success 'exercise strftime with odd fields' '
+   echo expected 
+   git for-each-ref --format=%(authordate:format:) refs/heads actual 
+   test_cmp expected actual 
+   long=long format -- $_z40$_z40$_z40$_z40$_z40$_z40$_z40 
+   echo $long expected 
+   git for-each-ref --format=%(authordate:format:$long) refs/heads 
actual 
+   test_cmp expected actual
+'
+
 cat expected \EOF
 refs/heads/master
 refs/remotes/origin/master
-- 
2.5.0.rc0.336.g8460790

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] introduce format date-mode

2015-06-30 Thread Jeff King
On Mon, Jun 29, 2015 at 06:22:47PM -0400, Eric Sunshine wrote:

 Clients of strbuf rightly expect the buffer to grow as needed in
 order to complete the requested operation. It is, therefore, both
 weird and expectation-breaking for strbuf_addftime() to lack this
 behavior. Worse, it doesn't even signal when the format has failed
 due to insufficient buffer space.

Agreed on all points.

 --- 8 ---
 void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm)
 {
   size_t len;
   struct strbuf f = STRBUF_INIT;
 
   /*
* This is a bit tricky since strftime returns 0 if the result did not
* fit in the supplied buffer, as well as when the formatted time has
* zero length. In the former case, we need to grow the buffer and try
* again. To distinguish between the two cases, we supply strftime with
* a format string one character longer than what the client supplied,
* which ensures that a successful format will have non-zero length,
* and then drop the extra character from the formatted time before
* returning.
*/
   strbuf_addf(f, %s , fmt);

Basically I was trying to avoid making any assumptions about exactly how
strftime works. But presumably stick a space in the format is a
universally reasonable thing to do. It's a hack, but it's contained to
the function.

   do {
   strbuf_grow(sb, 128);
   len = strftime(sb-buf + sb-len, sb-alloc - sb-len,
  f.buf, tm);
   } while (!len);

I think we need to keep growing this 128 ourselves, or else each loop
iteration will just say yup, we have 128 bytes available; no need to
grow.


 [...]
 
 If this is performance critical code, then the augmented format
 string can be constructed with less expensive functions than
 strbuf_addf().

This does get called a lot (e.g., once per commit). One extra allocation
would probably not kill us there, but I think we could fairly trivially
put this on the unlikely path:

  size_t hint = 128;
  size_t len;

  /* optimize out obvious 0-length case */
  if (!*fmt)
return;

  strbuf_grow(sb, hint);
  len = strftime(sb-buf + sb-len, sb-alloc - sb-len, fmt, tm);

  /* maybe not enough room, or maybe 0-length output */
  if (!len) {
struct strbuf f = STRBUF_INIT;
strbuf_addf(f, %s , fmt);
while (!len) {
hint *= 2;
strbuf_grow(sb, hint);
len = strftime(sb-buf + sb-len, sb-alloc - sb-len, f.buf, 
tm);
}
  }

I'd guess most cases will fit in 128 bytes and never even hit this code
path. You could also get fancier and start the buffer smaller, but only
do the fmt hack when we cross a threshold.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] introduce format date-mode

2015-06-30 Thread Junio C Hamano
Jeff King p...@peff.net writes:

  strbuf_addf(f, %s , fmt);

 Basically I was trying to avoid making any assumptions about exactly how
 strftime works. But presumably stick a space in the format is a
 universally reasonable thing to do. It's a hack, but it's contained to
 the function.

Why can't I shake this feeling that ( %s, fmt), i.e. prepend not
append, is the safer thing to do than to append?
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] introduce format date-mode

2015-06-30 Thread Eric Sunshine
On Tue, Jun 30, 2015 at 1:58 PM, Jeff King p...@peff.net wrote:
 On Tue, Jun 30, 2015 at 12:58:33PM -0400, Eric Sunshine wrote:
 Beyond the extra allocation, I was also concerned about the
 sledgehammer approach of %s  to append a single character when there
 are much less expensive ways to do so.

 I don't think there's any other way. We have to feed a contiguous buffer
 to strftime, and we don't own the buffer, so we have to make a new copy.

Sorry, I meant that the interpolation expense of %s . A cheaper (but
more verbose) alternative might be:

size_t n = strlen(fmt);
const char *f = xmalloc(n + 2);
strcpy(f, fmt);
f[n] = ' ';
f[n + 1] = '\0';
...
free(f);

or something similar.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] introduce format date-mode

2015-06-30 Thread Jeff King
On Tue, Jun 30, 2015 at 09:22:18AM -0700, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
 strbuf_addf(f, %s , fmt);
 
  Basically I was trying to avoid making any assumptions about exactly how
  strftime works. But presumably stick a space in the format is a
  universally reasonable thing to do. It's a hack, but it's contained to
  the function.
 
 Why can't I shake this feeling that ( %s, fmt), i.e. prepend not
 append, is the safer thing to do than to append?

Because then removing the extra space involves `memmove` of the buffer,
rather than just shortening the length by one.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] introduce format date-mode

2015-06-30 Thread Eric Sunshine
On Tue, Jun 30, 2015 at 6:20 AM, Jeff King p...@peff.net wrote:
 On Mon, Jun 29, 2015 at 06:22:47PM -0400, Eric Sunshine wrote:
 void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm)
 {
   size_t len;
   struct strbuf f = STRBUF_INIT;

   /*
* This is a bit tricky since strftime returns 0 if the result did not
* fit in the supplied buffer, as well as when the formatted time has
* zero length. In the former case, we need to grow the buffer and try
* again. To distinguish between the two cases, we supply strftime with
* a format string one character longer than what the client supplied,
* which ensures that a successful format will have non-zero length,
* and then drop the extra character from the formatted time before
* returning.
*/
   strbuf_addf(f, %s , fmt);

 Basically I was trying to avoid making any assumptions about exactly how
 strftime works. But presumably stick a space in the format is a
 universally reasonable thing to do. It's a hack, but it's contained to
 the function.

I don't think we're making any assumptions about strftime(). POSIX states:

The format string consists of zero or more conversion
specifications and ordinary characters. [...] All ordinary
characters (including the terminating NUL character) are copied
unchanged into the array.

So, we seem to be on solid footing with this approach (even though
it's a localized hack).

   do {
   strbuf_grow(sb, 128);
   len = strftime(sb-buf + sb-len, sb-alloc - sb-len,
  f.buf, tm);
   } while (!len);

 I think we need to keep growing this 128 ourselves, or else each loop
 iteration will just say yup, we have 128 bytes available; no need to
 grow.

Yeah, I toyed with the idea of increasing the requested amount each
iteration but wanted to keep the example simple, thus left it out.
However, for some reason, I was thinking that strbuf_grow() was
unconditionally expanding the buffer by the requested amount rather
than merely ensuring that that amount was availabile, so the amount
clearly needs to be increased on each iteration. Thanks for pointing
that out.

 If this is performance critical code, then the augmented format
 string can be constructed with less expensive functions than
 strbuf_addf().

 This does get called a lot (e.g., once per commit). One extra allocation
 would probably not kill us there [...]

Beyond the extra allocation, I was also concerned about the
sledgehammer approach of %s  to append a single character when there
are much less expensive ways to do so.

 [...] but I think we could fairly trivially put this on the unlikely path:

   size_t hint = 128;
   size_t len;

   /* optimize out obvious 0-length case */
   if (!*fmt)
 return;

   strbuf_grow(sb, hint);
   len = strftime(sb-buf + sb-len, sb-alloc - sb-len, fmt, tm);

   /* maybe not enough room, or maybe 0-length output */
   if (!len) {
 struct strbuf f = STRBUF_INIT;
 strbuf_addf(f, %s , fmt);
 while (!len) {
 hint *= 2;
 strbuf_grow(sb, hint);
 len = strftime(sb-buf + sb-len, sb-alloc - sb-len, f.buf, 
 tm);
 }
   }

 I'd guess most cases will fit in 128 bytes and never even hit this code
 path. You could also get fancier and start the buffer smaller, but only
 do the fmt hack when we cross a threshold.

Yep, I toyed with other looping constructs as well before settling
upon do-while in the example for its simplicity. If called often
enough, this sort of optimization seems reasonable enough.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] introduce format date-mode

2015-06-30 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 This does get called a lot (e.g., once per commit). One extra allocation
 would probably not kill us there, but I think we could fairly trivially
 put this on the unlikely path:

   size_t hint = 128;
   size_t len;

   /* optimize out obvious 0-length case */
   if (!*fmt)
   return;

   strbuf_grow(sb, hint);
   len = strftime(sb-buf + sb-len, sb-alloc - sb-len, fmt, tm);

   /* maybe not enough room, or maybe 0-length output */
   if (!len) {
   struct strbuf f = STRBUF_INIT;
   strbuf_addf(f, %s , fmt);
   while (!len) {
   hint *= 2;
   strbuf_grow(sb, hint);
   len = strftime(sb-buf + sb-len, sb-alloc - sb-len, f.buf, 
 tm);
   }
   }

 I'd guess most cases will fit in 128 bytes and never even hit this code
 path. You could also get fancier and start the buffer smaller, but only
 do the fmt hack when we cross a threshold.

I'd assume that the hint thing will persist across calls somehow?
In an invocation of git log --date=format:some format for
millions of commits, it is likely that the length of the formatted
date string will stay the same or close to the same (yeah, I know
Wednesday would be longer than Monday).

Answering myself to my earlier question, the reason is because I was
worried what happens when given fmt is a malformed strftime format
specifier.  Perhaps it ends with a lone % and %  may format to
something unexpected, or something.

Are we checking an error from strftime(3)?


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] introduce format date-mode

2015-06-30 Thread Eric Sunshine
On Tue, Jun 30, 2015 at 9:26 AM, Jeff King p...@peff.net wrote:
 On Mon, Jun 29, 2015 at 06:22:47PM -0400, Eric Sunshine wrote:

 Clients of strbuf rightly expect the buffer to grow as needed in
 order to complete the requested operation. It is, therefore, both
 weird and expectation-breaking for strbuf_addftime() to lack this
 behavior. Worse, it doesn't even signal when the format has failed
 due to insufficient buffer space.

 How about taking this approach (or something similar), instead, which
 grows the strbuf as needed?

 Here's a patch, on top of jk/date-mode-format (I think it would also be
 fine to just squash into the tip commit; the explanation in the commit
 message is sufficiently mirrored in the code comment).

As a logical change in itself, I could also see introduction of
strbuf_addftime() split out into its own patch (with this patch
squashed in). Either way, it's a welcome improvement over the
original.

 -- 8 --
 Subject: [PATCH] strbuf: make strbuf_addftime more robust

 The return value of strftime is poorly designed; when it
 returns 0, the caller cannot tell if the buffer was not
 large enough, or if the output was actually 0 bytes. In the
 original implementation of strbuf_addftime, we simply punted
 and guessed that our 128-byte hint would be large enough.

 We can do better, though, if we're willing to treat strftime
 like less of a black box. We can munge the incoming format
 to make sure that it never produces 0-length output, and
 then fix the resulting output.  That lets us reliably grow
 the buffer based on strftime's return value.

 Clever-idea-by: Eric Sunshine sunsh...@sunshineco.com
 Signed-off-by: Jeff King p...@peff.net
 ---
  strbuf.c| 38 +-
  t/t6300-for-each-ref.sh | 10 ++
  2 files changed, 31 insertions(+), 17 deletions(-)

 diff --git a/strbuf.c b/strbuf.c
 index a7ba028..e5e7370 100644
 --- a/strbuf.c
 +++ b/strbuf.c
 @@ -712,29 +712,33 @@ char *xstrfmt(const char *fmt, ...)

  void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm)
  {
 +   size_t hint = 128;
 size_t len;

 -   /*
 -* strftime reports 0 if it could not fit the result in the buffer.
 -* Unfortunately, it also reports 0 if the requested time string
 -* takes 0 bytes. So if we were to probe and grow, we have to choose
 -* some arbitrary cap beyond which we guess that the format probably
 -* just results in a 0-length output. Since we have to choose some
 -* reasonable cap anyway, and since it is not that big, we may
 -* as well just grow to their in the first place.
 -*/
 -   strbuf_grow(sb, 128);
 +   if (!*fmt)
 +   return;
 +
 +   strbuf_grow(sb, hint);
 len = strftime(sb-buf + sb-len, sb-alloc - sb-len, fmt, tm);

 if (!len) {
 /*
 -* Either we failed, or the format actually produces a 
 0-length
 -* output. There's not much we can do, so we leave it blank.
 -* However, the output array is left in an undefined state, so
 -* we must re-assert our NUL terminator.
 +* strftime reports 0 if it could not fit the result in the 
 buffer.
 +* Unfortunately, it also reports 0 if the requested time 
 string
 +* takes 0 bytes. So our strategy is to munge the format so 
 that the
 +* output contains at least one character, and then drop the 
 extra
 +* character before returning.
  */
 -   sb-buf[sb-len] = '\0';
 -   } else {
 -   sb-len += len;
 +   struct strbuf munged_fmt = STRBUF_INIT;
 +   strbuf_addf(munged_fmt, %s , fmt);
 +   while (!len) {
 +   hint *= 2;
 +   strbuf_grow(sb, hint);
 +   len = strftime(sb-buf + sb-len, sb-alloc - sb-len,
 +  munged_fmt.buf, tm);
 +   }
 +   strbuf_release(munged_fmt);
 +   len--; /* drop munged space */
 }
 +   strbuf_setlen(sb, sb-len + len);
  }
 diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
 index c7f368c..7c9bec7 100755
 --- a/t/t6300-for-each-ref.sh
 +++ b/t/t6300-for-each-ref.sh
 @@ -235,6 +235,16 @@ test_expect_success 'Check format of strftime date 
 fields' '
 test_cmp expected actual
  '

 +test_expect_success 'exercise strftime with odd fields' '
 +   echo expected 
 +   git for-each-ref --format=%(authordate:format:) refs/heads actual 
 
 +   test_cmp expected actual 
 +   long=long format -- $_z40$_z40$_z40$_z40$_z40$_z40$_z40 
 +   echo $long expected 
 +   git for-each-ref --format=%(authordate:format:$long) refs/heads 
 actual 
 +   test_cmp expected actual
 +'
 +
  cat expected \EOF
  refs/heads/master
  

Re: [PATCH 3/3] introduce format date-mode

2015-06-30 Thread Jeff King
On Tue, Jun 30, 2015 at 12:58:33PM -0400, Eric Sunshine wrote:

  Basically I was trying to avoid making any assumptions about exactly how
  strftime works. But presumably stick a space in the format is a
  universally reasonable thing to do. It's a hack, but it's contained to
  the function.
 
 I don't think we're making any assumptions about strftime(). POSIX states:
 
 The format string consists of zero or more conversion
 specifications and ordinary characters. [...] All ordinary
 characters (including the terminating NUL character) are copied
 unchanged into the array.
 
 So, we seem to be on solid footing with this approach (even though
 it's a localized hack).

Yeah, sorry I wasn't more clear. I had originally been thinking of
making assumptions like well, %c cannot ever be blank. But your
solution does not suffer from that level of knowledge. I think it is
reasonably clever.

 Yeah, I toyed with the idea of increasing the requested amount each
 iteration but wanted to keep the example simple, thus left it out.
 However, for some reason, I was thinking that strbuf_grow() was
 unconditionally expanding the buffer by the requested amount rather
 than merely ensuring that that amount was availabile, so the amount
 clearly needs to be increased on each iteration. Thanks for pointing
 that out.

FWIW, I had to look at it to double-check. I've often made the same
mistake.

 Beyond the extra allocation, I was also concerned about the
 sledgehammer approach of %s  to append a single character when there
 are much less expensive ways to do so.

I don't think there's any other way. We have to feed a contiguous buffer
to strftime, and we don't own the buffer, so we have to make a new copy.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] introduce format date-mode

2015-06-30 Thread Eric Sunshine
On Tue, Jun 30, 2015 at 1:05 PM, Junio C Hamano gits...@pobox.com wrote:
 Answering myself to my earlier question, the reason is because I was
 worried what happens when given fmt is a malformed strftime format
 specifier.  Perhaps it ends with a lone % and %  may format to
 something unexpected, or something.

That's a good point. I had considered prepending the extra character
(space) rather than appending it but eventually rejected it to avoid
the expense of shifting the characters down by one before returning
the formatted string.

However, is it our responsibility to guard against a malformed format?
POSIX doesn't state the behavior of % , so I don't think we are any
worse off by appending space to a malformed format ending with %
since the malformed format could wreak havoc even without our
transformation.

 Are we checking an error from strftime(3)?

According to the BUGS section in POSIX:

If the output string would exceed max bytes, errno is not set.
This makes it impossible to distinguish this error case from
cases where the format string legitimately produces a zero-length
output string. POSIX.1-2001 does not specify any errno settings
for strftime().

So, there does not seem to be a point in checking 'errno'.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] introduce format date-mode

2015-06-30 Thread Jeff King
On Tue, Jun 30, 2015 at 10:05:33AM -0700, Junio C Hamano wrote:

  I'd guess most cases will fit in 128 bytes and never even hit this code
  path. You could also get fancier and start the buffer smaller, but only
  do the fmt hack when we cross a threshold.
 
 I'd assume that the hint thing will persist across calls somehow?
 In an invocation of git log --date=format:some format for
 millions of commits, it is likely that the length of the formatted
 date string will stay the same or close to the same (yeah, I know
 Wednesday would be longer than Monday).

I hadn't thought about that. It could persist, but I don't think this is
necessarily the right place to do it. For two reasons:

  1. You have no idea in strbuf_addftime if it's the same fmt being
 added over and over. This is the wrong place to make that
 optimization.

  2. If you are interested in efficiency in a loop, then you should be
 reusing the same strbuf over and over, and avoiding the extra
 allocation in the first place. And that is indeed what we do for
 git log --date, as we will always use the same static-local
 buffer in show_date().

 Answering myself to my earlier question, the reason is because I was
 worried what happens when given fmt is a malformed strftime format
 specifier.  Perhaps it ends with a lone % and %  may format to
 something unexpected, or something.
 
 Are we checking an error from strftime(3)?

POSIX doesn't define any errno values for strftime (and in fact says No
errors are defined). The return value description for POSIX (and the
glibc manpage) talk about only whether or not the output fits. However,
POSIX does say If a conversion specifier is not one of the above, the
behavior is undefined.

So certainly I could imagine an implementation that returns 0 when you
feed it a bogus value. If you (as a user) feed us crap to give to
strftime, I am not particularly concerned with whether you get crap out.
My main concern is that it would return 0 and we would loop forever.
OTOH, I think any sane implementation would simply copy unknown
placeholders out (certainly glibc does that). So I think we could simply
consider it a quality of implementation issue, and deal with any
particular crappy implementations if and when they get reported. We
could add something tricky (like --date=format:%) to the test suite to
make it likelier to catch such a thing.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] introduce format date-mode

2015-06-30 Thread Jeff King
On Tue, Jun 30, 2015 at 02:13:53PM -0400, Eric Sunshine wrote:

 Sorry, I meant that the interpolation expense of %s . A cheaper (but
 more verbose) alternative might be:
 
 size_t n = strlen(fmt);
 const char *f = xmalloc(n + 2);
 strcpy(f, fmt);
 f[n] = ' ';
 f[n + 1] = '\0';
 ...
 free(f);
 
 or something similar.

I think you're probably getting into premature optimization here. Using
strbuf_vaddf should be within the same order of magnitude of
instructions (and I think we should almost never hit this code path
anyway, because we'll be reading into a static strbuf generally which
will come pre-sized to hold a date).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] introduce format date-mode

2015-06-29 Thread Eric Sunshine
On Thu, Jun 25, 2015 at 12:55:45PM -0400, Jeff King wrote:
 This feeds the format directly to strftime. Besides being a
 little more flexible, the main advantage is that your system
 strftime may know more about your locale's preferred format
 (e.g., how to spell the days of the week).
 
 Signed-off-by: Jeff King p...@peff.net
 ---
 diff --git a/strbuf.c b/strbuf.c
 index 0d4f4e5..a7ba028 100644
 --- a/strbuf.c
 +++ b/strbuf.c
 @@ -709,3 +709,32 @@ char *xstrfmt(const char *fmt, ...)
 +void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm)
 +{
 + size_t len;
 +
 + /*
 +  * strftime reports 0 if it could not fit the result in the buffer.
 +  * Unfortunately, it also reports 0 if the requested time string
 +  * takes 0 bytes. So if we were to probe and grow, we have to choose
 +  * some arbitrary cap beyond which we guess that the format probably
 +  * just results in a 0-length output. Since we have to choose some
 +  * reasonable cap anyway, and since it is not that big, we may
 +  * as well just grow to their in the first place.
 +  */
 + strbuf_grow(sb, 128);
 + len = strftime(sb-buf + sb-len, sb-alloc - sb-len, fmt, tm);
 +
 + if (!len) {
 + /*
 +  * Either we failed, or the format actually produces a 0-length
 +  * output. There's not much we can do, so we leave it blank.
 +  * However, the output array is left in an undefined state, so
 +  * we must re-assert our NUL terminator.
 +  */
 + sb-buf[sb-len] = '\0';
 + } else {
 + sb-len += len;
 + }
 +}

Clients of strbuf rightly expect the buffer to grow as needed in
order to complete the requested operation. It is, therefore, both
weird and expectation-breaking for strbuf_addftime() to lack this
behavior. Worse, it doesn't even signal when the format has failed
due to insufficient buffer space.

How about taking this approach (or something similar), instead, which
grows the strbuf as needed?

--- 8 ---
void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm)
{
size_t len;
struct strbuf f = STRBUF_INIT;

/*
 * This is a bit tricky since strftime returns 0 if the result did not
 * fit in the supplied buffer, as well as when the formatted time has
 * zero length. In the former case, we need to grow the buffer and try
 * again. To distinguish between the two cases, we supply strftime with
 * a format string one character longer than what the client supplied,
 * which ensures that a successful format will have non-zero length,
 * and then drop the extra character from the formatted time before
 * returning.
 */
strbuf_addf(f, %s , fmt);

do {
strbuf_grow(sb, 128);
len = strftime(sb-buf + sb-len, sb-alloc - sb-len,
   f.buf, tm);
} while (!len);
strbuf_setlen(sb, sb-len + len - 1);

strbuf_release(f);
}
--- 8 ---

If this is performance critical code, then the augmented format
string can be constructed with less expensive functions than
strbuf_addf().
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] introduce format date-mode

2015-06-25 Thread Jeff King
This feeds the format directly to strftime. Besides being a
little more flexible, the main advantage is that your system
strftime may know more about your locale's preferred format
(e.g., how to spell the days of the week).

Signed-off-by: Jeff King p...@peff.net
---
 Documentation/rev-list-options.txt |  5 +
 builtin/blame.c|  3 +++
 cache.h|  2 ++
 date.c |  9 -
 gettext.c  |  1 +
 strbuf.c   | 29 +
 strbuf.h   |  5 +
 t/t6300-for-each-ref.sh|  8 
 8 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/Documentation/rev-list-options.txt 
b/Documentation/rev-list-options.txt
index 77ac439..a9b808f 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -727,6 +727,11 @@ format, often found in email messages.
 +
 `--date=raw` shows the date in the internal raw Git format `%s %z` format.
 +
+`--date=format:...` feeds the format `...` to your system `strftime`.
+Use `--date=format:%c` to show the date in your system locale's
+preferred format.  See the `strftime` manual for a complete list of
+format placeholders.
++
 `--date=default` shows timestamps in the original time zone
 (either committer's or author's).
 
diff --git a/builtin/blame.c b/builtin/blame.c
index 43e8473..e2e3e75 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -2596,6 +2596,9 @@ parse_done:
case DATE_NORMAL:
blame_date_width = sizeof(Thu Oct 19 16:00:04 2006 -0700);
break;
+   case DATE_STRFTIME:
+   blame_date_width = strlen(show_date(0, 0, blame_date_mode)) + 
1; /* add the null */
+   break;
}
blame_date_width -= 1; /* strip the null */
 
diff --git a/cache.h b/cache.h
index 1759011..bb63a58 100644
--- a/cache.h
+++ b/cache.h
@@ -1105,8 +1105,10 @@ struct date_mode {
DATE_ISO8601,
DATE_ISO8601_STRICT,
DATE_RFC2822,
+   DATE_STRFTIME,
DATE_RAW
} type;
+   const char *strftime_fmt;
 };
 
 /*
diff --git a/date.c b/date.c
index cdad4db..8f91569 100644
--- a/date.c
+++ b/date.c
@@ -163,6 +163,8 @@ void show_date_relative(unsigned long time, int tz,
 struct date_mode *date_mode_from_type(enum date_mode_type type)
 {
static struct date_mode mode;
+   if (type == DATE_STRFTIME)
+   die(BUG: cannot create anonymous strftime date_mode struct);
mode.type = type;
return mode;
 }
@@ -221,6 +223,8 @@ const char *show_date(unsigned long time, int tz, const 
struct date_mode *mode)
weekday_names[tm-tm_wday], tm-tm_mday,
month_names[tm-tm_mon], tm-tm_year + 1900,
tm-tm_hour, tm-tm_min, tm-tm_sec, tz);
+   else if (mode-type == DATE_STRFTIME)
+   strbuf_addftime(timebuf, mode-strftime_fmt, tm);
else
strbuf_addf(timebuf, %.3s %.3s %d %02d:%02d:%02d %d%c%+05d,
weekday_names[tm-tm_wday],
@@ -787,7 +791,10 @@ void parse_date_format(const char *format, struct 
date_mode *mode)
mode-type = DATE_NORMAL;
else if (!strcmp(format, raw))
mode-type = DATE_RAW;
-   else
+   else if (skip_prefix(format, format:, format)) {
+   mode-type = DATE_STRFTIME;
+   mode-strftime_fmt = xstrdup(format);
+   } else
die(unknown date format %s, format);
 }
 
diff --git a/gettext.c b/gettext.c
index 7378ba2..a268a2c 100644
--- a/gettext.c
+++ b/gettext.c
@@ -162,6 +162,7 @@ void git_setup_gettext(void)
podir = GIT_LOCALE_PATH;
bindtextdomain(git, podir);
setlocale(LC_MESSAGES, );
+   setlocale(LC_TIME, );
init_gettext_charset(git);
textdomain(git);
 }
diff --git a/strbuf.c b/strbuf.c
index 0d4f4e5..a7ba028 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -709,3 +709,32 @@ char *xstrfmt(const char *fmt, ...)
 
return ret;
 }
+
+void strbuf_addftime(struct strbuf *sb, const char *fmt, const struct tm *tm)
+{
+   size_t len;
+
+   /*
+* strftime reports 0 if it could not fit the result in the buffer.
+* Unfortunately, it also reports 0 if the requested time string
+* takes 0 bytes. So if we were to probe and grow, we have to choose
+* some arbitrary cap beyond which we guess that the format probably
+* just results in a 0-length output. Since we have to choose some
+* reasonable cap anyway, and since it is not that big, we may
+* as well just grow to their in the first place.
+*/
+   strbuf_grow(sb, 128);
+   len = strftime(sb-buf + sb-len, sb-alloc - sb-len, fmt, tm);
+
+   if (!len) {
+   /*
+* Either we failed, or the format