Re: [PATCH] blame.c: prepare_lines should not call xrealloc for every line

2014-02-12 Thread Junio C Hamano
David Kastrup d...@gnu.org writes:

 Making a single preparation run for counting the lines will avoid memory
 fragmentation.  Also, fix the allocated memory size which was wrong
 when sizeof(int *) != sizeof(int), and would have been too small
 for sizeof(int *)  sizeof(int), admittedly unlikely.

 Signed-off-by: David Kastrup d...@gnu.org
 ---

I think I took sizeof(int*)-sizeof(int) patch to the 'next' branch
already, which might have to conflict with this clean-up, but it
should be trivial to resolve.

Thanks for resending.  I was busy elsewhere (i.e. no feedback does
not mean silent rejection nor silent agreement at least from
me), and such a resend does help prevent patches fall thru cracks.

 diff --git a/builtin/blame.c b/builtin/blame.c
 index e44a6bb..1aefedf 100644
 --- a/builtin/blame.c
 +++ b/builtin/blame.c
 @@ -1772,25 +1772,41 @@ static int prepare_lines(struct scoreboard *sb)
  {
   const char *buf = sb-final_buf;
   unsigned long len = sb-final_buf_size;
 + const char *end = buf + len;
 + const char *p;
 + int *lineno;
 + int num = 0, incomplete = 0;
  
 + for (p = buf;;) {
 + p = memchr(p, '\n', end - p);
 + if (p) {
 + p++;
   num++;
 + continue;
   }
 + break;
   }
 +
 + if (len  end[-1] != '\n')
 + incomplete++; /* incomplete line at the end */
 +
 + sb-lineno = xmalloc(sizeof(*sb-lineno) * (num + incomplete + 1));
 + lineno = sb-lineno;
 +
 + *lineno++ = 0;
 + for (p = buf;;) {
 + p = memchr(p, '\n', end - p);
 + if (p) {
 + p++;
 + *lineno++ = p - buf;
 + continue;
 + }
 + break;
 + }
 +
 + if (incomplete)
 + *lineno++ = len;
 +
   sb-num_lines = num + incomplete;
   return sb-num_lines;
  }
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] blame.c: prepare_lines should not call xrealloc for every line

2014-02-06 Thread David Kastrup
Junio C Hamano gits...@pobox.com writes:

 David Kastrup d...@gnu.org writes:

 It's snake oil making debugging harder.

 OK, that is a sensible argument.

 This was fun ;-)

 At the expense of seriously impacting my motivation to do any further
 code cleanup on Git.

 Well, I said it was fun because I was learning from a new person
 who haven't made much technical or code aethesitics discussion here
 so far.  If teaching others frustrates you and stops contributing to
 the project, that is a loss.

So let's post something noteworthy technical: the current code iteration
sported the following routine:

static struct blame_entry *blame_merge(struct blame_entry *list1,
   struct blame_entry *list2)
{
struct blame_entry *p1 = list1, *p2 = list2,
**tail = list1;

if (!p1)
return p2;
if (!p2)
return p1;

if (p1-s_lno = p2-s_lno) {
do {
tail = p1-next;
if ((p1 = *tail) == NULL) {
*tail = p2;
return list1;
}
} while (p1-s_lno = p2-s_lno);
}
for (;;) {
*tail = p2;
do {
tail = p2-next;
if ((p2 = *tail) == NULL)  {
*tail = p1;
return list1;
}
} while (p1-s_lno  p2-s_lno);
*tail = p1;
do {
tail = p1-next;
if ((p1 = *tail) == NULL) {
*tail = p2;
return list1;
}
} while (p1-s_lno = p2-s_lno);
}
}


This is decidedly more complex than the equivalent

static struct blame_entry *blame_merge(struct blame_entry *list1,
   struct blame_entry *list2)
{
struct blame_entry *p1 = list1, *p2 = list2,
**tail = list1;

while (p1  p2) {
if (p1-s_lno = p2-s_lno) {
*tail = p1;
p1 = *(tail = p1-next);
} else {
*tail = p2;
p2 = *(tail = p2-next);
}
}
*tail = p1 ? p1 : p2;
return list1;
}

Why not use the latter one?  Apart from the somewhat
obsessive-compulsive avoidance of checking the same condition twice in a
row (which nowadays compilers are pretty capable of taking into account
by themselves) the actually decisive difference is to avoid rewriting
links that are already pointing to the right place.

Those are the only write accesses that take place, and since we are
talking about _sorting_, it is to be expected that in the long run,
every record ends up in a cacheline different from its ultimate
neighbors.  Also, assuming a more or less random distribution and
equally sized chains, about half of the links will already be correct.
In practice, the situation tends to be more degenerate, leading to a
_higher_ ratio of already correct links.  Cutting down the expensive
writeout of cachelines by a factor of more than 2 makes quite a
difference in performance.

Does it matter here?  Unlikely.  Profiling tools tend to be useless for
seeing the impact of dirty cachelines since their operation
significantly interferes with the read/write patterns so this kind of
detail often escapes notice.  But the total contribution of this stage
to the runtime of git blame will be insignificant either way.

 new blood bringing in new ideas is fine, but I'd want to see those new
 ideas backed by solid thiniking, and that means they may have to
 explain themselves to those who are new to their ideas.

Well, there _is_ solid thinking behind the above code, but there _also_
is solid thinking behind the notion that it won't matter in this case.
I still prefer not pushing out code in the wild that I would hesitate to
employ in _all_ comparable circumstances.  Consider it as a pattern or a
style: a lot of thinking went into it once, and what this should buy you
is to never have to think about this issue again.

C++ is better at actual code reuse, but with C you basically get pattern
reuse.  Your head is instantiating the templates.  And it's a good idea
not to crowd one's head with unnecessary and/or suboptimal patterns.

-- 
David Kastrup

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] blame.c: prepare_lines should not call xrealloc for every line

2014-02-05 Thread David Kastrup
Junio C Hamano gits...@pobox.com writes:

 David Kastrup d...@gnu.org writes:

 so something like

 for (p = buf; p  end; p++) {
 p = find the end of this line
 if (!p)
 break;
 num++;
 }

 perhaps?

 Would crash on incomplete last line.

 Hmph, even with if !p?

Admitted.  So we have one _proper_ loop termination condition in the
middle of the loop, and we have a snake oil condition at the start of
the loop that can, according to the standard, _only_ yield true reliably
when p == end (any end  p can only result from undefined behavior when
p points to an object of size end - p).

The effect of this condition basically is to state we don't trust
memchr to do the right thing when called with 0 as its last argument
which can happen at most once.  This condition will come about by the
_combined_ effect of p = ... _in_ the loop (which is the real iteration
advancing p) and p++ hidden in the for statement (which never makes any
sense separated from the p = ...).

It turns out that
a) memchr is provided by a compatibility layer in case it is missing or
   defective
b) other code in Git demands that memchr works correctly with a zero
   last argument.  See, for example, attr.c:
if (dirlen = len)
break;
cp = memchr(path + len + 1, '/', dirlen - len - 1);

This clearly needs to be able to work with dirlen == len + 1.

So the loop gets rewritten in a manner where the for statement
_pretends_ to loop linearly through buf, but where the loop _body_ has
its own _regular_ termination condition it shields from the original
for, and p is advanced _independently_.  The advancement of p to beyond
the next '\n' is distributed to two different places in the loop, and
one place is made to look as if it does something else.

 But that last round of the loop will be no-op, so p  end vs p =
 end does not make any difference.  It is not even strictly
 necessary because memchr() limits the scan to end, but it would
 still be a good belt-and-suspenders defensive coding practice, I
 would think.

It's snake oil making debugging harder.  It masks the real action, and
it will route around suspected faulty behavior that is wrong according
to the standards and not permitted elsewhere in the codebase.  Other
than that, it just takes additional performance from every iteration.

Putting belts and suspenders on a bike looks comforting until the
suspenders get caught in the wheelspokes.

 which is the same as

   for (p = buf; ; p++) {
   *lineno++ = p - buf;
 p = memchr...
 if (!p)
   break;
   }

 and the latter has the loop control p++ at where it belongs to.

But it's only half the control.

As it is clear that we won't get to a state where I'd be willing to have
git blame pointing to me as the author, I suggest that you either put
your belts and suspenders on with a separate commit under your own
authorship, or that we drop this altogether.

As I stated already: this patch was just provided because the original
code offends my sense of aesthetics, so there is no point to it if it
offends yours.

I'd still recommend fixing the sizeof(int *) with sizeof(int) confusion.

 If we wanted to have a belt-and-suspenders safety, we need p =
 end here, not p  end,

That would be totally ridiculous since end  p cannot ever happen except
by undefined behavior.  Pointer inequalities require both pointers to be
associated with the same object.

 This was fun ;-)

At the expense of seriously impacting my motivation to do any further
code cleanup on Git.

Which is a reasonable tradeoff since your fun is more important to Git's
well-being than my one.

-- 
David Kastrup

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] blame.c: prepare_lines should not call xrealloc for every line

2014-02-05 Thread David Kastrup
Junio C Hamano gits...@pobox.com writes:

 which I think is the prevalent style in our codebase.  The same for
 the other loop we see in the new code below.

  - avoid assignments in conditionals when you do not have to.

commit a77a48c259d9adbe7779ca69a3432e493116b3fd
Author: Junio C Hamano gits...@pobox.com
Date:   Tue Jan 28 13:55:59 2014 -0800

combine-diff: simplify intersect_paths() further
[...]

+   while ((p = *tail) != NULL) {

Because we can.

At any rate, I am not going to put any more work into this patch as it
is decidedly not worth the bad taste this discussion leaves in my mouth.
And I don't want any code marked as written by me that does not
correspond to what I'd be willing to write.  So please make sure to put
any rewrites in a separate commit with different authorship.

Thanks

-- 
David Kastrup

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] blame.c: prepare_lines should not call xrealloc for every line

2014-02-05 Thread Junio C Hamano
David Kastrup d...@gnu.org writes:

 Junio C Hamano gits...@pobox.com writes:

 which I think is the prevalent style in our codebase.  The same for
 the other loop we see in the new code below.

  - avoid assignments in conditionals when you do not have to.

 commit a77a48c259d9adbe7779ca69a3432e493116b3fd
 Author: Junio C Hamano gits...@pobox.com
 Date:   Tue Jan 28 13:55:59 2014 -0800

 combine-diff: simplify intersect_paths() further
 [...]

 +   while ((p = *tail) != NULL) {

 Because we can.

Be reasonable.  You cannot sensibly rewrite it to

p = *tail;
while (p) {
...
p = *tail;
}

when you do not know how ... part would evolve in the future.

if ((p = *tail) != NULL) {
...

is a totally different issue.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] blame.c: prepare_lines should not call xrealloc for every line

2014-02-05 Thread Junio C Hamano
David Kastrup d...@gnu.org writes:

 It's snake oil making debugging harder.

OK, that is a sensible argument.

 This was fun ;-)

 At the expense of seriously impacting my motivation to do any further
 code cleanup on Git.

Well, I said it was fun because I was learning from a new person
who haven't made much technical or code aethesitics discussion here
so far.  If teaching others frustrates you and stops contributing to
the project, that is a loss.

But the styles matter, as the known pattern in the existing code is
what lets our eyes coast over unimportant details of the code while
reviewing and understanding.  I tend to be pickier when reviewing
code from new people who are going to make large contributions for
exactly that reason---new blood bringing in new ideas is fine, but
I'd want to see those new ideas backed by solid thiniking, and that
means they may have to explain themselves to those who are new to
their ideas.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] blame.c: prepare_lines should not call xrealloc for every line

2014-02-05 Thread David Kastrup
Junio C Hamano gits...@pobox.com writes:

 David Kastrup d...@gnu.org writes:

 Junio C Hamano gits...@pobox.com writes:

 which I think is the prevalent style in our codebase.  The same for
 the other loop we see in the new code below.

  - avoid assignments in conditionals when you do not have to.

 commit a77a48c259d9adbe7779ca69a3432e493116b3fd
 Author: Junio C Hamano gits...@pobox.com
 Date:   Tue Jan 28 13:55:59 2014 -0800

 combine-diff: simplify intersect_paths() further
 [...]

 +   while ((p = *tail) != NULL) {

 Because we can.

 Be reasonable.  You cannot sensibly rewrite it to

   p = *tail;
 while (p) {
   ...
   p = *tail;
   }

 when you do not know how ... part would evolve in the future.

The only unknown here is the potential presence of continue; in
... and that can be addressed by writing

for (p = *tail; p; p = *tail) {
   ...
}

However, that only makes sense where ... is rather large and diverse and
the assignment in question provides a unifying point.  In this case, the
loop is rather small and perfectly fits on one screen.  It turns out
that the assignment only serves for _obfuscating_ the various code
paths.  We have:

while ((p = *tail) != NULL) {
cmp = ((i = q-nr)
   ? -1 : strcmp(p-path, q-queue[i]-two-path));

if (cmp  0) {
/* p-path not in q-queue[]; drop it */
*tail = p-next;
free(p);
continue;
}

if (cmp  0) {
/* q-queue[i] not in p-path; skip it */
i++;
continue;
}

hashcpy(p-parent[n].sha1, q-queue[i]-one-sha1);
p-parent[n].mode = q-queue[i]-one-mode;
p-parent[n].status = q-queue[i]-status;

tail = p-next;
i++;
}

While we could instead have:
  p = curr;
  while (p) {
cmp = ((i = q-nr)
   ? -1 : strcmp(p-path, q-queue[i]-two-path));

if (cmp  0) {
struct combine_diff_path *n = p-next;
/* p-path not in q-queue[]; drop it */
free(p);
p = *tail = n;
continue;
}

if (cmp  0) {
/* q-queue[i] not in p-path; skip it */
i++;
continue;
}

hashcpy(p-parent[n].sha1, q-queue[i]-one-sha1);
p-parent[n].mode = q-queue[i]-one-mode;
p-parent[n].status = q-queue[i]-status;

p = *(tail = p-next);
i++;
}

Of course, it only makes limited sense to recheck p after the second if, so
it would be clearer to write

  p = curr;
  while (p) {
cmp = ((i = q-nr)
   ? -1 : strcmp(p-path, q-queue[i]-two-path));

if (cmp  0) {
struct combine_diff_path *n = p-next;
/* p-path not in q-queue[]; drop it */
free(p);
p = *tail = n;
continue;
}

if (cmp  0) {
/* q-queue[i] not in p-path; skip it */
i++;
continue;
}

hashcpy(p-parent[n].sha1, q-queue[i]-one-sha1);
p-parent[n].mode = q-queue[i]-one-mode;
p-parent[n].status = q-queue[i]-status;

p = *(tail = p-next);
i++;
}

But that's sort of a red herring since the actual loop structure is
hidden in conditions where it does not belong.  (i = q-nr) is a
_terminal_ condition.

So it's more like

p = curr;
while (p) {
if (i = q-nr) {
*tail = NULL;
do {
struct combine_diff_path *n = p-next;
free(p);
p = n;
} while (p);
break;
}
cmp = strcmp(p-path, q-queue[i]-two-path));
if (cmp  0) {
struct combine_diff_path *n = p-next;
/* p-path not in q-queue[]; drop it */
free(p);
p = *tail = n;
continue;
}
if (cmp == 0) {
hashcpy(p-parent[n].sha1, q-queue[i]-one-sha1);
p-parent[n].mode = q-queue[i]-one-mode;
p-parent[n].status = q-queue[i]-status;

p = *(tail = p-next);
}
i++;
}

   if ((p = *tail) != NULL) {
   ...

 is a totally different issue.

Yes: it was just a matter of style instead of preventing _other_ code to
be rewritten in a clearer manner.

For a don't look 

Re: [PATCH] blame.c: prepare_lines should not call xrealloc for every line

2014-02-05 Thread David Kastrup
Junio C Hamano gits...@pobox.com writes:

 David Kastrup d...@gnu.org writes:

 It's snake oil making debugging harder.

 OK, that is a sensible argument.

 This was fun ;-)

 At the expense of seriously impacting my motivation to do any further
 code cleanup on Git.

 Well, I said it was fun because I was learning from a new person
 who haven't made much technical or code aethesitics discussion here
 so far.  If teaching others frustrates you and stops contributing to
 the project, that is a loss.

The implication of Thanks for catching them, but this patch needs heavy
style fixes. is not one of learning.

 But the styles matter, as the known pattern in the existing code is
 what lets our eyes coast over unimportant details of the code while
 reviewing and understanding.  I tend to be pickier when reviewing code
 from new people who are going to make large contributions for exactly
 that reason---new blood bringing in new ideas is fine, but I'd want to
 see those new ideas backed by solid thiniking, and that means they may
 have to explain themselves to those who are new to their ideas.

Well, the point of stylistic decisions is exactly that they are not
individually backed by solid thinking: if they were, one would not
require a style.

A pattern of
some loop {
  ...
  if (condition) {
code intimately related to condition;
continue;
  }
  break;
}

is somewhat awkward.  The superficially simpler

some loop {
  ...
  if (!condition)
break;
  code intimately related to condition;
}

separates related parts with a central loop exit.  Which maybe a tossup.
The former pattern of break; at the end of a loop, however, becomes
indispensible in the form

some loop {
  ...
  switch (...) {
various cases ending in either break; or continue;
  }
  break;
}

In this case, the break; before the end of the loop establishes the
statement any commencement of the loop will be explicitly done using
continue;, particularly important since a break; inside of the switch
statement does not, without this help, break out of the loop.

It's a pattern that is transparent enough to be still preferable over
crank out the goto already, chicken.

Is if I have to use x in some situations anyway I may as well pick it
when there would be equivalent solutions solid thinking?  Not really.
It's about choosing one's familiars.  Which ultimately boils down to
personal style.  And where the differences are not really significant
for understanding and _are_ a conscious expression rather than just an
accident, the thin line between write in a way that does not go against
our style and/or good sense and write in the way I would have written
it may be the line between fun and work.

-- 
David Kastrup

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] blame.c: prepare_lines should not call xrealloc for every line

2014-02-04 Thread David Kastrup

Whitespace error in line 1778.  Should I be reposting?

-- 
David Kastrup

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] blame.c: prepare_lines should not call xrealloc for every line

2014-02-04 Thread Junio C Hamano
David Kastrup d...@gnu.org writes:

 Making a single preparation run for counting the lines will avoid memory
 fragmentation.  Also, fix the allocated memory size which was wrong
 when sizeof(int *) != sizeof(int), and would have been too small
 for sizeof(int *)  sizeof(int), admittedly unlikely.

 Signed-off-by: David Kastrup d...@gnu.org
 ---
  builtin/blame.c | 40 
  1 file changed, 24 insertions(+), 16 deletions(-)

 diff --git a/builtin/blame.c b/builtin/blame.c
 index e44a6bb..522986d 100644
 --- a/builtin/blame.c
 +++ b/builtin/blame.c
 @@ -1772,25 +1772,33 @@ static int prepare_lines(struct scoreboard *sb)
  {
   const char *buf = sb-final_buf;
   unsigned long len = sb-final_buf_size;
 - int num = 0, incomplete = 0, bol = 1;
 + const char *end = buf + len;
 + const char *p;
 + int *lineno;
 + 
 + int num = 0, incomplete = 0;

Is there any significance to the blank line between these two
variable definitions?

 +
 + for (p = buf;;) {
 + if ((p = memchr(p, '\n', end-p)) == NULL)
 + break;
 + ++num, ++p;

You have a peculiar style that is somewhat distracting.  Why isn't
this more like so?

for (p = buf; p++, num++; ) {
p = memchr(p, '\n', end - p);
if (!p)
break;
}

which I think is the prevalent style in our codebase.  The same for
the other loop we see in the new code below.

 - favor post-increment unless you use it as rvalue and need
   pre-increment;

 - SP around each binary ops e.g. 'end - p';

 - avoid assignments in conditionals when you do not have to.

 + }
  
 - if (len  buf[len-1] != '\n')
 + if (len  end[-1] != '\n')
   incomplete++; /* incomplete line at the end */

OK, so far we counted num complete lines and incomplete may be
one if there is an incomplete line after them.

 - while (len--) {
 - if (bol) {
 - sb-lineno = xrealloc(sb-lineno,
 -   sizeof(int *) * (num + 1));
 - sb-lineno[num] = buf - sb-final_buf;
 - bol = 0;
 - }
 - if (*buf++ == '\n') {
 - num++;
 - bol = 1;
 - }
 +
 + sb-lineno = lineno = xmalloc(sizeof(int) * (num + incomplete + 1));

OK, this function is called only once, so we know sb-lineno is NULL
originally and there is no reason to start from xrealloc().

 + for (p = buf;;) {
 + *lineno++ = p-buf;
 + if ((p = memchr(p, '\n', end-p)) == NULL)
 + break;
 + ++p;
   }
 - sb-lineno = xrealloc(sb-lineno,
 -   sizeof(int *) * (num + incomplete + 1));

These really *were* unnecessary reallocations.

Thanks for catching them, but this patch needs heavy style fixes.

 - sb-lineno[num + incomplete] = buf - sb-final_buf;
 +
 + if (incomplete)
 + *lineno++ = len;
 +
   sb-num_lines = num + incomplete;
   return sb-num_lines;
  }
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] blame.c: prepare_lines should not call xrealloc for every line

2014-02-04 Thread Junio C Hamano
David Kastrup d...@gnu.org writes:

 Whitespace error in line 1778.  Should I be reposting?

Heh, let me try to clean it up first and then repost for your
review.

Thanks.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] blame.c: prepare_lines should not call xrealloc for every line

2014-02-04 Thread David Kastrup
Junio C Hamano gits...@pobox.com writes:

 David Kastrup d...@gnu.org writes:

 Making a single preparation run for counting the lines will avoid memory
 fragmentation.  Also, fix the allocated memory size which was wrong
 when sizeof(int *) != sizeof(int), and would have been too small
 for sizeof(int *)  sizeof(int), admittedly unlikely.

 Signed-off-by: David Kastrup d...@gnu.org
 ---
  builtin/blame.c | 40 
  1 file changed, 24 insertions(+), 16 deletions(-)

 diff --git a/builtin/blame.c b/builtin/blame.c
 index e44a6bb..522986d 100644
 --- a/builtin/blame.c
 +++ b/builtin/blame.c
 @@ -1772,25 +1772,33 @@ static int prepare_lines(struct scoreboard *sb)
  {
  const char *buf = sb-final_buf;
  unsigned long len = sb-final_buf_size;
 -int num = 0, incomplete = 0, bol = 1;
 +const char *end = buf + len;
 +const char *p;
 +int *lineno;
 +
 +int num = 0, incomplete = 0;

 Is there any significance to the blank line between these two
 variable definitions?

Well, I needed more than the whitespace error to be motivated for
redoing.  Cough, cough.

 +
 +for (p = buf;;) {
 +if ((p = memchr(p, '\n', end-p)) == NULL)
 +break;
 +++num, ++p;

 You have a peculiar style that is somewhat distracting.  Why isn't
 this more like so?

   for (p = buf; p++, num++; ) {

More likely
for (p = buf;; p++, num++)

   p = memchr(p, '\n', end - p);
   if (!p)
   break;
   }

 which I think is the prevalent style in our codebase.  The same for
 the other loop we see in the new code below.

I rearranged a few times in order to have both loops be closely
analogous.  The second loop would then have to be

   for (p = buf;; p++) {
   *lineno++ = p-buf;
   p = memchr(p, '\n', end-p)
   if (!p)
   break;
   }

Admittedly, that works.  I am not too happy about the termination
condition being at the end of the loop but not in the for statement, but
yes, this seems somewhat nicer than what I proposed.

  - favor post-increment unless you use it as rvalue and need
pre-increment;

In my youth, the very non-optimizing C compiler I used under CP/M
produced less efficient code for x++ than for ++x even when not using
the resulting expression.  Surprisingly habit-forming.


  - SP around each binary ops e.g. 'end - p';

Ok.

 +}
  
 -if (len  buf[len-1] != '\n')
 +if (len  end[-1] != '\n')
  incomplete++; /* incomplete line at the end */

 OK, so far we counted num complete lines and incomplete may be
 one if there is an incomplete line after them.

That's pretty much the gist of the original code.

 -while (len--) {
 -if (bol) {
 -sb-lineno = xrealloc(sb-lineno,
 -  sizeof(int *) * (num + 1));
 -sb-lineno[num] = buf - sb-final_buf;
 -bol = 0;
 -}
 -if (*buf++ == '\n') {
 -num++;
 -bol = 1;
 -}
 +
 +sb-lineno = lineno = xmalloc(sizeof(int) * (num + incomplete + 1));

 OK, this function is called only once, so we know sb-lineno is NULL
 originally and there is no reason to start from xrealloc().

[...]

 These really *were* unnecessary reallocations.

Well, if a realloc will increase the allocation size by a constant
factor each time, the amortization cost is O(n) for n entries.  So with
a suitable realloc, the effect will not really be noticeable.  It still
offends my sense of aesthetics.

 Thanks for catching them, but this patch needs heavy style fixes.

Well, does not look all that heavy, but I'll repost.

There is another oversight: I am using memchr here, but there is no
obvious header file definiting it (the respective header will likely be
pulled in indirectly via something unrelated).

Anybody know offhand what I should be including here?  It looks like Git
has some fallback definitions of its own, so it's probably not just
string.h I should include?

-- 
David Kastrup

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] blame.c: prepare_lines should not call xrealloc for every line

2014-02-04 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 David Kastrup d...@gnu.org writes:

 Whitespace error in line 1778.  Should I be reposting?

 Heh, let me try to clean it up first and then repost for your
 review.

 Thanks.

-- 8 --
From: David Kastrup d...@gnu.org

Making a single preparation run for counting the lines will avoid memory
fragmentation.  Also, fix the allocated memory size which was wrong
when sizeof(int *) != sizeof(int), and would have been too small
for sizeof(int *)  sizeof(int), admittedly unlikely.

Signed-off-by: David Kastrup d...@gnu.org
---

 One logic difference from what was posted is that sb-lineno[num]
 is filled with the length of the entire buffer when the file ends
 with a complete line.  I do not remember if the rest of the logic
 actually depends on it (I think I use lineno[n+1] - lineno[n] to
 find the end of line, so this may matter for the last line),
 though.

 The original code dates back to 2006 when the author of the code
 was not paid for doing anything for Git but was doing it as a
 weekend and evening hobby, so it may not be so surprising to find
 this kind of what was I thinking when I wrote it inefficiency in
 such a code with $0 monetary value ;-)

 builtin/blame.c | 37 +
 1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index e44a6bb..2364661 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -1772,25 +1772,30 @@ static int prepare_lines(struct scoreboard *sb)
 {
const char *buf = sb-final_buf;
unsigned long len = sb-final_buf_size;
-   int num = 0, incomplete = 0, bol = 1;
+   const char *end = buf + len;
+   const char *p;
+   int *lineno;
+   int num = 0, incomplete = 0;
+
+   for (p = buf; p  end; p++) {
+   p = memchr(p, '\n', end - p);
+   if (!p)
+   break;
+   num++;
+   }
 
-   if (len  buf[len-1] != '\n')
+   if (len  end[-1] != '\n')
incomplete++; /* incomplete line at the end */
-   while (len--) {
-   if (bol) {
-   sb-lineno = xrealloc(sb-lineno,
- sizeof(int *) * (num + 1));
-   sb-lineno[num] = buf - sb-final_buf;
-   bol = 0;
-   }
-   if (*buf++ == '\n') {
-   num++;
-   bol = 1;
-   }
+
+   sb-lineno = lineno = xmalloc(sizeof(int) * (num + incomplete + 1));
+
+   for (p = buf; p  end; p++) {
+   *lineno++ = p - buf;
+   p = memchr(p, '\n', end - p);
+   if (!p)
+   break;
}
-   sb-lineno = xrealloc(sb-lineno,
- sizeof(int *) * (num + incomplete + 1));
-   sb-lineno[num + incomplete] = buf - sb-final_buf;
+   sb-lineno[num + incomplete] = len;
sb-num_lines = num + incomplete;
return sb-num_lines;
 }
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] blame.c: prepare_lines should not call xrealloc for every line

2014-02-04 Thread Junio C Hamano
David Kastrup d...@gnu.org writes:

 Anybody know offhand what I should be including here?  It looks like Git
 has some fallback definitions of its own, so it's probably not just
 string.h I should include?

In general, no *.c files outside the compatibility layer should
include anything #include system-header.h, as there seem to be
finicky systems that care about order of inclusion and feature macro
definitions, all of which are meant to be handled by including
git-compat-util.h as the very first thing.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] blame.c: prepare_lines should not call xrealloc for every line

2014-02-04 Thread David Kastrup
Junio C Hamano gits...@pobox.com writes:

 Junio C Hamano gits...@pobox.com writes:

 David Kastrup d...@gnu.org writes:

 Whitespace error in line 1778.  Should I be reposting?

 Heh, let me try to clean it up first and then repost for your
 review.

 Thanks.

 -- 8 --
 From: David Kastrup d...@gnu.org

 Making a single preparation run for counting the lines will avoid memory
 fragmentation.  Also, fix the allocated memory size which was wrong
 when sizeof(int *) != sizeof(int), and would have been too small
 for sizeof(int *)  sizeof(int), admittedly unlikely.

 Signed-off-by: David Kastrup d...@gnu.org
 ---

  One logic difference from what was posted is that sb-lineno[num]
  is filled with the length of the entire buffer when the file ends
  with a complete line.

Where's the difference?  This is exactly what will happen with my code
as well.  I _do_ rely on memchr(whatever, '\n', 0) to return NULL
without looking at any memory for that.  If there is a fear of memchr
not being able to deal with a count of 0, this code needs to be somewhat
more complex.

  I do not remember if the rest of the logic
  actually depends on it (I think I use lineno[n+1] - lineno[n] to
  find the end of line,

Well, you do it about _half_ the time.  The other half, you scan for the
'\n' explicitly.

  The original code dates back to 2006 when the author of the code
  was not paid for doing anything for Git but was doing it as a
  weekend and evening hobby, so it may not be so surprising to find
  this kind of what was I thinking when I wrote it inefficiency in
  such a code with $0 monetary value ;-)

Oh, _this_ patch is not in the I want to make money from it range.  If
that were the case, I should not have bothered at all.  This is just the
this code offends my sense of aesthetics class.  It's purely optional
to apply.  It's conceivable that it will make a performance difference
on non-glibc (or what it's called) platforms.

-- 
David Kastrup
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] blame.c: prepare_lines should not call xrealloc for every line

2014-02-04 Thread David Kastrup
Junio C Hamano gits...@pobox.com writes:

 David Kastrup d...@gnu.org writes:

 Anybody know offhand what I should be including here?  It looks like Git
 has some fallback definitions of its own, so it's probably not just
 string.h I should include?

 In general, no *.c files outside the compatibility layer should
 include anything #include system-header.h, as there seem to be
 finicky systems that care about order of inclusion and feature macro
 definitions, all of which are meant to be handled by including
 git-compat-util.h as the very first thing.

Ok, and that one's not yet in blame.c.  Will include, thanks.

-- 
David Kastrup
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] blame.c: prepare_lines should not call xrealloc for every line

2014-02-04 Thread David Kastrup
Junio C Hamano gits...@pobox.com writes:

 David Kastrup d...@gnu.org writes:

 Making a single preparation run for counting the lines will avoid memory
 fragmentation.  Also, fix the allocated memory size which was wrong
 when sizeof(int *) != sizeof(int), and would have been too small
 for sizeof(int *)  sizeof(int), admittedly unlikely.

 Signed-off-by: David Kastrup d...@gnu.org
 ---
  builtin/blame.c | 40 
  1 file changed, 24 insertions(+), 16 deletions(-)

 diff --git a/builtin/blame.c b/builtin/blame.c
 index e44a6bb..522986d 100644
 --- a/builtin/blame.c
 +++ b/builtin/blame.c
 @@ -1772,25 +1772,33 @@ static int prepare_lines(struct scoreboard *sb)
  {
  const char *buf = sb-final_buf;
  unsigned long len = sb-final_buf_size;
 -int num = 0, incomplete = 0, bol = 1;
 +const char *end = buf + len;
 +const char *p;
 +int *lineno;
 +
 +int num = 0, incomplete = 0;

 Is there any significance to the blank line between these two
 variable definitions?

 +
 +for (p = buf;;) {
 +if ((p = memchr(p, '\n', end-p)) == NULL)
 +break;
 +++num, ++p;

 You have a peculiar style that is somewhat distracting.  Why isn't
 this more like so?

   for (p = buf; p++, num++; ) {
   p = memchr(p, '\n', end - p);
   if (!p)
   break;
   }

Ok, I now wrote

for (p = buf;; num++, p++) {
p = memchr(p, '\n', end - p);
if (!p)
break;
}

and you know what?  Its logic seems wrong.  The reason is that the p++
does not really have anything to do with the iteration, but rather steps
past the '\n' from the memchr.  So it's more like

for (p = buf;; num++) {
p = memchr(p, '\n', end - p);
if (p) {
p++;
continue;
}
break;
}

So barring protests, that's what I'm going to use instead.

-- 
David Kastrup

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] blame.c: prepare_lines should not call xrealloc for every line

2014-02-04 Thread Junio C Hamano
David Kastrup d...@gnu.org writes:

 Junio C Hamano gits...@pobox.com writes:

 David Kastrup d...@gnu.org writes:

 Anybody know offhand what I should be including here?  It looks like Git
 has some fallback definitions of its own, so it's probably not just
 string.h I should include?

 In general, no *.c files outside the compatibility layer should
 include anything #include system-header.h, as there seem to be
 finicky systems that care about order of inclusion and feature macro
 definitions, all of which are meant to be handled by including
 git-compat-util.h as the very first thing.

 Ok, and that one's not yet in blame.c.  Will include, thanks.

No, don't.  Some well-known *.h files of ourse, most notably
cache.h, are known to include compat-util as the first thing, and
that is what *.c files typically include.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] blame.c: prepare_lines should not call xrealloc for every line

2014-02-04 Thread Junio C Hamano
David Kastrup d...@gnu.org writes:

 Ok, I now wrote

   for (p = buf;; num++, p++) {
   p = memchr(p, '\n', end - p);
   if (!p)
   break;
   }

Looks still wrong (perhaps this is a taste issue).

num++ is not loop control, but the real action of this
loop to count lines.  It is better left inside.

p++ is loop control, and belongs to the third part of
for(;;).

Isn't the normal continuation condition p  end?

so something like

for (p = buf; p  end; p++) {
p = find the end of this line
if (!p)
break;
num++;
}

perhaps?
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] blame.c: prepare_lines should not call xrealloc for every line

2014-02-04 Thread David Kastrup
Junio C Hamano gits...@pobox.com writes:

 David Kastrup d...@gnu.org writes:

 Ok, I now wrote

  for (p = buf;; num++, p++) {
  p = memchr(p, '\n', end - p);
  if (!p)
  break;
  }

 Looks still wrong (perhaps this is a taste issue).

   num++ is not loop control, but the real action of this
   loop to count lines.  It is better left inside.

Ok.

   p++ is loop control, and belongs to the third part of
   for(;;).

No, it isn't.  The real loop control is the p = memchr line.  p++ only
skips over the newline.

   Isn't the normal continuation condition p  end?

memchr returns NULL when not finding anything any more.

 so something like

   for (p = buf; p  end; p++) {
   p = find the end of this line
 if (!p)
   break;
   num++;
   }

 perhaps?

Would crash on incomplete last line.

-- 
David Kastrup
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] blame.c: prepare_lines should not call xrealloc for every line

2014-02-04 Thread Junio C Hamano
David Kastrup d...@gnu.org writes:

 so something like

  for (p = buf; p  end; p++) {
  p = find the end of this line
 if (!p)
  break;
  num++;
  }

 perhaps?

 Would crash on incomplete last line.

Hmph, even with if !p?   From your other message:

+   for (p = buf;; num++) {
+   p = memchr(p, '\n', end - p);
+   if (p) {
+   p++;
+   continue;
}
+   break;
+   }

If you flip the if statement around that would be the same as:

+   for (p = buf;; num++) {
+   p = memchr(p, '\n', end - p);
+   if (!p)
break;
p++;
+   }

And with loop action not in the control part, that would be the
same as:

for (p = buf; ; p++) {
p = memchr...
if (!p)
break;
num++;
}

no?  Would this crash on incomplete last line?

Ahh, p  end in for (p = buf; p  end; p++) { is not correct;
you depend on overrunning the end of the buffer to feed length 0 to
memchr and returning NULL inside the loop.  So to be equivalent to
your version, the contination condition needs to be

for (p = buf; p = end; p++) {

But that last round of the loop will be no-op, so p  end vs p =
end does not make any difference.  It is not even strictly
necessary because memchr() limits the scan to end, but it would
still be a good belt-and-suspenders defensive coding practice, I
would think.

+   for (p = buf;;) {
+   *lineno++ = p - buf;
+   p = memchr(p, '\n', end - p);
+   if (p) {
+   p++;
+   continue;
}
+   break;
}

Can we do the same transformation here?

for (p = buf;;) {
*lineno++ = p - buf;
p = memchr...
if (!p)
break;
p++;
}

which is the same as

for (p = buf; ; p++) {
*lineno++ = p - buf;
p = memchr...
if (!p)
break;
}

and the latter has the loop control p++ at where it belongs to. The
post-condition of one iteration of the body of the loop being p
points at the terminating LF of this line, incrementing the pointer
is how the loop control advances the pointer to the beginning of the
next line.

If we wanted to have a belt-and-suspenders safety, we need p =
end here, not p  end, because of how p is used when it is past
the last LF.  As we want to make these two loops look similar, that
means we should use p = end in the previous loop as well.

This was fun ;-)
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] blame.c: prepare_lines should not call xrealloc for every line

2014-02-04 Thread Philip Oakley

From: David Kastrup d...@gnu.org
To: Junio C Hamano gits...@pobox.com
Cc: git@vger.kernel.org
Sent: Tuesday, February 04, 2014 9:09 PM
Subject: Re: [PATCH] blame.c: prepare_lines should not call xrealloc for 
every line




Junio C Hamano gits...@pobox.com writes:


Junio C Hamano gits...@pobox.com writes:


David Kastrup d...@gnu.org writes:


Whitespace error in line 1778.  Should I be reposting?


Heh, let me try to clean it up first and then repost for your
review.

Thanks.


-- 8 --
From: David Kastrup d...@gnu.org

Making a single preparation run for counting the lines will avoid 
memory

fragmentation.  Also, fix the allocated memory size which was wrong
when sizeof(int *) != sizeof(int), and would have been too small
for sizeof(int *)  sizeof(int), admittedly unlikely.

Signed-off-by: David Kastrup d...@gnu.org
---

 One logic difference from what was posted is that sb-lineno[num]
 is filled with the length of the entire buffer when the file ends
 with a complete line.


Where's the difference?  This is exactly what will happen with my code
as well.  I _do_ rely on memchr(whatever, '\n', 0) to return NULL
without looking at any memory for that.  If there is a fear of memchr
not being able to deal with a count of 0, this code needs to be 
somewhat

more complex.


A bit of googling found http://marc.info/?l=gnulib-bugm=130108029329021 
which suggests that behaviour can't be relied upon, and that perhaps 
some code is 'buggy' relative to expectations (hence the patch it 
proposed).


It suggests that one can't properly reference a zero length object.




 I do not remember if the rest of the logic
 actually depends on it (I think I use lineno[n+1] - lineno[n] to
 find the end of line,


Well, you do it about _half_ the time.  The other half, you scan for 
the

'\n' explicitly.


 The original code dates back to 2006 when the author of the code
 was not paid for doing anything for Git but was doing it as a
 weekend and evening hobby, so it may not be so surprising to find
 this kind of what was I thinking when I wrote it inefficiency in
 such a code with $0 monetary value ;-)


Oh, _this_ patch is not in the I want to make money from it range. 
If
that were the case, I should not have bothered at all.  This is just 
the
this code offends my sense of aesthetics class.  It's purely 
optional

to apply.  It's conceivable that it will make a performance difference
on non-glibc (or what it's called) platforms.

--
David Kastrup
--


Philip 


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] blame.c: prepare_lines should not call xrealloc for every line

2014-02-04 Thread Philip Oakley

From: Philip Oakley philipoak...@iee.org

From: David Kastrup d...@gnu.org
To: Junio C Hamano gits...@pobox.com
Cc: git@vger.kernel.org
Sent: Tuesday, February 04, 2014 9:09 PM
Subject: Re: [PATCH] blame.c: prepare_lines should not call xrealloc 
for every line



[...]


Where's the difference?  This is exactly what will happen with my 
code

as well.  I _do_ rely on memchr(whatever, '\n', 0) to return NULL
without looking at any memory for that.  If there is a fear of memchr
not being able to deal with a count of 0, this code needs to be 
somewhat

more complex.


A bit of googling found 
http://marc.info/?l=gnulib-bugm=130108029329021 which suggests that 
behaviour can't be relied upon, and that perhaps some code is 'buggy' 
relative to expectations (hence the patch it proposed).


It suggests that one can't properly reference a zero length object.



I think I've confused myself between the patch 
https://lkml.org/lkml/2010/11/24/429 which fixed a bug in memchr() and 
the discussion in 
http://lists.gnu.org/archive/html/bug-gnulib/2011-03/msg00273.html 
(alternate link to marc.info) that discusses realloc() which has


we changed gnulib's test-memchr to
instead test memchr(zerosize_ptr(),a,0) rather than memchr(NULL,a,0).
However, once you can handle memchr(,0) on any other zero-size object,
most implementations also happen to do what you want for
memchr(NULL,a,0), even though it is technically undefined behavior.
ending with 'technically undefined behavior' which I misconstrued.

Apologies for the noise.




 I do not remember if the rest of the logic
 actually depends on it (I think I use lineno[n+1] - lineno[n] to
 find the end of line,



...

--
David Kastrup
--


Philip
--


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html