On Mon, May 8, 2017 at 2:55 PM, Jeff King <p...@peff.net> wrote:
> On Mon, May 08, 2017 at 10:02:58AM +0900, Junio C Hamano wrote:
>
>> Stefan Beller <sbel...@google.com> writes:
>>
>> > I guess it is ok for now and in this series, but we may want
>> > to split up diff.[ch] in the future into multiple finer grained files.
>>
>> For what end?  Such a split would require more symbols internal to
>> diff.[ch] to become external, which is a big downside, so we need to
>> have a large reward to compensate it if we were to go there.
>
> I think there are two sides to that coin. Let's say you have a file with
> five functions (and you can replace function with structs, global
> variables, etc; any unit of complexity that might or might not be
> externally visible):
>
>   /* called by both a() and b() */
>   static void a_and_b_helper();
>
>   /* called by a() */
>   static void a_helper();
>   void a();
>
>   /* called by b() */
>   static void b_helper();
>   void b();
>
> When they are in the same file, b() and b_helper() can see a_helper(),
> even though they don't need it. And that means increased complexity in
> dealing with a_helper(), because its visibility is larger than
> necessary. We have to worry about what a change to it might do to b()
> (or more realistically, c(), d(), etc).
>
> If we split this apart, we end up with three files:
>
>    common.c:
>      void a_and_b_helper();
>
>    a.c:
>      static void a_helper();
>      void a();
>
>    b.c:
>      static void b_helper();
>      void b();
>
> The specific helpers have less visibility, which is good. The public
> functions a() and b() were already public, so no change. But now the
> common helper is public, too, even though nobody except a() and b() care
> about it.
>
> So it's a tradeoff. And the important question is: is the bleed-over
> between a() and b() worse than the common bits being made public?  That
> can't be answered without looking at how many distinct "a" and "b"-like
> chunks there are in the file, and what the common bits look like. I'm
> not sure of the answer for diff.c. Without digging, both ends of the
> spectrum seem equally plausible to me: that it is mostly a set of N
> distinct units which could be split apart, or that it really is a few
> public functions calling the same common core over and over.
>
> And a follow-on question is what we can do to mitigate the cost of
> making the common code public. We could advertise a_and_b_helper() only
> in diff-internal.h, and then makes it only semi-public (anybody can
> include that, of course, but including diff-internal.h seems like it
> ought to be a red flag). That only helps the programmer, though; we'd
> still be losing out on compiler optimizations and static analysis that
> only looks at one translation unit at a time.
>
> Phew. That ended up a little long-winded. All of it is to say that I
> don't think it makes sense to reject a split out-of-hand, but nor is it
> always a good thing. It depends on what's in the file.

I agree on this sentiment. It really depends on the content under
discussion whether it makes sense to split.

Having had some exposure recently for diff.[ch] (and I just picked up
that series again, but did not send it out yet), I have the impression that
we do have a lot of code in diff.c, which is quite unrelated to each
other, i.e. a lot of [a,b]_helper()s, and few a_and_b_helper() for the
already public functions.

May first mail was based on perceived unneeded complexity, which
slows down in achieving a goal.

Thanks,
Stefan

Reply via email to