Re: Circumstances in which ChangeLog format is no longer useful

Joseph Myers Fri, 28 Jul 2017 16:48:07 -0700

On Fri, 28 Jul 2017, Alfred M. Szmidt wrote:

>    1. The package has a public version control system.
> 
>    (Rationale: this ensures people can see what changed, just as with
>    ChangeLogs, but can see *exactly* what changed rather than just the
>    brief descriptions.)
> 
> I think that rationale is incorrect, just because you have a public
> version control system does not mean that you can see what actually
> changed.  Going through multiple megabytes of diffs is not feasible,
> and searching for when something was renamed, added, removed, etc is
> something no tool is capable of providing.


That's a function of a busy project and is the same whether you look at 
commit logs, diffs or ChangeLog messages.

>    2. The version control uses a distributed version control system.
> 
>    (Rationale: this ensures people can get a complete copy of the
>    history of what changed, as they can with ChangeLog files in
>    releases.)
> 
> How would the information that is normally available in a ChangeLog
> file be populated if all that information is in the VCS?  That would
> still be needed for normal tarballs and the like when VCS is out the
> window.

I wouldn't object to shipping the version control history in tarballs, if 
necessary to stop having to write in the ChangeLog format (or having 
tarballs with and without the version control history).  Or straight 
copies of the version control logs, as long as no-one actually has to 
manually write the list of files, named entities within those files and 
what is changed in each named entity (instead just having human-written 
logs describing what changed at the logical level).  But I believe that 
people wanting to look at the history are going to check out the 
repository rather than attempting to get it from tarballs.

>    3. Commits are made for each logical change, not batched into a
>    commit per release or per day or other such batching.
> 
>    (Rationale: this ensures as much separation of logically separate
>    changes as there would be in a ChangeLog file.)
> 
> This is I think a good idea, the bunching of ChangeLog entries always
> felt a bit weird.

I'd actually like points 1 and 3 (public VCS with logical commits) to be 
required for all GNU packages, but that's independent of my present point.

>    5. Commit messages describe the logical "what" changed (but don't
>    necessarily describe the physical "what" at the level of changes to
>    individual files and functions).
> 
>    (Rationale: the logical "what" is useful information at the human
>    level for understanding the change.  Listing individual changed
>    files and functions both duplicates the information available from
>    the version control system, and is at the wrong level for
>    understanding the change for most purposes.
> 
> I am not sure I understand why it is wrong, to be able to understand
> how something came to be one needs to look at how things changed --
> and only way to do that is with a ChangeLog entry.

The only way to do that reliably is with the version control history.  
Which is what people expect to use to look at how something came to be as 
it is - they expect to check out a repository, not to look at a tarball 
for that information.

>    It's normal in glibc, for example, for a change to affect many
>    separate files and named entities in those files, in ways that are
>    repetitive but not repetitive enough to use e.g. "All callers
>    changed", and which the ChangeLog format does not provide a good
>    fit to or result in useful information about the changes not
>    available from version control.)
> 
> Knowing that all callers have been changed is I think useful
> information, why do you think the opposite?

The point is that the changes are mechanical, but not in a way that 
corresponds to "all callers changed", and that listing all the named 
entities changed and how they changed is error-prone, time-consuming 
(possibly taking longer than writing the patch itself) and results in a 
ChangeLog entry that is completely useless for people wanting to 
understand the change (who will want the description at the logical level, 
and if they want the exact details for each named entity, will find the 
version control history more useful).

Here's a representative example of a ChangeLog entry I wrote recently.  I 
think that given the logical description (summary line plus two 
paragraphs) in the commit message, and given the commit history for anyone 
interested in the exact details of how particular files or entities were 
changed, writing this ChangeLog entry was a complete waste of time and it 
provides nothing useful for anyone using or developing glibc.  And this 
sort of mostly-mechanical change, with many files and entities therein 
changed in similar but not identical ways, is very common when working on 
glibc; I've written a great many such ChangeLog entries, some much longer 
than this one with hundreds of named entities enumerated as changed.  And, 
similarly, for GCC changes.  Spending the time to write several paragraphs 
at the human level about the content and purpose of a change is 
worthwhile.  Spending the time to duplicate, badly, the information in the 
diff itself about changed files and entities therein is just an extra 
unnecessary hoop to jump through when making a change.

2017-06-01  Joseph Myers  <[email protected]>

        [BZ #21457]
        * sysdeps/arm/sys/ucontext.h (NGREG): Rename to __NGREG and define
        NGREG to __NGREG if [__USE_MISC].
        (gregset_t): Define using __NGREG.
        (__ctx): New macro.
        (mcontext_t): Use __ctx in defining fields.
        * sysdeps/i386/sys/ucontext.h (NGREG): Rename to __NGREG and
        define NGREG to __NGREG if [__USE_MISC].
        (gregset_t): Define using __NGREG.
        (__ctx): New macro.
        (__ctxt): Likewise.
        (fpregset_t): Use __ctx and __ctxt in defining fields.
        (mcontext_t): Likewise.
        * sysdeps/m68k/sys/ucontext.h (NGREG): Rename to __NGREG and
        define NGREG to __NGREG if [__USE_MISC].
        (gregset_t): Define using __NGREG.
        (__ctx): New macro.
        (mcontext_t): Use __ctx in defining fields.
        * sysdeps/mips/sys/ucontext.h (NGREG): Rename to __NGREG and
        define NGREG to __NGREG if [__USE_MISC].
        (gregset_t): Define using __NGREG.
        (__ctx): New macro.
        (fpregset_t): Use __ctx in defining fields.
        (mcontext_t): Likewise.
        * sysdeps/unix/sysv/linux/alpha/sys/ucontext.h (NGREG): Rename to
        __NGREG and define NGREG to __NGREG if [__USE_MISC].
        (gregset_t): Define using __NGREG.
        (NFPREG): Rename to __NFPREG and define NFPREG to __NFPREG if
        [__USE_MISC].
        (fpregset_t): Define using __NFPREG.
        * sysdeps/unix/sysv/linux/m68k/sys/ucontext.h (NGREG): Rename to
        __NGREG and define NGREG to __NGREG if [__USE_MISC].
        (gregset_t): Define using __NGREG.
        (__ctx): New macro.
        (fpregset_t): Use __ctx in defining fields.
        (mcontext_t): Likewise.
        * sysdeps/unix/sysv/linux/mips/sys/ucontext.h (NGREG): Rename to
        __NGREG and define NGREG to __NGREG if [__USE_MISC].
        (NFPREG): Rename to __NFPREG and define NFPREG to __NFPREG if
        [__USE_MISC].
        (gregset_t): Define using __NGREG.
        (__ctx): New macro.
        (fpregset_t): Use __ctx in defining fields.
        (mcontext_t): Likewise.
        * sysdeps/unix/sysv/linux/nios2/sys/ucontext.h (__ctx): New macro.
        (mcontext_t): Use __ctx in defining fields.
        * sysdeps/unix/sysv/linux/powerpc/sys/ucontext.h (__ctx): New
        macro.
        [__WORDSIZE == 32] (NGREG): Rename to __NGREG and define NGREG to
        __NGREG if [__USE_MISC].
        [__WORDSIZE == 32] (gregset_t): Define using __NGREG.
        [__WORDSIZE == 32] (fpregset_t): Use __ctx in defining fields.
        (mcontext_t): Likewise.
        [__WORDSIZE != 32] (NGREG): Rename to __NGREG and define NGREG to
        __NGREG if [__USE_MISC].
        [__WORDSIZE != 32] (NFPREG): Rename to __NFPREG and define NFPREG
        to __NFPREG if [__USE_MISC].
        [__WORDSIZE != 32] (NVRREG): Rename to __NVRREG and define NVRREG
        to __NVRREG if [__USE_MISC].
        [__WORDSIZE != 32] (gregset_t): Define using __NGREG.
        [__WORDSIZE != 32] (fpregset_t): Define using __NFPREG.
        [__WORDSIZE != 32] (vscr_t): Use __ctx in defining fields.
        [__WORDSIZE != 32] (vrregset_t): Likewise.
        [__WORDSIZE != 32] (mcontext_t): Likewise.
        * sysdeps/unix/sysv/linux/s390/sys/ucontext.h (__ctx): New macro.
        (__psw_t): Use __ctx in defining fields.
        (NGREG): Rename to __NGREG and define NGREG to __NGREG if
        [__USE_MISC].
        (gregset_t): Define using __NGREG.
        (fpreg_t): Use __ctx in defining fields.
        (fpregset_t): Likewise.
        (mcontext_t): Likewise.
        * sysdeps/unix/sysv/linux/sh/sys/ucontext.h (NGREG): Rename to
        __NGREG and define NGREG to __NGREG if [__USE_MISC].
        (gregset_t): Define using __NGREG.
        (NFPREG): Rename to __NFPREG and define NFPREG to __NFPREG if
        [__USE_MISC].
        (fpregset_t): Define using __NFPREG.
        (__ctx): New macro.
        (mcontext_t): Use __ctx in defining fields.
        * sysdeps/unix/sysv/linux/x86/sys/ucontext.h (__ctx): New macro.
        [__x86_64__] (NGREG): Rename to __NGREG and define NGREG to
        __NGREG if [__USE_MISC].
        [__x86_64__] (gregset_t): Define using __NGREG.
        [__x86_64__] (struct _libc_fpxreg): Use __ctx in defining fields.
        [__x86_64__] (struct _libc_fpstate): Likewise.
        [__x86_64__] (mcontext_t): Likewise.
        [!__x86_64__] (NGREG): Rename to __NGREG and define NGREG to
        __NGREG if [__USE_MISC].
        [!__x86_64__] (gregset_t): Define using __NGREG.
        [!__x86_64__] (struct _libc_fpreg): Use __ctx in defining fields.
        [!__x86_64__] (struct _libc_fpstate): Likewise.
        [!__x86_64__] (mcontext_t): Likewise.

> Being able to generate the ChangeLog file is I think important for
> posterity, tarball releases lack any kind of history.  History has a
> really bad memory, just because one uses a VCS today doesn't mean that
> this will be available in 10, 20, 30 years in any usable format, or it
> might vanish completely.

Well, you could add a requirement not to switch away from a distributed 
VCS or to switch to a different VCS without converting history.  And 
indeed one to have the repository present or mirrored on GNU servers, if 
desired (or to have release tarball versions that include the VCS history, 
etc.).

>      Keep a change log to describe all the changes made to program
>      source files.  The purpose of this is so that people
>      investigating bugs in the future will know about the changes that
>      might have introduced the bug.  Often a new bug can be found by
>      looking at what was recently changed.  More importantly, change
>      logs can help you eliminate conceptual inconsistencies between
>      different parts of a program, by giving you a history of how the
>      conflicting concepts arose and who they came from.
> 
>    All this information is available in version control.
> 
> If you put ChangeLog entries in the commit message, then yes this
> information will be available.  But if you discard ChangeLog entries
> completely, I do not see how it can be available.  "annotate", "diff"
> don't provide a human readable and searchable means to go through
> history.  The information also becomes totally lost as soon as you
> discard the VCS (i.e. when doing releases).

You know about the changes much more reliably from the VCS than from 
ChangeLog entries, given that people may forget to write the ChangeLog 
entry, or may miss out a file or function's changes from it, or may commit 
with a ChangeLog from a previous version of the patch that doesn't 
correspond accurately to the committed patch version (given the make-work 
nature of writing most ChangeLog entries, and given they are something not 
generally used outside the GNU project, updating them is often something 
people don't think of doing - again, watching for badly updated ChangeLog 
entries in patch review is both necessary at present, and essentially a 
waste of time).  You can use e.g. "git log -p --stat" and search for file 
or function names (function names mentioned automatically on the @@ line 
of diff context are going to be at least as accurate as those in ChangeLog 
entries, given they are probably what people use when writing their 
ChangeLog entries to identify the functions changed).  The precise details 
may differ, but you have much more flexibility when looking at the actual 
history than something written at a very specific level (too high to 
actually undo the changes, too low to readily get an overall understanding 
of a complicated change) for ChangeLogs.
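
To make the point about the @@ line concrete, here is a minimal sketch,
assuming plain unified-diff input of the kind "git log -p" or "git diff"
prints.  The function name entities_by_file and the sample diff are
hypothetical, invented for illustration; nothing like this is part of glibc
or git.  It only shows that the per-file, per-entity list a ChangeLog entry
enumerates can be read back mechanically from the diff.

```python
import re

# Unified-diff hunk header.  The optional text after the second "@@" is the
# enclosing function/context line that git diff (or diff -p) appends.
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@(?: (.*))?$")


def entities_by_file(diff_text):
    """Map each changed file to the context strings found on its @@ lines.

    Simplification (an assumption of this sketch): any line starting with
    "+++ " is treated as a file header, so unusual added lines or deleted
    files (/dev/null) are not handled specially.
    """
    result = {}
    current = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            path = line[4:].split("\t")[0]
            if path.startswith("b/"):
                path = path[2:]  # drop git's "b/" prefix
            current = result.setdefault(path, [])
        elif current is not None:
            m = HUNK_RE.match(line)
            if m and m.group(1) and m.group(1) not in current:
                current.append(m.group(1))
    return result


# Hypothetical two-hunk diff, used only to exercise the function.
SAMPLE = """\
diff --git a/foo.c b/foo.c
--- a/foo.c
+++ b/foo.c
@@ -10,7 +10,7 @@ int frob (int i)
-  i += 2;
+  i++;
@@ -40,6 +40,7 @@ static void helper (void)
+  count++;
"""

print(entities_by_file(SAMPLE))
```

The context text on the @@ line is whatever diff's heuristic picks as the
enclosing function, so it can be wrong in the same way a hand-written entity
name can be; the sketch does not claim otherwise.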

>    Because the problem with ChangeLogs, as seen in glibc and
>    elsewhere, is with needing to write descriptions in a particular
>    format, at a level that is not useful for human understanding of
>    the changes while not being as detailed as the exact changes
>    themselves in version control, being able to generate ChangeLogs
>    from version control using suitably-formatted log messages does not
>    address the issue.
> 
> Are we talking about the entries, or the actual ChangeLog file?  Many
> projects have abandoned keeping actual ChangeLog files, and extracting
> this information when making a tarball release since they cause the
> typical merge conflicts and whatnot.  If you are referring to the
> ChangeLog entries, I am not sure what problems you are referring to.

The problem that enumerating individual named entities changed consumes 
the time of contributors, confuses and puts off people used to non-GNU 
free software which invariably does not use this particular pre-VCS form 
of describing changes, and results in long unhelpful descriptions which 
don't allow you to see the wood for the trees because of the focus on a 
particular low-level repetition of what the change itself is for each 
individual entity, as can be seen in the VCS, rather than what the change 
is as a logical whole.

>    This form of description is exactly what's the problem.  In the
>    presence of ubiquitous distributed version control, writing this
>    style of description is the equivalent of:
> 
>      /* Add 1 to i.  */
>      i++;
> 
>    (that is, just repeating the immediately obvious meaning of the
>    history that everyone can see, and so effectively serving to hide
>    what's actually interesting about the history at a human level and
>    *should* be described).
> 
> I don't think the comparison is fair, the point of the ChangeLog files
> is to be able to undo changes.  The comment above doesn't actually

The point of the VCS is to be able to undo changes.  ChangeLog files, and 
the form of change description therein, are in no way a substitute for the 
VCS, and are essentially obsoleted by it.

> provide anything, a more apt comparison would have been:
> 
>      /* Change #1 was: Add pi to i.  */
>      /* Change #2 was: Add 1 to i.  */
>      i += 2;

No, my assertion is that "Add 1 to i." is to "i++;" as the above long 
ChangeLog entry is to the actual commit involved - a repetition of what 
everyone can plainly see from looking at the thing described (a C 
statement in the first case, a commit in the git history of glibc in the 
second case), and so completely useless.

Instead of "Add 1 to i." you should describe logical blocks of code at the 
logical level with things that aren't immediately repeating the obvious 
semantics of the code.  And, likewise, the actual commit message

    Fix more namespace issues in sys/ucontext.h (bug 21457).
    
    Continuing the fixes for namespace issues in sys/ucontext.h, this
    patch moves various symbols into the implementation namespace in the
    absence of __USE_MISC.  As with previous changes, it is nonexhaustive,
    just covering more straightforward cases.
    
    Structure fields are generally changed to have a prefix __ in the
    absence of __USE_MISC, via a macro __ctx (used without a space before
    the open parenthesis, since the result is a single identifier).
    Various macros such as NGREG also have leading __ added.  No changes
    are made to structure tags (and thus to C++ name mangling), except
    that in the (unused) file sysdeps/i386/sys/ucontext.h, structures
    defined inside other structures as the type for a field have their
    tags removed in the non-__USE_MISC case (those structure tags would
    not in any case have been visible in C++, because in C++ the scope of
    such a tag is limited to the containing structure).  No changes are
    made to the contents of bits/sigcontext.h, or to whether it is
    included.  Because of remaining namespace issues, this patch does not
    yet fix the bug or allow any XFAILs to be removed.

describes the logical nature of the change (including what is *not* 
changed, where relevant, which ChangeLog files would never mention), at 
the appropriate level for people to understand it.  I think people should 
be writing commit logs at that level rather than spending time duplicating 
the VCS information on exactly which symbols were changed in which files.

> That information is very useful when digging for bugs, and
> understanding how a code base was changed.  Just because one uses VCS
> doesn't mean that history is automatically available to everyone,
> someone still needs to write a commit message of some sort (i.e. the
> ChangeLog entry)

I.e. the sort of message above that you use to justify and explain the 
change at the logical level rather than enumerating files and symbols 
therein.

I'm all for proper detailed commit messages explaining both the content 
and the purpose of the change at the logical level, as used by the Linux 
kernel and by git itself.  It's the descriptions at the per-file, 
per-function level in the ChangeLog format that I consider unhelpful when 
they duplicate VCS information, badly.  I think GNU should be encouraging 
the sort of commit messages used by the Linux kernel and git, i.e. the 
sort of patch description you'd put in a mailing list message proposing 
and explaining the patch, while leaving the VCS to show what files and 
bits of files were changed, how, for those interested in that information.

> Sifting through multi-megabyte diffs isn't very fun when trying to get
> a bird's-eye view of what actually happened in a code base, and this is
> where ChangeLog entries are super useful and I'd argue totally
> necessary for any code base.

I don't think so.  If someone wants to understand what changed between 
glibc 2.25 and 2.26 in more detail than the NEWS file gives, they might 
look at the above sort of description in the commit log; it will be much 
more helpful to them, and give much more insight into glibc development, 
than over 10000 lines of ChangeLog entries enumerating files and symbols.  
If they want to see the files changed, git log --stat.  If they want to 
see deeper into particular changes, git log -p --stat and look at 
whichever changes are of interest.
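
As a rough sketch of those invocations, the helper below only builds the
argument lists; it does not run git.  The function name history_commands is
hypothetical, and the tag names glibc-2.25 and glibc-2.26 in the usage line
are an assumption about how the releases are tagged.  The -S option is git's
"pickaxe" search, which limits output to commits that change the number of
occurrences of the given string.

```python
def history_commands(old_tag, new_tag, needle=None):
    """Return git argument lists for browsing history between two tags."""
    rev_range = f"{old_tag}..{new_tag}"
    cmds = {
        # Which files each commit touched, alongside its commit message.
        "files": ["git", "log", "--stat", rev_range],
        # The same, plus the full patch for each commit.
        "patches": ["git", "log", "-p", "--stat", rev_range],
    }
    if needle is not None:
        # Only commits that add or remove occurrences of `needle`.
        cmds["search"] = ["git", "log", "-p", "--stat", f"-S{needle}", rev_range]
    return cmds


# Assumed tag names; substitute whatever the repository actually uses.
cmds = history_commands("glibc-2.25", "glibc-2.26", needle="__NGREG")
for name, argv in cmds.items():
    print(name, " ".join(argv))
```

Inside a checkout, each list could be passed to subprocess.run(argv,
check=True) to get the corresponding view.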

-- 
Joseph S. Myers
[email protected]
