Re: Do not merge index entries with equal sort keys?

arnold Mon, 27 Apr 2026 07:08:33 -0700

Eli Zaretskii <[email protected]> wrote:

> > From: Gavin Smith <[email protected]>
> > Date: Mon, 27 Apr 2026 14:09:37 +0100
> > Cc: [email protected], [email protected]
> > 
> > On Mon, Apr 27, 2026 at 02:48:23PM +0300, Eli Zaretskii wrote:
> > > > From: Gavin Smith <[email protected]>
> > > > Date: Sun, 26 Apr 2026 22:01:37 +0100
> > > > Cc: [email protected]
> > > > 
> > > > I remember that the convention of merging index entries with the
> > > > same sort key was very old (from when I looked at this before, several
> > > > years ago), but I thought we could reconsider this, as I do not actually
> > > > see any advantage of merging the entries.
> > > > 
> > > > Does anyone have an opinion on this?
> > > 
> > > Isn't this what causes the Index to have stuff like
> > > 
> > >   Foo bar.......................................42, 142, 442
> > > 
> > > rather than
> > > 
> > >   Foo bar.......................................42
> > >   Foo bar.......................................142
> > >   Foo bar.......................................442
> > > 
> > > That is, if the same subject is mentioned in several places, have one
> > > cumulative entry for it in the index with all the pages?  If so, I
> > > see a clear advantage to merging the entries, at least for
> > > non-punctuation characters.
> > 
> > I propose that they only be merged if the index entry text is identical.
> > 
> > Thus, the following two index entries should be merged:
> > 
> > @cindex Foo bar
> > @cindex Foo bar
> > 
> > However, the following index entries should be distinct:
> > 
> > @cindex Foo bar
> > @cindex Foo @code{bar}
> > @cindex Foo @command{bar}
> > @cindex Föö bar
> > 
> > I notice in the NEWS file for Texinfo, in the section for 6.0, there
> > is:
> > 
> > * texindex:
> >   . completely new implementation as a literate program using Texinfo
> >     and (portable) awk (called TexiWeb Jr.), thanks to Arnold Robbins.
> >     (Requires gawk 4.0+ if .twjr source is modified.)
> >   . the -o (--output) is not supported, unless we hear of someone using it.
> >   . duplicated sort keys with different display texts result in one
> >     merged index entry, using the first display text.
> >   . better sorting and parsing in unusual cases; most notably, { and }
> >     characters can appear as initials.
> > 
> > Bullet point 3 is what I am talking about here.
>
> I guess I didn't understand what you meant by "the same sort key".  I
> thought the entire text of each index entry is "the sort key", so I
> interpreted "the same key" as "the same text of index entry".  This is
> your first example.  It seems now that you are talking about collation
> that ignores certain secondary/tertiary weights, like accents and
> punctuation?  If so, then whether this is a Good Thing should indeed
> be controllable by some option, because it is quite possible that
> someone will want to merge them.


Eli, I think you're missing an additional wrinkle. It's possible
to explicitly say "use this text for sorting" via @sortas{}, and that
text can be different from the "display text".

By default having

        @cindex Foo bar
        @cindex Foo @code{bar}
        @cindex Foo @command{bar}

be distinct is a Good Thing.  It lets you see where you've been inconsistent
in the use of indexing markup, which can happen easily.
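
To make the difference between the two policies concrete, here is a rough
sketch in Python.  It is not the texindex code (which is awk); the
entries, sort keys, page numbers and the merge() function are all made up
for illustration, and I'm assuming the three entries end up with the same
sort key.

```python
# Hypothetical index entries: (sort key, display text, page number).
# Assumption: markup is ignored when the sort key is derived (or the key
# is given explicitly with @sortas{}), so all four share one sort key.
ENTRIES = [
    ("foo bar", "Foo bar", 42),
    ("foo bar", "Foo @code{bar}", 142),
    ("foo bar", "Foo @command{bar}", 442),
    ("foo bar", "Foo bar", 500),
]


def merge(entries, by_display_text):
    """Collect page numbers per index entry.

    by_display_text=False: merge on the sort key alone and keep the
    first display text seen for that key.
    by_display_text=True: merge only entries whose display text is
    identical as well.
    """
    merged = {}  # dicts keep insertion order in Python 3.7+
    for sort_key, text, page in entries:
        key = (sort_key, text) if by_display_text else sort_key
        # setdefault() stores (text, []) only the first time a key is seen.
        _first_text, pages = merged.setdefault(key, (text, []))
        pages.append(page)
    return list(merged.values())


if __name__ == "__main__":
    for policy in (False, True):
        print("by display text" if policy else "by sort key only")
        for text, pages in merge(ENTRIES, policy):
            print("  %s: %s" % (text, ", ".join(map(str, pages))))
```

Merging on the sort key alone hides the inconsistent markup behind a
single "Foo bar" line; merging only on identical text leaves the three
variants visible side by side.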

A few years ago I put *a lot* of work into the index in the gawk manual,
and the kind of thing I just described showed up in a few places.

On a related point, the gawk manual has three pages of symbol index
before the letter pages, so not merging symbols is the right behavior
for me.  This is simply proof that both behaviors need to be
available, whatever the default is.  I have no problem with changing
the default, as long as I have the option to change it back for
my manual.

Thanks,

Arnold
