One of the threats to a project's health is bloat, or scope creep. I
feel like it is an appropriate moment, following the Texinfo release,
to report on some ways that the Texinfo project could be seen as bloated.
(You could also say that considering big-picture issues is a pleasure
we can allow ourselves after the time spent considering small-picture issues.)
This could be of use to current or future developers of the project.
I've come to think of the Unix philosophy of "do one thing, and do it
well" as a maxim of project management, as much as it is a criterion
for the beauty or utility of a technical product. If too much work
is funnelled through a single person, they will be spread too thin,
and drop things on the floor (to mix three metaphors).
A bloated project can die because the current developers feel daunted
by the prospect of making new changes, which become increasingly
difficult and time-consuming to make, while potential new developers
fail to get to grips with the code well enough to contribute within the
time and effort they are willing to put in.
Responsibilities and tasks grow while the enthusiasm, time and ability
of workers to fulfil them diminish.
A few years ago, Richard Stallman wrote the following on a private
mailing list, in response to another email I wrote about the problems
of project maintenance:
> An idea that occurs to me is to make it easy to carve out a niche
> within the project which is modular enough that a few contributors can
> do a lot of work inside it without having to coordinate very often
> about things outside that niche.
(2021-01-02)
I think that's worth thinking about.
Readers may have had the experience themselves of attempting to work on
a code base that appears to have been abandoned by its original developers,
and failing to make headway on it in any reasonable time.
The Texinfo project is not dead at the moment, but it may still become
more and more bloated over time.
Hence, I have been keeping a list of aspects of the project that could be
considered bloat.
* The infog/ document browser embedding a WebKitGTK web browser, for viewing
local HTML documentation. The existing developers clearly don't have time to
work on this, and it seems unlikely that it would ever be used by many
people. Instead, we should work on defining standards for Texinfo HTML
documentation and promote the installation of local HTML manuals and
reading them with existing help programs (see TODO.HTML in the Texinfo
repository for some notes).
* The browsing interface under js/, including the info.js file (enabled
with the INFO_JS_DIR option to Texinfo). The existing Texinfo developers
very likely don't have the time or interest to become expert JavaScript
programmers or web developers, and again, this part of the code base sees
minimal development. Although this code does have a small number of users,
it would be better developed outside of the Texinfo project by JavaScript
enthusiasts. Again, defining the format of the HTML output of texi2any
would help in making a separation here.
* The Texinfo language reference card. It is one more thing to keep
updated, although there's no indication that many people look at it.
The reference card is actually five cards, as the document runs across
five pages.
* The pod2texi program. I suppose the purpose of this program is clear:
it converts POD to Texinfo. I've never used it and never touched the
sources. It does not seem to be getting in the way of other development,
so can probably be left alone.
* In the Texinfo language, there are several commands that are unnecessary.
@definfoenclose is the worst one, as it allows defining new Texinfo
commands, which adds significant complexity. (When I was writing the
Parsetexi C code and reading Texinfo/Parser.pm for the first time, I
kept asking myself: what is this definfoenclose feature I keep seeing
referred to all over the code?) Unfortunately, we don't want to break
the building of old versions of documentation unnecessarily, so we
should only remove commands that were never used by anybody in the
first place (I believe @clickstyle may have been in this category).
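To illustrate the complexity, @definfoenclose lets a manual define its
own markup commands, so the parser must be prepared for command names
it has never seen before. The sketch below follows the example in the
Texinfo manual (the command name @phoo is the manual's invented
example, not a real command):

```texinfo
@c Define a new command @phoo whose argument is enclosed
@c between // and \\ in Info output.
@definfoenclose phoo, //, \\

Here is a @phoo{sample} of the user-defined command;
in Info output it appears as //sample\\.
```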
Many of my comments relate to conversion of Texinfo:
* First, there are two main converters: texinfo.tex with TeX, and texi2any.
Implementing significant new changes in syntax using TeX is often difficult.
TeX is simply an inappropriate system for many aspects of text processing.
On the plus side, conversion of Texinfo manuals with TeX is very well
understood, and I usually find it quick to fix problems with texinfo.tex.
It helps that TeX is a stable, limited and self-contained system, and
eventually somebody working with a system built on top of TeX will get
to a sufficient level of understanding (which probably can't be said of
the LaTeX ecosystem or JavaScript frameworks).
It's not clear what would replace texinfo.tex (as, indeed, it is not
clear what could replace TeX). The LaTeX output from texi2any is not
a sufficient replacement in my opinion: for one thing, it has more
dependencies. Moreover, it adds an extra layer of indirection to the
output, making it harder to fine-tune the final result.
One idea is to make texi2any output plain TeX code that doesn't use
TeX macros, which would eliminate the macro-programming side of TeX
(no more \expandafter). This way we would still avoid having to output
PDF directly from texi2any, implementing our own line-breaking
algorithm, and so on.
I am not advocating for this change, and there isn't any imminent need
for it, but I mention it here as a possibility for the future.
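To sketch the idea (purely hypothetically; no such output mode
exists), texi2any would do all command expansion itself and emit only
low-level typesetting instructions, leaving TeX's paragraph builder to
do what it is good at:

```tex
% Hypothetical macro-free output: every Texinfo command has
% already been expanded by texi2any, leaving only primitive-level
% plain TeX typesetting instructions.
\font\chapfont=cmbx12
{\chapfont 1 Overview}\par
\vskip\baselineskip
Body text with an {\it emphasized\/} phrase, line-broken by
TeX's own paragraph-breaking algorithm.\par
\bye
```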
* texi2any. This is the component of the code that sees the most
development, and which has the most complexity.
- First, it could be its own project, separate from the other programs
in Texinfo like info and install-info. It already has its own
configure.ac and gnulib checkout so would cleanly separate from
the rest of the code.
(I don't see any need for such a separation, but am mentioning it
as a possibility as that is something that could potentially lead to
more focused development.)
However, it's coupled to texinfo.tex in terms of the Texinfo language.
texinfo.tex is then coupled to the texindex and texi2dvi programs.
The info program is coupled to the output of texi2any (e.g. supporting
changes to the Info output like INFO_MATH_IMAGES that we recently
implemented). So it's probably not sensible to split things up.
- It has two versions for much of its code - C and Perl. This has
been discussed previously. Needless to say, this more than doubles
the maintenance burden.
- Features of texi2any that could be considered bloat:
* The XML format output with --xml. This is not actually useful
for anything.
The Texinfo manual suggests that users might want to use this as a
starting point for conversion to other formats, but I'd actually
rather they didn't, as it means that we have to maintain the XML
converter; it would be better to have such converters built into
texi2any like the other output formats.
It's not a huge maintenance burden, except remembering to update
the DTD when the language changes, and the time spent running
the tests for XML output, of course.
* The IXIN output format. This format is not likely to see any
further development, especially following the sad death of
its creator Thien-Thi Nguyen. This may not be an issue as the
IXIN conversion code is now specifically in an "Example"
subdirectory.
* In HTML output, the option to create a special "About" page
with the DO_ABOUT variable does not seem useful.
* The SORT_ELEMENT_COUNT variable does not seem useful. The manual
describes it as follows:
> If set, the name of a file to which a list of elements (nodes or
> sections, depending on the output format) is dumped, sorted by the
> number of lines they contain after removal of @-commands; default
> unset. This is used by the program ‘texi-elements-by-size’ in the
> ‘util/’ directory of the Texinfo source distribution (*note
> texi-elements-by-size::).
* The converter spends time and uses memory building "source marks"
with details of expanded macros and included source files. This
information is not used in the output conversion, unless converting
back to the Texinfo source it started with. It's possible that
some of this processing could be made optional for efficiency
(I haven't investigated in detail how this could be accomplished).
* The Texinfo::Reader interface (new in 7.3). tta/README in the
Texinfo sources explains:
> The modules in perl/Texinfo/Example are not developped anymore. Docbook
> conversion modules in this directory were developped using an interface
> consisting of Texinfo::Reader, Texinfo::TreeElement and
> Texinfo::Example::TreeElementConverter as a proof of concept. However, this
> interface proved to be too slow in Perl and difficult to implement with XS
> code. The Reader and TreeElement interface (except for one function) are not
> used from Perl anymore. Going forward, the SWIG interface based on the
> Reader, Parser, Structuring and Texinfo Document C codes should
> be used. The SWIG interface is in the swig directory. Texinfo::Reader and
> Texinfo::TreeElement (except for the 'new' function) should not be used
> anymore.
* The --transliterate-file-names feature. This feature was only just
turned off by default in the recent release. It entails bundling
the Text::Unidecode Perl module with Texinfo, which, although only
1.3MB when extracted, bloats the directory listing (e.g. the output
of "tar tf") with 289 files - you may or may not agree that this
is a major problem.
* The Unicode::CollateStub replacement in perl/Texinfo/Indices.pm
is only needed on Red Hat-like systems where Unicode::Collate is not
installed. During recent pre-release testing, this code had a problem,
which we fixed. It isn't tested regularly.
- The texi2any API is a major source of potential bloat.
Here's what I wrote in a private mail to Patrice on 2025-06-29:
> As you know my concern about using different languages for texi2any within
> the Texinfo project is one of simplicity and long-term maintainability.
>
> My concern about user code is one of maintainability. How well can we change
> the internals of texi2any if there is a lot of user code relying on internal
> details?
>
> For example, Linux famously does not have a stable ABI for device
> drivers:
>
> > You think you want a stable kernel interface, but you really do not,
> > and you don’t even know it. What you want is a stable running driver,
> > and you get that only if your driver is in the main kernel tree.
> >
> > ...
> >
> > As such, the kernel developers find bugs in current interfaces, or
> > figure out a better way to do things. If they do that, they then fix the
> > current interfaces to work better. When they do so, function names may
> > change, structures may grow or shrink, and function parameters may be
> > reworked. If this happens, all of the instances of where this interface
> > is used within the kernel are fixed up at the same time, ensuring that
> > everything continues to work properly.
>
> https://www.kernel.org/doc/html/latest/process/stable-api-nonsense.html
>
> (Of course, texi2any could be different for some reason.)
>
> Here's another argument against function call interfaces as a stable
> interface:
>
> > Remote Procedure Calls
> >
> > ...
> >
> > As a related issue, interfaces that have richer type signatures also
> > tend to be more complex, therefore more brittle. Over time, they tend
> > to succumb to ontology creep as the inventory of types that get passed
> > across interfaces grows steadily larger and the individual types more
> > elaborate. Ontology creep is a problem because structs are more likely
> > to mismatch than strings; if the ontologies of the programs on each side
> > don't exactly match, it can be very hard to teach them to communicate
> > at all, and fiendishly difficult to resolve bugs. The most successful
> > RPC applications, such as the Network File System, are those in which
> > the application domain naturally has only a few simple data types.
> >
> > The usual argument for RPC is that it permits “richer” interfaces
> > than methods like text streams — that is, interfaces with a more
> > elaborate and application-specific ontology of data types. But the Rule
> > of Simplicity applies! We observed in Chapter 4 that one of the functions
> > of interfaces is as choke points that prevent the implementation details
> > of modules from leaking into each other. Therefore, the main argument
> > in favor of RPC is also an argument that it increases global complexity
> > rather than minimizing it.
> >
> > With classical RPC, it's too easy to do things in a complicated and
> > obscure way instead of keeping them simple. RPC seems to encourage the
> > production of large, baroque, over-engineered systems with obfuscated
> > interfaces, high global complexity, and serious version-skew and
> > reliability problems — a perfect example of thick glue layers run amok.
>
> http://www.catb.org/esr/writings/taoup/html/ch07s03.html
The SWIG interface, in providing an API for more programming languages,
seems to make the potential problems with API stability worse.
As far as I know, there are only two or three projects which use the
texi2any API (Lilypond and ffmpeg among them). It seems that with every
release there are changes to the API which need fixes in these other
packages.
For example, after the Texinfo 7.2 release, the ffmpeg build broke:
https://www.linuxquestions.org/questions/slackware-14/texinfo-7-2-looks-to-have-broken-texinfo-convert-html-4175745581/
I wrote at the time:
> Such breakages seem inevitable as extension code could rely on many
> details of internal texi2any code. The new version of Texinfo is
> then flagged as responsible for breaking compatibility.
>
> This only stays manageable as long as the number of packages relying
> on the Perl customization API stays low.
If more packages start using the texi2any API, it will be a further
source of breakage and even more work to go and find these packages
to fix their customization code when a new release is made.
A lesser problem is that the API documentation takes quite a long time
to build and upload to the GNU website when doing a new release,
because there is so much of it.
Instead of promoting and expanding the use of API programming
facilities, I think it would be better to find out what users are
using the API for and design built-in features that support what they
want to do.