Use UTF-8 by default for LaTeX output?

Gavin Smith Sun, 21 Dec 2025 14:09:30 -0800

I was testing a Texinfo file that was encoded in Latin-1.  The LaTeX output
failed to process and gave error messages.


See the attachments for a simple test case.  I created the LaTeX file
latin1.tex with "texi2any --latex latin1.texi".

Running pdflatex gives the error message:

! Missing $ inserted.
<inserted text> 
                $
l.79 correspond to Arabic ±
                           39,999.  GNU \texttt{troff} uses `\texttt{w}' and
? 

The error is on the plus-or-minus symbol (±).

The documentation for the "inputenc" package explains:

    Each encoding has an associated .def file, for example latin1.def which
    defines the behaviour of each input character, using the commands:

    \DeclareInputText{slot}{text}
    \DeclareInputMath{slot}{math}

    This defines the input character slot to be the text material or math
    material respectively. For example, latin1.def defines slots "C6 (Æ) and
    "B5 (µ) by saying:

    \DeclareInputText{198}{\AE}
    \DeclareInputMath{181}{\mu}

    Note that the commands should be robust, and should not be dependent
    on the output encoding. The same slot should not have both a text
    and a math declaration for it. (This restriction may be removed in
    future releases of inputenc).

With "\usepackage[latin1]{inputenc}", ± is only defined for math mode
(with \DeclareInputMath).

This is a severe limitation and makes the package useless for texi2any
output, in my opinion.
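
To make the "slot" concrete, here is a minimal Python sketch (illustration
only, not part of texi2any or inputenc).  It shows that ± is the single byte
177 ("B1) in Latin-1, which is the slot that only has a math declaration,
whereas in UTF-8 the same character is the two-byte sequence C2 B1:

```python
# Illustration only: how the plus-or-minus sign is stored in each encoding.
pm = "\u00b1"  # PLUS-MINUS SIGN

latin1_bytes = pm.encode("latin-1")  # one byte; its value is the inputenc "slot"
utf8_bytes = pm.encode("utf-8")      # two bytes in UTF-8

print(latin1_bytes[0])       # 177
print(latin1_bytes.hex())    # b1
print(utf8_bytes.hex())      # c2b1
```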

If I change the output to use UTF-8 instead, the file processes without
error (utf8.tex).  inputenc with UTF-8 uses a completely different system
from that used with the eight-bit encodings and doesn't have the limitation
described above.

I've been trying to see how to modify texi2any to use the UTF-8 encoding
regardless of the input encoding.  The output encoding is copied from
the input encoding by Texinfo::Common::set_output_encoding which is
called in texi2any.pl, and called again in Texinfo::Converter::set_document.

Passing the value of OUTPUT_ENCODING_NAME on the command line works:

  ./texi2any --latex latin1.texi -o utf8.ltx -c OUTPUT_ENCODING_NAME=utf-8

But changing it in the converter defaults doesn't, because set_document
is called after these defaults are loaded:

diff --git a/tta/perl/Texinfo/Convert/LaTeX.pm b/tta/perl/Texinfo/Convert/LaTeX.pm
index 3bfc247e39..3c8b0fecac 100644
--- a/tta/perl/Texinfo/Convert/LaTeX.pm
+++ b/tta/perl/Texinfo/Convert/LaTeX.pm
@@ -819,6 +819,7 @@ my %defaults = (
   'FORMAT_MENU'          => 'nomenu',
   'EXTENSION'            => 'tex',
   'paragraphindent'      => undef, # global default is for Info/Plaintext
+  'OUTPUT_ENCODING_NAME' => 'utf-8'
 );
 
The only place I am aware of in the texi2any sources where the output encoding
is forced is in tta/perl/ext/epub3.pm:

  texinfo_set_from_init_file('OUTPUT_ENCODING_NAME', 'utf-8');

However, that is very likely not a good model to follow for LaTeX.pm.

Maybe set_output_encoding should avoid overwriting the OUTPUT_ENCODING_NAME
if it is already set?
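
The behaviour I have in mind can be modelled roughly as follows.  This is
only a sketch in Python, not the real Perl or C code; the dictionaries and
the helper below are hypothetical stand-ins for the customization
information and the document's global information:

```python
# Hypothetical model of the proposed precedence; not the texi2any API.
def set_output_encoding(conf, document_info):
    """Copy the input encoding to OUTPUT_ENCODING_NAME only if it is unset."""
    if conf.get("OUTPUT_ENCODING_NAME"):
        return  # a converter default or a -c value already chose an encoding
    if document_info and "input_encoding_name" in document_info:
        conf["OUTPUT_ENCODING_NAME"] = document_info["input_encoding_name"]


doc = {"input_encoding_name": "iso-8859-1"}

latex_conf = {"OUTPUT_ENCODING_NAME": "utf-8"}  # converter sets a default
other_conf = {}                                 # converter sets no default

set_output_encoding(latex_conf, doc)
set_output_encoding(other_conf, doc)

print(latex_conf["OUTPUT_ENCODING_NAME"])  # utf-8
print(other_conf["OUTPUT_ENCODING_NAME"])  # iso-8859-1
```

With this rule, a converter that sets no default still follows the input
encoding, as it does now.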

Here's a patch:

diff --git a/tta/C/main/document.c b/tta/C/main/document.c
index f554ad555a..bea53f01a7 100644
--- a/tta/C/main/document.c
+++ b/tta/C/main/document.c
@@ -208,6 +208,7 @@ void
 set_output_encoding (OPTIONS *customization_information, DOCUMENT *document)
 {
   if (customization_information
+      && !customization_information->OUTPUT_ENCODING_NAME.o.string
       && document && document->global_info.input_encoding_name) {
     option_set_conf (&customization_information->OUTPUT_ENCODING_NAME, -1,
                      document->global_info.input_encoding_name);
diff --git a/tta/perl/Texinfo/Common.pm b/tta/perl/Texinfo/Common.pm
index 4054ba4321..7387802a6a 100644
--- a/tta/perl/Texinfo/Common.pm
+++ b/tta/perl/Texinfo/Common.pm
@@ -1338,10 +1338,13 @@ sub set_output_encoding($$) {
   if (defined($document)) {
     $document_information = $document->global_information();
   }
-  $customization_information->set_conf('OUTPUT_ENCODING_NAME',
-               $document_information->{'input_encoding_name'})
-     if (defined($document_information)
-         and exists($document_information->{'input_encoding_name'}));
+
+  if (!$customization_information->get_conf('OUTPUT_ENCODING_NAME')
+      and defined($document_information)
+      and exists($document_information->{'input_encoding_name'})) {
+    $customization_information->set_conf('OUTPUT_ENCODING_NAME',
+                 $document_information->{'input_encoding_name'})
+  }
 }
 
 # $DOCUMENT is the parsed Texinfo document.  It is optional, but it
diff --git a/tta/perl/Texinfo/Convert/LaTeX.pm b/tta/perl/Texinfo/Convert/LaTeX.pm
index 3bfc247e39..3c8b0fecac 100644
--- a/tta/perl/Texinfo/Convert/LaTeX.pm
+++ b/tta/perl/Texinfo/Convert/LaTeX.pm
@@ -819,6 +819,7 @@ my %defaults = (
   'FORMAT_MENU'          => 'nomenu',
   'EXTENSION'            => 'tex',
   'paragraphindent'      => undef, # global default is for Info/Plaintext
+  'OUTPUT_ENCODING_NAME' => 'utf-8'
 );

latin1.texi
Description: TeXInfo document

\documentclass{book}
\usepackage{amsfonts}
\usepackage{amsmath}
\usepackage[gen]{eurosym}
\usepackage{textcomp}
\usepackage{graphicx}
\usepackage{etoolbox}
\usepackage{titleps}
\usepackage[latin1]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{float}
% use hidelinks to remove boxes around links to be similar to Texinfo TeX
\usepackage[hidelinks]{hyperref}

\makeatletter
\newcommand{\Texinfothechapterheading}{}
\newtitlemark{\Texinfothechapterheading}%
\newcommand{\Texinfoheadingchaptername}{\chaptername}
\newtitlemark{\Texinfoheadingchaptername}%
\newcommand{\Texinfosettitle}{No Title}%

\newcommand{\Texinfounnumberedchapter}[1]{\chapter*{#1}
\addcontentsline{toc}{chapter}{\protect\textbf{#1}}%
\renewcommand{\Texinfothechapterheading}{}%
\chaptermark{#1}%
}%

\newcommand{\Texinfounnumberedpart}[1]{\part*{#1}
\addcontentsline{toc}{part}{\protect\textbf{#1}}%
}%

\newcommand{\Texinfounnumberedsection}[1]{\section*{#1}
\addcontentsline{toc}{section}{\protect\textbf{#1}}%
\sectionmark{#1}%
}%

\newcommand{\Texinfounnumberedsubsection}[1]{\subsection*{#1}
\addcontentsline{toc}{subsection}{\protect\textbf{#1}}%
}%

\newcommand{\Texinfounnumberedsubsubsection}[1]{\subsubsection*{#1}
\addcontentsline{toc}{subsubsection}{\protect\textbf{#1}}%
}%

\newcommand{\Texinfochapter}[1]{\chapter{#1}
\renewcommand{\Texinfothechapterheading}{\Texinfoheadingchaptername{} \thechapter{} }%
}%

% redefine the \mainmatter command such that it does not clear page
% as if in double page
\renewcommand\mainmatter{\clearpage\@mainmattertrue\pagenumbering{arabic}}
\newenvironment{Texinfopreformatted}{%
  \par\GNUTobeylines\obeyspaces\frenchspacing\parskip=\z@\parindent=\z@}{}
{\catcode`\^^M=13 \gdef\GNUTobeylines{\catcode`\^^M=13 \def^^M{\null\par}}}
\newenvironment{Texinfoindented}{\begin{list}{}{}\item\relax}{\end{list}}

% used for substitutions in commands
\newcommand{\Texinfoplaceholder}[1]{}

\newpagestyle{single}{\sethead[\Texinfothechapterheading{}\chaptertitle{}][][\thepage]
                              {\Texinfothechapterheading{}\chaptertitle{}}{}{\thepage}}

% allow line breaking at underscore
\let\Texinfounderscore\_
\renewcommand{\_}{\Texinfounderscore\discretionary{}{}{}}
\makeatother
% set default for @setchapternewpage
\makeatletter
\patchcmd{\chapter}{\if@openright\cleardoublepage\else\clearpage\fi}{\Texinfoplaceholder{setchapternewpage placeholder}\clearpage}{}{}
\makeatother
\pagestyle{single}%


\begin{document}
\Texinfounnumberedchapter{{Test}}
\label{anchor:Test}%

The representable extrema in the `\texttt{i}' and `\texttt{I}' formats
correspond to Arabic ±39,999.  GNU \texttt{troff} uses `\texttt{w}' and
`\texttt{z}' to represent 5,000 and 10,000 in Roman numerals, respectively,
following the convention of AT\&T \texttt{troff}---currently, the

\end{document}

\documentclass{book}
\usepackage{amsfonts}
\usepackage{amsmath}
\usepackage[gen]{eurosym}
\usepackage{textcomp}
\usepackage{graphicx}
\usepackage{etoolbox}
\usepackage{titleps}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{float}
% use hidelinks to remove boxes around links to be similar to Texinfo TeX
\usepackage[hidelinks]{hyperref}

\makeatletter
\newcommand{\Texinfothechapterheading}{}
\newtitlemark{\Texinfothechapterheading}%
\newcommand{\Texinfoheadingchaptername}{\chaptername}
\newtitlemark{\Texinfoheadingchaptername}%
\newcommand{\Texinfosettitle}{No Title}%

\newcommand{\Texinfounnumberedchapter}[1]{\chapter*{#1}
\addcontentsline{toc}{chapter}{\protect\textbf{#1}}%
\renewcommand{\Texinfothechapterheading}{}%
\chaptermark{#1}%
}%

\newcommand{\Texinfounnumberedpart}[1]{\part*{#1}
\addcontentsline{toc}{part}{\protect\textbf{#1}}%
}%

\newcommand{\Texinfounnumberedsection}[1]{\section*{#1}
\addcontentsline{toc}{section}{\protect\textbf{#1}}%
\sectionmark{#1}%
}%

\newcommand{\Texinfounnumberedsubsection}[1]{\subsection*{#1}
\addcontentsline{toc}{subsection}{\protect\textbf{#1}}%
}%

\newcommand{\Texinfounnumberedsubsubsection}[1]{\subsubsection*{#1}
\addcontentsline{toc}{subsubsection}{\protect\textbf{#1}}%
}%

\newcommand{\Texinfochapter}[1]{\chapter{#1}
\renewcommand{\Texinfothechapterheading}{\Texinfoheadingchaptername{} \thechapter{} }%
}%

% redefine the \mainmatter command such that it does not clear page
% as if in double page
\renewcommand\mainmatter{\clearpage\@mainmattertrue\pagenumbering{arabic}}
\newenvironment{Texinfopreformatted}{%
  \par\GNUTobeylines\obeyspaces\frenchspacing\parskip=\z@\parindent=\z@}{}
{\catcode`\^^M=13 \gdef\GNUTobeylines{\catcode`\^^M=13 \def^^M{\null\par}}}
\newenvironment{Texinfoindented}{\begin{list}{}{}\item\relax}{\end{list}}

% used for substitutions in commands
\newcommand{\Texinfoplaceholder}[1]{}

\newpagestyle{single}{\sethead[\Texinfothechapterheading{}\chaptertitle{}][][\thepage]
                              {\Texinfothechapterheading{}\chaptertitle{}}{}{\thepage}}

% allow line breaking at underscore
\let\Texinfounderscore\_
\renewcommand{\_}{\Texinfounderscore\discretionary{}{}{}}
\makeatother
% set default for @setchapternewpage
\makeatletter
\patchcmd{\chapter}{\if@openright\cleardoublepage\else\clearpage\fi}{\Texinfoplaceholder{setchapternewpage placeholder}\clearpage}{}{}
\makeatother
\pagestyle{single}%


\begin{document}
\Texinfounnumberedchapter{{Test}}
\label{anchor:Test}%

The representable extrema in the `\texttt{i}' and `\texttt{I}' formats
correspond to Arabic ±39,999.  GNU \texttt{troff} uses `\texttt{w}' and
`\texttt{z}' to represent 5,000 and 10,000 in Roman numerals, respectively,
following the convention of AT\&T \texttt{troff}---currently, the

\end{document}
