I was testing a Texinfo file encoded in Latin-1. The LaTeX output from texi2any failed to compile and produced error messages.
See the attachments for a simple test case. I created the LaTeX file
latin1.tex with "texi2any --latex latin1.texi".
Running pdflatex gives the error message:
! Missing $ inserted.
<inserted text>
$
l.79 correspond to Arabic ±
39,999. GNU \texttt{troff} uses `\texttt{w}' and
?
The error is on the plus-or-minus symbol (±).
The documentation for the "inputenc" package explains:
Each encoding has an associated .def file, for example latin1.def which
defines the behaviour of each input character, using the commands:
\DeclareInputText{slot}{text}
\DeclareInputMath{slot}{math}
This defines the input character slot to be the text material or math
material respectively. For example, latin1.def defines slots "D6 (Æ) and
"B5 (µ) by saying:
\DeclareInputText{214}{\AE}
\DeclareInputMath{181}{\mu}
Note that the commands should be robust, and should not be dependent
on the output encoding. The same slot should not have both a text
and a math declaration for it. (This restriction may be removed in
future releases of inputenc).
With "\usepackage[latin1]{inputenc}", ± is only defined for math mode
(with \DeclareInputMath).
This is a severe limitation and makes the package useless for texi2any
output, in my opinion.
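
To make the restriction concrete, here is a small standalone C model of the
"one declaration kind per slot" rule quoted above. It is only a sketch for
illustration and is not how inputenc is implemented: the names decl_kind,
tex_mode, latin1_decl_kind and slot_usable are made up, and only three slots
are modelled. Slot 177 (±) and slot 181 (µ) are math declarations; slot 233
(é) stands in for an ordinary text declaration.

```c
/* Hypothetical model of the rule "a slot has either a text or a math
   declaration, not both".  This is NOT inputenc's real implementation. */

enum decl_kind { DECL_NONE, DECL_TEXT, DECL_MATH };
enum tex_mode  { MODE_TEXT, MODE_MATH };

/* Declaration kind for a Latin-1 slot.  Only a few slots are modelled. */
enum decl_kind
latin1_decl_kind (unsigned char slot)
{
  switch (slot)
    {
    case 177: return DECL_MATH;   /* plus-or-minus, declared for math only */
    case 181: return DECL_MATH;   /* micro sign, declared for math only */
    case 233: return DECL_TEXT;   /* e with acute accent, a text declaration */
    default:  return DECL_NONE;   /* not modelled in this sketch */
    }
}

/* Return 1 if the slot can be used in the given mode, 0 otherwise.
   A math-only slot in text mode is the case that fails above with
   "Missing $ inserted".  Text declarations used in math mode are not
   modelled here and are simply treated as usable. */
int
slot_usable (unsigned char slot, enum tex_mode mode)
{
  enum decl_kind kind = latin1_decl_kind (slot);

  if (kind == DECL_NONE)
    return 0;
  if (kind == DECL_MATH && mode == MODE_TEXT)
    return 0;
  return 1;
}
```

In this model slot_usable (177, MODE_TEXT) is 0, which corresponds to the
failure on line 79 of latin1.tex, while slot_usable (177, MODE_MATH) is 1.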
If I change the output to use UTF-8 instead, the file processes without
error (utf8.tex). inputenc with UTF-8 uses a completely different system
from that used with the eight-bit encodings and doesn't have the limitation
described above.
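
For reference, the two files differ at the byte level as follows: in Latin-1
the ± character is the single byte 0xB1 (decimal 177, the "slot" in the
inputenc terminology above), whereas in UTF-8 it is the two-byte sequence
0xC2 0xB1. The following standalone C sketch shows the conversion for one
Latin-1 byte. The function name latin1_byte_to_utf8 is made up for this
example and is not part of texi2any.

```c
#include <stddef.h>

/* Convert one Latin-1 byte to UTF-8.  Latin-1 code points are identical
   to Unicode code points U+0000..U+00FF, so the result is one byte for
   ASCII and two bytes otherwise.  Returns the number of bytes written
   to OUT.  Illustrative helper only, not texi2any code. */
size_t
latin1_byte_to_utf8 (unsigned char c, unsigned char out[2])
{
  if (c < 0x80)
    {
      out[0] = c;                                   /* ASCII is unchanged */
      return 1;
    }
  out[0] = (unsigned char) (0xC0 | (c >> 6));       /* lead byte: 110xxxxx */
  out[1] = (unsigned char) (0x80 | (c & 0x3F));     /* continuation: 10xxxxxx */
  return 2;
}
```

Calling it with 0xB1 writes the bytes 0xC2 0xB1, which is the form of ± found
in utf8.tex.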
I've been trying to see how to modify texi2any to use the UTF-8 encoding
regardless of the input encoding. The output encoding is copied from
the input encoding by Texinfo::Common::set_output_encoding, which is
called in texi2any.pl and again in Texinfo::Converter::set_document.
Passing the value of OUTPUT_ENCODING_NAME on the command line works:
./texi2any --latex latin1.texi -o utf8.ltx -c OUTPUT_ENCODING_NAME=utf-8
But changing it in the converter defaults doesn't, because set_document
is called after these defaults are loaded:
diff --git a/tta/perl/Texinfo/Convert/LaTeX.pm b/tta/perl/Texinfo/Convert/LaTeX.pm
index 3bfc247e39..3c8b0fecac 100644
--- a/tta/perl/Texinfo/Convert/LaTeX.pm
+++ b/tta/perl/Texinfo/Convert/LaTeX.pm
@@ -819,6 +819,7 @@ my %defaults = (
'FORMAT_MENU' => 'nomenu',
'EXTENSION' => 'tex',
'paragraphindent' => undef, # global default is for Info/Plaintext
+ 'OUTPUT_ENCODING_NAME' => 'utf-8'
);
The only place I am aware of in the texi2any sources where the output encoding
is forced is in tta/perl/ext/epub3.pm:
texinfo_set_from_init_file('OUTPUT_ENCODING_NAME', 'utf-8');
However, that is very likely not a good model to follow for LaTeX.pm.
Maybe set_output_encoding should avoid overwriting the OUTPUT_ENCODING_NAME
if it is already set?
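
To show the ordering problem in isolation, here is a minimal standalone C
sketch of the two behaviours. It is not the texi2any code: the struct
sketch_options and both function names are invented for the illustration,
and a plain string pointer stands in for the real option storage. The first
function copies the input encoding unconditionally, as happens now; the
second only fills in the output encoding when nothing has been set.

```c
#include <stddef.h>

/* Hypothetical stand-in for the customization options. */
struct sketch_options
{
  const char *output_encoding_name;   /* NULL means "not set" */
};

/* Current behaviour as described above: the input encoding always
   replaces whatever was set before, including a converter default. */
void
set_output_encoding_overwrite (struct sketch_options *opts,
                               const char *input_encoding_name)
{
  if (opts && input_encoding_name)
    opts->output_encoding_name = input_encoding_name;
}

/* Suggested behaviour: keep an existing value and only fall back to
   the input encoding when the output encoding is unset. */
void
set_output_encoding_if_unset (struct sketch_options *opts,
                              const char *input_encoding_name)
{
  if (opts && !opts->output_encoding_name && input_encoding_name)
    opts->output_encoding_name = input_encoding_name;
}
```

With output_encoding_name preset to "utf-8" and an input encoding of
"iso-8859-1", the first function leaves "iso-8859-1" in place and the second
leaves "utf-8".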
Here's a patch:
diff --git a/tta/C/main/document.c b/tta/C/main/document.c
index f554ad555a..bea53f01a7 100644
--- a/tta/C/main/document.c
+++ b/tta/C/main/document.c
@@ -208,6 +208,7 @@ void
set_output_encoding (OPTIONS *customization_information, DOCUMENT *document)
{
if (customization_information
+ && !customization_information->OUTPUT_ENCODING_NAME.o.string
&& document && document->global_info.input_encoding_name) {
option_set_conf (&customization_information->OUTPUT_ENCODING_NAME, -1,
document->global_info.input_encoding_name);
diff --git a/tta/perl/Texinfo/Common.pm b/tta/perl/Texinfo/Common.pm
index 4054ba4321..7387802a6a 100644
--- a/tta/perl/Texinfo/Common.pm
+++ b/tta/perl/Texinfo/Common.pm
@@ -1338,10 +1338,13 @@ sub set_output_encoding($$) {
if (defined($document)) {
$document_information = $document->global_information();
}
- $customization_information->set_conf('OUTPUT_ENCODING_NAME',
- $document_information->{'input_encoding_name'})
- if (defined($document_information)
- and exists($document_information->{'input_encoding_name'}));
+
+ if (!$customization_information->get_conf('OUTPUT_ENCODING_NAME')
+ and defined($document_information)
+ and exists($document_information->{'input_encoding_name'})) {
+ $customization_information->set_conf('OUTPUT_ENCODING_NAME',
+ $document_information->{'input_encoding_name'})
+ }
}
# $DOCUMENT is the parsed Texinfo document. It is optional, but it
diff --git a/tta/perl/Texinfo/Convert/LaTeX.pm b/tta/perl/Texinfo/Convert/LaTeX.pm
index 3bfc247e39..3c8b0fecac 100644
--- a/tta/perl/Texinfo/Convert/LaTeX.pm
+++ b/tta/perl/Texinfo/Convert/LaTeX.pm
@@ -819,6 +819,7 @@ my %defaults = (
'FORMAT_MENU' => 'nomenu',
'EXTENSION' => 'tex',
'paragraphindent' => undef, # global default is for Info/Plaintext
+ 'OUTPUT_ENCODING_NAME' => 'utf-8'
);
Attachment: latin1.texi (Texinfo document)
\documentclass{book}
\usepackage{amsfonts}
\usepackage{amsmath}
\usepackage[gen]{eurosym}
\usepackage{textcomp}
\usepackage{graphicx}
\usepackage{etoolbox}
\usepackage{titleps}
\usepackage[latin1]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{float}
% use hidelinks to remove boxes around links to be similar to Texinfo TeX
\usepackage[hidelinks]{hyperref}
\makeatletter
\newcommand{\Texinfothechapterheading}{}
\newtitlemark{\Texinfothechapterheading}%
\newcommand{\Texinfoheadingchaptername}{\chaptername}
\newtitlemark{\Texinfoheadingchaptername}%
\newcommand{\Texinfosettitle}{No Title}%
\newcommand{\Texinfounnumberedchapter}[1]{\chapter*{#1}
\addcontentsline{toc}{chapter}{\protect\textbf{#1}}%
\renewcommand{\Texinfothechapterheading}{}%
\chaptermark{#1}%
}%
\newcommand{\Texinfounnumberedpart}[1]{\part*{#1}
\addcontentsline{toc}{part}{\protect\textbf{#1}}%
}%
\newcommand{\Texinfounnumberedsection}[1]{\section*{#1}
\addcontentsline{toc}{section}{\protect\textbf{#1}}%
\sectionmark{#1}%
}%
\newcommand{\Texinfounnumberedsubsection}[1]{\subsection*{#1}
\addcontentsline{toc}{subsection}{\protect\textbf{#1}}%
}%
\newcommand{\Texinfounnumberedsubsubsection}[1]{\subsubsection*{#1}
\addcontentsline{toc}{subsubsection}{\protect\textbf{#1}}%
}%
\newcommand{\Texinfochapter}[1]{\chapter{#1}
\renewcommand{\Texinfothechapterheading}{\Texinfoheadingchaptername{} \thechapter{} }%
}%
% redefine the \mainmatter command such that it does not clear page
% as if in double page
\renewcommand\mainmatter{\clearpage\@mainmattertrue\pagenumbering{arabic}}
\newenvironment{Texinfopreformatted}{%
\par\GNUTobeylines\obeyspaces\frenchspacing\parskip=\z@\parindent=\z@}{}
{\catcode`\^^M=13 \gdef\GNUTobeylines{\catcode`\^^M=13 \def^^M{\null\par}}}
\newenvironment{Texinfoindented}{\begin{list}{}{}\item\relax}{\end{list}}
% used for substitutions in commands
\newcommand{\Texinfoplaceholder}[1]{}
\newpagestyle{single}{\sethead[\Texinfothechapterheading{}\chaptertitle{}][][\thepage]
{\Texinfothechapterheading{}\chaptertitle{}}{}{\thepage}}
% allow line breaking at underscore
\let\Texinfounderscore\_
\renewcommand{\_}{\Texinfounderscore\discretionary{}{}{}}
\makeatother
% set default for @setchapternewpage
\makeatletter
\patchcmd{\chapter}{\if@openright\cleardoublepage\else\clearpage\fi}{\Texinfoplaceholder{setchapternewpage placeholder}\clearpage}{}{}
\makeatother
\pagestyle{single}%
\begin{document}
\Texinfounnumberedchapter{{Test}}
\label{anchor:Test}%
The representable extrema in the `\texttt{i}' and `\texttt{I}' formats
correspond to Arabic ±39,999. GNU \texttt{troff} uses `\texttt{w}' and
`\texttt{z}' to represent 5,000 and 10,000 in Roman numerals, respectively,
following the convention of AT\&T \texttt{troff}---currently, the
\end{document}
\documentclass{book}
\usepackage{amsfonts}
\usepackage{amsmath}
\usepackage[gen]{eurosym}
\usepackage{textcomp}
\usepackage{graphicx}
\usepackage{etoolbox}
\usepackage{titleps}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{float}
% use hidelinks to remove boxes around links to be similar to Texinfo TeX
\usepackage[hidelinks]{hyperref}
\makeatletter
\newcommand{\Texinfothechapterheading}{}
\newtitlemark{\Texinfothechapterheading}%
\newcommand{\Texinfoheadingchaptername}{\chaptername}
\newtitlemark{\Texinfoheadingchaptername}%
\newcommand{\Texinfosettitle}{No Title}%
\newcommand{\Texinfounnumberedchapter}[1]{\chapter*{#1}
\addcontentsline{toc}{chapter}{\protect\textbf{#1}}%
\renewcommand{\Texinfothechapterheading}{}%
\chaptermark{#1}%
}%
\newcommand{\Texinfounnumberedpart}[1]{\part*{#1}
\addcontentsline{toc}{part}{\protect\textbf{#1}}%
}%
\newcommand{\Texinfounnumberedsection}[1]{\section*{#1}
\addcontentsline{toc}{section}{\protect\textbf{#1}}%
\sectionmark{#1}%
}%
\newcommand{\Texinfounnumberedsubsection}[1]{\subsection*{#1}
\addcontentsline{toc}{subsection}{\protect\textbf{#1}}%
}%
\newcommand{\Texinfounnumberedsubsubsection}[1]{\subsubsection*{#1}
\addcontentsline{toc}{subsubsection}{\protect\textbf{#1}}%
}%
\newcommand{\Texinfochapter}[1]{\chapter{#1}
\renewcommand{\Texinfothechapterheading}{\Texinfoheadingchaptername{} \thechapter{} }%
}%
% redefine the \mainmatter command such that it does not clear page
% as if in double page
\renewcommand\mainmatter{\clearpage\@mainmattertrue\pagenumbering{arabic}}
\newenvironment{Texinfopreformatted}{%
\par\GNUTobeylines\obeyspaces\frenchspacing\parskip=\z@\parindent=\z@}{}
{\catcode`\^^M=13 \gdef\GNUTobeylines{\catcode`\^^M=13 \def^^M{\null\par}}}
\newenvironment{Texinfoindented}{\begin{list}{}{}\item\relax}{\end{list}}
% used for substitutions in commands
\newcommand{\Texinfoplaceholder}[1]{}
\newpagestyle{single}{\sethead[\Texinfothechapterheading{}\chaptertitle{}][][\thepage]
{\Texinfothechapterheading{}\chaptertitle{}}{}{\thepage}}
% allow line breaking at underscore
\let\Texinfounderscore\_
\renewcommand{\_}{\Texinfounderscore\discretionary{}{}{}}
\makeatother
% set default for @setchapternewpage
\makeatletter
\patchcmd{\chapter}{\if@openright\cleardoublepage\else\clearpage\fi}{\Texinfoplaceholder{setchapternewpage placeholder}\clearpage}{}{}
\makeatother
\pagestyle{single}%
\begin{document}
\Texinfounnumberedchapter{{Test}}
\label{anchor:Test}%
The representable extrema in the `\texttt{i}' and `\texttt{I}' formats
correspond to Arabic ±39,999. GNU \texttt{troff} uses `\texttt{w}' and
`\texttt{z}' to represent 5,000 and 10,000 in Roman numerals, respectively,
following the convention of AT\&T \texttt{troff}---currently, the
\end{document}
