On Tue, Dec 23, 2025 at 10:32:25PM +0000, Gavin Smith wrote:
> On Sun, Dec 21, 2025 at 10:08:58PM +0000, Gavin Smith wrote:
> > Here's a patch:
> 
> Here's a more complete patch.  To avoid changing the output for
> HTML, DocBook and one other output format ("Texinfo XML"), when the
> input was not UTF-8, I had to remove the default OUTPUT_ENCODING_NAME
> UTF-8 setting.  Otherwise these formats would be forced to UTF-8 as
> well.
>
> I don't think that the OUTPUT_ENCODING_NAME defaults did very much,
> but I'm not certain.  It's possible these default values stemmed from
> a time before UTF-8 was the default input encoding for Texinfo.  (For
> example, "git blame" tracks the setting in DocBook.pm to a commit on
> 2012-09-14 (49aa00da6ae37), whereas UTF-8 only became the default input
> encoding in 2019.)

My recollection, which seems to be wrong, was that the default values
were not related to the default input encoding; they were the defaults
for the output encoding.  More precisely, it seemed to me that DocBook
had always preferred the output to be UTF-8, independently of
@documentencoding, and that OUTPUT_ENCODING_NAME was there to enforce
that.  It seems that this is not actually the case, and the
documentation states that the DocBook output encoding is based on the
document input encoding.

For epub, the output encoding is forced, but that is because it is set
from an init file, so set_conf does nothing in that case.

It is not clear to me what the best interface would be.  We could
imagine something similar to what is done for the file name encodings,
i.e. have a variable like
  DOC_ENCODING_FOR_OUTPUT_ENCODING_NAME
and if it is set to 0, the default OUTPUT_ENCODING_NAME would be left as
is.
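
To make the idea a bit more concrete, here is a rough sketch in C.  It
is only an illustration under assumptions: SKETCH_OPTIONS, the
doc_encoding_for_output field and the sketch_* functions are
hypothetical names that do not exist in the Texinfo sources, and the
real OPTIONS structure and option_set_conf are not used.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the converter options; the real
   OPTIONS structure in tta/C is different.  */
typedef struct {
  const char *output_encoding_name;  /* converter default, or NULL */
  int doc_encoding_for_output;       /* stands for the hypothetical
                                        DOC_ENCODING_FOR_OUTPUT_ENCODING_NAME */
} SKETCH_OPTIONS;

/* Propagate the input encoding to the output encoding, unless the
   hypothetical variable is 0, in which case the default is left as is.  */
static void
sketch_set_output_encoding (SKETCH_OPTIONS *options,
                            const char *input_encoding_name)
{
  if (options && options->doc_encoding_for_output && input_encoding_name)
    options->output_encoding_name = input_encoding_name;
}

/* Usage example: a LaTeX-like converter keeps its utf-8 default, while
   an HTML-like converter follows the document encoding.  */
static void
sketch_check (void)
{
  SKETCH_OPTIONS latex_like = { "utf-8", 0 };
  SKETCH_OPTIONS html_like = { NULL, 1 };

  sketch_set_output_encoding (&latex_like, "iso-8859-1");
  sketch_set_output_encoding (&html_like, "iso-8859-1");

  assert (!strcmp (latex_like.output_encoding_name, "utf-8"));
  assert (!strcmp (html_like.output_encoding_name, "iso-8859-1"));
}
```

With such a variable, the decision would be explicit per converter
instead of depending on whether OUTPUT_ENCODING_NAME already has a
value.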

But your approach is ok too.

> diff --git a/ChangeLog b/ChangeLog
> index b4185d5f7a..9f549d6155 100644
> --- a/ChangeLog
> +++ b/ChangeLog
> @@ -1,3 +1,24 @@
> +2025-12-23  Gavin Smith <[email protected]>
> +
> +     UTF-8 by default for LaTeX output
> +
> +     * tta/perl/Texinfo/Convert/LaTeX.pm (%defaults):
> +     Set OUTPUT_ENCODING_NAME to 'utf-8'.
> +
> +     * tta/perl/Texinfo/Common.pm (set_output_encoding),
> +     * tta/C/main/document.c (set_output_encoding): Only propagate
> +     encoding name from input encoding to output encoding if output
> +     encoding is not already set.
> +     * tta/perl/Texinfo/Convert/Text.pm: update comments
> +     
> +     * tta/data/converters_defaults.txt (html_converter),
> +     * tta/perl/Texinfo/Convert/DocBook.pm (%defaults),
> +     * tta/perl/Texinfo/Convert/HTML.pm (%defaults),
> +     * tta/perl/Texinfo/Convert/TexinfoXML.pm (%defaults):
> +     Remove OUTPUT_ENCODING_NAME utf-8 default.
> +
> +     * NEWS: update
> +
>  2025-12-23 Patrice Dumas  <[email protected]>
>  
>       * tta/C/convert/convert_html.c (html_conversion_finalization),
> diff --git a/NEWS b/NEWS
> index a709f21184..cddc981fd6 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -70,6 +70,9 @@ See the manual for detailed information.
>    . Info output:
>      . new (experimental) variable INFO_MATH_IMAGES allows outputting
>        images for mathematics notation
> +  . LaTeX output:
> +     . use UTF-8 encoding for output by default, regardless of input
> +       encoding.  override with OUTPUT_ENCODING_NAME.
>    . Remove the Texinfo::TeX4HT customization package.
>    . XML output:
>        . use HTML entities names for @H and @dotaccent accents types
> diff --git a/tta/C/main/document.c b/tta/C/main/document.c
> index f554ad555a..bea53f01a7 100644
> --- a/tta/C/main/document.c
> +++ b/tta/C/main/document.c
> @@ -208,6 +208,7 @@ void
>  set_output_encoding (OPTIONS *customization_information, DOCUMENT *document)
>  {
>    if (customization_information
> +      && !customization_information->OUTPUT_ENCODING_NAME.o.string
>        && document && document->global_info.input_encoding_name) {
>      option_set_conf (&customization_information->OUTPUT_ENCODING_NAME, -1,
>                       document->global_info.input_encoding_name);
> diff --git a/tta/data/converters_defaults.txt b/tta/data/converters_defaults.txt
> index bd7b4f8dae..29ae0e9439 100644
> --- a/tta/data/converters_defaults.txt
> +++ b/tta/data/converters_defaults.txt
> @@ -114,7 +114,6 @@ NO_CSS                 0
>  NO_NUMBER_FOOTNOTE_SYMBOL  *
>  NODE_NAME_IN_MENU      1
>  OPEN_QUOTE_SYMBOL      undef
> -OUTPUT_ENCODING_NAME   utf-8
>  SECTION_NAME_IN_TITLE  0
>  SHORT_TOC_LINK_TO_TOC  1
>  SHOW_TITLE             undef
> diff --git a/tta/perl/Texinfo/Common.pm b/tta/perl/Texinfo/Common.pm
> index 4054ba4321..7387802a6a 100644
> --- a/tta/perl/Texinfo/Common.pm
> +++ b/tta/perl/Texinfo/Common.pm
> @@ -1338,10 +1338,13 @@ sub set_output_encoding($$) {
>    if (defined($document)) {
>      $document_information = $document->global_information();
>    }
> -  $customization_information->set_conf('OUTPUT_ENCODING_NAME',
> -               $document_information->{'input_encoding_name'})
> -     if (defined($document_information)
> -         and exists($document_information->{'input_encoding_name'}));
> +
> +  if (!$customization_information->get_conf('OUTPUT_ENCODING_NAME')
> +      and defined($document_information)
> +      and exists($document_information->{'input_encoding_name'})) {
> +    $customization_information->set_conf('OUTPUT_ENCODING_NAME',
> +                 $document_information->{'input_encoding_name'})
> +  }
>  }
>  
>  # $DOCUMENT is the parsed Texinfo document.  It is optional, but it
> diff --git a/tta/perl/Texinfo/Convert/DocBook.pm b/tta/perl/Texinfo/Convert/DocBook.pm
> index 604cca6678..49c838f31c 100644
> --- a/tta/perl/Texinfo/Convert/DocBook.pm
> +++ b/tta/perl/Texinfo/Convert/DocBook.pm
> @@ -57,7 +57,6 @@ my %defaults = (
>    # Customization option variables
>    'FORMAT_MENU'          => 'nomenu',
>    'EXTENSION'            => 'xml', # dbk?
> -  'OUTPUT_ENCODING_NAME' => 'utf-8',
>    'SPLIT'                => '',
>    'OPEN_QUOTE_SYMBOL'    => '&#'.hex('2018').';',
>    'CLOSE_QUOTE_SYMBOL'   => '&#'.hex('2019').';',
> diff --git a/tta/perl/Texinfo/Convert/LaTeX.pm b/tta/perl/Texinfo/Convert/LaTeX.pm
> index 3bfc247e39..3c8b0fecac 100644
> --- a/tta/perl/Texinfo/Convert/LaTeX.pm
> +++ b/tta/perl/Texinfo/Convert/LaTeX.pm
> @@ -819,6 +819,7 @@ my %defaults = (
>    'FORMAT_MENU'          => 'nomenu',
>    'EXTENSION'            => 'tex',
>    'paragraphindent'      => undef, # global default is for Info/Plaintext
> +  'OUTPUT_ENCODING_NAME' => 'utf-8'
>  );
>  
>  
> diff --git a/tta/perl/Texinfo/Convert/TexinfoXML.pm b/tta/perl/Texinfo/Convert/TexinfoXML.pm
> index 3d99e26bbf..8040f5c741 100644
> --- a/tta/perl/Texinfo/Convert/TexinfoXML.pm
> +++ b/tta/perl/Texinfo/Convert/TexinfoXML.pm
> @@ -45,7 +45,6 @@ my %defaults = (
>    # Customization option variables
>    'FORMAT_MENU'          => 'menu',
>    'EXTENSION'            => 'xml',
> -  'OUTPUT_ENCODING_NAME' => 'utf-8',
>    'SPLIT'                => '',
>  );
>  
> diff --git a/tta/perl/Texinfo/Convert/Text.pm b/tta/perl/Texinfo/Convert/Text.pm
> index 7408985f8b..20462aa2a4 100644
> --- a/tta/perl/Texinfo/Convert/Text.pm
> +++ b/tta/perl/Texinfo/Convert/Text.pm
> @@ -954,7 +954,7 @@ sub convert($$) {
>    if (defined($document)) {
>      $global_info = $document->global_information();
>  
> -    # same as Texinfo::Common::set_output_encoding
> +    # similar to Texinfo::Common::set_output_encoding
>      $self->{'OUTPUT_ENCODING_NAME'} = $global_info->{'input_encoding_name'}
>        if (defined($global_info)
>            and exists($global_info->{'input_encoding_name'}));
> @@ -991,7 +991,7 @@ sub output($$) {
>    if ($document) {
>      $global_info = $document->global_information();
>  
> -    # same as Texinfo::Common::set_output_encoding
> +    # similar to Texinfo::Common::set_output_encoding
>      $self->{'OUTPUT_ENCODING_NAME'} = $global_info->{'input_encoding_name'}
>        if (defined($global_info)
>            and exists($global_info->{'input_encoding_name'}));
> 
> 
