from:"Eli Zaretskii"

Re: Info definition line only starting with single hyphen?

2024-10-02 Thread Eli Zaretskii

> Date: Wed, 02 Oct 2024 22:21:27 +0300
> From: Eli Zaretskii 
> Cc: bug-texinfo@gnu.org
> 
> > From: Gavin Smith 
> > Date: Wed, 2 Oct 2024 19:06:36 +0100
> > 
> > I saw this line had been added to NEWS:
> > 
> > . Info:
> >   . Output a single hyphen at the start of a definition line
> > 
> > This change was in commit 3ee7ac39a8 (dated 2024-07-06, applied 2024-09-29).
> > 
> > The effect of this change is to have one hyphen rather than two:
> > 
> > - -- Library Function: int foobar (int F---OO, float B--AR)
> > + - Library Function: int foobar (int F---OO, float B--AR)
> > 
> > I'm concerned whether this change to Info output for definition
> > commands is necessary.  Wouldn't it be better just to leave this output
> > alone as people will be used to having it with two hyphens?  Was it ever
> > only one hyphen in the past (e.g. with Texinfo 4.13, before the Perl
> > rewrite)?
> 
> AFAICT, it changed from a single hyphen to 2 hyphens in or before
> Texinfo 4.8.

Looks like it's this discussion:

  https://lists.gnu.org/archive/html/bug-texinfo/2004-02/msg00021.html

and this ChangeLog entry:

  2004-02-10  Karl Berry  

  * makeinfo/defun.c: --- (which gets reduced to --)
  instead of -- (which gets reduced to -).
  Report from Akim Demaille , 09 Feb 2004 18:06:29 +0100.

Re: Info definition line only starting with single hyphen?

2024-10-02 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Wed, 2 Oct 2024 19:06:36 +0100
> 
> I saw this line had been added to NEWS:
> 
> . Info:
>   . Output a single hyphen at the start of a definition line
> 
> This change was in commit 3ee7ac39a8 (dated 2024-07-06, applied 2024-09-29).
> 
> The effect of this change is to have one hyphen rather than two:
> 
> - -- Library Function: int foobar (int F---OO, float B--AR)
> + - Library Function: int foobar (int F---OO, float B--AR)
> 
> I'm concerned whether this change to Info output for definition
> commands is necessary.  Wouldn't it be better just to leave this output
> alone as people will be used to having it with two hyphens?  Was it ever
> only one hyphen in the past (e.g. with Texinfo 4.13, before the Perl
> rewrite)?

AFAICT, it changed from a single hyphen to 2 hyphens in or before
Texinfo 4.8.

Re: Info definition line only starting with single hyphen?

2024-10-02 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Wed, 2 Oct 2024 19:06:36 +0100
> 
> I saw this line had been added to NEWS:
> 
> . Info:
>   . Output a single hyphen at the start of a definition line
> 
> This change was in commit 3ee7ac39a8 (dated 2024-07-06, applied 2024-09-29).
> 
> The effect of this change is to have one hyphen rather than two:
> 
> - -- Library Function: int foobar (int F---OO, float B--AR)
> + - Library Function: int foobar (int F---OO, float B--AR)
> 
> I'm concerned whether this change to Info output for definition
> commands is necessary.  Wouldn't it be better just to leave this output
> alone as people will be used to having it with two hyphens?  Was it ever
> only one hyphen in the past (e.g. with Texinfo 4.13, before the Perl
> rewrite)?

I vote against arbitrary changes in output formats.

Re: Flood of commits from July?

2024-09-29 Thread Eli Zaretskii

> Date: Sun, 29 Sep 2024 16:30:40 +0200
> From: Patrice Dumas 
> Cc: bug-texinfo@gnu.org, Gavin Smith 
> 
> On Sun, Sep 29, 2024 at 05:18:58PM +0300, Eli Zaretskii wrote:
> > Why do I suddenly receive from texinfo-comm...@gnu.org a very long
> > series of commits from June and July?  I've received about 400 mails
> > like that already since noon, and it doesn't show any signs of
> > stopping.
> 
> That are commits that were written at that time and rebased on master.
> In the beginning we thought that they could be for 7.3, but since we
> have made many more important and user visible changes since in master
> for 7.2, these commits are added to master now.  They are also corrected
> as rebasing may lead to incorrect code being in the version control, and
> for that the sooner the better.

I've disabled my email delivery from this list, as I cannot afford
flooding my inbox.  Would appreciate if you could tell me when this
ends and the list goes back to its normal mode.

TIA

Flood of commits from July?

2024-09-29 Thread Eli Zaretskii

Why do I suddenly receive from texinfo-comm...@gnu.org a very long
series of commits from June and July?  I've received about 400 mails
like that already since noon, and it doesn't show any signs of
stopping.

Re: XDG Base Directory Specification ignores installation directories

2024-09-26 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Thu, 26 Sep 2024 22:15:13 +0100
> 
> On Thu, Sep 26, 2024 at 05:26:59PM +0200, Patrice Dumas wrote:
> > 
> > My understanding is that if you want to install on non-standard
> > locations and want to have this non-standard location being searched for
> > before other directories, then you should put first in XDG_DATA_DIRS (or
> > XDG_CONFIG_DIRS).  It should be possible to change
> > XDG_DATA_DIRS/XDG_CONFIG_DIRS as a user or for the system, but it
> > depends on the platform configuration.  It seems to often be in
> > /etc/profile.d/ for system-wide early change of environment variables
> > for GNU/Linux.
> 
> I don't think we should require the user (or distribution maintainer) to
> set XDG_DATA_DIRS/XDG_CONFIG_DIRS to install in non-default locations.
> Users have been able to set datadir with configure scripts for a lot
> longer than the XDG specification has existed.  We should continue
> to check in datadir in line with the long-standing practice of GNU
> programs.
> 
> It may be fine to use the directories in XDG_DATA_DIRS as well as
> $datadir.
> 
> It could depend on the type of file.  A typical "data" file could be
> a font file (under /usr/share/fonts) that could be used by any program,
> not just those from a single package.  In this view "data" is not
> a fixed, necessary part of a single program but sommething that could
> be used by multiple programs.

Are there systems out there where XDG_DATA_DIRS and XDG_CONFIG_DIRS
are set to something different from the defaults?  If there are such
systems, could someone look and tell how packages are configured there
to look for their data/config files?

I'm far from considering myself an expert of this XDG stuff, but we
may be looking at this from a wrong POV, because AFAIU if these
variables are set to different values, it is the job of the system
managers (or the persons who set these XDG variables) to make sure the
system-wide data and config files are installed in those directories,
and configure packages like Texinfo to look there by default.
Anything else will simply not work, at least not reliably.

If I'm right, then there's no reason for Texinfo to look in these
directories _in_addition_ to $datadir etc.; instead, we should assume
that Texinfo was configured with the appropriate --datadir options to
point to those places in the first place.

Re: do formatting_cpp_lines tests fail with native mingw?

2024-09-21 Thread Eli Zaretskii

> Date: Sat, 21 Sep 2024 12:44:38 +0200
> From: Patrice Dumas 
> 
> Hello,
> 
> The test with /dev/null failure is explained by MinGW on MSYS mapping to
> NUL (as explained by Eli in another thread).
> 
> Could you please, Eli, check whether the other test,
> split_nocopying_split_dev_null in formatting fails or not with mingw on
> MSYS, if possible with the release/7.1 branch?

I didn't have time to build anything after Texinfo 7.1 (neither 7.1.1
nor the current release/7.1 branch).  In 7.1, the
split_nocopying_split_dev_null test did not fail for me, so if it
haven't changed since then, and if Texinfo didn't change anything that
is related to what the test tests, it should not fail now.

Re: finding prove with mingw on cygwin?

2024-09-21 Thread Eli Zaretskii

> Date: Sat, 21 Sep 2024 00:32:43 +0200
> From: Patrice Dumas 
> 
> On Fri, Sep 20, 2024 at 08:14:33PM +0200, Patrice Dumas wrote:
> > On Fri, Sep 20, 2024 at 08:42:18PM +0300, Eli Zaretskii wrote:
> > >  And if not, does the
> > > Strawberry Perl's bin/ subdirectory appear on your PATH?
> > 
> > I think so, there is:
> > Path:   C:\cygwin\bin
> >   ..
> > C:\Strawberry\c\bin
> > C:\Strawberry\perl\site\bin
> > C:\Strawberry\perl\bin
> >   ..
> > 
> > I will check what a ls on those directories gives, to check if prove is
> > actually there.
> 
> I have checked, there is indeed prove.bat in
> /cygdrive/c/Strawberry/perl/bin
> 
> Yet it is not found.  Could be because of the .bat, and it could make
> sense, as it is not clear to me that cygwin bash.exe can interpret .bat
> files.

I'm now confused as to what port of Texinfo are you building and
testing.  If it's a Cygwin build of Texinfo, then Strawberry Perl
should not be used; you should use a Cygwin port of Perl instead.  And
'prove' should not be a batch file in a Cygwin Perl, it should be a
Perl script named just "prove", as it is on any Unix system.

Indeed, I don't believe Cygwin supports batch file execution, except
via the likes of "cmd /c foo.bat".

Re: multiple failures for info tests on cygwin

2024-09-21 Thread Eli Zaretskii

> Date: Sat, 21 Sep 2024 00:16:45 +0200
> From: Patrice Dumas 
> 
> On Fri, Sep 20, 2024 at 11:00:22PM +0100, Gavin Smith wrote:
> > On Fri, Sep 20, 2024 at 11:43:16PM +0200, Patrice Dumas wrote:
> > > On Fri, Sep 20, 2024 at 10:21:01PM +0300, Eli Zaretskii wrote:
> > > > > Date: Fri, 20 Sep 2024 20:43:45 +0200
> > > > > From: Patrice Dumas 
> > > > > 
> > > > > Many info tests on cygwin-32 fail (release 7.1 branch).  I attach the
> > > > > info tests log, I can provide more information on configure output, 
> > > > > and
> > > > > some other logs if needed.
> > > > > 
> > > > > =
> > > > >GNU Texinfo 7.1.1-20240920: info/test-suite.log
> > > > > =
> > > > > 
> > > > > # TOTAL: 87
> > > > > # PASS:  51
> > > > > # SKIP:  0
> > > > > # XFAIL: 0
> > > > > # FAIL:  36
> > > > > # XPASS: 0
> > > > > # ERROR: 0
> > > > 
> > > > That's most probably the side effect of using Cygwin tools again: the
> > > > native port of info cannot run many of the tests because the test rig
> > > > uses features not supported by native Windows programs (emulation of
> > > > terminals and other such stuff).  When running the tests with MSYS,
> > > > the test suite detects that and skips those tests, but since you run
> > > > them with Cygwin, I'm guessing that the way the test suite detects
> > > > Windows ports fails for some reason.
> > > > 
> > > > My records from running the test suite in Texinfo-7.1 indicate that 56
> > > > of the info tests were skipped, whereas above you say that none were
> > > > skipped.  So I'm quite sure this is the reason.
> > > 
> > > Actually, my feeling is that it is the absence of posix_openpt that
> > > triggers have_ptys to be false, which in turn causes pseudotty not to
> > > be built.  In the CI tests, cygwin have posix_openpt, while mingw (in
> > > cygwin) does not.
> > > 
> > > Gavin, maybe you could have a look?
> > 
> > My first question is what exactly "mingw in cygwin" means.  If this is
> > some mixture of mingw (i.e. a native MS-Windows environment) and cygwin
> > then it may not be a setup worth supporting, due to subtle incompatibilities
> > between the two, as has been recently discussed.
> > 
> > If it is Texinfo built on cygwin to run in cygwin, as a cygwin program
> > (not a native MS-Windows program), that is different and may be more worth
> > supporting.
> 
> This case is indeed Texinfo built on cygwin to run in cygwin, as a cygwin
> program.

??? Then why are you using Strawberry Perl?  That's not a Cygwin build
of Perl.  The Cygwin Perl should not output CR-LF end-of-line format,
it should output the Unix newline-only EOLs.

Cygwin builds should normally need no patching and fixing, since
Cygwin's purpose is to provide a faithful emulation of a Posix system.

Your original message about this build of Texinfo said:

> I would like to have more tests for mingw (on cygwin) using Bruno CI
> https://github.com/gnu-texinfo/ci-check
> 
> However, 'prove' is not found by configure, by
> AC_CHECK_PROGS([PROVE], [prove], [])
> while perl is found with
> AC_PATH_PROG([PERL], [perl])
> 
> I checked the binary zip provided for strawberry perl, which seems to be
> the Perl used in that case, and there is a prove.bat file provided.
> 
> Do testers with mingw and strawberry Perl have prove found by configure?
> Any idea on what should be done?

This seems to say you are building and testing the MinGW port of
Texinfo, not a Cygwin port.  If it's a Cygwin port, why is MinGW
mentioned and why are Strawberry Perl and the Windows C:\foo\bar
format of file names relevant? none of that are supported by Cygwin,
AFAIK.

Re: multiple failures for info tests on cygwin

2024-09-21 Thread Eli Zaretskii

> Date: Fri, 20 Sep 2024 23:43:16 +0200
> From: Patrice Dumas 
> Cc: bug-texinfo@gnu.org
> 
> On Fri, Sep 20, 2024 at 10:21:01PM +0300, Eli Zaretskii wrote:
> > > Date: Fri, 20 Sep 2024 20:43:45 +0200
> > > From: Patrice Dumas 
> > > 
> > > Many info tests on cygwin-32 fail (release 7.1 branch).  I attach the
> > > info tests log, I can provide more information on configure output, and
> > > some other logs if needed.
> > > 
> > > =
> > >GNU Texinfo 7.1.1-20240920: info/test-suite.log
> > > =
> > > 
> > > # TOTAL: 87
> > > # PASS:  51
> > > # SKIP:  0
> > > # XFAIL: 0
> > > # FAIL:  36
> > > # XPASS: 0
> > > # ERROR: 0
> > 
> > That's most probably the side effect of using Cygwin tools again: the
> > native port of info cannot run many of the tests because the test rig
> > uses features not supported by native Windows programs (emulation of
> > terminals and other such stuff).  When running the tests with MSYS,
> > the test suite detects that and skips those tests, but since you run
> > them with Cygwin, I'm guessing that the way the test suite detects
> > Windows ports fails for some reason.
> > 
> > My records from running the test suite in Texinfo-7.1 indicate that 56
> > of the info tests were skipped, whereas above you say that none were
> > skipped.  So I'm quite sure this is the reason.
> 
> Actually, my feeling is that it is the absence of posix_openpt that
> triggers have_ptys to be false, which in turn causes pseudotty not to
> be built.  In the CI tests, cygwin have posix_openpt, while mingw (in
> cygwin) does not.

It's that, and the fact that the tests use fifos, which native Windows
programs cannot support, either.  And maybe more.

Whatever PTY Cygwin can produce, the native Windows Info reader is
extremely unlikely to be able to use.

All the tests that exercise interactive commands of the Info reader
and need to check the results of those commands on the screen should
be skipped with the native Windows build.

Re: multiple failures for info tests on cygwin

2024-09-20 Thread Eli Zaretskii

> Date: Fri, 20 Sep 2024 20:43:45 +0200
> From: Patrice Dumas 
> 
> Many info tests on cygwin-32 fail (release 7.1 branch).  I attach the
> info tests log, I can provide more information on configure output, and
> some other logs if needed.
> 
> =
>GNU Texinfo 7.1.1-20240920: info/test-suite.log
> =
> 
> # TOTAL: 87
> # PASS:  51
> # SKIP:  0
> # XFAIL: 0
> # FAIL:  36
> # XPASS: 0
> # ERROR: 0

That's most probably the side effect of using Cygwin tools again: the
native port of info cannot run many of the tests because the test rig
uses features not supported by native Windows programs (emulation of
terminals and other such stuff).  When running the tests with MSYS,
the test suite detects that and skips those tests, but since you run
them with Cygwin, I'm guessing that the way the test suite detects
Windows ports fails for some reason.

My records from running the test suite in Texinfo-7.1 indicate that 56
of the info tests were skipped, whereas above you say that none were
skipped.  So I'm quite sure this is the reason.

Re: finding prove with mingw on cygwin?

2024-09-20 Thread Eli Zaretskii

> Date: Fri, 20 Sep 2024 20:14:33 +0200
> From: Patrice Dumas 
> Cc: bug-texinfo@gnu.org
> 
> >   $ which prove
> >   /bin/prove
> > 
> > in the MSYS Bash.  So it does find 'prove'.
> > 
> > > Any idea on what should be done?
> > 
> > Do you have the file in your MSYS installation?
> 
> The shell is not MSYS, it is:
> shell: C:\cygwin\bin\bash.exe -eo pipefail -o igncr '{0}'

Using the Cygwin tools to build MinGW ports is an uncharted territory
for me: I never did that.  MSYS was created (as a fork of Cygwin)
precisely for this purpose: allow building MinGW ports using native
Windows development tools (MinGW GCC and Binutils, native Windows
Perl, etc.), and it includes several special tricks up its sleeve to
that end.  Examples include transparent conversion of /d/foo/bar file
names to the Windows D:\foo\bar form, conversion of foo:bar:baz in
PATH to Windows-style foo;bar;baz, EOL conversion where possible,
conversion of /dev/null to NUL etc.

By contrast, Cygwin has a very different purpose.

I think the correct way of building MinGW ports using Cygwin Bash and
development tools is to use Cygwin-to-MinGW cross-compiler (I think
the Cygwin project provides such cross-tools).  Using native Windows
executables (including MinGW GCC and Binutils, and other MinGW ports)
is bound to cause problems such as this one.

And I have no idea how to invoke Strawberry Perl from Cygwin in a way
that will avoid these problems.  I guess you could try removing CR
characters, but that could bite you in some other cases, where a lone
CR character is produced and is supposed to be kept.  If you are
lucky, this will never happen in Texinfo, but only if you are lucky.

> and the Cygwin Perl is not installed on purpose such that Strawberry
> Perl is used.
> 
> >  And if not, does the
> > Strawberry Perl's bin/ subdirectory appear on your PATH?
> 
> I think so, there is:
> Path:   C:\cygwin\bin
>   ..
> C:\Strawberry\c\bin
> C:\Strawberry\perl\site\bin
> C:\Strawberry\perl\bin
>   ..
> 
> Perl is found there by configure:
> 
> checking for perl... /cygdrive/c/Strawberry/perl/bin/perl
> 
> 
> I will check what a ls on those directories gives, to check if prove is
> actually there.

I think if you build using a mix of Cygwin tools (Bash etc.) and MinGW
GCC/Binutils and native Windows Perl, you will be fighting an up-hill
battle due to all those subtle incompatibilities.  From my 20 years of
experience of building MinGW ports natively, the only reliable way of
doing that is by using MSYS Bash, MSYS Make, MSYS Coreutils and other
MSYS ports specifically prepared for this purpose.

Re: safe to remove carriage returns in configure for mingw?

2024-09-20 Thread Eli Zaretskii

> Date: Fri, 20 Sep 2024 19:29:14 +0200
> From: Patrice Dumas 
> 
> An issue with XS modules configure for mingw (on cygwin) using Bruno CI
> https://github.com/gnu-texinfo/ci-check
> 
> In tp/Texinfo/XS/configure.ac, we use perl -V:var to get information on
> Perl configuration for the var.  The perl output in that case has ^M
> carriage returns at end of lines.  This prevents the configure script to
> work correctly.  We parse perl -V:var output using a sed script.  It
> would be easy to add
>   s/\r//
> to remove carriage returns.  However, I fear that it could break some
> native builds that expect carriage returns at end of lines.
> 
> The symptom of this issue are line like that in tp/Texinfo/XS/configure
> run output, while previous lines do not have a ^M:
> checking Perl configuration value cc... gcc';^M
> 
> Is this issue also happening with native builds?  Would it be safe to
> always remove carriage returns?

What do you mean by "native builds"?  A MinGW build is a native
Windows build, so I'm confused about the above dichotomy.

FWIW, I don't see any such problems in Texinfo 7.0.94: there are no ^M
characters in tp/Texinfo/XS/config.log.  I don't know why.  Maybe
`...` removes the ^M characters in my version of Bash?  Hmm...

Yes, looks that way:

  $ /d/usr/Perl/bin/perl -V:cc > foo

produces 'foo' with CRLF end-of-line, but

  $ echo `/d/usr/Perl/bin/perl -V:cc` > foo

produces 'foo' with Unix newlines.

Doesn't this happen with the Bash you are using?

Re: finding prove with mingw on cygwin?

2024-09-20 Thread Eli Zaretskii

> Date: Fri, 20 Sep 2024 19:14:46 +0200
> From: Patrice Dumas 
> 
> Hello,
> 
> I would like to have more tests for mingw (on cygwin) using Bruno CI
> https://github.com/gnu-texinfo/ci-check
> 
> However, 'prove' is not found by configure, by
> AC_CHECK_PROGS([PROVE], [prove], [])
> while perl is found with
> AC_PATH_PROG([PERL], [perl])
> 
> I checked the binary zip provided for strawberry perl, which seems to be
> the Perl used in that case, and there is a prove.bat file provided.
> 
> Do testers with mingw and strawberry Perl have prove found by configure?

On my system, I get

  $ which prove
  /bin/prove

in the MSYS Bash.  So it does find 'prove'.

> Any idea on what should be done?

Do you have the file in your MSYS installation?  And if not, does the
Strawberry Perl's bin/ subdirectory appear on your PATH?

Re: Texinfo 7.1.0.91 pretest available

2024-09-01 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Sun, 1 Sep 2024 18:49:59 +0100
> Cc: bug-texinfo@gnu.org
> 
> On Sun, Sep 01, 2024 at 06:48:31PM +0300, Eli Zaretskii wrote:
> > > From: Gavin Smith 
> > > Date: Sun, 1 Sep 2024 16:25:17 +0100
> > > Cc: platform-test...@gnu.org
> > > 
> > >   . C source files that are generated from *.xs files are no no longer
> > > distributed, so xsubpp from Perl is needed to build XS modules.
> > 
> > Why this change?  That's not something I'd expect to see in a bugfix
> > release, since it potentially changes the build process.
> 
> I think the fear was that since we distribute the *.xs files and
> the Makefile also has rules to generate the *.c files from the *.xs
> files, it is possible that the distributed *.c file is overwritten
> automatically.

It can only be overwritten automatically if its XS source is newer,
i.e. if it was edited.  In which case the overwrite is expected.

This happens with any file in a tarball that is generated.  For
example, if you modify configure.ac, the configure script will be
regenerated, overwriting the one that came with the tarball.

This is completely normal and expected.

> We felt it was wrong to overwrite a distributed file
> with no way of getting back the original version (other than extracting
> the distribution archive again.)

This is not different from any other generated file in the tarball.

> https://lists.gnu.org/archive/html/bug-texinfo/2023-12/msg00011.html
> ""make distclean" does not bring back build tree to previous state"
> (Hilmar Preuße)
> 
> https://lists.gnu.org/archive/html/bug-texinfo/2021-02/msg00160.html
> "texinfo-6.7.90 tarball contains files that "make distclean" erases"
> (Bruno Haible)

If the problem was with "make distclean", a better solution would be
to not delete these *.c files as part of "make distclean".  According
to the GNU Coding Standards, the 'distclean' target removes only files
that were not in the tarball.

> > Anyway, how does one check whether one has or doesn't have xsubpp?
> > 
> > Thanks.
> 
> Run "xsubpp -v" or "which xsubpp" from the command line.

Thanks.

Re: Texinfo 7.1.0.91 pretest available

2024-09-01 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Sun, 1 Sep 2024 16:25:17 +0100
> Cc: platform-test...@gnu.org
> 
>   . C source files that are generated from *.xs files are no no longer
> distributed, so xsubpp from Perl is needed to build XS modules.

Why this change?  That's not something I'd expect to see in a bugfix
release, since it potentially changes the build process.

Anyway, how does one check whether one has or doesn't have xsubpp?

Thanks.

Re: XDG Base Directory Specification ignores installation directories

2024-08-28 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Wed, 28 Aug 2024 21:26:55 +0100
> 
> I'm a bit confused as I thought it was configuration files which we
> were discussing, which would be XDG_CONFIG_DIRS, not XDG_DATA_DIRS.
> 
> I see that htmlxref.cnf is installed under $datadir, e.g. as
> "/usr/local/share/texinfo/htmlxref.cnf".  I suppose there is a question
> of whether this file counts as "data" or "configuration".  We should
> decide which one each file we are talking about is, otherwise this issue
> is more confusing.

I don't understand why we are discussing
/usr/local/share/texinfo/htmlxref.cnf.  AFAIU, the XDG specification
is for files specific to users, not system-wide configuration files.
So the system-wide htmlxref.cnf should not be affected by XDG, only
user-specific files ~/htmlxref.cnf and ~/.texinfo/htmlxref.cnf should
be affected.  Or what am I missing?

Re: Texinfo 7.0 changed the name of HTML output directory

2024-08-26 Thread Eli Zaretskii

> Date: Mon, 26 Aug 2024 19:53:43 +0200
> From: Patrice Dumas 
> Cc: gavinsmith0...@gmail.com, bug-texinfo@gnu.org
> 
> > XDG is supported only on some systems.  For example, MS-Windows
> > doesn't support it, and I'm not sure about macOS.
> 
> I have not seen anything platform specific in the specification, as far
> as I can tell (except for the path separator), no more than what we
> currently do.

The directories that are the default values of the various XDG
locations do not necessarily exist (or even make sense) on other
systems.  And system utilities and services do not recognize them, so,
for example, the cache directory is not cleaned up from time to time.

> > So my suggestion is look for the files in their traditional places,
> > and failing that to fall back on XDG-mandated locations (with suitable
> > defaults as defined by XDG, if the relevant environment variables are
> > not set).
> 
> But that means that you support forever two locations, the traditional
> places, and the XDG-mandated locations if variables are not set.  I
> wanted to support only one, the XDG-mandated locations if variables are
> not set and deprecate the traditional locations (and use the XDG
> environment variables if set, but there is no trade-off with that case).

I understand the desire, but I don't think it's practical to make such
abrupt changes in user-facing facilities.

Again, it's just my opinion and my experience.

> > This is what Emacs does, and we have yet to hear a single
> > complaint.  We found this to be the best alternative in terms of
> > backward compatibility: it allows users smooth migration to
> > XDG-compliant configuration if and when they so desire.
> 
> It's no wonder that no one complain, all the added complexity is on you. 

Not entirely true, because the users need to understand this logic and
what it does when the XDG directories do exist.  For example, when
Emacs is started and the user init file does not yet exist, and some
Emacs command creates the init file, users need to understand where
will that file be created.

But yes, most of the burden related to this complication is on the
Emacs developers, not on users.

Re: Texinfo 7.0 changed the name of HTML output directory

2024-08-26 Thread Eli Zaretskii

> Date: Mon, 26 Aug 2024 09:02:37 +0200
> From: Patrice Dumas 
> 
> On Sun, Aug 25, 2024 at 06:57:42PM +0100, Gavin Smith wrote:
> > On Sun, Aug 25, 2024 at 02:15:17PM +0200, Patrice Dumas wrote:
> > > On Sun, Aug 25, 2024 at 11:32:01AM +0100, Gavin Smith wrote:
> > > > We should at least support the "XDG Base Directory Specification".
> 
> I see three possibilities for the locations that do not already follow
> the XDG Specification, locations like /etc/texinfo or /etc/texi2any, and
> locations like $HOME/.texinfo or $HOME/.texi2any
> 
> 1) remove them as possibilities completly and mark in the documentation
>   that the new directory have to be used
> 2) keep them as possibilities but mark them deprecated in the
>   documentation, add warnings in the code and keep them about 5 years
> 3) keep them and maintain them
> 
> I would lean towards 2), otherwise there would be too many directories
> searched for.
> 
> Opinions?

XDG is supported only on some systems.  For example, MS-Windows
doesn't support it, and I'm not sure about macOS.

So my suggestion is look for the files in their traditional places,
and failing that to fall back on XDG-mandated locations (with suitable
defaults as defined by XDG, if the relevant environment variables are
not set).  This is what Emacs does, and we have yet to hear a single
complaint.  We found this to be the best alternative in terms of
backward compatibility: it allows users smooth migration to
XDG-compliant configuration if and when they so desire.

(Btw, AFAIK XDG directories can be ephemeral, and the environment
variables can be removed when the user logs off.  This should be kept
in mind when designing the support for XDG.)

Re: Texinfo 7.0 changed the name of HTML output directory

2024-08-25 Thread Eli Zaretskii

> Date: Sun, 25 Aug 2024 19:45:16 +0200
> From: Patrice Dumas 
> Cc: gavinsmith0...@gmail.com, bug-texinfo@gnu.org
> 
> On Sun, Aug 25, 2024 at 08:42:39PM +0300, Eli Zaretskii wrote:
> > > Date: Sun, 25 Aug 2024 19:04:39 +0200
> > > From: Patrice Dumas 
> > > 
> > > On Sun, Aug 25, 2024 at 11:32:01AM +0100, Gavin Smith wrote:
> > > > It's actually worse.  It's not just a matter of placing the htmlxref.cnf
> > > > file in the same directory as the Texinfo source.  It has to be present
> > > > in the working directory (.) where texi2any runs.  (I just tested
> > > > this with current git 'master'.)  
> > > 
> > > Now an htmlxref.cnf file in the same directory as the Texinfo source
> > > should be used.
> > 
> > Only there or _also_ there (in addition to looking in the directory
> > where texi2any runs)?
> 
> Also there.  First in the directory where texi2any runs and then in the
> same directory as the Texinfo source before the other directories.

Thanks.

Re: Texinfo 7.0 changed the name of HTML output directory

2024-08-25 Thread Eli Zaretskii

> Date: Sun, 25 Aug 2024 19:04:39 +0200
> From: Patrice Dumas 
> 
> On Sun, Aug 25, 2024 at 11:32:01AM +0100, Gavin Smith wrote:
> > It's actually worse.  It's not just a matter of placing the htmlxref.cnf
> > file in the same directory as the Texinfo source.  It has to be present
> > in the working directory (.) where texi2any runs.  (I just tested
> > this with current git 'master'.)  
> 
> Now an htmlxref.cnf file in the same directory as the Texinfo source
> should be used.

Only there or _also_ there (in addition to looking in the directory
where texi2any runs)?

Re: Texinfo 7.0 changed the name of HTML output directory

2024-08-25 Thread Eli Zaretskii

> Date: Sun, 25 Aug 2024 17:26:59 +0200
> From: Patrice Dumas 
> Cc: gavinsmith0...@gmail.com, bug-texinfo@gnu.org
> 
> On Sun, Aug 25, 2024 at 05:27:19PM +0300, Eli Zaretskii wrote:
> > As you discuss the various changes and improvements to this, please
> > consider one additional aspect: it is quite reasonable for a project
> > to have cross-references to other manuals, which are not necessarily
> > of interest for the GNU Project, at least not enough to have that
> > referent manual in htmlxref.cnf.
> 
> I think that, for now, we should have all the manuals in htmlxref.cnf
> not only GNU manuals.

This might be too much work for you, for no good reason.  There are
gazillions of manuals out there, and having all of them in that file,
let alone keeping it up to date, is not necessarily the business of
the Texinfo project.  By contrast, leaving it to individual projects
to take care of references only they need makes a lot of sense to me.

But you are the Texinfo maintainers, so it's your call.

> I view the htmlxref.cnf file as a way to
> translate any manual identifier (the manual name right now) as a
> ressource located in the World Wide Web (that can be used in a
> cross-reference).  A bit like what doi.org does but restricted to the
> Texinfo HTML manuals world

Is doi.org run by 2 volunteers, or does it maybe have a slightly
larger staff?

> and have the non GNU manuals be delagated somehow, but for now, as long
> as we do not have such an structure, I think that it is better if we add
> all the known Texinfo HTML manuals.

Why do you assume the referent manual must necessarily be a Texinfo
manual?  That is completely not guaranteed.  It could be produced by
Sphinx, for example, or some other HTML-authoring tool in use out
there.

Anyway, I only wanted to raise an issue that I noticed.  It is up to
you and Gavin to decide what to do with it.

Re: Texinfo 7.0 changed the name of HTML output directory

2024-08-25 Thread Eli Zaretskii

> Date: Sun, 25 Aug 2024 14:15:17 +0200
> From: Patrice Dumas 
> 
> On Sun, Aug 25, 2024 at 11:32:01AM +0100, Gavin Smith wrote:
> > 
> > This is a hack but may be the best way at the moment.  I think we
> > should have a better way of supporting this if there isn't one already.
> > We should at least support the "XDG Base Directory Specification".
> 
> I can work on that.  Should it be done before or after the release?
> 
> I think that adding the manual source directory could be done before the
> release as it is not a big change.

As you discuss the various changes and improvements to this, please
consider one additional aspect: it is quite reasonable for a project
to have cross-references to other manuals, which are not necessarily
of interest for the GNU Project, at least not enough to have that
referent manual in htmlxref.cnf.

A case in point is the Eshell manual (part of the Emacs suite of
manuals), which in a couple of places has a cross-reference to the
manual of zsh.  Now, zsh is not a GNU project, and there's no reason
to have the URL of its manual in htmlxref.cnf.  Likewise, there could
be other manuals that are referenced by some GNU project in its
manual, but are otherwise not interesting to the other projects, not
enough to have them in the official Texinfo htmlxref.cnf.  Because if
we decide that any and every referent manual must be in htmlxref.cnf,
we will eventually have a lot of stuff there that is hardly of
interest, and keeping track on all of them to have them up-to-date
could be a significant burden.

So maybe we need two such files: one maintained by Texinfo, where only
GNU manuals are mentioned, the other local to a project, where the
project could augment the Texinfo-maintained file with additions that
are of interest only to that project.

Yes, I know that this should already be possible, but at least the
Texinfo manual should mention this.  E.g., currently the Texinfo
manual asks for every update to the file to be reported to the Texinfo
developers, but that makes little sense for such "project-specific"
external references.  Maybe this project-local file should also be
named differently, I don't know.

Re: Texinfo 7.0 changed the name of HTML output directory

2024-08-25 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Sun, 25 Aug 2024 11:32:01 +0100
> Cc: bug-texinfo@gnu.org
> 
> > This is all quite annoying, to tell the truth.  The only practical
> > alternative seems to be to generate the HTML manuals using the
> > system-wide htmlxref.cnf file, and then use our own script to edit all
> > the instances of "MANUAL_html/" to say "MANUAL/" instead, which
> > basically means we don't use any of the facilities provided by Texinfo
> > for this purpose.  Or am I missing something?
> 
> I don't know why you can't update the system-wide htmlxref.cnf file.  I
> imagine there can't be that many people building and uploading the
> Emacs web manuals.

I'll need to talk to the other Emacs co-maintainers.  I don't know
what kind of control they have over the systems where they produce the
HTML manuals when a new version is released.  I, for example, cannot
modify system-wide files on the GNU server where I do these chores, so
I'd need to install Texinfo under my home directory if we decide to
use an updated htmlxref.cnf file.  In any case, we will need to
coordinate between us to share any changes in that file we may need to
do in the future, and that is an error-prone practice.

My hope was for a clean and reliable solution: include htmlxref.cnf in
the Emacs repository and take care of keeping it up to date at all
times.  Unfortunately, it doesn't really work cleanly, as concluded by
this discussion.  Too bad.

Thanks for the other suggestions and comments, I think at least the
current situation is clear now.

Re: Texinfo 7.0 changed the name of HTML output directory

2024-08-25 Thread Eli Zaretskii

> Date: Sun, 25 Aug 2024 12:35:54 +0200
> From: Patrice Dumas 
> 
> On Sun, Aug 25, 2024 at 11:33:45AM +0100, Gavin Smith wrote:
> > 
> > I think it is most likely a mistake.
> 
> Another possibility would be that the manual name got renamed and the
> old name is used here.  But even if it was the case, the new name should
> be there too, and there should be a comment explaining the situation.

According to the VCS history, this file was always named "smtpmail".

Re: Texinfo 7.0 changed the name of HTML output directory

2024-08-25 Thread Eli Zaretskii

> Date: Sun, 25 Aug 2024 10:54:31 +0300
> From: Eli Zaretskii 
> Cc: bug-texinfo@gnu.org
> 
> > It should be possible to use an additional project-specific htmlxref.cnf
> > file that would take priority.  See 
> > <https://www.gnu.org/software/texinfo/manual/texinfo/html_node/HTML-Xref-Configuration.html>.
> 
> This section of the Texinfo manual seems to imply that if a project
> has its own htmlxref.cnf file, that file must be present in every
> directory where HTML manuals are generated.  IOW, there's no way of
> sharing the same file between different subdirectories of the same
> project tree.  Is that conclusion correct?
> 
> Emacs has its manuals in 4 subdirectories of the doc/ directory (and a
> 5th subdirectory, with translations of the manuals, is in the works).
> Since potentially each one of these manuals can cross-reference to
> each one of the other Emacs manuals, and since the HTML versions of
> the manuals are produced by running 'makeinfo' in the corresponding
> subdirectory, AFAIU we need to have the up-to-date cross-reference
> information in each of these 4 subdirectories.  Is that correct?
> 
> There is the HTMLXREF_FILE variable we could use to point to a single
> htmlxref.cnf file, but was this variable supported starting from the
> same Texinfo version which introduced htmlxref.cnf?  It doesn't sound
> that way: htmlxref.cnf is mentioned in NEWS under Texinfo 5.0, whereas
> HTMLXREF_FILE is mentioned only under Texinfo 7.0.  So using this
> variable would mean we'd need to restrict HTML generation to Texinfo
> 7.0 and newer, right?

And one more question wrt what this section of the Texinfo manual
says: what exactly us the MANUAL part of the entries in htmlxref.cnf,
and what does texi2any compare it to?  For example, there's this entry
in the current file:

   smtp mono${EMACS}/html_mono/smtpmail.html
   smtp node${EMACS}/html_node/smtpmail/

What does "smtp" mean here?  The source file for this manual is
smtpmail.texi and the output goes to smptmail.html, so how is "smtp"
used by texi2any?  I'm asking because there are other manuals where
the name in the top-level DIR menu differs from the file name, and I
need to understand how to format those lines.  The file as distributed
by Texinfo seems to be somewhat inconsistent in this respect: for
example, in case of CL-LIB manual, the MANUAL part of the entry is
"cl", identical to the file name, in contrast to the above "smtp"
entry.  Perhaps "smtp" above is a mistake, and MANUAL should always
equal the file name?

Re: Texinfo 7.0 changed the name of HTML output directory

2024-08-25 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Fri, 23 Aug 2024 17:18:25 +0100
> Cc: bug-texinfo@gnu.org
> 
> On Fri, Aug 23, 2024 at 09:25:40AM +0300, Eli Zaretskii wrote:
> > > I notice that "eglot" is not listed in "htmlxref.cnf" so will be output
> > > with a default.  The HTML source has
> > > 
> > >  > > href="../eglot_html/Eglot-Features.html#Eglot-Features">Eglot Features
> > > 
> > > for the link.  The code in texi2any (in HTML.pm, _external_node_href)
> > > suffixes the manual name with an underscore and the output format ("html")
> > > when generating this hyperlink.
> > 
> > Sorry, you lost me here.  Emacs doesn't have an htmlxref.cnf file,
> > AFAICS.  Are you saying that we need to make sure the file distributed
> > by Texinfo is up-to-date with the structure of the Emacs manuals, and
> > that this up-to-date file is present on the system where the HTMl
> > manuals are generated?  That sounds like an unnecessary maintenance
> > burden, both for us and for you.
> 
> That is the way that inter-manual links have been supported up to the
> current point in time, yes.
> 
> It should be possible to use an additional project-specific htmlxref.cnf
> file that would take priority.  See 
> <https://www.gnu.org/software/texinfo/manual/texinfo/html_node/HTML-Xref-Configuration.html>.

This section of the Texinfo manual seems to imply that if a project
has its own htmlxref.cnf file, that file must be present in every
directory where HTML manuals are generated.  IOW, there's no way of
sharing the same file between different subdirectories of the same
project tree.  Is that conclusion correct?

Emacs has its manuals in 4 subdirectories of the doc/ directory (and a
5th subdirectory, with translations of the manuals, is in the works).
Since potentially each one of these manuals can cross-reference to
each one of the other Emacs manuals, and since the HTML versions of
the manuals are produced by running 'makeinfo' in the corresponding
subdirectory, AFAIU we need to have the up-to-date cross-reference
information in each of these 4 subdirectories.  Is that correct?

There is the HTMLXREF_FILE variable we could use to point to a single
htmlxref.cnf file, but was this variable supported starting from the
same Texinfo version which introduced htmlxref.cnf?  It doesn't sound
that way: htmlxref.cnf is mentioned in NEWS under Texinfo 5.0, whereas
HTMLXREF_FILE is mentioned only under Texinfo 7.0.  So using this
variable would mean we'd need to restrict HTML generation to Texinfo
7.0 and newer, right?

This is all quite annoying, to tell the truth.  The only practical
alternative seems to be to generate the HTML manuals using the
system-wide htmlxref.cnf file, and then use our own script to edit all
the instances of "MANUAL_html/" to say "MANUAL/" instead, which
basically means we don't use any of the facilities provided by Texinfo
for this purpose.  Or am I missing something?

> > > There are actually two different questions:
> > > * What name to use for the output directory
> > > * What name to use in cross-references by default
> > > 
> > > These could be different.  We could keep the output directory as
> > > "eglot_html" while changing the hyperlink to refer to, simply, "eglot".
> > 
> > Can you explain how this could work?  I always thought that these
> > parts of each URL must reflect the actual directory structure of the
> > filesystem where the HTML files reside.
> 
> The idea is that the output directory MANUAL_html would be renamed to
> MANAUL when uploaded to the website.

That's a tough requirement when using a VCS to host the manuals, as
the entire GNU project does.  CVS doesn't support file renaming, so we
lose all the history when renaming a directory.  So renaming the
existing directories is not an option for us.

> What has caused the problem you are reporting is not actually the
> output directory having the "_html" suffix, but this suffix appearing
> in intermanual links.

AFAIU, in order to work, the links and the directory structure must
use the same conventions, otherwise there are significant
complications (like redirection files etc.).  So we must make sure the
links use MANUAL, not MANUAL_html.  Using the --output command-line
option (which we do) does not solve this problem, as it only affects
where the output files are being written, not the cross-reference
links.

Re: removing the HTML Xref Mismatch node

2024-08-23 Thread Eli Zaretskii

> Date: Fri, 23 Aug 2024 23:01:25 +0200
> From: Patrice Dumas 
> 
> While we are on the subject of HTML Xref, I think that the "HTML Xref
> Mismatch" node should be removed
> 
>  
> https://www.gnu.org/software/texinfo/manual/texinfo/html_node/HTML-Xref-Mismatch.html
> 
> The information is still true, but I do not think that the solution
> proposed is useful in practice as:
> 1) in general both split and non-split manuals are generated
> 2) with htmlxref.cnf the nature of the target manual is known
> 
> I suggest removing the node from the manual, and put the code example
> somewhere else (maybe in tp/TODO).
> 
> Opinions?

I think the situation that node tries to "solve" is basically
insoluble.  I think when producing a manual that references other
manuals, the person who runs texi2any must know the form (mono, node,
etc.) of the referent manual(s).

My problem is that even if I know that, I'm not sure I understand how
to use that information at HTML-generation time, unless all the
referent manuals are split the same as the manual I'm producing (this
is the only situation clearly described in then Texinfo manual, and
the default operation of texi2any).  But what if the referent manuals
use different splitting, let alone if they use splitting different
from one another (e.g., the manual being produced is node-split,
whereas some referent manuals use chapter-splitting, some others
section-splitting, and some mono)?  How can I tell this to texi2any to
have it generate correct references?  It sounds like the only way is
to have a local htmlxref.cnf file that mentions _only_ the actual
splitting of each referent manual?  If this is the way to go, the
Texinfo manual should say that explicitly.  And if even this will not
solve the problem, then IMO a solution should be designed and
implemented.

In this connection, this text of "HTML Xref Configuration" is
completely unclear to me:

  However, if a manual is not available in that form, anything that is
  available can be used.  Here is the search order for each style:

   node⇒ node,section, chapter, mono
   section ⇒ section, chapter, node,mono
   chapter ⇒ chapter, section, node,mono
   mono⇒ mono,chapter, section, node

 These section- and chapter-level cross-manual references can succeed
  only when the target manual was created using ‘--node-files’; this is
  the default for split output.

When is this search performed and by whom?  Is it even relevant to the
issue of cross-manual references, and if so, how?  And what does the
last sentence about --node-files want to say?

Re: Texinfo 7.0 changed the name of HTML output directory

2024-08-23 Thread Eli Zaretskii

> Date: Fri, 23 Aug 2024 22:17:33 +0200
> From: Patrice Dumas 
> 
> +The defaults for the @var{host} and @var{dir} correspond with 
> cross-references
> +links for a collection of manuals located side by side on a single host.  
> This
> +is different from the usual case of HTML manuals published on the World Wide
> +Web served from different hosts in directories chosen by the publisher, for
> +which the @var{host} and @var{dir} must be specially defined to have working
> +cross-references.  However this requires external information and is detailed
> +later on (@pxref{HTML Xref Configuration}).

This part is unclear, I could not understand what it means and how
that affects me as the user of Texinfo.

> +If you want to generate cross-references to HTML manuals published on
> +the World Wide Web, the @var{host} and @var{dir} parts of
> +the cross-reference need to be known by Texinfo converters to be able
> +to generate cross-references.  More generally you may want to specify
> +the location of other manuals for cross-references.

This is written from the wrong POV, IMO.  The correct POV is that of a
manual author/maintainer who wants cross-references from _other_
manual to his/her one to be correct.  So I suggest the following
rewording.:

  For other manuals published on the World Wide Web to be able to have
  correct cross-references to your manual, the @var{host} and
  @var{dir} parts of the cross-reference need to be known by Texinfo
  converters when they produce HTML for those other manuals.
  Similarly, for your manual in HTML format to have correct
  cross-references to other manuals, @command{texi2any} needs to know
  how to construct the URLs for other manuals.  This information is
  provided by @file{htmlxref.cnf}.

> +Right now, the @file{htmlxref.cnf} file distributed with GNU Texinfo serves
> +as the main resource to locate Texinfo HTML manuals in the World Wide Web.
> +Since it is installed in a location used by @command{texi2any}, HTML manuals
> +information found in this file will be used for cross-references by default.
> +
>  If you have additions or corrections to the @file{htmlxref.cnf}
>  distributed with Texinfo, please email @email{bug-texinfo@@gnu.org} as
> -usual.  You can get the latest version from
> +usual.  If you publish a Texinfo HTML manual on the World Wide Web,
> +having an up-to-date location listed in @file{htmlxref.cnf} should ensure 
> that
> +all HTML manuals generated by @command{texi2any} use this location for
> +cross-references in the default case.  You can get the latest version from
>  @url{http://ftpmirror.gnu.org/@/texinfo/@/htmlxref.cnf}.

I suggest to tell here explicitly that having a local htmlxref.cnf
file with the correct information is the solution for outdated or
missing data in the file as distributed by Texinfo.

Thanks.

Re: Texinfo 7.0 changed the name of HTML output directory

2024-08-23 Thread Eli Zaretskii

> Date: Fri, 23 Aug 2024 09:47:57 +0200
> From: Patrice Dumas 
> Cc: bug-texinfo@gnu.org
> 
> On Thu, Aug 22, 2024 at 10:32:41PM +0300, Eli Zaretskii wrote:
> > Texinfo 7.0 changed the name of the directory where texi2any outputs
> > split-HTML files for a manual:
> > 
> >   7.0 (7 November 2022)
> >   * texi2any
> >   [...]
> >. HTML output:
> >. use manual_name_html as output directory for split HTML instead of
> >  manual_name or manual_name.html
> 
> For the reference, it was discussed here:
> https://lists.gnu.org/archive/html/bug-texinfo/2022-02/msg3.html

That discussion failed to examine the existing practices, and failed
to draw the attention of the relevant package maintainers to the fact
that their practices were different.

I'm quite sure that you and Gavin are aware of the FSF infrastructure
that GNU projects use to upload their manuals to the GNU site and the
fact it uses CVS as its storage, and therefore I'd expect you to look
there and compare the actual trees with what this change was about to
do.

> We noted at that time that manuals generated by gendoc.sh and automake
> generated html targets were not affected directly as they used --output,
> in a way that was already incompatible with the HTML Xref specification.
> Many GNU projects use gendoc.sh, but not all, in particular big projects
> tend to use their own systems.  I have not checked, but I guess that non
> GNU project often use the texi2any defaults.

I don't think I follow.  A typical GNU project has an "html" target in
its Makefile, which invokes makeinfo/texi2any, not gendoc.sh.  And how
does the --output flag affect the cross-references to other manuals?

> > and references to other manuals now fail because the directory
> > structure of Emacs manuals under manual/html_node/ has subdirectories
> > named by the old convention: "manual_name", not "manual_name_html".
> > 
> > Worse, the directory structure of the Emacs manuals is actually a CVS
> > tree, because all the Emacs manuals that are regenerated for each new
> > release of Emacs are checked into CVS in the
> > cvs.savannah.gnu.org:/web/emacs repository, and from there are posted
> > on the GNU documentation site by the FSF infrastructure.  So now we
> > need either to rename all the subdirectories in CVS (and risk losing
> > the VCS history), or write a script that modifies the cross-manual
> > references to omit the "_html" suffix.
> 
> All the changes in the HTML Xref specification directory lead to that
> kind of unfortunate incompatible change.

Then why make such changes at all?  What was so bad about the previous
defaults?

> Here, you can update all the
> manuals at once, however, even if it is a hassle.  But for links to
> truely external manuals it is worse, as there is no possibility of
> changing already generated manuals.  htmlxref.cnf can be used for
> external manuals, but it needs to be changed two time, before and after
> the target manual is regenerated with the new HTML Xref specification.

Exactly.  Which once again tells me this stuff should not be changed,
not unless there's a very significant problem with the existing
defaults.

> > This is exactly the kind of backward-incompatible change that should
> > have never been done, because it breaks on-line manuals that are
> > regenerated for new releases,
> 
> Here it depends.  If all the manuals are generated using the same HTML
> Xref specification, it should be ok -- though is still may require
> change in the build/VCS/... as you report.

You cannot rely on everybody using the latest version of Texinfo.
This is impractical and unreasonable.  For example, I'm using one of
the GNU servers to produce Emacs releases and Emacs manuals to go with
them, but I'm not the sysadmin of that server, so I cannot always make
sure I have the latest Texinfo (nor do I always want to have it: for
example, a bugfix release had better used the same Texinfo version as
the original one, to avoid introducing new issues).

> If --output is used to specify the directory to follow the HTML Xref
> specification, it may not be corrected that way, but I think that using
> --output is always more consistent with also using htmlxref.cnf to
> specify the target location.

Sede above: I don't follow how --output affects cross-references
between manuals in the html_node/ subdirectories.

> > I guess it's too late to lobby for reverting that change?
> 
> Indeed.  It could have been reverted before the published change in HTML
> Xref, but now it is too late.  Also, the previous way lead either to a
> weird result (with .html) or made HTML output special for no reason.
> 
> Overall, we try to avoid making changes in the HTML Xref specification
> and it does not change that often, but from time to time it still needs
> to change.

As I wrote earlier, the fact that htmlxref.cnf has so many entries for
individual manuals is for me a telltale sign that the new default is
problematic at best.

Converted @multitable displays as empty

2024-08-23 Thread Eli Zaretskii

Please take a look at this node from the Emacs's Flymake manual:

  
https://www.gnu.org/software/emacs/manual/html_mono/flymake.html#Mode-line-status

It displays as (almost) empty.  But the page source reveals that we
have a  there:

  The following statuses are defined:
  
  
  [nerrors nwarnings ...]Normal operation. nerrors and nwarnings are, 
respectively,
  the total number of errors and warnings found during the last buffer
  check, for all backends. They may be followed by other totals for
  other types of diagnostics (see Customizing 
Flymake error types).
  WaitSome Flymake 
backends haven’t reported since the last time they
  where questioned.  It is reasonable to assume that this is a temporary
  delay and Flymake will resume normal operation soon.
  !All the configured 
Flymake backends have disabled themselves: Flymake
  cannot annotate the buffer and action from the user is needed to
  investigate and remedy the situation (see Troubleshooting).
  ?There are no 
applicable Flymake backends for this buffer, thus Flymake
  cannot annotate it.  To fix this, a user may look to extending Flymake
  and add a new backend (see Extending 
Flymake).
  

For some reason, this table, which AFAICT is the result of converting
@multitable to HTML, displays as empty in several browsers (Firefox,
Chrome).  Can you please advise what is wrong with this table and how
to fix this?

Thanks.

Re: Texinfo 7.0 changed the name of HTML output directory

2024-08-22 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Thu, 22 Aug 2024 21:12:52 +0100
> Cc: bug-texinfo@gnu.org
> 
> On Thu, Aug 22, 2024 at 10:32:41PM +0300, Eli Zaretskii wrote:
> > For example, if you go to the above URL, click on "Eglot" and select
> > the "one web page per node" variant, you end up here:
> > 
> >   https://www.gnu.org/software/emacs/manual/html_node/eglot/index.html
> > 
> > and references to other manuals now fail because the directory
> > structure of Emacs manuals under manual/html_node/ has subdirectories
> > named by the old convention: "manual_name", not "manual_name_html".
> 
> I couldn't quickly find a broken link on the eglot manual you linked
> to, but found a broken link via the debbugs link:
> 
> https://www.gnu.org/software/emacs/manual/html_node/flymake/
> 
> This has a broken link to
> 
> https://www.gnu.org/software/emacs/manual/html_node/eglot_html/Eglot-Features.html#Eglot-Features

Yes.  (And this is just an example; there are more such cases I found
in subdirectories of the html_node directory by searching for
"_html/".)

> The "eglot_html" part of this link is incorrect: it should be
> 
> https://www.gnu.org/software/emacs/manual/html_node/eglot/Eglot-Features.html#Eglot-Features

Right.

> I notice that "eglot" is not listed in "htmlxref.cnf" so will be output
> with a default.  The HTML source has
> 
>  href="../eglot_html/Eglot-Features.html#Eglot-Features">Eglot Features
> 
> for the link.  The code in texi2any (in HTML.pm, _external_node_href)
> suffixes the manual name with an underscore and the output format ("html")
> when generating this hyperlink.

Sorry, you lost me here.  Emacs doesn't have an htmlxref.cnf file,
AFAICS.  Are you saying that we need to make sure the file distributed
by Texinfo is up-to-date with the structure of the Emacs manuals, and
that this up-to-date file is present on the system where the HTMl
manuals are generated?  That sounds like an unnecessary maintenance
burden, both for us and for you.

> There are actually two different questions:
> * What name to use for the output directory
> * What name to use in cross-references by default
> 
> These could be different.  We could keep the output directory as
> "eglot_html" while changing the hyperlink to refer to, simply, "eglot".

Can you explain how this could work?  I always thought that these
parts of each URL must reflect the actual directory structure of the
filesystem where the HTML files reside.

> This could potentially break some use cases, but I think they would be
> rare.  The common case is that the output directory, e.g. "eglot_html",
> would be renamed to e.g. "eglot" when installed on a website.   The
> purpose of this change was certainly not to change locations of web pages.

If the common case is to rename "eglot_html" to "eglot", then why is
the default "eglot_html"?  In my book, defaults should be identical to
what happens in common cases, otherwise we force everyone and their
dog to customize texi2any.  No?  IOW, I'd expect the version of
htmlxref.cnf distributed by Texinfo to be mostly empty, with the
possible exception of G and GS variables, and possibly other such
variables that document the site of each manual, like BINUTILS, EMACS,
etc. -- so that cross-manual references that point to another project
could be correct.  But the "manual information" lines should not be
present in this file for most if not all manuals, AFAUI, if the
defaults are reasonable.

> Before this change, I believe the manual would have been output in
> a directory called "eglot.html" but this was renamed when installed.

Maybe.  I don't see this renaming in any of the scripts involved in
generating and uploading of the Emacs manuals, but maybe I'm missing
something.

> I suspect that the online organisation of the Emacs manuals is slightly
> different to that of other projects and changing the default to
> "../eglot/" instead of "../eglot_html/" would benefit Emacs but few if
> any other projects.

Please elaborate.  I don't understand how this can be relevant only to
Emacs.  The only thing that is special to Emacs is that it has many
more manuals than other GNU projects, and maybe more cross-manual
references, but other than that, this should be relevant to any
project that has cross-manual references.

> A fix that seems easy is just to add eglot to htmlxref.cnf.  I
> will do this and upload it to ftp.gnu.org.  If there are other
> manuals that are missing entries can you let us know.

Thanks, but that is not enough.  Emacs releases are prepared by one of
several people, as their free time

Texinfo 7.0 changed the name of HTML output directory

2024-08-22 Thread Eli Zaretskii

Texinfo 7.0 changed the name of the directory where texi2any outputs
split-HTML files for a manual:

  7.0 (7 November 2022)
  * texi2any
  [...]
   . HTML output:
   . use manual_name_html as output directory for split HTML instead of
 manual_name or manual_name.html

This broke cross-manual references in the Emacs manuals as they are
accessible here:

   https://www.gnu.org/software/emacs/manual/

See https://debbugs.gnu.org/cgi/bugreport.cgi?bug=72761.

For example, if you go to the above URL, click on "Eglot" and select
the "one web page per node" variant, you end up here:

  https://www.gnu.org/software/emacs/manual/html_node/eglot/index.html

and references to other manuals now fail because the directory
structure of Emacs manuals under manual/html_node/ has subdirectories
named by the old convention: "manual_name", not "manual_name_html".

Worse, the directory structure of the Emacs manuals is actually a CVS
tree, because all the Emacs manuals that are regenerated for each new
release of Emacs are checked into CVS in the
cvs.savannah.gnu.org:/web/emacs repository, and from there are posted
on the GNU documentation site by the FSF infrastructure.  So now we
need either to rename all the subdirectories in CVS (and risk losing
the VCS history), or write a script that modifies the cross-manual
references to omit the "_html" suffix.

This is exactly the kind of backward-incompatible change that should
have never been done, because it breaks on-line manuals that are
regenerated for new releases, and the problem is a nuisance to fix if
the manuals are hosted in VCS repositories, because a typical VCS has
at best very weak support for renaming files, and at worst simply
loses all history before the rename.

I guess it's too late to lobby for reverting that change?

Re: "make maintainer-clean" inconsistency

2024-08-09 Thread Eli Zaretskii

> Date: Fri, 9 Aug 2024 14:31:19 +0200
> From: Patrice Dumas 
> 
> On Thu, Aug 08, 2024 at 08:46:37PM +0100, Gavin Smith wrote:
> > When I run "make maintainer-clean", there is a difference depending
> > on whether it is an out-of-source build.
> 
> My feeling is that maintainer-clean in an out of source build does not
> make much sense, and that we should not support it.  But that's a wild
> guess, I have never seen anything on that in the documentations and
> that's a subject I do not know much about.

>From the GNU Coding Standards manual:

  'maintainer-clean'
   Delete almost everything that can be reconstructed with this
   Makefile.  This typically includes everything deleted by
   'distclean', plus more: C source files produced by Bison, tags
   tables, Info files, and so on.

   The reason we say "almost everything" is that running the command
   'make maintainer-clean' should not delete 'configure' even if
   'configure' can be remade using a rule in the Makefile.  More
   generally, 'make maintainer-clean' should not delete anything that
   needs to exist in order to run 'configure' and then begin to build
   the program.  Also, there is no need to delete parent directories
   that were created with 'mkdir -p', since they could have existed
   anyway.  These are the only exceptions; 'maintainer-clean' should
   delete everything else that can be rebuilt.

   The 'maintainer-clean' target is intended to be used by a
   maintainer of the package, not by ordinary users.  You may need
   special tools to reconstruct some of the files that 'make
   maintainer-clean' deletes.  Since these files are normally included
   in the distribution, we don't take care to make them easy to
   reconstruct.  If you find you need to unpack the full distribution
   again, don't blame us.

   To help make users aware of this, the commands for the special
   'maintainer-clean' target should start with these two:

@echo 'This command is intended for maintainers to use; it'
@echo 'deletes files that may need special tools to rebuild.'

FWIW, I agree that this makes very little sense outside of the source
tree, because a better way in that case is simply to remove the build
tree, or switch to a clean tree for the subsequent build.

Re: Texinfo 7.1.0.90 pretest results [mingw]

2024-06-17 Thread Eli Zaretskii

> From: Bruno Haible 
> Date: Tue, 18 Jun 2024 00:24:12 +0200
> 
> For mingw, though, I'll stay with a native Windows perl. That's the point of
> a native Windows build.

Right.  And you cannot really build the XS extensions with MinGW
against a non-native Perl anyway.

Re: declaring function pointers with explicit prototypes for the info reader

2024-06-16 Thread Eli Zaretskii

> Date: Sun, 16 Jun 2024 16:29:10 +0200
> From: Patrice Dumas 
> 
> In standalone info reader code in info/ most function pointers are
> declared as a generic function pointer VFunction *, defined in info.h as
> 
> typedef void VFunction ();
> 
> I think that it would be much better to use actual prototypes depending
> on the functions to have type checking by the compiler.  I started doing
> that and did not find any evident issue with having explicit prototypes,
> but I may be missing something.
> 
> Would there be any reason not to have explicit prototypes?

If the code passes function pointers to other functions, or stores
function pointers in arrays, the prototypes of the functions will have
to match each other and/or the functions they are being passed to.  So
beware when two functions of different signatures are placed into the
same array or passed as an argument to the same function, because the
compiler will at least emit a warning if not an error.

Re: menu and sectioning consistency warning too strict?

2024-04-10 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Wed, 10 Apr 2024 23:53:31 +0100
> 
> I agree that the warning is not really necessary.   I don't mind
> either way.  It's up to you if you want to try to remove the warning.
> It's questionable whether lone @node without a following sectioning
> command is proper Texinfo. or what these constructs mean or how they should
> be output.

IMO, removing it would be a regression.  Either we should have a
separate setting for warning about missing sectioning commands, or the
warning should stay (or be replaced by a smarter one, like I wrote in
my previous message).

Most Texinfo manuals are intended to have a sectioning command in each
node, so this warning catches mistakes and is thus IMO valuable in a
vast majority of cases.

Re: menu and sectioning consistency warning too strict?

2024-04-10 Thread Eli Zaretskii

> Date: Wed, 10 Apr 2024 21:57:19 +0200
> From: Patrice Dumas 
> 
> With CHECK_NORMAL_MENU_STRUCTURE set to 1, there is a warning by
> texi2any:
> 
> a.texi:10: warning: node `node after chap1' is next for `chap1' in menu but 
> not in sectioning
> 
> for the following code:
> 
> @node Top
> @top top
> 
> @menu
> * chap1::
> * node after chap1::
> @end menu
> 
> @node chap1
> @chapter Chapter 1
> 
> @node node after chap1,, chap1, Top

AFAIU, the warning tells you that @chapter is missing in node after
chap1.

> I am not sure that this warning is warranted, this code seems ok to
> me, the lone node is not fully consistent with the sectioning structure,
> but not that inconsistent either.

I don't think I agree, since @chapter is missing in the second node.

> If there is another chapter after the lone node, there are two warnings,
> but this seems ok to me, as in that case, there is a clearer
> inconsistency, since with sectioning there is this time a different next:
> 
> b.texi:10: warning: node next pointer for `chap1' is `chap2' but next is 
> `node after chap1' in menu
> b.texi:15: warning: node prev pointer for `chap2' is `chap1' but prev is 
> `node after chap1' in menu

Which again tells you the same: @chapter is missing in node after
chap1.

> Should I try to remove the warning with a lone node at the end?

IMO, no, not unless you replace it with a smarter warning that
explicitly says @chapter is missing in the second node.

Re: organization of the documentation of customization variables

2024-03-27 Thread Eli Zaretskii

> Date: Tue, 26 Mar 2024 23:20:23 +0100
> From: Patrice Dumas 
> 
> > I took the list and tried to sort it into sections.  I may not have
> > done an especially good job of this, and there will likely be misplaced
> > variables.  I suggest this could be taken as a starting point for
> > reorganising the manual.
> 
> I started from that and did two nodes, as can be seen in the commit
> https://git.savannah.gnu.org/cgit/texinfo.git/commit/?id=c0a8822909514e947cefc7112986a2e704a023d0
> 
> Before I continue, is what I did the expected content?

>From where I stand, yes, it's a very good starting point, thanks.

> Here is what I propose:
> 
> * move HTML customization variables explanations to the 'Generating HTML'
>   chapter, either in an already existing section where they would be
>   inserted naturally (for example in the 'HTML CSS' section for
>   customization variables related to CSS) or to new sections or
>   subsections.  For exampl, I think that the new 'HTML Output Structure
>   Customization' node could be before 'Generating EPUB' or together with
>   'HTML Splitting', while 'File Names and Links Customization for HTML'
>   could be after 'HTML Cross-references' probably with other
>   customization variables nodes.
> * Move the 'HTML Customization Variables List' node to an appendix.

SGTM.  I think making the customization stuff separate subsections
would be better, as many users will not need that in the mainline
reading of the manual.  But that's a weak preference.

Re: Build from git broken - missing gperf?

2024-02-05 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Mon, 5 Feb 2024 19:35:59 +
> 
> I found it was being rebuilt by "make" because a dependency was updated:
> 
> $ ls -l gnulib/lib/iconv_open-aix.gperf 
> -rw-rw-r-- 1 g g 1.8k Jan 31 18:24 gnulib/lib/iconv_open-aix.gperf
> 
> which came from a gnulib update to another module (all that happened
> was that copyright years changed from 2023 to 2024).
> 
> Gnulib documents that "gperf" is a required tool for using the "iconv_open"
> module.  It's not especially easy, I find, to find why a particular gnulib
> module was brought in, but looking at the files under the "modules" directory
> of a gnulib checkout, I found the chain of dependencies
> 
> uniconv/u8-strconv-from-enc -> uniconv/u8-conv-from-enc -> striconveha
>   -> striconveh -> iconv_open
> 
> (Of course, there could other dependency chains that also brought this module
> in.)
> 
> Short of extirpating this dependency, the only solution appears to
> be to require anyone building from git to have gperf installed, which
> doesn't seem like a good situation, as it was never required before.
> 
> I don't know if uniconv/u8-conv-from-enc is a necessary module.  It's
> not easy to find out how the module is used as the documentation is
> lacking, but it appears to match libunistring.  The documentation is
> here:
> https://www.gnu.org/software/libunistring/manual/html_node/uniconv_002eh.html
> 
> I found uses of "u8_strconv_from_encoding" throughout the XS code,
> although most of the uses (I didn't check them all) have "UTF-8" as one
> of the arguments, making it appear that we are converting from UTF-8
> to UTF-8.

Should we ask the Gnulib folks to help us out?

Re: index sorting in texi2any in C issue with spaces

2024-02-04 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Sun, 4 Feb 2024 15:58:28 +
> Cc: pertu...@free.fr, bug-texinfo@gnu.org
> 
> On Fri, Feb 02, 2024 at 08:57:01AM +0200, Eli Zaretskii wrote:
> > > An alternative is not to have such a variable but just to have an option
> > > to collate according to the user's locale.  Then the user would run e.g.
> > > "LC_COLLATE=ll_LL.UTF-8 texi2any ..." to use collation from the 
> > > ll_LL.UTF-8
> > > locale.  They would have to have the locale installed that was appropriate
> > > for whichever manual they were processing (assuming the "variable 
> > > weighting"
> > > option is appropriate.)
> > 
> > What would be the default then, though?  AFAIR, we decided by default
> > to use en_US.utf-8 for collation, with the purpose of making the
> > sorting locale-independent by default, so that Info manuals produced
> > with the default settings are identical regardless of the user's
> > locale.
> 
> I agree that sorting should be locale-independent by default.

That's definitely ideal.

Re: index sorting in texi2any in C issue with spaces

2024-02-04 Thread Eli Zaretskii

> Date: Sun, 4 Feb 2024 11:42:52 +0100
> From: pertu...@free.fr
> Cc: Gavin Smith , bug-texinfo@gnu.org
> 
> On Fri, Feb 02, 2024 at 08:57:01AM +0200, Eli Zaretskii wrote:
> > I think en_US.utf-8 is (or at least can be by default) a combination
> > of @documentlanguage and @documentencoding.
> 
> I try to make the index collation as independent as possible of
> @documentencoding and output encoding.  Here the utf-8 is meant to
> provide a sorting 'independent' of the encoding.

Why is that a good idea?  Presumably, a manual whose language is
provided by @documentlanguage is indeed written in that language, and
so the collation should be according to that language?  Or what am I
missing?

If we want collation which uses only codepoints, disregarding any
collation weights defined by the Unicode TR10, we could use
en_US.utf-8, but then, as Gavin says, using glibc collation function
you get more than you asked, because weights are not ignored.  So we
need to use something else in the C variant of collation code, AFAIU.

> Regarding the language for now the aim was to have something as
> similar as the Perl output, which is obtained without a locale.  The
> choice of en_US was motivated by that aim.  I looked at the
> /usr/lib/locale/*/LC_COLLATE files on my debian GNU/Linux and there was
> no "en.utf-8", which would have been my first choice, so I used
> "en_US.utf-8".

I don't know enough about what Perl does in the module you are using.
"Obtained without a locale" means what exactly? a collation order that
only considers the Unicode codepoints of the characters?  Or does it
mean something else?  If it only considers the codepoints, then
collation in C using glibc functions will NOT produce the same order
even under en_US.utf-8, AFAIU.

Re: index sorting in texi2any in C issue with spaces

2024-02-01 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Thu, 1 Feb 2024 22:16:07 +
> Cc: Patrice Dumas , bug-texinfo@gnu.org
> 
> On Thu, Feb 01, 2024 at 09:01:42AM +0200, Eli Zaretskii wrote:
> > > Date: Wed, 31 Jan 2024 23:11:02 +0100
> > > From: Patrice Dumas 
> > > 
> > > > Moreover, en_US.utf-8 will use collation appropriate for (US) English.
> > > > There may be language-specific "tailoring" for other languages (e.g.
> > > > Swedish) that the user may wish to use instead.  Hence, it may be
> > > > a good idea to allow use of a user-specified locale for collation 
> > > > through
> > > > the C code.
> > > 
> > > That would not be difficult to implement as a customization variable.
> > > What about COLLATION_LANGUAGE?
> > 
> > What would be the possible values of this variable, and in what format
> > will those values be specified?
> 
> I imagine it would be a locale name for passing to newlocale and thence
> to strxfrm_l.  What Patrice implemented hardcord the name "en_US.utf-8"
> but this would be a possible value.

I think en_US.utf-8 is (or at least can be by default) a combination
of @documentlanguage and @documentencoding.

> (If there are locale names on MS-Windows that are different, it would
> be fine to support them the same way, only the invocation of texi2any
> would vary to use a different locale name.)

Yes, we will need to come up with something like that.  (And yes, the
names of locales on Windows are different, and can also take several
different formats.  For example, the equivalent of en_US can be either
"English_United States" or "en-US" [with a dash, not underscore], and
there's also a numerical locale ID -- e.g. 0x409 for en_US.)

> An alternative is not to have such a variable but just to have an option
> to collate according to the user's locale.  Then the user would run e.g.
> "LC_COLLATE=ll_LL.UTF-8 texi2any ..." to use collation from the ll_LL.UTF-8
> locale.  They would have to have the locale installed that was appropriate
> for whichever manual they were processing (assuming the "variable weighting"
> option is appropriate.)

What would be the default then, though?  AFAIR, we decided by default
to use en_US.utf-8 for collation, with the purpose of making the
sorting locale-independent by default, so that Info manuals produced
with the default settings are identical regardless of the user's
locale.

> It is probably not justified to provide an interface to the flags of
> CompareStringW on MS-Windows if we can't provide the same functionality
> with strcoll/strxfrm/strxfrm_l.

Agreed.  I mentioned that only for completeness, and as an
illustration of the fact that the APIs for controlling this stuff are
extremely platform-dependent, although the underlying ideas and
algorithms are the same.

> It seems not very important to provide more of these collation options
> for indices as it is not something users are complaining about.

Right.

Re: index sorting in texi2any in C issue with spaces

2024-01-31 Thread Eli Zaretskii

> Date: Wed, 31 Jan 2024 23:11:02 +0100
> From: Patrice Dumas 
> 
> > Moreover, en_US.utf-8 will use collation appropriate for (US) English.
> > There may be language-specific "tailoring" for other languages (e.g.
> > Swedish) that the user may wish to use instead.  Hence, it may be
> > a good idea to allow use of a user-specified locale for collation through
> > the C code.
> 
> That would not be difficult to implement as a customization variable.
> What about COLLATION_LANGUAGE?

What would be the possible values of this variable, and in what format
will those values be specified?

Re: index sorting in texi2any in C issue with spaces

2024-01-31 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Wed, 31 Jan 2024 20:10:56 +
> 
> It seems like a pretty obscure interface.  It is barely
> documented - newlocale is in the Linux Man Pages but not the
> glibc manual, and strxfrm_l was only in the Posix standard
> (https://pubs.opengroup.org/onlinepubs/9699919799/functions/strxfrm.html).
> I don't know of any other way of accessing the collation functionality.
> 
> Do you know how portable it is?

AFAIK, this is glibc-specific.

In general, the implementations of Unicode TR10 differ among
platforms, with glibc offering the most complete and compatible
implementation and the CLDR DB to support it (what you discovered in
/usr/share/i18n/locales on your system).  MS-Windows has a similar,
but different in effect, functionality, see

  
https://learn.microsoft.com/en-us/windows/win32/api/stringapiset/nf-stringapiset-comparestringw

It supports various flags, described here:

  
https://learn.microsoft.com/en-us/windows/win32/api/stringapiset/nf-stringapiset-comparestringex

that affect the handling of collation weights.  For example, the
NORM_IGNORESYMBOLS flag will have an effect similar to what Patrice
found: spaces (and other punctuation characters) are ignored when
sorting.

CompareStringW accepts "wide" strings, i.e. a string should be
converted to UTF-16 encoding before calling it.  There's a similar
CompareStringA, which accepts 'char *' strings, but it can only
compare strings whose characters are all representable in the current
system locale's codeset; if we want to have all text represented
internally in UTF-8, we should probably convert UTF-8 to UTF-16 and
use CompareStringW.

I don't know about *BSD and other platforms, but wouldn't be surprised
if they offered something of their own, still different from glibc
and/or strict TR10/CLDR compliance.

> Moreover, en_US.utf-8 will use collation appropriate for (US) English.
> There may be language-specific "tailoring" for other languages (e.g.
> Swedish) that the user may wish to use instead.  Hence, it may be
> a good idea to allow use of a user-specified locale for collation through
> the C code.

Probably.  Note that CompareStringW gives the caller a finer control:
they can tailor the handling of different weight categories, beyond
setting the locale for which the collation is needed.  Also, the
locale argument is defined differently for CompareStringW than via the
Posix-style setlocale or similar APIs (but that's something for the
implementation to figure out).

> I found some locale definition files on my system under
> /usr/share/i18n/locales (location mention in man page of the "locale"
> command) and there is a file iso14651_t1_common which appears to be
> based on the Unicode Collation tables.  I have only skimmed this file
> and don't understand the file format well (it's supposed to be documented
> in the output of "man 5 locale"), but is really part of glibc internals.
> 
> In that file, space has a line
> 
>  IGNORE;IGNORE;IGNORE; % SPACE
> 
> which appears to define space as a fourth-level collation element,
> corresponding to the Shifted option at the link above:
> 
>   "Shifted: Variable collation elements are reset to zero at levels one
>   through three. In addition, a new fourth-level weight is appended..."
> 
> In the Default Unicode Collation Element Table (DUCET), space has the line
> 
> 0020  ; [*0209.0020.0002] # SPACE
> 
> with the "*" character denoting it as a "variable" collation element.
> 
> I expect it would require creating a glibc locale to change the collation
> order, which is not something we can do.

I think if we want to ponder these aspects we should talk to the glibc
developers about the available options.

Re: makeinfo 7.1 misses menu errors

2024-01-19 Thread Eli Zaretskii

> Date: Fri, 19 Jan 2024 16:30:33 -0700
> From: Karl Berry 
> 
> Hi Gavin,
> 
> The problem as I remember it was that the error messages are awful:
> 
> No argument, but having any message at all is infinitely better than
> silence. I urge you to restore them by default, suboptimal as they are.
> 
> It's true that those msgs as such have never made a great deal of sense
> to me (including in the old C makeinfo). But they indicate perfectly
> well "there is a problem with the sectioning+menus related to node XYZ".
> It was not hard to figure it out once I knew that. I had no clue there
> was a problem until someone using makeinfo 6.x told me.

I agree.  Perhaps by default makeinfo should just display a general
warning about "some problem with sectioning vs menus", with a pointer
to the offending @menu command, and the warning text should advise to
use "-c CHECK_NORMAL_MENU_STRUCTURE=1" to get the details.  WDYT?

Re: makeinfo does not produce first output file when multiple files passed

2024-01-18 Thread Eli Zaretskii

> From: No Wayman 
> Date: Thu, 18 Jan 2024 10:52:17 -0500
> 
> 
> makeinfo --version: texi2any (GNU texinfo) 7.1
> Run on Arch Linux
> 
> Reproduction steps:
> 
> 1. Clone the "emacs-eat" repository:
> 
> $ cd /tmp/
> $ git clone https://codeberg.org/akib/emacs-eat.git
> 
> 2. Within the repository, attempt to build 3 info files from the 3 
> texi files in the repository:
> 
> $ cd ./emacs-eat
> $ makeinfo fdl.texi -o fdl.info gpl.texi -o gpl.info eat.texi -o 
> eat.info
> 
> This results in the last two info files being created (e.g. 
> gpl.info and eat.info in this example).
> The first file is not created regardless of the order they are 
> passed.
> Passing a "null" first argument results in all three files being 
> generated:
> 
> $ makeinfo /dev/null fdl.texi -o fdl.info gpl.texi -o gpl.info 
> eat.texi -o eat.info
> 
> Is this a misunderstanding on my part, or should all three files 
> be generated with the first makeinfo command in the reproduction 
> case?

I'm not sure what exactly is going on, but you _are_ making a mistake:
the files fdl.texi and gpl.texi are supposed to be @include'd by other
Texinfo files, not processed separately.  There's a comment at the
beginning of each one of them saying that.  Why are you processing
them as separate Texinfo documents?

Re: makeinfo 7.1 misses menu errors

2024-01-17 Thread Eli Zaretskii

> Date: Wed, 17 Jan 2024 14:55:33 -0700
> From: Karl Berry 
> 
> I recently learned that some @menu vs. sectioning discrepancies in the
> automake manual were found with makeinfo 6.7, but not 7.1. 
> 
> In essence, I moved a subsection (Errors with distclean) from one
> section to another, but forgot to remove the menu entry from the old one.
> (Surely not an uncommon error.)
> 
> Running the attached on 7.1, there are no errors or warnings.
> 6.7 correctly reports various problems resulting from this:

I believe this is an intentional feature in recent Texinfo versions.
To get the warnings back, you need to run makeinfo with the
command-line option "-c CHECK_NORMAL_MENU_STRUCTURE=1".

Re: "make distclean" does not bring back build tree to previous state

2023-12-13 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Tue, 12 Dec 2023 20:21:28 +
> Cc: bug-texinfo@gnu.org
> 
> On Sun, Dec 10, 2023 at 04:00:56PM +0100, Preuße, Hilmar wrote:
> > Hello,
> > 
> > I got a report telling that "make distclean" does not bring back the build
> > tree into the original state. After running a build (configure line below)
> > and calling "make distclean", we have a few differences. Some files were
> > deleted, some files were re-generated and hence show a different time stamp
> > inside the file:
> 
> We'll probably have to investigate each of these changes separately to
> see where they occur.  I am probably not going to fix them all in one
> go.
> 
> First, could you confirm which version of Texinfo you got these results
> with.
> 
> I tested with Texinfo 7.1, ran configure with the same configure line
> as you, then ran "make distclean".

Let me point out that "make distclean" is NOT supposed to revert the
tree to a clean state as far as Git is concerned.  "make distclean" is
supposed to remove any files built or modified as part of building a
release tarball, and release tarballs can legitimately include files
that are not versioned, and therefore are not managed by Git.  So
looking at the results of "git status" is not the correct way of
finding files that "make distclean" is supposed to remove.  Instead,
one should:

  . create a release tarball
  . unpack and build the release tarball in a separate directory
  . run "make distclean" in the directory where the tarball was built
  . unpack the release tarball in another directory, and then compare
that other directory with the one where you run "make distclean"

If a Makefile target is required that should remove all non-versioned
files, that should be a separate target, likely "maintainer-clean" or
somesuch.

Re: Texinfo.tex, problem with too-long table inside @float

2023-12-03 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Sun, 3 Dec 2023 13:57:08 +
> Cc: bug-texinfo@gnu.org
> 
> The solution that occurs to me is to recognise a third argument for
> @float.  @float was introduced in Texinfo 4.7, in 2004.  From NEWS:
> 
>   . new commands @float, @caption, @shortcaption, @listoffloats for
> initial implementation of floating material (figures, tables, etc).
> Ironically, they do not yet actually float anywhere.
> 
> The comments in texinfo.tex state:
> 
> % @float FLOATTYPE,LABEL,LOC ... @end float for displayed figures, tables,
> % etc.  We don't actually implement floating yet, we always include the
> % float "here".  But it seemed the best name for the future.
> 
> [...]
> 
> % #1 is the optional FLOATTYPE, the text label for this float, typically
> % "Figure", "Table", "Example", etc.  Can't contain commas.  If omitted,
> % this float will not be numbered and cannot be referred to.
> %
> % #2 is the optional xref label.  Also must be present for the float to
> % be referable.
> %
> % #3 is the optional positioning argument; for now, it is ignored.  It
> % will somehow specify the positions allowed to float to (here, top, bottom).
> %
> 
> (I can't find any discussions from the time about the new feature
> in the mailing list archives.) 

Maybe Karl (CC'ed) could comment on this?

Re: CC and CFLAGS are ignored by part of the build

2023-11-14 Thread Eli Zaretskii

> From: Bruno Haible 
> Date: Tue, 14 Nov 2023 04:23:58 +0100
> 
> Apparently some optimization options were still in effect. And indeed,
> the file tp/Texinfo/XS/config.status contains these lines:
> 
> CC='sparc64-linux-gnu-gcc'
> compiler='sparc64-linux-gnu-gcc'
> LTCC='sparc64-linux-gnu-gcc'
> compiler='sparc64-linux-gnu-gcc'
> S["CPP"]="sparc64-linux-gnu-gcc -E"
> S["ac_ct_CC"]="sparc64-linux-gnu-gcc"
> S["CC"]="sparc64-linux-gnu-gcc"
> S["PERL_CONF_cc"]="sparc64-linux-gnu-gcc"
> S["PERL_CONF_optimize"]="-O2 -g"
> 
> Per the GNU Coding Standards [1], when I specify CC and CFLAGS, it should
> override the package's defaults.
> 
> I understand that perl comes with its own installation and that building
> code that can be dynamically loaded by perl can be challenging. But the
> CC and CFLAGS values that I have specified are ABI-compatible with
> the ones that perl wants. Therefore I expect them to be obeyed.

AFAIU, that's impossible in general, because CFLAGS could include
flags that cannot be applied to both CC and PERL_CONF_cc due to
compatibility issues, since Perl could have been built using a very
different compiler.

IMNSHO, it isn't a catastrophe that compiling Perl extensions needs a
separate C flags variable.  It is basically similar to CFLAGS and
CXXFLAGS being separate for building the same project (which happens
inj practice, for example, in GDB, which is part C and part C++).  And
if the GCS doesn't cater for these (relatively rare and specialized)
situations, then I think the GCS needs to be amended.  There's no need
to be dogmatic about this.

Re: c32width gives incorrect return values in C locale

2023-11-11 Thread Eli Zaretskii

> From: Bruno Haible 
> Cc: bug-libunistr...@gnu.org
> Date: Sat, 11 Nov 2023 23:54:52 +0100
> 
> [CCing bug-libunistring]
> Gavin Smith wrote:
> > I did not understand why uc_width was said to be "locale dependent":
> > 
> >   "These functions are locale dependent."
> > 
> > - from 
> > .
> 
> That's because some Unicode characters have "ambiguous width" — width 1 in
> Western locales, width 2 is East Asian locales (for historic and font choice
> reasons).

I think this should be explained in the documentation, if it isn't
already.  This "ambiguous width" issue is very subtle and unknown to
many (most?) people, so not having it explicit in the documentation is
not user-friendly, IMO.

> > I also don't understand the purpose of the "encoding" argument -- can this
> > always be "UTF-8"?
> 
> Yes, it can be always "UTF-8"; then uc_width will always choose width 1 for
> these characters.

Regardless of the locale?  Is there an assumption that UTF-8 means
"not CJK" or something?

> > I'm also unclear on the exact relationship between the types char32_t,
> > ucs4_t and uint32_t.  For example, uc_width takes a ucs4_t argument
> > but u8_mbtouc writes to a char32_t variable.  In the code I committed,
> > I used a cast to ucs4_t when calling uc_width.
> 
> These types are all identical. Therefore you don't even need to cast.
> 
>   - char32_t comes from  (ISO C 11 or newer).
>   - ucs4_t comes from GNU libunistring.
>   - uint32_t comes from .

AFAIU, char32_t is identical to uint_least32_t (which is also from
stdint.h).

Re: Locale-independent paragraph formatting

2023-11-10 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Fri, 10 Nov 2023 19:48:04 +
> Cc: Bruno Haible , bug-texinfo@gnu.org
> 
> On Fri, Nov 10, 2023 at 08:47:10AM +0200, Eli Zaretskii wrote:
> > > Does anybody know if we could just write 'a' instead of U'a' and rely
> > > on it being converted?
> > > 
> > > E.g. if you do
> > > 
> > > char32_t c = 'a';
> > > 
> > > then afterwards, c should be equal to 97 (ASCII value of 'a').
> > 
> > Why not?  What could be the problems with using this?
> 
> I think what was confusing me was the statement that char32_t held a UTF-32
> encoded Unicode character.  I then thought it would have a certain byte
> order, so if the UTF-32 was big endian, the bytes would have the order
> 00 00 00 61, whereas the value 97 on a little endian machine would have
> the order 61 00 00 00.  However, it seems that UTF-32 just means the
> codepoint is encoded as a 32-bit integer, and the endianness of the
> UTF-32 sequence can be assumed to match the endianness of the machine.
> The standard C integer conversions can be assumed to work when assigning
> to/from char32_t because it is just an integer type, I assume.

AFAIU, since a codepoint in UTF-32 is just one UTF-32 unit, the issue
of endianness doesn't apply.  Endianness in UTF encodings applies only
if a codepoint takes more than one unit, since the endianness is
between units, not within units themselves (where it always follows
the machine).

Re: Locale-independent paragraph formatting

2023-11-09 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Thu, 9 Nov 2023 21:26:11 +
> 
> I have just pushed a commit (e3a28cc9bf) to use gnulib/libunistring
> functions instead of the locale-dependent functions mbrtowc and wcwidth.
> This allows for a significant simplification as we do not have to try
> to switch to a UTF-8 encoded locale.
> 
> I was not sure about how to put a char32_t literal in the source code.
> For example, where we previously had L'a' as a literal wchar_t letter 'a',
> I changed this to U'a'.  I could not find very much information about this
> online or whether this would be widely supported by C compilers.  The U prefix
> for char32_t is mentioned in a copy of the C11 standard I found online and
> also in a C23 draft.

I have MinGW GCC 9.2 here, and it supports U'a'.  But I don't think we
need it, we could just use 'a' instead, see below.

OTOH, the char32_t type is not supported by this GCC, not even if I
use -std=gnu2x.  Maybe we should use uint_least32_t instead?

> Does anybody know if we could just write 'a' instead of U'a' and rely
> on it being converted?
> 
> E.g. if you do
> 
> char32_t c = 'a';
> 
> then afterwards, c should be equal to 97 (ASCII value of 'a').

Why not?  What could be the problems with using this?

Re: Code from installed libtexinfo.so.0 run for non-installed texi2any

2023-11-06 Thread Eli Zaretskii

> Date: Mon, 6 Nov 2023 14:25:20 +0100
> From: pertu...@free.fr
> Cc: gavinsmith0...@gmail.com, bug-texinfo@gnu.org
> 
> > Do these two replace the several *XS shared libraries we had until
> > Texinfo 7.1, or are they in addition to them?
> 
> There are new *XS shared libraries in additions to those in 7.1,
> StructuringTransfo.la, TranslationsXS.la (which will most likely be
> merged in another one), ConvertXS.la.  The two libraries libtexinfoxs
> and libtexinfo contain the code common for those new *XS shared
> libraries and also code common with Parsetexi.la, which is an XS shared
> library existing in 7.1.
> 
> > In any case, it sounds like these libraries should be installed where
> > we were installing the *XS shared libraries till now.
> 
> It is pkglibdir.  Would be easy to change Makefile.am to put them there,
> but are we sure that the linker will find them when the dlopened *XS
> files are loaded by perl?

I don't know enough about search for shared libraries on Posix
systems, but at least on Windows the linker looks first in the
directory from which the calling shared library was loaded, so it
should work.  Loading by an absolute file name should also work, I
think, and is probably more reliable.

Re: Code from installed libtexinfo.so.0 run for non-installed texi2any

2023-11-06 Thread Eli Zaretskii

> Date: Mon, 6 Nov 2023 09:20:37 +0100
> From: pertu...@free.fr
> Cc: Gavin Smith , bug-texinfo@gnu.org
> 
> On Sun, Nov 05, 2023 at 09:59:44PM +0200, Eli Zaretskii wrote:
> > 
> > I don't have any libtexinfo shared library here, and I don't see one
> > being built, let alone installed, as part of Texinfo.  is this
> > something new in the development sources?  If so, what code is linked
> > into libtexinfo?
> 
> Yes, it is new.  In Texinfo we use a lot XS objects, which are C code
> with a specific interface that allows them to be loaded (dlopen'ed) by
> perl to replace pure perl functions by C functions.  This allows to use
> perl as a high level language, and C for speed.
> 
> libtexinfo corresponds to the 'pure' C common code that performs the
> computations needed for texi2any, working on C data only (no direct use
> of any perl data).  It is used by many XS objects, it is an internal
> library to be used, for now, only by those XS objects.
> 
> There is another new library, libtexinfoxs, for the 'perl C' common code
> used by those XS objects, that does the interface between C data and
> perl data.  This code is even more tied to the XS objects.  The two
> libraries are separate to clearly separate the code that does the
> computations (libtexinfo), that is not related to perl at all and the
> code used to interface C data and perl (libtexinfoxs).

Do these two replace the several *XS shared libraries we had until
Texinfo 7.1, or are they in addition to them?

In any case, it sounds like these libraries should be installed where
we were installing the *XS shared libraries till now.

Re: Code from installed libtexinfo.so.0 run for non-installed texi2any

2023-11-05 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Sun, 5 Nov 2023 18:12:45 +
> Cc: pertu...@free.fr, bug-texinfo@gnu.org
> 
> So you know what a dynamically loaded library is; this contains a collection
> of functions and potentially data structures that can be loaded by running
> code and run as part of a computer program.
> 
> Usually, when such a library is installed on a system, this is for use
> generally by any program.  For example, if there is a library file
> libz.so.1, this could be linked by passing the -lz flag to the C compiler
> when building the program.  The program would be able to call functions
> in the library and so on.
> 
> The program using this library would likely be written by a different
> person, and as part of a different project, to the persons and projects
> responsible for the creation of the library.  There is an assumption that
> the library has a stable interface, and the library and programs using
> the library are worked on completely independently.
> 
> The dynamically loaded libraries used by texi2any (XS modules) are
> completely different.  Technically, they are loaded in the same way,
> by the running Perl interpreter.  But they are an integral part of the
> texi2any program.  They are intended for the use of the texi2any program
> only, not any other.

The XS modules are installed in a directory which is usually not
looked into by the dynamic linker.  Is that what you are talking
about?  If so, we have been using "non-public libraries" since long
ago, no?  Or what am I missing?

> The file was being installed under /usr/local/lib/libtexinfo.so.1, as
> if to imply that a user could link it against their programs with -ltexinfo,
> or load it with dlopen, which would be completely inappropriate.

I don't have any libtexinfo shared library here, and I don't see one
being built, let alone installed, as part of Texinfo.  is this
something new in the development sources?  If so, what code is linked
into libtexinfo?

Re: Code from installed libtexinfo.so.0 run for non-installed texi2any

2023-11-05 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Sun, 5 Nov 2023 17:04:47 +
> 
> > Maybe one day libtexinfo could be a public library, but not for now
> > and libtexinfoxs should probably never ever be a public library.
> 
> I agree neither of them should be a public library now.

Can someone please explain what does "not being a public library"
mean, when we talk about shared libraries?  I don't think I'm familiar
with this notion.

Re: Texinfo 7.1 released

2023-10-25 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Mon, 23 Oct 2023 19:52:49 +0100
> Cc: bug-texinfo@gnu.org
> 
> I propose the following, more finished patch, which applies
> to Texinfo 7.1.  We can also do something similar for the master branch.

Unfortunately, this change doesn't work on MS-Windows:

  libtool: compile:  d:/usr/bin/gcc.exe -DHAVE_CONFIG_H -I. -I. -I./gnulib/lib 
-I./gnulib/lib -DDATADIR=\"d:/usr/share\" -Id:/usr/include -s -O2 -DWIN32 
-DPERL_TEXTMODE_SCRIPTS -DUSE_SITECUSTOMIZE -DPERL_IMPLICIT_CONTEXT 
-DPERL_IMPLICIT_SYS -DUSE_PERLIO -fwrapv -fno-strict-aliasing -mms-bitfields -s 
-O2 -DVERSION=\"0\" -DXS_VERSION=\"0\" -ID:/usr/Perl/lib/CORE -MT xspara.lo -MD 
-MP -MF .deps/xspara.Tpo -c xspara.c  -DDLL_EXPORT -DPIC -o .libs/xspara.o
  xspara.c: In function 'xspara__add_next':
  xspara.c:757:39: warning: passing argument 1 of 'get_utf8_codepoint' from 
incompatible pointer type [-Wincompatible-pointer-types]
757 |   get_utf8_codepoint (&state.last_letter, p, len);
|   ^~
|   |
|   rpl_wint_t * {aka unsigned int 
*}
  xspara.c:689:30: note: expected 'wchar_t *' {aka 'short unsigned int *'} but 
argument is of type 'rpl_wint_t *' {aka 'unsigned int *'}
689 | get_utf8_codepoint (wchar_t *pwc, const char *mbs, size_t n)
| ~^~~

The warning is real: wchar_t is a 16-bit data type on MS-Windows,
whereas the code assumes it's of the same width as wint_t.

I changed the offending code to say this instead:

  if (!strchr (end_sentence_characters
   after_punctuation_characters, *p))
{
  wchar_t wc;
  get_utf8_codepoint (&wc, p, len);
  state.last_letter = wc;
}

and then it compiled cleanly.

Re: Texinfo 7.1 released

2023-10-23 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Sun, 22 Oct 2023 21:01:54 +0100
> Cc: bug-texinfo@gnu.org
> 
> On Sun, Oct 22, 2023 at 10:05:16PM +0300, Eli Zaretskii wrote:
> > > This patch, applied to 7.1, removes the recently added dTHX calls,
> > > but also removes the fprintf calls that were preventing compilation
> > > without it:
> > 
> > It doesn't help: 1:20.7 instead of 1:21.2.
> 
> I'm running out of ideas.  Have you tried timing it with a smaller input
> file (e.g. doc/info-stnd.texi)?  That could detect whether the slowdown
> depends on the size of the input, or if it is a single slowdown to do
> with initialisation/shutdown.

The times seem to be roughly proportional to the size of the generated
Info file, yes.

> Another change is that xspara.c uses btowc now.  I hardly see how it makes
> a difference, but here is something to try:
> 
> diff xspara.c{.old,} -u
> --- xspara.c.old2023-10-22 20:59:03.801498451 +0100
> +++ xspara.c2023-10-22 20:59:29.189031067 +0100
> @@ -730,7 +730,7 @@
>if (!strchr (end_sentence_characters
> after_punctuation_characters, *p))
>  {
> -  if (!PRINTABLE_ASCII(*p))
> +  if (1 || !PRINTABLE_ASCII(*p))
>  {
>wchar_t wc = L'\0';
>mbrtowc (&wc, p, len, NULL);
> @@ -1013,7 +1013,7 @@
>  }
>  
>/** Not a white space character. */
> -  if (!PRINTABLE_ASCII(*p))
> +  if (1 || !PRINTABLE_ASCII(*p))
>  {
>char_len = mbrtowc (&wc, p, len, NULL);
>  }
> 
> This means that all calls go via the MinGW-specific mbrtowc implementation
> in xspara.c.

Bingo.  This brings the time for producing the ELisp manual down to
15.4 sec, 5 sec faster than v7.0.3.

I see that btowc linked into the XSParagraph module is a MinGW
specific implementation, not from the Windows-standard MSVCRT (where
it is absent).  My conclusion is that the MinGW btowc is extremely
inefficient.

Re: Texinfo 7.1 released

2023-10-22 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Sun, 22 Oct 2023 19:35:11 +0100
> Cc: bug-texinfo@gnu.org
> 
> One thing to try would to eliminate dTHX calls.  If these are
> time-consuming on MinGW/MS-Windows, then extra calls will greatly slow
> down the program, due to the number of times the paragraph formatting
> functions are called.
> 
> This patch, applied to 7.1, removes the recently added dTHX calls,
> but also removes the fprintf calls that were preventing compilation
> without it:

It doesn't help: 1:20.7 instead of 1:21.2.

> I have looked for other differences in xspara.c between Texinfo 7.0.3
> and Texinfo 7.1 and cannot really see anything suspicious.

XSParagraph inscludes other source files in addition to xspara.c --
could the changes in those other files be the cause?

> The only other thing that comes to mind that there could have been a
> change in imported gnulib modules.
> 
> Failing that, the only I idea I have is to use some kind of source-level
> profiler to find out why so much time is spent in this module.

Hmm...

Re: Texinfo 7.1 released

2023-10-22 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Sun, 22 Oct 2023 18:41:34 +0100
> Cc: bug-texinfo@gnu.org
> 
> > > Surprise: running with TEXINFO_XS=omit _reduces_ the elapsed time of
> > > producing the Emacs ELisp manual from 1:21.16 to 0:36.97.
> > 
> > Another data point: running with TEXINFO_XS_PARSER=0 takes 1:34.4 min,
> > so it sounds like the slowdown is due to some XS module other than the
> > parser module.  Is there a way to disable additional modules one by
> > one?
> 
> This is most surprising, but promising that we're getting close to the
> problem.
> 
> The simplest way to disable XS modules would be to delete or rename
> the libtool files that are used for loading them.  If you run with
> TEXINFO_XS=debug, you can see which modules are loaded.  With Texinfo 7.1,
> lines like the following would be printed:
> 
> found ../tp/../tp/Texinfo/XS/Parsetexi.la
> found ../tp/../tp/Texinfo/XS/MiscXS.la
> found ../tp/../tp/Texinfo/XS/XSParagraph.la
> 
> You could then disable modules with e.g.
> 
> mv ../tp/../tp/Texinfo/XS/XSParagraph.la{,.disable}
> 
> or
> 
> mv ../tp/../tp/Texinfo/XS/MiscXS.la{,.disable}

Thanks.  Looks like the slowdown is in XSParagraph: without it, I get
21.8 sec, only slightly slower than Texinfo 7.0.3.  Disabling MiscXS
as well yields almost the same time (0.05 sec longer) as with MiscXS,
and disabling Parsetexi gets us back to 37 sec, the same as with
TEXINFO_XS=omit.

Beyond the fact that XSParagraph seems to be th culprit, I wonder why
MiscXS doesn't speed up the processing.  Is this expected?

Anyway, what's next?

Re: Texinfo 7.1 released

2023-10-22 Thread Eli Zaretskii

> Date: Sun, 22 Oct 2023 17:30:15 +0300
> From: Eli Zaretskii 
> Cc: bug-texinfo@gnu.org
> 
> > From: Gavin Smith 
> > Date: Sun, 22 Oct 2023 14:23:53 +0100
> > Cc: bug-texinfo@gnu.org
> > 
> > > > First, check that the Perl extension modules are actually being used.  
> > > > Try
> > > > setting the TEXINFO_XS environment variable to "require" or "debug".
> > > 
> > > I don't need to do that, I already verified that extensions are used
> > > when I worked on the pretests (which, as you might remember, caused
> > > Perl to crash at first).
> > 
> > I'd expected so, just wanted to make sure.
> 
> Surprise: running with TEXINFO_XS=omit _reduces_ the elapsed time of
> producing the Emacs ELisp manual from 1:21.16 to 0:36.97.

Another data point: running with TEXINFO_XS_PARSER=0 takes 1:34.4 min,
so it sounds like the slowdown is due to some XS module other than the
parser module.  Is there a way to disable additional modules one by
one?

Re: Texinfo 7.1 released

2023-10-22 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Sun, 22 Oct 2023 14:23:53 +0100
> Cc: bug-texinfo@gnu.org
> 
> > > First, check that the Perl extension modules are actually being used.  Try
> > > setting the TEXINFO_XS environment variable to "require" or "debug".
> > 
> > I don't need to do that, I already verified that extensions are used
> > when I worked on the pretests (which, as you might remember, caused
> > Perl to crash at first).
> 
> I'd expected so, just wanted to make sure.

Surprise: running with TEXINFO_XS=omit _reduces_ the elapsed time of
producing the Emacs ELisp manual from 1:21.16 to 0:36.97.  Disabling
Unicode::Collate on top of that has almost no effect (about 1 sec).

What do you think I should try next?

Re: Texinfo 7.1 released

2023-10-22 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Sun, 22 Oct 2023 13:35:19 +0100
> Cc: bug-texinfo@gnu.org
> 
> On Sun, Oct 22, 2023 at 12:06:21PM +0300, Eli Zaretskii wrote:
> >   . makeinfo is painfully slow.  For example, building the ELisp
> > manual that is part of Emacs takes a whopping 82.3 sec.  By
> > contrast, Texinfo-7.0.3 takes just 20.7 sec.  And this is with
> > Perl extensions being used!  What could explain such a performance
> > regression? perhaps the use of libunistring or some other code
> > that handles non-ASCII characters?
> 
> It could be the use of Unicode collation for sorting document indices.

Can index sorting take more than a 1 minute?

> First, check that the Perl extension modules are actually being used.  Try
> setting the TEXINFO_XS environment variable to "require" or "debug".

I don't need to do that, I already verified that extensions are used
when I worked on the pretests (which, as you might remember, caused
Perl to crash at first).

> Otherwise, the easiest way of turning off the Unicode collation is
> patching the source code:
> 
> --- a/tp/Texinfo/Structuring.pm
> +++ b/tp/Texinfo/Structuring.pm
> @@ -2604,7 +2604,7 @@ sub setup_sortable_index_entries($;$)
>my $collator;
>eval { require Unicode::Collate; Unicode::Collate->import; };
>my $unicode_collate_loading_error = $@;
> -  if ($unicode_collate_loading_error eq '') {
> +  if (0 || $unicode_collate_loading_error eq '') {
>  $collator = Unicode::Collate->new(%collate_options);
>} else {
>  $collator = Texinfo::CollateStub->new();
> 
> This should use the 'cmp' Perl operator instead of the more complicated
> Unicode collation algorithm.

How can I run makeinfo uninstalled, from the texinfo-7.1 source tree?
The version that is currently installed here is v7.0.3, as I must be
able to produce manuals in reasonable times as part of my work on
Emacs and other projects, so I uninstalled 7.1 when I found these
problems.

> (Incidently 20.7 seconds for Texinfo 7.0.3 is still longer than I would
> expect.  On my system the same manual is processed in 5-6 seconds, on
> GNU/Linux on a fairly cheap Acer laptop.)

That is of secondary importance for me at this time.

Re: Texinfo 7.1 released

2023-10-22 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Wed, 18 Oct 2023 15:07:26 +0100
> Cc: bug-texinfo@gnu.org
> 
> We have released version 7.1 of Texinfo, the GNU documentation format.

I'm sorry to say that makeinfo in this new release of Texinfo has
serious problems, when built with MinGW on MS-Windows.  Here are the 2
problems I immediately saw in real-life usage of this version, as soon
as I installed it:

  . makeinfo is painfully slow.  For example, building the ELisp
manual that is part of Emacs takes a whopping 82.3 sec.  By
contrast, Texinfo-7.0.3 takes just 20.7 sec.  And this is with
Perl extensions being used!  What could explain such a performance
regression? perhaps the use of libunistring or some other code
that handles non-ASCII characters?

  . makeinfo seems to ignore @documentencoding, at least in some
places.  Specifically, it consistently produces ASCII equivalents
of some punctuation characters, like quotes “..” and ’, en-dash –,
etc.  Curiously, other punctuation characters, and even the above
ones in some contexts, _are_ produced.  As an example, makeinfo
7.1 produces

 If you don't customize ‘auth-sources’, you'll have to live with the
  defaults: the unencrypted netrc file ‘~/.authinfo’ will be used for any
  host and any port.

where 7.0.3 produced

 If you don’t customize ‘auth-sources’, you’ll have to live with the
  defaults: the unencrypted netrc file ‘~/.authinfo’ will be used for any
  host and any port.

Note how ’ in "don’t" and "you’ll" produced the ASCII ', whereas
‘auth-sources’ and ‘~/.authinfo’ are quoted with non-ASCII quote
characters.  Why this difference?  Texinfo 7.0.3 produces
non-ASCII quotes in both cases.

The above basically means I'm unable to upgrade to 7.1, and will need
to keep using v7.0.3 for the time being.

I'm sorry I didn't try this version on the Emacs docs when it was in
pretest.  To my defense, I never before saw such issues once the test
suite runs successfully.  Any suggestions for debugging the above two
issues will be welcome.

Re: branch master updated: * info/info.c (get_initial_file), * info/infodoc.c (info_get_info_help_node), * info/nodes.c (info_get_node_with_defaults): Use strcmp or strcasecmp instead of mbscasecmp in

2023-10-19 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Thu, 19 Oct 2023 14:10:56 +0100
> Cc: bug-texinfo@gnu.org
> 
> On Thu, Oct 19, 2023 at 03:26:51PM +0300, Eli Zaretskii wrote:
> > > diff --git a/info/info.c b/info/info.c
> > > index 8ca4a17e58..d7a6afaa2c 100644
> > > --- a/info/info.c
> > > +++ b/info/info.c
> > > @@ -250,7 +250,7 @@ get_initial_file (int *argc, char ***argv, char 
> > > **error)
> > >  {
> > >/* If they say info info (or info -O info, etc.), show them 
> > >   info-stnd.texi.  (Get info.texi with info -f info.) */
> > > -  if ((*argv)[0] && mbscasecmp ((*argv)[0], "info") == 0)
> > > +  if ((*argv)[0] && strcmp ((*argv)[0], "info") == 0)
> > >  (*argv)[0] = "info-stnd";
> > 
> > This could produce regressions on case-insensitive filesystems, where
> > we could have INFO.EXE, for example.  Do we really no longer care
> > about those?
> 
> (*argv)[0] here is not the name of the program but what was given on the
> command line.  It should mean that "INFO.EXE info" works as before if
> "INFO.EXE" is the name of the info program, whereas "INFO.EXE INFO" wouldn't.

On MS-DOS and MS-Windows, argv[0] is usually NOT what the user types
on the command line, it's what the OS fills in, and it usually puts
there the full absolute file name of the executable.

> > >/* If the node not found was "Top", try again with different case. */
> > > -  if (!node && (nodename && mbscasecmp (nodename, "Top") == 0))
> > > +  if (!node && (nodename && strcasecmp (nodename, "Top") == 0))
> > 
> > Are there no Info manuals that have "Top" with a different
> > letter-case?
> 
> It is strcasecmp here, not strcmp.  This should support other
> capitalisations, like "TOP" or "ToP".

Right, sorry.  So I think it's good enough.

Re: branch master updated: * info/info.c (get_initial_file), * info/infodoc.c (info_get_info_help_node), * info/nodes.c (info_get_node_with_defaults): Use strcmp or strcasecmp instead of mbscasecmp in

2023-10-19 Thread Eli Zaretskii

> Date: Thu, 19 Oct 2023 08:20:49 -0400
> From: "Gavin D. Smith" 
> 
> +2023-10-19  Gavin Smith 
> +
> + * info/info.c (get_initial_file),
> + * info/infodoc.c (info_get_info_help_node),
> + * info/nodes.c (info_get_node_with_defaults):
> + Use strcmp or strcasecmp instead of mbscasecmp in several
> + cases where we do not care about case-insensitive matching with
> + non-ASCII characters.
> +
>  2023-10-19  Gavin Smith 
>  
>   * tp/maintain/change_perl_modules_version.sh:
> diff --git a/info/info.c b/info/info.c
> index 8ca4a17e58..d7a6afaa2c 100644
> --- a/info/info.c
> +++ b/info/info.c
> @@ -250,7 +250,7 @@ get_initial_file (int *argc, char ***argv, char **error)
>  {
>/* If they say info info (or info -O info, etc.), show them 
>   info-stnd.texi.  (Get info.texi with info -f info.) */
> -  if ((*argv)[0] && mbscasecmp ((*argv)[0], "info") == 0)
> +  if ((*argv)[0] && strcmp ((*argv)[0], "info") == 0)
>  (*argv)[0] = "info-stnd";

This could produce regressions on case-insensitive filesystems, where
we could have INFO.EXE, for example.  Do we really no longer care
about those?

> --- a/info/infodoc.c
> +++ b/info/infodoc.c
> @@ -357,8 +357,7 @@ DECLARE_INFO_COMMAND (info_get_info_help_node, _("Visit 
> Info node '(info)Help'")
>  for (win = windows; win; win = win->next)
>{
>  if (win->node && win->node->fullpath
> -&& !mbscasecmp ("info",
> -filename_non_directory (win->node->fullpath))
> +&& !strcmp (filename_non_directory (win->node->fullpath), "info")
>  && (!strcmp (win->node->nodename, "Help")
>  || !strcmp (win->node->nodename, "Help-Small-Screen")))

Likewise here.

>/* If the node not found was "Top", try again with different case. */
> -  if (!node && (nodename && mbscasecmp (nodename, "Top") == 0))
> +  if (!node && (nodename && strcasecmp (nodename, "Top") == 0))

Are there no Info manuals that have "Top" with a different
letter-case?

Re: MinGW "info" program broken?

2023-10-15 Thread Eli Zaretskii

> From: Bruno Haible 
> Cc: bug-texinfo@gnu.org
> Date: Sun, 15 Oct 2023 16:07:28 +0200
> 
> Eli Zaretskii wrote:
> > The stand-alone Info reader built with MinGW works
> > flawlessly for me.
> > 
> > > I had understood that "info" was running well on MinGW so it would be 
> > > worth
> > > understanding any differences between yours and Bruno's setup.
> > 
> > I'm indeed curious why this happens with the MSVC build.
> 
> It happens also with the mingw-w64 version 5.0.3 build. Let me investigate...

I guess you somehow trip on this code fragment from pcterm.c:

  /* Print STRING to the terminal at the current position. */
  static void
  pc_put_text (string)
   char *string;
  {
if (speech_friendly)
  fputs (string, stdout);
  #ifdef __MINGW32__
else if (hscreen == INVALID_HANDLE_VALUE)
  fputs (string, stdout);  <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
else if (output_cp == CP_UTF8 || output_cp == CP_UTF7)
  write_utf (output_cp, string, -1);
  #endif
else
  cputs (string);
  }

Which probably means the screen handle is somehow invalid?

Re: Texinfo 7.0.94 on native Windows

2023-10-15 Thread Eli Zaretskii

> From: Bruno Haible 
> Cc: gavinsmith0...@gmail.com, bug-texinfo@gnu.org
> Date: Sun, 15 Oct 2023 16:25:56 +0200
> 
> Eli Zaretskii wrote:
> > _popen accepts a MODE argument which can be used to control that, see
> > 
> >   
> > https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/popen-wpopen?view=msvc-170
> > 
> > We use this in the stand-alone Info reader, for example, in this
> > snippet from info/filesys.c:
> > 
> >   stream = popen (command, FOPEN_RBIN);
> 
> This is good. But there are these two occurrences of popen():
> 
> 1)
> info/man.c:
> fpipe = popen (cmdline, "r");
> 
> Should better use FOPEN_RBIN as well.

No, because any 'man' program on Windows is likely to produce CRLF
EOLs when it writes to stdout.

> 2)
> info/session.c:
> printer_pipe = fopen (++print_command, "w");
>   printer_pipe = popen (print_command, "w");
> 
> Should better use FOPEN_WBIN.

No, because we write to a 'lpr' work-alike, which on Windows should be
able to handle CRLF EOLs.

Re: MinGW "info" program broken?

2023-10-15 Thread Eli Zaretskii

> From: Bruno Haible 
> Cc: bug-texinfo@gnu.org
> Date: Sun, 15 Oct 2023 16:07:28 +0200
> 
> Eli Zaretskii wrote:
> > The stand-alone Info reader built with MinGW works
> > flawlessly for me.
> > 
> > > I had understood that "info" was running well on MinGW so it would be 
> > > worth
> > > understanding any differences between yours and Bruno's setup.
> > 
> > I'm indeed curious why this happens with the MSVC build.
> 
> It happens also with the mingw-w64 version 5.0.3 build. Let me investigate...

Is this build with UCRT or with MSVCRT?

Re: MinGW "info" program broken?

2023-10-15 Thread Eli Zaretskii

> From: Bruno Haible 
> Date: Sun, 15 Oct 2023 15:23:45 +0200
> 
> Gavin Smith wrote:
> > I had understood that "info" was running well on MinGW so it would be worth
> > understanding any differences between yours and Bruno's setup.
> 
> I'm usually building with mingw-w64 5.0.3.
> 
> Whereas Eli (AFAIK) often builds with the older mingw from the now-defunct
> mingw.org site. Correct me if I'm wrong, Eli.

You are not wrong, but both flavors should produce a working info.exe,
AFAIK.  We took care of that years ago.

Re: Texinfo 7.0.94 on native Windows

2023-10-15 Thread Eli Zaretskii

> From: Bruno Haible 
> Cc: gavinsmith0...@gmail.com, bug-texinfo@gnu.org
> Date: Sun, 15 Oct 2023 15:11:33 +0200
> 
> Eli Zaretskii wrote:
> > > For 'popen' and 'pclose', one needs the gnulib modules 'popen' and 
> > > 'pclose',
> > > respectively.
> > 
> > Windows has _popen and _pclose, which can be used instead.
> 
> _popen uses text mode, not binary mode, by default, AFAIK. This can be
> problematic.

_popen accepts a MODE argument which can be used to control that, see

  
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/popen-wpopen?view=msvc-170

We use this in the stand-alone Info reader, for example, in this
snippet from info/filesys.c:

  stream = popen (command, FOPEN_RBIN);

Re: MinGW "info" program broken?

2023-10-15 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Sun, 15 Oct 2023 13:17:53 +0100
> Cc: bug-texinfo@gnu.org
> 
> On Sun, Oct 15, 2023 at 01:24:32PM +0200, Bruno Haible wrote:
> >   - The behaviour of the 'ginfo' program on MSVC is the same as on mingw,
> > albeit not really useful currently: './info -f texinfo.info' spits out
> > the entire manual to stdout at once. It looks like the device gets set
> > to stdout, or there is no knowledge about the terminal window's height,
> > or something like that.
> 
> Is that also true on the MinGW build you did, Eli?

No, of course not.  The stand-alone Info reader built with MinGW works
flawlessly for me.

> I had understood that "info" was running well on MinGW so it would be worth
> understanding any differences between yours and Bruno's setup.

I'm indeed curious why this happens with the MSVC build.

Re: Texinfo 7.0.94 pretest available

2023-10-15 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Sun, 15 Oct 2023 12:57:46 +0100
> Cc: bug-texinfo@gnu.org
> 
> On Sun, Oct 15, 2023 at 12:00:47PM +0300, Eli Zaretskii wrote:
> > Thanks.
> > 
> > This doesn't compile with MinGW, because some of the dTHX additions I
> > needed for the previous pretest were not installed(?), and are still
> > missing.  The patch I needed is below.
> 
> I hadn't installed those changes because I hadn't understood why they
> were necessary.  Looking at the code, I expect that it was due to the
> use of fprintf in those functions which the Perl headers must be doing
> something funny with.  Presumably you got an error message when compiling
> indicating this?

I show two such error messages below, I hope it will help you
understand the reason.

> The fprintf calls were added since the Texinfo 7.0 branch so will not
> have broken previously.
> 
> Although adding dTHX should be harmless, the paragraph formatting
> functions are very frequently called functions and adding the dTHX in
> there has a potential performance impact, especially in xspara__add_next.
> 
> However, I could not detect any performance difference in the testing I
> did so I have added them anyway.

Here are the error messages I promised.  Error #1:

 libtool: compile:  d:/usr/bin/gcc.exe -DHAVE_CONFIG_H -I. -I./parsetexi 
-I. -I./gnulib/lib -I./gnulib/lib -DDATADIR=\"d:/usr/share\" -Id:/usr/include 
-s -O2 -DWIN32 -DPERL_TEXTMODE_SCRIPTS -DUSE_SITECUSTOMIZE 
-DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -fwrapv 
-fno-strict-aliasing -mms-bitfields -s -O2 -DVERSION=\"0\" -DXS_VERSION=\"0\" 
-ID:/usr/Perl/lib/CORE -MT parsetexi/Parsetexi_la-api.lo -MD -MP -MF 
parsetexi/.deps/Parsetexi_la-api.Tpo -c parsetexi/api.c  -DDLL_EXPORT -DPIC -o 
parsetexi/.libs/Parsetexi_la-api.o
 In file included from parsetexi/api.c:23:
 parsetexi/api.c: In function 'reset_parser':
 D:/usr/Perl/lib/CORE/perl.h:155:16: error: 'my_perl' undeclared (first use 
in this function)
   155 | #  define aTHX my_perl
   |^~~
 D:/usr/Perl/lib/CORE/embedvar.h:38:18: note: in expansion of macro 'aTHX'
38 | #define vTHX aTHX
   |  ^~~~
 D:/usr/Perl/lib/CORE/embedvar.h:65:20: note: in expansion of macro 'vTHX'
65 | #define PL_StdIO  (vTHX->IStdIO)
   |^~~~
 D:/usr/Perl/lib/CORE/iperlsys.h:207:4: note: in expansion of macro 
'PL_StdIO'
   207 |  (*PL_StdIO->pStderr)(PL_StdIO)
   |^~~~
 D:/usr/Perl/lib/CORE/XSUB.h:511:21: note: in expansion of macro 
'PerlSIO_stderr'

   511 | #define stderr  PerlSIO_stderr
   | ^~
 parsetexi/api.c:174:14: note: in expansion of macro 'stderr'
   174 | fprintf (stderr,
   |  ^~
 D:/usr/Perl/lib/CORE/perl.h:155:16: note: each undeclared identifier is 
reported only once for each function it appears in
   155 | #  define aTHX my_perl
   |^~~
 D:/usr/Perl/lib/CORE/embedvar.h:38:18: note: in expansion of macro 'aTHX'
38 | #define vTHX aTHX
   |  ^~~~
 D:/usr/Perl/lib/CORE/embedvar.h:65:20: note: in expansion of macro 'vTHX'
65 | #define PL_StdIO  (vTHX->IStdIO)
   |^~~~
 D:/usr/Perl/lib/CORE/iperlsys.h:207:4: note: in expansion of macro 
'PL_StdIO'
   207 |  (*PL_StdIO->pStderr)(PL_StdIO)
   |^~~~
 D:/usr/Perl/lib/CORE/XSUB.h:511:21: note: in expansion of macro 
'PerlSIO_stderr'

   511 | #define stderr  PerlSIO_stderr
   | ^~
 parsetexi/api.c:174:14: note: in expansion of macro 'stderr'
   174 | fprintf (stderr,
   |  ^~

Error #2:

 libtool: compile:  d:/usr/bin/gcc.exe -DHAVE_CONFIG_H -I. -I. 
-I./gnulib/lib -I./gnulib/lib -DDATADIR=\"d:/usr/share\" -Id:/usr/include -s 
-O2 -DWIN32 -DPERL_TEXTMODE_SCRIPTS -DUSE_SITECUSTOMIZE -DPERL_IMPLICIT_CONTEXT 
-DPERL_IMPLICIT_SYS -DUSE_PERLIO -fwrapv -fno-strict-aliasing -mms-bitfields -s 
-O2 -DVERSION=\"0\" -DXS_VERSION=\"0\" -ID:/usr/Perl/lib/CORE -MT xspara.lo -MD 
-MP -MF .deps/xspara.Tpo -c xspara.c  -DDLL_EXPORT -DPIC -o .libs/xspara.o
 In file included from xspara.c:39:
 xspara.c: In function 'xspara__print_escaped_spaces':
 D:/usr/Perl/lib/CORE/perl.h:155:16: error: 'my_perl' undeclared (first use 
in this function)
   155 | #  define aTHX my_perl
   |^~~
 D:/usr/Perl/lib/CORE/embedvar.h:38:18: note: in expansion of macro 'aTHX'
38 |

Re: Texinfo 7.0.94 on native Windows

2023-10-15 Thread Eli Zaretskii

> From: Bruno Haible 
> Date: Sun, 15 Oct 2023 13:24:32 +0200
> 
> For 'popen' and 'pclose', one needs the gnulib modules 'popen' and 'pclose',
> respectively.

Windows has _popen and _pclose, which can be used instead.  That's
what MinGW does, AFAIK.

But I'm not sure Texinfo should try supporting an MSVC build.  It's
enough to support MinGW, IMO.

Re: Texinfo 7.0.94 pretest available

2023-10-15 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Sat, 14 Oct 2023 14:27:36 +0100
> Cc: platform-test...@gnu.org
> 
> A pretest distribution for the next Texinfo release (7.1) has been
> uploaded to
>  
> https://alpha.gnu.org/gnu/texinfo/texinfo-7.0.94.tar.xz
> 
> There have not been many changes since the previous pretest.  We are
> making this pretest mainly to test build fixes for the MinGW platform.
> We hope to release this as Texinfo 7.1 in a few days' time, unless
> further problems are found.
> 
> Changes since 7.0.93:
> 
> * A bug has been fixed where a few document strings would not be
>   translated in texi2any's output.
> * Fix building Perl XS modules on MinGW/MS-Windows.
> * Fix install-info on the same platform.
> * Tests of texi2any changed to avoid differing results for different
>   implementations of the wcwidth function.
> 
> We make these pretests to help find any problems before we make an official
> release to a larger audience, so that the release will be as good as it
> can be.
> 
> Please send any feedback to .

Thanks.

This doesn't compile with MinGW, because some of the dTHX additions I
needed for the previous pretest were not installed(?), and are still
missing.  The patch I needed is below.

--- ./tp/Texinfo/XS/parsetexi/api.c~0   2023-10-07 19:12:05.0 +0300
+++ ./tp/Texinfo/XS/parsetexi/api.c 2023-10-15 11:30:26.924948100 +0300
@@ -158,6 +158,8 @@ reset_parser_except_conf (void)
 void
 reset_parser (int debug_output)
 {
+  dTHX;
+
   /* NOTE: Do not call 'malloc' or 'free' in this function or in any function
  called in this file.  Since this file (api.c) includes the Perl headers,
  we get the Perl redefinitions, which we do not want, as we don't use
--- ./tp/Texinfo/XS/xspara.c~0  2023-10-11 20:08:06.0 +0300
+++ ./tp/Texinfo/XS/xspara.c2023-10-15 11:29:14.853806700 +0300
@@ -565,6 +565,8 @@ xspara_get_pending (void)
 void
 xspara__add_pending_word (TEXT *result, int add_spaces)
 {
+  dTHX;
+
   if (state.word.end == 0 && !state.invisible_pending_word && !add_spaces)
 return;
 
@@ -640,6 +642,9 @@ char *
 xspara_end (void)
 {
   static TEXT ret;
+
+  dTHX;
+
   text_reset (&ret);
   state.end_line_count = 0;
 
@@ -686,6 +691,8 @@ xspara_end (void)
 void
 xspara__add_next (TEXT *result, char *word, int word_len, int transparent)
 {
+  dTHX;
+
   int disinhibit = 0;
   if (!word)
 return;


After fixing the above, most of the tests pass, but 3 texi2any tests
fail:

 FAIL: test_scripts/layout_formatting_info_ascii_punctuation.sh
 FAIL: test_scripts/layout_formatting_info_disable_encoding.sh
 FAIL: test_scripts/layout_formatting_plaintext_ascii_punctuation.sh

I attach below the logs of these failures.



texi2any-tests-mingw.log.gz
Description: Binary data

Re: library for unicode collation in C for texi2any?

2023-10-14 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Sat, 14 Oct 2023 19:57:22 +0100
> 
> It's all in the future, but I am slightly concerned about is duplicating
> in Texinfo existing system facilities.  For example, for avoiding use of
> wcwidth, our use of which depends on setting a UTF-8 locale, and using
> the wchar_t type.  Is every program that uses wcwidth supposed to supply
> their own implementation instead, and isn't this wasteful?

What other locale-specific functions do we need in addition to
wcwidth?

If the list of those functions is short enough, we could replace them
all by the corresponding Gnulib/libunistring functions, and then we
could stop setting specific locales and relying on locale-specific
libc functions.  That will give us locale-independent code which will
work on all systems.

> I don't know if libunistring aspires to become a standard system library
> for handling UTF-8 data but if we use it for other UTF-8 processing it
> would make sense to use it for collation.
> 
> I suggest writing to Bruno Haible to ask if he has plans to include
> collation functionality in libunistring in the future.  I am currently
> reading through "Unicode Technical Standard #10" and although I don't
> understand a lot of it yet, it seems feasible that we could implement it
> in C.

It is feasible, but implementing it from scratch is a lot of work, and
needs a large database (which we could take from the CLDR).  But note
that CLDR is AFAIK locale-dependent; the only part of it that doesn't
depend on the locale is collation by Unicode codepoints.

Re: library for unicode collation in C for texi2any?

2023-10-14 Thread Eli Zaretskii

> Date: Sat, 14 Oct 2023 11:57:02 +0200
> From: Patrice Dumas 
> Cc: bug-texinfo@gnu.org
> 
> On Thu, Oct 12, 2023 at 06:13:34PM +0300, Eli Zaretskii wrote:
> > What you say is not detailed enough, but using my crystal ball I think
> > you can have this with glibc-based systems, and also on Windows (but
> > that requires using a special API for comparing strings).  Not sure
> > about the equivalent features on other systems, like *BSD and macOS.
> > You can see that in action in how GNU 'ls' sorts file names.
> 
> Looks like ls ultimately uses strcoll.  The problem is that it selects
> the current locale, we never want to use the current locale in Texinfo.
> We either want to use a 'generic' locale (which does not really exist
> as far as I can tell) or the @documentlanguage locale.

Yes, I know.  However, if the current locale's codeset is UTF-8, AFAIK
glibc uses the full Unicode CLDR, which is what I wanted to point out.

> There seems to be variants of strcoll and of strxfrm, strcoll_l and 
> strxfrm_l that allow to specify a locale, but it is not very well
> documented (these functions seem to be in the glibc, but are not
> documented, strcoll and strxfrm are), there are no gnulib modules, and I
> am not sure whether with "C" locale these functions really use the
> specified locale.

I don't think we want to depend on the locale in Texinfo.  The problem
is how to find or write an implementation that on the one side doesn't
use the locale-dependent collation rules, and OTOH ignores punctuation
and other "unimportant" characters.

Re: Texinfo 7.0.93 pretest available

2023-10-13 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Fri, 13 Oct 2023 22:14:32 +0100
> Cc: pertu...@free.fr, bug-texinfo@gnu.org
> 
> Eli, are you able to test this from git or do you need me to make another
> pretest release?

Git is a bit problematic, as some of the file names include non-ASCII
characters.  For this reason, and also for others (e.g., I have
already made too many changes to 7.0.93 sources), I'd prefer another
pretest.  I think it's also a better way in general, as non-trivial
changes were made since 7.0.93, so it would be prudent to let other
pretesters test the result.

Thanks.

Re: library for unicode collation in C for texi2any?

2023-10-13 Thread Eli Zaretskii

> Date: Fri, 13 Oct 2023 11:31:54 + (UTC)
> Cc: pertu...@free.fr, bug-texinfo@gnu.org
> From: Werner LEMBERG 
> 
> >> [...] Neither collation corresponds to Unicode codepoints.
> >
> > That's exactly what we should not do.
> 
> I strongly disagree.
> 
> > People who read German don't necessarily live in Germany, and
> > Texinfo is not a general-purpose system for typesetting documents,
> > it is a system for writing software documentation.
> 
> What you describe is certainly valid for a function index, say.
> However, a concept index – which is an essential part of any
> documentation IMHO – that doesn't sort as expected is at the border of
> being useless.

You are exaggerating, and that doesn't help.  In practice, the
problems are minor, and consistency is much more important.

> > Besides, which German are you talking about?  There are several
> > German-based locales, each one with its own local tailoring.
> 
> It doesn't matter.

If this "doesn't matter", then why do you insist on this?

>   There are zillions of German computer books that
> come with an index, and such books *are* read in all German-speaking
> countries and elsewhere, irrespective of a fine-tuned locale used for
> the exact index order.  *This* part can be easily standardized by
> making Texinfo support exactly one German locale ('de').
> 
> > So consistency in Texinfo is IMNSHO more important that fine-tuning
> > the order to a specific locale and language.
> 
> What good for is this consistency if it is extremely user-unfriendly?

It will be "user-unfriendly" anyway, if we use one flavor of German,
because users in a different locale will not expect that.

> What exactly is the problem if, say, an MS compilation produces a
> slightly different sorting order in the index?  Just add a sentence to
> the build instructions and tell the people what to expect.

You are wrong.  Your POV is skewed.  And that is all I can tell you on
this matter, since it looks like continuing this discussion is not
useful.

Re: library for unicode collation in C for texi2any?

2023-10-13 Thread Eli Zaretskii

> Date: Fri, 13 Oct 2023 07:31:29 + (UTC)
> Cc: pertu...@free.fr, bug-texinfo@gnu.org
> From: Werner LEMBERG 
> 
> 
> >> OK, no tailoring.  I wasn't aware of those differences, thanks for
> >> pointing me to it.
> >> 
> >> Hopefully, we agree that `@documentlanguage` should set a
> >> language-specific collation for the index.
> > 
> > Without tailoring, this basically means collation according to
> > Unicode codepoints.
> 
> Uh oh, this is not good.  As an example, consider the letter 'ä'.
> There are two possible collations that are considered as correct for
> German:
> 
> * Sort 'ä' right before 'b'.
> 
> * Handle 'ä' similar to 'ae' but sort it after 'ae'.
> 
> Neither collation corresponds to Unicode codepoints.

That's exactly what we should not do.  People who read German don't
necessarily live in Germany, and Texinfo is not a general-purpose
system for typesetting documents, it is a system for writing software
documentation.  Besides, which German are you talking about?  There
are several German-based locales, each one with its own local
tailoring.  So consistency in Texinfo is IMNSHO more important that
fine-tuning the order to a specific locale and language.

Re: library for unicode collation in C for texi2any?

2023-10-13 Thread Eli Zaretskii

> Date: Fri, 13 Oct 2023 07:08:36 + (UTC)
> Cc: pertu...@free.fr, bug-texinfo@gnu.org
> From: Werner LEMBERG 
> 
> 
> >> ... there is probably a misunderstanding on my side.  I don't know
> >> what you mean with 'tailoring', please give an example.
> > 
> > This subject is too large and complicated for me to answer this
> > question here.  So I will refer you to the relevant Unicode spec:
> > 
> >   https://unicode.org/reports/tr10/
> > 
> > Section 8 "Tailoring" there will probably answer your question.
> 
> OK, no tailoring.  I wasn't aware of those differences, thanks for
> pointing me to it.
> 
> Hopefully, we agree that `@documentlanguage` should set a
> language-specific collation for the index.

Without tailoring, this basically means collation according to Unicode
codepoints.

Re: library for unicode collation in C for texi2any?

2023-10-12 Thread Eli Zaretskii

> Date: Thu, 12 Oct 2023 20:30:47 + (UTC)
> Cc: pertu...@free.fr, bug-texinfo@gnu.org
> From: Werner LEMBERG 
> 
> >> > I don't recommend to tailor index sorting for the language
> >> > indicated by @documentlanguage, either.
> >> 
> >> This surprises me.  Why not?  For some languages, the alphabetical
> >> order differs enormously from English.
> > 
> > Because indices in a Texinfo document should not depend on details
> > of how the manual was produced.
> 
> Well, if I write a book in German, say, I most definitely want an
> index sorted with a German collation (there is more than a single one,
> BTW).  This collation should be used regardless of the input encoding.
> However, ...
> 
> > And note that I said "tailoring", which is minor adjustments to the
> > general collation, which is based on character Unicode codepoints.
> 
> ... there is probably a misunderstanding on my side.  I don't know
> what you mean with 'tailoring', please give an example.

This subject is too large and complicated for me to answer this
question here.  So I will refer you to the relevant Unicode spec:

  https://unicode.org/reports/tr10/

Section 8 "Tailoring" there will probably answer your question.

The main reason why I think we should not use language-specific
tailoring is that it is implemented differently by different system
libraries, and therefore the manuals produced by using that will be
different depending on what platform they were produced.  And that is
undesirable, IMO, from our POV.  As an example, I suggest to compare
the collation of file names in GNU 'ls', as implemented by glibc
(which basically implements the entire Unicode UTS#10 mentioned above
and uses its CLDR data set, http://unicode.org/cldr/), with the
corresponding MS-Windows API documented here:

  
https://learn.microsoft.com/en-us/windows/win32/api/stringapiset/nf-stringapiset-comparestringex

The results of collation using these disparate implementations is
similar, but not identical.  My point here is that Texinfo should IMO
try to avoid these subtle differences as much as possible.  Using code
that is independent of the current locale is a large step in that
direction, but there are additional smaller steps that we should take
after that, and avoiding too strong dependence on language-specific
collation, as implemented by the underlying libraries, is one of them.

Re: library for unicode collation in C for texi2any?

2023-10-12 Thread Eli Zaretskii

> Date: Thu, 12 Oct 2023 17:12:44 + (UTC)
> Cc: pertu...@free.fr, bug-texinfo@gnu.org
> From: Werner LEMBERG 
> 
> 
> > I don't recommend to tailor index sorting for the language indicated
> > by @documentlanguage, either.
> 
> This surprises me.  Why not?  For some languages, the alphabetical
> order differs enormously from English.

Because indices in a Texinfo document should not depend on details of
how the manual was produced.  And note that I said "tailoring", which
is minor adjustments to the general collation, which is based on
character Unicode codepoints.

Re: library for unicode collation in C for texi2any?

2023-10-12 Thread Eli Zaretskii

> Date: Thu, 12 Oct 2023 15:00:57 +0200
> From: Patrice Dumas 
> Cc: bug-texinfo@gnu.org
> 
> On Thu, Oct 12, 2023 at 01:29:27PM +0300, Eli Zaretskii wrote:
> > What is "smart sorting"? where is it described/documented?
> 
> It is, in general, any way to sort Unicode that takes into account
> natural languages words orders. In practice, what is used in
> Unicode::Collate is the 'Unicode Technical Standard #10' Unicode
> Collation Algorithm (a.k.a. UCA) described in
> http://www.unicode.org/reports/tr10.  In texi2any, we set an option of
> collation,
>   ( 'variable' => 'Non-Ignorable' )
> such that spaces and punctuation marks sort before letters.  This
> specific option is described in
> http://www.unicode.org/reports/tr10/#Variable_Weighting
> 
> It would be perfect if the same sorting could be obtained, but if
> C code does not follow exactly the same standard, I do not think
> that it is so problematic, as long as the sorting is sensible.  It could
> actually be problematic for tests, but if the output of texi2any is ok
> even if not fully reproducible, it would still be better than sorting
> according to the Unicode codepoint in a full C implementation.

What you say is not detailed enough, but using my crystal ball I think
you can have this with glibc-based systems, and also on Windows (but
that requires using a special API for comparing strings).  Not sure
about the equivalent features on other systems, like *BSD and macOS.
You can see that in action in how GNU 'ls' sorts file names.

> > In general, Unicode collation rules are locale- and
> > language-dependent.  My recommendation for Texinfo is not to use
> > locale-specific collation rules, so that the indices would come out
> > sorted identically no matter in which locale the user runs texi2any.
> 
> That's the plan.  The plan is to use the @documentlanguage information
> with Unicode::Collate::Locale in the future, but never use the locale.

I don't recommend to tailor index sorting for the language indicated
by @documentlanguage, either.

> This is still a TODO item, though, as Unicode::Collate::Locale is a perl
> core module since perl 5.14 only, released in 2011, so my plan was to
> wait for 2031 to use it and be able to assume that it is indeed present
> the same way we assume that Unicode::Collate is present.

We can have this in C today.

Re: library for unicode collation in C for texi2any?

2023-10-12 Thread Eli Zaretskii

> Date: Thu, 12 Oct 2023 11:39:14 +0200
> From: Patrice Dumas 
> 
> One thing I could not find easily in C is something to replace the
> Unicode::Collate perl module for index entries sorting using 'smart'
> rules for sorting, that could be either found in Gnulib, included easily
> in the Texinfo distribution or would be, in general, installed.  Unless
> I missed something, there is no such facility in libunistring, it seems
> to be in libICU, but I do not know how easy it could be
> integrated/shipped with Texinfo and I do not think that it is installed
> in the general case.
> 
> 
> Do you have information, on how to do 'smart' unicode sorting in
> C, including for tests, which could allow shipping of code as we already
> do with libunistring in gnulib in case it is not already installed, such
> that it is used in the general case?  Could also be example of projects
> that have managed to do that.

What is "smart sorting"? where is it described/documented?

In general, Unicode collation rules are locale- and
language-dependent.  My recommendation for Texinfo is not to use
locale-specific collation rules, so that the indices would come out
sorted identically no matter in which locale the user runs texi2any.

Re: Texinfo 7.0.93 pretest available

2023-10-11 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Wed, 11 Oct 2023 18:15:04 +0100
> Cc: Patrice Dumas 
> 
> On Wed, Oct 11, 2023 at 06:12:51PM +0100, Gavin Smith wrote:
> > I will send you a diff to try to see if it lets the tests pass, or if
> > we need to make any further changes.
> 
> Attached.

Thanks.  This solves some of the diffs, but not all of them.  In
addition, one test that previously passed now fails
(formatting_documentlanguage_cmdline.sh).  I attach below the
redirected output of all the failed tests, which shows the diffs
against the expected results.

tp-tests-patched-mingw.gz
Description: Binary data

Re: Texinfo 7.0.93 pretest available

2023-10-11 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Tue, 10 Oct 2023 20:24:47 +0100
> Cc: br...@clisp.org, bug-texinfo@gnu.org
> 
> On Tue, Oct 10, 2023 at 02:55:09PM +0300, Eli Zaretskii wrote:
> > > If this simple stub is preferable to the Gnulib implementation for
> > > MS-Windows, (e.g. it makes the tests pass) we could re-add it again.
> > 
> > We can do that, but I think we should first explore a better
> > alternative: use UTF-8 functions everywhere, without relying on the
> > locale-aware functions of libc, such as wcwidth.  For example, instead
> > of wcwidth, we could use uc_width.
> 
> Changing away from using wcwidth at this stage is a more significant
> change to be making.  I want to fix this issue in an easy and simple way.
> As far as I am aware these tests passed on MS-Windows with previous
> releases of Texinfo, so doing what we did before seems the simplest fix
> to me.

Then we need to understand why the tests are now failing when they
succeeded previously.

> I'm not sure of the easiest way to put in a replacement for wcwidth
> given that the wcwidth module is in use.  I tried the stub implementation
> as before with a different name, but this led to test failures, so may
> not be enough.  It's possible there have also been changes in the tests.
> Do you know the last released version of Texinfo that passed the test
> suite successfully?

Texinfo 7.0.3 succeeded to run the tests.

> I wonder if it is commit b9347e3db9d0 that is responsible (2022-11-11,
> Patrice Dumas), or other changes to tp/tests/coverage_macro.texi that
> change what is occurring in the line.

I doubt that, since the previous versions already included, for
example, the dotless j letter, which is one of those which cause
trouble.

> As I said before, one short-term fix I would be happy with is to split
> the content up so there are shorter lines.  Given that the purpose of
> these tests is not to test line-breaking in itself, and that this is
> a fragile part of texi2any's output, if line breaking is to be tested
> this should be part of a specialised test.  Any difference in the
> line breaking for the coverage_macro.texi tests leads to a mass of
> differences which are hard to interpret.  We could put any problematic
> characters on lines of their own, e.g.

This would be fine by me, if filling is not the issue being tested
there.

Re: Texinfo 7.0.93 pretest available

2023-10-10 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Tue, 10 Oct 2023 18:09:15 +0100
> Cc: Eli Zaretskii , bug-texinfo@gnu.org
> 
> On Mon, Oct 09, 2023 at 11:32:49PM +0200, Bruno Haible wrote:
> > Gavin Smith wrote:
> > > It is supposed to attempt to force the locale to a UTF-8 locale.  You
> > > can see the code in xspara_init that attempts to change the locale.  There
> > > is also a comment before xspara_add_text:
> > > 
> > >   "This function relies on there being a UTF-8 locale in LC_CTYPE for
> > >   mbrtowc to work correctly."
> > 
> > That's an inherently unportable thing. You can't just force an UTF-8
> > locale if the system does not have it.
> 
> The module shouldn't load if it can't switch to a UTF-8 locale.  xspara_init
> returns a different value if these attempts fail leading the code loading
> the module (in Texinfo::XSLoader) to fall back to the pure Perl version.

If the inability to load the UTF-8 locale means the modules cannot be
loaded, I consider that a serious problem, because Perl implementation
is slower.  We need every possible way of speeding up texi2any,
because the speed regression since Texinfo moved to the Perl
implementation is significant, so much so that some refuse to upgrade
from Texinfo 4.13 (and thus hold back usage of new Texinfo features in
the various GNU manuals).  We cannot afford losing speedups due to
such issues, especially since they are solvable using readily
available libraries.

> It would be good to get away from the attempts to switch to a UTF-8 locale
> but I doubt it is urgent to do before the release, as the current approach,
> however flawed, has been in place and worked fairly well for a long time
> (since the XS paragraph module was written).  At the time it seemed to be
> the only way to get the information from wcwidth.

Then what do you propose to do about this in the MinGW port of Texinfo
7.1?  And why is it urgent to release Texinfo 7.1 without fixing this
issue?

Re: Texinfo 7.0.93 pretest available

2023-10-10 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Mon, 9 Oct 2023 20:39:59 +0100
> Cc: Bruno Haible , bug-texinfo@gnu.org
> 
> > IOW, unless the locale's codeset is UTF-8, any character that is not
> > printable _in_the_current_locale_ will return -1 from wcwidth.  I'm
> > guessing that no one has ever tried to run the test suite in a
> > non-UTF-8 locale before?
> 
> It is supposed to attempt to force the locale to a UTF-8 locale.  You
> can see the code in xspara_init that attempts to change the locale.  There
> is also a comment before xspara_add_text:
> 
>   "This function relies on there being a UTF-8 locale in LC_CTYPE for
>   mbrtowc to work correctly."

You cannot force MS-Windows into using the UTF-8 locale (with the
possible exception of very recent Windows versions, which AFAIK still
don't support UTF-8 in full).

You also cannot force an arbitrary Posix system into using UTF-8,
because such a locale might not be installed.

> For MS-Windows there is the w32_setlocale function that may use something
> different:
> 
>   /* Switch to the Windows U.S. English locale with its default
>  codeset.  We will handle the non-ASCII text ourselves, so the
>  codeset is unimportant, and Windows doesn't support UTF-8 as the
>  codeset anyway.  */
>   return setlocale (category, "ENU");
> 
> mbrtowc has its own override which handle UTF-8.
> 
> As far as this relates to wcwidth, there used to be an MS-Windows specific
> stub implementation of this, removed in commit 5a66bc49ac032 (Patrice Dumas,
> 2022-08-19) which added a gnulib implementation of wcwidth:
> 
> diff --git a/tp/Texinfo/XS/xspara.c b/tp/Texinfo/XS/xspara.c
> index 93924a623c..bf4ef91650 100644
> --- a/tp/Texinfo/XS/xspara.c
> +++ b/tp/Texinfo/XS/xspara.c
> @@ -206,13 +206,6 @@ iswspace (wint_t wc)
>return 0;
>  }
>  
> -/* FIXME: Provide a real implementation.  */
> -int
> -wcwidth (const wchar_t wc)
> -{
> -  return wc == 0 ? 0 : 1;
> -}
> -
>  int
>  iswupper (wint_t wi)
>  {
> 
> 
> If this simple stub is preferable to the Gnulib implementation for
> MS-Windows, (e.g. it makes the tests pass) we could re-add it again.

We can do that, but I think we should first explore a better
alternative: use UTF-8 functions everywhere, without relying on the
locale-aware functions of libc, such as wcwidth.  For example, instead
of wcwidth, we could use uc_width.

Is it feasible to use UTF-8 in texi2any disregarding the locale, and
use libunistring or something similar for the few functions we need in
the extensions that are required to deal with non-ASCII characters?
If we can do that, it will work on all systems, including Windows.
(This is basically what Emacs does, but it does that on a much greater
scale, which is unnecessary in texi2any.)

Re: Texinfo 7.0.93 pretest available

2023-10-10 Thread Eli Zaretskii

> Date: Mon, 9 Oct 2023 21:17:28 +0200
> From: Patrice Dumas 
> 
> On Sun, Oct 08, 2023 at 06:29:23PM +0100, Gavin Smith wrote:
> > 
> > I remember that in the past, I broke up some of these lines to avoid
> > test failures on some platform that had different wcwidth results for
> > some characters.
> 
> Maybe an optionin the long term here would be not to use wcwidth at all,
> but use libunistring functions like u8_strwidth.  It would probably
> remove the issue of locale.  The only requirement would be to make sure
> that the input string is UTF-8 encoded such that it can be converted to
> uint8_t without risk of error.

Isn't makeinfo converts all non-ASCII text to UTF-8 anyway?  If so, we
should always use the UTF-8 functions, without relying on the locale
and libc locale-aware functions.

Re: Texinfo 7.0.93 pretest available

2023-10-09 Thread Eli Zaretskii

> From: Bruno Haible 
> Cc: gavinsmith0...@gmail.com, bug-texinfo@gnu.org
> Date: Mon, 09 Oct 2023 19:18:25 +0200
> 
> Eli Zaretskii wrote:
> > > I just tried it now: On Linux (Ubuntu 22.04), in a de_DE.UTF-8 locale,
> 
> Oops, typo: What I tested was the de_DE.ISO-8859-1 locale:
> $ export LC_ALL=de_DE.ISO-8859-1

So wcwidth in ISO-8895-1 locale returns 1 for U+0237?  Even though
U+0237 cannot be encoded in ISO-8895-1?  And iswprint returns non-zero
for it in that locale?

Or does the Texinfo test suite forces the locale to something UTF-8?

> > Since U+0237 is not printable in my locale (it isn't supported by the
> > system codepage), the value -1 is correct.  Am I missing something?
> 
> True. But why don't we see the same test failure on glibc and on FreeBSD
> systems, then, in a locale with ISO-8859-1 encoding?

Good question.  Maybe they interpret the Posix standards differently
(if the locale is not forced by the test suite).

> > > This "simpler approximation" would not return a good result when wc
> > > is a control character (such as CR, LF, TAB, or such). It is important
> > > that the caller of wcwidth() or wcswidth() is able to recognize that
> > > the string as a whole does not have a definite width.
> > 
> > It is still better than returning -1, don't you agree?
> 
> No, I don't agree. Returning -1 tells the caller "watch out, you cannot
> assume anything about printed outline of this string".

I meant "better for Texinfo when it generates Info manuals", not in
general.

> > But for some reason you completely ignored my more general comment
> > about what Texinfo needs from wcwidth.
> 
> That's because I am not familiar with the Texinfo code. I don't know
> whether and where Texinfo calls wcwidth(), and I don't know with which
> expectations it does so.

It calls wcwidth to know how many columns a character will take, in
order to fill lines, when it generates manuals in the Info format.

Re: Texinfo 7.0.93 pretest available

2023-10-09 Thread Eli Zaretskii

> From: Bruno Haible 
> Cc: bug-texinfo@gnu.org
> Date: Mon, 09 Oct 2023 18:15:05 +0200
> 
> Eli Zaretskii wrote:
> > unless the locale's codeset is UTF-8, any character that is not
> > printable _in_the_current_locale_ will return -1 from wcwidth.  I'm
> > guessing that no one has ever tried to run the test suite in a
> > non-UTF-8 locale before?
> 
> I just tried it now: On Linux (Ubuntu 22.04), in a de_DE.UTF-8 locale,
> texinfo 7.0.93 build fine and all tests pass.

de_DE.UTF-8 is a UTF-8 locale.  I asked about non-UTF-8 locales.  An
example would be de_DE.ISO8859-1.  Or what am I missing?

> > Yes, quite a few characters return -1 from wcwidth, in particular the
> > ȷ character above (which explains the above difference).
> 
> This character is U+0237 LATIN SMALL LETTER DOTLESS J. It *should* be
> recognized as having a width of 1 in all implementations of wcwidth.

But if U+0237 cannot be represented in the locale's codeset, its width
can not be 1, because it cannot be printed.  This is my interpretation
of the standard's language (emphasis mine):

  DESCRIPTION

  The wcwidth() function shall determine the number of column
  positions required for the wide character wc. The application
  shall ensure that the value of wc is a character representable
  as a wchar_t, and is a wide-character code corresponding to a
  valid character in the current locale.
  ^
  RETURN VALUE

  The wcwidth() function shall either return 0 (if wc is a null
  wide-character code), or return the number of column positions
  to be occupied by the wide-character code wc, or return -1 (if
  wc does not correspond to a printable wide-character code).
 ^^
Since U+0237 is not printable in my locale (it isn't supported by the
system codepage), the value -1 is correct.  Am I missing something?

> There's no reason for it to have a width of -1, since it's not a control
> character.
> There's no reason for it to have a width of 0, since it's not a combining
> mark or a non-spacing character.
> There's no reason for it to have a width of 2, since it's not a CJK character
> and not in a Unicode range with many CJK characters.

I think you assume that all the Unicode letter characters are always
printable in every locale.  That's not what I understand, and iswprint
agrees with me, because I get -1 for U+0237 due to this code:

> >   return wc == 0 ? 0 : iswprint (wc) ? 1 : -1;

> > I don't think the above logic in Gnulib's wcwidth (which basically
> > replicates the logic in any reasonable wcwidth implementation, so is
> > not specific to Gnulib) fits what Texinfo needs.  Texinfo needs to be
> > able to produce output independently of the locale.  What matters to
> > Texinfo is the encoding of the output document, not the locale's
> > codeset.  So I think we should call uc_width when the output document
> > encoding is UTF-8 (which is the default, including in the above test),
> > regardless of the locale's codeset.  Or we could use a simpler
> > approximation:
> > 
> >   return wc == 0 ? 0 : iswcntrl (wc) ? 0 : 1;
> 
> This "simpler approximation" would not return a good result when wc
> is a control character (such as CR, LF, TAB, or such). It is important
> that the caller of wcwidth() or wcswidth() is able to recognize that
> the string as a whole does not have a definite width.

It is still better than returning -1, don't you agree?

But for some reason you completely ignored my more general comment
about what Texinfo needs from wcwidth.

Re: Texinfo 7.0.93 pretest available

2023-10-09 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Sun, 8 Oct 2023 20:21:44 +0100
> Cc: bug-texinfo@gnu.org
> 
> Just comparing the first line in the hunk:
> 
> -(ì) @'{e} é (é) @'{@dotless{i}} í (í) @dotless{i} ı (ı) @dotless{j} ȷ
> +(ì) @'{e} é (é) @'{@dotless{i}} í (í) @dotless{i} ı (ı) @dotless{j} ȷ (ȷ)
> 
> the line you are getting is longer than the reference results.  
> 
> I wonder if for some of the non-ASCII characters wcwidth is returning 0 or
> -1 leading the line to be longer.

Yes, quite a few characters return -1 from wcwidth, in particular the
ȷ character above (which explains the above difference).

> It's also possible that other codepoints have inconsistent wcwidth results,
> especially for combining accents.
> 
> Do you know if it is the gnulib implementation of wcwidth that is being
> used or a MinGW one?

AFAIK, MinGW doesn't have wcwidth, so we are using the one from
Gnulib.  But what Gnulib does in this case is not what Texinfo
expects, I think:

int
wcwidth (wchar_t wc)
#undef wcwidth
{
  /* In UTF-8 locales, use a Unicode aware width function.  */
  if (is_locale_utf8_cached ())
{
  /* We assume that in a UTF-8 locale, a wide character is the same as a
 Unicode character.  */
  return uc_width (wc, "UTF-8");
}
  else
{
  /* Otherwise, fall back to the system's wcwidth function.  */
#if HAVE_WCWIDTH
  return wcwidth (wc);
#else
  return wc == 0 ? 0 : iswprint (wc) ? 1 : -1;
#endif
}
}

IOW, unless the locale's codeset is UTF-8, any character that is not
printable _in_the_current_locale_ will return -1 from wcwidth.  I'm
guessing that no one has ever tried to run the test suite in a
non-UTF-8 locale before?

I don't think the above logic in Gnulib's wcwidth (which basically
replicates the logic in any reasonable wcwidth implementation, so is
not specific to Gnulib) fits what Texinfo needs.  Texinfo needs to be
able to produce output independently of the locale.  What matters to
Texinfo is the encoding of the output document, not the locale's
codeset.  So I think we should call uc_width when the output document
encoding is UTF-8 (which is the default, including in the above test),
regardless of the locale's codeset.  Or we could use a simpler
approximation:

  return wc == 0 ? 0 : iswcntrl (wc) ? 0 : 1;

CC'ing Bruno who I think knows much more about this.

Re: Texinfo 7.0.93 pretest available

2023-10-08 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Sun, 8 Oct 2023 18:29:23 +0100
> Cc: bug-texinfo@gnu.org
> 
> On Sun, Oct 08, 2023 at 07:31:12PM +0300, Eli Zaretskii wrote:
> > I see a very large diff, full of non-ASCII characters.  A typical hunk
> > is below:
> > 
> >   -(ì) @'{e} é (é) @'{@dotless{i}} í (í) @dotless{i} ı (ı) @dotless{j} ȷ
> >   -(ȷ) ‘@H{a}’ a̋ ‘@dotaccent{a}’ ȧ (ȧ) ‘@ringaccent{a}’ å (å)
> >   -‘@tieaccent{a}’ a͡ ‘@u{a}’ ă (ă) ‘@ubaraccent{a}’ a̲ ‘@udotaccent{a}’ ạ
> >   -(ạ) ‘@v{a}’ ǎ (ǎ) @,c ç (ç) ‘@,{c}’ ç (ç) ‘@ogonek{a}’ ą (ą)
> >   +(ì) @'{e} é (é) @'{@dotless{i}} í (í) @dotless{i} ı (ı) @dotless{j} ȷ (ȷ)
> >   +‘@H{a}’ a̋ ‘@dotaccent{a}’ ȧ (ȧ) ‘@ringaccent{a}’ å (å) ‘@tieaccent{a}’ 
> > a͡
> >   +‘@u{a}’ ă (ă) ‘@ubaraccent{a}’ a̲ ‘@udotaccent{a}’ ạ (ạ) ‘@v{a}’ ǎ (ǎ)
> >   +@,c ç (ç) ‘@,{c}’ ç (ç) ‘@ogonek{a}’ ą (ą)
> > 
> > It looks like a filling problem to me, perhaps because something
> > counts bytes instead of characters?
> 
> It's almost certainly a problem with filling as you say.  In the C (XS)
> code, the return value of wcwidth is used for each character to get
> the width of each line.  The pure Perl code doesn't use the wcwidth
> function as far as I know but keeps a count for each line based on
> regex character classes.  The relevant code is in
> Texinfo/Convert/Unicode.pm, in the 'string_width' function.

So perhaps the wcwidth function is the culprit.  I'm guessing that it
returns 1 for every printable character in my case.

> Do you know whether the XS modules are in use?

Yes, they are.  That's why Perl crashed before the getdelim issue was
fixed, and the crash was inside Parsetexi.dll, which is an XS module.

> You could try "export TEXINFO_XS=omit" or "export TEXINFO_XS=require" to
> check if it makes a difference.  That would narrow it down to which version
> of the code had the problem (or if they both have a problem).

This command succeeds with status 0:

  $ TEXINFO_XS=omit test_scripts/coverage_formatting_info.sh

Re: Texinfo 7.0.93 pretest available

2023-10-08 Thread Eli Zaretskii

> Date: Sun, 08 Oct 2023 19:31:12 +0300
> From: Eli Zaretskii 
> Cc: bug-texinfo@gnu.org
> 
> I see a very large diff, full of non-ASCII characters.  A typical hunk
> is below:
> 
>   -(ì) @'{e} é (é) @'{@dotless{i}} í (í) @dotless{i} ı (ı) @dotless{j} ȷ
>   -(ȷ) ‘@H{a}’ a̋ ‘@dotaccent{a}’ ȧ (ȧ) ‘@ringaccent{a}’ å (å)
>   -‘@tieaccent{a}’ a͡ ‘@u{a}’ ă (ă) ‘@ubaraccent{a}’ a̲ ‘@udotaccent{a}’ ạ
>   -(ạ) ‘@v{a}’ ǎ (ǎ) @,c ç (ç) ‘@,{c}’ ç (ç) ‘@ogonek{a}’ ą (ą)
>   +(ì) @'{e} é (é) @'{@dotless{i}} í (í) @dotless{i} ı (ı) @dotless{j} ȷ (ȷ)
>   +‘@H{a}’ a̋ ‘@dotaccent{a}’ ȧ (ȧ) ‘@ringaccent{a}’ å (å) ‘@tieaccent{a}’ a͡
>   +‘@u{a}’ ă (ă) ‘@ubaraccent{a}’ a̲ ‘@udotaccent{a}’ ạ (ạ) ‘@v{a}’ ǎ (ǎ)
>   +@,c ç (ç) ‘@,{c}’ ç (ç) ‘@ogonek{a}’ ą (ą)
> 
> It looks like a filling problem to me, perhaps because something
> counts bytes instead of characters?

Or maybe the data about character width is incorrect/inconsistent?

Re: Texinfo 7.0.93 pretest available

2023-10-08 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Sun, 8 Oct 2023 17:04:36 +0100
> Cc: bug-texinfo@gnu.org
> 
> On Sun, Oct 08, 2023 at 04:55:24PM +0300, Eli Zaretskii wrote:
> > > Date: Sun, 08 Oct 2023 16:42:05 +0300
> > > From: Eli Zaretskii 
> > > CC: bug-texinfo@gnu.org
> > > 
> > > The next set of problems is in install-info: the new code in this
> > > version fails to close files, and then Windows doesn't let us
> > > remove/rename them.  The result is that almost all the install-info
> > > tests fail with Permission denied.  The patch below fixes that:
> > 
> > Finally, 8 tests in tp/tests fail:
> > 
> >test_scripts/coverage_formatting_info.sh
> >test_scripts/coverage_formatting_plaintext.sh
> >test_scripts/layout_formatting_info_ascii_punctuation.sh
> >test_scripts/layout_formatting_info_disable_encoding.sh
> >test_scripts/layout_formatting_plaintext_ascii_punctuation.sh
> >test_scripts/layout_formatting_fr.sh
> >test_scripts/layout_formatting_fr_info.sh
> >test_scripts/layout_formatting_fr_icons.sh
> > 
> > I don't think I understand how to debug this.  I tried to look at the
> > output and log files, but either I look at the wrong files or I
> > misunderstand how to interpret them.  Any help and advice will be
> > appreciated.
> 
> First change to the tp/tests subdirectory.  Then run the test script.
> For example:
> 
> test_scripts/coverage_formatting_info.sh

Thanks.

> This prints the texi2any command run, and if there are unexpected results
> these should be printed too.  On my system, here is what is printed for that
> test:
> 
> testdir: coverage/
> driving_file: ./coverage//list-of-tests
> made result dir: ./coverage//res_parser/
> 
> doing test formatting_info, src_file ./coverage//formatting.texi
> format_option: 
> texi2any.pl formatting_info -> coverage//out_parser/formatting_info
>  /usr/bin/perl -w ./..//texi2any.pl  --force --conf-dir ./../t/init/ 
> --conf-dir ./../init --conf-dir ./../ext -I ./coverage/ -I coverage// -I ./ 
> -I . -I built_input --error-limit=1000 -c TEST=1  --output 
> coverage//out_parser/formatting_info/ -D 'needcollationcompat Need collation 
> compatibility' --info ./coverage//formatting.texi > 
> coverage//out_parser/formatting_info/formatting.1 
> 2>coverage//out_parser/formatting_info/formatting.2
> 
> all done, exiting with status 0
> 
> If any of the output files or standard output or error differered from what
> was expected, this would be printed as a diff afterwards.

I see a very large diff, full of non-ASCII characters.  A typical hunk
is below:

  -(ì) @'{e} é (é) @'{@dotless{i}} í (í) @dotless{i} ı (ı) @dotless{j} ȷ
  -(ȷ) ‘@H{a}’ a̋ ‘@dotaccent{a}’ ȧ (ȧ) ‘@ringaccent{a}’ å (å)
  -‘@tieaccent{a}’ a͡ ‘@u{a}’ ă (ă) ‘@ubaraccent{a}’ a̲ ‘@udotaccent{a}’ ạ
  -(ạ) ‘@v{a}’ ǎ (ǎ) @,c ç (ç) ‘@,{c}’ ç (ç) ‘@ogonek{a}’ ą (ą)
  +(ì) @'{e} é (é) @'{@dotless{i}} í (í) @dotless{i} ı (ı) @dotless{j} ȷ (ȷ)
  +‘@H{a}’ a̋ ‘@dotaccent{a}’ ȧ (ȧ) ‘@ringaccent{a}’ å (å) ‘@tieaccent{a}’ a͡
  +‘@u{a}’ ă (ă) ‘@ubaraccent{a}’ a̲ ‘@udotaccent{a}’ ạ (ạ) ‘@v{a}’ ǎ (ǎ)
  +@,c ç (ç) ‘@,{c}’ ç (ç) ‘@ogonek{a}’ ą (ą)

It looks like a filling problem to me, perhaps because something
counts bytes instead of characters?

The diffs like above are followed by diffs in the Index part, where it
looks like the differences are just line counts:

   * Menu:

  -* truc:  chapter.(line 2236)
  +* truc:  chapter.(line 2234)

Probably due to the same problem of incorrect filling of lines?

Re: Texinfo 7.0.93 pretest available

2023-10-08 Thread Eli Zaretskii

> From: Gavin Smith 
> Date: Sun, 8 Oct 2023 16:33:22 +0100
> Cc: bug-texinfo@gnu.org
> 
> > > Hence, I propose to initialise n to 0, rather than 120 as in the patch
> > > below.
> > 
> > No, the value must be positive, otherwise it still crashes.  It's a
> > bug in MinGW implementation.
> 
> Can you refer to any discussion of this bug online anywhere?

I don't need any discussions, I simply read the code.  MinGW is a Free
Software, so the sources of its additions to the Microsoft runtime are
part of the MinGW distribution.  Once I understood that the build is
using the MinGW getdelim, I simply looked at the sources.

> I see on the POSIX specification:
> 
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/getdelim.html
> 
> the wording is slightly different to the glibc manual:
> 
>If *n is non-zero, the application shall ensure that *lineptr either
>points to an object of size at least *n bytes, or is a null pointer.
>
>If *lineptr is a null pointer or if the object pointed to by *lineptr
>is of insufficient size, an object shall be allocated...
> 
> This implies that it is ok to have null *LINEPTR and positive *N.

Yes, it is OK.  It should be also OK to have *N be any garbage when
*LINEPTR is NULL, but the MinGW implementation fails to support that
case.

> I don't like using the value 120 as this is slightly larger than a
> default line length of 80, which is confusing as you might think it
> was that number for a reason and that we were supporting input line
> lengths up to 120 bytes, when in fact any positive number would have
> done.
> 
> I will change it to be 1 with a comment that it should be any positive
> number.

The value 1 works, I already tested that.

> This bug sounds like something that should be worked around with gnulib.
> Would you be able to send details of the bug to bug-gnu...@gnu.org as
> well as any information on the versions of MinGW affected?

Yes, when I have time.  I'm a bit busy these days; it's sheer luck I
had so much time today to work on the non-trivial problems in this
pretest.  (And it isn't over yet.)

Re: Texinfo 7.0.93 pretest available

2023-10-08 Thread Eli Zaretskii

> Date: Sun, 08 Oct 2023 16:42:05 +0300
> From: Eli Zaretskii 
> CC: bug-texinfo@gnu.org
> 
> The next set of problems is in install-info: the new code in this
> version fails to close files, and then Windows doesn't let us
> remove/rename them.  The result is that almost all the install-info
> tests fail with Permission denied.  The patch below fixes that:

Finally, 8 tests in tp/tests fail:

   test_scripts/coverage_formatting_info.sh
   test_scripts/coverage_formatting_plaintext.sh
   test_scripts/layout_formatting_info_ascii_punctuation.sh
   test_scripts/layout_formatting_info_disable_encoding.sh
   test_scripts/layout_formatting_plaintext_ascii_punctuation.sh
   test_scripts/layout_formatting_fr.sh
   test_scripts/layout_formatting_fr_info.sh
   test_scripts/layout_formatting_fr_icons.sh

I don't think I understand how to debug this.  I tried to look at the
output and log files, but either I look at the wrong files or I
misunderstand how to interpret them.  Any help and advice will be
appreciated.

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 1015 matches

Mail list logo