Re: PACKAGE_VERSION should not be renamed

2023-11-12 Thread Patrice Dumas
On Sun, Nov 12, 2023 at 11:05:18AM +, Gavin Smith wrote:
> 
> However, I doubt that it is justified to rename it.  We know it is an
> option, so what is the point of putting _OPTION on the end to get
> PACKAGE_VERSION_OPTION?

Thinking more about it, I think that there is a point in having a
different name.  The name PACKAGE_VERSION already refers to something,
to the value set by configure.  The addition of _OPTION makes clear that
it may be modified/customized compared to what configure passed. And it
already is modified if TEST is set.

Instead of changing that, I propose to also provide a PACKAGE_VERSION
in perl that is not an option and is unmodified and can always be used
to get what was set by configure.  It could make sense to put the
information somewhere else than in the customization, and use something
similar to what is in the API with get_info(), which is not modifiable
(in contrast with get_conf()/set_conf()).  This could be available in
init files with texinfo_get_info().  In C/XS the configure set macros
could be used directly, or the values set in perl could be passed too,
and available through get_info(), although the issue with the macro
already defined would pop up again if it is a filed in a structure.
This could be somewhat confusing, but would allow to distinguish the
values that should be used to do portability codes and the values used
to output strings.

> This could be dealt with in a variety of ways.  The most straightforward
> would be to "#undef PACKAGE_VERSION" at the beginning of option_types.h.
> That way, any source code file using the OPTIONS type and including
> this file will not have that preprocessor definition active throughout
> the file.

If we do that, we should keep the existing value in another macro,
renamed, for example PACKAGE_VERSION_CONFIG or PACKAGE_VERSION_MACRO.
But the more I think about the more it seems relevant to me to keep
PACKAGE_VERSION as set by configure and use a different name for the
customization option as done now.

> We do not use the PACKAGE_VERSION from the build system for the XS code
> and it simply has the value "0".  (In the top-level build system, it is
> the version of the package, currently "7.1dev".)

What we should have here is not fully clear to me, but I think that it
should be the value set by configure.  Probably having the same as in
the main configure would be the best, but we could also have a number
that is only incremented if there were changes, like we do for the
texinfo DTD.

> It's also not necessary that we use the symbol PACKAGE_VERSION in the XS/C
> code to refer to this option, so could be renamed to o_PACKAGE_VERSION
> as a special case if the #undef option does not work or is not appropriate
> for some reason.

To me PACKAGE_VERSION_OPTION is better than o_PACKAGE_VERSION, it is
clearer, it minimizes the differences between perl variables
name and C/XS names used, and between C/XS and documentation.

-- 
Pat



Re: PACKAGE_VERSION should not be renamed

2023-11-12 Thread Patrice Dumas
On Sun, Nov 12, 2023 at 11:05:18AM +, Gavin Smith wrote:
> However, I doubt that it is justified to rename it.  We know it is an
> option, so what is the point of putting _OPTION on the end to get
> PACKAGE_VERSION_OPTION?

I chose that way because I imagined that using PACKAGE_VERSION only for
macros defined in config.h was a kind of established custom.  But it is
probably better to avoid this change as you mention.

> This could be dealt with in a variety of ways.  The most straightforward
> would be to "#undef PACKAGE_VERSION" at the beginning of option_types.h.
> That way, any source code file using the OPTIONS type and including
> this file will not have that preprocessor definition active throughout
> the file.
> 
> We do not use the PACKAGE_VERSION from the build system for the XS code
> and it simply has the value "0".  (In the top-level build system, it is
> the version of the package, currently "7.1dev".)

I do not like that possibility that much, if we want to use
PACKAGE_VERSION later on (we use it in perl code), it will not be
available, I think that it would be best avoiding that solution.

> It's also not necessary that we use the symbol PACKAGE_VERSION in the XS/C
> code to refer to this option, so could be renamed to o_PACKAGE_VERSION
> as a special case if the #undef option does not work or is not appropriate
> for some reason.

It seems to me to be a better solution, I'll have a look at it if you
don't.  I think that it is part of generated code from perl data or a
txt file, so it should probably be fixed at generation time.

-- 
Pat



Re: set_labels_identifiers_target -fsanitize=undefined error

2023-11-12 Thread John Paul Adrian Glaubitz
Hi!

On Sat, 2023-11-04 at 16:38 +, Gavin Smith wrote:
> On Sat, Nov 04, 2023 at 01:10:47PM +, Sam James wrote:
> > 
> > John Paul Adrian Glaubitz  writes:
> > 
> > > Hi Gavin!
> > > 
> > > On Sat, 2023-11-04 at 11:00 +, Gavin Smith wrote:
> > > > The line in question is:
> > > > 
> > > >   memcpy (targets, list_of_labels, labels_number * sizeof(LABEL));
> > > > 
> > > > - again, the second argument of memcpy.
> > > > 
> > > > However, main/targets.c was only introduced after Texinfo 7.1 so
> > > > this is not the original problem.
> > > 
> > > I'll provide a backtrace as well as the commit that introduced the 
> > > regression
> > > on SPARC within the next days. Need to set up two new SPARC servers next 
> > > week
> > > first.
> > > 
> > 
> > OK, I tried this out on sparc with Gavin's fix on master, and got...
> > 
> > export UBSAN_OPTIONS=print_stacktrace=1:halt_on_error=1
> > ./autogen.sh;  ./configure PERL_EXT_CFLAGS="-O2 -ggdb3
> > -fsanitize=undefined" CFLAGS="-O2 -ggdb3 -fsanitize=undefined"   ; make
> > -j$(nproc) ; make check -j$(nproc)
> > 
> > parsetexi/tree.c:77:11: runtime error: member access within misaligned 
> > address 0x0100010e9744 for type 'struct ELEMENT', which requires 8 byte 
> > alignment
> > 0x0100010e9744: note: pointer points here
> >   00 00 01 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 
> >  00 00 00 00 00 00 00 00
> >   ^
> > #0 0xfff8000102fc12ec in new_element parsetexi/tree.c:77
> 
> It is probably a problem with the obstack allocation method that was being
> used for ELEMENT (an attempted optimisation, although it's questionable how
> successful it was).
> 
> According to the glibc manual,
> 
>   Each obstack has an “alignment boundary”; each object allocated in the
>   obstack automatically starts on an address that is a multiple of the
>   specified boundary.  By default, this boundary is aligned so that the
>   object can hold any type of data.
> 
> (Info node (libc)Obstacks Data Alignment.)
> 
> However, we also use the gnulib obstacks module.  Between glibc and gnulib,
> it was evidently not meeting this promise.
> 
> You can try to set the alignment explicitly:
> 
> diff tree.c.old tree.c -u
> --- tree.c.old  2023-11-04 16:15:13.632755680 +
> +++ tree.c  2023-11-04 16:16:36.211072521 +
> @@ -43,7 +43,10 @@
>if (obs_element_first)
>  obstack_free (&obs_element, obs_element_first);
>else
> -obstack_init (&obs_element);
> +{
> +  obstack_alignment_mask (&obs_element) = 7; /* 8-byte alignment */
> +  obstack_init (&obs_element);
> +}
>  
>obs_element_first = obstack_alloc (&obs_element, sizeof (int));
> 
> 
> Can you check if that works?

Yes, I can confirm that this patch fixes the crash for me.

Would be great if this fix could be included for the next release!

Thanks,
Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913



Re: set_labels_identifiers_target -fsanitize=undefined error

2023-11-12 Thread John Paul Adrian Glaubitz
Hi!

On Sat, 2023-11-04 at 12:03 +0100, John Paul Adrian Glaubitz wrote:
> Hi Gavin!
> 
> On Sat, 2023-11-04 at 11:00 +, Gavin Smith wrote:
> > The line in question is:
> > 
> >   memcpy (targets, list_of_labels, labels_number * sizeof(LABEL));
> > 
> > - again, the second argument of memcpy.
> > 
> > However, main/targets.c was only introduced after Texinfo 7.1 so
> > this is not the original problem.
> 
> I'll provide a backtrace as well as the commit that introduced the regression
> on SPARC within the next days. Need to set up two new SPARC servers next week
> first.

Here is the result of my bisect process:

(sid_sparc64-dchroot)glaubitz@stadler:~/texinfo$ git bisect bad 

   
83259a78038068caf3f3d8d669796ea003a63735 is the first bad commit
commit 83259a78038068caf3f3d8d669796ea003a63735
Author: Gavin Smith
Date:   Mon Apr 10 18:02:00 2023 +0100

Run 'gnulib-tool --add-import obstack' under tp/Texinfo/XS.

 ChangeLog   |   4 +
 tp/Texinfo/XS/gnulib/lib/Makefile.am|  33 ++
 tp/Texinfo/XS/gnulib/lib/alignof.h  |  51 +++
 tp/Texinfo/XS/gnulib/lib/exitfail.c |  24 ++
 tp/Texinfo/XS/gnulib/lib/exitfail.h |  18 ++
 tp/Texinfo/XS/gnulib/lib/gettext.h  | 300 ++
 tp/Texinfo/XS/gnulib/lib/obstack.c  | 353 +
 tp/Texinfo/XS/gnulib/lib/obstack.h  | 546 
 tp/Texinfo/XS/gnulib/m4/gnulib-cache.m4 |   2 +
 tp/Texinfo/XS/gnulib/m4/gnulib-comp.m4  |  18 ++
 tp/Texinfo/XS/gnulib/m4/obstack.m4  |  33 ++
 tp/Texinfo/XS/gnulib/m4/vararrays.m4|  72 +
 tp/Texinfo/XS/parsetexi/tree.c  |   2 +-
 13 files changed, 1455 insertions(+), 1 deletion(-)
 create mode 100644 tp/Texinfo/XS/gnulib/lib/alignof.h
 create mode 100644 tp/Texinfo/XS/gnulib/lib/exitfail.c
 create mode 100644 tp/Texinfo/XS/gnulib/lib/exitfail.h
 create mode 100644 tp/Texinfo/XS/gnulib/lib/gettext.h
 create mode 100644 tp/Texinfo/XS/gnulib/lib/obstack.c
 create mode 100644 tp/Texinfo/XS/gnulib/lib/obstack.h
 create mode 100644 tp/Texinfo/XS/gnulib/m4/obstack.m4
 create mode 100644 tp/Texinfo/XS/gnulib/m4/vararrays.m4
(sid_sparc64-dchroot)glaubitz@stadler:~/texinfo$

I'll test the proposed patch now.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913



Re: c32width gives incorrect return values in C locale

2023-11-12 Thread Gavin Smith
On Sat, Nov 11, 2023 at 11:54:52PM +0100, Bruno Haible wrote:
> [CCing bug-libunistring]
> Gavin Smith wrote:
> > I did not understand why uc_width was said to be "locale dependent":
> > 
> >   "These functions are locale dependent."
> > 
> > - from 
> > .
> 
> That's because some Unicode characters have "ambiguous width" — width 1 in
> Western locales, width 2 is East Asian locales (for historic and font choice
> reasons).
> 
> > I also don't understand the purpose of the "encoding" argument -- can this
> > always be "UTF-8"?
> 
> Yes, it can be always "UTF-8"; then uc_width will always choose width 1 for
> these characters.
> 
> > I'm also unclear on the exact relationship between the types char32_t,
> > ucs4_t and uint32_t.  For example, uc_width takes a ucs4_t argument
> > but u8_mbtouc writes to a char32_t variable.  In the code I committed,
> > I used a cast to ucs4_t when calling uc_width.
> 
> These types are all identical. Therefore you don't even need to cast.
> 
>   - char32_t comes from  (ISO C 11 or newer).
>   - ucs4_t comes from GNU libunistring.
>   - uint32_t comes from .

Thanks for the advice.



PACKAGE_VERSION should not be renamed

2023-11-12 Thread Gavin Smith
On Sun, Nov 05, 2023 at 12:13:00PM +0100, Arsen Arsenović wrote:
> Oh, interesting.  On my system, I'm using texinfo built from master, and
> it completely lacks PACKAGE_VERSION!  (if I rebuild from releases/7.1,
> it's present).  It looks like it got removed in
> 6a1a78b941a832033b8939ab4cdce1646dd56904.

It's still stated as PACKAGE_VERSION in doc/texinfo.texi, even though it's
mentioned in NEWS now.  If the customization variable is changing name,
then the manual should be updated as well.

However, I doubt that it is justified to rename it.  We know it is an
option, so what is the point of putting _OPTION on the end to get
PACKAGE_VERSION_OPTION?

Especially since it is used as part of the cutomization API, it
seems that this should be a stable option.  Here is a link to a patch
to fix building ffmpeg docs:

http://ffmpeg.org/pipermail/ffmpeg-devel/2023-November/316605.html

Including version checks using PACKAGE_VERSION:

   # determine if texinfo is at least version 6.8
  -my $program_version_num = 
version->declare(get_conf('PACKAGE_VERSION'))->numify;
  +my $program_version_num = 
version->declare(ff_get_conf('PACKAGE_VERSION'))->numify;
   my $program_version_6_8 = $program_version_num >= 6.008000;
   
   # print the TOC where @contents is used
   if ($program_version_6_8) {
  -set_from_init_file('CONTENTS_OUTPUT_LOCATION', 'inline');
  +ff_set_from_init_file('CONTENTS_OUTPUT_LOCATION', 'inline');
   } else {
  -set_from_init_file('INLINE_CONTENTS', 1);
  +ff_set_from_init_file('INLINE_CONTENTS', 1);
   }

If the customization API is likely to break easily (and speaking generally,
it seems to me that any API like this is likely to, and the more it is
integrated into the "guts" of the program, the more likely it is to, although
I am not very familiar with the texi2any customization API), then at
least PACKAGE_VERSION gives packages a chance to adapt, so should be stable
if nothing else is.

In commit 6a1a78b941a832033b (2023-09-13) (*), it is said to avoid
clashing with macros defined in config.h.  This is the details of the
implementation and something we could work around, rather than changing
the user-visible interface.  It's currently used in the definition of
the OPTIONS type in tp/Texinfo/XS/main/options_types.h.

This could be dealt with in a variety of ways.  The most straightforward
would be to "#undef PACKAGE_VERSION" at the beginning of option_types.h.
That way, any source code file using the OPTIONS type and including
this file will not have that preprocessor definition active throughout
the file.

We do not use the PACKAGE_VERSION from the build system for the XS code
and it simply has the value "0".  (In the top-level build system, it is
the version of the package, currently "7.1dev".)

It's also not necessary that we use the symbol PACKAGE_VERSION in the XS/C
code to refer to this option, so could be renamed to o_PACKAGE_VERSION
as a special case if the #undef option does not work or is not appropriate
for some reason.

(*) My preference is to always state at least the date as well as the
git commit ID -- that way, if we ever move to a version control system
other than git, then somebody reading the mailing list archives will
find it easier to find which commit it is.  It's useful information to
have, in any case.