Re: Parameterize the memory allocator for the container library

2024-02-15 Thread Marc Nieper-Wißkirchen
Hi,

If I may chime into the discussion, let me remark that local
parameterization of the memory allocator is often useful (while the linker
approach means a global replacement).  For example, part of the
program/library may work with an arena allocator (see, e.g. [1] for a
general discussion) and containers used in this part of the program should
use the arena allocator instead of the global allocator.

The C++ container types have the ability to use custom allocators (see [2]
for some simple benchmarks).

Best,

Marc

--

[1] https://www.rfleury.com/p/untangling-lifetimes-the-arena-allocator
[2]
https://indiegamedev.net/2022/03/27/custom-c20-memory-allocators-for-stl-containers/

Am Sa., 10. Feb. 2024 um 18:28 Uhr schrieb Bruno Haible :

> Hi,
>
> > Is it possible to parameterize the memory allocator for the ordered map
> > library?
>
> 1) What's the use-case?
>
> 2) What's wrong with the approach outlined in
>https://stackoverflow.com/questions/17803456/ ?
>
> Bruno
>
>
>
>
>


Re: new module 'jit/cache'

2023-12-20 Thread Marc Nieper-Wißkirchen
Do you have detailed test results/logs?

Thanks,

Marc

Am Di., 19. Dez. 2023 um 23:13 Uhr schrieb Bruno Haible :

> My (preliminary) test results:
>
> The test passes on x86_64, m68k, alpha, s390, s390x, riscv64, loongarch64.
>
> It fails on some platforms with i386, mips, sparc, arm, arm64
>
> It fails on all platforms with powerpc*, ia64.
>
> Bruno
>
>
>
>


Re: pointer addition and arrays

2023-12-17 Thread Marc Nieper-Wißkirchen
Am So., 17. Dez. 2023 um 09:50 Uhr schrieb Paul Eggert :

> On 2023-12-16 04:42, Marc Nieper-Wißkirchen wrote:
> > the problem is if further down in the compiler pipeline, the
> > P <= P expression produces some intermediate code that becomes equal to
> > code the optimizer should rightfully optimize away
>
> I'm skeptical that such optimizations would be worth the trouble in
> practice, because any optimization where P==P does not imply P<=P would
> run contrary to common sense and programmer intuition.


The C standard is full of such examples "contrary to common sense", and in
related areas (e.g. comparison/overflow of (un)signed integers) optimizers
are already using these.


> In practice,
> compiler writers who'd employ such optimizations would likely cause more
> trouble than they'd cure, and I wouldn't want to encourage them.
>
> If I'm wrong, and in the future these optimizations become valuable and
> break programs like Coreutils, we can revisit the issue then. In the
> meantime let's leave these sleeping dogs lie. I have little doubt that
> Bruno has missed many of the issues and it'd be a waste of our time to
> try to find the rest of them.
>

We don't have to hunt all these issues, but we should correct those we find.

The C standard has always favoured the freedom of compiler writers for
efficiency, so it is definitely not wrong by its own measures.  It may not
be the best decision for every application area, but there are other
programming languages besides C.


Re: pointer addition and arrays

2023-12-16 Thread Marc Nieper-Wißkirchen
Am Fr., 15. Dez. 2023 um 02:00 Uhr schrieb Paul Eggert :

> On 12/3/23 05:15, Andreas F. Borchert wrote:
> > You will have
> > then to possibly live with annoying warnings and odd behaviours.
>
> The C standard says that when P is a null pointer, P==P must be true
> whereas P<=P has undefined behavior. However, in practice any
> implementation where P==P succeeds but P<=P fails is a pedantic
> implementation that is merely enforcing what are arguably bugs in the
> standard. Let's not waste time worrying about implementations like that.
>

I would agree if implementations did this because of pedantry, just
because.  But the problem is if further down in the compiler pipeline, the
P <= P expression produces some intermediate code that becomes equal to
code the optimizer should rightfully optimize away (or replace by the
equivalent of __builtin_unreachable).

Gnulib should lead by example and follow the standard (on systems that
implement the standard.  I like Example 2 on this page [1] very much.  It
shows that well-justified optimizations can easily lead to consequences of
UB that make the compiler look evil (or pedantic).  We cannot circumvent
the principle of explosion ([2]).

Thanks,

Marc

--

[1] https://mohitmv.github.io/blog/Shocking-Undefined-Behaviour-In-Action/
[2] https://en.wikipedia.org/wiki/Principle_of_explosion


Re: undefined-behavior obstack.c:139

2023-12-03 Thread Marc Nieper-Wißkirchen
Am So., 3. Dez. 2023 um 10:05 Uhr schrieb Paul Eggert :

> On 2023-12-02 01:04, Bruno Haible wrote:
> > On the contrary, I will try to find and eliminate such alarms.
>
> Please don't complicate and/or slow down shared Gnulib code just to
> pacify this particular false alarm from Clang. The obstack fix was fine,
> because it made obstack clearer and no slower. But we shouldn't have to
> insert unnecessary "is the size zero?" checks merely to pacify the false
> alarms.
>

I would suggest writing macros that encapsulate code like "NULL + 0" and
replace the code with macro invocations.  On legacy systems, the macro
would expand into the legacy code; on modern systems where checks like "is
this zero size?" should be eliminated by the compiler, the macro would
expand into the correct ISO C code.  This can also make the code more
transparent as long as the macro has a self-describing name.


> The right place to fix this problem is in Clang, not in Gnulib.
>

I don't think so.  As long as Clang implements ISO C, it is not Clang's
problem but a problem of code that produces UB (in the technical sense).

Just my two cents, of course.


Re: undefined-behavior obstack.c:139

2023-12-01 Thread Marc Nieper-Wißkirchen
Am Fr., 1. Dez. 2023 um 21:01 Uhr schrieb Paul Eggert :

> On 2023-12-01 10:40, Bruno Haible wrote:
>
> > Indeed, this sentence appears to forbid ((char *) NULL) + something.
>
> Yes. However, Gnulib code can still use ((char *) NULL) + something)
> because the Gnulib portability guidelines allow it.
>
> The issue with clang false positives is covered here:
>
>
> https://www.gnu.org/software/gnulib/manual/html_node/Unsupported-Platforms.html
>
> which lists "clang -fsanitize=undefined" as an unsupported platform
> unless you also specify "-fno-sanitize=pointer-overflow".
>
> The obstack patch you installed is fine, as it's clearer and just as
> fast as the original. However, we needn't go through Gnulib and change
> other code merely because it runs afoul of this false alarm from clang.
>

It may not be a false alarm in future versions of the compiler.  At any
time, the compiler may decide to replace ((char *) NULL + 0) by
__builtin_unreachable.  It can make sense to find an ISO C-compliant
replacement of this idiom at some point.


Re: undefined-behavior obstack.c:139

2023-12-01 Thread Marc Nieper-Wißkirchen
Am Fr., 1. Dez. 2023 um 19:42 Uhr schrieb Bruno Haible :

> Marc Nieper-Wißkirchen wrote:
> > By 6.5.6 "Additive Operators":
> >
> > (2) "... one operator shall be a pointer to a complete object type..."
> >
> > NULL, which is a null pointer constant, is not necessarily a pointer to a
> > complete object type.
>
> In my test program, I used a variable of type 'char *'. Which is a pointer
> to a complete object type.
>

Yes, I was referring to the expression "NULL + 0" you mentioned.

[...]


Re: undefined-behavior obstack.c:139

2023-12-01 Thread Marc Nieper-Wißkirchen
By 6.5.6 "Additive Operators":

(2) "... one operator shall be a pointer to a complete object type..."

NULL, which is a null pointer constant, is not necessarily a pointer to a
complete object type.

(9) "... If the pointer operand and the result do not point to elements of
the same array object or one past the last element of the array object, the
behavior is undefined..."

NULL does not have to point to an element of an array object (or any
object; see (8)).


Am Fr., 1. Dez. 2023 um 17:02 Uhr schrieb Bruno Haible :

> Jeffrey Walton wrote:
> > I think that's a valid finding. NULL is not a valid address. You can't
> > add anything to it.
>
> Can you back this opinion with citations from ISO C 23 [1] ?
>
> Bruno
>
> [1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3054.pdf
>
>
>
>
>


Re: new module 'jit/cache'

2023-11-28 Thread Marc Nieper-Wißkirchen
I forgot to actually make use of the patched part in the test case; here is
the full patch.  Please excuse the mess.



Am Mo., 27. Nov. 2023 um 18:36 Uhr schrieb Marc Nieper-Wißkirchen <
marc.nieper+...@gmail.com>:

> Am Mo., 27. Nov. 2023 um 17:27 Uhr schrieb Marc Nieper-Wißkirchen <
> marc.nieper+...@gmail.com>:
>
> [...]
>
>
>
>> For the moment, I expect
>>>   - a SIGSEGV on powerpc*-aix, powerpc64-linux, ia64, hppa, and hppa64
>>> due to the structure of function pointers, see
>>>
>>> https://git.savannah.gnu.org/gitweb/?p=libffcall.git;a=blob;f=porting-tools/abis/function-pointer.txt
>>
>>
>> Thank you; I will try to incorporate this info into my test.
>>
>
> I have attached a patch file to review my ideas.  The code hasn't been
> checked on the platforms actually touched by the patch.  It has not been
> committed yet (and lacks a ChangeLog entry).
>
> Marc
>
>


jit-cache.patch
Description: Binary data


Re: new module 'jit/cache'

2023-11-27 Thread Marc Nieper-Wißkirchen
Am Mo., 27. Nov. 2023 um 17:27 Uhr schrieb Marc Nieper-Wißkirchen <
marc.nieper+...@gmail.com>:

[...]



> For the moment, I expect
>>   - a SIGSEGV on powerpc*-aix, powerpc64-linux, ia64, hppa, and hppa64
>> due to the structure of function pointers, see
>>
>> https://git.savannah.gnu.org/gitweb/?p=libffcall.git;a=blob;f=porting-tools/abis/function-pointer.txt
>
>
> Thank you; I will try to incorporate this info into my test.
>

I have attached a patch file to review my ideas.  The code hasn't been
checked on the platforms actually touched by the patch.  It has not been
committed yet (and lacks a ChangeLog entry).

Marc


jit-cache.patch
Description: Binary data


[PATCH] stack: Fix documentation in header file.

2023-11-27 Thread Marc Nieper-Wißkirchen
* lib/stack.h: Correct documentation on `stack_current_base'.
---
 ChangeLog   | 5 +
 lib/stack.h | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/ChangeLog b/ChangeLog
index db273f6907..c7d6bd4dc5 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,8 @@
+2023-11-27  Marc Nieper-Wißkirchen  
+
+   stack: Fix documentation in header file.
+   * lib/stack.h: Correct documentation on `stack_current_base'.
+
 2023-11-27  Marc Nieper-Wißkirchen  
 
jit/cache-tests: Fix include guard.
diff --git a/lib/stack.h b/lib/stack.h
index dbd6812bc8..91b1944455 100644
--- a/lib/stack.h
+++ b/lib/stack.h
@@ -31,7 +31,7 @@
  Initialization: stack_init ();
  De-initialization:  stack_destroy ();
  Predicate:  bool res = stack_empty ();
- Introspection:  ELEMENT *base = stack_base ();
+ Introspection:  ELEMENT *base = stack_current_base ();
  Pushing:stack_push (, element);
  Popping:ELEMENT element = stack_pop ();
  Discarding: stack_discard ();
-- 
2.34.1




[PATCH] jit/cache-tests: Fix include guard.

2023-11-27 Thread Marc Nieper-Wißkirchen
* tests/jit/test-cache.c (main): Extend range of include guard.
---
 ChangeLog  | 5 +
 tests/jit/test-cache.c | 6 +++---
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 8cc61bb26b..db273f6907 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,8 @@
+2023-11-27  Marc Nieper-Wißkirchen  
+
+   jit/cache-tests: Fix include guard.
+   * tests/jit/test-cache.c (main): Extend range of include guard.
+
 2023-11-25  Marc Nieper-Wißkirchen  
 
jit/cache-tests: New module.
diff --git a/tests/jit/test-cache.c b/tests/jit/test-cache.c
index 47c81a8b1e..3158d02117 100644
--- a/tests/jit/test-cache.c
+++ b/tests/jit/test-cache.c
@@ -49,10 +49,9 @@ return2 (void)
 int
 main ()
 {
-#if !(HAVE_SYS_MMAN_H && HAVE_SYS_MMAN_H)
+#if !(HAVE_SYS_MMAN_H && HAVE_MPROTECT)
   return 77;
-#endif
-
+#else
   int const pagesize = getpagesize ();
   unsigned char *start = pagealign_xalloc (pagesize);
   unsigned char *end = start + pagesize;
@@ -74,4 +73,5 @@ main ()
   ASSERT (f () == 2);
 
   return 0;
+#endif  /* !(HAVE_SYS_MMAN_H && HAVE_PROTECT) */
 }
-- 
2.34.1




Re: new module 'jit/cache'

2023-11-27 Thread Marc Nieper-Wißkirchen
Am Sa., 25. Nov. 2023 um 21:19 Uhr schrieb Bruno Haible :

> Hello Marc,
>
> > Here is my attempt at a simple unit test for your new cache module.
> >
> > I have tested it on an x86 box (where the instruction cache is always in
> > sync) and on an Armv8 Raspberry, both running GNU/Linux.  Commenting out
> > the call to `clear_cache' shows that `clear_cache' is not a NOP on Arm
> and
> > that the test is actually doing a non-trivial test.
>
> Very nice! It is simpler than what I had expected (no per-CPU #ifdefs so
> far).
>

I am sure they will come as soon as it is tested on more platforms than
just GNU/Linux with x86_64 and aarch64.

The alternative would be to pull in the whole GNU lightning...

I've committed your patch (as two separate patches).
>
> I will test it on various platforms.
>
> For the moment, I expect
>   - a SIGSEGV on powerpc*-aix, powerpc64-linux, ia64, hppa, and hppa64
> due to the structure of function pointers, see
>
> https://git.savannah.gnu.org/gitweb/?p=libffcall.git;a=blob;f=porting-tools/abis/function-pointer.txt


Thank you; I will try to incorporate this info into my test.

  - a link error on native Windows, due to the use of mprotect() outside
> #ifdefs.
>

Of course! I am going to fix this.


Re: new module 'jit/cache'

2023-11-25 Thread Marc Nieper-Wißkirchen
Hello Bruno,

Here is my attempt at a simple unit test for your new cache module.

I have tested it on an x86 box (where the instruction cache is always in
sync) and on an Armv8 Raspberry, both running GNU/Linux.  Commenting out
the call to `clear_cache' shows that `clear_cache' is not a NOP on Arm and
that the test is actually doing a non-trivial test.

Best,

Marc

Am Mo., 13. Nov. 2023 um 14:33 Uhr schrieb Bruno Haible :

> >   * m4/valgrind-helper.m4: New file.
>
> Oops, that file was incomplete. Fixed like this:
>
>
> 2023-11-13  Bruno Haible  
>
> jit/cache: Fix configure test.
> * m4/valgrind-helper.m4 (gl_VALGRIND_HELPER): Check already at
> configure
> time whether  exists. Fix AC_DEFINE_UNQUOTED
> invocation.
>
> diff --git a/m4/valgrind-helper.m4 b/m4/valgrind-helper.m4
> index b3d70a7ad9..99c31030b9 100644
> --- a/m4/valgrind-helper.m4
> +++ b/m4/valgrind-helper.m4
> @@ -1,4 +1,4 @@
> -# valgrind-helper.m4 serial 1
> +# valgrind-helper.m4 serial 2
>  dnl Copyright (C) 2023 Free Software Foundation, Inc.
>  dnl This file is free software; the Free Software Foundation
>  dnl gives unlimited permission to copy and/or distribute it,
> @@ -18,5 +18,13 @@ AC_DEFUN_ONCE([gl_VALGRIND_HELPER]
> support_valgrind=0
>   fi
>  ])
> -  AC_DEFINE_UNQUOTED([ENABLE_VALGRIND_SUPPORT], [$support_valgrind])
> +  if test $support_valgrind = 1; then
> +AC_CHECK_HEADERS([valgrind/valgrind.h])
> +if test $ac_cv_header_valgrind_valgrind_h != yes; then
> +  AC_MSG_ERROR([cannot enable valgrind support: 
> not found])
> +fi
> +  fi
> +  AC_DEFINE_UNQUOTED([ENABLE_VALGRIND_SUPPORT], [$support_valgrind],
> +[Define to 1 to include support for running the binaries under
> valgrind,
> + or to 0 otherwise.])
>  ])
>
>
>
>
From 335cc8f54e4510d4a9025a401cc67a7d32ce1f56 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marc=20Nieper-Wi=C3=9Fkirchen?= 
Date: Sat, 25 Nov 2023 18:14:55 +0100
Subject: [PATCH] jit/cache-tests: New module.

* m4/valgrind-helper.m4: Unconditionally set support_valgrind to
fix configure error.
* modules/jit/cache-tests: New file.  Mark the test as unportable
for now.
* tests/jit/test-cache.c: New file.
---
 ChangeLog   |  9 +
 m4/valgrind-helper.m4   |  3 +-
 modules/jit/cache-tests | 20 +++
 tests/jit/test-cache.c  | 77 +
 4 files changed, 108 insertions(+), 1 deletion(-)
 create mode 100644 modules/jit/cache-tests
 create mode 100644 tests/jit/test-cache.c

diff --git a/ChangeLog b/ChangeLog
index 1834acd9da..89419032b3 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,12 @@
+2023-11-25  Marc Nieper-Wißkirchen  
+
+	jit/cache-tests: New module.
+	* m4/valgrind-helper.m4: Unconditionally set support_valgrind to
+	fix configure error.
+	* modules/jit/cache-tests: New file.  Mark the test as unportable
+	for now.
+	* tests/jit/test-cache.c: New file.
+
 2023-11-24  Bruno Haible  
 
 	floorf, ceilf tests: Strengthen against compiler optimizations.
diff --git a/m4/valgrind-helper.m4 b/m4/valgrind-helper.m4
index 99c31030b9..90637e6745 100644
--- a/m4/valgrind-helper.m4
+++ b/m4/valgrind-helper.m4
@@ -17,7 +17,8 @@ AC_DEFUN_ONCE([gl_VALGRIND_HELPER],
  else
support_valgrind=0
  fi
-])
+],
+[support_valgrind=0])
   if test $support_valgrind = 1; then
 AC_CHECK_HEADERS([valgrind/valgrind.h])
 if test $ac_cv_header_valgrind_valgrind_h != yes; then
diff --git a/modules/jit/cache-tests b/modules/jit/cache-tests
new file mode 100644
index 00..eff828890d
--- /dev/null
+++ b/modules/jit/cache-tests
@@ -0,0 +1,20 @@
+Files:
+tests/jit/test-cache.c
+tests/macros.h
+
+Status:
+unportable-test
+
+Depends-on:
+getpagesize
+pagealign_alloc
+stdint
+
+configure.ac:
+AC_CHECK_HEADERS_ONCE([sys/mman.h])
+AC_CHECK_FUNCS_ONCE([mprotect])
+
+Makefile.am:
+TESTS += test-cache
+check_PROGRAMS += test-cache
+test_cache_SOURCES = jit/test-cache.c
diff --git a/tests/jit/test-cache.c b/tests/jit/test-cache.c
new file mode 100644
index 00..47c81a8b1e
--- /dev/null
+++ b/tests/jit/test-cache.c
@@ -0,0 +1,77 @@
+/* Test clear_cache.
+
+   Copyright 2023 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation, either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <https://www.gnu.or

Re: test environments

2023-11-14 Thread Marc Nieper-Wißkirchen
Hi Bruno,

Thanks for making your module available.  I have acquired a Raspberry Pi
with an ARMv8, so as soon as my time permits, I should be able to do some
testing.

Marc

Am Mo., 13. Nov. 2023 um 00:10 Uhr schrieb Bruno Haible :

> Marc Nieper-Wißkirchen wrote:
> > I only have access to x86_64 systems, for which `clear_cache ()' can be a
> > no-op.  Emulators aren't of much help because they usually do not
> implement
> > the need for cache invalidation faithfully.  We would need the help of
> > someone with, say, an Aarch64 system.
>
> I've added info about how to get access to test environments here:
>
> https://git.savannah.gnu.org/gitweb/?p=gnulib/maint-tools.git;a=blob;f=platforms/test-environments.txt
>
> https://git.savannah.gnu.org/gitweb/?p=gnulib/maint-tools.git;a=tree;f=platforms/environments
>
> In particular, the compile farm has 4 machines with aarch64 CPU.
> Maybe you also have an Android phone with aarch64 CPU on it (and
> can install Termux in 64-bit mode on it)?
>
> It is correct that emulators sometimes don't implement the need for
> cache invalidation correctly; I observed this for mips too. But QEMU
> is quite good nowadays; it's better than nothing.
>
> Bruno
>
>
>
>


Re: Provide a module for `clear_cache'

2023-11-06 Thread Marc Nieper-Wißkirchen
Hello Bruno,

Am Mo., 6. Nov. 2023 um 19:33 Uhr schrieb Bruno Haible :

[...]

I can provide such a thing easily. In fact, I have it already sitting
> around on my disk since 2021 :-)
>

This is great!


> > GNU
> > lightning currently calls libgcc's `__clear_cache' directly ([1]), but
> this
> > will fail on systems that do not have libgcc.
>
> And also, libgcc's __clear_cache does not always work. My comments say:
>
>   /* GCC >= 4.3 has a GCC built-in.
>  
>  But it's sometimes not correctly implemented.
>
> So, I can provide the module's code. Can you provide a unit test for it?
>

I only have access to x86_64 systems, for which `clear_cache ()' can be a
no-op.  Emulators aren't of much help because they usually do not implement
the need for cache invalidation faithfully.  We would need the help of
someone with, say, an Aarch64 system.

If we want to support a wider variety of systems, the code of Chez Scheme
may be helpful.  The `S_doflush' function ([1]) does the cache clearing.

--

[1]
https://github.com/search?q=repo%3Acisco%2FChezScheme%20S_doflush=code


Provide a module for `clear_cache'

2023-11-06 Thread Marc Nieper-Wißkirchen
The idea is to offer the builtin procedure

__builtin__clear_cache (void *begin, void *end),

which is provided at least by gcc and clang, in a portable manner.  From
the documentation of gcc's version:

"This function is used to flush the processor’s instruction cache for the
region of memory between begin inclusive and end exclusive. Some targets
require that the instruction cache be flushed, after modifying memory
containing code, in order to obtain deterministic behavior."

Clients for such a procedure are, in particular, JIT compilers.  GNU
lightning currently calls libgcc's `__clear_cache' directly ([1]), but this
will fail on systems that do not have libgcc.

Having a Gnulib module that tests for the builtin analogous to the module
`builtin-expect' ([2]) would be great.
Thanks,

Marc

--

[1] --
https://git.savannah.gnu.org/cgit/lightning.git/tree/lib/jit_aarch64.c#n73
[2] --
https://www.gnu.org/software/gnulib/MODULES.html#module=builtin-expect


Re: How to fix code blocks?

2023-01-02 Thread Marc Nieper-Wißkirchen
Happy new year, Bruno!

Yes, sorry, wrong list! :)

Marc

Am Mo., 2. Jan. 2023 um 15:42 Uhr schrieb Bruno Haible :
>
> Hi Marc,
>
> > For purposes of dynamically calling code, I need the following layout
> > of the assembled code:
> >
> > 0x00: 16-byte header
> > 0x10: entry point #1
> > 0x20: entry point #2
>
> Happy new year. But this topic does not fit bug-gnulib. Maybe you sent it
> to the wrong list?
>
> Bruno
>
>
>



How to fix code blocks?

2023-01-02 Thread Marc Nieper-Wißkirchen
For purposes of dynamically calling code, I need the following layout
of the assembled code:

0x00: 16-byte header
0x10: entry point #1
0x20: entry point #2

I want to use the new align/skip instructions for this.  My code
generator currently looks like this:

align (16)
header:
skip (16)
entry1:
jump label1
align (16)
entry2:
   ...
label1:
   jump entry2 ; or some other code

However, with setup, GNU lightning often "optimizes" the block order
so that entry1 is no longer at the correct offset.  How can I reliably
prohibit this?  Putting skip (1) after the second align (16) seems to
help in my experiments.

Thanks,

Marc



[PATCH] hamt: fix technically undefined behavior

2022-08-12 Thread Marc Nieper-Wißkirchen
Bug reported by Bruno Haible in
<https://lists.gnu.org/r/bug-gnulib/2022-04/msg00023.html>.
* lib/hamt.c (entry_insert): Remove technically undefined
behavior when shifting an integer of N bits by N or more bits.
---
 ChangeLog  | 8 
 lib/hamt.c | 5 +
 2 files changed, 13 insertions(+)

diff --git a/ChangeLog b/ChangeLog
index ccf4acd03..0d04cb70e 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,11 @@
+2022-08-12  Marc Nieper-Wißkirchen  
+
+   hamt: fix technically undefined behavior
+   Bug reported by Bruno Haible in
+   <https://lists.gnu.org/r/bug-gnulib/2022-04/msg00023.html>.
+   * lib/hamt.c (entry_insert): Remove technically undefined
+   behavior when shifting an integer of N bits by N or more bits.
+
 2022-08-10  Paul Eggert  
 
stdckdint: fix dependency
diff --git a/lib/hamt.c b/lib/hamt.c
index 2b07cf23b..be9712561 100644
--- a/lib/hamt.c
+++ b/lib/hamt.c
@@ -680,6 +680,11 @@ entry_insert (const struct function_table *functions, 
Hamt_entry *entry,
   Hamt_entry *new_entry = copy_entry (*elt_ptr);
   if (replace)
 *elt_ptr = NULL;
+  /* We have to take this shortcut as shifting an integer of N
+bits by N or more bits triggers undefined behavior.
+See: 
https://lists.gnu.org/archive/html/bug-gnulib/2022-04/msg00023.html.  */
+  if (depth >= _GL_HAMT_MAX_DEPTH)
+   return (Hamt_entry *) create_populated_bucket (new_entry, copy_entry 
(entry));
   return create_populated_subtrie (new_entry, copy_entry (entry), hash,
(hash_element (functions, entry)
 >> (5 * depth)), depth);
-- 
2.34.1




Re: undefined behaviour in hamt.c

2022-04-14 Thread Marc Nieper-Wißkirchen
Hi Bruno,

thanks for exhibiting this. The complaint by the UB sanitizer is
correct. Although the code won't use the 64-bit value shifted by 65,
it is technically UB. When I wrote the code, I erroneously assumed
that it would not be UB, but that the result of the shift would be
implementation-dependent.

This can easily be fixed, see the attached diff. Could you patch the
upstream version and repeat the clang test? (I don't have clang
installed and currently do not have the time to do a formal commit
myself.)

Marc

diff --git a/lib/hamt.c b/lib/hamt.c
index 2b07cf23b..4018ad34e 100644
--- a/lib/hamt.c
+++ b/lib/hamt.c
@@ -680,6 +680,11 @@ entry_insert (const struct function_table
*functions, Hamt_entry *entry,
   Hamt_entry *new_entry = copy_entry (*elt_ptr);
   if (replace)
 *elt_ptr = NULL;
+  /* We have to take this shortcut as shifting an integer of N
+bits by N or more bits triggers undefined behavior.
+See: 
https://lists.gnu.org/archive/html/bug-gnulib/2022-04/msg00023.html.
*/
+  if (depth >= _GL_HAMT_MAX_DEPTH)
+   return (Hamt_entry *) create_populated_bucket (new_entry,
copy_entry (entry));
   return create_populated_subtrie (new_entry, copy_entry (entry), hash,
(hash_element (functions, entry)
 >> (5 * depth)), depth);

Am Do., 14. Apr. 2022 um 01:43 Uhr schrieb Bruno Haible :
>
> Hi Marc,
>
> When running the gnulib tests with clang's UndefinedBehaviorSanitizer, I got
> this output in test-hamt.log:
>
> ../../gllib/hamt.c:685:41: runtime error: shift exponent 65 is too large for 
> 64-bit type 'size_t' (aka 'unsigned long')
> SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior 
> ../../gllib/hamt.c:685:41 in
> PASS test-hamt (exit status: 0)
>
> How to reproduce:
> 1. Install clang 13.
> 2. Set CC, CXX, CFLAGS, CXXFLAGS:
>CC="clang 
> -fsanitize=undefined,signed-integer-overflow,shift,integer-divide-by-zero 
> -fno-sanitize=pointer-overflow"; \
>CXX="clang++ -L/usr/lib/gcc/x86_64-linux-gnu/9 -I/usr/include/c++/9 
> -I/usr/include/x86_64-linux-gnu/c++/9 
> -fsanitize=undefined,signed-integer-overflow,shift,integer-divide-by-zero 
> -fno-sanitize=pointer-overflow"; \
>CFLAGS="-O1 -fno-omit-frame-pointer -ggdb"; \
>CXXFLAGS="-O1 -fno-omit-frame-pointer -ggdb"; \
>export CC CXX CFLAGS CXXFLAGS
>(The set of -I options in CXX depend on your local g++ installation.)
> 3. Prepare a gnulib testdir.
> 4. Configure, make, make check
> 5. Search the *.log files for 'UndefinedBehaviorSanitizer'.
>
> Bruno
>
>
>



Re: Module idx

2022-04-14 Thread Marc Nieper-Wißkirchen
Bruno, Paul,

thank you for your responses. So for a universal solution, one would
have to teach first the i18n toolchain how to interpret general PRIxxx
macros.

Am Mi., 13. Apr. 2022 um 09:05 Uhr schrieb Paul Eggert :
>
> On 4/12/22 02:12, Marc Nieper-Wißkirchen wrote:
> > I am wondering how to print (using printf) values of type idx_t
> > reliably without assuming that idx_t == ptrdiff_t and without
> > conversion to uintptr_t.
>
> I just use %td, as that works better with i18n.
>
> If we ever change idx_t to some other type (not likely) I plan to change
> the %td instances to something else then; that's easier than worrying
> about this now.



Module idx

2022-04-12 Thread Marc Nieper-Wißkirchen
I am wondering how to print (using printf) values of type idx_t
reliably without assuming that idx_t == ptrdiff_t and without
conversion to uintptr_t.

Would it make sense to add PRIxxx and SCNxxx macros (as those found in
inttypes.h) to idx.h?

Thanks,

Marc



Re: and C++

2022-01-08 Thread Marc Nieper-Wißkirchen
Am Sa., 8. Jan. 2022 um 11:51 Uhr schrieb Bruno Haible :
>
> Marc Nieper-Wißkirchen wrote:
> > I submitted a patch. An email will follow.
>
> Ah, you already pushed it. Good!

My workflow is not optimized yet. Pushing and sending out the email
with the patch has to happen in two steps.



[PATCH 1/1] c-stack: Adapt header file for use in C++ applications.

2022-01-08 Thread Marc Nieper-Wißkirchen
* lib/c-stack.h: Add extern "C" block.
---
 ChangeLog | 5 +
 lib/c-stack.h | 8 
 2 files changed, 13 insertions(+)

diff --git a/ChangeLog b/ChangeLog
index ad525f531..64d70fd41 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,8 @@
+2022-01-08  Marc Nieper-Wißkirchen  
+
+   c-stack: Adapt header file for use in C++ applications.
+   * lib/c-stack.h: Add extern "C" block.
+
 2022-01-08  Marc Nieper-Wißkirchen  
 
version-etc: Adapt header file for use in C++ applications.
diff --git a/lib/c-stack.h b/lib/c-stack.h
index 883d5f3ed..431450a05 100644
--- a/lib/c-stack.h
+++ b/lib/c-stack.h
@@ -15,6 +15,10 @@
You should have received a copy of the GNU General Public License
along with this program.  If not, see <https://www.gnu.org/licenses/>.  */
 
+#ifdef __cplusplus
+extern "C"
+{
+#endif
 
 /* Set up ACTION so that it is invoked on C stack overflow and on other,
stack-unrelated, segmentation violation.
@@ -47,3 +51,7 @@
signal or exercise other system dependent exception handling APIs.  */
 
 extern int c_stack_action (_GL_ASYNC_SAFE void (* /*action*/) (int));
+
+# ifdef __cplusplus
+}
+# endif
-- 
2.32.0




[PATCH 1/1] version-etc: Adapt header file for use in C++ applications.

2022-01-08 Thread Marc Nieper-Wißkirchen
* lib/version-etc.h: Add extern "C" block.
---
 ChangeLog | 5 +
 lib/version-etc.h | 9 +
 2 files changed, 14 insertions(+)

diff --git a/ChangeLog b/ChangeLog
index 60ca624f1..ad525f531 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,8 @@
+2022-01-08  Marc Nieper-Wißkirchen  
+
+   version-etc: Adapt header file for use in C++ applications.
+   * lib/version-etc.h: Add extern "C" block.
+
 2022-01-07  Bruno Haible  
 
sigsegv: Improve support for Linux/LoongArch64.
diff --git a/lib/version-etc.h b/lib/version-etc.h
index 93fbdca42..c6b4eef4b 100644
--- a/lib/version-etc.h
+++ b/lib/version-etc.h
@@ -22,6 +22,11 @@
 # include 
 # include 
 
+# ifdef __cplusplus
+extern "C"
+{
+# endif
+
 extern const char version_etc_copyright[];
 
 /* The three functions below display the --version information in the
@@ -66,4 +71,8 @@ extern void version_etc (FILE *stream,
 /* Display the usual "Report bugs to" stanza.  */
 extern void emit_bug_reporting_address (void);
 
+# ifdef __cplusplus
+}
+# endif
+
 #endif /* VERSION_ETC_H */
-- 
2.32.0




Re: and C++

2022-01-08 Thread Marc Nieper-Wißkirchen
Am Fr., 7. Jan. 2022 um 21:54 Uhr schrieb Bruno Haible :
>
> Marc Nieper-Wißkirchen wrote:
> > I just noticed that  doesn't include extern "C" guards
> > so that inclusion in a C++ source file becomes easier (see 4.2 in the
> > Gnulib manual).
>
> Patches are welcome.

I submitted a patch. An email will follow.

>
> > I don't know what the general policy is
>
> For these extern "C" guards, we do it on demand, i.e. the first time
> it gets reported.

Thanks,

Marc



and C++

2022-01-07 Thread Marc Nieper-Wißkirchen
I just noticed that  doesn't include extern "C" guards
so that inclusion in a C++ source file becomes easier (see 4.2 in the
Gnulib manual).

I don't know what the general policy is, but including 
in a C++ makes sense to me and could be made easier.

Marc

PS I haven't systematically checked other header files.



Re: Bytewise u??_conv_from_encoding

2022-01-06 Thread Marc Nieper-Wißkirchen
Am Mi., 5. Jan. 2022 um 21:59 Uhr schrieb Bruno Haible :
>
> Hello Marc,
>

[...]

> > If I understand your classification correctly, I meant something more
> > like (E) than (D), I think. As an interface, I would propose would be
> > something along the following lines:
> >
> > decoder_t d = decoder_create (iconveh_t *cd);
> > switch (decoder_push (d, byte))
> >   {
> >   case DECODER_BYTE_READ:
> > char *res = decoder_result (d);
> > size_t len = decoder_length (d);
> > ...
>
> What does the programmer do here with res and len? This is where things
> get complex.

RES will point to an array of bytes of length LEN that holds a decoded
string (of multibyte characters). The consumer could then process the
result and call `decoder_push` again when the previous result has been
fully processed.

>
> >   case DECODER_EOF:
> > ...
> >   case DECODER_INCOMPLETE:
> > ...
> >   case DECODER_ERROR:
> > ...
> >   }
> > ...
> > decoder_destroy (d);
>
> What you describe here is (D), in my view.
>
> (E) would look like this:
>
>   extern decoder_t create_decoder_context (void);
>   extern void push_bytes_into_decoder (const char *p, size_t n, decoder_t);
>   extern void free_decoder_context (decoder_t);

Isn't this my solution from above? Only that in my interface,
`decoder_push` only takes a single byte (but that could be changed, of
course). How would one get the decoded result in your API?

> > > (B) means to use a different programming language. I can't recommend C++ 
> > > [1].
> >
> > The main problem I see with C++'s coroutines is that they are
> > stackless coroutines; their expressiveness is tiny compared to
> > languages with full coroutine support, to say nothing of programming
> > languages like Scheme with its first-class continuations.
>
> It doesn't surprise me. 'constexpr', another new addition to C++, similarly
> does only a fraction of what would be useful.

As a side note, constexpr may also become part of C23, but I haven't
looked into the details. Could you briefly say for someone who doesn't
know much about constexpr where the shortcomings of C++'s constexpr
are?

> > > (C) is possible, but complex. See e.g. gnulib's pipe-filter-ii.c or
> > > pipe-filter-gi.c. Generally, threads are overkill when all you need are
> > > coroutines.
> >
> > I agree. Unfortunately, Posix's response to dropping makecontext and
> > friends seems to be to use threads. It would be great if C had a
> > lightweight context-swapping mechanism.
>
> Maybe. I think setcontext() has a severe problem; see
> .

makecontext also has the problem that it only takes integer variables
as arguments. This wasn't a problem on platforms where the size of an
int is the size of a pointer.

Another problem is that the context switching operations are not
necessarily faster than the threading API because they also have to go
through kernel code as the signal mask is part of the context. This
makes them unusable for most coroutines when they are supposed to be
fast.

That's why I have been advocating a *lightweight* context-swapping
mechanism that just saves and restores the general registers
(including the stack pointer).

> > By the way, libunistring's u??_conv_from_encoding does not seem to be
> > adapted to consuming buffers. The problem is that one doesn't know in
> > advance where boundaries of multi-byte sequences are so
> > u??_conv_from_encoding will likely signal a decoding error.
>
> Yes, u??_conv_from_encoding is made for converting entire strings.
> If you want to restart conversion after some bytes that are part of
> a multibyte character, you need the low-level iconv().

Okay, thanks.

Marc



Re: Stack module and -Wsign-compare

2022-01-06 Thread Marc Nieper-Wißkirchen
Thank you, Paul. My local tests also show that it behaves better now.

Am Mi., 5. Jan. 2022 um 20:54 Uhr schrieb Paul Eggert :
>
> On 1/5/22 10:00, Bruno Haible wrote:
> > Another possible fix would be to change
> >size_t size;
> > to
> >idx_t size;
> > in the struct.
>
> Yes, that fits better into our strategy of preferring signed to unsigned
> types for sizes. Plus, it avoids a cast that's too powerful in C. I
> installed the attached.



Stack module and -Wsign-compare

2022-01-05 Thread Marc Nieper-Wißkirchen
Commit 3dc36216f168f4e752b648b19d85eab32a037827 by Paul Eggert
introduced a regression in the stack module.

If "stack.h" is included by client code that is supposed to be
compiled with "-Wsign-compare", the compiler will complain about the
comparison on line 121.

There wasn't a problem before commit
3dc36216f168f4e752b648b19d85eab32a037827 because in the previous
version both the "size" and the "allocated" field of a stack had the
same (unsigned) type.

As telling clients to compile their code without "-Wsign-compare"
isn't an option, I would like to fix it.

The simplest fix is to replace the comparison by:

stack->size == (size_t) stack->allocated

This shouldn't hide any overflows because stack->allocated is
non-negative and holds a number of allocated items and the size_t type
is by definition large enough to hold any number of allocated items.

Marc



Re: [striconveh] Error handling and Unicode replacement character

2022-01-05 Thread Marc Nieper-Wißkirchen
Dear Bruno,

thank you for responding so quickly and for this addition!

Marc

Am Sa., 1. Jan. 2022 um 19:55 Uhr schrieb Bruno Haible :
>
> Marc Nieper-Wißkirchen wrote on 2021-12-30:
> > The striconveh module and related modules offer an error handler
> > argument. The current possible values are:
> >
> > iconveh_error
> > iconveh_question_mark
> > iconveh_escape_sequence
> >
> > The second option replaces any unconvertible character with a question mark 
> > "?".
> >
> > I would like to request to add a fourth option, say,
> > iconveh_replacement_character, which is like iconveh_question_mark but
> > uses U+FFFD whenever the target codeset is a Unicode codeset.
>
> That's a good suggestion, as nowadays people are frequently converting
> to UTF-8 or GB18030. Implemented as follows.
>
>
> 2022-01-01  Bruno Haible  
>
> striconveh: Support an error handler that produces a Unicode U+FFFD.
> Suggested by Marc Nieper-Wißkirchen in
> <https://lists.gnu.org/archive/html/bug-gnulib/2021-12/msg00175.html>.
> * lib/iconveh.h (iconveh_replacement_character): New enum value.
> * lib/striconveh.c (mem_cd_iconveh_internal): When the handler is
> iconveh_replacement_character, try to produce U+FFFD when possible,
> instead of '?'.
> * tests/test-striconveh.c (main): Add GB18030 tests. Test also
> iconveh_replacement_character.
>
> diff --git a/lib/iconveh.h b/lib/iconveh.h
> index d321d34cb..058f68ca2 100644
> --- a/lib/iconveh.h
> +++ b/lib/iconveh.h
> @@ -29,7 +29,10 @@ enum iconv_ilseq_handler
>  {
>iconveh_error,/* return and set errno = EILSEQ */
>iconveh_question_mark,/* use one '?' per unconvertible character */
> -  iconveh_escape_sequence   /* use escape sequence \u or \U 
> */
> +  iconveh_escape_sequence,  /* use escape sequence \u or \U 
> */
> +  iconveh_replacement_character /* use one U+FFFD per unconvertible character
> +   if that fits in the target encoding,
> +   otherwise one '?' */
>  };
>
>
> diff --git a/lib/striconveh.c b/lib/striconveh.c
> index 4aa8a2f07..612c38c3e 100644
> --- a/lib/striconveh.c
> +++ b/lib/striconveh.c
> @@ -457,13 +457,18 @@ mem_cd_iconveh_internal (const char *src, size_t srclen,
>  if (cd2 == (iconv_t)(-1))
>{
>  /* TO_CODESET is UTF-8.  */
> -/* Error handling can produce up to 1 byte of output.  */
> -if (length + 1 + extra_alloc > allocated)
> +/* Error handling can produce up to 1 or 3 bytes of
> +   output.  */
> +size_t extra_need =
> +  (handler == iconveh_replacement_character ? 3 : 1);
> +if (length + extra_need + extra_alloc > allocated)
>{
>  char *memory;
>
>  allocated = 2 * allocated;
> -if (length + 1 + extra_alloc > allocated)
> +if (length + extra_need + extra_alloc > allocated)
> +  allocated = 2 * allocated;
> +if (length + extra_need + extra_alloc > allocated)
>abort ();
>  if (result == initial_result)
>memory = (char *) malloc (allocated);
> @@ -482,7 +487,7 @@ mem_cd_iconveh_internal (const char *src, size_t srclen,
>  grow = false;
>}
>  /* The input is invalid in FROM_CODESET.  Eat up one byte
> -   and emit a question mark.  */
> +   and emit a replacement character or a question mark.  
> */
>  if (!incremented)
>{
>  if (insize == 0)
> @@ -490,8 +495,19 @@ mem_cd_iconveh_internal (const char *src, size_t srclen,
>  inptr++;
>  insize--;
>}
> -result[length] = '?';
> -length++;
> +if (handler == iconveh_replacement_character)
> +  {
> +/* U+FFFD in UTF-8 encoding.  */
> +result[length+0] = '\357';
> +result[length+1] = '\277';
> +result[length+2] = '\275';
> +length += 3;
> +  }
> +  

Re: Bytewise u??_conv_from_encoding

2022-01-01 Thread Marc Nieper-Wißkirchen
Hi Bruno,

thanks for your insights, valuable as always.

Am Sa., 1. Jan. 2022 um 13:57 Uhr schrieb Bruno Haible :
>
> Hi Marc,
>
> > The demand to read a file (in local encoding) and to decode it
> > incrementally seems a typical one.
>
> There are four ways to satisfy this demand.
>
> (A) Using a pipe at the shell level:
>   iconv -t UTF-8 | my-program
>
> (B) Using a programming language that has a coroutines concept.
> This way, both the decoder and the consumer can be programmed in
> a straightforward manner.
>
> (C) In C, with multiple threads.
>
> (D) In C, with a decoder programmed in a straightforward manner
> and a consumer that is written as a callback with state.
>
> (E) In C, with a decoder written as a callback with state
> and a consumer programmed in a straightforward manner.
>
> > Thus, I am wondering whether it makes sense to offer a stateful
> > decoder that takes byte by byte and signals as soon as a decoded byte
> > sequence is ready.
>
> It seems that you are thinking of approach (D).

> I think (D) is the worst, because writing application code in a callback
> style with state is hard and error-prone. I would favour (E) instead,
> if (A) is not possible.

If I understand your classification correctly, I meant something more
like (E) than (D), I think. As an interface, I would propose would be
something along the following lines:

decoder_t d = decoder_create (iconveh_t *cd);
switch (decoder_push (d, byte))
  {
  case DECODER_BYTE_READ:
char *res = decoder_result (d);
size_t len = decoder_length (d);
...
  case DECODER_EOF:
...
  case DECODER_INCOMPLETE:
...
  case DECODER_ERROR:
...
  }
...
decoder_destroy (d);

> (B) means to use a different programming language. I can't recommend C++ [1].

The main problem I see with C++'s coroutines is that they are
stackless coroutines; their expressiveness is tiny compared to
languages with full coroutine support, to say nothing of programming
languages like Scheme with its first-class continuations.

> (C) is possible, but complex. See e.g. gnulib's pipe-filter-ii.c or
> pipe-filter-gi.c. Generally, threads are overkill when all you need are
> coroutines.

I agree. Unfortunately, Posix's response to dropping makecontext and
friends seems to be to use threads. It would be great if C had a
lightweight context-swapping mechanism.

> Now, when implementing (E), it will be useful to have some kind of "abstract
> input stream" data type. Such a thing does not exist in C, for historical
> reasons. But it can be done similarly to the "abstract output stream" data
> type that is at the heart of GNU libtextstyle [2][3][4].

I will have to take a closer look at that library.

> > On top of that, a decoding Unicode mbfile interface can be built, say 
> > ucfile.
>
> One of the problems of byte-by-byte decoding is that it's inefficient. It's
> way more efficient to do the same task (decoding, consuming) on an entire
> buffer of, say, at least 1 KiB. Buffering minimizes the context switches and
> time spent in function entry/exit. That needs to be considered in the design.

The mbfile interface tries hard not to read more than necessary in
advance to support interactive streams. That possibility should be
preserved, I think.

In my API proposal above, decoder_push can be redesigned to look as follows:

int decoder_push (decoder_t decoder, char *src, size_t srclen)

By the way, libunistring's u??_conv_from_encoding does not seem to be
adapted to consuming buffers. The problem is that one doesn't know in
advance where boundaries of multi-byte sequences are so
u??_conv_from_encoding will likely signal a decoding error.

What would be more helpful would be a version of
u??_conv_from_encoding that returns the decoded part of the string
before the invalid sequence and that gives the position of the invalid
sequence. For piping purposes, it would still not be very comfortable
because one then would have to copy by hand the undecoded part of the
string to the beginning of the buffer and refill the rest of the
buffer from the source.

Marc



Bytewise u??_conv_from_encoding

2022-01-01 Thread Marc Nieper-Wißkirchen
The demand to read a file (in local encoding) and to decode it
incrementally seems a typical one.

With Gnulib, this can be done using the mbfile module to read in the
multibytes byte-by-byte and then using the striconveh module to decode
the multibytes in, say, UTF-8 or UTF-32.

This, however, doesn't seem to be very efficient because the
multibytes have to be investigated at least twice; once by the mbfile
iterator and once by the striconveh iterator.

Thus, I am wondering whether it makes sense to offer a stateful
decoder that takes byte by byte and signals as soon as a decoded byte
sequence is ready.

On top of that, a decoding Unicode mbfile interface can be built, say ucfile.

Thanks,

Marc



[striconveh] Error handling and Unicode replacement character

2021-12-30 Thread Marc Nieper-Wißkirchen
The striconveh module and related modules offer an error handler
argument. The current possible values are:

iconveh_error
iconveh_question_mark
iconveh_escape_sequence

The second option replaces any unconvertible character with a question mark "?".

I would like to request to add a fourth option, say,
iconveh_replacement_character, which is like iconveh_question_mark but
uses U+FFFD whenever the target codeset is a Unicode codeset.

Thanks,

Marc



Re: Non-recursive Make and prefix-gnulib-mk

2021-12-16 Thread Marc Nieper-Wißkirchen
Thanks for the tips!

It's working perfectly now.

Marc

Am Do., 16. Dez. 2021 um 18:08 Uhr schrieb Bruno Haible :

> Marc Nieper-Wißkirchen wrote:
> > Am I supposed to set ${gl_source_base_prefix} somewhere?
>
> No. gnulib-tool defines gl_source_base_prefix as appropriate, depending
> on the command-line arguments that you pass to it.
>
> > If it helps:  I
> > am using a configuration based on bootstrap and I have set source_base
> (and
> > tests_base) in bootstrap.conf.
>
> 'bootstrap' does not yet know about the --automake-subdir option. Therefore
> better take a look at the (working, tested) example
> gnulib/examples/hello-c-gnulib-automakesubdir/. And at the coreutils patch
> that I posted today [1].
>
> Bruno
>
> [1] https://lists.gnu.org/archive/html/bug-gnulib/2021-12/msg00092.html
>
>
>
>


Re: Non-recursive Make and prefix-gnulib-mk

2021-12-16 Thread Marc Nieper-Wißkirchen
Dear Bruno,

thank you very much for the quick fix.

I am currently testing it.  I still have a problem with the Makefile lines
eventually produced by gl_CONDITIONAL_HEADER in m4/gnulib-common.m4.

Am I supposed to set ${gl_source_base_prefix} somewhere?  If it helps:  I
am using a configuration based on bootstrap and I have set source_base (and
tests_base) in bootstrap.conf.

Thanks,

Marc

Am Mi., 15. Dez. 2021 um 22:43 Uhr schrieb Bruno Haible :

> Marc Nieper-Wißkirchen wrote:
> > I'm using build-aux/prefix-gnulib-mk to rewrite the Gnulib Makefile
> > fragment so that it can be included by a Makefile in a top-level
> directory.
> >
> > What I haven't managed to get working, though, is renaming the Gnulib
> lib/
> > directory at the same time (by setting $source_base in bootstrap.conf).
> The
> > Perl script build-aux/prefix-gnulib-mk still prefixes the Automake
> > variables with lib_ as in
> >
> > lib_libgnu_la_SOURCES =
> >
> > I don't know much Perl but the relevant source line seems to be
> >
> >
> https://git.savannah.gnu.org/cgit/gnulib.git/tree/build-aux/prefix-gnulib-mk#n165
> >
> > The issue is possibly fixed just by replacing the string lib_ by %C% (see
> > [1]).
>
> Fixed through this patch.
>
> You are right, %C% would also fix it. But since we are generating the
> Makefile
> part anyway and know where it is located, we can just as well put in the
> actual
> relative subdirectory. This makes the generated code easier to understand.
>
>
> 2021-12-15  Bruno Haible  
>
> automake-subdir support: Support arbitrary --source-base value.
> Reported by Marc Nieper-Wißkirchen in
> <
> https://lists.gnu.org/archive/html/bug-gnulib/2021-12/msg00066.html>.
> * build-aux/prefix-gnulib-mk ($canon_prefix): New variable.
> (prefix): Initialize it.
> (prefix_assignment): Use it.
>
> diff --git a/build-aux/prefix-gnulib-mk b/build-aux/prefix-gnulib-mk
> index 36f7527fc..4f300c443 100755
> --- a/build-aux/prefix-gnulib-mk
> +++ b/build-aux/prefix-gnulib-mk
> @@ -40,6 +40,7 @@ use File::Basename; # for dirname
>  (my $ME = $0) =~ s|.*/||;
>
>  my $prefix;
> +my $canon_prefix;
>  my $lib_name;
>
>  sub usage ($)
> @@ -171,7 +172,7 @@ sub prefix_assignment ($$)
>
># Variables whose name depend on the location: libbison_a_SOURCES =>
># lib_libbison_a_SOURCES.
> -  $lhs_and_assign_op =~ s/($lib_name)/lib_$1/g;
> +  $lhs_and_assign_op =~ s/($lib_name)/$canon_prefix$1/g;
>
>$lhs_and_assign_op . $rhs;
>  }
> @@ -187,6 +188,12 @@ sub prefix ($)
># Work on $_.
>local ($_) = @_;
>
> +  # $canon_prefix is derived from $prefix in the same way as Automake
> +  # derives %canon_reldir% from %reldir%. See
> +  # <https://www.gnu.org/software/automake/manual/html_node/Include.html
> >.
> +  $canon_prefix = $prefix;
> +  $canon_prefix =~ s/[^a-zA-Z0-9_]/_/g;
> +
># Prefix all the occurrence of files in rules.  If there is nothing
># after in the :, it's probably a phony target, or a suffix rule.
># Don't touch it.
>
>
>
>


Non-recursive Make and prefix-gnulib-mk

2021-12-11 Thread Marc Nieper-Wißkirchen
Hi,

I'm using build-aux/prefix-gnulib-mk to rewrite the Gnulib Makefile
fragment so that it can be included by a Makefile in a top-level directory.

What I haven't managed to get working, though, is renaming the Gnulib lib/
directory at the same time (by setting $source_base in bootstrap.conf). The
Perl script build-aux/prefix-gnulib-mk still prefixes the Automake
variables with lib_ as in

lib_libgnu_la_SOURCES =

I don't know much Perl but the relevant source line seems to be

https://git.savannah.gnu.org/cgit/gnulib.git/tree/build-aux/prefix-gnulib-mk#n165

The issue is possibly fixed just by replacing the string lib_ by %C% (see
[1]).

Thanks,

Marc

--

[1] https://www.gnu.org/software/automake/manual/automake.html#Include


Re: Signaling NaNs

2021-12-09 Thread Marc Nieper-Wißkirchen
Hi Bruno,

thanks for replying so quickly!

My intention is neither to feed the signaling NaN into floating-point
operations nor to cause it an exception to be raised.  What I really want
to do is to model a type whose value is either a floating-point number
(including infinities and the NaNs returned by the floating-point
functions) or a sentinel value. Furthermore, I want this to fit into the
size of a double.

The reason why I thought of using the bit pattern of a signaling NaN is
that such a signaling NaN wouldn't be returned by the usual floating-point
functions. A quiet NaN, which wouldn't either, would also work for my
purpose.  This would correspond to the first of your three suggestions.  So
I could reformulate my question to: "Is there a way to produce a (quiet)
NaN that won't occur as a result of the C library functions?" This seems to
be possible on most architectures but will need specific code for some
architectures, which is why I thought of Gnulib.

Thanks,

Marc

Am Do., 9. Dez. 2021 um 22:49 Uhr schrieb Bruno Haible :

> Hi Marc,
>
> > I have been searching through the list of modules but haven't been able
> to
> > find it:  Does Gnulib offer a way to store a signaling NaN in a memory
> > location (if supported by the platform)
>
> It doesn't, because quiet NaNs are easier to work with.
>
> > The forthcoming C2x standard will have FLT_SNAN, DBL_SNAN, and LDBL_SNAN,
> > but before that there doesn't seem to be a portable way to get signaling
> > NaNs.
>
> ... indeed, when you have a non-optimizing compiler, how to prevent the
> compiler from generating instructions that produce a floating-point
> exception
> earlier than desired?
>
> > and some way to check a memory
> > location whether it contains a signaling NaN?
>
> Why would you need that? If you are using floating-point operations
> (such as addition, square root, etc.) the signaling NaN will produce an
> exception, as desired. Whereas if you are implementing some extra floating-
> point operation by looking at the precise bit pattern (using integer
> arithmetic), the standards [1][2] tell you which bit pattern to look for.
>
> > In case the following problem is easier: What I really need is a bit
> > pattern for a double that won't be returned by the usual floating-point
> > functions.
>
> When you look at [1] and [2], all bit patterns have a meaning.
>
> So, you could
>   - either use one of the many particular quiet NaN values, and hope that
> no other code produces it, or
>   - (like what the IA64 CPU does in hardware) work with 65-bit units,
> where the 65th bit means "uninitialized", or
>   - reserve a couple of extra bits for a type tag on every value, like
> what some Lisp implementations do (e.g. GNU clisp).
> I think valgrind uses one of the latter two techniques as well.
>
> Bruno
>
> [1] https://en.wikipedia.org/wiki/Single-precision_floating-point_format
> [2] https://en.wikipedia.org/wiki/Double-precision_floating-point_format
>
>
>
>


Signaling NaNs

2021-12-09 Thread Marc Nieper-Wißkirchen
Hi,

I have been searching through the list of modules but haven't been able to
find it:  Does Gnulib offer a way to store a signaling NaN in a memory
location (if supported by the platform) and some way to check a memory
location whether it contains a signaling NaN?

The forthcoming C2x standard will have FLT_SNAN, DBL_SNAN, and LDBL_SNAN,
but before that there doesn't seem to be a portable way to get signaling
NaNs.

In case the following problem is easier: What I really need is a bit
pattern for a double that won't be returned by the usual floating-point
functions.

Thanks,

Marc


Inline warnings in gl_xlist.h

2021-07-21 Thread Marc Nieper-Wißkirchen
When compiling my code with "-Winline", I suddenly get a warning from
calling gl_list_add_last: "Call is unlikely and code size would grow."

Is there a way to suppress this warning using Gnulib or can the inline
functions of gl_xlist.h be implemented in a way so that this warning cannot
show up?


Re: Module suggestion: timeout

2021-04-30 Thread Marc Nieper-Wißkirchen
Adding timeout to all build operations automatically is probably too much
and too intrusive.

But in any case, it should be easy to add such a timeout to certain build
operations (e.g. downloading, running a compiler for which the halting
problem cannot be solved, testing a production binary, ...).

Am Fr., 30. Apr. 2021 um 11:43 Uhr schrieb Simon Josefsson <
si...@josefsson.org>:

> Marc Nieper-Wißkirchen  writes:
>
> > Moreover, use cases for a baked-in timeout are not restricted to tests.
> For
> > example, I may want to restrict the build time of certain components in
> > situations where a logical error may lead to infinite build times (a
> simple
> > example is that of a Scheme compiler used as a build tool; thanks to
> > Turing-completeness of Scheme macros, such a build may not terminate).
>
> This makes me believe even stronger that the functionality ought to be
> provided by automake natively -- it seems the desired functionality is
> not only timeouts for self-tests but timeouts for all operations.
>
> Implementing this for self-tests in automake would probably be quite
> simple, but implementing it for all operations is probably more
> complicated.  Maybe it should be two separate features.  I have wanted
> timeouts for self-tests but rarely for building.  If the second is more
> complicated to implement, maybe starting with the first will be
> sufficient and useful.
>
> /Simon
>


Re: Module suggestion: timeout

2021-04-30 Thread Marc Nieper-Wißkirchen
Dear Bruno, dear Simon,

thank you for your replies.

I understand that valgrind-tests and the proposed "timeout-tests" solution
are not completely equivalent. Nevertheless, I still think that some
timeout functionality provided by Gnulib would be useful.

Bruno's solution

**
#if HAVE_DECL_ALARM
  /* Declare failure if test takes too long, by using default abort
 caused by SIGALRM.  */
  int alarm_value = 600;
  signal (SIGALRM, SIG_DFL);
  alarm (alarm_value);
#endif
**

works for unit tests that have been written specifically for the project.
It doesn't help, though, if I want to test (production) binaries that are
to be installed because I generally don't want testing code inside them.

Moreover, use cases for a baked-in timeout are not restricted to tests. For
example, I may want to restrict the build time of certain components in
situations where a logical error may lead to infinite build times (a simple
example is that of a Scheme compiler used as a build tool; thanks to
Turing-completeness of Scheme macros, such a build may not terminate).

Marc

Am Fr., 30. Apr. 2021 um 11:24 Uhr schrieb Simon Josefsson <
si...@josefsson.org>:

> Bruno Haible  writes:
>
> > So, I don't think the "let's treat timeout like valgrind" approach is
> going
> > to work. Instead, you need to design a way to deal with timeouts,
> independently.
>
> Hi!  I think Marc's request for functionality to introduce timeouts for
> self-tests is a good one.  However I reach the same conclusion as Bruno,
> that having a module like valgrind-tests is probably not the best way to
> solve it.  To me, having a timeout seems like an essential feature of a
> self-test framework.  I know automake isn't primarily a self-test
> framework, but it has concepts for it and the test framework has been
> improved significantly over the years, so I think adding a timeout
> functionality to automake makes sense.  What do bug-automake people
> think?
>
> The functionality could be conditioned on the coreutils 'timeout' tool,
> and if that tool exists, and appears to work, running all self-tests
> under that tool could be done automatically.  The default self-test
> timeout be quite generous (say 17 hours?) but it should be easy to
> modify both by end-user and project developer.  If we want to be
> conservative, the functionality could be opt-in initially, and then
> after a few years become the default behaviour.
>
> Thoughts?
>
> /Simon
>


Module suggestion

2021-04-29 Thread Marc Nieper-Wißkirchen
Gnulib contains a module named valgrind that allows the easy use of
Valgrind for tests whenever it is supported by the build system.

I would suggest to add a similar module named timeout that sets the
variable TIMEOUT with suitable defaults whenever the GNU coreutils timeout
program (or an equivalent) is available.

This would allow test runners to be unconditionally prefixed with
$(TIMEOUT) so that on supporting systems, tests are killed (and an error is
reported) whenever they run for too long (mostly because of a logic error
causing an infinite loop).

Thanks,

Marc


Fwd: guile-devel Digest, Vol 221, Issue 18

2021-04-28 Thread Marc Nieper-Wißkirchen
Hi!

In Guile's digest I noticed a comment about replacing libltdl (see below)
with the standard `dlopen' interface and providing a shim for systems not
supporting it directly.

Do Guile's observations apply in general? Would it make sense for the
Gnulib project to provide such a shimmed version of `dlopen' making libltdl
obsolete?

Thanks,

Marc

-- Forwarded message -
Von: 
Date: Mi., 28. Apr. 2021 um 18:23 Uhr
Subject: guile-devel Digest, Vol 221, Issue 18
To: 


Send guile-devel mailing list submissions to
guile-de...@gnu.org

To subscribe or unsubscribe via the World Wide Web, visit
https://lists.gnu.org/mailman/listinfo/guile-devel
or, via email, send a message with subject or body 'help' to
guile-devel-requ...@gnu.org

You can reach the person managing the list at
guile-devel-ow...@gnu.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of guile-devel digest..."


Today's Topics:

   1. GNU Guile 3.0.6 released (Andy Wingo)
   2. Re: GNU Guile 3.0.6 released (Jérémy Korwin-Zmijowski)


--

Message: 1
Date: Wed, 28 Apr 2021 09:17:28 +0200
From: Andy Wingo 
To: guile-u...@gnu.org, guile-sour...@gnu.org, guile-de...@gnu.org,
info-...@gnu.org
Subject: GNU Guile 3.0.6 released
Message-ID: <87lf92op2v@pobox.com>
Content-Type: text/plain; charset=utf-8

We are pleased to announce GNU Guile release 3.0.6, the latest in the
3.0 stable release series.

Compared to the previous release in the 3.0 series, Guile 3.0.6 improves
source-location information for compiled code, removes the dependency on
libltdl, fixes some important bugs, adds an optional bundled "mini-gmp"
library, as well as the usual set of minor optimizations and bug fixes.

Compared to the previous stable series (2.2.x), Guile 3.0 adds support
for just-in-time native code generation, speeding up all Guile programs.
See the NEWS extract at the end of the mail for full details.


The Guile web page is located at http://gnu.org/software/guile/, and
among other things, it contains a copy of the Guile manual and pointers
to more resources.

Guile is an implementation of the Scheme programming language, packaged
for use in a wide variety of environments.  In addition to implementing
the R5RS, R6RS, and R7RS Scheme standards, Guile includes full access to
POSIX system calls, networking support, multiple threads, dynamic
linking, a foreign function call interface, powerful string processing,
and HTTP client and server implementations.

Guile can run interactively, as a script interpreter, and as a Scheme
compiler to VM bytecode.  It is also packaged as a library so that
applications can easily incorporate a complete Scheme interpreter/VM.
An application can use Guile as an extension language, a clean and
powerful configuration language, or as multi-purpose "glue" to connect
primitives provided by the application.  It is easy to call Scheme code
from C code and vice versa.  Applications can add new functions, data
types, control structures, and even syntax to Guile, to create a
domain-specific language tailored to the task at hand.

Guile 3.0.6 can be installed in parallel with Guile 2.2.x; see
http://www.gnu.org/software/guile/manual/html_node/Parallel-Installations.html
.

A more detailed NEWS summary follows these details on how to get the
Guile sources.

Here are the compressed sources:
  http://ftp.gnu.org/gnu/guile/guile-3.0.6.tar.lz   (10MB)
  http://ftp.gnu.org/gnu/guile/guile-3.0.6.tar.xz   (13MB)
  http://ftp.gnu.org/gnu/guile/guile-3.0.6.tar.gz   (21MB)

Here are the GPG detached signatures[*]:
  http://ftp.gnu.org/gnu/guile/guile-3.0.6.tar.lz.sig
  http://ftp.gnu.org/gnu/guile/guile-3.0.6.tar.xz.sig
  http://ftp.gnu.org/gnu/guile/guile-3.0.6.tar.gz.sig

Use a mirror for higher download bandwidth:
  http://www.gnu.org/order/ftp.html

Here are the SHA256 checksums:

  2e5b9e3e56a967a58ae591053a65c1851875bf5e06c60caab409d5647cff4975
guile-3.0.6.tar.lz
  e2bd83c2077d721356e7579ca33097a13a20e2b7eda6c2362ee1166fbc845d28
guile-3.0.6.tar.xz
  000fc43c1b0a5cfbd85b67e01afd58e847bd1f279e3439bb7db37282b0459f56
guile-3.0.6.tar.gz

[*] Use a .sig file to verify that the corresponding file (without the
.sig suffix) is intact.  First, be sure to download both the .sig file
and the corresponding tarball.  Then, run a command like this:

  gpg --verify guile-3.0.6.tar.gz.sig

If that command fails because you don't have the required public key,
then run this command to import it:

  gpg --keyserver keys.gnupg.net --recv-keys
4FD4D288D445934E0A14F9A5A8803732E4436885

and rerun the 'gpg --verify' command.

This release was bootstrapped with the following tools:
  Autoconf 2.71
  Automake 1.16.2
  Libtool 2.4.6
  Gnulib v0.1-4551-ga3a946f670
  Makeinfo 6.7

An extract from NEWS follows.


Changes in 3.0.6 (since 3.0.5)

* Notable changes

** Reimplement dynamic library loading ("dlopening") 

Re: Autogenerated header files not included in .gitignore

2021-04-16 Thread Marc Nieper-Wißkirchen
PS:

Alternatively to the suggestion of a new gnulib-tool option
"--vc-files=...", one could also make Gnulib's behavior dependent on the
presence of a file, say ".gnulib-in-tree-build". This has the advantage
that no source files have to be changed to switch from one behavior to the
other but that, in principle, a developer can decide to switch to in-tree
builds by touching such a file locally.

Am Fr., 9. Apr. 2021 um 13:16 Uhr schrieb Marc Nieper-Wißkirchen <
marc.nieper+...@gmail.com>:

> I have been experimenting a bit with building in a dedicated build
> directory. Nevertheless, I still end up with artefacts in the source
> directory, which are not covered by Gnulib's .gitignore files, for example
> with:
>
> lib/iconv_open-aix.h
> lib/iconv_open-hpux.h
> lib/iconv_open-irix.h
> lib/iconv_open-osf.h
> lib/iconv_open-solaris.h
> lib/iconv_open-zos.h
>
> Thanks,
>
> Marc
>
> Am Sa., 3. Apr. 2021 um 12:22 Uhr schrieb Bruno Haible :
>
>> Marc Nieper-Wißkirchen wrote:
>> > Would it do much harm if ".gitignore" included all built files even for
>> > developers that work in a dedicated build tree?
>>
>> The developers who are using the first approach will not like this.
>>
>> Bruno
>>
>>


Re: Autogenerated header files not included in .gitignore

2021-04-09 Thread Marc Nieper-Wißkirchen
I have been experimenting a bit with building in a dedicated build
directory. Nevertheless, I still end up with artefacts in the source
directory, which are not covered by Gnulib's .gitignore files, for example
with:

lib/iconv_open-aix.h
lib/iconv_open-hpux.h
lib/iconv_open-irix.h
lib/iconv_open-osf.h
lib/iconv_open-solaris.h
lib/iconv_open-zos.h

Thanks,

Marc

Am Sa., 3. Apr. 2021 um 12:22 Uhr schrieb Bruno Haible :

> Marc Nieper-Wißkirchen wrote:
> > Would it do much harm if ".gitignore" included all built files even for
> > developers that work in a dedicated build tree?
>
> The developers who are using the first approach will not like this.
>
> Bruno
>
>


Re: Modules xsize and idx

2021-04-07 Thread Marc Nieper-Wißkirchen
Hi Bruno,

thanks for replying so quickly.

Let's assume I have a procedure

void *foo_create (size_t n)
{
  void *foo = malloc (a + n * b);
  if (foo == NULL) ...;
  ...
  return foo;
}

I want 'foo_create' to handle possible overflows. To me, it seems that
should use the xsize module for this and to replace 'a + n * b' accordingly
because I have no easy local control over "-ftrapv". This seems to force,
however, that the 'n' parameter of the 'foo_create' has to be a size_t and
not an idx_t. Unless I want possibly unsafe casts in my program, this
forces the code interacting with 'foo_create' to use 'size_t' instead of
'idx_t' as well, which somewhat seems to forfeit the advantages of 'idx_t'.

That's why I am wondering whether it makes sense to have an xsize module
that uses idx_t instead of size_t.

PS Another question related to idx_t: Often I code something like:

void f (size_t n, size_t i)
{
  assure (i < n);
  ...
}

Now if I replaced unsigned by signed ints, I guess I should write

void f (idx_t n, idx_t i)
{
  assure (i < n);
  assure (0 <= i);
  ...
}

But this makes the code more complicated... :(


Am Mi., 7. Apr. 2021 um 11:51 Uhr schrieb Bruno Haible :

> Hi Marc,
>
> > What is the relationship between these two modules? Both try to minimize
> > subtle bugs due to overflow.
>
> These two modules, and the wraparound/overflow checking macros of
> 'intprops'
> [1], are attempts to catch integer overflow.
>
> The three approaches differ in terms of coding effort and percentage
> of overflows that get caught.
>
> With 'idx', you use signed integers, and rely on compiler options such as
> 'gcc -ftrapv' or 'gcc -fsanitize=undefined' to report overflows.
>   - Coding effort: small.
>   - Overflows caught: all.
>
> With 'xsize', you use unsigned integers (size_t), and do a single overflow
> check at the end of the computation; this check is implicit if you call
> malloc, as malloc (SIZE_MAX) will always fail.
>   - Coding effort: medium.
>   - Overflows caught: those with explicit checks.
>
> With 'intprops', you use signed or unsigned integers, and do an overflow
> check at each step of the computation.
>   - Coding effort: high.
>   - Overflows caught: those with explicit checks.
>
> > However, both approaches cannot be easily combined as xsize expects
> > unsigned integers while idx is a signed one.
>
> You don't need combine the three approaches for the same computation.
> For each computation, pick the approach you prefer.
>
> > What is the suggested use of these modules for new code?
>
> IMO, there's no definite answer to this question. All three approaches are,
> in some way, experimental. At least as long as not all distros are
> compiling with 'gcc -ftrapv' systematically.
>
> Paul, how do you see this?
>
> (I'm thinking of adding the answers to the documentation.)
>
> Bruno
>
> [1]
> https://www.gnu.org/software/gnulib/manual/html_node/Wraparound-Arithmetic.html
>
>


Modules xsize and idx

2021-04-07 Thread Marc Nieper-Wißkirchen
What is the relationship between these two modules? Both try to minimize
subtle bugs due to overflow.

However, both approaches cannot be easily combined as xsize expects
unsigned integers while idx is a signed one.

What is the suggested use of these modules for new code?

Thanks,

Marc


Re: gl_list API

2021-04-06 Thread Marc Nieper-Wißkirchen
Am Di., 6. Apr. 2021 um 23:04 Uhr schrieb Bruno Haible :

> > > But when designing a general utility, for all kinds of programs to use,
> > > it is inappropriate to say "storing null elements is not useful".
> >
> > I'm afraid we'll have to disagree here.
>
> In some of my uses of the gl_list module, the element is in fact a small
> integer, cast to an 'uintptr_t' and then further cast to a 'const void *'.
> If the API would forbid storing a NULL pointer, my specialized container
> type would need a workaround for storing the integer value 0.
>

NULL can also be a sensible value for "holes" in the list showing up in
algorithms that need to lazily remove elements (e.g. the list will only be
eventually compactified).


Re: Type-safe typecasts

2021-04-06 Thread Marc Nieper-Wißkirchen
CCing Bruno because of his involvement with the Gnulib list modules.

Disallowing NULL list elements could break existing code that actually uses
them but returning elements with type void * instead of const void * would
be much less incompatible. Code can trivially be ported to such an updated
interface (for example, in my snippet above, I would have to replace 'const
void *e' with 'void *e', All in all, it would make working with the code a
lot easier.

For lists where there are no non-null list elements, one could add

void *gl_list_iterator_next_element_or_null (gl_list_iterator_t iter,
gl_list_node_t *node_ptr)
{
  void *e;
  return gl_list_iterator_next (iter, , node_ptr) ? e : NULL;
}

to the global API (modulo a better name for the procedure).


Am Di., 6. Apr. 2021 um 21:20 Uhr schrieb Paul Eggert :

> On 4/6/21 12:13 PM, Marc Nieper-Wißkirchen wrote:
> > gl_list_iterator_next has to return two things: An element (represented
> by
> > a const void *) and a boolean value. As elements may be NULL
>
> Ah, OK, then that's the problem. The API shouldn't allow null elements.
> They're not that useful anyway. If they really are needed for some
> lists, I suppose one could have a more-complicated API to return them
> (by setting a bool elsewhere); but usually they aren't.
>
> > So to make my original code portable C, I would have to code
> >
> > ...
> > const void *e;
> > while (gl_list_iterator_next (, , NULL))
> >   {
> > struct foo *foo = (void *) e;
> > ++foo->bar;
> >   }
> > ...
> >
> > The const typecast is, unfortunately, still needed to silence compiler
> > warnings
>
> Yes, that would be portable. But that cast indicates another problem
> with the API. It should return void *, not void const * (think of strchr
> as the model here). The API should discourage type-casts, since they're
> more dangerous than the alternatives.
>


Re: New function xpalloc in module xalloc

2021-04-06 Thread Marc Nieper-Wißkirchen
Am Di., 6. Apr. 2021 um 21:05 Uhr schrieb Paul Eggert :

> On 4/5/21 11:48 PM, Marc Nieper-Wißkirchen wrote:
> > SIZE_MAX < (uintmax_t) nbytes
>
> Eeuuw! That's a cure worse than the disease. Among other things, there
> is no guarantee that PTRDIFF_MAX <= UINTMAX_MAX so in theory the
> comparison could go completely awry with a sufficiently-large NBYTES.
>

I checked the ISO C18 standard before I wrote this. :) In 6.2.5 it says
that for every signed type there is an unsigned type of the same width.
Given the definitions of INTMAX_MAX and UINTMAX_MAX, I then concluded

PTRDIFF_MAX <= INTMAX_MAX <= UINTMAX_MAX.

Where is the flaw in my reasoning?


Re: Type-safe typecasts

2021-04-06 Thread Marc Nieper-Wißkirchen
Hi Paul,

thanks!

By the way, the snippet you gave is not portable C code, as it assumes
> that 'void *' and 'struct foo *' have the same machine representation.
> This is not necessarily true on (admittedly now-rare) machines that have
> different flavors of pointers. I suspect the main problem here is either
> in the calling code, or in the API for gl_list_iterator_next: if it
> returned a possibly-null value of type 'const void *' instead of storing
> the result through a 'const void **' pointer, the calling code wouldn't
> have gotten into this portability mess.
>

gl_list_iterator_next has to return two things: An element (represented by
a const void *) and a boolean value. As elements may be NULL, the boolean
value is really needed. Of course, one could switch the actual return
parameter with the one returned through a pointer but then it wouldn't look
as nice in while or for-loops.

So to make my original code portable C, I would have to code

...
const void *e;
while (gl_list_iterator_next (, , NULL))
  {
struct foo *foo = (void *) e;
++foo->bar;
  }
...

The const typecast is, unfortunately, still needed to silence compiler
warnings as the Gnulib list API suffers a bit from const-poisoning when the
actual elements are pointers actually non-const objects.

Marc


Type-safe typecasts

2021-04-06 Thread Marc Nieper-Wißkirchen
Hi,

I have been wondering whether it makes sense to add a small utility trying
to make typecasts in C as type-safe as possible.

The problem is that typecasts are sometimes unavoidable.  For an example,
let's take a look at the following snippet using Gnulib's xlist module:

struct foo
{
  int bar;
};
gl_list_t list = gl_list_create_empty (GL_ARRAY_LIST, NULL, NULL, NULL,
false);
struct foo foo = { .bar = 1 };
gl_list_add_last (list, );
gl_list_iterator_t i = gl_list_iterator (list);
struct foo *elt;
while (gl_list_iterator_next (, (const void **) , NULL))
  ++elt->bar;
gl_list_iterator_free ();

Here, the typecast '(const void **)' is needed as 'gl_list_iterator_next'
expects an argument of that type and because 'struct foo **elt' is not
implicitly convertible to 'const void **' (in fact, even not to 'void **').

The problem with this typecast is that the compiler wouldn't have
complained if I had forgotten to write the address operator '&' before
'elt'.

So what I have in mind are macros that do a type conversion from A to B and
that signal an error on modern compilers if the input is not of type A. For
this, the C11 construct _Generic can be used.

The macros could look like

CAST (, , )

CVR_CAST (, expr) /* removes cvr qualifier */

ADDRESS_CAST (, , expr)

expanding, respectively, into

_Generic ((), (): (() ()))

_Generic ((), (): (() ()),
(const ): (( ()),
(const volatile ): (() ()),
...)

_Generic ((), (): ((() *) &()))

The names and the exact semantics and number of macros are, of course,
debatable.

Marc


Re: New function xpalloc in module xalloc

2021-04-06 Thread Marc Nieper-Wißkirchen
Hi Paul,

Am Di., 6. Apr. 2021 um 05:19 Uhr schrieb Paul Eggert :

> On 4/3/21 11:17 PM, Marc Nieper-Wißkirchen wrote:
> > Does the comparison make any sense, by the way?
>
> Yes, although it's needed only on unusual (and these days perhaps
> theoretical?) platforms where SIZE_MAX < PTRDIFF_MAX.
>

Ah, okay. I didn't think of this possibility.


> I hadn't noticed the issue, as the projects I contribute to (coreutils,
> etc.) compile with -Wno-sign-compare because gcc -Wsign-compare has too
> many false alarms.
>
> I prefer to avoid casts merely to pacify GCC (as casts are too
> error-prone), so I installed the attached. I hope it works for you. (If
> not, perhaps you can use -Wno-sign-compare too)
>

The fix works for me. Thank you very much! IMO, it's much better than
asking to compile with "-Wno-sign-compare' because this can (like type
casts) silence other, non-false positive warnings.

Speaking of type casts, I don't think they would have been bad here because
they would document exactly what was going on here. By writing

SIZE_MAX < (uintmax_t) nbytes

the otherwise implicit type conversion would have been made explicit and
using 'uintmax_t' also documents that it is expected that the width of
'nbytes' can be greater than the width of 'size_t'.


> This underscores the fact that the xalloc module should use idx_t
> instead of size_t pretty much everywhere. If xrealloc's size arg were of
> idx_t we wouldn't need any of this hacking. I realize that replacing
> size_t with idx_t is an incompatible change to xalloc's API, but it's
> time callers started using signed instead of unsigned byte counts as
> that helps avoid and/or catch integer-overflow errors better. I'll add
> that to my list of things to do for Gnulib.
>

The philosophy about idx_t could be worth an entry in the manual.

Marc


[PATCH 1/1] hamt: Fix coding errors.

2021-04-05 Thread Marc Nieper-Wißkirchen
From: Marc Nieper-Wißkirchen 

Reported by Bruno Haible in
<https://lists.gnu.org/archive/html/bug-gnulib/2021-04/msg00047.html>
after a Coverity run.
* lib/hamt.c (bucket_do_while, hamt_iterator): Add missing
derefencing operator and silence a bogus warning on uninitialized
variables.
* tests/test-hamt.c (test_general): Replace two errorneous
assignment operators with comparison operators.
---
 ChangeLog | 12 
 lib/hamt.c|  6 +++---
 tests/test-hamt.c |  4 ++--
 3 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index e8a27df73..81d3aab73 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,15 @@
+2021-04-05  Marc Nieper-Wißkirchen  
+
+   hamt: Fix coding errors.
+   Reported by Bruno Haible in
+   <https://lists.gnu.org/archive/html/bug-gnulib/2021-04/msg00047.html>
+   after a Coverity run.
+   * lib/hamt.c (bucket_do_while, hamt_iterator): Add missing
+   derefencing operator and silence a bogus warning on uninitialized
+   variables.
+   * tests/test-hamt.c (test_general): Replace two errorneous
+   assignment operators with comparison operators.
+
 2021-04-05  Fabrice Fontaine  
 
pthread-cond: Fix compilation error.
diff --git a/lib/hamt.c b/lib/hamt.c
index 204c2f069..34880eff4 100644
--- a/lib/hamt.c
+++ b/lib/hamt.c
@@ -873,7 +873,7 @@ bucket_do_while (const struct bucket *bucket, 
Hamt_processor *proc, void *data,
   for (size_t i = 0; i < elt_count; ++i)
 {
   *success = proc (elts[i], data);
-  if (!success)
+  if (*success == false)
 return cnt;
   ++cnt;
 }
@@ -946,14 +946,14 @@ hamt_iterator (Hamt *hamt)
   Hamt_iterator iter;
   iter.hamt = hamt_copy (hamt);
   Hamt_entry *entry = hamt->root;
+  iter.path = 0;
+  iter.position = 0;
   if (entry == NULL)
 {
   iter.depth = -1;
   return iter;
 }
   iter.depth = 0;
-  iter.path = 0;
-  iter.position = 0;
   while (iter.entry[iter.depth] = entry, entry_type (entry) == subtrie_entry)
 {
   const struct subtrie *subtrie = (const struct subtrie *) entry;
diff --git a/tests/test-hamt.c b/tests/test-hamt.c
index 0cf6eb5a9..882bf0fc7 100644
--- a/tests/test-hamt.c
+++ b/tests/test-hamt.c
@@ -152,10 +152,10 @@ test_general (void)
   hamt1 = hamt_remove (hamt2, );
   sum = 0;
   ASSERT (hamt_do_while (hamt2, proc, ) == 4);
-  ASSERT (sum = 52);
+  ASSERT (sum == 52);
   sum = 0;
   ASSERT (hamt_do_while (hamt1, proc, ) == 3);
-  ASSERT (sum  = 48);
+  ASSERT (sum == 48);
 
   hamt_free (hamt1);
   hamt_free (hamt2);
-- 
2.25.1




Re: [PATCH] hamt: New module.

2021-04-05 Thread Marc Nieper-Wißkirchen
Hi Bruno,

Am Mo., 5. Apr. 2021 um 15:02 Uhr schrieb Bruno Haible :

> Hi Marc,
>
> > * tests/test-hamt.c: New file.
>
> Can you please have a quick look whether these Coverity findings (from our
> weekly Coverity run) are relevant?
>

Coverity seems to be a good tool. I haven't yet tested GCC's new static
analyzer.


>
> 
> *** CID 1503613:  Null pointer dereferences  (REVERSE_INULL)
> /gllib/hamt.c: 876 in bucket_do_while()
> 870   size_t cnt = 0;
> 871   size_t elt_count = bucket->elt_count;
> 872   Hamt_entry *const *elts = bucket->elts;
> 873   for (size_t i = 0; i < elt_count; ++i)
> 874 {
> 875   *success = proc (elts[i], data);
> >>> CID 1503613:  Null pointer dereferences  (REVERSE_INULL)
> >>> Null-checking "success" suggests that it may be null, but it has
> already been dereferenced on all paths leading to the check.
> 876   if (!success)
> 877 return cnt;
> 878   ++cnt;
> 879 }
> 880   return cnt;
> 881 }
>

Correct finding. It should be "!*success".


>
> 
> *** CID 1503612:  Uninitialized variables  (UNINIT)
> /gllib/hamt.c: 952 in hamt_iterator()
> 946   Hamt_iterator iter;
> 947   iter.hamt = hamt_copy (hamt);
> 948   Hamt_entry *entry = hamt->root;
> 949   if (entry == NULL)
> 950 {
> 951   iter.depth = -1;
> >>> CID 1503612:  Uninitialized variables  (UNINIT)
> >>> Using uninitialized value "iter". Field "iter.path" is
> uninitialized.
> 952   return iter;
> 953 }
> 954   iter.depth = 0;
> 955   iter.path = 0;
> 956   iter.position = 0;
> 957   while (iter.entry[iter.depth] = entry, entry_type (entry) ==
> subtrie_entry)
>
>
Irrelevant, but I guess the warning can be silenced by moving up the
initialization of "iter.path" and "iter.position".


>
> 
> *** CID 1503618:  Incorrect expression  (PW.ASSIGN_WHERE_COMPARE_MEANT)
> /gltests/test-hamt.c: 155 in ()
> 149   ASSERT (hamt_do_while (hamt2, proc, ) == 4);
> 150   ASSERT (sum == 52);
> 151
> 152   hamt1 = hamt_remove (hamt2, );
> 153   sum = 0;
> 154   ASSERT (hamt_do_while (hamt2, proc, ) == 4);
> >>> CID 1503618:  Incorrect expression  (PW.ASSIGN_WHERE_COMPARE_MEANT)
> >>> use of "=" where "==" may have been intended
> 155   ASSERT (sum = 52);
> 156   sum = 0;
> 157   ASSERT (hamt_do_while (hamt1, proc, ) == 3);
> 158   ASSERT (sum  = 48);
> 159
> 160   hamt_free (hamt1);
>
>
Correct observation.


>
> 
> *** CID 1503615:  Incorrect expression  (PW.ASSIGN_WHERE_COMPARE_MEANT)
> /gltests/test-hamt.c: 158 in ()
> 152   hamt1 = hamt_remove (hamt2, );
> 153   sum = 0;
> 154   ASSERT (hamt_do_while (hamt2, proc, ) == 4);
> 155   ASSERT (sum = 52);
> 156   sum = 0;
> 157   ASSERT (hamt_do_while (hamt1, proc, ) == 3);
> >>> CID 1503615:  Incorrect expression  (PW.ASSIGN_WHERE_COMPARE_MEANT)
> >>> use of "=" where "==" may have been intended
> 158   ASSERT (sum  = 48);
> 159
> 160   hamt_free (hamt1);
> 161   hamt_free (hamt2);
> 162   free_element (y5);
> 163 }
>

Dito.


>
> 
> *** CID 1503614:(ASSERT_SIDE_EFFECT)
> /gltests/test-hamt.c: 158 in test_general()
> 152   hamt1 = hamt_remove (hamt2, );
> 153   sum = 0;
> 154   ASSERT (hamt_do_while (hamt2, proc, ) == 4);
> 155   ASSERT (sum = 52);
> 156   sum = 0;
> 157   ASSERT (hamt_do_while (hamt1, proc, ) == 3);
> >>> CID 1503614:(ASSERT_SIDE_EFFECT)
> >>> Assignment "sum = 48" has a side effect.  This code will work
> differently in a non-debug build.
> 158   ASSERT (sum  = 48);
> 159
> 160   hamt_free (hamt1);
> 161   hamt_free (hamt2);
> 162   free_element (y5);
> 163 }
> /gltests/test-hamt.c: 155 in test_general()
> 149   ASSERT (hamt_do_while (hamt2, proc, ) == 4);
> 150   ASSERT (sum == 52);
> 151
> 152   hamt1 = hamt_remove (hamt2, );
> 153   sum = 0;
> 154   ASSERT (hamt_do_while (hamt2, proc, ) == 4);
> >>> CID 1503614:(ASSERT_SIDE_EFFECT)
> >>> Assignment "sum = 52" has a side effect.  This code will work
> differently in a non-debug build.
> 155   ASSERT (sum = 52);
> 156   sum = 0;
> 157   ASSERT (hamt_do_while (hamt1, proc, ) == 3);
> 158   ASSERT (sum  = 48);
> 159
> 160   hamt_free (hamt1);
>
>
These are aftereffects.

I will push the fixes.

Thanks,

Marc


Re: Node to first or last element of a sequential list in module list/xlist

2021-04-05 Thread Marc Nieper-Wißkirchen
Hi Bruno,

Am Mo., 5. Apr. 2021 um 04:45 Uhr schrieb Bruno Haible :

> [Adding back bug-gnulib in CC.]
>

Thanks!


> > Is there a fundamental reason why a list walking algorithm that both
> > inserts and removes items is not possible with an arbitrary Gnulib list
> > container? Obviously, the current API doesn't allow it. If it is just the
> > limitation of an API that forces one to work with a second, temporary
> list,
> >
> > Would it be possible for all list types to provide a
> >
> > gl_list_node_t gl_list_remove_node_then (gl_list_t list, gl_list_node_t
> > node, gl_list_node_t then)
> >
> > that removes NODE from LIST and returns the node of that element
> > represented by THEN before the removal.  The precondition is, of course,
> > that THEN != NODE.  Furthermore, if THEN is NULL, NULL is returned.
> >
> > A typical pattern would be:
> >
> > for (gl_list_node_t i = gl_list_first_node (list); i != NULL;)
> >   {
> > Fruit fruit;
> > if (is_apple (fruit))
> >   node = gl_list_next_node (list, gl_list_next_node (list,
> > gl_list_add_before (list, node, make_pear (;
> > else if (is_peach (fruit))
> >   node = gl_list_next_node (gl_list_add_after (list, node,
> > make_strawberry ()));
> > else if (is_banana (fruit))
> >   node = gl_list_remove_node_then (list, node, gl_list_next_node
> > (list));
> >   }
> > (This is just a silly example for an algorithm that may want to replace
> > elements by sublists (including sublists of length 0).)
>
> This example makes it clear what you mean.
>
> And what if the programmer wants to preserve not one 'then' node, but
> several at once? One could generalize the function to
>
>   void gl_list_remove_preserving_nodes (gl_list_t list, gl_list_node_t
> victim,
> size_t count, gl_list_node_t
> *to_preserve);
>
> which would preserve the nodes to_preserve[0 .. count-1].
>

I think there is a conceptual difference between preserving a single node
and preserving an arbitrary number of nodes. The point is that
gl_list_add_before and gl_list_add_after, which also mutate the list,
return a single node from where the algorithm can continue (as the loop
above shows). So it is one thing to ask for gl_list_remove_node to return a
single valid node from where an iteration can proceed but quite a different
thing to ask it to preserve an arbitrary number of nodes.

Instead of the THEN node being arbitrary, it could also make sense to force
it to be the node before or after, i.e. to have gl_list_remove_node_next
and gl_list_remove_node_previous.
Marc


[PATCH 1/1] hamt: Document the module in the Gnulib manual.

2021-04-04 Thread Marc Nieper-Wißkirchen
From: Marc Nieper-Wißkirchen 

Suggested by Bruno Haible in
<https://lists.gnu.org/archive/html/bug-gnulib/2021-04/msg00026.html>.
* doc/containers.texi: Add a subsection to section 15.11 Container
data types.
* lib/hamt.h: Improve documentation on how Hamt_entry is supposed
to be used.
---
 ChangeLog   | 10 +
 doc/containers.texi | 55 +
 lib/hamt.h  | 11 +
 3 files changed, 72 insertions(+), 4 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 4a665c275..7b296de04 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,13 @@
+2021-04-04  Marc Nieper-Wißkirchen  
+
+   hamt: Document the module in the Gnulib manual.
+   Suggested by Bruno Haible in
+   <https://lists.gnu.org/archive/html/bug-gnulib/2021-04/msg00026.html>.
+   * doc/containers.texi: Add a subsection to section 15.11 Container
+   data types.
+   * lib/hamt.h: Improve documentation on how Hamt_entry is supposed
+   to be used.
+
 2021-04-03  Paul Eggert  
 
savedir: avoid unlikely undefined behavior
diff --git a/doc/containers.texi b/doc/containers.texi
index 15c915b93..a8cfebc14 100644
--- a/doc/containers.texi
+++ b/doc/containers.texi
@@ -35,6 +35,9 @@ log
 Gnulib provides several generic container data types.  They can be used
 to organize collections of application-defined objects.
 
+@node Ordinary containers
+@subsection Ordinary container data types
+
 @multitable @columnfractions .15 .5 .1 .1 .15
 @headitem Data type
 @tab Details
@@ -599,6 +602,58 @@ For C++, Gnulib provides a C++ template class for each of 
these container data t
 @tab @code{"gl_omap.hh"}
 @end multitable
 
+@node Specialized containers
+@subsection Specialized container data types
+
+The @code{hamt} module implements the hash array mapped trie (HAMT) data
+structure.  This is a data structure that contains (key, value) pairs.
+Lookup of a (key, value) pair given the key is on average an @math{O(1)}
+operation, assuming a good hash function for the keys is employed.
+
+The HAMT data structure is useful when you want modifications (additions
+of pairs, removal, value changes) to be visible only to some part of
+your program, whereas other parts of the program continue to use the
+unmodified HAMT.  The HAMT makes this possible in a space-efficient
+manner: the modified and the unmodified HAMT share most of their
+allocated memory.  It is also time-efficient: Every such modification
+is @math{O(1)} on average, again assuming a good hash function for the keys.
+
+A HAMT can be used whenever an ordinary hash table would be used.  It
+does however, provide non-destructive updating operations without the
+need to copy the whole container. On the other hand, a hash table is
+simpler so that its performance may be better when non-destructive
+update operations are not needed.
+
+For example, a HAMT can be used to model the dynamic environment in a
+LISP interpreter.  Updating a value in the dynamic environment of one
+continuation frame would not modify values in earlier frames.
+
+To use the module, include @code{hamt.h} in your code.  The public
+interface is documented in that header file.  You have to provide a hash
+function and an equivalence relation, which defines key equality.  The
+module includes a test file @code{test-hamt.c}, which demonstrates how
+the API can be used.
+
+In the current implementation, each inner node of the HAMT can store up
+to @math{32 = 2^5} entries and subtries.  Whenever a collision between
+the initial bits of the hash values of two entries would happen, the
+next @math{5} bits of the hash values are examined and the two entries
+pushed down one level in the trie.
+
+HAMTs have the same average access times as hash tables but grow and
+shrink dynamically, so they use memory more economically and do not have
+to be periodically resized.
+
+They were described and analyzed in @cite{Phil Bagwell (2000). Ideal
+Hash Trees (Report). Infoscience Department, École Polytechnique
+Fédérale de Lausanne.}
+
+The persistence aspect of the HAMT data structure, which means that each
+updating operation (like inserting, replacing, or removing an entry)
+returns a new HAMT while leaving the original one intact, is achieved
+through structure sharing, which is even safe in the presence of
+multiple threads when the used C compiler supports atomics.
+
 @ifnottex
 @unmacro log
 @end ifnottex
diff --git a/lib/hamt.h b/lib/hamt.h
index 55ee964dd..25a0ad9f9 100644
--- a/lib/hamt.h
+++ b/lib/hamt.h
@@ -78,10 +78,13 @@ _GL_INLINE_HEADER_BEGIN
 //
 
 /* A hamt stores pointers to elements.  Each element has to be a
-   struct whose initial member is of the type Hamt_entry.  An element
-   is conceptually owned by a hamt as soon as it is inserted.  It will
-   be automatically freed as soon as the last hamt containing it is
-   freed.  */
+   struct whose initial member is of the type Hamt_entry.  You need to
+   defin

Re: New function xpalloc in module xalloc

2021-04-04 Thread Marc Nieper-Wißkirchen
SIZE_MAX could be defined as -1 promoted to an unsigned type, meaning that
the unsigned comparison would always yield false. Or am I missing something?

In any case, if the promotion is what is intended, 'nbytes' should be
probably replaced with '(size_t) nbytes' to silence the warning and to make
it explicit.


Am So., 4. Apr. 2021 um 08:24 Uhr schrieb Jeffrey Walton :

> On Sun, Apr 4, 2021 at 2:17 AM Marc Nieper-Wißkirchen
>  wrote:
> >
> > GCC prints the following warning when compiling the new code:
> >
> > lib/xmalloc.c: In function 'xpalloc':
> > lib/xmalloc.c:132:64: warning: comparison of integer expressions of
> different signedness: 'long unsigned int' and 'idx_t' {aka 'long int'}
> [-Wsign-compare]
> >   132 | = ((INT_MULTIPLY_WRAPV (n, item_size, ) || SIZE_MAX <
> nbytes)
> >
> > If I understand the error message correctly, it is because 'nbytes' is a
> signed type while SIZE_MAX is unsigned.
> >
> > Does the comparison make any sense, by the way?
>
> Only for positive integers.
>
> If idx_t is negative, like -1, then -1 will be promoted to an unsigned
> type and then -1 > 0.
>
> Jeff
>


New function xpalloc in module xalloc

2021-04-04 Thread Marc Nieper-Wißkirchen
GCC prints the following warning when compiling the new code:

lib/xmalloc.c: In function 'xpalloc':
lib/xmalloc.c:132:64: warning: comparison of integer expressions of
different signedness: 'long unsigned int' and 'idx_t' {aka 'long int'}
[-Wsign-compare]
  132 | = ((INT_MULTIPLY_WRAPV (n, item_size, ) || SIZE_MAX <
nbytes)

If I understand the error message correctly, it is because 'nbytes' is a
signed type while SIZE_MAX is unsigned.

Does the comparison make any sense, by the way?

Thanks for investigating,

Marc


Re: [PATCH] hamt: New module.

2021-04-03 Thread Marc Nieper-Wißkirchen
Hi Bruno,

thank you very much for all the helpful feedback. I have incorporated your
suggestions, and you can find an updated diff below.

Thanks,

Marc

diff --git a/doc/containers.texi b/doc/containers.texi
index 15c915b93..9bc1ae43d 100644
--- a/doc/containers.texi
+++ b/doc/containers.texi
@@ -35,6 +35,9 @@ log
 Gnulib provides several generic container data types.  They can be used
 to organize collections of application-defined objects.

+@node Ordinary containers
+@subsection Ordinary container data types
+
 @multitable @columnfractions .15 .5 .1 .1 .15
 @headitem Data type
 @tab Details
@@ -599,6 +602,58 @@ For C++, Gnulib provides a C++ template class for each
of these container data t
 @tab @code{"gl_omap.hh"}
 @end multitable

+@node Specialized containers
+@subsection Specialized container data types
+
+The @code{hamt} module implements the hash array mapped trie (HAMT) data
+structure.  This is a data structure that contains (key, value) pairs.
+Lookup of a (key, value) pair given the key is an O(1) operation,
+assuming a good hash function for the keys is employed.
+
+The HAMT data structure is useful when you want modifications (additions
+of pairs, removal, value changes) to be visible only to some part of
+your program, whereas other parts of the program continue to use the
+unmodified HAMT.  The HAMT makes this possible in a space-efficient
+manner: the modified and the unmodified HAMT share most of their
+allocated memory.  It is also time-efficient: Every such modification
+is O(1) on average, again assuming a good hash function for the keys.
+
+A HAMT can be used whenever an ordinary hash table would be used.  It
+does however, provide non-destructive updating operations without the
+need to copy the whole container  On the other hand, a hash table is
+simpler so that its performance may be better when persistence is not
+needed.
+
+For example, a HAMT can be used to model the dynamic environment in a
+LISP interpreter.  Updating a value in the dynamic environment of one
+continuation frame would not modify values in earlier frames.
+
+To use the module, include @code{hamt.h} in your code.  The public
+interface is documented in that header file.  You have to provide a hash
+function and an equivalence relation, which defines key equality.  The
+module includes a test file @code{test-hamt.c}, which demonstrates how
+the API can be used.
+
+In the current implementation, each inner node of the HAMT can store up
+to @math{32 = 2^5} entries and subtries.  Whenever a collision between
+the initial bits of the hash values of two entries would happen, the
+next @math{5} bits of the hash values are examined and the two entries
+pushed down one level in the trie.
+
+HAMTs have the same average access times as hash tables but grow and
+shrink dynamically, so they use memory more economically and do not have
+to be periodically resized.
+
+They were described and analyzed in @cite{Phil Bagwell (2000). Ideal
+Hash Trees (Report). Infoscience Department, École Polytechnique
+Fédérale de Lausanne.}
+
+HAMTs are well-suited to a persistent data structure, which means that
+each updating operation (like inserting, replacing, or removing an
+element) returns a new HAMT while leaving the original one intact.  This
+is achieved through structure sharing, which is even safe in the
+presence of multiple threads when the used C compiler supports atomics.
+
 @ifnottex
 @unmacro log
 @end ifnottex
diff --git a/lib/hamt.h b/lib/hamt.h
index 55ee964dd..25a0ad9f9 100644
--- a/lib/hamt.h
+++ b/lib/hamt.h
@@ -78,10 +78,13 @@ _GL_INLINE_HEADER_BEGIN
 //

 /* A hamt stores pointers to elements.  Each element has to be a
-   struct whose initial member is of the type Hamt_entry.  An element
-   is conceptually owned by a hamt as soon as it is inserted.  It will
-   be automatically freed as soon as the last hamt containing it is
-   freed.  */
+   struct whose initial member is of the type Hamt_entry.  You need to
+   define this struct yourself.  It will typically contain an
+   Hamt_entry, a key, and, optionally, a value.
+
+   An element is conceptually owned by a hamt as soon as it is
+   inserted.  It will be automatically freed as soon as the last hamt
+   containing it is freed.  */
 typedef struct
 {
 #if GL_HAMT_THREAD_SAFE

Am Sa., 3. Apr. 2021 um 21:55 Uhr schrieb Bruno Haible :

> Hi Marc,
>
> > what do you think of the attempt attached below?
>
> I believe that technical documentation of a feature should be written
> to answer the questions in this order:
>   1. What is it? (Short summary)
>   2. When would I use it? (Use cases)
>   3. How do I use it?
>   4. Further details.
>
> The idea is to get the reader understand rapidly whether they want to
> use the feature or not. That is the purpose of parts 1 and 2.
>
> For part 1:
>
> > +The @code{hamt} module provides a persistant version of persistent hash
> > +array mapped tries (HAMTs).
>
> "A persistant version of persistent ..." ?
>
> 

Re: [PATCH] hamt: New module.

2021-04-03 Thread Marc Nieper-Wißkirchen
Hi Bruno,

what do you think of the attempt attached below?

Thanks,

Marc

diff --git a/doc/containers.texi b/doc/containers.texi
index 15c915b93..35caf200c 100644
--- a/doc/containers.texi
+++ b/doc/containers.texi
@@ -35,6 +35,9 @@ log
 Gnulib provides several generic container data types.  They can be used
 to organize collections of application-defined objects.

+@node Ordinary containers
+@subsection Ordinary container data types
+
 @multitable @columnfractions .15 .5 .1 .1 .15
 @headitem Data type
 @tab Details
@@ -599,6 +602,46 @@ For C++, Gnulib provides a C++ template class for each
of these container data t
 @tab @code{"gl_omap.hh"}
 @end multitable

+@node Specialized containers
+@subsection Specialized container data types
+
+The @code{hamt} module provides a persistant version of persistent hash
+array mapped tries (HAMTs).  A HAMT is an array mapped trie in which
+elements are stored according to the initial bits of their hash values.
+
+In the current implementation, each inner node of the HAMT can store up
+to @math{32 = 2^5} elements and subtries.  Whenever a collision between
+the initial bits of the hash values of two elements happens, the next
+@math{5} bits of the hash values are examined and the two elements
+pushed down one level in the trie.
+
+HAMTs have the same average access times as hash tables but grow and
+shrink dynamically, so they use memory more economically and do not have
+to be periodically resized.
+
+They were described and analyzed in @cite{Phil Bagwell (2000). Ideal
+   Hash Trees (Report). Infoscience Department, École Polytechnique
+   Fédérale de Lausanne.}
+
+HAMTs are well-suited to a persistent data structure, which means that
+each updating operation (like inserting, replacing, or removing an
+element) returns a new HAMT while leaving the original one intact.  This
+is achieved through structure sharing, which is even safe in the
+presence of multiple threads when the used C compiler supports atomics.
+
+A HAMT can be used whenever an ordinary hash table would be used.  It
+does however, provide non-destructive updating operations without the
+need to copy the whole container  On the other hand, a hash table is
+simpler so that its performance may be better when persistence is not
+needed.
+
+For example, a HAMT can be used to model the dynamic environment in a
+LISP interpreter.  Updating a value in the dynamic environment of one
+continuation frame would not modify values in earlier frames.
+
+To use the module, include @code{hamt.h} in your code.  The public
+interface is documented in that header file.
+
 @ifnottex
 @unmacro log
 @end ifnottex


Am Sa., 3. Apr. 2021 um 19:02 Uhr schrieb Bruno Haible :

> Hi Marc,
>
> > + hamt: New module.
> > + This module provides (persistent) hash array mapped tries.
> > + * MODULES.html.sh: Add hamt.
> > + * lib/hamt.c: New file.
> > + * lib/hamt.h: New file.
> > + * modules/hamt: New file.
> > + * modules/hamt-tests: New file.
> > + * tests/test-hamt.c: New file.
>
> Would you like to write a bit of documentation for the Gnulib manual
> about it? It does not need to copy the extensive comments in hamt.h.
> It need only answer the questions:
>   - What is a HAMT?
>   - When would I (a programmer) make use of it? How does it compare
> to other container data types?
>   - What is the Gnulib module name and the include file name?
>
> I think it would fit into the section "Container data types"
>
> https://www.gnu.org/software/gnulib/manual/html_node/Container-data-types.html
> .
>
> Find attached the start of a diff to add this documentation.
>
> Bruno
>


Re: Node to first or last element of a sequential list in module list/xlist

2021-04-03 Thread Marc Nieper-Wißkirchen
Hi Bruno,

thank you very much!

Am Sa., 3. Apr. 2021 um 18:28 Uhr schrieb Bruno Haible :

> Marc Nieper-Wißkirchen wrote:
> > For example, given a list of fruits, insert "pear" after each "apple" and
> > "peach" before each "apple". This can be easily done through
> > gl_list_add_before/gl_list_add_after/gl_list_next_node without using
> > invalidated nodes.
>
> Now I see what you mean. Quasi companion functions to gl_list_next_node
> and gl_list_previous_node: gl_list_first_node and gl_list_last_node,
> respectively.
>

I'm sorry for any confusion. I should have expressed myself more clearly in
my initial email.


> > What I want to propose is to allow the NULL value in
> > gl_next_node/gl_previous_node. In this case, gl_next_node shall return
> the
> > first node and gl_previous_node shall return the last node.
>

Yes, gl_list_first_node and gl_list_last_node are indeed much better. I was
only worried about the size of the vtable.


> I don't think this encourages robust programs. If a program passes a NULL
> pointer to gl_list_next_node or gl_list_previous_node, the program should
> better crash.
>
> > PS What can't be done (because gl_list_remove doesn't return a valid
> > (next?) node) is list walking algorithms that both remove and insert
> nodes.
> > For removal, one has to use an iterator.
>
> Yes, some algorithms need a second, temporary list. Not all algorithms
> can be written to use the original list, be efficient in O() terms, and
> be implementation-independent.
>

Speaking of this, this is not always non-trivial if the dispose function is
not NULL. Consider an algorithm that processes one list to produce a new
one. In the end, the old list shall be removed but the items that have
ended up in the new list shall not be disposed. I guess that the canonical
way to achieve this is to use gl_list_set to overwrite the values in the
old list with dummy values (e.g. NULL) so that they are not freed on
eventual removal.

-- Marc


Re: Node to first or last element of a sequential list in module list/xlist

2021-04-03 Thread Marc Nieper-Wißkirchen
Am Sa., 3. Apr. 2021 um 12:14 Uhr schrieb Bruno Haible :

> Marc Nieper-Wißkirchen wrote:
> > > > > I don't understand. You want to use a list_node_t while adding
> nodes to
> > > > > the list? This is invalid, since the comments in gl_list.h say:
> > > > >
> > > > >   /* Type representing the position of an element in the list, in
> a way
> > > > > that
> > > > >  is more adapted to the list implementation than a plain index.
> > > > >  Note: It is invalidated by insertions and removals!  */
> > > > >   typedef struct gl_list_node_impl * gl_list_node_t;
> > > > >
> > > >
> > > > It won't work with removals but it does work with insertions because
> > > > gl_list_add_before/gl_list_add_after/... etc. all return new, valid
> list
> > > > node objects.
> > >
> > > While this may be true for the linked-list implementation, it is not
> true
> > > for the array-list and other implementation. But the point of the
> Gnulib
> > > list module is to allow the developer to switch to a different
> > > implementation
> > > without changing their algorithms. [1]
> > >
> >
> > I understand the point about being able to switch to a different
> > implementation and subscribe to it, but I don't understand why there
> should
> > be a problem with array-list or other implementations.
>
> When a program, say, takes the gl_list_node_t to the 3rd element of a list,
> then inserts an element at the beginning or between the two first elements
> of the list, and then attempts to use said gl_list_node_t:
>   - In the case of a linked-list, it will refer to the 4th element of the
> list,
>   - In the case of an array-list, it will refer to the 3rd element of the
> list,
> that is, to the element that was previously the 2nd one.
>
> You can't write implementation-independent algorithms while doing this
> kind of things.
>

This is not what I have been having in mind and it's clear that such a
thing wouldn't work.

I have been talking about list walking algorithms that insert items in the
list *while* walking (and which cannot be done with iterators).

For example, given a list of fruits, insert "pear" after each "apple" and
"peach" before each "apple". This can be easily done through
gl_list_add_before/gl_list_add_after/gl_list_next_node without using
invalidated nodes.

The only missing piece to have a clear C formulation of the algorithm
(without having to resort to the "set_first (get_first)"-hack) is a
canonical clean way to retrieve the first node (or last node for similar
algorithms).


> > Besides, what's the point of
> > returning nodes if they weren't valid?
>
> They are valid, but only until the next destructive operation.
>

Exactly! And more isn't needed.
Thanks,

Marc
PS What can't be done (because gl_list_remove doesn't return a valid
(next?) node) is list walking algorithms that both remove and insert nodes.
For removal, one has to use an iterator.


Re: Autogenerated header files not included in .gitignore

2021-04-03 Thread Marc Nieper-Wißkirchen
Hi Bruno,

Am Sa., 3. Apr. 2021 um 12:05 Uhr schrieb Bruno Haible :

> Hi Marc,
>
> > If Gnulib is going to be extended to support the second option as well
>
> That's what I'm suggesting, because the second way to manage files appears
> to be the majority one.
>
> > I am not yet sure how. The problem is that the same source tree would
> have to
> > support both options to support both kinds of developers.
>
> No, when .gitignore lists all built files, this holds for all developers.
>

Right. This is even more fundamental than my point about "bootstrap.conf".


> In a project where some developers want the first approach and some want
> the second one, the built files need to be listed in the .git/info/exclude
> file, not in .gitignore. But since .git/info/exclude is not shared among
> developers, it is tedious for every developer to maintain their own
> .git/info/exclude list. Therefore I don't think this "mixed" model is
> widely used.
>

Would it do much harm if ".gitignore" included all built files even for
developers that work in a dedicated build tree?

Adding all built files automatically to ".gitignore" may, however, be
non-trivial because file extensions (like "$EXE") can vary from system to
system.


Re: Node to first or last element of a sequential list in module list/xlist

2021-04-03 Thread Marc Nieper-Wißkirchen
Am Sa., 3. Apr. 2021 um 11:49 Uhr schrieb Bruno Haible :

> Marc Nieper-Wißkirchen wrote:
> > > I don't understand. You want to use a list_node_t while adding nodes to
> > > the list? This is invalid, since the comments in gl_list.h say:
> > >
> > >   /* Type representing the position of an element in the list, in a way
> > > that
> > >  is more adapted to the list implementation than a plain index.
> > >  Note: It is invalidated by insertions and removals!  */
> > >   typedef struct gl_list_node_impl * gl_list_node_t;
> > >
> >
> > It won't work with removals but it does work with insertions because
> > gl_list_add_before/gl_list_add_after/... etc. all return new, valid list
> > node objects.
>
> While this may be true for the linked-list implementation, it is not true
> for the array-list and other implementation. But the point of the Gnulib
> list module is to allow the developer to switch to a different
> implementation
> without changing their algorithms. [1]
>

I understand the point about being able to switch to a different
implementation and subscribe to it, but I don't understand why there should
be a problem with array-list or other implementations. When I look into
"lib/gl_array_list.c", I see that a valid node is returned, namely the node
corresponding to the newly inserted element.  Besides, what's the point of
returning nodes if they weren't valid?  Note that we are not talking about
reusing the node object used as the argument to add_before or add_after.
That is invalid, indeed.


[PATCH] hamt: New module.

2021-04-03 Thread Marc Nieper-Wißkirchen
From: Marc Nieper-Wißkirchen 

This module provides (persistent) hash array mapped tries.
* MODULES.html.sh: Add hamt.
* lib/hamt.c: New file.
* lib/hamt.h: New file.
* modules/hamt: New file.
* modules/hamt-tests: New file.
* tests/test-hamt.c: New file.
---
 ChangeLog  |   11 +
 MODULES.html.sh|1 +
 lib/hamt.c | 1083 
 lib/hamt.h |  253 +++
 modules/hamt   |   29 ++
 modules/hamt-tests |   11 +
 tests/test-hamt.c  |  378 
 7 files changed, 1766 insertions(+)
 create mode 100644 lib/hamt.c
 create mode 100644 lib/hamt.h
 create mode 100644 modules/hamt
 create mode 100644 modules/hamt-tests
 create mode 100644 tests/test-hamt.c

diff --git a/ChangeLog b/ChangeLog
index 6224aa7cb..a8b9aec60 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,14 @@
+2020-04-03  Marc Nieper-Wißkirchen  
+
+   hamt: New module.
+   This module provides (persistent) hash array mapped tries.
+   * MODULES.html.sh: Add hamt.
+   * lib/hamt.c: New file.
+   * lib/hamt.h: New file.
+   * modules/hamt: New file.
+   * modules/hamt-tests: New file.
+   * tests/test-hamt.c: New file.
+
 2021-04-02  Bruno Haible  
 
strtoul, strtoll, strtoull: Fix compilation warning.
diff --git a/MODULES.html.sh b/MODULES.html.sh
index dad6c2a77..646f30b72 100755
--- a/MODULES.html.sh
+++ b/MODULES.html.sh
@@ -2027,6 +2027,7 @@ func_all_modules ()
   func_module hash-pjw
   func_module hash-pjw-bare
   func_module hash
+  func_module hamt
   func_module readline
   func_module readtokens
   func_module readtokens0
diff --git a/lib/hamt.c b/lib/hamt.c
new file mode 100644
index 0..204c2f069
--- /dev/null
+++ b/lib/hamt.c
@@ -0,0 +1,1083 @@
+/* (Persistent) hash array mapped tries.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <https://www.gnu.org/licenses/>.  */
+
+/* Written by Marc Nieper-Wißkirchen , 2021.  */
+
+#include 
+#define _GL_HAMT_INLINE _GL_EXTERN_INLINE
+#include "hamt.h"
+
+#include 
+#include 
+#include 
+#include "count-one-bits.h"
+#include "verify.h"
+#include "xalloc.h"
+
+/* Reference counters can be shared between different threads if the
+   entry they belong to is shared between different threads.
+   Operations on them therefore have to be atomic so that no invalid
+   state is observable.
+
+   A thread must not modify an entry or its children (!) if its
+   reference count implies that the entry is shared by at least two
+   hamts.  */
+typedef
+#if GL_HAMT_THREAD_SAFE
+_Atomic
+#endif
+size_t ref_counter;
+
+/***/
+/* Entry Types */
+/***/
+
+/* Leaf nodes are of type element.  Non-leaf nodes are either subtries
+   or, if at maximal depth, buckets.  The entry type is stored in the
+   lower two bits of the reference counter, whereas reference counters
+   for entries are incremented and decremented in multiples of 4.  */
+enum entry_type
+{
+  element_entry = 0,
+  subtrie_entry = 1,
+  bucket_entry = 2
+};
+
+/* Return the type an entry.  */
+static enum entry_type
+entry_type (const Hamt_entry *entry)
+{
+  return entry->ref_count & 3;
+}
+
+//
+/* Reference Counts */
+//
+
+/* Initialize the reference counter, storing its type.  */
+static void
+init_ref_counter (ref_counter *counter, enum entry_type type)
+{
+  *counter = 4 + type;
+}
+
+/* Increase the reference counter of an entry.  */
+static void
+inc_ref_counter (ref_counter *counter)
+{
+  *counter += 4;
+}
+
+/* Decrease the entry reference counter.  Return false if the entry
+   can be deleted.  */
+static bool
+dec_ref_counter (ref_counter *counter)
+{
+  *counter -= 4;
+  return *counter >= 4;
+}
+
+/**/
+/* Structures */
+/**/
+
+/* Different generations of a hamt share a function table.  */
+struct function_table
+{
+  Hamt_hasher *hasher;
+  Hamt_comparator *comparator;
+  Hamt_freer *freer;
+  ref_counter ref_count;
+};
+
+/* Different generations of a hamt share subtries.  A singleton
+   subtrie is modelled as a single element.  */
+struct subtrie
+{
+  ref_counter ref_count;
+  /* Nodes carry labels from 0 to 31.  The i-th bit in MAP is set if
+ the node labelled i is present.  */
+  uint32_t map;
+  /* The length of the NODES array is t

Re: Non-opaque hamt type?

2021-04-03 Thread Marc Nieper-Wißkirchen
Please excuse the delay in finalizing the new module. I was distracted due
to the start of the semester in October last year and then forgot to finish
my work.

To summarize, I have finally come to the conclusion not to change the API
as theorized in this thread.

First of all, the benefits of making the hamt type non-opaque are too small
compared with the possible drawbacks (the non-opaqueness, the inability to
return NULL in future API extensions, etc.).

Secondly, after having given it some more thought, the alternative protocol
(which we have called more robust) seems to be harder to understand because
"p != e" could then mean two different things. So I will leave the original
protocol in place, which is easy to comprehend: If "old_hamt == new_hamt",
no insertion has taken place and one has manually free the element one has
attempted to insert. If "old_hamt != new_hamt" the element has been
inserted and has now to eventually free "new_hamt" besides "old_hamt".

After I have rebased my code to HEAD, I will commit the new module to
Gnulib.

Thank you for your patience.

Marc

Am So., 18. Okt. 2020 um 20:11 Uhr schrieb Marc Nieper-Wißkirchen <
marc.nieper+...@gmail.com>:

> Okay, if you find the latter protocol better anyway, I will switch to
> this protocol, and hamts will be stack-allocated (just two words) and
> passed by value.
>
> Thanks,
>
> Marc
>
> Am So., 18. Okt. 2020 um 19:58 Uhr schrieb Bruno Haible :
> >
> > Marc Nieper-Wißkirchen wrote:
> > > The existing protocol is as follows:
> > >
> > > Hamt_entry *e = hamt_entry (...);
> > > Hamt_entry *p = e;
> > > Hamt *new_hamt = hamt_insert (old_hamt, );
> > > if (old_hamt == new_hamt)
> > >   {
> > > /* The element hasn't been insert as an equivalent element has
> already been in the hamt. p now holds a reference to the entry that already
> existed in the hamt.
> > > element_free (e);
> > > ...
> > > hamt_free (old_hamt); /* We don't have to free new_hamt because no
> new hamt was created. */
> > >   }
> > > else
> > >   {
> > > /* The element has been inserted. p hasn't changed. */
> > > ...
> > > hamt_free (old_hamt);  /* This frees all hamts */
> > > hamt_free (new_hamt); /* and all elements inserted, including e. */
> > >   }
> > >
> > > A protocol where no pointer values need to be compared could use p to
> > > carry the information:
> > >
> > > Hamt_entry *e = hamt_entry ();
> > > Hamt_entry *p = e;
> > > Hamt new_hamt = hamt_insert (old_hamt, );
> > > if (p == e)
> > >   {
> > > /* The element has been inserted. */
> > > ... /* See above. */
> > >   }
> > > else if (p == NULL)
> > >   {
> > > /* The element e already existed in the hamt. */
> > >... /* See above. */
> > >   }
> > > else /* p != e && p != NULL */
> > >   {
> > > /* An element equivalent to e already existed in the hamt. p now
> holds this element. */
> > > ... /* See above. */
> > >   }
> >
> > I find the latter protocol more robust: it does not depend on details of
> > the implementation of hamt_insert.
> >
> > Can you decide on your original question (allow stack-allocated HAMTs)?
> > I don't feel I can help you decide this.
> >
> > Bruno
> >
>


Re: Node to first or last element of a sequential list in module list/xlist

2021-04-03 Thread Marc Nieper-Wißkirchen
Hi,

Am Sa., 3. Apr. 2021 um 00:49 Uhr schrieb Bruno Haible :

> This is useful if the list has to be traversed while nodes are added
> > because an iterator cannot be used here.
>
> I don't understand. You want to use a list_node_t while adding nodes to
> the list? This is invalid, since the comments in gl_list.h say:
>
>   /* Type representing the position of an element in the list, in a way
> that
>  is more adapted to the list implementation than a plain index.
>  Note: It is invalidated by insertions and removals!  */
>   typedef struct gl_list_node_impl * gl_list_node_t;
>

It won't work with removals but it does work with insertions because
gl_list_add_before/gl_list_add_after/... etc. all return new, valid list
node objects.


> > What I want to propose is to allow the NULL value in
> > gl_next_node/gl_previous_node. In this case, gl_next_node shall return
> the
> > first node and gl_previous_node shall return the last node.
>
> It's not useful to extend an API
>   - for a rare fringe use-case,
>

Why do you think it is a fringe use-case? Algorithms that walk a list and
add nodes during the traversal are not uncommon.

  - when there are already two other ways to do what you ask for (with the
> same O(·) performance).
>

User's code would be more self-documenting if a "hack" like

gl_list_node_t last_node = gl_list_set_last (list, gl_get_last (list))

wouldn't be needed.

Thanks,

Marc


Re: Autogenerated header files not included in .gitignore

2021-04-03 Thread Marc Nieper-Wißkirchen
Hi Bruno,

thank you very much for your prompt reply.

Am Sa., 3. Apr. 2021 um 01:00 Uhr schrieb Bruno Haible :

As far as I know, there are two practices regarding .gitignore management:
>
>   * Some developers want a very clean source directory at all times.
> They always to VPATH builds. They want to have in .gitignore the
> files brought in by Automake and Gnulib, but not the built files
> (*.o, alloca.h ... wctype.h, $(BUILT_SOURCES), etc.)
>
>   * Some developers don't want to think much when building a package.
> They often build in the source dir. They want to have in .gitignore
> also the built files (*.o, alloca.h ... wctype.h, $(BUILT_SOURCES),
> etc.),
> even the log files left my "make check".
>
> Gnulib, so far, is tailored for the first practice. How, would you say,
> should we handle these two (incompatible) expectations?
>
>   - A gnulib-tool option --vc-files=... (two possible options)?
>   - Let gnulib-tool guess which of the two possible options is desired,
> based on the existing contents of the .gitignore files?
>   - Other ideas?
>

Thank you for the detailed explanation. As you will have deduced, I have
been using the second option, the build in the source tree.

I can switch to the first option so that the issue disappears for me. If
Gnulib is not going to be extended to support the second option as well, it
may be a good idea to add something along the lines of your remark above to
section 3.12 of its manual ([1]).

If Gnulib is going to be extended to support the second option as well, I
am not yet sure how. The problem is that the same source tree would have to
support both options to support both kinds of developers. This rules out
the "--vc-files" option because it would appear in "bootstrap.conf" (if
"gnulib-tool" is not used directly), which isn't developer-specific.

If development model #2 is going to be supported, the only feasible option
I see here would be to add the autogenerated header files non-conditionally
(and to rely on user-provided ".gitignore" fragments to ignore object
files, log files, etc.)

--

[1] https://www.gnu.org/software/gnulib/manual/html_node/VCS-Issues.html


Autogenerated header files not included in .gitignore

2021-04-02 Thread Marc Nieper-Wißkirchen
A number of autogenerated header files are not added to ".gitignore" in
"$source_base".

For example, in a current project of mine,

$ git status

shows

lib/alloca.h
lib/fcntl.h
lib/inttypes.h
lib/langinfo.h
lib/limits.h
lib/locale.h
lib/signal.h
lib/stdio.h
lib/stdlib.h
lib/string.h
lib/sys/
lib/time.h
lib/unistd.h
lib/wchar.h
lib/wctype.h

Currently, Gnulib only adds "in"-files like "alloca.h.in" to ".gitignore".

Marc


Node to first or last element of a sequential list in module list/xlist

2021-04-02 Thread Marc Nieper-Wißkirchen
Hi!

At the moment, there does not seem to be an easy way to retrieve the node
of the first or the last element in a list.

This is useful if the list has to be traversed while nodes are added
because an iterator cannot be used here.

What is currently possible is to use get_first/set_first (and
get_last/set_last) to get hold of the first (and last) node but this is
clearly a hack.

Another way to get hold of the first node is to create an iterator and to
call its next procedure just once. But this is overkill if the iterator
isn't used afterward.

What I want to propose is to allow the NULL value in
gl_next_node/gl_previous_node. In this case, gl_next_node shall return the
first node and gl_previous_node shall return the last node.

Thanks,

Marc


Re: Non-opaque hamt type?

2020-10-18 Thread Marc Nieper-Wißkirchen
Okay, if you find the latter protocol better anyway, I will switch to
this protocol, and hamts will be stack-allocated (just two words) and
passed by value.

Thanks,

Marc

Am So., 18. Okt. 2020 um 19:58 Uhr schrieb Bruno Haible :
>
> Marc Nieper-Wißkirchen wrote:
> > The existing protocol is as follows:
> >
> > Hamt_entry *e = hamt_entry (...);
> > Hamt_entry *p = e;
> > Hamt *new_hamt = hamt_insert (old_hamt, );
> > if (old_hamt == new_hamt)
> >   {
> > /* The element hasn't been insert as an equivalent element has already 
> > been in the hamt. p now holds a reference to the entry that already existed 
> > in the hamt.
> > element_free (e);
> > ...
> > hamt_free (old_hamt); /* We don't have to free new_hamt because no new 
> > hamt was created. */
> >   }
> > else
> >   {
> > /* The element has been inserted. p hasn't changed. */
> > ...
> > hamt_free (old_hamt);  /* This frees all hamts */
> > hamt_free (new_hamt); /* and all elements inserted, including e. */
> >   }
> >
> > A protocol where no pointer values need to be compared could use p to
> > carry the information:
> >
> > Hamt_entry *e = hamt_entry ();
> > Hamt_entry *p = e;
> > Hamt new_hamt = hamt_insert (old_hamt, );
> > if (p == e)
> >   {
> > /* The element has been inserted. */
> > ... /* See above. */
> >   }
> > else if (p == NULL)
> >   {
> > /* The element e already existed in the hamt. */
> >... /* See above. */
> >   }
> > else /* p != e && p != NULL */
> >   {
> > /* An element equivalent to e already existed in the hamt. p now holds 
> > this element. */
> > ... /* See above. */
> >   }
>
> I find the latter protocol more robust: it does not depend on details of
> the implementation of hamt_insert.
>
> Can you decide on your original question (allow stack-allocated HAMTs)?
> I don't feel I can help you decide this.
>
> Bruno
>



Re: Non-opaque hamt type?

2020-10-18 Thread Marc Nieper-Wißkirchen
Am So., 18. Okt. 2020 um 16:39 Uhr schrieb Bruno Haible :
>
> Hi Marc,
>
> > At the moment, the header file exposes an opaque struct Hamt and
> > communication with the code happens through (stack-allocated) pointers
> > to a Hamt. In the implementation, however, each Hamt just consists of
> > two pointers (a pointer to a function table and a pointer to the
> > root), so stack-allocating Hamts instead and passing them around by
> > values makes as much sense.
> >
> > This would have the advantage of reducing heap allocation.
>
> By how much does it reduce the heap allocation? Say, someone allocates
> a HAMT and adds 100 entries. There will be heap allocations of ca. 5-10
> buckets and internal nodes. Adding one more heap allocation for the
> root object is not a tragedy.

A struct with just two pointers is allocated on the heap, meaning it
is probably below the minimum malloc size on most implementations. In
architectures where structs of two pointers are passed by value in and
out of functions, a heap-allocated structure will mean one more
pointer indirection per hamt access.

> So, in the end the question is whether to optimize the case of small HAMTs.
>
> > The disadvantage would be that adding further fields to a Hamt in future
> > extensions might be more problematic
>
> True. But when this happens we can mitigate this through a warning in the
> NEWS file.
>
> > and that identity of Hamts cannot
> > be decided through pointer identity anymore (so the protocol how to
> > decide whether an element has been inserted or not has to be changed).
>
> Does this protocol change introduce a significant slowdown?

The existing protocol is as follows:

Hamt_entry *e = hamt_entry (...);
Hamt_entry *p = e;
Hamt *new_hamt = hamt_insert (old_hamt, );
if (old_hamt == new_hamt)
  {
/* The element hasn't been insert as an equivalent element has
already been in the hamt. p now holds a reference to the entry that
already existed in the hamt.
element_free (e);
...
hamt_free (old_hamt); /* We don't have to free new_hamt because no
new hamt was created. */
  }
else
  {
/* The element has been inserted. p hasn't changed. */
...
hamt_free (old_hamt);  /* This frees all hamts */
hamt_free (new_hamt); /* and all elements inserted, including e. */
  }

A protocol where no pointer values need to be compared could use p to
carry the information:

Hamt_entry *e = hamt_entry ();
Hamt_entry *p = e;
Hamt new_hamt = hamt_insert (old_hamt, );
if (p == e)
  {
/* The element has been inserted. */
... /* See above. */
  }
else if (p == NULL)
  {
/* The element e already existed in the hamt. */
   ... /* See above. */
  }
else /* p != e && p != NULL */
  {
/* An element equivalent to e already existed in the hamt. p now
holds this element. */
... /* See above. */
  }

Marc



Re: Non-opaque hamt type?

2020-10-12 Thread Marc Nieper-Wißkirchen
One final issue (I hope):

At the moment, the header file exposes an opaque struct Hamt and
communication with the code happens through (stack-allocated) pointers
to a Hamt. In the implementation, however, each Hamt just consists of
two pointers (a pointer to a function table and a pointer to the
root), so stack-allocating Hamts instead and passing them around by
values makes as much sense.

This would have the advantage of reducing heap allocation. The
disadvantage would be that adding further fields to a Hamt in future
extensions might be more problematic and that identity of Hamts cannot
be decided through pointer identity anymore (so the protocol how to
decide whether an element has been inserted or not has to be changed).

What do you think?

Am So., 11. Okt. 2020 um 21:09 Uhr schrieb Bruno Haible :

> OK for me. Thanks for having listened to the many remarks.

Your remarks only helped to make the module better!

Marc



Re: Draft #3 (with iterators)

2020-10-11 Thread Marc Nieper-Wißkirchen
I have implemented everything we have discussed today. The patch
versus the current master is attached so it can be reviewed.

The changes versus the previous draft can be summarized as follows:

* Bug fixes.
* Use _Atomic on GCC 4.9+.
* Implement a lightweight iterator type akin to the iterators of gl_list.
* Expose element initialization to the user so than an element can be
inserted in more than one hamt.
* Rename delete to remove.
* Improve documentation.

Future options for when the code has matured:

* Inline a number of subtrie procedures to get rid of forward
declarations to help compilers.
* Implement purely non-pure hamts to replace ordinary hash tables.
* Add _nx versions of the procedures that won't call xalloc_die.

Many thanks to Bruno for his support, guidance and his great suggestions.

Marc

Am So., 11. Okt. 2020 um 19:32 Uhr schrieb Marc Nieper-Wißkirchen
:
>
> Am So., 11. Okt. 2020 um 10:20 Uhr schrieb Marc Nieper-Wißkirchen
> :
> >
> > Am So., 11. Okt. 2020 um 03:28 Uhr schrieb Bruno Haible :
>
> [...]
>
> > > * hamt_lookup: If the caller is allowed to modify the payload stored in 
> > > the
> > >   returned entry, this function should return a 'Hamt_entry *', not a
> > >   'const Hamt_entry *'. Just like 'strchr' and 'strstr' return a 'char *',
> > >   not a 'const char *'.
> >
> > Unless the caller knows what it does, modifying the payload is not a
> > good idea because the entry is shared between different hamts. If the
> > caller really wants to modify the payload, it has to do an explicit
> > type cast (which is safe).
>
> I have noticed a problem with the current design: While an element can
> be in more than one hamt (because copies of hamts are created through
> various operations), an element cannot be actively inserted in more
> than one hamt. The reason is that the reference counter of the element
> is initialized whenever the element is inserted.
>
> The way out is to expose the initialization function to the user, who
> becomes responsible for initializing each element exactly once.
>
> As soon as it is possible to insert an element more than once, another
> observation will be made: The insert procedure does not accept a
> pointer to a const element because it must be able to change the
> reference counter internally. Thus it is more convenient if procedures
> like hamt_lookup do not return const versions.
From ece7b9e3cd090e9c084efd72677669130e80dd9c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marc=20Nieper-Wi=C3=9Fkirchen?= 
Date: Sat, 10 Oct 2020 23:38:21 +0200
Subject: [PATCH] hamt: New module.

This module provides (persistent) hash array mapped tries.
* MODULES.html.sh: Add hamt.
* lib/hamt.c: New file.
* lib/hamt.h: New file.
* modules/hamt: New file.
* modules/hamt-tests: New file.
* tests/test-hamt.c: New file.
---
 ChangeLog  |   11 +
 MODULES.html.sh|1 +
 lib/hamt.c | 1083 
 lib/hamt.h |  253 +++
 modules/hamt   |   29 ++
 modules/hamt-tests |   11 +
 tests/test-hamt.c  |  378 
 7 files changed, 1766 insertions(+)
 create mode 100644 lib/hamt.c
 create mode 100644 lib/hamt.h
 create mode 100644 modules/hamt
 create mode 100644 modules/hamt-tests
 create mode 100644 tests/test-hamt.c

diff --git a/ChangeLog b/ChangeLog
index 22f79fb09..7263e628f 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,14 @@
+2020-10-11  Marc Nieper-Wißkirchen  
+
+	hamt: New module.
+	This module provides (persistent) hash array mapped tries.
+	* MODULES.html.sh: Add hamt.
+	* lib/hamt.c: New file.
+	* lib/hamt.h: New file.
+	* modules/hamt: New file.
+	* modules/hamt-tests: New file.
+	* tests/test-hamt.c: New file.
+
 2020-10-11  Bruno Haible  
 
 	stdioext: Update comments regarding UnixWare.
diff --git a/MODULES.html.sh b/MODULES.html.sh
index 7e7cdae3e..2907eb741 100755
--- a/MODULES.html.sh
+++ b/MODULES.html.sh
@@ -2028,6 +2028,7 @@ func_all_modules ()
   func_module hash-pjw
   func_module hash-pjw-bare
   func_module hash
+  func_module hamt
   func_module readline
   func_module readtokens
   func_module readtokens0
diff --git a/lib/hamt.c b/lib/hamt.c
new file mode 100644
index 0..f92d9c4e8
--- /dev/null
+++ b/lib/hamt.c
@@ -0,0 +1,1083 @@
+/* (Persistent) hash array mapped tries.
+   Copyright (C) 2020 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License fo

Re: [New module] Persistent Hash Array Mapped Tries (HAMTs)

2020-10-11 Thread Marc Nieper-Wißkirchen
Am So., 11. Okt. 2020 um 10:20 Uhr schrieb Marc Nieper-Wißkirchen
:
>
> Am So., 11. Okt. 2020 um 03:28 Uhr schrieb Bruno Haible :

[...]

> > * hamt_lookup: If the caller is allowed to modify the payload stored in the
> >   returned entry, this function should return a 'Hamt_entry *', not a
> >   'const Hamt_entry *'. Just like 'strchr' and 'strstr' return a 'char *',
> >   not a 'const char *'.
>
> Unless the caller knows what it does, modifying the payload is not a
> good idea because the entry is shared between different hamts. If the
> caller really wants to modify the payload, it has to do an explicit
> type cast (which is safe).

I have noticed a problem with the current design: While an element can
be in more than one hamt (because copies of hamts are created through
various operations), an element cannot be actively inserted in more
than one hamt. The reason is that the reference counter of the element
is initialized whenever the element is inserted.

The way out is to expose the initialization function to the user, who
becomes responsible for initializing each element exactly once.

As soon as it is possible to insert an element more than once, another
observation will be made: The insert procedure does not accept a
pointer to a const element because it must be able to change the
reference counter internally. Thus it is more convenient if procedures
like hamt_lookup do not return const versions.



Re: terminology

2020-10-11 Thread Marc Nieper-Wißkirchen
Am So., 11. Okt. 2020 um 16:14 Uhr schrieb Bruno Haible :
>
> Hi Marc,
>
> > I have attached an improved version of the HAMT module to this email.
>
> How about terminology: "delete" vs. "remove"?

> In this sense, 'hamt_delete' is triggering the wrong associations.
> How about renaming 'hamt_remove'?
>
> Deleting an entry from a hash table or HAMT does not always mean to
> delete the object that the entry references.
>
> The Java collections [1], C# collections [2], Python collections [3]
> all use the verb "remove".

I'm fine with this change and agree with your reasoning. I used the
hash module (see your comment below) as a guide for the interface,
that's why I called the procedure hamt_delete in the first place.

> Yes, we still have hash_delete (in module 'hash') and 'argz_delete' (in
> module 'argz'); these are very old modules.

Marc



Re: HAMT iterators

2020-10-11 Thread Marc Nieper-Wißkirchen
Am So., 11. Okt. 2020 um 12:29 Uhr schrieb Bruno Haible :

> The recursion across hamt_do_while / entry_do_while / subtrie_do_while /
> bucket_do_while builds up a stack whose essential contents is a set of
> indices, each at most 5 bits large, with a total of SIZE_WIDTH bits.
> In other words, the iterator object can be a

Yes, this is roughly how the implementation I worked on this morning
works. However, I haven't yet implemented the 5-bit compression.

> struct
>   {
> const Hamt *hamt;
> size_t position;
> int depth;
> const Hamt_entry *entry;
>   };
>
> where during the iteration
>   - hamt stays unmodified,
>   - position is incremented by some amount at each round,
>   - depth is the recursion depth.
>   - entry is a cache of hamt->nodes[(position >> 59) & 31][(position >> 54) & 
> 31]...
> with depth-1 indexing operations, and is changed at each round,
>
> HASH-TABLE-ITERATOR sets position, depth, entry to point to the first entry.
> HASH-TABLE-ITERATE works as follows:
>   - in a bucket, it increments position and fetches the next entry of the 
> bucket,

For a bucket, in the worst case, we need the full size_t range for
position, so we have to store the path in the tree that leads to the
bucket in another size_t variable.

>   - if the bucket is done, it decreases depth and goes to
> hamt->nodes[(position >> 59) & 31][(position >> 54) & 31]... (with depth-1
> indexing operations) until it finds a subtrie that is not yet done,

This means a lot of pointer operations if large subtries are processed
that are deep in the trie. The hamt_do_while operation stores that
chain of entries from the root implicitly on the stack. The same
should be done for the iterator, meaning that an array of MAX_DEPTH
entries has to be cached. On a 32 bit system, this means 28 bytes,
and, on a 64 bit system, 130 bytes.

> then it increases depth, entry to point to the first entry in that 
> subtrie.
>
> If done this way, the iterator will fit into the gl_set_iterator_t and
> gl_map_iterator_t that we already have in gl_set.h and gl_map.h.
>
> Bruno



Re: HAMT iterator

2020-10-11 Thread Marc Nieper-Wißkirchen
Am So., 11. Okt. 2020 um 14:04 Uhr schrieb Bruno Haible :
>
> Marc Nieper-Wißkirchen wrote:
> > > > /* Return an independent copy of ITER that is initially in the same
> > > >state.  */
> > > > extern Hamt_iterator *hamt_iterator_copy (Hamt_iterator *iter);
> > >
> > > Then a copy function is not needed, because the user's program can do
> > >
> > >   Hamt_iterator iter_clone = iter;
> >
> > The hamt itself has to be copied (to increase the reference counter).
>
> Then the comment should clarify what "independent" means. I thought it
> means that both iterators (the original one and the copy) can be used
> simultaneously, as long as the HAMT does not change. Do you mean
> something broader?
>   - If someone creates a derivative of the HAMT, the iterator won't
> be affected, right? ("persistence")

Yes.

>   - If someone makes destructive modifications to the HAMT (through the
> *_x functions), the iterator will be affected if it has not yet
> passed the point of modification, right?

No, the iterator shouldn't be affected. (The point of modification is
not well-defined without exposing implementation details.)

> So, what is the scenario where increasing the reference count will make
> a difference?

If you have a language with GC like Lisp or Scheme, say, the hamt may
be GC'd while the iterator is still live.



Re: HAMT iterator

2020-10-11 Thread Marc Nieper-Wißkirchen
Am So., 11. Okt. 2020 um 14:14 Uhr schrieb Bruno Haible :

[...]

> > > > /* Return an independent copy of ITER that is initially in the same
> > > >state.  */
> > > > extern Hamt_iterator *hamt_iterator_copy (Hamt_iterator *iter);
> > >
> > > Then a copy function is not needed, because the user's program can do
> > >
> > >   Hamt_iterator iter_clone = iter;
> >
> > The hamt itself has to be copied (to increase the reference counter).
>
> Whether copying an iterator can be done by assignment or requires a function
> call, is of second importance.
>
> The more important point I wanted to make: Does allocating an iterator
> require a (heap) memory allocation?
>
> If you iterate with do_while, no memory allocation is needed, because all
> information is stored on the stack, between the various function calls.
>
> If you iterate with hamt_iterator_create, and the Hamt_iterator type
> has bounded size (I guess it will not require more than 15 pointers and
> 13 indices), it can be stored on the caller's stack.

That's a very good point you make. The hamt iterator has, of course,
(low) bounded size, so it can (and should) be allocated on the stack.



Re: out-of-memory handling

2020-10-11 Thread Marc Nieper-Wißkirchen
Am So., 11. Okt. 2020 um 13:56 Uhr schrieb Bruno Haible :
>
> Marc Nieper-Wißkirchen wrote:
> > I am not so convinced of whether it makes sense to create a
> > gl_set/gl_map frontend for this hamt implementation.
>
> Wikipedia [1] lists some advantages of HAMTs even without the persistence.
>
> > It is optimized
> > for the persistence (otherwise, use ordinary hash tables), while the
> > gl_set/gl_map interface is for destructive updates.
>
> How would a HAMT implementation look like that does not support persistence,
> but is instead optimized for destructive updates? Probably the reference
> counters would go away. Anything else?

The reference counter would go away as would all tests for "shared" in the code.

Without a reference counter, an element can be any "void *" (instead
of a pointer to a struct starting with Hamt_entry), which is a
substantial change.

We can add a number of #if's to the code so that it either compiles to
an implementation of persistent or of transient hamts, but I wouldn't
rather do it right now because it would make the code harder to read.
But when the hamt code has become stable (and has been put to use and,
therefore, to real-world testing), this sounds like a feasible option.



Re: HAMT iterator

2020-10-11 Thread Marc Nieper-Wißkirchen
Am So., 11. Okt. 2020 um 13:02 Uhr schrieb Bruno Haible :
>
> Hi Marc,
>
> > I am going to implement the following iterator API as well:
> >
> > /* An alternative interface to iterating through the entry of a hamt
> >that does not make use of a higher-order function like
> >hamt_do_while uses the opaque Hamt_iterator type.  */
> > typedef struct hamt_iterator Hamt_iterator;
> >
> > /* Return a newly created iterator for HAMT.  */
> > extern Hamt_iterator *hamt_iterator_create (const Hamt *hamt);
>
> The pointer return here is overkill. A prototype
>
>   extern Hamt_iterator hamt_iterator_create (const Hamt *hamt);
>
> is sufficient, because the way such an iterator is used is:
>
>   Hamt_iterator iter = hamt_iterator_create (hamt);
>
>   for (...)
> {
>   ...
>   Hamt_entry *elt;
>   if (hamt_iterator_next (, ))
> {
>   ...
> }
>   ...
> }
>
>   hamt_iterator_free ();
>
> > /* Return an independent copy of ITER that is initially in the same
> >state.  */
> > extern Hamt_iterator *hamt_iterator_copy (Hamt_iterator *iter);
>
> Then a copy function is not needed, because the user's program can do
>
>   Hamt_iterator iter_clone = iter;

The hamt itself has to be copied (to increase the reference counter).

> > /* Return true if ITER is not at the end, place the current element in
> >*ELT_PTR and move the iterator forward.  Otherwise, return
> >*false.  */
> > extern bool hamt_iterator_next (Hamt_iterator *iter,
> > Hamt_entry *const *elt_ptr);
>
> The 'const' here forbids assigning to *ELT_PTR.

Yes. Wrong position of const.



Re: out-of-memory handling

2020-10-11 Thread Marc Nieper-Wißkirchen
Am So., 11. Okt. 2020 um 12:54 Uhr schrieb Bruno Haible :

> > The bigger problem is that it mustn't make the code slower for the
> > 99,9% of the use cases where there is either enough memory or
> > calling xalloc_die is the only reasonable action.
>
> OK. When we add hamt as a possible implementation for gl_set and gl_map, we
> could document that out-of-memory handling must be done
>   - either through xalloc_die(),
>   - or by defining xmalloc and xrealloc [this is what Emacs does],
> unlike for the other implementations.

I am not so convinced of whether it makes sense to create a
gl_set/gl_map frontend for this hamt implementation. It is optimized
for the persistence (otherwise, use ordinary hash tables), while the
gl_set/gl_map interface is for destructive updates.



Re: [New module] Persistent Hash Array Mapped Tries (HAMTs)

2020-10-11 Thread Marc Nieper-Wißkirchen
I am going to implement the following iterator API as well:

/* An alternative interface to iterating through the entry of a hamt
   that does not make use of a higher-order function like
   hamt_do_while uses the opaque Hamt_iterator type.  */
typedef struct hamt_iterator Hamt_iterator;

/* Return a newly created iterator for HAMT.  */
extern Hamt_iterator *hamt_iterator_create (const Hamt *hamt);

/* Free the iterator ITER created through hamt_iterator_create or
   hamt_iterator_copy.  */
extern void hamt_iterator_free (Hamt_iterator *iter);

/* Return an independent copy of ITER that is initially in the same
   state.  */
extern Hamt_iterator *hamt_iterator_copy (Hamt_iterator *iter);

/* Return true if ITER is not at the end, place the current element in
   *ELT_PTR and move the iterator forward.  Otherwise, return
   *false.  */
extern bool hamt_iterator_next (Hamt_iterator *iter,
Hamt_entry *const *elt_ptr);

Am So., 11. Okt. 2020 um 10:20 Uhr schrieb Marc Nieper-Wißkirchen
:

> Am So., 11. Okt. 2020 um 03:28 Uhr schrieb Bruno Haible :

> >- Creating a closure in C (unlike Scheme or Lisp) is tedious 
> > hand-work.
> >  (The GNU C extension that allows local functions inside functions 
> > [1]
> >  is not a solution, because
> >- it is unportable,
> >- it forces an executable stack on many platforms, which is
> >  considered bad for security.)

PS Conceptually, hamt_do_while takes a closure, which consists of the
function pointer PROC and the closure environment DATA.

If this interface is sufficient for an application, it is more
lightweight than the alternative interface above because all
intermediate state is stored on the stack.



Re: [New module] Persistent Hash Array Mapped Tries (HAMTs)

2020-10-11 Thread Marc Nieper-Wißkirchen
Am So., 11. Okt. 2020 um 03:28 Uhr schrieb Bruno Haible :
>
> Hi Marc,
>
> Some comments:
>
> * GL_HAMT_THREAD_SAFE can be defined to 1 not only if
> __STDC_VERSION__ >= 201112 && ! defined __STD_NO_ATOMICS__
>   but also if
> __GNUC__ + (__GNUC_MINOR >= 9) > 4

Fixed.

>   (see the other mail).
>
> * Still '#ifdef GL_HAMT_THREAD_SAFE'.

Fixed.

> * Typo s/comparision/comparison/

Fixed.

> * Hamt_processor: The comment does not say what the return value means. Maybe 
> it
>   would be clearer if this type was moved down to the "Iteration" section.

Moved.

> * hamt_lookup: If the caller is allowed to modify the payload stored in the
>   returned entry, this function should return a 'Hamt_entry *', not a
>   'const Hamt_entry *'. Just like 'strchr' and 'strstr' return a 'char *',
>   not a 'const char *'.

Unless the caller knows what it does, modifying the payload is not a
good idea because the entry is shared between different hamts. If the
caller really wants to modify the payload, it has to do an explicit
type cast (which is safe).

> * In trienode_count, you don't need count_one_bits_l, only count_one_bits.
>   It's OK to assume that 'uint32_t' and 'unsigned int' are the same type.

Okay.

> * If subtrie_lookup was inlined in entry_lookup, you could get rid of the
>   forward declaration of entry_lookup, and compilers would be more likely
>   to turn the recursion of entry_lookup into a loop (tail-recursion
>   elimination).

I would prefer to have smaller functions. Are there any relevant
compilers who do not inline the function at higher recursion levels?

> * If subtrie_do_while was inlined in entry_do_while, you could get rid of the
>   forward declaration of entry_do_while.

> * It would be useful to be able to construct modules 'hamt-set' and 
> 'hamt-map',
>   analogous to 'hash-set' and 'hash-map', but the hamt API currently has two
>   issues that block it:
>
>   1) The iterator API is such that you cannot write the iteration using a 
> loop;
>  it involves a function call that is active during the entire iteration.
>  This is also problematic for two other reasons:
>
>- Creating a closure in C (unlike Scheme or Lisp) is tedious hand-work.
>  (The GNU C extension that allows local functions inside functions [1]
>  is not a solution, because
>- it is unportable,
>- it forces an executable stack on many platforms, which is
>  considered bad for security.)
>
>- In Common Lisp, iteration through a hash table is available not only
>  through a function-call interface (MAPHASH [2]) but also through an
>  iteration-loop interface (WITH-HASH-TABLE-ITERATOR [3], LOOP [4]).
>
>  For example, in GNU clisp, LOOP expands to
>
>  [1]> (macroexpand-1 '(loop for x being each hash-key of h do (foo 
> x)))
>  (MACROLET ((LOOP-FINISH NIL (SYSTEM::LOOP-FINISH-ERROR)))
>   (BLOCK NIL
>(LET ((#:WHTI-3214 (SYSTEM::HASH-TABLE-ITERATOR H)))
> (MULTIPLE-VALUE-BIND (#:MORE?3215 #:HASH-KEY-3216 
> #:HASH-VALUE-3217)
>  (SYSTEM::HASH-TABLE-ITERATE #:WHTI-3214)
>  (DECLARE (IGNORE #:HASH-VALUE-3217))
>  (LET ((X NIL))
>   (LET NIL
>(MACROLET ((LOOP-FINISH NIL '(GO SYSTEM::END-LOOP)))
> (TAGBODY SYSTEM::BEGIN-LOOP (UNLESS #:MORE?3215 (LOOP-FINISH))
>  (SETQ X #:HASH-KEY-3216) (PROGN (PROGN (FOO X)))
>  (MULTIPLE-VALUE-SETQ
>   (#:MORE?3215 #:HASH-KEY-3216 #:HASH-VALUE-3217)
>   (SYSTEM::HASH-TABLE-ITERATE #:WHTI-3214))
>  (GO SYSTEM::BEGIN-LOOP) SYSTEM::END-LOOP
>  (MACROLET
>   ((LOOP-FINISH NIL (SYSTEM::LOOP-FINISH-WARN)
> '(GO SYSTEM::END-LOOP ;
>
>  You can see that there is an initial function call 
> HASH-TABLE-ITERATOR
>  upfront, and then a function call to HASH-TABLE-ITERATE in each round
>  of the loop.
>
>  This principle makes it possible to iterate e.g. through a hash table
>  and a list in parallel.

In Scheme, it is even more relevant because you have call/cc so any
iteration can be suspended and restored any number of times.

I thought this could be solved by storing a "next" pointer in the
payload but this doesn't work with hamts sharing entries... So I will
add some iterator interface as well. Preferably one that can also be
used to implement Hamts in Scheme.

>   2) We have a dilemma regarding use of malloc vs. xmalloc. While modules
>  meant for use in programs can readily use xmalloc, modules meant for use
>  (also) in libraries cannot use xmalloc, since a library is not supposed
>  to terminate the program, even when memory is tight.

GMP more or less has this behavior.

>  The solution we found for this dilemma is that the *-set and *-map 
> modules
>  use just 

Re: [New module] Persistent Hash Array Mapped Tries (HAMTs)

2020-10-10 Thread Marc Nieper-Wißkirchen
I have attached an improved version of the HAMT module to this email.

Apart from improving the comments (which includes moving some
documentation into the header file) and changing the things already
discussed, I added a few more tests and three procedures for in-place
updates.

For example, you can now do hamt_insert_x to destructively insert an
element into a hamt (instead of inserting the element into a
conceptual copy as you do with hamt_insert). Hamts that share
structure with the modified hamt are not effected. (The idea with the
_x prefix comes from Guile whose C interface uses _x where Scheme uses
!.)

NB: For those who fancy Scheme, hamts can be used to implement the
library (srfi 146 hash) of SRFI 146 ([1]). The destructive update
operations of this module can be used to implement the linear-update
procedures of (srfi 146 hash) efficiently.

--

[1] https://srfi.schemers.org/srfi-146/srfi-146.html

Am Sa., 10. Okt. 2020 um 23:24 Uhr schrieb Marc Nieper-Wißkirchen
:
>
> Am Sa., 10. Okt. 2020 um 20:19 Uhr schrieb Paul Eggert :
>
> > #if __STDC_VERSION__ < 201112 || defined __STD_NO_ATOMICS__
> >
> > which is a cleaner way of writing the negative of the above test. These days
> > there should be no reason to check whether __STDC_VERSION__ is defined,
> > generally it's clearer to use "<" instead of ">=" so that textual order 
> > reflects
> > numeric order, and the parens after "defined" are better omitted.
>
> Thanks, I did the edit in my local copy.
From db21b3e975ad7526e0db531fd4936354d8f031ae Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marc=20Nieper-Wi=C3=9Fkirchen?= 
Date: Sat, 10 Oct 2020 23:38:21 +0200
Subject: [PATCH] hamt: New module.

This module provides (persistent) hash array mapped tries.
* MODULES.html.sh: Add hamt.
* lib/hamt.c: New file.
* lib/hamt.h: New file.
* modules/hamt: New file.
* modules/hamt-tests: New file.
* tests/test-hamt.c: New file.
---
 ChangeLog  |  11 +
 MODULES.html.sh|   1 +
 lib/hamt.c | 984 +
 lib/hamt.h | 182 +
 modules/hamt   |  29 ++
 modules/hamt-tests |  11 +
 tests/test-hamt.c  | 352 
 7 files changed, 1570 insertions(+)
 create mode 100644 lib/hamt.c
 create mode 100644 lib/hamt.h
 create mode 100644 modules/hamt
 create mode 100644 modules/hamt-tests
 create mode 100644 tests/test-hamt.c

diff --git a/ChangeLog b/ChangeLog
index 2aba2b0c7..b9e4f9970 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,14 @@
+2020-10-10  Marc Nieper-Wißkirchen  
+
+	hamt: New module.
+	This module provides (persistent) hash array mapped tries.
+	* MODULES.html.sh: Add hamt.
+	* lib/hamt.c: New file.
+	* lib/hamt.h: New file.
+	* modules/hamt: New file.
+	* modules/hamt-tests: New file.
+	* tests/test-hamt.c: New file.
+
 2020-10-10  Bruno Haible  
 
 	*-list, *-oset, *-omap: Avoid possible compiler warnings.
diff --git a/MODULES.html.sh b/MODULES.html.sh
index 7e7cdae3e..2907eb741 100755
--- a/MODULES.html.sh
+++ b/MODULES.html.sh
@@ -2028,6 +2028,7 @@ func_all_modules ()
   func_module hash-pjw
   func_module hash-pjw-bare
   func_module hash
+  func_module hamt
   func_module readline
   func_module readtokens
   func_module readtokens0
diff --git a/lib/hamt.c b/lib/hamt.c
new file mode 100644
index 0..eb15e19c1
--- /dev/null
+++ b/lib/hamt.c
@@ -0,0 +1,984 @@
+/* (Persistent) hash array mapped tries.
+   Copyright (C) 2020 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <https://www.gnu.org/licenses/>.  */
+
+/* Written by Marc Nieper-Wißkirchen , 2020.  */
+
+#include 
+#include "hamt.h"
+
+#include 
+#include 
+#include 
+#include 
+#include "count-one-bits.h"
+#include "verify.h"
+#include "xalloc.h"
+
+/* Reference counters can be shared between different threads if the
+   entry they belong to is shared between different threads.
+   Operations on them therefore have to be atomic so that no invalid
+   state is observable.
+
+   A thread must not modify an entry or its children (!) if its
+   reference count implies that the entry is shared by at least two
+   hamts.  */
+typedef
+#if GL_HAMT_THREAD_SAFE
+_Atomic
+#endif
+size_t ref_counter;
+
+/* Hash values are of type size_t.  For each level of the trie, we use
+   

Re: [New module] Persistent Hash Array Mapped Tries (HAMTs)

2020-10-10 Thread Marc Nieper-Wißkirchen
Am Sa., 10. Okt. 2020 um 20:19 Uhr schrieb Paul Eggert :

> #if __STDC_VERSION__ < 201112 || defined __STD_NO_ATOMICS__
>
> which is a cleaner way of writing the negative of the above test. These days
> there should be no reason to check whether __STDC_VERSION__ is defined,
> generally it's clearer to use "<" instead of ">=" so that textual order 
> reflects
> numeric order, and the parens after "defined" are better omitted.

Thanks, I did the edit in my local copy.



[PATCH] stack: New module.

2020-10-10 Thread Marc Nieper-Wißkirchen
From: Marc Nieper-Wißkirchen 

* MODULES.html.sh: Add entry for the stack module.
* modules/stack: New file.
* modules/stack-tests: New file.
* lib/stack.h: New file.
* tests/test-stack.c: New file.
---
 ChangeLog   |   9 +++
 MODULES.html.sh |   1 +
 lib/stack.h | 165 
 modules/stack   |  25 +++
 modules/stack-tests |  13 
 tests/test-stack.c  |  74 
 6 files changed, 287 insertions(+)
 create mode 100644 lib/stack.h
 create mode 100644 modules/stack
 create mode 100644 modules/stack-tests
 create mode 100644 tests/test-stack.c

diff --git a/ChangeLog b/ChangeLog
index b479cf0c7..f496c8345 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,12 @@
+2020-10-10  Marc Nieper-Wißkirchen  
+
+   stack: New module.
+   * MODULES.html.sh: Add entry for the stack module.
+   * modules/stack: New file.
+   * modules/stack-tests: New file.
+   * lib/stack.h: New file.
+   * tests/test-stack.c: New file.
+
 2020-10-10  Paul Eggert  
 
attribute: improve const, pure doc
diff --git a/MODULES.html.sh b/MODULES.html.sh
index a8a629e29..7e7cdae3e 100755
--- a/MODULES.html.sh
+++ b/MODULES.html.sh
@@ -2031,6 +2031,7 @@ func_all_modules ()
   func_module readline
   func_module readtokens
   func_module readtokens0
+  func_module stack
   func_module strverscmp
   func_module filevercmp
   func_end_table
diff --git a/lib/stack.h b/lib/stack.h
new file mode 100644
index 0..5d803901c
--- /dev/null
+++ b/lib/stack.h
@@ -0,0 +1,165 @@
+/* Type-safe stack data type.
+   Copyright (C) 2020 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <https://www.gnu.org/licenses/>.  */
+
+/* Written by Marc Nieper-Wißkirchen , 2020.  */
+
+/* This header file does not have include-guards as it is meant to be
+   includeable multiple times as long as the necessary defines have
+   been set up.
+
+   A stack is implemented with a homogenous array of elements in
+   memory, which will be resized (and moved) as needed.  Elements are
+   passed by value and it is the responsibility of the user-code to
+   free any resources associated with individual elements.
+
+   The exported names are all prefixed by ‘stack’ (e.g. stack_init),
+   but this can be changed by setting an appropriate define.
+ Type:   stack_type
+ Initialization: stack_init ();
+ De-initialization:  stack_destroy ();
+ Predicate:  bool res = stack_empty ();
+ Introspection:  ELEMENT *base = stack_base ();
+ Pushing:stack_push (, element);
+ Popping:ELEMENT element = stack_pop ();
+ Discarding: stack_discard ();
+ Top-of-stack:   ELEMENT element = stack_top ();
+ Size:   size_t size = stack_size ();
+
+   Here, ELEMENT is the type to which GL_STACK_ELEMENT was defined when
+   this file was included.
+*/
+
+/* Before including this file, you need to define:
+ GL_STACK_ELEMENT   The type of the elements on the stack.
+ GL_STACK_STORAGECLASS  (Optional) The storage class specifier (e.g. 
static)
+to use in the function definitions.
+ GL_STACK_NAME  (Optional) The prefix to use for the type names
+and functions definitions instead of the default
+‘stack’.
+
+   After including this file, these names will be undefined.
+
+   Before including this file, you also need to include:
+ #include ""
+ #include ""
+ #include "assure.h"
+ #include "xalloc.h"
+*/
+
+#ifndef GL_STACK_ELEMENT
+# error "Please define GL_STACK_ELEMENT first."
+#endif
+
+#ifndef GL_STACK_STORAGECLASS
+# define GL_STACK_STORAGECLASS
+#endif
+
+#ifndef GL_STACK_NAME
+# define GL_STACK_NAME stack
+#endif
+
+#define _GL_STACK_PREFIX(name) _GL_CONCAT(GL_STACK_NAME, _GL_CONCAT(_, name))
+#define _GL_STACK_TYPE _GL_STACK_PREFIX(type)
+
+typedef struct
+{
+  GL_STACK_ELEMENT *base;
+  size_t size;
+  size_t allocated;
+} _GL_STACK_TYPE;
+
+/* Initialize a stack.  */
+GL_STACK_STORAGECLASS void
+_GL_STACK_PREFIX (init) (_GL_STACK_TYPE *stack)
+{
+  stack->base = NULL;
+  stack->size = 0;
+  stack->allocated = 0;
+}
+
+/* Destroy a stack by freeing the allocated space.  */
+GL_STACK_STORAGECLASS vo

Re: [New module] Persistent Hash Array Mapped Tries (HAMTs)

2020-10-10 Thread Marc Nieper-Wißkirchen
Am Sa., 10. Okt. 2020 um 19:41 Uhr schrieb Bruno Haible :
>
> Hi Marc,
>
> > Is there a special Gnulib way (or Autoconf macro) for the following test:
> >
> > #if defined (__STDC_VERSION__) && __STDC_VERSION__ >= 201112L &&
> > !defined (__STD_NO_ATOMICS__)
> >
> > I am asking because there may be non-C11 compilers that nevertheless
> > understand _Atomic.
>
> And there may be C11 compilers which don't support _Atomic and nevertheless
> dont't define __STD_NO_ATOMICS__.
>
> So, indeed it would be useful to have an Autoconf macro that tests whether
> _Atomic can be used.

If someone is able to provide such a macro for Gnulib, I would replace
my check with this macro.



Re: [New module] Persistent Hash Array Mapped Tries (HAMTs)

2020-10-10 Thread Marc Nieper-Wißkirchen
Speaking of GL_HAMT_THREAD_SAFE:

Is there a special Gnulib way (or Autoconf macro) for the following test:

#if defined (__STDC_VERSION__) && __STDC_VERSION__ >= 201112L &&
!defined (__STD_NO_ATOMICS__)

I am asking because there may be non-C11 compilers that nevertheless
understand _Atomic.

Am Sa., 10. Okt. 2020 um 17:01 Uhr schrieb Marc Nieper-Wißkirchen
:
>
> Am Sa., 10. Okt. 2020 um 16:54 Uhr schrieb Bruno Haible :
> >
> > Since you define GL_HAMT_THREAD_SAFE to either 0 or 1, you need
> > to test it through
> >   #if GL_HAMT_THREAD_SAFE
> > not
> >   #ifdef GL_HAMT_THREAD_SAFE
> >
> > Bruno
>
> Oh, yes... thanks!



Re: [New module] Persistent Hash Array Mapped Tries (HAMTs)

2020-10-10 Thread Marc Nieper-Wißkirchen
Am Sa., 10. Okt. 2020 um 16:54 Uhr schrieb Bruno Haible :
>
> Since you define GL_HAMT_THREAD_SAFE to either 0 or 1, you need
> to test it through
>   #if GL_HAMT_THREAD_SAFE
> not
>   #ifdef GL_HAMT_THREAD_SAFE
>
> Bruno

Oh, yes... thanks!



Re: Unused parameter warnings

2020-10-10 Thread Marc Nieper-Wißkirchen
Am Sa., 10. Okt. 2020 um 16:39 Uhr schrieb Bruno Haible :
>
> Hi Marc,
>
> > I can ask for a better description of "pure" in the GCC manual.
>
> Yes, please.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97364



Re: [New module] Persistent Hash Array Mapped Tries (HAMTs)

2020-10-10 Thread Marc Nieper-Wißkirchen
Hi Bruno,

Am Sa., 10. Okt. 2020 um 16:35 Uhr schrieb Bruno Haible :

> It is exciting to see such a datastructure with various application domains 
> [1]
> in Gnulib!

I'm glad that you like the idea.

>
> I haven't read through it line by line; nevertheless a couple of comments:

A few more lines will follow before the final draft is done because I
have been adding some destructive update procedures in the meantime.

>   - +/* The public interface is documented in the file hamt.c.
>
> This is not good. As a user of the module, I would want to read *only*
> hamt.h and know how to use the public interface. In other words, hamt.h
> should have the functions' comments.

I followed the example of the already existing "hash" module. But I
can easily move the public comments into the header file.

>   - "Each
> updating operation returns a new hamt, which shares structure with
> the original one."
> So, after calling hamt_insert or hamt_delete, which function should I call
> to delete the original Hamt without affecting the new one? (In order to
> avoid accumulating pieces of old Hamts in memory.)

When you do, say,

new_hamt = hamt_insert (hamt, ...)

and new_hamt != hamt afterwards, you have to delete both hamt and
new_hamt with hamt_free to free all the memory. After you have freed
the first hamt, the second won't be affected.

>   - "The GL_HAMT_THREAD_SAFE flag is set if the implementation of hamts
> is thread-safe as long as two threads do not simultaneously access
> the same hamt."
> I don't understand this sentence. Isn't the guarantee that
> "is thread-safe as long as two threads do not simultaneously access
> the same hamt" trivial? And what you want to achieve through the use
> of _Atomic is that it is thread-safe *also* when two threads access
> the same hamt?

The guarantee is not trivial because two different hamts may share
some structure.

So... assume you have a hamt. You do some operations on it (or you do
hamt_copy) to get a new_hamt that shares structure. Now if thread 1
works on hamt and thread 2 on new_hamt, it is safe when
GL_HAMT_THREAD_SAFE is set.

If two threads access the same hamt, it is not guaranteed to be safe.
If you need this, get an (extremely shallow) copy through hamt_copy
first.

Marc



Re: stack module

2020-10-10 Thread Marc Nieper-Wißkirchen
Hi Bruno,

Am Sa., 10. Okt. 2020 um 16:06 Uhr schrieb Bruno Haible :
>
> Hi Marc,
>
> > please see the attached patch for the new module.
>
> Very nice. Looks all good, except for some nit-picking in the comments:

Thanks.

>
> > + Introspection:  ELEMENT *base = stack_base ();
>
> I would add a comment here:
>   Where ELEMENT is the type to which GL_STACK_ELEMENT was defined when
>   this file was included.
> (It would be tempting to write
> GL_STACK_ELEMENT *base = stack_base ();
> but GL_STACK_ELEMENT is no longer defined after the file was included...)

I will add this comment.

> Other than that, please feel free to commit and push this. Thanks!!

I will do this as soon as I have been approved by Paul. (Do I have to
do something about the ChangeLog entry or will it be auto-generated
from my Git commit message?)

Marc



Re: Unused parameter warnings

2020-10-10 Thread Marc Nieper-Wißkirchen
I can ask for a better description of "pure" in the GCC manual. In any
case, if a function doesn't return at all due to a function call
marked with _Noreturn, it does not affect the observable state locally
and GCC can avoid "emitting some calls in repeated invocations of the
function with the same argument values". As I wrote before, this is an
important observation as otherwise "assert" couldn't be used in "pure"
functions.

Anyway, the function gl_linked_iterator_from_to in Gnulib should
probably be fixed independently by adding the "pure" attribute so that
the code is compilable with -Werror -Wall -Wextra.

Am Sa., 10. Okt. 2020 um 15:50 Uhr schrieb Bruno Haible :
>
> Marc Nieper-Wißkirchen wrote:
> > > And a function that may invoke abort () does "affect observable state".
> >
> > This description of the a pure function does not seem to be accurate.
> > When a function calls another function like abort that is marked with
> > _Noreturn in a pure context, for the compiler the function can still
> > be pure (but not const). It can eliminate a second call to the
> > function with the same parameters. I am pretty sure that the warning
> > of GCC in line 938 is correct.
>
> The documentation of ATTRIBUTE_PURE in Gnulib is taken from the GCC
> documentation [1][2], therefore if you think it needs to be fixed, it's
> through a GCC bug report.
>
> Bruno
>
> [1] https://lists.gnu.org/archive/html/bug-gnulib/2020-05/msg00105.html
> [2] 
> https://gcc.gnu.org/onlinedocs/gcc-10.2.0/gcc/Common-Function-Attributes.html
>



[New module] Persistent Hash Array Mapped Tries (HAMTs)

2020-10-09 Thread Marc Nieper-Wißkirchen
Hi,

after I have contributed two comparably trivial modules to Gnulib, I
would like to contribute a less trivial module this time. It
implements a persistent version of Phil Bagwell's HAMTs, which has
been popularized by Clojure. HAMTs can be used when a persistent
(functional/pure) version of a data structure akin to hash tables is
needed. For example, the dynamic environment of a (possibly
multi-threaded) Lisp or Scheme can be modeled with persistent HAMTs.

Please take a look at the attached patch.

Thank you,

Marc
From 39a1ac1f78c8d00b1dfad4f260318a6fb5cf5584 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marc=20Nieper-Wi=C3=9Fkirchen?= 
Date: Wed, 7 Oct 2020 09:53:48 +0200
Subject: [PATCH] hamt: New module.

This module provides (persistent) hash array mapped tries.

* MODULES.html.sh: Add hamt.
* lib/hamt.c, lib/hamt.h, modules/hamt, modules/hamt-tests,
tests/test-hamt.c: New files.
---
 MODULES.html.sh|   1 +
 lib/hamt.c | 812 +
 lib/hamt.h |  97 ++
 modules/hamt   |  29 ++
 modules/hamt-tests |  11 +
 tests/test-hamt.c  | 148 +
 6 files changed, 1098 insertions(+)
 create mode 100644 lib/hamt.c
 create mode 100644 lib/hamt.h
 create mode 100644 modules/hamt
 create mode 100644 modules/hamt-tests
 create mode 100644 tests/test-hamt.c

diff --git a/MODULES.html.sh b/MODULES.html.sh
index a8a629e29..b48ca2bc4 100755
--- a/MODULES.html.sh
+++ b/MODULES.html.sh
@@ -2028,6 +2028,7 @@ func_all_modules ()
   func_module hash-pjw
   func_module hash-pjw-bare
   func_module hash
+  func_module hamt
   func_module readline
   func_module readtokens
   func_module readtokens0
diff --git a/lib/hamt.c b/lib/hamt.c
new file mode 100644
index 0..4876d5809
--- /dev/null
+++ b/lib/hamt.c
@@ -0,0 +1,812 @@
+/* (Persistent) hash array mapped tries.
+   Copyright (C) 2020 Free Software Foundation, Inc.
+   Written by Marc Nieper-Wißkirchen , 2020.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <https://www.gnu.org/licenses/>.  */
+
+#include 
+#include "hamt.h"
+
+#include 
+#include 
+#include 
+#include 
+#include "count-one-bits.h"
+#include "verify.h"
+#include "xalloc.h"
+
+/* See: Phil Bagwell (2000). Ideal Hash Trees (Report). Infoscience
+   Department, École Polytechnique Fédérale de Lausanne.
+
+   http://infoscience.epfl.ch/record/64398/files/idealhashtrees.pdf
+
+   We implement a persistent version of hash array mapped tires.  Each
+   updating operation returns a new hamt, which shares structure with
+   the original one.  If persistence is not needed, transient hash
+   tables are probably faster. */
+
+typedef
+#ifdef GL_HAMT_THREAD_SAFE
+_Atomic
+#endif
+size_t ref_counter;
+
+/* Hash values are of type size_t.  For each level of the trie, we use
+   5 bits (corresponding to lg2 of the width of a 32-bit word.  */
+#define MAX_DEPTH ((SIZE_WIDTH + 4) / 5)
+
+/***/
+/* Entry Types */
+/***/
+
+/* Leaf nodes are of type element.  Non-leaf nodes are either
+   subtries or, if at maximal depth, buckets.  */
+enum entry_type
+{
+  element_entry = 0,
+  subtrie_entry = 1,
+  bucket_entry = 2
+};
+
+/* Return the type an entry.  */
+static enum entry_type
+entry_type (const Hamt_entry *entry)
+{
+  return entry->ref_count & 3;
+}
+
+//
+/* Reference Counts */
+//
+
+/* Initialize the reference counter, storing its type.  */
+static void
+init_ref_counter (ref_counter *counter, enum entry_type type)
+{
+  *counter = 4 + type;
+}
+
+/* Increase the reference counter of an entry.  */
+static void
+inc_ref_counter (ref_counter *counter)
+{
+  *counter += 4;
+}
+
+/* Decrease the entry reference counter.  Return false if the entry
+   can be deleted.  */
+static bool
+dec_ref_counter (ref_counter *counter)
+{
+  *counter -= 4;
+  return *counter >= 4;
+}
+
+/**/
+/* Structures */
+/**/
+
+/* Different generations of a hamt share a function table.  */
+struct function_table
+{
+  Hamt_hasher *hasher;
+  Hamt_comparator *comparator;
+  Hamt_freer *freer;
+  ref_counter ref_count;
+};
+
+/* Different generations of a hamt share subtries.  A singleton
+   subtrie is modelled as a single element.  */
+struct subtrie
+{
+  ref_counter ref_count;
+  /* Nodes carry labels from 0 to 31.  The i-th bit in map is set if
+ the node l

Re: stack module

2020-10-07 Thread Marc Nieper-Wißkirchen
Dear Bruno,

please see the attached patch for the new module.

Thanks,

Marc

Am Mi., 7. Okt. 2020 um 11:01 Uhr schrieb Marc Nieper-Wißkirchen
:
>
> Hi Bruno,
>
> I am finally finishing my work on the stack module.
>
> Am Do., 23. Juli 2020 um 12:36 Uhr schrieb Bruno Haible :
>
> [...]
>
> > This is perfectly acceptable for Gnulib. It has debuggability and type 
> > safety.
> >
> > You have precedent e.g. in lib/diffseq.h and lib/aligned-malloc.h.
> >
> > You can even omit the 'GL_' prefix from the macro names. The user can #undef
> > the macros after including the file; therefore there is nearly no risk of
> > collision with macros defined by other code.
>
> After I have given it a bit more thought, I see a different type of
> collision possibility, namely if some header file uses the stack
> header file. It will leak the macro names. Assume that the stack
> module accepts the macro name ELEMENT for the element type. A
> hypothetical header for the frob type may be implemented as:
>
> #ifndef FROB_H
> #define FROB_H
> ...
> #define ELEMENT int
> #define STORAGECLASS static inline
> #include "stack.h"
> ...
> struct frob
> {
>   stack frob_stack;
>   ...
> };
> ...
> #endif /* FROB_H */
>
> Now, user code cannot do:
>
> #define ELEMENT hydrogen
> #include "frob.h"
> ...
>
> (The ELEMENT definition in the first line could in fact be exported by
> some other header file.)
>
> Therefore, I think I prefer to use prefixed names.
>
> Cheers,
>
> Marc
From c449d60058f4a5806729ec76779fbb050890ddaf Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marc=20Nieper-Wi=C3=9Fkirchen?= 
Date: Wed, 7 Oct 2020 13:29:10 +0200
Subject: [PATCH] Type-safe stack data type. * MODULES.html.sh: Add entry for
 the stack module. * modules/stack: New file. * modules/stack-tests: New file.
 * lib/stack.h: New file. * tests/test-stack.c: New file.

---
 ChangeLog   |   9 +++
 MODULES.html.sh |   1 +
 lib/stack.h | 163 
 modules/stack   |  25 +++
 modules/stack-tests |  13 
 tests/test-stack.c  |  74 
 6 files changed, 285 insertions(+)
 create mode 100644 lib/stack.h
 create mode 100644 modules/stack
 create mode 100644 modules/stack-tests
 create mode 100644 tests/test-stack.c

diff --git a/ChangeLog b/ChangeLog
index 875f3551a..b64689f73 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,12 @@
+2020-10-07  Marc nieper-Wißkirchen  
+
+	stack: New module.
+	* MODULES.html.sh: Add entry for the stack module.
+	* modules/stack: New file.
+	* modules/stack-tests: New file.
+	* lib/stack.h: New file.
+	* tests/test-stack.c: New file.
+
 2020-10-05  Paul Eggert  
 
 	thread: pacify GCC on Solaris 10
diff --git a/MODULES.html.sh b/MODULES.html.sh
index a8a629e29..7e7cdae3e 100755
--- a/MODULES.html.sh
+++ b/MODULES.html.sh
@@ -2031,6 +2031,7 @@ func_all_modules ()
   func_module readline
   func_module readtokens
   func_module readtokens0
+  func_module stack
   func_module strverscmp
   func_module filevercmp
   func_end_table
diff --git a/lib/stack.h b/lib/stack.h
new file mode 100644
index 0..22e976952
--- /dev/null
+++ b/lib/stack.h
@@ -0,0 +1,163 @@
+/* Type-safe stack data type.
+   Copyright (C) 2020 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <https://www.gnu.org/licenses/>.  */
+
+/* Written by Marc Nieper-Wißkirchen , 2020.  */
+
+/* This header file does not have include-guards as it is meant to be
+   includeable multiple times as long as the necessary defines have
+   been set up.
+
+   A stack is implemented with a homogenous array of elements in
+   memory, which will be resized (and moved) as needed.  Elements are
+   passed by value and it is the responsibility of the user-code to
+   free any allocate and free any resources associated with individual
+   elements.
+
+   The exported names are all prefixed by ‘stack’ (e.g. stack_init),
+   but this can be changed by setting an appropriate define.
+ Type:   stack_type
+ Initialization: stack_init ();
+ De-initialization:  stack_destroy ();
+ Predicate:  bool res = stack_empty ();
+ Introspection:  ELEMENT *base = stack_base 

Re: stack module

2020-10-07 Thread Marc Nieper-Wißkirchen
Hi Bruno,

I am finally finishing my work on the stack module.

Am Do., 23. Juli 2020 um 12:36 Uhr schrieb Bruno Haible :

[...]

> This is perfectly acceptable for Gnulib. It has debuggability and type safety.
>
> You have precedent e.g. in lib/diffseq.h and lib/aligned-malloc.h.
>
> You can even omit the 'GL_' prefix from the macro names. The user can #undef
> the macros after including the file; therefore there is nearly no risk of
> collision with macros defined by other code.

After I have given it a bit more thought, I see a different type of
collision possibility, namely if some header file uses the stack
header file. It will leak the macro names. Assume that the stack
module accepts the macro name ELEMENT for the element type. A
hypothetical header for the frob type may be implemented as:

#ifndef FROB_H
#define FROB_H
...
#define ELEMENT int
#define STORAGECLASS static inline
#include "stack.h"
...
struct frob
{
  stack frob_stack;
  ...
};
...
#endif /* FROB_H */

Now, user code cannot do:

#define ELEMENT hydrogen
#include "frob.h"
...

(The ELEMENT definition in the first line could in fact be exported by
some other header file.)

Therefore, I think I prefer to use prefixed names.

Cheers,

Marc



  1   2   >