Re: string types

2019-12-28 Thread Paul Eggert
On 12/28/19 12:44 PM, ag wrote:
> is it your opinion that this is adequate?
> 
> typedef ptrdiff_t msize_t (m for memory here)

Yes, something like that. dfa.c calls this type 'idx_t', which is a couple of
characters shorter.
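A minimal sketch of what such a signed size/index type buys, assuming only standard C. The name idx_t is the one mentioned above; last_index_of is a hypothetical helper invented for illustration.

```c
#include <stddef.h>

/* Signed type for object sizes and indices, as discussed above. */
typedef ptrdiff_t idx_t;

/* Hypothetical helper: index of the last occurrence of C in the N bytes
   at S, or -1 if C does not occur.  With a signed index the loop can
   count down past zero without wrapping around, and -1 is available
   as the "not found" value. */
static idx_t
last_index_of (const char *s, idx_t n, char c)
{
  for (idx_t i = n - 1; i >= 0; i--)
    if (s[i] == c)
      return i;
  return -1;
}
```

With an unsigned index the condition `i >= 0` would always be true, so the same loop would need a different shape.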



Re: string types

2019-12-28 Thread ag
Hi Paul,

On Sat, Dec 28, at 10:28 Paul Eggert wrote:
> > Based on the above assumptions this can be extended. First, return ssize_t
> > instead of size_t, so functions can return -1 and set errno accordingly.
> 
> It's better to use ptrdiff_t for this sort of thing, since it's hardwired into
> the C language (you can't do any better than ptrdiff_t anyway, if you use
> pointer subtraction), whereas ssize_t is merely in POSIX and is narrower than
> ptrdiff_t on some (obsolete?) platforms.

So, let's say we designed this thing without being bound to the past and
thinking of the next hundred years (with the current knowledge and the lessons
of the past, of course), and wanted to make it work with malloc and the string
functions as well as it can be done, without worrying about overflows, unsigned
divisions and all the confusing things that still haunt us after so many years,
when they should have been settled by now... is it your opinion that this is
adequate?

typedef ptrdiff_t msize_t (m for memory here)

> > #define MUL_NO_OVERFLOW ((size_t) 1 << (sizeof (size_t) * 4))
> > #define MEM_IS_INT_OVERFLOW(nmemb, ssize) \
> >  (((nmemb) >= MUL_NO_OVERFLOW || (ssize) >= MUL_NO_OVERFLOW) &&   \
> >   (nmemb) > 0 && SIZE_MAX / (nmemb) < (ssize))
> 
> Ouch. That code is not good. An unsigned division at runtime to do memory
> allocation? Gnulib does better than that already. Also, Glibc has some code in
> this area that we could migrate into Gnulib, that could be better yet.

Sorry, I don't have time to do it right now, as I just escaped from a
snowstorm, but I will check this, at least so as not to spread misleading
information (it is quite possibly my fault here), so thanks for your comment.

By the way, Paul: since I'm a self-taught, learn-by-practice kind of human
being, I was joking with zoi here and said that at least my teacher is a hall
of famer in computing history. Isn't this life great!
So true, this is also a school for free after all.

My Honor,
 Αγαθοκλής



Re: string types

2019-12-28 Thread Paul Eggert
On 12/28/19 5:14 AM, ag wrote:

>   - PTRDIFF_MAX is at least INT_MAX and at most SIZE_MAX
> (PTRDIFF_MAX is INT_MAX in 32bit)

PTRDIFF_MAX can exceed SIZE_MAX, in the sense that POSIX and C allow it and it
could be useful on 32-bit platforms for size_t to be 32 bits and ptrdiff_t to be
64 bits. Although I don't know of any platforms doing things that way, I prefer
not to assume that PTRDIFF_MAX <= SIZE_MAX so as to allow for the possibility.

>   - SIZE_MAX as (size_t) (-1)
> 
>   - ssize_t (s means signed?) can be as big as SIZE_MAX? and SSIZE_MAX equals
>     to SIZE_MAX?

ssize_t can be either narrower or wider than size_t, according to POSIX.
Historically ssize_t was 32 bits and size_t 64 bits on some platforms, and
though I don't know of any current platforms doing that it's easy to not make
assumptions here.
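A minimal sketch of code that avoids both assumptions, using only standard headers. The helper name fits_object_size is hypothetical; it checks a candidate size against PTRDIFF_MAX and SIZE_MAX separately instead of assuming which of the two is smaller.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if N is nonnegative and no larger than
   either PTRDIFF_MAX or SIZE_MAX.  Both limits are tested, so the
   result is right whichever of ptrdiff_t and size_t is narrower. */
static bool
fits_object_size (intmax_t n)
{
  return 0 <= n && n <= PTRDIFF_MAX && (uintmax_t) n <= SIZE_MAX;
}
```

On a typical platform one of the two comparisons is redundant and a compiler may fold it away; keeping both costs nothing and documents the intent.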

> Based on the above assumptions this can be extended. First, return ssize_t
> instead of size_t, so functions can return -1 and set errno accordingly.

It's better to use ptrdiff_t for this sort of thing, since it's hardwired into
the C language (you can't do any better than ptrdiff_t anyway, if you use
pointer subtraction), whereas ssize_t is merely in POSIX and is narrower than
ptrdiff_t on some (obsolete?) platforms.
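A minimal sketch of a copy routine that reports errors through a ptrdiff_t return value, in the spirit of the str_cp function from the message being answered. The name str_cp_signed and the choice of EINVAL are assumptions made for illustration, not part of any proposal in this thread.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical variant of str_cp: copy at most NELEM bytes of SRC into
   DEST, which has room for DEST_LEN bytes, and NUL-terminate DEST.
   Return the number of bytes copied (not counting the NUL), or -1 with
   errno set to EINVAL if the arguments are unusable. */
static ptrdiff_t
str_cp_signed (char *dest, ptrdiff_t dest_len, const char *src,
               ptrdiff_t nelem)
{
  if (dest == NULL || src == NULL || dest_len <= 0 || nelem < 0)
    {
      errno = EINVAL;
      return -1;
    }

  /* Leave one byte for the terminating NUL. */
  ptrdiff_t limit = nelem < dest_len - 1 ? nelem : dest_len - 1;
  ptrdiff_t len = 0;
  while (len < limit && src[len] != '\0')
    {
      dest[len] = src[len];
      len++;
    }
  dest[len] = '\0';
  return len;
}
```

Because the sizes are signed, a negative argument can be rejected explicitly instead of silently turning into a huge unsigned value.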

> In my humble opinion there is also the option of choosing reallocarray() from
> OpenBSD, which always checks for integer overflows in the following way:
> 
> #define MUL_NO_OVERFLOW ((size_t) 1 << (sizeof (size_t) * 4))
> #define MEM_IS_INT_OVERFLOW(nmemb, ssize) \
>  (((nmemb) >= MUL_NO_OVERFLOW || (ssize) >= MUL_NO_OVERFLOW) &&   \
>   (nmemb) > 0 && SIZE_MAX / (nmemb) < (ssize))

Ouch. That code is not good. An unsigned division at runtime to do memory
allocation? Gnulib does better than that already. Also, Glibc has some code in
this area that we could migrate into Gnulib, that could be better yet.
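One way to check the multiplication without a division is a compiler overflow builtin. The sketch below assumes GCC 5 or later, or Clang, where __builtin_mul_overflow is available; the wrapper name alloc_array is hypothetical, and the code is not copied from Gnulib or Glibc.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical wrapper: allocate room for NMEMB objects of SIZE bytes
   each.  Fail with ENOMEM if NMEMB * SIZE does not fit in size_t.
   Assumes the GCC/Clang builtin __builtin_mul_overflow, which returns
   true when the product overflows the type of its third argument. */
static void *
alloc_array (size_t nmemb, size_t size)
{
  size_t nbytes;
  if (__builtin_mul_overflow (nmemb, size, &nbytes))
    {
      errno = ENOMEM;
      return NULL;
    }
  return malloc (nbytes);
}
```

On common targets the builtin typically compiles to a multiply followed by a test of the overflow condition, so no division is executed at run time.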



Re: immutable string type

2019-12-28 Thread Paul Eggert
On 12/28/19 3:17 AM, Bruno Haible wrote:

> Would you find it useful to have an immutable string type in gnulib?

Sounds useful. I assume you plan to generalize it to any type; something like
this:

  p = immalloc (sizeof *p);
  p->x = whatever; p->y = something; ...
  imfreeze (p, sizeof *p);
  [no changes to *p allowed here]
  imfree (p);

imfreeze can be a no-op unless debugging.
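A minimal sketch of the non-debugging case, where imfreeze does nothing. The names immalloc, imfreeze and imfree come from the outline above; their exact signatures, struct point and make_point are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Non-debugging versions: allocation and freeing fall through to
   malloc and free, and freezing is a no-op. */
static void *
immalloc (size_t size)
{
  return malloc (size);
}

static void
imfreeze (const void *p, size_t size)
{
  (void) p;
  (void) size;
}

static void
imfree (const void *p)
{
  free ((void *) p);
}

struct point { int x, y; };

/* Hypothetical constructor: fill in the object, freeze it, and hand
   out only a pointer-to-const from then on. */
static const struct point *
make_point (int x, int y)
{
  struct point *p = immalloc (sizeof *p);
  if (p == NULL)
    return NULL;
  p->x = x;
  p->y = y;
  imfreeze (p, sizeof *p);
  return p;
}
```

A debugging build could replace these three functions with versions that really write-protect the memory, without changing any caller.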

Oh, I see that Tim Rühsen has the same idea.

I prefer the prefix "im" to "i" for immutable, as plain "i" could stand for a
lot of things. (Plus, "imasprintf" rolls off the tongue better. :-)



Re: immutable string type

2019-12-28 Thread Ben Pfaff
On Sat, Dec 28, 2019 at 3:17 AM Bruno Haible wrote:
> Would you find it useful to have an immutable string type in gnulib?

I like this idea! Actually the idea of having primitives for allocating and
filling data and then getting read-only access to it is a good one in
general.  I haven't worked with anything like this before, so perhaps the
real value of it won't be apparent without some experience.

This sort of thing won't work on systems with virtually indexed caches,
at least not without inserting explicit flushes.  I don't know whether
virtually indexed caches still exist in the wild.



Re: string types

2019-12-28 Thread ag
Hi,

On Fri, Dec 27, at 11:51 Bruno Haible wrote:
>  - providing primitives for string allocation reduces the amount of buffer
>    overflow bugs that otherwise occur in this area. [1]

[1] Re: string allocation
https://lists.gnu.org/archive/html/bug-gnulib/2019-09/msg00031.html

Thanks, I remember this thread, though at the time I couldn't understand some
bits.

>> ag wrote:
> > ... to the actual algorithm (usually conditions that can or can't be met).

> That is the idea behind the container types (list, map) in gnulib. However, I
> don't see how to reasonably transpose this principle to string types.

Ok, let us try, so allow me to summarize with some of my (unqualified)
assumptions (please correct):

  - glibc malloc can request at most PTRDIFF_MAX

  - PTRDIFF_MAX is at least INT_MAX and at most SIZE_MAX
    (PTRDIFF_MAX is INT_MAX in 32bit)

  - SIZE_MAX as (size_t) (-1)

  - ssize_t (s means signed?) can be as big as SIZE_MAX? and SSIZE_MAX equals to
    SIZE_MAX?

  - the returned value of the *printf family of functions dictates their
    limits/range: since they return an int, this can be at most INT_MAX

Some concerns:

  - truncation errors should be caught

  - memory checkers should catch overflows

  - since there is a "risk"¹ that someone has to take at some point (either the
    programmer or the underlying library code (as strdup() does)), the designed
    interface should lower those risks

There is a proposal from Eric Sanchis to the Austin Group, dated 9 Jun 2016, for
a string copy/concatenation interface, in which the functions take both the
allocated size and the number of bytes to be written as arguments (I will inline
some of it here, since I was unable to find his mail in the POSIX mailing list
archives).

I used this as a basis (as it was rather intuitive and perfectly suited for C)
to implement my own str_cp, which goes like this:

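#include <stddef.h>

/* Copy at most nelem bytes from src to dest, stopping at the first NUL
   byte in src. Returns the number of bytes written. */
size_t byte_cp (char *dest, const char *src, size_t nelem) {
  const char *sp = src;
  size_t len = 0;

  while (len < nelem && *sp) {
    dest[len] = *sp++;
    len++;
  }

  return len;
}

/* Copy at most nelem bytes from src to dest, which has room for dest_len
   bytes, and always NUL-terminate. Returns the number of bytes written,
   not counting the terminating NUL. */
size_t str_cp (char *dest, size_t dest_len, const char *src, size_t nelem) {
  if (dest_len == 0)  /* no room for the NUL; also keeps dest_len - 1 from wrapping */
    return 0;
  size_t num = (nelem > (dest_len - 1) ? dest_len - 1 : nelem);
  size_t len = (NULL == src ? 0 : byte_cp (dest, src, num));
  dest[len] = '\0';
  return len;
}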

Of course it can be done better, but here we have a low-level function (byte_cp)
that does only the required checks and returns the actual number of bytes
written to `dest', while str_cp checks whether `src' is NULL and whether `nelem'
is bigger than `dest_len' - 1 (if it is, it copies at most `dest_len' - 1
bytes). It returns 0 or the actual number of bytes written.

Since this returns the actual bytes written, it is up to the programmer to check
if truncation happened, but there is no possibility to copy more than
`dest_len' - 1.
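A minimal sketch of such a truncation check on the caller's side. The block restates str_cp compactly so that it compiles on its own; the wrapper name str_cp_all is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Compact restatement of str_cp from this message. */
static size_t
str_cp (char *dest, size_t dest_len, const char *src, size_t nelem)
{
  if (dest == NULL || dest_len == 0)
    return 0;
  size_t num = nelem < dest_len - 1 ? nelem : dest_len - 1;
  size_t len = 0;
  while (src != NULL && len < num && src[len] != '\0')
    {
      dest[len] = src[len];
      len++;
    }
  dest[len] = '\0';
  return len;
}

/* Hypothetical wrapper: try to copy all of SRC into DEST and report
   whether it fit.  Truncation happened exactly when fewer bytes were
   written than the source contains. */
static bool
str_cp_all (char *dest, size_t dest_len, const char *src)
{
  size_t want = strlen (src);
  return str_cp (dest, dest_len, src, want) == want;
}
```

Even when str_cp_all returns false, `dest' holds a NUL-terminated prefix of `src', so the caller can choose between using the truncated text and treating the case as an error.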

Based on the above assumptions this can be extended. First, return ssize_t
instead of size_t, so functions can return -1 and set errno accordingly.

Eric Sanchis in his proposal does it a bit differently: his functions take an
extra size_t argument that controls the behavior of the function (what it will
do in the case that the destination length is less than the source length).

He uses an int as the return value, which on a successful operation is either 0
or 1, as follows:
#define   OKNOTRUNC  0  /* copy/concatenation performed without truncation */
#define   OKTRUNC    1  /* copy/concatenation performed with truncation */

And below is the extra information passed as the fifth argument:
#define   TRUNC      0  /* truncation allowed */
#define   NOTRUNC    1  /* truncation not allowed */

In the case of an error, it returns a value < 0, which is one of:
#define   EDSTPAR   -1  /* Error : bad dst parameters */
#define   ESRCPAR   -2  /* Error : bad src parameters */
#define   EMODPAR   -3  /* Error : bad mode parameter */
#define   ETRUNC    -4  /* Error : not enough space to copy/concatenate
                           and truncation not allowed */
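A rough sketch of how a copy function with such a mode argument could behave. Only the constants are taken from the description above; the function name mode_copy, the parameter order and the exact error conditions are assumptions, not Eric Sanchis's actual interface.

```c
#include <stddef.h>
#include <string.h>

#define OKNOTRUNC  0   /* copy performed without truncation */
#define OKTRUNC    1   /* copy performed with truncation */
#define TRUNC      0   /* truncation allowed */
#define NOTRUNC    1   /* truncation not allowed */
#define EDSTPAR   -1   /* bad dst parameters */
#define ESRCPAR   -2   /* bad src parameters */
#define EMODPAR   -3   /* bad mode parameter */
#define ETRUNC    -4   /* not enough space and truncation not allowed */

/* Hypothetical function: copy at most N bytes of SRC into DST, which
   has room for DSTSIZE bytes, always NUL-terminating on success. */
static int
mode_copy (char *dst, size_t dstsize, const char *src, size_t n,
           size_t mode)
{
  if (dst == NULL || dstsize == 0)
    return EDSTPAR;
  if (src == NULL)
    return ESRCPAR;
  if (mode != TRUNC && mode != NOTRUNC)
    return EMODPAR;

  /* Number of bytes the caller wants copied: up to N, or up to the
     first NUL in SRC, whichever comes first. */
  size_t want = 0;
  while (want < n && src[want] != '\0')
    want++;

  if (want >= dstsize)
    {
      /* Does not fit together with the terminating NUL. */
      if (mode == NOTRUNC)
        return ETRUNC;
      memcpy (dst, src, dstsize - 1);
      dst[dstsize - 1] = '\0';
      return OKTRUNC;
    }

  memcpy (dst, src, want);
  dst[want] = '\0';
  return OKNOTRUNC;
}
```

In this sketch the NOTRUNC mode leaves `dst' untouched when the source does not fit, which is one possible reading of "truncation not allowed".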

Now combining all this, and if the assumptions are correct, gnulib can return
ssize_t and use this to make its functions work up to SIZE_MAX, and either use
Eric's interface or set errno accordingly.

But to me a function call like:
  str_cp (dest, memsize_of_dest, src, memsize_of_dest - 1)
is quite a common C way of doing things, plus we have a way to catch truncation
and not go out of bounds at the same time.

Of course such operations are tied to malloc().
I read the gnulib documentation yesterday and saw that gnulib wraps malloc() in
a function that (quite logically) aborts execution and even allows setting a
callback function.

In my humble opinion there is also the option of choosing reallocarray() from
OpenBSD, which always checks for integer overflows in the following way:

#define MUL_NO_OVERFLOW ((size_t) 1 << (sizeof (size_t) * 4))
#define 

Re: immutable string type

2019-12-28 Thread Tim Rühsen
Hi Bruno,

On 28.12.19 12:17, Bruno Haible wrote:
> Would you find it useful to have an immutable string type in gnulib?

The idea is good; in fact I had similar thoughts/needs a while ago. IMO,
the use cases are mostly in the testing area (especially fuzzing).

As a more general approach, a function that switches already allocated
memory into read-only memory would be handy. Like in
 - m = malloc()
 - initialize m with some data
 - if in debug mode: call memmap_readonly(m) - from this point on 'm' is
   read-only and a write leads to a segmentation fault.
 - ...
 - free(m)
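A minimal sketch of the steps above using mprotect(). POSIX leaves the behavior of mprotect() unspecified for memory that was not obtained from mmap(), so the sketch allocates with mmap() instead of malloc(). It assumes a system that provides MAP_ANONYMOUS (glibc, the BSDs); the function names page_alloc, make_readonly and page_free are hypothetical.

```c
#ifndef _DEFAULT_SOURCE
# define _DEFAULT_SOURCE 1   /* for MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical allocator: map enough whole pages to hold SIZE bytes,
   readable and writable.  Store the mapped length in *MAPPED_SIZE.
   Return NULL on failure (including SIZE == 0).  Overflow of the
   rounding computation is not handled in this sketch. */
static void *
page_alloc (size_t size, size_t *mapped_size)
{
  size_t page = (size_t) sysconf (_SC_PAGESIZE);
  size_t len = (size + page - 1) / page * page;
  void *m = mmap (NULL, len, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (m == MAP_FAILED)
    return NULL;
  *mapped_size = len;
  return m;
}

/* From this point on, a write into the region raises SIGSEGV.
   Return 0 on success, -1 on failure. */
static int
make_readonly (void *m, size_t mapped_size)
{
  return mprotect (m, mapped_size, PROT_READ);
}

static void
page_free (void *m, size_t mapped_size)
{
  munmap (m, mapped_size);
}
```

Protection works at page granularity, so every object allocated this way occupies at least one full page; that cost is one reason to enable it only in debug mode.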

Maybe it would best be integrated into glibc?

Functions like iasprintf could then be built around existing functions
as needed, e.g. as static inline or as macro.

> In the simplest case, this would be a 'const char *' where the 'const' is
> actually checked by the hardware. You allocate it through
> 
>    const char *str = iasprintf (...);
> 
> You use it like any 'const char *'.
> 
> You free it through
> 
>    ifree (str);
> 
> not free (str). And when you attempt to write into it:
> 
>    ((char *) str)[0] = 'x';
> 
> it crashes.
> 
> The benefits I imagine:
>   - no worry about security flaws through multithreaded accesses,
>   - in large applications: verification that no part of the application
>     is doing side effects that it shouldn't.
> 
> The implementation uses mmap() to create a read-only and a read-write
> view of the same memory area. The contents of the string are filled through
> the read-write view. All other operations are done through the read-only
> view, because the address of the string is the one of the read-only view.
> 
> This won't work on all platforms, e.g. HP-UX. But it will work on glibc
> systems, BSD, and Solaris, at least.

Regards, Tim





immutable string type

2019-12-28 Thread Bruno Haible
Hi all,

Would you find it useful to have an immutable string type in gnulib?

In the simplest case, this would be a 'const char *' where the 'const' is
actually checked by the hardware. You allocate it through

   const char *str = iasprintf (...);

You use it like any 'const char *'.

You free it through

   ifree (str);

not free (str). And when you attempt to write into it:

   ((char *) str)[0] = 'x';

it crashes.

The benefits I imagine:
  - no worry about security flaws through multithreaded accesses,
  - in large applications: verification that no part of the application
    is doing side effects that it shouldn't.

The implementation uses mmap() to create a read-only and a read-write
view of the same memory area. The contents of the string are filled through
the read-write view. All other operations are done through the read-only
view, because the address of the string is the one of the read-only view.

This won't work on all platforms, e.g. HP-UX. But it will work on glibc
systems, BSD, and Solaris, at least.
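A rough sketch of the two-view technique described above, not necessarily how a Gnulib module would implement it. It backs the memory with an unlinked temporary file and maps that file twice with MAP_SHARED. The struct and function names are hypothetical, and the use of mkstemp() under /tmp is an assumption chosen for simplicity.

```c
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

struct dual_view
{
  char *rw;        /* writable view, used only to fill in the contents */
  const char *ro;  /* read-only view, the address handed to callers */
  size_t size;
};

/* Map SIZE bytes of the same backing store twice, once read-write and
   once read-only.  Return 0 on success, -1 on failure. */
static int
dual_view_create (struct dual_view *v, size_t size)
{
  char name[] = "/tmp/dualviewXXXXXX";
  int fd = mkstemp (name);
  if (fd < 0)
    return -1;
  unlink (name);   /* the file lives on until the mappings are gone */
  if (ftruncate (fd, (off_t) size) != 0)
    {
      close (fd);
      return -1;
    }

  void *rw = mmap (NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  void *ro = mmap (NULL, size, PROT_READ, MAP_SHARED, fd, 0);
  close (fd);
  if (rw == MAP_FAILED || ro == MAP_FAILED)
    {
      if (rw != MAP_FAILED)
        munmap (rw, size);
      if (ro != MAP_FAILED)
        munmap (ro, size);
      return -1;
    }

  v->rw = rw;
  v->ro = ro;
  v->size = size;
  return 0;
}

static void
dual_view_destroy (struct dual_view *v)
{
  munmap (v->rw, v->size);
  munmap ((void *) v->ro, v->size);
}
```

Bytes stored through `rw' become visible through `ro', while a store through `ro' faults. A real implementation would also have to keep the `rw' address away from the rest of the program, or unmap it once the string is complete.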

Bruno