Re: string types
On 12/28/19 12:44 PM, ag wrote:
> is your opinion that this is adequate?
>
> typedef ptrdiff_t msize_t (m for memory here)

Yes, something like that. dfa.c calls this type 'idx_t', which is a couple of characters shorter.
Re: string types
Hi Paul,

On Sat, Dec 28, at 10:28 Paul Eggert wrote:
> > Based on the above assumptions this can be extended. First instead of
> > size_t to return ssize_t, so functions can return -1 and set errno
> > accordingly.
>
> It's better to use ptrdiff_t for this sort of thing, since it's hardwired into
> the C language (you can't do any better than ptrdiff_t anyway, if you use
> pointer subtraction), whereas ssize_t is merely in POSIX and is narrower than
> ptrdiff_t on some (obsolete?) platforms.

So, let's say we designed this thing without any obligation to the past, thinking of the next hundred years (with current knowledge and the lessons of the past, of course), and wanted it to work with malloc and the string functions as well as it can be done, without worrying about overflows, unsigned divisions, and all the other confusing things that have haunted us for so many years and should have been settled by now... Is it your opinion that this is adequate?

   typedef ptrdiff_t msize_t (m for memory here)

> > #define MUL_NO_OVERFLOW ((size_t) 1 << (sizeof (size_t) * 4))
> > #define MEM_IS_INT_OVERFLOW(nmemb, ssize) \
> > (((nmemb) >= MUL_NO_OVERFLOW || (ssize) >= MUL_NO_OVERFLOW) && \
> > (nmemb) > 0 && SIZE_MAX / (nmemb) < (ssize))
>
> Ouch. That code is not good. An unsigned division at runtime to do memory
> allocation? Gnulib does better than that already. Also, Glibc has some code in
> this area that we could migrate into Gnulib, that could be better yet.

Sorry, I don't have time to do it right now, as I just escaped from a snowstorm, but I will check this, at least so as not to spread misleading information (it is quite possibly my fault here), so thanks for your comment.

By the way, Paul, since I'm a self-taught, learn-by-practice kind of human being, I was joking with zoi here that at least my teacher is a hall-of-famer in computing history. Isn't this life great! So it's true, this is also a free school after all.

My Honor,
Αγαθοκλής
Re: string types
On 12/28/19 5:14 AM, ag wrote:
> - PTRDIFF_MAX is at least INT_MAX and at most SIZE_MAX
>   (PTRDIFF_MAX is INT_MAX in 32bit)

PTRDIFF_MAX can exceed SIZE_MAX, in the sense that POSIX and C allow it and it could be useful on 32-bit platforms for size_t to be 32 bits and ptrdiff_t to be 64 bits. Although I don't know of any platforms doing things that way, I prefer not to assume that PTRDIFF_MAX <= SIZE_MAX so as to allow for the possibility.

> - SIZE_MAX as (size_t) (-1)
>
> - ssize_t (s means signed?) can be as big as SIZE_MAX? and SSIZE_MAX equals
>   to SIZE_MAX?

ssize_t can be either narrower or wider than size_t, according to POSIX. Historically ssize_t was 32 bits and size_t 64 bits on some platforms, and though I don't know of any current platforms doing that it's easy to not make assumptions here.

> Based on the above assumptions this can be extended. First instead of size_t
> to return ssize_t, so functions can return -1 and set errno accordingly.

It's better to use ptrdiff_t for this sort of thing, since it's hardwired into the C language (you can't do any better than ptrdiff_t anyway, if you use pointer subtraction), whereas ssize_t is merely in POSIX and is narrower than ptrdiff_t on some (obsolete?) platforms.

> In my humble opinion there is also the choice to choose reallocarray() from
> OpenBSD, which always checks for integer overflows with the following way:
>
> #define MUL_NO_OVERFLOW ((size_t) 1 << (sizeof (size_t) * 4))
> #define MEM_IS_INT_OVERFLOW(nmemb, ssize) \
> (((nmemb) >= MUL_NO_OVERFLOW || (ssize) >= MUL_NO_OVERFLOW) && \
> (nmemb) > 0 && SIZE_MAX / (nmemb) < (ssize))

Ouch. That code is not good. An unsigned division at runtime to do memory allocation? Gnulib does better than that already. Also, Glibc has some code in this area that we could migrate into Gnulib, that could be better yet.
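For readers following the overflow discussion: on compilers that provide `__builtin_mul_overflow` (GCC 5 and later, Clang), the product of `nmemb` and `size` can be checked without a runtime division. The following is a minimal sketch under that assumption, not the actual Gnulib or glibc code. The names `size_product_overflows` and `checked_reallocarray` are hypothetical, and the additional cap at PTRDIFF_MAX is an assumption based on the ptrdiff_t argument in this message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Detect the overflow builtin (GCC >= 5, Clang).  */
#if defined __has_builtin
# if __has_builtin (__builtin_mul_overflow)
#  define HAVE_MUL_OVERFLOW 1
# endif
#elif defined __GNUC__ && __GNUC__ >= 5
# define HAVE_MUL_OVERFLOW 1
#endif

/* Hypothetical helper: store nmemb * size in *product and return true
   if the product does not fit in size_t or exceeds PTRDIFF_MAX.  */
static bool
size_product_overflows (size_t nmemb, size_t size, size_t *product)
{
  assert (product != NULL);
#ifdef HAVE_MUL_OVERFLOW
  /* No division: the compiler typically emits a multiply plus a
     check of the overflow flag.  */
  if (__builtin_mul_overflow (nmemb, size, product))
    return true;
#else
  /* Portable fallback, which does use a division.  */
  if (size != 0 && nmemb > SIZE_MAX / size)
    return true;
  *product = nmemb * size;
#endif
  return *product > (uintmax_t) PTRDIFF_MAX;
}

/* Hypothetical wrapper in the spirit of reallocarray().  A zero total
   is passed through unchanged; what realloc does with size 0 is
   implementation-defined.  */
static void *
checked_reallocarray (void *ptr, size_t nmemb, size_t size)
{
  size_t total;
  if (size_product_overflows (nmemb, size, &total))
    {
      errno = ENOMEM;
      return NULL;
    }
  return realloc (ptr, total);
}
```

A caller would use `checked_reallocarray (NULL, n, sizeof *v)` where it would otherwise write `malloc (n * sizeof *v)`, and treat a NULL result with `errno == ENOMEM` as allocation failure.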
Re: immutable string type
On 12/28/19 3:17 AM, Bruno Haible wrote:
> Would you find it useful to have an immutable string type in gnulib?

Sounds useful. I assume you plan to generalize it to any type; something like this:

   p = immalloc (sizeof *p);
   p->x = whatever;
   p->y = something;
   ...
   imfreeze (p, sizeof *p);
   [no changes to *p allowed here]
   imfree (p);

imfreeze can be a no-op unless debugging.

Oh, I see that Tim Rühsen has the same idea. I prefer the prefix "im" to "i" for immutable, as plain "i" could stand for a lot of things. (Plus, "imasprintf" rolls off the tongue better. :-)
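One way the immalloc/imfreeze/imfree trio sketched in this message could be prototyped is with an anonymous mmap() region that mprotect() later turns read-only. This is only an illustration under stated assumptions, not a Gnulib implementation: the `union im_header` bookkeeping is hypothetical, the code assumes a POSIX system with MAP_ANONYMOUS, and it always protects the pages rather than being a no-op outside debugging.

```c
#define _DEFAULT_SOURCE 1   /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical header stored in front of the user data so that
   imfree() can recover the mapping length.  The max_align_t member
   keeps the user pointer suitably aligned for any type.  */
union im_header
{
  size_t map_len;
  max_align_t align;
};

static union im_header *
im_header_of (void *p)
{
  return (union im_header *) ((char *) p - sizeof (union im_header));
}

/* Allocate SIZE writable bytes in a private mapping of its own.  */
void *
immalloc (size_t size)
{
  long page = sysconf (_SC_PAGESIZE);
  if (page <= 0
      || size > SIZE_MAX - sizeof (union im_header) - (size_t) page)
    return NULL;
  size_t pagesz = (size_t) page;
  /* Round header + data up to a whole number of pages.  */
  size_t map_len
    = (sizeof (union im_header) + size + pagesz - 1) / pagesz * pagesz;
  void *base = mmap (NULL, map_len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (base == MAP_FAILED)
    return NULL;
  ((union im_header *) base)->map_len = map_len;
  return (char *) base + sizeof (union im_header);
}

/* Make the whole mapping read-only; later writes fault.  SIZE is
   accepted to match the interface in the message but is unused here.
   Returns 0 on success, -1 on failure.  */
int
imfreeze (void *p, size_t size)
{
  assert (p != NULL);
  (void) size;
  union im_header *h = im_header_of (p);
  return mprotect (h, h->map_len, PROT_READ);
}

/* Release the mapping; the header is still readable after imfreeze.  */
void
imfree (void *p)
{
  assert (p != NULL);
  union im_header *h = im_header_of (p);
  munmap (h, h->map_len);
}
```

Giving every object its own pages is wasteful for small objects; it is chosen here only because page protection has page granularity and it keeps the sketch short.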
Re: immutable string type
On Sat, Dec 28, 2019 at 3:17 AM Bruno Haible wrote:
> Would you find it useful to have an immutable string type in gnulib?

I like this idea! Actually the idea of having primitives for allocating and filling data and then getting read-only access to it is a good one in general. I haven't worked with anything like this before, so perhaps the real value of it won't be apparent without some experience.

This sort of thing won't work on systems with virtually indexed caches, at least not without inserting explicit flushes. I don't know whether virtually indexed caches still exist in the wild.
Re: string types
Hi,

On Fri, Dec 27, at 11:51 Bruno Haible wrote:
> - providing primitives for string allocation reduces the amount of buffer
>   overflow bugs that otherwise occur in this area. [1]

[1] Re: string allocation
https://lists.gnu.org/archive/html/bug-gnulib/2019-09/msg00031.html

Thanks, I remember this thread, though at the time I couldn't understand some bits.

>> ag wrote:
> > ... to the actual algorithm (usually conditions that can or can't be met).
> That is the idea behind the container types (list, map) in gnulib. However, I
> don't see how to reasonably transpose this principle to string types.

OK, let us try. Allow me to summarize some of my (unqualified) assumptions (please correct):

- glibc malloc can request at most PTRDIFF_MAX

- PTRDIFF_MAX is at least INT_MAX and at most SIZE_MAX
  (PTRDIFF_MAX is INT_MAX in 32bit)

- SIZE_MAX as (size_t) (-1)

- ssize_t (s means signed?) can be as big as SIZE_MAX? and SSIZE_MAX equals
  to SIZE_MAX?

- the returned value of the *printf family of functions dictates their
  limits/range; as they return an int, this is at most INT_MAX

Some concerns:

- truncation errors should be caught

- memory checkers should catch overflows

- since there is a "risk"¹ that someone has to take at some point (either
  the programmer or the underlying library code (as strdup() does)), the
  designed interface should lower those risks

There is a proposal from Eric Sanchis to the Austin Group, dated 9 Jun 2016, for a string copy/concatenation interface whose functions take both the allocated size and the number of bytes to be written as arguments (I will inline some of it here, since I was unable to find his mail in the POSIX mailing list archives).

I used this as a basis (as it was rather intuitive and perfectly suited to C) to implement my own str_cp, which goes like this:

  /* Copy at most nelem bytes, stopping at the first NUL in src.
     Does not terminate dest.  Returns the number of bytes written.  */
  size_t byte_cp (char *dest, const char *src, size_t nelem) {
    const char *sp = src;
    size_t len = 0;

    while (len < nelem && *sp) {
      dest[len] = *sp++;
      len++;
    }

    return len;
  }

  /* Copy at most dest_len - 1 bytes and always terminate dest.
     dest_len must be at least 1.  */
  size_t str_cp (char *dest, size_t dest_len, const char *src, size_t nelem) {
    size_t num = (nelem > (dest_len - 1) ? dest_len - 1 : nelem);
    size_t len = (NULL == src ? 0 : byte_cp (dest, src, num));
    dest[len] = '\0';
    return len;
  }

Of course it can be done better, but here we have a low-level function (byte_cp) that does only the required checks and returns the actual number of bytes written to `dest', while str_cp checks whether `src' is NULL and whether `nelem' is bigger than `dest_len' - 1 (if it is, it copies at most `dest_len' - 1 bytes). It returns 0 or the actual number of bytes written.

Since this returns the actual bytes written, it is up to the programmer to check whether truncation happened, but there is no possibility of copying more than `dest_len' - 1 bytes.

Based on the above assumptions this can be extended. First instead of size_t to return ssize_t, so functions can return -1 and set errno accordingly.

Eric Sanchis does it a bit differently in his proposal: his functions take an extra size_t argument that controls the behavior of the function (what it will do when the destination length is less than the source length). He uses an int as the return value, which on a successful operation is either 0 or 1:

  #define OKNOTRUNC  0   /* copy/concatenation performed without truncation */
  #define OKTRUNC    1   /* copy/concatenation performed with truncation */

And below is the extra information passed as the fifth argument:

  #define TRUNC      0   /* truncation allowed */
  #define NOTRUNC    1   /* truncation not allowed */

In the case of an error, it returns a value < 0, which is one of:

  #define EDSTPAR   -1   /* Error : bad dst parameters */
  #define ESRCPAR   -2   /* Error : bad src parameters */
  #define EMODPAR   -3   /* Error : bad mode parameter */
  #define ETRUNC    -4   /* Error : not enough space to copy/concatenate
                            and truncation not allowed */

Now, combining all this, and if the assumptions are correct, gnulib can return ssize_t and use it to make its functions work up to SIZE_MAX, and either use Eric's interface or set errno accordingly.

But to me a function call like:

  str_cp (dest, memsize_of_dest, src, memsize_of_dest - 1)

is quite a common C way to do things, plus we have a way to catch truncation and not go out of bounds at the same time.

Of course such operations are tied to malloc(). I read the gnulib documentation yesterday and saw that gnulib wraps malloc() with a function that (quite logically) aborts execution and even allows setting a callback function.

In my humble opinion there is also the choice to choose reallocarray() from OpenBSD, which always checks for integer overflows with the following way:

  #define MUL_NO_OVERFLOW ((size_t) 1 << (sizeof (size_t) * 4))
  #define
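The str_cp/byte_cp pair described in this message can be restated as a self-contained unit in standard C, as sketched below. It is not the author's exact code: the operators are spelled `==` and `&&`, and the guard for `dest == NULL` or `dest_len == 0` is an added assumption, since `dest_len - 1` would otherwise wrap around to SIZE_MAX.

```c
#include <assert.h>
#include <stddef.h>

/* Copy at most nelem bytes from src to dest, stopping at the first
   NUL in src.  Does not terminate dest.  Returns the bytes written.  */
size_t
byte_cp (char *dest, const char *src, size_t nelem)
{
  assert (dest != NULL && src != NULL);
  size_t len = 0;
  while (len < nelem && src[len] != '\0')
    {
      dest[len] = src[len];
      len++;
    }
  return len;
}

/* Copy at most nelem bytes, but never more than dest_len - 1, and
   always NUL-terminate dest.  Returns the bytes written, excluding
   the terminator.  */
size_t
str_cp (char *dest, size_t dest_len, const char *src, size_t nelem)
{
  /* Added guard (an assumption, not in the original message).  */
  if (dest == NULL || dest_len == 0)
    return 0;
  size_t num = nelem > dest_len - 1 ? dest_len - 1 : nelem;
  size_t len = src == NULL ? 0 : byte_cp (dest, src, num);
  dest[len] = '\0';
  return len;
}
```

When `src` is known to hold at least `n` bytes, a caller can detect truncation by testing whether `str_cp (buf, sizeof buf, src, n)` returned less than `n`.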
Re: immutable string type
Hi Bruno,

On 28.12.19 12:17, Bruno Haible wrote:
> Would you find it useful to have an immutable string type in gnulib?

The idea is good; in fact, I had similar thoughts/needs a while ago. IMO, the use cases are mostly in the testing area (especially fuzzing).

As a more general approach, a function that switches already allocated memory into read-only memory would be handy. Like in

- m = malloc()
- initialize m with some data
- if in debug mode: call memmap_readonly(m)
- from this point on 'm' is read-only and a write leads to a segmentation fault.
- ...
- free(m)

Maybe it would best be integrated into glibc?

Functions like iasprintf could then be built around existing functions as needed, e.g. as static inline or as macro.

> In the simplest case, this would be a 'const char *' where the 'const' is
> actually checked by the hardware. You allocate it through
>
>    const char *str = iasprintf (...);
>
> You use it like any 'const char *'.
>
> You free it through
>
>    ifree (str);
>
> not free (str). And when you attempt to write into it:
>
>    ((char *) str)[0] = 'x';
>
> it crashes.
>
> The benefits I imagine:
>   - no worry about security flaws through multithreaded accesses,
>   - in large applications: verification that no part of the application
>     is doing side effects that it shouldn't.
>
> The implementation uses mmap() to create a read-only and a read-write
> view of the same memory area. The contents of the string are filled through
> the read-write view. All other operations are done through the read-only
> view, because the address of the string is the one of the read-only view.
>
> This won't work on all platforms, e.g. HP-UX. But it will work on glibc
> systems, BSD, and Solaris, at least.

Regards, Tim
immutable string type
Hi all,

Would you find it useful to have an immutable string type in gnulib?

In the simplest case, this would be a 'const char *' where the 'const' is actually checked by the hardware. You allocate it through

   const char *str = iasprintf (...);

You use it like any 'const char *'.

You free it through

   ifree (str);

not free (str). And when you attempt to write into it:

   ((char *) str)[0] = 'x';

it crashes.

The benefits I imagine:
  - no worry about security flaws through multithreaded accesses,
  - in large applications: verification that no part of the application
    is doing side effects that it shouldn't.

The implementation uses mmap() to create a read-only and a read-write view of the same memory area. The contents of the string are filled through the read-write view. All other operations are done through the read-only view, because the address of the string is the one of the read-only view.

This won't work on all platforms, e.g. HP-UX. But it will work on glibc systems, BSD, and Solaris, at least.

Bruno
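The two-view idea in this message can be illustrated by mapping the same backing object twice with MAP_SHARED, once writable and once read-only. The sketch below is an assumption-laden illustration, not the proposed Gnulib implementation: `struct dual_view` and its functions are hypothetical names, and an unlinked temporary file under /tmp stands in for whatever backing object a real implementation would choose.

```c
#define _DEFAULT_SOURCE 1
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical pair of views onto the same memory.  */
struct dual_view
{
  void *rw;         /* used only while filling in the contents */
  const void *ro;   /* the address handed out to the application */
  size_t len;
};

/* Create both views.  Returns 0 on success, -1 on failure.  */
int
dual_view_create (struct dual_view *v, size_t len)
{
  assert (v != NULL);
  if (len == 0)
    return -1;
  char path[] = "/tmp/imviewXXXXXX";   /* assumed writable location */
  int fd = mkstemp (path);
  if (fd < 0)
    return -1;
  unlink (path);                       /* the file lives on via fd */
  if (ftruncate (fd, (off_t) len) != 0)
    {
      close (fd);
      return -1;
    }
  void *rw = mmap (NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  void *ro = mmap (NULL, len, PROT_READ, MAP_SHARED, fd, 0);
  close (fd);                          /* mappings stay valid */
  if (rw == MAP_FAILED || ro == MAP_FAILED)
    {
      if (rw != MAP_FAILED)
        munmap (rw, len);
      if (ro != MAP_FAILED)
        munmap (ro, len);
      return -1;
    }
  v->rw = rw;
  v->ro = ro;
  v->len = len;
  return 0;
}

/* Unmap both views.  */
void
dual_view_destroy (struct dual_view *v)
{
  assert (v != NULL);
  munmap (v->rw, v->len);
  munmap ((void *) v->ro, v->len);
}
```

Data written through `rw` is visible through `ro` because both mappings share the same pages, while a store through the `ro` address faults. A function like iasprintf would format into `rw` and return `ro`.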