On 3 Oct 2012, at 08:09, Wolfgang Lux wrote:

> Richard Frith-Macdonald wrote:
>
>> We could probably adapt your patch to use precision as string length in those
>> cases where it will work, but you can't catch all cases that way ... so
>> maybe it's better if people find out as soon as possible that c-strings have
>> to be nul terminated.
>>
>> Sorry about this ... but it's a behavior inherited from the C stdio library
>> and posix etc standards. My own feeling is that format strings *ought* to
>> provide some way of working with unterminated strings, but they just don't,
>> so you have to copy the data into a big enough buffer, add the nul
>> terminator, and use that buffer instead of the original data :-(
>
> I don't think your description of the standards is correct. My copy of the
> ANSI C'99 standard has this to say on the %s format specifier:
> "If the precision is specified, no more than that many characters are
> written. If the precision is not specified or is greater than the size of the
> array, the array shall contain a null character."
> With that specification, I'd say that Chris's code is correct. He uses an
> array containing 50 bytes and uses precision 50, so the array shouldn't
> require a NULL terminator.
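For reference, the reading of the C'99 text quoted above can be sketched in plain C. This is only an illustration of the stdio behavior, not of GSFormat; `format_unterminated` is a hypothetical helper name, and the sketch assumes the caller passes a precision no larger than the array.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not GNUstep API): format at most `len` bytes of a
 * buffer that need not be nul terminated. */
static int format_unterminated(char *out, size_t outsize,
                               const char *data, int len)
{
    /* A negative precision is treated as if it were omitted, in which case
     * printf would scan for a terminator, so reject it here. */
    assert(len >= 0);

    /* "%.*s" takes the precision from the int argument that precedes the
     * pointer. With the precision no greater than the array size, the
     * standard does not require the array to contain a null character. */
    return snprintf(out, outsize, "%.*s", len, data);
}
```

With a 50-byte array and a precision of 50, as in Chris's case, the call writes exactly the 50 bytes and appends its own terminator to the output buffer.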
Oh, that's a different section of the documentation (I was reading the bit dealing with precision, and I just found the bit you quote under the 's' conversion specifier).

Which would mean there are apparent inconsistencies ... so I looked further (specifically at recent xopen documentation ... which really ought to be authoritative for modern software). And ... that's different again ... the xopen docs make it clear that they are talking about *bytes* (so the current implementation is wrong) where other documentation talks about characters:

    The argument shall be a pointer to an array of char. Bytes from the
    array shall be written up to (but not including) any terminating null
    byte. If the precision is specified, no more than that many bytes shall
    be written. If the precision is not specified or is greater than the
    size of the array, the application shall ensure that the array contains
    a null byte.

    If an l (ell) qualifier is present, the argument shall be a pointer to
    an array of type wchar_t. Wide characters from the array shall be
    converted to characters (each as if by a call to the wcrtomb() function,
    with the conversion state described by an mbstate_t object initialized
    to zero before the first wide character is converted) up to and
    including a terminating null wide character. The resulting characters
    shall be written up to (but not including) the terminating null
    character (byte). If no precision is specified, the application shall
    ensure that the array contains a null wide character. If a precision is
    specified, no more than that many characters (bytes) shall be written
    (including shift sequences, if any), and the array shall contain a null
    wide character if, to equal the character sequence length given by the
    precision, the function would need to access a wide character one past
    the end of the array. In no case shall a partial character be written.
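To make the byte-based reading concrete, here is a rough sketch of a byte precision that never emits a partial character. It assumes the data is valid UTF-8 (the xopen text itself goes through wcrtomb() and the current locale instead), and `utf8_bytes_within_precision` is a hypothetical name, not an existing function.

```c
#include <stddef.h>

/* Hypothetical helper: return how many bytes of the UTF-8 buffer `s`
 * (holding `size` bytes) fit within a byte `precision` without cutting a
 * multibyte character in half. Stops early at a nul byte.
 * Assumption: the input is valid UTF-8. */
static size_t utf8_bytes_within_precision(const unsigned char *s,
                                          size_t size, size_t precision)
{
    size_t limit = precision < size ? precision : size;
    size_t used = 0;

    while (used < limit && s[used] != '\0') {
        unsigned char c = s[used];
        size_t n = 1;                 /* length of this character in bytes */

        if (c >= 0xF0) n = 4;
        else if (c >= 0xE0) n = 3;
        else if (c >= 0xC0) n = 2;

        if (used + n > limit) break;  /* the whole character does not fit */
        used += n;
    }
    return used;
}
```

For the four bytes of "a", "é", "z" in UTF-8, a precision of 2 yields only the first byte, because the two-byte "é" would be split.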
Interestingly, they are very specific about saying that the precision is a number of bytes rather than a number of characters (quite different from the older documentation I was looking at before), even in the case where the argument is an array of wide characters. They even mention omitting the last character if it's a multibyte one and not all of its bytes would be permitted by the precision.

Maybe we should update the code to try to match the modern standard, but ... in the context of GSFormat, adopting a byte-based output precision would be very counter-intuitive, since an NSString deals with UTF-16 and everyone expects the precision to give a number of 16-bit characters in the resulting NSString object.

So I'm not sure what to do ... the C standards have changed from working with characters to working with bytes (which is good), but we can't simply adopt that because it would break OSX compatibility (and people's reasonable expectations).

Perhaps what we need is what I suggested (as a complex/inefficient option) in an earlier email ... to parse the input string character by character and treat the precision as a limit on the number of characters we read from it.

Perhaps tests on OSX to reverse-engineer Apple's behavior are our best bet.
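The character-by-character option could look something like the sketch below. It is not what GSFormat does today, and it makes two assumptions: the input is valid UTF-8, and the precision counts 16-bit (UTF-16) units in the result, so a character outside the BMP costs two. `utf8_prefix_for_utf16_units` is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical helper: return how many bytes of the UTF-8 buffer `s`
 * (holding `size` bytes) may be consumed so that the decoded text has at
 * most `precision` UTF-16 code units. Stops early at a nul byte.
 * Assumption: the input is valid UTF-8. */
static size_t utf8_prefix_for_utf16_units(const unsigned char *s,
                                          size_t size, size_t precision)
{
    size_t used = 0;    /* bytes consumed so far */
    size_t units = 0;   /* UTF-16 code units produced so far */

    while (used < size && s[used] != '\0') {
        unsigned char c = s[used];
        size_t nbytes = 1;
        size_t nunits = 1;

        if (c >= 0xF0) { nbytes = 4; nunits = 2; }  /* surrogate pair */
        else if (c >= 0xE0) nbytes = 3;
        else if (c >= 0xC0) nbytes = 2;

        if (used + nbytes > size) break;       /* sequence cut off by array end */
        if (units + nunits > precision) break; /* would exceed the precision */

        used += nbytes;
        units += nunits;
    }
    return used;
}
```

The point of the design is that the scan never reads past `size` bytes, so an unterminated array is safe as long as the caller knows its length, which is the cost compared with the plain stdio behavior.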