Michel Suignard <[EMAIL PROTECTED]> wrote:

> By excluding table C.2.1 from the StringPrep profile used by IDNA, the
> ToASCII operation allows all C0 control codes and 7F in its default
> mode (UseSTD3ASCIIRules flag unset).

As Paul said, there is no default mode. It is up to the application to
decide whether to set UseSTD3ASCIIRules. Of course a library could make
one mode or the other the default, and it is free to choose which. A
library could also, I suppose, implement only one mode or the other, in
which case I guess it would be an incomplete conformant implementation
of ToASCII.

> This is rather troublesome as these control codes, especially the 00
> value, may create all sorts of issues for run time libraries that use
> zero as string terminator on input.

If the programming environment customarily uses a string representation
that does not allow embedded NULs to be represented, then it will be a
moot point whether your ToASCII implementation handles NUL correctly,
because it cannot be tested anyway. You can reasonably claim that it's
not your IDN library that's incomplete, but the programming environment
that's incomplete.

Note that ToASCII and ToUnicode will never try to output an embedded NUL
character if they never receive an embedded NUL character as input.

For an example of a C library that handles embedded NULs, see GNU
libidn: http://www.gnu.org/software/libidn/

> I understand the value of allowing all ASCII non control characters
> but allowing by default the control characters in a ToASCII function
> seems to open the door for all sorts of abuse and security risks.

ToASCII and ToUnicode never introduce control characters; they output
only those control characters that were already present in the input.
If you consider control characters to be dangerous, then I would think
you'd want to reject them as early as possible, before you even get to
calling ToASCII or ToUnicode. Once you have control-code-free strings,
ToASCII and ToUnicode will preserve that property.

> Would a library that by default only allow the range 20-7E be still
> considered conformant?

If you want to claim to have a complete implementation of ToASCII, then
I think it needs to be possible to pass control characters through. But
it needn't be the default mode. If you want to add your own
AllowControlChars flag that is unset by default, I see no problem with
that. The spec standardizes the function (input --> output), but not
the interface.

AMC
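
For what it's worth, here is a minimal sketch in Python of the kind of
AllowControlChars wrapper described above, which rejects control codes
before ToASCII is ever called. The names has_control_chars,
to_ascii_checked and allow_control_chars are made up for illustration,
and to_ascii stands for whatever ToASCII implementation you already
have; none of this comes from the IDNA spec or from any particular
library.

```python
from typing import Callable


def has_control_chars(label: str) -> bool:
    """Return True if label contains a C0 control code (00-1F) or 7F."""
    return any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in label)


def to_ascii_checked(
    label: str,
    to_ascii: Callable[[str], str],
    allow_control_chars: bool = False,
) -> str:
    """Hypothetical wrapper around an existing ToASCII implementation.

    With allow_control_chars unset (the default chosen here), labels
    containing control codes are rejected before to_ascii is called.
    With it set, the label is passed through unchanged, so the full
    ToASCII behaviour remains reachable.
    """
    if not allow_control_chars and has_control_chars(label):
        raise ValueError("label contains a control character")
    return to_ascii(label)
```

The check runs before the conversion, so a caller that never sets
allow_control_chars only ever hands control-code-free strings to the
underlying ToASCII.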
