[Resending with different From: address.]

It would have saved me some confusion if it were stated early that stringprep takes only Unicode as input to the processing.
Right now it says "text strings" up until it starts to discuss the output of the process, where it begins to talk about Unicode. One could get the impression that stringprep is a framework for preparing all kinds of text strings into canonicalized Unicode strings, when it is in fact only about preparing Unicode text strings. Sentences like the following support that view (from the Introduction):

"these profiles will allow users to enter internationalized text strings in applications and have the highest chance of getting the content of the strings correct."

That sentence doesn't reflect what will happen in typical internationalized systems (like, on _my_ machine): many systems enter internationalized text strings in charsets other than Unicode, and must convert them into Unicode before stringprep is useful.

My $.02 solution:

--- draft-hoffman-stringprep-03.txt.orig  Mon May 27 21:38:42 2002
+++ draft-hoffman-stringprep-03.txt  Mon May 27 22:08:02 2002
@@ -25,7 +25,7 @@
 
 Abstract
 
-This document describes a framework for preparing text strings in order
+This document describes a framework for preparing Unicode text strings in order
 to increase the likelihood that string input and string comparison work
 in ways that make sense for typical users throughout the world. The
 stringprep protocol is useful for protocol identifier values, company
@@ -92,7 +92,7 @@
 behaviors that make it difficult to compare text in a consistent
 fashion.
 
-This document specifies a framework of text processing rules. Other
+This document specifies a framework of text processing rules for text in Unicode
+format. Other
 protocols can create profiles of these rules; these profiles will allow
 users to enter internationalized text strings in applications and have
 the highest chance of getting the content of the strings correct.
@@ -100,6 +100,13 @@
 they think is the same string into two different input mechanisms, the
 strings should match on a character-by-character basis.
 
+This framework does not describe how data is translated from other
+characters into Unicode characters. Systems that use non-Unicode
+input methods must use a consistent way to transcode data into Unicode
+before using this framework. In such systems, the transcoding
+algorithm is a critical part of enabling secure and "correct"
+operation of internationalized text strings.
+
 In addition to helping string matching, profiles of stringprep can also
 exclude characters that should not normally appear in text that is used
 in the protocol. The profile can prevent such characters by changing the
@@ -753,7 +760,10 @@
 Because it is impossible to map similar-looking characters without a
 great deal of context such as knowing the fonts used, stringprep does
 nothing to map similar-looking characters together nor
-to prohibit some characters because they look like others.
+to prohibit some characters because they look like others. Nor does it
+do anything to assure that any algorithms translating characters
+from non-Unicode into Unicode produce the same output in all
+implementations.
 
 9. IANA Considerations
 
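
To illustrate why the transcoding step matters, here is a minimal Python sketch. It is not taken from the draft: the helper names to_unicode and prepare are hypothetical, and prepare is only a rough stand-in for a stringprep profile (it applies the B.1 and B.2 mapping tables from Python's standard stringprep module, then NFKC, and skips the prohibited-character and bidi checks). The same visible word arrives as different bytes in two charsets; the prepared strings match only if each input is first decoded with the charset it was actually entered in.

```python
import stringprep
import unicodedata


def to_unicode(raw: bytes, charset: str) -> str:
    """Transcode non-Unicode input into a Unicode string (hypothetical helper)."""
    return raw.decode(charset)


def prepare(text: str) -> str:
    """Rough approximation of a stringprep profile, for illustration only."""
    # Mapping step: drop characters "commonly mapped to nothing" (table B.1)
    # and case-fold the rest (table B.2).
    mapped = "".join(
        stringprep.map_table_b2(ch)
        for ch in text
        if not stringprep.in_table_b1(ch)
    )
    # Normalization step: Unicode normalization form KC.
    return unicodedata.normalize("NFKC", mapped)


# The same word, as entered on an ISO-8859-1 system and on a UTF-8 system.
latin1_bytes = b"Caf\xe9"
utf8_bytes = b"Caf\xc3\xa9"

# Transcoded with the correct charset, both inputs prepare to the same string.
from_latin1 = prepare(to_unicode(latin1_bytes, "iso-8859-1"))
from_utf8 = prepare(to_unicode(utf8_bytes, "utf-8"))

# Transcoded with the wrong charset, the prepared string no longer matches.
misdecoded = prepare(to_unicode(utf8_bytes, "iso-8859-1"))

print(from_latin1 == from_utf8)   # the two correctly decoded inputs compare equal
print(misdecoded == from_utf8)    # the mis-decoded input does not
```

The point of the sketch is that nothing inside prepare can detect or repair the wrong decoding; consistency has to come from the transcoding step that runs before it.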
