Re: [Standards] Nodeprep question
Peter Saint-Andre wrote:
> Joe Hildebrand wrote:
>> On Dec 5, 2007, at 12:47 AM, Peter Saint-Andre wrote:
>>> Would it be helpful to post a little XEP about this or something?
>> Perhaps we should just improve Wikipedia starting here:
>> http://en.wikipedia.org/wiki/Unicode_normalization
>> and keep adding more until it makes sense.
> Yes or that. :) Or we might want to make a special page on wiki.jabber.org that talks about normalization and our special characters from nodeprep. Less clutter that way.

Looking at it more...

#x22 (")
FF02;FF02;FF02;0022;0022; # FULLWIDTH QUOTATION MARK

#x26 (&)
FE60;FE60;FE60;0026;0026; # SMALL AMPERSAND
FF06;FF06;FF06;0026;0026; # FULLWIDTH AMPERSAND

#x27 (')
FF07;FF07;FF07;0027;0027; # FULLWIDTH APOSTROPHE

#x2F (/)
2100;2100;2100;0061 002F 0063;0061 002F 0063; # ACCOUNT OF
2101;2101;2101;0061 002F 0073;0061 002F 0073; # ADDRESSED TO THE SUBJECT
2105;2105;2105;0063 002F 006F;0063 002F 006F; # CARE OF
2106;2106;2106;0063 002F 0075;0063 002F 0075; # CADA UNA
FF0F;FF0F;FF0F;002F;002F; # FULLWIDTH SOLIDUS

#x3A (:)
2A74;2A74;2A74;003A 003A 003D;003A 003A 003D; # DOUBLE COLON EQUAL
FE13;FE13;FE13;003A;003A; # PRESENTATION FORM FOR VERTICAL COLON
FE55;FE55;FE55;003A;003A; # SMALL COLON
FF1A;FF1A;FF1A;003A;003A; # FULLWIDTH COLON

#x3C (<)
226E;226E;003C 0338;226E;003C 0338; # NOT LESS-THAN
FE64;FE64;FE64;003C;003C; # SMALL LESS-THAN SIGN
FF1C;FF1C;FF1C;003C;003C; # FULLWIDTH LESS-THAN SIGN

#x3E (>)
226F;226F;003E 0338;226F;003E 0338; # NOT GREATER-THAN
FE65;FE65;FE65;003E;003E; # SMALL GREATER-THAN SIGN
FF1E;FF1E;FF1E;003E;003E; # FULLWIDTH GREATER-THAN SIGN

#x40 (@)
FE6B;FE6B;FE6B;0040;0040; # SMALL COMMERCIAL AT
FF20;FF20;FF20;0040;0040; # FULLWIDTH COMMERCIAL AT

Peter

--
Peter Saint-Andre
https://stpeter.im/
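A listing like the one above can be approximated programmatically. The following is only a sketch, assuming Python's standard `unicodedata` module; nobody in the thread used it, and the function name `implied_prohibited` is made up for illustration. It scans the Basic Multilingual Plane for code points whose NFKC form contains one of the extra Nodeprep-prohibited characters. Two caveats: stringprep (RFC 3454) is pinned to Unicode 3.2 while Python ships a newer character database, so the output may not match the 2007 listing exactly; and because only NFKC is checked, a character such as 226E, whose NFKC column stays composed, is not reported even though its decomposed forms contain 003C.

```python
import unicodedata

# Characters that Nodeprep prohibits in addition to the stringprep tables.
PROHIBITED = set('"&\'/:<>@')


def implied_prohibited(limit=0x10000):
    """Map each prohibited character to the code points (below `limit`)
    whose NFKC form contains it. `implied_prohibited` is a hypothetical name."""
    hits = {ch: [] for ch in PROHIBITED}
    for cp in range(limit):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not characters
        char = chr(cp)
        if char in PROHIBITED:
            continue  # the prohibited characters themselves are not "implied"
        normalized = unicodedata.normalize("NFKC", char)
        for ch in PROHIBITED.intersection(normalized):
            hits[ch].append(cp)
    return hits


hits = implied_prohibited()
for ch in sorted(hits):
    # One line per prohibited character, followed by the implied code points.
    print(f"#x{ord(ch):02X} ({ch}):", " ".join(f"{cp:04X}" for cp in hits[ch]))
```

The exact set printed depends on the Unicode version bundled with the interpreter, so it is best read as a starting point for a list rather than a normative one.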
Re: [Standards] Nodeprep question
Joe Hildebrand wrote:
> On Dec 5, 2007, at 12:47 AM, Peter Saint-Andre wrote:
>> Would it be helpful to post a little XEP about this or something?
> Perhaps we should just improve Wikipedia starting here:
> http://en.wikipedia.org/wiki/Unicode_normalization
> and keep adding more until it makes sense.

Yes or that. :) Or we might want to make a special page on wiki.jabber.org that talks about normalization and our special characters from nodeprep. Less clutter that way.

Peter

--
Peter Saint-Andre
https://stpeter.im/
Re: [Standards] Nodeprep question
On Dec 5, 2007, at 12:47 AM, Peter Saint-Andre wrote:
> Would it be helpful to post a little XEP about this or something?

Perhaps we should just improve Wikipedia starting here:
http://en.wikipedia.org/wiki/Unicode_normalization
and keep adding more until it makes sense.

--
Joe Hildebrand
Re: [Standards] Nodeprep question
Mickaël Rémond wrote:
> Hello,
>
> On 19 Nov 2007, at 23:20, Tomasz Sterna wrote:
>> On Mon, 19-11-2007 at 22:27 +0100, Mickaël Rémond wrote:
>>> Nodeprep adds forbidden characters to the usual stringprep tables. Among those characters we find / (47).
>> IIUC the only reason that the slash '/' character is forbidden in a node part is that it is a resource delimiter. So encountering '/' in the JID means that the resource has just started.
> Yes, sure, I understand the purpose of the limitation.
>>> Some libraries extend it to characters such as c/o (8453). The rationale behind that is that it contains a fraction.
>> I think they do wrong.
> I finally found the document that is really useful for knowing which characters should be forbidden after normalization.
> For the record, you can check:
> http://www.unicode.org/Public/UNIDATA/NormalizationTest.txt
> It shows that KC normalization turns the c/o character (8453 in decimal, 2105 in hex) into 0063 002F 006F.
> It shows that the result contains 002F (47 in decimal), which is a forbidden character.
> This is the resource I was looking for on Unicode normalization, as it shows precisely which characters are implicitly forbidden due to normalization.

Would it be helpful to post a little XEP about this or something?

Peter

--
Peter Saint-Andre
https://stpeter.im/
Re: [Standards] Nodeprep question
Hello,

On 19 Nov 2007, at 23:20, Tomasz Sterna wrote:
> On Mon, 19-11-2007 at 22:27 +0100, Mickaël Rémond wrote:
>> Nodeprep adds forbidden characters to the usual stringprep tables. Among those characters we find / (47).
> IIUC the only reason that the slash '/' character is forbidden in a node part is that it is a resource delimiter. So encountering '/' in the JID means that the resource has just started.

Yes, sure, I understand the purpose of the limitation.

>> Some libraries extend it to characters such as c/o (8453). The rationale behind that is that it contains a fraction.
> I think they do wrong.

I finally found the document that is really useful for knowing which characters should be forbidden after normalization.
For the record, you can check:
http://www.unicode.org/Public/UNIDATA/NormalizationTest.txt

It shows that KC normalization turns the c/o character (8453 in decimal, 2105 in hex) into 0063 002F 006F.
It shows that the result contains 002F (47 in decimal), which is a forbidden character.

This is the resource I was looking for on Unicode normalization, as it shows precisely which characters are implicitly forbidden due to normalization.

--
Mickaël Rémond
http://www.process-one.net/
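The c/o observation can be reproduced without downloading NormalizationTest.txt. This is a small illustration, assuming Python's standard `unicodedata` module (an assumption of this note, not a tool mentioned in the message):

```python
import unicodedata

care_of = "\u2105"  # CARE OF, 8453 in decimal

# KC normalization (NFKC) replaces the single symbol with three characters.
normalized = unicodedata.normalize("NFKC", care_of)
code_points = [f"{ord(c):04X}" for c in normalized]

print(code_points)        # ['0063', '002F', '006F']
print("/" in care_of)     # False: no solidus before normalization
print("/" in normalized)  # True: 002F appears after normalization
```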
Re: [Standards] Nodeprep question
On 11/20/07, Tomasz Sterna [EMAIL PROTECTED] wrote:
> On Tue, 20-11-2007 at 09:37 +0300, Sergei Golovan wrote:
>>>> Some libraries extend it to characters such as c/o (8453). The rationale behind that is that it contains a fraction.
>>> I think they do wrong.
>> You forgot about Unicode normalization.
> So what? Resource separator is 0x2F slash, not any other 'normalized' slash.

Any check for forbidden characters in a stringprepped string is performed after Unicode normalization (see sections 4 and 5 of RFC 3454).

--
Sergei Golovan
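The ordering point can be illustrated with a deliberately simplified sketch. It assumes Python's standard `unicodedata` module and covers only the extra Nodeprep characters; the mapping step, the stringprep prohibition tables and the bidirectional checks of RFC 3454 are all omitted, and the helper name `violates_nodeprep_extra` is hypothetical.

```python
import unicodedata

# Extra characters prohibited by Nodeprep, beyond the stringprep tables.
NODEPREP_EXTRA_PROHIBITED = frozenset('"&\'/:<>@')


def violates_nodeprep_extra(node):
    """Return True if the NFKC-normalized node contains a prohibited character.

    Simplified on purpose: normalize first, then check prohibited output.
    """
    normalized = unicodedata.normalize("NFKC", node)
    return any(ch in NODEPREP_EXTRA_PROHIBITED for ch in normalized)


print(violates_nodeprep_extra("alice"))     # False
print("/" in "a\u2105b")                    # False: a check before normalization finds nothing
print(violates_nodeprep_extra("a\u2105b"))  # True: the check after normalization sees 002F
```

A library that rejects the c/o character in a node is therefore consistent with checking prohibited output after normalization rather than before it.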
Re: [Standards] Nodeprep question
On Tue, 20-11-2007 at 09:37 +0300, Sergei Golovan wrote:
>>> Some libraries extend it to characters such as c/o (8453). The rationale behind that is that it contains a fraction.
>> I think they do wrong.
> You forgot about Unicode normalization.

So what? Resource separator is 0x2F slash, not any other 'normalized' slash.

--
  /\_./o__ Tomasz Sterna
 (/^/(_^^' Xiaoka.com
._.(_.)_   XMPP: [EMAIL PROTECTED]
[Standards] Nodeprep question
Hello,

I am trying to find the rules (or the logic) behind nodeprep processing as done by many libraries.

Nodeprep adds forbidden characters to the usual stringprep tables. Among those characters we find / (47).
Some libraries extend it to characters such as c/o (8453). The rationale behind that is that it contains a fraction.

Is there somewhere a complete list of such Unicode characters that are implicitly forbidden by the following section of the Nodeprep RFC:

   In addition, the following Unicode characters are also prohibited:

   #x22 (")
   #x26 (&)
   #x27 (')
   #x2F (/)
   #x3A (:)
   #x3C (<)
   #x3E (>)
   #x40 (@)

I end up wondering why these other types of fractions are often accepted by nodeprep libraries:
1/4: 188
1/2: 189
3/4: 190
Fraction Slash: 8260
1/8: 8539
3/8: 8540
5/8: 8541
7/8: 8542
Division slash: 8725

Any pointers are appreciated.

Cheers :)

--
Mickaël Rémond
http://www.process-one.net/
Re: [Standards] Nodeprep question
On Mon, 19-11-2007 at 22:27 +0100, Mickaël Rémond wrote:
> Nodeprep adds forbidden characters to the usual stringprep tables. Among those characters we find / (47).

IIUC the only reason that the slash '/' character is forbidden in a node part is that it is a resource delimiter. So encountering '/' in the JID means that the resource has just started.

> Some libraries extend it to characters such as c/o (8453). The rationale behind that is that it contains a fraction.

I think they do wrong.

--
  /\_./o__ Tomasz Sterna
 (/^/(_^^' Xiaoka.com
._.(_.)_   XMPP: [EMAIL PROTECTED]
Re: [Standards] Nodeprep question
On 11/20/07, Tomasz Sterna [EMAIL PROTECTED] wrote:
>> Some libraries extend it to characters such as c/o (8453). The rationale behind that is that it contains a fraction.
> I think they do wrong.

You forgot about Unicode normalization.

--
Sergei Golovan
Re: [Standards] Nodeprep question
On 11/20/07, Mickaël Rémond [EMAIL PROTECTED] wrote:
> I end up wondering why these other types of fractions are often accepted by nodeprep libraries:
> 1/4: 188
> 1/2: 189
> 3/4: 190
> Fraction Slash: 8260

Fraction slash normalizes to itself, and all slashes in the given fractions normalize to Fraction Slash. Since this slash isn't used for separating the server and resource parts of a JID, it's harmless (in fact it's not so harmless, as it is another source of phishing).

> 1/8: 8539
> 3/8: 8540
> 5/8: 8541
> 7/8: 8542
> Division slash: 8725
> Any pointers are appreciated.

All pointers are listed in the RFC 3920 reference list. You need to read at least RFC 3454.

--
Sergei Golovan
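The claim about the fractions can be checked in a few lines. This sketch assumes Python's standard `unicodedata` module (an assumption of this note) and a hypothetical helper `nfkc_hex`; it prints the NFKC form of each code point from the question so that the absence of 002F is visible.

```python
import unicodedata

# Decimal code points listed in the original question.
CODE_POINTS = [188, 189, 190, 8260, 8539, 8540, 8541, 8542, 8725]


def nfkc_hex(cp):
    """Return the NFKC form of one code point as a list of hex code points."""
    return [f"{ord(c):04X}" for c in unicodedata.normalize("NFKC", chr(cp))]


for cp in CODE_POINTS:
    # Example line: 188 VULGAR FRACTION ONE QUARTER ['0031', '2044', '0034']
    print(cp, unicodedata.name(chr(cp)), nfkc_hex(cp))

# None of these normalize to a sequence containing SOLIDUS (002F).
print(any("002F" in nfkc_hex(cp) for cp in CODE_POINTS))  # False
```

The vulgar fractions come out with 2044 (FRACTION SLASH) between the digits, while 8260 and 8725 are left unchanged, which is why none of them trips the check for 002F.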