Re: [Standards] Nodeprep question

2007-12-05 Thread Peter Saint-Andre
Peter Saint-Andre wrote:
> Joe Hildebrand wrote:
>> On Dec 5, 2007, at 12:47 AM, Peter Saint-Andre wrote:
>>
>>> Would it be helpful to post a little XEP about this or something?
>>
>> Perhaps we should just improve Wikipedia starting here:
>>
>> http://en.wikipedia.org/wiki/Unicode_normalization
>>
>> and keep adding more until it makes sense.
>
> Yes or that. :)
>
> Or we might want to make a special page on wiki.jabber.org that talks
> about normalization and our special characters from nodeprep. Less
> clutter that way.

Looking at it more...

#x22 (")

FF02;FF02;FF02;0022;0022; # FULLWIDTH QUOTATION MARK

#x26 (&)

FE60;FE60;FE60;0026;0026; # SMALL AMPERSAND
FF06;FF06;FF06;0026;0026; # FULLWIDTH AMPERSAND

#x27 (')

FF07;FF07;FF07;0027;0027; # FULLWIDTH APOSTROPHE

#x2F (/)

2100;2100;2100;0061 002F 0063;0061 002F 0063; # ACCOUNT OF
2101;2101;2101;0061 002F 0073;0061 002F 0073; # ADDRESSED TO THE SUBJECT
2105;2105;2105;0063 002F 006F;0063 002F 006F; # CARE OF
2106;2106;2106;0063 002F 0075;0063 002F 0075; # CADA UNA
FF0F;FF0F;FF0F;002F;002F; # FULLWIDTH SOLIDUS

#x3A (:)

2A74;2A74;2A74;003A 003A 003D;003A 003A 003D; # DOUBLE COLON EQUAL
FE13;FE13;FE13;003A;003A; # PRESENTATION FORM FOR VERTICAL COLON
FE55;FE55;FE55;003A;003A; # SMALL COLON
FF1A;FF1A;FF1A;003A;003A; # FULLWIDTH COLON

#x3C (<)

226E;226E;003C 0338;226E;003C 0338; # NOT LESS-THAN
FE64;FE64;FE64;003C;003C; # SMALL LESS-THAN SIGN
FF1C;FF1C;FF1C;003C;003C; # FULLWIDTH LESS-THAN SIGN

#x3E (>)

226F;226F;003E 0338;226F;003E 0338; # NOT GREATER-THAN
FE65;FE65;FE65;003E;003E; # SMALL GREATER-THAN SIGN
FF1E;FF1E;FF1E;003E;003E; # FULLWIDTH GREATER-THAN SIGN

#x40 (@)

FE6B;FE6B;FE6B;0040;0040; # SMALL COMMERCIAL AT
FF20;FF20;FF20;0040;0040; # FULLWIDTH COMMERCIAL AT
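The lines above come from NormalizationTest.txt, whose five semicolon-separated columns are the source, NFC, NFD, NFKC and NFKD forms. Since stringprep checks prohibited output after NFKC normalization (RFC 3454, sections 4 and 5), only the fourth column matters for Nodeprep. A minimal Python sketch, assuming the file's current line format; `parse_test_line` and `prohibited_after_nfkc` are hypothetical helpers, not part of any XMPP library:

```python
from typing import List, Optional, Set, Tuple

# ASCII code points that Nodeprep prohibits on top of the stringprep tables.
NODEPREP_PROHIBITED = {0x22, 0x26, 0x27, 0x2F, 0x3A, 0x3C, 0x3E, 0x40}

# Column order in NormalizationTest.txt: c1=source, c2=NFC, c3=NFD, c4=NFKC, c5=NFKD.
NFKC_COLUMN = 3


def parse_test_line(line: str) -> Optional[Tuple[List[Tuple[int, ...]], str]]:
    """Parse one data line into (columns, comment).

    Each column is a tuple of code points. Blank lines, comment lines and
    '@Part' headers yield None.
    """
    data, _, comment = line.partition("#")
    data = data.strip()
    if not data or data.startswith("@"):
        return None
    fields = data.split(";")[:5]
    columns = [tuple(int(cp, 16) for cp in field.split()) for field in fields]
    return columns, comment.strip()


def prohibited_after_nfkc(line: str) -> Set[int]:
    """Return the Nodeprep-prohibited code points found in the NFKC column."""
    parsed = parse_test_line(line)
    if parsed is None:
        return set()
    columns, _ = parsed
    return NODEPREP_PROHIBITED.intersection(columns[NFKC_COLUMN])


if __name__ == "__main__":
    samples = [
        "2105;2105;2105;0063 002F 006F;0063 002F 006F; # CARE OF",
        "FF20;FF20;FF20;0040;0040; # FULLWIDTH COMMERCIAL AT",
        "226E;226E;003C 0338;226E;003C 0338; # NOT LESS-THAN",
    ]
    for sample in samples:
        found = prohibited_after_nfkc(sample)
        print(sample.split("#")[1].strip(), sorted(hex(cp) for cp in found))
```

The CARE OF and FULLWIDTH COMMERCIAL AT lines are flagged, but NOT LESS-THAN is not: U+226E contains < only in its decomposed (NFD/NFKD) columns, and NFKC recomposes it. The same applies to U+226F. To scan the whole file, iterate over `open("NormalizationTest.txt", encoding="utf-8")` and keep the lines for which `prohibited_after_nfkc` returns a non-empty set.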

Peter

-- 
Peter Saint-Andre
https://stpeter.im/



smime.p7s
Description: S/MIME Cryptographic Signature


Re: [Standards] Nodeprep question

2007-12-05 Thread Peter Saint-Andre
Joe Hildebrand wrote:
> On Dec 5, 2007, at 12:47 AM, Peter Saint-Andre wrote:
>
>> Would it be helpful to post a little XEP about this or something?
>
> Perhaps we should just improve Wikipedia starting here:
>
> http://en.wikipedia.org/wiki/Unicode_normalization
>
> and keep adding more until it makes sense.

Yes or that. :)

Or we might want to make a special page on wiki.jabber.org that talks
about normalization and our special characters from nodeprep. Less
clutter that way.

Peter

-- 
Peter Saint-Andre
https://stpeter.im/





Re: [Standards] Nodeprep question

2007-12-05 Thread Joe Hildebrand


On Dec 5, 2007, at 12:47 AM, Peter Saint-Andre wrote:

> Would it be helpful to post a little XEP about this or something?

Perhaps we should just improve Wikipedia starting here:

http://en.wikipedia.org/wiki/Unicode_normalization

and keep adding more until it makes sense.

--
Joe Hildebrand



Re: [Standards] Nodeprep question

2007-12-04 Thread Peter Saint-Andre
Mickaël Rémond wrote:
> Hello,
>
> On 19 Nov 07, at 23:20, Tomasz Sterna wrote:
>
>> On 19-11-2007, Mon at 22:27 +0100, Mickaël Rémond wrote:
>>> Nodeprep adds forbidden characters to the usual stringprep tables. Among
>>> those characters we find / (47).
>>
>> IIUC the only reason that the slash '/' character is forbidden in a node
>> part is that it is a resource delimiter.
>> So encountering '/' in the JID means that the resource has just started.
>
> Yes, sure, I understand the purpose of the limitation.
>
>>> Some libraries extend it to characters such as c/o (8453). The rationale
>>> behind that is that it contains a fraction.
>>
>> I think they are doing it wrong.
>
> I finally found the document that is really useful for knowing which
> characters should be forbidden after normalization.
> For the record, you can check:
> http://www.unicode.org/Public/UNIDATA/NormalizationTest.txt
>
> It shows that NFKC normalization turns the c/o character (8453 in decimal,
> 2105 in hex) into 0063 002F 006F.
> That result contains 002F (47 in decimal), which is a forbidden
> character.
>
> This is the resource on Unicode normalization I was looking for, as it
> shows precisely which characters are implicitly forbidden because of
> normalization.

Would it be helpful to post a little XEP about this or something?

Peter

-- 
Peter Saint-Andre
https://stpeter.im/






Re: [Standards] Nodeprep question

2007-11-21 Thread Mickaël Rémond

Hello,

On 19 Nov 07, at 23:20, Tomasz Sterna wrote:

> On 19-11-2007, Mon at 22:27 +0100, Mickaël Rémond wrote:
>
>> Nodeprep adds forbidden characters to the usual stringprep tables. Among
>> those characters we find / (47).
>
> IIUC the only reason that the slash '/' character is forbidden in a node
> part is that it is a resource delimiter.
> So encountering '/' in the JID means that the resource has just started.

Yes, sure, I understand the purpose of the limitation.

>> Some libraries extend it to characters such as c/o (8453). The rationale
>> behind that is that it contains a fraction.
>
> I think they are doing it wrong.

I finally found the document that is really useful for knowing which
characters should be forbidden after normalization.

For the record, you can check:
http://www.unicode.org/Public/UNIDATA/NormalizationTest.txt

It shows that NFKC normalization turns the c/o character (8453 in decimal,
2105 in hex) into 0063 002F 006F.
That result contains 002F (47 in decimal), which is a forbidden
character.

This is the resource on Unicode normalization I was looking for, as it
shows precisely which characters are implicitly forbidden because of
normalization.


--
Mickaël Rémond
 http://www.process-one.net/





Re: [Standards] Nodeprep question

2007-11-20 Thread Sergei Golovan
On 11/20/07, Tomasz Sterna [EMAIL PROTECTED] wrote:
> On 20-11-2007, Tue at 09:37 +0300, Sergei Golovan wrote:
>>>> Some libraries extend it to characters such as c/o (8453). The
>>>> rationale behind that is that it contains a fraction.
>>>
>>> I think they are doing it wrong.
>>
>> You forgot about Unicode normalization.
>
> So what?
> Resource separator is 0x2F slash, not any other 'normalized' slash.

Any check for forbidden characters in a stringprepped string is
performed after Unicode normalization (see sections 4 and 5 of RFC
3454).
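The point here is ordering: stringprep normalizes (RFC 3454, section 4) before it checks prohibited output (section 5), so the check sees the NFKC form rather than the raw input. A simplified Python sketch of that ordering, assuming Python's `unicodedata.ucd_3_2_0` (the Unicode 3.2 database that stringprep is defined against). `check_node` is a hypothetical name, and the sketch skips the mapping step, the stringprep prohibition tables and the bidi check, so it is not a full Nodeprep implementation:

```python
import unicodedata

# Stringprep (RFC 3454) is defined against Unicode 3.2.
UCD_3_2 = unicodedata.ucd_3_2_0

# The eight characters Nodeprep prohibits in addition to the stringprep tables.
NODEPREP_EXTRA = frozenset('"&\'/:<>@')


def check_node(node: str) -> str:
    """Hypothetical, simplified Nodeprep check: normalize, then prohibit.

    Omits mapping (case folding), the stringprep C.x tables and bidi rules.
    """
    # Normalization (RFC 3454, section 4): NFKC.
    normalized = UCD_3_2.normalize("NFKC", node)
    # Prohibited output (RFC 3454, section 5): checked on the normalized
    # string, not on the raw input.
    bad = NODEPREP_EXTRA.intersection(normalized)
    if bad:
        raise ValueError(f"prohibited after NFKC: {sorted(bad)!r}")
    return normalized


for candidate in ["juliet", "joe\u2105x", "\uff20home", "\u00bc"]:
    try:
        print(repr(candidate), "->", repr(check_node(candidate)))
    except ValueError as err:
        print(repr(candidate), "rejected:", err)
```

With this ordering, U+2105 CARE OF and U+FF20 FULLWIDTH COMMERCIAL AT are rejected even though neither input contains a literal U+002F or U+0040. The ¼ sign passes because its slash normalizes to U+2044 FRACTION SLASH.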

-- 
Sergei Golovan


Re: [Standards] Nodeprep question

2007-11-20 Thread Tomasz Sterna
On 20-11-2007, Tue at 09:37 +0300, Sergei Golovan wrote:
>>> Some libraries extend it to characters such as c/o (8453). The
>>> rationale behind that is that it contains a fraction.
>>
>> I think they are doing it wrong.
>
> You forgot about Unicode normalization.

So what?
Resource separator is 0x2F slash, not any other 'normalized' slash.
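A parser does only split on the literal U+002F. The problem arises when a prepared JID is serialized again: if the slash produced by normalizing the node were not prohibited, re-parsing the prepared JID would move the resource boundary. A minimal sketch, assuming a naive `[node@]domain[/resource]` splitter; `split_jid` is hypothetical and does no validation:

```python
import unicodedata

UCD_3_2 = unicodedata.ucd_3_2_0  # stringprep uses Unicode 3.2


def split_jid(jid: str) -> tuple[str, str, str]:
    """Naively split [node@]domain[/resource] on the literal '/' and '@'."""
    bare, _, resource = jid.partition("/")  # resource starts at the first '/'
    node, at, domain = bare.partition("@")
    if not at:  # no '@' before the resource: there is no node part
        node, domain = "", bare
    return node, domain, resource


original = "a\u2105b@example.com/home"  # node contains U+2105 CARE OF
node, domain, resource = split_jid(original)

# Normalize the node WITHOUT the post-normalization prohibition check.
unsafe_node = UCD_3_2.normalize("NFKC", node)  # U+2105 becomes "c/o"
reserialized = f"{unsafe_node}@{domain}/{resource}"

print(split_jid(original))      # node is 'a\u2105b', domain is 'example.com'
print(split_jid(reserialized))  # the '/' from "c/o" now starts the resource
```

After the round trip, the node and domain are gone and the resource swallows the rest of the address, which is why the check has to run on the normalized string.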


-- 
  /\_./o__ Tomasz Sterna
 (/^/(_^^'  Xiaoka.com
._.(_.)_  XMPP: [EMAIL PROTECTED]



[Standards] Nodeprep question

2007-11-19 Thread Mickaël Rémond

Hello,

I am trying to find the rules (or the logic) behind nodeprep
processing as done by many libraries.


Nodeprep adds forbidden characters to the usual stringprep tables. Among
those characters we find / (47).


Some libraries extend it to characters such as c/o (8453). The rationale
behind that is that it contains a fraction.


Is there a complete list somewhere of the Unicode characters that are
implicitly forbidden by the following section of the Nodeprep RFC:


In addition, the following Unicode characters are also prohibited:

  #x22 (")

  #x26 (&)

  #x27 (')

  #x2F (/)

  #x3A (:)

  #x3C (<)

  #x3E (>)

  #x40 (@)
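One way to build such a list mechanically is to normalize every code point with NFKC and keep those whose result contains one of the eight characters above. A minimal Python sketch, assuming the Unicode 3.2 database that stringprep is defined against (`unicodedata.ucd_3_2_0`), so characters added in later Unicode versions will not appear. It also skips the Nodeprep mapping step (case folding), on the assumption that folding does not introduce these punctuation characters. `implicitly_prohibited` is a hypothetical name:

```python
import sys
import unicodedata

# Stringprep (RFC 3454), and therefore Nodeprep, is defined against Unicode 3.2.
UCD_3_2 = unicodedata.ucd_3_2_0

# The characters Nodeprep prohibits on top of the stringprep tables.
NODEPREP_EXTRA = frozenset('"&\'/:<>@')


def implicitly_prohibited() -> dict[int, str]:
    """Map each non-ASCII code point whose NFKC form contains a
    Nodeprep-prohibited character to that NFKC form."""
    found = {}
    for cp in range(0x80, sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogate code points are not characters
        nfkc = UCD_3_2.normalize("NFKC", chr(cp))
        if NODEPREP_EXTRA.intersection(nfkc):
            found[cp] = nfkc
    return found


if __name__ == "__main__":
    for cp, nfkc in sorted(implicitly_prohibited().items()):
        print(f"U+{cp:04X} -> {nfkc!r}  {unicodedata.name(chr(cp), '?')}")
```

The scan flags U+2105 CARE OF (as "c/o") along with the fullwidth and small forms of the listed characters. It does not flag U+00BC (¼), whose NFKC form uses U+2044 FRACTION SLASH, or U+226E NOT LESS-THAN, which NFKC recomposes.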


I end up wondering why these other types of fractions are often
accepted by nodeprep libraries:

1/4: 188
1/2: 189
3/4: 190
Fraction Slash: 8260
1/8: 8539
3/8: 8540
5/8: 8541
7/8: 8542
Division slash: 8725

Any pointers are appreciated.

Cheers :)

--
Mickaël Rémond
 http://www.process-one.net/





Re: [Standards] Nodeprep question

2007-11-19 Thread Tomasz Sterna
On 19-11-2007, Mon at 22:27 +0100, Mickaël Rémond wrote:
> Nodeprep adds forbidden characters to the usual stringprep tables. Among
> those characters we find / (47).

IIUC the only reason that the slash '/' character is forbidden in a node
part is that it is a resource delimiter.
So encountering '/' in the JID means that the resource has just started.

> Some libraries extend it to characters such as c/o (8453). The rationale
> behind that is that it contains a fraction.

I think they are doing it wrong.


-- 
  /\_./o__ Tomasz Sterna
 (/^/(_^^'  Xiaoka.com
._.(_.)_  XMPP: [EMAIL PROTECTED]



Re: [Standards] Nodeprep question

2007-11-19 Thread Sergei Golovan
On 11/20/07, Tomasz Sterna [EMAIL PROTECTED] wrote:

>> Some libraries extend it to characters such as c/o (8453). The rationale
>> behind that is that it contains a fraction.
>
> I think they are doing it wrong.

You forgot about Unicode normalization.

-- 
Sergei Golovan


Re: [Standards] Nodeprep question

2007-11-19 Thread Sergei Golovan
On 11/20/07, Mickaël Rémond [EMAIL PROTECTED] wrote:

> I end up wondering why these other types of fractions are often accepted by
> nodeprep libraries:
> 1/4: 188
> 1/2: 189
> 3/4: 190
> Fraction Slash: 8260

Fraction slash normalizes to itself, and all the slashes in the given
fractions normalize to Fraction Slash. Since this slash isn't used
for separating the server and resource parts of a JID, it's harmless
(though in fact it's not entirely harmless, since it is another source
of phishing).

> 1/8: 8539
> 3/8: 8540
> 5/8: 8541
> 7/8: 8542
> Division slash: 8725
>
> Any pointers are appreciated.

All pointers are listed in the RFC 3920 reference list. You need to read
at least RFC3454.
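The claim about the fractions can be checked with Python's Unicode 3.2 database (`unicodedata.ucd_3_2_0`; this is an assumption about tooling, not something the thread used). Each code point from the question is normalized with NFKC, and none of the results contain U+002F. The helper name `nfkc_forms` is hypothetical:

```python
import unicodedata

UCD_3_2 = unicodedata.ucd_3_2_0  # stringprep's Unicode version

# Decimal code points from the question: 1/4, 1/2, 3/4, FRACTION SLASH,
# 1/8, 3/8, 5/8, 7/8, DIVISION SLASH.
FRACTIONS = [188, 189, 190, 8260, 8539, 8540, 8541, 8542, 8725]


def nfkc_forms(code_points: list[int]) -> dict[int, str]:
    """Return the NFKC form of each code point."""
    return {cp: UCD_3_2.normalize("NFKC", chr(cp)) for cp in code_points}


for cp, nfkc in nfkc_forms(FRACTIONS).items():
    hex_form = " ".join(f"{ord(ch):04X}" for ch in nfkc)
    print(f"U+{cp:04X} -> {hex_form}  contains U+002F: {'/' in nfkc}")
```

For U+00BC this prints `0031 2044 0034`, so the slash comes out as U+2044 FRACTION SLASH. U+2215 DIVISION SLASH has no decomposition and maps to itself.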

-- 
Sergei Golovan