On 08/10/15 20:27, Emmanuel Lécharny wrote:
> One more thing I forgot to mention...
>
> Escaping a space can be done in many ways:
>
> - '\ '
> - '\20'
> - '#20' (unlikely, but this is possible)
>
> All three kinds of escaped space must be handled, so that the space
> is kept after the prepareString processing.
>
There is something I'm not sure of: what is considered an escaped space. Currently, in the RFC, only '\ ' is; none of the other chars that are translated to a space (and there are many!) will be considered escaped spaces.

If I interpret the RFC strictly, \20 and #20 are not supposed to be considered escaped spaces. OTOH, should one of those representations appear in a String, it should probably be taken as the intent to have those chars escaped and still present after the normalization...

There are more chars that are going to be translated to a space: 0x09 to 0x0D will be translated to 0x20, and so will 0x85, 0xA0, 0x2000 to 0x200A, 0x2028 and 0x2029, 0x202F, 0x205F and 0x3000. That means any of those chars will be removed if it is at the beginning or at the end of the String, or if we have 2 contiguous chars that translate to a space. And every hexpair like \09, \0A, \0B, \0C, \0D, \85, \A0 is going to be seen as a space, too.

At this point, I guess what is important is to be consistent. As long as we respect one form of escaped space (ie '\ ' only) and ignore all the others, operations like comparison of values, assuming both have been normalized the same way, should lead to the same result. The only risk is that two different values may end up comparing as equal when they should not.

OTOH, we are talking about use cases that are very unlikely to happen... I'm not going to spend days fixing corner cases that will probably never be faced in real life, when we have more urgent bugs to fix ;-)
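
To make the "consistent" option concrete, here is a minimal sketch of it: only '\ ' is honoured as an escaped space, every char in the list above is mapped to a space, and unescaped spaces are trimmed at both ends and collapsed in the middle. The class and method names (SpaceNormalizer, mapsToSpace, normalize) are hypothetical and this is not the actual prepareString code. Leaving the backslash in the output, and leaving hexpairs such as \20 untouched, are assumptions of the sketch.

```java
final class SpaceNormalizer {

    // Chars that are translated to a space, per the list in this message.
    static boolean mapsToSpace(char c) {
        return c == 0x20
            || (c >= 0x09 && c <= 0x0D)
            || c == 0x85 || c == 0xA0
            || (c >= 0x2000 && c <= 0x200A)
            || c == 0x2028 || c == 0x2029
            || c == 0x202F || c == 0x205F || c == 0x3000;
    }

    // Hypothetical normalization: trims and collapses unescaped spaces,
    // but keeps every '\ ' pair verbatim (assumption: the escape is preserved).
    static String normalize(String value) {
        StringBuilder out = new StringBuilder();
        boolean pendingSpace = false; // an unescaped space run was seen but not emitted yet
        int i = 0;

        while (i < value.length()) {
            char c = value.charAt(i);
            boolean escapedSpace = c == '\\'
                && i + 1 < value.length()
                && value.charAt(i + 1) == ' ';

            if (!escapedSpace && mapsToSpace(c)) {
                // Remember the run; it is emitted as one space only if
                // something significant follows.
                pendingSpace = true;
                i++;
                continue;
            }

            // Emit a single space for a run in the middle of the value.
            // Leading runs are dropped because 'out' is still empty.
            if (pendingSpace && out.length() > 0) {
                out.append(' ');
            }
            pendingSpace = false;

            if (escapedSpace) {
                out.append("\\ "); // significant space, kept as is
                i += 2;
            } else {
                out.append(c);     // includes the '\' of hexpairs like \20
                i++;
            }
        }
        // A trailing unescaped run is never emitted.
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + normalize("  a \t b  ") + "]");
        System.out.println("[" + normalize("\\ a\\ ") + "]");
    }
}
```

With this rule, "a  b" and "a\tb" normalize to the same value, while a leading or trailing '\ ' survives. The cost is the one mentioned above: a value written with \20 is not treated the same way as one written with '\ '.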
