On 28/03/16 12:23, Emmanuel Lécharny wrote:
> Hi guys,
>
> I'm now working on the PrepareString part. It needs a bit of work, as we
> don't correctly handle spaces. We also have to remove the escaping we do
> there.
>
> That is what I'm working on atm.

A bit more on what's going on...
String Preparation is specified in RFC 4518. It's a process that involves 6 steps:
1) Transcode
2) Map
3) Normalize
4) Prohibit
5) Check bidi
6) Insignificant Character Handling

The first phase is just a transformation of a byte[] to a String, which is done through a call to Strings.utf8ToString( bytes ). The good thing is that Java Strings are Unicode, so nothing more is needed there.

The Map phase is a bit more complex: we have to go through all the chars and, depending on whether the Syntax is case sensitive or not, transform some of them into other chars so that values can be compared safely. There is a long list of special chars to handle (around 1000).

The Normalize phase consists of transforming the String into a String in NFKC form, described here: http://www.unicode.org/reports/tr15/tr15-22.html#Specification. This is also implemented in Java, so we use the Normalizer.normalize( mapped, Normalizer.Form.NFKC ) method, if necessary.

The Prohibit phase checks every char to make sure they are all valid. There are a few hundred prohibited chars.

The Check Bidi phase is about dealing with bi-directional characters (Arabic, for instance). "Bidirectional characters are ignored." says the RFC, so be it :-)

The Insignificant Character Handling phase is the last one, where we remove useless spaces and some other specific chars, in various types of values.

In order to speed up the process, which is quite expensive, the idea is to assume the value is ASCII first. In this case, the Normalize, Prohibit and most of the Map phases can be skipped. We can safely design a simpler method that handles all those phases quickly, throwing an exception as soon as we meet a non-ASCII char. If that happens, we fall back to the more complex process that involves all the phases and the various String creations. This is basically the same approach as the one we have for DNs: FastDnParser and ComplexDnParser.
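The six phases and the ASCII fast path with fail-over can be sketched in Java roughly like this. It is only a sketch under several assumptions: PrepareStringSketch, prepare, NotAsciiException and insignificantSpaces are hypothetical names (not the Apache Directory API); new String( bytes, StandardCharsets.UTF_8 ) stands in for Strings.utf8ToString( bytes ); the Map phase covers only a handful of the ~1000 RFC 4518 mappings and uses toLowerCase( Locale.ROOT ) as a stand-in for real case folding; the Prohibit phase checks only a few categories; and the space handling follows the RFC 4518 rules for attribute values (not substrings) while ignoring combining marks.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical sketch of RFC 4518 string preparation; not the Apache Directory API.
final class PrepareStringSketch {
    /** Signals that the fast path met a char it cannot handle (no stack trace, cheap). */
    private static final class NotAsciiException extends Exception {
        NotAsciiException() { super(null, null, false, false); }
    }

    /** Try the cheap ASCII path first, fall back to the full process. */
    static String prepare(byte[] bytes, boolean caseSensitive) {
        try {
            return prepareAscii(bytes, caseSensitive);
        } catch (NotAsciiException e) {
            return prepareComplex(bytes, caseSensitive);
        }
    }

    /** Fast path: printable ASCII only, so Normalize and Prohibit are no-ops. */
    private static String prepareAscii(byte[] bytes, boolean caseSensitive) throws NotAsciiException {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            // Negative bytes (non-ASCII) and ASCII control chars go to the slow path
            if (b < 0x20 || b == 0x7F) throw new NotAsciiException();
            char c = (char) b;
            if (!caseSensitive && c >= 'A' && c <= 'Z') c = (char) (c + ('a' - 'A'));
            sb.append(c);
        }
        return insignificantSpaces(sb);
    }

    /** Slow path: all phases, with simplified Map and Prohibit tables. */
    private static String prepareComplex(byte[] bytes, boolean caseSensitive) {
        // 1) Transcode (stand-in for Strings.utf8ToString( bytes ))
        String s = new String(bytes, StandardCharsets.UTF_8);
        // 2) Map: a small subset of the RFC 4518 mappings
        StringBuilder mapped = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if ((cp >= 0x09 && cp <= 0x0D) || cp == 0x85 || Character.isSpaceChar(cp)) {
                mapped.append(' ');              // TAB, LF, VT, FF, CR, NEL, separators -> SPACE
            } else if (!Character.isISOControl(cp) && cp != 0x00AD) {
                mapped.appendCodePoint(cp);      // other controls and SOFT HYPHEN -> nothing
            }
        }
        String folded = caseSensitive ? mapped.toString() : mapped.toString().toLowerCase(Locale.ROOT);
        // 3) Normalize to NFKC
        String normalized = Normalizer.normalize(folded, Normalizer.Form.NFKC);
        // 4) Prohibit: only a few of the prohibited categories are checked here
        for (int i = 0; i < normalized.length(); ) {
            int cp = normalized.codePointAt(i);
            i += Character.charCount(cp);
            int type = Character.getType(cp);
            if (type == Character.UNASSIGNED || type == Character.PRIVATE_USE
                    || type == Character.SURROGATE || cp == 0xFFFD) {
                throw new IllegalArgumentException("Prohibited code point U+" + Integer.toHexString(cp));
            }
        }
        // 5) Check bidi: "Bidirectional characters are ignored." (RFC 4518)
        // 6) Insignificant Character Handling
        return insignificantSpaces(normalized);
    }

    /** RFC 4518 space handling for attribute values (combining marks ignored here). */
    static String insignificantSpaces(CharSequence in) {
        int n = in.length();
        int i = 0;
        while (i < n && in.charAt(i) == ' ') i++;           // drop leading spaces
        if (i == n) return "  ";                             // no non-space char: exactly two spaces
        StringBuilder out = new StringBuilder(n + 2).append(' ');
        while (i < n) {
            char c = in.charAt(i);
            if (c == ' ') {
                int j = i;
                while (j < n && in.charAt(j) == ' ') j++;
                if (j < n) out.append("  ");                 // inner run of spaces -> two spaces
                i = j;                                       // a trailing run is dropped
            } else {
                out.append(c);
                i++;
            }
        }
        return out.append(' ').toString();                   // exactly one trailing space
    }
}
```

For instance, "  Hello   World " (case insensitive) stays on the fast path and gives " hello  world ", while "A\tB" contains a control char, fails over to the complex path and gives " a  b ". Throwing a stack-trace-less exception keeps the fail-over cheap, but returning null from the fast path would work just as well.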
One thing that will be completely removed from the prepareString implementation is the escaping we currently (wrongly) do. This is not the place to do that.

Bottom line, this String preparation will completely replace the Normalizers we are using. They are useless parts of our schema.

Last, but not least, as this is a COSTLY operation, this function will only be called when needed (i.e. for ATs we know are used in an Index, in the DN's RDN, or when a Filter uses them). That will save a hell of a lot of CPU. The consequence is that most of the values we receive or send will *not* be converted to String; we will just keep the byte[] value. That is the main source of CPU savings. Expect the server and the API to be somewhat impacted :-)
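The "keep the byte[], prepare only when needed" idea could look like the following minimal sketch. LazyValue, bytes() and prepared() are hypothetical names, not the Apache Directory API; the preparation function is passed in as an assumption (any byte[] to String prepare function would do), and the class is deliberately not thread-safe.

```java
import java.util.function.Function;

// Hypothetical wrapper: keeps the raw byte[] and runs the costly preparation lazily.
// Not thread-safe; not the Apache Directory API.
final class LazyValue {
    private final byte[] raw;                          // kept as received, never copied
    private final Function<byte[], String> preparer;   // e.g. an RFC 4518 prepare function
    private String prepared;                           // computed at most once, on demand

    LazyValue(byte[] raw, Function<byte[], String> preparer) {
        this.raw = raw;
        this.preparer = preparer;
    }

    /** Values that are only stored or sent back never pay for preparation. */
    byte[] bytes() {
        return raw;
    }

    /** Called only when needed: index lookup, RDN comparison, filter evaluation. */
    String prepared() {
        if (prepared == null) {
            prepared = preparer.apply(raw);
        }
        return prepared;
    }
}
```

A value that is simply stored and returned to a client only ever goes through bytes(), so the preparation function is never invoked for it; the cost is paid once, the first time an index, an RDN or a filter needs the prepared form.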