So for the record, after a couple of hours working on it tonight, I got the DeepTrimToLowerNormalizer() working fine, with tests passing.
I was also able to improve the performance of the beast: from 20 seconds to normalize 10 000 000 Strings like "xs crvtbynU Jikl7897790", down to 4.3s. I just assumed that most of the time we will deal with chars between 0x00 and 0x7F, and wrote a specific function for that. If we hit a char above 0x7F, an exception is thrown and we fall back to the complex process, which then takes 47s instead of 20s. So this is a trade-off:
- an implementation that covers all the chars and takes 20s for 10M Strings
- an implementation that tries to process the String assuming its chars are in [0x00, 0x7F] and takes 4.3s for 10M Strings, but takes 47s when a char falls outside this range

Besides the obvious gain, there is another reason why I wanted to do that: processing IA5String values will benefit from this separation, and that covers numerous AttributeTypes (like mail, homeDirectory, memberUid, krb5principalname, krb5Realmname, and a lot more).

wdyt? Going for an average of 20s no matter what, or accepting a huge penalty when the String contains non-ASCII chars?
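To make the idea concrete, here is a minimal sketch of the two-path approach. The class and method names are illustrative only, and the fallback path is just a stand-in for the full normalization, not the actual DeepTrimToLowerNormalizer code:

    import java.text.Normalizer;

    public final class TwoPathNormalizer
    {
        /** Fast path: assumes every char is in [0x00, 0x7F]. */
        private static String normalizeAscii( String value )
        {
            char[] out = new char[value.length()];
            int pos = 0;
            boolean lastWasSpace = true; // drops leading spaces, collapses runs

            for ( int i = 0; i < value.length(); i++ )
            {
                char c = value.charAt( i );

                if ( c > 0x7F )
                {
                    // Non-ASCII char: bail out, the caller will use the full path
                    throw new IllegalArgumentException( "Non ASCII char at " + i );
                }

                if ( c == ' ' )
                {
                    if ( !lastWasSpace )
                    {
                        out[pos++] = ' ';
                        lastWasSpace = true;
                    }
                }
                else
                {
                    // Cheap ASCII lower-casing, no Unicode tables involved
                    out[pos++] = ( c >= 'A' && c <= 'Z' ) ? ( char ) ( c + 0x20 ) : c;
                    lastWasSpace = false;
                }
            }

            // Trim a possible trailing space
            if ( ( pos > 0 ) && ( out[pos - 1] == ' ' ) )
            {
                pos--;
            }

            return new String( out, 0, pos );
        }

        /** Try the ASCII fast path first, fall back to the full process. */
        public static String normalize( String value )
        {
            try
            {
                return normalizeAscii( value );
            }
            catch ( IllegalArgumentException e )
            {
                // Placeholder for the complete (slower) normalization
                return Normalizer.normalize( value, Normalizer.Form.NFKC )
                    .toLowerCase()
                    .trim()
                    .replaceAll( " +", " " );
            }
        }
    }

The cost of the exception on the slow path is what produces the 47s figure, since we pay for the failed ASCII scan plus the full normalization.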
