On Fri, Sep 13, 2002 at 11:55:50AM +0100, Wez Furlong wrote:
> Where mapping is one of "upper", "lower" or "title" (since unicode
> knows about title case). This function would then be able to
> internally convert to unicode, apply the appropriate transformation
> and then convert back to the original encoding.
[...]
> Until we make the whole of PHP multi-byte aware, I think mbstring is
> the best place for this functionality.
[...]
> I'm tempted to volunteer for this, if you don't mind supplying that
> unicode manipulation code (I'm fairly familiar with the mbstring
> internals).

Sounds good. The code I have in mind is currently in a library (the
license should not be a problem), and is used in OpenLDAP and some
other applications. It consists of code to parse the Unicode tables
(which specify the upper, lower and title case forms of each
character) and create multiple data files; we would only need
case.dat for this. This needs to be done only once. Then there is
code to load the data file and do the actual case folding.

I think it would be best to distribute the Unicode text file with PHP,
build the case.dat file on make, and install case.dat on make install.
Of course, we only need to do this when this extension is enabled.

There are a lot of other Unicode functions in this library, if that's
interesting. I said that I had code; I wrote some of it, but most is
not mine. I think it makes sense to include just the code we want in
PHP, rather than relying on this library being installed.

See http://crl.nmsu.edu/~mleisher/ucdata.html and the documentation
link near the top.

I can perhaps contribute some to this as well, but it's in good hands
if you do it (:

Stig

-- 
PHP Development Mailing List <http://www.php.net/>
To unsubscribe, visit: http://www.php.net/unsub.php
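
For illustration only, here is a minimal Python sketch of the flow discussed above: build case tables from records in the UnicodeData.txt format, then decode, map and re-encode a string. It is not the ucdata code and not a proposed mbstring API. The names build_tables, map_case and convert_case are hypothetical, and the tables are kept in memory rather than written to a data file. It assumes only the simple one-to-one mappings in fields 12 to 14 of each record.

```python
# Sketch only: hypothetical helper names, not the ucdata or mbstring API.

# Positions of the simple case mappings in a UnicodeData.txt record.
UPPER, LOWER, TITLE = 12, 13, 14

# A few records in UnicodeData.txt format, enough to show all three cases.
SAMPLE = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
01C4;LATIN CAPITAL LETTER DZ WITH CARON;Lu;0;L;<compat> 0044 017D;;;;N;LATIN CAPITAL LETTER D Z HACEK;;;01C6;01C5
01C5;LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON;Lt;0;L;<compat> 0044 017E;;;;N;LATIN CAPITAL LETTER D SMALL Z HACEK;;01C4;01C6;01C5
01C6;LATIN SMALL LETTER DZ WITH CARON;Ll;0;L;<compat> 0064 017E;;;;N;LATIN SMALL LETTER D Z HACEK;;01C4;;01C5
"""


def build_tables(lines):
    """Parse records into {"upper": {...}, "lower": {...}, "title": {...}}."""
    tables = {"upper": {}, "lower": {}, "title": {}}
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 15:
            continue  # skip blank or malformed records
        code_point = int(fields[0], 16)
        for name, index in (("upper", UPPER), ("lower", LOWER), ("title", TITLE)):
            if fields[index]:
                tables[name][code_point] = int(fields[index], 16)
    return tables


def map_case(text, mapping, tables):
    """Map every code point through one table; unmapped ones pass through.

    Note: with "title" this converts each character to its title case form.
    Title-casing only the first letter of each word is left out of the sketch.
    """
    table = tables[mapping]
    return "".join(chr(table.get(ord(ch), ord(ch))) for ch in text)


def convert_case(data, mapping, encoding, tables):
    """Decode to Unicode, apply the mapping, encode back to the original encoding."""
    return map_case(data.decode(encoding), mapping, tables).encode(encoding)


tables = build_tables(SAMPLE.splitlines())
lowered = convert_case("A\u01c4".encode("utf-8"), "lower", "utf-8", tables)
```

With a real build, the tables would be generated once from the full Unicode text file and loaded from the data file at run time, rather than rebuilt from a sample on every call.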