On Fri, Sep 13, 2002 at 11:55:50AM +0100, Wez Furlong wrote:
> Where mapping is one of "upper", "lower" or "title" (since unicode
> knows about title case).  This function would then be able to
> internally convert to unicode, apply the appropriate transformation
> and then convert back to the original encoding.
[...]
> Until we make the whole of PHP multi-byte aware, I think mbstring is
> the best place for this functionality.
[...]
> I'm tempted to volunteer for this, if you don't mind supplying that
> unicode manipulation code (I'm fairly familiar with the mbstring
> internals).

Sounds good. The code I have in mind is in a library (the license
should not be a problem) that is currently used in OpenLDAP and
some other applications. It consists of code to parse the
Unicode tables (which specify the upper, lower and title case forms
of each character) and create multiple data files; we would
only need case.dat for this. That step only has to run once. Then there is
code to load the data file, and do the actual case folding. I think
it would be best to distribute the Unicode text file with PHP, build
the case.dat file on make, and install case.dat on make install. Of
course, we only need to do this when this extension is enabled. There
are a lot of other Unicode functions in this library if they are of
interest. I said that I had code: I wrote some of it, but most is not
mine. I think it makes sense to include just the code we want in PHP,
rather than relying on this library being installed.
See http://crl.nmsu.edu/~mleisher/ucdata.html and the documentation
link near the top.
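
To make the two steps above concrete, here is a rough sketch in Python
(not the library's C code, and not its case.dat format). It assumes the
standard UnicodeData.txt layout, where fields 12, 13 and 14 hold the
simple upper, lower and title case mappings. The names SAMPLE,
parse_case_table and map_case are made up for illustration, and the
sample records are abridged. The mapping is applied per character, so
"title" does no word-boundary handling.

```python
# Abridged sample records in the UnicodeData.txt field layout (assumption:
# semicolon-separated, 15 fields, case mappings in fields 12-14).
SAMPLE = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
01C4;LATIN CAPITAL LETTER DZ WITH CARON;Lu;0;L;;;;;N;;;;01C6;01C5
01C5;LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON;Lt;0;L;;;;;N;;;01C4;01C6;01C5
01C6;LATIN SMALL LETTER DZ WITH CARON;Ll;0;L;;;;;N;;;01C4;;01C5
"""

# Position of each mapping inside a table entry.
INDEX = {"upper": 0, "lower": 1, "title": 2}


def parse_case_table(lines):
    """Step 1 (done once): build {code point: (upper, lower, title)}."""
    table = {}
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 15:
            continue  # skip blank or malformed lines
        cp = int(fields[0], 16)
        # An empty upper/lower field means the character maps to itself;
        # an empty title field falls back to the uppercase mapping.
        upper = int(fields[12], 16) if fields[12] else cp
        lower = int(fields[13], 16) if fields[13] else cp
        title = int(fields[14], 16) if fields[14] else upper
        if (upper, lower, title) != (cp, cp, cp):
            table[cp] = (upper, lower, title)
    return table


def map_case(data, mapping, encoding, table):
    """Step 2: decode to Unicode, map each character, encode back."""
    idx = INDEX[mapping]
    text = data.decode(encoding)
    mapped = []
    for ch in text:
        cp = ord(ch)
        # Characters missing from the table are left unchanged.
        mapped.append(chr(table.get(cp, (cp, cp, cp))[idx]))
    return "".join(mapped).encode(encoding)


table = parse_case_table(SAMPLE.splitlines())
print(map_case("a\u01c6".encode("utf-8"), "upper", "utf-8", table).decode("utf-8"))
```

In the real thing the table would be generated at build time and loaded
from the data file, rather than parsed on every request.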

I can contribute some to this as well perhaps, but it's in good hands
if you do it (:

Stig

-- 
PHP Development Mailing List <http://www.php.net/>
To unsubscribe, visit: http://www.php.net/unsub.php
