On Wednesday, April 24, 2002, at 09:25, Bart Schuller wrote:
> Hello,
>
> The cool Encoding support in 5.8 to be enables me to properly solve a
> very common task: making HTML entities out of utf-8 data.
>
> I generated a ucm file with entries like this:
>
> <U00A0> \x26\x6E\x62\x73\x70\x3B |0 # nbsp
>
> The resulting Encode::HTMLEntities encoding works perfectly. However, I
> want it to do more.
>
> Not every unicode character has a corresponding entity. Unknown ones can
> be encoded like &#8364;, so I would like my Encoding to use a simple
> function as a fallback. This proves hard. With CHECK == Encode::FB_WARN
> it looks like the whole string is left untouched, so my plan to just
> substr() off the first character, handle it by hand and repeat is not
> going to work.
>
> I'd be very happy with a CHECK mode which would allow me to handle a
> single problematic character in perl. Having to find it in a longer
> string is very hard in this case, because it's every character > 0x{7f}
> which is not in my .ucm file.

As a matter of fact, I was thinking of adding FB_HTMLENT or something like that. It seems trivial; unless jhi whips me for the sin of Feeping Creaturism, I'll do so.

CAVEAT: This will be done via fallback, so &<>" will not turn into entities!

Dan the Encode Maintainer
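
For illustration only, here is a minimal sketch of what a fallback of this kind looks like in later Encode releases. It assumes an Encode that provides Encode::FB_HTMLCREF and, for the second call, one that accepts a code reference as CHECK (Encode 2.12 or later). The %entity table and the sample string are hypothetical and are not taken from the .ucm file discussed above. Note how & and <> pass through untouched, since they are valid ASCII and never reach the fallback.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Sample data: U+00E9 and U+20AC are not ASCII; & < > are.
my $text = "caf\x{e9} & <b>\x{20ac}5</b>";

# Built-in fallback: each unmappable character becomes a decimal
# character reference. We work on a copy because some CHECK values
# cause encode() to modify its input string.
my $copy  = $text;
my $ascii = encode("ascii", $copy, Encode::FB_HTMLCREF());
print "$ascii\n";

# Coderef fallback (assumption: Encode 2.12 or later). The callback
# receives the ordinal of each unmappable character and returns the
# octets to emit in its place.
my %entity = (0xE9 => 'eacute', 0x20AC => 'euro');   # hypothetical table
$copy = $text;
my $named = encode("ascii", $copy, sub {
    my $ord = shift;
    return exists $entity{$ord}
        ? "&$entity{$ord};"          # named entity when we know one
        : sprintf("&#%d;", $ord);    # numeric reference otherwise
});
print "$named\n";
```

The first call prints `caf&#233; & <b>&#8364;5</b>` and the second prints `caf&eacute; & <b>&euro;5</b>`. Escaping the markup-significant characters themselves still needs a separate pass, as the caveat above says.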