Re: Practical problems with custom .ucm based encoding

Dan Kogai Wed, 24 Apr 2002 05:15:23 -0700

On Wednesday, April 24, 2002, at 09:25 , Bart Schuller wrote:
> Hello,
>
> The cool Encoding support in the upcoming 5.8 enables me to properly solve a
> very common task: making HTML entities out of utf-8 data.
>
> I generated a ucm file with entries like this:
>
>     <U00A0> \x26\x6E\x62\x73\x70\x3B                 |0 # nbsp
>
> The resulting Encode::HTMLEntities encoding works perfectly. However, I
> want it to do more.
>
> Not every unicode character has a corresponding entity. Unknown ones can
> be encoded like &#8364;, so I would like my Encoding to use a simple
> function as a fallback. This proves hard. With CHECK == Encode::FB_WARN
> it looks like the whole string is left untouched, so my plan to just
> substr() off the first character, handle it by hand and repeat is not
> going to work.
>
> I'd be very happy with a CHECK mode which would allow me to handle a
> single problematic character in Perl. Having to find it in a longer
> string is very hard in this case, because it's every character > 0x7f
> which is not in my .ucm file.


As a matter of fact, I was thinking of adding FB_HTMLENT or something
like that.  It seems trivial.  Unless jhi whips me for the sin of
Feeping Creaturism, I'll do so.

CAVEAT:  This will be done via fallback, which only fires for characters
the target encoding cannot represent, so & < > " will not turn into
entities!

Dan the Encode Maintainer
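
A minimal sketch of the fallback idea discussed in this thread, not the
implementation Dan went on to write. It assumes an Encode release that
provides the FB_HTMLCREF check constant; the message itself only says
"FB_HTMLENT or something like that", so that name is not taken from this
thread. The %NAMED table and both function names are hypothetical and exist
only for illustration. The first function shows the caveat above: ASCII
characters such as & < > " never reach the fallback. The second handles one
character at a time in plain Perl, preferring a named entity when one is
known.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical subset of a named-entity table (code point => entity name).
my %NAMED = (
    0x00A0 => 'nbsp',
    0x00E9 => 'eacute',
);

# Fallback-based approach: anything ASCII cannot represent becomes a
# decimal numeric character reference. Assumes Encode::FB_HTMLCREF exists.
sub to_ascii_with_crefs {
    my ($text) = @_;
    my $copy = $text;    # encode() may modify its argument when CHECK is set
    return Encode::encode('ascii', $copy, Encode::FB_HTMLCREF);
}

# Per-character approach in plain Perl: use a named entity if the table has
# one, otherwise fall back to a numeric character reference.
sub to_entities {
    my ($text) = @_;
    $text =~ s{([^\x00-\x7F])}{
        my $cp = ord $1;
        exists $NAMED{$cp} ? "&$NAMED{$cp};" : sprintf('&#%d;', $cp)
    }ge;
    return $text;
}

print to_ascii_with_crefs("price: \x{20AC}5 & <b>"), "\n";
print to_entities("a\x{A0}b\x{20AC}"), "\n";
```

Neither function escapes & < > ", so a separate pass would still be needed
if the output is meant to be safe HTML.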
