Alex Shinn scripsit:

> On Tue, Jul 8, 2014 at 5:58 AM, Mario Domenech Goulart
> <mario.goul...@gmail.com> wrote:
> 
> It might help the discussion if we had a list of eggs which
> are known to break on UTF-8 inputs.

Indeed.

> > 1. Have <egg> and <egg>-utf8 variants.  Or, more generally, <egg> and
> >    <egg>-<encoding> variants.  That would turn our coop into a disgusting
> >    mess and would be a nightmare to egg authors.

I don't think the extra generality is required.  If the egg needs to be
able to correctly handle arbitrary characters, UTF-8 is the appropriate
internal representation.  If not, ASCII/Latin-1 is appropriate.  Anything
Eggs that do conversion will need to convert between arbitrary encodings
in byte vectors and UTF-8 strings.  So at worst, some eggs might need
to be split in two.  This is already the case for SRFI 13 and 14.

> The same approaches also apply to eggs needing the full
> numeric tower, though with UTF-8 there's less chance of
> breakage when mixing eggs which do and don't use the utf8 egg.

I would say that UTF-8 has *more* chance of causing undetected
breakage, because UTF-8 strings have an interpretation as core
strings, whereas bignums, ratnums, compnums etc. don't look
like numbers to the core, and errors will be thrown.
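
A minimal sketch of the string half of that point, written in Python
rather than Scheme purely for illustration.  It assumes a core whose
string operations count bytes; the variable names are hypothetical and
not taken from any egg.

```python
# Illustration only: byte-oriented operations accept UTF-8 data without
# complaint and quietly return answers that are wrong at the character level.
text = "naïve"                  # 5 characters
raw = text.encode("utf-8")      # what a byte-oriented core would see

char_len = len(text)            # character count
byte_len = len(raw)             # byte count; "ï" occupies two bytes

# A byte-indexed "first three characters" cuts through the middle of "ï".
prefix = raw[:3]
try:
    prefix.decode("utf-8")
    cut_is_valid = True
except UnicodeDecodeError:
    cut_is_valid = False

print(char_len, byte_len, cut_is_valid)
```

No error is raised until something actually tries to decode the
truncated bytes, which is the undetected part.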

-- 
John Cowan          http://www.ccil.org/~cowan        co...@ccil.org
Police in many lands are now complaining that local arrestees are insisting
on having their Miranda rights read to them, just like perps in American TV
cop shows.  When it's explained to them that they are in a different country,
where those rights do not exist, they become outraged.  --Neal Stephenson
