On Thu, 09 Jan 2014 15:51:36 -0500
Jerry <jlqu...@optonline.net> wrote:

> Marco Leise <marco.le...@gmx.de> writes:
> 
> > On Thu, 09 Jan 2014 15:20:13 +0000
> > "John Colvin" <john.loughran.col...@gmail.com> wrote:
> >
> 
> > The point about graphemes is good. D's functions still stop
> > mid-way. From UTF-8 you can iterate UTF-32 code points, but
> > grapheme clusters are the new characters. I.e. the basic need
> > to iterate Unicode _characters_ is not supported!
> > I cannot even come up with use cases for working with code
> > points and think they are a conceptual black hole. Something
> > carried over from a time when grapheme clusters didn't exist.
> 
> Actually, you can do tons of NLP without grapheme clusters.  If you're
> paranoid, you standardize on a specific Unicode normalization first.
> 
> You can probably get a bit better results by paying attention to
> clusters, but I suspect it will be a marginal improvement.
> 
> That said, I do agree with the OP that the string API is currently more
> complex to understand than I'd like.  However, it's significantly easier
> to use than what's in standard C++ for anything beyond ASCII.
> 
> Jerry

Sorry, I got confused by the Unicode definitions. I see now
that a grapheme cluster can be something like \r\n. What I
really meant is that Phobos needs to support graphemes. But
then monsters like this exist: n͠g. I don't even know whether
that is one character or two, but right now Phobos sees it as
three characters.
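
To make the counts concrete, here is a rough sketch in Python (not D,
and not a Phobos API). It assumes the sample is n + U+0360 (COMBINING
DOUBLE TILDE) + g, and the helper approx_grapheme_clusters is a
hypothetical simplification, not the full UAX #29 segmentation
algorithm: it only keeps CR LF together and attaches combining marks
to the preceding code point.

```python
import unicodedata


def approx_grapheme_clusters(text):
    """Group code points into approximate grapheme clusters.

    Simplification, not full UAX #29: CR LF stays together, and
    combining marks (categories Mn, Mc, Me) attach to the preceding
    code point.
    """
    clusters = []
    for ch in text:
        is_mark = unicodedata.category(ch) in ("Mn", "Mc", "Me")
        is_lf_after_cr = ch == "\n" and bool(clusters) and clusters[-1] == "\r"
        if clusters and (is_mark or is_lf_after_cr):
            clusters[-1] += ch  # extend the current cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


# Assumed sample: n + U+0360 COMBINING DOUBLE TILDE + g
sample = "n\u0360g"

print(len(sample.encode("utf-8")))            # UTF-8 code units: 4
print(len(sample))                            # code points: 3
print(len(approx_grapheme_clusters(sample)))  # approximate clusters: 2
```

Under this approximation the double tilde attaches to the n, so the
string splits into two clusters, while counting code points gives
three.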

-- 
Marco
