Re: Formal Review of std.uni

Dmitry Olshansky Sun, 12 May 2013 12:30:26 -0700

30-Apr-2013 23:17, Jonathan M Davis wrote:

On Tuesday, April 30, 2013 15:13:14 Dmitry Olshansky wrote:

Unicode --> can't be done on character by character basis


Sure it can. It operates on dchar.


Getting back to this.

Sure it can't - I'd hate to break the illusion, but the keyword is e.g. Unicode Case Folding. Another one is Combining Character Sequence.
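To make the two keywords concrete, here is a small sketch. It is written in Python purely for illustration and is not how std.uni does it; the helper names (naive_equal_ignore_case, folded_equal, nfc_equal) are made up for this example. It assumes only Python's standard str.casefold and unicodedata.normalize.

```python
import unicodedata


def naive_equal_ignore_case(a: str, b: str) -> bool:
    """Compare code point by code point, lowercasing each one in isolation."""
    if len(a) != len(b):
        return False
    return all(x.lower() == y.lower() for x, y in zip(a, b))


def folded_equal(a: str, b: str) -> bool:
    """Compare after applying full Unicode case folding to the whole strings."""
    return a.casefold() == b.casefold()


def nfc_equal(a: str, b: str) -> bool:
    """Compare after normalizing both strings to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Case folding: sharp s folds to two code points, so the strings differ in
# length and a per-code-point comparison gives up immediately.
print(naive_equal_ignore_case("Straße", "STRASSE"))  # False
print(folded_equal("Straße", "STRASSE"))             # True

# Combining character sequence: "e" + COMBINING ACUTE ACCENT is the same text
# as the precomposed letter, but not the same sequence of code points.
precomposed = "\u00e9"
decomposed = "e\u0301"
print(precomposed == decomposed)           # False
print(nfc_equal(precomposed, decomposed))  # True
```

In both cases the answer depends on more than one code point at a time, which is the whole point.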

So, with how it's been, std.uni would only be operating on dchars, and putting
a function in there which operated on strings wouldn't make any sense. Maybe
that doesn't work if you've done a bunch of grapheme stuff, and things will
have to be adjusted, but it would be a definite shift to put anything in
std.uni which operated on strings, and I think that it would need some definite
justification (and there's a good chance that I'd be inclined to argue that it
should still go in std.string, possibly using some internal modules if
necessary).

The justification is that we'd rather have exactly one module dealing with a bunch of Unicode data arranged into intricate tables.

Strictly speaking, I'd abolish any Unicode-related algorithm in std.string since it's almost definitely doing it wrong anyway (I've checked only 2 - both broken).

There is not a single sign of the Unicode standard being used, just the fallacious logic: byte --> dchar, then use the same algorithm as with ASCII. It won't work.
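A rough illustration of why the "same algorithm as with ASCII" approach falls over, again in Python rather than D. The function one_to_one_upper is a hypothetical stand-in for any routine that maps one code point to exactly one code point; it is not taken from std.string. Python's own str.upper applies the full case mappings and serves as the reference here.

```python
def one_to_one_upper(s: str) -> str:
    """Uppercase with a strict one-code-point-in, one-code-point-out mapping.

    This models the ASCII-style algorithm: whenever the real uppercase
    mapping needs more than one code point, a 1:1 table cannot express it,
    so the character is left unchanged.
    """
    out = []
    for ch in s:
        up = ch.upper()
        out.append(up if len(up) == 1 else ch)
    return "".join(out)


word = "straße"
print(one_to_one_upper(word))  # STRAßE
print(word.upper())            # STRASSE
```

The per-code-point version silently leaves the sharp s alone, and the correct result is not even the same length as the input.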


But clearly I need to take the time to take a look at what you've actually
done (I keep meaning to but haven't gotten around to it yet). It had been my
impression that what you were doing was primarily a matter of improving the
implementation, but it sounds like you're doing something beyond that.


Take a peek at icmp and sicmp in the new std.uni.
The current fork of Phobos is here:
https://github.com/blackwhale/phobos/tree/new-std-uni

Eventually we'd have to do a bit more in the same direction, e.g. titlecasing, split by word boundary etc. (all of these need fixing in std.string).

Also, all of the core tools are now in the open: CodepointSet, and generating Tries from sets and AA-s.
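For readers who have not met these structures, below is a minimal sketch of the general idea: a set of code points kept as sorted intervals, and a two-level lookup table ("trie") generated from it. It is Python, not D, and it is not the std.uni API or its actual layout. IntervalSet, build_two_level_trie, trie_contains and the 8-bit page size are all assumptions made for this example.

```python
from bisect import bisect_right

PAGE_BITS = 8               # assumed page size: 256 code points per page
PAGE_SIZE = 1 << PAGE_BITS


class IntervalSet:
    """Code point set stored as sorted, non-overlapping [start, end) intervals."""

    def __init__(self, intervals):
        # Flatten to a boundary list: [start0, end0, start1, end1, ...]
        self.bounds = [b for iv in sorted(intervals) for b in iv]

    def __contains__(self, cp: int) -> bool:
        # An odd number of boundaries <= cp means cp lies inside an interval.
        return bisect_right(self.bounds, cp) % 2 == 1


def build_two_level_trie(cp_set, max_cp=0x110000):
    """Build (index, pages): index maps the high bits to a shared page."""
    pages = []        # distinct pages, each a tuple of PAGE_SIZE booleans
    page_number = {}  # page contents -> position in `pages`
    index = []        # first level: one entry per block of PAGE_SIZE code points
    for base in range(0, max_cp, PAGE_SIZE):
        page = tuple((base + i) in cp_set for i in range(PAGE_SIZE))
        if page not in page_number:
            page_number[page] = len(pages)
            pages.append(page)
        index.append(page_number[page])
    return index, pages


def trie_contains(trie, cp: int) -> bool:
    """Two array lookups: high bits pick the page, low bits pick the slot."""
    index, pages = trie
    return pages[index[cp >> PAGE_BITS]][cp & (PAGE_SIZE - 1)]


# Example set: ASCII letters plus the Cyrillic block U+0400..U+04FF.
letters = IntervalSet([(0x41, 0x5B), (0x61, 0x7B), (0x400, 0x500)])
trie = build_two_level_trie(letters)
print(trie_contains(trie, ord("я")))  # True
print(trie_contains(trie, ord("1")))  # False
print(len(trie[1]))                   # 3 distinct pages for 4352 index entries
```

The interval form is compact and easy to build or combine, while the generated table trades some memory for constant-time lookup. Identical pages are stored once, which is what keeps such tables small for real Unicode properties.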



--
Dmitry Olshansky
