30-Apr-2013 23:17, Jonathan M Davis wrote:
On Tuesday, April 30, 2013 15:13:14 Dmitry Olshansky wrote:
Unicode --> can't be done on character by character basis

Sure it can. It operates on dchar.

Getting back to this.

Sure it can't. I hate to break the illusion, but one keyword is Unicode case folding; another is the combining character sequence.
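
To make both keywords concrete, here is a minimal sketch in Python (not D, and not using std.uni) that relies only on the standard `str.casefold` and `unicodedata.normalize`. It shows a case fold that changes the length of the string, and a combining character sequence that only compares equal to its precomposed form after normalization. The variable names are just for illustration.

```python
import unicodedata

# Full case folding is not 1:1: U+00DF (sharp s) folds to two code points.
sharp_s = "\u00df"
folded = sharp_s.casefold()

# A combining character sequence: 'e' followed by U+0301 (combining acute
# accent) is canonically equivalent to the precomposed U+00E9, yet the two
# strings differ when compared code point by code point.
decomposed = "e\u0301"
precomposed = "\u00e9"
composed = unicodedata.normalize("NFC", decomposed)
```

Neither effect is visible to a function that takes one dchar and returns one dchar.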

So, with how it's been, std.uni would only be operating on dchars, and putting
a function in there which operated on strings wouldn't make any sense. Maybe
that doesn't work if you've done a bunch of grapheme stuff, and things will
have to be adjusted, but it would be a definite shift to put anything in
std.uni which operated on strings, and I think that it would need some definite
justification (and there's a good chance that I'd be inclined to argue that it
should still go in std.string, possibly using some internal modules if
necessary).

The justification is that we'd rather have exactly one module dealing with a bunch of Unicode data arranged into intricate tables.

Strictly speaking, I'd abolish every Unicode-related algorithm in std.string, since it's almost certainly doing it wrong anyway (I've checked only 2, and both are broken).

There is not a single sign of the Unicode standards being used, just the fallacious logic: replace byte with dchar and use the same algorithm as with ASCII. It won't work.
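
A rough sketch of why that logic fails, again in Python rather than D. `naive_equal` is a hypothetical stand-in for the "same algorithm as ASCII, just per code point" approach; `folded_equal` follows the canonical caseless matching recipe from the Unicode Standard (NFD, full case fold, NFD again) using only `str.casefold` and `unicodedata.normalize`. Neither function is taken from std.string or std.uni.

```python
import unicodedata


def naive_equal(a: str, b: str) -> bool:
    """ASCII recipe applied per code point: lowercase each one, compare 1:1."""
    return len(a) == len(b) and all(
        x.lower() == y.lower() for x, y in zip(a, b)
    )


def folded_equal(a: str, b: str) -> bool:
    """Compare whole strings after NFD, full case folding, and NFD again."""
    def key(s: str) -> str:
        nfd = unicodedata.normalize("NFD", s)
        return unicodedata.normalize("NFD", nfd.casefold())
    return key(a) == key(b)


# The two approaches agree on plain ASCII...
ascii_ok = naive_equal("hello", "HELLO") and folded_equal("hello", "HELLO")

# ...but diverge as soon as a 1:M mapping or a combining sequence shows up.
german = ("Stra\u00dfe", "STRASSE")
greek = ("\u0390", "\u03b9\u0308\u0301")  # iota with dialytika and tonos
```

The per-code-point version cannot even get past the length check for the last two pairs, which is the whole point: the comparison has to work on strings.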


But clearly I need to take the time to take a look at what you've actually
done (I keep meaning to but haven't gotten around to it yet). It had been my
impression that what you were doing was primarily a matter of improving the
implementation, but it sounds like you're doing something beyond that.

Take a peek at icmp and sicmp in new std.uni.
Current fork of Phobos is here:
https://github.com/blackwhale/phobos/tree/new-std-uni

Eventually we'd have to do a bit more in the same direction, e.g. title casing, splitting by word boundary, etc. (all of these need fixing in std.string).

Also, all of the core tools are now in the open: CodepointSet, and generating Tries from sets and associative arrays (AAs).
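
For readers unfamiliar with the idea, here is a toy sketch of a code point set and a two-level lookup table built from it. It is written in Python and is not the std.uni implementation: `IntervalSet`, `build_trie`, `trie_contains`, and the 8-bit page size are all assumptions made for illustration.

```python
from bisect import bisect_right


class IntervalSet:
    """Hypothetical code point set stored as sorted half-open intervals."""

    def __init__(self, intervals):
        # Flatten [(lo, hi), ...] into [lo0, hi0, lo1, hi1, ...].
        # Assumes the intervals do not overlap.
        self.bounds = [b for pair in sorted(intervals) for b in pair]

    def __contains__(self, cp):
        # An odd number of boundaries at or below cp means cp lies inside
        # one of the intervals.
        return bisect_right(self.bounds, cp) % 2 == 1


def build_trie(cp_set, page_bits=8, limit=0x110000):
    """Build a two-level table; identical pages are stored only once."""
    page_size = 1 << page_bits
    index, pages, seen = [], [], {}
    for base in range(0, limit, page_size):
        page = tuple(cp in cp_set for cp in range(base, base + page_size))
        if page not in seen:
            seen[page] = len(pages)
            pages.append(page)
        index.append(seen[page])
    return index, pages


def trie_contains(trie, cp, page_bits=8):
    """Two array lookups: high bits pick the page, low bits pick the slot."""
    index, pages = trie
    return pages[index[cp >> page_bits]][cp & ((1 << page_bits) - 1)]


# ASCII letters as two half-open intervals: [A, Z] and [a, z].
ascii_letters = IntervalSet([(0x41, 0x5B), (0x61, 0x7B)])
trie = build_trie(ascii_letters)
```

The set is compact and easy to manipulate, while the table trades some memory for constant-time lookups; sharing identical pages keeps that cost small, because most of the code point range looks the same for any given property.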


--
Dmitry Olshansky
