Re: Formal Review of std.uni

Dmitry Olshansky Sun, 12 May 2013 13:10:28 -0700

28-Apr-2013 20:56, Jesse Phillips wrote:

This is a replacement module for the current std.uni by Dmitry
Olshansky. The std.uni module provides an implementation of fundamental
Unicode algorithms and data structures.


To use this module, install the 2.063 beta, import uni; instead of std.uni,
and compile two files from the source: uni.d and unicode_tables.d

Docs:
http://blackwhale.github.io/phobos/uni.html

Source:
https://github.com/blackwhale/gsoc-bench-2012

DMD Beta:
http://forum.dlang.org/post/517c8552.7040...@digitalmars.com

It should be noted that inclusion into Phobos may require addressing
inter-dependencies, see "Reducing the inter-dependencies"
http://forum.dlang.org/post/kl8hn8$bm3$1...@digitalmars.com

We have only one week left for review, so I'd like to sort out the last issues before we get to the voting.


First, to fill you in on the latest developments.

With a bunch of ugly hacks I've managed to integrate the new std.uni into my Phobos fork, and it passes unittests for me now (on win32 at least).


See it hanging there and waiting to be destroyed by the pull tester:
https://github.com/D-Programming-Language/phobos/pull/1289

Remaining issues that I'm aware of:
- proper toLower/toUpper (the current one is a simplified codepoint-for-codepoint mapping)

- clean up the debris after crash-landing back into Phobos, revert some unrelated changes etc.

Please take the time to make that list grow, esp. w.r.t. interface choices and the code itself.

Plus, separately, I'd need to remove the rudimentary versions of the same data structures used in std.regex and rewire it to use the new std.uni.

There are a few bugs and issues uncovered during integration that I wish to get feedback on.


std.string has a bogus test for toLower:

Of the very few tests being done, 2 cover a very special corner case around \u0130, which is capital I with dot above and is expected to be lowercased to plain i. But it's *not* supposed to be: this conversion is specific to the Turkish(?) locale (= tailoring). What should happen is unfolding it into the 2-codepoint sequence 'i' and 'dot-above' (this is in the works).

I just hope nobody depends on these particular conversions, and I am wondering who put them there in the first place.
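
To make the \u0130 case concrete, here is a small illustration in Python rather than D. Python is used only because its str.lower() and str.upper() apply Unicode's full (one-to-many) case mappings without locale tailoring; this is not the proposed std.uni API, and the variable names are made up for the example.

```python
import unicodedata

# U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
capital_i_with_dot = "\u0130"

# Full, locale-independent lowercasing unfolds it into two codepoints
# instead of collapsing it to a plain 'i'.
lowered = capital_i_with_dot.lower()
lowered_names = [unicodedata.name(ch) for ch in lowered]
print(lowered_names)

# Full case mapping can grow a string in the other direction too:
# U+00DF LATIN SMALL LETTER SHARP S uppercases to two letters.
sharp_s_upper = "\u00DF".upper()
print(sharp_s_upper)
```

A codepoint-for-codepoint toLower/toUpper cannot produce either result, since both outputs are longer than their inputs.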

std.json is another thing: 0x7F somehow is specifically tested as being accepted as part of a string literal. Yet the ECMAScript docs clearly state that Unicode control characters are to be stripped even before lexing (ignored even in literals).
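
As a data point only, and not a statement about what std.json should do, the sketch below checks how one other JSON parser (Python's json module, picked purely for comparison) treats a raw 0x7F inside a string literal versus a raw control character below 0x20. The helper name accepts_raw_char is hypothetical.

```python
import json
import unicodedata


def accepts_raw_char(ch):
    """Return True if json.loads accepts `ch` unescaped inside a string literal."""
    try:
        return json.loads('"a' + ch + 'b"') == "a" + ch + "b"
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


# General category of U+007F (DELETE)
print(unicodedata.category("\x7f"))

# Raw 0x7F versus a raw control character below 0x20
print(accepts_raw_char("\x7f"))
print(accepts_raw_char("\x01"))
```

Python's parser rejects only raw characters below 0x20 in its default strict mode, so it accepts 0x7F even though that codepoint is classified as a control character.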

P.S. Someday I need to track down and file about 2 (or 3?) distinct compiler bugs (fwd-ref hell, private alias hijacking) that I worked around while getting there.

Another one has a fix already (thanks, Kenji):
http://d.puremagic.com/issues/show_bug.cgi?id=10067

--
Dmitry Olshansky
