http://d.puremagic.com/issues/show_bug.cgi?id=5543
Dmitry Olshansky <dmitry.o...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmitry.o...@gmail.com

--- Comment #5 from Dmitry Olshansky <dmitry.o...@gmail.com> 2012-12-21 07:17:53 PST ---
> Java even implements
> one taking chars, and another taking int (dchar)

That's because Java folks used to have only 16-bit chars. Now true code points
are passed around in the form of 'int'.

> http://msdn.microsoft.com/en-us/library/system.char.getnumericvalue.aspx
> http://docs.oracle.com/javase/1.4.2/docs/api/java/lang/Character.html
>
> I'd say we should just add:
> std.ascii.getNumericValue
> std.uni.getNumericValue
> (or plain numericValue)
>

Agreed, and the name should be numericValue.

> I already wrote the ascii version (easy as pie), and support for the [Nd]
> group, using a binary search, followed by an offset from the lower bound.
>
> [Nl] and [Po] require a straight up mapping of codepoint to value, but I'm
> still writing the parser that extract the data for the raw UCD
> (http://www.unicode.org/Public/6.2.0/ucdxml/).
>

I'm wrapping up a revamp of std.uni that makes it a piece of cake to create
character sets. Maps are converted to multi-stage tables, which are faster
than a binary search on a large set. I'd suggest waiting a bit on it (so as not
to duplicate work) and introducing only the std.ascii version, as the most
useful one.

The ongoing polishing, fixing and testing against ICU is happening here:
https://github.com/blackwhale/gsoc-bench-2012

> The file is too large for std.xml to handle, so it's back to C++ for me :/
>

http://www.unicode.org/Public/UNIDATA/UnicodeData.txt

Same thing, but no useless XML trash. A description of the fields is somewhere
in the middle of this document:
http://www.unicode.org/reports/tr44/

> The only questions I have is:
> Return value: int or double?

It should be a rational, to accurately represent things like the "1/5"
character ;)
I suspect some simple custom type would do (2 shorts packed in one struct,
etc.).

> Input is not numeric: -1 or exception?

-1 is fine, I think, as this is rather low level (per character) and it's not
at all convenient to throw (and then catch).
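
To make the points above concrete, here is a minimal sketch of how they could fit together: a small rational return type (two shorts in one struct), -1 as the "not numeric" result instead of an exception, and the quoted [Nd] approach of a binary search followed by an offset from the lower bound. Everything here is an assumption made for illustration: `Rational`, `notNumeric`, `zeroDigits`, `decimalDigitValue` and this `numericValue` signature are hypothetical names, not taken from Phobos, and the table covers only four scripts rather than the full [Nd] category.

```d
import std.stdio : writeln;

/// Hypothetical return type: two shorts packed in one struct, so that
/// fraction characters such as "1/5" can be represented exactly.
struct Rational
{
    short num;     // numerator
    short den = 1; // denominator (assumed to be non-zero)

    /// Approximation for callers that do not need the exact fraction.
    double toDouble() const pure nothrow @safe
    {
        return cast(double) num / den;
    }
}

/// Sentinel for "not a numeric character": -1 rather than an exception.
enum Rational notNumeric = Rational(-1, 1);

/// Deliberately tiny, sorted table of [Nd] "zero" code points (an
/// illustrative subset). Each entry starts ten consecutive decimal digits.
immutable uint[] zeroDigits = [
    0x0030, // DIGIT ZERO (ASCII)
    0x0660, // ARABIC-INDIC DIGIT ZERO
    0x06F0, // EXTENDED ARABIC-INDIC DIGIT ZERO
    0x0966, // DEVANAGARI DIGIT ZERO
];

/// Binary search for the last table entry <= c, then use the offset from
/// that lower bound as the digit value. Returns -1 if c is not a digit.
int decimalDigitValue(dchar c) pure nothrow @safe
{
    size_t lo = 0, hi = zeroDigits.length;
    while (lo < hi)
    {
        immutable mid = lo + (hi - lo) / 2;
        if (zeroDigits[mid] <= c)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return -1; // c sorts before every entry in the table
    immutable offset = c - zeroDigits[lo - 1];
    return offset < 10 ? cast(int) offset : -1;
}

/// Sketch of the per-character API: never throws, returns notNumeric instead.
Rational numericValue(dchar c) pure nothrow @safe
{
    immutable d = decimalDigitValue(c);
    return d < 0 ? notNumeric : Rational(cast(short) d, 1);
}

void main()
{
    assert(numericValue('7') == Rational(7, 1));
    assert(numericValue('\u0663') == Rational(3, 1)); // ARABIC-INDIC DIGIT THREE
    assert(numericValue('x') == notNumeric);

    // A fraction character would carry its exact value, e.g. one fifth.
    auto fifth = Rational(1, 5);
    writeln(fifth.num, "/", fifth.den, " is about ", fifth.toDouble());
}
```

Returning a sentinel keeps the function `nothrow`, which suits a per-character primitive called in tight loops. The cost is that every caller has to compare against `notNumeric` before using the value.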