On 10/23/2015 04:33 AM, rumbu via Digitalmars-d-learn wrote:
My suggestion is to use Tango's UnicodeData.d module to obtain the Unicode category; std.uni does not provide such functionality.

This module does not have any dependencies, so you can just use it directly:

https://github.com/SiegeLord/Tango-D2/blob/d2port/tango/text/UnicodeData.d#L169

Thank you for confirming that std.uni doesn't implement that functionality, and for pointing to a Tango source. That's probably the one I was originally remembering, but is Tango even still being maintained? (OK, this very module was last updated 3 days ago.)

FWIW, in the past I've had a lot of trouble keeping Tango in sync with D, to the point that I just dropped Tango. But as you say, this module doesn't seem to have any external dependencies, so it would be a faster solution to the problem, and perhaps it would also work on the various control chars.

Still, I don't use this for heavy processing, so maintaining this external dependency would likely be more effort than it is worth, as long as I don't need to handle exotic chars in the control range.

If speed were my main consideration, I'd certainly give that solution a try. The benefit of the solution that I proposed is that it's easy to understand given the Phobos library. And if I actually needed to handle exotic control chars, then the Tango module would be the only option I've seen. However, the text I'm handling is *almost* all ASCII (occasional German, occasional footnotes in Greek, and occasional, usually isolated, single ideograms in Chinese or some Japanese script; I don't think I've run across any Sanskrit yet).

As such, the solution I proposed is probably good enough, though if there were a Phobos-level solution I'd prefer that.
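For anyone reading along without the earlier message: the following is only a rough sketch of what a Phobos-only approach can look like, not necessarily the code I posted before. It chains the existing std.uni predicates to get a coarse category. The names CoarseCategory and coarseCategory are made up for illustration, and the result is an approximation: isAlpha tests the Alphabetic property rather than the exact general category L, so the order of the checks matters, and no subcategory (Lu vs. Ll, etc.) is reported.

```d
import std.stdio : writeln;
import std.uni : isAlpha, isControl, isMark, isNumber, isPunctuation,
                 isSymbol, isWhite;

/// Hypothetical coarse stand-in for the first letter of a general category.
enum CoarseCategory
{
    letter, mark, number, punctuation, symbol, separator, control, other
}

/// Approximates the top-level category using only std.uni predicates.
CoarseCategory coarseCategory(dchar c)
{
    // Control first: tab and newline are Cc but also count as white space.
    if (isControl(c))     return CoarseCategory.control;
    if (isWhite(c))       return CoarseCategory.separator;
    // Marks and numbers before isAlpha, because the Alphabetic property
    // also covers some combining marks and letter-like numbers.
    if (isMark(c))        return CoarseCategory.mark;
    if (isNumber(c))      return CoarseCategory.number;
    if (isAlpha(c))       return CoarseCategory.letter;
    if (isPunctuation(c)) return CoarseCategory.punctuation;
    if (isSymbol(c))      return CoarseCategory.symbol;
    return CoarseCategory.other;
}

void main()
{
    // Mostly ASCII text with the occasional German, Greek, or CJK char.
    foreach (dchar c; "A7 ,+\tßα中"d)
        writeln(cast(uint) c, ": ", coarseCategory(c));
}
```

For mostly-ASCII text with a few stray letters from other scripts this kind of predicate chain is usually enough; anything that needs the exact two-letter category would still need a table such as the Tango one.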
