On 10/23/2015 04:33 AM, rumbu via Digitalmars-d-learn wrote:
> My opinion is to use the Tango's unicodedata.d module to obtain the
> unicode category, std.uni does not provide such functionality.
> This module does not have any dependency, therefore you can just use
> it directly:
> https://github.com/SiegeLord/Tango-D2/blob/d2port/tango/text/UnicodeData.d#L169

Thank you for confirming that std.uni doesn't implement that
functionality, and for pointing to a Tango source. That's probably the
one I was originally remembering, but is Tango even still being
maintained? (OK, this very module was last updated 3 days ago.)
FWIW, in the past I've had a lot of trouble keeping Tango in sync with
D, to the point that I just dropped Tango. But as you say, this module
doesn't seem to have any external dependencies. It would be a faster
solution to the problem, and perhaps it would also work on the various
control chars.
Still, I don't use this for heavy processing, so maintaining this
external dependency would likely be more effort than it is worth, as
long as I don't need to handle exotic chars in the control range.
If speed were my main consideration, I'd certainly give that solution a
try. The benefit of the solution that I proposed is that it's easy to
understand given the Phobos library. And if I actually needed to handle
exotic control chars, then the Tango module would be the only option
I've seen.
However, the text I'm handling is *almost* all ASCII, with occasional
German, occasional footnotes in Greek, and occasional, usually isolated,
single ideograms in Chinese or some Japanese script. I don't think I've
run across any Sanskrit yet.
As such, the solution I proposed is probably good enough, though if there
were a Phobos-level solution I'd prefer that.