Re: Faster Command Line Tools in D

Steven Schveighoffer via Digitalmars-d-announce Tue, 30 May 2017 15:36:06 -0700

On 5/30/17 5:57 PM, Patrick Schluter wrote:

On Tuesday, 30 May 2017 at 21:18:42 UTC, Steven Schveighoffer wrote:

On 5/26/17 11:20 AM, John Colvin wrote:

On Friday, 26 May 2017 at 14:41:39 UTC, John Colvin wrote:

[...]


This version also has the advantage of being (discounting any bugs in
iopipe) correct for arbitrary unicode in all common UTF encodings.


I worked a lot on making sure this works properly. However, it's
possible that there are some lingering issues.

I also did not spend much time optimizing these paths (whereas I spent
a ton of time getting the utf8 line parsing as fast as it could be).
Partly because finding things other than utf8 in the wild is rare, and
partly because I have nothing to compare it with to know what is
possible :)


If you want UCS-2 (aka UTF-16 without surrogates) data I can give you
gigabytes of files in tmx format.

That data I can generate (and have generated) from UTF-8 data. I have tested my byLine parser to make sure it properly splits on "interesting" code points in all widths. UTF-16 data without surrogates should probably work fine. I haven't tuned it, though, like I tuned the UTF-8 version. Is there a memchr for wide characters? ;)
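
For reference on the memchr question: C does have wmemchr in <wchar.h>, but it operates on wchar_t, which is 32 bits wide on most Unix-like platforms and 16 bits on Windows, so it is not a portable match for UTF-16 code units. A minimal sketch of a hand-rolled equivalent over 16-bit code units is below. The name find_u16 is hypothetical, this is not taken from iopipe, and it is a plain scalar loop rather than a tuned implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of iopipe or the C library):
   return a pointer to the first code unit equal to `needle` within
   the first `len` code units of `buf`, or NULL if it does not occur.
   This mirrors the contract of memchr, but over 16-bit units. */
static const uint16_t *find_u16(const uint16_t *buf, uint16_t needle, size_t len)
{
    for (size_t i = 0; i < len; ++i) {
        if (buf[i] == needle)
            return buf + i;
    }
    return NULL;
}
```

Because the data is assumed to contain no surrogates, a match on a single code unit such as 0x000A is always a whole code point, so no extra boundary check is needed here.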

What I really haven't done is compared my line parsing code with multi-code-unit delimiters against one that can do the same thing. I know Phobos and C FILE * really can't do it. I haven't really looked at all in C++, so I should probably look there before giving up.
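
To illustrate what a multi-code-unit delimiter means in practice: U+2028 LINE SEPARATOR is three bytes in UTF-8 (E2 80 A8), so a single memchr call cannot find it. One common approach, sketched below under the assumption of a plain in-memory byte buffer, is to memchr for the first byte and then memcmp the remainder. The name find_delim is hypothetical and this is not how iopipe is implemented.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the first occurrence of the byte sequence
   `delim` (length `dlen`) in `buf` (length `len`).
   Returns a pointer to its first byte, or NULL if it is not present. */
static const unsigned char *find_delim(const unsigned char *buf, size_t len,
                                       const unsigned char *delim, size_t dlen)
{
    if (dlen == 0 || len < dlen)
        return NULL;

    const unsigned char *p = buf;
    const unsigned char *last = buf + (len - dlen); /* last possible start */

    while (p <= last) {
        /* Jump to the next candidate: a byte equal to delim[0]. */
        p = memchr(p, delim[0], (size_t)(last - p) + 1);
        if (p == NULL)
            return NULL;
        /* Confirm that the whole delimiter matches at this position. */
        if (memcmp(p, delim, dlen) == 0)
            return p;
        ++p; /* false candidate, keep scanning */
    }
    return NULL;
}
```

The search stops at the last position where the full delimiter could still fit, so a delimiter truncated at the end of the buffer is reported as not found rather than read past the end.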


-Steve
