On 06-Apr-2016 01:00, Timothee Cour via Digitalmars-d wrote:
Is there a way to avoid decoding (as UTF-8) when calling the regex APIs,
or a plan to add one?

Custom alphabets - yes, including ASCII.

Use case: speed (no decoding) and not throwing on invalid UTF-8 sequences.

The speed gain for ASCII-only matching over Unicode with an ASCII special case would be around 0.5% (the time spent on decoding), as my extensive profiling shows. The latest pull request for std.regex did exactly that: it added a special path for ASCII.
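
To illustrate what an ASCII special case in a decoding step can look like, here is a minimal sketch. It is not the code from the std.regex pull request; the helper name `nextCodePoint` is hypothetical, and only `std.utf.decode` is an existing Phobos function.

```d
import std.utf : decode;

// Hypothetical helper, not part of std.regex: returns the code point
// starting at s[i] and advances i past it.
dchar nextCodePoint(const(char)[] s, ref size_t i)
{
    immutable char c = s[i];
    if (c < 0x80)
    {
        // ASCII special case: one code unit is one code point, no decoding.
        ++i;
        return c;
    }
    // General case: full UTF-8 decoding; throws UTFException on invalid input.
    return decode(s, i);
}
```

With a fast path of this kind, pure-ASCII input already skips the decoder, which is consistent with the small gain quoted above for a dedicated ASCII-only mode.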


ideally this should allow:

---
auto s = cast(ubyte[]) "abcd"; // potentially not a valid UTF-8 sequence
auto r = cast(ubyte[]) `^\d`;
auto m = match(s, r.regex); // right now: regex cannot deduce function from argument types !()(ubyte[])
---
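
For comparison, a possible workaround with the existing string-based API is sketched below. It is an assumption of this note, not something proposed in the thread: the helper name `startsWithDigit` is hypothetical, and the input is still decoded, so it only addresses the exception on invalid UTF-8, not the speed concern.

```d
import std.regex : matchFirst, regex;
import std.utf : UTFException, validate;

// Hypothetical helper: reports whether the bytes start with a digit,
// treating input that is not valid UTF-8 as "no match" instead of throwing.
bool startsWithDigit(const(ubyte)[] bytes)
{
    // Reinterpret the bytes as UTF-8 code units without copying.
    auto s = cast(const(char)[]) bytes;
    try
    {
        validate(s); // throws UTFException if s is not well-formed UTF-8
    }
    catch (UTFException)
    {
        return false;
    }
    return !matchFirst(s, regex(`^\d`)).empty;
}
```

The validation pass costs an extra scan of the input, so this is a stopgap rather than the byte-level matching requested above.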



--
Dmitry Olshansky
