On 06-Apr-2016 01:00, Timothee Cour via Digitalmars-d wrote:
Is there a way to avoid decoding (as UTF-8) when calling the regex APIs? Or a plan to do so?
Custom alphabets - yes, including ASCII.
use case: speed (no decoding) and avoiding throwing on invalid utf8 sequences
The speed gain for ASCII-only vs. Unicode with an ASCII special case would be around 0.5% (the time spent on decoding), as my extensive profiling shows. The latest pull request for std.regex did exactly that: it added a special path for ASCII.
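For illustration only, an ASCII special case in a decoding loop could look roughly like the sketch below. This is not the actual std.regex code; `nextCodePoint` is a hypothetical helper name, and the only library call assumed is `std.utf.decode`.

```d
import std.utf : decode;

// Hypothetical helper, not taken from std.regex:
// returns the code point starting at s[i] and advances i past it.
dchar nextCodePoint(const(char)[] s, ref size_t i)
{
    // Fast path: a code unit below 0x80 is a complete ASCII character,
    // so no decoding is needed.
    if (s[i] < 0x80)
        return s[i++];
    // Slow path: full UTF-8 decoding; throws UTFException on invalid input.
    return decode(s, i);
}
```

With a fast path like this, pure-ASCII input never reaches the decoder, which is why a dedicated no-decoding mode would save little on top of it.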
ideally this should allow:
---
auto s = cast(ubyte[]) "abcd"; // potentially not valid utf8 sequence
auto r = cast(ubyte[]) `^\d`;
auto m = match(s, r.regex);
// right now: regex cannot deduce function from argument types !()(ubyte[])
---
-- Dmitry Olshansky