Hi, I think the cost of an early check is still lower than doing this during pcre_exec().
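
To illustrate what I mean by an early check, here is a minimal sketch of a single linear pass the application could run before calling pcre_exec(). The name utf8_first_invalid is made up for this example and is not part of PCRE; the rules it applies (continuation bytes, overlong forms, surrogates, the 0x10FFFF limit) are the usual UTF-8 well-formedness rules, not a copy of PCRE's own validity check.

```c
#include <stddef.h>

/* Hypothetical helper, not part of PCRE.
   Returns the offset of the first byte at which a well-formed UTF-8
   sequence does not begin, or len if the whole buffer is valid. */
size_t utf8_first_invalid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t extra;       /* continuation bytes expected after c */
        size_t k;
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point allowed for this length */

        if (c < 0x80) {                 /* plain ASCII byte */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            extra = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return i;                   /* stray continuation byte or 0xF8..0xFF */
        }

        if (len - i <= extra)
            return i;                   /* sequence is truncated */

        for (k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return i;               /* not a continuation byte */
            cp = (cp << 6) | (unsigned long)(s[i + k] & 0x3F);
        }

        /* Reject overlong forms, out-of-range values and surrogates. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return i;

        i += extra + 1;
    }
    return len;
}
```

The application can replace or record the byte at the returned offset and continue scanning from the next byte. Once the buffer is known to be valid, PCRE_NO_UTF8_CHECK can be passed to pcre_exec() so the check is not repeated.
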
Anyway, PCRE has macros in pcre_internal.h:

  GETCHAR(c, eptr)
  GETCHARTEST(c, eptr)
  GETCHARINC(c, eptr)
  GETCHARINCTEST(c, eptr)
  GETCHARLEN(c, eptr, len)
  GETCHARLENTEST(c, eptr, len)
  BACKCHAR(eptr)

You can redefine them to fit your purposes, including handling illegal characters.

Regards,
Zoltan

ND <[email protected]> wrote:

> > In the default case, PCRE does not crash: it returns PCRE_ERROR_BADUTF8.
>
> This output is not useful when the main application needs to analyze the
> input stream no matter what. To do this, the main application is now forced to:
>   have its own built-in UTF-8 parser;
>   reparse the input stream with this built-in parser to find invalid UTF-8
>   characters;
>   make them valid and remember the changes so that they can be restored
>   later;
>   re-execute pcre_exec() with the valid UTF-8 stream;
>   rebuild the output stream, restoring the replaced invalid UTF-8
>   characters.
> And the cost of this work is very high.
>
> Situations where the analysis must succeed whether or not the input UTF-8
> stream is erroneous are widespread. The errors are accidental in some cases
> and deliberate in others. At present PCRE can't offer an effective solution.
>
> > I think it would penalize the normal running of PCRE too much.
>
> I wrote that this behaviour may be OPTIONAL.
>
> > I also think one could argue about how to interpret a sequence of
> > invalid bytes whose values are greater than 127. How many characters does
> > such a string encode? For example, suppose the first byte indicates that
> > there are three more bytes in a UTF-8 character, two of them are OK, but
> > the third one has an invalid value (less than 128, say). Is that a mangled
> > UTF-8 character followed by an ASCII byte, or is it four single-byte
> > characters?
>
> IMHO a sequence of invalid bytes may be interpreted as one character of
> type "invalid" per byte.
>
> --
> ## List details at http://lists.exim.org/mailman/listinfo/pcre-dev
