>This output is non-useful when the main application needs to analyze
>input stream no matter what. To do this the main application now is
>forced to:
>    have its own built-in UTF8-parser;
>    reparse the input stream by this built-in parser to find invalid
>    UTF-8 characters;
>    make them valid and remember changes to have possibility to
>    restore them later;
>    reexecute pcre_exec() with valid UTF-8 stream;
>    rebuild output stream with restoring of replaced invalid UTF-8
>    characters.
>And cost of this work is very high.

And just why on Earth do you feel it's up to PCRE to carry the burden
of processing nonsensical input strings?  If you happen to use, say,
SQLite or some other DB engine, will you ask their devs to carry the
very same (useless) burden as well?  And the OS too?  And what else?

That simply doesn't make any sense.  Whatever the application is, it's
its responsibility to conform to APIs as they are defined, or to place
an intermediate layer in front of them for that purpose.  PCRE asks for
either valid UTF-8 or arbitrary strings of bytes, and offers an option
to catch invalid input in both cases.  IMHO this is very permissive
already.  I would find it natural for a library like PCRE to specify
undefined behavior (crash included) in case of invalid UTF-8 input.
Checking UTF-8 conformance and possibly correcting things is just not
its business.
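
To make the "intermediate layer" idea concrete, here is a minimal
sketch (mine, not anything shipped with PCRE) of a check an application
could run once at its input boundary.  It assumes the RFC 3629
definition of well-formed UTF-8; the names utf8_seq_len and
utf8_first_invalid are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Length (1..4) of the well-formed UTF-8 sequence starting at s, or 0
   if the bytes at s do not start one.  n is the number of bytes left. */
static size_t utf8_seq_len(const unsigned char *s, size_t n)
{
    unsigned char b0;
    if (n == 0) return 0;
    b0 = s[0];
    if (b0 < 0x80) return 1;                       /* ASCII */
    if (b0 >= 0xC2 && b0 <= 0xDF)                  /* 2-byte sequence */
        return (n >= 2 && (s[1] & 0xC0) == 0x80) ? 2 : 0;
    if (b0 >= 0xE0 && b0 <= 0xEF) {                /* 3-byte sequence */
        if (n < 3 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80)
            return 0;
        if (b0 == 0xE0 && s[1] < 0xA0) return 0;   /* overlong form */
        if (b0 == 0xED && s[1] > 0x9F) return 0;   /* UTF-16 surrogate */
        return 3;
    }
    if (b0 >= 0xF0 && b0 <= 0xF4) {                /* 4-byte sequence */
        if (n < 4 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80 ||
            (s[3] & 0xC0) != 0x80)
            return 0;
        if (b0 == 0xF0 && s[1] < 0x90) return 0;   /* overlong form */
        if (b0 == 0xF4 && s[1] > 0x8F) return 0;   /* above U+10FFFF */
        return 4;
    }
    return 0;   /* stray continuation byte, 0xC0, 0xC1 or 0xF5..0xFF */
}

/* Offset of the first byte that breaks well-formedness, or -1 if the
   whole buffer is well-formed UTF-8. */
long utf8_first_invalid(const unsigned char *s, size_t len)
{
    size_t i = 0;
    assert(s != NULL || len == 0);
    while (i < len) {
        size_t k = utf8_seq_len(s + i, len - i);
        if (k == 0) return (long)i;
        i += k;
    }
    return -1;
}
```

An application would reject or repair the input whenever this returns
something other than -1, so that everything downstream only ever sees
well-formed UTF-8.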

It's the same as validating user input, for example personal data
entered on a website.  The data entry module needs to perform
validation once and reject input until correct data is entered.  It
would be plain crazy to accept, for instance, random text as a ZIP
code, leaving deeper applications to check and correct the wrong data
on their own every time they need to process a ZIP code.

Write your own random-to-UTF8 code in a way which suits your needs and
let PCRE process pure UTF-8.  That you can easily find invalid UTF-8
sources is no excuse to rewrite gazillions of libraries with redundant,
useless code in order to cope with it (with doubtful results anyway).
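
As an illustration of how small such random-to-UTF8 code can be, here
is a rough sketch under my own assumptions: each byte that cannot start
a well-formed RFC 3629 sequence is replaced by U+FFFD (bytes EF BF BD),
and the caller supplies an output buffer of at least 3 * len bytes.  The
names seq_len and utf8_sanitize are hypothetical, and another
replacement policy may suit your application better.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length (1..4) of the well-formed UTF-8 sequence at s, or 0 if none. */
static size_t seq_len(const unsigned char *s, size_t n)
{
    size_t need, i;
    unsigned char lo = 0x80, hi = 0xBF;   /* allowed range for s[1] */
    if (n == 0) return 0;
    if (s[0] < 0x80) return 1;
    if (s[0] >= 0xC2 && s[0] <= 0xDF) need = 2;
    else if (s[0] >= 0xE0 && s[0] <= 0xEF) need = 3;
    else if (s[0] >= 0xF0 && s[0] <= 0xF4) need = 4;
    else return 0;
    if (s[0] == 0xE0) lo = 0xA0;          /* reject overlong forms */
    if (s[0] == 0xED) hi = 0x9F;          /* reject surrogates */
    if (s[0] == 0xF0) lo = 0x90;          /* reject overlong forms */
    if (s[0] == 0xF4) hi = 0x8F;          /* reject above U+10FFFF */
    if (n < need || s[1] < lo || s[1] > hi) return 0;
    for (i = 2; i < need; i++)
        if ((s[i] & 0xC0) != 0x80) return 0;
    return need;
}

/* Copy src to dst, replacing every offending byte with U+FFFD.
   Requires cap >= 3 * len.  Returns the number of bytes written. */
size_t utf8_sanitize(const unsigned char *src, size_t len,
                     unsigned char *dst, size_t cap)
{
    size_t i = 0, o = 0;
    assert(cap / 3 >= len);               /* worst case: all bytes bad */
    while (i < len) {
        size_t k = seq_len(src + i, len - i);
        if (k == 0) {
            dst[o++] = 0xEF; dst[o++] = 0xBF; dst[o++] = 0xBD;
            i++;                          /* skip one bad byte, resync */
        } else {
            memcpy(dst + o, src + i, k);
            o += k;
            i += k;
        }
    }
    return o;
}
```

The sanitized buffer is what you would then hand to pcre_exec().  If
the original bytes have to be restored afterwards, the same loop could
record the offset and value of each replaced byte; that bookkeeping is
left out here.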

I wouldn't like to see the valuable time and energy of volunteer PCRE
devs wasted on making PCRE able to process random binary data.  If you
need that, I'm sure you can write such a module yourself.


-- 
## List details at http://lists.exim.org/mailman/listinfo/pcre-dev 
