Hi everyone, I have been working on an application that uses PCRE2, along with the serialization API. The serialization API is a major gain over PCRE1, especially as this application makes use of fairly large regexes that take a non-trivial amount of time to compile.
However, I am seeing that this API has the restriction that a serialized regex can only be loaded by the same version of PCRE2 that was used to create it.

This severely limits the usefulness of the API, as the application must then make provisions for potentially re-compiling a regex if the serialized form cannot be loaded. I have to hang on to the original regex text and compile options somewhere, in other words, in the event that PCRE2 is updated (e.g. due to a security vulnerability).

Also, this application is making use of architecture-independent data files that contain these regexes. Ideally these files would be updated with a current-format serialization whenever there is a version bump. But given how these files are deployed, they must essentially be treated as read-only. So this brings up awkward questions of where and how the newer serialized form can be cached to avoid the performance hit of re-compiling regexes repeatedly.

Is it not feasible for the serialized form to be forward-compatible with later versions of PCRE2? That's what I was expecting from this API going in, since that is the norm for object serialization protocols. The current behavior is more in line with the limitations of dumping an object's memory representation straight to disk. (And that's what I've seen people do to serialize regexes in PCRE1; it was dangerous as all heck, but it seemed to work well enough for some folks not to mind.)

--Daniel

P.S.: Please Cc: me on any replies, as I am not subscribed to this list.

-- 
Daniel Richard G. || [email protected]
My ASCII-art .sig got a bad case of Times New Roman.

-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
