> > It would be a delight to be able to use more advanced (IMHO)
> > Perl-compatible regexes in PostgreSQL.
>
> After some further research, pcre does seem like an interesting
> alternative. Both pcre and Spencer's new code have essentially
> Berkeley-style licenses, so there's no problem there. Some relevant
> comparisons:
>
> 1. pcre tries to be exactly compatible with Perl, so details of its
> regex flavor will be familiar to many more people than the Tcl flavor
> (by and large the features are similar, but there are differences).

pcre is lgpl, iirc. Ruby went off and wrote an explicitly BSD-licensed
regexp engine to replace its GPL'ed Perl/pcre-based bits.

> 2. pcre is already distributed as a nice tidy library; we need not
> extract code from the Tcl distribution.

http://www.ruby-lang.org/cgi-bin/cvsweb.cgi/oniguruma/

> 3. pcre is actively maintained (although tracking a new release every
> couple months may not be something we really want to do, anyway).
> AFAICT Henry's not doing anything much with his code, so it'd be
> pretty much take-once-and-maintain-for-ourselves.

Oniguruma is pretty well maintained given that it's Ruby's regexp
engine, and has the perk of being maintained outside of Ruby as a
standalone module that gets periodically imported.

> 4. pcre looks like it's probably *not* as well suited to a multibyte
> environment. In particular, I doubt that its UTF8 compile option
> was even turned on for the performance comparison Neil cited --- and
> the man page only promises "experimental, incomplete support for
> UTF-8 encoded strings". The Tcl code by contrast is used only in a
> multibyte environment, so that's the supported, optimized path. It
> doesn't even assume null-terminated strings (yay).

Oniguruma only supports ASCII, UTF-8, EUC-JP, and Shift_JIS, but boasts
being 10-20% faster than PCRE for ASCII (no clue about multi-byte
character sets). In terms of development/API, it supports the GNU
regex, POSIX, and Oniguruma APIs (the last is what Ruby uses to hook
in).

Just another option to add to the table. I don't know if it fully fits
our requirements, but it is actively being developed by people outside
of this project, and support for 16-bit and 32-bit encodings (UCS-2,
UCS-4, UTF-16) is on its TODO list, so it might be nice to keep this in
mind and let Ruby maintain it instead of PostgreSQL.

-sc

--
Sean Chittenden