Neat; thanks Dirk! Will be interesting to see if I can get that finagled on Windows when I get back to Boston.

Best,
Oliver

On Wednesday, 10 August 2016, Dirk Eddelbuettel <e...@debian.org> wrote:

>
> On 10 August 2016 at 18:15, Oliver Keyes wrote:
> | I'm trying to incorporate PCRE-compliant regular expressions into C
> | code in an R package.
> |
> | From digging around in R's source code, it appears that R (pretty
> | much?) guarantees the presence of either a system-level PCRE library,
> | or an R-internal one.[0] Is this exposed (or grabbable) via the R C
> | API in any way?
>
> The key to realize here is that R does indeed provide an environment. And at
> least where I like to work, I get this right off the bat:
>
> edd@max:/tmp$ grep lpcre /etc/R/*
> /etc/R/Makeconf:LIBS = -lpcre -llzma -lbz2 -lz -lrt -ldl -lm
> edd@max:/tmp$
>
> So pcre plus a bunch of compression libraries (lzma, bz2, z) and more are
> essentially "there for the taking", if built as a shared library.
>
> An existence proof is below; it is based on the 2nd Google hit I got for
> 'libpcre example' and has the advantage of being shorter than the first hit.
>
> I first created a baseline. The example, as given and then repaired, gets us:
>
> edd@max:/tmp$ ./ex_pcre
> 0: From:regular.expressi...@example.com
> 1: regular.expressions
> 2: example.com
> 0: From:ex...@43434.com
> 1: exddd
> 2: 43434.com
> 0: From:7853...@exgem.com
> 1: 7853456
> 2: exgem.com
> edd@max:/tmp$
>
> Turning that into something callable from R took about another minute. It
> looks like this:
>
> -----------------------------------------------------------------------------
> // modified (and repaired) example from http://stackoverflow.com/a/1421923/143305
> #include "pcre.h"
> #include <cstring>   // strlen
> #include <Rcpp.h>
>
> // [[Rcpp::export()]]
> void foo() {
>     const char *error;
>     int erroffset;
>     pcre *re;
>     int rc;
>     int ovector[100];
>
>     const char *regex = "From:([^@]+)@([^\r]+)";
>     char str[] = "From:regular.expressi...@example.com\r\n"
>                  "From:ex...@43434.com\r\n"
>                  "From:7853...@exgem.com\r\n";
>
>     re = pcre_compile (regex,          /* the pattern */
>                        PCRE_MULTILINE,
>                        &error,         /* for error message */
>                        &erroffset,     /* for error offset */
>                        0);             /* use default character tables */
>     if (!re) Rcpp::stop("pcre_compile failed (offset: %d), %s\n", erroffset, error);
>
>     unsigned int offset = 0;
>     unsigned int len = strlen(str);
>     /* ovecsize is the number of ints in ovector, not its size in bytes */
>     const int ovecsize = sizeof(ovector) / sizeof(ovector[0]);
>     while (offset < len &&
>            (rc = pcre_exec(re, 0, str, len, offset, 0, ovector, ovecsize)) >= 0) {
>         /* pair i of ovector holds the start and end offsets of group i */
>         for (int i = 0; i < rc; ++i) {
>             Rprintf("%2d: %.*s\n", i, ovector[2*i+1] - ovector[2*i], str + ovector[2*i]);
>         }
>         offset = ovector[1];   /* resume after the end of this match */
>     }
>     pcre_free(re);             /* release the compiled pattern */
> }
>
> /*** R
> foo()
> */
> -----------------------------------------------------------------------------
>
> and, lo and behold, produces the same output demonstrating that, yes,
> Veronica, we do get pcre for free:
>
> R> library(Rcpp)
> R> sourceCpp("/tmp/oliver.cpp")
>
> R> foo()
> 0: From:regular.expressi...@example.com
> 1: regular.expressions
> 2: example.com
> 0: From:ex...@43434.com
> 1: exddd
> 2: 43434.com
> 0: From:7853...@exgem.com
> 1: 7853456
> 2: exgem.com
> R>
>
> Your package will probably want a litmus test in configure to see if this
> really holds on the platform it is currently being built on.
>
> Dirk
>
> --
> http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org

______________________________________________
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel