Re: Very promising results with libpcre2
Ævar Arnfjörð Bjarmasonwrites: > I am very tempted though to support them in parallel, if only for ease > of performance testing and to be able to roll out support for > grep.patternType=perl meaning pcre1 for now, but add a > grep.patternType=pcre2 for testing (and make grep.patternType=pcre1 > work, with grep.patternType=pcre being synonymous with > grep.patternType=perl, i.e. whatever the default is). Perhaps. As long as the code doesn't get too ugly, I think we would survive ;-)
Re: Very promising results with libpcre2
Jeffrey Waltonwrites: >> Just to make sure that we are on the same page. While I do not see >> the need to link with both variants and allow users to choose >> between them at runtime, I do not know if the whole world is ready >> to drop pcre1 and use pcre2 (the latter of which has only been >> around for a bit over two years). >> >> So we'd probably want to do >> >> (1) keep USE_LIBPCRE and enable v1 when set; >> (2) add USE_LIBPCRE2 and enable v2 when set; >> (3) make sure to error out when both are set. >> >> or something like that. It is tempting to allow us to say >> >> make USE_LIBPCRE=2 >> >> but the existing builds are likely to be depending on "is it set to >> anything? then use PCRE1" behaviour, so we unfortunately cannot take >> that route. > > Yeah, that's the question I kinda had. > >> make USE_LIBPCRE=2 > > I'd prefer a configure option for consistency. Maybe: > ... It is way too early to worry about how ./configure support for this will look like to the end user. Because our Makefile is designed to be usable without configure, the order we do things will be to first decide how we are going to use Makefile variables to configure the feature (i.e. what you saw that I said to Ævar). Once we know the decision, then we arrange autoconf to spit out the chosen Makefile variables using --with-pcre or whatever input. This step cannot start before we know the decision of the former. The one who writes --with-pcre support in ./configure would not know if USE_LIBPCRE=YesPlease or something else is the desired output until then.
Re: Very promising results with libpcre2
> Just to make sure that we are on the same page. While I do not see > the need to link with both variants and allow users to choose > between them at runtime, I do not know if the whole world is ready > to drop pcre1 and use pcre2 (the latter of which has only been > around for a bit over two years). > > So we'd probably want to do > > (1) keep USE_LIBPCRE and enable v1 when set; > (2) add USE_LIBPCRE2 and enable v2 when set; > (3) make sure to error out when both are set. > > or something like that. It is tempting to allow us to say > > make USE_LIBPCRE=2 > > but the existing builds are likely to be depending on "is it set to > anything? then use PCRE1" behaviour, so we unfortunately cannot take > that route. Yeah, that's the question I kinda had. > make USE_LIBPCRE=2 I'd prefer a configure option for consistency. Maybe: --with-pcre # Original PCRE --with-pcre1 # Alias --with-pcre2 # PCRE2 I prefer it because I usually do the following to see the interesting things that's going on: ./configure --help Often, I find a `--with-ssl` or similar. If `--with-ssl` fails, then I go to the README and INSTALL to fine tune it. By the way, if you are tweaking Configure, then consider adding a --with-perl=X, too. Its consistent, it side steps the hard coded /usr/bin/perl, and it signals to users its tunable. Jeff
Re: Very promising results with libpcre2
On Sat, Apr 1, 2017 at 8:24 PM, Junio C Hamanowrote: > Ævar Arnfjörð Bjarmason writes: > >> On Sat, Apr 1, 2017 at 12:48 AM, Junio C Hamano wrote: >>> Ævar Arnfjörð Bjarmason writes: >>> That enables the new JIT support in pcre v2: s/iterrx fixed prx rx 2.19-- -33% -44% fixed 1.47 49%-- -17% prx 1.22 79% 20%-- >>> >>> The numbers with JIT does look "interesting". >>> >>> I couldn't quite tell if there are major incompatibilities in the >>> regex language itself between two versions from their documentation, >>> but assuming that there isn't (modulo bugfixes and enhancements) and >>> assuming that we are going to use their standard matcher, it may be >>> OK to just use the newer one without linking both. >> >> There's no incompatibilities in the regex language itself (modulo bugs >> etc). So yeah, I'll prepare some patch to use v2. > > Just to make sure that we are on the same page. While I do not see > the need to link with both variants and allow users to choose > between them at runtime, I do not know if the whole world is ready > to drop pcre1 and use pcre2 (the latter of which has only been > around for a bit over two years). > > So we'd probably want to do > > (1) keep USE_LIBPCRE and enable v1 when set; > (2) add USE_LIBPCRE2 and enable v2 when set; > (3) make sure to error out when both are set. > > or something like that. It is tempting to allow us to say > > make USE_LIBPCRE=2 > > but the existing builds are likely to be depending on "is it set to > anything? then use PCRE1" behaviour, so we unfortunately cannot take > that route. We're on the same page, all of this makes sense, there'll be some OS's / environments which for years won't be packaging pcre2 but will have pcre1. I am very tempted though to support them in parallel, if only for ease of performance testing and to be able to roll out support for grep.patternType=perl meaning pcre1 for now, but add a grep.patternType=pcre2 for testing (and make grep.patternType=pcre1 work, with grep.patternType=pcre being synonymous with grep.patternType=perl, i.e. whatever the default is).
Re: Very promising results with libpcre2
Ævar Arnfjörð Bjarmasonwrites: > On Sat, Apr 1, 2017 at 12:48 AM, Junio C Hamano wrote: >> Ævar Arnfjörð Bjarmason writes: >> >>> That enables the new JIT support in pcre v2: >>> >>> s/iterrx fixed prx >>> rx 2.19-- -33% -44% >>> fixed 1.47 49%-- -17% >>> prx 1.22 79% 20%-- >> >> The numbers with JIT does look "interesting". >> >> I couldn't quite tell if there are major incompatibilities in the >> regex language itself between two versions from their documentation, >> but assuming that there isn't (modulo bugfixes and enhancements) and >> assuming that we are going to use their standard matcher, it may be >> OK to just use the newer one without linking both. > > There's no incompatibilities in the regex language itself (modulo bugs > etc). So yeah, I'll prepare some patch to use v2. Just to make sure that we are on the same page. While I do not see the need to link with both variants and allow users to choose between them at runtime, I do not know if the whole world is ready to drop pcre1 and use pcre2 (the latter of which has only been around for a bit over two years). So we'd probably want to do (1) keep USE_LIBPCRE and enable v1 when set; (2) add USE_LIBPCRE2 and enable v2 when set; (3) make sure to error out when both are set. or something like that. It is tempting to allow us to say make USE_LIBPCRE=2 but the existing builds are likely to be depending on "is it set to anything? then use PCRE1" behaviour, so we unfortunately cannot take that route. Thanks.
Re: Very promising results with libpcre2
On Sat, Apr 1, 2017 at 12:48 AM, Junio C Hamanowrote: > Ævar Arnfjörð Bjarmason writes: > >> That enables the new JIT support in pcre v2: >> >> s/iterrx fixed prx >> rx 2.19-- -33% -44% >> fixed 1.47 49%-- -17% >> prx 1.22 79% 20%-- > > The numbers with JIT does look "interesting". > > I couldn't quite tell if there are major incompatibilities in the > regex language itself between two versions from their documentation, > but assuming that there isn't (modulo bugfixes and enhancements) and > assuming that we are going to use their standard matcher, it may be > OK to just use the newer one without linking both. There's no incompatibilities in the regex language itself (modulo bugs etc). So yeah, I'll prepare some patch to use v2.
Re: Very promising results with libpcre2
Ævar Arnfjörð Bjarmasonwrites: > That enables the new JIT support in pcre v2: > > s/iterrx fixed prx > rx 2.19-- -33% -44% > fixed 1.47 49%-- -17% > prx 1.22 79% 20%-- The numbers with JIT does look "interesting". I couldn't quite tell if there are major incompatibilities in the regex language itself between two versions from their documentation, but assuming that there isn't (modulo bugfixes and enhancements) and assuming that we are going to use their standard matcher, it may be OK to just use the newer one without linking both.
Very promising results with libpcre2
The recent libpcre2 got me interested in seeing what the difference in v1 and v2 was. So I hacked up a *very basic* patch for libpcre2 that passes all tests, but obviously isn't ready for inclusion (I searched/replaced all the v1 usage with v2). I'm not even bothering sending this to the list since discussing the patch itself isn't the point: https://github.com/avar/git/commit/414647d88dd9c5 Before that patch, running a test[1] on linux.git where I grep the whole tree for a fixed string / simple regex with POSIX regexes & PCRE (all the greps match the same few lines) gives: s/iterrx prx fixed rx 2.20-- -2% -34% prx 2.172%-- -32% fixed 1.46 51% 48%-- I.e. fixed string is fastest, and both POSIX regcomp() & pcre v1 are ~30% slower than that, with no difference in performance between the two. Now with my patch above with pcre v2 there's a notable performance difference: s/iterrx prx fixed rx 2.18-- -16% -33% prx 1.84 19%-- -20% fixed 1.47 48% 25%-- We've gone from ~30% slower to ~20% slower for PCRE with v2. But now let's test that with this patch: https://github.com/avar/git/commit/4b7e5da3606c0b9b12025437de8005f5fa07ff54 That enables the new JIT support in pcre v2: s/iterrx fixed prx rx 2.19-- -33% -44% fixed 1.47 49%-- -17% prx 1.22 79% 20%-- Now it's PCRE that's 20% faster than our currently fastest grep codepath that searches for a fixed string, and in absolute terms it's around 50% faster than the current PCRE implementation. This is on Debian testing with both PCRE libraries installed via packages, 8.35 & 10.22 for v1 and v2, respectively. Both are the second-latest[2] point releases for their respective versions. As far as turning this into a patch goes there's a few open questions: * PCRE itself supports linking to v1 and v2 in the same program just fine. Should we provide the possibility to link to both, or just make the user choose? If these performance numbers hold up preferring v2 is definitely better. * The JIT is supposedly a bit slower if you're not doing a lot of matching, although I doubt this matters in practice, but whether to use it & a few other options could be controlled by some config/CLI option. I think it probably makes sense just to always use it if it's there pending some cases where it makes performance worse in practice As an aside I started looking into this because I'm interested in eventually hacking up something that makes every user-facing regcomp()/regexec() we have now (e.g. log -G) accept PCRE as well. How do do this in all cases isn't very obvious, we could just have some global config option, but there's lots of stuff like e.g. "^{/} & "git show :/" that takes regexes, and e.g. how to pass a /i flag to some of these isn't obvious at all. The solution I'm leaning towards is to just make stuff like thath only work under PCRE, via the native (?:) facility. E.g. this now works: git grep -P '(?xi: h e l l o)' 1. PF=~/g/git/ perl -MBenchmark=cmpthese -wE 'cmpthese(20, { fixed => sub { system "$ENV{PF}git grep -F avarasu >/dev/null" }, rx => sub { system "$ENV{PF}git grep avara?su >/dev/null" }, prx => sub { system "$ENV{PF}git grep -P avara?su >/dev/null" } })' 2. https://ftp.pcre.org/pub/pcre/