On Sun, Jun 28, 2015 at 1:17 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> This patch implements version 1.2 of contrib module pg_trgm. > > This supports the triconsistent function, introduced in version 9.4 of the > server, to make it faster to implement indexed queries where some keys are > common and some are rare. > Thank you for the patch! Lack of pg_trgm triconsistent support was significant miss after "fast scan" implementation. > I've included the paths to both upgrade and downgrade between 1.1 and 1.2, > although after doing so you must close and restart the session before you > can be sure the change has taken effect. There is no change to the on-disk > index structure > pg_trgm--1.1.sql andpg_trgm--1.1--1.2.sql are useful for debug, but do you expect them in final commit? As I can see in other contribs we have only last version and upgrade scripts. This shows the difference it can make in some cases: > > create extension pg_trgm version "1.1"; > > create table foo as select > > md5(random()::text)|| case when random()<0.000005 then 'lmnop' else > '123' end || > > md5(random()::text) as bar > > from generate_series(1,10000000); > > create index on foo using gin (bar gin_trgm_ops); > > --some queries > > alter extension pg_trgm update to "1.2"; > > --close, reopen, more queries > > > select count(*) from foo where bar like '%12344321lmnabcddd%'; > > > > V1.1: Time: 1743.691 ms --- after repeated execution to warm the cache > > V1.2: Time: 2.839 ms --- after repeated execution to warm the cache > Nice! > You could get the same benefit just by increasing MAX_MAYBE_ENTRIES (in > core) from 4 to some higher value (which it probably should be anyway, but > there will always be a case where it needs to be higher than you can afford > it to be, so a real solution is needed). > Actually, it depends on how long it takes to calculate consistent function. The cheaper consistent function is, the higher MAX_MAYBE_ENTRIES could be. Since all functions in PostgreSQL may define its cost, GIN could calculate MAX_MAYBE_ENTRIES from the cost of consistent function. I wasn't sure if this should be a new version of pg_trgm or not, because > there is no user visible change other than to performance. But there may > be some cases where it results in performance reduction and so it is nice > to provide options. Also, I'd like to use it in a back-branch, so versions > seems to be the right way to go there. > We definitely have to stamp a new version of extension every time when its SQL-definition changes. By the current design of GIN, providing both consistent and triconsistent function should be optimal for performance. If it's not so then it's a problem of GIN, not opclass. There is a lot of code duplication between the binary consistent function > and the ternary one. I thought it the duplication was necessary in order > to support both 1.1 and 1.2 from the same code base. > For me current code duplication is OK. There may also be some gains in the similarity and regex cases, but I > didn't really analyze those for performance. > > I've thought about how to document this change. Looking to other example > of other contrib modules with multiple versions, I decided that we don't > document them, other than in the release notes. > > > The same patch applies to 9.4 code with a minor conflict in the Makefile, > and gives benefits there as well. > Some other notes about the patch: * You allocate boolcheck array and don't use it. * Check coding style and formatting, in particular "check[i]==GIN_TRUE" should be "check[i] == GIN_TRUE". * I think some comments needed in gin_trgm_triconsistent() about trigramsMatchGraph(). gin_trgm_triconsistent() may use trigramsMatchGraph() that way because trigramsMatchGraph() implements monotonous boolean function. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company