On Sun, Jun 28, 2015 at 1:17 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:

> This patch implements version 1.2 of contrib module pg_trgm.
>
> This supports the triconsistent function, introduced in version 9.4 of the
> server, to make it faster to implement indexed queries where some keys are
> common and some are rare.
>

Thank you for the patch! Lack of pg_trgm triconsistent support was
significant miss after "fast scan" implementation.


> I've included the paths to both upgrade and downgrade between 1.1 and 1.2,
> although after doing so you must close and restart the session before you
> can be sure the change has taken effect. There is no change to the on-disk
> index structure
>

pg_trgm--1.1.sql andpg_trgm--1.1--1.2.sql are useful for debug, but do you
expect them in final commit? As I can see in other contribs we have only
last version and upgrade scripts.

 This shows the difference it can make in some cases:

>
> create extension pg_trgm version "1.1";
>
> create table foo as select
>
>   md5(random()::text)|| case when random()<0.000005 then 'lmnop' else
> '123' end ||
>
>   md5(random()::text) as bar
>
> from generate_series(1,10000000);
>
> create index on foo using gin (bar gin_trgm_ops);
>
> --some queries
>
> alter extension pg_trgm update to "1.2";
>
> --close, reopen, more queries
>
>
> select count(*) from foo where bar like '%12344321lmnabcddd%';
>
>
>
> V1.1: Time: 1743.691 ms  --- after repeated execution to warm the cache
>
> V1.2: Time:  2.839 ms      --- after repeated execution to warm the cache
>

Nice!


> You could get the same benefit just by increasing MAX_MAYBE_ENTRIES (in
> core) from 4 to some higher value (which it probably should be anyway, but
> there will always be a case where it needs to be higher than you can afford
> it to be, so a real solution is needed).
>

Actually, it depends on how long it takes to calculate consistent function.
The cheaper consistent function is, the higher MAX_MAYBE_ENTRIES could be.
Since all functions in PostgreSQL may define its cost, GIN could calculate
MAX_MAYBE_ENTRIES from the cost of consistent function.

I wasn't sure if this should be a new version of pg_trgm or not, because
> there is no user visible change other than to performance.  But there may
> be some cases where it results in performance reduction and so it is nice
> to provide options.  Also, I'd like to use it in a back-branch, so versions
> seems to be the right way to go there.
>

We definitely have to stamp a new version of extension every time when its
SQL-definition changes. By the current design of GIN, providing both
consistent and triconsistent function should be optimal for performance. If
it's not so then it's a problem of GIN, not opclass.

There is a lot of code duplication between the binary consistent function
> and the ternary one.  I thought it the duplication was necessary in order
> to support both 1.1 and 1.2 from the same code base.
>

For me current code duplication is OK.

There may also be some gains in the similarity and regex cases, but I
> didn't really analyze those for performance.
>


> I've thought about how to document this change.  Looking to other example
> of other contrib modules with multiple versions, I decided that we don't
> document them, other than in the release notes.
>
>
> The same patch applies to 9.4 code with a minor conflict in the Makefile,
> and gives benefits there as well.
>

Some other notes about the patch:
* You allocate boolcheck array and don't use it.
* Check coding style and formatting, in particular "check[i]==GIN_TRUE"
should be "check[i] == GIN_TRUE".
* I think some comments needed in gin_trgm_triconsistent() about
trigramsMatchGraph(). gin_trgm_triconsistent() may use trigramsMatchGraph()
that way because trigramsMatchGraph() implements monotonous boolean
function.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Reply via email to