On 01/30/2014 01:53 AM, Tomas Vondra wrote:
(3) A file with explain plans for 4 queries suffering a ~2x slowdown,
     comparing 9.4 master and Heikki's patches, is available
     here:

       http://www.fuzzy.cz/tmp/gin/queries.txt

     All the queries have 6 common words, and the explain plans look
     just fine to me - exactly like the plans for other queries.

     Two things now caught my eye. First, some of these queries actually
     have repeated words - either literally, as in "database & database",
     or in negated form, as in "!anything & anything". Second, while
     generating the queries I used a "dumb" frequency, counting only
     exact matches, i.e. "write" != "written" etc. But the actual number
     of hits may be much higher - for example, "write" matches exactly
     just 5% of the documents, but using @@ it matches more than 20%.

     I don't know if that's the actual cause though.

I tried these queries with the data set you posted here: http://www.postgresql.org/message-id/52e4141e.60...@fuzzy.cz. The reason for the slowdown is the extra consistent calls the patch causes. That's expected - the patch deliberately calls consistent in situations where we didn't before, and if those "pre-consistent" checks fail to eliminate many tuples, you lose.
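
To make that concrete, here's a minimal sketch of the idea behind the
pre-consistent check. The names and the simplified consistent signature
are mine, not the actual GIN API - treat it as illustrative C:

#include <stdbool.h>

typedef bool (*consistent_fn)(const bool *check, int nentries,
                              const void *query);

/*
 * Returns false only if the item cannot match even when every
 * not-yet-fetched entry is optimistically assumed to be present;
 * in that case we can skip the item without fetching the rest.
 * Note the shortcut is only valid if consistent() is monotone in
 * check[] (no negated entries), which the real code must account for.
 */
static bool
pre_consistent(consistent_fn consistent, const void *query,
               const bool *known, const bool *is_known, int nentries)
{
    bool        check[32];      /* assume nentries <= 32 in this sketch */
    int         i;

    for (i = 0; i < nentries; i++)
        check[i] = is_known[i] ? known[i] : true;   /* unknown => TRUE */

    return consistent(check, nentries, query);
}

When most items survive that optimistic test - as with your queries of
six common words - we pay for the extra consistent calls and skip
nothing.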

So, what can we do to mitigate that? Some ideas:

1. Implement the catalog changes from Alexander's patch. That ought to remove the overhead, as you only need to call the consistent function once, not "both ways". OTOH, currently we only call the consistent function if there is only one unknown column. If, with the catalog changes, we always call the consistent function even when there are more unknown columns, we might end up calling it even more often.

2. Cache the result of the consistent function.

3. Make the consistent function cheaper. (How? Magic?)

4. Use some kind of a heuristic, and stop doing the pre-consistent checks if they're not effective. Not sure what the heuristic would look like.
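
Just to illustrate what such a heuristic could look like (purely
hypothetical, not from any patch): count how often the pre-consistent
check actually lets us skip an item over a window of calls, and stop
doing the checks when the skip rate falls below some threshold:

#include <stdbool.h>

#define WINDOW        1000      /* re-evaluate after this many calls */
#define MIN_SKIP_PCT  5         /* keep checking only above this rate */

typedef struct
{
    int         attempts;       /* pre-consistent calls this window */
    int         skipped;        /* items those calls let us skip */
    bool        enabled;        /* keep doing pre-consistent checks? */
} PreCheckStats;

static void
update_precheck_stats(PreCheckStats *s, bool item_skipped)
{
    s->attempts++;
    if (item_skipped)
        s->skipped++;

    if (s->attempts >= WINDOW)
    {
        /* Disable the pre-checks if they filter almost nothing. */
        if (s->skipped * 100 < s->attempts * MIN_SKIP_PCT)
            s->enabled = false;
        s->attempts = s->skipped = 0;
    }
}

Both the window size and the threshold are made-up numbers; picking
them well is exactly the part I'm not sure about.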

The caching we could easily do. It's very simple and very effective, as long as the number of entries is limited. The amount of space required to cache all combinations grows exponentially in the number of entries, so it's only feasible for up to 10 or so.
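
To put numbers on that: with n entries there are 2^n possible check[]
vectors, so a flat cache needs 2^n slots - about 1 kB at 10 entries,
but already 1 MB at 20. Something like this (again hypothetical code,
one byte per slot; the caller must ensure nentries stays within
MAX_CACHED_ENTRIES):

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_CACHED_ENTRIES 10   /* 2^10 slots = 1 kB per scan key */

typedef bool (*consistent_fn)(const bool *check, int nentries,
                              const void *query);

typedef struct
{
    /* 0 = not computed yet, 1 = consistent said false, 2 = true */
    uint8_t     slot[1 << MAX_CACHED_ENTRIES];
} ConsistentCache;

static void
cache_init(ConsistentCache *cache)
{
    memset(cache->slot, 0, sizeof(cache->slot));
}

static bool
cached_consistent(ConsistentCache *cache, consistent_fn consistent,
                  const void *query, const bool *check, int nentries)
{
    uint32_t    key = 0;
    int         i;

    for (i = 0; i < nentries; i++)
        key |= (uint32_t) (check[i] ? 1 : 0) << i;  /* check[] as bitmask */

    if (cache->slot[key] == 0)                      /* miss: compute once */
        cache->slot[key] = consistent(check, nentries, query) ? 2 : 1;

    return cache->slot[key] == 2;
}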

- Heikki

