Changes:
Results of the pending list's scan are now placed directly into the resulting TIDBitmap. This saves the cycles previously spent filtering results and reduces memory usage. It also means we no longer need to check whether the tbm is lossy.


> Is this a 100% bulletproof solution, or is it still possible for a query
> to fail due to the pending list? It relies on the stats collector, so
> perhaps in rare cases it could still fail?

Yes :(

> Can you explain why the tbm must not be lossy?

The problem with a lossy tbm has two aspects:
 - The amgettuple interface has no way to work with a page-wide result instead
   of an exact ItemPointer: amgettuple cannot return just a block number the
   way amgetbitmap can.
 - A concurrent vacuum process: while we scan the pending list, its contents
   could be transferred into the regular structure of the index, and then we
   would find the same tuple twice. Again, amgettuple has no protection
   against that; only amgetbitmap has it. So we need to filter the results
   from the regular GIN structure by the results from the pending list, and
   for that filtering we cannot use a lossy tbm (see the sketch below).
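To illustrate the filtering problem, here is a minimal sketch, not the patch's actual code; tbm_contains_exact() is a hypothetical helper, since TIDBitmap exposes no such point lookup, and the scan loop is simplified:

/*
 * Pre-0.21 idea: matches from the pending list are gathered first, and
 * each match from the regular index structure is then checked against
 * them, so a tuple that a concurrent vacuum moved out of the pending
 * list is not returned twice.
 *
 * tbm_contains_exact() is hypothetical.  Once a TIDBitmap page goes
 * lossy it keeps only the BlockNumber and forgets the item offsets, so
 * an exact membership test becomes impossible -- which is why the
 * filter bitmap must never be allowed to become lossy.
 */
while (scanGetItem(scan, &iptr, &recheck))
{
    if (!tbm_contains_exact(pendingResults, &iptr))
        tbm_add_tuples(tbm, &iptr, 1, recheck);
}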

v0.21 prevents that failure inside gingetbitmap, because all results are now collected in a single resulting TIDBitmap.
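In outline, the v0.21 shape looks something like the following sketch; the helper names follow the GIN source (scanPendingInsert, startScan, scanGetItem) but their signatures are simplified here, so treat this as an illustration rather than the patch itself:

/*
 * Sketch of the v0.21 gingetbitmap() flow: the pending list is scanned
 * first and its matches go straight into the caller's TIDBitmap, then
 * the regular index scan adds its matches to the same bitmap.  Setting
 * a bit twice is idempotent, so a tuple found both in the pending list
 * and (after a concurrent cleanup) in the main structure is reported
 * once, and the bitmap is now free to become lossy.
 */
static int64
gin_getbitmap_sketch(IndexScanDesc scan, TIDBitmap *tbm)
{
    int64           ntids = 0;
    ItemPointerData iptr;
    bool            recheck;

    /* 1. collect everything from the pending list */
    scanPendingInsert(scan, tbm, &ntids);

    /* 2. normal scan of the regular GIN structure */
    startScan(scan);
    while (scanGetItem(scan, &iptr, &recheck))
    {
        tbm_add_tuples(tbm, &iptr, 1, recheck);
        ntids++;
    }
    return ntids;
}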



> Also, can you clarify why a large update can cause a problem? In the

If the query looks like
UPDATE tbl SET col=... WHERE col ... and the planner chooses a GIN index scan over col, then there is a chance that the pending list grows past the non-lossy limit while the scan is in progress.


> previous discussion, you suggested that it force normal index inserts
> after a threshold based on work_mem:
>
> http://archives.postgresql.org/pgsql-hackers/2008-12/msg00065.php

I see only two guaranteed solutions to the problem:
- After the limit is reached, force normal index inserts. One of the motivations for the patch was the frequent question from users: why is updating a whole table with a GIN index so slow? So this way would not resolve that question.
- After the limit is reached, force a cleanup of the pending list by calling gininsertcleanup (sketched below). Not very good, because users will sometimes see a huge execution time for a simple insert, although users who run a huge update should be satisfied.
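A sketch of what the second option could look like at insert time; appendToPendingList(), pendingListSize(), and GIN_PENDING_LIMIT are illustrative names, not identifiers from the patch, and the real gininsertcleanup takes more arguments:

/*
 * Option 2 sketch: every fast insert appends to the pending list, and
 * the inserter that pushes the list past the limit pays for moving the
 * whole accumulated list into the regular index structure.  The list
 * stays bounded at the cost of an occasional very slow "simple" INSERT.
 */
void
gin_fast_insert_sketch(Relation index, IndexTuple itup)
{
    appendToPendingList(index, itup);           /* cheap, usual case */

    if (pendingListSize(index) > GIN_PENDING_LIMIT)
        ginInsertCleanup(index);                /* rare, expensive */
}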

I have difficulty choosing between them. It seems to me the second way is better: if a user sees a very long insertion time, then (auto)vacuum on his installation should be tweaked.


--
Teodor Sigaev                                   E-mail: teo...@sigaev.ru
                                                   WWW: http://www.sigaev.ru/

Attachment: fast_insert_gin-0.21.gz
Description: Unix tar archive
