On Tue, Feb 17, 2009 at 2:28 PM, Teodor Sigaev <teo...@sigaev.ru> wrote: > Hi there, > > we present two variants of GIN fast insert patch, since we're not sure > what is a the best solution: > > v0.28.1 > - remove disable cost in gincostestimate > - per http://archives.postgresql.org/message-id/499466d2.4010...@sigaev.ru > gingettuple could force cleanup of pending list if it got a lossy > tidbitmap. > If such cleanup occurs the gingettuple will rescan pending list again. > > v0.28.2 > - remove disable cost in gincostestimate > - per > http://archives.postgresql.org/message-id/12795.1234379...@sss.pgh.pa.us > AM can now have only one search method: amgettuple or amgetbitmap. > - GIN now has only amgetbitmap interface
I reviewed v0.28.1. I see that disable_cost is gone from gincostestimate, but you're still using the pending list to set costs, and I still think that's bogus. It seems clear that this is going to change much faster than plans are going to be invalidated, and if autovacuum is doing its job, the pending list won't get long enough to matter much anyway, right? I don't think this patch should be touching gincostestimate at all. I am thinking that it is may not be necessary to remove the gingettuple interface (as you did in v0.28.2). Forcing a cleanup of the pending list seems like a reasonable workaround. We don't expect this situation to come up frequently, so if the method we use to handle it is not terribly efficient, oh well. The one thing that concerns me is - what will happen in a hot standby environment, when that patch is committed? In that situation, I believe that we can't call modify any heap or index pages, so... Some other assorted minor comments on v0.28.1... 1. The description of the "fastupdate" reloption should be reworded for consistency with other options: Enables "fast update" feature for this GIN index 2. Why is this implemented as a string reloption rather than a boolean reloption? It seems like you want to model this on autovacuum_enabled. 3. Documentation wordsmithing. You have the following paragraph: As of <productname>PostgreSQL</productname> 8.4, this problem is alleviated by postponing most of the work until the next <command>VACUUM</>. Newly inserted index entries are temporarily stored in an unsorted list of pending entries <command>VACUUM</> inserts all pending entries into the main <acronym>GIN</acronym> index data structure, using the same bulk insert techniques used during initial index creation. This greatly improves <acronym>GIN</acronym> index update speed, even counting the additional vacuum overhead. Here is my proposed rewrite: As of <productname>PostgreSQL</productname> 8.4, <acronym>GIN</> is capable of postponing much of this work by inserting new tuples into a temporary, unsorted list of pending entries. When the table is vacuumed, or in some cases when the pending list becomes too large, the entries are moved to the main <acronym>GIN</acronym> data structure using the same bulk insert techniques used during initial index creation. This greatly improves <acronym>GIN</acronym> index update speed, even counting the additional vacuum overhead. 4. More wordsmithing. In the following paragraph, you have: It's recommended to use properly-configured autovacuum with tables having <acronym>GIN</acronym> indexes, to keep this overhead to reasonable levels. I think it is clearer and more succinct to write simply: Proper use of autovacuum can minimize this problem. 5. In textsearch.sgml, you say that GIN indexes are moderately slower to update, but about 10x slower without fastupdate. Can we provide a real number in place of "moderately"? I don't know whether to think this means 20% or 2x. 6. One of the comment changes in startScanEntry is simply a correction of a typographical error ("deletion" for "deletition"). You might as well commit this change separately and remove it from this patch. 7. pg_stat_get_fresh_inserted_tuples. I am not crazy about the fact that we call this the pending list in some places, fast update in some places, and now, here, fresh tuples. Let's just call it fast insert tuples. 8. tbm_check_tuple. The opening curly brace should be uncuddled. The curly braces around wordnum = bitnum = 0 are superfluous. 9. gincostestimate. There are a lot of superfluous whitespace changes here, and some debugging code that obviously wasn't fully removed. 10. GinPageOpaqueData. Surely we can come up with a better name than GIN_LIST. This is yet another name for the same concept. Why not call this GIN_FAST_INSERT_LIST? 11. ginInsertCleanup. "Inserion" is a typo. Unfortunately, I don't understand all of this patch well enough to give it as thorough a going-over as it deserves, so my apologies for whatever I've missed. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers