On 6/17/13 3:38 PM, Josh Berkus wrote:
Why?  Why can't we just update the affected pages in the index?
>
>The page range has to be scanned in order to find out the min/max values
>for the indexed columns on the range; and then, with these data, update
>the index.
Seems like you could incrementally update the range, at least for
inserts.  If you insert a row which doesn't decrease the min or increase
the max, you can ignore it, and if it does increase/decrease, you can
change the min/max.  No?

For updates, things are more complicated.  If the row you're updating
was the min/max, in theory you should update it to adjust that, but you
can't verify that it was the ONLY min/max row without doing a full scan.
  My suggestion would be to add a "dirty" flag which would indicate that
that block could use a rescan next VACUUM, and otherwise ignore changing
the min/max.  After all, the only defect to having min to low or max too
high for a block would be scanning too many blocks.  Which you'd do
anyway with it marked "invalid".

If we add a dirty flag it would probably be wise to allow for more than one 
value so we can do a clock-sweep. That would allow for detecting a range that 
is getting dirtied repeatedly and not bother to try and re-summarize it until 
later.

Something else I don't think was mentioned... re-summarization should be 
somehow tied to access activity: if a query will need to seqscan a segment that 
needs to be summarized, we should take that opportunity to summarize at the 
same time while pages are in cache. Maybe that can be done in the backend 
itself; maybe we'd want a separate process.
--
Jim C. Nasby, Data Architect                       j...@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to