On 2012-11-15 16:42:57 -0800, Jeff Davis wrote:
> Related to discussion here:
> http://archives.postgresql.org/message-id/cahyxu0zn5emepledozugraqif92f-yjvfr-p5vuh6n0wpkz...@mail.gmail.com
>
> It occurred to me recently that many of the hint bits aren't terribly
> important (at least it's not obvious to me). HEAP_XMIN_COMMITTED clearly
> has a purpose, and we'd expect it to be used many times following the
> initial CLOG lookup.
> But the other tuple hint bits seem to be there just for symmetry,
> because they shouldn't last long. If HEAP_XMIN_INVALID or
> HEAP_XMAX_COMMITTED is set, then it's (hopefully) going to be vacuumed
> soon, and gone completely. And if HEAP_XMAX_INVALID is set, then it
> should just be changed to InvalidTransactionId.

Wrt HEAP_XMAX_COMMITTED:
It can take an *awfully* long time until autovacuum next crosses its
thresholds for a big table. I also think we cannot dismiss the case of
long-running transactions, because vacuum won't be able to clean up
those rows in that case.

Wrt HEAP_(XMIN|XMAX)_INVALID:
Yes, if we are in need of new flag bits, those sound like a good target
to me.

> Also, I am wondering about PD_ALL_VISIBLE. It was originally introduced
> in the visibility map patch, apparently as a way to know when to clear
> the VM bit when doing an update. It was then also used for scans, which
> showed a significant speedup. But I wonder: why not just use the
> visibilitymap directly from those places? It can be used for the scan
> because it is crash safe now (not possible before). And since it's only
> one lookup per scanned page, then I don't think it would be a measurable
> performance loss there. Inserts/updates/deletes also do a significant
> amount of work, so again, I doubt it's a big drop in performance there
> -- maybe under a lot of concurrency or something.
>
> The benefit of removing PD_ALL_VISIBLE would be significantly higher.
> It's quite common to load a lot of data, and then do some reads for a
> while (setting hint bits and flushing them to disk), and then do a
> VACUUM a while later, setting PD_ALL_VISIBLE and writing all of the
> pages again. Also, if I remember correctly, Robert went to significant
> effort when making the VM crash-safe to keep the PD_ALL_VISIBLE and VM
> bits consistent. Maybe this was all discussed before?

As far as I understand the code, the crash-safety aspects of the
visibilitymap currently rely on knowing that ALL_VISIBLE has been
cleared during a heap_(insert|update|delete). That allows management of
the visibilitymap without it being xlogged itself, which seems pretty
important to me.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services