BTW we¹ve removed HINT bit checking in Greenplum DB and improved the visibility caching which was enough to provide performance at the same level as with the HINT bit optimization, but avoids this whole ³write the data, write it to the log also, then write it again just for good measure² behavior.
For people doing data warehousing work like the poster, this Postgres behavior is miserable. It should be fixed for 8.4 for sure (volunteers?) BTW for the poster¹s benefit, you should implement partitioning by date, then load each partition and VACUUM ANALYZE after each load. You probably won¹t need the date index anymore so your load times will vastly improve (no indexes), you¹ll store less data (no indexes) and you¹ll be able to do simpler data management with the partitions. You may also want to partition AND index if you do a lot of short range selective date predicates. Example would be: partition by day, index on date field, queries selective on date ranges by hour will then select out only the day needed, then index scan to get the hourly values. Typically time-oriented data is nearly time sorted anyway, so you¹ll also get the benefit of a clustered index. - Luke On 5/15/08 10:40 AM, "Pavan Deolasee" <[EMAIL PROTECTED]> wrote: > On Thu, May 15, 2008 at 7:51 AM, Greg Smith <[EMAIL PROTECTED]> wrote: >> > >> > >> > So is vacuum helpful here because it will force all that to happen in one >> > batch? To put that another way: if I've run a manual vacuum, is it true >> > that it will have updated all the hint bits to XMIN_COMMITTED for all the >> > tuples that were all done when the vacuum started? >> > > > Yes. For that matter, even a plain SELECT or count(*) on the entire > table is good enough. That will check every tuple for visibility and > set it's hint bits. > > Another point to note is that the hint bits are checked and set on a > per tuple basis. So especially during index scan, the same heap page > may get rewritten many times. I had suggested in the past that > whenever we set hint bits for a tuple, we should check all other > tuples in the page and set their hint bits too to avoid multiple > writes of the same page. I guess the idea got rejected because of lack > of benchmarks to prove the benefit. > > Thanks, > Pavan > > -- > Pavan Deolasee > EnterpriseDB http://www.enterprisedb.com > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance >