[SQL] Concurrent Reindex on Primary Key for large table
I have a large table with about 60 million rows, everyday I add 3-4 million, remove 3-4 million and update 1-2 million. I have a script that reindexes concurrently a couple of times a week, since I see significant bloat. I have autovac on and the settings are below. I can't concurrently reindex the primary key, since there can be only one on a table. I can't take the table offline to do a drop/recreate. I assumed the autovac would take care of the bloat there, but I checked the size of the index and dropped it and recreated it and the size went from 2.5GB to 1.3GB and my daily import into that table took almost half the time after recreating the primary key. I was wondering what the options are to reindex it. Should i not have a primary key and just a unique index so that I can concurrently reindex? Do I lose anything by replacing the primary key with a unique index? I prefer to have a primary key on the table for Slony, but I suppose I could make the OID column the primary key just for SLONY purposes. Any thoughts about the best approach would be appreciated. Note that this bloat on the primary key is from about 2-3 weeks of daily deletes/updates/inserts. I have multiple tables with similar structure and data size. Here are my autovac settings. I am on PostgreSQL 9.1.1 #autovacuum_max_workers = 3 # max number of autovacuum subprocesses #autovacuum_naptime = 1min # time between autovacuum runs autovacuum_vacuum_threshold = 2# min number of row updates before # vacuum default 50 autovacuum_analyze_threshold = 2# min number of row updates before # analyze default 50 autovacuum_vacuum_scale_factor = 0.1# fraction of table size before vacuum autovacuum_analyze_scale_factor = 0.1 # fraction of table size before analyze #autovacuum_freeze_max_age = 2 # maximum XID age before forced vacuum autovacuum_vacuum_cost_delay = 5ms #autovacuum_vacuum_cost_limit = -1 Thanks RV -- View this message in context: http://postgresql.1045698.n5.nabble.com/Concurrent-Reindex-on-Primary-Key-for-large-table-tp5467243p5467243.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com. -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] Concurrent Reindex on Primary Key for large table
rverghese wrote on 08.02.2012 19:07: I have a large table with about 60 million rows, everyday I add 3-4 million, remove 3-4 million and update 1-2 million. I have a script that reindexes concurrently a couple of times a week, since I see significant bloat. I have autovac on and the settings are below. I can't concurrently reindex the primary key, since there can be only one on a table. With 9.1 you can create a new index and drop and re-create the primary key using the new index. This still requires an exclusive lock on the table, but only for a very short moment: The following example is more or less taken from the manual: http://www.postgresql.org/docs/9.1/static/sql-altertable.html create unique concurrently new_index on your_table(your_pk); alter table your_table drop primary key; alter table your_table add primary key using index new_index; As this can be done in one transaction it should be safe with regards to the primary key. Thomas -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
[SQL] time interval math
I'm still working on getting this to work, but the summary is this: I am getting several (many) intervals of hour, minutes and seconds. I need a sum of the absolute value these intervals, similar to the SUM(ABS()) function for numbers; and I need to divide this sum by an integer (bigint). Getting the intervals is no problem, but I can't find built in functions for the rest. Currently on 8.3, want to upgrade to 9.x but I can't until this is finished. Do these functions exist, or will I be forced to convert to seconds, do the math and then convert back to hour-minute-second format (I am assuming from current data that, after the divide, the result should be in the minute: second range). Edward W. Rouse Comsquared System, Inc. 770-734-5301 -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] time interval math
On 02/08/2012 12:01 PM, Edward W. Rouse wrote: I'm still working on getting this to work, but the summary is this: I am getting several (many) intervals of hour, minutes and seconds. I need a sum of the absolute value these intervals, similar to the SUM(ABS()) function for numbers; and I need to divide this sum by an integer (bigint). Getting the intervals is no problem, but I can't find built in functions for the rest. Currently on 8.3, want to upgrade to 9.x but I can't until this is finished. Do these functions exist, or will I be forced to convert to seconds, do the math and then convert back to hour-minute-second format (I am assuming from current data that, after the divide, the result should be in the minute: second range). You will have to do some work on your own. Time and intervals are tricky beasts and depend on the oddities of daylight saving rules. Even though you are only using hours/minutes/seconds the interval type also supports days and months. A day interval is probably 24 hours but could be 23 or 25 if it crosses a DST boundary. Months have different numbers of days. You have situations where adding and subtracting a month does not give the original date: select '2011-03-31'::date - '1 month'::interval + '1 month'::interval; ?column? - 2011-03-28 00:00:00 There is no abs(interval) function but, if you know that all your intervals are basic H:M:S and that you won't have any difficulty due to problems similar to the above you can mimic it with: ...case when myinterval < '0'::interval then '0'::interval - myinterval else myinterval end... You are even allowed to sum that and divide it (though I suspect there are some interesting corner-cases waiting to be discovered): ...sum(case when myinterval < '0'::interval then '0'::interval - myinterval else myinterval end)/2... Before you upgrade, be sure to read the release notes and test your calculations. The way intervals are handled, especially regarding intervals across DST boundaries, have changed over time. IIRC most of those changes were pre-8.3 but haven't looked recently. Cheers, Steve -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] time interval math
Hehe, thanks, I played around and ended up with this: round(SUM(extract('epoch' from (time_out - time_in I will have to do the division outside of the query, but that's really a minor issue. Knowing the total in seconds was the big roadblock. And converting back is easier (a / 3600 + ":" + a / 60 + ":" + a % 60) > -Original Message- > From: Steve Crawford [mailto:scrawf...@pinpointresearch.com] > Sent: Wednesday, February 08, 2012 3:26 PM > To: Edward W. Rouse > Cc: pgsql-sql@postgresql.org > Subject: Re: [SQL] time interval math > > On 02/08/2012 12:01 PM, Edward W. Rouse wrote: > > I'm still working on getting this to work, but the summary is this: > > > > I am getting several (many) intervals of hour, minutes and seconds. I > need a > > sum of the absolute value these intervals, similar to the SUM(ABS()) > > function for numbers; and I need to divide this sum by an integer > (bigint). > > Getting the intervals is no problem, but I can't find built in > functions for > > the rest. Currently on 8.3, want to upgrade to 9.x but I can't until > this is > > finished. > > > > Do these functions exist, or will I be forced to convert to seconds, > do the > > math and then convert back to hour-minute-second format (I am > assuming from > > current data that, after the divide, the result should be in the > minute: > > second range). > > You will have to do some work on your own. > > Time and intervals are tricky beasts and depend on the oddities of > daylight saving rules. Even though you are only using > hours/minutes/seconds the interval type also supports days and months. > A > day interval is probably 24 hours but could be 23 or 25 if it crosses a > DST boundary. Months have different numbers of days. You have > situations > where adding and subtracting a month does not give the original date: > > select '2011-03-31'::date - '1 month'::interval + '1 month'::interval; >?column? > - > 2011-03-28 00:00:00 > > There is no abs(interval) function but, if you know that all your > intervals are basic H:M:S and that you won't have any difficulty due to > problems similar to the above you can mimic it with: > ...case when myinterval < '0'::interval then '0'::interval - myinterval > else myinterval end... > > You are even allowed to sum that and divide it (though I suspect there > are some interesting corner-cases waiting to be discovered): > ...sum(case when myinterval < '0'::interval then '0'::interval - > myinterval else myinterval end)/2... > > Before you upgrade, be sure to read the release notes and test your > calculations. The way intervals are handled, especially regarding > intervals across DST boundaries, have changed over time. IIRC most of > those changes were pre-8.3 but haven't looked recently. > > Cheers, > Steve -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] time interval math
"Edward W. Rouse" wrote: > Hehe, thanks, I played around and ended up with this: > round(SUM(extract('epoch' from (time_out - time_in > I will have to do the division outside of the query, but that's really a > minor issue. You can always use subqueries. > Knowing the total in seconds was the big roadblock. And > converting back is easier (a / 3600 + ":" + a / 60 + ":" + a % 60) > [...] PostgreSQL has also: | tim=# SELECT 3661::TEXT::INTERVAL; | interval | -- | 01:01:01 | (1 Zeile) | tim=# Tim -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql