Re: [GENERAL] Using the wrong index (very suboptimal), why?
Tom,

>> Index Scan using index_answers_nidiid1 on answers
>>   (cost=0.00..114.14 rows=1 width=60) (actual time=0.206..2398.645 rows=21891 loops=1)
>>   Index Cond: ((nid = 253869913) AND (iid1 = 535292129))
>> Total runtime: 2424.769 ms

> Well, here's the problem all right: 1 row estimated vs 21891 actual.
> Is there something odd about the joint distribution of these two
> columns?

I should also clarify: there will almost always be many, many rows that match any nid/iid1 pair (for any nid or iid1 value that exists in this table, anyway). The chance of there actually being only one result is very small. Most of the time it will be hundreds or thousands of rows - but tens of thousands on a regular basis, and a hundred thousand or more periodically.

Many thanks

Shane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Using the wrong index (very suboptimal), why?
Hi Tom,

>> - increasing the statistics target to the maximum setting with SET
>> STATISTICS 1000 on columns rid, nid and iid1 of answers, then
>> re-vacuuming.

> I hope you meant re-analyzing.

Hehe, absolutely yes; I always VACUUM VERBOSE ANALYSE :)

>> Index Scan using index_answers_nidiid1 on answers
>>   (cost=0.00..114.14 rows=1 width=60) (actual time=0.206..2398.645 rows=21891 loops=1)
>>   Index Cond: ((nid = 253869913) AND (iid1 = 535292129))
>> Total runtime: 2424.769 ms

> Well, here's the problem all right: 1 row estimated vs 21891 actual.
> Is there something odd about the joint distribution of these two
> columns?

Good :) - I've been chasing the same thing, albeit without any luck yet :(

There shouldn't be anything odd about the data, I wouldn't have thought...

There are ~670 million rows. There are no nulls in nid; ~6% of iid1 are null (they will always be null or not null for a given nid, i.e. rows with a given nid value will either all be null or all be not null).

nids are randomly selected; there are only ~27000 distinct values, all between 1 and 9. iid1s are also random in the same range, ~5 distinct values. All the rows for a given value of nid will have one of a small set of possible iid1 values; usually 5-15 distinct values.

The frequency at which each nid occurs is quite uneven; some will appear tens of times, others a couple of hundred thousand. The same applies to the corresponding iid1 values.

(The table stores answers to questions; nid is the question ID, iid1 is the answer ID [for questions where the user picks from a pre-defined list] - iid1 is NULL for textual answers. iid1 values are grouped into sets of options the user can pick from, defined elsewhere. These sets can be shared across questions/nids, but not often.)

Does the above sound strange? There aren't any strange errors from the database, and autovacuum is enabled (but not vacuuming the table often enough).
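For what it's worth, the real joint distribution can be checked directly by aggregating over the pair - a quick sketch against the table described above (column names as in this thread; the LIMIT is arbitrary):

```sql
-- Most common (nid, iid1) pairs and their row counts; comparing these
-- against the planner's rows= estimate for the same two-column
-- condition shows how far off the 1-row estimate is.
SELECT nid, iid1, count(*) AS n
FROM answers
WHERE iid1 IS NOT NULL
GROUP BY nid, iid1
ORDER BY n DESC
LIMIT 20;
```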
This problem was triggered this time when I manually vacuumed the table - which cleared a lot of dead rows (and again, I always VACUUM VERBOSE ANALYSE). A separate installation with a similar data set (actually the same data but older; it seems the most comparable) estimated 10 rows returned for the same query.
Re: [GENERAL] Using the wrong index (very suboptimal), why?
Tom,

> You need to
> look into what the estimated vs actual rowcounts are for just the
> two-column condition (ie, where nid = something AND iid1 = something).
> Getting those estimates to match reality is the key to getting a sane
> plan choice here.

Many thanks - I agree in principle it isn't automatically the worst decision, but for this data set it unfortunately is :(

Improving the accuracy of the cost estimates is exactly what I'm trying to achieve. So far I've tried:

- increasing the statistics target to the maximum setting with SET STATISTICS 1000 on columns rid, nid and iid1 of answers, then re-vacuuming
- adjusting random_page_cost downwards even more to try to make it prefer index I/O
- increasing cpu_tuple_cost up to 1.0 (from 0.01) to try to discourage it from processing data rows
- decreasing cpu_index_tuple_cost to 0.0005 (from 0.005) to encourage it to spend more time in index rows.

None of the above had any effect :(

Interestingly, I didn't re-vacuum between changing the cost figures in the config file (I did reload). The cost figures from EXPLAIN did change, but in the queries below the 'right' (three-column) index always comes out just over a third more expensive than the 'wrong' one.

Here's a query that matches actual data in the table; the planner sees the cost of the 2-column index as much less, but the query takes an order of magnitude longer to run (using nid & iid1 means filtering through around 22,000 rows).

The three queries: the first is the one that's used in practice, against all 3 columns.
The next two queries use the two different combinations of just 2 columns:

emystery=# explain analyse select * FROM answers WHERE rid=668332334 AND nid=253869913 AND iid1=535292129;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Index Scan using index_answers_nidiid1 on answers  (cost=0.00..114.14 rows=1 width=60) (actual time=0.168..790.262 rows=1 loops=1)
   Index Cond: ((nid = 253869913) AND (iid1 = 535292129))
   Filter: (rid = 668332334)
 Total runtime: 790.305 ms
(4 rows)

emystery=# explain analyse select * FROM answers WHERE rid=668332334 AND nid=253869913;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Index Scan using index_answers_ridnidiid1 on answers  (cost=0.00..152.49 rows=1 width=60) (actual time=0.120..0.141 rows=18 loops=1)
   Index Cond: ((rid = 668332334) AND (nid = 253869913))
 Total runtime: 0.207 ms
(3 rows)

emystery=# explain analyse select * FROM ecos_answers WHERE nid=253869913 AND iid1=535292129;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Index Scan using index_ecos_answers_nidiid1 on ecos_answers  (cost=0.00..114.14 rows=1 width=60) (actual time=0.206..2398.645 rows=21891 loops=1)
   Index Cond: ((nid = 253869913) AND (iid1 = 535292129))
 Total runtime: 2424.769 ms
(3 rows)

What setting should I be tweaking to make the two-column index more expensive? I thought I would need to either...

- inform the database that there are likely to be a lot of rows that need filtering (SET STATISTICS [maximum]???), or
- tell the database that filtering rows is expensive (cpu_tuple_cost++, random_page_cost-- ???)

...both of which I've already tried. From the EXPLAIN output it would seem that the first one is key - it shouldn't think (for the last example) that it's only going to get 1 row back! - but aside from increasing SET STATISTICS, what other setting is there to inform it of this?

The only thing I can think of that remains is to rebuild/replace the preferred index; that should make it smaller, and *if* the index cost estimates are partly based on the number of disk pages, that should reduce the perceived cost of using it. Is this the case?
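On the rebuild idea: rather than a blocking REINDEX, a replacement index can be built online (CREATE INDEX CONCURRENTLY has been available since 8.2) and swapped in - a sketch, assuming the index definitions shown elsewhere in the thread:

```sql
-- Build a fresh, compact copy of the three-column index without taking
-- the exclusive lock a plain REINDEX needs, then swap it in for the
-- bloated original.
CREATE INDEX CONCURRENTLY index_answers_ridnidiid1_new
    ON answers (rid, nid, iid1);
DROP INDEX index_answers_ridnidiid1;
ALTER INDEX index_answers_ridnidiid1_new
    RENAME TO index_answers_ridnidiid1;
```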
Even if it does, that still doesn't fix the underlying problem.

Any help appreciated,

Many thanks

Shane
[GENERAL] Using the wrong index (very suboptimal), why?
Hi,

I have somewhat of a quandary with a large table in my database; PostgreSQL is choosing the 'wrong' index for a certain kind of query, causing performance to become an order of magnitude slower (query times usually measured in milliseconds now become seconds or minutes!).

It's not that it isn't using any index (so enable_seqscan=off doesn't help); it's that the index it picks is suboptimal. The query is based on 3 of the table's columns - there is an index on all three, but it prefers to use an index on just two of them and then filter by the 3rd (EXPLAIN output is below, showing the query as well). I've tried everything I can find to make it not do this (config parameters, statistics target, vacuuming more...) - what am I missing?

It happened a couple of times before - after some unusual activity patterns in a client script - but repeatedly vacuuming until the planner changed its mind made it go away. We then made config changes (random_page_cost and effective_cache_size) thinking that fixed the reason why it made the bad decision in the first place... but it would appear not.

The only thing remaining I can think of is rebuilding the 'preferred' index; that should reduce its size from 27Gb down to something more manageable - maybe that will make the database want to use it more because there'll be less I/O?

I've considered upgrading to 8.3.5 as well, but I can't see anything in the changelogs that would fix this issue (I appreciate upgrading is a good idea anyway; it will be done at some point soon).

Hopefully all the relevant info is listed below - if anyone's got any ideas I'd appreciate any help or pointers, thanks...

The server is PostgreSQL 8.3.0 on Linux with 32Gb RAM. /var/lib/pgsql/ is on a fibre-channel SAN. This table has around 680 million rows - it has been reasonably regularly vacuumed, but is probably in dire need of a VACUUM FULL and REINDEX to reclaim dead space (see disk space info at the bottom of the post).
emystery=# \d answers
                           Table "public.answers"
 Column |  Type   |                          Modifiers
--------+---------+---------------------------------------------------------------
 aid    | integer | not null default nextval(('seq_answers_aid'::text)::regclass)
 rid    | integer | not null
 nid    | integer | not null
 iid1   | integer |
 iid2   | integer |
 iid3   | integer |
 text   | text    |
 extra  | bigint  |
Indexes:
    "answers_pkey" PRIMARY KEY, btree (aid)
    "index_answers_iid1" btree (iid1) WHERE iid1 > 0
    "index_answers_iid2" btree (iid2) WHERE iid2 > 0
    "index_answers_iid3" btree (iid3) WHERE iid3 > 0
    "index_answers_nidiid1" btree (nid, iid1)
    "index_answers_ridnidiid1" btree (rid, nid, iid1)

This is what it is doing [slowly] - but for many values of rid/nid/iid1 there are a lot of rows to filter through (tens or hundreds of thousands), so this can take many seconds or minutes:

emystery=# explain select * from answers where rid=1 and nid=2 and iid1=3;
 Index Scan using index_answers_nidiid1 on answers  (cost=0.00..28.74 rows=1 width=62)
   Index Cond: ((nid = 2) AND (iid1 = 3))
   Filter: (rid = 1)

This is the pattern it *should* use (and does use on other installations of similar/older data). When this pattern is used the query always completes in <1 second, usually ~0.2 seconds:

emystery20080821=# explain select * from answers where rid=1 and nid=2 and iid1=3;
 Index Scan using index_answers_ridnidiid1 on answers  (cost=0.00..99.04 rows=1 width=67)
   Index Cond: ((rid = 1) AND (nid = 2) AND (iid1 = 3))

The PostgreSQL configuration has been altered to help it prefer random lookups [via an index], and to indicate to the database how much data the OS is likely to be caching:

#seq_page_cost = 1.0            # measured on an arbitrary scale
random_page_cost = 1.0          # same scale as above
#cpu_tuple_cost = 0.01          # same scale as above
#cpu_index_tuple_cost = 0.005   # same scale as above
#cpu_operator_cost = 0.0025     # same scale as above
effective_cache_size = 31GB

The table has been VACUUM ANALYSE'd (repeatedly!) to no avail.
We've also increased the statistics target for the columns in the table to 200 (from 10), which still isn't making any difference (I thought that because the table is so large and the dataset is quite uneven this should help... it's now at 1000 and vacuuming again, so we might be lucky...):

alter table answers alter column rid set statistics 200;
alter table answers alter column nid set statistics 200;
alter table answers alter column iid1 set statistics 200;

Here is the [full] output from the most recent VACUUM:

emystery=# vacuum verbose analyse answers;
INFO:  vacuuming "public.answers"
INFO:  scanned index "index_a
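One way to see whether the higher statistics target actually changed what the planner works from is to inspect the stored statistics after the ANALYSE finishes - a sketch, assuming the table/column names above:

```sql
-- Per-column statistics the planner uses: a badly wrong n_distinct or
-- an unrepresentative most-common-values list for nid or iid1 would
-- explain the rows=1 estimate for the two-column condition.
SELECT attname, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'answers'
  AND attname IN ('rid', 'nid', 'iid1');
```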
Re: [GENERAL] Upgrade to 8.3.0?
Hmm - I don't suppose the RHEL builds already include your patch against this issue? If so, I guess it would be useful to know :)

I've run through the steps you outlined to reproduce the fault on a new box (after having reproduced it successfully on test servers), and it does not show the problem:

[EMAIL PROTECTED] data]# !psql
psql test -U X
Welcome to psql 8.3.0, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

test=# create table foo as select null::int as x from generate_series(1,2000) x;
SELECT
test=# delete from foo;
DELETE 2000
test=# select count(*) from foo;
 count
-------
     0
(1 row)

test=# vacuum full foo;
VACUUM
test=# set debug_assertions to on;
ERROR:  assertion checking is not supported by this build
test=#

The RPMs installed are these:

postgresql-8.3.0-1PGDG.rhel4
postgresql-libs-8.3.0-1PGDG.rhel4
postgresql-server-8.3.0-1PGDG.rhel4

Thanks

Shane

From: Tom Lane [mailto:[EMAIL PROTECTED]]
Sent: Fri 29/02/2008 22:55
To: Shane Wright
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Upgrade to 8.3.0?

"Shane Wright" <[EMAIL PROTECTED]> writes:
> I'm assuming that the default RPMs for RHEL 4 (on ftp.postgresql.org) are not
> built with --enable-cassert - and that I can verify this by using the same
> reproduction case you demonstrated in the bug history?

I believe they aren't, but you could check by seeing what pg_config reports as the configure options; or even more directly by seeing if it will let you "SET debug_assertions TO on".

regards, tom lane
Re: [GENERAL] Upgrade to 8.3.0?
Tom,

Many thanks :) I've tried to find this out for myself but have failed :(

I'm assuming that the default RPMs for RHEL 4 (on ftp.postgresql.org) are not built with --enable-cassert - and that I can verify this by using the same reproduction case you demonstrated in the bug history? (I should just get the memory allocation error on attempting to vacuum full - instead of an assertion failing and bringing down the whole PostgreSQL server (panic?).)

Please could you confirm my assertion (sic) is correct? (I get this behaviour on my test server and all appears to be ok.)

Kind regards

Shane

From: Tom Lane [mailto:[EMAIL PROTECTED]]
Sent: Fri 29/02/2008 19:46
To: Shane Wright
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Upgrade to 8.3.0?

Shane Wright writes:
> - Is the HOT/VACUUM FULL bug above dangerous, or is its biggest
> side-effect an error until a non-full vacuum is run?

As long as you don't build with --enable-cassert, that bug isn't real nasty. Especially if you don't do VACUUM FULL routinely ;-)

regards, tom lane
[GENERAL] Upgrade to 8.3.0?
Hi,

I'm considering an upgrade to 8.3.0 for some internal databases - I would normally wait until at least the .1 release of anything for safety, but there's a lot of nice-sounding stuff here!

After trawling the bug logs and doing some testing on some servers here, all looks ok except for this bug around VACUUM FULL and HOT tables:

http://archives.postgresql.org/pgsql-bugs/2008-02/msg00068.php

So my questions are thus...

- Is the HOT/VACUUM FULL bug above dangerous, or is its biggest side-effect an error until a non-full vacuum is run?
- Has anyone spotted anything else that would discourage an upgrade at this stage?

If it helps, our use of SQL is fairly basic - nothing fancy, and nothing Postgres-specific.

Any help appreciated,

Thanks

Shane
Re: [GENERAL] recovery from xid wraparound
Aw :(

It's at the default of 8Mb. The table contains 220 million rows and 6 indices. It has a few deleted rows...

If I change vacuum_mem I'll need to at least 'pg_ctl reload' - will it apply straight away with the next vacuum query, or does it need a full restart? Does vacuum_mem need shared memory (i.e. is it subject to the OS's limit)? I have looked in the docs and googled but can't see detail on this.

If I have managed to vacuum all the catalog tables, and my script has ensured all user tables other than this one have been vacuumed, then... will the first pass of vacuum on this table have set the xid to FrozenXID for all rows - i.e. is the table safe?

What's the relative safety of restarting this vacuum with a bigger vacuum_mem, say at the end of the week when traffic is quieter? Basically, if it's just datfrozenxid that's not updated, I can live with delaying the vacuum a few days. But if things are more serious then obviously I can't wait. Is it safe to say that if the catalog tables are ok and an individual table has been vacuumed, then its data is safe?

S

-----Original Message-----
From: Tom Lane [mailto:[EMAIL PROTECTED]]
Sent: 24 October 2006 15:52
To: Shane Wright
Cc: pgsql-general@postgresql.org; Martijn van Oosterhout
Subject: Re: [GENERAL] recovery from xid wraparound

"Shane Wright" <[EMAIL PROTECTED]> writes:
> Incidentally, how many passes of a table can vacuum make!

Lots, especially if the table hasn't been vacuumed in a long time... Perhaps you should be using a higher maintenance_work_mem? (Um, in 7.4 make that vacuum_mem.) Larger work memory translates directly to fewer passes over the indexes.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend
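(For reference, a sketch of the per-session route suggested by Tom's vacuum_mem pointer - this assumes 7.4's vacuum_mem is settable at the session level like other user-settable runtime parameters, which is my understanding but worth verifying; the table name is made up:)

```sql
-- vacuum_mem is ordinary per-backend memory (in KB), not shared
-- memory, so it can be raised for just the session running the
-- big vacuum without a server restart:
SET vacuum_mem = 131072;   -- ~128MB for this session only
VACUUM VERBOSE bigtable;   -- hypothetical table name
```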
Re: [GENERAL] recovery from xid wraparound
Incidentally, how many passes of a table can vacuum make?! It's currently on its third trip through the 20Gb of indices, meaning another 7 hours till completion [of this table]! Can I assume it only does three passes? (Or does it choose based on the table continuing to be updated while vacuum is running?)

S

-----Original Message-----
From: Martijn van Oosterhout [mailto:[EMAIL PROTECTED]]
Sent: 24 October 2006 10:24
To: Shane Wright
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] recovery from xid wraparound

On Tue, Oct 24, 2006 at 07:43:15AM +0100, Shane Wright wrote:
> Anyway - not noticed any data loss yet and was hoping it would be such
> that if all tables had been vacuumed recently (including system
> catalog tables), that there would be no remaining rows that would
> appear to have a future xid and so the database should be ok?

Running vacuum is the right solution, but I think you have to let it finish. In particular, in that version a database-wide vacuum has to complete before it will update the datfrozenxid (it's not tracked per table).

> a) is my assumption about the database being ok correct - assuming all
> tables have been vacuumed recently, including catalog tables?

Should be ok, but apparently you missed one, or didn't do a database wide vacuum.

> b) is it possible to safely abort my whole table vacuum now so I can
> run it at the weekend when there's less traffic?

Aborting vacuum is safe, but you have to do a database-wide vacuum at some point.

> c) if I have experienced data loss, on the assumption all the table
> structure remains (looks like it does), and I have a working backup
> from before the xid wraparound (I do), can I just reinsert any
> detected-missing data at the application level without needing a
> dump/reload?

A VACUUM will recover any data that slipped beyond the horizon less than 1 billion transactions ago, which I think covers you completely.
The only issue is that unique indexes may be confused because new conflicting data may have been inserted while the old data was invisible. Only you can say if that's going to be an issue.

Hope this helps,
--
Martijn van Oosterhout   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability
> to litigate.
[GENERAL] recovery from xid wraparound
Hi

I'm running 7.4 on RHAS 4, and I think I've had a transaction id wraparound issue. Running the command below gives the suitably worrying negative number:

emystery=# SELECT datname, age(datfrozenxid) FROM pg_database;
    datname    |     age
---------------+-------------
 [maindbname]  | -2081610471
 [otherdbname] |  1075601025
 [otherdbname] |  1257289757
 [otherdbname] |  1074582099
 [otherdbname] |  1257289757

Which is weird - because I have vacuumed the database quite a lot - both individual tables, and I thought a vacuum of the whole database a month or so ago.

Anyway - I've not noticed any data loss yet, and I was hoping that if all tables had been vacuumed recently (including system catalog tables), there would be no remaining rows that would appear to have a future xid, and so the database should be ok?

Obviously I'm now doing the right thing with a vacuumdb -a - however this has been running 9 hours now and it looks like at least 7 hours to go just on this one monstrous table. In the interests of risk reduction I've just knocked up a script to run ahead and quickly vacuum all the other tables.

But my questions are thus...

a) is my assumption about the database being ok correct - assuming all tables have been vacuumed recently, including catalog tables?

b) is it possible to safely abort my whole table vacuum now so I can run it at the weekend when there's less traffic?

c) if I have experienced data loss, on the assumption all the table structure remains (looks like it does), and I have a working backup from before the xid wraparound (I do), can I just reinsert any detected-missing data at the application level without needing a dump/reload?

Any help appreciated in this really not-fun time,

thanks

S
Re: [GENERAL] recovery from xid wraparound
Tom,

Thanks. But are there just 28 (the 28 that have been vacuumed), or are there more (in 7.4)? I'm happy that there's no guarantee, but it would help to know any possible damage in my current situation.

Thanks

S

-----Original Message-----
From: Tom Lane [mailto:[EMAIL PROTECTED]]
Sent: 24 October 2006 15:23
To: Shane Wright
Cc: Martijn van Oosterhout; pgsql-general@postgresql.org
Subject: Re: [GENERAL] recovery from xid wraparound

"Shane Wright" <[EMAIL PROTECTED]> writes:
>> Just make sure you've really covered *all* the system tables.

> I've been under the impression system tables get done first, then
> user(me)-created tables after -

No, there's no such guarantee. A database-wide vacuum just does the tables in the order it finds 'em in pg_class.

regards, tom lane
Re: [GENERAL] recovery from xid wraparound
Martijn,

Thanks,

> Just make sure you've really covered *all* the system tables. If they go
> you get really weird results.

I've been under the impression that system tables get done first, then user(me)-created tables after - which means my previous [aborted] attempts at vacuuming them would have covered it, unless I'm missing something?

(The db was created by initdb, then pg_restore to load data into it; the database was then vacuumed before production work began.)

I've looked at the list of catalog tables from 7.4's docs (URL below), and all 28 have been processed in this vacuum, so presumably the same order applied on previous attempts:

http://www.postgresql.org/docs/7.4/static/catalogs.html

Checked with 'grep vacuuming vacuum.log | grep pg_ | grep toast -v | sort | uniq'

Does this sound like a fair assumption? (It is on the first database in the cluster; these aren't coming up from other databases.)

Many thanks for your help!

S

-----Original Message-----
From: Martijn van Oosterhout [mailto:[EMAIL PROTECTED]]
Sent: 24 October 2006 11:50
To: Shane Wright
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] recovery from xid wraparound

On Tue, Oct 24, 2006 at 11:06:01AM +0100, Shane Wright wrote:
> If I was to abort this vacuum, given that all other tables are
> vacuumed (including system catalog tables), what's the worst case
> scenario? - given that more transactions are happening on the database

Only tables that haven't been vacuumed in the last billion transactions are at risk. It's possible that if you've vacuumed that large table recently by itself that all the data is actually safe, just the system doesn't realise it. Just make sure you've really covered *all* the system tables. If they go you get really weird results.

> If I understand correctly, it would be that some rows could disappear
> from this large unvacuumed table if their xid was too old - but no
> other consequence?

The VACUUM would make them reappear. To truly disappear they would have to be 3 billion transactions old.
That leaves the unique index issue I mentioned.

> (fully aware that a db-wide vacuum is needed, but if it can [safely]
> wait for the weekend that would be preferable)

That's risk-management. For example, all the really old tuples are possibly near the beginning of the table, and thus this current vacuum will have fixed them already. But to get a handle on that you need to analyse your tuple turnover and usage ratio.

Have a nice day,
--
Martijn van Oosterhout   http://svana.org/kleptog/
Re: [GENERAL] recovery from xid wraparound
Martijn,

Thanks :)

> Running vacuum is the right solution, but I think you have to let it
> finish. In particular, in that version a database-wide vacuum has to
> complete before it will update the datfrozenxid (it's not tracked per
> table).

>> a) is my assumption about the database being ok correct - assuming all
>> tables have been vacuumed recently, including catalog tables?

> Should be ok, but apparently you missed one, or didn't do a database wide
> vacuum.

Yes, I probably missed this 220 million row beast that's still running now...

If I was to abort this vacuum, given that all other tables are vacuumed (including system catalog tables), what's the worst case scenario - given that more transactions are happening on the database?

If I understand correctly, it would be that some rows could disappear from this large unvacuumed table if their xid was too old - but no other consequence?

(I'm fully aware that a db-wide vacuum is needed, but if it can [safely] wait for the weekend, that would be preferable.)

Many thanks,

S

-----Original Message-----
From: Martijn van Oosterhout [mailto:[EMAIL PROTECTED]]
Sent: 24 October 2006 10:24
To: Shane Wright
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] recovery from xid wraparound

On Tue, Oct 24, 2006 at 07:43:15AM +0100, Shane Wright wrote:
> Anyway - not noticed any data loss yet and was hoping it would be such
> that if all tables had been vacuumed recently (including system
> catalog tables), that there would be no remaining rows that would
> appear to have a future xid and so the database should be ok?

Running vacuum is the right solution, but I think you have to let it finish. In particular, in that version a database-wide vacuum has to complete before it will update the datfrozenxid (it's not tracked per table).

> a) is my assumption about the database being ok correct - assuming all
> tables have been vacuumed recently, including catalog tables?

Should be ok, but apparently you missed one, or didn't do a database wide vacuum.
> b) is it possible to safely abort my whole table vacuum now so I can
> run it at the weekend when there's less traffic?

Aborting vacuum is safe, but you have to do a database-wide vacuum at some point.

> c) if I have experienced data loss, on the assumption all the table
> structure remains (looks like it does), and I have a working backup
> from before the xid wraparound (I do), can I just reinsert any
> detected-missing data at the application level without needing a
> dump/reload?

A VACUUM will recover any data that slipped beyond the horizon less than 1 billion transactions ago, which I think covers you completely. The only issue is that unique indexes may be confused because new conflicting data may have been inserted while the old data was invisible. Only you can say if that's going to be an issue.

Hope this helps,
--
Martijn van Oosterhout   http://svana.org/kleptog/
[GENERAL] recovery from xid wraparound
Hi

I'm running 7.4 on RHAS 4, and I think I've had a transaction id wraparound issue in a stats database we have. Running the command below gives the suitably worrying negative number:

[dbname]=# SELECT datname, age(datfrozenxid) FROM pg_database;
    datname    |     age
---------------+-------------
 [maindbname]  | -2081610471
 [otherdbname] |  1075601025
 [otherdbname] |  1257289757
 [otherdbname] |  1074582099
 [otherdbname] |  1257289757

Which is weird - because I have vacuumed the database quite a lot - both individual tables, and I thought a vacuum of the whole database a month or so ago.

Anyway - I've not noticed any data loss yet, and I was hoping that if all tables had been vacuumed recently (including system catalog tables), there would be no remaining rows that would appear to have a future xid, and so the database should be ok?

Obviously I'm now doing the right thing with a vacuumdb -a - however this has been running 9 hours now and it looks like at least 7 hours to go just on this one monstrous table. In the interests of risk reduction I've just knocked up a script to run ahead and quickly vacuum all the other tables.

But my questions are thus...

a) is my assumption about the database being ok correct - assuming all tables have been vacuumed recently, including catalog tables?

b) is it possible to safely abort my whole table vacuum now so I can run it at the weekend when there's less traffic?

c) if I have experienced data loss, on the assumption all the table structure remains (looks like it does), and I have a working backup from before the xid wraparound (I do), can I just reinsert any detected-missing data at the application level without needing a dump/reload?

Any help appreciated in this really not-fun time,

thanks

S
Re: [GENERAL] mount -o async - is it safe?
Hi Tom,

> > If we turn sync off, surely PostgreSQL keeps the data consistent, ext3
> > journalling keeps the filesystem clean [assuming other mount options
> > left at defaults], and then everything should be ok with either a server
> > crash, power failure, storage failure, whatever. right?
>
> I checked around with some of Red Hat's kernel folk, and the bottom line
> seems to be that it's OK as long as you trust the hardware:

fabulous, thanks :)

> :> Question is, can fsync(2) be trusted to behave properly, ie, not return
> :> until all writes are down to disk, if the SAN is mounted -o async ?
> :
> : async is the default, which is the whole point of having things like
> : fsync, fdatasync, O_DIRECT, etc. You can trust fsync as far as you can
> : trust the hardware. The call will not return until the SAN says the
> : data has been written.
> :
> : In reality, the SAN is probably buffering these writes (possibly into
> : SRAM or battery-backed RAM), and the disks are probably buffering them
> : again, but you've got redundant power supplies and UPSs, right?

That sounds true (and it has those) - but presumably this is the case whether we mount -o sync or not? I.e. if it's going to buffer, then it's going to do so whether it's postgres or the kernel sync'ing the writes?

(Specifically, that the SAN likely buffers anyway - IMO having to trust the hardware to some degree is a given ;)

Cheers

Shane
Re: [GENERAL] mount -o async - is it safe?
Hi, thanks :)

> > If -o async means "all I/O is asynchronous except stuff explicitly
> > fsync()ed" you're fine. Otherwise...
>
> That's the way it works. Async is the default setting for most
> filesystems, but fsync() is always honored, at least as far as
> non-lying hardware will allow. :)

That sounds good :) ext3's journalling should take care of the rest, I guess - does that sound ok? I think I have read in various places that PostgreSQL doesn't need any directory-level operations in keeping the WAL up to date, so provided the ext3 partition remains mountable the database should be fine.

> > The usual advice is to stick the WAL on a properly synced partition and
> > stick the rest somewhere else. Note, I have no experience with this,
> > it's just what I've heard.
>
> This might not be optimal, as having every write synchronous actually
> results in more synced writes than are strictly necessary.

Actually I thought that *all* the database had to have fsync() work correctly; not just for integrity on failed transactions, but to maintain integrity during checkpointing as well. But I could well be wrong!

thanks,

Shane
[GENERAL] mount -o async - is it safe?
Hi,

We've recently set up our database (7.4.9) with our new hosting provider. We have two database servers running RHEL 4 in a cluster; one active and one hot-spare. They share a [fibre-channel connected] SAN partition; the active server has it mounted.

Now my question is this: the provider has, by default, mounted it with -o sync, so all reads/writes are synchronous. This doesn't result in the greatest of performance, and indeed remounting -o async is significantly faster. They tell me this is so MySQL databases don't get corrupted in the event of a crash, which is fine...

But as Postgres uses fsync() to force committed transactions to disk, this shouldn't be necessary, right? (I know this is based on the assumption that the SAN doesn't lie about its syncs, but then surely it would lie to the kernel with -o sync anyway?)

If we turn sync off, surely PostgreSQL keeps the data consistent, ext3 journalling keeps the filesystem clean [assuming other mount options left at defaults], and then everything should be OK with either a server crash, power failure, storage failure, whatever. Right?

I've googled and come up with some info; the most relevant of which is here:

http://archives.postgresql.org/pgsql-general/2003-11/msg01515.php
http://archives.postgresql.org/pgsql-general/2003-11/msg01592.php

If anyone can confirm either way that'd be great - or even just point me in the direction of enough firm info to work it out myself ;)

Thanks,

Shane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly
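The fsync() guarantee the question leans on can be sketched in a few lines of Python - a toy illustration of the write-then-fsync pattern, not PostgreSQL's actual code:

```python
import os
import tempfile

def durable_write(path, data):
    """Write data and don't return until it has been pushed to stable
    storage - the same write-then-fsync pattern PostgreSQL relies on
    when committing a transaction."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.write(fd, data)
        os.fsync(fd)  # blocks until the kernel/device reports the data written
    finally:
        os.close(fd)

# On an async-mounted filesystem the write() alone may sit in the page
# cache indefinitely; the explicit fsync() is what makes it durable,
# which is why -o async plus honest hardware is enough for Postgres.
path = os.path.join(tempfile.mkdtemp(), "commit-record")
durable_write(path, b"example commit record")
```

The point is that -o async only changes the default behaviour of unsynced writes; an explicit fsync() is honoured either way, so the database's durability doesn't depend on the mount option (only on the hardware not lying about completed syncs).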
Re: [GENERAL] weird quote bug
>> They are identical! I can't work out whats going on! Please, if anyone
>> can see what's wrong it'll stop me careering into my early grave!
>
> I've never tried this but the docs for LIKE (section 6.6.1 in the 7.3
> docs) say that to match a literal \ you need to type \\\\. An alternative
> might be to try adding an ESCAPE '' clause at the end (also in the same
> doc section).

Ah. I had skim-read that bit of the manual - must have missed it. Phew, it works now!

Thanks (and thanks to the others who answered my question!)

Cheers

Shane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly
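For anyone hitting the same thing: the escaping has to happen twice - once because \ is the LIKE escape character, and once because \ is also special in (7.3-era, backslash-escaping) SQL string literals - so one backslash in the data becomes four in the query text. A hypothetical Python helper (illustrative only, not part of any library) makes the rule concrete:

```python
def escape_for_like(literal):
    r"""Escape a data value so a LIKE/ILIKE pattern matches it exactly.

    Two rounds of escaping are needed: one for the LIKE pattern (where
    \ is the escape character) and one for the SQL string literal
    (assuming old-style backslash-escaping literals, as in 7.3).
    A single \ in the data thus becomes \\\\ in the query text.
    """
    s = literal.replace("\\", "\\\\")              # escape for the LIKE pattern
    s = s.replace("%", "\\%").replace("_", "\\_")  # neutralise LIKE wildcards
    return s.replace("\\", "\\\\")                 # escape again for the string literal
```

A single backslash comes back as four, and % or _ in the data come back preceded by two backslashes - which is exactly what the failing ilike queries above were missing.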
Re: [GENERAL] weird quote bug
Should have said - I'm using PostgreSQL 7.3.3 on Gentoo Linux. The problem occurs both through psql and through PHP 4.3.3.

Cheers

Shane

On Monday 30 Jun 2003 12:25 pm, Shane Wright wrote:
> Hi
>
> This is really driving me silly - I can't work it out, can anyone see what
> I'm doing thats stupid and causing this not to match?
>
> [...]
>
> They are identical! I can't work out whats going on! Please, if anyone
> can see what's wrong it'll stop me careering into my early grave!
>
> Thanks
>
> Shane

---(end of broadcast)---
TIP 6: Have you searched our list archives?
http://archives.postgresql.org
[GENERAL] weird quote bug
Hi

This is really driving me silly - I can't work it out, can anyone see what I'm doing thats stupid and causing this not to match?

This shows that the row exists in the table:

emystery=> select aid,useragent from useragent where useragent like '%ntserver-ps%';
    aid    |                       useragent
-----------+-------------------------------------------------------
 875741007 | Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; file://\\ntserver-ps\publicsw\nt\ie6\ins\active.ins)
(1 row)

this tries to select it (note that I've escaped the backslashes):

emystery=> select * from useragent where useragent ilike 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; file://\\\\ntserver-ps\\publicsw\\nt\\ie6\\ins\\active.ins)';
 aid | useragent | date_added | data_browser | data_version | data_os
-----+-----------+------------+--------------+--------------+---------
(0 rows)

no match! This one encodes the backslashes (\xxx octal for ASCII value) in a different way:

emystery=> select * from useragent where useragent ilike 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; file://\134\134ntserver-ps\134publicsw\134nt\134ie6\134ins\134active.ins)';
 aid | useragent | date_added | data_browser | data_version | data_os
-----+-----------+------------+--------------+--------------+---------
(0 rows)

again no match! And to show that the above queries were correct:

emystery=> select 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; file://\\\\ntserver-ps\\publicsw\\nt\\ie6\\ins\\active.ins)';
                       ?column?
-------------------------------------------------------
 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; file://\\ntserver-ps\publicsw\nt\ie6\ins\active.ins)
(1 row)

emystery=> select 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; file://\134\134ntserver-ps\134publicsw\134nt\134ie6\134ins\134active.ins)';
                       ?column?
-------------------------------------------------------
 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; file://\\ntserver-ps\publicsw\nt\ie6\ins\active.ins)
(1 row)

They are identical! I can't work out whats going on! Please, if anyone can see what's wrong it'll stop me careering into my early grave!
Thanks Shane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[GENERAL] backend stalls
Hi

I'm tearing my hair out here; the backend seems to randomly decide to stall and not serve any more queries (processes are listed as waiting). It happens on all sorts of queries (including CREATE TABLE and CREATE INDEX).

I have completely uninstalled and deleted the whole installation and reinstalled - only to have it do it again midway through entering the SQL to rebuild my database (not a big one, ~25 tables, and no rows to speak of - haven't got that far yet).

I was thinking corrupted tables - or a corrupt system catalog - but even vacuumdb stalls as well...

Any help would be appreciated; I have so much work to do and I can't do anything without the database working :(

My install is Postgres 7.0.3-8 from the Red Hat RPMs running on RH 7.1. I know I haven't given much specific information; I don't really know where to start. What it's doing at this very minute is hanging on an insert into a freshly created table on an installation that's only existed for 30 minutes and only just had all the tables created.

Thanks

Shane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster