Re: [HACKERS] will PITR in 8.0 be usable for hot spare/log shipping type
I wonder if there will be assumptions in the startup code concerning time. What if the startup takes 18 months? Do you think there would be some sort of problem with this approach?

On Aug 11, 2004, at 6:14 PM, Gaetano Mendola wrote:
> Tom Lane wrote:
>> Somebody should hack this together and try it during beta. I don't have time myself.
>
> We'll see; if I have spare time I will try.
>
> Regards,
> Gaetano Mendola

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
[HACKERS] WARNING: buffer refcount leak
I'm working on a new machine, and I think it's got possibly bad hardware, since that seems more likely than a bug in PostgreSQL. I'm wondering if someone has any idea what kind of hardware failure might cause this message:

WARNING: buffer refcount leak: [424] (freeNext=425, freePrev=423, rel=0/0, blockNum=4294967295, flags=0x1c, refcount=-63130464)

That one time, postmaster displayed the refcount leak; other times it would segfault, or it crashed with messages like this: free(): invalid pointer 0xa06ffc0!

Usually it just works fine; this appears to be a very intermittent problem. We've already replaced the SCA backplane, the SCSI cables, the RAID controller, and the motherboard. The only components not replaced are the memory and the CPUs. I've run Memtest86 on the box for several days without it finding any bad memory; it's the first test I run on any new machine. Can anyone recommend any good (free) diagnostic programs like Memtest86 that check CPUs, the PCI bus, etc.?

The machine is a dual Xeon 2.8, 4GB ECC RAM, and 14 15k 36GB U320 drives with a MegaRAID 320-2x controller, running Fedora Core 1 and PostgreSQL 7.3.4.

Thanks for any advice, and I hope this isn't too off topic.

--brian
Re: [HACKERS] pg_autovacuum seems to be a neat freak and cleans way too much
I'm following up on my own email and cross-posting to hackers, because there is a bug that needs to be fixed.

I spent some more time digging into this, and I found the cause of the problem. reltuples in pg_class is defined as a real; reltuples in pg_autovacuum is defined as an int. The query used to get reltuples returns scientific notation for my large tables, '4.06927e+06' for the one I mention below. pg_autovacuum happily converts that to a '4' by doing atoi('4.06927e+06'), which is why it's all fubar for my large tables with over a million tuples.

My real quick hack of changing the defines in pg_autovacuum.h (line 37) to cast reltuples to ::int4 makes it work:

#define TABLE_STATS_QUERY "select a.oid,a.relname,a.relnamespace,a.relpages,a.relisshared,a.reltuples::int4,b.schemaname,b.n_tup_ins,b.n_tup_upd,b.n_tup_del from pg_class a, pg_stat_all_tables b where a.oid=b.relid and a.relkind = 'r'"

#define PAGES_QUERY "select oid,reltuples::int4,relpages from pg_class where oid=%i"

However, I think a better fix would be to change pg_autovacuum to use a double instead of an int. If it's going to stay an integer type, it should probably be increased to long and the casts changed to ::int8. Any suggestions on the best way to fix this? I'll supply a patch once the approach is agreed upon and the problem has been verified.

best regards,
--brian

On May 18, 2004, at 7:37 PM, Brian Hirt wrote:
> I'm having a strange issue with pg_autovacuum. I have a table with about 4 million rows in 20,000 pages. autovacuum likes to vacuum and/or analyze it every 45 minutes or so, but it probably doesn't have more than a few hundred rows changed every few hours.
>
> When I run autovacuum with -d3 it says:
>
> [2004-05-18 07:04:26 PM] table name: basement_nightly.public.search_words4
> [2004-05-18 07:04:26 PM] relid: 396238832; relisshared: 0
> [2004-05-18 07:04:26 PM] reltuples: 4; relpages: 20013
> [2004-05-18 07:04:26 PM] curr_analyze_count: 0; cur_delete_count: 0
> [2004-05-18 07:04:26 PM] ins_at_last_analyze: 0; del_at_last_vacuum: 0
> [2004-05-18 07:04:26 PM] insert_threshold: 504; delete_threshold: 1008
>
> reltuples: 4 seems wrong. I would expect a table with 4M rows and 20K pages to have more than 4 tuples. I think this is why the insert threshold is all messed up, which is why it gets analyzed way too frequently. This happens with other big tables too. The autovacuum is from 7.4.2; some information is below.
>
> Output from vacuum:
>
> basement=# vacuum ANALYZE verbose search_words4;
> INFO: vacuuming public.search_words4
> INFO: index search_words4_data_id now contains 4069268 row versions in 15978 pages
> DETAIL: 479 index row versions were removed.
> 1 index pages have been deleted, 0 are currently reusable.
> CPU 0.42s/0.70u sec elapsed 29.48 sec.
> INFO: index search_words4_pkey now contains 4069268 row versions in 17576 pages
> DETAIL: 479 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.77s/0.74u sec elapsed 150.19 sec.
> INFO: search_words4: removed 479 row versions in 6 pages
> DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO: search_words4: found 479 removable, 4069268 nonremovable row versions in 19950 pages
> DETAIL: 0 dead row versions cannot be removed yet.
> There were 0 unused item pointers.
> 0 pages are entirely empty.
> CPU 1.30s/1.61u sec elapsed 179.96 sec.
> INFO: analyzing public.search_words4
> INFO: search_words4: 19950 pages, 3000 rows sampled, 4069800 estimated total rows
> VACUUM
> basement=#
>
> Here's the frequency:
>
> [2004-05-18 12:12:54 PM] Performing: ANALYZE public.search_words4
> [2004-05-18 01:59:13 PM] Performing: ANALYZE public.search_words4
> [2004-05-18 02:05:36 PM] Performing: ANALYZE public.search_words4
> [2004-05-18 02:29:25 PM] Performing: VACUUM ANALYZE public.search_words4
> [2004-05-18 02:46:09 PM] Performing: ANALYZE public.search_words4
> [2004-05-18 03:39:31 PM] Performing: ANALYZE public.search_words4
> [2004-05-18 05:20:45 PM] Performing: ANALYZE public.search_words4
> [2004-05-18 06:08:03 PM] Performing: VACUUM ANALYZE public.search_words4
> [2004-05-18 06:18:34 PM] Performing: ANALYZE public.search_words4
> [2004-05-18 07:34:27 PM] Performing: ANALYZE public.search_words4
> [2004-05-18 07:43:18 PM] Performing: ANALYZE public.search_words4
Re: [HACKERS] pg_autovacuum seems to be a neat freak and cleans way too much
There might be another similar bug that was fixed in 7.4.2. I just double-checked the 7.4.2 tarball, and it does have this problem. You might want to double-check whether it's fixed in 7.4.3, or I can grab CVS and check it if you like.

On May 18, 2004, at 8:06 PM, Bruce Momjian wrote:
> I think we already fixed that in 7.4.2. We also have a few bugs still
> in 7.4.2, and we need to get those fixed soon and release 7.4.3.
>
> --
> Bruce Momjian
[HACKERS] fairly serious bug with pg_autovacuum in pg7.4
Hello,

I've run across a pretty serious problem with pg_autovacuum: it loses track of any table that's ever been truncated (possibly in other situations too). When I truncate a table, it gets a new relfilenode in pg_class. This is a problem because pg_autovacuum assumes pg_class.relfilenode will join to pg_stat_all_tables.relid, but pg_stat_all_tables.relid is actually the oid from pg_class, not the relfilenode. These two values start out equal, so pg_autovacuum works initially, but it fails later on because of this incorrect assumption.

Here is one query pg_autovacuum uses (from pg_autovacuum.h) to get tables, which breaks:

select a.relfilenode,a.relname,a.relnamespace,a.relpages,a.relisshared,a.reltuples,b.schemaname,b.n_tup_ins,b.n_tup_upd,b.n_tup_del from pg_class a, pg_stat_all_tables b where a.relfilenode=b.relid and a.relkind = 'r'

Here's a little test case you can use to see what happens:

basement=# create table test_table ( id int4 );
CREATE TABLE
basement=# select relname, relfilenode from pg_class where relkind = 'r' and relname = 'test_table';
  relname   | relfilenode
------------+-------------
 test_table |    28814151
(1 row)

basement=# select relid,relname from pg_stat_all_tables where relname = 'test_table';
  relid   |  relname
----------+------------
 28814151 | test_table
(1 row)

basement=# select a.relfilenode,a.relname,a.relnamespace,a.relpages,a.relisshared,a.reltuples,b.schemaname,
basement-# b.n_tup_ins,b.n_tup_upd,b.n_tup_del from pg_class a, pg_stat_all_tables b
basement-# where a.relfilenode=b.relid and a.relkind = 'r' and a.relname = 'test_table';
 relfilenode |  relname   | relnamespace | relpages | relisshared | reltuples | schemaname | n_tup_ins | n_tup_upd | n_tup_del
-------------+------------+--------------+----------+-------------+-----------+------------+-----------+-----------+-----------
    28814151 | test_table |         2200 |       10 | f           |      1000 | public     |         0 |         0 |         0
(1 row)

basement=# truncate table test_table;
TRUNCATE TABLE
basement=# select relname, relfilenode from pg_class where relkind = 'r' and relname = 'test_table';
  relname   | relfilenode
------------+-------------
 test_table |    28814153
(1 row)

basement=# select relid,relname from pg_stat_all_tables where relname = 'test_table';
  relid   |  relname
----------+------------
 28814151 | test_table
(1 row)

basement=# select a.relfilenode,a.relname,a.relnamespace,a.relpages,a.relisshared,a.reltuples,b.schemaname,
basement-# b.n_tup_ins,b.n_tup_upd,b.n_tup_del from pg_class a, pg_stat_all_tables b
basement-# where a.relfilenode=b.relid and a.relkind = 'r' and a.relname = 'test_table';
 relfilenode | relname | relnamespace | relpages | relisshared | reltuples | schemaname | n_tup_ins | n_tup_upd | n_tup_del
-------------+---------+--------------+----------+-------------+-----------+------------+-----------+-----------+-----------
(0 rows)

basement=# drop table test_table;
DROP TABLE
basement=#

PS: I'm running pg-7.4 and pg_autovacuum from contrib.
[HACKERS] Problem with plpgsql functions and foreign key constraints.
I recently ran across this (I think) bug relating to constraints and functions written in plpgsql. It seems that I'm getting erroneous foreign key violations. I've included two scripts which create the simplest test case I can reproduce. One script has a foreign key defined and the other one doesn't; other than that, they are identical. From the data in the scripts, it's obvious there aren't any violations of keys.

[postgres@loopy postgres]$ diff /tmp/good.sql /tmp/bad.sql
18c18
< create table c2 ( id int, value_sum int);
---
> create table c2 ( id int references c(id), value_sum int);

[postgres@loopy postgres]$ psql test < /tmp/good.sql
NOTICE: CREATE TABLE/PRIMARY KEY will create implicit index 'c_pkey' for table 'c'
CREATE
INSERT 19107 1
INSERT 19108 1
NOTICE: CREATE TABLE will create implicit trigger(s) for FOREIGN KEY check(s)
CREATE
INSERT 19126 1
INSERT 19127 1
INSERT 19128 1
INSERT 19129 1
INSERT 19130 1
INSERT 19131 1
CREATE
CREATE
CREATE
UPDATE 6
 id | value_sum
----+-----------
  1 |         6
  2 |        17
(2 rows)

 id
----
  1
  2
(2 rows)

 id
----
  1
  2
(2 rows)

 id
----
  1
  2
(2 rows)

[postgres@loopy postgres]$ psql test < /tmp/bad.sql
NOTICE: CREATE TABLE/PRIMARY KEY will create implicit index 'c_pkey' for table 'c'
CREATE
INSERT 19164 1
INSERT 19165 1
NOTICE: CREATE TABLE will create implicit trigger(s) for FOREIGN KEY check(s)
CREATE
INSERT 19183 1
INSERT 19184 1
INSERT 19185 1
INSERT 19186 1
INSERT 19187 1
INSERT 19188 1
NOTICE: CREATE TABLE will create implicit trigger(s) for FOREIGN KEY check(s)
CREATE
CREATE
CREATE
ERROR: triggered data change violation on relation c2
 id | value_sum
----+-----------
(0 rows)

 id
----
  1
  2
(2 rows)

 id
----
  1
  2
(2 rows)

 id
----
(0 rows)

Attachments: good.sql, bad.sql
[HACKERS] Fw: Problem with plpgsql functions and foreign key constraints.
I forgot to mention that this is happening on 7.0.3 and 7.1.1, and I'm running on a RedHat 7.0 machine.

----- Original Message -----
From: Brian Hirt [EMAIL PROTECTED]
To: Postgres Hackers [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Sunday, June 24, 2001 1:12 AM
Subject: Problem with plpgsql functions and foreign key constraints.
[HACKERS] Question about performance of planner
Hi,

I have a question about the performance of the planner in 7.1. I've been testing the 11/21 snapshot of the database just to get an idea of how it will work for me when I upgrade from 7.0.2. I've noticed that some queries are taking much longer, and I've narrowed it down (I think) to the planner. I've run an identical query against 7.0.2 and 7.1. Both databases have the exact same data, and both have been vacuumed. As you can see from below, the 7.1 snapshot is spending 97% of the total time planning the query, whereas the 7.0.2 version is spending only 27% of the total time planning the query. If anyone is interested in this, I'll be happy to supply you with information that would help track this down. Thanks.

7.1-snapshot

PLANNER STATISTICS
! system usage stats:
! 7.748602 elapsed 5.02 user 0.20 system sec
! [5.09 user 0.21 sys total]
! 0/0 [0/0] filesystem blocks in/out
! 47/1246 [349/1515] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
! 0/0 [0/0] voluntary/involuntary context switches
! postgres usage stats:
! Shared blocks: 20 read, 0 written, buffer hit rate = 99.94%
! Local blocks: 0 read, 0 written, buffer hit rate = 0.00%
! Direct blocks: 0 read, 0 written

EXECUTOR STATISTICS
! system usage stats:
! 0.317000 elapsed 0.16 user 0.01 system sec
! [5.25 user 0.22 sys total]
! 0/0 [0/0] filesystem blocks in/out
! 328/364 [677/1879] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
! 0/0 [0/0] voluntary/involuntary context switches
! postgres usage stats:
! Shared blocks: 160 read, 0 written, buffer hit rate = 97.73%
! Local blocks: 0 read, 0 written, buffer hit rate = 0.00%
! Direct blocks: 0 read, 0 written

7.0.2

! Planner Stats:
! system usage stats:
! 0.051438 elapsed 0.05 user 0.00 system sec
! [0.33 user 0.05 sys total]
! 0/0 [0/0] filesystem blocks in/out
! 0/51 [680/837] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
! 0/0 [0/0] voluntary/involuntary context switches
! postgres usage stats:
! Shared blocks: 0 read, 0 written, buffer hit rate = 100.00%
! Local blocks: 0 read, 0 written, buffer hit rate = 0.00%
! Direct blocks: 0 read, 0 written

! Executor Stats:
! system usage stats:
! 0.136506 elapsed 0.13 user 0.00 system sec
! [0.46 user 0.05 sys total]
! 0/0 [0/0] filesystem blocks in/out
! 0/6 [680/843] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
! 0/0 [0/0] voluntary/involuntary context switches
! postgres usage stats:
! Shared blocks: 98 read, 0 written, buffer hit rate = 98.98%
! Local blocks: 0 read, 0 written, buffer hit rate = 0.00%
! Direct blocks: 0 read, 0 written

--
The world's most ambitious and comprehensive PC game database project.
http://www.mobygames.com