[GENERAL] Catalog Bloat
Probably temp table related...but catalog bloat on one of my databases appears to be pretty bad. Is the below bloat (table and index) something to worry about? pg_stat_all_tables shows the relations ARE getting successfully vacuumed... Any suggestions on eliminating it? Not sure if tools like pg_reorg are appropriate (or effective) or even vacuum full (yikes). I'd prefer not to take a complete outage - but I would if this bloat is really an issue. (I know about "reindex system" (duh) - but as that requires me to take an outage, my question about whether the bloat is a cause for concern still stands)

 schemaname | tablename    | tbloat | wastedbytes | iname                           | ibloat | wastedibytes
------------+--------------+--------+-------------+---------------------------------+--------+--------------
 pg_catalog | pg_attribute |    9.0 |    27648000 | pg_attribute_relid_attnam_index |  243.5 |    361627648
 pg_catalog | pg_attribute |    9.0 |    27648000 | pg_attribute_relid_attnum_index |  168.5 |    253894656
 pg_catalog | pg_type      |   10.8 |     4890624 | pg_type_oid_index               |  135.8 |     28721152
 pg_catalog | pg_type      |   10.8 |     4890624 | pg_type_typname_nsp_index       |  287.2 |     60956672
 pg_catalog | pg_class     |   10.3 |     4562944 | pg_class_oid_index              |   94.1 |     26689536
 pg_catalog | pg_class     |   10.3 |     4562944 | pg_class_relname_nsp_index      |  270.1 |     77144064
 pg_catalog | pg_depend    |    5.3 |     3948544 | pg_depend_reference_index       |  337.0 |    156901376
 pg_catalog | pg_depend    |    5.3 |     3948544 | pg_depend_depender_index        |  359.6 |    167436288
 pg_catalog | pg_index     |    6.1 |     1130496 | pg_index_indexrelid_index       |   72.9 |      7659520
 pg_catalog | pg_index     |    6.1 |     1130496 | pg_index_indrelid_index         |   72.9 |      7659520
(10 rows)

Thanks in advance
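For reference, a minimal sketch of the vacuum-activity check mentioned above (standard pg_stat_all_tables columns; the relation list just mirrors the bloat report):

    -- Confirm autovacuum is actually reaching the bloated catalog relations
    SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
    FROM pg_stat_all_tables
    WHERE schemaname = 'pg_catalog'
      AND relname IN ('pg_attribute', 'pg_type', 'pg_class', 'pg_depend', 'pg_index');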
Re: [GENERAL] Looking for settings/configuration for FASTEST reindex on idle system.
On Thursday, January 9, 2014 4:03 PM, Jeff Amiel wrote: I am familiar with pg_reorg and its sibling pg_repack - but they call the base postgresql reindex functions underneath - and I have learned by using 'verbose' that the actual clustering of the table is quick - it's the reindexing that is slow (it's doing each reindex sequentially instead of concurrently). I'm second-guessing this - I'm not sure I REALLY know how long the table cluster takes versus the indexes. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Looking for settings/configuration for FASTEST reindex on idle system.
I have a maintenance window coming up and am using pg_upgrade to upgrade from 9.2.X to 9.3.X. As part of the window, I'd like to 'cluster' each table by its primary key. After doing so, I see amazing performance improvements (probably mostly because of index bloat - but possibly due to table fragmentation). That being said, I have a single table that is blowing my window - at 140 million rows (28 gig in size with 75 gig worth of indexes), this bad boy is my white whale. There are 10 indexes (not including the primary key). Yes - 10 is a lot - but I've been monitoring their use (most are single-column or partial indexes) and all are used. I've been reading and experimenting in trying to get a cluster of this table (which re-indexes all 10/11 indexes) to complete in a reasonable amount of time. There are lots of settings and ranges to choose from and while my experiments continue, I was looking to get some input. The lowest I have gotten for clustering this table is just under 6 hours. I am familiar with pg_reorg and its sibling pg_repack - but they call the base postgresql reindex functions underneath - and I have learned by using 'verbose' that the actual clustering of the table is quick - it's the reindexing that is slow (it's doing each reindex sequentially instead of concurrently).

PostgreSQL 9.3.2 on x86_64-pc-solaris2.11, compiled by gcc (GCC) 4.5.2, 64-bit
500 gig of RAM, 2.7GHz processors (48 cores)
shared_buffers set to 120gig
maintenance_work_mem set to 1gig
work_mem set to 500 meg

Things I have read/seen/been told to tweak:
fsync (set to off)
setting wal_level to minimal (to avoid WAL logging of cluster activity)
bumping up maintenance_work_mem (but I've also seen/read that uber-high values cause disk-based sorts which ultimately slow things down)
tweaking checkpoint settings (although with wal_level set to minimal - I don't think it comes into play)

Any good suggestions for lighting a fire under this process? If worst comes to worst, I can vacuum full the table and reindex each index concurrently (see the sketch below) - but it won't give me the benefit of having the tuples ordered by their oft-grouped primary key. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
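A rough sketch of the parallel alternative being weighed here - not the poster's actual procedure, and every table/index/column name is hypothetical. The idea: drop the secondary indexes so CLUSTER only rebuilds the primary key, then recreate the rest in concurrent sessions:

    -- Drop secondary indexes so CLUSTER only has to rebuild the primary key
    DROP INDEX big_table_idx_a;
    DROP INDEX big_table_idx_b;

    -- Orders the heap by the primary key, rebuilding just that one index
    CLUSTER big_table USING big_table_pkey;
    ANALYZE big_table;

    -- Recreate the dropped indexes in parallel, one psql session per index:
    -- session 1:
    SET maintenance_work_mem = '1GB';
    CREATE INDEX big_table_idx_a ON big_table (col_a);
    -- session 2:
    SET maintenance_work_mem = '1GB';
    CREATE INDEX big_table_idx_b ON big_table (col_b) WHERE col_b IS NOT NULL;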
[GENERAL] pg_prewarm status
Trying to follow the threads and other references - but I can't determine where this patch ended up. (http://www.postgresql.org/message-id/ca+tgmobrrrxco+t6gcqrw_djw+uf9zedwf9bejnu+rb5teb...@mail.gmail.com) I'm trying to experiment with some new hardware - and the functionality to add specific tables/indexes into cache ahead of time will benefit me greatly. I found a page describing how to apply the patch to 9.2.4 (jumping through some hoops - http://issues.collectionspace.org/browse/UCJEPS-432) and was hoping to get a version to apply to 9.3.X Can anyone advise me on how I might get this 'applied' to a 9.3.X source code base or any other options to denote specific relations that I'd like to get directly into shared_buffers? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
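For context, the interface the patch eventually shipped with as a contrib module looks like the sketch below; a hand-patched 9.3 build should expose something similar, but treat the exact signature as an assumption (and 'my_table'/'my_table_pkey' are placeholder names):

    CREATE EXTENSION pg_prewarm;

    -- Load a heap and one of its indexes into shared_buffers ahead of time
    SELECT pg_prewarm('my_table');
    SELECT pg_prewarm('my_table_pkey');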
Re: [GENERAL] table lock when where clause uses unique constraint instead of primary key.
On Monday, November 4, 2013 3:23 PM, Adrian Klaver wrote: >Probably poor choice of words:). So then, what we are looking at is >other clients trying to update user_profile but not succeeding because >pid 4899 is blocking. At this point all I can see is that the offending >query is updating some fields the others are not; disabled and reset_code. > >Is that always the case? > >If so any thing in the code path that is different when those fields are >updated? We have scenarios where the exact same query is in play in all instances. Any thoughts as to the fact that this could be a full table lock simply based on the use of username (not the primary key - but specifically a unique constraint) in the where clause? I'm grasping at straws, I know. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] table lock when where clause uses unique constraint instead of primary key.
On Monday, November 4, 2013 2:25 PM, Adrian Klaver wrote:
> > I grepped the schema (just to be sure) - no foreign keys on columns or table at all.
> > I do have an audit trigger on the table for updates - inserts into an audit table when changes are made and it DOES do a separate select from user_profile for other reasons - but not "for update" or anything - no explicit locking.
>
>Would it be possible to see that audit function?

It's kind of long (really just a lot of compares of old/new values). The relevant portion (that selects from user_profile) looks like this:

BEGIN
    SELECT user_id, user_ip
      INTO my_user_id, my_user_ip
      FROM audit_metadata
     WHERE pg_pid = getpid();

    IF ((NOT FOUND) OR (my_user_id = -1)) THEN
        SELECT user_id
          INTO my_user_id
          FROM user_profile
         WHERE username = 'db-'||CURRENT_USER and user_type='DBASE';
        IF (NOT FOUND) THEN
            RAISE EXCEPTION 'USERNAME NOT FOUND IN USER_PROFILE: % ', CURRENT_USER;
        END IF;
        my_user_ip := inet_client_addr();
    END IF;

    INSERT INTO audit ...
END;

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] table lock when where clause uses unique constraint instead of primary key.
On Monday, November 4, 2013 1:48 PM, Adrian Klaver wrote: >Any triggers on user_profile? >Any FK relationship in either direction? I grepped the schema (just to be sure) - no foreign keys on columns or table at all. I do have an audit trigger on the table for updates - inserts into an audit table when changes are made and it DOES do a separate select from user_profile for other reasons - but not "for update" or anything - no explicit locking. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] table lock when where clause uses unique constraint instead of primary key.
to: Rob Sargent

The "login" references have nothing to do with postgres - it's simply the table/column names being used. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] table lock when where clause uses unique constraint instead of primary key.
PostgreSQL 9.2.4 on x86_64-pc-solaris2.10, compiled by gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath), 64-bit

We have an annoying scenario that has been creating issues for us for years... time to try to figure it out. Essentially, we have a user table where we maintain username, id number, enabled/disabled state, etc. When a user logs in successfully, we reset any failed login attempts on the user's unique entry in this table.

CREATE TABLE user_profile
(
  user_id serial NOT NULL,
  username character varying(50) NOT NULL,
  login_attempts integer DEFAULT 0,
  …
  CONSTRAINT user_id PRIMARY KEY (user_id),
  CONSTRAINT name UNIQUE (username)
)

However - we often get "lock storms" where SOMEHOW, updates for individual users are causing all other updates to 'lock' on each other. Eventually the storm abates (sometimes in seconds - sometimes in minutes). See edited screen cap: http://i.imgur.com/x4DdYaV.png (PID 4899 just has a "where username = $1" cut off that you can't see out to the right.)

All updates are done using the username (unique constraint) instead of the primary key (the serial). In retrospect, I suppose these queries should be using the primary key (9-year-old code) but I am flummoxed as to how these updates can be causing table-level locks. I've never been able to catch the lock information during one of these storms - but I assume it is a table-level lock causing this. Thoughts? Is this just 'normal' behavior that I am not expecting? (because postgres doesn't know that the username is a unique field) -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
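A rough sketch of the kind of query that could capture the lock information during one of these storms (standard pg_locks/pg_stat_activity self-join; column names per 9.2 - a sketch, not an exhaustive lock-matching query):

    -- Who is waiting on whom, and for what
    SELECT waiting.pid        AS waiting_pid,
           waiting_act.query  AS waiting_query,
           holder.pid         AS holding_pid,
           holder_act.query   AS holding_query,
           holder.mode        AS held_lock_mode
    FROM pg_locks waiting
    JOIN pg_stat_activity waiting_act ON waiting_act.pid = waiting.pid
    JOIN pg_locks holder
      ON  holder.locktype      IS NOT DISTINCT FROM waiting.locktype
      AND holder.database      IS NOT DISTINCT FROM waiting.database
      AND holder.relation      IS NOT DISTINCT FROM waiting.relation
      AND holder.transactionid IS NOT DISTINCT FROM waiting.transactionid
      AND holder.pid <> waiting.pid
    JOIN pg_stat_activity holder_act ON holder_act.pid = holder.pid
    WHERE NOT waiting.granted
      AND holder.granted;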
Re: [GENERAL] Why does row estimation on nested loop make no sense to me
Thanks much! (sorry for top-posting, yahoo email sucks) - Original Message - From: Amit Langote To: Jeff Amiel Cc: "dep...@depesz.com" ; "pgsql-general@postgresql.org" Sent: Monday, May 20, 2013 9:51 AM Subject: Re: [GENERAL] Why does row estimation on nested loop make no sense to me I also found one other discussion which has similar issues addressed: http://postgresql.1045698.n5.nabble.com/Bogus-nestloop-rows-estimate-in-8-4-7-td5710254.html -- Amit Langote -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does row estimation on nested loop make no sense to me
Ok - I agree - Can somebody help me understand where the row estimates come from on a nested-loop operation in postgres then? - Original Message - From: hubert depesz lubaczewski To: Jeff Amiel Cc: "pgsql-general@postgresql.org" Sent: Saturday, May 18, 2013 3:39 AM Subject: Re: [GENERAL] Why does row estimation on nested loop make no sense to me Not sure if it helps, but it's apparently not a very rare thing. Quick analysis on data from explain.depesz.com showed that > 12% of plans with nested loop have such estimate. Couple of examples: http://explain.depesz.com/s/Qm4 http://explain.depesz.com/s/qmW http://explain.depesz.com/s/qnG http://explain.depesz.com/s/QO http://explain.depesz.com/s/qov ... -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does row estimation on nested loop make no sense to me
- Original Message -
From: Amit Langote
To: Jeff Amiel
Cc: "pgsql-general@postgresql.org"
Sent: Friday, May 17, 2013 2:21 PM
Subject: Re: [GENERAL] Why does row estimation on nested loop make no sense to me

>> explain analyze
>> select era.child_entity from entity_rel era join user_entity ue on
>> ue.entity_id = era.parent_entity and ue.user_id=12345
>>
>> Nested Loop (cost=0.00..2903.37 rows=29107 width=4) (actual time=0.028..0.274 rows=201 loops=1)
>>   -> Index Only Scan using entity_pk on user_entity ue (cost=0.00..62.68 rows=2 width=4) (actual time=0.011..0.012 rows=1 loops=1)
>>        Index Cond: (user_id = 12345)
>>        Heap Fetches: 1
>>   -> Index Scan using rel_parent on entity_rel era (cost=0.00..1261.85 rows=317 width=8) (actual time=0.013..0.164 rows=201 loops=1)
>>        Index Cond: (parent_entity = ue.entity_id)
>> Total runtime: 0.361 ms

>I noticed the explain output in your first mail shows Index Cond:
>(user_id = 10954) whereas your query says: ue.user_id=12345. Something
>with that? Although, your explain analyze does show the same values at
>both places with the row estimate being 29107 in both cases, which,
>well, looks awful and quite unexpected though there seem to have been
>similar observations before

That was a weak attempt at hiding 'real' data - I intended to change them all to 12345. :)

>Did you also check select count(*) on both the relations and found
>related numbers?

Nothing related (that I could find) in the rowcounts - one table has 20 million rows or so and the other 65K. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does row estimation on nested loop make no sense to me
> explain analyze
> select era.child_entity from entity_rel era join user_entity ue on
> ue.entity_id = era.parent_entity and ue.user_id=12345
>
> Nested Loop (cost=0.00..2903.37 rows=29107 width=4) (actual time=0.028..0.274 rows=201 loops=1)
>   -> Index Only Scan using entity_pk on user_entity ue (cost=0.00..62.68 rows=2 width=4) (actual time=0.011..0.012 rows=1 loops=1)
>        Index Cond: (user_id = 12345)
>        Heap Fetches: 1
>   -> Index Scan using rel_parent on entity_rel era (cost=0.00..1261.85 rows=317 width=8) (actual time=0.013..0.164 rows=201 loops=1)
>        Index Cond: (parent_entity = ue.entity_id)
> Total runtime: 0.361 ms

>Have you tried analyze (it's probably a case of insufficient/outdated
>statistics to planner's disposal) or probably consider changing
>default_statistics_target?

Again - my question revolves not around whether or not I am getting good or bad estimates - it is about the fact that the nested-loop row estimate does not appear to be derived from the nodes below it - it is off by orders of magnitude. I've never seen this before. That aside, yes - I did analyze and tweak the stats target during experimentation - no change. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does row estimation on nested loop make no sense to me
- Original Message -
From: Amit Langote
To: Jeff Amiel
Cc: "pgsql-general@postgresql.org"
Sent: Friday, May 17, 2013 11:37 AM
Subject: Re: [GENERAL] Why does row estimation on nested loop make no sense to me

On Sat, May 18, 2013 at 1:25 AM, Jeff Amiel wrote:
> On most nested loops that I do explain/explain analyze on, the row estimation
> for the nested-loop itself is a product of the inner nodes of the nested loop.
> However in this case, I am stumped!
>
> explain
> select era.child_entity from entity_rel era join user_entity ue on
> ue.entity_id = era.parent_entity and ue.user_id=12345
>
> Nested Loop (cost=0.00..2903.37 rows=29107 width=4)
>   -> Index Only Scan using entity_pk on user_entity ue (cost=0.00..62.68 rows=2 width=4)
>        Index Cond: (user_id = 10954)
>   -> Index Scan using rel_parent on entity_rel era (cost=0.00..1261.85 rows=317 width=8)
>        Index Cond: (parent_entity = ue.entity_id)
>
> How can the estimated number of rows for the nested loop node EXCEED the
> product of the 2 row estimates of the tables being joined?
> Not only does it exceed it - but it is orders of magnitude greater.
>
> Am I missing something obvious here? I can see the nested loop row estimate
> being LESS but certainly not more.
>
> Can you also post the output of explain analyze?

I'm not worried about performance (per se) but the row estimation issue, which propagates up as part of a bigger query. But here ya go:

explain analyze
select era.child_entity from entity_rel era join user_entity ue on
ue.entity_id = era.parent_entity and ue.user_id=12345

Nested Loop (cost=0.00..2903.37 rows=29107 width=4) (actual time=0.028..0.274 rows=201 loops=1)
  -> Index Only Scan using entity_pk on user_entity ue (cost=0.00..62.68 rows=2 width=4) (actual time=0.011..0.012 rows=1 loops=1)
       Index Cond: (user_id = 12345)
       Heap Fetches: 1
  -> Index Scan using rel_parent on entity_rel era (cost=0.00..1261.85 rows=317 width=8) (actual time=0.013..0.164 rows=201 loops=1)
       Index Cond: (parent_entity = ue.entity_id)
Total runtime: 0.361 ms

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does row estimation on nested loop make no sense to me
> Can you provide a self-contained test case that does this? That response scares me. :) I can try - Every other table set (small, easy to experiment with) returns results as expected - Is the implication that this looks 'unusual'? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Why does row estimation on nested loop make no sense to me
On most nested loops that I do explain/explain analyze on, the row estimation for the nested-loop itself is a product of the inner nodes of the nested loop. However in this case, I am stumped!

explain
select era.child_entity from entity_rel era join user_entity ue on
ue.entity_id = era.parent_entity and ue.user_id=12345

Nested Loop (cost=0.00..2903.37 rows=29107 width=4)
  -> Index Only Scan using entity_pk on user_entity ue (cost=0.00..62.68 rows=2 width=4)
       Index Cond: (user_id = 10954)
  -> Index Scan using rel_parent on entity_rel era (cost=0.00..1261.85 rows=317 width=8)
       Index Cond: (parent_entity = ue.entity_id)

How can the estimated number of rows for the nested loop node EXCEED the product of the 2 row estimates of the tables being joined? Not only does it exceed it - but it is orders of magnitude greater.

Am I missing something obvious here? I can see the nested loop row estimate being LESS but certainly not more.

PostgreSQL 9.2.4 on x86_64-pc-solaris2.10, compiled by gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath), 64-bit

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
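A note for readers of this thread: in broad strokes, the planner sizes a join from whole-relation statistics and join selectivity rather than by multiplying the child node estimates, which is how the two numbers can diverge. A hedged way to inspect the inputs it works from (pg_stats; output is obviously site-specific):

    -- Per-column statistics feeding the join selectivity estimate
    SELECT tablename, attname, n_distinct, null_frac
    FROM pg_stats
    WHERE (tablename = 'entity_rel'  AND attname = 'parent_entity')
       OR (tablename = 'user_entity' AND attname = 'entity_id');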
[GENERAL] When did this behavior change (and where else might it bite me)?
In prepping for an upgrade to 9.2.3, I stumbled across this:

CREATE TABLE foo
(
  myint integer,
  string1 text,
  string2 text
)
WITH ( OIDS=FALSE );

insert into foo values (12345,'Y','N');

select * from foo f where f.myint = 12345 or f.name='Y'

In 9.2.3, this returns:

ERROR: column f.name does not exist
LINE 1: select * from foo f where myint = 12345 or f.name='Y'

In 8.4.6, this returns no error (and gives me the row from the table). It looks like the parser is short-circuiting in 8.4.6 before stumbling upon the invalid column name - EXCEPT when the column name is NOT a reserved word (although according to http://www.postgresql.org/docs/current/static/sql-keywords-appendix.html, 'name' is not a reserved word). Example - in 8.4.6, this WILL return an error:

select * from foo f where f.myint = 12345 or f.poopy='Y'

ERROR: column f.poopy does not exist
LINE 2: select * from foo f where f.myint = 12345 or f.poopy='Y'
^

NOTE: The problem (assuming the problem is in 8.4.6) only manifests itself when I use table aliases.

select * from foo f where myint = 12345 or name='Y'

gives an error I would expect:

ERROR: column "name" does not exist
LINE 2: select * from foo f where myint = 12345 or name='Y'
^

Any insight into what change (I pored through the release notes and couldn't find anything) may have 'fixed' this behavior so that I might better head these off before my conversion? (Yes, my example was contrived - and I did have a bug where the wrong column name was used.) -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
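A hedged illustration of the explanation commonly given on the lists for cases like this: older servers could fall back to interpreting f.name as the function/cast call name(f) - the whole row coerced to type "name" - instead of raising a column error, and later releases tightened that fallback. Whether that is precisely what 8.4.6 did here is an assumption, but the attribute/function equivalence is easy to demonstrate:

    -- On a server that accepts it, these two spellings are the same thing:
    select f.name from foo f;   -- attribute notation
    select name(f) from foo f;  -- function notation: row f coerced to type "name"
    -- Neither references a column, which is why no "does not exist" error fired.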
[GENERAL] planner, newly added records and most common values
I've got a scenario where I've got a 2-million-row table. Data from inbound files gets processed into it. A new file might have 10 resulting rows in this table...might have 40K...depends on the source, day of month, etc. I've got a process that parses the file and loads the records into the table...giving it a unique file_id for the overall load and placing that value on each record. Another process will perform a series of queries...joining against that table (for only records with that file_id). The problem is that the planner has no idea how many records might exist for that file_id. If I throw a file_id at the planner that is not in the most-common-values list, it picks a nice number like 384 as its row count estimate. So when I am referencing a new file_id (which obviously isn't IN the most-common-values list yet...regardless of how many records I just loaded, because I haven't run analyze), the planner dutifully estimates that I will get only 384 rows. For large files, this is off by 2 (or god forbid, 3) orders of magnitude. That yields very bad overall plans (regardless of the fact that I have indexes on the file_id column). It seems like I am in a no-win situation. The query I am executing is fairly complex...and when the planner is off by multiple orders of magnitude on a rowcount, it goes way off the tracks in terms of planning. I COULD do an analyze after loading the file...but there is no guarantee that the file I just loaded will end up in the most-common-values list...and I end up with a bad plan. Any thoughts? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
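A minimal sketch of the "analyze after load" mitigation discussed above, narrowed to the one column so it stays cheap (the table name is hypothetical):

    -- After each file load, refresh stats for just the distribution that matters
    ANALYZE inbound_records (file_id);

    -- Raising the stats target widens the MCV list, making it likelier that a
    -- big file's id lands in it:
    ALTER TABLE inbound_records ALTER COLUMN file_id SET STATISTICS 1000;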
Re: [GENERAL] Oddball data distribution giving me planner headaches
--- On Fri, 12/2/11, Tom Lane wrote: > The only real fix for that will require cross-column > statistics, which > we don't have yet --- without such, there's no way for the > planner to > know that distributors have an atypical number of child > customers. The only caveat that I can think of here is that each of the customer_ids in the subselect IS in the MCV list for the parent_customer column of the customer_rel table. If I had listed SPECIFIC customer ids in the subselect, I would get spot-on row estimates. Is there anything there that gives me any hope? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Oddball data distribution giving me planner headaches
--- On Fri, 12/2/11, Tom Lane wrote: > The only real fix for that will require cross-column > statistics, which > we don't have yet --- without such, there's no way for the > planner to > know that distributors have an atypical number of child > customers. > I suspected as such. > At the moment I think the only way to work around this is > to denormalize > your schema a bit. And I feared as much. It's biting me in other areas as well...this unusual distribution of data...certain types of customers have completely different data patterns than others. Back to the drawing board...thanks! -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Oddball data distribution giving me planner headaches
--- On Fri, 12/2/11, David Johnston wrote: > > > Can you wrap the query into an SQL or PL/pgSQL function so > that, at least, > then planner will not be able to see the embedded plan info > in the outer > queries? You use-case may allow you to create one or > more VIEWs wrapping > the function call with pre-determined parameters so that > you do not lose the > ability to write simple select queries without the need for > explicit > function calls. > > No idea how that would impact the planner for the other > queries but maybe > worth a try while waiting for someone more knowledgeable > than myself to > respond. I had actually considered this...because you can define what the 'row estimate' is for a function...except in my case, I have some dynamic inputs that would change what those estimates would need to be. Thanks for the help though! -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
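For reference, a sketch of the ROWS knob being referred to - a set-returning function can carry a fixed planner row estimate (the function name is hypothetical; the table is the one from this thread):

    CREATE OR REPLACE FUNCTION distributor_children(integer)
    RETURNS SETOF customer_rel AS $$
        SELECT * FROM customer_rel WHERE parent_customer = $1;
    $$ LANGUAGE sql STABLE
    ROWS 15000;  -- fixed estimate; the limitation noted above is that
                 -- this value cannot vary with the input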
Re: [GENERAL] Oddball data distribution giving me planner headaches
--- On Fri, 12/2/11, David Johnston wrote: > From: David Johnston > - > My, possibly naïve, observation: > > So aside from the fact the estimates seem to be off the > planner has still > chosen the most effective plan? In that situation no > matter how accurate > you get the statistics you will not gain any performance > because the planner > will never choose a different plan. > Thanks. I'm not interested in optimizing this query...it's fast and efficient. However, the planner thinks that it is going to return a million rows when it is only going to return one thousand. When this query is used as an element of a BIGGER query, that causes me all sorts of planner issues. That's the problem I am trying to solve...why the planner is 3 orders of magnitude off in row estimation. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Oddball data distribution giving me planner headaches
--- On Fri, 12/2/11, David Johnston wrote: > What happens if you disable, say, nested loops and/or index > scans? The planner selects different join/indexing techniques (the query is slower) but the (bad) row estimates remain identical. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Oddball data distribution giving me planner headaches
--- On Fri, 12/2/11, David Johnston wrote:
> From: David Johnston
> What kind of plan does the following give?
>
> EXPLAIN ANALYZE
> SELECT *
> FROM customer_rel p
> JOIN customer c ON (p.parent_customer = c.customer_id)
> WHERE c.customer_type = 'DISTRIBUTOR'

Nearly identical output:

"Nested Loop (cost=0.00..29624.69 rows=931 width=97) (actual time=0.032..1330.208 rows=1025401 loops=1)"
"  -> Seq Scan on customer c (cost=0.00..25429.10 rows=136 width=71) (actual time=0.017..212.059 rows=68 loops=1)"
"       Filter: (customer_type = 'DISTRIBUTOR'::bpchar)"
"  -> Index Scan using rel_parent on customer_rel p (cost=0.00..30.76 rows=7 width=26) (actual time=0.006..7.732 rows=15079 loops=68)"
"       Index Cond: (parent_customer = c.customer_id)"
"Total runtime: 1544.281 ms"

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Oddball data distribution giving me planner headaches
Oddball data distribution giving me headaches. We have a distinct 'customer' table with customer_id, type and name/demographic information. Assume some 1 million rows in the customer table. We then have a customer 'relationship' table which simply contains 2 columns…designating parent and child relationships…but allowing complex hierarchies between customers.

CREATE TABLE customer_rel
(
  parent_customer integer NOT NULL,
  child_customer integer NOT NULL
)

8 million rows in this table. Oddball distribution. We have some 8 levels of hierarchy (customer type) represented with this table. Every customer gets an entry where parent/child is themselves…and then for every 'upline'. At the highest level, we have 'distributors' which have all other customer types underneath them. Assuming we had some 68 distributors, the entries where THEY are the parent_customer represent nearly a million rows of the 8 million. I have extracted a simple case from a larger query that was generating an off-beat plan because of the unexpected planner row-counts being spewed by a low-level query.

explain analyze
select * from customer_rel where parent_customer in
    (select customer_id from customer where customer_type='DISTRIBUTOR')

"Nested Loop (cost=25429.44..29626.39 rows=931 width=0) (actual time=216.325..1238.091 rows=1025401 loops=1)"
"  -> HashAggregate (cost=25429.44..25430.80 rows=136 width=4) (actual time=216.304..216.339 rows=68 loops=1)"
"       -> Seq Scan on customer (cost=0.00..25429.10 rows=136 width=4) (actual time=0.018..216.226 rows=68 loops=1)"
"            Filter: (customer_type = 'DISTRIBUTOR'::bpchar)"
"  -> Index Scan using rel_parent on customer_rel (cost=0.00..30.76 rows=7 width=4) (actual time=0.006..8.190 rows=15079 loops=68)"
"       Index Cond: (parent_customer = customer.customer_id)"
"Total runtime: 1514.810 ms"

The fact that the top-level nested loop THINKS it will only be returning 931 rows (instead of over 1 million) is the killer here…3 orders of magnitude. The results of this query are used as part of a bigger query, and the screwed-up stats are causing all sorts of havoc upline. I'm experimenting in 9.1.0…have set the statistics to 1000 (and 10000) on both columns (parent and child) to little effect. Have hardcoded the n_distinct on the parent_customer column to be 1,000,000 also…with no effect (doing an analyze of the table after each change). Does this oddball data distribution doom me to poor planning forever? Any other thoughts? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
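For reference, the knobs mentioned in the last paragraph spelled out as commands (9.x syntax; ANALYZE must follow for any of it to take effect):

    ALTER TABLE customer_rel ALTER COLUMN parent_customer SET STATISTICS 1000;
    ALTER TABLE customer_rel ALTER COLUMN child_customer  SET STATISTICS 1000;
    -- Hardcode the distinct-value estimate the planner uses for the column
    ALTER TABLE customer_rel ALTER COLUMN parent_customer SET (n_distinct = 1000000);
    ANALYZE customer_rel;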
[GENERAL] Convert database from ASCII to UTF-8
It's come time to bite the bullet and convert a half-terabyte database from ASCII to UTF-8. I have gone through a bit of effort to track down the unclean ASCII text and repair it, but would like to avoid the outage of a many-, many-hour dump/restore. Using Postgres 8.4.X. Are there any other magic options open to me? Any way to do an in-place conversion? I assume slony replication is an option. What about some sort of WAL log-shipping replication? Any thoughts would be appreciated. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
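As a trivial sanity check before picking a conversion route, the current server-side encodings can be confirmed with a standard catalog query:

    SELECT datname, pg_encoding_to_char(encoding) AS encoding
    FROM pg_database;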
[GENERAL] Table Vacuum (to prevent wraparound)
PostgreSQL 8.4.4 on x86_64-pc-solaris2.10, compiled by GCC gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath), 64-bit

2 large tables (around a billion rows each) recently received attention from autovacuum. They both currently show up in my currently running queries as "autovacuum: VACUUM tablename (to prevent wraparound)". These tables are insert-only audit tables…no updates, no deletes. Autovacuum settings are default with no overrides for these specific tables (the only override is to set 10 max_workers). I will admit that the "Preventing Transaction ID Wraparound Failures" chapter in the manual (and the subsequent autovacuum daemon chapter) makes my head spin. Is there some specific query related to relfrozenxid/datfrozenxid that I should be utilizing to do some analysis? Obviously SOME threshold has been reached…hence autovacuum stepping in. My two real questions are:

1. Am I in any real danger here? I have run autovacuum continuously since the dump-restore that created this database 6 months ago.
2. These autovacuums have been running for 5 days now. I have a database maintenance window tonight where I have to take down the server. I assume autovacuum will re-awaken and decide to start vacuuming these again. I assume I am affecting performance with these needless? vacuums. Any tweak I can/should make to any of the freeze_age params to prevent this?

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
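A sketch of the sort of relfrozenxid check being asked about (standard catalog query; compare the ages against autovacuum_freeze_max_age, 200 million by default):

    -- Tables closest to forcing an anti-wraparound vacuum
    SELECT c.relname,
           age(c.relfrozenxid) AS xid_age
    FROM pg_class c
    WHERE c.relkind = 'r'
    ORDER BY age(c.relfrozenxid) DESC
    LIMIT 10;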
Re: [GENERAL] Advice on contingency plan for DAS array with separate local WAL drives
On 6/14/10 8:05 AM, "Jeff Amiel" wrote: > What is recommended in terms of prep/switchover in this instance? Should we > be rsyncing or using built-in wal-log shipping of these transaction logs to > our stand-by server? Simply pop out these drives and hand-move them to the > standby-server? (assuming they are not the issue in the first place) I should note that this 'similar' server is not intended to actually be a real warm standby server in the postgres sense...it serves other purposes...it simply possesses the right configuration (memory, cpu) to act as a temporary database server. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Advice on contingency plan for DAS array with separate local WAL drives
Recently migrated to a shiny new 8.4.4 postgres instance...data stored on an attached storage array. Transaction logs are stored on 2 local mirrored drives (local to the database server itself) for best performance. While we are replicating (using slony) to our DR site, our first-choice plan (in the event of an issue simply with the database server itself) was to disconnect the SAS cables from it (connected to the DAS) and connect them to a 'similar' box we have on standby. With the WAL logs physically on the drives of the original database server, this obviously becomes an issue. What is recommended in terms of prep/switchover in this instance? Should we be rsyncing or using built-in WAL log shipping of these transaction logs to our standby server? Simply pop out these drives and hand-move them to the standby server? (assuming they are not the issue in the first place) Any thoughts would be appreciated. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
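For illustration, a minimal sketch of the built-in log-shipping option mentioned above (8.4-era postgresql.conf settings; the hostname and paths are placeholders, and rsync is just one possible transport):

    # postgresql.conf on the primary
    archive_mode = on
    archive_command = 'rsync -a %p standby:/wal_archive/%f'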
Re: [GENERAL] 3rd time is a charm.....right sibling is not next child crash.
On 6/8/10 2:03 PM, "Alvaro Herrera" wrote: > > I've seen this problem (and others) in a high-load environment. Not > Slony related though. > > I wrote a small tool to check btree index files for consistency problems > such as this one, by parsing pg_filedump output. I've seen strange > things such as index pointers pointing to pages that shouldn't have been > pointed to; mismatching sibling pointers; and others. > > Do you have a copy of the broken index file? Alas, no. But I have another 5 days to reproduce the problem before changing hardware/versions (which will probably bring a whole new set of 'opportunities' to deal with). If it happens again, I will snag the index file for analysis. Any way you can share your tool source so I can proactively look for issues in my indexes? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] 3rd time is a charm.....right sibling is not next child crash.
On 6/8/10 1:15 PM, "Jaime Casanova" wrote: > On Tue, Jun 8, 2010 at 12:49 PM, Jeff Amiel wrote: >> >> Does Slony manage its own vacuuming separate from postgres' autovacuum? >> > > Yes it does: http://www.slony.info/documentation/maintenance.html "It seems preferable to configure autovacuum to avoid vacuum Slony-I-managed configuration tables." Hmmm...I don't do this. Surely this is not related to my corrupt indexes...2 attempted vacuums on the same indexes? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] 3rd time is a charm.....right sibling is not next child crash.
On 6/8/10 12:56 PM, "Tom Lane" wrote:
> Jeff Amiel writes:
>> On a side note, I am 100% sure that autovacuum was disabled when I brought
>> the database back up after the core dump(s). However, minutes after
>> restarting, some of my larger tables started getting vacuumed by the pgsql user.
>> Any way that a vacuum would kick off for a particular table (or series of
>> tables) even when autovacuum was off in the postgresql.conf?
>
> Anti-transaction-wraparound vacuums, perhaps?

I would hope not. :) This is postgres 8.2.X. Autovacuum has been enabled forever (seemingly with no errors). Anything I can look for? (I searched the logs for references to "must be vacuumed within" but found nothing)

SELECT datname, age(datfrozenxid) FROM pg_database;

 datname   |    age
-----------+-----------
 postgres  | 178649703
 prod      | 204588751
 template1 | 178653277
 template0 | 178653131

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] 3rd time is a charm.....right sibling is not next child crash.
On 6/8/10 11:23 AM, "Tom Lane" wrote: > In your original report you mentioned that the next autovacuum attempt > on the same table succeeded without incident. Has that been true each > time? I wonder whether this is some transient state, rather than actual > corruption that you need to REINDEX to recover from. > I didn't waste time during this event and reindexed as quick as I could. I will note, however, that these indexes (specifically the one that generated the error) were in use (successfully) AFTER the event and BEFORE the reindex by the slony triggers (by virtue of inserts into the log tables and such), even though I stopped autovacuum and the slon daemon. Not reads, obviously...just inserts/updates. If nobody else has seen/is seeing this, I will chalk this whole scenario up to oddball SAN issues (we've logged many a write/read error over the year(s), causing corruption on these heavily manipulated indexes). I'll be glad to move to my attached storage as quickly as possible. On a side note, I am 100% sure that autovacuum was disabled when I brought the database back up after the core dump(s). However, minutes after restarting, some of my larger tables started getting vacuumed by the pgsql user. Any way that a vacuum would kick off for a particular table (or series of tables) even when autovacuum was off in the postgresql.conf? My only manual vacuum process is kicked off late at night, so this was not it. Also...before I had a chance to disable the slon daemon, I also noticed other slony tables being vacuum analyzed (even though autovacuum was off). Does Slony manage its own vacuuming separate from postgres' autovacuum? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Restore roles with same oid
On 6/8/10 10:47 AM, "Stephen Frost" wrote: > Then you've been through this before.. Perhaps you should go check out > what you did then. Back before 8.1, we didn't use OIDs for > users/groups. :) Changing to OIDs was part of the work that I did to > add role support. Hmmm...this code has been in place since March 2005, using the usesysid from pg_shadow. 8.1 wasn't released until November of 2005. Doesn't matter...I still have some work to do. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Restore roles with same oid
On 6/8/10 10:39 AM, "Stephen Frost" wrote: > I'm afriad you're not going to have a choice.. I would recommend > creating a mapping from the old IDs to the new ones as part of this > upgrade, to keep the historical information. Guess it's not nice to > point this out- but you really shouldn't have ever used OIDs for > something external to PG (or even internally, really). :) To be honest...this was done early in our postgresql careers (back in the 7.X days). We knew so little. :) -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Restore roles with same oid
On 6/8/10 10:30 AM, "Thom Brown" wrote: > Can't you switch to using role names? I don't think oids are intended > to be used by anything other than PostgreSQL. :( If only I could...massive audit tables contain these IDs, with years of data. We have a plan to change to sequence values stored in one of our tables (one for each of our postgresql users) but don't want to pull the trigger on that plan as part of this upgrade if we can help it. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Restore roles with same oid
We currently use the 'usesysid' column from pg_shadow (which is really pg_authid.oid, I assume) for a trigger-based auditing mechanism. We are about to do a dump from an 8.2 database into 8.4 and would like to preserve the usesysid/oid when restoring. No matter what options I throw at pg_dumpall, it gives me 'create role' commands...and obviously no preservation of the oids. Any suggestions on how I can manage this? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
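One hedged workaround sketch (not something pg_dumpall supports directly): capture an old-oid-to-name mapping before the upgrade, then translate historical audit rows through role names to the new oids after restore. The mapping table name below is hypothetical:

    -- On the 8.2 cluster, before the dump:
    CREATE TABLE role_oid_map AS
    SELECT oid AS old_oid, rolname FROM pg_authid;

    -- On the 8.4 cluster, after restore, resolve old ids to new oids:
    SELECT m.old_oid, a.oid AS new_oid, a.rolname
    FROM role_oid_map m
    JOIN pg_authid a USING (rolname);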
[GENERAL] 3rd time is a charm.....right sibling is not next child crash.
Not looking for help...just putting some data out there. 2 previous crashes caused by corrupt slony indexes:

http://archives.postgresql.org/pgsql-general/2010-02/msg00022.php
http://archives.postgresql.org/pgsql-general/2009-12/msg01172.php

New one yesterday:

Jun 7 15:05:01 db-1 postgres[9334]: [ID 748848 local0.crit] [3989781-1] 2010-06-07 15:05:01.087 CDT 9334 PANIC: right sibling 169 of block 168 is not next child of 249 in index "sl_seqlog_idx"

We are on the eve of switching off our SAN to some direct attached storage and upgrading postgres and slony in the process this weekend...so any thoughts that it might be hardware, driver or even postgres/slony should be alleviated by the fact that everything is changing. That being said, each time this has happened it has been a slony index that was corrupt, and I find that 'odd'. While I can't imagine a bug in slony corrupting postgres indexes...and I can't imagine a bug in postgres corrupting only slony indexes, I don't really know what to think. Just putting this out there in case anyone has similar issues or can use this data in some meaningful way. Stack trace looks similar to last time.

Program terminated with signal 6, Aborted.
#0 0xfecba227 in _lwp_kill () from /lib/libc.so.1
(gdb) bt
#0 0xfecba227 in _lwp_kill () from /lib/libc.so.1
#1 0xfecb598f in thr_kill () from /lib/libc.so.1
#2 0xfec61ed3 in raise () from /lib/libc.so.1
#3 0xfec41d0d in abort () from /lib/libc.so.1
#4 0x0821b8a6 in errfinish (dummy=0) at elog.c:471
#5 0x0821c74b in elog_finish (elevel=22, fmt=0x82b7780 "right sibling %u of block %u is not next child of %u in index \"%s\"") at elog.c:964
#6 0x0809e1a0 in _bt_pagedel (rel=0x867bcd8, buf=139905, stack=0x86b3768, vacuum_full=0 '\0') at nbtpage.c:1141
#7 0x0809f835 in btvacuumscan (info=0x8043f70, stats=0x86b5c30, callback=0, callback_state=0x0, cycleid=29488) at nbtree.c:936
#8 0x0809fc65 in btbulkdelete (fcinfo=0x0) at nbtree.c:547
#9 0x0821f424 in FunctionCall4 (flinfo=0x0, arg1=0, arg2=0, arg3=0, arg4=0) at fmgr.c:1215
#10 0x0809a89f in index_bulk_delete (info=0x8043f70, stats=0x0, callback=0x812ffc8 , callback_state=0x86b5818) at indexam.c:573
#11 0x0812ff54 in lazy_vacuum_index (indrel=0x867bcd8, stats=0x86b5b70, vacrelstats=0x86b5818) at vacuumlazy.c:660
#12 0x0813055a in lazy_vacuum_rel (onerel=0x867b7f8, vacstmt=0x86659b8) at vacuumlazy.c:487
#13 0x0812e910 in vacuum_rel (relid=140925368, vacstmt=0x86659b8, expected_relkind=114 'r') at vacuum.c:1107
#14 0x0812f95a in vacuum (vacstmt=0x86659b8, relids=0x8665bc0) at vacuum.c:400
#15 0x08186e16 in AutoVacMain (argc=0, argv=0x0) at autovacuum.c:914
#16 0x08187278 in autovac_start () at autovacuum.c:178
#17 0x0818bfed in ServerLoop () at postmaster.c:1252
#18 0x0818d16d in PostmasterMain (argc=3, argv=0x833adc8) at postmaster.c:966
#19 0x08152cce in main (argc=3, argv=0x833adc8) at main.c:188
(gdb)

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Another PANIC corrupt index/crash ...any thoughts?
About a month ago I posted about a database crash possibly caused by a corrupt index:

Dec 30 17:41:57 db-1 postgres[28957]: [ID 748848 local0.crit] [34004622-1] 2009-12-30 17:41:57.825 CST 28957 PANIC: right sibling 2019 of block 2018 is not next child of 1937 in index "sl_log_2_idx1"

It has since happened again with a DIFFERENT index (interestingly, also a slony-related index):

Jan 29 15:17:42 db-1 postgres[29025]: [ID 748848 local0.crit] [4135622-1] 2010-01-29 15:17:42.915 CST 29025 PANIC: right sibling 183 of block 182 is not next child of 158 in index "sl_seqlog_idx"

I re-indexed the table...and restarted the database and all appears well (shut down autovacuum and slony for a while first to get my feet underneath me, and then restarted after a few hours with no apparent ill effects). Coincidentally (or not), we started getting disk errors about a minute AFTER the above error (db storage is on a fibre attached SAN):

/var/log/archive/log-2010-01-29.log:Jan 29 15:18:50 db-1 scsi_vhci: [ID 734749 kern.warning] WARNING: vhci_scsi_reset 0x1
/var/log/archive/log-2010-01-29.log:Jan 29 15:18:50 db-1 scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci10de,5...@d/pci1077,1...@0/f...@0,0 (fcp1):
/var/log/archive/log-2010-01-29.log:Jan 29 15:18:52 db-1 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/d...@g000b08001c001958 (sd9):
/var/log/archive/log-2010-01-29.log:Jan 29 15:18:52 db-1 scsi: [ID 107833 kern.notice] Requested Block: 206265378 Error Block: 206265378
/var/log/archive/log-2010-01-29.log:Jan 29 15:18:52 db-1 scsi: [ID 107833 kern.notice] Vendor: Pillar Serial Number:
/var/log/archive/log-2010-01-29.log:Jan 29 15:18:52 db-1 scsi: [ID 107833 kern.notice] Sense Key: Unit Attention
/var/log/archive/log-2010-01-29.log:Jan 29 15:18:52 db-1 scsi: [ID 107833 kern.notice] ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0

Stack trace from recent crash is below:

Program terminated with signal 6, Aborted.
#0 0xfed00c57 in _lwp_kill () from /lib/libc.so.1
(gdb) bt
#0 0xfed00c57 in _lwp_kill () from /lib/libc.so.1
#1 0xfecfe40e in thr_kill () from /lib/libc.so.1
#2 0xfecad083 in raise () from /lib/libc.so.1
#3 0xfec90b19 in abort () from /lib/libc.so.1
#4 0x0821b6ea in errfinish (dummy=0) at elog.c:471
#5 0x0821c58f in elog_finish (elevel=22, fmt=0x82b7200 "right sibling %u of block %u is not next child of %u in index \"%s\"") at elog.c:964
#6 0x0809e0a8 in _bt_pagedel (rel=0x8602f78, buf=377580, stack=0x881d660, vacuum_full=0 '\0') at nbtpage.c:1141
#7 0x0809f73d in btvacuumscan (info=0x8043f60, stats=0x8578410, callback=0, callback_state=0x0, cycleid=20894) at nbtree.c:936
#8 0x0809fb6d in btbulkdelete (fcinfo=0x0) at nbtree.c:547
#9 0x0821f268 in FunctionCall4 (flinfo=0x0, arg1=0, arg2=0, arg3=0, arg4=0) at fmgr.c:1215
#10 0x0809a7a7 in index_bulk_delete (info=0x8043f60, stats=0x0, callback=0x812fea0 , callback_state=0x85765e8) at indexam.c:573
#11 0x0812fe2c in lazy_vacuum_index (indrel=0x8602f78, stats=0x85769c8, vacrelstats=0x85765e8) at vacuumlazy.c:660
#12 0x08130432 in lazy_vacuum_rel (onerel=0x8602140, vacstmt=0x85d9f48) at vacuumlazy.c:487
#13 0x0812e7e8 in vacuum_rel (relid=140353352, vacstmt=0x85d9f48, expected_relkind=114 'r') at vacuum.c:1107
#14 0x0812f832 in vacuum (vacstmt=0x85d9f48, relids=0x85d9f38) at vacuum.c:400
#15 0x08186cee in AutoVacMain (argc=0, argv=0x0) at autovacuum.c:914
#16 0x08187150 in autovac_start () at autovacuum.c:178
#17 0x0818bec5 in ServerLoop () at postmaster.c:1252
#18 0x0818d045 in PostmasterMain (argc=3, argv=0x83399a8) at postmaster.c:966
#19 0x08152ba6 in main (argc=3, argv=0x83399a8) at main.c:188

Any thoughts on how I should proceed? We are planning an upgrade to 8.4 in the short term, but I can see no evidence of fixes since the 8.2 version that would relate to index corruption. I have no real evidence of bad disks...iostat -E reports:

# iostat -E
sd2 Soft Errors: 1 Hard Errors: 4 Transport Errors: 0
Vendor: Pillar Product: Axiom 300 Revision: Serial No: Size: 2.20GB <2200567296 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 4 Recoverable: 0 Illegal Request: 1 Predictive Failure Analysis: 0
sd3 Soft Errors: 1 Hard Errors: 32 Transport Errors: 0
Vendor: Pillar Product: Axiom 300 Revision: Serial No: Size: 53.95GB <53948448256 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 32 Recoverable: 0 Illegal Request: 1 Predictive Failure Analysis: 0
sd7 Soft Errors: 1 Hard Errors: 40 Transport Errors: 8
Vendor: Pillar Product: Axiom 300 Revision: Serial No: Size: 53.95GB <53948448256 bytes>
Media Error: 0 Device Not Ready: 1 No Device: 33 Recoverable: 0 Illegal Request: 1 Predictive Failure Analysis: 0
sd8 Soft Errors: 1 Hard Errors: 34 Transport Errors: 0
Vendor: Pillar Product: Axiom 300 Revision: Serial No: Size: 107.62GB <107622432256 bytes>
Media Error: 0 De
Re: [GENERAL] PANIC: right sibling 2019 of block 2018 is not next child of 1937 in index "sl_log_2_idx1"
--- On Wed, 12/30/09, Jeff Amiel wrote: > Subject: PANIC: right sibling 2019 of block 2018 is not next child of 1937 > in index "sl_log_2_idx1" I am assuming a re-index for that particular index will rebuild/fix the index (if it happens again). Any other thoughts? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
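For readers hitting the same thing, the rebuild alluded to is just the following (note that plain REINDEX locks out writes to the index's parent table and blocks reads that would use the index while it runs):

    REINDEX INDEX sl_log_2_idx1;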
[GENERAL] PANIC: right sibling 2019 of block 2018 is not next child of 1937 in index "sl_log_2_idx1"
I know I don't have a lot of data right now (still looking for a core dump). Any obvious thoughts or advice until I can get more info?

Dec 30 17:41:57 db-1 postgres[28957]: [ID 748848 local0.crit] [34004622-1] 2009-12-30 17:41:57.825 CST 28957 PANIC: right sibling 2019 of block 2018 is not next child of 1937 in index "sl_log_2_idx1"
Dec 30 17:46:17 db-1 postgres[17576]: [ID 748848 local0.info] [34005240-1] 2009-12-30 17:46:17.279 CST 17576 LOG: autovacuum process (PID 28957) was terminated by signal 6
Dec 30 17:46:17 db-1 postgres[17576]: [ID 748848 local0.info] [34005241-1] 2009-12-30 17:46:17.279 CST 17576 LOG: terminating any other active server processes

The database recovered itself...and autovacuum of sl_log_2 started up again with no obvious issue right afterwards.

PostgreSQL 8.2.12 on i386-pc-solaris2.10, compiled by GCC gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath)
Oldish version of slony (1.2.10)

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] empty string causes planner to avoid index. Makes me sad.
--- On Fri, 11/27/09, Tom Lane wrote:
> You didn't show us any evidence of that, either. Both of your test
> cases are using the index.

Ok...third try. The cost when passing in an empty string is SIGNIFICANTLY higher than when not. It wouldn't seem that the planner is using the index effectively.

Aggregate (cost=231884.57..231884.57 rows=1 width=0)

versus

Aggregate (cost=34.83..34.83 rows=1 width=0)

By -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] empty string causes planner to avoid index. Makes me sad.
hmm...ok...planner is not using the index effectively (as effectively as when a non-empty value is passed in) --- On Fri, 11/27/09, Tom Lane wrote: > From: Tom Lane > Subject: Re: [GENERAL] empty string causes planner to avoid index. Makes me > sad. > To: "Jeff Amiel" > Cc: pgsql-general@postgresql.org > Date: Friday, November 27, 2009, 3:14 PM > Jeff Amiel > writes: > > It appears that somehow the empty string is causing > the planner to abandon the index. > > You didn't actually show us such a case... > > > regards, tom lane > -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] empty string causes planner to avoid index. Makes me sad.
PostgreSQL 8.2.12 on i386-pc-solaris2.10, compiled by GCC gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath)

CREATE TABLE items
(
  field1 character(9) NOT NULL,
  field2 character varying(17) NOT NULL
);

CREATE INDEX "field1-field2" ON items USING btree (field1, field2);

About 15 million rows in the items table.

explain select count(*) from items where field1 = '102100400' and field2 = '';

Aggregate (cost=231884.57..231884.57 rows=1 width=0)
  -> Bitmap Heap Scan on items (cost=4286.53..231841.95 rows=170468 width=0)
       Recheck Cond: ((field1 = '102100400'::bpchar) AND ((field2)::text = ''::text))
       -> Bitmap Index Scan on "field1-field2-check" (cost=0.00..4282.27 rows=170468 width=0)
            Index Cond: ((field1 = '102100400'::bpchar) AND ((field2)::text = ''::text))

explain select count(*) from items where field1 = '102100400' and field2 = '                 '; /* 17 spaces */

Aggregate (cost=34.83..34.83 rows=1 width=0)
  -> Index Scan using "field1-field2" on items (cost=0.00..34.82 rows=18 width=0)
       Index Cond: ((field1 = '102100400'::bpchar) AND ((field2)::text = '                 '::text))

If I have any value in field2 other than an empty string '' (like '1' or a space), it will use the index scan. It appears that somehow the empty string is causing the planner to abandon the index. Can I get any insights into this? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
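A hedged diagnostic sketch: if '' is genuinely very frequent in field2, the higher-cost bitmap plan may simply reflect an honest (or inflated) row estimate for that value rather than the index being abandoned. The statistics the planner consulted can be inspected directly:

    -- Is '' in the most-common-values list for field2, and at what frequency?
    SELECT most_common_vals, most_common_freqs, n_distinct
    FROM pg_stats
    WHERE tablename = 'items'
      AND attname = 'field2';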
Re: [GENERAL] Frustrated...pg_dump/restore
--- On Mon, 10/6/08, Scott Marlowe <[EMAIL PROTECTED]> wrote: > I'm wondering if the OP has some line breaks in his > data that are > getting misinterpreted, or maybe his encoding on the two > dbs is > different and he's not taking care of that. Ahhh *looks at encoding* Well...they are both the same...BUT...they are set to ENCODING = 'SQL_ASCII'; That explains a lot...they should probably be set to Unicode UTF8. Duh. Any way to change encoding without dumping/restoring the database? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] function returning setof..select versus select * from
What is the difference between:

select foo();

and

select * from foo();

Foo is defined as:

CREATE OR REPLACE FUNCTION foo()
  RETURNS SETOF integer AS
'SELECT column from foo_table;'
  LANGUAGE 'sql' STABLE;

Explain shows the difference...

explain select * from foo()
"Function Scan on foo (cost=0.00..1.25 rows=1000 width=4)"

explain select foo();
"Result (cost=0.00..0.00 rows=1 width=0)"

They both return the same results...yet yield different plans...and different speeds when using 'real' data. Why come? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Frustrated...pg_dump/restore
I performed a pg_dump on a database and created a new schema-only database to copy that data into. However, trying to use psql -f to load the data in, I get a plethora of syntax errors, including the dreaded "invalid command \N". I even tried to pipe the pg_dump results directly into the psql command:

/usr/local/pgsql/bin/pg_dump -U pgsql --data-only db1 | /usr/local/pgsql/bin/psql -U pgsql db2

Same results. Why? Using -d (switching to inserts instead of COPY) seems to work just fine but is so slow as to be unusable. Is the COPY pg_dump method useful at all in ANY situation? Do I have to do a pg_dump using a custom archive option and use pg_restore to make this work? (sounds silly to me). Any help would be appreciated. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] The planner hates me.
-----Original Message----- From: Tom Lane [mailto:[EMAIL PROTECTED]] The Right Way (tm) to do this would be something like create temp table dates as select * from get_dates(...); analyze dates; ... original select, but join against temp table ... which would leave the planner armed with some stats about the range of dates of interest. I actually tried that, but minus the analyze. :( Btw, congrats to whoever contributed the bitmap scans (v8.1 I believe)... the ability to use multiple indexes for a single table in the same query is fantastic!!
Re: [GENERAL] The planner hates me.
-----Original Message----- From: Hoover, Jeffrey [mailto:[EMAIL PROTECTED]] Change t.date2 < dates.date to t.date2+0 < dates.date, so the join condition becomes: (t.state='I' or t.date1 >= dates.date) and t.date2+0 < dates.date. The +0 expression keeps the planner from choosing t_date2_index.
[GENERAL] The planner hates me.
"PostgreSQL 8.2.4 on i386-pc-solaris2.10, compiled by GCC gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath)" (test environment) Picture a table called 'transaction' with 1 million rows. most (> 99% of the records have date1 and date2 values in the past (spread over 4 years) 99.99% of the records have a state OTHER than 'I'. CREATE TABLE transaction ( amount numeric(10,2) NOT NULL, date1 date NOT NULL, state character(1) NOT NULL, date2 date DEFAULT date(now()) ) CREATE INDEX t_date1_index ON transaction USING btree (date1); CREATE INDEX t_date2_index ON transaction USING btree (date2); CREATE INDEX t_state_index ON transaction USING btree (state); explain analyze select sum(amount), dates.date as date from transaction t join (select get_dates as date from get_dates('09/17/08','09/24/08')) dates on (t.state='I' or t.date1 >= dates.date) group by dates.date get_dates simply returns each date between (and including) the passed dates ...and is a 'stable' function). Yes...I know I probably could have used a generate_series or something...but this was written before I knew anything about that "HashAggregate (cost=1290485.15..1290485.40 rows=200 width=17) (actual time=277.804..277.809 rows=8 loops=1)" " -> Nested Loop (cost=270.37..1123134.88 rows=334700533 width=17) (actual time=3.182..153.741 rows=120536 loops=1)" "-> Function Scan on get_dates (cost=0.00..1.25 rows=1000 width=4) (actual time=0.057..0.065 rows=8 loops=1)" "-> Bitmap Heap Scan on "transaction" t (cost=270.37..618.60 rows=336357 width=22) (actual time=3.093..10.958 rows=15067 loops=8)" " Recheck Cond: ((t.state = 'I'::bpchar) OR (t.date1 >= get_dates.get_dates))" " -> BitmapOr (cost=270.37..270.37 rows=336357 width=0) (actual time=2.853..2.853 rows=0 loops=8)" "-> Bitmap Index Scan on t_state_index (cost=0.00..8.97 rows=4971 width=0) (actual time=2.842..2.842 rows=15067 loops=8)" " Index Cond: (state = 'I'::bpchar)" "-> Bitmap Index Scan on t_date1_index (cost=0.00..252.99 rows=331387 width=0) (actual time=0.009..0.009 rows=2 loops=8)" " Index Cond: (t.date1 >= get_dates.get_dates)" "Total runtime: 277.883 ms" Uses bitmap scans for optimization and performs admirably. BUT...when I change the query thusly... (adding in an addition AND clause) explain analyze select sum(amount), dates.date as date from transaction t join (select get_dates as date from get_dates('09/17/08','09/24/08')) dates on (t.state='I' or t.date1 >= dates.date)and t.date2 < dates.date group by dates.date "HashAggregate (cost=1222618.09..1222618.34 rows=200 width=15) (actual time=7538.193..7538.199 rows=8 loops=1)" " -> Nested Loop (cost=0.00..1166174.15 rows=112887885 width=15) (actual time=0.889..7411.997 rows=120522 loops=1)" "Join Filter: ((t.state = 'I'::bpchar) OR (t.date1 >= get_dates.get_dates))" "-> Function Scan on get_dates (cost=0.00..1.25 rows=1000 width=4) (actual time=0.055..0.062 rows=8 loops=1)" "-> Index Scan using t_date2_index on "transaction" t (cost=0.00..590.77 rows=328800 width=24) (actual time=0.018..492.348 rows=986273 loops=8)" " Index Cond: (t.date2 < get_dates.get_dates)" "Total runtime: 7538.259 ms" I am pulling my hair out. Why does it insist on using the index on the date2 field? It was doing so well with the bitmap indexing on the other fields...if the planner would simply do the same with date2, all would be well. I really need The date2 portion of the query to filter AFTER the first two clauses, because those are quicker and narrows the data down to a manageable size Any suggestions on query or index changes? 
I still need the date2 index for other queries... but the fact that it gets used exclusively for this query drives me batty! I've tried various combinations of multi-column indexes to no avail... because I still need the date2 index for other reasons, this query ALWAYS chooses it for some reason. Any help would be appreciated.
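(Putting the two replies in this thread together, the temp table idea and the +0 rewrite, the query would look something like this sketch; the names match the schema above, but the combination is untested here:

create temp table dates as select get_dates as date from get_dates('09/17/08','09/24/08');
analyze dates;
select sum(amount), dates.date as date
from transaction t
join dates on (t.state='I' or t.date1 >= dates.date)
          and t.date2+0 < dates.date   -- the +0 keeps the planner off t_date2_index
group by dates.date;

The analyze gives the planner real row counts for the date list, and wrapping date2 in an expression hides it from the plain index, so the state/date1 bitmap machinery gets used first.)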
[GENERAL] more SSL crash woes....
"PostgreSQL 8.2.4 on i386-pc-solaris2.10, compiled by GCC gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath)" As the proud author of this previous post: http://archives.postgresql.org/pgsql-general/2007-08/msg01911.php I never found a real answer except to disable SSL on the connections between my master and subscriber nodes. Things have been peachy ever since. Todayfirst db crash in quite some time...seemingly unrelated to slony Stack trace looks eerily familiar: Core was generated by `/usr/local/pgsql/bin/postgres -D /db'. Program terminated with signal 11, Segmentation fault. #0 0xfee8ec23 in sk_value () from /usr/local/ssl/lib/libcrypto.so.0.9.8 *grumble* I have located nothing unusual occurring at the time of the event We have developers that connect from win32 and Fedora boxes via PGAdminIII and they use SSL connections...but they have ALWAYS connected using SSL. Any suggestions? I really need to try to either provide an explanation or make SOME change to prevent upgrade openssl? (we are on 9.8e) remove ALL encrypted connection capabilities (via pg_hba.conf)? punt? I'm dubious that that stack trace simply 'starts' in the middle of the libcrypto stuff. shouldn't I see some postgresql functions in there somewhere? (postgres compiled with --enable-debug) Any help would be appreciated. You rock. That's why Blockbuster's offering you one month of Blockbuster Total Access, No Cost. http://tc.deals.yahoo.com/tc/blockbuster/text5.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] More SSL crash woes
Tom Lane wrote: One idea: you are linking to /usr/local/ssl/lib/libssl.so, but is it possible that when you compile PG it is finding the header files for some other version? Sure enough... I put an #error at the top of the 'old/incorrect' ssl.h and did a make clean/make and it errored out. So I was building with 0.9.8 libraries... but 0.9.7 header files. That can't be good. I guess that would explain why nobody else on the planet has seen this issue :) thanks much for the assist!
Re: [GENERAL] More SSL crash woes
Tom Lane wrote: One idea: you are linking to /usr/local/ssl/lib/libssl.so, but is it possible that when you compile PG it is finding the header files for some other version? Yes... if I could figure out how the include path is being set on the postgresql build. I'm looking at the config.log and I see no reference to -I (to set the include path). It simply references the header files as "openssl/ssl.h". Any way to tell the default include path for gcc? There are two sets: /usr/sfw/include/openssl/ssl.h (older, incorrect one) /usr/local/ssl/include/openssl/ssl.h (newer, 'correct' one) I guess I could build something that #includes openssl/ssl.h and 'butcher' the bad one and see what happens.
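(For the record, gcc will print its include search order if you run the preprocessor verbosely on empty input; a standard trick, though the exact directory list differs per system:

echo | gcc -E -v - 2>&1 | sed -n '/search starts here/,/End of search list/p'

Whichever directory containing openssl/ssl.h appears first in that list, plus any -I flags, wins for a bare "openssl/ssl.h" include.)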
Re: [GENERAL] More SSL crash woes
Jeff Amiel wrote: Now there are 2 subscriber nodes that connect to this node for slony replication... One is running the same version (libssl 0.9.8e) but one is running 0.9.7e-p1_2. Could this be an issue? Note that both nodes are set to 'hostnossl' in the pg_hba.conf
Re: [GENERAL] More SSL crash woes
Tom Lane wrote: The previous thread suggested that you might have a problem with different bits of code being linked to different versions of libssl. Did you ever resolve that? Given the lack of other reports, I'm pretty suspicious that it's something like that, rather than a real bug in either slony or PG.

# ldd /usr/local/pgsql/bin/postgres
...
libssl.so.0.9.8 => /usr/local/ssl/lib/libssl.so.0.9.8
libcrypto.so.0.9.8 => /usr/local/ssl/lib/libcrypto.so.0.9.8

# ldd /usr/local/pgsql/bin/slon
...
libssl.so.0.9.8 => /usr/local/ssl/lib/libssl.so.0.9.8
libcrypto.so.0.9.8 => /usr/local/ssl/lib/libcrypto.so.0.9.8

Now there are 2 subscriber nodes that connect to this node for slony replication... One is running the same version (libssl 0.9.8e) but one is running 0.9.7e-p1_2. Could this be an issue? So let's ask what is different between my config and the rest of the world. The stack trace actually was one more level deep, and the reference to 'output_cert_chain' got me thinking:

#0 0xfee8ec23 in sk_value () from /usr/local/ssl/lib/libcrypto.so.0.9.8
#1 0xfef5b05b in ssl3_output_cert_chain () from /usr/local/ssl/lib/libssl.so.0.9.8
#2 0x in ?? ()

Is it unique that I use SSL for encryption but not for authentication? I have no root.crt (and see the warning in my logs about "could not load root certificate file "root.crt": No such file or directory. Will not verify client certificates.") Is this unusual? Do other people use SSL with postgres JUST for encryption? Is there something wrong with the way we build/install libssl? We currently do a pkgadd of the binary from sunfreeware:

/usr/sfw/bin/wget ftp://ftp.sunfreeware.com/pub/freeware/intel/10/openssl-0.9.8e-sol10-x86-local.gz
gzip -d openssl-0.9.8e-sol10-x86-local.gz
pkgadd -d openssl-0.9.8e-sol10-x86-local

I went back and researched the nearly identical problems we were having under FreeBSD, and the stack trace (using a slightly different/older version of libssl) looks like a different spot:

(gdb) bt
#0 0x2838e492 in SHA1_Init () from /lib/libcrypto.so.3
#1 0x2838a14a in X509_check_private_key () from /lib/libcrypto.so.3
#2 0x2838a459 in EVP_DigestInit_ex () from /lib/libcrypto.so.3

Any other thoughts?
[GENERAL] More SSL crash woes
"PostgreSQL 8.2.4 on i386-pc-solaris2.10, compiled by GCC gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath)" As the proud author of this previous post: http://archives.postgresql.org/pgsql-general/2007-08/msg01911.php I never found a real answer except to disable SSL on the connections between my master and subscriber nodes (instead shuttling data over a secure tunnel) Things have been peachy ever since. Todayfirst db crash in quite some time...seemingly unrelated to slony Stack trace looks eerily familiar: Core was generated by `/usr/local/pgsql/bin/postgres -D /db'. Program terminated with signal 11, Segmentation fault. #0 0xfee8ec23 in sk_value () from /usr/local/ssl/lib/libcrypto.so.0.9.8 *grumble* I have located nothing unusual occurring at the time of the event We have developers that connect from win32 and Fedora boxes via PGAdminIII and they use SSL connections...but they have ALWAYS connected using SSL. Any suggestions? I really need to try to either provide an explanation or make SOME change to prevent upgrade openssl? (we are on 9.8e) remove ALL encrypted connection capabilities (via pg_hba.conf) and force connectivity over secure tunnel? punt? Looks like this box does not have postgres compiled with --enable-debugbut dunno if it would help anyway being that crash occurs in libcrypto Any help would be appreciated.
Re: [GENERAL] oddly high cost but low actual_time (but not in slony replicants)
Sam Mason wrote: Looks like you've "disabled" seqscans. Sure enough. Thanks.
Re: [GENERAL] oddly high cost but low actual_time (but not in slony replicants)
And yes...I vacuumed...analyzed...vacuum analyzed the table to no avail.
[GENERAL] oddly high cost but low actual_time (but not in slony replicants)
Only 19 rows in the table. explain analyze select * from table_a;

"Seq Scan on table_a (cost=100000000.00..100000001.02 rows=19 width=103) (actual time=0.007..0.012 rows=19 loops=1)"
"Total runtime: 0.040 ms"

If I run the same query on the same table in a different database that I pg_restored from a recent dump... same results:

"Seq Scan on table_a (cost=100000000.00..100000001.02 rows=20 width=135) (actual time=0.009..0.011 rows=20 loops=1)"
"Total runtime: 0.046 ms"

But if I run from a slony replicated node...

"Seq Scan on table_a (cost=0.00..1.20 rows=20 width=103)"

Weird. No indexes on the table except the primary key (bigserial)... Not that it has any impact, but wherever I get the high cost it is running on Solaris with the database on a zfs mounted SAN: "PostgreSQL 8.2.4 on i386-pc-solaris2.10, compiled by GCC gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath)" The low cost comes from FreeBSD nodes running the database locally: "PostgreSQL 8.2.4 on i386-portbld-freebsd6.0, compiled by GCC cc (GCC) 3.4.4 [FreeBSD] 20050518" It appears that many of my tables exhibit this characteristic... If, however, I use any REAL indexes, the cost is much more 'normal'. Any ideas? I'm not really worried... but I was troubleshooting a high-cost query that led me to this table specifically.
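(A cost in the hundred-million range on a tiny table is the planner's disable penalty showing through, which is what the "disabled seqscans" reply above is pointing at. A quick confirmation, sketched:

show enable_seqscan;
set enable_seqscan = on;
explain select * from table_a;

With the setting back on, the cost should drop to the same ~1.20 the slony node shows.)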
Re: [GENERAL] AutoVacuum Behaviour Question
Bruce Momjian wrote: No, it isn't. Please add a TODO item about it: * Prevent long-lived temp tables from causing frozen-Xid advancement starvation Can somebody explain this one to me? Because of our auditing technique, we have many LONG lived temp tables (one per pooled connection)... so as long as the pool isn't disturbed, these temp tables can exist for a long time (weeks? months?). (previous thread about our use of temp tables and autovacuum/xid issues) http://archives.postgresql.org/pgsql-general/2007-01/msg00690.php http://archives.postgresql.org/pgsql-general/2007-01/msg00691.php
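(A sketch of a query for spotting temp tables that are holding back the freeze horizon; this assumes 8.2 or later, where pg_class carries a per-relation relfrozenxid. Temp tables live in per-backend pg_temp_N schemas:

select n.nspname, c.relname, age(c.relfrozenxid) as xid_age
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
where n.nspname like 'pg_temp%' and c.relkind = 'r'
order by xid_age desc;

Anything with a very large xid_age that never goes away is a connection-pool temp table of the sort described above.)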
Re: [GENERAL] SSL and crash woes.
--- Scott Marlowe <[EMAIL PROTECTED]> wrote: > > Interesting. Is it possible that either you've got 2 versions of > openssl? Maybe slony is being compiled against one, then using the > other lib, etc.? Yes... I suppose it is. Solaris came with one... we installed another. Hmmm...

# find /usr /lib -name libssl*
/usr/lib/mps/amd64/libssl3.so
/usr/lib/mps/secv1/amd64/libssl3.so
/usr/lib/mps/secv1/libssl3.so
/usr/lib/mps/libssl3.so
/usr/sfw/lib/amd64/libssl.so
/usr/sfw/lib/amd64/libssl.so.0.9.7
/usr/sfw/lib/libssl.so
/usr/sfw/lib/libssl.so.0.9.7
/usr/sfw/lib/mozilla/libssl3.so
/usr/apache/libexec/libssl.so
/usr/local/ssl/lib/libssl.a
/usr/local/ssl/lib/libssl.so
/usr/local/ssl/lib/libssl.so.0.9.8
/usr/local/ssl/lib/pkgconfig/libssl.pc
[GENERAL] SSL and crash woes.
A couple of years back (2005) we were setting up replication for the first time (using slony) from our production database server to a stand-by box sitting next to it and a remote box in a DR site. We were running FreeBSD 5.X/6.X on all systems on Dell servers, and postgres 7.4.X and then 8.0.X. Replication appeared to crash our production database... a lot. After looking at the core dumps/stack traces at the time, we determined that we were crashing in the ssl layers... so we disabled SSL (via pg_hba.conf and the slony conn settings) and haven't had an issue for the last couple of years. Stable as a rock. Well... we just upgraded our hardware (Sun X4600s), operating systems (solaris 10), postgres versions (8.2.4), and slony (1.2.10). Rock solid. However, our first indication of an issue was a problem executing pg_dump from a remote backup server. (see http://archives.postgresql.org/pgsql-general/2007-08/msg01347.php) Local pg_dumps have no issue. So we changed our backup scheme to do local dumps and push the files off the server to the backup location. Problem solved. Then... replication woes again. With these fresh installs, we didn't think too much about the SSL settings... and bing-bang... crash. Crash. Crash. Crash. Stopped replication. Problem goes away. Start replication... crash crash. So we stopped replication. We recompiled postgres with debug info on a test db box and loaded up the most recent database dump. We then attempted a remote pg_dump from another local server. Crash. Took a look at the core dump...

Core was generated by `/usr/local/pgsql/bin/postgres -D /testdb'.
Program terminated with signal 11, Segmentation fault.
#0 0xfee8ec23 in sk_value () from /usr/local/ssl/lib/libcrypto.so.0.9.8
(gdb) bt
#0 0xfee8ec23 in sk_value () from /usr/local/ssl/lib/libcrypto.so.0.9.8
#1 0xfef5b05b in ssl3_output_cert_chain () from /usr/local/ssl/lib/libssl.so.0.9.8
#2 0x in ?? ()

Hmmm... that looked familiar (from years ago). So... we set up the connection to be 'hostnossl' in pg_hba.conf and tried again. Success. Changed it back to 'hostssl'... crash. Same place. I am going to take the time to set up a test environment for the replication as well, but I assume I will experience the same thing. SSL means crash... no SSL means no crash. Anyone have any thoughts?
Re: [GENERAL] Out of Memory - 8.2.4
--- Alvaro Herrera <[EMAIL PROTECTED]> wrote: > 2.168.20.44 28785 LOG: duration: 22606.146 ms execute : select > > Interesting. What's your log_line_prefix? Does it have "%q" somewhere? No, no %q... not quite sure what it means: "stop here in non-session processes"
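(For reference, %q itself prints nothing; it marks the point where non-session processes such as autovacuum stop expanding the prefix, so per-connection items like user, database, and remote host go after it. A sketch of such a prefix, assuming syslog supplies the timestamp:

log_line_prefix = '%p %q%u %d %r '

That would explain log lines from background processes carrying only the pid while client-session lines carry the full set.)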
Re: [GENERAL] Out of Memory - 8.2.4
I notice the log entries for the out of memory events have no username, database name or host identifier (while regular logged events do). Does that mean anything to anybody?

Aug 28 08:25:50 db-1 postgres[29019]: [ID 748848 local0.warning] [111900-1] 2007-08-28 08:25:50.081 CDT29019ERROR: out of memory
Aug 28 08:25:50 db-1 postgres[29019]: [ID 748848 local0.warning] [111900-2] 2007-08-28 08:25:50.081 CDT29019DETAIL: Failed on request of size 536870910.

(regular log entry)
Aug 28 08:26:45 db-1 postgres[28785]: [ID 748848 local0.info] [114999-1] 2007-08-28 08:26:45.413 CDT jboss prod 192.168.20.44 28785LOG: duration: 22606.146 ms execute : select

--- Tom Lane <[EMAIL PROTECTED]> wrote: > Can you correlate these occurrences with anything in the regular system > logs (kernel log in particular)? The Postgres log shows nothing out of > the ordinary --- it's simply that the kernel won't give us 512M for some > reason. I'm guessing it's got something to do with overall system load. > > regards, tom lane
Re: [GENERAL] Tables dissapearing
Kamil Srot wrote: Heh, is the pool still open? Maybe I can make at least something from it :-D

Current odds:
Application bug: even money
Application configuration issue: 2-1
Rogue cron job or other maintenance process: 4-1
Somebody messing with you (or SQL injection): 8-1
XID wraparound issue: 10-1
Alien or supernatural intervention: 18-1
Obscure postgresql bug nobody else has ever seen: 25-1
Re: [GENERAL] Tables dissapearing
My entire shop has set up a betting pool on the outcome of this... so I hope you post results regardless of the outcome, Kamil. (In all seriousness, we hope you find/fix the problem before things get really ugly.) Kamil Srot wrote: Richard Huxton wrote: Kamil Srot wrote: In the version used on this particular server, there is no automatic/programmatic way of changing the schema. Upgrades are done manually and the application itself doesn't need schema changes for routine operations... In that case, you can settle the matter by making sure your application connects as a user that doesn't own any of the tables. That way they can't be dropped. Yes, I'll do it in the future, but now, I want to know what the problem is and solve it rather than doing some workarounds (even if they are good also from the principle indeed). Regards,
Re: [GENERAL] Out of Memory - 8.2.4
--- "Joshua D. Drake" <[EMAIL PROTECTED]> wrote: > > The machine we are tracking this problem on is also 64bit. H.looks like 3 different people are tracking a similar issue on 64 bit platforms.you, Erik and myself. Ready for the edge of your seat? Check out tonight's top picks on Yahoo! TV. http://tv.yahoo.com/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Out of Memory - 8.2.4
--- "Joshua D. Drake" <[EMAIL PROTECTED]> wrote: > We are actually diagnosing a similar problem on this > end, where we get a > failure at 1920... I am currently trying to get some > DEBUG output. Tracking for last few days. Does not appear to happen when little or no user activity (like Saturday) I don't know if that rules out autovacuum or not (if no update threshholds are reached, no vacuuming will take place anyway) Aug 23 11:11:51 db-1 postgres[8455]: [ID 748848 local0.warning] [2-1] 2007-08-23 11:11:51 CDT ERROR: out of memory Aug 23 11:11:51 db-1 postgres[8455]: [ID 748848 local0.warning] [2-2] 2007-08-23 11:11:51 CDT DETAIL: Failed on request of size 536870910. -- Aug 23 11:34:49 db-1 postgres[8910]: [ID 748848 local0.warning] [2-1] 2007-08-23 11:34:49 CDT ERROR: out of memory Aug 23 11:34:49 db-1 postgres[8910]: [ID 748848 local0.warning] [2-2] 2007-08-23 11:34:49 CDT DETAIL: Failed on request of size 536870910. -- Aug 23 12:06:47 db-1 postgres[9562]: [ID 748848 local0.warning] [2-1] 2007-08-23 12:06:47 CDT ERROR: out of memory Aug 23 12:06:47 db-1 postgres[9562]: [ID 748848 local0.warning] [2-2] 2007-08-23 12:06:47 CDT DETAIL: Failed on request of size 536870910. -- Aug 23 12:58:47 db-1 postgres[10617]: [ID 748848 local0.warning] [2-1] 2007-08-23 12:58:47 CDT ERROR: out of memory Aug 23 12:58:47 db-1 postgres[10617]: [ID 748848 local0.warning] [2-2] 2007-08-23 12:58:47 CDT DETAIL: Failed on request of size 536870910. -- Aug 23 15:15:35 db-1 postgres[13400]: [ID 748848 local0.warning] [2-1] 2007-08-23 15:15:35 CDT ERROR: out of memory Aug 23 15:15:35 db-1 postgres[13400]: [ID 748848 local0.warning] [2-2] 2007-08-23 15:15:35 CDT DETAIL: Failed on request of size 536870910. -- Aug 23 16:50:47 db-1 postgres[15422]: [ID 748848 local0.warning] [2-1] 2007-08-23 16:50:47 CDT ERROR: out of memory Aug 23 16:50:47 db-1 postgres[15422]: [ID 748848 local0.warning] [2-2] 2007-08-23 16:50:47 CDT DETAIL: Failed on request of size 536870910. -- Aug 24 10:46:46 db-1 postgres[10508]: [ID 748848 local0.warning] [2-1] 2007-08-24 10:46:46 CDT ERROR: out of memory Aug 24 10:46:46 db-1 postgres[10508]: [ID 748848 local0.warning] [2-2] 2007-08-24 10:46:46 CDT DETAIL: Failed on request of size 536870910. -- Aug 24 11:29:00 db-1 postgres[11539]: [ID 748848 local0.warning] [2-1] 2007-08-24 11:29:00 CDT ERROR: out of memory Aug 24 11:29:00 db-1 postgres[11539]: [ID 748848 local0.warning] [2-2] 2007-08-24 11:29:00 CDT DETAIL: Failed on request of size 536870910. -- Aug 24 11:50:04 db-1 postgres[12051]: [ID 748848 local0.warning] [2-1] 2007-08-24 11:50:04 CDT ERROR: out of memory Aug 24 11:50:04 db-1 postgres[12051]: [ID 748848 local0.warning] [2-2] 2007-08-24 11:50:04 CDT DETAIL: Failed on request of size 536870910. -- Aug 24 12:00:33 db-1 postgres[12310]: [ID 748848 local0.warning] [2-1] 2007-08-24 12:00:33 CDT ERROR: out of memory Aug 24 12:00:33 db-1 postgres[12310]: [ID 748848 local0.warning] [2-2] 2007-08-24 12:00:33 CDT DETAIL: Failed on request of size 536870910. -- Aug 24 16:03:19 db-1 postgres[18263]: [ID 748848 local0.warning] [2493-1] 2007-08-24 16:03:19.296 CDT 18263ERROR: out of memory Aug 24 16:03:19 db-1 postgres[18263]: [ID 748848 local0.warning] [2493-2] 2007-08-24 16:03:19.296 CDT 18263DETAIL: Failed on request of size 536870910. 
--
Aug 24 16:45:46 db-1 postgres[19313]: [ID 748848 local0.warning] [3356-1] 2007-08-24 16:45:46.804 CDT 19313ERROR: out of memory
Aug 24 16:45:46 db-1 postgres[19313]: [ID 748848 local0.warning] [3356-2] 2007-08-24 16:45:46.804 CDT 19313DETAIL: Failed on request of size 536870910.
--
Aug 24 17:29:16 db-1 postgres[20379]: [ID 748848 local0.warning] [4238-1] 2007-08-24 17:29:16.926 CDT 20379ERROR: out of memory
Aug 24 17:29:16 db-1 postgres[20379]: [ID 748848 local0.warning] [4238-2] 2007-08-24 17:29:16.926 CDT 20379DETAIL: Failed on request of size 536870910.
--
Aug 24 17:40:02 db-1 postgres[20651]: [ID 748848 local0.warning] [4452-1] 2007-08-24 17:40:02.682 CDT 20651ERROR: out of memory
Aug 24 17:40:02 db-1 postgres[20651]: [ID 748848 local0.warning] [4452-2] 2007-08-24 17:40:02.682 CDT 20651DETAIL: Failed on request of size 536870910.
--
Aug 26 11:14:56 db-1 postgres[22161]: [ID 748848 local0.warning] [56115-1] 2007-08-26 11:14:56.077 CDT 22161ERROR: out of memory
Aug 26 11:14:56 db-1 postgres[22161]: [ID 748848 local0.warning] [56115-2] 2007-08-26 11:14:56.077 CDT 22161DETAIL: Failed on request of size 536870910.
--
Aug 26 11:27:41 db-1 postgres[22477]: [ID 748848 local0.warning] [56381-1] 2007-08-26 11:27:41.141 CDT 22477ERROR: out of memory
Aug 26 11:27:41 db-1 postgres[22477]: [ID 748848 local0.warning] [56381-2] 2007-08-26 11:27:41.141 CDT 22477DETAIL: Failed on request of size 536870910.
--
Aug 26 11:37:27 db-1 postgres[22729]: [ID 748848 local0.warning] [56603-1] 2007-08-26 11:37:27.476 CDT 22729ERROR: out of memory
Aug 26 11:37:27 db-1 postgres[22729]: [ID 74
Re: [GENERAL] Out of Memory - 8.2.4
--- Tom Lane <[EMAIL PROTECTED]> wrote: > Jeff Amiel <[EMAIL PROTECTED]> writes: > > is log_min_messages one of them that requires a > > restart? > > No, SIGHUP (pg_ctl reload) should be sufficient. Weird... looks like some items are going to syslog and some to my defined postgres logfile (from the -L option). Bizarre. Anyway... I hope this helps someone. At 10:46, I find this in my syslog files:

Aug 27 10:46:01 db-1 postgres[27008]: [ID 748848 local0.warning] [85355-1] 2007-08-27 10:46:01.888 CDT 27008ERROR: out of memory
Aug 27 10:46:01 db-1 postgres[27008]: [ID 748848 local0.warning] [85355-2] 2007-08-27 10:46:01.888 CDT 27008DETAIL: Failed on request of size 536870910.

and at the same time in my postgres logfile I see this (and only this):

TopMemoryContext: 169608 total in 10 blocks; 18832 free (34 chunks); 150776 used
TopTransactionContext: 8192 total in 1 blocks; 7648 free (9 chunks); 544 used
CFuncHash: 8192 total in 1 blocks; 4936 free (0 chunks); 3256 used
Type information cache: 8192 total in 1 blocks; 1800 free (0 chunks); 6392 used
Operator class cache: 8192 total in 1 blocks; 4872 free (0 chunks); 3320 used
Autovacuum context: 8192 total in 1 blocks; 5416 free (8 chunks); 2776 used
smgr relation table: 8192 total in 1 blocks; 2808 free (0 chunks); 5384 used
TransactionAbortContext: 32768 total in 1 blocks; 32752 free (0 chunks); 16 used
Portal hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
PortalMemory: 0 total in 0 blocks; 0 free (0 chunks); 0 used
Relcache by OID: 8192 total in 1 blocks; 3376 free (0 chunks); 4816 used
CacheMemoryContext: 659000 total in 19 blocks; 264904 free (15 chunks); 394096 used
sl_seqlog_idx: 1024 total in 1 blocks; 256 free (0 chunks); 768 used
PartInd_istream_replication_cluster_sl_log_1-node-1: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
sl_log_1_idx1: 1024 total in 1 blocks; 256 free (0 chunks); 768 used
pg_index_indrelid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672 used
pg_attrdef_adrelid_adnum_index: 1024 total in 1 blocks; 288 free (0 chunks); 736 used
pg_autovacuum_vacrelid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_type_typname_nsp_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_type_oid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672 used
pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 288 free (0 chunks); 736 used
pg_statistic_relid_att_index: 1024 total in 1 blocks; 288 free (0 chunks); 736 used
pg_auth_members_member_role_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_auth_members_role_member_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_proc_proname_args_nsp_index: 1024 total in 1 blocks; 256 free (0 chunks); 768 used
pg_proc_oid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672 used
pg_operator_oprname_l_r_n_index: 1024 total in 1 blocks; 192 free (0 chunks); 832 used
pg_operator_oid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672 used
pg_opclass_oid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672 used
pg_opclass_am_name_nsp_index: 1024 total in 1 blocks; 216 free (0 chunks); 808 used
pg_namespace_oid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672 used
pg_namespace_nspname_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_language_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_language_name_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_inherits_relid_seqno_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_index_indexrelid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672 used
pg_authid_oid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672 used
pg_authid_rolname_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_database_oid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672 used
pg_conversion_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_conversion_name_nsp_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_conversion_default_index: 1024 total in 1 blocks; 192 free (0 chunks); 832 used
pg_class_relname_nsp_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_class_oid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672 used
pg_cast_source_target_index: 1024 total in 1 blocks; 288 free (0 chunks); 736 used
pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 288 free (0 chunks); 736 used
pg_attribute_relid_attnam_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_amproc_opc_proc_index: 1024 total in 1 blocks; 216 free (0 chunks); 808 used
pg_amop_opr_opc_index: 1024 total in 1 blocks; 288 free (0 chunks); 736 used
pg_amop_opc_strat_index: 1024 total in 1 blocks; 216 free (0 chunks); 808 used
pg_aggregate_fnoid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
Per-database table: 122880 total i
Re: [GENERAL] Out of Memory - 8.2.4
--- "Joshua D. Drake" <[EMAIL PROTECTED]> wrote: > Having log_line_prefix with at least %p and %m (or > %t) plus a > log_min_messages of DEBUG2 would be great. i am getting the additional timestampt/pid on my log lines nowbut no additional debug output... is log_min_messages one of them that requires a restart? Be a better Heartthrob. Get better relationship answers from someone who knows. Yahoo! Answers - Check it out. http://answers.yahoo.com/dir/?link=list&sid=396545433 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] Out of Memory - 8.2.4
--- "Joshua D. Drake" <[EMAIL PROTECTED]> wrote: > We are actually diagnosing a similar problem on this > end, where we get a > failure at 1920... I am currently trying to get some > DEBUG output. We are actually getting it semi-regularly today (3 times already)I would be happy to provide some more info if somebody guides me (just set log_min_messages to one of the debug settings?) Park yourself in front of a world of choices in alternative vehicles. Visit the Yahoo! Auto Green Center. http://autos.yahoo.com/green_center/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[GENERAL] Out of Memory - 8.2.4
"PostgreSQL 8.2.4 on i386-pc-solaris2.10, compiled by GCC gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath)" Week-old installstill tuning and tweaking this thing. Over last 2 days, have spotted 10 "Out of Memory" errors in postgres logs (never saw before with same app/usage patterns on tuned hardware/postgres under FreeBSD) Aug 22 18:08:24 db-1 postgres[16452]: [ID 748848 local0.warning] [6-1] 2007-08-22 18:08:24 CDT ERROR: out of memory. Aug 22 18:08:24 db-1 postgres[16452]: [ID 748848 local0.warning] [6-2] 2007-08-22 18:08:24 CDT DETAIL: Failed on request of size 536870910. What I found interesting is that It's ALWAYS the same size536870910 I am running autovacuum and slony.but I see nothing in the logs anywhere near the "out of memory" errors related to either (autovacuum used to under 8.0.X log INFO messages every time it vacuumed which came in handy...I assume it doesn't so this any more?) The events are fairly spread out...and cannot (by looking at app logs and rest of DB logs) correlate to any specific query or activity. Any help would be appreciated Box is a Sun X4600 with 8 dual-core processors and 32 gig of ram. # su - pgsql Sun Microsystems Inc. SunOS 5.10 Generic January 2005 -bash-3.00$ ulimit -a core file size(blocks, -c) unlimited data seg size (kbytes, -d) unlimited file size (blocks, -f) unlimited open files(-n) 256 pipe size (512 bytes, -p) 10 stack size(kbytes, -s) 10240 cpu time (seconds, -t) unlimited max user processes(-u) 16357 virtual memory(kbytes, -v) unlimited shared_buffers = 3GB# min 128kB or max_connections*16kB temp_buffers = 1000 # min 800kB was 8MB max_prepared_transactions = 450 # can be 0 or more work_mem = 100MB# min 64kB maintenance_work_mem = 512MB# min 1MB #max_stack_depth = 2MB # min 100kB max_fsm_pages = 208000 # min max_fsm_relations*16, 6 bytes each max_fsm_relations = 1 # min 100, ~70 bytes each #max_files_per_process = 1000 # min 25 #shared_preload_libraries = '' # (change requires restart) fsync = on # turns forced synchronization on or off wal_sync_method = fdatasync # the default is the first option full_page_writes = off # recover from partial page writes wal_buffers = 2300 # min 32kB commit_delay = 10 # range 0-10, in microseconds #commit_siblings = 5# range 1-1000 checkpoint_segments = 128 # in logfile segments, min 1, 16MB each checkpoint_timeout = 5min # range 30s-1h checkpoint_warning = 99s# 0 is off Be a better Globetrotter. Get better travel answers from someone who knows. Yahoo! Answers - Check it out. http://answers.yahoo.com/dir/?link=list&sid=396545469 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] pg_dump causes postgres crash
--- Tom Lane <[EMAIL PROTECTED]> wrote: > > I can't help thinking you are looking at generalized > system > instability. Maybe someone knocked a few cables > loose while > installing new network hardware? Database server/storage instability or network instability? There is no doubt that there is something flaky about the networking between the db server and the box(es) trying to do the pg_dump. We have indeed had issues (timeouts, halts, etc) moving large quantities of data across various segments to and from these boxes... like the db server... but how would this affect something like a pg_dump? Would a good stack trace (assuming I want to crash my database again) help here?
Re: [GENERAL] pg_dump causes postgres crash
From the logs tonight when the second crash occurred:

Aug 22 20:45:12 db-1 postgres[5805]: [ID 748848 local0.info] [6-1] 2007-08-22 20:45:12 CDT LOG: received smart shutdown request
Aug 22 20:45:12 db-1 postgres[5805]: [ID 748848 local0.info] [7-1] 2007-08-22 20:45:12 CDT LOG: server process (PID 20188) was terminated by signal 11
Aug 22 20:45:12 db-1 postgres[5805]: [ID 748848 local0.info] [8-1] 2007-08-22 20:45:12 CDT LOG: terminating any other active server processes

There was a core file created... but I believe I do not have postgresql compiled with debug info (well, a pstack provided nothing useful).

pstack core | more
core 'core' of 20188: /usr/local/pgsql/bin/postgres -D /db
fee8ec23 sk_value (10023d, 105d8b00, d2840f, 1c7f, f20f883, 10584) + 33
0c458b51 (0, 0, 511f1600, 2000400, ff001c09, 467f71ea) ()

Once again... a local pg_dump worked just fine 30 minutes later. We have introduced some new network architecture which is acting odd lately (Dell managed switches, Netscreen SSGs, etc), and the database itself resides on a zfs partition on a Pillar SAN (connected via fibre channel). Any thoughts would be appreciated.
[GENERAL] pg_dump causes postgres crash
Fairly new (less than a week) install. "PostgreSQL 8.2.4 on i386-pc-solaris2.10, compiled by GCC gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath)" Database size around 43 gigabytes. 2 attempts at a pg_dump across the network caused the database to go down. The first time I thought it was because of a mismatched pg_dump (was version 8.0.X)... but the second time it was definitely the 8.2.4 version of pg_dump. My first thought was corruption... but this database has successfully seeded 2 slony subscriber nodes from scratch as well as running flawlessly under heavy load for the past week. Even more odd is that a LOCAL pg_dump (from on the box) succeeded just fine tonight (after the second crash). Thoughts?

--- First Crash ---
backup-srv2 prod_backup # time /usr/bin/pg_dump --format=c --compress=9 --ignore-version --username=backup --host=prod_server prod > x
pg_dump: server version: 8.2.4; pg_dump version: 8.0.13
pg_dump: proceeding despite version mismatch
pg_dump: WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
pg_dump: server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing the request.
pg_dump: SQL command to dump the contents of table "access_logs" failed: PQendcopy() failed.
pg_dump: Error message from server: server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing the request.
pg_dump: The command was: COPY public.access_logs (ip, username, "action", date, params) TO stdout;

--- Second Crash ---
backup-srv2 ~ # time /usr/bin/pg_dump --format=c --compress=9 --username=backup --host=prod_server prod | wc -l
pg_dump: Dumping the contents of table "audit" failed: PQgetCopyData() failed.
pg_dump: Error message from server: server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing the request.
pg_dump: The command was: COPY public.audit (audit_id, entity_id, table_name, serial_id, audit_action, when_ts, user_id, user_ip) TO stdout;
Re: [GENERAL] could not open file "pg_clog/0BFF"
http://svr5.postgresql.org/pgsql-hackers/2006-03/msg01198.php http://www.mail-archive.com/pgsql-general@postgresql.org/msg90845.html This is related to an autovacuum bug and the freeze logic with template0... and probably a corrupted pg_statistic table in template0. You should upgrade AND repair the template0 issue (I actually replaced template0 with a copy from a replicant database). But take heed: if you don't 'do' something about it, autovacuum is actually stopping with template0 and never getting to the rest of your database (as I found out). Patrick Lindeman wrote: What could have caused the error "could not access status of transaction 3221180546" and is it more than just coincidence that since that moment the vacuum stopped running? Is there anything we can do to prevent this in future use? Besides, we are using PostgreSQL 8.1.3. Any help would be appreciated. Regards, Patrick Lindeman
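(A sketch of a quick health check for this situation: run it a day or two apart, and if template0's age keeps climbing past everything else, autovacuum is likely stuck on it exactly as described above:

SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC;

)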
[GENERAL] Postmaster does not shut down
A 'bad' thing happened yesterday. Postgresql 8.1.X, FreeBSD 6.0. At some point in the day, we ran out of space on the root filesystem (db is elsewhere). Took about 10 minutes to clear enough space to make processes stop freaking out and to slow my heart-rate down to below 200 beats per minute. Everything 'seemed' to be fine after that. Database was constantly utilized with no errors for several more hours. Later that evening (during our peak activity time), a process was inadvertently run that, when combined with peak activity, caused the load average on the box to spike into the stratosphere (19). *sigh* Database activity was taking so long (heavy web app server use) that db connection handles skyrocketed (over 120 when it normally stays below 30) and we started getting complaints from pool management: No ManagedConnections available within configured blocking timeout ( 3 [ms] ) At that point, I started stopping all unnecessary activity and even restarted the app server to clear all the connections. However... the db connections all stayed in some sort of 'frozen' state... showing that they were either idle (in transaction) or processing SELECTs or whatever they were last doing. (Normally restarting the app server will cause those connections to reset pretty quickly.) Tried killing a few of the idle ones manually... but it was like pissing on a forest fire. Hmmm. It was time to do something I hadn't done in a LOOONG time: restart postgres. Uh oh... "postmaster does not shut down". I tried fast, I tried immediate. No dice. Postmaster refused to die. I don't know how many times I've heard "don't kill -9 the postmaster", but that's what I did (on the 'head' of the snake referenced in the pid file)... it died, but all the other processes remained. *grumble* Couldn't get a killall to get rid of the rest (not sure why) so I was back to pissing on the forest fire, typing kill kill kill. My cohort in the office claimed that more processes were showing up for each one I killed (I assume he was delirious at that point), but this was taking too long anyway. So I bounced the box. I prayed. Came back up fine... postgres, slony, etc. No errors reported when the database started. I see no errors in my logs through the night when autovacuum and backups occurred, so the database 'appears' to be intact. I do notice that all of my logs (all.log, etc) are 'missing' between the filesystem full issue earlier in the day and the time I rebooted (I assume the OS was NOT happy with me letting it run out of space on / ). So... other than the half-dozen things that I obviously did wrong, was there anything else I could have done with postgres when it refused to shut down?
[GENERAL] Way to determine index bloat stats?
Got a bit of a rude surprise when I rebuilt a slony subscriber node from scratch and noticed the indexes for some tables using significantly less space than their master-node counterparts. In one case, it was 2 gigabytes versus 9 gigabytes for the same table's indexes on the master node. I'm assuming that because of the specific indexed fields and the various 'states' that they go through, we ended up with many many sparsely populated pages of index data. While I am not too concerned (it took 3 years to get this 'fragmented', plus the ability to do concurrent index creation in 8.2 may help), I am interested in determining a way to see how 'sparsely' populated the index file pages are at any particular point in time. Is there some magic way of determining the number of bytes in an index 'row' and then comparing the size on disk of the index file? Also... would another index type (hash?) be better suited for some of these indexes (they are all currently btree)... and would that reduce the bloat? (I wish I had tracked which specific indexes were 'bloated' so I could analyze the usage pattern of the fields used.) Any help would be appreciated.
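(One concrete option is the contrib pgstattuple module; in 8.2 its pgstatindex function reports per-index leaf density and fragmentation. A sketch, assuming the contrib SQL script has been installed and using a made-up index name:

SELECT index_size, avg_leaf_density, leaf_fragmentation
FROM pgstatindex('public.some_bloated_index');

A low avg_leaf_density is the 'sparsely populated pages' effect described above.)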
[GENERAL] low transaction ID wrap limit
Whenever I read the documentation on the transaction wraparound stuff, my head spins with all the references to frozen xids and min/max ages. When it comes down to it, my interpretation of the whole kaboodle is "run postgresql v8.1 or later with autovacuum, and you will not have to deal with the wraparound issue." I was looking at some logs this morning and spotted this on my backup database: Jun 24 04:08:46 back-app-1 postgres[82445]: [2-1] pgsql 82445 LOG: transaction ID wrap limit is 41612954, limited by database "back" That seemed like an awfully low number compared to the 1 or 2 billion that I have seen in other posts on the subject.

postgres=# SELECT datname, age(datfrozenxid) FROM pg_database;
  datname  |    age
-----------+------------
 postgres  | 1073968236
 back      | 1079399370
 template0 | 1657840460
 template1 | 1078788693

postgres=# select * from pg_database;
  datname  | datdba | encoding | datistemplate | datallowconn | datconnlimit | datlastsysoid | datvacuumxid | datfrozenxid | dattablespace | datconfig | datacl
-----------+--------+----------+---------------+--------------+--------------+---------------+--------------+--------------+---------------+-----------+------------------
 postgres  |     10 |        0 | f             | t            |           -1 |         10792 |   3268269560 |   2194527737 |          1663 |           |
 back      |     10 |        0 | f             | t            |           -1 |         10792 |   3268269655 |   2194527832 |          1663 |           |
 template0 |     10 |        0 | t             | f            |           -1 |         10792 |   1610655513 |   1610655513 |          1663 |           | {pgsql=CT/pgsql}
 template1 |     10 |        0 | t             | t            |           -1 |         10792 |       326849 |   2194754451 |          1663 |           | {pgsql=CT/pgsql}
(4 rows)

Should I be worried?
Re: [GENERAL] pg_dump/restore and functions/triggers/trigger functions
Tom Lane <[EMAIL PROTECTED]> wrote: I think that would have excluded anything that didn't demonstrably belong to schema public, such as procedural languages. Is it possible that *all* your functions failed to load, and you only noted the ensuing GRANT/REVOKE failures? Yes... ALL my functions did indeed fail to load. Weirdness is that I only saw errors on the GRANT/REVOKE failures. Huh. I created a simple test case with a single function... got the same results. I removed the --schema='public' and it worked fine. I compared the TOCs on the 2 different files and sure enough, there is an entry in the one that DIDN'T only use public that has the plpgsql entry: 248; 2612 90212 PROCEDURAL LANGUAGE - plpgsql Huh. I guess if I had added plpgsql to template1 before I created the empty shell db to restore into, I would never have seen this issue. OK... I guess the real question is (other than related to this issue), is there any need to dump the catalog/informational schemas?
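(The workaround described above, as commands; createlang is the 8.x-era utility for this, and devdb is a placeholder database name:

createlang plpgsql template1   # so future createdb targets inherit it
createlang plpgsql devdb       # or add it to just the restore target

With the language already present, a --schema=public restore finds the functions' language in place.)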
Re: [GENERAL] pg_dump/restore and functions/triggers/trigger functions
The original pg_dump used --schema="public". Could the fact that pg_catalog or information_schema weren't included cause these kinds of issues? (I can't imagine why) <[EMAIL PROTECTED]> wrote: did a pg_dump --format=c for a production database (on an 8.1.2 server) and attempted to pg_restore on an 8.2.0 server. Things seemed to go fine with the exception of functions, triggers and trigger functions. It was apparently doing a bunch of ACL work towards the end and spewed a slew of errors like this: Feb 6 16:23:02 devl-app1 postgres[22552]: [624-1] ERROR: function add_entity(text, text, integer, text) does not exist Feb 6 16:23:02 devl-app1 postgres[22552]: [624-2] STATEMENT: REVOKE ALL ON FUNCTION add_entity(text, text, integer, text) FROM PUBLIC; so I checked the TOC to ensure that the function and ACLs were listed: [EMAIL PROTECTED] pg_restore -l prod-app-1.pgsql-prod.pgsql | grep add_entity 320; 1255 16412 FUNCTION public add_entity(text, text, integer, text) pgsql 3318; 0 0 ACL public add_entity(text, text, integer, text) pgsql Sure enough... the function and the ACLs were there. Hmmm... When all was said and done, the only functions in the database were ones that originally came from pgcrypto (from template0 when I created the new database to load the data into). Also... a similar problem with triggers/trigger functions: Feb 6 16:07:27 devl-app1 postgres[22552]: [509-1] ERROR: function batch_stat_populate() does not exist Feb 6 16:07:27 devl-app1 postgres[22552]: [509-2] STATEMENT: CREATE TRIGGER batch_stats_updater Feb 6 16:07:27 devl-app1 postgres[22552]: [509-3] AFTER INSERT ON batch_audit Feb 6 16:07:27 devl-app1 postgres[22552]: [509-4] FOR EACH ROW Feb 6 16:07:27 devl-app1 postgres[22552]: [509-5] EXECUTE PROCEDURE batch_stat_populate(); What up? Is this a version inconsistency issue? Please tell me I don't have to -l into a listfile and manually re-order items and then use -L?
[GENERAL] pg_dump/restore and functions/triggers/trigger functions
Did a pg_dump --format=c for a production database (on an 8.1.2 server) and attempted to pg_restore on an 8.2.0 server. Things seemed to go fine with the exception of functions, triggers and trigger functions. It was apparently doing a bunch of ACL work towards the end and spewed a slew of errors like this:

Feb 6 16:23:02 devl-app1 postgres[22552]: [624-1] ERROR: function add_entity(text, text, integer, text) does not exist
Feb 6 16:23:02 devl-app1 postgres[22552]: [624-2] STATEMENT: REVOKE ALL ON FUNCTION add_entity(text, text, integer, text) FROM PUBLIC;

So I checked the TOC to ensure that the function and ACLs were listed:

[EMAIL PROTECTED] pg_restore -l prod-app-1.pgsql-prod.pgsql | grep add_entity
320; 1255 16412 FUNCTION public add_entity(text, text, integer, text) pgsql
3318; 0 0 ACL public add_entity(text, text, integer, text) pgsql

Sure enough, the function and the ACLs were there. Hmmm... When all was said and done, the only functions in the database were ones that originally came from pgcrypto (from template0 when I created the new database to load the data into). Also... a similar problem with triggers/trigger functions:

Feb 6 16:07:27 devl-app1 postgres[22552]: [509-1] ERROR: function batch_stat_populate() does not exist
Feb 6 16:07:27 devl-app1 postgres[22552]: [509-2] STATEMENT: CREATE TRIGGER batch_stats_updater
Feb 6 16:07:27 devl-app1 postgres[22552]: [509-3] AFTER INSERT ON batch_audit
Feb 6 16:07:27 devl-app1 postgres[22552]: [509-4] FOR EACH ROW
Feb 6 16:07:27 devl-app1 postgres[22552]: [509-5] EXECUTE PROCEDURE batch_stat_populate();

What's up? Is this a version inconsistency issue? Please tell me I don't have to -l into a listfile, manually re-order items, and then use -L?
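For the archives, the -l/-L round trip feared above is only a few commands, not major surgery (a sketch against the dump named in the post; "devl" is a placeholder target database):

pg_restore -l prod-app-1.pgsql-prod.pgsql > restore.list
# comment entries out with a leading ';' or reorder lines as needed, then:
pg_restore -L restore.list --dbname devl prod-app-1.pgsql-prod.pgsql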
[GENERAL] pg_restore single table privileges/indexes
"PostgreSQL 8.2.0 on i386-portbld-freebsd6.0, compiled by GCC cc (GCC) 3.4.4 [FreeBSD] 20050518" We just switched from 'pg_dumpall" to "pg_dump -format=c" for our nightly backups. I wanted to experiment with restoring a single table (if the need should ever arise) from the dump file. I use the --clean option on the restore to drop the table first pg_restore -U pgsql --table codes --dbname devl --clean backup.pgsql It seems to work fine with no issues. However, none of the privileges are set. I checked the dump with grep and sure enough, they are there: [EMAIL PROTECTED] grep --binary-files=text "GRANT SELECT ON TABLE codes TO jboss;" backup.pgsql GRANT SELECT ON TABLE codes TO jboss; So...the privileges are in the dump...but do not get restored. Is this issue unique to the --table option? Is there some way to force it to restore the privileges? I also notice that the indexes are not restored...that I understand. I guess I have to do separate --index for each index on the table, correct? Any help would be appreciated. - Cheap Talk? Check out Yahoo! Messenger's low PC-to-Phone call rates.
Re: [GENERAL] drive failure, corrupt data...
Tom Lane wrote:
Yech. So much for RAID reliability... maybe you need to reconfigure the array for more redundancy?

Yeah... I'm not sure if I screwed the pooch by trying to bring the drive back 'online'. In the past we just tried re-seating it and the raid card 'does its thing' and rebuilds or takes it offline again.
Re: [GENERAL] drive failure, corrupt data...
Raid rebuilt... ran fsck:

PARTIALLY TRUNCATED INODE I=612353 SALVAGE? yes
INCORRECT BLOCK COUNT I=612353 (544 should be 416) CORRECT? yes
PARTIALLY TRUNCATED INODE I=612389 SALVAGE? yes
INCORRECT BLOCK COUNT I=612389 (544 should be 416) CORRECT? yes
INCORRECT BLOCK COUNT I=730298 (676448 should be 675520) CORRECT? yes

[EMAIL PROTECTED] find /db -inum 612353
/db/pg_clog/0952
[EMAIL PROTECTED] find /db -inum 612389
/db/pg_clog/0951
[EMAIL PROTECTED] find /db -inum 730298
/db/base/1093090/1212223

Hmmm... I wanted to see what the third one was, so:

test=# select oid, relname from pg_class order by oid;
ERROR: could not access status of transaction 2485385834
DETAIL: could not open file "pg_clog/0942": No such file or directory

So am I screwed here... do I just re-initdb and restore the entire kit and kaboodle from scratch?

Jeff Amiel <[EMAIL PROTECTED]> wrote:
Had a drive failure on a raid 5 array of a backup box that a couple of postgres databases sit on. One of the databases is a slony subscriber to a production database and the other is a test-environment database. The drive was offline... brought it back online, hoping it would start a rebuild... which it didn't. Almost immediately started getting errors from slony:

could not access status of transaction 2463273456
could not open file "pg_clog/0937": No such file or directory
... etc.

Looks like the subscriber database had some issues (at least with one specific table). In addition, trying to access the other (test) database yielded an error accessing pg_namespace. So I reseated the drive, which started a rebuild. I stopped postgres. When the rebuild is done (or if it fails, I will replace the drive), I will restart postgres and see what happens. Question... should I just re-initdb and restore databases from backup? Should I have done something differently once I noticed the failure? I've had drive failures before on this box and either rebuilt the array or replaced the drive with no postgres issues (although the amount of traffic was much less than now). Any help would be appreciated.
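For the record, the path of that third damaged file decodes as database OID / relfilenode, so something like this sketch identifies the victim:

SELECT oid, datname FROM pg_database WHERE oid = 1093090;  -- which database owns directory 1093090?
-- then, connected to that database:
SELECT relname, relkind FROM pg_class WHERE relfilenode = 1212223;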
[GENERAL] drive failure, corrupt data...
Had a drive failure on a raid 5 array of a backup box that a couple of postgres databases sit on. One of the databases is a slony subscriber to a production database and the other is a test-environment database. The drive was offline... brought it back online, hoping it would start a rebuild... which it didn't. Almost immediately started getting errors from slony:

could not access status of transaction 2463273456
could not open file "pg_clog/0937": No such file or directory
... etc.

Looks like the subscriber database had some issues (at least with one specific table). In addition, trying to access the other (test) database yielded an error accessing pg_namespace. So I reseated the drive, which started a rebuild. I stopped postgres. When the rebuild is done (or if it fails, I will replace the drive), I will restart postgres and see what happens. Question... should I just re-initdb and restore databases from backup? Should I have done something differently once I noticed the failure? I've had drive failures before on this box and either rebuilt the array or replaced the drive with no postgres issues (although the amount of traffic was much less than now). Any help would be appreciated.
Re: [GENERAL] Corrupt database? 8.1/FreeBSD6.0
Tom Lane <[EMAIL PROTECTED]> wrote:
Really? Wow, *that's* an interesting thought. Is it likely that that temp table could contain many-hour-old data?

Certainly... our connection pool used by jboss can have connections to postgres persisting for multiple days. (We're still looking for a way to tell it to recycle these occasionally.) As each 'user' of our web-based app performs some action, they acquire one of the connection pool connections and set their user_id in the temporary table used by that connection (we use that for our audit triggers). Once they are 'done' with the connection, the connection is just released back to the pool but not actually closed... so the temp table still contains the data from a previous iteration.
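A minimal sketch of the pattern being described (table layout and values are illustrative; the only piece the thread actually shows is the disable_auditing() delete that appears in the logs):

-- per-connection temp table consulted by the audit triggers
CREATE TEMP TABLE audit_metadata (user_id integer NOT NULL DEFAULT -1);
INSERT INTO audit_metadata (user_id) VALUES (42);  -- stamp the current app user on pool checkout
SELECT user_id FROM audit_metadata;                -- audit triggers read the stamp
DELETE FROM audit_metadata WHERE user_id <> -1;    -- disable_auditing(), per the logged statement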
Re: [GENERAL] Corrupt database? 8.1/FreeBSD6.0
Tom Lane <[EMAIL PROTECTED]> wrote:
BTW, please don't do anything to try to correct the problem until we're pretty sure we understand how this happened --- we might ask you for more info. AFAICS this isn't having any bad effects except for bleats in your log file, so you can wait.

Happened again... however, this time not apparently related to an autovacuum. I have a past example of this almost identical event from a few weeks ago. Here's what the logs look like with debug2 as the log_min level:

Jan 13 08:26:53 prod-app-1 postgres[41795]: [40171-1] 41795 LOG: autovacuum: processing database "template1"
Jan 13 08:26:53 prod-app-1 postgres[560]: [40171-1] 560 DEBUG: server process (PID 41794) exited with exit code 0
Jan 13 08:26:57 prod-app-1 postgres[563]: [915-1] 563 DEBUG: checkpoint starting
Jan 13 08:26:57 prod-app-1 postgres[563]: [916-1] 563 DEBUG: recycled transaction log file "0001005D0069"
Jan 13 08:26:57 prod-app-1 postgres[563]: [917-1] 563 DEBUG: checkpoint complete; 0 transaction log file(s) added, 0 removed, 1 recycled
Jan 13 08:27:02 prod-app-1 postgres[560]: [40172-1] 560 DEBUG: forked new backend, pid=42368 socket=8
Jan 13 08:27:02 prod-app-1 postgres[560]: [40173-1] 560 DEBUG: forked new backend, pid=42369 socket=8
Jan 13 08:27:02 prod-app-1 postgres[560]: [40174-1] 560 DEBUG: forked new backend, pid=42370 socket=8
Jan 13 08:27:02 prod-app-1 postgres[560]: [40175-1] 560 DEBUG: server process (PID 42370) exited with exit code 0
Jan 13 08:27:02 prod-app-1 postgres[560]: [40176-1] 560 DEBUG: forked new backend, pid=42371 socket=8
Jan 13 08:27:02 prod-app-1 postgres[560]: [40177-1] 560 DEBUG: server process (PID 42369) exited with exit code 0
Jan 13 08:27:02 prod-app-1 postgres[560]: [40178-1] 560 DEBUG: server process (PID 42371) exited with exit code 0
Jan 13 08:27:02 prod-app-1 postgres[560]: [40179-1] 560 DEBUG: forked new backend, pid=42372 socket=8
Jan 13 08:27:02 prod-app-1 postgres[560]: [40180-1] 560 DEBUG: server process (PID 42372) exited with exit code 0
Jan 13 08:27:02 prod-app-1 postgres[560]: [40181-1] 560 DEBUG: forked new backend, pid=42373 socket=8
Jan 13 08:27:02 prod-app-1 postgres[560]: [40182-1] 560 DEBUG: server process (PID 42373) exited with exit code 0
Jan 13 08:27:02 prod-app-1 postgres[560]: [40183-1] 560 DEBUG: forked new backend, pid=42374 socket=8
Jan 13 08:27:02 prod-app-1 postgres[560]: [40184-1] 560 DEBUG: server process (PID 42374) exited with exit code 0
Jan 13 08:27:02 prod-app-1 postgres[560]: [40185-1] 560 DEBUG: server process (PID 42368) exited with exit code 0
Jan 13 08:27:08 prod-app-1 postgres[560]: [40186-1] 560 DEBUG: forked new backend, pid=42375 socket=8
Jan 13 08:27:08 prod-app-1 postgres[560]: [40187-1] 560 DEBUG: server process (PID 42375) exited with exit code 0
Jan 13 08:27:23 prod-app-1 postgres[560]: [40188-1] 560 DEBUG: forked new backend, pid=42376 socket=8
Jan 13 08:27:23 prod-app-1 postgres[560]: [40189-1] 560 DEBUG: server process (PID 42376) exited with exit code 0
Jan 13 08:27:26 prod-app-1 postgres[92257]: [30259-1] jboss 92257 ERROR: could not access status of transaction 2107200825
Jan 13 08:27:26 prod-app-1 postgres[92257]: [30259-2] jboss 92257 DETAIL: could not open file "pg_clog/07D9": No such file or directory
Jan 13 08:27:26 prod-app-1 postgres[92257]: [30259-3] jboss 92257 CONTEXT: SQL statement "DELETE FROM audit_metadata WHERE user_id <> -1"
Jan 13 08:27:26 prod-app-1 postgres[92257]: [30259-4] PL/pgSQL function "disable_auditing" line 2 at SQL statement
Jan 13 08:27:26 prod-app-1 postgres[92257]: [30259-5] jboss 92257 STATEMENT: select disable_auditing()
Jan 13 08:27:38 prod-app-1 postgres[560]: [40190-1] 560 DEBUG: forked new backend, pid=42377 socket=8
Jan 13 08:27:38 prod-app-1 postgres[560]: [40191-1] 560 DEBUG: server process (PID 42377) exited with exit code 0
Jan 13 08:27:49 prod-app-1 postgres[560]: [40192-1] 560 DEBUG: forked new backend, pid=42378 socket=8
Jan 13 08:27:50 prod-app-1 postgres[560]: [40193-1] 560 DEBUG: forked new backend, pid=42379 socket=8
Jan 13 08:27:50 prod-app-1 postgres[560]: [40194-1] 560 DEBUG: forked new backend, pid=42380 socket=8
Jan 13 08:27:53 prod-app-1 postgres[560]: [40195-1] 560 DEBUG: forked new backend, pid=42381 socket=8
Jan 13 08:27:53 prod-app-1 postgres[42382]: [40196-1] 42382 LOG: autovacuum: processing database "postgres"
Jan 13 08:27:53 prod-app-1 postgres[560]: [40196-1] 560 DEBUG: server process (PID 42381) exited with exit code 0
Jan 13 08:28:02 prod-app-1 postgres[560]: [40197-1] 560 DEBUG: forked new backend, pid=42951 socket=8
Jan 13 08:28:02 prod-app-1 postgres[560]: [40198-1] 560 DEBUG: forked new backend, pid=42952 socket=8

The pg_clog dir looks like this:

-rw------- 1 pgsql wheel 262144 Jan 13 05:41 07DA
-rw------- 1 pgsql wheel 262144 Jan 13 08:06 07DB
-rw------- 1 pgsql wheel  90112 Jan 13 08:51 07DC

Now that table, audit_meta
Re: [GENERAL] Corrupt database? 8.1/FreeBSD6.0
Tom Lane <[EMAIL PROTECTED]> wrote:
Well, if the problem is indeed in pg_statistic, it'll be easy to repair (think TRUNCATE...). Have you turned up the logging level to find out? BTW, please don't do anything to try to correct the problem until we're pretty sure we understand how this happened --- we might ask you for more info. AFAICS this isn't having any bad effects except for bleats in your log file, so you can wait.

We turned up the logging this morning (debug2) and are waiting for the issue to recur. Will report back then. Thanks.
Re: [GENERAL] Corrupt database? 8.1/FreeBSD6.0
Alvaro Herrera <[EMAIL PROTECTED]> wrote:
Did you perchance connect to template0 (probably marking it as connectable in the process), make some modification there, and then mark it as non-connectable again, without executing VACUUM FREEZE on it? AFAICS we only execute VACUUM FREEZE on it, so we shouldn't leave any unfrozen tuples.

Nope. I have never touched template0... and hadn't made any changes to pg_hba.conf (which is how I assumed you would make template0 connectable) until last night, when we determined that template0 might be at fault and were going to run pg_filedump on it to see if we could find anything useful. I assume this is similar/identical to this issue: http://archives.postgresql.org/pgsql-hackers/2006-03/msg01294.php Any ideas on how I should move forward?
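For the record, connectability is governed by pg_database.datallowconn rather than pg_hba.conf. A sketch of the usual drill for poking at template0 (superuser only, and re-freeze before locking it back down):

UPDATE pg_database SET datallowconn = true WHERE datname = 'template0';
-- \c template0, inspect with queries or pg_filedump, then:
VACUUM FREEZE;
UPDATE pg_database SET datallowconn = false WHERE datname = 'template0';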
Re: [GENERAL] Corrupt database? 8.1/FreeBSD6.0
Tom Lane <[EMAIL PROTECTED]> wrote:
So apparently there's some row in template0 that didn't get marked committed before the pg_clog segment for it went away. Given 8.1's rather schizophrenic view of whether it can modify template0 or not, this is not too surprising, but I thought we'd put in some defense against this happening. Alvaro? Jeff, had you changed your autovac settings recently? Given that autovac seems to be trying to hit template0 every few minutes, it's hard to see how the tuple got missed for long enough to be a problem.

Sure enough, I did make autovacuum more aggressive about 30-45 days ago (I'll have to check the logs to find the exact date). It was originally at whatever default settings came out of the box with 8.1. Naptime is currently set to 60 seconds. Am I to assume that this probably isn't the result of some operating system or filesystem misfeasance, and that corrupt data in my 'real' databases is probably not an issue?
Re: [GENERAL] Corrupt database? 8.1/FreeBSD6.0
Looking backwards in the logs, we see it a few other times this month (autovacuum occurring just prior)... same transaction ID. How could it be the same transaction ID from several days prior?

Jan 2 03:05:04 prod-app-1 postgres[8524]: [4-1] 8524 LOG: autovacuum: processing database "template0"
Jan 2 03:05:05 prod-app-1 postgres[8524]: [5-1] 8524 ERROR: could not access status of transaction 1924023481
Jan 2 03:05:05 prod-app-1 postgres[8524]: [5-2] 8524 DETAIL: could not open file "pg_clog/072A": No such file or directory

Jeff Amiel <[EMAIL PROTECTED]> wrote:
"PostgreSQL 8.1.2 on i386-portbld-freebsd6.0, compiled by GCC cc (GCC) 3.4.4 [FreeBSD] 20050518"

Started seeing this in the logs this afternoon... scaring the begeezus out of me.

Jan 11 19:20:19 prod-app-1 postgres[1752]: [5-1] 1752 ERROR: could not access status of transaction 1924023481
Jan 11 19:20:19 prod-app-1 postgres[1752]: [5-2] 1752 DETAIL: could not open file "pg_clog/072A": No such file or directory
Jan 11 19:24:35 prod-app-1 postgres[4094]: [5-1] 4094 ERROR: could not access status of transaction 1924023481
Jan 11 19:24:35 prod-app-1 postgres[4094]: [5-2] 4094 DETAIL: could not open file "pg_clog/072A": No such file or directory
Jan 11 19:28:35 prod-app-1 postgres[6728]: [5-1] 6728 ERROR: could not access status of transaction 1924023481
Jan 11 19:28:35 prod-app-1 postgres[6728]: [5-2] 6728 DETAIL: could not open file "pg_clog/072A": No such file or directory

I could find no reference to the transaction in pg_locks... nor could I find a file with that designation (it was always the same transaction id).

ls -l /db/pg_clog/
total 984
-rw------- 1 pgsql wheel 262144 Jan 11 09:55 07CF
-rw------- 1 pgsql wheel 262144 Jan 11 13:45 07D0
-rw------- 1 pgsql wheel 262144 Jan 11 17:00 07D1
-rw------- 1 pgsql wheel 172032 Jan 11 20:39 07D2

At first I thought it was related to a constantly running batch process, which I halted, and sure enough, the problem seemed to go away. If I restarted it, the problem returned. I then tried select * from most of the tables used by that process (at least the last day's worth) and found no obvious issues or errors. About that time, I noticed that the errors were all IMMEDIATELY preceded by an autovacuum of template0. So the logs actually looked like this:

Jan 11 19:20:19 prod-app-1 postgres[1752]: [4-1] 1752 LOG: autovacuum: processing database "template0"
Jan 11 19:20:19 prod-app-1 postgres[1752]: [5-1] 1752 ERROR: could not access status of transaction 1924023481
Jan 11 19:20:19 prod-app-1 postgres[1752]: [5-2] 1752 DETAIL: could not open file "pg_clog/072A": No such file or directory
Jan 11 19:24:35 prod-app-1 postgres[4094]: [4-1] 4094 LOG: autovacuum: processing database "template0"
Jan 11 19:24:35 prod-app-1 postgres[4094]: [5-1] 4094 ERROR: could not access status of transaction 1924023481
Jan 11 19:24:35 prod-app-1 postgres[4094]: [5-2] 4094 DETAIL: could not open file "pg_clog/072A": No such file or directory
Jan 11 19:28:35 prod-app-1 postgres[6728]: [4-1] 6728 LOG: autovacuum: processing database "template0"
Jan 11 19:28:35 prod-app-1 postgres[6728]: [5-1] 6728 ERROR: could not access status of transaction 1924023481
Jan 11 19:28:35 prod-app-1 postgres[6728]: [5-2] 6728 DETAIL: could not open file "pg_clog/072A": No such file or directory

We've downloaded/compiled pg_filedump, but are stumped as to what relation (or even what database) to start with. I restarted the batch process that I thought was the culprit, and the issue has 'gone away'. I'm sure there is some corruption somewhere (from reading various similar posts), but am not sure where to start. Any help would be appreciated.
[GENERAL] Corrupt database? 8.1/FreeBSD6.0
"PostgreSQL 8.1.2 on i386-portbld-freebsd6.0, compiled by GCC cc (GCC) 3.4.4 [FreeBSD] 20050518" Started seeing this in the logs this afternoon...scaring the begeezus out of me. Jan 11 19:20:19 prod-app-1 postgres[1752]: [5-1] 1752 ERROR: could not access status of transaction 1924023481 Jan 11 19:20:19 prod-app-1 postgres[1752]: [5-2] 1752 DETAIL: could not open file "pg_clog/072A": No such file or directory Jan 11 19:24:35 prod-app-1 postgres[4094]: [5-1] 4094 ERROR: could not access status of transaction 1924023481 Jan 11 19:24:35 prod-app-1 postgres[4094]: [5-2] 4094 DETAIL: could not open file "pg_clog/072A": No such file or directory Jan 11 19:28:35 prod-app-1 postgres[6728]: [5-1] 6728 ERROR: could not access status of transaction 1924023481 Jan 11 19:28:35 prod-app-1 postgres[6728]: [5-2] 6728 DETAIL: could not open file "pg_clog/072A": No such file or directory I could find nothing any transaction refereced in pg_locks...nor could I find a file with that designation (was always the same transaction id) ls -l /db/pg_clog/ total 984 -rw--- 1 pgsql wheel 262144 Jan 11 09:55 07CF -rw--- 1 pgsql wheel 262144 Jan 11 13:45 07D0 -rw--- 1 pgsql wheel 262144 Jan 11 17:00 07D1 -rw--- 1 pgsql wheel 172032 Jan 11 20:39 07D2 At first I thought it was related to a constantly running batch process which I halted, and sure enough, the problem seemed to go away. If I restarted, the problem returned. I then tried select * from on most of the tables used by that process (at least the last days worth) and found no obvious issues or errors. About that time, I noticed that the errors were all IMMEDIATELY preceeded by an autovacuum of template0. So the logs actually looked like this: Jan 11 19:20:19 prod-app-1 postgres[1752]: [4-1] 1752 LOG: autovacuum: processing database "template0" Jan 11 19:20:19 prod-app-1 postgres[1752]: [5-1] 1752 ERROR: could not access status of transaction 1924023481 Jan 11 19:20:19 prod-app-1 postgres[1752]: [5-2] 1752 DETAIL: could not open file "pg_clog/072A": No such file or directory Jan 11 19:24:35 prod-app-1 postgres[4094]: [4-1] 4094 LOG: autovacuum: processing database "template0" Jan 11 19:24:35 prod-app-1 postgres[4094]: [5-1] 4094 ERROR: could not access status of transaction 1924023481 Jan 11 19:24:35 prod-app-1 postgres[4094]: [5-2] 4094 DETAIL: could not open file "pg_clog/072A": No such file or directory Jan 11 19:28:35 prod-app-1 postgres[6728]: [4-1] 6728 LOG: autovacuum: processing database "template0" Jan 11 19:28:35 prod-app-1 postgres[6728]: [5-1] 6728 ERROR: could not access status of transaction 1924023481 Jan 11 19:28:35 prod-app-1 postgres[6728]: [5-2] 6728 DETAIL: could not open file "pg_clog/072A": No such file or directory We've downloaded/compiled pg_filedump, but are stumped as to what relation (or even what database) to start with. I restarted the batch process that I thought was the culprit and the issue has 'gone away'. I'm sure there is some corruption somewhere (from reading various similar posts), but not sure where to start. Any help would be appreciated. - Access over 1 million songs - Yahoo! Music Unlimited.
Re: [GENERAL] MAGIC_MODULE and libc
Yup... I did this:

CREATE OR REPLACE FUNCTION getpid() RETURNS integer AS
$BODY$
DECLARE
  thepid integer;
BEGIN
  SELECT INTO thepid pg_backend_pid FROM pg_backend_pid();
  RETURN thepid;
END;
$BODY$ LANGUAGE 'plpgsql' VOLATILE;

thanks!!!

Martijn van Oosterhout wrote:
On Tue, Dec 19, 2006 at 08:20:23AM -0600, Jeff Amiel wrote:
ERROR: incompatible library "/lib/libc.so.6": missing magic block
SQL state: XX000
Hint: Extension libraries are required to use the PG_MODULE_MAGIC macro.
I've added the appropriate macro to all my actual C code... but in this case surely I am not expected to recompile libc... am I? Is there an easier way to get the pid of the current process?

In general you should use a wrapper library, but in your case pg_backend_pid() will do it. Have a nice day,
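For what it's worth, the plpgsql wrapper seems more than needed; a one-line SQL wrapper should do the same thing:

CREATE OR REPLACE FUNCTION getpid() RETURNS integer
AS 'SELECT pg_backend_pid()'
LANGUAGE sql;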
[GENERAL] MAGIC_MODULE and libc
I just upgraded to v8.2 and have an issue with one of my stored proc/functions that worked under 8.1:

CREATE OR REPLACE FUNCTION getpid() RETURNS integer
AS '/lib/libc.so.6', 'getpid'
LANGUAGE 'c' VOLATILE;

ERROR: incompatible library "/lib/libc.so.6": missing magic block
SQL state: XX000
Hint: Extension libraries are required to use the PG_MODULE_MAGIC macro.

I've added the appropriate macro to all my actual C code... but in this case surely I am not expected to recompile libc... am I? Is there an easier way to get the pid of the current process?
[GENERAL] FreeBSD 6.0 PostgreSQL upgrade
We have a production FreeBSD 6.0 system with Postgresql 8.1 where we have avoided upgrading/updating the ports to avoid compatibility and other unknown issues. We have our supfile default date set on our production, test and development environments to ensure that they all have the same versions. I want to upgrade postgresql to v8.2 and started 'toying' with how to do so on a development server (that is set up identically to our production/test servers).

What I ended up trying was to download the postgresql82-client and -server tars from freebsd.org/ports and unpack them in the /usr/ports/databases directory. I did a make deinstall in the 8.1 directory and then did a make install clean on the 8.2 client. No issues. When I went to the postgresql82-server directory and tried "make install clean", I received this error:

echo "===> postgresql-server-8.2.0 "Unknown\ PostgreSQL\ version:\ 82.;exit 1
===> postgresql-server-8.2.0 Unknown PostgreSQL version: 82.
*** Error code 1

Hmm... a little digging and a little research leads me to /usr/ports/Mk/bsd.database.mk:

LIB_DEPENDS+= pq.${PGSQL${PGSQL_VER}_LIBVER}:${PORTSDIR}/databases/postgresql${PGSQL_VER}-client
.else
IGNORE= Unknown PostgreSQL version: ${PGSQL_VER}

On this box (and I assume the production box as well), the file has a series of entries that look like this:

PGSQL73_LIBVER= 3
PGSQL74_LIBVER= 3
PGSQL80_LIBVER= 4
PGSQL81_LIBVER= 4

but no mention of 8.2 (duh... I assume this file gets updated as part of the cvsup/portsdb stuff), so I added:

PGSQL82_LIBVER= 5

And then the build succeeded. Now... I know this sounds quite hackish. I know it was probably wrong. I would never do this on a production box (I was experimenting). Please go easy on me. How should I go about upgrading postgres versions, affecting as little as I can on the box?
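For the archives, the less hackish route is presumably to sync the whole ports tree forward (so bsd.database.mk learns about 8.2) rather than hand-editing it; a sketch assuming the stock example supfile location:

cvsup -g -L 2 /usr/share/examples/cvsup/ports-supfile
cd /usr/ports/databases/postgresql82-server && make install clean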
Re: [GENERAL] Odd transaction timestamp sequence issue
It is done using now(). But what I don't understand is how the transaction that started first could 'see' the record that hadn't yet been changed by the initial update to 'COMPLETE'. I thought: "Each transaction sees a snapshot (database version) as of its start time, no matter what other transactions are doing while it runs."

How is the "when" column determined? You did not show it in your SQL commands. If it's being driven off now() or CURRENT_TIMESTAMP, then the above isn't all that surprising, because the value is the time of transaction start, not the time at which the update was made.
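A quick way to see the distinction from psql (on 8.1, timeofday() is the moving wall-clock reading; 8.2 adds clock_timestamp() for the same purpose):

BEGIN;
SELECT now(), timeofday();  -- now() is frozen at transaction start
SELECT pg_sleep(2);         -- pg_sleep() is 8.2+; on 8.1 just pause a moment
SELECT now(), timeofday();  -- now() unchanged; timeofday() has moved on
COMMIT;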
[GENERAL] Odd transaction timestamp sequence issue
PostgreSQL 8.1.2 on i386-portbld-freebsd6.0, compiled by GCC cc (GCC) 3.4.4 [FreeBSD] 20050518

We have triggers on each of our tables that create audit table entries on each insert/update/delete. The audit table (in addition to containing information about the change that was made) contains a timestamp field and a serial as the primary key. The timestamp is generated by calling now() inside the audit trigger, so it should contain the timestamp of when the transaction (that yielded the insert/update/delete) began. We have a single (and very important) table that we perform the majority of our operations on... I'll simplify the definition of the table to this:

CREATE TABLE thetable (
  id int8 NOT NULL DEFAULT nextval('thetable_id_seq'::regclass),
  flag char(1) NOT NULL,
  state char(8) NOT NULL
);

We have a scenario where a batch process takes records from this table in a certain state and, one by one, moves them to a new state:

update thetable set state='COMPLETE', flag='X' where state='INITIAL' and id=?

Another batch process is looking for any records in that new state... and then updates certain elements of them:

update thetable set flag='Y' where id in (select id from thetable where state='COMPLETE')

This update statement is run as its own transaction (there is nothing else done in the transaction). The audit logs for some transactions show something very odd. For example, for id 210210 we have an audit trail that looks like this:

 audit_id | record_id |          when          | column | old_val | new_val
----------+-----------+------------------------+--------+---------+----------
        1 |    210210 | 2006-04-20 12:49:03.92 | state  | INITIAL | COMPLETE
        2 |    210210 | 2006-04-20 12:49:03.74 | flag   | X       | Y

By looking at the timestamps, the second update started BEFORE the first update, even though the second update cannot occur if the state hasn't been changed and committed by the first one! Even weirder is that the order of the sequence (audit_id) shows them occurring in the opposite order. Am I missing something obvious here?