[GENERAL] Catalog Bloat
Probably temp table related...but catalog bloat on one of my databases appears to be pretty bad. Is the below bloat (table and index) something to worry about? pg_stat_all_tables shows the relations ARE getting successfully vacuumed... Any suggestions on eliminating it? Not sure if tools like pg_reorg are appropriate (or effective) or even vacuum full (yikes). I'd prefer not to take a complete outage - but I would if this bloat is really an issue. (I know about "reindex system" (duh) - but as that requires me to take an outage, my question about whether the bloat is a cause for concern still stands)

 schemaname | tablename    | tbloat | wastedbytes | iname                           | ibloat | wastedibytes
------------+--------------+--------+-------------+---------------------------------+--------+--------------
 pg_catalog | pg_attribute |    9.0 |    27648000 | pg_attribute_relid_attnam_index |  243.5 |    361627648
 pg_catalog | pg_attribute |    9.0 |    27648000 | pg_attribute_relid_attnum_index |  168.5 |    253894656
 pg_catalog | pg_type      |   10.8 |     4890624 | pg_type_oid_index               |  135.8 |     28721152
 pg_catalog | pg_type      |   10.8 |     4890624 | pg_type_typname_nsp_index       |  287.2 |     60956672
 pg_catalog | pg_class     |   10.3 |     4562944 | pg_class_oid_index              |   94.1 |     26689536
 pg_catalog | pg_class     |   10.3 |     4562944 | pg_class_relname_nsp_index      |  270.1 |     77144064
 pg_catalog | pg_depend    |    5.3 |     3948544 | pg_depend_reference_index       |  337.0 |    156901376
 pg_catalog | pg_depend    |    5.3 |     3948544 | pg_depend_depender_index        |  359.6 |    167436288
 pg_catalog | pg_index     |    6.1 |     1130496 | pg_index_indexrelid_index       |   72.9 |      7659520
 pg_catalog | pg_index     |    6.1 |     1130496 | pg_index_indrelid_index         |   72.9 |      7659520
(10 rows)

Thanks in advance
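For reference, a minimal sketch of the vacuum-activity check mentioned above (standard pg_stat_all_tables columns; the relation list just mirrors the bloat report):

    -- Confirm autovacuum is actually reaching the bloated catalog relations
    SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
    FROM pg_stat_all_tables
    WHERE schemaname = 'pg_catalog'
      AND relname IN ('pg_attribute', 'pg_type', 'pg_class', 'pg_depend', 'pg_index');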
Re: [GENERAL] Looking for settings/configuration for FASTEST reindex on idle system.
On Thursday, January 9, 2014 4:03 PM, Jeff Amiel wrote: I am familiar with pg_reorg and its sibling pg_repack - but they call the base postgresql reindex functions underneath - and I have learned by using 'verbose' that the actual clustering of the table is quick - it's the reindexing that is slow (it's doing each reindex sequentially instead of concurrently). I'm second-guessing this - I'm not sure I REALLY know how long the table cluster takes versus the indexes. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Looking for settings/configuration for FASTEST reindex on idle system.
I have a maintenance window coming up and am using pg_upgrade to upgrade from 9.2.X to 9.3.X. As part of the window, I'd like to 'cluster' each table by its primary key. After doing so, I see amazing performance improvements (probably mostly because of index bloat - but possibly due to table fragmentation). That being said, I have a single table that is blowing my window - at 140 million rows (28 gig in size with 75 gig worth of indexes), this bad boy is my white whale. There are 10 indexes (not including the primary key). Yes - 10 is a lot - but I've been monitoring their use (most are single-column or partial indexes) and all are used. I've been reading and experimenting in trying to get a cluster of this table (which re-indexes all 10/11 indexes) to complete in a reasonable amount of time. There are lots of settings and ranges to choose from and while my experiments continue, I was looking to get some input. The lowest I have gotten for clustering this table is just under 6 hours. I am familiar with pg_reorg and its sibling pg_repack - but they call the base postgresql reindex functions underneath - and I have learned by using 'verbose' that the actual clustering of the table is quick - it's the reindexing that is slow (it's doing each reindex sequentially instead of concurrently).

PostgreSQL 9.3.2 on x86_64-pc-solaris2.11, compiled by gcc (GCC) 4.5.2, 64-bit
500 gig of RAM, 2.7GHz processors (48 cores)
shared_buffers set to 120gig
maintenance_work_mem set to 1gig
work_mem set to 500 meg

Things I have read/seen/been told to tweak:
fsync (set to off)
setting wal_level to minimal (to avoid WAL logging of cluster activity)
bumping up maintenance_work_mem (but I've also seen/read that uber-high values cause disk-based sorts which ultimately slow things down)
tweaking checkpoint settings (although with wal_level set to minimal - I don't think it comes into play)

Any good suggestions for lighting a fire under this process? If worst comes to worst, I can vacuum full the table and reindex each index concurrently (see the sketch below) - but it won't give me the benefit of having the tuples ordered by their oft-grouped primary key. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
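A rough sketch of the parallel alternative being weighed here - not the poster's actual procedure, and every table/index/column name is hypothetical. The idea: drop the secondary indexes so CLUSTER only rebuilds the primary key, then recreate the rest in concurrent sessions:

    -- Drop secondary indexes so CLUSTER only has to rebuild the primary key
    DROP INDEX big_table_idx_a;
    DROP INDEX big_table_idx_b;

    -- Orders the heap by the primary key, rebuilding just that one index
    CLUSTER big_table USING big_table_pkey;
    ANALYZE big_table;

    -- Recreate the dropped indexes in parallel, one psql session per index:
    -- session 1:
    SET maintenance_work_mem = '1GB';
    CREATE INDEX big_table_idx_a ON big_table (col_a);
    -- session 2:
    SET maintenance_work_mem = '1GB';
    CREATE INDEX big_table_idx_b ON big_table (col_b) WHERE col_b IS NOT NULL;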
[GENERAL] pg_prewarm status
Trying to follow the threads and other references - but I can't determine where this patch ended up. (http://www.postgresql.org/message-id/ca+tgmobrrrxco+t6gcqrw_djw+uf9zedwf9bejnu+rb5teb...@mail.gmail.com) I'm trying to experiment with some new hardware - and the functionality to add specific tables/indexes into cache ahead of time will benefit me greatly. I found a page describing how to apply the patch to 9.2.4 (jumping through some hoops - http://issues.collectionspace.org/browse/UCJEPS-432) and was hoping to get a version to apply to 9.3.X Can anyone advise me on how I might get this 'applied' to a 9.3.X source code base or any other options to denote specific relations that I'd like to get directly into shared_buffers? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
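For context, the interface the patch eventually shipped with as a contrib module looks like the sketch below; a hand-patched 9.3 build should expose something similar, but treat the exact signature as an assumption (and 'my_table'/'my_table_pkey' are placeholder names):

    CREATE EXTENSION pg_prewarm;

    -- Load a heap and one of its indexes into shared_buffers ahead of time
    SELECT pg_prewarm('my_table');
    SELECT pg_prewarm('my_table_pkey');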
Re: [GENERAL] table lock when where clause uses unique constraint instead of primary key.
On Monday, November 4, 2013 3:23 PM, Adrian Klaver wrote: >Probably poor choice of words:). So then, what we are looking at is >other clients trying to update user_profile but not succeeding because >pid 4899 is blocking. At this point all I can see is that the offending >query is updating some fields the others are not; disabled and reset_code. > >Is that always the case? > >If so any thing in the code path that is different when those fields are >updated? We have scenarios where the exact same query is in play in all instances. Any thoughts as to the fact that this could be a full table lock simply based on the use of username (not the primary key - but specifically a unique constraint) in the where clause? I'm grasping at straws, I know. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] table lock when where clause uses unique constraint instead of primary key.
On Monday, November 4, 2013 2:25 PM, Adrian Klaver wrote:
> > I grepped the schema (just to be sure) - no foreign keys on columns or table at all.
> > I do have an audit trigger on the table for updates - inserts into an audit table when changes are made and it DOES do a separate select from user_profile for other reasons - but not "for update" or anything - no explicit locking.
>
>Would it be possible to see that audit function?

It's kind of long (really just a lot of compares of old/new values). The relevant portion (that selects from user_profile) looks like this:

BEGIN
    SELECT user_id, user_ip
      INTO my_user_id, my_user_ip
      FROM audit_metadata
     WHERE pg_pid = getpid();

    IF ((NOT FOUND) OR (my_user_id = -1)) THEN
        SELECT user_id
          INTO my_user_id
          FROM user_profile
         WHERE username = 'db-'||CURRENT_USER and user_type='DBASE';
        IF (NOT FOUND) THEN
            RAISE EXCEPTION 'USERNAME NOT FOUND IN USER_PROFILE: % ', CURRENT_USER;
        END IF;
        my_user_ip := inet_client_addr();
    END IF;

    INSERT INTO audit ...
END;

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] table lock when where clause uses unique constraint instead of primary key.
On Monday, November 4, 2013 1:48 PM, Adrian Klaver wrote: >Any triggers on user_profile? >Any FK relationship in either direction? I grepped the schema (just to be sure) - no foreign keys on columns or table at all. I do have an audit trigger on the table for updates - inserts into an audit table when changes are made and it DOES do a separate select from user_profile for other reasons - but not "for update" or anything - no explicit locking. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] table lock when where clause uses unique constraint instead of primary key.
to: Rob Sargent

The "login" references have nothing to do with postgres - it's simply the table/column names being used. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] table lock when where clause uses unique constraint instead of primary key.
PostgreSQL 9.2.4 on x86_64-pc-solaris2.10, compiled by gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath), 64-bit

We have an annoying scenario that has been creating issues for us for years... time to try to figure it out. Essentially, we have a user table where we maintain username, id number, enabled/disabled state, etc. When a user logs in successfully, we reset any failed login attempts on the user's unique entry in this table.

CREATE TABLE user_profile
(
  user_id serial NOT NULL,
  username character varying(50) NOT NULL,
  login_attempts integer DEFAULT 0,
  …
  CONSTRAINT user_id PRIMARY KEY (user_id),
  CONSTRAINT name UNIQUE (username)
)

However - we often get "lock storms" where SOMEHOW, updates for individual users are causing all other updates to 'lock' on each other. Eventually the storm abates (sometimes in seconds - sometimes in minutes). See edited screen cap: http://i.imgur.com/x4DdYaV.png (PID 4899 just has a "where username = $1" cut off that you can't see out to the right.)

All updates are done using the username (unique constraint) instead of the primary key (the serial). In retrospect, I suppose these queries should be using the primary key (9-year-old code) but I am flummoxed as to how these updates can be causing table-level locks. I've never been able to catch the lock information during one of these storms - but I assume it is a table-level lock causing this. Thoughts? Is this just 'normal' behavior that I am not expecting? (because postgres doesn't know that the username is a unique field) -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
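A rough sketch of the kind of query that could capture the lock information during one of these storms (standard pg_locks/pg_stat_activity self-join; column names per 9.2 - a sketch, not an exhaustive lock-matching query):

    -- Who is waiting on whom, and for what
    SELECT waiting.pid        AS waiting_pid,
           waiting_act.query  AS waiting_query,
           holder.pid         AS holding_pid,
           holder_act.query   AS holding_query,
           holder.mode        AS held_lock_mode
    FROM pg_locks waiting
    JOIN pg_stat_activity waiting_act ON waiting_act.pid = waiting.pid
    JOIN pg_locks holder
      ON  holder.locktype      IS NOT DISTINCT FROM waiting.locktype
      AND holder.database      IS NOT DISTINCT FROM waiting.database
      AND holder.relation      IS NOT DISTINCT FROM waiting.relation
      AND holder.transactionid IS NOT DISTINCT FROM waiting.transactionid
      AND holder.pid <> waiting.pid
    JOIN pg_stat_activity holder_act ON holder_act.pid = holder.pid
    WHERE NOT waiting.granted
      AND holder.granted;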
Re: [GENERAL] Why does row estimation on nested loop make no sense to me
Thanks much! (sorry for top-posting, yahoo email sucks) - Original Message - From: Amit Langote To: Jeff Amiel Cc: "dep...@depesz.com" ; "pgsql-general@postgresql.org" Sent: Monday, May 20, 2013 9:51 AM Subject: Re: [GENERAL] Why does row estimation on nested loop make no sense to me I also found one other discussion which has similar issues addressed: http://postgresql.1045698.n5.nabble.com/Bogus-nestloop-rows-estimate-in-8-4-7-td5710254.html -- Amit Langote -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does row estimation on nested loop make no sense to me
Ok - I agree - Can somebody help me understand where the row estimates come from on a nested-loop operation in postgres then? - Original Message - From: hubert depesz lubaczewski To: Jeff Amiel Cc: "pgsql-general@postgresql.org" Sent: Saturday, May 18, 2013 3:39 AM Subject: Re: [GENERAL] Why does row estimation on nested loop make no sense to me Not sure if it helps, but it's apparently not a very rare thing. Quick analysis on data from explain.depesz.com showed that > 12% of plans with nested loop have such estimate. Couple of examples: http://explain.depesz.com/s/Qm4 http://explain.depesz.com/s/qmW http://explain.depesz.com/s/qnG http://explain.depesz.com/s/QO http://explain.depesz.com/s/qov ... -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does row estimation on nested loop make no sense to me
- Original Message -
From: Amit Langote
To: Jeff Amiel
Cc: "pgsql-general@postgresql.org"
Sent: Friday, May 17, 2013 2:21 PM
Subject: Re: [GENERAL] Why does row estimation on nested loop make no sense to me

>> explain analyze
>> select era.child_entity from entity_rel era join user_entity ue on
>> ue.entity_id = era.parent_entity and ue.user_id=12345
>>
>> Nested Loop (cost=0.00..2903.37 rows=29107 width=4) (actual time=0.028..0.274 rows=201 loops=1)
>>   -> Index Only Scan using entity_pk on user_entity ue (cost=0.00..62.68 rows=2 width=4) (actual time=0.011..0.012 rows=1 loops=1)
>>        Index Cond: (user_id = 12345)
>>        Heap Fetches: 1
>>   -> Index Scan using rel_parent on entity_rel era (cost=0.00..1261.85 rows=317 width=8) (actual time=0.013..0.164 rows=201 loops=1)
>>        Index Cond: (parent_entity = ue.entity_id)
>> Total runtime: 0.361 ms

>I noticed the explain output in your first mail shows Index Cond:
>(user_id = 10954) whereas your query says: ue.user_id=12345. Something
>with that? Although, your explain analyze does show the same values at
>both places with the row estimate being 29107 in both cases, which,
>well, looks awful and quite unexpected though there seem to have been
>similar observations before

That was a weak attempt at hiding 'real' data - I intended to change them all to 12345. :)

>Did you also check select count(*) on both the relations and found
>related numbers?

Nothing related (that I could find) in the rowcounts - one table has 20 million rows or so and the other 65K. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does row estimation on nested loop make no sense to me
> explain analyze
> select era.child_entity from entity_rel era join user_entity ue on
> ue.entity_id = era.parent_entity and ue.user_id=12345
>
> Nested Loop (cost=0.00..2903.37 rows=29107 width=4) (actual time=0.028..0.274 rows=201 loops=1)
>   -> Index Only Scan using entity_pk on user_entity ue (cost=0.00..62.68 rows=2 width=4) (actual time=0.011..0.012 rows=1 loops=1)
>        Index Cond: (user_id = 12345)
>        Heap Fetches: 1
>   -> Index Scan using rel_parent on entity_rel era (cost=0.00..1261.85 rows=317 width=8) (actual time=0.013..0.164 rows=201 loops=1)
>        Index Cond: (parent_entity = ue.entity_id)
> Total runtime: 0.361 ms

>Have you tried analyze (it's probably a case of insufficient/outdated
>statistics to planner's disposal) or probably consider changing
>default_statistics_target?

Again - my question revolves not around whether or not I am getting good or bad estimates - it is about the fact that the nested-loop row estimate does not appear to be derived from the nodes below it - it is off by orders of magnitude. I've never seen this before. That aside, yes - I did analyze and tweak the stats target during experimentation - no change. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does row estimation on nested loop make no sense to me
- Original Message -
From: Amit Langote
To: Jeff Amiel
Cc: "pgsql-general@postgresql.org"
Sent: Friday, May 17, 2013 11:37 AM
Subject: Re: [GENERAL] Why does row estimation on nested loop make no sense to me

On Sat, May 18, 2013 at 1:25 AM, Jeff Amiel wrote:
> On most nested loops that I do explain/explain analyze on, the row estimation
> for the nested-loop itself is a product of the inner nodes of the nested loop.
> However in this case, I am stumped!
>
> explain
> select era.child_entity from entity_rel era join user_entity ue on
> ue.entity_id = era.parent_entity and ue.user_id=12345
>
> Nested Loop (cost=0.00..2903.37 rows=29107 width=4)
>   -> Index Only Scan using entity_pk on user_entity ue (cost=0.00..62.68 rows=2 width=4)
>        Index Cond: (user_id = 10954)
>   -> Index Scan using rel_parent on entity_rel era (cost=0.00..1261.85 rows=317 width=8)
>        Index Cond: (parent_entity = ue.entity_id)
>
> How can the estimated number of rows for the nested loop node EXCEED the
> product of the 2 row estimates of the tables being joined?
> Not only does it exceed it - but it is orders of magnitude greater.
>
> Am I missing something obvious here? I can see the nested loop row estimate
> being LESS but certainly not more.
>
> Can you also post the output of explain analyze?

I'm not worried about performance (per se) but the row estimation issue, which propagates up as part of a bigger query. But here ya go:

explain analyze
select era.child_entity from entity_rel era join user_entity ue on
ue.entity_id = era.parent_entity and ue.user_id=12345

Nested Loop (cost=0.00..2903.37 rows=29107 width=4) (actual time=0.028..0.274 rows=201 loops=1)
  -> Index Only Scan using entity_pk on user_entity ue (cost=0.00..62.68 rows=2 width=4) (actual time=0.011..0.012 rows=1 loops=1)
       Index Cond: (user_id = 12345)
       Heap Fetches: 1
  -> Index Scan using rel_parent on entity_rel era (cost=0.00..1261.85 rows=317 width=8) (actual time=0.013..0.164 rows=201 loops=1)
       Index Cond: (parent_entity = ue.entity_id)
Total runtime: 0.361 ms

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does row estimation on nested loop make no sense to me
> Can you provide a self-contained test case that does this? That response scares me. :) I can try - Every other table set (small, easy to experiment with) returns results as expected - Is the implication that this looks 'unusual'? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Why does row estimation on nested loop make no sense to me
On most nested loops that I do explain/explain analyze on, the row estimation for the nested-loop itself is a product of the inner nodes of the nested loop. However in this case, I am stumped!

explain
select era.child_entity from entity_rel era join user_entity ue on
ue.entity_id = era.parent_entity and ue.user_id=12345

Nested Loop (cost=0.00..2903.37 rows=29107 width=4)
  -> Index Only Scan using entity_pk on user_entity ue (cost=0.00..62.68 rows=2 width=4)
       Index Cond: (user_id = 10954)
  -> Index Scan using rel_parent on entity_rel era (cost=0.00..1261.85 rows=317 width=8)
       Index Cond: (parent_entity = ue.entity_id)

How can the estimated number of rows for the nested loop node EXCEED the product of the 2 row estimates of the tables being joined? Not only does it exceed it - but it is orders of magnitude greater.

Am I missing something obvious here? I can see the nested loop row estimate being LESS but certainly not more.

PostgreSQL 9.2.4 on x86_64-pc-solaris2.10, compiled by gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath), 64-bit

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
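A note for readers of this thread: in broad strokes, the planner sizes a join from whole-relation statistics and join selectivity rather than by multiplying the child node estimates, which is how the two numbers can diverge. A hedged way to inspect the inputs it works from (pg_stats; output is obviously site-specific):

    -- Per-column statistics feeding the join selectivity estimate
    SELECT tablename, attname, n_distinct, null_frac
    FROM pg_stats
    WHERE (tablename = 'entity_rel'  AND attname = 'parent_entity')
       OR (tablename = 'user_entity' AND attname = 'entity_id');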
[GENERAL] When did this behavior change (and where else might it bite me)?
In prepping for an upgrade to 9.2.3, I stumbled across this:

CREATE TABLE foo
(
  myint integer,
  string1 text,
  string2 text
)
WITH ( OIDS=FALSE );

insert into foo values (12345,'Y','N');

select * from foo f where f.myint = 12345 or f.name='Y'

In 9.2.3, this returns:

ERROR: column f.name does not exist
LINE 1: select * from foo f where myint = 12345 or f.name='Y'

In 8.4.6, this returns no error (and gives me the row from the table). It looks like the parser is short-circuiting in 8.4.6 before stumbling upon the invalid column name - EXCEPT when the column name is NOT a reserved word (although according to http://www.postgresql.org/docs/current/static/sql-keywords-appendix.html, 'name' is not a reserved word). Example - in 8.4.6, this WILL return an error:

select * from foo f where f.myint = 12345 or f.poopy='Y'

ERROR: column f.poopy does not exist
LINE 2: select * from foo f where f.myint = 12345 or f.poopy='Y'
^

NOTE: The problem (assuming the problem is in 8.4.6) only manifests itself when I use table aliases.

select * from foo f where myint = 12345 or name='Y'

gives an error I would expect:

ERROR: column "name" does not exist
LINE 2: select * from foo f where myint = 12345 or name='Y'
^

Any insight into what change (I pored through the release notes and couldn't find anything) may have 'fixed' this behavior so that I might better head these off before my conversion? (Yes, my example was contrived - and I did have a bug where the wrong column name was used.) -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
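A hedged illustration of the explanation commonly given on the lists for cases like this: older servers could fall back to interpreting f.name as the function/cast call name(f) - the whole row coerced to type "name" - instead of raising a column error, and later releases tightened that fallback. Whether that is precisely what 8.4.6 did here is an assumption, but the attribute/function equivalence is easy to demonstrate:

    -- On a server that accepts it, these two spellings are the same thing:
    select f.name from foo f;   -- attribute notation
    select name(f) from foo f;  -- function notation: row f coerced to type "name"
    -- Neither references a column, which is why no "does not exist" error fired.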
[GENERAL] planner, newly added records and most common values
I've got a scenario where I've got a 2-million-row table. Data from inbound files gets processed into it. A new file might have 10 resulting rows in this table...might have 40K...depends on the source, day of month, etc. I've got a process that parses the file and loads the records into the table...giving it a unique file_id for the overall load and placing that value on each record. Another process will perform a series of queries...joining against that table (for only records with that file_id). The problem is that the planner has no idea how many records might exist for that file_id. If I throw a file_id at the planner that is not in the most-common-values list, it picks a nice number like 384 as its row count estimate. So when I am referencing a new file_id (which obviously isn't IN the most-common-values list yet...regardless of how many records I just loaded, because I haven't run analyze), the planner dutifully estimates that I will get only 384 rows. For large files, this is off by 2 (or god forbid, 3) orders of magnitude. That yields very bad overall plans (regardless of the fact that I have indexes on the file_id column). It seems like I am in a no-win situation. The query I am executing is fairly complex...and when the planner is off by multiple orders of magnitude on a rowcount, it goes way off the tracks in terms of planning. I COULD do an analyze after loading the file...but there is no guarantee that the file I just loaded will end up in the most-common-values list...and I end up with a bad plan. Any thoughts? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
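A minimal sketch of the "analyze after load" mitigation discussed above, narrowed to the one column so it stays cheap (the table name is hypothetical):

    -- After each file load, refresh stats for just the distribution that matters
    ANALYZE inbound_records (file_id);

    -- Raising the stats target widens the MCV list, making it likelier that a
    -- big file's id lands in it:
    ALTER TABLE inbound_records ALTER COLUMN file_id SET STATISTICS 1000;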
Re: [GENERAL] Oddball data distribution giving me planner headaches
--- On Fri, 12/2/11, Tom Lane wrote: > The only real fix for that will require cross-column > statistics, which > we don't have yet --- without such, there's no way for the > planner to > know that distributors have an atypical number of child > customers. The only caveat that I can think of here is that each of the customer_ids in the subselect IS in the MCV list for the parent_customer column of the customer_rel table. If I had listed SPECIFIC customer ids in the subselect, I would get spot-on row estimates. Is there anything there that gives me any hope? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Oddball data distribution giving me planner headaches
--- On Fri, 12/2/11, Tom Lane wrote: > The only real fix for that will require cross-column > statistics, which > we don't have yet --- without such, there's no way for the > planner to > know that distributors have an atypical number of child > customers. > I suspected as such. > At the moment I think the only way to work around this is > to denormalize > your schema a bit. And I feared as much. It's biting me in other areas as well...this unusual distribution of data...certain types of customers have completely different data patterns than others. Back to the drawing board...thanks! -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Oddball data distribution giving me planner headaches
--- On Fri, 12/2/11, David Johnston wrote: > > > Can you wrap the query into an SQL or PL/pgSQL function so > that, at least, > then planner will not be able to see the embedded plan info > in the outer > queries? You use-case may allow you to create one or > more VIEWs wrapping > the function call with pre-determined parameters so that > you do not lose the > ability to write simple select queries without the need for > explicit > function calls. > > No idea how that would impact the planner for the other > queries but maybe > worth a try while waiting for someone more knowledgeable > than myself to > respond. I had actually considered this...because you can define what the 'row estimate' is for a function...except in my case, I have some dynamic inputs that would change what those estimates would need to be. Thanks for the help though! -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
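For reference, a sketch of the ROWS knob being referred to - a set-returning function can carry a fixed planner row estimate (the function name is hypothetical; the table is the one from this thread):

    CREATE OR REPLACE FUNCTION distributor_children(integer)
    RETURNS SETOF customer_rel AS $$
        SELECT * FROM customer_rel WHERE parent_customer = $1;
    $$ LANGUAGE sql STABLE
    ROWS 15000;  -- fixed estimate; the limitation noted above is that
                 -- this value cannot vary with the input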
Re: [GENERAL] Oddball data distribution giving me planner headaches
--- On Fri, 12/2/11, David Johnston wrote: > From: David Johnston > - > My, possibly naïve, observation: > > So aside from the fact the estimates seem to be off the > planner has still > chosen the most effective plan? In that situation no > matter how accurate > you get the statistics you will not gain any performance > because the planner > will never choose a different plan. > Thanks. I'm not interested in optimizing this query...it's fast and efficient. However, the planner thinks that it is going to return a million rows when it is only going to return one thousand. When this query is used as an element of a BIGGER query, that causes me all sorts of planner issues. That's the problem I am trying to solve...why the planner is 3 orders of magnitude off in row estimation. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Oddball data distribution giving me planner headaches
--- On Fri, 12/2/11, David Johnston wrote: > What happens if you disable, say, nested loops and/or index > scans? The planner selects different join/indexing techniques (the query is slower) but the (bad) row estimates remain identical. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Oddball data distribution giving me planner headaches
--- On Fri, 12/2/11, David Johnston wrote:
> From: David Johnston
> What kind of plan does the following give?
>
> EXPLAIN ANALYZE
> SELECT *
> FROM customer_rel p
> JOIN customer c ON (p.parent_customer = c.customer_id)
> WHERE c.customer_type = 'DISTRIBUTOR'

Nearly identical output:

"Nested Loop (cost=0.00..29624.69 rows=931 width=97) (actual time=0.032..1330.208 rows=1025401 loops=1)"
"  -> Seq Scan on customer c (cost=0.00..25429.10 rows=136 width=71) (actual time=0.017..212.059 rows=68 loops=1)"
"       Filter: (customer_type = 'DISTRIBUTOR'::bpchar)"
"  -> Index Scan using rel_parent on customer_rel p (cost=0.00..30.76 rows=7 width=26) (actual time=0.006..7.732 rows=15079 loops=68)"
"       Index Cond: (parent_customer = c.customer_id)"
"Total runtime: 1544.281 ms"

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Oddball data distribution giving me planner headaches
Oddball data distribution giving me headaches. We have a distinct 'customer' table with customer_id, type and name/demographic information. Assume some 1 million rows in the customer table. We then have a customer 'relationship' table which simply contains 2 columns…designating parent and child relationships…but allowing complex hierarchies between customers.

CREATE TABLE customer_rel
(
  parent_customer integer NOT NULL,
  child_customer integer NOT NULL
)

8 million rows in this table. Oddball distribution. We have some 8 levels of hierarchy (customer type) represented with this table. Every customer gets an entry where parent/child is themselves…and then for every 'upline'. At the highest level, we have 'distributors' which have all other customer types underneath them. Assuming we had some 68 distributors, the entries where THEY are the parent_customer represent nearly a million rows of the 8 million. I have extracted a simple case from a larger query that was generating an off-beat plan because of the unexpected planner row-counts being spewed by a low-level query.

explain analyze
select * from customer_rel where parent_customer in
    (select customer_id from customer where customer_type='DISTRIBUTOR')

"Nested Loop (cost=25429.44..29626.39 rows=931 width=0) (actual time=216.325..1238.091 rows=1025401 loops=1)"
"  -> HashAggregate (cost=25429.44..25430.80 rows=136 width=4) (actual time=216.304..216.339 rows=68 loops=1)"
"       -> Seq Scan on customer (cost=0.00..25429.10 rows=136 width=4) (actual time=0.018..216.226 rows=68 loops=1)"
"            Filter: (customer_type = 'DISTRIBUTOR'::bpchar)"
"  -> Index Scan using rel_parent on customer_rel (cost=0.00..30.76 rows=7 width=4) (actual time=0.006..8.190 rows=15079 loops=68)"
"       Index Cond: (parent_customer = customer.customer_id)"
"Total runtime: 1514.810 ms"

The fact that the top-level nested loop THINKS it will only be returning 931 rows (instead of over 1 million) is the killer here…3 orders of magnitude. The results of this query are used as part of a bigger query, and the screwed-up stats are causing all sorts of havoc upline. I'm experimenting in 9.1.0…have set the statistics to 1000 (and 10000) on both columns (parent and child) to little effect. Have hardcoded the n_distinct on the parent_customer column to be 1,000,000 also…with no effect (doing an analyze of the table after each change). Does this oddball data distribution doom me to poor planning forever? Any other thoughts? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
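For reference, the knobs mentioned in the last paragraph spelled out as commands (9.x syntax; ANALYZE must follow for any of it to take effect):

    ALTER TABLE customer_rel ALTER COLUMN parent_customer SET STATISTICS 1000;
    ALTER TABLE customer_rel ALTER COLUMN child_customer  SET STATISTICS 1000;
    -- Hardcode the distinct-value estimate the planner uses for the column
    ALTER TABLE customer_rel ALTER COLUMN parent_customer SET (n_distinct = 1000000);
    ANALYZE customer_rel;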
[GENERAL] Convert database from ASCII to UTF-8
It's come time to bite the bullet and convert a half-terabyte database from ASCII to UTF-8. I have gone through a bit of effort to track down the unclean ASCII text and repair it, but would like to avoid the outage of a many-, many-hour dump/restore. Using Postgres 8.4.X. Are there any other magic options open to me? Any way to do an in-place conversion? I assume slony replication is an option. What about some sort of WAL log-shipping replication? Any thoughts would be appreciated. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
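As a trivial sanity check before picking a conversion route, the current server-side encodings can be confirmed with a standard catalog query:

    SELECT datname, pg_encoding_to_char(encoding) AS encoding
    FROM pg_database;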
[GENERAL] Table Vacuum (to prevent wraparound)
PostgreSQL 8.4.4 on x86_64-pc-solaris2.10, compiled by GCC gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath), 64-bit

2 large tables (around a billion rows each) recently received attention from autovacuum. They both currently show up in my currently running queries as "autovacuum: VACUUM tablename (to prevent wraparound)". These tables are insert-only audit tables…no updates, no deletes. Autovacuum settings are default with no overrides for these specific tables (the only override is to set 10 max_workers). I will admit that the "Preventing Transaction ID Wraparound Failures" chapter in the manual (and the subsequent autovacuum daemon chapter) makes my head spin. Is there some specific query related to relfrozenxid/datfrozenxid that I should be utilizing to do some analysis? Obviously SOME threshold has been reached…hence autovacuum stepping in. My two real questions are:

1. Am I in any real danger here? I have run autovacuum continuously since the dump-restore that created this database 6 months ago.
2. These autovacuums have been running for 5 days now. I have a database maintenance window tonight where I have to take down the server. I assume autovacuum will re-awaken and decide to start vacuuming these again. I assume I am affecting performance with these needless? vacuums. Any tweak I can/should make to any of the freeze_age params to prevent this?

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
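A sketch of the sort of relfrozenxid check being asked about (standard catalog query; compare the ages against autovacuum_freeze_max_age, 200 million by default):

    -- Tables closest to forcing an anti-wraparound vacuum
    SELECT c.relname,
           age(c.relfrozenxid) AS xid_age
    FROM pg_class c
    WHERE c.relkind = 'r'
    ORDER BY age(c.relfrozenxid) DESC
    LIMIT 10;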
Re: [GENERAL] Advice on contingency plan for DAS array with separate local WAL drives
On 6/14/10 8:05 AM, "Jeff Amiel" wrote: > What is recommended in terms of prep/switchover in this instance? Should we > be rsyncing or using built-in wal-log shipping of these transaction logs to > our stand-by server? Simply pop out these drives and hand-move them to the > standby-server? (assuming they are not the issue in the first place) I should note that this 'similar' server is not intended to actually be a real warm standby server in the postgres sense...it serves other purposes...it simply possesses the right configuration (memory, cpu) to act as a temporary database server. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Advice on contingency plan for DAS array with separate local WAL drives
Recently migrated to a shiny new 8.4.4 postgres instance...data stored on an attached storage array. Transaction logs are stored on 2 local mirrored drives (local to the database server itself) for best performance. While we are replicating (using slony) to our DR site, our first-choice plan (in the event of an issue simply with the database server itself) was to disconnect the SAS cables from it (connected to the DAS) and connect them to a 'similar' box we have on standby. With the WAL logs physically on the drives of the original database server, this obviously becomes an issue. What is recommended in terms of prep/switchover in this instance? Should we be rsyncing or using built-in WAL log shipping of these transaction logs to our standby server? Simply pop out these drives and hand-move them to the standby server? (assuming they are not the issue in the first place) Any thoughts would be appreciated. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
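For illustration, a minimal sketch of the built-in log-shipping option mentioned above (8.4-era postgresql.conf settings; the hostname and paths are placeholders, and rsync is just one possible transport):

    # postgresql.conf on the primary
    archive_mode = on
    archive_command = 'rsync -a %p standby:/wal_archive/%f'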
Re: [GENERAL] 3rd time is a charm.....right sibling is not next child crash.
On 6/8/10 2:03 PM, "Alvaro Herrera" wrote: > > I've seen this problem (and others) in a high-load environment. Not > Slony related though. > > I wrote a small tool to check btree index files for consistency problems > such as this one, by parsing pg_filedump output. I've seen strange > things such as index pointers pointing to pages that shouldn't have been > pointed to; mismatching sibling pointers; and others. > > Do you have a copy of the broken index file? Alas, no. But I have another 5 days to reproduce the problem before changing hardware/versions (which will probably bring a whole new set of 'opportunities' to deal with). If it happens again, I will snag the index file for analysis. Any way you can share your tool source so I can proactively look for issues in my indexes? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] 3rd time is a charm.....right sibling is not next child crash.
On 6/8/10 1:15 PM, "Jaime Casanova" wrote: > On Tue, Jun 8, 2010 at 12:49 PM, Jeff Amiel wrote: >> >> Does Slony manage its own vacuuming separate from postgres' autovacuum? >> > > Yes it does: http://www.slony.info/documentation/maintenance.html "It seems preferable to configure autovacuum to avoid vacuum Slony-I-managed configuration tables." Hmmm...I don't do this. Surely this is not related to my corrupt indexes...2 attempted vacuums on the same indexes? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] 3rd time is a charm.....right sibling is not next child crash.
On 6/8/10 12:56 PM, "Tom Lane" wrote:
> Jeff Amiel writes:
>> On a side note, I am 100% sure that autovacuum was disabled when I brought
>> the database back up after the core dump(s). However, minutes after
>> restarting, some of my larger tables started getting vacuumed by the pgsql user.
>> Any way that a vacuum would kick off for a particular table (or series of
>> tables) even when autovacuum was off in the postgresql.conf?
>
> Anti-transaction-wraparound vacuums, perhaps?

I would hope not. :) This is postgres 8.2.X. Autovacuum has been enabled forever (seemingly with no errors). Anything I can look for? (I searched the logs for references to "must be vacuumed within" but found nothing)

SELECT datname, age(datfrozenxid) FROM pg_database;

 datname   |    age
-----------+-----------
 postgres  | 178649703
 prod      | 204588751
 template1 | 178653277
 template0 | 178653131

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] 3rd time is a charm.....right sibling is not next child crash.
On 6/8/10 11:23 AM, "Tom Lane" wrote: > In your original report you mentioned that the next autovacuum attempt > on the same table succeeded without incident. Has that been true each > time? I wonder whether this is some transient state, rather than actual > corruption that you need to REINDEX to recover from. > I didn't waste time during this event and reindexed as quick as I could. I will note, however, that these indexes (specifically the one that generated the error) were in use (successfully) AFTER the event and BEFORE the reindex by the slony triggers (by virtue of inserts into the log tables and such), even though I stopped autovacuum and the slon daemon. Not reads, obviously...just inserts/updates. If nobody else has seen/is seeing this, I will chalk this whole scenario up to oddball SAN issues (we've logged many a write/read error over the year(s), causing corruption on these heavily manipulated indexes). I'll be glad to move to my attached storage as quickly as possible. On a side note, I am 100% sure that autovacuum was disabled when I brought the database back up after the core dump(s). However, minutes after restarting, some of my larger tables started getting vacuumed by the pgsql user. Any way that a vacuum would kick off for a particular table (or series of tables) even when autovacuum was off in the postgresql.conf? My only manual vacuum process is kicked off late at night, so this was not it. Also...before I had a chance to disable the slon daemon, I also noticed other slony tables being vacuum analyzed (even though autovacuum was off). Does Slony manage its own vacuuming separate from postgres' autovacuum? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Restore roles with same oid
On 6/8/10 10:47 AM, "Stephen Frost" wrote: > Then you've been through this before.. Perhaps you should go check out > what you did then. Back before 8.1, we didn't use OIDs for > users/groups. :) Changing to OIDs was part of the work that I did to > add role support. Hmmm...this code has been in place since March 2005, using the usesysid from pg_shadow. 8.1 wasn't released until November of 2005. Doesn't matter...I still have some work to do. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Restore roles with same oid
On 6/8/10 10:39 AM, "Stephen Frost" wrote: > I'm afriad you're not going to have a choice.. I would recommend > creating a mapping from the old IDs to the new ones as part of this > upgrade, to keep the historical information. Guess it's not nice to > point this out- but you really shouldn't have ever used OIDs for > something external to PG (or even internally, really). :) To be honest...this was done early in our postgresql careers (back in the 7.X days). We knew so little. :) -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Restore roles with same oid
On 6/8/10 10:30 AM, "Thom Brown" wrote: > Can't you switch to using role names? I don't think oids are intended > to be used by anything other than PostgreSQL. :( If only I could...massive audit tables contain these IDs, with years of data. We have a plan to change to sequence values stored in one of our tables (one for each of our postgresql users) but don't want to pull the trigger on that plan as part of this upgrade if we can help it. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Restore roles with same oid
We currently use the 'usesysid' column from pg_shadow (which is really pg_authid.oid, I assume) for a trigger-based auditing mechanism. We are about to do a dump from an 8.2 database into 8.4 and would like to preserve the usesysid/oid when restoring. No matter what options I throw at pg_dumpall, it gives me 'create role' commands...and obviously no preservation of the oids. Any suggestions on how I can manage this? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
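One hedged workaround sketch (not something pg_dumpall supports directly): capture an old-oid-to-name mapping before the upgrade, then translate historical audit rows through role names to the new oids after restore. The mapping table name below is hypothetical:

    -- On the 8.2 cluster, before the dump:
    CREATE TABLE role_oid_map AS
    SELECT oid AS old_oid, rolname FROM pg_authid;

    -- On the 8.4 cluster, after restore, resolve old ids to new oids:
    SELECT m.old_oid, a.oid AS new_oid, a.rolname
    FROM role_oid_map m
    JOIN pg_authid a USING (rolname);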
[GENERAL] 3rd time is a charm.....right sibling is not next child crash.
Not looking for help...just putting some data out there. 2 previous crashes caused by corrupt slony indexes:

http://archives.postgresql.org/pgsql-general/2010-02/msg00022.php
http://archives.postgresql.org/pgsql-general/2009-12/msg01172.php

New one yesterday:

Jun 7 15:05:01 db-1 postgres[9334]: [ID 748848 local0.crit] [3989781-1] 2010-06-07 15:05:01.087 CDT 9334 PANIC: right sibling 169 of block 168 is not next child of 249 in index "sl_seqlog_idx"

We are on the eve of switching off our SAN to some direct attached storage and upgrading postgres and slony in the process this weekend...so any thoughts that it might be hardware, driver or even postgres/slony should be alleviated by the fact that everything is changing. That being said, each time this has happened it has been a slony index that was corrupt, and I find that 'odd'. While I can't imagine a bug in slony corrupting postgres indexes...and I can't imagine a bug in postgres corrupting only slony indexes, I don't really know what to think. Just putting this out there in case anyone has similar issues or can use this data in some meaningful way. Stack trace looks similar to last time.

Program terminated with signal 6, Aborted.
#0 0xfecba227 in _lwp_kill () from /lib/libc.so.1
(gdb) bt
#0 0xfecba227 in _lwp_kill () from /lib/libc.so.1
#1 0xfecb598f in thr_kill () from /lib/libc.so.1
#2 0xfec61ed3 in raise () from /lib/libc.so.1
#3 0xfec41d0d in abort () from /lib/libc.so.1
#4 0x0821b8a6 in errfinish (dummy=0) at elog.c:471
#5 0x0821c74b in elog_finish (elevel=22, fmt=0x82b7780 "right sibling %u of block %u is not next child of %u in index \"%s\"") at elog.c:964
#6 0x0809e1a0 in _bt_pagedel (rel=0x867bcd8, buf=139905, stack=0x86b3768, vacuum_full=0 '\0') at nbtpage.c:1141
#7 0x0809f835 in btvacuumscan (info=0x8043f70, stats=0x86b5c30, callback=0, callback_state=0x0, cycleid=29488) at nbtree.c:936
#8 0x0809fc65 in btbulkdelete (fcinfo=0x0) at nbtree.c:547
#9 0x0821f424 in FunctionCall4 (flinfo=0x0, arg1=0, arg2=0, arg3=0, arg4=0) at fmgr.c:1215
#10 0x0809a89f in index_bulk_delete (info=0x8043f70, stats=0x0, callback=0x812ffc8 , callback_state=0x86b5818) at indexam.c:573
#11 0x0812ff54 in lazy_vacuum_index (indrel=0x867bcd8, stats=0x86b5b70, vacrelstats=0x86b5818) at vacuumlazy.c:660
#12 0x0813055a in lazy_vacuum_rel (onerel=0x867b7f8, vacstmt=0x86659b8) at vacuumlazy.c:487
#13 0x0812e910 in vacuum_rel (relid=140925368, vacstmt=0x86659b8, expected_relkind=114 'r') at vacuum.c:1107
#14 0x0812f95a in vacuum (vacstmt=0x86659b8, relids=0x8665bc0) at vacuum.c:400
#15 0x08186e16 in AutoVacMain (argc=0, argv=0x0) at autovacuum.c:914
#16 0x08187278 in autovac_start () at autovacuum.c:178
#17 0x0818bfed in ServerLoop () at postmaster.c:1252
#18 0x0818d16d in PostmasterMain (argc=3, argv=0x833adc8) at postmaster.c:966
#19 0x08152cce in main (argc=3, argv=0x833adc8) at main.c:188
(gdb)

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Another PANIC corrupt index/crash ...any thoughts?
About a month ago I posted about a database crash possibly caused by a corrupt index:

Dec 30 17:41:57 db-1 postgres[28957]: [ID 748848 local0.crit] [34004622-1] 2009-12-30 17:41:57.825 CST 28957 PANIC: right sibling 2019 of block 2018 is not next child of 1937 in index "sl_log_2_idx1"

It has since happened again with a DIFFERENT index (interestingly, also a slony-related index):

Jan 29 15:17:42 db-1 postgres[29025]: [ID 748848 local0.crit] [4135622-1] 2010-01-29 15:17:42.915 CST 29025 PANIC: right sibling 183 of block 182 is not next child of 158 in index "sl_seqlog_idx"

I re-indexed the table...and restarted the database and all appears well (shut down autovacuum and slony for a while first to get my feet underneath me, and then restarted after a few hours with no apparent ill effects). Coincidentally (or not), we started getting disk errors about a minute AFTER the above error (db storage is on a fibre attached SAN):

/var/log/archive/log-2010-01-29.log:Jan 29 15:18:50 db-1 scsi_vhci: [ID 734749 kern.warning] WARNING: vhci_scsi_reset 0x1
/var/log/archive/log-2010-01-29.log:Jan 29 15:18:50 db-1 scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci10de,5...@d/pci1077,1...@0/f...@0,0 (fcp1):
/var/log/archive/log-2010-01-29.log:Jan 29 15:18:52 db-1 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/d...@g000b08001c001958 (sd9):
/var/log/archive/log-2010-01-29.log:Jan 29 15:18:52 db-1 scsi: [ID 107833 kern.notice] Requested Block: 206265378 Error Block: 206265378
/var/log/archive/log-2010-01-29.log:Jan 29 15:18:52 db-1 scsi: [ID 107833 kern.notice] Vendor: Pillar Serial Number:
/var/log/archive/log-2010-01-29.log:Jan 29 15:18:52 db-1 scsi: [ID 107833 kern.notice] Sense Key: Unit Attention
/var/log/archive/log-2010-01-29.log:Jan 29 15:18:52 db-1 scsi: [ID 107833 kern.notice] ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0

Stack trace from recent crash is below:

Program terminated with signal 6, Aborted.
#0 0xfed00c57 in _lwp_kill () from /lib/libc.so.1
(gdb) bt
#0 0xfed00c57 in _lwp_kill () from /lib/libc.so.1
#1 0xfecfe40e in thr_kill () from /lib/libc.so.1
#2 0xfecad083 in raise () from /lib/libc.so.1
#3 0xfec90b19 in abort () from /lib/libc.so.1
#4 0x0821b6ea in errfinish (dummy=0) at elog.c:471
#5 0x0821c58f in elog_finish (elevel=22, fmt=0x82b7200 "right sibling %u of block %u is not next child of %u in index \"%s\"") at elog.c:964
#6 0x0809e0a8 in _bt_pagedel (rel=0x8602f78, buf=377580, stack=0x881d660, vacuum_full=0 '\0') at nbtpage.c:1141
#7 0x0809f73d in btvacuumscan (info=0x8043f60, stats=0x8578410, callback=0, callback_state=0x0, cycleid=20894) at nbtree.c:936
#8 0x0809fb6d in btbulkdelete (fcinfo=0x0) at nbtree.c:547
#9 0x0821f268 in FunctionCall4 (flinfo=0x0, arg1=0, arg2=0, arg3=0, arg4=0) at fmgr.c:1215
#10 0x0809a7a7 in index_bulk_delete (info=0x8043f60, stats=0x0, callback=0x812fea0 , callback_state=0x85765e8) at indexam.c:573
#11 0x0812fe2c in lazy_vacuum_index (indrel=0x8602f78, stats=0x85769c8, vacrelstats=0x85765e8) at vacuumlazy.c:660
#12 0x08130432 in lazy_vacuum_rel (onerel=0x8602140, vacstmt=0x85d9f48) at vacuumlazy.c:487
#13 0x0812e7e8 in vacuum_rel (relid=140353352, vacstmt=0x85d9f48, expected_relkind=114 'r') at vacuum.c:1107
#14 0x0812f832 in vacuum (vacstmt=0x85d9f48, relids=0x85d9f38) at vacuum.c:400
#15 0x08186cee in AutoVacMain (argc=0, argv=0x0) at autovacuum.c:914
#16 0x08187150 in autovac_start () at autovacuum.c:178
#17 0x0818bec5 in ServerLoop () at postmaster.c:1252
#18 0x0818d045 in PostmasterMain (argc=3, argv=0x83399a8) at postmaster.c:966
#19 0x08152ba6 in main (argc=3, argv=0x83399a8) at main.c:188

Any thoughts on how I should proceed? We are planning an upgrade to 8.4 in the short term, but I can see no evidence of fixes since the 8.2 version that would relate to index corruption. I have no real evidence of bad disks...iostat -E reports:

# iostat -E
sd2 Soft Errors: 1 Hard Errors: 4 Transport Errors: 0
Vendor: Pillar Product: Axiom 300 Revision: Serial No: Size: 2.20GB <2200567296 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 4 Recoverable: 0 Illegal Request: 1 Predictive Failure Analysis: 0
sd3 Soft Errors: 1 Hard Errors: 32 Transport Errors: 0
Vendor: Pillar Product: Axiom 300 Revision: Serial No: Size: 53.95GB <53948448256 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 32 Recoverable: 0 Illegal Request: 1 Predictive Failure Analysis: 0
sd7 Soft Errors: 1 Hard Errors: 40 Transport Errors: 8
Vendor: Pillar Product: Axiom 300 Revision: Serial No: Size: 53.95GB <53948448256 bytes>
Media Error: 0 Device Not Ready: 1 No Device: 33 Recoverable: 0 Illegal Request: 1 Predictive Failure Analysis: 0
sd8 Soft Errors: 1 Hard Errors: 34 Transport Errors: 0
Vendor: Pillar Product: Axiom 300 Revision: Serial No: Size: 107.62GB <107622432256 bytes>
Media Error: 0 De
Re: [GENERAL] PANIC: right sibling 2019 of block 2018 is not next child of 1937 in index "sl_log_2_idx1"
--- On Wed, 12/30/09, Jeff Amiel wrote: > Subject: PANIC: right sibling 2019 of block 2018 is not next child of 1937 > in index "sl_log_2_idx1" I am assuming a re-index for that particular index will rebuild/fix the index (if it happens again). Any other thoughts? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
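For readers hitting the same thing, the rebuild alluded to is just the following (note that plain REINDEX locks out writes to the index's parent table and blocks reads that would use the index while it runs):

    REINDEX INDEX sl_log_2_idx1;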
[GENERAL] PANIC: right sibling 2019 of block 2018 is not next child of 1937 in index "sl_log_2_idx1"
I know I don't have a lot of data right now (still looking for a core dump). Any obvious thoughts or advice until I can get more info?

Dec 30 17:41:57 db-1 postgres[28957]: [ID 748848 local0.crit] [34004622-1] 2009-12-30 17:41:57.825 CST 28957 PANIC: right sibling 2019 of block 2018 is not next child of 1937 in index "sl_log_2_idx1"
Dec 30 17:46:17 db-1 postgres[17576]: [ID 748848 local0.info] [34005240-1] 2009-12-30 17:46:17.279 CST 17576 LOG: autovacuum process (PID 28957) was terminated by signal 6
Dec 30 17:46:17 db-1 postgres[17576]: [ID 748848 local0.info] [34005241-1] 2009-12-30 17:46:17.279 CST 17576 LOG: terminating any other active server processes

The database recovered itself...and autovacuum of sl_log_2 started up again with no obvious issue right afterwards.

PostgreSQL 8.2.12 on i386-pc-solaris2.10, compiled by GCC gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath)
Oldish version of slony (1.2.10)

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] empty string causes planner to avoid index. Makes me sad.
--- On Fri, 11/27/09, Tom Lane wrote:
> You didn't show us any evidence of that, either. Both of your test
> cases are using the index.

Ok...third try. The cost when passing in an empty string is SIGNIFICANTLY higher than when not. It wouldn't seem that the planner is using the index effectively.

Aggregate (cost=231884.57..231884.57 rows=1 width=0)

versus

Aggregate (cost=34.83..34.83 rows=1 width=0)

By -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] empty string causes planner to avoid index. Makes me sad.
hmm...ok...planner is not using the index effectively (as effectively as when a non-empty value is passed in) --- On Fri, 11/27/09, Tom Lane wrote: > From: Tom Lane > Subject: Re: [GENERAL] empty string causes planner to avoid index. Makes me > sad. > To: "Jeff Amiel" > Cc: pgsql-general@postgresql.org > Date: Friday, November 27, 2009, 3:14 PM > Jeff Amiel > writes: > > It appears that somehow the empty string is causing > the planner to abandon the index. > > You didn't actually show us such a case... > > > regards, tom lane > -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] empty string causes planner to avoid index. Makes me sad.
PostgreSQL 8.2.12 on i386-pc-solaris2.10, compiled by GCC gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath)

CREATE TABLE items
(
  field1 character(9) NOT NULL,
  field2 character varying(17) NOT NULL
);

CREATE INDEX "field1-field2" ON items USING btree (field1, field2);

About 15 million rows in the items table.

explain select count(*) from items where field1 = '102100400' and field2 = '';

Aggregate (cost=231884.57..231884.57 rows=1 width=0)
  -> Bitmap Heap Scan on items (cost=4286.53..231841.95 rows=170468 width=0)
       Recheck Cond: ((field1 = '102100400'::bpchar) AND ((field2)::text = ''::text))
       -> Bitmap Index Scan on "field1-field2-check" (cost=0.00..4282.27 rows=170468 width=0)
            Index Cond: ((field1 = '102100400'::bpchar) AND ((field2)::text = ''::text))

explain select count(*) from items where field1 = '102100400' and field2 = '                 '; /* 17 spaces */

Aggregate (cost=34.83..34.83 rows=1 width=0)
  -> Index Scan using "field1-field2" on items (cost=0.00..34.82 rows=18 width=0)
       Index Cond: ((field1 = '102100400'::bpchar) AND ((field2)::text = '                 '::text))

If I have any value in field2 other than an empty string '' (like '1' or a space), it will use the index scan. It appears that somehow the empty string is causing the planner to abandon the index. Can I get any insights into this? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
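A hedged diagnostic sketch: if '' is genuinely very frequent in field2, the higher-cost bitmap plan may simply reflect an honest (or inflated) row estimate for that value rather than the index being abandoned. The statistics the planner consulted can be inspected directly:

    -- Is '' in the most-common-values list for field2, and at what frequency?
    SELECT most_common_vals, most_common_freqs, n_distinct
    FROM pg_stats
    WHERE tablename = 'items'
      AND attname = 'field2';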
Re: [GENERAL] Frustrated...pg_dump/restore
--- On Mon, 10/6/08, Scott Marlowe <[EMAIL PROTECTED]> wrote: > I'm wondering if the OP has some line breaks in his > data that are > getting misinterpreted, or maybe his encoding on the two > dbs is > different and he's not taking care of that. Ahhh *looks at encoding* Well...they are both the same...BUT...they are set to ENCODING = 'SQL_ASCII'; That explains a lot...they should probably be set to Unicode UTF8. Duh. Any way to change encoding without dumping/restoring the database? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] function returning setof..select versus select * from
What is the difference between:

select foo();

and

select * from foo();

Foo is defined as:

CREATE OR REPLACE FUNCTION foo()
  RETURNS SETOF integer AS
'SELECT column from foo_table;'
  LANGUAGE 'sql' STABLE;

Explain shows the difference...

explain select * from foo()
"Function Scan on foo (cost=0.00..1.25 rows=1000 width=4)"

explain select foo();
"Result (cost=0.00..0.00 rows=1 width=0)"

They both return the same results...yet yield different plans...and different speeds when using 'real' data. Why come? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Frustrated...pg_dump/restore
I performed a pg_dump on a database and created a new schema-only database to copy that data into. However, trying to use psql -f to load the data in, I get a plethora of syntax errors, including the dreaded "invalid command \N". I even tried to pipe the pg_dump results directly into the psql command:

/usr/local/pgsql/bin/pg_dump -U pgsql --data-only db1 | /usr/local/pgsql/bin/psql -U pgsql db2

Same results. Why? Using -d (switching to inserts instead of COPY) seems to work just fine but is so slow as to be unusable. Is the COPY pg_dump method useful at all in ANY situation? Do I have to do a pg_dump using a custom archive option and use pg_restore to make this work? (sounds silly to me). Any help would be appreciated. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] The planner hates me.
-----Original Message----- From: Tom Lane [mailto:[EMAIL PROTECTED]] The Right Way (tm) to do this would be something like create temp table dates as select * from get_dates(...); analyze dates; ... original select, but join against temp table ... which would leave the planner armed with some stats about the range of dates of interest. I actually tried that, but minus the analyze. :( Btw, congrats to whoever contributed the bitmap scans (v8.1 I believe)... the ability to use multiple indexes for a single table in the same query is fantastic!!
Re: [GENERAL] The planner hates me.
-----Original Message----- From: Hoover, Jeffrey [mailto:[EMAIL PROTECTED]] Change t.date2 < dates.date to t.date2+0 < dates.date, so the join condition becomes: (t.state='I' or t.date1 >= dates.date) and t.date2+0 < dates.date. The +0 expression keeps the planner from choosing t_date2_index.
[GENERAL] The planner hates me.
"PostgreSQL 8.2.4 on i386-pc-solaris2.10, compiled by GCC gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath)" (test environment) Picture a table called 'transaction' with 1 million rows. most (> 99% of the records have date1 and date2 values in the past (spread over 4 years) 99.99% of the records have a state OTHER than 'I'. CREATE TABLE transaction ( amount numeric(10,2) NOT NULL, date1 date NOT NULL, state character(1) NOT NULL, date2 date DEFAULT date(now()) ) CREATE INDEX t_date1_index ON transaction USING btree (date1); CREATE INDEX t_date2_index ON transaction USING btree (date2); CREATE INDEX t_state_index ON transaction USING btree (state); explain analyze select sum(amount), dates.date as date from transaction t join (select get_dates as date from get_dates('09/17/08','09/24/08')) dates on (t.state='I' or t.date1 >= dates.date) group by dates.date get_dates simply returns each date between (and including) the passed dates ...and is a 'stable' function). Yes...I know I probably could have used a generate_series or something...but this was written before I knew anything about that "HashAggregate (cost=1290485.15..1290485.40 rows=200 width=17) (actual time=277.804..277.809 rows=8 loops=1)" " -> Nested Loop (cost=270.37..1123134.88 rows=334700533 width=17) (actual time=3.182..153.741 rows=120536 loops=1)" "-> Function Scan on get_dates (cost=0.00..1.25 rows=1000 width=4) (actual time=0.057..0.065 rows=8 loops=1)" "-> Bitmap Heap Scan on "transaction" t (cost=270.37..618.60 rows=336357 width=22) (actual time=3.093..10.958 rows=15067 loops=8)" " Recheck Cond: ((t.state = 'I'::bpchar) OR (t.date1 >= get_dates.get_dates))" " -> BitmapOr (cost=270.37..270.37 rows=336357 width=0) (actual time=2.853..2.853 rows=0 loops=8)" "-> Bitmap Index Scan on t_state_index (cost=0.00..8.97 rows=4971 width=0) (actual time=2.842..2.842 rows=15067 loops=8)" " Index Cond: (state = 'I'::bpchar)" "-> Bitmap Index Scan on t_date1_index (cost=0.00..252.99 rows=331387 width=0) (actual time=0.009..0.009 rows=2 loops=8)" " Index Cond: (t.date1 >= get_dates.get_dates)" "Total runtime: 277.883 ms" Uses bitmap scans for optimization and performs admirably. BUT...when I change the query thusly... (adding in an addition AND clause) explain analyze select sum(amount), dates.date as date from transaction t join (select get_dates as date from get_dates('09/17/08','09/24/08')) dates on (t.state='I' or t.date1 >= dates.date)and t.date2 < dates.date group by dates.date "HashAggregate (cost=1222618.09..1222618.34 rows=200 width=15) (actual time=7538.193..7538.199 rows=8 loops=1)" " -> Nested Loop (cost=0.00..1166174.15 rows=112887885 width=15) (actual time=0.889..7411.997 rows=120522 loops=1)" "Join Filter: ((t.state = 'I'::bpchar) OR (t.date1 >= get_dates.get_dates))" "-> Function Scan on get_dates (cost=0.00..1.25 rows=1000 width=4) (actual time=0.055..0.062 rows=8 loops=1)" "-> Index Scan using t_date2_index on "transaction" t (cost=0.00..590.77 rows=328800 width=24) (actual time=0.018..492.348 rows=986273 loops=8)" " Index Cond: (t.date2 < get_dates.get_dates)" "Total runtime: 7538.259 ms" I am pulling my hair out. Why does it insist on using the index on the date2 field? It was doing so well with the bitmap indexing on the other fields...if the planner would simply do the same with date2, all would be well. I really need The date2 portion of the query to filter AFTER the first two clauses, because those are quicker and narrows the data down to a manageable size Any suggestions on query or index changes? 
I still need the date2 index for other queries... but the fact that it gets used exclusively for this query drives me batty! I've tried various combinations of multi-column indexes to no avail... because I still need the date2 index for other reasons, this query ALWAYS chooses it for some reason. Any help would be appreciated.
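(Putting the two replies in this thread together, the temp table idea and the +0 rewrite, the query would look something like this sketch; the names match the schema above, but the combination is untested here:

create temp table dates as select get_dates as date from get_dates('09/17/08','09/24/08');
analyze dates;
select sum(amount), dates.date as date
from transaction t
join dates on (t.state='I' or t.date1 >= dates.date)
          and t.date2+0 < dates.date   -- the +0 keeps the planner off t_date2_index
group by dates.date;

The analyze gives the planner real row counts for the date list, and wrapping date2 in an expression hides it from the plain index, so the state/date1 bitmap machinery gets used first.)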
[GENERAL] more SSL crash woes....
"PostgreSQL 8.2.4 on i386-pc-solaris2.10, compiled by GCC gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath)" As the proud author of this previous post: http://archives.postgresql.org/pgsql-general/2007-08/msg01911.php I never found a real answer except to disable SSL on the connections between my master and subscriber nodes. Things have been peachy ever since. Todayfirst db crash in quite some time...seemingly unrelated to slony Stack trace looks eerily familiar: Core was generated by `/usr/local/pgsql/bin/postgres -D /db'. Program terminated with signal 11, Segmentation fault. #0 0xfee8ec23 in sk_value () from /usr/local/ssl/lib/libcrypto.so.0.9.8 *grumble* I have located nothing unusual occurring at the time of the event We have developers that connect from win32 and Fedora boxes via PGAdminIII and they use SSL connections...but they have ALWAYS connected using SSL. Any suggestions? I really need to try to either provide an explanation or make SOME change to prevent upgrade openssl? (we are on 9.8e) remove ALL encrypted connection capabilities (via pg_hba.conf)? punt? I'm dubious that that stack trace simply 'starts' in the middle of the libcrypto stuff. shouldn't I see some postgresql functions in there somewhere? (postgres compiled with --enable-debug) Any help would be appreciated. You rock. That's why Blockbuster's offering you one month of Blockbuster Total Access, No Cost. http://tc.deals.yahoo.com/tc/blockbuster/text5.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] More SSL crash woes
Tom Lane wrote: One idea: you are linking to /usr/local/ssl/lib/libssl.so, but is it possible that when you compile PG it is finding the header files for some other version? Sure enough... I put an #error at the top of the 'old/incorrect' ssl.h and did a make clean/make and it errored out. So I was building with 0.9.8 libraries... but 0.9.7 header files. That can't be good. I guess that would explain why nobody else on the planet has seen this issue :) thanks much for the assist!
Re: [GENERAL] More SSL crash woes
Tom Lane wrote: One idea: you are linking to /usr/local/ssl/lib/libssl.so, but is it possible that when you compile PG it is finding the header files for some other version? Yes... if I could figure out how the include path is being set on the postgresql build. I'm looking at the config.log and I see no reference to -I (to set the include path). It simply references the header files as "openssl/ssl.h". Any way to tell the default include path for gcc? There are two sets: /usr/sfw/include/openssl/ssl.h (older, incorrect one) /usr/local/ssl/include/openssl/ssl.h (newer, 'correct' one) I guess I could build something that #includes openssl/ssl.h and 'butcher' the bad one and see what happens.
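(For the record, gcc will print its include search order if you run the preprocessor verbosely on empty input; a standard trick, though the exact directory list differs per system:

echo | gcc -E -v - 2>&1 | sed -n '/search starts here/,/End of search list/p'

Whichever directory containing openssl/ssl.h appears first in that list, plus any -I flags, wins for a bare "openssl/ssl.h" include.)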
Re: [GENERAL] More SSL crash woes
Jeff Amiel wrote: Now there are 2 subscriber nodes that connect to this node for slony replication... One is running the same version (libssl 0.9.8e) but one is running 0.9.7e-p1_2. Could this be an issue? Note that both nodes are set to 'hostnossl' in the pg_hba.conf
Re: [GENERAL] More SSL crash woes
Tom Lane wrote: The previous thread suggested that you might have a problem with different bits of code being linked to different versions of libssl. Did you ever resolve that? Given the lack of other reports, I'm pretty suspicious that it's something like that, rather than a real bug in either slony or PG.

# ldd /usr/local/pgsql/bin/postgres
...
libssl.so.0.9.8 => /usr/local/ssl/lib/libssl.so.0.9.8
libcrypto.so.0.9.8 => /usr/local/ssl/lib/libcrypto.so.0.9.8

# ldd /usr/local/pgsql/bin/slon
...
libssl.so.0.9.8 => /usr/local/ssl/lib/libssl.so.0.9.8
libcrypto.so.0.9.8 => /usr/local/ssl/lib/libcrypto.so.0.9.8

Now there are 2 subscriber nodes that connect to this node for slony replication... One is running the same version (libssl 0.9.8e) but one is running 0.9.7e-p1_2. Could this be an issue? So let's ask what is different between my config and the rest of the world. The stack trace actually was one more level deep, and the reference to 'output_cert_chain' got me thinking:

#0 0xfee8ec23 in sk_value () from /usr/local/ssl/lib/libcrypto.so.0.9.8
#1 0xfef5b05b in ssl3_output_cert_chain () from /usr/local/ssl/lib/libssl.so.0.9.8
#2 0x in ?? ()

Is it unique that I use SSL for encryption but not for authentication? I have no root.crt (and see the warning in my logs about "could not load root certificate file "root.crt": No such file or directory. Will not verify client certificates.") Is this unusual? Do other people use SSL with postgres JUST for encryption? Is there something wrong with the way we build/install libssl? We currently do a pkgadd of the binary from sunfreeware:

/usr/sfw/bin/wget ftp://ftp.sunfreeware.com/pub/freeware/intel/10/openssl-0.9.8e-sol10-x86-local.gz
gzip -d openssl-0.9.8e-sol10-x86-local.gz
pkgadd -d openssl-0.9.8e-sol10-x86-local

I went back and researched the nearly identical problems we were having under FreeBSD, and the stack trace (using a slightly different/older version of libssl) looks like a different spot:

(gdb) bt
#0 0x2838e492 in SHA1_Init () from /lib/libcrypto.so.3
#1 0x2838a14a in X509_check_private_key () from /lib/libcrypto.so.3
#2 0x2838a459 in EVP_DigestInit_ex () from /lib/libcrypto.so.3

Any other thoughts?
[GENERAL] More SSL crash woes
"PostgreSQL 8.2.4 on i386-pc-solaris2.10, compiled by GCC gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath)" As the proud author of this previous post: http://archives.postgresql.org/pgsql-general/2007-08/msg01911.php I never found a real answer except to disable SSL on the connections between my master and subscriber nodes (instead shuttling data over a secure tunnel) Things have been peachy ever since. Todayfirst db crash in quite some time...seemingly unrelated to slony Stack trace looks eerily familiar: Core was generated by `/usr/local/pgsql/bin/postgres -D /db'. Program terminated with signal 11, Segmentation fault. #0 0xfee8ec23 in sk_value () from /usr/local/ssl/lib/libcrypto.so.0.9.8 *grumble* I have located nothing unusual occurring at the time of the event We have developers that connect from win32 and Fedora boxes via PGAdminIII and they use SSL connections...but they have ALWAYS connected using SSL. Any suggestions? I really need to try to either provide an explanation or make SOME change to prevent upgrade openssl? (we are on 9.8e) remove ALL encrypted connection capabilities (via pg_hba.conf) and force connectivity over secure tunnel? punt? Looks like this box does not have postgres compiled with --enable-debugbut dunno if it would help anyway being that crash occurs in libcrypto Any help would be appreciated.
Re: [GENERAL] oddly high cost but low actual_time (but not in slony replicants)
Sam Mason wrote: Looks like you've "disabled" seqscans. Sure enough. Thanks.
Re: [GENERAL] oddly high cost but low actual_time (but not in slony replicants)
And yes...I vacuumed...analyzed...vacuum analyzed the table to no avail.
[GENERAL] oddly high cost but low actual_time (but not in slony replicants)
Only 19 rows in the table. explain analyze select * from table_a;

"Seq Scan on table_a (cost=100000000.00..100000001.02 rows=19 width=103) (actual time=0.007..0.012 rows=19 loops=1)"
"Total runtime: 0.040 ms"

If I run the same query on the same table in a different database that I pg_restored from a recent dump... same results:

"Seq Scan on table_a (cost=100000000.00..100000001.02 rows=20 width=135) (actual time=0.009..0.011 rows=20 loops=1)"
"Total runtime: 0.046 ms"

But if I run from a slony replicated node...

"Seq Scan on table_a (cost=0.00..1.20 rows=20 width=103)"

Weird. No indexes on the table except the primary key (bigserial)... Not that it has any impact, but wherever I get the high cost it is running on Solaris with the database on a zfs mounted SAN: "PostgreSQL 8.2.4 on i386-pc-solaris2.10, compiled by GCC gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath)" The low cost comes from FreeBSD nodes running the database locally: "PostgreSQL 8.2.4 on i386-portbld-freebsd6.0, compiled by GCC cc (GCC) 3.4.4 [FreeBSD] 20050518" It appears that many of my tables exhibit this characteristic... If, however, I use any REAL indexes, the cost is much more 'normal'. Any ideas? I'm not really worried... but I was troubleshooting a high-cost query that led me to this table specifically.
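(A cost in the hundred-million range on a tiny table is the planner's disable penalty showing through, which is what the "disabled seqscans" reply above is pointing at. A quick confirmation, sketched:

show enable_seqscan;
set enable_seqscan = on;
explain select * from table_a;

With the setting back on, the cost should drop to the same ~1.20 the slony node shows.)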
Re: [GENERAL] AutoVacuum Behaviour Question
Bruce Momjian wrote: No, it isn't. Please add a TODO item about it: * Prevent long-lived temp tables from causing frozen-Xid advancement starvation Can somebody explain this one to me? Because of our auditing technique, we have many LONG lived temp tables (one per pooled connection)... so as long as the pool isn't disturbed, these temp tables can exist for a long time (weeks? months?). (previous thread about our use of temp tables and autovacuum/xid issues) http://archives.postgresql.org/pgsql-general/2007-01/msg00690.php http://archives.postgresql.org/pgsql-general/2007-01/msg00691.php
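(A sketch of a query for spotting temp tables that are holding back the freeze horizon; this assumes 8.2 or later, where pg_class carries a per-relation relfrozenxid. Temp tables live in per-backend pg_temp_N schemas:

select n.nspname, c.relname, age(c.relfrozenxid) as xid_age
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
where n.nspname like 'pg_temp%' and c.relkind = 'r'
order by xid_age desc;

Anything with a very large xid_age that never goes away is a connection-pool temp table of the sort described above.)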
Re: [GENERAL] SSL and crash woes.
--- Scott Marlowe <[EMAIL PROTECTED]> wrote: > > Interesting. Is it possible that either you've got 2 versions of > openssl? Maybe slony is being compiled against one, then using the > other lib, etc.? Yes... I suppose it is. Solaris came with one... we installed another. Hmmm...

# find /usr /lib -name libssl*
/usr/lib/mps/amd64/libssl3.so
/usr/lib/mps/secv1/amd64/libssl3.so
/usr/lib/mps/secv1/libssl3.so
/usr/lib/mps/libssl3.so
/usr/sfw/lib/amd64/libssl.so
/usr/sfw/lib/amd64/libssl.so.0.9.7
/usr/sfw/lib/libssl.so
/usr/sfw/lib/libssl.so.0.9.7
/usr/sfw/lib/mozilla/libssl3.so
/usr/apache/libexec/libssl.so
/usr/local/ssl/lib/libssl.a
/usr/local/ssl/lib/libssl.so
/usr/local/ssl/lib/libssl.so.0.9.8
/usr/local/ssl/lib/pkgconfig/libssl.pc
[GENERAL] SSL and crash woes.
A couple of years back (2005) we were setting up replication for the first time (using slony) from our production database server to a stand-by box sitting next to it and a remote box in a DR site. We were running FreeBSD 5.X/6.X on all systems on Dell servers, and postgres 7.4.X and then 8.0.X. Replication appeared to crash our production database... a lot. After looking at the core dumps/stack traces at the time, we determined that we were crashing in the ssl layers... so we disabled SSL (via pg_hba.conf and the slony conn settings) and haven't had an issue for the last couple of years. Stable as a rock. Well... we just upgraded our hardware (Sun X4600s), operating systems (solaris 10), postgres versions (8.2.4), and slony (1.2.10). Rock solid. However, our first indication of an issue was a problem executing pg_dump from a remote backup server. (see http://archives.postgresql.org/pgsql-general/2007-08/msg01347.php) Local pg_dumps have no issue. So we changed our backup scheme to do local dumps and push the files off the server to the backup location. Problem solved. Then... replication woes again. With these fresh installs, we didn't think too much about the SSL settings... and bing-bang... crash. Crash. Crash. Crash. Stopped replication. Problem goes away. Start replication... crash crash. So we stopped replication. We recompiled postgres with debug info on a test db box and loaded up the most recent database dump. We then attempted a remote pg_dump from another local server. Crash. Took a look at the core dump...

Core was generated by `/usr/local/pgsql/bin/postgres -D /testdb'.
Program terminated with signal 11, Segmentation fault.
#0 0xfee8ec23 in sk_value () from /usr/local/ssl/lib/libcrypto.so.0.9.8
(gdb) bt
#0 0xfee8ec23 in sk_value () from /usr/local/ssl/lib/libcrypto.so.0.9.8
#1 0xfef5b05b in ssl3_output_cert_chain () from /usr/local/ssl/lib/libssl.so.0.9.8
#2 0x in ?? ()

Hmmm... that looked familiar (from years ago). So... we set up the connection to be 'hostnossl' in pg_hba.conf and tried again. Success. Changed it back to 'hostssl'... crash. Same place. I am going to take the time to set up a test environment for the replication as well, but I assume I will experience the same thing. SSL means crash... no SSL means no crash. Anyone have any thoughts?
Re: [GENERAL] Out of Memory - 8.2.4
--- Alvaro Herrera <[EMAIL PROTECTED]> wrote: > 2.168.20.44 28785 LOG: duration: 22606.146 ms execute : select > > Interesting. What's your log_line_prefix? Does it have "%q" somewhere? No, no %q... not quite sure what it means: "stop here in non-session processes"
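(For reference, %q itself prints nothing; it marks the point where non-session processes such as autovacuum stop expanding the prefix, so per-connection items like user, database, and remote host go after it. A sketch of such a prefix, assuming syslog supplies the timestamp:

log_line_prefix = '%p %q%u %d %r '

That would explain log lines from background processes carrying only the pid while client-session lines carry the full set.)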
Re: [GENERAL] Out of Memory - 8.2.4
I notice the log entries for the out of memory events have no username, database name or host identifier (while regular logged events do). Does that mean anything to anybody?

Aug 28 08:25:50 db-1 postgres[29019]: [ID 748848 local0.warning] [111900-1] 2007-08-28 08:25:50.081 CDT29019ERROR: out of memory
Aug 28 08:25:50 db-1 postgres[29019]: [ID 748848 local0.warning] [111900-2] 2007-08-28 08:25:50.081 CDT29019DETAIL: Failed on request of size 536870910.

(regular log entry)
Aug 28 08:26:45 db-1 postgres[28785]: [ID 748848 local0.info] [114999-1] 2007-08-28 08:26:45.413 CDT jboss prod 192.168.20.44 28785LOG: duration: 22606.146 ms execute : select

--- Tom Lane <[EMAIL PROTECTED]> wrote: > Can you correlate these occurrences with anything in the regular system > logs (kernel log in particular)? The Postgres log shows nothing out of > the ordinary --- it's simply that the kernel won't give us 512M for some > reason. I'm guessing it's got something to do with overall system load. > > regards, tom lane
Re: [GENERAL] Tables dissapearing
Kamil Srot wrote: Heh, is the pool still open? Maybe I can make at least something from it :-D

Current odds:
Application bug: even money
Application configuration issue: 2-1
Rogue cron job or other maintenance process: 4-1
Somebody messing with you (or SQL injection): 8-1
XID wraparound issue: 10-1
Alien or supernatural intervention: 18-1
Obscure postgresql bug nobody else has ever seen: 25-1
Re: [GENERAL] Tables dissapearing
My entire shop has set up a betting pool on the outcome of this... so I hope you post results regardless of the outcome, Kamil. (In all seriousness, we hope you find/fix the problem before things get really ugly.) Kamil Srot wrote: Richard Huxton wrote: Kamil Srot wrote: In the version used on this particular server, there is no automatic/programmatic way of changing the schema. Upgrades are done manually and the application itself doesn't need schema changes for routine operations... In that case, you can settle the matter by making sure your application connects as a user that doesn't own any of the tables. That way they can't be dropped. Yes, I'll do it in the future, but now, I want to know what the problem is and solve it rather than doing some workarounds (even if they are good also from the principle indeed). Regards,
Re: [GENERAL] Out of Memory - 8.2.4
--- "Joshua D. Drake" <[EMAIL PROTECTED]> wrote: > > The machine we are tracking this problem on is also 64bit. H.looks like 3 different people are tracking a similar issue on 64 bit platforms.you, Erik and myself. Ready for the edge of your seat? Check out tonight's top picks on Yahoo! TV. http://tv.yahoo.com/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Out of Memory - 8.2.4
--- "Joshua D. Drake" <[EMAIL PROTECTED]> wrote: > We are actually diagnosing a similar problem on this > end, where we get a > failure at 1920... I am currently trying to get some > DEBUG output. Tracking for last few days. Does not appear to happen when little or no user activity (like Saturday) I don't know if that rules out autovacuum or not (if no update threshholds are reached, no vacuuming will take place anyway) Aug 23 11:11:51 db-1 postgres[8455]: [ID 748848 local0.warning] [2-1] 2007-08-23 11:11:51 CDT ERROR: out of memory Aug 23 11:11:51 db-1 postgres[8455]: [ID 748848 local0.warning] [2-2] 2007-08-23 11:11:51 CDT DETAIL: Failed on request of size 536870910. -- Aug 23 11:34:49 db-1 postgres[8910]: [ID 748848 local0.warning] [2-1] 2007-08-23 11:34:49 CDT ERROR: out of memory Aug 23 11:34:49 db-1 postgres[8910]: [ID 748848 local0.warning] [2-2] 2007-08-23 11:34:49 CDT DETAIL: Failed on request of size 536870910. -- Aug 23 12:06:47 db-1 postgres[9562]: [ID 748848 local0.warning] [2-1] 2007-08-23 12:06:47 CDT ERROR: out of memory Aug 23 12:06:47 db-1 postgres[9562]: [ID 748848 local0.warning] [2-2] 2007-08-23 12:06:47 CDT DETAIL: Failed on request of size 536870910. -- Aug 23 12:58:47 db-1 postgres[10617]: [ID 748848 local0.warning] [2-1] 2007-08-23 12:58:47 CDT ERROR: out of memory Aug 23 12:58:47 db-1 postgres[10617]: [ID 748848 local0.warning] [2-2] 2007-08-23 12:58:47 CDT DETAIL: Failed on request of size 536870910. -- Aug 23 15:15:35 db-1 postgres[13400]: [ID 748848 local0.warning] [2-1] 2007-08-23 15:15:35 CDT ERROR: out of memory Aug 23 15:15:35 db-1 postgres[13400]: [ID 748848 local0.warning] [2-2] 2007-08-23 15:15:35 CDT DETAIL: Failed on request of size 536870910. -- Aug 23 16:50:47 db-1 postgres[15422]: [ID 748848 local0.warning] [2-1] 2007-08-23 16:50:47 CDT ERROR: out of memory Aug 23 16:50:47 db-1 postgres[15422]: [ID 748848 local0.warning] [2-2] 2007-08-23 16:50:47 CDT DETAIL: Failed on request of size 536870910. -- Aug 24 10:46:46 db-1 postgres[10508]: [ID 748848 local0.warning] [2-1] 2007-08-24 10:46:46 CDT ERROR: out of memory Aug 24 10:46:46 db-1 postgres[10508]: [ID 748848 local0.warning] [2-2] 2007-08-24 10:46:46 CDT DETAIL: Failed on request of size 536870910. -- Aug 24 11:29:00 db-1 postgres[11539]: [ID 748848 local0.warning] [2-1] 2007-08-24 11:29:00 CDT ERROR: out of memory Aug 24 11:29:00 db-1 postgres[11539]: [ID 748848 local0.warning] [2-2] 2007-08-24 11:29:00 CDT DETAIL: Failed on request of size 536870910. -- Aug 24 11:50:04 db-1 postgres[12051]: [ID 748848 local0.warning] [2-1] 2007-08-24 11:50:04 CDT ERROR: out of memory Aug 24 11:50:04 db-1 postgres[12051]: [ID 748848 local0.warning] [2-2] 2007-08-24 11:50:04 CDT DETAIL: Failed on request of size 536870910. -- Aug 24 12:00:33 db-1 postgres[12310]: [ID 748848 local0.warning] [2-1] 2007-08-24 12:00:33 CDT ERROR: out of memory Aug 24 12:00:33 db-1 postgres[12310]: [ID 748848 local0.warning] [2-2] 2007-08-24 12:00:33 CDT DETAIL: Failed on request of size 536870910. -- Aug 24 16:03:19 db-1 postgres[18263]: [ID 748848 local0.warning] [2493-1] 2007-08-24 16:03:19.296 CDT 18263ERROR: out of memory Aug 24 16:03:19 db-1 postgres[18263]: [ID 748848 local0.warning] [2493-2] 2007-08-24 16:03:19.296 CDT 18263DETAIL: Failed on request of size 536870910. 
--
Aug 24 16:45:46 db-1 postgres[19313]: [ID 748848 local0.warning] [3356-1] 2007-08-24 16:45:46.804 CDT 19313ERROR: out of memory
Aug 24 16:45:46 db-1 postgres[19313]: [ID 748848 local0.warning] [3356-2] 2007-08-24 16:45:46.804 CDT 19313DETAIL: Failed on request of size 536870910.
--
Aug 24 17:29:16 db-1 postgres[20379]: [ID 748848 local0.warning] [4238-1] 2007-08-24 17:29:16.926 CDT 20379ERROR: out of memory
Aug 24 17:29:16 db-1 postgres[20379]: [ID 748848 local0.warning] [4238-2] 2007-08-24 17:29:16.926 CDT 20379DETAIL: Failed on request of size 536870910.
--
Aug 24 17:40:02 db-1 postgres[20651]: [ID 748848 local0.warning] [4452-1] 2007-08-24 17:40:02.682 CDT 20651ERROR: out of memory
Aug 24 17:40:02 db-1 postgres[20651]: [ID 748848 local0.warning] [4452-2] 2007-08-24 17:40:02.682 CDT 20651DETAIL: Failed on request of size 536870910.
--
Aug 26 11:14:56 db-1 postgres[22161]: [ID 748848 local0.warning] [56115-1] 2007-08-26 11:14:56.077 CDT 22161ERROR: out of memory
Aug 26 11:14:56 db-1 postgres[22161]: [ID 748848 local0.warning] [56115-2] 2007-08-26 11:14:56.077 CDT 22161DETAIL: Failed on request of size 536870910.
--
Aug 26 11:27:41 db-1 postgres[22477]: [ID 748848 local0.warning] [56381-1] 2007-08-26 11:27:41.141 CDT 22477ERROR: out of memory
Aug 26 11:27:41 db-1 postgres[22477]: [ID 748848 local0.warning] [56381-2] 2007-08-26 11:27:41.141 CDT 22477DETAIL: Failed on request of size 536870910.
--
Aug 26 11:37:27 db-1 postgres[22729]: [ID 748848 local0.warning] [56603-1] 2007-08-26 11:37:27.476 CDT 22729ERROR: out of memory
Aug 26 11:37:27 db-1 postgres[22729]: [ID 74
Re: [GENERAL] Out of Memory - 8.2.4
--- Tom Lane <[EMAIL PROTECTED]> wrote: > Jeff Amiel <[EMAIL PROTECTED]> writes: > > is log_min_messages one of them that requires a > > restart? > > No, SIGHUP (pg_ctl reload) should be sufficient. Weird... looks like some items are going to syslog and some to my defined postgres logfile (from the -L option). Bizarre. Anyway... I hope this helps someone. At 10:46, I find this in my syslog files:

Aug 27 10:46:01 db-1 postgres[27008]: [ID 748848 local0.warning] [85355-1] 2007-08-27 10:46:01.888 CDT 27008ERROR: out of memory
Aug 27 10:46:01 db-1 postgres[27008]: [ID 748848 local0.warning] [85355-2] 2007-08-27 10:46:01.888 CDT 27008DETAIL: Failed on request of size 536870910.

and at the same time in my postgres logfile I see this (and only this):

TopMemoryContext: 169608 total in 10 blocks; 18832 free (34 chunks); 150776 used
TopTransactionContext: 8192 total in 1 blocks; 7648 free (9 chunks); 544 used
CFuncHash: 8192 total in 1 blocks; 4936 free (0 chunks); 3256 used
Type information cache: 8192 total in 1 blocks; 1800 free (0 chunks); 6392 used
Operator class cache: 8192 total in 1 blocks; 4872 free (0 chunks); 3320 used
Autovacuum context: 8192 total in 1 blocks; 5416 free (8 chunks); 2776 used
smgr relation table: 8192 total in 1 blocks; 2808 free (0 chunks); 5384 used
TransactionAbortContext: 32768 total in 1 blocks; 32752 free (0 chunks); 16 used
Portal hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
PortalMemory: 0 total in 0 blocks; 0 free (0 chunks); 0 used
Relcache by OID: 8192 total in 1 blocks; 3376 free (0 chunks); 4816 used
CacheMemoryContext: 659000 total in 19 blocks; 264904 free (15 chunks); 394096 used
sl_seqlog_idx: 1024 total in 1 blocks; 256 free (0 chunks); 768 used
PartInd_istream_replication_cluster_sl_log_1-node-1: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
sl_log_1_idx1: 1024 total in 1 blocks; 256 free (0 chunks); 768 used
pg_index_indrelid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672 used
pg_attrdef_adrelid_adnum_index: 1024 total in 1 blocks; 288 free (0 chunks); 736 used
pg_autovacuum_vacrelid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_type_typname_nsp_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_type_oid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672 used
pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 288 free (0 chunks); 736 used
pg_statistic_relid_att_index: 1024 total in 1 blocks; 288 free (0 chunks); 736 used
pg_auth_members_member_role_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_auth_members_role_member_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_proc_proname_args_nsp_index: 1024 total in 1 blocks; 256 free (0 chunks); 768 used
pg_proc_oid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672 used
pg_operator_oprname_l_r_n_index: 1024 total in 1 blocks; 192 free (0 chunks); 832 used
pg_operator_oid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672 used
pg_opclass_oid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672 used
pg_opclass_am_name_nsp_index: 1024 total in 1 blocks; 216 free (0 chunks); 808 used
pg_namespace_oid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672 used
pg_namespace_nspname_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_language_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_language_name_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_inherits_relid_seqno_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_index_indexrelid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672 used
pg_authid_oid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672 used
pg_authid_rolname_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_database_oid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672 used
pg_conversion_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_conversion_name_nsp_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_conversion_default_index: 1024 total in 1 blocks; 192 free (0 chunks); 832 used
pg_class_relname_nsp_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_class_oid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672 used
pg_cast_source_target_index: 1024 total in 1 blocks; 288 free (0 chunks); 736 used
pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 288 free (0 chunks); 736 used
pg_attribute_relid_attnam_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_amproc_opc_proc_index: 1024 total in 1 blocks; 216 free (0 chunks); 808 used
pg_amop_opr_opc_index: 1024 total in 1 blocks; 288 free (0 chunks); 736 used
pg_amop_opc_strat_index: 1024 total in 1 blocks; 216 free (0 chunks); 808 used
pg_aggregate_fnoid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
Per-database table: 122880 total i
Re: [GENERAL] Out of Memory - 8.2.4
--- "Joshua D. Drake" <[EMAIL PROTECTED]> wrote: > Having log_line_prefix with at least %p and %m (or > %t) plus a > log_min_messages of DEBUG2 would be great. i am getting the additional timestampt/pid on my log lines nowbut no additional debug output... is log_min_messages one of them that requires a restart? Be a better Heartthrob. Get better relationship answers from someone who knows. Yahoo! Answers - Check it out. http://answers.yahoo.com/dir/?link=list&sid=396545433 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] Out of Memory - 8.2.4
--- "Joshua D. Drake" <[EMAIL PROTECTED]> wrote: > We are actually diagnosing a similar problem on this > end, where we get a > failure at 1920... I am currently trying to get some > DEBUG output. We are actually getting it semi-regularly today (3 times already)I would be happy to provide some more info if somebody guides me (just set log_min_messages to one of the debug settings?) Park yourself in front of a world of choices in alternative vehicles. Visit the Yahoo! Auto Green Center. http://autos.yahoo.com/green_center/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[GENERAL] Out of Memory - 8.2.4
"PostgreSQL 8.2.4 on i386-pc-solaris2.10, compiled by GCC gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath)" Week-old installstill tuning and tweaking this thing. Over last 2 days, have spotted 10 "Out of Memory" errors in postgres logs (never saw before with same app/usage patterns on tuned hardware/postgres under FreeBSD) Aug 22 18:08:24 db-1 postgres[16452]: [ID 748848 local0.warning] [6-1] 2007-08-22 18:08:24 CDT ERROR: out of memory. Aug 22 18:08:24 db-1 postgres[16452]: [ID 748848 local0.warning] [6-2] 2007-08-22 18:08:24 CDT DETAIL: Failed on request of size 536870910. What I found interesting is that It's ALWAYS the same size536870910 I am running autovacuum and slony.but I see nothing in the logs anywhere near the "out of memory" errors related to either (autovacuum used to under 8.0.X log INFO messages every time it vacuumed which came in handy...I assume it doesn't so this any more?) The events are fairly spread out...and cannot (by looking at app logs and rest of DB logs) correlate to any specific query or activity. Any help would be appreciated Box is a Sun X4600 with 8 dual-core processors and 32 gig of ram. # su - pgsql Sun Microsystems Inc. SunOS 5.10 Generic January 2005 -bash-3.00$ ulimit -a core file size(blocks, -c) unlimited data seg size (kbytes, -d) unlimited file size (blocks, -f) unlimited open files(-n) 256 pipe size (512 bytes, -p) 10 stack size(kbytes, -s) 10240 cpu time (seconds, -t) unlimited max user processes(-u) 16357 virtual memory(kbytes, -v) unlimited shared_buffers = 3GB# min 128kB or max_connections*16kB temp_buffers = 1000 # min 800kB was 8MB max_prepared_transactions = 450 # can be 0 or more work_mem = 100MB# min 64kB maintenance_work_mem = 512MB# min 1MB #max_stack_depth = 2MB # min 100kB max_fsm_pages = 208000 # min max_fsm_relations*16, 6 bytes each max_fsm_relations = 1 # min 100, ~70 bytes each #max_files_per_process = 1000 # min 25 #shared_preload_libraries = '' # (change requires restart) fsync = on # turns forced synchronization on or off wal_sync_method = fdatasync # the default is the first option full_page_writes = off # recover from partial page writes wal_buffers = 2300 # min 32kB commit_delay = 10 # range 0-10, in microseconds #commit_siblings = 5# range 1-1000 checkpoint_segments = 128 # in logfile segments, min 1, 16MB each checkpoint_timeout = 5min # range 30s-1h checkpoint_warning = 99s# 0 is off Be a better Globetrotter. Get better travel answers from someone who knows. Yahoo! Answers - Check it out. http://answers.yahoo.com/dir/?link=list&sid=396545469 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] pg_dump causes postgres crash
--- Tom Lane <[EMAIL PROTECTED]> wrote: > > I can't help thinking you are looking at generalized > system > instability. Maybe someone knocked a few cables > loose while > installing new network hardware? Database server/storage instability or network instability? There is no doubt that there is something flaky about the networking between the db server and the box(es) trying to do the pg_dump. We have indeed had issues (timeouts, halts, etc) moving large quantities of data across various segments to and from these boxes... like the db server... but how would this affect something like a pg_dump? Would a good stack trace (assuming I want to crash my database again) help here?
Re: [GENERAL] pg_dump causes postgres crash
From the logs tonight when the second crash occurred:

Aug 22 20:45:12 db-1 postgres[5805]: [ID 748848 local0.info] [6-1] 2007-08-22 20:45:12 CDT LOG: received smart shutdown request
Aug 22 20:45:12 db-1 postgres[5805]: [ID 748848 local0.info] [7-1] 2007-08-22 20:45:12 CDT LOG: server process (PID 20188) was terminated by signal 11
Aug 22 20:45:12 db-1 postgres[5805]: [ID 748848 local0.info] [8-1] 2007-08-22 20:45:12 CDT LOG: terminating any other active server processes

There was a core file created... but I believe I do not have postgresql compiled with debug info (well, a pstack provided nothing useful).

pstack core | more
core 'core' of 20188: /usr/local/pgsql/bin/postgres -D /db
fee8ec23 sk_value (10023d, 105d8b00, d2840f, 1c7f, f20f883, 10584) + 33
0c458b51 (0, 0, 511f1600, 2000400, ff001c09, 467f71ea) ()

Once again... a local pg_dump worked just fine 30 minutes later. We have introduced some new network architecture which is acting odd lately (Dell managed switches, Netscreen SSGs, etc), and the database itself resides on a zfs partition on a Pillar SAN (connected via fibre channel). Any thoughts would be appreciated.
[GENERAL] pg_dump causes postgres crash
Fairly new (less than a week) install. "PostgreSQL 8.2.4 on i386-pc-solaris2.10, compiled by GCC gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath)" Database size around 43 gigabytes. 2 attempts at a pg_dump across the network caused the database to go down. The first time I thought it was because of a mismatched pg_dump (was version 8.0.X)... but the second time it was definitely the 8.2.4 version of pg_dump. My first thought was corruption... but this database has successfully seeded 2 slony subscriber nodes from scratch as well as running flawlessly under heavy load for the past week. Even more odd is that a LOCAL pg_dump (from on the box) succeeded just fine tonight (after the second crash). Thoughts?

--- First Crash ---
backup-srv2 prod_backup # time /usr/bin/pg_dump --format=c --compress=9 --ignore-version --username=backup --host=prod_server prod > x
pg_dump: server version: 8.2.4; pg_dump version: 8.0.13
pg_dump: proceeding despite version mismatch
pg_dump: WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
pg_dump: server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing the request.
pg_dump: SQL command to dump the contents of table "access_logs" failed: PQendcopy() failed.
pg_dump: Error message from server: server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing the request.
pg_dump: The command was: COPY public.access_logs (ip, username, "action", date, params) TO stdout;

--- Second Crash ---
backup-srv2 ~ # time /usr/bin/pg_dump --format=c --compress=9 --username=backup --host=prod_server prod | wc -l
pg_dump: Dumping the contents of table "audit" failed: PQgetCopyData() failed.
pg_dump: Error message from server: server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing the request.
pg_dump: The command was: COPY public.audit (audit_id, entity_id, table_name, serial_id, audit_action, when_ts, user_id, user_ip) TO stdout;
Re: [GENERAL] could not open file "pg_clog/0BFF"
http://svr5.postgresql.org/pgsql-hackers/2006-03/msg01198.php http://www.mail-archive.com/pgsql-general@postgresql.org/msg90845.html This is related to an autovacuum bug and the freeze logic with template0... and probably a corrupted pg_statistic table in template0. You should upgrade AND repair the template0 issue (I actually replaced template0 with a copy from a replicant database). But take heed: if you don't 'do' something about it, autovacuum is actually stopping with template0 and never getting to the rest of your database (as I found out). Patrick Lindeman wrote: What could have caused the error "could not access status of transaction 3221180546" and is it more than just coincidence that since that moment the vacuum stopped running? Is there anything we can do to prevent this in future use? Besides, we are using PostgreSQL 8.1.3. Any help would be appreciated. Regards, Patrick Lindeman
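(A sketch of a quick health check for this situation: run it a day or two apart, and if template0's age keeps climbing past everything else, autovacuum is likely stuck on it exactly as described above:

SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC;

)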
[GENERAL] Postmaster does not shut down
A 'bad' thing happened yesterday. Postgresql 8.1.X, FreeBSD 6.0. At some point in the day, we ran out of space on the root filesystem (db is elsewhere). Took about 10 minutes to clear enough space to make processes stop freaking out and to slow my heart-rate down to below 200 beats per minute. Everything 'seemed' to be fine after that. Database was constantly utilized with no errors for several more hours. Later that evening (during our peak activity time), a process was inadvertently run that, when combined with peak activity, caused the load average on the box to spike into the stratosphere (19). *sigh* Database activity was taking so long (heavy web app server use) that db connection handles skyrocketed (over 120 when it normally stays below 30) and we started getting complaints from pool management: No ManagedConnections available within configured blocking timeout ( 3 [ms] ) At that point, I started stopping all unnecessary activity and even restarted the app server to clear all the connections. However... the db connections all stayed in some sort of 'frozen' state... showing that they were either idle (in transaction) or processing SELECTs or whatever they were last doing. (Normally restarting the app server will cause those connections to reset pretty quickly.) Tried killing a few of the idle ones manually... but it was like pissing on a forest fire. Hmmm. It was time to do something I hadn't done in a LOOONG time: restart postgres. Uh oh... "postmaster does not shut down". I tried fast, I tried immediate. No dice. Postmaster refused to die. I don't know how many times I've heard "don't kill -9 the postmaster", but that's what I did (on the 'head' of the snake referenced in the pid file)... it died, but all the other processes remained. *grumble* Couldn't get a killall to get rid of the rest (not sure why) so I was back to pissing on the forest fire, typing kill kill kill. My cohort in the office claimed that more processes were showing up for each one I killed (I assume he was delirious at that point), but this was taking too long anyway. So I bounced the box. I prayed. Came back up fine... postgres, slony, etc. No errors reported when the database started. I see no errors in my logs through the night when autovacuum and backups occurred, so the database 'appears' to be intact. I do notice that all of my logs (all.log, etc) are 'missing' between the filesystem full issue earlier in the day and the time I rebooted (I assume the OS was NOT happy with me letting it run out of space on / ). So... other than the half-dozen things that I obviously did wrong, was there anything else I could have done with postgres when it refused to shut down?
[GENERAL] Way to determine index bloat stats?
Got a bit of a rude surprise when I rebuilt a slony subscriber node from scratch and noticed the indexes for some tables using significantly less space than their master-node counterparts. In one case, it was 2 gigabytes versus 9 gigabytes for the same table's indexes on the master node. I'm assuming that because of the specific indexed fields and the various 'states' that they go through, we ended up with many many sparsely populated pages of index data. While I am not too concerned (it took 3 years to get this 'fragmented', plus the ability to do concurrent index creation in 8.2 may help), I am interested in determining a way to see how 'sparsely' populated the index file pages are at any particular point in time. Is there some magic way of determining the number of bytes in an index 'row' and then comparing the size on disk of the index file? Also... would another index type (hash?) be better suited for some of these indexes (they are all currently btree)... and would that reduce the bloat? (I wish I had tracked which specific indexes were 'bloated' so I could analyze the usage pattern of the fields used.) Any help would be appreciated.
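(One concrete option is the contrib pgstattuple module; in 8.2 its pgstatindex function reports per-index leaf density and fragmentation. A sketch, assuming the contrib SQL script has been installed and using a made-up index name:

SELECT index_size, avg_leaf_density, leaf_fragmentation
FROM pgstatindex('public.some_bloated_index');

A low avg_leaf_density is the 'sparsely populated pages' effect described above.)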
[GENERAL] low transaction ID wrap limit
Whenever I read the documentation on the transaction wraparound stuff, my head spins with all the references to frozen xids and min/max ages. When it comes down to it, my interpretation of the whole kaboodle is "run postgresql v8.1 or later with autovacuum, and you will not have to deal with the wraparound issue." I was looking at some logs this morning and spotted this on my backup database: Jun 24 04:08:46 back-app-1 postgres[82445]: [2-1] pgsql 82445 LOG: transaction ID wrap limit is 41612954, limited by database "back" That seemed like an awfully low number compared to the 1 or 2 billion that I have seen in other posts on the subject.

postgres=# SELECT datname, age(datfrozenxid) FROM pg_database;
  datname  |    age
-----------+------------
 postgres  | 1073968236
 back      | 1079399370
 template0 | 1657840460
 template1 | 1078788693

postgres=# select * from pg_database;
  datname  | datdba | encoding | datistemplate | datallowconn | datconnlimit | datlastsysoid | datvacuumxid | datfrozenxid | dattablespace | datconfig | datacl
-----------+--------+----------+---------------+--------------+--------------+---------------+--------------+--------------+---------------+-----------+------------------
 postgres  |     10 |        0 | f             | t            |           -1 |         10792 |   3268269560 |   2194527737 |          1663 |           |
 back      |     10 |        0 | f             | t            |           -1 |         10792 |   3268269655 |   2194527832 |          1663 |           |
 template0 |     10 |        0 | t             | f            |           -1 |         10792 |   1610655513 |   1610655513 |          1663 |           | {pgsql=CT/pgsql}
 template1 |     10 |        0 | t             | t            |           -1 |         10792 |       326849 |   2194754451 |          1663 |           | {pgsql=CT/pgsql}
(4 rows)

Should I be worried?
Re: [GENERAL] pg_dump/restore and functions/triggers/trigger functions
Tom Lane <[EMAIL PROTECTED]> wrote: I think that would have excluded anything that didn't demonstrably belong to schema public, such as procedural languages. Is it possible that *all* your functions failed to load, and you only noted the ensuing GRANT/REVOKE failures? Yes... ALL my functions did indeed fail to load. Weirdness is that I only saw errors on the GRANT/REVOKE failures. Huh. I created a simple test case with a single function... got the same results. I removed the --schema='public' and it worked fine. I compared the TOCs on the 2 different files and sure enough, there is an entry in the one that DIDN'T only use public that has the plpgsql entry: 248; 2612 90212 PROCEDURAL LANGUAGE - plpgsql Huh. I guess if I had added plpgsql to template1 before I created the empty shell db to restore into, I would never have seen this issue. OK... I guess the real question is (other than related to this issue), is there any need to dump the catalog/informational schemas?
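(The workaround described above, as commands; createlang is the 8.x-era utility for this, and devdb is a placeholder database name:

createlang plpgsql template1   # so future createdb targets inherit it
createlang plpgsql devdb       # or add it to just the restore target

With the language already present, a --schema=public restore finds the functions' language in place.)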
Re: [GENERAL] pg_dump/restore and functions/triggers/trigger functions
The original pg_dump used --schema="public". Could the fact that pg_catalog or information_schema weren't included cause these kinds of issues? (I can't imagine why) <[EMAIL PROTECTED]> wrote: did a pg_dump --format=c for a production database (on an 8.1.2 server) and attempted to pg_restore on an 8.2.0 server. Things seemed to go fine with the exception of functions, triggers and trigger functions. It was apparently doing a bunch of ACL work towards the end and spewed a slew of errors like this: Feb 6 16:23:02 devl-app1 postgres[22552]: [624-1] ERROR: function add_entity(text, text, integer, text) does not exist Feb 6 16:23:02 devl-app1 postgres[22552]: [624-2] STATEMENT: REVOKE ALL ON FUNCTION add_entity(text, text, integer, text) FROM PUBLIC; so I checked the TOC to ensure that the function and ACLs were listed: [EMAIL PROTECTED] pg_restore -l prod-app-1.pgsql-prod.pgsql | grep add_entity 320; 1255 16412 FUNCTION public add_entity(text, text, integer, text) pgsql 3318; 0 0 ACL public add_entity(text, text, integer, text) pgsql Sure enough... the function and the ACLs were there. Hmmm... When all was said and done, the only functions in the database were ones that originally came from pgcrypto (from template0 when I created the new database to load the data into). Also... a similar problem with triggers/trigger functions: Feb 6 16:07:27 devl-app1 postgres[22552]: [509-1] ERROR: function batch_stat_populate() does not exist Feb 6 16:07:27 devl-app1 postgres[22552]: [509-2] STATEMENT: CREATE TRIGGER batch_stats_updater Feb 6 16:07:27 devl-app1 postgres[22552]: [509-3] AFTER INSERT ON batch_audit Feb 6 16:07:27 devl-app1 postgres[22552]: [509-4] FOR EACH ROW Feb 6 16:07:27 devl-app1 postgres[22552]: [509-5] EXECUTE PROCEDURE batch_stat_populate(); What up? Is this a version inconsistency issue? Please tell me I don't have to -l into a listfile and manually re-order items and then use -L?
[GENERAL] pg_dump/restore and functions/triggers/trigger functions
Did a pg_dump --format=c for a production database (on an 8.1.2 server) and attempted to pg_restore on an 8.2.0 server. Things seemed to go fine with the exception of functions, triggers and trigger functions. It was apparently doing a bunch of ACL work towards the end and spewed a slew of errors like this:

Feb 6 16:23:02 devl-app1 postgres[22552]: [624-1] ERROR: function add_entity(text, text, integer, text) does not exist
Feb 6 16:23:02 devl-app1 postgres[22552]: [624-2] STATEMENT: REVOKE ALL ON FUNCTION add_entity(text, text, integer, text) FROM PUBLIC;

So I checked the TOC to ensure that the function and ACLs were listed:

[EMAIL PROTECTED] pg_restore -l prod-app-1.pgsql-prod.pgsql | grep add_entity
320; 1255 16412 FUNCTION public add_entity(text, text, integer, text) pgsql
3318; 0 0 ACL public add_entity(text, text, integer, text) pgsql

Sure enough, the function and the ACLs were there. Hmmm... When all was said and done, the only functions in the database were ones that originally came from pgcrypto (from template0 when I created the new database to load the data into). Also... a similar problem with triggers/trigger functions:

Feb 6 16:07:27 devl-app1 postgres[22552]: [509-1] ERROR: function batch_stat_populate() does not exist
Feb 6 16:07:27 devl-app1 postgres[22552]: [509-2] STATEMENT: CREATE TRIGGER batch_stats_updater
Feb 6 16:07:27 devl-app1 postgres[22552]: [509-3] AFTER INSERT ON batch_audit
Feb 6 16:07:27 devl-app1 postgres[22552]: [509-4] FOR EACH ROW
Feb 6 16:07:27 devl-app1 postgres[22552]: [509-5] EXECUTE PROCEDURE batch_stat_populate();

What's up? Is this a version inconsistency issue? Please tell me I don't have to -l into a listfile, manually re-order items, and then use -L?
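For the archives, the -l/-L round trip feared above is only a few commands, not major surgery (a sketch against the dump named in the post; "devl" is a placeholder target database):

pg_restore -l prod-app-1.pgsql-prod.pgsql > restore.list
# comment entries out with a leading ';' or reorder lines as needed, then:
pg_restore -L restore.list --dbname devl prod-app-1.pgsql-prod.pgsql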
[GENERAL] pg_restore single table privileges/indexes
"PostgreSQL 8.2.0 on i386-portbld-freebsd6.0, compiled by GCC cc (GCC) 3.4.4 [FreeBSD] 20050518" We just switched from 'pg_dumpall" to "pg_dump -format=c" for our nightly backups. I wanted to experiment with restoring a single table (if the need should ever arise) from the dump file. I use the --clean option on the restore to drop the table first pg_restore -U pgsql --table codes --dbname devl --clean backup.pgsql It seems to work fine with no issues. However, none of the privileges are set. I checked the dump with grep and sure enough, they are there: [EMAIL PROTECTED] grep --binary-files=text "GRANT SELECT ON TABLE codes TO jboss;" backup.pgsql GRANT SELECT ON TABLE codes TO jboss; So...the privileges are in the dump...but do not get restored. Is this issue unique to the --table option? Is there some way to force it to restore the privileges? I also notice that the indexes are not restored...that I understand. I guess I have to do separate --index for each index on the table, correct? Any help would be appreciated. - Cheap Talk? Check out Yahoo! Messenger's low PC-to-Phone call rates.
Re: [GENERAL] drive failure, corrupt data...
Tom Lane wrote:
Yech. So much for RAID reliability... maybe you need to reconfigure the array for more redundancy?

Yeah... I'm not sure if I screwed the pooch by trying to bring the drive back 'online'. In the past we just tried re-seating it and the raid card 'does its thing' and rebuilds or takes it offline again.
Re: [GENERAL] drive failure, corrupt data...
Raid rebuilt... ran fsck:

PARTIALLY TRUNCATED INODE I=612353 SALVAGE? yes
INCORRECT BLOCK COUNT I=612353 (544 should be 416) CORRECT? yes
PARTIALLY TRUNCATED INODE I=612389 SALVAGE? yes
INCORRECT BLOCK COUNT I=612389 (544 should be 416) CORRECT? yes
INCORRECT BLOCK COUNT I=730298 (676448 should be 675520) CORRECT? yes

[EMAIL PROTECTED] find /db -inum 612353
/db/pg_clog/0952
[EMAIL PROTECTED] find /db -inum 612389
/db/pg_clog/0951
[EMAIL PROTECTED] find /db -inum 730298
/db/base/1093090/1212223

Hmmm... I wanted to see what the third one was, so:

test=# select oid, relname from pg_class order by oid;
ERROR: could not access status of transaction 2485385834
DETAIL: could not open file "pg_clog/0942": No such file or directory

So am I screwed here... do I just re-initdb and restore the entire kit and kaboodle from scratch?

Jeff Amiel <[EMAIL PROTECTED]> wrote:
Had a drive failure on a raid 5 array of a backup box that a couple of postgres databases sit on. One of the databases is a slony subscriber to a production database and the other is a test-environment database. The drive was offline... brought it back online, hoping it would start a rebuild... which it didn't. Almost immediately started getting errors from slony:

could not access status of transaction 2463273456
could not open file "pg_clog/0937": No such file or directory
... etc.

Looks like the subscriber database had some issues (at least with one specific table). In addition, trying to access the other (test) database yielded an error accessing pg_namespace. So I reseated the drive, which started a rebuild. I stopped postgres. When the rebuild is done (or if it fails, I will replace the drive), I will restart postgres and see what happens. Question... should I just re-initdb and restore databases from backup? Should I have done something differently once I noticed the failure? I've had drive failures before on this box and either rebuilt the array or replaced the drive with no postgres issues (although the amount of traffic was much less than now). Any help would be appreciated.
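For the record, the path of that third damaged file decodes as database OID / relfilenode, so something like this sketch identifies the victim:

SELECT oid, datname FROM pg_database WHERE oid = 1093090;  -- which database owns directory 1093090?
-- then, connected to that database:
SELECT relname, relkind FROM pg_class WHERE relfilenode = 1212223;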
[GENERAL] drive failure, corrupt data...
Had a drive failure on a raid 5 array of a backup box that a couple of postgres databases sit on. One of the databases is a slony subscriber to a production database and the other is a test-environment database. The drive was offline... brought it back online, hoping it would start a rebuild... which it didn't. Almost immediately started getting errors from slony:

could not access status of transaction 2463273456
could not open file "pg_clog/0937": No such file or directory
... etc.

Looks like the subscriber database had some issues (at least with one specific table). In addition, trying to access the other (test) database yielded an error accessing pg_namespace. So I reseated the drive, which started a rebuild. I stopped postgres. When the rebuild is done (or if it fails, I will replace the drive), I will restart postgres and see what happens. Question... should I just re-initdb and restore databases from backup? Should I have done something differently once I noticed the failure? I've had drive failures before on this box and either rebuilt the array or replaced the drive with no postgres issues (although the amount of traffic was much less than now). Any help would be appreciated.
Re: [GENERAL] Corrupt database? 8.1/FreeBSD6.0
Tom Lane <[EMAIL PROTECTED]> wrote:
Really? Wow, *that's* an interesting thought. Is it likely that that temp table could contain many-hour-old data?

Certainly... our connection pool used by jboss can have connections to postgres persisting for multiple days. (We're still looking for a way to tell it to recycle these occasionally.) As each 'user' of our web-based app performs some action, they acquire one of the connection pool connections and set their user_id in the temporary table used by that connection (we use that for our audit triggers). Once they are 'done' with the connection, the connection is just released back to the pool but not actually closed... so the temp table still contains the data from a previous iteration.
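A minimal sketch of the pattern being described (table layout and values are illustrative; the only piece the thread actually shows is the disable_auditing() delete that appears in the logs):

-- per-connection temp table consulted by the audit triggers
CREATE TEMP TABLE audit_metadata (user_id integer NOT NULL DEFAULT -1);
INSERT INTO audit_metadata (user_id) VALUES (42);  -- stamp the current app user on pool checkout
SELECT user_id FROM audit_metadata;                -- audit triggers read the stamp
DELETE FROM audit_metadata WHERE user_id <> -1;    -- disable_auditing(), per the logged statement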
Re: [GENERAL] Corrupt database? 8.1/FreeBSD6.0
Tom Lane <[EMAIL PROTECTED]> wrote:
BTW, please don't do anything to try to correct the problem until we're pretty sure we understand how this happened --- we might ask you for more info. AFAICS this isn't having any bad effects except for bleats in your log file, so you can wait.

Happened again... however, this time not apparently related to an autovacuum. I have a past example of this almost identical event from a few weeks ago. Here's what the logs look like with debug2 as the log_min level:

Jan 13 08:26:53 prod-app-1 postgres[41795]: [40171-1] 41795 LOG: autovacuum: processing database "template1"
Jan 13 08:26:53 prod-app-1 postgres[560]: [40171-1] 560 DEBUG: server process (PID 41794) exited with exit code 0
Jan 13 08:26:57 prod-app-1 postgres[563]: [915-1] 563 DEBUG: checkpoint starting
Jan 13 08:26:57 prod-app-1 postgres[563]: [916-1] 563 DEBUG: recycled transaction log file "0001005D0069"
Jan 13 08:26:57 prod-app-1 postgres[563]: [917-1] 563 DEBUG: checkpoint complete; 0 transaction log file(s) added, 0 removed, 1 recycled
Jan 13 08:27:02 prod-app-1 postgres[560]: [40172-1] 560 DEBUG: forked new backend, pid=42368 socket=8
Jan 13 08:27:02 prod-app-1 postgres[560]: [40173-1] 560 DEBUG: forked new backend, pid=42369 socket=8
Jan 13 08:27:02 prod-app-1 postgres[560]: [40174-1] 560 DEBUG: forked new backend, pid=42370 socket=8
Jan 13 08:27:02 prod-app-1 postgres[560]: [40175-1] 560 DEBUG: server process (PID 42370) exited with exit code 0
Jan 13 08:27:02 prod-app-1 postgres[560]: [40176-1] 560 DEBUG: forked new backend, pid=42371 socket=8
Jan 13 08:27:02 prod-app-1 postgres[560]: [40177-1] 560 DEBUG: server process (PID 42369) exited with exit code 0
Jan 13 08:27:02 prod-app-1 postgres[560]: [40178-1] 560 DEBUG: server process (PID 42371) exited with exit code 0
Jan 13 08:27:02 prod-app-1 postgres[560]: [40179-1] 560 DEBUG: forked new backend, pid=42372 socket=8
Jan 13 08:27:02 prod-app-1 postgres[560]: [40180-1] 560 DEBUG: server process (PID 42372) exited with exit code 0
Jan 13 08:27:02 prod-app-1 postgres[560]: [40181-1] 560 DEBUG: forked new backend, pid=42373 socket=8
Jan 13 08:27:02 prod-app-1 postgres[560]: [40182-1] 560 DEBUG: server process (PID 42373) exited with exit code 0
Jan 13 08:27:02 prod-app-1 postgres[560]: [40183-1] 560 DEBUG: forked new backend, pid=42374 socket=8
Jan 13 08:27:02 prod-app-1 postgres[560]: [40184-1] 560 DEBUG: server process (PID 42374) exited with exit code 0
Jan 13 08:27:02 prod-app-1 postgres[560]: [40185-1] 560 DEBUG: server process (PID 42368) exited with exit code 0
Jan 13 08:27:08 prod-app-1 postgres[560]: [40186-1] 560 DEBUG: forked new backend, pid=42375 socket=8
Jan 13 08:27:08 prod-app-1 postgres[560]: [40187-1] 560 DEBUG: server process (PID 42375) exited with exit code 0
Jan 13 08:27:23 prod-app-1 postgres[560]: [40188-1] 560 DEBUG: forked new backend, pid=42376 socket=8
Jan 13 08:27:23 prod-app-1 postgres[560]: [40189-1] 560 DEBUG: server process (PID 42376) exited with exit code 0
Jan 13 08:27:26 prod-app-1 postgres[92257]: [30259-1] jboss 92257 ERROR: could not access status of transaction 2107200825
Jan 13 08:27:26 prod-app-1 postgres[92257]: [30259-2] jboss 92257 DETAIL: could not open file "pg_clog/07D9": No such file or directory
Jan 13 08:27:26 prod-app-1 postgres[92257]: [30259-3] jboss 92257 CONTEXT: SQL statement "DELETE FROM audit_metadata WHERE user_id <> -1"
Jan 13 08:27:26 prod-app-1 postgres[92257]: [30259-4] PL/pgSQL function "disable_auditing" line 2 at SQL statement
Jan 13 08:27:26 prod-app-1 postgres[92257]: [30259-5] jboss 92257 STATEMENT: select disable_auditing()
Jan 13 08:27:38 prod-app-1 postgres[560]: [40190-1] 560 DEBUG: forked new backend, pid=42377 socket=8
Jan 13 08:27:38 prod-app-1 postgres[560]: [40191-1] 560 DEBUG: server process (PID 42377) exited with exit code 0
Jan 13 08:27:49 prod-app-1 postgres[560]: [40192-1] 560 DEBUG: forked new backend, pid=42378 socket=8
Jan 13 08:27:50 prod-app-1 postgres[560]: [40193-1] 560 DEBUG: forked new backend, pid=42379 socket=8
Jan 13 08:27:50 prod-app-1 postgres[560]: [40194-1] 560 DEBUG: forked new backend, pid=42380 socket=8
Jan 13 08:27:53 prod-app-1 postgres[560]: [40195-1] 560 DEBUG: forked new backend, pid=42381 socket=8
Jan 13 08:27:53 prod-app-1 postgres[42382]: [40196-1] 42382 LOG: autovacuum: processing database "postgres"
Jan 13 08:27:53 prod-app-1 postgres[560]: [40196-1] 560 DEBUG: server process (PID 42381) exited with exit code 0
Jan 13 08:28:02 prod-app-1 postgres[560]: [40197-1] 560 DEBUG: forked new backend, pid=42951 socket=8
Jan 13 08:28:02 prod-app-1 postgres[560]: [40198-1] 560 DEBUG: forked new backend, pid=42952 socket=8

The pg_clog dir looks like this:

-rw------- 1 pgsql wheel 262144 Jan 13 05:41 07DA
-rw------- 1 pgsql wheel 262144 Jan 13 08:06 07DB
-rw------- 1 pgsql wheel  90112 Jan 13 08:51 07DC

Now that table, audit_meta
Re: [GENERAL] Corrupt database? 8.1/FreeBSD6.0
Tom Lane <[EMAIL PROTECTED]> wrote:
Well, if the problem is indeed in pg_statistic, it'll be easy to repair (think TRUNCATE...). Have you turned up the logging level to find out? BTW, please don't do anything to try to correct the problem until we're pretty sure we understand how this happened --- we might ask you for more info. AFAICS this isn't having any bad effects except for bleats in your log file, so you can wait.

We turned up the logging this morning (debug2) and are waiting for the issue to recur. Will report back then. Thanks.
Re: [GENERAL] Corrupt database? 8.1/FreeBSD6.0
Alvaro Herrera <[EMAIL PROTECTED]> wrote:
Did you perchance connect to template0 (probably marking it as connectable in the process), make some modification there, and then mark it as non-connectable again, without executing VACUUM FREEZE on it? AFAICS we only execute VACUUM FREEZE on it, so we shouldn't leave any unfrozen tuples.

Nope. I have never touched template0... and hadn't made any changes to pg_hba.conf (which is how I assumed you would make template0 connectable) until last night, when we determined that template0 might be at fault and were going to run pg_filedump on it to see if we could find anything useful. I assume this is similar/identical to this issue: http://archives.postgresql.org/pgsql-hackers/2006-03/msg01294.php Any ideas on how I should move forward?
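For the record, connectability is governed by pg_database.datallowconn rather than pg_hba.conf. A sketch of the usual drill for poking at template0 (superuser only, and re-freeze before locking it back down):

UPDATE pg_database SET datallowconn = true WHERE datname = 'template0';
-- \c template0, inspect with queries or pg_filedump, then:
VACUUM FREEZE;
UPDATE pg_database SET datallowconn = false WHERE datname = 'template0';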
Re: [GENERAL] Corrupt database? 8.1/FreeBSD6.0
Tom Lane <[EMAIL PROTECTED]> wrote:
So apparently there's some row in template0 that didn't get marked committed before the pg_clog segment for it went away. Given 8.1's rather schizophrenic view of whether it can modify template0 or not, this is not too surprising, but I thought we'd put in some defense against this happening. Alvaro? Jeff, had you changed your autovac settings recently? Given that autovac seems to be trying to hit template0 every few minutes, it's hard to see how the tuple got missed for long enough to be a problem.

Sure enough, I did make autovacuum more aggressive about 30-45 days ago (I'll have to check the logs to find the exact date). It was originally at whatever default settings came out of the box with 8.1. Naptime is currently set to 60 seconds. Am I to assume that this probably isn't the result of some operating system or filesystem misfeasance, and that corrupt data in my 'real' databases is probably not an issue?
Re: [GENERAL] Corrupt database? 8.1/FreeBSD6.0
Looking backwards in the logs, we see it a few other times this month (autovacuum occurring just prior)... same transaction ID. How could it be the same transaction ID from several days prior?

Jan 2 03:05:04 prod-app-1 postgres[8524]: [4-1] 8524 LOG: autovacuum: processing database "template0"
Jan 2 03:05:05 prod-app-1 postgres[8524]: [5-1] 8524 ERROR: could not access status of transaction 1924023481
Jan 2 03:05:05 prod-app-1 postgres[8524]: [5-2] 8524 DETAIL: could not open file "pg_clog/072A": No such file or directory

Jeff Amiel <[EMAIL PROTECTED]> wrote:
"PostgreSQL 8.1.2 on i386-portbld-freebsd6.0, compiled by GCC cc (GCC) 3.4.4 [FreeBSD] 20050518"

Started seeing this in the logs this afternoon... scaring the begeezus out of me.

Jan 11 19:20:19 prod-app-1 postgres[1752]: [5-1] 1752 ERROR: could not access status of transaction 1924023481
Jan 11 19:20:19 prod-app-1 postgres[1752]: [5-2] 1752 DETAIL: could not open file "pg_clog/072A": No such file or directory
Jan 11 19:24:35 prod-app-1 postgres[4094]: [5-1] 4094 ERROR: could not access status of transaction 1924023481
Jan 11 19:24:35 prod-app-1 postgres[4094]: [5-2] 4094 DETAIL: could not open file "pg_clog/072A": No such file or directory
Jan 11 19:28:35 prod-app-1 postgres[6728]: [5-1] 6728 ERROR: could not access status of transaction 1924023481
Jan 11 19:28:35 prod-app-1 postgres[6728]: [5-2] 6728 DETAIL: could not open file "pg_clog/072A": No such file or directory

I could find no reference to the transaction in pg_locks... nor could I find a file with that designation (it was always the same transaction id).

ls -l /db/pg_clog/
total 984
-rw------- 1 pgsql wheel 262144 Jan 11 09:55 07CF
-rw------- 1 pgsql wheel 262144 Jan 11 13:45 07D0
-rw------- 1 pgsql wheel 262144 Jan 11 17:00 07D1
-rw------- 1 pgsql wheel 172032 Jan 11 20:39 07D2

At first I thought it was related to a constantly running batch process, which I halted, and sure enough, the problem seemed to go away. If I restarted it, the problem returned. I then tried select * from most of the tables used by that process (at least the last day's worth) and found no obvious issues or errors. About that time, I noticed that the errors were all IMMEDIATELY preceded by an autovacuum of template0. So the logs actually looked like this:

Jan 11 19:20:19 prod-app-1 postgres[1752]: [4-1] 1752 LOG: autovacuum: processing database "template0"
Jan 11 19:20:19 prod-app-1 postgres[1752]: [5-1] 1752 ERROR: could not access status of transaction 1924023481
Jan 11 19:20:19 prod-app-1 postgres[1752]: [5-2] 1752 DETAIL: could not open file "pg_clog/072A": No such file or directory
Jan 11 19:24:35 prod-app-1 postgres[4094]: [4-1] 4094 LOG: autovacuum: processing database "template0"
Jan 11 19:24:35 prod-app-1 postgres[4094]: [5-1] 4094 ERROR: could not access status of transaction 1924023481
Jan 11 19:24:35 prod-app-1 postgres[4094]: [5-2] 4094 DETAIL: could not open file "pg_clog/072A": No such file or directory
Jan 11 19:28:35 prod-app-1 postgres[6728]: [4-1] 6728 LOG: autovacuum: processing database "template0"
Jan 11 19:28:35 prod-app-1 postgres[6728]: [5-1] 6728 ERROR: could not access status of transaction 1924023481
Jan 11 19:28:35 prod-app-1 postgres[6728]: [5-2] 6728 DETAIL: could not open file "pg_clog/072A": No such file or directory

We've downloaded/compiled pg_filedump, but are stumped as to what relation (or even what database) to start with. I restarted the batch process that I thought was the culprit, and the issue has 'gone away'. I'm sure there is some corruption somewhere (from reading various similar posts), but am not sure where to start. Any help would be appreciated.
[GENERAL] Corrupt database? 8.1/FreeBSD6.0
"PostgreSQL 8.1.2 on i386-portbld-freebsd6.0, compiled by GCC cc (GCC) 3.4.4 [FreeBSD] 20050518" Started seeing this in the logs this afternoon...scaring the begeezus out of me. Jan 11 19:20:19 prod-app-1 postgres[1752]: [5-1] 1752 ERROR: could not access status of transaction 1924023481 Jan 11 19:20:19 prod-app-1 postgres[1752]: [5-2] 1752 DETAIL: could not open file "pg_clog/072A": No such file or directory Jan 11 19:24:35 prod-app-1 postgres[4094]: [5-1] 4094 ERROR: could not access status of transaction 1924023481 Jan 11 19:24:35 prod-app-1 postgres[4094]: [5-2] 4094 DETAIL: could not open file "pg_clog/072A": No such file or directory Jan 11 19:28:35 prod-app-1 postgres[6728]: [5-1] 6728 ERROR: could not access status of transaction 1924023481 Jan 11 19:28:35 prod-app-1 postgres[6728]: [5-2] 6728 DETAIL: could not open file "pg_clog/072A": No such file or directory I could find nothing any transaction refereced in pg_locks...nor could I find a file with that designation (was always the same transaction id) ls -l /db/pg_clog/ total 984 -rw--- 1 pgsql wheel 262144 Jan 11 09:55 07CF -rw--- 1 pgsql wheel 262144 Jan 11 13:45 07D0 -rw--- 1 pgsql wheel 262144 Jan 11 17:00 07D1 -rw--- 1 pgsql wheel 172032 Jan 11 20:39 07D2 At first I thought it was related to a constantly running batch process which I halted, and sure enough, the problem seemed to go away. If I restarted, the problem returned. I then tried select * from on most of the tables used by that process (at least the last days worth) and found no obvious issues or errors. About that time, I noticed that the errors were all IMMEDIATELY preceeded by an autovacuum of template0. So the logs actually looked like this: Jan 11 19:20:19 prod-app-1 postgres[1752]: [4-1] 1752 LOG: autovacuum: processing database "template0" Jan 11 19:20:19 prod-app-1 postgres[1752]: [5-1] 1752 ERROR: could not access status of transaction 1924023481 Jan 11 19:20:19 prod-app-1 postgres[1752]: [5-2] 1752 DETAIL: could not open file "pg_clog/072A": No such file or directory Jan 11 19:24:35 prod-app-1 postgres[4094]: [4-1] 4094 LOG: autovacuum: processing database "template0" Jan 11 19:24:35 prod-app-1 postgres[4094]: [5-1] 4094 ERROR: could not access status of transaction 1924023481 Jan 11 19:24:35 prod-app-1 postgres[4094]: [5-2] 4094 DETAIL: could not open file "pg_clog/072A": No such file or directory Jan 11 19:28:35 prod-app-1 postgres[6728]: [4-1] 6728 LOG: autovacuum: processing database "template0" Jan 11 19:28:35 prod-app-1 postgres[6728]: [5-1] 6728 ERROR: could not access status of transaction 1924023481 Jan 11 19:28:35 prod-app-1 postgres[6728]: [5-2] 6728 DETAIL: could not open file "pg_clog/072A": No such file or directory We've downloaded/compiled pg_filedump, but are stumped as to what relation (or even what database) to start with. I restarted the batch process that I thought was the culprit and the issue has 'gone away'. I'm sure there is some corruption somewhere (from reading various similar posts), but not sure where to start. Any help would be appreciated. - Access over 1 million songs - Yahoo! Music Unlimited.
Re: [GENERAL] MAGIC_MODULE and libc
Yup... I did this:

CREATE OR REPLACE FUNCTION getpid() RETURNS integer AS
$BODY$
DECLARE
  thepid integer;
BEGIN
  SELECT INTO thepid pg_backend_pid FROM pg_backend_pid();
  RETURN thepid;
END;
$BODY$ LANGUAGE 'plpgsql' VOLATILE;

thanks!!!

Martijn van Oosterhout wrote:
On Tue, Dec 19, 2006 at 08:20:23AM -0600, Jeff Amiel wrote:
ERROR: incompatible library "/lib/libc.so.6": missing magic block
SQL state: XX000
Hint: Extension libraries are required to use the PG_MODULE_MAGIC macro.
I've added the appropriate macro to all my actual C code... but in this case surely I am not expected to recompile libc... am I? Is there an easier way to get the pid of the current process?

In general you should use a wrapper library, but in your case pg_backend_pid() will do it. Have a nice day,
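For what it's worth, the plpgsql wrapper seems more than needed; a one-line SQL wrapper should do the same thing:

CREATE OR REPLACE FUNCTION getpid() RETURNS integer
AS 'SELECT pg_backend_pid()'
LANGUAGE sql;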
[GENERAL] MAGIC_MODULE and libc
I just upgraded to v8.2 and have an issue with one of my stored proc/functions that worked under 8.1:

CREATE OR REPLACE FUNCTION getpid() RETURNS integer
AS '/lib/libc.so.6', 'getpid'
LANGUAGE 'c' VOLATILE;

ERROR: incompatible library "/lib/libc.so.6": missing magic block
SQL state: XX000
Hint: Extension libraries are required to use the PG_MODULE_MAGIC macro.

I've added the appropriate macro to all my actual C code... but in this case surely I am not expected to recompile libc... am I? Is there an easier way to get the pid of the current process?
[GENERAL] FreeBSD 6.0 PostgreSQL upgrade
We have a production FreeBSD 6.0 system with Postgresql 8.1 where we have avoided upgrading/updating the ports to avoid compatibility and other unknown issues. We have our supfile default date set on our production, test and development environments to ensure that they all have the same versions. I want to upgrade postgresql to v8.2 and started 'toying' with how to do so on a development server (that is set up identically to our production/test servers).

What I ended up trying was to download the postgresql82-client and -server tars from freebsd.org/ports and unpack them in the /usr/ports/databases directory. I did a make deinstall in the 8.1 directory and then did a make install clean on the 8.2 client. No issues. When I went to the postgresql82-server directory and tried "make install clean", I received this error:

echo "===> postgresql-server-8.2.0 "Unknown\ PostgreSQL\ version:\ 82.;exit 1
===> postgresql-server-8.2.0 Unknown PostgreSQL version: 82.
*** Error code 1

Hmm... a little digging and a little research leads me to /usr/ports/Mk/bsd.database.mk:

LIB_DEPENDS+= pq.${PGSQL${PGSQL_VER}_LIBVER}:${PORTSDIR}/databases/postgresql${PGSQL_VER}-client
.else
IGNORE= Unknown PostgreSQL version: ${PGSQL_VER}

On this box (and I assume the production box as well), the file has a series of entries that look like this:

PGSQL73_LIBVER= 3
PGSQL74_LIBVER= 3
PGSQL80_LIBVER= 4
PGSQL81_LIBVER= 4

but no mention of 8.2 (duh... I assume this file gets updated as part of the cvsup/portsdb stuff), so I added:

PGSQL82_LIBVER= 5

And then the build succeeded. Now... I know this sounds quite hackish. I know it was probably wrong. I would never do this on a production box (I was experimenting). Please go easy on me. How should I go about upgrading postgres versions, affecting as little as I can on the box?
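For the archives, the less hackish route is presumably to sync the whole ports tree forward (so bsd.database.mk learns about 8.2) rather than hand-editing it; a sketch assuming the stock example supfile location:

cvsup -g -L 2 /usr/share/examples/cvsup/ports-supfile
cd /usr/ports/databases/postgresql82-server && make install clean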
Re: [GENERAL] Odd transaction timestamp sequence issue
It is done using now(). But what I don't understand is how the transaction that started first could 'see' the record that hadn't yet been changed by the initial update to 'COMPLETE'. I thought: "Each transaction sees a snapshot (database version) as of its start time, no matter what other transactions are doing while it runs."

How is the "when" column determined? You did not show it in your SQL commands. If it's being driven off now() or CURRENT_TIMESTAMP, then the above isn't all that surprising, because the value is the time of transaction start, not the time at which the update was made.
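A quick way to see the distinction from psql (on 8.1, timeofday() is the moving wall-clock reading; 8.2 adds clock_timestamp() for the same purpose):

BEGIN;
SELECT now(), timeofday();  -- now() is frozen at transaction start
SELECT pg_sleep(2);         -- pg_sleep() is 8.2+; on 8.1 just pause a moment
SELECT now(), timeofday();  -- now() unchanged; timeofday() has moved on
COMMIT;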
[GENERAL] Odd transaction timestamp sequence issue
PostgreSQL 8.1.2 on i386-portbld-freebsd6.0, compiled by GCC cc (GCC) 3.4.4 [FreeBSD] 20050518

We have triggers on each of our tables that create audit table entries on each insert/update/delete. The audit table (in addition to containing information about the change that was made) contains a timestamp field and a serial as the primary key. The timestamp is generated by calling now() inside the audit trigger, so it should contain the timestamp of when the transaction (that yielded the insert/update/delete) began. We have a single (and very important) table that we perform the majority of our operations on... I'll simplify the definition of the table to this:

CREATE TABLE thetable (
  id int8 NOT NULL DEFAULT nextval('thetable_id_seq'::regclass),
  flag char(1) NOT NULL,
  state char(8) NOT NULL
);

We have a scenario where a batch process takes records from this table in a certain state and, one by one, moves them to a new state:

update thetable set state='COMPLETE', flag='X' where state='INITIAL' and id=?

Another batch process is looking for any records in that new state... and then updates certain elements of them:

update thetable set flag='Y' where id in (select id from thetable where state='COMPLETE')

This update statement is run as its own transaction (there is nothing else done in the transaction). The audit logs for some transactions show something very odd. For example, for id 210210 we have an audit trail that looks like this:

 audit_id | record_id |          when          | column | old_val | new_val
----------+-----------+------------------------+--------+---------+----------
        1 |    210210 | 2006-04-20 12:49:03.92 | state  | INITIAL | COMPLETE
        2 |    210210 | 2006-04-20 12:49:03.74 | flag   | X       | Y

By looking at the timestamps, the second update started BEFORE the first update, even though the second update cannot occur if the state hasn't been changed and committed by the first one! Even weirder is that the order of the sequence (audit_id) shows them occurring in the opposite order. Am I missing something obvious here?