Re: [GENERAL] Using the wrong index (very suboptimal), why?

2008-12-23 Thread Shane Wright
Tom,

>>  Index Scan using index_answers_nidiid1 on answers
>> (cost=0.00..114.14 rows=1 width=60) (actual time=0.206..2398.645
>> rows=21891 loops=1)
>>Index Cond: ((nid = 253869913) AND (iid1 = 535292129))
>>  Total runtime: 2424.769 ms
>Well, here's the problem all right: 1 row estimated vs 21891 actual.
>Is there something odd about the joint distribution of these two
>columns?

I should also clarify: there will almost always be many rows that match
any nid/iid1 pair (for any nid or iid1 value that actually exists in
this table).  The chance of there being only one result is very small.
Most of the time the count is in the hundreds or thousands, but it is
regularly in the tens of thousands, and periodically a hundred thousand
or more.
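
For what it's worth, this is the sort of query I've been using to
eyeball the joint distribution (just a sketch, and slow on a table this
size):

  -- top nid/iid1 pairs by row count, to see how skewed the pairing is
  SELECT nid, iid1, count(*) AS n
  FROM answers
  GROUP BY nid, iid1
  ORDER BY n DESC
  LIMIT 20;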


Many thanks

Shane





Re: [GENERAL] Using the wrong index (very suboptimal), why?

2008-12-23 Thread Shane Wright
Hi Tom,

>> - increasing the statistics target to the maximum setting with SET
>> STATISTICS 1000 on columns rid, nid and iid1 of answers, then
>> re-vacuuming.
> I hope you meant re-analyzing.

Hehe absolutely yes; I always VACUUM VERBOSE ANALYSE :)

>>  Index Scan using index_answers_nidiid1 on answers
>> (cost=0.00..114.14 rows=1 width=60) (actual time=0.206..2398.645
>> rows=21891 loops=1)
>>Index Cond: ((nid = 253869913) AND (iid1 = 535292129))
>>  Total runtime: 2424.769 ms
>Well, here's the problem all right: 1 row estimated vs 21891 actual.
>Is there something odd about the joint distribution of these two
>columns?

Good :) - I've been chasing the same thing, albeit not with any luck yet
:(

There shouldn't be anything odd about the data, I would have thought...
There are ~670 million rows.  No nulls in nid, ~6% of iid1 are null
(they will always be null or not null for a given nid, i.e. rows with a
given nid value will either all be null or all be not null).

nids are randomly selected; there are only ~27000 distinct values, all
between 1 and 9.
iid1s are also random in the same range, ~5 distinct values.  All
the rows for a given value of nid will have one of a small set of
possible iid1 values; usually 5-15 distinct values.  The frequency at
which each nid occurs is quite uneven; some will occur tens of times,
others a couple of hundred thousand.  The same applies to the
corresponding iid1 values.
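
To see what the planner actually knows about these columns I've been
looking at the pg_stats view - a sketch:

  SELECT attname, null_frac, n_distinct, most_common_vals, most_common_freqs
  FROM pg_stats
  WHERE tablename = 'answers'
    AND attname IN ('rid', 'nid', 'iid1');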

(the table stores answers to questions; nid is the question ID, iid1 is
the answer ID [for questions where the user picks from a pre-defined
list] - iid1 is NULL for textual answers.  iid1 values are grouped into
sets of options the user can pick from, defined elsewhere.  These sets
can be shared across questions/nids but not often)

Does the above sound strange?

There aren't any strange errors from the database, autovacuum is enabled
(but not vacuuming the table often enough).  This problem was triggered
this time when I manually vacuumed the table - which cleared a lot of
dead rows (again, I always VACUUM VERBOSE ANALYSE).


A separate installation with a similar data set (actually the same but
older; seems the most comparable) estimated 10 rows returned for the
same query.







Re: [GENERAL] Using the wrong index (very suboptimal), why?

2008-12-23 Thread Shane Wright
Tom,

> You need to
> look into what the estimated vs actual rowcounts are for just the
> two-column condition (ie, where nid = something AND iid1 = something).
> Getting those estimates to match reality is the key to getting a sane
> plan choice here.

Many thanks - I agree in principle it isn't automatically the worst
decision, however for this data set it unfortunately is :(

Improving the accuracy of the cost estimates is exactly what I'm trying
to achieve; so far I've tried:

- increasing the statistics target to the maximum setting with SET
STATISTICS 1000 on columns rid, nid and iid1 of answers, then
re-vacuuming

- adjusting random_page_cost downwards even more to try to make it
prefer index IO

- increasing cpu_tuple_cost up to 1.0 (from 0.01) to try to discourage
it from processing data rows

- decreasing cpu_index_tuple_cost to 0.0005 (from 0.005) to encourage it
to spend more time in index rows.


None of the above had any effect :(

Interestingly, I didn't revacuum between changing the cost figures in the
config file (I did reload).  The cost figures from EXPLAIN did change,
but in the queries below the 'right' three-column index always comes out
just over a third more expensive than the 'wrong' two-column one.
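
One trick for comparing the plans (a sketch only - DROP INDEX is
transactional, so the rollback restores it, but it takes an exclusive
lock on the table, so only on a quiet system) is to hide the two-column
index from the planner long enough to see the alternative plan:

  BEGIN;
  DROP INDEX index_answers_nidiid1;
  EXPLAIN ANALYSE SELECT * FROM answers
   WHERE rid=668332334 AND nid=253869913 AND iid1=535292129;
  ROLLBACK;  -- the index comes back untouched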

Here's a query that matches actual data in the table; the planner sees
the cost of the 2-column index as much less, but the query takes an
order of magnitude longer to run (using nid & iid1 means filtering
through around 22,000 rows).  Three queries follow: the first is the one
used in practice, against all 3 columns; the next two use the two
different combinations of just 2 columns:

emystery=# explain analyse select * FROM answers WHERE rid=668332334 AND nid=253869913 AND iid1=535292129;
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using index_answers_nidiid1 on answers  (cost=0.00..114.14 rows=1 width=60) (actual time=0.168..790.262 rows=1 loops=1)
   Index Cond: ((nid = 253869913) AND (iid1 = 535292129))
   Filter: (rid = 668332334)
 Total runtime: 790.305 ms
(4 rows)

emystery=# explain analyse select * FROM answers WHERE rid=668332334 AND nid=253869913;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using index_answers_ridnidiid1 on answers  (cost=0.00..152.49 rows=1 width=60) (actual time=0.120..0.141 rows=18 loops=1)
   Index Cond: ((rid = 668332334) AND (nid = 253869913))
 Total runtime: 0.207 ms
(3 rows)

emystery=# explain analyse select * FROM ecos_answers WHERE nid=253869913 AND iid1=535292129;
                                                                QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using index_ecos_answers_nidiid1 on ecos_answers  (cost=0.00..114.14 rows=1 width=60) (actual time=0.206..2398.645 rows=21891 loops=1)
   Index Cond: ((nid = 253869913) AND (iid1 = 535292129))
 Total runtime: 2424.769 ms
(3 rows)




What setting should I be tweaking to make the two column index more
expensive?  I thought I would need to either...

- inform the database that there are likely to be a lot of rows that
need filtering (SET STATISTICS [maximum]???)
or
- tell the database that filtering rows is expensive (cpu_tuple_cost++,
random_page_cost--  ???)

...but both of which I've already tried.  It would seem from the EXPLAIN
output that the first one is key: it shouldn't think (for the last
example) that it's only going to get 1 row back!  But aside from
increasing SET STATISTICS, what other setting is there to inform it of
this?


The only thing I can think of that remains is to rebuild/replace the
preferred index; that should make it smaller, and *if* the index cost
estimates are partly based on the number of disk pages, that should
reduce the perceived cost of using it.  Is this the case?  Even if it
is, that still doesn't fix the underlying problem.
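
If it comes to a rebuild, a sketch of doing it without blocking writes
(CREATE INDEX CONCURRENTLY is available in 8.3; plain REINDEX would also
work but takes an exclusive lock; the _new name is just a placeholder):

  CREATE INDEX CONCURRENTLY index_answers_ridnidiid1_new
      ON answers (rid, nid, iid1);
  DROP INDEX index_answers_ridnidiid1;
  ALTER INDEX index_answers_ridnidiid1_new
      RENAME TO index_answers_ridnidiid1;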

Any help appreciated,

Many thanks

Shane



[GENERAL] Using the wrong index (very suboptimal), why?

2008-12-22 Thread Shane Wright
Hi,

I have somewhat of a quandary with a large table in my database;
PostgreSQL is choosing the 'wrong' index for a certain kind of query;
causing performance to become an order of magnitude slower (query times
usually measured in milliseconds now become seconds/minutes!).

It's not that it isn't using any index (so enable_seqscan=off doesn't
help), it's that the index it picks is suboptimal.

The query is based on 3 of the table columns - there is an index on all
three, but it prefers to use an index on just two of them, then
filtering by the 3rd (EXPLAIN output is below - showing the query as
well).

I've tried everything I can find to make it not do this (config
parameters, statistics target, vacuuming more...)  - what am I missing?

It happened a couple of times before - after some unusual activity
patterns in a client script - but repeatedly vacuuming until the planner
changed its mind made it go away.  We then made config changes
(random_page_cost and effective_cache_size) thinking that fixed the
reason why it made the bad decision in the first place... but it would
appear not.

The only thing remaining I can think of is rebuilding the 'preferred'
index; that should reduce its size from 27Gb down to something more
manageable; maybe that will make the database more inclined to use it
because there'll be less I/O?
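
A quick way to watch the index sizes before/after (a sketch;
pg_relation_size and pg_size_pretty are present in 8.3):

  SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS size
  FROM pg_class
  WHERE relname LIKE 'index_answers%'
  ORDER BY pg_relation_size(oid) DESC;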

I've considered upgrading to 8.3.5 as well, but I can't see anything in
the changelogs that would fix this issue (I appreciate upgrading is a
good idea regardless; that will be done at some point soon anyway).


Hopefully all relevant info is listed below - if anyone's got any ideas
I'd appreciate any help or pointers anyone can give, thanks...



The server is PostgreSQL 8.3.0 on Linux with 32Gb RAM.  /var/lib/pgsql/
is on a fibre-channel SAN.  This table has around 680 million rows - and
has been reasonably regularly vacuumed, but is probably in dire need of
a VACUUM FULL and REINDEX to reclaim dead space (see disk space info at
the bottom of the post).


emystery=# \d answers
                              Table "public.answers"
 Column |  Type   |                           Modifiers
--------+---------+----------------------------------------------------------------
 aid    | integer | not null default nextval(('seq_answers_aid'::text)::regclass)
 rid    | integer | not null
 nid    | integer | not null
 iid1   | integer |
 iid2   | integer |
 iid3   | integer |
 text   | text    |
 extra  | bigint  |
Indexes:
    "answers_pkey" PRIMARY KEY, btree (aid)
    "index_answers_iid1" btree (iid1) WHERE iid1 > 0
    "index_answers_iid2" btree (iid2) WHERE iid2 > 0
    "index_answers_iid3" btree (iid3) WHERE iid3 > 0
    "index_answers_nidiid1" btree (nid, iid1)
    "index_answers_ridnidiid1" btree (rid, nid, iid1)



This is what it is doing [slowly] - but for many values of rid/nid/iid1
there are a lot of rows to filter through (tens/hundreds of thousands)
so this can take many seconds or minutes:

emystery=# explain select * from answers where rid=1 and nid=2 and iid1=3;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Index Scan using index_answers_nidiid1 on answers  (cost=0.00..28.74 rows=1 width=62)
   Index Cond: ((nid = 2) AND (iid1 = 3))
   Filter: (rid = 1)



This is the pattern it *should* use (and does use on other installations
of similar/older data).  When this pattern is used the query always
completes in <1 second, usually ~0.2 seconds!


emystery20080821=# explain select * from answers where rid=1 and nid=2 and iid1=3;
                                        QUERY PLAN
--------------------------------------------------------------------------------------------
 Index Scan using index_answers_ridnidiid1 on answers  (cost=0.00..99.04 rows=1 width=67)
   Index Cond: ((rid = 1) AND (nid = 2) AND (iid1 = 3))




The PostgreSQL configuration has been altered to help prefer random
lookups [via an index], and to indicate to the database how much data
the OS is likely to be caching:

#seq_page_cost = 1.0# measured on an arbitrary scale
random_page_cost = 1.0  # same scale as above
#cpu_tuple_cost = 0.01  # same scale as above
#cpu_index_tuple_cost = 0.005   # same scale as above
#cpu_operator_cost = 0.0025 # same scale as above
effective_cache_size = 31GB


The table has been VACUUM ANALYSE'd (repeatedly!) to no avail.

We've also increased the statistics target for the columns in the table
to 200 (from 10), which still isn't making any difference (I thought
that because the table is so large and the dataset is quite uneven this
should help... it's now at 1000 and vacuuming again, so we might be
lucky...):

alter table answers alter column rid set statistics 200;
alter table answers alter column nid set statistics 200;
alter table answers alter column iid1 set statistics 200;
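
A sketch of double-checking that the new targets actually took - the
per-column target is stored in pg_attribute (-1 means the default):

  SELECT attname, attstattarget
  FROM pg_attribute
  WHERE attrelid = 'answers'::regclass
    AND attname IN ('rid', 'nid', 'iid1');

(followed by a fresh analyse of the table, since the target only takes
effect at the next ANALYZE)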



Here is the [full] output from the most recent VACUUM:

emystery=# vacuum verbose analyse answers;
INFO:  vacuuming "public.answers"
INFO:  scanned index "index_a

Re: [GENERAL] Upgrade to 8.3.0?

2008-02-29 Thread Shane Wright
 
hmm - I don't suppose the RHEL builds already include your patch against this 
issue?  If so I guess it would be useful to know :)
 
I've run through the steps you outlined to reproduce the fault on a new box 
(after having reproduced it successfully on test servers), and it does not show 
the problem:
 
 
[EMAIL PROTECTED] data]# !psql
psql test -U X
Welcome to psql 8.3.0, the PostgreSQL interactive terminal.
Type:  \copyright for distribution terms
   \h for help with SQL commands
   \? for help with psql commands
   \g or terminate with semicolon to execute query
   \q to quit
test=# create table foo as select null::int as x from generate_series(1,2000) x;
SELECT
test=# delete from foo;
DELETE 2000
test=# select count(*) from foo;
 count
---
 0
(1 row)
test=# vacuum full foo;
VACUUM
test=#
test=# set debug_assertions to on;
ERROR:  assertion checking is not supported by this build
test=#

 
The RPMs installed from are these:
 
postgresql-8.3.0-1PGDG.rhel4
postgresql-libs-8.3.0-1PGDG.rhel4
postgresql-server-8.3.0-1PGDG.rhel4

Thanks
 
Shane



From: Tom Lane [mailto:[EMAIL PROTECTED]
Sent: Fri 29/02/2008 22:55
To: Shane Wright
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Upgrade to 8.3.0? 



"Shane Wright" <[EMAIL PROTECTED]> writes:
> I'm assuming that the default RPMs for RHEL 4 (on ftp.postgresql.org) are not 
> built with --enable-cassert - and that I can verify this by using the same 
> reproduction case you demonstrated in the bug history?

I believe they aren't, but you could check by seeing what pg_config
reports as the configure options; or even more directly by seeing if
it will let you "SET debug_assertions TO on".
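
For example (a sketch):

  pg_config --configure

and look for --enable-cassert in the output.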

regards, tom lane




Re: [GENERAL] Upgrade to 8.3.0?

2008-02-29 Thread Shane Wright
Tom
 
Many thanks :)
 
I've tried to find this out for myself but have failed :(
 
I'm assuming that the default RPMs for RHEL 4 (on ftp.postgresql.org) are not 
built with --enable-cassert - and that I can verify this by using the same 
reproduction case you demonstrated in the bug history?
 
(I should just get the memory allocation error on attempting to vacuum full - 
instead of an assertion failing and bringing down the whole PostgreSQL server 
(panic?))
 
Please could you confirm my assertion (sic) is correct?
 
(I get this behaviour on my test server and all appears to be ok)
 
Kind regards
 
Shane
 



From: Tom Lane [mailto:[EMAIL PROTECTED]
Sent: Fri 29/02/2008 19:46
To: Shane Wright
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Upgrade to 8.3.0? 



Shane Wright writes:
> -  Is the HOT/VACUUMFULL bug above dangerous or is its biggest
> side-effect an error until a non-full vacuum is run?

As long as you don't build with --enable-cassert, that bug isn't real
nasty.  Especially if you don't do VACUUM FULL routinely ;-)

regards, tom lane




[GENERAL] Upgrade to 8.3.0?

2008-02-29 Thread Shane Wright
Hi,

 

I'm considering an upgrade to 8.3.0 for some internal databases - I
would normally wait until at least a .1 release of anything for safety,
but there's a lot of nice-sounding stuff here!

 

After trawling the bug logs and doing some testing on some servers here
all looks ok except for this bug around VACUUM FULL and HOT tables:

http://archives.postgresql.org/pgsql-bugs/2008-02/msg00068.php

 

So my questions are thus...

 

-  Is the HOT/VACUUMFULL bug above dangerous or is its biggest
side-effect an error until a non-full vacuum is run?

 

-  Has anyone spotted anything else that would discourage an
upgrade at this stage?

 

If it helps, our use of SQL is fairly basic - nothing fancy, and nothing
Postgres-specific.

 

Any help appreciated,

 

Thanks

 

Shane

 



Re: [GENERAL] recovery from xid wraparound

2006-10-24 Thread Shane Wright

Aw :(

It's at the default of 8Mb.  The table contains 220 million rows and 6 indices.  
It has a few deleted rows...


If I change vacuum_mem I'll need to at least 'pg_ctl reload' - will it apply 
straightaway with the next vacuum query or does it need a full restart?

Does vacuum_mem need shared memory? (i.e. is it subject to the OS's limit?) - 
I've looked in the docs and googled but can't see any detail on this.
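
My tentative plan is just a per-session setting before the big run - a
sketch, assuming vacuum_mem is ordinary per-backend memory that can be
set per session ('bigtable' and the figure are placeholders):

  SET vacuum_mem = 131072;           -- 128Mb instead of the 8Mb default
  VACUUM VERBOSE ANALYSE bigtable;   -- the 220-million-row table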



If I have managed to vacuum all the catalog tables, and my script has ensured 
all user tables other than this one have been vacuumed, then...   will the 
first pass of vacuum on this have set the xid to FrozenXID for all rows - i.e. 
is the table safe?


What's the relative safety of restarting this vacuum with a bigger vacuum_mem, 
say at the end of the week when traffic is quieter?


Basically if it's just datfrozenxid that's not updated I can live with delaying 
the vacuum a few days.  But if things are more serious then obviously I can't 
wait.

Is it safe to say that if the catalog tables are ok and an individual table 
has been vacuumed then its data is safe?


S



-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED] 
Sent: 24 October 2006 15:52
To: Shane Wright
Cc: pgsql-general@postgresql.org; Martijn van Oosterhout
Subject: Re: [GENERAL] recovery from xid wraparound 


"Shane Wright" <[EMAIL PROTECTED]> writes:
> Incidentally, how many passes of a table can vacuum make!

Lots, especially if the table hasn't been vacuumed in a long time... Perhaps 
you should be using a higher maintenance_work_mem? (Um, in 7.4 make that 
vacuum_mem.)  Larger work memory translates directly to fewer passes over the 
indexes.

regards, tom lane




Re: [GENERAL] recovery from xid wraparound

2006-10-24 Thread Shane Wright

Incidentally, how many passes of a table can vacuum make?  It's currently
on its third trip through the 20Gb of indices, meaning another 7 hours
till completion [of this table]!

Can I assume it only does three passes?  (Or does it choose based on the
table continuing to be updated while vacuum is running?)

S



-Original Message-
From: Martijn van Oosterhout [mailto:[EMAIL PROTECTED] 
Sent: 24 October 2006 10:24
To: Shane Wright
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] recovery from xid wraparound


On Tue, Oct 24, 2006 at 07:43:15AM +0100, Shane Wright wrote:
> Anyway - not noticed any data loss yet and was hoping it would be such

> that if all tables had been vacuumed recently (including system 
> catalog tables), that there would be no remaining rows that would 
> appear to have a future xid and so the database should be ok?

Running vacuum is the right solution, but I think you have to let it
finish. In particular, in that version a database-wide vacuum has to
complete before it will update the datfrozenxid (it's not tracked per
table).

> a) is my assumption about the database being ok correct - assuming all

> tables have been vacuumed recently, including catalog tables?

Should be ok, but apparently you missed one, or didn't do a database
wide vacuum.

> b) is it possible to safely abort my whole table vacuum now so I can 
> run it at the weekend when there's less traffic?

Aborting vacuum is safe, but you have to do a database-wide vacuum at
some point.

> c) if I have experienced data loss, on the assumption all the table 
> structure remains (looks like it does), and I have a working backup 
> from before the xid wraparound (I do), can I just reinsert any 
> detected-missing data at the application level without needing a 
> dump/reload?

A VACUUM will recover any data that slipped beyond the horizon less than
1 billion transactions ago, which I think covers you completely. The
only issue is that unique indexes may be confused because new
conflicting data may have been inserted while the old data was
invisible. Only you can say if that's going to be an issue.

Hope this helps,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability 
> to litigate.



[GENERAL] recovery from xid wraparound

2006-10-24 Thread Shane Wright
Hi

I'm running 7.4 on RHAS 4, and I think I've had a transaction id
wraparound issue.  Running the command below gives the suitably
worrying negative number:

emystery=# SELECT datname, age(datfrozenxid) FROM pg_database;
    datname    |     age
---------------+-------------
 [maindbname]  | -2081610471
 [otherdbname] |  1075601025
 [otherdbname] |  1257289757
 [otherdbname] |  1074582099
 [otherdbname] |  1257289757


Which is weird - because I have vacuumed the database quite a lot -
both individual tables and, I thought, a vacuum of the whole database a
month or so ago.

Anyway - not noticed any data loss yet and was hoping it would be such
that if all tables had been vacuumed recently (including system catalog
tables), that there would be no remaining rows that would appear to
have a future xid and so the database should be ok?

Obviously I'm now doing the right thing with a vacuumdb -a - however
this has been running 9 hours now and looks like at least 7 hours to
go just on this one monstrous table.

In the interests of risk reduction I've just knocked up a script to run
ahead and quickly vacuum all the other tables.
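
For reference, a sketch of the sort of script I mean - generate a vacuum
command per table, smallest first, and feed the output back to psql
('bigtable' stands in for the monster table's real name):

  SELECT 'VACUUM ANALYSE ' || relname || ';'
  FROM pg_class
  WHERE relkind = 'r'
    AND relname <> 'bigtable'
  ORDER BY relpages;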

But my questions are thus...

a) is my assumption about the database being ok correct - assuming all
tables have been vacuumed recently, including catalog tables?

b) is it possible to safely abort my whole table vacuum now so I can
run it at the weekend when there's less traffic?

c) if I have experienced data loss, on the assumption all the table
structure remains (looks like it does), and I have a working backup
from before the xid wraparound (I do), can I just reinsert any
detected-missing data at the application level without needing a
dump/reload?

Any help appreciated in this really not-fun time,

thanks

S




Re: [GENERAL] recovery from xid wraparound

2006-10-24 Thread Shane Wright
Tom,

Thanks

But are there just 28 (the 28 that have been vacuumed), or are there more (in
7.4)?

Happy there's no guarantee, but it would help to know any possible damage in
my current situation,

Thanks

S



-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED] 
Sent: 24 October 2006 15:23
To: Shane Wright
Cc: Martijn van Oosterhout; pgsql-general@postgresql.org
Subject: Re: [GENERAL] recovery from xid wraparound 


"Shane Wright" <[EMAIL PROTECTED]> writes:
>> Just make sure you've really covered *all* the system tables.

> I've been under the impression system tables get done first, then 
> user(me)-created tables after -

No, there's no such guarantee.  A database-wide vacuum just does the tables in 
the order it finds 'em in pg_class.

regards, tom lane




Re: [GENERAL] recovery from xid wraparound

2006-10-24 Thread Shane Wright
Martijn,

Thanks,

>Just make sure you've really covered *all* the system tables. If they go
>you get really weird results.

I've been under the impression system tables get done first, then
user(me)-created tables after - which means my previous [aborted]
attempts at vacuuming them would have covered it, unless I'm missing
something?

(db was created by initdb, then pg_restore to load data into it,
database was then vacuumed before production work began)

I've looked at the list of catalog tables from 7.4's docs (URL below),
and all 28 have been processed in this vacuum, so presumably same order
for previous attempts:
http://www.postgresql.org/docs/7.4/static/catalogs.html

Checked with 'grep vacuuming vacuum.log | grep pg_ | grep toast -v |
sort | uniq'

Does this sound like a fair assumption?

(it is on the first database in the cluster, these aren't coming up from
other databases)

Many thanks for your help!

S


-Original Message-
From: Martijn van Oosterhout [mailto:[EMAIL PROTECTED] 
Sent: 24 October 2006 11:50
To: Shane Wright
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] recovery from xid wraparound


On Tue, Oct 24, 2006 at 11:06:01AM +0100, Shane Wright wrote:
> If I was to abort this vacuum, given that all other tables are 
> vacuumed (including system catalog tables), what's the worst case 
> scenario? - given that more transactions are happening on the database

Only tables that haven't been vacuumed in the last billion transactions
are at risk. It's possible that if you've vacuumed that large table
recently by itself that all the data is actually safe, just the system
doesn't realise it.

Just make sure you've really covered *all* the system tables. If they go
you get really weird results.

> If I understand correctly, it would be that some rows could disappear 
> from this large unvacuumed table if their xid was too old - but no 
> other consequence?

The VACUUM would make them reappear. To truly disappear they would have
to be 3 billion transactions old. That leaves the unique index issue I
mentioned.

> (fully aware that a db-wide vacuum is needed, but if it can [safely] 
> wait for the weekend that would be preferable)

That's risk-management. For example, all the really old tuples are
possibly near the beginning of the table, thus this current vacuum will
have fixed them already. But to get a handle on that you need to analyse
your tuple turnover and usage ratio.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability 
> to litigate.



Re: [GENERAL] recovery from xid wraparound

2006-10-24 Thread Shane Wright
Martijn,

Thanks :)

>Running vacuum is the right solution, but I think you have to let it
>finish. In particular, in that version a database-wide vacuum has to
>complete before it will update the datfrozenxid (it's not tracked per
>table).
>> a) is my assumption about the database being ok correct - assuming
all 
>> tables have been vacuumed recently, including catalog tables?
>Should be ok, but apparently you missed one, or didn't do a database
wide
>vacuum.

Yes, I probably missed this 220-million-row beast that's still running
now...

If I was to abort this vacuum, given that all other tables are vacuumed
(including system catalog tables), what's the worst case scenario? -
given that more transactions are happening on the database 

If I understand correctly, it would be that some rows could disappear
from this large unvacuumed table if their xid was too old - but no other
consequence?

(fully aware that a db-wide vacuum is needed, but if it can [safely]
wait for the weekend that would be preferable)

Many thanks,

S



-Original Message-
From: Martijn van Oosterhout [mailto:[EMAIL PROTECTED] 
Sent: 24 October 2006 10:24
To: Shane Wright
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] recovery from xid wraparound


On Tue, Oct 24, 2006 at 07:43:15AM +0100, Shane Wright wrote:
> Anyway - not noticed any data loss yet and was hoping it would be such

> that if all tables had been vacuumed recently (including system 
> catalog tables), that there would be no remaining rows that would 
> appear to have a future xid and so the database should be ok?

Running vacuum is the right solution, but I think you have to let it
finish. In particular, in that version a database-wide vacuum has to
complete before it will update the datfrozenxid (it's not tracked per
table).

> a) is my assumption about the database being ok correct - assuming all

> tables have been vacuumed recently, including catalog tables?

Should be ok, but apparently you missed one, or didn't do a database
wide vacuum.

> b) is it possible to safely abort my whole table vacuum now so I can 
> run it at the weekend when there's less traffic?

Aborting vacuum is safe, but you have to do a database-wide vacuum at
some point.

> c) if I have experienced data loss, on the assumption all the table 
> structure remains (looks like it does), and I have a working backup 
> from before the xid wraparound (I do), can I just reinsert any 
> detected-missing data at the application level without needing a 
> dump/reload?

A VACUUM will recover any data that slipped beyond the horizon less than
1 billion transactions ago, which I think covers you completely. The
only issue is that unique indexes may be confused because new
conflicting data may have been inserted while the old data was
invisible. Only you can say if that's going to be an issue.

Hope this helps,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability 
> to litigate.




Re: [GENERAL] mount -o async - is it safe?

2006-01-20 Thread Shane Wright
Hi Tom,

> > If we turn sync off, surely PostgreSQL keeps the data consistent, ext3
> > journalling  keeps the filesystem clean [assuming other mount options
> > left at defaults], and then everything should be ok with either a server
> > crash, power failure, storage failure, whatever.  right?
>
> I checked around with some of Red Hat's kernel folk, and the bottom line
> seems to be that it's OK as long as you trust the hardware:

fabulous, thanks :)

> :> Question is, can fsync(2) be trusted to behave properly, ie, not return
> :> until all writes are down to disk, if the SAN is mounted -o async ?
> :
> : async is the default, which is the whole point of having things like
> : fsync, fdatasync, O_DIRECT, etc.  You can trust fsync as far as you can
> : trust the hardware.  The call will not return until the SAN says the
> : data has been written.
> :
> : In reality, the SAN is probably buffering these writes (possibly into
> : SRAM or battery-backed RAM), and the disks are probably buffering them
> : again, but you've got redundant power supplies and UPSs, right?

that sounds true (and it has) - but presumably this is the case whether we
mount -o sync or not?  I.e. if it's going to buffer, then it's going to do so
whether it's postgres or the kernel sync'ing the writes?

(specifically that the SAN likely buffers anyway - IMO having to trust the 
hardware to some degree is a given ;)

Cheers

Shane




Re: [GENERAL] mount -o async - is it safe?

2006-01-19 Thread Shane Wright
Hi,

thanks :)

> > If -o async means "all I/O is asyncronous except stuff explicitly
> > fsync()ed" you're fine. Otherwise...
>
> That's the way it works.  Async is the default setting for most
> filesystems, but fsync() is always honored, at last as far as
> non-lying hardware will allow.  :)

That sounds good :)

ext3's journalling should take care of the rest I guess - does that sound ok?
I think I've read in various places that pgSQL doesn't need any
directory-level operations in keeping WAL up to date, so provided the ext3
partition remains mountable then the database should be fine.

> > The usual advice is to stick the WAL on a properly synced partition and
> > stick the rest somewhere else. Note, I have no experience with this,
> > it's just what I've heard.
>
> This might not be optimal, as having every write synchronous actually
> results in more synced writes than are strictly necessary.

Actually I thought that *all* the database had to have fsync() work correctly; 
not for integrity on failed transactions, but to maintain integrity during 
checkpointing as well.  But I could well be wrong!

thanks,

Shane



[GENERAL] mount -o async - is it safe?

2006-01-19 Thread Shane Wright
Hi,

We've recently set up our database (7.4.9) with our new hosting provider. 
We have two database servers running RHEL 4 in a cluster; one active and
one hot-spare.  They share a [fibre-channel connected] SAN partition; the
active server has it mounted.


Now my question is this: the provider has, by default, mounted it with -o
sync, so all reads/writes are synchronous.  This doesn't result in the
greatest of performance, and indeed remounting -o async is significantly
faster.

They tell me this is so MySQL databases don't get corrupted in the event of
a crash.  Which is fine...

But as Postgres uses fsync() to force committed transactions to disk, then
this shouldn't be necessary, right?

(I know this is based on the assumption the SAN doesn't lie about its syncs, 
but then surely it would lie to the kernel with -o sync anyway?)


If we turn sync off, surely PostgreSQL keeps the data consistent, ext3
journalling  keeps the filesystem clean [assuming other mount options left at 
defaults], and then everything should be ok with either a server crash, power
failure, storage failure, whatever.  right?
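
For reference, the postgresql.conf knobs I believe are relevant here
(7.4 names; the values shown are just illustrative):

fsync = true                  # must stay on; this is what makes -o async safe
wal_sync_method = fdatasync   # the platform default varies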


I've googled and come up with some info; the most relevant of
which is here:
http://archives.postgresql.org/pgsql-general/2003-11/msg01515.php
http://archives.postgresql.org/pgsql-general/2003-11/msg01592.php


If anyone can confirm either way that'd be great - or even just point me in 
the direction of enough firm info to work it out myself ;)

Thanks,

Shane



Re: [GENERAL] weird quote bug

2003-06-30 Thread Shane Wright

> > They are identical!  I can't work out whats going on!  Please, if anyone
> > can see what's wrong it'll stop me careering into my early grave!
>
> I've never tried this but the docs for LIKE (section 6.6.1 in the 7.3
> docs) say that to match a literal \ you need to type \\\\. An alternative
> might be to try adding an ESCAPE '' clause at the end (also in the same
> doc section).

Ah.  I had read that bit of the manual - must have missed it.  Phew, it works now!
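
For the archives, a minimal illustration of the doubling (a sketch, but
it matches the doc section quoted above - one literal backslash in the
data needs four in the statement):

  select 'a\\b' like '%\\\\%';          -- true: the parser sees a\b and %\\%,
                                        -- and LIKE then treats \\ as a literal \
  select 'a\\b' like '%\\%' escape '';  -- alternative: no escape character at
                                        -- all, so the pattern's \ is literal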

Thanks (and thanks to the others who answered my question!)

Cheers

Shane




Re: [GENERAL] weird quote bug

2003-06-30 Thread Shane Wright


Should have said - I'm using PostgreSQL 7.3.3 on Gentoo Linux.  The problem
occurs both through psql and through PHP 4.3.3.

Cheers

Shane

On Monday 30 Jun 2003 12:25 pm, Shane Wright wrote:
> Hi
>
> This is really driving me silly - I can't work it out, can anyone see what
> I'm doing that's stupid and causing this not to match?
>
>
> This shows that the row exists in the table:
>
>
> emystery=> select aid,useragent from useragent where useragent like '%ntserver-ps%';
>     aid    |                                       useragent
> -----------+------------------------------------------------------------------------------------------------------
>  875741007 | Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; file://\\ntserver-ps\publicsw\nt\ie6\ins\active.ins)
> (1 row)
>
>
> this tries to select it (note that I've escaped the backslashes):
>
>
> emystery=> select * from useragent where useragent ilike 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; file://\\\\ntserver-ps\\publicsw\\nt\\ie6\\ins\\active.ins)';
>  aid | useragent | date_added | data_browser | data_version | data_os
> -----+-----------+------------+--------------+--------------+---------
> (0 rows)
>
>
> no match!  This one encodes the backslashes (\xxx octal for ASCII value) in
> a different way:
>
>
> emystery=> select * from useragent where useragent ilike 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; file://\134\134ntserver-ps\134publicsw\134nt\134ie6\134ins\134active.ins)';
>  aid | useragent | date_added | data_browser | data_version | data_os
> -----+-----------+------------+--------------+--------------+---------
> (0 rows)
>
>
> again no match!  And to show that the above queries were correct:
>
>
> emystery=> select 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; file://\\\\ntserver-ps\\publicsw\\nt\\ie6\\ins\\active.ins)';
>                                                ?column?
> ------------------------------------------------------------------------------------------------------
>  Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; file://\\ntserver-ps\publicsw\nt\ie6\ins\active.ins)
> (1 row)
>
> emystery=> select 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; file://\134\134ntserver-ps\134publicsw\134nt\134ie6\134ins\134active.ins)';
>                                                ?column?
> ------------------------------------------------------------------------------------------------------
>  Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; file://\\ntserver-ps\publicsw\nt\ie6\ins\active.ins)
> (1 row)
>
>
>
> They are identical!  I can't work out what's going on!  Please, if anyone
> can see what's wrong it'll stop me careering into my early grave!
>
> Thanks
>
> Shane




[GENERAL] weird quote bug

2003-06-30 Thread Shane Wright
Hi

This is really driving me silly - I can't work it out.  Can anyone see what
I'm doing that's stupid and causing this not to match?


This shows that the row exists in the table:


emystery=> select aid,useragent from useragent where useragent like '%ntserver-ps%';
    aid    |                                       useragent
-----------+------------------------------------------------------------------------------------------------------
 875741007 | Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; file://\\ntserver-ps\publicsw\nt\ie6\ins\active.ins)
(1 row)


this tries to select it (note that I've escaped the backslashes):


emystery=> select * from useragent where useragent ilike 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; file://\\\\ntserver-ps\\publicsw\\nt\\ie6\\ins\\active.ins)';
 aid | useragent | date_added | data_browser | data_version | data_os
-----+-----------+------------+--------------+--------------+---------
(0 rows)


no match!  This one encodes the backslashes (\xxx octal for ASCII value) in a
different way:


emystery=> select * from useragent where useragent ilike 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; file://\134\134ntserver-ps\134publicsw\134nt\134ie6\134ins\134active.ins)';
 aid | useragent | date_added | data_browser | data_version | data_os
-----+-----------+------------+--------------+--------------+---------
(0 rows)


again no match!  And to show that the above queries were correct:


emystery=> select 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; file://\\\\ntserver-ps\\publicsw\\nt\\ie6\\ins\\active.ins)';
                                               ?column?
------------------------------------------------------------------------------------------------------
 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; file://\\ntserver-ps\publicsw\nt\ie6\ins\active.ins)
(1 row)

emystery=> select 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; file://\134\134ntserver-ps\134publicsw\134nt\134ie6\134ins\134active.ins)';
                                               ?column?
------------------------------------------------------------------------------------------------------
 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; file://\\ntserver-ps\publicsw\nt\ie6\ins\active.ins)
(1 row)



They are identical!  I can't work out what's going on!  Please, if anyone can
see what's wrong it'll stop me careering into my early grave!

Thanks

Shane




[GENERAL] backend stalls

2001-09-27 Thread Shane Wright

Hi

I'm tearing my hair out here, the backend seems to randomly decide to stall 
and not serve any more queries (processes are listed as waiting).

It happens on all sorts of queries (including CREATE TABLE and CREATE INDEX).

I have completely uninstalled and deleted the whole installation and
reinstalled - only to have it do it again midway through entering the SQL to
rebuild my database (not a big one, ~25 tables, and no rows to speak of -
haven't got that far yet).

I was thinking corrupted tables - or corrupt system catalog, but even 
vacuumdb stalls as well...

Any help would be appreciated; I have so much work to do and I can't do
anything without the database working :(

My install is Postgres 7.0.3-8 from the RedHat RPMs running on RH7.1

I know I haven't given much specific information - I don't really know where
to start.  What it's doing at this very minute is hanging on an insert into a
freshly created table on an installation that's only existed for 30 minutes
and only just had all the tables created.

Thanks

Shane
