Re: [GENERAL] query performance, thought it was timestamps, maybe just table size?

2012-12-10 Thread Henry Drexler
On Sun, Dec 9, 2012 at 7:16 PM, Jeff Janes  wrote:

> The obvious difference is that this one finds all 5 buffers it needs
> in buffers already, while the first one had to read them in.  So this
> supports the idea that your data has simply grown too large for your
> RAM.
>
> Cheers,
>
> Jeff
>


Jeff, thanks for that explanation and for taking the time to expose me to
explain analyze.  I am currently reading through the docs so I can use them
and understand them.

Thank you again for all of your help.


Re: [GENERAL] query performance, thought it was timestamps, maybe just table size?

2012-12-09 Thread Jeff Janes
On Mon, Dec 3, 2012 at 5:56 AM, Henry Drexler  wrote:
> On Sun, Dec 2, 2012 at 12:44 AM, Jeff Janes  wrote:
>>
>> Could you do it for the recursive
>> SQL (the one inside the function) like you had previously done for the
>> regular explain?
>>
>> Cheers,
>>
>> Jeff
>
>
> Here they are:
>
> for the 65 million row table:
> "Index Scan using ctn_source on massive  (cost=0.00..189.38 rows=1 width=28)
> (actual time=85.802..85.806 rows=1 loops=1)"
> "  Index Cond: (ctn = 1302050134::bigint)"
> "  Filter: (dateof <@ '["2012-07-03 14:00:00","2012-07-10
> 14:00:00"]'::tsrange)"
> "  Buffers: shared read=6"
> "Total runtime: 85.891 ms"

If you execute it repeatedly (so that the data is in buffers the next
time) does it then get faster?

> for the 30 million row table:
> "Index Scan using ctn_dateof on massive  (cost=0.00..80.24 rows=1 width=24)
> (actual time=0.018..0.020 rows=1 loops=1)"
> "  Index Cond: (ctn = 1302050134::bigint)"
> "  Filter: (dateof <@ '[2012-07-03,2012-07-11)'::daterange)"
> "  Buffers: shared hit=5"
> "Total runtime: 0.046 ms"

The obvious difference is that this one finds all 5 buffers it needs
in buffers already, while the first one had to read them in.  So this
supports the idea that your data has simply grown too large for your
RAM.

Cheers,

Jeff




Re: [GENERAL] query performance, thought it was timestamps, maybe just table size?

2012-12-03 Thread Henry Drexler
On Sun, Dec 2, 2012 at 12:44 AM, Jeff Janes  wrote:

> Could you do it for the recursive
> SQL (the one inside the function) like you had previously done for the
> regular explain?
>
> Cheers,
>
> Jeff
>

Here they are:

for the 65 million row table:
"Index Scan using ctn_source on massive  (cost=0.00..189.38 rows=1
width=28) (actual time=85.802..85.806 rows=1 loops=1)"
"  Index Cond: (customer_id = ::bigint)"
"  Filter: (dateof <@ '["2012-07-03 14:00:00","2012-07-10
14:00:00"]'::tsrange)"
"  Buffers: shared read=6"
"Total runtime: 85.891 ms"



for the 30 million row table:
"Index Scan using ctn_dateof on massive  (cost=0.00..80.24 rows=1 width=24)
(actual time=0.018..0.020 rows=1 loops=1)"
"  Index Cond: (customer_id = ::bigint)"
"  Filter: (dateof <@ '[2012-07-03,2012-07-11)'::
daterange)"
"  Buffers: shared hit=5"
"Total runtime: 0.046 ms"


Thank you.


On Sun, Dec 2, 2012 at 12:44 AM, Jeff Janes  wrote:

> On Fri, Nov 30, 2012 at 12:22 PM, Henry Drexler 
> wrote:
> > On Fri, Nov 30, 2012 at 1:42 PM, Jeff Janes 
> wrote:
> >>
> >> Can you report the EXPLAIN (ANALYZE, BUFFERS) instead?
> >
> >
> > Thanks, here they are:
> >
> > for the approx 65 million row approx 50 min version:
> >
> > EXPLAIN (ANALYZE, BUFFERS)
> > select
> > massive_expansion(ctn,the_range)
> > from
> > critical_visitors;
> >
> > "Seq Scan on critical_visitors  (cost=0.00..168722.28 rows=628778
> width=40)
> > (actual time=0.655..3003921.066 rows=628778 loops=1)"
> > "  Buffers: shared hit=4513040 read=1591722 dirtied=5234 written=10"
> > "Total runtime: 3004478.053 ms"
> >
> >
> > for the approx 30 million row approx 4 min version:
> >
> > EXPLAIN (ANALYZE, BUFFERS)
> > select
> > massive_expansion(ctn,the_range)
> > from
> > critical_visitors;
> >
> > "Seq Scan on critical_visitors  (cost=0.00..746587.90 rows=2782315
> width=40)
> > (actual time=393.001..277108.379 rows=2782315 loops=1)"
> > "  Buffers: shared hit=26370078 read=400301 dirtied=33772 written=1030"
> > "Total runtime: 278988.544 ms"
> >
>
> I can't much sense out of those.  Could you do it for the recursive
> SQL (the one inside the function) like you had previously done for the
> regular explain?
>
> Cheers,
>
> Jeff
>


Re: [GENERAL] query performance, thought it was timestamps, maybe just table size?

2012-12-03 Thread Henry Drexler
On Sun, Dec 2, 2012 at 12:44 AM, Jeff Janes  wrote:

> Could you do it for the recursive
> SQL (the one inside the function) like you had previously done for the
> regular explain?
>
> Cheers,
>
> Jeff
>

Here they are:

for the 65 million row table:
"Index Scan using ctn_source on massive  (cost=0.00..189.38 rows=1
width=28) (actual time=85.802..85.806 rows=1 loops=1)"
"  Index Cond: (ctn = 1302050134::bigint)"
"  Filter: (dateof <@ '["2012-07-03 14:00:00","2012-07-10
14:00:00"]'::tsrange)"
"  Buffers: shared read=6"
"Total runtime: 85.891 ms"



for the 30 million row table:
"Index Scan using ctn_dateof on massive  (cost=0.00..80.24 rows=1 width=24)
(actual time=0.018..0.020 rows=1 loops=1)"
"  Index Cond: (ctn = 1302050134::bigint)"
"  Filter: (dateof <@ '[2012-07-03,2012-07-11)'::daterange)"
"  Buffers: shared hit=5"
"Total runtime: 0.046 ms"


Thank you.


Re: [GENERAL] query performance, thought it was timestamps, maybe just table size?

2012-12-01 Thread Jeff Janes
On Fri, Nov 30, 2012 at 12:22 PM, Henry Drexler  wrote:
> On Fri, Nov 30, 2012 at 1:42 PM, Jeff Janes  wrote:
>>
>> Can you report the EXPLAIN (ANALYZE, BUFFERS) instead?
>
>
> Thanks, here they are:
>
> for the approx 65 million row approx 50 min version:
>
> EXPLAIN (ANALYZE, BUFFERS)
> select
> massive_expansion(ctn,the_range)
> from
> critical_visitors;
>
> "Seq Scan on critical_visitors  (cost=0.00..168722.28 rows=628778 width=40)
> (actual time=0.655..3003921.066 rows=628778 loops=1)"
> "  Buffers: shared hit=4513040 read=1591722 dirtied=5234 written=10"
> "Total runtime: 3004478.053 ms"
>
>
> for the approx 30 million row approx 4 min version:
>
> EXPLAIN (ANALYZE, BUFFERS)
> select
> massive_expansion(ctn,the_range)
> from
> critical_visitors;
>
> "Seq Scan on critical_visitors  (cost=0.00..746587.90 rows=2782315 width=40)
> (actual time=393.001..277108.379 rows=2782315 loops=1)"
> "  Buffers: shared hit=26370078 read=400301 dirtied=33772 written=1030"
> "Total runtime: 278988.544 ms"
>

I can't much sense out of those.  Could you do it for the recursive
SQL (the one inside the function) like you had previously done for the
regular explain?

Cheers,

Jeff




Re: [GENERAL] query performance, thought it was timestamps, maybe just table size?

2012-11-30 Thread Henry Drexler
On Fri, Nov 30, 2012 at 1:23 PM, Kevin Grittner  wrote:

> Henry Drexler wrote:
>
> > why would the query time go from 4 minutes to over 50, for an
> > increase in table rows from 30 million to 65 million?
>
> Did the active (frequently referenced) portion of the database go
> from something which fit in cache to something which didn't? Did
> any hash table or sort nodes in plans go from fitting in work_mem
> to spilling to disk? Did any indexes need an extra level in the
> tree? Did any plans change based on size to something which is less
> than optimal, suggesting a need to tune the cost factors?
>
> -Kevin
>

Thank you for the list - I will research those in the manual.


Re: [GENERAL] query performance, thought it was timestamps, maybe just table size?

2012-11-30 Thread Henry Drexler
On Fri, Nov 30, 2012 at 1:42 PM, Jeff Janes  wrote:

> Can you report the EXPLAIN (ANALYZE, BUFFERS) instead?


Thanks, here they are:

for the approx 65 million row approx 50 min version:

EXPLAIN (ANALYZE, BUFFERS)
select
massive_expansion(ctn,the_range)
from
critical_visitors;

"Seq Scan on critical_visitors  (cost=0.00..168722.28 rows=628778 width=40)
(actual time=0.655..3003921.066 rows=628778 loops=1)"
"  Buffers: shared hit=4513040 read=1591722 dirtied=5234 written=10"
"Total runtime: 3004478.053 ms"


for the approx 30 million row approx 4 min version:

EXPLAIN (ANALYZE, BUFFERS)
select
massive_expansion(ctn,the_range)
from
critical_visitors;

"Seq Scan on critical_visitors  (cost=0.00..746587.90 rows=2782315
width=40) (actual time=393.001..277108.379 rows=2782315 loops=1)"
"  Buffers: shared hit=26370078 read=400301 dirtied=33772 written=1030"
"Total runtime: 278988.544 ms"


Re: [GENERAL] query performance, thought it was timestamps, maybe just table size?

2012-11-30 Thread Jeff Janes
On Fri, Nov 30, 2012 at 5:22 AM, Henry Drexler  wrote:
> Hello, and thank you in advance.
>
>
> Beyond the date vs timestamp troubleshooting I did, I am not sure what else
> to look for. I know the increase in rows will have some effect, but I just
> don't think the query should go from 4 minutes to over 50.

If the doubling of the size causes it to exceed the cache, when before
it did not, that could easily explain it.

...
> and
> massive.dateof <@ '(2012-07-22 17:00:00,2012-07-29 17:00:00]'::tsrange;

I don't think the <@ can use the btree index, but if you wrote it as a
"BETWEEN" it could.


> With a query plan of:
> "Index Scan using customer_id_sourcee on massive_m  (cost=0.00..113.98
> rows=1 width=28)"

Can you report the EXPLAIN (ANALYZE, BUFFERS) instead?

Cheers,

Jeff




Re: [GENERAL] query performance, thought it was timestamps, maybe just table size?

2012-11-30 Thread Kevin Grittner
Henry Drexler wrote:

> why would the query time go from 4 minutes to over 50, for an
> increase in table rows from 30 million to 65 million?

Did the active (frequently referenced) portion of the database go
from something which fit in cache to something which didn't? Did
any hash table or sort nodes in plans go from fitting in work_mem
to spilling to disk? Did any indexes need an extra level in the
tree? Did any plans change based on size to something which is less
than optimal, suggesting a need to tune the cost factors?
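
A few quick checks along those lines, as a rough sketch (the relation name
is the one from this thread; adapt as needed):

SELECT pg_size_pretty(pg_total_relation_size('massive'));  -- has the working set outgrown RAM/cache?
SHOW shared_buffers;
SHOW work_mem;
-- Then run the slow statement under EXPLAIN ANALYZE and look for sorts or
-- hashes that spill to disk, and for plan shapes that changed as the table grew.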

-Kevin




Re: [GENERAL] query performance, thought it was timestamps, maybe just table size?

2012-11-30 Thread Henry Drexler
On Fri, Nov 30, 2012 at 8:22 AM, Henry Drexler  wrote:

> Hello, and thank you in advance.
>
>
> Beyond the date vs timestamp troubleshooting I did,
>

I realize this could be confusing - since I ruled out that difference, the
real question is - given this setup, why would the query time go from 4
minutes to over 50, for an increase in table rows from 30 million to 65
million?


Re: [GENERAL] Query performance help with 'shadow table' approach.

2011-09-14 Thread Alban Hertroys
On 14 Sep 2011, at 20:45, Brian Fehrle wrote:

>> That is only about 1/30th of your table. I don't think a seqscan makes sense 
>> here unless your data is distributed badly.
>> 
> Yeah the more I look at it, the more I think it's postgres _thinking_ that 
> it's faster to do a sequential scan. I'll be playing with the random_page_cost 
> that Ondrej suggested, and schedule a time where I can do some explain 
> analyze runs (production server and all).

Before you do that, turn off seqscans (there's a session option for that) and 
see if index scans are actually faster.
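
The session option in question is enable_seqscan; a minimal sketch (the
query below is only a stand-in for the real one, with hypothetical names):

SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT max(pkey_sid) FROM main_table GROUP BY column1, column2;
SET enable_seqscan = on;  -- restore the default for the rest of the session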

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.




Re: [GENERAL] Query performance help with 'shadow table' approach.

2011-09-14 Thread Brian Fehrle

On 09/14/2011 01:10 AM, Alban Hertroys wrote:

On 13 Sep 2011, at 23:44, Brian Fehrle wrote:


These queries basically do a 'select max(primary_key_column) from table group by 
column1, column2'. Because of the group by, this results in a sequential 
scan of the entire table, which proves to be costly.

That seems to suggest a row where the primary key that has the max value is "special" in 
some way. Making them more easily distinguishable from "normal" rows seems like a good 
idea here.


Since the table has a ton of columns, I set up a smaller table that will house 
a copy of some of the data that the query uses, the Primary Key column, and the 
two columns I do my 'group by' on.

That's one way to distinguish these special rows from the rest. You could also 
mark them as special using an extra column and/or create an expression-based 
index over just those rows.

However, especially with the below section in mind, it would appear your data 
could be normalised a bit more (your splitting off that shadow table is a step 
in doing so, in fact).

I'm also wondering, does your primary key have actual meaning? It would appear 
to just indicate the order in which the records were created (I'm assuming it's 
a serial type surrogate PK, and not a natural one).


It isn't a serial type, and the id increment is handled by the application.

This shadow table will also only contain one row for every column1 and column2 
combination (due to the group by), and for those rows, will have the max of the 
primary key. Even with this, the 'shadow' table will have about 14 million 
rows, compared to the 15 million in the main table.

Don't (column1, column2) make up a key then? I get the feeling you should split 
your table in 3 sections:
Table 1: main lookup (PK: pkey_sid)
Table 2: Variation lookup (PK: (column1, column2), FK: pkey_sid)
Table 3: Data (FK: the above)

(column1, column2) could possibly have multiple occurrences of the 
combination, such as 4 rows where column1 = 54 and column2 = 86. In 
these cases with multiple rows, I just want the one with the 
max(primary_key).


I'm looking into options like this. At this moment changing the base 
table structure is out of the question, but adding tables alongside 
to try to speed things up is ok. I'm trying not to cause changes in the 
application.

So the issue here comes in retrieving the needed data from my main table. The 
resulting row count is estimated to be 409,600, and retrieving the primary 
keys that are associated with those rows is actually really easy. However, 
when we take those 409,600 rows back to the main table to retrieve the other 
columns I need, the planner is just doing a sequential scan, as it's most likely 
going to be faster than hitting the index and then retrieving the columns I need 
for all 400K+ rows.

Is that estimate accurate? If not, see Ondrej's suggestions.

That is only about 1/30th of your table. I don't think a seqscan makes sense 
here unless your data is distributed badly.

Yeah the more I look at it, the more I think it's postgres _thinking_ 
that it's faster to do a sequential scan. I'll be playing with the 
random_page_cost that Ondrej suggested, and schedule a time where I can 
do some explain analyze runs (production server and all).

Things to note:
1. If I reduce my where clause's range, then the sequential scan turns into an 
index scan, but sadly this can't always be done.

Does it make sense to CLUSTER your data in some sense? That would improve the 
data distribution issue and would probably push the threshold for a seqscan up 
some.

Cheers,

Alban Hertroys


Thanks, I'll be reporting back in with my next findings.

- Brian F

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.






Re: [GENERAL] Query performance help with 'shadow table' approach.

2011-09-14 Thread Alban Hertroys
On 13 Sep 2011, at 23:44, Brian Fehrle wrote:

> These queries basically do a 'select max(primary_key_column) from table group 
> by column1, column2'. Because of the group by, this results in a 
> sequential scan of the entire table, which proves to be costly.

That seems to suggest a row where the primary key that has the max value is 
"special" in some way. Making them more easily distinguishable from "normal" 
rows seems like a good idea here.

> Since the table has a ton of columns, I set up a smaller table that will 
> house a copy of some of the data that the query uses, the Primary Key column, 
> and the two columns I do my 'group by' on.

That's one way to distinguish these special rows from the rest. You could also 
mark them as special using an extra column and/or create an expression-based 
index over just those rows.
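
For instance, a partial index over just the flagged rows might look like
this (the flag column and names are hypothetical, since the real schema
isn't shown):

-- assumes an is_latest flag column maintained by the application
CREATE INDEX main_table_latest_idx ON main_table (column1, column2) WHERE is_latest;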

However, especially with the below section in mind, it would appear your data 
could be normalised a bit more (your splitting off that shadow table is a step 
in doing so, in fact). 

I'm also wondering, does your primary key have actual meaning? It would appear 
to just indicate the order in which the records were created (I'm assuming it's 
a serial type surrogate PK, and not a natural one). 

> This shadow table will also only contain one row for every column1 and 
> column2 combination (due to the group by), and for those rows, will have the 
> max of the primary key. Even with this, the 'shadow' table will have about 14 
> million rows, compared to the 15 million in the main table.

Don't (column1, column2) make up a key then? I get the feeling you should split 
your table in 3 sections:
Table 1: main lookup (PK: pkey_sid)
Table 2: Variation lookup (PK: (column1, column2), FK: pkey_sid) 
Table 3: Data (FK: the above)

> So the issue here comes in retrieving the needed data from my main table. The 
> resulting row count is estimated to be 409,600, and retrieving the primary 
> keys that are associated with those rows is actually really easy. However, 
> when we take those 409,600 rows back to the main table to retrieve the other 
> columns I need, the planner is just doing a sequential scan, as it's most 
> likely going to be faster than hitting the index and then retrieving the 
> columns I need for all 400K+ rows.

Is that estimate accurate? If not, see Ondrej's suggestions.

That is only about 1/30th of your table. I don't think a seqscan makes sense 
here unless your data is distributed badly.

> Things to note:
> 1. If I reduce my where clause's range, then the sequential scan turns into 
> an index scan, but sadly this can't always be done.

Does it make sense to CLUSTER your data in some sense? That would improve the 
data distribution issue and would probably push the threshold for a seqscan up 
some.
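
A sketch of that, assuming an index that matches the grouping columns
(names are hypothetical):

CLUSTER main_table USING main_table_col1_col2_idx;
ANALYZE main_table;  -- refresh statistics after the physical reorder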

Cheers,

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.




Re: [GENERAL] Query performance help with 'shadow table' approach.

2011-09-13 Thread Ondrej Ivanič
Hi,

On 14 September 2011 07:44, Brian Fehrle  wrote:
> 2. I have appropriate indexes where they need to be. The issue is in the
> query planner not using them due to it (i assume) just being faster to scan
> the whole table when the data set it needs is as large as it is.

Try reducing random_page_cost to 2, which biases the planner towards
index scans (set random_page_cost = 2 before the query; assuming that
default seq_page_cost and random_page_cost are 1 and 4 respectively)
and run "explain analyze". Sometimes it is worth disabling nested loop
joins (set enable_nestloop = off). Finally you can increase
default_statistics_target (or ALTER TABLE ... SET STATISTICS) to 100 (8.4
has this as the default) on selected columns or the table (and run analyze
on that table).
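
Putting those suggestions together, a session-level sketch might look like
this (table, column and query are only placeholders):

SET random_page_cost = 2;          -- default is 4; lower values favour index scans
SET enable_nestloop = off;         -- optionally rule out nested-loop joins
ALTER TABLE main_table ALTER COLUMN column1 SET STATISTICS 100;
ANALYZE main_table;                -- refresh planner statistics for the new target
EXPLAIN ANALYZE SELECT max(pkey_sid) FROM main_table GROUP BY column1, column2;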

-- 
Ondrej Ivanic
(ondrej.iva...@gmail.com)



Re: [GENERAL] query performance

2008-04-24 Thread Cox, Brian

> [ scratches head... ]  Your example command works as expected for me.

> [ rereads thread... ]  Oh, you're running 8.1.  I think you have to
> do the command as a superuser to get that output in 8.1.  Later versions
> are less picky.

Yes, with the right incantations, the FSM information does appear. The perils
of being a bit behind the times, I guess.

Thanks for your help,
Brian






Re: [GENERAL] query performance

2008-04-24 Thread Tom Lane
Brian Cox <[EMAIL PROTECTED]> writes:
> I've already posted the tail of this output previously.
> I conclude that these lines are not in this file. Where
> did they go?

[ scratches head... ]  Your example command works as expected for me.

[ rereads thread... ]  Oh, you're running 8.1.  I think you have to
do the command as a superuser to get that output in 8.1.  Later versions
are less picky.

regards, tom lane



Re: [GENERAL] query performance

2008-04-24 Thread Brian Cox

Tom Lane [EMAIL PROTECTED] wrote:


At the very end ... you're looking for these messages:

ereport(elevel,
        (errmsg("free space map contains %d pages in %d relations",
                storedPages, numRels),
         errdetail("A total of %.0f page slots are in use (including overhead).\n"
                   "%.0f page slots are required to track all free space.\n"
                   "Current limits are:  %d page slots, %d relations, using %.0f kB.",
                   Min(needed, MaxFSMPages),
                   needed,
                   MaxFSMPages, MaxFSMRelations,
                   (double) FreeSpaceShmemSize() / 1024.0)));

if (numRels == MaxFSMRelations)
    ereport(elevel,
            (errmsg("max_fsm_relations(%d) equals the number of relations checked",
                    MaxFSMRelations),
             errhint("You have at least %d relations.  "
                     "Consider increasing the configuration parameter \"max_fsm_relations\".",
                     numRels)));
else if (needed > MaxFSMPages)
    ereport(elevel,
            (errmsg("number of page slots needed (%.0f) exceeds max_fsm_pages (%d)",
                    needed, MaxFSMPages),
             errhint("Consider increasing the configuration parameter \"max_fsm_pages\" "
                     "to a value over %.0f.", needed)));

regards, tom lane



The following greps of the vacuum verbose output return no lines:

fgrep -i fsm
fgrep 'free space'
fgrep 'page slots'
fgrep 'relations'

I've already posted the tail of this output previously.
I conclude that these lines are not in this file. Where
did they go?

Thanks,
Brian



Re: [GENERAL] query performance

2008-04-24 Thread Tom Lane
Brian Cox <[EMAIL PROTECTED]> writes:
> Any hints as to where the FSM info is in this file?

At the very end ... you're looking for these messages:

ereport(elevel,
        (errmsg("free space map contains %d pages in %d relations",
                storedPages, numRels),
         errdetail("A total of %.0f page slots are in use (including overhead).\n"
                   "%.0f page slots are required to track all free space.\n"
                   "Current limits are:  %d page slots, %d relations, using %.0f kB.",
                   Min(needed, MaxFSMPages),
                   needed,
                   MaxFSMPages, MaxFSMRelations,
                   (double) FreeSpaceShmemSize() / 1024.0)));

if (numRels == MaxFSMRelations)
    ereport(elevel,
            (errmsg("max_fsm_relations(%d) equals the number of relations checked",
                    MaxFSMRelations),
             errhint("You have at least %d relations.  "
                     "Consider increasing the configuration parameter \"max_fsm_relations\".",
                     numRels)));
else if (needed > MaxFSMPages)
    ereport(elevel,
            (errmsg("number of page slots needed (%.0f) exceeds max_fsm_pages (%d)",
                    needed, MaxFSMPages),
             errhint("Consider increasing the configuration parameter \"max_fsm_pages\" "
                     "to a value over %.0f.", needed)));

regards, tom lane



Re: [GENERAL] query performance

2008-04-24 Thread Brian Cox

Scott Marlowe [EMAIL PROTECTED] wrote:

There's bits spread throughout the file, but the summary is at the bottom.


Here's a tail of the 'vacuum verbose' output:

INFO:  vacuuming "pg_toast.pg_toast_797619965"
INFO:  index "pg_toast_797619965_index" now contains 0 row versions in 1 
pages

DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_797619965": found 0 removable, 0 nonremovable row 
versions in 0 pages

DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

I don't see anything that looks like a "summary".

Thanks,
Brian






Re: [GENERAL] query performance

2008-04-24 Thread Scott Marlowe
On Thu, Apr 24, 2008 at 6:23 PM, Brian Cox <[EMAIL PROTECTED]> wrote:
> Tom Lane [EMAIL PROTECTED] wrote:
>
> > You need a database-wide vacuum verbose (not just 1 table) to get that
> > output ...
> >
>
>  I ran:
>
>  > pgsql -U admin -d cemdb -c 'vacuum verbose' > /tmp/pgvac.log 2>&1
>
>  the output file has 2593 lines and, while I haven't looked at all of them,
> a:
>
>  > fgrep -i fsm /tmp/pgvac.log
>
>  returns no lines.
>
>  Any hints as to where the FSM info is in this file?

There's bits spread throughout the file, but the summary is at the bottom.



Re: [GENERAL] query performance

2008-04-24 Thread Brian Cox

Tom Lane [EMAIL PROTECTED] wrote:

You need a database-wide vacuum verbose (not just 1 table) to get that
output ...


I ran:

> pgsql -U admin -d cemdb -c 'vacuum verbose' > /tmp/pgvac.log 2>&1

the output file has 2593 lines and, while I haven't looked at all of 
them, a:


> fgrep -i fsm /tmp/pgvac.log

returns no lines.

Any hints as to where the FSM info is in this file?

Thanks,
Brian




Re: [GENERAL] query performance

2008-04-24 Thread Tom Lane
Brian Cox <[EMAIL PROTECTED]> writes:
> I read in another thread that vacuum verbose would tell me how much FSM 
> is needed, but I ran it and didn't see any output about this.

You need a database-wide vacuum verbose (not just 1 table) to get that
output ...

regards, tom lane



Re: [GENERAL] query performance

2008-04-24 Thread Brian Cox

Tom Lane [EMAIL PROTECTED] wrote:

I suspect that your FSM settings are too low, causing free space found
by VACUUM to be forgotten about.


I read in another thread that vacuum verbose would tell me how much FSM 
is needed, but I ran it and didn't see any output about this. What is 
the way to determine how much FSM is needed (other than waiting for hints 
in the log)?


  You might also need to consider

vacuuming more than once a day (there's a tradeoff between how often
you vacuum and how much FSM space you need).


What is the trade-off? Any way to predict how much more (presumably) FSM 
is needed if you vacuum more often?


Thanks,
Brian



Re: [GENERAL] query performance

2008-04-23 Thread Tom Lane
Brian Cox <[EMAIL PROTECTED]> writes:
> I have a largish (pg_dump output is 4G) database. The query:
> select count(*) from some-table
> was taking 120 secs to report that there were 151,000+ rows.
> This seemed very slow. This db gets vacuum'd regularly (at least once
> per day). I also did a manual 'vacuum analyze', but after it completed,
> the query ran no faster. However, after dumping the database and 
> recreating it from the backup, the same query takes 2 secs.

> Why the dramatic decrease?

Presumably, the table was really bloated (lots of unused space).

> Would 'vacuum full' have achieved the
> same performance improvements?

It would've compacted the table all right, but probably left the indexes
worse off.

> Is there anything else that needs to be done
> regularly to  prevent this performance degradation?

I suspect that your FSM settings are too low, causing free space found
by VACUUM to be forgotten about.  You might also need to consider
vacuuming more than once a day (there's a tradeoff between how often
you vacuum and how much FSM space you need).
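
On 8.x the current limits can be checked from psql (raising them means
editing postgresql.conf and restarting the server); a quick sketch:

SHOW max_fsm_pages;       -- compare with the "page slots needed" figure from a database-wide VACUUM VERBOSE
SHOW max_fsm_relations;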

regards, tom lane



Re: [GENERAL] query performance

2008-01-17 Thread Alban Hertroys

On Jan 14, 2008, at 3:58 AM, pepone.onrez wrote:

I have this query in a table with 150 thousand tuples and it takes  
too long


t_documentcontent._id AS _id
FROM t_documentcontent LIMIT 50 OFFSET 8



You want an ORDER BY there. Not only will it probably speed things  
up, without it there's no guaranteed order in the results returned.


As table records have no specific order and updates and inserts on  
that table take the first free position, you risk ending up showing  
some records twice in your set (same id on different 'pages') and  
missing others because you've already skipped past them when there's  
concurrent access.


There's still some risk for that if you order, but as you seem to  
order on a sequence-generated column, updates aren't a problem for  
you and inserts end up at the end anyway. But only if you order them.
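
For illustration, the paginated query from the original post with an
explicit ordering added (the offset value is only an example):

SELECT t_documentcontent._id AS _id
FROM t_documentcontent
ORDER BY _id
LIMIT 50 OFFSET 80000;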


Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.







Re: [GENERAL] query performance

2008-01-13 Thread Alex Turner
Oh - if you do this then make sure that you have the primary key index on
overview too.

Alex

On Jan 14, 2008 12:53 AM, Alex Turner <[EMAIL PROTECTED]> wrote:

> If you combine it with Tom Lane's suggestion - it will go even better,
> something like:
>
> select * from t_documentcontent where _id in (select _id from overview
> where _id>x order by _id limit 50);
>
> Alex
>
>
> On Jan 13, 2008 11:59 PM, pepone. onrez <[EMAIL PROTECTED]> wrote:
>
> >
> > Thanks Alex
> >
> > I tested your solution and it is really much faster.
> >
> > Nested Loop  (cost=1743.31..2044.58 rows=50 width=908) (actual time=
> > 101.695..106.178 rows=50 loops=1)
> >   ->  HashAggregate  (cost=1743.31..1743.31 rows=50 width=108) (actual
> > time=101.509..101.567 rows=50 loops=1)
> > ->  Subquery Scan "IN_subquery"  (cost=1741.60..1743.19 rows=50
> > width=108) (actual time=101.327..101.456 rows=50 loops=1)
> >   ->  Limit  (cost=1741.60..1742.69 rows=50 width=108)
> > (actual time=101.313..101.383 rows=50 loops=1)
> > ->  Seq Scan on overview  
> > (cost=0.00..3283.07rows=150807 width=108) (actual time=
> > 0.036..72.249 rows=80050 loops=1)
> >   ->  Index Scan using i_documentcontent_id on t_documentcontent  (cost=
> > 0.00..6.01 rows=1 width=908) (actual time=0.083..0.085 rows=1 loops=50)
> > Index Cond: ((t_documentcontent._id)::text =
> > ("outer"._id)::text)
> > Total runtime: 106.323 ms
> >
> > I now need to see what triggers I need to add, and test the insertions.
> > Thanks again
> >
> >
> > On Jan 14, 2008 5:54 AM, Alex Turner < [EMAIL PROTECTED]> wrote:
> >
> > > Here is a table I threw together to demonstrate the approximate speed
> > > of a materialized view in this case:
> > >
> > > trend=# explain analyze select property_id from overview order by
> > > property_id limit 50 offset 5;
> > >  QUERY
> > > PLAN
> > >
> > > 
> > >  Limit  (cost=19112.75..19112.88 rows=50 width=8) (actual time=
> > > 446.048..446.125 rows=50 loops=1)
> > >->  Sort  (cost=18987.75..19400.49 rows=165094 width=8) (actual
> > > time=384.788..424.433 rows=50050 loops=1)
> > >  Sort Key: property_id
> > >  ->  Seq Scan on overview  (cost=0.00..2501.94 rows=165094
> > > width=8) (actual time= 0.012..88.691 rows=173409 loops=1)
> > >  Total runtime: 447.578 ms
> > > (5 rows)
> > >
> > > trend=# select count(*) from overview;
> > >  count
> > > 
> > >  173409
> > > (1 row)
> > >
> > > trend=#
> > >
> > > It's not great - but it's better than 47 seconds (The machine I'm
> > > running it on is far from big iron, so these results should be fairly
> > > typical for any modern x86 box - also this materialized view is almost
> > > certainly in RAM, and therefore IO speed is irrelevant).
> > >
> > > Tom lane has already suggested another approach, whereby you order
> > > your results, then select the next 10 from the set where the id is greater
> > > than the greatest of the last one:
> > >
> > > select id from overview order by id limit 50;
> > >
> > > x = get row['id']  // for row 1
> > > do something
> > > x=get row['id']  // for row 2
> > > do something
> > > ...
> > > x=get row['id']  // for row 50
> > >
> > > select id from overview where id>x order by id limit 50.
> > >
> > > The order by is relevant and infact imperative, because you must order
> > > your results somehow, otherwise your pagination will produce different
> > > results each time you try it as database updates will affect the order the
> > > rows come back by default without an order by clause.
> > >
> > > Let me say that again to be clear:  The order rows come back if you
> > > don't specify an order by can change!  so pulling rows without an order by
> > > is a REALLY bad idea.  This will break your pagination if a database 
> > > update
> > > happens between someone viewing a page and hitting next to view the next
> > > page.
> > >
> > > Alex
> > >
> > >
> > > On Jan 13, 2008 11:43 PM, Alex Turner <[EMAIL PROTECTED] > wrote:
> > >
> > > > If you have to access the data this way (with no where clause at all
> > > > - which sometimes you do) then I have already provided a solution that 
> > > > will
> > > > work reasonably well.  If you create what is essentially a materialized 
> > > > view
> > > > of just the id field, the sequence scan will return much fewer pages 
> > > > than
> > > > when you do it on the main table.  Then you join it to the indexed main
> > > > table, and page in just the rows you need.  Voila - much faster result. 
> > > >  Of
> > > > course we haven't really talked about how that will affect insert speed 
> > > > and
> > > > delete speed if you trigger then up, but you haven't really talked 
> > > > about any
> > > > requirements there.
> > > >
> > > > Alex
> > > >
> > > >
> > > > On Jan 13, 2008 11:27 PM, pepone

Re: [GENERAL] query performance

2008-01-13 Thread Alex Turner
If you combine it with Tom Lane's suggestion - it will go even better,
something like:

select * from t_documentcontent where _id in (select _id from overview where
_id>x order by _id limit 50);

Alex

On Jan 13, 2008 11:59 PM, pepone. onrez <[EMAIL PROTECTED]> wrote:

>
> Thanks Alex
>
> I tested your solution and it is really much faster.
>
> Nested Loop  (cost=1743.31..2044.58 rows=50 width=908) (actual time=
> 101.695..106.178 rows=50 loops=1)
>   ->  HashAggregate  (cost=1743.31..1743.31 rows=50 width=108) (actual
> time=101.509..101.567 rows=50 loops=1)
> ->  Subquery Scan "IN_subquery"  (cost=1741.60..1743.19 rows=50
> width=108) (actual time=101.327..101.456 rows=50 loops=1)
>   ->  Limit  (cost=1741.60..1742.69 rows=50 width=108) (actual
> time=101.313..101.383 rows=50 loops=1)
> ->  Seq Scan on overview  (cost=0.00..3283.07rows=150807 
> width=108) (actual time=
> 0.036..72.249 rows=80050 loops=1)
>   ->  Index Scan using i_documentcontent_id on t_documentcontent  (cost=
> 0.00..6.01 rows=1 width=908) (actual time=0.083..0.085 rows=1 loops=50)
> Index Cond: ((t_documentcontent._id)::text = ("outer"._id)::text)
> Total runtime: 106.323 ms
>
> I now need to see what triggers I need to add, and test the insertions.
> Thanks again
>
>
> On Jan 14, 2008 5:54 AM, Alex Turner <[EMAIL PROTECTED]> wrote:
>
> > Here is a table I threw together to demonstrate the approximate speed of
> > a materialized view in this case:
> >
> > trend=# explain analyze select property_id from overview order by
> > property_id limit 50 offset 5;
> >  QUERY
> > PLAN
> >
> > 
> >  Limit  (cost=19112.75..19112.88 rows=50 width=8) (actual time=
> > 446.048..446.125 rows=50 loops=1)
> >->  Sort  (cost=18987.75..19400.49 rows=165094 width=8) (actual time=
> > 384.788..424.433 rows=50050 loops=1)
> >  Sort Key: property_id
> >  ->  Seq Scan on overview  (cost=0.00..2501.94 rows=165094
> > width=8) (actual time= 0.012..88.691 rows=173409 loops=1)
> >  Total runtime: 447.578 ms
> > (5 rows)
> >
> > trend=# select count(*) from overview;
> >  count
> > 
> >  173409
> > (1 row)
> >
> > trend=#
> >
> > It's not great - but it's better than 47 seconds (The machine I'm
> > running it on is far from big iron, so these results should be fairly
> > typical for any modern x86 box - also this materialized view is almost
> > certainly in RAM, and therefore IO speed is irrelevant).
> >
> > Tom lane has already suggested another approach, whereby you order your
> > results, then select the next 10 from the set where the id is greater than
> > the greatest of the last one:
> >
> > select id from overview order by id limit 50;
> >
> > x = get row['id']  // for row 1
> > do something
> > x=get row['id']  // for row 2
> > do something
> > ...
> > x=get row['id']  // for row 50
> >
> > select id from overview where id>x order by id limit 50.
> >
> > The order by is relevant and infact imperative, because you must order
> > your results somehow, otherwise your pagination will produce different
> > results each time you try it as database updates will affect the order the
> > rows come back by default without an order by clause.
> >
> > Let me say that again to be clear:  The order rows come back if you
> > don't specify an order by can change!  so pulling rows without an order by
> > is a REALLY bad idea.  This will break your pagination if a database update
> > happens between someone viewing a page and hitting next to view the next
> > page.
> >
> > Alex
> >
> >
> > On Jan 13, 2008 11:43 PM, Alex Turner <[EMAIL PROTECTED] > wrote:
> >
> > > If you have to access the data this way (with no where clause at all -
> > > which sometimes you do) then I have already provided a solution that will
> > > work reasonably well.  If you create what is essentially a materialized 
> > > view
> > > of just the id field, the sequence scan will return much fewer pages than
> > > when you do it on the main table.  Then you join it to the indexed main
> > > table, and page in just the rows you need.  Voila - much faster result.  
> > > Of
> > > course we haven't really talked about how that will affect insert speed 
> > > and
> > > delete speed if you trigger then up, but you haven't really talked about 
> > > any
> > > requirements there.
> > >
> > > Alex
> > >
> > >
> > > On Jan 13, 2008 11:27 PM, pepone. onrez <[EMAIL PROTECTED] >
> > > wrote:
> > >
> > > > Sorry Alex i forget mention that i have setscan of in my last test.
> > > >
> > > > now I have set seqscan on  and indexscan on and added order by _id
> > > >
> > > > The table has an index in the _id field
> > > >
> > > > CREATE INDEX i_documentcontent_document
> > > >   ON t_documentcontent
> > > >   USING btree
> > > >   (_document);
> > > >
> > > > The database 

Re: [GENERAL] query performance

2008-01-13 Thread pepone . onrez
Thanks Alex

I tested your solution and it is really much faster.

Nested Loop  (cost=1743.31..2044.58 rows=50 width=908) (actual time=
101.695..106.178 rows=50 loops=1)
  ->  HashAggregate  (cost=1743.31..1743.31 rows=50 width=108) (actual time=
101.509..101.567 rows=50 loops=1)
->  Subquery Scan "IN_subquery"  (cost=1741.60..1743.19 rows=50
width=108) (actual time=101.327..101.456 rows=50 loops=1)
  ->  Limit  (cost=1741.60..1742.69 rows=50 width=108) (actual
time=101.313..101.383 rows=50 loops=1)
->  Seq Scan on overview
(cost=0.00..3283.07rows=150807 width=108) (actual time=
0.036..72.249 rows=80050 loops=1)
  ->  Index Scan using i_documentcontent_id on t_documentcontent  (cost=
0.00..6.01 rows=1 width=908) (actual time=0.083..0.085 rows=1 loops=50)
Index Cond: ((t_documentcontent._id)::text = ("outer"._id)::text)
Total runtime: 106.323 ms

I now need to see what triggers I need to add, and test the insertions.
Thanks again

On Jan 14, 2008 5:54 AM, Alex Turner <[EMAIL PROTECTED]> wrote:

> Here is a table I threw together to demonstrate the approximate speed of a
> materialized view in this case:
>
> trend=# explain analyze select property_id from overview order by
> property_id limit 50 offset 5;
>  QUERY
> PLAN
>
> 
>  Limit  (cost=19112.75..19112.88 rows=50 width=8) (actual time=
> 446.048..446.125 rows=50 loops=1)
>->  Sort  (cost=18987.75..19400.49 rows=165094 width=8) (actual time=
> 384.788..424.433 rows=50050 loops=1)
>  Sort Key: property_id
>  ->  Seq Scan on overview  (cost=0.00..2501.94 rows=165094
> width=8) (actual time= 0.012..88.691 rows=173409 loops=1)
>  Total runtime: 447.578 ms
> (5 rows)
>
> trend=# select count(*) from overview;
>  count
> 
>  173409
> (1 row)
>
> trend=#
>
> It's not great - but it's better than 47 seconds (The machine I'm running
> it on is far from big iron, so these results should be fairly typical for
> any modern x86 box - also this materialized view is almost certainly in RAM,
> and therefore IO speed is irrelevant).
>
> Tom lane has already suggested another approach, whereby you order your
> results, then select the next 10 from the set where the id is greater than
> the greatest of the last one:
>
> select id from overview order by id limit 50;
>
> x = get row['id']  // for row 1
> do something
> x=get row['id']  // for row 2
> do something
> ...
> x=get row['id']  // for row 50
>
> select id from overview where id>x order by id limit 50.
>
> The order by is relevant and infact imperative, because you must order
> your results somehow, otherwise your pagination will produce different
> results each time you try it as database updates will affect the order the
> rows come back by default without an order by clause.
>
> Let me say that again to be clear:  The order rows come back if you don't
> specify an order by can change!  so pulling rows without an order by is a
> REALLY bad idea.  This will break your pagination if a database update
> happens between someone viewing a page and hitting next to view the next
> page.
>
> Alex
>
>
> On Jan 13, 2008 11:43 PM, Alex Turner <[EMAIL PROTECTED]> wrote:
>
> > If you have to access the data this way (with no where clause at all -
> > which sometimes you do) then I have already provided a solution that will
> > work reasonably well.  If you create what is essentially a materialized view
> > of just the id field, the sequence scan will return much fewer pages than
> > when you do it on the main table.  Then you join it to the indexed main
> > table, and page in just the rows you need.  Voila - much faster result.  Of
> > course we haven't really talked about how that will affect insert speed and
> > delete speed if you trigger then up, but you haven't really talked about any
> > requirements there.
> >
> > Alex
> >
> >
> > On Jan 13, 2008 11:27 PM, pepone. onrez <[EMAIL PROTECTED] > wrote:
> >
> > > Sorry Alex i forget mention that i have setscan of in my last test.
> > >
> > > now I have set seqscan on  and indexscan on and added order by _id
> > >
> > > The table has an index in the _id field
> > >
> > > CREATE INDEX i_documentcontent_document
> > >   ON t_documentcontent
> > >   USING btree
> > >   (_document);
> > >
> > > The database was rencently vacum analyze , but not vacun full
> > >
> > > here is the explain of 2 diferent queries , when i put a large OFFSET
> > >
> > >  EXPLAIN ANALYZE SELECT
> > > t_documentcontent._id AS _id
> > > FROM t_documentcontent ORDER BY _id LIMIT 50 OFFSET 5
> > >
> > > "Limit  (cost=137068.24..137068.36 rows=50 width=58) (actual time=
> > > 41119.702..41119.792 rows=50 loops=1)"
> > > "  ->  Sort  (cost=136943.24..137320.26 rows=150807 width=58) (actual
> > > time=41064.802..41100.424 rows=50050 loops=1)"
> > > " 

Re: [GENERAL] query performance

2008-01-13 Thread Alex Turner
Here is a table I threw together to demonstrate the approximate speed of a
materialized view in this case:

trend=# explain analyze select property_id from overview order by
property_id limit 50 offset 5;
 QUERY
PLAN

 Limit  (cost=19112.75..19112.88 rows=50 width=8) (actual time=
446.048..446.125 rows=50 loops=1)
   ->  Sort  (cost=18987.75..19400.49 rows=165094 width=8) (actual time=
384.788..424.433 rows=50050 loops=1)
 Sort Key: property_id
 ->  Seq Scan on overview  (cost=0.00..2501.94 rows=165094 width=8)
(actual time=0.012..88.691 rows=173409 loops=1)
 Total runtime: 447.578 ms
(5 rows)

trend=# select count(*) from overview;
 count

 173409
(1 row)

trend=#

It's not great - but it's better than 47 seconds (The machine I'm running it
on is far from big iron, so these results should be fairly typical for any
modern x86 box - also this materialized view is almost certainly in RAM, and
therefore IO speed is irrelevant).

Tom lane has already suggested another approach, whereby you order your
results, then select the next 10 from the set where the id is greater than
the greatest of the last one:

select id from overview order by id limit 50;

x = get row['id']  // for row 1
do something
x=get row['id']  // for row 2
do something
...
x=get row['id']  // for row 50

select id from overview where id>x order by id limit 50.

The order by is relevant and infact imperative, because you must order your
results somehow, otherwise your pagination will produce different results
each time you try it as database updates will affect the order the rows come
back by default without an order by clause.

Let me say that again to be clear:  The order rows come back if you don't
specify an order by can change!  so pulling rows without an order by is a
REALLY bad idea.  This will break your pagination if a database update
happens between someone viewing a page and hitting next to view the next
page.

Alex

On Jan 13, 2008 11:43 PM, Alex Turner <[EMAIL PROTECTED]> wrote:

> If you have to access the data this way (with no where clause at all -
> which sometimes you do) then I have already provided a solution that will
> work reasonably well.  If you create what is essentially a materialized view
> of just the id field, the sequence scan will return much fewer pages than
> when you do it on the main table.  Then you join it to the indexed main
> table, and page in just the rows you need.  Voila - much faster result.  Of
> course we haven't really talked about how that will affect insert speed and
> delete speed if you trigger then up, but you haven't really talked about any
> requirements there.
>
> Alex
>
>
> On Jan 13, 2008 11:27 PM, pepone. onrez <[EMAIL PROTECTED]> wrote:
>
> > Sorry Alex i forget mention that i have setscan of in my last test.
> >
> > now I have set seqscan on  and indexscan on and added order by _id
> >
> > The table has an index in the _id field
> >
> > CREATE INDEX i_documentcontent_document
> >   ON t_documentcontent
> >   USING btree
> >   (_document);
> >
> > The database was rencently vacum analyze , but not vacun full
> >
> > here is the explain of 2 diferent queries , when i put a large OFFSET
> >
> >  EXPLAIN ANALYZE SELECT
> > t_documentcontent._id AS _id
> > FROM t_documentcontent ORDER BY _id LIMIT 50 OFFSET 5
> >
> > "Limit  (cost=137068.24..137068.36 rows=50 width=58) (actual time=
> > 41119.702..41119.792 rows=50 loops=1)"
> > "  ->  Sort  (cost=136943.24..137320.26 rows=150807 width=58) (actual
> > time=41064.802..41100.424 rows=50050 loops=1)"
> > "Sort Key: _id"
> > "->  Seq Scan on t_documentcontent  (cost= 
> > 0.00..110772.07rows=150807 width=58) (actual time=
> > 106.679..33267.194 rows=150807 loops=1)"
> > "Total runtime: 41120.015 ms"
> >
> >  EXPLAIN ANALYZE SELECT
> > t_documentcontent._id AS _id
> > FROM t_documentcontent ORDER BY _id LIMIT 50 OFFSET 1
> >
> > "Limit  (cost=39839.37..40038.56 rows=50 width=58) (actual time=
> > 1172.969..1194.228 rows=50 loops=1)"
> > "  ->  Index Scan using i_documentcontent_id on t_documentcontent
> > (cost=0.00..600805.54 rows=150807 width=58) (actual time=
> > 0.077..1189.688 rows=10050 loops=1)"
> > "Total runtime: 1194.316 ms"
> >
> > Tom
> >  i using uuid for the _id field that is the primary key  add a WHERE id
> > > ?  don 't apply
> > the cursor aproach is also not suitable for same of my queries
> >
> > I use this query for paginate contents of a filesysstem with lots of
> > documents avoid offset is not posible always
>
>
>


Re: [GENERAL] query performance

2008-01-13 Thread Alex Turner
If you have to access the data this way (with no where clause at all - which
sometimes you do) then I have already provided a solution that will work
reasonably well.  If you create what is essentially a materialized view of
just the id field, the sequence scan will return much fewer pages than when
you do it on the main table.  Then you join it to the indexed main table,
and page in just the rows you need.  Voila - much faster result.  Of course
we haven't really talked about how that will affect insert speed and delete
speed if you trigger then up, but you haven't really talked about any
requirements there.

Alex

On Jan 13, 2008 11:27 PM, pepone. onrez <[EMAIL PROTECTED]> wrote:

> Sorry Alex, I forgot to mention that I had seqscan off in my last test.
>
> now I have set seqscan on and indexscan on, and added order by _id
>
> The table has an index in the _id field
>
> CREATE INDEX i_documentcontent_document
>   ON t_documentcontent
>   USING btree
>   (_document);
>
> The database was recently vacuum analyzed, but not vacuum full
>
> here is the explain of 2 different queries, when I put a large OFFSET
>
>  EXPLAIN ANALYZE SELECT
> t_documentcontent._id AS _id
> FROM t_documentcontent ORDER BY _id LIMIT 50 OFFSET 5
>
> "Limit  (cost=137068.24..137068.36 rows=50 width=58) (actual time=
> 41119.702..41119.792 rows=50 loops=1)"
> "  ->  Sort  (cost=136943.24..137320.26 rows=150807 width=58) (actual
> time=41064.802..41100.424 rows=50050 loops=1)"
> "Sort Key: _id"
> "->  Seq Scan on t_documentcontent  (cost= 0.00..110772.07rows=150807 
> width=58) (actual time=
> 106.679..33267.194 rows=150807 loops=1)"
> "Total runtime: 41120.015 ms"
>
>  EXPLAIN ANALYZE SELECT
> t_documentcontent._id AS _id
> FROM t_documentcontent ORDER BY _id LIMIT 50 OFFSET 1
>
> "Limit  (cost=39839.37..40038.56 rows=50 width=58) (actual time=
> 1172.969..1194.228 rows=50 loops=1)"
> "  ->  Index Scan using i_documentcontent_id on t_documentcontent  (cost=
> 0.00..600805.54 rows=150807 width=58) (actual time= 0.077..1189.688rows=10050 
> loops=1)"
> "Total runtime: 1194.316 ms"
>
> Tom,
> I am using a uuid for the _id field that is the primary key, so adding a WHERE id >
> ? doesn't apply.
> The cursor approach is also not suitable for some of my queries.
>
> I use this query to paginate the contents of a filesystem with lots of
> documents; avoiding offset is not always possible.


Re: [GENERAL] query performance

2008-01-13 Thread pepone . onrez
Sorry Alex, I forgot to mention that I had seqscan off in my last test.

now I have set seqscan on and indexscan on, and added order by _id

The table has an index in the _id field

CREATE INDEX i_documentcontent_document
  ON t_documentcontent
  USING btree
  (_document);

The database was recently vacuum analyzed, but not vacuum full

here is the explain of 2 different queries, when I put a large OFFSET

 EXPLAIN ANALYZE SELECT
t_documentcontent._id AS _id
FROM t_documentcontent ORDER BY _id LIMIT 50 OFFSET 5

"Limit  (cost=137068.24..137068.36 rows=50 width=58) (actual time=
41119.702..41119.792 rows=50 loops=1)"
"  ->  Sort  (cost=136943.24..137320.26 rows=150807 width=58) (actual time=
41064.802..41100.424 rows=50050 loops=1)"
"Sort Key: _id"
"->  Seq Scan on t_documentcontent  (cost=
0.00..110772.07rows=150807 width=58) (actual time=
106.679..33267.194 rows=150807 loops=1)"
"Total runtime: 41120.015 ms"

 EXPLAIN ANALYZE SELECT
t_documentcontent._id AS _id
FROM t_documentcontent ORDER BY _id LIMIT 50 OFFSET 1

"Limit  (cost=39839.37..40038.56 rows=50 width=58) (actual time=
1172.969..1194.228 rows=50 loops=1)"
"  ->  Index Scan using i_documentcontent_id on t_documentcontent  (cost=
0.00..600805.54 rows=150807 width=58) (actual time=
0.077..1189.688rows=10050 loops=1)"
"Total runtime: 1194.316 ms"

Tom,
I am using a uuid for the _id field that is the primary key, so adding a WHERE id > ?
doesn't apply.
The cursor approach is also not suitable for some of my queries.

I use this query to paginate the contents of a filesystem with lots of
documents; avoiding offset is not always possible.


Re: [GENERAL] query performance

2008-01-13 Thread Tom Lane
"Scott Marlowe" <[EMAIL PROTECTED]> writes:
> On Jan 13, 2008 8:58 PM, pepone. onrez <[EMAIL PROTECTED]> wrote:
>> t_documentcontent._id AS _id
>> FROM t_documentcontent LIMIT 50 OFFSET 8

> with no order by, and possibly no index on t_documentcontent._id,
> there's no choice but a seq scan.

More to the point: a query with a huge OFFSET is *always* going to suck,
because there is no choice but to read through all those records before
getting to the ones you want.  You need to fundamentally rethink how you
are going about this.  I'm assuming that this is actually just one query
in a series that are intended to eventually fetch the whole table.

One solution is to set up a cursor and FETCH 50 rows at a time from it.
However that requires holding a transaction open, which might not work
well in your environment.
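
A minimal sketch of the cursor approach (it has to run inside a single
transaction):

BEGIN;
DECLARE doc_cur CURSOR FOR SELECT _id FROM t_documentcontent ORDER BY _id;
FETCH 50 FROM doc_cur;  -- repeat until no rows come back
CLOSE doc_cur;
COMMIT;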

Another possibility, if you have a primary key on the table, is to do
something like

SELECT ... FROM ... WHERE id > ? ORDER BY id LIMIT 50

where you leave out the WHERE clause on the first call, and on
subsequent calls '?' is the last id value seen in the prior call.
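
Concretely, for the table in this thread it might look like this (the
literal stands in for the last _id returned by the previous page):

SELECT _id FROM t_documentcontent ORDER BY _id LIMIT 50;  -- first page
SELECT _id FROM t_documentcontent WHERE _id > 'last-seen-id' ORDER BY _id LIMIT 50;  -- later pages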

regards, tom lane



Re: [GENERAL] query performance

2008-01-13 Thread Scott Marlowe
On Jan 13, 2008 8:58 PM, pepone. onrez <[EMAIL PROTECTED]> wrote:
> I have this query in a table with 150 thousand tuples and it takes too long
>
> t_documentcontent._id AS _id
> FROM t_documentcontent LIMIT 50 OFFSET 8
>
> here is the explain output
>
> "Limit  (cost= 100058762.30..100058799.02 rows=50 width=58) (actual
> time=19433.474..19433.680 rows=50 loops=1)"
> "  ->  Seq Scan on t_documentcontent  (cost=1.00..100110772.07
> rows=150807 width=58) (actual time=53.934..19402.030 rows=80050 loops=1)"
> "Total runtime: 19433.748 ms"

looks like you've set enable_seqscan=off.  When looking for help on
queries it's a good idea to mention such things...

with no order by, and possibly no index on t_documentcontent._id,
there's no choice but a seq scan.

try adding both.
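
A sketch of adding both (the index name matches the one that appears in
later plans in this thread; the offset value is only an example):

CREATE INDEX i_documentcontent_id ON t_documentcontent USING btree (_id);
SELECT t_documentcontent._id AS _id
FROM t_documentcontent
ORDER BY _id
LIMIT 50 OFFSET 80000;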



Re: [GENERAL] query performance

2008-01-13 Thread Alex Turner
If you haven't already, make sure you've done a vacuum full recently.  When
in doubt, pg_dump the db, and reload it, and see if that helps, but this
works for me:

create table overview as select _id from t_documentcontent;
alter table overview add constraint overview_pkey primary key (_id);

select * from t_documentcontent where _id in (select _id
FROM overview LIMIT 50 OFFSET 8);

create a trigger on insert/delete/update to keep this table in sync, or if
you don't need to then just re-run the create every so often depending on
your needs (I'll be happy to demonstrate the required triggers if you need
it).
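
A minimal sketch of such triggers, assuming the single-column overview table
created above (untested against the poster's schema):

CREATE OR REPLACE FUNCTION overview_sync() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO overview (_id) VALUES (NEW._id);
    ELSIF TG_OP = 'DELETE' THEN
        DELETE FROM overview WHERE _id = OLD._id;
    ELSIF TG_OP = 'UPDATE' AND NEW._id <> OLD._id THEN
        UPDATE overview SET _id = NEW._id WHERE _id = OLD._id;
    END IF;
    RETURN NULL;  -- AFTER trigger, so the return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER t_documentcontent_overview_sync
    AFTER INSERT OR UPDATE OR DELETE ON t_documentcontent
    FOR EACH ROW EXECUTE PROCEDURE overview_sync();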

make sure that you have adequate RAM available for file cache, hitting the
disk everytime you query will suck no matter what you do.

Alex

On Jan 13, 2008 9:58 PM, pepone. onrez <[EMAIL PROTECTED]> wrote:

> I have this query in a table with 150 thousand tuples and it takes too long
>
> t_documentcontent._id AS _id
> FROM t_documentcontent LIMIT 50 OFFSET 8
>
> here is the explain output
>
> "Limit  (cost= 100058762.30..100058799.02 rows=50 width=58) (actual time=
> 19433.474..19433.680 rows=50 loops=1)"
> "  ->  Seq Scan on t_documentcontent  
> (cost=1.00..100110772.07rows=150807 width=58) (actual time=
> 53.934..19402.030 rows=80050 loops=1)"
> "Total runtime: 19433.748 ms"
>
> here the table structure
>
> CREATE TABLE t_documentcontent(
> _id varchar(60) NOT NULL,
> _filesystem varchar(60) NOT NULL,
> _parent varchar(60) NOT NULL,
> _document varchar(60) NOT NULL,
> _title varchar NOT NULL,
> _resume varchar,
> _content varchar,
> _lang integer NOT NULL,
> _creationdate timestamp NOT NULL DEFAULT now(),
> _updatedate timestamp NOT NULL DEFAULT now(),
> _indexeddate timestamp NOT NULL DEFAULT now(),
> CONSTRAINT documentcontent_pkey PRIMARY KEY (_id),
> CONSTRAINT documentcontent_filesystem_fkey
> FOREIGN KEY (_filesystem) REFERENCES t_filesystem(_id) ON UPDATE
> CASCADE ON DELETE NO ACTION,
> CONSTRAINT documentcontent_parent_fkey FOREIGN KEY (_parent)
> REFERENCES t_node(_id) ON DELETE NO ACTION,
> CONSTRAINT documentcontent_document_fkey
> FOREIGN KEY (_document) REFERENCES t_node(_id) ON UPDATE CASCADE
> ON DELETE NO ACTION,
> CONSTRAINT documentcontent_lang_lang_fkey FOREIGN KEY (_lang)
> REFERENCES t_lang(_id)) WITHOUT OIDS;
>
> Any ideas for improve this query performance.
>


Re: [GENERAL] Query Performance Test

2007-11-18 Thread Scott Ribe
> So, how can I do to execute it as if it was the first
> time again?

Reboot. 

As Lew pointed out, that might not actually be a good idea, because caching
means that most queries will most of the time not run with that "first time"
performance.

-- 
Scott Ribe
[EMAIL PROTECTED]
http://www.killerbytes.com/
(303) 722-0567 voice



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Query Performance Test

2007-11-17 Thread Lew

dcrespo wrote:

I have a query that I want to test twice to see its performance with
and without another program running (anti-virus program). I know that
if you run the same query for the second time, it will be faster than
the first time. So, how can I do to execute it as if it was the first
time again?


In the first place, you cannot make a valid trend line with one data point.

Don't compare the first run.  Compare the nth runs in both cases, where n > 
some positive integer.


To get statistical confidence, you should measure k runs starting at run n.

Example: Profile runs [4, 13].  n = 4, k = 10.

(Statistics trick: acquire (k+2) samples for each scenario, throw away the 
largest and smallest measurements, analyze the remaining k values.)


PG has optimized the query in both scenarios, so you are rendering the PG 
aspect constant.  Report both the mean and the standard deviation of your k 
runs.  Compare with and without anti-virus running.
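
For illustration, assuming the measured times have been collected into a
hypothetical run_times table, the trim-and-summarize step can be done in
plain SQL:

-- one row per measured run
CREATE TEMP TABLE run_times (scenario text, run_no int, elapsed_ms numeric);
-- ... INSERT one row per run here ...

-- drop the fastest and slowest measurement per scenario, then summarize
SELECT scenario,
       count(*)           AS runs,
       avg(elapsed_ms)    AS mean_ms,
       stddev(elapsed_ms) AS stddev_ms
FROM run_times r
WHERE elapsed_ms > (SELECT min(elapsed_ms) FROM run_times WHERE scenario = r.scenario)
  AND elapsed_ms < (SELECT max(elapsed_ms) FROM run_times WHERE scenario = r.scenario)
GROUP BY scenario;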


How much control do you have over the test machine's operating profile, i.e., 
what services and background tasks are running, network I/O, other factors 
that can influence timing?


--
Lew

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Query performance strangeness..

2007-07-19 Thread Richard Huxton

Steve Spicklemire wrote:


Here is the function body...  the data is stored in an XML "pickle". I 
had hoped that it would only be called in building the index.


Since the query uses it in the 'filter' step.. I'm not sure if it's 
using the index or not.


Does marking the function immutable help the planner know whether it can 
use the index or not?


Well, since you've got an index using it, you *must* have already marked 
it immutable. Presumably it's not genuinely immutable though.



CCOC=# \df+ get_cem_for_directBurial
  List of functions
 Schema |           Name           | Result data type | Argument data types |  Owner  | Language | Source code | Description
--------+--------------------------+------------------+---------------------+---------+----------+-------------+-------------
 public | get_cem_for_directburial | text             | character varying   | webuser | plpgsql  |

DECLARE
personID ALIAS for $1;
qResult RECORD;

BEGIN
SELECT INTO qResult 
get_xml_value('/params/param/value/struct/member/*[contains(text(),''cemid'')]/parent::*/value/string/text()','People',personID,'') 
as cem;

return qResult.cem;
END;


That might be stable, but I don't see how it could be immutable unless 
the xml is in your "people" table.



CCOC=# \d people
 Table "public.people"
   Column   |            Type             |                              Modifiers
------------+-----------------------------+--------------------------------------------------------------------------------------
 personid   | character varying(40)       | not null default ('AUTO'::text || (nextval(('People_seq'::text)::regclass))::text)
 modified   | timestamp without time zone | default now()
 created| timestamp without time zone | default now()
 enabled| boolean |
 first  | character varying(40)   |
 middle | character varying(15)   |
 last   | character varying(80)   |
 gender | character varying(2)|
 sbirthdate | character varying(30)   |
 sdeathdate | character varying(30)   |
 status | character varying(30)   |
Indexes:
"people_pkey" PRIMARY KEY, btree (personid)
"idx_people_cemid" btree (get_cem_for_directburial(personid))
"idx_people_lower_concat3_last" btree (lower_concat3("last", 
"first", (middle::text || personid::text)::character varying))
"idx_people_servicenum" btree 
(get_numeric_servicenumber_for_personid(personid))

"idx_people_status" btree (status)
"idx_people_take4_personid_" btree (take4(personid))


 Filter: (('STJ'::text = 
get_cem_for_directburial(personid)) AND ((status)::text <> 'R'::text) 
AND ((status)::text <> 'F'::text))


 Filter: (('HCC'::text = 
get_cem_for_directburial(personid)) AND ((status)::text <> 'R'::text) 
AND ((status)::text <> 'F'::text))


If the query is actually calling get_cem_for_directburial during the 
query... then I'm sunk. I'll have to move that data to a regular indexed 
field. I can do that... it's just a lot of work. ;-(


Where you see it calling "Filter" it's evaluating the function I'm 
afraid. It's possible for the executor to call the function more than 
once too.


You've got so much data hidden behind functions, it's unlikely you'll 
get the planner to make any informed decisions as to the quickest plan.


You're testing for inequality on status, so unless <> F / R is uncommon 
a partial query on that probably won't help much either.


Just to be sure, cheat. Run get_cem_for_directburial() over the whole 
people table and dump the results into a cache table. Then, try the same 
query with a function that just does a table lookup - see if that makes 
things faster.


Then I'd just write a couple of trigger functions to keep the cache 
table up-to-date and join against it. That will let the planner see 
common values and make better predictions for its plans.
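
A minimal sketch of that cache table, with made-up names (building it calls
the expensive function once per row, so expect the initial load to take a
while):

CREATE TABLE people_cem_cache AS
    SELECT personid, get_cem_for_directburial(personid) AS cemid
    FROM people;
ALTER TABLE people_cem_cache ADD PRIMARY KEY (personid);
CREATE INDEX idx_people_cem_cache_cemid ON people_cem_cache (cemid);
ANALYZE people_cem_cache;

-- then join against it instead of calling the function in the filter
SELECT p.*
FROM people p
JOIN people_cem_cache c USING (personid)
WHERE c.cemid = 'HCC'
  AND p.status NOT IN ('R', 'F');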



If you (or anyone else) would like to explore functional indexes and 
calling, I've attached a small demo script.


--
  Richard Huxton
  Archonet Ltd
BEGIN;

CREATE TABLE fit (
	a SERIAL,
	b int4,
	PRIMARY KEY (a)
);

CREATE SEQUENCE dummy_seq;

CREATE FUNCTION fit_fn(id int4) RETURNS int4 AS $$
BEGIN
	PERFORM nextval('dummy_seq');
	RETURN (id % 100);
END;
$$ LANGUAGE plpgsql IMMUTABLE;

CREATE INDEX fit_fn_idx ON fit ( fit_fn(a) );

INSERT INTO fit SELECT generate_series(1,1000), round(random() * 100);

COMMIT;

VACUUM ANALYSE fit;

SELECT nextval('dummy_seq');
EXPLAIN ANALYSE SELECT a,b FROM fit WHERE fit_fn(a) = 7;
SELECT nextval('dummy_seq');
EXPLAIN ANALYSE SELECT a,b FROM fit WHERE fit_fn(a) = b;
SELECT nextval('dummy_seq');


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] Query performance strangeness..

2007-07-19 Thread Steve Spicklemire

Hi Richard,

On Jul 19, 2007, at 12:49 AM, Richard Huxton wrote:


Steve Spicklemire wrote:
I also have a function "get_cem_for_directBurial(personid)" that  
is expensive to call, but it's also indexed, so I hoped that the  
index would normally be used (essentially as a cache). It returns  
a 'cemetery code' so I can search for folks buried in a particular  
cemetery. (The cemetery code was added to a different table after  
the 'people' table was more or less frozen.. I'd like to keep it  
that way if possible.)


How is this function defined? Is it marked "Immutable" or similar?  
The body might be interesting too.


Here is the function body...  the data is stored in an XML "pickle".  
I had hoped that it would only be called in building the index.


Since the query uses it in the 'filter' step.. I'm not sure if it's  
using the index or not.


Does marking the function immutable help the planner know whether it  
can use the index or not?


CCOC=# \df+ get_cem_for_directBurial
  List of functions
 Schema |           Name           | Result data type | Argument data types |  Owner  | Language | Source code | Description
--------+--------------------------+------------------+---------------------+---------+----------+-------------+-------------
 public | get_cem_for_directburial | text             | character varying   | webuser | plpgsql  |

DECLARE
personID ALIAS for $1;
qResult RECORD;

BEGIN
SELECT INTO qResult get_xml_value('/params/param/value/ 
struct/member/*[contains(text(),''cemid'')]/parent::*/value/string/ 
text()','People',personID,'') as cem;

return qResult.cem;
END;
|
(1 row)




Sometimes I need to search for rows from the view that satisfy  
certain criteria, sorted in some specific order. Here's where the  
trouble starts. In the view I compute something I call 'lc3key',  
defined as: lower_concat3(p."last", p."first", (p.middle::text ||  
p.personid::text)::character varying) where 'lower_concat3' just  
returns a lower case version of three strings all concatenated  
together. The string is basically lastname, firstname, middle and  
personid (to guarantee uniqueness). It seems like most of the time  
sorting by last, first, middle should be the same as sorting by  
lc3key (all of these things are indexed BTW). So here goes:


Definitions for the three tables and their indexes would be nice to  
check against too.


CCOC=# \d people
                Table "public.people"
   Column   |            Type             |                              Modifiers
------------+-----------------------------+--------------------------------------------------------------------------------------
 personid   | character varying(40)       | not null default ('AUTO'::text || (nextval(('People_seq'::text)::regclass))::text)
 modified   | timestamp without time zone | default now()
 created| timestamp without time zone | default now()
 enabled| boolean |
 first  | character varying(40)   |
 middle | character varying(15)   |
 last   | character varying(80)   |
 gender | character varying(2)|
 sbirthdate | character varying(30)   |
 sdeathdate | character varying(30)   |
 status | character varying(30)   |
Indexes:
"people_pkey" PRIMARY KEY, btree (personid)
"idx_people_cemid" btree (get_cem_for_directburial(personid))
"idx_people_lower_concat3_last" btree (lower_concat3("last",  
"first", (middle::text || personid::text)::character varying))
"idx_people_servicenum" btree  
(get_numeric_servicenumber_for_personid(personid))

"idx_people_status" btree (status)
"idx_people_take4_personid_" btree (take4(personid))

CCOC=# \d intermentspacepersons
               Table "public.intermentspacepersons"
         Column         |            Type             |                                     Modifiers
------------------------+-----------------------------+----------------------------------------------------------------------------------------------------
 intermentspacepersonid | character varying(40)       | not null default ('AUTO'::text || (nextval(('IntermentSpacePersons_seq'::text)::regclass))::text)
 modified   | timestamp without time zone | default now()
 created| timestamp without time zone | default now()
 enabled| boolean |
 objectid   | character varying(30)   |
 personid   | character varying(30)   |
 roleid | character varying(30)   |
Indexes:
"idx_intermentspacepersons_obje" btree (objectid)
"idx_intermentspacepersons_pers" btree (personid)


CCOC=# \d intermentspaceroles
  Table  
"public

Re: [GENERAL] Query performance strangeness..

2007-07-18 Thread Richard Huxton

Steve Spicklemire wrote:
I also have a function "get_cem_for_directBurial(personid)" that is 
expensive to call, but it's also indexed, so I hoped that the index 
would normally be used (essentially as a cache). It returns a 'cemetery 
code' so I can search for folks buried in a particular cemetery. (The 
cemetery code was added to a different table after the 'people' table 
was more or less frozen.. I'd like to keep it that way if possible.)


How is this function defined? Is it marked "Immutable" or similar? The 
body might be interesting too.


Sometimes I need to search for rows from the view that satisfy certain 
criteria, sorted in some specific order. Here's where the trouble 
starts. In the view I compute something I call 'lc3key', defined as: 
lower_concat3(p."last", p."first", (p.middle::text || 
p.personid::text)::character varying) where 'lower_concat3' just returns 
a lower case version of three strings all concatenated together. The 
string is basically lastname, firstname, middle and personid (to 
guarantee uniqueness). It seems like most of the time sorting by last, 
first, middle should be the same as sorting by lc3key (all of these 
things are indexed BTW). So here goes:


Definitions for the three tables and their indexes would be nice to 
check against too.


   ->  Index Scan using idx_people_lower_concat3_last on 
people p  (cost=0.00..130784.91 rows=43872 width=40) (actual 
time=0.366..47.016 rows=171 loops=1)
 Index Cond: (lower_concat3("last", "first", 
(((middle)::text || (personid)::text))::character varying) >= 
'jonesA'::text)
 Filter: (('STJ'::text = 
get_cem_for_directburial(personid)) AND ((status)::text <> 'R'::text) 
AND ((status)::text <> 'F'::text))


OK.. not too bad. If I do the same query... but ask for 'HCC' rather 
than 'STJ', just a different cemetery code, I get 91 seconds... about 
1000 times longer!


   ->  Index Scan using idx_people_lower_concat3_last on 
people p  (cost=0.00..130784.91 rows=759 width=40) (actual 
time=8.722..91396.606 rows=256 loops=1)
 Index Cond: (lower_concat3("last", "first", 
(((middle)::text || (personid)::text))::character varying) >= 
'jonesA'::text)
 Filter: (('HCC'::text = 
get_cem_for_directburial(personid)) AND ((status)::text <> 'R'::text) 
AND ((status)::text <> 'F'::text))


In this case, look at the actual times. There are two possibilities:
1. The first query had its data/indexes in cache whereas the second 
didn't. Run each three times in a row and see if the times stay roughly 
constant.


2. Calls to get_cem_for_directburial() can vary widely in their 
execution time.

--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] Query performance inconsistant.

2006-09-08 Thread Tom Lane
Matthew Schumacher <[EMAIL PROTECTED]> writes:
> Here is the proc that has very inconsistent (anywhere from 25ms to
> 8000ms) performance:
> ...
> This setup, with concurrency, is returning very inconsistent query
> performance.  Sometimes its very fast, other times it's slow and waits.
>  This makes me think I have a table locking issue, but I'm not sure
> since pg_locks rarely reports more than this:

Yeah, I suspect a locking issue too, but you won't find out what it is
unless you can capture the content of pg_locks at a time when the proc
is being delayed.

You could try making the proc do
LOCK TABLE ... IN ROW EXCLUSIVE MODE NOWAIT
inside an exception block, catching the error, and logging the contents
of pg_locks to someplace.

regards, tom lane
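
A minimal sketch of that diagnostic, with made-up names; called at the top of
acctmessage(), it either takes the lock for the rest of the transaction or
records the current lock holders if it cannot be taken immediately:

-- hypothetical log table for lock snapshots
CREATE TABLE lock_log (
    logged_at timestamptz DEFAULT now(),
    locktype  text,
    relation  oid,
    pid       integer,
    mode      text,
    granted   boolean
);

CREATE FUNCTION log_if_blocked() RETURNS boolean AS $$
BEGIN
    LOCK TABLE radutmp_tab IN ROW EXCLUSIVE MODE NOWAIT;
    RETURN true;                         -- lock acquired without waiting
EXCEPTION WHEN lock_not_available THEN
    INSERT INTO lock_log (locktype, relation, pid, mode, granted)
        SELECT locktype, relation, pid, mode, granted FROM pg_locks;
    RETURN false;                        -- blocked: pg_locks snapshot saved
END;
$$ LANGUAGE plpgsql;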

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Query performance inconsistant.

2006-09-07 Thread Matthew Schumacher
Tom Lane wrote:
> Jeff Davis <[EMAIL PROTECTED]> writes:
>> From what you described, I would not expect many locking problems. Are
>> there any other types of queries you run that may cause a lock?
> 
> Row locks (SELECT FOR UPDATE/SHARE) are a possible problem, particularly
> if this is a pre-8.1 Postgres where exclusive row locks were used for
> foreign key constraints.
> 
>   regards, tom lane

Tom,

I'm still having issues with this so lemme provide more information
perhaps there is something obvious

Here is the proc that has very inconsistent (anywhere from 25ms to
8000ms) performance:

CREATE FUNCTION acctmessage(_accttype character varying, _username
character varying, _ipaddress character varying, _nastimestamp
 timestamp with time zone, _sessionid character varying, _nassessionid
character varying, _nasipaddress character varying, _input
octets bigint, _outputoctets bigint, _inputgigawords integer,
_outputgigawords integer, _sessionlength bigint, _termcause charact
er varying, _nasidentifier character varying, _clientipaddress character
varying, _nasport character varying, _framedprotocol cha
racter varying, _servicetype character varying, _connectinfo character
varying) RETURNS void
AS $$
DECLARE
session_rec RECORD;
BEGIN

IF _clientipaddress <> '127.0.0.1' THEN

  INSERT into accounting_tab (
acctType,
userName,
ipAddress,
nasTimestamp,
sessionId,
nasSessionId,
nasIpAddress,
inputOctets,
outputOctets,
inputGigaWords,
outputGigaWords,
sessionLength,
termCause,
nasIdentifier,
clientIpAddress,
nasPort,
framedProtocol,
serviceType,
connectInfo
  ) values (
_acctType,
_userName,
_ipAddress,
_nasTimestamp,
_sessionId,
_nasSessionId,
_nasIpAddress,
_inputOctets,
_outputOctets,
_inputGigaWords,
_outputGigaWords,
_sessionLength,
_termCause,
_nasIdentifier,
   _clientIpAddress,
_nasPort,
_framedProtocol,
_serviceType,
_connectInfo
  );

END IF;

SELECT INTO session_rec sessionId FROM radutmp_tab WHERE sessionId =
_sessionId;

IF session_rec.sessionId IS NULL AND _acctType = 'start' THEN

  INSERT into radutmp_tab (
lastAcctType,
userName,
ipAddress,
nasStartTimestamp,
sessionId,
nasSessionId,
nasIpAddress,
inputOctets,
outputOctets,
inputGigaWords,
outputGigaWords,
sessionLength,
termCause,
nasIdentifier,
clientIpAddress,
nasPort,
framedProtocol,
serviceType
  ) values (
_acctType,
_userName,
_ipAddress,
_nasTimestamp,
_sessionId,
_nasSessionId,
_nasIpAddress,
_inputOctets,
_outputOctets,
_inputGigaWords,
_outputGigaWords,
_sessionLength,
_termCause,
_nasIdentifier,
_clientIpAddress,
_nasPort,
_framedProtocol,
_serviceType
  ) ;

ELSIF session_rec.sessionId IS NOT NULL AND _acctType = 'stop' THEN

UPDATE
  radutmp_tab
SET
  lastAcctType = _acctType,
  nasStopTimestamp = _nasTimestamp,
  ipAddress = _ipAddress,
  sessionlength = _sessionlength,
  inputOctets = _inputOctets,
  outputOctets = _outputOctets,
  inputgigawords = _inputgigawords,
  outputgigawords = _outputgigawords,
  nasSessionId = _nasSessionId,
  nasIPAddress = _nasIPAddress,
  clientIPAddress = _clientIPAddress,
  nasPort = _nasPort,
  framedProtocol = _framedProtocol,
  termCause = _termCause
WHERE
  sessionId = _sessionId
AND
  userName = _userName
AND
  serviceType = _serviceType;

END IF;

END;
$$
LANGUAGE plpgsql;

It looks long, but it's really pretty simple: it inserts data into the
accounting_tab and then updates or inserts into the radutmp_tab table
based on whether the session ID is known or not.

Here are the tables:

   Table "public.accounting_tab"
 Column  |   Type   |   Modifiers
-+--+---
 sessionid   | character varying(32)| not null
 nassessionid| character varying(32)| not null
 accttype| character varying(6) | not null
 username| character varying(20)| not null
 nastimestamp| timestamp with time zone |
 nasipaddress| character varying(15)| not null
 nasidentifier   | character varying(15)|
 clientipaddress | character varying(15)| not null
 servicetype | character varying(6) | not null
 sessionlength   | bigint   | default 0
 inputoctets | bigint   | default 0
 outputoctets| bigint   | default 0
 inputgigawords  | integer  | default 0
 outputgigawords | integer  | default 0
 nasport | character varying(32)|
 ipaddress   | character varying(32)|
 framedprotocol  | character varying(32)|
 termcause   | character varying(32)|
 timestamp   | timestamp with time zone | default now()
 connectinfo | character varying(100)  

Re: [GENERAL] Query performance inconsistant.

2006-08-31 Thread Tom Lane
Matthew Schumacher <[EMAIL PROTECTED]> writes:
> I have "autovacuum = on" in the config file with a pretty frequent
> autovacuum_naptime, but I'm unsure if that does a vacuum or vacuum full.

autovacuum *never* does a vacuum full, because that would lead to
unexpected blockages of foreground queries.  Still though, autovac could
be contributing to the problem indirectly.  I'm assuming that most of
your transactions on the problem table are short.  It's possible that
one or more clients are grabbing quasi-exclusive table locks, and
normally you don't notice because they are able to get the lock quickly,
do their work, and get out.  But if autovac is working on the table then
the requestor of the exclusive lock blocks ... and everyone else queues
up behind him, until the vacuum command finishes with the table.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Query performance inconsistant.

2006-08-31 Thread Tom Lane
Jeff Davis <[EMAIL PROTECTED]> writes:
> From what you described, I would not expect many locking problems. Are
> there any other types of queries you run that may cause a lock?

Row locks (SELECT FOR UPDATE/SHARE) are a possible problem, particularly
if this is a pre-8.1 Postgres where exclusive row locks were used for
foreign key constraints.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] Query performance inconsistant.

2006-08-31 Thread Matthew Schumacher
Jeff Davis wrote:
> http://www.postgresql.org/docs/8.1/static/mvcc.html
> 
> In the "Explicit Locking" section it details the locks acquired by
> UPDATE, etc.
> 
>>From what you described, I would not expect many locking problems. Are
> there any other types of queries you run that may cause a lock? Do you
> run periodic "VACUUM FULL" or something? "VACUUM FULL" causes a full
> table lock, and is usually not necessary. If so, try running just
> "VACUUM" without "FULL".
> 
> Regards,
>   Jeff Davis
> 

Jeff,

I have "autovacuum = on" in the config file with a pretty frequent
autovacuum_naptime, but I'm unsure if that does a vacuum or vacuum full.

schu


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Query performance inconsistant.

2006-08-31 Thread Jeff Davis
On Thu, 2006-08-31 at 11:04 -0800, Matthew Schumacher wrote:
> Tom Lane wrote:
> > Matthew Schumacher <[EMAIL PROTECTED]> writes:
> >> I have been having performance problems with my DB so this morning I
> >> added some config to log queries that take more than 250ms.  The result
> >> is surprising because some queries will take as long as 10 seconds, but
> >> then you do a explain analyze on them they show that indexes are being
> >> used and they run very fast.
> > 
> > Is it possible that it's not directly that query's fault?  For instance
> > it could be blocked by a lock held by some other transaction.  I can't
> > unfortunately think of any very nice way to deduce this from log entries
> > ... you'd have to catch it in the act and look into pg_locks to find out
> > who's the perpetrator.
> > 
> > regards, tom lane
> 
> This does help me try to figure out where the problem is.  The proc in
> question inserts in a very large table, and updates another large table.
>  Since postgres puts each proc in it's own transaction I'm thinking the
> problem may be the database locking these large tables while this proc
> is called concurrently.
> 
> In order to understand this better I need to know how postgres locking
> works and when locks are used.  Do you know of any documentation that I
> can read that explains this?

http://www.postgresql.org/docs/8.1/static/mvcc.html

In the "Explicit Locking" section it details the locks acquired by
UPDATE, etc.

>From what you described, I would not expect many locking problems. Are
there any other types of queries you run that may cause a lock? Do you
run periodic "VACUUM FULL" or something? "VACUUM FULL" causes a full
table lock, and is usually not necessary. If so, try running just
"VACUUM" without "FULL".

Regards,
Jeff Davis


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Query performance inconsistant.

2006-08-31 Thread Matthew Schumacher
Tom Lane wrote:
> Matthew Schumacher <[EMAIL PROTECTED]> writes:
>> I have been having performance problems with my DB so this morning I
>> added some config to log queries that take more than 250ms.  The result
>> is surprising because some queries will take as long as 10 seconds, but
>> then you do a explain analyze on them they show that indexes are being
>> used and they run very fast.
> 
> Is it possible that it's not directly that query's fault?  For instance
> it could be blocked by a lock held by some other transaction.  I can't
> unfortunately think of any very nice way to deduce this from log entries
> ... you'd have to catch it in the act and look into pg_locks to find out
> who's the perpetrator.
> 
>   regards, tom lane

This does help me try to figure out where the problem is.  The proc in
question inserts in a very large table, and updates another large table.
 Since postgres puts each proc in its own transaction I'm thinking the
problem may be the database locking these large tables while this proc
is called concurrently.

In order to understand this better I need to know how postgres locking
works and when locks are used.  Do you know of any documentation that I
can read that explains this?

Thanks,
schu

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Query performance inconsistant.

2006-08-31 Thread Tom Lane
Matthew Schumacher <[EMAIL PROTECTED]> writes:
> I have been having performance problems with my DB so this morning I
> added some config to log queries that take more than 250ms.  The result
> is surprising because some queries will take as long as 10 seconds, but
> then you do a explain analyze on them they show that indexes are being
> used and they run very fast.

Is it possible that it's not directly that query's fault?  For instance
it could be blocked by a lock held by some other transaction.  I can't
unfortunately think of any very nice way to deduce this from log entries
... you'd have to catch it in the act and look into pg_locks to find out
who's the perpetrator.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Query performance

2006-08-03 Thread Chris Mair

> i have a table with around 57 million tuples, with the following columns: 
> pid(varchar), crit(varchar), val1(varchar), val2(varchar). Example:
> pidcritval1val2
> p1  c1  xy
> p1  c2  xz
> p1  c3  yx
> ...
> What i am doing is to query all val1 and val2 for one pid and all crit values:
> 
> select val1, val2, crit from mytable where pid='somepid' and crit in(select 
> crit from myCritTable);
> where myCritTable is a table that contains all crit values (around 42.000) 
> ordered by their insertion date.

In case myCritTable doesn't change a lot and this select by contrast is
executed a lot, have you considered precomputing whether a record from
your big table has a crit value from myCritTable?

Of course this info would be invalidated each time myCritTable is
updated, so you would trade fast selects on the big table vs. slow
updates on myCritTable. Don't know whether that makes sense for you...

Bye, Chris.
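
A minimal sketch of that precomputation, using the column names from the
original post (the one-off UPDATE over ~57 million rows is heavy, and the
flag has to be refreshed whenever myCritTable changes):

-- hypothetical precomputed flag on the big table
ALTER TABLE mytable ADD COLUMN crit_known boolean;

UPDATE mytable
   SET crit_known = EXISTS (SELECT 1 FROM myCritTable c WHERE c.crit = mytable.crit);

CREATE INDEX idx_mytable_pid_crit_known ON mytable (pid) WHERE crit_known;

-- the original query then needs no join or sub-select at all
SELECT val1, val2, crit FROM mytable WHERE pid = 'somepid' AND crit_known;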



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Query performance

2006-08-03 Thread Christian Rengstl
Hi,

here is the definition of the master table which is inherited by around 30 
tables based on the value of chr:
CREATE TABLE snp_master
(
  entry_no int8 NOT NULL DEFAULT nextval('entry_no_seq'::regclass),
  pid varchar(15) NOT NULL,
  snp_id varchar(13) NOT NULL, 
  val1 varchar(1),
  val2 varchar(1),
  chr int2 NOT NULL,
  aendat text,
  aennam varchar(8),
  CONSTRAINT "PK_ENTRY" PRIMARY KEY (entry_no),
  CONSTRAINT "UNIQUE_SNP_ALLEL_MASTER" UNIQUE (pid, entry_no, snp_id)
) 
WITHOUT OIDS;

The thing is that i load the data from txt files which themselves have more or 
less the same structure. So for every pid of 1500 there are up to 42000 
different snp_id values and for each of this combinations there are different 
val1 and val2 entries (all together this accounts for up to around 58 million 
tuples in the biggest table). MyCritTable then just contains the distinct 
snp_ids so that at least this query does not take very long any more.
CREATE TABLE snps_master
(
  snp_id varchar(13) NOT NULL,
  chr int2,
  aendat timestamp,
  CONSTRAINT pk_snp_master PRIMARY KEY (snp_id)
) 
WITHOUT OIDS;

Up to now there are no foreign key constraints, as once the data is loaded into 
the db there will be no update or delete operations at all. I only have to 
export the data to different file formats for which i need the query posted 
originally.

"Hakan Kocaman" <[EMAIL PROTECTED]> wrote on 08/03/06 11:36 am:
> Hi,
> 
> maybe you should overthink your db-design, but thats a bit premature 
> whithout your complete 
> table-definitions(including table-names, datatypes, indexes, 
> foreign-key constraints,etc.)
> 
> If your are using pgadmin3 just cut'n paste the content of the window 
> on the bottom left for 
> the corresponding tables.
> 
> If you're using psql try \d yur-table-name.
> 
> Best regards
> 
> Hakan Kocaman
> Software-Development
> 
> digame.de GmbH
> Richard-Byrd-Str. 4-8
> 50829 Köln
> 
> Tel.: +49 (0) 221 59 68 88 31
> Fax: +49 (0) 221 59 68 88 98
> Email: [EMAIL PROTECTED] 
> 
> 
> 
>> -Original Message-
>> From: Christian Rengstl 
>> [mailto:[EMAIL PROTECTED] 
>> Sent: Thursday, August 03, 2006 11:18 AM
>> To: Richard Huxton; Hakan Kocaman
>> Cc: pgsql-general@postgresql.org 
>> Subject: Re: [GENERAL] Query performance
>> 
>> 
>> Hi,
>> 
>> i would rather compare int4 too, but the snp_id can be 
>> something like "abc123" unfortunately.
>> 
>> "Hakan Kocaman" <[EMAIL PROTECTED]> wrote on 08/03/06 11:08 am:
>> > Hi,
>> > 
>> > 
>> >> -Original Message-
>> >> From: Richard Huxton [mailto:[EMAIL PROTECTED] 
>> >> Sent: Thursday, August 03, 2006 11:00 AM
>> >> To: Christian Rengstl
>> >> Cc: Hakan Kocaman; pgsql-general@postgresql.org 
>> >> Subject: Re: [GENERAL] Query performance
>> >> 
>> >> 
>> >> Christian Rengstl wrote:
>> >> > Hi,
>> >> > 
>> >> > the complete query is the one i posted, but here comes the 
>> >> schema for mytable:
>> >> > entry_no int8 NOT NULL DEFAULT 
>> nextval('entry_no_seq''::regclass),
>> >> >   pid varchar(15) NOT NULL, 
>> >> >   crit varchar(13) NOT NULL,
>> >> >   val1 varchar(1),
>> >> >   val2 varchar(1),
>> >> >   aendat text,
>> >> >   aennam varchar(8),
>> >> >   CONSTRAINT "PK_ENTRY" PRIMARY KEY (entry_no)
>> >> > 
>> >> > myCritTable:
>> >> >   crit varchar(13) NOT NULL,
>> >> >   chr int2,
>> >> >   aendat timestamp,
>> >> >   CONSTRAINT pk_crit_master PRIMARY KEY (crit)
>> >> 
>> >> Still doesn't match the EXPLAIN output - where's snp_id? 
>> >> Where's table 
>> >> test2?
>> >> 
>> > 
>> > Yep, that bothered me too.
>> > 
>> >> > My server is 8.1.4. As a matter of fact, i have no idea 
>> >> where the text 
>> >>  > type comes from, because as you can see from above 
>> there are only
>> >>  > varchar with maximum 15 characters.
>> >> 
>> >> PG is casting it to text. There's no real difference between 
>> >> the types 
>> >> (other than the size limit) and it's not expensive.
>> > 
>> > But wouldn't a comparison between int4 be much cheaper.
>> > If i see smth like "id" (here snp_id) in

Re: [GENERAL] Query performance

2006-08-03 Thread Hakan Kocaman
Hi,

maybe you should rethink your db-design, but that's a bit premature without 
your complete table-definitions (including table-names, datatypes, indexes, 
foreign-key constraints, etc.).

If you are using pgadmin3, just cut'n'paste the content of the window on the 
bottom left for the corresponding tables.

If you're using psql, try \d your-table-name.

Best regards

Hakan Kocaman
Software-Development

digame.de GmbH
Richard-Byrd-Str. 4-8
50829 Köln

Tel.: +49 (0) 221 59 68 88 31
Fax: +49 (0) 221 59 68 88 98
Email: [EMAIL PROTECTED]



> -Original Message-
> From: Christian Rengstl 
> [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, August 03, 2006 11:18 AM
> To: Richard Huxton; Hakan Kocaman
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Query performance
> 
> 
> Hi,
> 
> i would rather compare int4 too, but the snp_id can be 
> something like "abc123" unfortunately.
> 
> "Hakan Kocaman" <[EMAIL PROTECTED]> wrote on 08/03/06 11:08 am:
> > Hi,
> > 
> > 
> >> -Original Message-
> >> From: Richard Huxton [mailto:[EMAIL PROTECTED] 
> >> Sent: Thursday, August 03, 2006 11:00 AM
> >> To: Christian Rengstl
> >> Cc: Hakan Kocaman; pgsql-general@postgresql.org 
> >> Subject: Re: [GENERAL] Query performance
> >> 
> >> 
> >> Christian Rengstl wrote:
> >> > Hi,
> >> > 
> >> > the complete query is the one i posted, but here comes the 
> >> schema for mytable:
> >> > entry_no int8 NOT NULL DEFAULT 
> nextval('entry_no_seq''::regclass),
> >> >   pid varchar(15) NOT NULL, 
> >> >   crit varchar(13) NOT NULL,
> >> >   val1 varchar(1),
> >> >   val2 varchar(1),
> >> >   aendat text,
> >> >   aennam varchar(8),
> >> >   CONSTRAINT "PK_ENTRY" PRIMARY KEY (entry_no)
> >> > 
> >> > myCritTable:
> >> >   crit varchar(13) NOT NULL,
> >> >   chr int2,
> >> >   aendat timestamp,
> >> >   CONSTRAINT pk_crit_master PRIMARY KEY (crit)
> >> 
> >> Still doesn't match the EXPLAIN output - where's snp_id? 
> >> Where's table 
> >> test2?
> >> 
> > 
> > Yep, that bothered me too.
> > 
> >> > My server is 8.1.4. As a matter of fact, i have no idea 
> >> where the text 
> >>  > type comes from, because as you can see from above 
> there are only
> >>  > varchar with maximum 15 characters.
> >> 
> >> PG is casting it to text. There's no real difference between 
> >> the types 
> >> (other than the size limit) and it's not expensive.
> > 
> > But wouldn't a comparison between int4 be much cheaper.
> > If i see smth like "id" (here snp_id) in a fieldname it should be a 
> > int-type, i think.
> > 
> >> 
> >> > "Hakan Kocaman" <[EMAIL PROTECTED]> wrote on 08/03/06 
> >> 10:34 am:
> >> >> Hi,
> >> >>
> >> >> can you post the complete query,schema- and 
> >> >> table-definition,server-version etc. ?
> >> >> This will help to identity the main problem.
> >> >>
> >> >> So at the moment i'm just guessing:
> >> >>
> >> >> Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text)
> >> >> ->  Bitmap Heap Scan on test2  (cost=232.92..132766.66 
> >>  rows=37120 
> >> >> width=23) 
> >> >> (actual time=291.600..356707.737 rows=37539 loops=1)
> >> >> This part is very expensive, but i got no clue why.
> >> 
> >> Yep, it looks like the "Bitmap Heap Scan" is at the heart of 
> >> this. You 
> >> might want to increase work_mem, it could be that the bitmap 
> >> is spilling 
> >> to disk (which is much slower than keeping it all in RAM)
> >> 
> >> http://www.postgresql.org/docs/8.1/static/runtime-config-resou 
> > rce.html#RUNTIME-CONFIG-RESOURCE-MEMORY
> > 
> > If it's still at 1MB (1024), try 5MB, 10MB, 50MB and see 
> what happens.
> > 
> > -- 
> >Richard Huxton
> >Archonet Ltd
> > 
> > 
> > 
> > Hakan Kocaman
> > Software-Development
> > 
> > digame.de GmbH
> > Richard-Byrd-Str. 4-8
> > 50829 Köln
> > 
> > Tel.: +49 (0) 221 59 68 88 31
> > Fax: +49 (0) 221 59 68 88 98
> > Email: [EMAIL PROTECTED] 
> > 
> > ---(end of 
> broadcast)---
> > TIP 9: In versions below 8.0, the planner will ignore your desire to
> >choose an index scan if your joining column's 
> datatypes do not
> >match
> 
> 
> --
> Christian Rengstl M.A.
> Klinik und Poliklinik für Innere Medizin II
> Kardiologie - Forschung
> Universitätsklinikum Regensburg
> B3 1.388
> Franz-Josef-Strauss-Allee 11
> 93053 Regensburg
> Tel.: +49-941-944-7230
> 
> 

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Query performance

2006-08-03 Thread Christian Rengstl
Hi,

i would rather compare int4 too, but the snp_id can be something like "abc123" 
unfortunately.

"Hakan Kocaman" <[EMAIL PROTECTED]> wrote on 08/03/06 11:08 am:
> Hi,
> 
> 
>> -Original Message-
>> From: Richard Huxton [mailto:[EMAIL PROTECTED] 
>> Sent: Thursday, August 03, 2006 11:00 AM
>> To: Christian Rengstl
>> Cc: Hakan Kocaman; pgsql-general@postgresql.org 
>> Subject: Re: [GENERAL] Query performance
>> 
>> 
>> Christian Rengstl wrote:
>> > Hi,
>> > 
>> > the complete query is the one i posted, but here comes the 
>> schema for mytable:
>> > entry_no int8 NOT NULL DEFAULT nextval('entry_no_seq''::regclass),
>> >   pid varchar(15) NOT NULL, 
>> >   crit varchar(13) NOT NULL,
>> >   val1 varchar(1),
>> >   val2 varchar(1),
>> >   aendat text,
>> >   aennam varchar(8),
>> >   CONSTRAINT "PK_ENTRY" PRIMARY KEY (entry_no)
>> > 
>> > myCritTable:
>> >   crit varchar(13) NOT NULL,
>> >   chr int2,
>> >   aendat timestamp,
>> >   CONSTRAINT pk_crit_master PRIMARY KEY (crit)
>> 
>> Still doesn't match the EXPLAIN output - where's snp_id? 
>> Where's table 
>> test2?
>> 
> 
> Yep, that bothered me too.
> 
>> > My server is 8.1.4. As a matter of fact, i have no idea 
>> where the text 
>>  > type comes from, because as you can see from above there are only
>>  > varchar with maximum 15 characters.
>> 
>> PG is casting it to text. There's no real difference between 
>> the types 
>> (other than the size limit) and it's not expensive.
> 
> But wouldn't a comparison between int4 be much cheaper.
> If i see smth like "id" (here snp_id) in a fieldname it should be a 
> int-type, i think.
> 
>> 
>> > "Hakan Kocaman" <[EMAIL PROTECTED]> wrote on 08/03/06 
>> 10:34 am:
>> >> Hi,
>> >>
>> >> can you post the complete query,schema- and 
>> >> table-definition,server-version etc. ?
>> >> This will help to identity the main problem.
>> >>
>> >> So at the moment i'm just guessing:
>> >>
>> >> Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text)
>> >> ->  Bitmap Heap Scan on test2  (cost=232.92..132766.66 
>>  rows=37120 
>> >> width=23) 
>> >>   (actual time=291.600..356707.737 rows=37539 loops=1)
>> >> This part is very expensive, but i got no clue why.
>> 
>> Yep, it looks like the "Bitmap Heap Scan" is at the heart of 
>> this. You 
>> might want to increase work_mem, it could be that the bitmap 
>> is spilling 
>> to disk (which is much slower than keeping it all in RAM)
>> 
>> http://www.postgresql.org/docs/8.1/static/runtime-config-resou 
> rce.html#RUNTIME-CONFIG-RESOURCE-MEMORY
> 
> If it's still at 1MB (1024), try 5MB, 10MB, 50MB and see what happens.
> 
> -- 
>Richard Huxton
>Archonet Ltd
> 
> 
> 
> Hakan Kocaman
> Software-Development
> 
> digame.de GmbH
> Richard-Byrd-Str. 4-8
> 50829 Köln
> 
> Tel.: +49 (0) 221 59 68 88 31
> Fax: +49 (0) 221 59 68 88 98
> Email: [EMAIL PROTECTED] 
> 
> ---(end of broadcast)---
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>choose an index scan if your joining column's datatypes do not
>match


--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Query performance

2006-08-03 Thread Hakan Kocaman
Hi,


> -Original Message-
> From: Richard Huxton [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, August 03, 2006 11:00 AM
> To: Christian Rengstl
> Cc: Hakan Kocaman; pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Query performance
> 
> 
> Christian Rengstl wrote:
> > Hi,
> > 
> > the complete query is the one i posted, but here comes the 
> schema for mytable:
> > entry_no int8 NOT NULL DEFAULT nextval('entry_no_seq''::regclass),
> >   pid varchar(15) NOT NULL, 
> >   crit varchar(13) NOT NULL,
> >   val1 varchar(1),
> >   val2 varchar(1),
> >   aendat text,
> >   aennam varchar(8),
> >   CONSTRAINT "PK_ENTRY" PRIMARY KEY (entry_no)
> > 
> > myCritTable:
> >   crit varchar(13) NOT NULL,
> >   chr int2,
> >   aendat timestamp,
> >   CONSTRAINT pk_crit_master PRIMARY KEY (crit)
> 
> Still doesn't match the EXPLAIN output - where's snp_id? 
> Where's table 
> test2?
> 

Yep, that bothered me too.

> > My server is 8.1.4. As a matter of fact, i have no idea 
> where the text 
>  > type comes from, because as you can see from above there are only
>  > varchar with maximum 15 characters.
> 
> PG is casting it to text. There's no real difference between 
> the types 
> (other than the size limit) and it's not expensive.

But wouldn't a comparison between int4 be much cheaper.
If i see smth like "id" (here snp_id) in a fieldname it should be a int-type, i 
think.

> 
> > "Hakan Kocaman" <[EMAIL PROTECTED]> wrote on 08/03/06 
> 10:34 am:
> >> Hi,
> >>
> >> can you post the complete query,schema- and 
> >> table-definition,server-version etc. ?
> >> This will help to identity the main problem.
> >>
> >> So at the moment i'm just guessing:
> >>
> >> Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text)
> >> ->  Bitmap Heap Scan on test2  (cost=232.92..132766.66 
>  rows=37120 
> >> width=23) 
> >>(actual time=291.600..356707.737 rows=37539 loops=1)
> >> This part is very expensive, but i got no clue why.
> 
> Yep, it looks like the "Bitmap Heap Scan" is at the heart of 
> this. You 
> might want to increase work_mem, it could be that the bitmap 
> is spilling 
> to disk (which is much slower than keeping it all in RAM)
> 
> http://www.postgresql.org/docs/8.1/static/runtime-config-resou
rce.html#RUNTIME-CONFIG-RESOURCE-MEMORY

If it's still at 1MB (1024), try 5MB, 10MB, 50MB and see what happens.

-- 
   Richard Huxton
   Archonet Ltd



Hakan Kocaman
Software-Development

digame.de GmbH
Richard-Byrd-Str. 4-8
50829 Köln

Tel.: +49 (0) 221 59 68 88 31
Fax: +49 (0) 221 59 68 88 98
Email: [EMAIL PROTECTED]

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Query performance

2006-08-03 Thread Richard Huxton

Christian Rengstl wrote:

Hi,

the complete query is the one i posted, but here comes the schema for mytable:
entry_no int8 NOT NULL DEFAULT nextval('entry_no_seq''::regclass),
  pid varchar(15) NOT NULL, 
  crit varchar(13) NOT NULL,

  val1 varchar(1),
  val2 varchar(1),
  aendat text,
  aennam varchar(8),
  CONSTRAINT "PK_ENTRY" PRIMARY KEY (entry_no)

myCritTable:
  crit varchar(13) NOT NULL,
  chr int2,
  aendat timestamp,
  CONSTRAINT pk_crit_master PRIMARY KEY (crit)


Still doesn't match the EXPLAIN output - where's snp_id? Where's table 
test2?


My server is 8.1.4. As a matter of fact, i have no idea where the text 

> type comes from, because as you can see from above there are only
> varchar with maximum 15 characters.

PG is casting it to text. There's no real difference between the types 
(other than the size limit) and it's not expensive.



"Hakan Kocaman" <[EMAIL PROTECTED]> wrote on 08/03/06 10:34 am:

Hi,

can you post the complete query,schema- and 
table-definition,server-version etc. ?

This will help to identity the main problem.

So at the moment i'm just guessing:

Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text)
->  Bitmap Heap Scan on test2  (cost=232.92..132766.66  rows=37120 
width=23) 
	(actual time=291.600..356707.737 rows=37539 loops=1)

This part is very expensive, but i got no clue why.


Yep, it looks like the "Bitmap Heap Scan" is at the heart of this. You 
might want to increase work_mem, it could be that the bitmap is spilling 
to disk (which is much slower than keeping it all in RAM)


http://www.postgresql.org/docs/8.1/static/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-MEMORY

If it's still at 1MB (1024), try 5MB, 10MB, 50MB and see what happens.
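
For example, just for the session (a sketch; on 8.1 the value is given in kB):

SET work_mem = 10240;        -- 10 MB
EXPLAIN ANALYZE
SELECT val1, val2, crit FROM mytable
WHERE pid = 'somepid' AND crit IN (SELECT crit FROM myCritTable);
RESET work_mem;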

--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Query performance

2006-08-03 Thread Hakan Kocaman
Hi,

can you post the complete query, schema- and table-definition, server-version 
etc.?
This will help to identify the main problem.

So at the moment i'm just guessing:

Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text)
->  Bitmap Heap Scan on test2  (cost=232.92..132766.66  rows=37120 
width=23) 
 (actual time=291.600..356707.737 
rows=37539 loops=1)
This part is very expensive, but i got no clue why.
Maybe the text-type is not so ideal.

Best regards

Hakan Kocaman
Software-Development

digame.de GmbH
Richard-Byrd-Str. 4-8
50829 Köln

Tel.: +49 (0) 221 59 68 88 31
Fax: +49 (0) 221 59 68 88 98
Email: [EMAIL PROTECTED]



> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of 
> Christian Rengstl
> Sent: Thursday, August 03, 2006 10:13 AM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] Query performance
> 
> 
> Hi everyone,
> 
> i have a table with around 57 million tuples, with the 
> following columns: pid(varchar), crit(varchar), 
> val1(varchar), val2(varchar). Example:
> pidcritval1val2
> p1  c1  xy
> p1  c2  xz
> p1  c3  yx
> ...
> What i am doing is to query all val1 and val2 for one pid and 
> all crit values:
> 
> select val1, val2, crit from mytable where pid='somepid' and 
> crit in(select crit from myCritTable);
> where myCritTable is a table that contains all crit values 
> (around 42.000) ordered by their insertion date.
> 
> 
> QUERY PLAN
> 
> --
> --
> --
>  Hash IN Join  (cost=1033.67..134959.41 rows=37120 width=23) 
> (actual time=357.11
> 6..356984.535 rows=37539 loops=1)
>Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text)
>->  Bitmap Heap Scan on test2  (cost=232.92..132766.66 
> rows=37120 width=23) (
> actual time=291.600..356707.737 rows=37539 loops=1)
>  Recheck Cond: ((pid)::text = '1'::text)
>  ->  Bitmap Index Scan on idx_test2_pid  
> (cost=0.00..232.92 rows=37120 w
> idth=0) (actual time=234.516..234.516 rows=37539 loops=1)
>Index Cond: ((pid)::text = '1'::text)
>->  Hash  (cost=700.20..700.20 rows=40220 width=13) 
> (actual time=65.055..65.0
> 55 rows=40220 loops=1)
>  ->  Seq Scan on snps_test  (cost=0.00..700.20 
> rows=40220 width=13) (act
> ual time=0.020..30.131 rows=40220 loops=1)
>  Total runtime: 357017.259 ms
> 
> Unfortunately the query takes pretty long for the big table, 
> so maybe one of you has a suggestion on how to make it faster.
> 
> --
> Christian Rengstl M.A.
> Klinik und Poliklinik für Innere Medizin II
> Kardiologie - Forschung
> Universitätsklinikum Regensburg
> B3 1.388
> Franz-Josef-Strauss-Allee 11
> 93053 Regensburg
> Tel.: +49-941-944-7230
> 
> 
> ---(end of 
> broadcast)---
> TIP 4: Have you searched our list archives?
> 
>http://archives.postgresql.org
> 

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Query performance

2006-08-03 Thread Christian Rengstl
Hi,

the complete query is the one i posted, but here comes the schema for mytable:
entry_no int8 NOT NULL DEFAULT nextval('entry_no_seq''::regclass),
  pid varchar(15) NOT NULL, 
  crit varchar(13) NOT NULL,
  val1 varchar(1),
  val2 varchar(1),
  aendat text,
  aennam varchar(8),
  CONSTRAINT "PK_ENTRY" PRIMARY KEY (entry_no)

myCritTable:
  crit varchar(13) NOT NULL,
  chr int2,
  aendat timestamp,
  CONSTRAINT pk_crit_master PRIMARY KEY (crit)

My server is 8.1.4. As a matter of fact, i have no idea where the text type 
comes from, because as you can see from above there are only varchar with 
maximum 15 characters.

"Hakan Kocaman" <[EMAIL PROTECTED]> wrote on 08/03/06 10:34 am:
> Hi,
> 
> can you post the complete query,schema- and 
> table-definition,server-version etc. ?
> This will help to identity the main problem.
> 
> So at the moment i'm just guessing:
> 
> Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text)
> ->  Bitmap Heap Scan on test2  (cost=232.92..132766.66  rows=37120 
> width=23) 
>(actual time=291.600..356707.737 
> rows=37539 loops=1)
> This part is very expensive, but i got no clue why.
> Maybe the text-type is not so ideal.
> 
> Best regards
> 
> Hakan Kocaman
> Software-Development
> 
> digame.de GmbH
> Richard-Byrd-Str. 4-8
> 50829 Köln
> 
> Tel.: +49 (0) 221 59 68 88 31
> Fax: +49 (0) 221 59 68 88 98
> Email: [EMAIL PROTECTED] 
> 
> 
> 
>> -Original Message-
>> From: [EMAIL PROTECTED] 
>> [mailto:[EMAIL PROTECTED] On Behalf Of 
>> Christian Rengstl
>> Sent: Thursday, August 03, 2006 10:13 AM
>> To: pgsql-general@postgresql.org 
>> Subject: [GENERAL] Query performance
>> 
>> 
>> Hi everyone,
>> 
>> i have a table with around 57 million tuples, with the 
>> following columns: pid(varchar), crit(varchar), 
>> val1(varchar), val2(varchar). Example:
>> pidcritval1val2
>> p1  c1  xy
>> p1  c2  xz
>> p1  c3  yx
>> ...
>> What i am doing is to query all val1 and val2 for one pid and 
>> all crit values:
>> 
>> select val1, val2, crit from mytable where pid='somepid' and 
>> crit in(select crit from myCritTable);
>> where myCritTable is a table that contains all crit values 
>> (around 42.000) ordered by their insertion date.
>> 
>> 
>> QUERY PLAN
>> 
>> --
>> --
>> --
>>  Hash IN Join  (cost=1033.67..134959.41 rows=37120 width=23) 
>> (actual time=357.11
>> 6..356984.535 rows=37539 loops=1)
>>Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text)
>>->  Bitmap Heap Scan on test2  (cost=232.92..132766.66 
>> rows=37120 width=23) (
>> actual time=291.600..356707.737 rows=37539 loops=1)
>>  Recheck Cond: ((pid)::text = '1'::text)
>>  ->  Bitmap Index Scan on idx_test2_pid  
>> (cost=0.00..232.92 rows=37120 w
>> idth=0) (actual time=234.516..234.516 rows=37539 loops=1)
>>Index Cond: ((pid)::text = '1'::text)
>>->  Hash  (cost=700.20..700.20 rows=40220 width=13) 
>> (actual time=65.055..65.0
>> 55 rows=40220 loops=1)
>>  ->  Seq Scan on snps_test  (cost=0.00..700.20 
>> rows=40220 width=13) (act
>> ual time=0.020..30.131 rows=40220 loops=1)
>>  Total runtime: 357017.259 ms
>> 
>> Unfortunately the query takes pretty long for the big table, 
>> so maybe one of you has a suggestion on how to make it faster.
>> 
>> --
>> Christian Rengstl M.A.
>> Klinik und Poliklinik für Innere Medizin II
>> Kardiologie - Forschung
>> Universitätsklinikum Regensburg
>> B3 1.388
>> Franz-Josef-Strauss-Allee 11
>> 93053 Regensburg
>> Tel.: +49-941-944-7230
>> 
>> 
>> ---(end of 
>> broadcast)---
>> TIP 4: Have you searched our list archives?
>> 
>>http://archives.postgresql.org 
>> 


--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] Query performance

2006-08-03 Thread Nikolay Samokhvalov

On 8/3/06, Christian Rengstl <[EMAIL PROTECTED]> wrote:

...
Unfortunately the query takes pretty long for the big table, so maybe one of 
you has a suggestion on how to make it faster.



try smth like this:

select val1, val2, crit from mytable as a where pid='somepid' and
exists(select 1 from myCritTable as b where a.crit = b.crit);


--
Best regards,
Nikolay

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Query performance problem

2005-03-19 Thread Greg Stark
Paul Tillotson <[EMAIL PROTECTED]> writes:

> >Total runtime: 12.241 ms
> >
> > Still this is a third of the time of the sub-query route but 4 times longer
> > than mysql - this must be an install issue?
> 
> Just about any query will usually take a few milliseconds (try SELECT 1; to 
> see
> the absolute lowest), and so 12 ms is probably about as good as you can  get.
> For my own part, I consider 50 ms good enough for any query that is not run
> inside of a loop.  

Consider that typical drive seek times are on the order of 10ms. So if you're
getting anything better than that from MySQL from *any* query it's purely
because all the data is cached in RAM. If you can afford to keep your entire
data set cached in RAM and are worried about guaranteeing response like 1-3ms
then perhaps you should consider whether a database is the right solution for
you. Perhaps something simpler like libdb or memcached would be more
appropriate.

-- 
greg


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [GENERAL] Query performance problem

2005-03-19 Thread Paul Tillotson
Phil,
Just about any query will usually take a few milliseconds (try SELECT 1; 
to see the absolute lowest), and so 12 ms is probably about as good as 
you can  get.  For my own part, I consider 50 ms good enough for any 
query that is not run inside of a loop.  If you want to write suitably 
efficient code/SQL for this, I suggest filling your tables with more 
data (say, 10 times as much as you have now) and then see how the 
timings work. 

Are you already working with what you would consider a "typical" data 
size?  Or is it smaller than what someone would typically have?

If you post any more timings on this list, please post the EXPLAIN 
ANALYZE as well.  This allows us to see what plan the planner picked, 
how much time each step took, and how many rows were actually affected.  
To get the EXPLAIN ANALYZE, just type EXPLAIN ANALYZE followed by your query and copy the output.
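
For example, with the left-join query from this thread (note that PostgreSQL
writes the limit as LIMIT 30, not MySQL's LIMIT 0, 30):

EXPLAIN ANALYZE
SELECT chartmaster.accountcode, periods.periodno
FROM chartmaster
INNER JOIN periods ON true
LEFT JOIN chartdetails
       ON chartmaster.accountcode = chartdetails.accountcode
      AND periods.periodno = chartdetails.period
WHERE periods.periodno BETWEEN 1 AND 63
  AND chartdetails.accountcode IS NULL
LIMIT 30;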

Regards,
Paul Tillotson
Phil Daintree wrote:
I can also do the same thing without sub-queries - I messed about some more 
since I was keen to ensure backward compatibility with prior versions of 
mysql that have left/right joins but no subqueries ... quite a bit quicker 
still!

Query took 0.0037 sec - 1/10th of the sub-query time.
SELECT chartmaster.accountcode, periods.periodno
FROM chartmaster INNER JOIN periods ON True
LEFT JOIN chartdetails ON chartmaster.accountcode = chartdetails.accountcode 
AND periods.periodno = chartdetails.period
WHERE periods.periodno >=1 AND periods.periodno <=63 AND 
chartdetails.accountcode IS NULL  LIMIT 0 , 30

In postgres:
SQL executed.
Total runtime: 12.241 ms
Still this is a third of the time of the sub-query route but 4 times longer 
than mysql - this must be an install issue?

Thanks again for this idea Paul
phil
 


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [GENERAL] Query performance problem

2005-03-18 Thread Phil Daintree
I can also do the same thing without sub-queries - I messed about some more 
since I was keen to ensure backward compatibility with prior versions of 
mysql that have left/right joins but no subqueries ... quite a bit quicker 
still!

Query took 0.0037 sec - 1/10th of the sub-query time.

SELECT chartmaster.accountcode, periods.periodno
FROM chartmaster INNER JOIN periods ON True
LEFT JOIN chartdetails ON chartmaster.accountcode = chartdetails.accountcode 
AND periods.periodno = chartdetails.period
WHERE periods.periodno >=1 AND periods.periodno <=63 AND 
chartdetails.accountcode IS NULL  LIMIT 0 , 30


In postgres:

SQL executed.

Total runtime: 12.241 ms

Still this is a third of the time of the sub-query route but 4 times longer 
than mysql - this must be an install issue?


Thanks again for this idea Paul

phil


On Fri, 18 Mar 2005 14:07, you wrote:
> See the syntax for INSERT ... SELECT shown here:
> http://www.postgresql.org/docs/8.0/static/sql-insert.html
>
> Instead of doing a nested loop to INSERT new records, do it like this:
>
> For ($period = start; $period < end; $period++)
> {
> INSERT INTO chartdetails (accountcode, period)
>   SELECT accountcode, $period FROM chartdetails WHERE
> (accountcode, $period) NOT IN (
>   SELECT accountcode, period FROM chardetails WHERE period =
> $period
>   );
> }
>
> Or if you have some table that has 1 row for each period (call it
> "periods") then you could simply do:
>
> INSERT INTO chartdetails (accountcode, period)
>  SELECT accountcode, period FROM accountcode, period
>  WHERE (period BETWEEN $start AND $end) AND (accountcode,
> period) NOT IN (
>  SELECT accountcode, period FROM chartdetails WHERE period
> BETWEEN $start AND $end
>   );
>
> Note to others: see the legitimate use of an unconstrained CROSS JOIN?
>
> --
>
> Postgres's SELECT count(*) is slow if many records meet the WHERE clause
> being used.  It looks like you're only using testing for 0 or >0 in your
> query, so you could use:
>
> SELECT EXISTS (SELECT 1 FROM chartdetails WHERE ...);
>
> This will be much faster since with EXISTS, postgres only runs the query
> long enough to find out whether even one row would be returned--if so,
> it stops.
>
> Regards,
> Paul Tillotson
>
> Phil Daintree wrote:
> >Dear psqlers,
> >
> >I need your help!
> >
> >I administer/develop an open source PHP accounting software project
> > (webERP) [snip]
> >
> >
> > $ChartAccounts = DB_query('SELECT accountcode FROM chartmaster',$db);
> >
> > While ($AccountRow = DB_fetch_array($ChartAccounts)){
> >
> > for ($PeriodNo=$CreateFrom;$PeriodNo <= $CreateTo;$PeriodNo++) {
> >
> > echo '' . _('Period Number') . ' ' . $PeriodNo . 
> > '';
> >
> > // Check if there is a chart details record set up
> > $sql = 'SELECT count(*) FROM chartdetails
> > WHERE 
> > accountcode='.$AccountRow['accountcode'].'
> > AND period=' . $PeriodNo;
> > $InsChartDetails = DB_query($sql,$db,'','','',false);
> > $CountRows = DB_fetch_row($InsChartDetails);
> > $AccountExistsAlready = $CountRows[0];
> > DB_free_result($InsChartDetails);
> > if(! $AccountExistsAlready) {
> > $sql = 'INSERT INTO chartdetails (accountcode,
> > period)
> > VALUES (' . $AccountRow['accountcode'] 
> > . ',
> > ' . $PeriodNo . ')';
> > $InsChartDetails = DB_query($sql,$db);
> > DB_free_result($InsChartDetails);
> > }
> >
> > }
> >
> > /*Now run through each of the new chartdetail records created for each
> >account and update them with the B/Fwd and B/Fwd budget no updates would
> > be required where there were previously no chart details set up ie
> >FirstPeriodPostedTo > 0 */
> >
> > for ($PeriodNo=$CreateFrom;$PeriodNo<=$CreateTo; $PeriodNo++) {
> >
> > $sql = 'SELECT accountcode,
> > period,
> > actual + bfwd AS cfwd,
> > budget + bfwdbudget AS cfwdbudget
> > FROM chartdetails WHERE period =' . ($PeriodNo  
> > - 1);
> > $ChartDetailsCFwd = DB_query($sql,$db);
> >
> > while ($myrow = DB_fetch_array($ChartDetailsCFwd)){
> >
> > $sql = 'UPDATE chartdetails SET bfwd =' . 
> > $myrow['cfwd'] . ',
> > bfwdbudget =' . 
> > $myrow['cfwdbudget'] . '
> >

Re: [GENERAL] Query performance problem

2005-03-18 Thread Phil Daintree
The first time I ran it, it took 5127.243 ms ... then I did a full vacuum.

then ...

SQL executed.

Total runtime: 33.707 ms
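
(The "full vacuum" here is, roughly speaking, a statement along the lines of

VACUUM FULL ANALYZE;

which compacts the tables to reclaim the space held by dead tuples and
refreshes the planner statistics, hence the dramatic drop in runtime.)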


I am keen to just have the one lot of code all in the scripts ... so  I was 
pleased when the identical sql also worked on mysql!!!

Your SQL-query has been executed successfully (Query took 0.0350 sec)

SQL-query:

SELECT chartmaster.accountcode, periods.periodno
FROM chartmaster, periods
WHERE (
periods.periodno
BETWEEN 1 AND 12
) AND (
chartmaster.accountcode, periods.periodno
) NOT IN 
(SELECT accountcode, period
FROM chartdetails
WHERE period
BETWEEN 1 AND 12
) LIMIT 0 , 30

You'll notice the discrepancy on the timings though!

Whilst pg is not performing the way mysql does with innodb - it is at least 
usable this way. I am guessing there is some gremlin with my install - I'll 
try an upgrade to v 8.

Phil


On Fri, 18 Mar 2005 14:07, you wrote:
> See the syntax for INSERT ... SELECT shown here:
> http://www.postgresql.org/docs/8.0/static/sql-insert.html
>
> Instead of doing a nested loop to INSERT new records, do it like this:
>
> For ($period = start; $period < end; $period++)
> {
> INSERT INTO chartdetails (accountcode, period)
>   SELECT accountcode, $period FROM chartdetails WHERE
> (accountcode, $period) NOT IN (
>   SELECT accountcode, period FROM chartdetails WHERE period =
> $period
>   );
> }
>
> Or if you have some table that has 1 row for each period (call it
> "periods") then you could simply do:
>
> INSERT INTO chartdetails (accountcode, period)
>  SELECT accountcode, period FROM chartmaster, periods
>  WHERE (period BETWEEN $start AND $end) AND (accountcode,
> period) NOT IN (
>  SELECT accountcode, period FROM chartdetails WHERE period
> BETWEEN $start AND $end
>   );
>
> Note to others: see the legitimate use of an unconstrained CROSS JOIN?
>
> --
>
> Postgres's SELECT count(*) is slow if many records meet the WHERE clause
> being used.  It looks like you're only using testing for 0 or >0 in your
> query, so you could use:
>
> SELECT EXISTS (SELECT 1 FROM chartdetails WHERE ...);
>
> This will be much faster since with EXISTS, postgres only runs the query
> long enough to find out whether even one row would be returned--if so,
> it stops.
>
> Regards,
> Paul Tillotson
>
> Phil Daintree wrote:
> >Dear psqlers,
> >
> >I need your help!
> >
> >I administer/develop an open source PHP accounting software project
> > (webERP) [snip]
> >
> >
> > $ChartAccounts = DB_query('SELECT accountcode FROM chartmaster',$db);
> >
> > While ($AccountRow = DB_fetch_array($ChartAccounts)){
> >
> > for ($PeriodNo=$CreateFrom;$PeriodNo <= $CreateTo;$PeriodNo++) {
> >
> > echo '' . _('Period Number') . ' ' . $PeriodNo . 
> > '';
> >
> > // Check if there is a chart details record set up
> > $sql = 'SELECT count(*) FROM chartdetails
> > WHERE 
> > accountcode='.$AccountRow['accountcode'].'
> > AND period=' . $PeriodNo;
> > $InsChartDetails = DB_query($sql,$db,'','','',false);
> > $CountRows = DB_fetch_row($InsChartDetails);
> > $AccountExistsAlready = $CountRows[0];
> > DB_free_result($InsChartDetails);
> > if(! $AccountExistsAlready) {
> > $sql = 'INSERT INTO chartdetails (accountcode,
> > period)
> > VALUES (' . $AccountRow['accountcode'] 
> > . ',
> > ' . $PeriodNo . ')';
> > $InsChartDetails = DB_query($sql,$db);
> > DB_free_result($InsChartDetails);
> > }
> >
> > }
> >
> > /*Now run through each of the new chartdetail records created for each
> >account and update them with the B/Fwd and B/Fwd budget no updates would
> > be required where there were previously no chart details set up ie
> >FirstPeriodPostedTo > 0 */
> >
> > for ($PeriodNo=$CreateFrom;$PeriodNo<=$CreateTo; $PeriodNo++) {
> >
> > $sql = 'SELECT accountcode,
> > period,
> > actual + bfwd AS cfwd,
> > budget + bfwdbudget AS cfwdbudget
> > FROM chartdetails WHERE period =' . ($PeriodNo  
> > - 1);
> > $ChartDetailsCFwd = DB_query($sql,$db);
> >
> > while ($myrow = DB_fetch_array($ChartDetailsCFwd)){
> >
> > $sql = 'UPDATE chartdetails SET bfwd =' . 
> > $myrow['cfwd'] . ',
> > 

Re: [GENERAL] Query performance problem

2005-03-18 Thread Paul Tillotson
Phil Daintree wrote:
Appreciated you help Paul - many thanks for taking the time.
 

I view this as merely passing on all the pearls of wisdom I have gleaned 
from this list.  : )

Advice:
Are you running this inside a transaction?  Do so, because if you don't,
then each UPDATE or INSERT or SELECT runs inside its own transaction,
and committing each transaction has overhead associated with it.
   


It looks like just putting this code inside a transaction has dramatically 
reduced the problem. Of course I knew this but it needed me to be embarassed 
to actually do it :-)

 

Glad to hear that this helped.  In case you are interested, the reason 
this makes such a dramatic difference is that each transaction's commit 
record must be logged to the commit log, which is a sequentially written 
file.  (Thus, you can only commit one transaction per revolution of the 
disk, and so if you have a 7200 rpm disk, you can't get more than 120 
transactions / second on a safely configured system unless your drive 
has a battery-backed write cache.)
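
A minimal sketch of what that change looks like (the statements webERP
already issues through DB_query stay exactly the same; they are simply
bracketed by one explicit transaction):

BEGIN;
-- ... all of the per-account SELECTs, INSERTs and UPDATEs from the loop ...
COMMIT;

That way only one commit record has to be flushed to disk, instead of one
per statement.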

This block of code is INSIDE a while loop that loops once for each row
in chartmaster:
for ($PeriodNo=$CreateFrom;$PeriodNo<=$CreateTo; $PeriodNo++) {
$sql = 'SELECT accountcode,
period,
actual + bfwd AS cfwd,
budget + bfwdbudget AS cfwdbudget
FROM chartdetails WHERE period =' . ($PeriodNo  
- 1);
$ChartDetailsCFwd = DB_query($sql,$db);
while ($myrow = DB_fetch_array($ChartDetailsCFwd)){
$sql = 'UPDATE chartdetails SET bfwd =' . 
$myrow['cfwd'] . ',
bfwdbudget =' . 
$myrow['cfwdbudget'] . '
WHERE accountcode = ' . 
$myrow['accountcode'] . '
AND period >=' . $PeriodNo;
$UpdChartDetails = DB_query($sql,$db, '', '', 
'', false);
DB_free_result($UpdChartDetails);
}
DB_free_result($ChartDetailsCFwd);
}
It looks like you're updating the same row in chartmaster multiple
times.  
   

chartmaster is not being updated ... 

Sorry--I mean chartdetails.
One tip to remember:  If you have a statement like this:
UPDATE footable SET foocol = 'value' WHERE fooid = 'id';
And it is the case that foocol might already equal value, then write 
this instead:
UPDATE footable SET foocol = 'value' WHERE fooid = 'id' AND foocol <> 
'value';

This will help because no disk write will actually happen if foocol 
happens to already equal value.
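
Applied to the statements in this thread, that pattern would look something
like this (illustrative literal values standing in for the PHP variables):

UPDATE chartdetails SET bfwd = 42, bfwdbudget = 10
WHERE accountcode = 1000 AND period >= 5
  AND (bfwd <> 42 OR bfwdbudget <> 10);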

chartdetails is - the chartdetails 
relating to each chartmaster record for all periods >= $PeriodNo. I have to 
update all the following periods, as the balance b/fwd for all successive 
periods has now increased if we post a transaction back in time. Normally 
there might only be a few chartdetails records for the chartmaster account 
under review with a period later than the one being 
posted.

 

Am I correct in thinking that bfwd is basically a running tally of 
actual, and bfwdbudget is a running tally of budget, as one might 
normally find in a spreadsheet?

If so, you could use this view to calculate the correct value for every 
location in the table:

CREATE VIEW newtotals AS SELECT
   thismonth.accountcode,
   thismonth.periodno,
   (SELECT SUM(actual) FROM chartdetails AS q1 WHERE q1.accountcode = 
thismonth.accountcode AND q1.periodno < thismonth.periodno) as cfwd,
   (SELECT SUM(budget) FROM chartdetails AS q1 WHERE q1.accountcode = 
thismonth.accountcode AND q1.periodno < thismonth.periodno) as cfwdbudget
FROM chartdetails AS thismonth;

And then you could use an update statement:
UPDATE chartdetails
SET bfwd = cfwd, bfwdbudget = cfwdbudget
FROM newtotals
WHERE chartdetails.accountcode = newtotals.accountcode AND 
chartdetails.periodno = newtotals.periodno -- JOIN condition
   AND period BETWEEN $CreateFrom AND $CreateTo
   AND (bfwd <> cfwd OR bfwdbudget <> cfwdbudget); -- AVOID needless 
updates.

Since I don't have your tables to work with, this might need minor 
syntax tweaking, but I'm pretty sure it will work.
I think MySQL doesn't support views yet, but you could replace 
"newtotals" in the above update with a big subselect (which I think they 
are supposed to support in the latest 4.x version.)

Also: if you don't already have one, that UPDATE statement would 
probably use a compound index on (accountcode, periodno).
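
A sketch of that index (the index name below is arbitrary, and note that the
column is called periodno in the view sketch above but period in the webERP
code quoted elsewhere in this thread, so use whichever name your schema
actually has):

CREATE INDEX chartdetails_account_period_idx
    ON chartdetails (accountcode, period);  -- index name chosen for illustration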

Now I begin to comprehend why CompiereERP doesn't support MySQL.  ; )
a row in chartdetails will be updated every time there is a gltrans posting to 
the period and account of the chartdetails ie quite often.
 

If it gets updated often it will need vacuuming often as well.
M

Re: [GENERAL] Query performance problem

2005-03-17 Thread Paul Tillotson
Note: If you want to know WHY this takes so long, please tell us how 
many times each loop executes and how long each query takes.

Be sure to post an EXPLAIN ANALYZE for each of your queries that you are 
running.  This will show what plans the planner is using and how long 
they are actually taking.

--
Advice:
Are you running this inside a transaction?  Do so, because if you don't, 
then each UPDATE or INSERT or SELECT runs inside its own transaction, 
and committing each transaction has overhead associated with it.

This block of code is INSIDE a while loop that loops once for each row 
in chartmaster:

for ($PeriodNo=$CreateFrom;$PeriodNo<=$CreateTo; $PeriodNo++) {
$sql = 'SELECT accountcode,
period,
actual + bfwd AS cfwd,
budget + bfwdbudget AS cfwdbudget
FROM chartdetails WHERE period =' . ($PeriodNo  
- 1);
$ChartDetailsCFwd = DB_query($sql,$db);
while ($myrow = DB_fetch_array($ChartDetailsCFwd)){
$sql = 'UPDATE chartdetails SET bfwd =' . 
$myrow['cfwd'] . ',
bfwdbudget =' . 
$myrow['cfwdbudget'] . '
WHERE accountcode = ' . 
$myrow['accountcode'] . '
AND period >=' . $PeriodNo;
$UpdChartDetails = DB_query($sql,$db, '', '', 
'', false);
DB_free_result($UpdChartDetails);
}
DB_free_result($ChartDetailsCFwd);
}
It looks like you're updating the same row in chartmaster multiple 
times.  In postgres, because of MVCC, this will cause unnecessary table 
bloat, as EVERY TIME you update the same row, it will leave behind 
another dead tuple that must be reclaimed by VACUUM.  (In other words, 
if you update every row in chartdetails 10 times without VACUUMing, 
then this will cause the table to grow 10x.)

As I am still unsure what this code is supposed to do, please tell me: 
how many times is each row in chartdetail getting updated?  If it is 
getting updated more than once, could you find a way to update it only once?

First, why is this UPDATE statement inside the while loop at all?  It 
doesn't look like it references any members of $AccountRow, which is the 
variable that governs the WHILE loop that it is within.  Can you take it 
out of the while loop and still get the same results?

Second, could you write that update statement to say WHERE period = 
$periodno rather than WHERE period >= $period?  If not, why not?

Regards,
Paul Tillotson
Phil Daintree wrote:
Dear psqlers,
I need your help!
I administer/develop an open source PHP accounting software project (webERP) 
that was originally mysql only. Since Christmas I and another member of the 
team lower cased all the sql and changed some elements of the SQL to allow it 
to use postgres as well. All appears to work beautifully with just a single 
but important exception.

 

[snip]
Can anyone tell me why the following code chokes ... literally - this 
works almost invisibly under mysql - pg takes more than an hour even on a very 
small 30 record database. 

(The table chartmaster is just a list of general ledger accounts accountcode 
and accountdescription. PK = accountcode)

$ChartAccounts = DB_query('SELECT accountcode FROM chartmaster',$db);
While ($AccountRow = DB_fetch_array($ChartAccounts)){
for ($PeriodNo=$CreateFrom;$PeriodNo <= $CreateTo;$PeriodNo++) {
			echo '' . _('Period Number') . ' ' . $PeriodNo . '';
			
			// Check if there is a chart details record set up
			$sql = 'SELECT count(*) FROM chartdetails 
	WHERE accountcode='.$AccountRow['accountcode'].'
	AND period=' . $PeriodNo;
			$InsChartDetails = DB_query($sql,$db,'','','',false);
			$CountRows = DB_fetch_row($InsChartDetails);
			$AccountExistsAlready = $CountRows[0];
			DB_free_result($InsChartDetails);
			if(! $AccountExistsAlready) {
$sql = 'INSERT INTO chartdetails (accountcode,
			period)
	VALUES (' . $AccountRow['accountcode'] . ',
		' . $PeriodNo . ')';
$InsChartDetails = DB_query($sql,$db);
DB_free_result($InsChartDetails);
			}
			
		}

	/*Now run through each of the new chartdetail records created for each 
account and update them with the B/Fwd and B/Fwd budget no updates would be 
required where there were previously no chart details set up ie 
FirstPeriodPostedTo > 0 */

for ($PeriodNo=$CreateFrom;$PeriodNo<=$CreateTo; $PeriodNo++) {
$sql = 'SELECT accountcode,
period,
actual + bfwd AS cfwd,

Re: [GENERAL] Query performance problem

2005-03-17 Thread Paul Tillotson
See the syntax for INSERT ... SELECT shown here:
http://www.postgresql.org/docs/8.0/static/sql-insert.html
Instead of doing a nested loop to INSERT new records, do it like this:
For ($period = start; $period < end; $period++)
{
   INSERT INTO chartdetails (accountcode, period)
 SELECT accountcode, $period FROM chartdetails WHERE 
(accountcode, $period) NOT IN (
 SELECT accountcode, period FROM chartdetails WHERE period = 
$period
 );
}
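
Written out for a single period (7 is just an illustrative value, and the
account list is assumed to come from chartmaster rather than chartdetails),
one iteration of that loop boils down to something like:

INSERT INTO chartdetails (accountcode, period)
    SELECT accountcode, 7 FROM chartmaster
    WHERE (accountcode, 7) NOT IN (
        SELECT accountcode, period FROM chartdetails WHERE period = 7
    );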

Or if you have some table that has 1 row for each period (call it 
"periods") then you could simply do:

INSERT INTO chartdetails (accountcode, period)
SELECT accountcode, period FROM chartmaster, periods
WHERE (period BETWEEN $start AND $end) AND (accountcode, 
period) NOT IN (
SELECT accountcode, period FROM chartdetails WHERE period 
BETWEEN $start AND $end
 );

Note to others: see the legitimate use of an unconstrained CROSS JOIN?
--
Postgres's SELECT count(*) is slow if many records meet the WHERE clause 
being used.  It looks like you're only using testing for 0 or >0 in your 
query, so you could use:

SELECT EXISTS (SELECT 1 FROM chartdetails WHERE ...);
This will be much faster since with EXISTS, postgres only runs the query 
long enough to find out whether even one row would be returned--if so, 
it stops.
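
For instance (with illustrative account and period values standing in for
the PHP variables in the loop):

SELECT EXISTS (SELECT 1 FROM chartdetails WHERE accountcode = 1000 AND period = 5);

It returns true or false and stops scanning as soon as one matching row is
found.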

Regards,
Paul Tillotson
Phil Daintree wrote:
Dear psqlers,
I need your help!
I administer/develop an open source PHP accounting software project (webERP) 
[snip]
$ChartAccounts = DB_query('SELECT accountcode FROM chartmaster',$db);
While ($AccountRow = DB_fetch_array($ChartAccounts)){
for ($PeriodNo=$CreateFrom;$PeriodNo <= $CreateTo;$PeriodNo++) {
			echo '' . _('Period Number') . ' ' . $PeriodNo . '';
			
			// Check if there is a chart details record set up
			$sql = 'SELECT count(*) FROM chartdetails 
	WHERE accountcode='.$AccountRow['accountcode'].'
	AND period=' . $PeriodNo;
			$InsChartDetails = DB_query($sql,$db,'','','',false);
			$CountRows = DB_fetch_row($InsChartDetails);
			$AccountExistsAlready = $CountRows[0];
			DB_free_result($InsChartDetails);
			if(! $AccountExistsAlready) {
$sql = 'INSERT INTO chartdetails (accountcode,
			period)
	VALUES (' . $AccountRow['accountcode'] . ',
		' . $PeriodNo . ')';
$InsChartDetails = DB_query($sql,$db);
DB_free_result($InsChartDetails);
			}
			
		}

	/*Now run through each of the new chartdetail records created for each 
account and update them with the B/Fwd and B/Fwd budget no updates would be 
required where there were previously no chart details set up ie 
FirstPeriodPostedTo > 0 */

for ($PeriodNo=$CreateFrom;$PeriodNo<=$CreateTo; $PeriodNo++) {
$sql = 'SELECT accountcode,
period,
actual + bfwd AS cfwd,
budget + bfwdbudget AS cfwdbudget
FROM chartdetails WHERE period =' . ($PeriodNo  
- 1);
$ChartDetailsCFwd = DB_query($sql,$db);
while ($myrow = DB_fetch_array($ChartDetailsCFwd)){
$sql = 'UPDATE chartdetails SET bfwd =' . 
$myrow['cfwd'] . ',
bfwdbudget =' . 
$myrow['cfwdbudget'] . '
WHERE accountcode = ' . 
$myrow['accountcode'] . '
AND period >=' . $PeriodNo;
$UpdChartDetails = DB_query($sql,$db, '', '', 
'', false);
DB_free_result($UpdChartDetails);
}
DB_free_result($ChartDetailsCFwd);
}
}
 




Re: [GENERAL] Query performance problem

2005-03-17 Thread Richard Huxton
Phil Daintree wrote:
There are 2 tables used in the sql we need to optimise.
CREATE TABLE chartdetails (
CREATE TABLE gltrans (

So there is a chartdetail record for every period for every general ledger 
account.  So if there are 5 years x 12 periods (months) and 200 general 
ledger accounts this table will be 12,000 records.

There is a gltrans record for every side of a journal entry. This can get to 
be quite a significant table - easily more than 200,000 per annum - depending 
on the size of the business obviously.

Can anyone tell me why the following code chokes ... literally - this 
works almost invisibly under mysql - pg takes more than an hour even on a very 
small 30 record database. 
There's nothing obvious in the code below. You could probably do it all 
with one (more complex) sql query (maybe two), but it shouldn't be too 
bad as it stands.

The code doesn't seem to use gltrans at all though (unless I'm missing 
something). Is there something missing?

Anyway, the first step will be to get some accurate figures. Can you 
turn statement logging on in postgresql.conf, and timestamps as well (unless 
it's logging to syslogd, in which case you'll get them automatically)? 
That way we can identify exactly where the time is being spent.
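
The relevant postgresql.conf settings are roughly these (parameter names
have changed between releases, so treat this as an approximation for the
7.x line rather than an exact recipe):

log_statement = true
log_duration = true
log_timestamp = true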

--
  Richard Huxton
  Archonet Ltd


Re: [GENERAL] Query performance question on a large table

2004-01-07 Thread Bruno Wolff III
On Wed, Jan 07, 2004 at 02:31:22 -0500,
  Tom Lane <[EMAIL PROTECTED]> wrote:
> 
> I just a couple days ago added some logic to CVS tip to notice that the
> sub-select has a DISTINCT clause, and not add unnecessary unique-ifying
> processing on top of it.  So in 7.5, writing a DISTINCT clause will
> amount to forcing a particular query plan, which might or might not be
> the best thing but hopefully won't be too terrible.  But in 7.4 it has
> nothing to recommend it ...

Can't the DISTINCT be dropped if there isn't a LIMIT clause?
Similarly UNION, INTERSECT and EXCEPT could also be changed
to the ALL forms if there isn't a LIMIT.



Re: [GENERAL] Query performance question on a large table

2004-01-07 Thread Együd Csaba
Hi Tom,
thank you, I'll upgrade as soon as I can. Anyway I've already planned to do
so for a while.
I'll keep in mind your remarks concerning the DISTINCT clause too.

Bye and Best Regards,
-- Csaba

> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] Behalf Of Tom Lane
> Sent: 6 January 2004, 21:04
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED] (E-mail)
> Subject: Re: [GENERAL] Query performance question on a large table
>
>
> Együd Csaba <[EMAIL PROTECTED]> writes:
> > here is a sample query:
> > select  mertido, fomeazon, ertektipus, mertertek from
> t_me30 where fomeazon
> > in (select distinct fomeazon from t_fome where lower(inuse)
> = 'igen') and
> > mertido like '2003-12-17%' and ertektipus in ('+MW') order
> by mertido,
> > fomeazon, ertektipus;
>
> > Ohh, I nearly forgot the config:  Linux 7.1; Postgres 7.3.2;
>
> The first thing you ought to do is move to PG 7.4.  "foo IN
> (SELECT ...)"
> generally works a lot better under 7.4 than prior releases.
> I'd suggest
> dropping the "DISTINCT" when using 7.4, too.
>
>   regards, tom lane
>
>
>
>
>




Re: [GENERAL] Query performance question on a large table

2004-01-07 Thread Tom Lane
Sean Shanny <[EMAIL PROTECTED]> writes:
> Sort of piggybacking on this thread but why the suggestion to drop the 
> use of DISTINCT in 7.4?

Because the 7.4 planner can decide for itself whether DISTINCT'ifying
the sub-select output is the best way to proceed or not.  There is more
than one good way to do an "IN sub-SELECT" operation, and the 7.4
planner knows several.  (Pre-7.4 planners didn't know any :-( ... but
I digress.)  When you write "foo IN (SELECT DISTINCT ...)", the DISTINCT
doesn't change the semantics at all, it just adds overhead.

In fact it's worse than that: if the planner decides that the best way
to proceed is to make the subselect output unique, it will throw another
layer of sort/unique processing on top of what you did.  So writing
DISTINCT is actually a pessimization in 7.4.
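
Concretely, for the query Sean posted (quoted in full later in this thread),
the rewrite would be simply to drop the DISTINCT inside the IN, along these
lines:

select t1.raw_agent_string from d_useragent t1
where t1.id in (select useragent_key from f_pageviews
                where date_key between 356 and 362);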

>  Example:

>->  HashAggregate  (cost=1020025.13..1020025.13 rows=51 width=4) 
> (actual time=954049.317..954450.065 rows=82208 loops=1)
>  ->  Subquery Scan "IN_subquery"  (cost=983429.20..1020025.00 
> rows=51 width=4) (actual time=856641.244..953639.116 rows=82208 loops=1)
>->  Unique  (cost=983429.20..1020024.49 rows=51 width=4) 
> (actual time=856641.230..952939.539 rows=82208 loops=1)
>  ->  Sort  (cost=983429.20..1001726.84 rows=7319058 
> width=4) (actual time=856641.215..906429.835 rows=11067735 loops=1)
>Sort Key: useragent_key
>->  Index Scan using 

The sort/unique steps are coming from the DISTINCT.  The HashAggregate
step is the planner making sure the output rows are distinct :-(

I just a couple days ago added some logic to CVS tip to notice that the
sub-select has a DISTINCT clause, and not add unnecessary unique-ifying
processing on top of it.  So in 7.5, writing a DISTINCT clause will
amount to forcing a particular query plan, which might or might not be
the best thing but hopefully won't be too terrible.  But in 7.4 it has
nothing to recommend it ...

regards, tom lane



Re: [GENERAL] Query performance question on a large table

2004-01-07 Thread Sean Shanny
Tom,

Sort of piggybacking on this thread, but why the suggestion to drop the 
use of DISTINCT in 7.4?  We use DISTINCT all over the place to eliminate 
duplicates in sub-select statements.  We are currently running 7.4.0 on 
FreeBSD 5.1 (Dell 2650, 4GB RAM, 5-disk SCSI array, hardware RAID 0).

Example:

explain analyze select t1.raw_agent_string from d_useragent t1 where 
t1.id in (select distinct useragent_key from f_pageviews where date_key 
between 356 and 362);
 
QUERY PLAN
 
--
Nested Loop  (cost=1020025.13..1020178.84 rows=51 width=79) (actual 
time=954080.021..970268.457 rows=82207 loops=1)
  ->  HashAggregate  (cost=1020025.13..1020025.13 rows=51 width=4) 
(actual time=954049.317..954450.065 rows=82208 loops=1)
->  Subquery Scan "IN_subquery"  (cost=983429.20..1020025.00 
rows=51 width=4) (actual time=856641.244..953639.116 rows=82208 loops=1)
  ->  Unique  (cost=983429.20..1020024.49 rows=51 width=4) 
(actual time=856641.230..952939.539 rows=82208 loops=1)
->  Sort  (cost=983429.20..1001726.84 rows=7319058 
width=4) (actual time=856641.215..906429.835 rows=11067735 loops=1)
  Sort Key: useragent_key
  ->  Index Scan using 
idx_pageviews_date_dec_2003 on f_pageviews  (cost=0.00..136434.63 
rows=7319058 width=4) (actual time=1.140..693400.464 rows=11067735 loops=1)
Index Cond: ((date_key >= 356) AND 
(date_key <= 362))
  ->  Index Scan using d_useragent_pkey on d_useragent t1  
(cost=0.00..3.00 rows=1 width=83) (actual time=0.169..0.174 rows=1 
loops=82208)
Index Cond: (t1.id = "outer".useragent_key)
Total runtime: 970657.888 ms
(11 rows)

t1.id is the primary key on d_useragent; d_useragent actually has 
390751 rows.
useragent_key has an index.  f_pageviews has roughly 120 million rows.

Is there a better way of writing this sort of query that will accomplish 
the same thing?

Thanks.

--sean

Tom Lane wrote:

Együd Csaba <[EMAIL PROTECTED]> writes:
 

here is a sample query:
select  mertido, fomeazon, ertektipus, mertertek from t_me30 where fomeazon
in (select distinct fomeazon from t_fome where lower(inuse) = 'igen') and
mertido like '2003-12-17%' and ertektipus in ('+MW') order by mertido,
fomeazon, ertektipus;
   

 

Ohh, I nearly forgot the config:  Linux 7.1; Postgres 7.3.2;
   

The first thing you ought to do is move to PG 7.4.  "foo IN (SELECT ...)"
generally works a lot better under 7.4 than prior releases.  I'd suggest
dropping the "DISTINCT" when using 7.4, too.
			regards, tom lane

 


