[GENERAL] Query plan for Merge Semi Join

2017-11-01 Thread Peter J. Holzer
[PostgreSQL 9.5.9 on x86_64-pc-linux-gnu, compiled by gcc (Debian
4.9.2-10) 4.9.2, 64-bit]

While investigating a performance issue, I found this query plan:

wds=> explain analyze
  select facttablename, columnname, term, concept_id, t.hidden, language, 
register, c.id, canonicalname, description, parent, type, r.sortorder
  from term t, concept c, relation r
  where facttablename='facttable_kon_eh' and columnname='arbeitsvolumen'
and exists (select 1 from facttable_kon_eh f where 
f.arbeitsvolumen=t.term  and thema in (values (E'E')))
and c.id=concept_id and r.child=concept_id;
                                                                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=22984.13..241822.98 rows=1 width=169) (actual time=667.608..3275.037 rows=6 loops=1)
   ->  Nested Loop  (cost=22983.70..241822.50 rows=1 width=154) (actual time=667.587..3274.972 rows=6 loops=1)
         ->  Merge Semi Join  (cost=22983.27..241819.04 rows=1 width=87) (actual time=667.559..3274.877 rows=6 loops=1)
               Merge Cond: ((t.term)::text = (f.arbeitsvolumen)::text)
               ->  Index Scan using term_term_idx on term t  (cost=0.56..206841.09 rows=18 width=87) (actual time=667.467..2929.430 rows=7 loops=1)
                     Filter: (((facttablename)::text = 'facttable_kon_eh'::text) AND ((columnname)::text = 'arbeitsvolumen'::text))
                     Rows Removed by Filter: 3874190
               ->  Materialize  (cost=0.43..399167.10 rows=3548798 width=7) (actual time=0.086..295.708 rows=184791 loops=1)
                     ->  Nested Loop Semi Join  (cost=0.43..390295.10 rows=3548798 width=7) (actual time=0.076..278.006 rows=184791 loops=1)
                           Join Filter: ((f.thema)::text = ('E'::text))
                           ->  Index Scan using facttable_kon_eh_arbeitsvolumen_idx on facttable_kon_eh f  (cost=0.43..337063.11 rows=3548798 width=9) (actual time=0.052..199.733 rows=184791 loops=1)
                           ->  Materialize  (cost=0.00..0.03 rows=1 width=32) (actual time=0.000..0.000 rows=1 loops=184791)
                                 ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
         ->  Index Scan using concept_pkey on concept c  (cost=0.43..3.45 rows=1 width=67) (actual time=0.011..0.012 rows=1 loops=6)
               Index Cond: (id = t.concept_id)
   ->  Index Scan using relation_child_idx on relation r  (cost=0.43..0.47 rows=1 width=19) (actual time=0.008..0.009 rows=1 loops=6)
         Index Cond: (child = c.id)
 Planning time: 15.624 ms
 Execution time: 3275.341 ms

Re: [GENERAL] Query plan not updated after dropped index

2016-02-18 Thread Victor Blomqvist
On Thu, Feb 18, 2016 at 11:05 PM, Tom Lane  wrote:

> Victor Blomqvist  writes:
> > We just had a major issue on our databases: after an index was replaced, a
> > user-defined function didn't change its query plan to use the new index.
>
> I'm suspicious that this is some variant of the problem discussed a couple
> days ago:
>
>
> http://www.postgresql.org/message-id/flat/CAKt_ZfsvdE3WiHUjg81MT1KzOkebqz_bg=dbhf4l6rkfjte...@mail.gmail.com
>
> However, that theory requires that the index not have been immediately
> usable, which implies that it initially had some broken HOT chains,
> which really should not have happened if you were simply replacing one
> index with an identical one.  (The pre-existing index should've been
> enough to ensure HOT chain consistency for its columns.)
>
> Perhaps you were doing something "cute" like replacing a single-column
> index with a multi-column one?
>

No, the new index looked exactly like the old one. The index was created with
CONCURRENTLY, and we waited until it returned.

I could use the index just fine when running a query in a separate
connection, so I am not sure it would help to drop the old index in a
transaction and run a query to verify that the new index is usable. How can
I know that the new index is usable from already open connections?

Another thing to note about this index is that it sometimes has an
unexpected zero page inside, resulting in this error: ERROR:  index
"user_pictures_picture_dhash_idx" contains unexpected zero page at block
123780. But it always happens sporadically on our read slaves, and
rerunning the query that causes it doesn't reproduce the error. I asked about
this problem before, here:
http://www.postgresql.org/message-id/flat/cal870dvxr9fhkyej5smydk4pjupl5kwabuhspbtqek03gdd...@mail.gmail.com#cal870dvxr9fhkyej5smydk4pjupl5kwabuhspbtqek03gdd...@mail.gmail.com



>
> regards, tom lane
>


Re: [GENERAL] Query plan not updated after dropped index

2016-02-18 Thread Tom Lane
Victor Blomqvist  writes:
> We just had a major issue on our databases: after an index was replaced, a
> user-defined function didn't change its query plan to use the new index.

I'm suspicious that this is some variant of the problem discussed a couple
days ago:

http://www.postgresql.org/message-id/flat/CAKt_ZfsvdE3WiHUjg81MT1KzOkebqz_bg=dbhf4l6rkfjte...@mail.gmail.com

However, that theory requires that the index not have been immediately
usable, which implies that it initially had some broken HOT chains,
which really should not have happened if you were simply replacing one
index with an identical one.  (The pre-existing index should've been
enough to ensure HOT chain consistency for its columns.)

Perhaps you were doing something "cute" like replacing a single-column
index with a multi-column one?

regards, tom lane




Re: [GENERAL] Query plan not updated after dropped index

2016-02-18 Thread Vitaly Burovoy
On 2/18/16, Victor Blomqvist  wrote:
> Hello!
>
> We just had a major issue on our databases, after a index was replaced a
> user defined function didnt change its query plan to use the new index. At
> least this is our theory, since the function in question became much slower
> and as a result brought our system to a halt.
>
> Basically it went:
> 1. create new index (a simple btree on a bigint column index)
> 2. drop old index
> 3. rename new index to old index name
> 4. analyze table
>
> After these steps normally our functions will update their plans and use
> the new index just fine. However this time the function (only one function
> use this particular index) seemed to take forever to complete. This is a
> 40GB table so querying for something not indexed would take a long time.
> Therefore my suspicion is that the function didnt start to use the new
> index.

My guess is that the backends somehow cached a generic plan[1] and didn't
recalculate it.

> Adding to the strangeness is that if I ran the function manually it was
> fast,

That is because _backends_ (processes) cache plans, not the DBMS as a whole
(i.e. cached plans are not shared). So you connected to the DB (creating a new
backend process) and ran the function; that backend could not find a cached
plan and created one for itself. The plan for _your_ connection therefore
includes the new index.

> only when called from our application through pg_bouncer it was slow.
> I should also say that the function is only used on our 3 read slaves setup
> to our database.
>
> Things we tried to fix this:
> 1. Analyze table
> 2. Restart our application
> 3. Recreate the function
> 4. Kill the slow running queries with pg_cancel_backend()
>
> These things did not help.

Since pgbouncer reuses connections, the backend processes still have the
cached plan. pg_terminate_backend() might have helped here.
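
As an illustration only (not something from the original thread), terminating
the pooled backends so that fresh ones plan against the new index could look
roughly like this; the application_name filter is an assumption, so match it
to however your pgbouncer connections actually identify themselves:

-- hedged sketch: kick pooled backends so replacements re-plan on reconnect
SELECT pg_terminate_backend(pid)
  FROM pg_stat_activity
 WHERE datname = current_database()
   AND pid <> pg_backend_pid()
   AND application_name = 'pgbouncer';   -- hypothetical filter, adjust to your setup

Each terminated backend is replaced by a fresh server connection from the
pool, which builds new plans the next time the function runs.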

> Instead what helped in the end was to replace the function with an extra
> useless where clause (in the hope that it would force it to create a new
> plan)
>
> So, the function only have a single SELECT inside:
> RETURN QUERY
> SELECT * FROM table
>   WHERE bigint_column = X
>   LIMIT 100 OFFSET 0;
>
> And this is my modification that made it work again:
> RETURN QUERY
> SELECT * FROM table
>   WHERE bigint_column = X AND 1=1
>   LIMIT 100 OFFSET 0;

Yes, that is a new query as far as PG is concerned, so it requires a new plan
because it is not in the cache.

> Obviously we are now worried why this happened

My guess is also that you used CREATE INDEX CONCURRENTLY, and there are
several cases[2] in which the new index cannot be used yet (an "invalid"
state, or waiting for unfinished transactions).
If the old index is dropped while the new one is not yet usable, a plan
generated at that moment by a running query/function includes neither index,
and if such a plan is cached it keeps being used later, still without the
index.

> and how we can avoid it in
> the future? We run Postgres 9.3 on CentOS 6.

First, you can drop the old index not immediately but a little later,
depending on how long your queries run.
Also, after creating the new index (and possibly waiting a little), you can
drop the old index inside a _transaction_ and check with EXPLAIN whether a
query that should use the index actually picks the new one. If it does,
COMMIT; if it does not, just ROLLBACK and keep using the old index.
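
A minimal sketch of that check (index, table, and column names are
placeholders; note that DROP INDEX takes a strong lock on the table, so other
queries against it will block until the transaction ends):

BEGIN;
DROP INDEX some_table_bigint_col_idx;            -- the old index
EXPLAIN SELECT * FROM some_table
         WHERE bigint_column = 42 LIMIT 100;     -- should now mention the new index
-- COMMIT if the new index shows up in the plan, otherwise:
ROLLBACK;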

> Thanks!
> Victor

[1]http://www.postgresql.org/docs/devel/static/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING
[2]http://www.postgresql.org/docs/devel/static/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY
-- 
Best regards,
Vitaly Burovoy




Re: [GENERAL] Query plan not updated after dropped index

2016-02-18 Thread Victor Blomqvist
The end goal is to get rid of index bloat. If there is a better way to
handle this, I'm all ears!

/Victor

On Thu, Feb 18, 2016 at 5:21 PM, Oleg Bartunov  wrote:

>
>
> On Thu, Feb 18, 2016 at 11:17 AM, Victor Blomqvist  wrote:
>
>> Hello!
>>
>> We just had a major issue on our databases, after a index was replaced a
>> user defined function didnt change its query plan to use the new index. At
>> least this is our theory, since the function in question became much slower
>> and as a result brought our system to a halt.
>>
>> Basically it went:
>> 1. create new index (a simple btree on a bigint column index)
>> 2. drop old index
>> 3. rename new index to old index name
>>
>
> why did you do this!?
>
>
>> 3. analyze table
>>
>> After these steps normally our functions will update their plans and use
>> the new index just fine. However this time the function (only one function
>> use this particular index) seemed to take forever to complete. This is a
>> 40GB table so querying for something not indexed would take a long time.
>> Therefor my suspicion is that the function didnt start to use the new index.
>>
>> Adding to the strangeness is that if I ran the function manually it was
>> fast, only when called from our application through pg_bouncer it was slow.
>> I should also say that the function is only used on our 3 read slaves setup
>> to our database.
>>
>> Things we tried to fix this:
>> 1. Analyze table
>> 2. Restart our application
>> 3. Recreate the function
>> 4. Kill the slow running queries with pg_cancel_backend()
>>
>> These things did not help.
>>
>> Instead what helped in the end was to replace the function with an extra
>> useless where clause (in the hope that it would force it to create a new
>> plan)
>>
>> So, the function only have a single SELECT inside:
>> RETURN QUERY
>> SELECT * FROM table
>>   WHERE bigint_column = X
>>   LIMIT 100 OFFSET 0;
>>
>> And this is my modification that made it work again:
>> RETURN QUERY
>> SELECT * FROM table
>>   WHERE bigint_column = X AND 1=1
>>   LIMIT 100 OFFSET 0;
>>
>>
>> Obviously we are now worried why this happened and how we can avoid it in
>> the future? We run Postgres 9.3 on CentOS 6.
>>
>> Thanks!
>> Victor
>>
>
>


Re: [GENERAL] Query plan not updated after dropped index

2016-02-18 Thread Oleg Bartunov
On Thu, Feb 18, 2016 at 11:17 AM, Victor Blomqvist  wrote:

> Hello!
>
> We just had a major issue on our databases, after a index was replaced a
> user defined function didnt change its query plan to use the new index. At
> least this is our theory, since the function in question became much slower
> and as a result brought our system to a halt.
>
> Basically it went:
> 1. create new index (a simple btree on a bigint column index)
> 2. drop old index
> 3. rename new index to old index name
>

why did you do this!?


> 3. analyze table
>
> After these steps normally our functions will update their plans and use
> the new index just fine. However this time the function (only one function
> use this particular index) seemed to take forever to complete. This is a
> 40GB table so querying for something not indexed would take a long time.
> Therefor my suspicion is that the function didnt start to use the new index.
>
> Adding to the strangeness is that if I ran the function manually it was
> fast, only when called from our application through pg_bouncer it was slow.
> I should also say that the function is only used on our 3 read slaves setup
> to our database.
>
> Things we tried to fix this:
> 1. Analyze table
> 2. Restart our application
> 3. Recreate the function
> 4. Kill the slow running queries with pg_cancel_backend()
>
> These things did not help.
>
> Instead what helped in the end was to replace the function with an extra
> useless where clause (in the hope that it would force it to create a new
> plan)
>
> So, the function only have a single SELECT inside:
> RETURN QUERY
> SELECT * FROM table
>   WHERE bigint_column = X
>   LIMIT 100 OFFSET 0;
>
> And this is my modification that made it work again:
> RETURN QUERY
> SELECT * FROM table
>   WHERE bigint_column = X AND 1=1
>   LIMIT 100 OFFSET 0;
>
>
> Obviously we are now worried why this happened and how we can avoid it in
> the future? We run Postgres 9.3 on CentOS 6.
>
> Thanks!
> Victor
>


[GENERAL] Query plan not updated after dropped index

2016-02-18 Thread Victor Blomqvist
Hello!

We just had a major issue on our databases: after an index was replaced, a
user-defined function didn't change its query plan to use the new index. At
least this is our theory, since the function in question became much slower
and as a result brought our system to a halt.

Basically it went (sketched below):
1. create new index (a simple btree on a bigint column index)
2. drop old index
3. rename new index to old index name
4. analyze table
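
A sketch of that sequence with placeholder names (the real table and index
names are not given in the post; CONCURRENTLY is per the follow-up later in
the thread):

CREATE INDEX CONCURRENTLY some_table_bigint_col_idx_new
    ON some_table (bigint_column);
DROP INDEX some_table_bigint_col_idx;
ALTER INDEX some_table_bigint_col_idx_new RENAME TO some_table_bigint_col_idx;
ANALYZE some_table;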

After these steps our functions normally update their plans and use
the new index just fine. However, this time the function (only one function
uses this particular index) seemed to take forever to complete. This is a
40GB table, so querying for something not indexed would take a long time.
Therefore my suspicion is that the function didn't start to use the new index.

Adding to the strangeness, if I ran the function manually it was
fast; it was only slow when called from our application through pg_bouncer.
I should also say that the function is only used on the 3 read slaves set up
for our database.

Things we tried to fix this:
1. Analyze table
2. Restart our application
3. Recreate the function
4. Kill the slow running queries with pg_cancel_backend()

These things did not help.

Instead, what helped in the end was to replace the function with one that has
an extra useless WHERE clause (in the hope that it would force it to create a
new plan).

So, the function only has a single SELECT inside:
RETURN QUERY
SELECT * FROM table
  WHERE bigint_column = X
  LIMIT 100 OFFSET 0;

And this is my modification that made it work again:
RETURN QUERY
SELECT * FROM table
  WHERE bigint_column = X AND 1=1
  LIMIT 100 OFFSET 0;
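
For context, a minimal sketch of what such a function might look like in
plpgsql (the function, table, and column names are made up; only the RETURN
QUERY shape comes from the post):

CREATE OR REPLACE FUNCTION get_rows(x bigint)
RETURNS SETOF some_table AS $$
BEGIN
    -- The statement below is planned by each backend and the plan is cached,
    -- which is why an already-connected backend can keep using a stale plan.
    RETURN QUERY
    SELECT * FROM some_table
      WHERE bigint_column = x
      LIMIT 100 OFFSET 0;
END;
$$ LANGUAGE plpgsql;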


Obviously we are now worried about why this happened and how we can avoid it
in the future. We run Postgres 9.3 on CentOS 6.

Thanks!
Victor


Re: [GENERAL] Query plan for currently executing query?

2013-09-18 Thread Rodrigo Gonzalez
On Wed, 18 Sep 2013 16:24:16 +0100
Tim Kane  wrote:

> 
> 
> On 18/09/2013 14:44, "Rowan Collins"  wrote:
> 
> >
> >Running an EXPLAIN on the exact query should give you the plan. An
> >EXPLAIN ANALYZE would have to wait for the query to complete either
> >way, so you wouldn't be able to get it mid-way through a running
> >process.
> >
> >If you don't know the exact query running, then if the
> >stats_command_string config setting is on, you should be able to get
> >it by querying the pg_stat_activity view.
> >
> >The only other problem I can think of is if there are temporary
> >tables or other session-specific objects that would exist only
> >within the running process.
> >
> >Another possibility to consider (which would also show up in
> >pg_stat_activity or similar views) is that the query is waiting on
> >some kind of lock, rather than just executing slowly.
> 
> 
> I think the OP was hoping for a solution that would allow him to
> retrieve the query plan that was generated at execution time.  I've
> certainly wished for this type of facility in the past, in the
> scenario where a given table may have been ANALYZE in the interim -
> making it difficult to perform any useful diagnostics on the problem
> query.
> 

Maybe auto-explain is the solution?

http://www.postgresql.org/docs/9.1/static/auto-explain.html
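
For example, it can be enabled for a single (superuser) session roughly like
this, with illustrative values; note that it logs a statement's plan when the
statement finishes, so it helps with the next slow run rather than one that is
already in progress:

LOAD 'auto_explain';
SET auto_explain.log_min_duration = '10min';  -- log plans of statements slower than this
SET auto_explain.log_analyze = false;         -- true adds per-node timings, at a cost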




Re: [GENERAL] Query plan for currently executing query?

2013-09-18 Thread Tim Kane


On 18/09/2013 14:44, "Rowan Collins"  wrote:

>
>Running an EXPLAIN on the exact query should give you the plan. An
>EXPLAIN ANALYZE would have to wait for the query to complete either way,
>so you wouldn't be able to get it mid-way through a running process.
>
>If you don't know the exact query running, then if the
>stats_command_string config setting is on, you should be able to get it
>by querying the pg_stat_activity view.
>
>The only other problem I can think of is if there are temporary tables
>or other session-specific objects that would exist only within the
>running process.
>
>Another possibility to consider (which would also show up in
>pg_stat_activity or similar views) is that the query is waiting on some
>kind of lock, rather than just executing slowly.


I think the OP was hoping for a solution that would allow him to retrieve
the query plan that was generated at execution time.  I've certainly
wished for this type of facility in the past, in the scenario where a
given table may have been ANALYZEd in the interim, making it difficult to
perform any useful diagnostics on the problem query.

In cases such as that, it is sometimes better to cancel the currently
running query and restart it in order to take advantage of the newer
statistics and generate a better QEP.


What might be a useful feature addition is if, at the time of the ANALYZE
operation, Postgres could identify the currently running queries and
flag them as having begun execution with a dirty/stale QEP.  Possibly this
could be extended to heuristically determine whether a query might return
faster if it were cancelled and re-executed under a new QEP.

Tim







Re: [GENERAL] Query plan for currently executing query?

2013-09-18 Thread Rowan Collins

François Beausoleil wrote (on 18/09/2013):

Hi!

Is it possible to get the plan of a query that's currently running? I have 
queries which normally take about 15 minutes, but are now at 2+ hours. I forgot 
to add the code to dump the plan to STDOUT, so I really don't know.

It might be data volume, but I want to confirm that the plan isn't bogus.

I'm doing batch imports of data on "PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu, 
compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit".

Thanks,
François Beausoleil



Running an EXPLAIN on the exact query should give you the plan. An 
EXPLAIN ANALYZE would have to wait for the query to complete either way, 
so you wouldn't be able to get it mid-way through a running process.


If you don't know the exact query running, then if the 
stats_command_string config setting is on, you should be able to get it 
by querying the pg_stat_activity view.


The only other problem I can think of is if there are temporary tables 
or other session-specific objects that would exist only within the 
running process.


Another possibility to consider (which would also show up in 
pg_stat_activity or similar views) is that the query is waiting on some 
kind of lock, rather than just executing slowly.
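
As a concrete sketch for the 9.1-era server in question (these column names
changed in 9.2), checking what is running and whether anything is stuck on a
lock might look like:

SELECT procpid, waiting, current_query
  FROM pg_stat_activity
 WHERE current_query <> '<IDLE>';

SELECT * FROM pg_locks WHERE NOT granted;   -- lock requests still waiting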


Regards,
--
Rowan Collins
[IMSoP]




[GENERAL] Query plan for currently executing query?

2013-09-18 Thread François Beausoleil
Hi!

Is it possible to get the plan of a query that's currently running? I have 
queries which normally take about 15 minutes, but are now at 2+ hours. I forgot 
to add the code to dump the plan to STDOUT, so I really don't know.

It might be data volume, but I want to confirm that the plan isn't bogus.

I'm doing batch imports of data on "PostgreSQL 9.1.9 on 
x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 
64-bit".

Thanks,
François Beausoleil





Re: [GENERAL] Query plan different depending on the value of where filter

2013-07-17 Thread Tom Lane
"Looby, Denis"  writes:
> What I don't understand is why the plan is different just because the
> value in the group_id = filter has changed?
> Does the planner have some statistical info on the contents of non-indexed 
> rows?

Of course.  In this case it knows that a nestloop would be a loser
because many rows would be fetched from the outer table.

> I don't quite understand why this plan executes the sequential scan once, 
> whereas the slow one does it 5001 times, which I believe is the main source 
> of the difference.

In the hash join case, it builds the hash table, discovers that it's
empty (because there are no rows with type = 2), and concludes that it
need not scan the outer relation.  This is a corner case that the
planner doesn't try to account for because it couldn't be predicted
reliably.  I wouldn't get too excited about it if I were you, because
presumably the case isn't going to happen all that much in production
either (else why are you bothering with the query?).

Personally I'd suggest getting rid of the
WHERE outer_tab.outer_key = inner_tab_1.key_to
clause, which is entirely redundant with the IN condition and is
preventing the planner from seeing that it could optimize the IN
into a semijoin.
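
Applied to the query from the original post (which appears below in the
archive), that would leave roughly this sketch, using the names from the post:

SELECT outer_tab.outer_key
  FROM outer_tab
 WHERE outer_tab.outer_key IN (
        SELECT inner_tab_1.key_to
          FROM inner_tab_2
         INNER JOIN inner_tab_1
                 ON inner_tab_2.outer_key = inner_tab_1.key_from
                AND inner_tab_1.type = 2
         WHERE inner_tab_2.group_id = 9);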

regards, tom lane




[GENERAL] Query plan different depending on the value of where filter

2013-07-17 Thread Looby, Denis
Hi All,

I have an interesting query scenario I'm trying to understand.
I came across this while investigating a slow query in our application.
I've been able to reproduce the scenario in a psql script that sets up the 
tables and runs the queries.
Script here http://pastebin.com/CBkdDmWp if anyone is interested.

This is the scenario.
Version is "PostgreSQL 9.2.4, compiled by Visual C++ build 1600, 32-bit"
Three tables
Outer_tab  : 5000 rows
Inner_tab_1 : 1000 rows
Inner_tab_2 : 16000 rows

This is the query
SELECT outer_tab.outer_key
  FROM outer_tab
WHERE outer_tab.outer_key IN (
   SELECT inner_tab_1.key_to
 FROM inner_tab_2
INNER JOIN inner_tab_1 ON (inner_tab_2.outer_key = 
inner_tab_1.key_from AND inner_tab_1.type = 2)
WHERE outer_tab.outer_key = inner_tab_1.key_to AND 
inner_tab_2.group_id = 9
);

Two important things here: type = 2 does not occur in inner_tab_1, group_id = 9
does not occur in inner_tab_2, and group_id is not indexed.
The result is 0 rows.
Now this is quite slow about 15 seconds on my machine.
Here is the explain plan http://explain.depesz.com/s/BVg
I understand that the seq scan on inner_tab_2 and its 16000 rows is the culprit 
and the easy fix is to swap inner_tab_2 and inner_tab_1 between the FROM and 
the JOIN.
This lets it drive off inner_tab_1 with an index scan and skip the sequential 
scan as seen here http://explain.depesz.com/s/pkG
Much better at 14ms.

That's fine, but what has me somewhat confused is that if group_id in the WHERE
is changed to 1, which does exist in inner_tab_2, we get quite a different plan.
http://explain.depesz.com/s/FX4
It's quick too, at 63ms.

What I don't understand is why the plan is different just because the value in
the group_id = filter has changed?
Does the planner have some statistical info on the contents of non-indexed rows?
I don't quite understand why this plan executes the sequential scan once, 
whereas the slow one does it 5001 times, which I believe is the main source of 
the difference.

Also, if I don't ANALYZE the tables, the original query runs in a few ms
instead of 15 seconds; it actually uses the same query plan that swapping the
tables creates.
So it runs the index scan on inner_tab_1 first.
It's a bit surprising that with ANALYZE it picks a plan that is so much worse.
Anyone able to shed some light?

Thanks for your time,

Denis Looby



Re: [GENERAL] Query plan optimization: sorting vs. partitioning

2011-02-02 Thread Sergey Zaharchenko
2011/2/2, Tom Lane :

>> I see the database doesn't understand that there are no entries in the
>> main table, so it has to assume the Append data is not ordered. Is
>> there a way to avoid sorting?
>
> No.  In existing releases there is no plan type that can produce
> presorted output from an append relation (ie, an inheritance tree).
> 9.1 will be able to do that, but it wasn't exactly a small fix:
> http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=11cad29c91524aac1d0b61e0ea0357398ab79bf8

OK, I hope I'll be able to come up with a stored procedure to query
the tables directly, then. Thanks!
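
A hedged sketch of the kind of statement such a procedure could issue against
one child table directly (the timestamp column name here is a placeholder);
because each child is clustered on its time index, an index scan returns the
rows already in time order and no Sort node is needed:

SELECT *
  FROM states_20101206                  -- hit the wanted partition directly
 WHERE event_time >= '2010-12-06'
   AND event_time <  '2010-12-07'
 ORDER BY event_time;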

-- 
DoubleF



Re: [GENERAL] Query plan optimization: sorting vs. partitioning

2011-02-02 Thread Tom Lane
Sergey Zaharchenko  writes:
> I need to select some data in the time order. When I query a separate
> smaller table, the index is used an no sorting is needed. However,
> when I query the main table, it occurs:
> ...
>   ->  Sort ...
>  Sort Key: ...
>  Sort Method: ...
>  ->  Result ...
>->  Append ...
>  ->  Seq Scan on states
>Filter: ...
>  ->  Seq Scan on states_20101206
>Filter: ...
> ...

> I see the database doesn't understand that there are no entries in the
> main table, so it has to assume the Append data is not ordered. Is
> there a way to avoid sorting?

No.  In existing releases there is no plan type that can produce
presorted output from an append relation (ie, an inheritance tree).
9.1 will be able to do that, but it wasn't exactly a small fix:
http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=11cad29c91524aac1d0b61e0ea0357398ab79bf8

regards, tom lane



[GENERAL] Query plan optimization: sorting vs. partitioning

2011-02-02 Thread Sergey Zaharchenko
Hello list,

I have a large time-indexed table (states) partitioned into several
tables based on the date. The smaller tables are clustered by their
time indices. The main table is empty.

I need to select some data in time order. When I query a separate
smaller table, the index is used and no sorting is needed. However,
when I query the main table, it occurs:
...
  ->  Sort ...
 Sort Key: ...
 Sort Method: ...
 ->  Result ...
   ->  Append ...
 ->  Seq Scan on states
   Filter: ...
 ->  Seq Scan on states_20101206
   Filter: ...
...

I see the database doesn't understand that there are no entries in the
main table, so it has to assume the Append data is not ordered. Is
there a way to avoid sorting?

Please CC me as I'm not on the list. Thanks in advance,

-- 
DoubleF



Re: [GENERAL] Query plan choice issue

2010-09-20 Thread Yaroslav Tykhiy

On 14/09/2010, at 10:37 AM, Yaroslav Tykhiy wrote:


On 14/09/2010, at 12:41 AM, Tom Lane wrote:


Yaroslav Tykhiy  writes:

[...]


I think the major problem you're having is that the planner is
completely clueless about the selectivity of the condition
"substring"(v.headervalue, 0,  255) ~~* '%<@mail.gmail.com>%'
If it knew that that would match only one row, instead of several
thousand, it would likely pick a different plan.

In recent versions of PG you could probably make a noticeable
improvement in this if you just dropped the substring() restriction
... do you actually need that?  Alternatively, if you don't want to
change the query logic at all, I'd try making an index on
substring(v.headervalue, 0, 255).  I'm not expecting the query
to actually *use* the index, mind you.  But its existence will prompt
ANALYZE to collect stats on the expression's value, and that will
help the planner with estimating the ~~* condition.


Well, that substring() and ILIKE combo looked suspicious to me,  
too.  However, there already was an index on  
substring(v.headervalue, 0, 255) but the fast query plan didn't seem  
to use it, it used a different index instead:



[...]
Meanwhile, a mate of mine lurking on this list pointed out that  
reducing random_page_cost might help here and it did:  
random_page_cost of 2 made the fast query favourable.


Can it mean that the default planner configuration slightly  
overfavours seq scans?




Funnily, after a few days of running with random_page_cost=2, exactly  
the same query became slow again and I had to reduce random_page_cost  
further to 1.5 to make it fast.  Can it be a sign of a problem in the  
planner?


Thanks!

Yar



Re: [GENERAL] Query plan choice issue

2010-09-13 Thread Yaroslav Tykhiy

Hi Tom,

On 14/09/2010, at 12:41 AM, Tom Lane wrote:


Yaroslav Tykhiy  writes:

              ->  Bitmap Heap Scan on dbmail_headervalue v  (cost=1409.82..221813.70 rows=2805 width=16) (actual time=28543.411..28623.623 rows=1 loops=1)
                    Recheck Cond: (v.headername_id = n.id)
                    Filter: ("substring"(v.headervalue, 0, 255) ~~* '%<@mail.gmail.com>%'::text)
                    ->  Bitmap Index Scan on dbmail_headervalue_testing  (cost=0.00..1409.82 rows=75940 width=0) (actual time=17555.572..17555.572 rows=1877009 loops=1)
                          Index Cond: (v.headername_id = n.id)


I think the major problem you're having is that the planner is
completely clueless about the selectivity of the condition
"substring"(v.headervalue, 0,  255) ~~* '%<@mail.gmail.com>%'
If it knew that that would match only one row, instead of several
thousand, it would likely pick a different plan.

In recent versions of PG you could probably make a noticeable
improvement in this if you just dropped the substring() restriction
... do you actually need that?  Alternatively, if you don't want to
change the query logic at all, I'd try making an index on
substring(v.headervalue, 0, 255).  I'm not expecting the query
to actually *use* the index, mind you.  But its existence will prompt
ANALYZE to collect stats on the expression's value, and that will
help the planner with estimating the ~~* condition.


Well, that substring() and ILIKE combo looked suspicious to me, too.   
However, there already was an index on substring(v.headervalue, 0,  
255) but the fast query plan didn't seem to use it, it used a  
different index instead:


mail=# \d dbmail_headervalue
                              Table "public.dbmail_headervalue"
     Column     |  Type  |                            Modifiers
----------------+--------+------------------------------------------------------------------
 headername_id  | bigint | not null
 physmessage_id | bigint | not null
 id             | bigint | not null default nextval('dbmail_headervalue_idnr_seq'::regclass)
 headervalue    | text   | not null default ''::text
Indexes:
    "dbmail_headervalue_pkey" PRIMARY KEY, btree (id)
    "dbmail_headervalue_1" UNIQUE, btree (physmessage_id, id)
    "dbmail_headervalue_2" btree (physmessage_id)
    "dbmail_headervalue_3" btree ("substring"(headervalue, 0, 255))
...

EXPLAIN ANALYZE...
                     ->  Index Scan using dbmail_headervalue_2 on dbmail_headervalue v  (cost=0.00..1489.96 rows=1 width=16) (actual time=0.028..0.029 rows=0 loops=358)
                           Index Cond: (v.physmessage_id = m.physmessage_id)
                           Filter: ("substring"(v.headervalue, 0, 255) ~~* '%<@mail.gmail.com>%'::text)
...

Meanwhile, a mate of mine lurking on this list pointed out that  
reducing random_page_cost might help here and it did: random_page_cost  
of 2 made the fast query favourable.
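
In other words, something along these lines, either per session or in
postgresql.conf:

SET random_page_cost = 2;   -- later lowered further to 1.5, per the other message in this thread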


Can it mean that the default planner configuration slightly  
overfavours seq scans?


Thank you all guys!

Yar



Re: [GENERAL] Query plan choice issue

2010-09-13 Thread Tom Lane
Yaroslav Tykhiy  writes:
>   ->  Bitmap Heap Scan on dbmail_headervalue v  (cost=1409.82..221813.70 rows=2805 width=16) (actual time=28543.411..28623.623 rows=1 loops=1)
>         Recheck Cond: (v.headername_id = n.id)
>         Filter: ("substring"(v.headervalue, 0, 255) ~~* '%<@mail.gmail.com>%'::text)
>         ->  Bitmap Index Scan on dbmail_headervalue_testing  (cost=0.00..1409.82 rows=75940 width=0) (actual time=17555.572..17555.572 rows=1877009 loops=1)
>               Index Cond: (v.headername_id = n.id)

I think the major problem you're having is that the planner is
completely clueless about the selectivity of the condition
"substring"(v.headervalue, 0,  255) ~~* '%<@mail.gmail.com>%'
If it knew that that would match only one row, instead of several
thousand, it would likely pick a different plan.

In recent versions of PG you could probably make a noticeable
improvement in this if you just dropped the substring() restriction
... do you actually need that?  Alternatively, if you don't want to
change the query logic at all, I'd try making an index on
substring(v.headervalue, 0, 255).  I'm not expecting the query
to actually *use* the index, mind you.  But its existence will prompt
ANALYZE to collect stats on the expression's value, and that will
help the planner with estimating the ~~* condition.
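
That suggestion in SQL form (the index name here is arbitrary; as it turns out
later in the thread, an equivalent index, dbmail_headervalue_3, already
exists):

CREATE INDEX dbmail_headervalue_substr_idx
    ON dbmail_headervalue (substring(headervalue, 0, 255));
ANALYZE dbmail_headervalue;   -- collect statistics on the indexed expression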

regards, tom lane



Re: [GENERAL] Query plan choice issue

2010-09-13 Thread Yaroslav Tykhiy

Hi Martin,

Thank you for your response!

On 13/09/2010, at 10:49 AM, Martin Gainty wrote:

a cursory look of the plan details a FTS on dbmail_headername  
invoked by the JOIN clause

JOIN dbmail_headername n ON v.headername_id=n.id
you would accelerate the seek appreciably by placing indexes on both  
participating columns

v.headername_id
n.id


Granted, there was no index on v.headername_id but creating one just  
slowed the query down, with a different plan:


CREATE INDEX dbmail_headervalue_testing ON dbmail_headervalue  
(headername_id);

EXPLAIN ANALYSE ...
                                                                             QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=222020.81..222020.81 rows=1 width=8) (actual time=28636.426..28636.426 rows=0 loops=1)
   Sort Key: m.message_idnr
   Sort Method:  quicksort  Memory: 25kB
   ->  Nested Loop  (cost=1409.82..222020.80 rows=1 width=8) (actual time=28636.409..28636.409 rows=0 loops=1)
         ->  Nested Loop  (cost=1409.82..222012.27 rows=1 width=24) (actual time=28636.405..28636.405 rows=0 loops=1)
               ->  Nested Loop  (cost=1409.82..221959.94 rows=6 width=8) (actual time=28543.441..28624.750 rows=1 loops=1)
                     ->  Seq Scan on dbmail_headername n  (cost=0.00..111.17 rows=1 width=8) (actual time=0.022..1.114 rows=1 loops=1)
                           Filter: ((headername)::text ~~* 'MESSAGE-ID'::text)
                     ->  Bitmap Heap Scan on dbmail_headervalue v  (cost=1409.82..221813.70 rows=2805 width=16) (actual time=28543.411..28623.623 rows=1 loops=1)
                           Recheck Cond: (v.headername_id = n.id)
                           Filter: ("substring"(v.headervalue, 0, 255) ~~* '%<@mail.gmail.com>%'::text)
                           ->  Bitmap Index Scan on dbmail_headervalue_testing  (cost=0.00..1409.82 rows=75940 width=0) (actual time=17555.572..17555.572 rows=1877009 loops=1)
                                 Index Cond: (v.headername_id = n.id)
               ->  Index Scan using dbmail_messages_physmessage_idx on dbmail_messages m  (cost=0.00..8.71 rows=1 width=16) (actual time=11.646..11.646 rows=0 loops=1)
                     Index Cond: (m.physmessage_id = v.physmessage_id)
                     Filter: ((m.status = ANY ('{0,1}'::integer[])) AND (m.mailbox_idnr = 12345))
         ->  Index Scan using dbmail_physmessage_pkey on dbmail_physmessage p  (cost=0.00..8.52 rows=1 width=8) (never executed)
               Index Cond: (p.id = m.physmessage_id)
 Total runtime: 28636.517 ms
(19 rows)


I also see a FTS on domain_headervalue invoked by the JOIN cluase
JOIN dbmail_headervalue v ON v.physmessage_id=p.id
place indexes on both columns
v.physmessage_id
p.id


Both columns already indexed here:

On public.dbmail_headervalue (alias v):
"dbmail_headervalue_2" btree (physmessage_id)

On public.dbmail_physmessage (alias p):
"dbmail_physmessage_pkey" PRIMARY KEY, btree (id)

Perhaps I should provide some data on the table sizes.

dbmail_headervalue is the largest table with respect to its record  
count: 36 million records.

dbmail_headername is small: 5640 records.
dbmail_physmessage and dbmail_messages are of an average size: ~2  
million records each.


Sorry for my cluelessness on this issue.  But Postgresql's ability to  
build a fast query plan for this query type at least occasionally is  
encouraging. :-)


Yar

the join clause JOIN dbmail_physmessage p ON m.physmessage_id=p.id  
uses indexed for both participants


Martin




From: y...@barnet.com.au
To: pgsql-general@postgresql.org
Subject: [GENERAL] Query plan choice issue
Date: Mon, 13 Sep 2010 09:36:35 +1000

Hi all,

I'm seeing a funny behaviour in Postgresql 8.4.4. Namely, a query can
be executed using either of two different query plans, one taking a
few milliseconds and the other, tens of seconds. The work_mem setting
doesn't seem to affect it -- tried to increase or decrease it by 2 or
4 times, but it didn't seem to favour the fast plan choice. Honestly,
I have no idea what affects the plan choice, but I saw Postgresql
change it at random.

The query in question looks like this -- sorry, it's 

Re: [GENERAL] Query plan choice issue

2010-09-12 Thread Martin Gainty

a cursory look at the plan shows a FTS on dbmail_headername invoked by the 
JOIN clause
JOIN dbmail_headername n ON v.headername_id=n.id
you would accelerate the seek appreciably by placing indexes on both 
participating columns
v.headername_id
n.id

I also see a FTS on dbmail_headervalue invoked by the JOIN clause
JOIN dbmail_headervalue v ON v.physmessage_id=p.id
place indexes on both columns
v.physmessage_id
p.id
 
the join clause JOIN dbmail_physmessage p ON m.physmessage_id=p.id uses indexes
for both participants

Martin 
 

> From: y...@barnet.com.au
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] Query plan choice issue
> Date: Mon, 13 Sep 2010 09:36:35 +1000
> 
> Hi all,
> 
> I'm seeing a funny behaviour in Postgresql 8.4.4. Namely, a query can 
> be executed using either of two different query plans, one taking a 
> few milliseconds and the other, tens of seconds. The work_mem setting 
> doesn't seem to affect it -- tried to increase or decrease it by 2 or 
> 4 times, but it didn't seem to favour the fast plan choice. Honestly, 
> I have no idea what affects the plan choice, but I saw Postgresql 
> change it at random.
> 
> The query in question looks like this -- sorry, it's rather complex:
> 
> SELECT message_idnr
> FROM dbmail_messages m
> JOIN dbmail_physmessage p ON m.physmessage_id=p.id
> JOIN dbmail_headervalue v ON v.physmessage_id=p.id
> JOIN dbmail_headername n ON v.headername_id=n.id
> WHERE mailbox_idnr = 12345 AND status IN (0,1) AND
> headername ILIKE 'MESSAGE-ID' AND SUBSTRING(headervalue,0,255)
> ILIKE '%<@mail.gmail.com>%'
> ORDER BY message_idnr;
> 
> It comes from DBMail. That said, I don't think DBMail can be blamed 
> here because at least sometimes Postgresql is able to do the right 
> thing with respect to that query.
> 
> Here is the slow plan:
> QUERY 
> PLAN
> --
> Sort (cost=1234610.11..1234610.11 rows=1 width=8) (actual 
> time=20933.166..20933.168 rows=1 loops=1)
> Sort Key: m.message_idnr
> Sort Method: quicksort Memory: 25kB
> -> Nested Loop (cost=0.00..1234610.10 rows=1 width=8) (actual 
> time=3327.658..20933.122 rows=1 loops=1)
> -> Nested Loop (cost=0.00..1234601.59 rows=1 width=24) 
> (actual time=3327.599..20933.056 rows=1 loops=1)
> -> Nested Loop (cost=0.00..1234549.74 rows=6 width=8) 
> (actual time=3327.503..20932.944 rows=1 loops=1)
> Join Filter: (v.headername_id = n.id)
> -> Seq Scan on dbmail_headername n 
> (cost=0.00..108.46 rows=1 width=8) (actual time=0.027..0.985 rows=1 
> loops=1)
> Filter: ((headername)::text ~~* 'MESSAGE- 
> ID'::text)
> -> Seq Scan on dbmail_headervalue v 
> (cost=0.00..1234407.96 rows=2666 width=16) (actual 
> time=3327.465..20931.942 rows=1 loops=1)
> Filter: ("substring"(v.headervalue, 0, 
> 255) ~~* '%<@mail.gmail.com>%'::text)
> -> Index Scan using dbmail_messages_physmessage_idx on 
> dbmail_messages m (cost=0.00..8.63 rows=1 width=16) (actual 
> time=0.088..0.100 rows=1 loops=1)
> Index Cond: (m.physmessage_id = v.physmessage_id)
> Filter: ((m.status = ANY ('{0,1}'::integer[])) 
> AND (m.mailbox_idnr = 12345))
> -> Index Scan using dbmail_physmessage_pkey on 
> dbmail_physmessage p (cost=0.00..8.50 rows=1 width=8) (actual 
> time=0.048..0.050 rows=1 loops=1)
> Index Cond: (p.id = m.physmessage_id)
> Total runtime: 20933.241 ms
> (17 rows)
> 
> And here is the fast plan:
> QUERY 
> PLAN
> ---
> Sort (cost=1175284.93..1175284.94 rows=1 width=8) (actual 
> time=14.163..14.165 rows=1 loops=1)
> Sort Key: m.message_idnr
> Sort Method: quicksort Memory: 25kB
> -> Nested Loop (cost=0.00..1175284.92 rows=1 width=8) (actual 
> time=4.272..14.152 rows=1 loops=1)
> -> Nested Loop (cost=0.00..1175276.41 rows=1 width=24) 
> (actual time=4.261..14.135 rows=1 loops=1)
> -> Nested Loop (cost=0.00..1175268.46 rows=1 w

[GENERAL] Query plan choice issue

2010-09-12 Thread Yaroslav Tykhiy

Hi all,

I'm seeing a funny behaviour in Postgresql 8.4.4.  Namely, a query can  
be executed using either of two different query plans, one taking a  
few milliseconds and the other, tens of seconds.  The work_mem setting  
doesn't seem to affect it -- tried to increase or decrease it by 2 or  
4 times, but it didn't seem to favour the fast plan choice.  Honestly,  
I have no idea what affects the plan choice, but I saw Postgresql  
change it at random.


The query in question looks like this -- sorry, it's rather complex:

SELECT message_idnr
FROM dbmail_messages m
JOIN dbmail_physmessage p ON m.physmessage_id=p.id
JOIN dbmail_headervalue v ON v.physmessage_id=p.id
JOIN dbmail_headername n ON v.headername_id=n.id
WHERE mailbox_idnr = 12345 AND status IN (0,1) AND
headername ILIKE 'MESSAGE-ID' AND SUBSTRING(headervalue,0,255)
ILIKE '%<@mail.gmail.com>%'
ORDER BY message_idnr;

It comes from DBMail.  That said, I don't think DBMail can be blamed  
here because at least sometimes Postgresql is able to do the right  
thing with respect to that query.


Here is the slow plan:
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Sort (cost=1234610.11..1234610.11 rows=1 width=8) (actual time=20933.166..20933.168 rows=1 loops=1)
   Sort Key: m.message_idnr
   Sort Method: quicksort Memory: 25kB
   -> Nested Loop (cost=0.00..1234610.10 rows=1 width=8) (actual time=3327.658..20933.122 rows=1 loops=1)
         -> Nested Loop (cost=0.00..1234601.59 rows=1 width=24) (actual time=3327.599..20933.056 rows=1 loops=1)
               -> Nested Loop (cost=0.00..1234549.74 rows=6 width=8) (actual time=3327.503..20932.944 rows=1 loops=1)
                     Join Filter: (v.headername_id = n.id)
                     -> Seq Scan on dbmail_headername n (cost=0.00..108.46 rows=1 width=8) (actual time=0.027..0.985 rows=1 loops=1)
                           Filter: ((headername)::text ~~* 'MESSAGE-ID'::text)
                     -> Seq Scan on dbmail_headervalue v (cost=0.00..1234407.96 rows=2666 width=16) (actual time=3327.465..20931.942 rows=1 loops=1)
                           Filter: ("substring"(v.headervalue, 0, 255) ~~* '%<@mail.gmail.com>%'::text)
               -> Index Scan using dbmail_messages_physmessage_idx on dbmail_messages m (cost=0.00..8.63 rows=1 width=16) (actual time=0.088..0.100 rows=1 loops=1)
                     Index Cond: (m.physmessage_id = v.physmessage_id)
                     Filter: ((m.status = ANY ('{0,1}'::integer[])) AND (m.mailbox_idnr = 12345))
         -> Index Scan using dbmail_physmessage_pkey on dbmail_physmessage p (cost=0.00..8.50 rows=1 width=8) (actual time=0.048..0.050 rows=1 loops=1)
               Index Cond: (p.id = m.physmessage_id)
 Total runtime: 20933.241 ms
(17 rows)

And here is the fast plan:
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Sort (cost=1175284.93..1175284.94 rows=1 width=8) (actual time=14.163..14.165 rows=1 loops=1)
   Sort Key: m.message_idnr
   Sort Method: quicksort Memory: 25kB
   -> Nested Loop (cost=0.00..1175284.92 rows=1 width=8) (actual time=4.272..14.152 rows=1 loops=1)
         -> Nested Loop (cost=0.00..1175276.41 rows=1 width=24) (actual time=4.261..14.135 rows=1 loops=1)
               -> Nested Loop (cost=0.00..1175268.46 rows=1 width=32) (actual time=4.249..14.117 rows=1 loops=1)
                     -> Index Scan using dbmail_messages_mailbox_idx on dbmail_messages m (cost=0.00..4153.35 rows=786 width=16) (actual time=0.043..2.810 rows=358 loops=1)
                           Index Cond: (mailbox_idnr = 12345)
                           Filter: (status = ANY ('{0,1}'::integer[]))
                     -> Index Scan using dbmail_headervalue_2 on dbmail_headervalue v (cost=0.00..1489.96 rows=1 width=16) (actual time=0.028..0.029 rows=0 loops=358)
                           Index Cond: (v.physmessage_id = m.physmessage_id)
                           Filter: ("substring"(v.headervalue, 0, 255) ~~* '%<@mail.gmail.com>%'::text)
               -> Index Scan using dbmail_headername_pkey on dbmail_headername n (cost=0.00..7.94 rows=1 width=8) (actual time=0.007..0.008 rows=1 loops=1)
                     Index Cond: (n.id = v.headername_id)
                     Filter: ((n.headername)::text ~~* 'MESSAGE-ID'::text)
         -> Index Scan using dbmail_physmessage_pkey on dbmail_physmessage p (cost=0.00..8.50 rows=1 width=8) (actual time=0.006..0.007 rows=1 loops=1)
               Index Cond: (p.id = m.physmessage_id)
 Total runtime: 14.231 ms
(18 rows)

Re: [GENERAL] query plan in pg7.4 vs 8.0.3

2005-06-17 Thread Tom Lane
Albert Vernon Smith <[EMAIL PROTECTED]> writes:
> I just re-checked, and all the same index definitions are on both  
> tables.

Well, that seems to imply that the 8.0 planner is missing the plan that
7.4 finds ... which is both hard to believe and impossible to
investigate with this much info.  Would it be possible to get a copy of
the database?  Or at least a schema dump?

regards, tom lane



Re: [GENERAL] query plan in pg7.4 vs 8.0.3

2005-06-17 Thread Albert Vernon Smith
I just re-checked, and all the same index definitions are on both  
tables.


Cheers,
-albert

On 17.6.2005, at 14:20, Tom Lane wrote:


Albert Vernon Smith <[EMAIL PROTECTED]> writes:


I am in process of migrating a database from Pg 7.4.3 to Pg 8.0.3.  I
dumped and reloaded the data, and things look good, until I try to
work with it.  When I try to make a query to a view based on a large
join (select * from 'view' limit 10), the 7.4.3 query works very
well, and the indexes are being used well.  But, when I go to 8.0.3,
there are some very costly joins, and the query is not workable.



Are you sure you have all the same indexes in the 8.0 database?  It
looks like one might be missing on assay2rs.assay_id.

regards, tom lane






Re: [GENERAL] query plan in pg7.4 vs 8.0.3

2005-06-17 Thread Tom Lane
Albert Vernon Smith <[EMAIL PROTECTED]> writes:
> I am in process of migrating a database from Pg 7.4.3 to Pg 8.0.3.  I  
> dumped and reloaded the data, and things look good, until I try to  
> work with it.  When I try to make a query to a view based on a large  
> join (select * from 'view' limit 10), the 7.4.3 query works very  
> well, and the indexes are being used well.  But, when I go to 8.0.3,  
> there are some very costly joins, and the query is not workable.   

Are you sure you have all the same indexes in the 8.0 database?  It
looks like one might be missing on assay2rs.assay_id.
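
If that index really is missing, recreating it would be roughly the following
(the index name is arbitrary):

CREATE INDEX assay2rs_assay_id_idx ON assay2rs (assay_id);
ANALYZE assay2rs;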

regards, tom lane



[GENERAL] query plan in pg7.4 vs 8.0.3

2005-06-17 Thread Albert Vernon Smith
I am in the process of migrating a database from Pg 7.4.3 to Pg 8.0.3.  I
dumped and reloaded the data, and things look good, until I try to  
work with it.  When I try to make a query to a view based on a large  
join (select * from 'view' limit 10), the 7.4.3 query works very  
well, and the indexes are being used well.  But, when I go to 8.0.3,  
there are some very costly joins, and the query is not workable.   
(I've put the 'EXPLAIN' results below for the same database, and the  
same query.)  Also, the memory settings have been made the same for  
7.4 vs. 8.0.


I should point out that some of the indexes (and joins) are based on  
the type 'citext' (http://gborg.postgresql.org/project/citext/projdisplay.php).  I did compile and add this to my Pg 8
installation, but I am concerned that this might be the root  
problem.  There seems to be no update on this work since sometime in  
2003, and it might not be appropriate to use in 8.0.3.  But, it seems  
that indexes are being used when I do searches on the individual  
tables.  So, I'm a bit stumped at the moment.


So, I'd like to ask for advice on my query, and what might be the cause.
If 'citext' is the culprit, how can I migrate a very large database
which uses this datatype extensively?


Any help is greatly appreciated.

Thanks,
-albert

===
In Pg 8.0.3--

EXPLAIN select * from snp_quality_overview limit 10;

                                                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=83734.45..84167.07 rows=10 width=415)
   ->  Hash Join  (cost=83734.45..10949200.55 rows=251154 width=415)
         Hash Cond: ("outer".panel_id = "inner".panel_id)
         ->  Nested Loop  (cost=83733.39..10924084.09 rows=251154 width=359)
               ->  Nested Loop  (cost=83733.39..5685331.85 rows=253240 width=324)
                     ->  Hash Join  (cost=83733.39..3840930.37 rows=423746 width=291)
                           Hash Cond: ("outer".assay_id = "inner".assay_id)
                           ->  Nested Loop  (cost=862.47..3680521.72 rows=602829 width=259)
                                 ->  Hash Join  (cost=862.47..258437.41 rows=602829 width=195)
                                       Hash Cond: ("outer".genotype_submission_id = "inner".genotype_submission_id)
                                       ->  Seq Scan on genotype_set gs  (cost=0.00..218413.11 rows=6626211 width=117)
                                       ->  Hash  (cost=861.68..861.68 rows=316 width=94)
                                             ->  Nested Loop  (cost=450.77..861.68 rows=316 width=94)
                                                   Join Filter: ("inner".lsid = "outer".prot_lsid)
                                                   ->  Merge Join  (cost=446.02..671.68 rows=358 width=83)
                                                         Merge Cond: ("outer".batch_file_id = "inner".batch_file_id)
                                                         ->  Index Scan using batch_file_id3 on genotype_submission gsb  (cost=0.00..210.81 rows=3472 width=75)
                                                         ->  Sort  (cost=446.02..447.31 rows=517 width=20)
                                                               Sort Key: batch_file.batch_file_id
                                                               ->  Merge Join  (cost=23.66..422.72 rows=517 width=20)
                                                                     Merge Cond: ("outer".monthly_batch_id = "inner".monthly_batch_id)
                                                                     Join Filter: ("outer".center_name = "inner".center_name)
                                                                     ->  Index Scan using monthly_batch_id on batch_file  (cost=0.00..298.59 rows=5024 width=24)
                                                                     ->  Sort  (cost=23.66..24.02 rows=145 width=19)
                                                                           Sort Key: monthly_batch.monthly_batch_id
                                                                           ->  Seq Scan on monthly_batch  (cost=0.00..18.45 rows=145 width=19)
                                                   ->  Materialize  (cost=4.74..4.97 rows=23 width=72)
                                                         ->  Hash Join  (cost=1.10..4.72 rows=23 width=72)
                                                               Hash Cond: ("outer".genotyping_platform_id = "inner".genotyping_platform_id)
                                                               ->  Seq Scan on protocol p  (cost=0.00..3.26 rows=26 width=65)
                                                               ->  Hash  (cost=1.08..1.08 rows=8

Re: [GENERAL] Query plan question

2003-06-26 Thread Maksim Likharev
Thanks Tom,
works, have to test performance


-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 26, 2003 7:36 AM
To: Maksim Likharev
Cc: GENERAL
Subject: Re: [GENERAL] Query plan question 


"Maksim Likharev" <[EMAIL PROTECTED]> writes:
> basically I complaining that PG does not do what I told to do or 
> was hoping to do.

Okay, now I get the point: you want to prevent the "pt" sub-select from
being flattened into the outer query.

7.3.1 through 7.3.3 will actually do what you want (they won't flatten a
sub-select that has any sub-selects in its output list) but we got a lot
of flak for that and 7.4 will go back to the prior behavior.  In most
scenarios it's a win for the planner to flatten wherever possible.

Probably the easiest way to handle it is to insert a DISTINCT or LIMIT
clause in the sub-select; that will unconditionally keep the planner
from flattening the sub-select.  For example,

...
FROM prod.t_p AS p INNER JOIN t_temp AS t 
ON p.did = t.did 
LEFT OUTER JOIN prod.t_pinv AS pi
ON p.kid = pi.kid AND pi.orderid = 'S' 
-- hack to keep this separate from outer plan:
OFFSET 0
) AS pt

LEFT OUTER JOIN prod.t_dmp AS pdb
ON pt.kid = pdb.kid
...

I don't foresee any future planner changes that would be likely to
bypass a LIMIT/OFFSET clause.

regards, tom lane



Re: [GENERAL] Query plan question

2003-06-26 Thread Tom Lane
"Maksim Likharev" <[EMAIL PROTECTED]> writes:
> basically I complaining that PG does not do what I told to do or 
> was hoping to do.

Okay, now I get the point: you want to prevent the "pt" sub-select from
being flattened into the outer query.

7.3.1 through 7.3.3 will actually do what you want (they won't flatten a
sub-select that has any sub-selects in its output list) but we got a lot
of flak for that and 7.4 will go back to the prior behavior.  In most
scenarios it's a win for the planner to flatten wherever possible.

Probably the easiest way to handle it is to insert a DISTINCT or LIMIT
clause in the sub-select; that will unconditionally keep the planner
from flattening the sub-select.  For example,

...
FROM prod.t_p AS p INNER JOIN t_temp AS t 
ON p.did = t.did 
LEFT OUTER JOIN prod.t_pinv AS pi
ON p.kid = pi.kid AND pi.orderid = 'S' 
-- hack to keep this separate from outer plan:
OFFSET 0
) AS pt

LEFT OUTER JOIN prod.t_dmp AS pdb
ON pt.kid = pdb.kid
...

I don't foresee any future planner changes that would be likely to
bypass a LIMIT/OFFSET clause.

regards, tom lane



Re: [GENERAL] Query plan: varchar vs char indexes

2001-02-03 Thread Tom Lane

"Richard Huxton" <[EMAIL PROTECTED]> writes:
> With indexed varchar fields the explain changes - performing a seq-scan on
> users rather than using the index.
> Is this because the estimator guesses costs differently for char vs varchar
> or is it because I'm not explicitly casting the id fields?

The estimator has no special ideas about either char or varchar.
However there are some discrepancies in the sets of available functions
for the two datatypes, so what appears to be the same expression may
translate into different function invocations --- especially if you are
doing random combinations of datatypes and expecting the system to pick
an operator for you.  I suspect it is picking a combination that doesn't
work out to be semantically equivalent to the '=' operator in the
index's operator class, so it doesn't think it can use the index.

> And yes - I know I probably shouldn't be joining an int to a varchar.

Not without being pretty darn careful.  You didn't actually say what the
datatype of tag_list.id is, however.

regards, tom lane



Re: [GENERAL] QUERY PLAN:

1999-10-27 Thread Ross J. Reedstrom

On Wed, Oct 27, 1999 at 11:03:45AM +0100, Tim Joyce wrote:
> can someone point me at documentation so that i can unserstand the results
> of an EXPLAIN, eg:
> 
> Index Scan using words_id_idx on books_idx  (cost=441.19 rows=7644 width=8)
> 
> explain select * from books_idx where wrd_id=1;
> 
> I am paticularly interested in what the rows= figure means.

I've picked up a little info on this from following the hackers list,
so I don't know what docs to point you at. My understanding is that the
rows= represents the number of tuples the optimizer estimates will be
returned by that step of the execution plan. In the example you've shown,
it's an index scan of a field, and the estimate is based on the last known
number of tuples in the table (as of your last VACUUM ANALYZE), an estimate
of the selectivity of the operator being applied to this index (<, =,
>, etc.), and the approximate dispersion of the values in that field.

For more detail, I'd suggest checking the archives of the pgsql-hackers
list, and perhaps then asking on the list itself. There's some hints in the
"PostgreSQL Programmer's Guide" in the "Extending SQL" sections on functions,
operators, and interfacing them to indices.

Ross
-- 
Ross J. Reedstrom, Ph.D., <[EMAIL PROTECTED]> 
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St.,  Houston, TX 77005





[GENERAL] QUERY PLAN:

1999-10-27 Thread Tim Joyce

Can someone point me at documentation so that I can understand the results
of an EXPLAIN, e.g.:

Index Scan using words_id_idx on books_idx  (cost=441.19 rows=7644 width=8)

explain select * from books_idx where wrd_id=1;

I am particularly interested in what the rows= figure means.

TIA

timj