Re: [GENERAL] Query plan for Merge Semi Join

2017-11-01 Thread Laurenz Albe
Peter J. Holzer wrote:
> [PostgreSQL 9.5.9 on x86_64-pc-linux-gnu, compiled by gcc (Debian
> 4.9.2-10) 4.9.2, 64-bit]
> 
> While investigating a performance issue, I found this query plan:
> 
> wds=> explain analyze
>   select facttablename, columnname, term, concept_id, t.hidden, language, 
> register, c.id, canonicalname, description, parent, type, r.sortorder
>   from term t, concept c, relation r
>   where facttablename='facttable_kon_eh' and columnname='arbeitsvolumen'
> and exists (select 1 from facttable_kon_eh f where 
> f.arbeitsvolumen=t.term  and thema in (values (E'E')))
> and c.id=concept_id and r.child=concept_id;
> ╔╗
> ║ 
>   QUERY PLAN  
>  ║
> ╟╢
> ║ Nested Loop  (cost=22984.13..241822.98 rows=1 width=169) (actual 
> time=667.608..3275.037 rows=6 loops=1)
> ║
> ║   ->  Nested Loop  (cost=22983.70..241822.50 rows=1 width=154) (actual 
> time=667.587..3274.972 rows=6 loops=1)
>   ║
> ║ ->  Merge Semi Join  (cost=22983.27..241819.04 rows=1 width=87) 
> (actual time=667.559..3274.877 rows=6 loops=1)
>  ║
> ║   Merge Cond: ((t.term)::text = (f.arbeitsvolumen)::text)   
>   
>  ║
> ║   ->  Index Scan using term_term_idx on term t  
> (cost=0.56..206841.09 rows=18 width=87) (actual time=667.467..2929.430 rows=7 
> loops=1) ║
> ║ Filter: (((facttablename)::text = 
> 'facttable_kon_eh'::text) AND ((columnname)::text = 'arbeitsvolumen'::text))  
>║
> ║ Rows Removed by Filter: 3874190 
>   
>  ║
> ║   ->  Materialize  (cost=0.43..399167.10 rows=3548798 width=7) 
> (actual time=0.086..295.708 rows=184791 loops=1)  
>   ║
> ║ ->  Nested Loop Semi Join  (cost=0.43..390295.10 
> rows=3548798 width=7) (actual time=0.076..278.006 rows=184791 loops=1)
> ║
> ║   Join Filter: ((f.thema)::text = ('E'::text))  
>   
>  ║
> ║   ->  Index Scan using 
> facttable_kon_eh_arbeitsvolumen_idx on facttable_kon_eh f  
> (cost=0.43..337063.11 rows=3548798 width=9) (actual time=0.052..199.733 
> rows=184791 loops=1) ║
> ║   ->  Materialize  (cost=0.00..0.03 rows=1 
> width=32) (actual time=0.000..0.000 rows=1 loops=184791)  
>   ║
> ║ ->  Result  (cost=0.00..0.01 rows=1 
> width=0) (actual time=0.001..0.001 rows=1 loops=1)
>  ║
> ║ ->  Index Scan using concept_pkey on concept c  (cost=0.43..3.45 
> rows=1 width=67) (actual time=0.011..0.012 rows=1 loops=6)
> ║
> ║   Index Cond: (id = t.concept_id)   
>   
>  ║
> ║   ->  Index Scan using relation_child_idx on relation r  (cost=0.43..0.47 
> rows=1 width=19) (actual time=0.008..0.009 rows=1 loops=6)
>║
> ║ Index Cond: (child = c.id)  
>   
>  ║
> ║ Planning time: 15.624 ms
>  

Re: [GENERAL] Query plan not updated after dropped index

2016-02-18 Thread Victor Blomqvist
On Thu, Feb 18, 2016 at 11:05 PM, Tom Lane  wrote:

> Victor Blomqvist  writes:
> > We just had a major issue on our databases, after a index was replaced a
> > user defined function didnt change its query plan to use the new index.
>
> I'm suspicious that this is some variant of the problem discussed a couple
> days ago:
>
>
> http://www.postgresql.org/message-id/flat/CAKt_ZfsvdE3WiHUjg81MT1KzOkebqz_bg=dbhf4l6rkfjte...@mail.gmail.com
>
> However, that theory requires that the index not have been immediately
> usable, which implies that it initially had some broken HOT chains,
> which really should not have happened if you were simply replacing one
> index with an identical one.  (The pre-existing index should've been
> enough to ensure HOT chain consistency for its columns.)
>
> Perhaps you were doing something "cute" like replacing a single-column
> index with a multi-column one?
>

No the new index looked exactly as the old one. The index was created with
CONCURRENTLY, and we waited until it returned.

I could use the index just fine when running a query in a separate
connection, so I am not sure if it helps dropping the old index in a
transaction and run a query to verify that the new index is usable? How can
I know that the new index is usable from already open connections?

Another thing to note about this index is that it sometimes has an
unexpected zero page inside, resulting in this error: ERROR:  index
"user_pictures_picture_dhash_idx" contains unexpected zero page at block
123780. But it always happens sporadically on hour read slaves, and
rerunning the query that cause it doesnt reproduce the error. I asked about
this problem before, here:
http://www.postgresql.org/message-id/flat/cal870dvxr9fhkyej5smydk4pjupl5kwabuhspbtqek03gdd...@mail.gmail.com#cal870dvxr9fhkyej5smydk4pjupl5kwabuhspbtqek03gdd...@mail.gmail.com



>
> regards, tom lane
>


Re: [GENERAL] Query plan not updated after dropped index

2016-02-18 Thread Tom Lane
Victor Blomqvist  writes:
> We just had a major issue on our databases, after a index was replaced a
> user defined function didnt change its query plan to use the new index.

I'm suspicious that this is some variant of the problem discussed a couple
days ago:

http://www.postgresql.org/message-id/flat/CAKt_ZfsvdE3WiHUjg81MT1KzOkebqz_bg=dbhf4l6rkfjte...@mail.gmail.com

However, that theory requires that the index not have been immediately
usable, which implies that it initially had some broken HOT chains,
which really should not have happened if you were simply replacing one
index with an identical one.  (The pre-existing index should've been
enough to ensure HOT chain consistency for its columns.)

Perhaps you were doing something "cute" like replacing a single-column
index with a multi-column one?

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query plan not updated after dropped index

2016-02-18 Thread Vitaly Burovoy
On 2/18/16, Victor Blomqvist  wrote:
> Hello!
>
> We just had a major issue on our databases, after a index was replaced a
> user defined function didnt change its query plan to use the new index. At
> least this is our theory, since the function in question became much slower
> and as a result brought our system to a halt.
>
> Basically it went:
> 1. create new index (a simple btree on a bigint column index)
> 2. drop old index
> 3. rename new index to old index name
> 4. analyze table
>
> After these steps normally our functions will update their plans and use
> the new index just fine. However this time the function (only one function
> use this particular index) seemed to take forever to complete. This is a
> 40GB table so querying for something not indexed would take a long time.
> Therefore my suspicion is that the function didnt start to use the new
> index.

My guess is that backends somehow cached generic plan[1] and didn't
recalculate it.

> Adding to the strangeness is that if I ran the function manually it was
> fast,

It is because _backends_ (processes) cache plans, not DBMS (i.e. they
are not shared).
So you connected to DB (making a new backend process), run the
function; backend could not find cached plan and create it for itself.
Plan for _your_ connection includes the new index.

> only when called from our application through pg_bouncer it was slow.
> I should also say that the function is only used on our 3 read slaves setup
> to our database.
>
> Things we tried to fix this:
> 1. Analyze table
> 2. Restart our application
> 3. Recreate the function
> 4. Kill the slow running queries with pg_cancel_backend()
>
> These things did not help.

Since pgbouncer reuses connections then backends processes still have
cached plan. If you tried pg_terminate_backend() it could help.

> Instead what helped in the end was to replace the function with an extra
> useless where clause (in the hope that it would force it to create a new
> plan)
>
> So, the function only have a single SELECT inside:
> RETURN QUERY
> SELECT * FROM table
>   WHERE bigint_column = X
>   LIMIT 100 OFFSET 0;
>
> And this is my modification that made it work again:
> RETURN QUERY
> SELECT * FROM table
>   WHERE bigint_column = X AND 1=1
>   LIMIT 100 OFFSET 0;

Yes, it is a new query for PG, and therefore it requires a new plan
because it is not in a cache.

> Obviously we are now worried why this happened

Also my guess you did CREATE INDEX CONCURRENTLY and there is several
cases[2] when it can not be used ("invalid" state or waiting for
unfinished transactions).
When the old index is dropped but the new index is not accessible
while a query/function is running a generated plan does not include
that index. And in case of caching such plan is caching and uses later
without index too.

> and how we can avoid it in
> the future? We run Postgres 9.3 on CentOS 6.

Firstly you can drop the old index not immediately but a little later
depending on yours queries time.
Also after creating the new index (and possible waiting a little) you
can drop index in a _transaction_ and see whether the new index is
used in an EXPLAIN of any query that use it or not. In the first case
do COMMIT, in the second case just do ROLLBACK and leave old index for
using.

> Thanks!
> Victor

[1]http://www.postgresql.org/docs/devel/static/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING
[2]http://www.postgresql.org/docs/devel/static/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY
-- 
Best regards,
Vitaly Burovoy


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query plan not updated after dropped index

2016-02-18 Thread Victor Blomqvist
The end goal is to get rid of index bloat. If there is a better way to
handle this Im all ears!

/Victor

On Thu, Feb 18, 2016 at 5:21 PM, Oleg Bartunov  wrote:

>
>
> On Thu, Feb 18, 2016 at 11:17 AM, Victor Blomqvist  wrote:
>
>> Hello!
>>
>> We just had a major issue on our databases, after a index was replaced a
>> user defined function didnt change its query plan to use the new index. At
>> least this is our theory, since the function in question became much slower
>> and as a result brought our system to a halt.
>>
>> Basically it went:
>> 1. create new index (a simple btree on a bigint column index)
>> 2. drop old index
>> 3. rename new index to old index name
>>
>
> why do you did this !?
>
>
>> 3. analyze table
>>
>> After these steps normally our functions will update their plans and use
>> the new index just fine. However this time the function (only one function
>> use this particular index) seemed to take forever to complete. This is a
>> 40GB table so querying for something not indexed would take a long time.
>> Therefor my suspicion is that the function didnt start to use the new index.
>>
>> Adding to the strangeness is that if I ran the function manually it was
>> fast, only when called from our application through pg_bouncer it was slow.
>> I should also say that the function is only used on our 3 read slaves setup
>> to our database.
>>
>> Things we tried to fix this:
>> 1. Analyze table
>> 2. Restart our application
>> 3. Recreate the function
>> 4. Kill the slow running queries with pg_cancel_backend()
>>
>> These things did not help.
>>
>> Instead what helped in the end was to replace the function with an extra
>> useless where clause (in the hope that it would force it to create a new
>> plan)
>>
>> So, the function only have a single SELECT inside:
>> RETURN QUERY
>> SELECT * FROM table
>>   WHERE bigint_column = X
>>   LIMIT 100 OFFSET 0;
>>
>> And this is my modification that made it work again:
>> RETURN QUERY
>> SELECT * FROM table
>>   WHERE bigint_column = X AND 1=1
>>   LIMIT 100 OFFSET 0;
>>
>>
>> Obviously we are now worried why this happened and how we can avoid it in
>> the future? We run Postgres 9.3 on CentOS 6.
>>
>> Thanks!
>> Victor
>>
>
>


Re: [GENERAL] Query plan not updated after dropped index

2016-02-18 Thread Oleg Bartunov
On Thu, Feb 18, 2016 at 11:17 AM, Victor Blomqvist  wrote:

> Hello!
>
> We just had a major issue on our databases, after a index was replaced a
> user defined function didnt change its query plan to use the new index. At
> least this is our theory, since the function in question became much slower
> and as a result brought our system to a halt.
>
> Basically it went:
> 1. create new index (a simple btree on a bigint column index)
> 2. drop old index
> 3. rename new index to old index name
>

why do you did this !?


> 3. analyze table
>
> After these steps normally our functions will update their plans and use
> the new index just fine. However this time the function (only one function
> use this particular index) seemed to take forever to complete. This is a
> 40GB table so querying for something not indexed would take a long time.
> Therefor my suspicion is that the function didnt start to use the new index.
>
> Adding to the strangeness is that if I ran the function manually it was
> fast, only when called from our application through pg_bouncer it was slow.
> I should also say that the function is only used on our 3 read slaves setup
> to our database.
>
> Things we tried to fix this:
> 1. Analyze table
> 2. Restart our application
> 3. Recreate the function
> 4. Kill the slow running queries with pg_cancel_backend()
>
> These things did not help.
>
> Instead what helped in the end was to replace the function with an extra
> useless where clause (in the hope that it would force it to create a new
> plan)
>
> So, the function only have a single SELECT inside:
> RETURN QUERY
> SELECT * FROM table
>   WHERE bigint_column = X
>   LIMIT 100 OFFSET 0;
>
> And this is my modification that made it work again:
> RETURN QUERY
> SELECT * FROM table
>   WHERE bigint_column = X AND 1=1
>   LIMIT 100 OFFSET 0;
>
>
> Obviously we are now worried why this happened and how we can avoid it in
> the future? We run Postgres 9.3 on CentOS 6.
>
> Thanks!
> Victor
>


Re: [GENERAL] Query plan for currently executing query?

2013-09-18 Thread Rowan Collins

François Beausoleil wrote (on 18/09/2013):

Hi!

Is it possible to get the plan of a query that's currently running? I have 
queries which normally take about 15 minutes, but are now at 2+ hours. I forgot 
to add the code to dump the plan to STDOUT, so I really don't know.

It might be data volume, but I want to confirm that the plan isn't bogus.

I'm doing batch imports of data on PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu, 
compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit.

Thanks,
François Beausoleil



Running an EXPLAIN on the exact query should give you the plan. An 
EXPLAIN ANALYZE would have to wait for the query to complete either way, 
so you wouldn't be able to get it mid-way through a running process.


If you don't know the exact query running, then if the 
stats_command_string config setting is on, you should be able to get it 
by querying the pg_stat_activity view.


The only other problem I can think of is if there are temporary tables 
or other session-specific objects that would exist only within the 
running process.


Another possibility to consider (which would also show up in 
pg_stat_activity or similar views) is that the query is waiting on some 
kind of lock, rather than just executing slowly.


Regards,
--
Rowan Collins
[IMSoP]


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query plan for currently executing query?

2013-09-18 Thread Tim Kane


On 18/09/2013 14:44, Rowan Collins rowan.coll...@gmail.com wrote:


Running an EXPLAIN on the exact query should give you the plan. An
EXPLAIN ANALYZE would have to wait for the query to complete either way,
so you wouldn't be able to get it mid-way through a running process.

If you don't know the exact query running, then if the
stats_command_string config setting is on, you should be able to get it
by querying the pg_stat_activity view.

The only other problem I can think of is if there are temporary tables
or other session-specific objects that would exist only within the
running process.

Another possibility to consider (which would also show up in
pg_stat_activity or similar views) is that the query is waiting on some
kind of lock, rather than just executing slowly.


I think the OP was hoping for a solution that would allow him to retrieve
the query plan that was generated at execution time.  I've certainly
wished for this type of facility in the past, in the scenario where a
given table may have been ANALYZE in the interim - making it difficult to
perform any useful diagnostics on the problem query.

In cases such as that, it is sometimes better to cancel the currently
running query and restart it in order to take advantage of the newer
statistics and generate a better QEP.


What might be a useful feature addition, is if at the time of the ANALYZE
operation - postgres could identify those currently running queries and
flag them as having begun execution with a dirty/stale QEP.  Possibly this
could be extended to heuristically determine if a query might return
faster if it were cancelled and re-executed under a new QEP?

Tim





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query plan for currently executing query?

2013-09-18 Thread Rodrigo Gonzalez
On Wed, 18 Sep 2013 16:24:16 +0100
Tim Kane tim.k...@gmail.com wrote:

 
 
 On 18/09/2013 14:44, Rowan Collins rowan.coll...@gmail.com wrote:
 
 
 Running an EXPLAIN on the exact query should give you the plan. An
 EXPLAIN ANALYZE would have to wait for the query to complete either
 way, so you wouldn't be able to get it mid-way through a running
 process.
 
 If you don't know the exact query running, then if the
 stats_command_string config setting is on, you should be able to get
 it by querying the pg_stat_activity view.
 
 The only other problem I can think of is if there are temporary
 tables or other session-specific objects that would exist only
 within the running process.
 
 Another possibility to consider (which would also show up in
 pg_stat_activity or similar views) is that the query is waiting on
 some kind of lock, rather than just executing slowly.
 
 
 I think the OP was hoping for a solution that would allow him to
 retrieve the query plan that was generated at execution time.  I've
 certainly wished for this type of facility in the past, in the
 scenario where a given table may have been ANALYZE in the interim -
 making it difficult to perform any useful diagnostics on the problem
 query.
 

Maybe auto-explain is the solution?

http://www.postgresql.org/docs/9.1/static/auto-explain.html


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query plan different depending on the value of where filter

2013-07-17 Thread Tom Lane
Looby, Denis denis.lo...@hp.com writes:
 What I don't understand is why the plan is different just because the 
 group_id = has changed value?
 Does the planner have some statistical info on the contents of non-indexed 
 rows?

Of course.  In this case it knows that a nestloop would be a loser
because many rows would be fetched from the outer table.

 I don't quite understand why this plan executes the sequential scan once, 
 whereas the slow one does it 5001 times, which I believe is the main source 
 of the difference.

In the hash join case, it builds the hash table, discovers that it's
empty (because there are no rows with type = 2), and concludes that it
need not scan the outer relation.  This is a corner case that the
planner doesn't try to account for because it couldn't be predicted
reliably.  I wouldn't get too excited about it if I were you, because
presumably the case isn't going to happen all that much in production
either (else why are you bothering with the query?).

Personally I'd suggest getting rid of the
WHERE outer_tab.outer_key = inner_tab_1.key_to
clause, which is entirely redundant with the IN condition and is
preventing the planner from seeing that it could optimize the IN
into a semijoin.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query plan optimization: sorting vs. partitioning

2011-02-02 Thread Tom Lane
Sergey Zaharchenko doublef.mob...@gmail.com writes:
 I need to select some data in the time order. When I query a separate
 smaller table, the index is used an no sorting is needed. However,
 when I query the main table, it occurs:
 ...
   -  Sort ...
  Sort Key: ...
  Sort Method: ...
  -  Result ...
-  Append ...
  -  Seq Scan on states
Filter: ...
  -  Seq Scan on states_20101206
Filter: ...
 ...

 I see the database doesn't understand that there are no entries in the
 main table, so it has to assume the Append data is not ordered. Is
 there a way to avoid sorting?

No.  In existing releases there is no plan type that can produce
presorted output from an append relation (ie, an inheritance tree).
9.1 will be able to do that, but it wasn't exactly a small fix:
http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=11cad29c91524aac1d0b61e0ea0357398ab79bf8

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query plan optimization: sorting vs. partitioning

2011-02-02 Thread Sergey Zaharchenko
2011/2/2, Tom Lane t...@sss.pgh.pa.us:

 I see the database doesn't understand that there are no entries in the
 main table, so it has to assume the Append data is not ordered. Is
 there a way to avoid sorting?

 No.  In existing releases there is no plan type that can produce
 presorted output from an append relation (ie, an inheritance tree).
 9.1 will be able to do that, but it wasn't exactly a small fix:
 http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=11cad29c91524aac1d0b61e0ea0357398ab79bf8

OK, I hope I'll be able to come up with a stored procedure to query
the tables directly, then. Thanks!

-- 
DoubleF

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query plan choice issue

2010-09-20 Thread Yaroslav Tykhiy

On 14/09/2010, at 10:37 AM, Yaroslav Tykhiy wrote:


On 14/09/2010, at 12:41 AM, Tom Lane wrote:


Yaroslav Tykhiy y...@barnet.com.au writes:

[...]


I think the major problem you're having is that the planner is
completely clueless about the selectivity of the condition
substring(v.headervalue, 0,  255) ~~* '%@mail.gmail.com%'
If it knew that that would match only one row, instead of several
thousand, it would likely pick a different plan.

In recent versions of PG you could probably make a noticeable
improvement in this if you just dropped the substring() restriction
... do you actually need that?  Alternatively, if you don't want to
change the query logic at all, I'd try making an index on
substring(v.headervalue, 0, 255).  I'm not expecting the query
to actually *use* the index, mind you.  But its existence will prompt
ANALYZE to collect stats on the expression's value, and that will
help the planner with estimating the ~~* condition.


Well, that substring() and ILIKE combo looked suspicious to me,  
too.  However, there already was an index on  
substring(v.headervalue, 0, 255) but the fast query plan didn't seem  
to use it, it used a different index instead:



[...]
Meanwhile, a mate of mine lurking on this list pointed out that  
reducing random_page_cost might help here and it did:  
random_page_cost of 2 made the fast query favourable.


Can it mean that the default planner configuration slightly  
overfavours seq scans?




Funnily, after a few days of running with random_page_cost=2, exactly  
the same query became slow again and I had to reduce random_page_cost  
further to 1.5 to make it fast.  Can it be a sign of a problem in the  
planner?


Thanks!

Yar

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query plan choice issue

2010-09-13 Thread Yaroslav Tykhiy

Hi Martin,

Thank you for your response!

On 13/09/2010, at 10:49 AM, Martin Gainty wrote:

a cursory look of the plan details a FTS on dbmail_headername  
invoked by the JOIN clause

JOIN dbmail_headername n ON v.headername_id=n.id
you would accelerate the seek appreciably by placing indexes on both  
participating columns

v.headername_id
n.id


Granted, there was no index on v.headername_id but creating one just  
slowed the query down, with a different plan:


CREATE INDEX dbmail_headervalue_testing ON dbmail_headervalue  
(headername_id);

EXPLAIN ANALYSE ...
   QUERY 
 PLAN


 Sort  (cost=222020.81..222020.81 rows=1 width=8) (actual  
time=28636.426..28636.426 rows=0 loops=1)

   Sort Key: m.message_idnr
   Sort Method:  quicksort  Memory: 25kB
   -  Nested Loop  (cost=1409.82..222020.80 rows=1 width=8) (actual  
time=28636.409..28636.409 rows=0 loops=1)
 -  Nested Loop  (cost=1409.82..222012.27 rows=1 width=24)  
(actual time=28636.405..28636.405 rows=0 loops=1)
   -  Nested Loop  (cost=1409.82..221959.94 rows=6  
width=8) (actual time=28543.441..28624.750 rows=1 loops=1)
 -  Seq Scan on dbmail_headername n   
(cost=0.00..111.17 rows=1 width=8) (actual time=0.022..1.114 rows=1  
loops=1)
   Filter: ((headername)::text ~~* 'MESSAGE- 
ID'::text)
 -  Bitmap Heap Scan on dbmail_headervalue v   
(cost=1409.82..221813.70 rows=2805 width=16) (actual  
time=28543.411..28623.623 rows=1 loops=1)

   Recheck Cond: (v.headername_id = n.id)
   Filter: (substring(v.headervalue, 0,  
255) ~~* '%@mail.gmail.com%'::text)
   -  Bitmap Index Scan on  
dbmail_headervalue_testing  (cost=0.00..1409.82 rows=75940 width=0)  
(actual time=17555.572..17555.572 rows=1877009 loops=1)

 Index Cond: (v.headername_id = n.id)
   -  Index Scan using dbmail_messages_physmessage_idx  
on dbmail_messages m  (cost=0.00..8.71 rows=1 width=16) (actual  
time=11.646..11.646 rows=0 loops=1)

 Index Cond: (m.physmessage_id = v.physmessage_id)
 Filter: ((m.status = ANY ('{0,1}'::integer[]))  
AND (m.mailbox_idnr = 12345))
 -  Index Scan using dbmail_physmessage_pkey on  
dbmail_physmessage p  (cost=0.00..8.52 rows=1 width=8) (never executed)

   Index Cond: (p.id = m.physmessage_id)
 Total runtime: 28636.517 ms
(19 rows)


I also see a FTS on domain_headervalue invoked by the JOIN cluase
JOIN dbmail_headervalue v ON v.physmessage_id=p.id
place indexes on both columns
v.physmessage_id
p.id


Both columns already indexed here:

On public.dbmail_headervalue (alias v):
dbmail_headervalue_2 btree (physmessage_id)

On public.dbmail_physmessage (alias p):
dbmail_physmessage_pkey PRIMARY KEY, btree (id)

Perhaps I should provide some data on the table sizes.

dbmail_headervalue is the largest table with respect to its record  
count: 36 million records.

dbmail_headername is small: 5640 records.
dbmail_physmessage and dbmail_messages are of an average size: ~2  
million records each.


Sorry for my cluelessness on this issue.  But Postgresql's ability to  
build a fast query plan for this query type at least occasionally is  
encouraging. :-)


Yar

the join clause JOIN dbmail_physmessage p ON m.physmessage_id=p.id  
uses indexed for both participants


Martin
__
Verzicht und Vertraulichkeitanmerkung

Diese Nachricht ist vertraulich. Sollten Sie nicht der vorgesehene  
Empfaenger sein, so bitten wir hoeflich um eine Mitteilung. Jede  
unbefugte Weiterleitung oder Fertigung einer Kopie ist unzulaessig.  
Diese Nachricht dient lediglich dem Austausch von Informationen und  
entfaltet keine rechtliche Bindungswirkung. Aufgrund der leichten  
Manipulierbarkeit von E-Mails koennen wir keine Haftung fuer den  
Inhalt uebernehmen.




From: y...@barnet.com.au
To: pgsql-general@postgresql.org
Subject: [GENERAL] Query plan choice issue
Date: Mon, 13 Sep 2010 09:36:35 +1000

Hi all,

I'm seeing a funny behaviour in Postgresql 8.4.4. Namely, a query can
be executed using either of two different query plans, one taking a
few milliseconds and the other, tens of seconds. The work_mem setting
doesn't seem to affect it -- tried to increase or decrease it by 2 or
4 times, but it didn't seem to favour the fast plan choice. Honestly,
I have no idea what affects the plan choice, but I saw Postgresql
change it at random.

The query in question looks like this -- sorry, it's rather complex:

SELECT message_idnr
FROM dbmail_messages m
JOIN dbmail_physmessage p ON m.physmessage_id=p.id
JOIN dbmail_headervalue 

Re: [GENERAL] Query plan choice issue

2010-09-13 Thread Tom Lane
Yaroslav Tykhiy y...@barnet.com.au writes:
   -  Bitmap Heap Scan on dbmail_headervalue v   
 (cost=1409.82..221813.70 rows=2805 width=16) (actual  
 time=28543.411..28623.623 rows=1 loops=1)
 Recheck Cond: (v.headername_id = n.id)
 Filter: (substring(v.headervalue, 0,  
 255) ~~* '%@mail.gmail.com%'::text)
 -  Bitmap Index Scan on  
 dbmail_headervalue_testing  (cost=0.00..1409.82 rows=75940 width=0)  
 (actual time=17555.572..17555.572 rows=1877009 loops=1)
   Index Cond: (v.headername_id = n.id)

I think the major problem you're having is that the planner is
completely clueless about the selectivity of the condition
substring(v.headervalue, 0,  255) ~~* '%@mail.gmail.com%'
If it knew that that would match only one row, instead of several
thousand, it would likely pick a different plan.

In recent versions of PG you could probably make a noticeable
improvement in this if you just dropped the substring() restriction
... do you actually need that?  Alternatively, if you don't want to
change the query logic at all, I'd try making an index on
substring(v.headervalue, 0, 255).  I'm not expecting the query
to actually *use* the index, mind you.  But its existence will prompt
ANALYZE to collect stats on the expression's value, and that will
help the planner with estimating the ~~* condition.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query plan choice issue

2010-09-13 Thread Yaroslav Tykhiy

Hi Tom,

On 14/09/2010, at 12:41 AM, Tom Lane wrote:


Yaroslav Tykhiy y...@barnet.com.au writes:

 -  Bitmap Heap Scan on dbmail_headervalue v
(cost=1409.82..221813.70 rows=2805 width=16) (actual
time=28543.411..28623.623 rows=1 loops=1)
   Recheck Cond: (v.headername_id = n.id)
   Filter: (substring(v.headervalue, 0,
255) ~~* '%@mail.gmail.com%'::text)
   -  Bitmap Index Scan on
dbmail_headervalue_testing  (cost=0.00..1409.82 rows=75940 width=0)
(actual time=17555.572..17555.572 rows=1877009 loops=1)
 Index Cond: (v.headername_id = n.id)


I think the major problem you're having is that the planner is
completely clueless about the selectivity of the condition
substring(v.headervalue, 0,  255) ~~* '%@mail.gmail.com%'
If it knew that that would match only one row, instead of several
thousand, it would likely pick a different plan.

In recent versions of PG you could probably make a noticeable
improvement in this if you just dropped the substring() restriction
... do you actually need that?  Alternatively, if you don't want to
change the query logic at all, I'd try making an index on
substring(v.headervalue, 0, 255).  I'm not expecting the query
to actually *use* the index, mind you.  But its existence will prompt
ANALYZE to collect stats on the expression's value, and that will
help the planner with estimating the ~~* condition.


Well, that substring() and ILIKE combo looked suspicious to me, too.   
However, there already was an index on substring(v.headervalue, 0,  
255) but the fast query plan didn't seem to use it, it used a  
different index instead:


mail=# \d dbmail_headervalue
  Table public.dbmail_headervalue
 Column |  Type  | Modifiers
+ 
+---

 headername_id  | bigint | not null
 physmessage_id | bigint | not null
 id | bigint | not null default  
nextval('dbmail_headervalue_idnr_seq'::regclass)

 headervalue| text   | not null default ''::text
Indexes:
dbmail_headervalue_pkey PRIMARY KEY, btree (id)
dbmail_headervalue_1 UNIQUE, btree (physmessage_id, id)
dbmail_headervalue_2 btree (physmessage_id)
dbmail_headervalue_3 btree (substring(headervalue, 0, 255))
...
EXPLAIN ANALYZE...
- Index Scan using dbmail_headervalue_2 on  
dbmail_headervalue v (cost=0.00..1489.96 rows=1 width=16) (actual  
time=0.028..0.029 rows=0 loops=358)
  Index Cond: (v.physmessage_id =  
m.physmessage_id)
  Filter: (substring(v.headervalue, 0, 255)  
~~* '%@mail.gmail.com%'::text)

...

Meanwhile, a mate of mine lurking on this list pointed out that  
reducing random_page_cost might help here and it did: random_page_cost  
of 2 made the fast query favourable.


Can it mean that the default planner configuration slightly  
overfavours seq scans?


Thank you all guys!

Yar

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query plan choice issue

2010-09-12 Thread Martin Gainty

a cursory look of the plan details a FTS on dbmail_headername invoked by the 
JOIN clause
JOIN dbmail_headername n ON v.headername_id=n.id
you would accelerate the seek appreciably by placing indexes on both 
participating columns
v.headername_id
n.id

I also see a FTS on domain_headervalue invoked by the JOIN cluase
JOIN dbmail_headervalue v ON v.physmessage_id=p.id
place indexes on both columns
v.physmessage_id
p.id
 
the join clause JOIN dbmail_physmessage p ON m.physmessage_id=p.id uses indexed 
for both participants

Martin 
__ 
Verzicht und Vertraulichkeitanmerkung

Diese Nachricht ist vertraulich. Sollten Sie nicht der vorgesehene Empfaenger 
sein, so bitten wir hoeflich um eine Mitteilung. Jede unbefugte Weiterleitung 
oder Fertigung einer Kopie ist unzulaessig. Diese Nachricht dient lediglich dem 
Austausch von Informationen und entfaltet keine rechtliche Bindungswirkung. 
Aufgrund der leichten Manipulierbarkeit von E-Mails koennen wir keine Haftung 
fuer den Inhalt uebernehmen.
 

 From: y...@barnet.com.au
 To: pgsql-general@postgresql.org
 Subject: [GENERAL] Query plan choice issue
 Date: Mon, 13 Sep 2010 09:36:35 +1000
 
 Hi all,
 
 I'm seeing a funny behaviour in Postgresql 8.4.4. Namely, a query can 
 be executed using either of two different query plans, one taking a 
 few milliseconds and the other, tens of seconds. The work_mem setting 
 doesn't seem to affect it -- tried to increase or decrease it by 2 or 
 4 times, but it didn't seem to favour the fast plan choice. Honestly, 
 I have no idea what affects the plan choice, but I saw Postgresql 
 change it at random.
 
 The query in question looks like this -- sorry, it's rather complex:
 
 SELECT message_idnr
 FROM dbmail_messages m
 JOIN dbmail_physmessage p ON m.physmessage_id=p.id
 JOIN dbmail_headervalue v ON v.physmessage_id=p.id
 JOIN dbmail_headername n ON v.headername_id=n.id
 WHERE mailbox_idnr = 12345 AND status IN (0,1) AND
 headername ILIKE 'MESSAGE-ID' AND SUBSTRING(headervalue,0,255)
 ILIKE '%@mail.gmail.com%'
 ORDER BY message_idnr;
 
 It comes from DBMail. That said, I don't think DBMail can be blamed 
 here because at least sometimes Postgresql is able to do the right 
 thing with respect to that query.
 
 Here is the slow plan:
 QUERY 
 PLAN
 --
 Sort (cost=1234610.11..1234610.11 rows=1 width=8) (actual 
 time=20933.166..20933.168 rows=1 loops=1)
 Sort Key: m.message_idnr
 Sort Method: quicksort Memory: 25kB
 - Nested Loop (cost=0.00..1234610.10 rows=1 width=8) (actual 
 time=3327.658..20933.122 rows=1 loops=1)
 - Nested Loop (cost=0.00..1234601.59 rows=1 width=24) 
 (actual time=3327.599..20933.056 rows=1 loops=1)
 - Nested Loop (cost=0.00..1234549.74 rows=6 width=8) 
 (actual time=3327.503..20932.944 rows=1 loops=1)
 Join Filter: (v.headername_id = n.id)
 - Seq Scan on dbmail_headername n 
 (cost=0.00..108.46 rows=1 width=8) (actual time=0.027..0.985 rows=1 
 loops=1)
 Filter: ((headername)::text ~~* 'MESSAGE- 
 ID'::text)
 - Seq Scan on dbmail_headervalue v 
 (cost=0.00..1234407.96 rows=2666 width=16) (actual 
 time=3327.465..20931.942 rows=1 loops=1)
 Filter: (substring(v.headervalue, 0, 
 255) ~~* '%@mail.gmail.com%'::text)
 - Index Scan using dbmail_messages_physmessage_idx on 
 dbmail_messages m (cost=0.00..8.63 rows=1 width=16) (actual 
 time=0.088..0.100 rows=1 loops=1)
 Index Cond: (m.physmessage_id = v.physmessage_id)
 Filter: ((m.status = ANY ('{0,1}'::integer[])) 
 AND (m.mailbox_idnr = 12345))
 - Index Scan using dbmail_physmessage_pkey on 
 dbmail_physmessage p (cost=0.00..8.50 rows=1 width=8) (actual 
 time=0.048..0.050 rows=1 loops=1)
 Index Cond: (p.id = m.physmessage_id)
 Total runtime: 20933.241 ms
 (17 rows)
 
 And here is the fast plan:
 QUERY 
 PLAN
 ---
 Sort (cost=1175284.93..1175284.94 rows=1 width=8) (actual 
 time=14.163..14.165 rows=1 loops=1)
 Sort Key: m.message_idnr
 Sort Method: quicksort Memory: 25kB
 - Nested Loop (cost=0.00..1175284.92 rows=1 width=8) (actual 
 time=4.272..14.152 rows=1 loops=1)
 - Nested Loop (cost=0.00..1175276.41 rows=1 width=24) 
 (actual time=4.261..14.135 rows=1 loops=1)
 - Nested Loop (cost=0.00..1175268.46 rows=1 width=32) 
 (actual time=4.249..14.117 rows=1 loops=1)
 - Index Scan using dbmail_messages_mailbox_idx 
 on dbmail_messages m (cost=0.00..4153.35 rows=786 width=16) (actual 
 time=0.043..2.810 rows=358 loops=1)
 Index Cond: (mailbox_idnr = 12345)
 Filter: (status = ANY ('{0,1}'::integer[]))
 - Index Scan using dbmail_headervalue_2 on 
 dbmail_headervalue v (cost=0.00..1489.96 rows=1 width=16) (actual 
 time=0.028..0.029 rows=0 loops=358)
 Index Cond: (v.physmessage_id = 

Re: [GENERAL] query plan in pg7.4 vs 8.0.3

2005-06-17 Thread Tom Lane
Albert Vernon Smith [EMAIL PROTECTED] writes:
 I am in process of migrating a database from Pg 7.4.3 to Pg 8.0.3.  I  
 dumped and reloaded the data, and things look good, until I try to  
 work with it.  When I try to make a query to a view based on a large  
 join (select * from 'view' limit 10), the 7.4.3 query works very  
 well, and the indexes are being used well.  But, when I go to 8.0.3,  
 there are some very costly joins, and the query is not workable.   

Are you sure you have all the same indexes in the 8.0 database?  It
looks like one might be missing on assay2rs.assay_id.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] query plan in pg7.4 vs 8.0.3

2005-06-17 Thread Albert Vernon Smith
I just re-checked, and all the same index definitions are on both  
tables.


Cheers,
-albert

On 17.6.2005, at 14:20, Tom Lane wrote:


Albert Vernon Smith [EMAIL PROTECTED] writes:


I am in process of migrating a database from Pg 7.4.3 to Pg 8.0.3.  I
dumped and reloaded the data, and things look good, until I try to
work with it.  When I try to make a query to a view based on a large
join (select * from 'view' limit 10), the 7.4.3 query works very
well, and the indexes are being used well.  But, when I go to 8.0.3,
there are some very costly joins, and the query is not workable.



Are you sure you have all the same indexes in the 8.0 database?  It
looks like one might be missing on assay2rs.assay_id.

regards, tom lane




---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [GENERAL] query plan in pg7.4 vs 8.0.3

2005-06-17 Thread Tom Lane
Albert Vernon Smith [EMAIL PROTECTED] writes:
 I just re-checked, and all the same index definitions are on both  
 tables.

Well, that seems to imply that the 8.0 planner is missing the plan that
7.4 finds ... which is both hard to believe and impossible to
investigate with this much info.  Would it be possible to get a copy of
the database?  Or at least a schema dump?

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [GENERAL] Query plan question

2003-06-26 Thread Tom Lane
Maksim Likharev [EMAIL PROTECTED] writes:
 basically I complaining that PG does not do what I told to do or 
 was hoping to do.

Okay, now I get the point: you want to prevent the pt sub-select from
being flattened into the outer query.

7.3.1 through 7.3.3 will actually do what you want (they won't flatten a
sub-select that has any sub-selects in its output list) but we got a lot
of flak for that and 7.4 will go back to the prior behavior.  In most
scenarios it's a win for the planner to flatten wherever possible.

Probably the easiest way to handle it is to insert a DISTINCT or LIMIT
clause in the sub-select; that will unconditionally keep the planner
from flattening the sub-select.  For example,

...
FROM prod.t_p AS p INNER JOIN t_temp AS t 
ON p.did = t.did 
LEFT OUTER JOIN prod.t_pinv AS pi
ON p.kid = pi.kid AND pi.orderid = 'S' 
-- hack to keep this separate from outer plan:
OFFSET 0
) AS pt

LEFT OUTER JOIN prod.t_dmp AS pdb
ON pt.kid = pdb.kid
...

I don't foresee any future planner changes that would be likely to
bypass a LIMIT/OFFSET clause.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [GENERAL] Query plan question

2003-06-26 Thread Maksim Likharev
Thanks Tom,
works, have to test performance


-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 26, 2003 7:36 AM
To: Maksim Likharev
Cc: GENERAL
Subject: Re: [GENERAL] Query plan question 


Maksim Likharev [EMAIL PROTECTED] writes:
 basically I complaining that PG does not do what I told to do or 
 was hoping to do.

Okay, now I get the point: you want to prevent the pt sub-select from
being flattened into the outer query.

7.3.1 through 7.3.3 will actually do what you want (they won't flatten a
sub-select that has any sub-selects in its output list) but we got a lot
of flak for that and 7.4 will go back to the prior behavior.  In most
scenarios it's a win for the planner to flatten wherever possible.

Probably the easiest way to handle it is to insert a DISTINCT or LIMIT
clause in the sub-select; that will unconditionally keep the planner
from flattening the sub-select.  For example,

...
FROM prod.t_p AS p INNER JOIN t_temp AS t 
ON p.did = t.did 
LEFT OUTER JOIN prod.t_pinv AS pi
ON p.kid = pi.kid AND pi.orderid = 'S' 
-- hack to keep this separate from outer plan:
OFFSET 0
) AS pt

LEFT OUTER JOIN prod.t_dmp AS pdb
ON pt.kid = pdb.kid
...

I don't foresee any future planner changes that would be likely to
bypass a LIMIT/OFFSET clause.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [GENERAL] Query plan: varchar vs char indexes

2001-02-03 Thread Tom Lane

"Richard Huxton" [EMAIL PROTECTED] writes:
 With indexed varchar fields the explain changes - performing a seq-scan on
 users rather than using the index.
 Is this because the estimator guesses costs differently for char vs varchar
 or is it because I'm not explicitly casting the id fields?

The estimator has no special ideas about either char or varchar.
However there are some discrepancies in the sets of available functions
for the two datatypes, so what appears to be the same expression may
translate into different function invocations --- especially if you are
doing random combinations of datatypes and expecting the system to pick
an operator for you.  I suspect it is picking a combination that doesn't
work out to be semantically equivalent to the '=' operator in the
index's operator class, so it doesn't think it can use the index.

 And yes - I know I probably shouldn't be joining an int to a varchar.

Not without being pretty darn careful.  You didn't actually say what the
datatype of tag_list.id is, however.

regards, tom lane



Re: [GENERAL] QUERY PLAN:

1999-10-27 Thread Ross J. Reedstrom

On Wed, Oct 27, 1999 at 11:03:45AM +0100, Tim Joyce wrote:
 can someone point me at documentation so that i can unserstand the results
 of an EXPLAIN, eg:
 
 Index Scan using words_id_idx on books_idx  (cost=441.19 rows=7644 width=8)
 
 explain select * from books_idx where wrd_id=1;
 
 I am paticularly interested in what the rows= figure means.

I've picked up a little info on this from following the hackers list,
so I don't know what docs to point you at. My understanding is that the
rows= represents the number of tuples the optimizer estimates will be
returned by that step of the execution plan. In the example you've shown,
it's an index scan of a field, and the estimate (based on last know number
of tuples in the table (as of your last VACUUM ANALYZE) and an estimate
of the selectivity of the operator being applied to this index (, =,
, etc), and the approximate dispersion of the values in that field.

For more detail, I'd suggest checking the archives of the pgsql-hackers
list, and perhaps then asking on the list itself. There's some hints in the
"PostgreSQL Programmer's Guide" in the "Extending SQL" sections on functions,
operators, and interfacing them to indices.

Ross
-- 
Ross J. Reedstrom, Ph.D., [EMAIL PROTECTED] 
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St.,  Houston, TX 77005