Re: [PERFORM] force the use of a particular index

2003-07-11 Thread Tom Lane
Scott Cain <[EMAIL PROTECTED]> writes:
> So, the question is, is there a way I can force the query planner to use
> the index I want it to use?

No (and I don't think there should be).  Given that it *can* generate
the plan you want, this is clearly an estimation failure.  What is the
index it does use?  Would you show us EXPLAIN ANALYZE results when
using each index?

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [NOVICE] [PERFORM] Extreme high load averages

2003-07-11 Thread Martin Foster
Shridhar Daithankar wrote:

On 10 Jul 2003 at 0:43, Martin Foster wrote:

As for creating a new table, that in itself is a nice idea.   But it 
would cause issues for people currently in the realm.   Their posts 
would essentially dissapear from site and cause more confusion then its 
worth.


No they won't. Say you have a base table and your current post table is child 
of that. You can query on base table and get rows from child table. That way 
all the data would always be there.

While inserting posts, you would insert in child table. While qeurying you 
would query on base table. That way things will be optimal.


Inheritance would work, but the database would essentially just grow and 
grow and grow right?


Right. But there are two advantages.

1. It will always contain valid posts. No dead tuples.
2. You can work in chuncks of data. Each child table can be dealt with 
separately without affecting other child tables, whereas in case of a single 
large table, entire site is affected..

Deleting 100K posts from 101K rows table is vastly different than deleting 10K 
posts from 2M rows table. Later one would unnecessary starve the table with 
dead tuples and IO whereas in former case you can do create table as select 
from and drop the original..

HTH

Bye
 Shridhar
--
"[In 'Doctor' mode], I spent a good ten minutes telling Emacs what Ithought of 
it.  (The response was, 'Perhaps you could try to be lessabusive.')"(By Matt 
Welsh)

When I ran EXPLAIN on the views and queries making use of the inherited 
tables, I noticed that everything worked based on sequence scans and it 
avoided all indexes.While making use of ONLY kicked in full indexes.

This is even after having created a child table with the same indexes as 
the parent.   Is this a known issue, or just some sort of oddity on my 
setup?

Tables still cannot be removed easily, but I found a way to work around 
it for a day-to-day basis.  Essentailly I just clean out the tables 
containing old rows and delete them later.  However based on the above, 
I doubt performance would get any better.

Thanks for the advice however!

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] force the use of a particular index

2003-07-11 Thread Scott Cain
Hi Tom,

Embarrassingly, I can't.  I've been monkeying with the database so much
that I can't seem to get it back to the state where I reproduce the
behavior I want.  A database drop and reload may be the only way, but
since that is a time consuming thing to do, I won't be able to do it
until this evening.

Thanks,
Scott

On Fri, 2003-07-11 at 11:24, Tom Lane wrote:
> Scott Cain <[EMAIL PROTECTED]> writes:
> > So, the question is, is there a way I can force the query planner to use
> > the index I want it to use?
> 
> No (and I don't think there should be).  Given that it *can* generate
> the plan you want, this is clearly an estimation failure.  What is the
> index it does use?  Would you show us EXPLAIN ANALYZE results when
> using each index?
> 
>   regards, tom lane
-- 

Scott Cain, Ph. D. [EMAIL PROTECTED]
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] force the use of a particular index

2003-07-11 Thread Scott Cain
The problem (at least as it appears to me) is not that it is performing
a table scan instead of an index scan, it is that it is using the wrong
index.  Here is the output from EXPLAIN ANALYZE:

 QUERY PLAN

 Unique  (cost=494008.47..494037.59 rows=166 width=54) (actual 
time=114660.37..114660.38 rows=1 loops=1)
   ->  Sort  (cost=494008.47..494012.63 rows=1664 width=54) (actual 
time=114660.37..114660.37 rows=1 loops=1)
 Sort Key: f.name, fl.fmin, fl.fmax, fl.strand, f.type_id, f.feature_id
 ->  Nested Loop  (cost=0.00..493919.44 rows=1664 width=54) (actual 
time=2596.13..114632.90 rows=1 loops=1)
   ->  Index Scan using feature_pkey on feature f  (cost=0.00..134601.43 
rows=52231 width=40) (actual time=105.74..56048.87 rows=13825 loops=1)
 Filter: (type_id = 219)
   ->  Index Scan using featureloc_idx1 on featureloc fl  (cost=0.00..6.87 
rows=1 width=14) (actual time=4.23..4.23 rows=0 loops=13825)
 Index Cond: ("outer".feature_id = fl.feature_id)
 Filter: ((srcfeature_id = 6) AND (fmin <= 2585581) AND (fmax >= 
2565581))
 Total runtime: 114660.91 msec

This is the same regardless of enable_seqscan's setting.  The index that
it is using on featureloc (featureloc_idx1) is on the foreign key
feature_id.  It should instead be using another index, featureloc_idx3,
which is built on (srcfeature_id, fmin, fmax).

I should also mention that I've done a VACUUM FULL ANALYZE on this
database, and I've been using it for a while, and this is the primary
type of query I perform on the database.

Thanks,
Scott



On Fri, 2003-07-11 at 06:51, Rod Taylor wrote:
> On Thu, 2003-07-10 at 15:18, Scott Cain wrote:
> > Hello,
> > 
> > I am wondering if there is a way to force the use of a particular index
> > when doing a query.  I have two tables that are pretty big (each >3
> > million rows), and when I do a join between them the performance is
> > generally quite poor as it does not use the indexes that I think it
> > should use.  Here is an example query:
> 
> Please send the EXPLAIN ANALYZE results for that query with and without
> sequential scans enabled.
> 
> set enable_seqscan = true;
> EXPLAIN ANALYZE ;
> 
> set enable_seqscan = false;
> EXPLAIN ANALYZE ;
-- 

Scott Cain, Ph. D. [EMAIL PROTECTED]
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] force the use of a particular index

2003-07-11 Thread Scott Cain
Rod,

I see what you mean about the scan on the feature_pkey taking a long
time.  I tried several things to remedy that.  I created an index on
feature (feature_id,type_id) (which I don't think makes sense since
feature_id is the primary key, so add another column really doesn't
help).  I also created a index on feature (type_id, feature_id), but the
planner doesn't use it.  Also, there was an already existing index on
feature (type_id) that the planner never used.

One thing I tried that changed the query plan and improved performance
slightly (but still nowhere near what I need) was to add a partial index
on featureloc on (fmin,fmax) where scrfeature_id=6.  This is something I
could realistically do since there are relatively few (>30)
srcfeature_ids that I am interested in, so putting in place a partial
index for each of them would not be a big deal.  Nevertheless, the
performance is still not there.  Here is the EXPLAIN ANALYZE for this
situation:

   QUERY PLAN  
   
 
-
 Unique  (cost=156172.23..156200.11 rows=159 width=54) (actual time=63631.93..63631.93 
rows=1 loops=1)
   ->  Sort  (cost=156172.23..156176.21 rows=1594 width=54) (actual 
time=63631.93..63631.93 rows=1 loops=1)
 Sort Key: f.name, fl.fmin, fl.fmax, fl.strand, f.type_id, f.feature_id
 ->  Hash Join  (cost=135100.30..156087.46 rows=1594 width=54) (actual 
time=63631.29..63631.79 rows=1 loops=1)
   Hash Cond: ("outer".feature_id = "inner".feature_id)
   ->  Index Scan using featureloc_src_6 on featureloc fl  
(cost=0.00..18064.99 rows=101883 width=14) (actual time=26.11..430.00 rows=570 loops=1)
 Index Cond: ((fmin <= 2585581) AND (fmax >= 2565581))
 Filter: (srcfeature_id = 6)
   ->  Hash  (cost=134601.43..134601.43 rows=48347 width=40) (actual 
time=63182.86..63182.86 rows=0 loops=1)
 ->  Index Scan using feature_pkey on feature f  
(cost=0.00..134601.43 rows=48347 width=40) (actual time=69.98..62978.27 rows=13825 
loops=1)
   Filter: (type_id = 219)
 Total runtime: 63632.28 msec
(12 rows)

Any other ideas?

Thanks,
Scott

On Fri, 2003-07-11 at 09:38, Rod Taylor wrote:
> On Fri, 2003-07-11 at 13:17, Scott Cain wrote:
> > The problem (at least as it appears to me) is not that it is performing
> > a table scan instead of an index scan, it is that it is using the wrong
> > index.  Here is the output from EXPLAIN ANALYZE:
> > 
> >  QUERY PLAN
> > 
> >  Unique  (cost=494008.47..494037.59 rows=166 width=54) (actual 
> > time=114660.37..114660.38 rows=1 loops=1)
> >->  Sort  (cost=494008.47..494012.63 rows=1664 width=54) (actual 
> > time=114660.37..114660.37 rows=1 loops=1)
> >  Sort Key: f.name, fl.fmin, fl.fmax, fl.strand, f.type_id, f.feature_id
> >  ->  Nested Loop  (cost=0.00..493919.44 rows=1664 width=54) (actual 
> > time=2596.13..114632.90 rows=1 loops=1)
> >->  Index Scan using feature_pkey on feature f  
> > (cost=0.00..134601.43 rows=52231 width=40) (actual time=105.74..56048.87 
> > rows=13825 loops=1)
> >  Filter: (type_id = 219)
> >->  Index Scan using featureloc_idx1 on featureloc fl  
> > (cost=0.00..6.87 rows=1 width=14) (actual time=4.23..4.23 rows=0 loops=13825)
> >  Index Cond: ("outer".feature_id = fl.feature_id)
> >  Filter: ((srcfeature_id = 6) AND (fmin <= 2585581) AND (fmax 
> > >= 2565581))
> >  Total runtime: 114660.91 msec
> 
> > it is using on featureloc (featureloc_idx1) is on the foreign key
> > feature_id.  It should instead be using another index, featureloc_idx3,
> > which is built on (srcfeature_id, fmin, fmax).
> 
> Nope.. The optimizer is right in the decision to use featureloc_idx1. 
> You will notice it is expecting to retrieve a single row from this
> index, but the featureloc_idx3 is bound to be larger (due to indexing
> more data), thus take more disk reads for the exact same information (or
> in this case, lack thereof).
> 
> What is taking a long time is the scan on feature_pkey. It looks like it
> is throwing away a ton of rows that are not type_id = 219.  Either that,
> or you do a pile of deletes and haven't run REINDEX recently.
> 
> Create an index consisting of (feature_id, type_id).  This will probably
> make a significant different in execution t

Re: [PERFORM] force the use of a particular index

2003-07-11 Thread Rod Taylor
On Fri, 2003-07-11 at 11:36, Scott Cain wrote:
> Rod,
> 
> I see what you mean about the scan on the feature_pkey taking a long
> time.  I tried several things to remedy that.  I created an index on
> feature (feature_id,type_id) (which I don't think makes sense since
> feature_id is the primary key, so add another column really doesn't

It may be the primary key, but the system looked like it was throwing
away many rows based on type_id.  If it was throwing away many more rows
than found, the index with type_id may have been cheaper.

It is difficult to tell from an EXPLAIN ANALYZE as it doesn't tell you
exactly how many rows were filtered, just the cost to read them and how
many were used after the filter.

> help).  I also created a index on feature (type_id, feature_id), but the
> planner doesn't use it.  Also, there was an already existing index on
> feature (type_id) that the planner never used.

It cannot use more than one index for a given table scan at the moment. 
There are proposals on how to 'fix' that, but those require significant 
overhauls of various systems.

> Any other ideas?

Out of curiosity, what do you get if you disable hash joins?

set enable_hashjoin = false;


How about a partial index on (feature_id) where type_id = 219?



signature.asc
Description: This is a digitally signed message part


Re: [PERFORM] force the use of a particular index

2003-07-11 Thread Scott Cain
On Fri, 2003-07-11 at 14:14, Rod Taylor wrote:
> > > set enable_hashjoin = false;
> > 
> > BINGO!
> 
> I'm not so sure about that.  Your dataset seems to have changed fairly
> significantly since the last test.
> 
> > ->  Index Scan using feature_pkey on feature f  (cost=0.00..134592.43 rows=47912 
> > width=39) (actual time=0.46..502.50 rows=431 loops=1)
> 
> Notice it only pulled out 431 rows where prior runs pulled out several
> thousand (~13000).  I think what really happened was something came
> along and deleted a bunch of stuff, then vacuum ran.

There is nearly a zero chance that happened.  This database is
accessible only by me, I haven't deleted anything.  The only things I
have done is to create and drop various indexes and run vacuum.  Is
there anything else that could explain the difference?  Is the index
scan on feature_pkey using information from the index scan on
featureloc_src_6 to limit the number of rows to get from feature?

Scott

-- 

Scott Cain, Ph. D. [EMAIL PROTECTED]
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] force the use of a particular index

2003-07-11 Thread Tom Lane
Scott Cain <[EMAIL PROTECTED]> writes:
> Embarrassingly, I can't.  I've been monkeying with the database so much
> that I can't seem to get it back to the state where I reproduce the
> behavior I want.

If the thing works as desired after a VACUUM ANALYZE, then I suggest
the estimation failure was just due to out-of-date statistics ...

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly