[SQL] Multi-column index not used, new flipped column index is

2006-05-10 Thread ogjunk-pgjedan
Hi,

I'm using PG 8.0.3 and recently spotted a query that was not using a 
multi-column index I had created.
The index looks like:

  CREATE INDEX . ON FooTable(fkColumnOne, fkColumnTwo);

The query that was not using the index was using:

  SELECT  a bunch of columns and joins  WHERE FooTable.fkColumnTwo=1000;

So I added another index where the indexed columns are flipped:

  CREATE INDEX . ON FooTable(fkColumnTwo, fkColumnOne);
 
Now the query started to use that index -- good!
But now I have 2 indices that are nearly the same, and that means overhead 
during INSERTs/DELETEs. :(

Is there a way to get this to use that first index, so I don't have to have 
this additional index?
If I had PG 8.1.3, would PG know how to use that first index?  I seem to recall 
something about this somewhere... but I don't see it at 
http://www.postgresql.org/docs/whatsnew .

Thanks,
Otis




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [SQL] Multi-column index not used, new flipped column index is

2006-05-10 Thread Markus Schaber
Hi, Otis,

[EMAIL PROTECTED] wrote:

> I'm using PG 8.0.3 and recently spotted a query that was not using a
> multi-column index I had created. The index looks like:
> 
> CREATE INDEX . ON FooTable(fkColumnOne, fkColumnTwo);
> 
> The query that was not using the index was using:
> 
> SELECT  a bunch of columns and joins  WHERE
> FooTable.fkColumnTwo=1000;

That's how it is supposed to work currently. AFAIK, Oracle has a technique
called "index skip scan" that can make (limited) use of a multi-column
index even when the leading column is not constrained, but PostgreSQL
currently does not support it.
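
As a rough illustration (table and index names borrowed from your post,
values hypothetical), the leading-column rule works like this:

```sql
-- A multi-column B-tree index is usable only when the WHERE clause
-- constrains a leading prefix of its columns.
CREATE INDEX ix_foo_one_two ON FooTable (fkColumnOne, fkColumnTwo);

-- Can use ix_foo_one_two: constrains the leading column.
EXPLAIN SELECT * FROM FooTable WHERE fkColumnOne = 42;

-- Can use ix_foo_one_two: constrains both columns.
EXPLAIN SELECT * FROM FooTable
 WHERE fkColumnOne = 42 AND fkColumnTwo = 1000;

-- Cannot use ix_foo_one_two on your version: only the second column
-- is constrained, so the index offers no usable starting point.
EXPLAIN SELECT * FROM FooTable WHERE fkColumnTwo = 1000;
```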

> So I added another index where the indexed columns are flipped:
> 
> CREATE INDEX . ON FooTable(fkColumnTwo, fkColumnOne);
> 
> Now the query started to use that index -- good! But now I have 2
> indices that are nearly the same, and that means overhead during
> INSERTs/DELETEs. :(

> Is there a way to get this to use that first index, so I don't have
> to have this additional index? If I had PG 8.1.3, would PG know how
> to use that first index?  I seem to recall something about this
> somewhere... but I don't see it at
> http://www.postgresql.org/docs/whatsnew .

Bitmap index scans can be your solution, but AFAIK they were only
introduced in 8.1.

For bitmap index scans, you have one index on fkColumnOne and one on
fkColumnTwo, and the query planner knows to combine them when both
columns are given in a WHERE clause.
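
A minimal sketch of that setup, assuming 8.1 or later (the index names
here are made up):

```sql
-- Two single-column indexes instead of one two-column index.
CREATE INDEX ix_foo_one ON FooTable (fkColumnOne);
CREATE INDEX ix_foo_two ON FooTable (fkColumnTwo);

-- A query constraining either column alone can use the matching index,
-- and a query constraining both can combine them, roughly like:
--   Bitmap Heap Scan on footable
--     ->  BitmapAnd
--           ->  Bitmap Index Scan on ix_foo_one
--           ->  Bitmap Index Scan on ix_foo_two
EXPLAIN SELECT * FROM FooTable
 WHERE fkColumnOne = 42 AND fkColumnTwo = 1000;
```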

HTH,
Markus
-- 
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[SQL] Help with a seq scan on multi-million row table

2006-05-10 Thread ogjunk-pgjedan
  Hello,

I have a little 2-table JOIN, GROUP BY, ORDER BY query that does a sequential 
scan on a multi-million row table.  I _thought_ I had all the appropriate 
indices, but apparently I do not.  I was wondering if anyone can spot a way I 
can speed up this query.
The query currently takes... *gulp*: 381119.201 ms :(
  
There are only 2 tables in the game: user_url and user_url_tag.  The latter has 
FKs pointing to the former.  The sequential scan happens on the latter - 
user_url_tag:

EXPLAIN ANALYZE select DISTINCT userurltag0_.tag as x0_0_, COUNT(*) as x1_0_ 
from user_url_tag userurltag0_, user_url userurl1_ WHERE (((userurl1_.user_id=1 
)) AND ((userurltag0_.user_url_id=userurl1_.id ))) GROUP BY userurltag0_.tag 
ORDER BY  count(*) DESC;  
  

                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=140972.22..140996.28 rows=3207 width=10) (actual time=381082.868..381110.094 rows=2546 loops=1)
   ->  Sort  (cost=140972.22..140980.24 rows=3207 width=10) (actual time=381082.858..381091.733 rows=2546 loops=1)
         Sort Key: count(*), userurltag0_.tag
         ->  HashAggregate  (cost=140777.45..140785.46 rows=3207 width=10) (actual time=381032.844..381064.068 rows=2546 loops=1)
               ->  Hash Join  (cost=2797.65..140758.50 rows=3790 width=10) (actual time=248.530..380635.132 rows=8544 loops=1)
                     Hash Cond: ("outer".user_url_id = "inner".id)
                     ->  Seq Scan on user_url_tag userurltag0_  (cost=0.00..106650.30 rows=6254530 width=14) (actual time=0.017..212256.630 rows=6259553 loops=1)
                     ->  Hash  (cost=2795.24..2795.24 rows=962 width=4) (actual time=199.840..199.840 rows=0 loops=1)
                           ->  Index Scan using ix_user_url_user_id_url_id on user_url userurl1_  (cost=0.00..2795.24 rows=962 width=4) (actual time=0.048..193.707 rows=1666 loops=1)
                                 Index Cond: (user_id = 1)
 Total runtime: 381119.201 ms
(11 rows)
  
  
  This is what the two tables look like (extra colums removed):

                              Table "public.user_url_tag"
   Column    |         Type          |                           Modifiers
-------------+-----------------------+---------------------------------------------------------------
 id          | integer               | not null default nextval('public.user_url_tag_id_seq'::text)
 user_url_id | integer               |
 tag         | character varying(64) |
Indexes:
    "pk_user_url_tag_id" PRIMARY KEY, btree (id)
    "ix_user_url_tag_tag" btree (tag)
    "ix_user_url_tag_user_url_id" btree (user_url_id)
Foreign-key constraints:
    "fk_user_url_tag_user_url_id" FOREIGN KEY (user_url_id) REFERENCES user_url(id)
  
                         Table "public.user_url"
  Column  |  Type   |                        Modifiers
----------+---------+----------------------------------------------------------
 id       | integer | not null default nextval('public.user_url_id_seq'::text)
 user_id  | integer |
 url_id   | integer |
Indexes:
    "pk_user_url_id" PRIMARY KEY, btree (id)
    "ix_user_url_url_id_user_id" UNIQUE, btree (url_id, user_id)
    "ix_user_url_user_id_url_id" UNIQUE, btree (user_id, url_id)
  
  
  Does anyone see a way to speed up this s-l-o-w query?
I cache DB results, but I'd love to get rid of that sequential scan.

Thanks,
Otis




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [SQL] Multi-column index not used, new flipped column index is

2006-05-10 Thread Andreas Kretschmer
Markus Schaber <[EMAIL PROTECTED]> schrieb:
> Bitmap Index Scans can be your solution, but AFAIK they were invented in
> 8.1.

Right.

> 
> For bitmap index scans, you have one index on fkColumnOne and one on
> fkColumnTwo, and the query planner knows to combine them when both
> columns are given in a WHERE clause.

Yeah! The bitmap index scan is a great thing. With some SELECTs I see a
performance boost of 5x to 10x compared with 8.0.


Andreas
-- 
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.  (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."(unknow)
Kaufbach, Saxony, Germany, Europe.  N 51.05082°, E 13.56889°

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [SQL] Help with a seq scan on multi-million row table

2006-05-10 Thread Bruno Wolff III
On Wed, May 10, 2006 at 13:13:59 -0500,
  [EMAIL PROTECTED] wrote:
>   Hello,
> 
> I have a little 2-table JOIN, GROUP BY, ORDER BY query that does a sequential 
> scan on a multi-million row table.  I _thought_ I had all the appropriate 
> indices, but apparently I do not.  I was wondering if anyone can spot a way I 
> can speed up this query.
> The query currently takes... *gulp*: 381119.201 ms :(
>   
> There are only 2 tables in the game: user_url and user_url_tag.  The latter 
> has FKs pointing to the former.  The sequential scan happens on the latter - 
> user_url_tag:
> 
> EXPLAIN ANALYZE select DISTINCT userurltag0_.tag as x0_0_, COUNT(*) as x1_0_ 
> from user_url_tag userurltag0_, user_url userurl1_ WHERE 
> (((userurl1_.user_id=1 )) AND ((userurltag0_.user_url_id=userurl1_.id ))) 
> GROUP BY userurltag0_.tag ORDER BY  count(*) DESC;  

While this isn't a big issue, it looks like DISTINCT is redundant in your
query and seems to be adding some extra work.
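
That is, something like this (untested, just dropping the DISTINCT; the
GROUP BY on userurltag0_.tag already makes each tag appear only once):

```sql
SELECT userurltag0_.tag AS x0_0_, COUNT(*) AS x1_0_
  FROM user_url_tag userurltag0_, user_url userurl1_
 WHERE userurl1_.user_id = 1
   AND userurltag0_.user_url_id = userurl1_.id
 GROUP BY userurltag0_.tag
 ORDER BY COUNT(*) DESC;
```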

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [SQL] Help with a seq scan on multi-million row table

2006-05-10 Thread ogjunk-pgjedan
Hi,

Thanks Bruno.  That was indeed a redundant DISTINCT.  It did reduce the amount 
of work, but as you said it doesn't get rid of the sequential scan, which is 
the real problem with this query.

Otis

- Original Message 
From: Bruno Wolff III <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: pgsql-sql@postgresql.org
Sent: Wednesday, May 10, 2006 3:23:29 PM
Subject: Re: [SQL] Help with a seq scan on multi-million row table

On Wed, May 10, 2006 at 13:13:59 -0500,
  [EMAIL PROTECTED] wrote:
>   Hello,
> 
> I have a little 2-table JOIN, GROUP BY, ORDER BY query that does a sequential 
> scan on a multi-million row table.  I _thought_ I had all the appropriate 
> indices, but apparently I do not.  I was wondering if anyone can spot a way I 
> can speed up this query.
> The query currently takes... *gulp*: 381119.201 ms :(
>   
> There are only 2 tables in the game: user_url and user_url_tag.  The latter 
> has FKs pointing to the former.  The sequential scan happens on the latter - 
> user_url_tag:
> 
> EXPLAIN ANALYZE select DISTINCT userurltag0_.tag as x0_0_, COUNT(*) as x1_0_ 
> from user_url_tag userurltag0_, user_url userurl1_ WHERE 
> (((userurl1_.user_id=1 )) AND ((userurltag0_.user_url_id=userurl1_.id ))) 
> GROUP BY userurltag0_.tag ORDER BY  count(*) DESC;  

While this isn't a big issue, it looks like DISTINCT is redundant in your
query and seems to be adding some extra work.





---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [SQL] Help with a seq scan on multi-million row table

2006-05-10 Thread Tom Lane
<[EMAIL PROTECTED]> writes:
> ->  Hash Join  (cost=2797.65..140758.50 rows=3790 width=10) (actual time=248.530..380635.132 rows=8544 loops=1)
>       Hash Cond: ("outer".user_url_id = "inner".id)
>       ->  Seq Scan on user_url_tag userurltag0_  (cost=0.00..106650.30 rows=6254530 width=14) (actual time=0.017..212256.630 rows=6259553 loops=1)
>       ->  Hash  (cost=2795.24..2795.24 rows=962 width=4) (actual time=199.840..199.840 rows=0 loops=1)
>             ->  Index Scan using ix_user_url_user_id_url_id on user_url userurl1_  (cost=0.00..2795.24 rows=962 width=4) (actual time=0.048..193.707 rows=1666 loops=1)
>                   Index Cond: (user_id = 1)

Hm, I'm not sure why it's choosing that join plan.  A nestloop indexscan
wouldn't be terribly cheap, but just counting on my fingers it seems
like it ought to come in at less than 10 cost units.  What do you
get if you set enable_hashjoin off?  (Then try disabling its
second-choice join type too --- I'm interested to see EXPLAIN ANALYZE
output for all three join types.)

What PG version is this exactly?

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [SQL] Help with a seq scan on multi-million row table

2006-05-10 Thread ogjunk-pgjedan
Aha!  set enable_hashjoin=off did the trick.
The PG version is: 8.0.3

NB: I removed that redundant "DISTINCT" after the SELECT.

EXPLAIN ANALYZE select userurltag0_.tag as x0_0_, COUNT(*) as x1_0_
from user_url_tag userurltag0_, user_url userurl1_
where (((userurl1_.user_id=1)) AND ((userurltag0_.user_url_id=userurl1_.id)))
group by userurltag0_.tag
order by count(*) DESC;

                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=155766.79..155774.81 rows=3207 width=10) (actual time=2387.756..2396.578 rows=2546 loops=1)
   Sort Key: count(*)
   ->  HashAggregate  (cost=155572.02..155580.03 rows=3207 width=10) (actual time=2365.643..2376.626 rows=2546 loops=1)
         ->  Nested Loop  (cost=0.00..12.68 rows=3867 width=10) (actual time=0.135...028 rows=8544 loops=1)
               ->  Index Scan using ix_user_url_user_id_url_id on user_url userurl1_  (cost=0.00..2798.12 rows=963 width=4) (actual time=0.067..9.744 rows=1666 loops=1)
                     Index Cond: (user_id = 1)
               ->  Index Scan using ix_user_url_tag_user_url_id on user_url_tag userurltag0_  (cost=0.00..157.34 rows=103 width=14) (actual time=1.223..1.281 rows=5 loops=1666)
                     Index Cond: (userurltag0_.user_url_id = "outer".id)
 Total runtime: 2405.691 ms
(9 rows)


Are you still interested in the EXPLAIN ANALYZE output for the
"second-choice join type"?  If you are, please tell me which join types
those are -- this is a bit beyond me. :(

Is there a way to force PG to use the index automatically?  This query is 
executed from something called Hibernate, and I'm not sure if that will let me 
set enable_hashjoin=off through its API...
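
The best idea I have so far (untested -- I don't know whether Hibernate
lets me issue raw SQL on its connections) is to send the SET as a plain
SQL statement on the same connection before the query, since
enable_hashjoin is just a session-level setting:

```sql
-- Per-session: affects only the current connection.
SET enable_hashjoin = off;
-- ... run the problematic query here ...
SET enable_hashjoin = on;  -- restore for other queries on this connection
```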

Thanks,
Otis


- Original Message 
From: Tom Lane <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: pgsql-sql@postgresql.org
Sent: Wednesday, May 10, 2006 8:27:01 PM
Subject: Re: [SQL] Help with a seq scan on multi-million row table 

<[EMAIL PROTECTED]> writes:
> ->  Hash Join  (cost=2797.65..140758.50 rows=3790 width=10) (actual time=248.530..380635.132 rows=8544 loops=1)
>       Hash Cond: ("outer".user_url_id = "inner".id)
>       ->  Seq Scan on user_url_tag userurltag0_  (cost=0.00..106650.30 rows=6254530 width=14) (actual time=0.017..212256.630 rows=6259553 loops=1)
>       ->  Hash  (cost=2795.24..2795.24 rows=962 width=4) (actual time=199.840..199.840 rows=0 loops=1)
>             ->  Index Scan using ix_user_url_user_id_url_id on user_url userurl1_  (cost=0.00..2795.24 rows=962 width=4) (actual time=0.048..193.707 rows=1666 loops=1)
>                   Index Cond: (user_id = 1)

Hm, I'm not sure why it's choosing that join plan.  A nestloop indexscan
wouldn't be terribly cheap, but just counting on my fingers it seems
like it ought to come in at less than 10 cost units.  What do you
get if you set enable_hashjoin off?  (Then try disabling its
second-choice join type too --- I'm interested to see EXPLAIN ANALYZE
output for all three join types.)

What PG version is this exactly?

regards, tom lane





---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [SQL] Help with a seq scan on multi-million row table

2006-05-10 Thread Tom Lane
<[EMAIL PROTECTED]> writes:
> Aha!  set hashjoin=off did the trick.

> ->  Index Scan using ix_user_url_tag_user_url_id on user_url_tag userurltag0_  (cost=0.00..157.34 rows=103 width=14) (actual time=1.223..1.281 rows=5 loops=1666)
>       Index Cond: (userurltag0_.user_url_id = "outer".id)

This seems to be the problem right here: the estimate of matching rows
is off by a factor of 20, and that inflates the overall cost estimate
for this plan about the same, causing the planner to think the other way
is cheaper.

What does the pg_stats row for user_url_tag.user_url_id contain?
Have you analyzed that table recently?

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [SQL] Help with a seq scan on multi-million row table

2006-05-10 Thread ogjunk-pgjedan
Not sure if I'm showing you what you asked for, but here it is:

select * from pg_stats  where tablename='user_url_tag' and 
attname='user_url_id';
 schemaname        | public
 tablename         | user_url_tag
 attname           | user_url_id
 null_frac         | 0
 avg_width         | 4
 n_distinct        | 60825
 most_common_vals  | {458321,1485346,16304,68027,125417,153465,182503,201175,202973,218423}
 most_common_freqs | {0.0013,0.001,0.00067,0.00067,0.00067,0.00067,0.00067,0.00067,0.00067,0.00067}
 histogram_bounds  | {195,195993,325311,480323,647778,782598,1014527,1201726,1424822,1614712,1853719}
 correlation       | 0.795521

You asked if the table has been analyzed recently.
I think so - I run ANALYZE on the whole DB every night, like this:

$ psql -U me -c "ANALYZE;" mydb

For good measure, I just analyzed the table now:

$ psql -U me -c "ANALYZE user_url_tag;" mydb
Then I set the enable_hashjoin back to ON and re-run the EXPLAIN ANALYZE.
I still get the sequential scan, even after analyzing the table :(

I'm not sure which numbers you are referring to when you said the estimate is 
off, but here are some numbers:
  The whole table has 6-7 M rows.
  That query matches about 2500 rows.

If there are other things I can play with and help narrow this down, please let 
me know.
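
In case it matters, one thing I could still try is raising the
statistics target for that column, so ANALYZE samples more of the 6-7 M
rows and builds a finer histogram, e.g.:

```sql
-- The default statistics target is 10; a larger target makes ANALYZE
-- keep more most-common values and histogram buckets for this column.
ALTER TABLE user_url_tag ALTER COLUMN user_url_id SET STATISTICS 100;
ANALYZE user_url_tag;
```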

Thanks,
Otis


- Original Message 
From: Tom Lane <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: pgsql-sql@postgresql.org
Sent: Wednesday, May 10, 2006 9:53:49 PM
Subject: Re: [SQL] Help with a seq scan on multi-million row table 

<[EMAIL PROTECTED]> writes:
> Aha!  set hashjoin=off did the trick.

> ->  Index Scan using ix_user_url_tag_user_url_id on user_url_tag userurltag0_  (cost=0.00..157.34 rows=103 width=14) (actual time=1.223..1.281 rows=5 loops=1666)
>       Index Cond: (userurltag0_.user_url_id = "outer".id)

This seems to be the problem right here: the estimate of matching rows
is off by a factor of 20, and that inflates the overall cost estimate
for this plan about the same, causing the planner to think the other way
is cheaper.

What does the pg_stats row for user_url_tag.user_url_id contain?
Have you analyzed that table recently?

regards, tom lane





---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[SQL] ORDER BY question

2006-05-10 Thread Christian Paul Cosinas
Hi,

For example I have a table like this

ID  Name
1   A
1   B
2   C
2   D

And I have a Query like this:
SELECT * FROM table ORDER BY ID.

Would my query always give me the same order, or could the rows come
back in a different order?

For example?

1   B
1   A
2   D
2   C


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [SQL] ORDER BY question

2006-05-10 Thread Michael Glaesemann


On May 11, 2006, at 21:00 , Christian Paul Cosinas wrote:

> For example I have a table like this
>
> ID  Name
> 1   A
> 1   B
> 2   C
> 2   D
>
> And I have a Query like this:
> SELECT * FROM table ORDER BY ID.
>
> Would my result always give me the same order or is there may be a
> possible different result?


It may give you a different result: rows that compare as equal on the
ORDER BY columns can come back in any order, and that order can change
from one execution to the next. So for cases where you may not be
ordering on a unique column (such as above), you may want to do
something like:


SELECT *
FROM table
ORDER BY "ID", "Name";

Hope this helps.

Michael Glaesemann
grzm seespotcode net




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq