Re: [SQL] Wrong query plan when using a left outer join

2012-01-17 Thread Feike Steenbergen
 BTW, add a foreign key and index on handhistory_plain.hand_id (unless you 
 have it already).
It's there already:

feiketracker=# \d+ handhistory_plain;
Table stage.handhistory_plain
 Column  |  Type   | Modifiers | Storage  | Description
-+-+---+--+-
 hand_id | integer | not null  | plain|
 history | text| not null  | extended |
Indexes:
handhistory_plain_pkey PRIMARY KEY, btree (hand_id) CLUSTER
Foreign-key constraints:
handhistory_plain_hand_id_fkey FOREIGN KEY (hand_id) REFERENCES
hand_meta(hand_id)

 BTW2, if you really don't care on handhistory you can just use
 original query with no join.

Well, sometimes I do, sometimes I don't. For easier application access
I wanted to create a view that joins both these tables together:
easier application design and better performance, as the analyzer
should know best when not to use the handhistory_plain table.


The design is as follows:

hand_meta - holds all metadata for a pokerhand
handhistory_plain holds the history for a pokerhand

hand_meta is going to be used the most, it is around 165 bytes per tuple
handhistory_plain is not going to be used often (it is there as a
reference); it is around 5000 bytes per tuple.

They both hold the same column as primary key, handhistory_plain holds
a fraction of the tuples of hand_meta, the split was only made to make
sure the processed data (hand_meta) is smaller in size and should
therefore require less I/O and thus increase performance.

I'm not sure what to make of:
 imagine that the view on the right side of join has some side effects.
I can see some side effects may occur, but as it is a left join, the
left hand side will always be part of the returning set (there is no
where clause), so the index should be used.
Even though I don't understand, you seem to be right, a natural join
is 30 times faster:

feiketracker=# explain analyze select max(hand_id) from hand_meta left
join handhistory_plain using(hand_id);
 QUERY
PLAN
-
 Aggregate  (cost=1049261.00..1049261.01 rows=1 width=4)
(actual time=31179.238..31179.241 rows=1 loops=1)
   -  Seq Scan on hand_meta  (cost=100.00..1043062.40
rows=2479440 width=4) (actual time=0.131..16039.886 rows=2479440
loops=1)
 Total runtime: 31179.725 ms
(3 rows)

Time: 31185.088 ms

feiketracker=# explain analyze select max(hand_id) from hand_meta join
handhistory_plain using(hand_id);

 QUERY PLAN
--
 Aggregate  (cost=53043.61..53043.62 rows=1 width=4) (actual
time=962.242..962.245 rows=1 loops=1)
   -  Nested Loop  (cost=0.00..53029.93 rows=5470 width=4) (actual
time=0.400..920.582 rows=5470 loops=1)
 -  Index Scan using handhistory_plain_pkey on
handhistory_plain  (cost=0.00..14494.27 rows=5470 width=4) (actual
time=0.215..101.177 rows=5470 loops=1)
 -  Index Scan using hand_meta_pkey on hand_meta
(cost=0.00..7.03 rows=1 width=4) (actual time=0.100..0.115 rows=1
loops=5470)
   Index Cond: (hand_meta.hand_id = handhistory_plain.hand_id)
 Total runtime: 962.968 ms


 try to experiment with SET enable_seqscan TO false; - and see what happens.
Didn't make a difference; therefore I think postgres determines it is
unable to use the index, is that correct?


Thank you for now: I'll use the inner join (or natural join in this
case) for this specific view

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Wrong query plan when using a left outer join

2012-01-17 Thread Feike Steenbergen
oops, but ofcourse, a natural view will not give the correct answer,
back to the drawing board ...

On Tue, Jan 17, 2012 at 19:53, Feike Steenbergen
feikesteenber...@gmail.com wrote:
 BTW, add a foreign key and index on handhistory_plain.hand_id (unless you 
 have it already).
 It's there already:

 feiketracker=# \d+ handhistory_plain;
            Table stage.handhistory_plain
  Column  |  Type   | Modifiers | Storage  | Description
 -+-+---+--+-
  hand_id | integer | not null  | plain    |
  history | text    | not null  | extended |
 Indexes:
    handhistory_plain_pkey PRIMARY KEY, btree (hand_id) CLUSTER
 Foreign-key constraints:
    handhistory_plain_hand_id_fkey FOREIGN KEY (hand_id) REFERENCES
 hand_meta(hand_id)

 BTW2, if you really don't care on handhistory you can just use
 original query with no join.

 Well, sometimes I do, sometimes I don't. For easier application access
 I wanted to create a view that joins both these tables together:
 easier application design and better performance, as the analyzer
 should know best when not to use the handhistory_plain table.


 The design is as follows:

 hand_meta - holds all metadata for a pokerhand
 handhistory_plain holds the history for a pokerhand

 hand_meta is going to be used the most, it is around 165 bytes per tuple
 handhistory_plain is not going to be used often (it is there as a
 reference); it is around 5000 bytes per tuple.

 They both hold the same column as primary key, handhistory_plain holds
 a fraction of the tuples of hand_meta, the split was only made to make
 sure the processed data (hand_meta) is smaller in size and should
 therefore require less I/O and thus increase performance.

 I'm not sure what to make of:
 imagine that the view on the right side of join has some side effects.
 I can see some side effects may occur, but as it is a left join, the
 left hand side will always be part of the returning set (there is no
 where clause), so the index should be used.
 Even though I don't understand, you seem to be right, a natural join
 is 30 times faster:

 feiketracker=# explain analyze select max(hand_id) from hand_meta left
 join handhistory_plain using(hand_id);
                                                                 QUERY
 PLAN
 -
  Aggregate  (cost=1049261.00..1049261.01 rows=1 width=4)
 (actual time=31179.238..31179.241 rows=1 loops=1)
   -  Seq Scan on hand_meta  (cost=100.00..1043062.40
 rows=2479440 width=4) (actual time=0.131..16039.886 rows=2479440
 loops=1)
  Total runtime: 31179.725 ms
 (3 rows)

 Time: 31185.088 ms

 feiketracker=# explain analyze select max(hand_id) from hand_meta join
 handhistory_plain using(hand_id);

     QUERY PLAN
 --
  Aggregate  (cost=53043.61..53043.62 rows=1 width=4) (actual
 time=962.242..962.245 rows=1 loops=1)
   -  Nested Loop  (cost=0.00..53029.93 rows=5470 width=4) (actual
 time=0.400..920.582 rows=5470 loops=1)
         -  Index Scan using handhistory_plain_pkey on
 handhistory_plain  (cost=0.00..14494.27 rows=5470 width=4) (actual
 time=0.215..101.177 rows=5470 loops=1)
         -  Index Scan using hand_meta_pkey on hand_meta
 (cost=0.00..7.03 rows=1 width=4) (actual time=0.100..0.115 rows=1
 loops=5470)
               Index Cond: (hand_meta.hand_id = handhistory_plain.hand_id)
  Total runtime: 962.968 ms


 try to experiment with SET enable_seqscan TO false; - and see what happens.
 Didn't make a difference; therefore I think postgres determines it is
 unable to use the index, is that correct?


 Thank you for now: I'll use the inner join (or natural join in this
 case) for this specific view

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


[SQL] Wrong query plan when using a left outer join

2012-01-16 Thread Feike Steenbergen
I have the following setup:

A table called hand:


Table stage.hand_meta
Column |   Type   |
Modifiers
---+--+-
 hand_id   | integer  | not null default
nextval('hand_meta_hand_id_seq'::regclass)
 hand_no   | bigint   | not null
 site_id   | smallint | not null
 game_id   | smallint | not null
 time  | timestamp with time zone | not null
 tournament_id | bigint   |
Indexes:
hand_meta_pkey PRIMARY KEY, btree (hand_id) CLUSTER
hand_meta_hand_no_site_unq UNIQUE, btree (hand_no, site_id)
hand_meta_time_idx btree (time)
hand_meta_tournament_id_idx btree (tournament_id)
Referenced by:
TABLE handhistory_plain CONSTRAINT
handhistory_plain_hand_id_fkey FOREIGN KEY (hand_id) REFERENCES
hand_meta(hand_id)
TABLE handhistory_staged CONSTRAINT staged_hand_hand_id_fkey
FOREIGN KEY (hand_id) REFERENCES hand_meta(hand_id)

Getting the max hand_id (primary key) results in using an index:


feiketracker= explain analyze select max(hand_id) from stage.hand;

 QUERY PLAN
---
 Result  (cost=0.03..0.04 rows=1 width=0) (actual time=0.379..0.383
rows=1 loops=1)
   InitPlan 1 (returns $0)
 -  Limit  (cost=0.00..0.03 rows=1 width=4) (actual
time=0.337..0.340 rows=1 loops=1)
   -  Index Scan Backward using hand_meta_pkey on hand_meta
(cost=0.00..82667.12 rows=2479440 width=4) (actual time=0.319..0.319
rows=1 loops=1)
 Index Cond: (hand_id IS NOT NULL)
 Total runtime: 0.823 ms
(6 rows)


Now, if i create a view which left outer joins another table and
select max hand_id it uses a seq_scan, which I think it should'nt use,
as it only needs to query hand_meta and then use the index:


feiketracker= create view seqscan_example as (select * from hand_meta
left join handhistory_plain using(hand_id));
CREATE VIEW
Time: 72.736 ms

feiketracker= explain analyze select max(hand_id) from seqscan_example;
 QUERY PLAN
-
 Aggregate  (cost=49261.00..49261.01 rows=1 width=4) (actual
time=34672.052..34672.054 rows=1 loops=1)
   -  Seq Scan on hand_meta  (cost=0.00..43062.40 rows=2479440
width=4) (actual time=0.180..16725.109 rows=2479440 loops=1)
 Total runtime: 34672.874 ms
(3 rows)


feiketracker= select version();
  version

 PostgreSQL 9.0.6 on armv5tejl-unknown-linux-gnueabi, compiled by GCC
gcc (GCC) 3.4.4 (release) (CodeSourcery ARM 2005q3-2), 32-bit
(1 row)


I cannot think of a reason to use a seqscan, the left join should
indicate all results from hand_meta should be used, hand_id is the
primary key, so selecting max(hand_id) from the table or the view
should result in the same execution plan or am I thinking wrong?

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Current transaction is aborted, commands ignored until end of transaction block

2012-01-04 Thread Feike Steenbergen
I recently started receiving this error as well, this was because I
disabled autocommit.
With the following option in .psqlrc the error doesn't wait for a
rollback but automatically creates a savepoint allowing you to fix the
error and continue:

This is now in my .psqlrc:

\set AUTOCOMMIT off
\set ON_ERROR_ROLLBACK on

http://www.postgresql.org/docs/9.1/static/app-psql.html look for
ON_ERROR_ROLLBACK

When on, if a statement in a transaction block generates an error,
the error is ignored and the transaction continues. When interactive,
such errors are only ignored in interactive sessions, and not when
reading script files.

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql