[SQL] Use of partial index

2005-10-05 Thread Leif B. Kristensen
I'm a little confused about partial indexes. I have a couple of tables, 
like this:

CREATE TABLE events (
event_idINTEGER PRIMARY KEY,
tag_type_fk INTEGER REFERENCES tag_types (tag_type_id),
place_fkINTEGER REFERENCES places (place_id),
event_date  CHAR(18) NOT NULL DEFAULT '31',
sort_date   DATE NOT NULL DEFAULT '40041024BC',
event_text  TEXT NOT NULL DEFAULT '',
sentenceTEXT NOT NULL DEFAULT ''
);

To this table I have created a partial index:

CREATE INDEX events_born
ON events (tag_type_fk)
WHERE tag_type_fk = 2;

Another table:

CREATE TABLE participants ( -- the TMG 'E' file
participant_id  INTEGER PRIMARY KEY,
person_fk   INTEGER REFERENCES persons (person_id),
event_fkINTEGER REFERENCES events (event_id),
role_type_fkINTEGER REFERENCES role_types (role_type_id),
is_principalBOOLEAN NOT NULL DEFAULT 'f',
is_primary_eventBOOLEAN NOT NULL DEFAULT 'f',
participant_noteTEXT NOT NULL DEFAULT '',
participant_nameTEXT NOT NULL DEFAULT '',
age_meanINTEGER NOT NULL DEFAULT 0,
age_deviINTEGER NOT NULL DEFAULT 0,
CONSTRAINT person_event UNIQUE (person_id, event_id)
);

And a view:

CREATE OR REPLACE VIEW principals AS
SELECT
participants.person_fk AS person,
events.event_id AS event,
events.place_fk AS place,
events.event_date AS event_date,
events.sort_date AS sort_date,
events.tag_type_fk AS tag_type
FROM
events, participants
WHERE
events.event_id = participants.event_fk
AND
participants.is_principal IS TRUE;

Now, here's an explain select:

pgslekt= explain select event_date, place from principals where 
person=2 and tag_type=2;
   QUERY PLAN
---
 Nested Loop  (cost=0.00..23.15 rows=2 width=26)
   -  Index Scan using person_event on participants  
   (cost=0.00..13.63 rows=3 width=4)
 Index Cond: (person_fk = 2)
 Filter: (is_principal IS TRUE)
   -  Index Scan using events_pkey on events  
   (cost=0.00..3.16 rows=1 width=30)
 Index Cond: (events.event_id = outer.event_fk)
 Filter: (tag_type_fk = 2)
(7 rader)

Why doesn't this SELECT use the partial index events_born above? Is 
there any way to make this happen?
-- 
Leif Biberg Kristensen
http://solumslekt.org/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [SQL] Use of partial index

2005-10-05 Thread Dmitri Bichko
As I understand it, partial indices are generally useful when you only
want to index a range of values, or if the select condition is on a
different field from the one being indexed (eg: ON foo (a) WHERE b IS
NOT NULL).

I am just guessing here, but it sounds like 'person_fk = 2' is going to
be a lot more selective (ie return less rows) than 'tag_type_fk = 2', so
it's quicker to use the pkey and then filter the results.

Depending on how many 'tag_type' values you have, indexing on it will
not help at all.  In other words, if more than a few percent of the rows
have the value '2' for 'tag_type_fg', postgres will tend to favour more
selective indices if you are doing a join, or a seqscan if you are doing
a straight select on that value.

Does that help?
Dmitri
PS Your query seems to be quite quick already, why don't you like this
plan?

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Leif B. 
 Kristensen
 Sent: Wednesday, October 05, 2005 11:17 AM
 To: Pgsql-sql@postgresql.org
 Subject: [SQL] Use of partial index
 
 
 I'm a little confused about partial indexes. I have a couple 
 of tables, 
 like this:
 
 CREATE TABLE events (
 event_idINTEGER PRIMARY KEY,
 tag_type_fk INTEGER REFERENCES tag_types (tag_type_id),
 place_fkINTEGER REFERENCES places (place_id),
 event_date  CHAR(18) NOT NULL DEFAULT '31',
 sort_date   DATE NOT NULL DEFAULT '40041024BC',
 event_text  TEXT NOT NULL DEFAULT '',
 sentenceTEXT NOT NULL DEFAULT ''
 );
 
 To this table I have created a partial index:
 
 CREATE INDEX events_born
 ON events (tag_type_fk)
 WHERE tag_type_fk = 2;
 
 Another table:
 
 CREATE TABLE participants ( -- the TMG 'E' file
 participant_id  INTEGER PRIMARY KEY,
 person_fk   INTEGER REFERENCES persons (person_id),
 event_fkINTEGER REFERENCES events (event_id),
 role_type_fkINTEGER REFERENCES role_types (role_type_id),
 is_principalBOOLEAN NOT NULL DEFAULT 'f',
 is_primary_eventBOOLEAN NOT NULL DEFAULT 'f',
 participant_noteTEXT NOT NULL DEFAULT '',
 participant_nameTEXT NOT NULL DEFAULT '',
 age_meanINTEGER NOT NULL DEFAULT 0,
 age_deviINTEGER NOT NULL DEFAULT 0,
 CONSTRAINT person_event UNIQUE (person_id, event_id)
 );
 
 And a view:
 
 CREATE OR REPLACE VIEW principals AS
 SELECT
 participants.person_fk AS person,
 events.event_id AS event,
 events.place_fk AS place,
 events.event_date AS event_date,
 events.sort_date AS sort_date,
 events.tag_type_fk AS tag_type
 FROM
 events, participants
 WHERE
 events.event_id = participants.event_fk
 AND
 participants.is_principal IS TRUE;
 
 Now, here's an explain select:
 
 pgslekt= explain select event_date, place from principals where 
 person=2 and tag_type=2;
QUERY PLAN
 --
 -
  Nested Loop  (cost=0.00..23.15 rows=2 width=26)
-  Index Scan using person_event on participants  
(cost=0.00..13.63 rows=3 width=4)
  Index Cond: (person_fk = 2)
  Filter: (is_principal IS TRUE)
-  Index Scan using events_pkey on events  
(cost=0.00..3.16 rows=1 width=30)
  Index Cond: (events.event_id = outer.event_fk)
  Filter: (tag_type_fk = 2)
 (7 rader)
 
 Why doesn't this SELECT use the partial index events_born above? Is 
 there any way to make this happen?
 -- 
 Leif Biberg Kristensen
 http://solumslekt.org/
 
 ---(end of 
 broadcast)---
 TIP 6: explain analyze is your friend
 
The information transmitted is intended only for the person or entity to which 
it is addressed and may contain confidential and/or privileged material. Any 
review, retransmission, dissemination or other use of, or taking of any action 
in reliance upon, this information by persons or entities other than the 
intended recipient is prohibited. If you received this in error, please contact 
the sender and delete the material from any computer

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [SQL] Use of partial index

2005-10-05 Thread Tom Lane
Leif B. Kristensen [EMAIL PROTECTED] writes:
 I'm a little confused about partial indexes. I have a couple of tables, 
 like this:

 CREATE TABLE events (
 event_idINTEGER PRIMARY KEY,
 tag_type_fk INTEGER REFERENCES tag_types (tag_type_id),
 place_fkINTEGER REFERENCES places (place_id),
 event_date  CHAR(18) NOT NULL DEFAULT '31',
 sort_date   DATE NOT NULL DEFAULT '40041024BC',
 event_text  TEXT NOT NULL DEFAULT '',
 sentenceTEXT NOT NULL DEFAULT ''
 );

 To this table I have created a partial index:

 CREATE INDEX events_born
 ON events (tag_type_fk)
 WHERE tag_type_fk = 2;

 ...

 Now, here's an explain select:

 pgslekt= explain select event_date, place from principals where 
 person=2 and tag_type=2;
QUERY PLAN
 ---
  Nested Loop  (cost=0.00..23.15 rows=2 width=26)
-  Index Scan using person_event on participants  
(cost=0.00..13.63 rows=3 width=4)
  Index Cond: (person_fk = 2)
  Filter: (is_principal IS TRUE)
-  Index Scan using events_pkey on events  
(cost=0.00..3.16 rows=1 width=30)
  Index Cond: (events.event_id = outer.event_fk)
  Filter: (tag_type_fk = 2)
 (7 rader)

 Why doesn't this SELECT use the partial index events_born above?

Because the plan it did choose is better.  events_born could only serve
to select the rows with tag_type_fk = 2; assuming there's more than one
of those rows, there would be multiple fetches needed to see if any of
them have the desired event_id.  With this plan it's getting at most one
row, by definition (since event_id is the primary key).

Had you created the partial index as

CREATE INDEX events_born
ON events (event_id)
WHERE tag_type_fk = 2;

then it would be competitive for this query, since the index could
effectively handle both constraints not just one.  (THe way you did
define it, the actual content of the index keys is just dead weight,
since they obviously must all be 2.  It's often better to define
the index column(s) of a partial index as some other column than the
one involved in the index predicate...)

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [SQL] Use of partial index

2005-10-05 Thread Stephan Szabo

On Wed, 5 Oct 2005, Leif B. Kristensen wrote:

 I'm a little confused about partial indexes. I have a couple of tables,
 like this:

 CREATE TABLE events (
 event_idINTEGER PRIMARY KEY,
 tag_type_fk INTEGER REFERENCES tag_types (tag_type_id),
 place_fkINTEGER REFERENCES places (place_id),
 event_date  CHAR(18) NOT NULL DEFAULT '31',
 sort_date   DATE NOT NULL DEFAULT '40041024BC',
 event_text  TEXT NOT NULL DEFAULT '',
 sentenceTEXT NOT NULL DEFAULT ''
 );

 To this table I have created a partial index:

 CREATE INDEX events_born
 ON events (tag_type_fk)
 WHERE tag_type_fk = 2;

 Another table:

 CREATE TABLE participants ( -- the TMG 'E' file
 participant_id  INTEGER PRIMARY KEY,
 person_fk   INTEGER REFERENCES persons (person_id),
 event_fkINTEGER REFERENCES events (event_id),
 role_type_fkINTEGER REFERENCES role_types (role_type_id),
 is_principalBOOLEAN NOT NULL DEFAULT 'f',
 is_primary_eventBOOLEAN NOT NULL DEFAULT 'f',
 participant_noteTEXT NOT NULL DEFAULT '',
 participant_nameTEXT NOT NULL DEFAULT '',
 age_meanINTEGER NOT NULL DEFAULT 0,
 age_deviINTEGER NOT NULL DEFAULT 0,
 CONSTRAINT person_event UNIQUE (person_id, event_id)
 );

 And a view:

 CREATE OR REPLACE VIEW principals AS
 SELECT
 participants.person_fk AS person,
 events.event_id AS event,
 events.place_fk AS place,
 events.event_date AS event_date,
 events.sort_date AS sort_date,
 events.tag_type_fk AS tag_type
 FROM
 events, participants
 WHERE
 events.event_id = participants.event_fk
 AND
 participants.is_principal IS TRUE;

 Now, here's an explain select:

 pgslekt= explain select event_date, place from principals where
 person=2 and tag_type=2;
QUERY PLAN
 ---
  Nested Loop  (cost=0.00..23.15 rows=2 width=26)
-  Index Scan using person_event on participants
(cost=0.00..13.63 rows=3 width=4)
  Index Cond: (person_fk = 2)
  Filter: (is_principal IS TRUE)
-  Index Scan using events_pkey on events
(cost=0.00..3.16 rows=1 width=30)
  Index Cond: (events.event_id = outer.event_fk)
  Filter: (tag_type_fk = 2)
 (7 rader)

 Why doesn't this SELECT use the partial index events_born above? Is
 there any way to make this happen?

I would think you'd want an index ON events(event_id) WHERE tag_type_fk=2
for the query given.


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [SQL] Use of partial index

2005-10-05 Thread Leif B. Kristensen
On Wednesday 05 October 2005 18:44, you wrote:
 As I understand it, partial indices are generally useful when you
 only want to index a range of values, or if the select condition is
 on a different field from the one being indexed (eg: ON foo (a) WHERE
 b IS NOT NULL).

 I am just guessing here, but it sounds like 'person_fk = 2' is going
 to be a lot more selective (ie return less rows) than 'tag_type_fk =
 2', so it's quicker to use the pkey and then filter the results.

That makes a lot of sense to me. As any person will usually participate 
in several events, I'll estimate that the ratio between person_fk=x and 
tag_type_fk=y is about 1:4.

 Depending on how many 'tag_type' values you have, indexing on it will
 not help at all.  In other words, if more than a few percent of the
 rows have the value '2' for 'tag_type_fg', postgres will tend to
 favour more selective indices if you are doing a join, or a seqscan
 if you are doing a straight select on that value.

I have only 53 different tag types, but most of them are rather rare. 
The most heavilyly used are birth=2, death=3, and marriage=4, and I've 
created similar partial indexes for the other two. In a previous MySQL 
project that's roughly sharing the same data model, I created a 
redundant table called marriages that speeded up the generation of a 
family view by a factor of four. That's why I thought that a partial 
index would have a similar effect here.

 Does that help?

Yessir, thank you very much!

 Dmitri
 PS Your query seems to be quite quick already, why don't you like
 this plan?

I didn't say that I didn't like it, - I just was a little perplexed :-)
-- 
Leif Biberg Kristensen | Registered Linux User #338009
http://solumslekt.org/ | Cruising with Gentoo-Linux/KDE

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [SQL] Use of partial index

2005-10-05 Thread Leif B. Kristensen
On Wednesday 05 October 2005 18:49, you wrote:
[Leif]
  Now, here's an explain select:
 
  pgslekt= explain select event_date, place from principals where 
  person=2 and tag_type=2;
                                         QUERY PLAN
  
---
   Nested Loop  (cost=0.00..23.15 rows=2 width=26)
     -  Index Scan using person_event on participants  
                         (cost=0.00..13.63 rows=3 width=4)
           Index Cond: (person_fk = 2)
           Filter: (is_principal IS TRUE)
     -  Index Scan using events_pkey on events  
                         (cost=0.00..3.16 rows=1 width=30)
           Index Cond: (events.event_id = outer.event_fk)
           Filter: (tag_type_fk = 2)
  (7 rader)
 
 Because the plan it did choose is better.  events_born could only
 serve to select the rows with tag_type_fk = 2; assuming there's more
 than one of those rows, there would be multiple fetches needed to see
 if any of them have the desired event_id.  With this plan it's getting
 at most one row, by definition (since event_id is the primary key).
 
 Had you created the partial index as
 
 CREATE INDEX events_born
     ON events (event_id)
     WHERE tag_type_fk = 2;
 
 then it would be competitive for this query, since the index could
 effectively handle both constraints not just one.  (THe way you did
 define it, the actual content of the index keys is just dead weight,
 since they obviously must all be 2.  It's often better to define
 the index column(s) of a partial index as some other column than the
 one involved in the index predicate...)
 
pgslekt= create index events_born2 on events (event_id) where 
tag_type_fk=2;
CREATE INDEX
pgslekt= explain select event_date, place from principals where 
person=2 and tag_type=2;
   QUERY PLAN

 Nested Loop  (cost=0.00..22.88 rows=2 width=26)
   -  Index Scan using person_event on participants  (cost=0.00..13.63 
rows=3 width=4)
 Index Cond: (person_fk = 2)
 Filter: (is_principal IS TRUE)
   -  Index Scan using events_born2 on events  (cost=0.00..3.07 rows=1 
width=30)
 Index Cond: (events.event_id = outer.event_fk)
 Filter: (tag_type_fk = 2)
(7 rader)

From 23.15 to 22.88 ... but now at least it used my partial index, as it 
does a slightly better job. I'm starting to get it - I think.

Thank you for your explanation. With regards to optimization, it seems 
that I'm still too hung up in MySQL issues. PostgreSQL seems to behave 
a lot more intelligently with queries.
-- 
Leif Biberg Kristensen | Registered Linux User #338009
http://solumslekt.org/ | Cruising with Gentoo-Linux/KDE

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match