Re: [PERFORM] Q on views and performance

2008-02-26 Thread Kynn Jones
On Mon, Feb 25, 2008 at 11:56 AM, Matthew [EMAIL PROTECTED] wrote:

 On Mon, 25 Feb 2008, Kynn Jones wrote:
  This is just GREAT!!!  It fits the problem to a tee.

 It makes the queries quick then?


It is good that you ask.  Clearly you know the story: a brilliant-sounding
optimization that in practice has only a small effect at best...

I'm totally puzzled.  It makes absolutely no sense to me...

For my analysis, in addition to creating the index on (type, zipk) that you
suggested, I also added an extra column to T containing a random integer in
the range 0..99, and created an index on this, so that I could produce a
totally shuffled clustering.  I compared the performance in going from a
randomly-clustered table to a (type, zipk)-clustered table, and the output
of EXPLAIN was encouraging, but when I ran the actual queries under EXPLAIN
ANALYZE the difference in execution time was negligible.

Live and learn!

Actually, what's driving me absolutely insane is the documentation for
EXPLAIN and for Pg's query planning in general.  I've read the docs (in
particular the chapter on performance), but I still can't make any sense of
EXPLAIN's results, so I can't begin to understand why optimizations like the
one you suggested turned out to be ineffective.  For example, the first
lines of two recent EXPLAIN ANALYZE outputs are

Nested Loop Left Join  (cost=58.00..1154.22 rows=626 width=26) (actual time=1.462..26.494 rows=2240 loops=1)
Merge Left Join  (cost=33970.96..34887.69 rows=58739 width=26) (actual time=106.961..126.589 rows=7042 loops=1)

Actual runtimes are 27ms and 128ms.  The ratio 128/27 is much smaller than
one would expect from the relative costs of the two queries.  It looks like
there is no proportionality at all between the estimated costs and actual
running time...  (BTW, all these runs of EXPLAIN were done after calls to
VACUUM ANALYZE.)  This is one of the many things I don't understand about
this case...

What I would like to be able to do is to at least make enough sense of query
plans to determine whether they are reasonable or not.  This requires
knowing the algorithms behind each type of query tree node, but I have not
found this info...
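
For what it's worth, a minimal way to line the estimates up against reality is to read each plan node's estimated "rows=" next to its actual "rows=" (query shape borrowed from earlier in the thread; the literal type values here are made up):

```sql
-- EXPLAIN ANALYZE actually runs the query and reports both the
-- planner's estimates and the measured times/row counts per node.
EXPLAIN ANALYZE
SELECT a1.word, a2.word
  FROM T a1 JOIN T a2 USING ( zipk )
 WHERE a1.type = 1
   AND a2.type = 2;
-- In each node line, compare rows=N inside "(cost=...)" with rows=N
-- inside "(actual ...)": a large mismatch there, rather than the
-- absolute cost number, is usually the best clue to a bad plan choice.
```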

On the positive side, in the course of all this analysis I must have done
*something* to improve the performance, because now even the unoptimized
queries are running pretty fast (e.g. queries that used to take about
1.5 seconds are now taking 130 ms).  But unfortunately I don't know what it
was that I did to bring this speed-up about!

Anyway, be that as it may, thank you very much for your suggestion.

Kynn


Re: [PERFORM] Q on views and performance

2008-02-25 Thread Matthew

On Fri, 22 Feb 2008, Kynn Jones wrote:

Hi.  I'm trying to optimize...

(Q1)   SELECT a1.word, a2.word
FROM T a1 JOIN T a2 USING ( zipk )
   WHERE a1.type = int1
 AND a2.type = int2;


Okay, try this:

Create an index on T(type, zipk), and then CLUSTER on that index. That 
will effectively group all the data for one type together and sort it by 
zipk, making a merge join very quick indeed. I'm not sure whether Postgres 
will notice that, but it's worth a try.
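
Spelled out, that would be something like the following (the index name is invented; note that CLUSTER rewrites the whole table under an exclusive lock, and a fresh ANALYZE afterwards lets the planner see the new physical order):

```sql
CREATE INDEX t_type_zipk ON T (type, zipk);
CLUSTER T USING t_type_zipk;  -- 8.3 syntax; older: CLUSTER t_type_zipk ON T;
ANALYZE T;
```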



More specifically, how can I go about building table T and the views
Vint?'s to maximize the performance of (Q1)?  For example, I'm thinking
that if T had an additional id column and were built in such a way that all
the records belonging to each Vint? were physically contiguous, and (say)
had contiguous values in the id column, then I could define each view like
this


The above index and CLUSTER will effectively do this - you don't need to 
introduce another field.


Alternatively, you could go *really evil* and pre-join the table. 
Something like this:


CREATE TABLE evilJoin AS SELECT a1.type AS type1, a2.type AS type2,
a1.zipk, a1.word AS word1, a2.word AS word2
  FROM T AS a1, T AS a2
  WHERE a1.zipk = a2.zipk
  ORDER BY a1.type, a2.type, a1.zipk;
CREATE INDEX evilIndex1 ON evilJoin(type1, type2, zipk);

Then your query becomes:

SELECT word1, word2
   FROM evilJoin
   WHERE type1 = int1
 AND type2 = int2

which should run quick. However, your cache usefulness will be reduced 
because of the extra volume of data.


Matthew

--
[About NP-completeness] These are the problems that make efficient use of
the Fairy Godmother.-- Computer Science Lecturer

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] Q on views and performance

2008-02-25 Thread Matthew
So, this email is directed much more towards Postgres Powers That Be. I 
came across this problem a while ago, and I haven't checked whether it has 
been improved.


On Mon, 25 Feb 2008, I wrote:

Hi.  I'm trying to optimize...

(Q1)   SELECT a1.word, a2.word
FROM T a1 JOIN T a2 USING ( zipk )
   WHERE a1.type = int1
 AND a2.type = int2;


Create an index on T(type, zipk), and then CLUSTER on that index. That will 
effectively group all the data for one type together and sort it by zipk, 
making a merge join very quick indeed. I'm not sure whether Postgres will 
notice that, but it's worth a try.


Statistics are generated on fields in a table, and the one I'm interested 
in is the correlation coefficient, which tells Postgres how costly an index 
scan sorted on that field would be. This entry is ONLY useful when the 
result needs to be sorted by that exact field only. For example:


CREATE TABLE test (a int, b int);
-- insert a bazillion entries
CREATE INDEX testIndex ON test(a, b);
CLUSTER test ON testIndex;
ANALYSE;

So now we have a table sorted by (a, b), but the statistics only record 
the fact that it is sorted by a, and completely unsorted by b. If we run:


SELECT * FROM test ORDER BY a;

then the query will run quickly, doing an index scan. However, if we run:

SELECT * FROM test ORDER BY a, b;

then Postgres will not be able to use the index, because it cannot tell 
how sequential the fetches from the index will be. This is especially 
unfortunate for a query like:

SELECT * FROM test WHERE a = something ORDER BY b;

where the index would otherwise be a perfect fit.
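
If you want to see the numbers the planner is actually working from, the per-column correlation is visible in the pg_stats view (the values in the comment are what one would expect, not guaranteed):

```sql
SELECT attname, correlation
  FROM pg_stats
 WHERE tablename = 'test';
-- After the CLUSTER above, "a" should show a correlation near 1.0,
-- while "b" will typically be near 0, even though the table is
-- perfectly sorted by b within each value of a.
```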

So, these observations were made a long time ago, and I don't know if they 
have been improved. A while back I suggested a partial sort algorithm 
that could take a stream sorted by a and turn it into a stream sorted by 
(a, b) at small cost. That would fix some instances of the problem. 
However, now I suggest that the statistics are in the wrong place.


At the moment, the correlation coefficient, which is an entry purely 
designed to indicate how good an index is at index scans, is a statistic 
on the first field of the index. Why not create a correlation coefficient 
statistic for the index as a whole instead, and store it elsewhere in the 
statistics data? That way, instead of having to infer from the first field 
how correlated an index is, and getting it wrong beyond the first field, 
you can just look up the correlation for the index.


Opinions?

Matthew

--
If you let your happiness depend upon how somebody else feels about you,
now you have to control how somebody else feels about you. -- Abraham Hicks



Re: [PERFORM] Q on views and performance

2008-02-25 Thread Kynn Jones
On Mon, Feb 25, 2008 at 8:45 AM, Matthew [EMAIL PROTECTED] wrote:

 On Fri, 22 Feb 2008, Kynn Jones wrote:
  Hi.  I'm trying to optimize...
 
  (Q1)   SELECT a1.word, a2.word
  FROM T a1 JOIN T a2 USING ( zipk )
 WHERE a1.type = int1
   AND a2.type = int2;

 Okay, try this:

 Create an index on T(type, zipk), and then CLUSTER on that index...


This is just GREAT!!!  It fits the problem to a tee.

Many, many thanks!

Also, including zipk in the index is a really nice extra boost.  (If you
hadn't mentioned it I would have just settled for clustering only on
type...)

Thanks for that also!

Kynn


Re: [PERFORM] Q on views and performance

2008-02-25 Thread Matthew

On Mon, 25 Feb 2008, Kynn Jones wrote:

This is just GREAT!!!  It fits the problem to a tee.


It makes the queries quick then?

Matthew

--
The only secure computer is one that's unplugged, locked in a safe,
and buried 20 feet under the ground in a secret location...and i'm not
even too sure about that one. --Dennis Huges, FBI



Re: [PERFORM] Q on views and performance

2008-02-23 Thread Kynn Jones
On Fri, Feb 22, 2008 at 8:48 PM, Dean Gibson (DB Administrator) 
[EMAIL PROTECTED] wrote:

 On 2008-02-22 12:49, Kynn Jones wrote:
  Of course, I expect that using views Vint1 and Vint2... would
  result in a loss in performance relative to a version that used bona
  fide tables Tint1 and Tint2.  My question is, how can I minimize
  this performance loss?

 That used to be my thoughts too, but I have found over the years that
 the PostgreSQL execution planner is able to flatten SELECTs using
 VIEWs, ALMOST ALWAYS in a way that does not adversely affect
 performance, and often gives an IMPROVEMENT in performance, probably
 because by using VIEWs I am stating the query problem in a better way
 than if I try to guess the best way to optimize a SELECT.


Well, the last consideration you mention there does not apply to the two
alternatives I'm comparing because they differ only in that one uses views
V1, V2, V3, ... , V100 where the other one uses the corresponding tables T1,
T2, T3, ... , T100, so the query statements would be identical in both
cases.


 I have at least a 10:1 ratio of VIEWs to TABLEs.  Occasionally, with
 some query that is slow, I will try to rewrite it without VIEWs.  This
 ALMOST NEVER results in an improvement in performance...


That's truly amazing!  Just to make sure I get you right, you're saying that
when you replace a view by its equivalent table you see no performance gain?
 How could it be?  With views every query entails the additional work of
searching the underlying tables for the records that make up the views...

OK, if I think a bit more about it I suppose that a view could be
implemented for performance as a special sort of table consisting of a
single column of pointers to the true records, in which case using views
would entail only the cost of this indirection, and not the cost of a
search...  (And also the cost of maintaining this pointer table, if the
underlying tables are mutable.)  So I guess views could be implemented in
such a way that the difference in SELECT performance relative to replacing
them with tables would be negligible...

Anyway, your post once again reminded me of awesomeness of PostgreSQL.
 Props to the developers!

kynn


Re: [PERFORM] Q on views and performance

2008-02-23 Thread Kynn Jones
On Fri, Feb 22, 2008 at 8:48 PM, Dean Gibson (DB Administrator) 
[EMAIL PROTECTED] wrote:

 On 2008-02-22 12:49, Kynn Jones wrote:
  Of course, I expect that using views Vint1 and Vint2... would
  result in a loss in performance relative to a version that used bona
  fide tables Tint1 and Tint2.  My question is, how can I minimize
  this performance loss?

 That used to be my thoughts too, but I have found over the years that
 the PostgreSQL execution planner is able to flatten SELECTs using
 VIEWs, ALMOST ALWAYS in a way that does not adversely affect
 performance, and often gives an IMPROVEMENT in performance, probably
 because by using VIEWs I am stating the query problem in a better way
 than if I try to guess the best way to optimize a SELECT.

 I have at least a 10:1 ratio of VIEWs to TABLEs.  Occasionally, with
 some query that is slow, I will try to rewrite it without VIEWs.  This
 ALMOST NEVER results in an improvement in performance, and when it does,
 I am able to find another way to write the VIEW and SELECT to recapture
 the gain.


Since you have experience working with views, let me ask you this.  The
converse strategy to the one I described originally would be to create the
individual tables T1, T2, T3, ..., T100, but instead of keeping around the
original (and now redundant) table T, replace it with a view V made up of
the union of T1, T2, T3, ..., T100.  The problem with this alternative is
that one cannot index V, or define a primary key constraint for it, because
it's a view.  This means that a search in V, even for a primary key value,
would *have to be* very inefficient (i.e. I don't see how even the very
clever PostgreSQL implementers could get around this one!), because the
engine would have to search *all* the underlying tables, T1 through T100,
even if it found the desired record in T1, since it has no way of knowing
that the value is unique all across V.
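
For concreteness, the layout being described looks like this (only three of the hundred partitions shown; written with UNION ALL rather than UNION, to avoid an extra duplicate-elimination step):

```sql
CREATE VIEW V AS
          SELECT * FROM T1
UNION ALL SELECT * FROM T2
UNION ALL SELECT * FROM T3;   -- ...and so on through T100

-- A primary-key-style lookup such as
--   SELECT * FROM V WHERE zipk = 1234;
-- must in general visit every branch of the UNION, since nothing
-- declares that the value is unique across all the underlying tables.
```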

Is there a way around this?

kynn


Re: [PERFORM] Q on views and performance

2008-02-23 Thread Dean Gibson (DB Administrator)

On 2008-02-23 05:59, Kynn Jones wrote:
On Fri, Feb 22, 2008 at 8:48 PM, Dean Gibson (DB Administrator) 
[EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote:


...


Since you have experience working with views, let me ask you this. 
 The converse strategy to the one I described originally would be to 
create the individual tables T1, T2, T3, ..., T100, but instead of 
keeping around the original (and now redundant) table T, replace it 
with a view V made up of the union of T1, T2, T3, ..., T100.  The 
problem with this alternative is that one cannot index V, or define a 
primary key constraint for it, because it's a view.  This means that a 
search in V, even for a primary key value, would *have to be* very 
inefficient (i.e. I don't see how even the very clever PostgreSQL 
implementers could get around this one!), because the engine would 
have to search *all* the underlying tables, T1 through T100, even if 
it found the desired record in T1, since it has no way of knowing that 
the value is unique all across V.


Is there a way around this?

kynn

Oh, I wouldn't create separate tables and do a UNION of them, I'd think 
that would be inefficient.


I didn't look in detail at your previous eMail, but I will now:

1. You haven't told us the distribution of zipk, or what the tables 
are indexed on, or what type of performance you are expecting.  Your 
initial examples don't help much unless you actually have performance 
numbers or EXPLAIN output for them, since adding the third JOIN 
significantly changes the picture, as does changing one of the JOINs to 
a LEFT JOIN.


2. In your actual (Q1** and Q2**) examples, why is one JOIN an INNER 
JOIN and the other one a LEFT JOIN?  Given your description of Q1 at the 
top of your message, that doesn't make sense to me.


3. Why not write:

CREATE VIEW txt AS
 SELECT a1.word AS word1, a1.type AS type1, a2.word AS word2, a2.type 
AS type2

   FROM T a1 [LEFT] JOIN T a2 USING( zipk );  -- Use LEFT if appropriate
SELECT word1, word1
 FROM S JOIN txt ON word = word1
 WHERE type1 = int1 AND type2 = int2;

If neither of those (with or without the LEFT) is 
equivalent to your problem, how about just:


SELECT a1.word AS word1, a2.word AS word2
 FROM S JOIN T a1 USING( word)
   [LEFT] JOIN T a2 USING( zipk )  -- Use LEFT if appropriate
 WHERE a1.type = int1 AND a2.type = int2;

Show us (using EXPLAIN) what the query planner thinks of each of these.

--
Mail to my list address MUST be sent via the mailing list.
All other mail to my list address will bounce.



Re: [PERFORM] Q on views and performance

2008-02-23 Thread Dean Gibson (DB Administrator)

On 2008-02-23 07:08, Dean Gibson (DB Administrator) wrote:

...


SELECT word1, word1
  FROM S JOIN txt ON word = word1
  WHERE type1 = int1 AND type2 = int2;

...

Oops that should be:

SELECT word1, word2
 FROM S JOIN txt ON word = word1
 WHERE type1 = int1 AND type2 = int2;





Re: [PERFORM] Q on views and performance

2008-02-23 Thread Kynn Jones
Hi, Dean.  The system I'm working with is very similar in spirit to a
large multilingual dictionary covering 100 languages.  Using this analogy,
the type column would correspond to the language, and the zipk column
would correspond to some language-independent key associated with a concept
(concept key for short).  So, if it were indeed a multilingual dictionary,
records in T would look like
  word   | zipk | language
---------+------+------------
 house   | 1234 | <english>
 casa    | 1234 | <spanish>
 haus    | 1234 | <german>
 piano   | 2345 | <english>
 piano   | 2345 | <spanish>
 cat     | 3456 | <english>
 chat    | 3456 | <french>
 chat    | 4567 | <english>
 plausch | 4567 | <german>

...where I used the notation <lang> to denote the integer id assigned to
language lang.  Therefore typically there are about 100 records in T for
any given zipk, one for each language.  But the correspondence is not
perfect, since, for example, some languages have, proverbially, more than
one word for snow, and some (maybe from some tropical island in the South
Pacific) have none.  (This last case, BTW, is what accounts for the use of
left joins, as will become clear in a minute.)

The table S can be thought of as a table consisting of a collection of words to
be translated to some target language.  In the first type of query (Q1), all
the words in S are effectively declared to belong to the same source
language, whereas in the second type of query (Q2) the source language for
the words in S is left unspecified (in this case S may contain words from
various languages, or words--like piano or chat in the example
above--that belong simultaneously to different languages, and which may (e.g.
piano) or may not (e.g. chat) have the same zipk [concept key] for each of
these languages).

So, regarding your question about (Q1**) and (Q2**):

(Q1**)  SELECT a1.word, sq.word FROM
   S  JOIN T a1 USING ( word )
 LEFT JOIN ( SELECT * FROM T a2
 WHERE a2.type = int2 ) sq USING ( zipk )
 WHERE a1.type = int1;

(Q2**)  SELECT a1.word, sq.word FROM
   S  JOIN T a1 USING ( word )
 LEFT JOIN ( SELECT * FROM T a2
 WHERE a2.type = int2 ) sq USING ( zipk )

...the inner join with S is intended to pick out all the records in the
source table (either Tint1 in Q1** or T in Q2**) corresponding to words in
S, while the second (left) join, is there to find all the translations in
the target language.  I use a left join so that even those words in S for
which no translations exist will show up in the query results.

3. Why not write:

 CREATE VIEW txt AS
   SELECT a1.word AS word1, a1.type AS type1, a2.word AS word2, a2.type AS
 type2
 FROM T a1 [LEFT] JOIN T a2 USING( zipk );  -- Use LEFT if
 appropriate
 SELECT word1, word1
   FROM S JOIN txt ON word = word1
   WHERE type1 = int1 AND type2 = int2;


This would indeed produce the same results as Q1, but this approach would
require defining about 10,000 views, one for each possible pair of int1 and
int2 (or pair of languages, to continue the multilingual dictionary
analogy), which freaks me out for some reason.  (Actually, the number of
such views would be many more than that, because in the actual application
there is not just one T but several dozen, similar to what would happen to
the schema in the multilingual dictionary analogy if we wanted to
pre-segregate the words according to some categories, say a T for animals, a
T for fruits, a T for verbs, a T for professions, etc.)

(I need to do a bit more work before I can post the EXPLAIN results.)

kynn


Re: [PERFORM] Q on views and performance

2008-02-23 Thread Dean Gibson (DB Administrator)

On 2008-02-23 08:21, Kynn Jones wrote:

...

3. Why not write:

CREATE VIEW txt AS
  SELECT a1.word AS word1, a1.type AS type1, a2.word AS word2,
a2.type AS type2
FROM T a1 [LEFT] JOIN T a2 USING( zipk );  -- Use LEFT if
appropriate
SELECT word1, word1
  FROM S JOIN txt ON word = word1
  WHERE type1 = int1 AND type2 = int2;


This would indeed produce the same results as Q1, but this approach 
would require defining about 10,000 views, one for each possible pair 
of int1 and int2


Why 10,000 views???  What's wrong with the ONE view above?  You DON'T 
want to be defining VIEWs based on actual table VALUES;  leave that to 
the SELECT.  For that matter, what's wrong with the final SELECT I 
listed (below)?


SELECT a1.word AS word1, a2.word AS word2
 FROM S JOIN T a1 USING( word )
   LEFT JOIN T a2 USING( zipk )
 WHERE a1.type = int1 AND a2.type = int2;

-- Dean




Re: [PERFORM] Q on views and performance

2008-02-23 Thread Dean Gibson (DB Administrator)

On 2008-02-23 08:49, Dean Gibson (DB Administrator) wrote:
Why 10,000 views???  What's wrong with the ONE view above?  You DON'T 
want to be defining VIEWs based on actual table VALUES;  leave that 
to the SELECT.  For that matter, what's wrong with the final SELECT I 
listed (below)?


SELECT a1.word AS word1, a2.word AS word2
  FROM S JOIN T a1 USING( word )
LEFT JOIN T a2 USING( zipk )
  WHERE a1.type = int1 AND a2.type = int2;

-- Dean

Amendment:  I forgot that, if it's a LEFT JOIN, you have to write it as:

SELECT a1.word AS word1, a2.word AS word2
 FROM S JOIN T a1 USING( word )
   LEFT JOIN T a2 USING( zipk )
 WHERE a1.type = int1 AND (a2.type = int2 OR a2.type IS NULL);

-- Dean




[PERFORM] Q on views and performance

2008-02-22 Thread Kynn Jones
Hi.  I'm trying to optimize the performance of a database whose main purpose
is to support two (rather similar) kinds of queries.  The first kind, which
is expected to be the most common (I estimate it will account for about 90%
of all the queries performed on this DB), has the following general
structure:

(Q1)   SELECT a1.word, a2.word
 FROM T a1 JOIN T a2 USING ( zipk )
WHERE a1.type = int1
  AND a2.type = int2;

...where int1 and int2 stand for two integers.  In English, this
query essentially executes an inner join between two virtual subtables of
table T, which are defined by the value of the type column.  For brevity, I
will refer to these (virtual) subtables as Tint1 and Tint2.  (I should
point out that T holds about 2 million records, spread roughly evenly over
about 100 values of the type column.  So each of these virtual subtables has
about 20K records.  Also, for practical purposes T may be regarded as an
immutable, read-only table, since it gets re-built from scratch about once a
month.  And, FWIW, all the columns mentioned in this post have a NOT NULL
constraint.)
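
For reference, here is a table definition consistent with that description (the column names are from the post; the exact types are my assumption):

```sql
CREATE TABLE T (
    word text    NOT NULL,  -- the dictionary entry
    zipk integer NOT NULL,  -- shared key used in the join
    type integer NOT NULL   -- ~100 distinct values, ~20K rows each
);
-- About 2 million rows total, rebuilt from scratch roughly monthly.
```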

The second form is similar to the first, except that now the join is taken
between T and Tint2:

(Q2)   SELECT a1.word, a2.word
 FROM T a1 JOIN T a2 USING ( zipk )
WHERE a2.type = int2;

(Both the forms above are somewhat oversimplified relative to the actual
situation; in our actual application, the joins are actually left outer
ones, and each query also involves an additional inner join with another
table, S.  For the sake of completeness, I give the real-world versions of
these queries at the end of this post, but I think that for the purpose of
my question, the additional complications they entail can be neglected.)

One way to speed (Q1) would be to break T into its subtables, i.e. to create
T1, T2, T3, ... , T100 as bona fide tables.  Then the query would become a
simple join without the two conditions of the original's WHERE clause, which
I figure should make it noticeably faster.
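
In other words, with bona fide per-type tables the query would collapse to something like:

```sql
-- Hypothetical: T1 and T2 hold the rows with type = 1 and type = 2.
SELECT a1.word, a2.word
  FROM T1 a1 JOIN T2 a2 USING ( zipk );
```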

But since the second kind of query (Q2) requires T, we can't get rid of this
table, so all the data would need to be stored twice, once in T and once in
some Tint*.

In trying to come up with a way around this duplication, it occurred to me
that instead of creating tables T1, T2, etc., I could create the analogous
views V1, V2, etc.  (e.g. CREATE VIEW V1 AS SELECT * FROM T WHERE type = 1).
 With this design, the two queries above would become

(Q1*)  SELECT Vint1.word, Vint2.word
 FROM Vint1 JOIN Vint2 USING ( zipk );

(Q2*)  SELECT T.word, Vint2.word
 FROM T JOIN Vint2 USING ( zipk );

Of course, I expect that using views Vint1 and Vint2... would result in
a loss in performance relative to a version that used bona fide tables
Tint1 and Tint2.  My question is, how can I minimize this performance
loss?

More specifically, how can I go about building table T and the views
Vint?'s to maximize the performance of (Q1)?  For example, I'm thinking
that if T had an additional id column and were built in such a way that all
the records belonging to each Vint? were physically contiguous, and (say)
had contiguous values in the id column, then I could define each view like
this

  CREATE VIEW Vint1 AS SELECT * FROM T
   WHERE start_int1 <= id AND id < start_int1+1;

So my question is, what design would make querying V1, V2, V3 ... as fast as
possible?  Is it possible to approach the performance of the design that
uses bona fide tables T1, T2, T3, ... instead of views V1, V2, V3 ...?

Thank you very much for reading this long post, and many thanks in advance
for your comments!

Kynn


P.S.  Here are the actual form of the queries.  They now include an initial
join with table S, and the join with Tint2 (or Vint2) is a left outer
join.  Interestingly, even though the queries below that use views (i.e.
Q1*** and Q2***) are not much more complex-looking than before, the other
two (Q1** and Q2**) are.  I don't know if this is because of my ineptitude with
SQL, but I am not able to render (Q1**) and (Q2**) without resorting to the
subquery sq.

(Q1**)  SELECT a1.word, sq.word FROM
   S  JOIN T a1 USING ( word )
 LEFT JOIN ( SELECT * FROM T a2
 WHERE a2.type = int2 ) sq USING ( zipk )
 WHERE a1.type = int1;

(Q2**)  SELECT a1.word, sq.word FROM
   S  JOIN T a1 USING ( word )
 LEFT JOIN ( SELECT * FROM T a2
 WHERE a2.type = int2 ) sq USING ( zipk )

   -

(Q1***) SELECT Vint1.word, Vint2.word FROM
   S  JOIN Vint1 USING ( word )
 LEFT JOIN Vint2 USING ( zipk );

(Q2***) SELECT T.word, Vint2.word
  FROM S  JOIN T   USING ( word )
 LEFT JOIN Vint2 USING ( zipk );


Re: [PERFORM] Q on views and performance

2008-02-22 Thread Dean Gibson (DB Administrator)

On 2008-02-22 12:49, Kynn Jones wrote:
Of course, I expect that using views Vint1 and Vint2... would 
result in a loss in performance relative to a version that used bona 
fide tables Tint1 and Tint2.  My question is, how can I minimize 
this performance loss?


That used to be my thoughts too, but I have found over the years that 
the PostgreSQL execution planner is able to flatten SELECTs using 
VIEWs, ALMOST ALWAYS in a way that does not adversely affect 
performance, and often gives an IMPROVEMENT in performance, probably 
because by using VIEWs I am stating the query problem in a better way 
than if I try to guess the best way to optimize a SELECT.


I have at least a 10:1 ratio of VIEWs to TABLEs.  Occasionally, with 
some query that is slow, I will try to rewrite it without VIEWs.  This 
ALMOST NEVER results in an improvement in performance, and when it does, 
I am able to find another way to write the VIEW and SELECT to recapture 
the gain.


-- Dean


