[PERFORM] Bad performance with hashjoin

2004-09-11 Thread Vitaly Belman
Here's the query:

---
SELECT * FROM bv_reviews r, bv_votes v 
WHERE r.vote_id = v.vote_id 
AND v.book_id = 113
---

bv_votes has around 7000 rows with the given book_id and bv_reviews
has 10 matching reviews, so the result consists of only 10 rows.

Here is the EXPLAIN ANALYZE of the query:

---
QUERY PLAN
Hash Join  (cost=169.36..49635.37 rows=2117 width=897) (actual time=13533.550..15107.987 rows=10 loops=1)
  Hash Cond: ("outer".vote_id = "inner".vote_id)
  ->  Seq Scan on bv_reviews r  (cost=0.00..45477.42 rows=396742 width=881) (actual time=12.020..13305.055 rows=396742 loops=1)
  ->  Hash  (cost=151.96..151.96 rows=6960 width=16) (actual time=24.673..24.673 rows=0 loops=1)
        ->  Index Scan using i_votes_book_id on bv_votes v  (cost=0.00..151.96 rows=6960 width=16) (actual time=0.035..14.970 rows=7828 loops=1)
              Index Cond: (book_id = 113)
Total runtime: 15109.126 ms
---

And here is what happens when I turn hashjoin off:

---
QUERY PLAN
Nested Loop  (cost=0.00..53799.79 rows=2117 width=897) (actual time=4.260..79.721 rows=10 loops=1)
  ->  Index Scan using i_votes_book_id on bv_votes v  (cost=0.00..151.96 rows=6960 width=16) (actual time=0.071..14.100 rows=7828 loops=1)
        Index Cond: (book_id = 113)
  ->  Index Scan using i_bv_reviews_vote_id on bv_reviews r  (cost=0.00..7.70 rows=1 width=881) (actual time=0.007..0.007 rows=0 loops=7828)
        Index Cond: (r.vote_id = "outer".vote_id)
Total runtime: 79.830 ms
---

What am I to do? Does PostgreSQL have hints (like Oracle's) to force
it to use the i_bv_reviews_vote_id index instead of doing a seq scan?
Or is something wrong with my PostgreSQL settings?
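
PostgreSQL has no Oracle-style hints, but the planner can be steered
per session through its enable_* and cost settings. A minimal sketch,
assuming 7.x-era defaults; the cost value is illustrative, not a
recommendation:

---
-- heavy-handed: forbid hash joins for this session only
SET enable_hashjoin = off;

-- usually better: make index access look cheaper to the planner
SET random_page_cost = 2;   -- default is 4; tune to your hardware

EXPLAIN ANALYZE
SELECT * FROM bv_reviews r, bv_votes v
 WHERE r.vote_id = v.vote_id
   AND v.book_id = 113;
---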

-- 
 ICQ: 1912453
 AIM: VitalyB1984
 MSN: [EMAIL PROTECTED]
 Yahoo!: VitalyBe



[PERFORM] Performance hit on loading from HD

2004-08-28 Thread Vitaly Belman
I have a problem with the performance of certain queries. The trouble
is that while their execution plans are pretty good and subsequent
executions are fast, their FIRST execution (that is, right after the
database server starts) is abysmal.

I realize that this happens because the data has to be loaded from the
HD into memory/swap, and it wouldn't be too bad if I could just make
the data stay in memory; sadly, after a few minutes the data is back
on the HD, and running the query again gives the same bad
performance.

Here's one query as an example, though as I said, this problem occurs
with different queries.

---

SELECT *
FROM   bv_bookgenres, bv_books
WHERE  bv_books.book_id = bv_bookgenres.book_id AND genre_id = 987
ORDER BY vote_avg LIMIT 10

---

---

QUERY PLAN
Limit  (cost=2601.16..2601.18 rows=10 width=193) (actual time=4735.097..4735.107 rows=10 loops=1)
  ->  Sort  (cost=2601.16..2601.70 rows=219 width=193) (actual time=4735.092..4735.095 rows=10 loops=1)
        Sort Key: bv_books.vote_avg
        ->  Nested Loop  (cost=0.00..2592.64 rows=219 width=193) (actual time=74.615..4719.147 rows=1877 loops=1)
              ->  Index Scan using i_bookgenres_genre_id on bv_bookgenres  (cost=0.00..1707.03 rows=218 width=4) (actual time=74.540..2865.366 rows=1877 loops=1)
                    Index Cond: (genre_id = 987)
              ->  Index Scan using bv_books_pkey on bv_books  (cost=0.00..4.05 rows=1 width=193) (actual time=0.968..0.971 rows=1 loops=1877)
                    Index Cond: (bv_books.book_id = "outer".book_id)
Total runtime: 4735.726 ms

---

If I run the query again right after it has just finished, I get the
following plan:

---

Limit  (cost=3937.82..3937.84 rows=10 width=204)
  ->  Sort  (cost=3937.82..3938.38 rows=223 width=204)
        Sort Key: bv_books.vote_avg
        ->  Nested Loop  (cost=0.00..3929.12 rows=223 width=204)
              ->  Index Scan using i_bookgenres_genre_id on bv_bookgenres  (cost=0.00..1731.94 rows=222 width=8)
                    Index Cond: (genre_id = 987)
              ->  Index Scan using bv_books_pkey on bv_books  (cost=0.00..9.88 rows=1 width=196)
                    Index Cond: (bv_books.book_id = "outer".book_id)

---

Before going on, I should say that I am running PostgreSQL on CoLinux
under Windows 2000. From what I have read and tested, CoLinux's
performance matches that of VMWare. Still, I am wondering whether this
is a side effect of my development setup or whether some of my
settings are indeed wrong.

With that said, here are the table definitions:

---

CREATE TABLE bv_books
(
  book_id serial NOT NULL,
  book_name varchar(255) NOT NULL,
  series_id int4,
  series_index int2,
  annotation_desc_id int4,
  description_desc_id int4,
  book_picture varchar(255) NOT NULL,
  reviews_error int4 NOT NULL,
  vote_avg float4 NOT NULL,
  vote_count int4 NOT NULL,
  book_genre int4[],
  book_name_fulltext tsearch2.tsvector,
  book_name_fulltext2 tsearch2.tsvector,
  CONSTRAINT bv_books_pkey PRIMARY KEY (book_id),
  CONSTRAINT fk_books_annotation_desc_id FOREIGN KEY
(annotation_desc_id) REFERENCES bv_descriptions (description_id) ON
UPDATE RESTRICT ON DELETE SET NULL,
  CONSTRAINT fk_books_description_desc_id FOREIGN KEY
(description_desc_id) REFERENCES bv_descriptions (description_id) ON
UPDATE RESTRICT ON DELETE SET NULL,
  CONSTRAINT fk_books_series_id FOREIGN KEY (series_id) REFERENCES
bv_series (series_id) ON UPDATE RESTRICT ON DELETE RESTRICT
) 
WITH OIDS;

CREATE TABLE bv_bookgenres
(
  book_id int4 NOT NULL,
  genre_id int4 NOT NULL,
  CONSTRAINT bv_bookgenres_pkey PRIMARY KEY (book_id, genre_id),
  CONSTRAINT fk_bookgenres_book_id FOREIGN KEY (book_id) REFERENCES
bv_books (book_id) ON UPDATE RESTRICT ON DELETE CASCADE,
  CONSTRAINT fk_bookgenres_genre_id FOREIGN KEY (genre_id) REFERENCES
bv_genres (genre_id) ON UPDATE RESTRICT ON DELETE RESTRICT
) 
WITH OIDS;

---

As far as the data is concerned, there are around 170,000 rows in
bv_books and 940,000 in bv_bookgenres. There are also btree indexes on
all the fields relevant to the query.

I can live with the fact that the data has to be loaded the first
time it is accessed, but is it possible to make it stay in memory
longer? Is it because CoLinux gets only 128MB of RAM? Or should one of
my settings be fixed?
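
For reference, these are the cache-related knobs in postgresql.conf; a
minimal sketch with illustrative values for a 128MB guest, not
recommendations. PostgreSQL itself keeps pages only in shared_buffers;
everything else depends on the OS cache, which is bounded by the
guest's RAM:

---
shared_buffers = 2048         # in 8kB pages, ~16MB of PostgreSQL's own cache
effective_cache_size = 8192   # in 8kB pages, tells the planner to expect ~64MB of OS cache
---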

Thanks


Re: [PERFORM] Visual Explain

2004-06-17 Thread Vitaly Belman
Is it possible to download the Visual Explain only (link)? I only see
that you can download the whole ISO (which I hardly need).

On Thu, 17 Jun 2004 13:52:15 +0100, Paul Thomas [EMAIL PROTECTED] wrote:
 
 
 On 17/06/2004 12:10 Adam Witney wrote:
 
  Will this run on other platforms? OSX maybe?
 
 It's a Java app so it runs on any platform with a reasonably modern
 Java VM.
 
 --
 Paul Thomas
 +------------------------------+--------------------------------------------+
 | Thomas Micro Systems Limited | Software Solutions for Business            |
 | Computer Consultants         | http://www.thomas-micro-systems-ltd.co.uk  |
 +------------------------------+--------------------------------------------+
 




Re: [PERFORM] Visual Explain

2004-06-17 Thread Vitaly Belman
I see. Thanks :).

On Thu, 17 Jun 2004 18:35:11 +0100, Paul Thomas [EMAIL PROTECTED] wrote:
 
 On 17/06/2004 17:54 Vitaly Belman wrote:
  Is it possible to download the Visual Explain only (link)? I only see
  that you can download the whole ISO (which I hardly need).
 
 You can get it from CVS and build it yourself.
 
 
 
 --
 Paul Thomas
 +------------------------------+--------------------------------------------+
 | Thomas Micro Systems Limited | Software Solutions for Business            |
 | Computer Consultants         | http://www.thomas-micro-systems-ltd.co.uk  |
 +------------------------------+--------------------------------------------+




Re: [PERFORM] Additional select fields in a GROUP BY

2004-06-13 Thread Vitaly Belman
Bruno:

It wasn't exactly my case, but your tip did give me an idea; changing
the perspective improved the timing of this query quite a bit.

Tom:

Hmm.. I am not sure how I can demonstrate this to you... To see the
time differences you'd need the whole table, and that's quite a lot of
data to post on a mailing list. If you wish to test it on your side,
I'll dump part of the table and send it to you somehow.

I do stand by what I said though, here's the real query example:

Original query (execution time, 800ms):

select s.series_id, avg(vote_avg), sum(vote_count) 
from v_bookseries s, bv_seriesgenres sg
where s.series_id = sg.series_id and sg.genre_id = 1
group by s.series_id
order by sum(vote_count) desc
limit 10

QUERY PLAN:

Limit  (cost=6523.51..6523.53 rows=10 width=12)
  ->  Sort  (cost=6523.51..6566.27 rows=17104 width=12)
        Sort Key: sum(b.vote_count)
        ->  GroupAggregate  (cost=1368.54..5320.92 rows=17104 width=12)
              ->  Merge Join  (cost=1368.54..4796.91 rows=58466 width=12)
                    Merge Cond: ("outer".series_id = "inner".series_id)
                    ->  Merge Join  (cost=0.00..6676.41 rows=65902 width=16)
                          Merge Cond: ("outer".series_id = "inner".series_id)
                          ->  Index Scan using bv_series_pkey on bv_series s  (cost=0.00..386.83 rows=17104 width=4)
                          ->  Index Scan using i_books_series_id on bv_books b  (cost=0.00..14148.38 rows=171918 width=12)
                    ->  Sort  (cost=1368.54..1406.47 rows=15173 width=4)
                          Sort Key: sg.series_id
                          ->  Index Scan using i_seriesgenres_genre_id on bv_seriesgenres sg  (cost=0.00..314.83 rows=15173 width=4)
                                Index Cond: (genre_id = 1)


Query with added GROUP BY members (execution time, 1400ms):

select s.series_id, s.series_name, s.series_picture, avg(vote_avg),
sum(vote_count)
from v_bookseries s, bv_seriesgenres sg
where s.series_id = sg.series_id and sg.genre_id = 1
group by s.series_id, s.series_name, s.series_picture
order by sum(vote_count) desc
limit 10

QUERY PLAN:

Limit  (cost=12619.76..12619.79 rows=10 width=47)
  ->  Sort  (cost=12619.76..12662.52 rows=17104 width=47)
        Sort Key: sum(b.vote_count)
        ->  GroupAggregate  (cost=10454.67..11417.18 rows=17104 width=47)
              ->  Sort  (cost=10454.67..10600.83 rows=58466 width=47)
                    Sort Key: s.series_id, s.series_name, s.series_picture
                    ->  Merge Join  (cost=1368.54..4796.91 rows=58466 width=47)
                          Merge Cond: ("outer".series_id = "inner".series_id)
                          ->  Merge Join  (cost=0.00..6676.41 rows=65902 width=51)
                                Merge Cond: ("outer".series_id = "inner".series_id)
                                ->  Index Scan using bv_series_pkey on bv_series s  (cost=0.00..386.83 rows=17104 width=39)
                                ->  Index Scan using i_books_series_id on bv_books b  (cost=0.00..14148.38 rows=171918 width=12)
                          ->  Sort  (cost=1368.54..1406.47 rows=15173 width=4)
                                Sort Key: sg.series_id
                                ->  Index Scan using i_seriesgenres_genre_id on bv_seriesgenres sg  (cost=0.00..314.83 rows=15173 width=4)
                                      Index Cond: (genre_id = 1)

Notice that the GROUP BY items added the following to the plan:

->  GroupAggregate  (cost=10454.67..11417.18 rows=17104 width=47)
      ->  Sort  (cost=10454.67..10600.83 rows=58466 width=47)
            Sort Key: s.series_id, s.series_name, s.series_picture

Which eventually almost doubles the execution time.
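
For completeness, Bruno's "aggregate first, join later" rewrite applied
to this query would look roughly like the sketch below; where each
column lives is assumed from the posts, so treat it as illustrative
only:

---
SELECT s.series_id, s.series_name, s.series_picture,
       a.avg_vote, a.sum_count
FROM   bv_series s,
       (SELECT bs.series_id,
               avg(bs.vote_avg)   AS avg_vote,
               sum(bs.vote_count) AS sum_count
        FROM   v_bookseries bs, bv_seriesgenres sg
        WHERE  bs.series_id = sg.series_id
          AND  sg.genre_id = 1
        GROUP BY bs.series_id) a
WHERE  s.series_id = a.series_id
ORDER BY a.sum_count DESC
LIMIT 10;
---

The GROUP BY then stays on series_id alone, so the extra Sort on
series_name and series_picture disappears from the plan.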


On Sun, 13 Jun 2004 08:52:12 -0500, Bruno Wolff III [EMAIL PROTECTED] wrote:
 
 On Sun, Jun 13, 2004 at 06:21:17 +0300,
  Vitaly Belman [EMAIL PROTECTED] wrote:
 
  Consider the following query:
 
  select t1field1, avg(t2fieild2)
  from t1, t2
  where t1.field1 = t2.field2
  group by t1field1
 
  That works fine. But I'd really like to see more fields of t1 in this
  query, however I can't add them into the select because they're not
  part of the GROUP BY, thus I have to add them to there too:
 
 If t1.field1 is a candidate key for t1, then the normal thing to do is
 to group t2 by t2.field1 (assuming you really meant to join on t2.field1,
 not t2.field2) and THEN join to t1. That may even be faster than the way you
 are doing things now.
 
 So the query would look like:
 
 SELECT t1.field1, t1.field2, t1.field3, a.t2avg FROM t1,
  (SELECT field1, avg(field2) as t2avg FROM t2 GROUP BY field1) as a
  WHERE t1.field1 = a.field1
 




[PERFORM] Additional select fields in a GROUP BY

2004-06-12 Thread Vitaly Belman
Hello,

Consider the following query:

select t1.field1, avg(t2.field2)
from t1, t2
where t1.field1 = t2.field2
group by t1.field1

That works fine. But I'd really like to see more fields of t1 in this
query; however, I can't add them to the select because they're not
part of the GROUP BY, thus I have to add them there too:

select t1.field1, t1.field2, t1.field3, avg(t2.field2)
from t1, t2
where t1.field1 = t2.field2
group by t1.field1, t1.field2, t1.field3

The problem is that adding them all to the GROUP BY causes a
performance loss.. The only solution I found is using a subquery like
this:

select * from
t1, (select t1.field1, avg(t2.field2)
from t1, t2
where t1.field1 = t2.field2
group by t1.field1) t1inner
where t1.field1 = t1inner.field1

It works just fine.. But I prefer not to use subqueries unless I am
really forced to due to the design of my application.

Another solution I considered is using an aggregate function like this:

select t1.field1, max(t1.field2), max(t1.field3), avg(t2.field2)
from t1, t2
where t1.field1 = t2.field2
group by t1.field1

Sadly, this performed the same... I wonder though, is it possible to
make an aggregate function like first() or last() in Oracle (IIRC)? I
believe that in such cases MySQL does first() by itself.
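
For what it's worth, a first()-style aggregate can be built by hand; a
minimal sketch, assuming a version with polymorphic types (7.4+) and
that keeping whichever value the aggregate happens to see first is
acceptable (input order is not guaranteed without extra care):

---
CREATE FUNCTION first_agg(anyelement, anyelement) RETURNS anyelement
    AS 'SELECT $1;' LANGUAGE sql IMMUTABLE STRICT;

CREATE AGGREGATE first (
    basetype = anyelement,
    sfunc    = first_agg,
    stype    = anyelement
);

-- usage: select t1.field1, first(t1.field2), first(t1.field3), avg(t2.field2) ...
---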

Other ideas are welcome too.
  

Regards,
 Vitaly Belman
 
 ICQ: 1912453
 AIM: VitalyB1984
 MSN: [EMAIL PROTECTED]
 Yahoo!: VitalyBe




[PERFORM] PostgreSQL on VMWare vs Windows vs CoLinux

2004-06-01 Thread Vitaly Belman
Hello pgsql-performance,

  I was using the native Windows PostgreSQL 7.5Dev and was advised by
  several people to use an emulated PostgreSQL instead, as the native
  port is just a beta.

  Well, I gave it a whirl and tried both the commercial VMWare and the
  freeware open-source CoLinux. Both work under Windows and both
  emulate Linux. Here is a quick review of my experience with them;
  may someone in need learn from it.

  This might not be the best place for such a post, but since the
  subject was brought up here, I'll post it here as well. If someone
  thinks it should be posted somewhere else, let me know.

  Installation & Configuration
  ----------------------------
  VMWare:

  On the bright side, the installation went quite smoothly: VMWare
  configured all the network stuff by itself and I had no trouble
  using the net right away. On the grim side, the installation itself
  took ages compared to the plug & play feel of CoLinux.

  Installing PostgreSQL on VMWare was quite straightforward, just as
  the PostgreSQL documentation describes.

  CoLinux:

  As I said, with CoLinux the installation itself goes very quickly.
  To get Linux running you need to download less than 20MB, which
  includes the distribution (Debian in my case) and the CoLinux setup.
  Configuring CoLinux took a bit longer than VMWare, yet not as long
  as I thought it would. In fact, it can be very easy if you just
  follow the documentation on the CoLinux Wiki; there are some very
  easy to follow tutorials there.

  Installing PostgreSQL on CoLinux proved a little more difficult
  (again, Debian), but I posted a quick tutorial that should smooth
  the process: http://www.colinux.org/wiki/index.php/PostgreSQL.

  Performance
  -----------

  This was a totally subjective test (especially since one of the
  participants is in a beta stage), yet, that's what I tested and that's
  what I needed to know.

  To make the test as fair as possible, I did an exact dump of the
  same database. I ran the SQLs (around 10) in the same order on all
  of them and repeated the test several times. I also did an EXPLAIN
  on the queries to make sure all the databases work on the query the
  same way. It wasn't a full test though: I didn't test mass select
  load, nor inserts, nor work under heavy load, nor did I try
  different types of joins. All I did was run some heavy (in execution
  time) queries. So you should take these tests just for what they
  are.

  That's what I got:

  The native Windows port performed poorly, lagging
  30%-50% behind the VMWare/CoLinux solutions in execution times;
  rather sad, but not unexpected, I guess.

  CoLinux and VMWare give AROUND the same results, yet CoLinux did
  give slightly better performance (I'd say 5%-10%), but with such a
  slight and inconsistent improvement I wouldn't count it for much.

  Conclusion
  ----------

  With all that said, VMWare is badly suited for running a database.
  While CoLinux can be run as a service (I haven't tried that yet),
  VMWare always sits there; it is slow to start, slow to shut down,
  and generally feels like a system hog.

  I'll go on with CoLinux for now and hope it will work as well as it
  looks.

  http://www.vmware.com/
  http://www.colinux.org/

  Thanks to Bryan and Matthew for their advice regarding the emulators.

Regards,
 Vitaly Belman
 
 ICQ: 1912453
 AIM: VitalyB1984
 MSN: [EMAIL PROTECTED]
 Yahoo!: VitalyBe




Re: [PERFORM] PostgreSQL caching

2004-05-26 Thread Vitaly Belman
Hello Marty, Nick and Robert,

NB Depending on what version of PG you are running, IN might take a while
NB to complete. If so try an EXISTS instead

RT A question and two experiments... what version of postgresql is this?

I am using the newer 7.5dev native Windows port. For this reason I
don't think that IN will cause any trouble (I read that this issue was
resolved in 7.4).
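
For reference, the EXISTS form of that query would look roughly like
this sketch (table and column names as used elsewhere in the thread;
untested):

---
SELECT b.*
FROM   bv_books b
WHERE  EXISTS (
    SELECT 1
    FROM   bv_bookgenres bg
    WHERE  bg.book_id = b.book_id
      AND  bg.genre_id = 5830
)
ORDER BY b.vote_avg DESC
LIMIT 10 OFFSET 0;
---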

MS At any rate, a query with an IN clause should help quite a bit

MS SELECT bv_books. *
MS FROM   bv_books
MS WHERE  bv_books.book_id IN (
MS     SELECT book_id
MS     FROM bv_genres
MS     WHERE bv_bookgenres.genre_id = 5830
MS )
MS ORDER BY   vote_avg DESC LIMIT 10 OFFSET 0;

It looks like it helps a bit (though you meant FROM bv_bookgenres,
right?). I can't tell you how MUCH it helped though, for two
reasons:

1) As soon as I run a query, it is cached in memory and I can't
really find a good way to flush it out of there to test again, except
a full computer reset (shutting the postmaster down doesn't help). If
you have a better idea on this, do tell me =\ (reminding again, I am
on Windows).

2) I *think* I resolved this issue, at least for most of the genre_ids
(I didn't go through them all, but I tried a few with different book
counts and the results looked quite good). The fault was partly mine:
a few weeks ago I increased the statistics target for the genre_id
column a bit too much (from 10 to 70). I was unsure how exactly it
works (and still am), but it helped for a few genre_ids that had a
high book count, yet it also hurt the performance for the genres with
fewer books. I have now lowered the statistics target (to 58) and
almost everything looks good now.
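
For anyone following along, the per-column statistics target is set
like this; the value is the one from this post, and ANALYZE must be
re-run for the new target to take effect:

---
ALTER TABLE bv_bookgenres ALTER COLUMN genre_id SET STATISTICS 58;
ANALYZE bv_bookgenres;
---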

Because of that I'll stop working on that query for a while (unless
you have some more performance tips on the subject). Big thanks to
everyone who helped.. I might bring this issue up again later if it
still causes too much trouble.

RT Try reindexing i_bookgenres_genre_id and capture the explain
RT analyze for that.

Is this what you meant: REINDEX INDEX i_bookgenres_genre_id? But it
returns no messages whatsoever =\. I can't EXPLAIN it either.

RT If it doesn't help try doing set enable_indexscan = false and
RT capture the explain analyze for that.

Here it is:

--
QUERY PLAN
Limit  (cost=41099.93..41099.96 rows=10 width=76) (actual time=6734.000..6734.000 rows=10 loops=1)
  ->  Sort  (cost=41099.93..41100.45 rows=208 width=76) (actual time=6734.000..6734.000 rows=10 loops=1)
        Sort Key: bv_books.vote_count
        ->  Merge Join  (cost=40229.21..41091.92 rows=208 width=76) (actual time=6078.000..6593.000 rows=1993 loops=1)
              Merge Cond: ("outer".book_id = "inner".book_id)
              ->  Sort  (cost=16817.97..16818.49 rows=208 width=4) (actual time=1062.000..1062.000 rows=1993 loops=1)
                    Sort Key: bv_bookgenres.book_id
                    ->  Seq Scan on bv_bookgenres  (cost=0.00..16809.96 rows=208 width=4) (actual time=0.000..1047.000 rows=1993 loops=1)
                          Filter: (genre_id = 5830)
              ->  Sort  (cost=23411.24..23841.04 rows=171918 width=76) (actual time=5016.000..5189.000 rows=171801 loops=1)
                    Sort Key: bv_books.book_id
                    ->  Seq Scan on bv_books  (cost=0.00..4048.18 rows=171918 width=76) (actual time=0.000..359.000 rows=171918 loops=1)
Total runtime: 6734.000 ms
--

Regards,
 Vitaly Belman
 
 ICQ: 1912453
 AIM: VitalyB1984
 MSN: [EMAIL PROTECTED]
 Yahoo!: VitalyBe

Wednesday, May 26, 2004, 1:24:18 AM, you wrote:

MS Vitaly,

MS This looks like there might be some room for performance improvement...

  MS I didn't see the table structure, but I assume
  MS that the vote_avg and
  MS vote_count fields are in bv_bookgenres.
 
  I didn't understand you. vote_avg is stored in bv_books.

MS Ok.  That helps.  The confusion (on my end) came from the SELECT clause
MS of the query you provided:

  SELECT bv_books. * ,
 vote_avg,
 vote_count

MS All fields from bv_books were selected (bv_books.*) along with vote_avg
MS and vote_count.  My assumption was that vote_avg and vote_count were
MS therefore not in bv_books.

MS At any rate, a query with an IN clause should help quite a bit:

MS SELECT bv_books. *
MS FROM   bv_books
MS WHERE  bv_books.book_id IN (
MS     SELECT book_id
MS     FROM bv_genres
MS     WHERE bv_bookgenres.genre_id = 5830
MS )
MS ORDER BY   vote_avg DESC LIMIT 10 OFFSET 0;

MS Give it a whirl.

MS Marty



Re: [PERFORM] PostgreSQL caching

2004-05-25 Thread Vitaly Belman
Hello Jochem and Marty,

I guess I should have posted the table structure before =(:

Table structure + Indexes
-

CREATE TABLE public.bv_books
(
  book_id serial NOT NULL,
  book_title varchar(255) NOT NULL,
  series_id int4,
  series_index int2,
  annotation_desc_id int4,
  description_desc_id int4,
  book_picture varchar(255) NOT NULL,
  vote_avg float4 NOT NULL,
  vote_count int4 NOT NULL,
  CONSTRAINT bv_books_pkey PRIMARY KEY (book_id)
) WITH OIDS;

CREATE INDEX i_books_vote_avg
  ON public.bv_books
  USING btree
  (vote_avg);

CREATE INDEX i_books_vote_count
  ON public.bv_books
  USING btree
  (vote_count);

-

CREATE TABLE public.bv_bookgenres
(
  book_id int4 NOT NULL,
  genre_id int4 NOT NULL,
  CONSTRAINT bv_bookgenres_pkey PRIMARY KEY (book_id, genre_id),
  CONSTRAINT fk_bookgenres_book_id FOREIGN KEY (book_id) REFERENCES public.bv_books 
(book_id) ON UPDATE RESTRICT ON DELETE RESTRICT
) WITH OIDS;

CREATE INDEX i_bookgenres_book_id
  ON public.bv_bookgenres
  USING btree
  (book_id);

CREATE INDEX i_bookgenres_genre_id
  ON public.bv_bookgenres
  USING btree
  (genre_id);
-

MS I didn't see the table structure, but I assume that the vote_avg and
MS vote_count fields are in bv_bookgenres.  If no fields are actually 
MS needed from bv_bookgenres, then the query might be constructed in a way 
MS that only the index would be read, without loading any row data.

I didn't understand you. vote_avg is stored in bv_books.. So yes, the
only thing I need from bv_bookgenres is the id of the book, but I
can't store this info in bv_books because there is an N-to-N
relationship between them - every book can belong to a number of
genres... If that's what you meant.

MS I think that you mentioned this was for a web app.  Do you actually have
MS a web page that displays 2000 rows of data?

Well.. It is all paginated; you can access 2000 items of data
(as there are actually 2000 books in the genre) but you only see 10
items at a time.. I mean, probably no one would go through all 2000
books, but I can't just hide them =\.

JvD Presuming that vote_avg is a field in the table bv_bookgenres,
JvD try a composite index on genre_id and vote_avg and then see if 
JvD you can use the limit clause to reduce the number of loop 
JvD iterations from 1993 to 10.

I'm afraid your idea is invalid in my case =\... Naturally I could
eventually denormalize the data to gain a performance boost if this
issue is not solved in other ways. I'll keep your idea in mind anyway,
thanks.

Once again, thanks for your feedback.

Regards,
 Vitaly Belman
 
 ICQ: 1912453
 AIM: VitalyB1984
 MSN: [EMAIL PROTECTED]
 Yahoo!: VitalyBe

Tuesday, May 25, 2004, 6:37:44 PM, you wrote:

JvD Vitaly Belman wrote:
 
 If you'll be so kind though, I'd be glad if you could spot anything to
 speed up in this query. Here's the query and its plan that happens
 without any caching:
 
 -
 QUERY
 -
 SELECT bv_books. * ,
vote_avg, 
vote_count 
 FROM   bv_bookgenres, 
bv_books 
 WHERE  bv_books.book_id = bv_bookgenres.book_id AND 
bv_bookgenres.genre_id = 5830
 ORDER BY   vote_avg DESC LIMIT 10 OFFSET 0; 
  
 QUERY PLAN
 --
 Limit  (cost=2337.41..2337.43 rows=10 width=76) (actual time=7875.000..7875.000 rows=10 loops=1)
   ->  Sort  (cost=2337.41..2337.94 rows=214 width=76) (actual time=7875.000..7875.000 rows=10 loops=1)
         Sort Key: bv_books.vote_avg
         ->  Nested Loop  (cost=0.00..2329.13 rows=214 width=76) (actual time=16.000..7844.000 rows=1993 loops=1)
               ->  Index Scan using i_bookgenres_genre_id on bv_bookgenres  (cost=0.00..1681.54 rows=214 width=4) (actual time=16.000..3585.000 rows=1993 loops=1)
                     Index Cond: (genre_id = 5830)
               ->  Index Scan using bv_books_pkey on bv_books  (cost=0.00..3.01 rows=1 width=76) (actual time=2.137..2.137 rows=1 loops=1993)
                     Index Cond: (bv_books.book_id = "outer".book_id)
 Total runtime: 7875.000 ms

JvD Presuming that vote_avg is a field in the table bv_bookgenres, 
JvD try a composite index on genre_id and vote_avg and then see if 
JvD you can use the limit clause to reduce the number of loop 
JvD iterations from 1993 to 10.

JvD CREATE INDEX test_idx ON bv_bookgenres (genre_id, vote_avg);


JvD The following query tries to force that execution plan and,
JvD presuming there is a foreign key relation between
JvD bv_books.book_id AND bv_bookgenres.book_id, I expect it will give
JvD the same results, but be careful with NULL's:

JvD SELECT bv_books. * ,
JvDvote_avg,
JvDvote_count
JvD FROM   (
JvDSELECT  bg.*
JvDFROMbv_bookgenres bg
JvDWHERE   bg.genre_id = 5830
JvDORDER BY
JvDbg.vote_avg DESC
JvD

Re: [PERFORM] PostgreSQL caching

2004-05-23 Thread Vitaly Belman
Hello Marty,

MS Is that a composite index?

It is a regular btree index. What is a composite index?

MS Analyzing the tables may help, as the optimizer appears to
MS mispredict the number of rows returned.

I'll try analyzing, but I highly doubt that it would help. I analyzed
once already and haven't changed the data since.

MS I would be curious to see how it performs with an IN clause,
MS which I would suspect would go quite a bit faster.

Actually it reached 20s before I canceled it... Here's the explain:

QUERY PLAN
Limit  (cost=3561.85..3561.88 rows=10 width=76)
  ->  Sort  (cost=3561.85..3562.39 rows=214 width=76)
        Sort Key: bv_books.vote_avg
        ->  Nested Loop  (cost=1760.75..3553.57 rows=214 width=76)
              ->  Index Scan using i_bookgenres_genre_id on bv_bookgenres  (cost=0.00..1681.54 rows=214 width=0)
                    Index Cond: (genre_id = 5830)
              ->  Materialize  (cost=1760.75..1761.01 rows=26 width=76)
                    ->  Nested Loop  (cost=1682.07..1760.75 rows=26 width=76)
                          ->  HashAggregate  (cost=1682.07..1682.07 rows=26 width=4)
                                ->  Index Scan using i_bookgenres_genre_id on bv_bookgenres  (cost=0.00..1681.54 rows=214 width=4)
                                      Index Cond: (genre_id = 5830)
                          ->  Index Scan using bv_books_pkey on bv_books  (cost=0.00..3.01 rows=1 width=76)
                                Index Cond: (bv_books.book_id = "outer".book_id)


Thank you for trying.

Regards,
Vitaly Belman
 
 ICQ: 1912453
 AIM: VitalyB1984
 MSN: [EMAIL PROTECTED]
 Yahoo!: VitalyBe

Friday, May 21, 2004, 11:10:56 PM, you wrote:

MS Not knowing a whole lot about the internals of Pg, one thing jumped out
MS at me, that each trip to get data from bv_books took 2.137 ms, which
MS came to over 4.2 seconds right there.

MS The problem seems to be the 1993 times that the nested loop spins, as
MS almost all of the time is spent there.

MS Personally, I am amazed that it takes 3.585 seconds to index scan
MS i_bookgenres_genre_id.  Is that a composite index?  Analyzing the
MS tables may help, as the optimizer appears to mispredict the number of
MS rows returned.

MS I would be curious to see how it performs with an IN clause, which I
MS would suspect would go quite a bit faster.  Try the following:

MS SELECT bv_books. * ,
MS        vote_avg,
MS        vote_count
MS FROM   bv_bookgenres,
MS        bv_books
MS WHERE  bv_books.book_id IN (
MS     SELECT book_id
MS     FROM bv_genres
MS     WHERE bv_bookgenres.genre_id = 5830
MS )
MS AND bv_bookgenres.genre_id = 5830
MS ORDER BY   vote_avg DESC LIMIT 10 OFFSET 0;

MS In this query, all of the book_id values are pulled at once.

MS Who knows?

MS If you get statisctics on this, please post.

MS Marty






[PERFORM] PostgreSQL caching

2004-05-21 Thread Vitaly Belman
Hello,

I have the following problem:

When I run some query right after starting the postmaster, it takes
several seconds to execute (sometimes more than 10); if I rerun it
afterwards, it takes mere milliseconds.

So, I guess it has to do with PostgreSQL caching.. But how exactly
does it work? What does it cache? And how can I control it?

I would like to load selected information into memory before a user
runs the query. Can I do that somehow? As PostgreSQL backs a web site
in my case, it really doesn't help if the user has to wait 10 seconds
every time he goes to a new page (even if refreshing the page would be
really quick, since PostgreSQL already loaded the data into
memory).
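
For what it's worth, a crude warm-up is simply to touch the relevant
tables right after startup so their pages land in the cache; a minimal
sketch, assuming the table names used elsewhere in these threads:

---
SELECT count(*) FROM bv_books;       -- reads every heap page of the table
SELECT count(*) FROM bv_bookgenres;
---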

P.S. If the query or its EXPLAIN is critical for a better
understanding, let me know.

Regards,
 Vitaly Belman
 
 ICQ: 1912453
 AIM: VitalyB1984
 MSN: [EMAIL PROTECTED]
 Yahoo!: VitalyBe




[PERFORM] Simple join in PostgreSQL takes too long

2004-04-27 Thread Vitaly Belman
Hello pgsql-performance,

  I discussed the whole subject at some length on DevShed and didn't
  achieve much (in terms of results). I wonder if any of you guys can
  help out:

  http://forums.devshed.com/t136202/s.html

Regards,
 Vitaly Belman
 
 ICQ: 1912453
 AIM: VitalyB1984
 MSN: [EMAIL PROTECTED]
 Yahoo!: VitalyBe

