Re: [PERFORM] PostgreSQL Caching

2006-10-04 Thread Dave Dutcher
 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 Adnan DURSUN
 I want to be able to read an execution plan when 
 I look at it. 
 So, is there any doc about how it should be read?


You are asking how to read the output from EXPLAIN?  This page is a good
place to start:

http://www.postgresql.org/docs/8.1/interactive/performance-tips.html 
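As a quick illustration of the kind of statement that page teaches you to
read (the table and column names below are placeholders, not from this
thread):

-- Placeholder tables, for illustration only. EXPLAIN shows the chosen plan;
-- adding ANALYZE also runs the query and reports actual times and row counts
-- for every plan node.
EXPLAIN ANALYZE
SELECT o.*
FROM   orders o
JOIN   customers c ON c.customer_id = o.customer_id
WHERE  c.country = 'TR';

Read the output from the innermost nodes outward; large gaps between the
estimated and actual row counts usually point at stale or missing statistics.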




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] PostgreSQL Caching

2006-10-04 Thread Brad Nicholson
On Wed, 2006-10-04 at 07:38 -0500, Dave Dutcher wrote:
  -Original Message-
  From: [EMAIL PROTECTED] 
  [mailto:[EMAIL PROTECTED] On Behalf Of 
  Adnan DURSUN
  I want to be able to read an execution plan when 
  I look at it. 
  So, is there any doc about how it should be read?
 
 
 You are asking how to read the output from EXPLAIN?  This page is a good
 place to start:
 
 http://www.postgresql.org/docs/8.1/interactive/performance-tips.html 

Robert Treat's Explaining Explain presentation from OSCON is also very
good:

http://redivi.com/~bob/oscon2005_pgsql_pdf/OSCON_Explaining_Explain_Public.pdf#search=%22%22explaining%20explain%22%22

Brad.


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] PostgreSQL Caching

2006-10-04 Thread Brad Nicholson
On Tue, 2006-10-03 at 18:29 -0700, Tomeh, Husam wrote:
* When a session updates data that is already in the shared buffers,
 does Postgres synchronize the data on disk and in the shared buffers
 immediately?
 
 Not necessarily. When a block is modified in the shared buffers,
 the modified block is written to the Postgres WAL log. A periodic DB
 checkpoint is performed to flush the modified blocks in the shared
 buffers to the data files.

Postgres 8.0 and beyond have a process called bgwriter that continually
flushes dirty buffers to disk, to minimize the work that needs to be
done at checkpoint time.
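For reference, a minimal sketch of the knobs involved, assuming the 8.1-era
parameter names (the values shown are the shipped defaults, not tuning
advice):

-- postgresql.conf (8.1 defaults, illustrative only):
--   bgwriter_delay        = 200   # ms the bgwriter sleeps between rounds
--   bgwriter_lru_maxpages = 5     # max dirty buffers written per round
--   bgwriter_lru_percent  = 1.0   # share of the buffer pool scanned per round
-- The active values can be checked from any session:
SHOW bgwriter_delay;
SHOW bgwriter_lru_maxpages;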


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[PERFORM] PostgreSQL Caching

2006-10-03 Thread Adnan DURSUN

   Hi,

   I wonder how PostgreSQL caches SQL query results. For example:

   * Does Postgres cache, in memory, the query results produced by session A?

   * Can session B then use those results?

Best Regards

Adnan DURSUN


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] PostgreSQL Caching

2006-10-03 Thread Tomeh, Husam

Like many decent RDBMSs, the PostgreSQL server allocates its own shared
memory area in which data is cached. When it receives a query request, the
Postgres engine first checks its shared memory buffers; if the data is not
found there, the engine performs disk I/O to retrieve it from the PostgreSQL
data files and places it in the shared buffer area before serving it back to
the client. Blocks in the shared buffers are shared and can therefore be
accessed by other sessions. The size of the PostgreSQL shared buffers is set
with the postgresql.conf parameter shared_buffers.
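A minimal sketch of what that looks like; the number below is purely an
illustrative assumption (in 8.x shared_buffers is a count of 8 kB buffers,
and changing it requires a server restart):

-- postgresql.conf (illustrative value, not a recommendation):
--   shared_buffers = 10000      # 10000 x 8 kB buffers, roughly 80 MB
-- Check the active setting from any session:
SHOW shared_buffers;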

Sincerely,

--
  Husam 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Adnan
DURSUN
Sent: Tuesday, October 03, 2006 2:49 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] PostgreSQL Caching

Hi,

I wonder how PostgreSQL caches SQL query results. For example:

* Does Postgres cache, in memory, the query results produced by session A?
* Can session B then use those results?

Best Regards

Adnan DURSUN


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] PostgreSQL Caching

2006-10-03 Thread Adnan DURSUN


   Thanks,

   I wonder about these:

   * When a session updates data that is already in the shared buffers,
does Postgres synchronize the data on disk and in the shared buffers
immediately?
   * Does Postgres cache SQL execution plans in memory for use by other
sessions? For example:
   When session A executes SELECT * FROM tab WHERE col1 = val1 AND col2
= val2, does Postgres save the parser/optimizer result in memory so that
other sessions can reuse it, avoiding a duplicate run of the parser and
optimizer and therefore saving time? After all, an execution plan has
already been created...


Sincerely

Adnan DURSUN

- Original Message - 
From: Tomeh, Husam [EMAIL PROTECTED]
To: Adnan DURSUN [EMAIL PROTECTED]; 
pgsql-performance@postgresql.org

Sent: Wednesday, October 04, 2006 1:11 AM
Subject: Re: [PERFORM] PostgreSQL Caching



Like many decent RDBMSs, the PostgreSQL server allocates its own shared
memory area in which data is cached. When it receives a query request, the
Postgres engine first checks its shared memory buffers; if the data is not
found there, the engine performs disk I/O to retrieve it from the PostgreSQL
data files and places it in the shared buffer area before serving it back to
the client. Blocks in the shared buffers are shared and can therefore be
accessed by other sessions. The size of the PostgreSQL shared buffers is set
with the postgresql.conf parameter shared_buffers.

Sincerely,

--
 Husam

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Adnan
DURSUN
Sent: Tuesday, October 03, 2006 2:49 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] PostgreSQL Caching

   Hi,

   I wonder how PostgreSQL caches SQL query results. For example:

   * Does Postgres cache, in memory, the query results produced by session A?
   * Can session B then use those results?

Best Regards

Adnan DURSUN


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] PostgreSQL Caching

2006-10-03 Thread Tomeh, Husam
 
  * When a session updates data that is already in the shared buffers,
does Postgres synchronize the data on disk and in the shared buffers
immediately?

Not necessarily. When a block is modified in the shared buffers,
the modified block is written to the Postgres WAL log. A periodic DB
checkpoint is performed to flush the modified blocks in the shared
buffers to the data files.
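To make that concrete, a small sketch using the 8.x-era settings (the values
shown are the shipped defaults, purely for illustration):

-- A checkpoint can also be forced manually from a session (superuser only):
CHECKPOINT;
-- Automatic checkpoint frequency is governed by postgresql.conf, e.g.:
--   checkpoint_segments = 3     # WAL segments filled between checkpoints
--   checkpoint_timeout  = 300   # seconds between forced checkpoints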

  * Does Postgres cache SQL execution plans in memory for use by other
sessions? For example:
  When session A executes SELECT * FROM tab WHERE col1 = val1 AND col2
= val2, does Postgres save the parser/optimizer result in memory so that
other sessions can reuse it, avoiding a duplicate run of the parser and
optimizer and therefore saving time? After all, an execution plan has
already been created...

Query plans are not stored in the shared buffers and therefore cannot be
reused by other sessions. They are cached only by the connection, at the
session level.
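Within a single session you can see that per-connection caching with a
prepared statement; this is only a sketch, reusing the hypothetical table
from the question above:

-- The plan is built once and kept for the life of this session (or until
-- DEALLOCATE); other sessions never see it.
PREPARE find_rows (int, int) AS
    SELECT * FROM tab WHERE col1 = $1 AND col2 = $2;
EXECUTE find_rows(1, 2);   -- reuses the cached plan
EXECUTE find_rows(3, 4);   -- same plan, different parameters
DEALLOCATE find_rows;      -- drops the plan from this session's cache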

Sincerely,

--
  Husam 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Adnan
DURSUN
Sent: Tuesday, October 03, 2006 4:53 PM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] PostgreSQL Caching


Thanks,

I wonder about these:

* When a session updates data that is already in the shared buffers,
does Postgres synchronize the data on disk and in the shared buffers
immediately?
* Does Postgres cache SQL execution plans in memory for use by other
sessions? For example:
When session A executes SELECT * FROM tab WHERE col1 = val1 AND col2
= val2, does Postgres save the parser/optimizer result in memory so that
other sessions can reuse it, avoiding a duplicate run of the parser and
optimizer and therefore saving time? After all, an execution plan has
already been created...

Sincerely

Adnan DURSUN

- Original Message - 
From: Tomeh, Husam [EMAIL PROTECTED]
To: Adnan DURSUN [EMAIL PROTECTED]; 
pgsql-performance@postgresql.org
Sent: Wednesday, October 04, 2006 1:11 AM
Subject: Re: [PERFORM] PostgreSQL Caching



Like many decent RDBMSs, the PostgreSQL server allocates its own shared
memory area in which data is cached. When it receives a query request, the
Postgres engine first checks its shared memory buffers; if the data is not
found there, the engine performs disk I/O to retrieve it from the PostgreSQL
data files and places it in the shared buffer area before serving it back to
the client. Blocks in the shared buffers are shared and can therefore be
accessed by other sessions. The size of the PostgreSQL shared buffers is set
with the postgresql.conf parameter shared_buffers.

Sincerely,

--
  Husam

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Adnan
DURSUN
Sent: Tuesday, October 03, 2006 2:49 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] PostgreSQL Caching

Hi,

I wonder how PostgreSQL caches SQL query results. For example:

* Does Postgres cache, in memory, the query results produced by session A?
* Can session B then use those results?

Best Regards

Adnan DURSUN


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] PostgreSQL Caching

2006-10-03 Thread Adnan DURSUN
- Original Message - 
From: Tomeh, Husam [EMAIL PROTECTED]
To: Adnan DURSUN [EMAIL PROTECTED]; 
pgsql-performance@postgresql.org

Sent: Wednesday, October 04, 2006 4:29 AM
Subject: RE: [PERFORM] PostgreSQL Caching



Query plans are not stored in the shared buffers and therefore cannot be
reused by other sessions. They are cached only by the connection, at the
session level.


   OK, I see, thanks. So that means that an execution plan saved earlier for
a stored object is removed from memory after the object is altered or dropped
by any session. Is that true?

   And last one :-)
   I want to be able to read an execution plan when I look at it. 
So, is there any doc about how it should be read?


Sincerely!

Adnan DURSUN 



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] PostgreSQL caching

2004-05-27 Thread Matthew Nuzum
 
 Hello Josh,
 
 JB Not that you can't improve the query, just that it might not fix
 JB the problem.
 
 Yes, I'm aware it might be slower than the Linux version, but then, as
 you said, I still can improve the query (as I did with your help now).
 
 But true, if there's something awfully wrong with Win32 port
 performance, I might be doing some overwork...
 
 JB Therefore ... your detailed feedback is appreciated, especially if you
 can
 JB compare stuff to the same database running on a Linux, Unix, or BSD
 machine.
 
 I can't easily install Linux right now, but I am considering using it
 through VMware. Do you think it would suffice as a comparison?
 
 From what I saw (e.g.
 http://usuarios.lycos.es/hernandp/articles/vpcvs.html) the performance
 is bad only when it comes to graphics; otherwise it looks pretty
 good.
 
 Regards,
  Vitaly Belman
 

An interesting alternative that I've been using lately is coLinux
(http://colinux.sf.net).  It lets you run Linux inside Windows and, compared
to VMware, I find it remarkably faster and less resource intensive when
idle.  I have VMware, but if I'm only going to use a console-based program,
coLinux seems to outperform it.

Note that it may simply be interactive processes that run better, because it
has a simpler interface and does not try to emulate the display hardware.
(Therefore no X unless you use VMware.)  It seems, though, that there is less
overhead, and if that's the case, then everything should run faster.

Also note that getting it installed is a little more work than VMware.  If
you're running it on a workstation that you use for normal day-to-day tasks,
though, I think you'll like it, because you can detach the terminal and let
it run in the background.  When I do that I often forget it is running
because it produces such a low load on the system.  If you are going to give
it a try, the one trick I used to get things going was to download the newest
beta of WinPcap, and then the networking came up easily.  Everything else was
a piece of cake.

Matthew Nuzum   | Makers of Elite Content Management System
www.followers.net   | View samples of Elite CMS in action
[EMAIL PROTECTED]   | http://www.followers.net/portfolio/



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] PostgreSQL caching

2004-05-26 Thread Robert Treat
On Tue, 2004-05-25 at 15:53, Vitaly Belman wrote:
   
  QUERY PLAN
  --
  Limit  (cost=2337.41..2337.43 rows=10 width=76) (actual
  time=7875.000..7875.000 rows=10 loops=1)
-  Sort  (cost=2337.41..2337.94 rows=214 width=76) (actual
  time=7875.000..7875.000 rows=10 loops=1)
  Sort Key: bv_books.vote_avg
  -  Nested Loop  (cost=0.00..2329.13 rows=214 width=76)
  (actual time=16.000..7844.000 rows=1993 loops=1)
-  Index Scan using i_bookgenres_genre_id on
  bv_bookgenres  (cost=0.00..1681.54 rows=214 width=4) (actual
  time=16.000..3585.000 rows=1993 loops=1)
  Index Cond: (genre_id = 5830)
-  Index Scan using bv_books_pkey on bv_books 
  (cost=0.00..3.01 rows=1 width=76) (actual time=2.137..2.137 rows=1
  loops=1993)
  Index Cond: (bv_books.book_id = quot;outerquot;.book_id)
  Total runtime: 7875.000 ms
 

A question and two experiments... what version of postgresql is this?

Try reindexing i_bookgenres_genre_id and capture the explain analyze for
that. If that doesn't help, try doing set enable_indexscan = false and
capture the explain analyze for that.
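Spelled out against the query from earlier in the thread, the two experiments
look roughly like this (SET only affects the current session):

-- Experiment 1: rebuild the index, then re-check the plan.
REINDEX INDEX i_bookgenres_genre_id;
EXPLAIN ANALYZE
SELECT bv_books.*, vote_avg, vote_count
FROM   bv_bookgenres, bv_books
WHERE  bv_books.book_id = bv_bookgenres.book_id
AND    bv_bookgenres.genre_id = 5830
ORDER BY vote_avg DESC LIMIT 10 OFFSET 0;

-- Experiment 2: discourage index scans for this session only, and compare.
SET enable_indexscan = false;
EXPLAIN ANALYZE
SELECT bv_books.*, vote_avg, vote_count
FROM   bv_bookgenres, bv_books
WHERE  bv_books.book_id = bv_bookgenres.book_id
AND    bv_bookgenres.genre_id = 5830
ORDER BY vote_avg DESC LIMIT 10 OFFSET 0;
SET enable_indexscan = true;   -- restore the default afterwards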

Robert Treat
-- 
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] PostgreSQL caching

2004-05-26 Thread Vitaly Belman
Hello Marty, Nick and Robert,

NB Depending on what version of PG you are running, IN might take a while
NB to complete. If so try an EXISTS instead

RT A question and two experiments... what version of postgresql is this?

I am using the newer 7.5dev native Windows port. For this reason I
don't think that IN will cause any trouble (I read that this issue was
resolved in 7.4).

MS At any rate, a query with an IN clause should help quite a bit

MS SELECT bv_books. *
MS FROM   bv_books
MS WHERE  bv_books.book_id IN (
MSSELECT book_id
MSFROM bv_genres
MSWHERE bv_bookgenres.genre_id = 5830
MS)
MS ORDER BY   vote_avg DESC LIMIT 10 OFFSET 0;

It looks like it helps a bit (though you meant FROM bv_bookgenres,
right?). I can't tell you how MUCH it helped though, for two reasons:

1) As soon as I run a query, the data is cached in memory and I can't
really find a good way to flush it out of there to test again except a
full computer reset (shutting the postmaster down doesn't help). If you
have a better idea on this, do tell me =\ (reminding again, I am on
Windows).

2) I *think* I resolved this issue, at least for most of the genre_ids
(I didn't go through them all, but I tried a few with different book counts
and the results looked quite good). The fault was partly mine: a few
weeks ago I increased the statistics target for the genre_id column a bit
too much (from 10 to 70). I was unsure how exactly it works (and still am),
but it helped for a few genre_ids that had a high book count, yet it
also hurt performance for the genres with fewer books. I have now
lowered the statistics target (to 58) and almost everything looks good now.
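For anyone following along, adjusting a per-column statistics target and
refreshing the statistics is done roughly like this:

-- Set the planner statistics target for one column, then re-gather
-- statistics so the change actually takes effect.
ALTER TABLE bv_bookgenres ALTER COLUMN genre_id SET STATISTICS 58;
ANALYZE bv_bookgenres;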

Because of that I'll stop working on that query for a while (unless
you have some more performance tips on the subject). Big thanks to
everyone who helped. And I might bring this issue up again later, if it
still causes too much trouble.

RT Try reindexing i_bookgenres_genre_id and capture the explain
RT analyze for that.

Is this what you meant: REINDEX INDEX i_bookgenres_genre_id? It returns no
messages whatsoever =\. And I can't EXPLAIN it either.

RT If it doesn't help try doing set enable_indexscan = false and
RT capture the explain analyze for that.

Here it is:

--
QUERY PLAN
Limit  (cost=41099.93..41099.96 rows=10 width=76) (actual time=6734.000..6734.000 rows=10 loops=1)
  ->  Sort  (cost=41099.93..41100.45 rows=208 width=76) (actual time=6734.000..6734.000 rows=10 loops=1)
        Sort Key: bv_books.vote_count
        ->  Merge Join  (cost=40229.21..41091.92 rows=208 width=76) (actual time=6078.000..6593.000 rows=1993 loops=1)
              Merge Cond: ("outer".book_id = "inner".book_id)
              ->  Sort  (cost=16817.97..16818.49 rows=208 width=4) (actual time=1062.000..1062.000 rows=1993 loops=1)
                    Sort Key: bv_bookgenres.book_id
                    ->  Seq Scan on bv_bookgenres  (cost=0.00..16809.96 rows=208 width=4) (actual time=0.000..1047.000 rows=1993 loops=1)
                          Filter: (genre_id = 5830)
              ->  Sort  (cost=23411.24..23841.04 rows=171918 width=76) (actual time=5016.000..5189.000 rows=171801 loops=1)
                    Sort Key: bv_books.book_id
                    ->  Seq Scan on bv_books  (cost=0.00..4048.18 rows=171918 width=76) (actual time=0.000..359.000 rows=171918 loops=1)
Total runtime: 6734.000 ms
--

Regards,
 Vitaly Belman
 
 ICQ: 1912453
 AIM: VitalyB1984
 MSN: [EMAIL PROTECTED]
 Yahoo!: VitalyBe

Wednesday, May 26, 2004, 1:24:18 AM, you wrote:

MS Vitaly,

MS This looks like there might be some room for performance improvement...

  MS I didn't see the table structure, but I assume
  MS that the vote_avg and
  MS vote_count fields are in bv_bookgenres.
 
  I didn't understand you. vote_avg is stored in bv_books.

MS Ok.  That helps.  The confusion (on my end) came from the SELECT clause
MS of the query you provided:

  SELECT bv_books. * ,
 vote_avg,
 vote_count

MS All fields from bv_books were selected (bv_books.*) along with vote_avg
MS and vote_count.  My assumption was that vote_avg and vote_count were
MS therefore not in bv_books.

MS At any rate, a query with an IN clause should help quite a bit:

MS SELECT bv_books. *
MS FROM   bv_books
MS WHERE  bv_books.book_id IN (
MSSELECT book_id
MSFROM bv_genres
MSWHERE bv_bookgenres.genre_id = 5830
MS)
MS ORDER BY   vote_avg DESC LIMIT 10 OFFSET 0;

MS Give it a whirl.

MS Marty



Re: [PERFORM] PostgreSQL caching

2004-05-26 Thread Josh Berkus
Vitaly,

 I am using the newer 7.5dev native Windows port. For this reason I
 don't think that IN will cause any trouble (I read that this issue was
 resolved in 7.4).

Well, for performance, all bets are off for the dev Windows port.   Last I 
checked, the Win32 team was still working on *stability* and hadn't yet even 
looked at performance.  Not that you can't improve the query, just that it 
might not fix the problem.

Therefore ... your detailed feedback is appreciated, especially if you can 
compare stuff to the same database running on a Linux, Unix, or BSD machine.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] PostgreSQL caching

2004-05-25 Thread Jochem van Dieten
Vitaly Belman wrote:
If you'll be so kind though, I'd be glad if you could spot anything to
speed up in this query. Here's the query and its plan that happens
without any caching:
-
QUERY
-
SELECT bv_books. * ,
   vote_avg, 
   vote_count 
FROM   bv_bookgenres, 
   bv_books 
WHERE  bv_books.book_id = bv_bookgenres.book_id AND 
   bv_bookgenres.genre_id = 5830
ORDER BY   vote_avg DESC LIMIT 10 OFFSET 0; 
 
QUERY PLAN
--
Limit  (cost=2337.41..2337.43 rows=10 width=76) (actual time=7875.000..7875.000 rows=10 loops=1)
  ->  Sort  (cost=2337.41..2337.94 rows=214 width=76) (actual time=7875.000..7875.000 rows=10 loops=1)
        Sort Key: bv_books.vote_avg
        ->  Nested Loop  (cost=0.00..2329.13 rows=214 width=76) (actual time=16.000..7844.000 rows=1993 loops=1)
              ->  Index Scan using i_bookgenres_genre_id on bv_bookgenres  (cost=0.00..1681.54 rows=214 width=4) (actual time=16.000..3585.000 rows=1993 loops=1)
                    Index Cond: (genre_id = 5830)
              ->  Index Scan using bv_books_pkey on bv_books  (cost=0.00..3.01 rows=1 width=76) (actual time=2.137..2.137 rows=1 loops=1993)
                    Index Cond: (bv_books.book_id = "outer".book_id)
Total runtime: 7875.000 ms
Presuming that vote_avg is a field in the table bv_bookgenres, 
try a composite index on genre_id and vote_avg and then see if 
you can use the limit clause to reduce the number of loop 
iterations from 1993 to 10.

CREATE INDEX test_idx ON bv_bookgenres (genre_id, vote_avg);
The following query tries to force that execution plan and,
presuming there is a foreign key relation between
bv_books.book_id AND bv_bookgenres.book_id, I expect it will give
the same results, but be careful with NULLs:

SELECT  bv_books. * ,
vote_avg,
vote_count
FROM(
SELECT  bg.*
FROMbv_bookgenres bg
WHERE   bg.genre_id = 5830
ORDER BY
bg.vote_avg DESC
LIMIT   10
) bv_bookgenres,
bv_books
WHERE   bv_books.book_id = bv_bookgenres.book_id
ORDER BY
vote_avg DESC
LIMIT   10;
Jochem
--
I don't get it
immigrants don't work
and steal our jobs
- Loesje
---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] PostgreSQL caching

2004-05-25 Thread Vitaly Belman
Hello Jochem and Marty,

I guess I should have posted the table structure before =(:

Table structure + Indexes
-

CREATE TABLE public.bv_books
(
  book_id serial NOT NULL,
  book_title varchar(255) NOT NULL,
  series_id int4,
  series_index int2,
  annotation_desc_id int4,
  description_desc_id int4,
  book_picture varchar(255) NOT NULL,
  vote_avg float4 NOT NULL,
  vote_count int4 NOT NULL,
  CONSTRAINT bv_books_pkey PRIMARY KEY (book_id)
) WITH OIDS;

CREATE INDEX i_books_vote_avg
  ON public.bv_books
  USING btree
  (vote_avg);

CREATE INDEX i_books_vote_count
  ON public.bv_books
  USING btree
  (vote_count);

-

CREATE TABLE public.bv_bookgenres
(
  book_id int4 NOT NULL,
  genre_id int4 NOT NULL,
  CONSTRAINT bv_bookgenres_pkey PRIMARY KEY (book_id, genre_id),
  CONSTRAINT fk_bookgenres_book_id FOREIGN KEY (book_id) REFERENCES public.bv_books 
(book_id) ON UPDATE RESTRICT ON DELETE RESTRICT
) WITH OIDS;

CREATE INDEX i_bookgenres_book_id
  ON public.bv_bookgenres
  USING btree
  (book_id);

CREATE INDEX i_bookgenres_genre_id
  ON public.bv_bookgenres
  USING btree
  (genre_id);
-

MS I didn't see the table structure, but I assume that the vote_avg and
MS vote_count fields are in bv_bookgenres.  If no fields are actually 
MS needed from bv_bookgenres, then the query might be constructed in a way 
MS that only the index would be read, without loading any row data.

I didn't understand you. vote_avg is stored in bv_books. So yes, the
only thing I need from bv_bookgenres is the id of the book, but I can't
store this info in bv_books because there is an N-to-N relationship
between them - every book can belong to a number of genres... if
that's what you meant.

MS I think that you mentioned this was for a web app.  Do you actually have
MS a web page that displays 2000 rows of data?

Well.. It is all paginated, you can access 2000 items of the data
(as there are actually 2000 books in the genre) but you only see 10
items at a time.. I mean, probably no one would go over the 2000
books, but I can't just hide them =\.

JvD Presuming that vote_avg is a field in the table bv_bookgenres,
JvD try a composite index on genre_id and vote_avg and then see if 
JvD you can use the limit clause to reduce the number of loop 
JvD iterations from 1993 to 10.

I'm afraid your idea won't work in my case =\... Naturally I could
eventually do some data coupling to gain a performance boost if this issue
isn't solved some other way. I'll keep your idea in mind anyway,
thanks.

Once again thanks for you feedback.

Regards,
 Vitaly Belman
 
 ICQ: 1912453
 AIM: VitalyB1984
 MSN: [EMAIL PROTECTED]
 Yahoo!: VitalyBe

Tuesday, May 25, 2004, 6:37:44 PM, you wrote:

JvD Vitaly Belman wrote:
 
 If you'll be so kind though, I'd be glad if you could spot anything to
 speed up in this query. Here's the query and its plan that happens
 without any caching:
 
 -
 QUERY
 -
 SELECT bv_books. * ,
vote_avg, 
vote_count 
 FROM   bv_bookgenres, 
bv_books 
 WHERE  bv_books.book_id = bv_bookgenres.book_id AND 
bv_bookgenres.genre_id = 5830
 ORDER BY   vote_avg DESC LIMIT 10 OFFSET 0; 
  
 QUERY PLAN
 --
 Limit  (cost=2337.41..2337.43 rows=10 width=76) (actual time=7875.000..7875.000 rows=10 loops=1)
   ->  Sort  (cost=2337.41..2337.94 rows=214 width=76) (actual time=7875.000..7875.000 rows=10 loops=1)
         Sort Key: bv_books.vote_avg
         ->  Nested Loop  (cost=0.00..2329.13 rows=214 width=76) (actual time=16.000..7844.000 rows=1993 loops=1)
               ->  Index Scan using i_bookgenres_genre_id on bv_bookgenres  (cost=0.00..1681.54 rows=214 width=4) (actual time=16.000..3585.000 rows=1993 loops=1)
                     Index Cond: (genre_id = 5830)
               ->  Index Scan using bv_books_pkey on bv_books  (cost=0.00..3.01 rows=1 width=76) (actual time=2.137..2.137 rows=1 loops=1993)
                     Index Cond: (bv_books.book_id = "outer".book_id)
 Total runtime: 7875.000 ms

JvD Presuming that vote_avg is a field in the table bv_bookgenres, 
JvD try a composite index on genre_id and vote_avg and then see if 
JvD you can use the limit clause to reduce the number of loop 
JvD iterations from 1993 to 10.

JvD CREATE INDEX test_idx ON bv_bookgenres (genre_id, vote_avg);


JvD The following query tries to force that execution plan and,
JvD presuming there is a foreign key relation between
JvD bv_books.book_id AND bv_bookgenres.book_id, I expect it will give
JvD the same results, but be careful with NULLs:

JvD SELECT bv_books. * ,
JvDvote_avg,
JvDvote_count
JvD FROM   (
JvDSELECT  bg.*
JvDFROMbv_bookgenres bg
JvDWHERE   bg.genre_id = 5830
JvDORDER BY
JvDbg.vote_avg DESC
JvD

Re: [PERFORM] PostgreSQL caching

2004-05-25 Thread Marty Scholes
Vitaly,
This looks like there might be some room for performance improvement...
 MS I didn't see the table structure, but I assume
 MS that the vote_avg and
 MS vote_count fields are in bv_bookgenres.

 I didn't understand you. vote_avg is stored in bv_books.
Ok.  That helps.  The confusion (on my end) came from the SELECT clause 
of the query you provided:

 SELECT bv_books. * ,
vote_avg,
vote_count
All fields from bv_books were selected (bv_books.*) along with vote_avg 
and vote_count.  My assumption was that vote_avg and vote_count were 
therefore not in bv_books.

At any rate, a query with an IN clause should help quite a bit:
SELECT bv_books. *
FROM   bv_books
WHERE  bv_books.book_id IN (
  SELECT book_id
  FROM bv_genres
  WHERE bv_bookgenres.genre_id = 5830
  )
ORDER BY   vote_avg DESC LIMIT 10 OFFSET 0;
Give it a whirl.
Marty
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] PostgreSQL caching

2004-05-24 Thread Marty Scholes
 Hello Marty,

 MS Is that a composite index?

 It is a regular btree index. What is a composite index?
My apologies.  A composite index is one that consists of multiple fields 
(aka multicolumn index).  The reason I ask is that it was spending 
almost half the time just searching bv_bookgenres, which seemed odd.

I may be speaking out of turn since I am not overly familiar with Pg's 
quirks and internals.

A composite index, or any index of a large field, will lower the number 
of index items stored per btree node, thereby lowering the branching 
factor and increasing the tree depth.  On tables with many rows, this 
can result in many more disk accesses for reading the index.  An index 
btree that is 6 levels deep will require at least seven disk accesses (6 
for the index, one for the table row) per row retrieved.

Not knowing the structure of the indexes, it's hard to say too much 
about it.  The fact that a 1993 row select from an indexed table took 
3.5 seconds caused me to take notice.

 MS I would be curious to see how it performs with an IN clause,
 MS which I would suspect would go quite a bit faster.

 Actually it reached 20s before I canceled it... Here's the explain:
I believe that.  The code I posted had a nasty join bug.  If my math is 
right, the query was trying to return 1993*1993, or just under 4 million 
rows.

I didn't see the table structure, but I assume that the vote_avg and 
vote_count fields are in bv_bookgenres.  If no fields are actually 
needed from bv_bookgenres, then the query might be constructed in a way 
that only the index would be read, without loading any row data.

I think that you mentioned this was for a web app.  Do you actually have 
a web page that displays 2000 rows of data?

Good luck,
Marty
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] PostgreSQL caching

2004-05-23 Thread Vitaly Belman
Hello Marty,

MS Is that a composite index?

It is a regular btree index. What is a composite index?

MS Analyzing the tables may help, as the optimizer appears to
MS mispredict the number of rows returned.

I'll try analyzing, but I highly doubt that it would help. I analyzed
once already and haven't changed the data since.

MS I would be curious to see how it performs with an IN clause,
MS which I would suspect would go quite a bit faster.

Actually it reached 20s before I canceled it... Here's the explain:

QUERY PLAN
Limit  (cost=3561.85..3561.88 rows=10 width=76)
  ->  Sort  (cost=3561.85..3562.39 rows=214 width=76)
        Sort Key: bv_books.vote_avg
        ->  Nested Loop  (cost=1760.75..3553.57 rows=214 width=76)
              ->  Index Scan using i_bookgenres_genre_id on bv_bookgenres  (cost=0.00..1681.54 rows=214 width=0)
                    Index Cond: (genre_id = 5830)
              ->  Materialize  (cost=1760.75..1761.01 rows=26 width=76)
                    ->  Nested Loop  (cost=1682.07..1760.75 rows=26 width=76)
                          ->  HashAggregate  (cost=1682.07..1682.07 rows=26 width=4)
                                ->  Index Scan using i_bookgenres_genre_id on bv_bookgenres  (cost=0.00..1681.54 rows=214 width=4)
                                      Index Cond: (genre_id = 5830)
                          ->  Index Scan using bv_books_pkey on bv_books  (cost=0.00..3.01 rows=1 width=76)
                                Index Cond: (bv_books.book_id = "outer".book_id)


Thank you for your try.

Regards,
Vitaly Belman
 
 ICQ: 1912453
 AIM: VitalyB1984
 MSN: [EMAIL PROTECTED]
 Yahoo!: VitalyBe

Friday, May 21, 2004, 11:10:56 PM, you wrote:

MS Not knowing a whole lot about the internals of Pg, one thing jumped out
MS at me, that each trip to get data from bv_books took 2.137 ms, which
MS came to over 4.2 seconds right there.

MS The problem seems to be the 1993 times that the nested loop spins, as
MS almost all of the time is spent there.

MS Personally, I am amazed that it takes 3.585 seconds to index scan 
MS i_bookgenres_genre_id.  Is that a composite index?  Analyzing the 
MS tables may help, as the optimizer appears to mispredict the number of
MS rows returned.

MS I would be curious to see how it performs with an IN clause, which I
MS would suspect would go quite a bit faster.  Try the following:

MS SELECT bv_books. * ,
MS vote_avg,
MS vote_count
MS FROM   bv_bookgenres,
MS bv_books
MS WHERE  bv_books.book_id IN (
MSSELECT book_id
MSFROM bv_genres
MSWHERE bv_bookgenres.genre_id = 5830
MS)
MS AND bv_bookgenres.genre_id = 5830
MS ORDER BY   vote_avg DESC LIMIT 10 OFFSET 0;

MS In this query, all of the book_id values are pulled at once.

MS Who knows?

MS If you get statistics on this, please post.

MS Marty




---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM] PostgreSQL caching

2004-05-21 Thread Vitaly Belman
Hello,

I have the following problem:

When I run some query right after starting the postmaster, it takes
several seconds to execute (sometimes more than 10); if I rerun it
again afterwards, it takes mere milliseconds.

So, I guess it has to do with PostgreSQL caching. But how exactly
does it work? What does it cache? And how can I control it?

I would like to load selected information into memory before a user
runs the query. Can I do that somehow? As PostgreSQL backs a web
application in my case, it isn't really helping if the user has to wait 10
seconds every time he goes to a new page (even though refreshing the page
would be really quick, since Postgres has already loaded the data into
memory).

P.S If the query or its EXPLAIN are critical for a better
understanding, let me know.

Regards,
 Vitaly Belman
 
 ICQ: 1912453
 AIM: VitalyB1984
 MSN: [EMAIL PROTECTED]
 Yahoo!: VitalyBe


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] PostgreSQL caching

2004-05-21 Thread Rosser Schwarz
while you weren't looking, Vitaly Belman wrote:

 So, I guess it has to do with PostgreSQL caching.. But how exactly
 does it work? What does it cache? And how can I control it?

PostgreSQL uses the operating system's disk cache.  You can hint to
the postmaster how much memory is available for caching with the
effective_cache_size directive in your postgresql.conf.  If you're
running a *nix OS, you can find this by watching `top` for a while;
in the header, there's a cached value (or something to that effect).
Watching this value, you can determine a rough average and set your
effective_cache_size to that rough average, or perhaps slightly less.
I'm not sure how to get this value on Windows.
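A minimal sketch, with a purely illustrative number (in the 7.x/8.x releases
discussed here the value is a count of 8 kB pages, so 65536 means roughly
512 MB):

-- postgresql.conf (illustrative value, not a recommendation):
--   effective_cache_size = 65536   # 65536 x 8 kB pages, about 512 MB
-- This only informs the planner's cost estimates; it allocates no memory.
SHOW effective_cache_size;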

Pgsql uses the OS's disk cache instead of its own cache management
because the former is more likely to persist.  If the postmaster
managed the cache, as soon as the last connection died, the memory
allocated for caching would be released, and all the cached data
would be lost.  Relying instead on the OS to cache data means that,
whether or not there's a postmaster, so long as there has been one,
there'll be some data cached.

You can prepopulate the OS disk cache by periodically running a
handful of SELECT queries that pull from your most commonly accessed
tables in a background process.  (A good way of doing that is simply
to run your most commonly executed SELECTS.)  Those queries should
take the performance hit of fetching from disk, while your regular
queries hit the cache.
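A minimal sketch of such a warming query, assuming the tables from this
thread; a sequential read like this drags the table's blocks through the OS
cache (whether they stay there depends on memory pressure):

-- Run periodically (for example from cron via psql) to keep hot tables warm.
-- count(*) forces a full read of each table without returning much data.
SELECT count(*) FROM bv_books;
SELECT count(*) FROM bv_bookgenres;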

/rls

--
Rosser Schwarz
Total Card, Inc.


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] PostgreSQL caching

2004-05-21 Thread Richard Huxton
Vitaly Belman wrote:
Hello,
I have the following problem:
When I run some query right after starting the postmaster, it takes
several seconds to execute (sometimes more than 10); if I rerun it
again afterwards, it takes mere milliseconds.
So, I guess it has to do with PostgreSQL caching. But how exactly
does it work? What does it cache? And how can I control it?
There are two areas of cache - PostgreSQL's shared buffers and the 
operating system's disk-cache. You can't directly control what data is 
cached, it just keeps track of recently used data. It sounds like PG 
isn't being used for a while so your OS decides to use its cache for 
webserver files.

I would like to load selected information into memory before a user
runs the query. Can I do that somehow? As PostgreSQL backs a web
application in my case, it isn't really helping if the user has to wait 10
seconds every time he goes to a new page (even though refreshing the page
would be really quick, since Postgres has already loaded the data into
memory).
If you could pin data in the cache it would run quicker, but at the 
cost of everything else running slower.

Suggested steps:
1. Read the configuration/tuning guide at:
  http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php
2. Post a sample query/explain analyse that runs very slowly when not 
cached.
3. If need be, you can write a simple timed script that performs a 
query. Or, the autovacuum daemon might be what you want.

--
  Richard Huxton
  Archonet Ltd
---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] PostgreSQL caching

2004-05-21 Thread Chris Browne
[EMAIL PROTECTED] (Richard Huxton) writes:
 If you could pin data in the cache it would run quicker, but at the
 cost of everything else running slower.

 Suggested steps:
 1. Read the configuration/tuning guide at:
http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php
 2. Post a sample query/explain analyse that runs very slowly when not
 cached.
 3. If needs be, you can write a simple timed script that performs a
 query. Or, the autovacuum daemon might be what you want.

I don't think this case will be anywhere near so simple to resolve.

I have seen this phenomenon occur when a query needs to pull a
moderate number of blocks into memory to satisfy a query that involves
some moderate number of rows.

Let's say you need 2000 rows, which fit into 400 blocks.

The first time the query runs, it needs to pull those 400 blocks off
disk, which requires 400 reads of 8K of data.  That can easily take a
few seconds of I/O.

The second time, not only are those blocks cached, they are probably
cached in the buffer cache, so that the I/O overhead disappears.

There's very likely no problem with the table statistics; they are
leading to the right query plan, which happens to need to do 5 seconds
of I/O to pull the data into memory.

What is essentially required is the prescient caching algorithm,
where the postmaster must consult /dev/esp in order to get a
prediction of what blocks it may need to refer to in the next sixty
seconds.
-- 
(format nil [EMAIL PROTECTED] cbbrowne cbbrowne.com)
http://cbbrowne.com/info/linuxdistributions.html
Normally, we don't do people's homework around here, but Venice is a
very beautiful city, so I'll make a small exception.
--- Robert Redelmeier compromises his principles

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] PostgreSQL caching

2004-05-21 Thread Rod Taylor
 What is essentially required is the prescient caching algorithm,
 where the postmaster must consult /dev/esp in order to get a
 prediction of what blocks it may need to refer to in the next sixty
 seconds.

Easy enough. Television does it all the time with live shows. The guy
with the buzzer always seems to know what will be said before they say
it. All we need is a 5 to 10 second delay...


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] PostgreSQL caching

2004-05-21 Thread Neil Conway
Rosser Schwarz wrote:
PostgreSQL uses the operating system's disk cache.
... in addition to its own buffer cache, which is stored in shared 
memory. You're correct though, in that the best practice is to keep the 
PostgreSQL cache small and give more memory to the operating system's 
disk cache.

Pgsql uses the OS's disk cache instead of its own cache management
because the former is more likely to persist.  If the postmaster
managed the cache, as soon as the last connection died, the memory
allocated for caching would be released, and all the cached data
would be lost.
No; the cache is stored in shared memory. It wouldn't persist over 
postmaster restarts (without some scheme of saving and restoring it), 
but that has nothing to do with why the OS disk cache is usually kept 
larger than the PG shared buffer cache.

-Neil
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [PERFORM] PostgreSQL caching

2004-05-21 Thread Marty Scholes
Not knowing a whole lot about the internals of Pg, one thing jumped out 
at me, that each trip to get data from bv_books took 2.137 ms, which 
came to over 4.2 seconds right there.

The problem seems to be the 1993 times that the nested loop spins, as 
almost all of the time is spent there.

Personally, I am amazed that it takes 3.585 seconds to index scan 
i_bookgenres_genre_id.  Is that a composite index?  Analyzing the 
tables may help, as the optimizer appears to mispredict the number of 
rows returned.

I would be curious to see how it performs with an IN clause, which I 
would suspect would go quite a bit faster.  Try the following:

SELECT bv_books. * ,
   vote_avg,
   vote_count
FROM   bv_bookgenres,
   bv_books
WHERE  bv_books.book_id IN (
  SELECT book_id
  FROM bv_genres
  WHERE bv_bookgenres.genre_id = 5830
  )
AND bv_bookgenres.genre_id = 5830
ORDER BY   vote_avg DESC LIMIT 10 OFFSET 0;
In this query, all of the book_id values are pulled at once.
Who knows?
If you get statistics on this, please post.
Marty
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster