Re: [HACKERS] Warm-up cache may have its virtue

2006-01-16 Thread Jim C. Nasby
On Sat, Jan 14, 2006 at 04:13:56PM -0500, Qingqing Zhou wrote:
> 
> "Qingqing Zhou" <[EMAIL PROTECTED]> wrote
> >
> > I wonder if we should really implement the file-system-cache-warmup
> > strategy which we have discussed before. There are two naturally good
> > places to do this:
> >
> > (1) sequential scan
> > (2) bitmap index scan
> >
> 
> For the record, there is a third place where a warm-up cache or pre-read
> is beneficial (the OS won't help us there):
> (3) xlog recovery

Wouldn't it be better to improve pre-reading of data instead, i.e., making
sure things like seqscan and bitmap scan always keep the I/O system busy?
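
For instance, something along these lines at the filesystem level might keep
the kernel reading ahead of the scan (a rough sketch only; preread_ahead(),
the fixed window, and the direct use of the file descriptor are assumptions
for illustration, not existing backend code):

/*
 * Illustrative sketch only: while a sequential scan works on the current
 * block, ask the kernel to start reading the next few blocks of the heap
 * file.  The helper name, the fixed window, and the direct use of the
 * file descriptor are assumptions for illustration.
 */
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <sys/types.h>

#define BLCKSZ          8192    /* PostgreSQL block size */
#define PREREAD_WINDOW  32      /* blocks to stay ahead of the scan */

static void
preread_ahead(int fd, unsigned int next_blkno, unsigned int total_blocks)
{
    unsigned int nblocks = PREREAD_WINDOW;

    if (next_blkno >= total_blocks)
        return;
    if (next_blkno + nblocks > total_blocks)
        nblocks = total_blocks - next_blkno;

    /* Advisory only: the kernel can schedule these reads asynchronously. */
    (void) posix_fadvise(fd,
                         (off_t) next_blkno * BLCKSZ,
                         (off_t) nblocks * BLCKSZ,
                         POSIX_FADV_WILLNEED);
}
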
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [HACKERS] Warm-up cache may have its virtue

2006-01-14 Thread Qingqing Zhou

"Qingqing Zhou" <[EMAIL PROTECTED]> wrote
>
> I wonder if we should really implement the file-system-cache-warmup
> strategy which we have discussed before. There are two naturally good
> places to do this:
>
> (1) sequential scan
> (2) bitmap index scan
>

For the record, there is a third place where a warm-up cache or pre-read
is beneficial (the OS won't help us there):
(3) xlog recovery

Regards,
Qingqing





Re: [HACKERS] Warm-up cache may have its virtue

2006-01-07 Thread Qingqing Zhou

"Greg Stark" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
>
> Hm. Personally I have a hunch you're right. But there we have no actual
> evidence. The first thing that needs to happen is changes to use O_DIRECT 
> for
> everything and then benchmarking one of those big TPC tests with the 
> O_DIRECT
> build and a large buffer cache versus a normal build with an traditional
> buffer cache size.
>

A nice thing is that we can have both. The user can choose a small or a big
shared_buffers setting, and depending on that choice we would use a
different I/O and buffering strategy.

> If it's anywhere close, even with no prefetching, then it ought to be clear
> that the costs of double buffering are becoming substantial.
>

AFAIU, double buffering only hurts when we use a big shared_buffers value.

> As far as predicting cache hits I think the best Postgres could do is 
> track
> the average cache hit rate, either overall for the whole system or perhaps
> even per table and index.
>

There is a Linux kernel implementation of pre-read (readahead):
http://glide.stanford.edu/lxr/source/mm/readahead.c?v=linux-2.6.5#L306

We have better hints to work with than the kernel does: seqscan and bitmap scan.
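
To make that concrete, here is a rough sketch of how a bitmap scan could turn
its sorted list of heap block numbers into pre-read hints
(preread_bitmap_blocks() and the plain array of block numbers are assumptions
for illustration, not the real tidbitmap or storage-manager interfaces):

/*
 * Illustrative sketch only: given the sorted heap block numbers that a
 * bitmap index scan produced, hint the kernel about the next few blocks
 * before the executor actually fetches them.
 */
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <sys/types.h>

#define BLCKSZ  8192

static void
preread_bitmap_blocks(int fd, const unsigned int *blocks, int nblocks,
                      int current, int window)
{
    int last = current + window;
    int i;

    if (last > nblocks)
        last = nblocks;

    /* One hint per block, since bitmap results are usually non-contiguous. */
    for (i = current; i < last; i++)
        (void) posix_fadvise(fd,
                             (off_t) blocks[i] * BLCKSZ,
                             BLCKSZ,
                             POSIX_FADV_WILLNEED);
}
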

Regards,
Qingqing





Re: [HACKERS] Warm-up cache may have its virtue

2006-01-07 Thread Greg Stark

Qingqing Zhou <[EMAIL PROTECTED]> writes:

> > In other words, the difference between being in Postgres's buffer cache and
> > being in the filesystem cache, while not insignificant, isn't really 
> > relevant
> > to the planner since it affects sequential scans and index scans equally.
> 
> The bitmap was proposed because I think it is time to use a dominant
> shared_buffers size. Then, if a page is not in the buffer cache, it is not
> in the OS cache either.

Hm. Personally I have a hunch you're right. But so far we have no actual
evidence. The first thing that needs to happen is changes to use O_DIRECT for
everything and then benchmarking one of those big TPC tests with the O_DIRECT
build and a large buffer cache versus a normal build with a traditional
buffer cache size.

If it's anywhere close, even with no prefetching, then it ought to be clear
that the costs of double buffering are becoming substantial.

As far as predicting cache hits I think the best Postgres could do is track
the average cache hit rate, either overall for the whole system or perhaps
even per table and index. 

The first problem I see with that is that most systems have a mix of OLTP and
DSS queries and the two might have different patterns. Perhaps keeping track
of cache hit rates in multiple buckets based on the estimated number of rows?
Maybe exponentially growing buckets of "1-10" "10-100" "100-1k" "1k-10k", ...
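
Something along these lines might be enough (a sketch only; ScanHitStats and
the helper functions are made-up names, not anything that exists in the
backend):

/*
 * Sketch of per-relation cache-hit-rate tracking in exponentially growing
 * buckets keyed by the estimated row count, as suggested above.  Purely
 * illustrative.
 */
#include <math.h>

#define HIT_BUCKETS 8               /* 1-10, 10-100, 100-1k, 1k-10k, ... */

typedef struct ScanHitStats
{
    double  hits[HIT_BUCKETS];      /* buffer hits seen at this scan size */
    double  reads[HIT_BUCKETS];     /* physical reads seen at this scan size */
} ScanHitStats;

static int
hit_bucket(double est_rows)
{
    int b = (est_rows < 1.0) ? 0 : (int) floor(log10(est_rows));

    return (b >= HIT_BUCKETS) ? HIT_BUCKETS - 1 : b;
}

/* Record the observed hits/reads of a finished scan. */
static void
record_scan(ScanHitStats *stats, double est_rows,
            double buffer_hits, double buffer_reads)
{
    int b = hit_bucket(est_rows);

    stats->hits[b] += buffer_hits;
    stats->reads[b] += buffer_reads;
}

/* Hit rate the planner could consult for a scan of this estimated size. */
static double
expected_hit_rate(const ScanHitStats *stats, double est_rows)
{
    int     b = hit_bucket(est_rows);
    double  total = stats->hits[b] + stats->reads[b];

    return (total > 0.0) ? stats->hits[b] / total : 0.5;   /* default guess */
}
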

-- 
greg




Re: [HACKERS] Warm-up cache may have its virtue

2006-01-07 Thread Qingqing Zhou


On Sat, 7 Jan 2006, Greg Stark wrote:

>
> "Qingqing Zhou" <[EMAIL PROTECTED]> writes:
>
> > For b1, it actually doesn't matter much though. With the bitmap we can
> > definitely give better EXPLAIN numbers for seqscans, but even without it we
> > seldom make the wrong choice about using a sequential scan.
>
> I think you have a more severe problem than that.
>
> It's not sequential scans that we have trouble estimating.
> It's the index scans that are the problem.

Exactly, we are saying the same thing.

>
> In other words, the difference between being in Postgres's buffer cache and
> being in the filesystem cache, while not insignificant, isn't really relevant
> to the planner since it affects sequential scans and index scans equally.

The bitmap was proposed since I think it is time to use dominated
shared_buffer size. Thus, if it is not in buffer cache, it is not in OS
cache either.

Regards,
Qingqing



Re: [HACKERS] Warm-up cache may have its virtue

2006-01-06 Thread Greg Stark

"Qingqing Zhou" <[EMAIL PROTECTED]> writes:

> For b1, it actually doesn't matter much though. With the bitmap we can
> definitely give better EXPLAIN numbers for seqscans, but even without it we
> seldom make the wrong choice about using a sequential scan.

I think you have a more severe problem than that. 

It's not sequential scans that we have trouble estimating. Most of their
blocks will be uncached and they'll be read sequentially. Both of these
factors make estimating their costs pretty straightforward.

It's the index scans that are the problem. Index scans look bad to the
optimizer because they're random access, but they often have very high cache
hit rates because they access relatively few blocks and often they're hot (the
DBA did after all feel compelled to create the index in the first place).
Moreover they're often inside Nested Loop plans which causes many of those
blocks to be accessed repeatedly within the loop.

And the cache hit rate matters *a lot* for index scans since a cache hit means
the block won't be affected by the random access penalty. That is, the cache
speedup will help both sequential and index scans, but skipping the seek only
helps the index scan.

And that's true regardless of whether it's found in Postgres's buffer cache or
has to be read in from the filesystem cache. So you won't really be able to
tell how many seeks are avoided without knowing whether the block is in the
filesystem cache. 

In other words, the difference between being in Postgres's buffer cache and
being in the filesystem cache, while not insignificant, isn't really relevant
to the planner since it affects sequential scans and index scans equally. It's
the difference between being in either cache versus requiring disk i/o that
affects index scans disproportionately.
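
A back-of-the-envelope illustration of that point (the numbers are made up
and this is not the planner's actual cost model):

/*
 * A cache hit (in either cache) removes the seek, so the hit rate changes
 * the per-page cost of an index scan far more than that of a sequential
 * scan.  All numbers here are invented for illustration.
 */
#include <stdio.h>

int
main(void)
{
    const double seq_page_cost = 1.0;       /* sequential read from disk */
    const double random_page_cost = 4.0;    /* random read from disk */
    const double cached_page_cost = 0.01;   /* page found in either cache */
    const double hit_rates[] = {0.0, 0.5, 0.9, 0.99};
    int i;

    for (i = 0; i < 4; i++)
    {
        double h = hit_rates[i];
        double seq = h * cached_page_cost + (1.0 - h) * seq_page_cost;
        double idx = h * cached_page_cost + (1.0 - h) * random_page_cost;

        printf("hit rate %.2f: seqscan page cost %.3f, indexscan page cost %.3f\n",
               h, seq, idx);
    }
    return 0;
}

As the hit rate approaches 1 the index scan's per-page cost collapses by
nearly two orders of magnitude, while the sequential scan only sheds its
comparatively small sequential-read cost - which is why a hit-rate estimate
matters so much more for index scans.
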

And worse, it doesn't really matter whether it's in the cache when the query
is planned. It matters whether it'll be in the cache when the access is made.
If the node is inside a Nested Loop then on subsequent trips through the loop
the same blocks may end up being read again, and they may all be cached.

-- 
greg




Re: [HACKERS] Warm-up cache may have its virtue

2006-01-06 Thread Qingqing Zhou

"Qingqing Zhou" <[EMAIL PROTECTED]> wrote
>
>> Feasibility: Our bufmgr lock rewrite already makes this possible. But to
>> enable it, we may need more work: (w1) make the buffer pool relation-wise,
>> which makes our estimation of data page residence easier and more reliable;
>> (w2) add aggressive pre-read at the buffer pool level. Also, another benefit
>> of w1 is that it will let our query planner estimate query cost more
>> precisely.
>
> "w1" is doable by introducing a shared-memory bitmap indicating which
> pages of a relation are in buffer pool (We may want to add a hash to
> manage the relations). Theoretically, O(shared_buffer) bits are enough. So
> this will not use a lot of space.
>
> When we maintain the SharedBufHash, we maintain this bitmap. When we do
> query cost estimation or preread, we just need a rough number, so this can
> be done by scanning the bitmap without lock. Thus there is also almost no
> extra cost.

After some research, I have come to the conclusion that the bitmap idea is
bad - I hope I am wrong :-(.

Adding a bitmap would let us know the current buffer residence, with two
benefits: (b1) plan stage: give a more accurate cost estimate for a
sequential scan; (b2) execution stage: provide another way for a sequential
scan/bitmap scan to identify the pages that need pre-read.

For b1, it actually doesn't matter much though. With the bitmap we can
definitely give better EXPLAIN numbers for seqscans, but even without it we
seldom make the wrong choice about using a sequential scan. Can any other
cost estimation benefit? I am afraid not, since before execution we simply
don't know what to read. For b2, the bitmap does provide a way to know which
buffers we should pre-read without contending for the BufMappingLock, but
since contention on the BufMappingLock is not intense, this gives only
marginal benefit.

My previous estimate of the trouble/cost of maintaining this bitmap was too
optimistic. For one thing, we need to compress the bitmaps, since many of
them are sparse. Unlike an uncompressed bitmap, reading a compressed one
without a lock can cause a core dump or a totally wrong result instead of
just a lossy one. Thus, to visit a bitmap we would have to grab at least two
locks as far as I can envision: one for the relation mapping hash, the other
to protect the bitmap contents.

If there are no other benefits to expect, I don't think adding a bitmap is a
good idea. Are there any other benefits that you can foresee?

Regards,
Qingqing 





Re: [HACKERS] Warm-up cache may have its virtue

2006-01-05 Thread Qingqing Zhou


On Thu, 5 Jan 2006, Qingqing Zhou wrote:
>
> Feasibility: Our bufmgr lock rewrite already makes this possible. But to
> enable it, we may need more work: (w1) make the buffer pool relation-wise,
> which makes our estimation of data page residence easier and more reliable;
> (w2) add aggressive pre-read at the buffer pool level. Also, another benefit
> of w1 is that it will let our query planner estimate query cost more
> precisely.
>

"w1" is doable by introducing a shared-memory bitmap indicating which
pages of a relation are in buffer pool (We may want to add a hash to
manage the relations). Theoretically, O(shared_buffer) bits are enough. So
this will not use a lot of space.

When we maintain the SharedBufHash, we maintain this bitmap. When we do
query cost estimation or preread, we just need a rough number, so this can
be done by scanning the bitmap without lock. Thus there is also almost no
extra cost.
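
As a rough sketch of what I have in mind (all names and sizes are
illustrative assumptions, and this ignores the compression and locking
questions raised elsewhere in the thread):

/*
 * Illustrative sketch of a per-relation residence bitmap.  The fixed cap
 * and the way it would hang off the relation hash are assumptions; a real
 * version would need proper sizing or compression.
 */
#define MAX_TRACKED_BLOCKS  16384   /* example cap on blocks tracked per relation */
#define BITMAP_WORDS        ((MAX_TRACKED_BLOCKS + 31) / 32)

typedef struct RelResidenceMap
{
    unsigned int words[BITMAP_WORDS];   /* bit N set => block N is in the buffer pool */
} RelResidenceMap;

/* Called wherever SharedBufHash is updated for this relation's blocks. */
static void
residence_set(RelResidenceMap *map, unsigned int blkno, int in_pool)
{
    if (blkno >= MAX_TRACKED_BLOCKS)
        return;
    if (in_pool)
        map->words[blkno / 32] |= 1u << (blkno % 32);
    else
        map->words[blkno / 32] &= ~(1u << (blkno % 32));
}

/*
 * Rough population count, read without any lock: an approximate answer is
 * fine for cost estimation or pre-read decisions.
 */
static unsigned int
residence_count(const RelResidenceMap *map)
{
    unsigned int count = 0;
    int i;

    for (i = 0; i < BITMAP_WORDS; i++)
    {
        unsigned int w = map->words[i];

        while (w)
        {
            count += w & 1;
            w >>= 1;
        }
    }
    return count;
}
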

Regards,
Qingqing





Re: [HACKERS] Warm-up cache may have its virtue

2006-01-05 Thread Qingqing Zhou


On Thu, 5 Jan 2006, Tom Lane wrote:
>
> The difference between the cached and non-cached states is that the
> kernel has seen fit to remove those pages from its cache.  It is
> reasonable to suppose that it did so because there was a more immediate
> use for the memory.  Trying to override that behavior will therefore
> result in de-optimizing the global performance of the machine.
>

Yeah, so in other words, warming up the cache is just a waste of time if the
pages are already in the OS cache. I agree with this. But does this mean it
may be worth experimenting with another strategy: a big-stomach Postgres,
i.e., one with a big shared_buffers value? With this strategy, (1) almost all
the buffers are under our control, and we will know when a pre-read is
needed; (2) we avoid double buffering: though people are advised not to use a
very big shared_buffers value, in practice I see people gain performance by
increasing it to 20 or more.

Feasibility: Our bufmgr lock rewrite already makes this possible. But to
enable it, we may need more work: (w1) make the buffer pool relation-wise,
which makes our estimation of data page residence easier and more reliable;
(w2) add aggressive pre-read at the buffer pool level. Also, another benefit
of w1 is that it will let our query planner estimate query cost more
precisely.

Regards,
Qingqing



Re: [HACKERS] Warm-up cache may have its virtue

2006-01-05 Thread Tom Lane
Qingqing Zhou <[EMAIL PROTECTED]> writes:
> Hinted by this thread:
>   http://archives.postgresql.org/pgsql-performance/2006-01/msg00016.php
> I wonder if we should really implement the file-system-cache-warmup
> strategy which we have discussed before.

The difference between the cached and non-cached states is that the
kernel has seen fit to remove those pages from its cache.  It is
reasonable to suppose that it did so because there was a more immediate
use for the memory.  Trying to override that behavior will therefore
result in de-optimizing the global performance of the machine.

If the machine is actually dedicated to Postgres, I'd expect disk pages
to stay in cache without our taking any heroic measures to keep them
there.  If they don't, that's a matter for kernel configuration tuning,
not "warmup" processes.

regards, tom lane



[HACKERS] Warm-up cache may have its virtue

2006-01-05 Thread Qingqing Zhou

Hinted by this thread:

http://archives.postgresql.org/pgsql-performance/2006-01/msg00016.php

I wonder if we should really implement the file-system-cache-warmup strategy
which we have discussed before. There are two naturally good places to do
this:

(1) sequential scan
(2) bitmap index scan

We can consider (2) as a generalized version of (1). For (1), we have
mentioned several heuristics, like keeping a scan interval to avoid
competition. These strategies are also applicable to (2).

Question: why the file-system level, instead of the buffer pool level?  For
two reasons: (1) notice that in the above thread the user just uses
"shared_buffers = 8192", which suggests that the file-system level is
already good enough; (2) it is easy to implement.
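
For example, a dedicated backend could simply stream a relation file through
a scratch buffer to populate the OS cache (a crude sketch; the path handling
and the lack of pacing to avoid competing with foreground I/O are
simplifications for illustration):

/*
 * Crude sketch of a file-system-level warm-up: read the relation file
 * sequentially into a throwaway buffer so its pages land in the OS cache,
 * without touching the PostgreSQL buffer pool at all.
 */
#include <fcntl.h>
#include <unistd.h>

#define BLCKSZ  8192

static void
warmup_file(const char *path)
{
    char    buf[BLCKSZ];
    int     fd = open(path, O_RDONLY);

    if (fd < 0)
        return;

    /* The data itself is discarded; the side effect is a warm OS cache. */
    while (read(fd, buf, sizeof(buf)) > 0)
        ;

    close(fd);
}
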

Use a t*h*r*e*a*d? Well, I am a little bit afraid to mention this word.
But we could have some dedicated backends to do this - like the bgwriter.

Let's dirty our hands!

Comments?

Regards,
Qingqing

