Re: [HACKERS] Running a query twice to ensure cached results.

2006-06-13 Thread Simon Riggs
On Tue, 2006-06-13 at 15:00 +0200, Martijn van Oosterhout wrote:

> What you might be able to do is to reduce its effect. The thing that
> occurs to me is to make hint bit changes only mark a page half-dirty.
> If the page is evicted because the space is needed in the buffer cache,
> it can be just dropped. However, the bgwriter will write it as normal.
> Hence, setting hint bits will become a sort of background operation,
> done when there's time.

Yes, I think that's a very good idea. What that introduces is the
concept of "dirty priority" - i.e. some pages are more important to
write out quickly than others. If the bgwriter cleans the higher
priority ones first it should do a better job of keeping the bufferpool
clean. That needs some work on it before its a fully fledged proposal.

> It seems to me that if a large table is loaded in a single transaction,
> the check for if the transaction is committed should be cheap because
> it's checking the same transaction id over and over.

I was considering this the other day. My original idea was to set the
xmin to be FrozenTransaction when loading a table with COPY in the same
transaction as the one that loaded it but that didn't work because of
MVCC violation. If we just set the hint bit to show XMIN_COMMITTED then
we need not worry about MVCC violations, since the xmin will still be
higher than any pre-existing snapshots.

I think Tom had a problem with that because it relied on file truncation
on recovery, but I'm not clear on why exactly?

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Running a query twice to ensure cached results.

2006-06-13 Thread Tom Lane
"Luke Lonergan" <[EMAIL PROTECTED]> writes:
> What you are seeing is the now infamous "Postgres writes a table one more
> time after loading" behavior.

> Tom - can you explain what's going on with this?

Setting XMIN_COMMITTED hint bits.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Running a query twice to ensure cached results.

2006-06-13 Thread Luke Lonergan
Martin,

On 6/13/06 6:00 AM, "Martijn van Oosterhout"  wrote:

> What you might be able to do is to reduce its effect. The thing that
> occurs to me is to make hint bit changes only mark a page half-dirty.
> If the page is evicted because the space is needed in the buffer cache,
> it can be just dropped. However, the bgwriter will write it as normal.
> Hence, setting hint bits will become a sort of background operation,
> done when there's time.
> 
> It seems to me that if a large table is loaded in a single transaction,
> the check for if the transaction is committed should be cheap because
> it's checking the same transaction id over and over.

Thanks for the tip - it's now in the mill for mulling over and finding
someone with the time to work something in...

- Luke



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Running a query twice to ensure cached results.

2006-06-13 Thread Martijn van Oosterhout
On Tue, Jun 13, 2006 at 05:46:23AM -0700, Luke Lonergan wrote:
> Martin,
> 
> On 6/13/06 5:42 AM, "Martijn van Oosterhout"  wrote:
> 
> > Is it possible it may have something to do with the hint bits? There's
> > are a bunch of bits in the header to deal with speeding up of MVCC
> > tests. Maybe changing those bits marks the page dirty and forces a
> > write?
> 
> Yes, that's it.  Writing data twice (three times including WAL) is a Very
> Bad Thing for large data work - how should we fix it?

Well, I don't think you want to get rid of it entirely because
otherwise forevermore, every lookup in that table will require a check
to see if the transaction is committed. So at some point the hint bit
needs to be set and/or the xmin frozen (vacuum freeze does that for
example).

What you might be able to do is to reduce its effect. The thing that
occurs to me is to make hint bit changes only mark a page half-dirty.
If the page is evicted because the space is needed in the buffer cache,
it can be just dropped. However, the bgwriter will write it as normal.
Hence, setting hint bits will become a sort of background operation,
done when there's time.

It seems to me that if a large table is loaded in a single transaction,
the check for if the transaction is committed should be cheap because
it's checking the same transaction id over and over.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] Running a query twice to ensure cached results.

2006-06-13 Thread Luke Lonergan
Martin,

On 6/13/06 5:42 AM, "Martijn van Oosterhout"  wrote:

> Is it possible it may have something to do with the hint bits? There's
> are a bunch of bits in the header to deal with speeding up of MVCC
> tests. Maybe changing those bits marks the page dirty and forces a
> write?

Yes, that's it.  Writing data twice (three times including WAL) is a Very
Bad Thing for large data work - how should we fix it?

- Luke



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Running a query twice to ensure cached results.

2006-06-13 Thread Martijn van Oosterhout
On Tue, Jun 13, 2006 at 04:54:05AM -0700, Luke Lonergan wrote:
> > Experimental results here suggest that for larger tables Linux seems
> > to detect a seq-scan and not bother caching.   It's very reproducible
> > for me here to do a reboot and not see the full speedup on a seq_scan
> > until the third time I run a query.su
> 
> What you are seeing is the now infamous "Postgres writes a table one more
> time after loading" behavior.
> 
> Simon Riggs once dug into it to find the root cause, and I no longer recall
> exactly why, but after you've loaded data, the first seq scan will re-write
> some large portion of the data while doing the initial scan. This wreaks
> havoc on normal benchmarking practices.

Is it possible it may have something to do with the hint bits? There's
are a bunch of bits in the header to deal with speeding up of MVCC
tests. Maybe changing those bits marks the page dirty and forces a
write?

Have a ncie day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] Running a query twice to ensure cached results.

2006-06-13 Thread Luke Lonergan
Ron,

On 6/8/06 11:49 AM, "Ron Mayer" <[EMAIL PROTECTED]> wrote:

> Experimental results here suggest that for larger tables Linux seems
> to detect a seq-scan and not bother caching.   It's very reproducible
> for me here to do a reboot and not see the full speedup on a seq_scan
> until the third time I run a query.su

What you are seeing is the now infamous "Postgres writes a table one more
time after loading" behavior.

Simon Riggs once dug into it to find the root cause, and I no longer recall
exactly why, but after you've loaded data, the first seq scan will re-write
some large portion of the data while doing the initial scan. This wreaks
havoc on normal benchmarking practices.

Tom - can you explain what's going on with this?  It seems to write more
than just the contents of the WAL, so it's not a flush of the WAL writes
AFAICT.

- Luke 



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Running a query twice to ensure cached results.

2006-06-08 Thread Tom Lane
Ron Mayer <[EMAIL PROTECTED]> writes:
>   Does running a query only twice really insure that a result is cached?
>   It seems not to be the case for seq-scans on Linux.

Should work for tables small enough to fit into the shared_buffers
arena.  I wouldn't necessarily assume it for large tables.

Note that what I was really doing was taking the timing measurement
again on data cached by the *first* run, so that I would have something
that could fairly be compared to the following EXPLAIN ANALYZE --- which
of course would likewise be operating on cached data.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[HACKERS] Running a query twice to ensure cached results.

2006-06-08 Thread Ron Mayer

Tom Lane wrote:

-- do it again to ensure fully cached
bench=# select count(*) from accounts;


Short summary:

 Does running a query only twice really insure that a result is cached?
 It seems not to be the case for seq-scans on Linux.

 I think this may matters to the discussions about a readahead
 thread/process that come up on this list that come up here occasionally.

Experimental results here suggest that for larger tables Linux seems
to detect a seq-scan and not bother caching.   It's very reproducible
for me here to do a reboot and not see the full speedup on a seq_scan
until the third time I run a query.su  An example shown below [1] shows
that the third run of a query is faster than the second run.  The
output of a 'vmstat 5' [2] while these queries was happening agrees
that significant I/O was still happening on the second run, but
no I/O happened the third time.   The table comfortably fits in
memory (700MB table on a 2GB ram machine) and the machine was
otherwise idle so noone else wiped out the cache between the
first and second runs.

Why do I think this is worth mentioning here?
  * I think it impacts the occasional thread about wanting
to include logic in postgresql for readahead [3] or for
the threads suggesting hinting to the the OS though madvise
or similar to avoid caching seq-scans.   It seems that the
Linux is detecting and at least somewhat reacting
to seq scans even with no hinting.  Anything added
to postgresql might end up being a duplicated effort.
I think Bruce suggested that Solaris does this free-behind
automatically [4], but this is the first I've noticed
that Linux seems to do similar.

  * I think it matters to people who post explain analyze
twice without running it so often they get stable results.
(I note that this was not a problem for Tom since the
timing of his first and second runs were the same so
I assume he was just saying that he observed that the
query was cached rather than that the first run forced
the second run to be cached.)

Ron


=
== [note 1] the repeated queries showing the speedup after 3 runs.
== Running the same select count(*) 4 times after a clean reboot.
== Seems the OS's caching logic decided that the first seq_scan
== wasn't 'interesting' enough
=
fli=# select count(*) from facets_s;
  count
--
 15976558
(1 row)

Time: 29788.047 ms
fli=# select count(*) from facets_s;
  count
--
 15976558
(1 row)

Time: 19344.573 ms
fli=# select count(*) from facets_s;
  count
--
 15976558
(1 row)

Time: 13411.272 ms
fli=# select count(*) from facets_s;
  count
--
 15976558
(1 row)

Time: 13107.856 ms


# [note 2] vmstat 5 while the above queries were being run


procs ---memory-- ---swap-- -io --system-- cpu
 r  b   swpd   free   buff  cache   si   sobibo   incs us sy id wa
 1  1140  62140  71256 713360004731   9284  7  1 92  0
*** the first time
 1  0140  50860  31912 80830402 2521529 1147  2612 49 15  0 36
 1  0360  54420  2 85524000 23934 7 1139  2553 47 14  0 39
 0  1360  54008  11100 87870800 2370425 1149  2467 46 12  0 41
 0  1360  52512  11140 89659200 24062 6 1135  2460 47 11  0 41
*** the second time
 0  0360  52688  11172 90691600 1335719 1085  1989 31  7 38 24
 1  0360  53976  11076 9125400   44 1427357 1113  2102 32  7 29 32
 2  0360  54788  10908 92378800 2450954 1171  2474 46 12  0 42
 1  0360  54944   3096 93994800 1118039 1093  1976 65 13  0 22
*** the third time
 3  0360  54280   3872 94050800   26414 1041  1560 85 15  0  0
 1  0360  53852   3904 940940008829 1022  1505 53  9 36  2
 2  0360  51616   4052 94306800   44354 1037  1552 82 15  0  4
 1  0360  51488   4060 9431800022 2 1013  1522 84 16  0  0

#
[3] http://archives.postgresql.org/pgsql-hackers/2005-11/msg01449.php
[4] http://archives.postgresql.org/pgsql-performance/2003-10/msg00188.php

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly