Re: [HACKERS] Running a query twice to ensure cached results.
On Tue, 2006-06-13 at 15:00 +0200, Martijn van Oosterhout wrote:
> What you might be able to do is to reduce its effect. The thing that
> occurs to me is to make hint bit changes only mark a page half-dirty.
> If the page is evicted because the space is needed in the buffer cache,
> it can be just dropped. However, the bgwriter will write it as normal.
> Hence, setting hint bits will become a sort of background operation,
> done when there's time.

Yes, I think that's a very good idea. What that introduces is the concept
of "dirty priority" - i.e. some pages are more important to write out
quickly than others. If the bgwriter cleans the higher-priority ones first,
it should do a better job of keeping the bufferpool clean. That needs some
work before it's a fully-fledged proposal.

> It seems to me that if a large table is loaded in a single transaction,
> the check for if the transaction is committed should be cheap because
> it's checking the same transaction id over and over.

I was considering this the other day. My original idea was to set the xmin
to FrozenTransactionId when loading a table with COPY in the same
transaction as the one that created it, but that didn't work because of
MVCC violation. If we just set the hint bit to show XMIN_COMMITTED then we
need not worry about MVCC violations, since the xmin will still be higher
than any pre-existing snapshots. I think Tom had a problem with that
because it relied on file truncation on recovery, but I'm not clear on
exactly why.

-- 
Simon Riggs
EnterpriseDB          http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
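The "half-dirty" idea quoted above could be sketched roughly as follows. This is a toy model, not PostgreSQL's actual buffer manager: `BufState`, `evict`, and `bgwriter_clean` are invented names, and the real mechanism would live inside the shared buffer code.

```c
#include <stdbool.h>

/* Hypothetical buffer states: BUF_HINT_DIRTY means only hint bits
 * changed on the page; BUF_DIRTY means real data changed. */
typedef enum { BUF_CLEAN, BUF_HINT_DIRTY, BUF_DIRTY } BufState;

static int writes = 0;   /* count of pages actually written out */

/* Eviction under cache pressure: a hint-only-dirty page is simply
 * dropped, so setting hint bits never forces a foreground write. */
void evict(BufState *buf)
{
    if (*buf == BUF_DIRTY)
        writes++;        /* real data changes must be flushed */
    *buf = BUF_CLEAN;
}

/* The bgwriter, running when there's time, writes both kinds, so the
 * hint bits still reach disk eventually - a background operation. */
void bgwriter_clean(BufState *buf)
{
    if (*buf != BUF_CLEAN)
        writes++;
    *buf = BUF_CLEAN;
}
```

In this sketch, hint-bit-only pages cost a write only when the bgwriter gets to them, never on eviction - which is exactly the "dirty priority" distinction: truly dirty pages are high priority, hint-dirty pages are low.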
Re: [HACKERS] Running a query twice to ensure cached results.
"Luke Lonergan" <[EMAIL PROTECTED]> writes:
> What you are seeing is the now infamous "Postgres writes a table one more
> time after loading" behavior.

> Tom - can you explain what's going on with this?

Setting XMIN_COMMITTED hint bits.

			regards, tom lane
Re: [HACKERS] Running a query twice to ensure cached results.
Martin,

On 6/13/06 6:00 AM, "Martijn van Oosterhout" wrote:
> What you might be able to do is to reduce its effect. The thing that
> occurs to me is to make hint bit changes only mark a page half-dirty.
> If the page is evicted because the space is needed in the buffer cache,
> it can be just dropped. However, the bgwriter will write it as normal.
> Hence, setting hint bits will become a sort of background operation,
> done when there's time.
> 
> It seems to me that if a large table is loaded in a single transaction,
> the check for if the transaction is committed should be cheap because
> it's checking the same transaction id over and over.

Thanks for the tip - it's now in the mill for mulling over and finding
someone with the time to work something in...

- Luke
Re: [HACKERS] Running a query twice to ensure cached results.
On Tue, Jun 13, 2006 at 05:46:23AM -0700, Luke Lonergan wrote:
> Martin,
> 
> On 6/13/06 5:42 AM, "Martijn van Oosterhout" wrote:
> 
> > Is it possible it may have something to do with the hint bits? There
> > are a bunch of bits in the header to deal with speeding up of MVCC
> > tests. Maybe changing those bits marks the page dirty and forces a
> > write?
> 
> Yes, that's it. Writing data twice (three times including WAL) is a Very
> Bad Thing for large data work - how should we fix it?

Well, I don't think you want to get rid of it entirely, because otherwise,
forevermore, every lookup in that table will require a check to see if the
transaction is committed. So at some point the hint bit needs to be set
and/or the xmin frozen (vacuum freeze does that, for example).

What you might be able to do is to reduce its effect. The thing that
occurs to me is to make hint bit changes only mark a page half-dirty.
If the page is evicted because the space is needed in the buffer cache,
it can be just dropped. However, the bgwriter will write it as normal.
Hence, setting hint bits will become a sort of background operation,
done when there's time.

It seems to me that if a large table is loaded in a single transaction,
the check for whether the transaction is committed should be cheap because
it's checking the same transaction id over and over.

Have a nice day,
-- 
Martijn van Oosterhout   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to
> litigate.
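The "same transaction id over and over" observation amounts to a one-entry cache over the commit check. A minimal sketch of that idea - `clog_lookup` and `xid_committed` are hypothetical stand-ins here, not the real pg_clog interface:

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the commit-log lookup; counts how often we pay it. */
static int clog_lookups = 0;
static bool clog_lookup(uint32_t xid)
{
    clog_lookups++;
    return true;   /* pretend every xid we ask about has committed */
}

/* One-entry cache over the commit check: when a table was loaded in a
 * single transaction, every tuple carries the same xid, so all but the
 * first check hit the cache. */
static uint32_t cached_xid;
static bool cached_committed;
static bool have_cached = false;

bool xid_committed(uint32_t xid)
{
    if (have_cached && xid == cached_xid)
        return cached_committed;
    cached_committed = clog_lookup(xid);
    cached_xid = xid;
    have_cached = true;
    return cached_committed;
}
```

Scanning a million tuples that all share one xid then costs a single expensive lookup, which is why the check itself is cheap after a bulk load - it's the page writes from setting the hint bits, not the commit check, that hurt.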
Re: [HACKERS] Running a query twice to ensure cached results.
Martin,

On 6/13/06 5:42 AM, "Martijn van Oosterhout" wrote:
> Is it possible it may have something to do with the hint bits? There
> are a bunch of bits in the header to deal with speeding up of MVCC
> tests. Maybe changing those bits marks the page dirty and forces a
> write?

Yes, that's it. Writing data twice (three times including WAL) is a Very
Bad Thing for large data work - how should we fix it?

- Luke
Re: [HACKERS] Running a query twice to ensure cached results.
On Tue, Jun 13, 2006 at 04:54:05AM -0700, Luke Lonergan wrote:
> > Experimental results here suggest that for larger tables Linux seems
> > to detect a seq-scan and not bother caching. It's very reproducible
> > for me here to do a reboot and not see the full speedup on a seq_scan
> > until the third time I run a query.
> 
> What you are seeing is the now infamous "Postgres writes a table one more
> time after loading" behavior.
> 
> Simon Riggs once dug into it to find the root cause, and I no longer recall
> exactly why, but after you've loaded data, the first seq scan will re-write
> some large portion of the data while doing the initial scan. This wreaks
> havoc on normal benchmarking practices.

Is it possible it may have something to do with the hint bits? There
are a bunch of bits in the header to deal with speeding up of MVCC
tests. Maybe changing those bits marks the page dirty and forces a
write?

Have a nice day,
-- 
Martijn van Oosterhout   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to
> litigate.
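The bits Martijn is referring to live in the tuple header. A toy illustration of why setting one forces a write - the flag values and struct here are made up for the example, not the real t_infomask layout:

```c
#include <stdint.h>

/* Illustrative hint-bit flags, loosely modeled on the visibility hints
 * in PostgreSQL's tuple header; values are invented, not the on-disk
 * layout. */
#define HINT_XMIN_COMMITTED 0x0001
#define HINT_XMIN_INVALID   0x0002
#define HINT_XMAX_COMMITTED 0x0004
#define HINT_XMAX_INVALID   0x0008

typedef struct {
    uint16_t infomask;   /* per-tuple visibility hints */
    int      page_dirty; /* 1 if the containing page must be written */
} Tuple;

/* Setting a hint bit modifies the page image, so the buffer holding it
 * is marked dirty - which is what turns the first post-load scan into
 * a rewrite of the whole table. */
void set_hint_bit(Tuple *t, uint16_t bit)
{
    if ((t->infomask & bit) == 0) {
        t->infomask |= bit;
        t->page_dirty = 1;
    }
}
```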
Re: [HACKERS] Running a query twice to ensure cached results.
Ron,

On 6/8/06 11:49 AM, "Ron Mayer" <[EMAIL PROTECTED]> wrote:
> Experimental results here suggest that for larger tables Linux seems
> to detect a seq-scan and not bother caching. It's very reproducible
> for me here to do a reboot and not see the full speedup on a seq_scan
> until the third time I run a query.

What you are seeing is the now infamous "Postgres writes a table one more
time after loading" behavior.

Simon Riggs once dug into it to find the root cause, and I no longer recall
exactly why, but after you've loaded data, the first seq scan will re-write
some large portion of the data while doing the initial scan. This wreaks
havoc on normal benchmarking practices.

Tom - can you explain what's going on with this? It seems to write more
than just the contents of the WAL, so it's not a flush of the WAL writes
AFAICT.

- Luke
Re: [HACKERS] Running a query twice to ensure cached results.
Ron Mayer <[EMAIL PROTECTED]> writes:
> Does running a query only twice really insure that a result is cached?
> It seems not to be the case for seq-scans on Linux.

Should work for tables small enough to fit into the shared_buffers arena.
I wouldn't necessarily assume it for large tables.

Note that what I was really doing was taking the timing measurement again
on data cached by the *first* run, so that I would have something that
could fairly be compared to the following EXPLAIN ANALYZE --- which of
course would likewise be operating on cached data.

			regards, tom lane
[HACKERS] Running a query twice to ensure cached results.
Tom Lane wrote:
> -- do it again to ensure fully cached
> bench=# select count(*) from accounts;

Short summary:

Does running a query only twice really insure that a result is cached?
It seems not to be the case for seq-scans on Linux. I think this may
matter to the discussions about a readahead thread/process that come up
on this list occasionally.

Experimental results here suggest that for larger tables Linux seems
to detect a seq-scan and not bother caching. It's very reproducible
for me here to do a reboot and not see the full speedup on a seq_scan
until the third time I run a query.

An example shown below [1] shows that the third run of a query is faster
than the second run. The output of a 'vmstat 5' [2] while these queries
were happening agrees that significant I/O was still happening on the
second run, but no I/O happened the third time. The table comfortably
fits in memory (700MB table on a 2GB RAM machine) and the machine was
otherwise idle, so no one else wiped out the cache between the first and
second runs.

Why do I think this is worth mentioning here?

* I think it impacts the occasional thread about wanting to include
  logic in postgresql for readahead [3], or the threads suggesting
  hinting to the OS through madvise or similar to avoid caching
  seq-scans. It seems that Linux is detecting and at least somewhat
  reacting to seq scans even with no hinting. Anything added to
  postgresql might end up being a duplicated effort. I think Bruce
  suggested that Solaris does this free-behind automatically [4], but
  this is the first I've noticed that Linux seems to do similar.

* I think it matters to people who post explain analyze twice without
  running it so often they get stable results. (I note that this was
  not a problem for Tom, since the timing of his first and second runs
  were the same, so I assume he was just saying that he observed that
  the query was cached rather than that the first run forced the second
  run to be cached.)
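The OS-hinting alternative mentioned above (madvise or similar) can be sketched with posix_fadvise on the table's file descriptor. This is an illustration of the hinting approach under discussion, not something PostgreSQL did at the time; `hinted_scan` is an invented helper name:

```c
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>

/* Tell the kernel a file will be read sequentially, then tell it not
 * to keep the pages afterwards - explicit readahead plus free-behind,
 * the two behaviors Linux appears to approximate on its own. */
int hinted_scan(int fd)
{
    /* before the scan: encourage aggressive readahead */
    if (posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL) != 0)
        return -1;
    /* ... the sequential read of the file would go here ... */
    /* after the scan: don't let one big seq scan evict the whole cache */
    if (posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED) != 0)
        return -1;
    return 0;
}
```

If the kernel already detects seq scans and applies free-behind on its own, as the results above suggest, explicit hints like these would indeed be duplicated effort.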
  Ron

==================================================================
== [note 1] the repeated queries showing the speedup after 3 runs.
== Running the same select count(*) 4 times after a clean reboot.
== Seems the OS's caching logic decided that the first seq_scan
== wasn't 'interesting' enough
==================================================================

fli=# select count(*) from facets_s;
  count
----------
 15976558
(1 row)

Time: 29788.047 ms

fli=# select count(*) from facets_s;
  count
----------
 15976558
(1 row)

Time: 19344.573 ms

fli=# select count(*) from facets_s;
  count
----------
 15976558
(1 row)

Time: 13411.272 ms

fli=# select count(*) from facets_s;
  count
----------
 15976558
(1 row)

Time: 13107.856 ms

==================================================================
== [note 2] vmstat 5 while the above queries were being run
==================================================================

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1 1140 62140 71256 713360004731 9284 7 1 92 0
*** the first time
 1 0140 50860 31912 80830402 2521529 1147 2612 49 15 0 36
 1 0360 54420 2 85524000 23934 7 1139 2553 47 14 0 39
 0 1360 54008 11100 87870800 2370425 1149 2467 46 12 0 41
 0 1360 52512 11140 89659200 24062 6 1135 2460 47 11 0 41
*** the second time
 0 0360 52688 11172 90691600 1335719 1085 1989 31 7 38 24
 1 0360 53976 11076 9125400 44 1427357 1113 2102 32 7 29 32
 2 0360 54788 10908 92378800 2450954 1171 2474 46 12 0 42
 1 0360 54944 3096 93994800 1118039 1093 1976 65 13 0 22
*** the third time
 3 0360 54280 3872 94050800 26414 1041 1560 85 15 0 0
 1 0360 53852 3904 940940008829 1022 1505 53 9 36 2
 2 0360 51616 4052 94306800 44354 1037 1552 82 15 0 4
 1 0360 51488 4060 9431800022 2 1013 1522 84 16 0 0

[3] http://archives.postgresql.org/pgsql-hackers/2005-11/msg01449.php
[4] http://archives.postgresql.org/pgsql-performance/2003-10/msg00188.php