Re: [HACKERS] New Access Method
Also, any pointers where I can find the operator classes and functions for the GiST index? Thanks

On 04/03/07, Alan Gatt [EMAIL PROTECTED] wrote: Ok, so I am using GCC 3.2 as a compiler, and the following is the error message:

make[4]: Entering directory `/c/dev/postgres/pgsql_tip/src/backend/access/mdim'
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -g -I../../../../src/include -I./src/include/port/win32 -DEXEC_BACKEND -I../../../../src/include/port/win32 -DBUILDING_DLL -c -o mdim.o mdim.c
make[4]: *** [mdim.o] Error 3
make[4]: Target `SUBSYS.o' not remade because of errors.
make[4]: Leaving directory `/c/dev/postgres/pgsql_tip/src/backend/access/mdim'
make[3]: *** [mdim-recursive] Error 2

If I choose the GiST development, do you know of any documentation which can help me? Thanks, Alan

On 04/03/07, Alvaro Herrera [EMAIL PROTECTED] wrote: Alan Gatt wrote: Hello, I am trying to create a new index which is multidimensional (based on R-Tree) which can be used for data warehousing. I have read the Developers' manual about adding new indexes, and now I want to start coding the new index.

Have you considered coding it using the GiST infrastructure? Should be much easier, and automatically give crash safety, concurrency, and a shorter development time. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] New Access Method
On Sun, Mar 04, 2007 at 09:32:58AM +0100, Alan Gatt wrote: Also, any pointers where I can find the operator classes and functions for the GIST index? The documentation is a good start: http://www.postgresql.org/docs/8.1/static/gist.html There are also plenty of examples in the contrib portion of the source code. Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ From each according to his ability. To each according to his ability to litigate. signature.asc Description: Digital signature
Re: [HACKERS] ERROR: operator does not exist: integer !=- integer
On 2007-03-04, William ZHANG [EMAIL PROTECTED] wrote: Here are the steps to reproduce it in CVS HEAD: backend> select -1 !=-1; This arguably isn't a bug, because != is not a standard SQL operator, and therefore !=- can legitimately be defined as a single operator by the user. -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Synchronized Scan update
On Fri, 2007-03-02 at 15:03 -0800, Jeff Davis wrote: Is there any consensus about whether to include these two parameters as GUCs or constants if my patch is to be accepted? (1) sync_scan_threshold: Use synchronized scanning for tables greater than this many pages; smaller tables will not be affected.

That sounds OK.

(2) sync_scan_offset: Start a new scan this many pages before a currently running scan to take advantage of the pages that are likely already in cache.

I'm somewhat dubious about this parameter, I have to say, even though I am eager for this feature. It seems like a magic parameter that works only when we have the right knowledge to set it correctly. How will we know what to default it to, and how will we know whether to set it higher or lower for better performance? Does that value vary according to the workload on the system? How? I'm worried that we get a feature that works well on simple tests and not at all in real world circumstances. I don't want to cast doubt on what could be a great patch or be negative: I just see that the feature relies on the dynamic behaviour of the system. I'd like to see some further studies on how this works to make sure that we can realistically know how to set this knob, that it's the correct knob, and that it's the only one we need.

Further thoughts: It sounds like sync_scan_offset is related to effective_cache_size. Can you comment on whether that might be something we can use as well/instead? (i.e. set the scan offset to, say, K * effective_cache_size, 0.1 <= K <= 0.5)??? Might we do roughly the same thing with sync_scan_threshold as well, and just have enable_sync_scan instead? i.e. sync_scan_threshold = effective_cache_size? When would those two parameters not be connected directly to each other? -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] UPSERT
On Fri, 2007-03-02 at 10:13, Tom Lane wrote: Andrew Dunstan [EMAIL PROTECTED] writes: My instinct would be to follow your first strategy, i.e. detect which path is needed rather than try one and then if it fails do the other. The very first thing you need to think about is how to solve the race condition problem, ie, two backends concurrently trying to insert identical data.

Then one of them will update the data inserted by whoever got the insert first.

Until you have a plausible mechanism for that, the whole thing is pie-in-the-sky.

Isn't the standard way of doing it thus:

UPDATE
IF NOT FOUND THEN
    INSERT
    IF DUPLICATE KEY THEN
        UPDATE
    END IF
END IF

At least this is how UPSERT is usually done in plpgsql -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] UPSERT
On Sun, Mar 04, 2007 at 14:55:47 +0200, Hannu Krosing [EMAIL PROTECTED] wrote:

UPDATE
IF NOT FOUND THEN
    INSERT
    IF DUPLICATE KEY THEN
        UPDATE
    END IF
END IF

I believe it is possible for the above to fail. For example another transaction could create a matching record between the update and insert and then another transaction could delete it between the insert and the second update. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] UPSERT
On Sun, 2007-03-04 at 07:46, Bruno Wolff III wrote: On Sun, Mar 04, 2007 at 14:55:47 +0200, Hannu Krosing [EMAIL PROTECTED] wrote:

UPDATE
IF NOT FOUND THEN
    INSERT
    IF DUPLICATE KEY THEN
        UPDATE
    END IF
END IF

I believe it is possible for the above to fail. For example another transaction could create a matching record between the update and insert and then another transaction could delete it between the insert and the second update.

Then we may do the second part as a loop and hope that eventually we hit the right point with either INSERT or UPDATE:

UPDATE
WHILE NOT FOUND THEN
    INSERT
    IF DUPLICATE KEY THEN
        UPDATE
    END IF
END WHILE

-- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] UPSERT
Bruno Wolff III wrote: On Sun, Mar 04, 2007 at 14:55:47 +0200, Hannu Krosing [EMAIL PROTECTED] wrote:

UPDATE
IF NOT FOUND THEN
    INSERT
    IF DUPLICATE KEY THEN
        UPDATE
    END IF
END IF

I believe it is possible for the above to fail. For example another transaction could create a matching record between the update and insert and then another transaction could delete it between the insert and the second update.

You know we have an example of this in the manual, right? http://www.postgresql.org/docs/current/static/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE :) -- Regards Petr Jelinek (PJMODOS) ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] UPSERT
On Sun, Mar 04, 2007 at 02:55:47PM +0200, Hannu Krosing wrote: Isn't the standard way of doing it thus:

UPDATE
IF NOT FOUND THEN
    INSERT
    IF DUPLICATE KEY THEN
        UPDATE
    END IF
END IF

At least this is how UPSERT is usually done in plpgsql

Well, you need to loop, because that last UPDATE can get a not-found again, so you have to keep trying both until they work. I think MERGE would still be cool, because then it's only one command that has to be repeated, rather than two. Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ From each according to his ability. To each according to his ability to litigate. signature.asc Description: Digital signature
Re: [HACKERS] ERROR: operator does not exist: integer !=- integer
Andrew - Supernews wrote: On 2007-03-04, William ZHANG [EMAIL PROTECTED] wrote: Here are the steps to reproduce it in CVS HEAD: backend> select -1 !=-1; This arguably isn't a bug, because != is not a standard SQL operator, and therefore !=- can legitimately be defined as a single operator by the user.

I missed the first post and can't seem to search for it - so correct me if I am missing something. Isn't the problem here a missing space? != is a valid operator and -1 is the value you are comparing to. !=-1 is not valid, but != -1 is correct and what I assume you are looking to achieve. The negation operator goes with the int being negated and is not part of the comparison operator !=; the space is needed there to separate the two. -- Shane Ambler [EMAIL PROTECTED] Get Sheeky @ http://Sheeky.Biz ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[HACKERS] ERROR: operator does not exist: integer !=- integer
Here are the steps to reproduce it in CVS HEAD:

$ uname -a
Linux os-server 2.6.9-11.19AX #1 Fri Aug 5 05:12:07 EDT 2005 i686 i686 i386 GNU/Linux
$ ./postgres --single -D $HOME/pgsql/data postgres

PostgreSQL stand-alone backend 8.3devel
backend> show server_version;
1: server_version (typeid = 25, len = -1, typmod = -1, byval = f)
1: server_version = 8.3devel (typeid = 25, len = -1, typmod = -1, byval = f)
backend> select -1 != -1;
1: ?column? (typeid = 16, len = 1, typmod = -1, byval = t)
1: ?column? = f (typeid = 16, len = 1, typmod = -1, byval = t)
backend> select -1 !=-1;
ERROR: operator does not exist: integer !=- integer at character 11
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
STATEMENT: select -1 !=-1;

A quick hack in scan.l:

*** src/backend/parser/scan.l.old	2007-03-04 11:39:56.831289992 +0800
--- src/backend/parser/scan.l	2007-03-04 11:40:04.142178568 +0800
***************
*** 605,610 ****
--- 605,617 ----
  {
  	int ic;

+ 	/* filter out operators ending with '=' */
+ 	if (yytext[nchars - 2] == '=')
+ 	{
+ 		nchars--;
+ 		continue;
+ 	}
+
  	for (ic = nchars - 2; ic >= 0; ic--)
  	{
  		if (strchr("~!@#^&|`?%", yytext[ic]))

Now the result is correct:

backend> select -1 !=-1;
1: ?column? (typeid = 16, len = 1, typmod = -1, byval = t)
1: ?column? = f (typeid = 16, len = 1, typmod = -1, byval = t)

-- Regards, William ZHANG ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[HACKERS] Trivial HugeTLB Benchmark
Hey,

Out of curiosity I benchmarked PostgreSQL 8.2.3 using huge pages for shared memory. Oracle claims fairly significant speedups with huge pages but I couldn't find any information on PostgreSQL. I used the attached patch to enable huge pages on Linux.

The test hardware is a dual Nocona Xeon 3.2GHz with 4GB of RAM and two 15K 73GB Ultra320 disks in a software RAID-1. The box is running CentOS 4.4 for x86-64 and the vendor's stock 2.6.9 kernel. The relevant postgresql.conf settings are: shared_buffers=160MB work_mem=8MB fsync=off full_page_writes=off effective_cache_size=3GB

I ran each pgbench after a fresh reboot. I used 85 huge pages reserved at boot for the huge page test, and none for the normal shared memory test.

Normal shared memory:
-bash-3.00$ pgbench -c 5 -t 1
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
number of clients: 5
number of transactions per client: 1
number of transactions actually processed: 5/5
tps = 1669.009344 (including connections establishing)
tps = 1669.941756 (excluding connections establishing)

Huge pages:
-bash-3.00$ pgbench -c 5 -t 1
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
number of clients: 5
number of transactions per client: 1
number of transactions actually processed: 5/5
tps = 1678.392138 (including connections establishing)
tps = 1679.268344 (excluding connections establishing)

Assuming that this is a representative benchmark, it looks like huge pages are a very slight (0.5%) performance win. I'm guessing that PostgreSQL doesn't benefit as much as Oracle due to its much less ridiculous shared memory size. That performance boost is almost certainly not worth the platform-specific code or administration overhead of hugetlb on Linux.

-Ryan

diff -u postgresql-8.2.3/src/backend/port/sysv_shmem.c /usr/src/redhat/BUILD/postgresql-8.2.3/src/backend/port/sysv_shmem.c
--- postgresql-8.2.3/src/backend/port/sysv_shmem.c	2006-07-13 22:28:28.0 -0700
+++ /usr/src/redhat/BUILD/postgresql-8.2.3/src/backend/port/sysv_shmem.c	2007-03-03 17:05:37.0 -0800
@@ -46,6 +46,7 @@
 #define PG_SHMAT_FLAGS 0
 #endif

+#define PG_LARGE_PAGE_SIZE 2048 * 1024
 unsigned long UsedShmemSegID = 0;
 void *UsedShmemSegAddr = NULL;
@@ -75,7 +76,8 @@
 	IpcMemoryId shmid;
 	void *memAddress;

-	shmid = shmget(memKey, size, IPC_CREAT | IPC_EXCL | IPCProtection);
+	size = ((size - 1) & ~(PG_LARGE_PAGE_SIZE - 1)) + PG_LARGE_PAGE_SIZE;
+	shmid = shmget(memKey, size, IPC_CREAT | IPC_EXCL | IPCProtection | SHM_HUGETLB);
 	if (shmid < 0)
 	{
---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] ERROR: operator does not exist: integer !=- integer
William ZHANG [EMAIL PROTECTED] writes: backend select -1 !=-1; ERROR: operator does not exist: integer !=- integer at character 11 This is not a bug. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Trivial HugeTLB Benchmark
Ryan Cumming [EMAIL PROTECTED] writes: I ran each pgbench after a fresh reboot. I used 85 huge pages reserved at boot for the huge page test, and none for the normal shared memory test. Normal shared memory: -bash-3.00$ pgbench -c 5 -t 1 starting vacuum...end. transaction type: TPC-B (sort of) scaling factor: 1 number of clients: 5 number of transactions per client: 1 number of transactions actually processed: 5/5 tps = 1669.009344 (including connections establishing) tps = 1669.941756 (excluding connections establishing)

If you did this only once, the results are not really trustworthy; you need to average several similar runs before you can have much confidence. pgbench's inter-run variation is usually upwards of 10%, so trying to draw conclusions about half-percentage-point differences without averaging is a waste of time. Also, if scaling factor <= number of clients then what you're mostly measuring is update-contention behavior. Try it with -s 10 and -c 5; and don't forget to reinitialize the database for each run of tests to be sure it's fair. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Synchronized Scan update
(2) sync_scan_offset: Start a new scan this many pages before a currently running scan to take advantage of the pages that are likely already in cache.

I'm somewhat dubious about this parameter, I have to say, even though I am eager for this feature. It seems like a magic parameter that works only when we have the right knowledge to set it correctly.

Hello, Don't get me wrong, I want things to be easily understandable as well, but the reason you cite above pretty much makes us need to remove most of the postgresql.conf, including all bgwriter, vacuum cost delay, and autovac settings. Not to mention commit delay and others ;). Sincerely, Joshua D. Drake ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] New Access Method
Alan Gatt [EMAIL PROTECTED] writes: make[4]: Entering directory `/c/dev/postgres/pgsql_tip/src/backend/access/mdim' gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -g -I../../../../src/include -I./src/include/port/win32 -DEXEC_BACKEND -I../../../../src/include/port/win32 -DBUILDING_DLL -c -o mdim.o mdim.c make[4]: *** [mdim.o] Error 3 Umm ... what happened to the actual compiler error message? I'd have expected to see something in between those lines. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] ERROR: operator does not exist: integer !=- integer
I missed the first post and can't seem to search for it - so correct me if I am missing something. Isn't the problem here a missing space? != is a valid operator and -1 is the value you are comparing to. !=-1 is not valid but != -1 is correct and what I assume you are looking to achieve.

Well yes, it will work if you add a space, but technically the problem is the query should be written like this: 1 <> -1 or 1 != -1

Joshua D. Drake

The negation operator goes with the int being negated and is not part of the comparison operator !=; the space is needed there to separate the two. ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
[HACKERS] Bug: Buffer cache is not scan resistant
I'm putting this out there before we publish a fix so that we can discuss how best to fix it.

Doug and Sherry recently found the source of an important performance issue with the Postgres shared buffer cache. The issue is summarized like this: the buffer cache in PGSQL is not scan resistant as advertised. A sequential scan of a table larger than cache will pollute the buffer cache in almost all circumstances.

Here is performance of GPDB 2.301 (Postgres 8.1.6) on a single X4500 (thumper-3) with 4 cores, where bigtable is a table 2x the size of RAM and memtable is a table that fits into I/O cache. With our default setting of shared_buffers (16MB):

Operation          memtable    bigtable
---------------------------------------
SELECT COUNT(*)    1221 MB/s    973 MB/s
VACUUM             1709 MB/s   1206 MB/s

We had observed that VACUUM would perform better when done right after a SELECT. In the above example, the faster rate from disk was 1608 MB/s, compared to the normal rate of 1206 MB/s. We verified this behavior on Postgres 8.2 as well. The buffer selection algorithm is choosing buffer pages scattered throughout the buffer cache in almost all circumstances.

Sherry traced the behavior to the processor repeatedly flushing the L2 cache. Doug found that we weren't using the Postgres buffer cache the way we expected; instead we were loading the scanned data from disk into the cache even though there was no possibility of reusing it. In addition to pushing other, possibly useful pages from the cache, it has the additional behavior of invalidating the L2 cache for the remainder of the executor path that uses the data.

To prove that the buffer cache was the source of the problem, we dropped the shared buffer size to fit into L2 cache (1MB per Opteron core), and this is what we saw (drop size of shared buffers to 680KB):

Operation          memtable    bigtable
---------------------------------------
SELECT COUNT(*)    1320 MB/s   1059 MB/s
VACUUM             3033 MB/s   1597 MB/s

These results do not vary with the order of operations. Thoughts on the best way to fix the buffer selection algorithm?
Ideally, one page would be used in the buffer cache in circumstances where the table to be scanned is (significantly?) larger than the size of the buffer cache. - Luke ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Bug: Buffer cache is not scan resistant
Luke Lonergan [EMAIL PROTECTED] writes: The issue is summarized like this: the buffer cache in PGSQL is not scan resistant as advertised. Sure it is. As near as I can tell, your real complaint is that the bufmgr doesn't attempt to limit its usage footprint to fit in L2 cache; which is hardly surprising considering it doesn't know the size of L2 cache. That's not a consideration that we've ever taken into account. I'm also less than convinced that it'd be helpful for a big seqscan: won't reading a new disk page into memory via DMA cause that memory to get flushed from the processor cache anyway? I wonder whether your numbers are explained by some other consideration than you think. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Bug: Buffer cache is not scan resistant
When we instrument the page selections made within the buffer cache, they are sequential and span the entire address space of the cache. With respect to whether it's L2, it's a conclusion based on the experimental results. It's not the TLB, as we also tested for the 512 entries for each L2. One thing I left out of the previous post: the difference between fast and slow behavior was that in the fast case, the buffer selection alternated between two buffer pages. This was the case only when the preceding statement was a SELECT and the statement was VACUUM. - Luke Msg is shrt cuz m on ma treo -Original Message- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: Sunday, March 04, 2007 08:36 PM Eastern Standard Time To: Luke Lonergan Cc: PGSQL Hackers; Doug Rady; Sherry Moore Subject:Re: [HACKERS] Bug: Buffer cache is not scan resistant Luke Lonergan [EMAIL PROTECTED] writes: The issue is summarized like this: the buffer cache in PGSQL is not scan resistant as advertised. Sure it is. As near as I can tell, your real complaint is that the bufmgr doesn't attempt to limit its usage footprint to fit in L2 cache; which is hardly surprising considering it doesn't know the size of L2 cache. That's not a consideration that we've ever taken into account. I'm also less than convinced that it'd be helpful for a big seqscan: won't reading a new disk page into memory via DMA cause that memory to get flushed from the processor cache anyway? I wonder whether your numbers are explained by some other consideration than you think. regards, tom lane
Re: [HACKERS] Bug: Buffer cache is not scan resistant
One more thing: the L2 is invalidated when re-written from the kernel IO cache, but the pages addressed in L2 retain their values when 'written thru', which allows the new data to be re-used up the executor chain. - Luke Msg is shrt cuz m on ma treo -----Original Message----- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: Sunday, March 04, 2007 08:36 PM Eastern Standard Time To: Luke Lonergan Cc: PGSQL Hackers; Doug Rady; Sherry Moore Subject: Re: [HACKERS] Bug: Buffer cache is not scan resistant Luke Lonergan [EMAIL PROTECTED] writes: The issue is summarized like this: the buffer cache in PGSQL is not scan resistant as advertised. Sure it is. As near as I can tell, your real complaint is that the bufmgr doesn't attempt to limit its usage footprint to fit in L2 cache; which is hardly surprising considering it doesn't know the size of L2 cache. That's not a consideration that we've ever taken into account. I'm also less than convinced that it'd be helpful for a big seqscan: won't reading a new disk page into memory via DMA cause that memory to get flushed from the processor cache anyway? I wonder whether your numbers are explained by some other consideration than you think. regards, tom lane
[HACKERS] Automatic adjustment of bgwriter_lru_maxpages (was: Dead Space Map version 2)
Jim C. Nasby [EMAIL PROTECTED] wrote: Perhaps it would be better to have the bgwriter take a look at how many dead tuples (or how much space the dead tuples account for) when it writes a page out and adjust the DSM at that time. Yeah, I feel it is worth optimizing, too. One question is, how do we treat dirty pages written by backends, not by the bgwriter? If we want to add some work in the bgwriter, do we also need to make the bgwriter write almost all dirty pages? IMO yes, we want the bgwriter to be the only process that's normally writing pages out. How close we are to that, I don't know...

I'm working on making the bgwriter write almost all dirty pages. This is the proposal for it, using automatic adjustment of bgwriter_lru_maxpages.

The bgwriter_lru_maxpages value will be adjusted to equal the number of calls of StrategyGetBuffer() per cycle, with some safety margin (x2 at present). The counter is incremented per call and reset to zero at StrategySyncStart().

This patch alone is not so useful, except for hiding hardly tunable parameters from users. However, it would be a first step toward allowing the bgwriter to do some work before writing dirty buffers:

- [DSM] Pick out pages worth vacuuming and register them into DSM.
- [HOT] Do a per-page vacuum for HOT updated tuples. (Is it worth doing?)
- [TODO Item] Shrink expired COLD updated tuples to just their headers.
- Set commit hint bits to reduce subsequent writes of blocks. http://archives.postgresql.org/pgsql-hackers/2007-01/msg01363.php

I tested the attached patch on pgbench -s5 (80MB) with shared_buffers=32MB. I got an expected result, as below. Over 75% of buffers are written by the bgwriter. In addition, the automatically adjusted bgwriter_lru_maxpages values were much higher than the default value (5). It shows that the most suitable values greatly depend on workloads.
 benchmark  | throughput | cpu-usage | by-bgwriter | bgwriter_lru_maxpages
------------+------------+-----------+-------------+-----------------------
 default    | 300tps     | 100%      | 77.5%       | 120 pages/cycle
 with sleep | 150tps     | 50%       | 98.6%       | 70 pages/cycle

I hope that this patch will be a first step toward an intelligent bgwriter. Comments welcome. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Bug: Buffer cache is not scan resistant
Tom Lane wrote: Luke Lonergan [EMAIL PROTECTED] writes: The issue is summarized like this: the buffer cache in PGSQL is not scan resistant as advertised. Sure it is. As near as I can tell, your real complaint is that the bufmgr doesn't attempt to limit its usage footprint to fit in L2 cache; which is hardly surprising considering it doesn't know the size of L2 cache. That's not a consideration that we've ever taken into account.

To add a little to this - forgetting the scan resistant point for the moment... cranking down shared_buffers to be smaller than the L2 cache seems to help *any* sequential scan immensely, even on quite modest HW. E.g. PIII 1.26GHz, 512KB L2 cache, 2G RAM, SELECT count(*) FROM lineitem (which is about 11GB) performance:

Shared_buffers   Elapsed
--------------   -------
400MB            101 s
128KB             74 s

When I've profiled this activity, I've seen a lot of time spent searching for/allocating a new buffer for each page being fetched. Obviously having less of them to search through will help, but having less than the L2 cache-size worth of 'em seems to help a whole lot! Cheers Mark ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Bug: Buffer cache is not scan resistant
On Mon, 5 Mar 2007, Mark Kirkwood wrote: To add a little to this - forgetting the scan resistant point for the moment... cranking down shared_buffers to be smaller than the L2 cache seems to help *any* sequential scan immensely, even on quite modest HW. E.g. PIII 1.26GHz, 512KB L2 cache, 2G RAM, SELECT count(*) FROM lineitem (which is about 11GB) performance:

Shared_buffers   Elapsed
--------------   -------
400MB            101 s
128KB             74 s

When I've profiled this activity, I've seen a lot of time spent searching for/allocating a new buffer for each page being fetched. Obviously having less of them to search through will help, but having less than the L2 cache-size worth of 'em seems to help a whole lot!

Could you demonstrate that point by showing us timings for shared_buffers sizes from 512K up to, say, 2 MB? The two numbers you give there might just have to do with managing a large buffer. Thanks, Gavin ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Bug: Buffer cache is not scan resistant
Gavin, Mark, Could you demonstrate that point by showing us timings for shared_buffers sizes from 512K up to, say, 2 MB? The two numbers you give there might just have to do with managing a large buffer. I suggest two experiments that we've already done: 1) increase shared buffers to double the L2 cache size, you should see that the behavior reverts to the slow performance and is constant at larger sizes 2) instrument the calls to BufferGetPage() (a macro) and note that the buffer block numbers returned increase sequentially during scans of tables larger than the buffer size - Luke ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Bug: Buffer cache is not scan resistant
Gavin Sherry [EMAIL PROTECTED] writes: Could you demonstrate that point by showing us timings for shared_buffers sizes from 512K up to, say, 2 MB? The two numbers you give there might just have to do with managing a large buffer. Using PG CVS HEAD on 64-bit Intel Xeon (1MB L2 cache), Fedora Core 5, I don't measure any noticeable difference in seqscan speed for shared_buffers set to 32MB or 256kB. I note that the code would not let me choose the latter setting without a large decrease in max_connections, which might be expected to cause some performance changes in itself. Now this may only prove that the disk subsystem on this machine is too cheap to let the system show any CPU-related issues. I'm seeing a scan rate of about 43MB/sec for both count(*) and plain ol' wc, which is a factor of 4 or so less than Mark's numbers suggest... but top shows CPU usage of less than 5%, so even with a 4x faster disk I'd not really expect that CPU speed would become interesting. (This is indeed a milestone, btw, because it wasn't so long ago that count(*) was nowhere near disk speed.) regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq