Re: [PATCHES] [HACKERS] WAL: O_DIRECT and multipage-writer (+
Yes, I assume that the patch to group the writes isn't something we want right now, and the one for O_DIRECT is going to need an additional fsync, and I have asked for testing on that. I have posted a patch that I think fixes the reported memory leak and am waiting for feedback on that.

---

Simon Riggs wrote:

On Tue, 2005-03-01 at 13:53 -0800, Mark Wong wrote:

On Thu, Feb 03, 2005 at 07:25:55PM +0900, ITAGAKI Takahiro wrote:

Hello everyone. I fixed two bugs in the patch that I sent before. Check and test new one, please.

Ok, finally got back into the office and was able to run 1 set of tests.

So the new baseline result with 8.0.1:
http://www.osdl.org/projects/dbt2dev/results/dev4-010/309/
Throughput: 3639.97

Results with the patch but open_direct not set:
http://www.osdl.org/projects/dbt2dev/results/dev4-010/308/
Throughput: 3494.72

Results with the patch and open_direct set:
http://www.osdl.org/projects/dbt2dev/results/dev4-010/312/
Throughput: 3489.69

You can verify that the wal_sync_method is set to open_direct under the database parameters link, but I'm wondering if I missed something. It looks a little odd that the performance dropped.

Is there anything more to say on this? Is it case-closed, or is there further work underway - I can't see any further chat on this thread.

These results show it doesn't work better on larger systems. The original testing showed it worked better on smaller systems - is there still scope to include this for smaller configs?

If not, thanks for taking the time to write the patch and investigate whether changes in this area would help. Not every performance patch improves things, but that doesn't mean we shouldn't try...

Best Regards, Simon Riggs

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
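
The remark at the top of this message, that the O_DIRECT variant still needs an additional fsync, can be illustrated with a minimal sketch. The helper names `wal_open_flags` and `wal_write_and_sync` are hypothetical and are not taken from the patch; the sketch only assumes a platform where `O_DIRECT` may or may not be defined.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* exposes O_DIRECT on glibc when defined first */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open(2) flags for a WAL segment file. */
int wal_open_flags(int use_direct)
{
    int flags = O_WRONLY;
#ifdef O_DIRECT
    if (use_direct)
        flags |= O_DIRECT;     /* bypass the OS page cache where supported */
#else
    (void) use_direct;         /* no O_DIRECT here: fall back to buffered I/O */
#endif
    return flags;
}

/*
 * Hypothetical helper: write len bytes, then fsync. With O_DIRECT the
 * buffer, length and file offset must satisfy the platform's alignment
 * rules. Returns 0 on success, -1 on failure.
 */
int wal_write_and_sync(int fd, const char *buf, size_t len)
{
    assert(buf != NULL);
    while (len > 0)
    {
        ssize_t n = write(fd, buf, len);
        if (n < 0 && errno == EINTR)
            continue;          /* interrupted before writing anything: retry */
        if (n <= 0)
            return -1;
        buf += n;
        len -= (size_t) n;
    }
    /* O_DIRECT skips the page cache but does not by itself promise durability. */
    return fsync(fd);
}
```

A caller would open the segment with `open(path, wal_open_flags(1))` and pass aligned buffers to `wal_write_and_sync`.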
Re: [HACKERS] WAL: O_DIRECT and multipage-writer
On Wed, Mar 23, 2005 at 01:55:46PM +0900, ITAGAKI Takahiro wrote:

Hi, Mark.

Mark Wong [EMAIL PROTECTED] wrote:

In light of this thread, have you compared the performance on Linux-2.4?

No, but I'm just testing my patch on Linux-2.4 with a middle-range server. I will report the results sometime soon.

By the way, I found the debug option (XLOG_MULTIPAGE_WRITER_DEBUG) was enabled in your prior benchmarks. It writes logs a lot. I hope performance doesn't fall at least without debug.

That's 'log_min_messages = debug1' right? Maybe I should give it another go with that set back to the default.

Thanks,
Mark
Re: [HACKERS] WAL: O_DIRECT and multipage-writer
On Tue, Jan 25, 2005 at 06:06:23PM +0900, ITAGAKI Takahiro wrote:

Environment:
OS : Linux kernel 2.6.9
CPU: Pentium 4 3GHz
disk : ATA 5400rpm (Data and WAL are placed on same partition.)
memory : 1GB
config : shared_buffers=1, wal_buffers=256, XLOG_SEG_SIZE=256MB, checkpoint_segment=4

Hi Itagaki,

In light of this thread, have you compared the performance on Linux-2.4?

Direct io on block device has performance regression on 2.6.x kernel
http://www.ussg.iu.edu/hypermail/linux/kernel/0503.1/0328.html

Mark
Re: [HACKERS] WAL: O_DIRECT and multipage-writer
Hi, Mark.

Mark Wong [EMAIL PROTECTED] wrote:

In light of this thread, have you compared the performance on Linux-2.4?

No, but I'm just testing my patch on Linux-2.4 with a middle-range server. I will report the results sometime soon.

By the way, I found the debug option (XLOG_MULTIPAGE_WRITER_DEBUG) was enabled in your prior benchmarks. It writes logs a lot. I hope performance doesn't fall at least without debug.

So the new baseline result with 8.0.1:
Throughput: 3639.97
Results with the patch but open_direct not set:
Throughput: 3494.72
Results with the patch and open_direct set:
Throughput: 3489.69

---
ITAGAKI Takahiro [EMAIL PROTECTED]
NTT Cyber Space Laboratories
Re: [PATCHES] [HACKERS] WAL: O_DIRECT and multipage-writer (+
On Tue, 2005-03-01 at 13:53 -0800, Mark Wong wrote:

On Thu, Feb 03, 2005 at 07:25:55PM +0900, ITAGAKI Takahiro wrote:

Hello everyone. I fixed two bugs in the patch that I sent before. Check and test new one, please.

Ok, finally got back into the office and was able to run 1 set of tests.

So the new baseline result with 8.0.1:
http://www.osdl.org/projects/dbt2dev/results/dev4-010/309/
Throughput: 3639.97

Results with the patch but open_direct not set:
http://www.osdl.org/projects/dbt2dev/results/dev4-010/308/
Throughput: 3494.72

Results with the patch and open_direct set:
http://www.osdl.org/projects/dbt2dev/results/dev4-010/312/
Throughput: 3489.69

You can verify that the wal_sync_method is set to open_direct under the database parameters link, but I'm wondering if I missed something. It looks a little odd that the performance dropped.

Is there anything more to say on this? Is it case-closed, or is there further work underway - I can't see any further chat on this thread.

These results show it doesn't work better on larger systems. The original testing showed it worked better on smaller systems - is there still scope to include this for smaller configs?

If not, thanks for taking the time to write the patch and investigate whether changes in this area would help. Not every performance patch improves things, but that doesn't mean we shouldn't try...

Best Regards, Simon Riggs
Re: [PATCHES] [HACKERS] WAL: O_DIRECT and multipage-writer (+ memory leak)
On Thu, Feb 03, 2005 at 07:25:55PM +0900, ITAGAKI Takahiro wrote:

Hello everyone. I fixed two bugs in the patch that I sent before. Check and test new one, please.

Ok, finally got back into the office and was able to run 1 set of tests.

So the new baseline result with 8.0.1:
http://www.osdl.org/projects/dbt2dev/results/dev4-010/309/
Throughput: 3639.97

Results with the patch but open_direct not set:
http://www.osdl.org/projects/dbt2dev/results/dev4-010/308/
Throughput: 3494.72

Results with the patch and open_direct set:
http://www.osdl.org/projects/dbt2dev/results/dev4-010/312/
Throughput: 3489.69

You can verify that the wal_sync_method is set to open_direct under the database parameters link, but I'm wondering if I missed something. It looks a little odd that the performance dropped.

Mark
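
To put the three throughput figures above on a common scale, the relative change against the 8.0.1 baseline can be computed directly. This is only a small arithmetic sketch; `pct_change` is a hypothetical helper name, and the inputs are the numbers reported in this message.

```c
#include <assert.h>

/* Relative change of `measured` versus `baseline`, in percent. */
double pct_change(double baseline, double measured)
{
    assert(baseline != 0.0);
    return (measured - baseline) / baseline * 100.0;
}
```

With the reported values, `pct_change(3639.97, 3494.72)` is roughly -3.99 and `pct_change(3639.97, 3489.69)` is roughly -4.13, so both patched runs sit about 4% under the baseline and differ from each other by well under 1%.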
Re: [HACKERS] WAL: O_DIRECT and multipage-writer
This thread has been saved for the 8.1 release:

http://momjian.postgresql.org/cgi-bin/pgpatches2

---

ITAGAKI Takahiro wrote:

Hello, all.

I think that there is room for improvement in WAL. Here is a patch for it.
- Multiple pages are written in one write() if they are contiguous.
- Add 'open_direct' to wal_sync_method.

WAL writer writes one page in one write(). This is not efficient when wal_sync_method is 'open_sync', because the writer waits for IO completions at each write(). Multipage-writer can reduce syscalls and improve IO throughput.

'open_direct' uses O_DIRECT instead of O_SYNC. O_DIRECT implies synchronous writing, so it may show the tendency like open_sync. But maybe it can reduce memcpy() and save OS's disk cache memory.

I benchmarked this patch with pgbench. It works well and improved tps by over 50% on my machine. WAL seems to be the bottleneck on machines with poor disks.

This patch has not yet been tested enough. I would like it to be examined much and taken into PostgreSQL.

There are still many TODOs:
* Is this logic really correct?
- O_DIRECT_BUFFER_ALIGN should be adjusted to runtime, not compile time.
- Consider using writev() instead of write(). Buffers are noncontiguous when WAL ring buffer rotates.
- If wal_sync_method is not open_direct, XLOG_EXTRA_BUFFERS can be 0.

Sincerely,
ITAGAKI Takahiro

-- pgbench result --
$ ./pgbench -s 100 -c 50 -t 400

- 8.0.0 default + fsync:
tps = 20.630632 (including connections establishing)
tps = 20.636768 (excluding connections establishing)

- multipage-writer + open_direct:
tps = 33.761917 (including connections establishing)
tps = 33.778320 (excluding connections establishing)

Environment:
OS : Linux kernel 2.6.9
CPU: Pentium 4 3GHz
disk : ATA 5400rpm (Data and WAL are placed on same partition.)
memory : 1GB
config : shared_buffers=1, wal_buffers=256, XLOG_SEG_SIZE=256MB, checkpoint_segment=4

---
ITAGAKI Takahiro [EMAIL PROTECTED]
NTT Cyber Space Laboratories
Nippon Telegraph and Telephone Corporation.

[ Attachment, skipping... ]
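
The writev() item in the TODO list of the quoted message can be sketched as follows: when the pages to flush wrap past the end of the WAL ring buffer, they form two separate memory ranges that one writev() call could still submit together. This is an illustration under assumptions, not code from the patch; `build_ring_iov` and `PAGE_SZ` are hypothetical names, and 8192 is assumed as the page size.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

#define PAGE_SZ 8192   /* assumed WAL page size */

/*
 * Describe npages pages starting at ring slot `start` as at most two
 * iovec segments. Returns the number of segments filled in (1 or 2).
 */
int build_ring_iov(char *ring, size_t ring_pages, size_t start,
                   size_t npages, struct iovec iov[2])
{
    assert(start < ring_pages && npages > 0 && npages <= ring_pages);

    size_t first = ring_pages - start;       /* pages before the wrap point */
    if (first > npages)
        first = npages;

    iov[0].iov_base = ring + start * PAGE_SZ;
    iov[0].iov_len = first * PAGE_SZ;
    if (first == npages)
        return 1;                            /* no wrap: one segment is enough */

    iov[1].iov_base = ring;                  /* remainder restarts at slot 0 */
    iov[1].iov_len = (npages - first) * PAGE_SZ;
    return 2;
}
```

The result would be passed on as `writev(fd, iov, n)`, where `n` is the returned segment count.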
Re: [PATCHES] [HACKERS] WAL: O_DIRECT and multipage-writer (+ memory
This has been saved for the 8.1 release:

http://momjian.postgresql.org/cgi-bin/pgpatches2

---

ITAGAKI Takahiro wrote:

Hello everyone. I fixed two bugs in the patch that I sent before. Check and test new one, please.

1. Fix update timing of Write->curridx. (pointed out by Tom)
   Change to update it soon after write().
2. Fix buffer alignment routine on 64bit cpu. (pointed out by Mark)
   I checked it on Xeon EM64T and it worked properly, but I don't have IA64...

BTW, I found a memory leak in BootStrapXLOG(). The buffer allocated by malloc() is not free()ed. ISSUE_BOOTSTRAP_MEMORYLEAK in this patch points it out. (But this leak is not serious, because this function is called only once.)

ITAGAKI Takahiro

[ Attachment, skipping... ]
Re: [PATCHES] [HACKERS] WAL: O_DIRECT and multipage-writer (+ memory leak)
Hello everyone. I fixed two bugs in the patch that I sent before. Check and test new one, please.

1. Fix update timing of Write->curridx. (pointed out by Tom)
   Change to update it soon after write().
2. Fix buffer alignment routine on 64bit cpu. (pointed out by Mark)
   I checked it on Xeon EM64T and it worked properly, but I don't have IA64...

BTW, I found a memory leak in BootStrapXLOG(). The buffer allocated by malloc() is not free()ed. ISSUE_BOOTSTRAP_MEMORYLEAK in this patch points it out. (But this leak is not serious, because this function is called only once.)

ITAGAKI Takahiro

xlog.c.diff Description: Binary data
Re: [HACKERS] WAL: O_DIRECT and multipage-writer
Hi everyone,

I gave this a try with DBT-2, but got a core dump on our ia64 system. I hope this isn't a random thing, like I ran into previously. Maybe I'll try again, but postgres dumped core. Binary and core here:
http://developer.osdl.org/markw/pgsql/core/2morefiles.tar.bz2

#0 FunctionCall2 (flinfo=0x0, arg1=0, arg2=0) at fmgr.c:1141
1141 result = FunctionCallInvoke(fcinfo);
(gdb) bt
#0 FunctionCall2 (flinfo=0x0, arg1=0, arg2=0) at fmgr.c:1141
#1 0x403bdb80 in FunctionCall2 (flinfo=Cannot access memory at address 0x0
) at fmgr.c:1141
#2 0x403bdb80 in FunctionCall2 (flinfo=Cannot access memory at address 0x0
) at fmgr.c:1141

Over and over again, so I'll keep the backtrace short.

Mark
Re: [PATCHES] [HACKERS] WAL: O_DIRECT and multipage-writer
Thanks for testing, Mark!

I gave this a try with DBT-2, but got a core dump on our ia64 system. I hope this isn't a random thing, like I ran into previously. Maybe I'll try again, but postgres dumped core.

Sorry, this seems to be my patch's bug. Which datatype did you compile with? LP64, ILP64, or LLP64? If you used LLP64, I think the cause is the buffer alignment routine, because sizeof(long) != sizeof(void*). I'll fix it soon...

ITAGAKI Takahiro
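
The suspected failure mode, doing pointer arithmetic through `long` where `long` is narrower than a pointer, can be avoided by rounding through `uintptr_t`. The following is a minimal sketch of that idea only; `align_ptr_up` is a hypothetical helper and says nothing about how the patch's own alignment routine is written or whether it was the actual cause of the core dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Round p up to the next multiple of `align` (a power of two).
 * uintptr_t is an integer type meant to hold a pointer value, whereas
 * `long` is only 32 bits wide under the LLP64 data model.
 */
void *align_ptr_up(void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t addr = (uintptr_t) p;
    addr = (addr + (align - 1)) & ~((uintptr_t) align - 1);
    return (void *) addr;
}
```

An already aligned pointer is returned unchanged, so the helper can be applied unconditionally.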
Re: [PATCHES] [HACKERS] WAL: O_DIRECT and multipage-writer
Hmm... I don't remember specifying a datatype. I suppose whatever the default one is. :) I'll be happy to test again, just let me know.

Mark

On Fri, Jan 28, 2005 at 06:28:32AM +0900, ITAGAKI Takahiro wrote:

Thanks for testing, Mark!

I gave this a try with DBT-2, but got a core dump on our ia64 system. I hope this isn't a random thing, like I ran into previously. Maybe I'll try again, but postgres dumped core.

Sorry, this seems to be my patch's bug. Which datatype did you compile with? LP64, ILP64, or LLP64? If you used LLP64, I think the cause is the buffer alignment routine, because sizeof(long) != sizeof(void*). I'll fix it soon...

ITAGAKI Takahiro
Re: [HACKERS] WAL: O_DIRECT and multipage-writer
Tom Lane [EMAIL PROTECTED] writes:

What does XLOG_EXTRA_BUFFERS accomplish?

It is because the buffer passed to direct-io must be aligned to the filesystem page size, typically 4KB. Buffers allocated with ShmemInitStruct are not necessarily aligned, so we need to allocate extra buffers and align them ourselves.

Also, I'm worried that you broke something by not updating Write->curridx immediately in XLogWrite. There certainly isn't going to be any measurable performance boost from keeping that in a local variable, so why take any risk?

Keeping Write->curridx in a local variable is not for performance, but for writing multiple pages at once. I think it is ok to update Write->curridx at the end of XLogWrite, because XLogCtl.Write.curridx will be touched by only one process at a time. Process-shared variables are not modified until XLogWrite is completed, so that other backends can write the same contents later even if the backend in XLogWrite crashes.

Sincerely,
ITAGAKI Takahiro
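
The over-allocation idea behind XLOG_EXTRA_BUFFERS can be sketched in isolation: request more memory than needed, then start using it at the first aligned address. In this sketch `malloc()` stands in for ShmemInitStruct, and `AlignedBuf` and `aligned_buf_init` are hypothetical names; how the patch sizes its extra buffers is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct
{
    void *raw;      /* what the allocator returned; release this one */
    char *pages;    /* first aligned byte; use this one for direct I/O */
} AlignedBuf;

/*
 * Reserve nbytes of usable space aligned to `align` (a power of two)
 * by allocating `align` extra bytes. Returns 0 on success, -1 on failure.
 */
int aligned_buf_init(AlignedBuf *b, size_t nbytes, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    b->raw = malloc(nbytes + align);         /* the "extra" space */
    if (b->raw == NULL)
        return -1;

    uintptr_t addr = ((uintptr_t) b->raw + align - 1) & ~((uintptr_t) align - 1);
    b->pages = (char *) addr;
    return 0;
}
```

Keeping both pointers matters because only `raw` may be handed back to the allocator, while only `pages` is safe to pass to a direct-I/O write.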
[HACKERS] WAL: O_DIRECT and multipage-writer
Hello, all.

I think that there is room for improvement in WAL. Here is a patch for it.
- Multiple pages are written in one write() if they are contiguous.
- Add 'open_direct' to wal_sync_method.

WAL writer writes one page in one write(). This is not efficient when wal_sync_method is 'open_sync', because the writer waits for IO completions at each write(). Multipage-writer can reduce syscalls and improve IO throughput.

'open_direct' uses O_DIRECT instead of O_SYNC. O_DIRECT implies synchronous writing, so it may show the tendency like open_sync. But maybe it can reduce memcpy() and save OS's disk cache memory.

I benchmarked this patch with pgbench. It works well and improved tps by over 50% on my machine. WAL seems to be the bottleneck on machines with poor disks.

This patch has not yet been tested enough. I would like it to be examined much and taken into PostgreSQL.

There are still many TODOs:
* Is this logic really correct?
- O_DIRECT_BUFFER_ALIGN should be adjusted to runtime, not compile time.
- Consider using writev() instead of write(). Buffers are noncontiguous when WAL ring buffer rotates.
- If wal_sync_method is not open_direct, XLOG_EXTRA_BUFFERS can be 0.

Sincerely,
ITAGAKI Takahiro

-- pgbench result --
$ ./pgbench -s 100 -c 50 -t 400

- 8.0.0 default + fsync:
tps = 20.630632 (including connections establishing)
tps = 20.636768 (excluding connections establishing)

- multipage-writer + open_direct:
tps = 33.761917 (including connections establishing)
tps = 33.778320 (excluding connections establishing)

Environment:
OS : Linux kernel 2.6.9
CPU: Pentium 4 3GHz
disk : ATA 5400rpm (Data and WAL are placed on same partition.)
memory : 1GB
config : shared_buffers=1, wal_buffers=256, XLOG_SEG_SIZE=256MB, checkpoint_segment=4

---
ITAGAKI Takahiro [EMAIL PROTECTED]
NTT Cyber Space Laboratories
Nippon Telegraph and Telephone Corporation.

xlog.diff Description: Binary data
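
The multipage-writer idea described in this message, one write() per contiguous run of pages rather than one per page, can be sketched roughly as follows. This is not the patch itself: `contiguous_run`, `flush_pages` and `PAGE_SZ` are hypothetical names, 8192 is an assumed page size, and the ring buffer is modelled as a plain array of pages.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define PAGE_SZ 8192   /* assumed WAL page size */

/* Pages that can go out in one write(): the run stops at the end of the ring. */
size_t contiguous_run(size_t start, size_t npending, size_t ring_pages)
{
    assert(start < ring_pages);
    size_t to_end = ring_pages - start;
    return npending < to_end ? npending : to_end;
}

/*
 * Write npending pages starting at ring slot `start`, issuing one write()
 * per contiguous run. Returns 0 on success, -1 on failure.
 */
int flush_pages(int fd, const char *ring, size_t ring_pages,
                size_t start, size_t npending)
{
    assert(npending <= ring_pages);
    while (npending > 0)
    {
        size_t run = contiguous_run(start, npending, ring_pages);
        const char *p = ring + start * PAGE_SZ;
        size_t left = run * PAGE_SZ;

        while (left > 0)                     /* cope with short writes */
        {
            ssize_t n = write(fd, p, left);
            if (n < 0 && errno == EINTR)
                continue;
            if (n <= 0)
                return -1;
            p += n;
            left -= (size_t) n;
        }
        start = (start + run) % ring_pages;  /* wrap to slot 0 if needed */
        npending -= run;
    }
    return 0;
}
```

With a ring of 8 pages, flushing 4 pages from slot 6 takes two write() calls (slots 6-7, then 0-1) instead of four.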