Re: [PATCHES] [HACKERS] Load distributed checkpoint
On 2/26/07, ITAGAKI Takahiro <[EMAIL PROTECTED]> wrote:

> Josh Berkus wrote:
> > Can I have a copy of the patch to add to the Sun testing queue?
>
> This is the revised version of the patch. Delay factors in checkpoints
> can be specified by checkpoint_write_percent, checkpoint_nap_percent
> and checkpoint_sync_percent. They are relative to checkpoint_timeout.
> Also, checking of archive_timeout during checkpoints and some error
> handling routines were added.
>
> Regards,
> ---
> ITAGAKI Takahiro
> NTT Open Source Software Center

One of the issues we had during testing with the original patch was that a
database shutdown did not work properly. I think you coded something to
perform the shutdown checkpoint immediately, but if a checkpoint is already
in progress at that time, it would take its own time to complete. Does this
patch resolve that issue? Also, is it based on the 8.2 stable branch or HEAD?

regards,
inaam

--
Inaam Rana
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Load distributed checkpoint
On 12/19/06, ITAGAKI Takahiro <[EMAIL PROTECTED]> wrote:

> "Takayuki Tsunakawa" <[EMAIL PROTECTED]> wrote:
> > I performed some simple tests, and I'll show the results below.
> > (1) The default case
> > 235 80 226 77 240
> > (2) No write case
> > 242 250 244 253 280
> > (3) No checkpoint case
> > 229 252 256 292 276
> > (4) No fsync() case
> > 236 112 215 216 221
> > (5) No write by PostgreSQL, but fsync() by another program case
> > 9 223 260 283 292
> > (6) case (5) + O_SYNC by write_fsync
> > 97 114 126 112 125
> > (7) O_SYNC case
> > 182 103 41 50 74
>
> I posted a patch to PATCHES. Please try it out. It does write() smoothly,
> but fsync() in a burst. I suppose the result will be between (3) and (5).

Itagaki,

Did you have a chance to look into this any further? We, at EnterpriseDB,
have done some testing on this patch (dbt2 runs) and it looks like we are
getting the desired results, particularly so when we spread out both the
sync and write phases.

--
Inaam Rana
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Load distributed checkpoint
> No, I've not tried yet. Inaam-san told me that Linux had a few I/O
> schedulers but I'm not familiar with them. I'll find information about
> them (how to change the scheduler settings) and try the same test.

I am sorry, your response just slipped by me. The docs for RHEL (I believe
you are running RHEL, which has the 2.6.9 kernel) say that it does support
a selectable I/O scheduler.

http://www.redhat.com/rhel/details/limits/

I am not sure where else to look for the scheduler apart from /sys.

regards,
inaam
Re: [HACKERS] Load distributed checkpoint
On 12/22/06, Takayuki Tsunakawa <[EMAIL PROTECTED]> wrote:

> From: Inaam Rana
> > Which IO scheduler (elevator) are you using?
>
> Elevator? Sorry, I'm not familiar with the kernel implementation, so I
> don't know what it is. My Linux distribution is Red Hat Enterprise Linux
> 4.0 for AMD64/EM64T, and the kernel is 2.6.9-42.ELsmp. I probably haven't
> changed any kernel settings, except for IPC settings to run PostgreSQL.

There are four I/O schedulers in Linux: anticipatory, CFQ (the default),
deadline, and noop. For typical OLTP-type loads, deadline is generally
recommended. If you are constrained on CPU and you have a good controller,
then it is better to use noop. Deadline attempts to merge requests by
maintaining two red-black trees in sector sort order, and it also ensures
that a request is serviced within a given time by using a FIFO. I don't
expect it to do any magic, but I was wondering whether it might dilute the
issue of fsync() elbowing out WAL writes.

You can look into /sys/block/<device>/queue/scheduler to see which
scheduler you are using.

regards,
inaam

--
Inaam Rana
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Load distributed checkpoint
On 12/22/06, Takayuki Tsunakawa <[EMAIL PROTECTED]> wrote:

> From: "Takayuki Tsunakawa" <[EMAIL PROTECTED]>
> > (5) (4) + /proc/sys/vm/dirty* tuning
> > dirty_background_ratio is changed from 10 to 1, and dirty_ratio is
> > changed from 40 to 4.
> >
> > 308 349 84 349 84
>
> Sorry, I forgot to include the result when using Itagaki-san's patch.
> The patch showed the following tps for case (5).
>
> 323 350 340 59 225
>
> The best response time was 4 msec, and the worst one was 16 seconds.

Which IO scheduler (elevator) are you using?

--
Inaam Rana
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Load distributed checkpoint
On 12/20/06, Takayuki Tsunakawa <[EMAIL PROTECTED]> wrote:

> [Conclusion]
> I believe that the problem cannot be solved in a real sense by avoiding
> fsync/fdatasync(). We can't ignore what commercial databases have done
> so far. The kernel does as much as it likes when PostgreSQL requests
> fsync().

I am new to the community and am very interested in the tests that you have
done. I am also working on resolving the sudden IO spikes at checkpoint
time.

I agree with you that fsync() is the core issue here. Being a new member, I
was wondering if someone on this list has done testing with O_DIRECT and/or
O_SYNC for data files, as that seems to be the most logical way of dealing
with the fsync() flood at checkpoint time. If so, I'll be very interested
in the results. As mentioned in this thread, a single bgwriter with
O_DIRECT will not be able to keep pace with the cleaning effort, causing
backend writes. I think (i.e. IMHO) multiple bgwriters and/or async IO with
O_DIRECT can resolve this issue.

Talking of the bgwriter_* parameters, I think we are missing a crucial
internal counter, i.e. the number of dirty pages. How much work bgwriter
has to do at each wakeup call should be a function of total buffers and
currently dirty buffers. Relying on both these values instead of just the
static NBuffers should allow bgwriter to adapt more quickly to workload
changes and ensure that not much work is accumulated for the checkpoint.

--
Inaam Rana
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Load distributed checkpoint
> I wonder how the other big DBMS, IBM DB2, handles this.

Is Itagaki-san referring to DB2? DB2 would also open data files with the
O_SYNC option, and page_cleaners (the counterparts of bgwriter) would
exploit AIO if available on the system.

--
Inaam Rana
EnterpriseDB http://www.enterprisedb.com