Re: AW: AW: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC
Zeugswetter Andreas SB <[EMAIL PROTECTED]> writes:
>> HPUX has usleep, but the man page says
>>
>>     The usleep() function is included for its historical usage. The
>>     setitimer() function is preferred over this function.

> I doubt that setitimer has microsecond precision on HPUX.

Well, if you insist on beating this into the ground:

$ cat timetest.c
/* header names were lost in archiving; these are the ones the code needs */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char** argv)
{
    int i;
    int delay;

    delay = atoi(argv[1]);

    for (i = 0; i < 1000; i++)
        usleep(delay);

    return 0;
}
$ gcc -O -Wall timetest.c
$ time ./a.out 1

real    0m20.02s
user    0m0.04s
sys     0m0.09s
$ time ./a.out 1000

real    0m20.04s
user    0m0.04s
sys     0m0.09s
$ time ./a.out 1

real    0m20.01s
user    0m0.03s
sys     0m0.08s
$ time ./a.out 2

real    0m30.03s
user    0m0.04s
sys     0m0.09s
$

$ cat timetest2.c
/* header names were lost in archiving; these are the ones the code needs */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <sys/time.h>

typedef void (*pqsigfunc) (int);

pqsigfunc
pqsignal(int signo, pqsigfunc func)
{
    struct sigaction act, oact;

    act.sa_handler = func;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    if (signo != SIGALRM)
        act.sa_flags |= SA_RESTART;
    if (sigaction(signo, &act, &oact) < 0)
        return SIG_ERR;
    return oact.sa_handler;
}

void
catch_alarm(int sig)
{
}

int main(int argc, char** argv)
{
    int i;
    struct itimerval iv;
    int delay;

    delay = atoi(argv[1]);

    pqsignal(SIGALRM, catch_alarm);

    for (i = 0; i < 1000; i++)
    {
        iv.it_value.tv_sec = 0;
        iv.it_value.tv_usec = delay;
        iv.it_interval.tv_sec = 0;
        iv.it_interval.tv_usec = 0;
        setitimer(ITIMER_REAL, &iv, NULL);
        pause();
    }

    return 0;
}
$ gcc -O -Wall timetest2.c
$ time ./a.out 1

real    0m20.04s
user    0m0.01s
sys     0m0.05s
$ time ./a.out 1000

real    0m20.02s
user    0m0.01s
sys     0m0.06s
$ time ./a.out 1

real    0m20.01s
user    0m0.01s
sys     0m0.05s
$ time ./a.out 2

real    0m30.01s
user    0m0.01s
sys     0m0.06s
$

The usleep man page implies that usleep is actually implemented as a
setitimer call, which would explain the interchangeable results.

In any case, neither one is useful for timing sub-clock-tick intervals;
in fact they're worse than select().
Anyone else want to try these examples on other platforms? regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
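For anyone who wants to compare against select() on another platform, here is a minimal sketch of the same kind of timing loop built on select() with no descriptors. It is not from the original message: the function names select_sleep and time_select_sleeps are hypothetical, and it measures with clock_gettime(CLOCK_MONOTONIC) instead of time(1), assuming a POSIX.1-2008 system.

```c
#define _POSIX_C_SOURCE 200809L     /* for clock_gettime() */
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical helper: sleep roughly `usec` microseconds via select(). */
static void
select_sleep(long usec)
{
    struct timeval tv;

    assert(usec >= 0);
    tv.tv_sec = usec / 1000000L;
    tv.tv_usec = usec % 1000000L;
    /* No descriptors to watch, so select() only waits for the timeout. */
    (void) select(0, NULL, NULL, NULL, &tv);
}

/* Hypothetical helper: wall-clock milliseconds for `loops` sleeps of `usec`. */
static double
time_select_sleeps(int loops, long usec)
{
    struct timespec start, end;
    int         i;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < loops; i++)
        select_sleep(usec);
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (end.tv_sec - start.tv_sec) * 1000.0 +
           (end.tv_nsec - start.tv_nsec) / 1e6;
}
```

Calling time_select_sleeps(1000, delay) for the same delay values as the tests above gives a directly comparable total. On a kernel with a 10 msec scheduler tick the result would be expected to be quantized much like the usleep and setitimer runs, but that is something to measure rather than assume.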
AW: AW: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC
> >> It's great as long as you never block, but it sucks for making things
> >> wait, because the wait interval will be some multiple of 10 msec rather
> >> than just the time till the lock comes free.
>
> > On the AIX platform usleep (3) is able to really sleep microseconds without
> > busying the cpu when called for more than approx. 100 us (the longer the interval,
> > the less busy the cpu gets).
> > Would this not be ideal for spin_lock, or is usleep not very common?
> > Linux says it is in the BSD 4.3 standard.
>
> HPUX has usleep, but the man page says
>
>     The usleep() function is included for its historical usage. The
>     setitimer() function is preferred over this function.

I doubt that setitimer has microsecond precision on HPUX.

> In any case, I would expect that all these functions offer accuracy
> no better than the scheduler's regular clock cycle (~ 100Hz) on most
> kernels.

Not on AIX, and I don't believe that for the majority of other UNIX platforms either.
I do however suspect that some implementations need a busy loop, which would,
if at all, only be acceptable on an SMP system.

Andreas
Re: AW: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC
Zeugswetter Andreas SB <[EMAIL PROTECTED]> writes:
>> It's great as long as you never block, but it sucks for making things
>> wait, because the wait interval will be some multiple of 10 msec rather
>> than just the time till the lock comes free.

> On the AIX platform usleep (3) is able to really sleep microseconds without
> busying the cpu when called for more than approx. 100 us (the longer the interval,
> the less busy the cpu gets).
> Would this not be ideal for spin_lock, or is usleep not very common?
> Linux says it is in the BSD 4.3 standard.

HPUX has usleep, but the man page says

    The usleep() function is included for its historical usage. The
    setitimer() function is preferred over this function.

In any case, I would expect that all these functions offer accuracy
no better than the scheduler's regular clock cycle (~ 100Hz) on most
kernels.

regards, tom lane
Re: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC
> On 3/16/01, 11:10:34 AM, The Hermit Hacker <[EMAIL PROTECTED]> wrote
> regarding Re: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC :
>
> > But, with shared libraries, are you really pulling in a "whole
> > thread-support library"?  My understanding of shared libraries (altho it
> > may be totally off) was that instead of pulling in a whole library, you
> > pulled in the bits that you needed, pretty much as you needed them ...

* Larry Rosenman <[EMAIL PROTECTED]> [010316 10:02] wrote:
> Yes, you are.  On UnixWare, you need to add -Kthread, which CHANGES a LOT
> of primitives to go through threads wrappers and scheduling.
>
> See the doc on the http://UW7DOC.SCO.COM or http://www.lerctr.org:457/
> web pages.
>
> Also, some functions are NOT available without the -Kthread or -Kpthread
> directives.

This is true on FreeBSD as well.

--
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
Re: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC
> Yes, you are.  On UnixWare, you need to add -Kthread, which CHANGES a LOT
> of primitives to go through threads wrappers and scheduling.

This was my concern; the change that happens on startup and lib calls
when thread support comes in through a library.

--
  Bruce Momjian | http://candle.pha.pa.us
Re: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC
Tom Lane <[EMAIL PROTECTED]> writes:

> Alfred Perlstein <[EMAIL PROTECTED]> writes:
> >> definitely need before considering this is to replace the existing
> >> spinlock mechanism with something more efficient.
>
> > What sort of problems are you seeing with the spinlock code?
>
> It's great as long as you never block, but it sucks for making things
> wait, because the wait interval will be some multiple of 10 msec rather
> than just the time till the lock comes free.

Plus, using select() for the timeout is putting you into the kernel
multiple times in a short period, and causing a reschedule every time,
which is a big lose.  This was discussed in the linux-kernel thread
that was referred to a few days ago.

> We've speculated about using Posix semaphores instead, on platforms
> where those are available.  I think Bruce was concerned about the
> possible overhead of pulling in a whole thread-support library just to
> get semaphores, however.

Are Posix semaphores faster by definition than SysV semaphores (which
are described as "slow" in the source comments)?  I can't see how
they'd be much faster unless locking/unlocking an uncontended
semaphore avoids a system call, in which case you might run into the
same problems with userland backoff...

Just looked, and on Linux pthreads and POSIX semaphores are both
already in the C library.  Unfortunately, the Linux C library doesn't
support the PROCESS_SHARED attribute for either pthreads mutexes or
POSIX semaphores.  Grumble.  What's the point then?

Just some ignorant ramblings, thanks for listening...

-Doug
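To make the PROCESS_SHARED point concrete, here is a minimal sketch of a pthread mutex placed in shared memory and used by two processes. It is not PostgreSQL code and not from the thread: SharedCounter, shared_counter_create and count_with_child are hypothetical names, and the sketch assumes a platform where pthread_mutexattr_setpshared() with PTHREAD_PROCESS_SHARED is actually supported and where mmap() accepts MAP_ANONYMOUS. Whether a given C library honors the attribute is exactly the question raised above, so the creation function reports failure instead of assuming success.

```c
#define _DEFAULT_SOURCE         /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical layout: one lock and the data it protects, in shared memory. */
typedef struct
{
    pthread_mutex_t lock;
    long            counter;
} SharedCounter;

/*
 * Map an anonymous shared region and initialize a process-shared mutex in it.
 * Returns NULL if the mapping or the pshared attribute is unavailable.
 */
static SharedCounter *
shared_counter_create(void)
{
    pthread_mutexattr_t attr;
    SharedCounter *sc;
    int         ok;

    sc = mmap(NULL, sizeof(*sc), PROT_READ | PROT_WRITE,
              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (sc == MAP_FAILED)
        return NULL;

    if (pthread_mutexattr_init(&attr) != 0)
    {
        munmap(sc, sizeof(*sc));
        return NULL;
    }
    ok = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED) == 0 &&
         pthread_mutex_init(&sc->lock, &attr) == 0;
    pthread_mutexattr_destroy(&attr);
    if (!ok)
    {
        munmap(sc, sizeof(*sc));
        return NULL;
    }

    sc->counter = 0;
    return sc;
}

/*
 * Fork one child; parent and child each add `iterations` under the lock.
 * Returns the final counter as seen by the parent, or -1 on failure.
 */
static long
count_with_child(SharedCounter *sc, int iterations)
{
    pid_t       pid;
    int         i;

    assert(sc != NULL && iterations >= 0);

    pid = fork();
    if (pid < 0)
        return -1;

    for (i = 0; i < iterations; i++)
    {
        pthread_mutex_lock(&sc->lock);
        sc->counter++;
        pthread_mutex_unlock(&sc->lock);
    }

    if (pid == 0)
        _exit(0);               /* child is done */
    if (waitpid(pid, NULL, 0) < 0)
        return -1;
    return sc->counter;
}
```

If the attribute is honored, count_with_child(sc, n) returns 2 * n. On older toolchains the program may need to be linked with the platform's thread library, which is the overhead concern discussed elsewhere in this thread.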
Re: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC
Yes, you are.  On UnixWare, you need to add -Kthread, which CHANGES a LOT
of primitives to go through threads wrappers and scheduling.

See the doc on the http://UW7DOC.SCO.COM or http://www.lerctr.org:457/
web pages.

Also, some functions are NOT available without the -Kthread or -Kpthread
directives.

LER

>>>>>>>>>>>>>>>>>> Original Message <<<<<<<<<<<<<<<<<<

On 3/16/01, 11:10:34 AM, The Hermit Hacker <[EMAIL PROTECTED]> wrote
regarding Re: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC :

> On Fri, 16 Mar 2001, Tom Lane wrote:
>
> > Alfred Perlstein <[EMAIL PROTECTED]> writes:
> > >> definitely need before considering this is to replace the existing
> > >> spinlock mechanism with something more efficient.
> >
> > > What sort of problems are you seeing with the spinlock code?
> >
> > It's great as long as you never block, but it sucks for making things
> > wait, because the wait interval will be some multiple of 10 msec rather
> > than just the time till the lock comes free.
> >
> > We've speculated about using Posix semaphores instead, on platforms
> > where those are available.  I think Bruce was concerned about the
> > possible overhead of pulling in a whole thread-support library just to
> > get semaphores, however.
>
> But, with shared libraries, are you really pulling in a "whole
> thread-support library"?  My understanding of shared libraries (altho it
> may be totally off) was that instead of pulling in a whole library, you
> pulled in the bits that you needed, pretty much as you needed them ...
AW: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC
> >> definitely need before considering this is to replace the existing
> >> spinlock mechanism with something more efficient.
>
> > What sort of problems are you seeing with the spinlock code?
>
> It's great as long as you never block, but it sucks for making things

I like optimistic approaches :-)

> wait, because the wait interval will be some multiple of 10 msec rather
> than just the time till the lock comes free.

On the AIX platform usleep (3) is able to really sleep microseconds without
busying the cpu when called for more than approx. 100 us (the longer the interval,
the less busy the cpu gets).
Would this not be ideal for spin_lock, or is usleep not very common?
Linux says it is in the BSD 4.3 standard.

postgres@s0188000zeu:/usr/postgres> time ustest # with 100 us

real    0m10.95s
user    0m0.40s
sys     0m0.74s

postgres@s0188000zeu:/usr/postgres> time ustest # with 10 us

real    0m18.62s
user    0m1.37s
sys     0m5.73s

Andreas

PS: sorry off for weekend now :-) Current looks good on AIX.

ustest.c
Re: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC
Larry Rosenman <[EMAIL PROTECTED]> writes:
>> But, with shared libraries, are you really pulling in a "whole
>> thread-support library"?

> Yes, you are.  On UnixWare, you need to add -Kthread, which CHANGES a LOT
> of primitives to go through threads wrappers and scheduling.

Right, it's not so much that we care about referencing another shlib,
it's that -lpthreads may cause you to get a whole new thread-aware
version of libc, with attendant overhead that we don't need or want.

regards, tom lane
RE: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC
> We've speculated about using Posix semaphores instead, on platforms

For spinlocks we should use pthread mutexes.

> where those are available.  I think Bruce was concerned about the

And mutexes are more portable than semaphores.

Vadim
Re: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC
On Fri, 16 Mar 2001, Tom Lane wrote:

> Alfred Perlstein <[EMAIL PROTECTED]> writes:
> >> definitely need before considering this is to replace the existing
> >> spinlock mechanism with something more efficient.
>
> > What sort of problems are you seeing with the spinlock code?
>
> It's great as long as you never block, but it sucks for making things
> wait, because the wait interval will be some multiple of 10 msec rather
> than just the time till the lock comes free.
>
> We've speculated about using Posix semaphores instead, on platforms
> where those are available.  I think Bruce was concerned about the
> possible overhead of pulling in a whole thread-support library just to
> get semaphores, however.

But, with shared libraries, are you really pulling in a "whole
thread-support library"?  My understanding of shared libraries (altho it
may be totally off) was that instead of pulling in a whole library, you
pulled in the bits that you needed, pretty much as you needed them ...
Re: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC
Alfred Perlstein <[EMAIL PROTECTED]> writes:
>> definitely need before considering this is to replace the existing
>> spinlock mechanism with something more efficient.

> What sort of problems are you seeing with the spinlock code?

It's great as long as you never block, but it sucks for making things
wait, because the wait interval will be some multiple of 10 msec rather
than just the time till the lock comes free.

We've speculated about using Posix semaphores instead, on platforms
where those are available.  I think Bruce was concerned about the
possible overhead of pulling in a whole thread-support library just to
get semaphores, however.

regards, tom lane
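The complaint above is easier to see with the pattern written out. The following is a minimal sketch of a test-and-set lock that falls back to a select() timeout when the lock is busy. It is not the PostgreSQL spinlock code: sketch_lock and its functions are hypothetical names, C11 atomic_flag stands in for a platform test-and-set instruction, and the spin count and sleep length are arbitrary parameters.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical lock type wrapping a single test-and-set flag. */
typedef struct
{
    atomic_flag held;
} sketch_lock;

static void
sketch_lock_init(sketch_lock *l)
{
    atomic_flag_clear(&l->held);
}

/*
 * Acquire the lock.  After `spins_before_sleep` failed test-and-set attempts,
 * give up the CPU with a select() timeout of `sleep_usec` microseconds and
 * start spinning again.  Returns how many times the caller had to sleep.
 */
static int
sketch_lock_acquire(sketch_lock *l, int spins_before_sleep, long sleep_usec)
{
    int         spins = 0;
    int         sleeps = 0;

    assert(sleep_usec >= 0 && sleep_usec < 1000000L);

    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
    {
        if (++spins >= spins_before_sleep)
        {
            struct timeval tv;

            tv.tv_sec = 0;
            tv.tv_usec = sleep_usec;
            /* One kernel entry per backoff; the wakeup waits for a clock tick. */
            (void) select(0, NULL, NULL, NULL, &tv);
            spins = 0;
            sleeps++;
        }
    }
    return sleeps;
}

static void
sketch_lock_release(sketch_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

The uncontended path never leaves user space, which is the "great as long as you never block" part. Once a waiter reaches select(), it is not woken when the holder releases the lock; it only rechecks after the timeout expires, so on a kernel with a 10 msec tick the wait is rounded up to tick multiples regardless of how briefly the lock was actually held.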
Re: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC
* Tom Lane <[EMAIL PROTECTED]> [010316 08:16] wrote:
> Alfred Perlstein <[EMAIL PROTECTED]> writes:
> >> couldn't the syncer process cache opened files? is there any problem I
> >> didn't consider ?
>
> > 1) IPC latency, the amount of time it takes to call fsync will
> >    increase by at least two context switches.
>
> > 2) a working set (number of files needed to be fsync'd) that
> >    is larger than the amount of files you wish to keep open.
>
> These days we're really only interested in fsync'ing the current WAL
> log file, so working set doesn't seem like a problem anymore.  However
> context-switch latency is likely to be a big problem.  One thing we'd
> definitely need before considering this is to replace the existing
> spinlock mechanism with something more efficient.

What sort of problems are you seeing with the spinlock code?

> Vadim has designed the WAL stuff in such a way that a separate
> writer/syncer process would be easy to add; in fact it's almost that way
> already, in that any backend can write or sync data that's been added
> to the queue by any other backend.  The question is whether it'd
> actually buy anything to have another process.  Good stuff to experiment
> with for 7.2.

The delayed/coalesced fsync looked interesting.

--
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
Re: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC
Alfred Perlstein <[EMAIL PROTECTED]> writes:
>> couldn't the syncer process cache opened files? is there any problem I
>> didn't consider ?

> 1) IPC latency, the amount of time it takes to call fsync will
>    increase by at least two context switches.

> 2) a working set (number of files needed to be fsync'd) that
>    is larger than the amount of files you wish to keep open.

These days we're really only interested in fsync'ing the current WAL
log file, so working set doesn't seem like a problem anymore.  However
context-switch latency is likely to be a big problem.  One thing we'd
definitely need before considering this is to replace the existing
spinlock mechanism with something more efficient.

Vadim has designed the WAL stuff in such a way that a separate
writer/syncer process would be easy to add; in fact it's almost that way
already, in that any backend can write or sync data that's been added
to the queue by any other backend.  The question is whether it'd
actually buy anything to have another process.  Good stuff to experiment
with for 7.2.

regards, tom lane
Re: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC
* Xu Yifeng <[EMAIL PROTECTED]> [010316 01:15] wrote:
> Hello Alfred,
>
> Friday, March 16, 2001, 3:21:09 PM, you wrote:
>
> AP> * Xu Yifeng <[EMAIL PROTECTED]> [010315 22:25] wrote:
> >>
> >> Could anyone consider fork a syncer process to sync data to disk ?
> >> build a shared sync queue, when a daemon process want to do sync after
> >> write() is called, just put a sync request to the queue. this can release
> >> process from blocked on writing as soon as possible. multiple sync
> >> request for one file can be merged when the request is been inserting to
> >> the queue.
>
> AP> I suggested this about a year ago. :)
>
> AP> The problem is that you need that process to potentially open and close
> AP> many files over and over.
>
> AP> I still think it's somewhat of a good idea.
>
> I am not a DBMS guru.

Hah, same here. :)

> couldn't the syncer process cache opened files? is there any problem I
> didn't consider ?

1) IPC latency, the amount of time it takes to call fsync will
   increase by at least two context switches.

2) a working set (number of files needed to be fsync'd) that
   is larger than the amount of files you wish to keep open.

--
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC
Hello Alfred,

Friday, March 16, 2001, 3:21:09 PM, you wrote:

AP> * Xu Yifeng <[EMAIL PROTECTED]> [010315 22:25] wrote:
>>
>> Could anyone consider fork a syncer process to sync data to disk ?
>> build a shared sync queue, when a daemon process want to do sync after
>> write() is called, just put a sync request to the queue. this can release
>> process from blocked on writing as soon as possible. multiple sync
>> request for one file can be merged when the request is been inserting to
>> the queue.

AP> I suggested this about a year ago. :)

AP> The problem is that you need that process to potentially open and close
AP> many files over and over.

AP> I still think it's somewhat of a good idea.

I am not a DBMS guru.

couldn't the syncer process cache opened files? is there any problem I
didn't consider ?

--
Best regards,
Xu Yifeng
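The merging step in the proposal can be sketched as a small queue that drops duplicate requests for the same file at insert time. This is only an illustration under simplifying assumptions: SyncQueue and its functions are hypothetical names, the queue lives in one process instead of shared memory, it has no locking, and a file descriptor stands in for file identity. A real cross-process syncer could not share descriptors this way and would have to open the files itself, which is the objection raised in the quoted replies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

#define SYNC_QUEUE_CAP 64       /* arbitrary capacity for the sketch */

/* Hypothetical queue of files with a pending sync request. */
typedef struct
{
    int         fds[SYNC_QUEUE_CAP];
    size_t      count;
} SyncQueue;

static void
sync_queue_init(SyncQueue *q)
{
    q->count = 0;
}

/*
 * Record that `fd` needs syncing.  A request for a file that is already
 * queued is merged into the existing entry.  Returns false only when the
 * queue is full and the request could not be recorded.
 */
static bool
sync_queue_request(SyncQueue *q, int fd)
{
    size_t      i;

    assert(q != NULL);
    for (i = 0; i < q->count; i++)
    {
        if (q->fds[i] == fd)
            return true;        /* merged with the pending request */
    }
    if (q->count == SYNC_QUEUE_CAP)
        return false;
    q->fds[q->count++] = fd;
    return true;
}

/*
 * Syncer side: issue one fsync() per queued file, then empty the queue.
 * Returns the number of fsync() calls that succeeded.
 */
static size_t
sync_queue_drain(SyncQueue *q)
{
    size_t      i;
    size_t      synced = 0;

    for (i = 0; i < q->count; i++)
    {
        if (fsync(q->fds[i]) == 0)
            synced++;
    }
    q->count = 0;
    return synced;
}
```

With this shape, any number of requests for one file between two drains costs a single fsync(). The sketch says nothing about how a requester learns that its data has reached disk, which a real design would also need.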