Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-20 Thread Bruce Momjian

Added to TODO:

* Determine optimal fdatasync/fsync, O_SYNC/O_DSYNC options
* Allow multiple blocks to be written to WAL with one write()  


 Bruce Momjian [EMAIL PROTECTED] writes:
  It is hard for me to imagine O_* being slower than fsync(),
 
 Not hard at all --- if we're writing multiple xlog blocks per
 transaction, then O_* constrains the sequence of operations more
 than we really want.  Changing xlog.c to combine writes as much
 as possible would reduce this problem, but not eliminate it.
 
 Besides, the entire object of this exercise is to work around
 an unexpected inefficiency in some kernels' implementations of
 fsync/fdatasync (viz, scanning over lots of not-dirty buffers).
 Who's to say that there might not be inefficiencies in other
 platforms' implementations of the O_* options?
 
   regards, tom lane
 


-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



AW: AW: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-19 Thread Zeugswetter Andreas SB


  It's great as long as you never block, but it sucks for making things
  wait, because the wait interval will be some multiple of 10 msec rather
  than just the time till the lock comes free.
 
  On the AIX platform usleep (3) is able to really sleep microseconds without 
  busying the cpu when called for more than approx. 100 us (the longer the interval,
  the less busy the cpu gets).
  Would this not be ideal for spin_lock, or is usleep not very common?
  Linux says it is in the BSD 4.3 standard.
 
 HPUX has usleep, but the man page says
 
  The usleep() function is included for its historical usage. The
  setitimer() function is preferred over this function.

I doubt that setitimer has microsecond precision on HPUX.

 In any case, I would expect that all these functions offer accuracy
 no better than the scheduler's regular clock cycle (~ 100Hz) on most
 kernels.

Not on AIX, and I don't believe that holds for the majority of other UNIX platforms either.
I do suspect, however, that some implementations need a busy loop, which would be
acceptable, if at all, only on an SMP system.

Andreas




Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-19 Thread Bruce Momjian

 * William K. Volkman [EMAIL PROTECTED] [010318 11:56] wrote:
  The Hermit Hacker wrote:
   
   But, with shared libraries, are you really pulling in a "whole
   thread-support library"?  My understanding of shared libraries (altho it
   may be totally off) was that instead of pulling in a whole library, you
   pulled in the bits that you needed, pretty much as you needed them ...
  
  Just by making a thread call libc changes personality to use thread
  safe routines (I.E. add mutex locking).  Use one thread feature, get
  the whole set...which may not be that bad.
 
 Actually it can be pretty bad.  Locked bus cycles needed for mutex
 operations are very, very expensive, not something you want to do
 unless you really really need to do it.

And don't forget buggy implementations.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026




Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-18 Thread William K. Volkman

The Hermit Hacker wrote:
 
 But, with shared libraries, are you really pulling in a "whole
 thread-support library"?  My understanding of shared libraries (altho it
 may be totally off) was that instead of pulling in a whole library, you
 pulled in the bits that you needed, pretty much as you needed them ...

Just by making a thread call libc changes personality to use thread
safe routines (I.E. add mutex locking).  Use one thread feature, get
the whole set...which may not be that bad.
-- 
William K. Volkman.
CIO - H.I.S. Financial Services Corporation.
102 S. Tejon, Ste. 920, Colorado Springs, CO 80903
Phone: 719-633-6942  Fax: 719-633-7006  Cell: 719-330-8423




Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-18 Thread Alfred Perlstein

* William K. Volkman [EMAIL PROTECTED] [010318 11:56] wrote:
 The Hermit Hacker wrote:
  
  But, with shared libraries, are you really pulling in a "whole
  thread-support library"?  My understanding of shared libraries (altho it
  may be totally off) was that instead of pulling in a whole library, you
  pulled in the bits that you needed, pretty much as you needed them ...
 
 Just by making a thread call libc changes personality to use thread
 safe routines (I.E. add mutex locking).  Use one thread feature, get
 the whole set...which may not be that bad.

Actually it can be pretty bad.  Locked bus cycles needed for mutex
operations are very, very expensive, not something you want to do
unless you really really need to do it.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]





Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-18 Thread Tom Lane

Alfred Perlstein [EMAIL PROTECTED] writes:
 Just by making a thread call libc changes personality to use thread
 safe routines (I.E. add mutex locking).  Use one thread feature, get
 the whole set...which may not be that bad.

 Actually it can be pretty bad.  Locked bus cycles needed for mutex
 operations are very, very expensive, not something you want to do
 unless you really really need to do it.

It'd be interesting to try to get some numbers about the actual cost
of using a thread-aware libc, on platforms where there's a difference.
Shouldn't be that hard to build a postgres executable with the proper
library and run some benchmarks ... anyone care to try?

regards, tom lane




Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-18 Thread Larry Rosenman

* Tom Lane [EMAIL PROTECTED] [010318 14:55]:
 Alfred Perlstein [EMAIL PROTECTED] writes:
  Just by making a thread call libc changes personality to use thread
  safe routines (I.E. add mutex locking).  Use one thread feature, get
  the whole set...which may not be that bad.
 
  Actually it can be pretty bad.  Locked bus cycles needed for mutex
  operations are very, very expensive, not something you want to do
  unless you really really need to do it.
 
 It'd be interesting to try to get some numbers about the actual cost
 of using a thread-aware libc, on platforms where there's a difference.
 Shouldn't be that hard to build a postgres executable with the proper
 library and run some benchmarks ... anyone care to try?
I can get the code compiled, but don't have the skills to generate
a test case worthy of anything

LER

 
   regards, tom lane
 
-- 
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: [EMAIL PROTECTED]
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749




Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-18 Thread Alfred Perlstein

* Larry Rosenman [EMAIL PROTECTED] [010318 14:17] wrote:
 * Tom Lane [EMAIL PROTECTED] [010318 14:55]:
  Alfred Perlstein [EMAIL PROTECTED] writes:
   Just by making a thread call libc changes personality to use thread
   safe routines (I.E. add mutex locking).  Use one thread feature, get
   the whole set...which may not be that bad.
  
   Actually it can be pretty bad.  Locked bus cycles needed for mutex
   operations are very, very expensive, not something you want to do
   unless you really really need to do it.
  
  It'd be interesting to try to get some numbers about the actual cost
  of using a thread-aware libc, on platforms where there's a difference.
  Shouldn't be that hard to build a postgres executable with the proper
  library and run some benchmarks ... anyone care to try?
 I can get the code compiled, but don't have the skills to generate
 a test case worthy of anything

There's a 'make test' or similar target ('regression' maybe?) that
runs a suite of tests on the database; you could use that as a
bench/timer. You could also try mysql's "crashme" script.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]





Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-18 Thread Tom Lane

Larry Rosenman [EMAIL PROTECTED] writes:
 I can get the code compiled, but don't have the skills to generate
 a test case worthy of anything

contrib/pgbench would do as a first cut.

regards, tom lane




Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread Xu Yifeng

Hello Alfred,

Friday, March 16, 2001, 3:21:09 PM, you wrote:

AP * Xu Yifeng [EMAIL PROTECTED] [010315 22:25] wrote:

 Could anyone consider forking a syncer process to sync data to disk?
 Build a shared sync queue; when a daemon process wants to sync after
 write() is called, it just puts a sync request on the queue. This releases
 the process from blocking on the write as soon as possible. Multiple sync
 requests for one file can be merged when a request is inserted into
 the queue.
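
The merging step in the proposal above can be sketched as a small fixed-size queue that ignores a request for a file that is already pending. This is only an illustration: the struct, the function name, and the use of plain integer file identifiers are assumptions, and a real version would live in shared memory behind a lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SYNC_QUEUE_MAX 64           /* assumed capacity, for illustration only */

struct sync_queue {
    int files[SYNC_QUEUE_MAX];      /* hypothetical file identifiers awaiting fsync */
    int count;                      /* number of pending entries */
};

/* Queue a sync request for file_id.  A request for a file that is already
 * pending is merged into the existing entry.  Returns true if the file is
 * pending on return, false if the queue is full. */
bool sync_queue_add(struct sync_queue *q, int file_id)
{
    assert(q != NULL);
    for (int i = 0; i < q->count; i++)
        if (q->files[i] == file_id)
            return true;            /* merged with an existing request */
    if (q->count == SYNC_QUEUE_MAX)
        return false;               /* caller would have to sync directly */
    q->files[q->count++] = file_id;
    return true;
}
```

With this shape, many backends asking for a sync of the same log file cost the syncer one fsync() for that file.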

AP I suggested this about a year ago. :)

AP The problem is that you need that process to potentially open and close
AP many files over and over.

AP I still think it's somewhat of a good idea.

I am not a DBMS guru.
Couldn't the syncer process cache opened files? Is there any problem I
didn't consider?

-- 
Best regards,
Xu Yifeng






Re: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread Alfred Perlstein

* Xu Yifeng [EMAIL PROTECTED] [010316 01:15] wrote:
 Hello Alfred,
 
 Friday, March 16, 2001, 3:21:09 PM, you wrote:
 
 AP * Xu Yifeng [EMAIL PROTECTED] [010315 22:25] wrote:
 
  Could anyone consider forking a syncer process to sync data to disk?
  Build a shared sync queue; when a daemon process wants to sync after
  write() is called, it just puts a sync request on the queue. This releases
  the process from blocking on the write as soon as possible. Multiple sync
  requests for one file can be merged when a request is inserted into
  the queue.
 
 AP I suggested this about a year ago. :)
 
 AP The problem is that you need that process to potentially open and close
 AP many files over and over.
 
 AP I still think it's somewhat of a good idea.
 
 I am not a DBMS guru.

Hah, same here. :)

 couldn't the syncer process cache opened files? is there any problem I
 didn't consider ?

1) IPC latency, the amount of time it takes to call fsync will
   increase by at least two context switches.

2) a working set (number of files needed to be fsync'd) that
   is larger than the amount of files you wish to keep open.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]





AW: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread Zeugswetter Andreas SB


 Okay ... we can fall back to O_FSYNC if we don't see either of the
 others.  No problem.  Any other weird cases out there?  I think Andreas
 might've muttered something about AIX but I'm not sure now.

You can safely use O_DSYNC on AIX. The only thing special about AIX is
that O_DSYNC makes no speed difference compared to O_SYNC. This is, IMHO,
because jfs needs only one synchronous write to the jfs journal for meta info
in either case (so that nobody misunderstands: both perform excellently).

Andreas




Re: Re[2]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread Bruce Momjian

  Could anyone consider forking a syncer process to sync data to disk?
  Build a shared sync queue; when a daemon process wants to sync after
  write() is called, it just puts a sync request on the queue. This releases
  the process from blocking on the write as soon as possible. Multiple sync
  requests for one file can be merged when a request is inserted into
  the queue.
 
 I suggested this about a year ago. :)
 
 The problem is that you need that process to potentially open and close
 many files over and over.
 
 I still think it's somewhat of a good idea.

I like the idea too, but people want the transaction to return COMMIT
only after data has been fsync'ed so I don't see a big win.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026




Re: Re[2]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread Alfred Perlstein

* Bruce Momjian [EMAIL PROTECTED] [010316 07:11] wrote:
   Could anyone consider forking a syncer process to sync data to disk?
   Build a shared sync queue; when a daemon process wants to sync after
   write() is called, it just puts a sync request on the queue. This releases
   the process from blocking on the write as soon as possible. Multiple sync
   requests for one file can be merged when a request is inserted into
   the queue.
  
  I suggested this about a year ago. :)
  
  The problem is that you need that process to potentially open and close
  many files over and over.
  
  I still think it's somewhat of a good idea.
 
 I like the idea too, but people want the transaction to return COMMIT
 only after data has been fsync'ed so I don't see a big win.

This isn't simply handing off the sync to this other process, it requires
an ack from the syncer before returning 'COMMIT'.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]





Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread Ken Hirsch

From: "Bruce Momjian" [EMAIL PROTECTED]
   Could anyone consider forking a syncer process to sync data to disk?
   Build a shared sync queue; when a daemon process wants to sync after
   write() is called, it just puts a sync request on the queue. This releases
   the process from blocking on the write as soon as possible. Multiple sync
   requests for one file can be merged when a request is inserted into
   the queue.
 
  I suggested this about a year ago. :)
 
  The problem is that you need that process to potentially open and close
  many files over and over.
 
  I still think it's somewhat of a good idea.

 I like the idea too, but people want the transaction to return COMMIT
 only after data has been fsync'ed so I don't see a big win.

For a log file on a busy system, this could improve throughput a lot--batch
commit.  You end up with fewer than one fsync() per transaction.
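
A minimal sketch of the batch-commit idea, assuming the commit records of several transactions are already collected in an array: append them all, then issue one fsync() for the whole batch. The function names and record format are hypothetical and are not taken from xlog.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Open (or create) a log file for appending.  Hypothetical helper. */
int log_open(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
}

/* Append nrecords commit records to fd, then make all of them durable
 * with a single fsync().  Returns the number of fsync() calls issued
 * (1 on success) or -1 on error.  A short write is treated as an error
 * in this sketch. */
int commit_batch(int fd, const char *const *records, int nrecords)
{
    assert(nrecords > 0);
    for (int i = 0; i < nrecords; i++) {
        size_t len = strlen(records[i]);
        if (write(fd, records[i], len) != (ssize_t) len)
            return -1;
    }
    if (fsync(fd) != 0)
        return -1;
    return 1;                       /* one sync covered nrecords commits */
}
```

Each transaction in the batch may report COMMIT only after the shared fsync() returns, so durability is unchanged while the sync cost is split across the batch.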






Re: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread Tom Lane

Alfred Perlstein [EMAIL PROTECTED] writes:
 couldn't the syncer process cache opened files? is there any problem I
 didn't consider ?

 1) IPC latency, the amount of time it takes to call fsync will
increase by at least two context switches.

 2) a working set (number of files needed to be fsync'd) that
is larger than the amount of files you wish to keep open.

These days we're really only interested in fsync'ing the current WAL
log file, so working set doesn't seem like a problem anymore.  However
context-switch latency is likely to be a big problem.  One thing we'd
definitely need before considering this is to replace the existing
spinlock mechanism with something more efficient.

Vadim has designed the WAL stuff in such a way that a separate
writer/syncer process would be easy to add; in fact it's almost that way
already, in that any backend can write or sync data that's been added
to the queue by any other backend.  The question is whether it'd
actually buy anything to have another process.  Good stuff to experiment
with for 7.2.

regards, tom lane




Re: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread Alfred Perlstein

* Tom Lane [EMAIL PROTECTED] [010316 08:16] wrote:
 Alfred Perlstein [EMAIL PROTECTED] writes:
  couldn't the syncer process cache opened files? is there any problem I
  didn't consider ?
 
  1) IPC latency, the amount of time it takes to call fsync will
 increase by at least two context switches.
 
  2) a working set (number of files needed to be fsync'd) that
 is larger than the amount of files you wish to keep open.
 
 These days we're really only interested in fsync'ing the current WAL
 log file, so working set doesn't seem like a problem anymore.  However
 context-switch latency is likely to be a big problem.  One thing we'd
 definitely need before considering this is to replace the existing
 spinlock mechanism with something more efficient.

What sort of problems are you seeing with the spinlock code?

 Vadim has designed the WAL stuff in such a way that a separate
 writer/syncer process would be easy to add; in fact it's almost that way
 already, in that any backend can write or sync data that's been added
 to the queue by any other backend.  The question is whether it'd
 actually buy anything to have another process.  Good stuff to experiment
 with for 7.2.

The delayed/coalesced fsync looked interesting.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]





Re: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread Tom Lane

Alfred Perlstein [EMAIL PROTECTED] writes:
 definitely need before considering this is to replace the existing
 spinlock mechanism with something more efficient.

 What sort of problems are you seeing with the spinlock code?

It's great as long as you never block, but it sucks for making things
wait, because the wait interval will be some multiple of 10 msec rather
than just the time till the lock comes free.
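
One way to see the granularity described above is to time a short select()-based sleep directly. The helper below is a hypothetical measurement sketch, not the spinlock code itself. On a kernel with a 100Hz scheduler clock, the returned value would be expected to land near a multiple of 10 msec; kernels with finer timers may show far less rounding.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <time.h>

/* Sleep for request_us microseconds using select() with no descriptors
 * and return how long the call actually took, in microseconds. */
long timed_select_delay(long request_us)
{
    struct timespec start, end;
    struct timeval tv;

    assert(request_us >= 0);
    tv.tv_sec = request_us / 1000000;
    tv.tv_usec = request_us % 1000000;

    clock_gettime(CLOCK_MONOTONIC, &start);
    select(0, NULL, NULL, NULL, &tv);       /* pure timeout, no fds watched */
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (long) (end.tv_sec - start.tv_sec) * 1000000L
         + (end.tv_nsec - start.tv_nsec) / 1000;
}
```

Comparing timed_select_delay(1000) against the requested 1000 us shows how much of a waiter's delay comes from timer rounding rather than from the lock holder.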

We've speculated about using Posix semaphores instead, on platforms
where those are available.  I think Bruce was concerned about the
possible overhead of pulling in a whole thread-support library just to
get semaphores, however.

regards, tom lane




RE: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread Mikheev, Vadim

  I was wondering if the multiple writes performed to the 
  XLOG could be grouped into one write().
 
 That would require fairly major restructuring of xlog.c, which I don't

Restructuring? Why? Only XLogWrite() does the writes.

 want to undertake at this point in the cycle (we're trying to push out
 a release candidate, remember?).  I'm not convinced it would be a huge
 win anyway.  It would be a win if your average transaction writes
 multiple blocks' worth of XLOG ... but if your average transaction
 writes less than a block then it won't help.

But in a multi-user environment, multiple transactions may write more than 1 block
before commit.

 I think it probably is a good idea to restructure xlog.c so 
 that it can write more than one page at a time --- but it's
 not such a great idea that I want to hold up the release any
 more for it.

Agreed.

Vadim




Re: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread The Hermit Hacker

On Fri, 16 Mar 2001, Tom Lane wrote:

 Alfred Perlstein [EMAIL PROTECTED] writes:
  definitely need before considering this is to replace the existing
  spinlock mechanism with something more efficient.

  What sort of problems are you seeing with the spinlock code?

 It's great as long as you never block, but it sucks for making things
 wait, because the wait interval will be some multiple of 10 msec rather
 than just the time till the lock comes free.

 We've speculated about using Posix semaphores instead, on platforms
 where those are available.  I think Bruce was concerned about the
 possible overhead of pulling in a whole thread-support library just to
 get semaphores, however.

But, with shared libraries, are you really pulling in a "whole
thread-support library"?  My understanding of shared libraries (altho it
may be totally off) was that instead of pulling in a whole library, you
pulled in the bits that you needed, pretty much as you needed them ...







Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread Tom Lane

"Mikheev, Vadim" [EMAIL PROTECTED] writes:
 I was wondering if the multiple writes performed to the 
 XLOG could be grouped into one write().
 
 That would require fairly major restructuring of xlog.c, which I don't

 Restructuring? Why? Only XLogWrite() does the writes.

I was thinking of changing the data structure.  I guess you could keep
the data structure the same and make XLogWrite more complicated, though.
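
As a rough illustration of the "keep the data structure, change the write loop" option: if the pages to be flushed sit next to each other in the buffer, they can go out in one write() call instead of one call per page. The function name and the 8192-byte page size are assumptions for this sketch; it is not the actual XLogWrite() code.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

#define WAL_BLCKSZ 8192             /* assumed page size for this sketch */

/* Write npages contiguous pages starting at buf.  Normally this is a
 * single write() call; the loop only continues after a short write.
 * Returns 0 on success, -1 on failure. */
int xlog_write_pages(int fd, const char *buf, int npages)
{
    assert(buf != NULL && npages > 0);
    size_t total = (size_t) npages * WAL_BLCKSZ;
    size_t done = 0;

    while (done < total) {
        ssize_t n = write(fd, buf + done, total - done);
        if (n <= 0)
            return -1;
        done += (size_t) n;
    }
    return 0;
}
```

Pages that are not adjacent in memory (for example when the buffer wraps around) would still need separate calls, which is part of the added complexity mentioned above.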

 I think it probably is a good idea to restructure xlog.c so 
 that it can write more than one page at a time --- but it's
 not such a great idea that I want to hold up the release any
 more for it.

 Agreed.

Yes, to-do item for 7.2.

regards, tom lane




RE: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread Mikheev, Vadim

 We've speculated about using Posix semaphores instead, on platforms

For spinlocks we should use pthread mutexes.

 where those are available.  I think Bruce was concerned about the

And mutexes are more portable than semaphores.

Vadim




Re: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread Tom Lane

Larry Rosenman [EMAIL PROTECTED] writes:
 But, with shared libraries, are you really pulling in a "whole
 thread-support library"?

 Yes, you are.  On UnixWare, you need to add -Kthread, which CHANGES a LOT 
 of primitives to go through threads wrappers and scheduling.

Right, it's not so much that we care about referencing another shlib,
it's that -lpthreads may cause you to get a whole new thread-aware
version of libc, with attendant overhead that we don't need or want.

regards, tom lane




AW: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread Zeugswetter Andreas SB


  definitely need before considering this is to replace the existing
  spinlock mechanism with something more efficient.
 
  What sort of problems are you seeing with the spinlock code?
 
 It's great as long as you never block, but it sucks for making things

I like optimistic approaches :-)

 wait, because the wait interval will be some multiple of 10 msec rather
 than just the time till the lock comes free.

On the AIX platform usleep (3) is able to really sleep microseconds without 
busying the cpu when called for more than approx. 100 us (the longer the interval,
the less busy the cpu gets).
Would this not be ideal for spin_lock, or is usleep not very common?
Linux says it is in the BSD 4.3 standard.

postgres@s0188000zeu:/usr/postgres time ustest # with 100 us
real 0m10.95s
user 0m0.40s
sys  0m0.74s

postgres@s0188000zeu:/usr/postgres time ustest # with 10 us
real 0m18.62s
user 0m1.37s
sys  0m5.73s
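
The attached ustest.c is not preserved in this archive. A rough, hypothetical stand-in for such a test, assuming it simply calls usleep() in a loop and measures the total elapsed time, might look like this:

```c
#define _DEFAULT_SOURCE 1           /* glibc: make usleep() visible */
#include <assert.h>
#include <time.h>
#include <unistd.h>

/* Call usleep(interval_us) the given number of times and return the
 * total elapsed time in microseconds.  Comparing the result with
 * interval_us * iterations shows how far each sleep overshoots. */
long time_usleep_loop(unsigned int interval_us, int iterations)
{
    struct timespec start, end;

    assert(iterations > 0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < iterations; i++)
        usleep(interval_us);
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (long) (end.tv_sec - start.tv_sec) * 1000000L
         + (end.tv_nsec - start.tv_nsec) / 1000;
}
```

The iteration count behind the timings above is not stated, so the numbers cannot be reproduced from this sketch alone.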

Andreas

PS: sorry off for weekend now :-) Current looks good on AIX.


 ustest.c





Re: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread Doug McNaught

Tom Lane [EMAIL PROTECTED] writes:

 Alfred Perlstein [EMAIL PROTECTED] writes:
  definitely need before considering this is to replace the existing
  spinlock mechanism with something more efficient.
 
  What sort of problems are you seeing with the spinlock code?
 
 It's great as long as you never block, but it sucks for making things
 wait, because the wait interval will be some multiple of 10 msec rather
 than just the time till the lock comes free.

Plus, using select() for the timeout is putting you into the kernel
multiple times in a short period, and causing a reschedule every time,
which is a big lose.  This was discussed in the linux-kernel thread
that was referred to a few days ago.

 We've speculated about using Posix semaphores instead, on platforms
 where those are available.  I think Bruce was concerned about the
 possible overhead of pulling in a whole thread-support library just to
 get semaphores, however.

Are Posix semaphores faster by definition than SysV semaphores (which
are described as "slow" in the source comments)?  I can't see how
they'd be much faster unless locking/unlocking an uncontended
semaphore avoids a system call, in which case you might run into the
same problems with userland backoff...

Just looked, and on Linux pthreads and POSIX semaphores are both
already in the C library.  Unfortunately, the Linux C library doesn't
support the PROCESS_SHARED attribute for either pthreads mutexes or
POSIX semaphores.  Grumble.  What's the point then?

Just some ignorant ramblings, thanks for listening...

-Doug




AW: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread Zeugswetter Andreas SB


 For a log file on a busy system, this could improve throughput a lot--batch
 commit.  You end up with fewer than one fsync() per transaction.

This is not the issue, since that is already implemented.
The current bunching method might have room for improvement, but
there are currently fewer fsync's than transactions when appropriate.

Andreas




Re: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread Bruce Momjian

 Yes, you are.  On UnixWare, you need to add -Kthread, which CHANGES a LOT 
 of primitives to go through threads wrappers and scheduling.

This was my concern;  the change that happens on startup and lib calls
when thread support comes in through a library.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026




Re: AW: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread Tom Lane

Zeugswetter Andreas SB  [EMAIL PROTECTED] writes:
 It's great as long as you never block, but it sucks for making things
 wait, because the wait interval will be some multiple of 10 msec rather
 than just the time till the lock comes free.

 On the AIX platform usleep (3) is able to really sleep microseconds without 
 busying the cpu when called for more than approx. 100 us (the longer the interval,
 the less busy the cpu gets).
 Would this not be ideal for spin_lock, or is usleep not very common?
 Linux says it is in the BSD 4.3 standard.

HPUX has usleep, but the man page says

 The usleep() function is included for its historical usage. The
 setitimer() function is preferred over this function.

In any case, I would expect that all these functions offer accuracy
no better than the scheduler's regular clock cycle (~ 100Hz) on most
kernels.

regards, tom lane




Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Alfred Perlstein

* Tom Lane [EMAIL PROTECTED] [010315 09:35] wrote:
 
 BTW, are there any platforms where O_DSYNC exists but has a different
 spelling?

Yes, FreeBSD only has O_FSYNC;
it has neither O_SYNC nor O_DSYNC.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]





Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Tom Lane

Alfred Perlstein [EMAIL PROTECTED] writes:
 * Tom Lane [EMAIL PROTECTED] [010315 09:35] wrote:
 BTW, are there any platforms where O_DSYNC exists but has a different
 spelling?

 Yes, FreeBSD only has O_FSYNC;
 it has neither O_SYNC nor O_DSYNC.

Okay ... we can fall back to O_FSYNC if we don't see either of the
others.  No problem.  Any other weird cases out there?  I think Andreas
might've muttered something about AIX but I'm not sure now.
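
The fallback could be expressed as a preprocessor chain along these lines. The WAL_OPEN_SYNC_* macro names and the wal_append() helper are made up for illustration; only the O_* flags come from the system headers.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Pick whichever synchronous-open flag this platform spells, in the
 * order discussed above. */
#if defined(O_DSYNC)
#define WAL_OPEN_SYNC_FLAG O_DSYNC
#define WAL_OPEN_SYNC_NAME "O_DSYNC"
#elif defined(O_SYNC)
#define WAL_OPEN_SYNC_FLAG O_SYNC
#define WAL_OPEN_SYNC_NAME "O_SYNC"
#elif defined(O_FSYNC)
#define WAL_OPEN_SYNC_FLAG O_FSYNC
#define WAL_OPEN_SYNC_NAME "O_FSYNC"
#else
#define WAL_OPEN_SYNC_FLAG 0        /* no flag: caller must fsync() instead */
#define WAL_OPEN_SYNC_NAME "none"
#endif

/* Open path with the chosen flag, append len bytes, and close.
 * Returns 0 on success, -1 on any failure. */
int wal_append(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND | WAL_OPEN_SYNC_FLAG, 0600);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, buf, len);
    int rc = (n == (ssize_t) len) ? 0 : -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

On a platform that defines none of the three flags, the last branch leaves the open() flags unchanged, so explicit fsync() calls would still be required there.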

regards, tom lane




Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Peter Eisentraut

Tom Lane writes:

 I think we need to make both O_SYNC and fsync() choices available in
 7.1.  Two important questions need to be settled:

 1. Is a compile-time flag (in config.h.in) good enough, or do we need
 to make it configurable via a GUC variable?  (A variable would have to
 be postmaster-start-time changeable only, so you'd still need a
 postmaster restart to change it.)

As a general rule, if something can be a run time option, as opposed to a
compile time option, then it should be.  At the very least you keep the
installation simple and allow for easier experimenting.

 There's also the lesser question of what to call the config symbol
 or variable.

I suggest "wal_use_fsync" as a GUC variable, assuming the default would be
off.  Otherwise "wal_use_open_sync".  (Use a general-to-specific naming
scheme to allow for easier grouping.  Having defaults be "off"
consistently is more intuitive.)

-- 
Peter Eisentraut  [EMAIL PROTECTED]   http://yi.org/peter-e/





Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Tom Lane

Peter Eisentraut [EMAIL PROTECTED] writes:
 As a general rule, if something can be a run time option, as opposed to a
 compile time option, then it should be.  At the very least you keep the
 installation simple and allow for easier experimenting.

I've been mentally working through the code, and see only one reason why
it might be necessary to go with a compile-time choice: suppose we see
that none of O_DSYNC, O_SYNC, O_FSYNC, [others] are defined?  With the
compile-time choice it's easy: #define USE_FSYNC_FOR_WAL, and sail on.
If it's a GUC variable then we need a way to prevent the GUC option from
becoming unset (which would disable the fsync() calls, leaving nothing
to replace 'em).  Doable, perhaps, but seems kind of ugly ... any
thoughts about that?

regards, tom lane




RE: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Mikheev, Vadim

 Based on the tests we did last week, it seems clear that on many
 platforms it's a win to sync the WAL log by writing it with open()
 option O_SYNC (or O_DSYNC where available) rather than 
 issuing explicit fsync() (resp. fdatasync()) calls.

I don't remember a big difference between fsync and O_SYNC in the tfsync
tests. Both depend on block size, and keeping in mind that fsync
lets us sync after writing *multiple* blocks, I would either
use fsync as the default or not deal with O_SYNC at all.
But if O_DSYNC is defined and O_DSYNC != O_SYNC then we should
use O_DSYNC by default.
(BTW, we didn't compare fdatasync and O_SYNC yet).
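
The multiple-block point can be sketched as follows: several write() calls followed by a single fsync(), as opposed to a file opened with O_SYNC where every write() waits for the disk. The function name and the 8192-byte block size are assumptions made for this sketch only.

```c
#include <fcntl.h>
#include <unistd.h>

#define BLCKSZ_SKETCH 8192      /* hypothetical block size for this sketch */

/*
 * Write nblocks blocks from buf to fd, then issue a single fsync().
 * Returns 0 on success, -1 on error.
 */
static int
write_blocks_one_fsync(int fd, const char *buf, int nblocks)
{
    int         i;

    for (i = 0; i < nblocks; i++)
    {
        const char *p = buf + (size_t) i * BLCKSZ_SKETCH;

        if (write(fd, p, BLCKSZ_SKETCH) != BLCKSZ_SKETCH)
            return -1;
    }
    return fsync(fd);           /* one sync covers every block written above */
}
```

With an O_SYNC descriptor the same loop would pay for one synchronous write per block, which is the cost being weighed here.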

Vadim




Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Alfred Perlstein

* Tom Lane [EMAIL PROTECTED] [010315 11:07] wrote:
 "Mikheev, Vadim" [EMAIL PROTECTED] writes:
  ... I would either
  use fsync as default or don't deal with O_SYNC at all.
  But if O_DSYNC is defined and O_DSYNC != O_SYNC then we should
  use O_DSYNC by default.
 
 Hm.  We could do that reasonably painlessly as a compile-time test in
 xlog.c, but I'm not clear on how it would play out as a GUC option.
 Peter, what do you think about configuration-dependent defaults for
 GUC variables?

Sorry, what's a GUC? :)

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]





Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Peter Eisentraut

Alfred Perlstein writes:

 Sorry, what's a GUC? :)

Grand Unified Configuration system

It's basically a cute name for the achievement that there's now a single
name space and interface for (almost) all postmaster run time
configuration variables.

-- 
Peter Eisentraut  [EMAIL PROTECTED]   http://yi.org/peter-e/





Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Lamar Owen

Alfred Perlstein wrote:
 * Tom Lane [EMAIL PROTECTED] [010315 11:07] wrote:
  Peter, what do you think about configuration-dependent defaults for
  GUC variables?
 
 Sorry, what's a GUC? :)

Grand Unified Configuration, Peter E.'s baby.

See the thread starting at
http://www.postgresql.org/mhonarc/pgsql-hackers/2000-03/msg00107.html
for details.

(And the search is working :-)).
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11




Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Alfred Perlstein

* Tom Lane [EMAIL PROTECTED] [010315 11:45] wrote:
 Alfred Perlstein [EMAIL PROTECTED] writes:
  And since we're sorta on the topic of IO, I noticed that it looks
  like (at least in 7.0.3) that vacuum and certain other routines
  read files in reverse order.
 
 Vacuum does that because it's trying to push tuples down from the end
 into free space in earlier blocks.  I don't see much way around that
 (nor any good reason to think that it's a critical part of vacuum's
 performance anyway).  Where else have you seen such behavior?

Just vacuum, but the source is large, and I'm sort of lacking
on database-foo so I guessed that it may be done elsewhere.

You can optimize this out by implementing the read behind yourselves
sorta like this:

struct sglist *
read(fd, len)
{

	if (fd.lastpos - fd.curpos <= THRESHOLD) {
		fd.curpos = fd.lastpos - THRESHOLD;
		len = THRESHOLD;
	}

	return (do_read(fd, len));
}

of course this is entirely wrong, but illustrates what
would/could help.

I would fix FreeBSD, but it's sort of a mess and beyond what
I've got time to do ATM.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]





Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Bruce Momjian

 Peter Eisentraut [EMAIL PROTECTED] writes:
  As a general rule, if something can be a run time option, as opposed to a
  compile time option, then it should be.  At the very least you keep the
  installation simple and allow for easier experimenting.
 
 I've been mentally working through the code, and see only one reason why
 it might be necessary to go with a compile-time choice: suppose we see
 that none of O_DSYNC, O_SYNC, O_FSYNC, [others] are defined?  With the
 compile-time choice it's easy: #define USE_FSYNC_FOR_WAL, and sail on.
 If it's a GUC variable then we need a way to prevent the GUC option from
 becoming unset (which would disable the fsync() calls, leaving nothing
 to replace 'em).  Doable, perhaps, but seems kind of ugly ... any
 thoughts about that?

I don't think having something as a run-time option is always a good idea. 
Giving people too many choices is often confusing.  

I think we should just check at compile time, and choose O_* if we have
it, and if not, use fsync().  No one will ever do the proper timing
tests to know which is better except us.  Also, it seems O_* should be
faster because you are fsync'ing the buffer you just wrote, so there is
no looking around for dirty buffers like fsync().

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026




Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Bruce Momjian

 Based on the tests we did last week, it seems clear that on many
 platforms it's a win to sync the WAL log by writing it with open()
 option O_SYNC (or O_DSYNC where available) rather than issuing explicit
 fsync() (resp. fdatasync()) calls.  In theory fsync ought to be faster,
 but it seems that too many kernels have inefficient implementations of
 fsync.

Can someone explain why configure/platform-specific flags are allowed to
be added at this stage in the release, but my pgmonitor patch was
rejected?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026




Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 Can someone explain why configure/platform-specific flags are allowed to
 be added at this stage in the release, but my pgmonitor patch was
 rejected?

Possibly just because Marc hasn't stomped on me quite yet ;-)

However, I can actually make a case for this: we are flushing out
performance bugs in a new feature, ie WAL.

regards, tom lane




Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Bruce Momjian

 Bruce Momjian [EMAIL PROTECTED] writes:
  Can someone explain why configure/platform-specific flags are allowed to
  be added at this stage in the release, but my pgmonitor patch was
  rejected?
 
 Possibly just because Marc hasn't stomped on me quite yet ;-)
 
 However, I can actually make a case for this: we are flushing out
 performance bugs in a new feature, ie WAL.


You did a masterful job of making my pgmonitor patch sound like a debug
aid instead of a feature too.  :-)

Have you considered a career in law?  :-)

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026




Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Bruce Momjian

  I've been mentally working through the code, and see only one reason why
  it might be necessary to go with a compile-time choice: suppose we see
  that none of O_DSYNC, O_SYNC, O_FSYNC, [others] are defined?  With the
  compile-time choice it's easy: #define USE_FSYNC_FOR_WAL, and sail on.
  If it's a GUC variable then we need a way to prevent the GUC option from
  becoming unset (which would disable the fsync() calls, leaving nothing
  to replace 'em).  Doable, perhaps, but seems kind of ugly ... any
  thoughts about that?
 
 I don't think having something as a run-time option is always a good idea. 
 Giving people too many choices is often confusing.  
 
 I think we should just check at compile time, and choose O_* if we have
 it, and if not, use fsync().  No one will ever do the proper timing
 tests to know which is better except us.  Also, it seems O_* should be
 faster because you are fsync'ing the buffer you just wrote, so there is
 no looking around for dirty buffers like fsync().

I later read Vadim's comment that fsync() of two blocks may be faster
than two O_* writes, so I am now confused about the proper solution. 
However, I think we need to pick one and make it invisible to the user. 
Perhaps a compiler/config.h flag for testing would be a good solution.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026




Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Bruce Momjian

  Based on the tests we did last week, it seems clear that on many
  platforms it's a win to sync the WAL log by writing it with open()
  option O_SYNC (or O_DSYNC where available) rather than 
  issuing explicit fsync() (resp. fdatasync()) calls.
 
 I don't remember a big difference between fsync and O_SYNC in the tfsync
 tests. Both depend on block size, and keeping in mind that fsync
 lets us sync after writing *multiple* blocks, I would either
 use fsync as the default or not deal with O_SYNC at all.

I see what you are saying: the OS may be faster at fsync'ing two
blocks in one operation than at doing two O_SYNC operations.

Seems we should just pick a default and leave the rest for a later
release.  Marc wants RC1 tomorrow, I think.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026




Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 I later read Vadim's comment that fsync() of two blocks may be faster
 than two O_* writes, so I am now confused about the proper solution. 
 However, I think we need to pick one and make it invisible to the user. 
 Perhaps a compiler/config.h flag for testing would be a good solution.

I believe that we don't know enough yet to nail down a hard-wired
decision.  Vadim's idea of preferring O_DSYNC if it appears to be
different from O_SYNC is a good first cut, but I think we'd better make
it possible to override that, at least for testing purposes.

So I think it should be configurable at *some* level.  I don't much care
whether it's a config.h entry or a GUC variable.

But consider this: we'll be more likely to get some feedback from the
field (allowing us to refine the policy in future releases) if it is a
GUC variable.  Not many people will build two versions of the software,
but people might take the trouble to play with a run-time configuration
setting.

regards, tom lane




Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Bruce Momjian

 Bruce Momjian [EMAIL PROTECTED] writes:
  I later read Vadim's comment that fsync() of two blocks may be faster
  than two O_* writes, so I am now confused about the proper solution. 
  However, I think we need to pick one and make it invisible to the user. 
  Perhaps a compiler/config.h flag for testing would be a good solution.
 
 I believe that we don't know enough yet to nail down a hard-wired
 decision.  Vadim's idea of preferring O_DSYNC if it appears to be
 different from O_SYNC is a good first cut, but I think we'd better make
 it possible to override that, at least for testing purposes.
 
 So I think it should be configurable at *some* level.  I don't much care
 whether it's a config.h entry or a GUC variable.
 
 But consider this: we'll be more likely to get some feedback from the
 field (allowing us to refine the policy in future releases) if it is a
 GUC variable.  Not many people will build two versions of the software,
 but people might take the trouble to play with a run-time configuration
 setting.

Yes, I can imagine.  Can we remove it once we know the answer?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026




Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Larry Rosenman

I'd actually vote for it to remain for a release or two or more. As 
we get more experience with stuff, the defaults may be different for 
different workloads. 

LER
-- 
Larry Rosenman
 http://www.lerctr.org/~ler/
Phone: +1 972 414 9812
 E-Mail: [EMAIL PROTECTED]
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749 US

 Original Message 

On 3/15/01, 2:46:20 PM, Bruce Momjian [EMAIL PROTECTED] wrote 
regarding Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC:


  Bruce Momjian [EMAIL PROTECTED] writes:
   I later read Vadim's comment that fsync() of two blocks may be faster
   than two O_* writes, so I am now confused about the proper solution.
   However, I think we need to pick one and make it invisible to the user.
   Perhaps a compiler/config.h flag for testing would be a good solution.
 
  I believe that we don't know enough yet to nail down a hard-wired
  decision.  Vadim's idea of preferring O_DSYNC if it appears to be
  different from O_SYNC is a good first cut, but I think we'd better make
  it possible to override that, at least for testing purposes.
 
  So I think it should be configurable at *some* level.  I don't much care
  whether it's a config.h entry or a GUC variable.
 
  But consider this: we'll be more likely to get some feedback from the
  field (allowing us to refine the policy in future releases) if it is a
  GUC variable.  Not many people will build two versions of the software,
  but people might take the trouble to play with a run-time configuration
  setting.

 Yes, I can imagine.  Can we remove it once we know the answer?




Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Peter Eisentraut

Tom Lane writes:

 "Mikheev, Vadim" [EMAIL PROTECTED] writes:
  ... I would either
  use fsync as default or don't deal with O_SYNC at all.
  But if O_DSYNC is defined and O_DSYNC != O_SYNC then we should
  use O_DSYNC by default.

 Hm.  We could do that reasonably painlessly as a compile-time test in
 xlog.c, but I'm not clear on how it would play out as a GUC option.
 Peter, what do you think about configuration-dependent defaults for
 GUC variables?

We have plenty of those already, but we should avoid a variable whose
specification is:

"The default is 'on' if your system defines one of the macros O_SYNC,
O_DSYNC, O_FSYNC, and if O_SYNC and O_DSYNC are distinct, otherwise the
default is 'off'."

The net result of this would be that the average user would have
absolutely no clue what the default on his machine is.

Additionally consider that maybe O_SYNC and O_DSYNC have different values
but the kernel treats them the same anyway.  We really shouldn't try to
guess that far.

-- 
Peter Eisentraut  [EMAIL PROTECTED]   http://yi.org/peter-e/





Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Peter Eisentraut

Tom Lane writes:

 However, I can actually make a case for this: we are flushing out
 performance bugs in a new feature, ie WAL.

I haven't followed the jungle of numbers too closely.

Is it not the case that WAL + fsync is still faster than 7.0 + fsync and
WAL/no fsync is still faster than 7.0/no fsync?

-- 
Peter Eisentraut  [EMAIL PROTECTED]   http://yi.org/peter-e/





Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Tom Lane

Peter Eisentraut [EMAIL PROTECTED] writes:
 Peter, what do you think about configuration-dependent defaults for
 GUC variables?

 We have plenty of those already, but we should avoid a variable whose
 specification is:

 "The default is 'on' if your system defines one of the macros O_SYNC,
 O_DSYNC, O_FSYNC, and if O_SYNC and O_DSYNC are distinct, otherwise the
 default is 'off'."

Unfortunately, I think that's just about what the default would need to
be.  What alternative do you have to offer?

 The net result of this would be that the average user would have
 absolutely no clue what the default on his machine is.

Sure he would.  Fire up the software and do "SHOW wal_use_fsync"
(or whatever we call it).  I think the documentation could just say
"the default is platform-dependent".

 Additionally consider that maybe O_SYNC and O_DSYNC have different values
 but the kernel treats them the same anyway.  We really shouldn't try to
 guess that far.

Well, that's exactly *why* we need an overridable default.  Or would you
like to try to do some performance measurements in configure?

regards, tom lane




RE: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Mikheev, Vadim

 I believe that we don't know enough yet to nail down a hard-wired
 decision.  Vadim's idea of preferring O_DSYNC if it appears to be
 different from O_SYNC is a good first cut, but I think we'd 
 better make it possible to override that, at least for testing purposes.

So let's leave fsync as the default and add an option to open log files
with O_DSYNC/O_SYNC.

Vadim




Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Peter Eisentraut

Tom Lane writes:

 I've been mentally working through the code, and see only one reason why
 it might be necessary to go with a compile-time choice: suppose we see
 that none of O_DSYNC, O_SYNC, O_FSYNC, [others] are defined?

We postulate that one of those has to exist.  Alternatively, you make the
option read

wal_sync_method = fsync | open_sync

In the "parse_hook" for the parameter you #ifdef out 'open_sync' as a
valid option if none of those exist, so a user will get "'open_sync' is
not a valid option value".
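
A minimal sketch of such a validation hook, assuming it receives the proposed string and returns whether it is acceptable. The function name, the bool return type, and the HAVE_OPEN_SYNC_SKETCH macro are hypothetical; the real GUC hook interface is not shown in this thread.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>

/* Defined only if this platform has some synchronous-open flag. */
#if defined(O_DSYNC) || defined(O_SYNC) || defined(O_FSYNC)
#define HAVE_OPEN_SYNC_SKETCH 1
#endif

/*
 * Hypothetical validation hook for wal_sync_method: returns true if
 * the proposed value is acceptable on this build.
 */
static bool
check_wal_sync_method(const char *value)
{
    if (strcmp(value, "fsync") == 0)
        return true;
#ifdef HAVE_OPEN_SYNC_SKETCH
    if (strcmp(value, "open_sync") == 0)
        return true;
#endif
    return false;               /* caller reports the invalid-value error */
}
```

On a platform with none of the O_* flags, 'open_sync' simply falls through to the rejection path, so fsync can never be switched off without a replacement.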

-- 
Peter Eisentraut  [EMAIL PROTECTED]   http://yi.org/peter-e/





Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Bruce Momjian

 "The default is 'on' if your system defines one of the macros O_SYNC,
 O_DSYNC, O_FSYNC, and if O_SYNC and O_DSYNC are distinct, otherwise the
 default is 'off'."
 
 The net result of this would be that the average user would have
 absolutely no clue what the default on his machine is.
 
 Additionally consider that maybe O_SYNC and O_DSYNC have different values
 but the kernel treats them the same anyway.  We really shouldn't try to
 guess that far.

Good point.  I think Tom already found dfsync points to fsync in his
libc, or something like that.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026




Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Tom Lane

Peter Eisentraut [EMAIL PROTECTED] writes:
 We postulate that one of those has to exist.  Alternatively, you make the
 option read
 wal_sync_method = fsync | open_sync
 In the "parse_hook" for the parameter you #ifdef out 'open_sync' as a
 valid option if none of those exist, so a user will get "'open_sync' is
 not a valid option value".

I like this a lot.  In fact, I am mightily tempted to make it

wal_sync_method = fsync | fdatasync | open_sync | open_datasync

where fdatasync would only be valid if configure found fdatasync() and
open_datasync would only be valid if we found O_DSYNC exists and isn't
O_SYNC.  This would let people try all the available methods under
realistic test conditions, for hardly any extra work.

Furthermore, the documentation could say something like "The default is
the first available method in the order open_datasync, fdatasync, fsync,
open_sync" (assuming that Vadim's preferences are right).
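
The "first available method" rule could be sketched like this. HAVE_FDATASYNC is assumed to be a symbol set by configure, and the function name is hypothetical. Because fsync() is always available, the last entry in the order (open_sync) is never reached as a default in this sketch.

```c
#include <fcntl.h>

/*
 * Return the default wal_sync_method for this build, following the
 * order open_datasync, fdatasync, fsync given in the message above.
 * HAVE_FDATASYNC is assumed to come from configure.
 */
static const char *
default_wal_sync_method(void)
{
#if defined(O_DSYNC) && (!defined(O_SYNC) || O_DSYNC != O_SYNC)
    return "open_datasync";     /* O_DSYNC exists and is distinct from O_SYNC */
#elif defined(HAVE_FDATASYNC)
    return "fdatasync";
#else
    return "fsync";
#endif
}
```

The result depends on the platform headers, which is why the documentation would describe the default as platform-dependent.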

A small problem is that I don't want to be doing multiple strcasecmp's
to figure out what to do in xlog.c.  Do you object if I add an
"assign_hook" to guc.c that's called when an actual assignment is made?
That would provide a place to set up the flag variables that xlog.c
would actually look at.  Furthermore, having an assign_hook would let us
support changing this value at SIGHUP, not only at postmaster start.
(The assign hook would just need to fsync whatever WAL file is currently
open and possibly close/reopen the file, to ensure that no blocks miss
getting synced when we change conventions.)

Creeping featurism strikes again ;-) ... but this feels right ...

regards, tom lane




Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Peter Eisentraut

Tom Lane writes:

 wal_sync_method = fsync | fdatasync | open_sync | open_datasync

 A small problem is that I don't want to be doing multiple strcasecmp's
 to figure out what to do in xlog.c.

This should be efficient:

switch(lower(string[0]) + lower(string[5]))
{
case 'f':   /* fsync */
case 'f' + 's': /* fdatasync */
case 'o' + 's': /* open_sync */
case 'o' + 'd': /* open_datasync */
}

Although ugly, it should serve as a readable solution for now.
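
For reference, a compilable version of the same trick, assuming the standard tolower() in place of lower(). The enum and function names are hypothetical. Index 5 is the terminating NUL for "fsync", which is why that case is just 'f'; the length guard covers shorter input. Like the original, it only dispatches and does not verify the whole string.

```c
#include <ctype.h>
#include <string.h>

enum sync_method_sketch
{
    SM_FSYNC, SM_FDATASYNC, SM_OPEN_SYNC, SM_OPEN_DATASYNC, SM_INVALID
};

/* Dispatch on the first and sixth characters of the option value. */
static enum sync_method_sketch
classify_sync_method(const char *s)
{
    if (strlen(s) < 5)
        return SM_INVALID;      /* s[5] would be out of bounds */

    switch (tolower((unsigned char) s[0]) + tolower((unsigned char) s[5]))
    {
        case 'f':               /* "fsync": s[5] is '\0' */
            return SM_FSYNC;
        case 'f' + 's':         /* "fdatasync" */
            return SM_FDATASYNC;
        case 'o' + 's':         /* "open_sync" */
            return SM_OPEN_SYNC;
        case 'o' + 'd':         /* "open_datasync" */
            return SM_OPEN_DATASYNC;
        default:
            return SM_INVALID;
    }
}
```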

 Do you object if I add an "assign_hook" to guc.c that's called when an
 actual assignment is made?

Something like this is on my wish list, but I'm not sure if it's wise to
start this now.  There are a few issues that need some thought, like how
to make the interface for non-string options, and how to keep it in sync
with the parse hook of string options, ...

 That would provide a place to set up the flag variables that xlog.c
 would actually look at.  Furthermore, having an assign_hook would let
 us support changing this value at SIGHUP, not only at postmaster
 start. (The assign hook would just need to fsync whatever WAL file is
 currently open and possibly close/reopen the file, to ensure that no
 blocks miss getting synced when we change conventions.)

... and possibly here you need to pass the context to the assign hook as
well.  This application strikes me as a bit too esoteric for a first try.

-- 
Peter Eisentraut  [EMAIL PROTECTED]   http://yi.org/peter-e/





Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Alfred Perlstein

* Mikheev, Vadim [EMAIL PROTECTED] [010315 13:52] wrote:
  I believe that we don't know enough yet to nail down a hard-wired
  decision.  Vadim's idea of preferring O_DSYNC if it appears to be
  different from O_SYNC is a good first cut, but I think we'd 
  better make it possible to override that, at least for testing purposes.
 
 So let's leave fsync as the default and add an option to open log files
 with O_DSYNC/O_SYNC.

I have a weird and untested suggestion:

How many files need to be fsync'd?

If it's more than one, what might work is using mmap() to map the
files in adjacent areas, then calling msync() on the entire range,
this would allow you to batch fsync the data.

The only problem is that I'm not sure:

1) how portable msync() is.
2) if msync guarantees metadata consistency.

Another benefit of mmap() is the 'zero' copy nature of it.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]





Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Tom Lane

Alfred Perlstein [EMAIL PROTECTED] writes:
 How many files need to be fsync'd?

Only one.

 If it's more than one, what might work is using mmap() to map the
 files in adjacent areas, then calling msync() on the entire range,
 this would allow you to batch fsync the data.

Interesting thought, but mmap to a prespecified address is most
definitely not portable, whether or not you want to assume that
plain mmap is ...

regards, tom lane




Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Tom Lane

Peter Eisentraut [EMAIL PROTECTED] writes:
 switch(lower(string[0]) + lower(string[5]))
 {
   case 'f':   /* fsync */
   case 'f' + 's': /* fdatasync */
   case 'o' + 's': /* open_sync */
   case 'o' + 'd': /* open_datasync */
 }

 Although ugly, it should serve as a readable solution for now.

Ugly is the word ...

 Do you object if I add an "assign_hook" to guc.c that's called when an
 actual assignment is made?

 Something like this is on my wish list, but I'm not sure if it's wise to
 start this now.

I'm not particularly concerned about changing the interface later if
that proves necessary.  We're not likely to have so many of the things
that an API change is burdensome, and they will all be strictly backend
internal.

What I have in mind for now is just

void (*assign_hook) (const char *newval);

(obviously this is for string variables only, for now) called just
before actually changing the variable value.  This lets the hook see
the old value if it needs to.
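
A minimal sketch of how such a hook might be wired up, assuming the setter calls the hook before storing the new value. Only the hook signature comes from the message above; the variable names, the flag, and set_string_option() are hypothetical stand-ins for whatever guc.c and xlog.c would actually use.

```c
#include <string.h>

/* Hook type as proposed above: called just before the value changes. */
typedef void (*assign_hook_t) (const char *newval);

/* Hypothetical stand-ins for the GUC variable and the flag xlog.c reads. */
static const char *wal_sync_method_sketch = "fsync";
static int  use_open_sync_sketch = 0;

static void
assign_wal_sync_method(const char *newval)
{
    /* The old value is still visible here; assignment happens afterwards. */
    if (strcmp(wal_sync_method_sketch, newval) == 0)
        return;                 /* nothing changes */

    /* A real hook would fsync, and possibly reopen, the current WAL file. */
    use_open_sync_sketch = (strncmp(newval, "open_", 5) == 0);
}

/* Hypothetical setter showing the call order: hook first, then assignment. */
static void
set_string_option(const char **var, assign_hook_t hook, const char *newval)
{
    if (hook)
        hook(newval);
    *var = newval;
}
```

Doing the string comparison once in the hook leaves xlog.c with a plain integer flag to test on each write.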

regards, tom lane




Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Alfred Perlstein

* Tom Lane [EMAIL PROTECTED] [010315 14:54] wrote:
 Alfred Perlstein [EMAIL PROTECTED] writes:
  How many files need to be fsync'd?
 
 Only one.
 
  If it's more than one, what might work is using mmap() to map the
  files in adjacent areas, then calling msync() on the entire range,
  this would allow you to batch fsync the data.
 
 Interesting thought, but mmap to a prespecified address is most
 definitely not portable, whether or not you want to assume that
 plain mmap is ...

Yeah... :(

Evil thought though (for reference):

mmap(anon memory) returns addr1
addr2 = addr1 + maplen
split addr1-addr2 on points A B and C
mmap(file1 over addr1 to A)
mmap(file2 over A to B)
mmap(file3 over B to C)
mmap(file4 over C to addr2)

It _should_ work, but there are probably some corner cases where it
doesn't.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]





Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Justin Clift

Bruce Momjian wrote:
 
snip
 No one will ever do the proper timing tests to know which is better except us.

Hi Bruce,

I believe in the future that anyone doing serious benchmark tests before
large-scale implementation will indeed be testing things like this. 
There will also be people/companies out there who will specialise in
"tuning" PostgreSQL systems and they will definitely test stuff like
this... different variations, different database structures, different
OS's, etc.

Regards and best wishes,

Justin Clift




Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Bruce Momjian

 Bruce Momjian wrote:
  
 snip
  No one will ever do the proper timing tests to know which is better except us.
 
 Hi Bruce,
 
 I believe in the future that anyone doing serious benchmark tests before
 large-scale implementation will indeed be testing things like this. 
 There will also be people/companies out there who will specialize in
 "tuning" PostgreSQL systems and they will definitely test stuff like
 this... different variations, different database structures, different
 OS's, etc.

But I don't want to go the Informix/Oracle way where we have so many
tuning options that no one understands them all.  I would like us to
find the best options and only give users choices when there is a real
tradeoff.

For example, Tom had a nice fsync test program.  Why can't we run that
on various platforms and collect the results, then make a decision on
the best default?

Trying to test the effects of fsync() with a database wrapped around it
really makes for difficult measurement anyway.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 For example, Tom had a nice fsync test program.  Why can't we run that
 on various platforms and collect the results, then make a decision on
 the best default.

Mainly because (a) there's not enough time before release, and (b) that
test program was far too stupid to give trustworthy results anyway.
(It was assuming exactly one commit per XLOG block, for example.)

 Trying to test the effects of fsync() with a database wrapped around it
 really makes for difficult measurement anyway.

Exactly.  What I'm doing now is providing some infrastructure with which
we can hope to see some realistic tests.  For example, I'm gonna be
leaning on Great Bridge's lab guys to rerun their TPC tests with a bunch
of combinations, just as soon as the dust settles.  But I'm not planning
to put my faith in only that one benchmark.

I'm all for improving the intelligence of the defaults once we know
enough to pick better defaults.  But we don't yet, and there's no way
that we *will* know enough until after we've shipped a release that has
these tuning knobs and gotten some real-world results from the field.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Bruce Momjian

I was wondering if the multiple writes performed to the XLOG could be
grouped into one write().  Seems everyone agrees:

fdatasync/O_DSYNC is better than plain fsync/O_SYNC

and the O_* flags are better than fsync() if we are doing only one write
before every fsync.  It seems the only open question is how often we do
multiple writes before fsync, and if that is ever faster than putting
the O_* on the file for all writes.
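
A minimal sketch of the two approaches being compared, assuming a POSIX system. The helper names log_open and log_write_block are hypothetical and are not taken from xlog.c. With O_DSYNC every write() returns only after the data is on disk. Without it, the caller decides when to call fdatasync(), so several writes can share one flush.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a log file for writing.  If use_o_dsync is
 * nonzero, each write() is synchronous; O_SYNC is the fallback where
 * O_DSYNC is not defined.
 */
int log_open(const char *path, int use_o_dsync)
{
    int flags = O_WRONLY | O_CREAT;

    if (use_o_dsync) {
#ifdef O_DSYNC
        flags |= O_DSYNC;
#else
        flags |= O_SYNC;
#endif
    }
    return open(path, flags, 0600);
}

/*
 * Hypothetical helper: write one block.  Pass explicit_sync = 1 for a
 * descriptor opened without O_DSYNC to flush with fdatasync().
 * Returns 0 on success, -1 on error or short write.
 */
int log_write_block(int fd, const void *buf, size_t len, int explicit_sync)
{
    ssize_t n;

    assert(buf != NULL && len > 0);
    n = write(fd, buf, len);
    if (n < 0 || (size_t) n != len)
        return -1;
    if (explicit_sync && fdatasync(fd) != 0)
        return -1;
    return 0;
}
```

With one block per commit the two paths do the same work. They differ once a commit spans several blocks, because the O_DSYNC path forces each block to disk before the next write() starts.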


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl



Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 I was wondering if the multiple writes performed to the XLOG could be
 grouped into one write().

That would require fairly major restructuring of xlog.c, which I don't
want to undertake at this point in the cycle (we're trying to push out
a release candidate, remember?).  I'm not convinced it would be a huge
win anyway.  It would be a win if your average transaction writes
multiple blocks' worth of XLOG ... but if your average transaction
writes less than a block then it won't help.

I think it probably is a good idea to restructure xlog.c so that it can
write more than one page at a time --- but it's not such a great idea
that I want to hold up the release any more for it.
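
A minimal sketch of writing several log pages with one system call, using POSIX writev(). This is not how xlog.c is structured. The name write_pages_once, the 8192-byte page size, and the 16-page limit are assumptions for illustration.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define SKETCH_BLCKSZ    8192   /* assumed page size */
#define SKETCH_MAX_PAGES 16     /* assumed upper bound per call */

/*
 * Submit npages page buffers to fd with a single writev() call.
 * Returns 0 if every byte was written, -1 on error or short write.
 */
int write_pages_once(int fd, char *const pages[], int npages)
{
    struct iovec iov[SKETCH_MAX_PAGES];
    ssize_t n;
    int i;

    assert(npages > 0 && npages <= SKETCH_MAX_PAGES);
    for (i = 0; i < npages; i++) {
        iov[i].iov_base = pages[i];
        iov[i].iov_len = SKETCH_BLCKSZ;
    }
    n = writev(fd, iov, npages);
    return (n == (ssize_t) npages * SKETCH_BLCKSZ) ? 0 : -1;
}
```

As Tom notes, this only helps when a transaction produces more than one page of log. A transaction that fits in one page still issues exactly one write.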

regards, tom lane




Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Bruce Momjian

 Bruce Momjian [EMAIL PROTECTED] writes:
  I was wondering if the multiple writes performed to the XLOG could be
  grouped into one write().
 
 That would require fairly major restructuring of xlog.c, which I don't
 want to undertake at this point in the cycle (we're trying to push out
 a release candidate, remember?).  I'm not convinced it would be a huge
 win anyway.  It would be a win if your average transaction writes
 multiple blocks' worth of XLOG ... but if your average transaction
 writes less than a block then it won't help.
 
 I think it probably is a good idea to restructure xlog.c so that it can
 write more than one page at a time --- but it's not such a great idea
 that I want to hold up the release any more for it.

OK, but the point of adding all those configuration options was to allow
us to figure out which was faster.  If you can write the code so we no
longer need to know which is best, why bother adding the config
options?  Just ship our best guess and fix it when we can.  Does
that make sense?





Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Bruce Momjian

 Bruce Momjian [EMAIL PROTECTED] writes:
  OK, but the point of adding all those configuration options was to allow
  us to figure out which was faster.  If you can do the code so we no
  longer need to know the answer of which is best, why bother adding the
  config options.
 
 How in the world did you arrive at that idea?  I don't see anyone around
 here but you claiming that we don't need any experimentation ...

I am trying to understand what testing we need to do.   I know we need
configure tests to check to see what exists in the OS.

My question was what are we needing to test?  If we can do only single writes
to the log, don't we prefer O_* to fsync, and the O_D* options over
plain O_*?  Am I confused?





Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 My question was what are we needing to test?  If we can do only single writes
 to the log, don't we prefer O_* to fsync, and the O_D* options over
 plain O_*?  Am I confused?

I don't think we have enough data to conclude that with any certainty.

regards, tom lane




Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Bruce Momjian

 Bruce Momjian [EMAIL PROTECTED] writes:
  My question was what are we needing to test?  If we can do only single writes
  to the log, don't we prefer O_* to fsync, and the O_D* options over
  plain O_*?  Am I confused?
 
 I don't think we have enough data to conclude that with any certainty.

I just figured we knew the answers to the above issues, and that the only
issue was multiple writes vs. fsync().

It is hard for me to imagine O_* being slower than fsync(), or fdatasync
being slower than fsync.  Are we not able to assume that?





Re: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 It is hard for me to imagine O_* being slower than fsync(),

Not hard at all --- if we're writing multiple xlog blocks per
transaction, then O_* constrains the sequence of operations more
than we really want.  Changing xlog.c to combine writes as much
as possible would reduce this problem, but not eliminate it.

Besides, the entire object of this exercise is to work around
an unexpected inefficiency in some kernels' implementations of
fsync/fdatasync (viz, scanning over lots of not-dirty buffers).
Who's to say that there might not be inefficiencies in other
platforms' implementations of the O_* options?

regards, tom lane




Re[2]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Xu Yifeng

Hello Tom,

Friday, March 16, 2001, 6:54:22 AM, you wrote:

TL Alfred Perlstein [EMAIL PROTECTED] writes:
 How many files need to be fsync'd?

TL Only one.

 If it's more than one, what might work is using mmap() to map the
 files in adjacent areas, then calling msync() on the entire range,
 this would allow you to batch fsync the data.

TL Interesting thought, but mmap to a prespecified address is most
TL definitely not portable, whether or not you want to assume that
TL plain mmap is ...

TL regards, tom lane

Could anyone consider forking a syncer process to sync data to disk?
Build a shared sync queue; when a daemon process wants to sync after
write() is called, it just puts a sync request on the queue. This releases
the process from blocking on the write as soon as possible. Multiple sync
requests for one file can be merged when a request is inserted into
the queue.
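
A minimal single-process sketch of the merging queue described above. The SyncQueue type and both function names are hypothetical. A real version would live in shared memory behind a lock, and would have to identify files by something other than a descriptor, since descriptors are not shared between processes.

```c
#include <assert.h>
#include <unistd.h>

#define SYNC_QUEUE_MAX 64       /* assumed capacity */

typedef struct {
    int fds[SYNC_QUEUE_MAX];    /* descriptors waiting for fsync() */
    int count;
} SyncQueue;

/*
 * Queue a sync request for fd.
 * Returns 1 if added, 0 if merged with an existing request, -1 if full.
 */
int sync_queue_add(SyncQueue *q, int fd)
{
    int i;

    assert(q != NULL);
    for (i = 0; i < q->count; i++)
        if (q->fds[i] == fd)
            return 0;           /* already pending: merge */
    if (q->count == SYNC_QUEUE_MAX)
        return -1;
    q->fds[q->count++] = fd;
    return 1;
}

/*
 * What the syncer would do on each pass: fsync() every queued
 * descriptor once, then empty the queue.  Returns the number of
 * fsync() calls that failed.
 */
int sync_queue_drain(SyncQueue *q)
{
    int i, failures = 0;

    assert(q != NULL);
    for (i = 0; i < q->count; i++)
        if (fsync(q->fds[i]) != 0)
            failures++;
    q->count = 0;
    return failures;
}
```

Merging at insert time keeps the queue bounded by the number of distinct files rather than the number of requests.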

-- 
Regards,
Xu Yifeng



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: Re[2]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-15 Thread Alfred Perlstein

* Xu Yifeng [EMAIL PROTECTED] [010315 22:25] wrote:
 Hello Tom,
 
 Friday, March 16, 2001, 6:54:22 AM, you wrote:
 
 TL Alfred Perlstein [EMAIL PROTECTED] writes:
  How many files need to be fsync'd?
 
 TL Only one.
 
  If it's more than one, what might work is using mmap() to map the
  files in adjacent areas, then calling msync() on the entire range,
  this would allow you to batch fsync the data.
 
 TL Interesting thought, but mmap to a prespecified address is most
 TL definitely not portable, whether or not you want to assume that
 TL plain mmap is ...
 
 TL regards, tom lane
 
 Could anyone consider forking a syncer process to sync data to disk?
 Build a shared sync queue; when a daemon process wants to sync after
 write() is called, it just puts a sync request on the queue. This releases
 the process from blocking on the write as soon as possible. Multiple sync
 requests for one file can be merged when a request is inserted into
 the queue.

I suggested this about a year ago. :)

The problem is that you need that process to potentially open and close
many files over and over.

I still think it's somewhat of a good idea.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]

