Re: [HACKERS] WAL fsync scheduling

2001-01-24 Thread Bruce Momjian


Added to TODO.detail and TODO list.

   There are two parts to transaction commit.  The first is writing all
   dirty buffers or log changes to the kernel, and second is fsync of the
 
  Backend doesn't write any dirty buffer to the kernel at commit time.
 
 Yes, I suspected that.
 
  
   log file.
  
  The first part is writing commit record into WAL buffers in shmem.
  This is what XLogInsert does.  After that XLogFlush is called to ensure
  that the entire commit record is on disk. XLogFlush does *both* write() and
  fsync() (single slock is used for both writing and fsyncing) if it needs to
  do it at all.
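
  As a rough illustration of the sequence just described, here is a toy
  Python model. It is not PostgreSQL code: ToyWAL, insert and flush are
  hypothetical names standing in for the WAL buffers, XLogInsert and
  XLogFlush. The single lock covering both write and fsync follows the
  description above; the byte-count "position" and everything else is an
  assumption made for the sketch.

```python
import os
import threading


class ToyWAL:
    """Toy model of a write-ahead log; NOT PostgreSQL's implementation."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self.buffer = bytearray()   # stands in for the WAL buffers in shmem
        self.inserted = 0           # bytes inserted so far (toy log position)
        self.flushed = 0            # bytes known to be on disk
        self.fsync_calls = 0
        self.insert_lock = threading.Lock()
        self.flush_lock = threading.Lock()  # one lock for write AND fsync

    def insert(self, record: bytes) -> int:
        """Like XLogInsert: copy the record into the in-memory buffer."""
        with self.insert_lock:
            self.buffer += record
            self.inserted += len(record)
            return self.inserted    # end position of this record

    def flush(self, upto: int) -> None:
        """Like XLogFlush: make sure everything up to `upto` is on disk."""
        with self.flush_lock:
            if self.flushed >= upto:
                return              # already flushed far enough, nothing to do
            with self.insert_lock:
                pending = bytes(self.buffer)
                self.buffer.clear()
                target = self.inserted
            os.write(self.fd, pending)
            os.fsync(self.fd)
            self.fsync_calls += 1
            self.flushed = target

    def close(self) -> None:
        os.close(self.fd)
```

  A commit in this model is insert() followed by flush() on the returned
  position. If another caller has already flushed past that position,
  flush() returns without touching the file.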
 
 Yes, I realize there are new steps in WAL.
 
  
   I suggest having a per-backend shared memory byte that has the following
   values:
   
   START_LOG_WRITE
   WAIT_ON_FSYNC
   NOT_IN_COMMIT
   backend_number_doing_fsync
   
   I suggest that when each backend starts a commit, it sets its byte to
   START_LOG_WRITE. 
  Isn't START_COMMIT a more meaningful name than START_LOG_WRITE?
 
 Yes.
 
  
   When it gets ready to fsync, it checks all backends. 
  What do you mean by "gets ready to fsync"? The moment just after XLogInsert?
 
 Just before it calls fsync().
 
  
   If all are NOT_IN_COMMIT, it does fsync and continues.
  
  1st edition:
   If one or more are in START_LOG_WRITE, it waits until no one is in
   START_LOG_WRITE.  It then checks all WAIT_ON_FSYNC, and if it is the
   lowest backend in WAIT_ON_FSYNC, marks all others with its backend
   number, and does fsync.  It then clears all backends with its number to
   NOT_IN_COMMIT.  Other backends will see they are not the lowest
   WAIT_ON_FSYNC and will wait for their byte to be set to NOT_IN_COMMIT
   so they can then continue, knowing their data was synced.
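
   The first-edition rules above can be sketched as a small decision
   function. This is only an illustrative Python model, not PostgreSQL
   code: the Status enum and next_action are hypothetical names, the
   per-backend bytes are modelled as a plain list, and the step of marking
   followers with the leader's backend number is left out.

```python
from enum import Enum


class Status(Enum):
    NOT_IN_COMMIT = 0
    START_LOG_WRITE = 1
    WAIT_ON_FSYNC = 2


def next_action(me, status):
    """Decide what backend `me` does just before it would call fsync().

    `status` is a list indexed by backend number; status[me] is expected
    to be WAIT_ON_FSYNC already.
    """
    others = [s for i, s in enumerate(status) if i != me]
    if all(s is Status.NOT_IN_COMMIT for s in others):
        return "fsync_alone"        # nobody else is committing
    if any(s is Status.START_LOG_WRITE for s in others):
        return "wait_for_writers"   # let in-progress log writes finish first
    # Everyone still in commit is waiting on fsync: lowest number leads.
    lowest = min(i for i, s in enumerate(status) if s is Status.WAIT_ON_FSYNC)
    return "fsync_for_group" if lowest == me else "wait_for_leader"
```

   The open question raised later in the thread is how a backend in
   "wait_for_writers" or "wait_for_leader" actually waits; the function
   only says who should do the fsync.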
  
  2nd edition:
   I have another idea.  If a backend gets to the point that it needs
   fsync, and there is another backend in START_LOG_WRITE, it can go to an
   interruptible sleep, knowing another backend will perform the fsync and
   wake it up.  Therefore, there is no busy-wait or timed sleep.
   
   Of course, a backend must set its status to WAIT_ON_FSYNC to avoid a
   race condition.
  
  The 2nd edition is much better. But I'm not sure we really need
  these per-backend bytes in shmem. Why not just have some counters?
  We can use a semaphore to wake up all waiters at once.
 
 Yes, that is much better and clearer.  My idea was just to say, "if no
 one is entering commit phase, do the commit.  If someone else is coming,
 sleep and wait for them to do the fsync and wake me up with a signal."
 
  
   This allows a single backend not to sleep, and allows multiple backends
   to bunch up only when they are all about to commit.
   
   The reason backend numbers are written is so other backends entering the
   commit code will not interfere with the backends performing fsync.
  
  A woken-up backend can check what's been written/fsynced by calling XLogFlush.
 
 Seems that may not be needed anymore with a counter.  The only issue is
 that other backends may enter commit while fsync() is happening.  The
 process that did the fsync must be sure to wake up only the backends
 that were waiting for it, and not other backends that may also be
 doing fsync as a group while the first fsync was happening.  I leave
 those details to people more experienced.  :-)
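
 One way to picture the counter-based variant and the wake-up concern
 raised here is the following Python sketch. It is a model under
 assumptions, not how PostgreSQL does it: GroupCommit, inserted, flushed
 and flushing are made-up names, threads stand in for backends, and a
 condition variable stands in for the semaphore or signal. Backends that
 arrive while a sync is in progress get a position beyond what that sync
 covers, so after waking they re-check and one of them starts the next
 sync.

```python
import threading


class GroupCommit:
    """Toy group-commit model: one caller syncs on behalf of all waiters."""

    def __init__(self, fsync):
        self.fsync = fsync          # callable that performs the expensive sync
        self.cond = threading.Condition()
        self.inserted = 0           # counter: commit records written so far
        self.flushed = 0            # counter: commit records known to be synced
        self.flushing = False       # True while some caller is inside fsync
        self.fsync_calls = 0

    def commit(self):
        with self.cond:
            self.inserted += 1
            my_pos = self.inserted
            while self.flushed < my_pos:
                if self.flushing:
                    # Someone else is syncing: sleep, then re-check whether
                    # that sync covered our record.
                    self.cond.wait()
                    continue
                self.flushing = True
                target = self.inserted      # all records written so far ride along
                self.cond.release()         # do not hold the lock during the sync
                try:
                    self.fsync()
                finally:
                    self.cond.acquire()
                    self.flushing = False
                    self.cond.notify_all()  # wake every waiter at once
                self.fsync_calls += 1
                self.flushed = target
        return my_pos
```

 A lone caller never sleeps: it finds nobody syncing and does the sync
 itself. Waiters do not need to know who woke them; comparing flushed with
 their own position is enough to tell whether their record was covered.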
 
 I am just glad people liked my idea.
 
 -- 
   Bruce Momjian|  http://candle.pha.pa.us
   [EMAIL PROTECTED]   |  (610) 853-3000
   +  If your life is a hard drive, |  830 Blythe Avenue
   +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026
 





Re: [HACKERS] WAL fsync scheduling

2000-11-18 Thread Larry Rosenman

* Tom Lane [EMAIL PROTECTED] [001117 23:21]:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Other backends will see they are not the lowest
  WAIT_ON_FSYNC and will wait for their byte to be set to NOT_IN_COMMIT
  so they can then continue, knowing their data was synced.
 
 How will they wait?  Without a semaphore involved, your answer must
 be either "timed sleep" or "busy-wait loop", neither of which is
 attractive ...
how about sigpause, and using SIGUSR1/SIGUSR2 to wake them up ? 

 
   regards, tom lane
-- 
Larry Rosenman  http://www.lerctr.org/~ler
Phone: +1 972-414-9812 (voice) Internet: [EMAIL PROTECTED]
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749



Re: [HACKERS] WAL fsync scheduling

2000-11-18 Thread Bruce Momjian

 * Tom Lane [EMAIL PROTECTED] [001117 23:21]:
  Bruce Momjian [EMAIL PROTECTED] writes:
   Other backends will see they are not the lowest
   WAIT_ON_FSYNC and will wait for their byte to be set to NOT_IN_COMMIT
   so they can then continue, knowing their data was synced.
  
  How will they wait?  Without a semaphore involved, your answer must
  be either "timed sleep" or "busy-wait loop", neither of which is
  attractive ...
 how about sigpause, and using SIGUSR1/SIGUSR2 to wake them up ? 

Looks like a winner.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026



Re: [HACKERS] WAL fsync scheduling

2000-11-18 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 how about sigpause, and using SIGUSR1/SIGUSR2 to wake them up ? 

 Looks like a winner.

sigpause() is a BSD-ism, and not part of any recognized standard
according to my HP man pages.  How portable do you think it is?

regards, tom lane



Re: [HACKERS] WAL fsync scheduling

2000-11-18 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 how about sigpause, and using SIGUSR1/SIGUSR2 to wake them up ? 

 The standard is sigsuspend:

OK, we can probably assume that at least one of sigsuspend or sigpause
is available everywhere.  Now all you need is a free signal number.
Unfortunately we're already using both SIGUSR1 and SIGUSR2.
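
The point of sigsuspend over a plain pause is that the wake-up cannot be
lost if it arrives between "announce that I am waiting" and "go to sleep".
A small Python sketch of that pattern follows. Python's signal module has
no sigsuspend wrapper, so the sketch uses pthread_sigmask plus sigwait,
which gives the same no-lost-wake-up property. sleep_until_woken,
announce_waiting and WAKEUP are hypothetical names, and using SIGUSR1 here
is purely an assumption for the example, since both SIGUSR signals are
already taken in the backend.

```python
import signal

WAKEUP = signal.SIGUSR1  # assumption: some signal number is free for this


def sleep_until_woken(announce_waiting):
    """Block WAKEUP, publish 'I am waiting', then wait for the signal.

    The signal is blocked before the status is published, so a wake-up
    sent at any later moment stays pending instead of being lost.
    """
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {WAKEUP})
    try:
        announce_waiting()               # e.g. set status to WAIT_ON_FSYNC
        return signal.sigwait({WAKEUP})  # returns once WAKEUP is pending
    finally:
        # Restore whatever mask the caller had before.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
```

In C the same shape would be sigprocmask to block the signal followed by
sigsuspend with the old mask; the sketch only shows the ordering that
avoids the race.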

regards, tom lane



Re: [HACKERS] WAL fsync scheduling

2000-11-18 Thread Bruce Momjian

 Bruce Momjian [EMAIL PROTECTED] writes:
  how about sigpause, and using SIGUSR1/SIGUSR2 to wake them up ? 
 
  The standard is sigsuspend:
 
 OK, we can probably assume that at least one of sigsuspend or sigpause
 is available everywhere.  Now all you need is a free signal number.
 Unfortunately we're already using both SIGUSR1 and SIGUSR2.

Oh, I didn't want to hear that one.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026



Re: [HACKERS] WAL fsync scheduling

2000-11-18 Thread Tom Lane

Peter Eisentraut [EMAIL PROTECTED] writes:
 Now all you need is a free signal number. Unfortunately we're already
 using both SIGUSR1 and SIGUSR2.

 Maybe you could dump the old meaning of SIGQUIT (externally invoked error),
 move quickdie() to SIGQUIT, and you've got SIGUSR1 free.

 (That would even make sense in two ways:  1) SIGQUIT would actually cause
 the guy to quit; 2) there is a correspondence between postmaster and
 postgres signals.)

Seems like a plan.  The current definition of backend SIGQUIT is really
stupid anyway --- what's the value of forcing an error asynchronously?

Also, it always bothered me that the postmaster and backend signals
weren't consistent, so I'd be inclined to make this change even if we
end up not using SIGUSR1 for Bruce's idea ...

regards, tom lane



Re: [HACKERS] WAL fsync scheduling

2000-11-18 Thread Peter Eisentraut

Larry Rosenman writes:

 how about sigpause, and using SIGUSR1/SIGUSR2 to wake them up ? 

Both of these signals are already used.

-- 
Peter Eisentraut  [EMAIL PROTECTED]   http://yi.org/peter-e/




Re: [HACKERS] WAL fsync scheduling

2000-11-18 Thread Peter Eisentraut

Tom Lane writes:

 OK, we can probably assume that at least one of sigsuspend or sigpause
 is available everywhere.

#ifdef HAVE_POSIX_SIGNALS should tell you.

 Now all you need is a free signal number. Unfortunately we're already
 using both SIGUSR1 and SIGUSR2.

Maybe you could dump the old meaning of SIGQUIT (externally invoked error),
move quickdie() to SIGQUIT, and you've got SIGUSR1 free.

(That would even make sense in two ways:  1) SIGQUIT would actually cause
the guy to quit; 2) there is a correspondence between postmaster and
postgres signals.)

-- 
Peter Eisentraut  [EMAIL PROTECTED]   http://yi.org/peter-e/




Re: [HACKERS] WAL fsync scheduling

2000-11-18 Thread Bruce Momjian

 Tom Lane writes:
 
  OK, we can probably assume that at least one of sigsuspend or sigpause
  is available everywhere.
 
 #ifdef HAVE_POSIX_SIGNALS should tell you.
 
  Now all you need is a free signal number. Unfortunately we're already
  using both SIGUSR1 and SIGUSR2.
 
 Maybe you could dump the old meaning of SIGQUIT (externally invoked error),
 move quickdie() to SIGQUIT, and you've got SIGUSR1 free.
 
 (That would even make sense in two ways:  1) SIGQUIT would actually cause
 the guy to quit; 2) there is a correspondence between postmaster and
 postgres signals.)

Good idea.

Of course, this assumes my idea was valid.  Was it?


-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026