Re: How can I detect the end of a replication cycle?

Garth Webb Tue, 13 Apr 2004 17:14:03 -0700

On Tue, 2004-04-13 at 13:13, [EMAIL PROTECTED] wrote:
[snip]
> >> I could frequently poll the slave servers using SHOW SLAVE STATUS or
> SHOW
> >> PROCESS LIST and watch for their status to change.
> >>
> >> I could start each slave with "--log-slave-updates" and watch the
> slaves'
> >> binlogs for changes.
> >>
> >> I could watch for the creation/deletion of the slave's relay logs.
> 
> > This seems to indicate that you are afraid of selecting rows on the
> > slave that are in the middle of being updated from the master.  A single
> > update statement is still atomic, so you don't need to poll log files to
> > determine if an update statement has finished.
> 
> > On the other hand, if there is some set of multiple updates and inserts
> > that constitute a collection of data that you want to merge only when
> >this collection is complete, you're better off finding a way to signal
> > this through the database.  You could have the master lock the tables in
> > question until its finished and then the program quering the slave knows
> > that when it gets a read lock, its will see the full set of data.  You
> > could also have a status column or a status table that has a flag
> > letting the program on the slave side know when the data is ready.
> 
> > If this is off the mark, maybe some example statements would help...
> 
> [more snipping]
> 
> Yes, I AM concerned about getting a partial update to the warehouse. I know
> that transactions aren't logged until after they commit. If I use
> transactional boundaries to post multitable reports (for instance: an
> invoice takes two tables, one for the base information and one for the line
> items) into the branch masters then they will arrive intact and I won't
> corrupt the central slaves. (That's NOT the issue I am worried about!)
> 
> I am worried that if I start processing those new records from the slave
> database to the Warehouse before all of the records have been processed
> from the Relay Logs (lets say I started trying to merge records when I see
> the relay log being created), I could miss some data (like the last few
> items on the invoice). That's why I am so worried about not merging until
> the END of the cycle. I need to be sure that everything has been committed
> to my central slave BEFORE I can merge the latest updates with the
> warehouse database.
> 
> I can lock a slave database so that it won't replicate in the middle of my
> merging so I know that if I can catch a slave when it goes back to "sleep"
> ("Waiting for master to send event"), I would have a complete set of data.
> I could use "--log-slave-updates" to copy the updates to the slave's binlog
> and check that to see if I need to merge records( if slave status is
> "waiting" and the binlog is not "empty" then merge). Each merge could flush
> the binlog. However, there was a post from someone using binlogs for
> similar purpose that said that for 4.1.x+ the binlogs vary in size after
> flushing so I don't know how reliable that would be as a check. How could I
> tell when a binlog is empty?


Using the binlog in this way will lead to a race condition.  What if
another update comes in during the few milliseconds between your 'ready'
check (an empty binlog and a 'waiting' status) and when you select those
rows for processing?  Plus if MySQL does any buffering of its output to
the binlog, you could be basing your check on seconds old data, further
aggravating the problem.

The only solution I can think of that won't cause a race condition is to
lock your tables, but your program running on the slave database would
have to be able to connect to the master.  Your slave program would lock
the necessary tables on the master side, wait to receive the lock, wait
until the slave had caught up, do your merge, then release the lock. 
Likewise, the code updating tables on the master would need to lock the
tables while they write.  You make the process a little friendlier by
creating a separate control table that the slave and master alternately
locked, rather than locking all the table you'll use.  That way
processes on the master that just want to read the data don't have to
wait for a lock.


-- 

 |- Garth Webb       -|
 |- [EMAIL PROTECTED] -|

signature.asc
Description: This is a digitally signed message part

Re: How can I detect the end of a replication cycle?

Reply via email to