Re: 4.0.2 Replication still buggy...

Jeremy Zawodny Fri, 26 Apr 2002 19:35:03 -0700

On Fri, Apr 26, 2002 at 01:28:59PM -0600, Sasha Pachev wrote:
> On Friday 26 April 2002 10:11 am, Jeremy Zawodny wrote:
> > 
> > I recently wiped my 4.0.2 slave clean and installed the latest 4.0.2,
> > built from the BK tree.  Then I synced it up with a nearby slave
> > running 3.23.47 (using rsync after I had flushed the tables on the
> > other slave and run a "SLAVE STOP").
> > 
> > I started it up and it ran for about a day before it ran into a
> > duplicate key error.  The 3.23.47 slave hasn't hit the duplicate key
> > error, nor have any of our other slaves.  So it is a 4.0.2 bug of some
> > sort.
> > 
> > What can I do to help you debug this?
> > 
> > It is currently "stuck".  I haven't bumped the counter yet and
> > restarted the slave.
> 
> Jeremy:
> 
> It indeed looks like a bug to me from your description. However, at
> this point I have no clue as to where it might be without some
> additional information.


As I expected. :-)

> Actually, having said "no clue" brought me into a humble state of
> mind, and one clue has just come. I suspect that somehow I/O thread
> manages to log the same event twice in the relay log under some
> circumstances. I looked at the code and could not see how that
> could happen, though. So let's do two things:
> 
>   * use mysqlbinlog to check the current relay log in the SQL thread
> (you can see it in SHOW SLAVE STATUS) at the current position and
> one event prior to see if this is the same query. You will probably
> want to do mysqlbinlog log-name > log.sql, search for the current
> position in a text editor, and then scroll back one entry.

I spent some time analyzing the relay log and I didn't find
duplicates.  But what I found is a bit odd, so I'm going to compare it
to the log from a 3.23.x slave (or the master).
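
Sasha's suggested check can be scripted instead of done by scrolling in an editor. The steps are: dump the relay log, find the SQL thread's position, and compare that event with the one just before it. What follows is a minimal Python sketch, not part of mysqlbinlog. It assumes the text dump marks each event with a "# at <offset>" line, and the function names are hypothetical:

```python
import re

# Assumption: every event in the mysqlbinlog text dump is introduced by a
# "# at <offset>" comment line. Adjust AT_RE if your version prints
# something different.
AT_RE = re.compile(r"^# at (\d+)\s*$")


def split_events(dump_text):
    """Return [(offset, lines), ...], one entry per event, in log order."""
    events = []
    for line in dump_text.splitlines():
        match = AT_RE.match(line)
        if match:
            events.append((int(match.group(1)), []))
        elif events:
            events[-1][1].append(line)
    return events


def statement_text(lines):
    """Keep only the SQL: drop '#' header/comment lines, which differ per event."""
    return "\n".join(line for line in lines if not line.startswith("#")).strip()


def check_position(dump_text, position):
    """Compare the event at `position` with the event just before it.

    Returns (previous_sql, current_sql, looks_duplicated). Raises ValueError
    if `position` is not the start of an event in the dump.
    """
    events = split_events(dump_text)
    offsets = [offset for offset, _ in events]
    idx = offsets.index(position)
    current = statement_text(events[idx][1])
    previous = statement_text(events[idx - 1][1]) if idx > 0 else ""
    return previous, current, bool(current) and current == previous
```

Usage would look something like check_position(open("log.sql").read(), 1234). Here 1234 stands in for the relay log position that SHOW SLAVE STATUS reports. Header comment lines are ignored because they carry the timestamp and log position, which differ even when the SQL is identical.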

I spent a fair amount of time trying to filter crap out of the log (we
get a couple hundred write queries per second during the week).  In
the end, I patched mysqlbinlog to add a "-d" command-line option.
That lets you say "only show me queries from this particular
database".  Not all log events are associated with a database, of
course, but doing this made a really huge difference.
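
Something like the "-d" option can be approximated after the fact by post-processing the text dump. This is not the patch itself, just a rough Python sketch under two assumptions. First, events start at "# at <offset>" lines. Second, the database appears as a "use <db>;" line that stays in effect until the next one. The function names are hypothetical:

```python
import re

# Assumptions: events are introduced by "# at <offset>" lines, and a query's
# database shows up as a "use <db>;" line that stays in effect until the
# next "use" line.
AT_RE = re.compile(r"^# at \d+\s*$")
USE_RE = re.compile(r"^use\s+`?(\w+)`?\s*;", re.IGNORECASE)


def split_events(dump_text):
    """Group the dump's lines into events, each starting at a '# at' line."""
    events = []
    for line in dump_text.splitlines():
        if AT_RE.match(line):
            events.append([line])
        elif events:
            events[-1].append(line)
    return events


def filter_by_database(dump_text, database):
    """Return the text of each event whose effective database is `database`."""
    current_db = None
    kept = []
    for lines in split_events(dump_text):
        # A "use" line inside this event switches the effective database.
        for line in lines:
            match = USE_RE.match(line)
            if match:
                current_db = match.group(1)
        if current_db == database:
            kept.append("\n".join(lines))
    return kept
```

Events without a database of their own inherit whatever "use" line came last, so a few unrelated events can slip through. Filtering inside mysqlbinlog, where each event is actually parsed, avoids that guesswork.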

I'll send the patch along to the internals list.

>  * If you find a duplicate event in the relay log, check its
> originating master position, and then check the error log to see if
> there were any reconnects in the I/O thread around that position.

No reconnects or anything odd logged on the slave or the master
during that time (or even close to it).

> Hope the above makes sense. If not, feel free to ask clarifying
> questions.

Makes complete sense.  And having been in the code for about 8 hours,
I know way more about the binary logs than I ever thought I'd care to.

My current theory is that the relay log could somehow be *missing* a
query rather than having a duplicate.  Why do I think that?  According
to the relay log on the messed-up slave, it *should* have failed.
Given the SQL it logged, there should have been a duplicate key
violation.  It's as if a DELETE query never made it into the relay
log, and that's how I get a duplicate key.
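
Here is a toy model of that theory in plain Python, not MySQL code. It treats the table's unique key as a set and replays the same statements with and without the DELETE. The statement list and key value are made up for illustration:

```python
class DuplicateKeyError(Exception):
    """Raised when an insert hits a key that is already present."""


def replay(events):
    """Apply ('insert', key) / ('delete', key) events to a unique-key set."""
    keys = set()
    for op, key in events:
        if op == "insert":
            if key in keys:
                raise DuplicateKeyError(f"duplicate key {key!r}")
            keys.add(key)
        elif op == "delete":
            keys.discard(key)
        else:
            raise ValueError(f"unknown op {op!r}")
    return keys


# Hypothetical statement sequence as the master logged it.
master_log = [("insert", 42), ("delete", 42), ("insert", 42)]
# The same sequence with the DELETE lost on the way into the relay log.
relay_log = [event for event in master_log if event != ("delete", 42)]
```

Replaying master_log succeeds. Replaying relay_log raises DuplicateKeyError on the second insert, which is the symptom the 4.0.2 slave showed.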

Anyway, I need to do a bit more digging to try and verify that.  But
I'm taking a break and wanted to summarize where I'm at and see if it
gives you any ideas.

Thanks,

Jeremy
-- 
Jeremy D. Zawodny, <[EMAIL PROTECTED]>
Technical Yahoo - Yahoo Finance
Desk: (408) 349-7878   Fax: (408) 349-5454   Cell: (408) 685-5936

MySQL 3.23.47-max: up 78 days, processed 2,053,108,324 queries (302/sec. avg)
