On Fri, Apr 26, 2002 at 01:28:59PM -0600, Sasha Pachev wrote:
> On Friday 26 April 2002 10:11 am, Jeremy Zawodny wrote:
> >
> > I recently wiped my 4.0.2 slave clean and installed the latest 4.0.2,
> > built from the BK tree. Then I synced it up with a nearby slave
> > running 3.23.47 (using rsync after I had flushed the tables on the
> > other slave and run a "SLAVE STOP").
> >
> > I started it up and it ran for about a day before it ran into a
> > duplicate key error. The 3.23.47 slave hasn't hit the duplicate key
> > error, nor have any of our other slaves. So it is a 4.0.2 bug of some
> > sort.
> >
> > What can I do to help you debug this?
> >
> > It is currently "stuck". I haven't bumped the counter yet and
> > restarted the slave.
>
> Jeremy:
>
> It indeed looks like a bug to me from your description. However, at
> this point I have no clue as to where it might be without some
> additional information.

As I expected. :-)

> Actually, having said "no clue" brought me into a humble state of
> mind, and one clue has just come. I suspect that somehow I/O thread
> manages to log the same event twice in the relay log under some
> circumstances. I looked at the code and could not think of a way how
> it could happen, though. So let's do two things:
>
> * use mysqlbinlog to check the current relay log in the SQL thread
> (you can see it in SHOW SLAVE STATUS) at the current position and
> one event prior to see if this is the same query. You will probably
> want to do mysqlbinlog log-name > log.sql, search for the current
> position in a text editor, and then scroll back one entry

I spent some time analyzing the relay log and I didn't find
duplicates. But what I found is a bit odd, so I'm gonna compare it to
the log from a 3.23.x slave (or the master).

I spent a fair amount of time trying to filter crap out of the log (we
get a couple hundred write queries per second during the week). In
the end, I patched mysqlbinlog to add a "-d" command-line option.
That lets you say "only show me queries from this particular
database". Not all log events are associated with a database, of
course, but doing this made a really huge difference.

I'll send the patch along to the internals list.

> * If you find a duplicate event in the relay log, check its
> originating master position, and then check the error log to see if
> there were any reconnects in the I/O thread around that position.

No reconnects or anything odd logged on the slave or the master during
that time (or even close to it).

> Hope the above makes sense. If not, feel free to ask clarifying
> questions.

Makes complete sense. And having been in the code for about 8 hours,
I know way more about the binary logs than I ever thought I'd care to.

My current theory is that the relay log could somehow be *missing* a
query rather than having a duplicate. Why do I think that?
According to the relay log on the messed-up slave, it *should* have
failed. Given the SQL it logged, there should have been a duplicate
key violation. It's as if a DELETE query never made it into the relay
log, and that's how I get a duplicate key.

Anyway, I need to do a bit more digging to try and verify that. But
I'm taking a break and wanted to summarize where I'm at and see if it
gives you any ideas.

Thanks,

Jeremy
-- 
Jeremy D. Zawodny, <[EMAIL PROTECTED]>
Technical Yahoo - Yahoo Finance
Desk: (408) 349-7878   Fax: (408) 349-5454   Cell: (408) 685-5936

MySQL 3.23.47-max: up 78 days, processed 2,053,108,324 queries (302/sec. avg)
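
The manual check suggested in the thread (dump the relay log to text,
find the current position, look one event back, and see whether the
same query appears twice) could be roughed out in a few lines of
Python. This is only a sketch under assumptions: it presumes the text
dump marks each event with a line of the form "# at <offset>" and that
other lines starting with "#" are comments, which should be checked
against the actual mysqlbinlog output. The function names and the
sample dump are made up for illustration and come from neither
mysqlbinlog nor the patch mentioned above.

```python
import re

# Assumed event marker in the text dump: "# at <offset>".
AT_LINE = re.compile(r"^# at (\d+)\s*$")


def split_events(dump_text):
    """Split a text dump into (offset, statement) pairs.

    Lines starting with '#' other than the marker are treated as
    comments and dropped, so an event may have an empty statement.
    """
    events = []
    offset, lines = None, []
    for line in dump_text.splitlines():
        m = AT_LINE.match(line)
        if m:
            if offset is not None:
                events.append((offset, "\n".join(lines).strip()))
            offset, lines = int(m.group(1)), []
        elif offset is not None and not line.startswith("#"):
            lines.append(line)
    if offset is not None:
        events.append((offset, "\n".join(lines).strip()))
    return events


def event_before(events, offset):
    """Return the event just before the one at `offset`, or None."""
    for i, (pos, _) in enumerate(events):
        if pos == offset:
            return events[i - 1] if i > 0 else None
    return None


def adjacent_duplicates(events):
    """Return offset pairs of consecutive events with identical text."""
    return [
        (a[0], b[0])
        for a, b in zip(events, events[1:])
        if a[1] and a[1] == b[1]
    ]


# Hypothetical dump, for illustration only.
SAMPLE = """# at 4
#020426 start
# at 79
INSERT INTO t VALUES (1);
# at 140
INSERT INTO t VALUES (1);
# at 201
DELETE FROM t WHERE id = 1;
"""

if __name__ == "__main__":
    evs = split_events(SAMPLE)
    print(event_before(evs, 140))
    print(adjacent_duplicates(evs))
```

Comparing whole statement text is deliberately strict: it only flags an
event that was written twice verbatim. It says nothing about a
*missing* event such as the suspected lost DELETE, which would still
need a comparison against the master's log.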