Re: SOLVED: Problem with *very* slow replication, FreeBSD 6.2

2007-11-04 Thread Christopher E. Brown

On Sat, 3 Nov 2007, bob b wrote:


Good to hear that you found the problem.

The only remaining puzzle is why the replica reported that it was up to date 
when it was several binlogs behind.
Possibly the replica was always caught up with the last entry from the very 
slow link.


Perhaps you should report this as a bug?  The replication mechanism should be 
able to check the last binlog being written on the master and report that 
difference?


Bob  Bankay (from home)



The reporting confusion arises because the "seconds behind master" figure 
is based on the relay logs and how long it will take to catch up with them, 
not on the master's current binlog position.


For example, I had replication shut down for 45 minutes while feeding 
millions of writes into the master.  On slave restart the binlog dump 
started, and it went fast.  As the relay log grew, so did seconds behind 
master.  Once the relay log was up to date, the "seconds behind master" was 
based on the execution rate and backlog.  (Something like 12 minutes and 
counting down.)



So, say a slave is down for 8 hours.  It comes online and pulls the binlog 
in 120 seconds.  The "seconds behind master" does not reflect 8 hours, but 
how many seconds (at the current processing rate) remain before the slave 
finishes the relay logs.



The "seconds behind master" value is really "seconds until currency with 
the relay logs" and should probably be documented as such.



It would be nice if there was a way for the slave to find the actual 
current master position and compare with the local state though.
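A rough sketch of such a comparison, done from outside the servers: it assumes you have already fetched the `File` and `Position` columns of SHOW MASTER STATUS on the master and the `Master_Log_File` and `Read_Master_Log_Pos` columns of SHOW SLAVE STATUS on the slave, and passes them in as plain dicts. The function names are made up for illustration, and the byte count is only an estimate, since it treats every intermediate binlog as exactly `max_binlog_size` bytes long.

```python
import re


def binlog_index(name):
    """Return the numeric suffix of a binlog file name such as 'mysql-bin.000650'."""
    match = re.search(r"\.(\d+)$", name)
    if match is None:
        raise ValueError(f"not a binlog file name: {name!r}")
    return int(match.group(1))


def feed_lag(master_status, slave_status, max_binlog_size):
    """Estimate how far the slave's binlog feed trails the master.

    master_status: dict with 'File' and 'Position' (SHOW MASTER STATUS).
    slave_status: dict with 'Master_Log_File' and 'Read_Master_Log_Pos'
                  (SHOW SLAVE STATUS).
    max_binlog_size: assumed size in bytes of every fully written binlog.
    Returns (files_behind, approximate_bytes_behind).
    """
    files_behind = (binlog_index(master_status["File"])
                    - binlog_index(slave_status["Master_Log_File"]))
    if files_behind == 0:
        # Same file: the difference in positions is the backlog.
        bytes_behind = (master_status["Position"]
                        - slave_status["Read_Master_Log_Pos"])
    else:
        # Rest of the file being read, whole files in between,
        # plus what the master has written to its current file.
        bytes_behind = ((max_binlog_size - slave_status["Read_Master_Log_Pos"])
                        + (files_behind - 1) * max_binlog_size
                        + master_status["Position"])
    return files_behind, max(bytes_behind, 0)
```

A slave that reports zero seconds behind while `feed_lag` returns a large file count is caught up with its relay log only, which is the situation described in this thread.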



--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]



SOLVED: Problem with *very* slow replication, FreeBSD 6.2

2007-11-02 Thread Christopher E. Brown



An update for those actually paying attention.

I have been fighting unusual performance issues with replication between 
FreeBSD 6.2 machines.


The unusual part is that while replication would never top 10 writes per 
second (even while the master was taking hundreds of writes per second), 
the slave always reported zero seconds behind.


This is on servers with less than 1% CPU used.

The actual problem was not with writing the binlog, or the slave SQL 
thread, but the actual transfer of the binlog across the network.


After days of running, the slave would be many Gigs behind the master.

While debugging I tried many things including updating from 5.1.19 to 
5.1.22, rebuilding with WITH_PROC_SCOPE_PTH=yes, and even rebuilding using 
linuxthreads.



None of this worked.



The problem was rfc1323...  Window scaling *SHOULD* have improved 
performance given that this is a jumbo frame GigE network.


For reasons I don't understand, with rfc1323 enabled the data transfer 
rate for replication is limited to about 200 Kbyte/sec (I do not see the 
same slowdown for http or scp transfers).
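For a sense of scale, here is a small sketch of the usual window/round-trip arithmetic: a TCP sender can have at most one window of data in flight per round trip, so throughput is capped at roughly window / RTT. The 0.5 ms round-trip time is an assumed figure for a local GigE link, not something measured on these machines, and the function names are made up for illustration.

```python
def window_limited_rate(window_bytes, rtt_seconds):
    """Upper bound on throughput (bytes/sec) with one window in flight per RTT."""
    return window_bytes / rtt_seconds


def window_for_rate(rate_bytes_per_sec, rtt_seconds):
    """Bytes in flight per round trip that would yield the given throughput."""
    return rate_bytes_per_sec * rtt_seconds


ASSUMED_RTT = 0.0005  # 0.5 ms, a guess for a local GigE network

# Largest window without RFC 1323 scaling is 65535 bytes.
unscaled_cap = window_limited_rate(65535, ASSUMED_RTT)

# Data in flight per round trip implied by a ~200 Kbyte/sec transfer.
implied_window = window_for_rate(200_000, ASSUMED_RTT)

print(f"unscaled window cap: {unscaled_cap / 1e6:.0f} Mbyte/sec")
print(f"in flight at 200 Kbyte/sec: {implied_window:.0f} bytes")
```

Under that assumed round-trip time even an unscaled window leaves far more headroom than 200 Kbyte/sec, so the arithmetic alone does not explain the slowdown. On FreeBSD the setting in question is the `net.inet.tcp.rfc1323` sysctl.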



To verify, I rebuilt both systems back to default (native threads), 
re-initialized the Master<->Master replication loop, shut down one of the 
servers, and inserted several million records on the live system (about 
1.8Gbyte of binlog).



On restarting the second system it read the binlog into the relay log at 
20 - 25 Mbyte/sec.  The seconds behind master value showed sane values, 
and it processed the relay-log backlog at about 6600 writes/sec until 
finished.
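To put the two transfer rates side by side, a back-of-the-envelope calculation using the figures quoted in this message (1.8 Gbyte of binlog, 20 - 25 Mbyte/sec after the fix, about 200 Kbyte/sec before). Decimal units are assumed, and the helper name is made up for illustration.

```python
def transfer_seconds(backlog_bytes, rate_bytes_per_sec):
    """Time needed to move a backlog at a constant transfer rate."""
    return backlog_bytes / rate_bytes_per_sec


BACKLOG = 1.8e9  # about 1.8 Gbyte of binlog

fast_best = transfer_seconds(BACKLOG, 25e6)   # 25 Mbyte/sec
fast_worst = transfer_seconds(BACKLOG, 20e6)  # 20 Mbyte/sec
slow = transfer_seconds(BACKLOG, 200e3)       # ~200 Kbyte/sec with rfc1323 on

print(f"fixed:  {fast_best:.0f} to {fast_worst:.0f} seconds")
print(f"before: {slow:.0f} seconds ({slow / 3600:.1f} hours)")
```

At the slow rate the same backlog takes hours rather than about a minute and a half, and a master that keeps writing faster than 200 Kbyte/sec is never caught up with at all.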



Further testing included 3,000 inserts/sec to each of the servers 
(6,000/sec total) with the master/master replication loop active.  During 
a run of 10,000,000 inserts to each server replication was never more than 
2 seconds behind.



On Tue, 30 Oct 2007, Christopher E. Brown wrote:


On Thu, 25 Oct 2007, [EMAIL PROTECTED] wrote:


Not sure that I get the whole picture.

We have been running replication since about 4.0 and we have been through 
several upgrades and are now at 5.0.27.


The 'show slave status' always gives us an accurate reflection of where it 
is at which is usually 0 seconds behind.


Occasionally, it falls behind if the master is really busy (>2200 q/s with 
about 70% being updates/deletes/inserts).


At those times the slave tops out at about 1200 q/s of which most are db 
mods of some kind and some selects since we have reports running against 
the replica and it will fall behind temporarily.


Can you send show slave status and show master status as well and typical 
mytop outputs for master and slave?


That might let me be able to provide more help.


Bob


Unfortunately I had to tear down replication as it was causing problems with 
the master.  (The master will not delete binlogs that a slave is still 
loading; when the slave is 40 files behind, disk gets short.)


CPU load was near zero on both systems (98% idle or better).

Disk load is minimal.

The slave is always up to date with relay file processing and reporting zero 
seconds behind.


In short, everything looks fine.


What happens is that the master -> slave binlog feed runs very slow (no more 
than about 10 writes/sec).



So, after a few days the slave is still reporting zero seconds behind, and it 
is zero seconds behind the relay log.


The problem is that while the master is currently writing binlog 650, the 
slave is actually zero seconds behind the feed, but the binlog feed has 
fallen 20 - 30 files behind (our binlog rolls at 256M).



Since there is no load issue, I expect there is a timing or trigger issue 
with the master side proc doing the binlog dump, or the slave side receiving 
it.



I can stop/start replication and/or reload both servers, it still holds.

I see the replication restart, with the slave running zero seconds behind the 
relay log, the binlog feed starts up right where it left off but the feed 
only runs at about 10 writes a second.



Are you running native or LinuxThreads?  This smells like a threading 
issue to me (we are running FreeBSD 6.2 with native threading and 5.1.19).


The exact same setup was pre-built on Linux systems (2.6.x Slackware) before 
being built out on the production systems (FreeBSD 6.2).


During the testing 1000 writes/sec were no problem (small/simple table, fits 
in memory).  When I forced a backlog of approx 2GB by shutting down the slave, 
on restart the binlog -> relay log feed ran at over 25MB/sec until caught up.








Re: Problem with *very* slow replication

2007-10-29 Thread Christopher E. Brown

On Thu, 25 Oct 2007, [EMAIL PROTECTED] wrote:


Not sure that I get the whole picture.

We have been running replication since about 4.0 and we have been through 
several upgrades and are now at 5.0.27.


The 'show slave status' always gives us an accurate reflection of where it is 
at which is usually 0 seconds behind.


Occasionally, it falls behind if the master is really busy (>2200 q/s with 
about 70% being updates/deletes/inserts).


At those times the slave tops out at about 1200 q/s of which most are db mods 
of some kind and some selects since we have reports running against the 
replica and it will fall behind temporarily.


Can you send show slave status and show master status as well and typical 
mytop outputs for master and slave?


That might let me be able to provide more help.


Bob


Unfortunately I had to tear down replication as it was causing problems 
with the master.  (The master will not delete binlogs that a slave is 
still loading; when the slave is 40 files behind, disk gets short.)


CPU load was near zero on both systems (98% idle or better).

Disk load is minimal.

The slave is always up to date with relay file processing and reporting 
zero seconds behind.


In short, everything looks fine.


What happens is that the master -> slave binlog feed runs very slow (no 
more than about 10 writes/sec).



So, after a few days the slave is still reporting zero seconds behind, and 
it is zero seconds behind the relay log.


The problem is that while the master is currently writing binlog 650, the 
slave is actually zero seconds behind the feed, but the binlog feed has 
fallen 20 - 30 files behind (our binlog rolls at 256M).



Since there is no load issue, I expect there is a timing or trigger issue 
with the master side proc doing the binlog dump, or the slave side 
receiving it.



I can stop/start replication and/or reload both servers, it still holds.

I see the replication restart, with the slave running zero seconds behind 
the relay log, the binlog feed starts up right where it left off but the 
feed only runs at about 10 writes a second.



Are you running native or LinuxThreads?  This smells like a threading 
issue to me (we are running FreeBSD 6.2 with native threading and 5.1.19).


The exact same setup was pre-built on Linux systems (2.6.x Slackware) 
before being built out on the production systems (FreeBSD 6.2).


During the testing 1000 writes/sec were no problem (small/simple table, 
fits in memory).  When I forced a backlog of approx 2GB by shutting down 
the slave, on restart the binlog -> relay log feed ran at over 25MB/sec 
until caught up.





Re: remote connect crash

2003-01-09 Thread Christopher E. Brown
On Thu, 9 Jan 2003, Dmitry V. Sokolov wrote:

> Good day,
> could you help me to solve this problem?
>
> MySQL server segmentation faults when remote mysql client
> tries to connect on source and binary distributions. Local
> client connect does not cause any problems whatsoever.


The server dies when the connecting host's IP fails a reverse lookup.
According to a note I received this morning, it was fixed last night in
the source tree, and 4.0.9 is being built for release.

-- 
I route, therefore you are.



-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: Re[2]: mysql 4.0.8- crash on TCP connection

2003-01-09 Thread Christopher E. Brown
On Thu, 9 Jan 2003, Gelu Gogancea wrote:

>
> Functions gethostby* ,from glibc, work directly with the /etc/hosts file.If
> this functions didn't find an entry for the client, will be crashed.
> I try to find in the Andrew e-mail if he has installed the glibc 2.2.x but i
> don't see nothing about it.What i see is, he use 2.95.x which is declared by
> MySQL like unstable.In this context can be a coincidence what is happened.
> Also i don't find difference in MYSQL daemon source code(hostname.cc)
> between 4.0.7 and 4.0.8.
> Regards,
>
> Gelu


No, the glibc gethostby* calls walk the lookup order defined in
/etc/nsswitch.conf, normally files then dns.  A miss in /etc/hosts followed by an NXDOMAIN
from DNS results in a negative return from the gethostby* call.  *This
should never cause a crash*, it is not a failure in the resolver code,
it is a negative result.
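To illustrate the point that a failed reverse lookup is an ordinary result to be handled, here is a minimal sketch in Python (not the mysqld code, which is C++). The function name and the injectable `resolver` parameter are made up for illustration; `socket.gethostbyaddr` is the standard library wrapper around the system resolver.

```python
import socket


def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return the host name for an IP, or None when there is no reverse entry.

    A missing PTR record is reported by the resolver as an error return,
    which is treated here as "no name", not as a fatal condition.
    """
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except (socket.herror, socket.gaierror):
        # Negative answer from hosts file and DNS: carry on without a name.
        return None
    return hostname


if __name__ == "__main__":
    # 192.0.2.1 is a documentation address and normally has no reverse entry.
    name = reverse_lookup("192.0.2.1")
    print(name if name is not None else "no reverse entry, using IP only")
```

A server written this way falls back to matching the client by IP address when the lookup comes back empty.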


As to gcc, 2.95.3 is fine and stable; the notes you mention refer to
gcc 2.96, an *unofficial* gcc release, a heavily patched monster
released by RedHat and (for a while) used in a lot of places.


-- 
I route, therefore you are.






Re: Re[2]: mysql 4.0.8- crash on TCP connection

2003-01-09 Thread Christopher E. Brown
On Thu, 9 Jan 2003, Gelu Gogancea wrote:

> Hi,
> This is a glibc problem.In this case you can start mysql daemon with option
> "--skip-name-resolve" and in this situation is no need to add the IP address
> of every client in hosts file.The disadvantage is that the client can not
> connect to the server using host alias.
> Regards,
>
> Gelu


Could you clarify "This is a glibc problem"?  This is a known standard glibc
2.2.5 against which every other piece of software functions correctly,
even when receiving null returns on reverse lookups, but 4.0.8 (both
precompiled binary and locally built) crashes on a null return.
Especially when (according to other reports) 4.0.7 functions correctly.


-- 
I route, therefore you are.






Re: mysql 4.0.8- crash on TCP connection

2003-01-09 Thread Christopher E. Brown
On Thu, 9 Jan 2003, Gelu Gogancea wrote:

> Hi,
> What OS you use ?
>
> Regards,
>
> Gelu
> _
> G.NET SOFTWARE COMPANY
>
> Permanent e-mail address : [EMAIL PROTECTED]
>   [EMAIL PROTECTED]
> - Original Message -
> From: "Andrew Sitnikov" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Wednesday, January 08, 2003 8:14 PM
> Subject: mysql 4.0.8- crash on TCP connection
>
>
> > Hello mysql,
> >
> >   I try use 4.0.8 (max & standard)in our production box,
> >   and it was crash every TCP connection, For 4.0.7 (standard) i has over
> 20 days uptime.


This sounds like what I just submitted a bug report for.  Connections
to 4.0.8 (compiled locally or binary distro) cause a server crash if
the IP of the client is not resolvable in DNS.  One can add an entry
to the hosts file for certain IPs to stop this, however this still
leaves the fact that ANY IP that can connect to the server can crash
it if there is no reverse entry.


 --
I route, therefore you are.






Possible bug, remote mysqld crash and DoS

2003-01-09 Thread Christopher E. Brown

Description:
MySQL 4.0.8, both compiled by me and the official release version,
crashes whenever it receives a network connection from a system without a DNS
entry.  Connecting from a system that resolves on reverse (either from DNS
or a local hosts file entry) works fine.  This system is a Slackware 8.1
install with all current updates.  I do not know if this is a mysqld
internal thing or some interaction with the system resolver in glibc 2.2.5,
as unfortunately even a statically compiled glibc binary uses the system resolver.

This of course concerns me, there is a large potential for remote
DoS here.


How-To-Repeat:
Install 4.0.8, run the install db script and fire it up.  Attempt to
connect from a host that will not reverse resolve.  Even a telnet to port
3306 crashed the daemon.  The dump from mysqld is included at the bottom of
the message.

Fix:
Unknown

Submitter-Id:   [EMAIL PROTECTED]
Originator:
Organization:
MySQL support: none
Synopsis:   
Severity:   serious
Priority:   high
Category:   mysql
Class:  sw-bug
Release:    mysql-4.0.8-gamma-standard (Official MySQL-standard binary)

C compiler:2.95.3
C++ compiler:  2.95.3
Environment:
System: Linux inlet 2.4.20 #2 Tue Dec 24 08:59:29 AKST 2002 i686 unknown
Architecture: i686

Some paths:  /usr/bin/perl /usr/bin/make /usr/bin/gmake /usr/bin/gcc /usr/bin/cc
GCC: Reading specs from /usr/lib/gcc-lib/i386-slackware-linux/2.95.3/specs
gcc version 2.95.3 20010315 (release)
Compilation info: CC='gcc'  CFLAGS='-O2 -mcpu=pentiumpro'  CXX='gcc'  CXXFLAGS='-O2 
-mcpu=pentiumpro -felide-constructors'  LDFLAGS=''  ASFLAGS=''
LIBC:
lrwxrwxrwx 1 root root       13 Dec 23 18:45 /lib/libc.so.6 -> libc-2.2.5.so
-rwxr-xr-x 1 root root  1237712 Jul 30 14:15 /lib/libc-2.2.5.so
-rw-r--r-- 1 root root 24984184 Jul 30 12:55 /usr/lib/libc.a
-rw-r--r-- 1 root root      178 Jul 30 12:56 /usr/lib/libc.so
Configure command: ./configure '--prefix=/usr/local/mysql' '--with-comment=Official 
MySQL-standard binary' '--with-extra-charsets=complex' 
'--with-server-suffix=-standard' '--enable-thread-safe-client' '--enable-local-infile' 
'--enable-assembler' '--disable-shared' '--with-client-ldflags=-all-static' 
'--with-mysqld-ldflags=-all-static' '--with-innodb' 'CFLAGS=-O2 -mcpu=pentiumpro' 
'CXXFLAGS=-O2 -mcpu=pentiumpro -felide-constructors' 'CXX=gcc'




mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=33554432
read_buffer_size=131072
sort_buffer_size=1048568
max_used_connections=0
max_connections=100
threads_connected=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 147967 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd=0x8717a50
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Cannot determine thread, fp=0xbfe7f608, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:
0x806f3bb
0x8269928
0x807724c
0x8077665
0x82670dc
0x829c67a
New value of fp=(nil) failed sanity check, terminating stack trace!
Please read http://www.mysql.com/doc/U/s/Using_stack_trace.html and follow
instructions on how to resolve the stack trace. Resolved stack trace is much
more helpful in diagnosing the problem, so please do resolve it
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at (nil)  is invalid pointer
thd->thread_id=1

Successfully dumped variables, if you ran with --log, take a look at the
details of what thread 1 did to cause the crash.  In some cases of really
bad corruption, the values shown above may be invalid.

The manual page at http://www.mysql.com/doc/C/r/Crashing.html contains
information that should help you find out what is causing the crash.
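As a sanity check on the dump above, the memory estimate mysqld prints can be reproduced from the variables it lists. This is only a sketch of the formula as printed in the dump; the function name is made up for illustration, and truncating to whole kilobytes is an assumption that happens to match the printed figure.

```python
def max_memory_kb(key_buffer_size, read_buffer_size, sort_buffer_size,
                  max_connections):
    """key_buffer_size + (read_buffer_size + sort_buffer_size) * max_connections, in K."""
    total_bytes = (key_buffer_size
                   + (read_buffer_size + sort_buffer_size) * max_connections)
    return total_bytes // 1024


# Values taken from the crash dump.
estimate = max_memory_kb(
    key_buffer_size=33554432,
    read_buffer_size=131072,
    sort_buffer_size=1048568,
    max_connections=100,
)
print(estimate, "K")  # prints "147967 K", matching the figure in the dump
```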

