Hi,
We have recently upgraded to MySQL 4.0 (binary) from 3.23.50, and we
are seeing *all* MySQL queries freeze up every morning for about 25
seconds. I don't believe this is the standard run-of-the-mill MyISAM
locking problem people tend to run into, as queries in separate
databases hang, aswell as "SELECT 1" queries hanging.
Here's the situation, a few jobs run over night that do a humongous
amount of inserts into a single table in two separate databases. (We'll
call this 'table A', as it is named the same in both databases). This
nightly job is run for only 2 databases out of maybe 100. We use batch
inserts to insert these rows. When these jobs finish, all is calm (all
the users are asleep). All the servers are pretty much entirely idle.
There should be no system cronjobs running at this point. Then,
sometime between 10 minutes and an hour later, all the queries going to
the mysql server will freeze for ~25 seconds. During this time, "vmstat
1" reports that one full cpu is cranking 100% user time. (this is a 4
way box, so "%25" user time is reported). We have no cronjobs running
at this time, and we have disabled nightly optimization since this
problem started occuring.
Last night, I setup a 'show processlist' python script, and I captured
the results every 5 seconds. I have approximately 5 processlist
captures from the server in this state. They look like this (this is
from the first capture)
(31531L, 'ABC', 'xxx:48319', 'ABC', 'Query', '1', None, 'select 1')
(31532L, 'ABC', 'xxx:48320', 'ABC', 'Sleep', '21', '', None)
(31533L, 'BCD', 'xxx:48322', 'BCD', 'Query', '0', None, 'select 1')
(31567L, 'CDE', 'xxx:48489', 'CDE', 'Query', '0', None, 'select 1')
(31610L, 'DEF', 'xxx:48622', 'DEF', 'Query', '0', None, 'select 1')
(31611L, 'EFG', 'xxx:48626', 'EFG', 'Query', '1', None, 'select 1')
(31666L, 'HIJ', 'yyy:54831', 'HIJ', 'Query', '0', None, 'select
count(GlobalPreferences.primaryKey) from GlobalPreferences')
(32016L, 'JKL', 'zzz:50352', 'JKL', 'Query', '2', 'update', 'INSERT INTO
Response
(responseSetID,questionID,userID,hasRule,isNA,creationDate,versionNumber,foilID')
[...]
There are approximately 600 db connections total, at at this point, only
a small fraction are hanging. You will note there is exactly one insert
being run at this time. I believe this is related to a trigger for this
condition. This insert, however, is *NOT* into 'Table A'. The table
this insert is going into should have seen a lot of select activity
overnight, but no inserts. You can see the select 1's hanging, and a
few select count's hanging. Both of those queries are diagnostic
queries run by the application to make sure the database is healthy.
The hang time in the above list is small, but if we look at my last
capture:
(31895L, 'ABC', 'yyy:55654', 'ABC', 'Query', '16', None, 'select 1')
(31898L, 'BCD', 'yyy:55658', 'BCD', 'Query', '21', None, 'select 1')
(31948L, 'CDE', 'yyy:56017', 'CDE', 'Query', '21', None, 'select 1')
(32016L, 'JKL', 'zzz:50352', 'JKL', 'Query', '28', 'update', 'INSERT
INTO Response
(responseSetID,questionID,userID,hasRule,isNA,creationDate,versionNumber,foilID')
(31793L, 'DEF', 'yyy:55176', 'DEF', 'Query', '25', None, 'select
count(GlobalPreferences.primaryKey) from GlobalPreferences')
(31794L, 'DEF', 'yyy:55177', 'DEF', 'Query', '15', None, 'select
count(GlobalPreferences.primaryKey) from GlobalPreferences')
(31795L, 'DEF', 'yyy:55178', 'DEF', 'Query', '4', None, 'select
count(GlobalPreferences.primaryKey) from GlobalPreferences')
The select 1's and counts range from hanging 4 seconds to 25 seconds,
and the insert hangs for 28 seconds. Based on the increase in the
number of hanging requests, I do not believe *any* requests have
completed since that insert started (although my show processlist seems
to go through!). The Mysql.err log is empty, the Mysql-slow.log shows
all the queries above, including the insert.
A few things to note. vmstat shows very little I/O during this period.
It shows slightly less than normal I/O then a 'standard' idle state,
presumably because none of the mysql queries go through during this
period. I/O is not frozen however, as I do see disk writes during this
period.
Here's the server config:
my.cnf:
[mysqld]
skip-locking
server-id = 4
skip-innodb
set-variable= key_buffer=800M
set-variable= tmp_table_size=1024M
set-variable= max_allowed_packet=16M
set-variable= thread_stack=128K
set-variable= max_connections=2000
set-variable= max_connect_errors=9
set-variable= table_cache=1024
set-variable= myisam_max_sort_file_size=4096
set-variable= myisam_sort_buffer_size=512M
set-variable= join_buffer_size=512M
set-variable= sort_buffer=512M
query_cache_size= 512M
The system has 4gigs of memory, recently reduced from 8 gigs to try to
work around a separate problem, and we're running the 2.4.26 linux
kernel on a quad cpu x86 machine.
Any thoughts? What kind of internal