On 5/12/06, sheeri kritzer <[EMAIL PROTECTED]> wrote:
So, our MySQL master database crashes about once a week and then
immediately recovers. We are running a Dell 2850, a 64-bit Fedora Core
3 box with 6 GB of memory and 4 Intel Xeon processors at 3.60 GHz
each (according to /proc/cpuinfo), each with a 2048 KB cache. It
replicates to 2 slaves, which have the same hardware and memory (the
slaves don't crash).

I've done everything at http://dev.mysql.com/doc/refman/4.1/en/crashing.html

> uname -a
Linux dbhotsl1.manhunt.net 2.6.12-1.1381_FC3smp #1 SMP Fri Oct 21 04:22:48 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux

> cat /proc/meminfo
MemTotal: 6142460 kB
MemFree: 26564 kB
Buffers: 15396 kB
Cached: 805128 kB
SwapCached: 1336 kB
Active: 5503352 kB
Inactive: 505792 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 6142460 kB
LowFree: 26564 kB
SwapTotal: 2096472 kB
SwapFree: 2088036 kB
Dirty: 1996 kB
Writeback: 0 kB
Mapped: 5195364 kB
Slab: 78348 kB
CommitLimit: 5167700 kB
Committed_AS: 5532772 kB
PageTables: 12384 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 263636 kB
VmallocChunk: 34359474295 kB
HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 2048 kB
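MemFree by itself understates how much memory the kernel can hand out, because page cache and buffers can usually be reclaimed under pressure; the traditional "-/+ buffers/cache" line printed by free adds them back. The listing also shows Committed_AS above CommitLimit, a limit the kernel only enforces when vm.overcommit_memory is set to 2. A minimal Python sketch, assuming the /proc/meminfo text format shown above, that computes both from the numbers in this post (the function names are illustrative, not part of any tool):

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of ints (values in kB)."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])  # drop the trailing "kB"
    return info


def summarize(info):
    # Same idea as free's "-/+ buffers/cache" line: free + buffers + cache.
    free_plus_cache = info["MemFree"] + info["Buffers"] + info["Cached"]
    # Committed_AS is address space processes have asked for; CommitLimit
    # is only enforced in strict mode (vm.overcommit_memory = 2).
    overcommitted = info["Committed_AS"] > info["CommitLimit"]
    return {"free_plus_cache_kb": free_plus_cache, "overcommitted": overcommitted}


# Values copied from the listing above; on the live box use
# open("/proc/meminfo").read() instead.
sample = """MemTotal: 6142460 kB
MemFree: 26564 kB
Buffers: 15396 kB
Cached: 805128 kB
CommitLimit: 5167700 kB
Committed_AS: 5532772 kB
HugePages_Total: 0"""

print(summarize(parse_meminfo(sample)))
# prints {'free_plus_cache_kb': 847088, 'overcommitted': True}
```

So roughly 827 MB is free or held as cache, not the 26 MB MemFree suggests; how much of Cached is actually reclaimable depends on what is mapped at the time.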

The server normally runs with only 20-30 MB of free memory, so low free
memory on its own is not (necessarily) the cause. We get the dreaded
"Signal 11" error, and no core dumps, even though core-file is set in
the [mysqld] section of my.cnf.

Speaking of the my.cnf, here it is:
-----------------------------------------------------------------------
[mysqld]
core-file
old-passwords
tmpdir = /tmp/
datadir = /var/lib/mysql
socket = /var/lib/mysql/mysql.sock
port = 3306
key_buffer = 320M
max_allowed_packet = 16M
table_cache = 10240
thread_cache = 80
ft_min_word_len = 3

# Query Cache Settings - OFF due to overload of Session table
query_cache_size = 32M
query_cache_type = 2

# Log queries taking longer than "long_query_time" seconds
long_query_time = 4
log-slow-queries = /var/log/mysql/slow-queries.log
log-error = /var/log/mysql/mysqld.err

# Try number of CPUs*2 for thread_concurrency
thread_concurrency = 12

interactive_timeout = 28800
wait_timeout = 30

# up to 15 Apache Servers with 256 connections each = 3840
# 5.8 G of memory = 2200 cxns
# when you change this recalculate total possible mysqld memory usage!!
# innodb_buffer_pool_size + key_buffer_size
# + max_connections*(sort_buffer_size+read_buffer_size+binlog_cache_size)
# + max_connections*2MB

max_connections = 2200
max_connect_errors = 128

# Replication Master Server (default)
# binary logging is required for replication
log-bin=/var/log/mysql/dbhotsl1-bin
server-id = 18
binlog-do-db = db1
binlog-do-db = db2
binlog-do-db = db3
max_binlog_size = 2G

# InnoDB tables
innodb_data_home_dir = /var/lib/mysql/
innodb_data_file_path = ibdata1:3G;ibdata2:3G;ibdata3:3G;ibdata4:3G;
innodb_log_group_home_dir = /var/log/mysql/
innodb_log_files_in_group = 2
innodb_log_arch_dir = /var/log/mysql/
innodb_buffer_pool_size = 4G
innodb_additional_mem_pool_size = 40M
innodb_log_file_size = 160M
innodb_log_buffer_size = 80M
innodb_flush_log_at_trx_commit = 0
innodb_lock_wait_timeout = 50
innodb_thread_concurrency = 8
innodb_file_io_threads = 4


##################################################
[mysql.server]
user=mysql
basedir=/var/lib


##################################################
[safe_mysqld]
err-log=/var/log/mysql/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
-----------------------------------------------------------------------
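The comment in this my.cnf gives a worst-case memory formula; plugging in the configured values shows how far that ceiling sits from the 6 GB of RAM. sort_buffer_size and binlog_cache_size are not set in the file, so this sketch ASSUMES the stock values of about 2 MB and 32 KB; read_buffer_size (131072) is taken from the error log. The helper name is hypothetical.

```python
MB = 1024 * 1024
GB = 1024 * MB


def worst_case_bytes(buffer_pool, key_buffer, max_connections,
                     sort_buffer, read_buffer, binlog_cache,
                     per_thread_extra=2 * MB):
    """Worst-case mysqld memory, following the formula in the my.cnf comment."""
    global_buffers = buffer_pool + key_buffer
    per_connection = sort_buffer + read_buffer + binlog_cache + per_thread_extra
    return global_buffers + max_connections * per_connection


total = worst_case_bytes(
    buffer_pool=4 * GB,      # innodb_buffer_pool_size = 4G
    key_buffer=320 * MB,     # key_buffer = 320M
    max_connections=2200,    # max_connections = 2200
    sort_buffer=2097144,     # ASSUMED: not set in my.cnf, ~2 MB default
    read_buffer=131072,      # read_buffer_size reported in the error log
    binlog_cache=32768,      # ASSUMED: not set in my.cnf, 32 KB default
)
print(f"{total / GB:.1f} GiB")  # prints 13.2 GiB
```

This is a ceiling rather than a prediction: a connection only allocates buffers such as the sort buffer when a query needs them, so real usage is normally far lower.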

And here is the error log. It's pretty standard and doesn't really tell
me anything (and there's no stack trace):

--------------------------------------------------------------------------
mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=335544320
read_buffer_size=131072
max_used_connections=2201
max_connections=2200
threads_connected=152
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size +
sort_buffer_size)*max_connections = 5114862 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

060427 23:56:44 InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
060427 23:56:44 InnoDB: Starting log scan based on checkpoint at
InnoDB: log sequence number 752 3907332354.
InnoDB: Doing recovery: scanned up to log sequence number 752 3912574976
InnoDB: Doing recovery: scanned up to log sequence number 752 3917817856
[...more of the same]
InnoDB: Doing recovery: scanned up to log sequence number 752 4144467558
060427 23:57:09 InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percents: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
InnoDB: Apply batch completed
InnoDB: In a MySQL replication slave the last master binlog file
InnoDB: position 0 53262417, file name swan-bin.003989
InnoDB: Last MySQL binlog file position 0 933891534, file name
/var/log/mysql/dbhotsl1-bin.001193
060427 23:59:08 InnoDB: Flushing modified pages from the buffer pool...
060427 23:59:39 InnoDB: Started; log sequence number 752 4144467558
/usr/sbin/mysqld: ready for connections.
Version: '4.1.12-standard-log' socket: '/var/lib/mysql/mysql.sock' port: 3306 MySQL Community Edition - Standard (GPL)
-------------------------------------------------------------------------
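The memory estimate in this log can also be run backwards to recover the sort_buffer_size the server was really using, since my.cnf doesn't set one. A sketch, ASSUMING the printed figure is the byte total divided by 1024 and truncated to whole kilobytes, so the true total lies in a 1 KB window; the function name is hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def sort_buffer_range(estimate_k, key_buffer, read_buffer, max_connections):
    """Integer sort_buffer_size values consistent with the log's estimate.

    Solves key_buffer + (read_buffer + sort) * max_connections = total,
    where total lies in [estimate_k * 1024, (estimate_k + 1) * 1024).
    """
    low_total = estimate_k * 1024
    high_total = (estimate_k + 1) * 1024  # exclusive upper bound
    low = ceil_div(low_total - key_buffer, max_connections) - read_buffer
    high = ceil_div(high_total - key_buffer, max_connections) - read_buffer
    return range(low, high)


# key_buffer_size, read_buffer_size, max_connections and the
# "5114862 K" estimate are copied from the error log above.
print(list(sort_buffer_range(5114862, 335544320, 131072, 2200)))
# prints [2097144]
```

For these numbers exactly one value fits: 2097144 bytes, i.e. 2 MB minus 8 bytes, which looks like an untouched ~2 MB default.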

Our binary logs grow by 1.1 GB every 2 hours, so enabling the general
query log isn't really an option. I'm looking for answers that are not
"check the slow query log" because 1) I've done that and changes have
been put in place, and 2) there are no slow queries before a crash but
plenty after (because of all the queries queued up while the server
was down).

Can anyone help shed some light? Even suggestions for further ways to
debug, or for where to find debug files, would help. It's a really
bizarre problem; it shouldn't be happening, and it's not acceptable in
our environment.

We run about 3000 queries per second under normal load, and 6000 at
peak times and when we crash; 1000 of those queries are DML under normal
load, and about 2000 under peak load and when we crash. We hit peak
loads AFTER we crash, which makes sense (we're a web-based application).

We have tons of monitoring, both on the system and things like the
InnoDB monitor.  We deadlock only right AFTER a crash, and will go for
days at a time without a deadlock (i.e., the "last detected deadlock"
before a crash was 5 days earlier).  Load and memory usage are normal
for hours, and only increase AFTER a crash.

No core files are generated. Just today I added
core-file-size=unlimited to my.cnf in the hope that it will
work. If it doesn't, I have permission to restart MySQL as the root
user to see whether it will dump core, even though that's a security risk.
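A soft core-size limit (RLIMIT_CORE) of 0 means no core file is written, whatever mysqld's core-file option says. Resource limits are inherited across fork/exec, so the limit in effect in the shell or wrapper that launches mysqld is the one it runs with; mysqld_safe's core-file-size option works by running ulimit -c before starting mysqld, and mysqld_safe reads it from its own option groups ([mysqld_safe], or [safe_mysqld] as used in this my.cnf) rather than [mysqld]. A minimal sketch using Python's standard resource module (Unix only) to inspect and raise the limit for the current process; it does not touch an already-running mysqld, and the helper names are hypothetical.

```python
import resource


def describe(limit):
    """Human-readable form of an rlimit value."""
    return "unlimited" if limit == resource.RLIM_INFINITY else f"{limit} bytes"


def core_limit():
    """Return the (soft, hard) RLIMIT_CORE of this process."""
    return resource.getrlimit(resource.RLIMIT_CORE)


def raise_core_limit():
    """Raise the soft core-size limit up to the hard limit.

    An unprivileged process may raise its soft limit as far as the hard
    limit; going beyond the hard limit requires root.
    """
    _, hard = core_limit()
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return core_limit()


soft, hard = core_limit()
print("core size: soft =", describe(soft), "hard =", describe(hard))
soft, hard = raise_core_limit()
print("after raising: soft =", describe(soft))
```

Anything started from this process afterwards inherits the raised limit, which is the same effect ulimit -c has in a start-up script.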

One thing we did do was CHECK, ANALYZE and OPTIMIZE all the tables at
the beginning of the month. That helped stop the crashing under high
load, but the server still crashes about once a week, at NON-peak times.

Is it possible we're doing so many updates that we're slowly
corrupting our db? I'm also going to try a weekly CHECK, ANALYZE and
OPTIMIZE to see if that helps, but I really feel like we
shouldn't need to do that.
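For a weekly CHECK / ANALYZE / OPTIMIZE pass, a small script can generate the statements per database. A minimal sketch: db1, db2 and db3 come from the binlog-do-db lines in my.cnf, but the table names below are HYPOTHETICAL placeholders. CHECK TABLE, ANALYZE TABLE and OPTIMIZE TABLE each accept a comma-separated list of tables.

```python
def quote_ident(name):
    """Backtick-quote a MySQL identifier, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def maintenance_sql(tables_by_db):
    """Build CHECK, ANALYZE and OPTIMIZE statements for each database.

    tables_by_db maps a database name to a list of table names;
    databases with no tables listed are skipped.
    """
    statements = []
    for db, tables in tables_by_db.items():
        if not tables:
            continue
        names = ", ".join(f"{quote_ident(db)}.{quote_ident(t)}" for t in tables)
        for verb in ("CHECK", "ANALYZE", "OPTIMIZE"):
            statements.append(f"{verb} TABLE {names};")
    return statements


# HYPOTHETICAL table names; only the database names come from my.cnf.
plan = {"db1": ["sessions", "users"], "db2": ["orders"], "db3": []}
for stmt in maintenance_sql(plan):
    print(stmt)
```

In 4.1, ANALYZE TABLE and OPTIMIZE TABLE are written to the binary log unless NO_WRITE_TO_BINLOG (or its alias LOCAL) is given, and OPTIMIZE TABLE locks the table while it rebuilds it, so it is worth running off-peak.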

We've looked everywhere we can find and there are no clues.

Any advice/help? Thanx!


As you stated yourself, your server is pretty much in order. The only
thing I can think of is a bug, and a good one :) So, upgrade to the
latest 4.1.x and see if it continues to happen; maybe it's an issue
that has been fixed by now.

--
Daniel da Veiga
Computer Operator - RS - Brazil

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]