~75% performance drop upgrading from MySQL 4.0 to 5.0

2006-01-05 Thread Joe Kislo
Here are the timings:

64-bit 5.0 Single Thread 1:00:12.17 total (~76% slower)
64-bit 4.1 Single Thread 41:38.07 total (~20% slower)
64-bit 4.0 Single Thread 34:50.23 total

I have been trying to get a stable configuration for 64-bit MySQL on
Ubuntu for the past 6-8 months, and have developed a number of stress
tests to try to isolate problems.  I've been able to successfully
discover several bugs in MySQL (or libc) running in 64-bit mode.  I've
recently started running these tests against the 4.1 and 5.0 versions of
MySQL, and have discovered very different performance characteristics
compared to MySQL 4.0.

This test imports one of our customer databases, records the import
time, drops the database, and restarts.  I've run this test
against the 4.0, 4.1 and 5.0 binaries.  I've found a *dramatic*
performance drop on the 4.1 and 5.0 lines against the 4.0 line of MySQL
when executing this test.  The tests and configurations are identical.
This test may be lopsided (insert + alter table enable keys), however it
is something that we actually do daily in our production environment.
If we were to consider upgrading our production environment from 4.0 to
5.0, a 75% drop in performance would not be acceptable for these types
of operations.

I ran these tests for a minimum of 24 hours each (although there was
very little variance between the 1st and the last run).  Each
installation was a fresh MySQL binary installation with newly built
(i.e. blank) grant tables.  The machine is a 2-CPU 2.4GHz Opteron with
4GB of memory and a 4-disc RAID 0+1 array, running Ubuntu Breezy.  I ran these
tests against the 32-bit version for each of those versions, and saw
similar timings, except on average they were 11% slower than their
corresponding 64-bit version.  I also ran these tests with two threads,
(operating on separate databases), and found similar performance hits
with MySQL 4.1 and 5.0 and an overall 17% drop in performance against
single threaded operation.

All my.cnf files were identical between the versions, except that I
changed the datadir and language settings (to point to the appropriate
directory).  I have attached the my.cnf file.  I am using MyISAM tables
for all tests.

The test harness is very simple:

#!/bin/zsh
while true; do
  echo create database $1 | /var/lib/mysql/mysql-standard-5.0.17-linux-x86_64-glibc23/bin/mysql -u root
  time zcat TEST_DATABASE.mysql.gz | /var/lib/mysql/mysql-standard-5.0.17-linux-x86_64-glibc23/bin/mysql -u root $1
  echo drop database $1 | /var/lib/mysql/mysql-standard-5.0.17-linux-x86_64-glibc23/bin/mysql -u root
  date
done

I thought perhaps MySQL 5.0 may be doing a better job determining index
cardinalities than 4.0, and as a result there would be no need to do an
optimize-db after an insert.  I modified the test to insert the
database, then optimize each of the tables.  The relative test execution
times were unchanged (MySQL 5.0 still ~75% slower), and MySQL 4.0 was
also able to optimize the tables 20% faster.
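
The step added to the modified test is roughly the following, sketched
against the example table shown below (the real test loops over every
table in the database; ANALYZE TABLE is mentioned only as the
lighter-weight statement that refreshes key statistics without
rebuilding the table):

OPTIMIZE TABLE ABCState;   -- defragments the data file, sorts the index
                           -- pages, and updates the index statistics
ANALYZE TABLE ABCState;    -- refreshes key cardinality estimates only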

Although I cannot provide our customer data, I can show you an example
of what the import file looks like:

CREATE TABLE ABCState (
  userID int(11) NOT NULL default '0',
  roleID int(11) NOT NULL default '0',
  aiFilter int(11) default NULL,
  PRIMARY KEY  (userID,roleID)
) TYPE=MyISAM;

--
-- Dumping data for table `ABCState`
--

/*!40000 ALTER TABLE ABCState DISABLE KEYS */;
LOCK TABLES ABCState WRITE;
INSERT INTO ABCState VALUES
(15,4,8),(20,4,8),(21,4,8),(22,4,8),(19,5,8),(40,4,8),(42,4,8),(38,4,8),(39,4,8),(43,4,8),(33,4,8),(27,
4,8),(28,4,8),(26,4,8),(25,4,8),(34,4,8),(32,4,8),(35,4,1),(31,4,8),(24,4,8),(36,4,8),(37,4,8),(30,4,8),(29,4,8),(47,6,8),(66,6,8),(64,6,8),(46,
6,8),(67,10,8),(60,6,8),(68,10,8),(69,10,8),(44,5,8),(70,10,8),(71,10,8),(72,5,8),(73,10,8),(77,6,8),(76,6,8),(75,4,8),(74,4,8),(78,4,8),(76,4,8
),(79,4,8),(77,4,8),(18,5,8),(82,4,8),(11,5,8),(81,5,8),(84,4,8),(85,4,8),(106,4,8),(107,4,8),(31,6,8),(108,4,8),(108,6,8),(110,6,8),(109,6,8),(
111,4,8),(6,5,8),(35,6,8),(36,6,8),(112,6,8),(113,6,8),(114,6,8),(115,4,8),(116,4,8),(120,6,8),(16,5,8),(121,6,8),(118,4,8),(119,4,8),(117,4,8),
(126,4,8),(127,4,8),(128,4,8),(129,4,8),(130,4,8),(123,4,8),(126,6,8),(133,4,8),(132,4,8),(132,6,8),(134,4,8),(135,10,8),(38,6,8),(136,10,8),(11
0,4,8),(145,4,8),(137,4,8),(144,4,8),(141,4,8),(140,4,8),(146,4,8),(142,4,8),(148,4,8),(147,4,8),(138,4,8),(149,4,8),(141,6,8),(146,6,8),(142,6,
8),(144,6,8),(147,10,8),(75,10,8),(150,4,8),(137,10,8),(156,10,8),(157,4,8),(158,4,8),(159,4,8),(160,4,8),(113,4,8),(165,4,8);
UNLOCK TABLES;
/*!40000 ALTER TABLE ABCState ENABLE KEYS */;

The dump was generated with mysqldump -l --add-locks --extended-insert
--quick --all --disable-keys.  There are millions of records, and the
compressed .gz file is 500 MB.  There is a good mix of varchar, char, and
blob fields across 100+ tables.

Does anybody have any idea why 5.0 is performing so poorly?  Since I am
not using any of the new 4.1 or 5.0 features, I would 

RE: AMD64 Linux MySQL Error 1041 Out of Memory

2005-06-20 Thread Joe Kislo
On Thu, 2005-06-16 at 21:06, Richard Dale wrote:
> > So we have recently started stress testing MySQL on an Opteron dual CPU
> > machine running Ubuntu Hoary.  We are using the 64-bit, GCC-compiled
> > 4.0.24-standard binary from MySQL.  The stress test that I'm currently
> > running on it involves inserting a large database (from a mysqldump)
> > ...
>
> Which linux kernel are you running?

2.6.10, specifically the ubuntu compile:

2.6.10-5-amd64-k8-smp

> Why not consider MySQL 4.1?  It's now a stable production release.

Unfortunately, when we tried to run our application on MySQL 4.1, we
were bitten by a number of MySQL 4.1 behaviour changes.  MySQL 4.1 now
silently returns Long objects instead of Integers for many functions
(e.g. COUNT), causing class cast exceptions in Java... Also, a number of
our queries which used to compare binary and non-binary fields no longer
cast automatically, requiring some additional SQL which our database
abstraction layer is not aware of needing.  A lengthy porting process
ensued... and has been completed, but our application is still quite a
ways from being fully certified to run on MySQL 4.1.  :(  The MySQL 4.1
upgrade was 10x nastier than our 3.23-to-4.0 upgrade.
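
For illustration only (the table and column names here are hypothetical,
not from our schema), the kind of additional SQL this forces looks
something like the following: a binary/non-binary comparison that 4.0
handled implicitly now needs an explicit conversion on one side.

-- Under 4.0 the two columns could be compared directly; under 4.1 the
-- comparison has to be made explicit, for example with CAST():
SELECT a.userID
  FROM UserToken t, UserAccount a
 WHERE CAST(t.token AS CHAR) = a.token;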

> Also, consider visiting:
> http://hashmysql.org/index.php?title=Opteron_HOWTO
> We would welcome a section on there about Ubuntu and x86-64.

Yes, I will be updating that wiki once I've got it running stably.  I
ran into some libc issues with Ubuntu already, as seen here:

http://bugzilla.ubuntu.com/show_bug.cgi?id=11730

However I've compiled my own replacement libc with patches applied to
fix the issues, which I will make available to all.  Except for this
memory problem, the machine is ready for our burn-in testing.

-Joe

RE: AMD64 Linux MySQL Error 1041 Out of Memory

2005-06-20 Thread Joe Kislo
On Thu, 2005-06-16 at 21:06, Richard Dale wrote:
> > So we have recently started stress testing MySQL on an Opteron dual CPU
> > machine running Ubuntu Hoary.  We are using the 64-bit, GCC-compiled
> > 4.0.24-standard binary from MySQL.  The stress test that I'm currently
> > running on it involves inserting a large database (from a mysqldump)
> > ...

So I don't think this has anything to do with memory consumption anymore.
I've been able to get the problem to repeat with mysqld's memory
footprint at about 800+ MB (545 MB resident).  I was also able to get
it to happen with the stock Ubuntu mysqld my.cnf as well... which uses
ridiculously conservative memory settings.  (The key cache is so small
it takes 4x as long to run the test.)  With a configuration identical
to my normal setup, running the 32-bit binary doesn't appear to trigger
the problem.

I'll try 64-bit with LinuxThreads next.

Somebody asked for a show processlist and show status... Here's the data
pretty close to when it happened...

+---------------------------+------------+
| Variable_name             | Value      |
+---------------------------+------------+
| Aborted_clients  | 4  |
| Aborted_connects | 0  |
| Bytes_received   | 3638957151 |
| Bytes_sent   | 18427266   |
| Com_admin_commands   | 5  |
| Com_alter_table  | 10996  |
| Com_analyze  | 0  |
| Com_backup_table | 0  |
| Com_begin| 0  |
| Com_change_db| 9  |
| Com_change_master| 0  |
| Com_check| 193|
| Com_commit   | 0  |
| Com_create_db| 57 |
| Com_create_function  | 0  |
| Com_create_index | 0  |
| Com_create_table | 5505   |
| Com_delete   | 0  |
| Com_delete_multi | 0  |
| Com_drop_db  | 57 |
| Com_drop_function| 0  |
| Com_drop_index   | 0  |
| Com_drop_table   | 0  |
| Com_flush| 3  |
| Com_grant| 0  |
| Com_ha_close | 0  |
| Com_ha_open  | 0  |
| Com_ha_read  | 0  |
| Com_insert   | 334612 |
| Com_insert_select| 0  |
| Com_kill | 0  |
| Com_load | 0  |
| Com_load_master_data | 0  |
| Com_load_master_table| 0  |
| Com_lock_tables  | 5505   |
| Com_optimize | 0  |
| Com_purge| 0  |
| Com_rename_table | 0  |
| Com_repair   | 0  |
| Com_replace  | 0  |
| Com_replace_select   | 0  |
| Com_reset| 0  |
| Com_restore_table| 0  |
| Com_revoke   | 0  |
| Com_rollback | 0  |
| Com_savepoint| 0  |
| Com_select   | 1  |
| Com_set_option   | 0  |
| Com_show_binlog_events   | 0  |
| Com_show_binlogs | 0  |
| Com_show_create  | 0  |
| Com_show_databases   | 2  |
| Com_show_fields  | 0  |
| Com_show_grants  | 0  |
| Com_show_keys| 0  |
| Com_show_logs| 0  |
| Com_show_master_status   | 0  |
| Com_show_new_master  | 0  |
| Com_show_open_tables | 0  |
| Com_show_processlist | 2  |
| Com_show_slave_hosts | 0  |
| Com_show_slave_status| 0  |
| Com_show_status  | 1  |
| Com_show_innodb_status   | 0  |
| Com_show_tables  | 9  |
| Com_show_variables   | 0  |
| Com_slave_start  | 0  |
| Com_slave_stop   | 0  |
| Com_truncate | 0  |
| Com_unlock_tables| 5491   |
| Com_update   | 0  |
| Com_update_multi | 0  |
| Connections  | 182|
| Created_tmp_disk_tables  | 0  |
| Created_tmp_tables   | 0  |
| Created_tmp_files| 43 |
| Delayed_insert_threads   | 0  |
| Delayed_writes   | 0  |
| Delayed_errors   | 0  |
| Flush_commands   | 1  |
| Handler_commit   | 0  |
| Handler_delete   | 0  |
| Handler_read_first   | 1  |
| Handler_read_key | 0  |
| Handler_read_next| 0  |
| Handler_read_prev| 0  |
| Handler_read_rnd | 0  |
| Handler_read_rnd_next| 9  |
| Handler_rollback | 0  |
| Handler_update   | 0  |
| Handler_write| 617450860  |
| Key_blocks_used  | 478697 |
| 

AMD64 Linux MySQL Error 1041 Out of Memory

2005-06-16 Thread Joe Kislo

So we have recently started stress testing MySQL on an Opteron dual CPU
machine running Ubuntu Hoary.  We are using the 64-bit, GCC-compiled
4.0.24-standard binary from MySQL.  The stress test that I'm currently
running on it involves inserting a large database (from a mysqldump)
from three separate windows (so three imports running simultaneously). 
The database dump is about 3.7 gigs uncompressed, or 580megs
compressed.  It was dumped using the following dump parameters:
--add-locks --extended-insert --quick --lock-tables --all --disable-keys

Each window has a script that creates a database, imports the data,
dumps the database, and repeats.

After about 12 cycles (each taking about an hour) mysql starts spewing
these errors:

ERROR 1041 at line 195: Out of memory;  Check if mysqld or some other
process uses all available memory. If not you may have to use 'ulimit'
to allow mysqld to use more memory or you can add more swap space

I, unfortunately, have not been at the server when this actually
happens; however, when I come in in the morning top is reporting mysqld
taking up between 2.7 and 3.2 gigs of memory.  I have had a vmstat
running all night, and at no point saw the system run out of swap space
(it did slowly use about 60 megs of the 2 gigs of swap over the course
of the 15 hours or so, though).

Obviously checking ulimit was my first stop; however, I believe mysqld
does its own setuid... and I'm not sure it uses PAM to get its initial
ulimits.  Either way, I do this:

su mysql -s /bin/sh
sh-3.00$ ulimit -a

core file size        (blocks, -c) 0
data seg size         (kbytes, -d) unlimited
file size             (blocks, -f) unlimited
max locked memory     (kbytes, -l) unlimited
max memory size       (kbytes, -m) unlimited
open files                    (-n) 8192
pipe size          (512 bytes, -p) 8
stack size            (kbytes, -s) 8192
cpu time             (seconds, -t) unlimited
max user processes            (-u) unlimited
virtual memory        (kbytes, -v) unlimited

Which implies it should be able to alloc quite a bit of memory without
problem.  

So my first question: it appears to be conking out at around 4 gigs of
memory.  Is there some reason why mysqld can't allocate more than
4 gigs?  I confirmed I *am* running the 64-bit binary:

file /usr/sbin/mysqld
/usr/sbin/mysqld: ELF 64-bit LSB executable, AMD x86-64, version 1
(SYSV), for GNU/Linux 2.4.0, dynamically linked (uses shared libs),
stripped

This system has 4 gigs of memory in it, so if it had tried to allocate
4 gigs, it would have had to hit swap harder than 60 megs.  It seems
like mysqld is hitting a 32-bit allocation limit, but that doesn't
particularly make sense to me.  Anybody have their mysqld allocating
4 gigs?  Anything else I can try here?

The second thing is, I have no idea why mysql is taking up so much
memory.  For the first 5 runs or so, mysql only allocates about 800
megs.  Sometime during the night is when it jumps up in memory... I
don't really understand why, if it didn't need 3+ gigs of memory after
the first 5 complete runs (x3, of course, since there are 3 running in
parallel), it would suddenly need more later.

Either way, let's do some math.  MySQL is at 2.7 gigs this morning, which
is about half a gig less than yesterday morning.

2.7 gigs

Key buffer: 512m
Tmp Table: 128m
sort buffer size: 512m
join buffer size: 512m
query cache: 256m

Key buffer = 512M; I could see that possibly not being returned, so let's
assume 512M there.  There are no threads connected at the moment because
I have shut down the test, so the tmp tables should take up 0, but let's
say it didn't return 3x128M (384M).  Sort buffer size is 512M; well, it
may have used that for the ALTER TABLE ... ENABLE KEYS... and never
returned it, so 512M there.  Join buffer size: not a single select query
was ever used, 0M.  Query cache: 0M.  Plus 3x16M max packet.

So I see 512M + 384M + 512M + 48M = ~1.4 gigs.  I have no idea why mysql
is using this much memory... especially after it successfully performed
5 cycles with considerably less.

Any ideas?

Here's the my.cnf:

[mysqld]
user= mysql
pid-file= /var/run/mysqld/mysqld.pid
socket  = /var/run/mysqld/mysqld.sock
port= 3306
basedir = /usr
datadir = /var/lib/mysql
tmpdir  = /tmp
language= /usr/share/mysql/english

skip-external-locking

key_buffer  = 512M
tmp_table_size  = 128M
max_connections = 2000
max_connect_errors  = 9
table_cache = 1024
myisam_max_sort_file_size=2048M
myisam_sort_buffer_size =512M
join_buffer_size=512M
sort_buffer =512M

max_allowed_packet  = 16M
thread_stack= 128K

query_cache_limit   = 1M
query_cache_size= 256M
query_cache_type= 1

skip-innodb

and a show variables from a running server after a night's testing:
mysql> show variables;

4.0.23 - error 127 from table handler during many parallel inserts

2005-05-24 Thread Joe Kislo

We have been very successfully running MySQL in a production
environment one way or another for the past 6 years.  We have recently
run into what I believe is a thread race condition while writing then
reading from a MyISAM table.  The server we are experiencing this
problem on is a very stable environment, very rarely has anything
changed on this machine in the past 4 years (other than mysql upgrades,
and security updates). The machine is running Debian Woody (stable), I
have included the libraries from mysqlbug and kernel version at the end
of this email.  The machine is a Dell Poweredge 6450 4 processor XEON
700/2MB, running local hardware raid with an LSI controller.  I include
the hardware configuration because it's possible the thread interaction
problem may lie closer to the hardware level (or compiler), since the box
is a 4 CPU machine with fairly massive L2 caches (even by today's
standards) on each chip that need to be kept synchronized.  There have
been firmware upgrades for this machine in the past to fix 'cpu
synchronization' issues, however I do not know if we are running these
fixes or not.  We are running the MySQL 4.0.23 binary distribution (so it
should be statically linked anyway).  I've looked in the changelog for 4.0.24
and 4.0.25 and I don't see any updates that might resolve our issue.

Unfortunately I have no solid test case for this issue, and it only
occurs under times of heavy stress.  The problem has manifested itself
the past two times a customer has been doing 'massive' batch
configuration changes to our system.  They do these changes once a
month, and the past two months this issue has occurred.  The issue
manifests itself in a very particular way, and has been practically
identical both times.  I am hoping somebody can give me a direction to
take this: open a MySQL support case, mysqldump and reload the table,
look at the firmware updates...

The problem is we get a 'Got error 127 from table handler' error when
doing a select:

General error,  message from server: Got error 127 from table handler
-- SQLQuery was:select  DISTINCT
Activity.activityID,Activity.processID,Activity.activityName, []
from Activity where ( Activity.processID='147008' AND
Activity.activityName='VIMforQuiz' )

This, of course, crashes out our XML processing, and the worker thread
will end up stopping.  What is happening when this occurs is that there
are at least 3-4 active threads running on our application server
processing tens of thousands of XML files and making major changes to
the database.  Each thread is essentially performing the same duty, just
with different data.  These threads are primarily database bound.
Replication is active on this server, and there are 3 replication
servers pulling the updates from this server.  None of their tables are
corrupted by this (by virtue of check table).  Looking in the MySQL
binlog for the time period when this occurs (it, of course, does not
include selects) shows about 200-300 inserts/updates going on during
the 1-second period when this issue happens.  There happen to be 3
separate threads inserting records into the Activity table during that
second (they must just have happened to converge).  I am guessing that
the record the above select query is trying to query has just been
inserted (I found the insert for the record during that same 1 second as
the crash).

Performing a check table reports everything is kosher:

mysql> check table Activity;
+--------------+-------+----------+----------+
| Table        | Op    | Msg_type | Msg_text |
+--------------+-------+----------+----------+
| abc.Activity | check | status   | OK       |
+--------------+-------+----------+----------+
1 row in set (5 min 36.80 sec)

Last month I did a repair table as well, and that also reported everything
was fine, but obviously didn't fix anything.

Once this crash occurs, that worker thread will quit... but all other
threads will continue to pound away on this and other tables, and will
work perfectly fine with no errors.  The table is a MyISAM Dynamic
table, as shown from this show table status:

  Name: Activity    Type: MyISAM    Row_format: Dynamic
  Rows: 4681274    Avg_row_length: 105    Data_length: 505218432
  Max_data_length: 4294967295    Index_length: 166466560    Data_free: 10512548
  Auto_increment: 4960114
  Create_time: 2004-10-14 14:04:25    Update_time: 2005-05-24 00:57:34
  Check_time: 2005-05-24 00:57:10

The free space was probably more like zero when the incident occurred;
all of the commands I have run here were several hours after the
incident.

perror 127 says that the record file is crashed.  What will cause MySQL
to return such an error during a lookup? Clearly the table *itself* is
not marked as crashed, so I suspect what must be happening is the SELECT
statement is being allowed to read from the table while an insert is
still writing... It presumably should be locked out during that
operation, 
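
A diagnostic sketch of what could be checked next (the table name is
from the report above; treat exact option support on 4.0.23 as an
assumption):

-- CHECK TABLE without options does a quicker check; EXTENDED does a full
-- row-by-row verification of the data file against every index.
CHECK TABLE Activity EXTENDED;

-- If the .MYD file really is damaged, rebuild it:
REPAIR TABLE Activity;

-- To see whether MyISAM concurrent inserts (reads allowed while another
-- thread appends to the data file) might be a factor:
SHOW VARIABLES LIKE 'concurrent_insert';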

All Queries Hanging/Freeze: Trying to perform Select 1

2004-08-24 Thread Joe Kislo
Hi,

We have recently upgraded to MySQL 4.0 (binary) from 3.23.50, and we
are seeing *all* MySQL queries freeze up every morning for about 25
seconds.  I don't believe this is the standard run-of-the-mill MyISAM
locking problem people tend to run into, as queries in separate
databases hang, as well as 'SELECT 1' queries.

Here's the situation: a few jobs run overnight that do a humongous
number of inserts into a single table in two separate databases.  (We'll
call this 'table A', as it is named the same in both databases.)  This
nightly job is run for only 2 databases out of maybe 100.  We use batch
inserts to insert these rows.  When these jobs finish, all is calm (all
the users are asleep).  All the servers are pretty much entirely idle.
There should be no system cronjobs running at this point.  Then,
sometime between 10 minutes and an hour later, all the queries going to
the mysql server will freeze for ~25 seconds.  During this time, vmstat
1 reports that one full cpu is cranking 100% user time.  (This is a
4-way box, so 25% user time is reported.)  We have no cronjobs running
at this time, and we have disabled nightly optimization since this
problem started occurring.

Last night I set up a 'show processlist' python script, and I captured
the results every 5 seconds.  I have approximately 5 processlist
captures from the server in this state.  They look like this (this is
from the first capture):

(31531L, 'ABC', 'xxx:48319', 'ABC', 'Query', '1', None, 'select 1')
(31532L, 'ABC', 'xxx:48320', 'ABC', 'Sleep', '21', '', None)
(31533L, 'BCD', 'xxx:48322', 'BCD', 'Query', '0', None, 'select 1')
(31567L, 'CDE', 'xxx:48489', 'CDE', 'Query', '0', None, 'select 1')
(31610L, 'DEF', 'xxx:48622', 'DEF', 'Query', '0', None, 'select 1')
(31611L, 'EFG', 'xxx:48626', 'EFG', 'Query', '1', None, 'select 1')
(31666L, 'HIJ', 'yyy:54831', 'HIJ', 'Query', '0', None, 'select
count(GlobalPreferences.primaryKey) from GlobalPreferences')
(32016L, 'JKL', 'zzz:50352', 'JKL', 'Query', '2', 'update', 'INSERT INTO
Response
(responseSetID,questionID,userID,hasRule,isNA,creationDate,versionNumber,foilID')

[...]

There are approximately 600 db connections total, and at this point only
a small fraction are hanging.  You will note there is exactly one insert
being run at this time; I believe this is the trigger for this
condition.  This insert, however, is *NOT* into 'Table A'.  The table
this insert is going into should have seen a lot of select activity
overnight, but no inserts.  You can see the 'select 1's hanging, and a
few select counts hanging.  Both of those queries are diagnostic
queries run by the application to make sure the database is healthy.
The hang time in the above list is small, but if we look at my last
capture:

(31895L, 'ABC', 'yyy:55654', 'ABC', 'Query', '16', None, 'select 1')
(31898L, 'BCD', 'yyy:55658', 'BCD', 'Query', '21', None, 'select 1')
(31948L, 'CDE', 'yyy:56017', 'CDE', 'Query', '21', None, 'select 1')
(32016L, 'JKL', 'zzz:50352', 'JKL', 'Query', '28', 'update', 'INSERT
INTO Response
(responseSetID,questionID,userID,hasRule,isNA,creationDate,versionNumber,foilID')
(31793L, 'DEF', 'yyy:55176', 'DEF', 'Query', '25', None, 'select
count(GlobalPreferences.primaryKey) from GlobalPreferences')
(31794L, 'DEF', 'yyy:55177', 'DEF', 'Query', '15', None, 'select
count(GlobalPreferences.primaryKey) from GlobalPreferences')
(31795L, 'DEF', 'yyy:55178', 'DEF', 'Query', '4', None, 'select
count(GlobalPreferences.primaryKey) from GlobalPreferences')

The select 1's and counts range from hanging 4 seconds to 25 seconds,
and the insert hangs for 28 seconds.  Based on the increase in the
number of hanging requests, I do not believe *any* requests have
completed since that insert started (although my show processlist seems
to go through!).  The Mysql.err log is empty, the Mysql-slow.log shows
all the queries above, including the insert.

A few things to note: vmstat shows very little I/O during this period.
It shows slightly less I/O than in a 'standard' idle state, presumably
because none of the mysql queries go through during this period.  I/O is
not frozen, however, as I do see disk writes during this period.

Here's the server config:

my.cnf:

[mysqld]

skip-locking

server-id   = 4

skip-innodb

set-variable= key_buffer=800M
set-variable= tmp_table_size=1024M
set-variable= max_allowed_packet=16M
set-variable= thread_stack=128K
set-variable= max_connections=2000
set-variable= max_connect_errors=9
set-variable= table_cache=1024
set-variable= myisam_max_sort_file_size=4096
set-variable= myisam_sort_buffer_size=512M
set-variable= join_buffer_size=512M
set-variable= sort_buffer=512M

query_cache_size= 512M

The system has 4 gigs of memory (recently reduced from 8 gigs to try to
work around a separate problem), and we're running the 2.4.26 linux
kernel on a quad-cpu x86 machine.

Any thoughts?  What kind of internal locks might be generated?  Is there
another command I can run to get the status of those queries that are
hanging?

Re: All Queries Hanging/Freeze: Trying to perform Select 1

2004-08-24 Thread Joe Kislo
> Any thoughts?  What kind of internal locks might be generated?  Is there
> another command I can run to get the status of those queries that are
> hanging?  It shows 'None' for the state (NULL), I've never seen a

Aha.. So I had a brainstorm while driving to lunch.  Is it possible
this is related to the query cache?  Obviously we did not have a query
cache before we upgraded to 4.0.  

Overnight, the heavy job runs, inserting millions of records into
Table A but also doing hundreds of thousands of selects from Table B.
Table B is the table we're seeing the insert into that appears to be
freezing all the queries on the server (including 'select 1').
Presumably, if this is the very first insert into this table, the query
cache must be flushed for that table.  I am assuming the query cache is
nearly full of queries against that table (as the system has been idle
since this job finished).  Our query cache is 512M max.  Can it really
take 25 seconds to clear the query cache of all those queries?  Would
that hang all the queries in the manner we are seeing (including 'select
1')?  Would the insert be in the 'update' state while clearing the query
cache?

I can update my script to also pull back 'show status' from the server
tonight and see what the query cache is doing at the same time.  If it
does turn out to be a query cache lock, what course of action can we
take?  Periodic 'reset query cache' calls throughout the night?  Is a
query cache of 512M beyond the recommended size?  I wouldn't have
expected it to take 25 seconds to clear out a single table's entries
from the query cache; is this possible?
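
Roughly what the script would pull, plus the two cache-clearing
statements in question (a sketch; exact statement availability on this
4.0 build is an assumption):

SHOW VARIABLES LIKE 'query_cache%';  -- configured size / limit / type

SHOW STATUS;                         -- watch the Qcache_* counters over time
                                     -- (hits, inserts, free_memory, free_blocks)

-- RESET QUERY CACHE empties the cache completely, while FLUSH QUERY CACHE
-- only defragments it without removing entries.
RESET QUERY CACHE;
FLUSH QUERY CACHE;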

Thanks,
-Joe

Mysql 4.0.20(binary) Relay Log Purging not working

2004-08-16 Thread Joe Kislo
We just upgraded to MySQL 4.0.20 (binary), and we use replication to a
slave running the same.  We're having some difficulty deleting the relay
log files the slave creates.  I'm fairly confused, as the manual clearly
says we shouldn't be having this problem:

According to the MySQL manual:

Relay logs have the same format as binary logs, so you can use
mysqlbinlog to read them. A relay log is automatically deleted by the
SQL thread as soon as it has executed all its events and no longer needs
it). There is no explicit mechanism for deleting relay logs, because the
SQL thread takes care of doing so. However, from MySQL 4.0.14, FLUSH
LOGS rotates relay logs, which will influence when the SQL thread
deletes them. 


So, I've run Flush Logs on the slave, and it's created the next log file
in the sequence.  But the old ones still stay around indefinitely.  Even
after issuing stop slave/start slave.  I cannot find a command similar
to PURGE MASTER LOGS for the slave to purge logs on the slave.  Show
slave status shows the slave Relay_log_space increasing steadily even
after rotation.  The Exec_master_log_pos is in sync with the master,
meaning the old relay files are no longer needed.

There is a feature in 4.1.1:

--relay-log-purge={0|1}
Disables or enables automatic purging of relay logs as soon as they are
not needed any more. The default value is 1 (enabled). This is a global
variable that can be changed dynamically with SET GLOBAL
relay_log_purge. This option is available as of MySQL 4.1.1.

Does that mean that log purging doesn't exist until 4.1.1, or that the
'option to turn it off' isn't available until 4.1.1?  If log purging
isn't available until the next major mysql release, how can I delete
these log files now?  Even if I delete them off disk, they are still
referenced in the 'relay-bin.index' file.  Presumably I'm not allowed to
update that file on a live slave...  
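
One approach that might work is sketched below, on the assumption that
CHANGE MASTER TO discards the existing relay logs and starts a fresh
one; the log file name and position are placeholders to be read from
SHOW SLAVE STATUS, and this should obviously be tried on a disposable
slave first.

STOP SLAVE;
SHOW SLAVE STATUS;   -- note Relay_Master_Log_File and Exec_master_log_pos

-- Re-point the slave at the exact position it has already executed;
-- 'mysql-bin.001' and 4 below are placeholders for those two values.
CHANGE MASTER TO
    MASTER_LOG_FILE = 'mysql-bin.001',
    MASTER_LOG_POS  = 4;

START SLAVE;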

Ideas?

-Joe

Bug #1858: Repeatable on 3.23.57

2004-07-07 Thread Joe Kislo
A week ago I posted that we were having horrible problems with MySQL's
replication failing and reconnecting (after years of reliable
operation).  I still haven't come up with a solution to the problems; it
just looks like the master keeps tearing down the replication
connection, and the slave keeps reconnecting.  Setting up more slaves,
they exhibit the same exact problems.  As noted in my previous emails,
after about 4-6 errors like this:

040707  2:24:44  Error reading packet from server: Lost connection to
MySQL server during query (server_errno=2013)
040707  2:24:44  Slave: Failed reading log event, reconnecting to retry,
log 'mysql-bin.178' position 13468249
040707  2:24:44  Slave: reconnected to master
'[EMAIL PROTECTED]:3306',replication resumed in log 'mysql-bin.178' at
position 13468249
040707  2:39:57  Error reading packet from server: Lost connection to
MySQL server during query (server_errno=2013)
040707  2:39:57  Slave: Failed reading log event, reconnecting to retry,
log 'mysql-bin.178' position 25712380
040707  2:39:57  Slave: reconnected to master
'[EMAIL PROTECTED]:3306',replication resumed in log 'mysql-bin.178' at
position 25712380

The replication would completely stop with this error:

040702  4:13:51  Slave thread exiting, replication stopped in log
'mysql-bin.167' at position 108374685


Well, it turns out the slave disconnecting and reconnecting does not
appear to be related to the slave stopping (odd since they both became
problems around the same time).  It looks like at 4am, our optimize
database script fires off on the SLAVE, and it looks like the slave
thread dies a little while after.  I have turned off nightly
optimization, and for the past 5 days, we have not seen the slave stop. 
So I would believe that we are running into bug:
http://bugs.mysql.com/bug.php?id=1858

on the 3.23.57/3.23.58 line of code.  The bug was already fixed on the
4.0.x line of code, but presumably a patch needs to be made to the
3.23.x line of code as well.

As for the rest of my replication problems... who knows :)

Thanks,
-Joe

Replication problem on 3.23.57 / 3.23.58

2004-07-01 Thread Joe Kislo

We have suddenly run into a replication problem between our master and
our slave databases.  It had been running without a problem for at least
several months since our last upgrade, and for several years before
that.  I've tried a number of debugging techniques, and now I'm hoping
somebody here can make sense of it.  On the slave I get error messages
like:

040630  2:43:52  Slave: reconnected to master
'[EMAIL PROTECTED]:3306',replication resumed in log 'mysql-bin.163' at
position 37919441

It does that several times between 2:20am and 4:30am, and every few
nights the slave gives up:

040630  4:15:34  Slave thread exiting, replication stopped in log
'mysql-bin.163' at position 99512496

No logs are recorded on the master.  This 2am to 5am time coincides with
a lot of mysql update/insert traffic on the master, in parallel across a
lot of different connections.  Master and slave are running Linux.  The
master is running the 3.23.57 binary and the slave is running the
3.23.58 binary.  I checked the changelog on mysql's site, and I don't
see why the version difference would be a problem (but I am open to
upgrading the master!).  Both master and slave have max packet set at
16M.  So my initial thought was hardware (since everything was great
before), and I feel like I've almost eliminated it by doing the following:

Changed ethernet boards on both master and slave (to a totally different
brand of hardware, even)
Changed cables
Used different switch ports
Turned off auto negotiate on the switch port

Then I:
swapped the slave hardware for completely different hardware (removed
raid drives, stuffed into similar server)

Upgraded the kernels on both machines to 2.4.26

I also set up a ping overnight to both the master and slave servers
looking for packet loss; there was none.  I wrote a TCP-based ping
between both of those servers and our administrative server.  It kept
passing data back and forth and recorded the total time.  No slowdown or
network errors during the time periods MySQL has communications problems.

So then I set up a second slave.  This box runs Debian (the other slave
ran Red Hat).  This machine is running the same exact MySQL binary that
was running on the other slave.  This slave ran fine for 3 days (which
led me to make some false assumptions that it was working fine).  But
now its slave thread breaks too.  Interestingly, both slaves will break
at *approximately* the same time (about a minute apart; both machines
are ntp-synced to the second, so it's not *EXACTLY* the same time), and
both will be pretty *close* in the binlog, but not at the exact same spot.

I have looked in the binlog at the various queries that it's crapped
out on, and they're nothing interesting.  I've looked at about 10 of
them; some are very simple updates, some are long complex inserts.
A few nights I have SLAVE STOP;ed the slave, and waited until the
morning to run the slave log forward.  When I run the slave log in the
morning, I get the same exact slave disconnects.  This is the first time
I've seen the slave disconnect outside the 2:30-4am window.  So clearly
this seems to be a mysql problem... Something in the binlog that's
causing it to disconnect, but why do the two slaves not disconnect in
the same *EXACT* place?

So, before putting the axe through the mysql server, I busted out my
packet sniffer.  I sniffed the packets from both sides of the connection
until it failed.  From my reading, it looks like the master is tearing
down the connection.  It basically looks like this:

Master sends lots of data
Master sends a PSH ACK packet
Master sends lots of data
Master sends a PSH ACK packet
Master sends lots of data
Master sends a PSH ACK FIN packet
Slave sends a ACK FIN packet

(with slave ACKs in there too)

So, it looks like the master is tearing down the connection.  Anybody
have any thoughts?  I can upgrade the master to 3.23.58, but I don't see
anything in the mysql change log that implies that will help.  (Bringing
down the master server requires much dancing around and appeasing the
customers due to the outage).  My other thought is going to mysql
4.0.x.  But again, I don't generally like doing things just because they
might help.  We have had a plan to upgrade to 4.0.x for some time (we
certified our software on it), but we don't have an urgent need to
budget the resources required to do it.

Is it possible there is some sort of race in the MySQL binlog writing?
One of the reasons why this might have only started cropping up now is
the 2:30-4am slot has been getting progressively busier and busier, with
a huge number of parallel insert/updates (4cpu box).  The rest of the
day the traffic isn't even close to that time period, and replication
works rockstar.

Settings on the master:

set-variable= key_buffer=1024M
set-variable= tmp_table_size=1024M
set-variable= max_allowed_packet=16M
set-variable= thread_stack=128K
set-variable= max_connections=2000
set-variable= max_connect_errors=9
set-variable

Re: Replication problem on 3.23.57 / 3.23.58

2004-07-01 Thread Joe Kislo
> somebody here can make sense of it.  On the slave I get error messages
> like:
>
> 040630  2:43:52  Slave: reconnected to master
> '[EMAIL PROTECTED]:3306',replication resumed in log 'mysql-bin.163' at
> position 37919441
>
> It does that several times between 2:20am and 4:30am, and every few

Ah.. right.. I'm not-so-smart.  I didn't actually post the entire error
log message:

040701 14:56:05  Error reading packet from server: Lost connection to
MySQL server during query (server_errno=2013)
040701 14:56:05  Slave: Failed reading log event, reconnecting to retry,
log 'mysql-bin.165' position 14792980
040701 14:56:05  Slave: reconnected to master
'[EMAIL PROTECTED]:3306',replication resumed in log 'mysql-bin.165' at
position 14792980
040701 15:02:13  Error reading packet from server: Lost connection to
MySQL server during query (server_errno=2013)
040701 15:02:13  Slave: Failed reading log event, reconnecting to retry,
log 'mysql-bin.165' position 25852745
040701 15:02:13  Slave: reconnected to master
'[EMAIL PROTECTED]:3306',replication resumed in log 'mysql-bin.165' at
position 25852745
040701 15:09:14  Error reading packet from server: Lost connection to
MySQL server during query (server_errno=2013)
040701 15:09:14  Slave: Failed reading log event, reconnecting to retry,
log 'mysql-bin.165' position 37090598
040701 15:09:14  Slave: reconnected to master
'[EMAIL PROTECTED]:3306',replication resumed in log 'mysql-bin.165' at
position 37090598
040701 15:15:28  Error reading packet from server: Lost connection to
MySQL server during query (server_errno=2013)
040701 15:15:28  Slave: Failed reading log event, reconnecting to retry,
log 'mysql-bin.165' position 47700336
040701 15:15:28  Slave: reconnected to master
'[EMAIL PROTECTED]:3306',replication resumed in log 'mysql-bin.165' at
position 47700336

are the messages that come out of the slave.  I'm currently running a
test where I back up the entire slave, run it forward through the logs,
and restore the slave back exactly to its previous point, and run it
through the logs again.  See if it dies in the same places or if it's
somewhat random.

-Joe

MySQL 3.23.43 Binary Distribution not stable on Domain Socket with >1000 threads

2001-10-30 Thread Joe Kislo


We've just switched from compiling our own MySQL to using your binary
distribution.  This is the first time we've run MySQL in production from
one of your binary builds.  We had to switch to a binary distribution
because we were running into the 1000 threads issue with the GLIBC
library we were linking against.

We've now had the server running for about a week and we've just run
into a little problem which seems like it's a bug in the mysql and
mysqladmin client, or a problem with the mysqld server.

We currently have ~1117 clients connected to our production database;
all of these connections are TCP/IP based, and none of them are on the
local domain socket on the db server.

Our DB server was taking a pounding, so I logged into the db server,
and connected as root and tried to list the connected processes:

---
db-01:~> mysql -u root -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 49923 to server version: 3.23.43-log

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> show processlist;
ERROR 2013: Lost connection to MySQL server during query
---

Ooops!  The thread died!  Tried again, doesn't work, tried mysqladmin -u
root -p processlist, same deal!  Crashed thread!  Nothing appears in the
mysql error log..  I tried running some queries on one of our databases
from the db server using the mysql client.  If I do a select * on a
table with only a few rows (seems like less then 100), the query goes
through fine.  If I do it on a table with more then like 100 rows, the
thread crashes:


mysql> select * from Course;
[...]
74 rows in set (0.00 sec)

mysql> select * from Section;
ERROR 2013: Lost connection to MySQL server during query
mysql> select count(*) from Section;
ERROR 2006: MySQL server has gone away
No connection. Trying to reconnect...
Connection id:49927
Current database: TT_fcgi

+----------+
| count(*) |
+----------+
|      318 |
+----------+
1 row in set (0.11 sec)

mysql>





Is there something horrifically wrong with mysql when it goes over 1000
connections or something?  I swear I was able to do show processlist
from the db server yesterday (but we had fewer than 1000 connections
yesterday).  I *am* able to get a process list if I connect through TCP/IP:

db-01:~> mysql -u joe -h 127.0.0.1 -p

So it looks like it just affects the domain socket connections, and
TCP/IP is okay.  Fortunately all of our apps use TCP/IP, so it's not a
big deal to us.  But I bet somebody out there -does- care!  :)  The
thread crashes seem to have no effect at all on any of our TCP/IP
connections running queries at the same time.

Thanks,
-Joe

InnoDB Autoincrement deadlock

2001-05-14 Thread Joe Kislo


I'm having some problems with tables using autoincrementing primary
keys when I use InnoDB.  I've searched through the documentation at
MySQL's and Innobase's websites, and haven't been able to find anything
saying this is a limitation of InnoDB, so I will assume this is a bug
(or unintentional feature).

Basically, if you start two transactions and insert a record in the
first (letting the autoincrementing key be selected automatically, and
not committing), it will work.  If the second transaction does the same,
it blocks.

Inserting another record from the first transaction returns a table
handler error 100.  This makes both transactions kinda useless for
working with this table, since one is frozen and the other returns an
error.

I'm sure what's happening is that the second transaction is being blocked
until the first commits, so mysql knows what autoincrementing number to
issue for the second insert.  The problem is, this makes no sense.  This
would mean effectively only 1 transaction could be in use at a time if
you are going to be inserting into a table with an autoincrementing key.
And since the existence of a second transaction causes the first to fail
too (e.g. table handler error 100), it would be entirely unsafe to allow
more than 1 transaction on the table.

I really couldn't care less whether my autoincrementing keys are handed
out sequentially, or whether there are gaps due to rolled-back
transactions.  (If I cared, I would put a timestamp on the record.)  Is
there some way to get this baby going?  Maybe an "I don't care about
sequential auto_increment keys" variable?

Here's some screen dumps:


mysql> create table vroomm (i int auto_increment, j int, primary key(i))
    -> type=innodb;
Query OK, 0 rows affected (0.03 sec)

mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into vroomm (j) values(2);
Query OK, 1 row affected (0.00 sec)

-- (Second DB transaction tries to insert into vroomm)

mysql> insert into vroomm (j) values(4);
ERROR 1030: Got error 100 from table handler

I ran into this problem when I tried converting one of our existing
applications to run under innodb.  About 3 minutes after I started it,
it deadlocked.  This was the cause.  Two transactions were trying to
create new DB records w/ Autoincrementing keys.  The entire application
froze seconds later when all the other threads tried to insert new
DB records too.  Ooops.
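
For anyone else hitting this, one classic workaround is to hand out the
keys from a separate one-row MyISAM counter table via
LAST_INSERT_ID(expr), so the InnoDB auto_increment lock never comes into
play; a rolled-back transaction just burns a number.  A rough sketch
(names are illustrative):

CREATE TABLE vroomm_seq (id INT NOT NULL) TYPE=MyISAM;
INSERT INTO vroomm_seq VALUES (0);

-- Inside each transaction: the UPDATE takes only a brief MyISAM write
-- lock (not held for the rest of the transaction), and LAST_INSERT_ID()
-- then returns this connection's value without blocking other sessions.
UPDATE vroomm_seq SET id = LAST_INSERT_ID(id + 1);
INSERT INTO vroomm (i, j) VALUES (LAST_INSERT_ID(), 2);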

Lemme know what you guys think..  If this is a bug, a feature, or how I
can get it working,

Thanks,
-Joe

Re: Why Heap Tables don't support auto increment columns

2001-03-01 Thread Joe Kislo

> I'm currently working on a system for managing http sessions and would
> like to use a heap table for storing the session information.  Session
> information is only of temporary interest, but I still need a unique
> ID for reference with some other tables storing data for the session.
> All these tables are heap tables.  So there is no problem if the ID
> generated by the auto_increment column is not unique when the database
> server is restarted.

There are a couple of problems with that to begin with.  First, you're
storing state in a HEAP table.  If the mysql server gets reset, all the
sessions are lost.  It's hard to believe you want to write your system
so your database server can never be restarted, especially when writing
the data to a MyISAM table would probably cost only a trivial amount of
speed.  Also, if you used an auto_incrementing column, you would be
handing out sessionIDs sequentially.  It would be trivial for somebody
to usurp somebody else's sessionID by simply subtracting or adding 1 to
their own.
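
A minimal sketch of both points (the table layout and the exact hash
recipe are illustrative, not prescriptive): keep the session data in
MyISAM so it survives a restart, and generate the session ID as a
hard-to-guess string rather than a sequential number.

CREATE TABLE Session (
    sessionID CHAR(32) NOT NULL,
    data      BLOB,
    PRIMARY KEY (sessionID)
) TYPE=MyISAM;

-- MD5 over a few per-request entropy sources gives a 32-character ID
-- that is not guessable by simply adding or subtracting 1.
INSERT INTO Session (sessionID, data)
VALUES (MD5(CONCAT(RAND(), NOW(), CONNECTION_ID())), 'session payload here');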

-Joe