Re: Re[2]: Serious MySQL internal deadlock

Andreas Steinmetz Wed, 07 Feb 2001 06:47:20 -0800
Hi,
first of all: the deadlock happened again today, this time with no slave
running so it isn't a replication issue. It seems we're getting closer as when
I did run 'show processlist', the pending query was (excerpt from output):

1666    logreader       10.1.1.4        syslog  Query   114     Sending data   
select max(swap_used+swap_free) from pcp where host='apollo' and
time>='20010206140000' and time<'20

This is the same type of query as the last time the deadlock occured. Actually
looking with netstat there was no more connection between the webserver on
10.1.1.4 and the database. Then I tried to kill the thread which resulted in
(excerpt from 'show processlist' output):

1666    logreader       10.1.1.4        syslog  Killed  850     Sending data   
select max(swap_used+swap_free) from pcp where host='apollo' and
time>='20010206140000' and time<'20

The slow query log didn't show the query. After killing mysql (with SIGTERM)
the slow query log contained:

# Time: 010207 14:38:22
# User@Host: logreader[logreader] @  [10.1.1.4]
# Time: 891  Lock_time: 0  Rows_sent: 1
use syslog;
select max(swap_used+swap_free) from pcp where host='apollo' and
time>='20010206140000' and time<'20010207141000';

Running the 'explain' statement on the query shows something very interesting:

mysql> explain select max(swap_used+swap_free) from pcp where host='apollo' and
time>='20010206140000' and time<'20010207141000';
+-------+------+---------------+------+---------+-------+------+------------+
| table | type | possible_keys | key  | key_len | ref   | rows | Extra      |
+-------+------+---------------+------+---------+-------+------+------------+
| pcp   | ref  | time,host     | host |       8 | const | 8318 | where used |
+-------+------+---------------+------+---------+-------+------+------------+
1 row in set (0.00 sec)
 
mysql> explain select max(swap_used+swap_free) from pcp where host='apollo' and
time>='20010206140000' and time<'20010207141000';
+-------+------+---------------+------+---------+-------+------+------------+
| table | type | possible_keys | key  | key_len | ref   | rows | Extra      |
+-------+------+---------------+------+---------+-------+------+------------+
| pcp   | ref  | time,host     | host |       8 | const | 5549 | where used |
+-------+------+---------------+------+---------+-------+------+------------+
1 row in set (0.00 sec)
 
mysql>

Both commands were run within about a minute. The number of rows, however, is
vastly different, especially the amount of rows of the first query is vastly 
larger than the amount of rows of the second query, but there is no delete on
this table, only inserts and queries. I now did count the total rows in the
table for host apollo:

mysql> select count(time) from pcp where host='apollo';
+-------------+
| count(time) |
+-------------+
|        6684 |
+-------------+
1 row in set (0.08 sec)
 
mysql>

Then I counted the rows for the time range:

mysql> select count(time) from pcp where host='apollo' and
time>='20010206140000' and time<'20010207141000';
+-------------+
| count(time) |
+-------------+
|        1427 |
+-------------+
1 row in set (0.09 sec)
 
mysql>

The latter value is somewhat OK, given the fact that there's one insert/minute
for every host (1440 inserts a day) and that there was a database deadlock
during the given time span so some inserts were lost while I didn't restart
mysql during the search for the reason of the deadlock.

As an additional information 'max(swap_used+swap_free)' currently results in
the same value (1GByte) for all rows of host apollo,during the given time range
all rows do contain swap_used=0 and swap_free=1GByte.

The structure of the table in the query in question runs against is as follows:

CREATE TABLE pcp (
   time timestamp(14),
   host varchar(8) NOT NULL,
   cpu_user smallint(5) unsigned DEFAULT '0' NOT NULL,
   cpu_nice smallint(5) unsigned DEFAULT '0' NOT NULL,
   cpu_sys smallint(5) unsigned DEFAULT '0' NOT NULL,
   cpu_idle smallint(5) unsigned DEFAULT '0' NOT NULL,
   mem_used int(10) unsigned DEFAULT '0' NOT NULL,
   mem_free int(10) unsigned DEFAULT '0' NOT NULL,
   mem_shared int(10) unsigned DEFAULT '0' NOT NULL,
   mem_cached int(10) unsigned DEFAULT '0' NOT NULL,
   mem_bufmem int(10) unsigned DEFAULT '0' NOT NULL,
   swap_free int(10) unsigned DEFAULT '0' NOT NULL,
   swap_used int(10) unsigned DEFAULT '0' NOT NULL,
   nprocs smallint(5) unsigned DEFAULT '0' NOT NULL,
   in_lo int(10) unsigned DEFAULT '0' NOT NULL,
   in_bond0 int(10) unsigned DEFAULT '0' NOT NULL,
   in_eth0 int(10) unsigned DEFAULT '0' NOT NULL,
   in_eth1 int(10) unsigned DEFAULT '0' NOT NULL,
   in_eth2 int(10) unsigned DEFAULT '0' NOT NULL,
   in_eth3 int(10) unsigned DEFAULT '0' NOT NULL,
   in_eth4 int(10) unsigned DEFAULT '0' NOT NULL,
   in_eth5 int(10) unsigned DEFAULT '0' NOT NULL,
   in_eth6 int(10) unsigned DEFAULT '0' NOT NULL,
   in_eth7 int(10) unsigned DEFAULT '0' NOT NULL,
   out_lo int(10) unsigned DEFAULT '0' NOT NULL,
   out_bond0 int(10) unsigned DEFAULT '0' NOT NULL,
   out_eth0 int(10) unsigned DEFAULT '0' NOT NULL,
   out_eth1 int(10) unsigned DEFAULT '0' NOT NULL,
   out_eth2 int(10) unsigned DEFAULT '0' NOT NULL,
   out_eth3 int(10) unsigned DEFAULT '0' NOT NULL,
   out_eth4 int(10) unsigned DEFAULT '0' NOT NULL,
   out_eth5 int(10) unsigned DEFAULT '0' NOT NULL,
   out_eth6 int(10) unsigned DEFAULT '0' NOT NULL,
   out_eth7 int(10) unsigned DEFAULT '0' NOT NULL,
   KEY time (time),
   KEY host (host)
);

As a sidenote: I can't use 'between' as this is "min<=val<=max" whereas I need
"min<=val<max". The formatting of the timestamp is basically OK as the stamps
are code generated and not really intended to be too human-readable. I was even
thinking of using Unix secs since the epoch but than this isn't portable and
doesn't help when looking for problems.

On 07-Feb-2001 Sinisa Milivojevic wrote:
> Andreas Steinmetz writes:
>  > Hi,
>  > the inserts did not change the state after the select finished. The
> network is
>  > running full duplex but the only problem I ever encountered with the cards
> was
>  > initalization on fast systems. I fixed this and posted it to the linux
> kernel
>  > mailing list (search for epic100).
>  > If the problem continues to appear with the slaves stopped I'll try to
> code a
>  > reproducable test case.
>  > The only thing that's visible is the following entry in the slow query log
>  > which matches the process list attached to my earlier mail. The funny
> thing is
>  > that this query will result in exactly one row and according to the log
> the
>  > this row was sent. The query time, however, of 277 seconds for a somewhat
>  > simple query makes me wonder...
>  > 
>  > # Time: 010205 23:13:00
>  > # User@Host: logreader[logreader] @  [10.1.1.4]
>  > # Time: 277  Lock_time: 0  Rows_sent: 1
>  > use syslog;
>  > select max(swap_used+swap_free) from pcp where host='castor' and
>  > time>='20010204230000' and time<'20010205231000';
>  > 
> 
> 
> HI!
> 
> You could send me the output from the EXPLAIN statement for the above
> query and you could use BETWEEN for time column. If time is of
> datetime (timestamp) type, better formatting might help too a bit.
> 
> 
> Regards,
> 
> Sinisa
> 
>       ____  __     _____   _____  ___     ==  MySQL AB
>      /*/\*\/\*\   /*/ \*\ /*/ \*\ |*|     Sinisa Milivojevic
>     /*/ /*/ /*/   \*\_   |*|   |*||*|     mailto:[EMAIL PROTECTED]
>    /*/ /*/ /*/\*\/*/  \*\|*|   |*||*|     Larnaka, Cyprus
>   /*/     /*/  /*/\*\_/*/ \*\_/*/ |*|____
>   ^^^^^^^^^^^^/*/^^^^^^^^^^^\*\^^^^^^^^^^^
>              /*/             \*\                Developers Team
> 

Andreas Steinmetz
D.O.M. Datenverarbeitung GmbH

---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: Re[2]: Serious MySQL internal deadlock

Reply via email to