I've been experiencing a simmilar problem only with MyISAM tables.
This is under 3.23.35 and 3.23.36 on Sparc/Solaris 2.7.
For me the problem occurs when doing an ALTER TABLE or a simmilar
operation (eg OPTIMIZE) on relatively large tables (>4 millon rows)
with one or two indeces.
The table in question is occasionally accessed by other queries;
these access could occur during the ALTER TABLE. In this case,
the application running the other queries would not mind seeing the
old structure (for read) or wating until the ALTER is complete
(for write).
The symptom is the same as described below; with ALL threads
(not just those touching the ALTERed table) getting stuck in
"opening tables" or "closing tables".
To recover it is necessary to terminate mysqld a SIGKILL;
though data loss can usually be avoided by doing a
FLUSH TABLES first (which also hangs) and also sending
a SIGTERM and waiting a minute before sending the KILL.
Would upgrading to 3.23.39 fix this problem? Or is it
still a live bug in the current version?
Heikki Tuuri wrote:
> Hi!
>
> This may be a bug in the MySQL part of the code.
> It looks like in sql_base.cc the pthread mutex LOCK_open
> has been locked but not released by some thread.
> Is it so that the ALTER TABLE went through without
> hanging?
>
> It is crucial to know if all interesting threads are stuck waiting
> for the LOCK_open mutex, or if some thread is stuck inside InnoDB
> code. Did you look carefully through all the threads printing bt?
>
> In the processlist one thread is 'copying to tmp table'. What does bt say
> for that thread?
>
> Anyway, to find out who failed to release the mutex we should
> compile in debug code which would tell who locked LOCK_open,
> and in which query. You could add a global char* variable in sql_base.cc
> and store about on line 781 table_name to it when it reserves LOCK_open.
> Now, if mysqld hangs, you can with gdb print the value of the global
> variable
> and you see what table was accessed when the pthread mutex was
> locked. I think thd (user session object) does not contain the query
> text as a string. It would be nice if it contained it and one could store
> the
> query text to the global variable.
>
> Monty, can you advise Peter about this?
>
> By the way, why do you need to ALTER a table which is simultaneously
> used in queries? Why do you use LOCK TABLES when you dump a table?
> In InnoDB you can just SELECT INTO OUTFILE, since InnoDB does a
> consistent read by default. If you do several SELECT INTO OUTFILEs in the
> same transaction you can dump several tables seen at the same snapshot
> of the database.
>
> Regards,
> Heikki
>
> -----Original Message-----
> From: Peter Zaitsev <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
> Cc: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
> Date: Monday, July 16, 2001 6:08 PM
> Subject: I got it hanged !
>
> >Hello Heikki,
> >
> > mysql 3.23.39 Linux Innodb.
> >
> > I've got INNODB to hang quite simple way - I just run an alter table
> > on a table which was used in many queries.
> > I run "alter table table_groups add version int unsigned not null"
> >
> > I got mysql to be completely stock with the following result:
> >
> > 1) PROCESSLIST:
> >
> > As you see all queries got stuck it "opening tables" or "closing
> > tables"
> >
> >+---------+----------+----------------+-------------+---------+-------+----
> ------------------+---------------------------------------------------------
> ---------------------------------------------+
> >| Id | User | Host | db | Command | Time |
> State | Info
> |
> >+---------+----------+----------------+-------------+---------+-------+----
> ------------------+---------------------------------------------------------
> ---------------------------------------------+
> >| 1 | root | localhost | counter | Query | 295 |
> Opening tables | lock tables heap.gstats write
> |
> >| 7 | root | localhost | | Sleep | 5747 |
> |
> |
> >| 14 | titan | topaz.local | counter | Sleep | 469 |
> |
> |
> >| 24 | titan | php.local | news | Sleep | 887 |
> |
> |
> >| 290 | titan | saturn.local | counter | Query | 296 |
> Opening table | UPDATE quickstat SET visitors=16421, hosts=16983,
> hits=81173, reloads=3713, sessions=18826, jumps=63 |
> >| 336 | titan | opal.local | counter | Sleep | 3387 |
> |
> |
> >| 506 | titan | coral.local | counter | Sleep | 407 |
> |
> |
> >| 534 | titan | suse.local | mlists | Sleep | 1110 |
> |
> |
> >| 544 | titan | ruby.local | counter | Query | 302 |
> Copying to tmp table | select
> > cp.counter_id
> >from
> > counterproperty cp,table_groups tg, counterlayers cl, quickstat |
> >| 565 | titan | php.local | counter | Sleep | 1715 |
> |
> |
> >| 628 | titan | cesar.local | counter | Query | 11 |
> Opening tables | select
> > cp.counter_id
> >from
> > counterproperty cp,table_groups tg, counterlayers cl, quickstat |
> >| 731 | titan | php.local | counter | Sleep | 366 |
> |
> |
> >| 734 | titan | php.local | counter | Sleep | 931 |
> |
> |
> >| 739 | titan | rambo.local | counter | Sleep | 14821 |
> |
> |
> >| 763 | titan | agate.local | counter | Sleep | 14923 |
> |
> |
> >| 765 | titan | testserv.local | lacontrol | Sleep | 4504 |
> |
> |
> >| 780 | titan | mars.local | counter | Sleep | 977 |
> |
> |
> >| 781 | titan | saturn.local | counter | Sleep | 976 |
> |
> |
> >| 782 | titan | mars.local | counter | Sleep | 726 |
> |
> |
> >| 783 | titan | saturn.local | counter | Sleep | 581 |
> |
> |
> >| 865 | titan | php.local | counter | Sleep | 2260 |
> |
> |
> >| 1022 | titan | tor.local | counter | Sleep | 1399 |
> |
> |
> >| 1097 | titan | ruby.local | counter | Sleep | 652 |
> |
> |
> >| 1265 | titan | php.local | counter | Sleep | 1862 |
> |
> |
> >| 1775 | titan | php.local | counter | Sleep | 2257 |
> |
> |
> >| 2218 | titan | rat.local | counter | Query | 14 |
> Opening tables | select
> > cp.counter_id
> >from
> > counterproperty cp,table_groups tg, counterlayers cl, quickstat |
> >| 2240 | titan | elf.local | counter | Sleep | 535 |
> |
> |
> >| 2516 | titan | pan.local | counter | Query | 49 |
> Opening tables | select
> > cp.counter_id
> >from
> > counterproperty cp,table_groups tg, counterlayers cl, quickstat |
> >| 2586 | titan | tor.local | counter | Sleep | 626 |
> |
> |
> >| 2882 | titan | sol.local | counter | Sleep | 2614 |
> |
> |
> >| 2914 | titan | berill.local | counter | Sleep | 529 |
> |
> |
> >| 3682 | titan | php.local | news | Sleep | 4314 |
> |
> |
> >| 6591 | titan | testserv.local | lacontrol | Sleep | 25285 |
> |
> |
> >| 8278 | titan | php.local | news | Sleep | 3417 |
> |
> |
> >| 8363 | titan | tor.local | counter | Sleep | 25643 |
> |
> |
> >| 8502 | titan | viking.local | counter | Sleep | 490 |
> |
> |
> >| 10311 | titan | foton.local | timezone | Sleep | 25323 |
> |
> |
> >| 10376 | titan | foton.local | counter | Sleep | 977 |
> |
> |
> >| 12174 | titan | foton.local | counter | Query | 95 |
> Opening tables | SELECT lang_id FROM http_lang WHERE lang='sp'
> |
> >| 12175 | titan | foton.local | counter | Query | 296 |
> Opening table | UPDATE quickstat SET visitors=4983, hosts=4198,
> hits=8695, reloads=337, sessions=6366, jumps=55, com |
> >| 15726 | titan | rat.local | counter | Sleep | 24081 |
> |
> |
> >| 16813 | titan | agate.local | counter | Sleep | 23878 |
> |
> |
> >| 18342 | titan | foton.local | timezone | Sleep | 23599 |
> |
> |
> >| 18358 | titan | foton.local | counter | Sleep | 460 |
> |
> |
> >| 18654 | titan | foton.local | counter | Sleep | 1058 |
> |
> |
> >| 18657 | titan | foton.local | counter | Query | 293 |
> Opening table | UPDATE quickstat SET visitors=209, hosts=205,
> hits=435, reloads=18, sessions=214, jumps=28, commings |
> >| 19054 | titan | mars.local | counter | Sleep | 8271 |
> |
> |
> >| 19073 | titan | saturn.local | counter | Sleep | 1220 |
> |
> |
> >| 19076 | titan | saturn.local | counter | Sleep | 4806 |
> |
> |
> >| 19125 | titan | satir.local | counter | Sleep | 7821 |
> |
> |
> >| 19204 | titan | mars.local | counter | Query | 296 |
> closing tables | UPDATE quickstat SET visitors=12487, hosts=12604,
> hits=26984, reloads=809, sessions=13474, jumps=418 |
> >| 19205 | titan | mars.local | counter | Query | 287 |
> Opening table | UPDATE quickstat SET visitors=58946, hosts=56448,
> hits=109532, reloads=116954, sessions=66404, jumps |
> >| 19350 | titan | mars.local | counter | Sleep | 5836 |
> |
> |
> >| 19356 | titan | saturn.local | counter | Query | 296 |
> Opening table | UPDATE quickstat SET visitors=3454, hosts=3215,
> hits=6826, reloads=1431, sessions=3861, jumps=0, com |
> >| 19360
> >
> >
> >2) VMSTAT
> >
> > 0 0 0 12 4676 7260 404836 0 0 0 0 113 13 2 8
> 91
> > 0 0 0 12 4676 7260 404836 0 0 0 0 115 14 1 9
> 91
> > 0 0 0 12 4676 7260 404836 0 0 0 0 121 21 1 8
> 91
> > 0 0 0 12 4672 7260 404836 0 0 0 0 118 13 1 8
> 91
> > 0 0 0 12 4672 7260 404836 0 0 0 0 116 14 2 8
> 91
> > 0 0 0 12 4672 7260 404836 0 0 0 0 113 12 1 9
> 91
> >
> >
> > Note NO disk I/O at all and just quite a few CPU usage I beleve
> > because of connections in "Sleep" state.
> >
> > 3) GBD. I've connected with GDB to the process and got some traces
> > from different threads:
> >
> > Most of them was in following states:
> >
> >
> >/spylog/21860: No such file or directory.
> >Attaching to program: /usr/local/mysql/libexec/mysqld, Thread 21860
> >0x8240c19 in sigsuspend ()
> >(gdb) bt
> >#0 0x8240c19 in sigsuspend ()
> >#1 0x8229f80 in __pthread_wait_for_restart_signal () at raid.cc:806
> >#2 0x822b67c in __pthread_lock () at raid.cc:806
> >#3 0x8228a5e in pthread_mutex_lock () at raid.cc:806
> >#4 0x808343f in open_table (thd=0xa41eae8, db=0xa303bd8 "counter",
> table_name=0xa41f990 "quickstat", alias=0xa41f990 "quickstat",
> > refresh=0x9edff613) at sql_base.cc:781
> >#5 0x8084486 in open_ltable (thd=0xa41eae8, table_list=0xa41f9b8,
> lock_type=TL_WRITE_LOW_PRIORITY) at sql_base.cc:1397
> >#6 0x809a175 in mysql_update (thd=0xa41eae8, table_list=0xa41f9b8,
> fields=@0xa41ed34, values=@0xa41ed58, conds=0x583c7df8,
> > limit=4294967295, handle_duplicates=DUP_ERROR,
> lock_type=TL_WRITE_LOW_PRIORITY) at sql_update.cc:62
> >#7 0x807416e in mysql_execute_command () at sql_parse.cc:1503
> >#8 0x8077024 in mysql_parse (thd=0xa41eae8,
> > inBuf=0xa41f280 "UPDATE quickstat SET visitors=16421, hosts=16983,
> hits=81173, reloads=3713, sessions=18826, jumps=63, commings=77,
> ignoredloads=642, badvisitors=906, badvisitorloads=1314, proxyloads=10927,
> anonimprox"..., length=1807) at sql_parse.cc:2289
> >#9 0x8072304 in do_command (thd=0xa41eae8) at sql_parse.cc:828
> >#10 0x80716f7 in handle_one_connection (arg=0xa41eae8) at sql_parse.cc:551
> >#11 0x8227daa in pthread_start_thread () at raid.cc:806
> >(gdb) quit
> >
> >
> >(gdb) bt
> >#0 0x8240c19 in sigsuspend ()
> >#1 0x8229f80 in __pthread_wait_for_restart_signal () at raid.cc:806
> >#2 0x822b67c in __pthread_lock () at raid.cc:806
> >#3 0x8228a5e in pthread_mutex_lock () at raid.cc:806
> >#4 0x808343f in open_table (thd=0x4fe27830, db=0x9740388 "counter",
> table_name=0xb19ea68 "counterproperty", alias=0xb19ea78 "cp",
> > refresh=0xacdff743) at sql_base.cc:781
> >#5 0x8084311 in open_tables (thd=0x4fe27830, start=0xb19eaa0) at
> sql_base.cc:1340
> >#6 0x80869ec in open_and_lock_tables (thd=0x4fe27830, tables=0xb19eaa0) at
> sql_base.cc:1450
> >#7 0x8073000 in mysql_execute_command () at sql_parse.cc:1126
> >#8 0x8077024 in mysql_parse (thd=0x4fe27830,
> > inBuf=0xb19e7a0 "select \n cp.counter_id \nfrom \n counterproperty
> cp,table_groups tg, counterlayers cl, quickstat qs \nwhere \n
> cp.tblid=tg.tblid and tg.server=7 and cl.counter_id=cp.counter_id and
> qs.counter_id=cp"..., length=600)
> > at sql_parse.cc:2289
> >#9 0x8072304 in do_command (thd=0x4fe27830) at sql_parse.cc:828
> >#10 0x80716f7 in handle_one_connection (arg=0x93d2188) at sql_parse.cc:551
> >#11 0x8227daa in pthread_start_thread () at raid.cc:806
> >(gdb) quit
> >
> >
> >
> >
> >Some was in the following state which is I think is connection
> >handling:
> >
> >Attaching to program: /usr/local/mysql/libexec/mysqld, Thread 21868
> >0x825ef14 in __libc_read ()
> >(gdb) bt
> >#0 0x825ef14 in __libc_read ()
> >#1 0x4 in ?? ()
> >#2 0x80686e2 in vio_read (vio=0x64763788, buf=0x4fbc1d00 "\001", size=4)
> at violite.c:177
> >#3 0x8067da2 in my_real_read (net=0x5a182b7c, complen=0x9dfff880) at
> net_serv.cc:472
> >#4 0x80680ad in my_net_read (net=0x5a182b7c) at net_serv.cc:618
> >#5 0x8071dd4 in do_command (thd=0x5a182b70) at sql_parse.cc:729
> >#6 0x80716f7 in handle_one_connection (arg=0xa5bb3a0) at sql_parse.cc:551
> >#7 0x8227daa in pthread_start_thread () at raid.cc:806
> >(gdb) quit
> >
> >
> >
> >If I'm not mistaken this mostly happends then I need to lock the table
> >which has transactions active - then I need to bakc it up (with
> >mysqldump) or then I do alter table ore something similar.
> >I remember I saw this before allready it was a broadcast missing
> >somethere - may be it's the same case now.
> >
> >Anyway I would be really happy if you would help me to provide any
> >info required to debug this more.
> >
> >I'm now using INNODB in production and this makes my site down for
> >about hour per day :(
> >
> >
> >
> >
> >
> >
> >
> >--
> >Best regards,
> > Peter mailto:[EMAIL PROTECTED]
>
> ---------------------------------------------------------------------
> Before posting, please check:
> http://www.mysql.com/manual.php (the manual)
> http://lists.mysql.com/ (the list archive)
>
> To request this thread, e-mail [EMAIL PROTECTED]
> To unsubscribe, e-mail <[EMAIL PROTECTED]>
---------------------------------------------------------------------
Before posting, please check:
http://www.mysql.com/manual.php (the manual)
http://lists.mysql.com/ (the list archive)
To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php