Re: Kernel crash caused by mysql?

2001-08-04 Thread Mario Witte

On Fri, Aug 03, 2001 at 03:42:28AM -0700, Van wrote:
 Mario Witte wrote:
 Memory or motherboard. How many sticks of RAM do you have in the machine?  Are
 they the same speed (p100/p133, etc.)?  What kind of motherboard? Is updatedb
 running at this time?  (might be a hard-drive croaking while trying to update
 the locate database).
There are 2x256 MB and 2x128 MB sticks in there, all at a speed of 133.
Please don't ask me what kind of motherboard we're running in there, but
that shouldn't be a problem.

Updatedb runs around midnight, but I just found out that
cron.hourly could be the problem: we experienced another crash
tonight at 1:59, and yesterday's crash occurred at 4:59. Always
around the full hour. I've disabled cron.hourly for now,
hoping it will help. It seems mysql wasn't the cause; it was
just the process that got killed and thus appeared in the kernel
trace.
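
A minimal sketch of how the "always around the full hour" observation could be checked against the log, assuming the syslog line format shown in the original oops report. The function names and the 5-minute window are illustrative choices, and the 01:59 sample line is hypothetical (only the 04:58:57 line comes from the posted log).

```python
import re

# Matches a syslog line that announces a kernel oops, e.g.
# "Aug  3 04:58:57 db2 kernel: Oops: 0002", and captures HH, MM, SS.
OOPS_RE = re.compile(
    r"^\s*\w{3}\s+\d+\s+(\d{2}):(\d{2}):(\d{2})\s+\S+\s+kernel:\s+Oops:"
)

def oops_minutes(lines):
    """Return the minute-of-hour for every kernel Oops line found."""
    minutes = []
    for line in lines:
        m = OOPS_RE.match(line)
        if m:
            minutes.append(int(m.group(2)))
    return minutes

def near_full_hour(minute, window=5):
    """True if the minute lies within `window` minutes of a full hour."""
    return minute <= window or minute >= 60 - window

sample = [
    "Aug  3 04:58:57 db2 kernel: Oops: 0002",
    "Aug  3 04:58:57 db2 kernel: CPU:0",
    "Aug  4 01:59:03 db2 kernel: Oops: 0002",  # hypothetical line for the 1:59 crash
]

minutes = oops_minutes(sample)
print(minutes, all(near_full_hour(m) for m in minutes))
```

If every oops clusters just before the hour, an hourly job becomes a reasonable suspect as the trigger, though it may only be adding the load that exposes an underlying hardware fault.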

 Seems your machine might have a wrong hardware component somewhere.  I'd check
 it out if it's a production machine.  
We sure will, but the system is located about 500 kilometers from
our office, so I hope it will stay alive at least over the weekend
:-)


 mysqld can't run as the only service.  You can't run anything without init.
Ok, you won! ;)

Thanks for your fast help,
With regards,
-- 
Mario Witte [EMAIL PROTECTED]

-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: Kernel crash caused by mysql?

2001-08-04 Thread Van

Mario Witte wrote:
 
 On Fri, Aug 03, 2001 at 03:42:28AM -0700, Van wrote:
  Mario Witte wrote:
  Memory or motherboard. How many sticks of RAM do you have in the machine?  Are
  they the same speed (p100/p133, etc.)?  What kind of motherboard? Is updatedb
  running at this time?  (might be a hard-drive croaking while trying to update
  the locate database).
 There are 2x256 MB and 2x128 MB sticks in there, all at a speed of 133.
 Please don't ask me what kind of motherboard we're running in there, but
 that shouldn't be a problem.

A couple of months ago I had 256 MBytes in my Slackware Athlon workstation, with
both chips labeled 100MHz-128MBytes.  Turns out one of the chips was 133MHz and
the other was 100MHz (Fry's Electronics labeling dep't).  Sadly, it took the
bucks I spent on a new PIII true Intel board to qualify the chips; I then put
the 2 133MHz chips into the Athlon and the 2 qualified 100MHz chips into the
PIII Intel board.  Honestly, before that my Athlon was crashing on Slackware and
the Intel was crashing on Advanced Server regularly.  Slackware with the 2.4.1
kernel crashed sometimes twice, sometimes 3 times in a day, and sometimes would
go for a few days.  The Advanced Server couldn't stay up long enough to show a
login without a BSOD, except randomly sometimes.  Switched the chips after
verifying them, and now the Athlon can run 3 weeks at a time:

vanboers@sedona:~$ w
  4:06am  up 22 days, 13:27,  0 users,  load average: 1.10, 1.24, 1.14
USER TTY  FROM  LOGIN@   IDLE   JCPU   PCPU  WHAT

(developer machine, what can I say?  1+ Load Avg due to SetiAtHome, BTW) and
I've seen the Advanced server run close to a month (between virus patches and
IIS security updates).  On average, the developer Athlon machine still beats the
Advanced Server machine on uptime, but the point is that the memory was killing
both of them.  Don't take the label for granted.  I can't tell you how much time
I wasted determining this.  I can tell you I lost over $5k US in billing, though,
because I assumed the label was correct.  I lost the time because I assumed I
was doing something wrong and couldn't bill my client for development during the
month it took me to find out what the problem was: disparate memory on the same
motherboard.   If the labeling had been correct, I would have just swapped the
chips.  Hope that makes sense.

 
 Updatedb runs around midnight, but I just found out that
 cron.hourly could be the problem: we experienced another crash
 tonight at 1:59, and yesterday's crash occurred at 4:59. Always
 around the full hour. I've disabled cron.hourly for now,
 hoping it will help. It seems mysql wasn't the cause; it was
 just the process that got killed and thus appeared in the kernel
 trace.

You're probably onto something here.  Great forensics work! I have experience
with the cron.hourly/cron.daily/etc. processes that fire up when you have the
logrotate packages installed.  It's been a while, but while I was using RedHat
at Intel I convinced them these crons and the logrotate packages should either
be audited thoroughly, or pitched because of the second-guessing they do to the
admin of the machine/network.  

Intel opted to replace logrotate on some of the monitoring servers in the
division in which I was working with msgarch, an implementation I've had
running on all of my production machines for several years.  (Intel applied my
implementation on modified Red Hat and Slackware monitoring servers at that
time.)  I haven't been at Intel for over 4 months, so I have no idea what
they've done with their Red Hat implementations since, or whether that division
still deploys Slackware servers.

If logrotate is the cause, I'll send you msgarch and the cron entries for
msgarch.  Sorry I didn't OSS msgarch before, but I hadn't heard many
complaints about logrotate.   The BSD people use it also, most with a certain
level of satisfaction.  msgarch is my own recipe, but has been implemented by
many of my affiliates for many years.  I just didn't OSS it because most people
have been using logrotate and I thought it redundant to toss msgarch to the
community.  If that assumption was wrong, let me know.  I'll pitch msgarch into
the community.  

 
  Seems your machine might have a wrong hardware component somewhere.  I'd check
  it out if it's a production machine.
 We sure will, but the system is located about 500 kilometers from
 our office, so I hope it will stay alive at least over the weekend
 :-)

This is a problematic situation.  Hardware is SO important in remote
deployments.  I hate to say this, but my most important machine is 2000 miles
away from me, but I tested it locally for 2 months on the hardware I put it on
before I deployed it.  That might be the lesson, here.  The hardware didn't fail
after 2 months testing.  Not comprehensive, but might 

Re: Kernel crash caused by mysql?

2001-08-03 Thread Van

Mario Witte wrote:
 
 Tonight one of our database servers died, though it didn't have any
 special load at that time. The following output could be found in
 /var/log/warn:
 
 Aug  3 04:58:57 db2 kernel: Unable to handle kernel paging request at
 virtual address 6dad7220
 Aug  3 04:58:57 db2 kernel:  printing eip:
 Aug  3 04:58:57 db2 kernel: c013195b
 Aug  3 04:58:57 db2 kernel: *pde = 
 Aug  3 04:58:57 db2 kernel: Oops: 0002
 Aug  3 04:58:57 db2 kernel: CPU:0
 Aug  3 04:58:57 db2 kernel: EIP:0010:[__insert_into_lru_list+59/92]
 Aug  3 04:58:57 db2 kernel: EFLAGS: 00010282
 Aug  3 04:58:57 db2 kernel: eax: 6dad7200   ebx: 0002   ecx:
 c912db00   edx: 0008
 Aug  3 04:58:57 db2 kernel: esi: c912db00   edi: 0001   ebp:
 e4015640   esp: d392bedc
 Aug  3 04:58:57 db2 kernel: ds: 0018   es: 0018   ss: 0018
 Aug  3 04:58:57 db2 kernel: Process mysqld (pid: 26237,
 stackpage=d392b000)
 Aug  3 04:58:57 db2 kernel: Stack: c01324ef c912db00 0002 c912db00
 1000 c0132502 c912db00 c0132ebb
 Aug  3 04:58:57 db2 kernel:c912db00 e4015640 00256800 
 c13d3e0c c912db00 1000 
 Aug  3 04:58:57 db2 kernel:c0133598 e4015640 c13d3e0c 0400
 0800 c13d3e0c 0800 00256400
 Aug  3 04:58:57 db2 kernel: Call Trace: [__refile_buffer+91/100]
 [refile_buffer+10/16] [__block_commit_write+123/208]
 [generic_commit_write+52/148] [geneAug  3 04:58:57 db2 kernel:
 Aug  3 04:58:57 db2 kernel: Code: 89 48 20 8b 82 cc 58 31 c0 89 48 24 ff
 82 dc 58 31 c0 31 c0
 Aug  3 04:58:57 db2 kernel: Unable to handle kernel paging request at
 virtual address 6dad7220
 Aug  3 04:58:57 db2 kernel:  printing eip:
 Aug  3 04:58:57 db2 kernel: c013195b
 Aug  3 04:58:57 db2 kernel: *pde = 
 Aug  3 04:58:57 db2 kernel: Oops: 0002
 Aug  3 04:58:57 db2 kernel: CPU:0
 Aug  3 04:58:57 db2 kernel: EIP:0010:[__insert_into_lru_list+59/92]
 Aug  3 04:58:57 db2 kernel: EFLAGS: 00010282
 Aug  3 04:58:57 db2 kernel: eax: 6dad7200   ebx: 0002   ecx:
 c61df980   edx: 0008
 Aug  3 04:58:57 db2 kernel: esi: c61df980   edi: 0001   ebp:
 eea49be0   esp: ee565ed8
 Aug  3 04:58:57 db2 kernel: ds: 0018   es: 0018   ss: 0018
 Aug  3 04:58:57 db2 kernel: Process mysqld (pid: 451,
 stackpage=ee565000)
 Aug  3 04:58:57 db2 kernel: Stack: c01324ef c61df980 0002 c61df980
 1000 c0132502 c61df980 c0132ebb
 Aug  3 04:58:57 db2 kernel:c61df980 eea49be0 0039fc29 
 c114e428 c61df980 1000 
 Aug  3 04:58:57 db2 kernel:c0133598 eea49be0 c114e428 0b32
 0c29 c114e428 0c29 0039fb32
 Aug  3 04:58:57 db2 kernel: Call Trace: [__refile_buffer+91/100]
 [refile_buffer+10/16] [__block_commit_write+123/208]
 [generic_commit_write+52/148] [geneAug  3 04:58:57 db2 kernel:
 Aug  3 04:58:57 db2 kernel: Code: 89 48 20 8b 82 cc 58 31 c0 89 48 24 ff
 82 dc 58 31 c0 31 c0
 
 The system wasn't completely dead at that time, though. Hours later I was
 able to log into the system via ssh, but most commands (e.g. ps ax or
 killall mysqld) didn't complete but died, killing my shell. As a
 last resort I did a reboot, which caused the kernel to spit out a
 never-ending stack trace. A hard reboot plus a manual fsck solved the
 problem.
 
 Does anybody have an idea whether mysqld caused this crash? In fact there
 are no other services running on the system and it had been running for
 weeks without any problems.
 
 Any help is appreciated,
 With regards,
 --
 Mario Witte [EMAIL PROTECTED]

Mario:

Memory or motherboard. How many sticks of RAM do you have in the machine?  Are
they the same speed (p100/p133, etc.)?  What kind of motherboard? Is updatedb
running at this time?  (might be a hard-drive croaking while trying to update
the locate database).

Seems I'm pointing to hardware, here.
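
For comparing the two oops records quoted above, here is a rough sketch that pulls out the faulting address, the kernel function at EIP and the affected process. It assumes the line-wrapped syslog format exactly as posted; the function name and regular expressions are illustrative, not part of any existing tool.

```python
import re

ADDR_RE = re.compile(r"virtual address ([0-9a-f]+)")
EIP_RE = re.compile(r"EIP:\s*\w+:\[(\w+)\+")
PROC_RE = re.compile(r"Process (\S+) \(pid: (\d+)")

def summarize_oopses(text):
    """Return one dict per oops with address, kernel function, process, pid."""
    records = []
    # Each oops starts with this message; split on it and drop the preamble.
    for chunk in text.split("Unable to handle kernel paging request")[1:]:
        addr = ADDR_RE.search(chunk)
        eip = EIP_RE.search(chunk)
        proc = PROC_RE.search(chunk)
        records.append({
            "address": addr.group(1) if addr else None,
            "function": eip.group(1) if eip else None,
            "process": proc.group(1) if proc else None,
            "pid": int(proc.group(2)) if proc else None,
        })
    return records

# Lines excerpted from the log in the original post (wrapping preserved).
sample = """\
Aug  3 04:58:57 db2 kernel: Unable to handle kernel paging request at
virtual address 6dad7220
Aug  3 04:58:57 db2 kernel: EIP:0010:[__insert_into_lru_list+59/92]
Aug  3 04:58:57 db2 kernel: Process mysqld (pid: 26237,
stackpage=d392b000)
Aug  3 04:58:57 db2 kernel: Unable to handle kernel paging request at
virtual address 6dad7220
Aug  3 04:58:57 db2 kernel: EIP:0010:[__insert_into_lru_list+59/92]
Aug  3 04:58:57 db2 kernel: Process mysqld (pid: 451,
stackpage=ee565000)
"""

for record in summarize_oopses(sample):
    print(record)
```

Both records report the same address and the same kernel function from two different mysqld pids. That would fit a damaged kernel structure shared across processes rather than a fault private to one process, but it does not by itself tell bad RAM from a kernel bug.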

Point is, I have one machine among my tribe of 10 who does this kind of thing to
my kernel from time-to-time.  Pima has something nasty happening to his hardware
situation and dies running setiathome randomly.  He's been running 57+ days, but
that doesn't mean he's all there.  Couple screws loose somewhere.  Scared of
clowns, or something.

Seems your machine might have a wrong hardware component somewhere.  I'd check
it out if it's a production machine.  

mysqld can't run as the only service.  You can't run anything without init.

Hope that helps.

Regards,
Van
-- 
=
Linux rocks!!!   http://www.dedserius.com/
=
