Re: Kernel crash caused by mysql?
On Fri, Aug 03, 2001 at 03:42:28AM -0700, Van wrote:
> Mario Witte wrote:
>
> Memory or motherboard. How many sticks of RAM do you have in the machine? Are they the same speed (p100/p133, etc.)? What kind of motherboard? Is updatedb running at this time? (might be a hard-drive croaking while trying to update the locate database).

There are 2x256 MB and 2x128 MB sticks in there, all at a speed of 133. Please don't ask me what kind of motherboard we're running in there, but that shouldn't be a problem.

Updatedb runs around midnight, but I just found out that cron.hourly could be a problem, as we experienced another crash tonight at 1:59; the crash yesterday occurred at 4:59. Always around the full hour. I've disabled cron.hourly for now, hoping it will help. It seems it wasn't a mysql problem; mysql was just the process that got killed and thus appeared in the kernel trace or something.

> Seems your machine might have a wrong hardware component somewhere. I'd check it out if it's a production machine.

We sure will, but the system is located about 500 kilometers from where our bureau is, so I hope it will stay alive at least over the weekend :-)

> mysqld can't run as the only service. You can't run anything without initd.

Ok, you won! ;)

Thanks for your fast help,
With regards,
--
Mario Witte [EMAIL PROTECTED]

-
Before posting, please check:
http://www.mysql.com/manual.php (the manual)
http://lists.mysql.com/ (the list archive)
To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
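Editor's sketch: the "always around the full hour" observation in the message above can be checked mechanically before blaming an hourly cron job. The Python below is only an illustration under stated assumptions: the function names and the 2-minute tolerance are made up for this example, the 04:58:57 time comes from the oops log posted in this thread, and the seconds of the 1:59 crash are unknown and assumed to be 0. Whether the result actually points at cron.hourly depends on the minute at which the machine's crontab runs it, which the thread does not state.

```python
from datetime import time

def minutes_from_full_hour(t: time) -> float:
    """Distance in minutes from t to the nearest full hour."""
    past = t.minute + t.second / 60.0
    return min(past, 60.0 - past)

def clustered_near_hour(times, tolerance_min=2.0):
    """True if every crash time lies within tolerance_min of a full hour.

    tolerance_min is an arbitrary illustrative threshold, not a value
    taken from the thread.
    """
    return all(minutes_from_full_hour(t) <= tolerance_min for t in times)

# Crash times of day reported in the thread.
crashes = [
    time(4, 58, 57),  # from the oops log in /var/log/warn
    time(1, 59, 0),   # "1:59" as reported; seconds assumed to be 0
]

for t in crashes:
    print(t, "is", round(minutes_from_full_hour(t), 2), "min from the full hour")
print("clustered near the hour:", clustered_near_hour(crashes))
```

Two data points are a weak basis for any conclusion, so a check like this is more useful once a few more crash times have been collected.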
Re: Kernel crash caused by mysql?
Mario Witte wrote:
> On Fri, Aug 03, 2001 at 03:42:28AM -0700, Van wrote:
> > Mario Witte wrote:
> >
> > Memory or motherboard. How many sticks of RAM do you have in the machine? Are they the same speed (p100/p133, etc.)? What kind of motherboard? Is updatedb running at this time? (might be a hard-drive croaking while trying to update the locate database).
>
> There are 2x256 MB and 2x128 MB sticks in there, all at a speed of 133. Please don't ask me what kind of motherboard we're running in there, but that shouldn't be a problem.

A couple of months ago I had 256 MBytes in my Slackware Athlon workstation, labeled 100MHz-128MBytes on both chips. Turns out one of the chips was 133MHz and the other was 100MHz (Fry's Electronics labeling dep't). Sadly, it took the bucks I spent on a new PIII true Intel board to qualify the chips; I then put the 2 133MHz chips into the Athlon and the 2 qualified 100MHz chips into the PIII Intel board.

Honestly, my Athlon was crashing on Slackware and the Intel was crashing on Advanced Server regularly. Slackware with the 2.4.1 kernel crashed sometimes twice, sometimes 3 times in a day, and sometimes would go for a few days. The Advanced Server couldn't stay up long enough to show a login without a BSOD, except randomly sometimes. I switched the chips after verifying them, and now the Athlon can run 3 weeks at a time:

    vanboers@sedona:~$ w
    4:06am up 22 days, 13:27, 0 users, load average: 1.10, 1.24, 1.14
    USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT

(developer machine, what can I say? 1+ Load Avg due to SetiAtHome, BTW) and I've seen the Advanced Server run close to a month (between virus patches and IIS security updates). On average, the developer Athlon machine still beats the Advanced Server machine on uptime, but the point is that the memory was killing both of them.

Don't take the label for granted. I can't tell you how much time I wasted determining this. I can tell you I lost over $5k US in billing, though, because I assumed the label was correct.
I lost the time because I assumed I was doing something wrong, and I couldn't bill my client for development during the month it took me to find out what the problem was: disparate memory on the same motherboard. If the labeling had been correct, I would have just swapped the chips. Hope that makes sense.

> Updatedb is running around midnight, but I just found out that cron.hourly could be a problem in there as we experienced another crash tonight which was at 1:59, the crash yesterday occurred at 4:59. Always around the full hour. I've tried and disabled cron.hourly for now, hoping it will help. Seems like it wasn't a problem of mysql, it was just mysql which was killed and thus appeared in the kernel trace or something.

You're probably onto something here. Great forensics work! I have experience with the cron.hourly/cron.daily/etc. processes that fire up when you have the logrotate packages installed. It's been a while, but while I was using RedHat at Intel I convinced them these crons and the logrotate packages should either be audited thoroughly, or pitched because of the second-guessing they do to the admin of the machine/network.

Intel opted to replace logrotate on some of their monitoring servers, in the division in which I was working, with an implementation (msgarch) that I've had running on all of my production machines for several years. (Intel applied my implementation on modified Red Hat and Slackware monitoring servers at that time.) I haven't been at Intel for over 4 months, so I have no idea what they've done with their Red Hat implementations since, or whether that division still deploys Slackware servers.

If logrotate is the cause, I'll send you msgarch and the cron entries for msgarch. Sorry I didn't OSS msgarch before, but I hadn't heard many complaints about logrotate.
The BSD people use it too, most of them with a certain level of satisfaction. msgarch is my own recipe, but it has been implemented by many of my affiliates for many years. I just didn't OSS it because most people have been using logrotate and I thought it redundant to toss msgarch to the community. If that assumption was wrong, let me know. I'll pitch msgarch into the community.

> > Seems your machine might have a wrong hardware component somewhere. I'd check it out if it's a production machine.
>
> We sure will, but the system is located about 500 kilometers from where our bureau is, so I hope it will stay alive at least over the weekend :-)

This is a problematic situation. Hardware is SO important in remote deployments. I hate to say this, but my most important machine is 2000 miles away from me, and I tested it locally for 2 months, on the hardware I put it on, before I deployed it. That might be the lesson here. The hardware didn't fail after 2 months of testing. Not comprehensive, but might
Re: Kernel crash caused by mysql?
Mario Witte wrote:
> Tonight one of our database servers died, though it didn't have any special load at that time. The following output could be found in /var/log/warn:
>
> Aug 3 04:58:57 db2 kernel: Unable to handle kernel paging request at virtual address 6dad7220
> Aug 3 04:58:57 db2 kernel: printing eip:
> Aug 3 04:58:57 db2 kernel: c013195b
> Aug 3 04:58:57 db2 kernel: *pde =
> Aug 3 04:58:57 db2 kernel: Oops: 0002
> Aug 3 04:58:57 db2 kernel: CPU:0
> Aug 3 04:58:57 db2 kernel: EIP:0010:[__insert_into_lru_list+59/92]
> Aug 3 04:58:57 db2 kernel: EFLAGS: 00010282
> Aug 3 04:58:57 db2 kernel: eax: 6dad7200 ebx: 0002 ecx: c912db00 edx: 0008
> Aug 3 04:58:57 db2 kernel: esi: c912db00 edi: 0001 ebp: e4015640 esp: d392bedc
> Aug 3 04:58:57 db2 kernel: ds: 0018 es: 0018 ss: 0018
> Aug 3 04:58:57 db2 kernel: Process mysqld (pid: 26237, stackpage=d392b000)
> Aug 3 04:58:57 db2 kernel: Stack: c01324ef c912db00 0002 c912db00 1000 c0132502 c912db00 c0132ebb
> Aug 3 04:58:57 db2 kernel:c912db00 e4015640 00256800 c13d3e0c c912db00 1000
> Aug 3 04:58:57 db2 kernel:c0133598 e4015640 c13d3e0c 0400 0800 c13d3e0c 0800 00256400
> Aug 3 04:58:57 db2 kernel: Call Trace: [__refile_buffer+91/100] [refile_buffer+10/16] [__block_commit_write+123/208] [generic_commit_write+52/148] [gene
> Aug 3 04:58:57 db2 kernel:
> Aug 3 04:58:57 db2 kernel: Code: 89 48 20 8b 82 cc 58 31 c0 89 48 24 ff 82 dc 58 31 c0 31 c0
> Aug 3 04:58:57 db2 kernel: Unable to handle kernel paging request at virtual address 6dad7220
> Aug 3 04:58:57 db2 kernel: printing eip:
> Aug 3 04:58:57 db2 kernel: c013195b
> Aug 3 04:58:57 db2 kernel: *pde =
> Aug 3 04:58:57 db2 kernel: Oops: 0002
> Aug 3 04:58:57 db2 kernel: CPU:0
> Aug 3 04:58:57 db2 kernel: EIP:0010:[__insert_into_lru_list+59/92]
> Aug 3 04:58:57 db2 kernel: EFLAGS: 00010282
> Aug 3 04:58:57 db2 kernel: eax: 6dad7200 ebx: 0002 ecx: c61df980 edx: 0008
> Aug 3 04:58:57 db2 kernel: esi: c61df980 edi: 0001 ebp: eea49be0 esp: ee565ed8
> Aug 3 04:58:57 db2 kernel: ds: 0018 es: 0018 ss: 0018
> Aug 3 04:58:57 db2 kernel: Process mysqld (pid: 451, stackpage=ee565000)
> Aug 3 04:58:57 db2 kernel: Stack: c01324ef c61df980 0002 c61df980 1000 c0132502 c61df980 c0132ebb
> Aug 3 04:58:57 db2 kernel:c61df980 eea49be0 0039fc29 c114e428 c61df980 1000
> Aug 3 04:58:57 db2 kernel:c0133598 eea49be0 c114e428 0b32 0c29 c114e428 0c29 0039fb32
> Aug 3 04:58:57 db2 kernel: Call Trace: [__refile_buffer+91/100] [refile_buffer+10/16] [__block_commit_write+123/208] [generic_commit_write+52/148] [gene
> Aug 3 04:58:57 db2 kernel:
> Aug 3 04:58:57 db2 kernel: Code: 89 48 20 8b 82 cc 58 31 c0 89 48 24 ff 82 dc 58 31 c0 31 c0
>
> The system wasn't completely dead at that time, though. Hours later I was able to log into the system via ssh, but most commands (e.g. ps ax or killall mysqld) didn't complete but died, killing my shell. As a last resort I did a reboot, which caused the kernel to spit out a never-ending stack trace. A hard reboot plus a manual fsck solved the problem.
>
> Does anybody have an idea whether mysqld caused this crash? In fact there are no other services running on the system, and it had been running for weeks without any problems.
>
> Any help is appreciated,
> With regards,
> --
> Mario Witte [EMAIL PROTECTED]

Mario:

Memory or motherboard. How many sticks of RAM do you have in the machine? Are they the same speed (p100/p133, etc.)? What kind of motherboard? Is updatedb running at this time? (might be a hard-drive croaking while trying to update the locate database).

Seems I'm pointing to hardware, here. Point is, I have one machine among my tribe of 10 that does this kind of thing to my kernel from time to time. Pima has something nasty happening to his hardware situation and dies randomly while running setiathome. He's been running 57+ days, but that doesn't mean he's all there. Couple screws loose somewhere. Scared of clowns, or something.

Seems your machine might have a wrong hardware component somewhere. I'd check it out if it's a production machine.

mysqld can't run as the only service. You can't run anything without initd.
Hope that helps. Regards, Van -- = Linux rocks!!! http://www.dedserius.com/ =
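Editor's sketch: when a log like the one quoted in this message contains several oops reports, it can help to reduce each one to the faulting address, the kernel symbol at EIP, and the process that was running. The Python below is a minimal illustration, not a general oops decoder: the regular expressions are written only against the line shapes that appear in this thread, and the name summarize_oops and the dictionary keys are invented for the example. Note that the "Process" line names the task that was current when the fault hit, which by itself does not show that the task caused it.

```python
import re

# Patterns matched against the syslog lines exactly as posted in the thread.
ADDR_RE = re.compile(
    r"Unable to handle kernel paging request at virtual address ([0-9a-f]+)"
)
EIP_RE = re.compile(r"EIP:\s*\w+:\[(\w+)\+")
PROCESS_RE = re.compile(r"Process (\S+) \(pid: (\d+)")

def summarize_oops(lines):
    """Return one summary dict per oops report found in the given lines."""
    reports = []
    current = None
    for line in lines:
        m = ADDR_RE.search(line)
        if m:
            # A new "Unable to handle ..." line starts a new report.
            current = {"address": m.group(1), "symbol": None,
                       "process": None, "pid": None}
            reports.append(current)
            continue
        if current is None:
            continue  # ignore lines that precede the first report
        m = EIP_RE.search(line)
        if m:
            current["symbol"] = m.group(1)
        m = PROCESS_RE.search(line)
        if m:
            current["process"] = m.group(1)
            current["pid"] = int(m.group(2))
    return reports

# A few lines copied from the first oops in the thread.
SAMPLE = """\
Aug 3 04:58:57 db2 kernel: Unable to handle kernel paging request at virtual address 6dad7220
Aug 3 04:58:57 db2 kernel: EIP:0010:[__insert_into_lru_list+59/92]
Aug 3 04:58:57 db2 kernel: Process mysqld (pid: 26237, stackpage=d392b000)
"""

for report in summarize_oops(SAMPLE.splitlines()):
    print(report)
```

Both oopses in the thread fault at the same address in the same buffer-cache function but under different mysqld pids, which is the kind of pattern a summary like this makes easy to spot.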