Re: kern/104406: [ufs] Processes get stuck in "ufs" state under persistent CPU load
Oleg Derevenetz wrote:
>> Anyway, I already looked at the ddb output and said that it looks like either a driver or a hardware problem, with very high confidence. I think the project's time could be spent more productively elsewhere while the submitter checks his hardware, for instance by changing the controller, the disks, or the controller type.
>
> I already said that:
>
> 1. This controller and these disks worked successfully earlier with FreeBSD 4.6.2 without any problems;

Yes, as Kostik says it may be a driver problem. Or it could be a hardware problem because the driver is quite different than in 4.x, and may be exercising the hardware in a different way that provokes a hardware bug.

> 2. I tried replacing a disk with another one (the same model), but it didn't help. Unfortunately, I have no other free SCSI controller (but see #1);
>
> 3. I have another AMD64 machine with different hardware (including disks and SCSI controller) that periodically suffers from the same problem. Unfortunately, that machine is in production and heavily loaded, so I can't load it further with INVARIANTS, WITNESS, and DIAGNOSTIC - my clients will not forgive me for that.

Many different problems can have similar symptoms when you do not look closely. Indeed, the PR you are replying to is itself a completely different issue that has nothing to do with your own bug report, and someone else also replied to it with what looks like yet another completely different issue. There is no evidence so far pointing anywhere apart from the mylex controller, so that argues against the hypothesis that your second problem is the same.

Kris
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: kern/104406: [ufs] Processes get stuck in "ufs" state under persistent CPU load
> Anyway, I already looked at the ddb output and said that it looks like either a driver or a hardware problem, with very high confidence. I think the project's time could be spent more productively elsewhere while the submitter checks his hardware, for instance by changing the controller, the disks, or the controller type.

I already said that:

1. This controller and these disks worked successfully earlier with FreeBSD 4.6.2 without any problems;

2. I tried replacing a disk with another one (the same model), but it didn't help. Unfortunately, I have no other free SCSI controller (but see #1);

3. I have another AMD64 machine with different hardware (including disks and SCSI controller) that periodically suffers from the same problem. Unfortunately, that machine is in production and heavily loaded, so I can't load it further with INVARIANTS, WITNESS, and DIAGNOSTIC - my clients will not forgive me for that.

--
Oleg Derevenetz <[EMAIL PROTECTED]> OOD3-RIPE
Phone: +7 4732 539880
Fax: +7 4732 531415 http://www.vsi.ru
CenterTelecom Voronezh ISP http://isp.vsi.ru
Re: kern/104406: [ufs] Processes get stuck in "ufs" state under persistent CPU load
Kostik Belousov wrote:
> On Sun, Nov 04, 2007 at 11:26:20PM +0100, Kris Kennaway wrote:
>> Oleg Derevenetz wrote:
>>>>>>> Dumpdev is a swap partition on da0 (a single physical disk) that is connected to a Mylex AcceleRAID 170 RAID controller. The problem occurs when I copy a large number of files from FTP to another disk (da1) that is connected to the same RAID controller.
>>>>>> If the driver or controller is misbehaving it could explain both problems. Any chance you can get another disk in there on a different controller to dump onto?
>>>>> Yes, I got an IDE disk and saved a kernel dump for another static hang state on it. Here is the dump:
>>>>>
>>>>> ftp://oleg.vsi.ru/private/vmcore.0.zip
>>>> Is this just the vmcore, or the debugging kernel also? Both are needed to make sense of the dump.
>>> Kernel binary with kernel config is here:
>>>
>>> ftp://oleg.vsi.ru/private/kernel.zip
>>>
>>> This kernel was built statically, and no modules are loaded at boot at all.
>> That kernel doesn't appear to match the vmcore; are you sure it is the right one? Are you able to successfully run kgdb on these locally?
> Besides the matching kernel, kgdb also must be built from the same sources as the kernel to provide useful information from the core dump.
>
> Anyway, I already looked at the ddb output and said that it looks like either a driver or a hardware problem, with very high confidence. I think the project's time could be spent more productively elsewhere while the submitter checks his hardware, for instance by changing the controller, the disks, or the controller type.

Yes, at this point it does seem to be related to the mylex controller. I hear from another developer that they are not considered to be high-quality hardware.

Kris
Re: kern/104406: [ufs] Processes get stuck in "ufs" state under persistent CPU load
On Sun, Nov 04, 2007 at 11:26:20PM +0100, Kris Kennaway wrote:
> Oleg Derevenetz wrote:
>>>>>> Dumpdev is a swap partition on da0 (a single physical disk) that is connected to a Mylex AcceleRAID 170 RAID controller. The problem occurs when I copy a large number of files from FTP to another disk (da1) that is connected to the same RAID controller.
>>>>> If the driver or controller is misbehaving it could explain both problems. Any chance you can get another disk in there on a different controller to dump onto?
>>>> Yes, I got an IDE disk and saved a kernel dump for another static hang state on it. Here is the dump:
>>>>
>>>> ftp://oleg.vsi.ru/private/vmcore.0.zip
>>> Is this just the vmcore, or the debugging kernel also? Both are needed to make sense of the dump.
>> Kernel binary with kernel config is here:
>>
>> ftp://oleg.vsi.ru/private/kernel.zip
>>
>> This kernel was built statically, and no modules are loaded at boot at all.
> That kernel doesn't appear to match the vmcore; are you sure it is the right one? Are you able to successfully run kgdb on these locally?

Besides the matching kernel, kgdb also must be built from the same sources as the kernel to provide useful information from the core dump.

Anyway, I already looked at the ddb output and said that it looks like either a driver or a hardware problem, with very high confidence. I think the project's time could be spent more productively elsewhere while the submitter checks his hardware, for instance by changing the controller, the disks, or the controller type.
Re: kern/104406: [ufs] Processes get stuck in "ufs" state under persistent CPU load
>>>>>> Dumpdev is a swap partition on da0 (a single physical disk) that is connected to a Mylex AcceleRAID 170 RAID controller. The problem occurs when I copy a large number of files from FTP to another disk (da1) that is connected to the same RAID controller.
>>>>> If the driver or controller is misbehaving it could explain both problems. Any chance you can get another disk in there on a different controller to dump onto?
>>>> Yes, I got an IDE disk and saved a kernel dump for another static hang state on it. Here is the dump:
>>>>
>>>> ftp://oleg.vsi.ru/private/vmcore.0.zip
>>> Is this just the vmcore, or the debugging kernel also? Both are needed to make sense of the dump.
>> Kernel binary with kernel config is here:
>>
>> ftp://oleg.vsi.ru/private/kernel.zip
>>
>> This kernel was built statically, and no modules are loaded at boot at all.
> That kernel doesn't appear to match the vmcore; are you sure it is the right one? Are you able to successfully run kgdb on these locally?

# kgdb kernel vmcore.0
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".
(no debugging symbols found)...Attempt to extract a component of a value that is not a structure pointer.
(kgdb) bt
#0 0xc0554b12 in doadump ()
#1 0xc048fe2b in db_fncall ()
#2 0xc048fc30 in db_command ()
#3 0xc048fcf8 in db_command_loop ()
#4 0xc04918f5 in db_trap ()
#5 0xc056e158 in kdb_trap ()
#6 0xc06c5960 in trap ()
#7 0xc06b176a in calltrap ()
#8 0xc056debf in kdb_enter ()
#9 0xc069bad6 in siointr1 ()
#10 0xc069b8cd in siointr ()
#11 0xc06b5565 in intr_execute_handlers ()
#12 0xc06b7cc6 in lapic_handle_intr ()
#13 0xc06b1b23 in Xapic_isr1 ()
#14 0xc06aaa4d in acpi_cpu_c1 ()
#15 0xc049b9c9 in acpi_cpu_idle ()
#16 0xc06b9e54 in cpu_idle ()
#17 0xc05402e9 in idle_proc ()
#18 0xc05400cc in fork_exit ()
#19 0xc06b17cc in fork_trampoline ()
(kgdb) info threads
114 Thread 100124 (PID=854: mc) 0xc05663ab in sched_switch ()
113 Thread 100059 (PID=851: csh) 0xc05663ab in sched_switch ()
112 Thread 100107 (PID=850: su) 0xc05663ab in sched_switch ()
111 Thread 100125 (PID=845: csh) 0xc05663ab in sched_switch ()
110 Thread 100122 (PID=844: sshd) 0xc05663ab in sched_switch ()
109 Thread 100137 (PID=840: sshd) 0xc05663ab in sched_switch ()
108 Thread 100138 (PID=839: httpd) 0xc05663ab in sched_switch ()
107 Thread 100085 (PID=838: httpd) 0xc05663ab in sched_switch ()
106 Thread 100092 (PID=837: httpd) 0xc05663ab in sched_switch ()
105 Thread 100104 (PID=836: httpd) 0xc05663ab in sched_switch ()
104 Thread 100083 (PID=835: httpd) 0xc05663ab in sched_switch ()
103 Thread 100042 (PID=834: httpd) 0xc05663ab in sched_switch ()
102 Thread 100064 (PID=833: httpd) 0xc05663ab in sched_switch ()
101 Thread 100082 (PID=832: httpd) 0xc05663ab in sched_switch ()
100 Thread 100106 (PID=831: httpd) 0xc05663ab in sched_switch ()
99 Thread 100114 (PID=830: httpd) 0xc05663ab in sched_switch ()
98 Thread 100080 (PID=829: httpd) 0xc05663ab in sched_switch ()
97 Thread 100096 (PID=828: httpd) 0xc05663ab in sched_switch ()
96 Thread 100113 (PID=827: httpd) 0xc05663ab in sched_switch ()
95 Thread 100103 (PID=826: httpd) 0xc05663ab in sched_switch ()
94 Thread 100139 (PID=825: httpd) 0xc05663ab in sched_switch ()
93 Thread 100140 (PID=824: httpd) 0xc05663ab in sched_switch ()
92 Thread 100141 (PID=823: httpd) 0xc05663ab in sched_switch ()
91 Thread 100142 (PID=822: httpd) 0xc05663ab in sched_switch ()
90 Thread 100076 (PID=821: httpd) 0xc05663ab in sched_switch ()
89 Thread 100093 (PID=820: httpd) 0xc05663ab in sched_switch ()
88 Thread 100069 (PID=819: httpd) 0xc05663ab in sched_switch ()
87 Thread 100099 (PID=818: httpd) 0xc05663ab in sched_switch ()
86 Thread 100068 (PID=817: httpd) 0xc05663ab in sched_switch ()
85 Thread 100041 (PID=816: httpd) 0xc05663ab in sched_switch ()
84 Thread 100070 (PID=815: httpd) 0xc05663ab in sched_switch ()
83 Thread 100098 (PID=814: httpd) 0xc05663ab in sched_switch ()
82 Thread 100050 (PID=813: httpd) 0xc05663ab in sched_switch ()
81 Thread 100088 (PID=812: httpd) 0xc05663ab in sched_switch ()
80 Thread 100131 (PID=811: httpd) 0xc05663ab in sched_switch ()
79 Thread 100100 (PID=810: httpd) 0xc05663ab in sched_switch ()
78 Thread 100133 (PID=809: mysqld) 0xc05663ab in sched_switch ()
77 Thread 100126 (PID=809: mysqld) 0xc05663ab in sched_switch ()
76 Thread 100134 (PID=809: mysqld) 0xc05663ab in sched_switch ()
75 Thread 100079 (PID=809: mysqld) 0xc05663ab in sched_switch ()
74 Thread 100128 (PID=809: mysqld) 0xc05663ab in sched_switch ()
73 Threa
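
The "info threads" listing is long and nearly every entry sits in sched_switch (), so a per-process count can be easier to scan than the raw output. The following is only a sketch, assuming the kgdb session was saved to a text file (for example with script(1)); the function name summarize_threads and the file name in the example are made up for illustration and are not part of the thread.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): count threads per process
# name in saved kgdb "info threads" output.
summarize_threads() {
    # Each entry contains "(PID=809: mysqld)"; extract the process name,
    # then count how often each name occurs, most frequent first.
    grep -o 'PID=[0-9]*: [^)]*' "$1" \
        | sed 's/^PID=[0-9]*: //' \
        | sort | uniq -c | sort -rn
}

# Example: summarize_threads kgdb-session.txt
```

Because grep -o prints every match separately, this works whether the saved output has one thread per line or several entries run together on one line.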
Re: kern/104406: [ufs] Processes get stuck in "ufs" state under persistent CPU load
Oleg Derevenetz wrote:
>>>>> Dumpdev is a swap partition on da0 (a single physical disk) that is connected to a Mylex AcceleRAID 170 RAID controller. The problem occurs when I copy a large number of files from FTP to another disk (da1) that is connected to the same RAID controller.
>>>> If the driver or controller is misbehaving it could explain both problems. Any chance you can get another disk in there on a different controller to dump onto?
>>> Yes, I got an IDE disk and saved a kernel dump for another static hang state on it. Here is the dump:
>>>
>>> ftp://oleg.vsi.ru/private/vmcore.0.zip
>> Is this just the vmcore, or the debugging kernel also? Both are needed to make sense of the dump.
> Kernel binary with kernel config is here:
>
> ftp://oleg.vsi.ru/private/kernel.zip
>
> This kernel was built statically, and no modules are loaded at boot at all.

That kernel doesn't appear to match the vmcore; are you sure it is the right one? Are you able to successfully run kgdb on these locally?

Kris
Re: kern/104406: [ufs] Processes get stuck in "ufs" state under persistent CPU load
>>>> Dumpdev is a swap partition on da0 (a single physical disk) that is connected to a Mylex AcceleRAID 170 RAID controller. The problem occurs when I copy a large number of files from FTP to another disk (da1) that is connected to the same RAID controller.
>>> If the driver or controller is misbehaving it could explain both problems. Any chance you can get another disk in there on a different controller to dump onto?
>> Yes, I got an IDE disk and saved a kernel dump for another static hang state on it. Here is the dump:
>>
>> ftp://oleg.vsi.ru/private/vmcore.0.zip
> Is this just the vmcore, or the debugging kernel also? Both are needed to make sense of the dump.

Kernel binary with kernel config is here:

ftp://oleg.vsi.ru/private/kernel.zip

This kernel was built statically, and no modules are loaded at boot at all.

--
Oleg Derevenetz
Re: kern/104406: [ufs] Processes get stuck in "ufs" state under persistent CPU load
Oleg Derevenetz wrote:
>>> Dumpdev is a swap partition on da0 (a single physical disk) that is connected to a Mylex AcceleRAID 170 RAID controller. The problem occurs when I copy a large number of files from FTP to another disk (da1) that is connected to the same RAID controller.
>> If the driver or controller is misbehaving it could explain both problems. Any chance you can get another disk in there on a different controller to dump onto?
> Yes, I got an IDE disk and saved a kernel dump for another static hang state on it. Here is the dump:
>
> ftp://oleg.vsi.ru/private/vmcore.0.zip

Is this just the vmcore, or the debugging kernel also? Both are needed to make sense of the dump.

Kris
Re: kern/104406: [ufs] Processes get stuck in "ufs" state under persistent CPU load
>> Dumpdev is a swap partition on da0 (a single physical disk) that is connected to a Mylex AcceleRAID 170 RAID controller. The problem occurs when I copy a large number of files from FTP to another disk (da1) that is connected to the same RAID controller.
> If the driver or controller is misbehaving it could explain both problems. Any chance you can get another disk in there on a different controller to dump onto?

Yes, I got an IDE disk and saved a kernel dump for another static hang state on it. Here is the dump:

ftp://oleg.vsi.ru/private/vmcore.0.zip

--
Oleg Derevenetz
Re: kern/104406: [ufs] Processes get stuck in "ufs" state under persistent CPU load
Oleg Derevenetz wrote:
>>> After I break into the debugger using the Ctrl+Alt+Esc sequence and enter a "panic" command, the kernel does not write a kernel dump but seems to hang. Can anyone describe how to obtain a kernel dump in this situation, or at least say which show command output is needed first to debug this? The output of all the suggested commands is huge, and I am afraid of making a mistake when carrying this output from the screen to a sheet of paper and back :-)
>>
>> Oleg, one thing you can do to make this less painful is to run your machine's console over serial port.
>>
>> First get a crossover serial cable, make sure it works from one box to another, it should be easy to run "tip com1" on both boxes to ensure that it works.
>>
>> Then you just need to add console=comconsole to /boot/loader.conf and your box's console should come over serial.
>>
>> Then on the machine watching the console, you can just do this:
>>
>> % script
>> Script started, output file is typescript
>> % tip com1
>> ...do ddb stuff now...
>> ...stop tip
>> % exit
>>
>> now you should have everything logged into a file called "typescript", which should save you a big headache.
>
> Thanks, I'll try it on Monday morning.
>
>> As far as getting a dump from ddb, try this:
>>
>> ddb> call doadump
>>
>> I'm completely at a loss why this isn't a base ddb command "dump" but whatever... :)
>
> Unfortunately, this doesn't work either. I called the duty personnel in this datacenter and asked them to do this, and the person on duty tells me that after he enters this command something like this appears on the monitor:
>
> db> call doadump
> Dumping 3072 MB
> Dump aborted error I/O
> Dump failed. (Error 5)
>
>> Hmmm, that seems like you might be having a hardware problem,
>
> It is possible, but unlikely:
>
> 1. I have similar symptoms on another AMD64 machine with 6.2 (uname -a from this machine is listed in PR kern/104406 in my followup at Wed, 7 Mar 2007 05:10:59 +0300), but they are rare and this machine is in production, so I can't experiment with it;
>
> 2. All this hardware worked successfully earlier with FreeBSD 4.6.
>
>> what disk device do you have?
>
> Dumpdev is a swap partition on da0 (a single physical disk) that is connected to a Mylex AcceleRAID 170 RAID controller. The problem occurs when I copy a large number of files from FTP to another disk (da1) that is connected to the same RAID controller.

If the driver or controller is misbehaving it could explain both problems. Any chance you can get another disk in there on a different controller to dump onto?

Kris
Re: kern/104406: [ufs] Processes get stuck in "ufs" state under persistent CPU load
>> Can anyone take a look at PR kern/104406? I have a repeatable hang, but I can't obtain a kernel dump to get the results of all the show commands listed here:
>>
>> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
>>
>> After I break into the debugger using the Ctrl+Alt+Esc sequence and enter a "panic" command, the kernel does not write a kernel dump but seems to hang. Can anyone describe how to obtain a kernel dump in this situation, or at least say which show command output is needed first to debug this? The output of all the suggested commands is huge, and I am afraid of making a mistake when carrying this output from the screen to a sheet of paper and back :-)
>
> Oleg, one thing you can do to make this less painful is to run your machine's console over serial port.
>
> First get a crossover serial cable, make sure it works from one box to another, it should be easy to run "tip com1" on both boxes to ensure that it works.
>
> Then you just need to add console=comconsole to /boot/loader.conf and your box's console should come over serial.
>
> Then on the machine watching the console, you can just do this:
>
> % script
> Script started, output file is typescript
> % tip com1
> ...do ddb stuff now...
> ...stop tip
> % exit
>
> now you should have everything logged into a file called "typescript", which should save you a big headache.

Thanks, I'll try it on Monday morning.

I posted a followup to kern/104406 that includes all the information listed in the "Debugging Deadlocks" chapter of the FreeBSD Developer's Handbook. Can anyone take a look at it and say whether this is certainly a hardware problem or some sort of software problem?

--
Oleg Derevenetz
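
The loader.conf step from the serial console advice quoted above can be sketched as a small sh function. This is only an illustration: the name enable_serial_console and the idea of passing the file path as an argument are assumptions, not something from the thread. It appends console="comconsole" only when the file has no console= line yet, so running it twice does no harm.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): add the serial-console line
# to a loader.conf-style file unless a console= setting is already there.
enable_serial_console() {
    conf="$1"
    if grep -q '^console=' "$conf" 2>/dev/null; then
        echo "console already set in $conf"
    else
        echo 'console="comconsole"' >> "$conf"
        echo "added console=\"comconsole\" to $conf"
    fi
}

# Example (as root on the machine being debugged):
#   enable_serial_console /boot/loader.conf
```

On the watching machine, script(1) also accepts an output file name, so "script ddb-session.log" (an arbitrary name) followed by "tip com1" keeps each debugging session in its own file instead of overwriting "typescript".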
Re: kern/104406: [ufs] Processes get stuck in "ufs" state under persistent CPU load
>> After I break into the debugger using the Ctrl+Alt+Esc sequence and enter a "panic" command, the kernel does not write a kernel dump but seems to hang. Can anyone describe how to obtain a kernel dump in this situation, or at least say which show command output is needed first to debug this? The output of all the suggested commands is huge, and I am afraid of making a mistake when carrying this output from the screen to a sheet of paper and back :-)
>
> Oleg, one thing you can do to make this less painful is to run your machine's console over serial port.
>
> First get a crossover serial cable, make sure it works from one box to another, it should be easy to run "tip com1" on both boxes to ensure that it works.
>
> Then you just need to add console=comconsole to /boot/loader.conf and your box's console should come over serial.
>
> Then on the machine watching the console, you can just do this:
>
> % script
> Script started, output file is typescript
> % tip com1
> ...do ddb stuff now...
> ...stop tip
> % exit
>
> now you should have everything logged into a file called "typescript", which should save you a big headache.

Thanks, I'll try it on Monday morning.

> As far as getting a dump from ddb, try this:
>
> ddb> call doadump
>
> I'm completely at a loss why this isn't a base ddb command "dump" but whatever... :)

Unfortunately, this doesn't work either. I called the duty personnel in this datacenter and asked them to do this, and the person on duty tells me that after he enters this command something like this appears on the monitor:

db> call doadump
Dumping 3072 MB
Dump aborted error I/O
Dump failed. (Error 5)

> Hmmm, that seems like you might be having a hardware problem,

It is possible, but unlikely:

1. I have similar symptoms on another AMD64 machine with 6.2 (uname -a from this machine is listed in PR kern/104406 in my followup at Wed, 7 Mar 2007 05:10:59 +0300), but they are rare and this machine is in production, so I can't experiment with it;

2. All this hardware worked successfully earlier with FreeBSD 4.6.

> what disk device do you have?

Dumpdev is a swap partition on da0 (a single physical disk) that is connected to a Mylex AcceleRAID 170 RAID controller. The problem occurs when I copy a large number of files from FTP to another disk (da1) that is connected to the same RAID controller.

> Have you also enabled kernel dumps via /etc/rc.conf:dumpdev= ?

Yes, I have dumpdev="AUTO" in rc.conf and a swap device (4G) listed in /etc/fstab.

--
Oleg Derevenetz
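
A quick way to confirm what dumpdev is set to, rather than reading rc.conf by eye, is sketched below. It assumes a POSIX sh and relies on rc.conf being plain sh variable assignments; the function name show_dumpdev is made up for illustration. Pass an absolute path, since the "." builtin searches PATH for names without a slash, and only use it on files you trust, because the file is sourced.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the dumpdev value
# from an rc.conf-style file, or "unset" if the file does not define it.
show_dumpdev() {
    [ -r "$1" ] || { echo "unset"; return 0; }
    # rc.conf files are sh variable assignments, so source the file in a
    # subshell to avoid changing the caller's environment.
    ( dumpdev=""; . "$1" >/dev/null 2>&1; echo "${dumpdev:-unset}" )
}

# Example: show_dumpdev /etc/rc.conf
```

With the setup described above this would print AUTO. Since the dump to the swap partition behind the Mylex controller failed with an I/O error, it may be worth noting that dumpdev can also name a specific device instead of AUTO; something like /dev/ad0s1b would be a hypothetical swap partition on a disk attached to a different controller.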