Re: [gentoo-user] 3.7.1 SATA errors
On 26.12.2012 02:11, fe...@crowfix.com wrote:
> On Tue, Dec 25, 2012 at 01:11:04PM +0100, Florian Philipp wrote:
>> The best way to find out what's wrong is to bisect the kernel, i.e.
>> finding the exact commit that caused the issue to appear.
>> http://wiki.gentoo.org/wiki/Kernel_git-bisect
>
> Got the repository cloned:
>
>     # git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-stable
>
> Tried to start the bisect, but ran into a problem:
>
>     # git bisect start
>     # git bisect bad v3.7.0
>     fatal: Needed a single revision
>     Bad rev input: v3.7.0
>
> Tried v3.7.0.0 for fun, same error. Tried good first, guessing it can't
> do much harm that a git bisect reset can't fix.
>
>     # git bisect good v3.6.10
>     a63a7cf3fc2ac1aff657f58ea446c34f3252209a was both good and bad
>     # git bisect bad v3.7.0
>     fatal: Needed a single revision
>     Bad rev input: v3.7.0
>
> Have I grabbed a repository which doesn't include 3.7.0? Google
> research continues.

`git tag` should give you a list of version numbers. The tag you are searching for is v3.7.

Regards,
Florian Philipp
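The tag mix-up above can be caught before a bisect is started. The following is only a minimal sketch, assuming it is run inside a clone like the one in this thread; the helper name `tag_exists` is made up for illustration and is not part of git.

```shell
#!/bin/sh
# tag_exists is a hypothetical helper, not a git command.
# It succeeds only if the given name resolves to a tag in the current repository.
tag_exists() {
    git rev-parse --verify --quiet "refs/tags/$1" >/dev/null
}

# Inside a kernel clone, list the release tags that start with v3.7:
#   git tag -l 'v3.7*'
# and check candidate names before handing them to git bisect:
#   tag_exists v3.7   && echo "v3.7 is a tag"
#   tag_exists v3.7.0 || echo "v3.7.0 is not a tag"
```

Checking the name first avoids leaving a half-started bisect behind, which is presumably what produced the "was both good and bad" message above.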
Re: [gentoo-user] 3.7.1 SATA errors
On Wed, Dec 26, 2012 at 12:56:39PM +0100, Florian Philipp wrote:
> `git tag` should give you a list of version numbers. The tag you are
> searching for is v3.7.

Thanks -- power went out, standby generator kicked in and woke me up at 0430, and I woke realizing that. Bisect is happy.

My git-fu is weak, since I mostly use it for personal projects. Work only uses subversion, blecch. Didn't know about git tag, and git bisect help doesn't mention it.

--
... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._.
Felix Finch: scarecrow repairman rocket surgeon / fe...@crowfix.com
GPG = E987 4493 C860 246C 3B1E 6477 7838 76E9 182E 8151 ITAR license #4933
I've found a solution to Fermat's Last Theorem but I see I've run out of room o
Re: [gentoo-user] 3.7.1 SATA errors
On 23.12.2012 20:23, fe...@crowfix.com wrote:
> I have since had some time to explore this and find it related to the
> kernel; 3.6.10 works fine, while 3.7.1 fails. If I reset during the
> 3.7.1 boot while it is spewing its error messages, but before the
> kernel ultimately panics, I can reboot with 3.6.10, but if 3.7.1 goes
> all the way to the panic, I have to power off and wait a few minutes
> before a 3.6.10 reboot is successful. This is repeatable, but I haven't
> bothered to see how long the system must be off; a few minutes is
> enough.
>
> There are two error messages during the 3.7.1 boot, repeated for all
> SATA drives:
>
>     ata5.00: qc timeout (cmd 0x2f)
>     ata5.00: failed to set xfermode (err_mask=0x40)

The code that prints these messages has not been changed since 2011 so I guess it is a driver issue. You never posted which driver you use exactly and your kernel config enables all. Therefore I cannot look further.

The best way to find out what's wrong is to bisect the kernel, i.e. finding the exact commit that caused the issue to appear.
http://wiki.gentoo.org/wiki/Kernel_git-bisect

Unfortunately, there have been 1545 commits between 3.6 and 3.7. With blind bisection you need 39 kernels to find the issue. Maybe `git log` can give you a hint which commits might be relevant.

Regards,
Florian Philipp
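As a rough cross-check of the build count mentioned above: bisection halves the remaining range at every step, so about log2(N) test kernels should be enough for N commits. The sketch below works that out with plain shell arithmetic. The helper name `bisect_steps` is made up for illustration, and the 1545 figure is simply taken from the message, not verified against the repository.

```shell
#!/bin/sh
# bisect_steps is a hypothetical helper: it prints the smallest k with 2^k >= n,
# i.e. roughly how many test builds a bisection over n commits needs.
bisect_steps() {
    n=$1
    steps=0
    span=1
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        steps=$((steps + 1))
    done
    echo "$steps"
}

bisect_steps 1545   # prints 11
```

For 1545 commits this prints 11, which suggests the 39 quoted above may be an overestimate. git itself reports a similar "roughly N steps" estimate while a bisect is running.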
Re: [gentoo-user] 3.7.1 SATA errors
On Mon, Dec 24, 2012 at 08:53:33AM -0800, fe...@crowfix.com wrote:
> On Mon, Dec 24, 2012 at 10:07:04AM -0600, Bruce Hill wrote:
>>     emerge -av app-text/wgetpaste
>>     wgetpaste /path/to/3.6/.config /path/to/3.7/.config
>
> 3.6.10 .config -- http://bpaste.net/show/66307/
> 3.7.1 .config -- http://bpaste.net/show/66309/
>
>> Also can you dmesg | wgetpaste and note the uname -srm output?
>
> 3.6.10 dmesg -- http://bpaste.net/show/66310/
> uname -srm: Linux 3.6.10-gentoo x86_64
>
> A couple of others:
>
> My partial transcription of the 3.7.1 boot error messages:
> http://bpaste.net/show/66311/
> 3.6.10 emerge --info: http://bpaste.net/show/66312/
>
> I also added all this to the Dropbox dir.

We're on the road, getting ready to pack, and not in a good position to do much on this issue atm.

I would suggest you run lspci -nnk with your running 3.6.10 kernel and save that output. Then go into the kernel source directory for 3.7.1, run make mrproper then make defconfig and enable all the kernel drivers listed in the lspci -nnk output, as well as the drivers for your IDE/SATA controllers, and / filesystem.

That kernel should boot you, and will get rid of a lot of the cruft from the present bloated kernels.

--
Happy Penguin Computers               ')
126 Fenco Drive                       ( \
Tupelo, MS 38801                      ^^
supp...@happypenguincomputers.com
662-269-2706 662-205-6424
http://happypenguincomputers.com/

Don't top-post: http://en.wikipedia.org/wiki/Top_post#Top-posting
Re: [gentoo-user] 3.7.1 SATA errors
* Florian Philipp li...@binarywings.net [121225 07:16]:
> On 23.12.2012 20:23, fe...@crowfix.com wrote:
>> I have since had some time to explore this and find it related to the
>> kernel; 3.6.10 works fine, while 3.7.1 fails. If I reset during the
>> 3.7.1 boot while it is spewing its error messages, but before the
>> kernel ultimately panics, I can reboot with 3.6.10, but if 3.7.1 goes
>> all the way to the panic, I have to power off and wait a few minutes
>> before a 3.6.10 reboot is successful. This is repeatable, but I haven't
>> bothered to see how long the system must be off; a few minutes is
>> enough.
>>
>> There are two error messages during the 3.7.1 boot, repeated for all
>> SATA drives:
>>
>>     ata5.00: qc timeout (cmd 0x2f)
>>     ata5.00: failed to set xfermode (err_mask=0x40)
>
> The code that prints these messages has not been changed since 2011 so
> I guess it is a driver issue. You never posted which driver you use
> exactly and your kernel config enables all. Therefore I cannot look
> further.
>
> The best way to find out what's wrong is to bisect the kernel, i.e.
> finding the exact commit that caused the issue to appear.
> http://wiki.gentoo.org/wiki/Kernel_git-bisect
>
> Unfortunately, there have been 1545 commits between 3.6 and 3.7. With
> blind bisection you need 39 kernels to find the issue. Maybe `git log`
> can give you a hint which commits might be relevant.
>
> Regards,
> Florian Philipp

A me too on the problem the original poster is seeing.

I too am seeing this on a server I have. 3.7.0 and 3.7.1 both don't work but 3.6.10 works fine.

I'm using the sata_mv driver with a SuperMicro (two actually) cards with Marvell MV88SX6081's. These chips and their driver have had some issues in the past.

I also looked for changes in the driver and didn't see any. Though I did see some libata changes.

I haven't had time to do a git bisect yet.

Todd
Re: [gentoo-user] 3.7.1 SATA errors
On Tue, Dec 25, 2012 at 08:56:56AM -0600, Bruce Hill wrote:
> We're on the road, getting ready to pack, and not in a good position
> to do much on this issue atm.

Nevertheless, a most unexpected Christmas present! In progress, and thank you. My dilemma certainly isn't urgent, since 3.6.10 still works.

--
... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._.
Felix Finch: scarecrow repairman rocket surgeon / fe...@crowfix.com
GPG = E987 4493 C860 246C 3B1E 6477 7838 76E9 182E 8151 ITAR license #4933
I've found a solution to Fermat's Last Theorem but I see I've run out of room o
Re: [gentoo-user] 3.7.1 SATA errors
On Tue, Dec 25, 2012 at 10:58:54AM -0500, Todd Goodman wrote:
> A me too on the problem the original poster is seeing.
>
> I too am seeing this on a server I have. 3.7.0 and 3.7.1 both don't
> work but 3.6.10 works fine.
>
> I'm using the sata_mv driver with a SuperMicro (two actually) cards
> with Marvell MV88SX6081's. These chips and their driver have had some
> issues in the past.

A pruned lspci -nnk:

    00:07.1 IDE interface [0101]: Advanced Micro Devices [AMD] AMD-8111 IDE [1022:7469] (rev 03)
            Subsystem: Advanced Micro Devices [AMD] AMD-8111 IDE [1022:7469]
            Kernel driver in use: pata_amd
    01:03.0 SCSI storage controller [0100]: Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller [11ab:6081] (rev 09)
            Subsystem: Marvell Technology Group Ltd. Device [11ab:11ab]
            Kernel driver in use: sata_mv
    02:06.0 SCSI storage controller [0100]: Adaptec AIC-7902B U320 [9005:801d] (rev 10)
            Subsystem: Adaptec Device [9005:005e]
            Kernel driver in use: aic79xx
    02:06.1 SCSI storage controller [0100]: Adaptec AIC-7902B U320 [9005:801d] (rev 10)
            Subsystem: Adaptec Device [9005:005e]
            Kernel driver in use: aic79xx
    03:05.0 Mass storage controller [0180]: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller [1095:3114] (rev 02)
            Subsystem: Silicon Image, Inc. SiI 3114 SATALink Controller [1095:3114]
            Kernel driver in use: sata_sil

    pata_amd   /dev/sdg      320G for boot which seems happy
    sata_mv    /dev/sd[ab]   2 x 300G LVM mounted automatically from fstab
               /dev/sd[cd]   2 x 4T ditto
    sata_sil   /dev/sde      512G SSD with / and swap
               /dev/sdf      512G SSD with LVM for /home, /encfs, and mail spool
    aic79xx    no drives

The sata_mv drives are not necessary for boot, but they do take up /dev/sd? namespace. Might be interesting to try Bruce Hill's idea of a pruned 3.7.1 kernel without that driver.

--
... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._.
Felix Finch: scarecrow repairman rocket surgeon / fe...@crowfix.com
GPG = E987 4493 C860 246C 3B1E 6477 7838 76E9 182E 8151 ITAR license #4933
I've found a solution to Fermat's Last Theorem but I see I've run out of room o
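For anyone repeating the pruned-kernel experiment, the driver names can be pulled out of the lspci listing instead of being read off by eye. This is only a sketch; the helper name `drivers_in_use` is made up for illustration, and it assumes the "Kernel driver in use:" wording shown in the output above.

```shell
#!/bin/sh
# drivers_in_use is a hypothetical helper: it reads `lspci -nnk` output on stdin
# and prints each distinct "Kernel driver in use" value once, sorted.
drivers_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use:[[:space:]]*//p' | sort -u
}

# Typical use on the running 3.6.10 system:
#   lspci -nnk | drivers_in_use
```

On the listing above this would be expected to report aic79xx, pata_amd, sata_mv and sata_sil, which is the set to enable after make defconfig.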
Re: [gentoo-user] 3.7.1 SATA errors
On Tue, Dec 25, 2012 at 08:56:56AM -0600, Bruce Hill wrote: I would suggest you run lspci -nnk with your running 3.6.10 kernel and save that output. Then go into the kernel source directory for 3.7.1, run make mrproper then make defconfig and enable all the kernel drivers listed in the lspci -nnk output, as well as the drivers for your IDE/SATA controllers, and / filesystem. That kernel should boot you, and will get rid of a lot of the cruft from the present bloated kernels. Made a minimal 3.7.1 kernel, much smaller and compiled nice and fast. Hung just like the bloated one, drat. So I guess I will read up on bisecting. I know the principle, but have never tried it. I suppose one starting point is make sure a pure-vanilla 3.6.10 kernel boots. -- ... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._. Felix Finch: scarecrow repairman rocket surgeon / fe...@crowfix.com GPG = E987 4493 C860 246C 3B1E 6477 7838 76E9 182E 8151 ITAR license #4933 I've found a solution to Fermat's Last Theorem but I see I've run out of room o
Re: [gentoo-user] 3.7.1 SATA errors
fe...@crowfix.com wrote:
> On Tue, Dec 25, 2012 at 08:56:56AM -0600, Bruce Hill wrote:
>> I would suggest you run lspci -nnk with your running 3.6.10 kernel and
>> save that output. Then go into the kernel source directory for 3.7.1,
>> run make mrproper then make defconfig and enable all the kernel
>> drivers listed in the lspci -nnk output, as well as the drivers for
>> your IDE/SATA controllers, and / filesystem.
>>
>> That kernel should boot you, and will get rid of a lot of the cruft
>> from the present bloated kernels.
>
> Made a minimal 3.7.1 kernel, much smaller and compiled nice and fast.
> Hung just like the bloated one, drat.
>
> So I guess I will read up on bisecting. I know the principle, but have
> never tried it. I suppose one starting point is make sure a
> pure-vanilla 3.6.10 kernel boots.

This is what I would try:

Do a lspci -k from whatever Linux you can boot, sysrescue CD or stick comes to mind here. That should list the drivers you need for hardware. Then mount partitions so you can get to /usr/src/kernel here and cat the config file and make sure the results from lspci are built INTO the kernel, not modules but built INTO the kernel. You could even do:

    cat .config | grep -i <driver name from lspci -k here>

Repeat that for each driver. Remember, arrow up keys for that one. Saves you some typing. lol

If you have those built in, the only thing to check then is that the file system for / is also built INTO the kernel. That has always got me to at least a console login. Some other hardware may not work but you can boot and fix from inside the OS instead of booting DVD, USB stick or whatever and having to mount and such. That is such a pain to do.

Maybe that will help. At least get you to a console. That alone makes fixing something else easier.

Dale

:-) :-)

--
I am only responsible for what I said ... Not for what you understood or how you interpreted my words!
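The grep check described above can be wrapped so it tells built-in (=y) apart from module (=m) at a glance. A minimal sketch follows; the helper name `config_state` is made up for illustration, and it assumes the driver name appears somewhere in the option name (true for sata_mv and CONFIG_SATA_MV, but not guaranteed for every driver).

```shell
#!/bin/sh
# config_state is a hypothetical helper: given a kernel .config path and a
# driver name fragment, it prints the matching CONFIG_ lines that are set to
# y (built in) or m (module). Options that are "not set" print nothing.
config_state() {
    grep -i -E "^CONFIG_[A-Z0-9_]*$2[A-Z0-9_]*=(y|m)\$" "$1"
}

# Example loop over the drivers reported by lspci; the .config path is a
# placeholder for wherever the kernel source tree is mounted:
#   for d in pata_amd sata_mv sata_sil aic79xx; do
#       config_state /path/to/kernel/.config "$d"
#   done
```

Any line ending in =m is a candidate for switching to built in, following the advice above.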
Re: [gentoo-user] 3.7.1 SATA errors
On Tue, Dec 25, 2012 at 04:20:23PM -0600, Dale wrote: This is what I would try: ... Maybe that will help. At least get you to a console. That alone makes fixing something else easier. Checked all that -- it boots into the same ATA driver failures as the bloated version of the kernel. Even have to power off and wait a while before it resets properly for a 3.6.10 reboot. So I think it is bisecting for me. -- ... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._. Felix Finch: scarecrow repairman rocket surgeon / fe...@crowfix.com GPG = E987 4493 C860 246C 3B1E 6477 7838 76E9 182E 8151 ITAR license #4933 I've found a solution to Fermat's Last Theorem but I see I've run out of room o
Re: [gentoo-user] 3.7.1 SATA errors
On Sun, Dec 23, 2012 at 1:23 PM, fe...@crowfix.com wrote: Google does not enlighten me. One suggestion was change the SATA cable, but this is definitely a change from 3.6.10 to 3.7.1. I can't find where I read it, but just yesterday I was reading a somewhat recent LKML post which mentioned SATA errors introduced in 3.7.x series, especially problems with JMicron controllers (surprise, surprise), but perhaps others as well, and also some new warnings thrown out in the kernel log that didn't used to be there. Sorry I have nothing more than that anecdote...
Re: [gentoo-user] 3.7.1 SATA errors
fe...@crowfix.com wrote: On Tue, Dec 25, 2012 at 04:20:23PM -0600, Dale wrote: This is what I would try: ... Maybe that will help. At least get you to a console. That alone makes fixing something else easier. Checked all that -- it boots into the same ATA driver failures as the bloated version of the kernel. Even have to power off and wait a while before it resets properly for a 3.6.10 reboot. So I think it is bisecting for me. Is it possible that you have two SATA drivers enabled and the two conflict each other? I read, I think on this list, where someone had to disable one driver for the correct driver to work. You may want to go here: http://kmuto.jp/debian/hcl/ Get the driver list from that and try it. On that site, you can use lspci or look up by model brand on the left. This is weird. If I think of anything else, I'll post but I'm sort of stumped. My previous post always gets me to a console login if nothing else. Once you get that, you can work out the rest. One other thought, you tried a even more recent kernel version? Maybe that version is bad or something. Back to my stump. Dale :-) :-) -- I am only responsible for what I said ... Not for what you understood or how you interpreted my words!
Re: [gentoo-user] 3.7.1 SATA errors
On Tue, Dec 25, 2012 at 06:03:12PM -0600, Dale wrote: Is it possible that you have two SATA drivers enabled and the two conflict each other? I read, I think on this list, where someone had to disable one driver for the correct driver to work. You may want to go here: http://kmuto.jp/debian/hcl/ I'm not sure what good it would do me to find driver incompatibility like that, since I need all the drivers working at once, and I'd still have to bisect them. One other thought, you tried a even more recent kernel version? Maybe that version is bad or something. Back to my stump. 3.7.0 failed, then 3.7.1. I haven't tried anything more recent. I'm trying to download the kernel git, but my satellite link is kinda slow, and it's snowing on and off and temporarily clogging the dish until the heater kicks in. Once I get it downloaded, I'll make sure the 3.6.10 equivalent works and the 3.7.0 fails, then start bisecting. -- ... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._. Felix Finch: scarecrow repairman rocket surgeon / fe...@crowfix.com GPG = E987 4493 C860 246C 3B1E 6477 7838 76E9 182E 8151 ITAR license #4933 I've found a solution to Fermat's Last Theorem but I see I've run out of room o
Re: [gentoo-user] 3.7.1 SATA errors
On Tue, Dec 25, 2012 at 01:11:04PM +0100, Florian Philipp wrote:
> The best way to find out what's wrong is to bisect the kernel, i.e.
> finding the exact commit that caused the issue to appear.
> http://wiki.gentoo.org/wiki/Kernel_git-bisect

Got the repository cloned:

    # git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-stable

Tried to start the bisect, but ran into a problem:

    # git bisect start
    # git bisect bad v3.7.0
    fatal: Needed a single revision
    Bad rev input: v3.7.0

Tried v3.7.0.0 for fun, same error. Tried good first, guessing it can't do much harm that a git bisect reset can't fix.

    # git bisect good v3.6.10
    a63a7cf3fc2ac1aff657f58ea446c34f3252209a was both good and bad
    # git bisect bad v3.7.0
    fatal: Needed a single revision
    Bad rev input: v3.7.0

Have I grabbed a repository which doesn't include 3.7.0? Google research continues.

--
... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._.
Felix Finch: scarecrow repairman rocket surgeon / fe...@crowfix.com
GPG = E987 4493 C860 246C 3B1E 6477 7838 76E9 182E 8151 ITAR license #4933
I've found a solution to Fermat's Last Theorem but I see I've run out of room o
Re: [gentoo-user] 3.7.1 SATA errors
On Sun, Dec 23, 2012 at 11:23:35AM -0800, fe...@crowfix.com wrote: snip, whack, d200d, cough, spit Puhleeeze don't put such long stuff in an email. Have you heard of attachments? pastebins? Your dropbox postings lost me after reading: Please enable browser-cookies to use the Dropbox website. -- Happy Penguin Computers ') 126 Fenco Drive ( \ Tupelo, MS 38801 ^^ supp...@happypenguincomputers.com 662-269-2706 662-205-6424 http://happypenguincomputers.com/ Don't top-post: http://en.wikipedia.org/wiki/Top_post#Top-posting
Re: [gentoo-user] 3.7.1 SATA errors
On Mon, Dec 24, 2012 at 08:35:20AM -0600, Bruce Hill wrote:
> Puhleeeze don't put such long stuff in an email. Have you heard of
> attachments? pastebins?

I was under the impression that gentoo strips attachments. At any rate, I summarized as much as possible and only put the full logs at the end.

As for the cookies, *shrug*, so many sites require cookies and/or javascript these days that I won't waste my time trying to find one that doesn't. I just make sure they are temporary.

--
... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._.
Felix Finch: scarecrow repairman rocket surgeon / fe...@crowfix.com
GPG = E987 4493 C860 246C 3B1E 6477 7838 76E9 182E 8151 ITAR license #4933
I've found a solution to Fermat's Last Theorem but I see I've run out of room o
Re: [gentoo-user] 3.7.1 SATA errors
fe...@crowfix.com wrote: On Mon, Dec 24, 2012 at 08:35:20AM -0600, Bruce Hill wrote: Puhleeeze don't put such long stuff in an email. Have you heard of attachments? pastebins? I was under the impression that gentoo strips attachments. At any rate, I summarized as much as possible and only put the the full logs at the end. As for the cookies, shrug so many sites require cookies and/or javascript these days that I won't waste my time trying to find one that doesn't. I just make sure they are temporary. One bad thing about paste bins, they get removed. Most people on this list prefer them included or attached. That way the error is always available for future reference in the archives. If it is on a paste bin site and it gets removed, then that reference is gone, usually forever. I might add, I don't have a paste bin account either. ;-) Dale :-) :-) -- I am only responsible for what I said ... Not for what you understood or how you interpreted my words!
Re: [gentoo-user] 3.7.1 SATA errors
On Mon, Dec 24, 2012 at 07:41:10AM -0800, fe...@crowfix.com wrote:
> I was under the impression that gentoo strips attachments. At any
> rate, I summarized as much as possible and only put the full logs at
> the end.
>
> As for the cookies, shrug so many sites require cookies and/or
> javascript these days that I won't waste my time trying to find one

Would you consider our own pastebin from portage?

    emerge -av app-text/wgetpaste
    wgetpaste /path/to/3.6/.config /path/to/3.7/.config

You can pastebin them both at the same time, in the same paste, and include a link. I ask for both because there might be other options other than the ones you noted, and we can use vimdiff on the two files side-by-side, which IMO makes it very easy to see the differences.

Also can you dmesg | wgetpaste and note the uname -srm output?

Thanks,
Bruce

--
Happy Penguin Computers               ')
126 Fenco Drive                       ( \
Tupelo, MS 38801                      ^^
supp...@happypenguincomputers.com
662-269-2706 662-205-6424
http://happypenguincomputers.com/

Don't top-post: http://en.wikipedia.org/wiki/Top_post#Top-posting
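For readers without vimdiff at hand, a plain diff of the two configs gives a similar overview. This is a minimal sketch, not the method used in the thread; the helper name `config_diff` is made up for illustration, and it compares only the lines that actually set an option, so options that are merely commented out as "not set" are ignored.

```shell
#!/bin/sh
# config_diff is a hypothetical helper: it prints the set options that differ
# between two kernel .config files, ignoring comments and ordering.
config_diff() {
    old=$(mktemp)
    new=$(mktemp)
    grep '^CONFIG_' "$1" | sort > "$old"
    grep '^CONFIG_' "$2" | sort > "$new"
    # diff exits 1 when the files differ; that is the expected case here.
    diff "$old" "$new" || true
    rm -f "$old" "$new"
}

# Example with placeholder paths, as in the message above:
#   config_diff /path/to/3.6/.config /path/to/3.7/.config
```

Lines prefixed with `<` come from the first file and lines prefixed with `>` from the second, so a changed option shows up once on each side.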
Re: [gentoo-user] 3.7.1 SATA errors
On Mon, Dec 24, 2012 at 10:07:04AM -0600, Bruce Hill wrote: Would you consider our own pastebin from portage? Sure, in progress. I'll have to read up on this pastebin stuff. -- ... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._. Felix Finch: scarecrow repairman rocket surgeon / fe...@crowfix.com GPG = E987 4493 C860 246C 3B1E 6477 7838 76E9 182E 8151 ITAR license #4933 I've found a solution to Fermat's Last Theorem but I see I've run out of room o
Re: [gentoo-user] 3.7.1 SATA errors
On Mon, Dec 24, 2012 at 10:07:04AM -0600, Bruce Hill wrote:
>     emerge -av app-text/wgetpaste
>     wgetpaste /path/to/3.6/.config /path/to/3.7/.config

3.6.10 .config -- http://bpaste.net/show/66307/
3.7.1 .config -- http://bpaste.net/show/66309/

> Also can you dmesg | wgetpaste and note the uname -srm output?

3.6.10 dmesg -- http://bpaste.net/show/66310/
uname -srm: Linux 3.6.10-gentoo x86_64

A couple of others:

My partial transcription of the 3.7.1 boot error messages: http://bpaste.net/show/66311/
3.6.10 emerge --info: http://bpaste.net/show/66312/

I also added all this to the Dropbox dir.

--
... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._.
Felix Finch: scarecrow repairman rocket surgeon / fe...@crowfix.com
GPG = E987 4493 C860 246C 3B1E 6477 7838 76E9 182E 8151 ITAR license #4933
I've found a solution to Fermat's Last Theorem but I see I've run out of room o
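A transcription like the one linked above can be summarized per device, which makes it easier to see whether every port fails or only those behind one controller. The following is a sketch under assumptions: the helper name `failing_ports` is made up, and it relies on the "failed to set xfermode" wording quoted in this thread.

```shell
#!/bin/sh
# failing_ports is a hypothetical helper: it reads kernel log text on stdin and
# prints each ATA device that logged "failed to set xfermode", with a count.
failing_ports() {
    sed -n 's/^.*\(ata[0-9][0-9]*\.[0-9][0-9]*\): failed to set xfermode.*$/\1/p' \
        | sort | uniq -c | awk '{ print $2, $1 }'
}

# Example with a saved transcription (the file name is a placeholder):
#   failing_ports < boot-3.7.1.txt
```

Comparing the reported device numbers with the controller each port belongs to might narrow the problem down to one driver before any bisecting is done.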
Re: [gentoo-user] 3.7.1 SATA errors
On Mon, Dec 24, 2012 at 07:41:10AM -0800, fe...@crowfix.com wrote: I was under the impression that gentoo strips attachments. At any rate, I summarized as much as possible and only put the the full logs at the end. Looks like the attachments got thru. I will try to remember that. -- ... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._. Felix Finch: scarecrow repairman rocket surgeon / fe...@crowfix.com GPG = E987 4493 C860 246C 3B1E 6477 7838 76E9 182E 8151 ITAR license #4933 I've found a solution to Fermat's Last Theorem but I see I've run out of room o
Re: [gentoo-user] 3.7.1 SATA errors
On Mon, Dec 24, 2012 at 6:35 AM, Bruce Hill da...@happypenguincomputers.com wrote: On Sun, Dec 23, 2012 at 11:23:35AM -0800, fe...@crowfix.com wrote: snip, whack, d200d, cough, spit Puhleeeze don't put such long stuff in an email. Have you heard of attachments? pastebins? Felix, Personally, after years reading LKML, I have no problem with in-line text of _any_ length, especially on the initial post or when you are asked to respond with detailed info. While I understand Bruce's comment I don't think it represents a democratic picture of what this list has been comfortable with over the years. That said, what I do have a BIG problem with is people responding and not taking the time to edit the response down to a few lines that make it clear about what their point is. Many responses to 1000 line emails are 1001 lines - the responder adds a one-liner. That's a real waste. It's a trade off. It's less likely that some of us will go read pastebin stuff, and if we want to respond technically then that's leaving us to copy/paste responses which I'm personally less likely to do. Anyway, you pays your money, you takes your chance... ;-) Cheers, Mark
Re: [gentoo-user] 3.7.1 SATA errors
On Mon, Dec 24, 2012 at 1:21 PM, Mark Knecht markkne...@gmail.com wrote: On Mon, Dec 24, 2012 at 6:35 AM, Bruce Hill da...@happypenguincomputers.com wrote: On Sun, Dec 23, 2012 at 11:23:35AM -0800, fe...@crowfix.com wrote: snip, whack, d200d, cough, spit Puhleeeze don't put such long stuff in an email. Have you heard of attachments? pastebins? Felix, Personally, after years reading LKML, I have no problem with in-line text of _any_ length, especially on the initial post or when you are asked to respond with detailed info. While I understand Bruce's comment I don't think it represents a democratic picture of what this list has been comfortable with over the years. Agreed. That said, what I do have a BIG problem with is people responding and not taking the time to edit the response down to a few lines that make it clear about what their point is. Many responses to 1000 line emails are 1001 lines - the responder adds a one-liner. That's a real waste. Guilty. To be fair, I try to properly snip and edit when I can, but if I'm responding from my phone (more often than not, of late), getting that kind of editing work in is very difficult. -- :wq
[gentoo-user] 3.7.1 SATA errors
A few weeks ago I had a scare when a reboot panicked the kernel with a complaint that it could not find the root device (/dev/sde), and further reboots couldn't even see the USB keyboard. Leaving the system powered off overnight fixed the problem and the system has been working fine ever since.

I have since had some time to explore this and find it related to the kernel; 3.6.10 works fine, while 3.7.1 fails. If I reset during the 3.7.1 boot while it is spewing its error messages, but before the kernel ultimately panics, I can reboot with 3.6.10, but if 3.7.1 goes all the way to the panic, I have to power off and wait a few minutes before a 3.6.10 reboot is successful. This is repeatable, but I haven't bothered to see how long the system must be off; a few minutes is enough.

This is a ~amd64 system, dual Opterons, Tyan S2882, Thunder K8S Pro. The dmesg times here start around 30 seconds because it spends 15 seconds on each of two SCSI hosts probing for nonexistent drives. udev etc are all frozen pre-systemd nonsense. Disks are two SSDs, two 4T drives, two 300G drives, and one 320G IDE/PATA drive; the main board is so old that there are only three boot options: IDE, DVD, network.

There are two error messages during the 3.7.1 boot, repeated for all SATA drives:

    ata5.00: qc timeout (cmd 0x2f)
    ata5.00: failed to set xfermode (err_mask=0x40)

Google does not enlighten me. One suggestion was change the SATA cable, but this is definitely a change from 3.6.10 to 3.7.1.

So here are some details ... You can see everything at

    https://www.dropbox.com/sh/o8j80rps3agvvcf/FBjJLcykRS

I am willing to try reasonable config changes for a new reboot attempt, but it is my main home server, not an experimental toy :-)

dmesg differences

I took some pictures during the boot process and transcribed the results. The 3.6.10 dmesg matches, but of course I can't get a 3.7.1 dmesg.
Both 3.6.10 and 3.7.1 appear to be the same up to this point:

    ata13.00: ATA-8: WDC WD3200AAJB-00J3A0, 01.03E01, max UDMA/133
    ata13.00: 625142448 sectors, multi 16: LBA48
    ata13.00: configured for UDMA/133
    ata1: SATA link down (SStatus 0 SControl 300)
    ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
    ata9.00: ATA-9: M4-CT512M4SD2, 000F, max UDMA/100
    ata9.00: 1000215216 sectors, multi 16: LBA48 NCQ (depth 0/32)
    ata9.00: configured for UDMA/100
    ata2: SATA link down (SStatus 0 SControl 300)
    ata3: SATA link down (SStatus 0 SControl 300)
    ata4: SATA link down (SStatus 0 SControl 300)
    ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
    ata5.00: ATA-7: Maxtor 6B300S0, BANC17M0, max UDMA/133
    ata5.00: 586114704 sectors, multi 0: LBA48 NCQ (not used)

Around here 3.6.10 begins scrolling so fast that I could not get any pictures, so this is from the 3.6.10 dmesg, where it diverges from 3.7.1:

    ata5.00: configured for UDMA/133
    scsi 6:0:0:0: Direct-Access ATA Maxtor 6B300S0 BANC PQ: 0 ANSI: 5
    sd 6:0:0:0: [sda] 586114704 512-byte logical blocks: (300 GB/279 GiB)
    sd 6:0:0:0: [sda] Write Protect is off
    sd 6:0:0:0: [sda] Mode Sense: 00 3a 00 00
    sd 6:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
    sda:
    sd 6:0:0:0: [sda] Attached SCSI disk
    ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
    ata6.00: ATA-7: Maxtor 6B300S0, BANC17M0, max UDMA/133
    ata6.00: 586114704 sectors, multi 0: LBA48 NCQ (not used)
    ata6.00: configured for UDMA/133
    scsi 7:0:0:0: Direct-Access ATA Maxtor 6B300S0 BANC PQ: 0 ANSI: 5
    sd 7:0:0:0: [sdb] 586114704 512-byte logical blocks: (300 GB/279 GiB)
    sd 7:0:0:0: [sdb] Write Protect is off
    sd 7:0:0:0: [sdb] Mode Sense: 00 3a 00 00
    sd 7:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
    sdb: unknown partition table
    sd 7:0:0:0: [sdb] Attached SCSI disk

and on and on until it boots. (The unknown partition table is an LVM volume.)
But 3.7.1 pokes along slowly enough while generating its errors that I did get some pictures to transcribe, and this is where it diverges from 3.6.10.

    ata5.00: qc timeout (cmd 0x2f)
    ata5.00: failed to set xfermode (err_mask=0x40)
    ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
    ata5.00: qc timeout (cmd 0x2f)
    ata5.00: failed to set xfermode (err_mask=0x40)
    ata5: limiting SATA link speed to 1.5 Gbps
    ata5.00: limiting speed to UDMA/133:PIO3
    ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
    ata5.00: qc timeout (cmd 0x2f)
    ata5.00: failed to set xfermode (err_mask=0x40)
    ata5.00: disabled
    ata5: hard resetting link
    ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
    ata5: EH complete

... for all ATA drives until it eventually panics because the root device, /dev/sde, is not found.

3.6.10 --- 3.7.1 conf changes

I rebuilt the 3.7.1 kernel and logged all