Re: [gentoo-user] 3.7.1 SATA errors

2012-12-26 Thread Florian Philipp
On 26.12.2012 02:11, fe...@crowfix.com wrote:
 On Tue, Dec 25, 2012 at 01:11:04PM +0100, Florian Philipp wrote:
 
 The best way to find out what's wrong is to bisect the kernel, i.e.
 finding the exact commit that caused the issue to appear.

 http://wiki.gentoo.org/wiki/Kernel_git-bisect
 
 Got the repository cloned:
 
 # git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-stable
 
 Tried to start the bisect, but ran into a problem:
 
 # git bisect start
 # git bisect bad v3.7.0
 fatal: Needed a single revision
 Bad rev input: v3.7.0
 
 Tried v3.7.0.0 for fun, same error.
 
 Tried good first, guessing it can't do much harm that a git bisect reset 
 can't fix.
 
 # git bisect good v3.6.10
 a63a7cf3fc2ac1aff657f58ea446c34f3252209a was both good and bad
 # git bisect bad v3.7.0
 fatal: Needed a single revision
 Bad rev input: v3.7.0
 
 Have I grabbed a repository which doesn't include 3.7.0?
 
 Google research continues.
 

`git tag` should give you a list of version numbers. The tag you are
searching for is v3.7.
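One way to check a tag name before handing it to bisect is sketched below. Note that `tag_exists` is a hypothetical helper name, not a git command; it only wraps `git rev-parse --verify`, and the usage lines assume you are inside the linux-stable clone.

```shell
# tag_exists is a hypothetical helper, not part of git.
# It succeeds only if the given name resolves to a tag in the current repository.
tag_exists() {
    git rev-parse --verify --quiet "refs/tags/$1" >/dev/null
}

# Typical use inside the linux-stable clone (not run here):
#   git tag -l 'v3.7*'      # list the tags that match; v3.7 exists, v3.7.0 does not
#   tag_exists v3.7 && git bisect start && git bisect bad v3.7 && git bisect good v3.6.10
```

With the tag names confirmed, the bisect commands from the earlier attempt should go through unchanged apart from the corrected name.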

Regards,
Florian Philipp





Re: [gentoo-user] 3.7.1 SATA errors

2012-12-26 Thread felix
On Wed, Dec 26, 2012 at 12:56:39PM +0100, Florian Philipp wrote:

 `git tag` should give you a list of version numbers. The tag you are
 searching for is v3.7.

Thanks -- power went out, standby generator kicked in and woke me up
at 0430, and I woke realizing that.  Bisect is happy.  My git-fu is
weak, since I mostly use it for personal projects.  Work only uses
subversion, blecch.  Didn't know about git tag, and git bisect help
doesn't mention it.

-- 
... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._.
 Felix Finch: scarecrow repairman  rocket surgeon / fe...@crowfix.com
  GPG = E987 4493 C860 246C 3B1E  6477 7838 76E9 182E 8151 ITAR license #4933
I've found a solution to Fermat's Last Theorem but I see I've run out of room o



Re: [gentoo-user] 3.7.1 SATA errors

2012-12-25 Thread Florian Philipp
On 23.12.2012 20:23, fe...@crowfix.com wrote:
 
 I have since had some time to explore this and find it related to the
 kernel; 3.6.10 works fine, while 3.7.1 fails.  If I reset during the
 3.7.1 boot while it is spewing its error messages, but before the
 kernel ultimately panics, I can reboot with 3.6.10, but if 3.7.1 goes
 all the way to the panic, I have to power off and wait a few minutes
 before a 3.6.10 reboot is successful.  This is repeatable, but I
 haven't bothered to see how long the system must be off; a few
 minutes is enough.
 
 There are two error messages during the 3.7.1 boot, repeated for all
 SATA drives:
 
 ata5.00: qc timeout (cmd 0x2f) ata5.00: failed to set xfermode
 (err_mask=0x40)
 

The code that prints these messages has not been changed since 2011 so I
guess it is a driver issue. You never posted which driver you use
exactly and your kernel config enables all. Therefore I cannot look further.

The best way to find out what's wrong is to bisect the kernel, i.e.
finding the exact commit that caused the issue to appear.

http://wiki.gentoo.org/wiki/Kernel_git-bisect

Unfortunately, there have been 1545 commits between 3.6 and 3.7. With
blind bisection you need about 11 kernels (log2 of 1545) to find the issue.
Maybe `git log` can give you a hint which commits might be relevant.
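As a rough check on how many builds a bisection takes: each step halves the remaining range, so the count is about log2 of the number of commits. The sketch below works that out in plain shell arithmetic; `bisect_steps` is a hypothetical helper name, and git's own estimate (it prints "roughly N steps" as it goes) may differ by one.

```shell
# bisect_steps N: smallest k such that 2^k >= N, i.e. ceil(log2(N)).
# Hypothetical helper for estimation only; git bisect does its own counting.
bisect_steps() {
    n=$1
    steps=0
    span=1
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        steps=$((steps + 1))
    done
    echo "$steps"
}

bisect_steps 1545   # the commit count quoted in this thread
```

For 1545 commits this prints 11, since 2^10 = 1024 is too small and 2^11 = 2048 covers the range.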

Regards,
Florian Philipp





Re: [gentoo-user] 3.7.1 SATA errors

2012-12-25 Thread Bruce Hill
On Mon, Dec 24, 2012 at 08:53:33AM -0800, fe...@crowfix.com wrote:
 On Mon, Dec 24, 2012 at 10:07:04AM -0600, Bruce Hill wrote:
 
  emerge -av app-text/wgetpaste && wgetpaste /path/to/3.6/.config
  /path/to/3.7/.config
 
 3.6.10 .config -- http://bpaste.net/show/66307/
 3.7.1 .config  -- http://bpaste.net/show/66309/
 
  Also can you dmesg | wgetpaste and note the uname -srm output?
 
 3.6.10 dmesg   -- http://bpaste.net/show/66310/
 
 uname -srm: Linux 3.6.10-gentoo x86_64
 
 A couple of others:
 
 My partial transcription of the 3.7.1 boot error messages: 
 http://bpaste.net/show/66311/
 
 3.6.10 emerge --info: http://bpaste.net/show/66312/
 
 I also added all this to the Dropbox dir.

We're on the road, getting ready to pack, and not in a good position to do
much on this issue atm.

I would suggest you run lspci -nnk with your running 3.6.10 kernel and save
that output. Then go into the kernel source directory for 3.7.1, run make
mrproper then make defconfig and enable all the kernel drivers listed in
the lspci -nnk output, as well as the drivers for your IDE/SATA controllers,
and / filesystem. That kernel should boot you, and will get rid of a lot of
the cruft from the present bloated kernels.
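Pulling the driver names out of the saved lspci output can be done with a one-line filter. This is only a sketch: `drivers_in_use` is a hypothetical helper name, and it assumes the usual "Kernel driver in use:" lines that `lspci -nnk` (or `lspci -k`) prints.

```shell
# drivers_in_use is a hypothetical helper: it reads `lspci -nnk` text on stdin
# and prints each distinct "Kernel driver in use" value once, sorted.
drivers_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use:[[:space:]]*//p' | sort -u
}

# Typical use on the working 3.6.10 system (not run here):
#   lspci -nnk > lspci-3.6.10.txt
#   drivers_in_use < lspci-3.6.10.txt
```

The resulting list is what needs enabling on top of `make defconfig`, along with the / filesystem.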
-- 
Happy Penguin Computers   ')
126 Fenco Drive   ( \
Tupelo, MS 38801   ^^
supp...@happypenguincomputers.com
662-269-2706 662-205-6424
http://happypenguincomputers.com/

Don't top-post: http://en.wikipedia.org/wiki/Top_post#Top-posting



Re: [gentoo-user] 3.7.1 SATA errors

2012-12-25 Thread Todd Goodman
* Florian Philipp li...@binarywings.net [121225 07:16]:
 On 23.12.2012 20:23, fe...@crowfix.com wrote:
  
  I have since had some time to explore this and find it related to the
  kernel; 3.6.10 works fine, while 3.7.1 fails.  If I reset during the
  3.7.1 boot while it is spewing its error messages, but before the
  kernel ultimately panics, I can reboot with 3.6.10, but if 3.7.1 goes
  all the way to the panic, I have to power off and wait a few minutes
  before a 3.6.10 reboot is successful.  This is repeatable, but I
  haven't bothered to see how long the system must be off; a few
  minutes is enough.
  
  There are two error messages during the 3.7.1 boot, repeated for all
  SATA drives:
  
  ata5.00: qc timeout (cmd 0x2f) ata5.00: failed to set xfermode
  (err_mask=0x40)
  
 
 The code that prints these messages has not been changed since 2011 so I
 guess it is a driver issue. You never posted which driver you use
 exactly and your kernel config enables all. Therefore I cannot look further.
 
 The best way to find out what's wrong is to bisect the kernel, i.e.
 finding the exact commit that caused the issue to appear.
 
 http://wiki.gentoo.org/wiki/Kernel_git-bisect
 
 Unfortunately, there have been 1545 commits between 3.6 and 3.7. With
 blind bisection you need about 11 kernels (log2 of 1545) to find the issue.
 Maybe `git log` can give you a hint which commits might be relevant.
 
 Regards,
 Florian Philipp
 

A me too on the problem the original poster is seeing.

I too am seeing this on a server I have.  3.7.0 and 3.7.1 both don't work
but 3.6.10 works fine.

I'm using the sata_mv driver with SuperMicro cards (two, actually) based on
the Marvell MV88SX6081.  These chips and their driver have had some issues
in the past.

I also looked for changes in the driver and didn't see any.  Though I
did see some libata changes.

I haven't had time to do a git bisect yet.

Todd




Re: [gentoo-user] 3.7.1 SATA errors

2012-12-25 Thread felix
On Tue, Dec 25, 2012 at 08:56:56AM -0600, Bruce Hill wrote:

 We're on the road, getting ready to pack, and not in a good position to do
 much on this issue atm.

Nevertheless, a most unexpected Christmas present!  In progress, and thank you.

My dilemma certainly isn't urgent, since 3.6.10 still works.





Re: [gentoo-user] 3.7.1 SATA errors

2012-12-25 Thread felix
On Tue, Dec 25, 2012 at 10:58:54AM -0500, Todd Goodman wrote:
 A me too on the problem the original poster is seeing.
 
 I too am seeing this on a server I have.  3.7.0 and 3.7.1 both don't work
 but 3.6.10 works fine.
 
 I'm using the sata_mv driver with SuperMicro cards (two, actually) based on
 the Marvell MV88SX6081.  These chips and their driver have had some issues
 in the past.

A pruned lspci -nnk:

00:07.1 IDE interface [0101]: Advanced Micro Devices [AMD] AMD-8111 IDE [1022:7469] (rev 03)
    Subsystem: Advanced Micro Devices [AMD] AMD-8111 IDE [1022:7469]
    Kernel driver in use: pata_amd
01:03.0 SCSI storage controller [0100]: Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller [11ab:6081] (rev 09)
    Subsystem: Marvell Technology Group Ltd. Device [11ab:11ab]
    Kernel driver in use: sata_mv
02:06.0 SCSI storage controller [0100]: Adaptec AIC-7902B U320 [9005:801d] (rev 10)
    Subsystem: Adaptec Device [9005:005e]
    Kernel driver in use: aic79xx
02:06.1 SCSI storage controller [0100]: Adaptec AIC-7902B U320 [9005:801d] (rev 10)
    Subsystem: Adaptec Device [9005:005e]
    Kernel driver in use: aic79xx
03:05.0 Mass storage controller [0180]: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller [1095:3114] (rev 02)
    Subsystem: Silicon Image, Inc. SiI 3114 SATALink Controller [1095:3114]
    Kernel driver in use: sata_sil

pata_amd  /dev/sdg     320G, for boot, which seems happy

sata_mv   /dev/sd[ab]  2 x 300G, LVM, mounted automatically from fstab
          /dev/sd[cd]  2 x 4T, ditto

sata_sil  /dev/sde     512G SSD with / and swap
          /dev/sdf     512G SSD with LVM for /home, /encfs, and mail spool

aic79xx   no drives

The sata_mv drives are not necessary for boot, but they do take up
/dev/sd? namespace.  Might be interesting to try Bruce Hill's idea of
a pruned 3.7.1 kernel without that driver.




Re: [gentoo-user] 3.7.1 SATA errors

2012-12-25 Thread felix
On Tue, Dec 25, 2012 at 08:56:56AM -0600, Bruce Hill wrote:

 I would suggest you run lspci -nnk with your running 3.6.10 kernel and save
 that output. Then go into the kernel source directory for 3.7.1, run make
 mrproper then make defconfig and enable all the kernel drivers listed in
 the lspci -nnk output, as well as the drivers for your IDE/SATA controllers,
 and / filesystem. That kernel should boot you, and will get rid of a lot of
 the cruft from the present bloated kernels.

Made a minimal 3.7.1 kernel, much smaller and compiled nice and fast.
Hung just like the bloated one, drat.

So I guess I will read up on bisecting.  I know the principle, but
have never tried it.  I suppose one starting point is make sure a
pure-vanilla 3.6.10 kernel boots.




Re: [gentoo-user] 3.7.1 SATA errors

2012-12-25 Thread Dale
fe...@crowfix.com wrote:
 On Tue, Dec 25, 2012 at 08:56:56AM -0600, Bruce Hill wrote:

 I would suggest you run lspci -nnk with your running 3.6.10 kernel and save
 that output. Then go into the kernel source directory for 3.7.1, run make
 mrproper then make defconfig and enable all the kernel drivers listed in
 the lspci -nnk output, as well as the drivers for your IDE/SATA 
 controllers,
 and / filesystem. That kernel should boot you, and will get rid of a lot of
 the cruft from the present bloated kernels.
 Made a minimal 3.7.1 kernel, much smaller and compiled nice and fast.
 Hung just like the bloated one, drat.

 So I guess I will read up on bisecting.  I know the principle, but
 have never tried it.  I suppose one starting point is make sure a
 pure-vanilla 3.6.10 kernel boots.


This is what I would try:

Do an lspci -k from whatever Linux you can boot; a sysrescue CD or stick
comes to mind here.  That should list the drivers you need for your
hardware.  Then mount partitions so you can get to the kernel source
under /usr/src, and check the config file to make sure the drivers from
lspci are built INTO the kernel, not modules but built INTO the kernel.
You could even do:  'grep -i <driver name from lspci -k> .config'
Repeat that for each driver.  Remember the up-arrow key for that one.
Saves you some typing.  lol
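The built-in versus module check can be scripted along these lines. It is a sketch under assumptions: `config_state` is a hypothetical helper name, the path /usr/src/linux is the usual Gentoo symlink and may differ on your system, and the CONFIG_ symbol names in the usage comment are the usual ones for these drivers but should be confirmed in menuconfig.

```shell
# config_state is a hypothetical helper, not a kernel tool.
# Usage: config_state /path/to/.config SYMBOL
# Prints "built-in" for SYMBOL=y, "module" for SYMBOL=m, "off" otherwise.
config_state() {
    if grep -q "^$2=y\$" "$1"; then
        echo built-in
    elif grep -q "^$2=m\$" "$1"; then
        echo module
    else
        echo off
    fi
}

# Typical use (not run here); path and symbol names are assumptions:
#   for s in CONFIG_SATA_MV CONFIG_SATA_SIL CONFIG_PATA_AMD; do
#       printf '%s: %s\n' "$s" "$(config_state /usr/src/linux/.config "$s")"
#   done
```

Anything needed to reach the root filesystem that reports "module" or "off" is a candidate to switch to built-in.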

If you have those built in, the only thing to check then is that the
file system for / is also built INTO the kernel.  That has always got me
to at least a console login.  Some other hardware may not work but you
can boot and fix from inside the OS instead of booting DVD, USB stick or
whatever and having to mount and such.  That is such a pain to do. 

Maybe that will help.  At least get you to a console.  That alone makes
fixing something else easier. 

Dale

:-)  :-) 

-- 
I am only responsible for what I said ... Not for what you understood or how 
you interpreted my words!




Re: [gentoo-user] 3.7.1 SATA errors

2012-12-25 Thread felix
On Tue, Dec 25, 2012 at 04:20:23PM -0600, Dale wrote:

 This is what I would try:
 ...
 Maybe that will help.  At least get you to a console.  That alone makes
 fixing something else easier. 

Checked all that -- it boots into the same ATA driver failures as the
bloated version of the kernel.  Even have to power off and wait a
while before it resets properly for a 3.6.10 reboot.  So I think it is
bisecting for me.




Re: [gentoo-user] 3.7.1 SATA errors

2012-12-25 Thread Paul Hartman
On Sun, Dec 23, 2012 at 1:23 PM,  fe...@crowfix.com wrote:
 Google does not enlighten me.  One suggestion was change the SATA cable, but 
 this is definitely a change from 3.6.10 to 3.7.1.

I can't find where I read it, but just yesterday I was reading a
somewhat recent LKML post which mentioned SATA errors introduced in
3.7.x series, especially problems with JMicron controllers (surprise,
surprise), but perhaps others as well, and also some new warnings
thrown out in the kernel log that didn't used to be there. Sorry I
have nothing more than that anecdote...



Re: [gentoo-user] 3.7.1 SATA errors

2012-12-25 Thread Dale
fe...@crowfix.com wrote:
 On Tue, Dec 25, 2012 at 04:20:23PM -0600, Dale wrote:

 This is what I would try:
 ...
 Maybe that will help.  At least get you to a console.  That alone makes
 fixing something else easier. 
 Checked all that -- it boots into the same ATA driver failures as the
 bloated version of the kernel.  Even have to power off and wait a
 while before it resets properly for a 3.6.10 reboot.  So I think it is
 bisecting for me.


Is it possible that you have two SATA drivers enabled and the two
conflict each other?  I read, I think on this list, where someone had to
disable one driver for the correct driver to work.  You may want to go here:

http://kmuto.jp/debian/hcl/

Get the driver list from that and try it.  On that site, you can use
lspci or look up by model brand on the left.  This is weird.  If I think
of anything else, I'll post but I'm sort of stumped.  My previous post
always gets me to a console login if nothing else.  Once you get that,
you can work out the rest. 

One other thought: have you tried an even more recent kernel version?  Maybe
that version is bad or something.  Back to my stump. 

Dale

:-)  :-) 





Re: [gentoo-user] 3.7.1 SATA errors

2012-12-25 Thread felix
On Tue, Dec 25, 2012 at 06:03:12PM -0600, Dale wrote:

 Is it possible that you have two SATA drivers enabled and the two
 conflict each other?  I read, I think on this list, where someone had to
 disable one driver for the correct driver to work.  You may want to go here:
 
 http://kmuto.jp/debian/hcl/

I'm not sure what good it would do me to find driver incompatibility
like that, since I need all the drivers working at once, and I'd still
have to bisect them.

 One other thought: have you tried an even more recent kernel version?  Maybe
 that version is bad or something.  Back to my stump. 

3.7.0 failed, then 3.7.1.  I haven't tried anything more recent.  I'm
trying to download the kernel git, but my satellite link is kinda
slow, and it's snowing on and off and temporarily clogging the dish
until the heater kicks in.

Once I get it downloaded, I'll make sure the 3.6.10 equivalent works
and the 3.7.0 fails, then start bisecting.




Re: [gentoo-user] 3.7.1 SATA errors

2012-12-25 Thread felix
On Tue, Dec 25, 2012 at 01:11:04PM +0100, Florian Philipp wrote:

 The best way to find out what's wrong is to bisect the kernel, i.e.
 finding the exact commit that caused the issue to appear.
 
 http://wiki.gentoo.org/wiki/Kernel_git-bisect

Got the repository cloned:

# git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-stable

Tried to start the bisect, but ran into a problem:

# git bisect start
# git bisect bad v3.7.0
fatal: Needed a single revision
Bad rev input: v3.7.0

Tried v3.7.0.0 for fun, same error.

Tried good first, guessing it can't do much harm that a git bisect reset can't 
fix.

# git bisect good v3.6.10
a63a7cf3fc2ac1aff657f58ea446c34f3252209a was both good and bad
# git bisect bad v3.7.0
fatal: Needed a single revision
Bad rev input: v3.7.0

Have I grabbed a repository which doesn't include 3.7.0?

Google research continues.




Re: [gentoo-user] 3.7.1 SATA errors

2012-12-24 Thread Bruce Hill
On Sun, Dec 23, 2012 at 11:23:35AM -0800, fe...@crowfix.com wrote:
snip, whack, d200d, cough, spit

Puhleeeze don't put such long stuff in an email. Have you heard of attachments?
pastebins?

Your dropbox postings lost me after reading:

Please enable browser-cookies to use the Dropbox website.



Re: [gentoo-user] 3.7.1 SATA errors

2012-12-24 Thread felix
On Mon, Dec 24, 2012 at 08:35:20AM -0600, Bruce Hill wrote:

 Puhleeeze don't put such long stuff in an email. Have you heard of 
 attachments?
 pastebins?

I was under the impression that gentoo strips attachments.  At any
rate, I summarized as much as possible and only put the full logs
at the end.

As for the cookies, <shrug> so many sites require cookies and/or
javascript these days that I won't waste my time trying to find one
that doesn't.  I just make sure they are temporary.




Re: [gentoo-user] 3.7.1 SATA errors

2012-12-24 Thread Dale
fe...@crowfix.com wrote:
 On Mon, Dec 24, 2012 at 08:35:20AM -0600, Bruce Hill wrote:

 Puhleeeze don't put such long stuff in an email. Have you heard of 
 attachments?
 pastebins?
 I was under the impression that gentoo strips attachments.  At any
 rate, I summarized as much as possible and only put the full logs
 at the end.

 As for the cookies, <shrug> so many sites require cookies and/or
 javascript these days that I won't waste my time trying to find one
 that doesn't.  I just make sure they are temporary.


One bad thing about paste bins, they get removed.  Most people on this
list prefer them included or attached.  That way the error is always
available for future reference in the archives.  If it is on a paste bin
site and it gets removed, then that reference is gone, usually forever. 

I might add, I don't have a paste bin account either.  ;-) 

Dale

:-)  :-) 





Re: [gentoo-user] 3.7.1 SATA errors

2012-12-24 Thread Bruce Hill
On Mon, Dec 24, 2012 at 07:41:10AM -0800, fe...@crowfix.com wrote:
 
 I was under the impression that gentoo strips attachments.  At any
 rate, I summarized as much as possible and only put the full logs
 at the end.
 
 As for the cookies, <shrug> so many sites require cookies and/or
 javascript these days that I won't waste my time trying to find one

Would you consider our own pastebin from portage?

emerge -av app-text/wgetpaste && wgetpaste /path/to/3.6/.config
/path/to/3.7/.config

You can pastebin them both at the same time, in the same paste, and include a
link. I ask for both because there might be other options other than the ones
you noted, and we can use vimdiff on the two files side-by-side, which IMO
makes it very easy to see the differences.
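For a quick non-interactive comparison of the two configs, a plain diff of the sorted option lines also works. This is a sketch, not the method used in this thread: `config_diff` is a hypothetical helper name, and it ignores comment lines such as "# CONFIG_FOO is not set", so an option that was switched off shows up only on one side. The kernel tree also ships a scripts/diffconfig tool for the same purpose.

```shell
# config_diff is a hypothetical helper: it prints the option lines that differ
# between two kernel .config files, ignoring comments and line ordering.
config_diff() {
    a=$(mktemp) && b=$(mktemp) || return 1
    grep '^CONFIG_' "$1" | sort > "$a"
    grep '^CONFIG_' "$2" | sort > "$b"
    diff "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return "$status"
}

# Typical use (not run here):
#   config_diff /path/to/3.6/.config /path/to/3.7/.config
```

Lines starting with "<" come from the first file and lines starting with ">" from the second; the exit status is diff's own, so it is nonzero when the files differ.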

Also can you dmesg | wgetpaste and note the uname -srm output?

Thanks,
Bruce



Re: [gentoo-user] 3.7.1 SATA errors

2012-12-24 Thread felix
On Mon, Dec 24, 2012 at 10:07:04AM -0600, Bruce Hill wrote:

 Would you consider our own pastebin from portage?

Sure, in progress.  I'll have to read up on this pastebin stuff.




Re: [gentoo-user] 3.7.1 SATA errors

2012-12-24 Thread felix
On Mon, Dec 24, 2012 at 10:07:04AM -0600, Bruce Hill wrote:

 emerge -av app-text/wgetpaste && wgetpaste /path/to/3.6/.config
 /path/to/3.7/.config

3.6.10 .config -- http://bpaste.net/show/66307/
3.7.1 .config  -- http://bpaste.net/show/66309/

 Also can you dmesg | wgetpaste and note the uname -srm output?

3.6.10 dmesg   -- http://bpaste.net/show/66310/

uname -srm: Linux 3.6.10-gentoo x86_64

A couple of others:

My partial transcription of the 3.7.1 boot error messages: 
http://bpaste.net/show/66311/

3.6.10 emerge --info: http://bpaste.net/show/66312/

I also added all this to the Dropbox dir.




Re: [gentoo-user] 3.7.1 SATA errors

2012-12-24 Thread felix
On Mon, Dec 24, 2012 at 07:41:10AM -0800, fe...@crowfix.com wrote:
 
 I was under the impression that gentoo strips attachments.  At any
 rate, I summarized as much as possible and only put the full logs
 at the end.

Looks like the attachments got thru.  I will try to remember that.




Re: [gentoo-user] 3.7.1 SATA errors

2012-12-24 Thread Mark Knecht
On Mon, Dec 24, 2012 at 6:35 AM, Bruce Hill
da...@happypenguincomputers.com wrote:
 On Sun, Dec 23, 2012 at 11:23:35AM -0800, fe...@crowfix.com wrote:
 snip, whack, d200d, cough, spit

 Puhleeeze don't put such long stuff in an email. Have you heard of 
 attachments?
 pastebins?


Felix,
   Personally, after years reading LKML, I have no problem with
in-line text of _any_ length, especially on the initial post or when
you are asked to respond with detailed info. While I understand
Bruce's comment I don't think it represents a democratic picture of
what this list has been comfortable with over the years.

   That said, what I do have a BIG problem with is people responding
and not taking the time to edit the response down to a few lines that
make it clear about what their point is. Many responses to 1000 line
emails are 1001 lines - the responder adds a one-liner. That's a real
waste.

   It's a trade off. It's less likely that some of us will go read
pastebin stuff, and if we want to respond technically then that's
leaving us to copy/paste responses which I'm personally less likely to
do.

   Anyway, you pays your money, you takes your chance... ;-)

Cheers,
Mark



Re: [gentoo-user] 3.7.1 SATA errors

2012-12-24 Thread Michael Mol
On Mon, Dec 24, 2012 at 1:21 PM, Mark Knecht markkne...@gmail.com wrote:
 On Mon, Dec 24, 2012 at 6:35 AM, Bruce Hill
 da...@happypenguincomputers.com wrote:
 On Sun, Dec 23, 2012 at 11:23:35AM -0800, fe...@crowfix.com wrote:
 snip, whack, d200d, cough, spit

 Puhleeeze don't put such long stuff in an email. Have you heard of 
 attachments?
 pastebins?


 Felix,
Personally, after years reading LKML, I have no problem with
 in-line text of _any_ length, especially on the initial post or when
 you are asked to respond with detailed info. While I understand
 Bruce's comment I don't think it represents a democratic picture of
 what this list has been comfortable with over the years.

Agreed.


That said, what I do have a BIG problem with is people responding
 and not taking the time to edit the response down to a few lines that
 make it clear about what their point is. Many responses to 1000 line
 emails are 1001 lines - the responder adds a one-liner. That's a real
 waste.

Guilty. To be fair, I try to properly snip and edit when I can, but if
I'm responding from my phone (more often than not, of late), getting
that kind of editing work in is very difficult.

--
:wq



[gentoo-user] 3.7.1 SATA errors

2012-12-23 Thread felix
A few weeks ago I had a scare when a reboot panicked the kernel with a complaint 
that it could not find the root device (/dev/sde), and further reboots couldn't 
even see the USB keyboard.  Leaving the system powered off overnight fixed the 
problem and the system has been working fine ever since.

I have since had some time to explore this and find it related to the kernel; 
3.6.10 works fine, while 3.7.1 fails.  If I reset during the 3.7.1 boot while 
it is spewing its error messages, but before the kernel ultimately panics, I 
can reboot with 3.6.10, but if 3.7.1 goes all the way to the panic, I have to 
power off and wait a few minutes before a 3.6.10 reboot is successful.  This is 
repeatable, but I haven't bothered to see how long the system must be off; a 
few minutes is enough.

This is a ~amd64 system, dual Opterons, Tyan S2882, Thunder K8S Pro.  The dmesg 
times here start around 30 seconds because it spends 15 seconds on each of two 
SCSI hosts probing for nonexistent drives.  udev etc are all frozen pre-systemd 
nonsense.  Disks are two SSDs, two 4T drives, two 300G drives, and one 320G 
IDE/PATA drive; the main board is so old that there are only three boot 
options: IDE, DVD, network.

There are two error messages during the 3.7.1 boot, repeated for all SATA 
drives:

ata5.00: qc timeout (cmd 0x2f)
ata5.00: failed to set xfermode (err_mask=0x40)

Google does not enlighten me.  One suggestion was change the SATA cable, but 
this is definitely a change from 3.6.10 to 3.7.1.

So here are some details ... You can see everything at 
https://www.dropbox.com/sh/o8j80rps3agvvcf/FBjJLcykRS

I am willing to try reasonable config changes for a new reboot attempt, but it 
is my main home server, not an experimental toy :-)

 dmesg differences

I took some pictures during the boot process and transcribed the results.  The 
3.6.10 dmesg matches, but of course I can't get a 3.7.1 dmesg.

Both 3.6.10 and 3.7.1 appear to be the same up to this point:

ata13.00: ATA-8: WDC WD3200AAJB-00J3A0, 01.03E01, max UDMA/133
ata13.00: 625142448 sectors, multi 16: LBA48
ata13.00: configured for UDMA/133
ata1: SATA link down (SStatus 0 SControl 300)
ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata9.00: ATA-9: M4-CT512M4SD2, 000F, max UDMA/100
ata9.00: 1000215216 sectors, multi 16: LBA48 NCQ (depth 0/32)
ata9.00: configured for UDMA/100
ata2: SATA link down (SStatus 0 SControl 300)
ata3: SATA link down (SStatus 0 SControl 300)
ata4: SATA link down (SStatus 0 SControl 300)
ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata5.00: ATA-7: Maxtor 6B300S0, BANC17M0, max UDMA/133
ata5.00: 586114704 sectors, multi 0: LBA48 NCQ (not used)

Around here 3.6.10 begins scrolling so fast that I could not get any pictures, 
so this is from the 3.6.10 dmesg, where it diverges from 3.7.1:

ata5.00: configured for UDMA/133
scsi 6:0:0:0: Direct-Access ATA  Maxtor 6B300S0   BANC PQ: 0 ANSI: 5
sd 6:0:0:0: [sda] 586114704 512-byte logical blocks: (300 GB/279 GiB)
sd 6:0:0:0: [sda] Write Protect is off
sd 6:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 6:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sda:
sd 6:0:0:0: [sda] Attached SCSI disk
ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata6.00: ATA-7: Maxtor 6B300S0, BANC17M0, max UDMA/133
ata6.00: 586114704 sectors, multi 0: LBA48 NCQ (not used)
ata6.00: configured for UDMA/133
scsi 7:0:0:0: Direct-Access ATA  Maxtor 6B300S0   BANC PQ: 0 ANSI: 5
sd 7:0:0:0: [sdb] 586114704 512-byte logical blocks: (300 GB/279 GiB)
sd 7:0:0:0: [sdb] Write Protect is off
sd 7:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 7:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdb: unknown partition table
sd 7:0:0:0: [sdb] Attached SCSI disk
 and on and on until it boots.  (The unknown partition table is an LVM 
volume.)

But 3.7.1 pokes along slowly enough while generating its errors that I did get 
some pictures to transcribe, and this is where it diverges from 3.6.10.

ata5.00: qc timeout (cmd 0x2f)
ata5.00: failed to set xfermode (err_mask=0x40)
ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata5.00: qc timeout (cmd 0x2f)
ata5.00: failed to set xfermode (err_mask=0x40)
ata5: limiting SATA link speed to 1.5 Gbps
ata5.00: limiting speed to UDMA/133:PIO3
ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata5.00: qc timeout (cmd 0x2f)
ata5.00: failed to set xfermode (err_mask=0x40)
ata5.00: disabled
ata5: hard resetting link
ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata5: EH complete
... for all ATA drives until it eventually panics because the root device, 
/dev/sde, is not found.
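If both boot logs are available as text files (for example a saved dmesg and a hand transcription), the point where they part ways can be found mechanically. The sketch below is an illustration only: `first_divergence` is a hypothetical helper name, it strips a leading "[ 12.345678] " style timestamp if one is present, and it only reports a difference found while reading the second file.

```shell
# first_divergence is a hypothetical helper: it prints the first line number at
# which two log files differ, after stripping leading dmesg timestamps.
# Usage: first_divergence good.log bad.log
first_divergence() {
    awk '
        { sub(/^\[[ 0-9.]*\] /, "") }              # drop a dmesg timestamp, if any
        NR == FNR { seen[FNR] = $0; next }         # remember lines of the first file
        seen[FNR] != $0 {                          # first mismatch in the second file
            print FNR ": " seen[FNR] " | " $0
            exit
        }
    ' "$1" "$2"
}

# Typical use (file names are placeholders, not run here):
#   first_divergence dmesg-3.6.10.txt dmesg-3.7.1-transcribed.txt
```

The output is the line number followed by the line from the first log and the line from the second, separated by " | ".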


 3.6.10 --- 3.7.1 conf changes

I rebuilt the 3.7.1 kernel and logged all