Re: [PATCH 1/3] pci: Add PCI ROM helper for platform-provided ROM images

2013-04-05 Thread Chris Murphy

On Apr 2, 2013, at 2:10 PM, Bjorn Helgaas  wrote:

> On Mon, Apr 1, 2013 at 11:47 AM, Matthew Garrett
>  wrote:
>> On Mon, 2013-04-01 at 11:39 -0600, Bjorn Helgaas wrote:
>> 
>>> Chris still has problems (see
>>> https://bugzilla.redhat.com/show_bug.cgi?id=927451), but I don't know
>>> whether they are related to these patches or something else.
>> 
>> I think they're unrelated. The log he posts using this patch gives the
>> correct output - the ROM image comes from the platform method rather
>> than from PCI. I think Ben probably needs to look at that.
> 
> OK, I added these three patches to my for-linus branch, headed for v3.9.

Are they in 3.9.0-0.rc5.git2.1.f19? I'm seeing a regression from 3.8.5 with the 
radeon driver not finding BIOS ROM as well.
https://bugzilla.redhat.com/show_bug.cgi?id=949083

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] pci: Add PCI ROM helper for platform-provided ROM images

2013-04-05 Thread Chris Murphy

On Apr 5, 2013, at 2:35 PM, Bjorn Helgaas  wrote:

> On Fri, Apr 5, 2013 at 2:31 PM, Chris Murphy  
> wrote:
>> 
>> 
>> Are they in 3.9.0-0.rc5.git2.1.f19? I'm seeing a regression from 3.8.5 with 
>> the radeon driver not finding BIOS ROM as well.
>> https://bugzilla.redhat.com/show_bug.cgi?id=949083
> 
> No.  I haven't asked Linus to pull my branch yet (was just thinking it
> was time to do that, coincidentally :))

The patch appears to fix the radeon issue in Bug 949083 as well. I've updated
the bug report.


Re: applesmc oops in 3.10/3.11

2013-10-01 Thread Chris Murphy

On Oct 1, 2013, at 9:19 AM, Guenter Roeck  wrote:

> On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote:
>>>> Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64.
>>>> 
>>>> [   10.886016] applesmc: key count changed from 261 to 1174405121
>>>> 
>>> 
>>> Explains the crash, but the new key count is very wrong. 1174405121 = 
>>> 0x46000001.
>>> Which I guess explains the subsequent memory allocation error in the log.
>>> 
>>> Henrik, any idea what might be going on ? Is it possible that the previous
>>> command failure leaves some state machine in a bad state ?
>> 
>> I seem to recall a report on another similar state problem on newer
>> machines, so maybe, yes. Older machines seem fine, I have never
>> encountered the problem myself. Here is a patch to test that
>> theory. It has been tested to be pretty harmless on two different
>> generations.
>> 
>> I really really do not want to add an 'if (value is insane)' check ;-)
>> 
> Chris,
> 
> any chance you can load this patch on an affected machine so we can get
> test feedback ? This one is too experimental to submit upstream without
> knowing that it really fixes the problem.

Yes. What kernel.org source version should I apply it against? I'd use the 
non-debug config file from an equivalent version Fedora kernel, unless asked 
otherwise. And also should I test it on other vintages? I have here 
MBP4,1(2008); MBP8,2(2011), and MBP10,2(2012).

Chris


Re: applesmc oops in 3.10/3.11

2013-10-01 Thread Chris Murphy

On Oct 1, 2013, at 10:24 AM, Guenter Roeck  wrote:

> On Tue, Oct 01, 2013 at 09:33:13AM -0600, Chris Murphy wrote:
>> 
>> On Oct 1, 2013, at 9:19 AM, Guenter Roeck  wrote:
>> 
>>> On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote:
>>>>>> Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64.
>>>>>> 
>>>>>> [   10.886016] applesmc: key count changed from 261 to 1174405121
>>>>>> 
>>>>> 
>>>>> Explains the crash, but the new key count is very wrong. 1174405121 = 
>>>>> 0x46000001.
>>>>> Which I guess explains the subsequent memory allocation error in the log.
>>>>> 
>>>>> Henrik, any idea what might be going on ? Is it possible that the previous
>>>>> command failure leaves some state machine in a bad state ?
>>>> 
>>>> I seem to recall a report on another similar state problem on newer
>>>> machines, so maybe, yes. Older machines seem fine, I have never
>>>> encountered the problem myself. Here is a patch to test that
>>>> theory. It has been tested to be pretty harmless on two different
>>>> generations.
>>>> 
>>>> I really really do not want to add an 'if (value is insane)' check ;-)
>>>> 
>>> Chris,
>>> 
>>> any chance you can load this patch on an affected machine so we can get
>>> test feedback ? This one is too experimental to submit upstream without
>>> knowing that it really fixes the problem.
>> 
>> Yes. What kernel.org source version should I apply it against? I'd use the 
>> non-debug config file from an equivalent version Fedora kernel, unless asked 
>> otherwise. And also should I test it on other vintages? I have here 
>> MBP4,1(2008); MBP8,2(2011), and MBP10,2(2012).
>> 
> Only requirement is that it also includes the previous patch, so it would be
> optimal if you can apply it on top of the previous image.

Patch added on top of 3.12.0-0.rc3.git0.1.fc20.x86_64 and built. But after 
about a dozen reboots, I'm not triggering the problem. The only dmesg lines 
mentioning smc are:

[   13.799819] applesmc: key=261 fan=2 temp=14 index=14 acc=1 lux=2 kbd=1
[   13.833402] input: applesmc as /devices/platform/applesmc.768/input/input10


Chris


Re: applesmc oops in 3.10/3.11

2013-10-01 Thread Chris Murphy

On Oct 1, 2013, at 9:51 PM, Guenter Roeck  wrote:

> On Tue, Oct 01, 2013 at 07:09:26PM -0600, Chris Murphy wrote:
>> 
>> On Oct 1, 2013, at 10:24 AM, Guenter Roeck  wrote:
>> 
>>> On Tue, Oct 01, 2013 at 09:33:13AM -0600, Chris Murphy wrote:
>>>> 
>>>> On Oct 1, 2013, at 9:19 AM, Guenter Roeck  wrote:
>>>> 
>>>>> On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote:
>>>>>>>> Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64.
>>>>>>>> 
>>>>>>>> [   10.886016] applesmc: key count changed from 261 to 1174405121
>>>>>>>> 
>>>>>>> 
>>>>>>> Explains the crash, but the new key count is very wrong. 1174405121 = 
>>>>>>> 0x46000001.
>>>>>>> Which I guess explains the subsequent memory allocation error in the 
>>>>>>> log.
>>>>>>> 
>>>>>>> Henrik, any idea what might be going on ? Is it possible that the 
>>>>>>> previous
>>>>>>> command failure leaves some state machine in a bad state ?
>>>>>> 
>>>>>> I seem to recall a report on another similar state problem on newer
>>>>>> machines, so maybe, yes. Older machines seem fine, I have never
>>>>>> encountered the problem myself. Here is a patch to test that
>>>>>> theory. It has been tested to be pretty harmless on two different
>>>>>> generations.
>>>>>> 
>>>>>> I really really do not want to add an 'if (value is insane)' check ;-)
>>>>>> 
>>>>> Chris,
>>>>> 
>>>>> any chance you can load this patch on an affected machine so we can get
>>>>> test feedback ? This one is too experimental to submit upstream without
>>>>> knowing that it really fixes the problem.
>>>> 
>>>> Yes. What kernel.org source version should I apply it against? I'd use the 
>>>> non-debug config file from an equivalent version Fedora kernel, unless 
>>>> asked otherwise. And also should I test it on other vintages? I have here 
>>>> MBP4,1(2008); MBP8,2(2011), and MBP10,2(2012).
>>>> 
>>> Only requirement is that it also includes the previous patch, so it would be
>>> optimal if you can apply it on top of the previous image.
>> 
>> Patch added on top of 3.12.0-0.rc3.git0.1.fc20.x86_64 and built. But after 
>> ~dozen reboots, I'm not triggering the problem. The only items in dmesg with 
>> smc in it:
>> 
>> [   13.799819] applesmc: key=261 fan=2 temp=14 index=14 acc=1 lux=2 kbd=1
>> [   13.833402] input: applesmc as 
>> /devices/platform/applesmc.768/input/input10
>> 
> 
> Hi Chris,
> 
> That only means that you did not hit the problem. There may be some secondary
> trigger (cold boot ? coffee on the cpu ?).
> 
> One thing I have seen in all logs is the earlier "send_byte fail" message, so
> I think that is a pre-requisite.

I have no idea how to trigger it. I have tried cold and warm boots, and 
reboots both from Linux to Linux and from OS X to Linux. *shrug* I'll keep 
trying as I'm doing other testing, maybe I'll stumble onto it.


Chris


Re: applesmc oops in 3.10/3.11

2013-10-02 Thread Chris Murphy

On Oct 2, 2013, at 12:02 PM, Guenter Roeck  wrote:

> On Wed, Oct 02, 2013 at 07:24:10PM +0200, Henrik Rydberg wrote:
>> On Wed, Oct 02, 2013 at 09:47:18AM -0700, Guenter Roeck wrote:
>>> On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote:
>>>>>>> One thing I have seen in all logs is the earlier "send_byte fail" 
>>>>>>> message, so
>>>>>>> I think that is a pre-requisite.
>>>>>> 
>>>>>> Not necessarily - it could be that the patch actually fixes the root
>>>>>> cause. One possible scenario is that on recent SMCs, some of the
>>>>>> commands produce more data than we actually read. This would
>>>>>> eventually lead to both data corruption and overflow somewhere in the
>>>>>> SMC internals.  If the original SMC error is interpreted as a read
>>>>>> buffer overflow, then that problem should be fixed with this patch.
>>>>>> 
>>>>> 
>>>>> Good point.
>>>>> 
>>>>> But shouldn't we at least get the "flushed %d bytes" warning message in 
>>>>> this case ?
>>>> 
>>>> The explanation I have there is that the (newer) SMC needs the
>>>> application to read the 'no more bytes' or it will get confused. It
>>>> makes sense, if the number of bytes to read is no longer specified.
>>>> 
>>> You mean that just reading from APPLESMC_CMD_PORT would solve the problem ?
>>> That might make sense.
>> 
>> It also points at the possibility of a smaller patch to test, but I
>> have not had the time to check this very deeply myself:
>> 
> I like this patch much more than the previous patch. Chris, can you test it ?

Yes. Building now. What kernel message should I be looking for? At least on 
2011 and 2012 laptops I have yet to see an Oops related to smc. The kernel with 
previous patch at least is not causing problems on them so far, which works 
well as I can test more on the 2008 model.

Chris Murphy


Re: applesmc oops in 3.10/3.11

2013-10-02 Thread Chris Murphy

On Oct 2, 2013, at 2:59 PM, Guenter Roeck  wrote:

> On Wed, Oct 02, 2013 at 12:33:00PM -0600, Chris Murphy wrote:
>> 
>> On Oct 2, 2013, at 12:02 PM, Guenter Roeck  wrote:
>> 
>>> On Wed, Oct 02, 2013 at 07:24:10PM +0200, Henrik Rydberg wrote:
>>>> On Wed, Oct 02, 2013 at 09:47:18AM -0700, Guenter Roeck wrote:
>>>>> On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote:
>>>>>>>>> One thing I have seen in all logs is the earlier "send_byte fail" 
>>>>>>>>> message, so
>>>>>>>>> I think that is a pre-requisite.
>>>>>>>> 
>>>>>>>> Not necessarily - it could be that the patch actually fixes the root
>>>>>>>> cause. One possible scenario is that on recent SMCs, some of the
>>>>>>>> commands produce more data than we actually read. This would
>>>>>>>> eventually lead to both data corruption and overflow somewhere in the
>>>>>>>> SMC internals.  If the original SMC error is interpreted as a read
>>>>>>>> buffer overflow, then that problem should be fixed with this patch.
>>>>>>>> 
>>>>>>> 
>>>>>>> Good point.
>>>>>>> 
>>>>>>> But shouldn't we at least get the "flushed %d bytes" warning message in 
>>>>>>> this case ?
>>>>>> 
>>>>>> The explanation I have there is that the (newer) SMC needs the
>>>>>> application to read the 'no more bytes' or it will get confused. It
>>>>>> makes sense, if the number of bytes to read is no longer specified.
>>>>>> 
>>>>> You mean that just reading from APPLESMC_CMD_PORT would solve the problem 
>>>>> ?
>>>>> That might make sense.
>>>> 
>>>> It also points at the possibility of a smaller patch to test, but I
>>>> have not had the time to check this very deeply myself:
>>>> 
>>> I like this patch much more than the previous patch. Chris, can you test it 
>>> ?
>> 
>> Yes. Building now. What kernel message should I be looking for? At least on 
>> 2011 and 2012 laptops I have yet to see an Oops related to smc. The kernel 
>> with previous patch at least is not causing problems on them so far, which 
>> works well as I can test more on the 2008 model.
>> 
> None, if I understand correctly and if the patch really fixes the root cause
> of the problem.

A vast majority of the oopses I've had occurred when booting from flash media 
while testing Fedora installs. Is it possible that the much slower kernel load 
and boot time is a trigger? If so, I'll look into modifying the media to accept 
the custom kernel and the requisite fat initramfs.


Chris


Re: applesmc oops in 3.10/3.11

2013-10-07 Thread Chris Murphy

On Oct 7, 2013, at 5:42 PM, Guenter Roeck  wrote:

> On 10/02/2013 10:24 AM, Henrik Rydberg wrote:
> 
>> From 4451da32414080bd0563ee9e061f19bf90463cc5 Mon Sep 17 00:00:00 2001
>> From: Henrik Rydberg 
>> Date: Wed, 2 Oct 2013 19:15:03 +0200
>> Subject: [PATCH] applesmc remedy take 2
>> 
>> Conjectured problem: there are remnant bytes ready on the data line
>> which corrupts the read after a failure.
>> 
>> Remedy: assuming bit0 is the read valid line, try to flush it before
>> starting a new command.
>> 
>> Tests by Chris suggest reading the status is enough for the problem
>> to go away, which is consistent with a change in the SMC interface,
>> where the number of bytes to read is no longer specified, but found
>> out by reading until end of data.
>> 
>> Tested on a MacBookAir3,1, but the original problem has not been
>> reproduced.
> 
> So, what should we do with this patch ? Apply it ?

So far I'm getting nothing on the original machine. As of today it's applied as 
the last patch on 3.12.0-0.rc4.git0.1.fc20.x86_64. Unfortunately at the moment 
I'm a bit too dense to figure out how to get a new kernel applied to an 
existing live package so I can try this on a USB stick. While maybe unrelated, 
the oops was occurring at least 4x as often when booted from USB stick media 
as when booted from HDD.
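
For anyone following along, the remedy described in the quoted patch can be pictured with a toy model. This is only a sketch under assumptions: `FakeSmc`, its methods, and the cap of 16 reads are hypothetical stand-ins and not the driver's code. The one detail taken from the patch description is the assumption that bit0 of the status means "data ready".

```python
class FakeSmc:
    """Hypothetical stand-in for the SMC ports; not the real driver interface."""

    def __init__(self, leftover):
        # Bytes left on the data line after a failed command.
        self.pending = list(leftover)

    def read_status(self):
        # Assumption from the patch description: bit0 set means data is ready.
        return 0x01 if self.pending else 0x00

    def read_data(self):
        return self.pending.pop(0)


def flush(smc, max_reads=16):
    """Read and discard stale bytes until bit0 clears; return how many were dropped."""
    flushed = 0
    while flushed < max_reads and smc.read_status() & 0x01:
        smc.read_data()
        flushed += 1
    return flushed


smc = FakeSmc([0x46, 0x00])
print(flush(smc))  # 2: two stale bytes discarded
print(flush(smc))  # 0: nothing left, the next command starts clean
```

The cap keeps the loop bounded if the status bit never clears; the value 16 is an arbitrary choice for the sketch.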

Chris


Re: applesmc oops in 3.10/3.11

2013-09-27 Thread Chris Murphy

On Sep 27, 2013, at 11:59 AM, Guenter Roeck  wrote:

> On Fri, Sep 27, 2013 at 11:41:42AM -0600, Chris Murphy wrote:
>> 
>> On Sep 27, 2013, at 11:12 AM, Guenter Roeck  wrote:
>> 
>>> On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote:
>>>> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg  
>>>> wrote:
>>>>>>>> This suggests that initialization may be attempted more than once. The 
>>>>>>>> key cache
>>>>>>>> is allocated only once, but the number of keys is read for each 
>>>>>>>> attempt.
>>>>>>>> 
>>>>>>>> No idea if that can happen, but if the number of keys can increase 
>>>>>>>> after
>>>>>>>> the first initialization attempt you would have an explanation for the 
>>>>>>>> crash.
>>>>>>> 
>>>>>>> Good idea, and easy enough to test with the patch below.
>>>>>>> 
>>>>>> Should we apply this patch even though it may not solve the specific 
>>>>>> problem ?
>>>>> 
>>>>> Yes, why not - it certainly won't hurt. I am running it right now, so
>>>>> it is at least run-tested.
>>>>> 
>>>>>> Again, not sure if the key count can change, but the current code is at 
>>>>>> the very
>>>>>> least inconsistent, as it keeps reading the key count without updating or
>>>>>> verifying the cache size.
>>>>> 
>>>>> Yes - I agree that the error state is far-fetched, but it is hard to
>>>>> see any other logical explanation. There is of course always the
>>>>> possibility that the problem is somewhere else completely.
>>>>> 
>>>>> Proper patch attached.
>>>>> 
>>>>> Thanks,
>>>>> Henrik
>>>>> 
>>>>> ---
>>>>> 
>>>>> From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
>>>>> From: Henrik Rydberg 
>>>>> Date: Thu, 26 Sep 2013 08:33:16 +0200
>>>>> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding
>>>>> 
>>>>> After reports from Chris and Josh Boyer of a rare crash in applesmc,
>>>>> Guenter pointed at the initialization problem fixed below. The patch
>>>>> has not been verified to fix the crash, but should be applied
>>>>> regardless.
>>>>> 
>>>>> Reported-by: 
>>>>> Suggested-by: Guenter Roeck 
>>>>> Signed-off-by: Henrik Rydberg 
>>>>> ---
>>>>> drivers/hwmon/applesmc.c | 11 ++-
>>>>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>>> 
>>>> Thanks for the quick reply.  I'll get this rolled into our kernels soon.
>>>> 
>>> I sent a pull request to Linus, so you should be able to pull it from
>>> the upstream kernel shortly. Would be great to get feedback if the patch
>>> solves the problem (or doesn't).
>> 
>> I'll start running it when it appears in koji. It's very transient, maybe 
>> one oops per week with lots of (other) testing. I'm not even sure if it 
>> happens on warm or cold boots or both.
>> 
> When you do, can you possibly trigger an event based on the warning added
> with the patch ? This might help us to identify if the problem fixed
> with the patch actually happens.

I don't understand the question. I'm uncertain how to trigger, and also what 
event.

Chris


Re: applesmc oops in 3.10/3.11

2013-09-27 Thread Chris Murphy

On Sep 27, 2013, at 11:12 AM, Guenter Roeck  wrote:

> On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote:
>> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg  wrote:
>>>>>> This suggests that initialization may be attempted more than once. The
>>>>>> key cache is allocated only once, but the number of keys is read for
>>>>>> each attempt.
>>>>>> 
>>>>>> No idea if that can happen, but if the number of keys can increase after
>>>>>> the first initialization attempt you would have an explanation for the
>>>>>> crash.
>>>>> 
>>>>> Good idea, and easy enough to test with the patch below.
>>>>> 
>>>> Should we apply this patch even though it may not solve the specific
>>>> problem ?
>>> 
>>> Yes, why not - it certainly won't hurt. I am running it right now, so
>>> it is at least run-tested.
>>> 
>>>> Again, not sure if the key count can change, but the current code is at
>>>> the very least inconsistent, as it keeps reading the key count without
>>>> updating or verifying the cache size.
>>> 
>>> Yes - I agree that the error state is far-fetched, but it is hard to
>>> see any other logical explanation. There is of course always the
>>> possibility that the problem is somewhere else completely.
>>> 
>>> Proper patch attached.
>>> 
>>> Thanks,
>>> Henrik
>>> 
>>> ---
>>> 
>>> From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
>>> From: Henrik Rydberg 
>>> Date: Thu, 26 Sep 2013 08:33:16 +0200
>>> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding
>>> 
>>> After reports from Chris and Josh Boyer of a rare crash in applesmc,
>>> Guenter pointed at the initialization problem fixed below. The patch
>>> has not been verified to fix the crash, but should be applied
>>> regardless.
>>> 
>>> Reported-by: 
>>> Suggested-by: Guenter Roeck 
>>> Signed-off-by: Henrik Rydberg 
>>> ---
>>> drivers/hwmon/applesmc.c | 11 ++-
>>> 1 file changed, 10 insertions(+), 1 deletion(-)
>> 
>> Thanks for the quick reply.  I'll get this rolled into our kernels soon.
>> 
> I sent a pull request to Linus, so you should be able to pull it from
> the upstream kernel shortly. Would be great to get feedback if the patch
> solves the problem (or doesn't).

I'll start running it when it appears in koji. It's very transient, maybe one 
oops per week with lots of (other) testing. I'm not even sure if it happens on 
warm or cold boots or both.

Chris


Re: applesmc oops in 3.10/3.11

2013-09-30 Thread Chris Murphy

On Sep 27, 2013, at 5:33 PM, Guenter Roeck  wrote:

> On 09/27/2013 11:03 AM, Chris Murphy wrote:
>> 
>> On Sep 27, 2013, at 11:59 AM, Guenter Roeck  wrote:
>> 
>>> On Fri, Sep 27, 2013 at 11:41:42AM -0600, Chris Murphy wrote:
>>>> 
>>>> On Sep 27, 2013, at 11:12 AM, Guenter Roeck  wrote:
>>>> 
>>>>> On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote:
>>>>>> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg  
>>>>>> wrote:
>>>>>>>>>> This suggests that initialization may be attempted more than once. 
>>>>>>>>>> The key cache
>>>>>>>>>> is allocated only once, but the number of keys is read for each 
>>>>>>>>>> attempt.
>>>>>>>>>> 
>>>>>>>>>> No idea if that can happen, but if the number of keys can increase 
>>>>>>>>>> after
>>>>>>>>>> the first initialization attempt you would have an explanation for 
>>>>>>>>>> the crash.
>>>>>>>>> 
>>>>>>>>> Good idea, and easy enough to test with the patch below.
>>>>>>>>> 
>>>>>>>> Should we apply this patch even though it may not solve the specific 
>>>>>>>> problem ?
>>>>>>> 
>>>>>>> Yes, why not - it certainly won't hurt. I am running it right now, so
>>>>>>> it is at least run-tested.
>>>>>>> 
>>>>>>>> Again, not sure if the key count can change, but the current code is 
>>>>>>>> at the very
>>>>>>>> least inconsistent, as it keeps reading the key count without updating 
>>>>>>>> or
>>>>>>>> verifying the cache size.
>>>>>>> 
>>>>>>> Yes - I agree that the error state is far-fetched, but it is hard to
>>>>>>> see any other logical explanation. There is of course always the
>>>>>>> possibility that the problem is somewhere else completely.
>>>>>>> 
>>>>>>> Proper patch attached.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Henrik
>>>>>>> 
>>>>>>> ---
>>>>>>> 
>>>>>>> From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
>>>>>>> From: Henrik Rydberg 
>>>>>>> Date: Thu, 26 Sep 2013 08:33:16 +0200
>>>>>>> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding
>>>>>>> 
>>>>>>> After reports from Chris and Josh Boyer of a rare crash in applesmc,
>>>>>>> Guenter pointed at the initialization problem fixed below. The patch
>>>>>>> has not been verified to fix the crash, but should be applied
>>>>>>> regardless.
>>>>>>> 
>>>>>>> Reported-by: 
>>>>>>> Suggested-by: Guenter Roeck 
>>>>>>> Signed-off-by: Henrik Rydberg 
>>>>>>> ---
>>>>>>> drivers/hwmon/applesmc.c | 11 ++-
>>>>>>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>>>>> 
>>>>>> Thanks for the quick reply.  I'll get this rolled into our kernels soon.
>>>>>> 
>>>>> I sent a pull request to Linus, so you should be able to pull it from
>>>>> the upstream kernel shortly. Would be great to get feedback if the patch
>>>>> solves the problem (or doesn't).
>>>> 
>>>> I'll start running it when it appears in koji. It's very transient, maybe 
>>>> one oops per week with lots of (other) testing. I'm not even sure if it 
>>>> happens on warm or cold boots or both.
>>>> 
>>> When you do, can you possibly trigger an event based on the warning added
>>> with the patch ? This might help us to identify if the problem fixed
>>> with the patch actually happens.
>> 
>> I don't understand the question. I'm uncertain how to trigger, and also what 
>> event.
>> 
> 
> The patch includes a new warning message.
> 
>   pr_warn("key count changed from %d to %d\n",
>           s->key_count, count);
> 
> It would be great if there would be a means to detect if this message is seen
> in a kernel log, because it would show that the potential crash condition
> fixed with the patch was actually encountered. This would help us to determine
> if we actually fixed the problem or not.
> 
> Of course, we'll know it wasn't fixed if the system still crashes.

Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64. 

[   10.886016] applesmc: key count changed from 261 to 1174405121

Attaching new full dmesg to the bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1011719#c11
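
On the request for a means to detect the message in a kernel log: a minimal sketch, assuming the log has already been saved as text (for example from `dmesg`). The function name `find_key_count_changes` is made up for illustration; the pattern is taken from the warning text quoted above.

```python
import re

# Matches the warning added by the patch; any timestamp prefix is ignored.
WARNING = re.compile(r"applesmc: key count changed from (\d+) to (\d+)")


def find_key_count_changes(log_text):
    """Return (old, new) pairs for every key-count warning found in the log text."""
    return [(int(old), int(new)) for old, new in WARNING.findall(log_text)]


sample = """\
[   10.886016] applesmc: key count changed from 261 to 1174405121
[   13.833402] input: applesmc as /devices/platform/applesmc.768/input/input10
"""

for old, new in find_key_count_changes(sample):
    print(f"key count changed: {old} -> {new}")
```

An empty result only shows that the guarded condition was not hit on that boot, not that the underlying problem is gone.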

Chris


Re: Areca hardware RAID / first-ever SCSI bus reset: am I about to lose this disk controller?

2012-09-19 Thread Chris Murphy

On Sep 19, 2012, at 12:52 PM, Nix wrote:

> So I have this x86-64 server running Linux 3.5.1 with a SATA-on-PCIe
> Areca 1210 hardware RAID-5 controller 

Did you find this? Same controller family. Weird that this just shows up now, 
but perhaps instead of it being "bad hardware" out the gate, something's 
happened to it and now it's failing as you suspect.

http://www.xtremesystems.org/forums/showthread.php?276187-Raid-Locks-Up


Chris Murphy


Re: Areca hardware RAID / first-ever SCSI bus reset: am I about to lose this disk controller?

2012-10-01 Thread Chris Murphy

On Oct 1, 2012, at 3:33 PM, Pierre Beck wrote:
> It's particularly annoying when in RAID and the disk could've simply been 
> kicked within few seconds. Something that needs improvement IMHO.

Except that while this helps with faster recovery, you're now degraded. You 
wouldn't want this "fast recovery" behavior if you're already at your critical 
number of disks remaining, or you lose the array upon a few seconds' worth of 
subsequent problems. So we kinda need context-specific behavior.
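
To make "context-specific" concrete, here is a rough sketch of the kind of decision meant. It is purely illustrative: `should_kick_quickly` and its parameters are hypothetical and do not correspond to any existing md or controller interface.

```python
def should_kick_quickly(active_disks, min_disks_needed):
    """Allow a fast kick only if the array stays online after losing this disk.

    active_disks:     disks currently working in the array
    min_disks_needed: fewest disks the array can run on (e.g. N-1 for RAID-5)
    """
    return active_disks - 1 >= min_disks_needed


# 4-disk RAID-5, all disks present: kicking one leaves a degraded but live array.
print(should_kick_quickly(4, 3))  # True
# Same array already degraded: kicking another disk would lose the array,
# so waiting out the error is the safer choice.
print(should_kick_quickly(3, 3))  # False
```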


Chris Murphy


Re: [FIXED] Re: 5:11: in-kernel BTF is malformed

2021-02-04 Thread Chris Murphy
On Thu, Feb 4, 2021 at 9:33 AM Arnaldo Carvalho de Melo  wrote:
>
> So I think that for the problems related to building the kernel with gcc
> 11 in Fedora Rawhide using the default that is now DWARF5, pahole 1.20
> is good to go and I'll tag it now.

dwarves-1.20-1.fc34.x86_64
libdwarves1-1.20-1.fc34.x86_64

Fixes both "failed to validate module [?] BTF: -22" type errors,
and "in-kernel BTF is malformed" with qemu-kvm and libvirt.

Is that expected? Or maybe the second issue was fixed by
gcc-11.0.0-0.18.fc34.x86_64 [(GCC) 11.0.0 20210130]? This is what I
get for changing more than one thing at once.

--
Chris Murphy


Re: small regression: hwmon: (applesmc) Check key count before proceeding - 5f4513864304672e6ea9eac60583eeac32e679f2

2013-11-24 Thread Chris Murphy

On Nov 24, 2013, at 9:57 AM, Henrik Rydberg  wrote:

> Hi Chris,
> 
>> Well, it seems to be another one-off event. It's the same hardware as
>> before, and it was booting from a USB stick containing Fedora 20 final test
>> candidate 2 which uses kernel 3.11.8. An immediate reboot did not reproduce
>> the problem, nor multiple subsequent reboots. I think I previously mentioned
>> a preponderance of these events happen when booting from USB sticks.
> 
> Ok, thanks, that makes sense. So at least one problem is still there, but
> possibly very difficult to hit.
> 
>> It would be nice to have an identical model for this testing. I suspect most
>> users wouldn't go to the trouble to report the occasional, seemingly one-off,
>> events like this. So unfortunately it's uncertain if the hardware I have has
>> a unique problem, or if it's a model-specific behavior.
> 
> I will keep this in mind next time I see this particular model. It is also
> possible that the SMC needs to be reset after having tested various more or
> less successful patches. Given the tiny cross-section, it looks like this one
> can rest for now.

This SMC has been reset within the past week, after a rest without its battery 
or power for about 2 weeks. So unless the SMC is particularly prone to 
corruption, I think the cause is elsewhere.

OS X always says this battery needs servicing, but doesn't elaborate. The 
manifestation is merely limited battery life, about 1 hour. All testing is 
always done with power connected.


Chris Murphy


Re: small regression: hwmon: (applesmc) Check key count before proceeding - 5f4513864304672e6ea9eac60583eeac32e679f2

2013-11-24 Thread Chris Murphy

On Nov 24, 2013, at 2:44 AM, Henrik Rydberg  wrote:

> Hi Michele,
> 
>> The issue Chris has seen in Fedora on one MacBookPro4,1
>> (https://bugzilla.redhat.com/show_bug.cgi?id=1033414) is that this
>> machine returns a huge number from read_register_count() so now we will
>> try to allocate an insane amount of memory and we will barf:
>> [8.603053] applesmc: key count changed from 261 to 1392508929
> 
> I was under the impression that this machine was tested before, and that
> 
> commit 25f2bd7f5add608c1d1405938f39c96927b275ca
> Author: Henrik Rydberg 
> Date:   Wed Oct 2 19:15:03 2013 +0200
> 
>hwmon: (applesmc) Always read until end of data
> 
> resolved this problem? But if the kernel under test is 3.11.8, both patches 
> are
> already present... Chris, could you please shed some light on this before
> moving on?

Well, it seems to be another one-off event. It's the same hardware as before, 
and it was booting from a USB stick containing Fedora 20 final test candidate 2 
which uses kernel 3.11.8. An immediate reboot did not reproduce the problem, 
nor did multiple subsequent reboots. I think I previously mentioned that a 
preponderance of these events happen when booting from USB sticks.

It would be nice to have an identical model for this testing. I suspect most 
users wouldn't go to the trouble to report the occasional, seemingly one-off, 
events like this. So unfortunately it's uncertain if the hardware I have has a 
unique problem, or if it's a model-specific behavior.
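
As an aside, the bogus counts reported for this machine (1392508929 here, 1174405121 in the earlier thread) are easier to compare next to the expected 261 when written out in hex. A small sketch; the helper name `as_bytes` is made up, and showing the most significant byte first is only a display choice, not a claim about how the driver assembles the value.

```python
def as_bytes(count):
    """Return the four bytes of a 32-bit count as hex, most significant first."""
    return count.to_bytes(4, "big").hex(" ")


for count in (261, 1174405121, 1392508929):
    print(f"{count:>10} = {count:#010x} -> bytes {as_bytes(count)}")
```

Both bad values have a nonzero top byte, two zero bytes, and 0x01 in the lowest byte (0x46000001 and 0x53000001), while 261 is 0x00000105. Whether that pattern says anything about the SMC state is not established in this thread.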


Chris Murphy


Re: btrfs "possible irq lock inversion dependency detected"

2014-02-17 Thread Chris Murphy

On Feb 17, 2014, at 1:09 PM, Tommi Rantala  wrote:

> Hello,
> 
> Saw this while fuzzing the kernel with Trinity.
> 
> Tommi
> 
> 
> [  396.136048] =
> [  396.136048] [ INFO: possible irq lock inversion dependency detected ]
> [  396.136048] 3.14.0-rc3 #1 Not tainted
> [  396.136048] -
> [  396.136048] kswapd0/1482 just changed the state of lock:
> [  396.136048]  (&delayed_node->mutex){+.+.-.}, at: [] 
> __btrfs_release_delayed_node+0x4b/0x1e0
> [  396.136048] but this lock took another, RECLAIM_FS-unsafe lock in the past:
> [  396.136048]  (&found->groups_sem){+.}

Looks like this is the same thing previously reported on the Btrfs list with
3.14.0-rc1 here:
https://bugzilla.redhat.com/show_bug.cgi?id=1062439

Which points to this:
https://bugzilla.redhat.com/show_bug.cgi?id=1062833#c24

Which points to this patch:
http://marc.info/?l=linux-netdev&m=139233546723342&q=raw


Chris Murphy


Re: btrfs balance 4.0 regression?

2015-05-14 Thread Chris Murphy
On Thu, May 14, 2015 at 6:33 PM, Omar Sandoval  wrote:
>
>
> Yup, Chris says he has a proper fix but it hasn't hit the list yet.
>
>
> Actually, ext4 convert is broken anyways (with irrelevant output
> elided):


I'm curious how this bug ended up in mainline. Isn't there an xfstests case
for both balance+convert and ext4 convert? If not, shouldn't there be?
It's not a data loss bug, but Btrfs is in a transitional stretch where
functionality loss bugs are no longer minor. (I'd look but I'm lazy
and xfstests doesn't appear to be indexed.)


-- 
Chris Murphy


Re: Kernel warning at fs/btrfs/inode.c:8693 btrfs_destroy_inode+0x1fa/0x2a0 [btrfs]()

2015-03-26 Thread Chris Murphy
On Thu, Mar 26, 2015 at 6:38 PM, Nikolaus Rath  wrote:

> I'm running 4.0-rc3, and I'm regularly getting these warnings in my
> kernel log:

> Mar 26 17:31:13 vostro kernel: [21480.088682] WARNING: CPU: 0 PID: 28958 at 
> fs/btrfs/inode.c:8693 btrfs_destroy_inode+0x1fa/0x2a0 [btrfs]()


It's known.
https://lkml.org/lkml/2015/3/7/41



-- 
Chris Murphy


Re: Kernel warning at fs/btrfs/inode.c:8693 btrfs_destroy_inode+0x1fa/0x2a0 [btrfs]()

2015-03-26 Thread Chris Murphy
On Thu, Mar 26, 2015 at 9:39 PM, Nikolaus Rath  wrote:

> Thanks. Does this mean that I risk data corruption when using btrfs with
> 4.0-rc3, or is this relatively harmless?

I can't answer that. I'd say use 3.18.9 or 3.19.2 if you want reduced
risk of corruption, or use the current week's rc (which is rc5) if you
can accept a bit more risk for testing purposes.

-- 
Chris Murphy


WARNING: at drivers/thunderbolt/switch.c:594 tb_switch_add+0x69b/0x780

2018-09-24 Thread Chris Murphy
This appears to be new in 4.19 rc5. But it doesn't happen every boot.
Details and attachments in the bug:
https://bugzilla.kernel.org/show_bug.cgi?id=201227



[   35.502605] f29h.local kernel: pci_bus :02: Allocating resources
[   51.172922] f29h.local kernel: thunderbolt :03:00.0: timeout
reading config space 1 from 0x1
[   51.172925] f29h.local kernel: [ cut here ]
[   51.172927] f29h.local kernel: thunderbolt :03:00.0: 0:3: non
switch port without a PHY
[   51.172954] f29h.local kernel: WARNING: CPU: 2 PID: 2036 at
drivers/thunderbolt/switch.c:594 tb_switch_add+0x69b/0x780
[thunderbolt]
[   51.172955] f29h.local kernel: Modules linked in: hidp vfat fat
thunderbolt(+) rfcomm fuse ccm devlink nf_conntrack_netbios_ns
nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
xt_conntrack ebtable_nat ebtable_broute bridge stp llc ip6table_nat
nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat
nf_nat_ipv4 nf_nat iptable_mangle iptable_raw iptable_security
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink
ebtable_filter ebtables ip6table_filter ip6_tables cmac bnep sunrpc
arc4 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp hp_wmi
iTCO_wdt kvm_intel iTCO_vendor_support sparse_keymap snd_soc_skl
wmi_bmof kvm intel_wmi_thunderbolt snd_soc_skl_ipc iwlmvm
snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core
snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core mac80211
[   51.172987] f29h.local kernel:  snd_hda_codec_hdmi snd_compress
irqbypass snd_hda_codec_conexant crct10dif_pclmul
snd_hda_codec_generic ac97_bus ghash_clmulni_intel snd_pcm_dmaengine
intel_cstate snd_hda_intel iwlwifi intel_uncore snd_hda_codec
intel_rapl_perf snd_hda_core uvcvideo snd_hwdep btusb snd_seq btrtl
snd_seq_device btbcm videobuf2_vmalloc btintel videobuf2_memops
videobuf2_v4l2 snd_pcm videobuf2_common cfg80211 bluetooth videodev
joydev snd_timer media snd i2c_i801 soundcore idma64 ecdh_generic
mei_me mei rfkill intel_pch_thermal intel_lpss_pci intel_lpss
processor_thermal_device intel_soc_dts_iosf wmi int3403_thermal
int340x_thermal_zone pinctrl_sunrisepoint pinctrl_intel
int3400_thermal hp_wireless acpi_thermal_rel acpi_pad pcc_cpufreq
crc32_generic crc32_pclmul f2fs dm_crypt btrfs libcrc32c xor
zstd_decompress
[   51.173016] f29h.local kernel:  zstd_compress xxhash i915 raid6_pq
i2c_algo_bit drm_kms_helper nvme crc32c_intel drm nvme_core serio_raw
video hid_apple lz4 lz4_compress
[   51.173025] f29h.local kernel: CPU: 2 PID: 2036 Comm: systemd-udevd
Not tainted 4.19.0-0.rc5.git0.1.fc30.x86_64 #1
[   51.173027] f29h.local kernel: Hardware name: HP HP Spectre
Notebook/81A0, BIOS F.41 06/15/2018
[   51.173032] f29h.local kernel: RIP: 0010:tb_switch_add+0x69b/0x780
[thunderbolt]
[   51.173034] f29h.local kernel: Code: 89 4c 24 08 44 89 44 24 04 e8
e1 96 35 d3 44 8b 44 24 04 4c 89 f2 48 8b 4c 24 08 48 89 c6 48 c7 c7
38 e7 29 c1 e8 cf 13 e2 d2 <0f> 0b e9 fc fb ff ff 41 8b 97 ec 02 00 00
41 8b 87 e8 02 00 00 8b
[   51.173035] f29h.local kernel: RSP: 0018:ba6904cb3a68 EFLAGS: 00010286
[   51.173037] f29h.local kernel: RAX:  RBX:
923a44b46098 RCX: 0006
[   51.173039] f29h.local kernel: RDX: 0007 RSI:
0082 RDI: 923ab6b16860
[   51.173040] f29h.local kernel: RBP: c129d68d R08:
0005 R09: 0007
[   51.173041] f29h.local kernel: R10:  R11:
959b116d R12: c129d695
[   51.173042] f29h.local kernel: R13: 0003 R14:
923a59d50c68 R15: 923a44b464f8
[   51.173044] f29h.local kernel: FS:  7f34253cd940()
GS:923ab6b0() knlGS:
[   51.173045] f29h.local kernel: CS:  0010 DS:  ES:  CR0:
80050033
[   51.173046] f29h.local kernel: CR2: 7f15b8039228 CR3:
0002316b4002 CR4: 003606e0
[   51.173048] f29h.local kernel: Call Trace:
[   51.173054] f29h.local kernel:  icm_start+0x59/0xa0 [thunderbolt]
[   51.173059] f29h.local kernel:  tb_domain_add+0xa6/0x140 [thunderbolt]
[   51.173064] f29h.local kernel:  nhi_probe+0x2be/0x560 [thunderbolt]
[   51.173068] f29h.local kernel:  local_pci_probe+0x41/0x90
[   51.173071] f29h.local kernel:  pci_device_probe+0x188/0x1a0
[   51.173074] f29h.local kernel:  really_probe+0x235/0x3a0
[   51.173076] f29h.local kernel:  driver_probe_device+0xb3/0xf0
[   51.173079] f29h.local kernel:  __driver_attach+0xdd/0x110
[   51.173081] f29h.local kernel:  ? driver_probe_device+0xf0/0xf0
[   51.173084] f29h.local kernel:  bus_for_each_dev+0x76/0xc0
[   51.173086] f29h.local kernel:  ? klist_add_tail+0x3b/0x60
[   51.173089] f29h.local kernel:  bus_add_driver+0x152/0x230
[   51.173090] f29h.local kernel:  ? 0xc12b
[   51.173093] f29h.local kernel:  driver_register+0x6b/0xb0
[   51.173094] f29h.local kernel:  ? 0xc12b
[   51.173099] f29h.local kernel:  nhi_init+0x2b/0x1000 [thunderbolt]
[   51.173103] f29h.local kernel:  do_one_ini

Re: OOM killer invoked during btrfs send/recieve on otherwise idle machine

2016-08-02 Thread Chris Murphy
Yesterday I saw oom killer knocking off processes during a simple cp
-a from one Btrfs to another, in a VM, with kernel
4.8.0-0.rc0.git3.1.fc25.x86_64. The call trace looks different than
Markus' so I'm not sure it's the same problem. It's not always
reproducible. It hasn't happened on 4.7.0 though. According to koji
this is Linux v4.7-6438-gc624c86.

This is 'journalctl -o short-monotonic -b-4 -k'

https://drive.google.com/open?id=0B_2Asp8DGjJ9ZC1JSDJnaWpnSEE


Chris Murphy


Re: OOM killer invoked during btrfs send/recieve on otherwise idle machine

2016-08-02 Thread Chris Murphy
On Tue, Aug 2, 2016 at 9:38 AM, Chris Murphy  wrote:
> Yesterday I saw oom killer knocking off processes during a simple cp
> -a from one Btrfs to another, in a VM, with kernel
> 4.8.0-0.rc0.git3.1.fc25.x86_64. The call trace looks different than
> Markus' so I'm not sure it's the same problem. It's not always
> reproducible. It hasn't happened on 4.7.0 though. According to koji
> this is Linux v4.7-6438-gc624c86
>
> This is 'journalctl -o short-monotonic -b-4 -k'
>
> https://drive.google.com/open?id=0B_2Asp8DGjJ9ZC1JSDJnaWpnSEE

Btrfs volume 2 is created at [  625.769736] and mounted at
[  640.119150], with the copy starting shortly after that. OOM happens at
[  856.212658]. There are a bunch of earlier bug messages: "BUG:
sleeping function called from invalid context at mm/slab.h:393".
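
For anyone lining up events in the 'journalctl -o short-monotonic' log, here
is a minimal sketch of pulling the bracketed monotonic timestamps out of text
and computing the gap between two of them. The helper name monotonic_seconds
and the sample string are made up for illustration; only the three timestamps
come from the log above.

```python
import re

# Matches bracketed monotonic timestamps such as "[  856.212658]".
STAMP = re.compile(r"\[\s*(\d+\.\d+)\]")


def monotonic_seconds(text):
    """Return every bracketed monotonic timestamp found in text, in order."""
    return [float(value) for value in STAMP.findall(text)]


# Hypothetical sample text; the three timestamps are the ones quoted above.
sample = (
    "volume created at [  625.769736], mounted at [  640.119150], "
    "OOM at [  856.212658]"
)

created, mounted, oom = monotonic_seconds(sample)
print(f"create to mount: {mounted - created:.1f} s")
print(f"mount to OOM:    {oom - mounted:.1f} s")
```

By that arithmetic the OOM comes about 216 seconds after the mount.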


-- 
Chris Murphy


Re: confusing mountinfo output when bind-mounting files

2016-03-21 Thread Chris Murphy
On Mon, Mar 21, 2016 at 9:21 AM, Tycho Andersen
 wrote:
> Hi all,
>
> I'm seeing some strange behavior when bind mounting files from a btrfs
> subvolume. Consider the output below:
>
> root@criu2:/tmp# mount -o loop /tmp/tester.btrfs /tmp/dir1
> root@criu2:/tmp# touch dir1/file
> root@criu2:/tmp# sudo mount --bind dir1/file dir2/file
> root@criu2:/tmp# grep "/tmp/dir" /proc/self/mountinfo
> 128 24 0:45 / /tmp/dir1 rw,relatime shared:107 - btrfs /dev/loop0 
> rw,space_cache,subvolid=5,subvol=/
> 129 24 0:45 /file /tmp/dir2/file rw,relatime shared:107 - btrfs /dev/loop0 
> rw,space_cache,subvolid=5,subvol=/file
> root@criu2:/tmp# btrfs --version
> btrfs-progs v4.4
> root@criu2:/tmp# uname -a
> Linux criu2 4.4.0-8-generic #23-Ubuntu SMP Wed Feb 24 20:45:30 UTC 2016 
> x86_64 x86_64 x86_64 GNU/Linux
>
> The issue here is that the "subvol=" mount option for the target of the bind
> mount is "/file" when no such subvolume actually exists. Is this
> intended? It's confusing to say the least, but seems like a bug to me.

Since btrfs mount subvol= is a bind mount behind the scenes, I'm
not sure the mount info code distinguishes between bind mounts.

At the moment, this is something of a secret decoder ring where if you
see subvolid=5 first, then anything after that other than / is just
not true (can't be). Hence probably why both subvolid and subvol are
listed for now; you kinda have to parse them both.
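
A rough sketch of that "parse them both" rule, for illustration only. It
assumes the usual /proc/self/mountinfo layout (per-mount fields, a " - "
separator, then filesystem type, source and superblock options); the function
names are hypothetical, and the two sample lines are the ones quoted above
with the wrapped line joined back together.

```python
def parse_mountinfo_line(line):
    """Split one mountinfo line into the few fields used below."""
    left, _, right = line.partition(" - ")
    fields = left.split()
    fstype, source, super_opts = right.split(None, 2)
    opts = {}
    for item in super_opts.strip().split(","):
        key, _, value = item.partition("=")
        opts[key] = value
    return {
        "root": fields[3],         # path of the mount's root inside the fs
        "mount_point": fields[4],
        "fstype": fstype,
        "source": source,
        "super_opts": opts,
    }


def subvol_is_trustworthy(entry):
    """Apply the heuristic from the message above.

    subvolid=5 is the top-level subvolume, so a subvol= value other than
    "/" alongside it cannot name a real subvolume.
    """
    if entry["fstype"] != "btrfs":
        return True
    opts = entry["super_opts"]
    return not (opts.get("subvolid") == "5" and opts.get("subvol") != "/")


SAMPLE = [
    "128 24 0:45 / /tmp/dir1 rw,relatime shared:107 - btrfs /dev/loop0 "
    "rw,space_cache,subvolid=5,subvol=/",
    "129 24 0:45 /file /tmp/dir2/file rw,relatime shared:107 - btrfs "
    "/dev/loop0 rw,space_cache,subvolid=5,subvol=/file",
]

for line in SAMPLE:
    entry = parse_mountinfo_line(line)
    print(entry["mount_point"], subvol_is_trustworthy(entry))
```

This only encodes the subvolid=5 case; a bind mount from inside a non-default
subvolume would need more than this check to untangle.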



-- 
Chris Murphy