Re: linux-5.10.11 build failure

2021-02-01 Thread Chris Clayton
Hi Greg,

On 29/01/2021 15:14, Josh Poimboeuf wrote:
> On Fri, Jan 29, 2021 at 12:09:53PM +0100, Greg Kroah-Hartman wrote:
>> On Fri, Jan 29, 2021 at 11:03:26AM +0000, Chris Clayton wrote:
>>>
>>>
>>> On 29/01/2021 10:11, Greg Kroah-Hartman wrote:
>>>> On Thu, Jan 28, 2021 at 10:00:15AM -0600, Josh Poimboeuf wrote:
...
>>>>
>>>> It is in Linus's tree now :)
>>>>
>>>> Now grabbed.
>>>>
>>>
>>> Are you sure, Greg? I don't see the patch in Linus' tree at
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git. Nor do
>>> I see it in your stable queue at
>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/.
>>> For clarity, I've attached the patch which fixes the problem I reported
>>> and is currently sat in
>>> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git. As I
>>> understand it, the patch is scheduled to be included in a pull request to
>>> Linus this weekend, in time for -rc6.
>>>
>>> In fact, I did a pull from Linus' tree a few minutes ago and the build 
>>> failed in the way I reported in this thread. I
>>> added the patch and the build now succeeds.
>>
>> Ok, sorry, no, I grabbed 1d489151e9f9 ("objtool: Don't fail on missing
>> symbol table") which is what Josh asked me to take.  I got that confused
>> here.
> 
> I'm probably responsible for that confusion, I got mixed up myself.
> It'll be a good idea to take both anyway.
> 

The patch is now in Linus' tree at 5e6dca82bcaa49348f9e5fcb48df4881f6d6c4ae

Thanks.

Chris


Re: linux-5.10.11 build failure

2021-01-28 Thread Chris Clayton



On 28/01/2021 15:52, Josh Poimboeuf wrote:
> On Thu, Jan 28, 2021 at 11:24:47AM +0000, Thomas Backlund wrote:
>> Den 28.1.2021 kl. 12:05, skrev Chris Clayton:
>>>
>>> On 28/01/2021 09:34, Greg Kroah-Hartman wrote:
>>>> On Thu, Jan 28, 2021 at 09:17:10AM +0000, Chris Clayton wrote:
>>>>> Hi,
>>>>>
>>>>> Building 5.10.11 fails on my (x86-64) laptop thusly:
>>>>>
>>>>> ..
>>>>>
>>>>>   AS      arch/x86/entry/thunk_64.o
>>>>>   CC      arch/x86/entry/vsyscall/vsyscall_64.o
>>>>>   AS      arch/x86/realmode/rm/header.o
>>>>>   CC      arch/x86/mm/pat/set_memory.o
>>>>>   CC      arch/x86/events/amd/core.o
>>>>>   CC      arch/x86/kernel/fpu/init.o
>>>>>   CC      arch/x86/entry/vdso/vma.o
>>>>>   CC      kernel/sched/core.o
>>>>> arch/x86/entry/thunk_64.o: warning: objtool: missing symbol for insn at offset 0x3e
>>>>>
>>>>>   AS      arch/x86/realmode/rm/trampoline_64.o
>>>>> make[2]: *** [scripts/Makefile.build:360: arch/x86/entry/thunk_64.o] Error 255
>>>>> make[2]: *** Deleting file 'arch/x86/entry/thunk_64.o'
>>>>> make[2]: *** Waiting for unfinished jobs
>>>>>
>>>>> ..
>>>>>
>>>>> Compiler is latest snapshot of gcc-10.
>>>>>
>>>>> Happy to test the fix but please cc me as I'm not subscribed
>>>>
>>>> Can you do 'git bisect' to track down the offending commit?
>>>>
>>>
>>> Sure, but I'll hold that request for a while. I updated to binutils-2.36 on 
>>> Monday and I'm pretty sure that is a feature
>>> of this build fail. I've reverted binutils to 2.35.1, and the build 
>>> succeeds. Updated to 2.36 again and, surprise,
>>> surprise, the kernel build fails again.
>>>
>>> I've had a glance at the binutils ML and there are all sorts of issues 
>>> being reported, but it's beyond my knowledge to
>>> assess if this build error is related to any of them.
>>>
>>> I'll stick with binutils-2.35.1 for the time being.
>>>
>>>> And what exact gcc version are you using?
>>>>
>>>
>>>   It's built from the 10-20210123 snapshot tarball.
>>>
>>> I can report this to the binutils folks, but might it be better if the 
>>> objtool maintainer looks at it first? The
>>> binutils change might just have opened the gate to a bug in objtool.
>>>
>>>> thanks,
>>>>
>>>> greg k-h
>>>>
>>>
>>
>>
>> AFAIK you need this in stable trees:
>>
>>  From 1d489151e9f9d1647110277ff77282fe4d96d09b Mon Sep 17 00:00:00 2001
>> From: Josh Poimboeuf 
>> Date: Thu, 14 Jan 2021 16:14:01 -0600
>> Subject: [PATCH] objtool: Don't fail on missing symbol table
> 
> Actually I think you need:
> 
>   5e6dca82bcaa ("x86/entry: Emit a symbol for register restoring thunk")
> 
> I submitted a patch to stable list a few days ago.
> 

Yes, that's what I concluded, Josh. 5.10.11 builds with that patch added, but
it's not in Linus's tree yet, so, as I understand it, it is not yet a
candidate for stable.


> (Though it's possible you need both commits, I'm not sure if binutils
>  2.36 has the symbol stripping stuff)
> 


Re: linux-5.10.11 build failure

2021-01-28 Thread Chris Clayton



On 28/01/2021 14:41, Greg Kroah-Hartman wrote:
> On Thu, Jan 28, 2021 at 01:38:25PM +0000, Chris Clayton wrote:
>> Thanks, Thomas.
>>
>> On 28/01/2021 11:24, Thomas Backlund wrote:
>>> Den 28.1.2021 kl. 12:05, skrev Chris Clayton:
>>>>
>>>> On 28/01/2021 09:34, Greg Kroah-Hartman wrote:
>>>>> On Thu, Jan 28, 2021 at 09:17:10AM +0000, Chris Clayton wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Building 5.10.11 fails on my (x86-64) laptop thusly:
>>>>>>
>>>>>> ..
>>>>>>
>>>>>>   AS      arch/x86/entry/thunk_64.o
>>>>>>   CC      arch/x86/entry/vsyscall/vsyscall_64.o
>>>>>>   AS      arch/x86/realmode/rm/header.o
>>>>>>   CC      arch/x86/mm/pat/set_memory.o
>>>>>>   CC      arch/x86/events/amd/core.o
>>>>>>   CC      arch/x86/kernel/fpu/init.o
>>>>>>   CC      arch/x86/entry/vdso/vma.o
>>>>>>   CC      kernel/sched/core.o
>>>>>> arch/x86/entry/thunk_64.o: warning: objtool: missing symbol for insn at offset 0x3e
>>>>>>
>>>>>>   AS      arch/x86/realmode/rm/trampoline_64.o
>>>>>> make[2]: *** [scripts/Makefile.build:360: arch/x86/entry/thunk_64.o] Error 255
>>>>>> make[2]: *** Deleting file 'arch/x86/entry/thunk_64.o'
>>>>>> make[2]: *** Waiting for unfinished jobs
>>>>>>
>>>>>> ..
>>>>>>
>>>>>> Compiler is latest snapshot of gcc-10.
>>>>>>
>>>>>> Happy to test the fix but please cc me as I'm not subscribed
>>>>>
>>>>> Can you do 'git bisect' to track down the offending commit?
>>>>>
>>>>
>>>> Sure, but I'll hold that request for a while. I updated to binutils-2.36 
>>>> on Monday and I'm pretty sure that is a feature
>>>> of this build fail. I've reverted binutils to 2.35.1, and the build 
>>>> succeeds. Updated to 2.36 again and, surprise,
>>>> surprise, the kernel build fails again.
>>>>
>>>> I've had a glance at the binutils ML and there are all sorts of issues 
>>>> being reported, but it's beyond my knowledge to
>>>> assess if this build error is related to any of them.
>>>>
>>>> I'll stick with binutils-2.35.1 for the time being.
>>>>
>>>>> And what exact gcc version are you using?
>>>>>
>>>>
>>>>   It's built from the 10-20210123 snapshot tarball.
>>>>
>>>> I can report this to the binutils folks, but might it be better if the 
>>>> objtool maintainer looks at it first? The
>>>> binutils change might just have opened the gate to a bug in objtool.
>>>>
>>>>> thanks,
>>>>>
>>>>> greg k-h
>>>>>
>>>>
>>>
>>>
>>> AFAIK you need this in stable trees:
>>>
>>>  From 1d489151e9f9d1647110277ff77282fe4d96d09b Mon Sep 17 00:00:00 2001
>>> From: Josh Poimboeuf 
>>> Date: Thu, 14 Jan 2021 16:14:01 -0600
>>> Subject: [PATCH] objtool: Don't fail on missing symbol table
>>>
>>>
>>
>> That may be the case, but it doesn't fix the build failure I've reported in
>> this thread. However, as suggested by Tor,
>> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/patch/?id=5e6dca82bcaa49348f9e5fcb48df4881f6d6c4ae
>> does fix it.
>>
>> That hasn't made Linus' tree yet and I don't see a pull request, but it is
>> in linux-next, so I guess it could make it into -rc6.
> 
> Ok, thanks, so this is not a new regression for 5.10.y.
> 

That seems to be the case, Greg. Neither 5.10.10 nor 5.10.9 builds either.

> greg k-h
> 


Re: linux-5.10.11 build failure

2021-01-28 Thread Chris Clayton
Thanks, Thomas.

On 28/01/2021 11:24, Thomas Backlund wrote:
> Den 28.1.2021 kl. 12:05, skrev Chris Clayton:
>>
>> On 28/01/2021 09:34, Greg Kroah-Hartman wrote:
>>> On Thu, Jan 28, 2021 at 09:17:10AM +0000, Chris Clayton wrote:
>>>> Hi,
>>>>
>>>> Building 5.10.11 fails on my (x86-64) laptop thusly:
>>>>
>>>> ..
>>>>
>>>>   AS      arch/x86/entry/thunk_64.o
>>>>   CC      arch/x86/entry/vsyscall/vsyscall_64.o
>>>>   AS      arch/x86/realmode/rm/header.o
>>>>   CC      arch/x86/mm/pat/set_memory.o
>>>>   CC      arch/x86/events/amd/core.o
>>>>   CC      arch/x86/kernel/fpu/init.o
>>>>   CC      arch/x86/entry/vdso/vma.o
>>>>   CC      kernel/sched/core.o
>>>> arch/x86/entry/thunk_64.o: warning: objtool: missing symbol for insn at offset 0x3e
>>>>
>>>>   AS      arch/x86/realmode/rm/trampoline_64.o
>>>> make[2]: *** [scripts/Makefile.build:360: arch/x86/entry/thunk_64.o] Error 255
>>>> make[2]: *** Deleting file 'arch/x86/entry/thunk_64.o'
>>>> make[2]: *** Waiting for unfinished jobs
>>>>
>>>> ..
>>>>
>>>> Compiler is latest snapshot of gcc-10.
>>>>
>>>> Happy to test the fix but please cc me as I'm not subscribed
>>>
>>> Can you do 'git bisect' to track down the offending commit?
>>>
>>
>> Sure, but I'll hold that request for a while. I updated to binutils-2.36 on 
>> Monday and I'm pretty sure that is a feature
>> of this build fail. I've reverted binutils to 2.35.1, and the build 
>> succeeds. Updated to 2.36 again and, surprise,
>> surprise, the kernel build fails again.
>>
>> I've had a glance at the binutils ML and there are all sorts of issues being 
>> reported, but it's beyond my knowledge to
>> assess if this build error is related to any of them.
>>
>> I'll stick with binutils-2.35.1 for the time being.
>>
>>> And what exact gcc version are you using?
>>>
>>
>>   It's built from the 10-20210123 snapshot tarball.
>>
>> I can report this to the binutils folks, but might it be better if the 
>> objtool maintainer looks at it first? The
>> binutils change might just have opened the gate to a bug in objtool.
>>
>>> thanks,
>>>
>>> greg k-h
>>>
>>
> 
> 
> AFAIK you need this in stable trees:
> 
>  From 1d489151e9f9d1647110277ff77282fe4d96d09b Mon Sep 17 00:00:00 2001
> From: Josh Poimboeuf 
> Date: Thu, 14 Jan 2021 16:14:01 -0600
> Subject: [PATCH] objtool: Don't fail on missing symbol table
> 
> 

That may be the case, but it doesn't fix the build failure I've reported in
this thread. However, as suggested by Tor,
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/patch/?id=5e6dca82bcaa49348f9e5fcb48df4881f6d6c4ae
does fix it.

That hasn't made Linus' tree yet and I don't see a pull request, but it is in
linux-next, so I guess it could make it into -rc6.

Chris
> --
> Thomas
> 


Re: linux-5.10.11 build failure

2021-01-28 Thread Chris Clayton


On 28/01/2021 09:34, Greg Kroah-Hartman wrote:
> On Thu, Jan 28, 2021 at 09:17:10AM +0000, Chris Clayton wrote:
>> Hi,
>>
>> Building 5.10.11 fails on my (x86-64) laptop thusly:
>>
>> ..
>>
>>   AS      arch/x86/entry/thunk_64.o
>>   CC      arch/x86/entry/vsyscall/vsyscall_64.o
>>   AS      arch/x86/realmode/rm/header.o
>>   CC      arch/x86/mm/pat/set_memory.o
>>   CC      arch/x86/events/amd/core.o
>>   CC      arch/x86/kernel/fpu/init.o
>>   CC      arch/x86/entry/vdso/vma.o
>>   CC      kernel/sched/core.o
>> arch/x86/entry/thunk_64.o: warning: objtool: missing symbol for insn at offset 0x3e
>>
>>   AS      arch/x86/realmode/rm/trampoline_64.o
>> make[2]: *** [scripts/Makefile.build:360: arch/x86/entry/thunk_64.o] Error 255
>> make[2]: *** Deleting file 'arch/x86/entry/thunk_64.o'
>> make[2]: *** Waiting for unfinished jobs
>>
>> ..
>>
>> Compiler is latest snapshot of gcc-10.
>>
>> Happy to test the fix but please cc me as I'm not subscribed
> 
> Can you do 'git bisect' to track down the offending commit?
> 

Sure, but I'll hold that request for a while. I updated to binutils-2.36 on
Monday and I'm pretty sure that is a feature of this build fail. I've
reverted binutils to 2.35.1, and the build succeeds. Updated to 2.36 again
and, surprise, surprise, the kernel build fails again.

I've had a glance at the binutils ML and there are all sorts of issues being
reported, but it's beyond my knowledge to assess if this build error is
related to any of them.

I'll stick with binutils-2.35.1 for the time being.

> And what exact gcc version are you using?
>

 It's built from the 10-20210123 snapshot tarball.

I can report this to the binutils folks, but might it be better if the objtool 
maintainer looks at it first? The
binutils change might just have opened the gate to a bug in objtool.

> thanks,
> 
> greg k-h
> 

Thanks.

Chris


linux-5.10.11 build failure

2021-01-28 Thread Chris Clayton
Hi,

Building 5.10.11 fails on my (x86-64) laptop thusly:

..

  AS      arch/x86/entry/thunk_64.o
  CC      arch/x86/entry/vsyscall/vsyscall_64.o
  AS      arch/x86/realmode/rm/header.o
  CC      arch/x86/mm/pat/set_memory.o
  CC      arch/x86/events/amd/core.o
  CC      arch/x86/kernel/fpu/init.o
  CC      arch/x86/entry/vdso/vma.o
  CC      kernel/sched/core.o
arch/x86/entry/thunk_64.o: warning: objtool: missing symbol for insn at offset 0x3e

  AS      arch/x86/realmode/rm/trampoline_64.o
make[2]: *** [scripts/Makefile.build:360: arch/x86/entry/thunk_64.o] Error 255
make[2]: *** Deleting file 'arch/x86/entry/thunk_64.o'
make[2]: *** Waiting for unfinished jobs

..

Compiler is latest snapshot of gcc-10.

Happy to test the fix but please cc me as I'm not subscribed


Thanks,

Chris


Re: [PATCH] misc: rtsx: do not setting OC_POWER_DOWN reg in rtsx_pci_init_ocp()

2020-10-15 Thread Chris Clayton
Hi Greg,

On 18/09/2020 15:35, Chris Clayton wrote:
> Mmm, gmail on android seems to have snuck some html into my reply, so here 
> goes again...
> 
> On 14/09/2020 16:58, Greg KH wrote:
>> On Sun, Sep 13, 2020 at 09:40:56AM +0100, Chris Clayton wrote:
>>> Hi Greg and Arnd,
>>>
>>> On 24/08/2020 04:00, ricky...@realtek.com wrote:
>>>> From: Ricky Wu 
>>>>
>>>> this power saving action in rtsx_pci_init_ocp() cause INTEL-NUC6 platform
>>>> missing card reader
>>>>
>>>
>>> In his changelog above, Ricky didn't mention that this patch fixes a 
>>> regression that was introduced (in 5.1) by commit
>>> bede03a579b3.
>>>
>>> The patch that I posted to LKML contained the appropriate Fixes, etc tags. 
>>> After discussion, the patch was changed to
>>> remove the code that effectively disables the RTS5229 cardreader on (at 
>>> least some) Intel NUC boxes. I prepared the
>>> patch that Ricky submitted but he didn't include my Signed-off-by or the 
>>> Fixes tag. I think the following needs to be
>>> added to the changelog.
>>>
>>> Fixes: bede03a579b3 ("misc: rtsx: Enable OCP for rts522a rts524a rts525a 
>>> rts5260")
>>> Link: https://marc.info/?l=linux-kernel&m=159105912832257
>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=204003
>>> Signed-off-by: Chris Clayton 
>>>
>>> bede03a579b3 introduced a bug which leaves the rts5229 PCI Express card
>>> reader disabled on the Intel NUC6CAYH box.
>>>
>>> My main point, however, is that the patch is also needed in the 5.4 
>>> (longterm) and 5.8 (stable) series kernels.
>>
>> It's too late to change the commit log now that it is in my tree, but
>> once it hits Linus's tree for 5.9-rc1, I can backport it to those stable
>> trees if someone reminds me :)
>>

This is the reminder you suggested. The patch is now in Linus's tree and the 
commit id is
551b6729578a8981c46af964c10bf7d5d9ddca83.

Chris
> 
> Thanks, Greg. I'll send the reminder.
> 
> Chris
>> thanks,
>>
>> greg k-h
>>


Re: [PATCH] misc: rtsx: do not setting OC_POWER_DOWN reg in rtsx_pci_init_ocp()

2020-09-18 Thread Chris Clayton
Mmm, gmail on android seems to have snuck some html into my reply, so here goes 
again...

On 14/09/2020 16:58, Greg KH wrote:
> On Sun, Sep 13, 2020 at 09:40:56AM +0100, Chris Clayton wrote:
>> Hi Greg and Arnd,
>>
>> On 24/08/2020 04:00, ricky...@realtek.com wrote:
>>> From: Ricky Wu 
>>>
>>> this power saving action in rtsx_pci_init_ocp() cause INTEL-NUC6 platform
>>> missing card reader
>>>
>>
>> In his changelog above, Ricky didn't mention that this patch fixes a 
>> regression that was introduced (in 5.1) by commit
>> bede03a579b3.
>>
>> The patch that I posted to LKML contained the appropriate Fixes, etc tags. 
>> After discussion, the patch was changed to
>> remove the code that effectively disables the RTS5229 cardreader on (at 
>> least some) Intel NUC boxes. I prepared the
>> patch that Ricky submitted but he didn't include my Signed-off-by or the 
>> Fixes tag. I think the following needs to be
>> added to the changelog.
>>
>> Fixes: bede03a579b3 ("misc: rtsx: Enable OCP for rts522a rts524a rts525a 
>> rts5260")
>> Link: https://marc.info/?l=linux-kernel&m=159105912832257
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=204003
>> Signed-off-by: Chris Clayton 
>>
>> bede03a579b3 introduced a bug which leaves the rts5229 PCI Express card
>> reader disabled on the Intel NUC6CAYH box.
>>
>> My main point, however, is that the patch is also needed in the 5.4 
>> (longterm) and 5.8 (stable) series kernels.
> 
> It's too late to change the commit log now that it is in my tree, but
> once it hits Linus's tree for 5.9-rc1, I can backport it to those stable
> trees if someone reminds me :)
> 

Thanks, Greg. I'll send the reminder.

Chris
> thanks,
> 
> greg k-h
> 


Re: [PATCH] misc: rtsx: do not setting OC_POWER_DOWN reg in rtsx_pci_init_ocp()

2020-09-14 Thread Chris Clayton
Thanks Bjorn.

On 13/09/2020 17:49, Bjorn Helgaas wrote:
> On Sun, Sep 13, 2020 at 09:40:56AM +0100, Chris Clayton wrote:
>> Hi Greg and Arnd,
>>
>> On 24/08/2020 04:00, ricky...@realtek.com wrote:
>>> From: Ricky Wu 
>>>
>>> this power saving action in rtsx_pci_init_ocp() cause INTEL-NUC6 platform
>>> missing card reader
>>
>> In his changelog above, Ricky didn't mention that this patch fixes a
>> regression that was introduced (in 5.1) by commit bede03a579b3.
>>
>> The patch that I posted to LKML contained the appropriate Fixes, etc
>> tags. After discussion, the patch was changed to remove the code
>> that effectively disables the RTS5229 cardreader on (at least some)
>> Intel NUC boxes. I prepared the patch that Ricky submitted but he
>> didn't include my Signed-off-by or the Fixes tag. I think the
>> following needs to be added to the changelog.
>>
>> Fixes: bede03a579b3 ("misc: rtsx: Enable OCP for rts522a rts524a rts525a 
>> rts5260")
>> Link: https://marc.info/?l=linux-kernel&m=159105912832257
> 
> Better lore link:
> 
>   Link: 
> https://lore.kernel.org/r/CACYmiSer8FA+qjh8NHZJ2maxSd-=rwddz2f7_-e4um1nxuz...@mail.gmail.com/
> 
> But I'm not sure the above is the most relevant.  Seems like the one
> below is more to the point since it has the exact patch below and is
> part of a thread developing it:
> 
>   Link: 
> https://lore.kernel.org/r/20da8b4b-8426-9568-c0f1-4d1c2006c...@googlemail.com/
> 

Yes, I meant to change the quote to that thread but ... more haste less speed.

>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=204003
>> Signed-off-by: Chris Clayton 
>>
>> bede03a579b3 introduced a bug which leaves the rts5229 PCI Express
>> card reader disabled on the Intel NUC6CAYH box.
>>
>> My main point, however, is that the patch is also needed in the 5.4
>> (longterm) and 5.8 (stable) series kernels.
> 
> This would be accomplished by:
> 
> Cc: sta...@vger.kernel.org # v5.1+
> 

Thanks for the tip.

I'm about to set off on a 4-day journey, so I'll send an updated patch when I
return on Friday.


>>> Signed-off-by: Ricky Wu 
>>> ---
>>>  drivers/misc/cardreader/rtsx_pcr.c | 4 ----
>>>  1 file changed, 4 deletions(-)
>>>
>>> diff --git a/drivers/misc/cardreader/rtsx_pcr.c b/drivers/misc/cardreader/rtsx_pcr.c
>>> index 37ccc67f4914..3a4a7b0cc098 100644
>>> --- a/drivers/misc/cardreader/rtsx_pcr.c
>>> +++ b/drivers/misc/cardreader/rtsx_pcr.c
>>> @@ -1155,10 +1155,6 @@ void rtsx_pci_init_ocp(struct rtsx_pcr *pcr)
>>>  			rtsx_pci_write_register(pcr, REG_OCPGLITCH,
>>>  				SD_OCP_GLITCH_MASK, pcr->hw_param.ocp_glitch);
>>>  			rtsx_pci_enable_ocp(pcr);
>>> -		} else {
>>> -			/* OC power down */
>>> -			rtsx_pci_write_register(pcr, FPDCTL, OC_POWER_DOWN,
>>> -				OC_POWER_DOWN);
>>>  		}
>>>  	}
>>>  }
>>>


Re: [PATCH] misc: rtsx: do not setting OC_POWER_DOWN reg in rtsx_pci_init_ocp()

2020-09-13 Thread Chris Clayton
Hi Greg and Arnd,

On 24/08/2020 04:00, ricky...@realtek.com wrote:
> From: Ricky Wu 
> 
> this power saving action in rtsx_pci_init_ocp() cause INTEL-NUC6 platform
> missing card reader
> 

In his changelog above, Ricky didn't mention that this patch fixes a regression 
that was introduced (in 5.1) by commit
bede03a579b3.

The patch that I posted to LKML contained the appropriate Fixes, etc tags. 
After discussion, the patch was changed to
remove the code that effectively disables the RTS5229 cardreader on (at least 
some) Intel NUC boxes. I prepared the
patch that Ricky submitted but he didn't include my Signed-off-by or the Fixes 
tag. I think the following needs to be
added to the changelog.

Fixes: bede03a579b3 ("misc: rtsx: Enable OCP for rts522a rts524a rts525a 
rts5260")
Link: https://marc.info/?l=linux-kernel&m=159105912832257
Link: https://bugzilla.kernel.org/show_bug.cgi?id=204003
Signed-off-by: Chris Clayton 

bede03a579b3 introduced a bug which leaves the rts5229 PCI Express card reader
disabled on the Intel NUC6CAYH box.

My main point, however, is that the patch is also needed in the 5.4 (longterm) 
and 5.8 (stable) series kernels.

Thanks.


> Signed-off-by: Ricky Wu 
> ---
>  drivers/misc/cardreader/rtsx_pcr.c | 4 ----
>  1 file changed, 4 deletions(-)
>
> diff --git a/drivers/misc/cardreader/rtsx_pcr.c b/drivers/misc/cardreader/rtsx_pcr.c
> index 37ccc67f4914..3a4a7b0cc098 100644
> --- a/drivers/misc/cardreader/rtsx_pcr.c
> +++ b/drivers/misc/cardreader/rtsx_pcr.c
> @@ -1155,10 +1155,6 @@ void rtsx_pci_init_ocp(struct rtsx_pcr *pcr)
>  			rtsx_pci_write_register(pcr, REG_OCPGLITCH,
>  				SD_OCP_GLITCH_MASK, pcr->hw_param.ocp_glitch);
>  			rtsx_pci_enable_ocp(pcr);
> -		} else {
> -			/* OC power down */
> -			rtsx_pci_write_register(pcr, FPDCTL, OC_POWER_DOWN,
> -				OC_POWER_DOWN);
>  		}
>  	}
>  }
> 


Re: PATCH: rtsx_pci driver - don't disable the rts5229 card reader on Intel NUC boxes

2020-08-22 Thread Chris Clayton



Hi Ricky.

On 05/08/2020 13:48, Chris Clayton wrote:
> Hi Ricky

>>
>> Ah, OK. I'll prepare the patch and send it to you once I've tested it.
>>
> 
> Please see the patch included below. As you suggested, it removes the code 
> that does the OC power down on card readers
> that are not members of your A series. The patch is against a fresh pull of 
> Linus's tree this morning
> (v5.8-2768-g4da9f3302615).
> 
> I've tested the resultant kernel on my Intel NUC6CAYH box (which contains an 
> NUC66AYB board) and the rts5229 works fine.
> I've also tested it on my laptop which also has a card reader supported by 
> the rtsx_pci driver (an RTL8411B) and that
> also works fine.
> 
> As I mentioned yesterday, I think it's a candidate for the 5.4 and 5.7 stable
> series.
> 
> Thanks for your patience!
> 
> Chris
> 
> Signed-off-by: Chris Clayton 
> 
> --- a/drivers/misc/cardreader/rtsx_pcr.c	2020-08-05 07:10:21.752072515 +0100
> +++ b/drivers/misc/cardreader/rtsx_pcr.c	2020-08-05 07:11:05.568074278 +0100
> @@ -1172,10 +1172,6 @@ void rtsx_pci_init_ocp(struct rtsx_pcr *
>  			rtsx_pci_write_register(pcr, REG_OCPGLITCH,
>  				SD_OCP_GLITCH_MASK, pcr->hw_param.ocp_glitch);
>  			rtsx_pci_enable_ocp(pcr);
> -		} else {
> -			/* OC power down */
> -			rtsx_pci_write_register(pcr, FPDCTL, OC_POWER_DOWN,
> -				OC_POWER_DOWN);
>  		}
>  	}
>  }
> 
> 

Is there some problem with my patch? If you are too busy to deal with it,
perhaps I can submit it directly to Greg/Arnd.

Thanks

Chris


Re: PATCH: rtsx_pci driver - don't disable the rts5229 card reader on Intel NUC boxes

2020-08-05 Thread Chris Clayton
Hi Ricky

On 05/08/2020 06:51, Chris Clayton wrote:
> Thanks, Ricky.
> 
> On 05/08/2020 03:35, 吳昊澄 Ricky wrote:
>>
>>
>>> -----Original Message-----
>>> From: Chris Clayton [mailto:chris2...@googlemail.com]
>>> Sent: Tuesday, August 04, 2020 7:52 PM
>>> To: 吳昊澄 Ricky; gre...@linuxfoundation.org
>>> Cc: LKML; rdun...@infradead.org; philqua...@gmail.com; Arnd Bergmann
>>> Subject: Re: PATCH: rtsx_pci driver - don't disable the rts5229 card reader 
>>> on
>>> Intel NUC boxes
>>>
>>>
>>>
>>> On 04/08/2020 11:46, 吳昊澄 Ricky wrote:
>>>>> -----Original Message-----
>>>>> From: Chris Clayton [mailto:chris2...@googlemail.com]
>>>>> Sent: Tuesday, August 04, 2020 4:51 PM
>>>>> To: 吳昊澄 Ricky; gre...@linuxfoundation.org
>>>>> Cc: LKML; rdun...@infradead.org; philqua...@gmail.com; Arnd Bergmann
>>>>> Subject: Re: PATCH: rtsx_pci driver - don't disable the rts5229 card 
>>>>> reader on
>>>>> Intel NUC boxes
>>>>>
>>>>>
>>>>>
>>>>> On 04/08/2020 09:08, 吳昊澄 Ricky wrote:
>>>>>>> -----Original Message-----
>>>>>>> From: gre...@linuxfoundation.org [mailto:gre...@linuxfoundation.org]
>>>>>>> Sent: Tuesday, August 04, 2020 3:49 PM
>>>>>>> To: 吳昊澄 Ricky
>>>>>>> Cc: Chris Clayton; LKML; rdun...@infradead.org; philqua...@gmail.com;
>>>>> Arnd
>>>>>>> Bergmann
>>>>>>> Subject: Re: PATCH: rtsx_pci driver - don't disable the rts5229 card 
>>>>>>> reader
>>> on
>>>>>>> Intel NUC boxes
>>>>>>>
>>>>>>> On Tue, Aug 04, 2020 at 02:44:41AM +0000, 吳昊澄 Ricky wrote:
>>>>>>>> Hi Chris,
>>>>>>>>
>>>>>>>> rtsx_pci_write_register(pcr, FPDCTL, OC_POWER_DOWN, OC_POWER_DOWN);
>>>>>>>> This register operation saves less than 1mA of power, so if we do
>>>>>>>> not care about that 1mA we can have a patch to remove it and make
>>>>>>>> the driver compatible with the NUC6.
>>>>>>>> We tested our other card readers with it removed and did not see
>>>>>>>> any side effects.
>>>>>>>>
>>>>>>>> Hi Greg k-h,
>>>>>>>>
>>>>>>>> Do you have any comments?
>>>>>>>
>>>>>>> comments on what?  I don't know what you are responding to here, sorry.
>>>>>>>
>>>>>> Can we have a kernel patch for the NUC6? It may draw slightly more
>>>>>> power (1mA), but it will be more compatible.
>>>>>>
>>>>>
>>>>> Ricky,
>>>>>
>>>>> I don't understand why you want to completely remove the code that sets
>>>>> up the 1mA power saving. That code was there even before your patch
>>>>> (bede03a579b3b4a036003c4862cc1baa4ddc351f), so I assume it benefits some
>>>>> of the Realtek card readers. Before your patch, however,
>>>>> rtsx_pci_init_ocp() was not called for the rts5229 reader, but the patch
>>>>> introduced an unconditional call to that function into
>>>>> rtsx_pci_init_hw(), which is run for the rts5229. That is what now
>>>>> disables the card reader.
>>>>>
>>>>> Now, I don't know whether other cards are affected, although I don't
>>>>> recall seeing any reported as I searched the kernel and ubuntu bugzillas
>>>>> for any analysis of the problem. I know this is not what the patch I
>>>>> sent does, but having thought about it more, it seems to me that the
>>>>> simplest fix is to skip the new call to rtsx_pci_init_ocp() if the
>>>>> reader is an rts5229.
>>>>>
>>>>
>>>> Because we are wondering whether our other card readers that do not
>>>> belong to the A series (which my OCP patch covers) might have the same
>>>> problem on the NUC6 platform...
>>>>
>>>
>>> OK. What if we do make the new call but only for the card readers that are 
>>> in the
>>> A series? Are they the

Re: PATCH: rtsx_pci driver - don't disable the rts5229 card reader on Intel NUC boxes

2020-08-04 Thread Chris Clayton
Thanks, Ricky.

On 05/08/2020 03:35, 吳昊澄 Ricky wrote:
> 
> 
>> -----Original Message-----
>> From: Chris Clayton [mailto:chris2...@googlemail.com]
>> Sent: Tuesday, August 04, 2020 7:52 PM
>> To: 吳昊澄 Ricky; gre...@linuxfoundation.org
>> Cc: LKML; rdun...@infradead.org; philqua...@gmail.com; Arnd Bergmann
>> Subject: Re: PATCH: rtsx_pci driver - don't disable the rts5229 card reader 
>> on
>> Intel NUC boxes
>>
>>
>>
>> On 04/08/2020 11:46, 吳昊澄 Ricky wrote:
>>>> -----Original Message-----
>>>> From: Chris Clayton [mailto:chris2...@googlemail.com]
>>>> Sent: Tuesday, August 04, 2020 4:51 PM
>>>> To: 吳昊澄 Ricky; gre...@linuxfoundation.org
>>>> Cc: LKML; rdun...@infradead.org; philqua...@gmail.com; Arnd Bergmann
>>>> Subject: Re: PATCH: rtsx_pci driver - don't disable the rts5229 card 
>>>> reader on
>>>> Intel NUC boxes
>>>>
>>>>
>>>>
>>>> On 04/08/2020 09:08, 吳昊澄 Ricky wrote:
>>>>>> -----Original Message-----
>>>>>> From: gre...@linuxfoundation.org [mailto:gre...@linuxfoundation.org]
>>>>>> Sent: Tuesday, August 04, 2020 3:49 PM
>>>>>> To: 吳昊澄 Ricky
>>>>>> Cc: Chris Clayton; LKML; rdun...@infradead.org; philqua...@gmail.com;
>>>> Arnd
>>>>>> Bergmann
>>>>>> Subject: Re: PATCH: rtsx_pci driver - don't disable the rts5229 card 
>>>>>> reader
>> on
>>>>>> Intel NUC boxes
>>>>>>
>>>>>> On Tue, Aug 04, 2020 at 02:44:41AM +0000, 吳昊澄 Ricky wrote:
>>>>>>> Hi Chris,
>>>>>>>
>>>>>>> rtsx_pci_write_register(pcr, FPDCTL, OC_POWER_DOWN, OC_POWER_DOWN);
>>>>>>> This register operation saves less than 1mA of power, so if we do
>>>>>>> not care about that 1mA we can have a patch to remove it and make
>>>>>>> the driver compatible with the NUC6.
>>>>>>> We tested our other card readers with it removed and did not see
>>>>>>> any side effects.
>>>>>>>
>>>>>>> Hi Greg k-h,
>>>>>>>
>>>>>>> Do you have any comments?
>>>>>>
>>>>>> comments on what?  I don't know what you are responding to here, sorry.
>>>>>>
>>>>> Can we have a kernel patch for the NUC6? It may draw slightly more
>>>>> power (1mA), but it will be more compatible.
>>>>>
>>>>
>>>> Ricky,
>>>>
>>>> I don't understand why you want to completely remove the code that sets
>>>> up the 1mA power saving. That code was there even before your patch
>>>> (bede03a579b3b4a036003c4862cc1baa4ddc351f), so I assume it benefits some
>>>> of the Realtek card readers. Before your patch, however,
>>>> rtsx_pci_init_ocp() was not called for the rts5229 reader, but the patch
>>>> introduced an unconditional call to that function into
>>>> rtsx_pci_init_hw(), which is run for the rts5229. That is what now
>>>> disables the card reader.
>>>>
>>>> Now, I don't know whether other cards are affected, although I don't
>>>> recall seeing any reported as I searched the kernel and ubuntu bugzillas
>>>> for any analysis of the problem. I know this is not what the patch I
>>>> sent does, but having thought about it more, it seems to me that the
>>>> simplest fix is to skip the new call to rtsx_pci_init_ocp() if the
>>>> reader is an rts5229.
>>>>
>>>
>>> Because we are wondering whether our other card readers that do not
>>> belong to the A series (which my OCP patch covers) might have the same
>>> problem on the NUC6 platform...
>>>
>>
>> OK. What if we do make the new call, but only for the card readers that are
>> in the A series? Are they the ones that have PID_5nnn defines in
>> include/linux/rtsx_pci.h? Or is there another simple way of identifying
>> that a reader is a member of the A series?
>>
>> I'm thinking of something like:
>>
>> /* Return true if the reader's PCI device ID is one of the A-series IDs. */
>> static bool rtsx_pci_is_series_A(struct rtsx_pcr *pcr)
>> {
>> 	unsigned short device = pcr->pci->device;
>>
>> 	return device == PID_524A || device == PID_5249 ||
>> 	       device == PID_5250 || device == PID_525A ||
>> 	       device == PID_5260 || device == PID_5261;
>> }
>>
>> then in rtsx_pci_init_hw() change the unconditional call to:
>>
>> 	if (rtsx_pci_is_series_A(pcr))
>> 		rtsx_pci_init_ocp(pcr);
>>
>> Does that seem OK?
>>
> Previously, I wanted to remove
> else {
> 	/* OC power down */
> 	rtsx_pci_write_register(pcr, FPDCTL, OC_POWER_DOWN,
> 		OC_POWER_DOWN);
> }
> because our A-series card readers already set option->ocp_en to 1 in their
> own init_params(), so this is an easy way to fix the problem.
> 

Ah, OK. I'll prepare the patch and send it to you once I've tested it.

Chris
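
[Editorial sketch, not part of the original message.] The exchange above
settles on dropping the else branch of rtsx_pci_init_ocp(). The stand-alone
model below illustrates why that helps a reader whose ocp_en flag is not set,
such as the rts5229 discussed in this thread. Everything prefixed mock_ or
MOCK_ is hypothetical and exists only for this sketch; the bit value is a
placeholder, and the real definitions live in include/linux/rtsx_pci.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit for the sketch; not taken from the kernel headers. */
#define MOCK_OC_POWER_DOWN 0x02

/* Hypothetical stand-in for struct rtsx_pcr, reduced to what the sketch needs. */
struct mock_pcr {
	bool ocp_en;       /* option->ocp_en, set to 1 by A-series init_params() */
	bool ocp_enabled;  /* records that the enable-OCP path ran */
	uint8_t fpdctl;    /* fake FPDCTL register contents */
};

/* Masked write: only the bits selected by mask are replaced by data. */
static void mock_write_fpdctl(struct mock_pcr *pcr, uint8_t mask, uint8_t data)
{
	assert(pcr != NULL);
	pcr->fpdctl = (uint8_t)((pcr->fpdctl & ~mask) | (data & mask));
}

/* Model of the code before the fix: no OCP means "OC power down" is written. */
static void mock_init_ocp_before(struct mock_pcr *pcr)
{
	if (pcr->ocp_en) {
		pcr->ocp_enabled = true;
	} else {
		/* OC power down */
		mock_write_fpdctl(pcr, MOCK_OC_POWER_DOWN, MOCK_OC_POWER_DOWN);
	}
}

/* Model of the code after the fix: the else branch is gone, FPDCTL is untouched. */
static void mock_init_ocp_after(struct mock_pcr *pcr)
{
	if (pcr->ocp_en)
		pcr->ocp_enabled = true;
}
```

With ocp_en left at false, mock_init_ocp_before() sets the power-down bit in
the fake register, while mock_init_ocp_after() leaves it clear. Readers that
set ocp_en take the same path in both versions, which matches Ricky's point
that the A-series readers are unaffected by the removal.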


Re: PATCH: rtsx_pci driver - don't disable the rts5229 card reader on Intel NUC boxes

2020-08-04 Thread Chris Clayton



On 04/08/2020 11:46, 吳昊澄 Ricky wrote:
>> -----Original Message-----
>> From: Chris Clayton [mailto:chris2...@googlemail.com]
>> Sent: Tuesday, August 04, 2020 4:51 PM
>> To: 吳昊澄 Ricky; gre...@linuxfoundation.org
>> Cc: LKML; rdun...@infradead.org; philqua...@gmail.com; Arnd Bergmann
>> Subject: Re: PATCH: rtsx_pci driver - don't disable the rts5229 card reader 
>> on
>> Intel NUC boxes
>>
>>
>>
>> On 04/08/2020 09:08, 吳昊澄 Ricky wrote:
>>>> -----Original Message-----
>>>> From: gre...@linuxfoundation.org [mailto:gre...@linuxfoundation.org]
>>>> Sent: Tuesday, August 04, 2020 3:49 PM
>>>> To: 吳昊澄 Ricky
>>>> Cc: Chris Clayton; LKML; rdun...@infradead.org; philqua...@gmail.com;
>> Arnd
>>>> Bergmann
>>>> Subject: Re: PATCH: rtsx_pci driver - don't disable the rts5229 card 
>>>> reader on
>>>> Intel NUC boxes
>>>>
>>>> On Tue, Aug 04, 2020 at 02:44:41AM +0000, 吳昊澄 Ricky wrote:
>>>>> Hi Chris,
>>>>>
>>>>> rtsx_pci_write_register(pcr, FPDCTL, OC_POWER_DOWN, OC_POWER_DOWN);
>>>>> This register operation saves less than 1mA of power, so if we do not care
>>>>> about that 1mA we can have a patch that removes it, for compatibility with
>>>>> the NUC6.
>>>>> We tested our other card readers with it removed and did not see any side
>>>>> effects.
>>>>>
>>>>> Hi Greg k-h,
>>>>>
>>>>> Do you have any comments?
>>>>
>>>> comments on what?  I don't know what you are responding to here, sorry.
>>>>
>>> Can we get a patch into the kernel for the NUC6? It may draw a little more
>>> power (1mA), but it will improve compatibility.
>>>
>>
>> Ricky,
>>
>> I don't understand why you want to completely remove the code that sets up 
>> the
>> 1mA power saving. That code was there
>> even before your patch (bede03a579b3b4a036003c4862cc1baa4ddc351f), so I
>> assume it benefits some of the Realtek card
>> readers. Before your patch however, rtsx_pci_init_ocp() was not called for 
>> the
>> rts5229 reader, but the patch introduced
>> an unconditional call to that function into rtsx_pci_init_hw(), which is run 
>> for the
>> rts5229. That is what now disables
>> the card reader.
>>
>> Now, I don't know whether other cards are affected, although I don't recall
>> seeing any reported as I searched the kernel
>> and ubuntu bugzillas for any analysis of the problem. I know this is not 
>> what the
>> patch I sent does, but having thought
>> about it more, seems to me that the simplest fix is to skip the new call to
>> rtsx_pci_init_ocp() if the reader is an rts5229.
>>
> 
> Because we are wondering whether our other card readers that are not in the A
> series (which is what my OCP patch covers) might have the same problem on the
> NUC6 platform...
>  

OK. What if we do make the new call but only for the card readers that are in 
the A series? Are they the ones that have
PID_5nnn defines in include/linux/rtsx_pci.h? Or is there another simple way of 
identifying that a reader is a member of
the A series?

I'm thinking of something like:
static bool rtsx_pci_is_series_A(struct rtsx_pcr *pcr)
{
	unsigned short device = pcr->pci->device;

	return device == PID_524A || device == PID_5249 || device == PID_5250 ||
	       device == PID_525A || device == PID_5260 || device == PID_5261;
}

then in rtsx_pci_init_hw() change the unconditional call to:

	if (rtsx_pci_is_series_A(pcr))
		rtsx_pci_init_ocp(pcr);

Does that seem OK?
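
For anyone who wants to try the idea outside the driver, the check can be
sketched as a small table lookup in plain userspace C. This is only a sketch
under assumptions: the PID_* values are assumed to equal the number in each
name rather than being copied from include/linux/rtsx_pci.h, is_series_a_pid()
is a hypothetical name, and whether this list really is the A series is the
open question asked above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values: each ID is taken to equal the number in its name. */
#define PID_5249 0x5249
#define PID_524A 0x524A
#define PID_5250 0x5250
#define PID_525A 0x525A
#define PID_5260 0x5260
#define PID_5261 0x5261

/*
 * Hypothetical userspace stand-in for the proposed driver helper.
 * The caller passes the PCI device ID directly instead of a pcr pointer.
 */
static bool is_series_a_pid(uint16_t device)
{
	/* Candidate list from the proposal; not confirmed as "A series". */
	static const uint16_t series_a[] = {
		PID_5249, PID_524A, PID_5250, PID_525A, PID_5260, PID_5261,
	};
	size_t i;

	for (i = 0; i < sizeof(series_a) / sizeof(series_a[0]); i++)
		if (series_a[i] == device)
			return true;
	return false;
}
```

A table keeps the list in one place, so adding or dropping an ID is a one-line
change, and 0x5229 is rejected simply because it is not listed.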

>> If you agree, I can prepare a patch and send it to you. Whatever the 
>> solution is, it
>> will also be needed in the stable
>> kernels later than 5.0.
>>
> 
> OK, I agree with you. For now we can patch only the rts5229 so that it works
> for NUC6 users.
> 
> Thank you 
> 
>> Chris
>>>> greg k-h
>>>>
>>>> --Please consider the environment before printing this e-mail.


Re: PATCH: rtsx_pci driver - don't disable the rts5229 card reader on Intel NUC boxes

2020-08-04 Thread Chris Clayton



On 04/08/2020 09:08, 吳昊澄 Ricky wrote:
>> -Original Message-
>> From: gre...@linuxfoundation.org [mailto:gre...@linuxfoundation.org]
>> Sent: Tuesday, August 04, 2020 3:49 PM
>> To: 吳昊澄 Ricky
>> Cc: Chris Clayton; LKML; rdun...@infradead.org; philqua...@gmail.com; Arnd
>> Bergmann
>> Subject: Re: PATCH: rtsx_pci driver - don't disable the rts5229 card reader 
>> on
>> Intel NUC boxes
>>
>> On Tue, Aug 04, 2020 at 02:44:41AM +0000, 吳昊澄 Ricky wrote:
>>> Hi Chris,
>>>
>>> rtsx_pci_write_register(pcr, FPDCTL, OC_POWER_DOWN, OC_POWER_DOWN);
>>> This register operation saves less than 1mA of power, so if we do not care
>>> about that 1mA we can have a patch that removes it, for compatibility with
>>> the NUC6.
>>> We tested our other card readers with it removed and did not see any side
>>> effects.
>>>
>>> Hi Greg k-h,
>>>
>>> Do you have any comments?
>>
>> comments on what?  I don't know what you are responding to here, sorry.
>>
> Can we get a patch into the kernel for the NUC6? It may draw a little more
> power (1mA), but it will improve compatibility.
> 

Ricky,

I don't understand why you want to completely remove the code that sets up the 
1mA power saving. That code was there
even before your patch (bede03a579b3b4a036003c4862cc1baa4ddc351f), so I assume 
it benefits some of the Realtek card
readers. Before your patch however, rtsx_pci_init_ocp() was not called for the 
rts5229 reader, but the patch introduced
an unconditional call to that function into rtsx_pci_init_hw(), which is run 
for the rts5229. That is what now disables
the card reader.
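
To make that failure path concrete, here is a minimal userspace sketch of the
decision as it is described in this thread. Everything in it is a stand-in:
struct reader_model, enum ocp_outcome and model_init_ocp() are hypothetical
names, and the branch order is an assumption based on the state reported for
the rts5229 (no init_ocp callback, ocp_en set to 0) and on the patch hunk, not
a copy of the driver code.

```c
#include <stdbool.h>

/* Hypothetical outcome labels; the driver itself has no such enum. */
enum ocp_outcome {
	OCP_CHIP_SPECIFIC,	/* a chip-specific init_ocp callback runs */
	OCP_GENERIC_ENABLE,	/* generic path programs and enables OCP */
	OCP_POWER_DOWN,		/* OC_POWER_DOWN is written to FPDCTL */
};

/* Simplified stand-in for the two fields the decision depends on. */
struct reader_model {
	bool has_init_ocp;	/* models pcr->ops->init_ocp != NULL */
	bool ocp_en;		/* models pcr->option.ocp_en != 0 */
};

/* Models which branch an unconditional rtsx_pci_init_ocp() call ends in. */
static enum ocp_outcome model_init_ocp(const struct reader_model *r)
{
	if (r->has_init_ocp)
		return OCP_CHIP_SPECIFIC;
	if (r->ocp_en)
		return OCP_GENERIC_ENABLE;
	return OCP_POWER_DOWN;
}
```

With both fields false, which is the state reported for the rts5229, the model
lands on OCP_POWER_DOWN, the branch that leaves the reader unusable on the NUC.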

Now, I don't know whether other cards are affected, although I don't recall
seeing any reported as I searched the kernel and Ubuntu bugzillas for any
analysis of the problem. I know this is not what the patch I sent does, but
having thought about it more, it seems to me that the simplest fix is to skip
the new call to rtsx_pci_init_ocp() if the reader is an rts5229.

If you agree, I can prepare a patch and send it to you. Whatever the solution 
is, it will also be needed in the stable
kernels later than 5.0.

Chris
>> greg k-h
>>


Re: PATCH: rtsx_pci driver - don't disable the rts5229 card reader on Intel NUC boxes

2020-08-02 Thread Chris Clayton
Hi, Ricky

On 03/08/2020 04:01, 吳昊澄 Ricky wrote:
> Hi Chris,
> 
> We don't think this is our bug...
> Writing OC_POWER_DOWN to this register (FPDCTL) is part of our power saving
> feature; it is not meant to disable the reader.
> As we mentioned before, we cannot reproduce your case on our side because we
> don't have the platform (Intel NUC Tall Arches Canyon NUC6CAYH, Celeron
> J3455) on which to see this issue.
> But we think this issue may occur only on this platform; our RTS5229 works
> well with the new kernel on all the platforms that we have.
> 
> Ricky

Perhaps I should have used the word regression rather than bug. I didn't buy
the machine until earlier this year, but other people who have reported this
problem have indicated that until bede03a579b3 was applied (during the 5.1
merge window), the driver supported the card reader on the Intel NUC boxes. I
know that is true because I built and tested a 5.0 kernel and the card reader
worked fine. I've also built and tested a 5.1-rc1 kernel and the card reader
no longer works. Whether by design or by accident, the card reader worked and
now it doesn't. That's a regression in my book!

Since you signed off the patch that caused the regression, I believe it is your 
bug.

Thanks.

Chris
> 
>> -Original Message-
>> From: Chris Clayton [mailto:chris2...@googlemail.com]
>> Sent: Monday, August 03, 2020 3:59 AM
>> To: LKML; 吳昊澄 Ricky; gre...@linuxfoundation.org; rdun...@infradead.org;
>> philqua...@gmail.com; Arnd Bergmann
>> Subject: Re: PATCH: rtsx_pci driver - don't disable the rts5229 card reader 
>> on
>> Intel NUC boxes
>>
>> Sorry, I should have said that the patch is against 5.7.12. It applies to 
>> upstream,
>> but with offsets.
>>
>> On 02/08/2020 20:48, Chris Clayton wrote:
>>> bede03a579b3 introduced a bug which leaves the rts5229 PCI Express card
>>> reader disabled on my Intel NUC6CAYH box.
>>>
>>> The bug is in drivers/misc/cardreader/rtsx_pcr.c. A call to
>>> rtsx_pci_init_ocp() was added to rtsx_pci_init_hw().
>>> At the call point, pcr->ops->init_ocp is NULL and pcr->option.ocp_en is 0,
>>> so in rtsx_pci_init_ocp() the cardreader gets disabled.
>>>
>>> I've avoided this by making execution of the code that results in the reader
>>> being disabled conditional on the device not being an RTS5229. Of course,
>>> other rtsxxx card readers may also be disabled by this bug. I don't have the
>>> knowledge to address that, so I'll leave that to the driver maintainers.
>>>
>>> The patch to avoid the bug is attached.
>>>
>>> Fixes: bede03a579b3 ("misc: rtsx: Enable OCP for rts522a rts524a rts525a
>> rts5260")
>>> Link: https://marc.info/?l=linux-kernel&m=159105912832257
>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=204003
>>> Signed-off-by: Chris Clayton 
>>>
>>> Chris
>>>
>>


Re: PATCH: rtsx_pci driver - don't disable the rts5229 card reader on Intel NUC boxes

2020-08-02 Thread Chris Clayton
Sorry, I should have said that the patch is against 5.7.12. It applies to 
upstream, but with offsets.

On 02/08/2020 20:48, Chris Clayton wrote:
> bede03a579b3 introduced a bug which leaves the rts5229 PCI Express card
> reader disabled on my Intel NUC6CAYH box.
> 
> The bug is in drivers/misc/cardreader/rtsx_pcr.c. A call to
> rtsx_pci_init_ocp() was added to rtsx_pci_init_hw().
> At the call point, pcr->ops->init_ocp is NULL and pcr->option.ocp_en is 0, so
> in rtsx_pci_init_ocp() the cardreader gets disabled.
> 
> I've avoided this by making execution of the code that results in the reader
> being disabled conditional on the device not being an RTS5229. Of course,
> other rtsxxx card readers may also be disabled by this bug. I don't have the
> knowledge to address that, so I'll leave that to the driver maintainers.
> 
> The patch to avoid the bug is attached.
> 
> Fixes: bede03a579b3 ("misc: rtsx: Enable OCP for rts522a rts524a rts525a 
> rts5260")
> Link: https://marc.info/?l=linux-kernel&m=159105912832257
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=204003
> Signed-off-by: Chris Clayton 
> 
> Chris
> 


PATCH: rtsx_pci driver - don't disable the rts5229 card reader on Intel NUC boxes

2020-08-02 Thread Chris Clayton
bede03a579b3 introduced a bug which leaves the rts5229 PCI Express card reader
disabled on my Intel NUC6CAYH box.

The bug is in drivers/misc/cardreader/rtsx_pcr.c. A call to rtsx_pci_init_ocp()
was added to rtsx_pci_init_hw().
At the call point, pcr->ops->init_ocp is NULL and pcr->option.ocp_en is 0, so
in rtsx_pci_init_ocp() the cardreader gets disabled.

I've avoided this by making execution of the code that results in the reader
being disabled conditional on the device not being an RTS5229. Of course, other
rtsxxx card readers may also be disabled by this bug. I don't have the
knowledge to address that, so I'll leave that to the driver maintainers.

The patch to avoid the bug is attached.

Fixes: bede03a579b3 ("misc: rtsx: Enable OCP for rts522a rts524a rts525a 
rts5260")
Link: https://marc.info/?l=linux-kernel&m=159105912832257
Link: https://bugzilla.kernel.org/show_bug.cgi?id=204003
Signed-off-by: Chris Clayton 

Chris
--- linux-5.7.12/drivers/misc/cardreader/rtsx_pcr.c.orig	2020-08-02 13:36:50.216947944 +0100
+++ linux-5.7.12/drivers/misc/cardreader/rtsx_pcr.c	2020-08-02 18:37:30.456610731 +0100
@@ -1200,9 +1200,13 @@ void rtsx_pci_init_ocp(struct rtsx_pcr *
 				SD_OCP_GLITCH_MASK, pcr->hw_param.ocp_glitch);
 			rtsx_pci_enable_ocp(pcr);
 		} else {
-			/* OC power down */
-			rtsx_pci_write_register(pcr, FPDCTL, OC_POWER_DOWN,
-				OC_POWER_DOWN);
+			/* On (some?) Intel NUC platforms, this disables
+			 * the rts5229 cardreader, so don't do it
+			 */
+			if(!CHK_PCI_PID(pcr, 0x5229))
+				/* OC power down */
+				rtsx_pci_write_register(pcr, FPDCTL, OC_POWER_DOWN,
+					OC_POWER_DOWN);
 		}
 	}
 }


Re: Linux 5.3.6

2019-10-13 Thread Chris Clayton



On 12/10/2019 21:55, Gabriel C wrote:
> On Sat, 12 Oct 2019 at 21:16, Chris Clayton wrote:
>>
>>
>>> I'm announcing the release of the 5.3.6 kernel.
>>
>>
>> 5.3.6 build fails here with:
>>
>> arch/x86/entry/vdso/vdso64.so.dbg: undefined symbols found
>>   CC  arch/x86/kernel/cpu/mce/threshold.o
>> make[3]: *** [arch/x86/entry/vdso/Makefile:59: arch/x86/entry/vdso/vdso64.so.dbg] Error 1
>> make[3]: *** Deleting file 'arch/x86/entry/vdso/vdso64.so.dbg'
>> make[2]: *** [scripts/Makefile.build:497: arch/x86/entry/vdso] Error 2
>> make[1]: *** [scripts/Makefile.build:497: arch/x86/entry] Error 2
>> make[1]: *** Waiting for unfinished jobs
>>
> 
> What is your default linker?
> 
> Also, does make LD=ld.bfd fix that for you?
> 

Thanks, Gabriel. The default linker is gold, but your suggestion above fixed
the build. I think I'll set the default to ld.bfd.

> See https://bugzilla.kernel.org/show_bug.cgi?id=204951
> 
> BR,
> 
> Gabriel C.
> 


Re: Linux 5.3.6

2019-10-12 Thread Chris Clayton


> I'm announcing the release of the 5.3.6 kernel.


5.3.6 build fails here with:

arch/x86/entry/vdso/vdso64.so.dbg: undefined symbols found
  CC  arch/x86/kernel/cpu/mce/threshold.o
make[3]: *** [arch/x86/entry/vdso/Makefile:59: arch/x86/entry/vdso/vdso64.so.dbg] Error 1
make[3]: *** Deleting file 'arch/x86/entry/vdso/vdso64.so.dbg'
make[2]: *** [scripts/Makefile.build:497: arch/x86/entry/vdso] Error 2
make[1]: *** [scripts/Makefile.build:497: arch/x86/entry] Error 2
make[1]: *** Waiting for unfinished jobs....

Chris Clayton


Re: [PATCH] timekeeping/vsyscall: Prevent math overflow in BOOTTIME update

2019-08-22 Thread Chris Clayton
Thanks Thomas.

On 22/08/2019 12:00, Thomas Gleixner wrote:
> The VDSO update for CLOCK_BOOTTIME has an overflow issue as it shifts the
> nanoseconds based boot time offset left by the clocksource shift. That
> overflows once the boot time offset becomes large enough. As a consequence
> CLOCK_BOOTTIME in the VDSO becomes a random number causing applications to
> misbehave.
> 
> Fix it by storing a timespec64 representation of the offset when boot time
> is adjusted and add that to the MONOTONIC base time value in the vdso data
> page. Using the timespec64 representation avoids a 64bit division in the
> update code.
> 
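
The overflow described above can be illustrated with a small userspace
sketch. It only models the arithmetic: shift_overflows_u64() and
max_safe_seconds() are hypothetical helper names, and the shift of 24 used in
the note below is an assumed example value, since the real shift depends on
the clocksource in use.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* True if (ns << shift) would lose high bits in a 64-bit value. */
static bool shift_overflows_u64(uint64_t ns, unsigned int shift)
{
	if (shift == 0)
		return false;
	if (shift >= 64)
		return ns != 0;
	/* Any bit set in the top 'shift' bits is pushed out by the shift. */
	return (ns >> (64 - shift)) != 0;
}

/* Largest whole number of seconds whose shifted nanoseconds still fit. */
static uint64_t max_safe_seconds(unsigned int shift)
{
	if (shift >= 64)
		return 0;
	return (UINT64_MAX >> shift) / NSEC_PER_SEC;
}
```

With the assumed shift of 24, an offset kept purely in nanoseconds stops
fitting after 1099 seconds (a little over 18 minutes), while a tv_nsec value
below one second always fits. That is why splitting the offset into seconds
and nanoseconds, as the patch does, sidesteps the problem.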

I've tested resume from both suspend and hibernate and this patch fixes the 
problem I reported.

Tested-by: Chris Clayton 

> Fixes: 44f57d788e7d ("timekeeping: Provide a generic update_vsyscall() 
> implementation")
> Reported-by: Chris Clayton 
> Signed-off-by: Thomas Gleixner 
> ---
>  include/linux/timekeeper_internal.h |5 +
>  kernel/time/timekeeping.c   |5 +
>  kernel/time/vsyscall.c  |   22 +-
>  3 files changed, 23 insertions(+), 9 deletions(-)
> 
> --- a/include/linux/timekeeper_internal.h
> +++ b/include/linux/timekeeper_internal.h
> @@ -57,6 +57,7 @@ struct tk_read_base {
>   * @cs_was_changed_seq:  The sequence number of clocksource change events
>   * @next_leap_ktime: CLOCK_MONOTONIC time value of a pending leap-second
>   * @raw_sec: CLOCK_MONOTONIC_RAW  time in seconds
> + * @monotonic_to_boot:   CLOCK_MONOTONIC to CLOCK_BOOTTIME offset
>   * @cycle_interval:  Number of clock cycles in one NTP interval
>   * @xtime_interval:  Number of clock shifted nano seconds in one NTP
>   *   interval.
> @@ -84,6 +85,9 @@ struct tk_read_base {
>   *
>   * wall_to_monotonic is no longer the boot time, getboottime must be
>   * used instead.
> + *
> + * @monotonic_to_boottime is a timespec64 representation of @offs_boot to
> + * accelerate the VDSO update for CLOCK_BOOTTIME.
>   */
>  struct timekeeper {
>   struct tk_read_base tkr_mono;
> @@ -99,6 +103,7 @@ struct timekeeper {
>   u8  cs_was_changed_seq;
>   ktime_t next_leap_ktime;
>   u64 raw_sec;
> + struct timespec64   monotonic_to_boot;
>  
>   /* The following members are for timekeeping internal use */
>   u64 cycle_interval;
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -146,6 +146,11 @@ static void tk_set_wall_to_mono(struct t
>  static inline void tk_update_sleep_time(struct timekeeper *tk, ktime_t delta)
>  {
>   tk->offs_boot = ktime_add(tk->offs_boot, delta);
> + /*
> +  * Timespec representation for VDSO update to avoid 64bit division
> +  * on every update.
> +  */
> + tk->monotonic_to_boot = ktime_to_timespec64(tk->offs_boot);
>  }
>  
>  /*
> --- a/kernel/time/vsyscall.c
> +++ b/kernel/time/vsyscall.c
> @@ -17,7 +17,7 @@ static inline void update_vdso_data(stru
>   struct timekeeper *tk)
>  {
>   struct vdso_timestamp *vdso_ts;
> - u64 nsec;
> + u64 nsec, sec;
>  
>   vdata[CS_HRES_COARSE].cycle_last= tk->tkr_mono.cycle_last;
>   vdata[CS_HRES_COARSE].mask  = tk->tkr_mono.mask;
> @@ -45,23 +45,27 @@ static inline void update_vdso_data(stru
>   }
>   vdso_ts->nsec   = nsec;
>  
> - /* CLOCK_MONOTONIC_RAW */
> - vdso_ts = &vdata[CS_RAW].basetime[CLOCK_MONOTONIC_RAW];
> - vdso_ts->sec= tk->raw_sec;
> - vdso_ts->nsec   = tk->tkr_raw.xtime_nsec;
> + /* Copy MONOTONIC time for BOOTTIME */
> + sec = vdso_ts->sec;
> + /* Add the boot offset */
> + sec += tk->monotonic_to_boot.tv_sec;
> + nsec+= (u64)tk->monotonic_to_boot.tv_nsec << tk->tkr_mono.shift;
>  
>   /* CLOCK_BOOTTIME */
>   vdso_ts = &vdata[CS_HRES_COARSE].basetime[CLOCK_BOOTTIME];
> - vdso_ts->sec= tk->xtime_sec + tk->wall_to_monotonic.tv_sec;
> - nsec = tk->tkr_mono.xtime_nsec;
> - nsec += ((u64)(tk->wall_to_monotonic.tv_nsec +
> -ktime_to_ns(tk->offs_boot)) << tk->tkr_mono.shift);
> + vdso_ts->sec= sec;
> +
>   while (nsec >= (((u64)NSEC_PER_SEC) << tk->tkr_mono.shift)) {
>   nsec -= (((u64)NSEC_PER_SEC) << tk->tkr_mono.shift);
>   vdso_ts->sec++;
>   }
>   vdso_ts->nsec   = nsec;
>  
> + /* CLOCK_MONOTONIC_RAW */
> + vdso_ts = &vdata[CS_RAW].basetime[CLOCK_MONOTONIC_RAW];
> + vdso_ts->sec= tk->raw_sec;
> + vdso_ts->nsec   = tk->tkr_raw.xtime_nsec;
> +
>   /* CLOCK_TAI */
>   vdso_ts = &vdata[CS_HRES_COARSE].basetime[CLOCK_TAI];
>   vdso_ts->sec= tk->xtime_sec + (s64)tk->tai_offset;
> 
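
As a rough userspace model of what the new CLOCK_BOOTTIME hunk computes:
start from the MONOTONIC base, add the seconds of the offset directly, add
only the sub-second part in shifted form, then carry whole seconds. The
struct and function names here are hypothetical, and the sketch assumes a
shift small enough that NSEC_PER_SEC << shift fits in 64 bits.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical stand-in for one vdso timestamp entry. */
struct ts_model {
	uint64_t sec;
	uint64_t nsec;	/* nanoseconds, left-shifted by the clocksource shift */
};

/*
 * mono_sec / mono_nsec_shifted: the already normalized MONOTONIC base.
 * off_sec / off_nsec: the boot offset split as in a timespec64
 * (off_nsec is below NSEC_PER_SEC).
 * Assumes shift <= 34 so that NSEC_PER_SEC << shift cannot overflow.
 */
static struct ts_model boottime_base(uint64_t mono_sec,
				     uint64_t mono_nsec_shifted,
				     uint64_t off_sec, uint32_t off_nsec,
				     unsigned int shift)
{
	const uint64_t one_sec = NSEC_PER_SEC << shift;
	struct ts_model ts;

	ts.sec = mono_sec + off_sec;
	ts.nsec = mono_nsec_shifted + ((uint64_t)off_nsec << shift);

	/* Carry whole seconds out of the shifted nanosecond field. */
	while (ts.nsec >= one_sec) {
		ts.nsec -= one_sec;
		ts.sec++;
	}
	return ts;
}
```

Because both nanosecond inputs are below one second, the loop runs at most
once, and the large part of the offset never passes through the shift.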


Re: PROBLEM: 5.3.0-rc* causes iwlwifi failure

2019-08-22 Thread Chris Clayton
Thanks, Stuart.

On 18/08/2019 11:55, Stuart Little wrote:
> On Sun, Aug 18, 2019 at 09:17:59AM +0100, Chris Clayton wrote:
>>
>>
>> On 17/08/2019 22:44, Stuart Little wrote:
>>> After some private coaching from Serge Belyshev on git-revert I can confirm 
>>> that reverting that commit atop the current tree resolves the issue (the 
>>> wifi card scans for and finds networks just fine, no dmesg errors reported, 
>>> etc.).
>>>
>>
>> I've reported the "Microcode SW error detected" issue too, but, wrongly, 
>> only to LKML. I'll point that thread to this
>> one. I've also been experiencing my network stopping working after suspend 
>> resume, but haven't got round to reporting
>> that yet.
>>
>> What was the git magic that you acquired to revert the patch, please?
>>

By following the advice below, I reverted 
4fd445a2c855bbcab81fbe06d110e78dbd974a5b and using the resultant kernel I
haven't seen the "Microcode SW error detected" again. I am, however, still 
experiencing the problem of my network not
working after resume from suspend. I've reported it to LKML, so it can be 
followed there should anyone need/want to.

> 
> $ git revert 
> 
> This will fail as noted, but will place you in a revert mode where you can
> fix the errors.
> 
> $ git status
> 
> will show (it did in my case, for the latest Linux tree at the time I did 
> this) a modified file
> 
> drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> 
> to be committed without issue and a conflicted file
> 
> drivers/net/wireless/intel/iwlwifi/mvm/nvm.c
> 
> whose conflicts you have to first resolve.
> 
> I then opened that conflicted file in a text editor and simply removed 
> everything between the lines
> 
> <<<<<<< HEAD
> 
> and 
> 
> >>>>>>> parent of 4fd445a2c855... iwlwifi: mvm: Add log information about SAR status
> 
> (inclusive). This resolved the conflict, whereupon
> 
> $ git revert --continue
> 
> and
> 
> $ git commit -a
> 
> will finish the reversion. 
> 
>>> On Sat, Aug 17, 2019 at 11:59:59AM +0300, Serge Belyshev wrote:
>>>>
>>>>> I am on an Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz running Linux
>>>>> x86_64 (Slackware), with a custom-compiled 5.3.0-rc4 (.config
>>>>> attached).
>>>>>
>>>>> I am using the Intel wifi adapter on this machine:
>>>>>
>>>>> 02:00.0 Network controller: Intel Corporation Device 24fb (rev 10)
>>>>>
>>>>> with the iwlwifi driver. I am attaching the output to 'lspci -vv -s
>>>>> 02:00.0' as the file device-info.
>>>>>
>>>>> All 5.3.0-rc* versions I have tried (including rc4) cause multiple
>>>>> dmesg iwlwifi-related errors (dmesg attached). Examples:
>>>>>
>>>>> iwlwifi 0000:02:00.0: Failed to get geographic profile info -5
>>>>> iwlwifi 0000:02:00.0: Microcode SW error detected.  Restarting 0x8200
>>>>> iwlwifi 0000:02:00.0: 0x0038 | BAD_COMMAND
>>>>>
>>>>
>>>> I have my logs filled with similar garbage throughout 5.3-rc*. Also
>>>> since 5.3-rcsomething not only it WARNS in dmesg about firmware failure,
>>>> but completely stops working after suspend/resume cycle.
>>>>
>>>> It looks like that:
>>>>
>>>> commit 4fd445a2c855bbcab81fbe06d110e78dbd974a5b
>>>> Author: Haim Dreyfuss 
>>>> Date:   Thu May 2 11:45:02 2019 +0300
>>>>
>>>> iwlwifi: mvm: Add log information about SAR status
>>>> 
>>>> Inform users when SAR status is changing.
>>>> 
>>>> Signed-off-by: Haim Dreyfuss 
>>>> Signed-off-by: Luca Coelho 
>>>>
>>>>
>>>> is the culprit. (manually) reverting it on top of 5.3-rc4 makes
>>>> everything work again.
>>>


Regression in 5.3-rc1 and later

2019-08-21 Thread Chris Clayton
Hi everyone,

Firstly, apologies to anyone on the long cc list that turns out not to be 
particularly interested in the following, but
you were all marked as cc'd in the commit message below.

I've found a problem that isn't present in 5.2 series or 4.19 series kernels, 
and seems to have arrived in 5.3-rc1. The
problem is that if I suspend (to ram) my laptop, on resume 14 minutes or more 
after suspending, I have no networking
functionality. If I resume the laptop after 13 minutes or less, networking 
works fine. I haven't tried to get finer
grained timings between 13 and 14 minutes, but can do if it would help.

ifconfig shows that wlan0 is still up and still has its assigned IP address
but, for instance, a ping of any other device on my network fails, as does
pinging, say, kernel.org. I've tried "downing" the network (with /sbin/ifdown),
unloading the iwlmvm module, then reloading the module and "upping" the
network (with /sbin/ifup), but my network is still unusable. I should add that
the problem also manifests if I hibernate the laptop, although my testing of
this has been minimal. I can do more if required.

As I say, the problem first appears in 5.3-rc1, so I've bisected between 5.2.0 
and 5.3-rc1 and that concluded with:

[chris:~/kernel/linux]$ git bisect good
7ac8707479886c75f353bfb6a8273f423cfccb23 is the first bad commit
commit 7ac8707479886c75f353bfb6a8273f423cfccb23
Author: Vincenzo Frascino 
Date:   Fri Jun 21 10:52:49 2019 +0100

x86/vdso: Switch to generic vDSO implementation

The x86 vDSO library requires some adaptations to take advantage of the
newly introduced generic vDSO library.

Introduce the following changes:
 - Modification of vdso.c to be compliant with the common vdso datapage
 - Use of lib/vdso for gettimeofday

[ tglx: Massaged changelog and cleaned up the function signature formatting ]

Signed-off-by: Vincenzo Frascino 
Signed-off-by: Thomas Gleixner 
Cc: linux-a...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-m...@vger.kernel.org
Cc: linux-kselft...@vger.kernel.org
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Arnd Bergmann 
Cc: Russell King 
Cc: Ralf Baechle 
Cc: Paul Burton 
Cc: Daniel Lezcano 
Cc: Mark Salyzyn 
Cc: Peter Collingbourne 
Cc: Shuah Khan 
Cc: Dmitry Safonov <0x7f454...@gmail.com>
Cc: Rasmus Villemoes 
Cc: Huw Davies 
Cc: Shijith Thotton 
Cc: Andre Przywara 
Link: https://lkml.kernel.org/r/20190621095252.32307-23-vincenzo.frasc...@arm.com

 arch/x86/Kconfig |   3 +
 arch/x86/entry/vdso/Makefile |   9 ++
 arch/x86/entry/vdso/vclock_gettime.c | 245 ---
 arch/x86/entry/vdso/vdsox32.lds.S|   1 +
 arch/x86/entry/vsyscall/Makefile |   2 -
 arch/x86/entry/vsyscall/vsyscall_gtod.c  |  83 ---
 arch/x86/include/asm/pvclock.h   |   2 +-
 arch/x86/include/asm/vdso/gettimeofday.h | 191 
 arch/x86/include/asm/vdso/vsyscall.h |  44 ++
 arch/x86/include/asm/vgtod.h |  75 +-
 arch/x86/include/asm/vvar.h  |   7 +-
 arch/x86/kernel/pvclock.c|   1 +
 12 files changed, 284 insertions(+), 379 deletions(-)
 delete mode 100644 arch/x86/entry/vsyscall/vsyscall_gtod.c
 create mode 100644 arch/x86/include/asm/vdso/gettimeofday.h
 create mode 100644 arch/x86/include/asm/vdso/vsyscall.h

To confirm my bisection was correct, I did a git checkout of
7ac8707479886c75f353bfb6a8273f423cfccb23. As expected, the kernel exhibited
the problem I've described. However, a kernel built at the immediately
preceding (parent?) commit (bfe801ebe84f42b4666d3f0adde90f504d56e35b) has a
working network after a (>= 14 minute) suspend/resume cycle.

As the module name implies, I'm using wireless networking. The hardware is 
detected as "Intel(R) Wireless-AC 9260
160MHz, REV=0x324" by iwlwifi.

I'm more than happy to provide additional diagnostics (but may need a little 
hand-holding) and to apply diagnostic or
fix patches, but please cc me on any reply as I'm not subscribed to any of the 
kernel-related mailing lists.

Chris


Re: iwlwifi: microcode SW error detected

2019-08-20 Thread Chris Clayton



On 18/08/2019 09:21, Chris Clayton wrote:
> 
> 
> On 17/08/2019 08:19, Chris Clayton wrote:
>> Hi.
>>
>> I just found the following error in the output from dmesg.
>>
>> [ 4023.460058] iwlwifi 0000:02:00.0: Microcode SW error detected. Restarting 0x0.
> 
> Since reporting, I've found that this problem is being explored in the thread 
> that starts at
> https://marc.info/?l=linux-kernel&m=15660151913.

Mmm, that's a dead link. Don't know what happened there, but the real link is
https://marc.info/?l=linux-kernel&m=156265244614126

> 
> Chris
> 


linux-5.3.0-rc5: new build warning

2019-08-18 Thread Chris Clayton
Hi,

I've just built 5.3.0-rc5 and a warning that I do not recall having seen before 
was emitted:

...
  HOSTCC  scripts/extract-cert
  HOSTCC   /mnt/kernel/linux/tools/objtool/fixdep.o
  HOSTLD  arch/x86/tools/relocs
  HOSTLD   /mnt/kernel/linux/tools/objtool/fixdep-in.o
  LINK /mnt/kernel/linux/tools/objtool/fixdep
  CC   /mnt/kernel/linux/tools/objtool/builtin-check.o
  CC   /mnt/kernel/linux/tools/objtool/builtin-orc.o
  GEN  /mnt/kernel/linux/tools/objtool/arch/x86/lib/inat-tables.c
awk: arch/x86/tools/gen-insn-attr-x86.awk:260: warning: regexp escape sequence `\:' is not a known regexp operator
awk: arch/x86/tools/gen-insn-attr-x86.awk:350: (FILENAME=arch/x86/lib/x86-opcode-map.txt FNR=41) warning: regexp escape sequence `\&' is not a known regexp operator
  CC   /mnt/kernel/linux/tools/objtool/exec-cmd.o
  CC   /mnt/kernel/linux/tools/objtool/check.o
  CC   /mnt/kernel/linux/tools/objtool/arch/x86/decode.o
  CC   /mnt/kernel/linux/tools/objtool/orc_gen.o
  CC   /mnt/kernel/linux/tools/objtool/help.o
  CC   /mnt/kernel/linux/tools/objtool/orc_dump.o
  CC   /mnt/kernel/linux/tools/objtool/pager.o
 ...


Happy to test the fix, but please cc me as I'm not subscribed

Chris


Re: iwlwifi: microcode SW error detected

2019-08-18 Thread Chris Clayton



On 17/08/2019 08:19, Chris Clayton wrote:
> Hi.
> 
> I just found the following error in the output from dmesg.
> 
> [ 4023.460058] iwlwifi 0000:02:00.0: Microcode SW error detected. Restarting 0x0.

Since reporting, I've found that this problem is being explored in the thread 
that starts at
https://marc.info/?l=linux-kernel&m=15660151913.

Chris


Re: PROBLEM: 5.3.0-rc* causes iwlwifi failure

2019-08-18 Thread Chris Clayton



On 17/08/2019 22:44, Stuart Little wrote:
> After some private coaching from Serge Belyshev on git-revert I can confirm 
> that reverting that commit atop the current tree resolves the issue (the wifi 
> card scans for and finds networks just fine, no dmesg errors reported, etc.).
> 

I've reported the "Microcode SW error detected" issue too, but, wrongly, only 
to LKML. I'll point that thread to this
one. I've also been experiencing my network stopping working after suspend 
resume, but haven't got round to reporting
that yet.

What was the git magic that you acquired to revert the patch, please?

> On Sat, Aug 17, 2019 at 11:59:59AM +0300, Serge Belyshev wrote:
>>
>>> I am on an Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz running Linux
>>> x86_64 (Slackware), with a custom-compiled 5.3.0-rc4 (.config
>>> attached).
>>>
>>> I am using the Intel wifi adapter on this machine:
>>>
>>> 02:00.0 Network controller: Intel Corporation Device 24fb (rev 10)
>>>
>>> with the iwlwifi driver. I am attaching the output to 'lspci -vv -s
>>> 02:00.0' as the file device-info.
>>>
>>> All 5.3.0-rc* versions I have tried (including rc4) cause multiple
>>> dmesg iwlwifi-related errors (dmesg attached). Examples:
>>>
>>> iwlwifi 0000:02:00.0: Failed to get geographic profile info -5
>>> iwlwifi 0000:02:00.0: Microcode SW error detected.  Restarting 0x8200
>>> iwlwifi 0000:02:00.0: 0x0038 | BAD_COMMAND
>>>
>>
>> I have my logs filled with similar garbage throughout 5.3-rc*. Also
>> since 5.3-rcsomething not only it WARNS in dmesg about firmware failure,
>> but completely stops working after suspend/resume cycle.
>>
>> It looks like that:
>>
>> commit 4fd445a2c855bbcab81fbe06d110e78dbd974a5b
>> Author: Haim Dreyfuss 
>> Date:   Thu May 2 11:45:02 2019 +0300
>>
>> iwlwifi: mvm: Add log information about SAR status
>> 
>> Inform users when SAR status is changing.
>> 
>> Signed-off-by: Haim Dreyfuss 
>> Signed-off-by: Luca Coelho 
>>
>>
>> is the culprit. (manually) reverting it on top of 5.3-rc4 makes
>> everything work again.
> 


iwlwifi: microcode SW error detected

2019-08-17 Thread Chris Clayton
Hi.

I just found the following error in the output from dmesg.

[ 4023.460058] iwlwifi 0000:02:00.0: Microcode SW error detected. Restarting 0x0.
[ 4023.460178] iwlwifi 0000:02:00.0: Start IWL Error Log Dump:
[ 4023.460179] iwlwifi 0000:02:00.0: Status: 0x0080, count: 6
[ 4023.460180] iwlwifi 0000:02:00.0: Loaded firmware version: 46.93e59cf4.0
[ 4023.460181] iwlwifi 0000:02:00.0: 0x22CE | ADVANCED_SYSASSERT
[ 4023.460182] iwlwifi 0000:02:00.0: 0x0590A2F0 | trm_hw_status0
[ 4023.460182] iwlwifi 0000:02:00.0: 0x | trm_hw_status1
[ 4023.460183] iwlwifi 0000:02:00.0: 0x00488472 | branchlink2
[ 4023.460183] iwlwifi 0000:02:00.0: 0x00479392 | interruptlink1
[ 4023.460184] iwlwifi 0000:02:00.0: 0x | interruptlink2
[ 4023.460184] iwlwifi 0000:02:00.0: 0x012C | data1
[ 4023.460185] iwlwifi 0000:02:00.0: 0x | data2
[ 4023.460186] iwlwifi 0000:02:00.0: 0x0400 | data3
[ 4023.460186] iwlwifi 0000:02:00.0: 0x42001A44 | beacon time
[ 4023.460187] iwlwifi 0000:02:00.0: 0x4E9F05CD | tsf low
[ 4023.460187] iwlwifi 0000:02:00.0: 0x00D8 | tsf hi
[ 4023.460188] iwlwifi 0000:02:00.0: 0x | time gp1
[ 4023.460188] iwlwifi 0000:02:00.0: 0xEF55F6D0 | time gp2
[ 4023.460189] iwlwifi 0000:02:00.0: 0x0001 | uCode revision type
[ 4023.460190] iwlwifi 0000:02:00.0: 0x002E | uCode version major
[ 4023.460190] iwlwifi 0000:02:00.0: 0x93E59CF4 | uCode version minor
[ 4023.460191] iwlwifi 0000:02:00.0: 0x0321 | hw version
[ 4023.460191] iwlwifi 0000:02:00.0: 0x00C89004 | board version
[ 4023.460192] iwlwifi 0000:02:00.0: 0x0A05001C | hcmd
[ 4023.460192] iwlwifi 0000:02:00.0: 0xA2F93802 | isr0
[ 4023.460193] iwlwifi 0000:02:00.0: 0x0004 | isr1
[ 4023.460193] iwlwifi 0000:02:00.0: 0x1802 | isr2
[ 4023.460194] iwlwifi 0000:02:00.0: 0x40417DCD | isr3
[ 4023.460195] iwlwifi 0000:02:00.0: 0x | isr4
[ 4023.460195] iwlwifi 0000:02:00.0: 0x0A04001C | last cmd Id
[ 4023.460196] iwlwifi 0000:02:00.0: 0x00018802 | wait_event
[ 4023.460196] iwlwifi 0000:02:00.0: 0x4A88 | l2p_control
[ 4023.460197] iwlwifi 0000:02:00.0: 0x0020 | l2p_duration
[ 4023.460197] iwlwifi 0000:02:00.0: 0x03BF | l2p_mhvalid
[ 4023.460198] iwlwifi 0000:02:00.0: 0x00EF | l2p_addr_match
[ 4023.460198] iwlwifi 0000:02:00.0: 0x000D | lmpm_pmg_sel
[ 4023.460199] iwlwifi 0000:02:00.0: 0x19071250 | timestamp
[ 4023.460199] iwlwifi 0000:02:00.0: 0x14C0E8E8 | flow_handler
[ 4023.460257] iwlwifi 0000:02:00.0: 0x | ADVANCED_SYSASSERT
[ 4023.460257] iwlwifi 0000:02:00.0: 0x | umac branchlink1
[ 4023.460258] iwlwifi 0000:02:00.0: 0x | umac branchlink2
[ 4023.460258] iwlwifi 0000:02:00.0: 0x | umac interruptlink1
[ 4023.460259] iwlwifi 0000:02:00.0: 0x | umac interruptlink2
[ 4023.460260] iwlwifi 0000:02:00.0: 0x | umac data1
[ 4023.460260] iwlwifi 0000:02:00.0: 0x | umac data2
[ 4023.460261] iwlwifi 0000:02:00.0: 0x | umac data3
[ 4023.460261] iwlwifi :02:00.0: 0x | umac major
[ 4023.460262] iwlwifi :02:00.0: 0x | umac minor
[ 4023.460262] iwlwifi :02:00.0: 0x | frame pointer
[ 4023.460263] iwlwifi :02:00.0: 0x | stack pointer
[ 4023.460263] iwlwifi :02:00.0: 0x | last host cmd
[ 4023.460264] iwlwifi :02:00.0: 0x | isr status reg
[ 4023.460278] iwlwifi :02:00.0: Fseq Registers:
[ 4023.460282] iwlwifi :02:00.0: 0x0568FC22 | FSEQ_ERROR_CODE
[ 4023.460289] iwlwifi :02:00.0: 0x | FSEQ_TOP_INIT_VERSION
[ 4023.460297] iwlwifi :02:00.0: 0xDFFC324F | FSEQ_CNVIO_INIT_VERSION
[ 4023.460304] iwlwifi :02:00.0: 0xA371 | FSEQ_OTP_VERSION
[ 4023.460312] iwlwifi :02:00.0: 0xC338B29A | FSEQ_TOP_CONTENT_VERSION
[ 4023.460319] iwlwifi :02:00.0: 0xD9E91E16 | FSEQ_ALIVE_TOKEN
[ 4023.460327] iwlwifi :02:00.0: 0xAC99E6BF | FSEQ_CNVI_ID
[ 4023.460334] iwlwifi :02:00.0: 0x07665623 | FSEQ_CNVR_ID
[ 4023.460342] iwlwifi :02:00.0: 0x01000200 | CNVI_AUX_MISC_CHIP
[ 4023.460349] iwlwifi :02:00.0: 0x01300202 | CNVR_AUX_MISC_CHIP
[ 4023.460357] iwlwifi :02:00.0: 0x485B | CNVR_SCU_SD_REGS_SD_REG_DIG_DCDC_VTRIM
[ 4023.460413] iwlwifi :02:00.0: 0x0BADCAFE | CNVR_SCU_SD_REGS_SD_REG_ACTIVE_VDIG_MIRROR
[ 4023.460421] iwlwifi :02:00.0: Collecting data: trigger 2 fired.
[ 4023.460424] ieee80211 phy0: Hardware restart was requested
[ 4024.639366] iwlwifi :02:00.0: Applying debug destination EXTERNAL_DRAM
[ 4024.753171] iwlwifi :02:00.0: Applying debug destination EXTERNAL_DRAM
[ 4024.817999] iwlwifi :02:00.0: FW already configured (0) - re-configuring
[ 4024.829374] iwlwifi :02:00.0: BIOS contains WGDS but no WRDS

The output messages from the driver when the system starts are:

[3.667365] iwlwifi :02:00.0: enabling device ( -> 0002)
[3.670357] iwlwifi :02:00.0: Found debug destination: EXTERNAL_DRAM
[3.670360] iwlwifi :02:00.0: Found debug configuration: 0
[3.670525] iwlwifi 000

Re: [PATCH v2] x86/boot: save fields explicitly, zero out everything else

2019-08-10 Thread Chris Clayton



On 31/07/2019 06:46, john.hubb...@gmail.com wrote:
> From: John Hubbard 
> 
> Recent gcc compilers (gcc 9.1) generate warnings about an
> out of bounds memset, if you try to memset across several fields
> of a struct. This generated a couple of warnings on x86_64 builds.
> 
> Fix this by explicitly saving the fields in struct boot_params
> that are intended to be preserved, and zeroing all the rest.
> 

I applied John's patch below to v5.3-rc3-285-gecb095bff5d4 and have been
running the resultant kernel for two days now, including 7 or 8 cold starts
and reboots. The warnings that were produced by gcc9 are no longer emitted
and, other than a pre-existing problem (no network after resume from suspend
or hibernate, which I will investigate and, if necessary, report later today),
the kernel has supported my typical day-to-day activities (building software,
email, browsing, listening to music, watching video) without problem.

Tested-by: Chris Clayton 

> Suggested-by: Thomas Gleixner 
> Suggested-by: H. Peter Anvin 
> Signed-off-by: John Hubbard 
> ---
>  arch/x86/include/asm/bootparam_utils.h | 62 +++---
>  1 file changed, 47 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/include/asm/bootparam_utils.h b/arch/x86/include/asm/bootparam_utils.h
> index 101eb944f13c..514aee24b8de 100644
> --- a/arch/x86/include/asm/bootparam_utils.h
> +++ b/arch/x86/include/asm/bootparam_utils.h
> @@ -18,6 +18,20 @@
>   * Note: efi_info is commonly left uninitialized, but that field has a
>   * private magic, so it is better to leave it unchanged.
>   */
> +
> +#define sizeof_mbr(type, member) ({ sizeof(((type *)0)->member); })
> +
> +#define BOOT_PARAM_PRESERVE(struct_member)				\
> +	{								\
> +		.start = offsetof(struct boot_params, struct_member),	\
> +		.len   = sizeof_mbr(struct boot_params, struct_member),	\
> +	}
> +
> +struct boot_params_to_save {
> +	unsigned int start;
> +	unsigned int len;
> +};
> +
>  static void sanitize_boot_params(struct boot_params *boot_params)
>  {
>  	/* 
> @@ -35,21 +49,39 @@ static void sanitize_boot_params(struct boot_params *boot_params)
>  	 * problems again.
>  	 */
>  	if (boot_params->sentinel) {
> -		/* fields in boot_params are left uninitialized, clear them */
> -		boot_params->acpi_rsdp_addr = 0;
> -		memset(&boot_params->ext_ramdisk_image, 0,
> -		       (char *)&boot_params->efi_info -
> -			(char *)&boot_params->ext_ramdisk_image);
> -		memset(&boot_params->kbd_status, 0,
> -		       (char *)&boot_params->hdr -
> -		       (char *)&boot_params->kbd_status);
> -		memset(&boot_params->_pad7[0], 0,
> -		       (char *)&boot_params->edd_mbr_sig_buffer[0] -
> -			(char *)&boot_params->_pad7[0]);
> -		memset(&boot_params->_pad8[0], 0,
> -		       (char *)&boot_params->eddbuf[0] -
> -			(char *)&boot_params->_pad8[0]);
> -		memset(&boot_params->_pad9[0], 0, sizeof(boot_params->_pad9));
> +		static struct boot_params scratch;
> +		char *bp_base = (char *)boot_params;
> +		char *save_base = (char *)&scratch;
> +		int i;
> +
> +		const struct boot_params_to_save to_save[] = {
> +			BOOT_PARAM_PRESERVE(screen_info),
> +			BOOT_PARAM_PRESERVE(apm_bios_info),
> +			BOOT_PARAM_PRESERVE(tboot_addr),
> +			BOOT_PARAM_PRESERVE(ist_info),
> +			BOOT_PARAM_PRESERVE(acpi_rsdp_addr),
> +			BOOT_PARAM_PRESERVE(hd0_info),
> +			BOOT_PARAM_PRESERVE(hd1_info),
> +			BOOT_PARAM_PRESERVE(sys_desc_table),
> +			BOOT_PARAM_PRESERVE(olpc_ofw_header),
> +			BOOT_PARAM_PRESERVE(efi_info),
> +			BOOT_PARAM_PRESERVE(alt_mem_k),
> +			BOOT_PARAM_PRESERVE(scratch),
> +			BOOT_PARAM_PRESERVE(e820_entries),
> +			BOOT_PARAM_PRESERVE(eddbuf_entries),
> +			BOOT_PARAM_PRESERVE(edd_mbr_sig_buf_entries),
> +			BOOT_PARAM_PRESERVE(edd_mbr_sig_buffer),
> +			BOOT_PARAM_PRESERVE(e820_table),
> +			BOOT_PARAM_PRESERVE(eddbuf),
> +		};
> +
> +		memset(&scratch, 0, sizeof(scratch));
> +
> +		for (i = 0; i < ARRAY_SIZE(to_save); i++)
> +			memcpy(save_base + to_save[i].start,
> +			       bp_base + to_save[i].start, to_save[i].len);
> +
> +		memcpy(boot_params, save_base, sizeof(*boot_params));
>  	}
>  }
>  
> 
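
The approach the patch takes (copy a listed set of fields into a zeroed scratch copy, then copy the scratch back over the original) can be sketched in ordinary userspace C. The struct, its field names and the helper names below are hypothetical stand-ins, not the kernel's struct boot_params or its macros; this is an illustration under those assumptions rather than the patch itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct boot_params; field names are illustrative. */
struct params {
	unsigned int keep_a;
	unsigned char junk1[8];
	unsigned short keep_b;
	unsigned char junk2[5];
	unsigned char sentinel;
};

/* One preserved byte range inside struct params. */
struct range {
	size_t start;
	size_t len;
};

#define PRESERVE(member) \
	{ offsetof(struct params, member), sizeof(((struct params *)0)->member) }

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Keep only the listed fields; everything else, padding included, becomes zero. */
static void sanitize(struct params *p)
{
	static const struct range to_save[] = {
		PRESERVE(keep_a),
		PRESERVE(keep_b),
	};
	struct params scratch;
	size_t i;

	memset(&scratch, 0, sizeof(scratch));

	for (i = 0; i < ARRAY_LEN(to_save); i++) {
		/* Each range must lie inside the object. */
		assert(to_save[i].start + to_save[i].len <= sizeof(scratch));
		memcpy((char *)&scratch + to_save[i].start,
		       (const char *)p + to_save[i].start, to_save[i].len);
	}

	memcpy(p, &scratch, sizeof(*p));
}
```

Every memset and memcpy here addresses either the whole object or a range computed from its base, so no call starts at one member and runs on into its neighbours.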


Re: Warnings whilst building 5.2.0+

2019-08-06 Thread Chris Clayton



On 09/07/2019 12:39, Chris Clayton wrote:
> 
> 
> On 09/07/2019 11:37, Enrico Weigelt, metux IT consult wrote:
>> On 09.07.19 08:06, Chris Clayton wrote:
>>
>> Hi,
>>
>>> I've pulled Linus' tree this morning and, after running 'make oldconfig',
>>> tried a build. During that build I got the following warnings, which look
>>> to me like they should be fixed. 'git describe' shows
>>> v5.2-915-g5ad18b2e60b7 and my compiler is the 20190706 snapshot of gcc 9.
>>
>> Thanks for the report. I'm rebuilding right now anyways, so I'll look
>> out for it.
> 
> Thanks for the reply.
> 
>>> In file included from arch/x86/kernel/head64.c:35:
>>> In function 'sanitize_boot_params',
>>> inlined from 'copy_bootdata' at arch/x86/kernel/head64.c:391:2:
>>> ./arch/x86/include/asm/bootparam_utils.h:40:3: warning: 'memset' offset [197, 448] from the object at 'boot_params' is out of the bounds of referenced subobject 'ext_ramdisk_image' with type 'unsigned int' at offset 192 [-Warray-bounds]
>>>40 |   memset(&boot_params->ext_ramdisk_image, 0,
>>>   |   ^~
>>>41 |  (char *)&boot_params->efi_info -
>>>   |  
>>>42 |(char *)&boot_params->ext_ramdisk_image);
>>>   |
>>> ./arch/x86/include/asm/bootparam_utils.h:43:3: warning: 'memset' offset [493, 497] from the object at 'boot_params' is out of the bounds of referenced subobject 'kbd_status' with type 'unsigned char' at offset 491 [-Warray-bounds]
>>>43 |   memset(&boot_params->kbd_status, 0,
>>>   |   ^~~
>>>44 |  (char *)&boot_params->hdr -
>>>   |  ~~~
>>>45 |  (char *)&boot_params->kbd_status);
>>>   |  ~
>>
>> Can you check older versions, too ? Maybe also trying older gcc ?
>>
> 
> I see the same warnings building linux-5.2.0 with gcc9. However, I don't see
> the warnings building linux-5.2.0 with the 20190705 snapshot of gcc8. So the
> warnings could result from an improvement (i.e. the problem was in the kernel,
> but undiscovered by gcc8) or from a regression in gcc9.
> 

From the discussion starting at
https://marc.info/?l=linux-kernel&m=156401014023908, it would appear that the
problem was in the kernel but undiscovered by gcc8. Building a fresh pull of
Linus' tree this morning (v5.3-rc3-282-g33920f1ec5bf), I see that the warnings
are still being emitted. Adding the participants in the other discussion to
this one.

>>
>> --mtx
>>


Re: Warnings whilst building 5.2.0+

2019-07-09 Thread Chris Clayton



On 09/07/2019 11:37, Enrico Weigelt, metux IT consult wrote:
> On 09.07.19 08:06, Chris Clayton wrote:
> 
> Hi,
> 
>> I've pulled Linus' tree this morning and, after running 'make oldconfig',
>> tried a build. During that build I got the following warnings, which look
>> to me like they should be fixed. 'git describe' shows
>> v5.2-915-g5ad18b2e60b7 and my compiler is the 20190706 snapshot of gcc 9.
> 
> Thanks for the report. I'm rebuilding right now anyways, so I'll look
> out for it.

Thanks for the reply.

>> In file included from arch/x86/kernel/head64.c:35:
>> In function 'sanitize_boot_params',
>> inlined from 'copy_bootdata' at arch/x86/kernel/head64.c:391:2:
>> ./arch/x86/include/asm/bootparam_utils.h:40:3: warning: 'memset' offset [197, 448] from the object at 'boot_params' is out of the bounds of referenced subobject 'ext_ramdisk_image' with type 'unsigned int' at offset 192 [-Warray-bounds]
>>40 |   memset(&boot_params->ext_ramdisk_image, 0,
>>   |   ^~
>>41 |  (char *)&boot_params->efi_info -
>>   |  
>>42 |(char *)&boot_params->ext_ramdisk_image);
>>   |
>> ./arch/x86/include/asm/bootparam_utils.h:43:3: warning: 'memset' offset [493, 497] from the object at 'boot_params' is out of the bounds of referenced subobject 'kbd_status' with type 'unsigned char' at offset 491 [-Warray-bounds]
>>43 |   memset(&boot_params->kbd_status, 0,
>>   |   ^~~
>>44 |  (char *)&boot_params->hdr -
>>   |  ~~~
>>45 |  (char *)&boot_params->kbd_status);
>>   |  ~
> 
> Can you check older versions, too ? Maybe also trying older gcc ?
> 

I see the same warnings building linux-5.2.0 with gcc9. However, I don't see
the warnings building linux-5.2.0 with the 20190705 snapshot of gcc8. So the
warnings could result from an improvement (i.e. the problem was in the kernel,
but undiscovered by gcc8) or from a regression in gcc9.
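
For reference, the construct the warnings point at can be sketched outside the kernel. The struct and function names below are hypothetical, not the real struct boot_params. The comment shows the flagged pattern, where the destination is one member but the length covers the members after it. The function expresses the same byte range from the start of the enclosing object. Whether a particular compiler version accepts that form without a warning is an assumption that would need checking.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout; not the real struct boot_params. */
struct demo {
	unsigned int first;        /* clearing starts here */
	unsigned char middle[12];
	unsigned int last;         /* clearing stops just before this member */
	unsigned int tail;
};

/*
 * Pattern named in the warning: memset is handed the address of a single
 * member, with a length that runs on into the following members:
 *
 *     memset(&d->first, 0, (char *)&d->last - (char *)&d->first);
 *
 * The same range, addressed relative to the whole object instead:
 */
static void clear_first_to_last(struct demo *d)
{
	size_t start = offsetof(struct demo, first);
	size_t end = offsetof(struct demo, last);

	assert(start <= end && end <= sizeof(*d));
	memset((char *)d + start, 0, end - start);
}
```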

> 
> --mtx
> 


Warnings whilst building 5.2.0+

2019-07-08 Thread Chris Clayton
Hi,

I've pulled Linus' tree this morning and, after running 'make oldconfig', tried
a build. During that build I got the following warnings, which look to me like
they should be fixed. 'git describe' shows v5.2-915-g5ad18b2e60b7 and my
compiler is the 20190706 snapshot of gcc 9.

In file included from arch/x86/kernel/head64.c:35:
In function 'sanitize_boot_params',
inlined from 'copy_bootdata' at arch/x86/kernel/head64.c:391:2:
./arch/x86/include/asm/bootparam_utils.h:40:3: warning: 'memset' offset [197, 448] from the object at 'boot_params' is out of the bounds of referenced subobject 'ext_ramdisk_image' with type 'unsigned int' at offset 192 [-Warray-bounds]
   40 |   memset(&boot_params->ext_ramdisk_image, 0,
  |   ^~
   41 |  (char *)&boot_params->efi_info -
  |  
   42 |(char *)&boot_params->ext_ramdisk_image);
  |
./arch/x86/include/asm/bootparam_utils.h:43:3: warning: 'memset' offset [493, 497] from the object at 'boot_params' is out of the bounds of referenced subobject 'kbd_status' with type 'unsigned char' at offset 491 [-Warray-bounds]
   43 |   memset(&boot_params->kbd_status, 0,
  |   ^~~
   44 |  (char *)&boot_params->hdr -
  |  ~~~
   45 |  (char *)&boot_params->kbd_status);
  |  ~


Happy to test any patches, but please cc me as I'm not subscribed to LKML.

Chris


Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-10-11 Thread Chris Clayton



On 11/10/2018 13:23, Maciej S. Szmigiero wrote:
> On 11.10.2018 10:24, Chris Clayton wrote:
>> On 11/10/2018 01:12, Maciej S. Szmigiero wrote:
>>> On 11.10.2018 00:49, Chris Clayton wrote:
>>>>> Now, knowing the "right" value you can experiment with what
>>>>> rtl_init_rxcfg() writes (under the "default:" label for your NIC model).
>>>>>
>>>>
>>>> This might be more interesting. Through a combination of viewing the
>>>> output from pr_notice() and the output from "ethtool -d", I can see
>>>> RxConfig with the following values:
>>>>
>>>>    During boot:    0x00028700
>>>>    Before suspend: 0x0002870e
>>>>    During resume:  0x00024000
>>>>    Post resume:    0x0002870e
>>>>
>>>> As I did with 4.18.10 early on in the process, I removed the call to
>>>> rtl_init_rxcfg() from rtl_hw_start() and rebuilt, installed and
>>>> rebooted. Now I see the following values:
>>>>
>>>>    During boot:    0x00028700
>>>>    Before suspend: 0x0002870e
>>>>    During resume:  0x00024000
>>>>    Post resume:    0x0002400e
>>>>
>>>
>>> Now we can finally see some difference...
>>> Besides missing RX128_INT_EN (bit 15 or 0x8000) and RX_DMA_BURST
>>> (bits 8-10 or 0x700) - that rtl_init_rxcfg() would normally set so this
>>> is kind of expected - one can see that the working configuration
>>> post-resume has bit 14 (or 0x4000) set, too.
>>>
>>> This bit is described in the driver as RX_MULTI_EN ("8111c only") and is
>>> set by rtl_init_rxcfg() for example for RTL_GIGA_MAC_VER_35.
>>>
>>> RTL_GIGA_MAC_VER_35 is described in the driver as being in the same
>>> family as your RTL_GIGA_MAC_VER_38, so can you please try the following
>>> change:
>>> --- r8169.c
>>> +++ r8169.c
>>> @@ -4271,6 +4271,7 @@ static void rtl_init_rxcfg(struct rtl816
>>>  	case RTL_GIGA_MAC_VER_18 ... RTL_GIGA_MAC_VER_24:
>>>  	case RTL_GIGA_MAC_VER_34:
>>>  	case RTL_GIGA_MAC_VER_35:
>>> +	case RTL_GIGA_MAC_VER_38:
>>>  		RTL_W32(tp, RxConfig, RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST);
>>>  		break;
>>>  	case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_51:
>>>
>>> This will add RX_MULTI_EN also for your chip model (you need to add back
>>> the call to rtl_init_rxcfg() to rtl_hw_start(), naturally).
>>>
>>
>> That's done the trick. With the above change applied, my network runs
>> fine after a suspend/resume cycle and the ping times are back in the
>> 14-15ms range.
> 
> Nice!
> 
> I will submit a patch, it would be great if you could test it and then
> add a "Tested-by:" tag.
>  

Will do, Maciej.

Thanks for solving this.
>> Chris
> 
> Maciej
> 


Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-10-11 Thread Chris Clayton



On 11/10/2018 01:12, Maciej S. Szmigiero wrote:
> On 11.10.2018 00:49, Chris Clayton wrote:
>>> Now, knowing the "right" value you can experiment with what rtl_init_rxcfg()
>>> writes (under the "default:" label for your NIC model).
>>>
>>
>> This might be more interesting. Through a combination of viewing the output
>> from pr_notice() and the output from "ethtool -d", I can see RxConfig with
>> the following values:
>>
>>    During boot:    0x00028700
>>    Before suspend: 0x0002870e
>>    During resume:  0x00024000
>>    Post resume:    0x0002870e
>>
>> As I did with 4.18.10 early on in the process, I removed the call to
>> rtl_init_rxcfg() from rtl_hw_start() and rebuilt, installed and rebooted.
>> Now I see the following values:
>>
>>    During boot:    0x00028700
>>    Before suspend: 0x0002870e
>>    During resume:  0x00024000
>>    Post resume:    0x0002400e
>>
> 
> Now we can finally see some difference...
> Besides missing RX128_INT_EN (bit 15 or 0x8000) and RX_DMA_BURST
> (bits 8-10 or 0x700) - that rtl_init_rxcfg() would normally set so this
> is kind of expected - one can see that the working configuration
> post-resume has bit 14 (or 0x4000) set, too.
> 
> This bit is described in the driver as RX_MULTI_EN ("8111c only") and is
> set by rtl_init_rxcfg() for example for RTL_GIGA_MAC_VER_35.
> 
> RTL_GIGA_MAC_VER_35 is described in the driver as being in the same
> family as your RTL_GIGA_MAC_VER_38, so can you please try the following
> change:
> --- r8169.c
> +++ r8169.c
> @@ -4271,6 +4271,7 @@ static void rtl_init_rxcfg(struct rtl816
>  	case RTL_GIGA_MAC_VER_18 ... RTL_GIGA_MAC_VER_24:
>  	case RTL_GIGA_MAC_VER_34:
>  	case RTL_GIGA_MAC_VER_35:
> +	case RTL_GIGA_MAC_VER_38:
>  		RTL_W32(tp, RxConfig, RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST);
>  		break;
>  	case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_51:
> 
> This will add RX_MULTI_EN also for your chip model (you need to add back
> the call to rtl_init_rxcfg() to rtl_hw_start(), naturally).
>

That's done the trick. With the above change applied, my network runs fine
after a suspend/resume cycle and the ping times are back in the 14-15ms range.
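
For reference, the value that the patched case writes can be worked out from the bit positions Maciej gives in the quoted text (bit 15, bit 14 and bits 8-10). The sketch below is a userspace illustration only. The enum and function name are hypothetical, the macro values are taken from the figures quoted in this thread rather than from the driver source, and returning 0 for other chips is a placeholder, not what the driver does.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as quoted in the thread. */
#define RX128_INT_EN  (1u << 15)  /* 0x8000 */
#define RX_MULTI_EN   (1u << 14)  /* 0x4000 */
#define RX_DMA_BURST  (7u << 8)   /* 0x0700, bits 8-10 */

static_assert((RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST) == 0xc700u,
	      "combined mask should be 0xc700");

/* Hypothetical subset of the chip versions named in the quoted diff. */
enum mac_version { MAC_VER_35, MAC_VER_38, MAC_VER_OTHER };

/*
 * RxConfig value for the two cases shown in the quoted diff. Other chip
 * versions are not reproduced here, so they return 0 as a placeholder.
 */
static uint32_t rxcfg_init_value(enum mac_version ver)
{
	switch (ver) {
	case MAC_VER_35:
	case MAC_VER_38:	/* the case the patch adds */
		return RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST;
	default:
		return 0;
	}
}
```

With the added case, the value written for this chip includes 0x4000, the bit that was set in the working post-resume reading.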

Chris

> If this does not help then I would try other values in the above write:
> 1) RTL_W32(tp, RxConfig, 0x00024000);
> 2) RTL_W32(tp, RxConfig, 0x4000);
> 3) RTL_W32(tp, RxConfig, RX_DMA_BURST);
> 4) RTL_W32(tp, RxConfig, RX128_INT_EN);
> 
>> Chris
> 
> Maciej
> 


Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-10-10 Thread Chris Clayton
OK, right kernel/module used this time. Please see findings below.

On 10/10/2018 01:24, Maciej S. Szmigiero wrote:
> On 09.10.2018 22:36, Heiner Kallweit wrote:
>> On 09.10.2018 16:40, Chris Clayton wrote:
>>> Thanks to Maciej and Heiner for their replies.
>>>
>>> On 09/10/2018 13:32, Maciej S. Szmigiero wrote:
>>>> On 07.10.2018 21:36, Chris Clayton wrote:
>>>>> Hi again,
>>>>>
>>>>> I didn't think there was anything in 4.19-rc7 to fix this regression,
>>>>> but tried it anyway. I can confirm that the regression is still
>>>>> present and my network still fails when, after a resume from suspend
>>>>> (to ram or disk), I open my browser or my mail client. In both those
>>>>> cases the failure is almost immediate - e.g. my home page doesn't get
>>>>> displayed in the browser. Pinging one of my ISP's name servers doesn't
>>>>> fail quite so quickly but the reported time increases from 14-15ms to
>>>>> more than 1000ms.
>>>>
>>>> You can try comparing chip registers (ethtool -d eth0) in the working
>>>> state (before a suspend) and in the broken state (after a resume).
>>>> Maybe there will be something obvious in the difference.
>>>>
>>>> The same goes for the PCI configuration (lspci -d :8168 -vv).
>>>>
>>> Maciej suggested comparing the output from lspci -vv for the ethernet
>>> device. They are identical.
>>>
>>> Both Maciej and Heiner suggested comparing the output from "ethtool -d"
>>> pre and post suspend. Again, they are identical. Heiner specifically
>>> suggested looking at the RxConfig. The value of that is 0x0002870e both
>>> pre and post suspend.
>>>
>> Hmm, this is very weird, especially taking into account that in your original
>> report you state that removing the call to rtl_init_rxcfg() from
>> rtl_hw_start() fixes the issue. rtl_init_rxcfg() deals with the RxConfig
>> register only and register values seem to be the same before and after
>> resume. So how can the chip behave differently?
>> So far my best guess is that some chip quirk causes it to accept writes to
>> register RxConfig, but to misinterpret or ignore the written value.
>> So far your report is the only one (affecting RTL8411), but we don't know
>> whether other chip versions are affected too.
> 
> Also, it is interesting that even if one removes a call to
> rtl_init_rxcfg() from rtl_hw_start() the RxConfig register will still get
> written to moments later by rtl_set_rx_mode().
> 
> The only chip accesses in the meantime seem to be a write to TxConfig by
> rtl_set_tx_config_registers() and then a read of RxConfig plus two writes
> to MAR0 earlier in rtl_set_rx_mode().
> 
> My proposals are:
> 1) Try swapping "rtl_init_rxcfg(tp);" and "rtl_set_tx_config_registers(tp);"
> in rtl_hw_start().
> Maybe the chip does not like sometimes that RxConfig is written before
> TxConfig.
> 

This change made no difference. Networking still dies if I open a browser or 
leave ping running long enough.

> 2) Check the original value of RxConfig (after a resume) before
> rtl_init_rxcfg() overwrites it (compile tested only):
> --- r8169.c.ori
> +++ r8169.c
> @@ -5155,6 +5155,9 @@
>   /* Initially a 10 us delay. Turned it into a PCI commit. - FR */
>   RTL_R8(tp, IntrMask);
>   RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
> +
> + pr_notice("RxConfig before init was %.8x\n",
> + (unsigned int)RTL_R32(tp, RxConfig));
>   rtl_init_rxcfg(tp);
>   rtl_set_tx_config_registers(tp);
>  
> 
> This should be the value that you got when you removed the call to
> rtl_init_rxcfg() for testing.
> Now, knowing the "right" value you can experiment with what rtl_init_rxcfg()
> writes (under the "default:" label for your NIC model).
> 

This might be more interesting. Through a combination of viewing the output
from pr_notice() and the output from "ethtool -d", I can see RxConfig with the
following values:

    During boot:    0x00028700
    Before suspend: 0x0002870e
    During resume:  0x00024000
    Post resume:    0x0002870e

As I did with 4.18.10 early on in the process, I removed the call to
rtl_init_rxcfg() from rtl_hw_start() and rebuilt, installed and rebooted. Now I
see the following values:

    During boot:    0x00028700
    Before suspend: 0x0002870e
    During resume:  0x00024000
    Post resume:    0x0002400e

As with 4.18.10, networking now appears to be stable after the resume. Starting
a browser results in my homepage being displayed and I've spent a few minutes
surfing with no interruptions. Similarly, ping runs without stopping. I simply
don't know enough to know what might now be enabled or disabled by this change
in value, but hopefully it will provide a clue to someone as to what is going
on.
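
As an aid to reading the two post-resume values, the sketch below XORs them and splits each reading into the three fields that Maciej names in his follow-up of 2018-10-11 (earlier in this archive). The bit positions are the ones quoted there. The struct and function names are hypothetical, and bits outside those three fields are deliberately left undecoded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values as quoted in the thread. */
#define RX128_INT_EN  (1u << 15)  /* 0x8000 */
#define RX_MULTI_EN   (1u << 14)  /* 0x4000 */
#define RX_DMA_BURST  (7u << 8)   /* 0x0700, bits 8-10 */

/* The three named fields of one RxConfig reading. */
struct rxcfg_fields {
	unsigned int rx128_int_en;  /* 0 or 1 */
	unsigned int rx_multi_en;   /* 0 or 1 */
	unsigned int dma_burst;     /* 0..7 */
};

/* Bits that differ between two RxConfig readings. */
static uint32_t rxcfg_diff(uint32_t a, uint32_t b)
{
	return a ^ b;
}

/* Split one reading into the named fields. */
static void rxcfg_decode(uint32_t v, struct rxcfg_fields *out)
{
	assert(out != NULL);
	out->rx128_int_en = (v & RX128_INT_EN) ? 1u : 0u;
	out->rx_multi_en = (v & RX_MULTI_EN) ? 1u : 0u;
	out->dma_burst = (unsigned int)((v & RX_DMA_BURST) >> 8);
}
```

Applied to the failing reading 0x0002870e and the working reading 0x0002400e, the difference is 0xc700: the failing value has RX128_INT_EN and all three RX_DMA_BURST bits set, while the working value has only RX_MULTI_EN set among these fields.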

Chris

> Hope this helps,
> Maciej
> 


Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-10-10 Thread Chris Clayton
Too late at night to be doing this stuff. Clicked send instead of saving a 
draft. Sorry, please ignore.

On 10/10/2018 23:30, Chris Clayton wrote:
> OK, right kernel/module used this time. Please see findings below.
> 
> On 10/10/2018 01:24, Maciej S. Szmigiero wrote:
>> On 09.10.2018 22:36, Heiner Kallweit wrote:
>>> On 09.10.2018 16:40, Chris Clayton wrote:
>>>> Thanks to Maciej and Heiner for their replies.
>>>>
>>>> On 09/10/2018 13:32, Maciej S. Szmigiero wrote:
>>>>> On 07.10.2018 21:36, Chris Clayton wrote:
>>>>>> Hi again,
>>>>>>
>>>>>> I didn't think there was anything in 4.19-rc7 to fix this regression,
>>>>>> but tried it anyway. I can confirm that the regression is still
>>>>>> present and my network still fails when, after a resume from suspend
>>>>>> (to ram or disk), I open my browser or my mail client. In both those
>>>>>> cases the failure is almost immediate - e.g. my home page doesn't get
>>>>>> displayed in the browser. Pinging one of my ISP's name servers doesn't
>>>>>> fail quite so quickly but the reported time increases from 14-15ms to
>>>>>> more than 1000ms.
>>>>>
>>>>> You can try comparing chip registers (ethtool -d eth0) in the working
>>>>> state (before a suspend) and in the broken state (after a resume).
>>>>> Maybe there will be something obvious in the difference.
>>>>>
>>>>> The same goes for the PCI configuration (lspci -d :8168 -vv).
>>>>>
>>>> Maciej suggested comparing the output from lspci -vv for the ethernet
>>>> device. They are identical.
>>>>
>>>> Both Maciej and Heiner suggested comparing the output from "ethtool -d"
>>>> pre and post suspend. Again, they are identical. Heiner specifically
>>>> suggested looking at the RxConfig. The value of that is 0x0002870e both
>>>> pre and post suspend.
>>>>
>>> Hmm, this is very weird, especially taking into account that in your
>>> original report you state that removing the call to rtl_init_rxcfg() from
>>> rtl_hw_start() fixes the issue. rtl_init_rxcfg() deals with the RxConfig
>>> register only and register values seem to be the same before and after
>>> resume. So how can the chip behave differently?
>>> So far my best guess is that some chip quirk causes it to accept writes to
>>> register RxConfig, but to misinterpret or ignore the written value.
>>> So far your report is the only one (affecting RTL8411), but we don't know
>>> whether other chip versions are affected too.
>>
>> Also, it is interesting that even if one removes a call to
>> rtl_init_rxcfg() from rtl_hw_start() the RxConfig register will still get
>> written to moments later by rtl_set_rx_mode().
>>
>> The only chip accesses in the meantime seem to be a write to TxConfig by
>> rtl_set_tx_config_registers() and then a read of RxConfig plus two writes
>> to MAR0 earlier in rtl_set_rx_mode().
>>
>> My proposals are:
>> 1) Try swapping "rtl_init_rxcfg(tp);" and "rtl_set_tx_config_registers(tp);"
>> in rtl_hw_start().
>> Maybe the chip does not like sometimes that RxConfig is written before
>> TxConfig.
>>
> 
> This change made no difference. Networking still dies if I open a browser or 
> leave ping running long enough.
> 
>> 2) Check the original value of RxConfig (after a resume) before
>> rtl_init_rxcfg() overwrites it (compile tested only):
>> --- r8169.c.ori
>> +++ r8169.c
>> @@ -5155,6 +5155,9 @@
>>  /* Initially a 10 us delay. Turned it into a PCI commit. - FR */
>>  RTL_R8(tp, IntrMask);
>>  RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
>> +
>> +pr_notice("RxConfig before init was %.8x\n",
>> +(unsigned int)RTL_R32(tp, RxConfig));
>>  rtl_init_rxcfg(tp);
>>  rtl_set_tx_config_registers(tp);
>>  
>>
>> This should be the value that you got when you removed the call to
>> rtl_init_rxcfg() for testing.
>> Now, knowing the "right" value you can experiment with what rtl_init_rxcfg()
>> writes (under the "default:" label for your NIC model).
> 
> This might be more interesting. Through a combination of viewing the output
> from pr_notice() and the output from "ethtool -d", I can see RxConfig with
> the following values:
> 
>    During boot:    0x00028700
>    Before suspend: 0x0002870e
>    During resume:  0x00024000
>    Post resume:    0x0002870e
> 
> I then removed the call to rtl_init_rxcfg() from rtl_hw_start() and rebuilt,
> installed and rebooted. Now I see the following values:
> 
>    During boot:    0x00028700
>    Before suspend: 0x0002870e
>    During resume:  0x00024000
>    Post resume:    0x0002870e
> 
>>
>> Hope this helps,
>> Maciej
>>


Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-10-10 Thread Chris Clayton
OK, right kernel/module used this time. Please see findings below.

On 10/10/2018 01:24, Maciej S. Szmigiero wrote:
> On 09.10.2018 22:36, Heiner Kallweit wrote:
>> On 09.10.2018 16:40, Chris Clayton wrote:
>>> Thanks to Maciej and Heiner for their replies.
>>>
>>> On 09/10/2018 13:32, Maciej S. Szmigiero wrote:
>>>> On 07.10.2018 21:36, Chris Clayton wrote:
>>>>> Hi again,
>>>>>
>>>>> I didn't think there was anything in 4.19-rc7 to fix this regression,
>>>>> but tried it anyway. I can confirm that the regression is still
>>>>> present and my network still fails when, after a resume from suspend
>>>>> (to ram or disk), I open my browser or my mail client. In both those
>>>>> cases the failure is almost immediate - e.g. my home page doesn't get
>>>>> displayed in the browser. Pinging one of my ISP's name servers doesn't
>>>>> fail quite so quickly but the reported time increases from 14-15ms to
>>>>> more than 1000ms.
>>>>
>>>> You can try comparing chip registers (ethtool -d eth0) in the working
>>>> state (before a suspend) and in the broken state (after a resume).
>>>> Maybe there will be something obvious in the difference.
>>>>
>>>> The same goes for the PCI configuration (lspci -d :8168 -vv).
>>>>
>>> Maciej suggested comparing the output from lspci -vv for the ethernet
>>> device. They are identical.
>>>
>>> Both Maciej and Heiner suggested comparing the output from "ethtool -d"
>>> pre and post suspend. Again, they are identical. Heiner specifically
>>> suggested looking at the RxConfig. The value of that is 0x0002870e both
>>> pre and post suspend.
>>>
>> Hmm, this is very weird, especially taking into account that in your original
>> report you state that removing the call to rtl_init_rxcfg() from
>> rtl_hw_start() fixes the issue. rtl_init_rxcfg() deals with the RxConfig
>> register only and register values seem to be the same before and after
>> resume. So how can the chip behave differently?
>> So far my best guess is that some chip quirk causes it to accept writes to
>> register RxConfig, but to misinterpret or ignore the written value.
>> So far your report is the only one (affecting RTL8411), but we don't know
>> whether other chip versions are affected too.
> 
> Also, it is interesting that even if one removes a call to
> rtl_init_rxcfg() from rtl_hw_start() the RxConfig register will still get
> written to moments later by rtl_set_rx_mode().
> 
> The only chip accesses in the meantime seem to be a write to TxConfig by
> rtl_set_tx_config_registers() and then a read of RxConfig plus two writes
> to MAR0 earlier in rtl_set_rx_mode().
> 
> My proposals are:
> 1) Try swapping "rtl_init_rxcfg(tp);" and "rtl_set_tx_config_registers(tp);"
> in rtl_hw_start().
> Maybe the chip does not like sometimes that RxConfig is written before
> TxConfig.
> 

This change made no difference. Networking still dies if I open a browser or 
leave ping running long enough.

> 2) Check the original value of RxConfig (after a resume) before
> rtl_init_rxcfg() overwrites it (compile tested only):
> --- r8169.c.ori
> +++ r8169.c
> @@ -5155,6 +5155,9 @@
>   /* Initially a 10 us delay. Turned it into a PCI commit. - FR */
>   RTL_R8(tp, IntrMask);
>   RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
> +
> + pr_notice("RxConfig before init was %.8x\n",
> + (unsigned int)RTL_R32(tp, RxConfig));
>   rtl_init_rxcfg(tp);
>   rtl_set_tx_config_registers(tp);
>  
> 
> This should be the value that you got when you removed the call to
> rtl_init_rxcfg() for testing.
> Now, knowing the "right" value you can experiment with what rtl_init_rxcfg()
> writes (under the "default:" label for your NIC model).

This might be more interesting. Through combination of viewing the output from 
pr_notice() and the output from "ethtool
-d", I can see RxConfig with the following values

During boot:    0x00028700
Before suspend: 0x0002870e
During resume:  0x00024000
Post resume:    0x0002870e

I then removed the call to rtl_init_rxcfg() from rtl_hw_start() and rebuilt,
installed and rebooted. Now I see the following values:

During boot:    0x00028700
Before suspend: 0x0002870e
During resume:  0x00024000
Post resume:    0x0002870e
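
For anyone comparing these values, a quick way to see which bits differ is to
XOR them. This is only a sketch in plain shell arithmetic: the helper names
xor_hex and rx_mode are made up for illustration, and treating the low six bits
as the accept-mode bits (AcceptAllPhys 0x01 up to AcceptErr 0x20, as named in
the r8169 source) is my assumption about how this chip lays out RxConfig.

```shell
# xor_hex A B: print the bits that differ between two 32-bit register values.
xor_hex() {
    printf '0x%08x\n' "$(( $1 ^ $2 ))"
}

# rx_mode VALUE: print only the low six (assumed accept-mode) bits of RxConfig.
rx_mode() {
    printf '0x%02x\n' "$(( $1 & 0x3f ))"
}

xor_hex 0x00028700 0x0002870e   # during boot vs. before suspend
xor_hex 0x00024000 0x0002870e   # during resume vs. post resume
rx_mode 0x0002870e              # accept-mode bits in the steady-state value
```

If that reading of the bits is right, the boot and steady-state values differ
only in 0x0e (broadcast, multicast and own-address accept bits), which is the
kind of change rtl_set_rx_mode() would be expected to make.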

> 
> Hope this helps,
> Maciej
> 


Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-10-10 Thread Chris Clayton
Sorry, I forgot that editing r8169.c and rebuilding would result in rc7+, so I 
tested the wrong kernel/module to get the
results I provided below. That, however, may make the results more interesting 
because they happened with a virgin rc7
kernel/module.

I'll test your proposals properly later.

Chris

On 10/10/2018 09:09, Chris Clayton wrote:
> 
> 
> On 10/10/2018 01:24, Maciej S. Szmigiero wrote:
>> On 09.10.2018 22:36, Heiner Kallweit wrote:
>>> On 09.10.2018 16:40, Chris Clayton wrote:
>>>> Thanks to Maciej and Heiner for their replies.
>>>>
>>>> On 09/10/2018 13:32, Maciej S. Szmigiero wrote:
>>>>> On 07.10.2018 21:36, Chris Clayton wrote:
>>>>>> Hi again,
>>>>>>
>>>>>> I didn't think there was anything in 4.19-rc7 to fix this regression, 
>>>>>> but tried it anyway. I can confirm that the
>>>>>> regression is still present and my network still fails when, after a 
>>>>>> resume from suspend (to ram or disk), I open my
>>>>>> browser or my mail client. In both those cases the failure is almost 
>>>>>> immediate - e.g. my home page doesn't get displayed
>>>>>> in the browser. Pinging one of my ISP's name servers doesn't fail quite 
>>>>>> so quickly but the reported time increases from
>>>>>> 14-15ms to more than 1000ms.
>>>>>
>>>>> You can try comparing chip registers (ethtool -d eth0) in the working
>>>>> state (before a suspend) and in the broken state (after a resume).
>>>>> Maybe there will be some obvious difference.
>>>>>
>>>>> The same goes for the PCI configuration (lspci -d :8168 -vv).
>>>>>
>>>> Maciej suggested comparing the output from lspci -vv for the ethernet 
>>>> device. They are identical.
>>>>
>>>> Both Maciej and Heiner suggested comparing the output from "ethtool -d" 
>>>> pre and post suspend. Again, they are identical.
>>>> Heiner specifically suggested looking at the RxConfig. The value of that 
>>>> is 0x0002870e both pre and post suspend.
>>>>
>>> Hmm, this is very weird, especially taking into account that in your 
>>> original
>>> report you state that removing the call to rtl_init_rxcfg() from 
>>> rtl_hw_start()
>>> fixes the issue. rtl_init_rxcfg() deals with the RxConfig register only and
>>> register values seem to be the same before and after resume. So how can the
>>> chip behave differently?
>>> So far my best guess is that some chip quirk causes it to accept writes to
>>> register RxConfig, but to misinterpret or ignore the written value.
>>> So far your report is the only one (affecting RTL8411), but we don't know
>>> whether other chip versions are affected too.
>>
>> Also, it is interesting that even if one removes a call to
>> rtl_init_rxcfg() from rtl_hw_start() the RxConfig register will still get
>> written to moments later by rtl_set_rx_mode().
>>
>> The only chip accesses in the meantime seems to be a write to TxConfig by
>> rtl_set_tx_config_registers() and then a read of RxConfig plus two writes
>> to MAR0 earlier in rtl_set_rx_mode().
>>
>> My proposals are:
>> 1) Try swapping "rtl_init_rxcfg(tp);" and "rtl_set_tx_config_registers(tp);"
>> in rtl_hw_start().
>> Maybe the chip does not like sometimes that RxConfig is written before
>> TxConfig.
>>
> After testing your first proposal, which made no difference, I found the 
> following in the output from dmesg:
> 
> [  761.999468] ------------[ cut here ]------------
> [  761.999471] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
> [  761.999483] WARNING: CPU: 0 PID: 8938 at net/sched/sch_generic.c:461 
> dev_watchdog+0x1e9/0x1f0
> [  761.999484] Modules linked in: btusb btintel r8169 rfcomm bnep 
> iptable_filter xt_conntrack iptable_nat ipt_MASQUERADE
> nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv4 uvcvideo videobuf2_vmalloc 
> videobuf2_memops snd_hda_codec_via
> videobuf2_v4l2 snd_hda_codec_hdmi snd_hda_codec_generic videobuf2_common 
> usbhid realtek coretemp snd_hda_intel hwmon
> snd_hda_codec x86_pkg_temp_thermal snd_hwdep libphy snd_hda_core [last 
> unloaded: btintel]
> [  761.999503] CPU: 0 PID: 8938 Comm: kworker/0:0 Not tainted 4.19.0-rc7 #328
> [  761.999504] Hardware name: Notebook W65_67SZ/W65_67SZ, BIOS 1.03.05 02/26/2014

Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-10-10 Thread Chris Clayton



On 10/10/2018 01:24, Maciej S. Szmigiero wrote:
> On 09.10.2018 22:36, Heiner Kallweit wrote:
>> On 09.10.2018 16:40, Chris Clayton wrote:
>>> Thanks to Maciej and Heiner for their replies.
>>>
>>> On 09/10/2018 13:32, Maciej S. Szmigiero wrote:
>>>> On 07.10.2018 21:36, Chris Clayton wrote:
>>>>> Hi again,
>>>>>
>>>>> I didn't think there was anything in 4.19-rc7 to fix this regression, but 
>>>>> tried it anyway. I can confirm that the
>>>>> regression is still present and my network still fails when, after a 
>>>>> resume from suspend (to ram or disk), I open my
>>>>> browser or my mail client. In both those cases the failure is almost 
>>>>> immediate - e.g. my home page doesn't get displayed
>>>>> in the browser. Pinging one of my ISP's name servers doesn't fail quite so 
>>>>> quickly but the reported time increases from
>>>>> 14-15ms to more than 1000ms.
>>>>
>>>> You can try comparing chip registers (ethtool -d eth0) in the working
>>>> state (before a suspend) and in the broken state (after a resume).
>>>> Maybe there will be some obvious difference.
>>>>
>>>> The same goes for the PCI configuration (lspci -d :8168 -vv).
>>>>
>>> Maciej suggested comparing the output from lspci -vv for the ethernet 
>>> device. They are identical.
>>>
>>> Both Maciej and Heiner suggested comparing the output from "ethtool -d" pre 
>>> and post suspend. Again, they are identical.
>>> Heiner specifically suggested looking at the RxConfig. The value of that is 
>>> 0x0002870e both pre and post suspend.
>>>
>> Hmm, this is very weird, especially taking into account that in your original
>> report you state that removing the call to rtl_init_rxcfg() from 
>> rtl_hw_start()
>> fixes the issue. rtl_init_rxcfg() deals with the RxConfig register only and
>> register values seem to be the same before and after resume. So how can the
>> chip behave differently?
>> So far my best guess is that some chip quirk causes it to accept writes to
>> register RxConfig, but to misinterpret or ignore the written value.
>> So far your report is the only one (affecting RTL8411), but we don't know
>> whether other chip versions are affected too.
> 
> Also, it is interesting that even if one removes a call to
> rtl_init_rxcfg() from rtl_hw_start() the RxConfig register will still get
> written to moments later by rtl_set_rx_mode().
> 
> The only chip accesses in the meantime seems to be a write to TxConfig by
> rtl_set_tx_config_registers() and then a read of RxConfig plus two writes
> to MAR0 earlier in rtl_set_rx_mode().
> 
> My proposals are:
> 1) Try swapping "rtl_init_rxcfg(tp);" and "rtl_set_tx_config_registers(tp);"
> in rtl_hw_start().
> Maybe the chip does not like sometimes that RxConfig is written before
> TxConfig.
> 
After testing your first proposal, which made no difference, I found the
following in the output from dmesg:

[  761.999468] ------------[ cut here ]------------
[  761.999471] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
[  761.999483] WARNING: CPU: 0 PID: 8938 at net/sched/sch_generic.c:461 
dev_watchdog+0x1e9/0x1f0
[  761.999484] Modules linked in: btusb btintel r8169 rfcomm bnep 
iptable_filter xt_conntrack iptable_nat ipt_MASQUERADE
nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv4 uvcvideo videobuf2_vmalloc 
videobuf2_memops snd_hda_codec_via
videobuf2_v4l2 snd_hda_codec_hdmi snd_hda_codec_generic videobuf2_common usbhid 
realtek coretemp snd_hda_intel hwmon
snd_hda_codec x86_pkg_temp_thermal snd_hwdep libphy snd_hda_core [last 
unloaded: btintel]
[  761.999503] CPU: 0 PID: 8938 Comm: kworker/0:0 Not tainted 4.19.0-rc7 #328
[  761.999504] Hardware name: Notebook W65_67SZ/W65_67SZ, BIOS 1.03.05 02/26/2014
[  761.999508] Workqueue: events rtl_task [r8169]
[  761.999510] RIP: 0010:dev_watchdog+0x1e9/0x1f0
[  761.999512] Code: 00 48 63 4d e8 eb 99 4c 89 ef c6 05 b6 13 a6 00 01 e8 1b 
c7 fd ff 89 d9 4c 89 ee 48 c7 c7 40 53 e1
81 48 89 c2 e8 ae f4 a3 ff <0f> 0b eb c0 0f 1f 00 48 c7 47 08 00 00 00 00 48 c7 
07 00 00 00 00
[  761.999513] RSP: 0018:88040f803e98 EFLAGS: 00010282
[  761.999514] RAX:  RBX:  RCX: 0006
[  761.999516] RDX: 0007 RSI: 0096 RDI: 88040f8153d0
[  761.999517] RBP: 88040ca9a3b8 R08: 813565f0 R09: 034e
[  761.999517] R10: 0007 R11:  R12: 88040ca9a39c
[  761.

Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-10-09 Thread Chris Clayton



On 09/10/2018 22:39, Heiner Kallweit wrote:
> On 09.10.2018 16:40, Chris Clayton wrote:
>> Thanks to Maciej and Heiner for their replies.
>>
>> On 09/10/2018 13:32, Maciej S. Szmigiero wrote:
>>> On 07.10.2018 21:36, Chris Clayton wrote:
>>>> Hi again,
>>>>
>>>> I didn't think there was anything in 4.19-rc7 to fix this regression, but 
>>>> tried it anyway. I can confirm that the
>>>> regression is still present and my network still fails when, after a 
>>>> resume from suspend (to ram or disk), I open my
>>>> browser or my mail client. In both those cases the failure is almost 
>>>> immediate - e.g. my home page doesn't get displayed
>>>> in the browser. Pinging one of my ISP's name servers doesn't fail quite so 
>>>> quickly but the reported time increases from
>>>> 14-15ms to more than 1000ms.
>>>
>>> You can try comparing chip registers (ethtool -d eth0) in the working
>>> state (before a suspend) and in the broken state (after a resume).
>>> Maybe there will be some obvious difference.
>>>
>>> The same goes for the PCI configuration (lspci -d :8168 -vv).
>>>
>> Maciej suggested comparing the output from lspci -vv for the ethernet 
>> device. They are identical.
>>
>> Both Maciej and Heiner suggested comparing the output from "ethtool -d" pre 
>> and post suspend. Again, they are identical.
>> Heiner specifically suggested looking at the RxConfig. The value of that is 
>> 0x0002870e both pre and post suspend.
>>
>> I've attached files I redirected the outputs to.
>>
>> Please don't hesitate to ask for any other information needed to solve this 
>> problem. In the meantime, I've now got
>> scripts that stop the network during suspend and restart it during resume. 
>> (Those scripts were removed whilst I gathered
>> the diagnostics shown in the attachments.)
>>
> I'd like to check whether it may be a timing issue. The following 
> experimental patch
> adds a PCI commit after writing register ChipCmd. Could you please check 
> whether
> it changes anything?
> 
> diff --git a/drivers/net/ethernet/realtek/r8169.c 
> b/drivers/net/ethernet/realtek/r8169.c
> index 7d3f671e1..f3c359492 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -4641,6 +4641,7 @@ static void rtl_hw_start(struct  rtl8169_private *tp)
>   /* Initially a 10 us delay. Turned it into a PCI commit. - FR */
>   RTL_R8(tp, IntrMask);
>   RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
> + RTL_R8(tp, ChipCmd);
>   rtl_init_rxcfg(tp);
>   rtl_set_tx_config_registers(tp);
>  
> 

Sorry, this patch doesn't make any difference - my network still fails. After a
suspend/resume, my browsers (chromium and firefox) both fail to open my home
page (https://www.google.co.uk). The ping time for one of my ISP's name servers
increases from 14-15ms to more than 1000ms, although after a few pings it does
reduce. As the screen grab below shows, the network does eventually fail:

$ ping NS1
PING ns1 (90.207.238.97): 56 data bytes
64 bytes from 90.207.238.97: icmp_seq=0 ttl=251 time=1017.289 ms
64 bytes from 90.207.238.97: icmp_seq=1 ttl=251 time=1018.051 ms
64 bytes from 90.207.238.97: icmp_seq=2 ttl=251 time=1015.271 ms
64 bytes from 90.207.238.97: icmp_seq=3 ttl=251 time=1015.495 ms
64 bytes from 90.207.238.97: icmp_seq=6 ttl=251 time=1015.646 ms
64 bytes from 90.207.238.97: icmp_seq=7 ttl=251 time=1022.609 ms
64 bytes from 90.207.238.97: icmp_seq=8 ttl=251 time=1015.612 ms
64 bytes from 90.207.238.97: icmp_seq=10 ttl=251 time=1015.551 ms
64 bytes from 90.207.238.97: icmp_seq=12 ttl=251 time=1015.446 ms
64 bytes from 90.207.238.97: icmp_seq=13 ttl=251 time=1015.657 ms
64 bytes from 90.207.238.97: icmp_seq=14 ttl=251 time=1015.614 ms
64 bytes from 90.207.238.97: icmp_seq=15 ttl=251 time=1015.651 ms
64 bytes from 90.207.238.97: icmp_seq=17 ttl=251 time=1015.459 ms
64 bytes from 90.207.238.97: icmp_seq=18 ttl=251 time=1015.443 ms
64 bytes from 90.207.238.97: icmp_seq=19 ttl=251 time=1015.936 ms
64 bytes from 90.207.238.97: icmp_seq=20 ttl=251 time=1015.681 ms
64 bytes from 90.207.238.97: icmp_seq=22 ttl=251 time=1015.410 ms
64 bytes from 90.207.238.97: icmp_seq=23 ttl=251 time=1015.487 ms
64 bytes from 90.207.238.97: icmp_seq=24 ttl=251 time=1016.169 ms
64 bytes from 90.207.238.97: icmp_seq=25 ttl=251 time=1015.659 ms
64 bytes from 90.207.238.97: icmp_seq=26 ttl=251 time=14.606 ms
64 bytes from 90.207.238.97: icmp_seq=30 ttl=251 time=32.765 ms
64 bytes from 90.207.238.97: icmp_seq=31 ttl=251 time=115.052 ms
64 bytes from 90.207.238.97: icmp_seq=33 ttl=25

Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-10-09 Thread Chris Clayton
Thanks to Maciej and Heiner for their replies.

On 09/10/2018 13:32, Maciej S. Szmigiero wrote:
> On 07.10.2018 21:36, Chris Clayton wrote:
>> Hi again,
>>
>> I didn't think there was anything in 4.19-rc7 to fix this regression, but 
>> tried it anyway. I can confirm that the
>> regression is still present and my network still fails when, after a resume 
>> from suspend (to ram or disk), I open my
>> browser or my mail client. In both those cases the failure is almost 
>> immediate - e.g. my home page doesn't get displayed
>> in the browser. Pinging one of my ISP's name servers doesn't fail quite so 
>> quickly but the reported time increases from
>> 14-15ms to more than 1000ms.
> 
> You can try comparing chip registers (ethtool -d eth0) in the working
> state (before a suspend) and in the broken state (after a resume).
> Maybe there will be some obvious difference.
> 
> The same goes for the PCI configuration (lspci -d :8168 -vv).
> 
Maciej suggested comparing the output from lspci -vv for the ethernet device. 
They are identical.

Both Maciej and Heiner suggested comparing the output from "ethtool -d" pre and 
post suspend. Again, they are identical.
Heiner specifically suggested looking at the RxConfig. The value of that is 
0x0002870e both pre and post suspend.

I've attached files I redirected the outputs to.
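
For reference, the comparison amounts to something like the sketch below. The
interface name eth0 and the :8168 device filter come from the commands
suggested above; the function names and output file names are arbitrary
choices of mine, not anything the tools require.

```shell
# capture_state TAG: save the register dump and PCI config under a tag.
# Needs root for "ethtool -d" and for the full "lspci -vv" output.
capture_state() {
    ethtool -d eth0 > "ethtool-$1.txt"
    lspci -d :8168 -vv > "lspci-$1.txt"
}

# compare_state FILE_A FILE_B: report whether two saved dumps differ.
compare_state() {
    if diff -u "$1" "$2" > /dev/null; then
        echo "identical"
    else
        echo "different"
    fi
}

# Typical sequence:
#   capture_state before
#   (suspend and resume the machine)
#   capture_state after
#   compare_state ethtool-before.txt ethtool-after.txt
#   compare_state lspci-before.txt lspci-after.txt
```

Dropping the redirect to /dev/null in compare_state shows the actual diff
rather than just a verdict.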

Please don't hesitate to ask for any other information needed to solve this 
problem. In the meantime, I've now got
scripts that stop the network during suspend and restart it during resume. 
(Those scripts were removed whilst I gathered
the diagnostics shown in the attachments.)

Chris

>> Chris
> 
> Maciej
> 
ethtool -d eth0
===
RealTek RTL8411 registers:

0x00: MAC Address  80:fa:5b:08:d0:3d
0x08: Multicast Address Filter 0x 0x0080
0x10: Dump Tally Counter Command   0x0c2ec000 0x0004
0x20: Tx Normal Priority Ring Addr 0x07a0a000 0x0004
0x28: Tx High Priority Ring Addr   0x 0x
0x30: Flash memory read/write 0x
0x34: Early Rx Byte Count  0
0x36: Early Rx Status   0x00
0x37: Command   0x0c
  Rx on, Tx on
0x3C: Interrupt Mask  0x803f
  SERR LinkChg RxNoBuf TxErr TxOK RxErr RxOK 
0x3E: Interrupt Status0x
  
0x40: Tx Configuration0x4b800f80
0x44: Rx Configuration0x0002870e
0x48: Timer count 0x
0x4C: Missed packet counter 0x00
0x50: EEPROM Command0x10
0x51: Config 0  0x00
0x52: Config 1  0xcf
0x53: Config 2  0x3c
0x54: Config 3  0x60
0x55: Config 4  0x10
0x56: Config 5  0x02
0x58: Timer interrupt 0x
0x5C: Multiple Interrupt Select   0x
0x60: PHY access  0x80040de1
0x64: TBI control and status  0x2701
0x68: TBI Autonegotiation advertisement (ANAR)0xf70c
0x6A: TBI Link partner ability (LPAR) 0x0002
0x6C: PHY status0xeb
0x84: PM wakeup frame 00x 0x
0x8C: PM wakeup frame 10x 0x
0x94: PM wakeup frame 2 (low)  0x 0x
0x9C: PM wakeup frame 2 (high) 0x 0x
0xA4: PM wakeup frame 3 (low)  0x 0x
0xAC: PM wakeup frame 3 (high) 0x 0x
0xB4: PM wakeup frame 4 (low)  0x 0x
0xBC: PM wakeup frame 4 (high) 0x 0x
0xC4: Wakeup frame 0 CRC  0x
0xC6: Wakeup frame 1 CRC  0x
0xC8: Wakeup frame 2 CRC  0x
0xCA: Wakeup frame 3 CRC  0x
0xCC: Wakeup frame 4 CRC  0x
0xDA: RX packet maximum size  0x4000
0xE0: C+ Command  0x20e1
  VLAN de-tagging
  RX checksumming
0xE2: Interrupt Mitigation0x5151
  TxTimer:   5
  TxPackets: 1
  RxTimer:   5
  RxPackets: 1
0xE4: Rx Ring Addr 0x07935000 0x0004
0xEC: Early Tx threshold0x27
0xF0: Func Event  0x0040003f
0xF4: Func Event Mask 0x
0xF8: Func Preset State   0x00031eff
0xFC: Func For

Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-10-07 Thread Chris Clayton
Hi again,

I didn't think there was anything in 4.19-rc7 to fix this regression, but tried 
it anyway. I can confirm that the
regression is still present and my network still fails when, after a resume 
from suspend (to ram or disk), I open my
browser or my mail client. In both those cases the failure is almost immediate 
- e.g. my home page doesn't get displayed
in the browser. Pinging one of my ISP's name servers doesn't fail quite so 
quickly but the reported time increases from
14-15ms to more than 1000ms.

Chris

On 04/10/2018 09:41, Chris Clayton wrote:
> Hi Heiner,
> 
> Here's the reply to your questions. Sorry for the delay.
> 
> On 28/09/2018 23:13, Heiner Kallweit wrote:
>> On 29.09.2018 00:00, Chris Clayton wrote:
>>> Thanks Maciej.
>>>
>>> On 28/09/2018 16:54, Maciej S. Szmigiero wrote:
>>>> Hi,
>>>>
>>>>> Hi,
>>>>>
>>>>> I upgraded my kernel to 4.18.10 recently and have since been experiencing 
>>>>> network problems after resuming from a
>>>>> suspend to RAM or disk. I previously had 4.18.6 and that was OK.
>>>>>
>>>>> The pattern of the problem is that when I first boot, the network is 
>>>>> fine. But, after resume from suspend I find that
>>>>> the time taken for a ping of one of my ISP's nameservers increases from 
>>>>> 14-15ms to more than 1000ms. Moreover, when I
>>>>> open a browser (chromium or firefox), it fails to retrieve my home page 
>>>>> (https://www.google.co.uk) and pings of the
>>>>> nameserver fail with the message "Destination Host Unreachable". Often, I 
>>>>> can revive the network by stopping it with
>>>>> /sbin/if{down,up} but sometimes it is necessary to also remove the r8169 
>>>>> module and load it again.
>>>>
>>>> Please have a look at the following thread:
>>>> https://lkml.org/lkml/2018/9/25/1118
>>>>
>>>
>>> I applied your patch for the 4.18 stable kernels to 4.18.10, but the 
>>> problem is not solved by it. Similarly, I applied
>>> Heiner's patch to the 4.19, but again the problem is not solved.
>>>
>> I think we talk about two different issues here. The one the fix is for has 
>> no link to suspend/resume.
>>
>> Chris, the lspci output doesn't provide enough detail to determine the exact 
>> chip version.
>> Can you provide the dmesg part with the XID?
> 
> $ dmesg | grep r8169
> [5.274938] libphy: r8169: probed
> [5.276563] r8169 :05:00.2 eth0: RTL8411, 80:fa:5b:08:d0:3d, XID 
> 48800800, IRQ 29
> [5.278158] r8169 :05:00.2 eth0: jumbo features [frames: 9200 bytes, 
> tx checksumming: ko]
> [9.275275] RTL8211E Gigabit Ethernet r8169-502:00: attached PHY driver 
> [RTL8211E Gigabit Ethernet]
> (mii_bus:phy_addr=r8169-502:00, irq=IGNORE)
> [9.460876] r8169 :05:00.2 eth0: No native access to PCI extended 
> config space, falling back to CSI
> [   11.005336] r8169 :05:00.2 eth0: Link is Up - 100Mbps/Full - flow 
> control rx/tx
> 
>> According to your lspci output neither MSI nor MSI-X is active.
>> Do you have to use nomsi for whatever reason?
>>
> 
> No, I do not use nomsi, but MSI wasn't enabled in my kernel config. I'm 99% 
> sure that it used to be - I've no idea how
> it got dropped. If I'm not sure about an option, I start by taking the 
> recommendation in the kconfig help. Help on MSI
> has a very clear "say Y". I've re-enabled it now.
> 
> Chris
> 
>> Heiner
>>
>>>> Maciej
>>>>
>>> Chris
>>>
>>
>>


Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-10-04 Thread Chris Clayton
Hi Heiner,

Here's the reply to your questions. Sorry for the delay.

On 28/09/2018 23:13, Heiner Kallweit wrote:
> On 29.09.2018 00:00, Chris Clayton wrote:
>> Thanks Maciej.
>>
>> On 28/09/2018 16:54, Maciej S. Szmigiero wrote:
>>> Hi,
>>>
>>>> Hi,
>>>>
>>>> I upgraded my kernel to 4.18.10 recently and have since been experiencing 
>>>> network problems after resuming from a
>>>> suspend to RAM or disk. I previously had 4.18.6 and that was OK.
>>>>
>>>> The pattern of the problem is that when I first boot, the network is fine. 
>>>> But, after resume from suspend I find that
>>>> the time taken for a ping of one of my ISP's nameservers increases from 
>>>> 14-15ms to more than 1000ms. Moreover, when I
>>>> open a browser (chromium or firefox), it fails to retrieve my home page 
>>>> (https://www.google.co.uk) and pings of the
>>>> nameserver fail with the message "Destination Host Unreachable". Often, I 
>>>> can revive the network by stopping it with
>>>> /sbin/if{down,up} but sometimes it is necessary to also remove the r8169 
>>>> module and load it again.
>>>
>>> Please have a look at the following thread:
>>> https://lkml.org/lkml/2018/9/25/1118
>>>
>>
>> I applied your patch for the 4.18 stable kernels to 4.18.10, but the problem 
>> is not solved by it. Similarly, I applied
>> Heiner's patch to the 4.19, but again the problem is not solved.
>>
> I think we talk about two different issues here. The one the fix is for has 
> no link to suspend/resume.
> 
> Chris, the lspci output doesn't provide enough detail to determine the exact 
> chip version.
> Can you provide the dmesg part with the XID?

$ dmesg | grep r8169
[5.274938] libphy: r8169: probed
[5.276563] r8169 :05:00.2 eth0: RTL8411, 80:fa:5b:08:d0:3d, XID 
48800800, IRQ 29
[5.278158] r8169 :05:00.2 eth0: jumbo features [frames: 9200 bytes, tx 
checksumming: ko]
[9.275275] RTL8211E Gigabit Ethernet r8169-502:00: attached PHY driver 
[RTL8211E Gigabit Ethernet]
(mii_bus:phy_addr=r8169-502:00, irq=IGNORE)
[9.460876] r8169 :05:00.2 eth0: No native access to PCI extended config 
space, falling back to CSI
[   11.005336] r8169 :05:00.2 eth0: Link is Up - 100Mbps/Full - flow 
control rx/tx

> According to your lspci output neither MSI nor MSI-X is active.
> Do you have to use nomsi for whatever reason?
> 

No, I do not use nomsi, but MSI wasn't enabled in my kernel config. I'm 99% 
sure that it used to be - I've no idea how
it got dropped. If I'm not sure about an option, I start by taking the 
recommendation in the kconfig help. Help on MSI
has a very clear "say Y". I've re-enabled it now.
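
For anyone wanting to check the same thing on their own kernel, here is a small
sketch. It assumes the option in question is CONFIG_PCI_MSI and that the build
config is available as a plain-text file; the path in the usage comment is only
an example, and the helper name msi_enabled is made up for illustration.

```shell
# msi_enabled CONFIG_FILE: succeed if the kernel config has PCI MSI built in.
msi_enabled() {
    grep -q '^CONFIG_PCI_MSI=y' "$1"
}

# Example use against the config in a kernel source tree:
#   msi_enabled /usr/src/linux/.config && echo "MSI enabled"
```

A disabled option shows up as "# CONFIG_PCI_MSI is not set", which the anchored
pattern deliberately does not match.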

Chris

> Heiner
> 
>>> Maciej
>>>
>> Chris
>>
> 
> 


Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-09-29 Thread Chris Clayton
Sorry, sent by accident. Note to self - don't attempt email until after second 
cup of coffee.

On 29/09/2018 08:25, Chris Clayton wrote:
> 
> 
> On 28/09/2018 23:13, Heiner Kallweit wrote:
>> On 29.09.2018 00:00, Chris Clayton wrote:
>>> Thanks Maciej.
>>>
>>> On 28/09/2018 16:54, Maciej S. Szmigiero wrote:
>>>> Hi,
>>>>
>>>>> Hi,
>>>>>
>>>>> I upgraded my kernel to 4.18.10 recently and have since been experiencing 
>>>>> network problems after resuming from a
>>>>> suspend to RAM or disk. I previously had 4.18.6 and that was OK.
>>>>>
>>>>> The pattern of the problem is that when I first boot, the network is 
>>>>> fine. But, after resume from suspend I find that
>>>>> the time taken for a ping of one of my ISP's nameservers increases from 
>>>>> 14-15ms to more than 1000ms. Moreover, when I
>>>>> open a browser (chromium or firefox), it fails to retrieve my home page 
>>>>> (https://www.google.co.uk) and pings of the
>>>>> nameserver fail with the message "Destination Host Unreachable". Often, I 
>>>>> can revive the network by stopping it with
>>>>> /sbin/if{down,up} but sometimes it is necessary to also remove the r8169 
>>>>> module and load it again.
>>>>
>>>> Please have a look at the following thread:
>>>> https://lkml.org/lkml/2018/9/25/1118
>>>>
>>>
>>> I applied your patch for the 4.18 stable kernels to 4.18.10, but the 
>>> problem is not solved by it. Similarly, I applied
>>> Heiner's patch to the 4.19, but again the problem is not solved.
>>>
>> I think we talk about two different issues here. The one the fix is for has 
>> no link to suspend/resume.
>>
>> Chris, the lspci output doesn't provide enough detail to determine the exact 
>> chip version.
>> Can you provide the dmesg part with the XID?

I meant to say that I have now re-enabled MSI in 4.18.7 - the latest stable
series kernel in which eth0 continues to function reliably after a
suspend/resume cycle. The second dmesg output below is taken from that kernel.
The first one was from an up-to-date 4.19 kernel.
> 
> $ dmesg | grep -i r8169
> [5.320679] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
> [5.321432] r8169 :05:00.2: can't disable ASPM; OS doesn't have ASPM 
> control
> [5.322892] r8169 :05:00.2 eth0: RTL8411, 80:fa:5b:08:d0:3d, XID 
> 48800800, IRQ 19
> [5.323786] r8169 :05:00.2 eth0: jumbo features [frames: 9200 bytes, 
> tx checksumming: ko]
> [   10.232077] r8169 :05:00.2 eth0: No native access to PCI extended 
> config space, falling back to CSI
> [   10.235218] r8169 :05:00.2 eth0: link down
> [   11.717460] r8169 :05:00.2 eth0: link up
> 
> $ dmesg | grep -i r8169
> [5.208040] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
> [5.208677] r8169 :05:00.2: can't disable ASPM; OS doesn't have ASPM 
> control
> [5.210066] r8169 :05:00.2 eth0: RTL8411, 80:fa:5b:08:d0:3d, XID 
> 48800800, IRQ 29
> [5.210676] r8169 :05:00.2 eth0: jumbo features [frames: 9200 bytes, 
> tx checksumming: ko]
> [   10.456081] r8169 :05:00.2 eth0: No native access to PCI extended 
> config space, falling back to CSI
> [   10.459217] r8169 :05:00.2 eth0: link down
> [   10.459880] r8169 :05:00.2 eth0: link down
> [   12.015158] r8169 :05:00.2 eth0: link up
> 
> 
>> According to your lspci output neither MSI nor MSI-X is active.
>> Do you have to use nomsi for whatever reason?
> 
> No, I do not use nomsi, but MSI wasn't enabled in my kernel config. I'm 99% 
> sure that it used to be - I've no idea how
> it got dropped. If I'm not sure about an option, I start by taking the 
> recommendation in the kconfig help. Help on MSI
> has a very clear "say Y".

As I said above I have re-enabled MSI.
> 
>>
>> Heiner
>>
>>>> Maciej
>>>>
>>> Chris
>>>
>>


Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-09-29 Thread Chris Clayton



On 28/09/2018 23:13, Heiner Kallweit wrote:
> On 29.09.2018 00:00, Chris Clayton wrote:
>> Thanks Maciej.
>>
>> On 28/09/2018 16:54, Maciej S. Szmigiero wrote:
>>> Hi,
>>>
>>>> Hi,
>>>>
>>>> I upgraded my kernel to 4.18.10 recently and have since been experiencing 
>>>> network problems after resuming from a
>>>> suspend to RAM or disk. I previously had 4.18.6 and that was OK.
>>>>
>>>> The pattern of the problem is that when I first boot, the network is fine. 
>>>> But, after resume from suspend I find that
>>>> the time taken for a ping of one of my ISP's nameservers increases from 
>>>> 14-15ms to more than 1000ms. Moreover, when I
>>>> open a browser (chromium or firefox), it fails to retrieve my home page 
>>>> (https://www.google.co.uk) and pings of the
>>>> nameserver fail with the message "Destination Host Unreachable". Often, I 
>>>> can revive the network by stopping it with
>>>> /sbin/if{down,up} but sometimes it is necessary to also remove the r8169 
>>>> module and load it again.
>>>
>>> Please have a look at the following thread:
>>> https://lkml.org/lkml/2018/9/25/1118
>>>
>>
>> I applied your patch for the 4.18 stable kernels to 4.18.10, but the problem 
>> is not solved by it. Similarly, I applied
>> Heiner's patch to the 4.19, but again the problem is not solved.
>>
> I think we talk about two different issues here. The one the fix is for has 
> no link to suspend/resume.
> 
> Chris, the lspci output doesn't provide enough detail to determine the exact 
> chip version.
> Can you provide the dmesg part with the XID?

$ dmesg | grep -i r8169
[5.320679] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[5.321432] r8169 :05:00.2: can't disable ASPM; OS doesn't have ASPM 
control
[5.322892] r8169 :05:00.2 eth0: RTL8411, 80:fa:5b:08:d0:3d, XID 
48800800, IRQ 19
[5.323786] r8169 :05:00.2 eth0: jumbo features [frames: 9200 bytes, tx 
checksumming: ko]
[   10.232077] r8169 :05:00.2 eth0: No native access to PCI extended config 
space, falling back to CSI
[   10.235218] r8169 :05:00.2 eth0: link down
[   11.717460] r8169 :05:00.2 eth0: link up

$ dmesg | grep -i r8169
[5.208040] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[5.208677] r8169 :05:00.2: can't disable ASPM; OS doesn't have ASPM 
control
[5.210066] r8169 :05:00.2 eth0: RTL8411, 80:fa:5b:08:d0:3d, XID 
48800800, IRQ 29
[5.210676] r8169 :05:00.2 eth0: jumbo features [frames: 9200 bytes, tx 
checksumming: ko]
[   10.456081] r8169 :05:00.2 eth0: No native access to PCI extended config 
space, falling back to CSI
[   10.459217] r8169 :05:00.2 eth0: link down
[   10.459880] r8169 :05:00.2 eth0: link down
[   12.015158] r8169 :05:00.2 eth0: link up


> According to your lspci output neither MSI nor MSI-X is active.
> Do you have to use nomsi for whatever reason?

No, I do not use nomsi, but MSI wasn't enabled in my kernel config. I'm 99% 
sure that it used to be - I've no idea how
it got dropped. If I'm not sure about an option, I start by taking the 
recommendation in the kconfig help. Help on MSI
has a very clear "say Y".

> 
> Heiner
> 
>>> Maciej
>>>
>> Chris
>>
> 


Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

2018-09-28 Thread Chris Clayton
Thanks Maciej.

On 28/09/2018 16:54, Maciej S. Szmigiero wrote:
> Hi,
> 
>> Hi,
>>
>> I upgraded my kernel to 4.18.10 recently and have since been experiencing 
>> network problems after resuming from a
>> suspend to RAM or disk. I previously had 4.18.6 and that was OK.
>>
>> The pattern of the problem is that when I first boot, the network is fine. 
>> But, after resume from suspend I find that
>> the time taken for a ping of one of my ISP's nameservers increases from 
>> 14-15ms to more than 1000ms. Moreover, when I
>> open a browser (chromium or firefox), it fails to retrieve my home page 
>> (https://www.google.co.uk) and pings of the
>> nameserver fail with the message "Destination Host Unreachable". Often, I 
>> can revive the network by stopping it with
>> /sbin/if{down,up} but sometimes it is necessary to also remove the r8169 
>> module and load it again.
> 
> Please have a look at the following thread:
> https://lkml.org/lkml/2018/9/25/1118
> 

I applied your patch for the 4.18 stable kernels to 4.18.10, but the problem is 
not solved by it. Similarly, I applied
Heiner's patch to the 4.19, but again the problem is not solved.

> Maciej
> 
Chris


Re: [PATCH V2] mfd: rtsx: release IRQ during shutdown

2018-01-03 Thread Chris Clayton


On 03/01/18 12:32, Sinan Kaya wrote:
> 'Commit cc27b735ad3a ("PCI/portdrv: Turn off PCIe services during
> shutdown")' revealed a resource leak in rtsx_pci driver during shutdown.
> 
> Issue shows up as a warning during shutdown as follows:
> 
> remove_proc_entry: removing non-empty directory 'irq/17', leaking at least
> 'rtsx_pci'
> WARNING: CPU: 0 PID: 1578 at fs/proc/generic.c:572
> remove_proc_entry+0x11d/0x130
> Modules linked in 
> ...
> Call Trace:
> unregister_irq_proc
> free_desc
> irq_free_descs
> mp_unmap_irq
> acpi_unregister_gsi_apic
> acpi_pci_irq_disable
> do_pci_disable_device
> pci_disable_device
> device_shutdown
> kernel_restart
> Sys_reboot
> 
> Even though rtsx_pci driver implements a shutdown callback, it is not
> releasing the interrupt that it registered during probe. This is causing
> the ACPI layer to complain that the shared IRQ is in use while freeing
> IRQ.
> 
> This code releases the IRQ to prevent resource leak and eliminate the
> warning.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=198141
> Reported-by: Chris Clayton 
> Fixes: cc27b735ad3a ("PCI/portdrv: Turn off PCIe services during shutdown")
> Signed-off-by: Sinan Kaya 
> ---
>  drivers/mfd/rtsx_pcr.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/mfd/rtsx_pcr.c b/drivers/mfd/rtsx_pcr.c
> index 590fb9a..c3ed885 100644
> --- a/drivers/mfd/rtsx_pcr.c
> +++ b/drivers/mfd/rtsx_pcr.c
> @@ -1543,6 +1543,9 @@ static void rtsx_pci_shutdown(struct pci_dev *pcidev)
>   rtsx_pci_power_off(pcr, HOST_ENTER_S1);
>  
>   pci_disable_device(pcidev);
> + free_irq(pcr->irq, (void *)pcr);
> + if (pcr->msi_en)
> + pci_disable_msi(pcr->pci);
>  }
>  
>  #else /* CONFIG_PM */

I've applied v2 of the patch and built and installed the kernel (-rc6). All I 
can say is that my system still closes
down without the warning and call trace that the unpatched kernel produces. 
It's the best I can do by way of a test
because I have no idea what the code added in v2 is supposed to achieve and, 
because my system shuts down (or reboots)
moments later, there is no opportunity to check. If that constitutes a valid 
test:

Tested-by: Chris Clayton 

> 


Re: Oops on 4.15-rc[123] on shutdown/reboot

2017-12-11 Thread Chris Clayton
On 11/12/17 17:17, Bjorn Helgaas wrote:
> [+cc linux-pci]
> 
> On Mon, Dec 11, 2017 at 11:29:50AM -0500, Sinan Kaya wrote:
>> Hi Chris,
>>
>>>
>>> I'm more than happy to provide additional diagnostics and test proposed 
>>> fixes. As a starter for ten, I've attached the
>>> output from 'lspci -v'. If, however, you need to see the backtrace, I'll 
>>> need some advice on how to capture that.
>>>
>>
>> Can you open a bugzilla and also share the boot log?
>>
>> There must be something unique about your system.
> 
> Can you attach "lspci -vv" output (as root) to the bugzilla, too?
> 

I've opened the bugzilla report (Bug 198141) and attached the dmesg and lspci 
-vv outputs to it.




Re: Oops on 4.15-rc[123] on shutdown/reboot

2017-12-11 Thread Chris Clayton


On 11/12/17 17:24, Sinan Kaya wrote:
> On 12/11/2017 12:06 PM, Chris Clayton wrote:
>> Here's the output of dmesg for 4.15.0-rc3. I'll open a bugzilla later and 
>> add this and the lspci output that I sent with
> >> my original report.
> 
> This was helpful. I don't see any AER/DPC in your log. It looks like the only 
> PCIe
> portdrv service you have is PME.
> 
> Can we do a quick hack and return immediately from 
> 
> static int pcie_pme_probe(struct pcie_device *srv)
> 
> by putting return 0; at the top. 
> 
> Same thing in 
> 
> static void pcie_pme_remove(struct pcie_device *srv)
> 
> just place a return at the top.
> 

I made those changes (to drivers/pci/pcie/pme.c) and built and installed the 
kernel.  Sorry, but I still get the oops
when I reboot.

> I'm hoping your problem will go away after this. Then, we can start peeling 
> the onion.
> 


Re: Oops on 4.15-rc[123] on shutdown/reboot

2017-12-11 Thread Chris Clayton


On 11/12/17 16:29, Sinan Kaya wrote:
> Hi Chris,
> 
>>
>> I'm more than happy to provide additional diagnostics and test proposed 
>> fixes. As a starter for ten, I've attached the
>> output from 'lspci -v'. If, however, you need to see the backtrace, I'll 
>> need some advice on how to capture that.
>>
> 
> Can you open a bugzilla and also share the boot log?
> 

Here's the output of dmesg for 4.15.0-rc3. I'll open a bugzilla later and add 
this and the lspci output that I sent with
my original report.

> There must be something unique about your system.
>  
> Sinan
> 
[0.00] Linux version 4.15.0-rc3 (chris@laptop) (gcc version 7.2.1 20171207 (GCC)) #398 SMP PREEMPT Mon Dec 11 07:46:20 GMT 2017
[0.00] Command line: ro root=/dev/sda2 resume=/dev/sda6 rootfstype=ext4 net.ifnames=0
[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.00] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x0100-0x0009d7ff] usable
[0.00] BIOS-e820: [mem 0x0009d800-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000e-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0xd7216fff] usable
[0.00] BIOS-e820: [mem 0xd7217000-0xd721dfff] ACPI NVS
[0.00] BIOS-e820: [mem 0xd721e000-0xd7a0cfff] usable
[0.00] BIOS-e820: [mem 0xd7a0d000-0xd7ca1fff] reserved
[0.00] BIOS-e820: [mem 0xd7ca2000-0xdb4d] usable
[0.00] BIOS-e820: [mem 0xdb4e-0xdb82dfff] reserved
[0.00] BIOS-e820: [mem 0xdb82e000-0xdb88afff] usable
[0.00] BIOS-e820: [mem 0xdb88b000-0xdb9bcfff] ACPI NVS
[0.00] BIOS-e820: [mem 0xdb9bd000-0xdbffefff] reserved
[0.00] BIOS-e820: [mem 0xdbfff000-0xdbff] usable
[0.00] BIOS-e820: [mem 0xdd00-0xdf1f] reserved
[0.00] BIOS-e820: [mem 0xf800-0xfbff] reserved
[0.00] BIOS-e820: [mem 0xfec0-0xfec00fff] reserved
[0.00] BIOS-e820: [mem 0xfed0-0xfed03fff] reserved
[0.00] BIOS-e820: [mem 0xfed1c000-0xfed1] reserved
[0.00] BIOS-e820: [mem 0xfee0-0xfee00fff] reserved
[0.00] BIOS-e820: [mem 0xff00-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00041fdf] usable
[0.00] NX (Execute Disable) protection: active
[0.00] random: fast init done
[0.00] SMBIOS 2.7 present.
[0.00] DMI: Notebook W65_67SZ/W65_67SZ, BIOS 1.03.05 02/26/2014
[0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable
[0.00] e820: last_pfn = 0x41fe00 max_arch_pfn = 0x4
[0.00] MTRR default type: uncachable
[0.00] MTRR fixed ranges enabled:
[0.00]   0-9 write-back
[0.00]   A-B uncachable
[0.00]   C-C write-protect
[0.00]   D-E7FFF uncachable
[0.00]   E8000-F write-protect
[0.00] MTRR variable ranges enabled:
[0.00]   0 base 00 mask 7C write-back
[0.00]   1 base 04 mask 7FE000 write-back
[0.00]   2 base 00E000 mask 7FE000 uncachable
[0.00]   3 base 00DE00 mask 7FFE00 uncachable
[0.00]   4 base 00DD00 mask 7FFF00 uncachable
[0.00]   5 base 041FE0 mask 7FFFE0 uncachable
[0.00]   6 disabled
[0.00]   7 disabled
[0.00]   8 disabled
[0.00]   9 disabled
[0.00] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WP  UC- WT  
[0.00] e820: update [mem 0xdd00-0x] usable ==> reserved
[0.00] e820: last_pfn = 0xdc000 max_arch_pfn = 0x4
[0.00] found SMP MP-table at [mem 0x000fd820-0x000fd82f] mapped at [(ptrval)]
[0.00] Base memory trampoline at [(ptrval)] 97000 size 24576
[0.00] Using GB pages for direct mapping
[0.00] BRK [0x02098000, 0x02098fff] PGTABLE
[0.00] BRK [0x02099000, 0x02099fff] PGTABLE
[0.00] BRK [0x0209a000, 0x0209afff] PGTABLE
[0.00] BRK [0x0209b000, 0x0209bfff] PGTABLE
[0.00] BRK [0x0209c000, 0x0209cfff] PGTABLE
[0.00] BRK [0x0209d000, 0x0209dfff] PGTABLE
[0.00] ACPI: Early tabl

Oops on 4.15-rc[123] on shutdown/reboot

2017-12-11 Thread Chris Clayton
I've been getting an oops when shutting down my laptop (with /sbin/halt) or 
rebooting it (/sbin/reboot or
/usr/sbin/kexec). Unfortunately, I can't provide the backtrace because it is on 
the screen for only a moment before the
system shuts down/reboots.

I have, however, bisected it and the outcome is:

cc27b735ad3a75574a6ab1a66ed6b09385e77e5e is the first bad commit
commit cc27b735ad3a75574a6ab1a66ed6b09385e77e5e
Author: Sinan Kaya 
Date:   Wed Oct 25 15:01:02 2017 -0400

PCI/portdrv: Turn off PCIe services during shutdown

Some of the PCIe services such as AER are being left enabled during
shutdown. This might cause spurious AER errors while SOC is being powered
down.

Clean up the PCIe services gracefully during shutdown to clear these false
positives.

Signed-off-by: Sinan Kaya 
Signed-off-by: Bjorn Helgaas 

:04 04 5a827d6956c581344a0bf392e30155c337673c1d 
76c6a39b53604a0a0a370383c3503f80aa7cbc1e M  drivers

I'm confident that this is the correct outcome because a kernel built with the 
preceding commit
(6018182d3158505f11103adaee8ffb53424df986) does not oops. Nor does -rc3 with 
the patch reversed.

I'm more than happy to provide additional diagnostics and test proposed fixes. 
As a starter for ten, I've attached the
output from 'lspci -v'. If, however, you need to see the backtrace, I'll need 
some advice on how to capture that.

Chris

00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor 
DRAM Controller (rev 06)
Subsystem: CLEVO/KAPOK Computer Xeon E3-1200 v3/4th Gen Core Processor 
DRAM Controller
Flags: bus master, fast devsel, latency 0
Capabilities: [e0] Vendor Specific Information: Len=0c 

00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor 
PCI Express x16 Controller (rev 06) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0, IRQ 16
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
I/O behind bridge: None
Memory behind bridge: None
Prefetchable memory behind bridge: None
Capabilities: [88] Subsystem: CLEVO/KAPOK Computer Xeon E3-1200 v3/4th 
Gen Core Processor PCI Express x16 Controller
Capabilities: [80] Power Management version 3
Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
Capabilities: [a0] Express Root Port (Slot+), MSI 00
Kernel driver in use: pcieport

00:02.0 VGA compatible controller: Intel Corporation 4th Gen Core Processor 
Integrated Graphics Controller (rev 06) (prog-if 00 [VGA controller])
Subsystem: CLEVO/KAPOK Computer 4th Gen Core Processor Integrated 
Graphics Controller
Flags: bus master, fast devsel, latency 0, IRQ 16
Memory at f780 (64-bit, non-prefetchable) [size=4M]
Memory at e000 (64-bit, prefetchable) [size=256M]
I/O ports at f000 [size=64]
[virtual] Expansion ROM at 000c [disabled] [size=128K]
Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
Capabilities: [d0] Power Management version 2
Capabilities: [a4] PCI Advanced Features
Kernel driver in use: i915

00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor 
HD Audio Controller (rev 06)
Subsystem: CLEVO/KAPOK Computer Xeon E3-1200 v3/4th Gen Core Processor 
HD Audio Controller
Flags: bus master, fast devsel, latency 0, IRQ 16
Memory at f7f14000 (64-bit, non-prefetchable) [size=16K]
Capabilities: [50] Power Management version 2
Capabilities: [60] MSI: Enable- Count=1/1 Maskable- 64bit-
Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel

00:14.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family 
USB xHCI (rev 05) (prog-if 30 [XHCI])
Subsystem: CLEVO/KAPOK Computer 8 Series/C220 Series Chipset Family USB 
xHCI
Flags: bus master, medium devsel, latency 0, IRQ 16
Memory at f7f0 (64-bit, non-prefetchable) [size=64K]
Capabilities: [70] Power Management version 2
Capabilities: [80] MSI: Enable- Count=1/8 Maskable- 64bit+
Kernel driver in use: xhci_hcd

00:16.0 Communication controller: Intel Corporation 8 Series/C220 Series 
Chipset Family MEI Controller #1 (rev 04)
Subsystem: CLEVO/KAPOK Computer 8 Series/C220 Series Chipset Family MEI 
Controller
Flags: bus master, fast devsel, latency 0, IRQ 11
Memory at f7f1e000 (64-bit, non-prefetchable) [size=16]
Capabilities: [50] Power Management version 3
Capabilities: [8c] MSI: Enable- Count=1/1 Maskable- 64bit+

00:1a.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family 
USB EHCI #2 (rev 05) (prog-if 20 [EHCI])
Subsystem: CLEVO/KAPOK Computer 8 Series/C220 Series Chipset Family USB 
EHCI
Flags: bus master, medium devsel, laten

"PM / QoS: Fix device resume latency PM QoS" breaks sound

2017-10-28 Thread Chris Clayton
Hi,

I pulled the latest changes from Linus' tree this evening and have found that 
with the new kernel, sound is not working
on my laptop. More precisely, the built-in speakers don't produce any sound. 
Sound does work when I use ear-plugs in the
headphone socket. It also works via a bluetooth speaker.

I've bisected the problem and ended up at:

0cc2b4e5a020fc7f4d1795741c116c983e9467d7 is the first bad commit
commit 0cc2b4e5a020fc7f4d1795741c116c983e9467d7
Author: Rafael J. Wysocki 
Date:   Tue Oct 24 15:20:45 2017 +0200

PM / QoS: Fix device resume latency PM QoS

The special value of 0 for device resume latency PM QoS means
"no restriction", but there are two problems with that.

First, device resume latency PM QoS requests with 0 as the
value are always put in front of requests with positive
values in the priority lists used internally by the PM QoS
framework, causing 0 to be chosen as an effective constraint
value.  However, that 0 is then interpreted as "no restriction"
effectively overriding the other requests with specific
restrictions which is incorrect.

Second, the users of device resume latency PM QoS have no
way to specify that *any* resume latency at all should be
avoided, which is an artificial limitation in general.

To address these issues, modify device resume latency PM QoS to
use S32_MAX as the "no constraint" value and 0 as the "no
latency at all" one and rework its users (the cpuidle menu
governor, the genpd QoS governor and the runtime PM framework)
to follow these changes.

Also add a special "n/a" value to the corresponding user space I/F
to allow user space to indicate that it cannot accept any resume
latencies at all for the given device.

Fixes: 85dc0b8a4019 (PM / QoS: Make it possible to expose PM QoS latency 
constraints)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=197323
Reported-by: Reinette Chatre 
Tested-by: Reinette Chatre 
Signed-off-by: Rafael J. Wysocki 
Acked-by: Alex Shi 
Cc: All applicable 

:04 04 f0c128ec799bb9894cfc5c341f88ad7bdfb15bac 
9a2e8171ca47f864bd534cd9c160cce58449a889 M  Documentation
:04 04 0028ffec81675e686bdd621c0445d3e814d7980c 
29db53c6356a6fed9c8bdbc2d6bc7bd56a96e529 M  drivers
:04 04 2e66b79bd2ffb4fcb00f04a69a0afe5c80d1d3f3 
dd6d8e90b59389cd2bd8a0c92716d79d2eeb8268 M  include

With that change reverted, the speakers emit sound again.
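
For anyone reading along, here is a minimal sketch of the ambiguity that the quoted commit message describes. It assumes, as a simplification and not as a description of the kernel's actual PM QoS code, that the effective constraint is simply the minimum of all requests. The names effective_latency, old_means_unrestricted, new_means_unrestricted and DEMO_NO_CONSTRAINT are hypothetical; DEMO_NO_CONSTRAINT stands in for the S32_MAX value mentioned in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for S32_MAX as used in the commit message. */
#define DEMO_NO_CONSTRAINT INT32_MAX

/* Simplified aggregation (an assumption): the smallest request wins. */
int32_t effective_latency(const int32_t *reqs, size_t n)
{
    assert(reqs != NULL && n > 0);
    int32_t min = reqs[0];
    for (size_t i = 1; i < n; i++) {
        if (reqs[i] < min)
            min = reqs[i];
    }
    return min;
}

/* Old encoding: an effective value of 0 is read as "no restriction". */
bool old_means_unrestricted(int32_t effective)
{
    return effective == 0;
}

/* New encoding: only the maximum value is read as "no constraint". */
bool new_means_unrestricted(int32_t effective)
{
    return effective == DEMO_NO_CONSTRAINT;
}
```

With one "don't care" request and one request for 100, the old encoding {0, 100} aggregates to 0 and is read as unrestricted, so the 100 request is lost. The new encoding {DEMO_NO_CONSTRAINT, 100} aggregates to 100, which is what the second requester asked for.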

The audio devices identified by "lspci -vv" are as follows:

00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor 
HD Audio Controller (rev 06)
Subsystem: CLEVO/KAPOK Computer Xeon E3-1200 v3/4th Gen Core Processor 
HD Audio Controller
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- SERR- 

Re: [PATCH 4.12 004/106] scsi: sg: fix SG_DXFER_FROM_DEV transfers

2017-08-10 Thread Chris Clayton


On 09/08/17 17:51, Greg Kroah-Hartman wrote:
> 4.12-stable review patch.  If anyone has any objections, please let me know.
> 
> ---
I repeat the comments I made when the patch was queued for stable:

1. Johannes' commit message says that the transfer must have a length bigger 
than 0, so the code should return false if
the length is less than or equal to 0, but the test is for less than 0.

2. But in any case, there's another patch that removes all this 
sg_is_valid_dxfer() jiggery-pokery and replaces it with
a simpler test. It hasn't reached Linus' tree yet but is, I believe, cc'd to 
stable.


As Johannes said in response to the second of my comments, the patch that 
replaces sg_is_valid_dxfer() with a simpler
test is now in Linus' tree - commit f930c7043663188429cd9b254e9d761edfc101ce. 
Without that change, I think there is
still some breakage in sg.
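
To illustrate the first comment above, here is a small user-space sketch of the difference between the test in the patch and the test the commit message describes. It is only an illustration: struct demo_hdr and both function names are hypothetical, and the length field is declared as a signed int purely so that both comparisons are meaningful. It is not the kernel's sg_io_hdr_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in carrying only the field under discussion. */
struct demo_hdr {
    int dxfer_len; /* signed here by assumption, for illustration only */
};

/* The check as written in the patch: only negative lengths are rejected. */
bool from_dev_ok_as_patched(const struct demo_hdr *hp)
{
    assert(hp != NULL);
    if (hp->dxfer_len < 0)
        return false;
    return true;
}

/* The check as the commit message words it: length must be bigger than 0. */
bool from_dev_ok_as_described(const struct demo_hdr *hp)
{
    assert(hp != NULL);
    if (hp->dxfer_len <= 0)
        return false;
    return true;
}
```

The two versions only disagree for a length of exactly 0: the patched test accepts it, while the test described in the commit message rejects it.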

Chris
---
> 
> From: Johannes Thumshirn 
> 
> commit 68c59fcea1f2c6a54c62aa896cc623c1b5bc9b47 upstream.
> 
> SG_DXFER_FROM_DEV transfers do not necessarily have a dxferp as we set
> it to NULL for the old sg_io read/write interface, but must have a
> length bigger than 0. This fixes a regression introduced by commit
> 28676d869bbb ("scsi: sg: check for valid direction before starting the
> request")
> 
> Signed-off-by: Johannes Thumshirn 
> Fixes: 28676d869bbb ("scsi: sg: check for valid direction before starting the 
> request")
> Reported-by: Chris Clayton 
> Tested-by: Chris Clayton 
> Cc: Douglas Gilbert 
> Reviewed-by: Hannes Reinecke 
> Tested-by: Chris Clayton 
> Acked-by: Douglas Gilbert 
> Signed-off-by: Martin K. Petersen 
> Signed-off-by: Greg Kroah-Hartman 
> 
> ---
>  drivers/scsi/sg.c |5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> --- a/drivers/scsi/sg.c
> +++ b/drivers/scsi/sg.c
> @@ -758,8 +758,11 @@ static bool sg_is_valid_dxfer(sg_io_hdr_
>   if (hp->dxferp || hp->dxfer_len > 0)
>   return false;
>   return true;
> - case SG_DXFER_TO_DEV:
>   case SG_DXFER_FROM_DEV:
> + if (hp->dxfer_len < 0)
> + return false;
> + return true;
> + case SG_DXFER_TO_DEV:
>   case SG_DXFER_TO_FROM_DEV:
>   if (!hp->dxferp || hp->dxfer_len == 0)
>   return false;
> 
> 


Re: [PATCH v2] scsi: sg: fix SG_DXFER_FROM_DEV transfers

2017-07-07 Thread Chris Clayton


On 07/07/17 09:56, Johannes Thumshirn wrote:
> SG_DXFER_FROM_DEV transfers do not necessarily have a dxferp as we set
> it to NULL for the old sg_io read/write interface, but must have a length
> bigger than 0. This fixes a regression introduced by commit 28676d869bbb
> ("scsi: sg: check for valid direction before starting the request")
> 

I've tested this new patch and the Nero applications can still find the optical 
drives on my laptop.

Tested-by: Chris Clayton 

> Signed-off-by: Johannes Thumshirn 
> Fixes: 28676d869bbb ("scsi: sg: check for valid direction before starting the 
> request")
> Reported-by: Chris Clayton 
> Tested-by: Chris Clayton 
> Cc: Douglas Gilbert 
> Reviewed-by: Hannes Reinecke 
> ---
> Changes to v1:
> * Fix breakage of the sg_io v3 interface, verified using sg_inq
> 
>  drivers/scsi/sg.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
> index 21225d62b0c1..1e82d4128a84 100644
> --- a/drivers/scsi/sg.c
> +++ b/drivers/scsi/sg.c
> @@ -758,8 +758,11 @@ static bool sg_is_valid_dxfer(sg_io_hdr_t *hp)
>   if (hp->dxferp || hp->dxfer_len > 0)
>   return false;
>   return true;
> - case SG_DXFER_TO_DEV:
>   case SG_DXFER_FROM_DEV:
> + if (hp->dxfer_len < 0)
> + return false;
> + return true;
> + case SG_DXFER_TO_DEV:
>   case SG_DXFER_TO_FROM_DEV:
>   if (!hp->dxferp || hp->dxfer_len == 0)
>   return false;
> 


4.7.0-rc7+: Oops during boot with USB pen drive inserted

2016-07-21 Thread Chris Clayton
Hi,

With Linus' latest and greatest, I get an oops when I boot my laptop with a pen 
drive inserted in any USB port. The oops
message is:

Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(8,2)

The oops seems to be 100% repeatable. If a USB pen drive is not inserted, the 
laptop boots successfully.

I've taken a photograph of the oops and you can view it at
http://s714.photobucket.com/user/chris2553/media/IMG_20160722_053841.jpg.html.

At the top of the picture, I notice that the partitions on my actual boot disk 
are being reported as being on /dev/sdb,
so it seems likely that, at this point, the pen drive is being seen as 
/dev/sda, although that has scrolled off the
screen. I don't boot via a ramdisk - my kernel has ext4 built in.  The grub2 
entry is:

menuentry "Krisux, Linux 4.7.0-rc7+" {
insmod ext2
set root=(hd0,2)

linux   /boot/vmlinuz-4.7.0-rc7+ ro root=/dev/sda2 resume=/dev/sda6 
rootfstype=ext4 net.ifnames=0

}

(BTW, Krisux is not a real distro - it's just the name I have given my Linux From 
Scratch system.)

The stack that is dumped is:

dump_stack
panic
printk
mount_block_root
prepare_namespace
kernel_init_freeable
kernel_init
ret_from_fork
rest_init

I realise I could work around this by specifying the boot partition by, say, 
its UUID, but I thought you would want me
to report this anyway.

Of course, I'm happy to provide any other information required and to test any 
fix, but I will be out and about for the
next 14-16 hours, so it will be later tonight or maybe even tomorrow before I 
can respond.

Chris



Re: 4.5 Regression - mouse not working after resume from suspend

2016-02-18 Thread Chris Clayton
I can only assume that although a non-working bluetooth mouse is a symptom of 
this regression, the silence of the
bluetooth folks is because the fault does not lie in the BT subsystem. 
Consequently, I'm transferring the problem back
to LKML in the hope that someone else can solve the problem.

On 15/02/16 23:40, Chris Clayton wrote:
> Hi,
> 
> Is there anything else I can do to help diagnose this regression?
> 
> To summarise, my BT mouse does not work after resuming from suspend to disk or 
> RAM. It works perfectly in the earlier 4.4,
> 4.3 and 4.2 kernels. I bisected and found the first bad commit is:
> 
> 2ff13894cfb877cb3d02d96a8402202f0a6f3efd is the first bad commit
> commit 2ff13894cfb877cb3d02d96a8402202f0a6f3efd
> Author: Johan Hedberg 
> Date:   Wed Nov 25 16:15:44 2015 +0200
> 
> Bluetooth: Perform HCI update for power on synchronously
> 
> Johan requested additional information, which I provided. Checking the 
> archive at marc.info, it seems the mail didn't
> make it to the mailing list. Maybe it exceeded a size limit, I don't know. 
> Anyway I copied the mail to Johan and Marcel.
> 
> A bit more experimentation revealed that I can reactivate the mouse if I 
> restart the bluetooth daemon after the machine
> resumes.
> 
> Please let me know if I can provide anything else.
> 
> Thanks
> 
> Chris
> 
> On 06/02/16 15:23, Chris Clayton wrote:
>> Hi Johan,
>>
>> The information you requested has been captured from v4.5-rc2-340-g5af9c2e 
>> and is included below.
>>
>> On 06/02/16 14:33, Johan Hedberg wrote:
>>> Hi Chris,
>>>
>>> On Sat, Feb 06, 2016, Chris Clayton wrote:
>>>> On 06/02/16 11:38, Chris Clayton wrote:
>>>>> On 06/02/16 08:37, Chris Clayton wrote:
>>>>>> There seems to be a regression in resuming my laptop from a suspend
>>>>>> to RAM or disk. The symptom is that my bluetooth
>>>>>> mouse doesn't work after the resume. The kernel is built after a
>>>>>> pull of Linus' tree this morning (v4.5-rc2-340-g5af9c2e).
>>>>>>
>>>>>> Attached is the output from dmesg showing the boot, suspend (to
>>>>>> RAM) and resume. You'll see that during the resume,
>>>>>> error -517 is being reported for some devices. Suspend/resume has worked 
>>>>>> perfectly with 4.[234].x kernels.
>>>>>>
>>>>>> I'll start a bisection, but thought I'd give a heads up in case
>>>>>> someone can see the problem before I get done with the
>>>>>> bisect.
>>>>>
>>>>> The bisection ended up at:
>>>>>
>>>>> 2ff13894cfb877cb3d02d96a8402202f0a6f3efd is the first bad commit
>>>>> commit 2ff13894cfb877cb3d02d96a8402202f0a6f3efd
>>>>> Author: Johan Hedberg 
>>>>> Date:   Wed Nov 25 16:15:44 2015 +0200
>>>>>
>>>>> Bluetooth: Perform HCI update for power on synchronously
>>>>>
>>>>> The request to update HCI during power on is always coming either from
>>>>> hdev->req_workqueue or through an ioctl, so it's safe to use
>>>>> hci_req_sync for it. This way we also eliminate potential races with
>>>>> incoming mgmt commands or other actions while powering on.
>>>>>
>>>>> Part of this refactoring is the splitting of mgmt_powered() into
>>>>> mgmt_power_on() and __mgmt_power_off() functions. The main reason is
>>>>> the different requirements as far as hdev locking is concerned, as
>>>>> highlighted with the __ prefix of the power off API.
>>>>>
>>>>> Since the power on in the case of clearing the AUTO_OFF flag cannot be
>>>>> done synchronously in the set_powered mgmt handler, the hci_power_on
>>>>> work callback is extended to cover this (which also simplifies the
>>>>> set_powered helper a lot).
>>>>>
>>>>> Signed-off-by: Johan Hedberg 
>>>>> Signed-off-by: Marcel Holtmann 
>>>>>
>>>>> :04 04 a093d0be66f39f99c33a6a4725b2330ca9b41d03 
>>>>> a1eff79cec3ee7208e5aa200ab5069726bbeea8e M  include
>>>>> :04 04 d2d122193b33d45fcb9c2bc69f2024487a7528a0 
>>>>> 0036e1ec2e125f2432cfd420b5f79ca133ec34f7 M  net
>>>>
>>>> I've just built a kernel at bf943cbf76ecd3b9838a80d5e08777b0f4ccc665
>>>>

Re: 4.5 Regression - mouse not working after resume from suspend

2016-02-06 Thread Chris Clayton


On 06/02/16 11:38, Chris Clayton wrote:
> 
> 
> On 06/02/16 08:37, Chris Clayton wrote:
>> There seems to be a regression in resuming my laptop from a suspend to RAM 
>> or disk. The symptom is that my bluetooth
>> mouse doesn't work after the resume. The kernel is built after a pull of 
>> Linus' tree this morning (v4.5-rc2-340-g5af9c2e).
>>
>> Attached is the output from dmesg showing the boot, suspend (to RAM) and 
>> resume. You'll see that during the resume,
>> error -517 is being reported for some devices. Suspend/resume has worked 
>> perfectly with 4.[234].x kernels.
>>
>> I'll start a bisection, but thought I'd give a heads up in case someone can 
>> see the problem before I get done with the
>> bisect.
>>
> 
> The bisection ended up at:
> 
> 2ff13894cfb877cb3d02d96a8402202f0a6f3efd is the first bad commit
> commit 2ff13894cfb877cb3d02d96a8402202f0a6f3efd
> Author: Johan Hedberg 
> Date:   Wed Nov 25 16:15:44 2015 +0200
> 
> Bluetooth: Perform HCI update for power on synchronously
> 
> The request to update HCI during power on is always coming either from
> hdev->req_workqueue or through an ioctl, so it's safe to use
> hci_req_sync for it. This way we also eliminate potential races with
> incoming mgmt commands or other actions while powering on.
> 
> Part of this refactoring is the splitting of mgmt_powered() into
> mgmt_power_on() and __mgmt_power_off() functions. The main reason is
> the different requirements as far as hdev locking is concerned, as
> highlighted with the __ prefix of the power off API.
> 
> Since the power on in the case of clearing the AUTO_OFF flag cannot be
> done synchronously in the set_powered mgmt handler, the hci_power_on
> work callback is extended to cover this (which also simplifies the
> set_powered helper a lot).
> 
> Signed-off-by: Johan Hedberg 
> Signed-off-by: Marcel Holtmann 
> 
> :04 04 a093d0be66f39f99c33a6a4725b2330ca9b41d03 
> a1eff79cec3ee7208e5aa200ab5069726bbeea8e M  include
> :04 04 d2d122193b33d45fcb9c2bc69f2024487a7528a0 
> 0036e1ec2e125f2432cfd420b5f79ca133ec34f7 M  net
> 
> 

I've just built a kernel at bf943cbf76ecd3b9838a80d5e08777b0f4ccc665 (the 
commit prior to the one the bisect landed on)
and my BT mouse works fine after a suspend/resume. With a kernel built at 
2ff13894cfb877cb3d02d96a8402202f0a6f3efd, the
mouse does not work after resume.

> The bisect log is:
> 
> git bisect start
> # bad: [5af9c2e19da6514a1a50b07d97d93b74a7711873] Merge branch 'akpm' 
> (patches from Andrew)
> git bisect bad 5af9c2e19da6514a1a50b07d97d93b74a7711873
> # good: [afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc] Linux 4.4
> git bisect good afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc
> # bad: [63b6da39bb38e8f1a1ef3180d32a39d6baf9da84] perf: Fix 
> perf_event_exit_task() race
> git bisect bad 63b6da39bb38e8f1a1ef3180d32a39d6baf9da84
> # bad: [aee3bfa3307cd0da2126bdc0ea359dabea5ee8f7] Merge 
> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
> git bisect bad aee3bfa3307cd0da2126bdc0ea359dabea5ee8f7
> # good: [60b7eca1dc2ec066916b3b7ac6ad89bea13cb9af] Merge tag 
> 'upstream-4.5-rc1' of git://git.infradead.org/linux-ubifs
> git bisect good 60b7eca1dc2ec066916b3b7ac6ad89bea13cb9af
> # bad: [a188222b6ed29404ac2d4232d35d1fe0e77af370] net: Rename 
> NETIF_F_ALL_CSUM to NETIF_F_CSUM_MASK
> git bisect bad a188222b6ed29404ac2d4232d35d1fe0e77af370
> # good: [1343c65f70ee1b1f968a08b30e1836a4e37116cd] fm10k: always check 
> init_hw for errors
> git bisect good 1343c65f70ee1b1f968a08b30e1836a4e37116cd
> # good: [bc9b145a092aca91a7f6ef40cdb3628b6ada7ec9] Merge branch 
> 'for-4.5-ancestor-test' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
> git bisect good bc9b145a092aca91a7f6ef40cdb3628b6ada7ec9
> # good: [a4fcad656e1100bdda9b0b752b93a1a276810469] fm10k: whitespace cleanups
> git bisect good a4fcad656e1100bdda9b0b752b93a1a276810469
> # bad: [7302b9d90117496049dd4bfa28755f7c2ed55b27] ieee802154/adf7242: Driver 
> for ADF7242 MAC IEEE802154
> git bisect bad 7302b9d90117496049dd4bfa28755f7c2ed55b27
> # bad: [a0c38245153abe1fd844af9b166d1a5d5dafe7b1] Bluetooth: hci_intel: Use 
> shorter timeout for HCI commands
> git bisect bad a0c38245153abe1fd844af9b166d1a5d5dafe7b1
> # good: [bf943cbf76ecd3b9838a80d5e08777b0f4ccc665] Bluetooth: Move fast 
> connectable code to hci_request.c
> git bisect good bf943cbf76ecd3b9838a80d5e08777b0f4ccc665
> # bad: [742c59516822f4a4bc23b0961d88c569a7f1bf71] Bluetooth: Simplify setting 
> Configuration Field
> git bisect bad 742c59516822f4a4bc23b0961d88c569a7f1bf71
>

Re: 4.5 Regression - mouse not working after resume from suspend

2016-02-06 Thread Chris Clayton


On 06/02/16 08:37, Chris Clayton wrote:
> There seems to be a regression in resuming my laptop from a suspend to RAM or 
> disk. The symptom is that my bluetooth
> mouse doesn't work after the resume. The kernel is built after a pull of 
> Linus' tree this morning (v4.5-rc2-340-g5af9c2e).
> 
> Attached is the output from dmesg showing the boot, suspend (to RAM) and 
> resume. You'll see that during the resume,
> error -517 is being reported for some devices. Suspend/resume has worked 
> perfectly with 4.[234].x kernels.
> 
> I'll start a bisection, but thought I'd give a heads up in case someone can 
> see the problem before I get done with the
> bisect.
> 

The bisection ended up at:

2ff13894cfb877cb3d02d96a8402202f0a6f3efd is the first bad commit
commit 2ff13894cfb877cb3d02d96a8402202f0a6f3efd
Author: Johan Hedberg 
Date:   Wed Nov 25 16:15:44 2015 +0200

Bluetooth: Perform HCI update for power on synchronously

The request to update HCI during power on is always coming either from
hdev->req_workqueue or through an ioctl, so it's safe to use
hci_req_sync for it. This way we also eliminate potential races with
incoming mgmt commands or other actions while powering on.

Part of this refactoring is the splitting of mgmt_powered() into
mgmt_power_on() and __mgmt_power_off() functions. The main reason is
the different requirements as far as hdev locking is concerned, as
highlighted with the __ prefix of the power off API.

Since the power on in the case of clearing the AUTO_OFF flag cannot be
done synchronously in the set_powered mgmt handler, the hci_power_on
work callback is extended to cover this (which also simplifies the
set_powered helper a lot).

Signed-off-by: Johan Hedberg 
Signed-off-by: Marcel Holtmann 

:04 04 a093d0be66f39f99c33a6a4725b2330ca9b41d03 
a1eff79cec3ee7208e5aa200ab5069726bbeea8e M  include
:04 04 d2d122193b33d45fcb9c2bc69f2024487a7528a0 
0036e1ec2e125f2432cfd420b5f79ca133ec34f7 M  net


The bisect log is:

git bisect start
# bad: [5af9c2e19da6514a1a50b07d97d93b74a7711873] Merge branch 'akpm' (patches from Andrew)
git bisect bad 5af9c2e19da6514a1a50b07d97d93b74a7711873
# good: [afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc] Linux 4.4
git bisect good afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc
# bad: [63b6da39bb38e8f1a1ef3180d32a39d6baf9da84] perf: Fix perf_event_exit_task() race
git bisect bad 63b6da39bb38e8f1a1ef3180d32a39d6baf9da84
# bad: [aee3bfa3307cd0da2126bdc0ea359dabea5ee8f7] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
git bisect bad aee3bfa3307cd0da2126bdc0ea359dabea5ee8f7
# good: [60b7eca1dc2ec066916b3b7ac6ad89bea13cb9af] Merge tag 'upstream-4.5-rc1' of git://git.infradead.org/linux-ubifs
git bisect good 60b7eca1dc2ec066916b3b7ac6ad89bea13cb9af
# bad: [a188222b6ed29404ac2d4232d35d1fe0e77af370] net: Rename NETIF_F_ALL_CSUM to NETIF_F_CSUM_MASK
git bisect bad a188222b6ed29404ac2d4232d35d1fe0e77af370
# good: [1343c65f70ee1b1f968a08b30e1836a4e37116cd] fm10k: always check init_hw for errors
git bisect good 1343c65f70ee1b1f968a08b30e1836a4e37116cd
# good: [bc9b145a092aca91a7f6ef40cdb3628b6ada7ec9] Merge branch 'for-4.5-ancestor-test' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
git bisect good bc9b145a092aca91a7f6ef40cdb3628b6ada7ec9
# good: [a4fcad656e1100bdda9b0b752b93a1a276810469] fm10k: whitespace cleanups
git bisect good a4fcad656e1100bdda9b0b752b93a1a276810469
# bad: [7302b9d90117496049dd4bfa28755f7c2ed55b27] ieee802154/adf7242: Driver for ADF7242 MAC IEEE802154
git bisect bad 7302b9d90117496049dd4bfa28755f7c2ed55b27
# bad: [a0c38245153abe1fd844af9b166d1a5d5dafe7b1] Bluetooth: hci_intel: Use shorter timeout for HCI commands
git bisect bad a0c38245153abe1fd844af9b166d1a5d5dafe7b1
# good: [bf943cbf76ecd3b9838a80d5e08777b0f4ccc665] Bluetooth: Move fast connectable code to hci_request.c
git bisect good bf943cbf76ecd3b9838a80d5e08777b0f4ccc665
# bad: [742c59516822f4a4bc23b0961d88c569a7f1bf71] Bluetooth: Simplify setting Configuration Field
git bisect bad 742c59516822f4a4bc23b0961d88c569a7f1bf71
# bad: [02c04afea93fbba7925984df455bc63e7d92da97] Bluetooth: Simplify read_adv_features code
git bisect bad 02c04afea93fbba7925984df455bc63e7d92da97
# bad: [2ff13894cfb877cb3d02d96a8402202f0a6f3efd] Bluetooth: Perform HCI update for power on synchronously
git bisect bad 2ff13894cfb877cb3d02d96a8402202f0a6f3efd
# first bad commit: [2ff13894cfb877cb3d02d96a8402202f0a6f3efd] Bluetooth: Perform HCI update for power on synchronously
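
For anyone who wants to summarise a bisect log like the one above, here is a minimal Python sketch. It assumes each "# good:", "# bad:" and "# first bad commit:" entry sits on a single line (the list archive wraps long subjects, so they may need rejoining first). The helper name summarise_bisect_log is hypothetical and not part of git; the embedded log is only an excerpt of the one above.

```python
import re

# Hypothetical helper (not part of git): summarise a `git bisect log`.
# Assumes every "# good:", "# bad:" and "# first bad commit:" comment
# sits on a single line, i.e. archive line-wrapping has been undone.
ENTRY = re.compile(r"^# (good|bad|first bad commit): \[([0-9a-f]{40})\] (.*)$")

def summarise_bisect_log(text):
    counts = {"good": 0, "bad": 0}
    first_bad = None
    for line in text.splitlines():
        m = ENTRY.match(line)
        if not m:
            continue  # skip the "git bisect ..." command lines
        kind, sha, subject = m.groups()
        if kind == "first bad commit":
            first_bad = (sha, subject)
        else:
            counts[kind] += 1
    return counts, first_bad

# Excerpt of the log above, with wrapped subjects rejoined.
LOG = """\
git bisect start
# bad: [5af9c2e19da6514a1a50b07d97d93b74a7711873] Merge branch 'akpm' (patches from Andrew)
git bisect bad 5af9c2e19da6514a1a50b07d97d93b74a7711873
# good: [afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc] Linux 4.4
git bisect good afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc
# bad: [2ff13894cfb877cb3d02d96a8402202f0a6f3efd] Bluetooth: Perform HCI update for power on synchronously
git bisect bad 2ff13894cfb877cb3d02d96a8402202f0a6f3efd
# first bad commit: [2ff13894cfb877cb3d02d96a8402202f0a6f3efd] Bluetooth: Perform HCI update for power on synchronously
"""

counts, first_bad = summarise_bisect_log(LOG)
print(counts)
print(first_bad[0][:12], first_bad[1])
```

The same single-line form is what "git bisect replay" expects, since the wrapped continuation lines would otherwise be read as commands.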


Just shout if you need any additional diagnostics.


> Chris
> 


4.5 Regression - mouse not working after resume from suspend

2016-02-06 Thread Chris Clayton
There seems to be a regression in resuming my laptop from a suspend to RAM or
disk. The symptom is that my bluetooth mouse doesn't work after the resume. The
kernel is built after a pull of Linus' tree this morning
(v4.5-rc2-340-g5af9c2e).

Attached is the output from dmesg showing the boot, suspend (to RAM) and
resume. You'll see that during the resume, error -517 is being reported for
some devices. Suspend/resume has worked perfectly with 4.[234].x kernels.

I'll start a bisection, but thought I'd give a heads up in case someone can see
the problem before I get done with the bisect.

Chris
[0.00] Linux version 4.5.0-rc2+ (chris@laptop) (gcc version 5.3.1 
20160202 (GCC) ) #318 SMP PREEMPT Sat Feb 6 06:38:55 GMT 2016
[0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-4.5.0-rc2+ ro 
root=/dev/sda2 resume=/dev/sda6 rootfstype=ext4 net.ifnames=0
[0.00] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[0.00] x86/fpu: Supporting XSAVE feature 0x01: 'x87 floating point 
registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x02: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x04: 'AVX registers'
[0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, 
using 'standard' format.
[0.00] x86/fpu: Using 'eager' FPU context switches.
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009d7ff] usable
[0.00] BIOS-e820: [mem 0x0009d800-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000e-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0xd7216fff] usable
[0.00] BIOS-e820: [mem 0xd7217000-0xd721dfff] ACPI NVS
[0.00] BIOS-e820: [mem 0xd721e000-0xd7a0cfff] usable
[0.00] BIOS-e820: [mem 0xd7a0d000-0xd7ca1fff] reserved
[0.00] BIOS-e820: [mem 0xd7ca2000-0xdb4d] usable
[0.00] BIOS-e820: [mem 0xdb4e-0xdb82dfff] reserved
[0.00] BIOS-e820: [mem 0xdb82e000-0xdb88afff] usable
[0.00] BIOS-e820: [mem 0xdb88b000-0xdb9bcfff] ACPI NVS
[0.00] BIOS-e820: [mem 0xdb9bd000-0xdbffefff] reserved
[0.00] BIOS-e820: [mem 0xdbfff000-0xdbff] usable
[0.00] BIOS-e820: [mem 0xdd00-0xdf1f] reserved
[0.00] BIOS-e820: [mem 0xf800-0xfbff] reserved
[0.00] BIOS-e820: [mem 0xfec0-0xfec00fff] reserved
[0.00] BIOS-e820: [mem 0xfed0-0xfed03fff] reserved
[0.00] BIOS-e820: [mem 0xfed1c000-0xfed1] reserved
[0.00] BIOS-e820: [mem 0xfee0-0xfee00fff] reserved
[0.00] BIOS-e820: [mem 0xff00-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00041fdf] usable
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 2.7 present.
[0.00] DMI: Notebook W65_67SZ   
 /W65_67SZ, BIOS 1.03.05 02/26/2014
[0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable
[0.00] e820: last_pfn = 0x41fe00 max_arch_pfn = 0x4
[0.00] MTRR default type: uncachable
[0.00] MTRR fixed ranges enabled:
[0.00]   0-9 write-back
[0.00]   A-B uncachable
[0.00]   C-C write-protect
[0.00]   D-E7FFF uncachable
[0.00]   E8000-F write-protect
[0.00] MTRR variable ranges enabled:
[0.00]   0 base 00 mask 7C write-back
[0.00]   1 base 04 mask 7FE000 write-back
[0.00]   2 base 00E000 mask 7FE000 uncachable
[0.00]   3 base 00DE00 mask 7FFE00 uncachable
[0.00]   4 base 00DD00 mask 7FFF00 uncachable
[0.00]   5 base 041FE0 mask 7FFFE0 uncachable
[0.00]   6 disabled
[0.00]   7 disabled
[0.00]   8 disabled
[0.00]   9 disabled
[0.00] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WC  UC- WT  
[0.00] e820: update [mem 0xdd00-0x] usable ==> reserved
[0.00] e820: last_pfn = 0xdc000 max_arch_pfn = 0x4
[0.00] found SMP MP-table at [mem 0x000fd820-0x000fd82f] mapped at 
[880fd820]
[0.00] Base memory trampoline at [88097000] 97000 size 24576
[0.00] Using GB pages for direct mapping
[0.00] BRK [0x01a05000, 0x01a05fff] PGTABLE
[0.00] BRK [0x01a06000, 0x01a06fff] PGTABLE
[0.00] BRK [0x01a07000, 0x01a07fff] PGTABLE
[0.00] BRK [0x01a08000, 0x01a08fff] PGTABLE
[0.00] BRK [0x01a09000, 0x01a09fff] PGTA

Re: EXT4: new warnings from 4.3.0-rc2

2015-09-21 Thread Chris Clayton


On 09/21/15 15:55, Chris Clayton wrote:
> Thanks Ortwin.
> 
> On 09/21/15 14:27, Ortwin Glück wrote:
>>> [2.481399] EXT4-fs (sda2): couldn't mount as ext3 due to feature 
>>> incompatibilities
>>> [2.482426] EXT4-fs (sda2): couldn't mount as ext2 due to feature 
>>> incompatibilities
>>
>> As the kernel doesn't know which FS your root is, it tries the whole list of
>> filesystems (init/do_mounts.c mount_block_root()). Since the removal of ext3,
>> the ext4 code is now responsible for mounting ext3. Since your FS is ext4 and
>> not ext3, the probe for ext3 fails. That's what the message tells you. You
>> get these even in previous kernels if you say N to ext3 during config.
>>
> No, I do not get the messages from 4.2.0 even though it is configured the 
> same as 4.3.0-rc3 as far as EXT{2,3,4} is
> concerned:
> 
> # CONFIG_EXT2_FS is not set
> # CONFIG_EXT3_FS is not set
> CONFIG_EXT4_FS=y
> CONFIG_EXT4_USE_FOR_EXT2=y
> # CONFIG_EXT4_FS_POSIX_ACL is not set
> # CONFIG_EXT4_FS_SECURITY is not set
> # CONFIG_EXT4_ENCRYPTION is not set
> # CONFIG_EXT4_DEBUG is not set
> [chris:~/kernel/linux]$ cd ../linux-4.2.0/
> [chris:~/kernel/linux-4.2.0]$ grep EXT[234] .config
> # CONFIG_EXT2_FS is not set
> # CONFIG_EXT3_FS is not set
> CONFIG_EXT4_FS=y
> CONFIG_EXT4_USE_FOR_EXT23=y
> # CONFIG_EXT4_FS_POSIX_ACL is not set
> # CONFIG_EXT4_FS_SECURITY is not set
> # CONFIG_EXT4_ENCRYPTION is not set
> # CONFIG_EXT4_DEBUG is not set
> 
> That's why I said they are new messages.
> 
> I've just booted 4.1.7 and I get the messages from that kernel too. I wonder 
> if there's a recent fix that has made it
> into 4.1.7, but not into 4.2.0. I'll apply Greg's 4.2.1-rc1 patch and see 
> what I get with that.
> 

Applying the 4.2.1-rc1 patch results in a kernel that emits the messages, so I 
guess my fix-not-yet-in-4.2 theory is right.

I'll just ignore the messages. Sorry for the noise.

> Chris
> 
> 
>> If it bugs you, you can add a hint to your kernel command line: 
>> rootfstype=ext4
>>
>> Ortwin
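
The probing behaviour Ortwin describes can be modelled roughly as follows. This is a toy Python sketch, not kernel code: the candidate order (ext3, ext2, ext4) is simply read off the boot messages in this thread, and the function names are hypothetical.

```python
# Toy model of the root-mount probing described above; it is NOT kernel code.
# The candidate order (ext3, ext2, ext4) is taken from the boot messages in
# this thread; the function names are hypothetical.

def probe(fs_type, on_disk_type):
    """Pretend mount attempt: succeeds only when the types match."""
    return fs_type == on_disk_type

def mount_root(on_disk_type, candidates, rootfstype=None):
    # With a rootfstype= hint only that type is tried in this model.
    tried = [rootfstype] if rootfstype else list(candidates)
    log = []
    for fs_type in tried:
        if probe(fs_type, on_disk_type):
            log.append(f"mounted as {fs_type}")
            return log
        log.append(f"couldn't mount as {fs_type}")
    log.append("no filesystem could mount root")
    return log

CANDIDATES = ["ext3", "ext2", "ext4"]
without_hint = mount_root("ext4", CANDIDATES)
with_hint = mount_root("ext4", CANDIDATES, rootfstype="ext4")
print(without_hint)
print(with_hint)
```

In this model the two "couldn't mount" lines disappear once the hint is given, which is the effect suggested for rootfstype=ext4.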
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: EXT4: new warnings from 4.3.0-rc2

2015-09-21 Thread Chris Clayton
Thanks Ortwin.

On 09/21/15 14:27, Ortwin Glück wrote:
>> [2.481399] EXT4-fs (sda2): couldn't mount as ext3 due to feature 
>> incompatibilities
>> [2.482426] EXT4-fs (sda2): couldn't mount as ext2 due to feature 
>> incompatibilities
> 
> As the kernel doesn't know which FS your root is, it tries the whole list of
> filesystems (init/do_mounts.c mount_block_root()). Since the removal of ext3,
> the ext4 code is now responsible for mounting ext3. Since your FS is ext4 and
> not ext3, the probe for ext3 fails. That's what the message tells you. You
> get these even in previous kernels if you say N to ext3 during config.
> 
No, I do not get the messages from 4.2.0 even though it is configured the same 
as 4.3.0-rc3 as far as EXT{2,3,4} is
concerned:

# CONFIG_EXT2_FS is not set
# CONFIG_EXT3_FS is not set
CONFIG_EXT4_FS=y
CONFIG_EXT4_USE_FOR_EXT2=y
# CONFIG_EXT4_FS_POSIX_ACL is not set
# CONFIG_EXT4_FS_SECURITY is not set
# CONFIG_EXT4_ENCRYPTION is not set
# CONFIG_EXT4_DEBUG is not set
[chris:~/kernel/linux]$ cd ../linux-4.2.0/
[chris:~/kernel/linux-4.2.0]$ grep EXT[234] .config
# CONFIG_EXT2_FS is not set
# CONFIG_EXT3_FS is not set
CONFIG_EXT4_FS=y
CONFIG_EXT4_USE_FOR_EXT23=y
# CONFIG_EXT4_FS_POSIX_ACL is not set
# CONFIG_EXT4_FS_SECURITY is not set
# CONFIG_EXT4_ENCRYPTION is not set
# CONFIG_EXT4_DEBUG is not set

That's why I said they are new messages.
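
A side note on the two listings above: they are not quite identical, since one has CONFIG_EXT4_USE_FOR_EXT2 and the other CONFIG_EXT4_USE_FOR_EXT23. The Python sketch below shows one way to spot such a difference mechanically. The helper names are hypothetical, and labelling the first listing as the current tree and the second as 4.2.0 is an assumption based on the shell prompts shown.

```python
import re

# Hypothetical helpers: list CONFIG_ symbols that differ between two
# .config fragments. "# CONFIG_X is not set" is treated as value "n".
def parse_config(text):
    symbols = {}
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"^# (CONFIG_\w+) is not set$", line)
        if m:
            symbols[m.group(1)] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            symbols[name] = value
    return symbols

def diff_configs(old, new):
    a, b = parse_config(old), parse_config(new)
    return {k: (a.get(k), b.get(k))
            for k in sorted(a.keys() | b.keys())
            if a.get(k) != b.get(k)}

# Assumed labelling: second listing in the mail is 4.2.0, first is current.
CONFIG_4_2 = """\
# CONFIG_EXT2_FS is not set
# CONFIG_EXT3_FS is not set
CONFIG_EXT4_FS=y
CONFIG_EXT4_USE_FOR_EXT23=y
# CONFIG_EXT4_FS_POSIX_ACL is not set
"""
CONFIG_CURRENT = """\
# CONFIG_EXT2_FS is not set
# CONFIG_EXT3_FS is not set
CONFIG_EXT4_FS=y
CONFIG_EXT4_USE_FOR_EXT2=y
# CONFIG_EXT4_FS_POSIX_ACL is not set
"""

changed = diff_configs(CONFIG_4_2, CONFIG_CURRENT)
for name, (old, new) in changed.items():
    print(name, old, "->", new)
```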

I've just booted 4.1.7 and I get the messages from that kernel too. I wonder if
there's a recent fix that has made it into 4.1.7, but not into 4.2.0. I'll
apply Greg's 4.2.1-rc1 patch and see what I get with that.

Chris


> If it bugs you, you can add a hint to your kernel command line: 
> rootfstype=ext4
> 
> Ortwin


EXT4: new warnings from 4.3.0-rc2

2015-09-21 Thread Chris Clayton
Hi,

I've just built and booted 4.3.0-rc2 and I'm seeing the following new messages 
on the console during boot up:

[2.481399] EXT4-fs (sda2): couldn't mount as ext3 due to feature incompatibilities
[2.482426] EXT4-fs (sda2): couldn't mount as ext2 due to feature incompatibilities

They are immediately followed by:

[2.507948] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)
[3.549523] EXT4-fs (sda2): re-mounted. Opts: (null)

and they are the messages I normally see (from 4.2.0 and earlier).

sda2 is my root partition and is mounted OK, so my system is operating as
before, but I thought you would want a heads up about these (slightly
alarming) new console messages.

The output from dmesg is attached, in case it helps.

Chris
[0.00] Initializing cgroup subsys cpu
[0.00] Linux version 4.3.0-rc2 (chris@laptop) (gcc version 5.2.1 
20150915 (GCC) ) #251 SMP PREEMPT Mon Sep 21 07:50:42 BST 2015
[0.00] Command line: root=/dev/sda2 ro resume=/dev/sda6
[0.00] x86/fpu: xstate_offset[2]: 0240, xstate_sizes[2]: 0100
[0.00] x86/fpu: Supporting XSAVE feature 0x01: 'x87 floating point 
registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x02: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x04: 'AVX registers'
[0.00] x86/fpu: Enabled xstate features 0x7, context size is 0x340 
bytes, using 'standard' format.
[0.00] x86/fpu: Using 'eager' FPU context switches.
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009d7ff] usable
[0.00] BIOS-e820: [mem 0x0009d800-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000e-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0xd7216fff] usable
[0.00] BIOS-e820: [mem 0xd7217000-0xd721dfff] ACPI NVS
[0.00] BIOS-e820: [mem 0xd721e000-0xd7a0cfff] usable
[0.00] BIOS-e820: [mem 0xd7a0d000-0xd7ca1fff] reserved
[0.00] BIOS-e820: [mem 0xd7ca2000-0xdb4d] usable
[0.00] BIOS-e820: [mem 0xdb4e-0xdb82dfff] reserved
[0.00] BIOS-e820: [mem 0xdb82e000-0xdb88afff] usable
[0.00] BIOS-e820: [mem 0xdb88b000-0xdb9bcfff] ACPI NVS
[0.00] BIOS-e820: [mem 0xdb9bd000-0xdbffefff] reserved
[0.00] BIOS-e820: [mem 0xdbfff000-0xdbff] usable
[0.00] BIOS-e820: [mem 0xdd00-0xdf1f] reserved
[0.00] BIOS-e820: [mem 0xf800-0xfbff] reserved
[0.00] BIOS-e820: [mem 0xfec0-0xfec00fff] reserved
[0.00] BIOS-e820: [mem 0xfed0-0xfed03fff] reserved
[0.00] BIOS-e820: [mem 0xfed1c000-0xfed1] reserved
[0.00] BIOS-e820: [mem 0xfee0-0xfee00fff] reserved
[0.00] BIOS-e820: [mem 0xff00-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00041fdf] usable
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 2.7 present.
[0.00] DMI: Notebook W65_67SZ   
 /W65_67SZ, BIOS 1.03.05 02/26/2014
[0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable
[0.00] e820: last_pfn = 0x41fe00 max_arch_pfn = 0x4
[0.00] MTRR default type: uncachable
[0.00] MTRR fixed ranges enabled:
[0.00]   0-9 write-back
[0.00]   A-B uncachable
[0.00]   C-C write-protect
[0.00]   D-E7FFF uncachable
[0.00]   E8000-F write-protect
[0.00] MTRR variable ranges enabled:
[0.00]   0 base 00 mask 7C write-back
[0.00]   1 base 04 mask 7FE000 write-back
[0.00]   2 base 00E000 mask 7FE000 uncachable
[0.00]   3 base 00DE00 mask 7FFE00 uncachable
[0.00]   4 base 00DD00 mask 7FFF00 uncachable
[0.00]   5 base 041FE0 mask 7FFFE0 uncachable
[0.00]   6 disabled
[0.00]   7 disabled
[0.00]   8 disabled
[0.00]   9 disabled
[0.00] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WC  UC- WT  
[0.00] e820: update [mem 0xdd00-0x] usable ==> reserved
[0.00] e820: last_pfn = 0xdc000 max_arch_pfn = 0x4
[0.00] found SMP MP-table at [mem 0x000fd820-0x000fd82f] mapped at 
[880fd820]
[0.00] Base memory trampoline at [88097000] 97000 size 24576
[0.00] Using GB pages for direct mapping
[0.00] init_memory_mapping: [mem 0x-0x000f]
[0.00]  [mem 0x-0x000

Re: [PATCH] iommu: prompt for IOMMU_IO_PGTABLE_LPAE on ARM archs only

2015-02-16 Thread Chris Clayton


On 02/16/15 16:32, Will Deacon wrote:
> Hi Chris,
> 
> On Sun, Feb 15, 2015 at 11:17:19AM +0000, Chris Clayton wrote:
>> When running "make oldconfig" for an x86_64 kernel, I was prompted for a
>> setting for IOMMU_IO_PGTABLE_LPAE. From the prompt and the help text it
>> appears that this config item is relevant to ARMv7/v8 only. This patch
>> prevents the prompt on non-ARM architectures. Compile tested building a
>> cross-compiled x86_64 kernel in an x86 user space. The resultant kernel
>> boots fine and I am running it now.
>>
>> Fixes: e1d3c0fd701df831169b116cd5c5d6203ac07f70
>> Cc: will.dea...@arm.com
>> Signed-off-by: Chris Clayton 
>>
>> --- linux/drivers/iommu/Kconfig.orig 2015-02-15 09:44:01.235927248 +
>> +++ linux/drivers/iommu/Kconfig      2015-02-15 09:44:41.131926434 +
>> @@ -22,6 +22,7 @@ config IOMMU_IO_PGTABLE
>>
>>  config IOMMU_IO_PGTABLE_LPAE
>>         bool "ARMv7/v8 Long Descriptor Format"
>> +       depends on ARM || ARM64
>>         select IOMMU_IO_PGTABLE
>>         help
>>           Enable support for the ARM long descriptor pagetable format.
> 
> What's the problem with this? The page-table code is intentionally
> decoupled from the CPU architecture and having this boot-tested on x86
> found some real bugs that I'm currently fixing. Sure, you probably don't
> need this on your box, but it's not default y and you don't have to
> select it.
> 

There's no real problem except that, as I said, the prompt and the help text
suggest that the config is relevant to the ARM architecture only. When it
popped up on x86_64, it was a surprise.

As you say, I can simply answer "N", but the prompt and the help need
correcting, because for an ordinary Joe User like me, they're misleading.


> Will
> 


[PATCH] iommu: prompt for IOMMU_IO_PGTABLE_LPAE on ARM archs only

2015-02-15 Thread Chris Clayton
When running "make oldconfig" for an x86_64 kernel, I was prompted for a
setting for IOMMU_IO_PGTABLE_LPAE. From the prompt and the help text it
appears that this config item is relevant to ARMv7/v8 only. This patch
prevents the prompt on non-ARM architectures. Compile tested building a
cross-compiled x86_64 kernel in an x86 user space. The resultant kernel
boots fine and I am running it now.

Fixes: e1d3c0fd701df831169b116cd5c5d6203ac07f70
Cc: will.dea...@arm.com
Signed-off-by: Chris Clayton 

--- linux/drivers/iommu/Kconfig.orig 2015-02-15 09:44:01.235927248 +
+++ linux/drivers/iommu/Kconfig      2015-02-15 09:44:41.131926434 +
@@ -22,6 +22,7 @@ config IOMMU_IO_PGTABLE

 config IOMMU_IO_PGTABLE_LPAE
        bool "ARMv7/v8 Long Descriptor Format"
+       depends on ARM || ARM64
        select IOMMU_IO_PGTABLE
        help
          Enable support for the ARM long descriptor pagetable format.
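
To illustrate what the added "depends on" line is meant to achieve, here is a toy Python sketch. It is not how Kconfig is implemented: real expressions also support &&, ! and tristate values, and the function name prompt_visible is hypothetical. It only handles a flat "A || B" list.

```python
# Toy evaluator for the "depends on ARM || ARM64" line in the patch above.
# Real Kconfig expressions are richer (&&, !, tristate values); this sketch
# only handles a flat "A || B" list and is not the kernel's implementation.

def prompt_visible(depends_on, enabled):
    alternatives = [sym.strip() for sym in depends_on.split("||")]
    return any(sym in enabled for sym in alternatives)

DEPENDS = "ARM || ARM64"
visible_on_x86 = prompt_visible(DEPENDS, {"X86", "X86_64"})
visible_on_arm64 = prompt_visible(DEPENDS, {"ARM64"})
print(visible_on_x86, visible_on_arm64)
```

With the dependency in place, "make oldconfig" on x86_64 would no longer ask about the symbol, because neither ARM nor ARM64 is set there.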


Re: BUG in 3.19.0-rc3+

2015-01-11 Thread Chris Clayton
Thanks Konstantin.

[snip]

>>>
>>> Looks like degree (%edx) is 1 on anon-vma destruction.
>>> Probably I've overlooked some weird corner case in vma splitting/merging.
>>>
>>> Could you try this patch? It disables vma merging and eliminates half
>>> of the complicated paths.
>>> As I see it, merging is optional; everything should work fine without it.
>>>
>>> --- a/mm/mmap.c
>>> +++ b/mm/mmap.c
>>> @@ -1048,7 +1048,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
>>>          * We later require that vma->vm_flags == vm_flags,
>>>          * so this tests vma->vm_flags & VM_SPECIAL, too.
>>>          */
>>> -       if (vm_flags & VM_SPECIAL)
>>> +       if (1)
>>>                 return NULL;
>>>
>>>         if (prev)
>>>
>>>
>>>
>>>
>>> Code from your oops.
>>>
>>> Code: 00 ad de 48 89 43 18 e8 c5 f9 00 00 48 8b 45 10 48 8d 55 10 48
>>> 83 e8 10 49 39 d6 74 54 48 8b 7d 08 48 89 eb 8b 57 34 85 d2 74 9e <0f>
>>> 0b 0f 1f 40 00 e8 6b fc ff ff eb 9a 66 0f 1f 84 00 00 00 00
>>> All code
>>> ========
>>>    0: 00 ad de 48 89 43       add    %ch,0x438948de(%rbp)
>>>    6: 18 e8                   sbb    %ch,%al
>>>    8: c5 f9 00                (bad)
>>>    b: 00 48 8b                add    %cl,-0x75(%rax)
>>>    e: 45 10 48 8d             adc    %r9b,-0x73(%r8)
>>>   12: 55                      push   %rbp
>>>   13: 10 48 83                adc    %cl,-0x7d(%rax)
>>>   16: e8 10 49 39 d6          callq  0xd639492b
>>>   1b: 74 54                   je     0x71
>>>   1d: 48 8b 7d 08             mov    0x8(%rbp),%rdi
>>>   21: 48 89 eb                mov    %rbp,%rbx
>>>   24: 8b 57 34                mov    0x34(%rdi),%edx
>>>   27: 85 d2                   test   %edx,%edx
>>>   29: 74 9e                   je     0xffc9
>>>   2b:* 0f 0b                  ud2    <-- trapping instruction
>>>   2d: 0f 1f 40 00             nopl   0x0(%rax)
>>>   31: e8 6b fc ff ff          callq  0xfca1
>>>   36: eb 9a                   jmp    0xffd2
>>>   38: 66                      data16
>>>   39: 0f                      .byte 0xf
>>>   3a: 1f                      (bad)
>>>   3b: 84 00                   test   %al,(%rax)
>>>   3d: 00 00                   add    %al,(%rax)
>>> ...
>>>
>>> Code starting with the faulting instruction
>>> ===========================================
>>>    0: 0f 0b                   ud2
>>>    2: 0f 1f 40 00             nopl   0x0(%rax)
>>>    6: e8 6b fc ff ff          callq  0xfc76
>>>    b: eb 9a                   jmp    0xffa7
>>>    d: 66                      data16
>>>    e: 0f                      .byte 0xf
>>>    f: 1f                      (bad)
>>>   10: 84 00                   test   %al,(%rax)
>>>   12: 00 00                   add    %al,(%rax)
>>>
>>>
>>> +Added Oded Gabbay  into cc, he's reported this
>>> problem too.
>>>
>> Thanks for the fast reply.
>>
>> I applied the patch and tested it. I wasn't able to reproduce *my* problem,
>> so you are definitely in the right direction :)
> 
> Ok. I've found something. Try patch from attachment.
> 

Your patch has fixed the BUG for me. Thank you.

Tested-by: Chris Clayton 
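
For reference, the offset of the trapping instruction in the decode quoted above can be recovered from the oops "Code:" line alone: the byte in angle brackets is where the CPU faulted. The following is a minimal Python sketch of that lookup. The function name trapping_offset is hypothetical, and this is not the kernel's scripts/decodecode, only an illustration of the marker convention.

```python
# Locate the faulting byte in an oops "Code:" line. The kernel marks it with
# angle brackets; everything before it is code leading up to the fault.
# Hypothetical helper, not the kernel's scripts/decodecode.

def trapping_offset(code_line):
    tokens = code_line.split()
    for index, token in enumerate(tokens):
        if token.startswith("<") and token.endswith(">"):
            return index, token.strip("<>")
    raise ValueError("no marked byte in Code: line")

# Byte string copied from the oops in this thread.
CODE = ("00 ad de 48 89 43 18 e8 c5 f9 00 00 48 8b 45 10 48 8d 55 10 48 "
        "83 e8 10 49 39 d6 74 54 48 8b 7d 08 48 89 eb 8b 57 34 85 d2 74 9e <0f> "
        "0b 0f 1f 40 00 e8 6b fc ff ff eb 9a 66 0f 1f 84 00 00 00 00")

offset, opcode = trapping_offset(CODE)
print(hex(offset), opcode)
```

The offset agrees with the "2b:*" line in the decode, and the marked byte followed by 0b is the ud2 instruction that a BUG check compiles to on x86.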

>>
>> Oded
>>
>>>>
>>>> In case it helps, I've attached the xz-compressed related config file.
>>>>
>>>> Chris
>>>>
>>>>>
>>>>> I've attached the full kernel log file for that boot.
>>>>>
>>>>> Chris
>>>>>
>>>


Re: BUG in 3.19.0-rc3+

2015-01-11 Thread Chris Clayton


On 01/11/15 09:52, Oded Gabbay wrote:
> 
> 
> On 01/11/2015 11:37 AM, Konstantin Khlebnikov wrote:
>> On Sun, Jan 11, 2015 at 11:16 AM, Chris Clayton
>>  wrote:
>>> Hi,
>>>
>>> I've done the bisect and the outcome is below, but, because I almost always 
>>> forget to mention it, I'll say here that I
>>> am running a 32 bit user space on a 64 bit kernel.
>>>
>>> On 01/10/15 20:17, Chris Clayton wrote:
>>>> Hi,
>>>>
>>>> I'm getting a BUG report from a kernel built from a pull (earlier today)
>>>> of the current development kernel (running git describe gives
>>>> v3.19-rc3-169-geb74926). So that I have useable wireless networking, I have
>>>> also applied the latest seven iwlwifi patches from the wireless-drivers git
>>>> tree. Prior to today's pull, I was not seeing anything unusual in dmesg.
>>>>
>>>> The BUG reported is as follows:
>>>>
>>>> Jan 10 19:41:32 laptop kernel: [ cut here ]
>>>> Jan 10 19:41:32 laptop kernel: kernel BUG at mm/rmap.c:399!
>>>> Jan 10 19:41:32 laptop kernel: invalid opcode:  [#1] PREEMPT SMP
>>>> Jan 10 19:41:32 laptop kernel: Modules linked in: rfcomm snd_hda_codec_via 
>>>> iwlmvm coretemp snd_hda_codec_hdmi
>>>> snd_hda_codec_generic snd_hda_intel mac80211 hwmon snd_hda_controller 
>>>> x86_pkg_temp_thermal acpi_cpufreq iwlwifi cfg80211
>>>> snd_hda_codec snd_hwdep
>>>> Jan 10 19:41:32 laptop kernel: CPU: 1 PID: 353 Comm: fc-cache Not tainted 
>>>> 3.19.0-rc3+ #42
>>>> Jan 10 19:41:32 laptop kernel: Hardware name: Notebook 
>>>> W65_67SZ/W65_67SZ
>>>>, BIOS 1.03.05 02/26/2014
>>>> Jan 10 19:41:32 laptop kernel: task: 8800da98c5c0 ti: 880408dd4000 
>>>> task.ti: 880408dd4000
>>>> Jan 10 19:41:32 laptop kernel: RIP: 0010:[]  
>>>> [] unlink_anon_vmas+0x17a/0x200
>>>> Jan 10 19:41:33 laptop kernel: RSP: 0018:880408dd7d88  EFLAGS: 00010286
>>>> Jan 10 19:41:33 laptop kernel: RAX: 88040b79e150 RBX: 88040b79e140 
>>>> RCX: 
>>>> Jan 10 19:41:33 laptop kernel: RDX: 0001 RSI: 880409f04360 
>>>> RDI: 880409f04320
>>>> Jan 10 19:41:33 laptop kernel: RBP: 88040cb13278 R08:  
>>>> R09: 88040d801c00
>>>> Jan 10 19:41:33 laptop kernel: R10: 88041fa546e0 R11: 88040b79e160 
>>>> R12: 880409f04320
>>>> Jan 10 19:41:33 laptop kernel: R13: 88040cb13278 R14: 88040cb13288 
>>>> R15: 88040cb13210
>>>> Jan 10 19:41:33 laptop kernel: FS:  () 
>>>> GS:88041fa4() knlGS:
>>>> Jan 10 19:41:33 laptop kernel: CS:  0010 DS: 002b ES: 002b CR0: 
>>>> 80050033
>>>> Jan 10 19:41:33 laptop kernel: CR2: f722c8d4 CR3: 0004082a8000 
>>>> CR4: 001407e0
>>>> Jan 10 19:41:33 laptop kernel: Stack:
>>>> Jan 10 19:41:33 laptop kernel:  88040d6cfbd8 88040d6cfba0 
>>>> 88040cecd160 88040cb13210
>>>> Jan 10 19:41:33 laptop kernel:  88040cbbb630 f7151000 
>>>> 880408dd7e28 
>>>> Jan 10 19:41:33 laptop kernel:   810e3633 
>>>>  
>>>> Jan 10 19:41:33 laptop kernel: Call Trace:
>>>> Jan 10 19:41:33 laptop kernel:  [] ? 
>>>> free_pgtables+0x83/0xf0
>>>> Jan 10 19:41:34 laptop kernel:  [] ? exit_mmap+0xc3/0x150
>>>> Jan 10 19:41:34 laptop kernel:  [] ? 
>>>> __do_page_fault+0x17d/0x4b0
>>>> Jan 10 19:41:34 laptop kernel:  [] ? mmput+0x21/0xc0
>>>> Jan 10 19:41:34 laptop kernel:  [] ? do_exit+0x26d/0xa50
>>>> Jan 10 19:41:34 laptop kernel:  [] ? 
>>>> mntput_no_expire+0x9/0x140
>>>> Jan 10 19:41:34 laptop kernel:  [] ? 
>>>> task_work_run+0xbc/0xf0
>>>> Jan 10 19:41:34 laptop kernel:  [] ? 
>>>> do_group_exit+0x34/0xb0
>>>> Jan 10 19:41:34 laptop kernel:  [] ? 
>>>> SyS_exit_group+0xf/0x10
>>>> Jan 10 19:41:34 laptop kernel:  [] ? 
>>>> sysenter_dispatch+0x7/0x1e
>>>> Jan 10 19:41:34 laptop kernel: Code: 00 ad de 48 89 43 18 e8 c5 f9 00 00 
>>>> 48 8b 45 10 48 8d 55 

Re: BUG in 3.19.0-rc3+

2015-01-11 Thread Chris Clayton
Hi,

I've done the bisect and the outcome is below, but, because I almost always
forget to mention it, I'll say here that I am running a 32 bit user space on a
64 bit kernel.

On 01/10/15 20:17, Chris Clayton wrote:
> Hi,
> 
> I'm getting a BUG report from a kernel built from a pull (earlier today)
> of the current development kernel (running git describe gives
> v3.19-rc3-169-geb74926). So that I have useable wireless networking, I have
> also applied the latest seven iwlwifi patches from the wireless-drivers git
> tree. Prior to today's pull, I was not seeing anything unusual in dmesg.
> 
> The BUG reported is as follows:
> 
> Jan 10 19:41:32 laptop kernel: [ cut here ]
> Jan 10 19:41:32 laptop kernel: kernel BUG at mm/rmap.c:399!
> Jan 10 19:41:32 laptop kernel: invalid opcode:  [#1] PREEMPT SMP
> Jan 10 19:41:32 laptop kernel: Modules linked in: rfcomm snd_hda_codec_via 
> iwlmvm coretemp snd_hda_codec_hdmi
> snd_hda_codec_generic snd_hda_intel mac80211 hwmon snd_hda_controller 
> x86_pkg_temp_thermal acpi_cpufreq iwlwifi cfg80211
> snd_hda_codec snd_hwdep
> Jan 10 19:41:32 laptop kernel: CPU: 1 PID: 353 Comm: fc-cache Not tainted 
> 3.19.0-rc3+ #42
> Jan 10 19:41:32 laptop kernel: Hardware name: Notebook
>  W65_67SZ/W65_67SZ
>, BIOS 1.03.05 02/26/2014
> Jan 10 19:41:32 laptop kernel: task: 8800da98c5c0 ti: 880408dd4000 
> task.ti: 880408dd4000
> Jan 10 19:41:32 laptop kernel: RIP: 0010:[]  
> [] unlink_anon_vmas+0x17a/0x200
> Jan 10 19:41:33 laptop kernel: RSP: 0018:880408dd7d88  EFLAGS: 00010286
> Jan 10 19:41:33 laptop kernel: RAX: 88040b79e150 RBX: 88040b79e140 
> RCX: 
> Jan 10 19:41:33 laptop kernel: RDX: 0001 RSI: 880409f04360 
> RDI: 880409f04320
> Jan 10 19:41:33 laptop kernel: RBP: 88040cb13278 R08:  
> R09: 88040d801c00
> Jan 10 19:41:33 laptop kernel: R10: 88041fa546e0 R11: 88040b79e160 
> R12: 880409f04320
> Jan 10 19:41:33 laptop kernel: R13: 88040cb13278 R14: 88040cb13288 
> R15: 88040cb13210
> Jan 10 19:41:33 laptop kernel: FS:  () 
> GS:88041fa4() knlGS:
> Jan 10 19:41:33 laptop kernel: CS:  0010 DS: 002b ES: 002b CR0: 
> 80050033
> Jan 10 19:41:33 laptop kernel: CR2: f722c8d4 CR3: 0004082a8000 
> CR4: 001407e0
> Jan 10 19:41:33 laptop kernel: Stack:
> Jan 10 19:41:33 laptop kernel:  88040d6cfbd8 88040d6cfba0 
> 88040cecd160 88040cb13210
> Jan 10 19:41:33 laptop kernel:  88040cbbb630 f7151000 
> 880408dd7e28 
> Jan 10 19:41:33 laptop kernel:   810e3633 
>  
> Jan 10 19:41:33 laptop kernel: Call Trace:
> Jan 10 19:41:33 laptop kernel:  [] ? free_pgtables+0x83/0xf0
> Jan 10 19:41:34 laptop kernel:  [] ? exit_mmap+0xc3/0x150
> Jan 10 19:41:34 laptop kernel:  [] ? 
> __do_page_fault+0x17d/0x4b0
> Jan 10 19:41:34 laptop kernel:  [] ? mmput+0x21/0xc0
> Jan 10 19:41:34 laptop kernel:  [] ? do_exit+0x26d/0xa50
> Jan 10 19:41:34 laptop kernel:  [] ? 
> mntput_no_expire+0x9/0x140
> Jan 10 19:41:34 laptop kernel:  [] ? task_work_run+0xbc/0xf0
> Jan 10 19:41:34 laptop kernel:  [] ? do_group_exit+0x34/0xb0
> Jan 10 19:41:34 laptop kernel:  [] ? SyS_exit_group+0xf/0x10
> Jan 10 19:41:34 laptop kernel:  [] ? 
> sysenter_dispatch+0x7/0x1e
> Jan 10 19:41:34 laptop kernel: Code: 00 ad de 48 89 43 18 e8 c5 f9 00 00 48 
> 8b 45 10 48 8d 55 10 48 83 e8 10 49 39 d6 74
> 54 48 8b 7d 08 48 89 eb 8b 57 34 85 d2 74 9e <0f> 0b 0f 1f 40 00 e8 6b fc ff 
> ff eb 9a 66 0f 1f 84 00 00 00 00
> Jan 10 19:41:34 laptop kernel: RIP  [] 
> unlink_anon_vmas+0x17a/0x200
> Jan 10 19:41:34 laptop kernel:  RSP 
> Jan 10 19:41:34 laptop kernel: ---[ end trace 4aa713b2a9aa664b ]---
> Jan 10 19:41:34 laptop kernel: Fixing recursive fault but reboot is needed!
> Jan 10 19:41:34 laptop kernel: nf_conntrack version 0.5.0 (16384 buckets, 
> 65536 max)

[snip]

> 
> I won't get time tonight, but I can bisect it tomorrow, so this is just a 
> heads up in case the problem (and fix) jumps
> out at anyone.  Before I bisect I'll build and run a kernel without the 
> iwlwifi patches.

The bisect ended up at:

7a3ef208e662f4b63d43a23f61a64a129c525bbc is the first bad commit
commit 7a3ef208e662f4b63d43a23f61a64a129c525bbc
Author: Konstantin Khlebnikov 
Date:   Thu Jan 8 14:32:15 2015 -0800

mm: prevent endless growth of anon_vma hierarchy

Constantly forking task causes unlimited grow of anon_vma chain.  Each
next child allocates new level of anon_

BUG in 3.19.0-rc3+

2015-01-10 Thread Chris Clayton
Hi,

I'm getting a BUG report from a kernel built from a pull (earlier today)
of the current development kernel (running git describe gives
v3.19-rc3-169-geb74926). So that I have useable wireless networking, I have
also applied the latest seven iwlwifi patches from the wireless-drivers git
tree. Prior to today's pull, I was not seeing anything unusual in dmesg.
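
As an aside on the git describe string quoted above: it encodes the nearest tag, the number of commits on top of it, and an abbreviated commit hash prefixed with "g". A small Python sketch of splitting it apart follows; the function name parse_describe is hypothetical, and the sketch assumes the plain tag-count-ghash form (no "-dirty" suffix).

```python
import re

# Split `git describe` output of the form <tag>-<count>-g<abbrev-sha>.
# Hypothetical helper; assumes no "-dirty" suffix or other decorations.
DESCRIBE = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<sha>[0-9a-f]+)$")

def parse_describe(text):
    m = DESCRIBE.match(text)
    if m is None:
        raise ValueError(f"not in tag-count-gsha form: {text!r}")
    return m.group("tag"), int(m.group("count")), m.group("sha")

parsed = parse_describe("v3.19-rc3-169-geb74926")
print(parsed)
```

So the kernel in this report is 169 commits past the v3.19-rc3 tag, at the commit whose hash begins eb74926.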

The BUG reported is as follows:

Jan 10 19:41:32 laptop kernel: [ cut here ]
Jan 10 19:41:32 laptop kernel: kernel BUG at mm/rmap.c:399!
Jan 10 19:41:32 laptop kernel: invalid opcode:  [#1] PREEMPT SMP
Jan 10 19:41:32 laptop kernel: Modules linked in: rfcomm snd_hda_codec_via 
iwlmvm coretemp snd_hda_codec_hdmi
snd_hda_codec_generic snd_hda_intel mac80211 hwmon snd_hda_controller 
x86_pkg_temp_thermal acpi_cpufreq iwlwifi cfg80211
snd_hda_codec snd_hwdep
Jan 10 19:41:32 laptop kernel: CPU: 1 PID: 353 Comm: fc-cache Not tainted 
3.19.0-rc3+ #42
Jan 10 19:41:32 laptop kernel: Hardware name: Notebook 
W65_67SZ/W65_67SZ
   , BIOS 1.03.05 02/26/2014
Jan 10 19:41:32 laptop kernel: task: 8800da98c5c0 ti: 880408dd4000 
task.ti: 880408dd4000
Jan 10 19:41:32 laptop kernel: RIP: 0010:[]  
[] unlink_anon_vmas+0x17a/0x200
Jan 10 19:41:33 laptop kernel: RSP: 0018:880408dd7d88  EFLAGS: 00010286
Jan 10 19:41:33 laptop kernel: RAX: 88040b79e150 RBX: 88040b79e140 RCX: 

Jan 10 19:41:33 laptop kernel: RDX: 0001 RSI: 880409f04360 RDI: 
880409f04320
Jan 10 19:41:33 laptop kernel: RBP: 88040cb13278 R08:  R09: 
88040d801c00
Jan 10 19:41:33 laptop kernel: R10: 88041fa546e0 R11: 88040b79e160 R12: 
880409f04320
Jan 10 19:41:33 laptop kernel: R13: 88040cb13278 R14: 88040cb13288 R15: 
88040cb13210
Jan 10 19:41:33 laptop kernel: FS:  () 
GS:88041fa4() knlGS:
Jan 10 19:41:33 laptop kernel: CS:  0010 DS: 002b ES: 002b CR0: 80050033
Jan 10 19:41:33 laptop kernel: CR2: f722c8d4 CR3: 0004082a8000 CR4: 
001407e0
Jan 10 19:41:33 laptop kernel: Stack:
Jan 10 19:41:33 laptop kernel:  88040d6cfbd8 88040d6cfba0 
88040cecd160 88040cb13210
Jan 10 19:41:33 laptop kernel:  88040cbbb630 f7151000 
880408dd7e28 
Jan 10 19:41:33 laptop kernel:   810e3633 
 
Jan 10 19:41:33 laptop kernel: Call Trace:
Jan 10 19:41:33 laptop kernel:  [] ? free_pgtables+0x83/0xf0
Jan 10 19:41:34 laptop kernel:  [] ? exit_mmap+0xc3/0x150
Jan 10 19:41:34 laptop kernel:  [] ? 
__do_page_fault+0x17d/0x4b0
Jan 10 19:41:34 laptop kernel:  [] ? mmput+0x21/0xc0
Jan 10 19:41:34 laptop kernel:  [] ? do_exit+0x26d/0xa50
Jan 10 19:41:34 laptop kernel:  [] ? 
mntput_no_expire+0x9/0x140
Jan 10 19:41:34 laptop kernel:  [] ? task_work_run+0xbc/0xf0
Jan 10 19:41:34 laptop kernel:  [] ? do_group_exit+0x34/0xb0
Jan 10 19:41:34 laptop kernel:  [] ? SyS_exit_group+0xf/0x10
Jan 10 19:41:34 laptop kernel:  [] ? 
sysenter_dispatch+0x7/0x1e
Jan 10 19:41:34 laptop kernel: Code: 00 ad de 48 89 43 18 e8 c5 f9 00 00 48 8b 
45 10 48 8d 55 10 48 83 e8 10 49 39 d6 74
54 48 8b 7d 08 48 89 eb 8b 57 34 85 d2 74 9e <0f> 0b 0f 1f 40 00 e8 6b fc ff ff 
eb 9a 66 0f 1f 84 00 00 00 00
Jan 10 19:41:34 laptop kernel: RIP  [] 
unlink_anon_vmas+0x17a/0x200
Jan 10 19:41:34 laptop kernel:  RSP 
Jan 10 19:41:34 laptop kernel: ---[ end trace 4aa713b2a9aa664b ]---
Jan 10 19:41:34 laptop kernel: Fixing recursive fault but reboot is needed!
Jan 10 19:41:34 laptop kernel: nf_conntrack version 0.5.0 (16384 buckets, 65536 
max)
Jan 10 19:41:34 laptop kernel: [ cut here ]
Jan 10 19:41:35 laptop kernel: kernel BUG at mm/rmap.c:399!
Jan 10 19:41:35 laptop kernel: invalid opcode:  [#2] PREEMPT SMP
Jan 10 19:41:35 laptop kernel: Modules linked in: iptable_filter xt_conntrack 
ipt_MASQUERADE nf_nat_masquerade_ipv4
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack 
rfcomm snd_hda_codec_via iwlmvm coretemp
snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel mac80211 hwmon 
snd_hda_controller x86_pkg_temp_thermal
acpi_cpufreq iwlwifi cfg80211 snd_hda_codec snd_hwdep
Jan 10 19:41:35 laptop kernel: CPU: 0 PID: 678 Comm: krootimage Tainted: G  
D3.19.0-rc3+ #42
Jan 10 19:41:35 laptop kernel: Hardware name: Notebook 
W65_67SZ/W65_67SZ
   , BIOS 1.03.05 02/26/2014
Jan 10 19:41:35 laptop kernel: task: 880408de26c0 ti: 880409fcc000 
task.ti: 880409fcc000
Jan 10 19:41:35 laptop kernel: RIP: 0010:[]  
[] unlink_anon_vmas+0x17a/0x200
Jan 10 19:41:35 laptop kernel: RSP: 0018:880409fcfd88  EFLAGS: 00010286
Jan 10 19:41:35 laptop kernel: RAX: 880408370d90 RBX: 880408370d80 RCX: 

Jan 10 19:41:35 laptop kernel: RDX: 000

Re: [tip:x86/urgent] x86: Use $(OBJDUMP) instead of plain objdump

2014-12-01 Thread Chris Clayton
Hi,

Is it planned to get this fix through in time for release of 3.18? It appears 
to have been in -next for a week.

Chris

On 11/23/14 20:24, tip-bot for Chris Clayton wrote:
> Commit-ID:  e2e68ae688b0a3766cd75aedf4ed4e39be402009
> Gitweb: http://git.kernel.org/tip/e2e68ae688b0a3766cd75aedf4ed4e39be402009
> Author:     Chris Clayton 
> AuthorDate: Sat, 22 Nov 2014 09:51:10 +
> Committer:  Thomas Gleixner 
> CommitDate: Sun, 23 Nov 2014 21:21:53 +0100
> 
> x86: Use $(OBJDUMP) instead of plain objdump
> 
> commit e6023367d779 'x86, kaslr: Prevent .bss from overlaping initrd'
> broke the cross compile of x86. It added a objdump invocation, which
> invokes the host native objdump and ignores an active cross tool
> chain.
> 
> Use $(OBJDUMP) instead which takes the CROSS_COMPILE prefix into
> account.
> 
> [ tglx: Massage changelog and use $(OBJDUMP) ]
> 
> Fixes: e6023367d779 'x86, kaslr: Prevent .bss from overlaping initrd'
> Signed-off-by: Chris Clayton 
> Acked-by: Kees Cook 
> Acked-by: Borislav Petkov 
> Cc: Junjie Mao 
> Cc: Ingo Molnar 
> Cc: H. Peter Anvin 
> Cc: sta...@vger.kernel.org
> Link: http://lkml.kernel.org/r/54705c8e.1080...@googlemail.com
> Signed-off-by: Thomas Gleixner 
> ---
>  arch/x86/boot/compressed/Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index be1e07d..45abc36 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -76,7 +76,7 @@ suffix-$(CONFIG_KERNEL_XZ)  := xz
>  suffix-$(CONFIG_KERNEL_LZO)  := lzo
>  suffix-$(CONFIG_KERNEL_LZ4)  := lz4
>  
> -RUN_SIZE = $(shell objdump -h vmlinux | \
> +RUN_SIZE = $(shell $(OBJDUMP) -h vmlinux | \
>perl $(srctree)/arch/x86/tools/calc_run_size.pl)
>  quiet_cmd_mkpiggy = MKPIGGY $@
>    cmd_mkpiggy = $(obj)/mkpiggy $< $(RUN_SIZE) > $@ || ( rm -f $@ ; false )
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[tip:x86/urgent] x86: Use $(OBJDUMP) instead of plain objdump

2014-11-24 Thread tip-bot for Chris Clayton
Commit-ID:  e2e68ae688b0a3766cd75aedf4ed4e39be402009
Gitweb: http://git.kernel.org/tip/e2e68ae688b0a3766cd75aedf4ed4e39be402009
Author: Chris Clayton 
AuthorDate: Sat, 22 Nov 2014 09:51:10 +
Committer:  Thomas Gleixner 
CommitDate: Sun, 23 Nov 2014 21:21:53 +0100

x86: Use $(OBJDUMP) instead of plain objdump

commit e6023367d779 'x86, kaslr: Prevent .bss from overlaping initrd'
broke the cross compile of x86. It added a objdump invocation, which
invokes the host native objdump and ignores an active cross tool
chain.

Use $(OBJDUMP) instead which takes the CROSS_COMPILE prefix into
account.

[ tglx: Massage changelog and use $(OBJDUMP) ]

Fixes: e6023367d779 'x86, kaslr: Prevent .bss from overlaping initrd'
Signed-off-by: Chris Clayton 
Acked-by: Kees Cook 
Acked-by: Borislav Petkov 
Cc: Junjie Mao 
Cc: Ingo Molnar 
Cc: H. Peter Anvin 
Cc: sta...@vger.kernel.org
Link: http://lkml.kernel.org/r/54705c8e.1080...@googlemail.com
Signed-off-by: Thomas Gleixner 
---
 arch/x86/boot/compressed/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index be1e07d..45abc36 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -76,7 +76,7 @@ suffix-$(CONFIG_KERNEL_XZ):= xz
 suffix-$(CONFIG_KERNEL_LZO):= lzo
 suffix-$(CONFIG_KERNEL_LZ4):= lz4
 
-RUN_SIZE = $(shell objdump -h vmlinux | \
+RUN_SIZE = $(shell $(OBJDUMP) -h vmlinux | \
 perl $(srctree)/arch/x86/tools/calc_run_size.pl)
 quiet_cmd_mkpiggy = MKPIGGY $@
   cmd_mkpiggy = $(obj)/mkpiggy $< $(RUN_SIZE) > $@ || ( rm -f $@ ; false )


[PATCH]: cross-compiling x86_64 kernel on i386 user-space fails

2014-11-22 Thread Chris Clayton
Hi,

Commit e6023367d779060fddc9a52d1f474085b2b36298 broke building an x86_64 kernel 
on an i386 user-space. The change added a call to
objdump but neglected to cater for cross-compiling.

The patch below fixes the problem for me. I see the commit is now in 3.14 and 
3.17 -stable, so the patch needs to go
there too.
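
To illustrate the failure mode, here is a minimal Python sketch of how the
tool name is composed. It assumes the usual kbuild convention that OBJDUMP
is defined as $(CROSS_COMPILE)objdump; the helper name objdump_command and
the example prefix are made up for illustration and are not taken from my
build setup.

```python
def objdump_command(cross_compile=""):
    """Model of the kbuild convention OBJDUMP = $(CROSS_COMPILE)objdump."""
    return cross_compile + "objdump"


# Native build: no prefix, so a bare "objdump" happens to be the right tool.
print(objdump_command())                     # objdump

# Cross build (hypothetical prefix): a hard-coded "objdump" would ignore the
# prefix and run the host tool against a foreign vmlinux.
print(objdump_command("x86_64-linux-gnu-"))  # x86_64-linux-gnu-objdump
```

The point of the sketch is only that the prefix has to be applied at the
call site; a literal "objdump" in the Makefile bypasses it.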

CC: Junjie Mao 
CC: Kees Cook 
CC: Thomas Gleixner 
CC: Ingo Molnar 
Signed-off-by: Chris Clayton 
---
--- linux/arch/x86/boot/compressed/Makefile~ 2014-11-22 08:56:50.359706324 +
+++ linux/arch/x86/boot/compressed/Makefile 2014-11-22 09:04:06.615693435 +
@@ -76,7 +76,7 @@ suffix-$(CONFIG_KERNEL_XZ):= xz
 suffix-$(CONFIG_KERNEL_LZO):= lzo
 suffix-$(CONFIG_KERNEL_LZ4):= lz4

-RUN_SIZE = $(shell objdump -h vmlinux | \
+RUN_SIZE = $(shell ${CROSS_COMPILE}objdump -h vmlinux | \
 perl $(srctree)/arch/x86/tools/calc_run_size.pl)
 quiet_cmd_mkpiggy = MKPIGGY $@
   cmd_mkpiggy = $(obj)/mkpiggy $< $(RUN_SIZE) > $@ || ( rm -f $@ ; false )



Re: Networking problem with 3.11-rc6+

2013-09-03 Thread Chris Clayton



On 08/20/13 22:54, Francois Romieu wrote:

Chris Clayton  :
[...]

[0.207094] acpi PNP0A08:00: ACPI _OSC support notification failed, 
disabling PCIe ASPM
[0.207155] acpi PNP0A08:00: Unable to request _OSC control (_OSC support 
mask: 0x08)

[...]

[5.311191] r8169 :07:00.0: can't disable ASPM; OS doesn't have ASPM 
control

[...]

[   65.181715] [] rtl8169_interrupt [r8169]
[   65.181717] Disabling IRQ #17

Let me know if I can provide any additional information, although to
be honest, the boot completed and the KDE desktop started up OK, so
there may not be much else I can provide unless I find that the
problem is repeatable.


Please don't hesitate if it happens again, even if it can't be reproduced
on demand.



I've booted the kernel numerous times since the incident reported above, 
but haven't encountered the problem again. If it does happen again, are 
there any files from /sys or /proc that might help diagnostics?


Chris


Networking problem with 3.11-rc6+

2013-08-20 Thread Chris Clayton

Hello,

I've just booted my laptop and found that networking was broken. Pings 
of other devices on my home network failed. A reboot has restored 
networking, but I thought I should report the problem anyway. I'll have 
no time tomorrow, but on Thursday I'll do a few boots to ascertain how 
repeatable the problem is.


Attached is a complete dmesg, but perhaps the most relevant part is:

[   65.181557] irq 17: nobody cared (try booting with the "irqpoll" option)
[   65.181566] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.11.0-rc6+ #124
[   65.181568] Hardware name: FUJITSU LIFEBOOK AH531/FJNBB0F, BIOS 1.30 
05/28/2012
[   65.181571]  0011 c141bb51 f3154840 c108d581 c14e071c 0011 
f3003950 6650
[   65.181581]  f3154840 0011 c108d8f0 1e35395f 0011 c135453b 
 
[   65.181588]   0011  c108bb0b 0e0e192a 75c91d86 
a880c0ad de2115ee

[   65.181597] Call Trace:
[   65.181608]  [] ? dump_stack+0x48/0x76
[   65.181614]  [] ? __report_bad_irq+0x21/0xc0
[   65.181619]  [] ? note_interrupt+0xf0/0x1a0
[   65.181624]  [] ? cpuidle_enter_state+0x3b/0xc0
[   65.181632]  [] ? handle_irq_event_percpu+0x9b/0x120
[   65.181641]  [] ? __io_apic_modify_irq+0x39/0x40
[   65.181646]  [] ? handle_irq_event+0x29/0x40
[   65.181650]  [] ? unmask_irq+0x20/0x20
[   65.181654]  [] ? handle_fasteoi_irq+0x46/0xd0
[   65.181656][] ? do_IRQ+0x31/0xa0
[   65.181667]  [] ? common_interrupt+0x2c/0x31
[   65.181673]  [] ? kmsg_dump_rewind_nolock+0x2b/0x40
[   65.181678]  [] ? cpuidle_enter_state+0x3b/0xc0
[   65.181682]  [] ? cpuidle_idle_call+0x7b/0x110
[   65.181688]  [] ? arch_cpu_idle+0x5/0x20
[   65.181692]  [] ? cpu_startup_entry+0xae/0xf0
[   65.181697]  [] ? start_kernel+0x2a8/0x2ad
[   65.181702]  [] ? repair_env_string+0x4d/0x4d
[   65.181704] handlers:
[   65.181715] [] rtl8169_interrupt [r8169]
[   65.181717] Disabling IRQ #17


Let me know if I can provide any additional information, although to be 
honest, the boot completed and the KDE desktop started up OK, so there 
may not be much else I can provide unless I find that the problem is 
repeatable.


Chris
[0.00] Initializing cgroup subsys cpu
[0.00] Linux version 3.11.0-rc6+ (chris@laptop) (gcc version 4.8.2 
20130815 (prerelease) (GCC) ) #124 SMP PREEMPT Tue Aug 20 10:54:09 BST 2013
[0.00] Disabled fast string operations
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009d7ff] usable
[0.00] BIOS-e820: [mem 0x0009d800-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000e-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0xdab0efff] usable
[0.00] BIOS-e820: [mem 0xdab0f000-0xdad4efff] reserved
[0.00] BIOS-e820: [mem 0xdad4f000-0xdad6efff] ACPI NVS
[0.00] BIOS-e820: [mem 0xdad6f000-0xdaf1efff] reserved
[0.00] BIOS-e820: [mem 0xdaf1f000-0xdaf9efff] ACPI NVS
[0.00] BIOS-e820: [mem 0xdaf9f000-0xdaffefff] ACPI data
[0.00] BIOS-e820: [mem 0xdafff000-0xdaff] usable
[0.00] BIOS-e820: [mem 0xdb00-0xdf9f] reserved
[0.00] BIOS-e820: [mem 0xf800-0xfbff] reserved
[0.00] BIOS-e820: [mem 0xfec0-0xfec00fff] reserved
[0.00] BIOS-e820: [mem 0xfed1-0xfed19fff] reserved
[0.00] BIOS-e820: [mem 0xfed1c000-0xfed1] reserved
[0.00] BIOS-e820: [mem 0xfee0-0xfee00fff] reserved
[0.00] BIOS-e820: [mem 0xffd8-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00021fdf] usable
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 2.6 present.
[0.00] DMI: FUJITSU LIFEBOOK AH531/FJNBB0F, BIOS 1.30 05/28/2012
[0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable
[0.00] e820: last_pfn = 0x21fe00 max_arch_pfn = 0x100
[0.00] MTRR default type: uncachable
[0.00] MTRR fixed ranges enabled:
[0.00]   0-9 write-back
[0.00]   A-B uncachable
[0.00]   C-F write-protect
[0.00] MTRR variable ranges enabled:
[0.00]   0 base 0FFC0 mask FFFC0 write-protect
[0.00]   1 base 0 mask F8000 write-back
[0.00]   2 base 08000 mask FC000 write-back
[0.00]   3 base 0C000 mask FE000 write-back
[0.00]   4 base 0DC00 mask FFC00 uncachable
[0.00]   5 base 0DB00 mask FFF00 uncachable
[0.00]   6 base 1 mask F write-back
[0.00]   7 base 2 mask FE000 write-back
[0.00]   8 base 21FE0 mask FFFE0 unc

Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner

2013-07-09 Thread Chris Clayton

Hello again, Bjorn.

On 04/01/13 18:28, Bjorn Helgaas wrote:




Hi Chris,

The current Linux acpiphp driver doesn't do anything unless it finds
devices with _EJ0 or _RMV methods, and your DSDT has neither.  But I
think that implementation is incorrect because I'm not convinced that
those methods are required in order to do hotplug via ACPI.  For
example, your DSDT *does* have an _L01 method that does notifications
to the root ports.  I suspect that hotplug events on your box generate
an SCI that invokes that method.  Linux basically ignores the
resulting notify events, and I suspect that hotplug works on Windows
because it is paying attention to them.

Can you build a kernel with CONFIG_ACPI_DEBUG=y, do the following, and
attach all the output to the bugzilla?

1) Boot with empty ExpressCard slot (without using "pcie_ports=native")
2) # echo 0x00010004 > /sys/module/acpi/parameters/debug_layer
3) # echo 0x0804 > /sys/module/acpi/parameters/debug_level
4) # lspci -vv
5) # setpci -s 1c.3 0x42.w
6) # setpci -s 1c.3 0x5a.w
7) # setpci -s 1c.3 0xd8.l
8) Insert ExpressCard
9) # setpci -s 1c.3 0x5a.w
10) # dmesg

Here's what I think we'll see:

- Slot Implemented (bit 8 of XCAP at 0x42) set, indicating a slot is
implemented below this root port
- Hot Plug SCI Enable (bit 30 of MPC at 0xd8) set, indicating that the
root port should generate an SCI whenever a hotplug event is detected
- Presence Detect State (bit 6 of SLTSTS at 0x5a) change from 0 with
the slot empty to 1 with the slot occupied
- pciehp doing nothing (since _OSC didn't grant the OS permission to
use PCIe native hotplug)
- dmesg indication of the SCI, leading to a Bus Check notification to
\_SB.PCI0.RP04, which is the 1c.3 root port leading to the ExpressCard
slot
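
The checks in the list above amount to testing single bits in the values
that setpci prints. A rough Python sketch follows, assuming setpci prints
each register as bare hexadecimal; the offsets and bit positions are the
ones given above, while the function names and the sample values are
invented for illustration and are not readings from this machine.

```python
# Bit positions as listed above.
XCAP_SLOT_IMPLEMENTED = 1 << 8     # XCAP at 0x42 (word)
MPC_HOTPLUG_SCI_ENABLE = 1 << 30   # MPC at 0xd8 (long)
SLTSTS_PRESENCE_DETECT = 1 << 6    # SLTSTS at 0x5a (word)


def parse_setpci(text):
    """Convert a setpci result such as '0142' into an integer."""
    return int(text.strip(), 16)


def summarize(xcap, mpc, sltsts_empty, sltsts_occupied):
    """Report the three conditions from the setpci output strings."""
    return {
        "slot_implemented":
            bool(parse_setpci(xcap) & XCAP_SLOT_IMPLEMENTED),
        "hotplug_sci_enabled":
            bool(parse_setpci(mpc) & MPC_HOTPLUG_SCI_ENABLE),
        "present_when_empty":
            bool(parse_setpci(sltsts_empty) & SLTSTS_PRESENCE_DETECT),
        "present_when_occupied":
            bool(parse_setpci(sltsts_occupied) & SLTSTS_PRESENCE_DETECT),
    }


# Hypothetical sample values, chosen only to exercise each bit.
print(summarize("0142", "40000000", "0000", "0040"))
```

With real output, the interesting result is present_when_empty being False
and present_when_occupied being True.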



As far as I can see, your predicted outcomes were correct. I've added the
logs to the bugzilla report.


Yes, it behaved exactly as I expected.  I attached a few more details
of the analysis to the bug report
(https://bugzilla.kernel.org/show_bug.cgi?id=54981).  I think we need
to rework acpiphp to fix this.  We will fix it, but I don't know who
will do it, or when it will happen.  My list is growing faster than I
can deal with :(



I see that there has been quite a bit of work in the acpiphp area 
recently. Is any of it intended to fix (or ease the subsequent fixing 
of) this bug report, please?


It's not a big deal if it isn't - I do have a system that, via kernel 
command line arguments, recognises expresscard devices when I plug them 
into the slot. But when the release comes along that includes the 
reworking of acpiphp that you mentioned, I will give extra attention to 
hotplugging when I try out the -rc kernels on my laptop.


Thanks


Thanks very much for your excellent data collection!



It's my pleasure!


Bjorn




Re: linux-3.9.0-rc1+: Output from "make kernelrelease"contains incorrect data

2013-04-03 Thread Chris Clayton



On 04/03/13 15:32, Michal Marek wrote:

On 1.4.2013 11:28, Chris Clayton wrote:

Ping!

This is still happening with 3.9-rc5.

[chris:~/kernel/linux]$ make bzImage
...
Kernel: arch/x86/boot/bzImage is ready  (#14)
[chris:~/kernel/linux]$ make kernelrelease
scripts/kconfig/conf --silentoldconfig Kconfig
3.9.0-rc5
[chris:~/kernel/linux]$ make kernelrelease
3.9.0-rc5


You need to run make -s kernelrelease.



Ah, right. I didn't see that announcement. The -s argument was not 
necessary with earlier releases.
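
For anyone scripting around this, a small Python sketch of the two options:
pass -s as Michal says, and, as a belt-and-braces measure, take only the
last token of whatever make prints. The function names are hypothetical,
and the token approach assumes the release string contains no whitespace.

```python
import subprocess


def kernelrelease_command():
    """Command line with -s so make prints only the release string."""
    return ["make", "-s", "kernelrelease"]


def extract_release(output):
    """Return the last whitespace-separated token of make's output."""
    tokens = output.split()
    return tokens[-1] if tokens else ""


def kernel_release(srctree):
    """Run make in srctree (a kernel source directory) and return the release."""
    result = subprocess.run(kernelrelease_command(), cwd=srctree,
                            capture_output=True, text=True, check=True)
    return extract_release(result.stdout)


# The noisy first run and the clean second run give the same answer.
noisy = "scripts/kconfig/conf --silentoldconfig Kconfig\n3.9.0-rc5\n"
print(extract_release(noisy))          # 3.9.0-rc5
print(extract_release("3.9.0-rc5\n"))  # 3.9.0-rc5
```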


Sorry for the noise.

Chris


Michal




Re: linux-3.9.0-rc1+: Output from "make kernelrelease"contains incorrect data

2013-04-01 Thread Chris Clayton

Ping!

This is still happening with 3.9-rc5.

[chris:~/kernel/linux]$ make bzImage
...
Kernel: arch/x86/boot/bzImage is ready  (#14)
[chris:~/kernel/linux]$ make kernelrelease
scripts/kconfig/conf --silentoldconfig Kconfig
3.9.0-rc5
[chris:~/kernel/linux]$ make kernelrelease
3.9.0-rc5

I'm assuming that the output from 'make kernelrelease' should be the 
string that represents the kernel release and only that string.


Chris


On 03/09/13 19:38, Chris Clayton wrote:

In Linus' current tree, the first time the command "make kernelrelease"
is run after building a kernel, the output contains some unwanted text.
Subsequent uses of the command produce the expected output. This appears
to be a regression - 3.8.2 does not have this problem.

This is easily demonstrated from the command line by the following:


...
System is 2311 kB
CRC a4e38b86
Kernel: arch/x86/boot/bzImage is ready  (#186)
$ make kernelrelease
scripts/kconfig/conf --silentoldconfig Kconfig 3.9.0-rc1+
$ make kernelrelease
3.9.0-rc1+

Happy to test the fix.

Chris






Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner

2013-03-19 Thread Chris Clayton

Hi Bjorn,

Sorry, I meant to reply to your mail so that copy recipients would know 
the outcome of the things you asked me to do, but then I forgot. Doh! 
You should have received the attachment notification from bugzilla, I think.


On 03/15/13 22:48, Bjorn Helgaas wrote:

On Tue, Mar 12, 2013 at 4:20 PM, Bjorn Helgaas  wrote:

On Sat, Mar 9, 2013 at 2:20 AM, Chris Clayton  wrote:

On 03/08/13 22:57, Bjorn Helgaas wrote:



Thanks.  I opened this bug report:
https://bugzilla.kernel.org/show_bug.cgi?id=54981 to keep track of
your logs.


Hi Chris,

The current Linux acpiphp driver doesn't do anything unless it finds
devices with _EJ0 or _RMV methods, and your DSDT has neither.  But I
think that implementation is incorrect because I'm not convinced that
those methods are required in order to do hotplug via ACPI.  For
example, your DSDT *does* have an _L01 method that does notifications
to the root ports.  I suspect that hotplug events on your box generate
an SCI that invokes that method.  Linux basically ignores the
resulting notify events, and I suspect that hotplug works on Windows
because it is paying attention to them.

Can you build a kernel with CONFIG_ACPI_DEBUG=y, do the following, and
attach all the output to the bugzilla?

1) Boot with empty ExpressCard slot (without using "pcie_ports=native")
2) # echo 0x00010004 > /sys/module/acpi/parameters/debug_layer
3) # echo 0x0804 > /sys/module/acpi/parameters/debug_level
4) # lspci -vv
5) # setpci -s 1c.3 0x42.w
6) # setpci -s 1c.3 0x5a.w
7) # setpci -s 1c.3 0xd8.l
8) Insert ExpressCard
9) # setpci -s 1c.3 0x5a.w
10) # dmesg

Here's what I think we'll see:

- Slot Implemented (bit 8 of XCAP at 0x42) set, indicating a slot is
implemented below this root port
- Hot Plug SCI Enable (bit 30 of MPC at 0xd8) set, indicating that the
root port should generate an SCI whenever a hotplug event is detected
- Presence Detect State (bit 6 of SLTSTS at 0x5a) change from 0 with
the slot empty to 1 with the slot occupied
- pciehp doing nothing (since _OSC didn't grant the OS permission to
use PCIe native hotplug)
- dmesg indication of the SCI, leading to a Bus Check notification to
\_SB.PCI0.RP04, which is the 1c.3 root port leading to the ExpressCard
slot



As far as I can see, your predicted outcomes were correct. I've added 
the logs to the bugzilla report.



A Bus Check notification means we're supposed to re-enumerate starting
at that device.  If we *did* re-enumerate, we would find the new
ExpressCard.

Thanks,
   Bjorn



Thanks,
Chris


linux-3.9.0-rc1+: Output from "make kernelrelease"contains incorrect data

2013-03-09 Thread Chris Clayton
In Linus' current tree, the first time the command "make kernelrelease" 
is run after building a kernel, the output contains some unwanted text. 
Subsequent uses of the command produce the expected output. This appears 
to be a regression - 3.8.2 does not have this problem.


This is easily demonstrated from the command line by the following:


...
System is 2311 kB
CRC a4e38b86
Kernel: arch/x86/boot/bzImage is ready  (#186)
$ make kernelrelease
scripts/kconfig/conf --silentoldconfig Kconfig 3.9.0-rc1+
$ make kernelrelease
3.9.0-rc1+

Happy to test the fix.

Chris





Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner

2013-03-09 Thread Chris Clayton

Hi Bjorn,

On 03/08/13 22:57, Bjorn Helgaas wrote:

[+cc Rafael, in case the _OSC thing rings a bell with him]

On Fri, Mar 8, 2013 at 3:44 AM, Chris Clayton  wrote:

On 03/08/13 00:39, Bjorn Helgaas wrote:

On Thu, Mar 7, 2013 at 1:21 PM, Chris Clayton 
wrote:

On 03/07/13 17:30, Bjorn Helgaas wrote:



3) HVR-1400 not being recognized when inserted.  This is a PCI hotplug
issue, and I *can* help with this.  I don't know what your hardware
is, but in general, pciehp should take care of this.  If it doesn't,
or if you have to use an argument like "pcie_ports=native", we should
fix this.



So 3) is the thing I might be able to help with.  If there's still a
problem here (and even having to boot with an argument is a problem),
let's start by collecting complete dmesg logs, with and without your
"pcie_ports" option.  Boot without the card installed, then insert it
and remove it.  If you can use something like v3.9-rc1 with
CONFIG_HOTPLUG_PCI_PCIE=y, that would be ideal.



OK, I've gathered these logs using a kernel built from a pull of Linus'
tree
this afternoon (v3.9-rc1-108-g9f22578). Also, the cx23885 driver is still
blacklisted to avoid unnecessary noise and the chance of an oops if the
card
springs out again when I insert it. The driver does load if it's not
blacklisted (and the pcie_ports=native option is present).

The two logs are attached. As you will see, nothing at all happens when
the
pcie_ports=native option is absent. The nf_conntrack message is normally
the
last one from a normal boot.



Perfect, thanks!

It looks like something's going wrong when we evaluate _OSC.  Can you
collect an acpidump from your machine?


A bzipped file containing the output from acpidump is attached.


Thanks.  I opened this bug report:
https://bugzilla.kernel.org/show_bug.cgi?id=54981 to keep track of
your logs.



Excellent, thanks.


Here's your _OSC method from the acpidump:

 Method (_OSC, 4, Serialized) {
 ...
 If (LAnd (LEqual (Arg0, GUID), NEXP))
 ... # normal case
 Else {
 Or (CDW1, 0x04, CDW1)  # "unrecognized UUID" error
 Return (Local0)
 }

It fails with "unrecognized UUID" if either (1) we supply the wrong
UUID or (2) "NEXP" is false.  I have no idea what NEXP is; your
DSDT.dsl never sets it, so maybe it's related to a BIOS setup option
or something?  I found a BIOS manual [1] but didn't see anything
likely.  I guess it might be worth you looking, or maybe trying a
"reset to defaults" if it's not too destructive for you.  You don't


As the manual showed, there aren't too many user-changeable settings in 
the BIOS on this machine, so I tried a "reset to defaults". 
Unfortunately, it made no difference.



have a copy of Windows on that box, do you?  I *assume* hotplug would
work fine with Windows and maybe we could figure out what it is doing
differently.
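
The _OSC branch quoted earlier in this message can be modelled in a few
lines of Python. This is only a sketch of the control flow as described
here, not the firmware's actual code: the function name and the placeholder
UUID strings are hypothetical, and only the 0x04 error bit comes from the
disassembly.

```python
OSC_UNRECOGNIZED_UUID = 0x04  # bit ORed into CDW1 in the Else branch


def osc_status(uuid, expected_uuid, nexp, cdw1=0):
    """Return CDW1 after the UUID/NEXP check in the quoted _OSC method."""
    if uuid == expected_uuid and nexp:
        return cdw1                      # normal case: no error bit added
    return cdw1 | OSC_UNRECOGNIZED_UUID  # wrong UUID *or* NEXP false


# Placeholder UUID strings, for illustration only.
print(hex(osc_status("pci-uuid", "pci-uuid", nexp=True)))   # 0x0
print(hex(osc_status("pci-uuid", "pci-uuid", nexp=False)))  # 0x4
print(hex(osc_status("other", "pci-uuid", nexp=True)))      # 0x4
```

The second case is the one of interest: the right UUID still produces the
"unrecognized UUID" error when NEXP is false.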



Yes, I have Windows 7 Home Premium (64 bit), although, as I said 
previously, I rarely boot it. One occasion when I usually do is when I 
buy new hardware. The hotplug works fine in Windows with the two 
expresscards that I own.


I'm happy to help work out what's different on Windows, but I have no 
diagnostic tools installed, so I might need some hand-holding. One 
immediate difference is that my Windows installation is 64bit whereas my 
linux installation is 32 bit.


Thanks for your help so far,

Chris


Bjorn

[1] 
http://solutions.us.fujitsu.com/www/content/pdf/SupportGuides/AH530_BIOS_Guide_FPC58-2843-01_rA.pdf




kernel 3.6.7: sysfs: cannot create duplicate filename warnings

2013-02-06 Thread Chris Clayton

Hi,

I've been investigating a problem with my Hauppauge WinTV HVR-1400 TV card 
(an expresscard) [1], and I've come across warnings produced when I 
unload and then reload the cx23885 driver.


Attached is the output from dmesg that shows the card being detected (by 
the PCI express hotplug driver) when it is inserted, the drivers being 
loaded and unloaded and, finally, the driver being reloaded and 
producing the warnings.


I get the same warning with 3.8.0-rc6+ (freshly pulled this morning).

I've searched Google and found thousands of hits going back over two 
years, but none of the first 20 pages of hits provided a solution.


Please let me know if I can provide any additional diagnostics, but cc 
me on any reply as I'm not subscribed.


Chris

[1] http://www.spinics.net/lists/linux-media/msg59468.html

Plug in the hvr1400

[  165.300637] pciehp :00:1c.3:pcie04: Card present on Slot(3)
[  165.401471] pci :02:00.0: [14f1:8852] type 00 class 0x04
[  165.401528] pci :02:00.0: reg 10: [mem 0x-0x001f 64bit]
[  165.401678] pci :02:00.0: supports D1 D2
[  165.401679] pci :02:00.0: PME# supported from D0 D1 D2 D3hot
[  165.409465] pci :02:00.0: BAR 0: assigned [mem 0xf0e0-0xf0ff 
64bit]
[  165.409501] pcieport :00:1c.3: PCI bridge to [bus 02-06]
[  165.409505] pcieport :00:1c.3:   bridge window [io  0x3000-0x3fff]
[  165.409510] pcieport :00:1c.3:   bridge window [mem 
0xf0d0-0xf14f]
[  165.409514] pcieport :00:1c.3:   bridge window [mem 
0xf040-0xf0bf 64bit pref]
[  165.409530] pci :02:00.0: no hotplug settings from platform

Load the drivers

[  211.641874] cx23885 driver version 0.0.3 loaded
[  211.641897] cx23885 :02:00.0: enabling device ( -> 0002)
[  211.642000] CORE cx23885[0]: subsystem: 0070:8010, board: Hauppauge 
WinTV-HVR1400 [card=9,autodetected]
[  211.781181] cx23885[0] - extra delay being applied for HVR1400
[  211.809194] tveeprom 7-0050: Hauppauge model 80019, rev B2F1, serial# 3758890
[  211.809197] tveeprom 7-0050: MAC address is 00:0d:fe:39:5b:2a
[  211.809198] tveeprom 7-0050: tuner model is Xceive XC3028L (idx 151, type 4)
[  211.809200] tveeprom 7-0050: TV standards PAL(B/G) PAL(I) SECAM(L/L') 
PAL(D/D1/K) ATSC/DVB Digital (eeprom 0xf4)
[  211.809201] tveeprom 7-0050: audio processor is CX23885 (idx 39)
[  211.809202] tveeprom 7-0050: decoder processor is CX23885 (idx 33)
[  211.809203] tveeprom 7-0050: has radio
[  211.809204] cx23885[0]: hauppauge eeprom: model=80019
[  211.809205] cx23885_dvb_register() allocating 1 frontend(s)
[  211.809207] cx23885[0]: cx23885 based dvb card
[  211.915346] xc2028 8-0064: creating new instance
[  211.915349] xc2028 8-0064: type set to XCeive xc2028/xc3028 tuner
[  211.915353] DVB: registering new adapter (cx23885[0])
[  211.915357] cx23885 :02:00.0: DVB: registering adapter 0 frontend 0 
(DiBcom 7000PC)...
[  211.915576] cx23885_dev_checkrevision() Hardware revision = 0xb0
[  211.915582] cx23885[0]/0: found at :02:00.0, rev: 2, irq: 19, latency: 
0, mmio: 0xf0e0
[  211.977653] xc2028 8-0064: Loading 81 firmware images from xc3028L-v36.fw, 
type: xc2028 firmware, ver 3.6

Unload the drivers

[  228.016560] xc2028 8-0064: destroying instance

Reload the drivers

[  240.384907] cx23885 driver version 0.0.3 loaded
[  240.385099] CORE cx23885[0]: subsystem: 0070:8010, board: Hauppauge 
WinTV-HVR1400 [card=9,autodetected]
[  240.524290] cx23885[0] - extra delay being applied for HVR1400
[  240.552265] tveeprom 7-0050: Hauppauge model 80019, rev B2F1, serial# 3758890
[  240.552267] tveeprom 7-0050: MAC address is 00:0d:fe:39:5b:2a
[  240.552268] tveeprom 7-0050: tuner model is Xceive XC3028L (idx 151, type 4)
[  240.552269] tveeprom 7-0050: TV standards PAL(B/G) PAL(I) SECAM(L/L') 
PAL(D/D1/K) ATSC/DVB Digital (eeprom 0xf4)
[  240.552270] tveeprom 7-0050: audio processor is CX23885 (idx 39)
[  240.552271] tveeprom 7-0050: decoder processor is CX23885 (idx 33)
[  240.552272] tveeprom 7-0050: has radio
[  240.552273] cx23885[0]: hauppauge eeprom: model=80019
[  240.552275] cx23885_dvb_register() allocating 1 frontend(s)
[  240.552277] cx23885[0]: cx23885 based dvb card
[  240.626253] xc2028 8-0064: creating new instance
[  240.626255] xc2028 8-0064: type set to XCeive xc2028/xc3028 tuner
[  240.626258] DVB: registering new adapter (cx23885[0])
[  240.626263] cx23885 :02:00.0: DVB: registering adapter 0 frontend 0 
(DiBcom 7000PC)...
[  240.626316] xc2028 8-0064: Loading 81 firmware images from xc3028L-v36.fw, 
type: xc2028 firmware, ver 3.6
[  240.626366] [ cut here ]
[  240.626371] WARNING: at fs/sysfs/dir.c:536 sysfs_add_one+0xae/0xe0()
[  240.626372] Hardware name: LIFEBOOK AH531
[  240.626373] sysfs: cannot create duplicate filename 
'/devices/pci:00/:00:1c.3/:02:00.0/dvb/dvb0.frontend0'
[  240.626374] Modules linked in: cx23885(+) tveeprom btcx_risc videobuf_dvb 
cx2341x videobuf_dma_sg videobuf_core dib7000p d

Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner

2013-01-31 Thread Chris Clayton

Hi Martin,

On 01/28/13 21:02, Martin Mokrejs wrote:

Hi Chris,

Chris Clayton wrote:

Hi Martin,

On 01/28/13 12:12, Martin Mokrejs wrote:

Chris Clayton wrote:


[snip]


[chris:~]$ cat /proc/cmdline
root=/dev/sda5 pciehp_ports=native ro resume=/dev/sda6

   ^^
**typo**
I've run the test again with pcie_ports=native and the directories now get 
populated. Even better though, is that when I plug in the card, hotplug 
**works** and the card's drivers are loaded.
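
A misspelt parameter like this is easy to miss by eye. Below is a minimal
Python sketch that looks the option up by its exact name; the helper names
are hypothetical, and the two sample strings are the command line quoted
above and its corrected form.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} mapping."""
    params = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        params[key] = value  # flags without '=' get an empty value
    return params


def pcie_ports_mode(cmdline):
    """Return the value of pcie_ports, or None if it is not set."""
    return parse_cmdline(cmdline).get("pcie_ports")


typo = "root=/dev/sda5 pciehp_ports=native ro resume=/dev/sda6"
fixed = "root=/dev/sda5 pcie_ports=native ro resume=/dev/sda6"
print(pcie_ports_mode(typo))   # None
print(pcie_ports_mode(fixed))  # native
```

On a running system the input would be the contents of /proc/cmdline.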


BTW, I have with acpiphp on 3.7.4:

ls -la /sys/bus/pci_express/devices
total 0
drwxr-xr-x 2 root root 0 Jan 28 13:07 .
drwxr-xr-x 4 root root 0 Jan 28 13:07 ..
$ ls -la /sys/bus/pci/devices/slots


 **typo**
It should be /sys/bus/pci/slots.


ls: cannot access /sys/bus/pci/devices/slots: No such file or directory
$


With acpiphp, I get /sys/bus/pci_express/devices populated but 
/sys/bus/pci/slots is empty.


OK, I hadn't noticed the typo, but here is what I have with acpiphp:

# ls -laR /sys/bus/pci/slots
/sys/bus/pci/slots:
total 0
drwxr-xr-x 3 root root 0 Jan 27 17:14 .
drwxr-xr-x 5 root root 0 Jan 25 15:56 ..
drwxr-xr-x 2 root root 0 Jan 27 17:14 1

/sys/bus/pci/slots/1:
total 0
drwxr-xr-x 2 root root    0 Jan 27 17:14 .
drwxr-xr-x 3 root root    0 Jan 27 17:14 ..
-r--r--r-- 1 root root 4096 Jan 28 21:31 adapter
-r--r--r-- 1 root root 4096 Jan 27 17:14 address
-rw-r--r-- 1 root root 4096 Jan 28 21:31 attention
-r--r--r-- 1 root root 4096 Jan 28 21:31 cur_bus_speed
-r--r--r-- 1 root root 4096 Jan 28 21:31 latch
-r--r--r-- 1 root root 4096 Jan 28 21:31 max_bus_speed
lrwxrwxrwx 1 root root    0 Jan 28 21:31 module -> ../../../../module/acpiphp
-rw-r--r-- 1 root root 4096 Jan 28 21:31 power
#




And for me hotplug also works (as far as I can tell). ;-)



Excellent! Thank you so much for your help (and patience) Martin and Yijing.

Now to solving why running scandvb doesn't find any TV channels.


It would be good if you could re-do the PresDet checks and confirm whether it
is also broken for you under pciehp.


I've struggled with this a little. For some reason, the expresscard
doesn't always stay properly inserted in the slot when I insert it.
Now that hotplug is working, the modules are being loaded and when
the card pops out again, I get an oops because, of course, the driver
is running and the card disappears. Perhaps the driver can be made a
bit more robust to sudden disappearance of the card. I'll report the


Yes, I had, or maybe still have, the same issues here. I used to get an Oops
for a sata_sil24 card and weird behavior for a USB3.0 NEC-based card. It was
always fine for a VIA-based firewire card and a serial PL2303-based one.
I found out it is better if a usb device is connected to the USB card
because if that slips out then the libata layer quickly realizes that.
If there was no device connected, the usb layer waits too long before it removes
the usb hub from the system. And if you plug the card back into the slot
in the meantime, weird things happen.

My usb3 expresscard device has arrived and I get an oops with that too, 
if I remove it without unloading the driver first. I guess it shouldn't 
be a surprise that the driver isn't expecting the device to disappear.


As I mentioned, I have some trouble with the WinTV-HVR-1400 card, which 
sometimes pops out again if I push it into the slot too hard (but I'm 
getting better at that with practice). So what I've done (with the usb3 
card too) to avoid the oopsen is blacklist the drivers in 
/etc/modprobe.d/blacklist.conf and then load them when I'm sure the card 
is properly inserted. Not exactly hotplug, but at least I don't have to 
reboot because of an oops, and it's not something I'm doing several 
times an hour.


Chris



oops later. Anyway, to run these tests I built a kernel without the
dvb card's drivers, effectively simulating the situation I had before
Yijing got hotplug working for me. The card popping out may also have
affected these diffs a bit because, for example, the first one has
the CorrErr flag changed, possibly because I had to have two or more
goes at getting the card to lock in the slot. Yesterday that diff
showed no changes. Anyway, here are the diffs:

diff lspci.after_insertion.txt lspci.after_1st_re-insertion.txt
262c262
<   DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
---
>   DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
295c295
< 40: 10 80 42 01 00 80 00 00 00 00 10 00 12 3c 12 04
---
> 40: 10 80 42 01 00 80 00 00 00 00 11 00 12 3c 12 04



diff lspci.after_1st_removal.txt lspci.after_2nd_removal.txt




BTW, with the NEC-based card only after every second removal of the card I got
into PresDet- state. So, on every other diff attempt you won't see a differenc

Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner

2013-01-28 Thread Chris Clayton

Hi Martin,

On 01/28/13 12:12, Martin Mokrejs wrote:

Chris Clayton wrote:


[snip]


[chris:~]$ cat /proc/cmdline
root=/dev/sda5 pciehp_ports=native ro resume=/dev/sda6

                   ^^
                   **typo**
I've run the test again with pcie_ports=native and the directories now get 
populated. Even better though, is that when I plug in the card, hotplug 
**works** and the card's drivers are loaded.


BTW, I have with acpiphp on 3.7.4:

ls -la /sys/bus/pci_express/devices
total 0
drwxr-xr-x 2 root root 0 Jan 28 13:07 .
drwxr-xr-x 4 root root 0 Jan 28 13:07 ..
$ ls -la /sys/bus/pci/devices/slots

   
**typo**
It should be /sys/bus/pci/slots.


ls: cannot access /sys/bus/pci/devices/slots: No such file or directory
$

With acpiphp, I get /sys/bus/pci_express/devices populated but 
/sys/bus/pci/slots is empty.



And for me hotplug also works (as far as I can tell). ;-)



Excellent! Thank you so much for your help (and patience) Martin and Yijing.

Now to solving why running scandvb doesn't find any TV channels.


Would be fine if you could re-do the PresDet checks and confirm whether it is 
also broken
for you under pciehp.
I've struggled with this a little. For some reason, the expresscard 
doesn't always stay properly inserted in the slot when I insert it. Now 
that hotplug is working, the modules are being loaded and when the card 
pops out again, I get an oops because, of course, the driver is running 
and the card disappears. Perhaps the driver can be made a bit more 
robust to sudden disappearance of the card. I'll report the oops later. 
Anyway, to run these tests I built a kernel without the dvb card's 
drivers, effectively simulating the situation I had before Yijing got 
hotplug working for me. The card popping out may also have affected 
these diffs a bit because, for example, the first one has the CorrErr 
flag changed, possibly because I had to have two or more goes at getting 
the card to lock in the slot. Yesterday that diff showed no changes. 
Anyway, here are the diffs:


diff lspci.after_insertion.txt lspci.after_1st_re-insertion.txt
262c262
<   DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
---
>   DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
295c295
< 40: 10 80 42 01 00 80 00 00 00 00 10 00 12 3c 12 04
---
> 40: 10 80 42 01 00 80 00 00 00 00 11 00 12 3c 12 04
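
Spotting a single flipped flag such as CorrErr in long lspci lines is tedious by eye. The following is only a sketch of how the "Name+" / "Name-" tokens could be compared programmatically. The helper names are made up for illustration, and it assumes each flag is a run of letters and digits followed directly by + or -.

```python
import re

# A flag is a run of letters/digits immediately followed by '+' or '-'.
FLAG_RE = re.compile(r"([A-Za-z0-9]+)([+-])(?=\s|$)")


def parse_flags(line):
    """Turn 'CorrErr- AuxPwr+' style tokens into {'CorrErr': False, 'AuxPwr': True}."""
    return {name: sign == "+" for name, sign in FLAG_RE.findall(line)}


def changed_flags(before, after):
    """Return {flag: (old_state, new_state)} for flags whose state differs."""
    old, new = parse_flags(before), parse_flags(after)
    return {k: (old[k], new[k]) for k in old if k in new and old[k] != new[k]}
```

Fed the two DevSta lines from the diff above, changed_flags reports only CorrErr, going from clear to set.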


diff lspci.after_1st_removal.txt lspci.after_2nd_removal.txt



=
diff lspci.before_insertion.txt lspci.after_1st_removal.txt
112c112
< 60: 20 20 ff 07 00 00 00 00 01 00 00 00 00 00 08 c0
---
> 60: 20 20 ff 07 00 00 00 00 01 00 00 00 00 00 00 c0
262,263c262,263
<   DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
<   LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <1us, L1 <16us
---
>   DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
>   LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <16us
265c265
<   LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk-
---
>   LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
267c267
<   LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
---
>   LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt-
273c273
<   Changed: MRL- PresDet- LinkState-
---
>   Changed: MRL- PresDet- LinkState+
295,296c295,296
< 40: 10 80 42 01 00 80 00 00 00 00 10 00 12 4c 12 04
< 50: 03 00 01 10 60 b2 1c 00 28 00 00 00 00 00 00 00
---
> 40: 10 80 42 01 00 80 00 00 00 00 11 00 12 3c 12 04
> 50: 40 00 11 50 60 b2 1c 00 28 00 00 01 00 00 00 00
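
The hex rows at the end of the diff are the raw config-space dump from lspci -xxx, and it is not obvious at a glance which bytes moved. As a sketch only, the helper below lists the byte offsets that differ between two such rows. The function names are invented for illustration, and it makes no attempt to say which register an offset belongs to.

```python
def parse_row(row):
    """Parse a row like '40: 10 80 42 ...' into (base_offset, [byte values])."""
    prefix, _, data = row.partition(":")
    base = int(prefix.strip(), 16)
    return base, [int(byte, 16) for byte in data.split()]


def row_diff(old_row, new_row):
    """Return [(offset, old_byte, new_byte)] for every byte that differs."""
    base_old, old = parse_row(old_row)
    base_new, new = parse_row(new_row)
    if base_old != base_new:
        raise ValueError("rows start at different offsets")
    return [
        (base_old + i, a, b)
        for i, (a, b) in enumerate(zip(old, new))
        if a != b
    ]
```

For the two "40:" rows above it returns offsets 0x4a (0x10 became 0x11) and 0x4d (0x4c became 0x3c).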

===
diff lspci.after_1st_removal.txt lspci.after_2nd_removal.txt




diff lspci.before_insertion.txt lspci.after_insertion.txt
112c112
< 60: 20 20 ff 07 00 00 00 00 01 00 00 00 00 00 08 c0
---
> 60: 20 20 ff 07 00 00 00 00 01 00 00 00 00 00 00 c0
263c263
<   LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <1us, L1 <16us
---
>   LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <16us
265c265
<   LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk-
---
>   LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
267c267
<   LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
---
>

Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner

2013-01-28 Thread Chris Clayton


[snip]


[chris:~]$ cat /proc/cmdline
root=/dev/sda5 pciehp_ports=native ro resume=/dev/sda6

                   ^^
                   **typo**
I've run the test again with pcie_ports=native and the directories now 
get populated. Even better though, is that when I plug in the card, 
hotplug **works** and the card's drivers are loaded.


Excellent! Thank you so much for your help (and patience) Martin and Yijing.

Now to solving why running scandvb doesn't find any TV channels.

Chris

[snip]
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner

2013-01-28 Thread Chris Clayton

[no one screamed, so linux-media ml dropped]

Hi Martin,

On 01/28/13 10:56, Martin Mokrejs wrote:



Chris Clayton wrote:

Hi Yijing,

On 01/28/13 02:40, Yijing Wang wrote:

Hi Chris,
 Sorry for the delayed reply. It seems my reply last night was missed.

  From the sysinfo you provided, there are no pcie port devices under 
/sys/bus/pci_express/devices.
Maybe there are some problems with _OSC in your laptop, so the pcie port 
driver won't create pcie port devices
for hotplug, aer and so on.

Maybe you can add the boot parameter "pcie_ports=native" and reboot your laptop.
Then use #modprobe pciehp pciehp_force=1 pciehp_debug=1 to load the pciehp module.
After that, look in the /sys/bus/pci_express/devices/ and 
/sys/bus/pci/slots/ directories.
Some slots and pcie port devices should be there now.
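
Checking both directories after every reboot gets repetitive. A small sketch like the one below (the function name is hypothetical, the two paths are the ones named above) reports how many entries each directory holds, with None standing for a directory that does not exist.

```python
import os

# The two directories mentioned in the message above.
DIRS = ("/sys/bus/pci_express/devices", "/sys/bus/pci/slots")


def count_entries(paths=DIRS):
    """Return {path: number of entries}, with None for a missing directory."""
    counts = {}
    for path in paths:
        try:
            counts[path] = len(os.listdir(path))
        except OSError:
            counts[path] = None
    return counts
```

A count of 0 for both paths matches the "still empty" result reported in the reply that follows.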


Sorry, I've tried your suggestion, but the two directories are still empty.

I verified the test environment as follows:

[chris:~]$ uname -a
Linux laptop 3.7.4 #15 SMP PREEMPT Mon Jan 28 09:43:57 GMT 2013 i686 GNU/Linux
[chris:~]$ grep acpiphp /boot/System.map-3.7.4
[chris:~]$ modinfo acpiphp
modinfo: ERROR: Module acpiphp not found.
[chris:~]$ modinfo pciehp
filename:       /lib/modules/3.7.4/kernel/drivers/pci/hotplug/pciehp.ko
license:        GPL
description:    PCI Express Hot Plug Controller Driver
author:         Dan Zink, Greg Kroah-Hartman, Dely Sy
depends:
intree:         Y
vermagic:       3.7.4 SMP preempt mod_unload CORE2
parm:           pciehp_detect_mode:Slot detection mode: pcie, acpi, auto
  pcie          - Use PCIe based slot detection
  acpi          - Use ACPI for slot detection
  auto(default) - Auto select mode. Use acpi option if duplicate
                  slot ids are found. Otherwise, use pcie option
 (charp)
parm:           pciehp_debug:Debugging mode enabled or not (bool)
parm:           pciehp_poll_mode:Using polling mechanism for hot-plug events or not (bool)
parm:           pciehp_poll_time:Polling mechanism frequency, in seconds (int)
parm:           pciehp_force:Force pciehp, even if OSHP is missing (bool)
[chris:~]$ cat /proc/cmdline
root=/dev/sda5 pciehp_ports=native ro resume=/dev/sda6
[chris:~]$ sudo modprobe pciehp pciehp_force=1 pciehp_debug=1
[chris:~]$ lsmod
Module  Size  Used by
pciehp 19907  0
[...]

You will notice that the kernel I have used is 3.7.4. I hope that's a suitable 
kernel for your tests. I've moved away from the 3.8 development kernel onto one 
that's stable and on which Martin has identified a solution. I see Greg KH 
released 3.7.5 yesterday and it includes a pciehp change. I'll upgrade to that, 
run the tests again and report back.

One question - should I include the (acpi) pci_slot driver in the kernel build 
or does pciehp populate the directories without pci_slot?


Hi Chris,
   I am not a kernel developer, but from the other threads on linux-pci I 
gathered that in some
scenarios there are problems with improper loading of the hotplug modules. Therefore, the 
patches now floating
around are meant to disable hotplug module availability. That is why I suggested 
you try
only static kernel support for hotplug. That way you don't hit the issue. It 
is certainly not
addressed in 3.7.5; it seems that it is probably in -next.
Martin

In a few minutes I'll be sending out another reply to Yijing's 
suggestions because I noticed a typo in the parameter I added to the 
kernel command line. I'm now going back through email to remember why we 
were trying to get those /sys/bus/pci... directories populated.


Watch this space! :-)
Chris



Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner

2013-01-28 Thread Chris Clayton

Hi Yijing,

On 01/28/13 02:40, Yijing Wang wrote:

Hi Chris,
Sorry for the delayed reply. It seems my reply last night was missed.

 From the sysinfo you provided, there are no pcie port devices under 
/sys/bus/pci_express/devices.
Maybe there are some problems with _OSC in your laptop, so the pcie port 
driver won't create pcie port devices
for hotplug, aer and so on.

Maybe you can add the boot parameter "pcie_ports=native" and reboot your laptop.
Then use #modprobe pciehp pciehp_force=1 pciehp_debug=1 to load the pciehp module.
After that, look in the /sys/bus/pci_express/devices/ and 
/sys/bus/pci/slots/ directories.
Some slots and pcie port devices should be there now.


Sorry, I've tried your suggestion, but the two directories are still empty.

I verified the test environment as follows:

[chris:~]$ uname -a
Linux laptop 3.7.4 #15 SMP PREEMPT Mon Jan 28 09:43:57 GMT 2013 i686 
GNU/Linux

[chris:~]$ grep acpiphp /boot/System.map-3.7.4
[chris:~]$ modinfo acpiphp
modinfo: ERROR: Module acpiphp not found.
[chris:~]$ modinfo pciehp
filename:       /lib/modules/3.7.4/kernel/drivers/pci/hotplug/pciehp.ko
license:        GPL
description:    PCI Express Hot Plug Controller Driver
author:         Dan Zink, Greg Kroah-Hartman, Dely Sy
depends:
intree:         Y
vermagic:       3.7.4 SMP preempt mod_unload CORE2
parm:           pciehp_detect_mode:Slot detection mode: pcie, acpi, auto
  pcie          - Use PCIe based slot detection
  acpi          - Use ACPI for slot detection
  auto(default) - Auto select mode. Use acpi option if duplicate
                  slot ids are found. Otherwise, use pcie option
 (charp)
parm:           pciehp_debug:Debugging mode enabled or not (bool)
parm:           pciehp_poll_mode:Using polling mechanism for hot-plug events or not (bool)
parm:           pciehp_poll_time:Polling mechanism frequency, in seconds (int)
parm:           pciehp_force:Force pciehp, even if OSHP is missing (bool)
[chris:~]$ cat /proc/cmdline
root=/dev/sda5 pciehp_ports=native ro resume=/dev/sda6
[chris:~]$ sudo modprobe pciehp pciehp_force=1 pciehp_debug=1
[chris:~]$ lsmod
Module  Size  Used by
pciehp 19907  0
[...]

You will notice that the kernel I have used is 3.7.4. I hope that's a 
suitable kernel for your tests. I've moved away from the 3.8 development 
kernel onto one that's stable and on which Martin has identified a 
solution. I see Greg KH released 3.7.5 yesterday and it includes a 
pciehp change. I'll upgrade to that, run the tests again and report back.


One question - should I include the (acpi) pci_slot driver in the kernel 
build or does pciehp populate the directories without pci_slot?


Thanks again.


/sys/bus/pci_express/devices:
total 0

/sys/bus/pci_express/drivers:
total 0
drwxr-xr-x 2 root root 0 Jan 27 13:17 pciehp/


On 2013/1/28 6:53, Chris Clayton wrote:

Thanks again, Martin.

Firstly, maybe we should remove the linux-media list from the copy list. I 
imagine this hotplug stuff is just noise to them.

[snip]

Do you have any other express card around to try if it works at all? Try that 
always after a cold boot.


Not at the moment, but I ordered a USB3 expresscard yesterday, so I will have 
one soon.


Posting a diff result of the below procedure might help:

# lspci -vvvxxx > lspci.before_insertion.txt

[plug your card into the slot]

# lspci -vvvxxx > lspci.after_insertion.txt

[ unplug your card]

# lspci -vvvxxx > lspci.after_1st_removal.txt

[re-plug your card into the slot]

# lspci -vvvxxx > lspci.after_1st_re-insertion.txt

[ unplug your card]

# lspci -vvvxxx > lspci.after_2nd_removal.txt



OK, I've been using kernel 3.8.0-rc kernels so far, but given that is still 
under development, I've switched to 3.7.4, mainly because you are having 
success with 3.7.x, acpiphp and pcie_aspm=off. I verified the environment as 
follows:

[chris:~]$ cat /proc/cmdline
root=/dev/sda5 pcie_aspm=off ro resume=/dev/sda6
[chris:~]$ dmesg | grep ASPM
[0.00] PCIe ASPM is disabled
[0.348959]  pci:00: ACPI _OSC support notification failed, disabling 
PCIe ASPM
[chris:~]$ dmesg | grep acpiphp
[0.400846] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[chris:~]$ dmesg | grep pciehp
[chris:~]$ uname -a
Linux laptop 3.7.4 #13 SMP PREEMPT Sun Jan 27 18:39:39 GMT 2013 i686 GNU/Linux
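
The same check of which hotplug driver announced itself can be done on saved dmesg text. This is a sketch under the assumption that the driver name appears as a whole word in its log lines, as in the acpiphp line above. The function name is made up for illustration.

```python
import re

# Match the driver names as whole words, so 'acpiphp_glue' is not counted.
DRIVER_RE = re.compile(r"\b(acpiphp|pciehp)\b")


def hotplug_drivers_seen(dmesg_text):
    """Return the sorted list of hotplug driver names mentioned in dmesg output."""
    return sorted(set(DRIVER_RE.findall(dmesg_text)))
```

An empty list for pciehp, as in the transcript above, would mean that driver printed nothing.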



Then compare them using diff. These should have no difference:

diff lspci.after_insertion.txt lspci.after_1st_re-insertion.txt
diff lspci.after_1st_removal.txt lspci.after_2nd_removal.txt


Correct, there were no differences.
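
For anyone repeating the procedure, the comparison step can also be scripted. The sketch below is an assumption-laden stand-in for running diff by hand: it uses Python's difflib on two snapshot texts and returns only the removed and added lines. The function name is hypothetical.

```python
import difflib


def changed_lines(before_text, after_text):
    """Return only the removed ('-') and added ('+') lines between two snapshots."""
    diff = list(
        difflib.unified_diff(
            before_text.splitlines(),
            after_text.splitlines(),
            lineterm="",
            n=0,  # no context lines, only the changes themselves
        )
    )
    # The first two lines are the '---' / '+++' file headers; skip them,
    # then drop the '@@' hunk markers.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]
```

An empty list corresponds to the "no differences" result reported here.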



These may have only little difference, or none:

diff lspci.before_insertion.txt lspci.after_1st_removal.txt


263c263
<   LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 
<1us, L1 <16us
---
  >   LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 
<512ns, L1 <16us
265c265
&l

Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner

2013-01-27 Thread Chris Clayton

Thanks again, Martin.

Firstly, maybe we should remove the linux-media list from the copy list. 
I imagine this hotplug stuff is just noise to them.


[snip]

Do you have any other express card around to try if it works at all? Try that 
always after a cold boot.

Not at the moment, but I ordered a USB3 expresscard yesterday, so I 
will have one soon.



Posting a diff result of the below procedure might help:

# lspci -vvvxxx > lspci.before_insertion.txt

[plug your card into the slot]

# lspci -vvvxxx > lspci.after_insertion.txt

[ unplug your card]

# lspci -vvvxxx > lspci.after_1st_removal.txt

[re-plug your card into the slot]

# lspci -vvvxxx > lspci.after_1st_re-insertion.txt

[ unplug your card]

# lspci -vvvxxx > lspci.after_2nd_removal.txt



OK, I've been using kernel 3.8.0-rc kernels so far, but given that is 
still under development, I've switched to 3.7.4, mainly because you are 
having success with 3.7.x, acpiphp and pcie_aspm=off. I verified the 
environment as follows:


[chris:~]$ cat /proc/cmdline
root=/dev/sda5 pcie_aspm=off ro resume=/dev/sda6
[chris:~]$ dmesg | grep ASPM
[0.00] PCIe ASPM is disabled
[0.348959]  pci:00: ACPI _OSC support notification failed, 
disabling PCIe ASPM

[chris:~]$ dmesg | grep acpiphp
[0.400846] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[chris:~]$ dmesg | grep pciehp
[chris:~]$ uname -a
Linux laptop 3.7.4 #13 SMP PREEMPT Sun Jan 27 18:39:39 GMT 2013 i686 
GNU/Linux




Then compare them using diff. These should have no difference:

diff lspci.after_insertion.txt lspci.after_1st_re-insertion.txt
diff lspci.after_1st_removal.txt lspci.after_2nd_removal.txt


Correct, there were no differences.



These may have only little difference, or none:

diff lspci.before_insertion.txt lspci.after_1st_removal.txt


263c263
<   LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <1us, L1 <16us
---
>   LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <16us
265c265
<   LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk-
---
>   LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
267c267
<   LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
---
>   LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt-
273c273
<   Changed: MRL- PresDet- LinkState-
---
>   Changed: MRL- PresDet- LinkState+
295,296c295,296
< 40: 10 80 42 01 00 80 00 00 00 00 10 00 12 4c 12 04
< 50: 03 00 01 10 60 b2 1c 00 08 00 00 00 00 00 00 00
---
> 40: 10 80 42 01 00 80 00 00 00 00 10 00 12 3c 12 04
> 50: 40 00 11 50 60 b2 1c 00 08 00 00 01 00 00 00 00


diff lspci.after_1st_removal.txt lspci.after_2nd_removal.txt


No difference.



Finally, these should confirm whether the PresDet works for you (for me NOT 
with pciehp but does work with acpiphp).
You should see PresDet- to PresDet+ changes in:


Yes, I do see the PresDet- to PresDet+ changes


diff lspci.before_insertion.txt lspci.after_insertion.txt


263c263
<   LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <1us, L1 <16us
---
>   LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <16us
265c265
<   LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk-
---
>   LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
267c267
<   LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
---
>   LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
272,273c272,273
<   SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
<   Changed: MRL- PresDet- LinkState-
---
>   SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
>   Changed: MRL- PresDet- LinkState+
295,296c295,296
< 40: 10 80 42 01 00 80 00 00 00 00 10 00 12 4c 12 04
< 50: 03 00 01 10 60 b2 1c 00 08 00 00 00 00 00 00 00
---
> 40: 10 80 42 01 00 80 00 00 00 00 10 00 12 3c 12 04
> 50: 40 00 11 70 60 b2 1c 00 08 00 40 01 00 00 00 00


diff lspci.after_1st_removal.txt lspci.after_1st_re-insertion.txt

267c267
<   LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt-
---
>   LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
272c272
<   SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
---
>   SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
296c296
< 50: 40 00 11 50 60 b2 1c 00 08 00 00 01 00 00 00 00
---
> 50: 40 00 11 70 60 b2 1c 00 08 00 40 01 00 00 00 00
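
The PresDet check itself can be pulled out of a saved lspci -vvv snapshot without diffing. The following is a sketch, not a tested tool: it assumes the flag of interest is the first PresDet that follows "SltSta:" (the later one on the "Changed:" line is a different bit), and it looks only at the first slot it finds. The function name is made up.

```python
import re

# First PresDet flag after 'SltSta:'; re.S lets '.' span line breaks.
SLTSTA_RE = re.compile(r"SltSta:.*?PresDet([+-])", re.S)


def presence_detected(lspci_text):
    """Return True for PresDet+, False for PresDet-, None if no SltSta line exists."""
    match = SLTSTA_RE.search(lspci_text)
    if match is None:
        return None
    return match.group(1) == "+"
```

Run against the before and after snapshots, it should flip from False to True on insertion if presence detection works.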



You should see PresDet+ to PresDet- changes in:

Yes, I see those changes too.

diff lspci.aft

Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner

2013-01-27 Thread Chris Clayton



On 01/27/13 14:26, Martin Mokrejs wrote:

Chris Clayton wrote:



On 01/27/13 12:18, Yijing Wang wrote:

On 2013-01-27 19:19, Chris Clayton wrote:

Hi Yijing

On 01/27/13 02:45, Yijing Wang wrote:

On 2013-01-27 4:54, Chris Clayton wrote:

Hi Martin,

On 01/24/13 19:21, Martin Mokrejs wrote:

Hi Chris,
  try to include in kernel only acpiphp and omit pciehp. Don't use modules 
but include
them statically. And try, in addition, check whether "pcie_aspm=off" in 
grub.conf helped.



Thanks for the tip. I had the pciehp driver installed, but it was a module and 
not loaded. I didn't have acpiphp enabled at all. Building them both in 
statically, appears to have papered over the cracks of the oops :-)


Not loaded pciehp driver? Remove the device from this slot without poweroff ?



That's correct. When I first encountered the oops, I did not have the pciehp 
driver loaded and removing the device from the slot whilst the laptop was 
powered on resulted in the oops.


Hmm, that's unsafe and dangerous, because the device may still be running.
There are two ways to trigger pci hot-add or hot-remove in linux once the 
pciehp or acpiphp module is loaded
(only one of the two modules can be loaded into the system at a time). You can 
trigger hot-add/hot-remove through the
sysfs interface under /sys/bus/pci/slots/[slot-name]/power or with the attention button 
on the hardware (if your laptop supports that).
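
To make the sysfs route concrete, here is a minimal sketch. It assumes that writing "0" to a slot's power file requests power-off (hot-remove) and "1" requests power-on (hot-add), which is my reading of the kernel's sysfs interface rather than something stated in this thread. The function name is hypothetical, and on a real system the write needs root.

```python
from pathlib import Path


def set_slot_power(slot_name, on, slots_root="/sys/bus/pci/slots"):
    """Write '1' (power on) or '0' (power off) to the slot's power attribute.

    Assumption: '1'/'0' are the values the slot driver expects.
    Returns the path that was written, mainly to ease testing.
    """
    power = Path(slots_root) / slot_name / "power"
    power.write_text("1" if on else "0")
    return power
```

Powering the slot off this way before pulling the card is the orderly alternative to the surprise removal that caused the oops.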



OK, thanks for the advice.




  The best would be if you subscribed to linux-pci and read my recent threads
about similar issues I had with express cards on a Dell Vostro 3550. Further, 
there are
a lot of changes to PCI hotplug being made by Yinghai Lu and Rafael Wysocki; just 
browse the
archives of linux-pci and see the patches and the discussion.


Those discussions are way above my level of knowledge. I guess all this work 
will be merged into mainline in due course, so I'll watch for them in 3.9 or 
later. Unless, of course, there is a tree I could clone and help test the 
changes with my laptop and expresscard.

Hotplug isn't working at all on my Fujitsu laptop, so I can only get the card 
recognised by rebooting with the card inserted (or by writing 1 
to /sys/bus/pci/rescan). There seem to be a few reports on this in the kernel 
bugzilla, so I'll look through them and see what's being done.


Hi Chris,
  What about use #modprobe pciehp pciehp_debug=1 pciehp_poll_mode=1 
pciehp_poll_time=1 ?

Can you resend the dmesg log and "lspci -vvv" info after hotplug device from 
your Fujitsu laptop with above module parameters?



I wasn't sure whether or not the pciehp driver should be loaded on its own or 
with the acpiphp driver also loaded. So I built them both as modules and 
planned to try both, pciehp only and acpiphp only. However, I've found that 
acpiphp will not load (regardless of whether or not pciehp is already loaded). 
What I get is:

[chris:~]$ sudo modprobe acpiphp debug=1
modprobe: ERROR: could not insert 'acpiphp': No such device


Are you sure you had pciehp already loaded?


Yes, I'm sure it was.




Currently, if your hardware supports pciehp native hotplug, the acpiphp driver will 
be rejected when you try to load it
(you can force loading it by adding the boot parameter pcie_aspm=off, as Martin said).



OK, thanks again for the advice. I've disabled the acpiphp driver.


Pity. For me, detection of an express card in the slot works only with acpiphp. 
With pciehp
the PresDet is not updated properly upon removal/insertion and sometimes, 
probably as a result
of that, PresDet on the SltSta: line of lspci is not correct. So I 
moved away from pciehp.
I have a SandyBridge based laptop, so I was hoping that with your i5-based laptop you 
also have a great
chance to get rid of the pciehp issues.



I've just (very carefully) set this up again (i.e. no pciehp driver 
(module or builtin), acpiphp driver built in and pcie_aspm=off on the 
kernel command line (via grub). My card is not detected on insertion. :-(


Thanks for your suggestions, Martin. I am grateful.

Chris

Martin




Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner

2013-01-27 Thread Chris Clayton



On 01/27/13 12:18, Yijing Wang wrote:

On 2013-01-27 19:19, Chris Clayton wrote:

Hi Yijing

On 01/27/13 02:45, Yijing Wang wrote:

On 2013-01-27 4:54, Chris Clayton wrote:

Hi Martin,

On 01/24/13 19:21, Martin Mokrejs wrote:

Hi Chris,
 try to include in kernel only acpiphp and omit pciehp. Don't use modules 
but include
them statically. And try, in addition, check whether "pcie_aspm=off" in 
grub.conf helped.



Thanks for the tip. I had the pciehp driver installed, but it was a module and 
not loaded. I didn't have acpiphp enabled at all. Building them both in 
statically, appears to have papered over the cracks of the oops :-)


Not loaded pciehp driver? Remove the device from this slot without poweroff ?



That's correct. When I first encountered the oops, I did not have the pciehp 
driver loaded and removing the device from the slot whilst the laptop was 
powered on resulted in the oops.


Hmm, that's unsafe and dangerous, because the device may still be running.
There are two ways to trigger pci hot-add or hot-remove in linux once the 
pciehp or acpiphp module is loaded
(only one of the two modules can be loaded into the system at a time). You can 
trigger hot-add/hot-remove through the
sysfs interface under /sys/bus/pci/slots/[slot-name]/power or with the attention button 
on the hardware (if your laptop supports that).



OK, thanks for the advice.




 The best would be if you subscribed to linux-pci and read my recent threads
about similar issues I had with express cards on a Dell Vostro 3550. Further, 
there are
a lot of changes to PCI hotplug being made by Yinghai Lu and Rafael Wysocki; just 
browse the
archives of linux-pci and see the patches and the discussion.


Those discussions are way above my level of knowledge. I guess all this work 
will be merged into mainline in due course, so I'll watch for them in 3.9 or 
later. Unless, of course, there is a tree I could clone and help test the 
changes with my laptop and expresscard.

Hotplug isn't working at all on my Fujitsu laptop, so I can only get the card 
recognised by rebooting with the card inserted (or by writing 1 
to /sys/bus/pci/rescan). There seem to be a few reports on this in the kernel 
bugzilla, so I'll look through them and see what's being done.


Hi Chris,
 What about use #modprobe pciehp pciehp_debug=1 pciehp_poll_mode=1 
pciehp_poll_time=1 ?

Can you resend the dmesg log and "lspci -vvv" info after hotplug device from 
your Fujitsu laptop with above module parameters?



I wasn't sure whether or not the pciehp driver should be loaded on its own or 
with the acpiphp driver also loaded. So I built them both as modules and 
planned to try both, pciehp only and acpiphp only. However, I've found that 
acpiphp will not load (regardless of whether or not pciehp is already loaded). 
What I get is:

[chris:~]$ sudo modprobe acpiphp debug=1
modprobe: ERROR: could not insert 'acpiphp': No such device



Currently, if your hardware supports pciehp native hotplug, the acpiphp driver will 
be rejected when you try to load it
(you can force loading it by adding the boot parameter pcie_aspm=off, as Martin said).



OK, thanks again for the advice. I've disabled the acpiphp driver.


and at the end of the dmesg output I see:

[   68.199789] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[   68.199970] acpiphp_glue: Total 0 slots

The pciehp driver loads OK. I've attached pciehp-only which shows the dmesg and 
lspci output that you asked for.

As I said before, the only way that I can get the card detected without rebooting 
the laptop is to write 1 to /sys/bus/pci/rescan. In the hope that it might help 
(e.g. it shows details of the expresscard I'm using), I've also attached the 
output from dmesg and lspci after a rescan.


In this case, I guess your slot may be always powered on, so once you insert your 
pcie card and use the rescan interface, you can find it.

I checked the parent port device of the WinTV-HVR-1400 expresscard device, as 
below.
I found that powerctrl in the slot cap is clear. So I doubt the hardware supports pci 
hotplug.


Mmm, that's odd because I dual-boot Windows 7 on this laptop and when I 
plug it in under Windows 7, it appears in Device Manager and works 
perfectly.




Chris, Can you try to add and remove device by /sys/bus/pci/slots/3/power? (use 
#modprobe pciehp pciehp_debug=1)


/sys/bus/pci/slots is empty:

[chris:~]$ ls -l /sys/bus/pci/slots/
total 0

Using Google, I find that the acpi slot detection driver should populate 
that directory. It is configured in and built into the kernel 
statically, so I don't know what's happening there.


By the way, there is also /sys/bus/pci_express directory:

[chris:~]$ ls -Rl /sys/bus/pci_express/
/sys/bus/pci_express/:
total 0
drwxr-xr-x 2 root root    0 Jan 27 12:52 devices/
drwxr-xr-x 3 root root    0 Jan 27 13:07 drivers/
-rw-r--r-- 1 root root 4096 Jan 27 13:07 drivers_autoprobe
--w--- 1

Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner

2013-01-27 Thread Chris Clayton

Hi Martin,

On 01/26/13 21:14, Martin Mokrejs wrote:

Hi Chris,

Chris Clayton wrote:

Hi Martin,

On 01/24/13 19:21, Martin Mokrejs wrote:

Hi Chris,
try to include in kernel only acpiphp and omit pciehp. Don't use modules 
but include
them statically. And try, in addition, check whether "pcie_aspm=off" in 
grub.conf helped.



Thanks for the tip. I had the pciehp driver installed, but it was a module and 
not loaded. I didn't have acpiphp enabled at all. Building them both in 
statically, appears to have papered over the cracks of the oops :-)


The best would be if you subscribed to linux-pci and read my recent threads
about similar issues I had with express cards on a Dell Vostro 3550. Further, 
there are
a lot of changes to PCI hotplug being made by Yinghai Lu and Rafael Wysocki; just 
browse the
archives of linux-pci and see the patches and the discussion.


Those discussions are way above my level of knowledge. I guess all this work 
will be merged into mainline in due course, so I'll watch for them in 3.9 or 
later. Unless, of course, there is a tree I could clone and help test the 
changes with my laptop and expresscard.

Hotplug isn't working at all on my Fujitsu laptop, so I can only get the card 
recognised by rebooting with the card inserted (or by writing 1 
to /sys/bus/pci/rescan). There seem to be a few reports on this in the kernel 
bugzilla, so I'll look through them and see what's being done.


That's what I suspected. Compile acpiphp in statically, with no pciehp at all
(not even as a module).
Then it might work for you -- at least it does for me, provided I use
"pcie_aspm=off".

Thanks again for the suggestion. Unfortunately, that doesn't fix the 
problem on my laptop.


You may have seen the suggestion I've had from Yijing. I'm just building 
the kernel to test that out.


Chris

Martin



Thanks again.

Chris


Martin

Chris Clayton wrote:

Hi,

I've today taken delivery of a WinTV-HVR-1400 expresscard TV Tuner and got an Oops when I
removed it from the expresscard slot in my laptop. I will quite understand if the response
to this report is "don't do that!", but in that case, how should one remove one
of these cards?

I have attached three files:

1. the dmesg output from when I rebooted the machine after the oops. I have
turned debugging on in the dib7000p and cx23885 modules via module options in
/etc/modprobe.d/hvr1400.conf;

2. the .config file for the kernel that oopsed.

3. the text of the oops message. I've typed this up from a photograph of the 
screen because the laptop was locked up and there was nothing in the log files. 
Apologies for any typos, but I have tried to be careful.

Assuming the answer isn't "don't do that", let me know if I can provide any
additional diagnostics, test any patches, etc. Please, however, cc me as I'm 
not subscribed.

Chris




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner

2013-01-26 Thread Chris Clayton

Hi Martin,

On 01/24/13 19:21, Martin Mokrejs wrote:

Hi Chris,
   try to include only acpiphp in the kernel and omit pciehp. Don't use modules
but build them in statically. In addition, check whether "pcie_aspm=off" in
grub.conf helps.



Thanks for the tip. I had the pciehp driver installed, but it was a
module and not loaded. I didn't have acpiphp enabled at all. Building
them both in statically appears to have papered over the cracks of the
oops :-)



   The best would be to subscribe to linux-pci and read my recent threads
about similar issues I had with express cards on a Dell Vostro 3550. Further,
there have been a lot of changes to PCI hotplug by Yinghai Lu and Rafael Wysocki;
just browse the archives of linux-pci and see the patches and the discussion.


Those discussions are way above my level of knowledge. I guess all this 
work will be merged into mainline in due course, so I'll watch for them 
in 3.9 or later. Unless, of course, there is a tree I could clone and 
help test the changes with my laptop and expresscard.


Hotplug isn't working at all on my Fujitsu laptop, so I can only get the
card recognised by rebooting with the card inserted (or by writing 1
to /sys/bus/pci/rescan). There seem to be a few reports on this in the
kernel bugzilla, so I'll look through them and see what's being done.
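
As an aside, the manual rescan mentioned above amounts to writing the character 1 to a sysfs attribute. Below is a minimal C sketch of that write, assuming root privileges and a kernel that exposes /sys/bus/pci/rescan. The helper names write_attr() and pci_rescan() are made up for illustration and are not part of any kernel or library API.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write a short string to a sysfs-style attribute file.
 * Returns 0 on success and -1 on failure.
 * (Hypothetical helper, named here for illustration only.)
 */
int write_attr(const char *path, const char *value)
{
    assert(path != NULL && value != NULL);

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    int rc = 0;
    if (fputs(value, f) == EOF)
        rc = -1;
    /* fclose() flushes the buffer, so a rejected write may only show up here. */
    if (fclose(f) == EOF)
        rc = -1;
    return rc;
}

/* Ask the PCI core to rescan all buses; normally requires root. */
int pci_rescan(void)
{
    return write_attr("/sys/bus/pci/rescan", "1");
}
```

Calling pci_rescan() from a small program run as root should be roughly equivalent to echoing 1 into the file from a shell. Whether the card is then detected still depends on the hotplug drivers discussed in this thread.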


Thanks again.

Chris


Martin

Chris Clayton wrote:

Hi,

I've today taken delivery of a WinTV-HVR-1400 expresscard TV Tuner and got an Oops when I
removed it from the expresscard slot in my laptop. I will quite understand if the response
to this report is "don't do that!", but in that case, how should one remove one
of these cards?

I have attached three files:

1. the dmesg output from when I rebooted the machine after the oops. I have
turned debugging on in the dib7000p and cx23885 modules via module options in
/etc/modprobe.d/hvr1400.conf;

2. the .config file for the kernel that oopsed.

3. the text of the oops message. I've typed this up from a photograph of the 
screen because the laptop was locked up and there was nothing in the log files. 
Apologies for any typos, but I have tried to be careful.

Assuming the answer isn't "don't do that", let me know if I can provide any
additional diagnostics, test any patches, etc. Please, however, cc me as I'm 
not subscribed.

Chris



Re: [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages occupied by memmap

2012-12-04 Thread Chris Clayton



On 12/03/12 23:17, Andrew Morton wrote:

On Sun, 02 Dec 2012 19:55:09 +
Chris Clayton  wrote:




On 11/29/12 10:52, Chris Clayton wrote:

On 11/28/12 23:52, Andrew Morton wrote:

On Wed, 21 Nov 2012 23:09:46 +0800
Jiang Liu  wrote:


Subject: Re: [RFT PATCH v2 4/5] mm: provide more accurate estimation
of pages occupied by memmap


How are people to test this?  "does it boot"?



I've been running kernels with Gerry's 5 patches applied for 11 days
now. This is on a 64bit laptop but with a 32bit kernel + HIGHMEM. I
joined the conversation because my laptop would not resume from suspend
to disk - it either froze or rebooted. With the patches applied the
laptop does successfully resume and has been stable.

Since Monday, I have been running a kernel with the patches (plus,
from today, the patch you mailed yesterday) applied to 3.7-rc7, without
problems.



I've been running 3.7-rc7 with the patches listed below for a week now
and it has been perfectly stable. In particular, my laptop will now
successfully resume from suspend to disk, which always failed without
the patches.

  From Jiang Liu:
1. [RFT PATCH v2 1/5] mm: introduce new field "managed_pages" to struct zone
2. [RFT PATCH v1 2/5] mm: replace zone->present_pages with
zone->managed_pages if appreciated
3. [RFT PATCH v1 3/5] mm: set zone->present_pages to number of existing
pages in the zone
4. [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages
occupied by memmap
5. [RFT PATCH v1 5/5] mm: increase totalram_pages when free pages
allocated by bootmem allocator

  From Andrew Morton:
6. mm-provide-more-accurate-estimation-of-pages-occupied-by-memmap.patch

Tested-by: Chris Clayton 


Thanks.

I have only two of these five patches queued for 3.8:
mm-introduce-new-field-managed_pages-to-struct-zone.patch and
mm-provide-more-accurate-estimation-of-pages-occupied-by-memmap.patch.
I don't recall what happened with the other three.



Gerry posted version 1 of the five patches on 18 November. Version 2 of 
the first and fourth patches were posted on 21 November.


BTW, the title of Andrew's patch that I applied is wrong above. It 
should be 
mm-provide-more-accurate-estimation-of-pages-occupied-by-memmap-fix.


Chris




Re: [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages occupied by memmap

2012-12-02 Thread Chris Clayton



On 12/02/12 19:55, Chris Clayton wrote:



On 11/29/12 10:52, Chris Clayton wrote:

On 11/28/12 23:52, Andrew Morton wrote:

On Wed, 21 Nov 2012 23:09:46 +0800
Jiang Liu  wrote:


Subject: Re: [RFT PATCH v2 4/5] mm: provide more accurate estimation
of pages occupied by memmap


How are people to test this?  "does it boot"?



I've been running kernels with Gerry's 5 patches applied for 11 days
now. This is on a 64bit laptop but with a 32bit kernel + HIGHMEM. I
joined the conversation because my laptop would not resume from suspend
to disk - it either froze or rebooted. With the patches applied the
laptop does successfully resume and has been stable.

Since Monday, I have been running a kernel with the patches (plus,
from today, the patch you mailed yesterday) applied to 3.7-rc7, without
problems.



I've been running 3.7-rc7 with the patches listed below for a week now
and it has been perfectly stable. In particular, my laptop will now
successfully resume from suspend to disk, which always failed without
the patches.



I should have said, of course, that it was -rc6 and earlier that would 
not boot without Jiang Liu's patches. I applied those patches to -rc6
and my resume after suspend to disk problem was fixed. For a subsequent 
week I have been running with the patches applied to -rc7, with Andrew's 
patch also applied for the last 3 days. -rc7 was not subject to the 
resume problem because the patch which broke it had been reverted.
All this has been on a 64bit laptop, but running a 32bit kernel with 
HIGHMEM.


Apologies for yesterday's inaccuracy. I shouldn't send testing reports 
when I'm in a hurry.



 From Jiang Liu:
1. [RFT PATCH v2 1/5] mm: introduce new field "managed_pages" to struct
zone
2. [RFT PATCH v1 2/5] mm: replace zone->present_pages with
zone->managed_pages if appreciated
3. [RFT PATCH v1 3/5] mm: set zone->present_pages to number of existing
pages in the zone
4. [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages
occupied by memmap
5. [RFT PATCH v1 5/5] mm: increase totalram_pages when free pages
allocated by bootmem allocator

 From Andrew Morton:
6. mm-provide-more-accurate-estimation-of-pages-occupied-by-memmap.patch

Tested-by: Chris Clayton 


Thanks,
Chris





Re: [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages occupied by memmap

2012-12-02 Thread Chris Clayton



On 11/29/12 10:52, Chris Clayton wrote:

On 11/28/12 23:52, Andrew Morton wrote:

On Wed, 21 Nov 2012 23:09:46 +0800
Jiang Liu  wrote:


Subject: Re: [RFT PATCH v2 4/5] mm: provide more accurate estimation
of pages occupied by memmap


How are people to test this?  "does it boot"?



I've been running kernels with Gerry's 5 patches applied for 11 days
now. This is on a 64bit laptop but with a 32bit kernel + HIGHMEM. I
joined the conversation because my laptop would not resume from suspend
to disk - it either froze or rebooted. With the patches applied the
laptop does successfully resume and has been stable.

Since Monday, I have been running a kernel with the patches (plus,
from today, the patch you mailed yesterday) applied to 3.7-rc7, without
problems.



I've been running 3.7-rc7 with the patches listed below for a week now 
and it has been perfectly stable. In particular, my laptop will now 
successfully resume from suspend to disk, which always failed without 
the patches.


From Jiang Liu:
1. [RFT PATCH v2 1/5] mm: introduce new field "managed_pages" to struct zone
2. [RFT PATCH v1 2/5] mm: replace zone->present_pages with 
zone->managed_pages if appreciated
3. [RFT PATCH v1 3/5] mm: set zone->present_pages to number of existing 
pages in the zone
4. [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages 
occupied by memmap
5. [RFT PATCH v1 5/5] mm: increase totalram_pages when free pages 
allocated by bootmem allocator


From Andrew Morton:
6. mm-provide-more-accurate-estimation-of-pages-occupied-by-memmap.patch

Tested-by: Chris Clayton 


Thanks,
Chris





Re: [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages occupied by memmap

2012-11-29 Thread Chris Clayton

On 11/28/12 23:52, Andrew Morton wrote:

On Wed, 21 Nov 2012 23:09:46 +0800
Jiang Liu  wrote:


Subject: Re: [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages 
occupied by memmap


How are people to test this?  "does it boot"?



I've been running kernels with Gerry's 5 patches applied for 11 days 
now. This is on a 64bit laptop but with a 32bit kernel + HIGHMEM. I 
joined the conversation because my laptop would not resume from suspend 
to disk - it either froze or rebooted. With the patches applied the 
laptop does successfully resume and has been stable.


Since Monday, I have been running a kernel with the patches (plus,
from today, the patch you mailed yesterday) applied to 3.7-rc7, without
problems.


Thanks,
Chris


Re: [RFT PATCH v1 0/5] fix up inaccurate zone->present_pages

2012-11-26 Thread Chris Clayton



On 11/22/12 09:23, Chris Clayton wrote:

This patchset has only been tested on x86_64 with nobootmem.c, so help is
needed to test it on machines that:
1) use bootmem.c
2) have highmem

This patchset applies to "f4a75d2e Linux 3.7-rc6" from
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git



I've applied the five patches to Linus' 3.7.0-rc6 and can confirm that
the kernel allows my system to resume from a suspend to disc. Although
my laptop is 64 bit, I run a 32 bit kernel with HIGHMEM (I have 8GB RAM):

[chris:~/kernel/tmp/linux-3.7-rc6-resume]$ grep -E HIGHMEM\|X86_32 .config
CONFIG_X86_32=y
CONFIG_X86_32_SMP=y
CONFIG_X86_32_LAZY_GS=y
# CONFIG_X86_32_IRIS is not set
# CONFIG_NOHIGHMEM is not set
# CONFIG_HIGHMEM4G is not set
CONFIG_HIGHMEM64G=y
CONFIG_HIGHMEM=y

I can also say that a quick browse of the output of dmesg, shows nothing
out of the ordinary. I have insufficient knowledge to comment on the
patches, but I will run the kernel over the next few days and report
back later in the week.



Well, I've been running the kernel since Sunday and have had no problems
with my normal work mix of browsing the internet, video
editing, listening to music and building software. I'm now running a
kernel built with the new patches 1 and 4 from yesterday (plus the
original 1, 2 and 5). All seems OK so far, including a couple of resumes
from suspend to disk.




-rc6 with Gerry's patches has run fine, including numerous resumes from 
suspend to disk, which fails (freezing or rebooting) without the 
patches. I've now applied the patches to -rc7 (they apply with a few 
offsets, but look OK) and will run that for a day or two.



Re: [RFT PATCH v1 0/5] fix up inaccurate zone->present_pages

2012-11-22 Thread Chris Clayton

This patchset has only been tested on x86_64 with nobootmem.c, so help is
needed to test it on machines that:
1) use bootmem.c
2) have highmem

This patchset applies to "f4a75d2e Linux 3.7-rc6" from
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git



I've applied the five patches to Linus' 3.7.0-rc6 and can confirm that
the kernel allows my system to resume from a suspend to disc. Although
my laptop is 64 bit, I run a 32 bit kernel with HIGHMEM (I have 8GB RAM):

[chris:~/kernel/tmp/linux-3.7-rc6-resume]$ grep -E HIGHMEM\|X86_32 .config
CONFIG_X86_32=y
CONFIG_X86_32_SMP=y
CONFIG_X86_32_LAZY_GS=y
# CONFIG_X86_32_IRIS is not set
# CONFIG_NOHIGHMEM is not set
# CONFIG_HIGHMEM4G is not set
CONFIG_HIGHMEM64G=y
CONFIG_HIGHMEM=y

I can also say that a quick browse of the output of dmesg, shows nothing
out of the ordinary. I have insufficient knowledge to comment on the
patches, but I will run the kernel over the next few days and report
back later in the week.



Well, I've been running the kernel since Sunday and have had no problems
with my normal work mix of browsing the internet, video
editing, listening to music and building software. I'm now running a
kernel built with the new patches 1 and 4 from yesterday (plus the
original 1, 2 and 5). All seems OK so far, including a couple of resumes
from suspend to disk.



Re: [RFT PATCH v1 0/5] fix up inaccurate zone->present_pages

2012-11-18 Thread Chris Clayton

Hi Gerry.

On 11/18/12 16:07, Jiang Liu wrote:

The commit 7f1290f2f2a4 ("mm: fix-up zone present pages") tries to
resolve an issue caused by inaccurate zone->present_pages, but that
fix is incomplete and causes regressions with HIGHMEM. And it has been
reverted by commit
5576646 revert "mm: fix-up zone present pages"

This is a follow-up patchset for the issue above. It introduces a
new field named "managed_pages" to struct zone, which counts the pages
managed by the buddy system in the zone. And zone->present_pages
is used to count the pages existing in the zone, which is
spanned_pages - absent_pages.

That way, zone->present_pages is kept consistent with
pgdat->node_present_pages, which is the sum of zone->present_pages.
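
The relationship between the counters described above can be sketched in plain C as follows. This is only an illustration under stated assumptions: struct zone_counts is a made-up stand-in for the kernel's struct zone, and absent_pages is shown as a stored field purely for clarity, without implying that the kernel keeps it that way.

```c
#include <assert.h>

/*
 * Simplified, hypothetical stand-in for the per-zone page counters.
 * The real struct zone (include/linux/mmzone.h) has many more fields.
 */
struct zone_counts {
    unsigned long spanned_pages;  /* pfn range covered by the zone, holes included */
    unsigned long absent_pages;   /* holes inside that range (illustrative field) */
    unsigned long managed_pages;  /* pages handed over to the buddy allocator */
};

/* present_pages = spanned_pages - absent_pages */
unsigned long zone_present_pages(const struct zone_counts *z)
{
    assert(z->absent_pages <= z->spanned_pages);
    return z->spanned_pages - z->absent_pages;
}

/* The node-wide total is the sum of the per-zone present_pages. */
unsigned long node_present_pages(const struct zone_counts *zones, int nr_zones)
{
    unsigned long total = 0;

    for (int i = 0; i < nr_zones; i++)
        total += zone_present_pages(&zones[i]);
    return total;
}
```

For example, a zone spanning 1000 pages with 100 absent has 900 present pages. Its managed_pages would be expected to be no larger than that, since pages reserved during boot exist in the zone but are never handed to the buddy allocator.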

This patchset has only been tested on x86_64 with nobootmem.c, so help is
needed to test it on machines that:
1) use bootmem.c
2) have highmem

This patchset applies to "f4a75d2e Linux 3.7-rc6" from
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git



I've applied the five patches to Linus' 3.7.0-rc6 and can confirm that 
the kernel allows my system to resume from a suspend to disc. Although 
my laptop is 64 bit, I run a 32 bit kernel with HIGHMEM (I have 8GB RAM):


[chris:~/kernel/tmp/linux-3.7-rc6-resume]$ grep -E HIGHMEM\|X86_32 .config
CONFIG_X86_32=y
CONFIG_X86_32_SMP=y
CONFIG_X86_32_LAZY_GS=y
# CONFIG_X86_32_IRIS is not set
# CONFIG_NOHIGHMEM is not set
# CONFIG_HIGHMEM4G is not set
CONFIG_HIGHMEM64G=y
CONFIG_HIGHMEM=y

I can also say that a quick browse of the output of dmesg, shows nothing 
out of the ordinary. I have insufficient knowledge to comment on the 
patches, but I will run the kernel over the next few days and report 
back later in the week.


Chris


Any comments and help are welcome!

Jiang Liu (5):
   mm: introduce new field "managed_pages" to struct zone
   mm: replace zone->present_pages with zone->managed_pages if
 appreciated
   mm: set zone->present_pages to number of existing pages in the zone
   mm: provide more accurate estimation of pages occupied by memmap
   mm: increase totalram_pages when free pages allocated by bootmem
 allocator

  include/linux/mmzone.h |    1 +
  mm/bootmem.c           |   14
  mm/memory_hotplug.c    |    6
  mm/mempolicy.c         |    2 +-
  mm/nobootmem.c         |   15
  mm/page_alloc.c        |   89 +++-
  mm/vmscan.c            |   16 -
  mm/vmstat.c            |    8 +++--
  8 files changed, 108 insertions(+), 43 deletions(-)




Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d

2012-11-15 Thread Chris Clayton



On 11/15/12 19:24, Andrew Morton wrote:

On Wed, 14 Nov 2012 22:52:03 +0800
Jiang Liu  wrote:


So how about totally reverting the changeset
7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125
and I will post another version once I find a cleaner way?


We do need to get this regression fixed and I guess that a
straightforward revert is an acceptable way of doing that, for now.


I queued the below, with a plan to send it to Linus next week.



I've applied this patch to v3.7-rc5-28-g79e979e and can confirm that it 
fixes the problem I had with my laptop failing to resume (by either 
freezing or rebooting) after a suspend to disk.


Tested-by: Chris Clayton 



From: Andrew Morton 
Subject: revert "mm: fix-up zone present pages"

Revert

commit 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125
Author: Jianguo Wu 
AuthorDate: Mon Oct 8 16:33:06 2012 -0700
Commit: Linus Torvalds 
CommitDate: Tue Oct 9 16:22:54 2012 +0900

 mm: fix-up zone present pages


That patch tried to fix an issue when calculating zone->present_pages, but
it caused a regression on 32bit systems with HIGHMEM.  With that
changeset, reset_zone_present_pages() resets all zone->present_pages to
zero, and fixup_zone_present_pages() is called to recalculate
zone->present_pages when the boot allocator frees core memory pages into
buddy allocator.  Because highmem pages are not freed by bootmem
allocator, all highmem zones' present_pages become zero.

Various options for improving the situation are being discussed but for
now, let's return to the 3.6 code.

Cc: Jianguo Wu 
Cc: Jiang Liu 
Cc: Petr Tesarik 
Cc: "Luck, Tony" 
Cc: Mel Gorman 
Cc: Yinghai Lu 
Cc: Minchan Kim 
Cc: Johannes Weiner 
Cc: David Rientjes 
Signed-off-by: Andrew Morton 
---

  arch/ia64/mm/init.c |    1 -
  include/linux/mm.h  |    4
  mm/bootmem.c        |   10 +-
  mm/memory_hotplug.c |    7 ---
  mm/nobootmem.c      |    3 ---
  mm/page_alloc.c     |   34 --
  6 files changed, 1 insertion(+), 58 deletions(-)

diff -puN arch/ia64/mm/init.c~revert-1 arch/ia64/mm/init.c
--- a/arch/ia64/mm/init.c~revert-1
+++ a/arch/ia64/mm/init.c
@@ -637,7 +637,6 @@ mem_init (void)

high_memory = __va(max_low_pfn * PAGE_SIZE);

-   reset_zone_present_pages();
for_each_online_pgdat(pgdat)
if (pgdat->bdata->node_bootmem_map)
totalram_pages += free_all_bootmem_node(pgdat);
diff -puN include/linux/mm.h~revert-1 include/linux/mm.h
--- a/include/linux/mm.h~revert-1
+++ a/include/linux/mm.h
@@ -1684,9 +1684,5 @@ static inline unsigned int debug_guardpa
  static inline bool page_is_guard(struct page *page) { return false; }
  #endif /* CONFIG_DEBUG_PAGEALLOC */

-extern void reset_zone_present_pages(void);
-extern void fixup_zone_present_pages(int nid, unsigned long start_pfn,
-   unsigned long end_pfn);
-
  #endif /* __KERNEL__ */
  #endif /* _LINUX_MM_H */
diff -puN mm/bootmem.c~revert-1 mm/bootmem.c
--- a/mm/bootmem.c~revert-1
+++ a/mm/bootmem.c
@@ -198,8 +198,6 @@ static unsigned long __init free_all_boo
int order = ilog2(BITS_PER_LONG);

__free_pages_bootmem(pfn_to_page(start), order);
-   fixup_zone_present_pages(page_to_nid(pfn_to_page(start)),
-   start, start + BITS_PER_LONG);
count += BITS_PER_LONG;
start += BITS_PER_LONG;
} else {
@@ -210,9 +208,6 @@ static unsigned long __init free_all_boo
if (vec & 1) {
page = pfn_to_page(start + off);
__free_pages_bootmem(page, 0);
-   fixup_zone_present_pages(
-   page_to_nid(page),
-   start + off, start + off + 1);
count++;
}
vec >>= 1;
@@ -226,11 +221,8 @@ static unsigned long __init free_all_boo
pages = bdata->node_low_pfn - bdata->node_min_pfn;
pages = bootmem_bootmap_pages(pages);
count += pages;
-   while (pages--) {
-   fixup_zone_present_pages(page_to_nid(page),
-   page_to_pfn(page), page_to_pfn(page) + 1);
+   while (pages--)
__free_pages_bootmem(page++, 0);
-   }

bdebug("nid=%td released=%lx\n", bdata - bootmem_node_data, count);

diff -puN mm/memory_hotplug.c~revert-1 mm/memory_hotplug.c
--- a/mm/memory_hotplug.c~revert-1
+++ a/mm/memory_hotplug.c
@@ -106,7 +106,6 @@ static void get_page_bootmem(unsigned lo
  void __ref put_page_bootmem(struct page *page)
  {
unsigned long type;
-   struct zon

Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d

2012-11-06 Thread Chris Clayton



On 11/06/12 01:31, Jiang Liu wrote:

Changeset 7f1290f2f2 tries to fix an issue when calculating
zone->present_pages, but it causes a regression on 32bit systems with
HIGHMEM. With that changeset, function reset_zone_present_pages()
resets all zone->present_pages to zero, and fixup_zone_present_pages()
is called to recalculate zone->present_pages when boot allocator frees
core memory pages into buddy allocator. Because highmem pages are not
freed by bootmem allocator, all highmem zones' present_pages become
zero.

Actually there's no need to recalculate present_pages for highmem zone
because bootmem allocator never allocates pages from them. So fix the
regression by skipping highmem in function reset_zone_present_pages()
and fixup_zone_present_pages().

Signed-off-by: Jiang Liu 
Signed-off-by: Jianguo Wu 
Reported-by: Maciej Rutecki 
Tested-by: Maciej Rutecki 
Cc: Chris Clayton 
Cc: Rafael J. Wysocki 
Cc: Andrew Morton 
Cc: Mel Gorman 
Cc: Minchan Kim 
Cc: KAMEZAWA Hiroyuki 
Cc: Michal Hocko 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org

---

Hi Maciej,
Thanks for reporting and bisecting. We have analyzed the regression
and worked out a patch for it. Could you please help to verify whether it
fixes the regression?
Thanks!
Gerry



Thanks Gerry.

I've applied this patch to 3.7.0-rc4 and can confirm that it fixes the 
problem I had with my laptop failing to resume after a suspend to disk.


Tested-by: Chris Clayton 


---
  mm/page_alloc.c |8 +---
  1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5b74de6..2311f15 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6108,7 +6108,8 @@ void reset_zone_present_pages(void)
for_each_node_state(nid, N_HIGH_MEMORY) {
for (i = 0; i < MAX_NR_ZONES; i++) {
z = NODE_DATA(nid)->node_zones + i;
-   z->present_pages = 0;
+   if (!is_highmem(z))
+   z->present_pages = 0;
}
}
  }
@@ -6123,10 +6124,11 @@ void fixup_zone_present_pages(int nid, unsigned long start_pfn,

for (i = 0; i < MAX_NR_ZONES; i++) {
z = NODE_DATA(nid)->node_zones + i;
+   if (is_highmem(z))
+   continue;
+
zone_start_pfn = z->zone_start_pfn;
zone_end_pfn = zone_start_pfn + z->spanned_pages;
-
-   /* if the two regions intersect */
if (!(zone_start_pfn >= end_pfn  || zone_end_pfn <= start_pfn))
z->present_pages += min(end_pfn, zone_end_pfn) -
max(start_pfn, zone_start_pfn);
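
For readers following the arithmetic in the hunk above, here is a standalone C sketch of the same idea: add the overlap between a freed pfn range and each zone's span, and skip highmem zones. It is not the kernel code. struct mini_zone, pfn_overlap() and fixup_present_pages() are hypothetical names, and the highmem flag stands in for the kernel's is_highmem() check.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Number of pfns shared by the half-open ranges [a_start, a_end) and
 * [b_start, b_end); returns 0 when the ranges do not intersect.
 */
unsigned long pfn_overlap(unsigned long a_start, unsigned long a_end,
                          unsigned long b_start, unsigned long b_end)
{
    assert(a_start <= a_end && b_start <= b_end);

    if (a_start >= b_end || a_end <= b_start)
        return 0;

    unsigned long lo = a_start > b_start ? a_start : b_start;
    unsigned long hi = a_end < b_end ? a_end : b_end;
    return hi - lo;
}

/* Hypothetical miniature zone; only the fields this sketch needs. */
struct mini_zone {
    unsigned long zone_start_pfn;
    unsigned long spanned_pages;
    unsigned long present_pages;
    bool highmem;               /* stand-in for is_highmem(zone) */
};

/* Credit the freed range [start_pfn, end_pfn) to every non-highmem zone it overlaps. */
void fixup_present_pages(struct mini_zone *zones, int nr_zones,
                         unsigned long start_pfn, unsigned long end_pfn)
{
    for (int i = 0; i < nr_zones; i++) {
        struct mini_zone *z = &zones[i];

        if (z->highmem)
            continue;           /* highmem counts are left untouched */

        z->present_pages += pfn_overlap(z->zone_start_pfn,
                                        z->zone_start_pfn + z->spanned_pages,
                                        start_pfn, end_pfn);
    }
}
```

With a lowmem zone covering pfns 0 to 99 and a highmem zone covering 100 to 199, freeing the range 50 to 149 credits 50 pages to the lowmem zone and leaves the highmem zone's count unchanged, which is the behaviour the patch description aims for.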




linux-3.7.0-rc3+: Warning found in log files

2012-11-05 Thread Chris Clayton

Hi,

Whilst browsing the log file on my laptop (looking for diagnostics for a 
problem I have resuming from a suspend to disc), I spotted the warning 
below.


I see that vlc is mentioned. I did have a problem yesterday when, 
through vlc's UPNP functionality, I was unable to play a video stored on 
my NAS. I can't be sure whether that is related to this warning because 
I parked it and got on with investigating my resume problem. I've had
a few goes at playing the video this morning and I still can't get it to
play via upnp, but the warning hasn't appeared, so this may be a red
herring.


Nov  4 10:26:14 laptop kernel: [ cut here ]
Nov  4 10:26:15 laptop kernel: WARNING: at arch/x86/kernel/apic/ipi.c:109 default_send_IPI_mask_logical+0xa6/0xd0()
Nov  4 10:26:15 laptop kernel: Hardware name: LIFEBOOK AH531
Nov  4 10:26:15 laptop kernel: empty IPI mask
Nov  4 10:26:15 laptop kernel: Modules linked in: r8169 iptable_filter xt_conntrack ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack coretemp hwmon kvm_intel snd_hda_codec_hdmi usb_storage kvm snd_hda_codec_realtek snd_hda_intel snd_hda_codec fujitsu_laptop [last unloaded: r8169]
Nov  4 10:26:15 laptop kernel: Pid: 25461, comm: vlc Not tainted 3.7.0-rc3+ #39
Nov  4 10:26:15 laptop kernel: Call Trace:
Nov  4 10:26:15 laptop kernel:  [] ? warn_slowpath_common+0x78/0xb0
Nov  4 10:26:15 laptop kernel:  [] ? default_send_IPI_mask_logical+0xa6/0xd0
Nov  4 10:26:15 laptop kernel:  [] ? default_send_IPI_mask_logical+0xa6/0xd0
Nov  4 10:26:15 laptop kernel:  [] ? warn_slowpath_fmt+0x33/0x40
Nov  4 10:26:15 laptop kernel:  [] ? default_send_IPI_mask_logical+0xa6/0xd0
Nov  4 10:26:15 laptop kernel:  [] ? native_send_call_func_ipi+0x3f/0x60
Nov  4 10:26:15 laptop kernel:  [] ? smp_call_function_many+0x178/0x220
Nov  4 10:26:15 laptop kernel:  [] ? do_flush_tlb_all+0x60/0x60
Nov  4 10:26:15 laptop kernel:  [] ? native_flush_tlb_others+0x28/0x30
Nov  4 10:26:15 laptop kernel:  [] ? flush_tlb_page+0x82/0xd0
Nov  4 10:26:15 laptop kernel:  [] ? ptep_set_access_flags+0x61/0x70
Nov  4 10:26:15 laptop kernel:  [] ? do_wp_page+0x33a/0x7f0
Nov  4 10:26:15 laptop kernel:  [] ? enqueue_task_fair+0x9f/0x1e0
Nov  4 10:26:15 laptop kernel:  [] ? handle_pte_fault+0x49d/0x870
Nov  4 10:26:15 laptop kernel:  [] ? move_addr_to_user+0x69/0x80
Nov  4 10:26:15 laptop kernel:  [] ? sys_getpeername+0x65/0xa0
Nov  4 10:26:15 laptop kernel:  [] ? handle_mm_fault+0xda/0x140
Nov  4 10:26:15 laptop kernel:  [] ? __do_page_fault+0x141/0x460
Nov  4 10:26:15 laptop kernel:  [] ? generic_smp_call_function_interrupt+0x73/0x160
Nov  4 10:26:16 laptop kernel:  [] ? irq_exit+0x54/0x90
Nov  4 10:26:16 laptop kernel:  [] ? call_function_interrupt+0x2d/0x34
Nov  4 10:26:16 laptop kernel:  [] ? vmalloc_sync_all+0x10/0x10
Nov  4 10:26:16 laptop kernel:  [] ? error_code+0x5a/0x60
Nov  4 10:26:16 laptop kernel:  [] ? vmalloc_sync_all+0x10/0x10
Nov  4 10:26:16 laptop kernel: ---[ end trace ceb5cce1ffac0038 ]---

Let me know if I can provide any additional diagnostics, but please cc 
me as I'm not subscribed.


Chris

