Re: linux-5.10.11 build failure
Hi Greg, On 29/01/2021 15:14, Josh Poimboeuf wrote: > On Fri, Jan 29, 2021 at 12:09:53PM +0100, Greg Kroah-Hartman wrote: >> On Fri, Jan 29, 2021 at 11:03:26AM +0000, Chris Clayton wrote: >>> >>> >>> On 29/01/2021 10:11, Greg Kroah-Hartman wrote: >>>> On Thu, Jan 28, 2021 at 10:00:15AM -0600, Josh Poimboeuf wrote: ... >>>> >>>> It is in Linus's tree now :) >>>> >>>> Now grabbed. >>>> >>> >>> Are you sure, Greg? I don't see the patch in Linus' tree at >>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git. Nor do >>> I see it in your stable queue at >>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/. >>> For clarity, I've attached the patch which >>> fixes the problem I reported and is currently sat in >>> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git. As I >>> understand it, the patch is scheduled to be included in a pull request to >>> Linus this weekend in time for -rc6. >>> >>> In fact, I did a pull from Linus' tree a few minutes ago and the build >>> failed in the way I reported in this thread. I >>> added the patch and the build now succeeds. >> >> Ok, sorry, no, I grabbed 1d489151e9f9 ("objtool: Don't fail on missing >> symbol table") which is what Josh asked me to take. I got that confused >> here. > > I'm probably responsible for that confusion, I got mixed up myself. > It'll be a good idea to take both anyway. > The patch is now in Linus' tree as commit 5e6dca82bcaa49348f9e5fcb48df4881f6d6c4ae. Thanks. Chris
Re: linux-5.10.11 build failure
On 28/01/2021 15:52, Josh Poimboeuf wrote: > On Thu, Jan 28, 2021 at 11:24:47AM +0000, Thomas Backlund wrote: >> On 28.1.2021 at 12:05, Chris Clayton wrote: >>> >>> On 28/01/2021 09:34, Greg Kroah-Hartman wrote: >>>> On Thu, Jan 28, 2021 at 09:17:10AM +0000, Chris Clayton wrote: >>>>> Hi, >>>>> >>>>> Building 5.10.11 fails on my (x86-64) laptop thusly: >>>>> >>>>> .. >>>>> >>>>> AS arch/x86/entry/thunk_64.o >>>>>CC arch/x86/entry/vsyscall/vsyscall_64.o >>>>>AS arch/x86/realmode/rm/header.o >>>>>CC arch/x86/mm/pat/set_memory.o >>>>>CC arch/x86/events/amd/core.o >>>>>CC arch/x86/kernel/fpu/init.o >>>>>CC arch/x86/entry/vdso/vma.o >>>>>CC kernel/sched/core.o >>>>> arch/x86/entry/thunk_64.o: warning: objtool: missing symbol for insn at >>>>> offset 0x3e >>>>> >>>>>AS arch/x86/realmode/rm/trampoline_64.o >>>>> make[2]: *** [scripts/Makefile.build:360: arch/x86/entry/thunk_64.o] >>>>> Error 255 >>>>> make[2]: *** Deleting file 'arch/x86/entry/thunk_64.o' >>>>> make[2]: *** Waiting for unfinished jobs >>>>> >>>>> .. >>>>> >>>>> Compiler is latest snapshot of gcc-10. >>>>> >>>>> Happy to test the fix but please cc me as I'm not subscribed >>>> >>>> Can you do 'git bisect' to track down the offending commit? >>>> >>> >>> Sure, but I'll hold that request for a while. I updated to binutils-2.36 on >>> Monday and I'm pretty sure that is a feature >>> of this build fail. I've reverted binutils to 2.35.1, and the build >>> succeeds. Updated to 2.36 again and, surprise, >>> surprise, the kernel build fails again. >>> >>> I've had a glance at the binutils ML and there are all sorts of issues >>> being reported, but it's beyond my knowledge to >>> assess if this build error is related to any of them. >>> >>> I'll stick with binutils-2.35.1 for the time being. >>> >>>> And what exact gcc version are you using? >>>> >>> >>> It's built from the 10-20210123 snapshot tarball.
>>> >>> I can report this to the binutils folks, but might it be better if the >>> objtool maintainer looks at it first? The >>> binutils change might just have opened the gate to a bug in objtool. >>> >>>> thanks, >>>> >>>> greg k-h >>>> >>> >> >> >> AFAIK you need this in stable trees: >> >> From 1d489151e9f9d1647110277ff77282fe4d96d09b Mon Sep 17 00:00:00 2001 >> From: Josh Poimboeuf >> Date: Thu, 14 Jan 2021 16:14:01 -0600 >> Subject: [PATCH] objtool: Don't fail on missing symbol table > > Actually I think you need: > > 5e6dca82bcaa ("x86/entry: Emit a symbol for register restoring thunk") > > I submitted a patch to stable list a few days ago. > Yes, that's what I concluded, Josh. 5.10.11 builds with that patch added, but it's not in Linus's tree yet, so, as I understand it, it is not yet a candidate for stable. > (Though it's possible you need both commits, I'm not sure if binutils > 2.36 has the symbol stripping stuff) >
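As a rough illustration of why emitting a symbol can make this warning go away, here is a simplified C model. It assumes that, when objtool has no section symbol to reference (the "symbol stripping" remark above hints this may happen with binutils 2.36), it falls back to looking for a symbol that contains the instruction and warns "missing symbol for insn at offset" when none does. The struct, the function names, and the symbol names, offsets and sizes used below are all hypothetical; they are not taken from objtool or from thunk_64.o.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical symbol record: a name plus [offset, offset + len) in one section. */
struct sym {
    const char *name;
    unsigned long offset;
    unsigned long len;
};

/* Return the symbol whose range contains off, or NULL if none does. */
static const struct sym *find_symbol_containing(const struct sym *syms, size_t n,
                                                unsigned long off)
{
    assert(syms != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (off >= syms[i].offset && off < syms[i].offset + syms[i].len)
            return &syms[i];
    }
    return NULL;
}

/*
 * Simplified model: reference the instruction via the section symbol if
 * there is one; otherwise fall back to a containing symbol and fail
 * (returning -1) when no symbol covers the instruction.
 */
static int pick_reloc_sym(int have_section_sym, const struct sym *syms, size_t n,
                          unsigned long insn_off)
{
    if (have_section_sym)
        return 0;
    if (find_symbol_containing(syms, n, insn_off) == NULL) {
        fprintf(stderr, "warning: missing symbol for insn at offset 0x%lx\n",
                insn_off);
        return -1;
    }
    return 0;
}
```

For example, with a single hypothetical symbol covering offsets 0x00 to 0x2f, `pick_reloc_sym(0, syms, 1, 0x3e)` returns -1 and prints the warning; adding a second symbol that starts at 0x30 with length 0x20 makes the same lookup succeed, which matches the idea in the title of 5e6dca82bcaa.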
Re: linux-5.10.11 build failure
On 28/01/2021 14:41, Greg Kroah-Hartman wrote: > On Thu, Jan 28, 2021 at 01:38:25PM +0000, Chris Clayton wrote: >> Thanks, Thomas. >> >> On 28/01/2021 11:24, Thomas Backlund wrote: >>> On 28.1.2021 at 12:05, Chris Clayton wrote: >>>> >>>> On 28/01/2021 09:34, Greg Kroah-Hartman wrote: >>>>> On Thu, Jan 28, 2021 at 09:17:10AM +0000, Chris Clayton wrote: >>>>>> Hi, >>>>>> >>>>>> Building 5.10.11 fails on my (x86-64) laptop thusly: >>>>>> >>>>>> .. >>>>>> >>>>>> AS arch/x86/entry/thunk_64.o >>>>>>CC arch/x86/entry/vsyscall/vsyscall_64.o >>>>>>AS arch/x86/realmode/rm/header.o >>>>>>CC arch/x86/mm/pat/set_memory.o >>>>>>CC arch/x86/events/amd/core.o >>>>>>CC arch/x86/kernel/fpu/init.o >>>>>>CC arch/x86/entry/vdso/vma.o >>>>>>CC kernel/sched/core.o >>>>>> arch/x86/entry/thunk_64.o: warning: objtool: missing symbol for insn at >>>>>> offset 0x3e >>>>>> >>>>>>AS arch/x86/realmode/rm/trampoline_64.o >>>>>> make[2]: *** [scripts/Makefile.build:360: arch/x86/entry/thunk_64.o] >>>>>> Error 255 >>>>>> make[2]: *** Deleting file 'arch/x86/entry/thunk_64.o' >>>>>> make[2]: *** Waiting for unfinished jobs >>>>>> >>>>>> .. >>>>>> >>>>>> Compiler is latest snapshot of gcc-10. >>>>>> >>>>>> Happy to test the fix but please cc me as I'm not subscribed >>>>> >>>>> Can you do 'git bisect' to track down the offending commit? >>>>> >>>> >>>> Sure, but I'll hold that request for a while. I updated to binutils-2.36 >>>> on Monday and I'm pretty sure that is a feature >>>> of this build fail. I've reverted binutils to 2.35.1, and the build >>>> succeeds. Updated to 2.36 again and, surprise, >>>> surprise, the kernel build fails again. >>>> >>>> I've had a glance at the binutils ML and there are all sorts of issues >>>> being reported, but it's beyond my knowledge to >>>> assess if this build error is related to any of them. >>>> >>>> I'll stick with binutils-2.35.1 for the time being. >>>> >>>>> And what exact gcc version are you using?
>>>>> >>>>> >>>> >>>> It's built from the 10-20210123 snapshot tarball. >>>> >>>> I can report this to the binutils folks, but might it be better if the >>>> objtool maintainer looks at it first? The >>>> binutils change might just have opened the gate to a bug in objtool. >>>> >>>>> thanks, >>>>> >>>>> greg k-h >>>>> >>>> >>> >>> >>> AFAIK you need this in stable trees: >>> >>> From 1d489151e9f9d1647110277ff77282fe4d96d09b Mon Sep 17 00:00:00 2001 >>> From: Josh Poimboeuf >>> Date: Thu, 14 Jan 2021 16:14:01 -0600 >>> Subject: [PATCH] objtool: Don't fail on missing symbol table >>> >>> >> >> That may be the case, but it doesn't fix the build failure I've reported in >> this thread. However, as suggested by Tor, >> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/patch/?id=5e6dca82bcaa49348f9e5fcb48df4881f6d6c4ae >> does fix it. >> >> That hasn't made Linus' tree yet and I don't see a pull request, but it is >> in linux-next so I guess it could make it in >> -rc6. > > Ok, thanks, so this is not a new regression for 5.10.y. > That seems to be the case, Greg. 5.10.10 and 5.10.9 fail to build too. > greg k-h >
Re: linux-5.10.11 build failure
Thanks, Thomas. On 28/01/2021 11:24, Thomas Backlund wrote: > On 28.1.2021 at 12:05, Chris Clayton wrote: >> >> On 28/01/2021 09:34, Greg Kroah-Hartman wrote: >>> On Thu, Jan 28, 2021 at 09:17:10AM +0000, Chris Clayton wrote: >>>> Hi, >>>> >>>> Building 5.10.11 fails on my (x86-64) laptop thusly: >>>> >>>> .. >>>> >>>> AS arch/x86/entry/thunk_64.o >>>>CC arch/x86/entry/vsyscall/vsyscall_64.o >>>>AS arch/x86/realmode/rm/header.o >>>>CC arch/x86/mm/pat/set_memory.o >>>>CC arch/x86/events/amd/core.o >>>>CC arch/x86/kernel/fpu/init.o >>>>CC arch/x86/entry/vdso/vma.o >>>>CC kernel/sched/core.o >>>> arch/x86/entry/thunk_64.o: warning: objtool: missing symbol for insn at >>>> offset 0x3e >>>> >>>>AS arch/x86/realmode/rm/trampoline_64.o >>>> make[2]: *** [scripts/Makefile.build:360: arch/x86/entry/thunk_64.o] Error >>>> 255 >>>> make[2]: *** Deleting file 'arch/x86/entry/thunk_64.o' >>>> make[2]: *** Waiting for unfinished jobs >>>> >>>> .. >>>> >>>> Compiler is latest snapshot of gcc-10. >>>> >>>> Happy to test the fix but please cc me as I'm not subscribed >>> >>> Can you do 'git bisect' to track down the offending commit? >>> >> >> Sure, but I'll hold that request for a while. I updated to binutils-2.36 on >> Monday and I'm pretty sure that is a feature >> of this build fail. I've reverted binutils to 2.35.1, and the build >> succeeds. Updated to 2.36 again and, surprise, >> surprise, the kernel build fails again. >> >> I've had a glance at the binutils ML and there are all sorts of issues being >> reported, but it's beyond my knowledge to >> assess if this build error is related to any of them. >> >> I'll stick with binutils-2.35.1 for the time being. >> >>> And what exact gcc version are you using? >>> >> >> It's built from the 10-20210123 snapshot tarball. >> >> I can report this to the binutils folks, but might it be better if the >> objtool maintainer looks at it first? The >> binutils change might just have opened the gate to a bug in objtool.
>> >>> thanks, >>> >>> greg k-h >>> >> > > > AFAIK you need this in stable trees: > > From 1d489151e9f9d1647110277ff77282fe4d96d09b Mon Sep 17 00:00:00 2001 > From: Josh Poimboeuf > Date: Thu, 14 Jan 2021 16:14:01 -0600 > Subject: [PATCH] objtool: Don't fail on missing symbol table > > That may be the case, but it doesn't fix the build failure I've reported in this thread. However, as suggested by Tor, https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/patch/?id=5e6dca82bcaa49348f9e5fcb48df4881f6d6c4ae does fix it. That hasn't made Linus' tree yet and I don't see a pull request, but it is in linux-next, so I guess it could make it into -rc6. Chris > -- > Thomas >
Re: linux-5.10.11 build failure
On 28/01/2021 09:34, Greg Kroah-Hartman wrote: > On Thu, Jan 28, 2021 at 09:17:10AM +0000, Chris Clayton wrote: >> Hi, >> >> Building 5.10.11 fails on my (x86-64) laptop thusly: >> >> .. >> >> AS arch/x86/entry/thunk_64.o >> CC arch/x86/entry/vsyscall/vsyscall_64.o >> AS arch/x86/realmode/rm/header.o >> CC arch/x86/mm/pat/set_memory.o >> CC arch/x86/events/amd/core.o >> CC arch/x86/kernel/fpu/init.o >> CC arch/x86/entry/vdso/vma.o >> CC kernel/sched/core.o >> arch/x86/entry/thunk_64.o: warning: objtool: missing symbol for insn at >> offset 0x3e >> >> AS arch/x86/realmode/rm/trampoline_64.o >> make[2]: *** [scripts/Makefile.build:360: arch/x86/entry/thunk_64.o] Error >> 255 >> make[2]: *** Deleting file 'arch/x86/entry/thunk_64.o' >> make[2]: *** Waiting for unfinished jobs >> >> .. >> >> Compiler is latest snapshot of gcc-10. >> >> Happy to test the fix but please cc me as I'm not subscribed > > Can you do 'git bisect' to track down the offending commit? > Sure, but I'll hold that request for a while. I updated to binutils-2.36 on Monday and I'm pretty sure that is a factor in this build failure. I reverted binutils to 2.35.1 and the build succeeded. I updated to 2.36 again and, surprise, surprise, the kernel build failed again. I've had a glance at the binutils ML and there are all sorts of issues being reported, but it's beyond my knowledge to assess whether this build error is related to any of them. I'll stick with binutils-2.35.1 for the time being. > And what exact gcc version are you using? > It's built from the 10-20210123 snapshot tarball. I can report this to the binutils folks, but might it be better if the objtool maintainer looks at it first? The binutils change might just have opened the gate to a bug in objtool. > thanks, > > greg k-h > Thanks. Chris
linux-5.10.11 build failure
Hi,

Building 5.10.11 fails on my (x86-64) laptop thusly:

..
  AS      arch/x86/entry/thunk_64.o
  CC      arch/x86/entry/vsyscall/vsyscall_64.o
  AS      arch/x86/realmode/rm/header.o
  CC      arch/x86/mm/pat/set_memory.o
  CC      arch/x86/events/amd/core.o
  CC      arch/x86/kernel/fpu/init.o
  CC      arch/x86/entry/vdso/vma.o
  CC      kernel/sched/core.o
arch/x86/entry/thunk_64.o: warning: objtool: missing symbol for insn at offset 0x3e
  AS      arch/x86/realmode/rm/trampoline_64.o
make[2]: *** [scripts/Makefile.build:360: arch/x86/entry/thunk_64.o] Error 255
make[2]: *** Deleting file 'arch/x86/entry/thunk_64.o'
make[2]: *** Waiting for unfinished jobs
..

Compiler is the latest snapshot of gcc-10.

Happy to test the fix, but please cc me as I'm not subscribed.

Thanks,
Chris
Re: [PATCH] misc: rtsx: do not setting OC_POWER_DOWN reg in rtsx_pci_init_ocp()
Hi Greg, On 18/09/2020 15:35, Chris Clayton wrote: > Mmm, Gmail on Android seems to have snuck some HTML into my reply, so here > goes again... > > On 14/09/2020 16:58, Greg KH wrote: >> On Sun, Sep 13, 2020 at 09:40:56AM +0100, Chris Clayton wrote: >>> Hi Greg and Arnd, >>> >>> On 24/08/2020 04:00, ricky...@realtek.com wrote: >>>> From: Ricky Wu >>>> >>>> this power saving action in rtsx_pci_init_ocp() cause INTEL-NUC6 platform >>>> missing card reader >>>> >>> >>> In his changelog above, Ricky didn't mention that this patch fixes a >>> regression that was introduced (in 5.1) by commit >>> bede03a579b3. >>> >>> The patch that I posted to LKML contained the appropriate Fixes, etc. tags. >>> After discussion, the patch was changed to >>> remove the code that effectively disables the RTS5229 cardreader on (at >>> least some) Intel NUC boxes. I prepared the >>> patch that Ricky submitted but he didn't include my Signed-off-by or the >>> Fixes tag. I think the following needs to be >>> added to the changelog. >>> >>> Fixes: bede03a579b3 ("misc: rtsx: Enable OCP for rts522a rts524a rts525a >>> rts5260") >>> Link: https://marc.info/?l=linux-kernel&m=159105912832257 >>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=204003 >>> Signed-off-by: Chris Clayton >>> >>> bede03a579b3 introduced a bug which leaves the rts5229 PCI Express card >>> reader disabled on the Intel NUC6CAYH box. >>> >>> My main point, however, is that the patch is also needed in the 5.4 >>> (longterm) and 5.8 (stable) series kernels. >> >> It's too late to change the commit log now that it is in my tree, but >> once it hits Linus's tree for 5.9-rc1, I can backport it to those stable >> trees if someone reminds me :) >> This is the reminder you suggested. The patch is now in Linus's tree and the commit id is 551b6729578a8981c46af964c10bf7d5d9ddca83. Chris > > Thanks, Greg. I'll send the reminder. > > Chris >> thanks, >> >> greg k-h >>
Re: [PATCH] misc: rtsx: do not setting OC_POWER_DOWN reg in rtsx_pci_init_ocp()
Mmm, Gmail on Android seems to have snuck some HTML into my reply, so here goes again... On 14/09/2020 16:58, Greg KH wrote: > On Sun, Sep 13, 2020 at 09:40:56AM +0100, Chris Clayton wrote: >> Hi Greg and Arnd, >> >> On 24/08/2020 04:00, ricky...@realtek.com wrote: >>> From: Ricky Wu >>> >>> this power saving action in rtsx_pci_init_ocp() cause INTEL-NUC6 platform >>> missing card reader >>> >> >> In his changelog above, Ricky didn't mention that this patch fixes a >> regression that was introduced (in 5.1) by commit >> bede03a579b3. >> >> The patch that I posted to LKML contained the appropriate Fixes, etc. tags. >> After discussion, the patch was changed to >> remove the code that effectively disables the RTS5229 cardreader on (at >> least some) Intel NUC boxes. I prepared the >> patch that Ricky submitted but he didn't include my Signed-off-by or the >> Fixes tag. I think the following needs to be >> added to the changelog. >> >> Fixes: bede03a579b3 ("misc: rtsx: Enable OCP for rts522a rts524a rts525a >> rts5260") >> Link: https://marc.info/?l=linux-kernel&m=159105912832257 >> Link: https://bugzilla.kernel.org/show_bug.cgi?id=204003 >> Signed-off-by: Chris Clayton >> >> bede03a579b3 introduced a bug which leaves the rts5229 PCI Express card >> reader disabled on the Intel NUC6CAYH box. >> >> My main point, however, is that the patch is also needed in the 5.4 >> (longterm) and 5.8 (stable) series kernels. > > It's too late to change the commit log now that it is in my tree, but > once it hits Linus's tree for 5.9-rc1, I can backport it to those stable > trees if someone reminds me :) > Thanks, Greg. I'll send the reminder. Chris > thanks, > > greg k-h >
Re: [PATCH] misc: rtsx: do not setting OC_POWER_DOWN reg in rtsx_pci_init_ocp()
Thanks, Bjorn. On 13/09/2020 17:49, Bjorn Helgaas wrote: > On Sun, Sep 13, 2020 at 09:40:56AM +0100, Chris Clayton wrote: >> Hi Greg and Arnd, >> >> On 24/08/2020 04:00, ricky...@realtek.com wrote: >>> From: Ricky Wu >>> >>> this power saving action in rtsx_pci_init_ocp() cause INTEL-NUC6 platform >>> missing card reader >> >> In his changelog above, Ricky didn't mention that this patch fixes a >> regression that was introduced (in 5.1) by commit bede03a579b3. >> >> The patch that I posted to LKML contained the appropriate Fixes, etc. >> tags. After discussion, the patch was changed to remove the code >> that effectively disables the RTS5229 cardreader on (at least some) >> Intel NUC boxes. I prepared the patch that Ricky submitted but he >> didn't include my Signed-off-by or the Fixes tag. I think the >> following needs to be added to the changelog. >> >> Fixes: bede03a579b3 ("misc: rtsx: Enable OCP for rts522a rts524a rts525a >> rts5260") >> Link: https://marc.info/?l=linux-kernel&m=159105912832257 > > Better lore link: > > Link: > https://lore.kernel.org/r/CACYmiSer8FA+qjh8NHZJ2maxSd-=rwddz2f7_-e4um1nxuz...@mail.gmail.com/ > > But I'm not sure the above is the most relevant. Seems like the one > below is more to the point since it has the exact patch below and is > part of a thread developing it: > > Link: > https://lore.kernel.org/r/20da8b4b-8426-9568-c0f1-4d1c2006c...@googlemail.com/ > Yes, I meant to change the link to point to that thread, but ... more haste, less speed. >> Link: https://bugzilla.kernel.org/show_bug.cgi?id=204003 >> Signed-off-by: Chris Clayton >> >> bede03a579b3 introduced a bug which leaves the rts5229 PCI Express >> card reader disabled on the Intel NUC6CAYH box. >> >> My main point, however, is that the patch is also needed in the 5.4 >> (longterm) and 5.8 (stable) series kernels. > > This would be accomplished by: > > Cc: sta...@vger.kernel.org # v5.1+ > Thanks for the tip.
I'm about to set off on a 4-day journey, so I'll send an updated patch when I return on Friday. >>> Signed-off-by: Ricky Wu >>> --- >>> drivers/misc/cardreader/rtsx_pcr.c | 4 ---- >>> 1 file changed, 4 deletions(-) >>> >>> diff --git a/drivers/misc/cardreader/rtsx_pcr.c >>> b/drivers/misc/cardreader/rtsx_pcr.c >>> index 37ccc67f4914..3a4a7b0cc098 100644 >>> --- a/drivers/misc/cardreader/rtsx_pcr.c >>> +++ b/drivers/misc/cardreader/rtsx_pcr.c >>> @@ -1155,10 +1155,6 @@ void rtsx_pci_init_ocp(struct rtsx_pcr *pcr) >>> rtsx_pci_write_register(pcr, REG_OCPGLITCH, >>> SD_OCP_GLITCH_MASK, pcr->hw_param.ocp_glitch); >>> rtsx_pci_enable_ocp(pcr); >>> - } else { >>> - /* OC power down */ >>> - rtsx_pci_write_register(pcr, FPDCTL, OC_POWER_DOWN, >>> - OC_POWER_DOWN); >>> } >>> } >>> } >>>
Re: [PATCH] misc: rtsx: do not setting OC_POWER_DOWN reg in rtsx_pci_init_ocp()
Hi Greg and Arnd, On 24/08/2020 04:00, ricky...@realtek.com wrote: > From: Ricky Wu > > this power saving action in rtsx_pci_init_ocp() cause INTEL-NUC6 platform > missing card reader > In his changelog above, Ricky didn't mention that this patch fixes a regression that was introduced (in 5.1) by commit bede03a579b3. The patch that I posted to LKML contained the appropriate Fixes: and related tags. After discussion, the patch was changed to remove the code that effectively disables the RTS5229 card reader on (at least some) Intel NUC boxes. I prepared the patch that Ricky submitted, but he didn't include my Signed-off-by or the Fixes tag. I think the following needs to be added to the changelog. Fixes: bede03a579b3 ("misc: rtsx: Enable OCP for rts522a rts524a rts525a rts5260") Link: https://marc.info/?l=linux-kernel&m=159105912832257 Link: https://bugzilla.kernel.org/show_bug.cgi?id=204003 Signed-off-by: Chris Clayton bede03a579b3 introduced a bug which leaves the rts5229 PCI Express card reader disabled on the Intel NUC6CAYH box. My main point, however, is that the patch is also needed in the 5.4 (longterm) and 5.8 (stable) series kernels. Thanks. > Signed-off-by: Ricky Wu > --- > drivers/misc/cardreader/rtsx_pcr.c | 4 ---- > 1 file changed, 4 deletions(-) > > diff --git a/drivers/misc/cardreader/rtsx_pcr.c > b/drivers/misc/cardreader/rtsx_pcr.c > index 37ccc67f4914..3a4a7b0cc098 100644 > --- a/drivers/misc/cardreader/rtsx_pcr.c > +++ b/drivers/misc/cardreader/rtsx_pcr.c > @@ -1155,10 +1155,6 @@ void rtsx_pci_init_ocp(struct rtsx_pcr *pcr) > rtsx_pci_write_register(pcr, REG_OCPGLITCH, > SD_OCP_GLITCH_MASK, pcr->hw_param.ocp_glitch); > rtsx_pci_enable_ocp(pcr); > - } else { > - /* OC power down */ > - rtsx_pci_write_register(pcr, FPDCTL, OC_POWER_DOWN, > - OC_POWER_DOWN); > } > } > } >
Re: PATCH: rtsx_pci driver - don't disable the rts5229 card reader on Intel NUC boxes
Hi Ricky. On 05/08/2020 13:48, Chris Clayton wrote: > Hi Ricky >> >> Ah, OK. I'll prepare the patch and send it to you once I've tested it. >> > > Please see the patch included below. As you suggested, it removes the code > that does the OC power down on card readers > that are not members of your A series. The patch is against a fresh pull of > Linus's tree this morning > (v5.8-2768-g4da9f3302615). > > I've tested the resultant kernel on my Intel NUC6CAYH box (which contains an > NUC66AYB board) and the rts5229 works fine. > I've also tested it on my laptop which also has a card reader supported by > the rtsx_pci driver (an RTL8411B) and that > also works fine. > > As I mentioned yesterday, I think it's a candidate for the 5.4 and 5.7 stable > series. > > Thanks for your patience! > > Chris > > Signed-off-by: Chris Clayton > > --- a/drivers/misc/cardreader/rtsx_pcr.c 2020-08-05 07:10:21.752072515 > +0100 > +++ b/drivers/misc/cardreader/rtsx_pcr.c 2020-08-05 07:11:05.568074278 > +0100 > @@ -1172,10 +1172,6 @@ void rtsx_pci_init_ocp(struct rtsx_pcr * > rtsx_pci_write_register(pcr, REG_OCPGLITCH, > SD_OCP_GLITCH_MASK, pcr->hw_param.ocp_glitch); > rtsx_pci_enable_ocp(pcr); > - } else { > - /* OC power down */ > - rtsx_pci_write_register(pcr, FPDCTL, OC_POWER_DOWN, > - OC_POWER_DOWN); > } > } > } > > Is there some problem with my patch? If you are too busy to deal with it, perhaps I can submit it directly to Greg/Arnd. Thanks Chris
Re: PATCH: rtsx_pci driver - don't disable the rts5229 card reader on Intel NUC boxes
Hi Ricky, On 05/08/2020 06:51, Chris Clayton wrote: > Thanks, Ricky. > > On 05/08/2020 03:35, 吳昊澄 Ricky wrote: >> >> >>> -----Original Message----- >>> From: Chris Clayton [mailto:chris2...@googlemail.com] >>> Sent: Tuesday, August 04, 2020 7:52 PM >>> To: 吳昊澄 Ricky; gre...@linuxfoundation.org >>> Cc: LKML; rdun...@infradead.org; philqua...@gmail.com; Arnd Bergmann >>> Subject: Re: PATCH: rtsx_pci driver - don't disable the rts5229 card reader >>> on >>> Intel NUC boxes >>> >>> >>> >>> On 04/08/2020 11:46, 吳昊澄 Ricky wrote: >>>>> -----Original Message----- >>>>> From: Chris Clayton [mailto:chris2...@googlemail.com] >>>>> Sent: Tuesday, August 04, 2020 4:51 PM >>>>> To: 吳昊澄 Ricky; gre...@linuxfoundation.org >>>>> Cc: LKML; rdun...@infradead.org; philqua...@gmail.com; Arnd Bergmann >>>>> Subject: Re: PATCH: rtsx_pci driver - don't disable the rts5229 card >>>>> reader on >>>>> Intel NUC boxes >>>>> >>>>> >>>>> >>>>> On 04/08/2020 09:08, 吳昊澄 Ricky wrote: >>>>>>> -----Original Message----- >>>>>>> From: gre...@linuxfoundation.org [mailto:gre...@linuxfoundation.org] >>>>>>> Sent: Tuesday, August 04, 2020 3:49 PM >>>>>>> To: 吳昊澄 Ricky >>>>>>> Cc: Chris Clayton; LKML; rdun...@infradead.org; philqua...@gmail.com; >>>>> Arnd >>>>>>> Bergmann >>>>>>> Subject: Re: PATCH: rtsx_pci driver - don't disable the rts5229 card >>>>>>> reader >>> on >>>>>>> Intel NUC boxes >>>>>>> >>>>>>> On Tue, Aug 04, 2020 at 02:44:41AM +0000, 吳昊澄 Ricky wrote: >>>>>>>> Hi Chris, >>>>>>>> >>>>>>>> rtsx_pci_write_register(pcr, FPDCTL, OC_POWER_DOWN, >>>>> OC_POWER_DOWN); >>>>>>>> This register operation saved power under 1mA, so if do not care the >>>>>>>> 1mA >>>>>>> power we can have a patch to remove it, make compatible with NUC6 >>>>>>>> We tested others our card reader that remove it, we did not see any >>>>>>>> side >>>>> effect >>>>>>>> >>>>>>>> Hi Greg k-h, >>>>>>>> >>>>>>>> Do you have any comments? >>>>>>> >>>>>>> comments on what? I don't know what you are responding to here, sorry.
>>>>>>> >>>>>> Can we have a patch to kernel for NUC6? It may cause more power(1mA) but >>> it >>>>> will have more compatibility >>>>>> >>>>> >>>>> Ricky, >>>>> >>>>> I don't understand why you want to completely remove the code that sets up >>> the >>>>> 1mA power saving. That code was there >>>>> even before your patch (bede03a579b3b4a036003c4862cc1baa4ddc351f), so I >>>>> assume it benefits some of the Realtek card >>>>> readers. Before your patch however, rtsx_pci_init_ocp() was not called >>>>> for the >>>>> rts5229 reader, but the patch introduced >>>>> an unconditional call to that function into rtsx_pci_init_hw(), which is >>>>> run for >>> the >>>>> rts5229. That is what now disables >>>>> the card reader. >>>>> >>>>> Now, I don't know whether other cards are affected, although I don't >>>>> recall >>>>> seeing any reported as I searched the kernel >>>>> and ubuntu bugzillas for any analysis of the problem. I know this is not >>>>> what >>> the >>>>> patch I sent does, but having thought >>>>> about it more, seems to me that the simplest fix is to skip the new call >>>>> to >>>>> rtsx_pci_init_ocp() if the reader is an rts5229. >>>>> >>>> >>>> Because we are thinking about if others our card reader that not belong A >>> series(my ocp patch coverage) also on NUC6 platform maybe have the same >>> problem... >>>> >>> >>> OK. What if we do make the new call but only for the card readers that are >>> in the >>> A series? Are they the
Re: PATCH: rtsx_pci driver - don't disable the rts5229 card reader on Intel NUC boxes
Thanks, Ricky. On 05/08/2020 03:35, 吳昊澄 Ricky wrote: > > >> -----Original Message----- >> From: Chris Clayton [mailto:chris2...@googlemail.com] >> Sent: Tuesday, August 04, 2020 7:52 PM >> To: 吳昊澄 Ricky; gre...@linuxfoundation.org >> Cc: LKML; rdun...@infradead.org; philqua...@gmail.com; Arnd Bergmann >> Subject: Re: PATCH: rtsx_pci driver - don't disable the rts5229 card reader >> on >> Intel NUC boxes >> >> >> >> On 04/08/2020 11:46, 吳昊澄 Ricky wrote: >>>> -----Original Message----- >>>> From: Chris Clayton [mailto:chris2...@googlemail.com] >>>> Sent: Tuesday, August 04, 2020 4:51 PM >>>> To: 吳昊澄 Ricky; gre...@linuxfoundation.org >>>> Cc: LKML; rdun...@infradead.org; philqua...@gmail.com; Arnd Bergmann >>>> Subject: Re: PATCH: rtsx_pci driver - don't disable the rts5229 card >>>> reader on >>>> Intel NUC boxes >>>> >>>> >>>> >>>> On 04/08/2020 09:08, 吳昊澄 Ricky wrote: >>>>>> -----Original Message----- >>>>>> From: gre...@linuxfoundation.org [mailto:gre...@linuxfoundation.org] >>>>>> Sent: Tuesday, August 04, 2020 3:49 PM >>>>>> To: 吳昊澄 Ricky >>>>>> Cc: Chris Clayton; LKML; rdun...@infradead.org; philqua...@gmail.com; >>>> Arnd >>>>>> Bergmann >>>>>> Subject: Re: PATCH: rtsx_pci driver - don't disable the rts5229 card >>>>>> reader >> on >>>>>> Intel NUC boxes >>>>>> >>>>>> On Tue, Aug 04, 2020 at 02:44:41AM +0000, 吳昊澄 Ricky wrote: >>>>>>> Hi Chris, >>>>>>> >>>>>>> rtsx_pci_write_register(pcr, FPDCTL, OC_POWER_DOWN, >>>> OC_POWER_DOWN); >>>>>>> This register operation saved power under 1mA, so if do not care the 1mA >>>>>> power we can have a patch to remove it, make compatible with NUC6 >>>>>>> We tested others our card reader that remove it, we did not see any side >>>> effect >>>>>>> >>>>>>> Hi Greg k-h, >>>>>>> >>>>>>> Do you have any comments? >>>>>> >>>>>> comments on what? I don't know what you are responding to here, sorry. >>>>>> >>>>> Can we have a patch to kernel for NUC6?
It may cause more power(1mA) but >> it >>>> will have more compatibility >>>>> >>>> >>>> Ricky, >>>> >>>> I don't understand why you want to completely remove the code that sets up >> the >>>> 1mA power saving. That code was there >>>> even before your patch (bede03a579b3b4a036003c4862cc1baa4ddc351f), so I >>>> assume it benefits some of the Realtek card >>>> readers. Before your patch however, rtsx_pci_init_ocp() was not called for >>>> the >>>> rts5229 reader, but the patch introduced >>>> an unconditional call to that function into rtsx_pci_init_hw(), which is >>>> run for >> the >>>> rts5229. That is what now disables >>>> the card reader. >>>> >>>> Now, I don't know whether other cards are affected, although I don't recall >>>> seeing any reported as I searched the kernel >>>> and ubuntu bugzillas for any analysis of the problem. I know this is not >>>> what >> the >>>> patch I sent does, but having thought >>>> about it more, seems to me that the simplest fix is to skip the new call to >>>> rtsx_pci_init_ocp() if the reader is an rts5229. >>>> >>> >>> Because we are thinking about if others our card reader that not belong A >> series(my ocp patch coverage) also on NUC6 platform maybe have the same >> problem... >>> >> >> OK. What if we do make the new call but only for the card readers that are >> in the >> A series? Are they the ones that have >> PID_5nnn defines in include/linux/rtsx_pci.h? Or is there another simple way >> of >> identifying that a reader is a member of >> the A series? >> >> I'm thinking of something like: >> static bool rtsx_pci_is_series_A(struct rtsx_pcr *pcr) >> { >> unsigned short device = pcr->pci->device; >> >> return device == PID_524A || device == PID_5249 || device == PID_5250 || >> device == PID_525A >> || device == PID_5260 || device == >> PID_5261; >> } >> >> then in rtsx_pci_init_hw() change the unconditional call to: >> >> if (rtsx_pci_is_series_A(pcr)) >> rtsx_pci_init_ocp(pcr); >> >> Does that seem OK?
>> > Previously, I want to remove > else { > /* OC power down */ > rtsx_pci_write_register(pcr, FPDCTL, OC_POWER_DOWN, > OC_POWER_DOWN); > } > Because in our A-series card Reader we already assigned option->ocp_en to 1 > in self init_params() , this is an easy way to fix this problem > Ah, OK. I'll prepare the patch and send it to you once I've tested it. Chris
Re: PATCH: rtsx_pci driver - don't disable the rts5229 card reader on Intel NUC boxes
On 04/08/2020 11:46, 吳昊澄 Ricky wrote: >> -----Original Message----- >> From: Chris Clayton [mailto:chris2...@googlemail.com] >> Sent: Tuesday, August 04, 2020 4:51 PM >> To: 吳昊澄 Ricky; gre...@linuxfoundation.org >> Cc: LKML; rdun...@infradead.org; philqua...@gmail.com; Arnd Bergmann >> Subject: Re: PATCH: rtsx_pci driver - don't disable the rts5229 card reader >> on >> Intel NUC boxes >> >> >> >> On 04/08/2020 09:08, 吳昊澄 Ricky wrote: >>>> -----Original Message----- >>>> From: gre...@linuxfoundation.org [mailto:gre...@linuxfoundation.org] >>>> Sent: Tuesday, August 04, 2020 3:49 PM >>>> To: 吳昊澄 Ricky >>>> Cc: Chris Clayton; LKML; rdun...@infradead.org; philqua...@gmail.com; >> Arnd >>>> Bergmann >>>> Subject: Re: PATCH: rtsx_pci driver - don't disable the rts5229 card >>>> reader on >>>> Intel NUC boxes >>>> >>>> On Tue, Aug 04, 2020 at 02:44:41AM +0000, 吳昊澄 Ricky wrote: >>>>> Hi Chris, >>>>> >>>>> rtsx_pci_write_register(pcr, FPDCTL, OC_POWER_DOWN, >> OC_POWER_DOWN); >>>>> This register operation saved power under 1mA, so if do not care the 1mA >>>> power we can have a patch to remove it, make compatible with NUC6 >>>>> We tested others our card reader that remove it, we did not see any side >> effect >>>>> >>>>> Hi Greg k-h, >>>>> >>>>> Do you have any comments? >>>> >>>> comments on what? I don't know what you are responding to here, sorry. >>>> >>> Can we have a patch to kernel for NUC6? It may cause more power(1mA) but it >> will have more compatibility >>> >> >> Ricky, >> >> I don't understand why you want to completely remove the code that sets up >> the >> 1mA power saving. That code was there >> even before your patch (bede03a579b3b4a036003c4862cc1baa4ddc351f), so I >> assume it benefits some of the Realtek card >> readers. Before your patch however, rtsx_pci_init_ocp() was not called for >> the >> rts5229 reader, but the patch introduced >> an unconditional call to that function into rtsx_pci_init_hw(), which is run >> for the >> rts5229.
That is what now disables >> the card reader. >> >> Now, I don't know whether other cards are affected, although I don't recall >> seeing any reported as I searched the kernel >> and Ubuntu bugzillas for any analysis of the problem. I know this is not >> what the >> patch I sent does, but having thought >> about it more, it seems to me that the simplest fix is to skip the new call to >> rtsx_pci_init_ocp() if the reader is an rts5229. >> > > Because we are thinking about if others our card reader that not belong A > series(my ocp patch coverage) also on NUC6 platform maybe have the same > problem... > OK. What if we do make the new call, but only for the card readers that are in the A series? Are they the ones that have PID_5nnn defines in include/linux/rtsx_pci.h? Or is there another simple way of identifying that a reader is a member of the A series? I'm thinking of something like:

static bool rtsx_pci_is_series_A(struct rtsx_pcr *pcr)
{
	unsigned short device = pcr->pci->device;

	return device == PID_524A || device == PID_5249 ||
	       device == PID_5250 || device == PID_525A ||
	       device == PID_5260 || device == PID_5261;
}

then in rtsx_pci_init_hw() change the unconditional call to:

	if (rtsx_pci_is_series_A(pcr))
		rtsx_pci_init_ocp(pcr);

Does that seem OK? >> If you agree, I can prepare a patch and send it to you. Whatever the >> solution is, it >> will also be needed in the stable >> kernels later than 5.0. >> > > OK, I agree your opinion, for now can only patch rts5229 first make NUC6 user > can work well > > Thank you > >> Chris >>>> greg k-h >>>> >>>> --Please consider the environment before printing this e-mail.
Re: PATCH: rtsx_pci driver - don't disable the rts5229 card reader on Intel NUC boxes
On 04/08/2020 09:08, 吳昊澄 Ricky wrote: >> -Original Message- >> From: gre...@linuxfoundation.org [mailto:gre...@linuxfoundation.org] >> Sent: Tuesday, August 04, 2020 3:49 PM >> To: 吳昊澄 Ricky >> Cc: Chris Clayton; LKML; rdun...@infradead.org; philqua...@gmail.com; Arnd >> Bergmann >> Subject: Re: PATCH: rtsx_pci driver - don't disable the rts5229 card reader >> on >> Intel NUC boxes >> >> On Tue, Aug 04, 2020 at 02:44:41AM +0000, 吳昊澄 Ricky wrote: >>> Hi Chris, >>> >>> rtsx_pci_write_register(pcr, FPDTL, OC_POWER_DOWN, OC_POWER_DOWN); >>> This register operation saved power under 1mA, so if do not care the 1mA >> power we can have a patch to remove it, make compatible with NUC6 >>> We tested others our card reader that remove it, we did not see any side >>> effect >>> >>> Hi Greg k-h, >>> >>> Do you have any comments? >> >> comments on what? I don't know what you are responding to here, sorry. >> > Can we have a patch to kernel for NUC6? It may cause more power(1mA) but it > will have more compatibility > Ricky, I don't understand why you want to completely remove the code that sets up the 1mA power saving. That code was there even before your patch (bede03a579b3b4a036003c4862cc1baa4ddc351f), so I assume it benefits some of the Realtek card readers. Before your patch, however, rtsx_pci_init_ocp() was not called for the rts5229 reader, but the patch introduced an unconditional call to that function into rtsx_pci_init_hw(), which is run for the rts5229. That is what now disables the card reader. Now, I don't know whether other cards are affected, although I don't recall seeing any reported as I searched the kernel and Ubuntu bugzillas for any analysis of the problem. I know this is not what the patch I sent does, but having thought about it more, it seems to me that the simplest fix is to skip the new call to rtsx_pci_init_ocp() if the reader is an rts5229. If you agree, I can prepare a patch and send it to you.
Whatever the solution is, it will also be needed in the stable kernels later than 5.0. Chris >> greg k-h >> >> --Please consider the environment before printing this e-mail.
Re: PATCH: rtsx_pci driver - don't disable the rts5229 card reader on Intel NUC boxes
Hi, Ricky On 03/08/2020 04:01, 吳昊澄 Ricky wrote: > Hi Chris, > > We don’t think this is our bug... > This register(FPDCTL) write to OC_POWER_DOWN is for our power saving feature, > not to disable the reader > In your case, we cannot reproduce this on our side that we mention before, we > don’t have the platform(Intel NUC Tall Arches Canyon NUC6CAYH Celeron J345) > to see this issue > But we think this issue maybe only on this platform, our RTS5229 works well > on the new kernel all platform that we have > > Ricky Perhaps I should have used the word regression rather than bug. I didn't buy the machine until earlier this year, but other people who have reported this problem have indicated that, until bede03a579b3 was applied (during the 5.1 merge window), the driver supported the card reader on the Intel NUC boxes. I know that is true because I built and tested a 5.0 kernel and the card reader worked fine. I've also built and tested a 5.1-rc1 kernel and the card reader no longer works. Whether by design or by accident, the card reader worked and now it doesn't. That's a regression in my book! Since you signed off the patch that caused the regression, I believe it is your bug. Thanks. Chris > >> -Original Message- >> From: Chris Clayton [mailto:chris2...@googlemail.com] >> Sent: Monday, August 03, 2020 3:59 AM >> To: LKML; 吳昊澄 Ricky; gre...@linuxfoundation.org; rdun...@infradead.org; >> philqua...@gmail.com; Arnd Bergmann >> Subject: Re: PATCH: rtsx_pci driver - don't disable the rts5229 card reader >> on >> Intel NUC boxes >> >> Sorry, I should have said that the patch is against 5.7.12. It applies to upstream, >> but with offsets. >> >> On 02/08/2020 20:48, Chris Clayton wrote: >>> bede03a579b3 introduced a bug which leaves the rts5229 PCI Express card >> reader on my Intel NUC6CAYH box disabled. >>> >>> The bug is in drivers/misc/cardreader/rtsx_pcr.c. A call to >>> rtsx_pci_init_ocp() >> was added to rtsx_pci_init_hw().
>>> At the call point, pcr->ops->init_ocp is NULL and pcr->option.ocp_en is 0, >>> so in >> rtsx_pci_init_ocp() the cardreader >>> gets disabled. >>> >>> I've avoided this by making execution of the code that disables the reader >> conditional on the device >>> not being an RTS5229. Of course, other rtsxxx card readers may also be >> disabled by this bug. I don't have the >>> knowledge to address that, so I'll leave it to the driver maintainers. >>> >>> The patch to avoid the bug is attached. >>> >>> Fixes: bede03a579b3 ("misc: rtsx: Enable OCP for rts522a rts524a rts525a >> rts5260") >>> Link: https://marc.info/?l=linux-kernel&m=159105912832257 >>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=204003 >>> Signed-off-by: Chris Clayton >>> >>> Chris >>> >> >> --Please consider the environment before printing this e-mail.
Re: PATCH: rtsx_pci driver - don't disable the rts5229 card reader on Intel NUC boxes
Sorry, I should have said that the patch is against 5.7.12. It applies to upstream, but with offsets. On 02/08/2020 20:48, Chris Clayton wrote: > bede03a579b3 introduced a bug which leaves the rts5229 PCI Express card > reader on my Intel NUC6CAYH box disabled. > > The bug is in drivers/misc/cardreader/rtsx_pcr.c. A call to > rtsx_pci_init_ocp() was added to rtsx_pci_init_hw(). > At the call point, pcr->ops->init_ocp is NULL and pcr->option.ocp_en is 0, so > in rtsx_pci_init_ocp() the cardreader > gets disabled. > > I've avoided this by making execution of the code that disables the reader > conditional on the device > not being an RTS5229. Of course, other rtsxxx card readers may also be > disabled by this bug. I don't have the > knowledge to address that, so I'll leave it to the driver maintainers. > > The patch to avoid the bug is attached. > > Fixes: bede03a579b3 ("misc: rtsx: Enable OCP for rts522a rts524a rts525a > rts5260") > Link: https://marc.info/?l=linux-kernel&m=159105912832257 > Link: https://bugzilla.kernel.org/show_bug.cgi?id=204003 > Signed-off-by: Chris Clayton > > Chris >
PATCH: rtsx_pci driver - don't disable the rts5229 card reader on Intel NUC boxes
bede03a579b3 introduced a bug which leaves the rts5229 PCI Express card reader on my Intel NUC6CAYH box disabled. The bug is in drivers/misc/cardreader/rtsx_pcr.c. A call to rtsx_pci_init_ocp() was added to rtsx_pci_init_hw(). At the call point, pcr->ops->init_ocp is NULL and pcr->option.ocp_en is 0, so in rtsx_pci_init_ocp() the cardreader gets disabled. I've avoided this by making execution of the code that disables the reader conditional on the device not being an RTS5229. Of course, other rtsxxx card readers may also be disabled by this bug. I don't have the knowledge to address that, so I'll leave it to the driver maintainers. The patch to avoid the bug is attached.
Fixes: bede03a579b3 ("misc: rtsx: Enable OCP for rts522a rts524a rts525a rts5260")
Link: https://marc.info/?l=linux-kernel&m=159105912832257
Link: https://bugzilla.kernel.org/show_bug.cgi?id=204003
Signed-off-by: Chris Clayton
Chris

--- linux-5.7.12/drivers/misc/cardreader/rtsx_pcr.c.orig	2020-08-02 13:36:50.216947944 +0100
+++ linux-5.7.12/drivers/misc/cardreader/rtsx_pcr.c	2020-08-02 18:37:30.456610731 +0100
@@ -1200,9 +1200,13 @@ void rtsx_pci_init_ocp(struct rtsx_pcr *
 				SD_OCP_GLITCH_MASK, pcr->hw_param.ocp_glitch);
 			rtsx_pci_enable_ocp(pcr);
 		} else {
-			/* OC power down */
-			rtsx_pci_write_register(pcr, FPDCTL, OC_POWER_DOWN,
-				OC_POWER_DOWN);
+			/* On (some?) Intel NUC platforms, this disables
+			 * the rts5229 cardreader, so don't do it
+			 */
+			if (!CHK_PCI_PID(pcr, 0x5229))
+				/* OC power down */
+				rtsx_pci_write_register(pcr, FPDCTL, OC_POWER_DOWN,
+					OC_POWER_DOWN);
 		}
 	}
 }
Re: Linux 5.3.6
On 12/10/2019 21:55, Gabriel C wrote: > On Sat, 12 Oct 2019 at 21:16, Chris Clayton wrote: >> >> >>> I'm announcing the release of the 5.3.6 kernel. >> >> >> 5.3.6 build fails here with: >> >> arch/x86/entry/vdso/vdso64.so.dbg: undefined symbols found >> CC arch/x86/kernel/cpu/mce/threshold.o >> make[3]: *** [arch/x86/entry/vdso/Makefile:59: >> arch/x86/entry/vdso/vdso64.so.dbg] Error 1 >> make[3]: *** Deleting file 'arch/x86/entry/vdso/vdso64.so.dbg' >> make[2]: *** [scripts/Makefile.build:497: arch/x86/entry/vdso] Error 2 >> make[1]: *** [scripts/Makefile.build:497: arch/x86/entry] Error 2 >> make[1]: *** Waiting for unfinished jobs >> > > What is your default linker ? > > Also does make LD=ld.bfd fixes that for you ? > Thanks, Gabriel. The default linker is gold, but your suggestion above fixed the build. I think I'll set the default to ld.bfd. > See https://bugzilla.kernel.org/show_bug.cgi?id=204951 > > BR, > > Gabriel C. >
Re: Linux 5.3.6
> I'm announcing the release of the 5.3.6 kernel. 5.3.6 build fails here with: arch/x86/entry/vdso/vdso64.so.dbg: undefined symbols found CC arch/x86/kernel/cpu/mce/threshold.o make[3]: *** [arch/x86/entry/vdso/Makefile:59: arch/x86/entry/vdso/vdso64.so.dbg] Error 1 make[3]: *** Deleting file 'arch/x86/entry/vdso/vdso64.so.dbg' make[2]: *** [scripts/Makefile.build:497: arch/x86/entry/vdso] Error 2 make[1]: *** [scripts/Makefile.build:497: arch/x86/entry] Error 2 make[1]: *** Waiting for unfinished jobs.... Chris Clayton
Re: [PATCH] timekeeping/vsyscall: Prevent math overflow in BOOTTIME update
Thanks Thomas. On 22/08/2019 12:00, Thomas Gleixner wrote: > The VDSO update for CLOCK_BOOTTIME has a overflow issue as it shifts the > nanoseconds based boot time offset left by the clocksource shift. That > overflows once the boot time offset becomes large enough. As a consequence > CLOCK_BOOTTIME in the VDSO becomes a random number causing applications to > misbehave. > > Fix it by storing a timespec64 representation of the offset when boot time > is adjusted and add that to the MONOTONIC base time value in the vdso data > page. Using the timespec64 representation avoids a 64bit division in the > update code. > I've tested resume from both suspend and hibernate and this patch fixes the problem I reported. Tested-by: Chris Clayton > Fixes: 44f57d788e7d ("timekeeping: Provide a generic update_vsyscall() > implementation") > Reported-by: Chris Clayton > Signed-off-by: Thomas Gleixner > --- > include/linux/timekeeper_internal.h |5 + > kernel/time/timekeeping.c |5 + > kernel/time/vsyscall.c | 22 +- > 3 files changed, 23 insertions(+), 9 deletions(-) > > --- a/include/linux/timekeeper_internal.h > +++ b/include/linux/timekeeper_internal.h > @@ -57,6 +57,7 @@ struct tk_read_base { > * @cs_was_changed_seq: The sequence number of clocksource change events > * @next_leap_ktime: CLOCK_MONOTONIC time value of a pending leap-second > * @raw_sec: CLOCK_MONOTONIC_RAW time in seconds > + * @monotonic_to_boot: CLOCK_MONOTONIC to CLOCK_BOOTTIME offset > * @cycle_interval: Number of clock cycles in one NTP interval > * @xtime_interval: Number of clock shifted nano seconds in one NTP > * interval. > @@ -84,6 +85,9 @@ struct tk_read_base { > * > * wall_to_monotonic is no longer the boot time, getboottime must be > * used instead. > + * > + * @monotonic_to_boottime is a timespec64 representation of @offs_boot to > + * accelerate the VDSO update for CLOCK_BOOTTIME. 
> */ > struct timekeeper { > struct tk_read_base tkr_mono; > @@ -99,6 +103,7 @@ struct timekeeper { > u8 cs_was_changed_seq; > ktime_t next_leap_ktime; > u64 raw_sec; > + struct timespec64 monotonic_to_boot; > > /* The following members are for timekeeping internal use */ > u64 cycle_interval; > --- a/kernel/time/timekeeping.c > +++ b/kernel/time/timekeeping.c > @@ -146,6 +146,11 @@ static void tk_set_wall_to_mono(struct t > static inline void tk_update_sleep_time(struct timekeeper *tk, ktime_t delta) > { > tk->offs_boot = ktime_add(tk->offs_boot, delta); > + /* > + * Timespec representation for VDSO update to avoid 64bit division > + * on every update. > + */ > + tk->monotonic_to_boot = ktime_to_timespec64(tk->offs_boot); > } > > /* > --- a/kernel/time/vsyscall.c > +++ b/kernel/time/vsyscall.c > @@ -17,7 +17,7 @@ static inline void update_vdso_data(stru > struct timekeeper *tk) > { > struct vdso_timestamp *vdso_ts; > - u64 nsec; > + u64 nsec, sec; > > vdata[CS_HRES_COARSE].cycle_last= tk->tkr_mono.cycle_last; > vdata[CS_HRES_COARSE].mask = tk->tkr_mono.mask; > @@ -45,23 +45,27 @@ static inline void update_vdso_data(stru > } > vdso_ts->nsec = nsec; > > - /* CLOCK_MONOTONIC_RAW */ > - vdso_ts = &vdata[CS_RAW].basetime[CLOCK_MONOTONIC_RAW]; > - vdso_ts->sec= tk->raw_sec; > - vdso_ts->nsec = tk->tkr_raw.xtime_nsec; > + /* Copy MONOTONIC time for BOOTTIME */ > + sec = vdso_ts->sec; > + /* Add the boot offset */ > + sec += tk->monotonic_to_boot.tv_sec; > + nsec+= (u64)tk->monotonic_to_boot.tv_nsec << tk->tkr_mono.shift; > > /* CLOCK_BOOTTIME */ > vdso_ts = &vdata[CS_HRES_COARSE].basetime[CLOCK_BOOTTIME]; > - vdso_ts->sec= tk->xtime_sec + tk->wall_to_monotonic.tv_sec; > - nsec = tk->tkr_mono.xtime_nsec; > - nsec += ((u64)(tk->wall_to_monotonic.tv_nsec + > -ktime_to_ns(tk->offs_boot)) << tk->tkr_mono.shift); > + vdso_ts->sec= sec; > + > while (nsec >= (((u64)NSEC_PER_SEC) << tk->tkr_mono.shift)) { > nsec -= (((u64)NSEC_PER_SEC) << tk->tkr_mono.shift); > vdso_ts->sec++; > 
} > vdso_ts->nsec = nsec; > > + /* CLOCK_MONOTONIC_RAW */ > + vdso_ts = &vdata[CS_RAW].basetime[CLOCK_MONOTONIC_RAW]; > + vdso_ts->sec= tk->raw_sec; > + vdso_ts->nsec = tk->tkr_raw.xtime_nsec; > + > /* CLOCK_TAI */ > vdso_ts = &vdata[CS_HRES_COARSE].basetime[CLOCK_TAI]; > vdso_ts->sec= tk->xtime_sec + (s64)tk->tai_offset; >
Re: PROBLEM: 5.3.0-rc* causes iwlwifi failure
Thanks, Stuart. On 18/08/2019 11:55, Stuart Little wrote: > On Sun, Aug 18, 2019 at 09:17:59AM +0100, Chris Clayton wrote: >> >> >> On 17/08/2019 22:44, Stuart Little wrote: >>> After some private coaching from Serge Belyshev on git-revert I can confirm >>> that reverting that commit atop the current tree resolves the issue (the >>> wifi card scans for and finds networks just fine, no dmesg errors reported, >>> etc.). >>> >> >> I've reported the "Microcode SW error detected" issue too, but, wrongly, >> only to LKML. I'll point that thread to this >> one. I've also been experiencing my network stopping working after suspend >> resume, but haven't got round to reporting >> that yet. >> >> What was the git magic that you acquired to revert the patch, please? >> By following the advice below, I reverted 4fd445a2c855bbcab81fbe06d110e78dbd974a5b and using the resultant kernel I haven't seen the "Microcode SW error detected" again. I am, however, still experiencing the problem of my network not working after resume from suspend. I've reported it to LKML, so it can be followed there should anyone need/want to. > > $ git revert > > This will fail as noted, but will place in a revert mode where you can fix > the errors. > > $ git status > > will show (it did in my case, for the latest Linux tree at the time I did > this) a modified file > > drivers/net/wireless/intel/iwlwifi/mvm/fw.c > > to be committed without issue and a conflicted file > > drivers/net/wireless/intel/iwlwifi/mvm/nvm.c > > whose conflicts you have to first resolve. > > I then opened that conflicted file in a text editor and simply removed > everything between the lines > > <<<<<<< HEAD > > and > >>>>>>>> parent of 4fd445a2c855... iwlwifi: mvm: Add log information about SAR >>>>>>>> status > > (inclusive). This resolved the conflict, whereupon > > $ git revert --continue > > and > > $ git commit -a > > will finish the reversion. 
> >>> On Sat, Aug 17, 2019 at 11:59:59AM +0300, Serge Belyshev wrote: >>>> >>>>> I am on an Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz running Linux >>>>> x86_64 (Slackware), with a custom-compiled 5.3.0-rc4 (.config >>>>> attached). >>>>> >>>>> I am using the Intel wifi adapter on this machine: >>>>> >>>>> 02:00.0 Network controller: Intel Corporation Device 24fb (rev 10) >>>>> >>>>> with the iwlwifi driver. I am attaching the output to 'lspci -vv -s >>>>> 02:00.0' as the file device-info. >>>>> >>>>> All 5.3.0-rc* versions I have tried (including rc4) cause multiple >>>>> dmesg iwlwifi-related errors (dmesg attached). Examples: >>>>> >>>>> iwlwifi :02:00.0: Failed to get geographic profile info -5 >>>>> iwlwifi :02:00.0: Microcode SW error detected. Restarting 0x8200 >>>>> iwlwifi :02:00.0: 0x0038 | BAD_COMMAND >>>>> >>>> >>>> I have my logs filled with similar garbage throughout 5.3-rc*. Also >>>> since 5.3-rcsomething not only it WARNS in dmesg about firmware failure, >>>> but completely stops working after suspend/resume cycle. >>>> >>>> It looks like that: >>>> >>>> commit 4fd445a2c855bbcab81fbe06d110e78dbd974a5b >>>> Author: Haim Dreyfuss >>>> Date: Thu May 2 11:45:02 2019 +0300 >>>> >>>> iwlwifi: mvm: Add log information about SAR status >>>> >>>> Inform users when SAR status is changing. >>>> >>>> Signed-off-by: Haim Dreyfuss >>>> Signed-off-by: Luca Coelho >>>> >>>> >>>> is the culprit. (manually) reverting it on top of 5.3-rc4 makes >>>> everything work again. >>>
Regression in 5.3-rc1 and later
Hi everyone, Firstly, apologies to anyone on the long cc list who turns out not to be particularly interested in the following, but you were all marked as cc'd in the commit message below. I've found a problem that isn't present in the 5.2 or 4.19 series kernels and seems to have arrived in 5.3-rc1. The problem is that if I suspend my laptop (to RAM) and resume it 14 minutes or more later, I have no networking functionality. If I resume the laptop after 13 minutes or less, networking works fine. I haven't tried to get finer-grained timings between 13 and 14 minutes, but can do so if it would help. ifconfig shows that wlan0 is still up and still has its assigned IP address but, for instance, a ping of any other device on my network fails, as does pinging, say, kernel.org. I've tried "downing" the network (with /sbin/ifdown), unloading the iwlmvm module, then reloading the module and "upping" the network (with /sbin/ifup), but my network is still unusable. I should add that the problem also manifests if I hibernate the laptop, although my testing of this has been minimal. I can do more if required. As I say, the problem first appears in 5.3-rc1, so I've bisected between 5.2.0 and 5.3-rc1 and that concluded with: [chris:~/kernel/linux]$ git bisect good 7ac8707479886c75f353bfb6a8273f423cfccb23 is the first bad commit commit 7ac8707479886c75f353bfb6a8273f423cfccb23 Author: Vincenzo Frascino Date: Fri Jun 21 10:52:49 2019 +0100 x86/vdso: Switch to generic vDSO implementation The x86 vDSO library requires some adaptations to take advantage of the newly introduced generic vDSO library.
Introduce the following changes: - Modification of vdso.c to be compliant with the common vdso datapage - Use of lib/vdso for gettimeofday [ tglx: Massaged changelog and cleaned up the function signature formatting ] Signed-off-by: Vincenzo Frascino Signed-off-by: Thomas Gleixner Cc: linux-a...@vger.kernel.org Cc: linux-arm-ker...@lists.infradead.org Cc: linux-m...@vger.kernel.org Cc: linux-kselft...@vger.kernel.org Cc: Catalin Marinas Cc: Will Deacon Cc: Arnd Bergmann Cc: Russell King Cc: Ralf Baechle Cc: Paul Burton Cc: Daniel Lezcano Cc: Mark Salyzyn Cc: Peter Collingbourne Cc: Shuah Khan Cc: Dmitry Safonov <0x7f454...@gmail.com> Cc: Rasmus Villemoes Cc: Huw Davies Cc: Shijith Thotton Cc: Andre Przywara Link: https://lkml.kernel.org/r/20190621095252.32307-23-vincenzo.frasc...@arm.com arch/x86/Kconfig | 3 + arch/x86/entry/vdso/Makefile | 9 ++ arch/x86/entry/vdso/vclock_gettime.c | 245 --- arch/x86/entry/vdso/vdsox32.lds.S| 1 + arch/x86/entry/vsyscall/Makefile | 2 - arch/x86/entry/vsyscall/vsyscall_gtod.c | 83 --- arch/x86/include/asm/pvclock.h | 2 +- arch/x86/include/asm/vdso/gettimeofday.h | 191 arch/x86/include/asm/vdso/vsyscall.h | 44 ++ arch/x86/include/asm/vgtod.h | 75 +- arch/x86/include/asm/vvar.h | 7 +- arch/x86/kernel/pvclock.c| 1 + 12 files changed, 284 insertions(+), 379 deletions(-) delete mode 100644 arch/x86/entry/vsyscall/vsyscall_gtod.c create mode 100644 arch/x86/include/asm/vdso/gettimeofday.h create mode 100644 arch/x86/include/asm/vdso/vsyscall.h To confirm my bisection was correct, I did a git checkout of 7ac8707479886c75f353bfb6a8273f423cfccb2. As expected, the kernel exhibited the problem I've described. However, a kernel built at the immediately preceding (parent?) commit (bfe801ebe84f42b4666d3f0adde90f504d56e35b) has a working network after a (>= 14minute) suspend/resume cycle. As the module name implies, I'm using wireless networking. The hardware is detected as "Intel(R) Wireless-AC 9260 160MHz, REV=0x324" by iwlwifi. 
I'm more than happy to provide additional diagnostics (but may need a little hand-holding) and to apply diagnostic or fix patches, but please cc me on any reply as I'm not subscribed to any of the kernel-related mailing lists. Chris
Re: iwlwifi: microcode SW error detected
On 18/08/2019 09:21, Chris Clayton wrote: > > > On 17/08/2019 08:19, Chris Clayton wrote: >> Hi. >> >> I just found the following error in the output from dmesg. >> >> [ 4023.460058] iwlwifi :02:00.0: Microcode SW error detected. Restarting >> 0x0. > > Since reporting, I've found that this problem is being explored in the thread > that starts at > https://marc.info/?l=linux-kernel&m=15660151913. Mmm, that's a dead link. I don't know what happened there, but the real link is https://marc.info/?l=linux-kernel&m=156265244614126 > > Chris >
linux-5.3.0-rc5: new build warning
Hi, I've just built 5.3.0-rc5 and a warning that I do not recall having seen before was emitted: ... HOSTCC scripts/extract-cert HOSTCC /mnt/kernel/linux/tools/objtool/fixdep.o HOSTLD arch/x86/tools/relocs HOSTLD /mnt/kernel/linux/tools/objtool/fixdep-in.o LINK /mnt/kernel/linux/tools/objtool/fixdep CC /mnt/kernel/linux/tools/objtool/builtin-check.o CC /mnt/kernel/linux/tools/objtool/builtin-orc.o GEN /mnt/kernel/linux/tools/objtool/arch/x86/lib/inat-tables.c awk: arch/x86/tools/gen-insn-attr-x86.awk:260: warning: regexp escape sequence `\:' is not a known regexp operator awk: arch/x86/tools/gen-insn-attr-x86.awk:350: (FILENAME=arch/x86/lib/x86-opcode-map.txt FNR=41) warning: regexp escape sequence `\&' is not a known regexp operator CC /mnt/kernel/linux/tools/objtool/exec-cmd.o CC /mnt/kernel/linux/tools/objtool/check.o CC /mnt/kernel/linux/tools/objtool/arch/x86/decode.o CC /mnt/kernel/linux/tools/objtool/orc_gen.o CC /mnt/kernel/linux/tools/objtool/help.o CC /mnt/kernel/linux/tools/objtool/orc_dump.o CC /mnt/kernel/linux/tools/objtool/pager.o ... Happy to test the fix, but please cc me as I'm not subscribed Chris
Re: iwlwifi: microcode SW error detected
On 17/08/2019 08:19, Chris Clayton wrote: > Hi. > > I just found the following error in the output from dmesg. > > [ 4023.460058] iwlwifi :02:00.0: Microcode SW error detected. Restarting > 0x0. Since reporting, I've found that this problem is being explored in the thread that starts at https://marc.info/?l=linux-kernel&m=15660151913. Chris
Re: PROBLEM: 5.3.0-rc* causes iwlwifi failure
On 17/08/2019 22:44, Stuart Little wrote: > After some private coaching from Serge Belyshev on git-revert I can confirm > that reverting that commit atop the current tree resolves the issue (the wifi > card scans for and finds networks just fine, no dmesg errors reported, etc.). > I've reported the "Microcode SW error detected" issue too, but, wrongly, only to LKML. I'll point that thread to this one. I've also been experiencing my network stopping working after suspend/resume, but haven't got round to reporting that yet. What was the git magic that you acquired to revert the patch, please? > On Sat, Aug 17, 2019 at 11:59:59AM +0300, Serge Belyshev wrote: >> >>> I am on an Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz running Linux >>> x86_64 (Slackware), with a custom-compiled 5.3.0-rc4 (.config >>> attached). >>> >>> I am using the Intel wifi adapter on this machine: >>> >>> 02:00.0 Network controller: Intel Corporation Device 24fb (rev 10) >>> >>> with the iwlwifi driver. I am attaching the output to 'lspci -vv -s >>> 02:00.0' as the file device-info. >>> >>> All 5.3.0-rc* versions I have tried (including rc4) cause multiple >>> dmesg iwlwifi-related errors (dmesg attached). Examples: >>> >>> iwlwifi :02:00.0: Failed to get geographic profile info -5 >>> iwlwifi :02:00.0: Microcode SW error detected. Restarting 0x8200 >>> iwlwifi :02:00.0: 0x0038 | BAD_COMMAND >>> >> >> I have my logs filled with similar garbage throughout 5.3-rc*. Also >> since 5.3-rcsomething not only it WARNS in dmesg about firmware failure, >> but completely stops working after suspend/resume cycle. >> >> It looks like that: >> >> commit 4fd445a2c855bbcab81fbe06d110e78dbd974a5b >> Author: Haim Dreyfuss >> Date: Thu May 2 11:45:02 2019 +0300 >> >> iwlwifi: mvm: Add log information about SAR status >> >> Inform users when SAR status is changing. >> >> Signed-off-by: Haim Dreyfuss >> Signed-off-by: Luca Coelho >> >> >> is the culprit.
(manually) reverting it on top of 5.3-rc4 makes >> everything work again. >
iwlwifi: microcode SW error detected
Hi. I just found the following error in the output from dmesg.

[ 4023.460058] iwlwifi :02:00.0: Microcode SW error detected. Restarting 0x0.
[ 4023.460178] iwlwifi :02:00.0: Start IWL Error Log Dump:
[ 4023.460179] iwlwifi :02:00.0: Status: 0x0080, count: 6
[ 4023.460180] iwlwifi :02:00.0: Loaded firmware version: 46.93e59cf4.0
[ 4023.460181] iwlwifi :02:00.0: 0x22CE | ADVANCED_SYSASSERT
[ 4023.460182] iwlwifi :02:00.0: 0x0590A2F0 | trm_hw_status0
[ 4023.460182] iwlwifi :02:00.0: 0x | trm_hw_status1
[ 4023.460183] iwlwifi :02:00.0: 0x00488472 | branchlink2
[ 4023.460183] iwlwifi :02:00.0: 0x00479392 | interruptlink1
[ 4023.460184] iwlwifi :02:00.0: 0x | interruptlink2
[ 4023.460184] iwlwifi :02:00.0: 0x012C | data1
[ 4023.460185] iwlwifi :02:00.0: 0x | data2
[ 4023.460186] iwlwifi :02:00.0: 0x0400 | data3
[ 4023.460186] iwlwifi :02:00.0: 0x42001A44 | beacon time
[ 4023.460187] iwlwifi :02:00.0: 0x4E9F05CD | tsf low
[ 4023.460187] iwlwifi :02:00.0: 0x00D8 | tsf hi
[ 4023.460188] iwlwifi :02:00.0: 0x | time gp1
[ 4023.460188] iwlwifi :02:00.0: 0xEF55F6D0 | time gp2
[ 4023.460189] iwlwifi :02:00.0: 0x0001 | uCode revision type
[ 4023.460190] iwlwifi :02:00.0: 0x002E | uCode version major
[ 4023.460190] iwlwifi :02:00.0: 0x93E59CF4 | uCode version minor
[ 4023.460191] iwlwifi :02:00.0: 0x0321 | hw version
[ 4023.460191] iwlwifi :02:00.0: 0x00C89004 | board version
[ 4023.460192] iwlwifi :02:00.0: 0x0A05001C | hcmd
[ 4023.460192] iwlwifi :02:00.0: 0xA2F93802 | isr0
[ 4023.460193] iwlwifi :02:00.0: 0x0004 | isr1
[ 4023.460193] iwlwifi :02:00.0: 0x1802 | isr2
[ 4023.460194] iwlwifi :02:00.0: 0x40417DCD | isr3
[ 4023.460195] iwlwifi :02:00.0: 0x | isr4
[ 4023.460195] iwlwifi :02:00.0: 0x0A04001C | last cmd Id
[ 4023.460196] iwlwifi :02:00.0: 0x00018802 | wait_event
[ 4023.460196] iwlwifi :02:00.0: 0x4A88 | l2p_control
[ 4023.460197] iwlwifi :02:00.0: 0x0020 | l2p_duration
[ 4023.460197] iwlwifi :02:00.0: 0x03BF | l2p_mhvalid
[ 4023.460198] iwlwifi :02:00.0: 0x00EF | l2p_addr_match
[ 4023.460198] iwlwifi :02:00.0: 0x000D | lmpm_pmg_sel
[ 4023.460199] iwlwifi :02:00.0: 0x19071250 | timestamp
[ 4023.460199] iwlwifi :02:00.0: 0x14C0E8E8 | flow_handler
[ 4023.460257] iwlwifi :02:00.0: 0x | ADVANCED_SYSASSERT
[ 4023.460257] iwlwifi :02:00.0: 0x | umac branchlink1
[ 4023.460258] iwlwifi :02:00.0: 0x | umac branchlink2
[ 4023.460258] iwlwifi :02:00.0: 0x | umac interruptlink1
[ 4023.460259] iwlwifi :02:00.0: 0x | umac interruptlink2
[ 4023.460260] iwlwifi :02:00.0: 0x | umac data1
[ 4023.460260] iwlwifi :02:00.0: 0x | umac data2
[ 4023.460261] iwlwifi :02:00.0: 0x | umac data3
[ 4023.460261] iwlwifi :02:00.0: 0x | umac major
[ 4023.460262] iwlwifi :02:00.0: 0x | umac minor
[ 4023.460262] iwlwifi :02:00.0: 0x | frame pointer
[ 4023.460263] iwlwifi :02:00.0: 0x | stack pointer
[ 4023.460263] iwlwifi :02:00.0: 0x | last host cmd
[ 4023.460264] iwlwifi :02:00.0: 0x | isr status reg
[ 4023.460278] iwlwifi :02:00.0: Fseq Registers:
[ 4023.460282] iwlwifi :02:00.0: 0x0568FC22 | FSEQ_ERROR_CODE
[ 4023.460289] iwlwifi :02:00.0: 0x | FSEQ_TOP_INIT_VERSION
[ 4023.460297] iwlwifi :02:00.0: 0xDFFC324F | FSEQ_CNVIO_INIT_VERSION
[ 4023.460304] iwlwifi :02:00.0: 0xA371 | FSEQ_OTP_VERSION
[ 4023.460312] iwlwifi :02:00.0: 0xC338B29A | FSEQ_TOP_CONTENT_VERSION
[ 4023.460319] iwlwifi :02:00.0: 0xD9E91E16 | FSEQ_ALIVE_TOKEN
[ 4023.460327] iwlwifi :02:00.0: 0xAC99E6BF | FSEQ_CNVI_ID
[ 4023.460334] iwlwifi :02:00.0: 0x07665623 | FSEQ_CNVR_ID
[ 4023.460342] iwlwifi :02:00.0: 0x01000200 | CNVI_AUX_MISC_CHIP
[ 4023.460349] iwlwifi :02:00.0: 0x01300202 | CNVR_AUX_MISC_CHIP
[ 4023.460357] iwlwifi :02:00.0: 0x485B | CNVR_SCU_SD_REGS_SD_REG_DIG_DCDC_VTRIM
[ 4023.460413] iwlwifi :02:00.0: 0x0BADCAFE | CNVR_SCU_SD_REGS_SD_REG_ACTIVE_VDIG_MIRROR
[ 4023.460421] iwlwifi :02:00.0: Collecting data: trigger 2 fired.
[ 4023.460424] ieee80211 phy0: Hardware restart was requested
[ 4024.639366] iwlwifi :02:00.0: Applying debug destination EXTERNAL_DRAM
[ 4024.753171] iwlwifi :02:00.0: Applying debug destination EXTERNAL_DRAM
[ 4024.817999] iwlwifi :02:00.0: FW already configured (0) - re-configuring
[ 4024.829374] iwlwifi :02:00.0: BIOS contains WGDS but no WRDS

The output messages from the driver when the system starts are:

[3.667365] iwlwifi :02:00.0: enabling device ( -> 0002)
[3.670357] iwlwifi :02:00.0: Found debug destination: EXTERNAL_DRAM
[3.670360] iwlwifi :02:00.0: Found debug configuration: 0
[3.670525] iwlwifi 000
Re: [PATCH v2] x86/boot: save fields explicitly, zero out everything else
On 31/07/2019 06:46, john.hubb...@gmail.com wrote:
> From: John Hubbard
>
> Recent gcc compilers (gcc 9.1) generate warnings about an
> out of bounds memset, if you trying memset across several fields
> of a struct. This generated a couple of warnings on x86_64 builds.
>
> Fix this by explicitly saving the fields in struct boot_params
> that are intended to be preserved, and zeroing all the rest.
>

I applied John's patch below to v5.3-rc3-285-gecb095bff5d4 and have been running the resultant kernel for two days now, including 7 or 8 cold starts and reboots. The warnings that were produced by gcc9 are no longer emitted and, other than a pre-existing problem (no network after resume from suspend or hibernate, which I will investigate and, if necessary, report later today), the kernel has supported my typical day-to-day activities (building software, email, browsing, listening to music, watching video) without problem.

Tested-by: Chris Clayton

> Suggested-by: Thomas Gleixner
> Suggested-by: H. Peter Anvin
> Signed-off-by: John Hubbard
> ---
>  arch/x86/include/asm/bootparam_utils.h | 62 +++---
>  1 file changed, 47 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/include/asm/bootparam_utils.h b/arch/x86/include/asm/bootparam_utils.h
> index 101eb944f13c..514aee24b8de 100644
> --- a/arch/x86/include/asm/bootparam_utils.h
> +++ b/arch/x86/include/asm/bootparam_utils.h
> @@ -18,6 +18,20 @@
>   * Note: efi_info is commonly left uninitialized, but that field has a
>   * private magic, so it is better to leave it unchanged.
>   */
> +
> +#define sizeof_mbr(type, member) ({ sizeof(((type *)0)->member); })
> +
> +#define BOOT_PARAM_PRESERVE(struct_member)				\
> +	{								\
> +		.start = offsetof(struct boot_params, struct_member),	\
> +		.len   = sizeof_mbr(struct boot_params, struct_member),	\
> +	}
> +
> +struct boot_params_to_save {
> +	unsigned int start;
> +	unsigned int len;
> +};
> +
>  static void sanitize_boot_params(struct boot_params *boot_params)
>  {
>  	/*
> @@ -35,21 +49,39 @@ static void sanitize_boot_params(struct boot_params *boot_params)
>  	 * problems again.
>  	 */
>  	if (boot_params->sentinel) {
> -		/* fields in boot_params are left uninitialized, clear them */
> -		boot_params->acpi_rsdp_addr = 0;
> -		memset(&boot_params->ext_ramdisk_image, 0,
> -		       (char *)&boot_params->efi_info -
> -			(char *)&boot_params->ext_ramdisk_image);
> -		memset(&boot_params->kbd_status, 0,
> -		       (char *)&boot_params->hdr -
> -		       (char *)&boot_params->kbd_status);
> -		memset(&boot_params->_pad7[0], 0,
> -		       (char *)&boot_params->edd_mbr_sig_buffer[0] -
> -			(char *)&boot_params->_pad7[0]);
> -		memset(&boot_params->_pad8[0], 0,
> -		       (char *)&boot_params->eddbuf[0] -
> -			(char *)&boot_params->_pad8[0]);
> -		memset(&boot_params->_pad9[0], 0, sizeof(boot_params->_pad9));
> +		static struct boot_params scratch;
> +		char *bp_base = (char *)boot_params;
> +		char *save_base = (char *)&scratch;
> +		int i;
> +
> +		const struct boot_params_to_save to_save[] = {
> +			BOOT_PARAM_PRESERVE(screen_info),
> +			BOOT_PARAM_PRESERVE(apm_bios_info),
> +			BOOT_PARAM_PRESERVE(tboot_addr),
> +			BOOT_PARAM_PRESERVE(ist_info),
> +			BOOT_PARAM_PRESERVE(acpi_rsdp_addr),
> +			BOOT_PARAM_PRESERVE(hd0_info),
> +			BOOT_PARAM_PRESERVE(hd1_info),
> +			BOOT_PARAM_PRESERVE(sys_desc_table),
> +			BOOT_PARAM_PRESERVE(olpc_ofw_header),
> +			BOOT_PARAM_PRESERVE(efi_info),
> +			BOOT_PARAM_PRESERVE(alt_mem_k),
> +			BOOT_PARAM_PRESERVE(scratch),
> +			BOOT_PARAM_PRESERVE(e820_entries),
> +			BOOT_PARAM_PRESERVE(eddbuf_entries),
> +			BOOT_PARAM_PRESERVE(edd_mbr_sig_buf_entries),
> +			BOOT_PARAM_PRESERVE(edd_mbr_sig_buffer),
> +			BOOT_PARAM_PRESERVE(e820_table),
> +			BOOT_PARAM_PRESERVE(eddbuf),
> +		};
> +
> +		memset(&scratch, 0, sizeof(scratch));
> +
> +		for (i = 0; i < ARRAY_SIZE(to_save); i++)
> +			memcpy(save_base + to_save[i].start,
> +			       bp_base + to_save[i].start, to_save[i].len);
> +
> +		memcpy(boot_params, save_base, sizeof(*boot_params));
>  	}
>  }
>
>
Re: Warnings whilst building 5.2.0+
On 09/07/2019 12:39, Chris Clayton wrote:
>
>
> On 09/07/2019 11:37, Enrico Weigelt, metux IT consult wrote:
>> On 09.07.19 08:06, Chris Clayton wrote:
>>
>> Hi,
>>
>>> I've pulled Linus' tree this morning and, after running 'make oldconfig',
>>> tried a build. During that build I got the
>>> following warnings, which look to me like they should be fixed. 'git
>>> describe' shows v5.2-915-g5ad18b2e60b7 and my
>>> compiler is the 20190706 snapshot of gcc 9.
>>
>> Thanks for the report. I'm rebuilding right know anyways, so I'll look
>> out for it.
>
> Thanks for the reply.
>
>>> In file included from arch/x86/kernel/head64.c:35:
>>> In function 'sanitize_boot_params',
>>>     inlined from 'copy_bootdata' at arch/x86/kernel/head64.c:391:2:
>>> ./arch/x86/include/asm/bootparam_utils.h:40:3: warning: 'memset' offset
>>> [197, 448] from the object at 'boot_params' is
>>> out of the bounds of referenced subobject 'ext_ramdisk_image' with type
>>> 'unsigned int' at offset 192 [-Warray-bounds]
>>>    40 |   memset(&boot_params->ext_ramdisk_image, 0,
>>>       |   ^~
>>>    41 |          (char *)&boot_params->efi_info -
>>>       |
>>>    42 |          (char *)&boot_params->ext_ramdisk_image);
>>>       |
>>> ./arch/x86/include/asm/bootparam_utils.h:43:3: warning: 'memset' offset
>>> [493, 497] from the object at 'boot_params' is
>>> out of the bounds of referenced subobject 'kbd_status' with type 'unsigned
>>> char' at offset 491 [-Warray-bounds]
>>>    43 |   memset(&boot_params->kbd_status, 0,
>>>       |   ^~~
>>>    44 |          (char *)&boot_params->hdr -
>>>       |          ~~~
>>>    45 |          (char *)&boot_params->kbd_status);
>>>       |          ~
>>
>> Can you check older versions, too ? Maybe also trying older gcc ?
>>
>
> I see the same warnings building linux-5.2.0 with gcc9. However, I don't see the warnings building linux-5.2.0 with the
> the 20190705 of gcc8. So the warnings could result from an improvement (i.e. the problem was in the kernel, but
> undiscovered by gcc8) or from a regression in gcc9.
>

From the discussion starting at https://marc.info/?l=linux-kernel&m=156401014023908, it would appear that the problem is in the kernel and is simply not detected by gcc8. Building a fresh pull of Linus' tree this morning (v5.3-rc3-282-g33920f1ec5bf), I see that the warnings are still being emitted.

Adding the participants in the other discussion to this one.

>>
>> --mtx
>>
Re: Warnings whilst building 5.2.0+
On 09/07/2019 11:37, Enrico Weigelt, metux IT consult wrote:
> On 09.07.19 08:06, Chris Clayton wrote:
>
> Hi,
>
>> I've pulled Linus' tree this morning and, after running 'make oldconfig',
>> tried a build. During that build I got the
>> following warnings, which look to me like they should be fixed. 'git
>> describe' shows v5.2-915-g5ad18b2e60b7 and my
>> compiler is the 20190706 snapshot of gcc 9.
>
> Thanks for the report. I'm rebuilding right know anyways, so I'll look
> out for it.

Thanks for the reply.

>> In file included from arch/x86/kernel/head64.c:35:
>> In function 'sanitize_boot_params',
>>     inlined from 'copy_bootdata' at arch/x86/kernel/head64.c:391:2:
>> ./arch/x86/include/asm/bootparam_utils.h:40:3: warning: 'memset' offset
>> [197, 448] from the object at 'boot_params' is
>> out of the bounds of referenced subobject 'ext_ramdisk_image' with type
>> 'unsigned int' at offset 192 [-Warray-bounds]
>>    40 |   memset(&boot_params->ext_ramdisk_image, 0,
>>       |   ^~
>>    41 |          (char *)&boot_params->efi_info -
>>       |
>>    42 |          (char *)&boot_params->ext_ramdisk_image);
>>       |
>> ./arch/x86/include/asm/bootparam_utils.h:43:3: warning: 'memset' offset
>> [493, 497] from the object at 'boot_params' is
>> out of the bounds of referenced subobject 'kbd_status' with type 'unsigned
>> char' at offset 491 [-Warray-bounds]
>>    43 |   memset(&boot_params->kbd_status, 0,
>>       |   ^~~
>>    44 |          (char *)&boot_params->hdr -
>>       |          ~~~
>>    45 |          (char *)&boot_params->kbd_status);
>>       |          ~
>
> Can you check older versions, too ? Maybe also trying older gcc ?
>

I see the same warnings building linux-5.2.0 with gcc9. However, I don't see the warnings building linux-5.2.0 with the 20190705 snapshot of gcc8. So the warnings could result from an improvement (i.e. the problem was in the kernel, but undiscovered by gcc8) or from a regression in gcc9.

>
> --mtx
>
Warnings whilst building 5.2.0+
Hi,

I've pulled Linus' tree this morning and, after running 'make oldconfig', tried a build. During that build I got the following warnings, which look to me like they should be fixed. 'git describe' shows v5.2-915-g5ad18b2e60b7 and my compiler is the 20190706 snapshot of gcc 9.

In file included from arch/x86/kernel/head64.c:35:
In function 'sanitize_boot_params',
    inlined from 'copy_bootdata' at arch/x86/kernel/head64.c:391:2:
./arch/x86/include/asm/bootparam_utils.h:40:3: warning: 'memset' offset [197, 448] from the object at 'boot_params' is out of the bounds of referenced subobject 'ext_ramdisk_image' with type 'unsigned int' at offset 192 [-Warray-bounds]
   40 |   memset(&boot_params->ext_ramdisk_image, 0,
      |   ^~
   41 |          (char *)&boot_params->efi_info -
      |
   42 |          (char *)&boot_params->ext_ramdisk_image);
      |
./arch/x86/include/asm/bootparam_utils.h:43:3: warning: 'memset' offset [493, 497] from the object at 'boot_params' is out of the bounds of referenced subobject 'kbd_status' with type 'unsigned char' at offset 491 [-Warray-bounds]
   43 |   memset(&boot_params->kbd_status, 0,
      |   ^~~
   44 |          (char *)&boot_params->hdr -
      |          ~~~
   45 |          (char *)&boot_params->kbd_status);
      |          ~

Happy to test any patches, but please cc me as I'm not subscribed to LKML.

Chris
Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)
On 11/10/2018 13:23, Maciej S. Szmigiero wrote: > On 11.10.2018 10:24, Chris Clayton wrote: >> On 11/10/2018 01:12, Maciej S. Szmigiero wrote: >>> On 11.10.2018 00:49, Chris Clayton wrote: >>>>> Now, knowing the "right" value you can experiment with what >>>>> rtl_init_rxcfg() >>>>> writes (under the "default:" label for your NIC model). >>>>> >>>> >>>> This might be more interesting. Through a combination of viewing the >>>> output from pr_notice() and the output from >>>> "ethtool -d", I can see RxConfig with the following values >>>> >>>>During boot:0x00028700 >>>>Before suspend: 0x0002870e >>>>During resume: 0x00024000 >>>>Post resume:0x0002870e >>>> >>>> As I did with 4.18.10 early on in the process, I removed the call to >>>> rtl_init_rxcfg() from rtl_hw_start() and rebuilt, >>>> installed and rebooted. Now I see the following values: >>>> >>>>During boot:0x00028700 >>>>Before suspend: 0x0002870e >>>>During resume: 0x00024000 >>>>Post resume:0x0002400e >>>> >>> >>> Now we can finally see some difference... >>> Besides missing RX128_INT_EN (bit 15 or 0x8000) and RX_DMA_BURST >>> (bits 8-10 or 0x700) - that rtl_init_rxcfg() would normally set so this >>> is kind of expected - one can see that the working configuration >>> post-resume has bit 14 (or 0x4000) set, too. >>> >>> This bit is described in the driver as RX_MULTI_EN ("8111c only") and is >>> set by rtl_init_rxcfg() for example for RTL_GIGA_MAC_VER_35. >>> >>> RTL_GIGA_MAC_VER_35 is described in the driver as being in the same >>> family as your RTL_GIGA_MAC_VER_38, so can you please try the following >>> change: >>> --- r8169.c >>> +++ r8169.c >>> @@ -4271,6 +4271,7 @@ static void rtl_init_rxcfg(struct rtl816 >>> case RTL_GIGA_MAC_VER_18 ... RTL_GIGA_MAC_VER_24: >>> case RTL_GIGA_MAC_VER_34: >>> case RTL_GIGA_MAC_VER_35: >>> + case RTL_GIGA_MAC_VER_38: >>> RTL_W32(tp, RxConfig, RX128_INT_EN | RX_MULTI_EN | >>> RX_DMA_BURST); >>> break; >>> case RTL_GIGA_MAC_VER_40 ... 
RTL_GIGA_MAC_VER_51: >>> >>> This will add RX_MULTI_EN also for your chip model (you need to add back >>> the call to rtl_init_rxcfg() to rtl_hw_start(), naturally). >>> >> >> That's done the trick. With the above change applied, my network runs >> running fine after a suspend/resume cycle and the >> ping times are back in the 14-15ms range. > > Nice! > > I will submit a patch, it would be great if you could test it and then > add a "Tested-by:" tag. > Will do, Maciej. Thanks for solving this. >> Chris > > Maciej >
Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)
On 11/10/2018 01:12, Maciej S. Szmigiero wrote: > On 11.10.2018 00:49, Chris Clayton wrote: >>> Now, knowing the "right" value you can experiment with what rtl_init_rxcfg() >>> writes (under the "default:" label for your NIC model). >>> >> >> This might be more interesting. Through a combination of viewing the output >> from pr_notice() and the output from >> "ethtool -d", I can see RxConfig with the following values >> >> During boot:0x00028700 >> Before suspend: 0x0002870e >> During resume: 0x00024000 >> Post resume:0x0002870e >> >> As I did with 4.18.10 early on in the process, I removed the call to >> rtl_init_rxcfg() from rtl_hw_start() and rebuilt, >> installed and rebooted. Now I see the following values: >> >> During boot:0x00028700 >> Before suspend: 0x0002870e >> During resume: 0x00024000 >> Post resume:0x0002400e >> > > Now we can finally see some difference... > Besides missing RX128_INT_EN (bit 15 or 0x8000) and RX_DMA_BURST > (bits 8-10 or 0x700) - that rtl_init_rxcfg() would normally set so this > is kind of expected - one can see that the working configuration > post-resume has bit 14 (or 0x4000) set, too. > > This bit is described in the driver as RX_MULTI_EN ("8111c only") and is > set by rtl_init_rxcfg() for example for RTL_GIGA_MAC_VER_35. > > RTL_GIGA_MAC_VER_35 is described in the driver as being in the same > family as your RTL_GIGA_MAC_VER_38, so can you please try the following > change: > --- r8169.c > +++ r8169.c > @@ -4271,6 +4271,7 @@ static void rtl_init_rxcfg(struct rtl816 > case RTL_GIGA_MAC_VER_18 ... RTL_GIGA_MAC_VER_24: > case RTL_GIGA_MAC_VER_34: > case RTL_GIGA_MAC_VER_35: > + case RTL_GIGA_MAC_VER_38: > RTL_W32(tp, RxConfig, RX128_INT_EN | RX_MULTI_EN | > RX_DMA_BURST); > break; > case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_51: > > This will add RX_MULTI_EN also for your chip model (you need to add back > the call to rtl_init_rxcfg() to rtl_hw_start(), naturally). > That's done the trick. 
With the above change applied, my network runs fine after a suspend/resume cycle and the ping times are back in the 14-15ms range.

Chris

> If this does not help then I would try another values in the above write:
> 1) RTL_W32(tp, RxConfig, 0x00024000);
> 2) RTL_W32(tp, RxConfig, 0x4000);
> 3) RTL_W32(tp, RxConfig, RX_DMA_BURST);
> 4) RTL_W32(tp, RxConfig, RX128_INT_EN);
>
>> Chris
>
> Maciej
>
Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)
OK, right kernel/module used this time. Please see findings below. On 10/10/2018 01:24, Maciej S. Szmigiero wrote: > On 09.10.2018 22:36, Heiner Kallweit wrote: >> On 09.10.2018 16:40, Chris Clayton wrote: >>> Thanks to Maciej and Heiner for their replies. >>> >>> On 09/10/2018 13:32, Maciej S. Szmigiero wrote: >>>> On 07.10.2018 21:36, Chris Clayton wrote: >>>>> Hi again, >>>>> >>>>> I didn't think there was anything in 4.19-rc7 to fix this regression, but >>>>> tried it anyway. I can confirm that the >>>>> regression is still present and my network still fails when, after a >>>>> resume from suspend (to ram or disk), I open my >>>>> browser or my mail client. In both those cases the failure is almost >>>>> immediate - e.g. my home page doesn't get displayed >>>>> in the browser. Pinging one of my ISPs name servers doesn't fail quite so >>>>> quickly but the reported time increases from >>>>> 14-15ms to more than 1000ms. >>>> >>>> You can try comparing chip registers (ethtool -d eth0) in the working >>>> state (before a suspend) and in the broken state (after a resume). >>>> Maybe there will be some obvious in the difference. >>>> >>>> The same goes for the PCI configuration (lspci -d :8168 -vv). >>>> >>> Maciej suggested comparing the output from lspci -vv for the ethernet >>> device. They are identical. >>> >>> Both Maciej and Heiner suggested comparing the output from "ethtool -d" pre >>> and post suspend. Again, they are identical. >>> Heiner specifically suggested looking at the RxConfig. The value of that is >>> 0x0002870e both pre and post suspend. >>> >> Hmm, this is very weird, especially taking into account that in your original >> report you state that removing the call to rtl_init_rxcfg() from >> rtl_hw_start() >> fixes the issue. rtl_init_rxcfg() deals with the RxConfig register only and >> register values seem to be the same before and after resume. So how can the >> chip behave differently? 
>> So far my best guess is that some chip quirk causes it to accept writes to >> register RxConfig, but to misinterpret or ignore the written value. >> So far your report is the only one (affecting RTL8411), but we don't know >> whether other chip versions are affected too. > > Also, it is interesting that even if one removes a call to > rtl_init_rxcfg() from rtl_hw_start() the RxConfig register will still get > written to moments later by rtl_set_rx_mode(). > > The only chip accesses in the meantime seems to be a write to TxConfig by > rtl_set_tx_config_registers() and then a read of RxConfig plus two writes > to MAR0 earlier in rtl_set_rx_mode(). > > My proposals are: > 1) Try swapping "rtl_init_rxcfg(tp);" and "rtl_set_tx_config_registers(tp);" > in rtl_hw_start(). > Maybe the chip does not like sometimes that RxConfig is written before > TxConfig. > This change made no difference. Networking still dies if I open a browser or leave ping running long enough. > 2) Check the original value of RxConfig (after a resume) before > rtl_init_rxcfg() overwrites it (compile tested only): > --- r8169.c.ori > +++ r8169.c > @@ -5155,6 +5155,9 @@ > /* Initially a 10 us delay. Turned it into a PCI commit. - FR */ > RTL_R8(tp, IntrMask); > RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb); > + > + pr_notice("RxConfig before init was %.8x\n", > + (unsigned int)RTL_R32(tp, RxConfig)); > rtl_init_rxcfg(tp); > rtl_set_tx_config_registers(tp); > > > This should be the value that you got when you removed the call to > rtl_init_rxcfg() for testing. > Now, knowing the "right" value you can experiment with what rtl_init_rxcfg() > writes (under the "default:" label for your NIC model). > This might be more interesting. 
Through a combination of viewing the output from pr_notice() and the output from "ethtool -d", I can see RxConfig with the following values:

During boot:    0x00028700
Before suspend: 0x0002870e
During resume:  0x00024000
Post resume:    0x0002870e

As I did with 4.18.10 early on in the process, I removed the call to rtl_init_rxcfg() from rtl_hw_start() and rebuilt, installed and rebooted. Now I see the following values:

During boot:    0x00028700
Before suspend: 0x0002870e
During resume:  0x00024000
Post resume:    0x0002400e

As with 4.18.10, networking now appears to be stable after the resume. Starting a browser results in my homepage being displayed and I've spent a few minutes surfing with no interruptions. Similarly, ping runs without stopping.

I don't know enough to tell what might now be enabled or disabled by this change in value, but hopefully it will provide a clue to someone as to what is going on.

Chris

> Hope this helps,
> Maciej
>
Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)
Too late at night to be doing this stuff. Clicked send instead of saving a draft. Sorry, please ignore. On 10/10/2018 23:30, Chris Clayton wrote: > OK, right kernel/module used this time. Please see findings below. > > On 10/10/2018 01:24, Maciej S. Szmigiero wrote: >> On 09.10.2018 22:36, Heiner Kallweit wrote: >>> On 09.10.2018 16:40, Chris Clayton wrote: >>>> Thanks to Maciej and Heiner for their replies. >>>> >>>> On 09/10/2018 13:32, Maciej S. Szmigiero wrote: >>>>> On 07.10.2018 21:36, Chris Clayton wrote: >>>>>> Hi again, >>>>>> >>>>>> I didn't think there was anything in 4.19-rc7 to fix this regression, >>>>>> but tried it anyway. I can confirm that the >>>>>> regression is still present and my network still fails when, after a >>>>>> resume from suspend (to ram or disk), I open my >>>>>> browser or my mail client. In both those cases the failure is almost >>>>>> immediate - e.g. my home page doesn't get displayed >>>>>> in the browser. Pinging one of my ISPs name servers doesn't fail quite >>>>>> so quickly but the reported time increases from >>>>>> 14-15ms to more than 1000ms. >>>>> >>>>> You can try comparing chip registers (ethtool -d eth0) in the working >>>>> state (before a suspend) and in the broken state (after a resume). >>>>> Maybe there will be some obvious in the difference. >>>>> >>>>> The same goes for the PCI configuration (lspci -d :8168 -vv). >>>>> >>>> Maciej suggested comparing the output from lspci -vv for the ethernet >>>> device. They are identical. >>>> >>>> Both Maciej and Heiner suggested comparing the output from "ethtool -d" >>>> pre and post suspend. Again, they are identical. >>>> Heiner specifically suggested looking at the RxConfig. The value of that >>>> is 0x0002870e both pre and post suspend. >>>> >>> Hmm, this is very weird, especially taking into account that in your >>> original >>> report you state that removing the call to rtl_init_rxcfg() from >>> rtl_hw_start() >>> fixes the issue. 
rtl_init_rxcfg() deals with the RxConfig register only and >>> register values seem to be the same before and after resume. So how can the >>> chip behave differently? >>> So far my best guess is that some chip quirk causes it to accept writes to >>> register RxConfig, but to misinterpret or ignore the written value. >>> So far your report is the only one (affecting RTL8411), but we don't know >>> whether other chip versions are affected too. >> >> Also, it is interesting that even if one removes a call to >> rtl_init_rxcfg() from rtl_hw_start() the RxConfig register will still get >> written to moments later by rtl_set_rx_mode(). >> >> The only chip accesses in the meantime seems to be a write to TxConfig by >> rtl_set_tx_config_registers() and then a read of RxConfig plus two writes >> to MAR0 earlier in rtl_set_rx_mode(). >> >> My proposals are: >> 1) Try swapping "rtl_init_rxcfg(tp);" and "rtl_set_tx_config_registers(tp);" >> in rtl_hw_start(). >> Maybe the chip does not like sometimes that RxConfig is written before >> TxConfig. >> > > This change made no difference. Networking still dies if I open a browser or > leave ping running long enough. > >> 2) Check the original value of RxConfig (after a resume) before >> rtl_init_rxcfg() overwrites it (compile tested only): >> --- r8169.c.ori >> +++ r8169.c >> @@ -5155,6 +5155,9 @@ >> /* Initially a 10 us delay. Turned it into a PCI commit. - FR */ >> RTL_R8(tp, IntrMask); >> RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb); >> + >> +pr_notice("RxConfig before init was %.8x\n", >> +(unsigned int)RTL_R32(tp, RxConfig)); >> rtl_init_rxcfg(tp); >> rtl_set_tx_config_registers(tp); >> >> >> This should be the value that you got when you removed the call to >> rtl_init_rxcfg() for testing. >> Now, knowing the "right" value you can experiment with what rtl_init_rxcfg() >> writes (under the "default:" label for your NIC model). > > This might be more interesting. 
> Through combination of viewing the output
> from pr_notice() and the output from "ethtool
> -d", I can see RxConfig with the following values
>
> During boot:    0x00028700
> Before suspend: 0x0002870e
> During resume:  0x00024000
> Post resume:    0x0002870e
>
> I then removed the call to rtl_init_rxcfg() from rtl_hw_start() and rebuilt,
> installed and rebooted. Now I see the
> following values:
>
> During boot:    0x00028700
> Before suspend: 0x0002870e
> During resume:  0x00024000
> Post resume:    0x0002870e
>
>>
>> Hope this helps,
>> Maciej
>>
Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)
OK, right kernel/module used this time. Please see findings below. On 10/10/2018 01:24, Maciej S. Szmigiero wrote: > On 09.10.2018 22:36, Heiner Kallweit wrote: >> On 09.10.2018 16:40, Chris Clayton wrote: >>> Thanks to Maciej and Heiner for their replies. >>> >>> On 09/10/2018 13:32, Maciej S. Szmigiero wrote: >>>> On 07.10.2018 21:36, Chris Clayton wrote: >>>>> Hi again, >>>>> >>>>> I didn't think there was anything in 4.19-rc7 to fix this regression, but >>>>> tried it anyway. I can confirm that the >>>>> regression is still present and my network still fails when, after a >>>>> resume from suspend (to ram or disk), I open my >>>>> browser or my mail client. In both those cases the failure is almost >>>>> immediate - e.g. my home page doesn't get displayed >>>>> in the browser. Pinging one of my ISPs name servers doesn't fail quite so >>>>> quickly but the reported time increases from >>>>> 14-15ms to more than 1000ms. >>>> >>>> You can try comparing chip registers (ethtool -d eth0) in the working >>>> state (before a suspend) and in the broken state (after a resume). >>>> Maybe there will be some obvious in the difference. >>>> >>>> The same goes for the PCI configuration (lspci -d :8168 -vv). >>>> >>> Maciej suggested comparing the output from lspci -vv for the ethernet >>> device. They are identical. >>> >>> Both Maciej and Heiner suggested comparing the output from "ethtool -d" pre >>> and post suspend. Again, they are identical. >>> Heiner specifically suggested looking at the RxConfig. The value of that is >>> 0x0002870e both pre and post suspend. >>> >> Hmm, this is very weird, especially taking into account that in your original >> report you state that removing the call to rtl_init_rxcfg() from >> rtl_hw_start() >> fixes the issue. rtl_init_rxcfg() deals with the RxConfig register only and >> register values seem to be the same before and after resume. So how can the >> chip behave differently? 
>> So far my best guess is that some chip quirk causes it to accept writes to >> register RxConfig, but to misinterpret or ignore the written value. >> So far your report is the only one (affecting RTL8411), but we don't know >> whether other chip versions are affected too. > > Also, it is interesting that even if one removes a call to > rtl_init_rxcfg() from rtl_hw_start() the RxConfig register will still get > written to moments later by rtl_set_rx_mode(). > > The only chip accesses in the meantime seems to be a write to TxConfig by > rtl_set_tx_config_registers() and then a read of RxConfig plus two writes > to MAR0 earlier in rtl_set_rx_mode(). > > My proposals are: > 1) Try swapping "rtl_init_rxcfg(tp);" and "rtl_set_tx_config_registers(tp);" > in rtl_hw_start(). > Maybe the chip does not like sometimes that RxConfig is written before > TxConfig. > This change made no difference. Networking still dies if I open a browser or leave ping running long enough. > 2) Check the original value of RxConfig (after a resume) before > rtl_init_rxcfg() overwrites it (compile tested only): > --- r8169.c.ori > +++ r8169.c > @@ -5155,6 +5155,9 @@ > /* Initially a 10 us delay. Turned it into a PCI commit. - FR */ > RTL_R8(tp, IntrMask); > RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb); > + > + pr_notice("RxConfig before init was %.8x\n", > + (unsigned int)RTL_R32(tp, RxConfig)); > rtl_init_rxcfg(tp); > rtl_set_tx_config_registers(tp); > > > This should be the value that you got when you removed the call to > rtl_init_rxcfg() for testing. > Now, knowing the "right" value you can experiment with what rtl_init_rxcfg() > writes (under the "default:" label for your NIC model). This might be more interesting. 
Through a combination of viewing the output from pr_notice() and the output from "ethtool -d", I can see RxConfig with the following values:

During boot:    0x00028700
Before suspend: 0x0002870e
During resume:  0x00024000
Post resume:    0x0002870e

I then removed the call to rtl_init_rxcfg() from rtl_hw_start() and rebuilt, installed and rebooted. Now I see the following values:

During boot:    0x00028700
Before suspend: 0x0002870e
During resume:  0x00024000
Post resume:    0x0002870e

>
> Hope this helps,
> Maciej
>
Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)
Sorry, I forgot that editing r8169.c and rebuilding would result in rc7+, so I tested the wrong kernel/module to get the results I provided below. That, however, may make the results more interesting because they happened with a virgin rc7 kernel/module. I'll test your proposals properly later. Chris On 10/10/2018 09:09, Chris Clayton wrote: > > > On 10/10/2018 01:24, Maciej S. Szmigiero wrote: >> On 09.10.2018 22:36, Heiner Kallweit wrote: >>> On 09.10.2018 16:40, Chris Clayton wrote: >>>> Thanks to Maciej and Heiner for their replies. >>>> >>>> On 09/10/2018 13:32, Maciej S. Szmigiero wrote: >>>>> On 07.10.2018 21:36, Chris Clayton wrote: >>>>>> Hi again, >>>>>> >>>>>> I didn't think there was anything in 4.19-rc7 to fix this regression, >>>>>> but tried it anyway. I can confirm that the >>>>>> regression is still present and my network still fails when, after a >>>>>> resume from suspend (to ram or disk), I open my >>>>>> browser or my mail client. In both those cases the failure is almost >>>>>> immediate - e.g. my home page doesn't get displayed >>>>>> in the browser. Pinging one of my ISPs name servers doesn't fail quite >>>>>> so quickly but the reported time increases from >>>>>> 14-15ms to more than 1000ms. >>>>> >>>>> You can try comparing chip registers (ethtool -d eth0) in the working >>>>> state (before a suspend) and in the broken state (after a resume). >>>>> Maybe there will be some obvious in the difference. >>>>> >>>>> The same goes for the PCI configuration (lspci -d :8168 -vv). >>>>> >>>> Maciej suggested comparing the output from lspci -vv for the ethernet >>>> device. They are identical. >>>> >>>> Both Maciej and Heiner suggested comparing the output from "ethtool -d" >>>> pre and post suspend. Again, they are identical. >>>> Heiner specifically suggested looking at the RxConfig. The value of that >>>> is 0x0002870e both pre and post suspend. 
>>>> >>> Hmm, this is very weird, especially taking into account that in your >>> original >>> report you state that removing the call to rtl_init_rxcfg() from >>> rtl_hw_start() >>> fixes the issue. rtl_init_rxcfg() deals with the RxConfig register only and >>> register values seem to be the same before and after resume. So how can the >>> chip behave differently? >>> So far my best guess is that some chip quirk causes it to accept writes to >>> register RxConfig, but to misinterpret or ignore the written value. >>> So far your report is the only one (affecting RTL8411), but we don't know >>> whether other chip versions are affected too. >> >> Also, it is interesting that even if one removes a call to >> rtl_init_rxcfg() from rtl_hw_start() the RxConfig register will still get >> written to moments later by rtl_set_rx_mode(). >> >> The only chip accesses in the meantime seems to be a write to TxConfig by >> rtl_set_tx_config_registers() and then a read of RxConfig plus two writes >> to MAR0 earlier in rtl_set_rx_mode(). >> >> My proposals are: >> 1) Try swapping "rtl_init_rxcfg(tp);" and "rtl_set_tx_config_registers(tp);" >> in rtl_hw_start(). >> Maybe the chip does not like sometimes that RxConfig is written before >> TxConfig. 
>>
> After testing your first proposal, which made no difference, I found the
> following in the output from dmesg:
>
> [ 761.999468] [ cut here ]
> [ 761.999471] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
> [ 761.999483] WARNING: CPU: 0 PID: 8938 at net/sched/sch_generic.c:461 dev_watchdog+0x1e9/0x1f0
> [ 761.999484] Modules linked in: btusb btintel r8169 rfcomm bnep iptable_filter xt_conntrack iptable_nat ipt_MASQUERADE nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv4 uvcvideo videobuf2_vmalloc videobuf2_memops snd_hda_codec_via videobuf2_v4l2 snd_hda_codec_hdmi snd_hda_codec_generic videobuf2_common usbhid realtek coretemp snd_hda_intel hwmon snd_hda_codec x86_pkg_temp_thermal snd_hwdep libphy snd_hda_core [last unloaded: btintel]
> [ 761.999503] CPU: 0 PID: 8938 Comm: kworker/0:0 Not tainted 4.19.0-rc7 #328
> [ 761.999504] Hardware name: Notebook W65_67SZ /W65_67SZ , BIOS 1.03.05 02/26/2014
Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)
On 10/10/2018 01:24, Maciej S. Szmigiero wrote: > On 09.10.2018 22:36, Heiner Kallweit wrote: >> On 09.10.2018 16:40, Chris Clayton wrote: >>> Thanks to Maciej and Heiner for their replies. >>> >>> On 09/10/2018 13:32, Maciej S. Szmigiero wrote: >>>> On 07.10.2018 21:36, Chris Clayton wrote: >>>>> Hi again, >>>>> >>>>> I didn't think there was anything in 4.19-rc7 to fix this regression, but >>>>> tried it anyway. I can confirm that the >>>>> regression is still present and my network still fails when, after a >>>>> resume from suspend (to ram or disk), I open my >>>>> browser or my mail client. In both those cases the failure is almost >>>>> immediate - e.g. my home page doesn't get displayed >>>>> in the browser. Pinging one of my ISPs name servers doesn't fail quite so >>>>> quickly but the reported time increases from >>>>> 14-15ms to more than 1000ms. >>>> >>>> You can try comparing chip registers (ethtool -d eth0) in the working >>>> state (before a suspend) and in the broken state (after a resume). >>>> Maybe there will be some obvious in the difference. >>>> >>>> The same goes for the PCI configuration (lspci -d :8168 -vv). >>>> >>> Maciej suggested comparing the output from lspci -vv for the ethernet >>> device. They are identical. >>> >>> Both Maciej and Heiner suggested comparing the output from "ethtool -d" pre >>> and post suspend. Again, they are identical. >>> Heiner specifically suggested looking at the RxConfig. The value of that is >>> 0x0002870e both pre and post suspend. >>> >> Hmm, this is very weird, especially taking into account that in your original >> report you state that removing the call to rtl_init_rxcfg() from >> rtl_hw_start() >> fixes the issue. rtl_init_rxcfg() deals with the RxConfig register only and >> register values seem to be the same before and after resume. So how can the >> chip behave differently? 
>> So far my best guess is that some chip quirk causes it to accept writes to >> register RxConfig, but to misinterpret or ignore the written value. >> So far your report is the only one (affecting RTL8411), but we don't know >> whether other chip versions are affected too. > > Also, it is interesting that even if one removes a call to > rtl_init_rxcfg() from rtl_hw_start() the RxConfig register will still get > written to moments later by rtl_set_rx_mode(). > > The only chip accesses in the meantime seems to be a write to TxConfig by > rtl_set_tx_config_registers() and then a read of RxConfig plus two writes > to MAR0 earlier in rtl_set_rx_mode(). > > My proposals are: > 1) Try swapping "rtl_init_rxcfg(tp);" and "rtl_set_tx_config_registers(tp);" > in rtl_hw_start(). > Maybe the chip does not like sometimes that RxConfig is written before > TxConfig. > After testing your first proposal, which made no difference, I found the following in the output from dmesg: [ 761.999468] [ cut here ] [ 761.999471] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out [ 761.999483] WARNING: CPU: 0 PID: 8938 at net/sched/sch_generic.c:461 dev_watchdog+0x1e9/0x1f0 [ 761.999484] Modules linked in: btusb btintel r8169 rfcomm bnep iptable_filter xt_conntrack iptable_nat ipt_MASQUERADE nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv4 uvcvideo videobuf2_vmalloc videobuf2_memops snd_hda_codec_via videobuf2_v4l2 snd_hda_codec_hdmi snd_hda_codec_generic videobuf2_common usbhid realtek coretemp snd_hda_intel hwmon snd_hda_codec x86_pkg_temp_thermal snd_hwdep libphy snd_hda_core [last unloaded: btintel] [ 761.999503] CPU: 0 PID: 8938 Comm: kworker/0:0 Not tainted 4.19.0-rc7 #328 [ 761.999504] Hardware name: Notebook W65_67SZ /W65_67SZ , BIOS 1.03.05 02/26/2014 [ 761.999508] Workqueue: events rtl_task [r8169] [ 761.999510] RIP: 0010:dev_watchdog+0x1e9/0x1f0 [ 761.999512] Code: 00 48 63 4d e8 eb 99 4c 89 ef c6 05 b6 13 a6 00 01 e8 1b c7 fd ff 89 d9 4c 89 ee 48 c7 c7 40 53 e1 81 48
89 c2 e8 ae f4 a3 ff <0f> 0b eb c0 0f 1f 00 48 c7 47 08 00 00 00 00 48 c7 07 00 00 00 00 [ 761.999513] RSP: 0018:88040f803e98 EFLAGS: 00010282 [ 761.999514] RAX: RBX: RCX: 0006 [ 761.999516] RDX: 0007 RSI: 0096 RDI: 88040f8153d0 [ 761.999517] RBP: 88040ca9a3b8 R08: 813565f0 R09: 034e [ 761.999517] R10: 0007 R11: R12: 88040ca9a39c [ 761.
Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)
On 09/10/2018 22:39, Heiner Kallweit wrote: > On 09.10.2018 16:40, Chris Clayton wrote: >> Thanks to Maciej and Heiner for their replies. >> >> On 09/10/2018 13:32, Maciej S. Szmigiero wrote: >>> On 07.10.2018 21:36, Chris Clayton wrote: >>>> Hi again, >>>> >>>> I didn't think there was anything in 4.19-rc7 to fix this regression, but >>>> tried it anyway. I can confirm that the >>>> regression is still present and my network still fails when, after a >>>> resume from suspend (to ram or disk), I open my >>>> browser or my mail client. In both those cases the failure is almost >>>> immediate - e.g. my home page doesn't get displayed >>>> in the browser. Pinging one of my ISPs name servers doesn't fail quite so >>>> quickly but the reported time increases from >>>> 14-15ms to more than 1000ms. >>> >>> You can try comparing chip registers (ethtool -d eth0) in the working >>> state (before a suspend) and in the broken state (after a resume). >>> Maybe there will be some obvious in the difference. >>> >>> The same goes for the PCI configuration (lspci -d :8168 -vv). >>> >> Maciej suggested comparing the output from lspci -vv for the ethernet >> device. They are identical. >> >> Both Maciej and Heiner suggested comparing the output from "ethtool -d" pre >> and post suspend. Again, they are identical. >> Heiner specifically suggested looking at the RxConfig. The value of that is >> 0x0002870e both pre and post suspend. >> >> I've attached files I redirected the outputs to. >> >> Please don't hesitate to ask for any other information needed to solve this >> problem. In the meantime, I've now got >> scripts that stop the network during suspend and restart it during resume. >> (Those scripts were removed whilst I gathered >> the diagnostics shown in the attachments.) >> > I'd like to check whether it may be a timing issue. The following > experimental patch > adds a PCI commit after writing register ChipCmd. Could you please check > whether > it changes anything? 
>
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 7d3f671e1..f3c359492 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -4641,6 +4641,7 @@ static void rtl_hw_start(struct rtl8169_private *tp)
>  	/* Initially a 10 us delay. Turned it into a PCI commit. - FR */
>  	RTL_R8(tp, IntrMask);
>  	RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
> +	RTL_R8(tp, ChipCmd);
>  	rtl_init_rxcfg(tp);
>  	rtl_set_tx_config_registers(tp);
>

Sorry, this patch doesn't make any difference - my network still fails. After a suspend/resume, my browsers (chromium and firefox) both fail to open my home page (https://www.google.co.uk). The ping time for one of my ISP's name servers increases from 14-15ms to more than 1000ms, although after a few pings it does reduce. As the screen grab below shows, the network does eventually fail:

$ ping NS1
PING ns1 (90.207.238.97): 56 data bytes
64 bytes from 90.207.238.97: icmp_seq=0 ttl=251 time=1017.289 ms
64 bytes from 90.207.238.97: icmp_seq=1 ttl=251 time=1018.051 ms
64 bytes from 90.207.238.97: icmp_seq=2 ttl=251 time=1015.271 ms
64 bytes from 90.207.238.97: icmp_seq=3 ttl=251 time=1015.495 ms
64 bytes from 90.207.238.97: icmp_seq=6 ttl=251 time=1015.646 ms
64 bytes from 90.207.238.97: icmp_seq=7 ttl=251 time=1022.609 ms
64 bytes from 90.207.238.97: icmp_seq=8 ttl=251 time=1015.612 ms
64 bytes from 90.207.238.97: icmp_seq=10 ttl=251 time=1015.551 ms
64 bytes from 90.207.238.97: icmp_seq=12 ttl=251 time=1015.446 ms
64 bytes from 90.207.238.97: icmp_seq=13 ttl=251 time=1015.657 ms
64 bytes from 90.207.238.97: icmp_seq=14 ttl=251 time=1015.614 ms
64 bytes from 90.207.238.97: icmp_seq=15 ttl=251 time=1015.651 ms
64 bytes from 90.207.238.97: icmp_seq=17 ttl=251 time=1015.459 ms
64 bytes from 90.207.238.97: icmp_seq=18 ttl=251 time=1015.443 ms
64 bytes from 90.207.238.97: icmp_seq=19 ttl=251 time=1015.936 ms
64 bytes from 90.207.238.97: icmp_seq=20 ttl=251 time=1015.681 ms
64 bytes from 90.207.238.97: icmp_seq=22 ttl=251 time=1015.410 ms
64 bytes from 90.207.238.97: icmp_seq=23 ttl=251 time=1015.487 ms
64 bytes from 90.207.238.97: icmp_seq=24 ttl=251 time=1016.169 ms
64 bytes from 90.207.238.97: icmp_seq=25 ttl=251 time=1015.659 ms
64 bytes from 90.207.238.97: icmp_seq=26 ttl=251 time=14.606 ms
64 bytes from 90.207.238.97: icmp_seq=30 ttl=251 time=32.765 ms
64 bytes from 90.207.238.97: icmp_seq=31 ttl=251 time=115.052 ms
64 bytes from 90.207.238.97: icmp_seq=33 ttl=25
Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)
Thanks to Maciej and Heiner for their replies. On 09/10/2018 13:32, Maciej S. Szmigiero wrote: > On 07.10.2018 21:36, Chris Clayton wrote: >> Hi again, >> >> I didn't think there was anything in 4.19-rc7 to fix this regression, but >> tried it anyway. I can confirm that the >> regression is still present and my network still fails when, after a resume >> from suspend (to ram or disk), I open my >> browser or my mail client. In both those cases the failure is almost >> immediate - e.g. my home page doesn't get displayed >> in the browser. Pinging one of my ISPs name servers doesn't fail quite so >> quickly but the reported time increases from >> 14-15ms to more than 1000ms. > > You can try comparing chip registers (ethtool -d eth0) in the working > state (before a suspend) and in the broken state (after a resume). > Maybe there will be some obvious in the difference. > > The same goes for the PCI configuration (lspci -d :8168 -vv). > Maciej suggested comparing the output from lspci -vv for the ethernet device before and after suspend. They are identical. Both Maciej and Heiner suggested comparing the output from "ethtool -d" pre- and post-suspend. Again, they are identical. Heiner specifically suggested looking at the RxConfig register. Its value is 0x0002870e both pre- and post-suspend. I've attached the files I redirected the outputs to. Please don't hesitate to ask for any other information needed to solve this problem. In the meantime, I've now got scripts that stop the network during suspend and restart it during resume. (Those scripts were removed whilst I gathered the diagnostics shown in the attachments.)
Chris
>> Chris
>
> Maciej
>
ethtool -d eth0
===
RealTek RTL8411 registers:
0x00: MAC Address 80:fa:5b:08:d0:3d
0x08: Multicast Address Filter 0x 0x0080
0x10: Dump Tally Counter Command 0x0c2ec000 0x0004
0x20: Tx Normal Priority Ring Addr 0x07a0a000 0x0004
0x28: Tx High Priority Ring Addr 0x 0x
0x30: Flash memory read/write 0x
0x34: Early Rx Byte Count 0
0x36: Early Rx Status 0x00
0x37: Command 0x0c Rx on, Tx on
0x3C: Interrupt Mask 0x803f SERR LinkChg RxNoBuf TxErr TxOK RxErr RxOK
0x3E: Interrupt Status 0x
0x40: Tx Configuration 0x4b800f80
0x44: Rx Configuration 0x0002870e
0x48: Timer count 0x
0x4C: Missed packet counter 0x00
0x50: EEPROM Command 0x10
0x51: Config 0 0x00
0x52: Config 1 0xcf
0x53: Config 2 0x3c
0x54: Config 3 0x60
0x55: Config 4 0x10
0x56: Config 5 0x02
0x58: Timer interrupt 0x
0x5C: Multiple Interrupt Select 0x
0x60: PHY access 0x80040de1
0x64: TBI control and status 0x2701
0x68: TBI Autonegotiation advertisement (ANAR) 0xf70c
0x6A: TBI Link partner ability (LPAR) 0x0002
0x6C: PHY status 0xeb
0x84: PM wakeup frame 0 0x 0x
0x8C: PM wakeup frame 1 0x 0x
0x94: PM wakeup frame 2 (low) 0x 0x
0x9C: PM wakeup frame 2 (high) 0x 0x
0xA4: PM wakeup frame 3 (low) 0x 0x
0xAC: PM wakeup frame 3 (high) 0x 0x
0xB4: PM wakeup frame 4 (low) 0x 0x
0xBC: PM wakeup frame 4 (high) 0x 0x
0xC4: Wakeup frame 0 CRC 0x
0xC6: Wakeup frame 1 CRC 0x
0xC8: Wakeup frame 2 CRC 0x
0xCA: Wakeup frame 3 CRC 0x
0xCC: Wakeup frame 4 CRC 0x
0xDA: RX packet maximum size 0x4000
0xE0: C+ Command 0x20e1
      VLAN de-tagging
      RX checksumming
0xE2: Interrupt Mitigation 0x5151
      TxTimer: 5
      TxPackets: 1
      RxTimer: 5
      RxPackets: 1
0xE4: Rx Ring Addr 0x07935000 0x0004
0xEC: Early Tx threshold 0x27
0xF0: Func Event 0x0040003f
0xF4: Func Event Mask 0x
0xF8: Func Preset State 0x00031eff
0xFC: Func For
Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)
Hi again, I didn't think there was anything in 4.19-rc7 to fix this regression, but tried it anyway. I can confirm that the regression is still present and my network still fails when, after a resume from suspend (to RAM or disk), I open my browser or my mail client. In both those cases the failure is almost immediate - e.g. my home page doesn't get displayed in the browser. Pinging one of my ISP's name servers doesn't fail quite so quickly, but the reported time increases from 14-15ms to more than 1000ms. Chris On 04/10/2018 09:41, Chris Clayton wrote: > Hi Heiner, > > Here's the reply to your questions. Sorry for the delay. > > On 28/09/2018 23:13, Heiner Kallweit wrote: >> On 29.09.2018 00:00, Chris Clayton wrote: >>> Thanks Maciej. >>> >>> On 28/09/2018 16:54, Maciej S. Szmigiero wrote: >>>> Hi, >>>> >>>>> Hi, >>>>> >>>>> I upgraded my kernel to 4.18.10 recently and have since been experiencing >>>>> network problems after resuming from a >>>>> suspend to RAM or disk. I previously had 4.18.6 and that was OK. >>>>> >>>>> The pattern of the problem is that when I first boot, the network is >>>>> fine. But, after resume from suspend I find that >>>>> the time taken for a ping of one of my ISP's nameservers increases from >>>>> 14-15ms to more than 1000ms. Moreover, when I >>>>> open a browser (chromium or firefox), it fails to retrieve my home page >>>>> (https://www.google.co.uk) and pings of the >>>>> nameserver fail with the message "Destination Host Unreachable". Often, I >>>>> can revive the network by stopping it with >>>>> /sbin/if(down,up} but sometimes it is necessary to also remove the r8169 >>>>> module and load it again. >>>> >>>> Please have a look at the following thread: >>>> https://lkml.org/lkml/2018/9/25/1118 >>>> >>> >>> I applied your patch for the 4.18 stable kernels to 4.18.10, but the >>> problem is not solved by it. 
Similarly, I applied >>> Heiner's patch to the 4.19, but again the problem is not solved.
>>> >> I think we talk about two different issues here. The one the fix is for has >> no link to suspend/resume. >> >> Chris, the lspci output doesn't provide enough detail to determine the exact >> chip version. >> Can you provide the dmesg part with the XID? > > $ dmesg | grep r8169 > [5.274938] libphy: r8169: probed > [5.276563] r8169 :05:00.2 eth0: RTL8411, 80:fa:5b:08:d0:3d, XID > 48800800, IRQ 29 > [5.278158] r8169 :05:00.2 eth0: jumbo features [frames: 9200 bytes, > tx checksumming: ko] > [9.275275] RTL8211E Gigabit Ethernet r8169-502:00: attached PHY driver > [RTL8211E Gigabit Ethernet] > (mii_bus:phy_addr=r8169-502:00, irq=IGNORE) > [9.460876] r8169 :05:00.2 eth0: No native access to PCI extended > config space, falling back to CSI > [ 11.005336] r8169 :05:00.2 eth0: Link is Up - 100Mbps/Full - flow > control rx/tx > >> According to your lspci output neither MSI nor MSI-X is active. >> Do you have to use nomsi for whatever reason? >> > > No, I do not use nomsi, but MSI wasn't enabled in my kernel config. I'm 99% > sure that it used to be - I've no idea how > it got dropped. If I'm not sure about an option, I start by taking the > recommendation in the kconfig help. Help on MSI > has a very clear "say Y". I've re-enabled it now. > > Chris > >> Heiner >> >>>> Maciej >>>> >>> Chris >>> >> >>
Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)
Hi Heiner, Here's the reply to your questions. Sorry for the delay. On 28/09/2018 23:13, Heiner Kallweit wrote: > On 29.09.2018 00:00, Chris Clayton wrote: >> Thanks Maciej. >> >> On 28/09/2018 16:54, Maciej S. Szmigiero wrote: >>> Hi, >>> >>>> Hi, >>>> >>>> I upgraded my kernel to 4.18.10 recently and have since been experiencing >>>> network problems after resuming from a >>>> suspend to RAM or disk. I previously had 4.18.6 and that was OK. >>>> >>>> The pattern of the problem is that when I first boot, the network is fine. >>>> But, after resume from suspend I find that >>>> the time taken for a ping of one of my ISP's nameservers increases from >>>> 14-15ms to more than 1000ms. Moreover, when I >>>> open a browser (chromium or firefox), it fails to retrieve my home page >>>> (https://www.google.co.uk) and pings of the >>>> nameserver fail with the message "Destination Host Unreachable". Often, I >>>> can revive the network by stopping it with >>>> /sbin/if(down,up} but sometimes it is necessary to also remove the r8169 >>>> module and load it again. >>> >>> Please have a look at the following thread: >>> https://lkml.org/lkml/2018/9/25/1118 >>> >> >> I applied your patch for the 4.18 stable kernels to 4.18.10, but the problem >> is not solved by it. Similarly, I applied >> Heiner's patch to the 4.19, but again the problem is not solved. >> > I think we talk about two different issues here. The one the fix is for has > no link to suspend/resume. > > Chris, the lspci output doesn't provide enough detail to determine the exact > chip version. > Can you provide the dmesg part with the XID? 
$ dmesg | grep r8169
[    5.274938] libphy: r8169: probed
[    5.276563] r8169 0000:05:00.2 eth0: RTL8411, 80:fa:5b:08:d0:3d, XID 48800800, IRQ 29
[    5.278158] r8169 0000:05:00.2 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
[    9.275275] RTL8211E Gigabit Ethernet r8169-502:00: attached PHY driver [RTL8211E Gigabit Ethernet] (mii_bus:phy_addr=r8169-502:00, irq=IGNORE)
[    9.460876] r8169 0000:05:00.2 eth0: No native access to PCI extended config space, falling back to CSI
[   11.005336] r8169 0000:05:00.2 eth0: Link is Up - 100Mbps/Full - flow control rx/tx

> According to your lspci output neither MSI nor MSI-X is active.
> Do you have to use nomsi for whatever reason?
>

No, I do not use nomsi, but MSI wasn't enabled in my kernel config. I'm 99% sure that it used to be - I've no idea how it got dropped. If I'm not sure about an option, I start by taking the recommendation in the kconfig help. Help on MSI has a very clear "say Y". I've re-enabled it now.

Chris

> Heiner
>
>>> Maciej
>>>
>> Chris
>>
>
>
Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)
Sorry, sent by accident. Note to self - don't attempt email until after second cup of coffee. On 29/09/2018 08:25, Chris Clayton wrote: > > > On 28/09/2018 23:13, Heiner Kallweit wrote: >> On 29.09.2018 00:00, Chris Clayton wrote: >>> Thanks Maciej. >>> >>> On 28/09/2018 16:54, Maciej S. Szmigiero wrote: >>>> Hi, >>>> >>>>> Hi, >>>>> >>>>> I upgraded my kernel to 4.18.10 recently and have since been experiencing >>>>> network problems after resuming from a >>>>> suspend to RAM or disk. I previously had 4.18.6 and that was OK. >>>>> >>>>> The pattern of the problem is that when I first boot, the network is >>>>> fine. But, after resume from suspend I find that >>>>> the time taken for a ping of one of my ISP's nameservers increases from >>>>> 14-15ms to more than 1000ms. Moreover, when I >>>>> open a browser (chromium or firefox), it fails to retrieve my home page >>>>> (https://www.google.co.uk) and pings of the >>>>> nameserver fail with the message "Destination Host Unreachable". Often, I >>>>> can revive the network by stopping it with >>>>> /sbin/if(down,up} but sometimes it is necessary to also remove the r8169 >>>>> module and load it again. >>>> >>>> Please have a look at the following thread: >>>> https://lkml.org/lkml/2018/9/25/1118 >>>> >>> >>> I applied your patch for the 4.18 stable kernels to 4.18.10, but the >>> problem is not solved by it. Similarly, I applied >>> Heiner's patch to the 4.19, but again the problem is not solved. >>> >> I think we talk about two different issues here. The one the fix is for has >> no link to suspend/resume. >> >> Chris, the lspci output doesn't provide enough detail to determine the exact >> chip version. >> Can you provide the dmesg part with the XID? I meant to say that I have now re-enabled MSI in 4.18.7 - the latest stable series kernel in which eth0 continues to function reliably after a suspend/resume cycle. The second dmesg output below is taken from that kernel. 
The first one was from an up-to-date 4.19 kernel. > > $ dmesg | grep -i r8169 > [5.320679] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded > [5.321432] r8169 :05:00.2: can't disable ASPM; OS doesn't have ASPM > control > [5.322892] r8169 :05:00.2 eth0: RTL8411, 80:fa:5b:08:d0:3d, XID > 48800800, IRQ 19 > [5.323786] r8169 :05:00.2 eth0: jumbo features [frames: 9200 bytes, > tx checksumming: ko] > [ 10.232077] r8169 :05:00.2 eth0: No native access to PCI extended > config space, falling back to CSI > [ 10.235218] r8169 :05:00.2 eth0: link down > [ 11.717460] r8169 :05:00.2 eth0: link up > > $ dmesg | grep -i r8169 > [5.208040] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded > [5.208677] r8169 :05:00.2: can't disable ASPM; OS doesn't have ASPM > control > [5.210066] r8169 :05:00.2 eth0: RTL8411, 80:fa:5b:08:d0:3d, XID > 48800800, IRQ 29 > [5.210676] r8169 :05:00.2 eth0: jumbo features [frames: 9200 bytes, > tx checksumming: ko] > [ 10.456081] r8169 :05:00.2 eth0: No native access to PCI extended > config space, falling back to CSI > [ 10.459217] r8169 :05:00.2 eth0: link down > [ 10.459880] r8169 :05:00.2 eth0: link down > [ 12.015158] r8169 :05:00.2 eth0: link up > > >> According to your lspci output neither MSI nor MSI-X is active. >> Do you have to use nomsi for whatever reason? > > No, I do not use nomsi, but MSI wasn't enabled in my kernel config. I'm 99% > sure that it used to be - I've no idea how > it got dropped. If I'm not sure about an option, I start by taking the > recommendation in the kconfig help. Help on MSI > has a very clear "say Y". As I said above, I have re-enabled MSI. > >> >> Heiner >> >>>> Maciej >>>> >>> Chris >>> >>
Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)
On 28/09/2018 23:13, Heiner Kallweit wrote: > On 29.09.2018 00:00, Chris Clayton wrote: >> Thanks Maciej. >> >> On 28/09/2018 16:54, Maciej S. Szmigiero wrote: >>> Hi, >>> >>>> Hi, >>>> >>>> I upgraded my kernel to 4.18.10 recently and have since been experiencing >>>> network problems after resuming from a >>>> suspend to RAM or disk. I previously had 4.18.6 and that was OK. >>>> >>>> The pattern of the problem is that when I first boot, the network is fine. >>>> But, after resume from suspend I find that >>>> the time taken for a ping of one of my ISP's nameservers increases from >>>> 14-15ms to more than 1000ms. Moreover, when I >>>> open a browser (chromium or firefox), it fails to retrieve my home page >>>> (https://www.google.co.uk) and pings of the >>>> nameserver fail with the message "Destination Host Unreachable". Often, I >>>> can revive the network by stopping it with >>>> /sbin/if(down,up} but sometimes it is necessary to also remove the r8169 >>>> module and load it again. >>> >>> Please have a look at the following thread: >>> https://lkml.org/lkml/2018/9/25/1118 >>> >> >> I applied your patch for the 4.18 stable kernels to 4.18.10, but the problem >> is not solved by it. Similarly, I applied >> Heiner's patch to the 4.19, but again the problem is not solved. >> > I think we talk about two different issues here. The one the fix is for has > no link to suspend/resume. > > Chris, the lspci output doesn't provide enough detail to determine the exact > chip version. > Can you provide the dmesg part with the XID? 
$ dmesg | grep -i r8169
[    5.320679] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[    5.321432] r8169 0000:05:00.2: can't disable ASPM; OS doesn't have ASPM control
[    5.322892] r8169 0000:05:00.2 eth0: RTL8411, 80:fa:5b:08:d0:3d, XID 48800800, IRQ 19
[    5.323786] r8169 0000:05:00.2 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
[   10.232077] r8169 0000:05:00.2 eth0: No native access to PCI extended config space, falling back to CSI
[   10.235218] r8169 0000:05:00.2 eth0: link down
[   11.717460] r8169 0000:05:00.2 eth0: link up

$ dmesg | grep -i r8169
[    5.208040] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[    5.208677] r8169 0000:05:00.2: can't disable ASPM; OS doesn't have ASPM control
[    5.210066] r8169 0000:05:00.2 eth0: RTL8411, 80:fa:5b:08:d0:3d, XID 48800800, IRQ 29
[    5.210676] r8169 0000:05:00.2 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
[   10.456081] r8169 0000:05:00.2 eth0: No native access to PCI extended config space, falling back to CSI
[   10.459217] r8169 0000:05:00.2 eth0: link down
[   10.459880] r8169 0000:05:00.2 eth0: link down
[   12.015158] r8169 0000:05:00.2 eth0: link up

> According to your lspci output neither MSI nor MSI-X is active.
> Do you have to use nomsi for whatever reason?

No, I do not use nomsi, but MSI wasn't enabled in my kernel config. I'm 99% sure that it used to be - I've no idea how it got dropped. If I'm not sure about an option, I start by taking the recommendation in the kconfig help. Help on MSI has a very clear "say Y".

>
> Heiner
>
>>> Maciej
>>>
>> Chris
>>
>
Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)
Thanks Maciej. On 28/09/2018 16:54, Maciej S. Szmigiero wrote: > Hi, > >> Hi, >> >> I upgraded my kernel to 4.18.10 recently and have since been experiencing >> network problems after resuming from a >> suspend to RAM or disk. I previously had 4.18.6 and that was OK. >> >> The pattern of the problem is that when I first boot, the network is fine. >> But, after resume from suspend I find that >> the time taken for a ping of one of my ISP's nameservers increases from >> 14-15ms to more than 1000ms. Moreover, when I >> open a browser (chromium or firefox), it fails to retrieve my home page >> (https://www.google.co.uk) and pings of the >> nameserver fail with the message "Destination Host Unreachable". Often, I >> can revive the network by stopping it with >> /sbin/if(down,up} but sometimes it is necessary to also remove the r8169 >> module and load it again. > > Please have a look at the following thread: > https://lkml.org/lkml/2018/9/25/1118 > I applied your patch for the 4.18 stable kernels to 4.18.10, but the problem is not solved by it. Similarly, I applied Heiner's patch to 4.19, but again the problem is not solved. > Maciej > Chris
Re: [PATCH V2] mfd: rtsx: release IRQ during shutdown
On 03/01/18 12:32, Sinan Kaya wrote:
> 'Commit cc27b735ad3a ("PCI/portdrv: Turn off PCIe services during
> shutdown")' revealed a resource leak in rtsx_pci driver during shutdown.
>
> Issue shows up as a warning during shutdown as follows:
>
> remove_proc_entry: removing non-empty directory 'irq/17', leaking at least
> 'rtsx_pci'
> WARNING: CPU: 0 PID: 1578 at fs/proc/generic.c:572
> remove_proc_entry+0x11d/0x130
> Modules linked in
> ...
> Call Trace:
> unregister_irq_proc
> free_desc
> irq_free_descs
> mp_unmap_irq
> acpi_unregister_gsi_apic
> acpi_pci_irq_disable
> do_pci_disable_device
> pci_disable_device
> device_shutdown
> kernel_restart
> Sys_reboot
>
> Even though rtsx_pci driver implements a shutdown callback, it is not
> releasing the interrupt that it registered during probe. This is causing
> the ACPI layer to complain that the shared IRQ is in use while freeing
> IRQ.
>
> This code releases the IRQ to prevent resource leak and eliminate the
> warning.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=198141
> Reported-by: Chris Clayton
> Fixes: cc27b735ad3a ("PCI/portdrv: Turn off PCIe services during shutdown")
> Signed-off-by: Sinan Kaya
> ---
>  drivers/mfd/rtsx_pcr.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/mfd/rtsx_pcr.c b/drivers/mfd/rtsx_pcr.c
> index 590fb9a..c3ed885 100644
> --- a/drivers/mfd/rtsx_pcr.c
> +++ b/drivers/mfd/rtsx_pcr.c
> @@ -1543,6 +1543,9 @@ static void rtsx_pci_shutdown(struct pci_dev *pcidev)
>  	rtsx_pci_power_off(pcr, HOST_ENTER_S1);
>
>  	pci_disable_device(pcidev);
> +	free_irq(pcr->irq, (void *)pcr);
> +	if (pcr->msi_en)
> +		pci_disable_msi(pcr->pci);
>  }
>
>  #else /* CONFIG_PM */

I've applied v2 of the patch and built and installed the kernel (-rc6). All I can say is that my system still closes down without the warning and call trace that the unpatched kernel produces.
It's the best I can do by way of a test because I have no idea what the code added in v2 is supposed to achieve and, because my system shuts down (or reboots) moments later, there is no opportunity to check. If that constitutes a valid test: Tested-by: Chris Clayton >
Re: Oops on 4.15-rc[123] on shutdown/reboot
On 11/12/17 17:17, Bjorn Helgaas wrote: > [+cc linux-pci] > > On Mon, Dec 11, 2017 at 11:29:50AM -0500, Sinan Kaya wrote: >> Hi Chris, >> >>> >>> I'm more than happy to provide additional diagnostics and test proposed >>> fixes. As a starter for ten, I've attached the >>> output from 'lspci -v'. If, however, you need to see the backtrace, I'll >>> need some advice on how to capture that. >>> >> >> Can you open a bugzilla and also share the boot log? >> >> There must be something unique about your system. > > Can you attach "lspci -vv" output (as root) to the bugzilla, too? > I've opened the bugzilla report (Bug 198141) and attached the dmesg and lspci -vv outputs to it.
Re: Oops on 4.15-rc[123] on shutdown/reboot
On 11/12/17 17:24, Sinan Kaya wrote: > On 12/11/2017 12:06 PM, Chris Clayton wrote: >> Here's the output of dmesg for 4.15.0-rc3. I'll open a bugzilla later and >> add this and the lspci output that I sent with >> my original repoart. > > This was helpful. I don't see any AER/DPC in your log. It looks like the only > PCIe > portdrv service you have is PME. > > Can we do a quick hack and return immediately from > > static int pcie_pme_probe(struct pcie_device *srv) > > by putting return 0; at the top. > > Same thing in > > static void pcie_pme_remove(struct pcie_device *srv) > > just place a return at the top. > I made those changes (to drivers/pci/pcie/pme.c) and built and installed the kernel. Sorry, but I still get the oops when I reboot. > I'm hoping your problem will go away after this. Then, we can start peeling > the onion. >
Re: Oops on 4.15-rc[123] on shutdown/reboot
On 11/12/17 16:29, Sinan Kaya wrote: > Hi Chris, > >> >> I'm more than happy to provide additional diagnostics and test proposed >> fixes. As a starter for ten, I've attached the >> output from 'lspci -v'. If, however, you need to see the backtrace, I'll >> need some advice on how to capture that. >> > > Can you open a bugzilla and also share the boot log? > Here's the output of dmesg for 4.15.0-rc3. I'll open a bugzilla later and add this and the lspci output that I sent with my original report. > There must be something unique about your system. > > Sinan > [0.00] Linux version 4.15.0-rc3 (chris@laptop) (gcc version 7.2.1 20171207 (GCC)) #398 SMP PREEMPT Mon Dec 11 07:46:20 GMT 2017 [0.00] Command line: ro root=/dev/sda2 resume=/dev/sda6 rootfstype=ext4 net.ifnames=0 [0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers' [0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers' [0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers' [0.00] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256 [0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
[0.00] e820: BIOS-provided physical RAM map: [0.00] BIOS-e820: [mem 0x0100-0x0009d7ff] usable [0.00] BIOS-e820: [mem 0x0009d800-0x0009] reserved [0.00] BIOS-e820: [mem 0x000e-0x000f] reserved [0.00] BIOS-e820: [mem 0x0010-0xd7216fff] usable [0.00] BIOS-e820: [mem 0xd7217000-0xd721dfff] ACPI NVS [0.00] BIOS-e820: [mem 0xd721e000-0xd7a0cfff] usable [0.00] BIOS-e820: [mem 0xd7a0d000-0xd7ca1fff] reserved [0.00] BIOS-e820: [mem 0xd7ca2000-0xdb4d] usable [0.00] BIOS-e820: [mem 0xdb4e-0xdb82dfff] reserved [0.00] BIOS-e820: [mem 0xdb82e000-0xdb88afff] usable [0.00] BIOS-e820: [mem 0xdb88b000-0xdb9bcfff] ACPI NVS [0.00] BIOS-e820: [mem 0xdb9bd000-0xdbffefff] reserved [0.00] BIOS-e820: [mem 0xdbfff000-0xdbff] usable [0.00] BIOS-e820: [mem 0xdd00-0xdf1f] reserved [0.00] BIOS-e820: [mem 0xf800-0xfbff] reserved [0.00] BIOS-e820: [mem 0xfec0-0xfec00fff] reserved [0.00] BIOS-e820: [mem 0xfed0-0xfed03fff] reserved [0.00] BIOS-e820: [mem 0xfed1c000-0xfed1] reserved [0.00] BIOS-e820: [mem 0xfee0-0xfee00fff] reserved [0.00] BIOS-e820: [mem 0xff00-0x] reserved [0.00] BIOS-e820: [mem 0x0001-0x00041fdf] usable [0.00] NX (Execute Disable) protection: active [0.00] random: fast init done [0.00] SMBIOS 2.7 present. 
[0.00] DMI: Notebook W65_67SZ/W65_67SZ, BIOS 1.03.05 02/26/2014 [0.00] e820: update [mem 0x-0x0fff] usable ==> reserved [0.00] e820: remove [mem 0x000a-0x000f] usable [0.00] e820: last_pfn = 0x41fe00 max_arch_pfn = 0x4 [0.00] MTRR default type: uncachable [0.00] MTRR fixed ranges enabled: [0.00] 0-9 write-back [0.00] A-B uncachable [0.00] C-C write-protect [0.00] D-E7FFF uncachable [0.00] E8000-F write-protect [0.00] MTRR variable ranges enabled: [0.00] 0 base 00 mask 7C write-back [0.00] 1 base 04 mask 7FE000 write-back [0.00] 2 base 00E000 mask 7FE000 uncachable [0.00] 3 base 00DE00 mask 7FFE00 uncachable [0.00] 4 base 00DD00 mask 7FFF00 uncachable [0.00] 5 base 041FE0 mask 7FFFE0 uncachable [0.00] 6 disabled [0.00] 7 disabled [0.00] 8 disabled [0.00] 9 disabled [0.00] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT [0.00] e820: update [mem 0xdd00-0x] usable ==> reserved [0.00] e820: last_pfn = 0xdc000 max_arch_pfn = 0x4 [0.00] found SMP MP-table at [mem 0x000fd820-0x000fd82f] mapped at [(ptrval)] [0.00] Base memory trampoline at [(ptrval)] 97000 size 24576 [0.00] Using GB pages for direct mapping [0.00] BRK [0x02098000, 0x02098fff] PGTABLE [0.00] BRK [0x02099000, 0x02099fff] PGTABLE [0.00] BRK [0x0209a000, 0x0209afff] PGTABLE [0.00] BRK [0x0209b000, 0x0209bfff] PGTABLE [0.00] BRK [0x0209c000, 0x0209cfff] PGTABLE [0.00] BRK [0x0209d000, 0x0209dfff] PGTABLE [0.00] ACPI: Early tabl
Oops on 4.15-rc[123] on shutdown/reboot
I've been getting an oops when shutting down my laptop (with /sbin/halt) or rebooting it (/sbin/reboot or /usr/sbin/kexec). Unfortunately, I can't provide the backtrace because it is on the screen for only a moment before the system shuts down/reboots. I have, however, bisected it and the outcome is: cc27b735ad3a75574a6ab1a66ed6b09385e77e5e is the first bad commit commit cc27b735ad3a75574a6ab1a66ed6b09385e77e5e Author: Sinan Kaya Date: Wed Oct 25 15:01:02 2017 -0400 PCI/portdrv: Turn off PCIe services during shutdown Some of the PCIe services such as AER are being left enabled during shutdown. This might cause spurious AER errors while SOC is being powered down. Clean up the PCIe services gracefully during shutdown to clear these false positives. Signed-off-by: Sinan Kaya Signed-off-by: Bjorn Helgaas :04 04 5a827d6956c581344a0bf392e30155c337673c1d 76c6a39b53604a0a0a370383c3503f80aa7cbc1e M drivers I'm confident that this is the correct outcome because a kernel built at the preceding commit (6018182d3158505f11103adaee8ffb53424df986) does not oops. Nor does -rc3 with the patch reverted. I'm more than happy to provide additional diagnostics and test proposed fixes. As a starter for ten, I've attached the output from 'lspci -v'. If, however, you need to see the backtrace, I'll need some advice on how to capture that.
Chris 00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor DRAM Controller (rev 06) Subsystem: CLEVO/KAPOK Computer Xeon E3-1200 v3/4th Gen Core Processor DRAM Controller Flags: bus master, fast devsel, latency 0 Capabilities: [e0] Vendor Specific Information: Len=0c 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller (rev 06) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0, IRQ 16 Bus: primary=00, secondary=01, subordinate=01, sec-latency=0 I/O behind bridge: None Memory behind bridge: None Prefetchable memory behind bridge: None Capabilities: [88] Subsystem: CLEVO/KAPOK Computer Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller Capabilities: [80] Power Management version 3 Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit- Capabilities: [a0] Express Root Port (Slot+), MSI 00 Kernel driver in use: pcieport 00:02.0 VGA compatible controller: Intel Corporation 4th Gen Core Processor Integrated Graphics Controller (rev 06) (prog-if 00 [VGA controller]) Subsystem: CLEVO/KAPOK Computer 4th Gen Core Processor Integrated Graphics Controller Flags: bus master, fast devsel, latency 0, IRQ 16 Memory at f780 (64-bit, non-prefetchable) [size=4M] Memory at e000 (64-bit, prefetchable) [size=256M] I/O ports at f000 [size=64] [virtual] Expansion ROM at 000c [disabled] [size=128K] Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit- Capabilities: [d0] Power Management version 2 Capabilities: [a4] PCI Advanced Features Kernel driver in use: i915 00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06) Subsystem: CLEVO/KAPOK Computer Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller Flags: bus master, fast devsel, latency 0, IRQ 16 Memory at f7f14000 (64-bit, non-prefetchable) [size=16K] Capabilities: [50] Power Management version 2 Capabilities: [60] MSI: Enable- Count=1/1 Maskable- 64bit- 
Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00 Kernel driver in use: snd_hda_intel Kernel modules: snd_hda_intel 00:14.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI (rev 05) (prog-if 30 [XHCI]) Subsystem: CLEVO/KAPOK Computer 8 Series/C220 Series Chipset Family USB xHCI Flags: bus master, medium devsel, latency 0, IRQ 16 Memory at f7f0 (64-bit, non-prefetchable) [size=64K] Capabilities: [70] Power Management version 2 Capabilities: [80] MSI: Enable- Count=1/8 Maskable- 64bit+ Kernel driver in use: xhci_hcd 00:16.0 Communication controller: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 (rev 04) Subsystem: CLEVO/KAPOK Computer 8 Series/C220 Series Chipset Family MEI Controller Flags: bus master, fast devsel, latency 0, IRQ 11 Memory at f7f1e000 (64-bit, non-prefetchable) [size=16] Capabilities: [50] Power Management version 3 Capabilities: [8c] MSI: Enable- Count=1/1 Maskable- 64bit+ 00:1a.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2 (rev 05) (prog-if 20 [EHCI]) Subsystem: CLEVO/KAPOK Computer 8 Series/C220 Series Chipset Family USB EHCI Flags: bus master, medium devsel, laten
"PM / QoS: Fix device resume latency PM QoS" breaks sound
Hi, I pulled the latest changes from Linus' tree this evening and have found that with the new kernel, sound is not working on my laptop. More precisely, the built-in speakers don't produce any sound. Sound does work when I plug earphones into the headphone socket. It also works via a bluetooth speaker. I've bisected the problem and ended up at: 0cc2b4e5a020fc7f4d1795741c116c983e9467d7 is the first bad commit commit 0cc2b4e5a020fc7f4d1795741c116c983e9467d7 Author: Rafael J. Wysocki Date: Tue Oct 24 15:20:45 2017 +0200 PM / QoS: Fix device resume latency PM QoS The special value of 0 for device resume latency PM QoS means "no restriction", but there are two problems with that. First, device resume latency PM QoS requests with 0 as the value are always put in front of requests with positive values in the priority lists used internally by the PM QoS framework, causing 0 to be chosen as an effective constraint value. However, that 0 is then interpreted as "no restriction" effectively overriding the other requests with specific restrictions which is incorrect. Second, the users of device resume latency PM QoS have no way to specify that *any* resume latency at all should be avoided, which is an artificial limitation in general. To address these issues, modify device resume latency PM QoS to use S32_MAX as the "no constraint" value and 0 as the "no latency at all" one and rework its users (the cpuidle menu governor, the genpd QoS governor and the runtime PM framework) to follow these changes. Also add a special "n/a" value to the corresponding user space I/F to allow user space to indicate that it cannot accept any resume latencies at all for the given device. Fixes: 85dc0b8a4019 (PM / QoS: Make it possible to expose PM QoS latency constraints) Link: https://bugzilla.kernel.org/show_bug.cgi?id=197323 Reported-by: Reinette Chatre Tested-by: Reinette Chatre Signed-off-by: Rafael J.
Wysocki Acked-by: Alex Shi Cc: All applicable :04 04 f0c128ec799bb9894cfc5c341f88ad7bdfb15bac 9a2e8171ca47f864bd534cd9c160cce58449a889 M Documentation :04 04 0028ffec81675e686bdd621c0445d3e814d7980c 29db53c6356a6fed9c8bdbc2d6bc7bd56a96e529 M drivers :04 04 2e66b79bd2ffb4fcb00f04a69a0afe5c80d1d3f3 dd6d8e90b59389cd2bd8a0c92716d79d2eeb8268 M include With that change reverted, the speakers emit sound again. The audio devices identified by "lspci -vv" are as follows: 00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06) Subsystem: CLEVO/KAPOK Computer Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- SERR-
Re: [PATCH 4.12 004/106] scsi: sg: fix SG_DXFER_FROM_DEV transfers
On 09/08/17 17:51, Greg Kroah-Hartman wrote: > 4.12-stable review patch. If anyone has any objections, please let me know. > > --- I repeat the comments I made when the patch was queued for stable: 1. Johannes' commit message says that the transfer must have a length bigger than 0, so the code should return false if the length is less than or equal to 0, but the test is for less than 0. 2. But in any case, there's another patch that removes all this sg_is_valid_dxfer() jiggery-pokery and replaces it with a simpler test. It hasn't reached Linus' tree yet but is, I believe, cc'd to stable. As Johannes said in response to the second of my comments, the patch that replaces sg_is_valid_dxfer() with a simpler test is now in Linus' tree - commit f930c7043663188429cd9b254e9d761edfc101ce. Without that change, I think there is still some breakage in sg. Chris --- > > From: Johannes Thumshirn > > commit 68c59fcea1f2c6a54c62aa896cc623c1b5bc9b47 upstream. > > SG_DXFER_FROM_DEV transfers do not necessarily have a dxferp as we set > it to NULL for the old sg_io read/write interface, but must have a > length bigger than 0. This fixes a regression introduced by commit > 28676d869bbb ("scsi: sg: check for valid direction before starting the > request") > > Signed-off-by: Johannes Thumshirn > Fixes: 28676d869bbb ("scsi: sg: check for valid direction before starting the > request") > Reported-by: Chris Clayton > Tested-by: Chris Clayton > Cc: Douglas Gilbert > Reviewed-by: Hannes Reinecke > Tested-by: Chris Clayton > Acked-by: Douglas Gilbert > Signed-off-by: Martin K.
Petersen > Signed-off-by: Greg Kroah-Hartman > > --- > drivers/scsi/sg.c |5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > --- a/drivers/scsi/sg.c > +++ b/drivers/scsi/sg.c > @@ -758,8 +758,11 @@ static bool sg_is_valid_dxfer(sg_io_hdr_ > if (hp->dxferp || hp->dxfer_len > 0) > return false; > return true; > - case SG_DXFER_TO_DEV: > case SG_DXFER_FROM_DEV: > + if (hp->dxfer_len < 0) > + return false; > + return true; > + case SG_DXFER_TO_DEV: > case SG_DXFER_TO_FROM_DEV: > if (!hp->dxferp || hp->dxfer_len == 0) > return false; > >
Re: [PATCH v2] scsi: sg: fix SG_DXFER_FROM_DEV transfers
On 07/07/17 09:56, Johannes Thumshirn wrote: > SG_DXFER_FROM_DEV transfers do not necessarily have a dxferp as we set > it to NULL for the old sg_io read/write interface, but must have a length > bigger than 0. This fixes a regression introduced by commit 28676d869bbb > ("scsi: sg: check for valid direction before starting the request") > I've tested this new patch and the Nero applications can still find the optical drives on my laptop. Tested-by: Chris Clayton > Signed-off-by: Johannes Thumshirn > Fixes: 28676d869bbb ("scsi: sg: check for valid direction before starting the > request") > Reported-by: Chris Clayton > Tested-by: Chris Clayton > Cc: Douglas Gilbert > Reviewed-by: Hannes Reinecke > --- > Changes to v1: > * Fix breakage of the sg_io v3 interface, verified using sg_inq > > drivers/scsi/sg.c | 5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c > index 21225d62b0c1..1e82d4128a84 100644 > --- a/drivers/scsi/sg.c > +++ b/drivers/scsi/sg.c > @@ -758,8 +758,11 @@ static bool sg_is_valid_dxfer(sg_io_hdr_t *hp) > if (hp->dxferp || hp->dxfer_len > 0) > return false; > return true; > - case SG_DXFER_TO_DEV: > case SG_DXFER_FROM_DEV: > + if (hp->dxfer_len < 0) > + return false; > + return true; > + case SG_DXFER_TO_DEV: > case SG_DXFER_TO_FROM_DEV: > if (!hp->dxferp || hp->dxfer_len == 0) > return false; >
4.7.0-rc7+: Oops during boot with USB pen drive inserted
Hi, With Linus' latest and greatest, I get an oops when I boot my laptop with a pen drive inserted in any USB port. The oops message is: Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(8,2) The oops seems to be 100% repeatable. If a USB pen drive is not inserted, the laptop boots successfully. I've taken a photograph of the oops and you can view it at http://s714.photobucket.com/user/chris2553/media/IMG_20160722_053841.jpg.html. At the top of the picture, I notice that the partitions on my actual boot disk are being reported as being on /dev/sdb, so it seems likely that, at this point, the pen drive is being seen as /dev/sda, although that has scrolled off the screen. I don't boot via a ramdisk - my kernel has ext4 built in. The grub2 entry is: menuentry "Krisux, Linux 4.7.0-rc7+" { insmod ext2 set root=(hd0,2) linux /boot/vmlinuz-4.7.0-rc7+ ro root=/dev/sda2 resume=/dev/sda6 rootfstype=ext4 net.ifnames=0 } (BTW, Krisux is not a real distro - it's just the name I have given my Linux From Scratch system.) The stack that is dumped is: dump_stack panic printk mount_block_root prepare_namespace kernel_init_freeable kernel_init ret_from_fork rest_init I realise I could work around this by specifying the boot partition by, say, its UUID, but I thought you would want me to report this anyway. Of course, I'm happy to provide any other information required and to test any fix, but I will be out and about for the next 14-16 hours, so it will be later tonight or maybe even tomorrow before I can respond. Chris
Re: 4.5 Regression - mouse not working after resume from suspend
I can only assume that although a non-working bluetooth mouse is a symptom of this regression, the silence of the bluetooth folks is because the fault does not lie in the BT subsystem. Consequently, I'm transferring the problem back to LKML in the hope that someone else can solve the problem. On 15/02/16 23:40, Chris Clayton wrote: > Hi, > > Is there anything else I can do to help diagnose this regression. > > To summarise my BT mouse does not work after resuming from suspend to disk or > ram. IT works perfectly in earlier 4.4, > 4.3 and 4.2 kernels. I bisected and found the first bad commit is: > > 2ff13894cfb877cb3d02d96a8402202f0a6f3efd is the first bad commit > commit 2ff13894cfb877cb3d02d96a8402202f0a6f3efd > Author: Johan Hedberg > Date: Wed Nov 25 16:15:44 2015 +0200 > > Bluetooth: Perform HCI update for power on synchronously > > Johan requested additional information, which I provided. Checking the > archive at marc.info, it seems the mail didn't > make it to the mailing list. Maybe it exceeded a size limit, I don't know. > Anyway I copied the mail to Johan and Marcel. > > A bit more experimentation revealed that I can reactivate the mouse if I > restart the bluetooth daemon after the machine > resumes. > > Please let me know if I can provide anything else. > > Thanks > > Chris > > On 06/02/16 15:23, Chris Clayton wrote: >> Hi Johan, >> >> The information you requested has been captured from v4.5-rc2-340-g5af9c2e >> and is included below. >> >> On 06/02/16 14:33, Johan Hedberg wrote: >>> Hi Chris, >>> >>> On Sat, Feb 06, 2016, Chris Clayton wrote: >>>> On 06/02/16 11:38, Chris Clayton wrote: >>>>> On 06/02/16 08:37, Chris Clayton wrote: >>>>>> There seems to be a regression in resuming my laptop from a suspend >>>>>> to RAM or disk. The symptom is that my bluetooth >>>>>> mouse doesn't work after the resume. The kernel is built after a >>>>>> pull of Linus' tree this morning (v4.5-rc2-340-g5af9c2e). 
>>>>>> >>>>>> Attached is the output from dmesg showing the boot, suspend (to >>>>>> RAM) and resume. You'll see that during the resume, >>>>>> error -517 is being reported for some devices. Suspend/resume has worked >>>>>> perfectly with a 4.[234].x kernels. >>>>>> >>>>>> I'll start a bisection, but thought I'd give a heads up in case >>>>>> someone can see the problem before I get done with the >>>>>> bisect. >>>>> >>>>> The bisection ended up at: >>>>> >>>>> 2ff13894cfb877cb3d02d96a8402202f0a6f3efd is the first bad commit >>>>> commit 2ff13894cfb877cb3d02d96a8402202f0a6f3efd >>>>> Author: Johan Hedberg >>>>> Date: Wed Nov 25 16:15:44 2015 +0200 >>>>> >>>>> Bluetooth: Perform HCI update for power on synchronously >>>>> >>>>> The request to update HCI during power on is always coming either from >>>>> hdev->req_workqueue or through an ioctl, so it's safe to use >>>>> hci_req_sync for it. This way we also eliminate potential races with >>>>> incoming mgmt commands or other actions while powering on. >>>>> >>>>> Part of this refactoring is the splitting of mgmt_powered() into >>>>> mgmt_power_on() and __mgmt_power_off() functions. The main reason is >>>>> the different requirements as far as hdev locking is concerned, as >>>>> highlighted with the __ prefix of the power off API. >>>>> >>>>> Since the power on in the case of clearing the AUTO_OFF flag cannot be >>>>> done synchronously in the set_powered mgmt handler, the hci_power_on >>>>> work callback is extended to cover this (which also simplifies the >>>>> set_powered helper a lot). >>>>> >>>>> Signed-off-by: Johan Hedberg >>>>> Signed-off-by: Marcel Holtmann >>>>> >>>>> :04 04 a093d0be66f39f99c33a6a4725b2330ca9b41d03 >>>>> a1eff79cec3ee7208e5aa200ab5069726bbeea8e M include >>>>> :04 04 d2d122193b33d45fcb9c2bc69f2024487a7528a0 >>>>> 0036e1ec2e125f2432cfd420b5f79ca133ec34f7 M net >>>> >>>> I've just built a kernel at bf943cbf76ecd3b9838a80d5e08777b0f4ccc665 >>>&g
Re: 4.5 Regression - mouse not working after resume from suspend
On 06/02/16 11:38, Chris Clayton wrote: > > > On 06/02/16 08:37, Chris Clayton wrote: >> There seems to be a regression in resuming my laptop from a suspend to RAM >> or disk. The symptom is that my bluetooth >> mouse doesn't work after the resume. The kernel is built after a pull of >> Linus' tree this morning (v4.5-rc2-340-g5af9c2e). >> >> Attached is the output from dmesg showing the boot, suspend (to RAM) and >> resume. You'll see that during the resume, >> error -517 is being reported for some devices. Suspend/resume has worked >> perfectly with a 4.[234].x kernels. >> >> I'll start a bisection, but thought I'd give a heads up in case someone can >> see the problem before I get done with the >> bisect. >> > > The bisection ended up at: > > 2ff13894cfb877cb3d02d96a8402202f0a6f3efd is the first bad commit > commit 2ff13894cfb877cb3d02d96a8402202f0a6f3efd > Author: Johan Hedberg > Date: Wed Nov 25 16:15:44 2015 +0200 > > Bluetooth: Perform HCI update for power on synchronously > > The request to update HCI during power on is always coming either from > hdev->req_workqueue or through an ioctl, so it's safe to use > hci_req_sync for it. This way we also eliminate potential races with > incoming mgmt commands or other actions while powering on. > > Part of this refactoring is the splitting of mgmt_powered() into > mgmt_power_on() and __mgmt_power_off() functions. The main reason is > the different requirements as far as hdev locking is concerned, as > highlighted with the __ prefix of the power off API. > > Since the power on in the case of clearing the AUTO_OFF flag cannot be > done synchronously in the set_powered mgmt handler, the hci_power_on > work callback is extended to cover this (which also simplifies the > set_powered helper a lot). 
> > Signed-off-by: Johan Hedberg > Signed-off-by: Marcel Holtmann > > :04 04 a093d0be66f39f99c33a6a4725b2330ca9b41d03 > a1eff79cec3ee7208e5aa200ab5069726bbeea8e M include > :04 04 d2d122193b33d45fcb9c2bc69f2024487a7528a0 > 0036e1ec2e125f2432cfd420b5f79ca133ec34f7 M net > > I've just built a kernel at bf943cbf76ecd3b9838a80d5e08777b0f4ccc665 (the commit prior to the one the bisect landed on) and my BT mouse works fine after a suspend/resume. With a kernel built at 2ff13894cfb877cb3d02d96a8402202f0a6f3efd, the mouse does not work after resume. > The bisect log is: > > git bisect start > # bad: [5af9c2e19da6514a1a50b07d97d93b74a7711873] Merge branch 'akpm' > (patches from Andrew) > git bisect bad 5af9c2e19da6514a1a50b07d97d93b74a7711873 > # good: [afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc] Linux 4.4 > git bisect good afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc > # bad: [63b6da39bb38e8f1a1ef3180d32a39d6baf9da84] perf: Fix > perf_event_exit_task() race > git bisect bad 63b6da39bb38e8f1a1ef3180d32a39d6baf9da84 > # bad: [aee3bfa3307cd0da2126bdc0ea359dabea5ee8f7] Merge > git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next > git bisect bad aee3bfa3307cd0da2126bdc0ea359dabea5ee8f7 > # good: [60b7eca1dc2ec066916b3b7ac6ad89bea13cb9af] Merge tag > 'upstream-4.5-rc1' of git://git.infradead.org/linux-ubifs > git bisect good 60b7eca1dc2ec066916b3b7ac6ad89bea13cb9af > # bad: [a188222b6ed29404ac2d4232d35d1fe0e77af370] net: Rename > NETIF_F_ALL_CSUM to NETIF_F_CSUM_MASK > git bisect bad a188222b6ed29404ac2d4232d35d1fe0e77af370 > # good: [1343c65f70ee1b1f968a08b30e1836a4e37116cd] fm10k: always check > init_hw for errors > git bisect good 1343c65f70ee1b1f968a08b30e1836a4e37116cd > # good: [bc9b145a092aca91a7f6ef40cdb3628b6ada7ec9] Merge branch > 'for-4.5-ancestor-test' of > git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup > git bisect good bc9b145a092aca91a7f6ef40cdb3628b6ada7ec9 > # good: [a4fcad656e1100bdda9b0b752b93a1a276810469] fm10k: whitespace cleanups > git bisect 
good a4fcad656e1100bdda9b0b752b93a1a276810469 > # bad: [7302b9d90117496049dd4bfa28755f7c2ed55b27] ieee802154/adf7242: Driver > for ADF7242 MAC IEEE802154 > git bisect bad 7302b9d90117496049dd4bfa28755f7c2ed55b27 > # bad: [a0c38245153abe1fd844af9b166d1a5d5dafe7b1] Bluetooth: hci_intel: Use > shorter timeout for HCI commands > git bisect bad a0c38245153abe1fd844af9b166d1a5d5dafe7b1 > # good: [bf943cbf76ecd3b9838a80d5e08777b0f4ccc665] Bluetooth: Move fast > connectable code to hci_request.c > git bisect good bf943cbf76ecd3b9838a80d5e08777b0f4ccc665 > # bad: [742c59516822f4a4bc23b0961d88c569a7f1bf71] Bluetooth: Simplify setting > Configuration Field > git bisect bad 742c59516822f4a4bc23b0961d88c569a7f1bf71 >
Re: 4.5 Regression - mouse not working after resume from suspend
On 06/02/16 08:37, Chris Clayton wrote: > There seems to be a regression in resuming my laptop from a suspend to RAM or > disk. The symptom is that my bluetooth > mouse doesn't work after the resume. The kernel is built after a pull of > Linus' tree this morning (v4.5-rc2-340-g5af9c2e). > > Attached is the output from dmesg showing the boot, suspend (to RAM) and > resume. You'll see that during the resume, > error -517 is being reported for some devices. Suspend/resume has worked > perfectly with a 4.[234].x kernels. > > I'll start a bisection, but thought I'd give a heads up in case someone can > see the problem before I get done with the > bisect. > The bisection ended up at: 2ff13894cfb877cb3d02d96a8402202f0a6f3efd is the first bad commit commit 2ff13894cfb877cb3d02d96a8402202f0a6f3efd Author: Johan Hedberg Date: Wed Nov 25 16:15:44 2015 +0200 Bluetooth: Perform HCI update for power on synchronously The request to update HCI during power on is always coming either from hdev->req_workqueue or through an ioctl, so it's safe to use hci_req_sync for it. This way we also eliminate potential races with incoming mgmt commands or other actions while powering on. Part of this refactoring is the splitting of mgmt_powered() into mgmt_power_on() and __mgmt_power_off() functions. The main reason is the different requirements as far as hdev locking is concerned, as highlighted with the __ prefix of the power off API. Since the power on in the case of clearing the AUTO_OFF flag cannot be done synchronously in the set_powered mgmt handler, the hci_power_on work callback is extended to cover this (which also simplifies the set_powered helper a lot). 
Signed-off-by: Johan Hedberg Signed-off-by: Marcel Holtmann :04 04 a093d0be66f39f99c33a6a4725b2330ca9b41d03 a1eff79cec3ee7208e5aa200ab5069726bbeea8e M include :04 04 d2d122193b33d45fcb9c2bc69f2024487a7528a0 0036e1ec2e125f2432cfd420b5f79ca133ec34f7 M net The bisect log is: git bisect start # bad: [5af9c2e19da6514a1a50b07d97d93b74a7711873] Merge branch 'akpm' (patches from Andrew) git bisect bad 5af9c2e19da6514a1a50b07d97d93b74a7711873 # good: [afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc] Linux 4.4 git bisect good afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc # bad: [63b6da39bb38e8f1a1ef3180d32a39d6baf9da84] perf: Fix perf_event_exit_task() race git bisect bad 63b6da39bb38e8f1a1ef3180d32a39d6baf9da84 # bad: [aee3bfa3307cd0da2126bdc0ea359dabea5ee8f7] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next git bisect bad aee3bfa3307cd0da2126bdc0ea359dabea5ee8f7 # good: [60b7eca1dc2ec066916b3b7ac6ad89bea13cb9af] Merge tag 'upstream-4.5-rc1' of git://git.infradead.org/linux-ubifs git bisect good 60b7eca1dc2ec066916b3b7ac6ad89bea13cb9af # bad: [a188222b6ed29404ac2d4232d35d1fe0e77af370] net: Rename NETIF_F_ALL_CSUM to NETIF_F_CSUM_MASK git bisect bad a188222b6ed29404ac2d4232d35d1fe0e77af370 # good: [1343c65f70ee1b1f968a08b30e1836a4e37116cd] fm10k: always check init_hw for errors git bisect good 1343c65f70ee1b1f968a08b30e1836a4e37116cd # good: [bc9b145a092aca91a7f6ef40cdb3628b6ada7ec9] Merge branch 'for-4.5-ancestor-test' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup git bisect good bc9b145a092aca91a7f6ef40cdb3628b6ada7ec9 # good: [a4fcad656e1100bdda9b0b752b93a1a276810469] fm10k: whitespace cleanups git bisect good a4fcad656e1100bdda9b0b752b93a1a276810469 # bad: [7302b9d90117496049dd4bfa28755f7c2ed55b27] ieee802154/adf7242: Driver for ADF7242 MAC IEEE802154 git bisect bad 7302b9d90117496049dd4bfa28755f7c2ed55b27 # bad: [a0c38245153abe1fd844af9b166d1a5d5dafe7b1] Bluetooth: hci_intel: Use shorter timeout for HCI commands git bisect bad 
a0c38245153abe1fd844af9b166d1a5d5dafe7b1 # good: [bf943cbf76ecd3b9838a80d5e08777b0f4ccc665] Bluetooth: Move fast connectable code to hci_request.c git bisect good bf943cbf76ecd3b9838a80d5e08777b0f4ccc665 # bad: [742c59516822f4a4bc23b0961d88c569a7f1bf71] Bluetooth: Simplify setting Configuration Field git bisect bad 742c59516822f4a4bc23b0961d88c569a7f1bf71 # bad: [02c04afea93fbba7925984df455bc63e7d92da97] Bluetooth: Simplify read_adv_features code git bisect bad 02c04afea93fbba7925984df455bc63e7d92da97 # bad: [2ff13894cfb877cb3d02d96a8402202f0a6f3efd] Bluetooth: Perform HCI update for power on synchronously git bisect bad 2ff13894cfb877cb3d02d96a8402202f0a6f3efd # first bad commit: [2ff13894cfb877cb3d02d96a8402202f0a6f3efd] Bluetooth: Perform HCI update for power on synchronously Just shout if you need any additional diagnostics. > Chris >
4.5 Regression - mouse not working after resume from suspend
There seems to be a regression in resuming my laptop from a suspend to RAM or disk. The symptom is that my bluetooth mouse doesn't work after the resume. The kernel is built after a pull of Linus' tree this morning (v4.5-rc2-340-g5af9c2e). Attached is the output from dmesg showing the boot, suspend (to RAM) and resume. You'll see that during the resume, error -517 is being reported for some devices. Suspend/resume has worked perfectly with 4.[234].x kernels. I'll start a bisection, but thought I'd give a heads up in case someone can see the problem before I get done with the bisect. Chris [0.00] Linux version 4.5.0-rc2+ (chris@laptop) (gcc version 5.3.1 20160202 (GCC) ) #318 SMP PREEMPT Sat Feb 6 06:38:55 GMT 2016 [0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-4.5.0-rc2+ ro root=/dev/sda2 resume=/dev/sda6 rootfstype=ext4 net.ifnames=0 [0.00] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256 [0.00] x86/fpu: Supporting XSAVE feature 0x01: 'x87 floating point registers' [0.00] x86/fpu: Supporting XSAVE feature 0x02: 'SSE registers' [0.00] x86/fpu: Supporting XSAVE feature 0x04: 'AVX registers' [0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format. [0.00] x86/fpu: Using 'eager' FPU context switches.
[0.00] e820: BIOS-provided physical RAM map: [0.00] BIOS-e820: [mem 0x-0x0009d7ff] usable [0.00] BIOS-e820: [mem 0x0009d800-0x0009] reserved [0.00] BIOS-e820: [mem 0x000e-0x000f] reserved [0.00] BIOS-e820: [mem 0x0010-0xd7216fff] usable [0.00] BIOS-e820: [mem 0xd7217000-0xd721dfff] ACPI NVS [0.00] BIOS-e820: [mem 0xd721e000-0xd7a0cfff] usable [0.00] BIOS-e820: [mem 0xd7a0d000-0xd7ca1fff] reserved [0.00] BIOS-e820: [mem 0xd7ca2000-0xdb4d] usable [0.00] BIOS-e820: [mem 0xdb4e-0xdb82dfff] reserved [0.00] BIOS-e820: [mem 0xdb82e000-0xdb88afff] usable [0.00] BIOS-e820: [mem 0xdb88b000-0xdb9bcfff] ACPI NVS [0.00] BIOS-e820: [mem 0xdb9bd000-0xdbffefff] reserved [0.00] BIOS-e820: [mem 0xdbfff000-0xdbff] usable [0.00] BIOS-e820: [mem 0xdd00-0xdf1f] reserved [0.00] BIOS-e820: [mem 0xf800-0xfbff] reserved [0.00] BIOS-e820: [mem 0xfec0-0xfec00fff] reserved [0.00] BIOS-e820: [mem 0xfed0-0xfed03fff] reserved [0.00] BIOS-e820: [mem 0xfed1c000-0xfed1] reserved [0.00] BIOS-e820: [mem 0xfee0-0xfee00fff] reserved [0.00] BIOS-e820: [mem 0xff00-0x] reserved [0.00] BIOS-e820: [mem 0x0001-0x00041fdf] usable [0.00] NX (Execute Disable) protection: active [0.00] SMBIOS 2.7 present. 
[0.00] DMI: Notebook W65_67SZ /W65_67SZ, BIOS 1.03.05 02/26/2014 [0.00] e820: update [mem 0x-0x0fff] usable ==> reserved [0.00] e820: remove [mem 0x000a-0x000f] usable [0.00] e820: last_pfn = 0x41fe00 max_arch_pfn = 0x4 [0.00] MTRR default type: uncachable [0.00] MTRR fixed ranges enabled: [0.00] 0-9 write-back [0.00] A-B uncachable [0.00] C-C write-protect [0.00] D-E7FFF uncachable [0.00] E8000-F write-protect [0.00] MTRR variable ranges enabled: [0.00] 0 base 00 mask 7C write-back [0.00] 1 base 04 mask 7FE000 write-back [0.00] 2 base 00E000 mask 7FE000 uncachable [0.00] 3 base 00DE00 mask 7FFE00 uncachable [0.00] 4 base 00DD00 mask 7FFF00 uncachable [0.00] 5 base 041FE0 mask 7FFFE0 uncachable [0.00] 6 disabled [0.00] 7 disabled [0.00] 8 disabled [0.00] 9 disabled [0.00] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WC UC- WT [0.00] e820: update [mem 0xdd00-0x] usable ==> reserved [0.00] e820: last_pfn = 0xdc000 max_arch_pfn = 0x4 [0.00] found SMP MP-table at [mem 0x000fd820-0x000fd82f] mapped at [880fd820] [0.00] Base memory trampoline at [88097000] 97000 size 24576 [0.00] Using GB pages for direct mapping [0.00] BRK [0x01a05000, 0x01a05fff] PGTABLE [0.00] BRK [0x01a06000, 0x01a06fff] PGTABLE [0.00] BRK [0x01a07000, 0x01a07fff] PGTABLE [0.00] BRK [0x01a08000, 0x01a08fff] PGTABLE [0.00] BRK [0x01a09000, 0x01a09fff] PGTA
Re: EXT4: new warnings from 4.3.0-rc2
On 09/21/15 15:55, Chris Clayton wrote:
> Thanks Ortwin.
>
> On 09/21/15 14:27, Ortwin Glück wrote:
>>> [2.481399] EXT4-fs (sda2): couldn't mount as ext3 due to feature incompatibilities
>>> [2.482426] EXT4-fs (sda2): couldn't mount as ext2 due to feature incompatibilities
>>
>> As the kernel doesn't know which FS your root is, it tries the whole list of
>> filesystems (init/do_mounts.c mount_block_root()). Since the removal of ext3,
>> now the ext4 code is responsible for mounting ext3. Since your FS is ext4 and
>> not ext3, the probe for ext3 fails. That's what the message tells you. You get
>> these even in previous kernels if you say N to ext3 during config.
>>
> No, I do not get the messages from 4.2.0 even though it is configured the same
> as 4.3.0-rc2 as far as EXT{2,3,4} is concerned:
>
> # CONFIG_EXT2_FS is not set
> # CONFIG_EXT3_FS is not set
> CONFIG_EXT4_FS=y
> CONFIG_EXT4_USE_FOR_EXT2=y
> # CONFIG_EXT4_FS_POSIX_ACL is not set
> # CONFIG_EXT4_FS_SECURITY is not set
> # CONFIG_EXT4_ENCRYPTION is not set
> # CONFIG_EXT4_DEBUG is not set
> [chris:~/kernel/linux]$ cd ../linux-4.2.0/
> [chris:~/kernel/linux-4.2.0]$ grep EXT[234] .config
> # CONFIG_EXT2_FS is not set
> # CONFIG_EXT3_FS is not set
> CONFIG_EXT4_FS=y
> CONFIG_EXT4_USE_FOR_EXT23=y
> # CONFIG_EXT4_FS_POSIX_ACL is not set
> # CONFIG_EXT4_FS_SECURITY is not set
> # CONFIG_EXT4_ENCRYPTION is not set
> # CONFIG_EXT4_DEBUG is not set
>
> That's why I said they are new messages.
>
> I've just booted 4.1.7 and I get the messages from that kernel too. I wonder if
> there's a recent fix that has made it into 4.1.7, but not into 4.2.0. I'll apply
> Greg's 4.2.1-rc1 patch and see what I get with that.
>
Applying the 4.2.1-rc1 patch results in a kernel that emits the messages, so I
guess my fix-not-yet-in-4.2 theory is right. I'll just ignore the messages.

Sorry for the noise.

> Chris
>
>
>> If it bugs you, you can add a hint to your kernel command line:
>> rootfstype=ext4
>>
>> Ortwin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
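Ortwin's explanation (with no rootfstype= hint the kernel tries the registered filesystems in turn, and a filesystem using features that the ext3/ext2 probes cannot handle is rejected by them before ext4 accepts it) can be illustrated with a toy model. This is a hypothetical sketch, not the logic in init/do_mounts.c or fs/ext4: `struct fs_driver`, `probe_root()` and the `FEAT_*` bit values are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative "incompat" feature bits (assumed values for this sketch). */
#define FEAT_FILETYPE 0x0002u
#define FEAT_EXTENTS  0x0040u
#define FEAT_FLEX_BG  0x0200u

/* A candidate filesystem driver and the incompat features it understands. */
struct fs_driver {
    const char *name;
    unsigned int supported_incompat;
};

/*
 * Toy model of probing a root device: try each driver in list order
 * (the no-rootfstype= case), or only the driver named by `forced`
 * (the rootfstype= case).  A driver accepts the filesystem only if it
 * understands every incompat bit that is set.  *failures counts rejected
 * probes, each corresponding to one "couldn't mount as ..." style message.
 * Returns the accepting driver's name, or NULL if none accepts it.
 */
const char *probe_root(const struct fs_driver *drv, size_t n,
                       unsigned int fs_incompat, const char *forced,
                       int *failures)
{
    assert(drv != NULL && failures != NULL);
    *failures = 0;
    for (size_t i = 0; i < n; i++) {
        if (forced != NULL && strcmp(forced, drv[i].name) != 0)
            continue;                        /* not the requested type */
        if ((fs_incompat & ~drv[i].supported_incompat) == 0)
            return drv[i].name;              /* all features understood */
        (*failures)++;                       /* probe rejected */
    }
    return NULL;
}
```

With the drivers listed as ext3, ext2, ext4 (the order the messages above suggest) and an ext4 root that uses extents, `probe_root()` rejects two probes before returning "ext4"; passing "ext4" as `forced`, the analogue of rootfstype=ext4, mounts it without any rejected probe.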
Re: EXT4: new warnings from 4.3.0-rc2
Thanks Ortwin.

On 09/21/15 14:27, Ortwin Glück wrote:
>> [2.481399] EXT4-fs (sda2): couldn't mount as ext3 due to feature incompatibilities
>> [2.482426] EXT4-fs (sda2): couldn't mount as ext2 due to feature incompatibilities
>
> As the kernel doesn't know which FS your root is, it tries the whole list of
> filesystems (init/do_mounts.c mount_block_root()). Since the removal of ext3,
> now the ext4 code is responsible for mounting ext3. Since your FS is ext4 and
> not ext3, the probe for ext3 fails. That's what the message tells you. You get
> these even in previous kernels if you say N to ext3 during config.
>
No, I do not get the messages from 4.2.0 even though it is configured the same
as 4.3.0-rc2 as far as EXT{2,3,4} is concerned:

# CONFIG_EXT2_FS is not set
# CONFIG_EXT3_FS is not set
CONFIG_EXT4_FS=y
CONFIG_EXT4_USE_FOR_EXT2=y
# CONFIG_EXT4_FS_POSIX_ACL is not set
# CONFIG_EXT4_FS_SECURITY is not set
# CONFIG_EXT4_ENCRYPTION is not set
# CONFIG_EXT4_DEBUG is not set
[chris:~/kernel/linux]$ cd ../linux-4.2.0/
[chris:~/kernel/linux-4.2.0]$ grep EXT[234] .config
# CONFIG_EXT2_FS is not set
# CONFIG_EXT3_FS is not set
CONFIG_EXT4_FS=y
CONFIG_EXT4_USE_FOR_EXT23=y
# CONFIG_EXT4_FS_POSIX_ACL is not set
# CONFIG_EXT4_FS_SECURITY is not set
# CONFIG_EXT4_ENCRYPTION is not set
# CONFIG_EXT4_DEBUG is not set

That's why I said they are new messages.

I've just booted 4.1.7 and I get the messages from that kernel too. I wonder if
there's a recent fix that has made it into 4.1.7, but not into 4.2.0. I'll apply
Greg's 4.2.1-rc1 patch and see what I get with that.

Chris

> If it bugs you, you can add a hint to your kernel command line:
> rootfstype=ext4
>
> Ortwin
EXT4: new warnings from 4.3.0-rc2
Hi,

I've just built and booted 4.3.0-rc2 and I'm seeing the following new messages
on the console during boot:

[2.481399] EXT4-fs (sda2): couldn't mount as ext3 due to feature incompatibilities
[2.482426] EXT4-fs (sda2): couldn't mount as ext2 due to feature incompatibilities

They are immediately followed by:

[2.507948] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)
[3.549523] EXT4-fs (sda2): re-mounted. Opts: (null)

which are the messages I normally see (from 4.2.0 and earlier). sda2 is my root
partition and it is mounted OK, so my system is operating as before, but I
thought you would want a heads-up about these (slightly alarming) new console
messages. The output from dmesg is attached, in case it helps.

Chris

[0.00] Initializing cgroup subsys cpu [0.00] Linux version 4.3.0-rc2 (chris@laptop) (gcc version 5.2.1 20150915 (GCC) ) #251 SMP PREEMPT Mon Sep 21 07:50:42 BST 2015 [0.00] Command line: root=/dev/sda2 ro resume=/dev/sda6 [0.00] x86/fpu: xstate_offset[2]: 0240, xstate_sizes[2]: 0100 [0.00] x86/fpu: Supporting XSAVE feature 0x01: 'x87 floating point registers' [0.00] x86/fpu: Supporting XSAVE feature 0x02: 'SSE registers' [0.00] x86/fpu: Supporting XSAVE feature 0x04: 'AVX registers' [0.00] x86/fpu: Enabled xstate features 0x7, context size is 0x340 bytes, using 'standard' format. [0.00] x86/fpu: Using 'eager' FPU context switches.
[0.00] e820: BIOS-provided physical RAM map: [0.00] BIOS-e820: [mem 0x-0x0009d7ff] usable [0.00] BIOS-e820: [mem 0x0009d800-0x0009] reserved [0.00] BIOS-e820: [mem 0x000e-0x000f] reserved [0.00] BIOS-e820: [mem 0x0010-0xd7216fff] usable [0.00] BIOS-e820: [mem 0xd7217000-0xd721dfff] ACPI NVS [0.00] BIOS-e820: [mem 0xd721e000-0xd7a0cfff] usable [0.00] BIOS-e820: [mem 0xd7a0d000-0xd7ca1fff] reserved [0.00] BIOS-e820: [mem 0xd7ca2000-0xdb4d] usable [0.00] BIOS-e820: [mem 0xdb4e-0xdb82dfff] reserved [0.00] BIOS-e820: [mem 0xdb82e000-0xdb88afff] usable [0.00] BIOS-e820: [mem 0xdb88b000-0xdb9bcfff] ACPI NVS [0.00] BIOS-e820: [mem 0xdb9bd000-0xdbffefff] reserved [0.00] BIOS-e820: [mem 0xdbfff000-0xdbff] usable [0.00] BIOS-e820: [mem 0xdd00-0xdf1f] reserved [0.00] BIOS-e820: [mem 0xf800-0xfbff] reserved [0.00] BIOS-e820: [mem 0xfec0-0xfec00fff] reserved [0.00] BIOS-e820: [mem 0xfed0-0xfed03fff] reserved [0.00] BIOS-e820: [mem 0xfed1c000-0xfed1] reserved [0.00] BIOS-e820: [mem 0xfee0-0xfee00fff] reserved [0.00] BIOS-e820: [mem 0xff00-0x] reserved [0.00] BIOS-e820: [mem 0x0001-0x00041fdf] usable [0.00] NX (Execute Disable) protection: active [0.00] SMBIOS 2.7 present. 
[0.00] DMI: Notebook W65_67SZ /W65_67SZ, BIOS 1.03.05 02/26/2014 [0.00] e820: update [mem 0x-0x0fff] usable ==> reserved [0.00] e820: remove [mem 0x000a-0x000f] usable [0.00] e820: last_pfn = 0x41fe00 max_arch_pfn = 0x4 [0.00] MTRR default type: uncachable [0.00] MTRR fixed ranges enabled: [0.00] 0-9 write-back [0.00] A-B uncachable [0.00] C-C write-protect [0.00] D-E7FFF uncachable [0.00] E8000-F write-protect [0.00] MTRR variable ranges enabled: [0.00] 0 base 00 mask 7C write-back [0.00] 1 base 04 mask 7FE000 write-back [0.00] 2 base 00E000 mask 7FE000 uncachable [0.00] 3 base 00DE00 mask 7FFE00 uncachable [0.00] 4 base 00DD00 mask 7FFF00 uncachable [0.00] 5 base 041FE0 mask 7FFFE0 uncachable [0.00] 6 disabled [0.00] 7 disabled [0.00] 8 disabled [0.00] 9 disabled [0.00] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WC UC- WT [0.00] e820: update [mem 0xdd00-0x] usable ==> reserved [0.00] e820: last_pfn = 0xdc000 max_arch_pfn = 0x4 [0.00] found SMP MP-table at [mem 0x000fd820-0x000fd82f] mapped at [880fd820] [0.00] Base memory trampoline at [88097000] 97000 size 24576 [0.00] Using GB pages for direct mapping [0.00] init_memory_mapping: [mem 0x-0x000f] [0.00] [mem 0x-0x000
Re: [PATCH] iommu: prompt for IOMMU_IO_PGTABLE_LPAE on ARM archs only
On 02/16/15 16:32, Will Deacon wrote:
> Hi Chris,
>
> On Sun, Feb 15, 2015 at 11:17:19AM +0000, Chris Clayton wrote:
>> When running "make oldconfig" for an x86_64 kernel, I was prompted for a
>> setting for IOMMU_IO_PGTABLE_LPAE. From the prompt and the help text it
>> appears that this config item is relevant to ARMv7/v8 only. This patch
>> prevents the prompt on non-ARM architectures. Compile tested building a
>> cross-compiled x86_64 kernel in an x86 user space. The resultant kernel
>> boots fine and I am running it now.
>>
>> Fixes: e1d3c0fd701df831169b116cd5c5d6203ac07f70
>> Cc: will.dea...@arm.com
>> Signed-off-by: Chris Clayton
>>
>> --- linux/drivers/iommu/Kconfig.orig	2015-02-15 09:44:01.235927248 +
>> +++ linux/drivers/iommu/Kconfig	2015-02-15 09:44:41.131926434 +
>> @@ -22,6 +22,7 @@ config IOMMU_IO_PGTABLE
>>
>>  config IOMMU_IO_PGTABLE_LPAE
>>  	bool "ARMv7/v8 Long Descriptor Format"
>> +	depends on ARM || ARM64
>>  	select IOMMU_IO_PGTABLE
>>  	help
>>  	  Enable support for the ARM long descriptor pagetable format.
>
> What's the problem with this? The page-table code is intentionally
> decoupled from the CPU architecture and having this boot-tested on x86
> found some real bugs that I'm currently fixing. Sure, you probably don't
> need this on your box, but it's not default y and you don't have to
> select it.
>
There's no real problem except that, as I said, the prompt and the help text
suggest that the config is relevant to the ARM architecture only, so when it
popped up on x86_64 it was a surprise. As you say, I can simply answer "N", but
the prompt and the help text need correcting, because for an ordinary Joe User
like me, they're misleading.

> Will
[PATCH] iommu: prompt for IOMMU_IO_PGTABLE_LPAE on ARM archs only
When running "make oldconfig" for an x86_64 kernel, I was prompted for a
setting for IOMMU_IO_PGTABLE_LPAE. From the prompt and the help text it appears
that this config item is relevant to ARMv7/v8 only. This patch prevents the
prompt on non-ARM architectures. Compile tested building a cross-compiled
x86_64 kernel in an x86 user space. The resultant kernel boots fine and I am
running it now.

Fixes: e1d3c0fd701df831169b116cd5c5d6203ac07f70
Cc: will.dea...@arm.com
Signed-off-by: Chris Clayton

--- linux/drivers/iommu/Kconfig.orig	2015-02-15 09:44:01.235927248 +
+++ linux/drivers/iommu/Kconfig	2015-02-15 09:44:41.131926434 +
@@ -22,6 +22,7 @@ config IOMMU_IO_PGTABLE
 
 config IOMMU_IO_PGTABLE_LPAE
 	bool "ARMv7/v8 Long Descriptor Format"
+	depends on ARM || ARM64
 	select IOMMU_IO_PGTABLE
 	help
 	  Enable support for the ARM long descriptor pagetable format.
Re: BUG in 3.19.0-rc3+
Thanks Konstantin.

[snip]
>>>
>>> Looks like degree (%edx) is 1 on anon-vma destruction.
>>> Probably I've overlooked some weird corner case in vma splitting/merging.
>>>
>>> Could you try this patch. It disables vma merging and eliminates half
>>> of complicated paths.
>>> As I see merging is optional, everything should work fine without it.
>>>
>>> --- a/mm/mmap.c
>>> +++ b/mm/mmap.c
>>> @@ -1048,7 +1048,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
>>>          * We later require that vma->vm_flags == vm_flags,
>>>          * so this tests vma->vm_flags & VM_SPECIAL, too.
>>>          */
>>> -        if (vm_flags & VM_SPECIAL)
>>> +        if (1)
>>>                  return NULL;
>>>
>>>          if (prev)
>>>
>>> Code from your oops.
>>>
>>> Code: 00 ad de 48 89 43 18 e8 c5 f9 00 00 48 8b 45 10 48 8d 55 10 48 83 e8 10 49 39 d6 74 54 48 8b 7d 08 48 89 eb 8b 57 34 85 d2 74 9e <0f> 0b 0f 1f 40 00 e8 6b fc ff ff eb 9a 66 0f 1f 84 00 00 00 00
>>> All code
>>>
>>>    0:   00 ad de 48 89 43       add    %ch,0x438948de(%rbp)
>>>    6:   18 e8                   sbb    %ch,%al
>>>    8:   c5 f9 00                (bad)
>>>    b:   00 48 8b                add    %cl,-0x75(%rax)
>>>    e:   45 10 48 8d             adc    %r9b,-0x73(%r8)
>>>   12:   55                      push   %rbp
>>>   13:   10 48 83                adc    %cl,-0x7d(%rax)
>>>   16:   e8 10 49 39 d6          callq  0xd639492b
>>>   1b:   74 54                   je     0x71
>>>   1d:   48 8b 7d 08             mov    0x8(%rbp),%rdi
>>>   21:   48 89 eb                mov    %rbp,%rbx
>>>   24:   8b 57 34                mov    0x34(%rdi),%edx
>>>   27:   85 d2                   test   %edx,%edx
>>>   29:   74 9e                   je     0xffc9
>>>   2b:*  0f 0b                   ud2    <-- trapping instruction
>>>   2d:   0f 1f 40 00             nopl   0x0(%rax)
>>>   31:   e8 6b fc ff ff          callq  0xfca1
>>>   36:   eb 9a                   jmp    0xffd2
>>>   38:   66                      data16
>>>   39:   0f                      .byte 0xf
>>>   3a:   1f                      (bad)
>>>   3b:   84 00                   test   %al,(%rax)
>>>   3d:   00 00                   add    %al,(%rax)
>>>         ...
>>>
>>> Code starting with the faulting instruction
>>> ===
>>>    0:   0f 0b                   ud2
>>>    2:   0f 1f 40 00             nopl   0x0(%rax)
>>>    6:   e8 6b fc ff ff          callq  0xfc76
>>>    b:   eb 9a                   jmp    0xffa7
>>>    d:   66                      data16
>>>    e:   0f                      .byte 0xf
>>>    f:   1f                      (bad)
>>>   10:   84 00                   test   %al,(%rax)
>>>   12:   00 00                   add    %al,(%rax)
>>>
>>>
>>> +Added Oded Gabbay into cc, he's reported this
>>> problem too.
>>>
>> Thanks for the fast reply.
>>
>> I applied the patch and tested it. I wasn't able to reproduce *my* problem,
>> so you are definitely in the right direction :)
>
> Ok. I've found something. Try patch from attachment.
>

Your patch has fixed the BUG for me. Thank you.

Tested-by: Chris Clayton

>>
>> Oded
>>
>>>>
>>>> In case it helps, I've attached the xz-compressed related config file.
>>>>
>>>> Chris
>>>>
>>>>>
>>>>> I've attached the full kernel log file for that boot.
>>>>>
>>>>> Chris
>>>>>
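The "Code:" line and Konstantin's disassembly can be cross-checked by hand: the oops prints the bytes around the faulting RIP and marks the byte at RIP as `<xx>`. The helper below is a hypothetical sketch (not kernel code; the in-tree tool for this job is scripts/decodecode) that pulls the bytes out of such a line and locates the marker.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Parse the hex bytes of an oops "Code:" line into buf (at most max bytes).
 * The byte at the faulting RIP is printed as "<xx>"; its index is stored
 * in *trap, or -1 if no marker is found.  Returns the number of bytes.
 */
int parse_code_line(const char *code, unsigned char *buf, int max, int *trap)
{
    const char *p = code;
    int n = 0;

    assert(code != NULL && buf != NULL && trap != NULL);
    *trap = -1;
    while (*p != '\0' && n < max) {
        if (*p == ' ') {                 /* skip separators */
            p++;
            continue;
        }
        int marked = (*p == '<');
        if (marked)
            p++;
        char *end;
        unsigned long v = strtoul(p, &end, 16);
        if (end == p)                    /* not a hex byte: stop parsing */
            break;
        if (marked)
            *trap = n;
        buf[n++] = (unsigned char)v;
        p = end;
        if (marked && *p == '>')
            p++;
    }
    return n;
}
```

Applied to the Code: line above, it finds 64 bytes with the marker at offset 0x2b, where the bytes are 0f 0b (ud2). They are preceded at 0x24 by 8b 57 34 (mov 0x34(%rdi),%edx) and at 0x27 by 85 d2 (test %edx,%edx), which fits Konstantin's reading that a non-zero degree (RDX: 0001 in the register dump) triggered the BUG.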
Re: BUG in 3.19.0-rc3+
On 01/11/15 09:52, Oded Gabbay wrote:
>
>
> On 01/11/2015 11:37 AM, Konstantin Khlebnikov wrote:
>> On Sun, Jan 11, 2015 at 11:16 AM, Chris Clayton wrote:
>>> Hi,
>>>
>>> I've done the bisect and the outcome is below, but because I almost always
>>> forget to mention it, I'll say here that I am running a 32-bit user space
>>> on a 64-bit kernel.
>>>
>>> On 01/10/15 20:17, Chris Clayton wrote:
>>>> Hi,
>>>>
>>>> I'm getting a BUG report from a kernel built from a pull (earlier today)
>>>> of the current development kernel (running git describe gives
>>>> v3.19-rc3-169-geb74926). So that I have usable wireless networking, I have
>>>> also applied the latest seven iwlwifi patches from the wireless-drivers git
>>>> tree. Prior to today's pull, I was not seeing anything unusual in dmesg.
>>>>
>>>> The BUG reported is as follows:
>>>>
>>>> Jan 10 19:41:32 laptop kernel: [ cut here ]
>>>> Jan 10 19:41:32 laptop kernel: kernel BUG at mm/rmap.c:399!
>>>> Jan 10 19:41:32 laptop kernel: invalid opcode: [#1] PREEMPT SMP
>>>> Jan 10 19:41:32 laptop kernel: Modules linked in: rfcomm snd_hda_codec_via iwlmvm coretemp snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel mac80211 hwmon snd_hda_controller x86_pkg_temp_thermal acpi_cpufreq iwlwifi cfg80211 snd_hda_codec snd_hwdep
>>>> Jan 10 19:41:32 laptop kernel: CPU: 1 PID: 353 Comm: fc-cache Not tainted 3.19.0-rc3+ #42
>>>> Jan 10 19:41:32 laptop kernel: Hardware name: Notebook W65_67SZ/W65_67SZ , BIOS 1.03.05 02/26/2014
>>>> Jan 10 19:41:32 laptop kernel: task: 8800da98c5c0 ti: 880408dd4000 task.ti: 880408dd4000
>>>> Jan 10 19:41:32 laptop kernel: RIP: 0010:[] [] unlink_anon_vmas+0x17a/0x200
>>>> Jan 10 19:41:33 laptop kernel: RSP: 0018:880408dd7d88 EFLAGS: 00010286
>>>> Jan 10 19:41:33 laptop kernel: RAX: 88040b79e150 RBX: 88040b79e140 RCX:
>>>> Jan 10 19:41:33 laptop kernel: RDX: 0001 RSI: 880409f04360 RDI: 880409f04320
>>>> Jan 10 19:41:33 laptop kernel: RBP: 88040cb13278 R08: R09: 88040d801c00
>>>> Jan 10 19:41:33 laptop kernel: R10: 88041fa546e0 R11: 88040b79e160 R12: 880409f04320
>>>> Jan 10 19:41:33 laptop kernel: R13: 88040cb13278 R14: 88040cb13288 R15: 88040cb13210
>>>> Jan 10 19:41:33 laptop kernel: FS: () GS:88041fa4() knlGS:
>>>> Jan 10 19:41:33 laptop kernel: CS: 0010 DS: 002b ES: 002b CR0: 80050033
>>>> Jan 10 19:41:33 laptop kernel: CR2: f722c8d4 CR3: 0004082a8000 CR4: 001407e0
>>>> Jan 10 19:41:33 laptop kernel: Stack:
>>>> Jan 10 19:41:33 laptop kernel: 88040d6cfbd8 88040d6cfba0 88040cecd160 88040cb13210
>>>> Jan 10 19:41:33 laptop kernel: 88040cbbb630 f7151000 880408dd7e28
>>>> Jan 10 19:41:33 laptop kernel: 810e3633
>>>> Jan 10 19:41:33 laptop kernel: Call Trace:
>>>> Jan 10 19:41:33 laptop kernel: [] ? free_pgtables+0x83/0xf0
>>>> Jan 10 19:41:34 laptop kernel: [] ? exit_mmap+0xc3/0x150
>>>> Jan 10 19:41:34 laptop kernel: [] ? __do_page_fault+0x17d/0x4b0
>>>> Jan 10 19:41:34 laptop kernel: [] ? mmput+0x21/0xc0
>>>> Jan 10 19:41:34 laptop kernel: [] ? do_exit+0x26d/0xa50
>>>> Jan 10 19:41:34 laptop kernel: [] ? mntput_no_expire+0x9/0x140
>>>> Jan 10 19:41:34 laptop kernel: [] ? task_work_run+0xbc/0xf0
>>>> Jan 10 19:41:34 laptop kernel: [] ? do_group_exit+0x34/0xb0
>>>> Jan 10 19:41:34 laptop kernel: [] ? SyS_exit_group+0xf/0x10
>>>> Jan 10 19:41:34 laptop kernel: [] ? sysenter_dispatch+0x7/0x1e
>>>> Jan 10 19:41:34 laptop kernel: Code: 00 ad de 48 89 43 18 e8 c5 f9 00 00 48 8b 45 10 48 8d 55
Re: BUG in 3.19.0-rc3+
Hi,

I've done the bisect and the outcome is below, but because I almost always
forget to mention it, I'll say here that I am running a 32-bit user space on a
64-bit kernel.

On 01/10/15 20:17, Chris Clayton wrote:
> Hi,
>
> I'm getting a BUG report from a kernel built from a pull (earlier today) of
> the current development kernel (running git describe gives
> v3.19-rc3-169-geb74926). So that I have usable wireless networking, I have also
> applied the latest seven iwlwifi patches from the wireless-drivers git tree.
> Prior to today's pull, I was not seeing anything unusual in dmesg.
>
> The BUG reported is as follows:
>
> Jan 10 19:41:32 laptop kernel: [ cut here ]
> Jan 10 19:41:32 laptop kernel: kernel BUG at mm/rmap.c:399!
> Jan 10 19:41:32 laptop kernel: invalid opcode: [#1] PREEMPT SMP
> Jan 10 19:41:32 laptop kernel: Modules linked in: rfcomm snd_hda_codec_via iwlmvm coretemp snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel mac80211 hwmon snd_hda_controller x86_pkg_temp_thermal acpi_cpufreq iwlwifi cfg80211 snd_hda_codec snd_hwdep
> Jan 10 19:41:32 laptop kernel: CPU: 1 PID: 353 Comm: fc-cache Not tainted 3.19.0-rc3+ #42
> Jan 10 19:41:32 laptop kernel: Hardware name: Notebook W65_67SZ/W65_67SZ , BIOS 1.03.05 02/26/2014
> Jan 10 19:41:32 laptop kernel: task: 8800da98c5c0 ti: 880408dd4000 task.ti: 880408dd4000
> Jan 10 19:41:32 laptop kernel: RIP: 0010:[] [] unlink_anon_vmas+0x17a/0x200
> Jan 10 19:41:33 laptop kernel: RSP: 0018:880408dd7d88 EFLAGS: 00010286
> Jan 10 19:41:33 laptop kernel: RAX: 88040b79e150 RBX: 88040b79e140 RCX:
> Jan 10 19:41:33 laptop kernel: RDX: 0001 RSI: 880409f04360 RDI: 880409f04320
> Jan 10 19:41:33 laptop kernel: RBP: 88040cb13278 R08: R09: 88040d801c00
> Jan 10 19:41:33 laptop kernel: R10: 88041fa546e0 R11: 88040b79e160 R12: 880409f04320
> Jan 10 19:41:33 laptop kernel: R13: 88040cb13278 R14: 88040cb13288 R15: 88040cb13210
> Jan 10 19:41:33 laptop kernel: FS: () GS:88041fa4() knlGS:
> Jan 10 19:41:33 laptop kernel: CS: 0010 DS: 002b ES: 002b CR0: 80050033
> Jan 10 19:41:33 laptop kernel: CR2: f722c8d4 CR3: 0004082a8000 CR4: 001407e0
> Jan 10 19:41:33 laptop kernel: Stack:
> Jan 10 19:41:33 laptop kernel: 88040d6cfbd8 88040d6cfba0 88040cecd160 88040cb13210
> Jan 10 19:41:33 laptop kernel: 88040cbbb630 f7151000 880408dd7e28
> Jan 10 19:41:33 laptop kernel: 810e3633
> Jan 10 19:41:33 laptop kernel: Call Trace:
> Jan 10 19:41:33 laptop kernel: [] ? free_pgtables+0x83/0xf0
> Jan 10 19:41:34 laptop kernel: [] ? exit_mmap+0xc3/0x150
> Jan 10 19:41:34 laptop kernel: [] ? __do_page_fault+0x17d/0x4b0
> Jan 10 19:41:34 laptop kernel: [] ? mmput+0x21/0xc0
> Jan 10 19:41:34 laptop kernel: [] ? do_exit+0x26d/0xa50
> Jan 10 19:41:34 laptop kernel: [] ? mntput_no_expire+0x9/0x140
> Jan 10 19:41:34 laptop kernel: [] ? task_work_run+0xbc/0xf0
> Jan 10 19:41:34 laptop kernel: [] ? do_group_exit+0x34/0xb0
> Jan 10 19:41:34 laptop kernel: [] ? SyS_exit_group+0xf/0x10
> Jan 10 19:41:34 laptop kernel: [] ? sysenter_dispatch+0x7/0x1e
> Jan 10 19:41:34 laptop kernel: Code: 00 ad de 48 89 43 18 e8 c5 f9 00 00 48 8b 45 10 48 8d 55 10 48 83 e8 10 49 39 d6 74 54 48 8b 7d 08 48 89 eb 8b 57 34 85 d2 74 9e <0f> 0b 0f 1f 40 00 e8 6b fc ff ff eb 9a 66 0f 1f 84 00 00 00 00
> Jan 10 19:41:34 laptop kernel: RIP [] unlink_anon_vmas+0x17a/0x200
> Jan 10 19:41:34 laptop kernel: RSP
> Jan 10 19:41:34 laptop kernel: ---[ end trace 4aa713b2a9aa664b ]---
> Jan 10 19:41:34 laptop kernel: Fixing recursive fault but reboot is needed!
> Jan 10 19:41:34 laptop kernel: nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[snip]
>
> I won't get time tonight, but I can bisect it tomorrow, so this is just a
> heads-up in case the problem (and fix) jumps out at anyone. Before I bisect
> I'll build and run a kernel without the iwlwifi patches.
The bisect ended up at:

7a3ef208e662f4b63d43a23f61a64a129c525bbc is the first bad commit
commit 7a3ef208e662f4b63d43a23f61a64a129c525bbc
Author: Konstantin Khlebnikov
Date:   Thu Jan 8 14:32:15 2015 -0800

    mm: prevent endless growth of anon_vma hierarchy

    Constantly forking task causes unlimited grow of anon_vma chain.
    Each next child allocates new level of anon_
BUG in 3.19.0-rc3+
Hi,

I'm getting a BUG report from a kernel built from a pull (earlier today) of the
current development kernel (running git describe gives v3.19-rc3-169-geb74926).
So that I have usable wireless networking, I have also applied the latest seven
iwlwifi patches from the wireless-drivers git tree. Prior to today's pull, I was
not seeing anything unusual in dmesg.

The BUG reported is as follows:

Jan 10 19:41:32 laptop kernel: [ cut here ]
Jan 10 19:41:32 laptop kernel: kernel BUG at mm/rmap.c:399!
Jan 10 19:41:32 laptop kernel: invalid opcode: [#1] PREEMPT SMP
Jan 10 19:41:32 laptop kernel: Modules linked in: rfcomm snd_hda_codec_via iwlmvm coretemp snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel mac80211 hwmon snd_hda_controller x86_pkg_temp_thermal acpi_cpufreq iwlwifi cfg80211 snd_hda_codec snd_hwdep
Jan 10 19:41:32 laptop kernel: CPU: 1 PID: 353 Comm: fc-cache Not tainted 3.19.0-rc3+ #42
Jan 10 19:41:32 laptop kernel: Hardware name: Notebook W65_67SZ/W65_67SZ , BIOS 1.03.05 02/26/2014
Jan 10 19:41:32 laptop kernel: task: 8800da98c5c0 ti: 880408dd4000 task.ti: 880408dd4000
Jan 10 19:41:32 laptop kernel: RIP: 0010:[] [] unlink_anon_vmas+0x17a/0x200
Jan 10 19:41:33 laptop kernel: RSP: 0018:880408dd7d88 EFLAGS: 00010286
Jan 10 19:41:33 laptop kernel: RAX: 88040b79e150 RBX: 88040b79e140 RCX:
Jan 10 19:41:33 laptop kernel: RDX: 0001 RSI: 880409f04360 RDI: 880409f04320
Jan 10 19:41:33 laptop kernel: RBP: 88040cb13278 R08: R09: 88040d801c00
Jan 10 19:41:33 laptop kernel: R10: 88041fa546e0 R11: 88040b79e160 R12: 880409f04320
Jan 10 19:41:33 laptop kernel: R13: 88040cb13278 R14: 88040cb13288 R15: 88040cb13210
Jan 10 19:41:33 laptop kernel: FS: () GS:88041fa4() knlGS:
Jan 10 19:41:33 laptop kernel: CS: 0010 DS: 002b ES: 002b CR0: 80050033
Jan 10 19:41:33 laptop kernel: CR2: f722c8d4 CR3: 0004082a8000 CR4: 001407e0
Jan 10 19:41:33 laptop kernel: Stack:
Jan 10 19:41:33 laptop kernel: 88040d6cfbd8 88040d6cfba0 88040cecd160 88040cb13210
Jan 10 19:41:33 laptop kernel: 88040cbbb630 f7151000 880408dd7e28
Jan 10 19:41:33 laptop kernel: 810e3633
Jan 10 19:41:33 laptop kernel: Call Trace:
Jan 10 19:41:33 laptop kernel: [] ? free_pgtables+0x83/0xf0
Jan 10 19:41:34 laptop kernel: [] ? exit_mmap+0xc3/0x150
Jan 10 19:41:34 laptop kernel: [] ? __do_page_fault+0x17d/0x4b0
Jan 10 19:41:34 laptop kernel: [] ? mmput+0x21/0xc0
Jan 10 19:41:34 laptop kernel: [] ? do_exit+0x26d/0xa50
Jan 10 19:41:34 laptop kernel: [] ? mntput_no_expire+0x9/0x140
Jan 10 19:41:34 laptop kernel: [] ? task_work_run+0xbc/0xf0
Jan 10 19:41:34 laptop kernel: [] ? do_group_exit+0x34/0xb0
Jan 10 19:41:34 laptop kernel: [] ? SyS_exit_group+0xf/0x10
Jan 10 19:41:34 laptop kernel: [] ? sysenter_dispatch+0x7/0x1e
Jan 10 19:41:34 laptop kernel: Code: 00 ad de 48 89 43 18 e8 c5 f9 00 00 48 8b 45 10 48 8d 55 10 48 83 e8 10 49 39 d6 74 54 48 8b 7d 08 48 89 eb 8b 57 34 85 d2 74 9e <0f> 0b 0f 1f 40 00 e8 6b fc ff ff eb 9a 66 0f 1f 84 00 00 00 00
Jan 10 19:41:34 laptop kernel: RIP [] unlink_anon_vmas+0x17a/0x200
Jan 10 19:41:34 laptop kernel: RSP
Jan 10 19:41:34 laptop kernel: ---[ end trace 4aa713b2a9aa664b ]---
Jan 10 19:41:34 laptop kernel: Fixing recursive fault but reboot is needed!
Jan 10 19:41:34 laptop kernel: nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
Jan 10 19:41:34 laptop kernel: [ cut here ]
Jan 10 19:41:35 laptop kernel: kernel BUG at mm/rmap.c:399!
Jan 10 19:41:35 laptop kernel: invalid opcode: [#2] PREEMPT SMP
Jan 10 19:41:35 laptop kernel: Modules linked in: iptable_filter xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack rfcomm snd_hda_codec_via iwlmvm coretemp snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel mac80211 hwmon snd_hda_controller x86_pkg_temp_thermal acpi_cpufreq iwlwifi cfg80211 snd_hda_codec snd_hwdep
Jan 10 19:41:35 laptop kernel: CPU: 0 PID: 678 Comm: krootimage Tainted: G D 3.19.0-rc3+ #42
Jan 10 19:41:35 laptop kernel: Hardware name: Notebook W65_67SZ/W65_67SZ , BIOS 1.03.05 02/26/2014
Jan 10 19:41:35 laptop kernel: task: 880408de26c0 ti: 880409fcc000 task.ti: 880409fcc000
Jan 10 19:41:35 laptop kernel: RIP: 0010:[] [] unlink_anon_vmas+0x17a/0x200
Jan 10 19:41:35 laptop kernel: RSP: 0018:880409fcfd88 EFLAGS: 00010286
Jan 10 19:41:35 laptop kernel: RAX: 880408370d90 RBX: 880408370d80 RCX:
Jan 10 19:41:35 laptop kernel: RDX: 000
Re: [tip:x86/urgent] x86: Use $(OBJDUMP) instead of plain objdump
Hi,

Is it planned to get this fix through in time for the release of 3.18? It
appears to have been in -next for a week.

Chris

On 11/23/14 20:24, tip-bot for Chris Clayton wrote:
> Commit-ID:  e2e68ae688b0a3766cd75aedf4ed4e39be402009
> Gitweb:     http://git.kernel.org/tip/e2e68ae688b0a3766cd75aedf4ed4e39be402009
> Author:     Chris Clayton
> AuthorDate: Sat, 22 Nov 2014 09:51:10 +
> Committer:  Thomas Gleixner
> CommitDate: Sun, 23 Nov 2014 21:21:53 +0100
>
> x86: Use $(OBJDUMP) instead of plain objdump
>
> commit e6023367d779 'x86, kaslr: Prevent .bss from overlaping initrd'
> broke the cross compile of x86. It added a objdump invocation, which
> invokes the host native objdump and ignores an active cross tool
> chain.
>
> Use $(OBJDUMP) instead which takes the CROSS_COMPILE prefix into
> account.
>
> [ tglx: Massage changelog and use $(OBJDUMP) ]
>
> Fixes: e6023367d779 'x86, kaslr: Prevent .bss from overlaping initrd'
> Signed-off-by: Chris Clayton
> Acked-by: Kees Cook
> Acked-by: Borislav Petkov
> Cc: Junjie Mao
> Cc: Ingo Molnar
> Cc: H. Peter Anvin
> Cc: sta...@vger.kernel.org
> Link: http://lkml.kernel.org/r/54705c8e.1080...@googlemail.com
> Signed-off-by: Thomas Gleixner
> ---
>  arch/x86/boot/compressed/Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index be1e07d..45abc36 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -76,7 +76,7 @@ suffix-$(CONFIG_KERNEL_XZ)	:= xz
>  suffix-$(CONFIG_KERNEL_LZO)	:= lzo
>  suffix-$(CONFIG_KERNEL_LZ4)	:= lz4
>
> -RUN_SIZE = $(shell objdump -h vmlinux | \
> +RUN_SIZE = $(shell $(OBJDUMP) -h vmlinux | \
>  	     perl $(srctree)/arch/x86/tools/calc_run_size.pl)
>  quiet_cmd_mkpiggy = MKPIGGY $@
>        cmd_mkpiggy = $(obj)/mkpiggy $< $(RUN_SIZE) > $@ || ( rm -f $@ ; false )
[tip:x86/urgent] x86: Use $(OBJDUMP) instead of plain objdump
Commit-ID:  e2e68ae688b0a3766cd75aedf4ed4e39be402009
Gitweb:     http://git.kernel.org/tip/e2e68ae688b0a3766cd75aedf4ed4e39be402009
Author:     Chris Clayton
AuthorDate: Sat, 22 Nov 2014 09:51:10 +
Committer:  Thomas Gleixner
CommitDate: Sun, 23 Nov 2014 21:21:53 +0100

x86: Use $(OBJDUMP) instead of plain objdump

commit e6023367d779 'x86, kaslr: Prevent .bss from overlaping initrd'
broke the cross compile of x86. It added a objdump invocation, which
invokes the host native objdump and ignores an active cross tool
chain.

Use $(OBJDUMP) instead which takes the CROSS_COMPILE prefix into
account.

[ tglx: Massage changelog and use $(OBJDUMP) ]

Fixes: e6023367d779 'x86, kaslr: Prevent .bss from overlaping initrd'
Signed-off-by: Chris Clayton
Acked-by: Kees Cook
Acked-by: Borislav Petkov
Cc: Junjie Mao
Cc: Ingo Molnar
Cc: H. Peter Anvin
Cc: sta...@vger.kernel.org
Link: http://lkml.kernel.org/r/54705c8e.1080...@googlemail.com
Signed-off-by: Thomas Gleixner
---
 arch/x86/boot/compressed/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index be1e07d..45abc36 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -76,7 +76,7 @@ suffix-$(CONFIG_KERNEL_XZ)	:= xz
 suffix-$(CONFIG_KERNEL_LZO)	:= lzo
 suffix-$(CONFIG_KERNEL_LZ4)	:= lz4
 
-RUN_SIZE = $(shell objdump -h vmlinux | \
+RUN_SIZE = $(shell $(OBJDUMP) -h vmlinux | \
 	     perl $(srctree)/arch/x86/tools/calc_run_size.pl)
 quiet_cmd_mkpiggy = MKPIGGY $@
       cmd_mkpiggy = $(obj)/mkpiggy $< $(RUN_SIZE) > $@ || ( rm -f $@ ; false )
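Why $(OBJDUMP) fixes the cross build: the top-level kernel Makefile defines OBJDUMP as $(CROSS_COMPILE)objdump, so it names the cross tool when a prefix is set and plain objdump otherwise, whereas the hard-coded objdump always runs the host tool. The sketch below models only that string composition in C; `cross_tool()` is a hypothetical helper and the prefix "x86_64-linux-gnu-" is just an example value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Compose a toolchain command name the way "OBJDUMP = $(CROSS_COMPILE)objdump"
 * does: the (possibly empty or NULL) cross prefix is prepended to the tool's
 * base name.  Writes the result to out and returns 0, or returns -1 if it
 * would not fit in outlen bytes.
 */
int cross_tool(char *out, size_t outlen, const char *prefix, const char *tool)
{
    assert(out != NULL && tool != NULL);
    size_t plen = prefix ? strlen(prefix) : 0;
    size_t tlen = strlen(tool);

    if (plen + tlen + 1 > outlen)        /* +1 for the terminating NUL */
        return -1;
    if (plen > 0)
        memcpy(out, prefix, plen);
    memcpy(out + plen, tool, tlen + 1);  /* also copies the NUL */
    return 0;
}
```

With an empty CROSS_COMPILE the result is plain "objdump", so native builds behave exactly as before; only builds that set a prefix pick up a different binary.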
[PATCH]: cross-compiling x86_64 kernel on i386 user-space fails
Hi,

Commit e6023367d779060fddc9a52d1f474085b2b36298 broke building an x86_64
kernel in an i386 user space. The change added a call to objdump but neglected
to cater for cross-compiling. The patch below fixes the problem for me.

I see the commit is now in 3.14 and 3.17 -stable, so the patch needs to go
there too.

CC: Junjie Mao
CC: Kees Cook
CC: Thomas Gleixner
CC: Ingo Molnar
Signed-off-by: Chris Clayton
---
--- linux/arch/x86/boot/compressed/Makefile~	2014-11-22 08:56:50.359706324 +
+++ linux/arch/x86/boot/compressed/Makefile	2014-11-22 09:04:06.615693435 +
@@ -76,7 +76,7 @@ suffix-$(CONFIG_KERNEL_XZ)	:= xz
 suffix-$(CONFIG_KERNEL_LZO)	:= lzo
 suffix-$(CONFIG_KERNEL_LZ4)	:= lz4
 
-RUN_SIZE = $(shell objdump -h vmlinux | \
+RUN_SIZE = $(shell ${CROSS_COMPILE}objdump -h vmlinux | \
 	     perl $(srctree)/arch/x86/tools/calc_run_size.pl)
 quiet_cmd_mkpiggy = MKPIGGY $@
       cmd_mkpiggy = $(obj)/mkpiggy $< $(RUN_SIZE) > $@ || ( rm -f $@ ; false )
Re: Networking problem with 3.11-rc6+
On 08/20/13 22:54, Francois Romieu wrote:
> Chris Clayton :
> [...]
>> [0.207094] acpi PNP0A08:00: ACPI _OSC support notification failed, disabling PCIe ASPM
>> [0.207155] acpi PNP0A08:00: Unable to request _OSC control (_OSC support mask: 0x08)
> [...]
>> [5.311191] r8169 :07:00.0: can't disable ASPM; OS doesn't have ASPM control
> [...]
>> [ 65.181715] [] rtl8169_interrupt [r8169]
>> [ 65.181717] Disabling IRQ #17
>>
>> Let me know if I can provide any additional information, although to be
>> honest, the boot completed and the KDE desktop started up OK, so there may
>> not be much else I can provide unless I find that the problem is repeatable.
>
> Please don't hesitate if it happens again, even if it can't be reproduced on
> demand.

I've booted the kernel numerous times since the incident reported above, but
haven't encountered the problem again. If it does happen again, are there any
files from /sys or /proc that might help diagnostics?

Chris
Networking problem with 3.11-rc6+
Hello,

I've just booted my laptop and found that networking was broken: pings of other
devices on my home network failed. A reboot has restored networking, but I
thought I should report the problem anyway. I'll have no time tomorrow, but on
Thursday I'll do a few boots to ascertain how repeatable the problem is.

Attached is a complete dmesg, but perhaps the most relevant part is:

[ 65.181557] irq 17: nobody cared (try booting with the "irqpoll" option)
[ 65.181566] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.11.0-rc6+ #124
[ 65.181568] Hardware name: FUJITSU LIFEBOOK AH531/FJNBB0F, BIOS 1.30 05/28/2012
[ 65.181571] 0011 c141bb51 f3154840 c108d581 c14e071c 0011 f3003950 6650
[ 65.181581] f3154840 0011 c108d8f0 1e35395f 0011 c135453b
[ 65.181588] 0011 c108bb0b 0e0e192a 75c91d86 a880c0ad de2115ee
[ 65.181597] Call Trace:
[ 65.181608] [] ? dump_stack+0x48/0x76
[ 65.181614] [] ? __report_bad_irq+0x21/0xc0
[ 65.181619] [] ? note_interrupt+0xf0/0x1a0
[ 65.181624] [] ? cpuidle_enter_state+0x3b/0xc0
[ 65.181632] [] ? handle_irq_event_percpu+0x9b/0x120
[ 65.181641] [] ? __io_apic_modify_irq+0x39/0x40
[ 65.181646] [] ? handle_irq_event+0x29/0x40
[ 65.181650] [] ? unmask_irq+0x20/0x20
[ 65.181654] [] ? handle_fasteoi_irq+0x46/0xd0
[ 65.181656] [] ? do_IRQ+0x31/0xa0
[ 65.181667] [] ? common_interrupt+0x2c/0x31
[ 65.181673] [] ? kmsg_dump_rewind_nolock+0x2b/0x40
[ 65.181678] [] ? cpuidle_enter_state+0x3b/0xc0
[ 65.181682] [] ? cpuidle_idle_call+0x7b/0x110
[ 65.181688] [] ? arch_cpu_idle+0x5/0x20
[ 65.181692] [] ? cpu_startup_entry+0xae/0xf0
[ 65.181697] [] ? start_kernel+0x2a8/0x2ad
[ 65.181702] [] ? repair_env_string+0x4d/0x4d
[ 65.181704] handlers:
[ 65.181715] [] rtl8169_interrupt [r8169]
[ 65.181717] Disabling IRQ #17

Let me know if I can provide any additional information, although to be honest,
the boot completed and the KDE desktop started up OK, so there may not be much
else I can provide unless I find that the problem is repeatable.
Chris [0.00] Initializing cgroup subsys cpu [0.00] Linux version 3.11.0-rc6+ (chris@laptop) (gcc version 4.8.2 20130815 (prerelease) (GCC) ) #124 SMP PREEMPT Tue Aug 20 10:54:09 BST 2013 [0.00] Disabled fast string operations [0.00] e820: BIOS-provided physical RAM map: [0.00] BIOS-e820: [mem 0x-0x0009d7ff] usable [0.00] BIOS-e820: [mem 0x0009d800-0x0009] reserved [0.00] BIOS-e820: [mem 0x000e-0x000f] reserved [0.00] BIOS-e820: [mem 0x0010-0xdab0efff] usable [0.00] BIOS-e820: [mem 0xdab0f000-0xdad4efff] reserved [0.00] BIOS-e820: [mem 0xdad4f000-0xdad6efff] ACPI NVS [0.00] BIOS-e820: [mem 0xdad6f000-0xdaf1efff] reserved [0.00] BIOS-e820: [mem 0xdaf1f000-0xdaf9efff] ACPI NVS [0.00] BIOS-e820: [mem 0xdaf9f000-0xdaffefff] ACPI data [0.00] BIOS-e820: [mem 0xdafff000-0xdaff] usable [0.00] BIOS-e820: [mem 0xdb00-0xdf9f] reserved [0.00] BIOS-e820: [mem 0xf800-0xfbff] reserved [0.00] BIOS-e820: [mem 0xfec0-0xfec00fff] reserved [0.00] BIOS-e820: [mem 0xfed1-0xfed19fff] reserved [0.00] BIOS-e820: [mem 0xfed1c000-0xfed1] reserved [0.00] BIOS-e820: [mem 0xfee0-0xfee00fff] reserved [0.00] BIOS-e820: [mem 0xffd8-0x] reserved [0.00] BIOS-e820: [mem 0x0001-0x00021fdf] usable [0.00] NX (Execute Disable) protection: active [0.00] SMBIOS 2.6 present. 
[0.00] DMI: FUJITSU LIFEBOOK AH531/FJNBB0F, BIOS 1.30 05/28/2012 [0.00] e820: update [mem 0x-0x0fff] usable ==> reserved [0.00] e820: remove [mem 0x000a-0x000f] usable [0.00] e820: last_pfn = 0x21fe00 max_arch_pfn = 0x100 [0.00] MTRR default type: uncachable [0.00] MTRR fixed ranges enabled: [0.00] 0-9 write-back [0.00] A-B uncachable [0.00] C-F write-protect [0.00] MTRR variable ranges enabled: [0.00] 0 base 0FFC0 mask FFFC0 write-protect [0.00] 1 base 0 mask F8000 write-back [0.00] 2 base 08000 mask FC000 write-back [0.00] 3 base 0C000 mask FE000 write-back [0.00] 4 base 0DC00 mask FFC00 uncachable [0.00] 5 base 0DB00 mask FFF00 uncachable [0.00] 6 base 1 mask F write-back [0.00] 7 base 2 mask FE000 write-back [0.00] 8 base 21FE0 mask FFFE0 unc
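A "nobody cared" report means the kernel's spurious-interrupt detection saw interrupts on IRQ 17 that no registered handler claimed, so it disabled the line. In this log the only handler listed is rtl8169_interrupt. If it recurs, one cheap thing to capture is /proc/interrupts, which shows which handlers share IRQ 17 and how many interrupts each CPU has taken on it. That is a suggestion of mine, not something the thread settled on. Below is a minimal Python sketch of pulling one IRQ's per-CPU counts out of that file. `irq_counts` is a hypothetical helper, and `SAMPLE` is made-up text in the /proc/interrupts layout, not data from this laptop.

```python
def irq_counts(proc_interrupts, irq):
    """Return the per-CPU counts for one IRQ line, or None if it is absent.

    `proc_interrupts` is the text of /proc/interrupts: a header naming the
    CPUs, then one row per IRQ ("17:  <count per CPU>  <chip>  <handlers>").
    """
    lines = proc_interrupts.splitlines()
    ncpus = len(lines[0].split())  # header row is "CPU0 CPU1 ..."
    for row in lines[1:]:
        label, _, rest = row.partition(":")
        if label.strip() == str(irq):
            # The first `ncpus` fields after the colon are the counters.
            return [int(field) for field in rest.split()[:ncpus]]
    return None


# Illustrative sample only; not taken from the reporter's machine.
SAMPLE = """\
           CPU0       CPU1
  0:         16          0   IO-APIC-edge      timer
 17:     100000          3   IO-APIC-fasteoi   r8169
"""
```

On a live system, pass it the result of `open("/proc/interrupts").read()` instead of `SAMPLE`.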
Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner
Hello again, Bjorn. On 04/01/13 18:28, Bjorn Helgaas wrote: Hi Chris, The current Linux acpiphp driver doesn't do anything unless it finds devices with _EJ0 or _RMV methods, and your DSDT has neither. But I think that implementation is incorrect because I'm not convinced that those methods are required in order to do hotplug via ACPI. For example, your DSDT *does* have an _L01 method that does notifications to the root ports. I suspect that hotplug events on your box generate an SCI that invokes that method. Linux basically ignores the resulting notify events, and I suspect that hotplug works on Windows because it is paying attention to them. Can you build a kernel with CONFIG_ACPI_DEBUG=y, do the following, and attach all the output to the bugzilla? 1) Boot with empty ExpressCard slot (without using "pcie_ports=native") 2) # echo 0x00010004 > /sys/module/acpi/parameters/debug_layer 3) # echo 0x0804 > /sys/module/acpi/parameters/debug_level 4) # lspci -vv 5) # setpci -s 1c.3 0x42.w 6) # setpci -s 1c.3 0x5a.w 7) # setpci -s 1c.3 0xd8.l 8) Insert ExpressCard 9) # setpci -s 1c.3 0x5a.w 10) # dmesg Here's what I think we'll see: - Slot Implemented (bit 8 of XCAP at 0x42) set, indicating a slot is implemented below this root port - Hot Plug SCI Enable (bit 30 of MPC at 0xd8) set, indicating that the root port should generate an SCI whenever a hotplug event is detected - Presence Detect State (bit 6 of SLTSTS at 0x5a) change from 0 with the slot empty to 1 with the slot occupied - pciehp doing nothing (since _OSC didn't grant the OS permission to use PCIe native hotplug) - dmesg indication of the SCI, leading to a Bus Check notification to \_SB.PCI0.RP04, which is the 1c.3 root port leading to the ExpressCard slot As far as I can see, your predicted outcomes were correct. I've added the logs to the bugzilla report. Yes, it behaved exactly as I expected. I attached a few more details of the analysis to the bug report (https://bugzilla.kernel.org/show_bug.cgi?id=54981).
I think we need to rework acpiphp to fix this. We will fix it, but I don't know who will do it, or when it will happen. My list is growing faster than I can deal with :( I see that there has been quite a bit of work in the acpiphp area recently. Is any of it intended to fix (or ease the subsequent fixing of) this bug report, please? It's not a big deal if it isn't - I do have a system that, via kernel command line arguments, recognises expresscard devices when I plug them into the slot. But when the release comes along that includes the reworking of acpiphp that you mentioned, I will give extra attention to hotplugging when I try out the -rc kernels on my laptop. Thanks Thanks very much for your excellent data collection! It's my pleasure! Bjorn
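Each prediction about the registers reduces to testing one bit in a value printed by setpci, which prints hex without a `0x` prefix. Below is a minimal Python sketch of that decoding. It uses only the offsets and bit numbers Bjorn gave for root port 1c.3. `decode_root_port` is a hypothetical helper. The XCAP value `0142` is read from bytes 0x42-0x43 (`42 01`, little-endian) of the lspci dumps quoted in this archive; the MPC and SLTSTS inputs are made up.

```python
# Bit positions as given in Bjorn's message; offsets apply to root port 1c.3.
SLOT_IMPLEMENTED = 1 << 8     # XCAP (PCIe Capabilities) at 0x42, read with 0x42.w
HOTPLUG_SCI_ENABLE = 1 << 30  # MPC at 0xd8, read with 0xd8.l
PRESENCE_DETECT = 1 << 6      # SLTSTS (Slot Status) at 0x5a, read with 0x5a.w


def decode_root_port(xcap_hex, mpc_hex, sltsts_hex):
    """Decode raw setpci output (hex strings, no 0x prefix) into named flags."""
    xcap, mpc, sltsts = (int(v, 16) for v in (xcap_hex, mpc_hex, sltsts_hex))
    return {
        "slot_implemented": bool(xcap & SLOT_IMPLEMENTED),
        "hotplug_sci_enable": bool(mpc & HOTPLUG_SCI_ENABLE),
        "presence_detect": bool(sltsts & PRESENCE_DETECT),
    }
```

Running it on the step 6 and step 9 readings should show `presence_detect` flip from False to True if the prediction holds.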
Re: linux-3.9.0-rc1+: Output from "make kernelrelease" contains incorrect data
On 04/03/13 15:32, Michal Marek wrote: On 1.4.2013 11:28, Chris Clayton wrote: Ping! This is still happening with 3.9-rc5. [chris:~/kernel/linux]$ make bzImage ... Kernel: arch/x86/boot/bzImage is ready (#14) [chris:~/kernel/linux]$ make kernelrelease scripts/kconfig/conf --silentoldconfig Kconfig 3.9.0-rc5 [chris:~/kernel/linux]$ make kernelrelease 3.9.0-rc5 You need to run make -s kernelrelease. Ah, right. I didn't see that announcement. The -s argument was not necessary with earlier releases. Sorry for the noise. Chris Michal
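The fix is make's `-s` (silent) flag. It suppresses the echoed `scripts/kconfig/conf ...` command that otherwise appears ahead of the release string. A script that consumes the output can also guard against this kind of surprise. Below is a minimal Python sketch. Both function names are hypothetical, and treating anything other than exactly one non-empty line as an error is my own assumption about what a caller wants.

```python
import subprocess


def parse_kernelrelease(output):
    """Return the release string, insisting on exactly one non-empty line."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if len(lines) != 1:
        raise ValueError(f"expected a single release line, got {lines!r}")
    return lines[0]


def kernelrelease(tree="."):
    """Run `make -s kernelrelease` in a kernel source tree and parse the result."""
    result = subprocess.run(
        ["make", "-s", "kernelrelease"],
        cwd=tree, capture_output=True, text=True, check=True,
    )
    return parse_kernelrelease(result.stdout)
```

Fed the first, noisy output from the report (without `-s`), `parse_kernelrelease` raises instead of silently returning the kconfig command line.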
Re: linux-3.9.0-rc1+: Output from "make kernelrelease" contains incorrect data
Ping! This is still happening with 3.9-rc5. [chris:~/kernel/linux]$ make bzImage ... Kernel: arch/x86/boot/bzImage is ready (#14) [chris:~/kernel/linux]$ make kernelrelease scripts/kconfig/conf --silentoldconfig Kconfig 3.9.0-rc5 [chris:~/kernel/linux]$ make kernelrelease 3.9.0-rc5 I'm assuming that the output from 'make kernelrelease' should be the string that represents the kernel release and only that string. Chris On 03/09/13 19:38, Chris Clayton wrote: In Linus' current tree, the first time the command "make kernelrelease" is run after building a kernel, the output contains some unwanted text. Subsequent uses of the command produce the expected output. This appears to be a regression - 3.8.2 does not have this problem. This is easily demonstrated from the command line by the following: ... System is 2311 kB CRC a4e38b86 Kernel: arch/x86/boot/bzImage is ready (#186) $ make kernelrelease scripts/kconfig/conf --silentoldconfig Kconfig 3.9.0-rc1+ $ make kernelrelease 3.9.0-rc1+ Happy to test the fix. Chris
Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner
Hi Bjorn, Sorry, I meant to reply to your mail so that the other recipients would know the outcome of the things you asked me to do. But then I forgot. Doh! You should have received the attachment notification from bugzilla, I think. On 03/15/13 22:48, Bjorn Helgaas wrote: On Tue, Mar 12, 2013 at 4:20 PM, Bjorn Helgaas wrote: On Sat, Mar 9, 2013 at 2:20 AM, Chris Clayton wrote: On 03/08/13 22:57, Bjorn Helgaas wrote: Thanks. I opened this bug report: https://bugzilla.kernel.org/show_bug.cgi?id=54981 to keep track of your logs. Hi Chris, The current Linux acpiphp driver doesn't do anything unless it finds devices with _EJ0 or _RMV methods, and your DSDT has neither. But I think that implementation is incorrect because I'm not convinced that those methods are required in order to do hotplug via ACPI. For example, your DSDT *does* have an _L01 method that does notifications to the root ports. I suspect that hotplug events on your box generate an SCI that invokes that method. Linux basically ignores the resulting notify events, and I suspect that hotplug works on Windows because it is paying attention to them. Can you build a kernel with CONFIG_ACPI_DEBUG=y, do the following, and attach all the output to the bugzilla?
1) Boot with empty ExpressCard slot (without using "pcie_ports=native") 2) # echo 0x00010004 > /sys/module/acpi/parameters/debug_layer 3) # echo 0x0804 > /sys/module/acpi/parameters/debug_level 4) # lspci -vv 5) # setpci -s 1c.3 0x42.w 6) # setpci -s 1c.3 0x5a.w 7) # setpci -s 1c.3 0xd8.l 8) Insert ExpressCard 9) # setpci -s 1c.3 0x5a.w 10) # dmesg Here's what I think we'll see: - Slot Implemented (bit 8 of XCAP at 0x42) set, indicating a slot is implemented below this root port - Hot Plug SCI Enable (bit 30 of MPC at 0xd8) set, indicating that the root port should generate an SCI whenever a hotplug event is detected - Presence Detect State (bit 6 of SLTSTS at 0x5a) change from 0 with the slot empty to 1 with the slot occupied - pciehp doing nothing (since _OSC didn't grant the OS permission to use PCIe native hotplug) - dmesg indication of the SCI, leading to a Bus Check notification to \_SB.PCI0.RP04, which is the 1c.3 root port leading to the ExpressCard slot As far as I can see, your predicted outcomes were correct. I've added the logs to the bugzilla report. A Bus Check notification means we're supposed to re-enumerate starting at that device. If we *did* re-enumerate, we would find the new ExpressCard. Thanks, Bjorn Thanks, Chris
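For reference, the ACPI specification defines the notification values a handler on the root port would receive, and Bus Check is 0x00. Below is a minimal Python sketch of classifying them. The table covers only the first four standard device notifications. `should_rescan` is a hypothetical, simplified policy, not a description of what acpiphp actually does.

```python
# First four standard device notification values from the ACPI specification.
ACPI_NOTIFY = {
    0x00: "Bus Check",      # re-enumerate the device tree from this point down
    0x01: "Device Check",   # this device has appeared or disappeared
    0x02: "Device Wake",
    0x03: "Eject Request",
}


def describe(value):
    """Name a notification value, falling back to its hex form."""
    return ACPI_NOTIFY.get(value, f"unknown (0x{value:02x})")


def should_rescan(value):
    """Hypothetical policy: Bus Check and Device Check both call for re-enumeration."""
    return value in (0x00, 0x01)
```

Under this sketch, the Bus Check Bjorn expects on \_SB.PCI0.RP04 would trigger a rescan below 1c.3, which is where the new ExpressCard sits.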
linux-3.9.0-rc1+: Output from "make kernelrelease" contains incorrect data
In Linus' current tree, the first time the command "make kernelrelease" is run after building a kernel, the output contains some unwanted text. Subsequent uses of the command produce the expected output. This appears to be a regression - 3.8.2 does not have this problem. This is easily demonstrated from the command line by the following: ... System is 2311 kB CRC a4e38b86 Kernel: arch/x86/boot/bzImage is ready (#186) $ make kernelrelease scripts/kconfig/conf --silentoldconfig Kconfig 3.9.0-rc1+ $ make kernelrelease 3.9.0-rc1+ Happy to test the fix. Chris
Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner
Hi Bjorn, On 03/08/13 22:57, Bjorn Helgaas wrote: [+cc Rafael, in case the _OSC thing rings a bell with him] On Fri, Mar 8, 2013 at 3:44 AM, Chris Clayton wrote: On 03/08/13 00:39, Bjorn Helgaas wrote: On Thu, Mar 7, 2013 at 1:21 PM, Chris Clayton wrote: On 03/07/13 17:30, Bjorn Helgaas wrote: 3) HVR-1400 not being recognized when inserted. This is a PCI hotplug issue, and I *can* help with this. I don't know what your hardware is, but in general, pciehp should take care of this. If it doesn't, or if you have to use an argument like "pcie_ports=native", we should fix this. So 3) is the thing I might be able to help with. If there's still a problem here (and even having to boot with an argument is a problem), let's start by collecting complete dmesg logs, with and without your "pcie_ports" option. Boot without the card installed, then insert it and remove it. If you can use something like v3.9-rc1 with CONFIG_HOTPLUG_PCI_PCIE=y, that would be ideal. OK, I've gathered these logs using a kernel built from a pull of Linus' tree this afternoon (v3.9-rc1-108-g9f22578). Also, the cx23885 driver is still blacklisted to avoid unnecessary noise and the chance of an oops if the card springs out again when I insert it. The driver does load if it's not blacklisted (and the pcie_ports=native option is present). The two logs are attached. As you will see, nothing at all happens when the pcie_ports=native option is absent. The nf_conntrack message is normally the last one from a normal boot. Perfect, thanks! It looks like something's going wrong when we evaluate _OSC. Can you collect an acpidump from your machine? A bzipped file containing the output from acpidump is attached. Thanks. I opened this bug report: https://bugzilla.kernel.org/show_bug.cgi?id=54981 to keep track of your logs. Excellent, thanks. Here's your _OSC method from the acpidump: Method (_OSC, 4, Serialized) { ... If (LAnd (LEqual (Arg0, GUID), NEXP)) ...
# normal case Else { Or (CDW1, 0x04, CDW1) # "unrecognized UUID" error Return (Local0) } It fails with "unrecognized UUID" if either (1) we supply the wrong UUID or (2) "NEXP" is false. I have no idea what NEXP is; your DSDT.dsl never sets it, so maybe it's related to a BIOS setup option or something? I found a BIOS manual [1] but didn't see anything likely. I guess it might be worth you looking, or maybe trying a "reset to defaults" if it's not too destructive for you. As the manual showed, there aren't too many user-changeable settings in the BIOS on this machine, so I tried a "reset to defaults". Unfortunately, it made no difference. You don't have a copy of Windows on that box, do you? I *assume* hotplug would work fine with Windows and maybe we could figure out what it is doing differently. Yes, I have Windows 7 Home Premium (64-bit), although, as I said previously, I rarely boot it. One occasion when I usually do is when I buy new hardware. The hotplug works fine in Windows with the two expresscards that I own. I'm happy to help work out what's different on Windows, but I have no diagnostic tools installed, so I might need some hand-holding. One immediate difference is that my Windows installation is 64-bit whereas my Linux installation is 32-bit. Thanks for your help so far, Chris Bjorn [1] http://solutions.us.fujitsu.com/www/content/pdf/SupportGuides/AH530_BIOS_Guide_FPC58-2843-01_rA.pdf
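_OSC reports its status as a bit field in the first capabilities DWORD (CDW1), so the 0x04 that the Else branch ORs in is bit 2. Below is a minimal Python sketch that decodes the CDW1 status bits defined by the ACPI specification and mirrors the condition in the excerpt above. `osc_cdw1` is a hypothetical model of this one DSDT. The normal-case body, which is elided in the excerpt, is not modelled.

```python
# CDW1 status bits returned by _OSC, per the ACPI specification.
OSC_STATUS_BITS = {
    1: "_OSC failure",
    2: "Unrecognized UUID",
    3: "Unrecognized Revision",
    4: "Capabilities Masked",
}


def decode_osc_status(cdw1):
    """List the status flags set in a returned CDW1 value, lowest bit first."""
    return [name for bit, name in sorted(OSC_STATUS_BITS.items()) if cdw1 & (1 << bit)]


def osc_cdw1(uuid_matches, nexp, cdw1=0):
    """Model of the excerpted method: both conditions must hold for the normal path."""
    if uuid_matches and nexp:
        return cdw1  # normal case (elided in the DSDT excerpt; not modelled here)
    return cdw1 | 0x04  # Or (CDW1, 0x04, CDW1): "unrecognized UUID"
```

The model shows why a correct UUID is not enough: with NEXP false, `osc_cdw1(True, False)` still reports "Unrecognized UUID".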
kernel 3.6.7: sysfs: cannot create duplicate filename warnings
Hi, I've been investigating a problem with my Hauppauge WinTV HVR-1400 TV card (an expresscard) [1], and I've come across warnings produced when I unload and then reload the cx23885 driver. Attached is the output from dmesg that shows the card being detected (by the PCI express hotplug driver) when it is inserted, the drivers being loaded and unloaded and, finally, the driver being reloaded and producing the warnings. I get the same warning with 3.8.0-rc6+ (freshly pulled this morning). I've searched Google and found thousands of hits going back over two years, but none of the first 20 pages of hits provided a solution. Please let me know if I can provide any additional diagnostics, but cc me on any reply as I'm not subscribed. Chris [1] http://www.spinics.net/lists/linux-media/msg59468.html Plug in the hvr1400 [ 165.300637] pciehp :00:1c.3:pcie04: Card present on Slot(3) [ 165.401471] pci :02:00.0: [14f1:8852] type 00 class 0x04 [ 165.401528] pci :02:00.0: reg 10: [mem 0x-0x001f 64bit] [ 165.401678] pci :02:00.0: supports D1 D2 [ 165.401679] pci :02:00.0: PME# supported from D0 D1 D2 D3hot [ 165.409465] pci :02:00.0: BAR 0: assigned [mem 0xf0e0-0xf0ff 64bit] [ 165.409501] pcieport :00:1c.3: PCI bridge to [bus 02-06] [ 165.409505] pcieport :00:1c.3: bridge window [io 0x3000-0x3fff] [ 165.409510] pcieport :00:1c.3: bridge window [mem 0xf0d0-0xf14f] [ 165.409514] pcieport :00:1c.3: bridge window [mem 0xf040-0xf0bf 64bit pref] [ 165.409530] pci :02:00.0: no hotplug settings from platform Load the drivers [ 211.641874] cx23885 driver version 0.0.3 loaded [ 211.641897] cx23885 :02:00.0: enabling device ( -> 0002) [ 211.642000] CORE cx23885[0]: subsystem: 0070:8010, board: Hauppauge WinTV-HVR1400 [card=9,autodetected] [ 211.781181] cx23885[0] - extra delay being applied for HVR1400 [ 211.809194] tveeprom 7-0050: Hauppauge model 80019, rev B2F1, serial# 3758890 [ 211.809197] tveeprom 7-0050: MAC address is 00:0d:fe:39:5b:2a [ 211.809198] tveeprom 7-0050: tuner model is
Xceive XC3028L (idx 151, type 4) [ 211.809200] tveeprom 7-0050: TV standards PAL(B/G) PAL(I) SECAM(L/L') PAL(D/D1/K) ATSC/DVB Digital (eeprom 0xf4) [ 211.809201] tveeprom 7-0050: audio processor is CX23885 (idx 39) [ 211.809202] tveeprom 7-0050: decoder processor is CX23885 (idx 33) [ 211.809203] tveeprom 7-0050: has radio [ 211.809204] cx23885[0]: hauppauge eeprom: model=80019 [ 211.809205] cx23885_dvb_register() allocating 1 frontend(s) [ 211.809207] cx23885[0]: cx23885 based dvb card [ 211.915346] xc2028 8-0064: creating new instance [ 211.915349] xc2028 8-0064: type set to XCeive xc2028/xc3028 tuner [ 211.915353] DVB: registering new adapter (cx23885[0]) [ 211.915357] cx23885 :02:00.0: DVB: registering adapter 0 frontend 0 (DiBcom 7000PC)... [ 211.915576] cx23885_dev_checkrevision() Hardware revision = 0xb0 [ 211.915582] cx23885[0]/0: found at :02:00.0, rev: 2, irq: 19, latency: 0, mmio: 0xf0e0 [ 211.977653] xc2028 8-0064: Loading 81 firmware images from xc3028L-v36.fw, type: xc2028 firmware, ver 3.6 Unload the drivers [ 228.016560] xc2028 8-0064: destroying instance Reload the drivers [ 240.384907] cx23885 driver version 0.0.3 loaded [ 240.385099] CORE cx23885[0]: subsystem: 0070:8010, board: Hauppauge WinTV-HVR1400 [card=9,autodetected] [ 240.524290] cx23885[0] - extra delay being applied for HVR1400 [ 240.552265] tveeprom 7-0050: Hauppauge model 80019, rev B2F1, serial# 3758890 [ 240.552267] tveeprom 7-0050: MAC address is 00:0d:fe:39:5b:2a [ 240.552268] tveeprom 7-0050: tuner model is Xceive XC3028L (idx 151, type 4) [ 240.552269] tveeprom 7-0050: TV standards PAL(B/G) PAL(I) SECAM(L/L') PAL(D/D1/K) ATSC/DVB Digital (eeprom 0xf4) [ 240.552270] tveeprom 7-0050: audio processor is CX23885 (idx 39) [ 240.552271] tveeprom 7-0050: decoder processor is CX23885 (idx 33) [ 240.552272] tveeprom 7-0050: has radio [ 240.552273] cx23885[0]: hauppauge eeprom: model=80019 [ 240.552275] cx23885_dvb_register() allocating 1 frontend(s) [ 240.552277] cx23885[0]: cx23885 
based dvb card [ 240.626253] xc2028 8-0064: creating new instance [ 240.626255] xc2028 8-0064: type set to XCeive xc2028/xc3028 tuner [ 240.626258] DVB: registering new adapter (cx23885[0]) [ 240.626263] cx23885 :02:00.0: DVB: registering adapter 0 frontend 0 (DiBcom 7000PC)... [ 240.626316] xc2028 8-0064: Loading 81 firmware images from xc3028L-v36.fw, type: xc2028 firmware, ver 3.6 [ 240.626366] [ cut here ] [ 240.626371] WARNING: at fs/sysfs/dir.c:536 sysfs_add_one+0xae/0xe0() [ 240.626372] Hardware name: LIFEBOOK AH531 [ 240.626373] sysfs: cannot create duplicate filename '/devices/pci:00/:00:1c.3/:02:00.0/dvb/dvb0.frontend0' [ 240.626374] Modules linked in: cx23885(+) tveeprom btcx_risc videobuf_dvb cx2341x videobuf_dma_sg videobuf_core dib7000p d
Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner
Hi Martin, On 01/28/13 21:02, Martin Mokrejs wrote: Hi Chris, Chris Clayton wrote: Hi Martin, On 01/28/13 12:12, Martin Mokrejs wrote: Chris Clayton wrote: [snip] [chris:~]$ cat /proc/cmdline root=/dev/sda5 pciehp_ports=native ro resume=/dev/sda6 ^^ **typo** I've run the test again with pcie_ports=native and the directories now get populated. Even better though, is that when I plug in the card, hotplug **works** and the card's drivers are loaded. BTW, with acpiphp on 3.7.4 I have: ls -la /sys/bus/pci_express/devices total 0 drwxr-xr-x 2 root root 0 Jan 28 13:07 . drwxr-xr-x 4 root root 0 Jan 28 13:07 .. $ ls -la /sys/bus/pci/devices/slots **typo** It should be /sys/bus/pci/slots. ls: cannot access /sys/bus/pci/devices/slots: No such file or directory $ With acpiphp, I get /sys/bus/pci_express/devices populated but /sys/bus/pci/slots is empty. OK, I hadn't noticed the typo, but I have here with acpiphp: # ls -laR /sys/bus/pci/slots /sys/bus/pci/slots: total 0 drwxr-xr-x 3 root root 0 Jan 27 17:14 . drwxr-xr-x 5 root root 0 Jan 25 15:56 .. drwxr-xr-x 2 root root 0 Jan 27 17:14 1 /sys/bus/pci/slots/1: total 0 drwxr-xr-x 2 root root 0 Jan 27 17:14 . drwxr-xr-x 3 root root 0 Jan 27 17:14 .. -r--r--r-- 1 root root 4096 Jan 28 21:31 adapter -r--r--r-- 1 root root 4096 Jan 27 17:14 address -rw-r--r-- 1 root root 4096 Jan 28 21:31 attention -r--r--r-- 1 root root 4096 Jan 28 21:31 cur_bus_speed -r--r--r-- 1 root root 4096 Jan 28 21:31 latch -r--r--r-- 1 root root 4096 Jan 28 21:31 max_bus_speed lrwxrwxrwx 1 root root 0 Jan 28 21:31 module -> ../../../../module/acpiphp -rw-r--r-- 1 root root 4096 Jan 28 21:31 power # And for me hotplug also works (as far as I can tell). ;-) Excellent! Thank you so much for your help (and patience) Martin and Yijing. Now to solving why running scandvb doesn't find any TV channels. It would be good if you could re-do the PresDet checks and confirm whether it is also broken for you under pciehp. I've struggled with this a little.
For some reason, the expresscard doesn't always stay properly inserted in the slot when I insert it. Now that hotplug is working, the modules are being loaded and when the card pops out again, I get an oops because, of course, the driver is running and the card disappears. Perhaps the driver can be made a bit more robust to sudden disappearance of the card. I'll report the oops later. Yes, I had, or maybe still have, the same issues here. I used to get an Oops with a sata_sil24 card and weird behavior with a USB3.0 NEC-based card. It was always fine with a VIA-based firewire card and a serial PL2303-based one. I found out it is better if a usb device is connected to the USB card, because if that slips out then the libata layer quickly realizes that. If there was no device connected, the usb waits too long before it removes the usb hub from the system. And if you plug the card back into the slot in the meantime, weird things happen. My usb3 expresscard device has arrived and I get an oops with that too, if I remove it without unloading the driver first. I guess it shouldn't be a surprise that the driver isn't expecting the device to disappear. As I mentioned, I have some trouble with the WinTV-HVR-1400 card, which sometimes pops out again if I push it into the slot too hard (but I'm getting better at that with practice). So what I've done (with the usb3 card too) to avoid the oopsen is blacklist the drivers in /etc/modprobe.d/blacklist.conf and then load them when I'm sure the card is properly inserted. Not exactly hotplug, but at least I don't have to reboot because of an oops, and it's not something I'm doing several times an hour. Chris Anyway, to run these tests I built a kernel without the dvb card's drivers, effectively simulating the situation I had before Yijing got hotplug working for me.
The card popping out may also have affected these diffs a bit because, for example, the first one has the CorrErr flag changed, possibly because I had to have two or more goes at getting the card to lock in the slot. Yesterday that diff showed no changes. Anyway, here are the diffs: diff lspci.after_insertion.txt lspci.after_1st_re-insertion.txt 262c262 < DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend- --- DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend- 295c295 < 40: 10 80 42 01 00 80 00 00 00 00 10 00 12 3c 12 04 --- 40: 10 80 42 01 00 80 00 00 00 00 11 00 12 3c 12 04 diff lspci.after_1st_removal.txt lspci.after_2nd_removal.txt BTW, with the NEC-based card only after every second removal of the card I got into PresDet- state. So, on every other diff attempt you won't see a differenc
Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner
Hi Martin, On 01/28/13 12:12, Martin Mokrejs wrote: Chris Clayton wrote: [snip] [chris:~]$ cat /proc/cmdline root=/dev/sda5 pciehp_ports=native ro resume=/dev/sda6 ^^ **typo** I've run the test again with pcie_ports=native and the directories now get populated. Even better though, is that when I plug in the card, hotplug **works** and the card's drivers are loaded. BTW, with acpiphp on 3.7.4 I have: ls -la /sys/bus/pci_express/devices total 0 drwxr-xr-x 2 root root 0 Jan 28 13:07 . drwxr-xr-x 4 root root 0 Jan 28 13:07 .. $ ls -la /sys/bus/pci/devices/slots **typo** It should be /sys/bus/pci/slots. ls: cannot access /sys/bus/pci/devices/slots: No such file or directory $ With acpiphp, I get /sys/bus/pci_express/devices populated but /sys/bus/pci/slots is empty. And for me hotplug also works (as far as I can tell). ;-) Excellent! Thank you so much for your help (and patience) Martin and Yijing. Now to solving why running scandvb doesn't find any TV channels. It would be good if you could re-do the PresDet checks and confirm whether it is also broken for you under pciehp. I've struggled with this a little. For some reason, the expresscard doesn't always stay properly inserted in the slot when I insert it. Now that hotplug is working, the modules are being loaded and when the card pops out again, I get an oops because, of course, the driver is running and the card disappears. Perhaps the driver can be made a bit more robust to sudden disappearance of the card. I'll report the oops later. Anyway, to run these tests I built a kernel without the dvb card's drivers, effectively simulating the situation I had before Yijing got hotplug working for me. The card popping out may also have affected these diffs a bit because, for example, the first one has the CorrErr flag changed, possibly because I had to have two or more goes at getting the card to lock in the slot. Yesterday that diff showed no changes.
Anyway, here are the diffs: diff lspci.after_insertion.txt lspci.after_1st_re-insertion.txt 262c262 < DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend- --- > DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend- 295c295 < 40: 10 80 42 01 00 80 00 00 00 00 10 00 12 3c 12 04 --- > 40: 10 80 42 01 00 80 00 00 00 00 11 00 12 3c 12 04 diff lspci.after_1st_removal.txt lspci.after_2nd_removal.txt = diff lspci.before_insertion.txt lspci.after_1st_removal.txt 112c112 < 60: 20 20 ff 07 00 00 00 00 01 00 00 00 00 00 08 c0 --- > 60: 20 20 ff 07 00 00 00 00 01 00 00 00 00 00 00 c0 262,263c262,263 < DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend- < LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <1us, L1 <16us --- > DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend- > LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <16us 265c265 < LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk- --- > LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+ 267c267 < LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- --- > LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt- 273c273 < Changed: MRL- PresDet- LinkState- --- > Changed: MRL- PresDet- LinkState+ 295,296c295,296 < 40: 10 80 42 01 00 80 00 00 00 00 10 00 12 4c 12 04 < 50: 03 00 01 10 60 b2 1c 00 28 00 00 00 00 00 00 00 --- > 40: 10 80 42 01 00 80 00 00 00 00 11 00 12 3c 12 04 > 50: 40 00 11 50 60 b2 1c 00 28 00 00 01 00 00 00 00 === diff lspci.after_1st_removal.txt lspci.after_2nd_removal.txt diff lspci.before_insertion.txt lspci.after_insertion.txt 112c112 < 60: 20 20 ff 07 00 00 00 00 01 00 00 00 00 00 08 c0 --- > 60: 20 20 ff 07 00 00 00 00 01 00 00 00 00 00 00 c0 263c263 < LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <1us, L1 <16us --- > LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 
<16us 265c265 < LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk- --- > LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+ 267c267 < LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- --- >
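Comparing dumps like these by eye is error-prone, because the interesting change is often a single `+`/`-` suffix. Below is a minimal Python sketch that pulls the `Name+`/`Name-` flags out of one `lspci -vv` line and reports which ones differ between two snapshots. The helper names are hypothetical. The sketch deliberately ignores non-flag fields such as Speed or Width.

```python
import re

# A flag is a word immediately followed by "+" or "-" and then whitespace or end of line.
FLAG = re.compile(r"(\w+)([+-])(?=\s|$)")


def parse_flags(line):
    """Parse e.g. 'Changed: MRL- PresDet- LinkState+' into {name: bool}."""
    _, _, body = line.partition(":")  # drop the 'DevSta:' / 'LnkSta:' label
    return {name: sign == "+" for name, sign in FLAG.findall(body)}


def changed_flags(before, after):
    """Return {name: (old, new)} for flags whose state differs between two lines."""
    old, new = parse_flags(before), parse_flags(after)
    return {name: (old.get(name), state) for name, state in new.items()
            if old.get(name) != state}
```

Applied to the DevSta lines in the first diff above, it reports only `{"CorrErr": (False, True)}`.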
Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner
[snip] [chris:~]$ cat /proc/cmdline root=/dev/sda5 pciehp_ports=native ro resume=/dev/sda6 ^^ **typo** I've run the test again with pcie_ports=native and the directories now get populated. Even better though, is that when I plug in the card, hotplug **works** and the card's drivers are loaded. Excellent! Thank you so much for your help (and patience) Martin and Yijing. Now to solving why running scandvb doesn't find any TV channels. Chris [snip]
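The whole detour came from one misspelled parameter. `pciehp_ports=native` is not the `pcie_ports=native` option, so the override never took effect. A quick look at /proc/cmdline catches this kind of slip. Below is a minimal Python sketch. `kernel_param` is a hypothetical helper, and its plain whitespace split does not handle quoted values that contain spaces.

```python
def kernel_param(cmdline, name):
    """Return the value of `name` on a kernel command line.

    Returns "" for a bare flag such as "ro", or None if `name` is absent.
    """
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == name:
            return value if sep else ""
    return None
```

On the running system, call it as `kernel_param(open("/proc/cmdline").read(), "pcie_ports")`. With the command line quoted above it returns None, which exposes the typo.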
Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner
[no one screamed, so linux-media ml dropped] Hi Martin, On 01/28/13 10:56, Martin Mokrejs wrote: Chris Clayton wrote: Hi Yijing, On 01/28/13 02:40, Yijing Wang wrote: Hi Chris, Sorry for the delayed reply. It seems like my reply last night was missed. From the sysinfo you provide, there are no pcie port devices under /sys/bus/pci_express/devices. Maybe because there are some problems with _OSC in your laptop, so pcie port driver won't create pcie port device for hotplug, aer and so on. Maybe you can add boot parameter "pcie_ports=native" and reboot your laptop. Then use #modprobe pciehp pciehp_force=1 pciehp_debug=1 to load pciehp modules. After above actions, enter /sys/bus/pci_express/devices/ directory and /sys/bus/pci/slots/ Some slots and pcie port devices should be there now. Sorry, I've tried your suggestion, but the two directories are still empty. I verified the test environment as follows: [chris:~]$ uname -a Linux laptop 3.7.4 #15 SMP PREEMPT Mon Jan 28 09:43:57 GMT 2013 i686 GNU/Linux [chris:~]$ grep acpiphp /boot/System.map-3.7.4 [chris:~]$ modinfo acpiphp modinfo: ERROR: Module acpiphp not found. [chris:~]$ modinfo pciehp filename: /lib/modules/3.7.4/kernel/drivers/pci/hotplug/pciehp.ko license:GPL description:PCI Express Hot Plug Controller Driver author: Dan Zink , Greg Kroah-Hartman , Dely Sy depends: intree: Y vermagic: 3.7.4 SMP preempt mod_unload CORE2 parm: pciehp_detect_mode:Slot detection mode: pcie, acpi, auto pcie - Use PCIe based slot detection acpi - Use ACPI for slot detection auto(default) - Auto select mode. Use acpi option if duplicate slot ids are found.
Otherwise, use pcie option (charp) parm: pciehp_debug:Debugging mode enabled or not (bool) parm: pciehp_poll_mode:Using polling mechanism for hot-plug events or not (bool) parm: pciehp_poll_time:Polling mechanism frequency, in seconds (int) parm: pciehp_force:Force pciehp, even if OSHP is missing (bool) [chris:~]$ cat /proc/cmdline root=/dev/sda5 pciehp_ports=native ro resume=/dev/sda6 [chris:~]$ sudo modprobe pciehp pciehp_force=1 pciehp_debug=1 [chris:~]$ lsmod Module Size Used by pciehp 19907 0 [...] You will notice that the kernel I have used is 3.7.4. I hope that's a suitable kernel for your tests. I've moved away from the 3.8 development kernel onto one that's stable and on which Martin has identified a solution. I see Greg KH released 3.7.5 yesterday and it includes a pciehp change. I'll upgrade to that, run the tests again and report back. One question - should I include the (acpi) pci_slot driver in the kernel build or does pciehp populate the directories without pci_slot? Hi Chris, I am not a kernel developer, but from the other threads at linux-pci I gathered that in some scenarios there are problems with improper loading of the hotplug modules. Therefore, the patches now floating around are to disable hotplug module availability. That is why I suggested that you try only static kernel support for hotplug. That way you don't hit the issue. That is certainly not addressed in 3.7.5; it is probably in -next. Martin In a few minutes I'll be sending out another reply to Yijing's suggestions because I noticed a typo in the parameter I added to the kernel command line. I'm now going back through email to remember why we were trying to get those /sys/bus/pci... directories populated. Watch this space! :-) Chris
Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner
Hi Yijing, On 01/28/13 02:40, Yijing Wang wrote: Hi Chris, Sorry for the delayed reply. It seems my reply last night was missed. From the sysinfo you provided, there are no pcie port devices under /sys/bus/pci_express/devices. Maybe there are some problems with _OSC on your laptop, so the pcie port driver won't create pcie port devices for hotplug, aer and so on. Maybe you can add the boot parameter "pcie_ports=native" and reboot your laptop. Then use #modprobe pciehp pciehp_force=1 pciehp_debug=1 to load the pciehp module. After the above actions, look in the /sys/bus/pci_express/devices/ and /sys/bus/pci/slots/ directories. Some slots and pcie port devices should be there now. Sorry, I've tried your suggestion, but the two directories are still empty. I verified the test environment as follows: [chris:~]$ uname -a Linux laptop 3.7.4 #15 SMP PREEMPT Mon Jan 28 09:43:57 GMT 2013 i686 GNU/Linux [chris:~]$ grep acpiphp /boot/System.map-3.7.4 [chris:~]$ modinfo acpiphp modinfo: ERROR: Module acpiphp not found. [chris:~]$ modinfo pciehp filename: /lib/modules/3.7.4/kernel/drivers/pci/hotplug/pciehp.ko license: GPL description: PCI Express Hot Plug Controller Driver author: Dan Zink , Greg Kroah-Hartman , Dely Sy depends: intree: Y vermagic: 3.7.4 SMP preempt mod_unload CORE2 parm: pciehp_detect_mode:Slot detection mode: pcie, acpi, auto pcie - Use PCIe based slot detection acpi - Use ACPI for slot detection auto(default) - Auto select mode. Use acpi option if duplicate slot ids are found.
Otherwise, use pcie option (charp) parm: pciehp_debug:Debugging mode enabled or not (bool) parm: pciehp_poll_mode:Using polling mechanism for hot-plug events or not (bool) parm: pciehp_poll_time:Polling mechanism frequency, in seconds (int) parm: pciehp_force:Force pciehp, even if OSHP is missing (bool) [chris:~]$ cat /proc/cmdline root=/dev/sda5 pciehp_ports=native ro resume=/dev/sda6 [chris:~]$ sudo modprobe pciehp pciehp_force=1 pciehp_debug=1 [chris:~]$ lsmod Module Size Used by pciehp 19907 0 [...] You will notice that the kernel I have used is 3.7.4. I hope that's a suitable kernel for your tests. I've moved away from the 3.8 development kernel onto one that's stable and on which Martin has identified a solution. I see Greg KH released 3.7.5 yesterday and it includes a pciehp change. I'll upgrade to that, run the tests again and report back. One question: should I include the (acpi) pci_slot driver in the kernel build or does pciehp populate the directories without pci_slot? Thanks again. /sys/bus/pci_express/devices: total 0 /sys/bus/pci_express/drivers: total 0 drwxr-xr-x 2 root root 0 Jan 27 13:17 pciehp/ On 2013/1/28 6:53, Chris Clayton wrote: Thanks again, Martin. Firstly, maybe we should remove the linux-media list from the copy list. I imagine this hotplug stuff is just noise to them. [snip] Do you have any other express card around to check whether it works at all? Always try that after a cold boot. Not at the moment, but I ordered a USB3 expresscard yesterday, so I will have one soon.
Posting the diff results of the procedure below might help: # lspci -vvvxxx > lspci.before_insertion.txt [plug your card into the slot] # lspci -vvvxxx > lspci.after_insertion.txt [unplug your card] # lspci -vvvxxx > lspci.after_1st_removal.txt [re-plug your card into the slot] # lspci -vvvxxx > lspci.after_1st_re-insertion.txt [unplug your card] # lspci -vvvxxx > lspci.after_2nd_removal.txt OK, I've been using 3.8.0-rc kernels so far, but given that 3.8 is still under development, I've switched to 3.7.4, mainly because you are having success with 3.7.x, acpiphp and pcie_aspm=off. I verified the environment as follows: [chris:~]$ cat /proc/cmdline root=/dev/sda5 pcie_aspm=off ro resume=/dev/sda6 [chris:~]$ dmesg | grep ASPM [0.00] PCIe ASPM is disabled [0.348959] pci:00: ACPI _OSC support notification failed, disabling PCIe ASPM [chris:~]$ dmesg | grep acpiphp [0.400846] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5 [chris:~]$ dmesg | grep pciehp [chris:~]$ uname -a Linux laptop 3.7.4 #13 SMP PREEMPT Sun Jan 27 18:39:39 GMT 2013 i686 GNU/Linux Then compare them using diff. These should show no difference: diff lspci.after_insertion.txt lspci.after_1st_re-insertion.txt diff lspci.after_1st_removal.txt lspci.after_2nd_removal.txt Correct, there were no differences. These may show only small differences, or none: diff lspci.before_insertion.txt lspci.after_1st_removal.txt 263c263 < LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <1us, L1 <16us --- > LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <16us 265c265 &l
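The snapshot-and-diff procedure Martin describes can also be scripted. A minimal sketch using Python's standard `difflib`, assuming the five `lspci -vvvxxx` dumps have already been saved under the file names used above; `diff_snapshots` and `diff_files` are hypothetical helper names. With `n=0` only the changed lines are reported, much like the `263c263` style output of plain `diff`.

```python
import difflib


def diff_snapshots(before: str, after: str,
                   label_a: str = "before", label_b: str = "after") -> list:
    """Return unified-diff lines, without context, between two lspci dumps."""
    return list(difflib.unified_diff(
        before.splitlines(), after.splitlines(),
        fromfile=label_a, tofile=label_b, lineterm="", n=0))


def diff_files(path_a: str, path_b: str) -> list:
    """Diff two saved snapshot files, e.g. lspci.after_insertion.txt."""
    with open(path_a) as fa, open(path_b) as fb:
        return diff_snapshots(fa.read(), fb.read(), path_a, path_b)
```

For example, `diff_files("lspci.after_insertion.txt", "lspci.after_1st_re-insertion.txt")` should return an empty list when the two dumps are identical, as Chris found.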
Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner
Thanks again, Martin. Firstly, maybe we should remove the linux-media list from the copy list. I imagine this hotplug stuff is just noise to them. [snip] Do you have any other express card around to check whether it works at all? Always try that after a cold boot. Not at the moment, but I ordered a USB3 expresscard yesterday, so I will have one soon. Posting the diff results of the procedure below might help: # lspci -vvvxxx > lspci.before_insertion.txt [plug your card into the slot] # lspci -vvvxxx > lspci.after_insertion.txt [unplug your card] # lspci -vvvxxx > lspci.after_1st_removal.txt [re-plug your card into the slot] # lspci -vvvxxx > lspci.after_1st_re-insertion.txt [unplug your card] # lspci -vvvxxx > lspci.after_2nd_removal.txt OK, I've been using 3.8.0-rc kernels so far, but given that 3.8 is still under development, I've switched to 3.7.4, mainly because you are having success with 3.7.x, acpiphp and pcie_aspm=off. I verified the environment as follows: [chris:~]$ cat /proc/cmdline root=/dev/sda5 pcie_aspm=off ro resume=/dev/sda6 [chris:~]$ dmesg | grep ASPM [0.00] PCIe ASPM is disabled [0.348959] pci:00: ACPI _OSC support notification failed, disabling PCIe ASPM [chris:~]$ dmesg | grep acpiphp [0.400846] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5 [chris:~]$ dmesg | grep pciehp [chris:~]$ uname -a Linux laptop 3.7.4 #13 SMP PREEMPT Sun Jan 27 18:39:39 GMT 2013 i686 GNU/Linux Then compare them using diff. These should show no difference: diff lspci.after_insertion.txt lspci.after_1st_re-insertion.txt diff lspci.after_1st_removal.txt lspci.after_2nd_removal.txt Correct, there were no differences.
These may show only small differences, or none: diff lspci.before_insertion.txt lspci.after_1st_removal.txt 263c263 < LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <1us, L1 <16us --- > LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <16us 265c265 < LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk- --- > LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+ 267c267 < LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- --- > LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt- 273c273 < Changed: MRL- PresDet- LinkState- --- > Changed: MRL- PresDet- LinkState+ 295,296c295,296 < 40: 10 80 42 01 00 80 00 00 00 00 10 00 12 4c 12 04 < 50: 03 00 01 10 60 b2 1c 00 08 00 00 00 00 00 00 00 --- > 40: 10 80 42 01 00 80 00 00 00 00 10 00 12 3c 12 04 > 50: 40 00 11 50 60 b2 1c 00 08 00 00 01 00 00 00 00 diff lspci.after_1st_removal.txt lspci.after_2nd_removal.txt No difference. Finally, these should confirm whether PresDet works for you (for me it does NOT work with pciehp, but does work with acpiphp).
You should see PresDet- to PresDet+ changes in: Yes, I do see the PresDet- to PresDet+ changes diff lspci.before_insertion.txt lspci.after_insertion.txt 263c263 < LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <1us, L1 <16us --- > LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <16us 265c265 < LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk- --- > LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+ 267c267 < LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- --- > LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt- 272,273c272,273 < SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock- < Changed: MRL- PresDet- LinkState- --- > SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock- > Changed: MRL- PresDet- LinkState+ 295,296c295,296 < 40: 10 80 42 01 00 80 00 00 00 00 10 00 12 4c 12 04 < 50: 03 00 01 10 60 b2 1c 00 08 00 00 00 00 00 00 00 --- > 40: 10 80 42 01 00 80 00 00 00 00 10 00 12 3c 12 04 > 50: 40 00 11 70 60 b2 1c 00 08 00 40 01 00 00 00 00 diff lspci.after_1st_removal.txt lspci.after_1st_re-insertion.txt 267c267 < LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt- --- > LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt- 272c272 < SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock- --- > SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock- 296c296 < 50: 40 00 11 50 60 b2 1c 00 08 00 00 01 00 00 00 00 --- > 50: 40 00 11 70 60 b2 1c 00 08 00 40 01 00 00 00 00 You should see PresDet+ to PresDet- changes in: Yes, I see those changes too. diff lspci.aft
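Because the interesting signal in these dumps is the `PresDet` flag on the `SltSta:` line, a small parser can pull out just the slot status flags instead of reading whole diffs. This is a sketch that assumes the `lspci -vvv` layout shown above (flag names immediately followed by `+` or `-`); the function names are hypothetical, and when a dump contains several slots it simply pairs them up in order.

```python
import re

# A flag is a word immediately followed by "+" or "-", e.g. "PresDet+".
FLAG_RE = re.compile(r"([A-Za-z]+)([+-])(?=\s|$)")


def slot_statuses(lspci_text: str) -> list:
    """Return one {flag: bool} dict per 'SltSta:' line, in order."""
    statuses = []
    for line in lspci_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("SltSta:"):
            statuses.append({name: sign == "+"
                             for name, sign in FLAG_RE.findall(stripped)})
    return statuses


def presence_changes(before: str, after: str) -> list:
    """Pair up (before, after) PresDet values for each slot found."""
    return [(b.get("PresDet"), a.get("PresDet"))
            for b, a in zip(slot_statuses(before), slot_statuses(after))]
```

Fed the before-insertion and after-insertion dumps, a working slot should give `[(False, True)]`, i.e. the PresDet- to PresDet+ change Martin asks about.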
Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner
On 01/27/13 14:26, Martin Mokrejs wrote: Chris Clayton wrote: On 01/27/13 12:18, Yijing Wang wrote: On 2013-01-27 19:19, Chris Clayton wrote: Hi Yijing On 01/27/13 02:45, Yijing Wang wrote: On 2013-01-27 4:54, Chris Clayton wrote: Hi Martin, On 01/24/13 19:21, Martin Mokrejs wrote: Hi Chris, try to include only acpiphp in the kernel and omit pciehp. Don't use modules; build them in statically. In addition, check whether "pcie_aspm=off" in grub.conf helps. Thanks for the tip. I had the pciehp driver installed, but it was a module and not loaded. I didn't have acpiphp enabled at all. Building them both in statically appears to have papered over the cracks of the oops :-) pciehp driver not loaded? You removed the device from the slot without powering it off? That's correct. When I first encountered the oops, I did not have the pciehp driver loaded and removing the device from the slot whilst the laptop was powered on resulted in the oops. Hmm, that's unsafe and dangerous, because the device may still be running. There are two ways to trigger pci hot-add or hot-remove in linux after loading the pciehp or acpiphp module (only one of the two modules can be loaded at a time). You can trigger hot-add/hot-remove via the sysfs interface under /sys/bus/pci/slots/[slot-name]/power, or via the attention button on the hardware (if your laptop supports that). OK, thanks for the advice. The best would be if you subscribed to linux-pci and read my recent threads about similar issues I had with express cards on a Dell Vostro 3550. Further, there are a lot of changes to PCI hotplug by Yinghai Lu and Rafael Wysocki; just browse the archives of linux-pci and see the patches and the discussion. Those discussions are way above my level of knowledge. I guess all this work will be merged into mainline in due course, so I'll watch for them in 3.9 or later. Unless, of course, there is a tree I could clone and help test the changes with my laptop and expresscard.
Hotplug isn't working at all on my Fujitsu laptop, so I can only get the card recognised by rebooting with the card inserted (or by writing 1 to /sys/bus/pci/rescan). There seem to be a few reports on this in the kernel bugzilla, so I'll look through them and see what's being done. Hi Chris, what about using #modprobe pciehp pciehp_debug=1 pciehp_poll_mode=1 pciehp_poll_time=1? Can you resend the dmesg log and "lspci -vvv" output after hotplugging the device on your Fujitsu laptop with the above module parameters? I wasn't sure whether or not the pciehp driver should be loaded on its own or with the acpiphp driver also loaded. So I built them both as modules and planned to try both, pciehp only and acpiphp only. However, I've found that acpiphp will not load (regardless of whether or not pciehp is already loaded). What I get is: [chris:~]$ sudo modprobe acpiphp debug=1 modprobe: ERROR: could not insert 'acpiphp': No such device Are you sure you had pciehp already loaded? Yes, I'm sure it was. Currently, if your hardware supports pciehp native hotplug, the acpiphp driver will be rejected when you load it (you can force loading it by adding the boot parameter pcie_aspm=off, as Martin said). OK, thanks again for the advice. I've disabled the acpiphp driver. Pity. For me, detection of the express card in the slot only works with acpiphp. With pciehp, PresDet is not updated properly upon removal/insertion, and sometimes, probably as a result, PresDet on the SltSta: line of lspci is not correct. So I moved away from pciehp. I have a SandyBridge-based laptop, so I was hoping that with your i5-based laptop you would also have a good chance of getting rid of the pciehp issues. I've just (very carefully) set this up again (i.e. no pciehp driver (module or builtin), acpiphp driver built in and pcie_aspm=off on the kernel command line via grub). My card is not detected on insertion. :-( Thanks for your suggestions, Martin. I am grateful.
Chris Martin
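Yijing's point that hot-add and hot-remove can be triggered through `/sys/bus/pci/slots/[slot-name]/power` can be sketched as below, where writing `1` requests power-on (hot-add) and `0` requests power-off (hot-remove). This assumes a hotplug driver (pciehp or acpiphp) has already registered the slot and that the script runs as root. The function name is hypothetical, and the `sysfs_root` parameter exists only so the code can be exercised against a scratch directory.

```python
from pathlib import Path


def set_slot_power(slot: str, on: bool,
                   sysfs_root: str = "/sys/bus/pci/slots") -> Path:
    """Write "1" (power on, hot-add) or "0" (power off, hot-remove)
    to a hotplug slot's power attribute and return its path.

    The slot directory only exists once a hotplug driver has registered
    the slot; on the Fujitsu laptop in this thread /sys/bus/pci/slots
    stayed empty, so this would raise FileNotFoundError there.
    """
    power = Path(sysfs_root) / slot / "power"
    if not power.exists():
        raise FileNotFoundError(f"no hotplug slot {slot!r} under {sysfs_root}")
    power.write_text("1" if on else "0")
    return power
```

For example, `set_slot_power("3", False)` would ask the kernel to power down slot 3 before the card is pulled out; whether that avoids the oops on this hardware is not something the thread establishes.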
Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner
On 01/27/13 12:18, Yijing Wang wrote: On 2013-01-27 19:19, Chris Clayton wrote: Hi Yijing On 01/27/13 02:45, Yijing Wang wrote: On 2013-01-27 4:54, Chris Clayton wrote: Hi Martin, On 01/24/13 19:21, Martin Mokrejs wrote: Hi Chris, try to include only acpiphp in the kernel and omit pciehp. Don't use modules; build them in statically. In addition, check whether "pcie_aspm=off" in grub.conf helps. Thanks for the tip. I had the pciehp driver installed, but it was a module and not loaded. I didn't have acpiphp enabled at all. Building them both in statically appears to have papered over the cracks of the oops :-) pciehp driver not loaded? You removed the device from the slot without powering it off? That's correct. When I first encountered the oops, I did not have the pciehp driver loaded and removing the device from the slot whilst the laptop was powered on resulted in the oops. Hmm, that's unsafe and dangerous, because the device may still be running. There are two ways to trigger pci hot-add or hot-remove in linux after loading the pciehp or acpiphp module (only one of the two modules can be loaded at a time). You can trigger hot-add/hot-remove via the sysfs interface under /sys/bus/pci/slots/[slot-name]/power, or via the attention button on the hardware (if your laptop supports that). OK, thanks for the advice. The best would be if you subscribed to linux-pci and read my recent threads about similar issues I had with express cards on a Dell Vostro 3550. Further, there are a lot of changes to PCI hotplug by Yinghai Lu and Rafael Wysocki; just browse the archives of linux-pci and see the patches and the discussion. Those discussions are way above my level of knowledge. I guess all this work will be merged into mainline in due course, so I'll watch for them in 3.9 or later. Unless, of course, there is a tree I could clone and help test the changes with my laptop and expresscard.
Hotplug isn't working at all on my Fujitsu laptop, so I can only get the card recognised by rebooting with the card inserted (or by writing 1 to /sys/bus/pci/rescan). There seem to be a few reports on this in the kernel bugzilla, so I'll look through them and see what's being done. Hi Chris, what about using #modprobe pciehp pciehp_debug=1 pciehp_poll_mode=1 pciehp_poll_time=1? Can you resend the dmesg log and "lspci -vvv" output after hotplugging the device on your Fujitsu laptop with the above module parameters? I wasn't sure whether or not the pciehp driver should be loaded on its own or with the acpiphp driver also loaded. So I built them both as modules and planned to try both, pciehp only and acpiphp only. However, I've found that acpiphp will not load (regardless of whether or not pciehp is already loaded). What I get is: [chris:~]$ sudo modprobe acpiphp debug=1 modprobe: ERROR: could not insert 'acpiphp': No such device Currently, if your hardware supports pciehp native hotplug, the acpiphp driver will be rejected when you load it (you can force loading it by adding the boot parameter pcie_aspm=off, as Martin said). OK, thanks again for the advice. I've disabled the acpiphp driver. and at the end of the dmesg output I see: [ 68.199789] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5 [ 68.199970] acpiphp_glue: Total 0 slots The pciehp driver loads OK. I've attached pciehp-only, which shows the dmesg and lspci output that you asked for. As I said before, the only way that I can get the card detected without rebooting the laptop is to write 1 to /sys/bus/pci/rescan. In the hope that it might help (e.g. it shows details of the expresscard I'm using), I've also attached the output from dmesg and lspci after a rescan. In this case, I guess your slot may always be powered on; once you insert your pcie card and use the rescan interface, you can find it. I checked the WinTV-HVR-1400 expresscard device's parent port device, as below. I found the powerctrl bit in the slot capabilities is clear.
So I doubt the hardware supports pci hotplug. Mmm, that's odd because I dual-boot Windows 7 on this laptop and when I plug it in under Windows 7, it appears in Device Manager and works perfectly. Chris, can you try to add and remove the device via /sys/bus/pci/slots/3/power? (use #modprobe pciehp pciehp_debug=1) /sys/bus/pci/slots is empty: [chris:~]$ ls -l /sys/bus/pci/slots/ total 0 Using Google, I find that the acpi slot detection driver should populate that directory. It is configured in and built into the kernel statically, so I don't know what's happening there. By the way, there is also a /sys/bus/pci_express directory: [chris:~]$ ls -Rl /sys/bus/pci_express/ /sys/bus/pci_express/: total 0 drwxr-xr-x 2 root root 0 Jan 27 12:52 devices/ drwxr-xr-x 3 root root 0 Jan 27 13:07 drivers/ -rw-r--r-- 1 root root 4096 Jan 27 13:07 drivers_autoprobe --w--- 1
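Since only one of pciehp and acpiphp can drive the slots at a time, it helps to confirm which one is actually loaded before testing. A minimal sketch, assuming the usual `/proc/modules` layout where the module name is the first whitespace-separated field (the same information `lsmod` prints); the function name is hypothetical. Drivers built into the kernel, as in several of the configurations tried in this thread, never appear in `/proc/modules`, so an empty result does not prove the driver is absent.

```python
def loaded_hotplug_drivers(proc_modules: str,
                           candidates=("pciehp", "acpiphp")) -> list:
    """Return which of `candidates` appear as loaded modules.

    Only the first field of each non-empty line is looked at, so the
    text of `lsmod` (with its "Module Size Used by" header) works too.
    """
    loaded = {line.split()[0] for line in proc_modules.splitlines()
              if line.strip()}
    return [name for name in candidates if name in loaded]
```

On a live system, pass it the contents of `/proc/modules`; with the `lsmod` output quoted earlier in the thread it would report `["pciehp"]`.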
Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner
Hi Martin, On 01/26/13 21:14, Martin Mokrejs wrote: Hi Chris, Chris Clayton wrote: Hi Martin, On 01/24/13 19:21, Martin Mokrejs wrote: Hi Chris, try to include only acpiphp in the kernel and omit pciehp. Don't use modules; build them in statically. In addition, check whether "pcie_aspm=off" in grub.conf helps. Thanks for the tip. I had the pciehp driver installed, but it was a module and not loaded. I didn't have acpiphp enabled at all. Building them both in statically appears to have papered over the cracks of the oops :-) The best would be if you subscribed to linux-pci and read my recent threads about similar issues I had with express cards on a Dell Vostro 3550. Further, there are a lot of changes to PCI hotplug by Yinghai Lu and Rafael Wysocki; just browse the archives of linux-pci and see the patches and the discussion. Those discussions are way above my level of knowledge. I guess all this work will be merged into mainline in due course, so I'll watch for them in 3.9 or later. Unless, of course, there is a tree I could clone and help test the changes with my laptop and expresscard. Hotplug isn't working at all on my Fujitsu laptop, so I can only get the card recognised by rebooting with the card inserted (or by writing 1 to /sys/bus/pci/rescan). There seem to be a few reports on this in the kernel bugzilla, so I'll look through them and see what's being done. That's what I suspected. Compile acpiphp in statically, with no pciehp at all (not even as a module). Then it might work for you; at least it does for me, provided I use "pcie_aspm=off". Thanks again for the suggestion. Unfortunately, that doesn't fix the problem on my laptop. You may have seen the suggestion I've had from Yijing. I'm just building the kernel to test that out. Chris Martin Thanks again. Chris Martin Chris Clayton wrote: Hi, I've today taken delivery of a WinTV-HVR-1400 expresscard TV Tuner and got an Oops when I removed it from the expresscard slot in my laptop.
I will quite understand if the response to this report is "don't do that!", but in that case, how should one remove one of these cards? I have attached three files: 1. the dmesg output from when I rebooted the machine after the oops. I have turned debugging on in the dib7000p and cx23885 modules via module options in /etc/modprobe.d/hvr1400.conf; 2. the .config file for the kernel that oopsed. 3. the text of the oops message. I've typed this up from a photograph of the screen because the laptop was locked up and there was nothing in the log files. Apologies for any typos, but I have tried to be careful. Assuming the answer isn't "don't do that", let me know if I can provide any additional diagnostics, test any patches, etc. Please, however, cc me as I'm not subscribed. Chris
Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner
Hi Martin, On 01/24/13 19:21, Martin Mokrejs wrote: Hi Chris, try to include only acpiphp in the kernel and omit pciehp. Don't use modules; build them in statically. In addition, check whether "pcie_aspm=off" in grub.conf helps. Thanks for the tip. I had the pciehp driver installed, but it was a module and not loaded. I didn't have acpiphp enabled at all. Building them both in statically appears to have papered over the cracks of the oops :-) The best would be if you subscribed to linux-pci and read my recent threads about similar issues I had with express cards on a Dell Vostro 3550. Further, there are a lot of changes to PCI hotplug by Yinghai Lu and Rafael Wysocki; just browse the archives of linux-pci and see the patches and the discussion. Those discussions are way above my level of knowledge. I guess all this work will be merged into mainline in due course, so I'll watch for them in 3.9 or later. Unless, of course, there is a tree I could clone and help test the changes with my laptop and expresscard. Hotplug isn't working at all on my Fujitsu laptop, so I can only get the card recognised by rebooting with the card inserted (or by writing 1 to /sys/bus/pci/rescan). There seem to be a few reports on this in the kernel bugzilla, so I'll look through them and see what's being done. Thanks again. Chris Martin Chris Clayton wrote: Hi, I've today taken delivery of a WinTV-HVR-1400 expresscard TV Tuner and got an Oops when I removed it from the expresscard slot in my laptop. I will quite understand if the response to this report is "don't do that!", but in that case, how should one remove one of these cards? I have attached three files: 1. the dmesg output from when I rebooted the machine after the oops. I have turned debugging on in the dib7000p and cx23885 modules via module options in /etc/modprobe.d/hvr1400.conf; 2. the .config file for the kernel that oopsed. 3. the text of the oops message.
I've typed this up from a photograph of the screen because the laptop was locked up and there was nothing in the log files. Apologies for any typos, but I have tried to be careful. Assuming the answer isn't "don't do that", let me know if I can provide any additional diagnostics, test any patches, etc. Please, however, cc me as I'm not subscribed. Chris
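The workaround mentioned above, writing 1 to `/sys/bus/pci/rescan`, can be combined with a before/after listing of `/sys/bus/pci/devices` to see exactly what the rescan picked up. A sketch under stated assumptions: it runs as root, and the rescan has finished by the time the write returns. The helper names and the `sysfs_pci` parameter (there only so the code can be tried on a scratch directory) are hypothetical.

```python
from pathlib import Path


def pci_devices(sysfs_pci: str = "/sys/bus/pci") -> set:
    """Return the PCI addresses (e.g. "0000:00:1c.3") the kernel knows about."""
    return {entry.name for entry in (Path(sysfs_pci) / "devices").iterdir()}


def rescan_pci(sysfs_pci: str = "/sys/bus/pci") -> list:
    """Trigger a rescan of all PCI buses and return newly appeared devices.

    Assumes the rescan is complete when the write to "rescan" returns.
    """
    before = pci_devices(sysfs_pci)
    (Path(sysfs_pci) / "rescan").write_text("1")
    return sorted(pci_devices(sysfs_pci) - before)
```

After inserting the expresscard, `rescan_pci()` would list the tuner's address if the rescan finds it; this only covers detection and does nothing to make removal safe.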
Re: [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages occupied by memmap
On 12/03/12 23:17, Andrew Morton wrote: On Sun, 02 Dec 2012 19:55:09 + Chris Clayton wrote: On 11/29/12 10:52, Chris Clayton wrote: On 11/28/12 23:52, Andrew Morton wrote: On Wed, 21 Nov 2012 23:09:46 +0800 Jiang Liu wrote: Subject: Re: [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages occupied by memmap How are people to test this? "does it boot"? I've been running kernels with Gerry's 5 patches applied for 11 days now. This is on a 64bit laptop but with a 32bit kernel + HIGHMEM. I joined the conversation because my laptop would not resume from suspend to disk - it either froze or rebooted. With the patches applied the laptop does successfully resume and has been stable. Since Monday, I have been running a kernel with the patches (plus, from today, the patch you mailed yesterday) applied to 3.7-rc7, without problems. I've been running 3.7-rc7 with the patches listed below for a week now and it has been perfectly stable. In particular, my laptop will now successfully resume from suspend to disk, which always failed without the patches. From Jiang Liu: 1. [RFT PATCH v2 1/5] mm: introduce new field "managed_pages" to struct zone 2. [RFT PATCH v1 2/5] mm: replace zone->present_pages with zone->managed_pages if appreciated 3. [RFT PATCH v1 3/5] mm: set zone->present_pages to number of existing pages in the zone 4. [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages occupied by memmap 5. [RFT PATCH v1 5/5] mm: increase totalram_pages when free pages allocated by bootmem allocator From Andrew Morton: 6. mm-provide-more-accurate-estimation-of-pages-occupied-by-memmap.patch Tested-by: Chris Clayton Thanks. I have only two of these five patches queued for 3.8: mm-introduce-new-field-managed_pages-to-struct-zone.patch and mm-provide-more-accurate-estimation-of-pages-occupied-by-memmap.patch. I don't recall what happened with the other three. Gerry posted version 1 of the five patches on 18 November.
Version 2 of the first and fourth patches were posted on 21 November. BTW, the title of Andrew's patch that I applied is wrong above. It should be mm-provide-more-accurate-estimation-of-pages-occupied-by-memmap-fix. Chris
Re: [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages occupied by memmap
On 12/02/12 19:55, Chris Clayton wrote: On 11/29/12 10:52, Chris Clayton wrote: On 11/28/12 23:52, Andrew Morton wrote: On Wed, 21 Nov 2012 23:09:46 +0800 Jiang Liu wrote: Subject: Re: [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages occupied by memmap How are people to test this? "does it boot"? I've been running kernels with Gerry's 5 patches applied for 11 days now. This is on a 64bit laptop but with a 32bit kernel + HIGHMEM. I joined the conversation because my laptop would not resume from suspend to disk - it either froze or rebooted. With the patches applied the laptop does successfully resume and has been stable. Since Monday, I have been running a kernel with the patches (plus, from today, the patch you mailed yesterday) applied to 3.7-rc7, without problems. I've been running 3.7-rc7 with the patches listed below for a week now and it has been perfectly stable. In particular, my laptop will now successfully resume from suspend to disk, which always failed without the patches. I should have said, of course, that it was -rc6 and earlier that would not boot without Jiang Liu's patches. I applied those patches to -rc6 and my resume after suspend to disk problem was fixed. For a subsequent week I have been running with the patches applied to -rc7, with Andrew's patch also applied for the last 3 days. -rc7 was not subject to the resume problem because the patch which broke it had been reverted. All this has been on a 64bit laptop, but running a 32bit kernel with HIGHMEM. Apologies for yesterday's inaccuracy. I shouldn't send testing reports when I'm in a hurry. From Jiang Liu: 1. [RFT PATCH v2 1/5] mm: introduce new field "managed_pages" to struct zone 2. [RFT PATCH v1 2/5] mm: replace zone->present_pages with zone->managed_pages if appreciated 3. [RFT PATCH v1 3/5] mm: set zone->present_pages to number of existing pages in the zone 4. [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages occupied by memmap 5.
[RFT PATCH v1 5/5] mm: increase totalram_pages when free pages allocated by bootmem allocator From Andrew Morton: 6. mm-provide-more-accurate-estimation-of-pages-occupied-by-memmap.patch Tested-by: Chris Clayton Thanks, Chris
Re: [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages occupied by memmap
On 11/29/12 10:52, Chris Clayton wrote: On 11/28/12 23:52, Andrew Morton wrote: On Wed, 21 Nov 2012 23:09:46 +0800 Jiang Liu wrote: Subject: Re: [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages occupied by memmap How are people to test this? "does it boot"? I've been running kernels with Gerry's 5 patches applied for 11 days now. This is on a 64bit laptop but with a 32bit kernel + HIGHMEM. I joined the conversation because my laptop would not resume from suspend to disk - it either froze or rebooted. With the patches applied the laptop does successfully resume and has been stable. Since Monday, I have been running a kernel with the patches (plus, from today, the patch you mailed yesterday) applied to 3.7-rc7, without problems. I've been running 3.7-rc7 with the patches listed below for a week now and it has been perfectly stable. In particular, my laptop will now successfully resume from suspend to disk, which always failed without the patches. From Jiang Liu: 1. [RFT PATCH v2 1/5] mm: introduce new field "managed_pages" to struct zone 2. [RFT PATCH v1 2/5] mm: replace zone->present_pages with zone->managed_pages if appreciated 3. [RFT PATCH v1 3/5] mm: set zone->present_pages to number of existing pages in the zone 4. [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages occupied by memmap 5. [RFT PATCH v1 5/5] mm: increase totalram_pages when free pages allocated by bootmem allocator From Andrew Morton: 6. mm-provide-more-accurate-estimation-of-pages-occupied-by-memmap.patch Tested-by: Chris Clayton Thanks, Chris
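Patch 4's subject concerns how many pages the memmap occupies. The thread does not show the patch body, so the following is only back-of-the-envelope arithmetic, not the patch's actual heuristic: the memmap is the array of `struct page`, one per page frame, so its size in pages is roughly N × sizeof(struct page) ÷ PAGE_SIZE, rounded up. Counting N as all the pages a zone spans (holes included) rather than the pages actually present gives a larger figure. The 4 KiB page size, 32-byte `struct page` and the zone sizes below are assumptions chosen for illustration.

```python
PAGE_SIZE = 4096        # assumption: 4 KiB pages
STRUCT_PAGE_SIZE = 32   # assumption: sizeof(struct page) on this 32-bit build


def memmap_pages(nr_pages: int,
                 struct_page_size: int = STRUCT_PAGE_SIZE,
                 page_size: int = PAGE_SIZE) -> int:
    """Pages needed to hold one struct page per page frame, rounded up."""
    size = nr_pages * struct_page_size
    return (size + page_size - 1) // page_size


# Hypothetical zone: spans 1 GiB of page frames but only 768 MiB is present.
spanned, present = 262144, 196608
print(memmap_pages(spanned), memmap_pages(present))  # 2048 1536
```

In this made-up example, sizing by spanned pages would overcount the memmap by 512 pages (2 MiB).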
Re: [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages occupied by memmap
On 11/28/12 23:52, Andrew Morton wrote: On Wed, 21 Nov 2012 23:09:46 +0800 Jiang Liu wrote: Subject: Re: [RFT PATCH v2 4/5] mm: provide more accurate estimation of pages occupied by memmap How are people to test this? "does it boot"? I've been running kernels with Gerry's 5 patches applied for 11 days now. This is on a 64bit laptop but with a 32bit kernel + HIGHMEM. I joined the conversation because my laptop would not resume from suspend to disk - it either froze or rebooted. With the patches applied the laptop does successfully resume and has been stable. Since Monday, I have been running a kernel with the patches (plus, from today, the patch you mailed yesterday) applied to 3.7-rc7, without problems. Thanks, Chris
Re: [RFT PATCH v1 0/5] fix up inaccurate zone->present_pages
On 11/22/12 09:23, Chris Clayton wrote:
>>> This patchset has only been tested on x86_64 with nobootmem.c. So need help to test this patchset on machines:
>>> 1) use bootmem.c
>>> 2) have highmem
>>>
>>> This patchset applies to "f4a75d2e Linux 3.7-rc6" from git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>>
>> I've applied the five patches to Linus' 3.7.0-rc6 and can confirm that the kernel allows my system to resume from a suspend to disc. Although my laptop is 64 bit, I run a 32 bit kernel with HIGHMEM (I have 8GB RAM):
>>
>> [chris:~/kernel/tmp/linux-3.7-rc6-resume]$ grep -E HIGHMEM\|X86_32 .config
>> CONFIG_X86_32=y
>> CONFIG_X86_32_SMP=y
>> CONFIG_X86_32_LAZY_GS=y
>> # CONFIG_X86_32_IRIS is not set
>> # CONFIG_NOHIGHMEM is not set
>> # CONFIG_HIGHMEM4G is not set
>> CONFIG_HIGHMEM64G=y
>> CONFIG_HIGHMEM=y
>>
>> I can also say that a quick browse of the output of dmesg shows nothing out of the ordinary. I have insufficient knowledge to comment on the patches, but I will run the kernel over the next few days and report back later in the week.
>
> Well, I've been running the kernel since Sunday and have had no problems with my normal work mix of browsing, browsing the internet, video editing, listening to music and building software. I'm now running a kernel that was built with the new patches 1 and 4 from yesterday (plus the original 1, 2 and 5). All seems OK so far, including a couple of resumes from suspend to disk.

-rc6 with Gerry's patches has run fine, including numerous resumes from suspend to disk, which fails (freezing or rebooting) without the patches. I've now applied the patches to -rc7 (they apply with a few offsets, but look OK) and will run that for a day or two.
Re: [RFT PATCH v1 0/5] fix up inaccurate zone->present_pages
>> This patchset has only been tested on x86_64 with nobootmem.c. So need help to test this patchset on machines:
>> 1) use bootmem.c
>> 2) have highmem
>>
>> This patchset applies to "f4a75d2e Linux 3.7-rc6" from git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>
> I've applied the five patches to Linus' 3.7.0-rc6 and can confirm that the kernel allows my system to resume from a suspend to disc. Although my laptop is 64 bit, I run a 32 bit kernel with HIGHMEM (I have 8GB RAM):
>
> [chris:~/kernel/tmp/linux-3.7-rc6-resume]$ grep -E HIGHMEM\|X86_32 .config
> CONFIG_X86_32=y
> CONFIG_X86_32_SMP=y
> CONFIG_X86_32_LAZY_GS=y
> # CONFIG_X86_32_IRIS is not set
> # CONFIG_NOHIGHMEM is not set
> # CONFIG_HIGHMEM4G is not set
> CONFIG_HIGHMEM64G=y
> CONFIG_HIGHMEM=y
>
> I can also say that a quick browse of the output of dmesg shows nothing out of the ordinary. I have insufficient knowledge to comment on the patches, but I will run the kernel over the next few days and report back later in the week.

Well, I've been running the kernel since Sunday and have had no problems with my normal work mix of browsing, browsing the internet, video editing, listening to music and building software. I'm now running a kernel that was built with the new patches 1 and 4 from yesterday (plus the original 1, 2 and 5). All seems OK so far, including a couple of resumes from suspend to disk.
Re: [RFT PATCH v1 0/5] fix up inaccurate zone->present_pages
Hi Gerry.

On 11/18/12 16:07, Jiang Liu wrote:
> The commit 7f1290f2f2a4 ("mm: fix-up zone present pages") tries to resolve an issue caused by inaccurate zone->present_pages, but that fix is incomplete and causes regressions with HIGHMEM. And it has been reverted by commit 5576646 revert "mm: fix-up zone present pages"
>
> This is a follow-up patchset for the issue above. It introduces a new field named "managed_pages" to struct zone, which counts pages managed by the buddy system from the zone. And zone->present_pages is used to count pages existing in the zone, which is spanned_pages - absent_pages.
>
> That way, zone->present_pages will be kept consistent with pgdat->node_present_pages, which is the sum of zone->present_pages.
>
> This patchset has only been tested on x86_64 with nobootmem.c. So need help to test this patchset on machines:
> 1) use bootmem.c
> 2) have highmem
>
> This patchset applies to "f4a75d2e Linux 3.7-rc6" from git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

I've applied the five patches to Linus' 3.7.0-rc6 and can confirm that the kernel allows my system to resume from a suspend to disc. Although my laptop is 64 bit, I run a 32 bit kernel with HIGHMEM (I have 8GB RAM):

[chris:~/kernel/tmp/linux-3.7-rc6-resume]$ grep -E HIGHMEM\|X86_32 .config
CONFIG_X86_32=y
CONFIG_X86_32_SMP=y
CONFIG_X86_32_LAZY_GS=y
# CONFIG_X86_32_IRIS is not set
# CONFIG_NOHIGHMEM is not set
# CONFIG_HIGHMEM4G is not set
CONFIG_HIGHMEM64G=y
CONFIG_HIGHMEM=y

I can also say that a quick browse of the output of dmesg shows nothing out of the ordinary. I have insufficient knowledge to comment on the patches, but I will run the kernel over the next few days and report back later in the week.

Chris

> Any comments and helps are welcomed!
>
> Jiang Liu (5):
>   mm: introduce new field "managed_pages" to struct zone
>   mm: replace zone->present_pages with zone->managed_pages if appreciated
>   mm: set zone->present_pages to number of existing pages in the zone
>   mm: provide more accurate estimation of pages occupied by memmap
>   mm: increase totalram_pages when free pages allocated by bootmem allocator
>
>  include/linux/mmzone.h |  1 +
>  mm/bootmem.c           | 14
>  mm/memory_hotplug.c    |  6
>  mm/mempolicy.c         |  2 +-
>  mm/nobootmem.c         | 15
>  mm/page_alloc.c        | 89 +++-
>  mm/vmscan.c            | 16 -
>  mm/vmstat.c            |  8 +++--
>  8 files changed, 108 insertions(+), 43 deletions(-)
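The zone accounting described in Jiang's cover letter can be modelled in a few lines of userspace C. This is a hypothetical sketch, not kernel code: struct zone_counts, zone_set_present(), zone_add_managed() and node_present_pages() are invented names, and the model assumes only what the letter states, namely that present_pages is spanned_pages minus absent_pages, that managed_pages counts pages handed to the buddy allocator, and that the node total is the sum of the zones' present_pages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the counters discussed in the
 * cover letter; this is NOT the kernel's struct zone. */
struct zone_counts {
	unsigned long spanned_pages;  /* PFN range covered by the zone, holes included */
	unsigned long absent_pages;   /* holes inside that range */
	unsigned long present_pages;  /* pages that physically exist */
	unsigned long managed_pages;  /* pages released to the buddy allocator */
};

/* present_pages = spanned_pages - absent_pages, as the letter describes. */
static void zone_set_present(struct zone_counts *z)
{
	assert(z->absent_pages <= z->spanned_pages);
	z->present_pages = z->spanned_pages - z->absent_pages;
}

/* Hypothetical hook: called when nr pages are released to the buddy
 * allocator; managed_pages can never exceed present_pages. */
static void zone_add_managed(struct zone_counts *z, unsigned long nr)
{
	z->managed_pages += nr;
	assert(z->managed_pages <= z->present_pages);
}

/* The node's present page count is the sum over its zones. */
static unsigned long node_present_pages(const struct zone_counts *zones,
					size_t nr_zones)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < nr_zones; i++)
		sum += zones[i].present_pages;
	return sum;
}
```

In this model the gap between present_pages and managed_pages is the set of present pages that were never released to the buddy allocator (for example, pages the boot allocator keeps for the memmap), which is the distinction the new field is meant to make visible.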
Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d
On 11/15/12 19:24, Andrew Morton wrote:
> On Wed, 14 Nov 2012 22:52:03 +0800 Jiang Liu wrote:
>> So how about totally reverting the changeset 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125 and I will post another version once I found a cleaner way?
>
> We do need to get this regression fixed and I guess that a straightforward revert is an acceptable way of doing that, for now.
>
> I queued the below, with a plan to send it to Linus next week.

I've applied this patch to v3.7-rc5-28-g79e979e and can confirm that it fixes the problem I had with my laptop failing to resume (by either freezing or rebooting) after a suspend to disk.

Tested-by: Chris Clayton

> From: Andrew Morton
> Subject: revert "mm: fix-up zone present pages"
>
> Revert
>
> commit 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125
> Author:     Jianguo Wu
> AuthorDate: Mon Oct 8 16:33:06 2012 -0700
> Commit:     Linus Torvalds
> CommitDate: Tue Oct 9 16:22:54 2012 +0900
>
>     mm: fix-up zone present pages
>
> That patch tried to fix a issue when calculating zone->present_pages, but it caused a regression on 32bit systems with HIGHMEM. With that changeset, reset_zone_present_pages() resets all zone->present_pages to zero, and fixup_zone_present_pages() is called to recalculate zone->present_pages when the boot allocator frees core memory pages into buddy allocator. Because highmem pages are not freed by bootmem allocator, all highmem zones' present_pages becomes zero.
>
> Various options for improving the situation are being discussed but for now, let's return to the 3.6 code.
> Cc: Jianguo Wu
> Cc: Jiang Liu
> Cc: Petr Tesarik
> Cc: "Luck, Tony"
> Cc: Mel Gorman
> Cc: Yinghai Lu
> Cc: Minchan Kim
> Cc: Johannes Weiner
> Cc: David Rientjes
> Signed-off-by: Andrew Morton
> ---
>  arch/ia64/mm/init.c |  1 -
>  include/linux/mm.h  |  4
>  mm/bootmem.c        | 10 +-
>  mm/memory_hotplug.c |  7 ---
>  mm/nobootmem.c      |  3 ---
>  mm/page_alloc.c     | 34 --
>  6 files changed, 1 insertion(+), 58 deletions(-)
>
> diff -puN arch/ia64/mm/init.c~revert-1 arch/ia64/mm/init.c
> --- a/arch/ia64/mm/init.c~revert-1
> +++ a/arch/ia64/mm/init.c
> @@ -637,7 +637,6 @@ mem_init (void)
>  	high_memory = __va(max_low_pfn * PAGE_SIZE);
> -	reset_zone_present_pages();
>  	for_each_online_pgdat(pgdat)
>  		if (pgdat->bdata->node_bootmem_map)
>  			totalram_pages += free_all_bootmem_node(pgdat);
> diff -puN include/linux/mm.h~revert-1 include/linux/mm.h
> --- a/include/linux/mm.h~revert-1
> +++ a/include/linux/mm.h
> @@ -1684,9 +1684,5 @@ static inline unsigned int debug_guardpa
>  static inline bool page_is_guard(struct page *page) { return false; }
>  #endif /* CONFIG_DEBUG_PAGEALLOC */
> -extern void reset_zone_present_pages(void);
> -extern void fixup_zone_present_pages(int nid, unsigned long start_pfn,
> -				unsigned long end_pfn);
> -
>  #endif /* __KERNEL__ */
>  #endif /* _LINUX_MM_H */
> diff -puN mm/bootmem.c~revert-1 mm/bootmem.c
> --- a/mm/bootmem.c~revert-1
> +++ a/mm/bootmem.c
> @@ -198,8 +198,6 @@ static unsigned long __init free_all_boo
>  			int order = ilog2(BITS_PER_LONG);
>  			__free_pages_bootmem(pfn_to_page(start), order);
> -			fixup_zone_present_pages(page_to_nid(pfn_to_page(start)),
> -					start, start + BITS_PER_LONG);
>  			count += BITS_PER_LONG;
>  			start += BITS_PER_LONG;
>  		} else {
> @@ -210,9 +208,6 @@ static unsigned long __init free_all_boo
>  				if (vec & 1) {
>  					page = pfn_to_page(start + off);
>  					__free_pages_bootmem(page, 0);
> -					fixup_zone_present_pages(
> -						page_to_nid(page),
> -						start + off, start + off + 1);
>  					count++;
>  				}
>  				vec >>= 1;
> @@ -226,11 +221,8 @@ static unsigned long __init free_all_boo
>  	pages = bdata->node_low_pfn - bdata->node_min_pfn;
>  	pages = bootmem_bootmap_pages(pages);
>  	count += pages;
> -	while (pages--) {
> -		fixup_zone_present_pages(page_to_nid(page),
> -				page_to_pfn(page), page_to_pfn(page) + 1);
> +	while (pages--)
>  		__free_pages_bootmem(page++, 0);
> -	}
>  	bdebug("nid=%td released=%lx\n", bdata - bootmem_node_data, count);
> diff -puN mm/memory_hotplug.c~revert-1 mm/memory_hotplug.c
> --- a/mm/memory_hotplug.c~revert-1
> +++ a/mm/memory_hotplug.c
> @@ -106,7 +106,6 @@ static void get_page_bootmem(unsigned lo
>  void __ref put_page_bootmem(struct page *page)
>  {
>  	unsigned long type;
> -	struct zon
Re: [PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d
On 11/06/12 01:31, Jiang Liu wrote:
> Changeset 7f1290f2f2 tries to fix a issue when calculating zone->present_pages, but it causes a regression to 32bit systems with HIGHMEM. With that changeset, function reset_zone_present_pages() resets all zone->present_pages to zero, and fixup_zone_present_pages() is called to recalculate zone->present_pages when boot allocator frees core memory pages into buddy allocator. Because highmem pages are not freed by bootmem allocator, all highmem zones' present_pages becomes zero.
>
> Actually there's no need to recalculate present_pages for highmem zone because bootmem allocator never allocates pages from them. So fix the regression by skipping highmem in function reset_zone_present_pages() and fixup_zone_present_pages().
>
> Signed-off-by: Jiang Liu
> Signed-off-by: Jianguo Wu
> Reported-by: Maciej Rutecki
> Tested-by: Maciej Rutecki
> Cc: Chris Clayton
> Cc: Rafael J. Wysocki
> Cc: Andrew Morton
> Cc: Mel Gorman
> Cc: Minchan Kim
> Cc: KAMEZAWA Hiroyuki
> Cc: Michal Hocko
> Cc: linux...@kvack.org
> Cc: linux-kernel@vger.kernel.org
> ---
> Hi Maciej,
> 	Thanks for reporting and bisecting. We have analyzed the regression and worked out a patch for it. Could you please help to verify whether it fix the regression?
> 	Thanks!
> 	Gerry

Thanks Gerry. I've applied this patch to 3.7.0-rc4 and can confirm that it fixes the problem I had with my laptop failing to resume after a suspend to disk.
Tested-by: Chris Clayton

> ---
>  mm/page_alloc.c |  8 +---
>  1 files changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5b74de6..2311f15 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6108,7 +6108,8 @@ void reset_zone_present_pages(void)
>  	for_each_node_state(nid, N_HIGH_MEMORY) {
>  		for (i = 0; i < MAX_NR_ZONES; i++) {
>  			z = NODE_DATA(nid)->node_zones + i;
> -			z->present_pages = 0;
> +			if (!is_highmem(z))
> +				z->present_pages = 0;
>  		}
>  	}
>  }
> @@ -6123,10 +6124,11 @@ void fixup_zone_present_pages(int nid, unsigned long start_pfn,
>  	for (i = 0; i < MAX_NR_ZONES; i++) {
>  		z = NODE_DATA(nid)->node_zones + i;
> +		if (is_highmem(z))
> +			continue;
> +
>  		zone_start_pfn = z->zone_start_pfn;
>  		zone_end_pfn = zone_start_pfn + z->spanned_pages;
> -
> -		/* if the two regions intersect */
>  		if (!(zone_start_pfn >= end_pfn || zone_end_pfn <= start_pfn))
>  			z->present_pages += min(end_pfn, zone_end_pfn) -
>  					    max(start_pfn, zone_start_pfn);
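The regression and the fix discussed in this thread can be reproduced in a small userspace model. It is a hypothetical sketch rather than kernel code: struct zone_model, reset_present(), fixup_present() and the skip_highmem switch are invented for illustration, and the model assumes, as the patch description says, that the boot allocator only ever frees lowmem ranges. With skip_highmem false it follows the reverted changeset, so a highmem zone's present_pages is reset and never credited back; with it true it follows the patch and leaves highmem zones untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the fields the patch touches; NOT struct zone. */
struct zone_model {
	unsigned long zone_start_pfn;
	unsigned long spanned_pages;
	unsigned long present_pages;
	bool highmem;
};

/* Models reset_zone_present_pages(): skip_highmem == false is the
 * behaviour that caused the regression, true is the patched behaviour. */
static void reset_present(struct zone_model *zones, size_t n, bool skip_highmem)
{
	for (size_t i = 0; i < n; i++)
		if (!(skip_highmem && zones[i].highmem))
			zones[i].present_pages = 0;
}

/* Models fixup_zone_present_pages(): credit the freed PFN range
 * [start_pfn, end_pfn) to every zone it overlaps. */
static void fixup_present(struct zone_model *zones, size_t n, bool skip_highmem,
			  unsigned long start_pfn, unsigned long end_pfn)
{
	assert(start_pfn <= end_pfn);
	for (size_t i = 0; i < n; i++) {
		struct zone_model *z = &zones[i];
		unsigned long zone_start = z->zone_start_pfn;
		unsigned long zone_end = zone_start + z->spanned_pages;

		if (skip_highmem && z->highmem)
			continue;
		/* Same intersection test as in the patch. */
		if (!(zone_start >= end_pfn || zone_end <= start_pfn)) {
			unsigned long lo = start_pfn > zone_start ? start_pfn : zone_start;
			unsigned long hi = end_pfn < zone_end ? end_pfn : zone_end;

			z->present_pages += hi - lo;  /* min(end) - max(start) */
		}
	}
}
```

The intersection test treats both PFN ranges as half-open: they overlap unless one ends at or before the other starts, and the overlap length is the smaller end minus the larger start.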
linux-3.7.0-rc3+: Warning found in log files
Hi,

Whilst browsing the log file on my laptop (looking for diagnostics for a problem I have resuming from a suspend to disc), I spotted the warning below. I see that vlc is mentioned. I did have a problem yesterday when, through vlc's UPnP functionality, I was unable to play a video stored on my NAS. I can't be sure whether that is related to this warning because I parked it and got on with investigating my resume problem. I've had a few goes at playing the video this morning and I still can't get it to play via UPnP, but the warning hasn't appeared, so this may be a red herring.

Nov 4 10:26:14 laptop kernel: [ cut here ]
Nov 4 10:26:15 laptop kernel: WARNING: at arch/x86/kernel/apic/ipi.c:109 default_send_IPI_mask_logical+0xa6/0xd0()
Nov 4 10:26:15 laptop kernel: Hardware name: LIFEBOOK AH531
Nov 4 10:26:15 laptop kernel: empty IPI mask
Nov 4 10:26:15 laptop kernel: Modules linked in: r8169 iptable_filter xt_conntrack ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack coretemp hwmon kvm_intel snd_hda_codec_hdmi usb_storage kvm snd_hda_codec_realtek snd_hda_intel snd_hda_codec fujitsu_laptop [last unloaded: r8169]
Nov 4 10:26:15 laptop kernel: Pid: 25461, comm: vlc Not tainted 3.7.0-rc3+ #39
Nov 4 10:26:15 laptop kernel: Call Trace:
Nov 4 10:26:15 laptop kernel: [] ? warn_slowpath_common+0x78/0xb0
Nov 4 10:26:15 laptop kernel: [] ? default_send_IPI_mask_logical+0xa6/0xd0
Nov 4 10:26:15 laptop kernel: [] ? default_send_IPI_mask_logical+0xa6/0xd0
Nov 4 10:26:15 laptop kernel: [] ? warn_slowpath_fmt+0x33/0x40
Nov 4 10:26:15 laptop kernel: [] ? default_send_IPI_mask_logical+0xa6/0xd0
Nov 4 10:26:15 laptop kernel: [] ? native_send_call_func_ipi+0x3f/0x60
Nov 4 10:26:15 laptop kernel: [] ? smp_call_function_many+0x178/0x220
Nov 4 10:26:15 laptop kernel: [] ? do_flush_tlb_all+0x60/0x60
Nov 4 10:26:15 laptop kernel: [] ? native_flush_tlb_others+0x28/0x30
Nov 4 10:26:15 laptop kernel: [] ? flush_tlb_page+0x82/0xd0
Nov 4 10:26:15 laptop kernel: [] ? ptep_set_access_flags+0x61/0x70
Nov 4 10:26:15 laptop kernel: [] ? do_wp_page+0x33a/0x7f0
Nov 4 10:26:15 laptop kernel: [] ? enqueue_task_fair+0x9f/0x1e0
Nov 4 10:26:15 laptop kernel: [] ? handle_pte_fault+0x49d/0x870
Nov 4 10:26:15 laptop kernel: [] ? move_addr_to_user+0x69/0x80
Nov 4 10:26:15 laptop kernel: [] ? sys_getpeername+0x65/0xa0
Nov 4 10:26:15 laptop kernel: [] ? handle_mm_fault+0xda/0x140
Nov 4 10:26:15 laptop kernel: [] ? __do_page_fault+0x141/0x460
Nov 4 10:26:15 laptop kernel: [] ? generic_smp_call_function_interrupt+0x73/0x160
Nov 4 10:26:16 laptop kernel: [] ? irq_exit+0x54/0x90
Nov 4 10:26:16 laptop kernel: [] ? call_function_interrupt+0x2d/0x34
Nov 4 10:26:16 laptop kernel: [] ? vmalloc_sync_all+0x10/0x10
Nov 4 10:26:16 laptop kernel: [] ? error_code+0x5a/0x60
Nov 4 10:26:16 laptop kernel: [] ? vmalloc_sync_all+0x10/0x10
Nov 4 10:26:16 laptop kernel: ---[ end trace ceb5cce1ffac0038 ]---

Let me know if I can provide any additional diagnostics, but please cc me as I'm not subscribed.

Chris