Re: [PATCH] acpi, numa, ia64: Parse all entries of SRAT memory affinity table
On Thu, Nov 16, 2017 at 12:16 AM, Luck, Tonywrote: >> This check is already added in x86 and extending same to ia64. > > Looks OK. > > Acked-by: Tony Luck thanks Tony. ACPI Maintainers, any comments on this patch? i can send rebasing to 4.15-rc1? thanks Ganapat
Re: [PATCH] acpi, numa, ia64: Parse all entries of SRAT memory affinity table
On Thu, Nov 16, 2017 at 12:16 AM, Luck, Tony wrote: >> This check is already added in x86 and extending same to ia64. > > Looks OK. > > Acked-by: Tony Luck thanks Tony. ACPI Maintainers, any comments on this patch? i can send rebasing to 4.15-rc1? thanks Ganapat
Re: [PATCH] 4.4.86-rt99: fix sync breakage between nr_cpus_allowed and cpus_allowed
On Mon, 20 Nov 2017 11:30:40 -0500 joe.ko...@concurrent-rt.com wrote: > Hi Steve, > A quick perusal of 4.11.12-rt16 shows that it has an > entirely new version of migrate_disable which to me appears > correct. > > In that new implementation, migrate_enable() recalculates > p->nr_cpus_allowed when it switches the task back to > using p->cpus_mask. This brings the two back into sync > if anything had happened to get them out of sync while > migration was disabled (as would happen on an affinity > change during that disable period). > > 4.9.47-rt37 has the old implementation and it appears to > have same bug as 4.4-rt though I have yet to test 4.9-rt. > > The fix in these older versions could take one of two > forms: either we recalculate p->nr_cpus_allowed when > migrate_enable goes back to using p->cpus_allowed, > as the 4.11-rt version does, or the one place where we > allow p->nr_cpus_allowed to diverge from p->cpus_allowed > be fixed. The patch I submitted earlier takes this second > approach. > Ideally, I would like to stay close to what upstream -rt does. Would you be able to backport the 4.11-rt patch? I'm currently working on releasing 4.9-rt and 4.4-rt with the latest backports. I could easily add this one too. -- Steve
Re: [PATCH] 4.4.86-rt99: fix sync breakage between nr_cpus_allowed and cpus_allowed
On Mon, 20 Nov 2017 11:30:40 -0500 joe.ko...@concurrent-rt.com wrote: > Hi Steve, > A quick perusal of 4.11.12-rt16 shows that it has an > entirely new version of migrate_disable which to me appears > correct. > > In that new implementation, migrate_enable() recalculates > p->nr_cpus_allowed when it switches the task back to > using p->cpus_mask. This brings the two back into sync > if anything had happened to get them out of sync while > migration was disabled (as would happen on an affinity > change during that disable period). > > 4.9.47-rt37 has the old implementation and it appears to > have same bug as 4.4-rt though I have yet to test 4.9-rt. > > The fix in these older versions could take one of two > forms: either we recalculate p->nr_cpus_allowed when > migrate_enable goes back to using p->cpus_allowed, > as the 4.11-rt version does, or the one place where we > allow p->nr_cpus_allowed to diverge from p->cpus_allowed > be fixed. The patch I submitted earlier takes this second > approach. > Ideally, I would like to stay close to what upstream -rt does. Would you be able to backport the 4.11-rt patch? I'm currently working on releasing 4.9-rt and 4.4-rt with the latest backports. I could easily add this one too. -- Steve
Re: mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info (crisv32 hang)
On Mon, 20 Nov 2017, Guenter Roeck wrote: > On Mon, Nov 20, 2017 at 07:28:21PM -0500, Nicolas Pitre wrote: > > On Mon, 20 Nov 2017, Guenter Roeck wrote: > > > > > bdata->node_min_pfn=6 PFN_PHYS(bdata->node_min_pfn)=c000 > > > start_off=536000 region=c0536000 > > > > If PFN_PHYS(bdata->node_min_pfn)=c000 and > > region=c0536000 that means phys_to_virt() is a no-op. > > > No, it is |= 0x8000 Then the bootmem registration looks very fishy. If you have: > I think the problem is the 0x6 in bdata->node_min_pfn. It is shifted > left by PFN_PHYS, making it 0xc000, which in my understanding is > a virtual address. Exact. #define __pa(x) ((unsigned long)(x) & 0x7fff) #define __va(x) ((void *)((unsigned long)(x) | 0x8000)) With that, the only possible physical address range you may have is 0x4000 - 0x7fff, and it better start at 0x4000. If that's not where your RAM is then something is wrong. This is in fact a very bad idea to define __va() and __pa() using bitwise operations as this hides mistakes like defining physical RAM address at 0xc000. Instead, it should look like: #define __pa(x) ((unsigned long)(x) - 0x8000) #define __va(x) ((void *)((unsigned long)(x) + 0x8000)) This way, bad physical RAM address definitions will be caught immediately. > That doesn't seem to be easy to fix. It seems there is a mixup of physical > and virtual addresses in the architecture. Well... I don't think there is much else to say other than this needs fixing. Nicolas
Re: mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info (crisv32 hang)
On Mon, 20 Nov 2017, Guenter Roeck wrote: > On Mon, Nov 20, 2017 at 07:28:21PM -0500, Nicolas Pitre wrote: > > On Mon, 20 Nov 2017, Guenter Roeck wrote: > > > > > bdata->node_min_pfn=6 PFN_PHYS(bdata->node_min_pfn)=c000 > > > start_off=536000 region=c0536000 > > > > If PFN_PHYS(bdata->node_min_pfn)=c000 and > > region=c0536000 that means phys_to_virt() is a no-op. > > > No, it is |= 0x8000 Then the bootmem registration looks very fishy. If you have: > I think the problem is the 0x6 in bdata->node_min_pfn. It is shifted > left by PFN_PHYS, making it 0xc000, which in my understanding is > a virtual address. Exact. #define __pa(x) ((unsigned long)(x) & 0x7fff) #define __va(x) ((void *)((unsigned long)(x) | 0x8000)) With that, the only possible physical address range you may have is 0x4000 - 0x7fff, and it better start at 0x4000. If that's not where your RAM is then something is wrong. This is in fact a very bad idea to define __va() and __pa() using bitwise operations as this hides mistakes like defining physical RAM address at 0xc000. Instead, it should look like: #define __pa(x) ((unsigned long)(x) - 0x8000) #define __va(x) ((void *)((unsigned long)(x) + 0x8000)) This way, bad physical RAM address definitions will be caught immediately. > That doesn't seem to be easy to fix. It seems there is a mixup of physical > and virtual addresses in the architecture. Well... I don't think there is much else to say other than this needs fixing. Nicolas
Re: [PATCH v3] tracing: Allocate mask_str buffer dynamically
On Mon, 20 Nov 2017 21:33:23 +0800 "Du, Changbin"wrote: > Hi Steven, > Have you picked up this patch or need more polish? Thanks. > Neither. You sent this while I was traveling, and it was missed. I'll look at it tomorrow. Thanks, -- Steve
Re: [PATCH v3] tracing: Allocate mask_str buffer dynamically
On Mon, 20 Nov 2017 21:33:23 +0800 "Du, Changbin" wrote: > Hi Steven, > Have you picked up this patch or need more polish? Thanks. > Neither. You sent this while I was traveling, and it was missed. I'll look at it tomorrow. Thanks, -- Steve
Re: [PATCH 0/13] scsi: arcmsr: add some driver options and support new adapter ARC-1884
On Mon, 2017-11-20 at 22:03 -0500, Martin K. Petersen wrote: > Ching, > > > The following patches apply to Martin's 4.15/scsi-queue. > > Applied to 4.16/scsi-queue. Thank you! > Hi Martin, Thank you for response. These patches can apply to 4.16/scsi-queue is very good. It will be very appreciation if you can spend a little time to review our driver's patch. Providing a better software driver is you and our common target. We will keep going. Thanks, Ching
Re: [PATCH 0/13] scsi: arcmsr: add some driver options and support new adapter ARC-1884
On Mon, 2017-11-20 at 22:03 -0500, Martin K. Petersen wrote: > Ching, > > > The following patches apply to Martin's 4.15/scsi-queue. > > Applied to 4.16/scsi-queue. Thank you! > Hi Martin, Thank you for response. These patches can apply to 4.16/scsi-queue is very good. It will be very appreciation if you can spend a little time to review our driver's patch. Providing a better software driver is you and our common target. We will keep going. Thanks, Ching
[PATCH] arm64: kaslr: Fix kaslr end boundary of virt addr
With kaslr and kasan enable both, I got the follow issue. [ 16.130523s]kasan: reg->base = 1, phys_end =1c000,start = 4000, end = ffc0 [ 16.142517s]___alloc_bootmem_nopanic:257 [ 16.148284s]__alloc_memory_core_early:63, addr = 197fc7fc0 [ 16.155670s]__alloc_memory_core_early:65, virt = d7fc7fc0 [ 16.163635s]__alloc_memory_core_early:67, toshow = ff8ffaff8ff8 [ 16.171783s]__alloc_memory_core_early:69, show_phy = ffe2649f8ff8 [ 16.180145s]Unable to handle kernel paging request at virtual address ff8ffaff8ff8 [ 16.189971s]pgd = ffad9c507000 [ 16.195220s][ff8ffaff8ff8] *pgd=000197fc8003, *pud=000197fc8003 *reg->base = 1, phys_end =1c000,start = 4000, end = ffc0* memstart_addr 0 ARM64_MEMSTART_ALIGN 0x4000 memstart_offset_seed 0xffc7 PHYS_OFFSET = 0 - memstart_addr = 0 - 3E4000 = FFC1C000 reg->base = 0x1 -> 0x4000 phys_end = 0x1c000 -> 0xffc0 This is confused, end less than start. And In memblock it use "start_addr + size" as the end addr. So in function kasan_init, if the start >= end, it will not map the hole block address. But the memory in this block is valid. And it can be allocated as well. So donot use the last memory region. Changing "range = range / ARM64_MEMSTART_ALIGN + 1" to range = range / ARM64_MEMSTART_ALIGN; Signed-off-by: Chen FengSigned-off-by: Chen Xiang --- arch/arm64/mm/init.c | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index 716d122..60112c0 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -267,11 +267,8 @@ void __init arm64_memblock_init(void) * margin, the size of the region that the available physical * memory spans, randomize the linear region as well. */ - if (memstart_offset_seed > 0 && range >= ARM64_MEMSTART_ALIGN) { - range = range / ARM64_MEMSTART_ALIGN + 1; - memstart_addr -= ARM64_MEMSTART_ALIGN * -((range * memstart_offset_seed) >> 16); - } + if (memstart_offset_seed > 0 && range >= ARM64_MEMSTART_ALIGN) + memstart_addr -= (range * memstart_offset_seed) >> 16; } /* -- 1.9.1
[PATCH] arm64: kaslr: Fix kaslr end boundary of virt addr
With kaslr and kasan enable both, I got the follow issue. [ 16.130523s]kasan: reg->base = 1, phys_end =1c000,start = 4000, end = ffc0 [ 16.142517s]___alloc_bootmem_nopanic:257 [ 16.148284s]__alloc_memory_core_early:63, addr = 197fc7fc0 [ 16.155670s]__alloc_memory_core_early:65, virt = d7fc7fc0 [ 16.163635s]__alloc_memory_core_early:67, toshow = ff8ffaff8ff8 [ 16.171783s]__alloc_memory_core_early:69, show_phy = ffe2649f8ff8 [ 16.180145s]Unable to handle kernel paging request at virtual address ff8ffaff8ff8 [ 16.189971s]pgd = ffad9c507000 [ 16.195220s][ff8ffaff8ff8] *pgd=000197fc8003, *pud=000197fc8003 *reg->base = 1, phys_end =1c000,start = 4000, end = ffc0* memstart_addr 0 ARM64_MEMSTART_ALIGN 0x4000 memstart_offset_seed 0xffc7 PHYS_OFFSET = 0 - memstart_addr = 0 - 3E4000 = FFC1C000 reg->base = 0x1 -> 0x4000 phys_end = 0x1c000 -> 0xffc0 This is confused, end less than start. And In memblock it use "start_addr + size" as the end addr. So in function kasan_init, if the start >= end, it will not map the hole block address. But the memory in this block is valid. And it can be allocated as well. So donot use the last memory region. Changing "range = range / ARM64_MEMSTART_ALIGN + 1" to range = range / ARM64_MEMSTART_ALIGN; Signed-off-by: Chen Feng Signed-off-by: Chen Xiang --- arch/arm64/mm/init.c | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index 716d122..60112c0 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -267,11 +267,8 @@ void __init arm64_memblock_init(void) * margin, the size of the region that the available physical * memory spans, randomize the linear region as well. */ - if (memstart_offset_seed > 0 && range >= ARM64_MEMSTART_ALIGN) { - range = range / ARM64_MEMSTART_ALIGN + 1; - memstart_addr -= ARM64_MEMSTART_ALIGN * -((range * memstart_offset_seed) >> 16); - } + if (memstart_offset_seed > 0 && range >= ARM64_MEMSTART_ALIGN) + memstart_addr -= (range * memstart_offset_seed) >> 16; } /* -- 1.9.1
Re: [PATCH] platform/x86: Add Acer Wireless Radio Control driver
On Mon, Nov 20, 2017 at 8:24 PM, Andy Shevchenkowrote: > On Mon, Nov 20, 2017 at 8:31 AM, Chris Chiu wrote: >> On Fri, Nov 17, 2017 at 10:25 PM, Andy Shevchenko >> wrote: >>> On Thu, Nov 16, 2017 at 3:44 PM, Chris Chiu wrote: > + +struct acer_wireless_data { + struct input_dev *idev; >>> + struct acpi_device *adev; >>> >>> Do you need this one? >>> I suppose whenever you need struct device out of it you may find it >>> via input->parent. Right? >>> >> >> But I think it would make life easier to have the following line in >> .notify callback >> struct acer_wireless_data * data = acpi_driver_data(adev); >> >> I can get its parent acer_wireless_data and point to input_dev easier. > > Yes, though if I'm not mistaken, you may get rid completely of your > custom container and use struct input_dev for accessing that via > to_acpi_device(idev->dev.parent). > > It would be less code, less data structures, cleaner view. > Right? > +static int acer_wireless_remove(struct acpi_device *adev) +{ + return 0; +} + >>> >>> No need UI suppose. >>> >> >> You mean remove this acer_wireless_remove which does nothing? Will do. > > Correct. > > -- > With Best Regards, > Andy Shevchenko I think I get your point. Will soon have a v2 patch later.
Re: [PATCH] platform/x86: Add Acer Wireless Radio Control driver
On Mon, Nov 20, 2017 at 8:24 PM, Andy Shevchenko wrote: > On Mon, Nov 20, 2017 at 8:31 AM, Chris Chiu wrote: >> On Fri, Nov 17, 2017 at 10:25 PM, Andy Shevchenko >> wrote: >>> On Thu, Nov 16, 2017 at 3:44 PM, Chris Chiu wrote: > + +struct acer_wireless_data { + struct input_dev *idev; >>> + struct acpi_device *adev; >>> >>> Do you need this one? >>> I suppose whenever you need struct device out of it you may find it >>> via input->parent. Right? >>> >> >> But I think it would make life easier to have the following line in >> .notify callback >> struct acer_wireless_data * data = acpi_driver_data(adev); >> >> I can get its parent acer_wireless_data and point to input_dev easier. > > Yes, though if I'm not mistaken, you may get rid completely of your > custom container and use struct input_dev for accessing that via > to_acpi_device(idev->dev.parent). > > It would be less code, less data structures, cleaner view. > Right? > +static int acer_wireless_remove(struct acpi_device *adev) +{ + return 0; +} + >>> >>> No need UI suppose. >>> >> >> You mean remove this acer_wireless_remove which does nothing? Will do. > > Correct. > > -- > With Best Regards, > Andy Shevchenko I think I get your point. Will soon have a v2 patch later.
Re: [PATCH] MIPS: Fix CPS SMP NS16550 UART defaults
On 11/20/2017 04:02 PM, James Hogan wrote: From: James HoganThe MIPS_CPS_NS16550_BASE and MIPS_CPS_NS16550_SHIFT options have no defaults for non-Malta platforms which select SYS_SUPPORTS_MIPS_CPS (i.e. the pistachio and generic platforms). This is problematic for automated allyesconfig and allmodconfig builds based on these platforms, since make silentoldconfig tries to ask the user for values, and especially since v4.15 where the default platform was switched to generic. Default these options to 0 and arrange for MIPS_CPS_NS16550 to be no when using that default base address, so that the option only has an effect when the default is provided (i.e. Malta) or when a value is provided by the user. Fixes: 609cf6f2291a ("MIPS: CPS: Early debug using an ns16550-compatible UART") Signed-off-by: James Hogan Reviewed-by: Paul Burton Cc: Ralf Baechle Cc: Guenter Roeck Cc: linux-m...@linux-mips.org --- Guenter: I'm guessing this is the problem you're referring to. Yes. Tested-by: Guenter Roeck mips:allmodconfig still fails to build with this patch applied, but elsewhere. In file included from /opt/buildbot/slave/hwmon-testing/build/include/linux/bcma/bcma.h:10:0, from /opt/buildbot/slave/hwmon-testing/build/drivers/bcma/bcma_private.h:9, from /opt/buildbot/slave/hwmon-testing/build/drivers/bcma/main.c:8: /opt/buildbot/slave/hwmon-testing/build/include/linux/bcma/bcma_driver_pci.h:218:24: error: field 'pci_controller' has incomplete type Guenter
Re: [PATCH] MIPS: Fix CPS SMP NS16550 UART defaults
On 11/20/2017 04:02 PM, James Hogan wrote: From: James Hogan The MIPS_CPS_NS16550_BASE and MIPS_CPS_NS16550_SHIFT options have no defaults for non-Malta platforms which select SYS_SUPPORTS_MIPS_CPS (i.e. the pistachio and generic platforms). This is problematic for automated allyesconfig and allmodconfig builds based on these platforms, since make silentoldconfig tries to ask the user for values, and especially since v4.15 where the default platform was switched to generic. Default these options to 0 and arrange for MIPS_CPS_NS16550 to be no when using that default base address, so that the option only has an effect when the default is provided (i.e. Malta) or when a value is provided by the user. Fixes: 609cf6f2291a ("MIPS: CPS: Early debug using an ns16550-compatible UART") Signed-off-by: James Hogan Reviewed-by: Paul Burton Cc: Ralf Baechle Cc: Guenter Roeck Cc: linux-m...@linux-mips.org --- Guenter: I'm guessing this is the problem you're referring to. Yes. Tested-by: Guenter Roeck mips:allmodconfig still fails to build with this patch applied, but elsewhere. In file included from /opt/buildbot/slave/hwmon-testing/build/include/linux/bcma/bcma.h:10:0, from /opt/buildbot/slave/hwmon-testing/build/drivers/bcma/bcma_private.h:9, from /opt/buildbot/slave/hwmon-testing/build/drivers/bcma/main.c:8: /opt/buildbot/slave/hwmon-testing/build/include/linux/bcma/bcma_driver_pci.h:218:24: error: field 'pci_controller' has incomplete type Guenter
Re: [PATCH] cpufreq: mediatek: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
On 20-11-17, 13:32, Jesse Chan wrote: > This change resolves a new compile-time warning > when built as a loadable module: > > WARNING: modpost: missing MODULE_LICENSE() in > drivers/cpufreq/mediatek-cpufreq.o > see include/linux/module.h for more information > > This adds the license as "GPL v2", which matches the header of the file. > > MODULE_DESCRIPTION and MODULE_AUTHOR are also added. > > Signed-off-by: Jesse Chan> --- > drivers/cpufreq/mediatek-cpufreq.c | 4 > 1 file changed, 4 insertions(+) > > diff --git a/drivers/cpufreq/mediatek-cpufreq.c > b/drivers/cpufreq/mediatek-cpufreq.c > index 18c4bd9a5c65..e0d5090b303d 100644 > --- a/drivers/cpufreq/mediatek-cpufreq.c > +++ b/drivers/cpufreq/mediatek-cpufreq.c > @@ -620,3 +620,7 @@ static int __init mtk_cpufreq_driver_init(void) > return 0; > } > device_initcall(mtk_cpufreq_driver_init); > + > +MODULE_DESCRIPTION("MediaTek CPUFreq driver"); > +MODULE_AUTHOR("Pi-Cheng Chen "); > +MODULE_LICENSE("GPL v2"); Acked-by: Viresh Kumar -- viresh
Re: [PATCH] cpufreq: mediatek: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
On 20-11-17, 13:32, Jesse Chan wrote: > This change resolves a new compile-time warning > when built as a loadable module: > > WARNING: modpost: missing MODULE_LICENSE() in > drivers/cpufreq/mediatek-cpufreq.o > see include/linux/module.h for more information > > This adds the license as "GPL v2", which matches the header of the file. > > MODULE_DESCRIPTION and MODULE_AUTHOR are also added. > > Signed-off-by: Jesse Chan > --- > drivers/cpufreq/mediatek-cpufreq.c | 4 > 1 file changed, 4 insertions(+) > > diff --git a/drivers/cpufreq/mediatek-cpufreq.c > b/drivers/cpufreq/mediatek-cpufreq.c > index 18c4bd9a5c65..e0d5090b303d 100644 > --- a/drivers/cpufreq/mediatek-cpufreq.c > +++ b/drivers/cpufreq/mediatek-cpufreq.c > @@ -620,3 +620,7 @@ static int __init mtk_cpufreq_driver_init(void) > return 0; > } > device_initcall(mtk_cpufreq_driver_init); > + > +MODULE_DESCRIPTION("MediaTek CPUFreq driver"); > +MODULE_AUTHOR("Pi-Cheng Chen "); > +MODULE_LICENSE("GPL v2"); Acked-by: Viresh Kumar -- viresh
Re: [PATCH] bfa: remove unused pointer 'port'
Colin, > The pointer 'port' is being assigned but it is never read, hence it is > redundant and can be removed. Cleans up clang warning: > > drivers/scsi/bfa/bfad_attr.c:505:2: warning: Value stored to 'port' > is never read Applied to 4.16/scsi-queue. Thanks, Colin! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] bfa: remove unused pointer 'port'
Colin, > The pointer 'port' is being assigned but it is never read, hence it is > redundant and can be removed. Cleans up clang warning: > > drivers/scsi/bfa/bfad_attr.c:505:2: warning: Value stored to 'port' > is never read Applied to 4.16/scsi-queue. Thanks, Colin! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] scsi: aacraid: remove unused variable managed_request_id
Colin, > Variable managed_request_id is being assigned but it is never read, > hence it is redundant and can be removed. Cleans up clang warning: Applied to 4.16/scsi-queue. -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] scsi: aacraid: remove unused variable managed_request_id
Colin, > Variable managed_request_id is being assigned but it is never read, > hence it is redundant and can be removed. Cleans up clang warning: Applied to 4.16/scsi-queue. -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH PREEMPT RT] rt-mutex: fix deadlock in device mapper
On Mon, 2017-11-20 at 16:33 -0500, Mikulas Patocka wrote: > > Is there some specific scenario where you need to call > blk_schedule_flush_plug from rt_spin_lock_fastlock? Excellent question. What's the difference between not getting IO started because you meet a mutex with an rt_mutex under the hood, and not getting IO started because you meet a spinlock with an rt_mutex under the hood? If just doing the mutex side puts this thing back to sleep, I'm happy. -Mike
Re: [PATCH PREEMPT RT] rt-mutex: fix deadlock in device mapper
On Mon, 2017-11-20 at 16:33 -0500, Mikulas Patocka wrote: > > Is there some specific scenario where you need to call > blk_schedule_flush_plug from rt_spin_lock_fastlock? Excellent question. What's the difference between not getting IO started because you meet a mutex with an rt_mutex under the hood, and not getting IO started because you meet a spinlock with an rt_mutex under the hood? If just doing the mutex side puts this thing back to sleep, I'm happy. -Mike
Re: [PATCH 13/16] x86/entry/64: Create a percpu SYSCALL entry trampoline
On Mon, Nov 20, 2017 at 6:34 PM, Josh Poimboeufwrote: > On Mon, Nov 20, 2017 at 09:07:44AM -0800, Andy Lutomirski wrote: >> + /* Save RDI, since we need a scratch register. */ >> + pushq %rdi >> + >> + /* >> + * x86 lacks a near absolute jump, and we can't jump to the real >> + * entry text with a relative jump, so we use a double trampoline. >> + */ >> + movq$entry_SYSCALL_64_stage2, %rdi >> + jmp *%rdi >> +END(entry_SYSCALL_64_trampoline) >> + >> + .popsection >> + >> +ENTRY(entry_SYSCALL_64_stage2) >> + /* >> + * Rather than polluting the normal SYSCALL path with stack switching >> + * nonsense, fix up our register state to match its expectations. >> + */ >> + UNWIND_HINT_EMPTY >> + popq %rdi >> + jmp entry_SYSCALL_64_after_hwframe >> +END(entry_SYSCALL_64_stage2) > > Is there a reason why you couldn't just do the following? > > pushq $entry_SYSCALL_64_after_hwframe > ret > > Then you wouldn't need the 2nd trampoline and the %rdi clobber. > Good suggestion. Thanks! > -- > Josh
Re: [PATCH 13/16] x86/entry/64: Create a percpu SYSCALL entry trampoline
On Mon, Nov 20, 2017 at 6:34 PM, Josh Poimboeuf wrote: > On Mon, Nov 20, 2017 at 09:07:44AM -0800, Andy Lutomirski wrote: >> + /* Save RDI, since we need a scratch register. */ >> + pushq %rdi >> + >> + /* >> + * x86 lacks a near absolute jump, and we can't jump to the real >> + * entry text with a relative jump, so we use a double trampoline. >> + */ >> + movq$entry_SYSCALL_64_stage2, %rdi >> + jmp *%rdi >> +END(entry_SYSCALL_64_trampoline) >> + >> + .popsection >> + >> +ENTRY(entry_SYSCALL_64_stage2) >> + /* >> + * Rather than polluting the normal SYSCALL path with stack switching >> + * nonsense, fix up our register state to match its expectations. >> + */ >> + UNWIND_HINT_EMPTY >> + popq %rdi >> + jmp entry_SYSCALL_64_after_hwframe >> +END(entry_SYSCALL_64_stage2) > > Is there a reason why you couldn't just do the following? > > pushq $entry_SYSCALL_64_after_hwframe > ret > > Then you wouldn't need the 2nd trampoline and the %rdi clobber. > Good suggestion. Thanks! > -- > Josh
Re: Improving documentation of parent-ID field in /proc/PID/mountinfo
On Mon, Nov 20, 2017 at 10:07:29AM +0100, Michael Kerrisk (man-pages) wrote: > Hi Miklos, > > Sorry for the slow follow-up. > > On 14 November 2017 at 17:16, Miklos Szerediwrote: > > On Tue, Nov 14, 2017 at 8:08 AM, Michael Kerrisk (man-pages) > > wrote: > >> Hi Miklos, Ram > >> > >> Thanks for your comments. A question below. > >> > >> On 13 November 2017 at 09:11, Miklos Szeredi wrote: > >>> On Mon, Nov 13, 2017 at 8:55 AM, Ram Pai wrote: > On Mon, Nov 13, 2017 at 07:02:21AM +0100, Michael Kerrisk (man-pages) > wrote: > > Hello Ram, > > > > Long ago (2.6.29) you added the /proc/PID/mountinfo file and > > associated documentation in Documentation/filesystems/proc.txt. Later, > > I pasted much of that documentation into the proc(5) manual page. > > > > That documentation says of the second field in the file: > > > > [[ > > This file contains lines of the form: > > > > 36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root > > rw,errors=continue > > (1)(2)(3) (4) (5) (6) (7) (8) (9) (10) (11) > > > > (1) mount ID: unique identifier of the mount (may be reused after > > umount) > > (2) parent ID: ID of parent (or of self for the top of the mount tree) > > ... > > ]] > > > > The last piece of the description of field (2) doesn't seem to be > > correct, or is at least rather unclear. I take this to be saying that > > that for the root mount point, /, field (2) will have the same value > > as field (1). I never actually looked at this detail closely, but > > Alexander pointed out that this is obviously not so, as one can > > immediately verify: > > > > $ grep '/ / ' /proc/$$/mountinfo > > 65 0 8:2 / / rw,relatime shared:1 - ext4 /dev/sda2 > > rw,seclabel,data=order > > > > I dug around in the kernel source for a bit. I do not have an exact > > handle on the details, but I can see roughly what is going on. > > Internally, there seems to be one ("hidden") mount ID reserved to each > > mount namespace, and that ID is the parent of the root mount point. > > > > Looking through the (4.14) kernel source, mount IDs are allocated by > > mnt_alloc_id() (in fs/namespace.c), which is in turn called by > > alloc_vfsmnt() which is in turn called by clone_mnt(). > > > > A new mount namespace is created by the kernel function copy_mnt_ns() > > (in fs/namespace.c, called by create_new_namespaces() in > > kernel/nsproxy.c). The copy_mnt_ns() function calls copy_tree() (in > > fs/namespace.c), and copy_tree() calls clone_mnt() in *two* places. > > The first of these is the call that creates the "hidden" mount ID that > > becomes the parent of the root mount point. (I verified this by > > instrumenting the kernel with a few printk() calls to display the > > IDs.) The second place where copy_tree() calls clone_mnt() is in a > > loop that replicates each of the mount points (including the root > > mount point) in the source mount namespace. > > We used to report that mount, ones upon a time. Something has changed > the behavior since then and its not reported any more, thus making it > hidden. > >>> > >>> The hidden one is the initramfs, I believe. That's the root of the > >>> mount namespace, and the when a namespace is cloned, the tree is > >>> copied from the namespace root. > >>> > >>> It is "hidden" because no process has its root there. Note the > >>> difference between namespace root and process root: the first is the > >>> real root of the mount tree and is unchangeable, the second is > >>> pointing to some place in a mount tree and can be changed (chroot). > >>> > >>> So there's nothing special in this rootfs, it is just hidden because > >>> it's not the root of any task. > >>> > >>> The description of field (2) is correct, it just does not make it > >>> clear what it means by "root". > >> > >> Sorry -- do you mean the old description is correct, or my new > >> description (below)? > > > > Well, both are correct, yours just describes the same thing at the > > higher level. But I think rootfs is an implementation detail, so is > > the fact that it gets a zero mount ID, so I think the original > > description better captures the essence of the interface. Except it > > needs to clarify what "top of the mount tree" means. It doesn't mean > > current process's root, rather it means the root of the mount tree in > > the current mount namespace. > > Thanks for the further info. > > But, the problem is that the existing description is at best misleading: > > (2) parent ID: the ID of the parent mount (or of self for >the top of the mount tree). > > That implies that we'll find one line in the list where field 1 and > field 2
Re: Improving documentation of parent-ID field in /proc/PID/mountinfo
On Mon, Nov 20, 2017 at 10:07:29AM +0100, Michael Kerrisk (man-pages) wrote: > Hi Miklos, > > Sorry for the slow follow-up. > > On 14 November 2017 at 17:16, Miklos Szeredi wrote: > > On Tue, Nov 14, 2017 at 8:08 AM, Michael Kerrisk (man-pages) > > wrote: > >> Hi Miklos, Ram > >> > >> Thanks for your comments. A question below. > >> > >> On 13 November 2017 at 09:11, Miklos Szeredi wrote: > >>> On Mon, Nov 13, 2017 at 8:55 AM, Ram Pai wrote: > On Mon, Nov 13, 2017 at 07:02:21AM +0100, Michael Kerrisk (man-pages) > wrote: > > Hello Ram, > > > > Long ago (2.6.29) you added the /proc/PID/mountinfo file and > > associated documentation in Documentation/filesystems/proc.txt. Later, > > I pasted much of that documentation into the proc(5) manual page. > > > > That documentation says of the second field in the file: > > > > [[ > > This file contains lines of the form: > > > > 36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root > > rw,errors=continue > > (1)(2)(3) (4) (5) (6) (7) (8) (9) (10) (11) > > > > (1) mount ID: unique identifier of the mount (may be reused after > > umount) > > (2) parent ID: ID of parent (or of self for the top of the mount tree) > > ... > > ]] > > > > The last piece of the description of field (2) doesn't seem to be > > correct, or is at least rather unclear. I take this to be saying that > > that for the root mount point, /, field (2) will have the same value > > as field (1). I never actually looked at this detail closely, but > > Alexander pointed out that this is obviously not so, as one can > > immediately verify: > > > > $ grep '/ / ' /proc/$$/mountinfo > > 65 0 8:2 / / rw,relatime shared:1 - ext4 /dev/sda2 > > rw,seclabel,data=order > > > > I dug around in the kernel source for a bit. I do not have an exact > > handle on the details, but I can see roughly what is going on. > > Internally, there seems to be one ("hidden") mount ID reserved to each > > mount namespace, and that ID is the parent of the root mount point. > > > > Looking through the (4.14) kernel source, mount IDs are allocated by > > mnt_alloc_id() (in fs/namespace.c), which is in turn called by > > alloc_vfsmnt() which is in turn called by clone_mnt(). > > > > A new mount namespace is created by the kernel function copy_mnt_ns() > > (in fs/namespace.c, called by create_new_namespaces() in > > kernel/nsproxy.c). The copy_mnt_ns() function calls copy_tree() (in > > fs/namespace.c), and copy_tree() calls clone_mnt() in *two* places. > > The first of these is the call that creates the "hidden" mount ID that > > becomes the parent of the root mount point. (I verified this by > > instrumenting the kernel with a few printk() calls to display the > > IDs.) The second place where copy_tree() calls clone_mnt() is in a > > loop that replicates each of the mount points (including the root > > mount point) in the source mount namespace. > > We used to report that mount, ones upon a time. Something has changed > the behavior since then and its not reported any more, thus making it > hidden. > >>> > >>> The hidden one is the initramfs, I believe. That's the root of the > >>> mount namespace, and the when a namespace is cloned, the tree is > >>> copied from the namespace root. > >>> > >>> It is "hidden" because no process has its root there. Note the > >>> difference between namespace root and process root: the first is the > >>> real root of the mount tree and is unchangeable, the second is > >>> pointing to some place in a mount tree and can be changed (chroot). > >>> > >>> So there's nothing special in this rootfs, it is just hidden because > >>> it's not the root of any task. > >>> > >>> The description of field (2) is correct, it just does not make it > >>> clear what it means by "root". > >> > >> Sorry -- do you mean the old description is correct, or my new > >> description (below)? > > > > Well, both are correct, yours just describes the same thing at the > > higher level. But I think rootfs is an implementation detail, so is > > the fact that it gets a zero mount ID, so I think the original > > description better captures the essence of the interface. Except it > > needs to clarify what "top of the mount tree" means. It doesn't mean > > current process's root, rather it means the root of the mount tree in > > the current mount namespace. > > Thanks for the further info. > > But, the problem is that the existing description is at best misleading: > > (2) parent ID: the ID of the parent mount (or of self for >the top of the mount tree). > > That implies that we'll find one line in the list where field 1 and > field 2 are the same. But we don't, because the mountns rootfs entry > is not shown in
Re: [PATCH] pinctrl: intel: Initialize GPIO properly when used through irqchip
On Mon, Nov 20, 2017 at 11:19 PM, Mika Westerbergwrote: > When a GPIO is requested using gpiod_get_* APIs the intel pinctrl driver > switches the pin to GPIO mode and makes sure interrupts are routed to > the GPIO hardware instead of IOAPIC. However, if the GPIO is used > directly through irqchip, as is the case with many I2C-HID devices where > I2C core automatically configures interrupt for the device, the pin is > not initialized as GPIO. Instead we rely that the BIOS configures the > pin accordingly which seems not to be the case at least in Asus X540NA > SKU3 with Focaltech touchpad. > > When the pin is not properly configured it might result weird behaviour > like interrupts suddenly stop firing completely and the touchpad stops > responding to user input. > > Fix this by properly initializing the pin to GPIO mode also when it is > used directly through irqchip. > > Reported-by: Daniel Drake > Reported-by: Chris Chiu > Signed-off-by: Mika Westerberg > --- > Chris, Daniel, > > Could you check that this still solves the issue and maybe provide your > Tested-by? Thanks! > I've verified on X540NA here and it's working fine w/o any abnormal touchpad stop. Tested-by: Chris Chiu
Re: [PATCH] pinctrl: intel: Initialize GPIO properly when used through irqchip
On Mon, Nov 20, 2017 at 11:19 PM, Mika Westerberg wrote: > When a GPIO is requested using gpiod_get_* APIs the intel pinctrl driver > switches the pin to GPIO mode and makes sure interrupts are routed to > the GPIO hardware instead of IOAPIC. However, if the GPIO is used > directly through irqchip, as is the case with many I2C-HID devices where > I2C core automatically configures interrupt for the device, the pin is > not initialized as GPIO. Instead we rely that the BIOS configures the > pin accordingly which seems not to be the case at least in Asus X540NA > SKU3 with Focaltech touchpad. > > When the pin is not properly configured it might result weird behaviour > like interrupts suddenly stop firing completely and the touchpad stops > responding to user input. > > Fix this by properly initializing the pin to GPIO mode also when it is > used directly through irqchip. > > Reported-by: Daniel Drake > Reported-by: Chris Chiu > Signed-off-by: Mika Westerberg > --- > Chris, Daniel, > > Could you check that this still solves the issue and maybe provide your > Tested-by? Thanks! > I've verified on X540NA here and it's working fine w/o any abnormal touchpad stop. Tested-by: Chris Chiu
Re: [PATCH] scsi: csiostor: remove unneeded DRIVER_LICENSE #define
Greg, > There is no need to #define the license of the driver, just put it in > the MODULE_LICENSE() line directly as a text string. > > This allows tools that check that the module license matches the source > code license to work properly, as there is no need to unwind the > unneeded dereference, especially when the string is defined in a .h file > far away from the .c file it is used in. Applied to 4.16/scsi-queue, thanks! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] scsi: csiostor: remove unneeded DRIVER_LICENSE #define
Greg, > There is no need to #define the license of the driver, just put it in > the MODULE_LICENSE() line directly as a text string. > > This allows tools that check that the module license matches the source > code license to work properly, as there is no need to unwind the > unneeded dereference, especially when the string is defined in a .h file > far away from the .c file it is used in. Applied to 4.16/scsi-queue, thanks! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH 3/3] scsi: 3w-9xxx: rework lock timeouts
Arnd, > The TW_IOCTL_GET_LOCK ioctl uses do_gettimeofday() to check whether a > lock has expired. This can misbehave due to a concurrent > settimeofday() call, as it is based on 'real' time, and it will > overflow in y2038 on 32-bit architectures, producing unexpected > results when used across the overflow time. > > This changes it to using monotonic time, using ktime_get() to simplify > the code. Applied to 4.16/scsi-queue, thanks! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH 3/3] scsi: 3w-9xxx: rework lock timeouts
Arnd, > The TW_IOCTL_GET_LOCK ioctl uses do_gettimeofday() to check whether a > lock has expired. This can misbehave due to a concurrent > settimeofday() call, as it is based on 'real' time, and it will > overflow in y2038 on 32-bit architectures, producing unexpected > results when used across the overflow time. > > This changes it to using monotonic time, using ktime_get() to simplify > the code. Applied to 4.16/scsi-queue, thanks! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH 2/3] scsi: 3ware: use 64-bit times for FW time sync
Arnd, > The calculation of the number of seconds since Sunday 00:00:00 > overflows in 2106, meaning that we instead will return the seconds > since Wednesday 06:28:16 afterwards. > > Using 64-bit time stamps avoids this slight inconsistency, and the > deprecated do_gettimeofday(), replacing it with the simpler > ktime_get_real_seconds(). Applied to 4.16/scsi-queue. -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH 2/3] scsi: 3ware: use 64-bit times for FW time sync
Arnd, > The calculation of the number of seconds since Sunday 00:00:00 > overflows in 2106, meaning that we instead will return the seconds > since Wednesday 06:28:16 afterwards. > > Using 64-bit time stamps avoids this slight inconsistency, and the > deprecated do_gettimeofday(), replacing it with the simpler > ktime_get_real_seconds(). Applied to 4.16/scsi-queue. -- Martin K. Petersen Oracle Linux Engineering
[PATCH v3] x86/umip: Warn if UMIP-protected instructions are used
Print a rate-limited warning when a user space program attempts to execute any of the instructions that UMIP protects (i.e., SGDT, SIDT, SLDT, STR and SMSW). This is useful because, when CONFIG_X86_INTEL_UMIP is selected and supported by the hardware, user space programs that try to execute such instructions will receive a SIGSEGV signal that they might not expect. In the specific cases for which emulation is provided (instructions SGDT, SIDT and SMSW in protected and virtual-8086 modes), no signal is generated. However, a warning is helpful to encourage updates in such programs to avoid the use of such instructions. Warnings are printed via a customized printk() function that also provides information about the program that attempted to use the affected instructions. Utility macros are defined to wrap umip_printk() for the error and warning kernel log levels. While here, replace an existing call to the generic rate-limited pr_err() with the new umip_pr_err(). Cc: Andy LutomirskiCc: H. Peter Anvin Cc: Borislav Petkov Cc: Tony Luck Cc: Paolo Bonzini Cc: Ravi V. Shankar Cc: x...@kernel.org Suggested-by: Linus Torvalds Signed-off-by: Ricardo Neri --- Changes since V2: * Dropped patches 1, 2, and 3 as they were applied on the tip repository. * Renamed umip_pr_warn() to umip_printk(). Instead, added utility macros for the error and warning kernel log levels. * Implemented umip_printk() as a variadic function. This also includes using the __printf() macro to type-check arguments of the format string and arguments. * Capitalized all the instructions' mnemonics in strings used for printing warnings. Changes since V1: * Capitalized all the instructions' mnemonics in both code and patch descriptions. * Corrected documentation of umip_pr_warn() to correctly reflect the function name. * Updated description of patch #4 to describe the update to the existing rate-limited pr_err(). --- arch/x86/kernel/umip.c | 62 ++ 1 file changed, 58 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c index 1f1f2d5..dabbac3 100644 --- a/arch/x86/kernel/umip.c +++ b/arch/x86/kernel/umip.c @@ -82,6 +82,57 @@ #defineUMIP_INST_SLDT 3 /* 0F 00 /0 */ #defineUMIP_INST_STR 4 /* 0F 00 /1 */ +const char * const umip_insns[5] = { + [UMIP_INST_SGDT] = "SGDT", + [UMIP_INST_SIDT] = "SIDT", + [UMIP_INST_SMSW] = "SMSW", + [UMIP_INST_SLDT] = "SLDT", + [UMIP_INST_STR] = "STR", +}; + +#define umip_pr_err(regs, fmt, ...) \ + umip_printk(regs, KERN_ERR, fmt, ##__VA_ARGS__) +#define umip_pr_warning(regs, fmt, ...) \ + umip_printk(regs, KERN_WARNING, fmt, ##__VA_ARGS__) + +/** + * umip_printk() - Print a rate-limited message + * @regs: Register set with the context in which the warning is printed + * @log_level: Kernel log level to print the message + * @fmt: The text string to print + * + * Print the text contained in @fmt. The print rate is limited to bursts of 5 + * messages every two minutes. The purpose of this customized version of + * printk() is to print messages when user space processes use any of the + * UMIP-protected instructions. Thus, the printed text is prepended with the + * task name and process ID number of the current task as well as the + * instruction and stack pointers in @regs as seen when entering kernel mode. + * + * Returns: + * + * None. + */ +static __printf(3, 4) +void umip_printk(const struct pt_regs *regs, const char *log_level, +const char *fmt, ...) +{ + /* Bursts of 5 messages every two minutes */ + static DEFINE_RATELIMIT_STATE(ratelimit, 2 * 60 * HZ, 5); + struct task_struct *tsk = current; + struct va_format vaf; + va_list args; + + if (!__ratelimit()) + return; + + va_start(args, fmt); + vaf.fmt = fmt; + vaf.va = + printk("%s" pr_fmt("%s[%d] ip:%lx sp:%lx: %pV"), log_level, tsk->comm, + task_pid_nr(tsk), regs->ip, regs->sp, ); + va_end(args); +} + /** * identify_insn() - Identify a UMIP-protected instruction * @insn: Instruction structure with opcode and ModRM byte. @@ -236,10 +287,8 @@ static void force_sig_info_umip_fault(void __user *addr, struct pt_regs *regs) if (!(show_unhandled_signals && unhandled_signal(tsk, SIGSEGV))) return; - pr_err_ratelimited("%s[%d] umip emulation segfault ip:%lx sp:%lx error:%x in %lx\n", - tsk->comm, task_pid_nr(tsk), regs->ip, - regs->sp, X86_PF_USER | X86_PF_WRITE, - regs->ip); + umip_pr_err(regs, "segfault in emulation. error%x\n", + X86_PF_USER |
[PATCH v3] x86/umip: Warn if UMIP-protected instructions are used
Print a rate-limited warning when a user space program attempts to execute any of the instructions that UMIP protects (i.e., SGDT, SIDT, SLDT, STR and SMSW). This is useful because, when CONFIG_X86_INTEL_UMIP is selected and supported by the hardware, user space programs that try to execute such instructions will receive a SIGSEGV signal that they might not expect. In the specific cases for which emulation is provided (instructions SGDT, SIDT and SMSW in protected and virtual-8086 modes), no signal is generated. However, a warning is helpful to encourage updates in such programs to avoid the use of such instructions. Warnings are printed via a customized printk() function that also provides information about the program that attempted to use the affected instructions. Utility macros are defined to wrap umip_printk() for the error and warning kernel log levels. While here, replace an existing call to the generic rate-limited pr_err() with the new umip_pr_err(). Cc: Andy Lutomirski Cc: H. Peter Anvin Cc: Borislav Petkov Cc: Tony Luck Cc: Paolo Bonzini Cc: Ravi V. Shankar Cc: x...@kernel.org Suggested-by: Linus Torvalds Signed-off-by: Ricardo Neri --- Changes since V2: * Dropped patches 1, 2, and 3 as they were applied on the tip repository. * Renamed umip_pr_warn() to umip_printk(). Instead, added utility macros for the error and warning kernel log levels. * Implemented umip_printk() as a variadic function. This also includes using the __printf() macro to type-check arguments of the format string and arguments. * Capitalized all the instructions' mnemonics in strings used for printing warnings. Changes since V1: * Capitalized all the instructions' mnemonics in both code and patch descriptions. * Corrected documentation of umip_pr_warn() to correctly reflect the function name. * Updated description of patch #4 to describe the update to the existing rate-limited pr_err(). --- arch/x86/kernel/umip.c | 62 ++ 1 file changed, 58 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c index 1f1f2d5..dabbac3 100644 --- a/arch/x86/kernel/umip.c +++ b/arch/x86/kernel/umip.c @@ -82,6 +82,57 @@ #defineUMIP_INST_SLDT 3 /* 0F 00 /0 */ #defineUMIP_INST_STR 4 /* 0F 00 /1 */ +const char * const umip_insns[5] = { + [UMIP_INST_SGDT] = "SGDT", + [UMIP_INST_SIDT] = "SIDT", + [UMIP_INST_SMSW] = "SMSW", + [UMIP_INST_SLDT] = "SLDT", + [UMIP_INST_STR] = "STR", +}; + +#define umip_pr_err(regs, fmt, ...) \ + umip_printk(regs, KERN_ERR, fmt, ##__VA_ARGS__) +#define umip_pr_warning(regs, fmt, ...) \ + umip_printk(regs, KERN_WARNING, fmt, ##__VA_ARGS__) + +/** + * umip_printk() - Print a rate-limited message + * @regs: Register set with the context in which the warning is printed + * @log_level: Kernel log level to print the message + * @fmt: The text string to print + * + * Print the text contained in @fmt. The print rate is limited to bursts of 5 + * messages every two minutes. The purpose of this customized version of + * printk() is to print messages when user space processes use any of the + * UMIP-protected instructions. Thus, the printed text is prepended with the + * task name and process ID number of the current task as well as the + * instruction and stack pointers in @regs as seen when entering kernel mode. + * + * Returns: + * + * None. + */ +static __printf(3, 4) +void umip_printk(const struct pt_regs *regs, const char *log_level, +const char *fmt, ...) +{ + /* Bursts of 5 messages every two minutes */ + static DEFINE_RATELIMIT_STATE(ratelimit, 2 * 60 * HZ, 5); + struct task_struct *tsk = current; + struct va_format vaf; + va_list args; + + if (!__ratelimit()) + return; + + va_start(args, fmt); + vaf.fmt = fmt; + vaf.va = + printk("%s" pr_fmt("%s[%d] ip:%lx sp:%lx: %pV"), log_level, tsk->comm, + task_pid_nr(tsk), regs->ip, regs->sp, ); + va_end(args); +} + /** * identify_insn() - Identify a UMIP-protected instruction * @insn: Instruction structure with opcode and ModRM byte. @@ -236,10 +287,8 @@ static void force_sig_info_umip_fault(void __user *addr, struct pt_regs *regs) if (!(show_unhandled_signals && unhandled_signal(tsk, SIGSEGV))) return; - pr_err_ratelimited("%s[%d] umip emulation segfault ip:%lx sp:%lx error:%x in %lx\n", - tsk->comm, task_pid_nr(tsk), regs->ip, - regs->sp, X86_PF_USER | X86_PF_WRITE, - regs->ip); + umip_pr_err(regs, "segfault in emulation. error%x\n", + X86_PF_USER | X86_PF_WRITE); } /** @@ -326,10 +375,15 @@ bool fixup_umip_exception(struct pt_regs *regs) if (umip_inst < 0) return false; + umip_pr_warning(regs, "%s
Re: [PATCH 1/3] scsi: 3ware: fix 32-bit time calculations
Arnd, > twl_aen_queue_event/twa_aen_queue_event, we use do_gettimeofday() > to read the lower 32 bits of the current time in seconds, to pass > them to the TW_IOCTL_GET_NEXT_EVENT ioctl or the 3ware_aen_read > sysfs file. > > This will overflow on all architectures in year 2106, there is > not much we can do about that without breaking the ABI. User > space has 90 years to learn to deal with it, so it's probably ok. > > I'm changing it to use ktime_get_real_seconds() with a comment > to document what happens when. Applied to 4.16/scsi-queue. -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH 1/3] scsi: 3ware: fix 32-bit time calculations
Arnd, > twl_aen_queue_event/twa_aen_queue_event, we use do_gettimeofday() > to read the lower 32 bits of the current time in seconds, to pass > them to the TW_IOCTL_GET_NEXT_EVENT ioctl or the 3ware_aen_read > sysfs file. > > This will overflow on all architectures in year 2106, there is > not much we can do about that without breaking the ABI. User > space has 90 years to learn to deal with it, so it's probably ok. > > I'm changing it to use ktime_get_real_seconds() with a comment > to document what happens when. Applied to 4.16/scsi-queue. -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] [man-pages] adjtimex.2: document clock_adjtime
On Mon, Nov 20, 2017 at 11:53:02PM +0100, Arnd Bergmann wrote: > .B EINVAL > +The > +.I clk_id > +specified is not supported on this system. We return EINVAL when the clockid is not valid. That can mean two things. Either the SYS-V style hard coded positive clockid is out of range, or the dynamic negative clockid does not refer to a valid instance of a clock object. Dynamic clocks might also return ENODEV in case a hot-plugable device (like USB) has disappeared after its character device was opened. Thanks, Richard
Re: [PATCH 0/7] scsi: bfa: do_gettimeofday removal
Arnd, > The bfa driver is one of the main users of do_gettimeofday(), a > function that I'm trying to remove as part of the y2038 cleanup. > > The timestamps are all uses in slightly different ways, so this has > turned into a rather longish series for doing something that should be > simple. > > The last patch in the series ("scsi: bfa: use 64-bit times in > bfa_aen_entry_s ABI") is one that needs to be reviewed very carefully, > and it can be skipped if the maintainers prefer to leave the 32-bit > ABI unchanged, the rest are hopefully fairly straightforward. Applied to 4.16/scsi-queue, thanks! Will drop #7 if something breaks. -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] [man-pages] adjtimex.2: document clock_adjtime
On Mon, Nov 20, 2017 at 11:53:02PM +0100, Arnd Bergmann wrote: > .B EINVAL > +The > +.I clk_id > +specified is not supported on this system. We return EINVAL when the clockid is not valid. That can mean two things. Either the SYS-V style hard coded positive clockid is out of range, or the dynamic negative clockid does not refer to a valid instance of a clock object. Dynamic clocks might also return ENODEV in case a hot-plugable device (like USB) has disappeared after its character device was opened. Thanks, Richard
Re: [PATCH 0/7] scsi: bfa: do_gettimeofday removal
Arnd, > The bfa driver is one of the main users of do_gettimeofday(), a > function that I'm trying to remove as part of the y2038 cleanup. > > The timestamps are all uses in slightly different ways, so this has > turned into a rather longish series for doing something that should be > simple. > > The last patch in the series ("scsi: bfa: use 64-bit times in > bfa_aen_entry_s ABI") is one that needs to be reviewed very carefully, > and it can be skipped if the maintainers prefer to leave the 32-bit > ABI unchanged, the rest are hopefully fairly straightforward. Applied to 4.16/scsi-queue, thanks! Will drop #7 if something breaks. -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH 0/13] scsi: arcmsr: add some driver options and support new adapter ARC-1884
Ching, > The following patches apply to Martin's 4.15/scsi-queue. Applied to 4.16/scsi-queue. Thank you! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH 0/13] scsi: arcmsr: add some driver options and support new adapter ARC-1884
Ching, > The following patches apply to Martin's 4.15/scsi-queue. Applied to 4.16/scsi-queue. Thank you! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH 1/6] mmc: renesas_sdhc: remove eprobe jump label
2017-11-21 5:31 GMT+09:00 Wolfram Sang: > >> - ret = PTR_ERR(priv->clk); >> - dev_err(>dev, "cannot get clock: %d\n", ret); >> - goto eprobe; >> + dev_err(>dev, "cannot get clock\n"); >> + return PTR_ERR(priv->clk); > > Why dropping the 'ret' printout? Will it be printed by the core? > No. I just wanted to save "ret = PTR_ERR(priv->clk)" line. I will restore the printout as follows: priv->clk = devm_clk_get(>dev, NULL); if (IS_ERR(priv->clk)) { ret = PTR_ERR(priv->clk); dev_err(>dev, "cannot get clock: %d\n", ret); return ret; } -- Best Regards Masahiro Yamada
Re: [PATCH 1/6] mmc: renesas_sdhc: remove eprobe jump label
2017-11-21 5:31 GMT+09:00 Wolfram Sang : > >> - ret = PTR_ERR(priv->clk); >> - dev_err(>dev, "cannot get clock: %d\n", ret); >> - goto eprobe; >> + dev_err(>dev, "cannot get clock\n"); >> + return PTR_ERR(priv->clk); > > Why dropping the 'ret' printout? Will it be printed by the core? > No. I just wanted to save "ret = PTR_ERR(priv->clk)" line. I will restore the printout as follows: priv->clk = devm_clk_get(>dev, NULL); if (IS_ERR(priv->clk)) { ret = PTR_ERR(priv->clk); dev_err(>dev, "cannot get clock: %d\n", ret); return ret; } -- Best Regards Masahiro Yamada
linux-next: Tree for Nov 21
Hi all, Please do not add any v4.16 material to your linux-next included trees until v4.15-rc1 has been released. Changes since 20171120: I have removed the following trees since they have not been updated in over a year: backlight-fixes berlin binfmt_misc borntraeger cortex-m dax-misc dma-buf drm-exynos freevxfs fscache ftrace-fixes h8300 hexagon hwspinlock ia64 jdelvare-hwmon kgdb luto-misc mpc5xxx omap-pending omap_dss2 pcmcia rpi target-merge unicore32 v9fs The akpm-current tree still had its build failure for which I applied a fix patch. Non-merge commits (relative to Linus' tree): 807 835 files changed, 14447 insertions(+), 7649 deletions(-) I have created today's linux-next tree at git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you are tracking the linux-next tree using git, you should not use "git pull" to do so as that will try to merge the new linux-next release with the old one. You should use "git fetch" and checkout or reset to the new master. You can see which trees have been included by looking in the Next/Trees file in the source. There are also quilt-import.log and merge.log files in the Next directory. Between each merge, the tree was built with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a multi_v7_defconfig for arm and a native build of tools/perf. After the final fixups (if any), I do an x86_64 modules_install followed by builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc and sparc64 defconfig. And finally, a simple boot test of the powerpc pseries_le_defconfig kernel in qemu (with and without kvm enabled). Below is a summary of the state of the merge. I am currently merging 245 trees (counting Linus' and 40 trees of bug fix patches pending for the current merge release). Stats about the size of the tree over time can be seen at http://neuling.org/linux-next-size.html . Status of my local build tests will be at http://kisskb.ellerman.id.au/linux-next . If maintainers want to give advice about cross compilers/configs that work, we are always open to add more builds. Thanks to Randy Dunlap for doing many randconfig builds. And to Paul Gortmaker for triage and bug fixes. -- Cheers, Stephen Rothwell $ git checkout master $ git reset --hard stable Merging origin/master (c8a0739b185d Merge tag 'ntb-4.15' of git://github.com/jonmason/ntb) Merging fixes/master (820bf5c419e4 Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi) Merging kbuild-current/fixes (bb3f38c3c5b7 kbuild: clang: fix build failures with sparse check) Merging arc-current/for-curr (f3156851616b ARCv2: boot log: updates for HS48: dual-issue, ECC, Loop Buffer) Merging arm-current/fixes (b9dd05c7002e ARM: 8720/1: ensure dump_instr() checks addr_limit) Merging m68k-current/for-linus (5e387199c17c m68k/defconfig: Update defconfigs for v4.14-rc7) Merging metag-fixes/fixes (b884a190afce metag/usercopy: Add missing fixups) Merging powerpc-fixes/fixes (7ecb37f62fe5 powerpc/perf: Fix core-imc hotplug callback failure during imc initialization) Merging sparc/master (1deab8ce2c91 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc) Merging fscrypt-current/for-stable (42d97eb0ade3 fscrypt: fix renaming and linking special files) Merging net/master (32a72bbd5da2 net: vxge: Fix some indentation issues) Merging ipsec/master (94802151894d Revert "xfrm: Fix stack-out-of-bounds read in xfrm_state_find.") Merging netfilter/master (fbcd253d2448 netfilter: conntrack: lower timeout to RETRANS seconds if window is 0) Merging ipvs/master (f7fb77fc1235 netfilter: nft_compat: check extension hook mask only if set) Merging wireless-drivers/master (1514f6fc1369 Merge tag 'iwlwifi-for-kalle-2017-11-19' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes) Merging mac80211/master (33ddd81e2bd5 mac80211: properly free requested-but-not-started TX agg sessions) Merging sound-current/for-linus (0ce48e172797 ALSA: hda/realtek: Add headset mic support for Intel NUC Skull Canyon) Merging pci-current/for-linus (1b6115fbe3b3 Merge tag 'pci-v4.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci) Merging driver-core.current/driver-core-linus (cf9b0772f2e4 Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc) Merging tty.current/tty-linus (894025f24bd0 Merge tag 'usb-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb) Merging usb.current/usb-linus (894025f24bd0 Merge tag 'usb-4.15-rc1' of git:
linux-next: Tree for Nov 21
Hi all, Please do not add any v4.16 material to your linux-next included trees until v4.15-rc1 has been released. Changes since 20171120: I have removed the following trees since they have not been updated in over a year: backlight-fixes berlin binfmt_misc borntraeger cortex-m dax-misc dma-buf drm-exynos freevxfs fscache ftrace-fixes h8300 hexagon hwspinlock ia64 jdelvare-hwmon kgdb luto-misc mpc5xxx omap-pending omap_dss2 pcmcia rpi target-merge unicore32 v9fs The akpm-current tree still had its build failure for which I applied a fix patch. Non-merge commits (relative to Linus' tree): 807 835 files changed, 14447 insertions(+), 7649 deletions(-) I have created today's linux-next tree at git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you are tracking the linux-next tree using git, you should not use "git pull" to do so as that will try to merge the new linux-next release with the old one. You should use "git fetch" and checkout or reset to the new master. You can see which trees have been included by looking in the Next/Trees file in the source. There are also quilt-import.log and merge.log files in the Next directory. Between each merge, the tree was built with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a multi_v7_defconfig for arm and a native build of tools/perf. After the final fixups (if any), I do an x86_64 modules_install followed by builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc and sparc64 defconfig. And finally, a simple boot test of the powerpc pseries_le_defconfig kernel in qemu (with and without kvm enabled). Below is a summary of the state of the merge. I am currently merging 245 trees (counting Linus' and 40 trees of bug fix patches pending for the current merge release). Stats about the size of the tree over time can be seen at http://neuling.org/linux-next-size.html . Status of my local build tests will be at http://kisskb.ellerman.id.au/linux-next . If maintainers want to give advice about cross compilers/configs that work, we are always open to add more builds. Thanks to Randy Dunlap for doing many randconfig builds. And to Paul Gortmaker for triage and bug fixes. -- Cheers, Stephen Rothwell $ git checkout master $ git reset --hard stable Merging origin/master (c8a0739b185d Merge tag 'ntb-4.15' of git://github.com/jonmason/ntb) Merging fixes/master (820bf5c419e4 Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi) Merging kbuild-current/fixes (bb3f38c3c5b7 kbuild: clang: fix build failures with sparse check) Merging arc-current/for-curr (f3156851616b ARCv2: boot log: updates for HS48: dual-issue, ECC, Loop Buffer) Merging arm-current/fixes (b9dd05c7002e ARM: 8720/1: ensure dump_instr() checks addr_limit) Merging m68k-current/for-linus (5e387199c17c m68k/defconfig: Update defconfigs for v4.14-rc7) Merging metag-fixes/fixes (b884a190afce metag/usercopy: Add missing fixups) Merging powerpc-fixes/fixes (7ecb37f62fe5 powerpc/perf: Fix core-imc hotplug callback failure during imc initialization) Merging sparc/master (1deab8ce2c91 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc) Merging fscrypt-current/for-stable (42d97eb0ade3 fscrypt: fix renaming and linking special files) Merging net/master (32a72bbd5da2 net: vxge: Fix some indentation issues) Merging ipsec/master (94802151894d Revert "xfrm: Fix stack-out-of-bounds read in xfrm_state_find.") Merging netfilter/master (fbcd253d2448 netfilter: conntrack: lower timeout to RETRANS seconds if window is 0) Merging ipvs/master (f7fb77fc1235 netfilter: nft_compat: check extension hook mask only if set) Merging wireless-drivers/master (1514f6fc1369 Merge tag 'iwlwifi-for-kalle-2017-11-19' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes) Merging mac80211/master (33ddd81e2bd5 mac80211: properly free requested-but-not-started TX agg sessions) Merging sound-current/for-linus (0ce48e172797 ALSA: hda/realtek: Add headset mic support for Intel NUC Skull Canyon) Merging pci-current/for-linus (1b6115fbe3b3 Merge tag 'pci-v4.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci) Merging driver-core.current/driver-core-linus (cf9b0772f2e4 Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc) Merging tty.current/tty-linus (894025f24bd0 Merge tag 'usb-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb) Merging usb.current/usb-linus (894025f24bd0 Merge tag 'usb-4.15-rc1' of git:
Re: [PATCH 13/16] x86/entry/64: Create a percpu SYSCALL entry trampoline
On Mon, Nov 20, 2017 at 09:07:44AM -0800, Andy Lutomirski wrote: > + /* Save RDI, since we need a scratch register. */ > + pushq %rdi > + > + /* > + * x86 lacks a near absolute jump, and we can't jump to the real > + * entry text with a relative jump, so we use a double trampoline. > + */ > + movq$entry_SYSCALL_64_stage2, %rdi > + jmp *%rdi > +END(entry_SYSCALL_64_trampoline) > + > + .popsection > + > +ENTRY(entry_SYSCALL_64_stage2) > + /* > + * Rather than polluting the normal SYSCALL path with stack switching > + * nonsense, fix up our register state to match its expectations. > + */ > + UNWIND_HINT_EMPTY > + popq %rdi > + jmp entry_SYSCALL_64_after_hwframe > +END(entry_SYSCALL_64_stage2) Is there a reason why you couldn't just do the following? pushq $entry_SYSCALL_64_after_hwframe ret Then you wouldn't need the 2nd trampoline and the %rdi clobber. -- Josh
Re: [PATCH 13/16] x86/entry/64: Create a percpu SYSCALL entry trampoline
On Mon, Nov 20, 2017 at 09:07:44AM -0800, Andy Lutomirski wrote: > + /* Save RDI, since we need a scratch register. */ > + pushq %rdi > + > + /* > + * x86 lacks a near absolute jump, and we can't jump to the real > + * entry text with a relative jump, so we use a double trampoline. > + */ > + movq$entry_SYSCALL_64_stage2, %rdi > + jmp *%rdi > +END(entry_SYSCALL_64_trampoline) > + > + .popsection > + > +ENTRY(entry_SYSCALL_64_stage2) > + /* > + * Rather than polluting the normal SYSCALL path with stack switching > + * nonsense, fix up our register state to match its expectations. > + */ > + UNWIND_HINT_EMPTY > + popq %rdi > + jmp entry_SYSCALL_64_after_hwframe > +END(entry_SYSCALL_64_stage2) Is there a reason why you couldn't just do the following? pushq $entry_SYSCALL_64_after_hwframe ret Then you wouldn't need the 2nd trampoline and the %rdi clobber. -- Josh
Re: [PATCH] scsi: ppa: mark expected switch fall-throughs
Gustavo A., > In preparation to enabling -Wimplicit-fallthrough, mark switch cases > where we are expecting to fall through. Applied to 4.16/scsi-queue. -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] scsi: ppa: mark expected switch fall-throughs
Gustavo A., > In preparation to enabling -Wimplicit-fallthrough, mark switch cases > where we are expecting to fall through. Applied to 4.16/scsi-queue. -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] fnic: use 64-bit timestamps
> struct timespec is deprecated since it overflows in 2038 on 32-bit > architectures, so we should use timespec64 consistently. > > I'm slightly adapting the format strings here, to make sure we print > the nanoseconds with the correct number of leading zeroes. Satish: Please review/test. Thank you! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] fnic: use 64-bit timestamps
> struct timespec is deprecated since it overflows in 2038 on 32-bit > architectures, so we should use timespec64 consistently. > > I'm slightly adapting the format strings here, to make sure we print > the nanoseconds with the correct number of leading zeroes. Satish: Please review/test. Thank you! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH 02/16] x86/dumpstack: Add get_stack_info() support for the SYSENTER stack
On Mon, Nov 20, 2017 at 05:39:34PM -0800, Andy Lutomirski wrote: > On Mon, Nov 20, 2017 at 1:55 PM, Josh Poimboeufwrote: > > On Mon, Nov 20, 2017 at 01:30:12PM -0800, Andy Lutomirski wrote: > >> On Mon, Nov 20, 2017 at 1:27 PM, Josh Poimboeuf > >> wrote: > >> > On Mon, Nov 20, 2017 at 01:07:16PM -0800, Andy Lutomirski wrote: > >> >> >> but, more importantly, the OOPS unwinder will just bail without this > >> >> >> patch. With the patch, we get a valid unwind, except that everything > >> >> >> has a ? in front. > >> >> > > >> >> > Hm. I can't even fathom how that's possible. Are you talking about > >> >> > the > >> >> > "unwind from NMI to SYSENTER stack" path? Or any unwind to a syscall? > >> >> > Either way I'm baffled... If the unwinder only encounters the > >> >> > SYSENTER > >> >> > stack at the end, how could that cause everything beforehand to have a > >> >> > question mark? > >> >> > >> >> I mean that, if I put a ud2 or other bug in the code that runs on the > >> >> SYSENTER stack, without this patch, I get a totally blank call trace. > >> > > >> > I would expect a blank call trace either way... > >> > >> Try making sync_regs use a few kB of stack space or, better yet, call > >> a non-inlined function that uses too much stack. > > > > You mean overflow the exception stack? I still don't see how that would > > do it. > > > > If you could show a specific example, with splats from before/after, > > that would be helpful. Because I still have no idea how this patch > > could possibly help. > > I added BUG() to sync_regs(). With the patch, I get: > > [4.211553] PANIC: double fault, error_code: 0x0 > [4.212113] CPU: 0 PID: 1 Comm: sh Not tainted 4.14.0+ #920 > [4.212741] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), > BIOS 1.10.2-1.fc26 04/01/2014 > [4.213536] task: 88001aa18000 task.stack: 88001aa2 > [4.214059] RIP: 0010:do_error_trap+0x33/0x1c0 > [4.214449] RSP: :ff1b8f78 EFLAGS: 00010096 > [4.214934] RAX: dc00 RBX: ff1b8f90 RCX: > 0006 > [4.215554] RDX: 82048b20 RSI: RDI: > ff1b9110 > [4.216176] RBP: ff1b9088 R08: 0004 R09: > > [4.216793] R10: R11: fbe3723f R12: > 0006 > [4.217419] R13: R14: 0004 R15: > > [4.218046] FS: () GS:88001ae0() > knlGS: > [4.218775] CS: 0010 DS: ES: CR0: 80050033 > [4.219280] CR2: ff1b8f68 CR3: 193da002 CR4: > 003606f0 > [4.219931] Call Trace: > [4.220156] > [4.220383] ? async_page_fault+0x36/0x60 > [4.220768] ? invalid_op+0x22/0x40 > [4.221087] ? async_page_fault+0x36/0x60 > [4.221442] ? sync_regs+0x3c/0x40 > [4.221745] ? sync_regs+0x2e/0x40 > [4.222051] ? error_entry+0x6c/0xd0 > [4.222395] ? async_page_fault+0x36/0x60 > [4.222748] Ah, page fault. I thought you were talking about an NMI. I get it now. Did it overflow the stack? I think that would explain the question marks. -- Josh
Re: [PATCH 02/16] x86/dumpstack: Add get_stack_info() support for the SYSENTER stack
On Mon, Nov 20, 2017 at 05:39:34PM -0800, Andy Lutomirski wrote: > On Mon, Nov 20, 2017 at 1:55 PM, Josh Poimboeuf wrote: > > On Mon, Nov 20, 2017 at 01:30:12PM -0800, Andy Lutomirski wrote: > >> On Mon, Nov 20, 2017 at 1:27 PM, Josh Poimboeuf > >> wrote: > >> > On Mon, Nov 20, 2017 at 01:07:16PM -0800, Andy Lutomirski wrote: > >> >> >> but, more importantly, the OOPS unwinder will just bail without this > >> >> >> patch. With the patch, we get a valid unwind, except that everything > >> >> >> has a ? in front. > >> >> > > >> >> > Hm. I can't even fathom how that's possible. Are you talking about > >> >> > the > >> >> > "unwind from NMI to SYSENTER stack" path? Or any unwind to a syscall? > >> >> > Either way I'm baffled... If the unwinder only encounters the > >> >> > SYSENTER > >> >> > stack at the end, how could that cause everything beforehand to have a > >> >> > question mark? > >> >> > >> >> I mean that, if I put a ud2 or other bug in the code that runs on the > >> >> SYSENTER stack, without this patch, I get a totally blank call trace. > >> > > >> > I would expect a blank call trace either way... > >> > >> Try making sync_regs use a few kB of stack space or, better yet, call > >> a non-inlined function that uses too much stack. > > > > You mean overflow the exception stack? I still don't see how that would > > do it. > > > > If you could show a specific example, with splats from before/after, > > that would be helpful. Because I still have no idea how this patch > > could possibly help. > > I added BUG() to sync_regs(). With the patch, I get: > > [4.211553] PANIC: double fault, error_code: 0x0 > [4.212113] CPU: 0 PID: 1 Comm: sh Not tainted 4.14.0+ #920 > [4.212741] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), > BIOS 1.10.2-1.fc26 04/01/2014 > [4.213536] task: 88001aa18000 task.stack: 88001aa2 > [4.214059] RIP: 0010:do_error_trap+0x33/0x1c0 > [4.214449] RSP: :ff1b8f78 EFLAGS: 00010096 > [4.214934] RAX: dc00 RBX: ff1b8f90 RCX: > 0006 > [4.215554] RDX: 82048b20 RSI: RDI: > ff1b9110 > [4.216176] RBP: ff1b9088 R08: 0004 R09: > > [4.216793] R10: R11: fbe3723f R12: > 0006 > [4.217419] R13: R14: 0004 R15: > > [4.218046] FS: () GS:88001ae0() > knlGS: > [4.218775] CS: 0010 DS: ES: CR0: 80050033 > [4.219280] CR2: ff1b8f68 CR3: 193da002 CR4: > 003606f0 > [4.219931] Call Trace: > [4.220156] > [4.220383] ? async_page_fault+0x36/0x60 > [4.220768] ? invalid_op+0x22/0x40 > [4.221087] ? async_page_fault+0x36/0x60 > [4.221442] ? sync_regs+0x3c/0x40 > [4.221745] ? sync_regs+0x2e/0x40 > [4.222051] ? error_entry+0x6c/0xd0 > [4.222395] ? async_page_fault+0x36/0x60 > [4.222748] Ah, page fault. I thought you were talking about an NMI. I get it now. Did it overflow the stack? I think that would explain the question marks. -- Josh
Re: [PATCH] scsi: bnx2i: bnx2i_hwi: use swap macro in bnx2i_send_iscsi_nopout
Gustavo A., > Make use of the swap macro and remove unnecessary variable tmp. > This makes the code easier to read and maintain. Applied to 4.16/scsi-queue. Thanks! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] scsi: bnx2i: bnx2i_hwi: use swap macro in bnx2i_send_iscsi_nopout
Gustavo A., > Make use of the swap macro and remove unnecessary variable tmp. > This makes the code easier to read and maintain. Applied to 4.16/scsi-queue. Thanks! -- Martin K. Petersen Oracle Linux Engineering
Re: STRICT_KERNEL_RWX on PPC32 is broken on PowerMac G4
On Sun, Nov 19, 2017 at 1:36 AM, LEROY Christophewrote: > Meelis Roos a écrit : > >>> > > How early does it hang ? Any oops or trace ? >>> > >>> > Very early - instead oif kernel emssages, I see some repeated gibberish >>> > of some characteers, and the background turns white. >>> > I am booting from yaboot, background is normally black. >>> >>> Ok, could you try by replacing #ifdef CONFIG_STRICT_KERNEL_RWX by #if 0 >>> in arch/powerpc/lib/code-patching.c >> >> >> With this change and CONFIG_STRICT_KERNEL_RWX=y, it still boots. >> >> BTW, I get these warnings (sorry for the word wrap from screen paste) - >> may they be related or rather not? >> >> WRAParch/powerpc/boot/zImage.pmac >> WRAParch/powerpc/boot/zImage.coff >> WRAParch/powerpc/boot/zImage.miboot >> INFO: Uncompressed kernel (size 0x5d4c3c) overlaps the address of the >> wrapper(0x40) >> INFO: Fixing the link_address of wrapper to (0x60) >> > > Then i believe there is something wrong with commit > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20171115=37bc3e5fd764fb258ff4fcbb90b6d1b67fb466c1 > > Balbir, do you have any idea ? > Hmm.. interesting, so nobats works, but code-patching has issues? Any chance you can boot with xmon=on on the command line when you drop into the xmon> prompt, type dl to get the kernel log. Balbir Singh.
Re: STRICT_KERNEL_RWX on PPC32 is broken on PowerMac G4
On Sun, Nov 19, 2017 at 1:36 AM, LEROY Christophe wrote: > Meelis Roos a écrit : > >>> > > How early does it hang ? Any oops or trace ? >>> > >>> > Very early - instead oif kernel emssages, I see some repeated gibberish >>> > of some characteers, and the background turns white. >>> > I am booting from yaboot, background is normally black. >>> >>> Ok, could you try by replacing #ifdef CONFIG_STRICT_KERNEL_RWX by #if 0 >>> in arch/powerpc/lib/code-patching.c >> >> >> With this change and CONFIG_STRICT_KERNEL_RWX=y, it still boots. >> >> BTW, I get these warnings (sorry for the word wrap from screen paste) - >> may they be related or rather not? >> >> WRAParch/powerpc/boot/zImage.pmac >> WRAParch/powerpc/boot/zImage.coff >> WRAParch/powerpc/boot/zImage.miboot >> INFO: Uncompressed kernel (size 0x5d4c3c) overlaps the address of the >> wrapper(0x40) >> INFO: Fixing the link_address of wrapper to (0x60) >> > > Then i believe there is something wrong with commit > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20171115=37bc3e5fd764fb258ff4fcbb90b6d1b67fb466c1 > > Balbir, do you have any idea ? > Hmm.. interesting, so nobats works, but code-patching has issues? Any chance you can boot with xmon=on on the command line when you drop into the xmon> prompt, type dl to get the kernel log. Balbir Singh.
[PATCH] mm: migrate: fix an incorrect call of prep_transhuge_page()
From: Zi YanIn [1], Andrea reported that during memory hotplug/hot remove prep_transhuge_page() is called incorrectly on non-THP pages for migration, when THP is on but THP migration is not enabled. This leads to a bad state of target pages for migration. This patch fixes it by only calling prep_transhuge_page() when we are certain that the target page is THP. [1] https://lkml.org/lkml/2017/11/20/411 Cc: sta...@vger.kernel.org # v4.14 Fixes: 8135d8926c08 ("mm: memory_hotplug: memory hotremove supports thp migration") Reported-by: Andrea Reale Signed-off-by: Zi Yan Cc: Naoya Horiguchi Cc: "Jérôme Glisse" --- include/linux/migrate.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/migrate.h b/include/linux/migrate.h index 895ec0c4942e..a2246cf670ba 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -54,7 +54,7 @@ static inline struct page *new_page_nodemask(struct page *page, new_page = __alloc_pages_nodemask(gfp_mask, order, preferred_nid, nodemask); - if (new_page && PageTransHuge(page)) + if (new_page && PageTransHuge(new_page)) prep_transhuge_page(new_page); return new_page; -- 2.14.2
[PATCH] mm: migrate: fix an incorrect call of prep_transhuge_page()
From: Zi Yan In [1], Andrea reported that during memory hotplug/hot remove prep_transhuge_page() is called incorrectly on non-THP pages for migration, when THP is on but THP migration is not enabled. This leads to a bad state of target pages for migration. This patch fixes it by only calling prep_transhuge_page() when we are certain that the target page is THP. [1] https://lkml.org/lkml/2017/11/20/411 Cc: sta...@vger.kernel.org # v4.14 Fixes: 8135d8926c08 ("mm: memory_hotplug: memory hotremove supports thp migration") Reported-by: Andrea Reale Signed-off-by: Zi Yan Cc: Naoya Horiguchi Cc: "Jérôme Glisse" --- include/linux/migrate.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/migrate.h b/include/linux/migrate.h index 895ec0c4942e..a2246cf670ba 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -54,7 +54,7 @@ static inline struct page *new_page_nodemask(struct page *page, new_page = __alloc_pages_nodemask(gfp_mask, order, preferred_nid, nodemask); - if (new_page && PageTransHuge(page)) + if (new_page && PageTransHuge(new_page)) prep_transhuge_page(new_page); return new_page; -- 2.14.2
[PATCH] x86/smpboot: set topology CPU mask before use.
From: Zhang Ningwe detect topology CPU mask in tsc is used before it is set, it leads to longer bootup time. let's check the code. smpboot.c:smp_callin() ---> calibarate.c:calibrate_delay() ---> tsc.c: calibrate_delay_is_known() ---> topology_core_cpumask(): read topology CPU mask ---> set_cpu_sibling_map(raw_smp_processor_id()) ---> cpumask_set_cpu(cpu, topology_core_cpumask(cpu)); from the calling chain, we know topology CPU mask is used before it actually set. So move set_cpu_sibling_map before calibrate_delay. Change-Id: I4eb8facb8751fe7aa2c6d2eac32437266d92ec00 Signed-off-by: Zhang Ning --- arch/x86/kernel/smpboot.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 36171bcd91f8..acee1ca3ef43 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -181,6 +181,12 @@ static void smp_callin(void) smp_store_cpu_info(cpuid); /* +* This must be done before setting cpu_online_mask +* or calling notify_cpu_starting. +*/ + set_cpu_sibling_map(raw_smp_processor_id()); + + /* * Get our bogomips. * Update loops_per_jiffy in cpu_data. Previous call to * smp_store_cpu_info() stored a value that is close but not as @@ -190,11 +196,7 @@ static void smp_callin(void) cpu_data(cpuid).loops_per_jiffy = loops_per_jiffy; pr_debug("Stack at about %p\n", ); - /* -* This must be done before setting cpu_online_mask -* or calling notify_cpu_starting. -*/ - set_cpu_sibling_map(raw_smp_processor_id()); + wmb(); notify_cpu_starting(cpuid); -- 2.11.0
[PATCH] x86/smpboot: set topology CPU mask before use.
From: Zhang Ning we detect topology CPU mask in tsc is used before it is set, it leads to longer bootup time. let's check the code. smpboot.c:smp_callin() ---> calibarate.c:calibrate_delay() ---> tsc.c: calibrate_delay_is_known() ---> topology_core_cpumask(): read topology CPU mask ---> set_cpu_sibling_map(raw_smp_processor_id()) ---> cpumask_set_cpu(cpu, topology_core_cpumask(cpu)); from the calling chain, we know topology CPU mask is used before it actually set. So move set_cpu_sibling_map before calibrate_delay. Change-Id: I4eb8facb8751fe7aa2c6d2eac32437266d92ec00 Signed-off-by: Zhang Ning --- arch/x86/kernel/smpboot.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 36171bcd91f8..acee1ca3ef43 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -181,6 +181,12 @@ static void smp_callin(void) smp_store_cpu_info(cpuid); /* +* This must be done before setting cpu_online_mask +* or calling notify_cpu_starting. +*/ + set_cpu_sibling_map(raw_smp_processor_id()); + + /* * Get our bogomips. * Update loops_per_jiffy in cpu_data. Previous call to * smp_store_cpu_info() stored a value that is close but not as @@ -190,11 +196,7 @@ static void smp_callin(void) cpu_data(cpuid).loops_per_jiffy = loops_per_jiffy; pr_debug("Stack at about %p\n", ); - /* -* This must be done before setting cpu_online_mask -* or calling notify_cpu_starting. -*/ - set_cpu_sibling_map(raw_smp_processor_id()); + wmb(); notify_cpu_starting(cpuid); -- 2.11.0
Re: [PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored
On Wed, May 10 2017, Ian Kent wrote: > The fstatat(2) and statx() calls can pass the flag AT_NO_AUTOMOUNT > which is meant to clear the LOOKUP_AUTOMOUNT flag and prevent triggering > of an automount by the call. But this flag is unconditionally cleared > for all stat family system calls except statx(). > > stat family system calls have always triggered mount requests for the > negative dentry case in follow_automount() which is intended but prevents > the fstatat(2) and statx() AT_NO_AUTOMOUNT case from being handled. > > In order to handle the AT_NO_AUTOMOUNT for both system calls the > negative dentry case in follow_automount() needs to be changed to > return ENOENT when the LOOKUP_AUTOMOUNT flag is clear (and the other > required flags are clear). > > AFAICT this change doesn't have any noticable side effects and may, > in some use cases (although I didn't see it in testing) prevent > unnecessary callbacks to the automount daemon. Actually, this patch does have a noticeable side effect. Previously, if /home were an indirect mount point so that /home/neilb would mount my home directory, "ls -l /home/neilb" would always work. Now it doesn't. This happens because "ls" calls 'lstat' on the path and when that fails with "ENOENT", it doesn't bother trying to open. An alternate approach to the problem that this patch addresses is to *not* change follow_automount() but instead change the automount daemon and possibly autofs. When a notification of access for an indirect mount point is received, it would firstly just create the directory - not triggering a mount. If another notification is then received (after the directory has been created), it then triggers the mount. I suspect this might need changes to autofs to avoid races when two threads call lstat() on the same path at the same time. We would need to ensure that automount didn't see this as two requests though maybe it already has enough locking. Does that seem reasonable? Should we revert this patch (as a regression) and do something in automount instead? Thanks, NeilBrown > > It's also possible that a stat family call has been made with a > path that is in the process of being mounted by some other process. > But stat family calls should return the automount state of the path > as it is "now" so it shouldn't wait for mount completion. > > This is the same semantic as the positive dentry case already > handled. > > Signed-off-by: Ian Kent> Cc: David Howells > Cc: Colin Walters > Cc: Ondrej Holy > Cc: sta...@vger.kernel.org > --- > fs/namei.c | 15 --- > include/linux/fs.h |3 +-- > 2 files changed, 13 insertions(+), 5 deletions(-) > > diff --git a/fs/namei.c b/fs/namei.c > index 7286f87..cd74838 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -1129,9 +1129,18 @@ static int follow_automount(struct path *path, struct > nameidata *nd, >* of the daemon to instantiate them before they can be used. >*/ > if (!(nd->flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY | > -LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) && > - path->dentry->d_inode) > - return -EISDIR; > +LOOKUP_OPEN | LOOKUP_CREATE | > +LOOKUP_AUTOMOUNT))) { > + /* Positive dentry that isn't meant to trigger an > + * automount, EISDIR will allow it to be used, > + * otherwise there's no mount here "now" so return > + * ENOENT. > + */ > + if (path->dentry->d_inode) > + return -EISDIR; > + else > + return -ENOENT; > + } > > if (path->dentry->d_sb->s_user_ns != _user_ns) > return -EACCES; > diff --git a/include/linux/fs.h b/include/linux/fs.h > index 26488b4..be09684 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -2935,8 +2935,7 @@ static inline int vfs_lstat(const char __user *name, > struct kstat *stat) > static inline int vfs_fstatat(int dfd, const char __user *filename, > struct kstat *stat, int flags) > { > - return vfs_statx(dfd, filename, flags | AT_NO_AUTOMOUNT, > - stat, STATX_BASIC_STATS); > + return vfs_statx(dfd, filename, flags, stat, STATX_BASIC_STATS); > } > static inline int vfs_fstat(int fd, struct kstat *stat) > { signature.asc Description: PGP signature
Re: [PATCH 3/3] autofs - fix AT_NO_AUTOMOUNT not being honored
On Wed, May 10 2017, Ian Kent wrote: > The fstatat(2) and statx() calls can pass the flag AT_NO_AUTOMOUNT > which is meant to clear the LOOKUP_AUTOMOUNT flag and prevent triggering > of an automount by the call. But this flag is unconditionally cleared > for all stat family system calls except statx(). > > stat family system calls have always triggered mount requests for the > negative dentry case in follow_automount() which is intended but prevents > the fstatat(2) and statx() AT_NO_AUTOMOUNT case from being handled. > > In order to handle the AT_NO_AUTOMOUNT for both system calls the > negative dentry case in follow_automount() needs to be changed to > return ENOENT when the LOOKUP_AUTOMOUNT flag is clear (and the other > required flags are clear). > > AFAICT this change doesn't have any noticable side effects and may, > in some use cases (although I didn't see it in testing) prevent > unnecessary callbacks to the automount daemon. Actually, this patch does have a noticeable side effect. Previously, if /home were an indirect mount point so that /home/neilb would mount my home directory, "ls -l /home/neilb" would always work. Now it doesn't. This happens because "ls" calls 'lstat' on the path and when that fails with "ENOENT", it doesn't bother trying to open. An alternate approach to the problem that this patch addresses is to *not* change follow_automount() but instead change the automount daemon and possibly autofs. When a notification of access for an indirect mount point is received, it would firstly just create the directory - not triggering a mount. If another notification is then received (after the directory has been created), it then triggers the mount. I suspect this might need changes to autofs to avoid races when two threads call lstat() on the same path at the same time. We would need to ensure that automount didn't see this as two requests though maybe it already has enough locking. Does that seem reasonable? Should we revert this patch (as a regression) and do something in automount instead? Thanks, NeilBrown > > It's also possible that a stat family call has been made with a > path that is in the process of being mounted by some other process. > But stat family calls should return the automount state of the path > as it is "now" so it shouldn't wait for mount completion. > > This is the same semantic as the positive dentry case already > handled. > > Signed-off-by: Ian Kent > Cc: David Howells > Cc: Colin Walters > Cc: Ondrej Holy > Cc: sta...@vger.kernel.org > --- > fs/namei.c | 15 --- > include/linux/fs.h |3 +-- > 2 files changed, 13 insertions(+), 5 deletions(-) > > diff --git a/fs/namei.c b/fs/namei.c > index 7286f87..cd74838 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -1129,9 +1129,18 @@ static int follow_automount(struct path *path, struct > nameidata *nd, >* of the daemon to instantiate them before they can be used. >*/ > if (!(nd->flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY | > -LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) && > - path->dentry->d_inode) > - return -EISDIR; > +LOOKUP_OPEN | LOOKUP_CREATE | > +LOOKUP_AUTOMOUNT))) { > + /* Positive dentry that isn't meant to trigger an > + * automount, EISDIR will allow it to be used, > + * otherwise there's no mount here "now" so return > + * ENOENT. > + */ > + if (path->dentry->d_inode) > + return -EISDIR; > + else > + return -ENOENT; > + } > > if (path->dentry->d_sb->s_user_ns != _user_ns) > return -EACCES; > diff --git a/include/linux/fs.h b/include/linux/fs.h > index 26488b4..be09684 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -2935,8 +2935,7 @@ static inline int vfs_lstat(const char __user *name, > struct kstat *stat) > static inline int vfs_fstatat(int dfd, const char __user *filename, > struct kstat *stat, int flags) > { > - return vfs_statx(dfd, filename, flags | AT_NO_AUTOMOUNT, > - stat, STATX_BASIC_STATS); > + return vfs_statx(dfd, filename, flags, stat, STATX_BASIC_STATS); > } > static inline int vfs_fstat(int fd, struct kstat *stat) > { signature.asc Description: PGP signature
Re: [alsa-devel] [PATCH 2/2] ASoC: fsl_ssi: serialize AC'97 register access operations
On Mon, Nov 20, 2017 at 11:16:07PM +0100, Maciej S. Szmigiero wrote: > AC'97 register access operations (both read and write) on SSI use a one, > shared set of SSI registers for AC'97 register address and data. > This means that only one such access is possible at a time and so all these > operations need to be serialized. > > Since an AC'97 register access operation in this driver takes 100us+ let's > use a mutex for this. > > Use this opportunity to also change a default value returned from AC'97 > register read function from -1 to 0, since that's what AC'97 specs require > to be returned when unknown / undefined registers are read. > > Signed-off-by: Maciej S. Szmigiero> static unsigned short fsl_ssi_ac97_read(struct snd_ac97 *ac97, > @@ -1287,16 +1295,18 @@ static unsigned short fsl_ssi_ac97_read(struct > snd_ac97 *ac97, > { > struct regmap *regs = fsl_ac97_data->regs; > > - unsigned short val = -1; > + unsigned short val = 0; > u32 reg_val; > unsigned int lreg; > int ret; > > + mutex_lock(_ac97_data->ac97_reg_lock); > + > ret = clk_prepare_enable(fsl_ac97_data->clk); > if (ret) { > pr_err("ac97 read clk_prepare_enable failed: %d\n", > ret); > - return -1; > + goto ret_unlock; It will return val (== 0) in this case. Will this be correctly handled by callers? I find sound/ac97/bus.c checks if ret < 0 for ops->read(). So it might be better to add "val = ret;" before goto? Or use val instead of ret directly? > } > > lreg = (reg & 0x7f) << 12; > @@ -1311,6 +1321,8 @@ static unsigned short fsl_ssi_ac97_read(struct snd_ac97 > *ac97, > > clk_disable_unprepare(fsl_ac97_data->clk); > > +ret_unlock: > + mutex_unlock(_ac97_data->ac97_reg_lock); > return val; > }
Re: [alsa-devel] [PATCH 2/2] ASoC: fsl_ssi: serialize AC'97 register access operations
On Mon, Nov 20, 2017 at 11:16:07PM +0100, Maciej S. Szmigiero wrote: > AC'97 register access operations (both read and write) on SSI use a one, > shared set of SSI registers for AC'97 register address and data. > This means that only one such access is possible at a time and so all these > operations need to be serialized. > > Since an AC'97 register access operation in this driver takes 100us+ let's > use a mutex for this. > > Use this opportunity to also change a default value returned from AC'97 > register read function from -1 to 0, since that's what AC'97 specs require > to be returned when unknown / undefined registers are read. > > Signed-off-by: Maciej S. Szmigiero > static unsigned short fsl_ssi_ac97_read(struct snd_ac97 *ac97, > @@ -1287,16 +1295,18 @@ static unsigned short fsl_ssi_ac97_read(struct > snd_ac97 *ac97, > { > struct regmap *regs = fsl_ac97_data->regs; > > - unsigned short val = -1; > + unsigned short val = 0; > u32 reg_val; > unsigned int lreg; > int ret; > > + mutex_lock(_ac97_data->ac97_reg_lock); > + > ret = clk_prepare_enable(fsl_ac97_data->clk); > if (ret) { > pr_err("ac97 read clk_prepare_enable failed: %d\n", > ret); > - return -1; > + goto ret_unlock; It will return val (== 0) in this case. Will this be correctly handled by callers? I find sound/ac97/bus.c checks if ret < 0 for ops->read(). So it might be better to add "val = ret;" before goto? Or use val instead of ret directly? > } > > lreg = (reg & 0x7f) << 12; > @@ -1311,6 +1321,8 @@ static unsigned short fsl_ssi_ac97_read(struct snd_ac97 > *ac97, > > clk_disable_unprepare(fsl_ac97_data->clk); > > +ret_unlock: > + mutex_unlock(_ac97_data->ac97_reg_lock); > return val; > }
Re: Blank console but X11 works on MCP79 - old regression since 3.8
On Sat, Nov 18, 2017 at 12:23 AM, Ilia Mirkinwrote: > On Fri, Nov 17, 2017 at 2:37 PM, Ilia Mirkin wrote: >> On Fri, Nov 17, 2017 at 2:25 PM, Ondrej Zary >> wrote: >>> On Friday 17 November 2017 18:41:17 Ilia Mirkin wrote: On Fri, Nov 17, 2017 at 12:33 PM, Ondrej Zary wrote: > @@ -483,8 +483,8 @@ > nouveau :02:00.0: disp:0860: -> 0500 > nouveau :02:00.0: disp:0864: > nouveau :02:00.0: disp:0868: -> 04000500 > -nouveau :02:00.0: disp:086c: -> 00100500 > -nouveau :02:00.0: disp:0870: e900 -> 1e00 > +nouveau :02:00.0: disp:086c: -> 00100a00 > +nouveau :02:00.0: disp:0870: e900 -> e800 > nouveau :02:00.0: disp:0874: -> > nouveau :02:00.0: disp:0878: > nouveau :02:00.0: disp:0880: 0500 > > Looks like it's using 8bpp (0x1e00) in 32MB case but 16bpp (0xe800) in > 64MB case. Why? > > I get blank screen even with 64MB with video=1280x1024-8 kernel > parameter. Console works with video=1280x1024-16 even with 32MB stolen > memory. > > Conclusions: 8-bit support is broken and bpp reduction is weird. OK, well that makes a *ton* of sense (8bpp being broken). I think the idea of bpp reduction is that when you're on your shiny new Riva TNT with 16MB of VRAM, you don't want to go crazy allocating all that to a pinned fbcon - almost half of that would go to a single 32bpp 1600x1200 buffer, more for 1920x1200. You want to be able to have at least a few fb-sized buffers for backbuffer rendering, etc. The specific limits could probably use tweaking - I think they only consider VRAM size, not the fb size. I guess 8bpp worked prior to the change you bisected though, so we should figure out what we did wrong in the new code. >>> >>> Yes, booted 3.7 (last working kernel) and it's running in 8bpp. >> >> By the way, instead of booting $kernel, you can use modetest from >> libdrm/tests. Not sure if it supports C8 though =/ > > It didn't. But it does now - I mailed a patch to dri-devel, also (with > slight fix) available at > > https://people.freedesktop.org/~imirkin/patches/0001-modetest-add-C8-support-to-generate-SMPTE-pattern.patch > > This works on GK208 but not on G92 (whose display unit is much closer > to your MCP79's). You can run as > > ./modetest -s DVI-I-1:1920x1200@C8 > > This should display a SMPTE pattern, and exit when you hit enter. When > it does so, it doesn't restore fbcon, but you can swtich to another > vty to get console back. > > I get a white picture on G92. Now just have to figure out how to fix > it. Someone should also test on a G80 if possible, since that takes a > different path as well. Someone tested out a GF100 and it had the same issue. I've since determined that the color is that of the first entry in the LUT. With the above program, it's (192, 192, 192) which looks white. -ilia
Re: Blank console but X11 works on MCP79 - old regression since 3.8
On Sat, Nov 18, 2017 at 12:23 AM, Ilia Mirkin wrote: > On Fri, Nov 17, 2017 at 2:37 PM, Ilia Mirkin wrote: >> On Fri, Nov 17, 2017 at 2:25 PM, Ondrej Zary >> wrote: >>> On Friday 17 November 2017 18:41:17 Ilia Mirkin wrote: On Fri, Nov 17, 2017 at 12:33 PM, Ondrej Zary wrote: > @@ -483,8 +483,8 @@ > nouveau :02:00.0: disp:0860: -> 0500 > nouveau :02:00.0: disp:0864: > nouveau :02:00.0: disp:0868: -> 04000500 > -nouveau :02:00.0: disp:086c: -> 00100500 > -nouveau :02:00.0: disp:0870: e900 -> 1e00 > +nouveau :02:00.0: disp:086c: -> 00100a00 > +nouveau :02:00.0: disp:0870: e900 -> e800 > nouveau :02:00.0: disp:0874: -> > nouveau :02:00.0: disp:0878: > nouveau :02:00.0: disp:0880: 0500 > > Looks like it's using 8bpp (0x1e00) in 32MB case but 16bpp (0xe800) in > 64MB case. Why? > > I get blank screen even with 64MB with video=1280x1024-8 kernel > parameter. Console works with video=1280x1024-16 even with 32MB stolen > memory. > > Conclusions: 8-bit support is broken and bpp reduction is weird. OK, well that makes a *ton* of sense (8bpp being broken). I think the idea of bpp reduction is that when you're on your shiny new Riva TNT with 16MB of VRAM, you don't want to go crazy allocating all that to a pinned fbcon - almost half of that would go to a single 32bpp 1600x1200 buffer, more for 1920x1200. You want to be able to have at least a few fb-sized buffers for backbuffer rendering, etc. The specific limits could probably use tweaking - I think they only consider VRAM size, not the fb size. I guess 8bpp worked prior to the change you bisected though, so we should figure out what we did wrong in the new code. >>> >>> Yes, booted 3.7 (last working kernel) and it's running in 8bpp. >> >> By the way, instead of booting $kernel, you can use modetest from >> libdrm/tests. Not sure if it supports C8 though =/ > > It didn't. But it does now - I mailed a patch to dri-devel, also (with > slight fix) available at > > https://people.freedesktop.org/~imirkin/patches/0001-modetest-add-C8-support-to-generate-SMPTE-pattern.patch > > This works on GK208 but not on G92 (whose display unit is much closer > to your MCP79's). You can run as > > ./modetest -s DVI-I-1:1920x1200@C8 > > This should display a SMPTE pattern, and exit when you hit enter. When > it does so, it doesn't restore fbcon, but you can swtich to another > vty to get console back. > > I get a white picture on G92. Now just have to figure out how to fix > it. Someone should also test on a G80 if possible, since that takes a > different path as well. Someone tested out a GF100 and it had the same issue. I've since determined that the color is that of the first entry in the LUT. With the above program, it's (192, 192, 192) which looks white. -ilia
Re: mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info (crisv32 hang)
On Mon, Nov 20, 2017 at 07:28:21PM -0500, Nicolas Pitre wrote: > On Mon, 20 Nov 2017, Guenter Roeck wrote: > > > On Mon, Nov 20, 2017 at 03:21:32PM -0500, Nicolas Pitre wrote: > > > On Mon, 20 Nov 2017, Guenter Roeck wrote: > > > > > > > On Mon, Nov 20, 2017 at 01:18:38PM -0500, Nicolas Pitre wrote: > > > > > On Sun, 19 Nov 2017, Guenter Roeck wrote: > > > > > > > > > > > On 11/19/2017 08:08 PM, Nicolas Pitre wrote: > > > > > > > On Sun, 19 Nov 2017, Guenter Roeck wrote: > > > > > > > > On 11/19/2017 12:36 PM, Nicolas Pitre wrote: > > > > > > > > > On Sat, 18 Nov 2017, Guenter Roeck wrote: > > > > > > > > > > On Tue, Oct 03, 2017 at 06:29:49PM -0400, Nicolas Pitre > > > > > > > > > > wrote: > > > > > > > > > > > @@ -2295,6 +2295,7 @@ void __init > > > > > > > > > > > setup_per_cpu_areas(void) > > > > > > > > > > > if (pcpu_setup_first_chunk(ai, fc) < 0) > > > > > > > > > > > panic("Failed to initialize percpu > > > > > > > > > > > areas."); > > > > > > > > > > > + pcpu_free_alloc_info(ai); > > > > > > > > > > > > > > > > > > > > This is the culprit. Everything works fine if I remove this > > > > > > > > > > line. > > > > > > > > > > > > > > > > > > Without this line, the memory at the ai pointer is leaked. > > > > > > > > > Maybe this is > > > > > > > > > modifying the memory allocation pattern and that triggers a > > > > > > > > > bug later on > > > > > > > > > in your case. > > > > > > > > > > > > > > > > > > At that point the console driver is not yet initialized and > > > > > > > > > any error > > > > > > > > > message won't be printed. You should enable the early console > > > > > > > > > mechanism > > > > > > > > > in your kernel (see arch/cris/arch-v32/kernel/debugport.c) > > > > > > > > > and see what > > > > > > > > > that might tell you. > > > > > > > > > > > > > > > > > > > > > > > > > The problem is that BUG() on crisv32 does not yield useful > > > > > > > > output. > > > > > > > > Anyway, here is the culprit. > > > > > > > > > > > > > > > > diff --git a/mm/bootmem.c b/mm/bootmem.c > > > > > > > > index 6aef64254203..2bcc8901450c 100644 > > > > > > > > --- a/mm/bootmem.c > > > > > > > > +++ b/mm/bootmem.c > > > > > > > > @@ -382,7 +382,8 @@ static int __init mark_bootmem(unsigned > > > > > > > > long start, > > > > > > > > unsigned long end, > > > > > > > > return 0; > > > > > > > > pos = bdata->node_low_pfn; > > > > > > > > } > > > > > > > > - BUG(); > > > > > > > > + WARN(1, "mark_bootmem(): memory range 0x%lx-0x%lx not > > > > > > > > found\n", > > > > > > > > start, > > > > > > > > end); > > > > > > > > + return -ENOMEM; > > > > > > > > } > > > > > > > > > > > > > > > > /** > > > > > > > > diff --git a/mm/percpu.c b/mm/percpu.c > > > > > > > > index 79e3549cab0f..c75622d844f1 100644 > > > > > > > > --- a/mm/percpu.c > > > > > > > > +++ b/mm/percpu.c > > > > > > > > @@ -1881,6 +1881,7 @@ struct pcpu_alloc_info * __init > > > > > > > > pcpu_alloc_alloc_info(int nr_groups, > > > > > > > >*/ > > > > > > > > void __init pcpu_free_alloc_info(struct pcpu_alloc_info *ai) > > > > > > > > { > > > > > > > > + printk("pcpu_free_alloc_info(%p (0x%lx))\n", ai, > > > > > > > > __pa(ai)); > > > > > > > > memblock_free_early(__pa(ai), ai->__ai_size); > > > > > > > > > > > > > > > > results in: > > > > > > > > > > > > > > > > pcpu_free_alloc_info(c0534000 (0x40534000)) > > > > > > > > [ cut here ] > > > > > > > > WARNING: CPU: 0 PID: 0 at mm/bootmem.c:385 > > > > > > > > mark_bootmem+0x9a/0xaa > > > > > > > > mark_bootmem(): memory range 0x2029a-0x2029b not found > > > > > > > > > > > > > > Well... PFN_UP(0x40534000) should give 0x40534. How you might end > > > > > > > up > > > > > > > with 0x2029a in mark_bootmem(), let alone not exit on the first > > > > > > > "if (max > > > > > > > == end) return 0;" within the loop is rather weird. > > > > > > > > > > > > > pcpu_free_alloc_info: ai=c0536000, __pa(ai)=0x40536000, > > > > > > PFN_UP(__pa(ai))=0x2029b, PFN_UP(ai)=0x6029b > > > > > > > > > > > > bootmem range is 0x6..0x61000. It doesn't get to "if (max == > > > > > > end)" > > > > > > because "pos (=0x2029b) < bdata->node_min_pfn (=0x6)". > > > > > > > > > > OK. the 0x2029b is the result of PAGE_SIZE being 8192 in your case. > > > > > However the bootmem allocator deals with physical addresses not > > > > > virtual > > > > > ones. So it shouldn't give you a 0x6..0x61000 range. > > > > > > > > > > Would be interesting to see what result you get on line 860 of > > > > > mm/bootmem.c. > > > > > > > > > Nothing; __alloc_bootmem_low_node() is not called. > > > > > > > > Call chain is: > > > > pcpu_alloc_alloc_info > > > > memblock_virt_alloc_nopanic > > > > __alloc_bootmem_nopanic > > > > ___alloc_bootmem_nopanic > > > > > > But from there it should continue with: > > > > > >
Re: mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info (crisv32 hang)
On Mon, Nov 20, 2017 at 07:28:21PM -0500, Nicolas Pitre wrote: > On Mon, 20 Nov 2017, Guenter Roeck wrote: > > > On Mon, Nov 20, 2017 at 03:21:32PM -0500, Nicolas Pitre wrote: > > > On Mon, 20 Nov 2017, Guenter Roeck wrote: > > > > > > > On Mon, Nov 20, 2017 at 01:18:38PM -0500, Nicolas Pitre wrote: > > > > > On Sun, 19 Nov 2017, Guenter Roeck wrote: > > > > > > > > > > > On 11/19/2017 08:08 PM, Nicolas Pitre wrote: > > > > > > > On Sun, 19 Nov 2017, Guenter Roeck wrote: > > > > > > > > On 11/19/2017 12:36 PM, Nicolas Pitre wrote: > > > > > > > > > On Sat, 18 Nov 2017, Guenter Roeck wrote: > > > > > > > > > > On Tue, Oct 03, 2017 at 06:29:49PM -0400, Nicolas Pitre > > > > > > > > > > wrote: > > > > > > > > > > > @@ -2295,6 +2295,7 @@ void __init > > > > > > > > > > > setup_per_cpu_areas(void) > > > > > > > > > > > if (pcpu_setup_first_chunk(ai, fc) < 0) > > > > > > > > > > > panic("Failed to initialize percpu > > > > > > > > > > > areas."); > > > > > > > > > > > + pcpu_free_alloc_info(ai); > > > > > > > > > > > > > > > > > > > > This is the culprit. Everything works fine if I remove this > > > > > > > > > > line. > > > > > > > > > > > > > > > > > > Without this line, the memory at the ai pointer is leaked. > > > > > > > > > Maybe this is > > > > > > > > > modifying the memory allocation pattern and that triggers a > > > > > > > > > bug later on > > > > > > > > > in your case. > > > > > > > > > > > > > > > > > > At that point the console driver is not yet initialized and > > > > > > > > > any error > > > > > > > > > message won't be printed. You should enable the early console > > > > > > > > > mechanism > > > > > > > > > in your kernel (see arch/cris/arch-v32/kernel/debugport.c) > > > > > > > > > and see what > > > > > > > > > that might tell you. > > > > > > > > > > > > > > > > > > > > > > > > > The problem is that BUG() on crisv32 does not yield useful > > > > > > > > output. > > > > > > > > Anyway, here is the culprit. > > > > > > > > > > > > > > > > diff --git a/mm/bootmem.c b/mm/bootmem.c > > > > > > > > index 6aef64254203..2bcc8901450c 100644 > > > > > > > > --- a/mm/bootmem.c > > > > > > > > +++ b/mm/bootmem.c > > > > > > > > @@ -382,7 +382,8 @@ static int __init mark_bootmem(unsigned > > > > > > > > long start, > > > > > > > > unsigned long end, > > > > > > > > return 0; > > > > > > > > pos = bdata->node_low_pfn; > > > > > > > > } > > > > > > > > - BUG(); > > > > > > > > + WARN(1, "mark_bootmem(): memory range 0x%lx-0x%lx not > > > > > > > > found\n", > > > > > > > > start, > > > > > > > > end); > > > > > > > > + return -ENOMEM; > > > > > > > > } > > > > > > > > > > > > > > > > /** > > > > > > > > diff --git a/mm/percpu.c b/mm/percpu.c > > > > > > > > index 79e3549cab0f..c75622d844f1 100644 > > > > > > > > --- a/mm/percpu.c > > > > > > > > +++ b/mm/percpu.c > > > > > > > > @@ -1881,6 +1881,7 @@ struct pcpu_alloc_info * __init > > > > > > > > pcpu_alloc_alloc_info(int nr_groups, > > > > > > > >*/ > > > > > > > > void __init pcpu_free_alloc_info(struct pcpu_alloc_info *ai) > > > > > > > > { > > > > > > > > + printk("pcpu_free_alloc_info(%p (0x%lx))\n", ai, > > > > > > > > __pa(ai)); > > > > > > > > memblock_free_early(__pa(ai), ai->__ai_size); > > > > > > > > > > > > > > > > results in: > > > > > > > > > > > > > > > > pcpu_free_alloc_info(c0534000 (0x40534000)) > > > > > > > > [ cut here ] > > > > > > > > WARNING: CPU: 0 PID: 0 at mm/bootmem.c:385 > > > > > > > > mark_bootmem+0x9a/0xaa > > > > > > > > mark_bootmem(): memory range 0x2029a-0x2029b not found > > > > > > > > > > > > > > Well... PFN_UP(0x40534000) should give 0x40534. How you might end > > > > > > > up > > > > > > > with 0x2029a in mark_bootmem(), let alone not exit on the first > > > > > > > "if (max > > > > > > > == end) return 0;" within the loop is rather weird. > > > > > > > > > > > > > pcpu_free_alloc_info: ai=c0536000, __pa(ai)=0x40536000, > > > > > > PFN_UP(__pa(ai))=0x2029b, PFN_UP(ai)=0x6029b > > > > > > > > > > > > bootmem range is 0x6..0x61000. It doesn't get to "if (max == > > > > > > end)" > > > > > > because "pos (=0x2029b) < bdata->node_min_pfn (=0x6)". > > > > > > > > > > OK. the 0x2029b is the result of PAGE_SIZE being 8192 in your case. > > > > > However the bootmem allocator deals with physical addresses not > > > > > virtual > > > > > ones. So it shouldn't give you a 0x6..0x61000 range. > > > > > > > > > > Would be interesting to see what result you get on line 860 of > > > > > mm/bootmem.c. > > > > > > > > > Nothing; __alloc_bootmem_low_node() is not called. > > > > > > > > Call chain is: > > > > pcpu_alloc_alloc_info > > > > memblock_virt_alloc_nopanic > > > > __alloc_bootmem_nopanic > > > > ___alloc_bootmem_nopanic > > > > > > But from there it should continue with: > > > > > >
Re: [PATCH 02/16] x86/dumpstack: Add get_stack_info() support for the SYSENTER stack
On Mon, Nov 20, 2017 at 1:55 PM, Josh Poimboeufwrote: > On Mon, Nov 20, 2017 at 01:30:12PM -0800, Andy Lutomirski wrote: >> On Mon, Nov 20, 2017 at 1:27 PM, Josh Poimboeuf wrote: >> > On Mon, Nov 20, 2017 at 01:07:16PM -0800, Andy Lutomirski wrote: >> >> >> but, more importantly, the OOPS unwinder will just bail without this >> >> >> patch. With the patch, we get a valid unwind, except that everything >> >> >> has a ? in front. >> >> > >> >> > Hm. I can't even fathom how that's possible. Are you talking about the >> >> > "unwind from NMI to SYSENTER stack" path? Or any unwind to a syscall? >> >> > Either way I'm baffled... If the unwinder only encounters the SYSENTER >> >> > stack at the end, how could that cause everything beforehand to have a >> >> > question mark? >> >> >> >> I mean that, if I put a ud2 or other bug in the code that runs on the >> >> SYSENTER stack, without this patch, I get a totally blank call trace. >> > >> > I would expect a blank call trace either way... >> >> Try making sync_regs use a few kB of stack space or, better yet, call >> a non-inlined function that uses too much stack. > > You mean overflow the exception stack? I still don't see how that would > do it. > > If you could show a specific example, with splats from before/after, > that would be helpful. Because I still have no idea how this patch > could possibly help. I added BUG() to sync_regs(). With the patch, I get: [4.211553] PANIC: double fault, error_code: 0x0 [4.212113] CPU: 0 PID: 1 Comm: sh Not tainted 4.14.0+ #920 [4.212741] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014 [4.213536] task: 88001aa18000 task.stack: 88001aa2 [4.214059] RIP: 0010:do_error_trap+0x33/0x1c0 [4.214449] RSP: :ff1b8f78 EFLAGS: 00010096 [4.214934] RAX: dc00 RBX: ff1b8f90 RCX: 0006 [4.215554] RDX: 82048b20 RSI: RDI: ff1b9110 [4.216176] RBP: ff1b9088 R08: 0004 R09: [4.216793] R10: R11: fbe3723f R12: 0006 [4.217419] R13: R14: 0004 R15: [4.218046] FS: () GS:88001ae0() knlGS: [4.218775] CS: 0010 DS: ES: CR0: 80050033 [4.219280] CR2: ff1b8f68 CR3: 193da002 CR4: 003606f0 [4.219931] Call Trace: [4.220156] [4.220383] ? async_page_fault+0x36/0x60 [4.220768] ? invalid_op+0x22/0x40 [4.221087] ? async_page_fault+0x36/0x60 [4.221442] ? sync_regs+0x3c/0x40 [4.221745] ? sync_regs+0x2e/0x40 [4.222051] ? error_entry+0x6c/0xd0 [4.222395] ? async_page_fault+0x36/0x60 [4.222748] [4.223014] Code: 00 00 00 fc ff df 41 55 41 54 49 89 f7 55 53 48 89 fd 48 81 c7 88 00 00 00 49 89 cc 45 89 c6 48 81 ec d8 00 00 00 48 8d 5c 24 18 <48> 89 14 24 48 c7 44 24 18 b3 8a b5 41 48 c7 44 24 20 b8 5e 4d Without the patch, I get: [3.962218] PANIC: double fault, error_code: 0x0 [3.962728] CPU: 0 PID: 1 Comm: sh Not tainted 4.14.0+ #921 [3.963225] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014 [3.963998] task: 88001aa18000 task.stack: 88001aa2 [3.964526] RIP: 0010:do_error_trap+0x33/0x1c0 [3.964947] RSP: :ff1b8f78 EFLAGS: 00010096 [3.965439] RAX: dc00 RBX: ff1b8f90 RCX: 0006 [3.966067] RDX: 82048b20 RSI: RDI: ff1b9110 [3.966696] RBP: ff1b9088 R08: 0004 R09: [3.967316] R10: R11: fbe3723f R12: 0006 [3.967939] R13: R14: 0004 R15: [3.968565] FS: () GS:88001ae0() knlGS: [3.969272] CS: 0010 DS: ES: CR0: 80050033 [3.969776] CR2: ff1b8f68 CR3: 19308002 CR4: 003606f0 [3.970406] Call Trace: [3.970632] Code: 00 00 00 fc ff df 41 55 41 54 49 89 f7 55 53 48 89 fd 48 81 c7 88 00 00 00 49 89 cc 45 89 c6 48 81 ec d8 00 00 00 48 8d 5c 24 18 <48> 89 14 24 48 c7 44 24 18 b3 8a b5 41 48 c7 44 24 20 b8 5e 4d I prefer the former :) > > -- > Josh
Re: [PATCH 02/16] x86/dumpstack: Add get_stack_info() support for the SYSENTER stack
On Mon, Nov 20, 2017 at 1:55 PM, Josh Poimboeuf wrote: > On Mon, Nov 20, 2017 at 01:30:12PM -0800, Andy Lutomirski wrote: >> On Mon, Nov 20, 2017 at 1:27 PM, Josh Poimboeuf wrote: >> > On Mon, Nov 20, 2017 at 01:07:16PM -0800, Andy Lutomirski wrote: >> >> >> but, more importantly, the OOPS unwinder will just bail without this >> >> >> patch. With the patch, we get a valid unwind, except that everything >> >> >> has a ? in front. >> >> > >> >> > Hm. I can't even fathom how that's possible. Are you talking about the >> >> > "unwind from NMI to SYSENTER stack" path? Or any unwind to a syscall? >> >> > Either way I'm baffled... If the unwinder only encounters the SYSENTER >> >> > stack at the end, how could that cause everything beforehand to have a >> >> > question mark? >> >> >> >> I mean that, if I put a ud2 or other bug in the code that runs on the >> >> SYSENTER stack, without this patch, I get a totally blank call trace. >> > >> > I would expect a blank call trace either way... >> >> Try making sync_regs use a few kB of stack space or, better yet, call >> a non-inlined function that uses too much stack. > > You mean overflow the exception stack? I still don't see how that would > do it. > > If you could show a specific example, with splats from before/after, > that would be helpful. Because I still have no idea how this patch > could possibly help. I added BUG() to sync_regs(). With the patch, I get: [4.211553] PANIC: double fault, error_code: 0x0 [4.212113] CPU: 0 PID: 1 Comm: sh Not tainted 4.14.0+ #920 [4.212741] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014 [4.213536] task: 88001aa18000 task.stack: 88001aa2 [4.214059] RIP: 0010:do_error_trap+0x33/0x1c0 [4.214449] RSP: :ff1b8f78 EFLAGS: 00010096 [4.214934] RAX: dc00 RBX: ff1b8f90 RCX: 0006 [4.215554] RDX: 82048b20 RSI: RDI: ff1b9110 [4.216176] RBP: ff1b9088 R08: 0004 R09: [4.216793] R10: R11: fbe3723f R12: 0006 [4.217419] R13: R14: 0004 R15: [4.218046] FS: () GS:88001ae0() knlGS: [4.218775] CS: 0010 DS: ES: CR0: 80050033 [4.219280] CR2: ff1b8f68 CR3: 193da002 CR4: 003606f0 [4.219931] Call Trace: [4.220156] [4.220383] ? async_page_fault+0x36/0x60 [4.220768] ? invalid_op+0x22/0x40 [4.221087] ? async_page_fault+0x36/0x60 [4.221442] ? sync_regs+0x3c/0x40 [4.221745] ? sync_regs+0x2e/0x40 [4.222051] ? error_entry+0x6c/0xd0 [4.222395] ? async_page_fault+0x36/0x60 [4.222748] [4.223014] Code: 00 00 00 fc ff df 41 55 41 54 49 89 f7 55 53 48 89 fd 48 81 c7 88 00 00 00 49 89 cc 45 89 c6 48 81 ec d8 00 00 00 48 8d 5c 24 18 <48> 89 14 24 48 c7 44 24 18 b3 8a b5 41 48 c7 44 24 20 b8 5e 4d Without the patch, I get: [3.962218] PANIC: double fault, error_code: 0x0 [3.962728] CPU: 0 PID: 1 Comm: sh Not tainted 4.14.0+ #921 [3.963225] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014 [3.963998] task: 88001aa18000 task.stack: 88001aa2 [3.964526] RIP: 0010:do_error_trap+0x33/0x1c0 [3.964947] RSP: :ff1b8f78 EFLAGS: 00010096 [3.965439] RAX: dc00 RBX: ff1b8f90 RCX: 0006 [3.966067] RDX: 82048b20 RSI: RDI: ff1b9110 [3.966696] RBP: ff1b9088 R08: 0004 R09: [3.967316] R10: R11: fbe3723f R12: 0006 [3.967939] R13: R14: 0004 R15: [3.968565] FS: () GS:88001ae0() knlGS: [3.969272] CS: 0010 DS: ES: CR0: 80050033 [3.969776] CR2: ff1b8f68 CR3: 19308002 CR4: 003606f0 [3.970406] Call Trace: [3.970632] Code: 00 00 00 fc ff df 41 55 41 54 49 89 f7 55 53 48 89 fd 48 81 c7 88 00 00 00 49 89 cc 45 89 c6 48 81 ec d8 00 00 00 48 8d 5c 24 18 <48> 89 14 24 48 c7 44 24 18 b3 8a b5 41 48 c7 44 24 20 b8 5e 4d I prefer the former :) > > -- > Josh
Re: [PATCH 00/13] block: assorted cleanup for bio splitting and cloning.
On Mon, Nov 20 2017 at 7:34pm -0500, NeilBrownwrote: > On Mon, Nov 20 2017, Mike Snitzer wrote: > > > On Sun, Jun 18, 2017 at 5:36 PM, NeilBrown wrote: > >> On Sun, Jun 18 2017, Jens Axboe wrote: > >> > >>> On Sun, Jun 18 2017, NeilBrown wrote: > This is a resend of my series of patches working > towards removing the bioset work queues. > > This set is based on for-4.13/block. > > It incorporates the revised versions of all the patches that were > resent following feedback on the last set. > > It also includes a minor grammatic improvement to a comment, and > simple changes to compensate for a couple of changes to the block tree > since the last posting. > > I hope to eventually get rid of the new BIOSET_NEED_RESCUER flag, > but that needs work in dm and probably bcache first. > >>> > >>> Thanks Neil, applied. > >> > >> Thanks a lot Jens. > > > > I missed this line of work until now. Not quite sure why I wasn't > > cc'd or my review/ack required for the DM changes in this patchset. > > Hi Mike, > I'm sorry you weren't included on those. My thinking at the time was > probably that they were purely cosmetic changes which made no > functional difference to dm. That is no excuse though and I do > apologize. > > > > > But I've now queued this patch for once Linus gets back (reverts DM > > changes from commit 47e0fb461f): > > https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-next=c9fdc42ba23eabd1ba7aef199fb9bb4b4fe5c545 > > This patch does two things. > 1/ It removes the BIOSET_NEED_RESCUER flag from biosets created by dm. > This a functional changed over the code from before my patches. > Previously, all biosets were given a rescuer thread. > After my patch set, biosets only got a rescuer thread if > BIOSET_NEED_RESCUER was passed, and it was passed for all biosets. > I then removed it from places were I was certain it wasn't needed. > I didn't remove it from dm because I wasn't certain. Your > patch does remove the flags, which I think is incorrect - see below. > > 2/ It changes flush_current_bio_list() so that bios allocated from a >bioset that does not have a rescue_workqueue are now added to >the ->rescue_list for their bio_set, and ->rescue_work is queued >on the NULL ->rescue_workqueue, resulting in a NULL dereference. >I suspect you don't want this. > > The patch description claims that the patch fixes something, but it > isn't clear to me what it is meant to be fixing. > > It makes reference to dbba42d8 which is described as removing an unused > bioset process, though what it actually does is remove an used bioset > (and obvious the process disappears with it). My patch doesn't change > that behavior. Well I looked at this because Zdenek reported that with more recent kernels he is seeing the "bioset" per DM device again (whereas it was thought to be removed with mikulas' commit dbba42d8 -- but that commit removed "bioset" only in terms of q->bio_split. Looks like the q->bio_split bioset is created with BIOSET_NEED_RESCUER but DM calls bioset_free() for q->bio_split. Strikes me as odd that "bioset" was removed DM devices until it recently reappeared. Especially if what you say is accurate (that BIOSET_NEED_RESCUER was implied with the old code.. I believe you!) I tried to quickly answer how "bioset" is now re-appearing for DM devices but I obviously need to put more time to it. (And my goodness does this bioset rescue workqueue need a better name than "bioset"! Should include the bdevname too) > > As is, it is my understanding that DM has no need for a bio_set's > > rescue_workqueue. So its removal would appear to only be gated by > > bcache? > > > > But I could be mistaken, moving forward please let me know what you > > feel needs doing in DM to make it a better citizen. > > I think you are mistaken. > Please see >https://www.redhat.com/archives/dm-devel/2017-August/msg00310.html > and >https://www.redhat.com/archives/dm-devel/2017-August/msg00315.html > for which the thread continues: >https://www.redhat.com/archives/dm-devel/2017-September/msg1.html > > These were sent to you, though most of the conversation happened with > Mikulas. > > I think that the patches in those threads explain why dm currently needs > rescuer threads, and shows how dm can be changed to no longer need the > rescuer. I would appreciate your thoughts on these patches. I can > resend them if that would help. No need to resend. I'll work through the old threads. > That would then just leave bcache I find it a bit of a challenge to > reason about the code in bcache, but if we can remove > BIOSET_NEED_RESCUER from dm, that will be an extra incentive for me to learn > :-) I'm all for properly removing BIOSET_NEED_RESCUER from DM. I'll look closer at all of this in the morning (for now I'm backing the
Re: [PATCH 00/13] block: assorted cleanup for bio splitting and cloning.
On Mon, Nov 20 2017 at 7:34pm -0500, NeilBrown wrote: > On Mon, Nov 20 2017, Mike Snitzer wrote: > > > On Sun, Jun 18, 2017 at 5:36 PM, NeilBrown wrote: > >> On Sun, Jun 18 2017, Jens Axboe wrote: > >> > >>> On Sun, Jun 18 2017, NeilBrown wrote: > This is a resend of my series of patches working > towards removing the bioset work queues. > > This set is based on for-4.13/block. > > It incorporates the revised versions of all the patches that were > resent following feedback on the last set. > > It also includes a minor grammatic improvement to a comment, and > simple changes to compensate for a couple of changes to the block tree > since the last posting. > > I hope to eventually get rid of the new BIOSET_NEED_RESCUER flag, > but that needs work in dm and probably bcache first. > >>> > >>> Thanks Neil, applied. > >> > >> Thanks a lot Jens. > > > > I missed this line of work until now. Not quite sure why I wasn't > > cc'd or my review/ack required for the DM changes in this patchset. > > Hi Mike, > I'm sorry you weren't included on those. My thinking at the time was > probably that they were purely cosmetic changes which made no > functional difference to dm. That is no excuse though and I do > apologize. > > > > > But I've now queued this patch for once Linus gets back (reverts DM > > changes from commit 47e0fb461f): > > https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-next=c9fdc42ba23eabd1ba7aef199fb9bb4b4fe5c545 > > This patch does two things. > 1/ It removes the BIOSET_NEED_RESCUER flag from biosets created by dm. > This a functional changed over the code from before my patches. > Previously, all biosets were given a rescuer thread. > After my patch set, biosets only got a rescuer thread if > BIOSET_NEED_RESCUER was passed, and it was passed for all biosets. > I then removed it from places were I was certain it wasn't needed. > I didn't remove it from dm because I wasn't certain. Your > patch does remove the flags, which I think is incorrect - see below. > > 2/ It changes flush_current_bio_list() so that bios allocated from a >bioset that does not have a rescue_workqueue are now added to >the ->rescue_list for their bio_set, and ->rescue_work is queued >on the NULL ->rescue_workqueue, resulting in a NULL dereference. >I suspect you don't want this. > > The patch description claims that the patch fixes something, but it > isn't clear to me what it is meant to be fixing. > > It makes reference to dbba42d8 which is described as removing an unused > bioset process, though what it actually does is remove an used bioset > (and obvious the process disappears with it). My patch doesn't change > that behavior. Well I looked at this because Zdenek reported that with more recent kernels he is seeing the "bioset" per DM device again (whereas it was thought to be removed with mikulas' commit dbba42d8 -- but that commit removed "bioset" only in terms of q->bio_split. Looks like the q->bio_split bioset is created with BIOSET_NEED_RESCUER but DM calls bioset_free() for q->bio_split. Strikes me as odd that "bioset" was removed DM devices until it recently reappeared. Especially if what you say is accurate (that BIOSET_NEED_RESCUER was implied with the old code.. I believe you!) I tried to quickly answer how "bioset" is now re-appearing for DM devices but I obviously need to put more time to it. (And my goodness does this bioset rescue workqueue need a better name than "bioset"! Should include the bdevname too) > > As is, it is my understanding that DM has no need for a bio_set's > > rescue_workqueue. So its removal would appear to only be gated by > > bcache? > > > > But I could be mistaken, moving forward please let me know what you > > feel needs doing in DM to make it a better citizen. > > I think you are mistaken. > Please see >https://www.redhat.com/archives/dm-devel/2017-August/msg00310.html > and >https://www.redhat.com/archives/dm-devel/2017-August/msg00315.html > for which the thread continues: >https://www.redhat.com/archives/dm-devel/2017-September/msg1.html > > These were sent to you, though most of the conversation happened with > Mikulas. > > I think that the patches in those threads explain why dm currently needs > rescuer threads, and shows how dm can be changed to no longer need the > rescuer. I would appreciate your thoughts on these patches. I can > resend them if that would help. No need to resend. I'll work through the old threads. > That would then just leave bcache I find it a bit of a challenge to > reason about the code in bcache, but if we can remove > BIOSET_NEED_RESCUER from dm, that will be an extra incentive for me to learn > :-) I'm all for properly removing BIOSET_NEED_RESCUER from DM. I'll look closer at all of this in the morning (for now I'm backing the patch I referenced out from
Re: [PATCH 07/16] x86/asm: Move SYSENTER_stack to the beginning of struct tss_struct
On Mon, Nov 20, 2017 at 3:37 PM, Thomas Gleixnerwrote: > On Mon, 20 Nov 2017, Andy Lutomirski wrote: >> struct tss_struct { >> /* >> - * The hardware state: >> + * Space for the temporary SYSENTER stack. Used for the entry >> + * trampoline as well. Size it such that tss_struct ends up >> + * as a multiple of PAGE_SIZE. This calculation assumes that >> + * io_bitmap is a multiple of PAGE_SIZE (8192 bytes) plus one >> + * long. > > I don't see how sizeof(tss_struct) is a multiple of PAGE_SIZE > > canary =8 > stack = 512 > hw_tss = 104 > io bitmap = 8200 > - > 8824 > > The alignment is what blows it up to 3 * PAGE_SIZE Whoops! That *was* correct in the RFC version version, but I changed it and failed to fix the comment.
Re: [PATCH 07/16] x86/asm: Move SYSENTER_stack to the beginning of struct tss_struct
On Mon, Nov 20, 2017 at 3:37 PM, Thomas Gleixner wrote: > On Mon, 20 Nov 2017, Andy Lutomirski wrote: >> struct tss_struct { >> /* >> - * The hardware state: >> + * Space for the temporary SYSENTER stack. Used for the entry >> + * trampoline as well. Size it such that tss_struct ends up >> + * as a multiple of PAGE_SIZE. This calculation assumes that >> + * io_bitmap is a multiple of PAGE_SIZE (8192 bytes) plus one >> + * long. > > I don't see how sizeof(tss_struct) is a multiple of PAGE_SIZE > > canary =8 > stack = 512 > hw_tss = 104 > io bitmap = 8200 > - > 8824 > > The alignment is what blows it up to 3 * PAGE_SIZE Whoops! That *was* correct in the RFC version version, but I changed it and failed to fix the comment.
Re: [PATCH v4] iio : Add cm3218 smbus ara and acpi support
On 20/11/2017 18:57, Mika Westerberg wrote: +Jarkko On Sun, Nov 19, 2017 at 04:35:51PM +, Jonathan Cameron wrote: On Thu, 2 Nov 2017 16:04:07 +0100 Wolfram Sangwrote: On Thu, Nov 02, 2017 at 02:35:50PM +, Jonathan Cameron wrote: On Fri, 27 Oct 2017 18:27:02 +0200 Marc CAPDEVILLE wrote: On asus T100, Capella cm3218 chip is implemented as ambiant light sensor. This chip expose an smbus ARA protocol device on standard address 0x0c. The chip is not functional before all alerts are acknowledged. On asus T100, this device is enumerated on ACPI bus and the description give tow I2C connection. The first is the connection to the ARA device and the second gives the real address of the device. So, on device probe,If the i2c address is the ARA address and the device is enumerated via acpi, we lookup for the real address in the ACPI resource list and change it in the client structure. if an interrupt resource is given, and only for cm3218 chip, we declare an smbus_alert device. Signed-off-by: Marc CAPDEVILLE Wolfram - this needs input from you on how to neatly handle an ACPI registered ARA. ACPI is really not my field. Try asking the I2C ACPI maintainers or Benjamin Tissoires who did work on SMBus interrupts recently. Hi Mika, Benjamin, So we've lost most of the context in this thread, but the basic question is how to handle smbus ARA support with ACPI. https://patchwork.kernel.org/patch/10030309/ Has the proposal made in this driver which is really not terribly nice (as it registers the ARA device by messing with the address and registering a second device). As I understood it the ARA device registration should be handled by the i2c master, but there are very few examples. Phil pointed out that equivalent OF support recently got taken from him.. https://www.spinics.net/lists/devicetree/msg191947.html https://www.spinics.net/lists/linux-i2c/msg31173.html Any thoughts on the right way to do this? There does not seem to be any way in ACPI to tell which "connection" is used to describe ARA so that part currently is something each driver needs to handle as they know the device the best. I don't think we have any means to handle it in generic way in I2C core except to provide some helpers that work on top of i2c_setup_smbus_alert() but understand ACPI resources. Say provide function like this: int acpi_i2c_setup_smbus_alert(struct i2c_adapter *adapter, int index); Which then extracts automatically I2cSerialBus connection from "index" and calls i2c_setup_smbus_alert() accordingly. In the long run we could introduce _DSD property that can be used to name the connection in the same way DT does; Name (_CRS, ResourceTemplate () { I2cSerialBus () { ... } // ARA I2cSerialBus () { ... } // normal device address }) Name (_DSD, Package () { ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"), Package () { Package () {"smbus_alert", 0} // Where 0 means the first I2cSerialBus ... } }) I wonder if it's worth involving the smbus_alert driver in this case. The cm3218 driver doesn't appear to be using the alert callback in strcut i2c_driver. So the smbus_alert driver is not going to notifiy the cm3218 driver. Are there more than one alert/ara capable devices on the bus? Perhaps a workaround in this case is if that acpi entry is defined the cm3218 driver handles that ara request directly to clear the interrupt. -- Regards Phil Reid
Re: [PATCH v4] iio : Add cm3218 smbus ara and acpi support
On 20/11/2017 18:57, Mika Westerberg wrote: +Jarkko On Sun, Nov 19, 2017 at 04:35:51PM +, Jonathan Cameron wrote: On Thu, 2 Nov 2017 16:04:07 +0100 Wolfram Sang wrote: On Thu, Nov 02, 2017 at 02:35:50PM +, Jonathan Cameron wrote: On Fri, 27 Oct 2017 18:27:02 +0200 Marc CAPDEVILLE wrote: On asus T100, Capella cm3218 chip is implemented as ambiant light sensor. This chip expose an smbus ARA protocol device on standard address 0x0c. The chip is not functional before all alerts are acknowledged. On asus T100, this device is enumerated on ACPI bus and the description give tow I2C connection. The first is the connection to the ARA device and the second gives the real address of the device. So, on device probe,If the i2c address is the ARA address and the device is enumerated via acpi, we lookup for the real address in the ACPI resource list and change it in the client structure. if an interrupt resource is given, and only for cm3218 chip, we declare an smbus_alert device. Signed-off-by: Marc CAPDEVILLE Wolfram - this needs input from you on how to neatly handle an ACPI registered ARA. ACPI is really not my field. Try asking the I2C ACPI maintainers or Benjamin Tissoires who did work on SMBus interrupts recently. Hi Mika, Benjamin, So we've lost most of the context in this thread, but the basic question is how to handle smbus ARA support with ACPI. https://patchwork.kernel.org/patch/10030309/ Has the proposal made in this driver which is really not terribly nice (as it registers the ARA device by messing with the address and registering a second device). As I understood it the ARA device registration should be handled by the i2c master, but there are very few examples. Phil pointed out that equivalent OF support recently got taken from him.. https://www.spinics.net/lists/devicetree/msg191947.html https://www.spinics.net/lists/linux-i2c/msg31173.html Any thoughts on the right way to do this? There does not seem to be any way in ACPI to tell which "connection" is used to describe ARA so that part currently is something each driver needs to handle as they know the device the best. I don't think we have any means to handle it in generic way in I2C core except to provide some helpers that work on top of i2c_setup_smbus_alert() but understand ACPI resources. Say provide function like this: int acpi_i2c_setup_smbus_alert(struct i2c_adapter *adapter, int index); Which then extracts automatically I2cSerialBus connection from "index" and calls i2c_setup_smbus_alert() accordingly. In the long run we could introduce _DSD property that can be used to name the connection in the same way DT does; Name (_CRS, ResourceTemplate () { I2cSerialBus () { ... } // ARA I2cSerialBus () { ... } // normal device address }) Name (_DSD, Package () { ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"), Package () { Package () {"smbus_alert", 0} // Where 0 means the first I2cSerialBus ... } }) I wonder if it's worth involving the smbus_alert driver in this case. The cm3218 driver doesn't appear to be using the alert callback in strcut i2c_driver. So the smbus_alert driver is not going to notifiy the cm3218 driver. Are there more than one alert/ara capable devices on the bus? Perhaps a workaround in this case is if that acpi entry is defined the cm3218 driver handles that ara request directly to clear the interrupt. -- Regards Phil Reid
Re: [PATCH 04/16] x86/fixmap: Generalize the GDT fixmap mechanism
On Mon, Nov 20, 2017 at 2:01 PM, Thomas Gleixnerwrote: > On Mon, 20 Nov 2017, Andy Lutomirski wrote: > > Just a few nits. > >> /* Provide the fixmap address of the remapped GDT */ >> static inline struct desc_struct *get_cpu_gdt_ro(int cpu) >> { >> - unsigned int idx = get_cpu_gdt_ro_index(cpu); >> - return (struct desc_struct *)__fix_to_virt(idx); >> + return (struct desc_struct *)_cpu_entry_area(cpu)->gdt; >> } >> >> /* Provide the current read-only GDT */ >> diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h >> index b0c505fe9a95..5ff8e251772a 100644 >> --- a/arch/x86/include/asm/fixmap.h >> +++ b/arch/x86/include/asm/fixmap.h >> @@ -44,6 +44,17 @@ extern unsigned long __FIXADDR_TOP; >>PAGE_SIZE) >> #endif >> >> +/* >> + * cpu_entry_area is a percpu region in the fixmap that contains things >> + * need by the CPU and early entry/exit code. Real types aren't used here > > s/need/needed/ Yeah, fixed > >> + * to avoid circular header dependencies. > > :( Hmm. I could probably fix this, but it involves (at least) moving a struct definition and adding several new includes, and I'm not sure it'll actually converge to something working. > >> + */ >> +struct cpu_entry_area >> +{ >> + char gdt[PAGE_SIZE]; >> +}; >> + >> +#define CPU_ENTRY_AREA_PAGES (sizeof(struct cpu_entry_area) / PAGE_SIZE) > >> +static inline unsigned int __get_cpu_entry_area_page_index(int cpu, int >> page) >> +{ >> + BUILD_BUG_ON(sizeof(struct cpu_entry_area) % PAGE_SIZE != 0); >> + >> + return FIX_CPU_ENTRY_AREA_BOTTOM - cpu*CPU_ENTRY_AREA_PAGES - page; >> +} >> + >> +#define __get_cpu_entry_area_offset_index(cpu, offset) ({\ >> + BUILD_BUG_ON(offset % PAGE_SIZE != 0); \ >> + __get_cpu_entry_area_page_index(cpu, offset / PAGE_SIZE); \ >> + }) >> + >> +#define get_cpu_entry_area_index(cpu, field) \ >> + __get_cpu_entry_area_offset_index((cpu), offsetof(struct >> cpu_entry_area, field)) > > Any reason why those need to be macros? The former is a macro because I doubt that BUILD_BUG_ON is valid in that context in a function. The latter is a macro because 'field' is a name, not a value. > >> +static inline struct cpu_entry_area *get_cpu_entry_area(int cpu) >> +{ >> + return (struct cpu_entry_area *) >> + __fix_to_virt(__get_cpu_entry_area_page_index(cpu, 0)); >> +} >> + > > Thanks, > > tglx >
Re: [PATCH 04/16] x86/fixmap: Generalize the GDT fixmap mechanism
On Mon, Nov 20, 2017 at 2:01 PM, Thomas Gleixner wrote: > On Mon, 20 Nov 2017, Andy Lutomirski wrote: > > Just a few nits. > >> /* Provide the fixmap address of the remapped GDT */ >> static inline struct desc_struct *get_cpu_gdt_ro(int cpu) >> { >> - unsigned int idx = get_cpu_gdt_ro_index(cpu); >> - return (struct desc_struct *)__fix_to_virt(idx); >> + return (struct desc_struct *)_cpu_entry_area(cpu)->gdt; >> } >> >> /* Provide the current read-only GDT */ >> diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h >> index b0c505fe9a95..5ff8e251772a 100644 >> --- a/arch/x86/include/asm/fixmap.h >> +++ b/arch/x86/include/asm/fixmap.h >> @@ -44,6 +44,17 @@ extern unsigned long __FIXADDR_TOP; >>PAGE_SIZE) >> #endif >> >> +/* >> + * cpu_entry_area is a percpu region in the fixmap that contains things >> + * need by the CPU and early entry/exit code. Real types aren't used here > > s/need/needed/ Yeah, fixed > >> + * to avoid circular header dependencies. > > :( Hmm. I could probably fix this, but it involves (at least) moving a struct definition and adding several new includes, and I'm not sure it'll actually converge to something working. > >> + */ >> +struct cpu_entry_area >> +{ >> + char gdt[PAGE_SIZE]; >> +}; >> + >> +#define CPU_ENTRY_AREA_PAGES (sizeof(struct cpu_entry_area) / PAGE_SIZE) > >> +static inline unsigned int __get_cpu_entry_area_page_index(int cpu, int >> page) >> +{ >> + BUILD_BUG_ON(sizeof(struct cpu_entry_area) % PAGE_SIZE != 0); >> + >> + return FIX_CPU_ENTRY_AREA_BOTTOM - cpu*CPU_ENTRY_AREA_PAGES - page; >> +} >> + >> +#define __get_cpu_entry_area_offset_index(cpu, offset) ({\ >> + BUILD_BUG_ON(offset % PAGE_SIZE != 0); \ >> + __get_cpu_entry_area_page_index(cpu, offset / PAGE_SIZE); \ >> + }) >> + >> +#define get_cpu_entry_area_index(cpu, field) \ >> + __get_cpu_entry_area_offset_index((cpu), offsetof(struct >> cpu_entry_area, field)) > > Any reason why those need to be macros? The former is a macro because I doubt that BUILD_BUG_ON is valid in that context in a function. The latter is a macro because 'field' is a name, not a value. > >> +static inline struct cpu_entry_area *get_cpu_entry_area(int cpu) >> +{ >> + return (struct cpu_entry_area *) >> + __fix_to_virt(__get_cpu_entry_area_page_index(cpu, 0)); >> +} >> + > > Thanks, > > tglx >
Re: [PATCH] ACPI / battery: add quirk for Asus GL502VSK and UX305LA
On Friday, September 22, 2017 10:27:44 AM CET Kai-Heng Feng wrote: > On Asus GL502VSK and UX305LA, ACPI incorrectly reports discharging when > battery is full and AC is plugged. > > However rate_now is correct under this circumstance, hence we can use > "rate_now == 0" as a predicate to report battery full status correctly. > > BugLink: https://bugs.launchpad.net/bugs/1482390 > Signed-off-by: Kai-Heng Feng> --- > drivers/acpi/battery.c | 29 - > 1 file changed, 28 insertions(+), 1 deletion(-) > > diff --git a/drivers/acpi/battery.c b/drivers/acpi/battery.c > index 13e7b56e33ae..f9f008cf3da7 100644 > --- a/drivers/acpi/battery.c > +++ b/drivers/acpi/battery.c > @@ -70,6 +70,7 @@ static async_cookie_t async_cookie; > static bool battery_driver_registered; > static int battery_bix_broken_package; > static int battery_notification_delay_ms; > +static int battery_full_discharging; > static unsigned int cache_time = 1000; > module_param(cache_time, uint, 0644); > MODULE_PARM_DESC(cache_time, "cache time in milliseconds"); > @@ -214,7 +215,10 @@ static int acpi_battery_get_property(struct power_supply > *psy, > return -ENODEV; > switch (psp) { > case POWER_SUPPLY_PROP_STATUS: > - if (battery->state & ACPI_BATTERY_STATE_DISCHARGING) > + if (battery_full_discharging && battery->rate_now == 0 && > + battery->state & ACPI_BATTERY_STATE_DISCHARGING) > + val->intval = POWER_SUPPLY_STATUS_FULL; > + else if (battery->state & ACPI_BATTERY_STATE_DISCHARGING) You are adding a special sub-case to the (battery->state & ACPI_BATTERY_STATE_DISCHARGING) case, so it would be cleaner to do that in the following way: if (battery->state & ACPI_BATTERY_STATE_DISCHARGING) { if (battery_full_discharging && battery->rate_now == 0) val->intval = POWER_SUPPLY_STATUS_FULL; else val->intval = POWER_SUPPLY_STATUS_DISCHARGING; } else if (battery->state & ACPI_BATTERY_STATE_CHARGING) { val->intval = POWER_SUPPLY_STATUS_CHARGING; } > val->intval = POWER_SUPPLY_STATUS_DISCHARGING; > else if (battery->state & ACPI_BATTERY_STATE_CHARGING) > val->intval = POWER_SUPPLY_STATUS_CHARGING; > @@ -1166,6 +1170,13 @@ battery_notification_delay_quirk(const struct > dmi_system_id *d) > return 0; > } > > +static int __init > +battery_full_discharging_quirk(const struct dmi_system_id *d) > +{ > + battery_full_discharging = 1; > + return 0; > +} > + > static const struct dmi_system_id bat_dmi_table[] __initconst = { > { > .callback = battery_bix_broken_package_quirk, > @@ -1183,6 +1194,22 @@ static const struct dmi_system_id bat_dmi_table[] > __initconst = { > DMI_MATCH(DMI_PRODUCT_NAME, "Aspire V5-573G"), > }, > }, > + { > + .callback = battery_full_discharging_quirk, > + .ident = "ASUS GL502VSK", > + .matches = { > + DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."), > + DMI_MATCH(DMI_PRODUCT_NAME, "GL502VSK"), > + }, > + }, > + { > + .callback = battery_full_discharging_quirk, > + .ident = "ASUS UX305LA", > + .matches = { > + DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."), > + DMI_MATCH(DMI_PRODUCT_NAME, "UX305LA"), > + }, > + }, > {}, > }; > > Thanks, Rafael
Re: [PATCH] ACPI / battery: add quirk for Asus GL502VSK and UX305LA
On Friday, September 22, 2017 10:27:44 AM CET Kai-Heng Feng wrote: > On Asus GL502VSK and UX305LA, ACPI incorrectly reports discharging when > battery is full and AC is plugged. > > However rate_now is correct under this circumstance, hence we can use > "rate_now == 0" as a predicate to report battery full status correctly. > > BugLink: https://bugs.launchpad.net/bugs/1482390 > Signed-off-by: Kai-Heng Feng > --- > drivers/acpi/battery.c | 29 - > 1 file changed, 28 insertions(+), 1 deletion(-) > > diff --git a/drivers/acpi/battery.c b/drivers/acpi/battery.c > index 13e7b56e33ae..f9f008cf3da7 100644 > --- a/drivers/acpi/battery.c > +++ b/drivers/acpi/battery.c > @@ -70,6 +70,7 @@ static async_cookie_t async_cookie; > static bool battery_driver_registered; > static int battery_bix_broken_package; > static int battery_notification_delay_ms; > +static int battery_full_discharging; > static unsigned int cache_time = 1000; > module_param(cache_time, uint, 0644); > MODULE_PARM_DESC(cache_time, "cache time in milliseconds"); > @@ -214,7 +215,10 @@ static int acpi_battery_get_property(struct power_supply > *psy, > return -ENODEV; > switch (psp) { > case POWER_SUPPLY_PROP_STATUS: > - if (battery->state & ACPI_BATTERY_STATE_DISCHARGING) > + if (battery_full_discharging && battery->rate_now == 0 && > + battery->state & ACPI_BATTERY_STATE_DISCHARGING) > + val->intval = POWER_SUPPLY_STATUS_FULL; > + else if (battery->state & ACPI_BATTERY_STATE_DISCHARGING) You are adding a special sub-case to the (battery->state & ACPI_BATTERY_STATE_DISCHARGING) case, so it would be cleaner to do that in the following way: if (battery->state & ACPI_BATTERY_STATE_DISCHARGING) { if (battery_full_discharging && battery->rate_now == 0) val->intval = POWER_SUPPLY_STATUS_FULL; else val->intval = POWER_SUPPLY_STATUS_DISCHARGING; } else if (battery->state & ACPI_BATTERY_STATE_CHARGING) { val->intval = POWER_SUPPLY_STATUS_CHARGING; } > val->intval = POWER_SUPPLY_STATUS_DISCHARGING; > else if (battery->state & ACPI_BATTERY_STATE_CHARGING) > val->intval = POWER_SUPPLY_STATUS_CHARGING; > @@ -1166,6 +1170,13 @@ battery_notification_delay_quirk(const struct > dmi_system_id *d) > return 0; > } > > +static int __init > +battery_full_discharging_quirk(const struct dmi_system_id *d) > +{ > + battery_full_discharging = 1; > + return 0; > +} > + > static const struct dmi_system_id bat_dmi_table[] __initconst = { > { > .callback = battery_bix_broken_package_quirk, > @@ -1183,6 +1194,22 @@ static const struct dmi_system_id bat_dmi_table[] > __initconst = { > DMI_MATCH(DMI_PRODUCT_NAME, "Aspire V5-573G"), > }, > }, > + { > + .callback = battery_full_discharging_quirk, > + .ident = "ASUS GL502VSK", > + .matches = { > + DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."), > + DMI_MATCH(DMI_PRODUCT_NAME, "GL502VSK"), > + }, > + }, > + { > + .callback = battery_full_discharging_quirk, > + .ident = "ASUS UX305LA", > + .matches = { > + DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."), > + DMI_MATCH(DMI_PRODUCT_NAME, "UX305LA"), > + }, > + }, > {}, > }; > > Thanks, Rafael
Re: [alsa-devel] [PATCH] ASoC: fsl_ssi: call _fsl_ssi_set_dai_fmt() just once in AC'97 mode
On Tue, Nov 21, 2017 at 01:32:09AM +0100, Maciej S. Szmigiero wrote: > On 21.11.2017 01:00, Nicolin Chen wrote: > > On Mon, Nov 20, 2017 at 11:13:45PM +0100, Maciej S. Szmigiero wrote: > (..) > >> We need to make sure, however, that only proper channel slots are enabled > >> at playback start time since some AC'97 CODECs (like VT1613) were observed > >> requesting via SLOTREQ (and so enabling at SSI) spurious ones just after > >> an AC'97 link is started but before the CODEC is configured by its driver. > > > > I don't really understand this part. Why do we need to *make sure* > > and set SACCDIS and SACCEN again since they're initialized already> > > Could you please elaborate a bit more? > > SACCDIS and SACCEN aren't just normal registers. > They are a way to disable or enable bits in SACCST register - writing > a '1' bit at some position in SACCDIS unsets the relevant bit in SACCST, > writing a '1' bit at some position in SACCEN sets the relevant bit in > SACCST. > > But a bit in SACCST can also be set (and so an AC'97 channel slot > enabled) if a CODEC sets the relevant bit in its SLOTREQ request (it is > enough if this bit was set in SLOTREQ just once, bits in SACCST are > 'sticky'). This makes sense now. It could be mentioned a bit in the commit log as well. > If an extra slot gets enabled that's a disaster for playback because some > of normal left or right channel samples are instead redirected to this > extra slot. > > Unfortunately, the VT1613 codec on UDOO board (which is the only user of > fsl_ssi driver in the AC'97 mode) is a bit broken and likes to (sometimes) > set extra bits in its SLOTREQ request - I've confirmed this on two boards > bought a few months apart directly from UDOO shop. > > A workaround implemented in fsl-asoc-card of setting an appropriate > CODEC register so that slots 3/4 (the normal stereo playback slots) are > used for S/PDIF seems to mostly fix this issue but since this codec is so > untrustworthy then: If this move is also a workaround, probably it'd be better to have an fsl_ssi_ac97_xxx_war() function with some comments/descriptions, and include it in the config(). Since it doesn't hurt to set these registers again, I am personally fine with that. But it needs to be clear -- otherwise, people may try to remove it later with a change like: Removing redundant SACCEN/SACCDIS settings since they're done during probe().
Re: [alsa-devel] [PATCH] ASoC: fsl_ssi: call _fsl_ssi_set_dai_fmt() just once in AC'97 mode
On Tue, Nov 21, 2017 at 01:32:09AM +0100, Maciej S. Szmigiero wrote: > On 21.11.2017 01:00, Nicolin Chen wrote: > > On Mon, Nov 20, 2017 at 11:13:45PM +0100, Maciej S. Szmigiero wrote: > (..) > >> We need to make sure, however, that only proper channel slots are enabled > >> at playback start time since some AC'97 CODECs (like VT1613) were observed > >> requesting via SLOTREQ (and so enabling at SSI) spurious ones just after > >> an AC'97 link is started but before the CODEC is configured by its driver. > > > > I don't really understand this part. Why do we need to *make sure* > > and set SACCDIS and SACCEN again since they're initialized already> > > Could you please elaborate a bit more? > > SACCDIS and SACCEN aren't just normal registers. > They are a way to disable or enable bits in SACCST register - writing > a '1' bit at some position in SACCDIS unsets the relevant bit in SACCST, > writing a '1' bit at some position in SACCEN sets the relevant bit in > SACCST. > > But a bit in SACCST can also be set (and so an AC'97 channel slot > enabled) if a CODEC sets the relevant bit in its SLOTREQ request (it is > enough if this bit was set in SLOTREQ just once, bits in SACCST are > 'sticky'). This makes sense now. It could be mentioned a bit in the commit log as well. > If an extra slot gets enabled that's a disaster for playback because some > of normal left or right channel samples are instead redirected to this > extra slot. > > Unfortunately, the VT1613 codec on UDOO board (which is the only user of > fsl_ssi driver in the AC'97 mode) is a bit broken and likes to (sometimes) > set extra bits in its SLOTREQ request - I've confirmed this on two boards > bought a few months apart directly from UDOO shop. > > A workaround implemented in fsl-asoc-card of setting an appropriate > CODEC register so that slots 3/4 (the normal stereo playback slots) are > used for S/PDIF seems to mostly fix this issue but since this codec is so > untrustworthy then: If this move is also a workaround, probably it'd be better to have an fsl_ssi_ac97_xxx_war() function with some comments/descriptions, and include it in the config(). Since it doesn't hurt to set these registers again, I am personally fine with that. But it needs to be clear -- otherwise, people may try to remove it later with a change like: Removing redundant SACCEN/SACCDIS settings since they're done during probe().
Re: [PATCH v2] mm: show total hugetlb memory consumption in /proc/meminfo
On 11/20/2017 04:51 PM, Andrew Morton wrote: > On Wed, 15 Nov 2017 23:14:09 + Roman Gushchinwrote: > >> Currently we display some hugepage statistics (total, free, etc) >> in /proc/meminfo, but only for default hugepage size (e.g. 2Mb). >> >> If hugepages of different sizes are used (like 2Mb and 1Gb on x86-64), >> /proc/meminfo output can be confusing, as non-default sized hugepages >> are not reflected at all, and there are no signs that they are >> existing and consuming system memory. >> >> To solve this problem, let's display the total amount of memory, >> consumed by hugetlb pages of all sized (both free and used). >> Let's call it "Hugetlb", and display size in kB to match generic >> /proc/meminfo style. >> >> For example, (1024 2Mb pages and 2 1Gb pages are pre-allocated): >> $ cat /proc/meminfo >> MemTotal:8168984 kB >> MemFree: 3789276 kB >> <...> >> CmaFree: 0 kB >> HugePages_Total:1024 >> HugePages_Free: 1024 >> HugePages_Rsvd:0 >> HugePages_Surp:0 >> Hugepagesize: 2048 kB >> Hugetlb: 4194304 kB >> DirectMap4k: 32632 kB >> DirectMap2M: 4161536 kB >> DirectMap1G: 6291456 kB >> >> Also, this patch updates corresponding docs to reflect >> Hugetlb entry meaning and difference between Hugetlb and >> HugePages_Total * Hugepagesize. >> >> ... >> >> --- a/mm/hugetlb.c >> +++ b/mm/hugetlb.c >> @@ -2973,20 +2973,32 @@ int hugetlb_overcommit_handler(struct ctl_table >> *table, int write, >> >> void hugetlb_report_meminfo(struct seq_file *m) >> { >> -struct hstate *h = _hstate; >> +struct hstate *h; >> +unsigned long total = 0; >> + >> if (!hugepages_supported()) >> return; >> -seq_printf(m, >> -"HugePages_Total: %5lu\n" >> -"HugePages_Free:%5lu\n" >> -"HugePages_Rsvd:%5lu\n" >> -"HugePages_Surp:%5lu\n" >> -"Hugepagesize: %8lu kB\n", >> -h->nr_huge_pages, >> -h->free_huge_pages, >> -h->resv_huge_pages, >> -h->surplus_huge_pages, >> -1UL << (huge_page_order(h) + PAGE_SHIFT - 10)); >> + >> +for_each_hstate(h) { >> +unsigned long count = h->nr_huge_pages; >> + >> +total += (PAGE_SIZE << huge_page_order(h)) * count; >> + >> +if (h == _hstate) > > I'm not understanding this test. Are we assuming that default_hstate > always refers to the highest-index hstate? If so why, and is that > valid? Actually default_hstate is defined as: #define default_hstate (hstates[default_hstate_idx]) default_hstate_idx is set during hugetlb_init based upon default_hstate_size which defaults to HPAGE_SIZE. However, it can be overridden by the kernel command line argument "default_hugepagesz=". By definition and history /proc/meminfo lists information on the default huge page size. This code is looping through all hstates to get the total memory consumed by hugetlb pages for the new "Hugetlb" field. When it gets to the default huge page size, it prints the historic fields. Hope that helps, -- Mike Kravetz > >> +seq_printf(m, >> + "HugePages_Total: %5lu\n" >> + "HugePages_Free:%5lu\n" >> + "HugePages_Rsvd:%5lu\n" >> + "HugePages_Surp:%5lu\n" >> + "Hugepagesize: %8lu kB\n", >> + count, >> + h->free_huge_pages, >> + h->resv_huge_pages, >> + h->surplus_huge_pages, >> + (PAGE_SIZE << huge_page_order(h)) / 1024); >> +} >> + >> +seq_printf(m, "Hugetlb:%8lu kB\n", total / 1024); >> } > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majord...@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: mailto:"d...@kvack.org;> em...@kvack.org >
Re: [PATCH v2] mm: show total hugetlb memory consumption in /proc/meminfo
On 11/20/2017 04:51 PM, Andrew Morton wrote: > On Wed, 15 Nov 2017 23:14:09 + Roman Gushchin wrote: > >> Currently we display some hugepage statistics (total, free, etc) >> in /proc/meminfo, but only for default hugepage size (e.g. 2Mb). >> >> If hugepages of different sizes are used (like 2Mb and 1Gb on x86-64), >> /proc/meminfo output can be confusing, as non-default sized hugepages >> are not reflected at all, and there are no signs that they are >> existing and consuming system memory. >> >> To solve this problem, let's display the total amount of memory, >> consumed by hugetlb pages of all sized (both free and used). >> Let's call it "Hugetlb", and display size in kB to match generic >> /proc/meminfo style. >> >> For example, (1024 2Mb pages and 2 1Gb pages are pre-allocated): >> $ cat /proc/meminfo >> MemTotal:8168984 kB >> MemFree: 3789276 kB >> <...> >> CmaFree: 0 kB >> HugePages_Total:1024 >> HugePages_Free: 1024 >> HugePages_Rsvd:0 >> HugePages_Surp:0 >> Hugepagesize: 2048 kB >> Hugetlb: 4194304 kB >> DirectMap4k: 32632 kB >> DirectMap2M: 4161536 kB >> DirectMap1G: 6291456 kB >> >> Also, this patch updates corresponding docs to reflect >> Hugetlb entry meaning and difference between Hugetlb and >> HugePages_Total * Hugepagesize. >> >> ... >> >> --- a/mm/hugetlb.c >> +++ b/mm/hugetlb.c >> @@ -2973,20 +2973,32 @@ int hugetlb_overcommit_handler(struct ctl_table >> *table, int write, >> >> void hugetlb_report_meminfo(struct seq_file *m) >> { >> -struct hstate *h = _hstate; >> +struct hstate *h; >> +unsigned long total = 0; >> + >> if (!hugepages_supported()) >> return; >> -seq_printf(m, >> -"HugePages_Total: %5lu\n" >> -"HugePages_Free:%5lu\n" >> -"HugePages_Rsvd:%5lu\n" >> -"HugePages_Surp:%5lu\n" >> -"Hugepagesize: %8lu kB\n", >> -h->nr_huge_pages, >> -h->free_huge_pages, >> -h->resv_huge_pages, >> -h->surplus_huge_pages, >> -1UL << (huge_page_order(h) + PAGE_SHIFT - 10)); >> + >> +for_each_hstate(h) { >> +unsigned long count = h->nr_huge_pages; >> + >> +total += (PAGE_SIZE << huge_page_order(h)) * count; >> + >> +if (h == _hstate) > > I'm not understanding this test. Are we assuming that default_hstate > always refers to the highest-index hstate? If so why, and is that > valid? Actually default_hstate is defined as: #define default_hstate (hstates[default_hstate_idx]) default_hstate_idx is set during hugetlb_init based upon default_hstate_size which defaults to HPAGE_SIZE. However, it can be overridden by the kernel command line argument "default_hugepagesz=". By definition and history /proc/meminfo lists information on the default huge page size. This code is looping through all hstates to get the total memory consumed by hugetlb pages for the new "Hugetlb" field. When it gets to the default huge page size, it prints the historic fields. Hope that helps, -- Mike Kravetz > >> +seq_printf(m, >> + "HugePages_Total: %5lu\n" >> + "HugePages_Free:%5lu\n" >> + "HugePages_Rsvd:%5lu\n" >> + "HugePages_Surp:%5lu\n" >> + "Hugepagesize: %8lu kB\n", >> + count, >> + h->free_huge_pages, >> + h->resv_huge_pages, >> + h->surplus_huge_pages, >> + (PAGE_SIZE << huge_page_order(h)) / 1024); >> +} >> + >> +seq_printf(m, "Hugetlb:%8lu kB\n", total / 1024); >> } > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majord...@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: mailto:"d...@kvack.org;> em...@kvack.org >
Re: [PATCH] devres: use MACRO instead of function for devm_ioremap_nocache
On 2017/11/20 17:32, Yisheng Xie wrote: > Default ioremap is ioremap_nocache, so devm_ioremap has the same function > with devm_ioremap_nocache, which may just be killed. However, there > are many places which use devm_ioremap_nocache instead of devm_ioremap. > > This patch is to use MACRO for devm_ioremap_nocache, which will reduce > the size of devres.o from 206824 Bytes to 203912 Bytes. Oh, maybe it is better to let devm_ioremap use MACRO, and user may more clear to know that devm_ioremap is default is devm_ioremap*_nocache*. Any suggestions? Thanks Yisheng Xie > > Signed-off-by: Yisheng Xie> --- > include/linux/io.h | 3 +-- > lib/devres.c | 29 - > 2 files changed, 1 insertion(+), 31 deletions(-) > > diff --git a/include/linux/io.h b/include/linux/io.h > index 32e30e8..1516ccd 100644 > --- a/include/linux/io.h > +++ b/include/linux/io.h > @@ -75,14 +75,13 @@ static inline void devm_ioport_unmap(struct device *dev, > void __iomem *addr) > > void __iomem *devm_ioremap(struct device *dev, resource_size_t offset, > resource_size_t size); > -void __iomem *devm_ioremap_nocache(struct device *dev, resource_size_t > offset, > -resource_size_t size); > void __iomem *devm_ioremap_wc(struct device *dev, resource_size_t offset, > resource_size_t size); > void devm_iounmap(struct device *dev, void __iomem *addr); > int check_signature(const volatile void __iomem *io_addr, > const unsigned char *signature, int length); > void devm_ioremap_release(struct device *dev, void *res); > +#define devm_ioremap_nocache devm_ioremap > > void *devm_memremap(struct device *dev, resource_size_t offset, > size_t size, unsigned long flags); > diff --git a/lib/devres.c b/lib/devres.c > index 5f2aedd..f818fcf 100644 > --- a/lib/devres.c > +++ b/lib/devres.c > @@ -44,35 +44,6 @@ void __iomem *devm_ioremap(struct device *dev, > resource_size_t offset, > EXPORT_SYMBOL(devm_ioremap); > > /** > - * devm_ioremap_nocache - Managed ioremap_nocache() > - * @dev: Generic device to remap IO address for > - * @offset: Resource address to map > - * @size: Size of map > - * > - * Managed ioremap_nocache(). Map is automatically unmapped on driver > - * detach. > - */ > -void __iomem *devm_ioremap_nocache(struct device *dev, resource_size_t > offset, > -resource_size_t size) > -{ > - void __iomem **ptr, *addr; > - > - ptr = devres_alloc(devm_ioremap_release, sizeof(*ptr), GFP_KERNEL); > - if (!ptr) > - return NULL; > - > - addr = ioremap_nocache(offset, size); > - if (addr) { > - *ptr = addr; > - devres_add(dev, ptr); > - } else > - devres_free(ptr); > - > - return addr; > -} > -EXPORT_SYMBOL(devm_ioremap_nocache); > - > -/** > * devm_ioremap_wc - Managed ioremap_wc() > * @dev: Generic device to remap IO address for > * @offset: Resource address to map >
Re: [PATCH] devres: use MACRO instead of function for devm_ioremap_nocache
On 2017/11/20 17:32, Yisheng Xie wrote: > Default ioremap is ioremap_nocache, so devm_ioremap has the same function > with devm_ioremap_nocache, which may just be killed. However, there > are many places which use devm_ioremap_nocache instead of devm_ioremap. > > This patch is to use MACRO for devm_ioremap_nocache, which will reduce > the size of devres.o from 206824 Bytes to 203912 Bytes. Oh, maybe it is better to let devm_ioremap use MACRO, and user may more clear to know that devm_ioremap is default is devm_ioremap*_nocache*. Any suggestions? Thanks Yisheng Xie > > Signed-off-by: Yisheng Xie > --- > include/linux/io.h | 3 +-- > lib/devres.c | 29 - > 2 files changed, 1 insertion(+), 31 deletions(-) > > diff --git a/include/linux/io.h b/include/linux/io.h > index 32e30e8..1516ccd 100644 > --- a/include/linux/io.h > +++ b/include/linux/io.h > @@ -75,14 +75,13 @@ static inline void devm_ioport_unmap(struct device *dev, > void __iomem *addr) > > void __iomem *devm_ioremap(struct device *dev, resource_size_t offset, > resource_size_t size); > -void __iomem *devm_ioremap_nocache(struct device *dev, resource_size_t > offset, > -resource_size_t size); > void __iomem *devm_ioremap_wc(struct device *dev, resource_size_t offset, > resource_size_t size); > void devm_iounmap(struct device *dev, void __iomem *addr); > int check_signature(const volatile void __iomem *io_addr, > const unsigned char *signature, int length); > void devm_ioremap_release(struct device *dev, void *res); > +#define devm_ioremap_nocache devm_ioremap > > void *devm_memremap(struct device *dev, resource_size_t offset, > size_t size, unsigned long flags); > diff --git a/lib/devres.c b/lib/devres.c > index 5f2aedd..f818fcf 100644 > --- a/lib/devres.c > +++ b/lib/devres.c > @@ -44,35 +44,6 @@ void __iomem *devm_ioremap(struct device *dev, > resource_size_t offset, > EXPORT_SYMBOL(devm_ioremap); > > /** > - * devm_ioremap_nocache - Managed ioremap_nocache() > - * @dev: Generic device to remap IO address for > - * @offset: Resource address to map > - * @size: Size of map > - * > - * Managed ioremap_nocache(). Map is automatically unmapped on driver > - * detach. > - */ > -void __iomem *devm_ioremap_nocache(struct device *dev, resource_size_t > offset, > -resource_size_t size) > -{ > - void __iomem **ptr, *addr; > - > - ptr = devres_alloc(devm_ioremap_release, sizeof(*ptr), GFP_KERNEL); > - if (!ptr) > - return NULL; > - > - addr = ioremap_nocache(offset, size); > - if (addr) { > - *ptr = addr; > - devres_add(dev, ptr); > - } else > - devres_free(ptr); > - > - return addr; > -} > -EXPORT_SYMBOL(devm_ioremap_nocache); > - > -/** > * devm_ioremap_wc - Managed ioremap_wc() > * @dev: Generic device to remap IO address for > * @offset: Resource address to map >
Re: [patches] Re: [PATCH] dt-bindings: Add a RISC-V SBI firmware node
On Mon, Nov 20, 2017 at 01:28:01PM -0800, Palmer Dabbelt wrote: [...] > > > +++ b/Documentation/devicetree/bindings/firmware/riscv.sbi.txt > > > > Nit: Other bindings use either a comma (as in the compatible string, > > "riscv,sbi.txt") or a dash (vendor-product.txt, "riscv-sbi.txt") in the > > file name. > > That was just a typo, I'll fix it. Ok > > > @@ -0,0 +1,20 @@ > > > +RISC-V Supervisor Binary Interface (SBI) > > > + > > > +The RISC-V privileged ISA specification mandates the presence of a > > > supervisor > > > +binary interface that performs some operations which might otherwise > > > require > > > +particularly complicated instructions. This interface includes > > > +inter-processor interrupts, TLB flushes, i-cache and TLB shootdowns, a > > > +console, and power management. > > > + > > > +Required properties: > > > +- compatible: must contain one of the following > > > + * "riscv,sbi" for the SBI defined by the privileged specification of the > > > + system. > > > > "of the system" seems to imply that different RISC-V systems (different > > RISC-V implementations) can have different privileged specifications. > > Actually, that was intentional -- I wrote it this way because different > RISC-V systems do have different privileged specifications. The RISC-V > specifications aren't frozen in time, they're just guaranteed to be > compatible in the future. For example, the user ISA document has been > updated multiple times (the C spec, eliminating some unspecified behavior) > and will continue to be updated (V and other extensions, the memory model). > The privileged spec will be updated in a compatible way just like the user > spec will be -- I know there's at least hypervisor support in the works, and > I saw some things to remove undefined behavior go past as well. > > In a similar fashion, the ABI and SBI will continue to evolve. For example, > we'll probably add new system calls to extend the user ABI and new hyper > calls to extend the SBI. My problem with the wording was that the OS somehow has to know which version and variant of the SBI it is talking to -- either through in-band communication (an SBI call to request SBI information, etc.), or through devicetree or similar mechanisms. > > > I think it's better to refer to concrete documents, that don't depend on > > the rest of the system, instead. Either: Thanks, Jonathan Neuschäfer signature.asc Description: PGP signature
Re: [patches] Re: [PATCH] dt-bindings: Add a RISC-V SBI firmware node
On Mon, Nov 20, 2017 at 01:28:01PM -0800, Palmer Dabbelt wrote: [...] > > > +++ b/Documentation/devicetree/bindings/firmware/riscv.sbi.txt > > > > Nit: Other bindings use either a comma (as in the compatible string, > > "riscv,sbi.txt") or a dash (vendor-product.txt, "riscv-sbi.txt") in the > > file name. > > That was just a typo, I'll fix it. Ok > > > @@ -0,0 +1,20 @@ > > > +RISC-V Supervisor Binary Interface (SBI) > > > + > > > +The RISC-V privileged ISA specification mandates the presence of a > > > supervisor > > > +binary interface that performs some operations which might otherwise > > > require > > > +particularly complicated instructions. This interface includes > > > +inter-processor interrupts, TLB flushes, i-cache and TLB shootdowns, a > > > +console, and power management. > > > + > > > +Required properties: > > > +- compatible: must contain one of the following > > > + * "riscv,sbi" for the SBI defined by the privileged specification of the > > > + system. > > > > "of the system" seems to imply that different RISC-V systems (different > > RISC-V implementations) can have different privileged specifications. > > Actually, that was intentional -- I wrote it this way because different > RISC-V systems do have different privileged specifications. The RISC-V > specifications aren't frozen in time, they're just guaranteed to be > compatible in the future. For example, the user ISA document has been > updated multiple times (the C spec, eliminating some unspecified behavior) > and will continue to be updated (V and other extensions, the memory model). > The privileged spec will be updated in a compatible way just like the user > spec will be -- I know there's at least hypervisor support in the works, and > I saw some things to remove undefined behavior go past as well. > > In a similar fashion, the ABI and SBI will continue to evolve. For example, > we'll probably add new system calls to extend the user ABI and new hyper > calls to extend the SBI. My problem with the wording was that the OS somehow has to know which version and variant of the SBI it is talking to -- either through in-band communication (an SBI call to request SBI information, etc.), or through devicetree or similar mechanisms. > > > I think it's better to refer to concrete documents, that don't depend on > > the rest of the system, instead. Either: Thanks, Jonathan Neuschäfer signature.asc Description: PGP signature
Re: [PATCH 2/3] x86/apic: Update the 'apic=' description of setting APIC driver
Hi Randy, At 11/21/2017 01:21 AM, Randy Dunlap wrote: On 11/20/2017 05:27 AM, Dou Liyang wrote: There are two consumers of apic=: the APIC debug level and the low level generic architecture code, but Linux just documented the first one. Append the second description. Signed-off-by: Dou Liyang--- Documentation/admin-guide/kernel-parameters.txt | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 62436bd..fdd2382 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -328,11 +328,14 @@ not play well with APC CPU idle - disable it if you have APC and your system crashes randomly. - apic= [APIC,X86-32] Advanced Programmable Interrupt Controller + apic= [APIC,X86] Advanced Programmable Interrupt Controller Change the output verbosity whilst booting Format: { quiet (default) | verbose | debug } Change the amount of debugging information output when initialising the APIC and IO-APIC components. + If in X86-32, one more function is registering APIC + driver. + Examples: apic=bigsmp, bigsmp is an APIC driver name. How about this? For X86-32, this can also be used to specify an APIC driver name. Format: apic=drivername Example: apic=bigsmp Pretty good! Will use it in the next version. Thanks. dou. apic_extnmi=[APIC,X86] External NMI delivery setting Format: { bsp (default) | all | none }
Re: [PATCH 2/3] x86/apic: Update the 'apic=' description of setting APIC driver
Hi Randy, At 11/21/2017 01:21 AM, Randy Dunlap wrote: On 11/20/2017 05:27 AM, Dou Liyang wrote: There are two consumers of apic=: the APIC debug level and the low level generic architecture code, but Linux just documented the first one. Append the second description. Signed-off-by: Dou Liyang --- Documentation/admin-guide/kernel-parameters.txt | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 62436bd..fdd2382 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -328,11 +328,14 @@ not play well with APC CPU idle - disable it if you have APC and your system crashes randomly. - apic= [APIC,X86-32] Advanced Programmable Interrupt Controller + apic= [APIC,X86] Advanced Programmable Interrupt Controller Change the output verbosity whilst booting Format: { quiet (default) | verbose | debug } Change the amount of debugging information output when initialising the APIC and IO-APIC components. + If in X86-32, one more function is registering APIC + driver. + Examples: apic=bigsmp, bigsmp is an APIC driver name. How about this? For X86-32, this can also be used to specify an APIC driver name. Format: apic=drivername Example: apic=bigsmp Pretty good! Will use it in the next version. Thanks. dou. apic_extnmi=[APIC,X86] External NMI delivery setting Format: { bsp (default) | all | none }
Re: [PATCH] mm, meminit: Serially initialise deferred memory if trace_buf_size is specified
On Fri, 17 Nov 2017 13:19:56 -0500 Pavel Tatashinwrote: > On Thu, Nov 16, 2017 at 5:06 AM, Mel Gorman > wrote: > > 4. Put a check into the page allocator slowpath that triggers serialised > >init if the system is booting and an allocation is about to fail. It > >would be such a cold path that it would never be noticable although it > >would leave dead code in the kernel image once boot had completed > > Hi Mel, > > The forth approach is the best as it is seamless for admins and > engineers, it will also work on any system configuration with any > parameters without any special involvement. Apart from what-mel-said, I'd be concerned that this failsafe would almost never get tested. We should find some way to ensure that this code gets exercised in some people's kernels on a permanent basis and I'm not sure how to do that. One option might be to ask Fengguang to add the occasional test_pavels_stuff=1 to the kernel boot commandline. That's better than nothing but 0-day only runs on a small number of machine types.
Re: [PATCH] mm, meminit: Serially initialise deferred memory if trace_buf_size is specified
On Fri, 17 Nov 2017 13:19:56 -0500 Pavel Tatashin wrote: > On Thu, Nov 16, 2017 at 5:06 AM, Mel Gorman > wrote: > > 4. Put a check into the page allocator slowpath that triggers serialised > >init if the system is booting and an allocation is about to fail. It > >would be such a cold path that it would never be noticable although it > >would leave dead code in the kernel image once boot had completed > > Hi Mel, > > The forth approach is the best as it is seamless for admins and > engineers, it will also work on any system configuration with any > parameters without any special involvement. Apart from what-mel-said, I'd be concerned that this failsafe would almost never get tested. We should find some way to ensure that this code gets exercised in some people's kernels on a permanent basis and I'm not sure how to do that. One option might be to ask Fengguang to add the occasional test_pavels_stuff=1 to the kernel boot commandline. That's better than nothing but 0-day only runs on a small number of machine types.
Re: [v2,12/18] kbuild: add support for clang LTO
On Mon, 20 Nov 2017 12:21:52 -0800 Sami Tolvanenwrote: > On Sat, Nov 18, 2017 at 01:21:39PM +1000, Nicholas Piggin wrote: > > Do you have any kind of numbers for this, out of curiosity? Binary > > size, performance, build time? > > I don't have performance numbers to share. Are there any specific > benchmarks you'd be interested in seeing? Build time typically > increases with LTO and in my experience, binary size tends to increase > by ~10-15% as well. By deduction, then you must see some performance improvement? :) I just wonder are you doing this because there is some worthwhile performance gain? Or to enable more testing and development of LTO? Any clues for why a user would want to enable it. > > > Why is this needed? It would have been nice to get rid of the > > !THIN_ARCHIVES option if you can make the patches work with the thin > > archives paths. > > I believe LLVMgold doesn't know how to deal with an archive of LLVM IR > files, but I can certainly use thin archives as an index and extract > the path names for linking. I'll look into it. Thanks, if you could. Possibly file a request with LLVMgold too, it seems to be that toolchain support for archives is quite strong, so it will be good to keep pushing for that. Thanks, Nick
Re: [v2,12/18] kbuild: add support for clang LTO
On Mon, 20 Nov 2017 12:21:52 -0800 Sami Tolvanen wrote: > On Sat, Nov 18, 2017 at 01:21:39PM +1000, Nicholas Piggin wrote: > > Do you have any kind of numbers for this, out of curiosity? Binary > > size, performance, build time? > > I don't have performance numbers to share. Are there any specific > benchmarks you'd be interested in seeing? Build time typically > increases with LTO and in my experience, binary size tends to increase > by ~10-15% as well. By deduction, then you must see some performance improvement? :) I just wonder are you doing this because there is some worthwhile performance gain? Or to enable more testing and development of LTO? Any clues for why a user would want to enable it. > > > Why is this needed? It would have been nice to get rid of the > > !THIN_ARCHIVES option if you can make the patches work with the thin > > archives paths. > > I believe LLVMgold doesn't know how to deal with an archive of LLVM IR > files, but I can certainly use thin archives as an index and extract > the path names for linking. I'll look into it. Thanks, if you could. Possibly file a request with LLVMgold too, it seems to be that toolchain support for archives is quite strong, so it will be good to keep pushing for that. Thanks, Nick