date:20180605

Re: [RFC Patch 1/3] X86/Hyper-V: Add flush HvFlushGuestPhysicalAddressSpace hypercall support

2018-06-05 Thread Tianyu Lan

Hi Michael:
Thanks for your review.

On 6/6/2018 12:59 AM, Michael Kelley (EOSG) wrote:
>> -----Original Message-----
>> From: linux-kernel-ow...@vger.kernel.org On Behalf Of Tianyu Lan
>> Sent: Monday, June 4, 2018 2:08 AM
>> Cc: Tianyu Lan; KY Srinivasan; Haiyang Zhang; Stephen Hemminger;
>> t...@linutronix.de; mi...@redhat.com; h...@zytor.com; x...@kernel.org;
>> pbonz...@redhat.com; rkrc...@redhat.com; de...@linuxdriverproject.org;
>> linux-ker...@vger.kernel.org; k...@vger.kernel.org; vkuzn...@redhat.com
>> Subject: [RFC Patch 1/3] X86/Hyper-V: Add flush HvFlushGuestPhysicalAddressSpace hypercall support
>>
>> Hyper-V provides a PV hypercall, HvFlushGuestPhysicalAddressSpace, to flush
>> the nested VM address space mapping in the L1 hypervisor. It reduces the
>> overhead of flushing the EPT TLB among vCPUs. This patch implements it.
>>
>> Signed-off-by: Lan Tianyu 
>> ---
>> diff --git a/arch/x86/hyperv/nested.c b/arch/x86/hyperv/nested.c
>> new file mode 100644
>> index ..17f7c288eccc
>> --- /dev/null
>> +++ b/arch/x86/hyperv/nested.c
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +int hyperv_flush_guest_mapping(u64 as)
>> +{
>> +	struct hv_guest_mapping_flush **flush_pcpu;
>> +	struct hv_guest_mapping_flush *flush;
>> +	u64 status = U64_MAX;
> 
> Initializing status to U64_MAX doesn't seem necessary.
> 
>> +	unsigned long flags;
>> +	int ret = -EFAULT;
>> +
>> +	if (!hv_hypercall_pg)
>> +		goto fault;
>> +
>> +	local_irq_save(flags);
>> +
>> +	flush_pcpu = (struct hv_guest_mapping_flush **)
>> +		this_cpu_ptr(hyperv_pcpu_input_arg);
>> +
>> +	flush = *flush_pcpu;
>> +
>> +	if (unlikely(!flush)) {
>> +		local_irq_restore(flags);
>> +		goto fault;
>> +	}
>> +
>> +	flush->address_space = as;
>> +	flush->flags = 0;
>> +
>> +	status = hv_do_hypercall(HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE,
>> +				 flush, NULL);
>> +	local_irq_restore(flags);
>> +
>> +	if (!(status & HV_HYPERCALL_RESULT_MASK))
>> +		ret = 0;
>> +
>> +fault:
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(hyperv_flush_guest_mapping);
>> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
>> index b8c89265baf0..53bbeb08faea 100644
>> --- a/arch/x86/include/asm/hyperv-tlfs.h
>> +++ b/arch/x86/include/asm/hyperv-tlfs.h
>> @@ -309,6 +309,7 @@ struct ms_hyperv_tsc_page {
>>   #define HV_X64_MSR_REENLIGHTENMENT_CONTROL 0x4106
>>
>>   /* Nested features (CPUID 0x400A) EAX */
>> +#define HV_X64_NESTED_GUSET_MAPPING_FLUSH   BIT(18)
> 
> The #define name is misspelled.  "_GUSET_" should be "_GUEST_".
> And the matching usage in patch 3/3 will need to be updated as well.
> 
> Michael
> 

Nice catch! Will update.
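
For readers following the review, the status check at the end of the quoted function can be sketched in user space as below. This is only an illustration, not the patch itself: sketch_status_to_errno() and SKETCH_RESULT_MASK are hypothetical names, and the 16-bit mask is an assumption about what HV_HYPERCALL_RESULT_MASK covers. It also shows why the initial value is unnecessary: the status word is always supplied by the hypercall before it is tested.

```c
#include <errno.h>
#include <stdint.h>

/* Assumption: the result code sits in the low 16 bits of the status word. */
#define SKETCH_RESULT_MASK 0xffffULL

/* Hypothetical helper: map a hypercall status word to 0 or -EFAULT. */
static int sketch_status_to_errno(uint64_t status)
{
	/* Only the masked result code decides success; other bits are ignored. */
	if (status & SKETCH_RESULT_MASK)
		return -EFAULT;
	return 0;
}
```

A zero result code maps to 0, any non-zero code to -EFAULT, which mirrors the "ret = 0 only on success" structure of the quoted function.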

Re: [PATCH v7 3/3] gpio: pca953x: fix address calculation for pcal6524

2018-06-05 Thread H. Nikolaus Schaller

Hi,

> On 05.06.2018, at 22:39, Pavel Machek wrote:
> 
> On Tue 2018-06-05 18:37:21, Andy Shevchenko wrote:
>> On Wed, May 23, 2018 at 5:06 PM, Pavel Machek  wrote:
>>> On Thu 2018-05-17 06:59:49, H. Nikolaus Schaller wrote:
>>>> The register constants are so far defined in a way that they fit
>>>> for the pcal9555a when shifted by the number of banks, i.e. are
>>>> multiplied by 2 in the accessor function.
>>>>
>>>> Now, the pcal6524 has 3 banks which means the relative offset
>>>> is multiplied by 4 for the standard registers.
>>>>
>>>> Simply applying the bit shift to the extended registers gives
>>>> a wrong result, since the base offset is already included in
>>>> the offset.
>>>>
>>>> Therefore, we have to add code to the 24 bit accessor functions
>>>> that adjusts the register number for these extended registers.
>>>>
>>>> The formula finally used was developed and proposed by
>>>> Andy Shevchenko.
>> 
>>>>   int bank_shift = fls((chip->gpio_chip.ngpio - 1) / BANK_SZ);
>>>> + int addr = (reg & PCAL_GPIO_MASK) << bank_shift;
>>>> + int pinctrl = (reg & PCAL_PINCTRL_MASK) << 1;
>> 
>>> Is this reasonable to do on each register access? Compiler will not be
>>> able to optimize out fls and shifts, right?
>> 
>> On modern CPUs fls() is one assembly command. OTOH, any proposal to do
>> this better?
>> 
>> What I can see is that bank_shift is invariant to the function, and
>> maybe cached.
> 
> Yes, I thought that caching bank_shift might be good idea. I thought
> it was constant for given chip...

Yes, it is a function of the chip only, but I wonder whether the
optimization is worth any effort. This accessor runs over I2C, which
is slow (100 / 400 kHz SCL) compared to the CPU, so saving 1 or 2 CPU
cycles here doesn't seem to be a significant improvement.
Maybe it is more valuable to improve the code path through the I2C core?
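
To make the numbers in the quoted formula concrete, here is a small user-space sketch. It is an illustration under assumptions, not the driver code: the sketch_* names are hypothetical, the two mask values are picked for the example, and adding pinctrl and addr to form the final register address is my reading of how the two terms combine. It also shows that bank_shift depends only on the number of GPIOs, so it could be computed once per chip if that ever mattered.

```c
#include <stdint.h>

#define BANK_SZ 8
/* Assumption: mask values chosen for illustration; check the driver for the real ones. */
#define SKETCH_PCAL_GPIO_MASK    0x1f
#define SKETCH_PCAL_PINCTRL_MASK 0x60

/* Portable stand-in for fls(): 1-based index of the highest set bit, 0 for 0. */
static int sketch_fls(unsigned int x)
{
	int bit = 0;

	while (x) {
		bit++;
		x >>= 1;
	}
	return bit;
}

/* Depends only on the chip, so it could be computed once and cached. */
static int sketch_bank_shift(unsigned int ngpio)
{
	return sketch_fls((ngpio - 1) / BANK_SZ);
}

/* Bank 0 address for a register constant (assumed combination: pinctrl + addr). */
static int sketch_reg_addr(int bank_shift, int reg)
{
	int addr = (reg & SKETCH_PCAL_GPIO_MASK) << bank_shift;
	int pinctrl = (reg & SKETCH_PCAL_PINCTRL_MASK) << 1;

	return pinctrl + addr;
}
```

With 16 GPIOs the shift is 1 (offsets multiplied by 2), with 24 GPIOs it is 2 (multiplied by 4), matching the commit message. Under the assumed masks, a register constant of 0x23 lands at 0x46 on a 16-bit part and at 0x4c on a 24-bit part.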

BR,
Nikolaus

Re: kernel panic in reading /proc/kpageflags when enabling RAM-simulated PMEM

2018-06-05 Thread Naoya Horiguchi

On Tue, Jun 05, 2018 at 07:35:01AM +, Horiguchi Naoya(堀口 直也) wrote:
> On Mon, Jun 04, 2018 at 06:18:36PM -0700, Matthew Wilcox wrote:
> > On Tue, Jun 05, 2018 at 12:54:03AM +, Naoya Horiguchi wrote:
> > > Reproduction procedure is like this:
> > >  - enable RAM based PMEM (with a kernel boot parameter like memmap=1G!4G)
> > >  - read /proc/kpageflags (or call tools/vm/page-types with no arguments)
> > >  (- my kernel config is attached)
> > > 
> > > I spent a few days on this, but didn't reach any solutions.
> > > So let me report this with some details below ...
> > > 
> > > In the critical page request, stable_page_flags() is called with an argument
> > > page whose ->compound_head was somehow filled with '0x'.
> > > And compound_head() returns (struct page *)(head - 1), which explains the
> > > address 0xfffe in the above message.
> > 
> > Hm.  compound_head shares with:
> > 
> > struct list_head lru;
> > struct list_head slab_list; /* uses lru */
> > struct {/* Partial pages */
> > struct page *next;
> > unsigned long _compound_pad_1;  /* compound_head */
> > unsigned long _pt_pad_1;/* compound_head */
> > struct dev_pagemap *pgmap;
> > struct rcu_head rcu_head;
> > 
> > None of them should be -1.
> > 
> > > It seems that this kernel panic happens when reading kpageflags of pfn range
> > > [0xbffd7, 0xc), which corresponds to a 'reserved' range.
> > > 
> > > [0.00] user-defined physical RAM map:
> > > [0.00] user: [mem 0x-0x0009fbff] usable
> > > [0.00] user: [mem 0x0009fc00-0x0009] reserved
> > > [0.00] user: [mem 0x000f-0x000f] reserved
> > > [0.00] user: [mem 0x0010-0xbffd6fff] usable
> > > [0.00] user: [mem 0xbffd7000-0xbfff] reserved
> > > [0.00] user: [mem 0xfeffc000-0xfeff] reserved
> > > [0.00] user: [mem 0xfffc-0x] reserved
> > > [0.00] user: [mem 0x0001-0x00013fff] persistent (type 12)
> > > 
> > > So I guess 'memmap=' parameter might badly affect the memory
> > > initialization process.
> > > 
> > > This problem doesn't reproduce on v4.17, so some pre-released patch
> > > introduces it.
> > > I hope this info helps you find the solution/workaround.
> > 
> > Can you try bisecting this?  It could be one of my patches to reorder struct
> > page, or it could be one of Pavel's deferred page initialisation patches.
> > Or something else ;-)
> 
> Thank you for the comment. I'm trying to bisect now and will let you know
> the result later.
> 
> And I found that my statement "not reproduce on v4.17" was wrong (I used
> different kvm guests, which made for different test conditions and misled me);
> this seems to be an older (at least < 4.15) bug.

(Cc: Pavel)

Bisection showed that the following commit introduced this issue:

  commit f7f99100d8d95dbcf09e0216a143211e79418b9f
  Author: Pavel Tatashin 
  Date:   Wed Nov 15 17:36:44 2017 -0800
  
  mm: stop zeroing memory during allocation in vmemmap

This patch postpones struct page zeroing to a later stage of memory
initialization.
My kernel config disabled CONFIG_DEFERRED_STRUCT_PAGE_INIT, so two call sites of
__init_single_page() were never reached. So in that case, could struct pages
populated by vmemmap_pte_populate() be left uninitialized?
And I'm not sure yet how this issue becomes visible with the memmap= setting.
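
As background for the symptom quoted above, the bit-0 convention of compound_head() can be modelled in user space. This is a sketch under assumptions: sketch_compound_head() is a hypothetical name, plain 64-bit integers stand in for struct page pointers, and the all-ones input is a guess at what an uninitialized word might hold (the exact value is cut off in the quoted report; the "-1" remark above hints at it).

```c
#include <stdint.h>

/*
 * User-space model of the bit-0 convention: if the low bit of the
 * compound_head word is set, the rest is taken as the head page pointer.
 */
static uint64_t sketch_compound_head(uint64_t page_addr, uint64_t compound_head_word)
{
	if (compound_head_word & 1)
		return compound_head_word - 1;	/* tail page: strip the tag bit */
	return page_addr;			/* not a tail page */
}
```

If the word is all ones, bit 0 is set and the model returns the word minus one, i.e. a bogus "head page" address ending in ...fffe, which would fault as soon as it is dereferenced. A zeroed struct page has bit 0 clear and is returned unchanged.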

Thanks,
Naoya Horiguchi

Re: [PATCH v5 2/4] mfd: bd71837: Devicetree bindings for ROHM BD71837 PMIC

2018-06-05 Thread Matti Vaittinen

On Tue, Jun 05, 2018 at 09:47:57AM -0600, Rob Herring wrote:
> On Mon, Jun 04, 2018 at 04:18:30PM +0300, Matti Vaittinen wrote:
> > Document devicetree bindings for ROHM BD71837 PMIC MFD.
> > 
> > Signed-off-by: Matti Vaittinen 
> > ---
> >  .../devicetree/bindings/mfd/rohm,bd71837-pmic.txt  | 76 ++
> >  1 file changed, 76 insertions(+)
> >  create mode 100644 Documentation/devicetree/bindings/mfd/rohm,bd71837-pmic.txt
> 
> I've replied on the prior version discussion. Please don't send new 
> versions if the last one is still under discussion.

All right. I thought this was a good way to suggest something which I
thought might address your concerns. But you are correct, it is better
to continue the discussion in one email thread. Sorry for adding this
version. I'll write further replies in the v4 thread until we reach some
conclusion.

Br,
Matti Vaittinen

[PATCH] x86/iommu: Fix a typo in a macro parameter

2018-06-05 Thread Masatake YAMATO

Signed-off-by: Masatake YAMATO 
---
 arch/x86/include/asm/iommu_table.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/iommu_table.h b/arch/x86/include/asm/iommu_table.h
index 1fb3fd1a83c2..2a0d5f7d1ed1 100644
--- a/arch/x86/include/asm/iommu_table.h
+++ b/arch/x86/include/asm/iommu_table.h
@@ -66,7 +66,7 @@ struct iommu_table_entry {
 #define IOMMU_INIT_POST(_detect)			\
 	__IOMMU_INIT(_detect, pci_swiotlb_detect_4gb,  NULL, NULL, 0)
 
-#define IOMMU_INIT_POST_FINISH(detect)			\
+#define IOMMU_INIT_POST_FINISH(_detect)			\
 	__IOMMU_INIT(_detect, pci_swiotlb_detect_4gb,  NULL, NULL, 1)
 
 /*
-- 
2.17.0
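
Since the patch carries no description, a small stand-alone sketch may help show why the parameter name matters. It is illustrative only: the SKETCH_* macros and my_detect_fn are hypothetical, and stringification is used just to make the substituted token visible. When the parameter is spelled "detect" but the body says "_detect", the argument is silently dropped and the body refers to whatever _detect happens to mean at the call site.

```c
/* Helper that stringifies its (already substituted) argument. */
#define SKETCH_NAME_OF(x) #x

/* Parameter is "detect" but the body says "_detect": the argument is ignored. */
#define SKETCH_BROKEN(detect)  SKETCH_NAME_OF(_detect)

/* Parameter and body agree, so the argument is substituted. */
#define SKETCH_FIXED(_detect)  SKETCH_NAME_OF(_detect)

static const char *sketch_broken_name(void)
{
	return SKETCH_BROKEN(my_detect_fn);	/* the literal token _detect */
}

static const char *sketch_fixed_name(void)
{
	return SKETCH_FIXED(my_detect_fn);	/* the name that was passed in */
}
```

The broken form yields "_detect" no matter what is passed, while the fixed form yields "my_detect_fn". A mismatch like this can go unnoticed as long as nobody uses the macro.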

Re: [PATCH] printk/nmi: Prevent deadlock when serializing NMI backtraces

2018-06-05 Thread Sergey Senozhatsky

On (06/05/18 14:47), Petr Mladek wrote:
[..]
> Grr, the ABBA deadlock is still there. NMIs are not sent to the other
> CPUs atomically. Even if we detect that logbuf_lock is available
> in printk_nmi_enter() on some CPUs, it might still get locked on
> another CPU before the other CPU gets NMI.

Can we do something about "B"? :) I mean - any chance we can rework
locking in nmi_cpu_backtrace()?

> By other words, any check in printk_safe_enter() is racy and not
> sufficient

I suppose you meant printk_nmi_enter().

>   => I suggest to revert the commit 719f6a7040f1bdaf96fcc70
>  "printk: Use the main logbuf in NMI when logbuf_lock is available"
>  for-4.18 and stable until we get a better solution.

Just random thoughts.

Maybe we need to revert it, but let's not "panic". I think [but don't
insist on it] that the patch in question is *probably* "good enough". It
addresses a bug report after all.
How often do we have arch_trigger_cpumask_backtrace() on all CPUs these
days? I tend to think that it used to be much more popular in the past,
because we had a loops_per_jiffy based spin_lock lockup detection which
would trigger NMI backtraces, but this functionality is gone, see
bc88c10d7e6900916f5e1ba3829d66a9de92b633 for details. I'm not saying
that the race condition that you found is unrealistic, I'm just saying
that _it seems_ that nmi_panic()->printk() on a single CPU is more common
now, so having that nmi_printk()->printk_deferred() might be quite
valuable at the end of the day.

Maybe I'm wrong!

> The only safe solution seems to be a trylock() in NMI in
> vprintk_emit() and fallback to vprintk_safe() when the lock
> is not taken. I mean something like:
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 247808333ba4..4a5a0bf221b3 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1845,7 +1845,13 @@ asmlinkage int vprintk_emit(int facility, int level,
>   printk_delay();
>  
>   /* This stops the holder of console_sem just where we want him */
> - logbuf_lock_irqsave(flags);
> + printk_safe_enter_irqsave(flags);
> + if (in_nmi() && !raw_spin_trylock(_lock)) {
> + vprintk_nmi(fmt, args);
> + printk_safe_exit_irqrestore(flags);
> + return;
> + } else
> + raw_spin_lock(_lock);
>   /*
>* The printf needs to come first; we need the syslog
>* prefix which might be passed-in as a parameter.

I need some time to think about it.
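
While thinking about it, the trylock-with-fallback idea can be modelled in user space roughly as follows. This is a sketch under stated assumptions, not a proposal for printk.c: all sketch_* names are hypothetical, a C11 atomic_flag stands in for the log buffer lock, and two counters stand in for the main buffer and the per-CPU fallback buffer. One difference from the quoted diff is deliberate: as I read it, the else branch there would take the lock a second time after a successful trylock in NMI, so the sketch locks exactly once on each path.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_sink { SKETCH_MAIN, SKETCH_FALLBACK };

/* Stand-in for the log buffer lock; clear means unlocked. */
static atomic_flag sketch_lock = ATOMIC_FLAG_INIT;

static bool sketch_trylock(void)
{
	/* test_and_set returns the previous state: false means we got the lock. */
	return !atomic_flag_test_and_set_explicit(&sketch_lock, memory_order_acquire);
}

static void sketch_unlock(void)
{
	atomic_flag_clear_explicit(&sketch_lock, memory_order_release);
}

/* Counters stand in for the main buffer and the per-CPU fallback buffer. */
static size_t sketch_main_msgs, sketch_fallback_msgs;

static enum sketch_sink sketch_emit(bool in_nmi_context)
{
	if (in_nmi_context) {
		/* Never spin in NMI: the lock owner may be the CPU we interrupted. */
		if (!sketch_trylock()) {
			sketch_fallback_msgs++;
			return SKETCH_FALLBACK;
		}
	} else {
		/* Outside NMI it is fine to wait for the lock. */
		while (!sketch_trylock())
			;
	}

	sketch_main_msgs++;
	sketch_unlock();
	return SKETCH_MAIN;
}
```

The point of the model is that the decision is made at the moment the lock is needed, not earlier on NMI entry, so there is no window in which another CPU can take the lock between the check and the use.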

> Sigh, this looks like a material for-4.19.

Agreed.

> We might need to revisit if printk_context still makes sense, ...

What do you mean by this?

> PS: I realized this when writing the pull request for-4.18.
> I removed this patch from the pull request.

Yep. Good job!

-ss

Re: [PATCH v5 3/4] clk: bd71837: Devicetree bindings for ROHM BD71837 PMIC

2018-06-05 Thread Matti Vaittinen

On Tue, Jun 05, 2018 at 09:49:32AM -0600, Rob Herring wrote:
> On Mon, Jun 04, 2018 at 04:18:53PM +0300, Matti Vaittinen wrote:
> > Document devicetree bindings for ROHM BD71837 PMIC clock output.
> > 
> > Signed-off-by: Matti Vaittinen 
> > ---
> >  .../bindings/clock/rohm,bd71837-clock.txt  | 38 ++
> >  1 file changed, 38 insertions(+)
> >  create mode 100644 Documentation/devicetree/bindings/clock/rohm,bd71837-clock.txt
> > 
> > diff --git a/Documentation/devicetree/bindings/clock/rohm,bd71837-clock.txt b/Documentation/devicetree/bindings/clock/rohm,bd71837-clock.txt
> > new file mode 100644
> > index ..771acfe34114
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/clock/rohm,bd71837-clock.txt
> > @@ -0,0 +1,38 @@
> > +ROHM BD71837 Power Management Integrated Circuit clock bindings
> 
> This needs to be added to the MFD doc. One node should be covered by at 
> most 1 document.

I was thinking of that too. But then I asked myself why. I also thought: if
one knows there is a clock block in the chip, where does he look for the
binding document? In the clock folder. Then I saw how the bindings for the
MAX77686 chip were written and thought that this is beneficial for all.
The MFD document directs to the clock and regulator docs and, on the other hand,
the clock document clearly states that the properties it describes must be
present "in main device node of the MFD chip".

Don't you think that someone searching for clock bindings should find something
in the clock folder? I can follow your instruction here, but I think
the user might be happy to find something under bindings/clock for
clock-related properties.

Br,
Matti Vaittinen

Re: [PATCH v3 21/21] sparc64: use match_string() helper

2018-06-05 Thread Andy Shevchenko

On Wed, Jun 6, 2018 at 5:19 AM, Yisheng Xie  wrote:
> match_string() returns the index of an array for a matching string,
> which can be used instead of open coded variant.
>

Thanks for an update.
My comments below.

I think you need to mention the string literal change in the commit message.
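
To illustrate why that change matters, below is a user-space model of how I remember match_string() behaving: it returns the index of the match or -EINVAL, and it stops at the first NULL entry. Treat it as a sketch, not the lib/string.c source; the sketch_* names and the shortened tables are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* User-space model of match_string(): index of the match, or -EINVAL. */
static int sketch_match_string(const char * const *array, size_t n, const char *string)
{
	size_t index;

	for (index = 0; index < n; index++) {
		const char *item = array[index];

		if (!item)	/* a NULL entry ends the search early */
			break;
		if (!strcmp(item, string))
			return (int)index;
	}
	return -EINVAL;
}

/* Shortened, hypothetical tables in the spirit of hwcaps[]. */
static const char * const sketch_caps_with_null[] = { "pause", "cbcond", NULL, "adp" };
static const char * const sketch_caps_with_resv[] = { "pause", "cbcond", "resv", "adp" };
```

With the NULL placeholder, a lookup of "adp" never gets past index 2 and fails; with a string placeholder it is found at index 3. That is the behaviour change worth a sentence in the commit message.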

> Cc: "David S. Miller" 
> Cc: Anthony Yznaga 
> Cc: Pavel Tatashin 
> Cc: sparcli...@vger.kernel.org
> Signed-off-by: Yisheng Xie 
> ---
> v3:
>  - add string literal instead of NULL for array hwcaps to make it
>can use match_string() too.  - per Andy
> v2
>  - new add for use match_string() helper patchset.
>
>  arch/sparc/kernel/setup_64.c | 23 +--
>  1 file changed, 9 insertions(+), 14 deletions(-)
>
> diff --git a/arch/sparc/kernel/setup_64.c b/arch/sparc/kernel/setup_64.c
> index 7944b3c..4f0ec0c 100644
> --- a/arch/sparc/kernel/setup_64.c
> +++ b/arch/sparc/kernel/setup_64.c
> @@ -401,7 +401,7 @@ void __init start_early_boot(void)
>  */
> "mul32", "div32", "fsmuld", "v8plus", "popc", "vis", "vis2",
> "ASIBlkInit", "fmaf", "vis3", "hpc", "random", "trans", "fjfmau",
> -   "ima", "cspare", "pause", "cbcond", NULL /*reserved for crypto */,
> +   "ima", "cspare", "pause", "cbcond", "resv" /*reserved for crypto */,
> "adp",

Why not spell "crypto" explicitly and remove the comment?

>  };
>
> @@ -418,7 +418,7 @@ void cpucap_info(struct seq_file *m)
> seq_puts(m, "cpucaps\t\t: ");
> for (i = 0; i < ARRAY_SIZE(hwcaps); i++) {
> unsigned long bit = 1UL << i;
> -   if (hwcaps[i] && (caps & bit)) {
> +   if (bit != HWCAP_SPARC_CRYPTO && (caps & bit)) {

I would rather swap the order of the subconditions to check if caps has the
bit first, and then exclude CRYPTO.

> seq_printf(m, "%s%s",
>printed ? "," : "", hwcaps[i]);
> printed++;
> @@ -472,7 +472,7 @@ static void __init report_hwcaps(unsigned long caps)
>
> for (i = 0; i < ARRAY_SIZE(hwcaps); i++) {
> unsigned long bit = 1UL << i;
> -   if (hwcaps[i] && (caps & bit))
> +   if (bit != HWCAP_SPARC_CRYPTO && (caps & bit))
> report_one_hwcap(, hwcaps[i]);

Ditto.

> }
> if (caps & HWCAP_SPARC_CRYPTO)
> @@ -504,18 +504,13 @@ static unsigned long __init mdesc_cpu_hwcap_list(void)
> while (len) {
> int i, plen;
>
> -   for (i = 0; i < ARRAY_SIZE(hwcaps); i++) {
> -   unsigned long bit = 1UL << i;
> +   i = match_string(hwcaps, ARRAY_SIZE(hwcaps), prop);
> +   if (i >= 0)
> +   caps |= (1UL << i);

Parens are redundant (and actually weren't present in the original code above).

>
> -   if (hwcaps[i] && !strcmp(prop, hwcaps[i])) {
> -   caps |= bit;
> -   break;
> -   }
> -   }
> -   for (i = 0; i < ARRAY_SIZE(crypto_hwcaps); i++) {
> -   if (!strcmp(prop, crypto_hwcaps[i]))
> -   caps |= HWCAP_SPARC_CRYPTO;
> -   }
> +   i = match_string(crypto_hwcaps, ARRAY_SIZE(crypto_hwcaps), prop);
> +   if (i >= 0)
> +   caps |= HWCAP_SPARC_CRYPTO;
>
> plen = strlen(prop) + 1;
> prop += plen;
> --
> 1.7.12.4
>
>
>



-- 
With Best Regards,
Andy Shevchenko

Re: [PATCH] cpufreq: kryo: allow building as a loadable module

2018-06-05 Thread Ilia Lin

Viresh got ahead of my answer a bit. :)
Sure, I'll post the module exit patch later.

On June 6, 2018 7:09:29 AM GMT+03:00, Viresh Kumar wrote:
>On 05-06-18, 13:44, Arnd Bergmann wrote:
>> Building the kryo cpufreq driver while QCOM_SMEM is a loadable module
>> results in a link error:
>> 
>> drivers/cpufreq/qcom-cpufreq-kryo.o: In function `qcom_cpufreq_kryo_probe':
>> qcom-cpufreq-kryo.c:(.text+0xbc): undefined reference to `qcom_smem_get'
>> 
>> The problem is that Kconfig interprets the dependency as met
>> when the dependent symbol is a 'bool' one. By making it 'tristate',
>> it will be forced to be a module here, which builds successfully.
>> 
>> Fixes: 46e2856b8e18 ("cpufreq: Add Kryo CPU scaling driver")
>> Signed-off-by: Arnd Bergmann 
>> ---
>>  drivers/cpufreq/Kconfig.arm | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/drivers/cpufreq/Kconfig.arm b/drivers/cpufreq/Kconfig.arm
>> index c7ce928fbf1f..52f5f1a2040c 100644
>> --- a/drivers/cpufreq/Kconfig.arm
>> +++ b/drivers/cpufreq/Kconfig.arm
>> @@ -125,7 +125,7 @@ config ARM_OMAP2PLUS_CPUFREQ
>>  default ARCH_OMAP2PLUS
>>  
>>  config ARM_QCOM_CPUFREQ_KRYO
>> -bool "Qualcomm Kryo based CPUFreq"
>> +tristate "Qualcomm Kryo based CPUFreq"
>>  depends on ARM64
>>  depends on QCOM_QFPROM
>>  depends on QCOM_SMEM
>
>Okay, so we really need this to be a module. But the driver can't really work
>as a module right now if we do this: insmod, rmmod, insmod. Because it doesn't
>free resources at rmmod, it will fail on the second insmod.
>
>Because what you are fixing is a critical build error, we had better get it
>merged right now.
>
>Acked-by: Viresh Kumar 
>
>But Ilia needs to cook another patch to add the module removal code for the
>driver and mark your patch's commit id in the Fixes tag.
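
The bool versus tristate point in the commit message can be sketched as a small model. This is a simplified illustration of the dependency rule as described in the thread, not Kconfig's actual implementation; the names tristate, max_allowed and link_would_fail are hypothetical.

```c
#include <stdbool.h>

/* Kconfig symbol states, ordered n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/*
 * Highest value a symbol may take given the value of its dependency.
 * A tristate symbol is capped at the dependency's value; a bool symbol
 * has no "m" state, so any non-n dependency lets it become y.
 */
static enum tristate max_allowed(bool symbol_is_tristate, enum tristate dep)
{
	if (dep == TRI_N)
		return TRI_N;
	if (symbol_is_tristate)
		return dep;
	return TRI_Y;
}

/* Built-in code that calls into a module cannot be linked into vmlinux. */
static bool link_would_fail(enum tristate user, enum tristate provider)
{
	return user == TRI_Y && provider == TRI_M;
}
```

With QCOM_SMEM=m, the bool form of the driver symbol can still be y, which is the failing combination; the tristate form is limited to m, so the reference to qcom_smem_get is resolved at module load time instead.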

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: [RFC PATCH -tip v5 06/27] arm64: kprobes: Remove jprobe implementation

2018-06-05 Thread Masami Hiramatsu

On Tue, 5 Jun 2018 16:34:21 +0100
Will Deacon  wrote:

> On Tue, Jun 05, 2018 at 12:51:24AM +0900, Masami Hiramatsu wrote:
> > Remove arch dependent setjump/longjump functions
> > and unused fields in kprobe_ctlblk for jprobes
> > from arch/arm64.
> > 
> > Signed-off-by: Masami Hiramatsu 
> > Cc: Catalin Marinas 
> > Cc: Will Deacon 
> > Cc: linux-arm-ker...@lists.infradead.org
> > ---
> >  arch/arm64/include/asm/kprobes.h   |1 -
> >  arch/arm64/kernel/probes/kprobes.c |   68 
> > 
> >  2 files changed, 69 deletions(-)
> 
> Acked-by: Will Deacon 

Thank you Will!


-- 
Masami Hiramatsu

Re: [PATCH 1/2] dt-bindings: cpufreq: Introduce QCOM CPUFREQ FW bindings

2018-06-05 Thread Viresh Kumar

On 04-06-18, 16:16, Taniya Das wrote:
> Add QCOM cpufreq firmware device bindings for Qualcomm Technology Inc's
> SoCs. This is required for managing the cpu frequency transitions which are
> controlled by firmware.
> 
> Signed-off-by: Taniya Das 
> ---
>  .../bindings/cpufreq/cpufreq-qcom-fw.txt   | 173 
> +
>  1 file changed, 173 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/cpufreq/cpufreq-qcom-fw.txt
> 
> diff --git a/Documentation/devicetree/bindings/cpufreq/cpufreq-qcom-fw.txt b/Documentation/devicetree/bindings/cpufreq/cpufreq-qcom-fw.txt
> new file mode 100644
> index 000..e3087ec
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/cpufreq/cpufreq-qcom-fw.txt
> @@ -0,0 +1,173 @@
> +Qualcomm Technologies, Inc. CPUFREQ Bindings
> +
> +CPUFREQ FW is a hardware engine used by some Qualcomm Technologies, Inc. (QTI)
> +SoCs to manage frequency in hardware. It is capable of controlling frequency
> +for multiple clusters.
> +
> +Properties:
> +- compatible
> + Usage:  required
> + Value type: 
> + Definition: must be "qcom,cpufreq-fw".
> +
> +* Property qcom,freq-domain
> +Devices supporting freq-domain must set their "qcom,freq-domain" property with
> +phandle to a freq_domain_table in their DT node.
> +
> +* Frequency Domain Table Node
> +
> +This describes the frequency domain belonging to a device.
> +This node can have the following properties:
> +
> +- reg
> + Usage:  required
> + Value type: 
> + Definition: Addresses and sizes for the memory of the perf,
> + lut and enable bases.
> + perf - indicates the base address for the desired
> + performance state to be set.
> + lut - indicates the look up table base address for the
> + cpufreq driver to read frequencies.
> + enable - indicates the enable register for firmware.
> +- reg-names
> + Usage:  required
> + Value type: 
> + Definition: Address names. Must be "perf", "lut", "enable".
> + Must be specified in the same order as the reg property.
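
The pairing between "reg" and "reg-names" described above can be sketched in plain C as a lookup by name into an array that follows the same order. This is only an illustration of the ordering rule; struct reg_entry and find_reg are hypothetical names, and a real driver would go through the kernel's device tree helpers instead.

```c
#include <stddef.h>
#include <string.h>

/* One (address, size) pair from a "reg" property. */
struct reg_entry {
	unsigned long base;
	unsigned long size;
};

/* Names required by the binding, in the order they must appear. */
static const char *const reg_names[] = { "perf", "lut", "enable" };

/*
 * Return the reg entry matching name, or NULL if name is not listed.
 * regs must hold one entry per name, in the same order as reg_names.
 */
static const struct reg_entry *find_reg(const struct reg_entry *regs,
					const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(reg_names) / sizeof(reg_names[0]); i++) {
		if (!strcmp(reg_names[i], name))
			return &regs[i];
	}
	return NULL;
}
```

If the two properties were listed in different orders, a lookup by name would silently return the wrong region, which is why the binding insists on matching order.
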
> +
> +Example:
> +
> +Example 1: Dual-cluster, Quad-core per cluster. CPUs within a cluster switch
> +DCVS state together.
> +
> +/ {
> + cpus {
> + #address-cells = <2>;
> + #size-cells = <0>;
> +
> + CPU0: cpu@0 {
> + device_type = "cpu";
> + compatible = "qcom,kryo385";
> + reg = <0x0 0x0>;
> + enable-method = "psci";
> + next-level-cache = <&L2_0>;
> + qcom,freq-domain = <&freq_domain_table0>;
> + L2_0: l2-cache {
> + compatible = "cache";
> + next-level-cache = <&L3_0>;
> + L3_0: l3-cache {
> +   compatible = "cache";
> + };
> + };
> + };
> +
> + CPU1: cpu@100 {
> + device_type = "cpu";
> + compatible = "qcom,kryo385";
> + reg = <0x0 0x100>;
> + enable-method = "psci";
> + next-level-cache = <&L2_100>;
> + qcom,freq-domain = <&freq_domain_table0>;
> + L2_100: l2-cache {
> + compatible = "cache";
> + next-level-cache = <&L3_0>;
> + };
> + };
> +
> + CPU2: cpu@200 {
> + device_type = "cpu";
> + compatible = "qcom,kryo385";
> + reg = <0x0 0x200>;
> + enable-method = "psci";
> + next-level-cache = <&L2_200>;
> + qcom,freq-domain = <&freq_domain_table0>;
> + L2_200: l2-cache {
> + compatible = "cache";
> + next-level-cache = <&L3_0>;
> + };
> + };
> +
> + CPU3: cpu@300 {
> + device_type = "cpu";
> + compatible = "qcom,kryo385";
> + reg = <0x0 0x300>;
> + enable-method = "psci";
> + next-level-cache = <&L2_300>;
> + qcom,freq-domain = <&freq_domain_table0>;
> + L2_300: l2-cache {
> + compatible = "cache";
> + next-level-cache = <&L3_0>;
> + };
> + };
> +
> + CPU4: cpu@400 {
> + device_type = "cpu";
> + compatible = "qcom,kryo385";
> + reg = <0x0 0x400>;
> + enable-method = "psci";
> +

[PATCH v3 1/3] arm64/mm: pass swapper_pg_dir as an argument to __enable_mmu()

2018-06-05 Thread Jun Yao

Introduce __pa_swapper_pg_dir to hold the physical address of
swapper_pg_dir, and pass it as an argument to __enable_mmu().

Signed-off-by: Jun Yao 
---
 arch/arm64/include/asm/mmu_context.h |  4 +---
 arch/arm64/include/asm/pgtable.h |  1 +
 arch/arm64/kernel/cpufeature.c   |  2 +-
 arch/arm64/kernel/head.S |  6 --
 arch/arm64/kernel/hibernate.c|  2 +-
 arch/arm64/kernel/sleep.S|  1 +
 arch/arm64/mm/kasan_init.c   |  4 ++--
 arch/arm64/mm/mmu.c  | 20 ++--
 8 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 39ec0b8a689e..3eddb871f251 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -141,14 +141,12 @@ static inline void cpu_install_idmap(void)
  * Atomically replaces the active TTBR1_EL1 PGD with a new VA-compatible PGD,
  * avoiding the possibility of conflicting TLB entries being allocated.
  */
-static inline void cpu_replace_ttbr1(pgd_t *pgdp)
+static inline void cpu_replace_ttbr1(phys_addr_t pgd_phys)
 {
typedef void (ttbr_replace_func)(phys_addr_t);
extern ttbr_replace_func idmap_cpu_replace_ttbr1;
ttbr_replace_func *replace_phys;
 
-   phys_addr_t pgd_phys = virt_to_phys(pgdp);
-
replace_phys = (void *)__pa_symbol(idmap_cpu_replace_ttbr1);
 
cpu_install_idmap();
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 7c4c8f318ba9..519ab5581b08 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -722,6 +722,7 @@ extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
 extern pgd_t swapper_pg_end[];
 extern pgd_t idmap_pg_dir[PTRS_PER_PGD];
 extern pgd_t tramp_pg_dir[PTRS_PER_PGD];
+extern volatile phys_addr_t __pa_swapper_pg_dir;
 
 /*
  * Encode and decode a swap entry:
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index d2856b129097..e3d76a9dd67a 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -917,7 +917,7 @@ kpti_install_ng_mappings(const struct arm64_cpu_capabilities *__unused)
remap_fn = (void *)__pa_symbol(idmap_kpti_install_ng_mappings);
 
cpu_install_idmap();
-   remap_fn(cpu, num_online_cpus(), __pa_symbol(swapper_pg_dir));
+   remap_fn(cpu, num_online_cpus(), __pa_swapper_pg_dir);
cpu_uninstall_idmap();
 
if (!cpu)
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index b0853069702f..2e871b1cb75f 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -706,6 +706,7 @@ secondary_startup:
 * Common entry point for secondary CPUs.
 */
bl  __cpu_setup // initialise processor
+   ldr_l   x26, __pa_swapper_pg_dir
bl  __enable_mmu
ldr x8, =__secondary_switched
br  x8
@@ -748,6 +749,7 @@ ENDPROC(__secondary_switched)
  * Enable the MMU.
  *
  *  x0  = SCTLR_EL1 value for turning on the MMU.
+ *  x26 = TTBR1 value for turning on the MMU.
  *
  * Returns to the caller via x30/lr. This requires the caller to be covered
  * by the .idmap.text section.
@@ -762,9 +764,8 @@ ENTRY(__enable_mmu)
b.ne    __no_granule_support
update_early_cpu_boot_status 0, x1, x2
adrp    x1, idmap_pg_dir
-   adrp    x2, swapper_pg_dir
phys_to_ttbr x3, x1
-   phys_to_ttbr x4, x2
+   phys_to_ttbr x4, x26
msr ttbr0_el1, x3   // load TTBR0
msr ttbr1_el1, x4   // load TTBR1
isb
@@ -823,6 +824,7 @@ __primary_switch:
mrs x20, sctlr_el1  // preserve old SCTLR_EL1 value
 #endif
 
+   adrp    x26, swapper_pg_dir
bl  __enable_mmu
 #ifdef CONFIG_RELOCATABLE
bl  __relocate_kernel
diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
index 6b2686d54411..0a0a0ca19f9b 100644
--- a/arch/arm64/kernel/hibernate.c
+++ b/arch/arm64/kernel/hibernate.c
@@ -125,7 +125,7 @@ int arch_hibernation_header_save(void *addr, unsigned int max_size)
return -EOVERFLOW;
 
arch_hdr_invariants(&hdr->invariants);
-   hdr->ttbr1_el1  = __pa_symbol(swapper_pg_dir);
+   hdr->ttbr1_el1  = __pa_swapper_pg_dir;
hdr->reenter_kernel = _cpu_resume;
 
/* We can't use __hyp_get_vectors() because kvm may still be loaded */
diff --git a/arch/arm64/kernel/sleep.S b/arch/arm64/kernel/sleep.S
index bebec8ef9372..03854c329449 100644
--- a/arch/arm64/kernel/sleep.S
+++ b/arch/arm64/kernel/sleep.S
@@ -101,6 +101,7 @@ ENTRY(cpu_resume)
bl  el2_setup   // if in EL2 drop to EL1 cleanly
bl  __cpu_setup
/* enable the MMU early - so we can access sleep_save_stash by va */
+   ldr_l   x26, __pa_swapper_pg_dir
bl  __enable_mmu
ldr x8,

[PATCH v3 3/3] arm64/mm: migrate swapper_pg_dir and tramp_pg_dir

2018-06-05 Thread Jun Yao

Migrate swapper_pg_dir and tramp_pg_dir so that their virtual addresses
no longer correlate with the kernel's address.

Signed-off-by: Jun Yao 
---
 arch/arm64/include/asm/pgtable.h |  1 +
 arch/arm64/mm/mmu.c  | 79 +---
 2 files changed, 52 insertions(+), 28 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 2bda899dcf22..b032d6c2e390 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -722,6 +722,7 @@ extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
 extern pgd_t swapper_pg_end[];
 extern pgd_t idmap_pg_dir[PTRS_PER_PGD];
 extern pgd_t tramp_pg_dir[PTRS_PER_PGD];
+extern pgd_t reserved_ttbr0[PTRS_PER_PGD];
 extern volatile phys_addr_t __pa_swapper_pg_dir;
 extern pgd_t *new_swapper_pg_dir;
 
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 94056e064c6f..ba0b55158971 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -63,6 +63,9 @@ volatile phys_addr_t __section(".mmuoff.data.read")
 __pa_swapper_pg_dir;
 
 pgd_t *new_swapper_pg_dir = swapper_pg_dir;
+#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
+pgd_t *new_tramp_pg_dir;
+#endif
 
 /*
  * Empty_zero_page is a special page that is used for zero-initialized data
@@ -86,19 +89,14 @@ pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 }
 EXPORT_SYMBOL(phys_mem_access_prot);
 
-static phys_addr_t __init early_pgtable_alloc(void)
+static void __init clear_page_phys(phys_addr_t phys)
 {
-   phys_addr_t phys;
-   void *ptr;
-
-   phys = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
-
/*
 * The FIX_{PGD,PUD,PMD} slots may be in active use, but the FIX_PTE
 * slot will be free, so we can (ab)use the FIX_PTE slot to initialise
 * any level of table.
 */
-   ptr = pte_set_fixmap(phys);
+   void *ptr = pte_set_fixmap(phys);
 
memset(ptr, 0, PAGE_SIZE);
 
@@ -107,6 +105,14 @@ static phys_addr_t __init early_pgtable_alloc(void)
 * table walker
 */
pte_clear_fixmap();
+}
+
+static phys_addr_t __init early_pgtable_alloc(void)
+{
+   phys_addr_t phys;
+
+   phys = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
+   clear_page_phys(phys);
 
return phys;
 }
@@ -560,6 +566,10 @@ static int __init map_entry_trampoline(void)
__create_pgd_mapping(tramp_pg_dir, pa_start, TRAMP_VALIAS, PAGE_SIZE,
 prot, pgd_pgtable_alloc, 0);
 
+   memcpy(new_tramp_pg_dir, tramp_pg_dir, PGD_SIZE);
+   memblock_free(__pa_symbol(tramp_pg_dir),
+   __pa_symbol(swapper_pg_dir) - __pa_symbol(tramp_pg_dir));
+
/* Map both the text and data into the kernel page table */
__set_fixmap(FIX_ENTRY_TRAMP_TEXT, pa_start, prot);
if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) {
@@ -637,10 +647,29 @@ static void __init map_kernel(pgd_t *pgdp)
  */
 void __init paging_init(void)
 {
-   phys_addr_t pgd_phys = early_pgtable_alloc();
-   pgd_t *pgdp = pgd_set_fixmap(pgd_phys);
+   phys_addr_t pgd_phys;
+   pgd_t *pgdp;
+   phys_addr_t mem_size;
 
-   __pa_swapper_pg_dir = __pa_symbol(swapper_pg_dir);
+   mem_size = __pa_symbol(swapper_pg_dir) + PAGE_SIZE
+   - (__pa_symbol(idmap_pg_dir) + IDMAP_DIR_SIZE);
+
+   if (mem_size == PAGE_SIZE) {
+   pgd_phys = early_pgtable_alloc();
+   __pa_swapper_pg_dir = pgd_phys;
+   } else {
+   phys_addr_t p;
+
+   pgd_phys = memblock_alloc(mem_size, PAGE_SIZE);
+
+   for (p = pgd_phys; p < pgd_phys + mem_size; p += PAGE_SIZE)
+   clear_page_phys(p);
+
+   #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
+   new_tramp_pg_dir = __va(pgd_phys);
+   #endif
+   __pa_swapper_pg_dir = pgd_phys + mem_size - PAGE_SIZE;
+   }
 
/*
 * We need to clean '__pa_swapper_pg_dir' to the PoC, so that
@@ -649,31 +678,25 @@ void __init paging_init(void)
__flush_dcache_area((void *)&__pa_swapper_pg_dir,
sizeof(__pa_swapper_pg_dir));
 
+   new_swapper_pg_dir = __va(__pa_swapper_pg_dir);
+
+   pgdp = pgd_set_fixmap(__pa_swapper_pg_dir);
+
map_kernel(pgdp);
map_mem(pgdp);
 
-   /*
-* We want to reuse the original swapper_pg_dir so we don't have to
-* communicate the new address to non-coherent secondaries in
-* secondary_entry, and so cpu_switch_mm can generate the address with
-* adrp+add rather than a load from some global variable.
-*
-* To do this we need to go via a temporary pgd.
-*/
-   cpu_replace_ttbr1(pgd_phys);
-   memcpy(swapper_pg_dir, pgdp, PGD_SIZE);
cpu_replace_ttbr1(__pa_swapper_pg_dir);
+   init_mm.pgd = new_swapper_pg_dir;
 
pgd_clear_fixmap();
-   memblock_free(pgd_phys, PAGE_SIZE);
 
-   /*
-* We only reuse the PGD from the swapper_pg_dir,
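
The size arithmetic in paging_init() above can be worked through with a small user-space sketch. It assumes 4 KiB pages and that the page directories sit back to back in the image, which is a reading of the mem_size expression rather than something stated in the patch; region_size, new_swapper_pa and DEMO_PAGE_SIZE are hypothetical names.

```c
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096ULL	/* assumed 4 KiB pages */

/*
 * Bytes to allocate: everything after idmap_pg_dir up to and including
 * the first page of swapper_pg_dir. A result of one page means nothing
 * sits between the two directories.
 */
static uint64_t region_size(uint64_t idmap_pa, uint64_t idmap_dir_size,
			    uint64_t swapper_pa)
{
	return swapper_pa + DEMO_PAGE_SIZE - (idmap_pa + idmap_dir_size);
}

/* The new swapper PGD occupies the last page of the allocated region. */
static uint64_t new_swapper_pa(uint64_t region_pa, uint64_t size)
{
	return region_pa + size - DEMO_PAGE_SIZE;
}
```

For example, with a three-page idmap_pg_dir followed by a single trampoline page, the region is two pages long and the new swapper PGD is its second page.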

[PATCH v3 3/3] arm64/mm: migrate swapper_pg_dir and tramp_pg_dir

2018-06-05 Thread Jun Yao

Migrate swapper_pg_dir and tramp_pg_dir. And their virtual addresses
do not correlate with kernel's address.

Signed-off-by: Jun Yao 
---
 arch/arm64/include/asm/pgtable.h |  1 +
 arch/arm64/mm/mmu.c  | 79 +---
 2 files changed, 52 insertions(+), 28 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 2bda899dcf22..b032d6c2e390 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -722,6 +722,7 @@ extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
 extern pgd_t swapper_pg_end[];
 extern pgd_t idmap_pg_dir[PTRS_PER_PGD];
 extern pgd_t tramp_pg_dir[PTRS_PER_PGD];
+extern pgd_t reserved_ttbr0[PTRS_PER_PGD];
 extern volatile phys_addr_t __pa_swapper_pg_dir;
 extern pgd_t *new_swapper_pg_dir;
 
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 94056e064c6f..ba0b55158971 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -63,6 +63,9 @@ volatile phys_addr_t __section(".mmuoff.data.read")
 __pa_swapper_pg_dir;
 
 pgd_t *new_swapper_pg_dir = swapper_pg_dir;
+#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
+pgd_t *new_tramp_pg_dir;
+#endif
 
 /*
  * Empty_zero_page is a special page that is used for zero-initialized data
@@ -86,19 +89,14 @@ pgprot_t phys_mem_access_prot(struct file *file, unsigned 
long pfn,
 }
 EXPORT_SYMBOL(phys_mem_access_prot);
 
-static phys_addr_t __init early_pgtable_alloc(void)
+static void __init clear_page_phys(phys_addr_t phys)
 {
-   phys_addr_t phys;
-   void *ptr;
-
-   phys = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
-
/*
 * The FIX_{PGD,PUD,PMD} slots may be in active use, but the FIX_PTE
 * slot will be free, so we can (ab)use the FIX_PTE slot to initialise
 * any level of table.
 */
-   ptr = pte_set_fixmap(phys);
+   void *ptr = pte_set_fixmap(phys);
 
memset(ptr, 0, PAGE_SIZE);
 
@@ -107,6 +105,14 @@ static phys_addr_t __init early_pgtable_alloc(void)
 * table walker
 */
pte_clear_fixmap();
+}
+
+static phys_addr_t __init early_pgtable_alloc(void)
+{
+   phys_addr_t phys;
+
+   phys = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
+   clear_page_phys(phys);
 
return phys;
 }
@@ -560,6 +566,10 @@ static int __init map_entry_trampoline(void)
__create_pgd_mapping(tramp_pg_dir, pa_start, TRAMP_VALIAS, PAGE_SIZE,
 prot, pgd_pgtable_alloc, 0);
 
+   memcpy(new_tramp_pg_dir, tramp_pg_dir, PGD_SIZE);
+   memblock_free(__pa_symbol(tramp_pg_dir),
+   __pa_symbol(swapper_pg_dir) - __pa_symbol(tramp_pg_dir));
+
/* Map both the text and data into the kernel page table */
__set_fixmap(FIX_ENTRY_TRAMP_TEXT, pa_start, prot);
if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) {
@@ -637,10 +647,29 @@ static void __init map_kernel(pgd_t *pgdp)
  */
 void __init paging_init(void)
 {
-   phys_addr_t pgd_phys = early_pgtable_alloc();
-   pgd_t *pgdp = pgd_set_fixmap(pgd_phys);
+   phys_addr_t pgd_phys;
+   pgd_t *pgdp;
+   phys_addr_t mem_size;
 
-   __pa_swapper_pg_dir = __pa_symbol(swapper_pg_dir);
+   mem_size = __pa_symbol(swapper_pg_dir) + PAGE_SIZE
+   - (__pa_symbol(idmap_pg_dir) + IDMAP_DIR_SIZE);
+
+   if (mem_size == PAGE_SIZE) {
+   pgd_phys = early_pgtable_alloc();
+   __pa_swapper_pg_dir = pgd_phys;
+   } else {
+   phys_addr_t p;
+
+   pgd_phys = memblock_alloc(mem_size, PAGE_SIZE);
+
+   for (p = pgd_phys; p < pgd_phys + mem_size; p += PAGE_SIZE)
+   clear_page_phys(p);
+
+   #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
+   new_tramp_pg_dir = __va(pgd_phys);
+   #endif
+   __pa_swapper_pg_dir = pgd_phys + mem_size - PAGE_SIZE;
+   }
 
/*
 * We need to clean '__pa_swapper_pg_dir' to the PoC, so that
@@ -649,31 +678,25 @@ void __init paging_init(void)
__flush_dcache_area((void *)&__pa_swapper_pg_dir,
sizeof(__pa_swapper_pg_dir));
 
+   new_swapper_pg_dir = __va(__pa_swapper_pg_dir);
+
+   pgdp = pgd_set_fixmap(__pa_swapper_pg_dir);
+
map_kernel(pgdp);
map_mem(pgdp);
 
-   /*
-* We want to reuse the original swapper_pg_dir so we don't have to
-* communicate the new address to non-coherent secondaries in
-* secondary_entry, and so cpu_switch_mm can generate the address with
-* adrp+add rather than a load from some global variable.
-*
-* To do this we need to go via a temporary pgd.
-*/
-   cpu_replace_ttbr1(pgd_phys);
-   memcpy(swapper_pg_dir, pgdp, PGD_SIZE);
cpu_replace_ttbr1(__pa_swapper_pg_dir);
+   init_mm.pgd = new_swapper_pg_dir;
 
pgd_clear_fixmap();
-   memblock_free(pgd_phys, PAGE_SIZE);
 
-   /*
-* We only reuse the PGD from the swapper_pg_dir,
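
The sizing arithmetic in the paging_init() hunk above can be sketched in userspace as follows. This is only an illustration under stated assumptions: a 4 KiB PAGE_SIZE, and the names region_size(), place_dirs() and struct pgd_layout are made up here and do not appear in the patch. It assumes, as the patch does, that everything linked between the end of idmap_pg_dir and the end of swapper_pg_dir is moved as one block, with swapper_pg_dir occupying the last page.

```c
#include <stdint.h>

#define PAGE_SIZE 4096ULL	/* assumption: 4 KiB pages */

/* Hypothetical result type: where the migrated directories end up. */
struct pgd_layout {
	uint64_t first_pa;	/* first page of the new block */
	uint64_t swapper_pa;	/* last page of the new block */
};

/*
 * Bytes from the end of idmap_pg_dir up to and including the
 * swapper_pg_dir page; mirrors the mem_size expression in the hunk.
 */
uint64_t region_size(uint64_t idmap_pa, uint64_t idmap_dir_size,
		     uint64_t swapper_pa)
{
	return swapper_pa + PAGE_SIZE - (idmap_pa + idmap_dir_size);
}

/*
 * Given a fresh allocation of mem_size bytes at alloc_pa, the new
 * swapper_pg_dir is the last page. When mem_size == PAGE_SIZE the
 * block holds only swapper_pg_dir, so both fields coincide.
 */
struct pgd_layout place_dirs(uint64_t alloc_pa, uint64_t mem_size)
{
	struct pgd_layout l;

	l.first_pa = alloc_pa;
	l.swapper_pa = alloc_pa + mem_size - PAGE_SIZE;
	return l;
}
```

For example, with idmap_pg_dir at 0x1000 spanning three pages and swapper_pg_dir at 0x6000, region_size() gives three pages, and a block allocated at 0x80000 places the new swapper_pg_dir at 0x82000. The addresses are arbitrary.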

[PATCH v3 0/3] arm64/mm: migrate swapper_pg_dir

2018-06-05 Thread Jun Yao

Version 3 changes:
* Fix memory leak problem with CONFIG_ARM64_SW_TTBR0_PAN
* Add a comment explaining why the flush is needed and why
  __pa_swapper_pg_dir is placed in the .mmuoff.data.read
  section.

[v2] https://www.spinics.net/lists/arm-kernel/msg657549.html
[v1] https://www.spinics.net/lists/kernel/msg2819351.html

Jun Yao (3):
  arm64/mm: pass swapper_pg_dir as an argument to __enable_mmu()
  arm64/mm: introduce variable to save new swapper_pg_dir address
  arm64/mm: migrate swapper_pg_dir and tramp_pg_dir

 arch/arm64/include/asm/mmu_context.h |  6 +-
 arch/arm64/include/asm/pgtable.h |  3 +
 arch/arm64/kernel/cpufeature.c   |  2 +-
 arch/arm64/kernel/head.S |  6 +-
 arch/arm64/kernel/hibernate.c|  2 +-
 arch/arm64/kernel/sleep.S|  1 +
 arch/arm64/mm/kasan_init.c   |  6 +-
 arch/arm64/mm/mmu.c  | 97 
 8 files changed, 84 insertions(+), 39 deletions(-)

-- 
2.17.0

[PATCH v3 2/3] arm64/mm: introduce variable to save new swapper_pg_dir address

2018-06-05 Thread Jun Yao

To prepare for migrating swapper_pg_dir, introduce new_swapper_pg_dir
to hold the virtual address of swapper_pg_dir.

Signed-off-by: Jun Yao 
---
 arch/arm64/include/asm/mmu_context.h | 2 +-
 arch/arm64/include/asm/pgtable.h | 1 +
 arch/arm64/mm/kasan_init.c   | 2 +-
 arch/arm64/mm/mmu.c  | 2 ++
 4 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 3eddb871f251..481c2d16adeb 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -57,7 +57,7 @@ static inline void cpu_set_reserved_ttbr0(void)
 
 static inline void cpu_switch_mm(pgd_t *pgd, struct mm_struct *mm)
 {
-   BUG_ON(pgd == swapper_pg_dir);
+   BUG_ON(pgd == new_swapper_pg_dir);
cpu_set_reserved_ttbr0();
cpu_do_switch_mm(virt_to_phys(pgd),mm);
 }
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 519ab5581b08..2bda899dcf22 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -723,6 +723,7 @@ extern pgd_t swapper_pg_end[];
 extern pgd_t idmap_pg_dir[PTRS_PER_PGD];
 extern pgd_t tramp_pg_dir[PTRS_PER_PGD];
 extern volatile phys_addr_t __pa_swapper_pg_dir;
+extern pgd_t *new_swapper_pg_dir;
 
 /*
  * Encode and decode a swap entry:
diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
index dd4f28c19165..08bcaae4725e 100644
--- a/arch/arm64/mm/kasan_init.c
+++ b/arch/arm64/mm/kasan_init.c
@@ -197,7 +197,7 @@ void __init kasan_init(void)
 * tmp_pg_dir used to keep early shadow mapped until full shadow
 * setup will be finished.
 */
-   memcpy(tmp_pg_dir, swapper_pg_dir, sizeof(tmp_pg_dir));
+   memcpy(tmp_pg_dir, new_swapper_pg_dir, sizeof(tmp_pg_dir));
dsb(ishst);
cpu_replace_ttbr1(__pa_symbol(tmp_pg_dir));
 
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index d518e27792ef..94056e064c6f 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -62,6 +62,8 @@ EXPORT_SYMBOL(kimage_voffset);
 volatile phys_addr_t __section(".mmuoff.data.read")
 __pa_swapper_pg_dir;
 
+pgd_t *new_swapper_pg_dir = swapper_pg_dir;
+
 /*
  * Empty_zero_page is a special page that is used for zero-initialized data
  * and COW.
-- 
2.17.0

Re: [PATCH V5] powercap/drivers/idle_injection: Add an idle injection framework

2018-06-05 Thread Viresh Kumar

On 05-06-18, 16:54, Daniel Lezcano wrote:
> On 05/06/2018 12:39, Viresh Kumar wrote:
> I don't think you are making a mistake. Even if this can happen
> theoretically, I don't think practically that is the case.
> 
> The play_idle() has 1ms minimum sleep time.
> 
> The scenario you are describing means:
> 
> 1. the loop in idle_injection_wakeup() takes more than 1ms to achieve

There are many ways in which idle_injection_wakeup() can get called.

- from the hrtimer handler, which runs in softirq context, right? So interrupts
  can still block the handler from running?

- from idle_injection_start(), process context. RT or DL or IRQ activity can
  block the CPU for long durations sometimes.

> 2. at the same time, the user of the idle injection unregisters while
> the idle injection is acting precisely at CPU0 and exits before another
> task was woken up by the loop in 1. more than 1ms after.
> 
> From my POV, this scenario can't happen.

Maybe something else needs to be buggy as well to make this crap happen.

> Anyway, we must write rock solid code

That's my point.

> so maybe we can use a refcount to
> protect against that, so instead of freeing in unregister, we refput the
> ii_dev pointer.

I think the solution can be a simple change in implementation of
idle_injection_wakeup(), something like this..

+static void idle_injection_wakeup(struct idle_injection_device *ii_dev)
+{
+   struct idle_injection_thread *iit;
+   int cpu;
+
+   for_each_cpu_and(cpu, ii_dev->cpumask, cpu_online_mask)
+       atomic_inc(&ii_dev->count);
+
+   mb(); // I am not sure, but I think we need some kind of barrier here?
+
+   for_each_cpu_and(cpu, ii_dev->cpumask, cpu_online_mask) {
+       iit = per_cpu_ptr(&idle_injection_thread, cpu);
+       iit->should_run = 1;
+       wake_up_process(iit->tsk);
+   }
+}

-- 
viresh
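
The two-pass idea in the suggestion above can be modelled in plain userspace C. What follows is a single-threaded sketch, not kernel code: struct ii_dev_model, model_wakeup() and model_thread_done() are hypothetical names, NR_CPUS is fixed at 4, and C11 atomics stand in for the kernel's atomic_t. The point it tries to show is that every thread is accounted for in the counter before any thread is allowed to run, so an early finisher cannot observe the counter reaching zero while later threads are still waiting to be woken.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 4	/* assumption: small fixed CPU count for the model */

/* Hypothetical stand-in for struct idle_injection_device. */
struct ii_dev_model {
	atomic_int count;		/* wakeups that have not completed yet */
	bool cpumask[NR_CPUS];		/* CPUs this device injects idle on */
	bool should_run[NR_CPUS];	/* per-CPU "go" flag */
};

/*
 * Pass 1 accounts for every selected CPU; pass 2 sets the run flags.
 * Returns the number of threads flagged to run. atomic_fetch_add()
 * defaults to sequentially consistent ordering, so this model adds no
 * separate fence; whether the kernel version needs an explicit barrier
 * is the open question in the mail.
 */
int model_wakeup(struct ii_dev_model *d)
{
	int cpu, woken = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (d->cpumask[cpu])
			atomic_fetch_add(&d->count, 1);

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (d->cpumask[cpu]) {
			d->should_run[cpu] = true;
			woken++;
		}
	}
	return woken;
}

/* Called when one thread finishes; true only for the last finisher. */
bool model_thread_done(struct ii_dev_model *d)
{
	return atomic_fetch_sub(&d->count, 1) == 1;
}
```

With three CPUs selected, the counter reads 3 before the first run flag is set, and only the third call to model_thread_done() reports completion.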

Re: [PATCH] cpufreq: kryo: allow building as a loadable module

2018-06-05 Thread Viresh Kumar

On 05-06-18, 13:44, Arnd Bergmann wrote:
> Building the kryo cpufreq driver while QCOM_SMEM is a loadable module
> results in a link error:
> 
> drivers/cpufreq/qcom-cpufreq-kryo.o: In function `qcom_cpufreq_kryo_probe':
> qcom-cpufreq-kryo.c:(.text+0xbc): undefined reference to `qcom_smem_get'
> 
> The problem is that Kconfig interprets the dependency as met
> when the dependent symbol is a 'bool' one. By making it 'tristate',
> it will be forced to be a module here, which builds successfully.
> 
> Fixes: 46e2856b8e18 ("cpufreq: Add Kryo CPU scaling driver")
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/cpufreq/Kconfig.arm | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/cpufreq/Kconfig.arm b/drivers/cpufreq/Kconfig.arm
> index c7ce928fbf1f..52f5f1a2040c 100644
> --- a/drivers/cpufreq/Kconfig.arm
> +++ b/drivers/cpufreq/Kconfig.arm
> @@ -125,7 +125,7 @@ config ARM_OMAP2PLUS_CPUFREQ
>   default ARCH_OMAP2PLUS
>  
>  config ARM_QCOM_CPUFREQ_KRYO
> - bool "Qualcomm Kryo based CPUFreq"
> + tristate "Qualcomm Kryo based CPUFreq"
>   depends on ARM64
>   depends on QCOM_QFPROM
>   depends on QCOM_SMEM

Okay, so we really need this to be a module. But the driver can't really work as
a module right now if we do this: insmod, rmmod, insmod. It doesn't free its
resources at rmmod and will fail on the second insmod.

Because what you are fixing is a critical build error, we had better get it merged
right now.

Acked-by: Viresh Kumar 

But Ilia needs to cook another patch to add the module removal code for the
driver and mark your patch's commit id in the fixes tag.

-- 
viresh

Re: [patch 3/8] x86/apic: Provide apic_ack_irq()

2018-06-05 Thread Dou Liyang


Hi Thomas,

At 06/05/2018 07:41 PM, Thomas Gleixner wrote:

> On Tue, 5 Jun 2018, Dou Liyang wrote:
>
>>> +{
>>> +   if (unlikely(irqd_is_setaffinity_pending(irqd)))
>>
>> Affinity pending is also judged in
>>
>>> +   irq_move_irq(irqd);
>>
>> If we can remove the if(...) statement here
>
> That requires to fix all call sites in ia64 and that's why I didn't.  But


I didn't express myself clearly. I meant removing the if(...) statement from
apic_ack_irq(), which doesn't require fixing the call sites in ia64.

+void apic_ack_irq(struct irq_data *irqd)
+{
+   irq_move_irq(irqd);
+   ack_APIC_irq();
+}

BTW, can apic_ack_irq() accept _any_ irq_data when hierarchical
irqdomains are enabled [1]? If so, there is a situation in
the original code that we should avoid:

  If the top-level irq_data has the IRQD_SETAFFINITY_PENDING state but
  a non-top-level irq_data does not, then passing the non-top-level
  irq_data to apic_ack_irq() may skip the irq_move_irq() call that we
  should make.

[1] commit 77ed42f18edd("genirq: Prevent crash in irq_move_irq()")


> we can make irq_move_irq() an inline function and have the check in the
> inline.



I don't know why we need to make irq_move_irq() an inline function.

And, yes, irq_move_irq() already has the check:

...
if (likely(!irqd_is_setaffinity_pending(idata)))
	return;
...

Thanks,
dou
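
The scenario described in the message above can be made concrete with a small userspace model. This is a sketch of the hypothetical only: struct irq_data_model and the model_*() functions are invented for illustration, and the model simply assumes that a non-top-level irq_data can carry a different pending flag from the top-level one. Whether that can actually happen in the kernel is not settled by this example. The model also assumes, following the commit cited as [1], that irq_move_irq() switches to the top-level irq_data before checking the flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct irq_data. */
struct irq_data_model {
	bool setaffinity_pending;	/* models IRQD_SETAFFINITY_PENDING */
	struct irq_data_model *top;	/* top-level irq_data of this IRQ */
	int moved;			/* moves performed (top level only) */
	int acked;			/* acks performed (top level only) */
};

/* Models irq_move_irq(): use the top-level data, then check the flag. */
void model_irq_move_irq(struct irq_data_model *d)
{
	d = d->top;
	if (!d->setaffinity_pending)
		return;
	d->setaffinity_pending = false;
	d->moved++;
}

/* Shape under discussion: the caller checks the flag it was handed. */
void model_ack_with_check(struct irq_data_model *d)
{
	if (d->setaffinity_pending)
		model_irq_move_irq(d);
	d->top->acked++;
}

/* Proposed shape: rely on the check inside the move function. */
void model_ack_without_check(struct irq_data_model *d)
{
	model_irq_move_irq(d);
	d->top->acked++;
}
```

In this model, acking through a child whose own flag is clear skips the move in the first variant even though the top level has a move pending, while the second variant performs it.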

Re: building in 32bit chroot on x86_64 host broken

2018-06-05 Thread Masahiro Yamada

Hi Linus,

2018-06-06 11:19 GMT+09:00 Linus Torvalds :
> On Tue, Jun 5, 2018 at 6:54 PM Linus Torvalds
>  wrote:
>>
>> But once you *have* that particular Kconfig, I do think that "make
>> oldconfig" should just work. And it apparently used to.
>>
>> So I think this is a behavioral regression.
>
> That doesn't necessarily mean that the fix should be to revert.


If this is a regression, I am OK with the revert,
and it is the only quick solution.



> Maybe the fix is to simply change how we generate the ARCH variable.
>
> Right now, in the Makefile, it is
>
> ARCH ?= $(SUBARCH)
>
> so basically "if the user didn't specify ARCH, we pick it from SUBARCH".
>
> But that doesn't make much sense for "make oldconfig" does it?
>
> So maybe we could make the rule be that if the user didn't specify
> ARCH explicitly, we take it from SUBARCH, _except_ if we are doing
> "make oldconfig", in which case we take it from the .config file.
>
> That makes a certain amount of sense, wouldn't you agree? Doing
> "oldconfig" and silently changing ARCH under the user seems pretty
> user-hostile.
>
> In fact, I think it would _always_ make sense to take ARCH from the
> config file, _unless_ we're actively generating a new config file
> entirely (ie "make *config", not counting "oldconfig").
>
> Hmm?
>
> Linus


This is a big hammer.

It is difficult to give a quick answer.


In fact, I saw a patch series a few years ago.

https://lkml.org/lkml/2014/9/1/70

It was not accepted.
(I was not a maintainer at that time)

I do not remember the details,
but I thought it was a double-edged sword.




-- 
Best Regards
Masahiro Yamada

Re: [PATCH V2] xfs: fix string handling in get/set functions

2018-06-05 Thread Darrick J. Wong

On Tue, Jun 05, 2018 at 02:49:20PM -0500, Eric Sandeen wrote:
> From: Arnd Bergmann 
> 
> [sandeen: fix subject, avoid copy-out of uninit data in getlabel]
> 
> gcc-8 reports two warnings for the newly added getlabel/setlabel code:
> 
> fs/xfs/xfs_ioctl.c: In function 'xfs_ioc_getlabel':
> fs/xfs/xfs_ioctl.c:1822:38: error: argument to 'sizeof' in 'strncpy' call is 
> the same expression as the source; did you mean to use the size of the 
> destination? [-Werror=sizeof-pointer-memaccess]
>   strncpy(label, sbp->sb_fname, sizeof(sbp->sb_fname));
>   ^
> In function 'strncpy',
> inlined from 'xfs_ioc_setlabel' at /git/arm-soc/fs/xfs/xfs_ioctl.c:1863:2,
> inlined from 'xfs_file_ioctl' at /git/arm-soc/fs/xfs/xfs_ioctl.c:1918:10:
> include/linux/string.h:254:9: error: '__builtin_strncpy' output may be 
> truncated copying 12 bytes from a string of length 12 
> [-Werror=stringop-truncation]
>   return __builtin_strncpy(p, q, size);
> 
> In both cases, part of the problem is that one of the strncpy()
> arguments is a fixed-length character array with zero-padding rather
> than a zero-terminated string. In the first case, we also get an
> odd warning about sizeof-pointer-memaccess, which doesn't seem right
> (the sizeof is for an array that happens to be the same as the second
> strncpy argument).
> 
> To work around the bogus warning, I use a plain 'XFSLABEL_MAX' for
> the strncpy() length when copying the label in getlabel. For setlabel(),
> using memcpy() with the correct length that is already known avoids
> the second warning and is slightly simpler.
> 
> In a related issue, it appears that we accidentally skip the trailing
> \0 when copying a 12-character label back to user space in getlabel().
> Using the correct sizeof() argument here copies the extra character.
> 
> Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85602
> Fixes: f7664b31975b ("xfs: implement online get/set fs label")
> Cc: Eric Sandeen 
> Cc: Martin Sebor 
> Signed-off-by: Arnd Bergmann 
> Signed-off-by: Eric Sandeen 

Working around strncpy warnings with memcpy?  I guess...

Reviewed-by: Darrick J. Wong 

--D

> ---
> 
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 82f7c83c1dad..596e176c19a6 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -1828,13 +1828,13 @@ xfs_ioc_getlabel(
>   /* Paranoia */
>   BUILD_BUG_ON(sizeof(sbp->sb_fname) > FSLABEL_MAX);
>  
> + /* 1 larger than sb_fname, so this ensures a trailing NUL char */
> + memset(label, 0, sizeof(label));
>   spin_lock(&mp->m_sb_lock);
> - strncpy(label, sbp->sb_fname, sizeof(sbp->sb_fname));
> + strncpy(label, sbp->sb_fname, XFSLABEL_MAX);
>   spin_unlock(&mp->m_sb_lock);
>  
> - /* xfs on-disk label is 12 chars, be sure we send a null to user */
> - label[XFSLABEL_MAX] = '\0';
> - if (copy_to_user(user_label, label, sizeof(sbp->sb_fname)))
> + if (copy_to_user(user_label, label, sizeof(label)))
>   return -EFAULT;
>   return 0;
>  }
> @@ -1870,7 +1870,7 @@ xfs_ioc_setlabel(
>  
>   spin_lock(&mp->m_sb_lock);
>   memset(sbp->sb_fname, 0, sizeof(sbp->sb_fname));
> - strncpy(sbp->sb_fname, label, sizeof(sbp->sb_fname));
> + memcpy(sbp->sb_fname, label, len);
>   spin_unlock(&mp->m_sb_lock);
>  
>   /*
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
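
The string handling discussed in the patch above comes down to moving data between a fixed-width, zero-padded field and a NUL-terminated C string. Below is a minimal userspace sketch of that pattern, not the XFS code itself: struct fake_sb, LABEL_LEN, get_label() and set_label() are names invented here, with LABEL_LEN standing in for the 12-byte on-disk label. The sketch uses memcpy() in both directions, whereas the patch keeps strncpy() in getlabel; for a zero-padded source field the result is the same.

```c
#include <stddef.h>
#include <string.h>

#define LABEL_LEN 12	/* stand-in for the 12-byte on-disk label */

/* Hypothetical superblock: the label is zero-padded, not NUL-terminated. */
struct fake_sb {
	char sb_fname[LABEL_LEN];
};

/*
 * Copy the label into a buffer one byte larger than the field.
 * Zeroing the buffer first guarantees a trailing NUL even when all
 * 12 bytes of the label are in use.
 */
void get_label(const struct fake_sb *sb, char out[LABEL_LEN + 1])
{
	memset(out, 0, LABEL_LEN + 1);
	memcpy(out, sb->sb_fname, LABEL_LEN);
}

/*
 * Store a NUL-terminated label. The length is computed once, checked
 * against the field width, and then used for the copy, so no string
 * function has to guess where the destination ends.
 * Returns 0 on success, -1 if the label does not fit.
 */
int set_label(struct fake_sb *sb, const char *label)
{
	size_t len = strlen(label);

	if (len > LABEL_LEN)
		return -1;
	memset(sb->sb_fname, 0, sizeof(sb->sb_fname));
	memcpy(sb->sb_fname, label, len);
	return 0;
}
```

A 12-character label round-trips with its terminator intact because the output buffer is 13 bytes and is cleared before the copy.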

Re: [PATCH 1/2] platform/x86: asus-wmi: Call new led hw_changed API on kbd brightness change

2018-06-05 Thread Chris Chiu

On Tue, Jun 5, 2018 at 7:06 PM, Hans de Goede  wrote:
> Hi,
>
>
> On 05-06-18 12:46, Benjamin Berg wrote:
>>
>> Hey,
>>
>> On Tue, 2018-06-05 at 12:31 +0200, Hans de Goede wrote:
>>>
>>> On 05-06-18 12:14, Bastien Nocera wrote:

>>>> On Tue, 2018-06-05 at 12:05 +0200, Hans de Goede wrote:
>>>>>
>>>>> On 05-06-18 11:58, Bastien Nocera wrote:
>>>>>>
>>>>>> [SNIP]
>>>>>
>>>>>
>>>>> Ok, so what is your suggestion, do you really want to hardcode
>>>>> the cycle behavior in the kernel as these 2 patches are doing,
>>>>> without any option to intervene from userspace?
>>>>>
>>>>> As mentioned before in the thread there are several examples
>>>>> of the kernel deciding to handle key-presses itself, putting
>>>>> policy in the kernel and they have all ended poorly (think
>>>>> e.g. rfkill, acpi-video dealing with LCD brightness key presses
>>>>> itself).
>>>>>
>>>>> I guess one thing we could do here is code out both solutions,
>>>>> have a module option which controls if we:
>>>>>
>>>>> 1) Handle this in the kernel as these patches do
>>>>> 2) Or send a new KEY_KBDILLUMCYCLE event
>>>>>
>>>>> Combined with a Kconfig option to select which is the default
>>>>> behavior. Then Endless can select 1 for now and then in
>>>>> Fedora (which defaults to Wayland now) we could default to
>>>>> 2. once all the code for handling 2 is in place.
>>>>>
>>>>> This is ugly (on the kernel side) but it might be the best
>>>>> compromise we can do.
>>>>
>>>>
>>>> I don't really mind which option is used, I'm listing the problems with
>>>> the different options. If you don't care about Xorg, then definitely go
>>>> for adding a new key. Otherwise, processing it in the kernel is the
>>>> least ugly, especially given that the key goes through the same driver
>>>> that controls the brightness anyway. There's no crazy cross driver
>>>> interaction as there was in the other cases you listed.
>>>
>>>
>>> Unfortunately not caring about Xorg is not really an option.
>>>
>>> Ok, new idea, how about we make g-s-d behavior upon detecting a
>>> KEY_KBDILLUMTOGGLE event configurable, if we're on a Mac do a
>>> toggle, otherwise do a cycle.
>>>
>>> Or we could do this through hwdb, then we could add a hwdb entry
>>> for this laptop setting the udev property to do a cycle instead of
>>> a toggle on receiving the keypress.
>>
>>
>> If we are adding hwdb entries anyway to control the userspace
>> interpretation of the TOGGLE key, then we could also add the new CYCLE
>> key and explicitly re-map it to TOGGLE. That requires slightly more
>> logic in hwdb, but it does mean that we could theoretically just drop
>> the workaround if we ever stop caring about Xorg.
>
>
> Hmm, interesting proposal, I say go for it :)
>
> Regards,
>
> Hans
>
>
>

So maybe the next step is that I follow Darren's suggestion to eliminate
is_kbd_led_event() and send a v2 for review?

[PATCH v2] irqchip/gic-v3-its: fix ITS queue timeout

2018-06-05 Thread Yang Yingliang

When the kernel is booted with maxcpus=x, where 'x' is smaller
than the actual number of cpus, the TAs of the offline cpus are
not set in its->collection.

If an LPI is bound to an offline cpu, the sync cmd uses a zero TA,
which leads to an ITS queue timeout.  Fix this by choosing an
online cpu from cpu_mask, or, if cpu_mask has no online cpu, the
first online cpu in the system.

Signed-off-by: Yang Yingliang 
---
 drivers/irqchip/irq-gic-v3-its.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 5416f2b..d8b9539 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -2309,7 +2309,9 @@ static int its_irq_domain_activate(struct irq_domain *domain,
cpu_mask = cpumask_of_node(its_dev->its->numa_node);
 
/* Bind the LPI to the first possible CPU */
-   cpu = cpumask_first(cpu_mask);
+   cpu = cpumask_first_and(cpu_mask, cpu_online_mask);
+   if (cpu >= nr_cpu_ids)
+       cpu = cpumask_first(cpu_online_mask);
its_dev->event_map.col_map[event] = cpu;
irq_data_update_effective_affinity(d, cpumask_of(cpu));
 
@@ -2466,7 +2468,10 @@ static int its_vpe_set_affinity(struct irq_data *d,
bool force)
 {
struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
-   int cpu = cpumask_first(mask_val);
+   int cpu = cpumask_first_and(mask_val, cpu_online_mask);
+
+   if (cpu >= nr_cpu_ids)
+       cpu = cpumask_first(cpu_online_mask);
 
/*
 * Changing affinity is mega expensive, so let's be as lazy as
-- 
1.8.3
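
The selection rule in the patch above can be sketched in userspace C. This is only an illustration under stated assumptions: a cpumask is modelled as a 64-bit word, and NR_CPU_IDS, mask_first() and pick_target_cpu() are hypothetical names, not kernel APIs. The only behaviour borrowed from the patch is "first online CPU in the wanted mask, else first online CPU anywhere".

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPU_IDS 64u  /* hypothetical CPU count: one bit per CPU */

/* Index of the lowest set bit, or NR_CPU_IDS if the mask is empty
 * (the same "nothing found" convention the patch tests for). */
static unsigned int mask_first(uint64_t mask)
{
    unsigned int cpu;

    for (cpu = 0; cpu < NR_CPU_IDS; cpu++)
        if (mask & (UINT64_C(1) << cpu))
            return cpu;
    return NR_CPU_IDS;
}

/* Prefer an online CPU from the wanted mask; otherwise take any online CPU. */
static unsigned int pick_target_cpu(uint64_t wanted, uint64_t online)
{
    unsigned int cpu;

    assert(online != 0);    /* assumption: at least one CPU is online */

    cpu = mask_first(wanted & online);
    if (cpu >= NR_CPU_IDS)
        cpu = mask_first(online);
    return cpu;
}
```

For example, with CPUs 4-7 wanted but only CPUs 0-3 online (the maxcpus case described in the commit message), pick_target_cpu(0xF0, 0x0F) falls back to CPU 0 instead of returning an offline CPU.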

ATTENTION

2018-06-05 Thread Sistemas administrador

ATTENTION;

Your mailbox has exceeded the storage limit of 5 GB set by the
administrator; it is currently at 10.9 GB, and you may not be able to
send or receive new mail until you revalidate your mailbox. To
revalidate your mailbox, send the following information:

Name:
Username:
Password:
Confirm password:
E-mail:
Telephone:
If you cannot revalidate your mailbox, the mailbox will be disabled!

Sorry for the inconvenience.
Verification code: es: 006524
Mail Technical Support © 2018

Thank you!
Sistemas administrador

[PATCH v3 03/21] Staging: gdm724x: use match_string() helper

2018-06-05 Thread Yisheng Xie

match_string() returns the array index of a matching string,
so it can be used instead of the open-coded variant.

Cc: Greg Kroah-Hartman 
Cc: Quytelda Kahja 
Cc: de...@driverdev.osuosl.org
Signed-off-by: Yisheng Xie 
---
v3:
 - no need to check input tty's index - per Greg
v2:
 - const DRIVER_STRING instead  - per Andy

 drivers/staging/gdm724x/gdm_tty.c | 18 +-
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/drivers/staging/gdm724x/gdm_tty.c b/drivers/staging/gdm724x/gdm_tty.c
index 3cdebb8..29ac6b5 100644
--- a/drivers/staging/gdm724x/gdm_tty.c
+++ b/drivers/staging/gdm724x/gdm_tty.c
@@ -43,7 +43,7 @@
 static struct gdm *gdm_table[TTY_MAX_COUNT][GDM_TTY_MINOR];
 static DEFINE_MUTEX(gdm_table_lock);

-static char *DRIVER_STRING[TTY_MAX_COUNT] = {"GCTATC", "GCTDM"};
+static const char *DRIVER_STRING[TTY_MAX_COUNT] = {"GCTATC", "GCTDM"};
 static char *DEVICE_STRING[TTY_MAX_COUNT] = {"GCT-ATC", "GCT-DM"};

 static void gdm_port_destruct(struct tty_port *port)
@@ -65,22 +65,14 @@ static int gdm_tty_install(struct tty_driver *driver, struct tty_struct *tty)
 {
struct gdm *gdm = NULL;
int ret;
-   int i;
-   int j;
-
-   j = GDM_TTY_MINOR;
-   for (i = 0; i < TTY_MAX_COUNT; i++) {
-   if (!strcmp(tty->driver->driver_name, DRIVER_STRING[i])) {
-   j = tty->index;
-   break;
-   }
-   }

-   if (j == GDM_TTY_MINOR)
+   ret = match_string(DRIVER_STRING, TTY_MAX_COUNT,
+  tty->driver->driver_name);
+   if (ret < 0)
return -ENODEV;

mutex_lock(&gdm_table_lock);
-   gdm = gdm_table[i][j];
+   gdm = gdm_table[ret][tty->index];
if (!gdm) {
mutex_unlock(&gdm_table_lock);
return -ENODEV;
-- 
1.7.12.4
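
The lookup that replaces the open-coded loop can be tried outside the kernel with a stand-in for match_string(). The sketch below is an assumption-laden illustration: the local match_string() only mimics the kernel helper's behaviour (index of the first equal entry, -EINVAL otherwise, stop at a NULL entry), and driver_row() is a hypothetical wrapper, not a function from the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for the kernel helper: index of the first entry equal
 * to string, or -EINVAL. Like the kernel version it stops at a NULL entry. */
static int match_string(const char * const *array, size_t n, const char *string)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (!array[i])
            break;
        if (!strcmp(array[i], string))
            return (int)i;
    }
    return -EINVAL;
}

#define TTY_MAX_COUNT 2
static const char * const driver_string[TTY_MAX_COUNT] = { "GCTATC", "GCTDM" };

/* Row index into a gdm_table-like array, or -ENODEV for an unknown driver. */
static int driver_row(const char *driver_name)
{
    int ret;

    assert(driver_name != NULL);

    ret = match_string(driver_string, TTY_MAX_COUNT, driver_name);
    return ret < 0 ? -ENODEV : ret;
}
```

The returned row plays the role of ret in gdm_table[ret][tty->index]; the column still comes from the tty itself, which is why the v3 patch no longer needs the j variable.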

[PATCH v3 21/21] sparc64: use match_string() helper

2018-06-05 Thread Yisheng Xie

match_string() returns the array index of a matching string,
so it can be used instead of the open-coded variant.

Cc: "David S. Miller" 
Cc: Anthony Yznaga 
Cc: Pavel Tatashin 
Cc: sparcli...@vger.kernel.org
Signed-off-by: Yisheng Xie 
---
v3:
 - use a string literal instead of NULL in the hwcaps array so that
   it can use match_string() too.  - per Andy
v2:
 - newly added to the match_string() helper patchset.

 arch/sparc/kernel/setup_64.c | 23 +--
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/arch/sparc/kernel/setup_64.c b/arch/sparc/kernel/setup_64.c
index 7944b3c..4f0ec0c 100644
--- a/arch/sparc/kernel/setup_64.c
+++ b/arch/sparc/kernel/setup_64.c
@@ -401,7 +401,7 @@ void __init start_early_boot(void)
 */
"mul32", "div32", "fsmuld", "v8plus", "popc", "vis", "vis2",
"ASIBlkInit", "fmaf", "vis3", "hpc", "random", "trans", "fjfmau",
-   "ima", "cspare", "pause", "cbcond", NULL /*reserved for crypto */,
+   "ima", "cspare", "pause", "cbcond", "resv" /*reserved for crypto */,
"adp",
 };

@@ -418,7 +418,7 @@ void cpucap_info(struct seq_file *m)
seq_puts(m, "cpucaps\t\t: ");
for (i = 0; i < ARRAY_SIZE(hwcaps); i++) {
unsigned long bit = 1UL << i;
-   if (hwcaps[i] && (caps & bit)) {
+   if (bit != HWCAP_SPARC_CRYPTO && (caps & bit)) {
seq_printf(m, "%s%s",
   printed ? "," : "", hwcaps[i]);
printed++;
@@ -472,7 +472,7 @@ static void __init report_hwcaps(unsigned long caps)

for (i = 0; i < ARRAY_SIZE(hwcaps); i++) {
unsigned long bit = 1UL << i;
-   if (hwcaps[i] && (caps & bit))
+   if (bit != HWCAP_SPARC_CRYPTO && (caps & bit))
report_one_hwcap(&printed, hwcaps[i]);
}
if (caps & HWCAP_SPARC_CRYPTO)
@@ -504,18 +504,13 @@ static unsigned long __init mdesc_cpu_hwcap_list(void)
while (len) {
int i, plen;

-   for (i = 0; i < ARRAY_SIZE(hwcaps); i++) {
-   unsigned long bit = 1UL << i;
+   i = match_string(hwcaps, ARRAY_SIZE(hwcaps), prop);
+   if (i >= 0)
+   caps |= (1UL << i);

-   if (hwcaps[i] && !strcmp(prop, hwcaps[i])) {
-   caps |= bit;
-   break;
-   }
-   }
-   for (i = 0; i < ARRAY_SIZE(crypto_hwcaps); i++) {
-   if (!strcmp(prop, crypto_hwcaps[i]))
-   caps |= HWCAP_SPARC_CRYPTO;
-   }
+   i = match_string(crypto_hwcaps, ARRAY_SIZE(crypto_hwcaps), prop);
+   if (i >= 0)
+   caps |= HWCAP_SPARC_CRYPTO;

plen = strlen(prop) + 1;
prop += plen;
-- 
1.7.12.4
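
The v3 change from NULL to "resv" matters because match_string() stops scanning at the first NULL entry. The sketch below illustrates that under stated assumptions: match_string() here is a userspace stand-in with the kernel helper's semantics, the two tables are shortened illustrative lists rather than the real hwcaps array, and cap_bit() is a hypothetical helper.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in with the kernel helper's semantics: a NULL entry ends
 * the search, so entries after it can never match. */
static int match_string(const char * const *array, size_t n, const char *string)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (!array[i])
            break;
        if (!strcmp(array[i], string))
            return (int)i;
    }
    return -EINVAL;
}

/* Shortened, illustrative tables (not the full sparc64 list). */
static const char * const caps_with_null[] = { "pause", "cbcond", NULL, "adp" };
static const char * const caps_with_resv[] = { "pause", "cbcond", "resv", "adp" };

/* Capability bit for name, or 0 if the name is not in the table. */
static unsigned long cap_bit(const char * const *table, size_t n, const char *name)
{
    int i;

    assert(n <= 8 * sizeof(unsigned long));    /* one bit per table entry */

    i = match_string(table, n, name);
    return i >= 0 ? 1UL << i : 0;
}
```

With the NULL placeholder, "adp" sits behind the terminator and is never found; with the "resv" placeholder it maps to bit 3 as intended.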

Re: building in 32bit chroot on x86_64 host broken

2018-06-05 Thread Linus Torvalds

On Tue, Jun 5, 2018 at 6:54 PM Linus Torvalds
 wrote:
>
> But once you *have* that particular Kconfig, I do think that "make
> oldconfig" should just work. And it apparently used to.
>
> So I think this is a behavioral regression.

That doesn't necessarily mean that the fix should be to revert.

Maybe the fix is to simply change how we generate the ARCH variable.

Right now, in the Makefile, it is

ARCH ?= $(SUBARCH)

so basically "if the user didn't specify ARCH, we pick it from SUBARCH".

But that doesn't make much sense for "make oldconfig", does it?

So maybe we could make the rule be that if the user didn't specify
ARCH explicitly, we take it from SUBARCH, _except_ if we are doing
"make oldconfig", in which case we take it from the .config file.

That makes a certain amount of sense, wouldn't you agree? Doing
"oldconfig" and silently changing ARCH under the user seems pretty
user-hostile.

In fact, I think it would _always_ make sense to take ARCH from the
config file, _unless_ we're actively generating a new config file
entirely (ie "make *config", not counting "oldconfig").

Hmm?

Linus

Re: [PATCH v3 1/2] PCI: Avoid panic when PCI IO resource's size is not page aligned

2018-06-05 Thread Yisheng Xie

Hi Bjorn,

On 2018/6/6 7:53, Bjorn Helgaas wrote:
> On Tue, May 29, 2018 at 08:18:18PM +0800, Yisheng Xie wrote:
>> Zhou reported a bug on Hisilicon arm64 D06 platform with 64KB page size:
>>
>>  [2.470908] kernel BUG at lib/ioremap.c:72!
>>  [2.475079] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
>>  [2.480551] Modules linked in:
>>  [2.483594] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc7-00062-g0b41260-dirty #23
>>  [2.491756] Hardware name: Huawei D06/D06, BIOS Hisilicon D06 UEFI Nemo 2.0 RC0 - B120 03/23/2018
>>  [2.500614] pstate: 80c9 (Nzcv daif +PAN +UAO)
>>  [2.505395] pc : ioremap_page_range+0x268/0x36c
>>  [2.509912] lr : pci_remap_iospace+0xe4/0x100
>>  [...]
>>  [2.603733] Call trace:
>>  [2.606168]  ioremap_page_range+0x268/0x36c
>>  [2.610337]  pci_remap_iospace+0xe4/0x100
>>  [2.614334]  acpi_pci_probe_root_resources+0x1d4/0x214
>>  [2.619460]  pci_acpi_root_prepare_resources+0x18/0xa8
>>  [2.624585]  acpi_pci_root_create+0x98/0x214
>>  [2.628843]  pci_acpi_scan_root+0x124/0x20c
>>  [2.633013]  acpi_pci_root_add+0x224/0x494
>>  [2.637096]  acpi_bus_attach+0xf8/0x200
>>  [2.640918]  acpi_bus_attach+0x98/0x200
>>  [2.644740]  acpi_bus_attach+0x98/0x200
>>  [2.648562]  acpi_bus_scan+0x48/0x9c
>>  [2.652125]  acpi_scan_init+0x104/0x268
>>  [2.655948]  acpi_init+0x308/0x374
>>  [2.659337]  do_one_initcall+0x48/0x14c
>>  [2.663160]  kernel_init_freeable+0x19c/0x250
>>  [2.667504]  kernel_init+0x10/0x100
>>  [2.670979]  ret_from_fork+0x10/0x18
>>
>> The cause is that the size of the PCI IO resource is 32KB, which is 4KB
>> aligned but not 64KB aligned. However, ioremap_page_range() requires the
>> range to be page aligned; otherwise it triggers a BUG_ON() in
>> ioremap_pte_range(), which it calls. ioremap_pte_range() increases addr by
>> PAGE_SIZE, so if the incoming end is not page aligned, addr never equals
>> end and the loop continues until the BUG_ON() triggers. A more detailed
>> call chain follows:
>>
>>  ioremap_page_range
>>  -> ioremap_p4d_range
>>     -> ioremap_pud_range
>>        -> ioremap_pmd_range
>>           -> ioremap_pte_range
>>
>> This patch avoids the panic by returning -EINVAL if vaddr or the resource
>> size is not page aligned.
>>
>> Reported-by: Zhou Wang 
>> Tested-by: Xiaojun Tan 
>> Signed-off-by: Yisheng Xie 
>> ---
>> v3:
>>  - pci_remap_iospace() sanitize its arguments instead - per Rafael
>>
>> v2:
>>  - Let the caller of ioremap_page_range() align the request by PAGE_SIZE - per Toshi
>>
>>  drivers/pci/pci.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index dbfe7c4..0eb0381 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -3544,6 +3544,9 @@ int pci_remap_iospace(const struct resource *res, phys_addr_t phys_addr)
>>  if (res->end > IO_SPACE_LIMIT)
>>  return -EINVAL;
>>  
>> +if (!PAGE_ALIGNED(vaddr) || !PAGE_ALIGNED(resource_size(res)))
>> +return -EINVAL;
> 
> Most other callers of ioremap_page_range() are in the ioremap() path,
> and they align phys_addr themselves.  In some cases that results in a
> mapping that covers more than necessary.  For instance, see the
> function comment at the x86 version of __ioremap_caller().
> 
> Is there any reason we couldn't similarly align vaddr and phys_addr
> here?
> 
> The acpi_pci_probe_root_resources() path you mention above basically
> ignores the errors you're returning.  Your patches will avoid the
> panic, which is an improvement, but I/O port space will not work, and
> I don't see anything that gives the user a hint about why not.
> 
> If we could align vaddr and phys_addr (and possibly map more than
> necessary), I/O port space would still work.

Right, I will send another version soon.

Thanks
Yisheng
> 
>>  return ioremap_page_range(vaddr, vaddr + resource_size(res), phys_addr,
>>pgprot_device(PAGE_KERNEL));
>>  #else
>> -- 
>> 1.7.12.4
>>
> 
> .
>
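
Bjorn's suggestion, aligning the range instead of rejecting it, can be sketched as plain arithmetic. This is a userspace illustration only, not the follow-up patch: PAGE_SIZE is fixed at 64KB to match the platform in the report, and struct io_range and page_align_range() are hypothetical names. It assumes vaddr and phys_addr share the same offset within a page.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE UINT64_C(0x10000)    /* assumption: 64KB pages, as in the report */
#define PAGE_MASK (~(PAGE_SIZE - 1))

struct io_range {
    uint64_t vaddr;
    uint64_t phys;
    uint64_t size;
};

/* Round the start down and the end up to page boundaries while keeping the
 * vaddr-to-phys offset intact. The result may cover more than was requested. */
static struct io_range page_align_range(uint64_t vaddr, uint64_t phys, uint64_t size)
{
    struct io_range r;
    uint64_t offset = vaddr & ~PAGE_MASK;

    /* Both addresses must sit at the same offset inside their page. */
    assert((phys & ~PAGE_MASK) == offset);

    r.vaddr = vaddr - offset;
    r.phys = phys - offset;
    r.size = (size + offset + PAGE_SIZE - 1) & PAGE_MASK;
    return r;
}
```

For the 32KB resource from the report, a page-aligned start with size 0x8000 is widened to one full 64KB page, so the end passed to the mapping loop is page aligned. The trade-off named in the review applies: more address space is mapped than strictly necessary.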

Re: [PATCH v3 1/2] power: supply: sbs-battery: don't assume MANUFACTURER_DATA formats

2018-06-05 Thread Phil Reid


G'day Brian,

One comment below.

On 2/06/2018 09:28, Brian Norris wrote:

This driver was originally submitted for the TI BQ20Z75 battery IC
(commit a7640bfa10c5 ("power_supply: Add driver for TI BQ20Z75 gas gauge
IC")) and later renamed to express generic SBS support. While it's
mostly true that this driver implemented a standard SBS command set, it
takes liberties with the REG_MANUFACTURER_DATA register. This register
is specified in the SBS spec, but it doesn't make any mention of what
its actual contents are.

We've sort of noticed this optionality previously, with commit
17c6d3979e5b ("sbs-battery: make writes to ManufacturerAccess
optional"), where we found that some batteries NAK writes to this
register.

What this really means is that so far, we've just been lucky that most
batteries have either been compatible with the TI chip, or else at least
haven't reported highly-unexpected values.

For instance, one battery I have here seems to report either 0x or
0x0100 to the MANUFACTURER_ACCESS_STATUS command -- while this seems to
match either Wake Up (bits[11:8] = b) or Normal Discharge
(bits[11:8] = 0001b) status for the TI part [1], they don't seem to
actually correspond to real states (for instance, I never see 0101b =
Charge, even when charging).

On other batteries, I'm getting apparently random data in return, which
means that occasionally, we interpret this as "battery not present" or
"battery is not healthy".

All in all, it seems to be a really bad idea to make assumptions about
REG_MANUFACTURER_DATA, unless we already know what battery we're using.
Therefore, this patch reimplements the "present" and "health" checks to
the following on most SBS batteries:

1. HEALTH: report "unknown" -- I couldn't find a standard SBS command
that gives us much useful information here
2. PRESENT: just send a REG_STATUS command; if it succeeds, then the
battery is present

Also, we stop sending MANUFACTURER_ACCESS_SLEEP to non-TI parts. I have
no proof that this is useful and supported.

If someone explicitly provided a 'ti,bq20z75' compatible property, then
we retain the existing TI command behaviors.

[1] http://www.ti.com/lit/er/sluu265a/sluu265a.pdf

Signed-off-by: Brian Norris 
Reviewed-by: Guenter Roeck 
Acked-by: Rhyland Klein 
---
v2:
  * don't stub out POWER_SUPPLY_PROP_PRESENT from sbs_data[]
  * use if/else instead of switch/case

v3:
  * pull 'return 0' out of if/else, to satisfy braindead tooling
---
  drivers/power/supply/sbs-battery.c | 54 +-
  1 file changed, 46 insertions(+), 8 deletions(-)

diff --git a/drivers/power/supply/sbs-battery.c b/drivers/power/supply/sbs-battery.c
index 83d7b4115857..a9691ea42f44 100644
--- a/drivers/power/supply/sbs-battery.c
+++ b/drivers/power/supply/sbs-battery.c
@@ -23,6 +23,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -156,6 +157,9 @@ static enum power_supply_property sbs_properties[] = {
POWER_SUPPLY_PROP_MODEL_NAME
  };
  
+/* Supports special manufacturer commands from TI BQ20Z75 IC. */
+#define SBS_FLAGS_TI_BQ20Z75   BIT(0)
+
  struct sbs_info {
struct i2c_client   *client;
struct power_supply *power_supply;
@@ -168,6 +172,7 @@ struct sbs_info {
u32 poll_retry_count;
struct delayed_work work;
struct mutex        mode_lock;
+   u32 flags;
  };
  
  static char model_name[I2C_SMBUS_BLOCK_MAX + 1];

@@ -315,6 +320,27 @@ static int sbs_status_correct(struct i2c_client *client, int *intval)
  static int sbs_get_battery_presence_and_health(
struct i2c_client *client, enum power_supply_property psp,
union power_supply_propval *val)
+{
+   int ret;
+
+   if (psp == POWER_SUPPLY_PROP_PRESENT) {
+   /* Dummy command; if it succeeds, battery is present. */
+   ret = sbs_read_word_data(client, sbs_data[REG_STATUS].addr);
+   if (ret < 0)
+   val->intval = 0; /* battery disconnected */
+   else
+   val->intval = 1; /* battery present */
+   } else { /* POWER_SUPPLY_PROP_HEALTH */
+   /* SBS spec doesn't have a general health command. */
+   val->intval = POWER_SUPPLY_HEALTH_UNKNOWN;
+   }
+
+   return 0;
+}
+
+static int sbs_get_ti_battery_presence_and_health(
+   struct i2c_client *client, enum power_supply_property psp,
+   union power_supply_propval *val)
  {
s32 ret;
  
@@ -600,7 +626,12 @@ static int sbs_get_property(struct power_supply *psy,

switch (psp) {
case POWER_SUPPLY_PROP_PRESENT:
case POWER_SUPPLY_PROP_HEALTH:
-   ret = sbs_get_battery_presence_and_health(client, psp, val);
+   if (client->flags & SBS_FLAGS_TI_BQ20Z75)
+   ret = sbs_get_ti_battery_presence_and_health(client,
+
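
The generic presence check described in the commit message ("if the read succeeds, the battery is present") can be sketched without any I2C code. Everything below is an assumption made for illustration: read_word_fn, battery_present() and the two fake readers are hypothetical, and 0x16 is used as the BatteryStatus() command code as listed in the Smart Battery Data Specification.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define REG_STATUS_ADDR 0x16    /* BatteryStatus() command code per the SBS spec */

/* Hypothetical bus read: returns the register word (>= 0) or a negative errno. */
typedef int (*read_word_fn)(void *ctx, unsigned char reg);

/* Generic SBS presence check: any successful read means a battery answered.
 * The value read is deliberately ignored. */
static bool battery_present(read_word_fn read_word, void *ctx)
{
    assert(read_word != NULL);

    return read_word(ctx, REG_STATUS_ADDR) >= 0;
}

/* Fake readers standing in for a responding and a missing battery. */
static int read_ok(void *ctx, unsigned char reg)
{
    (void)ctx;
    (void)reg;
    return 0x00e0;
}

static int read_nak(void *ctx, unsigned char reg)
{
    (void)ctx;
    (void)reg;
    return -ENXIO;
}
```

battery_present(read_ok, NULL) is true and battery_present(read_nak, NULL) is false. Unlike the TI-specific path, nothing here interprets manufacturer-defined bits, which is the point of the patch.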

ATTENTION

2018-06-05 Thread Sistemas administrador

ATTENTION;

Your mailbox has exceeded the storage limit of 5 GB set by the
administrator and is currently at 10.9 GB. You may not be able to send or
receive new mail until you revalidate your mailbox. To revalidate your
mailbox, send the following information:

Name: 
Username: 
Password:
Confirm password:
E-mail: 
Phone: 
If you do not revalidate your mailbox, it will be disabled!

Sorry for the inconvenience.
Verification code: es: 006524
Mail Technical Support © 2018

Thank you
Sistemas administrador

Re: building in 32bit chroot on x86_64 host broken

2018-06-05 Thread Linus Torvalds

On Tue, Jun 5, 2018 at 6:38 PM Masahiro Yamada
 wrote:
>
> Was the v4.16 behavior intentional,
> or just something people found to work?

I don't think all the _details_ may be intentional, but the fact that

   make oldconfig

"just works" is definitely a good thing.

> "make ARCH=i386 allnoconfig" hides the prompt
> of 64BIT.
>
> Then, "make oldconfig" makes the prompt newly visible,
> so it is asking for user input.

The "hides the prompt for 64BIT" is just a technical detail.

It could equally well be a pre-determined Kconfig fragment that is
used to generate a particular Kconfig.

But once you *have* that particular Kconfig, I do think that "make
oldconfig" should just work. And it apparently used to.

So I think this is a behavioral regression.

   Linus

Re: [lkp-robot] [xfs] b027d4c97b: fio.latency_2ms% +7.1% regression

2018-06-05 Thread Ye Xiaolong

On 06/06, Dave Chinner wrote:
>On Tue, Jun 05, 2018 at 03:16:57PM +0800, kernel test robot wrote:
>> 
>> Greeting,
>> 
>> FYI, we noticed a +7.1% regression of fio.latency_2ms% due to commit:
>> 
>> 
>> commit: b027d4c97b9675c2ad75dec94be4e46dceb3ec74 ("xfs: don't retry 
>> xfs_buf_find on XBF_TRYLOCK failure")
>> https://git.kernel.org/cgit/fs/xfs/xfs-linux.git xfs-4.18-merge
>
>> 8925a3dc4771004b b027d4c97b9675c2ad75dec94b
>> ---------------- --------------------------
>>          %stddev     %change         %stddev
>>              \          |                \
>>      46.56 ±  3%      +7.1       53.61        fio.latency_2ms%
>>       8.19            +0.2        8.40        fio.latency_100ms%
>>       0.74 ±  3%      -0.1        0.68 ±  6%  fio.latency_250ms%
>>      25.20 ±  6%      -7.3       17.86 ±  6%  fio.latency_4ms%
>>       0.46 ±  9%      +0.2        0.69 ± 13%  fio.latency_750us%
>
>This is not a regression. The number of IOs in the 4ms IO latency
>bin has reduced by 7%, and the number in the 2ms IO latency bin has
>increased by 7%. IOWs, there's a measurable improvement in IO
>latency as a result of those patches, not a regression.

Thanks for the clarification.

Thanks,
Xiaolong

>
>Cheers,
>
>Dave.
>-- 
>Dave Chinner
>da...@fromorbit.com

Re: [PATCH] irqchip/gic-v3-its: fix ITS queue timeout

2018-06-05 Thread Yang Yingliang


Hi, Julien

On 2018/6/5 18:16, Julien Thierry wrote:

Hi Yang,

On 05/06/18 07:30, Yang Yingliang wrote:

When the kernel is booted with maxcpus=x, where 'x' is smaller
than the actual number of CPUs, the TAs of offline CPUs won't
be set in its->collection.

If an LPI is bound to an offline CPU, the sync command will use a
zero TA, which leads to an ITS queue timeout.  Fix this by choosing
an online CPU if there is no online CPU in cpu_mask.

Signed-off-by: Yang Yingliang 
---
  drivers/irqchip/irq-gic-v3-its.c | 9 +++--
  1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c 
b/drivers/irqchip/irq-gic-v3-its.c

index 5416f2b..edd92a9 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -2309,7 +2309,9 @@ static int its_irq_domain_activate(struct 
irq_domain *domain,

  cpu_mask = cpumask_of_node(its_dev->its->numa_node);
/* Bind the LPI to the first possible CPU */
-cpu = cpumask_first(cpu_mask);
+cpu = cpumask_first_and(cpu_mask, cpu_online_mask);
+if (!cpu_online(cpu))


Testing for cpu being online here feels a bit redundant.

Since cpu is online if the cpumask_first_and returns a valid cpu, I 
think you could replace this test with:


if (cpu >= nr_cpu_ids)

Yes, I used the wrong check here. According to the comment on
cpumask_first_and(), this function returns >= nr_cpu_ids if no CPUs
are set in both masks.


I'll send v2 later.



+cpu = cpumask_first(cpu_online_mask);
  its_dev->event_map.col_map[event] = cpu;
  irq_data_update_effective_affinity(d, cpumask_of(cpu));
  @@ -2466,7 +2468,10 @@ static int its_vpe_set_affinity(struct 
irq_data *d,

  bool force)
  {
  struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
-int cpu = cpumask_first(mask_val);
+int cpu = cpumask_first_and(mask_val, cpu_online_mask);
+
+if (!cpu_online(cpu))


Same thing here.


+cpu = cpumask_first(cpu_online_mask);
/*
   * Changing affinity is mega expensive, so let's be as lazy as



Cheers,



Thanks,
Yang
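
The check agreed on in this reply can be illustrated outside the kernel. What
follows is a minimal userspace sketch, assuming CPU masks fit in one 64-bit
word. NR_CPU_IDS, first_and() and pick_cpu() are hypothetical stand-ins and
not the kernel's cpumask API; they only mirror the contract quoted above,
namely that the lookup yields a value >= nr_cpu_ids when the two masks share
no set bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for nr_cpu_ids; a real kernel sizes this at boot. */
#define NR_CPU_IDS 8u

/*
 * Index of the first bit set in both masks, or NR_CPU_IDS if the
 * intersection is empty (within the first NR_CPU_IDS bits).
 */
static unsigned int first_and(uint64_t a, uint64_t b)
{
	uint64_t both = a & b;
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPU_IDS; cpu++)
		if (both & (UINT64_C(1) << cpu))
			return cpu;
	return NR_CPU_IDS;
}

/* Prefer an online CPU from node_mask; otherwise fall back to any online CPU. */
static unsigned int pick_cpu(uint64_t node_mask, uint64_t online_mask)
{
	unsigned int cpu;

	/* At least one CPU below NR_CPU_IDS must be online. */
	assert(online_mask & ((UINT64_C(1) << NR_CPU_IDS) - 1));

	cpu = first_and(node_mask, online_mask);
	if (cpu >= NR_CPU_IDS)	/* nothing online in node_mask */
		cpu = first_and(online_mask, online_mask);
	return cpu;
}
```

With node CPUs 4-7 (mask 0xF0) and only CPUs 0-1 online (mask 0x03), the
intersection is empty, so pick_cpu() falls back to CPU 0. Testing
"cpu >= NR_CPU_IDS" is enough here because any value below that bound came
from the intersection and is therefore online by construction.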

Re: building in 32bit chroot on x86_64 host broken

2018-06-05 Thread Masahiro Yamada

Hi Linus, Thomas,

2018-06-06 4:13 GMT+09:00 Linus Torvalds :
> On Tue, Jun 5, 2018 at 11:50 AM Thomas Backlund  wrote:
>> >
>> > but why do you care?
>>
>> Because without it running the build in the 32bit chroot will get the
>> initial reported issue:
>
> Ahh. I can re-create that now.
>
> Yes, doing
>
>   make ARCH=i386 allnoconfig
>
> followed by
>
>   make oldconfig
>
> is broken. And doing a trivial "git bisect run" to pinpoint where
> CONFIG_64BIT goes away gives us
>
> f467c5640c29ad258c3cd8186a776c82fc3b8057 is the first bad commit
>
> which does that
>
>   "kconfig: only write '# CONFIG_FOO is not set' for visible symbols"
>
> and it turns out that CONFIG_64BIT is not a visible symbol on x86-32,
> because that question is disabled when ARCH != "x86".
>
> bool "64-bit kernel" if ARCH = "x86"
>
> And the problem with that, is that *next* time around this config file
> is used, because we don't have that
>
>   # CONFIG_64BIT is not set
>
> line, we don't turn it into
>
>   CONFIG_64BIT=n
>
> and then the "depends on" in X86_64
>
>   config X86_64
>   def_bool y
>   depends on 64BIT
>
> no longer hides it.
>
> Hmm. Ulf, Masahiro, comments?
>
> Should we just revert that commit?

Hmm.

Was the v4.16 behavior intentional,
or just something people found to work?

IMHO, the current behavior in v4.17 is expected
from the Kconfig point of view.

"make ARCH=i386 allnoconfig" hides the prompt
of 64BIT.

Then, "make oldconfig" makes the prompt newly visible,
so it is asking for user input.

However, if the previous behavior is desired,
I think we can change course.

If a symbol is visible (i.e. there is no unmet direct dependency),
'# CONFIG_... is not set' should be written out to the .config
even if its prompt is invisible.

For example,

config FOO
   bool

should write out "# CONFIG_FOO is not set"

I think this will fix the reported problem,
and Kconfig can still keep the grammatical consistency.

Ulf, what do you think?

> Thomas, can you verify that a
>
> git revert f467c5640c29ad258c3cd8186a776c82fc3b8057
>
> fixes the problem for you?
>
>   Linus

-- 
Best Regards
Masahiro Yamada
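
The two rules contrasted in this message can be put side by side in a toy
model. This is only a sketch of the rules as they are described in the thread,
not Kconfig's implementation: struct symbol, emits_not_set_current(),
emits_not_set_proposed() and format_not_set() are all hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily simplified view of a bool Kconfig symbol. */
struct symbol {
	const char *name;
	bool deps_met;		/* no unmet direct dependency */
	bool prompt_visible;	/* the prompt's "if" condition holds */
	bool value;		/* true = y, false = n */
};

typedef bool (*not_set_rule)(const struct symbol *);

/* Rule as described for v4.17: only symbols whose prompt is visible. */
static bool emits_not_set_current(const struct symbol *s)
{
	return !s->value && s->prompt_visible;
}

/* Proposed rule: dependencies are met, even if the prompt is hidden. */
static bool emits_not_set_proposed(const struct symbol *s)
{
	return !s->value && s->deps_met;
}

/* Format the .config line into buf; returns its length, or 0 if none. */
static int format_not_set(const struct symbol *s, not_set_rule rule,
			  char *buf, size_t len)
{
	assert(buf != NULL && len > 0);

	if (!rule(s)) {
		buf[0] = '\0';
		return 0;
	}
	return snprintf(buf, len, "# CONFIG_%s is not set", s->name);
}
```

Modeling 64BIT under ARCH=i386 as { "64BIT", deps_met = true,
prompt_visible = false, value = false }, the first rule writes nothing while
the proposed rule produces "# CONFIG_64BIT is not set", which is the line a
later "make oldconfig" needs in order to avoid prompting.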

Re: [PATCH 0/5] mm: rework hmm to use devm_memremap_pages

2018-06-05 Thread Dan Williams

On Tue, Jun 5, 2018 at 5:08 PM, Jerome Glisse  wrote:
> On Tue, Jun 05, 2018 at 04:06:12PM -0700, Dan Williams wrote:
[..]
>> I want the EXPORT_SYMBOL_GPL on devm_memremap_pages() primarily for
>> development purposes. Any new users of devm_memremap_pages() should be
>> aware that they are subscribing to the whims of the core-VM, i.e. the
>> ongoing evolution of 'struct page', and encourage those drivers to be
>> upstream to improve the implementation, and consolidate use cases. I'm
>> not qualified to comment on your "nor will it change anyone's legal
>> position.", but I'm saying it's in the Linux kernel's best interest
>> that new users of this interface assume they need to be GPL.
>
> Note that HMM isolate the device driver from struct page as long as
> the driver only use HMM helpers to get to the information it needs.
> I intend to be pedantic about that with any driver using HMM. I want
> HMM to be an impedance layer that provide stable and simple API to
> device driver while preserving freedom of change to mm.
>

I would not classify redefining put_page() and recompiling the
entirety of the kernel to turn on HMM as "isolating the driver from
'struct page'". HMM is instead isolating these out-of-tree drivers from
ever needing to go upstream.

Unless the nouveau patches are using the entirety of what is already
upstream for HMM, we should look to pare HMM back.

There is plenty of precedent of building a large capability
out-of-tree and piecemeal merging it later, so I do not buy the
"chicken-egg" argument. The change in the export is to make sure we
don't repeat this backward "merge first, ask questions later" mistake
in the future as devm_memremap_pages() is continuing to find new users
like peer-to-peer DMA support and Linux is better off if that
development is upstream. From a purely technical standpoint
devm_memremap_pages() is EXPORT_SYMBOL_GPL because it hacks around
several implementation details in the core kernel to achieve its goal,
and it leaks new assumptions all over the kernel. It is strictly not a
self contained interface.

Re: [PATCH] slab: Clean up the code comment in slab kmem_cache struct

2018-06-05 Thread Baoquan He

On 06/05/18 at 05:04pm, Christopher Lameter wrote:
> On Sun, 3 Jun 2018, Baoquan He wrote:
> 
> > diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
> > index d9228e4d0320..3485c58cfd1c 100644
> > --- a/include/linux/slab_def.h
> > +++ b/include/linux/slab_def.h
> > @@ -67,9 +67,10 @@ struct kmem_cache {
> >
> > /*
> >  * If debugging is enabled, then the allocator can add additional
> > -* fields and/or padding to every object. size contains the total
> > -* object size including these internal fields, the following two
> > -* variables contain the offset to the user object and its size.
> > +* fields and/or padding to every object. 'size' contains the total
> > +* object size including these internal fields, while 'obj_offset'
> > +* and 'object_size' contain the offset to the user object and its
> > +* size.
> >  */
> > int obj_offset;
> >  #endif /* CONFIG_DEBUG_SLAB */
> >
> 
> Wish we had some more consistent naming. object_size and obj_offset??? And
> the fields better be as close together as possible.

I am backporting Thomas's sl[a|u]b freelist randomization feature to
our distros, and need to go through the slab code to understand it better.
From the git log history, they were 'obj_offset' and 'obj_size'. Later on,
'obj_size' was renamed to 'object_size' in commit 3b0efdfa1e ("mm, sl[aou]b:
Extract common fields from struct kmem_cache"), which is from your patch.
As I understand it, I guess you changed that on purpose, because
object_size is the size of each object, while obj_offset is for the whole
cache, representing the offset at which the real object starts to be stored.
And putting them separately is for describing them better in the code
comment and for distinction, e.g. 'object_size' is in "4) cache
creation/removal", while 'obj_offset' is put alone to indicate it's for the
whole cache.
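
The relationship between the three fields named in the patched comment can be
sketched in userspace. This is an illustration under assumptions, not the
kernel's struct kmem_cache: struct cache_layout, user_object() and overhead()
are hypothetical, and the field types are simplified to size_t.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for the three fields discussed. */
struct cache_layout {
	size_t size;		/* total per-object footprint, debug fields included */
	size_t obj_offset;	/* where the user object starts inside that footprint */
	size_t object_size;	/* size of the user object itself */
};

/* Map the start of a raw slot to the pointer a caller would be handed. */
static void *user_object(const struct cache_layout *c, void *slot)
{
	assert(c->obj_offset + c->object_size <= c->size);
	return (char *)slot + c->obj_offset;
}

/* Bytes of the slot not occupied by the user object (debug fields, padding). */
static size_t overhead(const struct cache_layout *c)
{
	assert(c->obj_offset + c->object_size <= c->size);
	return c->size - c->object_size;
}
```

For a layout of { size = 96, obj_offset = 8, object_size = 64 }, the user
object starts 8 bytes into the slot and 32 bytes of the slot are overhead.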

[PATCH] module: Implement sig_unenforce parameter

2018-06-05 Thread Brett T. Warden

When CONFIG_MODULE_SIG_FORCE is enabled, also provide a boot-time-only
parameter, module.sig_unenforce, to disable signature enforcement. This
allows distributions to ship with signature verification enforcement
enabled by default, but for users to elect to disable it without
recompiling, to support building and loading out-of-tree modules.

Signed-off-by: Brett T. Warden 
---
 .../admin-guide/kernel-parameters.txt |  4 +++
 kernel/module.c   | 25 +--
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 1beb30d8d7fc..87909e021558 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2380,6 +2380,10 @@
Note that if CONFIG_MODULE_SIG_FORCE is set, that
is always true, so this option does nothing.
 
+   module.sig_unenforce
+   [KNL] This parameter allows modules without signatures
+   to be loaded, overriding CONFIG_MODULE_SIG_FORCE.
+
module_blacklist=  [KNL] Do not load a comma-separated list of
modules.  Useful for debugging problem modules.
 
diff --git a/kernel/module.c b/kernel/module.c
index c9bea7f2b43e..53cd6cd52dc6 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -64,6 +64,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "module-internal.h"
 
@@ -274,9 +275,13 @@ static void module_assert_mutex_or_preempt(void)
 }
 
 static bool sig_enforce = IS_ENABLED(CONFIG_MODULE_SIG_FORCE);
-#ifndef CONFIG_MODULE_SIG_FORCE
+#ifdef CONFIG_MODULE_SIG_FORCE
+/* Allow disabling module signature requirement by adding boot param */
+static bool sig_unenforce;
+module_param(sig_unenforce, bool_enable_only, 0444);
+#else /* !CONFIG_MODULE_SIG_FORCE */
 module_param(sig_enforce, bool_enable_only, 0644);
-#endif /* !CONFIG_MODULE_SIG_FORCE */
+#endif /* CONFIG_MODULE_SIG_FORCE */
 
 /*
  * Export sig_enforce kernel cmdline parameter to allow other subsystems rely
@@ -415,6 +420,8 @@ extern const s32 __start___kcrctab_unused[];
 extern const s32 __start___kcrctab_unused_gpl[];
 #endif
 
+extern struct boot_params boot_params;
+
 #ifndef CONFIG_MODVERSIONS
 #define symversion(base, idx) NULL
 #else
@@ -4243,6 +4250,20 @@ static const struct file_operations 
proc_modules_operations = {
 static int __init proc_modules_init(void)
 {
proc_create("modules", 0, NULL, &proc_modules_operations);
+
+#ifdef CONFIG_MODULE_SIG_FORCE
+   switch (boot_params.secure_boot) {
+   case efi_secureboot_mode_unset:
+   case efi_secureboot_mode_unknown:
+   case efi_secureboot_mode_disabled:
+   /*
+* sig_unenforce is only applied if SecureBoot is not
+* enabled.
+*/
+   sig_enforce = !sig_unenforce;
+   }
+#endif
+
return 0;
 }
 module_init(proc_modules_init);
-- 
2.17.1
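
The decision made in the proc_modules_init() hunk of this patch can be modeled
in plain userspace C. This is a minimal sketch and not the patch itself:
enum secureboot_mode and effective_sig_enforce() are hypothetical names, with
the enum standing in for the efi_secureboot_mode_* values the patch refers to.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the Secure Boot states named in the patch. */
enum secureboot_mode {
	SECUREBOOT_UNSET,
	SECUREBOOT_UNKNOWN,
	SECUREBOOT_DISABLED,
	SECUREBOOT_ENABLED,
};

/*
 * Effective enforcement for a kernel built with CONFIG_MODULE_SIG_FORCE:
 * enforcement starts out on, and sig_unenforce can switch it off only
 * when Secure Boot is not enabled.
 */
static bool effective_sig_enforce(enum secureboot_mode mode, bool sig_unenforce)
{
	bool sig_enforce = true;

	switch (mode) {
	case SECUREBOOT_UNSET:
	case SECUREBOOT_UNKNOWN:
	case SECUREBOOT_DISABLED:
		sig_enforce = !sig_unenforce;
		break;
	default:
		/* Secure Boot enabled: the parameter is ignored. */
		assert(mode == SECUREBOOT_ENABLED);
		break;
	}
	return sig_enforce;
}
```

In this model, passing the parameter with Secure Boot disabled turns
enforcement off, while the same parameter with Secure Boot enabled leaves
enforcement on.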

Re: [PATCH v3] kbuild: check for pkg-config on make {menu,n,g.x}config

2018-06-05 Thread Randy Dunlap

On 06/04/2018 11:53 PM, Masahiro Yamada wrote:
> Hi Randy,
> 
> 
> 2018-06-04 11:59 GMT+09:00 Randy Dunlap :
>> From: Randy Dunlap 
>>
>> Each of 'make {menu,n,g,x}config' uses (needs) pkg-config to make sure
>> that other required files are present and to determine build flags
>> settings, but none of these check that pkg-config itself is present.
>> Add a check for all 4 of these targets and update
>> Documentation/process/changes.rst to mention 'pkg-config'.
>>
>> Fixes kernel bugzilla #77511:
>> https://bugzilla.kernel.org/show_bug.cgi?id=77511
>>
>> Signed-off-by: Randy Dunlap 
>> Cc: Bjørn Forsman 
>> ---
>> Strictly speaking, pkg-config is not required if someone is only
>> using 'make {menu,n}config' since there are fallbacks for those
>> two targets.
>>


OK, I made your suggested changes and will test it later and then resend it.

thanks,
-- 
~Randy

Re: [PATCH 4.4 00/37] 4.4.136-stable review

2018-06-05 Thread Nathan Chancellor

On Tue, Jun 05, 2018 at 07:01:05PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.136 release.
> There are 37 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Thu Jun  7 17:00:49 UTC 2018.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
>   
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.136-rc1.gz
> or in the git tree and branch at:
>   
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.4.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h
> 

Merged, compiled with -Werror, and installed onto my Pixel 2 XL and
OnePlus 5.

No issues in general usage or dmesg (although I somehow didn't catch the
VTI issue in 4.4.134 so don't know how valuable this is anymore...)

Thanks!
Nathan

[GIT PULL] SELinux patches for v4.18

2018-06-05 Thread Paul Moore

Hi Linus,

SELinux is back with a quiet pull request for v4.18.  Three patches,
all small: two cleanups of the SELinux audit records, and one to
migrate to a newly defined type (vm_fault_t).

Everything passes our test suite, and as of about five minutes ago it
merged cleanly with your tree.

Please pull, thanks.
-Paul
--
The following changes since commit 60cc43fc888428bb2f18f08997432d426a243338:

 Linux 4.17-rc1 (2018-04-15 18:24:20 -0700)

are available in the Git repository at:

 git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux.git
   tags/selinux-pr-20180605

for you to fetch changes up to d141136f523a3a6372d22981bdff7a8906f36fea:

 audit: normalize MAC_POLICY_LOAD record (2018-04-17 17:54:11 -0400)


selinux/stable-4.18 PR 20180605


Richard Guy Briggs (2):
 audit: normalize MAC_STATUS record
 audit: normalize MAC_POLICY_LOAD record

Souptick Joarder (1):
 security: selinux: Change return type to vm_fault_t

security/selinux/selinuxfs.c | 18 --
1 file changed, 12 insertions(+), 6 deletions(-)

-- 
paul moore
www.paul-moore.com

Re: linux-next: manual merge of the userns tree with the arm tree

2018-06-05 Thread Stephen Rothwell

Hi all,

On Wed, 30 May 2018 18:30:58 +1000 Stephen Rothwell  
wrote:
>
> Today's linux-next merge of the userns tree got a conflict in:
> 
>   arch/arm/mm/fault.c
> 
> between commit:
> 
>   93a24d7e23e7 ("ARM: spectre-v2: harden user aborts in kernel space")
> 
> from the arm tree and commit:
> 
>   3eb0f5193b49 ("signal: Ensure every siginfo we send has all bits 
> initialized")
> 
> from the userns tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
> 
> -- 
> Cheers,
> Stephen Rothwell
> 
> diff --cc arch/arm/mm/fault.c
> index 3b1ba003c4f9,32034543f49c..
> --- a/arch/arm/mm/fault.c
> +++ b/arch/arm/mm/fault.c
> @@@ -163,9 -163,8 +163,11 @@@ __do_user_fault(struct task_struct *tsk
>   {
>   struct siginfo si;
>   
>  +if (addr > TASK_SIZE)
>  +harden_branch_predictor();
>  +
> + clear_siginfo(&si);
> + 
>   #ifdef CONFIG_DEBUG_USER
>   if (((user_debug & UDBG_SEGV) && (sig == SIGSEGV)) ||
>   ((user_debug & UDBG_BUS)  && (sig == SIGBUS))) {

This is now a conflict between the arm tree and Linus' tree.

-- 
Cheers,
Stephen Rothwell



Re: [GIT PULL] cgroup changes for v4.18-rc1

2018-06-05 Thread Linus Torvalds

On Tue, Jun 5, 2018 at 12:22 PM Tejun Heo  wrote:
>
> * cgroup uses file modified events to notify certain events.  A rate
>   limiting mechanism is added.

This "explanation" didn't really parse for me at all.

I edited the merge message to something that I think is correct and
made more sense to me.

   Linus

Re: [PATCH 0/5] mm: rework hmm to use devm_memremap_pages

2018-06-05 Thread Jerome Glisse

On Tue, Jun 05, 2018 at 04:06:12PM -0700, Dan Williams wrote:
> On Tue, Jun 5, 2018 at 3:19 PM, Dave Airlie  wrote:
> > On 6 June 2018 at 04:48, Jerome Glisse  wrote:
> >> On Tue, May 29, 2018 at 04:33:49PM -0700, Dan Williams wrote:
> >>> On Tue, May 29, 2018 at 4:00 PM, Dave Airlie  wrote:
> >>> > On 30 May 2018 at 08:31, Dan Williams  wrote:

[...]

> >>> It honestly was an oversight, and as we've gone on to add deeper and
> >>> deeper ties into the mm and filesystems [1] I realized this symbol was
> >>> mis-labeled.  It would be one thing if this was just some random
> >>> kernel leaf / library function, but this capability when turned on
> >>> causes the entire kernel to be recompiled as things like the
> >>> definition of put_page() changes. It's deeply integrated with how
> >>> Linux manages memory.
> >>
> >> I am personaly on the fence on deciding GPL versus non GPL export
> >> base on subjective view of what is deeply integrated and what is
> >> not. I think one can argue that every single linux kernel function
> >> is deeply integrated within the kernel, starting with all device
> >> drivers functions. One could similarly argue that nothing is ...
> >
> > This is the point I wasn't making so well, the whole deciding on a derived
> > work from the pov of one of the works isn't really going to be how a court
> > looks at it.
> >
> > At day 0, you have a Linux kernel, and a separate Windows kernel driver,
> > clearly they are not derived works.
> >
> > You add interfaces to the Windows kernel driver and it becomes a Linux
> > kernel driver, you never ship them together, derived work only if those
> > interfaces are GPL only? or derived work only if shipped together?
> > only shipped together and GPL only? Clearly not a clearcut case here.
> >
> > The code base is 99% the same, the kernel changes an export to a GPL
> > export, the external driver hasn't changed one line of code, and it suddenly
> > becomes a derived work?
> >
> > Oversights happen, but 3 years of advertising an interface under the non-GPL
> > and changing it doesn't change whether the external driver is derived or 
> > not,
> > nor will it change anyone's legal position.
> 
> My concern is the long term health and maintainability of the Linux
> kernel. HMM exports deep Linux internals out to proprietary drivers
> with no way for folks in the wider kernel community to validate that
> the interfaces are necessary or sufficient besides "take Jerome's word
> for it". Every time I've pushed back on any HMM feature the response
> is something to the effect of, "no, out of tree drivers need this".
> HMM needs to grow upstream users and the functionality needs to be
> limited to whatever those upstream users exploit. Since there are no
> upstream users of HMM, we should delete it unless / until those users
> arrive.

The raison d'être of HMM is to isolate drivers from the mm internals and
thus provide a clear contract and API to device drivers. I tried to spell
out that contract in include/linux/hmm.h, which I can briefly restate as:
  - provide a callback when the CPU tries to access a device page, so that
    memory can be migrated back to a CPU-accessible page under the
    control of the device driver, for device synchronization reasons
    (all the gory mm details are still in mm/migrate.c; HMM just
    provides waypoints in the migration process so that the device
    driver can synchronize and update the hardware along the way too)
  - provide 64 bits of storage inside struct page so that the device
    driver can store either a pointer to its internal data structure
    or other necessary information there while the page is in use in a
    process
  - inform the device driver once a page is freed (i.e. no longer in use
    in a process address space)

This virtually isolates device drivers from the inner workings of mm and
allows mm to change as long as we can keep this contract in place, provided
that device drivers only use the HMM API to perform any of the above. It is
my intention to push for that and to enforce it as strongly as I can.
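
The three-point contract above can be pictured with a small userspace
model. This is only a sketch under assumptions: every name in it
(sketch_page, sketch_dev_ops, sketch_cpu_access, sketch_release and the
toy_* driver) is hypothetical and is not the interface declared in
include/linux/hmm.h. It only shows the shape of the contract: the core
calls the driver on CPU access and on free, and the driver owns 64 bits
of per-page storage.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct page: 64 bits belong to the driver. */
struct sketch_page {
	uint64_t drv_private;	/* driver pointer or bookkeeping value */
	int on_device;		/* nonzero while the CPU cannot access it */
};

/* Hypothetical callbacks a device driver would supply. */
struct sketch_dev_ops {
	int (*cpu_fault)(struct sketch_page *page);	/* migrate back for the CPU */
	void (*page_free)(struct sketch_page *page);	/* page left the process */
};

/* Core side: a CPU access to a device page is routed to the driver. */
static int sketch_cpu_access(const struct sketch_dev_ops *ops,
			     struct sketch_page *page)
{
	if (!page->on_device)
		return 0;	/* already CPU accessible, nothing to do */
	return ops->cpu_fault(page);
}

/* Core side: tell the driver the page is no longer in use. */
static void sketch_release(const struct sketch_dev_ops *ops,
			   struct sketch_page *page)
{
	ops->page_free(page);
}

/* Toy driver: counts frees and marks pages CPU accessible on fault. */
static int toy_freed_pages;

static int toy_cpu_fault(struct sketch_page *page)
{
	page->on_device = 0;	/* pretend the data was copied back */
	return 0;
}

static void toy_page_free(struct sketch_page *page)
{
	page->drv_private = 0;	/* drop the driver's reference */
	toy_freed_pages++;
}
```

In this model the core never looks inside drv_private and the toy driver
never touches anything beyond the page it is handed, which is the kind of
separation the contract is meant to give.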

The Nouveau patchset has been posted, and I will post a newer, updated
version this month. I hope this can get upstream in 4.19, abiding by the drm
sub-system requirement of having open source userspace upstream in the mesa
project too (which has been in the works for the last few months).

This whole thing has been a big chicken-and-egg nightmare with moving
pieces everywhere. I wish I were better at getting all the pieces ready
at the same time, but alas I was not.

> 
> I want the EXPORT_SYMBOL_GPL on devm_memremap_pages() primarily for
> development purposes. Any new users of devm_memremap_pages() should be
> aware that they are subscribing to the whims of the core-VM, i.e. the
> ongoing evolution of 'struct page', and encourage those drivers to be
> upstream to improve the implementation, and consolidate use cases. I'm
> not qualified to comment on your "nor will it change anyone's legal
> position.", but I'm saying it's in the Linux

Re: linux-next: Signed-off-by missing for commit in the y2038 tree

2018-06-05 Thread Deepa Dinamani

Oh, you meant that it has keesc...@chromium.org author sign-off, but
it needs mine because I applied it.

Please add

Signed-off-by: Deepa Dinamani 

I also updated it in my tree.

Thanks,
- Deepa

On Tue, Jun 5, 2018 at 4:17 PM, Stephen Rothwell  wrote:
> Hi Deepa,
>
> On Tue, 5 Jun 2018 15:00:24 -0700 Deepa Dinamani  
> wrote:
>>
>> That patch belongs to Kees.
>
> But you committed it to the tree ...
>
> --
> Cheers,
> Stephen Rothwell

Re: [PATCH v3 1/2] PCI: Avoid panic when PCI IO resource's size is not page aligned

2018-06-05 Thread Bjorn Helgaas

On Tue, May 29, 2018 at 08:18:18PM +0800, Yisheng Xie wrote:
> Zhou reported a bug on Hisilicon arm64 D06 platform with 64KB page size:
> 
>  [2.470908] kernel BUG at lib/ioremap.c:72!
>  [2.475079] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
>  [2.480551] Modules linked in:
>  [2.483594] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 
> 4.16.0-rc7-00062-g0b41260-dirty #23
>  [2.491756] Hardware name: Huawei D06/D06, BIOS Hisilicon D06 UEFI Nemo 
> 2.0 RC0 - B120 03/23/2018
>  [2.500614] pstate: 80c9 (Nzcv daif +PAN +UAO)
>  [2.505395] pc : ioremap_page_range+0x268/0x36c
>  [2.509912] lr : pci_remap_iospace+0xe4/0x100
>  [...]
>  [2.603733] Call trace:
>  [2.606168]  ioremap_page_range+0x268/0x36c
>  [2.610337]  pci_remap_iospace+0xe4/0x100
>  [2.614334]  acpi_pci_probe_root_resources+0x1d4/0x214
>  [2.619460]  pci_acpi_root_prepare_resources+0x18/0xa8
>  [2.624585]  acpi_pci_root_create+0x98/0x214
>  [2.628843]  pci_acpi_scan_root+0x124/0x20c
>  [2.633013]  acpi_pci_root_add+0x224/0x494
>  [2.637096]  acpi_bus_attach+0xf8/0x200
>  [2.640918]  acpi_bus_attach+0x98/0x200
>  [2.644740]  acpi_bus_attach+0x98/0x200
>  [2.648562]  acpi_bus_scan+0x48/0x9c
>  [2.652125]  acpi_scan_init+0x104/0x268
>  [2.655948]  acpi_init+0x308/0x374
>  [2.659337]  do_one_initcall+0x48/0x14c
>  [2.663160]  kernel_init_freeable+0x19c/0x250
>  [2.667504]  kernel_init+0x10/0x100
>  [2.670979]  ret_from_fork+0x10/0x18
> 
> The cause is the size of PCI IO resource is 32KB, which is 4K aligned but
> not 64KB aligned, however, ioremap_page_range() request the range as page
> aligned or it will trigger a BUG_ON() on ioremap_pte_range() it calls, as
> ioremap_pte_range increase the addr by PAGE_SIZE, which makes addr != end
> until trigger BUG_ON, if its incoming end is not page aligned. More detail
> trace is as following:
> 
>  ioremap_page_range
>  -> ioremap_p4d_range
> -> ioremap_p4d_range
>-> ioremap_pud_range
>   -> ioremap_pmd_range
>  -> ioremap_pte_range
> 
> This patch avoid panic by return -EINVAL if vaddr or resource size is not
> page aligned.
> 
> Reported-by: Zhou Wang 
> Tested-by: Xiaojun Tan 
> Signed-off-by: Yisheng Xie 
> ---
> v3:
>  - pci_remap_iospace() sanitize its arguments instead - per Rafael
> 
> v2:
>  - Let the caller of ioremap_page_range() align the request by PAGE_SIZE - 
> per Toshi
> 
>  drivers/pci/pci.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index dbfe7c4..0eb0381 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3544,6 +3544,9 @@ int pci_remap_iospace(const struct resource *res, 
> phys_addr_t phys_addr)
>   if (res->end > IO_SPACE_LIMIT)
>   return -EINVAL;
>  
> + if (!PAGE_ALIGNED(vaddr) || !PAGE_ALIGNED(resource_size(res)))
> + return -EINVAL;

Most other callers of ioremap_page_range() are in the ioremap() path,
and they align phys_addr themselves.  In some cases that results in a
mapping that covers more than necessary.  For instance, see the
function comment at the x86 version of __ioremap_caller().

Is there any reason we couldn't similarly align vaddr and phys_addr
here?

The acpi_pci_probe_root_resources() path you mention above basically
ignores the errors you're returning.  Your patches will avoid the
panic, which is an improvement, but I/O port space will not work, and
I don't see anything that gives the user a hint about why not.

If we could align vaddr and phys_addr (and possibly map more than
necessary), I/O port space would still work.
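
To make the arithmetic concrete, here is a minimal userspace sketch of the
alignment idea, assuming a 64KB page size as in the report. All names
(SKETCH_PAGE_SIZE, sketch_walk_reaches_end, sketch_page_span) are
hypothetical and this is not the kernel code. It only shows that a 32KB
range cannot be walked in whole pages, while the widened range can; it does
not address whether vaddr and phys_addr share the same offset within a page.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size for the sketch: 64KB, matching the report above. */
#define SKETCH_PAGE_SIZE 0x10000ULL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

struct sketch_range {
	uint64_t start;	/* inclusive, page aligned */
	uint64_t end;	/* exclusive, page aligned */
};

/* True when stepping from start by whole pages lands exactly on end. */
static bool sketch_walk_reaches_end(uint64_t start, uint64_t end)
{
	return ((end - start) & ~SKETCH_PAGE_MASK) == 0;
}

/* Widen [addr, addr + size) outward to page boundaries. */
static struct sketch_range sketch_page_span(uint64_t addr, uint64_t size)
{
	struct sketch_range r;

	r.start = addr & SKETCH_PAGE_MASK;
	r.end = (addr + size + SKETCH_PAGE_SIZE - 1) & SKETCH_PAGE_MASK;
	return r;
}
```

With these definitions, a 32KB resource starting at 0 is widened to one
full 64KB page, so the mapping covers more than requested but a page-sized
walk terminates.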

>   return ioremap_page_range(vaddr, vaddr + resource_size(res), phys_addr,
> pgprot_device(PAGE_KERNEL));
>  #else
> -- 
> 1.7.12.4
> 

Re: [PATCH v8 7/8] lib: Add module to simulate atomic sections for testing preemptoff tracers

2018-06-05 Thread Joel Fernandes

(Resending since Andy wasn't on CC - sorry)

Hi,

On Tue, May 29, 2018 at 05:04:59PM -0700, Joel Fernandes wrote:
> From: "Joel Fernandes (Google)" 
> 
> In this patch we introduce a test module for simulating a long atomic
> section in the kernel which the preemptoff or irqsoff tracers can
> detect. This module is to be used only for test purposes and is default
> disabled.
> 
> Following is the expected output (only briefly shown) that can be parsed
> to verify that the tracers are working correctly. We will use this from
> the kselftests in future patches.
> 
> For the preemptoff tracer:
> 
> echo preemptoff > /d/tracing/current_tracer
> sleep 1
> insmod ./test_atomic_sections.ko atomic_mode=preempt atomic_time=50
> sleep 1
> bash-4.3# cat /d/tracing/trace
> preempt -10662...20us@: atomic_sect_run <-atomic_sect_run
> preempt -10662...2 52us : atomic_sect_run <-atomic_sect_run
> preempt -10662...2 54us : tracer_preempt_on <-atomic_sect_run
> preempt -10662...2 500012us : 
>  => kthread
>  => ret_from_fork
> 
> For the irqsoff tracer:
> 
> echo irqsoff > /d/tracing/current_tracer
> sleep 1
> insmod ./test_atomic_sections.ko atomic_mode=irq atomic_time=50
> sleep 1
> bash-4.3# cat /d/tracing/trace
> irq dis -10691d..10us@: atomic_sect_run
> irq dis -10691d..1 51us : atomic_sect_run
> irq dis -10691d..1 52us : tracer_hardirqs_on <-atomic_sect_run
> irq dis -10691d..1 55us : 
>  => ret_from_fork

Andy, previously made some suggestions to this patch. The updated version is
below and I am planning to send it along with this series as v9. I have
included it in advance below for your Review.

Andy, would you be Ok with adding your Reviewed-by to it?
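
As an aside on the "can be parsed" remark in the commit message: a minimal
userspace sketch of pulling the microsecond value out of one trace line
could look like the following. It assumes the value is the run of digits
immediately before the first "us" marker, as in the sample lines, and the
function name sketch_trace_latency_us is hypothetical. It is not part of
the patch or of the planned kselftests.

```c
#include <ctype.h>
#include <string.h>

/*
 * Return the number written directly before the first digit-prefixed
 * "us" marker in a trace line, or -1 if the line carries no such value.
 */
static long sketch_trace_latency_us(const char *line)
{
	const char *p = line;

	while ((p = strstr(p, "us")) != NULL) {
		const char *d = p;
		long val = 0, scale = 1;

		/* Walk backwards over the digits that precede "us". */
		while (d > line && isdigit((unsigned char)d[-1])) {
			val += (d[-1] - '0') * scale;
			scale *= 10;
			d--;
		}
		if (d != p)
			return val;	/* at least one digit was found */
		p += 2;			/* "us" inside a word, keep looking */
	}
	return -1;
}
```

A test script could then compare the largest value found against the
atomic_time passed to the module.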

---8<---

From: "Joel Fernandes (Google)" 
Date: Wed, 16 May 2018 23:46:06 -0700
Subject: [PATCH v9 7/8] lib: Add module to simulate atomic sections for testing
 preemptoff tracers

In this patch we introduce a test module for simulating a long atomic
section in the kernel which the preemptoff or irqsoff tracers can
detect. This module is to be used only for test purposes and is default
disabled.

Following is the expected output (only briefly shown) that can be parsed
to verify that the tracers are working correctly. We will use this from
the kselftests in future patches.

For the preemptoff tracer:

echo preemptoff > /d/tracing/current_tracer
sleep 1
insmod ./test_atomic_sections.ko atomic_mode=preempt atomic_time=50
sleep 1
bash-4.3# cat /d/tracing/trace
preempt -10662...20us@: atomic_sect_run <-atomic_sect_run
preempt -10662...2 52us : atomic_sect_run <-atomic_sect_run
preempt -10662...2 54us : tracer_preempt_on <-atomic_sect_run
preempt -10662...2 500012us : 
 => kthread
 => ret_from_fork

For the irqsoff tracer:

echo irqsoff > /d/tracing/current_tracer
sleep 1
insmod ./test_atomic_sections.ko atomic_mode=irq atomic_time=50
sleep 1
bash-4.3# cat /d/tracing/trace
irq dis -10691d..10us@: atomic_sect_run
irq dis -10691d..1 51us : atomic_sect_run
irq dis -10691d..1 52us : tracer_hardirqs_on <-atomic_sect_run
irq dis -10691d..1 55us : 
 => ret_from_fork

Co-developed-by: Erick Reyes 
Cc: Andy Shevchenko 
Signed-off-by: Joel Fernandes (Google) 
---
 lib/Kconfig.debug  |  8 
 lib/Makefile   |  1 +
 lib/test_atomic_sections.c | 77 ++
 3 files changed, 86 insertions(+)
 create mode 100644 lib/test_atomic_sections.c

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c40c7b734cd1..faebf0fe3bcf 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1933,6 +1933,14 @@ config TEST_KMOD
 
  If unsure, say N.
 
+config TEST_ATOMIC_SECTIONS
+   tristate "Simulate atomic sections for tracers to detect"
+   depends on m
+   help
+ Select this option to build a test module that can help test atomic
+ sections by simulating them with a duration supplied as a module
+ parameter. Preempt disable and irq disable modes can be requested.
+
 config TEST_DEBUG_VIRTUAL
tristate "Test CONFIG_DEBUG_VIRTUAL feature"
depends on DEBUG_VIRTUAL
diff --git a/lib/Makefile b/lib/Makefile
index ce20696d5a92..e82cf5445b7b 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -46,6 +46,7 @@ obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
 obj-y += hexdump.o
 obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
+obj-$(CONFIG_TEST_ATOMIC_SECTIONS) += test_atomic_sections.o
 obj-y += kstrtox.o
 obj-$(CONFIG_FIND_BIT_BENCHMARK) += find_bit_benchmark.o
 obj-$(CONFIG_TEST_BPF) += test_bpf.o
diff --git a/lib/test_atomic_sections.c b/lib/test_atomic_sections.c
new file mode 100644
index ..1eef518f0974
--- /dev/null
+++ b/lib/test_atomic_sections.c
@@ -0,0 +1,77 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Atomic section emulation test module
+ *
+ * Emulates atomic sections by disabling

linux-next: merge window

2018-06-05 Thread Stephen Rothwell

Hi all,

Please do *not* add any v4.19 related material to your linux-next
included branches until after v4.18-rc1 has been released i.e. during
the current merge window.

-- 
Cheers,
Stephen Rothwell



Re: x86/asm: __clear_user() micro-optimization (was: "Re: [GIT PULL] x86/asm changes for v4.18")

2018-06-05 Thread Linus Torvalds

On Tue, Jun 5, 2018 at 4:20 PM Alexey Dobriyan  wrote:
>
> This is Broadwell Xeon E5-2620 v4.
> Which is somewhat strange indeed because it should be modern enough.

Yeah, odd.

Here's the benchmark I used:

  #define SIZE 4068

  int main(int argc, char **argv)
  {
        int i;
        unsigned char buffer[SIZE], *p;

        for (i = 0; i < 100; i++)
                asm volatile(
                        "1: movq %[zero],(%[mem]); addq %[eight],%[mem]; decl %[count]; jne 1b"
                        : [mem] "=r" (p)
                        : [zero] "i" (0l), [eight] "i" (8l),
                          "0" (buffer), [count] "r" (SIZE/8));
  }

where you can change that "i" for [zero] and [eight] to be "r" to get
the register version.

I just timed it, because I'm lazy and perf seemed to be overkill.

It might be some very specific loop buffer issue or something.

Or maybe my benchmark above is broken, I didn't really verify that the
end result was any good (I just did an objdump to verify the asm code
superficially).

 Linus

Re: [PATCH 2/2] PM / devfreq: Generic cpufreq governor

2018-06-05 Thread Saravana Kannan

On 05/27/2018 11:00 PM, MyungJoo Ham wrote:
>> Many CPU architectures have caches that can scale independent of the CPUs.
>> Frequency scaling of the caches is necessary to make sure the cache is not
>> a performance bottleneck that leads to poor performance and power. The same
>> idea applies for RAM/DDR.
>>
>> To achieve this, this patch series adds a generic devfreq governor that can
>> listen to the frequency transitions of each CPU frequency domain and then
>> adjusts the frequency of the cache (or any devfreq device) based on the
>> frequency of the CPUs.
>
> I agree that we have some hardware pieces that want to configure
> frequencies based on the CPUfreq.
>
> Creating a devfreq governor that configures devfreq-freq
> based on incoming events (CPUFreq-transition-event in this case)
> is indeed a good idea.
>
> However, I would like to ask the following:
> The overall code appears to be overly complex compared to what you need.
> - Do you really need to revive "CPUFREQ POLICY" events for this?
>   especially when you are going to look at "first CPU" only?
>
> Cheers,
> MyungJoo

Sorry, didn't notice this email earlier. My message filters seem to be 
messed up.


The POLICY notifiers are necessary for cases when all CPUs in a policy
are hotplugged off -- we need to ignore their frequencies to avoid
getting the devfreq device stuck at a high frequency. Looking at "first
CPU" is just an optimization to ignore multiple transition notifiers for
each CPU in a policy -- we'd want to do that even if we don't have
policy notifiers. Not having policy notifiers won't really simplify the
code by much. We'd be forced to check policy->related_cpus on every
transition notifier call if the CPU state hasn't already been initialized.


-Saravana

--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: x86/asm: __clear_user() micro-optimization (was: "Re: [GIT PULL] x86/asm changes for v4.18")

2018-06-05 Thread Alexey Dobriyan

On Tue, Jun 05, 2018 at 04:04:37PM -0700, Linus Torvalds wrote:
> On Tue, Jun 5, 2018 at 4:01 PM Linus Torvalds
>  wrote:
> >
> > On Tue, Jun 5, 2018 at 3:41 PM Alexey Dobriyan  wrote:
> > >
> > > On my potato performance increase is 33%, sheesh.
> > > And CPU starts doing 3 instructions per cycle vs 2.
> >
> > Whee. That's a shockingly big difference. On my CPU (i7-6700K) it
> > makes absolutely no difference whether the values are integers or in
> > registers.
> 
> In fact, looking at Agner Fog's instruction lists, I don't see any CPU
> where it would make a difference, except for the P4 (where the
> immediate looks like it's a bad idea because it's an extra uop, but it
> might pack fine and not be noticeable).
> 
> But maybe I'm missing something subtle. What CPU, out of morbid interest?

This is Broadwell Xeon E5-2620 v4.
Which is somewhat strange indeed because it should be modern enough.
