date:20190418

On Tue, Apr 16, 2019 at 03:45:00PM +0200, Laurent Dufour wrote:
> From: Peter Zijlstra 
> 
> Wrap the VMA modifications (vma_adjust/unmap_page_range) with sequence
> counts such that we can easily test if a VMA is changed.
> 
> The calls to vm_write_begin/end() in unmap_page_range() are
> used to detect when a VMA is being unmap and thus that new page fault
> should not be satisfied for this VMA. If the seqcount hasn't changed when
> the page table are locked, this means we are safe to satisfy the page
> fault.
> 
> The flip side is that we cannot distinguish between a vma_adjust() and
> the unmap_page_range() -- where with the former we could have
> re-checked the vma bounds against the address.
> 
> The VMA's sequence counter is also used to detect change to various VMA's
> fields used during the page fault handling, such as:
>  - vm_start, vm_end
>  - vm_pgoff
>  - vm_flags, vm_page_prot
>  - vm_policy

^ All above are under mmap write lock ?

>  - anon_vma

^ This is either under mmap write lock or under page table lock

So my question is do we need the complexity of seqcount_t for this ?

It seems that using regular int as counter and also relying on vm_flags
when vma is unmap should do the trick.

vma_delete(struct vm_area_struct *vma)
{
...
/*
 * Make sure the vma is mark as invalid ie neither read nor write
 * so that speculative fault back off. A racing speculative fault
 * will either see the flags as 0 or the new seqcount.
 */
vma->vm_flags = 0;
smp_wmb();
vma->seqcount++;
...
}

Then:
speculative_fault_begin(struct vm_area_struct *vma,
struct spec_vmf *spvmf)
{
...
spvmf->seqcount = vma->seqcount;
smp_rmb();
spvmf->vm_flags = vma->vm_flags;
if (!spvmf->vm_flags) {
// Back off the vma is dying ...
...
}
}

bool speculative_fault_commit(struct vm_area_struct *vma,
  struct spec_vmf *spvmf)
{
...
seqcount = vma->seqcount;
smp_rmb();
vm_flags = vma->vm_flags;

if (spvmf->vm_flags != vm_flags || seqcount != spvmf->seqcount) {
// Something did change for the vma
return false;
}
return true;
}

This would also avoid the lockdep issue described below. But maybe what
i propose is stupid and i will see it after further reviewing thing.


Cheers,
Jérôme


> 
> Signed-off-by: Peter Zijlstra (Intel) 
> 
> [Port to 4.12 kernel]
> [Build depends on CONFIG_SPECULATIVE_PAGE_FAULT]
> [Introduce vm_write_* inline function depending on
>  CONFIG_SPECULATIVE_PAGE_FAULT]
> [Fix lock dependency between mapping->i_mmap_rwsem and vma->vm_sequence by
>  using vm_raw_write* functions]
> [Fix a lock dependency warning in mmap_region() when entering the error
>  path]
> [move sequence initialisation INIT_VMA()]
> [Review the patch description about unmap_page_range()]
> Signed-off-by: Laurent Dufour 
> ---
>  include/linux/mm.h   | 44 
>  include/linux/mm_types.h |  3 +++
>  mm/memory.c  |  2 ++
>  mm/mmap.c| 30 +++
>  4 files changed, 79 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 2ceb1d2869a6..906b9e06f18e 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1410,6 +1410,9 @@ struct zap_details {
>  static inline void INIT_VMA(struct vm_area_struct *vma)
>  {
>   INIT_LIST_HEAD(&vma->anon_vma_chain);
> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> + seqcount_init(&vma->vm_sequence);
> +#endif
>  }
>  
>  struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
> @@ -1534,6 +1537,47 @@ static inline void unmap_shared_mapping_range(struct 
> address_space *mapping,
>   unmap_mapping_range(mapping, holebegin, holelen, 0);
>  }
>  
> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> +static inline void vm_write_begin(struct vm_area_struct *vma)
> +{
> + write_seqcount_begin(&vma->vm_sequence);
> +}
> +static inline void vm_write_begin_nested(struct vm_area_struct *vma,
> +  int subclass)
> +{
> + write_seqcount_begin_nested(&vma->vm_sequence, subclass);
> +}
> +static inline void vm_write_end(struct vm_area_struct *vma)
> +{
> + write_seqcount_end(&vma->vm_sequence);
> +}
> +static inline void vm_raw_write_begin(struct vm_area_struct *vma)
> +{
> + raw_write_seqcount_begin(&vma->vm_sequence);
> +}
> +static inline void vm_raw_write_end(struct vm_area_struct *vma)
> +{
> + raw_write_seqcount_end(&vma->vm_sequence);
> +}
> +#else
> +static inline void vm_write_begin(struct vm_area_struct *vma)
> +{
> +}
> +static inline void vm_write_begin_nested(struct vm_area_struct *vma,
> +  int subclass)
> +{
> +}
> +static inline void vm_write_end(struct vm_area_struct *vma)
> +{
> +}
> +static inline void vm_raw_write_begin(struct vm_area_struct *vma)
> +{
> +}
> +static inline void vm_raw_write_end(struct vm_area_

Re: [PATCH 0/2] disable NUMA affinity reassignments at runtime

Michal Suchánek  writes:

> On Thu, 18 Apr 2019 13:56:56 -0500
> Nathan Lynch  wrote:
>
> Hello,
>
>> Changing cpu <-> node relationships at runtime, as the pseries
>> platform code attempts to do for LPM, PRRN, and VPHN is essentially
>> unsupported by core subsystems. [1]
>
> Wasn't there a patch that solves the discrepancy by removing and
> re-adding the updated CPUs?
>
> http://patchwork.ozlabs.org/patch/1051761/

In our testing it seems that changing the result of cpu_to_node() for a
given cpu id, even with an intervening offline/online, leads to crashes
and assertions in the slab allocator. There have been some ideas floated
to sidestep that but I think there are larger issues to consider.

Even if changing CPU node assignments were possible to do without
destabilizing the system it's not all that useful without updating
memory/LMB affinity as well. (VPHN is an exception.)

Furthermore I'm not aware of any effort to make the numa/affinity APIs
at the system call level account for the possibility that the cpu/mem
<-> node relationship could be dynamic. Nor is there any facility for
notifying applications of changes. Even if the kernel were to properly
support this internally, NUMA-aware applications are the ones that will
suffer unless appropriate APIs are provided to them.

To me it seems this all needs more careful consideration and work, and
much of it should happen outside of this particular arch/platform
context.

Re: [PATCH v12 08/31] mm: introduce INIT_VMA()

On Tue, Apr 16, 2019 at 03:44:59PM +0200, Laurent Dufour wrote:
> Some VMA struct fields need to be initialized once the VMA structure is
> allocated.
> Currently this only concerns anon_vma_chain field but some other will be
> added to support the speculative page fault.
> 
> Instead of spreading the initialization calls all over the code, let's
> introduce a dedicated inline function.
> 
> Signed-off-by: Laurent Dufour 
> ---
>  fs/exec.c  | 1 +
>  include/linux/mm.h | 5 +
>  kernel/fork.c  | 2 +-
>  mm/mmap.c  | 3 +++
>  mm/nommu.c | 1 +
>  5 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 2e0033348d8e..9762e060295c 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -266,6 +266,7 @@ static int __bprm_mm_init(struct linux_binprm *bprm)
>   vma->vm_start = vma->vm_end - PAGE_SIZE;
>   vma->vm_flags = VM_SOFTDIRTY | VM_STACK_FLAGS | 
> VM_STACK_INCOMPLETE_SETUP;
>   vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
> + INIT_VMA(vma);
>  
>   err = insert_vm_struct(mm, vma);
>   if (err)
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 4ba2f53f9d60..2ceb1d2869a6 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1407,6 +1407,11 @@ struct zap_details {
>   pgoff_t last_index; /* Highest page->index to unmap 
> */
>  };
>  
> +static inline void INIT_VMA(struct vm_area_struct *vma)

Can we leave capital names for macro ? Also i prefer vma_init_struct() (the
one thing i like in C++ is namespace and thus i like namespace_action() for
function name).

Also why not doing a coccinelle patch for this:

@@
struct vm_area_struct *vma;
@@
-INIT_LIST_HEAD(&vma->anon_vma_chain);
+vma_init_struct(vma);


Untested ...

> +{
> + INIT_LIST_HEAD(&vma->anon_vma_chain);
> +}
> +
>  struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
>pte_t pte, bool with_public_device);
>  #define vm_normal_page(vma, addr, pte) _vm_normal_page(vma, addr, pte, false)
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 915be4918a2b..f8dae021c2e5 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -341,7 +341,7 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct 
> *orig)
>  
>   if (new) {
>   *new = *orig;
> - INIT_LIST_HEAD(&new->anon_vma_chain);
> + INIT_VMA(new);
>   }
>   return new;
>  }
> diff --git a/mm/mmap.c b/mm/mmap.c
> index bd7b9f293b39..5ad3a3228d76 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1765,6 +1765,7 @@ unsigned long mmap_region(struct file *file, unsigned 
> long addr,
>   vma->vm_flags = vm_flags;
>   vma->vm_page_prot = vm_get_page_prot(vm_flags);
>   vma->vm_pgoff = pgoff;
> + INIT_VMA(vma);
>  
>   if (file) {
>   if (vm_flags & VM_DENYWRITE) {
> @@ -3037,6 +3038,7 @@ static int do_brk_flags(unsigned long addr, unsigned 
> long len, unsigned long fla
>   }
>  
>   vma_set_anonymous(vma);
> + INIT_VMA(vma);
>   vma->vm_start = addr;
>   vma->vm_end = addr + len;
>   vma->vm_pgoff = pgoff;
> @@ -3395,6 +3397,7 @@ static struct vm_area_struct *__install_special_mapping(
>   if (unlikely(vma == NULL))
>   return ERR_PTR(-ENOMEM);
>  
> + INIT_VMA(vma);
>   vma->vm_start = addr;
>   vma->vm_end = addr + len;
>  
> diff --git a/mm/nommu.c b/mm/nommu.c
> index 749276beb109..acf7ca72ca90 100644
> --- a/mm/nommu.c
> +++ b/mm/nommu.c
> @@ -1210,6 +1210,7 @@ unsigned long do_mmap(struct file *file,
>   region->vm_flags = vm_flags;
>   region->vm_pgoff = pgoff;
>  
> + INIT_VMA(vma);
>   vma->vm_flags = vm_flags;
>   vma->vm_pgoff = pgoff;
>  
> -- 
> 2.21.0
>

Re: [PATCH v12 07/31] mm: make pte_unmap_same compatible with SPF

On Tue, Apr 16, 2019 at 03:44:58PM +0200, Laurent Dufour wrote:
> pte_unmap_same() is making the assumption that the page table are still
> around because the mmap_sem is held.
> This is no more the case when running a speculative page fault and
> additional check must be made to ensure that the final page table are still
> there.
> 
> This is now done by calling pte_spinlock() to check for the VMA's
> consistency while locking for the page tables.
> 
> This is requiring passing a vm_fault structure to pte_unmap_same() which is
> containing all the needed parameters.
> 
> As pte_spinlock() may fail in the case of a speculative page fault, if the
> VMA has been touched in our back, pte_unmap_same() should now return 3
> cases :
>   1. pte are the same (0)
>   2. pte are different (VM_FAULT_PTNOTSAME)
>   3. a VMA's changes has been detected (VM_FAULT_RETRY)
> 
> The case 2 is handled by the introduction of a new VM_FAULT flag named
> VM_FAULT_PTNOTSAME which is then trapped in cow_user_page().
> If VM_FAULT_RETRY is returned, it is passed up to the callers to retry the
> page fault while holding the mmap_sem.
> 
> Acked-by: David Rientjes 
> Signed-off-by: Laurent Dufour 

Reviewed-by: Jérôme Glisse 


> ---
>  include/linux/mm_types.h |  6 +-
>  mm/memory.c  | 37 +++--
>  2 files changed, 32 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 8ec38b11b361..fd7d38ee2e33 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -652,6 +652,8 @@ typedef __bitwise unsigned int vm_fault_t;
>   * @VM_FAULT_NEEDDSYNC:  ->fault did not modify page tables and 
> needs
>   *   fsync() to complete (for synchronous page faults
>   *   in DAX)
> + * @VM_FAULT_PTNOTSAME   Page table entries have changed during a
> + *   speculative page fault handling.
>   * @VM_FAULT_HINDEX_MASK:mask HINDEX value
>   *
>   */
> @@ -669,6 +671,7 @@ enum vm_fault_reason {
>   VM_FAULT_FALLBACK   = (__force vm_fault_t)0x000800,
>   VM_FAULT_DONE_COW   = (__force vm_fault_t)0x001000,
>   VM_FAULT_NEEDDSYNC  = (__force vm_fault_t)0x002000,
> + VM_FAULT_PTNOTSAME  = (__force vm_fault_t)0x004000,
>   VM_FAULT_HINDEX_MASK= (__force vm_fault_t)0x0f,
>  };
>  
> @@ -693,7 +696,8 @@ enum vm_fault_reason {
>   { VM_FAULT_RETRY,   "RETRY" },  \
>   { VM_FAULT_FALLBACK,"FALLBACK" },   \
>   { VM_FAULT_DONE_COW,"DONE_COW" },   \
> - { VM_FAULT_NEEDDSYNC,   "NEEDDSYNC" }
> + { VM_FAULT_NEEDDSYNC,   "NEEDDSYNC" },  \
> + { VM_FAULT_PTNOTSAME,   "PTNOTSAME" }
>  
>  struct vm_special_mapping {
>   const char *name;   /* The name, e.g. "[vdso]". */
> diff --git a/mm/memory.c b/mm/memory.c
> index 221ccdf34991..d5bebca47d98 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2094,21 +2094,29 @@ static inline bool pte_map_lock(struct vm_fault *vmf)
>   * parts, do_swap_page must check under lock before unmapping the pte and
>   * proceeding (but do_wp_page is only called after already making such a 
> check;
>   * and do_anonymous_page can safely check later on).
> + *
> + * pte_unmap_same() returns:
> + *   0   if the PTE are the same
> + *   VM_FAULT_PTNOTSAME  if the PTE are different
> + *   VM_FAULT_RETRY  if the VMA has changed in our back during
> + *   a speculative page fault handling.
>   */
> -static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd,
> - pte_t *page_table, pte_t orig_pte)
> +static inline vm_fault_t pte_unmap_same(struct vm_fault *vmf)
>  {
> - int same = 1;
> + int ret = 0;
> +
>  #if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT)
>   if (sizeof(pte_t) > sizeof(unsigned long)) {
> - spinlock_t *ptl = pte_lockptr(mm, pmd);
> - spin_lock(ptl);
> - same = pte_same(*page_table, orig_pte);
> - spin_unlock(ptl);
> + if (pte_spinlock(vmf)) {
> + if (!pte_same(*vmf->pte, vmf->orig_pte))
> + ret = VM_FAULT_PTNOTSAME;
> + spin_unlock(vmf->ptl);
> + } else
> + ret = VM_FAULT_RETRY;
>   }
>  #endif
> - pte_unmap(page_table);
> - return same;
> + pte_unmap(vmf->pte);
> + return ret;
>  }
>  
>  static inline void cow_user_page(struct page *dst, struct page *src, 
> unsigned long va, struct vm_area_struct *vma)
> @@ -2714,8 +2722,17 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>   int exclusive = 0;
>   vm_fault_t ret = 0;
>  
> - if (!pte_unmap_same(vma->vm_mm, vmf->pmd, vmf->pte, vmf->orig_pte))
> + ret = pte_unmap_same(vmf);
> + if (ret) {
> + /*
> +

Re: [PATCH v12 06/31] mm: introduce pte_spinlock for FAULT_FLAG_SPECULATIVE

On Tue, Apr 16, 2019 at 03:44:57PM +0200, Laurent Dufour wrote:
> When handling page fault without holding the mmap_sem the fetch of the
> pte lock pointer and the locking will have to be done while ensuring
> that the VMA is not touched in our back.
> 
> So move the fetch and locking operations in a dedicated function.
> 
> Signed-off-by: Laurent Dufour 

Reviewed-by: Jérôme Glisse 


> ---
>  mm/memory.c | 15 +++
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index fc3698d13cb5..221ccdf34991 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2073,6 +2073,13 @@ int apply_to_page_range(struct mm_struct *mm, unsigned 
> long addr,
>  }
>  EXPORT_SYMBOL_GPL(apply_to_page_range);
>  
> +static inline bool pte_spinlock(struct vm_fault *vmf)
> +{
> + vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
> + spin_lock(vmf->ptl);
> + return true;
> +}
> +
>  static inline bool pte_map_lock(struct vm_fault *vmf)
>  {
>   vmf->pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd,
> @@ -3656,8 +3663,8 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
>* validation through pte_unmap_same(). It's of NUMA type but
>* the pfn may be screwed if the read is non atomic.
>*/
> - vmf->ptl = pte_lockptr(vma->vm_mm, vmf->pmd);
> - spin_lock(vmf->ptl);
> + if (!pte_spinlock(vmf))
> + return VM_FAULT_RETRY;
>   if (unlikely(!pte_same(*vmf->pte, vmf->orig_pte))) {
>   pte_unmap_unlock(vmf->pte, vmf->ptl);
>   goto out;
> @@ -3850,8 +3857,8 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
>   if (pte_protnone(vmf->orig_pte) && vma_is_accessible(vmf->vma))
>   return do_numa_page(vmf);
>  
> - vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
> - spin_lock(vmf->ptl);
> + if (!pte_spinlock(vmf))
> + return VM_FAULT_RETRY;
>   entry = vmf->orig_pte;
>   if (unlikely(!pte_same(*vmf->pte, entry)))
>   goto unlock;
> -- 
> 2.21.0
>

Re: [PATCH v12 05/31] mm: prepare for FAULT_FLAG_SPECULATIVE

On Tue, Apr 16, 2019 at 03:44:56PM +0200, Laurent Dufour wrote:
> From: Peter Zijlstra 
> 
> When speculating faults (without holding mmap_sem) we need to validate
> that the vma against which we loaded pages is still valid when we're
> ready to install the new PTE.
> 
> Therefore, replace the pte_offset_map_lock() calls that (re)take the
> PTL with pte_map_lock() which can fail in case we find the VMA changed
> since we started the fault.
> 
> Signed-off-by: Peter Zijlstra (Intel) 
> 
> [Port to 4.12 kernel]
> [Remove the comment about the fault_env structure which has been
>  implemented as the vm_fault structure in the kernel]
> [move pte_map_lock()'s definition upper in the file]
> [move the define of FAULT_FLAG_SPECULATIVE later in the series]
> [review error path in do_swap_page(), do_anonymous_page() and
>  wp_page_copy()]
> Signed-off-by: Laurent Dufour 

Reviewed-by: Jérôme Glisse 

> ---
>  mm/memory.c | 87 +++--
>  1 file changed, 58 insertions(+), 29 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index c6ddadd9d2b7..fc3698d13cb5 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2073,6 +2073,13 @@ int apply_to_page_range(struct mm_struct *mm, unsigned 
> long addr,
>  }
>  EXPORT_SYMBOL_GPL(apply_to_page_range);
>  
> +static inline bool pte_map_lock(struct vm_fault *vmf)

I am not fan of the name maybe pte_offset_map_lock_if_valid() ? But
that just a taste thing. So feel free to ignore this comment.


> +{
> + vmf->pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd,
> +vmf->address, &vmf->ptl);
> + return true;
> +}
> +
>  /*
>   * handle_pte_fault chooses page fault handler according to an entry which 
> was
>   * read non-atomically.  Before making any commitment, on those architectures

Re: [PATCH RFT V3 8/8] clk: core: replace clk_{readl, writel} with {readl, writel}

Quoting Jonas Gorski (2019-04-18 04:12:11)
> Now that clk_{readl,writel} is just an alias for {readl,writel}, we can
> switch all users of clk_* to use the accessors directly and remove the
> helpers.
> 
> Signed-off-by: Jonas Gorski 
> ---

Applied to clk-next

Re: [PATCH RFT V3 7/8] clk: core: remove powerpc special handling

Quoting Jonas Gorski (2019-04-18 04:12:10)
> Now that the powerpc clocks are properly marked as big endian, we can
> remove the special handling for PowerPC.
> 
> Signed-off-by: Jonas Gorski 
> ---

Applied to clk-next

Re: [PATCH RFT V3 6/8] powerpc/512x: mark clocks as big endian

Quoting Jonas Gorski (2019-04-18 04:12:09)
> These clocks' registers are accessed as big endian, so mark them as
> such.
> 
> Signed-off-by: Jonas Gorski 
> ---

Applied to clk-next

Re: [PATCH RFT V3 5/8] clk: mux: add explicit big endian support

Quoting Jonas Gorski (2019-04-18 04:12:08)
> Add a clock specific flag to switch register accesses to big endian, to
> allow runtime configuration of big endian mux clocks.
> 
> Signed-off-by: Jonas Gorski 
> ---

Applied to clk-next

Re: [PATCH RFT V3 4/8] clk: multiplier: add explicit big endian support

Quoting Jonas Gorski (2019-04-18 04:12:07)
> Add a clock specific flag to switch register accesses to big endian, to
> allow runtime configuration of big endian multiplier clocks.
> 
> Signed-off-by: Jonas Gorski 
> ---

Applied to clk-next

Re: [PATCH RFT V3 3/8] clk: gate: add explicit big endian support

Quoting Jonas Gorski (2019-04-18 04:12:06)
> Add a clock specific flag to switch register accesses to big endian, to
> allow runtime configuration of big endian gated clocks.
> 
> Signed-off-by: Jonas Gorski 
> ---

Applied to clk-next

Re: [PATCH v12 04/31] arm64/mm: define ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT

On Tue, Apr 16, 2019 at 03:41:56PM +0100, Mark Rutland wrote:
> On Tue, Apr 16, 2019 at 04:31:27PM +0200, Laurent Dufour wrote:
> > Le 16/04/2019 à 16:27, Mark Rutland a écrit :
> > > On Tue, Apr 16, 2019 at 03:44:55PM +0200, Laurent Dufour wrote:
> > > > From: Mahendran Ganesh 
> > > > 
> > > > Set ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT for arm64. This
> > > > enables Speculative Page Fault handler.
> > > > 
> > > > Signed-off-by: Ganesh Mahendran 
> > > 
> > > This is missing your S-o-B.
> > 
> > You're right, I missed that...
> > 
> > > The first patch noted that the ARCH_SUPPORTS_* option was there because
> > > the arch code had to make an explicit call to try to handle the fault
> > > speculatively, but that isn't addeed until patch 30.
> > > 
> > > Why is this separate from that code?
> > 
> > Andrew was recommended this a long time ago for bisection purpose. This
> > allows to build the code with CONFIG_SPECULATIVE_PAGE_FAULT before the code
> > that trigger the spf handler is added to the per architecture's code.
> 
> Ok. I think it would be worth noting that in the commit message, to
> avoid anyone else asking the same question. :)

Should have read this thread before looking at x86 and ppc :)

In any case the patch is:

Reviewed-by: Jérôme Glisse

Re: [PATCH RFT V3 2/8] clk: fractional-divider: add explicit big endian support

Quoting Jonas Gorski (2019-04-18 04:12:05)
> Add a clock specific flag to switch register accesses to big endian, to
> allow runtime configuration of big endian fractional divider clocks.
> 
> Signed-off-by: Jonas Gorski 
> ---

Applied to clk-next

Re: [PATCH RFT V3 1/8] clk: divider: add explicit big endian support

Quoting Jonas Gorski (2019-04-18 04:12:04)
> Add a clock specific flag to switch register accesses to big endian, to
> allow runtime configuration of big endian divider clocks.
> 
> Signed-off-by: Jonas Gorski 
> ---

Applied to clk-next

Re: [PATCH v12 03/31] powerpc/mm: set ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT

On Tue, Apr 16, 2019 at 03:44:54PM +0200, Laurent Dufour wrote:
> Set ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT for BOOK3S_64. This enables
> the Speculative Page Fault handler.
> 
> Support is only provide for BOOK3S_64 currently because:
> - require CONFIG_PPC_STD_MMU because checks done in
>   set_access_flags_filter()
> - require BOOK3S because we can't support for book3e_hugetlb_preload()
>   called by update_mmu_cache()
> 
> Cc: Michael Ellerman 
> Signed-off-by: Laurent Dufour 

Same comment as for x86.

Reviewed-by: Jérôme Glisse 

> ---
>  arch/powerpc/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 2d0be82c3061..a29887ea5383 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -238,6 +238,7 @@ config PPC
>   select PCI_SYSCALL  if PCI
>   select RTC_LIB
>   select SPARSE_IRQ
> + select ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT if PPC_BOOK3S_64
>   select SYSCTL_EXCEPTION_TRACE
>   select THREAD_INFO_IN_TASK
>   select VIRT_TO_BUS  if !PPC64
> -- 
> 2.21.0
>

Re: [PATCH v12 02/31] x86/mm: define ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT

On Tue, Apr 16, 2019 at 03:44:53PM +0200, Laurent Dufour wrote:
> Set ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT which turns on the
> Speculative Page Fault handler when building for 64bit.
> 
> Cc: Thomas Gleixner 
> Signed-off-by: Laurent Dufour 

I think this patch should be move as last patch in the serie so that
the feature is not enabled mid-way without all the pieces ready if
someone bisect. But i have not review everything yet so maybe it is
fine.

Reviewed-by: Jérôme Glisse 

> ---
>  arch/x86/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 0f2ab09da060..8bd575184d0b 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -30,6 +30,7 @@ config X86_64
>   select SWIOTLB
>   select X86_DEV_DMA_OPS
>   select ARCH_HAS_SYSCALL_WRAPPER
> + select ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT
>  
>  #
>  # Arch settings
> -- 
> 2.21.0
>

Re: [PATCH v12 01/31] mm: introduce CONFIG_SPECULATIVE_PAGE_FAULT

On Tue, Apr 16, 2019 at 03:44:52PM +0200, Laurent Dufour wrote:
> This configuration variable will be used to build the code needed to
> handle speculative page fault.
> 
> By default it is turned off, and activated depending on architecture
> support, ARCH_HAS_PTE_SPECIAL, SMP and MMU.
> 
> The architecture support is needed since the speculative page fault handler
> is called from the architecture's page faulting code, and some code has to
> be added there to handle the speculative handler.
> 
> The dependency on ARCH_HAS_PTE_SPECIAL is required because vm_normal_page()
> does processing that is not compatible with the speculative handling in the
> case ARCH_HAS_PTE_SPECIAL is not set.
> 
> Suggested-by: Thomas Gleixner 
> Suggested-by: David Rientjes 
> Signed-off-by: Laurent Dufour 

Reviewed-by: Jérôme Glisse 

Small question below

> ---
>  mm/Kconfig | 22 ++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 0eada3f818fa..ff278ac9978a 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -761,4 +761,26 @@ config GUP_BENCHMARK
>  config ARCH_HAS_PTE_SPECIAL
>   bool
>  
> +config ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT
> +   def_bool n
> +
> +config SPECULATIVE_PAGE_FAULT
> + bool "Speculative page faults"
> + default y
> + depends on ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT
> + depends on ARCH_HAS_PTE_SPECIAL && MMU && SMP
> + help
> +   Try to handle user space page faults without holding the mmap_sem.
> +
> +   This should allow better concurrency for massively threaded processes

Is there any case where it does not provide better concurrency ? The
should make me wonder :)

> +   since the page fault handler will not wait for other thread's memory
> +   layout change to be done, assuming that this change is done in
> +   another part of the process's memory space. This type of page fault
> +   is named speculative page fault.
> +
> +   If the speculative page fault fails because a concurrent modification
> +   is detected or because underlying PMD or PTE tables are not yet
> +   allocated, the speculative page fault fails and a classic page fault
> +   is then tried.
> +
>  endmenu
> -- 
> 2.21.0
>

Re: [PATCH 0/2] disable NUMA affinity reassignments at runtime

2019-04-18 Thread Michal Suchánek

On Thu, 18 Apr 2019 13:56:56 -0500
Nathan Lynch  wrote:

Hello,

> Changing cpu <-> node relationships at runtime, as the pseries
> platform code attempts to do for LPM, PRRN, and VPHN is essentially
> unsupported by core subsystems. [1]

Wasn't there a patch that solves the discrepancy by removing and
re-adding the updated CPUs?

http://patchwork.ozlabs.org/patch/1051761/

Thanks

Michal

Re: [PATCH V2] ASoC: fsl_esai: Add pm runtime function

On Thu, Apr 18, 2019 at 11:34:15AM +, S.j. Wang wrote:
> Add pm runtime support and move clock handling there.
> fsl_esai_suspend is replaced by pm_runtime_force_suspend.
> fsl_esai_resume is replaced by pm_runtime_force_resume.

The change looks fine, yet I'd prefer to have a good justification
in the commit message. It could be simply some benefit of doing so
like what you have discussed with Mark.

> Signed-off-by: Shengjiu Wang 

Please add it in next version upon commit message update:

Acked-by: Nicolin Chen 

Thanks

Re: [PATCH V3 1/2] ASoC: fsl_asrc: replace the process_option table with function

On Thu, Apr 18, 2019 at 09:37:06AM +, S.j. Wang wrote:
> > > > And this is according to IMX6DQRM:
> > > > Limited support for the case when output sampling rates is
> > > > between 8kHz and 30kHz. The limitation is the supported ratio
> > > > (Fsin/Fsout) range as between 1/24 to 8
> > > >
> > > > This should cover your 8.125 condition already, even if having an
> > > > outrate range between [8KHz, 30KHz] check, since an outrate above
> > > > 30KHz will not have an inrate bigger than 8.125 times of it, given
> > > > the maximum input rate is 192KHz.
> > > >
> > > > So I think that we can just drop that 8.125 condition from your
> > > > change and there's no need to error out any more.
> > > >
> > > No, if outrate=8kHz,  inrate > 88.2kHz, these cases are not supported.
> > > This is not covered by
> > >
> > > if ((outrate > 8000 && outrate < 3) &&
> > > (outrate/inrate > 24 || inrate/outrate > 8)) {
> > 
> > Good catch. The range should be [8KHz, 30KHz] vs. (8KHz, 32KHz) in the
> > code. Then I think the fix should be at both lines:
> > 
> > - if ((outrate > 8000 && outrate < 3) &&
> > - (outrate/inrate > 24 || inrate/outrate > 8)) {
> > + if ((outrate >= 8000 && outrate =< 3) &&
> > + (outrate > 24 * inrate || inrate > 8 * outrate)) {
> > 
> > Overall, I think we should fix this instead of adding an extra one, since 
> > it is
> > very likely saying the same thing.
> 
> Actually if outrate < 8kHz, there will be issue too.

Here is the thing, the RM doesn't explicitly state that ASRC can
support a lower output sample rate than 8KHz. And I actually had
a concern when reviewing your PATCH-2, as the table of supported
output sample rate no longer matches RM.

If you've verified a lower output sample rate working solid with
the process_option function, that means our driver can go beyond
the limitation mentioned in the RM, then I believe [8KHz, 32KHz]
should be updated too -- that says we can do:
-   if ((outrate > 8000 && outrate < 3) &&
-   (outrate/inrate > 24 || inrate/outrate > 8)) {
+   if ((outrate >= 5512 && outrate =< 3) &&
+   (outrate > 24 * inrate || inrate > 8 * outrate)) {

Actually "ourate > 24 * inrate" is kind of pointless for range
[5KHz, 32KHz] but we can keep it since it matches RM.

[PATCH 2/2] powerpc/numa: document topology_updates_enabled, disable by default

Changing the NUMA associations for CPUs and memory at runtime is
basically unsupported by the core mm, scheduler etc. We see all manner
of crashes, warnings and instability when the pseries code tries to do
this. Disable this behavior by default, and document the switch a bit.

Signed-off-by: Nathan Lynch 
---
 arch/powerpc/mm/numa.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 48c9a97eb2c3..71af382ce1d5 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -908,16 +908,22 @@ static int __init early_numa(char *p)
 }
 early_param("numa", early_numa);
 
-static bool topology_updates_enabled = true;
+/*
+ * The platform can inform us through one of several mechanisms
+ * (post-migration device tree updates, PRRN or VPHN) that the NUMA
+ * assignment of a resource has changed. This controls whether we act
+ * on that. Disabled by default.
+ */
+static bool topology_updates_enabled;
 
 static int __init early_topology_updates(char *p)
 {
if (!p)
return 0;
 
-   if (!strcmp(p, "off")) {
-   pr_info("Disabling topology updates\n");
-   topology_updates_enabled = false;
+   if (!strcmp(p, "on")) {
+   pr_warn("Caution: enabling topology updates\n");
+   topology_updates_enabled = true;
}
 
return 0;
-- 
2.20.1

[PATCH 0/2] disable NUMA affinity reassignments at runtime

Changing cpu <-> node relationships at runtime, as the pseries
platform code attempts to do for LPM, PRRN, and VPHN is essentially
unsupported by core subsystems. [1]

While more significant changes (i.e. discarding all that code) likely
are in store, these patches are a minimally invasive way to disable
the problem behavior in a way that should be suitable for backporting
to -stable and distros, and is an improvement on the current
situation.

Note: this doesn't affect use of VPHN at boot time for detecting
shared processor node assignments. Only runtime VPHN-initiated
reassignments are disabled.

[1] E.g. see the discussion here:
https://lore.kernel.org/lkml/20180831115350.gc8...@linux.vnet.ibm.com/T/#u
 
Nathan Lynch (2):
  powerpc/numa: improve control of topology updates
  powerpc/numa: document topology_updates_enabled, disable by default

 arch/powerpc/mm/numa.c | 32 ++--
 1 file changed, 22 insertions(+), 10 deletions(-)

-- 
2.20.1

[PATCH 1/2] powerpc/numa: improve control of topology updates

When booted with "topology_updates=no", or when "off" is written to
/proc/powerpc/topology_updates, NUMA reassignments are inhibited for
PRRN and VPHN events. However, migration and suspend unconditionally
re-enable reassignments via start_topology_update(). This is
incoherent.

Check the topology_updates_enabled flag in
start/stop_topology_update() so that callers of those APIs need not be
aware of whether reassignments are enabled. This allows the
administrative decision on reassignments to remain in force across
migrations and suspensions.

Signed-off-by: Nathan Lynch 
---
 arch/powerpc/mm/numa.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index f976676004ad..48c9a97eb2c3 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1498,6 +1498,9 @@ int start_topology_update(void)
 {
int rc = 0;
 
+   if (!topology_updates_enabled)
+   return 0;
+
if (firmware_has_feature(FW_FEATURE_PRRN)) {
if (!prrn_enabled) {
prrn_enabled = 1;
@@ -1531,6 +1534,9 @@ int stop_topology_update(void)
 {
int rc = 0;
 
+   if (!topology_updates_enabled)
+   return 0;
+
if (prrn_enabled) {
prrn_enabled = 0;
 #ifdef CONFIG_SMP
@@ -1588,11 +1594,13 @@ static ssize_t topology_write(struct file *file, const 
char __user *buf,
 
kbuf[read_len] = '\0';
 
-   if (!strncmp(kbuf, "on", 2))
+   if (!strncmp(kbuf, "on", 2)) {
+   topology_updates_enabled = true;
start_topology_update();
-   else if (!strncmp(kbuf, "off", 3))
+   } else if (!strncmp(kbuf, "off", 3)) {
stop_topology_update();
-   else
+   topology_updates_enabled = false;
+   } else
return -EINVAL;
 
return count;
@@ -1607,9 +1615,7 @@ static const struct file_operations topology_ops = {
 
 static int topology_update_init(void)
 {
-   /* Do not poll for changes if disabled at boot */
-   if (topology_updates_enabled)
-   start_topology_update();
+   start_topology_update();
 
if (vphn_enabled)
topology_schedule_update();
-- 
2.20.1

Re: Linux 5.1-rc5

2019-04-18 Thread Martin Schwidefsky

On Thu, 18 Apr 2019 08:49:32 -0700
Linus Torvalds  wrote:

> On Thu, Apr 18, 2019 at 1:02 AM Martin Schwidefsky
>  wrote:
> >
> > The problematic lines in the generic gup code are these three:
> >
> > 1845:   pmdp = pmd_offset(&pud, addr);
> > 1888:   pudp = pud_offset(&p4d, addr);
> > 1916:   p4dp = p4d_offset(&pgd, addr);
> >
> > Passing the pointer of a *copy* of a page table entry to pxd_offset() does
> > not work with the page table folding on s390.  
> 
> Hmm. I wonder why. x86 too does the folding thing for the p4d and pud case.
> 
> The folding works with the local copy just the same way it works with
> the orignal value.

The difference is that with the static page table folding pgd_offset()
does the index calculation of the actual hardware top-level table. With
dynamic page table folding as s390 is doing it, if the task does not use
a 5-level page table pgd_offset() will see a pgd_index() of 0, the indexing
of the actual top-level table is done later with p4d_offset(), pud_offset()
or pmd_offset(). 

As an example, with a three level page table we have three indexes x/y/z.
The common code "thinks" 5 indexing steps, with static folding the index
sequence is x 0 0 y z. With dynamic folding the sequence is 0 0 x y z.
By moving the first indexing operation to pgd_offset the static sequence
does not add an index to a non-dereferenced pointer to a stack variable,
the dynamic sequence does.

> But I see that s390 does some other kind of folding and does that
> addition of the p*d_index() unconditionally.
> 
> I guess that does mean that s390 will just have to have its own walker.
> 
> For the issue of the page refcount overflow it really isn't a huge
> deal. Adding the refcount checking is simple (see the example patch I
> gave for powerpc - you'll just have a couple of extra cases since you
> do it all, rather than just the special hugetlb cases).
> 
> Obviously in general it would have been nicer to share as much code as
> possible, but let's not make things unnecessarily complex if s390 is
> just fundamentally different..

It would have been nice to use the generic code (less bugs) but not at
the price of over-complicating things. And that page table folding thing
always makes my head hurt.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

Re: Linux 5.1-rc5

2019-04-18 Thread Linus Torvalds

On Thu, Apr 18, 2019 at 1:02 AM Martin Schwidefsky
 wrote:
>
> The problematic lines in the generic gup code are these three:
>
> 1845:   pmdp = pmd_offset(&pud, addr);
> 1888:   pudp = pud_offset(&p4d, addr);
> 1916:   p4dp = p4d_offset(&pgd, addr);
>
> Passing the pointer of a *copy* of a page table entry to pxd_offset() does
> not work with the page table folding on s390.

Hmm. I wonder why. x86 too does the folding thing for the p4d and pud case.

The folding works with the local copy just the same way it works with
the orignal value.

But I see that s390 does some other kind of folding and does that
addition of the p*d_index() unconditionally.

I guess that does mean that s390 will just have to have its own walker.

For the issue of the page refcount overflow it really isn't a huge
deal. Adding the refcount checking is simple (see the example patch I
gave for powerpc - you'll just have a couple of extra cases since you
do it all, rather than just the special hugetlb cases).

Obviously in general it would have been nicer to share as much code as
possible, but let's not make things unnecessarily complex if s390 is
just fundamentally different..

   Linus

Re: [PATCH] crypto: powerpc - convert to use crypto_simd_usable()

2019-04-18 Thread Herbert Xu

On Fri, Apr 12, 2019 at 10:33:12PM -0700, Eric Biggers wrote:
> From: Eric Biggers 
> 
> Replace all calls to in_interrupt() in the PowerPC crypto code with
> !crypto_simd_usable().  This causes the crypto self-tests to test the
> no-SIMD code paths when CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y.
> 
> The p8_ghash algorithm is currently failing and needs to be fixed, as it
> produces the wrong digest when no-SIMD updates are mixed with SIMD ones.
> 
> Signed-off-by: Eric Biggers 
> ---
>  arch/powerpc/crypto/crc32c-vpmsum_glue.c| 4 +++-
>  arch/powerpc/crypto/crct10dif-vpmsum_glue.c | 4 +++-
>  arch/powerpc/include/asm/Kbuild | 1 +
>  drivers/crypto/vmx/aes.c| 7 ---
>  drivers/crypto/vmx/aes_cbc.c| 7 ---
>  drivers/crypto/vmx/aes_ctr.c| 5 +++--
>  drivers/crypto/vmx/aes_xts.c| 5 +++--
>  drivers/crypto/vmx/ghash.c  | 9 -
>  8 files changed, 25 insertions(+), 17 deletions(-)

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [PATCH] Linux: Define struct termios2 in under _GNU_SOURCE [BZ #10339]

2019-04-18 Thread hpa

On April 18, 2019 4:09:07 AM PDT, Adhemerval Zanella 
 wrote:
>
>
>On 17/04/2019 19:04, H. Peter Anvin wrote:
>> On 4/15/19 10:22 AM, Adhemerval Zanella wrote:

 New interfaces are only necessary for the handful of architectures
>that don't have the speed fields *and* to space to put them in. 
>>>
>>> Based on your WIP, it seems that both sparc and mips could be
>adapted.
>>> Do we still have glibc supported architecture that would require
>compat
>>> symbols?
>>>

 Using symbol versioning doesn't really help much since the real
>problem is that struct termios can be passed around in userspace, and
>the interfaces between user space libraries don't have any versioning.
>However, my POC code deals with that too by only seeing BOTHER when
>necessary, so if the structure is extended garbage in the extra fields
>will be ignored unless new baud rates are in use.
>>>
>>> Yeah, we discussed this earlier and if recall correctly it was not
>settled
>>> that all architectures would allow the use to extra space for the
>new
>>> fields. It seems the case, which makes the adaptation for termios2
>even easier.
>>>
>>> The question I have for kernel side is whether termios2 is fully
>compatible
>>> with termios, meaning that if there is conner cases we need to
>handle in
>>> userland.
>>>
>> 
>> It depends on what you mean with "fully compatible."
>> 
>> Functionality-wise, the termios2 interfaces are a strict superset.
>There
>> is not, however, any guarantee that struct kernel_termios2 *contains*
>a
>> contiguous binary equivalent to struct kernel_termios (in fact, on
>most
>> architectures, it doesn't.)
>
>I mean that we can fully implement termios1 using termios2 by adjusting
>the termios struct in syscall wrappers.  If it is a superset I think it
>is fine.
>
>> 

 My POC code deals with Alpha as well by falling back to the old
>interfaces if necessary and possible, otherwise return error.

 Exporting termios2 to user space feels a bit odd at this stage as
>it would only be usable as a fallback on old glibc. Call it
>kernel_termios2 at least. ioctls using struct termios *must* be changed
>to kernel_termios anyway!

>>>
>>> I still prefer to avoid export it to userland and make it usable
>through
>>> default termios, as your wip does.  My understanding is new
>interfaces 
>>> should be semantic equal to current one with the only deviation that
>>> non-standard baudrates will handled as its values.  The only issue I
>
>>> can foresee is if POSIX starts to export new bauds value.
>>>
>> 
>> ... which will be easy to handle: just define a B constant with
>the
>> value equal to the baud rate.
>> 
>>  -hhpa
>
>Right.

termio, termios1 and termios2 are kernel ioctl interfaces ... there are no 
wrappers; it is an ioctl.

The glibc termios is different from all of these, and already is a wrapper 
around the kernel-provided ioctls.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: [PATCH] drivers: cpuidle: This patch fix the following checkpatch warning.

2019-04-18 Thread Rafael J. Wysocki

On Wednesday, April 17, 2019 4:52:34 PM CEST Mohan Kumar wrote:
> Use pr_debug instead of printk
> 
> WARNING: Prefer [subsystem eg: netdev]_dbg([subsystem]dev, ... then
> dev_dbg(dev, ... then pr_debug(...  to printk(KERN_DEBUG ...
> 
> Signed-off-by: Mohan Kumar 
> ---
>  drivers/cpuidle/cpuidle-powernv.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cpuidle/cpuidle-powernv.c 
> b/drivers/cpuidle/cpuidle-powernv.c
> index 84b1ebe..7cf9835 100644
> --- a/drivers/cpuidle/cpuidle-powernv.c
> +++ b/drivers/cpuidle/cpuidle-powernv.c
> @@ -401,7 +401,7 @@ static int __init powernv_processor_idle_init(void)
>   powernv_cpuidle_driver_init();
>   retval = cpuidle_register(&powernv_idle_driver, NULL);
>   if (retval) {
> - printk(KERN_DEBUG "Registration of powernv driver failed.\n");
> + pr_debug("Registration of powernv driver failed.\n");
>   return retval;
>   }
>  
> @@ -413,7 +413,7 @@ static int __init powernv_processor_idle_init(void)
>  "cpuidle/powernv:dead", NULL,
>  powernv_cpuidle_cpu_dead);
>   WARN_ON(retval < 0);
> - printk(KERN_DEBUG "powernv_idle_driver registered\n");
> + pr_debug("powernv_idle_driver registered\n");
>   return 0;
>  }
>  
> 

Recently, you've sent two different patches against two different drivers with 
the same subject.

IMO it is fair enough to call that "confusing".

Moreover, pr_debug() is not a direct replacement for printk(KERN_DEBUG ) as the 
latter does
not require dynamic debug to be enabled and I'm not really sure if you are 
aware of that
difference.

[PATCH v6 0/1] iommu: enhance IOMMU dma mode build options

2019-04-18 Thread Zhen Lei

v5 --> v6:
1. give up adding boot option iommu.dma_mode

v4 --> v5:
As Hanjun and Thomas Gleixner's suggestion:
1. Keep the old ARCH specific boot options no change.
2. Keep build option CONFIG_IOMMU_DEFAULT_PASSTHROUGH no change.

v4:
As Robin Murphy's suggestion:
"It's also not necessarily obvious to the user how this interacts with
IOMMU_DEFAULT_PASSTHROUGH, so if we really do go down this route, maybe it
would be better to refactor the whole lot into a single selection of something
like IOMMU_DEFAULT_MODE anyway."

In this version, I tried to normalize the IOMMU dma mode boot options for all
ARCHs. When IOMMU is enabled, there are 3 dma modes: paasthrough(bypass),
lazy(mapping but defer the IOTLB invalidation), strict. But currently each
ARCHs defined their private boot options, different with each other. For
example, to enable/disable "passthrough", ARM64 use iommu.passthrough=1/0,
X86 use iommu=pt/nopt, PPC/POWERNV use iommu=nobypass.

Zhen Lei (1):
  iommu: enhance IOMMU dma mode build options

 arch/ia64/kernel/pci-dma.c|  2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c |  3 ++-
 arch/s390/pci/pci_dma.c   |  2 +-
 arch/x86/kernel/pci-dma.c |  7 ++---
 drivers/iommu/Kconfig | 44 ++-
 drivers/iommu/amd_iommu_init.c|  3 ++-
 drivers/iommu/intel-iommu.c   |  2 +-
 drivers/iommu/iommu.c |  3 ++-
 8 files changed, 48 insertions(+), 18 deletions(-)

-- 
1.8.3

[PATCH v6 1/1] iommu: enhance IOMMU dma mode build options

2019-04-18 Thread Zhen Lei

First, add build option IOMMU_DEFAULT_{LAZY|STRICT}, so that we have the
opportunity to set {lazy|strict} mode as default at build time. Then put
the three config options in an choice, make people can only choose one of
the three at a time.

The default IOMMU dma modes on each ARCHs have no change.

Signed-off-by: Zhen Lei 
---
 arch/ia64/kernel/pci-dma.c|  2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c |  3 ++-
 arch/s390/pci/pci_dma.c   |  2 +-
 arch/x86/kernel/pci-dma.c |  7 ++---
 drivers/iommu/Kconfig | 44 ++-
 drivers/iommu/amd_iommu_init.c|  3 ++-
 drivers/iommu/intel-iommu.c   |  2 +-
 drivers/iommu/iommu.c |  3 ++-
 8 files changed, 48 insertions(+), 18 deletions(-)

diff --git a/arch/ia64/kernel/pci-dma.c b/arch/ia64/kernel/pci-dma.c
index fe988c49f01ce6a..655511dbf3c3b34 100644
--- a/arch/ia64/kernel/pci-dma.c
+++ b/arch/ia64/kernel/pci-dma.c
@@ -22,7 +22,7 @@
 int force_iommu __read_mostly;
 #endif

-int iommu_pass_through;
+int iommu_pass_through = IS_ENABLED(CONFIG_IOMMU_DEFAULT_PASSTHROUGH);

 static int __init pci_iommu_init(void)
 {
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 3ead4c237ed0ec9..383e082a9bb985c 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -85,7 +85,8 @@ void pe_level_printk(const struct pnv_ioda_pe *pe, const char 
*level,
va_end(args);
 }

-static bool pnv_iommu_bypass_disabled __read_mostly;
+static bool pnv_iommu_bypass_disabled __read_mostly =
+   !IS_ENABLED(CONFIG_IOMMU_DEFAULT_PASSTHROUGH);
 static bool pci_reset_phbs __read_mostly;

 static int __init iommu_setup(char *str)
diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
index 9e52d1527f71495..784ad1e0acecfb1 100644
--- a/arch/s390/pci/pci_dma.c
+++ b/arch/s390/pci/pci_dma.c
@@ -17,7 +17,7 @@

 static struct kmem_cache *dma_region_table_cache;
 static struct kmem_cache *dma_page_table_cache;
-static int s390_iommu_strict;
+static int s390_iommu_strict = IS_ENABLED(CONFIG_IOMMU_DEFAULT_STRICT);

 static int zpci_refresh_global(struct zpci_dev *zdev)
 {
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index d460998ae828514..fb2bab42a0a3173 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -43,11 +43,8 @@
  * It is also possible to disable by default in kernel config, and enable with
  * iommu=nopt at boot time.
  */
-#ifdef CONFIG_IOMMU_DEFAULT_PASSTHROUGH
-int iommu_pass_through __read_mostly = 1;
-#else
-int iommu_pass_through __read_mostly;
-#endif
+int iommu_pass_through __read_mostly =
+   IS_ENABLED(CONFIG_IOMMU_DEFAULT_PASSTHROUGH);

 extern struct iommu_table_entry __iommu_table[], __iommu_table_end[];

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 6f07f3b21816c64..8a1f1793cde76b4 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -74,17 +74,47 @@ config IOMMU_DEBUGFS
  debug/iommu directory, and then populate a subdirectory with
  entries as required.

-config IOMMU_DEFAULT_PASSTHROUGH
-   bool "IOMMU passthrough by default"
+choice
+   prompt "IOMMU dma mode"
depends on IOMMU_API
-help
- Enable passthrough by default, removing the need to pass in
- iommu.passthrough=on or iommu=pt through command line. If this
- is enabled, you can still disable with iommu.passthrough=off
- or iommu=nopt depending on the architecture.
+   default IOMMU_DEFAULT_PASSTHROUGH if (PPC_POWERNV && PCI)
+   default IOMMU_DEFAULT_LAZY if (AMD_IOMMU || INTEL_IOMMU || S390_IOMMU)
+   default IOMMU_DEFAULT_STRICT
+   help
+ This option allows IOMMU dma mode to be chose at build time, to
+ override the default dma mode of each ARCHs, removing the need to
+ pass in kernel parameters through command line. You can still use
+ ARCHs specific boot options to override this option again.
+
+config IOMMU_DEFAULT_PASSTHROUGH
+   bool "passthrough"
+   help
+ In this mode, the dma access through IOMMU without any addresses
+ transformation. That means, the wrong or illegal dma access can not
+ be caught, no error information will be reported.

  If unsure, say N here.

+config IOMMU_DEFAULT_LAZY
+   bool "lazy"
+   help
+ Support lazy mode, where for every IOMMU DMA unmap operation, the
+ flush operation of IOTLB and the free operation of IOVA are deferred.
+ They are only guaranteed to be done before the related IOVA will be
+ reused.
+
+config IOMMU_DEFAULT_STRICT
+   bool "strict"
+   help
+ For every IOMMU DMA unmap operation, the flush operation of IOTLB and
+ the free operation of IOVA are guaranteed to be done in the unmap
+

Re: [PATCH v3 21/21] docs: hwmon: Add an index file and rename docs to *.rst

2019-04-18 Thread Guenter Roeck

On 4/18/19 2:44 AM, Mauro Carvalho Chehab wrote:

Em Wed, 17 Apr 2019 10:47:28 -0700
Guenter Roeck escreveu:

On Wed, Apr 17, 2019 at 10:43:37AM -0700, Guenter Roeck wrote:

On Wed, Apr 17, 2019 at 02:22:15PM -0300, Mauro Carvalho Chehab wrote:

Em Wed, 17 Apr 2019 14:13:52 -0300
Mauro Carvalho Chehab escreveu:

Em Wed, 17 Apr 2019 09:47:41 -0700
Guenter Roeck escreveu:

On Wed, Apr 17, 2019 at 06:46:29AM -0300, Mauro Carvalho Chehab wrote:

Now that all files were converted to ReST format, rename them
and add an index.

Signed-off-by: Mauro Carvalho Chehab
Acked-by: Liviu Dudau

I applied all patches except this one, which fails due to a conflict in
ab8500. I also notice that this file has not been touched by your series,
which is odd. At the same time, patch 20/21 is missing from your series,
and has been missing all along. Does the missing patch possibly touch
Documentation/hwmon/ab8500 ?

Patch 20/21 is the biggest one. Maybe vger rejected it either due to
its size or due to the number of c/c.

Just bounced it to you. Please let me know if you didn't receive it
yet.

Btw, LKML got it:

https://lore.kernel.org/lkml/2a52363a5aaeea10e186ead8570503ea648e.1555494108.git.mchehab+sams...@kernel.org/

patchwork didn't get it (or didn't accept it). I got it now.
All patches applied, and pushed out to hwmon-next.

We have one (new) unconverted file left - Documentation/hwmon/lochnagar.

Plus ir38064 and isl68137. Lots of new drivers recently.

Ok, just sent a patch for those three new files. I wrote a more
detailed description about what steps I followed at the conversion
of those tree files, and why I did it.

Did the patches get lost ?

Hopefully, this would help hwmon developers
that may already be preparing a new driver for submission.

That would be very useful.

Thanks,
Guenter

Re: [PATCH] compiler: allow all arches to enable CONFIG_OPTIMIZE_INLINING

2019-04-18 Thread Ingo Molnar



* Masahiro Yamada  wrote:

> Commit 60a3cdd06394 ("x86: add optimized inlining") introduced
> CONFIG_OPTIMIZE_INLINING, but it has been available only for x86.
> 
> The idea is obviously arch-agnostic although we need some code fixups.
> This commit moves the config entry from arch/x86/Kconfig.debug to
> lib/Kconfig.debug so that all architectures (except MIPS for now) can
> benefit from it.
> 
> At this moment, I added "depends on !MIPS" because fixing 0day bot reports
> for MIPS was complex to me.
> 
> I tested this patch on my arm/arm64 boards.
> 
> This can make a huge difference in kernel image size especially when
> CONFIG_OPTIMIZE_FOR_SIZE is enabled.
> 
> For example, I got 3.5% smaller arm64 kernel image for v5.1-rc1.
> 
>   dec   file
>   18983424  arch/arm64/boot/Image.before
>   18321920  arch/arm64/boot/Image.after
> 
> This also slightly improves the "Kernel hacking" Kconfig menu.
> Commit e61aca5158a8 ("Merge branch 'kconfig-diet' from Dave Hansen')
> mentioned this config option would be a good fit in the "compiler option"
> menu. I did so.

No objections against moving it from x86 code to generic code.

Thanks,

Ingo

Avoiding merge conflicts while adding new docs - Was: Re: [PATCH 00/57] Convert files to ReST

2019-04-18 Thread Mauro Carvalho Chehab

Jon,

Em Mon, 15 Apr 2019 23:55:25 -0300
Mauro Carvalho Chehab  escreveu:

> I have a separate patch series with do the actual rename and
> adjustment of references. I opted to submit this first, as it
> sounds easier to merge this way, as each subsystem maintainer
> can apply the conversion directly on their trees (or at docs
> tree), avoiding merge conflects.

Based on the number of feedbacks we have about this, I'm
considering to submit a second version of my conversion patch
series that would be doing the rename together with each patch,
adding the rst files to an index file.

However, doing that would produce lots of warnings. We have
already lots of those at linux-next:

checking consistency... 

docs/Documentation/accelerators/ocxl.rst: WARNING: document isn't 
included in any toctree
docs/Documentation/admin-guide/mm/numaperf.rst: WARNING: document isn't 
included in any toctree
docs/Documentation/filesystems/ext4/allocators.rst: WARNING: document 
isn't included in any toctree
docs/Documentation/filesystems/ext4/attributes.rst: WARNING: document 
isn't included in any toctree
docs/Documentation/filesystems/ext4/bigalloc.rst: WARNING: document 
isn't included in any toctree
docs/Documentation/filesystems/ext4/bitmaps.rst: WARNING: document 
isn't included in any toctree
docs/Documentation/filesystems/ext4/blockgroup.rst: WARNING: document 
isn't included in any toctree
docs/Documentation/filesystems/ext4/blocks.rst: WARNING: document isn't 
included in any toctree
docs/Documentation/filesystems/ext4/checksums.rst: WARNING: document 
isn't included in any toctree
docs/Documentation/filesystems/ext4/directory.rst: WARNING: document 
isn't included in any toctree
docs/Documentation/filesystems/ext4/eainode.rst: WARNING: document 
isn't included in any toctree
docs/Documentation/filesystems/ext4/group_descr.rst: WARNING: document 
isn't included in any toctree
docs/Documentation/filesystems/ext4/ifork.rst: WARNING: document isn't 
included in any toctree
docs/Documentation/filesystems/ext4/inlinedata.rst: WARNING: document 
isn't included in any toctree
docs/Documentation/filesystems/ext4/inodes.rst: WARNING: document isn't 
included in any toctree
docs/Documentation/filesystems/ext4/journal.rst: WARNING: document 
isn't included in any toctree
docs/Documentation/filesystems/ext4/mmp.rst: WARNING: document isn't 
included in any toctree
docs/Documentation/filesystems/ext4/special_inodes.rst: WARNING: 
document isn't included in any toctree
docs/Documentation/filesystems/ext4/super.rst: WARNING: document isn't 
included in any toctree
docs/Documentation/fmc/index.rst: WARNING: document isn't included in 
any toctree
docs/Documentation/gpu/msm-crash-dump.rst: WARNING: document isn't 
included in any toctree
docs/Documentation/interconnect/interconnect.rst: WARNING: document 
isn't included in any toctree
docs/Documentation/laptops/lg-laptop.rst: WARNING: document isn't 
included in any toctree
docs/Documentation/virtual/kvm/amd-memory-encryption.rst: WARNING: 
document isn't included in any toctree
docs/Documentation/virtual/kvm/vcpu-requests.rst: WARNING: document 
isn't included in any toctree

After thinking a little bit and doing some tests, I think a good solution
would be to add ":orphan:" markup to the new .rst files that were not
added yet into the main body (e. g. something like the enclosed example).

That will make Sphinx suppress the:
"WARNING: document isn't included in any toctree"
warning for the new files, while they're not included at the main indexes.

This way, maintainers can do all the hard work of doing/applying the ReST
file conversion/addition patch series on their own trees, and, once
everything gets merged, submit a patch against the latest docs-next
tree, removing the :orphan: tag and add them to the common index.rst
files.

That should largely avoid merging conflicts.

For example, assuming that someone converts the stuff at
Documentation/accounting and rename the text files there to
RST (I'm actually doing that), he could add the enclosed change on
the top of its index file:

diff --git a/Documentation/accounting/index.rst 
b/Documentation/accounting/index.rst
index e7dda5afa73f..e1f6284b5ff3 100644
--- a/Documentation/accounting/index.rst
+++ b/Documentation/accounting/index.rst
@@ -1,3 +1,5 @@
+:orphan:
+
 ==
 Accounting
 ==

With would make Sphinx to ignore the subdir index file while
reporting errors. It will still build the documentation, allowing
the developer to test the changes.

One of the advantages is that checking the orphaned docs is as
easy as running:

$ git grep -l ":orphan:" Documentation/
...
Documentation/accounting/index.rst

[PATCH V2] ASoC: fsl_esai: Add pm runtime function

Add pm runtime support and move clock handling there.
fsl_esai_suspend is replaced by pm_runtime_force_suspend.
fsl_esai_resume is replaced by pm_runtime_force_resume.

Signed-off-by: Shengjiu Wang 
---
Changes in v2
-refine the commit comments.
-move regcache_mark_dirty to runtime suspend.

 sound/soc/fsl/fsl_esai.c | 141 ++-
 1 file changed, 77 insertions(+), 64 deletions(-)

diff --git a/sound/soc/fsl/fsl_esai.c b/sound/soc/fsl/fsl_esai.c
index bad0dfed6b68..10d2210c91ef 100644
--- a/sound/soc/fsl/fsl_esai.c
+++ b/sound/soc/fsl/fsl_esai.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -466,30 +467,6 @@ static int fsl_esai_startup(struct snd_pcm_substream 
*substream,
struct snd_soc_dai *dai)
 {
struct fsl_esai *esai_priv = snd_soc_dai_get_drvdata(dai);
-   int ret;
-
-   /*
-* Some platforms might use the same bit to gate all three or two of
-* clocks, so keep all clocks open/close at the same time for safety
-*/
-   ret = clk_prepare_enable(esai_priv->coreclk);
-   if (ret)
-   return ret;
-   if (!IS_ERR(esai_priv->spbaclk)) {
-   ret = clk_prepare_enable(esai_priv->spbaclk);
-   if (ret)
-   goto err_spbaclk;
-   }
-   if (!IS_ERR(esai_priv->extalclk)) {
-   ret = clk_prepare_enable(esai_priv->extalclk);
-   if (ret)
-   goto err_extalck;
-   }
-   if (!IS_ERR(esai_priv->fsysclk)) {
-   ret = clk_prepare_enable(esai_priv->fsysclk);
-   if (ret)
-   goto err_fsysclk;
-   }
 
if (!dai->active) {
/* Set synchronous mode */
@@ -506,16 +483,6 @@ static int fsl_esai_startup(struct snd_pcm_substream 
*substream,
 
return 0;
 
-err_fsysclk:
-   if (!IS_ERR(esai_priv->extalclk))
-   clk_disable_unprepare(esai_priv->extalclk);
-err_extalck:
-   if (!IS_ERR(esai_priv->spbaclk))
-   clk_disable_unprepare(esai_priv->spbaclk);
-err_spbaclk:
-   clk_disable_unprepare(esai_priv->coreclk);
-
-   return ret;
 }
 
 static int fsl_esai_hw_params(struct snd_pcm_substream *substream,
@@ -576,20 +543,6 @@ static int fsl_esai_hw_params(struct snd_pcm_substream 
*substream,
return 0;
 }
 
-static void fsl_esai_shutdown(struct snd_pcm_substream *substream,
- struct snd_soc_dai *dai)
-{
-   struct fsl_esai *esai_priv = snd_soc_dai_get_drvdata(dai);
-
-   if (!IS_ERR(esai_priv->fsysclk))
-   clk_disable_unprepare(esai_priv->fsysclk);
-   if (!IS_ERR(esai_priv->extalclk))
-   clk_disable_unprepare(esai_priv->extalclk);
-   if (!IS_ERR(esai_priv->spbaclk))
-   clk_disable_unprepare(esai_priv->spbaclk);
-   clk_disable_unprepare(esai_priv->coreclk);
-}
-
 static int fsl_esai_trigger(struct snd_pcm_substream *substream, int cmd,
struct snd_soc_dai *dai)
 {
@@ -658,7 +611,6 @@ static int fsl_esai_trigger(struct snd_pcm_substream 
*substream, int cmd,
 
 static const struct snd_soc_dai_ops fsl_esai_dai_ops = {
.startup = fsl_esai_startup,
-   .shutdown = fsl_esai_shutdown,
.trigger = fsl_esai_trigger,
.hw_params = fsl_esai_hw_params,
.set_sysclk = fsl_esai_set_dai_sysclk,
@@ -947,6 +899,10 @@ static int fsl_esai_probe(struct platform_device *pdev)
return ret;
}
 
+   pm_runtime_enable(&pdev->dev);
+
+   regcache_cache_only(esai_priv->regmap, true);
+
ret = imx_pcm_dma_init(pdev, IMX_ESAI_DMABUF_SIZE);
if (ret)
dev_err(&pdev->dev, "failed to init imx pcm dma: %d\n", ret);
@@ -954,6 +910,13 @@ static int fsl_esai_probe(struct platform_device *pdev)
return ret;
 }
 
+static int fsl_esai_remove(struct platform_device *pdev)
+{
+   pm_runtime_disable(&pdev->dev);
+
+   return 0;
+}
+
 static const struct of_device_id fsl_esai_dt_ids[] = {
{ .compatible = "fsl,imx35-esai", },
{ .compatible = "fsl,vf610-esai", },
@@ -961,22 +924,35 @@ static int fsl_esai_probe(struct platform_device *pdev)
 };
 MODULE_DEVICE_TABLE(of, fsl_esai_dt_ids);
 
-#ifdef CONFIG_PM_SLEEP
-static int fsl_esai_suspend(struct device *dev)
-{
-   struct fsl_esai *esai = dev_get_drvdata(dev);
-
-   regcache_cache_only(esai->regmap, true);
-   regcache_mark_dirty(esai->regmap);
-
-   return 0;
-}
-
-static int fsl_esai_resume(struct device *dev)
+#ifdef CONFIG_PM
+static int fsl_esai_runtime_resume(struct device *dev)
 {
struct fsl_esai *esai = dev_get_drvdata(dev);
int ret;
 
+   /*
+* Some platforms might use the same bit to gate all three or two of
+* clocks, so keep all clocks open/close at the same time for safety
+*/
+   ret = clk_prepare_enable(esai->coreclk);
+

Re: [PATCH] ASoC: fsl_esai: Add pm runtime function

Hi

> On Thu, Apr 18, 2019 at 10:15:24AM +, S.j. Wang wrote:
> > > On Thu, Apr 18, 2019 at 02:00:12AM -0700, Nicolin Chen wrote:
> > > > On Thu, Apr 18, 2019 at 03:29:09AM +, S.j. Wang wrote:
> 
> > > > Just for curiosity, we had similar situation on imx6sx, so we
> > > > added suspend/resume with regcache. Why will the clock enable
> > > > state be lost too? Does CCM on imx8 (might not be called CCM
> > > > though) have any difference? What about clock rate settings?
> 
> > > That sounds like a bug somewhere else - I'd expect that after resume
> > > the clocking would be restored to the state it was in before suspend.
> 
> > There is limitation in our internal design. That is in imx8 the power
> > of subsystem will be disabled at suspend, include the clock state , clock
> rate.
> 
> Right, that's fairly normal but usually it'd be restored as part of the resume
> process?
> 
> > This patch is to enable the pm runtime,  so I think it is better to
> > move the clock operation to pm runtime,  and close the clock at
> > suspend to reduce the power.
> 
> It's definitely good to turn the clock off as much as possible, yes.

Thanks, will send v2.

Best regards
Wang shengjiu

[PATCH RFT V3 8/8] clk: core: replace clk_{readl, writel} with {readl, writel}

Now that clk_{readl,writel} is just an alias for {readl,writel}, we can
switch all users of clk_* to use the accessors directly and remove the
helpers.

Signed-off-by: Jonas Gorski 
---
V2 -> V3:
 * no actual changes
V1 -> V2:
 * newly added patch

 drivers/clk/clk-divider.c   |  4 ++--
 drivers/clk/clk-fractional-divider.c|  4 ++--
 drivers/clk/clk-gate.c  |  4 ++--
 drivers/clk/clk-multiplier.c|  4 ++--
 drivers/clk/clk-mux.c   |  4 ++--
 drivers/clk/clk-xgene.c |  6 +++---
 drivers/clk/hisilicon/clk-hisi-phase.c  |  4 ++--
 drivers/clk/imx/clk-divider-gate.c  | 20 ++--
 drivers/clk/imx/clk-sccg-pll.c  | 12 ++--
 drivers/clk/nxp/clk-lpc18xx-ccu.c   |  6 +++---
 drivers/clk/nxp/clk-lpc18xx-cgu.c   | 24 
 drivers/clk/rockchip/clk-ddr.c  |  2 +-
 drivers/clk/rockchip/clk-half-divider.c |  6 +++---
 drivers/clk/tegra/clk-tegra124.c|  4 ++--
 drivers/clk/tegra/clk-tegra210.c|  6 +++---
 drivers/clk/zynq/clkc.c |  6 +++---
 drivers/clk/zynq/pll.c  | 18 +-
 include/linux/clk-provider.h| 15 ---
 18 files changed, 67 insertions(+), 82 deletions(-)

diff --git a/drivers/clk/clk-divider.c b/drivers/clk/clk-divider.c
index 25c7404e376c..2c600d750546 100644
--- a/drivers/clk/clk-divider.c
+++ b/drivers/clk/clk-divider.c
@@ -30,7 +30,7 @@ static inline u32 clk_div_readl(struct clk_divider *divider)
if (divider->flags & CLK_DIVIDER_BIG_ENDIAN)
return ioread32be(divider->reg);
 
-   return clk_readl(divider->reg);
+   return readl(divider->reg);
 }
 
 static inline void clk_div_writel(struct clk_divider *divider, u32 val)
@@ -38,7 +38,7 @@ static inline void clk_div_writel(struct clk_divider 
*divider, u32 val)
if (divider->flags & CLK_DIVIDER_BIG_ENDIAN)
iowrite32be(val, divider->reg);
else
-   clk_writel(val, divider->reg);
+   writel(val, divider->reg);
 }
 
 #define div_mask(width)((1 << (width)) - 1)
diff --git a/drivers/clk/clk-fractional-divider.c 
b/drivers/clk/clk-fractional-divider.c
index f88df265e787..638a9bbc2ab8 100644
--- a/drivers/clk/clk-fractional-divider.c
+++ b/drivers/clk/clk-fractional-divider.c
@@ -18,7 +18,7 @@ static inline u32 clk_fd_readl(struct clk_fractional_divider 
*fd)
if (fd->flags & CLK_FRAC_DIVIDER_BIG_ENDIAN)
return ioread32be(fd->reg);
 
-   return clk_readl(fd->reg);
+   return readl(fd->reg);
 }
 
 static inline void clk_fd_writel(struct clk_fractional_divider *fd, u32 val)
@@ -26,7 +26,7 @@ static inline void clk_fd_writel(struct 
clk_fractional_divider *fd, u32 val)
if (fd->flags & CLK_FRAC_DIVIDER_BIG_ENDIAN)
iowrite32be(val, fd->reg);
else
-   clk_writel(val, fd->reg);
+   writel(val, fd->reg);
 }
 
 static unsigned long clk_fd_recalc_rate(struct clk_hw *hw,
diff --git a/drivers/clk/clk-gate.c b/drivers/clk/clk-gate.c
index 6ced7b1f5585..0c0bb83f714e 100644
--- a/drivers/clk/clk-gate.c
+++ b/drivers/clk/clk-gate.c
@@ -28,7 +28,7 @@ static inline u32 clk_gate_readl(struct clk_gate *gate)
if (gate->flags & CLK_GATE_BIG_ENDIAN)
return ioread32be(gate->reg);
 
-   return clk_readl(gate->reg);
+   return readl(gate->reg);
 }
 
 static inline void clk_gate_writel(struct clk_gate *gate, u32 val)
@@ -36,7 +36,7 @@ static inline void clk_gate_writel(struct clk_gate *gate, u32 
val)
if (gate->flags & CLK_GATE_BIG_ENDIAN)
iowrite32be(val, gate->reg);
else
-   clk_writel(val, gate->reg);
+   writel(val, gate->reg);
 }
 
 /*
diff --git a/drivers/clk/clk-multiplier.c b/drivers/clk/clk-multiplier.c
index 77327df9bf32..94470b4eadf4 100644
--- a/drivers/clk/clk-multiplier.c
+++ b/drivers/clk/clk-multiplier.c
@@ -16,7 +16,7 @@ static inline u32 clk_mult_readl(struct clk_multiplier *mult)
if (mult->flags & CLK_MULTIPLIER_BIG_ENDIAN)
return ioread32be(mult->reg);
 
-   return clk_readl(mult->reg);
+   return readl(mult->reg);
 }
 
 static inline void clk_mult_writel(struct clk_multiplier *mult, u32 val)
@@ -24,7 +24,7 @@ static inline void clk_mult_writel(struct clk_multiplier 
*mult, u32 val)
if (mult->flags & CLK_MULTIPLIER_BIG_ENDIAN)
iowrite32be(val, mult->reg);
else
-   clk_writel(val, mult->reg);
+   writel(val, mult->reg);
 }
 
 static unsigned long __get_mult(struct clk_multiplier *mult,
diff --git a/drivers/clk/clk-mux.c b/drivers/clk/clk-mux.c
index 61ad331b7ff4..893c9b285532 100644
--- a/drivers/clk/clk-mux.c
+++ b/drivers/clk/clk-mux.c
@@ -28,7 +28,7 @@ static inline u32 clk_mux_readl(struct clk_mux *mux)
if (mux->flags & CLK_MUX_BIG_ENDIAN)
return ioread32be(mux->reg);
 
-

[PATCH RFT V3 7/8] clk: core: remove powerpc special handling

Now that the powerpc clocks are properly marked as big endian, we can
remove the special handling for PowerPC.

Signed-off-by: Jonas Gorski 
---
V2 -> V3:
 * no actual changes
V1 -> V2:
 * no actual changes

 include/linux/clk-provider.h | 16 
 1 file changed, 16 deletions(-)

diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h
index f82cda41e1a8..479e616ce7f5 100644
--- a/include/linux/clk-provider.h
+++ b/include/linux/clk-provider.h
@@ -1001,20 +1001,6 @@ static inline int of_clk_detect_critical(struct 
device_node *np, int index,
  * for improved portability across platforms
  */
 
-#if IS_ENABLED(CONFIG_PPC)
-
-static inline u32 clk_readl(u32 __iomem *reg)
-{
-   return ioread32be(reg);
-}
-
-static inline void clk_writel(u32 val, u32 __iomem *reg)
-{
-   iowrite32be(val, reg);
-}
-
-#else  /* platform dependent I/O accessors */
-
 static inline u32 clk_readl(u32 __iomem *reg)
 {
return readl(reg);
@@ -1025,8 +1011,6 @@ static inline void clk_writel(u32 val, u32 __iomem *reg)
writel(val, reg);
 }
 
-#endif /* platform dependent I/O accessors */
-
 void clk_gate_restore_context(struct clk_hw *hw);
 
 #endif /* CONFIG_COMMON_CLK */
-- 
2.13.2

[PATCH RFT V3 6/8] powerpc/512x: mark clocks as big endian

These clocks' registers are accessed as big endian, so mark them as
such.

Signed-off-by: Jonas Gorski 
---
V2 -> V3:
 * slightly rework to avoid a line >80 chars
V1 -> V2:
 * switch from global to local flags

 arch/powerpc/platforms/512x/clock-commonclk.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/512x/clock-commonclk.c 
b/arch/powerpc/platforms/512x/clock-commonclk.c
index b3097fe6441b..af265ae40a61 100644
--- a/arch/powerpc/platforms/512x/clock-commonclk.c
+++ b/arch/powerpc/platforms/512x/clock-commonclk.c
@@ -239,6 +239,7 @@ static inline struct clk *mpc512x_clk_divider(
const char *name, const char *parent_name, u8 clkflags,
u32 __iomem *reg, u8 pos, u8 len, int divflags)
 {
+   divflags |= CLK_DIVIDER_BIG_ENDIAN;
return clk_register_divider(NULL, name, parent_name, clkflags,
reg, pos, len, divflags, &clklock);
 }
@@ -250,7 +251,7 @@ static inline struct clk *mpc512x_clk_divtable(
 {
u8 divflags;
 
-   divflags = 0;
+   divflags = CLK_DIVIDER_BIG_ENDIAN;
return clk_register_divider_table(NULL, name, parent_name, 0,
  reg, pos, len, divflags,
  divtab, &clklock);
@@ -261,10 +262,12 @@ static inline struct clk *mpc512x_clk_gated(
u32 __iomem *reg, u8 pos)
 {
int clkflags;
+   u8 gateflags;
 
clkflags = CLK_SET_RATE_PARENT;
+   gateflags = CLK_GATE_BIG_ENDIAN;
return clk_register_gate(NULL, name, parent_name, clkflags,
-reg, pos, 0, &clklock);
+reg, pos, gateflags, &clklock);
 }
 
 static inline struct clk *mpc512x_clk_muxed(const char *name,
@@ -275,7 +278,7 @@ static inline struct clk *mpc512x_clk_muxed(const char 
*name,
u8 muxflags;
 
clkflags = CLK_SET_RATE_PARENT;
-   muxflags = 0;
+   muxflags = CLK_MUX_BIG_ENDIAN;
return clk_register_mux(NULL, name,
parent_names, parent_count, clkflags,
reg, pos, len, muxflags, &clklock);
-- 
2.13.2

[PATCH RFT V3 5/8] clk: mux: add explicit big endian support

Add a clock specific flag to switch register accesses to big endian, to
allow runtime configuration of big endian mux clocks.

Signed-off-by: Jonas Gorski 
---
V2 -> V3:
 * drop unneeded else in clk_mux_readl
V1 -> V2:
 * switch from global to local flag

 drivers/clk/clk-mux.c| 22 +++---
 include/linux/clk-provider.h |  4 
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/clk/clk-mux.c b/drivers/clk/clk-mux.c
index 2ad2df2e8909..61ad331b7ff4 100644
--- a/drivers/clk/clk-mux.c
+++ b/drivers/clk/clk-mux.c
@@ -23,6 +23,22 @@
  * parent - parent is adjustable through clk_set_parent
  */
 
+static inline u32 clk_mux_readl(struct clk_mux *mux)
+{
+   if (mux->flags & CLK_MUX_BIG_ENDIAN)
+   return ioread32be(mux->reg);
+
+   return clk_readl(mux->reg);
+}
+
+static inline void clk_mux_writel(struct clk_mux *mux, u32 val)
+{
+   if (mux->flags & CLK_MUX_BIG_ENDIAN)
+   iowrite32be(val, mux->reg);
+   else
+   clk_writel(val, mux->reg);
+}
+
 int clk_mux_val_to_index(struct clk_hw *hw, u32 *table, unsigned int flags,
 unsigned int val)
 {
@@ -73,7 +89,7 @@ static u8 clk_mux_get_parent(struct clk_hw *hw)
struct clk_mux *mux = to_clk_mux(hw);
u32 val;
 
-   val = clk_readl(mux->reg) >> mux->shift;
+   val = clk_mux_readl(mux) >> mux->shift;
val &= mux->mask;
 
return clk_mux_val_to_index(hw, mux->table, mux->flags, val);
@@ -94,12 +110,12 @@ static int clk_mux_set_parent(struct clk_hw *hw, u8 index)
if (mux->flags & CLK_MUX_HIWORD_MASK) {
reg = mux->mask << (mux->shift + 16);
} else {
-   reg = clk_readl(mux->reg);
+   reg = clk_mux_readl(mux);
reg &= ~(mux->mask << mux->shift);
}
val = val << mux->shift;
reg |= val;
-   clk_writel(reg, mux->reg);
+   clk_mux_writel(mux, reg);
 
if (mux->lock)
spin_unlock_irqrestore(mux->lock, flags);
diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h
index 9df78e3fb62b..f82cda41e1a8 100644
--- a/include/linux/clk-provider.h
+++ b/include/linux/clk-provider.h
@@ -510,6 +510,9 @@ void clk_hw_unregister_divider(struct clk_hw *hw);
  * .get_parent clk_op.
  * CLK_MUX_ROUND_CLOSEST - Use the parent rate that is closest to the desired
  * frequency.
+ * CLK_MUX_BIG_ENDIAN - By default little endian register accesses are used for
+ * the mux register.  Setting this flag makes the register accesses big
+ * endian.
  */
 struct clk_mux {
struct clk_hw   hw;
@@ -528,6 +531,7 @@ struct clk_mux {
 #define CLK_MUX_HIWORD_MASKBIT(2)
 #define CLK_MUX_READ_ONLY  BIT(3) /* mux can't be changed */
 #define CLK_MUX_ROUND_CLOSEST  BIT(4)
+#define CLK_MUX_BIG_ENDIAN BIT(5)
 
 extern const struct clk_ops clk_mux_ops;
 extern const struct clk_ops clk_mux_ro_ops;
-- 
2.13.2

[PATCH RFT V3 4/8] clk: multiplier: add explicit big endian support

Add a clock specific flag to switch register accesses to big endian, to
allow runtime configuration of big endian multiplier clocks.

Signed-off-by: Jonas Gorski 
---
V2 -> V3:
 * drop unneeded else in clk_mult_readl
V1 -> V2:
 * switch from global to local flag

 drivers/clk/clk-multiplier.c | 22 +++---
 include/linux/clk-provider.h |  4 
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/clk/clk-multiplier.c b/drivers/clk/clk-multiplier.c
index 3c86f859c199..77327df9bf32 100644
--- a/drivers/clk/clk-multiplier.c
+++ b/drivers/clk/clk-multiplier.c
@@ -11,6 +11,22 @@
 #include 
 #include 
 
+static inline u32 clk_mult_readl(struct clk_multiplier *mult)
+{
+   if (mult->flags & CLK_MULTIPLIER_BIG_ENDIAN)
+   return ioread32be(mult->reg);
+
+   return clk_readl(mult->reg);
+}
+
+static inline void clk_mult_writel(struct clk_multiplier *mult, u32 val)
+{
+   if (mult->flags & CLK_MULTIPLIER_BIG_ENDIAN)
+   iowrite32be(val, mult->reg);
+   else
+   clk_writel(val, mult->reg);
+}
+
 static unsigned long __get_mult(struct clk_multiplier *mult,
unsigned long rate,
unsigned long parent_rate)
@@ -27,7 +43,7 @@ static unsigned long clk_multiplier_recalc_rate(struct clk_hw 
*hw,
struct clk_multiplier *mult = to_clk_multiplier(hw);
unsigned long val;
 
-   val = clk_readl(mult->reg) >> mult->shift;
+   val = clk_mult_readl(mult) >> mult->shift;
val &= GENMASK(mult->width - 1, 0);
 
if (!val && mult->flags & CLK_MULTIPLIER_ZERO_BYPASS)
@@ -118,10 +134,10 @@ static int clk_multiplier_set_rate(struct clk_hw *hw, 
unsigned long rate,
else
__acquire(mult->lock);
 
-   val = clk_readl(mult->reg);
+   val = clk_mult_readl(mult);
val &= ~GENMASK(mult->width + mult->shift - 1, mult->shift);
val |= factor << mult->shift;
-   clk_writel(val, mult->reg);
+   clk_mult_writel(mult, val);
 
if (mult->lock)
spin_unlock_irqrestore(mult->lock, flags);
diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h
index 8576c2dbc639..9df78e3fb62b 100644
--- a/include/linux/clk-provider.h
+++ b/include/linux/clk-provider.h
@@ -667,6 +667,9 @@ void clk_hw_unregister_fractional_divider(struct clk_hw 
*hw);
  * leaving the parent rate unmodified.
  * CLK_MULTIPLIER_ROUND_CLOSEST - Makes the best calculated divider to be
  * rounded to the closest integer instead of the down one.
+ * CLK_MULTIPLIER_BIG_ENDIAN - By default little endian register accesses are
+ * used for the multiplier register.  Setting this flag makes the register
+ * accesses big endian.
  */
 struct clk_multiplier {
struct clk_hw   hw;
@@ -681,6 +684,7 @@ struct clk_multiplier {
 
 #define CLK_MULTIPLIER_ZERO_BYPASS BIT(0)
 #define CLK_MULTIPLIER_ROUND_CLOSEST   BIT(1)
+#define CLK_MULTIPLIER_BIG_ENDIAN  BIT(2)
 
 extern const struct clk_ops clk_multiplier_ops;
 
-- 
2.13.2

[PATCH RFT V3 3/8] clk: gate: add explicit big endian support

Add a clock specific flag to switch register accesses to big endian, to
allow runtime configuration of big endian gated clocks.

Signed-off-by: Jonas Gorski 
---
V2 -> V3:
 * drop unneeded else in clk_gate_readl
V1 -> V2:
 * switch from global to local flag

 drivers/clk/clk-gate.c   | 22 +++---
 include/linux/clk-provider.h |  4 
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/clk/clk-gate.c b/drivers/clk/clk-gate.c
index f05823cd9b21..6ced7b1f5585 100644
--- a/drivers/clk/clk-gate.c
+++ b/drivers/clk/clk-gate.c
@@ -23,6 +23,22 @@
  * parent - fixed parent.  No clk_set_parent support
  */
 
+static inline u32 clk_gate_readl(struct clk_gate *gate)
+{
+   if (gate->flags & CLK_GATE_BIG_ENDIAN)
+   return ioread32be(gate->reg);
+
+   return clk_readl(gate->reg);
+}
+
+static inline void clk_gate_writel(struct clk_gate *gate, u32 val)
+{
+   if (gate->flags & CLK_GATE_BIG_ENDIAN)
+   iowrite32be(val, gate->reg);
+   else
+   clk_writel(val, gate->reg);
+}
+
 /*
  * It works on following logic:
  *
@@ -55,7 +71,7 @@ static void clk_gate_endisable(struct clk_hw *hw, int enable)
if (set)
reg |= BIT(gate->bit_idx);
} else {
-   reg = clk_readl(gate->reg);
+   reg = clk_gate_readl(gate);
 
if (set)
reg |= BIT(gate->bit_idx);
@@ -63,7 +79,7 @@ static void clk_gate_endisable(struct clk_hw *hw, int enable)
reg &= ~BIT(gate->bit_idx);
}
 
-   clk_writel(reg, gate->reg);
+   clk_gate_writel(gate, reg);
 
if (gate->lock)
spin_unlock_irqrestore(gate->lock, flags);
@@ -88,7 +104,7 @@ int clk_gate_is_enabled(struct clk_hw *hw)
u32 reg;
struct clk_gate *gate = to_clk_gate(hw);
 
-   reg = clk_readl(gate->reg);
+   reg = clk_gate_readl(gate);
 
/* if a set bit disables this clk, flip it before masking */
if (gate->flags & CLK_GATE_SET_TO_DISABLE)
diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h
index 8c07d810acf5..8576c2dbc639 100644
--- a/include/linux/clk-provider.h
+++ b/include/linux/clk-provider.h
@@ -348,6 +348,9 @@ void of_fixed_clk_setup(struct device_node *np);
  * of this register, and mask of gate bits are in higher 16-bit of this
  * register.  While setting the gate bits, higher 16-bit should also be
  * updated to indicate changing gate bits.
+ * CLK_GATE_BIG_ENDIAN - by default little endian register accesses are used 
for
+ * the gate register.  Setting this flag makes the register accesses big
+ * endian.
  */
 struct clk_gate {
struct clk_hw hw;
@@ -361,6 +364,7 @@ struct clk_gate {
 
 #define CLK_GATE_SET_TO_DISABLEBIT(0)
 #define CLK_GATE_HIWORD_MASK   BIT(1)
+#define CLK_GATE_BIG_ENDIANBIT(2)
 
 extern const struct clk_ops clk_gate_ops;
 struct clk *clk_register_gate(struct device *dev, const char *name,
-- 
2.13.2

[PATCH RFT V3 1/8] clk: divider: add explicit big endian support

Add a clock specific flag to switch register accesses to big endian, to
allow runtime configuration of big endian divider clocks.

Signed-off-by: Jonas Gorski 
---
V2 -> V3:
 * fix passed arguments to clk_div_readl found by kbuild
 * drop unneeded else in clk_div_readl
V1 -> V2:
 * switch from global to local flag

 drivers/clk/clk-divider.c| 26 ++
 include/linux/clk-provider.h |  4 
 2 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/drivers/clk/clk-divider.c b/drivers/clk/clk-divider.c
index e5a17265cfaf..25c7404e376c 100644
--- a/drivers/clk/clk-divider.c
+++ b/drivers/clk/clk-divider.c
@@ -25,6 +25,24 @@
  * parent - fixed parent.  No clk_set_parent support
  */
 
+static inline u32 clk_div_readl(struct clk_divider *divider)
+{
+   if (divider->flags & CLK_DIVIDER_BIG_ENDIAN)
+   return ioread32be(divider->reg);
+
+   return clk_readl(divider->reg);
+}
+
+static inline void clk_div_writel(struct clk_divider *divider, u32 val)
+{
+   if (divider->flags & CLK_DIVIDER_BIG_ENDIAN)
+   iowrite32be(val, divider->reg);
+   else
+   clk_writel(val, divider->reg);
+}
+
+#define div_mask(width)((1 << (width)) - 1)
+
 static unsigned int _get_table_maxdiv(const struct clk_div_table *table,
  u8 width)
 {
@@ -135,7 +153,7 @@ static unsigned long clk_divider_recalc_rate(struct clk_hw 
*hw,
struct clk_divider *divider = to_clk_divider(hw);
unsigned int val;
 
-   val = clk_readl(divider->reg) >> divider->shift;
+   val = clk_div_readl(divider) >> divider->shift;
val &= clk_div_mask(divider->width);
 
return divider_recalc_rate(hw, parent_rate, val, divider->table,
@@ -370,7 +388,7 @@ static long clk_divider_round_rate(struct clk_hw *hw, 
unsigned long rate,
if (divider->flags & CLK_DIVIDER_READ_ONLY) {
u32 val;
 
-   val = clk_readl(divider->reg) >> divider->shift;
+   val = clk_div_readl(divider) >> divider->shift;
val &= clk_div_mask(divider->width);
 
return divider_ro_round_rate(hw, rate, prate, divider->table,
@@ -420,11 +438,11 @@ static int clk_divider_set_rate(struct clk_hw *hw, 
unsigned long rate,
if (divider->flags & CLK_DIVIDER_HIWORD_MASK) {
val = clk_div_mask(divider->width) << (divider->shift + 16);
} else {
-   val = clk_readl(divider->reg);
+   val = clk_div_readl(divider);
val &= ~(clk_div_mask(divider->width) << divider->shift);
}
val |= (u32)value << divider->shift;
-   clk_writel(val, divider->reg);
+   clk_div_writel(divider, val);
 
if (divider->lock)
spin_unlock_irqrestore(divider->lock, flags);
diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h
index db21437c77e2..7117b8cc0c0c 100644
--- a/include/linux/clk-provider.h
+++ b/include/linux/clk-provider.h
@@ -416,6 +416,9 @@ struct clk_div_table {
  * CLK_DIVIDER_MAX_AT_ZERO - For dividers which are like CLK_DIVIDER_ONE_BASED
  * except when the value read from the register is zero, the divisor is
  * 2^width of the field.
+ * CLK_DIVIDER_BIG_ENDIAN - By default little endian register accesses are used
+ * for the divider register.  Setting this flag makes the register accesses
+ * big endian.
  */
 struct clk_divider {
struct clk_hw   hw;
@@ -437,6 +440,7 @@ struct clk_divider {
 #define CLK_DIVIDER_ROUND_CLOSEST  BIT(4)
 #define CLK_DIVIDER_READ_ONLY  BIT(5)
 #define CLK_DIVIDER_MAX_AT_ZEROBIT(6)
+#define CLK_DIVIDER_BIG_ENDIAN BIT(7)
 
 extern const struct clk_ops clk_divider_ops;
 extern const struct clk_ops clk_divider_ro_ops;
-- 
2.13.2

[PATCH RFT V3 2/8] clk: fractional-divider: add explicit big endian support

Add a clock specific flag to switch register accesses to big endian, to
allow runtime configuration of big endian fractional divider clocks.

Signed-off-by: Jonas Gorski 
---
V2 -> V3:
 * drop unneeded else in clk_fd_readl
V1 -> V2:
 * switch from global to local flag

 drivers/clk/clk-fractional-divider.c | 22 +++---
 include/linux/clk-provider.h |  4 
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/clk/clk-fractional-divider.c 
b/drivers/clk/clk-fractional-divider.c
index fdfe2e423d15..f88df265e787 100644
--- a/drivers/clk/clk-fractional-divider.c
+++ b/drivers/clk/clk-fractional-divider.c
@@ -13,6 +13,22 @@
 #include 
 #include 
 
+static inline u32 clk_fd_readl(struct clk_fractional_divider *fd)
+{
+   if (fd->flags & CLK_FRAC_DIVIDER_BIG_ENDIAN)
+   return ioread32be(fd->reg);
+
+   return clk_readl(fd->reg);
+}
+
+static inline void clk_fd_writel(struct clk_fractional_divider *fd, u32 val)
+{
+   if (fd->flags & CLK_FRAC_DIVIDER_BIG_ENDIAN)
+   iowrite32be(val, fd->reg);
+   else
+   clk_writel(val, fd->reg);
+}
+
 static unsigned long clk_fd_recalc_rate(struct clk_hw *hw,
unsigned long parent_rate)
 {
@@ -27,7 +43,7 @@ static unsigned long clk_fd_recalc_rate(struct clk_hw *hw,
else
__acquire(fd->lock);
 
-   val = clk_readl(fd->reg);
+   val = clk_fd_readl(fd);
 
if (fd->lock)
spin_unlock_irqrestore(fd->lock, flags);
@@ -115,10 +131,10 @@ static int clk_fd_set_rate(struct clk_hw *hw, unsigned 
long rate,
else
__acquire(fd->lock);
 
-   val = clk_readl(fd->reg);
+   val = clk_fd_readl(fd);
val &= ~(fd->mmask | fd->nmask);
val |= (m << fd->mshift) | (n << fd->nshift);
-   clk_writel(val, fd->reg);
+   clk_fd_writel(fd, val);
 
if (fd->lock)
spin_unlock_irqrestore(fd->lock, flags);
diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h
index 7117b8cc0c0c..8c07d810acf5 100644
--- a/include/linux/clk-provider.h
+++ b/include/linux/clk-provider.h
@@ -607,6 +607,9 @@ void clk_hw_unregister_fixed_factor(struct clk_hw *hw);
  * is the value read from the register. If CLK_FRAC_DIVIDER_ZERO_BASED
  * is set then the numerator and denominator are both the value read
  * plus one.
+ * CLK_FRAC_DIVIDER_BIG_ENDIAN - By default little endian register accesses are
+ * used for the divider register.  Setting this flag makes the register
+ * accesses big endian.
  */
 struct clk_fractional_divider {
struct clk_hw   hw;
@@ -627,6 +630,7 @@ struct clk_fractional_divider {
 #define to_clk_fd(_hw) container_of(_hw, struct clk_fractional_divider, hw)
 
 #define CLK_FRAC_DIVIDER_ZERO_BASEDBIT(0)
+#define CLK_FRAC_DIVIDER_BIG_ENDIANBIT(1)
 
 extern const struct clk_ops clk_fractional_divider_ops;
 struct clk *clk_register_fractional_divider(struct device *dev,
-- 
2.13.2

[PATCH RFT V3 0/8] clk: make register endianness a run-time property

Currently the endianness for register accesses of basic clocks if fixed
based on the architecture (BE for PowerPC, LE for everyone else). This
is inconvenient for architectures that support both.

To avoid adding more rules to the #ifdef, this patchset adds new flags
to the basic clocks to tag the registers as BE then converts the only
big endian machine PowerPC to use it.

While not used by PowerPC, also add big endian support to clk-fractional-
divider and clk-multiplier, to cover all basic clocks.
Technically clk-multiplier isn't one as it doesn't provide any
registration functions and none of its users set the basic clock flag,
but nevertheless it and its flags are defined clk-provider.h. So I think
it's close enough to a basic clock to still count.

That way we can drop the special casing for PowerPC, and allow other big
endian platforms/drivers to make use of the basic clocks.

In addition, we can now drop clk_readl and clk_writel, and replace them
with normal readl and writel accessors everywhere.

Still RFT because I don't have a PowerPC device to test, and especially
not a 512x one. I did compile test it though!

I looked really hard, and this is the only place I could find where a
PowerPC platform (indirectly) used the clk accessors.

None of the regular drivers in clk/ were selected in any of the powerpc
defconfigs, and this was the only platform code that registered basic
clocks.

Changelog:

V2 -> V3:
 * fix passed arguments to clk_div_readl found by kbuild
 * drop unneeded else in generic read accessors
 * fix a >80 chars line issue in the powerpc patch

V1 -> V2:
 * switch from global flag to per-clock flag
 * also added fractional divider and multiplier clocks, to make all
   basic or quasi basic clocks support big endian
 * reordered the basic clock patches in alphabetical order
 * drop clk_{readl,writel} instead of adding BE variants and use
   common accessors directly
 * dropped the RFC, as I got comments (yay). More always welcome of
   course :-)

Jonas Gorski (8):
  clk: divider: add explicit big endian support
  clk: fractional-divider: add explicit big endian support
  clk: gate: add explicit big endian support
  clk: multiplier: add explicit big endian support
  clk: mux: add explicit big endian support
  powerpc/512x: mark clocks as big endian
  clk: core: remove powerpc special handling
  clk: core: replace clk_{readl,writel} with {readl,writel}

 arch/powerpc/platforms/512x/clock-commonclk.c |  9 +++--
 drivers/clk/clk-divider.c | 26 +++---
 drivers/clk/clk-fractional-divider.c  | 22 ++--
 drivers/clk/clk-gate.c| 22 ++--
 drivers/clk/clk-multiplier.c  | 22 ++--
 drivers/clk/clk-mux.c | 22 ++--
 drivers/clk/clk-xgene.c   |  6 ++--
 drivers/clk/hisilicon/clk-hisi-phase.c|  4 +--
 drivers/clk/imx/clk-divider-gate.c| 20 +--
 drivers/clk/imx/clk-sccg-pll.c| 12 +++
 drivers/clk/nxp/clk-lpc18xx-ccu.c |  6 ++--
 drivers/clk/nxp/clk-lpc18xx-cgu.c | 24 ++---
 drivers/clk/rockchip/clk-ddr.c|  2 +-
 drivers/clk/rockchip/clk-half-divider.c   |  6 ++--
 drivers/clk/tegra/clk-tegra124.c  |  4 +--
 drivers/clk/tegra/clk-tegra210.c  |  6 ++--
 drivers/clk/zynq/clkc.c   |  6 ++--
 drivers/clk/zynq/pll.c| 18 +-
 include/linux/clk-provider.h  | 51 +++
 19 files changed, 181 insertions(+), 107 deletions(-)

-- 
2.13.2

Re: [PATCH] Linux: Define struct termios2 in under _GNU_SOURCE [BZ #10339]

2019-04-18 Thread Adhemerval Zanella




On 17/04/2019 19:04, H. Peter Anvin wrote:
> On 4/15/19 10:22 AM, Adhemerval Zanella wrote:
>>>
>>> New interfaces are only necessary for the handful of architectures that 
>>> don't have the speed fields *and* to space to put them in. 
>>
>> Based on your WIP, it seems that both sparc and mips could be adapted.
>> Do we still have glibc supported architecture that would require compat
>> symbols?
>>
>>>
>>> Using symbol versioning doesn't really help much since the real problem is 
>>> that struct termios can be passed around in userspace, and the interfaces 
>>> between user space libraries don't have any versioning. However, my POC 
>>> code deals with that too by only seeing BOTHER when necessary, so if the 
>>> structure is extended garbage in the extra fields will be ignored unless 
>>> new baud rates are in use.
>>
>> Yeah, we discussed this earlier and if recall correctly it was not settled
>> that all architectures would allow the use to extra space for the new
>> fields. It seems the case, which makes the adaptation for termios2 even 
>> easier.
>>
>> The question I have for kernel side is whether termios2 is fully compatible
>> with termios, meaning that if there is conner cases we need to handle in
>> userland.
>>
> 
> It depends on what you mean with "fully compatible."
> 
> Functionality-wise, the termios2 interfaces are a strict superset. There
> is not, however, any guarantee that struct kernel_termios2 *contains* a
> contiguous binary equivalent to struct kernel_termios (in fact, on most
> architectures, it doesn't.)

I mean that we can fully implement termios1 using termios2 by adjusting
the termios struct in syscall wrappers.  If it is a superset I think it
is fine.

> 
>>>
>>> My POC code deals with Alpha as well by falling back to the old interfaces 
>>> if necessary and possible, otherwise return error.
>>>
>>> Exporting termios2 to user space feels a bit odd at this stage as it would 
>>> only be usable as a fallback on old glibc. Call it kernel_termios2 at 
>>> least. ioctls using struct termios *must* be changed to kernel_termios 
>>> anyway!
>>>
>>
>> I still prefer to avoid export it to userland and make it usable through
>> default termios, as your wip does.  My understanding is new interfaces 
>> should be semantic equal to current one with the only deviation that
>> non-standard baudrates will handled as its values.  The only issue I 
>> can foresee is if POSIX starts to export new bauds value.
>>
> 
> ... which will be easy to handle: just define a B constant with the
> value equal to the baud rate.
> 
>   -hhpa

Right.

Re: [PATCH] ASoC: fsl_esai: Add pm runtime function

Hi
> 
> 
> On Thu, Apr 18, 2019 at 03:29:09AM +, S.j. Wang wrote:
> > In imx8 when systerm enter suspend state, the power of subsystem will
> > be off, the clock enable state will be lost and register configuration
> 
> Just for curiosity, we had similar situation on imx6sx, so we added
> suspend/resume with regcache. Why will the clock enable state be lost too?
> Does CCM on imx8 (might not be called CCM
> though) have any difference? What about clock rate settings?
> 
> > will be lost. So the driver need to enter runtime suspend state in
> > suspend.
> 
> > With this implementation the suspend function almost same as runtime
> > suspend function, so remove the suspend function, just use
> > pm_runtime_force_suspend instead, and same for the resume function.
> >
> > And also need to move clock enablement to runtime resume and clock
> > disablement to runtime suspend.
> 
> 
> > -static int fsl_esai_suspend(struct device *dev)
> > - regcache_cache_only(esai->regmap, true);
> > - regcache_mark_dirty(esai->regmap);
> 
> > +static int fsl_esai_runtime_resume(struct device *dev)
> >   regcache_cache_only(esai->regmap, false);
> > + regcache_mark_dirty(esai->regmap);
> 
> Why move the regcache_mark_dirty from suspend to resume?
> (I am not saying it's wrong but wondering if this is the  preferable way.)

Seems I should not do this change.
I will remain regcache_mark_dirty(esai->regmap); at its old place.

Best regards
Wang shengjiu

Re: [PATCH] ASoC: fsl_esai: Add pm runtime function

2019-04-18 Thread Mark Brown

On Thu, Apr 18, 2019 at 10:15:24AM +, S.j. Wang wrote:
> > On Thu, Apr 18, 2019 at 02:00:12AM -0700, Nicolin Chen wrote:
> > > On Thu, Apr 18, 2019 at 03:29:09AM +, S.j. Wang wrote:

> > > Just for curiosity, we had similar situation on imx6sx, so we added
> > > suspend/resume with regcache. Why will the clock enable state be lost
> > > too? Does CCM on imx8 (might not be called CCM
> > > though) have any difference? What about clock rate settings?

> > That sounds like a bug somewhere else - I'd expect that after resume the
> > clocking would be restored to the state it was in before suspend.

> There is limitation in our internal design. That is in imx8 the power of
> subsystem will be disabled at suspend, include the clock state , clock rate. 

Right, that's fairly normal but usually it'd be restored as part of the
resume process?

> This patch is to enable the pm runtime,  so I think it is better to move the
> clock operation to pm runtime,  and close the clock at suspend to reduce
> the power.

It's definitely good to turn the clock off as much as possible, yes.


signature.asc
Description: PGP signature

Re: [PATCH] ASoC: fsl_esai: Add pm runtime function

Hi

> 
> On Thu, Apr 18, 2019 at 02:00:12AM -0700, Nicolin Chen wrote:
> > On Thu, Apr 18, 2019 at 03:29:09AM +, S.j. Wang wrote:
> 
> > > In imx8 when systerm enter suspend state, the power of subsystem
> > > will be off, the clock enable state will be lost and register
> > > configuration
> 
> > Just for curiosity, we had similar situation on imx6sx, so we added
> > suspend/resume with regcache. Why will the clock enable state be lost
> > too? Does CCM on imx8 (might not be called CCM
> > though) have any difference? What about clock rate settings?
> 
> That sounds like a bug somewhere else - I'd expect that after resume the
> clocking would be restored to the state it was in before suspend.

There is limitation in our internal design. That is in imx8 the power of
subsystem will be disabled at suspend, include the clock state , clock rate. 

I should not add it in comments,  please ignore them, I will change the
description.

This patch is to enable the pm runtime,  so I think it is better to move the
clock operation to pm runtime,  and close the clock at suspend to reduce
the power.

Best regards
Wang shengjiu

Re: [PATCH 6/8] powerpc/eeh: Initialize EEH address cache earlier

2019-04-18 Thread Oliver O'Halloran

On Wed, 2019-03-20 at 13:58 +1100, Sam Bobroff wrote:
> The EEH address cache is currently initialized and populated by a
> single function: eeh_addr_cache_build().  While the initial population
> of the cache can only be done once resources are allocated,
> initialization (just setting up a spinlock) could be done much
> earlier.
> 
> So move the initialization step into a separate function and call it
> from a core_initcall (rather than a subsys initcall).
> 

> This will allow future work to make use of the cache during boot time
> PCI scanning.

What's the idea there? Checking for overlaps in the BAR assignments?

> Signed-off-by: Sam Bobroff 
> ---
>  arch/powerpc/include/asm/eeh.h  |  3 +++
>  arch/powerpc/kernel/eeh.c   |  2 ++
>  arch/powerpc/kernel/eeh_cache.c | 13 +++--
>  3 files changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
> index e217ccda55d0..791b9e6fcc45 100644
> --- a/arch/powerpc/include/asm/eeh.h
> +++ b/arch/powerpc/include/asm/eeh.h
> @@ -295,6 +295,7 @@ int __init eeh_ops_register(struct eeh_ops *ops);
>  int __exit eeh_ops_unregister(const char *name);
>  int eeh_check_failure(const volatile void __iomem *token);
>  int eeh_dev_check_failure(struct eeh_dev *edev);
> +void eeh_addr_cache_init(void);
>  void eeh_addr_cache_build(void);
>  void eeh_add_device_early(struct pci_dn *);
>  void eeh_add_device_tree_early(struct pci_dn *);
> @@ -362,6 +363,8 @@ static inline int eeh_check_failure(const volatile void 
> __iomem *token)
>  
>  #define eeh_dev_check_failure(x) (0)
>  
> +static inline void eeh_addr_cache_init(void) { }
> +
>  static inline void eeh_addr_cache_build(void) { }
>  
>  static inline void eeh_add_device_early(struct pci_dn *pdn) { }
> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index 3dcff29cb9b3..7a406d58d2c0 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -1219,6 +1219,8 @@ static int eeh_init(void)
>   list_for_each_entry_safe(hose, tmp, &hose_list, list_node)
>   eeh_dev_phb_init_dynamic(hose);
>  
> + eeh_addr_cache_init();
> +
>   /* Initialize EEH event */
>   return eeh_event_init();
>  }
> diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
> index 9c68f0837385..f93dd5cf6a39 100644
> --- a/arch/powerpc/kernel/eeh_cache.c
> +++ b/arch/powerpc/kernel/eeh_cache.c
> @@ -267,6 +267,17 @@ void eeh_addr_cache_rmv_dev(struct pci_dev *dev)
>   spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
>  }
>  
> +/**
> + * eeh_addr_cache_init - Initialize a cache of I/O addresses
> + *
> + * Initialize a cache of pci i/o addresses.  This cache will be used to
> + * find the pci device that corresponds to a given address.
> + */
> +void eeh_addr_cache_init(void)
> +{
> + spin_lock_init(&pci_io_addr_cache_root.piar_lock);
> +}

You could move this out of the pci_io_addr_cache structure and use
DEFINE_SPINLOCK() too. We might even be able to get rid of the hand-
rolled interval tree in eeh_cache.c in favour of the generic
implementation (see mm/interval_tree.c). I'm pretty sure the generic
one is a drop-in replacement, but I haven't had a chance to have a
detailed look to see if there's any differences in behaviour.

> +
>  /**
>   * eeh_addr_cache_build - Build a cache of I/O addresses
>   *
> @@ -282,8 +293,6 @@ void eeh_addr_cache_build(void)
>   struct eeh_dev *edev;
>   struct pci_dev *dev = NULL;
>  
> - spin_lock_init(&pci_io_addr_cache_root.piar_lock);
> -
>   for_each_pci_dev(dev) {
>   pdn = pci_get_pdn_by_devfn(dev->bus, dev->devfn);
>   if (!pdn)

Re: [PATCH 5/8] powerpc/eeh: Add eeh_show_enabled()

2019-04-18 Thread Oliver O'Halloran

On Wed, 2019-03-20 at 13:58 +1100, Sam Bobroff wrote:
> Move the EEH enabled message into it's own function so that future
> work can call it from multiple places.
> 
> Signed-off-by: Sam Bobroff 
> ---
>  arch/powerpc/include/asm/eeh.h |  3 +++
>  arch/powerpc/kernel/eeh.c  | 16 +++-
>  2 files changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
> index fe4cf7208890..e217ccda55d0 100644
> --- a/arch/powerpc/include/asm/eeh.h
> +++ b/arch/powerpc/include/asm/eeh.h
> @@ -289,6 +289,7 @@ struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe);
>  
>  struct eeh_dev *eeh_dev_init(struct pci_dn *pdn);
>  void eeh_dev_phb_init_dynamic(struct pci_controller *phb);
> +void eeh_show_enabled(void);
>  void eeh_probe_devices(void);
>  int __init eeh_ops_register(struct eeh_ops *ops);
>  int __exit eeh_ops_unregister(const char *name);
> @@ -338,6 +339,8 @@ static inline bool eeh_enabled(void)
>  return false;
>  }
>  
> +static inline void eeh_show_enabled(void) { }
> +
>  static inline bool eeh_phb_enabled(void)
>  {
>   return false;
> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index b14d89547895..3dcff29cb9b3 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -163,6 +163,16 @@ static int __init eeh_setup(char *str)
>  }
>  __setup("eeh=", eeh_setup);
>  
> +void eeh_show_enabled(void)
> +{
> + if (eeh_has_flag(EEH_FORCE_DISABLED))
> + pr_info("EEH: PCI Enhanced I/O Error Handling DISABLED (by 
> eeh=off)\n");

The allcaps looks kind of stupid.

> + else if (eeh_enabled())
> + pr_info("EEH: PCI Enhanced I/O Error Handling ENABLED (capable 
> adapter found)\n");
> + else
> + pr_info("EEH: PCI Enhanced I/O Error Handling DISABLED (no 
> capable adapter found)\n");

If we really have to keep these messages then make it say "no EEH
capable buses found" or something. EEH support has absolutely nothing
to do with the adapters and I'm not even sure we can get the "DISABLED"
message on PowerNV since the root port will always be there.

> +}
> +
>  /*
>   * This routine captures assorted PCI configuration space data
>   * for the indicated PCI device, and puts them into a buffer
> @@ -1166,11 +1176,7 @@ void eeh_probe_devices(void)
>   pdn = hose->pci_data;
>   traverse_pci_dn(pdn, eeh_ops->probe, NULL);
>   }
> - if (eeh_enabled())
> - pr_info("EEH: PCI Enhanced I/O Error Handling Enabled\n");
> - else
> - pr_info("EEH: No capable adapters found\n");
> -
> + eeh_show_enabled();
>  }
>  
>  /**

Re: [PATCH 3/8] powerpc/eeh: Convert PNV_PHB_FLAG_EEH to global flag

2019-04-18 Thread Oliver O'Halloran

On Wed, 2019-03-20 at 13:58 +1100, Sam Bobroff wrote:
> The PHB flag, PNV_PHB_FLAG_EEH, is set (on PowerNV) individually on
> each PHB once the EEH subsystem is ready. It is the only use of the
> flags member of the phb struct.
> 
> However there is no need to store this separately on each PHB, so
> convert it to a global flag. For symmetry, the flag is now also set
> for pSeries; although it is currently unused it may be useful in the
> future.

I'd drop this patch and keep it as a seperate flag. Making this a
global flag doesn't really buy us anything either. It's also worth
remembering that we do have virtual PHBs, like the NVLink ones, that
don't have real EEH support and we should be doing something more
intelligent about that.

If you are going to remove the PNV_PHB flag then i would look at moving
that flag into the generic pci_controller structure that we use for
tracking PHBs since that would give us a way to toggle EEH support on a
per PHB basis on pseries and powernv.

> Signed-off-by: Sam Bobroff 
> ---
>  arch/powerpc/include/asm/eeh.h   | 11 +++
>  arch/powerpc/platforms/powernv/eeh-powernv.c | 14 +++---
>  arch/powerpc/platforms/powernv/pci.c |  7 +++
>  arch/powerpc/platforms/powernv/pci.h |  2 --
>  arch/powerpc/platforms/pseries/pci.c |  4 
>  5 files changed, 21 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
> index 3613a56281f2..fe4cf7208890 100644
> --- a/arch/powerpc/include/asm/eeh.h
> +++ b/arch/powerpc/include/asm/eeh.h
> @@ -43,6 +43,7 @@ struct pci_dn;
>  #define EEH_VALID_PE_ZERO0x10/* PE#0 is valid */
>  #define EEH_ENABLE_IO_FOR_LOG0x20/* Enable IO for log
>  */
>  #define EEH_EARLY_DUMP_LOG   0x40/* Dump log immediately  */
> +#define EEH_PHB_ENABLED  0x80/* PHB recovery uses EEH
>  */
>  
>  /*
>   * Delay for PE reset, all in ms
> @@ -245,6 +246,11 @@ static inline bool eeh_enabled(void)
>   return eeh_has_flag(EEH_ENABLED) && !eeh_has_flag(EEH_FORCE_DISABLED);
>  }
>  
> +static inline bool eeh_phb_enabled(void)
> +{
> + return eeh_has_flag(EEH_PHB_ENABLED);
> +}
> +
>  static inline void eeh_serialize_lock(unsigned long *flags)
>  {
>   raw_spin_lock_irqsave(&confirm_error_lock, *flags);
> @@ -332,6 +338,11 @@ static inline bool eeh_enabled(void)
>  return false;
>  }
>  
> +static inline bool eeh_phb_enabled(void)
> +{
> + return false;
> +}
> +
>  static inline void eeh_probe_devices(void) { }
>  
>  static inline void *eeh_dev_init(struct pci_dn *pdn, void *data)
> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c 
> b/arch/powerpc/platforms/powernv/eeh-powernv.c
> index 6fc1a463b796..f0a95f663810 100644
> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
> @@ -264,22 +264,14 @@ int pnv_eeh_post_init(void)
>   return ret;
>   }
>  
> - if (!eeh_enabled())
> + if (eeh_enabled())
> + eeh_add_flag(EEH_PHB_ENABLED);
> + else
>   disable_irq(eeh_event_irq);
>  
>   list_for_each_entry(hose, &hose_list, list_node) {
>   phb = hose->private_data;
>  
> - /*
> -  * If EEH is enabled, we're going to rely on that.
> -  * Otherwise, we restore to conventional mechanism
> -  * to clear frozen PE during PCI config access.
> -  */
> - if (eeh_enabled())
> - phb->flags |= PNV_PHB_FLAG_EEH;
> - else
> - phb->flags &= ~PNV_PHB_FLAG_EEH;
> -
>   /* Create debugfs entries */
>  #ifdef CONFIG_DEBUG_FS
>   if (phb->has_dbgfs || !phb->dbgfs)
> diff --git a/arch/powerpc/platforms/powernv/pci.c 
> b/arch/powerpc/platforms/powernv/pci.c
> index 307181fd8a17..d2b50f3bf6b1 100644
> --- a/arch/powerpc/platforms/powernv/pci.c
> +++ b/arch/powerpc/platforms/powernv/pci.c
> @@ -717,10 +717,9 @@ int pnv_pci_cfg_write(struct pci_dn *pdn,
>  static bool pnv_pci_cfg_check(struct pci_dn *pdn)
>  {
>   struct eeh_dev *edev = NULL;
> - struct pnv_phb *phb = pdn->phb->private_data;
>  
>   /* EEH not enabled ? */
> - if (!(phb->flags & PNV_PHB_FLAG_EEH))
> + if (!eeh_phb_enabled())
>   return true;
>  
>   /* PE reset or device removed ? */
> @@ -761,7 +760,7 @@ static int pnv_pci_read_config(struct pci_bus *bus,
>  
>   ret = pnv_pci_cfg_read(pdn, where, size, val);
>   phb = pdn->phb->private_data;
> - if (phb->flags & PNV_PHB_FLAG_EEH && pdn->edev) {
> + if (eeh_phb_enabled() && pdn->edev) {
>   if (*val == EEH_IO_ERROR_VALUE(size) &&
>   eeh_dev_check_failure(pdn->edev))
>  return PCIBIOS_DEVICE_NOT_FOUND;
> @@ -789,7 +788,7 @@ static int pnv_pci_write_config(struct pci_bus *bus,
>

Re: [PATCH v3 21/21] docs: hwmon: Add an index file and rename docs to *.rst

2019-04-18 Thread Mauro Carvalho Chehab

Em Wed, 17 Apr 2019 10:47:28 -0700
Guenter Roeck  escreveu:

> On Wed, Apr 17, 2019 at 10:43:37AM -0700, Guenter Roeck wrote:
> > On Wed, Apr 17, 2019 at 02:22:15PM -0300, Mauro Carvalho Chehab wrote:  
> > > Em Wed, 17 Apr 2019 14:13:52 -0300
> > > Mauro Carvalho Chehab  escreveu:
> > >   
> > > > Em Wed, 17 Apr 2019 09:47:41 -0700
> > > > Guenter Roeck  escreveu:
> > > >   
> > > > > On Wed, Apr 17, 2019 at 06:46:29AM -0300, Mauro Carvalho Chehab 
> > > > > wrote:
> > > > > > Now that all files were converted to ReST format, rename them
> > > > > > and add an index.
> > > > > > 
> > > > > > Signed-off-by: Mauro Carvalho Chehab 
> > > > > > Acked-by: Liviu Dudau   
> > > > > 
> > > > > I applied all patches except this one, which fails due to a conflict 
> > > > > in
> > > > > ab8500. I also notice that this file has not been touched by your 
> > > > > series,
> > > > > which is odd. At the same time, patch 20/21 is missing from your 
> > > > > series,
> > > > > and has been missing all along. Does the missing patch possibly touch
> > > > > Documentation/hwmon/ab8500 ?
> > > > 
> > > > Patch 20/21 is the biggest one. Maybe vger rejected it either due to
> > > > its size or due to the number of c/c.
> > > > 
> > > > Just bounced it to you. Please let me know if you didn't receive it
> > > > yet.  
> > > 
> > > Btw, LKML got it:
> > > 
> > > https://lore.kernel.org/lkml/2a52363a5aaeea10e186ead8570503ea648e.1555494108.git.mchehab+sams...@kernel.org/
> > >   
> > patchwork didn't get it (or didn't accept it). I got it now.
> > All patches applied, and pushed out to hwmon-next.
> > 
> > We have one (new) unconverted file left - Documentation/hwmon/lochnagar.  
> 
> Plus ir38064 and isl68137. Lots of new drivers recently.

Ok, just sent a patch for those three new files. I wrote a more
detailed description about what steps I followed at the conversion
of those tree files, and why I did it. 

Hopefully, this would help hwmon developers
that may already be preparing a new driver for submission.

Thanks,
Mauro

Re: [PATCH V3 1/2] ASoC: fsl_asrc: replace the process_option table with function

Hi

> 
> On Thu, Apr 18, 2019 at 08:50:48AM +, S.j. Wang wrote:
> > > And this is according to IMX6DQRM:
> > > Limited support for the case when output sampling rates is
> > > between 8kHz and 30kHz. The limitation is the supported ratio
> > > (Fsin/Fsout) range as between 1/24 to 8
> > >
> > > This should cover your 8.125 condition already, even if having an
> > > outrate range between [8KHz, 30KHz] check, since an outrate above
> > > 30KHz will not have an inrate bigger than 8.125 times of it, given
> > > the maximum input rate is 192KHz.
> > >
> > > So I think that we can just drop that 8.125 condition from your
> > > change and there's no need to error out any more.
> > >
> > No, if outrate=8kHz,  inrate > 88.2kHz, these cases are not supported.
> > This is not covered by
> >
> > if ((outrate > 8000 && outrate < 3) &&
> > (outrate/inrate > 24 || inrate/outrate > 8)) {
> 
> Good catch. The range should be [8KHz, 30KHz] vs. (8KHz, 32KHz) in the
> code. Then I think the fix should be at both lines:
> 
> - if ((outrate > 8000 && outrate < 3) &&
> - (outrate/inrate > 24 || inrate/outrate > 8)) {
> + if ((outrate >= 8000 && outrate =< 3) &&
> + (outrate > 24 * inrate || inrate > 8 * outrate)) {
> 
> Overall, I think we should fix this instead of adding an extra one, since it 
> is
> very likely saying the same thing.

Actually if outrate < 8kHz, there will be issue too.

Best regards
Wang shengjiu

Re: [PATCH] ASoC: fsl_esai: Add pm runtime function

2019-04-18 Thread Mark Brown

On Thu, Apr 18, 2019 at 02:00:12AM -0700, Nicolin Chen wrote:
> On Thu, Apr 18, 2019 at 03:29:09AM +, S.j. Wang wrote:

> > In imx8 when systerm enter suspend state, the power of subsystem will
> > be off, the clock enable state will be lost and register configuration

> Just for curiosity, we had similar situation on imx6sx, so we
> added suspend/resume with regcache. Why will the clock enable
> state be lost too? Does CCM on imx8 (might not be called CCM
> though) have any difference? What about clock rate settings?

That sounds like a bug somewhere else - I'd expect that after resume the
clocking would be restored to the state it was in before suspend.


signature.asc
Description: PGP signature

Re: [PATCH V3 1/2] ASoC: fsl_asrc: replace the process_option table with function

On Thu, Apr 18, 2019 at 08:50:48AM +, S.j. Wang wrote:
> > And this is according to IMX6DQRM:
> > Limited support for the case when output sampling rates is
> > between 8kHz and 30kHz. The limitation is the supported ratio
> > (Fsin/Fsout) range as between 1/24 to 8
> > 
> > This should cover your 8.125 condition already, even if having an outrate
> > range between [8KHz, 30KHz] check, since an outrate above 30KHz will not
> > have an inrate bigger than 8.125 times of it, given the maximum input rate
> > is 192KHz.
> > 
> > So I think that we can just drop that 8.125 condition from your change and
> > there's no need to error out any more.
> > 
> No, if outrate=8kHz,  inrate > 88.2kHz, these cases are not supported. 
> This is not covered by
> 
> if ((outrate > 8000 && outrate < 3) &&
> (outrate/inrate > 24 || inrate/outrate > 8)) {

Good catch. The range should be [8KHz, 30KHz] vs. (8KHz, 32KHz)
in the code. Then I think the fix should be at both lines:

- if ((outrate > 8000 && outrate < 3) &&
- (outrate/inrate > 24 || inrate/outrate > 8)) {
+ if ((outrate >= 8000 && outrate =< 3) &&
+ (outrate > 24 * inrate || inrate > 8 * outrate)) {

Overall, I think we should fix this instead of adding an extra
one, since it is very likely saying the same thing.

Re: [PATCH RFT V2 1/8] clk: divider: add explicit big endian support

On Thu, 18 Apr 2019 at 01:32, Stephen Boyd  wrote:
>
> Quoting Jonas Gorski (2019-04-15 03:10:39)
> > @@ -370,7 +388,7 @@ static long clk_divider_round_rate(struct clk_hw *hw, 
> > unsigned long rate,
> > if (divider->flags & CLK_DIVIDER_READ_ONLY) {
> > u32 val;
> >
> > -   val = clk_readl(divider->reg) >> divider->shift;
> > +   val = clk_div_readl(divider->reg) >> divider->shift;
>
> Good deal that kbuild running sparse found that this was supposed to be
> divider and not divider->reg. If you can fix that and remove the else in
> all the basic type readl wrappers then this series looks good to merge
> for me.

Yeah, I'm quite glad about that. I was first confused why I didn't
catch it in my test build, but then remembered only C++ complains
about implicit void* casts (IIRC). I have it already fixed locally and
did a sparse build myself, was just waiting for at least one non-bot
comment before resending the series.

Regards
Jonas

Re: [PATCH] ASoC: fsl_esai: Add pm runtime function

On Thu, Apr 18, 2019 at 03:29:09AM +, S.j. Wang wrote:
> In imx8 when systerm enter suspend state, the power of subsystem will
> be off, the clock enable state will be lost and register configuration

Just for curiosity, we had similar situation on imx6sx, so we
added suspend/resume with regcache. Why will the clock enable
state be lost too? Does CCM on imx8 (might not be called CCM
though) have any difference? What about clock rate settings?

> will be lost. So the driver need to enter runtime suspend state in
> suspend.

> With this implementation the suspend function almost same as runtime
> suspend function, so remove the suspend function, just use
> pm_runtime_force_suspend instead, and same for the resume function.
> 
> And also need to move clock enablement to runtime resume and clock
> disablement to runtime suspend.

> -static int fsl_esai_suspend(struct device *dev)
> - regcache_cache_only(esai->regmap, true);
> - regcache_mark_dirty(esai->regmap);

> +static int fsl_esai_runtime_resume(struct device *dev)
>   regcache_cache_only(esai->regmap, false);
> + regcache_mark_dirty(esai->regmap);

Why move the regcache_mark_dirty from suspend to resume?
(I am not saying it's wrong but wondering if this is the
 preferable way.)

Re: [PATCH V3 1/2] ASoC: fsl_asrc: replace the process_option table with function

Hi

> 
> 
> On Thu, Apr 18, 2019 at 02:37:03AM +, S.j. Wang wrote:
> > > Here:
> > > > + /* Does not support cases: Tsout > 8.125 * Tsin */
> > > > + if (inrate * 8 > 65 * outrate)
> 
> Though it might not matter any more (see my last comments), it should be
> "inrate > 8.125 * outrate" in the comments.
> 
> > > > + return -EINVAL;
> > > And here:
> > > > + ret = fsl_asrc_sel_proc(inrate, outrate, &pre_proc, &post_proc);
> > > > + if (ret) {
> > > > + pair_err("No supported pre-processing options\n");
> > > > + return ret;
> > > > + }
> > >
> > > Instead of a general message, I was thinking of a more specific one
> > > by telling users that the ratio between the two rates isn't
> > > supported -- something similar to what I suggested previously:
> > >
> > > pair_err("Does not support %d (input) > 8.125 * %d (output)\n",
> > >  outrate, inrate);
> > >
> 
> > In fsl_asrc_sel_proc,  we can't call the pair_err for there is no
> > struct fsl_asrc_pair *pair in the argument. Do you think we need to
> > add this argument?
> 
> I's thinking of adding it to the top of fsl_asrc_config_pair() as a part of
> inrate-outrate-validation, however, I found that actually we already have a
> similar check in the early routine:
> if ((outrate > 8000 && outrate < 3) &&
> (outrate/inrate > 24 || inrate/outrate > 8)) {
> pair_err("exceed supported ratio range [1/24, 8] for \
>  inrate/outrate: %d/%d\n", inrate, outrate);
> return -EINVAL;
> }
> 
> And this is according to IMX6DQRM:
> Limited support for the case when output sampling rates is
> between 8kHz and 30kHz. The limitation is the supported ratio
> (Fsin/Fsout) range as between 1/24 to 8
> 
> This should cover your 8.125 condition already, even if having an outrate
> range between [8KHz, 30KHz] check, since an outrate above 30KHz will not
> have an inrate bigger than 8.125 times of it, given the maximum input rate
> is 192KHz.
> 
> So I think that we can just drop that 8.125 condition from your change and
> there's no need to error out any more.
> 
No, if outrate=8kHz,  inrate > 88.2kHz, these cases are not supported. 
This is not covered by

if ((outrate > 8000 && outrate < 3) &&
(outrate/inrate > 24 || inrate/outrate > 8)) {

> However, we do need a patch to fix a potential rounding issue:
> -   (outrate/inrate > 24 || inrate/outrate > 8)) {
> +   (outrate > 24 * inrate || inrate > 8 * outrate)) {
> 
> Should fix the missing whitespace also. And it will be needed to send to
> stable kernel too. Will you help submit a change?
> 
> Thanks

Re: [PATCH V3 1/2] ASoC: fsl_asrc: replace the process_option table with function

On Thu, Apr 18, 2019 at 02:37:03AM +, S.j. Wang wrote:
> > Here:
> > > + /* Does not support cases: Tsout > 8.125 * Tsin */
> > > + if (inrate * 8 > 65 * outrate)

Though it might not matter any more (see my last comments),
it should be "inrate > 8.125 * outrate" in the comments.

> > > + return -EINVAL;
> > And here:
> > > + ret = fsl_asrc_sel_proc(inrate, outrate, &pre_proc, &post_proc);
> > > + if (ret) {
> > > + pair_err("No supported pre-processing options\n");
> > > + return ret;
> > > + }
> > 
> > Instead of a general message, I was thinking of a more specific one by
> > telling users that the ratio between the two rates isn't supported --
> > something similar to what I suggested previously:
> > 
> > pair_err("Does not support %d (input) > 8.125 * %d (output)\n",
> >  outrate, inrate);
> > 

> In fsl_asrc_sel_proc,  we can't call the pair_err for there is no
> struct fsl_asrc_pair *pair in the argument. Do you think we need to
> add this argument?

I's thinking of adding it to the top of fsl_asrc_config_pair()
as a part of inrate-outrate-validation, however, I found that
actually we already have a similar check in the early routine:
if ((outrate > 8000 && outrate < 3) &&
(outrate/inrate > 24 || inrate/outrate > 8)) {
pair_err("exceed supported ratio range [1/24, 8] for \
 inrate/outrate: %d/%d\n", inrate, outrate);
return -EINVAL;
}

And this is according to IMX6DQRM:
Limited support for the case when output sampling rates is
between 8kHz and 30kHz. The limitation is the supported ratio
(Fsin/Fsout) range as between 1/24 to 8

This should cover your 8.125 condition already, even if having
an outrate range between [8KHz, 30KHz] check, since an outrate
above 30KHz will not have an inrate bigger than 8.125 times of
it, given the maximum input rate is 192KHz.

So I think that we can just drop that 8.125 condition from your
change and there's no need to error out any more.

However, we do need a patch to fix a potential rounding issue:
-   (outrate/inrate > 24 || inrate/outrate > 8)) {
+   (outrate > 24 * inrate || inrate > 8 * outrate)) {

Should fix the missing whitespace also. And it will be needed
to send to stable kernel too. Will you help submit a change?

Thanks

Re: Linux 5.1-rc5

2019-04-18 Thread Martin Schwidefsky

On Wed, 17 Apr 2019 09:57:01 -0700
Linus Torvalds  wrote:

> On Wed, Apr 17, 2019 at 1:02 AM Martin Schwidefsky
>  wrote:
> >
> > Grumpf, that does *not* work. For gup the table entries may be read only
> > once. Now I remember why I open-coded p4d_offset, pud_offset and pmd_offset
> > in arch/s390/mm/gup.c, to avoid to read the table entries twice.
> > It will be hard to use the common gup code after all.  
> 
> Hmm. The common gup code generally should do the "read only once"
> thing too (since by definition the gup-fast case is done without
> locking), although it's probably the case that most architectures
> simply don't care.
> 
> What would it require for the generic code to work for s390?

The problematic lines in the generic gup code are these three:

1845:   pmdp = pmd_offset(&pud, addr);
1888:   pudp = pud_offset(&p4d, addr);
1916:   p4dp = p4d_offset(&pgd, addr);

Passing the pointer of a *copy* of a page table entry to pxd_offset() does
not work with the page table folding on s390. The pxd_offset() function
on s390 have to make a choice, either return the dereferenced value behind
the passed pointer (that works) or return the original page table pointer
if the table level is folded (that does not work).

To fix this we would need three new helpers pmd_offset_orig, pud_offset_orig
and p4d_offset_orig, their generic definition would look like this:

#define p4d_offset_orig(pgdp, pgd, address)p4d_offset(&pgd, address)
#define pud_offset_orig(p4dp, p4d, address)pud_offset(&p4d, address)
#define pmd_offset_orig(pudp, pud, address)pmd_offset(&pud, address)

For the s390 definition see the following branch:

git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git generic-gup

A quick test with this branch shows everything working normally.
Keeping my fingers crossed that I did not miss anything.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

Re: [PATCH v2 10/10] powerpc/32s: Implement Kernel Userspace Access Protection

Christophe Leroy  writes:
> diff --git a/arch/powerpc/include/asm/book3s/32/kup.h 
> b/arch/powerpc/include/asm/book3s/32/kup.h
> index 5f97c742ca71..b3560b2de435 100644
> --- a/arch/powerpc/include/asm/book3s/32/kup.h
> +++ b/arch/powerpc/include/asm/book3s/32/kup.h
> @@ -37,6 +37,113 @@
...
> +
> +static inline void allow_user_access(void __user *to, const void __user 
> *from, u32 size)
> +{
> + u32 addr = (__force u32)to;
> + u32 end = min(addr + size, TASK_SIZE);
> +
> + if (!addr || addr >= TASK_SIZE || !size)
> + return;
> +
> + current->thread.kuap = (addr & 0xf000) | end - 1) >> 28) + 1) & 
> 0xf);
> + kuap_update_sr(mfsrin(addr) & ~SR_KS, addr, end);   /* Clear Ks */
> +}

When rebasing on my v6 I changed the above to:

static inline void allow_user_access(void __user *to, const void __user *from, 
u32 size)
{
u32 addr, end;

if (__builtin_constant_p(to) && to == NULL)
return;

addr = (__force u32)to;

if (!addr || addr >= TASK_SIZE || !size)
return;

end = min(addr + size, TASK_SIZE);
current->thread.kuap = (addr & 0xf000) | end - 1) >> 28) + 1) & 
0xf);
kuap_update_sr(mfsrin(addr) & ~SR_KS, addr, end);   /* Clear Ks */
}

Which I think achieves the same result. It does boot :)

> +
> +static inline void prevent_user_access(void __user *to, const void __user 
> *from, u32 size)
> +{
> + u32 addr = (__force u32)to;
> + u32 end = min(addr + size, TASK_SIZE);
> +
> + if (!addr || addr >= TASK_SIZE || !size)
> + return;
> +
> + current->thread.kuap = 0;
> + kuap_update_sr(mfsrin(addr) | SR_KS, addr, end);/* set Ks */
> +}
> +
> +static inline void allow_read_from_user(const void __user *from, unsigned 
> long size)
> +{
> +}

And I dropped that.

cheers

Re: [PATCH v2 07/10] powerpc/8xx: Add Kernel Userspace Access Protection

Christophe Leroy  writes:
> diff --git a/arch/powerpc/include/asm/nohash/32/kup-8xx.h 
> b/arch/powerpc/include/asm/nohash/32/kup-8xx.h
> new file mode 100644
> index ..a44cc6c1b901
> --- /dev/null
> +++ b/arch/powerpc/include/asm/nohash/32/kup-8xx.h
> @@ -0,0 +1,68 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_POWERPC_KUP_8XX_H_
> +#define _ASM_POWERPC_KUP_8XX_H_
> +
> +#include 
> +
> +#ifdef CONFIG_PPC_KUAP
> +
> +#ifdef __ASSEMBLY__
> +
> +.macro kuap_save_and_locksp, thread, gpr1, gpr2, gpr3
> + lis \gpr2, MD_APG_KUAP@h/* only APG0 and APG1 are used */
> + mfspr   \gpr1, SPRN_MD_AP
> + mtspr   SPRN_MD_AP, \gpr2
> + stw \gpr1, STACK_REGS_KUAP(\sp)
> +.endm
> +
> +.macro kuap_restore  sp, current, gpr1, gpr2, gpr3
> + lwz \gpr1, STACK_REGS_KUAP(\sp)
> + mtspr   SPRN_MD_AP, \gpr1
> +.endm
> +
> +.macro kuap_checkcurrent, gpr
> +#ifdef CONFIG_PPC_KUAP_DEBUG
> + mfspr   \gpr, SPRN_MD_AP
> + rlwinm  \gpr, \gpr, 16, 0x
> +999: twnei   \gpr, MD_APG_KUAP@h
> + EMIT_BUG_ENTRY 999b, __FILE__, __LINE__, (BUGFLAG_WARNING | 
> BUGFLAG_ONCE)
> +#endif
> +.endm
> +
> +#else /* !__ASSEMBLY__ */
> +
> +#include 
> +
> +static inline void allow_user_access(void __user *to, const void __user 
> *from,
> +  unsigned long size)
> +{
> + mtspr(SPRN_MD_AP, MD_APG_INIT);
> +}
> +
> +static inline void prevent_user_access(void __user *to, const void __user 
> *from,
> +unsigned long size)
> +{
> + mtspr(SPRN_MD_AP, MD_APG_KUAP);
> +}
> +
> +static inline void allow_read_from_user(const void __user *from, unsigned 
> long size)
> +{
> + allow_user_access(NULL, from, size);
> +}
> +
> +static inline void allow_write_to_user(void __user *to, unsigned long size)
> +{
> + allow_user_access(to, NULL, size);
> +}

I dropped the two above functions when rebasing on my v6.

cheers

[PATCH v6 10/10] powerpc/mm: Detect bad KUAP faults

When KUAP is enabled we have logic to detect page faults that occur
outside of a valid user access region and are blocked by the AMR.

What we don't have at the moment is logic to detect a fault *within* a
valid user access region, that has been incorrectly blocked by AMR.
This is not meant to ever happen, but it can if we incorrectly
save/restore the AMR, or if the AMR was overwritten for some other
reason.

Currently if that happens we assume it's just a regular fault that
will be corrected by handling the fault normally, so we just return.
But there is nothing the fault handling code can do to fix it, so the
fault just happens again and we spin forever, leading to soft lockups.

So add some logic to detect that case and WARN() if we ever see it.
Arguably it should be a BUG(), but it's more polite to fail the access
and let the kernel continue, rather than taking down the box. There
should be no data integrity issue with failing the fault rather than
BUG'ing, as we're just going to disallow an access that should have
been allowed.

To make the code a little easier to follow, unroll the condition at
the end of bad_kernel_fault() and comment each case, before adding the
call to bad_kuap_fault().

Signed-off-by: Michael Ellerman 
---

v6: Simplify bad_kuap_fault() as suggested by Christophe.
---
 .../powerpc/include/asm/book3s/64/kup-radix.h |  6 +
 arch/powerpc/include/asm/kup.h|  1 +
 arch/powerpc/mm/fault.c   | 25 ---
 3 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h 
b/arch/powerpc/include/asm/book3s/64/kup-radix.h
index 6d6628424134..7679bd0c5af0 100644
--- a/arch/powerpc/include/asm/book3s/64/kup-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h
@@ -95,6 +95,12 @@ static inline void prevent_user_access(void __user *to, 
const void __user *from,
set_kuap(AMR_KUAP_BLOCKED);
 }
 
+static inline bool bad_kuap_fault(struct pt_regs *regs, bool is_write)
+{
+   return WARN(mmu_has_feature(MMU_FTR_RADIX_KUAP) &&
+   (regs->kuap & (is_write ? AMR_KUAP_BLOCK_WRITE : 
AMR_KUAP_BLOCK_READ)),
+   "Bug: %s fault blocked by AMR!", is_write ? "Write" : 
"Read");
+}
 #endif /* CONFIG_PPC_KUAP */
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/powerpc/include/asm/kup.h b/arch/powerpc/include/asm/kup.h
index d7312defbe1c..28ad4654eed2 100644
--- a/arch/powerpc/include/asm/kup.h
+++ b/arch/powerpc/include/asm/kup.h
@@ -26,6 +26,7 @@ static inline void allow_user_access(void __user *to, const 
void __user *from,
 unsigned long size) { }
 static inline void prevent_user_access(void __user *to, const void __user 
*from,
   unsigned long size) { }
+static inline bool bad_kuap_fault(struct pt_regs *regs, bool is_write) { 
return false; }
 #endif /* CONFIG_PPC_KUAP */
 
 static inline void allow_read_from_user(const void __user *from, unsigned long 
size)
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 463d1e9d026e..b5d3578d9f65 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -44,6 +44,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static inline bool notify_page_fault(struct pt_regs *regs)
 {
@@ -224,7 +225,7 @@ static int mm_fault_error(struct pt_regs *regs, unsigned 
long addr,
 
 /* Is this a bad kernel fault ? */
 static bool bad_kernel_fault(struct pt_regs *regs, unsigned long error_code,
-unsigned long address)
+unsigned long address, bool is_write)
 {
int is_exec = TRAP(regs) == 0x400;
 
@@ -235,6 +236,9 @@ static bool bad_kernel_fault(struct pt_regs *regs, unsigned 
long error_code,
address >= TASK_SIZE ? "exec-protected" : 
"user",
address,
from_kuid(&init_user_ns, current_uid()));
+
+   // Kernel exec fault is always bad
+   return true;
}
 
if (!is_exec && address < TASK_SIZE && (error_code & DSISR_PROTFAULT) &&
@@ -244,7 +248,22 @@ static bool bad_kernel_fault(struct pt_regs *regs, 
unsigned long error_code,
from_kuid(&init_user_ns, current_uid()));
}
 
-   return is_exec || (address >= TASK_SIZE) || 
!search_exception_tables(regs->nip);
+   // Kernel fault on kernel address is bad
+   if (address >= TASK_SIZE)
+   return true;
+
+   // Fault on user outside of certain regions (eg. copy_tofrom_user()) is 
bad
+   if (!search_exception_tables(regs->nip))
+   return true;
+
+   // Read/write fault in a valid region (the exception table search passed
+   // above), but blocked by KUAP is bad, it can never succeed.
+   if (bad_kuap_fault(regs, is_write))
+   return true;
+
+   // What's left? Kernel fault on us

[PATCH v6 09/10] powerpc/64s: Implement KUAP for Radix MMU

Kernel Userspace Access Prevention utilises a feature of the Radix MMU
which disallows read and write access to userspace addresses. By
utilising this, the kernel is prevented from accessing user data from
outside of trusted paths that perform proper safety checks, such as
copy_{to/from}_user() and friends.

Userspace access is disabled from early boot and is only enabled when
performing an operation like copy_{to/from}_user(). The register that
controls this (AMR) does not prevent userspace from accessing itself,
so there is no need to save and restore when entering and exiting
userspace.

When entering the kernel from the kernel we save AMR and if it is not
blocking user access (because eg. we faulted doing a user access) we
reblock user access for the duration of the exception (ie. the page
fault) and then restore the AMR when returning back to the kernel.

This feature can be tested by using the lkdtm driver (CONFIG_LKDTM=y)
and performing the following:

  # (echo ACCESS_USERSPACE) > [debugfs]/provoke-crash/DIRECT

If enabled, this should send SIGSEGV to the thread.

We also add paranoid checking of AMR in switch and syscall return
under CONFIG_PPC_KUAP_DEBUG.

Co-authored-by: Michael Ellerman 
Signed-off-by: Russell Currey 
Signed-off-by: Michael Ellerman 
---

v6:
 - Add an isync() in setup_kuap() as noticed by Christophe.
 - Fixup the arguments to kuap_check_amr.
 - Check for NULL to/from in allow_user_access() and choose the right
   access permissions.
 - setup_kuap() can't be __init.
 - Minor whitespace fixups.

v5:
 - On kernel entry check if the AMR is already blocking user access
   and if so don't do the mtspr again, because it's slow (pointed out
   by Nick) (in kuap_save_amr_and_lock).
 - Rework the constants to make the asm a bit cleaner and avoid any
   hard coded shifts.
 - Selectively disable read or write, we don't support separately
   nesting read/write access (use allow_user_access() instead) and
   shouldn't need to (famous last words).
 - Add isync() before & after setting AMR in set_kuap() as per the
   ISA. We'll investigate whether they are both really needed in
   future.
 - Don't touch the AMR in hmi_exception_early() it never goes to
   virtual mode.
 - Check the full value in kuap_check_amr

v4:
 - Drop the unused paca flags.
 - Zero the UAMOR to be safe.
 - Save the AMR when we enter the kernel from the kernel and then lock
   it again.
 - Restore on the way back to the kernel.
 - That means we handle nesting of interrupts properly, ie. we are
   protected inside the page fault handler caused by a user access.
 - Add paranoid checking of AMR in switch and syscall return.
 - Add an isync() to prevent_user_access()
---
 .../powerpc/include/asm/book3s/64/kup-radix.h | 102 ++
 arch/powerpc/include/asm/exception-64s.h  |   2 +
 arch/powerpc/include/asm/feature-fixups.h |   3 +
 arch/powerpc/include/asm/kup.h|   4 +
 arch/powerpc/include/asm/mmu.h|  10 +-
 arch/powerpc/kernel/entry_64.S|  27 -
 arch/powerpc/kernel/exceptions-64s.S  |   3 +
 arch/powerpc/mm/pgtable-radix.c   |  19 
 arch/powerpc/mm/pkeys.c   |   1 +
 arch/powerpc/platforms/Kconfig.cputype|   8 ++
 10 files changed, 176 insertions(+), 3 deletions(-)
 create mode 100644 arch/powerpc/include/asm/book3s/64/kup-radix.h

diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h 
b/arch/powerpc/include/asm/book3s/64/kup-radix.h
new file mode 100644
index ..6d6628424134
--- /dev/null
+++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h
@@ -0,0 +1,102 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_BOOK3S_64_KUP_RADIX_H
+#define _ASM_POWERPC_BOOK3S_64_KUP_RADIX_H
+
+#include 
+
+#define AMR_KUAP_BLOCK_READUL(0x4000)
+#define AMR_KUAP_BLOCK_WRITE   UL(0x8000)
+#define AMR_KUAP_BLOCKED   (AMR_KUAP_BLOCK_READ | AMR_KUAP_BLOCK_WRITE)
+#define AMR_KUAP_SHIFT 62
+
+#ifdef __ASSEMBLY__
+
+.macro kuap_restore_amrgpr
+#ifdef CONFIG_PPC_KUAP
+   BEGIN_MMU_FTR_SECTION_NESTED(67)
+   ld  \gpr, STACK_REGS_KUAP(r1)
+   mtspr   SPRN_AMR, \gpr
+   END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_RADIX_KUAP, 67)
+#endif
+.endm
+
+.macro kuap_check_amr gpr1, gpr2
+#ifdef CONFIG_PPC_KUAP_DEBUG
+   BEGIN_MMU_FTR_SECTION_NESTED(67)
+   mfspr   \gpr1, SPRN_AMR
+   li  \gpr2, (AMR_KUAP_BLOCKED >> AMR_KUAP_SHIFT)
+   sldi\gpr2, \gpr2, AMR_KUAP_SHIFT
+999:   tdne\gpr1, \gpr2
+   EMIT_BUG_ENTRY 999b, __FILE__, __LINE__, (BUGFLAG_WARNING | 
BUGFLAG_ONCE)
+   END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_RADIX_KUAP, 67)
+#endif
+.endm
+
+.macro kuap_save_amr_and_lock gpr1, gpr2, use_cr, msr_pr_cr
+#ifdef CONFIG_PPC_KUAP
+   BEGIN_MMU_FTR_SECTION_NESTED(67)
+   .ifnb \msr_pr_cr
+   bne \msr_pr_cr, 99f
+   .endif
+   mfspr   \gpr1, SPRN_AMR
+   std \gpr1, STACK_REGS_KUAP(r1

[PATCH v6 08/10] powerpc/lib: Refactor __patch_instruction() to use __put_user_asm()

From: Russell Currey 

__patch_instruction() is called in early boot, and uses
__put_user_size(), which includes the allow/prevent calls to enforce
KUAP, which could either be called too early, or in the Radix case,
forced to use "early_" versions of functions just to safely handle
this one case.

__put_user_asm() does not do this, and thus is safe to use both in
early boot, and later on since in this case it should only ever be
touching kernel memory.

__patch_instruction() was previously refactored to use
__put_user_size() in order to be able to return -EFAULT, which would
allow the kernel to patch instructions in userspace, which should
never happen. This has the functional change of causing faults on
userspace addresses if KUAP is turned on, which should never happen in
practice.

A future enhancement could be to double check the patch address is
definitely allowed to be tampered with by the kernel.

Signed-off-by: Russell Currey 
Signed-off-by: Michael Ellerman 
---
v6: Unchanged.
v5: Unchanged.
---
 arch/powerpc/lib/code-patching.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 506413a2c25e..42fdadac6587 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -26,9 +26,9 @@
 static int __patch_instruction(unsigned int *exec_addr, unsigned int instr,
   unsigned int *patch_addr)
 {
-   int err;
+   int err = 0;
 
-   __put_user_size(instr, patch_addr, 4, err);
+   __put_user_asm(instr, patch_addr, err, "stw");
if (err)
return err;
 
-- 
2.20.1

[PATCH v6 07/10] powerpc/mm/radix: Use KUEP API for Radix MMU