Re: [PATCH v2 07/10] soc: fsl: cpm1: qmc: Introduce functions to get a channel from a phandle list

2024-07-12 Thread LEROY Christophe


On 04/07/2024 at 05:01, Michael Ellerman wrote:
> Mark Brown  writes:
>> On Mon, Jul 01, 2024 at 01:30:34PM +0200, Herve Codina wrote:
>>> qmc_chan_get_byphandle() and the resource managed version retrieve a
>>> channel from a simple phandle.
>>>
>>> Extend the API and introduce qmc_chan_get_byphandles_index() and the
>>> resource managed version in order to retrieve a channel from a phandle
>>> list using the provided index to identify the phandle in the list.
>>
>> These two PowerPC patches seem trivial enough and have got no response,
>> unless someone objects I'll go ahead and apply them.
> 
> Ack.
> 
> MAINTAINERS says:
> 
> FREESCALE QUICC ENGINE LIBRARY
> M:  Qiang Zhao 
> L:  linuxppc-dev@lists.ozlabs.org
> S:  Maintained
> F:  drivers/soc/fsl/qe/
> F:  include/soc/fsl/qe/
> 
> But I see no email from that address since January 2021:
> 
>https://lore.kernel.org/all/?q=f%3Aqiang.zhao%40nxp.com
> 
> And actually drivers/soc/fsl was marked orphan in April, maybe this
> should be also.
> 
> Or does Herve want to take over maintaining it?

We had some discussion about that in April, see 
https://lore.kernel.org/linuxppc-dev/20240219153016.ntltc76bphwrv6hn@skbuf/T/#mf6d4a5eef79e8eae7ae0456a2794c01e630a6756

Hervé has some of our hardware for a limited period of time because he 
is doing some implementation work for us, but he won't keep that 
hardware in the long run.

I will send a patch to take over maintaining drivers/soc/fsl/

Christophe
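For readers of the archive: the API extension discussed in the quoted patch makes the existing simple-phandle lookup the index-0 case of a phandle-list lookup. A plain-C model of that relationship follows; the names and types here are illustrative assumptions, not the driver's actual symbols.

```c
#include <stddef.h>

/* Minimal model of the QMC channel lookup extension: the legacy
 * byphandle variant is equivalent to asking for index 0 of the
 * phandle list. Illustrative only -- not the kernel implementation. */
struct qmc_chan_model { int id; };

static struct qmc_chan_model *
chan_get_byphandles_index(struct qmc_chan_model **list, size_t n, size_t index)
{
	return index < n ? list[index] : NULL;	/* out of range -> no channel */
}

static struct qmc_chan_model *
chan_get_byphandle(struct qmc_chan_model **list, size_t n)
{
	/* the single-phandle lookup is just the first entry of the list */
	return chan_get_byphandles_index(list, n, 0);
}
```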


Re: [PATCH v4 17/21] mm/mmap: Drop arch_unmap() call from all archs

2024-07-11 Thread LEROY Christophe


On 11/07/2024 at 01:26, Liam R. Howlett wrote:
> * LEROY Christophe  [240710 17:02]:
>>
>>
>> On 10/07/2024 at 21:22, Liam R. Howlett wrote:
>>> From: "Liam R. Howlett" 
>>>
>>> The arch_unmap call was previously moved above the rbtree modifications
>>> in commit 5a28fc94c914 ("x86/mpx, mm/core: Fix recursive munmap()
>>> corruption").  The move was motivated by an issue with calling
>>> arch_unmap() after the rbtree was modified.
>>>
>>> Since the above commit, mpx was dropped from the kernel in 45fc24e89b7c
>>> ("x86/mpx: remove MPX from arch/x86"), so the motivation for calling
>>> arch_unmap() prior to modifying the vma tree no longer exists
>>> (regardless of rbtree or maple tree implementations).
>>>
>>> Furthermore, the powerpc implementation is also no longer needed as per
>>> [1] and [2].  So the arch_unmap() function can be completely removed.
>>
>> I'm not sure I understand. Is it replaced by something else?
>> We wanted to get rid of arch_unmap(), but it was supposed to be replaced
>> by some core function, because the functionality itself is still
>> required. Indeed, all the discussion around [2] demonstrated that not
>> only powerpc but at least arm, and probably others, needed to properly
>> clean up references to VDSO mappings on unmapping.
>>
>> So, as mentioned by Michael, you can't just drop that without replacing
>> it with something else. We need the VDSO signal handling to properly
>> fall back to a stack-based trampoline when the VDSO trampoline gets
>> unmapped.
>
> I'll address this after the part I missed..

After? What do you mean? It needs to be addressed _before_ removing
arch_unmap().

>
>>
>> Or did I miss something ?
>>
>
> I think I missed something in regards to what you need in ppc.

It is not only powerpc. Powerpc is the only one doing it at the moment,
but investigation has demonstrated that other architectures are affected
as well.

>
>  From what I understand, other platforms still map and use the vdso
> (context.vdso is set), but unmap_arch() does nothing.  It is only the
> powerpc version that clears the vdso pointer if it is unmapped.

Yes, on powerpc it works. On other platforms like arm it segfaults, so it
should be fixed
(https://lore.kernel.org/lkml/87imd5h5kb@mpe.ellerman.id.au/)

It could be fixed either by properly implementing arch_unmap() on every
arch, or by carrying on with Dmitry's series.

>
> git grep -w arch_unmap shows:
> arch/powerpc/include/asm/mmu_context.h
> arch/x86/include/asm/mmu_context.h
> include/asm-generic/mm_hooks.h
> mm/mmap.c
>
> The generic and x86 versions are empty.
>
>  From the patch set you referenced, we see changes related to the files
> modified, but I don't think any of them did anything with unmap_arch().

In the v3 series from Dmitry, [PATCH v3 16/23] mm: Add vdso_base in
mm_struct
(https://lore.kernel.org/all/20210611180242.711399-17-d...@arista.com/)
it is done via special_mapping_close()

>
> arm: a0d2fcd62ac2 ("vdso/ARM: Make union vdso_data_store available for all 
> architectures")
> arm64: d0fba04847ae ("arm64: vdso: Use generic union vdso_data_store")
> mips: d697a9997a0d ("MIPS: vdso: Use generic union vdso_data_store")
> s390: cb3444cfdb48 ("s390/vdso: Use generic union vdso_data_store")
> riscv: eba755314fa7 ("riscv: vdso: Use generic union vdso_data_store")
>
> ia64 is dead
> nds32 is dead
> hexagon has a bunch of vdso work in the logs as well.
>
> There is also a6c19dfe3994 ("arm64,ia64,ppc,s390,sh,tile,um,x86,mm: remove 
> default gate area")
>
> I do not see sparc changing away from what the patches were doing, but
> again, the arch_unmap() seems to do nothing there as well.
>
> So, what I was looking to do is to avoid a call to arch specific
> functions that does nothing but set the vdso pointer to NULL for
> powerpc.

That's what Dmitry's series does: it removes arch_unmap() and replaces
it with core handling. The advantage is that it addresses the issue for
all affected architectures, improving the current situation.

>
> The thread referenced in the git bug [1] seems to indicate this is for
> CRIU unmapping/restoring a task, but CRIU now just moves the vdso
> mapping (or just works on ppc at this point?).  Since [2] hasn't landed,
> isn't this still broken for CRIU on powerpc as it is?
>
> So, are we keeping the unmap_arch() function around, which has errors
> that were never fixed, for a single application that utilizes a newer
> method of moving the vdso anyways?

Again, we want to remove arch_unmap(), but we want the core mm to handle
it instead.
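To make the behaviour under discussion concrete, here is a plain-C model of what powerpc's arch_unmap() does (an illustrative sketch, not the kernel code): it clears the cached VDSO base when the unmapped range covers it, which is exactly the effect the core-mm replacement needs to preserve.

```c
#include <stdint.h>

/* Model: mm->context.vdso as a plain integer base address,
 * where 0 stands in for NULL. */
struct mm_model {
	uintptr_t vdso;
};

/* Sketch of powerpc's arch_unmap(): if the VDSO base lies inside the
 * range being unmapped, forget it, so the signal code can fall back to
 * a stack-based trampoline instead of jumping into unmapped memory. */
static void arch_unmap_model(struct mm_model *mm,
			     uintptr_t start, uintptr_t end)
{
	if (start <= mm->vdso && mm->vdso < end)
		mm->vdso = 0;
}
```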


Re: [PATCH v3 16/16] mm/mmap: Move may_expand_vm() check in mmap_region()

2024-07-10 Thread LEROY Christophe


On 10/07/2024 at 18:09, Liam R. Howlett wrote:
> * LEROY Christophe  [240710 08:59]:
>>
> ...
>>>>>>
>>>>>> Assuming the removal of the vdso does not cause the application to seg
>>>>>> fault, then the user visible change is that any vdso call after a failed
>>>>>> mmap(MAP_FIXED) call would result in a seg fault.  The only reason it
>>>>>> would fail is if the mapping process was attempting to map a large
>>>>>> enough area over the vdso (which is accounted and in the vma tree,
>>>>>> afaict) and ran out of memory. Note that this situation could arise
>>>>>> already since we could run out of memory (not accounting) after the
>>>>>> arch_unmap() call within the kernel.
>>>>>>
>>>>>> The code today can suffer the same fate, but not by the accounting
>>>>>> failure.  It can happen due to failure to allocate a new vma,
>>>>>> do_vmi_munmap() failure after the arch_unmap() call, or any of the other
>>>>>> failure scenarios later in the mmap_region() function.
>>>>>>
>>>>>> At the very least, this requires an expanded change log.
>>>>>
> ...
>
>>>>> I mean why are they unmapping the VDSO, why is that valid, why does it 
>>>>> need
>>>>> that field to be set to NULL, is it possible to signify that in some other
>>>>> way etc.?
>>>>
>>>> It was originally for CRIU. So a niche workload on a niche architecture.
>>>>
>>>> But from the commit that added it, it sounds like CRIU was using mremap,
>>>> which should be handled these days by vdso_mremap(). So it could be that
>>>> arch_unmap() is not actually needed for CRIU anymore.
>>>
>>> Oh that's interesting!
>>>
>>>>
>>>> Then I guess we have to decide if removing our arch_unmap() would be an
>>>> ABI break, regardless of whether CRIU needs it or not.
>>>
>>> Seems to me like an internal implementation detail that should hopefully
>>> not result in anything that should have visible ABI impact?
>>>
>>> I guess this is something we ought to assess. It would be useful to
>>> eliminate hooks where we can so we can better control VMA behaviour without
>>> having to worry about an arch being able to do arbitrary things at
>>> unexpected times, especially pertinent where we change the order of things.
>>>
>>
>> I see you are talking about arch_unmap(). I didn't follow the entire
>> discussion but we have some related stuff here:
>> https://github.com/linuxppc/issues/issues/241
>>
>> If I remember correctly arch_unmap() should have gone away with Dmitry's
>> series
>> https://lore.kernel.org/lkml/20210611180242.711399-1-d...@arista.com/#r
>> but it hasn't been applied yet.
>>
>
> That is good news!
>
> To review, ppc is the only arch using this now and it sounds like you
> want to remove it too.

Yes, we want to remove it, but it needs to be replaced by a more generic
core-based equivalent.

>
> Considering the age of that thread and the possibility of conflict with
> my series, can I drop the entire arch_unmap() function instead of
> modifying the handling in core mm?  I'm going to assume that's okay and
> start working on this for v4 (because there hasn't been a public reply
> for v4 since 2023/10/11).

Drop it, yes, but not without implementing a replacement in core mm as
proposed by Dmitry.

>
> This would mean less arch-specific hooks and that's always a good idea.
>

Indeed.

Christophe


Re: [PATCH v4 17/21] mm/mmap: Drop arch_unmap() call from all archs

2024-07-10 Thread LEROY Christophe


On 10/07/2024 at 21:22, Liam R. Howlett wrote:
> From: "Liam R. Howlett" 
> 
> The arch_unmap call was previously moved above the rbtree modifications
> in commit 5a28fc94c914 ("x86/mpx, mm/core: Fix recursive munmap()
> corruption").  The move was motivated by an issue with calling
> arch_unmap() after the rbtree was modified.
> 
> Since the above commit, mpx was dropped from the kernel in 45fc24e89b7c
> ("x86/mpx: remove MPX from arch/x86"), so the motivation for calling
> arch_unmap() prior to modifying the vma tree no longer exists
> (regardless of rbtree or maple tree implementations).
> 
> Furthermore, the powerpc implementation is also no longer needed as per
> [1] and [2].  So the arch_unmap() function can be completely removed.

I'm not sure I understand. Is it replaced by something else?
We wanted to get rid of arch_unmap(), but it was supposed to be replaced 
by some core function, because the functionality itself is still 
required. Indeed, all the discussion around [2] demonstrated that not 
only powerpc but at least arm, and probably others, needed to properly 
clean up references to VDSO mappings on unmapping.

So, as mentioned by Michael, you can't just drop that without replacing 
it with something else. We need the VDSO signal handling to properly 
fall back to a stack-based trampoline when the VDSO trampoline gets 
unmapped.

Or did I miss something?

Christophe

> 
> Link: https://lore.kernel.org/lkml/20210611180242.711399-1-d...@arista.com/
> Link: https://github.com/linuxppc/issues/issues/241


Re: [PATCH v3 16/16] mm/mmap: Move may_expand_vm() check in mmap_region()

2024-07-10 Thread LEROY Christophe


On 10/07/2024 at 14:45, Lorenzo Stoakes wrote:
> On Wed, Jul 10, 2024 at 10:28:01PM GMT, Michael Ellerman wrote:
>> Lorenzo Stoakes  writes:
>>> On Mon, Jul 08, 2024 at 04:43:15PM GMT, Liam R. Howlett wrote:

>> ...
 The functionality here has changed
 --- from ---
 may_expand_vm() check
 can_modify_mm() check
 arch_unmap()
 vms_gather_munmap_vmas()
 ...

 --- to ---
 can_modify_mm() check
 arch_unmap()
 vms_gather_munmap_vmas()
 may_expand_vm() check
 ...

 vms_gather_munmap_vmas() does nothing but figures out what to do later,
 but could use memory and can fail.

 The user implications are:

 1. The return type on the error may change to -EPERM from -ENOMEM, if
 you are not allowed to expand and are trying to overwrite mseal()'ed
 VMAs. That seems so very rare that I'm not sure it's worth mentioning.


 2. arch_unmap() called prior to may_expand_vm().
 powerpc uses this to set mm->context.vdso = NULL if mm->context.vdso is
 within the unmap range.  User implication of this means that an
 application my set the vdso to NULL prior to hitting the -ENOMEM case in
 may_expand_vm() due to the address space limit.

 Assuming the removal of the vdso does not cause the application to seg
 fault, then the user visible change is that any vdso call after a failed
 mmap(MAP_FIXED) call would result in a seg fault.  The only reason it
 would fail is if the mapping process was attempting to map a large
 enough area over the vdso (which is accounted and in the vma tree,
 afaict) and ran out of memory. Note that this situation could arise
 already since we could run out of memory (not accounting) after the
 arch_unmap() call within the kernel.

 The code today can suffer the same fate, but not by the accounting
 failure.  It can happen due to failure to allocate a new vma,
 do_vmi_munmap() failure after the arch_unmap() call, or any of the other
 failure scenarios later in the mmap_region() function.

 At the very least, this requires an expanded change log.
>>>
>>> Indeed, also (as mentioned on IRC) I feel like we need to look at whether
>>> we _truly_ need this arch_unmap() call for a single, rather antiquated,
>>> architecture.
>>
>> You can call it "niche" or "irrelevant" or "fringe", but "antiquated" is
>> factually wrong :) Power10 came out of the fab just a few years ago at
>> 7nm.
> 
> Fair point ;) perhaps we could go with "rarified"? :>)
> 
>>
>>> I mean why are they unmapping the VDSO, why is that valid, why does it need
>>> that field to be set to NULL, is it possible to signify that in some other
>>> way etc.?
>>
>> It was originally for CRIU. So a niche workload on a niche architecture.
>>
>> But from the commit that added it, it sounds like CRIU was using mremap,
>> which should be handled these days by vdso_mremap(). So it could be that
>> arch_unmap() is not actually needed for CRIU anymore.
> 
> Oh that's interesting!
> 
>>
>> Then I guess we have to decide if removing our arch_unmap() would be an
>> ABI break, regardless of whether CRIU needs it or not.
> 
> Seems to me like an internal implementation detail that should hopefully
> not result in anything that should have visible ABI impact?
> 
> I guess this is something we ought to assess. It would be useful to
> eliminate hooks where we can so we can better control VMA behaviour without
> having to worry about an arch being able to do arbitrary things at
> unexpected times, especially pertinent where we change the order of things.
> 

I see you are talking about arch_unmap(). I didn't follow the entire 
discussion but we have some related stuff here: 
https://github.com/linuxppc/issues/issues/241

If I remember correctly, arch_unmap() should have gone away with Dmitry's 
series 
https://lore.kernel.org/lkml/20210611180242.711399-1-d...@arista.com/#r 
but it hasn't been applied yet.

Christophe


Re: [PATCH v6 21/23] powerpc/64s: Use contiguous PMD/PUD instead of HUGEPD

2024-06-24 Thread LEROY Christophe


On 25/06/2024 at 06:49, Nicholas Piggin wrote:
> On Tue Jun 25, 2024 at 12:45 AM AEST, Christophe Leroy wrote:
>> On book3s/64, the only user of hugepd is hash in 4k mode.
>>
>> All other setups (hash-64, radix-4, radix-64) use leaf PMD/PUD.
>>
>> Rework hash-4k to use contiguous PMD and PUD instead.
>>
>> In that setup there are only two huge page sizes: 16M and 16G.
>>
>> 16M sits at PMD level and 16G at PUD level.
>>
>> pte_update doesn't know the page size; let's use the same trick as
>> hpte_need_flush() to get page size from segment properties. That's
>> not the most efficient way but let's do that until callers of
>> pte_update() provide page size instead of just a huge flag.
>>
>> Signed-off-by: Christophe Leroy 
> 
> [snip]
> 
>> +static inline unsigned long hash__pte_update(struct mm_struct *mm,
>> + unsigned long addr,
>> + pte_t *ptep, unsigned long clr,
>> + unsigned long set,
>> + int huge)
>> +{
>> +unsigned long old;
>> +
>> +old = hash__pte_update_one(ptep, clr, set);
>> +
>> +if (IS_ENABLED(CONFIG_PPC_4K_PAGES) && huge) {
>> +unsigned int psize = get_slice_psize(mm, addr);
>> +int nb, i;
>> +
>> +if (psize == MMU_PAGE_16M)
>> +nb = SZ_16M / PMD_SIZE;
>> +else if (psize == MMU_PAGE_16G)
>> +nb = SZ_16G / PUD_SIZE;
>> +else
>> +nb = 1;
>> +
>> +WARN_ON_ONCE(nb == 1);  /* Should never happen */
>> +
>> +for (i = 1; i < nb; i++)
>> +hash__pte_update_one(ptep + i, clr, set);
>> +}
>>  /* huge pages use the old page table lock */
>>  if (!huge)
>>  assert_pte_locked(mm, addr);
>>   
>> -old = be64_to_cpu(old_be);
>>  if (old & H_PAGE_HASHPTE)
>>  hpte_need_flush(mm, addr, ptep, old, huge);
>>   
> 
> We definitely need a bit more comment and changelog about the atomicity
> issues here. I think the plan should be all hash-side access just
> operates on PTE[0], which should avoid that whole race. There could be
> some cases that don't follow that. Adding some warnings to catch such
> things could be good too.

That seems to be the case indeed, as we have the following in 
hash_page_mm():

#ifndef CONFIG_PPC_64K_PAGES
/*
 * If we use 4K pages and our psize is not 4K, then we might
 * be hitting a special driver mapping, and need to align the
 * address before we fetch the PTE.
 *
 * It could also be a hugepage mapping, in which case this is
 * not necessary, but it's not harmful, either.
 */
if (psize != MMU_PAGE_4K)
ea &= ~((1ul << mmu_psize_defs[psize].shift) - 1);
#endif /* CONFIG_PPC_64K_PAGES */
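The masking in the excerpt above is plain power-of-two alignment: round the effective address down to the base of its (huge) page given the page-size shift. A quick model of the same expression, for illustration:

```c
#include <stdint.h>

/* Align an effective address down to the base of its page, given the
 * page-size shift -- the same operation as the quoted
 * "ea &= ~((1ul << mmu_psize_defs[psize].shift) - 1);" */
static uint64_t align_to_psize(uint64_t ea, unsigned int shift)
{
	return ea & ~((1ULL << shift) - 1);
}
```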


> 
> I'd been meaning to do more on this sooner, sorry. I've started
> tinkering with adding a bit of debug code. I'll see if I can help with
> adding a bit of comments.

Yes, that would be very welcome. I guess you'll send it as a 
followup/fixup patch to the series?

> 
> [snip]
> 
>> diff --git a/arch/powerpc/mm/book3s64/hugetlbpage.c 
>> b/arch/powerpc/mm/book3s64/hugetlbpage.c
>> index 5a2e512e96db..83c3361b358b 100644
>> --- a/arch/powerpc/mm/book3s64/hugetlbpage.c
>> +++ b/arch/powerpc/mm/book3s64/hugetlbpage.c
>> @@ -53,6 +53,16 @@ int __hash_page_huge(unsigned long ea, unsigned long 
>> access, unsigned long vsid,
>>  /* If PTE permissions don't match, take page fault */
>>  if (unlikely(!check_pte_access(access, old_pte)))
>>  return 1;
>> +/*
>> + * If hash-4k, hugepages use several contiguous PxD entries
>> + * so bail out and let mm make the page young or dirty
>> + */
>> +if (IS_ENABLED(CONFIG_PPC_4K_PAGES)) {
>> +if (!(old_pte & _PAGE_ACCESSED))
>> +return 1;
>> +if ((access & _PAGE_WRITE) && !(old_pte & _PAGE_DIRTY))
>> +return 1;
>> +}
>>   
>>  /*
>>   * Try to lock the PTE, add ACCESSED and DIRTY if it was
> 
> I'm hoping we wouldn't have to do this, if we follow the PTE[0] rule.

But we still need all entries to be updated, so that page walkers which 
don't know they must use PTE[0] still get the right information?

> 
> I think is minor enough that should not prevent testing in -mm.

Thanks.

> 
> Thanks,
> Nick

Christophe


Re: [PATCH v5 16/18] powerpc/64s: Use contiguous PMD/PUD instead of HUGEPD

2024-06-24 Thread LEROY Christophe


On 13/06/2024 at 09:39, Oscar Salvador wrote:
> On Mon, Jun 10, 2024 at 07:55:01AM +0200, Christophe Leroy wrote:
>> On book3s/64, the only user of hugepd is hash in 4k mode.
>>
>> All other setups (hash-64, radix-4, radix-64) use leaf PMD/PUD.
>>
>> Rework hash-4k to use contiguous PMD and PUD instead.
>>
>> In that setup there are only two huge page sizes: 16M and 16G.
>>
>> 16M sits at PMD level and 16G at PUD level.
>>
>> pte_update doesn't know the page size; let's use the same trick as
>> hpte_need_flush() to get page size from segment properties. That's
>> not the most efficient way but let's do that until callers of
>> pte_update() provide page size instead of just a huge flag.
>>
>> Signed-off-by: Christophe Leroy 
>> ---
> ...
>> +static inline unsigned long hash__pte_update(struct mm_struct *mm,
>> + unsigned long addr,
>> + pte_t *ptep, unsigned long clr,
>> + unsigned long set,
>> + int huge)
>> +{
>> +unsigned long old;
>> +
>> +old = hash__pte_update_one(ptep, clr, set);
>> +
>> +if (IS_ENABLED(CONFIG_PPC_4K_PAGES) && huge) {
>> +unsigned int psize = get_slice_psize(mm, addr);
>> +int nb, i;
>> +
>> +if (psize == MMU_PAGE_16M)
>> +nb = SZ_16M / PMD_SIZE;
>> +else if (psize == MMU_PAGE_16G)
>> +nb = SZ_16G / PUD_SIZE;
>> +else
>> +nb = 1;
> 
> Although that might be a safe default, it might carry consequences down the 
> road?
> It might not, but if we reach that, something went wrong, so I would put a
> WARN_ON_ONCE at least.

Ah, right, you already said it and I forgot.

It is now in v6.
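For reference, the nb computation being discussed is concrete arithmetic once the sizes are filled in. A model assuming the book3s/64 4k geometry the patch targets (PMD_SIZE = 2M and PUD_SIZE = 1G are assumed values for illustration):

```c
/* Number of contiguous page-table entries spanned by a huge page:
 * 16M at PMD level and 16G at PUD level, with 4k base pages. */
#define SZ_16M   (16ULL << 20)
#define SZ_16G   (16ULL << 30)
#define PMD_SIZE (2ULL << 20)	/* assumed: book3s/64 with 4k pages */
#define PUD_SIZE (1ULL << 30)	/* assumed: book3s/64 with 4k pages */

static unsigned long long nb_entries(int is_16g)
{
	return is_16g ? SZ_16G / PUD_SIZE : SZ_16M / PMD_SIZE;
}
```

With these sizes the "nb = 1" default can only be reached for an unexpected psize, which is why the WARN_ON_ONCE was added in v6.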

> 
>> --- a/arch/powerpc/mm/book3s64/hugetlbpage.c
>> +++ b/arch/powerpc/mm/book3s64/hugetlbpage.c
>> @@ -53,6 +53,16 @@ int __hash_page_huge(unsigned long ea, unsigned long 
>> access, unsigned long vsid,
>>  /* If PTE permissions don't match, take page fault */
>>  if (unlikely(!check_pte_access(access, old_pte)))
>>  return 1;
>> +/*
>> + * If hash-4k, hugepages use several contiguous PxD entries
>> + * so bail out and let mm make the page young or dirty
>> + */
>> +if (IS_ENABLED(CONFIG_PPC_4K_PAGES)) {
>> +if (!(old_pte & _PAGE_ACCESSED))
>> +return 1;
>> +if ((access & _PAGE_WRITE) && !(old_pte & _PAGE_DIRTY))
>> +return 1;
>> +}
> 
> You mentioned that we need to bail out otherwise only the first PxD would be
> updated.
> In the comment you say that mm will take care of making the page young
> or dirty.
> Does this mean that the PxDs underneath will not have its bits updated?
>   
> 

No, it just means that mm will do it properly by using 
huge_ptep_set_access_flags().

Christophe


Re: [PATCH 07/15] parisc: use generic sys_fanotify_mark implementation

2024-06-20 Thread LEROY Christophe


On 20/06/2024 at 23:21, Helge Deller wrote:
>
> On 6/20/24 18:23, Arnd Bergmann wrote:
>> From: Arnd Bergmann 
>>
>> The sys_fanotify_mark() syscall on parisc uses the reverse word order
>> for the two halves of the 64-bit argument compared to all syscalls on
>> all 32-bit architectures. As far as I can tell, the problem is that
>> the function arguments on parisc are sorted backwards (26, 25, 24, 23,
>> ...) compared to everyone else,
>
> r26 is arg0, r25 is arg1, and so on.
> I'm not sure I would call this "sorted backwards".
> I think the reason is simply that hppa is the only 32-bit big-endian
> arch left...

powerpc/32 is big-endian: r3 is arg0, r4 is arg1, ... r10 is arg7.

In the case of a 64-bit arg, r3 is the high part and r4 is the low part.
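For a 32-bit big-endian ABI like this, the two halves recombine with the first register of the pair as the high word. A quick model (contrast it with the quoted parisc sys32_fanotify_mark(), which does the opposite and puts the second word on top):

```c
#include <stdint.h>

/* powerpc/32 (big-endian) convention: the first register of an
 * even/odd pair carries the high 32 bits, the second the low 32 bits. */
static uint64_t merge_be_pair(uint32_t first, uint32_t second)
{
	return ((uint64_t)first << 32) | second;
}
```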

Christophe

>
>> so the calling conventions of using an
>> even/odd register pair in native word order result in the lower word
>> coming first in function arguments, matching the expected behavior
>> on little-endian architectures. The system call conventions however
>> ended up matching what the other 32-bit architectures do.
>>
>> A glibc cleanup in 2020 changed the userspace behavior in a way that
>> handles all architectures consistently, but this inadvertently broke
>> parisc32 by changing to the same method as everyone else.
>
> I appreciate such cleanups to make arches consistent.
> But it's bad if breakages aren't noticed or reported then...
>
>> The change made it into glibc-2.35 and subsequently into debian 12
>> (bookworm), which is the latest stable release. This means we
>> need to choose between reverting the glibc change or changing the
>> kernel to match it again, but either change will leave some systems
>> broken.
>>
>> Pick the option that is more likely to help current and future
>> users and change the kernel to match current glibc.
>
> Agreed (assuming we have really a problem on parisc).
>
>> This also
>> means the behavior is now consistent across architectures, but
>> it breaks running new kernels with old glibc builds before 2.35.
>>
>> Link:
>> https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=d150181d73d9
>> Link:
>> https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/arch/parisc/kernel/sys_parisc.c?h=57b1dfbd5b4a39d
>> Cc: Adhemerval Zanella 
>> Signed-off-by: Arnd Bergmann 
>> ---
>> I found this through code inspection, please double-check to make
>> sure I got the bug and the fix right.
>
> The patch looks good at first sight.
> I'll pick it up in my parisc git tree and will do some testing the
> next few days and then push forward for 6.11 when it opens
>
> Thank you!!
>
> Helge
>
>> The alternative is to fix this by reverting glibc back to the
>> unusual behavior.
>> ---
>>   arch/parisc/Kconfig | 1 +
>>   arch/parisc/kernel/sys_parisc32.c   | 9 -
>>   arch/parisc/kernel/syscalls/syscall.tbl | 2 +-
>>   3 files changed, 2 insertions(+), 10 deletions(-)
>>
>> diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
>> index daafeb20f993..dc9b902de8ea 100644
>> --- a/arch/parisc/Kconfig
>> +++ b/arch/parisc/Kconfig
>> @@ -16,6 +16,7 @@ config PARISC
>>   select ARCH_HAS_UBSAN
>>   select ARCH_HAS_PTE_SPECIAL
>>   select ARCH_NO_SG_CHAIN
>> + select ARCH_SPLIT_ARG64 if !64BIT
>>   select ARCH_SUPPORTS_HUGETLBFS if PA20
>>   select ARCH_SUPPORTS_MEMORY_FAILURE
>>   select ARCH_STACKWALK
>> diff --git a/arch/parisc/kernel/sys_parisc32.c
>> b/arch/parisc/kernel/sys_parisc32.c
>> index 2a12a547b447..826c8e51b585 100644
>> --- a/arch/parisc/kernel/sys_parisc32.c
>> +++ b/arch/parisc/kernel/sys_parisc32.c
>> @@ -23,12 +23,3 @@ asmlinkage long sys32_unimplemented(int r26, int
>> r25, int r24, int r23,
>>   current->comm, current->pid, r20);
>>   return -ENOSYS;
>>   }
>> -
>> -asmlinkage long sys32_fanotify_mark(compat_int_t fanotify_fd,
>> compat_uint_t flags,
>> - compat_uint_t mask0, compat_uint_t mask1, compat_int_t dfd,
>> - const char  __user * pathname)
>> -{
>> - return sys_fanotify_mark(fanotify_fd, flags,
>> - ((__u64)mask1 << 32) | mask0,
>> -  dfd, pathname);
>> -}
>> diff --git a/arch/parisc/kernel/syscalls/syscall.tbl
>> b/arch/parisc/kernel/syscalls/syscall.tbl
>> index 39e67fab7515..66dc406b12e4 100644
>> --- a/arch/parisc/kernel/syscalls/syscall.tbl
>> +++ b/arch/parisc/kernel/syscalls/syscall.tbl
>> @@ -364,7 +364,7 @@
>>   320 common  accept4 sys_accept4
>>   321 common  prlimit64   sys_prlimit64
>>   322 common  fanotify_init   sys_fanotify_init
>> -323  common  fanotify_mark   sys_fanotify_mark
>> sys32_fanotify_mark
>> +323  common  fanotify_mark   sys_fanotify_mark
>> compat_sys_fanotify_mark
>>   324 32  clock_adjtime   sys_clock_adjtime32

Re: [PATCH v12 00/11] Support page table check PowerPC

2024-06-18 Thread LEROY Christophe


On 02/04/2024 at 07:11, Rohan McLure wrote:
> Support page table check on all PowerPC platforms. This works by
> serialising assignments, reassignments and clears of page table
> entries at each level in order to ensure that anonymous mappings
> have at most one writable consumer, and likewise that file-backed
> mappings are not simultaneously also anonymous mappings.
> 
> In order to support this infrastructure, a number of stubs must be
> defined for all powerpc platforms. Additionally, separate set_pte_at()
> and set_pte_at_unchecked(), to allow for internal, uninstrumented mappings.

It seems this series doesn't apply: patch 1 has a conflict with RISC-V, 
and patch 2 with mm and RISC-V.

Please rebase as appropriate.

Christophe

> 
> v12:
>   * Rename commits that revert changes to instead reflect that we are
> reinstating old behaviour due to it providing more flexibility
>   * Add return line to pud_pfn() stub
>   * Instrument ptep_get_and_clear() for nohash
> 
> v11:
>   * The pud_pfn() stub, which previously had no legitimate users on any
> powerpc platform, now has users in Book3s64 with transparent pages.
> Include a stub of the same name for each platform that does not
> define their own.
>   * Drop patch that standardised use of p*d_leaf(), as already included
> upstream in v6.9.
>   * Provide fallback definitions of p{m,u}d_user_accessible_page() that
> do not reference p*d_leaf(), p*d_pte(), as they are defined after
> powerpc/mm headers by linux/mm headers.
>   * Ensure that set_pte_at_unchecked() has the same checks as
> set_pte_at().
> Link: 
> https://lore.kernel.org/linuxppc-dev/20240328045535.194800-14-rmcl...@linux.ibm.com/
> 
> v10:
>   * Revert patches that removed address and mm parameters from page table
> check routines, including consuming code from arm64, x86_64 and
> riscv.
>   * Implement *_user_accessible_page() routines in terms of pte_user()
> where available (64-bit, book3s) but otherwise by checking the
> address (on platforms where the pte does not imply whether the
> mapping is for user or kernel)
>   * Internal set_pte_at() calls replaced with set_pte_at_unchecked(), which
> is identical, but prevents double instrumentation.
> Link: 
> https://lore.kernel.org/linuxppc-dev/20240313042118.230397-9-rmcl...@linux.ibm.com/T/
> 
> v9:
>   * Adapt to using the set_ptes() API, using __set_pte_at() where we need
> must avoid instrumentation.
>   * Use the logic of *_access_permitted() for implementing
> *_user_accessible_page(), which are required routines for page table
> check.
>   * Even though we no longer need p{m,u,4}d_leaf(), still default
> implement these to assist in refactoring out extant
> p{m,u,4}_is_leaf().
>   * Add p{m,u}_pte() stubs where asm-generic does not provide them, as
> page table check wants all *user_accessible_page() variants, and we
> would like to default implement the variants in terms of
> pte_user_accessible_page().
>   * Avoid the ugly pmdp_collapse_flush() macro nonsense! Just instrument
> its constituent calls instead for radix and hash.
> Link: 
> https://lore.kernel.org/linuxppc-dev/20231130025404.37179-2-rmcl...@linux.ibm.com/
> 
> v8:
>   * Fix linux/page_table_check.h include in asm/pgtable.h breaking
> 32-bit.
> Link: 
> https://lore.kernel.org/linuxppc-dev/20230215231153.2147454-1-rmcl...@linux.ibm.com/
> 
> v7:
>   * Remove use of extern in set_pte prototypes
>   * Clean up pmdp_collapse_flush macro
>   * Replace set_pte_at with static inline function
>   * Fix commit message for patch 7
> Link: 
> https://lore.kernel.org/linuxppc-dev/20230215020155.1969194-1-rmcl...@linux.ibm.com/
> 
> v6:
>   * Support huge pages and p{m,u}d accounting.
>   * Remove instrumentation from set_pte from kernel internal pages.
>   * 64s: Implement pmdp_collapse_flush in terms of __pmdp_collapse_flush
> as access to the mm_struct * is required.
> Link: 
> https://lore.kernel.org/linuxppc-dev/20230214015939.1853438-1-rmcl...@linux.ibm.com/
> 
> v5:
> Link: 
> https://lore.kernel.org/linuxppc-dev/20221118002146.25979-1-rmcl...@linux.ibm.com/
> 
> Rohan McLure (11):
>mm/page_table_check: Reinstate address parameter in
>  [__]page_table_check_pud_set()
>mm/page_table_check: Reinstate address parameter in
>  [__]page_table_check_pmd_set()
>mm/page_table_check: Provide addr parameter to
>  page_table_check_pte_set()
>mm/page_table_check: Reinstate address parameter in
>  [__]page_table_check_pud_clear()
>mm/page_table_check: Reinstate address parameter in
>  [__]page_table_check_pmd_clear()
>mm/page_table_check: Reinstate address parameter in
>  [__]page_table_check_pte_clear()
>mm: Provide address parameter to p{te,md,ud}_user_accessible_page()
>powerpc: mm: Add pud_pfn() stub
>poweprc: mm: Implement *_user_accessible_page() for ptes
>powerpc: mm: Use set_pte_at_unchecked() for early-boot / internal

Re: [PATCH v5 02/18] mm: Define __pte_leaf_size() to also take a PMD entry

2024-06-13 Thread LEROY Christophe


On 13/06/2024 at 09:19, Oscar Salvador wrote:
> On Tue, Jun 11, 2024 at 07:00:14PM +0000, LEROY Christophe wrote:
>> We have space available in PMD if we need more flags, but in PTE I can't
>> see anything possible without additional churn that would require
>> additional instructions in TLB miss handlers, which is what I want to
>> avoid most.
>>
>> Bits mapped to HW PTE:
>>
>> #define _PAGE_PRESENT0x0001  /* V: Page is valid */
>> #define _PAGE_NO_CACHE   0x0002  /* CI: cache inhibit */
>> #define _PAGE_SH 0x0004  /* SH: No ASID (context) compare */
>> #define _PAGE_SPS0x0008  /* SPS: Small Page Size (1 if 16k, 512k or 8M)*/
>> #define _PAGE_DIRTY  0x0100  /* C: page changed */
>> #define _PAGE_NA 0x0200  /* Supervisor NA, User no access */
>> #define _PAGE_RO 0x0600  /* Supervisor RO, User no access */
>>
>> SW bits masked out in TLB miss handler:
>>
>> #define _PAGE_GUARDED0x0010  /* Copied to L1 G entry in DTLB */
>> #define _PAGE_ACCESSED   0x0020  /* Copied to L1 APG 1 entry in I/DTLB */
>> #define _PAGE_EXEC   0x0040  /* Copied to PP (bit 21) in ITLB */
>> #define _PAGE_SPECIAL0x0080  /* SW entry */
>> #define _PAGE_HUGE   0x0800  /* Copied to L1 PS bit 29 */
>>
>> All bits are used. The only thing we could do, but that would have a
>> performance cost, is to retrieve _PAGE_SH from the PMD and use that bit
>> for something else.
> 
> I guess that this would be the last resort if we run out of options.
> But at least it is good to know that there is a plan B (or Z if you will
> :-))
> 
>> But I was maybe thinking another way. Let's take the example of the
>> pmd_write() helper:
>>
>> #define pmd_write(pmd)   pte_write(pmd_pte(pmd))
>>
>> At the time being we have
>>
>> static inline pte_t pmd_pte(pmd_t pmd)
>> {
>>  return __pte(pmd_val(pmd));
>> }
>>
>> But what about something like
>>
>> static inline pte_t pmd_pte(pmd_t pmd)
>> {
>>  return *(pte_t *)pmd_page_vaddr(pmd);
>> }
> 
> I think this could work, yes.
> 
> So, we should define all pmd_*(pmd) operations for 8xx the way they are 
> defined
> in include/asm/book3s/64/pgtable.h.
> 
> Other page size would not interfere because they already can perform
> operations on pte level.
> 
> Ok, I think we might have a shot here.
> 
> I would help testing, but I do not have 8xx hardware, and Qemu does not 
> support
> 8xx emulation, but I think that if we are careful enough, this can work.
> 
> Actually, as a smoketest would be enough to have a task with a 8MB huge
> mapped, and then do:
> 
>   static const struct mm_walk_ops test_walk_ops = {
>   .pmd_entry = test_8mbp_hugepage,
>   .pte_entry = test_16k_and_512k_hugepage,
>   .hugetlb_entry = check_hugetlb_entry,
>   .walk_lock = PGWALK_RDLOCK,
>   };
> 
>   static int test(void)
>   {
>   
>pr_info("%s: %s [0 - %lx]\n", __func__, current->comm, TASK_SIZE);
>mmap_read_lock(current->mm);
>ret = walk_page_range(current->mm, 0, TASK_SIZE, &test_walk_ops, 
> NULL);
>mmap_read_unlock(current->mm);
>
>pr_info("%s: %s ret: %d\n", __func__, current->comm, ret);
>
>return 0;
>   }
> 
> This is an extract of a debugging mechanism I have to check that I am
> not going off rails when unifying hugetlb and normal walkers.
> 
> test_8mbp_hugepage() could so some checks with pmd_ operations, print
> the results, and then compare them with those that check_hugetlb_entry()
> would give us.
> If everything is alright, both results should be the same.
> 
> I can write the tests up, so we run some sort of smoketests.
> 
> So yes, I do think that this is a good initiative.
> 


I can test whatever you want on my 8xx boards.

I have two types of board:
- One with MPC866 microcontroller and 32Mbytes memory
- One with MPC885 microcontroller and 128Mbytes memory

Christophe


Re: [PATCH v5 02/18] mm: Define __pte_leaf_size() to also take a PMD entry

2024-06-11 Thread LEROY Christophe


Le 11/06/2024 à 18:10, Oscar Salvador a écrit :
> 
> On Tue, Jun 11, 2024 at 11:20:01AM -0400, Peter Xu wrote:
>> On Tue, Jun 11, 2024 at 05:08:45PM +0200, Oscar Salvador wrote:
>>> The problem is that we do not have spare bits for 8xx to mark these ptes
>>> as cont-ptes or mark them pte as 8MB, so I do not see a clear path on how
>>> we could remove huge_ptep_get for 8xx.
>>
>> Right, I remember I thought about this too when I initially looked at one
>> previous version of the series, I didn't come up yet with a good solution,
>> but I guess we probably need to get rid of hugepd first anyway.  We may
>> somehow still need to identify this is a 8M large leaf, and I guess this is
>> again the only special case where contpte can go over >1 pmds.
> 
> Yes, we definitely need first to get rid of hugepd, which is a huge
> step, and one step closer to our goal, but at some point we will have to
> see what can we do about 8MB cont-ptes for 8xx and how to mark them,
> so ptep_get can work the same way as e.g: arm64
> (ptep_get->contpte_ptep_get).
> 
> @Christophe: Can you think of a way to flag those ptes? (e.g: a
> combination of bits etc)

We have space available in PMD if we need more flags, but in PTE I can't 
see anything possible without additional churn that would require 
additional instructions in TLB miss handlers, which is what I want to 
avoid most.

Bits mapped to HW PTE:

#define _PAGE_PRESENT   0x0001  /* V: Page is valid */
#define _PAGE_NO_CACHE  0x0002  /* CI: cache inhibit */
#define _PAGE_SH0x0004  /* SH: No ASID (context) compare */
#define _PAGE_SPS   0x0008  /* SPS: Small Page Size (1 if 16k, 512k or 8M)*/
#define _PAGE_DIRTY 0x0100  /* C: page changed */
#define _PAGE_NA0x0200  /* Supervisor NA, User no access */
#define _PAGE_RO0x0600  /* Supervisor RO, User no access */

SW bits masked out in TLB miss handler:

#define _PAGE_GUARDED   0x0010  /* Copied to L1 G entry in DTLB */
#define _PAGE_ACCESSED  0x0020  /* Copied to L1 APG 1 entry in I/DTLB */
#define _PAGE_EXEC  0x0040  /* Copied to PP (bit 21) in ITLB */
#define _PAGE_SPECIAL   0x0080  /* SW entry */
#define _PAGE_HUGE  0x0800  /* Copied to L1 PS bit 29 */

All bits are used. The only thing we could do, but that would have a 
performance cost, is to retrieve _PAGE_SH from the PMD and use that bit 
for something else.

But I was maybe thinking another way. Let's take the example of the 
pmd_write() helper:

#define pmd_write(pmd)  pte_write(pmd_pte(pmd))

At the time being we have

static inline pte_t pmd_pte(pmd_t pmd)
{
return __pte(pmd_val(pmd));
}

But what about something like

static inline pte_t pmd_pte(pmd_t pmd)
{
return *(pte_t *)pmd_page_vaddr(pmd);
}

Would it do the trick?

Of course it would require carefully making sure all accesses are done 
through pmd_pte().

Would that work?



Christophe


Re: [PATCH v5 02/18] mm: Define __pte_leaf_size() to also take a PMD entry

2024-06-11 Thread LEROY Christophe


Le 11/06/2024 à 17:08, Oscar Salvador a écrit :
> 
> On Tue, Jun 11, 2024 at 10:17:30AM -0400, Peter Xu wrote:
>> Oscar,
>>
>> On Tue, Jun 11, 2024 at 11:34:23AM +0200, Oscar Salvador wrote:
>>> Which means that they would be caught in the following code:
>>>
>>>  ptl = pmd_huge_lock(pmd, vma);
>>>  if (ptl) {
>>>  - 8MB hugepages will be handled here
>>>  smaps_pmd_entry(pmd, addr, walk);
>>>  spin_unlock(ptl);
>>>  }
>>>  /* pte stuff */
>>>  ...
>>
>> Just one quick comment: I think there's one challenge though as this is
>> also not a generic "pmd leaf", but a pgtable page underneath.  I think it
>> means smaps_pmd_entry() won't trivially work here, e.g., it will start to
>> do this:
>>
>>if (pmd_present(*pmd)) {
>>page = vm_normal_page_pmd(vma, addr, *pmd);
>>
>> Here vm_normal_page_pmd() will only work if pmd_leaf() satisfies its
>> definition as:
>>
>>   * - It should contain a huge PFN, which points to a huge page larger than
>>   *   PAGE_SIZE of the platform.  The PFN format isn't important here.
>>
>> But now it's a pgtable page, containing cont-ptes.  Similarly, I think most
>> pmd_*() helpers will stop working there if we report it as a leaf.
> 
> Heh, I think I managed to confuse myself.
> I do not know why but I thought that
> 
>   static inline pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, 
> pte_t *ptep)
>   {
>  if (ptep_is_8m_pmdp(mm, addr, ptep))
>   ptep = pte_offset_kernel((pmd_t *)ptep, 0);
>  return ptep_get(ptep);
>   }
> 
> would return the address of the pmd for 8MB hugepages, but it will
> return the address of the first pte?
> 
> Then yeah, this will not work as I thought.
> 
> The problem is that we do not have spare bits for 8xx to mark these ptes
> as cont-ptes or mark them pte as 8MB, so I do not see a clear path on how
> we could remove huge_ptep_get for 8xx.
> 
> I am really curious though how we handle that for THP? Or THP on 8xx
> does not support that size?

8xx doesn't support THP. As far as I know, THP is only supported on 
single-leaf PMD/PUD, not on cont-PUD/PMD, although there is some work in 
progress on arm64 to add that.

But honestly I'm not that interested in 8M transparent pages; what 
I'd like to have first is 512k transparent pages.

Christophe


Re: [PATCH v5 02/18] mm: Define __pte_leaf_size() to also take a PMD entry

2024-06-11 Thread LEROY Christophe


Le 11/06/2024 à 11:34, Oscar Salvador a écrit :
> 
> On Mon, Jun 10, 2024 at 07:54:47AM +0200, Christophe Leroy wrote:
>> On powerpc 8xx, when a page is 8M size, the information is in the PMD
>> entry. So allow architectures to provide __pte_leaf_size() instead of
>> pte_leaf_size() and provide the PMD entry to that function.
>>
>> When __pte_leaf_size() is not defined, define it as a pte_leaf_size()
>> so that architectures not interested in the PMD arguments are not
>> impacted.
>>
>> Only define a default pte_leaf_size() when __pte_leaf_size() is not
>> defined to make sure nobody adds new calls to pte_leaf_size() in the
>> core.
> 
> Hi Christophe,
> 
> Now I am going to give you a hard time, so sorry in advance.
> I should have raised this before, but I was not fully aware of it.
> 
> There is an ongoing effort of unifying pagewalkers [1], so hugetlb does not 
> have
> to be special-cased anymore, and the operations we do for THP on page-table 
> basis
> work for hugetlb as well.
> 
> The most special bit about this is huge_ptep_get.
> huge_ptep_get() gets special handled on arm/arm64/riscv and s390.
> 
> arm64 and riscv is about cont-pmd/pte and propagate the dirty/young bits 
> bits, so that
> is fine as walkers can already understand that.
> s390 is a funny one because it converts pud/pmd to pte and viceversa, because 
> hugetlb
> *works* with ptes, so before returning the pte it has to transfer all
> bits from PUD/PMD level into a something that PTE level can understand.
> As you can imagine, this can be gone as we already have all the
> information in PUD/PMD and that is all pagewalkers need.
> 
> But we are left with the one you will introduce in patch#8.
> 
> 8MB pages get mapped as cont-pte, but all the information is stored in
> the PMD entries (size, dirtiness, present etc).

I'm not sure I understand what you mean.

In my case, the PMD entry is almost standard; the only extra thing it 
contains is a bit telling that the PTEs it points to are to be mapped as 8M.

> huge_ptep_get() will return the PMD for 8MB, and so all operations hugetlb
> code performs with what huge_ptep_get returns will be performed on those PMDs.

Indeed no, my huge_ptep_get() doesn't return the PMD but the PTE.

> 
> Which brings me to this point:
> 
> I do not think __pte_leaf_size is needed. AFAICS, it should be enough to 
> define
> pmd_leaf on 8xx, and return 8MB if it is a 8MB hugepage.

If I declare it as a PMD leaf, then many places will expect the PTE 
entry to be the PMD entry, which is not the case here. As far as I 
understand, for the walker to walk down to the page table, the entry 
needs to be flagged as non-leaf by pmd_leaf().

> 
> #define pmd_leaf pmd_leaf
> static inline bool pmd_leaf(pmd_t pmd)
> {
>return (pmd_val(pmd) & _PMD_PAGE_8M);
> }
> 
> and then pmd_leaf_size to return _PMD_PAGE_8M.
> 
> This will help because on the ongoing effort of unifying hugetlb and
> getting rid of huge_ptep_get() [1], pagewalkers will stumble upon the
> 8mb-PMD as they do for regular PMDs.

But AFAIU, it won't be that simple, because *pmd is definitely not a 
PTE but still a pointer to a page table which contains the PTEs.

> 
> Which means that they would be caught in the following code:
> 
>  ptl = pmd_huge_lock(pmd, vma);
>  if (ptl) {
>  - 8MB hugepages will be handled here
>  smaps_pmd_entry(pmd, addr, walk);
>  spin_unlock(ptl);
>  }
>  /* pte stuff */
>  ...
> 
> where pmd_huge_lock is:
> 
>   static inline spinlock_t *pmd_huge_lock(pmd_t *pmd, struct vm_area_struct 
> *vma)
>   {
>  spinlock_t *ptl = pmd_lock(vma->vm_mm, pmd);
> 
>  if (pmd_leaf(*pmd) || pmd_devmap(*pmd))
>  return ptl;
>  spin_unlock(ptl);
>  return NULL;
>   }
> 
> So, since pmd_leaf() will return true for 8MB hugepages, we are fine,
> because anyway we want to perform pagetable operations on *that* PMD and
> not the ptes that are cont-mapped, which is different for e.g: 512K
> hugepages, where we perform it on pte level.

We still want to do the operation on the cont-PTEs, in fact in both 4M 
page tables so that we cover the 8M. There is no operation to do on the 
PMD entry itself, except the one telling that there is an 8M page table 
underneath.

> 
> So I would suggest that instead of this patch, we have one implementing 
> pmd_leaf
> and pmd_leaf_size for 8Mb hugepages on power8xx, as that takes us closer to 
> our goal of
> unifying hugetlb.

But then, how will it work to go down the PTE road?

Christophe


Re: [PATCH v5 1/3] powerpc/bitops: Use immediate operand when possible

2021-11-26 Thread LEROY Christophe
Hi Michael,

Any chance of getting this series merged this cycle?

Thanks
Christophe

Le 21/09/2021 à 17:09, Christophe Leroy a écrit :
> Today we get the following code generation for bitops like
> set or clear bit:
> 
>   c0009fe0:   39 40 08 00 li  r10,2048
>   c0009fe4:   7c e0 40 28 lwarx   r7,0,r8
>   c0009fe8:   7c e7 53 78 or  r7,r7,r10
>   c0009fec:   7c e0 41 2d stwcx.  r7,0,r8
> 
>   c000d568:   39 00 18 00 li  r8,6144
>   c000d56c:   7c c0 38 28 lwarx   r6,0,r7
>   c000d570:   7c c6 40 78 andcr6,r6,r8
>   c000d574:   7c c0 39 2d stwcx.  r6,0,r7
> 
> Most set bits are constant on lower 16 bits, so it can easily
> be replaced by the "immediate" version of the operation. Allow
> GCC to choose between the normal or immediate form.
> 
> For clear bits, on 32 bits 'rlwinm' can be used instead of 'andc' for
> when all bits to be cleared are consecutive.
> 
> On 64 bits we don't have any equivalent single operation for clearing,
> single bits or a few bits, we'd need two 'rldicl' so it is not
> worth it, the li/andc sequence is doing the same.
> 
> With this patch we get:
> 
>   c0009fe0:   7d 00 50 28 lwarx   r8,0,r10
>   c0009fe4:   61 08 08 00 ori r8,r8,2048
>   c0009fe8:   7d 00 51 2d stwcx.  r8,0,r10
> 
>   c000d558:   7c e0 40 28 lwarx   r7,0,r8
>   c000d55c:   54 e7 05 64 rlwinm  r7,r7,0,21,18
>   c000d560:   7c e0 41 2d stwcx.  r7,0,r8
> 
> On pmac32_defconfig, it reduces the text by approx 10 kbytes.
> 
> Signed-off-by: Christophe Leroy 
> Reviewed-by: Segher Boessenkool 
> ---
> v5: Fixed the argument of is_rlwinm_mask_valid() in test_and_clear_bits()
> 
> v4: Rebased
> 
> v3:
> - Using the mask validation proposed by Segher
> 
> v2:
> - Use "n" instead of "i" as constraint for the rlwinm mask
> - Improve mask verification to handle more than single bit masks
> 
> Signed-off-by: Christophe Leroy 
> ---
>   arch/powerpc/include/asm/bitops.h | 89 ---
>   1 file changed, 81 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/bitops.h 
> b/arch/powerpc/include/asm/bitops.h
> index 11847b6a244e..a05d8c62cbea 100644
> --- a/arch/powerpc/include/asm/bitops.h
> +++ b/arch/powerpc/include/asm/bitops.h
> @@ -71,19 +71,61 @@ static inline void fn(unsigned long mask, \
>   __asm__ __volatile__ (  \
>   prefix  \
>   "1:"PPC_LLARX "%0,0,%3,0\n" \
> - stringify_in_c(op) "%0,%0,%2\n" \
> + #op "%I2 %0,%0,%2\n"\
>   PPC_STLCX "%0,0,%3\n"   \
>   "bne- 1b\n" \
>   : "=&r" (old), "+m" (*p)\
> - : "r" (mask), "r" (p)   \
> + : "rK" (mask), "r" (p)  \
>   : "cc", "memory");  \
>   }
>   
>   DEFINE_BITOP(set_bits, or, "")
> -DEFINE_BITOP(clear_bits, andc, "")
> -DEFINE_BITOP(clear_bits_unlock, andc, PPC_RELEASE_BARRIER)
>   DEFINE_BITOP(change_bits, xor, "")
>   
> +static __always_inline bool is_rlwinm_mask_valid(unsigned long x)
> +{
> + if (!x)
> + return false;
> + if (x & 1)
> + x = ~x; // make the mask non-wrapping
> + x += x & -x;// adding the low set bit results in at most one bit set
> +
> + return !(x & (x - 1));
> +}
> +
> +#define DEFINE_CLROP(fn, prefix) \
> +static inline void fn(unsigned long mask, volatile unsigned long *_p)
> \
> +{\
> + unsigned long old;  \
> + unsigned long *p = (unsigned long *)_p; \
> + \
> + if (IS_ENABLED(CONFIG_PPC32) && \
> + __builtin_constant_p(mask) && is_rlwinm_mask_valid(~mask)) {\
> + asm volatile (  \
> + prefix  \
> + "1:""lwarx  %0,0,%3\n"  \
> + "rlwinm %0,%0,0,%2\n"   \
> + "stwcx. %0,0,%3\n"  \
> + "bne- 1b\n" \
> + : "=&r" (old), "+m" (*p)\
> + : "n" (~mask), "r" (p)  \
> + : "cc", "memory");  \
> + } else {\
> + asm volatile (  \
> + prefix  \
> 

Re: [PATCH] recordmcount: Support empty section from recent binutils

2021-11-26 Thread LEROY Christophe


Le 24/11/2021 à 15:43, Christophe Leroy a écrit :
> Looks like recent binutils (2.36 and over ?) may empty some section,
> leading to failure like:
> 
>   Cannot find symbol for section 11: .text.unlikely.
>   kernel/kexec_file.o: failed
>   make[1]: *** [scripts/Makefile.build:287: kernel/kexec_file.o] Error 1
> 
> In order to avoid that, ensure that the section has content before
> returning its name in has_rel_mcount().

This patch doesn't work; on PPC32 I get the following message with it 
applied:

[0.00] ftrace: No functions to be traced?

Without the patch I get:

[0.00] ftrace: allocating 22381 entries in 66 pages
[0.00] ftrace: allocated 66 pages with 2 groups

Christophe

> 
> Suggested-by: Steven Rostedt 
> Link: https://github.com/linuxppc/issues/issues/388
> Link: https://lore.kernel.org/all/20210215162209.5e2a4...@gandalf.local.home/
> Signed-off-by: Christophe Leroy 
> ---
>   scripts/recordmcount.h | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/scripts/recordmcount.h b/scripts/recordmcount.h
> index 1e9baa5c4fc6..cc6600b729ae 100644
> --- a/scripts/recordmcount.h
> +++ b/scripts/recordmcount.h
> @@ -575,6 +575,8 @@ static char const *has_rel_mcount(Elf_Shdr const *const 
> relhdr,
> char const *const shstrtab,
> char const *const fname)
>   {
> + if (!shdr0->sh_size)
> + return NULL;
>   if (w(relhdr->sh_type) != SHT_REL && w(relhdr->sh_type) != SHT_RELA)
>   return NULL;
>   return __has_rel_mcount(relhdr, shdr0, shstrtab, fname);
> 

Re: [PATCH] powerpc/signal32: Use struct_group() to zero spe regs

2021-11-19 Thread LEROY Christophe


Le 18/11/2021 à 21:36, Kees Cook a écrit :
> In preparation for FORTIFY_SOURCE performing compile-time and run-time
> field bounds checking for memset(), avoid intentionally writing across
> neighboring fields.
> 
> Add a struct_group() for the spe registers so that memset() can correctly 
> reason
> about the size:
> 
> In function 'fortify_memset_chk',
> inlined from 'restore_user_regs.part.0' at 
> arch/powerpc/kernel/signal_32.c:539:3:
> >> include/linux/fortify-string.h:195:4: error: call to 
> '__write_overflow_field' declared with attribute warning: detected write 
> beyond size of field (1st parameter); maybe use struct_group()? 
> [-Werror=attribute-warning]
>   195 |__write_overflow_field();
>   |^~~~
> 
> Reported-by: kernel test robot 
> Signed-off-by: Kees Cook 

Reviewed-by: Christophe Leroy 

However, is it really worth adding that grouping? Wouldn't it be 
cleaner to handle evr[] and acc separately? Now that we are using the 
unsafe variants of get/put_user, performance wouldn't be impacted.

I have some doubts about things like:

unsafe_copy_to_user(&frame->mc_vregs, current->thread.evr,
ELF_NEVRREG * sizeof(u32), failed);

Because as far as I can see, ELF_NEVRREG is 34 but mc_vregs is a table 
of 33 u32s and sits at the end of the structure:

struct mcontext {
elf_gregset_t   mc_gregs;
elf_fpregset_t  mc_fregs;
unsigned long   mc_pad[2];
elf_vrregset_t  mc_vregs __attribute__((__aligned__(16)));
};

typedef elf_vrreg_t elf_vrregset_t[ELF_NVRREG];

# define ELF_NEVRREG34  /* includes acc (as 2) */
# define ELF_NVRREG 33  /* includes vscr */



> ---
>   arch/powerpc/include/asm/processor.h |  6 --
>   arch/powerpc/kernel/signal_32.c  | 14 +-
>   2 files changed, 13 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/processor.h 
> b/arch/powerpc/include/asm/processor.h
> index e39bd0ff69f3..978a80308466 100644
> --- a/arch/powerpc/include/asm/processor.h
> +++ b/arch/powerpc/include/asm/processor.h
> @@ -191,8 +191,10 @@ struct thread_struct {
>   int used_vsr;   /* set if process has used VSX */
>   #endif /* CONFIG_VSX */
>   #ifdef CONFIG_SPE
> - unsigned long   evr[32];/* upper 32-bits of SPE regs */
> - u64 acc;/* Accumulator */
> + struct_group(spe,
> + unsigned long   evr[32];/* upper 32-bits of SPE regs */
> + u64 acc;/* Accumulator */
> + );
>   unsigned long   spefscr;/* SPE & eFP status */
>   unsigned long   spefscr_last;   /* SPEFSCR value on last prctl
>  call or trap return */
> diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
> index 00a9c9cd6d42..5e1664b501e4 100644
> --- a/arch/powerpc/kernel/signal_32.c
> +++ b/arch/powerpc/kernel/signal_32.c
> @@ -527,16 +527,20 @@ static long restore_user_regs(struct pt_regs *regs,
>   regs_set_return_msr(regs, regs->msr & ~(MSR_FP | MSR_FE0 | MSR_FE1));
>   
>   #ifdef CONFIG_SPE
> - /* force the process to reload the spe registers from
> -current->thread when it next does spe instructions */
> + /*
> +  * Force the process to reload the spe registers from
> +  * current->thread when it next does spe instructions.
> +  * Since this is user ABI, we must enforce the sizing.
> +  */
> + BUILD_BUG_ON(sizeof(current->thread.spe) != ELF_NEVRREG * sizeof(u32));
>   regs_set_return_msr(regs, regs->msr & ~MSR_SPE);
>   if (msr & MSR_SPE) {
>   /* restore spe registers from the stack */
> - unsafe_copy_from_user(current->thread.evr, &sr->mc_vregs,
> -   ELF_NEVRREG * sizeof(u32), failed);
> + unsafe_copy_from_user(&current->thread.spe, &sr->mc_vregs,
> +   sizeof(current->thread.spe), failed);
>   current->thread.used_spe = true;
>   } else if (current->thread.used_spe)
> - memset(current->thread.evr, 0, ELF_NEVRREG * sizeof(u32));
> + memset(&current->thread.spe, 0, sizeof(current->thread.spe));
>   
>   /* Always get SPEFSCR back */
>   unsafe_get_user(current->thread.spefscr, (u32 __user *)&sr->mc_vregs + 
> ELF_NEVRREG, failed);
> 

Re: [RESEND PATCH v5 0/4] Add perf interface to expose nvdimm

2021-11-16 Thread LEROY Christophe
Hi

Le 16/11/2021 à 05:49, Kajol Jain a écrit :
> Patchset adds performance stats reporting support for nvdimm.
> Added interface includes support for pmu register/unregister
> functions. A structure is added called nvdimm_pmu to be used for
> adding arch/platform specific data such as cpumask, nvdimm device
> pointer and pmu event functions like event_init/add/read/del.
> User could use the standard perf tool to access perf events
> exposed via pmu.
> 
> Interface also defines supported event list, config fields for the
> event attributes and their corresponding bit values which are exported
> via sysfs. Patch 3 exposes IBM pseries platform nmem* device
> performance stats using this interface.

Are you resending your v5 series? Is there any news since your submission 
last month, which is awaiting in patchwork here:
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=264422 ?

Christophe


> 
> Result from power9 pseries lpar with 2 nvdimm device:
> 
> Ex: List all event by perf list
> 
> command:# perf list nmem
> 
>nmem0/cache_rh_cnt/[Kernel PMU event]
>nmem0/cache_wh_cnt/[Kernel PMU event]
>nmem0/cri_res_util/[Kernel PMU event]
>nmem0/ctl_res_cnt/ [Kernel PMU event]
>nmem0/ctl_res_tm/  [Kernel PMU event]
>nmem0/fast_w_cnt/  [Kernel PMU event]
>nmem0/host_l_cnt/  [Kernel PMU event]
>nmem0/host_l_dur/  [Kernel PMU event]
>nmem0/host_s_cnt/  [Kernel PMU event]
>nmem0/host_s_dur/  [Kernel PMU event]
>nmem0/med_r_cnt/   [Kernel PMU event]
>nmem0/med_r_dur/   [Kernel PMU event]
>nmem0/med_w_cnt/   [Kernel PMU event]
>nmem0/med_w_dur/   [Kernel PMU event]
>nmem0/mem_life/[Kernel PMU event]
>nmem0/poweron_secs/[Kernel PMU event]
>...
>nmem1/mem_life/[Kernel PMU event]
>nmem1/poweron_secs/[Kernel PMU event]
> 
> Patch1:
>  Introduces the nvdimm_pmu structure
> Patch2:
>  Adds common interface to add arch/platform specific data
>  includes nvdimm device pointer, pmu data along with
>  pmu event functions. It also defines supported event list
>  and adds attribute groups for format, events and cpumask.
>  It also adds code for cpu hotplug support.
> Patch3:
>  Add code in arch/powerpc/platform/pseries/papr_scm.c to expose
>  nmem* pmu. It fills in the nvdimm_pmu structure with pmu name,
>  capabilities, cpumask and event functions and then registers
>  the pmu by adding callbacks to register_nvdimm_pmu.
> Patch4:
>  Sysfs documentation patch
> 
> Changelog
> ---
> v4 -> v5:
> - Remove multiple variables defined in nvdimm_pmu structure include
>name and pmu functions(event_int/add/del/read) as they are just
>used to copy them again in pmu variable. Now we are directly doing
>this step in arch specific code as suggested by Dan Williams.
> 
> - Remove attribute group field from nvdimm pmu structure and
>defined these attribute groups in common interface which
>includes format, event list along with cpumask as suggested by
>Dan Williams.
>Since we added static defination for attrbute groups needed in
>common interface, removes corresponding code from papr.
> 
> - Add nvdimm pmu event list with event codes in the common interface.
> 
> - Remove Acked-by/Reviewed-by/Tested-by tags as code is refactored
>to handle review comments from Dan.
> 
> - Make nvdimm_pmu_free_hotplug_memory function static as reported
>by kernel test robot, also add corresponding Reported-by tag.
> 
> - Link to the patchset v4: https://lkml.org/lkml/2021/9/3/45
> 
> v3 -> v4
> - Rebase code on top of current papr_scm code without any logical
>changes.
> 
> - Added Acked-by tag from Peter Zijlstra and Reviewed by tag
>from Madhavan Srinivasan.
> 
> - Link to the patchset v3: https://lkml.org/lkml/2021/6/17/605
> 
> v2 -> v3
> - Added Tested-by tag.
> 
> - Fix nvdimm mailing list in the ABI Documentation.
> 
> - Link to the patchset v2: https://lkml.org/lkml/2021/6/14/25
> 
> v1 -> v2
> - Fix hotplug code by adding pmu migration call
>incase current designated cpu got offline. As
>pointed by Peter Zijlstra.
> 
> - Removed the retun -1 part from cpu hotplug offline
>function.
> 
> - Link to the patchset v1: https://lkml.org/lkml/2021/6/8/500
> 
> Kajol Jain (4):
>drivers/nvdimm: Add nvdimm pmu structure
>drivers/nvdimm: Add perf interface 

Re: [PATCH v1 1/1] soc: fsl: Replace kernel.h with the necessary inclusions

2021-10-29 Thread LEROY Christophe


Le 29/10/2021 à 17:55, Andy Shevchenko a écrit :
> On Wed, Oct 27, 2021 at 06:33:54PM +0300, Andy Shevchenko wrote:
>> When kernel.h is used in the headers it adds a lot into dependency hell,
>> especially when there are circular dependencies are involved.
>>
>> Replace kernel.h inclusion with the list of what is really being used.
> 
> Seems nobody from PPC took this patch.
> Any idea who can take it?
> 

You have to check the MAINTAINERS file in the root directory of the 
kernel sources: https://github.com/linuxppc/linux/blob/master/MAINTAINERS

Michael is the one who takes such patches, but you have to allow him 
enough time for it.

Christophe

Re: [PATCH] powerpc/64s: Default to 64K pages for 64 bit book3s

2021-10-14 Thread LEROY Christophe


Le 14/10/2021 à 01:31, Joel Stanley a écrit :
> For 64-bit book3s the default should be 64K as that's what modern CPUs
> are designed for.
> 
> The following defconfigs already set CONFIG_PPC_64K_PAGES:
> 
>   cell_defconfig
>   pasemi_defconfig
>   powernv_defconfig
>   ppc64_defconfig
>   pseries_defconfig
>   skiroot_defconfig
> 
> They have the option removed from the defconfig, as it is now the
> default.
> 
> The defconfigs that now need to set CONFIG_PPC_4K_PAGES to maintain
> their existing behaviour are:
> 
>   g5_defconfig
>   maple_defconfig
>   microwatt_defconfig
>   ps3_defconfig
> 
> Link: https://github.com/linuxppc/issues/issues/109
> Signed-off-by: Joel Stanley 
> ---
>   arch/powerpc/Kconfig | 1 +
>   arch/powerpc/configs/cell_defconfig  | 1 -
>   arch/powerpc/configs/g5_defconfig| 1 +
>   arch/powerpc/configs/maple_defconfig | 1 +
>   arch/powerpc/configs/microwatt_defconfig | 2 +-
>   arch/powerpc/configs/pasemi_defconfig| 1 -
>   arch/powerpc/configs/powernv_defconfig   | 1 -
>   arch/powerpc/configs/ppc64_defconfig | 1 -
>   arch/powerpc/configs/ps3_defconfig   | 1 +
>   arch/powerpc/configs/pseries_defconfig   | 1 -
>   arch/powerpc/configs/skiroot_defconfig   | 1 -
>   11 files changed, 5 insertions(+), 7 deletions(-)
> 

> diff --git a/arch/powerpc/configs/microwatt_defconfig 
> b/arch/powerpc/configs/microwatt_defconfig
> index 9465209b8c5b..556ec5eec684 100644
> --- a/arch/powerpc/configs/microwatt_defconfig
> +++ b/arch/powerpc/configs/microwatt_defconfig
> @@ -1,7 +1,6 @@
>   # CONFIG_SWAP is not set
>   # CONFIG_CROSS_MEMORY_ATTACH is not set
>   CONFIG_HIGH_RES_TIMERS=y
> -CONFIG_PREEMPT_VOLUNTARY=y

This seems unrelated.

>   CONFIG_TICK_CPU_ACCOUNTING=y
>   CONFIG_LOG_BUF_SHIFT=16
>   CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=12
> @@ -26,6 +25,7 @@ CONFIG_PPC_MICROWATT=y
>   # CONFIG_PPC_OF_BOOT_TRAMPOLINE is not set
>   CONFIG_CPU_FREQ=y
>   CONFIG_HZ_100=y
> +CONFIG_PPC_4K_PAGES=y
>   # CONFIG_PPC_MEM_KEYS is not set
>   # CONFIG_SECCOMP is not set
>   # CONFIG_MQ_IOSCHED_KYBER is not set

Re: [PATCH] powerpc: don't select KFENCE on platform PPC_FSL_BOOK3E

2021-10-11 Thread LEROY Christophe


Le 12/10/2021 à 03:43, Liu Shixin a écrit :
> kindly ping.

Hi

Based on the discussion we had, this patch is not enough. It should at 
least also deactivate DEBUG_PAGEALLOC.

However, I'm looking at fixing it the other way round. Give me a week 
or two.

Christophe

> 
> 
> On 2021/9/24 14:39, Liu Shixin wrote:
>> On platform PPC_FSL_BOOK3E, all lowmem is managed by tlbcam. That means
>> we didn't really map the kfence pool with page granularity. Therefore,
>> if KFENCE is enabled, the system will hit the following panic:
>>
>>  BUG: Kernel NULL pointer dereference on read at 0x
>>  Faulting instruction address: 0xc01de598
>>  Oops: Kernel access of bad area, sig: 11 [#1]
>>  BE PAGE_SIZE=4K SMP NR_CPUS=4 MPC8544 DS
>>  Dumping ftrace buffer:
>> (ftrace buffer empty)
>>  Modules linked in:
>>  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.12.0-rc3+ #298
>>  NIP:  c01de598 LR: c08ae9c4 CTR: 
>>  REGS: c0b4bea0 TRAP: 0300   Not tainted  (5.12.0-rc3+)
>>  MSR:  00021000   CR: 24000228  XER: 2000
>>  DEAR:  ESR: 
>>  GPR00: c08ae9c4 c0b4bf60 c0ad64e0 ef72 00021000   
>> 0200
>>  GPR08: c0ad5000   0004  008fbb30  
>> 
>>  GPR16:     c000   
>> 
>>  GPR24: c08ca004 c08ca004 c0b6a0e0 c0b6 c0b58f00 c085 c08ca000 
>> ef72
>>  NIP [c01de598] kfence_protect+0x44/0x6c
>>  LR [c08ae9c4] kfence_init+0xfc/0x2a4
>>  Call Trace:
>>  [c0b4bf60] [efffe160] 0xefffe160 (unreliable)
>>  [c0b4bf70] [c08ae9c4] kfence_init+0xfc/0x2a4
>>  [c0b4bfb0] [c0894d3c] start_kernel+0x3bc/0x574
>>  [c0b4bff0] [c470] set_ivor+0x14c/0x188
>>  Instruction dump:
>>  7c0802a6 8109d594 546a653a 90010014 54630026 3920 7d48502e 2c0a
>>  41820010 554a0026 5469b53a 7d295214 <8149> 38831000 554a003c 
>> 9149
>>  random: get_random_bytes called from print_oops_end_marker+0x40/0x78 
>> with crng_init=0
>>  ---[ end trace  ]---
>>
>> Signed-off-by: Liu Shixin 
>> ---
>>   arch/powerpc/Kconfig | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> index d46db0bfb998..cffd57bcb5e4 100644
>> --- a/arch/powerpc/Kconfig
>> +++ b/arch/powerpc/Kconfig
>> @@ -185,7 +185,7 @@ config PPC
>>  select HAVE_ARCH_KASAN  if PPC32 && PPC_PAGE_SHIFT <= 14
>>  select HAVE_ARCH_KASAN_VMALLOC  if PPC32 && PPC_PAGE_SHIFT <= 14
>>  select HAVE_ARCH_KGDB
>> -select HAVE_ARCH_KFENCE if PPC32
>> +select HAVE_ARCH_KFENCE if PPC32 && !PPC_FSL_BOOK3E
>>  select HAVE_ARCH_MMAP_RND_BITS
>>  select HAVE_ARCH_MMAP_RND_COMPAT_BITS   if COMPAT
>>  select HAVE_ARCH_NVRAM_OPS
> 

Re: [PATCH v2 02/10] powerpc/bpf: Validate branch ranges

2021-10-05 Thread LEROY Christophe


On 05/10/2021 at 22:25, Naveen N. Rao wrote:
> Add checks to ensure that we never emit branch instructions with
> truncated branch offsets.
> 
> Acked-by: Song Liu 
> Acked-by: Johan Almbladh 
> Tested-by: Johan Almbladh 
> Suggested-by: Michael Ellerman 
> Signed-off-by: Naveen N. Rao 

Reviewed-by: Christophe Leroy 

> ---
>   arch/powerpc/net/bpf_jit.h| 26 --
>   arch/powerpc/net/bpf_jit_comp.c   |  6 +-
>   arch/powerpc/net/bpf_jit_comp32.c |  8 ++--
>   arch/powerpc/net/bpf_jit_comp64.c |  8 ++--
>   4 files changed, 37 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
> index 935ea95b66359e..7e9b978b768ed9 100644
> --- a/arch/powerpc/net/bpf_jit.h
> +++ b/arch/powerpc/net/bpf_jit.h
> @@ -24,16 +24,30 @@
>   #define EMIT(instr) PLANT_INSTR(image, ctx->idx, instr)
>   
>   /* Long jump; (unconditional 'branch') */
> -#define PPC_JMP(dest)EMIT(PPC_INST_BRANCH |  
>   \
> -  (((dest) - (ctx->idx * 4)) & 0x03fc))
> +#define PPC_JMP(dest)
>   \
> + do {  \
> + long offset = (long)(dest) - (ctx->idx * 4);  \
> + if (!is_offset_in_branch_range(offset)) { \
> + pr_err_ratelimited("Branch offset 0x%lx (@%u) out of 
> range\n", offset, ctx->idx);   \
> + return -ERANGE;   \
> + } \
> + EMIT(PPC_INST_BRANCH | (offset & 0x03fc));\
> + } while (0)
> +
>   /* blr; (unconditional 'branch' with link) to absolute address */
>   #define PPC_BL_ABS(dest)EMIT(PPC_INST_BL |\
>(((dest) - (unsigned long)(image + 
> ctx->idx)) & 0x03fc))
>   /* "cond" here covers BO:BI fields. */
> -#define PPC_BCC_SHORT(cond, dest)EMIT(PPC_INST_BRANCH_COND |   \
> -  (((cond) & 0x3ff) << 16) |   \
> -  (((dest) - (ctx->idx * 4)) & \
> -   0xfffc))
> +#define PPC_BCC_SHORT(cond, dest)  \
> + do {  \
> + long offset = (long)(dest) - (ctx->idx * 4);  \
> + if (!is_offset_in_cond_branch_range(offset)) {\
> + pr_err_ratelimited("Conditional branch offset 0x%lx 
> (@%u) out of range\n", offset, ctx->idx);   \
> + return -ERANGE;   \
> + } \
> + EMIT(PPC_INST_BRANCH_COND | (((cond) & 0x3ff) << 16) | (offset 
> & 0xfffc));  \
> + } while (0)
> +
>   /* Sign-extended 32-bit immediate load */
>   #define PPC_LI32(d, i)  do {
>   \
>   if ((int)(uintptr_t)(i) >= -32768 &&  \
> diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
> index 53aefee3fe70be..fcbf7a917c566e 100644
> --- a/arch/powerpc/net/bpf_jit_comp.c
> +++ b/arch/powerpc/net/bpf_jit_comp.c
> @@ -210,7 +210,11 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>   /* Now build the prologue, body code & epilogue for real. */
>   cgctx.idx = 0;
>   bpf_jit_build_prologue(code_base, );
> - bpf_jit_build_body(fp, code_base, , addrs, extra_pass);
> + if (bpf_jit_build_body(fp, code_base, , addrs, 
> extra_pass)) {
> + bpf_jit_binary_free(bpf_hdr);
> + fp = org_fp;
> + goto out_addrs;
> + }
>   bpf_jit_build_epilogue(code_base, );
>   
>   if (bpf_jit_enable > 1)
> diff --git a/arch/powerpc/net/bpf_jit_comp32.c 
> b/arch/powerpc/net/bpf_jit_comp32.c
> index beb12cbc8c2994..a74d52204f8da2 100644
> --- a/arch/powerpc/net/bpf_jit_comp32.c
> +++ b/arch/powerpc/net/bpf_jit_comp32.c
> @@ -200,7 +200,7 @@ void bpf_jit_emit_func_call_rel(u32 *image, struct 
> codegen_context *ctx, u64 fun
>   }
>   }
>   
> -static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, 
> u32 out)
> +static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, 
> u32 out)
>   {
>   /*
>* By now, the eBPF program has already setup parameters in r3-r6
> @@ -261,7 +261,9 @@ static void bpf_jit_emit_tail_call(u32 *image, struct 
> codegen_context *ctx, u32
>   

Re: [PATCH v4 6/8] bpf ppc64: Access only if addr is kernel address

2021-09-29 Thread LEROY Christophe


On 29/09/2021 at 13:18, Hari Bathini wrote:
> From: Ravi Bangoria 
> 
> On PPC64 with KUAP enabled, any kernel code which wants to
> access userspace needs to be surrounded by disable-enable KUAP.
> But that is not happening for BPF_PROBE_MEM load instruction.
> So, when BPF program tries to access invalid userspace address,
> page-fault handler considers it as bad KUAP fault:
> 
>Kernel attempted to read user page (d000) - exploit attempt? (uid: 0)
> 
> Considering the fact that PTR_TO_BTF_ID (which uses BPF_PROBE_MEM
> mode) could either be a valid kernel pointer or NULL but should
> never be a pointer to userspace address, execute BPF_PROBE_MEM load
> only if addr is kernel address, otherwise set dst_reg=0 and move on.
> 
> This will catch NULL, valid or invalid userspace pointers. Only bad
> kernel pointer will be handled by BPF exception table.
> 
> [Alexei suggested for x86]
> Suggested-by: Alexei Starovoitov 
> Signed-off-by: Ravi Bangoria 
> Signed-off-by: Hari Bathini 

Reviewed-by: Christophe Leroy 

> ---
> 
> Changes in v4:
> * Used IS_ENABLED() instead of #ifdef.
> * Dropped the else case that is not applicable for PPC64.
> 
> 
>   arch/powerpc/net/bpf_jit_comp64.c | 26 ++
>   1 file changed, 26 insertions(+)
> 
> diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
> b/arch/powerpc/net/bpf_jit_comp64.c
> index 4170999371ee..e1ea64081ae1 100644
> --- a/arch/powerpc/net/bpf_jit_comp64.c
> +++ b/arch/powerpc/net/bpf_jit_comp64.c
> @@ -727,6 +727,32 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
> struct codegen_context *
>   /* dst = *(u64 *)(ul) (src + off) */
>   case BPF_LDX | BPF_MEM | BPF_DW:
>   case BPF_LDX | BPF_PROBE_MEM | BPF_DW:
> + /*
> +  * As PTR_TO_BTF_ID that uses BPF_PROBE_MEM mode could 
> either be a valid
> +  * kernel pointer or NULL but not a userspace address, 
> execute BPF_PROBE_MEM
> +  * load only if addr is kernel address (see 
> is_kernel_addr()), otherwise
> +  * set dst_reg=0 and move on.
> +  */
> + if (BPF_MODE(code) == BPF_PROBE_MEM) {
> + EMIT(PPC_RAW_ADDI(b2p[TMP_REG_1], src_reg, 
> off));
> + if (IS_ENABLED(CONFIG_PPC_BOOK3E_64))
> + PPC_LI64(b2p[TMP_REG_2], 
> 0x8000ul);
> + else /* BOOK3S_64 */
> + PPC_LI64(b2p[TMP_REG_2], PAGE_OFFSET);
> + EMIT(PPC_RAW_CMPLD(b2p[TMP_REG_1], 
> b2p[TMP_REG_2]));
> + PPC_BCC(COND_GT, (ctx->idx + 4) * 4);
> + EMIT(PPC_RAW_LI(dst_reg, 0));
> + /*
> +  * Check if 'off' is word aligned because 
> PPC_BPF_LL()
> +  * (BPF_DW case) generates two instructions if 
> 'off' is not
> +  * word-aligned and one instruction otherwise.
> +  */
> + if (BPF_SIZE(code) == BPF_DW && (off & 3))
> + PPC_JMP((ctx->idx + 3) * 4);
> + else
> + PPC_JMP((ctx->idx + 2) * 4);
> + }
> +
>   switch (size) {
>   case BPF_B:
>   EMIT(PPC_RAW_LBZ(dst_reg, src_reg, off));
> 

Re: [PATCH v2 4/8] powerpc/ppc-opcode: introduce PPC_RAW_BRANCH() macro

2021-09-17 Thread LEROY Christophe


On 17/09/2021 at 17:30, Hari Bathini wrote:
> Define and use PPC_RAW_BRANCH() macro instead of open coding it. This
> macro is used while adding BPF_PROBE_MEM support.
> 
> Signed-off-by: Hari Bathini 

Reviewed-by: Christophe Leroy 

> ---
> 
> Changes in v2:
> * New patch to introduce PPC_RAW_BRANCH() macro.
> 
> 
>   arch/powerpc/include/asm/ppc-opcode.h | 2 ++
>   arch/powerpc/net/bpf_jit.h| 4 ++--
>   2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
> b/arch/powerpc/include/asm/ppc-opcode.h
> index baea657bc868..f50213e2a3e0 100644
> --- a/arch/powerpc/include/asm/ppc-opcode.h
> +++ b/arch/powerpc/include/asm/ppc-opcode.h
> @@ -566,6 +566,8 @@
>   #define PPC_RAW_MTSPR(spr, d)   (0x7c0003a6 | ___PPC_RS(d) | 
> __PPC_SPR(spr))
>   #define PPC_RAW_EIEIO() (0x7c0006ac)
>   
> +#define PPC_RAW_BRANCH(addr) (PPC_INST_BRANCH | ((addr) & 
> 0x03fc))
> +
>   /* Deal with instructions that older assemblers aren't aware of */
>   #define PPC_BCCTR_FLUSH stringify_in_c(.long 
> PPC_INST_BCCTR_FLUSH)
>   #define PPC_CP_ABORTstringify_in_c(.long PPC_RAW_CP_ABORT)
> diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
> index 411c63d945c7..0c8f885b8f48 100644
> --- a/arch/powerpc/net/bpf_jit.h
> +++ b/arch/powerpc/net/bpf_jit.h
> @@ -24,8 +24,8 @@
>   #define EMIT(instr) PLANT_INSTR(image, ctx->idx, instr)
>   
>   /* Long jump; (unconditional 'branch') */
> -#define PPC_JMP(dest)EMIT(PPC_INST_BRANCH |  
>   \
> -  (((dest) - (ctx->idx * 4)) & 0x03fc))
> +#define PPC_JMP(dest)EMIT(PPC_RAW_BRANCH((dest) - (ctx->idx 
> * 4)))
> +
>   /* blr; (unconditional 'branch' with link) to absolute address */
>   #define PPC_BL_ABS(dest)EMIT(PPC_INST_BL |\
>(((dest) - (unsigned long)(image + 
> ctx->idx)) & 0x03fc))
> 

RE: [PATCH v3] ftrace: Cleanup ftrace_dyn_arch_init()

2021-09-07 Thread LEROY Christophe



> -Original Message-
> From: Linuxppc-dev  bounces+christophe.leroy=csgroup...@lists.ozlabs.org> On behalf of Weizhao
> Ouyang
>
> Most of ARCHs use empty ftrace_dyn_arch_init(), introduce a weak common
> ftrace_dyn_arch_init() to cleanup them.
>
> Signed-off-by: Weizhao Ouyang 
> Acked-by: Heiko Carstens  (s390)
> Acked-by: Helge Deller  (parisc)
>
> ---
> Changes in v3:
> -- fix unrecognized opcode on PowerPC
>
> Changes in v2:
> -- correct CONFIG_DYNAMIC_FTRACE on PowerPC
> -- add Acked-by tag
>
> ---
>  arch/arm/kernel/ftrace.c  | 5 -
>  arch/arm64/kernel/ftrace.c| 5 -
>  arch/csky/kernel/ftrace.c | 5 -
>  arch/ia64/kernel/ftrace.c | 6 --
>  arch/microblaze/kernel/ftrace.c   | 5 -
>  arch/mips/include/asm/ftrace.h| 2 ++
>  arch/nds32/kernel/ftrace.c| 5 -
>  arch/parisc/kernel/ftrace.c   | 5 -
>  arch/powerpc/include/asm/ftrace.h | 4 
>  arch/riscv/kernel/ftrace.c| 5 -
>  arch/s390/kernel/ftrace.c | 5 -
>  arch/sh/kernel/ftrace.c   | 5 -
>  arch/sparc/kernel/ftrace.c| 5 -
>  arch/x86/kernel/ftrace.c  | 5 -
>  include/linux/ftrace.h| 1 -
>  kernel/trace/ftrace.c | 5 +
>  16 files changed, 11 insertions(+), 62 deletions(-)
>
> diff --git a/arch/mips/include/asm/ftrace.h b/arch/mips/include/asm/ftrace.h
> index b463f2aa5a61..ed013e767390 100644
> --- a/arch/mips/include/asm/ftrace.h
> +++ b/arch/mips/include/asm/ftrace.h
> @@ -76,6 +76,8 @@ do {\
>
>
>  #ifdef CONFIG_DYNAMIC_FTRACE
> +int __init ftrace_dyn_arch_init(void);
> +

Why?


>  static inline unsigned long ftrace_call_adjust(unsigned long addr)
>  {
>   return addr;
> diff --git a/arch/powerpc/include/asm/ftrace.h
> b/arch/powerpc/include/asm/ftrace.h
> index debe8c4f7062..b05c43f13a4d 100644
> --- a/arch/powerpc/include/asm/ftrace.h
> +++ b/arch/powerpc/include/asm/ftrace.h
> @@ -126,6 +126,10 @@ static inline void this_cpu_enable_ftrace(void) { }
>  static inline void this_cpu_set_ftrace_enabled(u8 ftrace_enabled) { }
>  static inline u8 this_cpu_get_ftrace_enabled(void) { return 1; }
>  #endif /* CONFIG_PPC64 */
> +
> +#ifdef CONFIG_DYNAMIC_FTRACE
> +int __init ftrace_dyn_arch_init(void);
> +#endif /* CONFIG_DYNAMIC_FTRACE */

Why?

>  #endif /* !__ASSEMBLY__ */
>
>  #endif /* _ASM_POWERPC_FTRACE */
> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index 832e65f06754..f1eca123d89d 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -573,7 +573,6 @@ ftrace_set_early_filter(struct ftrace_ops *ops, char
> *buf, int enable);
>
>  /* defined in arch */
>  extern int ftrace_ip_converted(unsigned long ip);
> -extern int ftrace_dyn_arch_init(void);

Why remove that?

Have you tried to build kernel/trace/ftrace.o with C=2? It will likely tell 
you that the function is not declared and that it should be static.

We could arguably consider that in the past this generic declaration was 
irrelevant because the definitions were in the arch-specific sections.
Now that you are implementing a generic weak version of this function, it would 
make sense to have a generic declaration as well.

I really don't see the point in duplicating the declaration of the function in 
the arch-specific headers.

>  extern void ftrace_replace_code(int enable);
>  extern int ftrace_update_ftrace_func(ftrace_func_t func);
>  extern void ftrace_caller(void);

Christophe

CS Group - Internal Document


Re: [PATCH v3 31/41] powerpc/32: Dismantle EXC_XFER_STD/LITE/TEMPLATE

2021-08-02 Thread LEROY Christophe




On 01/08/2021 at 03:21, Finn Thain wrote:

On Sat, 31 Jul 2021, Christophe Leroy wrote:



Stan Johnson contacted me about a regression in mainline that he
observed on his G3 Powerbooks. Using 'git bisect' we determined that
this patch was the cause of the regression, i.e. commit 4c0104a83fc3
("powerpc/32: Dismantle EXC_XFER_STD/LITE/TEMPLATE").

When testing 4c0104a83fc and all subsequent builds, various user
processes were liable to segfault. Here is the console log that Stan
provided:


Hi, I will be able to look at that in more detail next week; however I
have a few preliminary questions.

Can you reliably reproduce the problem with the said commit, and can
you reliably run without problems with the parent commit?


Yes and yes. (I already asked Stan to establish those things before I
contacted the list.)


I think I found the problem with that commit. Can you retry with the 
following change:


diff --git a/arch/powerpc/kernel/head_book3s_32.S 
b/arch/powerpc/kernel/head_book3s_32.S

index 0a3d7d4a9ec4..a294103a91a1 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -299,7 +299,7 @@ ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_HPTE_TABLE)
EXCEPTION_PROLOG_1
EXCEPTION_PROLOG_2 0x300 DataAccess handle_dar_dsisr=1
prepare_transfer_to_handler
-   lwz r5, _DSISR(r11)
+   lwz r5, _DSISR(r1)
andis.  r0, r5, DSISR_DABRMATCH@h
bne-1f
bl  do_page_fault
---
Thanks
Christophe


Re: [BISECTED] kexec regression on PowerBook G4

2019-05-22 Thread LEROY Christophe

Aaro Koskinen  wrote:


Hi,

On Wed, May 22, 2019 at 07:44:56AM +, Christophe Leroy wrote:

On 05/22/2019 06:14 AM, Christophe Leroy wrote:
>On 22/05/2019 at 00:18, Aaro Koskinen wrote:
>>I was trying to upgrade from v5.0 -> v5.1 on PowerBook G4, but when
>>trying
>>to kexec a kernel the system gets stuck (no errors seen on the console).
>
>Do you mean you are trying to kexec a v5.1 kernel from a v5.0 kernel, or
>do you have a working v5.1 kernel, but kexec doesn't work with it ?
>
>>
>>Bisected to: 93c4a162b014 ("powerpc/6xx: Store PGDIR physical address
>>in a SPRG"). This commit doesn't revert cleanly anymore but I tested
>>that the one before works OK.
>
>Not sure that's the problem. There was a problem with that commit, but it
>was fixed by 4622a2d43101 ("powerpc/6xx: fix setup and use of
>SPRN_SPRG_PGDIR for hash32").
>You probably hit some commit between those two during bisect, that's
>likely the reason why you ended here.
>
>Can you restart your bisect from 4622a2d43101 ?
>
>If you have CONFIG_SMP, maybe you should also consider taking 397d2300b08c
>("powerpc/32s: fix flush_hash_pages() on SMP"). Stable 5.1.4 includes it.
>
>>
>>With current Linus HEAD (9c7db5004280), it gets a bit further but still
>>doesn't work: now I get an error on the console after kexec "Starting
>>new kernel! ... Bye!":
>>
>>kernel tried to execute exec-protected page (...) - exploit attempt?
>
>Interesting.
>
>Do you have CONFIG_STRICT_KERNEL_RWX=y in your .config ? If so, can you
>retry without it ?

After looking at the code, I don't thing CONFIG_STRICT_KERNEL_RWX will make
any difference. Can you try the patch below ?


Doesn't help (git refuses the patch as corrupted, so I had to do those
changes manually, but I'm pretty sure I got it right).

I still get the "kernel tried to execute exec-protected page...". What
should I try next?


Can you provide full details of the Oops you get? And also your System.map?
Also build with CONFIG_PPC_PTDUMP, mount /sys/kernel/debug, and give the  
content of /sys/kernel/debug/powerpc/block_address_translation and  
.../segment_registers before the failing kexec, and also  
/sys/kernel/debug/kernel_page_tables


Thx
Christophe



A.


From 8c1039da0d0f26cdf995156a905fc97fe7bda36c Mon Sep 17 00:00:00 2001
From: Christophe Leroy 
Date: Wed, 22 May 2019 07:28:42 +
Subject: [PATCH] Fix Kexec

---
 arch/powerpc/include/asm/pgtable.h | 2 ++
 arch/powerpc/kernel/machine_kexec_32.c | 4 
 arch/powerpc/mm/pgtable_32.c   | 2 +-
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/pgtable.h
b/arch/powerpc/include/asm/pgtable.h
index 3f53be60fb01..642eea937229 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -140,6 +140,8 @@ static inline void pte_frag_set(mm_context_t *ctx, void
*p)
 }
 #endif

+int change_page_attr(struct page *page, int numpages, pgprot_t prot);
+
 #endif /* __ASSEMBLY__ */

 #endif /* _ASM_POWERPC_PGTABLE_H */
diff --git a/arch/powerpc/kernel/machine_kexec_32.c
b/arch/powerpc/kernel/machine_kexec_32.c
index affe5dcce7f4..4f719501e6ae 100644
--- a/arch/powerpc/kernel/machine_kexec_32.c
+++ b/arch/powerpc/kernel/machine_kexec_32.c
@@ -54,6 +54,10 @@ void default_machine_kexec(struct kimage *image)
memcpy((void *)reboot_code_buffer, relocate_new_kernel,
relocate_new_kernel_size);

+   change_page_attr(image->control_code_page,
+ALIGN(KEXEC_CONTROL_PAGE_SIZE, PAGE_SIZE) >> 
PAGE_SHIFT,
+PAGE_KERNEL_TEXT);
+
flush_icache_range(reboot_code_buffer,
reboot_code_buffer + KEXEC_CONTROL_PAGE_SIZE);
printk(KERN_INFO "Bye!\n");
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 16ada373b32b..0e4651d803fc 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -340,7 +340,7 @@ static int __change_page_attr_noflush(struct page *page,
pgprot_t prot)
  *
  * THIS DOES NOTHING WITH BAT MAPPINGS, DEBUG USE ONLY
  */
-static int change_page_attr(struct page *page, int numpages, pgprot_t prot)
+int change_page_attr(struct page *page, int numpages, pgprot_t prot)
 {
int i, err = 0;
unsigned long flags;
--
2.13.3





Re: [PATCH] compiler: allow all arches to enable CONFIG_OPTIMIZE_INLINING

2019-03-23 Thread LEROY Christophe

Masahiro Yamada  wrote:


Commit 60a3cdd06394 ("x86: add optimized inlining") introduced
CONFIG_OPTIMIZE_INLINING, but it has been available only for x86.

The idea is obviously arch-agnostic although we need some code fixups.
This commit moves the config entry from arch/x86/Kconfig.debug to
lib/Kconfig.debug so that all architectures (except MIPS for now) can
benefit from it.

At this moment, I added "depends on !MIPS" because fixing the 0day bot reports
for MIPS was complex for me.

I tested this patch on my arm/arm64 boards.

This can make a huge difference in kernel image size especially when
CONFIG_OPTIMIZE_FOR_SIZE is enabled.

For example, I got 3.5% smaller arm64 kernel image for v5.1-rc1.

  dec   file
  18983424  arch/arm64/boot/Image.before
  18321920  arch/arm64/boot/Image.after

This also slightly improves the "Kernel hacking" Kconfig menu.
Commit e61aca5158a8 ("Merge branch 'kconfig-diet' from Dave Hansen')
mentioned this config option would be a good fit in the "compiler option"
menu. I did so.

I fixed up some files to avoid build warnings/errors.

[1] arch/arm64/include/asm/cpufeature.h

In file included from ././include/linux/compiler_types.h:68,
 from :
./arch/arm64/include/asm/jump_label.h: In function 'cpus_have_const_cap':
./include/linux/compiler-gcc.h:120:38: warning: asm operand 0  
probably doesn't match constraints

 #define asm_volatile_goto(x...) do { asm goto(x); asm (""); } while (0)
  ^~~
./arch/arm64/include/asm/jump_label.h:32:2: note: in expansion of  
macro 'asm_volatile_goto'

  asm_volatile_goto(
  ^
./include/linux/compiler-gcc.h:120:38: error: impossible constraint in 'asm'
 #define asm_volatile_goto(x...) do { asm goto(x); asm (""); } while (0)
  ^~~
./arch/arm64/include/asm/jump_label.h:32:2: note: in expansion of  
macro 'asm_volatile_goto'

  asm_volatile_goto(
  ^

[2] arch/mips/kernel/cpu-bugs64.c

arch/mips/kernel/cpu-bugs64.c: In function 'mult_sh_align_mod.constprop':
arch/mips/kernel/cpu-bugs64.c:33:2: error: asm operand 1 probably  
doesn't match constraints [-Werror]

  asm volatile(
  ^~~
arch/mips/kernel/cpu-bugs64.c:33:2: error: asm operand 1 probably  
doesn't match constraints [-Werror]

  asm volatile(
  ^~~
arch/mips/kernel/cpu-bugs64.c:33:2: error: impossible constraint in 'asm'
  asm volatile(
  ^~~
arch/mips/kernel/cpu-bugs64.c:33:2: error: impossible constraint in 'asm'
  asm volatile(
  ^~~

[3] arch/powerpc/mm/tlb-radix.c

arch/powerpc/mm/tlb-radix.c: In function '__radix__flush_tlb_range_psize':
arch/powerpc/mm/tlb-radix.c:104:2: error: asm operand 3 probably  
doesn't match constraints [-Werror]

  asm volatile(PPC_TLBIEL(%0, %4, %3, %2, %1)
  ^~~
arch/powerpc/mm/tlb-radix.c:104:2: error: impossible constraint in 'asm'
  CC  arch/powerpc/perf/hv-gpci.o

[4] arch/s390/include/asm/cpacf.h

In file included from arch/s390/crypto/des_s390.c:19:
./arch/s390/include/asm/cpacf.h: In function 'cpacf_query':
./arch/s390/include/asm/cpacf.h:170:2: warning: asm operand 3  
probably doesn't match constraints

  asm volatile(
  ^~~
./arch/s390/include/asm/cpacf.h:170:2: error: impossible constraint in 'asm'

[5] arch/powerpc/kernel/prom_init.c

WARNING: vmlinux.o(.text.unlikely+0x20): Section mismatch in  
reference from the function .prom_getprop() to the function  
.init.text:.call_prom()

The function .prom_getprop() references
the function __init .call_prom().
This is often because .prom_getprop lacks a __init
annotation or the annotation of .call_prom is wrong.

WARNING: vmlinux.o(.text.unlikely+0x3c): Section mismatch in  
reference from the function .prom_getproplen() to the function  
.init.text:.call_prom()

The function .prom_getproplen() references
the function __init .call_prom().
This is often because .prom_getproplen lacks a __init
annotation or the annotation of .call_prom is wrong.

[6] drivers/mtd/nand/raw/vf610_nfc.c

drivers/mtd/nand/raw/vf610_nfc.c: In function ‘vf610_nfc_cmd’:
drivers/mtd/nand/raw/vf610_nfc.c:455:3: warning: ‘offset’ may be  
used uninitialized in this function [-Wmaybe-uninitialized]

   vf610_nfc_rd_from_sram(instr->ctx.data.buf.in + offset,
   ^~~
nfc->regs + NFC_MAIN_AREA(0) + offset,
~~
trfr_sz, !nfc->data_access);
~~~

[7] arch/arm/kernel/smp.c

arch/arm/kernel/smp.c: In function ‘raise_nmi’:
arch/arm/kernel/smp.c:522:2: warning: array subscript is above array  
bounds [-Warray-bounds]

  trace_ipi_raise_rcuidle(target, ipi_types[ipinr]);
  ^

The fixup is not included in this. The patch is available in ML:

http://lists.infradead.org/pipermail/linux-arm-kernel/2016-February/409393.html

Signed-off-by: Masahiro Yamada 
---

 arch/arm64/include/asm/cpufeature.h |  4 ++--
 

Re: [PATCH] compiler: allow all arches to enable CONFIG_OPTIMIZE_INLINING

2019-03-23 Thread LEROY Christophe

Arnd Bergmann  wrote:


On Wed, Mar 20, 2019 at 10:41 AM Arnd Bergmann  wrote:


I've added your patch to my randconfig test setup and will let you
know if I see anything noticeable. I'm currently testing clang-arm32,
clang-arm64 and gcc-x86.


This is the only additional bug that has come up so far:

`.exit.text' referenced in section `.alt.smp.init' of
drivers/char/ipmi/ipmi_msghandler.o: defined in discarded section
`exit.text' of drivers/char/ipmi/ipmi_msghandler.o


Wouldn't it be useful to activate the -Winline gcc warning, to ease finding  
problematic usages of the inline keyword?


Christophe



diff --git a/arch/arm/kernel/atags.h b/arch/arm/kernel/atags.h
index 201100226301..84b12e33104d 100644
--- a/arch/arm/kernel/atags.h
+++ b/arch/arm/kernel/atags.h
@@ -5,7 +5,7 @@ void convert_to_tag_list(struct tag *tags);
 const struct machine_desc *setup_machine_tags(phys_addr_t __atags_pointer,
unsigned int machine_nr);
 #else
-static inline const struct machine_desc *
+static __always_inline const struct machine_desc *
 setup_machine_tags(phys_addr_t __atags_pointer, unsigned int machine_nr)
 {
early_print("no ATAGS support: can't continue\n");





Re: powerpc32 boot crash in 5.1-rc1

2019-03-22 Thread LEROY Christophe

Meelis Roos  wrote:

While 5.0.0 worked fine on my PowerMac G4, 5.0 + git (unknown rev  
as of Mar 13), 5.0.0-11520-gf261c4e and todays git all fail to boot.


The problem seems to be in page fault handler in load_elf_binary()  
of init process.


The patch at https://patchwork.ozlabs.org/patch/1053385/ should fix it


Tested it - yes, it fixes the boot.


Thanks for testing. It will be merged in 5.1-rc2

Christophe



--
Meelis Roos 





Re: powerpc32 boot crash in 5.1-rc1

2019-03-21 Thread LEROY Christophe

Meelis Roos  wrote:

While 5.0.0 worked fine on my PowerMac G4, 5.0 + git (unknown rev as  
of Mar 13), 5.0.0-11520-gf261c4e and todays git all fail to boot.


The problem seems to be in page fault handler in load_elf_binary()  
of init process.


The patch at https://patchwork.ozlabs.org/patch/1053385/ should fix it

Christophe



Two different screenshots are attached (let's see if they come through).

--
Meelis Roos 





Re: [PATCH v13 00/10] powerpc: Switch to CONFIG_THREAD_INFO_IN_TASK

2019-01-19 Thread LEROY Christophe

Michael Ellerman  wrote:


Christophe Leroy  writes:


The purpose of this serie is to activate CONFIG_THREAD_INFO_IN_TASK which
moves the thread_info into task_struct.

Moving thread_info into task_struct has the following advantages:
- It protects thread_info from corruption in the case of stack
overflows.
- Its address is harder to determine if stack addresses are
leaked, making a number of attacks more difficult.

Changes since v12:
 - Patch 1: Taken comment from Mike (re-introduced the 'panic' in  
case memblock allocation fails in setup_64.c
 - Patch 1: Added alloc_stack() function in setup_32.c to also  
panic in case of allocation failure.


Hi Christophe,

I can't get this series to boot on qemu mac99. I'm getting eg:


Is the problem new with version 13, or is it the first time you test?



[0.981514] NFS: Registering the id_resolver key type
[0.981752] Key type id_resolver registered
[0.981868] Key type id_legacy registered
[0.995711] Unrecoverable exception 0 at 0 (msr=0)
[0.996091] Oops: Unrecoverable exception, sig: 6 [#1]
[0.996314] BE PAGE_SIZE=4K MMU=Hash PowerMac
[0.996617] Modules linked in:
[0.996869] CPU: 0 PID: 416 Comm: modprobe Not tainted  
5.0.0-rc2-gcc-7.3.0-00043-g53f2de798792 #342


Comm: modprobe ==> Something wrong with modules? I never tested with  
CONFIG_MODULES.


Christophe


[0.997138] NIP:   LR:  CTR: 
[0.997309] REGS: ef237f50 TRAP:    Not tainted   
(5.0.0-rc2-gcc-7.3.0-00043-g53f2de798792)

[0.997508] MSR:   <>  CR:   XER: 
[0.997712]
[0.997712] GPR00:  ef238000     
  
[0.997712] GPR08:       
 c006477c ef13d8c0
[0.997712] GPR16:       
  
[0.997712] GPR24:       
  

[0.998671] NIP []   (null)
[0.998774] LR []   (null)
[0.998895] Call Trace:
[0.999030] Instruction dump:
[0.999320]        
 
[0.999546]     6000   
 

[1.23] ---[ end trace 925ea3419844fe68 ]---

I haven't had time to dig any further.

cheers





Re: [PATCH] Remove 'type' argument from access_ok() function

2019-01-04 Thread LEROY Christophe

Mathieu Malaterre  wrote:


In commit 05a4ab823983 ("powerpc/uaccess: fix warning/error with
access_ok()") an attempt was made to remove a warning by referencing the
variable `type`, however in commit 96d4f267e40f ("Remove 'type' argument
from access_ok() function") the variable `type` has been removed.

Revert commit 05a4ab823983 ("powerpc/uaccess: fix warning/error with
access_ok()") to fix the following compilation error:

  arch/powerpc/include/asm/uaccess.h:66:32: error: ‘type’ undeclared  
(first use in this function)


Signed-off-by: Mathieu Malaterre 


Should you add a Fixes: 96d4f267e40f tag?

Note that sparc arch will have the same issue.

Christophe


---
 arch/powerpc/include/asm/uaccess.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/uaccess.h  
b/arch/powerpc/include/asm/uaccess.h

index b31bf45eebd4..5f0c98e511a0 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -63,7 +63,7 @@ static inline int __access_ok(unsigned long addr,  
unsigned long size,

 #endif

 #define access_ok(addr, size)  \
-   (__chk_user_ptr(addr), (void)(type),\
+   (__chk_user_ptr(addr),  \
 __access_ok((__force unsigned long)(addr), (size), get_fs()))

 /*
--
2.19.2





Re: [PATCH v3 1/2] selftests/powerpc: Add MSR bits

2019-01-03 Thread LEROY Christophe

Breno Leitao  wrote:


This patch simply adds definitions for the MSR bits and some macros to
test for MSR TM bits.

This was copied from arch/powerpc/include/asm/reg.h generic MSR part.


Can't we find a way to avoid duplicating such defines?

Christophe



Signed-off-by: Breno Leitao 
---
 tools/testing/selftests/powerpc/include/reg.h | 45 +++
 1 file changed, 45 insertions(+)

diff --git a/tools/testing/selftests/powerpc/include/reg.h  
b/tools/testing/selftests/powerpc/include/reg.h

index 52b4710469d2..b67a85404255 100644
--- a/tools/testing/selftests/powerpc/include/reg.h
+++ b/tools/testing/selftests/powerpc/include/reg.h
@@ -77,6 +77,51 @@
 #define TEXASR_TE  0x0400
 #define TEXASR_ROT 0x0200

+/* MSR register bits */
+#define MSR_SF_LG   63  /* Enable 64 bit mode */
+#define MSR_ISF_LG  61  /* Interrupt 64b mode valid  
on 630 */

+#define MSR_HV_LG   60  /* Hypervisor state */
+#define MSR_TS_T_LG 34  /* Trans Mem state: Transactional */
+#define MSR_TS_S_LG 33  /* Trans Mem state: Suspended */
+#define MSR_TS_LG   33  /* Trans Mem state (2 bits) */
+#define MSR_TM_LG   32  /* Trans Mem Available */
+#define MSR_VEC_LG  25  /* Enable AltiVec */
+#define MSR_VSX_LG  23  /* Enable VSX */
+#define MSR_POW_LG  18  /* Enable Power Management */
+#define MSR_WE_LG   18  /* Wait State Enable */
+#define MSR_TGPR_LG 17  /* TLB Update registers in use */
+#define MSR_CE_LG   17  /* Critical Interrupt Enable */
+#define MSR_ILE_LG  16  /* Interrupt Little Endian */
+#define MSR_EE_LG   15  /* External Interrupt Enable */
+#define MSR_PR_LG   14  /* Problem State /  
Privilege Level */

+#define MSR_FP_LG   13  /* Floating Point enable */
+#define MSR_ME_LG   12  /* Machine Check Enable */
+#define MSR_FE0_LG  11  /* Floating Exception mode 0 */
+#define MSR_SE_LG   10  /* Single Step */
+#define MSR_BE_LG   9   /* Branch Trace */
+#define MSR_DE_LG   9   /* Debug Exception Enable */
+#define MSR_FE1_LG  8   /* Floating Exception mode 1 */
+#define MSR_IP_LG   6   /* Exception prefix 0x000/0xFFF */
+#define MSR_IR_LG   5   /* Instruction Relocate */
+#define MSR_DR_LG   4   /* Data Relocate */
+#define MSR_PE_LG   3   /* Protection Enable */
+#define MSR_PX_LG   2   /* Protection Exclusive Mode */
+#define MSR_PMM_LG  2   /* Performance monitor */
+#define MSR_RI_LG   1   /* Recoverable Exception */
+#define MSR_LE_LG   0   /* Little Endian */
+
+#ifdef __ASSEMBLY__
+#define __MASK(X)   (1<<(X))
+#else
+#define __MASK(X)   (1UL<<(X))
+#endif
+
+/* macros to check TM MSR bits */
+#define MSR_TM  __MASK(MSR_TM_LG) /* Transactional Mem  
Available */

+#define MSR_TS_S__MASK(MSR_TS_S_LG)   /* Transaction Suspended */
+#define MSR_TS_T__MASK(MSR_TS_T_LG)   /* Transaction  
Transactional */

+#define MSR_TS_MASK (MSR_TS_T | MSR_TS_S) /* Transaction State bits */
+
 /* Vector Instructions */
 #define VSX_XX1(xs, ra, rb)(((xs) & 0x1f) << 21 | ((ra) << 16) |  \
 ((rb) << 11) | (((xs) >> 5)))
--
2.19.0





Re: [PATCH v5] soc/fsl/qe: fix err handling of ucc_of_parse_tdm

2019-01-02 Thread LEROY Christophe

Peng Hao wrote:


From: Wen Yang 

Currently there are some issues with the ucc_of_parse_tdm function:
1, a possible null pointer dereference in ucc_of_parse_tdm,
detected by the semantic patch deref_null.cocci,
with the following warning:
drivers/soc/fsl/qe/qe_tdm.c:177:21-24: ERROR: pdev is NULL but dereferenced.
2, dev gets modified, so in any case that devm_iounmap() will fail
even when the new pdev is valid, because the iomap was done with a
 different pdev.
3, there is no driver bind with the "fsl,t1040-qe-si" or
"fsl,t1040-qe-siram" device. So allocating resources using devm_*()
with these devices won't provide a cleanup path for these resources
when the caller fails.

This patch fixes them.

Suggested-by: Li Yang 
Suggested-by: Christophe LEROY 
Signed-off-by: Wen Yang 
Reviewed-by: Peng Hao 
CC: Julia Lawall 
CC: Zhao Qiang 
CC: David S. Miller 
CC: net...@vger.kernel.org
CC: linuxppc-dev@lists.ozlabs.org
CC: linux-ker...@vger.kernel.org
---


In order to ease review, could you add the list of changes between each
version of the patch? Usually we do it at this place so that it is
available for reviewers but not part of the commit text.


Christophe

 drivers/net/wan/fsl_ucc_hdlc.c | 62 +-
 drivers/soc/fsl/qe/qe_tdm.c| 55 -
 2 files changed, 61 insertions(+), 56 deletions(-)

diff --git a/drivers/net/wan/fsl_ucc_hdlc.c b/drivers/net/wan/fsl_ucc_hdlc.c
index 839fa77..f30a040 100644
--- a/drivers/net/wan/fsl_ucc_hdlc.c
+++ b/drivers/net/wan/fsl_ucc_hdlc.c
@@ -1057,6 +1057,54 @@ static const struct net_device_ops uhdlc_ops = {
.ndo_tx_timeout = uhdlc_tx_timeout,
 };

+static int hdlc_map_iomem(char *name, int init_flag, void __iomem **ptr)
+{
+   struct device_node *np;
+   struct platform_device *pdev;
+   struct resource *res;
+   static int siram_init_flag;
+   int ret = 0;
+
+   np = of_find_compatible_node(NULL, NULL, name);
+   if (!np)
+   return -EINVAL;
+
+   pdev = of_find_device_by_node(np);
+   if (!pdev) {
+   pr_err("%pOFn: failed to lookup pdev\n", np);
+   of_node_put(np);
+   return -EINVAL;
+   }
+
+   of_node_put(np);
+   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+   if (!res) {
+   ret = -EINVAL;
+   goto error_put_device;
+   }
+   *ptr = ioremap(res->start, resource_size(res));
+   if (!*ptr) {
+   ret = -ENOMEM;
+   goto error_put_device;
+   }
+
+   /* We've remapped the addresses, and we don't need the device any
+* more, so we should release it.
+*/
+   put_device(&pdev->dev);
+
+   if (init_flag && siram_init_flag == 0) {
+   memset_io(*ptr, 0, resource_size(res));
+   siram_init_flag = 1;
+   }
+   return  0;
+
+error_put_device:
+   put_device(&pdev->dev);
+
+   return ret;
+}
+
 static int ucc_hdlc_probe(struct platform_device *pdev)
 {
struct device_node *np = pdev->dev.of_node;
@@ -1151,6 +1199,15 @@ static int ucc_hdlc_probe(struct platform_device *pdev)
ret = ucc_of_parse_tdm(np, utdm, ut_info);
if (ret)
goto free_utdm;
+
+   ret = hdlc_map_iomem("fsl,t1040-qe-si", 0,
+(void __iomem **)&utdm->si_regs);
+   if (ret)
+   goto free_utdm;
+   ret = hdlc_map_iomem("fsl,t1040-qe-siram", 1,
+(void __iomem **)&utdm->siram);
+   if (ret)
+   goto unmap_si_regs;
}

	if (of_property_read_u16(np, "fsl,hmask", &uhdlc_priv->hmask))
@@ -1159,7 +1216,7 @@ static int ucc_hdlc_probe(struct platform_device *pdev)
ret = uhdlc_init(uhdlc_priv);
if (ret) {
		dev_err(&pdev->dev, "Failed to init uhdlc\n");
-   goto free_utdm;
+   goto undo_uhdlc_init;
}

dev = alloc_hdlcdev(uhdlc_priv);
@@ -1188,6 +1245,9 @@ static int ucc_hdlc_probe(struct platform_device *pdev)
 free_dev:
free_netdev(dev);
 undo_uhdlc_init:
+   iounmap(utdm->siram);
+unmap_si_regs:
+   iounmap(utdm->si_regs);
 free_utdm:
if (uhdlc_priv->tsa)
kfree(utdm);
diff --git a/drivers/soc/fsl/qe/qe_tdm.c b/drivers/soc/fsl/qe/qe_tdm.c
index f78c346..76480df 100644
--- a/drivers/soc/fsl/qe/qe_tdm.c
+++ b/drivers/soc/fsl/qe/qe_tdm.c
@@ -44,10 +44,6 @@ int ucc_of_parse_tdm(struct device_node *np, struct ucc_tdm *utdm,
const char *sprop;
int ret = 0;
u32 val;
-   struct resource *res;
-   struct device_node *np2;
-   static int siram_init_flag;
-   struct platform_device *pdev;

sprop = of_get_property(np, "fsl,rx-sync-clock", NULL);
if (sprop) {
@@ -124,57 +120,6 @@ int ucc_of_parse_tdm(struct device_node *np, struct ucc_tdm *utdm,

Re: [PATCH 1/2] powerpc/4xx/ocm: Fix phys_addr_t printf warnings

2019-01-01 Thread LEROY Christophe

Michael Ellerman wrote:


Currently the code produces several warnings, eg:

  arch/powerpc/platforms/4xx/ocm.c:240:38: error: format '%llx'
  expects argument of type 'long long unsigned int', but argument 3
  has type 'phys_addr_t {aka unsigned int}'
 seq_printf(m, "PhysAddr : 0x%llx\n", ocm->phys);
   ~~~^ ~

Fix it by using the special %pa[p] format for printing phys_addr_t.
Note we need to pass the value by reference for the special specifier
to work.


When I fixed the same problem in prom.c, you suggested simply casting it to unsigned long long. Is this a better solution?


Christophe



Signed-off-by: Michael Ellerman 
---
 arch/powerpc/platforms/4xx/ocm.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/4xx/ocm.c b/arch/powerpc/platforms/4xx/ocm.c
index c22b099c42f1..a1aaa1569d7c 100644
--- a/arch/powerpc/platforms/4xx/ocm.c
+++ b/arch/powerpc/platforms/4xx/ocm.c
@@ -237,12 +237,12 @@ static int ocm_debugfs_show(struct seq_file *m, void *v)
continue;

seq_printf(m, "PPC4XX OCM   : %d\n", ocm->index);
-   seq_printf(m, "PhysAddr : 0x%llx\n", ocm->phys);
+   seq_printf(m, "PhysAddr : %pa[p]\n", &(ocm->phys));
seq_printf(m, "MemTotal : %d Bytes\n", ocm->memtotal);
seq_printf(m, "MemTotal(NC) : %d Bytes\n", ocm->nc.memtotal);
seq_printf(m, "MemTotal(C)  : %d Bytes\n\n", ocm->c.memtotal);

-   seq_printf(m, "NC.PhysAddr  : 0x%llx\n", ocm->nc.phys);
+   seq_printf(m, "NC.PhysAddr  : %pa[p]\n", &(ocm->nc.phys));
seq_printf(m, "NC.VirtAddr  : 0x%p\n", ocm->nc.virt);
seq_printf(m, "NC.MemTotal  : %d Bytes\n", ocm->nc.memtotal);
seq_printf(m, "NC.MemFree   : %d Bytes\n", ocm->nc.memfree);
@@ -252,7 +252,7 @@ static int ocm_debugfs_show(struct seq_file *m, void *v)
blk->size, blk->owner);
}

-   seq_printf(m, "\nC.PhysAddr   : 0x%llx\n", ocm->c.phys);
+   seq_printf(m, "\nC.PhysAddr   : %pa[p]\n", &(ocm->c.phys));
seq_printf(m, "C.VirtAddr   : 0x%p\n", ocm->c.virt);
seq_printf(m, "C.MemTotal   : %d Bytes\n", ocm->c.memtotal);
seq_printf(m, "C.MemFree: %d Bytes\n", ocm->c.memfree);
--
2.17.2





Re: [PATCH v8 13/25] m68k: Dispatch nvram_ops calls to Atari or Mac functions

2018-12-30 Thread LEROY Christophe

Finn Thain wrote:


On Sat, 29 Dec 2018, Arnd Bergmann wrote:

On Wed, Dec 26, 2018 at 1:43 AM Finn Thain wrote:


> +
> +static ssize_t m68k_nvram_get_size(void)
> +{
> +   if (MACH_IS_ATARI)
> +   return atari_nvram_get_size();
> +   else if (MACH_IS_MAC)
> +   return mac_pram_get_size();
> +   return -ENODEV;
> +}
> +
> +/* Atari device drivers call .read (to get checksum validation) whereas
> + * Mac and PowerMac device drivers just use .read_byte.
> + */
> +const struct nvram_ops arch_nvram_ops = {
> +#ifdef CONFIG_MAC
> +   .read_byte  = m68k_nvram_read_byte,
> +   .write_byte = m68k_nvram_write_byte,
> +#endif
> +#ifdef CONFIG_ATARI
> +   .read   = m68k_nvram_read,
> +   .write  = m68k_nvram_write,
> +   .set_checksum   = m68k_nvram_set_checksum,
> +   .initialize = m68k_nvram_initialize,
> +#endif
> +   .get_size   = m68k_nvram_get_size,
> +};
> +EXPORT_SYMBOL(arch_nvram_ops);

Since the operations are almost entirely distinct, why not have two
separate 'nvram_ops' instances here that each refer to just
the set they actually need?



The reason for that is that I am allergic to code duplication. But I'll
change it if you think it matters. BTW, this patch has already been acked
by Geert.


I agree it would be cleaner, as it would also avoid this m68k_nvram_get_size(), wouldn't it?


I don't see potential code duplication here, do you?

Christophe




I was actually expecting one more patch here that would make the
arch_nvram_ops a pointer to one of multiple structures, which would
be easier to do with multiple copies, but I suppose there is no need
for that here (there might be on ppc, I have to look again).



Yes, I considered that too. I picked the variation that makes everything
const.

--


   Arnd






Re: [PATCH v8 01/25] scsi/atari_scsi: Don't select CONFIG_NVRAM

2018-12-30 Thread LEROY Christophe

Arnd Bergmann wrote:


On Sat, Dec 29, 2018 at 3:51 AM Michael Schmitz  wrote:


Hi Finn,

On 29.12.2018 15:34, Finn Thain wrote:
> On Sat, 29 Dec 2018, Michael Schmitz wrote:
>
>>
>> IS_BUILTIN(CONFIG_NVRAM) is probably what Christophe really meant to suggest.

>>
>> Or (really going out on a limb here):
>>
>> IS_BUILTIN(CONFIG_NVRAM) ||
>> ( IS_MODULE(CONFIG_ATARI_SCSI) && IS_ENABLED(CONFIG_NVRAM) )
>>
>> Not that I'd advocate that, for this series.
>>
>
> Well, you are a maintainer for atari_scsi.c.
>
> Are you saying that you want IS_BUILTIN(CONFIG_NVRAM) used here instead of
> ifdef?

No, just pointing out that there would be a way to avoid the ifdef
without messing up driver behaviour. I'm fine with the ifdef - not least
because it clearly eliminates code that would be unreachable.

(On second thought - I don't want to speculate whether there's weird
compiler options that could result in the nvram_check_checksum and
nvram_read_bytes symbols to still be referenced in the final link, even
though IS_BUILTIN(CONFIG_NVRAM) always evaluates to false. Best leave
this as-is.)


As far as I know, it's totally reliable with the supported compilers (gcc-4.6+).

In the older compilers (e.g. 4.1), there was a corner case, where it could
have failed to eliminate a function that was only referenced through a pointer
from a discarded variable, but a plain IS_ENABLED() check like the one here
was still ok, and lots of code relies on that.

Other than that, I agree either way is totally fine here, so no objections
to using the #ifdef.


As far as I know, kernel coding style promotes the use of IS_ENABLED() etc. instead of #ifdefs when possible.


Christophe



  Arnd





Re: [PATCH v8 20/25] powerpc, fbdev: Use arch_nvram_ops methods instead of nvram_read_byte() and nvram_write_byte()

2018-12-29 Thread LEROY Christophe

Finn Thain wrote:


Make use of arch_nvram_ops in device drivers so that the nvram_* function
exports can be removed.

Since they are no longer global symbols, rename the PPC32 nvram_* functions
appropriately.

Signed-off-by: Finn Thain 
---
 arch/powerpc/kernel/setup_32.c | 8 
 drivers/char/generic_nvram.c   | 4 ++--
 drivers/video/fbdev/controlfb.c| 4 ++--
 drivers/video/fbdev/imsttfb.c  | 4 ++--
 drivers/video/fbdev/matrox/matroxfb_base.c | 2 +-
 drivers/video/fbdev/platinumfb.c   | 4 ++--
 drivers/video/fbdev/valkyriefb.c   | 4 ++--
 7 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c
index e0d045677472..bdbe6acbef11 100644
--- a/arch/powerpc/kernel/setup_32.c
+++ b/arch/powerpc/kernel/setup_32.c
@@ -152,20 +152,18 @@ __setup("l3cr=", ppc_setup_l3cr);

 #ifdef CONFIG_GENERIC_NVRAM

-unsigned char nvram_read_byte(int addr)
+static unsigned char ppc_nvram_read_byte(int addr)
 {
if (ppc_md.nvram_read_val)
return ppc_md.nvram_read_val(addr);
return 0xff;
 }
-EXPORT_SYMBOL(nvram_read_byte);

-void nvram_write_byte(unsigned char val, int addr)
+static void ppc_nvram_write_byte(unsigned char val, int addr)
 {
if (ppc_md.nvram_write_val)
ppc_md.nvram_write_val(addr, val);
 }
-EXPORT_SYMBOL(nvram_write_byte);

 static ssize_t ppc_nvram_get_size(void)
 {
@@ -182,6 +180,8 @@ static long ppc_nvram_sync(void)
 }

 const struct nvram_ops arch_nvram_ops = {
+   .read_byte  = ppc_nvram_read_byte,
+   .write_byte = ppc_nvram_write_byte,
.get_size   = ppc_nvram_get_size,
.sync   = ppc_nvram_sync,
 };
diff --git a/drivers/char/generic_nvram.c b/drivers/char/generic_nvram.c
index f32d5663de95..41b76bf9614e 100644
--- a/drivers/char/generic_nvram.c
+++ b/drivers/char/generic_nvram.c
@@ -48,7 +48,7 @@ static ssize_t read_nvram(struct file *file, char __user *buf,
if (*ppos >= nvram_len)
return 0;
for (i = *ppos; count > 0 && i < nvram_len; ++i, ++p, --count)
-   if (__put_user(nvram_read_byte(i), p))
+   if (__put_user(arch_nvram_ops.read_byte(i), p))


Instead of modifying all drivers (in this patch and previous ones related to other arches), wouldn't it be better to add helpers like the following in nvram.h:


static inline unsigned char nvram_read_byte(int addr)
{
return arch_nvram_ops.read_byte(addr);
}

Christophe


return -EFAULT;
*ppos = i;
return p - buf;
@@ -68,7 +68,7 @@ static ssize_t write_nvram(struct file *file, const char __user *buf,
for (i = *ppos; count > 0 && i < nvram_len; ++i, ++p, --count) {
if (__get_user(c, p))
return -EFAULT;
-   nvram_write_byte(c, i);
+   arch_nvram_ops.write_byte(c, i);
}
*ppos = i;
return p - buf;
diff --git a/drivers/video/fbdev/controlfb.c b/drivers/video/fbdev/controlfb.c
index 9cb0ef7ac29e..27ff33ccafcb 100644
--- a/drivers/video/fbdev/controlfb.c
+++ b/drivers/video/fbdev/controlfb.c
@@ -413,7 +413,7 @@ static int __init init_control(struct fb_info_control *p)
/* Try to pick a video mode out of NVRAM if we have one. */
 #ifdef CONFIG_NVRAM
if (default_cmode == CMODE_NVRAM) {
-   cmode = nvram_read_byte(NV_CMODE);
+   cmode = arch_nvram_ops.read_byte(NV_CMODE);
if(cmode < CMODE_8 || cmode > CMODE_32)
cmode = CMODE_8;
} else
@@ -421,7 +421,7 @@ static int __init init_control(struct fb_info_control *p)
cmode=default_cmode;
 #ifdef CONFIG_NVRAM
if (default_vmode == VMODE_NVRAM) {
-   vmode = nvram_read_byte(NV_VMODE);
+   vmode = arch_nvram_ops.read_byte(NV_VMODE);
if (vmode < 1 || vmode > VMODE_MAX ||
control_mac_modes[vmode - 1].m[full] < cmode) {
sense = read_control_sense(p);
diff --git a/drivers/video/fbdev/imsttfb.c b/drivers/video/fbdev/imsttfb.c
index 8d231591ff0e..e9f3b8914145 100644
--- a/drivers/video/fbdev/imsttfb.c
+++ b/drivers/video/fbdev/imsttfb.c
@@ -1393,12 +1393,12 @@ static void init_imstt(struct fb_info *info)
int vmode = init_vmode, cmode = init_cmode;

if (vmode == -1) {
-   vmode = nvram_read_byte(NV_VMODE);
+   vmode = arch_nvram_ops.read_byte(NV_VMODE);
if (vmode <= 0 || vmode > VMODE_MAX)
vmode = VMODE_640_480_67;
}
if (cmode == -1) {
-   cmode = nvram_read_byte(NV_CMODE);
+   cmode = arch_nvram_ops.read_byte(NV_CMODE);
if (cmode < CMODE_8 || cmode > CMODE_32)
 

Re: [PATCH v8 01/25] scsi/atari_scsi: Don't select CONFIG_NVRAM

2018-12-29 Thread LEROY Christophe

Michael Schmitz wrote:


Hi Finn,

On 29.12.2018 14:06, Finn Thain wrote:

On Fri, 28 Dec 2018, LEROY Christophe wrote:

diff --git a/drivers/scsi/atari_scsi.c b/drivers/scsi/atari_scsi.c
index 89f5154c40b6..99e5729d910d 100644
--- a/drivers/scsi/atari_scsi.c
+++ b/drivers/scsi/atari_scsi.c
@@ -755,9 +755,10 @@ static int __init atari_scsi_probe(struct
platform_device *pdev)
if (ATARIHW_PRESENT(TT_SCSI) && setup_sg_tablesize >= 0)
atari_scsi_template.sg_tablesize = setup_sg_tablesize;

-   if (setup_hostid >= 0) {
+   if (setup_hostid >= 0)
atari_scsi_template.this_id = setup_hostid & 7;
-   } else {
+#ifdef CONFIG_NVRAM
+   else


Such ifdefs should be avoided in C files.
It would be better to use

} else if (IS_ENABLED(CONFIG_NVRAM)) {



I don't like #ifdefs either. However, as the maintainer of this file, I am
okay with this one.

The old #ifdef CONFIG_NVRAM conditional compilation convention that gets
used here and under drivers/video/fbdev could probably be improved upon
but I consider this to be out-of-scope for this series, which is
complicated enough.

And as explained in the commit log, CONFIG_NVRAM=y and CONFIG_NVRAM=m are
treaded differently by drivers. Therefore, IS_ENABLED would be incorrect.


IS_BUILTIN(CONFIG_NVRAM) is probably what Christophe really meant to suggest.


Doesn't #ifdef mean either y or m? So the same as IS_ENABLED()?

Christophe



Or (really going out on a limb here):

IS_BUILTIN(CONFIG_NVRAM) ||
( IS_MODULE(CONFIG_ATARI_SCSI) && IS_ENABLED(CONFIG_NVRAM) )

Not that I'd advocate that, for this series.

Cheers,

Michael





Re: [PATCH v8 02/25] m68k/atari: Move Atari-specific code out of drivers/char/nvram.c

2018-12-28 Thread LEROY Christophe

Finn Thain wrote:


Move the m68k-specific code out of the driver to make the driver generic.

I've used 'SPDX-License-Identifier: GPL-2.0+' for the new file because the
old file is covered by MODULE_LICENSE("GPL").

Signed-off-by: Finn Thain 
Tested-by: Christian T. Steigies 
Acked-by: Geert Uytterhoeven 
---
Changed since v7:
 - Added SPDX-License-Identifier.
---
 arch/m68k/atari/Makefile |   2 +
 arch/m68k/atari/nvram.c  | 243 +
 drivers/char/nvram.c | 280 +--
 3 files changed, 280 insertions(+), 245 deletions(-)
 create mode 100644 arch/m68k/atari/nvram.c

diff --git a/arch/m68k/atari/Makefile b/arch/m68k/atari/Makefile
index 0cac723306f9..0b86bb6cfa87 100644
--- a/arch/m68k/atari/Makefile
+++ b/arch/m68k/atari/Makefile
@@ -6,3 +6,5 @@ obj-y   := config.o time.o debug.o ataints.o stdma.o \
atasound.o stram.o

 obj-$(CONFIG_ATARI_KBD_CORE)   += atakeyb.o
+
+obj-$(CONFIG_NVRAM:m=y)	+= nvram.o
diff --git a/arch/m68k/atari/nvram.c b/arch/m68k/atari/nvram.c
new file mode 100644
index ..3e620ee955ba
--- /dev/null
+++ b/arch/m68k/atari/nvram.c
@@ -0,0 +1,243 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * CMOS/NV-RAM driver for Atari. Adapted from drivers/char/nvram.c.
+ * Copyright (C) 1997 Roman Hodek 
+ * idea by and with help from Richard Jelinek 
+ * Portions copyright (c) 2001,2002 Sun Microsystems (thoc...@sun.com)
+ * Further contributions from Cesar Barros, Erik Gilling, Tim Hockin and
+ * Wim Van Sebroeck.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define NVRAM_BYTES	50
+
+/* It is worth noting that these functions all access bytes of general
+ * purpose memory in the NVRAM - that is to say, they all add the
+ * NVRAM_FIRST_BYTE offset. Pass them offsets into NVRAM as if you did not
+ * know about the RTC cruft.
+ */
+
+/* Note that *all* calls to CMOS_READ and CMOS_WRITE must be done with
+ * rtc_lock held. Due to the index-port/data-port design of the RTC, we
+ * don't want two different things trying to get to it at once. (e.g. the
+ * periodic 11 min sync from kernel/time/ntp.c vs. this driver.)
+ */
+
+unsigned char __nvram_read_byte(int i)
+{
+   return CMOS_READ(NVRAM_FIRST_BYTE + i);
+}
+
+unsigned char nvram_read_byte(int i)
+{
+   unsigned long flags;
+   unsigned char c;
+
+   spin_lock_irqsave(&rtc_lock, flags);
+   c = __nvram_read_byte(i);
+   spin_unlock_irqrestore(&rtc_lock, flags);
+   return c;
+}
+EXPORT_SYMBOL(nvram_read_byte);
+
+/* This races nicely with trying to read with checksum checking */
+void __nvram_write_byte(unsigned char c, int i)
+{
+   CMOS_WRITE(c, NVRAM_FIRST_BYTE + i);
+}
+
+void nvram_write_byte(unsigned char c, int i)
+{
+   unsigned long flags;
+
+   spin_lock_irqsave(&rtc_lock, flags);
+   __nvram_write_byte(c, i);
+   spin_unlock_irqrestore(&rtc_lock, flags);
+}
+
+/* On Ataris, the checksum is over all bytes except the checksum bytes
+ * themselves; these are at the very end.
+ */
+#define ATARI_CKS_RANGE_START  0
+#define ATARI_CKS_RANGE_END	47
+#define ATARI_CKS_LOC  48
+
+int __nvram_check_checksum(void)
+{
+   int i;
+   unsigned char sum = 0;
+
+   for (i = ATARI_CKS_RANGE_START; i <= ATARI_CKS_RANGE_END; ++i)
+   sum += __nvram_read_byte(i);
+   return (__nvram_read_byte(ATARI_CKS_LOC) == (~sum & 0xff)) &&
+  (__nvram_read_byte(ATARI_CKS_LOC + 1) == (sum & 0xff));
+}
+
+int nvram_check_checksum(void)
+{
+   unsigned long flags;
+   int rv;
+
+   spin_lock_irqsave(&rtc_lock, flags);
+   rv = __nvram_check_checksum();
+   spin_unlock_irqrestore(&rtc_lock, flags);
+   return rv;
+}
+EXPORT_SYMBOL(nvram_check_checksum);
+
+static void __nvram_set_checksum(void)
+{
+   int i;
+   unsigned char sum = 0;
+
+   for (i = ATARI_CKS_RANGE_START; i <= ATARI_CKS_RANGE_END; ++i)
+   sum += __nvram_read_byte(i);
+   __nvram_write_byte(~sum, ATARI_CKS_LOC);
+   __nvram_write_byte(sum, ATARI_CKS_LOC + 1);
+}
+
+#ifdef CONFIG_PROC_FS
+static struct {
+   unsigned char val;
+   const char *name;
+} boot_prefs[] = {
+   { 0x80, "TOS" },
+   { 0x40, "ASV" },
+   { 0x20, "NetBSD (?)" },
+   { 0x10, "Linux" },
+   { 0x00, "unspecified" },
+};
+
+static const char * const languages[] = {
+   "English (US)",
+   "German",
+   "French",
+   "English (UK)",
+   "Spanish",
+   "Italian",
+   "6 (undefined)",
+   "Swiss (French)",
+   "Swiss (German)",
+};
+
+static const char * const dateformat[] = {
+   "MM%cDD%cYY",
+   "DD%cMM%cYY",
+   "YY%cMM%cDD",
+   "YY%cDD%cMM",
+   "4 (undefined)",
+   "5 (undefined)",
+   "6 (undefined)",
+   "7 (undefined)",
+};
+
+static const char * const colors[] = {
+   "2", "4", 

Re: [PATCH v8 01/25] scsi/atari_scsi: Don't select CONFIG_NVRAM

2018-12-28 Thread LEROY Christophe

Finn Thain wrote:


On powerpc, setting CONFIG_NVRAM=n builds a kernel with no NVRAM support.
Setting CONFIG_NVRAM=m enables the /dev/nvram misc device module without
enabling NVRAM support in drivers. Setting CONFIG_NVRAM=y enables the
misc device (built-in) and also enables NVRAM support in drivers.

m68k shares the valkyriefb driver with powerpc, and since that driver uses
NVRAM, it is affected by CONFIG_ATARI_SCSI, because of the use of
"select NVRAM".

Adopt the powerpc convention on m68k to avoid surprises.

Signed-off-by: Finn Thain 
Tested-by: Christian T. Steigies 
---
This patch temporarily disables CONFIG_NVRAM on Atari, to prevent build
failures when bisecting the rest of this patch series. It gets enabled
again with the introduction of CONFIG_HAVE_ARCH_NVRAM_OPS, once the
nvram_* global functions have been moved to an ops struct.
---
 drivers/char/Kconfig  | 5 +
 drivers/scsi/Kconfig  | 6 +++---
 drivers/scsi/atari_scsi.c | 7 ---
 3 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig
index 9d03b2ff5df6..5b54595dfe30 100644
--- a/drivers/char/Kconfig
+++ b/drivers/char/Kconfig
@@ -236,7 +236,7 @@ source "drivers/char/hw_random/Kconfig"

 config NVRAM
tristate "/dev/nvram support"
-   depends on ATARI || X86 || GENERIC_NVRAM
+   depends on X86 || GENERIC_NVRAM
---help---
  If you say Y here and create a character special file /dev/nvram
  with major number 10 and minor number 144 using mknod ("man mknod"),
@@ -254,9 +254,6 @@ config NVRAM
  should NEVER idly tamper with it. See Ralf Brown's interrupt list
  for a guide to the use of CMOS bytes by your BIOS.

- On Atari machines, /dev/nvram is always configured and does not need
- to be selected.
-
  To compile this driver as a module, choose M here: the
  module will be called nvram.

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 640cd1b31a18..924eb69e7fc4 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -1381,14 +1381,14 @@ config ATARI_SCSI
tristate "Atari native SCSI support"
depends on ATARI && SCSI
select SCSI_SPI_ATTRS
-   select NVRAM
---help---
  If you have an Atari with built-in NCR5380 SCSI controller (TT,
  Falcon, ...) say Y to get it supported. Of course also, if you have
  a compatible SCSI controller (e.g. for Medusa).

- To compile this driver as a module, choose M here: the
- module will be called atari_scsi.
+ To compile this driver as a module, choose M here: the module will
+ be called atari_scsi. If you also enable NVRAM support, the SCSI
+ host's ID is taken from the setting in TT RTC NVRAM.

  This driver supports both styles of NCR integration into the
  system: the TT style (separate DMA), and the Falcon style (via
diff --git a/drivers/scsi/atari_scsi.c b/drivers/scsi/atari_scsi.c
index 89f5154c40b6..99e5729d910d 100644
--- a/drivers/scsi/atari_scsi.c
+++ b/drivers/scsi/atari_scsi.c
@@ -755,9 +755,10 @@ static int __init atari_scsi_probe(struct platform_device *pdev)
if (ATARIHW_PRESENT(TT_SCSI) && setup_sg_tablesize >= 0)
atari_scsi_template.sg_tablesize = setup_sg_tablesize;

-   if (setup_hostid >= 0) {
+   if (setup_hostid >= 0)
atari_scsi_template.this_id = setup_hostid & 7;
-   } else {
+#ifdef CONFIG_NVRAM
+   else


Such ifdefs should be avoided in C files.
It would be better to use

} else if (IS_ENABLED(CONFIG_NVRAM)) {


/* Test if a host id is set in the NVRam */
if (ATARIHW_PRESENT(TT_CLK) && nvram_check_checksum()) {
unsigned char b = nvram_read_byte(16);
@@ -768,7 +769,7 @@ static int __init atari_scsi_probe(struct platform_device *pdev)
if (b & 0x80)
atari_scsi_template.this_id = b & 7;
}
-   }
+#endif

/* If running on a Falcon and if there's TT-Ram (i.e., more than one
 * memory block, since there's always ST-Ram in a Falcon), then
--
2.19.2





Re: [PATCH 5/5] powerpc/64s: Document that PPC supports nosmap

2018-10-31 Thread LEROY Christophe

Russell Currey wrote:


On Fri, 2018-10-26 at 18:35 +0200, LEROY Christophe wrote:

Why not call our new functionality SMAP instead of calling it GUAP?


mpe wasn't a fan of using the same terminology as other architectures.


I don't much like the word 'guarded' because it means something different for powerpc MMUs.

What about something like 'user space access protection'?

Christophe


Having a separate term does avoid some assumptions about how things
work or are implemented, but sharing compatibility with an existing
parameter is nice.

Personally I don't really care too much about the name.

- Russell



Christophe

Russell Currey wrote:

> Signed-off-by: Russell Currey 
> ---
>  Documentation/admin-guide/kernel-parameters.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt
> b/Documentation/admin-guide/kernel-parameters.txt
> index a5ad67d5cb16..8f78e75965f0 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2764,7 +2764,7 @@
>noexec=on: enable non-executable mappings
> (default)
>noexec=off: disable non-executable mappings
>
> -  nosmap  [X86]
> +  nosmap  [X86,PPC]
>Disable SMAP (Supervisor Mode Access
> Prevention)
>even if it is supported by processor.
>
> --
> 2.19.1







Re: [PATCH 0/5] Guarded Userspace Access Prevention on Radix

2018-10-31 Thread LEROY Christophe

Russell Currey wrote:


On Fri, 2018-10-26 at 18:29 +0200, LEROY Christophe wrote:

Russell Currey wrote:

> Guarded Userspace Access Prevention is a security mechanism that prevents
> the kernel from being able to read and write userspace addresses outside of
> the allowed paths, most commonly copy_{to/from}_user().
>
> At present, the only CPU that supports this is POWER9, and only while using
> the Radix MMU.  Privileged reads and writes cannot access user data when
> key 0 of the AMR is set.  This is described in the "Radix Tree Translation
> Storage Protection" section of the POWER ISA as of version 3.0.

It is not true that only POWER9 can support that.


It's true that not only P9 can support it, but there are more
considerations under hash than radix, implementing this for radix is a
first step.


I don't know much about hash, but I was talking about the 8xx, which is a nohash ppc32. I'll see next week if I can do something with it on top of your series.






The 8xx has MMU access protection registers which can serve the same
purpose. Today on the 8xx, kernel space is group 0 and user space is
group 1. Group 0 is set to "page defined access permission" in the MD_AP
and MI_AP registers, and group 1 is set to "all accesses done with
supervisor rights". By setting group 1 to "user and supervisor
interpretation swapped" we can forbid kernel access to user space
while still allowing user access to it. Then by simply changing group
1 mode at dedicated places we can lock/unlock kernel access to user
space.

Could you implement something as generic as possible having that in
mind for a future patch ?


I don't think anything in this series is particularly problematic in
relation to future work for hash.  I am interested in doing a hash
implementation in future too.


I think we have to look at that carefully to avoid ugly ifdefs.

Christophe



- Russell



Christophe

> GUAP code sets key 0 of the AMR (thus disabling accesses of user data)
> early during boot, and only ever "unlocks" access prior to certain
> operations, like copy_{to/from}_user(), futex ops, etc.  Setting this does
> not prevent unprivileged access, so userspace can operate fine while access
> is locked.
>
> There is a performance impact, although I don't consider it heavy.  Running
> a worst-case benchmark of a 1GB copy 1 byte at a time (and thus constant
> read(1) write(1) syscalls), I found enabling GUAP to be 3.5% slower than
> when disabled.  In most cases, the difference is negligible.  The main
> performance impact is the mtspr instruction, which is quite slow.
>
> There are a few caveats with this series that could be improved upon in
> future.  Right now there is no saving and restoring of the AMR value -
> there is no userspace exploitation of the AMR on Radix in POWER9, but if
> this were to change in future, saving and restoring the value would be
> necessary.
>
> No attempt to optimise cases of repeated calls - for example, if some
> code was repeatedly calling copy_to_user() for small sizes very frequently,
> it would be slower than the equivalent of wrapping that code in an unlock
> and lock and only having to modify the AMR once.
>
> There are some interesting cases that I've attempted to handle, such as if
> the AMR is unlocked (i.e. because a copy_{to_from}_user is in progress)...
>
> - and an exception is taken, the kernel would then be running with the
> AMR unlocked and freely able to access userspace again.  I am working
> around this by storing a flag in the PACA to indicate if the AMR is
> unlocked (to save a costly SPR read), and if so, locking the AMR in
> the exception entry path and unlocking it on the way out.
>
> - and gets context switched out, goes into a path that locks the AMR,
> then context switches back, access will be disabled and will fault.
> As a result, I context switch the AMR between tasks as if it was used
> by userspace like hash (which already implements this).
>
> Another consideration is use of the isync instruction.  Without an isync
> following the mtspr instruction, there is no guarantee that the change
> takes effect.  The issue is that isync is very slow, and so I tried to
> avoid them wherever necessary.  In this series, the only place an isync
> gets used is after *unlocking* the AMR, because if an access takes place
> and access is still prevented, the kernel will fault.
>
> On the flipside, a slight delay in unlocking caused by skipping an isync
> potentially allows a small window of vulnerability.  It is my opinion
> that this window is practically

Re: [PATCH 1/5] powerpc/64s: Guarded Userspace Access Prevention

2018-10-31 Thread LEROY Christophe

Russell Currey wrote:


On Sun, 2018-10-28 at 18:57 +0100, LEROY Christophe wrote:

Russell Currey wrote:

> Guarded Userspace Access Prevention (GUAP) utilises a feature of
> the Radix MMU which disallows read and write access to userspace
> addresses.  By utilising this, the kernel is prevented from accessing
> user data from outside of trusted paths that perform proper safety checks,
> such as copy_{to/from}_user() and friends.
>
> Userspace access is disabled from early boot and is only enabled when:
>
>    - exiting the kernel and entering userspace
>    - performing an operation like copy_{to/from}_user()
>    - context switching to a process that has access enabled
>
> and similarly, access is disabled again when exiting userspace and entering
> the kernel.
>
> This feature has a slight performance impact which I roughly measured to be
> 3% slower in the worst case (performing 1GB of 1 byte read()/write()
> syscalls), and is gated behind the CONFIG_PPC_RADIX_GUAP option for
> performance-critical builds.
>
> This feature can be tested by using the lkdtm driver (CONFIG_LKDTM=y) and
> performing the following:
>
>    echo ACCESS_USERSPACE > [debugfs]/provoke-crash/DIRECT
>
> if enabled, this should send SIGSEGV to the thread.
>
> Signed-off-by: Russell Currey 

I think this patch should be split into at least two parts: a first
part implementing the generic infrastructure, including the changes to
futex and csum, and a second part implementing the Radix part.


I'll see how I go making generic handlers - I am concerned about the
implementation becoming more complex than it needs to be just to
accommodate potential future changes that could end up having different
requirements anyway, rather than something simple that works today.


I think we can do something quite simple. I'll draft something next  
week and send it to get your feedback.





> ---
> Since the previous version of this patchset (named KHRAP) there
> have been
> several changes, some of which include:
>
> - macro naming, suggested by Nick
> - builds should be fixed outside of 64s
> - no longer unlock heading out to userspace
> - removal of unnecessary isyncs
> - more config option testing
> - removal of save/restore
> - use pr_crit() and reword message on fault
>
>  arch/powerpc/include/asm/exception-64e.h |  3 ++
>  arch/powerpc/include/asm/exception-64s.h | 19 +++-
>  arch/powerpc/include/asm/mmu.h   |  7 +++
>  arch/powerpc/include/asm/paca.h  |  3 ++
>  arch/powerpc/include/asm/reg.h   |  1 +
>  arch/powerpc/include/asm/uaccess.h   | 57
> 
>  arch/powerpc/kernel/asm-offsets.c|  1 +
>  arch/powerpc/kernel/dt_cpu_ftrs.c|  4 ++
>  arch/powerpc/kernel/entry_64.S   | 17 ++-
>  arch/powerpc/mm/fault.c  | 12 +
>  arch/powerpc/mm/pgtable-radix.c  |  2 +
>  arch/powerpc/mm/pkeys.c  |  7 ++-
>  arch/powerpc/platforms/Kconfig.cputype   | 15 +++
>  13 files changed, 135 insertions(+), 13 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/exception-64e.h
> b/arch/powerpc/include/asm/exception-64e.h
> index 555e22d5e07f..bf25015834ee 100644
> --- a/arch/powerpc/include/asm/exception-64e.h
> +++ b/arch/powerpc/include/asm/exception-64e.h
> @@ -215,5 +215,8 @@ exc_##label##_book3e:
>  #define RFI_TO_USER					\
> 	rfi
>
> +#define UNLOCK_USER_ACCESS(reg)
> +#define LOCK_USER_ACCESS(reg)
> +
>  #endif /* _ASM_POWERPC_EXCEPTION_64E_H */
>
> diff --git a/arch/powerpc/include/asm/exception-64s.h
> b/arch/powerpc/include/asm/exception-64s.h
> index 3b4767ed3ec5..0cac5bd380ca 100644
> --- a/arch/powerpc/include/asm/exception-64s.h
> +++ b/arch/powerpc/include/asm/exception-64s.h
> @@ -264,6 +264,19 @@ BEGIN_FTR_SECTION_NESTED(943)			\
> 	std	ra,offset(r13);					\
>  END_FTR_SECTION_NESTED(ftr,ftr,943)
>
> +#define LOCK_USER_ACCESS(reg)					\
> +BEGIN_MMU_FTR_SECTION_NESTED(944)				\
> +	LOAD_REG_IMMEDIATE(reg,AMR_LOCKED);			\
> +	mtspr	SPRN_AMR,reg;					\
> +END_MMU_FTR_SECTION_NESTED(MMU_FTR_RADIX_GUAP,MMU_FTR_RADIX_GUAP,944)
> +
> +#define UNLOCK_USER_ACCESS(reg)					\
> +BEGIN_MMU_FTR_SECTION_NESTED(945)				\
> +	li	reg,0;						\
> +	mtspr	SPRN_AMR,reg;					\
> +	isync							\
> +END_MMU_FTR_SECTION_NESTED(MMU_FTR_RADIX_GUAP,MMU_FTR_RADIX_GUAP,945)
> +
>  #define EXCEPTION_PROLOG_0(area)				\
> 	GET_PACA(r13);						\
>std 

Re: NXP P50XX/e5500: SMP doesn't work anymore with the latest Git kernel

2018-10-29 Thread LEROY Christophe

Christian Zigotzky wrote:


Hello,

I figured out that the problem is in the OF source code of the  
commit: Merge tag devicetree-for-4.20. [1]


That's a merge commit. Can you bisect the branch and identify the
faulting commit?


Christophe



I reverted the following OF files and SMP works!

drivers/of/base.c
drivers/of/device.c
drivers/of/of_mdio.c
drivers/of/of_numa.c
drivers/of/of_private.h
drivers/of/overlay.c
drivers/of/platform.c
drivers/of/unittest-data/overlay_15.dts
drivers/of/unittest-data/tests-overlay.dtsi
drivers/of/unittest.c
include/linux/of.h

Cheers,
Christian

[1]  
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b27186abb37b7bd19e0ca434f4f425c807dbd708



On 29 October 2018 at 10:56AM, Christian Zigotzky wrote:

Hello,

I have figured out that the commit 'devicetree-for-4.20' [1] is  
responsible for the SMP problem. I was able to revert this commit  
with 'git revert b27186abb37b7bd19e0ca434f4f425c807dbd708 -m 1'  
today.


[master ec81438] Revert "Merge tag 'devicetree-for-4.20' of  
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux"

138 files changed, 931 insertions(+), 1538 deletions(-)
rename Documentation/devicetree/bindings/arm/{atmel-sysregs.txt =>  
atmel-at91.txt} (67%)
delete mode 100644  
Documentation/devicetree/bindings/arm/freescale/fsl,layerscape-dcfg.txt
delete mode 100644  
Documentation/devicetree/bindings/arm/freescale/fsl,layerscape-scfg.txt
rename Documentation/devicetree/bindings/arm/{zte,sysctrl.txt =>  
zte.txt} (62%)

delete mode 100644 Documentation/devicetree/bindings/misc/lwn-bk4.txt
create mode 100644 arch/c6x/boot/dts/linked_dtb.S
delete mode 100644 arch/nios2/boot/dts/Makefile
create mode 100644 arch/nios2/boot/linked_dtb.S
delete mode 100644 arch/powerpc/boot/dts/Makefile
delete mode 100644 arch/powerpc/boot/dts/fsl/Makefile
delete mode 100644 scripts/dtc/yamltree.c

It solves the SMP problem! SMP works again on my P5020 board and on  
virtual e5500 QEMU machines.


QEMU command: ./qemu-system-ppc64 -M ppce500 -cpu e5500 -m 2048  
-kernel /home/christian/Downloads/uImage-4.20-alpha5 -drive  
format=raw,file=/home/christian/Dokumente/ubuntu_MATE_16.04.3_LTS_PowerPC_QEMU/ubuntu_MATE_16.04_PowerPC.img,index=0,if=virtio -nic user,model=e1000 -append "rw root=/dev/vda3" -device virtio-vga -device virtio-mouse-pci -device virtio-keyboard-pci -soundhw es1370 -smp  
4


Screenshot:  
https://plus.google.com/u/0/photos/photo/115515624056477014971/6617705776207990082


Do we need a new dtb file or is it a bug?

Thanks,
Christian

[1]  
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b27186abb37b7bd19e0ca434f4f425c807dbd708



On 28 October 2018 at 5:35PM, Christian Zigotzky wrote:

Hello,

SMP doesn't work anymore with the latest Git kernel (28/10/18  
11:12AM GMT) on my P5020 board and on virtual e5500 QEMU machines.


Board with P5020 dual core CPU:

[    0.00] -
[    0.00] phys_mem_size = 0x2
[    0.00] dcache_bsize  = 0x40
[    0.00] icache_bsize  = 0x40
[    0.00] cpu_features  = 0x0003008003b4
[    0.00]   possible    = 0x0003009003b4
[    0.00]   always  = 0x0003008003b4
[    0.00] cpu_user_features = 0xcc008000 0x0800
[    0.00] mmu_features  = 0x000a0010
[    0.00] firmware_features = 0x
[    0.00] -
[    0.00] CoreNet Generic board

    ...

[    0.002161] smp: Bringing up secondary CPUs ...
[    0.002339] No cpu-release-addr for cpu 1
[    0.002347] smp: failed starting cpu 1 (rc -2)
[    0.002401] smp: Brought up 1 node, 1 CPU

Virtual e5500 quad core QEMU machine:

[    0.026394] smp: Bringing up secondary CPUs ...
[    0.027831] No cpu-release-addr for cpu 1
[    0.027989] smp: failed starting cpu 1 (rc -2)
[    0.030143] No cpu-release-addr for cpu 2
[    0.030304] smp: failed starting cpu 2 (rc -2)
[    0.032400] No cpu-release-addr for cpu 3
[    0.032533] smp: failed starting cpu 3 (rc -2)
[    0.033117] smp: Brought up 1 node, 1 CPU

QEMU command: ./qemu-system-ppc64 -M ppce500 -cpu e5500 -m 2048  
-kernel  
/home/christian/Downloads/vmlinux-4.20-alpha4-AmigaOne_X1000_X5000/X5000_and_QEMU_e5500/uImage-4.20 -drive format=raw,file=/home/christian/Downloads/MATE_PowerPC_Remix_2017_0.9.img,index=0,if=virtio -nic user,model=e1000 -append "rw root=/dev/vda" -device virtio-vga -device virtio-mouse-pci -device virtio-keyboard-pci -usb -soundhw es1370 -smp  
4


.config:

...
CONFIG_SMP=y
CONFIG_NR_CPUS=4
...

Please test the latest Git kernel on your NXP P50XX boards.

Thanks,
Christian









Re: [PATCH 1/5] powerpc/64s: Guarded Userspace Access Prevention

2018-10-28 Thread LEROY Christophe

Russell Currey wrote:


Guarded Userspace Access Prevention (GUAP)  utilises a feature of
the Radix MMU which disallows read and write access to userspace
addresses.  By utilising this, the kernel is prevented from accessing
user data from outside of trusted paths that perform proper safety checks,
such as copy_{to/from}_user() and friends.

Userspace access is disabled from early boot and is only enabled when:

- exiting the kernel and entering userspace
- performing an operation like copy_{to/from}_user()
- context switching to a process that has access enabled

and similarly, access is disabled again when exiting userspace and entering
the kernel.

This feature has a slight performance impact which I roughly measured to be
3% slower in the worst case (performing 1GB of 1 byte read()/write()
syscalls), and is gated behind the CONFIG_PPC_RADIX_GUAP option for
performance-critical builds.

This feature can be tested by using the lkdtm driver (CONFIG_LKDTM=y) and
performing the following:

echo ACCESS_USERSPACE > [debugfs]/provoke-crash/DIRECT

if enabled, this should send SIGSEGV to the thread.

Signed-off-by: Russell Currey 


I think this patch should be split into at least two parts: a first
part implementing the generic infrastructure, including the changes to
futex and csum, and a second part implementing the Radix part.

---
Since the previous version of this patchset (named KHRAP) there have been
several changes, some of which include:

- macro naming, suggested by Nick
- builds should be fixed outside of 64s
- no longer unlock heading out to userspace
- removal of unnecessary isyncs
- more config option testing
- removal of save/restore
- use pr_crit() and reword message on fault

 arch/powerpc/include/asm/exception-64e.h |  3 ++
 arch/powerpc/include/asm/exception-64s.h | 19 +++-
 arch/powerpc/include/asm/mmu.h   |  7 +++
 arch/powerpc/include/asm/paca.h  |  3 ++
 arch/powerpc/include/asm/reg.h   |  1 +
 arch/powerpc/include/asm/uaccess.h   | 57 
 arch/powerpc/kernel/asm-offsets.c|  1 +
 arch/powerpc/kernel/dt_cpu_ftrs.c|  4 ++
 arch/powerpc/kernel/entry_64.S   | 17 ++-
 arch/powerpc/mm/fault.c  | 12 +
 arch/powerpc/mm/pgtable-radix.c  |  2 +
 arch/powerpc/mm/pkeys.c  |  7 ++-
 arch/powerpc/platforms/Kconfig.cputype   | 15 +++
 13 files changed, 135 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64e.h b/arch/powerpc/include/asm/exception-64e.h
index 555e22d5e07f..bf25015834ee 100644
--- a/arch/powerpc/include/asm/exception-64e.h
+++ b/arch/powerpc/include/asm/exception-64e.h
@@ -215,5 +215,8 @@ exc_##label##_book3e:
 #define RFI_TO_USER\
rfi

+#define UNLOCK_USER_ACCESS(reg)
+#define LOCK_USER_ACCESS(reg)
+
 #endif /* _ASM_POWERPC_EXCEPTION_64E_H */

diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index 3b4767ed3ec5..0cac5bd380ca 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -264,6 +264,19 @@ BEGIN_FTR_SECTION_NESTED(943)	\
std ra,offset(r13); \
 END_FTR_SECTION_NESTED(ftr,ftr,943)

+#define LOCK_USER_ACCESS(reg)	\
+BEGIN_MMU_FTR_SECTION_NESTED(944)  \
+   LOAD_REG_IMMEDIATE(reg,AMR_LOCKED); \
+   mtspr   SPRN_AMR,reg;   \
+END_MMU_FTR_SECTION_NESTED(MMU_FTR_RADIX_GUAP,MMU_FTR_RADIX_GUAP,944)
+
+#define UNLOCK_USER_ACCESS(reg)	\
+BEGIN_MMU_FTR_SECTION_NESTED(945)  \
+   li  reg,0;  \
+   mtspr   SPRN_AMR,reg;   \
+   isync   \
+END_MMU_FTR_SECTION_NESTED(MMU_FTR_RADIX_GUAP,MMU_FTR_RADIX_GUAP,945)
+
 #define EXCEPTION_PROLOG_0(area)   \
GET_PACA(r13);  \
std r9,area+EX_R9(r13); /* save r9 */   \
@@ -500,7 +513,11 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
beq 4f; /* if from kernel mode  */ \
ACCOUNT_CPU_USER_ENTRY(r13, r9, r10);  \
SAVE_PPR(area, r9);\
-4: EXCEPTION_PROLOG_COMMON_2(area)\
+4: lbz r9,PACA_USER_ACCESS_ALLOWED(r13);  \
+   cmpwi   cr1,r9,0;  

Re: [PATCH 5/5] powerpc/64s: Document that PPC supports nosmap

2018-10-26 Thread LEROY Christophe

Why not call our new functionality SMAP instead of calling it GUAP?

Christophe

Russell Currey wrote:


Signed-off-by: Russell Currey 
---
 Documentation/admin-guide/kernel-parameters.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a5ad67d5cb16..8f78e75965f0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2764,7 +2764,7 @@
noexec=on: enable non-executable mappings (default)
noexec=off: disable non-executable mappings

-   nosmap  [X86]
+   nosmap  [X86,PPC]
Disable SMAP (Supervisor Mode Access Prevention)
even if it is supported by processor.

--
2.19.1





Re: [PATCH 3/5] powerpc/lib: checksum GUAP support

2018-10-26 Thread LEROY Christophe

Same comment as for futex

Christophe

Russell Currey wrote:


Wrap the checksumming code in GUAP locks and unlocks.

Signed-off-by: Russell Currey 
---
 arch/powerpc/lib/checksum_wrappers.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/lib/checksum_wrappers.c b/arch/powerpc/lib/checksum_wrappers.c
index a0cb63fb76a1..c67db0a6e18b 100644
--- a/arch/powerpc/lib/checksum_wrappers.c
+++ b/arch/powerpc/lib/checksum_wrappers.c
@@ -28,6 +28,7 @@ __wsum csum_and_copy_from_user(const void __user *src, void *dst,
 {
unsigned int csum;

+   unlock_user_access();
might_sleep();

*err_ptr = 0;
@@ -60,6 +61,7 @@ __wsum csum_and_copy_from_user(const void __user *src, void *dst,
}

 out:
+   lock_user_access();
return (__force __wsum)csum;
 }
 EXPORT_SYMBOL(csum_and_copy_from_user);
@@ -69,6 +71,7 @@ __wsum csum_and_copy_to_user(const void *src, void __user *dst, int len,
 {
unsigned int csum;

+   unlock_user_access();
might_sleep();

*err_ptr = 0;
@@ -97,6 +100,7 @@ __wsum csum_and_copy_to_user(const void *src, void __user *dst, int len,
}

 out:
+   lock_user_access();
return (__force __wsum)csum;
 }
 EXPORT_SYMBOL(csum_and_copy_to_user);
--
2.19.1





Re: [PATCH 2/5] powerpc/futex: GUAP support for futex ops

2018-10-26 Thread LEROY Christophe

Russell Currey wrote:


Wrap the futex operations in GUAP locks and unlocks.


Does it mean futex doesn't work anymore once only patch 1 is applied?
If so, you should split patch 1 into two parts and reorder the patches
so that GUAP can only be activated once all the necessary changes are
done. Otherwise the series won't be bisectable.


Christophe



Signed-off-by: Russell Currey 
---
 arch/powerpc/include/asm/futex.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/include/asm/futex.h b/arch/powerpc/include/asm/futex.h
index 94542776a62d..3aed640ee9ef 100644
--- a/arch/powerpc/include/asm/futex.h
+++ b/arch/powerpc/include/asm/futex.h
@@ -35,6 +35,7 @@ static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval,
 {
int oldval = 0, ret;

+   unlock_user_access();
pagefault_disable();

switch (op) {
@@ -62,6 +63,7 @@ static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval,
if (!ret)
*oval = oldval;

+   lock_user_access();
return ret;
 }

@@ -75,6 +77,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
return -EFAULT;

+   unlock_user_access();
 __asm__ __volatile__ (
 PPC_ATOMIC_ENTRY_BARRIER
 "1: lwarx   %1,0,%3 # futex_atomic_cmpxchg_inatomic\n\
@@ -95,6 +98,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
 : "cc", "memory");

*uval = prev;
+   lock_user_access();
 return ret;
 }

--
2.19.1





Re: [PATCH 0/5] Guarded Userspace Access Prevention on Radix

2018-10-26 Thread LEROY Christophe

Russell Currey wrote:


Guarded Userspace Access Prevention is a security mechanism that prevents
the kernel from being able to read and write userspace addresses outside of
the allowed paths, most commonly copy_{to/from}_user().

At present, the only CPU that supports this is POWER9, and only while using
the Radix MMU.  Privileged reads and writes cannot access user data when
key 0 of the AMR is set.  This is described in the "Radix Tree Translation
Storage Protection" section of the POWER ISA as of version 3.0.


It is not accurate that only POWER9 can support this.

The 8xx has MMU access protection registers which can serve the same
purpose. Today on the 8xx, kernel space is group 0 and user space is
group 1. Group 0 is set to "page defined access permission" in the
MD_AP and MI_AP registers, and group 1 is set to "all accesses done
with supervisor rights". By setting group 1 to "user and supervisor
interpretation swapped" we can forbid kernel access to user space
while still allowing user access to it. Then, by simply changing the
group 1 mode at dedicated places, we can lock/unlock kernel access to
user space.


Could you implement something as generic as possible, keeping that in
mind for a future patch?


Christophe



GUAP code sets key 0 of the AMR (thus disabling accesses of user data)
early during boot, and only ever "unlocks" access prior to certain
operations, like copy_{to/from}_user(), futex ops, etc.  Setting this does
not prevent unprivileged access, so userspace can operate fine while access
is locked.

There is a performance impact, although I don't consider it heavy.  Running
a worst-case benchmark of a 1GB copy 1 byte at a time (and thus constant
read(1) write(1) syscalls), I found enabling GUAP to be 3.5% slower than
when disabled.  In most cases, the difference is negligible.  The main
performance impact is the mtspr instruction, which is quite slow.

There are a few caveats with this series that could be improved upon in
future.  Right now there is no saving and restoring of the AMR value -
there is no userspace exploitation of the AMR on Radix in POWER9, but if
this were to change in future, saving and restoring the value would be
necessary.

No attempt to optimise cases of repeated calls - for example, if some
code was repeatedly calling copy_to_user() for small sizes very frequently,
it would be slower than the equivalent of wrapping that code in an unlock
and lock and only having to modify the AMR once.

There are some interesting cases that I've attempted to handle, such as if
the AMR is unlocked (i.e. because a copy_{to_from}_user is in progress)...

- and an exception is taken, the kernel would then be running with the
AMR unlocked and freely able to access userspace again.  I am working
around this by storing a flag in the PACA to indicate if the AMR is
unlocked (to save a costly SPR read), and if so, locking the AMR in
the exception entry path and unlocking it on the way out.

- and gets context switched out, goes into a path that locks the AMR,
then context switches back, access will be disabled and will fault.
As a result, I context switch the AMR between tasks as if it was used
by userspace like hash (which already implements this).

Another consideration is use of the isync instruction.  Without an isync
following the mtspr instruction, there is no guarantee that the change
takes effect.  The issue is that isync is very slow, and so I tried to
avoid them wherever necessary.  In this series, the only place an isync
gets used is after *unlocking* the AMR, because if an access takes place
and access is still prevented, the kernel will fault.

On the flipside, a slight delay in unlocking caused by skipping an isync
potentially allows a small window of vulnerability.  It is my opinion
that this window is practically impossible to exploit, but if someone
thinks otherwise, please do share.

This series is my first attempt at POWER assembly so all feedback is very
welcome.

The official theme song of this series can be found here:
https://www.youtube.com/watch?v=QjTrnKAcYjE

Russell Currey (5):
  powerpc/64s: Guarded Userspace Access Prevention
  powerpc/futex: GUAP support for futex ops
  powerpc/lib: checksum GUAP support
  powerpc/64s: Disable GUAP with nosmap option
  powerpc/64s: Document that PPC supports nosmap

 .../admin-guide/kernel-parameters.txt |  2 +-
 arch/powerpc/include/asm/exception-64e.h  |  3 +
 arch/powerpc/include/asm/exception-64s.h  | 19 ++-
 arch/powerpc/include/asm/futex.h  |  6 ++
 arch/powerpc/include/asm/mmu.h|  7 +++
 arch/powerpc/include/asm/paca.h   |  3 +
 arch/powerpc/include/asm/reg.h|  1 +
 arch/powerpc/include/asm/uaccess.h| 57 ---
 arch/powerpc/kernel/asm-offsets.c |  1 +
 arch/powerpc/kernel/dt_cpu_ftrs.c |  4 ++
 arch/powerpc/kernel/entry_64.S| 17 

Re: [PATCH 3/9] powerpc/mm: Remove extern from function definition

2018-10-24 Thread LEROY Christophe

Breno Leitao wrote:


hi Christophe,

On 10/23/2018 12:38 PM, LEROY Christophe wrote:

Breno Leitao wrote:


This patch removes the keyword from the definition part, while keeps
it in
the declaration part.


I think checkpatch also says that extern should be avoided in declarations.


Thanks for the review. I tried to look into this complaint, but I didn't see
this behavior on checkpatch.pl from kernel 4.19. I created a commit that adds
a new extern prototype and checked the patch. Take a look:


Use option --strict with checkpatch.pl

Christophe



# git show

commit 720cd4ee7bf3c0607eaea79e209b719bac79508e
Author: Breno Leitao 
Date:   Wed Oct 24 10:31:54 2018 -0400

powerpc/mm: New test function

New test function.

Signed-off-by: Breno Leitao 

diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
index 2d00cc530083..4a348e42cab6 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -167,6 +167,8 @@ extern int huge_ptep_set_access_flags(struct vm_area_struct *vma,
  unsigned long addr, pte_t *ptep,
  pte_t pte, int dirty);

+extern int test(int foo);
+
static inline pte_t huge_ptep_get(pte_t *ptep)
{
return *ptep;

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 5c390f5a5207..2e8f5f77f7f6 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3204,6 +3204,11 @@ static void set_huge_ptep_writable(struct vm_area_struct *vma,
update_mmu_cache(vma, address, ptep);
}

+int test(int foo)
+{
+   return foo;
+}
+
bool is_hugetlb_entry_migration(pte_t pte)
{
swp_entry_t swp;


# scripts/checkpatch.pl -g HEAD

total: 0 errors, 0 warnings, 19 lines checked

Commit 720cd4ee7bf3 ("powerpc/mm: New test function") has no obvious
style problems and is ready for submission.





Re: [PATCH 3/9] powerpc/mm: Remove extern from function definition

2018-10-23 Thread LEROY Christophe

Breno Leitao wrote:


Function huge_ptep_set_access_flags() has the 'extern' keyword in the
function definition and also in the function declaration. This causes a
warning in 'sparse' since the 'extern' storage class should be used only on
symbol declarations.

	arch/powerpc/mm/pgtable.c:232:12: warning: function 'huge_ptep_set_access_flags' with external linkage has definition


This patch removes the keyword from the definition part, while keeps it in
the declaration part.


I think checkpatch also says that extern should be avoided in declarations.
Can you remove both?

Christophe



Signed-off-by: Breno Leitao 
---
 arch/powerpc/mm/pgtable.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index d71c669c..213933b78077 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -229,7 +229,7 @@ int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address,
 }

 #ifdef CONFIG_HUGETLB_PAGE
-extern int huge_ptep_set_access_flags(struct vm_area_struct *vma,
+int huge_ptep_set_access_flags(struct vm_area_struct *vma,
  unsigned long addr, pte_t *ptep,
  pte_t pte, int dirty)
 {
--
2.19.0





Re: [PATCH v3 06/24] powerpc/mm: properly set PAGE_KERNEL flags in ioremap()

2018-10-14 Thread LEROY Christophe

LEROY Christophe wrote:


Michael Ellerman wrote:


Christophe Leroy  writes:


Set PAGE_KERNEL directly in the caller and do not rely on a
hack adding PAGE_KERNEL flags when _PAGE_PRESENT is not set.

As already done for PPC64, use pgprot_cache() helpers instead of
_PAGE_XXX flags in PPC32 ioremap() derived functions.

Signed-off-by: Christophe Leroy 


Something in here is breaking my p5020ds (both 32-bit and 64-bit):

# first bad commit: [cb9220cc18345a2a1ec287752e576a2c88fa4a2b]  
powerpc/mm: properly set PAGE_KERNEL flags in ioremap()


Strange, it is calling ioremap_prot(); it should have been calling
ioremap_cache(), hence ioremap(), since patch 4 of the series. Did
patch 4 apply correctly?


Oops, forget that stupid comment. ioremap_cache() uses ioremap_prot(),
but with the correct flags instead of 0.


So patch 4 is needed anyway, because using 0 as flags will no longer
implicitly imply PAGE_KERNEL.


Christophe



Christophe



64-bit:

 io scheduler mq-deadline registered
 io scheduler kyber registered
 pcieport :00:00.0: AER enabled with IRQ 482
 pcieport 2000:00:00.0: AER enabled with IRQ 480
 Unable to handle kernel paging request for data at address 0x88008008
 Faulting instruction address: 0xc00152e4
 Oops: Kernel access of bad area, sig: 11 [#1]
 BE SMP NR_CPUS=32 CoreNet Generic
 Modules linked in:
 CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc3-gcc-7.3.1-00135-gcb9220cc1834 #16
 NIP:  c00152e4 LR: c05173b8 CTR: 0010
 REGS: c000f30eb420 TRAP: 0300   Not tainted (4.19.0-rc3-gcc-7.3.1-00135-gcb9220cc1834)
 MSR:  80029000   CR: 24000224  XER: 
 DEAR: 88008008 ESR: 0080 IRQMASK: 0
 GPR00: c05173a0 c000f30eb6a0 c0f86e00 88008008
 GPR04:  0040 0ffbff04 0008
 GPR08:  0010  0080
 GPR12: 84000422 c00037c0 c000263c 
 GPR16:    
 GPR20:    
 GPR24: c0dbe0a0 c0d41bf8 88008008 c00089a8
 GPR28: c000f3492400 c000f3492410 0040 c0ff0a18
 NIP [c00152e4] ._memset_io+0x6c/0x9c
 LR [c05173b8] .fsl_qman_probe+0x188/0x918
 Call Trace:
 [c000f30eb6a0] [c05173a0] .fsl_qman_probe+0x170/0x918 (unreliable)
 [c000f30eb740] [c05797dc] .platform_drv_probe+0x58/0xac
 [c000f30eb7c0] [c05772b0] .really_probe+0x2a8/0x380
 [c000f30eb860] [c05776f0] .__driver_attach+0x138/0x13c
 [c000f30eb8f0] [c0574550] .bus_for_each_dev+0x9c/0x110
 [c000f30eb9a0] [c0576a18] .driver_attach+0x24/0x38
 [c000f30eba10] [c05761ec] .bus_add_driver+0x16c/0x2ac
 [c000f30ebab0] [c0578194] .driver_register+0x88/0x178
 [c000f30ebb30] [c0579770] .__platform_driver_register+0x48/0x5c
 [c000f30ebba0] [c0d85744] .fsl_qman_driver_init+0x1c/0x30
 [c000f30ebc10] [c0002374] .do_one_initcall+0x70/0x258
 [c000f30ebcf0] [c0d4a244] .kernel_init_freeable+0x340/0x43c
 [c000f30ebdb0] [c0002658] .kernel_init+0x1c/0x130
 [c000f30ebe30] [c9e4] .ret_from_kernel_thread+0x58/0x74
 Instruction dump:
 4e800020 2ba50003 40dd003c 3925fffc 5488402e 7929f082 7d082378 39290001
 550a801e 7d2903a6 7d4a4378 794a0020 <9143> 38630004 4200fff8 70a50003
 ---[ end trace 372a57fd67efb6fe ]---

32 bit:

 [1.076133] pcieport 2000:02:00.0: AER enabled with IRQ 480
 [1.082106] Unable to handle kernel paging request for data at address 0xf110
 [1.089488] Faulting instruction address: 0xc0011c80
 [1.094437] Oops: Kernel access of bad area, sig: 11 [#1]
 [1.099816] BE SMP NR_CPUS=24 CoreNet Generic
 [1.104157] Modules linked in:
 [1.107197] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc3-gcc7x-g8f0c636b0542 #1
 [1.115181] NIP:  c0011c80 LR: c058f970 CTR: 0010
 [1.120216] REGS: e8107c80 TRAP: 0300   Not tainted (4.19.0-rc3-gcc7x-g8f0c636b0542)
 [1.128026] MSR:  00029002   CR: 24000284  XER: 
 [1.134190] DEAR: f110 ESR: 0080
 [1.134190] GPR00: c058f958 e8107d30 e8108000 f110   
0040 e8107cd8 e8107d0c
 [1.134190] GPR08:  0010  ff00 24000282  
 c0003340 
 [1.134190] GPR16:     c0d64410  
c0edb34c c0ec6700 0007
 [1.134190] GPR24: c0ef f110 efffcab8 c0ef e8231c00  
e8231c10 0040003f c101d290

 [1.171519] NIP [c0011c80] _memset_io+0x90/0xd0
 [1.176030] LR [c058f970] fsl_qman_probe+0x190/0x8c0
 [1.180975] Call Trace:
 [1.183410] [e8107d30] [c001f6c0] ioremap_prot+0x40/0x50 (unreliable)
 [1.189830] [e8

Re: [PATCH v3 06/24] powerpc/mm: properly set PAGE_KERNEL flags in ioremap()

2018-10-14 Thread LEROY Christophe

Michael Ellerman wrote:


Michael Ellerman  writes:


Christophe Leroy  writes:


Set PAGE_KERNEL directly in the caller and do not rely on a
hack adding PAGE_KERNEL flags when _PAGE_PRESENT is not set.

As already done for PPC64, use pgprot_cache() helpers instead of
_PAGE_XXX flags in PPC32 ioremap() derived functions.

Signed-off-by: Christophe Leroy 


Something in here is breaking my p5020ds (both 32-bit and 64-bit):


Oh duh.

That's because I didn't take patch 4.

It didn't have any acks, but I guess I'll just merge it rather than
breaking things.


Yes indeed. Maybe I should have followed it more carefully to ensure
it got an ack.


Thanks
Christophe





Re: [PATCH v3 06/24] powerpc/mm: properly set PAGE_KERNEL flags in ioremap()

2018-10-14 Thread LEROY Christophe

Michael Ellerman wrote:


Christophe Leroy  writes:


Set PAGE_KERNEL directly in the caller and do not rely on a
hack adding PAGE_KERNEL flags when _PAGE_PRESENT is not set.

As already done for PPC64, use pgprot_cache() helpers instead of
_PAGE_XXX flags in PPC32 ioremap() derived functions.

Signed-off-by: Christophe Leroy 


Something in here is breaking my p5020ds (both 32-bit and 64-bit):

# first bad commit: [cb9220cc18345a2a1ec287752e576a2c88fa4a2b]  
powerpc/mm: properly set PAGE_KERNEL flags in ioremap()


Strange, it is calling ioremap_prot(); it should have been calling
ioremap_cache(), hence ioremap(), since patch 4 of the series. Did
patch 4 apply correctly?


Christophe



64-bit:

  io scheduler mq-deadline registered
  io scheduler kyber registered
  pcieport :00:00.0: AER enabled with IRQ 482
  pcieport 2000:00:00.0: AER enabled with IRQ 480
  Unable to handle kernel paging request for data at address 0x88008008
  Faulting instruction address: 0xc00152e4
  Oops: Kernel access of bad area, sig: 11 [#1]
  BE SMP NR_CPUS=32 CoreNet Generic
  Modules linked in:
  CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc3-gcc-7.3.1-00135-gcb9220cc1834 #16
  NIP:  c00152e4 LR: c05173b8 CTR: 0010
  REGS: c000f30eb420 TRAP: 0300   Not tainted (4.19.0-rc3-gcc-7.3.1-00135-gcb9220cc1834)
  MSR:  80029000   CR: 24000224  XER: 
  DEAR: 88008008 ESR: 0080 IRQMASK: 0
  GPR00: c05173a0 c000f30eb6a0 c0f86e00 88008008
  GPR04:  0040 0ffbff04 0008
  GPR08:  0010  0080
  GPR12: 84000422 c00037c0 c000263c 
  GPR16:    
  GPR20:    
  GPR24: c0dbe0a0 c0d41bf8 88008008 c00089a8
  GPR28: c000f3492400 c000f3492410 0040 c0ff0a18
  NIP [c00152e4] ._memset_io+0x6c/0x9c
  LR [c05173b8] .fsl_qman_probe+0x188/0x918
  Call Trace:
  [c000f30eb6a0] [c05173a0] .fsl_qman_probe+0x170/0x918 (unreliable)
  [c000f30eb740] [c05797dc] .platform_drv_probe+0x58/0xac
  [c000f30eb7c0] [c05772b0] .really_probe+0x2a8/0x380
  [c000f30eb860] [c05776f0] .__driver_attach+0x138/0x13c
  [c000f30eb8f0] [c0574550] .bus_for_each_dev+0x9c/0x110
  [c000f30eb9a0] [c0576a18] .driver_attach+0x24/0x38
  [c000f30eba10] [c05761ec] .bus_add_driver+0x16c/0x2ac
  [c000f30ebab0] [c0578194] .driver_register+0x88/0x178
  [c000f30ebb30] [c0579770] .__platform_driver_register+0x48/0x5c
  [c000f30ebba0] [c0d85744] .fsl_qman_driver_init+0x1c/0x30
  [c000f30ebc10] [c0002374] .do_one_initcall+0x70/0x258
  [c000f30ebcf0] [c0d4a244] .kernel_init_freeable+0x340/0x43c
  [c000f30ebdb0] [c0002658] .kernel_init+0x1c/0x130
  [c000f30ebe30] [c9e4] .ret_from_kernel_thread+0x58/0x74
  Instruction dump:
  4e800020 2ba50003 40dd003c 3925fffc 5488402e 7929f082 7d082378 39290001
  550a801e 7d2903a6 7d4a4378 794a0020 <9143> 38630004 4200fff8 70a50003
  ---[ end trace 372a57fd67efb6fe ]---

32 bit:

  [1.076133] pcieport 2000:02:00.0: AER enabled with IRQ 480
  [1.082106] Unable to handle kernel paging request for data at address 0xf110
  [1.089488] Faulting instruction address: 0xc0011c80
  [1.094437] Oops: Kernel access of bad area, sig: 11 [#1]
  [1.099816] BE SMP NR_CPUS=24 CoreNet Generic
  [1.104157] Modules linked in:
  [1.107197] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc3-gcc7x-g8f0c636b0542 #1
  [1.115181] NIP:  c0011c80 LR: c058f970 CTR: 0010
  [1.120216] REGS: e8107c80 TRAP: 0300   Not tainted (4.19.0-rc3-gcc7x-g8f0c636b0542)
  [1.128026] MSR:  00029002   CR: 24000284  XER: 
  [1.134190] DEAR: f110 ESR: 0080
  [1.134190] GPR00: c058f958 e8107d30 e8108000 f110  0040 e8107cd8 e8107d0c
  [1.134190] GPR08:  0010  ff00 24000282  c0003340
  [1.134190] GPR16:     c0d64410  c0edb34c c0ec6700 0007
  [1.134190] GPR24: c0ef f110 efffcab8 c0ef e8231c00  e8231c10 0040003f c101d290

  [1.171519] NIP [c0011c80] _memset_io+0x90/0xd0
  [1.176030] LR [c058f970] fsl_qman_probe+0x190/0x8c0
  [1.180975] Call Trace:
  [1.183410] [e8107d30] [c001f6c0] ioremap_prot+0x40/0x50 (unreliable)
  [1.189830] [e8107d40] [c058f958] fsl_qman_probe+0x178/0x8c0
  [1.195475] [e8107d70] [c0600894] platform_drv_probe+0x54/0xb0
  [1.201287] [e8107d90] [c05fe15c] really_probe+0x28c/0x350
  [1.206756] [e8107dc0] [c05fe73c] 

Re: [PATCH net] powerpc: use big endian to hash len and proto in csum_ipv6_magic

2018-09-08 Thread LEROY Christophe

Xin Long wrote:


The function csum_ipv6_magic doesn't convert len and proto to big
endian before doing ipv6 csum hash, which is not consistent with
RFC and other arches.

Jianlin found it when ICMPv6 packets from other hosts were dropped
in the powerpc64 system.

This patch is to fix it by using instruction 'lwbrx' to do this
conversion in powerpc32/64 csum_ipv6_magic.

Fixes: e9c4943a107b ("powerpc: Implement csum_ipv6_magic in assembly")
Reported-by: Jianlin Shi 
Signed-off-by: Xin Long 
---
 arch/powerpc/lib/checksum_32.S | 4 
 arch/powerpc/lib/checksum_64.S | 4 
 2 files changed, 8 insertions(+)

diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S
index aa22406..7d3446e 100644
--- a/arch/powerpc/lib/checksum_32.S
+++ b/arch/powerpc/lib/checksum_32.S
@@ -325,6 +325,10 @@ _GLOBAL(csum_ipv6_magic)
	adde	r0, r0, r9
	lwz	r11, 12(r4)
	adde	r0, r0, r10
+   STWX_BE r5, 0, r1
+   lwz r5, 0(r1)
+   STWX_BE r6, 0, r1
+   lwz r6, 0(r1)


PPC32 doesn't support little endian, so nothing to do here.


add r5, r5, r6  /* assumption: len + proto doesn't carry */
	adde	r0, r0, r11
	adde	r0, r0, r5
diff --git a/arch/powerpc/lib/checksum_64.S b/arch/powerpc/lib/checksum_64.S
index 886ed94..302e732 100644
--- a/arch/powerpc/lib/checksum_64.S
+++ b/arch/powerpc/lib/checksum_64.S
@@ -439,6 +439,10 @@ EXPORT_SYMBOL(csum_partial_copy_generic)
 _GLOBAL(csum_ipv6_magic)
ld  r8, 0(r3)
ld  r9, 8(r3)
+   STWX_BE r5, 0, r1
+   lwz r5, 0(r1)
+   STWX_BE r6, 0, r1
+   lwz r6, 0(r1)
add r5, r5, r6


This is overkill. For LE it should be enough to rotate r5 by 8 bits after the sum. Best place to do it would be after ld r11 I think.


Christophe
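The identity behind this rotate-by-8 suggestion can be checked outside the kernel. A minimal userspace sketch (plain C, not the actual assembly): for the 16-bit one's-complement fold used by the checksum, a 32-bit addend rotated left by 8 bits contributes exactly the same value as the byte-swapped addend.

```c
#include <assert.h>
#include <stdint.h>

/* One's-complement fold of a 32-bit checksum accumulator into 16 bits,
 * as done when reducing the IPv6 pseudo-header sum. */
static uint16_t csum_fold32(uint32_t x)
{
	uint32_t s = (x >> 16) + (x & 0xffffu);

	s = (s >> 16) + (s & 0xffffu);	/* absorb the end-around carry */
	return (uint16_t)s;
}

/* Rotate left by 8, as a rotlwi/rotldi-style instruction would. */
static uint32_t rotl8(uint32_t x)
{
	return (x << 8) | (x >> 24);
}

static uint32_t bswap32(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
	       ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Folded contribution of a byte-swapped addend equals the folded
 * contribution of the same addend rotated left by 8 bits:
 * bytes ABCD give DC+BA on one side and BC+DA on the other, and both
 * equal (B+D)*256 + (A+C) before the fold. */
static int rotate_equals_swap(uint32_t x)
{
	return csum_fold32(bswap32(x)) == csum_fold32(rotl8(x));
}
```

So instead of byte-reversing len and proto through memory, rotating the combined value once before accumulating gives the same checksum contribution.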


	addc	r0, r8, r9
ld  r10, 0(r4)
--
2.1.0





Re: [PATCH v2] powerpc/tm: Print 64-bits MSR

2018-08-07 Thread LEROY Christophe

Breno Leitao wrote:


Hi,

On 08/07/2018 02:15 PM, Christophe LEROY wrote:

Le 07/08/2018 à 15:35, Breno Leitao a écrit :

On a kernel TM Bad thing program exception, the Machine State Register
(MSR) is not being properly displayed. The exception code dumps a 32-bits
value but MSR is a 64 bits register for all platforms that have HTM
enabled.

This patch dumps the MSR value as a 64-bits value instead of 32 bits. In
order to do so, the 'reason' variable could not be used, since it trimmed
MSR to 32-bits (int).


reason is not always regs->msr, see get_reason(), although in your case it is.


I think it would be better to change 'reason' to 'unsigned long' instead of
replacing it by regs->msr for the printk.


That was my initial approach, but this code seems to run on 32-bit systems,
and I do not want to change the whole 'reason' bit width without having a
32-bit system to test on, at least.


But 'unsigned long' is still 32 bits on ppc32, so it makes no difference from 'unsigned int'.

And I will test it for you if needed

Christophe



Also, it is a bit weird doing something as:

printk("(msr 0x%lx)", reason);

I personally think that the follow code is much more readable:

printk(" (msr 0x%lx)...", regs->msr);





Re: [PATCH v2 04/10] powerpc/traps: Use REG_FMT in show_signal_msg()

2018-07-30 Thread LEROY Christophe

Murilo Opsfelder Araujo wrote:


Hi, Christophe.

On Fri, Jul 27, 2018 at 06:40:23PM +0200, LEROY Christophe wrote:

Murilo Opsfelder Araujo wrote:

> Simplify the message format by using REG_FMT as the register format.  This
> avoids having two different formats and avoids checking for MSR_64BIT.

Are you sure it is what we want ?


Yes.


Won't it change the behaviour for a 32-bit app running on a 64-bit kernel?


In fact, this changes how many zeroes are prefixed when displaying the registers
(%016lx vs. %08lx format).  For example, 32-bit userspace, 64-bit kernel:


Indeed that's what I suspected. What is the real benefit of this change? Why
not keep the current format for 32-bit userspace? All those leading zeroes
are pointless to me.
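For what it's worth, the entire difference being debated is the zero-padding width of the two format strings; a small userspace sketch of what each one produces for a typical 32-bit nip value:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Formats from the thread: REG_FMT is "%016lx" on 64-bit kernels,
 * while the old fmt32 string used "%08lx" for 32-bit tasks. */
static void fmt_reg(char *buf, size_t n, const char *fmt, unsigned long v)
{
	snprintf(buf, n, fmt, v);
}
```

With the unified format, a 32-bit address like 0x1420 gains eight extra leading zeroes in the log, which is the cosmetic regression being pointed out.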




before this series:
  [66475.002900] segv[4599]: unhandled signal 11 at  nip 1420 lr 0fe61854 code 1

after this series:
  [   73.414535] segv[3759]: segfault (11) at  nip 1420 lr 0fe61854 code 1 in segv[1000+1]
  [   73.414641] segv[3759]: code: 4e800421 80010014 38210010 7c0803a6 4b30 9421ffd0 93e1002c 7c3f0b78
  [   73.414665] segv[3759]: code: 3920 913f001c 813f001c 3941 <9149> 3920 7d234b78 397f0030


Have you spotted any other behaviour change?


Not as of today

Christophe



Cheers
Murilo





Re: [PATCH v2 04/10] powerpc/traps: Use REG_FMT in show_signal_msg()

2018-07-27 Thread LEROY Christophe

Murilo Opsfelder Araujo wrote:


Simplify the message format by using REG_FMT as the register format.  This
avoids having two different formats and avoids checking for MSR_64BIT.


Are you sure it is what we want ?

Won't it change the behaviour for a 32-bit app running on a 64-bit kernel?

Christophe




Signed-off-by: Murilo Opsfelder Araujo 
---
 arch/powerpc/kernel/traps.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 4faab4705774..047d980ac776 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -311,17 +311,13 @@ static bool show_unhandled_signals_ratelimited(void)
 static void show_signal_msg(int signr, struct pt_regs *regs, int code,
unsigned long addr)
 {
-   const char fmt32[] = KERN_INFO "%s[%d]: unhandled signal %d " \
-   "at %08lx nip %08lx lr %08lx code %x\n";
-   const char fmt64[] = KERN_INFO "%s[%d]: unhandled signal %d " \
-   "at %016lx nip %016lx lr %016lx code %x\n";
-
if (!unhandled_signal(current, signr))
return;

-   printk(regs->msr & MSR_64BIT ? fmt64 : fmt32,
-  current->comm, current->pid, signr,
-  addr, regs->nip, regs->link, code);
+   pr_info("%s[%d]: unhandled signal %d at "REG_FMT \
+   " nip "REG_FMT" lr "REG_FMT" code %x\n",
+   current->comm, current->pid, signr, addr,
+   regs->nip, regs->link, code);
 }

 void _exception_pkey(int signr, struct pt_regs *regs, int code,
--
2.17.1





Re: [PATCH v4 00/11] hugetlb: Factorize hugetlb architecture primitives

2018-07-26 Thread LEROY Christophe

Alex Ghiti wrote:


Hi everyone,

This is the result of the build for all arches tackled in this series rebased on 4.18-rc6:


arm:
    versatile_defconfig: without huge page OK
    keystone_defconfig: with huge page OK
arm64:
    defconfig: with huge page OK
ia64:
    generic_defconfig: with huge page OK
mips:
    Paul Burton tested cavium octeon with huge page OK
parisc:
    generic-64bit_defconfig: with huge page does not link
    generic-64bit_defconfig: without huge page does not link
    BUT not because of this series, any feedback welcome.
powerpc:
    ppc64_defconfig: without huge page OK
    ppc64_defconfig: with huge page OK


Can you also test ppc32, both with and without hugepage (mpc885_ads_defconfig)?

Thanks
Christophe


sh:
    dreamcast_defconfig: with huge page OK
sparc:
    sparc32_defconfig: without huge page OK
sparc64:
    sparc64_defconfig: with huge page OK
x86:
    with huge page OK

Alex

On 07/23/2018 02:00 PM, Michael Ellerman wrote:

Alex Ghiti  writes:


Does anyone have any suggestion about those patches ?

Cross compiling it for some non-x86 arches would be a good start :)

There are cross compilers available here:

  https://mirrors.edge.kernel.org/pub/tools/crosstool/


cheers


On 07/09/2018 02:16 PM, Michal Hocko wrote:
[CC hugetlb guys - http://lkml.kernel.org/r/20180705110716.3919-1-a...@ghiti.fr]


On Thu 05-07-18 11:07:05, Alexandre Ghiti wrote:

In order to reduce copy/paste of functions across architectures and then
make riscv hugetlb port (and future ports) simpler and smaller, this
patchset intends to factorize the numerous hugetlb primitives that are
defined across all the architectures.

Except for prepare_hugepage_range, this patchset moves the versions that
are just pass-through to standard pte primitives into
asm-generic/hugetlb.h by using the same #ifdef semantic that can be
found in asm-generic/pgtable.h, i.e. __HAVE_ARCH_***.

s390 architecture has not been tackled in this series since it does not
use asm-generic/hugetlb.h at all.
powerpc could be factorized a bit more (cf huge_ptep_set_wrprotect).

This patchset has been compiled on x86 only.

Changelog:

v4:
   Fix powerpc build error due to misplacing of #include outside of
   #ifdef CONFIG_HUGETLB_PAGE, as pointed out by Christophe Leroy.

v1, v2, v3:
   Same version, just problems with email provider and misuse of
   --batch-size option of git send-email

Alexandre Ghiti (11):
   hugetlb: Harmonize hugetlb.h arch specific defines with pgtable.h
   hugetlb: Introduce generic version of hugetlb_free_pgd_range
   hugetlb: Introduce generic version of set_huge_pte_at
   hugetlb: Introduce generic version of huge_ptep_get_and_clear
   hugetlb: Introduce generic version of huge_ptep_clear_flush
   hugetlb: Introduce generic version of huge_pte_none
   hugetlb: Introduce generic version of huge_pte_wrprotect
   hugetlb: Introduce generic version of prepare_hugepage_range
   hugetlb: Introduce generic version of huge_ptep_set_wrprotect
   hugetlb: Introduce generic version of huge_ptep_set_access_flags
   hugetlb: Introduce generic version of huge_ptep_get

  arch/arm/include/asm/hugetlb-3level.h| 32 +-
  arch/arm/include/asm/hugetlb.h   | 33 +--
  arch/arm64/include/asm/hugetlb.h | 39 +++-
  arch/ia64/include/asm/hugetlb.h  | 47 ++-
  arch/mips/include/asm/hugetlb.h  | 40 +++--
  arch/parisc/include/asm/hugetlb.h| 33 +++
  arch/powerpc/include/asm/book3s/32/pgtable.h |  2 +
  arch/powerpc/include/asm/book3s/64/pgtable.h |  1 +
  arch/powerpc/include/asm/hugetlb.h   | 43 ++
  arch/powerpc/include/asm/nohash/32/pgtable.h |  2 +
  arch/powerpc/include/asm/nohash/64/pgtable.h |  1 +
  arch/sh/include/asm/hugetlb.h| 54 ++---
  arch/sparc/include/asm/hugetlb.h | 40 +++--
  arch/x86/include/asm/hugetlb.h   | 72 +--
  include/asm-generic/hugetlb.h| 88 +++-

  15 files changed, 143 insertions(+), 384 deletions(-)

--
2.16.2





Re: [PATCH 7/7] powerpc/traps: Show instructions on exceptions

2018-07-25 Thread LEROY Christophe

Murilo Opsfelder Araujo wrote:


Move show_instructions() declaration to arch/powerpc/include/asm/stacktrace.h
and include asm/stacktrace.h in arch/powerpc/kernel/process.c, which contains
the implementation.

Modify show_instructions() not to call __kernel_text_address(), allowing
userspace instruction dump.  probe_kernel_address(), which returns -EFAULT if
something goes wrong, is still being called.

Call show_instructions() in arch/powerpc/kernel/traps.c to dump instructions
at the faulty location, useful for debugging.


Shouldn't this part be in a second patch ?

Wouldn't it be better to also see the regs, if we want to use this to understand what happened?

So you could call show_regs() instead of show_instructions() ?

Christophe
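For context, the dump window used by show_instructions() places the faulting instruction about three quarters of the way into the 16 printed words (the one shown in angle brackets in the dumps above). A quick userspace model of the arithmetic, assuming sizeof(int) == 4 as on powerpc:

```c
#include <assert.h>

/* Model of the window computed in show_instructions(): with
 * instructions_to_print = 16, the dump starts
 * 16 * 3 / 4 = 12 instructions before NIP, so the faulting
 * instruction is the 13th of the 16 printed. */
static unsigned long dump_start(unsigned long nip)
{
	int instructions_to_print = 16;

	return nip - (instructions_to_print * 3 / 4 * sizeof(int));
}
```

This matches the example dumps in the thread, where the bracketed word sits near the end of the second printed row.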



Before this patch, an unhandled signal message looked like:

Jul 24 09:57:00 localhost kernel: pandafault[10524]: segfault (11) at 17d0 nip 161c lr 7fffbd295100 code 2 in pandafault[1000+1]


After this patch, it looks like:

Jul 24 09:57:00 localhost kernel: pandafault[10524]: segfault (11) at 17d0 nip 161c lr 7fffbd295100 code 2 in pandafault[1000+1]

Jul 24 09:57:00 localhost kernel: Instruction dump:
Jul 24 09:57:00 localhost kernel: 4bfffeec 4bfffee8 3c401002 38427f00 fbe1fff8 f821ffc1 7c3f0b78 3d22fffe
Jul 24 09:57:00 localhost kernel: 392988d0 f93f0020 e93f0020 39400048 <9949> 3920 7d234b78 383f0040


Signed-off-by: Murilo Opsfelder Araujo 
---
 arch/powerpc/include/asm/stacktrace.h | 7 +++
 arch/powerpc/kernel/process.c | 6 +++---
 arch/powerpc/kernel/traps.c   | 3 +++
 3 files changed, 13 insertions(+), 3 deletions(-)
 create mode 100644 arch/powerpc/include/asm/stacktrace.h

diff --git a/arch/powerpc/include/asm/stacktrace.h b/arch/powerpc/include/asm/stacktrace.h
new file mode 100644
index ..46e5ef451578
--- /dev/null
+++ b/arch/powerpc/include/asm/stacktrace.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_STACKTRACE_H
+#define _ASM_POWERPC_STACKTRACE_H
+
+void show_instructions(struct pt_regs *regs);
+
+#endif /* _ASM_POWERPC_STACKTRACE_H */
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index b1af3390249c..ee1d63e03c52 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -52,6 +52,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1261,7 +1262,7 @@ struct task_struct *__switch_to(struct task_struct *prev,


 static int instructions_to_print = 16;

-static void show_instructions(struct pt_regs *regs)
+void show_instructions(struct pt_regs *regs)
 {
int i;
unsigned long pc = regs->nip - (instructions_to_print * 3 / 4 *
@@ -1283,8 +1284,7 @@ static void show_instructions(struct pt_regs *regs)
pc = (unsigned long)phys_to_virt(pc);
 #endif

-   if (!__kernel_text_address(pc) ||
-probe_kernel_address((unsigned int __user *)pc, instr)) {
+   if (probe_kernel_address((unsigned int __user *)pc, instr)) {
pr_cont(" ");
} else {
if (regs->nip == pc)
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index e55ee639d010..3beca17ac1b1 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -70,6 +70,7 @@
 #include 
 #include 
 #include 
+#include 

 #if defined(CONFIG_DEBUGGER) || defined(CONFIG_KEXEC_CORE)
 int (*__debugger)(struct pt_regs *regs) __read_mostly;
@@ -357,6 +358,8 @@ static void show_signal_msg(int signr, struct pt_regs *regs, int code,

print_vma_addr(KERN_CONT " in ", regs->nip);

pr_cont("\n");
+
+   show_instructions(regs);
 }

 void _exception_pkey(int signr, struct pt_regs *regs, int code,
--
2.17.1





Re: [PATCH 6/7] powerpc/traps: Print signal name for unhandled signals

2018-07-25 Thread LEROY Christophe

Murilo Opsfelder Araujo wrote:


This adds a human-readable name in the unhandled signal message.

Before this patch, a page fault looked like:

Jul 11 16:04:11 localhost kernel: pandafault[6303]: unhandled signal 11 at 17d0 nip 161c lr 7fff93c55100 code 2 in pandafault[1000+1]


After this patch, a page fault looks like:

Jul 11 18:14:48 localhost kernel: pandafault[6352]: segfault (11) at 00013a2a09f8 nip 00013a2a086c lr 7fffb63e5100 code 2 in pandafault[13a2a+1]


Signed-off-by: Murilo Opsfelder Araujo 
---
 arch/powerpc/kernel/traps.c | 43 +
 1 file changed, 39 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index e6c43ef9fb50..e55ee639d010 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -96,6 +96,41 @@ EXPORT_SYMBOL(__debugger_fault_handler);
 #define TM_DEBUG(x...) do { } while(0)
 #endif

+static const char *signames[SIGRTMIN + 1] = {
+   "UNKNOWN",
+   "SIGHUP", // 1
+   "SIGINT", // 2
+   "SIGQUIT",// 3
+   "SIGILL", // 4
+   "unhandled trap", // 5 = SIGTRAP
+   "SIGABRT",// 6 = SIGIOT
+   "bus error",  // 7 = SIGBUS
+   "floating point exception",   // 8 = SIGFPE
+   "illegal instruction",// 9 = SIGILL
+   "SIGUSR1",// 10
+   "segfault",   // 11 = SIGSEGV
+   "SIGUSR2",// 12
+   "SIGPIPE",// 13
+   "SIGALRM",// 14
+   "SIGTERM",// 15
+   "SIGSTKFLT",  // 16
+   "SIGCHLD",// 17
+   "SIGCONT",// 18
+   "SIGSTOP",// 19
+   "SIGTSTP",// 20
+   "SIGTTIN",// 21
+   "SIGTTOU",// 22
+   "SIGURG", // 23
+   "SIGXCPU",// 24
+   "SIGXFSZ",// 25
+   "SIGVTALRM",  // 26
+   "SIGPROF",// 27
+   "SIGWINCH",   // 28
+   "SIGIO",  // 29 = SIGPOLL = SIGLOST
+   "SIGPWR", // 30
+   "SIGSYS", // 31 = SIGUNUSED
+};
+
 /*
  * Trap & Exception support
  */
@@ -314,10 +349,10 @@ static void show_signal_msg(int signr, struct pt_regs *regs, int code,

if (!unhandled_signal(current, signr))
return;

-   pr_info("%s[%d]: unhandled signal %d at "REG_FMT \
-   " nip "REG_FMT" lr "REG_FMT" code %x",
-   current->comm, current->pid, signr, addr,
-   regs->nip, regs->link, code);
+   pr_info("%s[%d]: %s (%d) at "REG_FMT" nip "REG_FMT \
+   " lr "REG_FMT" code %x",
+   current->comm, current->pid, signames[signr],
+   signr, addr, regs->nip, regs->link, code);


Are we sure that signr is always within the limits of the table ?

Christophe
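One possible way the question could be addressed — a hypothetical bounds-checked lookup wrapper (a sketch only, not part of the patch; the table is truncated here for brevity), which returns "UNKNOWN" for anything outside the table, including real-time signals:

```c
#include <assert.h>
#include <string.h>

/* Truncated copy of the patch's table, for illustration. */
static const char *signames[] = {
	"UNKNOWN", "SIGHUP", "SIGINT", "SIGQUIT", "SIGILL",
	"unhandled trap", "SIGABRT", "bus error",
	"floating point exception", "illegal instruction",
	"SIGUSR1", "segfault",
};

/* Defensive lookup: any signr outside [0, table size) falls back to
 * "UNKNOWN" instead of reading past the end of the array. */
static const char *signame(int signr)
{
	if (signr < 0 ||
	    signr >= (int)(sizeof(signames) / sizeof(signames[0])))
		return "UNKNOWN";
	return signames[signr];
}
```

With a guard like this, the answer to the bounds question stops depending on every caller of show_signal_msg().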


print_vma_addr(KERN_CONT " in ", regs->nip);

--
2.17.1





Re: [PATCH 2/7] powerpc/traps: Return early in show_signal_msg()

2018-07-25 Thread LEROY Christophe

Murilo Opsfelder Araujo wrote:


Modify logic of show_signal_msg() to return early, if possible.  Replace
printk_ratelimited() by printk() and a default rate limit burst to limit
displaying unhandled signals messages.


Can you explain the benefits of this change in more detail?

Christophe
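Roughly, the behaviour the patch inherits from DEFINE_RATELIMIT_STATE() is "at most `burst` messages per `interval`" (the kernel defaults being 10 messages per 5*HZ). A userspace model with an injected clock, so the behaviour is deterministic — a sketch, not the kernel's lib/ratelimit.c:

```c
#include <assert.h>

struct ratelimit_model {
	long interval;	/* window length, in fake ticks */
	int burst;	/* messages allowed per window */
	long begin;	/* start of the current window */
	int printed;	/* messages emitted in the current window */
};

static int ratelimit_ok(struct ratelimit_model *rs, long now)
{
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;	/* new window: reset the budget */
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return 1;	/* message may be printed */
	}
	return 0;		/* suppressed */
}

/* 10 messages, one per tick, through a burst-3 / interval-5 limiter:
 * 3 pass in the first window, 3 in the second, the rest are dropped. */
static int count_allowed(void)
{
	struct ratelimit_model rs = { .interval = 5, .burst = 3 };
	int allowed = 0;

	for (long t = 0; t < 10; t++)
		allowed += ratelimit_ok(&rs, t);
	return allowed;
}
```

The practical benefit over plain printk_ratelimited() is a single shared budget for all unhandled-signal messages rather than one per call site.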



Signed-off-by: Murilo Opsfelder Araujo 
---
 arch/powerpc/kernel/traps.c | 21 +++--
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index cbd3dc365193..4faab4705774 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -301,6 +301,13 @@ void user_single_step_siginfo(struct task_struct *tsk,
info->si_addr = (void __user *)regs->nip;
 }

+static bool show_unhandled_signals_ratelimited(void)
+{
+   static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
+ DEFAULT_RATELIMIT_BURST);
	return show_unhandled_signals && __ratelimit(&rs);
+}
+
 static void show_signal_msg(int signr, struct pt_regs *regs, int code,
unsigned long addr)
 {
@@ -309,11 +316,12 @@ static void show_signal_msg(int signr, struct pt_regs *regs, int code,

const char fmt64[] = KERN_INFO "%s[%d]: unhandled signal %d " \
"at %016lx nip %016lx lr %016lx code %x\n";

-   if (show_unhandled_signals && unhandled_signal(current, signr)) {
-   printk_ratelimited(regs->msr & MSR_64BIT ? fmt64 : fmt32,
-  current->comm, current->pid, signr,
-  addr, regs->nip, regs->link, code);
-   }
+   if (!unhandled_signal(current, signr))
+   return;
+
+   printk(regs->msr & MSR_64BIT ? fmt64 : fmt32,
+  current->comm, current->pid, signr,
+  addr, regs->nip, regs->link, code);
 }

 void _exception_pkey(int signr, struct pt_regs *regs, int code,
@@ -326,7 +334,8 @@ void _exception_pkey(int signr, struct pt_regs *regs, int code,

return;
}

-   show_signal_msg(signr, regs, code, addr);
+   if (show_unhandled_signals_ratelimited())
+   show_signal_msg(signr, regs, code, addr);

if (arch_irqs_disabled() && !arch_irq_disabled_regs(regs))
local_irq_enable();
--
2.17.1





Re: [PATCH v4 4/6] powerpc/fsl: Enable cpu vulnerabilities reporting for NXP PPC BOOK3E

2018-07-18 Thread LEROY Christophe

Diana Madalina Craciun wrote:


On 7/17/2018 7:47 PM, LEROY Christophe wrote:

Diana Craciun wrote:


The NXP PPC Book3E platforms are not vulnerable to meltdown and
Spectre v4, so make them PPC_BOOK3S_64 specific.

Signed-off-by: Diana Craciun 
---
History:

v2-->v3
- used the existing functions for spectre v1/v2

 arch/powerpc/Kconfig   | 7 ++-
 arch/powerpc/kernel/security.c | 2 ++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 9f2b75f..116c953 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -165,7 +165,7 @@ config PPC
select GENERIC_CLOCKEVENTS_BROADCASTif SMP
select GENERIC_CMOS_UPDATE
select GENERIC_CPU_AUTOPROBE
-   select GENERIC_CPU_VULNERABILITIES  if PPC_BOOK3S_64
+   select GENERIC_CPU_VULNERABILITIES  if PPC_NOSPEC

I don't understand.  You say this patch is to make something specific to
book3s64, and you are creating a new config param that makes things less
specific

Christophe


In order to enable the vulnerabilities reporting on NXP socs I need to
enable them for PPC_FSL_BOOK3E. So they will be enabled for both
PPC_FSL_BOOK3E and PPC_BOOK3S_64. This is the reason for adding the
Kconfig. However this will enable: spectre v1/v2 and meltdown. NXP socs
are not vulnerable to meltdown, so I made the meltdown reporting
PPC_BOOK3S_64 specific. I guess I can have the PPC_NOSPEC definition in
a separate patch to be more clear.


Yes you can. Or keep it as a single patch and add the details you gave me in the patch description.


Christophe



Diana




select GENERIC_IRQ_SHOW
select GENERIC_IRQ_SHOW_LEVEL
select GENERIC_SMP_IDLE_THREAD
@@ -240,6 +240,11 @@ config PPC
# Please keep this list sorted alphabetically.
#

+config PPC_NOSPEC
+	bool
+	default y
+	depends on PPC_BOOK3S_64 || PPC_FSL_BOOK3E
+
 config GENERIC_CSUM
def_bool n

diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
index 3a4e5c3..539c744 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -92,6 +92,7 @@ static __init int barrier_nospec_debugfs_init(void)
 device_initcall(barrier_nospec_debugfs_init);
 #endif /* CONFIG_DEBUG_FS */

+#ifdef CONFIG_PPC_BOOK3S_64
ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, char *buf)
 {
bool thread_priv;
@@ -124,6 +125,7 @@ ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, cha

return sprintf(buf, "Vulnerable\n");
 }
+#endif

ssize_t cpu_show_spectre_v1(struct device *dev, struct device_attribute *attr, char *buf)
 {
--
2.5.5








Re: [PATCH v4 4/6] powerpc/fsl: Enable cpu vulnerabilities reporting for NXP PPC BOOK3E

2018-07-17 Thread LEROY Christophe

Diana Craciun wrote:


The NXP PPC Book3E platforms are not vulnerable to meltdown and
Spectre v4, so make them PPC_BOOK3S_64 specific.

Signed-off-by: Diana Craciun 
---
History:

v2-->v3
- used the existing functions for spectre v1/v2

 arch/powerpc/Kconfig   | 7 ++-
 arch/powerpc/kernel/security.c | 2 ++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 9f2b75f..116c953 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -165,7 +165,7 @@ config PPC
select GENERIC_CLOCKEVENTS_BROADCASTif SMP
select GENERIC_CMOS_UPDATE
select GENERIC_CPU_AUTOPROBE
-   select GENERIC_CPU_VULNERABILITIES  if PPC_BOOK3S_64
+   select GENERIC_CPU_VULNERABILITIES  if PPC_NOSPEC


I don't understand.  You say this patch is to make something specific to
book3s64, and you are creating a new config param that makes things less
specific


Christophe


select GENERIC_IRQ_SHOW
select GENERIC_IRQ_SHOW_LEVEL
select GENERIC_SMP_IDLE_THREAD
@@ -240,6 +240,11 @@ config PPC
# Please keep this list sorted alphabetically.
#

+config PPC_NOSPEC
+	bool
+	default y
+	depends on PPC_BOOK3S_64 || PPC_FSL_BOOK3E
+
 config GENERIC_CSUM
def_bool n

diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
index 3a4e5c3..539c744 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -92,6 +92,7 @@ static __init int barrier_nospec_debugfs_init(void)
 device_initcall(barrier_nospec_debugfs_init);
 #endif /* CONFIG_DEBUG_FS */

+#ifdef CONFIG_PPC_BOOK3S_64
ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, char *buf)

 {
bool thread_priv;
@@ -124,6 +125,7 @@ ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, cha


return sprintf(buf, "Vulnerable\n");
 }
+#endif

ssize_t cpu_show_spectre_v1(struct device *dev, struct device_attribute *attr, char *buf)

 {
--
2.5.5





Re: [PATCH v4 0/6] powerpc/fsl: Speculation barrier for NXP PowerPC Book3E

2018-07-17 Thread LEROY Christophe

Diana Craciun wrote:


Implement barrier_nospec for NXP PowerPC Book3E processors.

Diana Craciun (6):
  Disable the speculation barrier from the command line
  Document nospectre_v1 kernel parameter.
  Make stf barrier PPC_BOOK3S_64 specific.
  Enable cpu vulnerabilities reporting for NXP PPC BOOK3E
  Add barrier_nospec implementation for NXP PowerPC Book3E
  powerpc/fsl: Sanitize the syscall table for NXP PowerPC 32 bit
platforms


This list doesn't correspond to the names of the following 6 patches.

Christophe



 Documentation/admin-guide/kernel-parameters.txt |  4 +++
 arch/powerpc/Kconfig|  7 -
 arch/powerpc/include/asm/barrier.h  | 12 ++---
 arch/powerpc/include/asm/setup.h|  6 -
 arch/powerpc/kernel/Makefile|  3 ++-
 arch/powerpc/kernel/entry_32.S  | 10 +++
 arch/powerpc/kernel/module.c|  4 ++-
 arch/powerpc/kernel/security.c  | 17 +++-
 arch/powerpc/kernel/setup-common.c  |  2 ++
 arch/powerpc/kernel/vmlinux.lds.S   |  4 ++-
  arch/powerpc/lib/feature-fixups.c   | 35 -
 arch/powerpc/platforms/powernv/setup.c  |  1 -
 arch/powerpc/platforms/pseries/setup.c  |  1 -
 13 files changed, 94 insertions(+), 12 deletions(-)

--
History:
v3 --> v4
- fixed compilation issues

v2 --> v3
- addressed review comments
- included the 32bit sanitization in the same patch series

v1 --> v2
- added implementation for cpu_show_spectre_x functions
- the mitigation is no longer enabled through device tree options

2.5.5





Re: OOM killer invoked while still one forth of mem is available

2018-04-26 Thread LEROY Christophe

Michal Hocko wrote:


On Thu 26-04-18 15:28:46, Christophe LEROY wrote:



Le 26/04/2018 à 15:11, Michal Hocko a écrit :
> On Thu 26-04-18 08:10:30, Christophe LEROY wrote:
> >
> >
> > Le 25/04/2018 à 21:57, David Rientjes a écrit :
> > > On Tue, 24 Apr 2018, christophe leroy wrote:
> > >
> > > > Hi
> > > >
> > > > Although there is still about one fourth of memory available (7976kB
> > > > among 32MB), oom-killer is invoked and makes a victim.
> > > >
> > > > What could be the reason and how could it be solved ?
> > > >
> > > > [   54.400754] S99watchdogd-ap invoked oom-killer:
> > > > gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), nodemask=0,
> > > > order=1, oom_score_adj=0
> > > > [   54.400815] CPU: 0 PID: 777 Comm: S99watchdogd-ap Not tainted
> > > > 4.9.85-local-knld-998 #5
> > > > [   54.400830] Call Trace:
> > > > [   54.400910] [c1ca5d10] [c0327d28] dump_header.isra.4+0x54/0x17c
> > > > (unreliable)
> > > > [   54.400998] [c1ca5d50] [c0079d88] oom_kill_process+0xc4/0x414
> > > > [   54.401067] [c1ca5d90] [c007a5c8] out_of_memory+0x35c/0x37c
> > > > [   54.401220] [c1ca5dc0] [c007d68c] __alloc_pages_nodemask+0x8ec/0x9a8
> > > > [   54.401318] [c1ca5e70] [c00169d4] copy_process.isra.9.part.10+0xdc/0x10d0
> > > > [   54.401398] [c1ca5f00] [c0017b30] _do_fork+0xcc/0x2a8
> > > > [   54.401473] [c1ca5f40] [c000a660] ret_from_syscall+0x0/0x38
> > >
> > > Looks like this is because the allocation is order-1, likely the
> > > allocation of a struct task_struct for a new process on fork.
> >
> > I'm not sure I understand what you mean. The allocation is order 1, yes,
> > does it explains why OOM killer is invoked ?
>
> Well, not really
> [   54.437414] DMA: 460*4kB (UH) 201*8kB (UH) 121*16kB (UH) 43*32kB (UH) 10*64kB (U) 4*128kB (UH) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB = 7912kB

>
> You should have enough order-1+ pages to proceed.
>

So, order is 1 so order - 1 is 0,


Not sure what you mean by order - 1, maybe I've confused you. order-1
means that the order is 1. So free is not all that important. What you
should look at though is how many order 1+ free blocks are available.


Oh, OK, I thought you were saying order minus one.
So we need an order-1 allocation, i.e. an 8kB block.




What's wrong then? Do the (UH) and (U) mean anything special?


Yes, show_migration_types. But I do not see why unmovable pageblocks
should block the allocation. This is a GFP_KERNEL allocation request
essentially - thus unmovable itself. This smells like a bug. We are way
above reserves which could block the allocation.


Any suggestion on how to investigate that bug ? Anything to trace ?

Christophe
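As a cross-check of the buddy dump quoted earlier in the thread, the per-order counts can be tallied: the total matches the "= 7912kB" figure, and hundreds of order-1+ blocks were free when the OOM killer fired, which is why the invocation looks like a bug. A quick sketch:

```c
#include <assert.h>

/* Counts per order from the dump in this thread:
 * DMA: 460*4kB 201*8kB 121*16kB 43*32kB 10*64kB 4*128kB 0*256kB ...
 * An order-1 request needs one free block of order >= 1 (>= 8kB). */
static const int free_counts[] = { 460, 201, 121, 43, 10, 4,
				   0, 0, 0, 0, 0, 0 };

static int buddy_total_kb(void)
{
	int total = 0;

	for (int order = 0; order < 12; order++)
		total += free_counts[order] * (4 << order);	/* 4kB << order */
	return total;
}

static int buddy_blocks_at_least(int min_order)
{
	int blocks = 0;

	for (int order = min_order; order < 12; order++)
		blocks += free_counts[order];
	return blocks;
}
```

So 379 blocks of 8kB or larger were free; the order-1 GFP_KERNEL fork allocation should have succeeded.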



Otherwise, just above it says 'free:1994', so with 1994 pages free I should have enough to proceed, shouldn't I?


Not for high order pages as per above...

--
Michal Hocko
SUSE Labs





Re: [PATCH v4 03/19] powerpc: Mark variable `l` as unused, remove `path`

2018-04-06 Thread LEROY Christophe

Mathieu Malaterre wrote:


Add gcc attribute unused for `l` variable, replace `path` variable directly
with prom_scratch. Fix warnings treated as errors with W=1:

  arch/powerpc/kernel/prom_init.c:607:6: error: variable ‘l’ set but not used [-Werror=unused-but-set-variable]
  arch/powerpc/kernel/prom_init.c:1388:8: error: variable ‘path’ set but not used [-Werror=unused-but-set-variable]


Suggested-by: Michael Ellerman 
Signed-off-by: Mathieu Malaterre 
---
v4: redo v3 since path variable can be avoided
v3: really move path within ifdef DEBUG_PROM
v2: move path within ifdef DEBUG_PROM

 arch/powerpc/kernel/prom_init.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index f8a9a50ff9b5..4b223a9470be 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -604,7 +604,7 @@ static void __init early_cmdline_parse(void)
const char *opt;

char *p;
-   int l = 0;
+   int l __maybe_unused = 0;


Instead of hiding the problem with __maybe_unused, I think we could replace the

#ifdef CONFIG_CMDLINE
by a
if (IS_ENABLED(CONFIG_CMDLINE_BOOL))

This is recommended by the Linux coding style.

Christophe
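For reference, a simplified userspace re-creation of the preprocessor trick behind the kernel's IS_ENABLED() (the real one, in include/linux/kconfig.h, also handles the CONFIG_FOO_MODULE variant). Unlike an #ifdef, the disabled branch is still parsed and type-checked, then discarded as dead code. The CONFIG_DEMO_* names are made up for the demo:

```c
#include <assert.h>

/* If `option` is defined to 1, __ARG_PLACEHOLDER_1 expands to "0,"
 * and the second argument seen by __take_second_arg is 1; otherwise
 * the pasted token is not a macro and the second argument is 0. */
#define __ARG_PLACEHOLDER_1 0,
#define __take_second_arg(__ignored, val, ...) val
#define ____is_defined(arg1_or_junk) __take_second_arg(arg1_or_junk 1, 0)
#define ___is_defined(val) ____is_defined(__ARG_PLACEHOLDER_##val)
#define IS_ENABLED(option) ___is_defined(option)

#define CONFIG_DEMO_ON 1
/* CONFIG_DEMO_OFF is deliberately left undefined */
```

Because IS_ENABLED() yields a compile-time constant 0 or 1, `if (IS_ENABLED(CONFIG_CMDLINE))` compiles away just like the #ifdef, while keeping the guarded code visible to the compiler.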



prom_cmd_line[0] = 0;
p = prom_cmd_line;
@@ -1386,7 +1386,7 @@ static void __init reserve_mem(u64 base, u64 size)
 static void __init prom_init_mem(void)
 {
phandle node;
-   char *path, type[64];
+   char type[64];
unsigned int plen;
cell_t *p, *endp;
__be32 val;
@@ -1407,7 +1407,6 @@ static void __init prom_init_mem(void)
prom_debug("root_size_cells: %x\n", rsc);

prom_debug("scanning memory:\n");
-   path = prom_scratch;

for (node = 0; prom_next_node(); ) {
type[0] = 0;
@@ -1432,9 +1431,9 @@ static void __init prom_init_mem(void)
endp = p + (plen / sizeof(cell_t));

 #ifdef DEBUG_PROM
-   memset(path, 0, PROM_SCRATCH_SIZE);
-   call_prom("package-to-path", 3, 1, node, path, PROM_SCRATCH_SIZE-1);
-   prom_debug("  node %s :\n", path);
+   memset(prom_scratch, 0, PROM_SCRATCH_SIZE);
+		call_prom("package-to-path", 3, 1, node, prom_scratch, PROM_SCRATCH_SIZE - 1);
+   prom_debug("  node %s :\n", prom_scratch);
 #endif /* DEBUG_PROM */

while ((endp - p) >= (rac + rsc)) {
--
2.11.0





Re: [PATCH v2 03/19] powerpc: Mark variables as unused

2018-04-05 Thread LEROY Christophe

Michael Ellerman <m...@ellerman.id.au> wrote:


LEROY Christophe <christophe.le...@c-s.fr> writes:


Mathieu Malaterre <ma...@debian.org> wrote:


Add gcc attribute unused for two variables. Fix warnings treated as errors
with W=1:

  arch/powerpc/kernel/prom_init.c:1388:8: error: variable ‘path’ set but not used [-Werror=unused-but-set-variable]

Suggested-by: Christophe Leroy <christophe.le...@c-s.fr>
Signed-off-by: Mathieu Malaterre <ma...@debian.org>
---
v2: move path within ifdef DEBUG_PROM

 arch/powerpc/kernel/prom_init.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index acf4b2e0530c..4163b11abb6c 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -603,7 +603,7 @@ static void __init early_cmdline_parse(void)
const char *opt;

char *p;
-   int l = 0;
+   int l __maybe_unused = 0;

prom_cmd_line[0] = 0;
p = prom_cmd_line;
@@ -1385,7 +1385,7 @@ static void __init reserve_mem(u64 base, u64 size)
 static void __init prom_init_mem(void)
 {
phandle node;
-   char *path, type[64];
+   char *path __maybe_unused, type[64];


You should enclose that in an ifdef DEBUG_PROM instead of hiding the warning


I disagree, the result is horrible:

 static void __init prom_init_mem(void)
 {
phandle node;
-   char *path, type[64];
+#ifdef DEBUG_PROM
+   char *path;
+#endif
+   char type[64];
unsigned int plen;
cell_t *p, *endp;
__be32 val;


The right fix is to move the debug logic into a helper, and put the path
in there, eg. something like (not tested):

diff --git a/arch/powerpc/kernel/prom_init.c  
b/arch/powerpc/kernel/prom_init.c

index f9d6befb55a6..b02fa2ccc70b 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -1389,6 +1389,18 @@ static void __init reserve_mem(u64 base, u64 size)
mem_reserve_cnt = cnt + 1;
 }

+#ifdef DEBUG_PROM
+static void prom_debug_path(phandle node)
+{
+   char *path;
+   path = prom_scratch;
+   memset(path, 0, PROM_SCRATCH_SIZE);
+   call_prom("package-to-path", 3, 1, node, path, PROM_SCRATCH_SIZE-1);
+   prom_debug("  node %s :\n", path);
+}
+#else
+static void prom_debug_path(phandle node) { }


Or put the ifdef inside the function to avoid double definition ?


+#endif /* DEBUG_PROM */
 /*
  * Initialize memory allocation mechanism, parse "memory" nodes and
  * obtain that way the top of memory and RMO to setup out local allocator
@@ -1441,11 +1453,7 @@ static void __init prom_init_mem(void)
p = regbuf;
endp = p + (plen / sizeof(cell_t));

-#ifdef DEBUG_PROM
-   memset(path, 0, PROM_SCRATCH_SIZE);
-   call_prom("package-to-path", 3, 1, node, path, 
PROM_SCRATCH_SIZE-1);
-   prom_debug("  node %s :\n", path);
-#endif /* DEBUG_PROM */
+   prom_debug_path(node);

while ((endp - p) >= (rac + rsc)) {
unsigned long base, size;


Although that also begs the question of why the hell do we need path at
all, and not just use prom_scratch directly?


Wondering the same, why not use prom_scratch directly

Christophe



cheers





Re: [PATCH] powerpc/mm/hash: Move the slb_addr_limit check within PPC_MM_SLICES

2018-03-30 Thread LEROY Christophe

Michael Ellerman wrote:


"Aneesh Kumar K.V"  writes:

Should not have any impact, because we always select PPC_MM_SLICES these days.

Nevertheless it is good to indicate that slb_addr_limit is available only
with slice code.


That file can only be built if PPC_MM_SLICES=y.

So let's just remove the ifdef entirely.

These days PPC_MM_SLICES == PPC_BOOK3S_64, so we should remove
PPC_MM_SLICES #defines wherever possible and replace them with
PPC_BOOK3S_64 otherwise IMO.


PPC8xx also selects PPC_MM_SLICES when hugepages is selected.

Christophe



cheers


diff --git a/arch/powerpc/mm/slb_low.S b/arch/powerpc/mm/slb_low.S
index c66cb06e73a1..337ef162851d 100644
--- a/arch/powerpc/mm/slb_low.S
+++ b/arch/powerpc/mm/slb_low.S
@@ -166,6 +166,8 @@ END_MMU_FTR_SECTION_IFCLR(MMU_FTR_1T_SEGMENT)
 */
cmpdi   r9, 0
bne-8f
+
+#ifdef CONFIG_PPC_MM_SLICES
 /*
  * user space make sure we are within the allowed limit
 */
@@ -183,7 +185,6 @@ END_MMU_FTR_SECTION_IFCLR(MMU_FTR_1T_SEGMENT)
 * really do dynamic patching unfortunately as processes might flip
 * between 4k and 64k standard page size
 */
-#ifdef CONFIG_PPC_MM_SLICES
/* r10 have esid */
cmpldi  r10,16
/* below SLICE_LOW_TOP */
--
2.14.3





Re: [PATCH v2 04/19] powerpc/kvm: Prefer fault_in_pages_readable function

2018-03-29 Thread LEROY Christophe

Mathieu Malaterre wrote:


Directly use fault_in_pages_readable instead of manual __get_user code. Fix
warning treated as error with W=1:

  arch/powerpc/kernel/kvm.c:675:6: error: variable ‘tmp’ set but not  
used [-Werror=unused-but-set-variable]


Suggested-by: Christophe Leroy 
Signed-off-by: Mathieu Malaterre 


Reviewed-by: Christophe Leroy 


---
v2: use fault_in_pages_readable instead
 arch/powerpc/kernel/kvm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index 9ad37f827a97..124c51030b75 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -672,14 +673,13 @@ static void kvm_use_magic_page(void)
 {
u32 *p;
u32 *start, *end;
-   u32 tmp;
u32 features;

/* Tell the host to map the magic page to -4096 on all CPUs */
on_each_cpu(kvm_map_magic_page, , 1);

/* Quick self-test to see if the mapping works */
-   if (__get_user(tmp, (u32*)KVM_MAGIC_PAGE)) {
+   if (!fault_in_pages_readable((const char *)KVM_MAGIC_PAGE, 
sizeof(u32))) {
kvm_patching_worked = false;
return;
}
--
2.11.0





Re: [PATCH v2 05/19] powerpc/chrp/setup: Add attribute unused and make some functions static

2018-03-29 Thread LEROY Christophe

The subject of the patch should be updated as well

Christophe


Mathieu Malaterre wrote:


Remove variable declaration idu_size and associated code since not used.

These functions can all be static, make it so. Fix warnings treated as
errors with W=1:

  arch/powerpc/platforms/chrp/setup.c:97:6: error: no previous  
prototype for ‘chrp_show_cpuinfo’ [-Werror=missing-prototypes]
  arch/powerpc/platforms/chrp/setup.c:302:13: error: no previous  
prototype for ‘chrp_setup_arch’ [-Werror=missing-prototypes]
  arch/powerpc/platforms/chrp/setup.c:385:16: error: variable  
‘idu_size’ set but not used [-Werror=unused-but-set-variable]
  arch/powerpc/platforms/chrp/setup.c:526:13: error: no previous  
prototype for ‘chrp_init_IRQ’ [-Werror=missing-prototypes]
  arch/powerpc/platforms/chrp/setup.c:559:1: error: no previous  
prototype for ‘chrp_init2’ [-Werror=missing-prototypes]


Suggested-by: Christophe Leroy 
Signed-off-by: Mathieu Malaterre 
---
v2: Remove totally variable idu_size
 arch/powerpc/platforms/chrp/setup.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/chrp/setup.c  
b/arch/powerpc/platforms/chrp/setup.c

index 481ed133e04b..d6d8ffc0271e 100644
--- a/arch/powerpc/platforms/chrp/setup.c
+++ b/arch/powerpc/platforms/chrp/setup.c
@@ -94,7 +94,7 @@ static const char *chrp_names[] = {
"Total Impact Briq"
 };

-void chrp_show_cpuinfo(struct seq_file *m)
+static void chrp_show_cpuinfo(struct seq_file *m)
 {
int i, sdramen;
unsigned int t;
@@ -299,7 +299,7 @@ static __init void chrp_init(void)
of_node_put(node);
 }

-void __init chrp_setup_arch(void)
+static void __init chrp_setup_arch(void)
 {
struct device_node *root = of_find_node_by_path("/");
const char *machine = NULL;
@@ -382,7 +382,7 @@ static void __init chrp_find_openpic(void)
 {
struct device_node *np, *root;
int len, i, j;
-   int isu_size, idu_size;
+   int isu_size;
const unsigned int *iranges, *opprop = NULL;
int oplen = 0;
unsigned long opaddr;
@@ -427,11 +427,9 @@ static void __init chrp_find_openpic(void)
}

isu_size = 0;
-   idu_size = 0;
if (len > 0 && iranges[1] != 0) {
printk(KERN_INFO "OpenPIC irqs %d..%d in IDU\n",
   iranges[0], iranges[0] + iranges[1] - 1);
-   idu_size = iranges[1];
}
if (len > 1)
isu_size = iranges[3];
@@ -523,7 +521,7 @@ static void __init chrp_find_8259(void)
}
 }

-void __init chrp_init_IRQ(void)
+static void __init chrp_init_IRQ(void)
 {
 #if defined(CONFIG_VT) && defined(CONFIG_INPUT_ADBHID) &&  
defined(CONFIG_XMON)

struct device_node *kbd;
@@ -555,7 +553,7 @@ void __init chrp_init_IRQ(void)
 #endif
 }

-void __init
+static void __init
 chrp_init2(void)
 {
 #ifdef CONFIG_NVRAM
--
2.11.0





Re: [PATCH v2 03/19] powerpc: Mark variables as unused

2018-03-29 Thread LEROY Christophe

Mathieu Malaterre wrote:


Add gcc attribute unused for two variables. Fix warnings treated as errors
with W=1:

  arch/powerpc/kernel/prom_init.c:1388:8: error: variable ‘path’ set  
but not used [-Werror=unused-but-set-variable]


Suggested-by: Christophe Leroy 
Signed-off-by: Mathieu Malaterre 
---
v2: move path within ifdef DEBUG_PROM

 arch/powerpc/kernel/prom_init.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/prom_init.c  
b/arch/powerpc/kernel/prom_init.c

index acf4b2e0530c..4163b11abb6c 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -603,7 +603,7 @@ static void __init early_cmdline_parse(void)
const char *opt;

char *p;
-   int l = 0;
+   int l __maybe_unused = 0;

prom_cmd_line[0] = 0;
p = prom_cmd_line;
@@ -1385,7 +1385,7 @@ static void __init reserve_mem(u64 base, u64 size)
 static void __init prom_init_mem(void)
 {
phandle node;
-   char *path, type[64];
+   char *path __maybe_unused, type[64];


You should enclose that in an ifdef DEBUG_PROM instead of hiding the warning

Christophe


unsigned int plen;
cell_t *p, *endp;
__be32 val;
@@ -1406,7 +1406,6 @@ static void __init prom_init_mem(void)
prom_debug("root_size_cells: %x\n", rsc);

prom_debug("scanning memory:\n");
-   path = prom_scratch;

for (node = 0; prom_next_node(); ) {
type[0] = 0;
@@ -1431,6 +1430,7 @@ static void __init prom_init_mem(void)
endp = p + (plen / sizeof(cell_t));

 #ifdef DEBUG_PROM
+   path = prom_scratch;
memset(path, 0, PROM_SCRATCH_SIZE);
call_prom("package-to-path", 3, 1, node, path, 
PROM_SCRATCH_SIZE-1);
prom_debug("  node %s :\n", path);
--
2.11.0





Re: [PATCH v2 02/19] powerpc/powermac: Mark variable x as unused

2018-03-29 Thread LEROY Christophe

Mathieu Malaterre wrote:


Since the value of x is never intended to be read, remove it. Fix warning
treated as error with W=1:

  arch/powerpc/platforms/powermac/udbg_scc.c:76:9: error: variable  
‘x’ set but not used [-Werror=unused-but-set-variable]


Suggested-by: Christophe Leroy 
Signed-off-by: Mathieu Malaterre 


Reviewed-by: Christophe Leroy 


---
v2: remove x completely

 arch/powerpc/platforms/powermac/udbg_scc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powermac/udbg_scc.c  
b/arch/powerpc/platforms/powermac/udbg_scc.c

index d83135a9830e..8901973ed683 100644
--- a/arch/powerpc/platforms/powermac/udbg_scc.c
+++ b/arch/powerpc/platforms/powermac/udbg_scc.c
@@ -73,7 +73,7 @@ void udbg_scc_init(int force_scc)
struct device_node *stdout = NULL, *escc = NULL, *macio = NULL;
struct device_node *ch, *ch_def = NULL, *ch_a = NULL;
const char *path;
-   int i, x;
+   int i;

escc = of_find_node_by_name(NULL, "escc");
if (escc == NULL)
@@ -120,7 +120,7 @@ void udbg_scc_init(int force_scc)
mb();

for (i = 2; i != 0; --i)
-   x = in_8(sccc);
+   in_8(sccc);
out_8(sccc, 0x09);  /* reset A or B side */
out_8(sccc, 0xc0);

--
2.11.0





Re: [PATCH v2 01/19] powerpc/powermac: Mark variable x as unused

2018-03-29 Thread LEROY Christophe

Mathieu Malaterre wrote:


Since the value of x is never intended to be read, declare it with gcc
attribute as unused. Fix warning treated as error with W=1:

  arch/powerpc/platforms/powermac/bootx_init.c:471:21: error:  
variable ‘x’ set but not used [-Werror=unused-but-set-variable]


Signed-off-by: Mathieu Malaterre 
---
v2: move x variable within its local scope

 arch/powerpc/platforms/powermac/bootx_init.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powermac/bootx_init.c  
b/arch/powerpc/platforms/powermac/bootx_init.c

index c3c9bbb3573a..d44e8571c1ec 100644
--- a/arch/powerpc/platforms/powermac/bootx_init.c
+++ b/arch/powerpc/platforms/powermac/bootx_init.c
@@ -468,7 +468,7 @@ void __init bootx_init(unsigned long r3,  
unsigned long r4)

boot_infos_t *bi = (boot_infos_t *) r4;
unsigned long hdr;
unsigned long space;
-   unsigned long ptr, x;
+   unsigned long ptr;
char *model;
unsigned long offset = reloc_offset();

@@ -562,6 +562,7 @@ void __init bootx_init(unsigned long r3,  
unsigned long r4)

 * MMU switched OFF, so this should not be useful anymore.
 */
if (bi->version < 4) {
+   unsigned long x __maybe_unused;
bootx_printf("Touching pages...\n");


Stylewise, there should be an empty line after your declaration.

But I believe you should remove that ugly loop and replace it by a  
call to fault_in_pages_readable()


Christophe


/*
--
2.11.0





Re: [PATCH 15/19] powerpc: Add missing prototype

2018-03-29 Thread LEROY Christophe

Mathieu Malaterre wrote:


On Fri, Mar 23, 2018 at 1:20 PM, christophe leroy
 wrote:



On 22/03/2018 at 21:20, Mathieu Malaterre wrote:


Add one missing prototype for function rh_dump_blk. Fix warning treated as
error in W=1:

   arch/powerpc/lib/rheap.c:740:6: error: no previous prototype for
‘rh_dump_blk’ [-Werror=missing-prototypes]

Signed-off-by: Mathieu Malaterre 
---
  arch/powerpc/include/asm/rheap.h | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/include/asm/rheap.h
b/arch/powerpc/include/asm/rheap.h
index 172381769cfc..e75d96de19a0 100644
--- a/arch/powerpc/include/asm/rheap.h
+++ b/arch/powerpc/include/asm/rheap.h
@@ -83,6 +83,9 @@ extern int rh_get_stats(rh_info_t * info, int what, int
max_stats,
  /* Simple dump of remote heap info */
  extern void rh_dump(rh_info_t * info);
  +/* Simple dump of remote info block */
+extern void rh_dump_blk(rh_info_t *info, rh_block_t *blk);
+



Only used in one place, should be static


Well here is what I see over here:

$ git grep rh_dump_blk
...
arch/powerpc/lib/rheap.c:EXPORT_SYMBOL_GPL(rh_dump_blk);


If it was really used by anybody in a module, it would already be in a  
.h so I think we should simply delete the function


Christophe






Christophe


  /* Set owner of taken block */
  extern int rh_set_owner(rh_info_t * info, unsigned long start, const
char *owner);




---
L'absence de virus dans ce courrier électronique a été vérifiée par le
logiciel antivirus Avast.
https://www.avast.com/antivirus






Re: [PATCH 11/19] powerpc/powermac: Move pmac_pfunc_base_install prototype to header file

2018-03-29 Thread LEROY Christophe

Mathieu Malaterre wrote:


On Fri, Mar 23, 2018 at 1:13 PM, christophe leroy
 wrote:



On 22/03/2018 at 21:19, Mathieu Malaterre wrote:


The pmac_pfunc_base_install prototype was declared in powermac/smp.c since
function was used there, move it to pmac_pfunc.h header to be visible in
pfunc_base.c. Fix a warning treated as error with W=1:

   arch/powerpc/platforms/powermac/pfunc_base.c:330:12: error: no previous
prototype for ‘pmac_pfunc_base_install’ [-Werror=missing-prototypes]

Signed-off-by: Mathieu Malaterre 
---
  arch/powerpc/include/asm/pmac_pfunc.h | 1 +
  arch/powerpc/platforms/powermac/smp.c | 1 -
  2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/pmac_pfunc.h
b/arch/powerpc/include/asm/pmac_pfunc.h
index 73bd8f28f2a8..99f7a288789a 100644
--- a/arch/powerpc/include/asm/pmac_pfunc.h
+++ b/arch/powerpc/include/asm/pmac_pfunc.h
@@ -245,6 +245,7 @@ extern void pmf_put_function(struct pmf_function
*func);
extern int pmf_call_one(struct pmf_function *func, struct pmf_args
*args);
  +extern int pmac_pfunc_base_install(void);




extern keyword is not needed


I understand; but for consistency every single protoypes in this
header file actually use the extern keyword. Is there a guide/best
practice to refer to in this case ?


Consistency is not a valid reason to continue bad practice. Every
single modification is an opportunity to clean things up


You should run scripts/checkpatch.pl --strict on all your patches
before submitting.

And follow the Linux coding style as much as possible.

Christophe




Christophe


/* Suspend/resume code called by via-pmu directly for now */
  extern void pmac_pfunc_base_suspend(void);
diff --git a/arch/powerpc/platforms/powermac/smp.c
b/arch/powerpc/platforms/powermac/smp.c
index 95275e0e2efa..447da6db450a 100644
--- a/arch/powerpc/platforms/powermac/smp.c
+++ b/arch/powerpc/platforms/powermac/smp.c
@@ -65,7 +65,6 @@
  #endif
extern void __secondary_start_pmac_0(void);
-extern int pmac_pfunc_base_install(void);
static void (*pmac_tb_freeze)(int freeze);
  static u64 timebase;









Re: [PATCH] powerpc/64: Fix checksum folding in csum_add

2018-03-29 Thread LEROY Christophe

Paul Mackerras <pau...@ozlabs.org> wrote:


On Tue, Mar 27, 2018 at 05:22:32PM +0200, LEROY Christophe wrote:

Shile Zhang <shile.zh...@nokia.com> wrote:

>fix the missed point in Paul's patch:
>"powerpc/64: Fix checksum folding in csum_tcpudp_nofold and
>ip_fast_csum_nofold"
>
>Signed-off-by: Shile Zhang <shile.zh...@nokia.com>
>---
> arch/powerpc/include/asm/checksum.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/arch/powerpc/include/asm/checksum.h
>b/arch/powerpc/include/asm/checksum.h
>index 5b1a6e3..430d038 100644
>--- a/arch/powerpc/include/asm/checksum.h
>+++ b/arch/powerpc/include/asm/checksum.h
>@@ -108,7 +108,7 @@ static inline __wsum csum_add(__wsum csum,  
__wsum addend)

>
> #ifdef __powerpc64__
>res += (__force u64)addend;
>-   return (__force __wsum)((u32)res + (res >> 32));
>+   return (__force __wsum) from64to32(res);

Did you encounter a bug due to that ?
As far as I understand, csum and addend are 32 bits so can't exceed
0xffffffff.

Then their sum won't exceed 0x1fffffffe. So the sum of upper and lower part
won't carry


If the sum of the two halves was 0x1fffffffe, then that previously got
truncated to 32 bits and returned as 0xfffffffe, which is wrong - the
result should be 0xffffffff.



It is the sum of the two arguments of csum_add() which can't exceed
0x1fffffffe.

So the sum of the two halves will be at most 0xffffffff, which is the expected result.

The issue would start with res = 0x1ffffffff: in that case the sum of
the two halves would be 0x100000000, which would have been truncated to
0 whereas we expect the result to be 1 in that case.
But in order to obtain res = 0x1ffffffff or higher, you have to sum at
least three 32-bit numbers. You can't obtain such a value with a sum
of two 32-bit numbers.


Christophe
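The arithmetic in this thread can be checked with a small sketch: for any `res` reachable from two 32-bit operands (at most 0x1fffffffe) a single fold already gives the right answer, and the difference only appears from 0x1ffffffff upward. Function names here are illustrative; `from64to32` follows the generic C shape of the kernel helper, which is equivalent to the powerpc one:

```c
#include <assert.h>
#include <stdint.h>

/* Single fold, as in the old csum_add(): a carry out of bit 31 produced
 * by adding the two halves is simply truncated away. */
static uint32_t fold_once(uint64_t res)
{
    return (uint32_t)((uint32_t)res + (res >> 32));
}

/* Double fold, the shape of the kernel's from64to32() helper. */
static uint32_t from64to32(uint64_t x)
{
    x = (x & 0xffffffffULL) + (x >> 32); /* may still carry into bit 32 */
    x = (x & 0xffffffffULL) + (x >> 32); /* second fold absorbs the carry */
    return (uint32_t)x;
}
```

With two operands the maximum is fold_once(0x1fffffffe) == 0xffffffff, matching from64to32; at 0x1ffffffff (three or more operands) fold_once returns 0 where from64to32 correctly returns 1.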





Re: [PATCH 14/19] powerpc/altivec: Add missing prototypes for altivec

2018-03-27 Thread LEROY Christophe

LEROY Christophe <christophe.le...@c-s.fr> wrote:


Mathieu Malaterre <ma...@debian.org> wrote:


Christophe,

On Sat, Mar 24, 2018 at 9:10 PM, LEROY Christophe
<christophe.le...@c-s.fr> wrote:

Mathieu Malaterre <ma...@debian.org> a écrit :



On Fri, Mar 23, 2018 at 1:19 PM, christophe leroy
<christophe.le...@c-s.fr> wrote:




On 22/03/2018 at 21:20, Mathieu Malaterre wrote:



Some functions prototypes were missing for the non-altivec code. Add the
missing prototypes directly in xor_vmx, fix warnings treated as errors
with
W=1:

  arch/powerpc/lib/xor_vmx_glue.c:18:6: error: no previous prototype
for
‘xor_altivec_2’ [-Werror=missing-prototypes]
  arch/powerpc/lib/xor_vmx_glue.c:29:6: error: no previous prototype
for
‘xor_altivec_3’ [-Werror=missing-prototypes]
  arch/powerpc/lib/xor_vmx_glue.c:40:6: error: no previous prototype
for
‘xor_altivec_4’ [-Werror=missing-prototypes]
  arch/powerpc/lib/xor_vmx_glue.c:52:6: error: no previous prototype
for
‘xor_altivec_5’ [-Werror=missing-prototypes]

Signed-off-by: Mathieu Malaterre <ma...@debian.org>
---
 arch/powerpc/lib/xor_vmx.h | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/arch/powerpc/lib/xor_vmx.h b/arch/powerpc/lib/xor_vmx.h
index 5c2b0839b179..2173e3c84151 100644
--- a/arch/powerpc/lib/xor_vmx.h
+++ b/arch/powerpc/lib/xor_vmx.h
@@ -19,3 +19,17 @@ void __xor_altivec_4(unsigned long bytes, unsigned
long
*v1_in,
 void __xor_altivec_5(unsigned long bytes, unsigned long *v1_in,
unsigned long *v2_in, unsigned long *v3_in,
unsigned long *v4_in, unsigned long
*v5_in);
+
+void xor_altivec_2(unsigned long bytes, unsigned long *v1_in,
+unsigned long *v2_in);
+




Only used in one place, should be static instead of adding it in a .h

Same for the other ones.



$ git grep xor_altivec_2
[...]
arch/powerpc/lib/xor_vmx_glue.c:EXPORT_SYMBOL(xor_altivec_2);

Are you sure I can change this function to static ?



Yes you are right. But in fact those functions are already defined in
asm/xor.h
So you just need to add the missing #include


I originally tried it, but this leads to:

 CC  arch/powerpc/lib/xor_vmx_glue.o
In file included from arch/powerpc/lib/xor_vmx_glue.c:16:0:
./arch/powerpc/include/asm/xor.h:39:15: error: variable
‘xor_block_altivec’ has initializer but incomplete type
static struct xor_block_template xor_block_altivec = {
  ^~
./arch/powerpc/include/asm/xor.h:40:2: error: unknown field ‘name’
specified in initializer
 .name = "altivec",
 ^
[...]

The file  (powerpc) is pretty much expected to be included
after .

I did not want to tweak  to test for #ifdef _XOR_H just before

#ifdef _XOR_H
static struct xor_block_template xor_block_altivec = {
[...]

since this seems like a hack to me.

Is this ok to test for #ifdef _XOR_H in  ?


What about including linux/raid/xor.h in asm/xor.h ?


Or better: including linux/raid/xor.h then asm/xor.h in xor_vmx_glue.c ?

Christophe


Christophe



Christophe





Christophe



+void xor_altivec_3(unsigned long bytes, unsigned long *v1_in,
+unsigned long *v2_in, unsigned long
*v3_in);
+
+void xor_altivec_4(unsigned long bytes, unsigned long *v1_in,
+unsigned long *v2_in, unsigned long *v3_in,
+unsigned long *v4_in);
+
+void xor_altivec_5(unsigned long bytes, unsigned long *v1_in,
+unsigned long *v2_in, unsigned long *v3_in,
+unsigned long *v4_in, unsigned long
*v5_in);



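The compile error quoted above is an include-order problem: `asm/xor.h` initializes a struct whose definition lives in `linux/raid/xor.h`. A minimal reproduction of the dependency, with the struct fields approximated rather than the exact kernel layout:

```c
#include <assert.h>
#include <string.h>

/* Stand-in for linux/raid/xor.h: the template struct (field names
 * approximate the real header). */
struct xor_block_template {
    struct xor_block_template *next;
    const char *name;
    int speed;
};

/* Stand-in for asm/xor.h: an instance of the template.  If this
 * initializer is seen before the struct definition -- i.e. the headers
 * are included in the wrong order -- the compiler reports "variable
 * has initializer but incomplete type". */
static struct xor_block_template xor_block_altivec = {
    .name = "altivec",
};

static const char *template_name(void)
{
    return xor_block_altivec.name;
}
```

Including `linux/raid/xor.h` before `asm/xor.h` (as suggested in the thread) puts the definition first and makes the initializer legal.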









Re: [PATCH 14/19] powerpc/altivec: Add missing prototypes for altivec

2018-03-27 Thread LEROY Christophe

Mathieu Malaterre <ma...@debian.org> wrote:


Christophe,

On Sat, Mar 24, 2018 at 9:10 PM, LEROY Christophe
<christophe.le...@c-s.fr> wrote:

Mathieu Malaterre <ma...@debian.org> wrote:



On Fri, Mar 23, 2018 at 1:19 PM, christophe leroy
<christophe.le...@c-s.fr> wrote:




On 22/03/2018 at 21:20, Mathieu Malaterre wrote:



Some functions prototypes were missing for the non-altivec code. Add the
missing prototypes directly in xor_vmx, fix warnings treated as errors
with
W=1:

   arch/powerpc/lib/xor_vmx_glue.c:18:6: error: no previous prototype
for
‘xor_altivec_2’ [-Werror=missing-prototypes]
   arch/powerpc/lib/xor_vmx_glue.c:29:6: error: no previous prototype
for
‘xor_altivec_3’ [-Werror=missing-prototypes]
   arch/powerpc/lib/xor_vmx_glue.c:40:6: error: no previous prototype
for
‘xor_altivec_4’ [-Werror=missing-prototypes]
   arch/powerpc/lib/xor_vmx_glue.c:52:6: error: no previous prototype
for
‘xor_altivec_5’ [-Werror=missing-prototypes]

Signed-off-by: Mathieu Malaterre <ma...@debian.org>
---
  arch/powerpc/lib/xor_vmx.h | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/arch/powerpc/lib/xor_vmx.h b/arch/powerpc/lib/xor_vmx.h
index 5c2b0839b179..2173e3c84151 100644
--- a/arch/powerpc/lib/xor_vmx.h
+++ b/arch/powerpc/lib/xor_vmx.h
@@ -19,3 +19,17 @@ void __xor_altivec_4(unsigned long bytes, unsigned
long
*v1_in,
  void __xor_altivec_5(unsigned long bytes, unsigned long *v1_in,
 unsigned long *v2_in, unsigned long *v3_in,
 unsigned long *v4_in, unsigned long
*v5_in);
+
+void xor_altivec_2(unsigned long bytes, unsigned long *v1_in,
+unsigned long *v2_in);
+




Only used in one place, should be static instead of adding it in a .h

Same for the other ones.



$ git grep xor_altivec_2
[...]
arch/powerpc/lib/xor_vmx_glue.c:EXPORT_SYMBOL(xor_altivec_2);

Are you sure I can change this function to static ?



Yes you are right. But in fact those functions are already defined in
asm/xor.h
So you just need to add the missing #include


I originally tried it, but this leads to:

  CC  arch/powerpc/lib/xor_vmx_glue.o
In file included from arch/powerpc/lib/xor_vmx_glue.c:16:0:
./arch/powerpc/include/asm/xor.h:39:15: error: variable
‘xor_block_altivec’ has initializer but incomplete type
 static struct xor_block_template xor_block_altivec = {
   ^~
./arch/powerpc/include/asm/xor.h:40:2: error: unknown field ‘name’
specified in initializer
  .name = "altivec",
  ^
[...]

The file  (powerpc) is pretty much expected to be included
after .

I did not want to tweak  to test for #ifdef _XOR_H just before

#ifdef _XOR_H
static struct xor_block_template xor_block_altivec = {
[...]

since this seems like a hack to me.

Is this ok to test for #ifdef _XOR_H in  ?


What about including linux/raid/xor.h in asm/xor.h ?

Christophe



Christophe





Christophe



+void xor_altivec_3(unsigned long bytes, unsigned long *v1_in,
+unsigned long *v2_in, unsigned long
*v3_in);
+
+void xor_altivec_4(unsigned long bytes, unsigned long *v1_in,
+unsigned long *v2_in, unsigned long *v3_in,
+unsigned long *v4_in);
+
+void xor_altivec_5(unsigned long bytes, unsigned long *v1_in,
+unsigned long *v2_in, unsigned long *v3_in,
+unsigned long *v4_in, unsigned long
*v5_in);












Re: [PATCH] powerpc/64: Fix checksum folding in csum_add

2018-03-27 Thread LEROY Christophe

Shile Zhang wrote:


fix the missed point in Paul's patch:
"powerpc/64: Fix checksum folding in csum_tcpudp_nofold and
ip_fast_csum_nofold"

Signed-off-by: Shile Zhang 
---
 arch/powerpc/include/asm/checksum.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/checksum.h  
b/arch/powerpc/include/asm/checksum.h

index 5b1a6e3..430d038 100644
--- a/arch/powerpc/include/asm/checksum.h
+++ b/arch/powerpc/include/asm/checksum.h
@@ -108,7 +108,7 @@ static inline __wsum csum_add(__wsum csum, __wsum addend)

 #ifdef __powerpc64__
res += (__force u64)addend;
-   return (__force __wsum)((u32)res + (res >> 32));
+   return (__force __wsum) from64to32(res);


Did you encounter a bug due to that ?
As far as I understand, csum and addend are 32 bits so can't exceed 0xffffffff.
Then their sum won't exceed 0x1fffffffe. So the sum of upper and lower
part won't carry


Christophe


 #else
asm("addc %0,%0,%1;"
"addze %0,%0;"
--
2.6.2





Re: [PATCH 01/16] initrd: Add generic code path for common initrd unloading logic.

2018-03-25 Thread LEROY Christophe

Shea Levy wrote:


Signed-off-by: Shea Levy 
---
 init/initramfs.c | 7 +++
 usr/Kconfig  | 4 
 2 files changed, 11 insertions(+)

diff --git a/init/initramfs.c b/init/initramfs.c
index 7e99a0038942..de5ce873eb5a 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -526,6 +526,13 @@ extern unsigned long __initramfs_size;
 #include 
 #include 

+#ifdef CONFIG_INITRAMFS_GENERIC_UNLOAD
+void free_initrd_mem(unsigned long start, unsigned long end)
+{
+   free_reserved_area((void *)start, (void *)end, -1, "initrd");
+}
+#endif


In powerpc this was an __init function. Why not also put the generic  
one in __init section ?


Christophe



+
 static void __init free_initrd(void)
 {
 #ifdef CONFIG_KEXEC_CORE
diff --git a/usr/Kconfig b/usr/Kconfig
index 43658b8a975e..fd79d4d6fa26 100644
--- a/usr/Kconfig
+++ b/usr/Kconfig
@@ -233,3 +233,7 @@ config INITRAMFS_COMPRESSION
default ".lzma" if RD_LZMA
default ".bz2"  if RD_BZIP2
default ""
+
+# Arches can select this for a generic initrd unloading codepath
+config INITRAMFS_GENERIC_UNLOAD
+   bool
--
2.16.2





Re: [PATCH 14/19] powerpc/altivec: Add missing prototypes for altivec

2018-03-24 Thread LEROY Christophe

Mathieu Malaterre wrote:


On Fri, Mar 23, 2018 at 1:19 PM, christophe leroy
 wrote:



On 22/03/2018 at 21:20, Mathieu Malaterre wrote:


Some functions prototypes were missing for the non-altivec code. Add the
missing prototypes directly in xor_vmx, fix warnings treated as errors
with
W=1:

   arch/powerpc/lib/xor_vmx_glue.c:18:6: error: no previous prototype for
‘xor_altivec_2’ [-Werror=missing-prototypes]
   arch/powerpc/lib/xor_vmx_glue.c:29:6: error: no previous prototype for
‘xor_altivec_3’ [-Werror=missing-prototypes]
   arch/powerpc/lib/xor_vmx_glue.c:40:6: error: no previous prototype for
‘xor_altivec_4’ [-Werror=missing-prototypes]
   arch/powerpc/lib/xor_vmx_glue.c:52:6: error: no previous prototype for
‘xor_altivec_5’ [-Werror=missing-prototypes]

Signed-off-by: Mathieu Malaterre 
---
  arch/powerpc/lib/xor_vmx.h | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/arch/powerpc/lib/xor_vmx.h b/arch/powerpc/lib/xor_vmx.h
index 5c2b0839b179..2173e3c84151 100644
--- a/arch/powerpc/lib/xor_vmx.h
+++ b/arch/powerpc/lib/xor_vmx.h
@@ -19,3 +19,17 @@ void __xor_altivec_4(unsigned long bytes, unsigned long
*v1_in,
  void __xor_altivec_5(unsigned long bytes, unsigned long *v1_in,
 unsigned long *v2_in, unsigned long *v3_in,
 unsigned long *v4_in, unsigned long *v5_in);
+
+void xor_altivec_2(unsigned long bytes, unsigned long *v1_in,
+unsigned long *v2_in);
+



Only used in one place, should be static instead of adding it in a .h

Same for the other ones.


$ git grep xor_altivec_2
[...]
arch/powerpc/lib/xor_vmx_glue.c:EXPORT_SYMBOL(xor_altivec_2);

Are you sure I can change this function to static ?


Yes you are right. But in fact those functions are already defined in
asm/xor.h

So you just need to add the missing #include

Christophe




Christophe



+void xor_altivec_3(unsigned long bytes, unsigned long *v1_in,
+unsigned long *v2_in, unsigned long *v3_in);
+
+void xor_altivec_4(unsigned long bytes, unsigned long *v1_in,
+unsigned long *v2_in, unsigned long *v3_in,
+unsigned long *v4_in);
+
+void xor_altivec_5(unsigned long bytes, unsigned long *v1_in,
+unsigned long *v2_in, unsigned long *v3_in,
+unsigned long *v4_in, unsigned long *v5_in);









Re: STRICT_KERNEL_RWX on PPC32 is broken on PowerMac G4

2017-11-18 Thread LEROY Christophe

Meelis Roos wrote:


> > How early does it hang ? Any oops or trace ?
>
> Very early - instead of kernel messages, I see some repeated gibberish
> of some characters, and the background turns white.
> I am booting from yaboot, background is normally black.

Ok, could you try by replacing #ifdef CONFIG_STRICT_KERNEL_RWX by #if 0
in arch/powerpc/lib/code-patching.c


With this change and CONFIG_STRICT_KERNEL_RWX=y, it still boots.

BTW, I get these warnings (sorry for the word wrap from screen paste) -
may they be related or rather not?

  WRAParch/powerpc/boot/zImage.pmac
  WRAParch/powerpc/boot/zImage.coff
  WRAParch/powerpc/boot/zImage.miboot
INFO: Uncompressed kernel (size 0x5d4c3c) overlaps the address of  
the wrapper(0x40)

INFO: Fixing the link_address of wrapper to (0x60)



Then I believe there is something wrong with commit
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20171115&id=37bc3e5fd764fb258ff4fcbb90b6d1b67fb466c1


Balbir, do you have any idea ?

Christophe



And before that:

  HOSTCC  arch/powerpc/boot/addnote
arch/powerpc/boot/addnote.c: In function ‘main’:
arch/powerpc/boot/addnote.c:75:47: warning: right shift count >= width
of type [-Wshift-count-overflow]
 #define PUT_64BE(off, v)((PUT_32BE((off), (v) >> 32L), \
   ^
arch/powerpc/boot/addnote.c:72:39: note: in definition of macro
‘PUT_16BE’
 #define PUT_16BE(off, v)(buf[off] = ((v) >> 8) & 0xff, \
   ^
arch/powerpc/boot/addnote.c:75:27: note: in expansion of macro
‘PUT_32BE’
 #define PUT_64BE(off, v)((PUT_32BE((off), (v) >> 32L), \
   ^~~~
arch/powerpc/boot/addnote.c:94:50: note: in expansion of macro
‘PUT_64BE’
 #define PUT_64(off, v)  (e_data == ELFDATA2MSB ? PUT_64BE(off, v) : \
  ^~~~
arch/powerpc/boot/addnote.c:183:3: note: in expansion of macro ‘PUT_64’
   PUT_64(ph + PH_OFFSET, ns);
   ^~
arch/powerpc/boot/addnote.c:75:47: warning: right shift count >= width
of type [-Wshift-count-overflow]
 #define PUT_64BE(off, v)((PUT_32BE((off), (v) >> 32L), \
   ^
arch/powerpc/boot/addnote.c:73:23: note: in definition of macro
‘PUT_16BE’
 buf[(off) + 1] = (v) & 0xff)
   ^
arch/powerpc/boot/addnote.c:75:27: note: in expansion of macro
‘PUT_32BE’
 #define PUT_64BE(off, v)((PUT_32BE((off), (v) >> 32L), \
   ^~~~
arch/powerpc/boot/addnote.c:94:50: note: in expansion of macro
‘PUT_64BE’
 #define PUT_64(off, v)  (e_data == ELFDATA2MSB ? PUT_64BE(off, v) : \
  ^~~~
arch/powerpc/boot/addnote.c:183:3: note: in expansion of macro ‘PUT_64’
   PUT_64(ph + PH_OFFSET, ns);
   ^~
arch/powerpc/boot/addnote.c:75:47: warning: right shift count >= width
of type [-Wshift-count-overflow]
 #define PUT_64BE(off, v)((PUT_32BE((off), (v) >> 32L), \
   ^
arch/powerpc/boot/addnote.c:72:39: note: in definition of macro
‘PUT_16BE’
 #define PUT_16BE(off, v)(buf[off] = ((v) >> 8) & 0xff, \
   ^
arch/powerpc/boot/addnote.c:75:27: note: in expansion of macro
‘PUT_32BE’
 #define PUT_64BE(off, v)((PUT_32BE((off), (v) >> 32L), \
   ^~~~
arch/powerpc/boot/addnote.c:94:50: note: in expansion of macro
‘PUT_64BE’
 #define PUT_64(off, v)  (e_data == ELFDATA2MSB ? PUT_64BE(off, v) : \
  ^~~~
arch/powerpc/boot/addnote.c:183:3: note: in expansion of macro ‘PUT_64’
   PUT_64(ph + PH_OFFSET, ns);
   ^~
arch/powerpc/boot/addnote.c:75:47: warning: right shift count >= width
of type [-Wshift-count-overflow]
 #define PUT_64BE(off, v)((PUT_32BE((off), (v) >> 32L), \
   ^
arch/powerpc/boot/addnote.c:73:23: note: in definition of macro
‘PUT_16BE’
 buf[(off) + 1] = (v) & 0xff)
   ^
arch/powerpc/boot/addnote.c:75:27: note: in expansion of macro
‘PUT_32BE’
 #define PUT_64BE(off, v)((PUT_32BE((off), (v) >> 32L), \
   ^~~~
arch/powerpc/boot/addnote.c:94:50: note: in expansion of macro
‘PUT_64BE’
 #define PUT_64(off, v)  (e_data == ELFDATA2MSB ? PUT_64BE(off, v) : \
  ^~~~
arch/powerpc/boot/addnote.c:183:3: note: in expansion of macro ‘PUT_64’
   PUT_64(ph + PH_OFFSET, ns);
   ^~
arch/powerpc/boot/addnote.c:85:73: warning: right shift count >= width
of type [-Wshift-count-overflow]
 #define PUT_64LE(off, v) (PUT_32LE((off), (v)), PUT_32LE((off) + 4, (v) >> 32L))
^
arch/powerpc/boot/addnote.c:82:39: note: in definition of macro
‘PUT_16LE’
 #define PUT_16LE(off, v) (buf[off] = (v) & 0xff, \
   ^
arch/powerpc/boot/addnote.c:85:49: note: in expansion of macro

Re: STRICT_KERNEL_RWX on PPC32 is broken on PowerMac G4

2017-11-17 Thread LEROY Christophe

Meelis Roos wrote:


Meelis Roos wrote:

> > > For me, 4.13 worked and 4.14 hangs early during boot. Bisecting led to
> > > the following commit. I had STRICT_KERNEL_RWX enabled when I met the
> > > option. When I disabled STRICT_KERNEL_RWX, the same kernel booted fine.

> >
> > Can you please check that 4.13 boots properly with the 'nobats' kernel
> > parameter?
>
> Yes, 4.13.0 with nobats works.

How early does it hang? Any oops or trace?


Very early - instead oif kernel emssages, I see some repeated gibberish
of some characteers, and the background turns white.
I am booting from yaboot, background is normally black.


Ok, could you try replacing #ifdef CONFIG_STRICT_KERNEL_RWX with #if 0
in arch/powerpc/lib/code-patching.c?





Christophe

>
> >
> > Christophe
> >
> > >
> > >
> > > 95902e6c8864d39b09134dcaa3c99d8161d1deea is the first bad commit
> > > commit 95902e6c8864d39b09134dcaa3c99d8161d1deea
> > > Author: Christophe Leroy 
> > > Date:   Wed Aug 2 15:51:05 2017 +0200
> > >
> > >powerpc/mm: Implement STRICT_KERNEL_RWX on PPC32
> > >
> > >This patch implements STRICT_KERNEL_RWX on PPC32.
> > >
> > >As for CONFIG_DEBUG_PAGEALLOC, it deactivates BAT and LTLB mappings
> > >in order to allow page protection setup at the level of each page.
> > >
> > >As BAT/LTLB mappings are deactivated, there might be a performance
> > >impact.
> > >
> > >Signed-off-by: Christophe Leroy 
> > >Signed-off-by: Michael Ellerman 
> > >
> > > :04 04 1eac3de57642856e31a914da2e1fe5368095f04b
> > > ee3634b9ae309852feebc69b8a6bd473944e212c M  arch
> > >
> > >
> > > Config:
> > >
> > > #
> > > # Automatically generated file; DO NOT EDIT.
> > > # Linux/powerpc 4.13.0-rc2 Kernel Configuration
> > > #
> > > # CONFIG_PPC64 is not set
> > >
> > > #
> > > # Processor support
> > > #
> > > CONFIG_PPC_BOOK3S_32=y
> > > # CONFIG_PPC_85xx is not set
> > > # CONFIG_PPC_8xx is not set
> > > # CONFIG_40x is not set
> > > # CONFIG_44x is not set
> > > # CONFIG_E200 is not set
> > > CONFIG_PPC_BOOK3S=y
> > > CONFIG_6xx=y
> > > CONFIG_PPC_FPU=y
> > > CONFIG_ALTIVEC=y
> > > CONFIG_PPC_STD_MMU=y
> > > CONFIG_PPC_STD_MMU_32=y
> > > # CONFIG_PPC_MM_SLICES is not set
> > > CONFIG_PPC_HAVE_PMU_SUPPORT=y
> > > # CONFIG_FORCE_SMP is not set
> > > # CONFIG_SMP is not set
> > > # CONFIG_PPC_DOORBELL is not set
> > > CONFIG_VDSO32=y
> > > CONFIG_CPU_BIG_ENDIAN=y
> > > CONFIG_PPC32=y
> > > CONFIG_32BIT=y
> > > # CONFIG_ARCH_PHYS_ADDR_T_64BIT is not set
> > > # CONFIG_ARCH_DMA_ADDR_T_64BIT is not set
> > > CONFIG_MMU=y
> > > CONFIG_ARCH_MMAP_RND_BITS_MAX=17
> > > CONFIG_ARCH_MMAP_RND_BITS_MIN=11
> > > CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=17
> > > CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=11
> > > # CONFIG_HAVE_SETUP_PER_CPU_AREA is not set
> > > # CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK is not set
> > > CONFIG_NR_IRQS=512
> > > CONFIG_STACKTRACE_SUPPORT=y
> > > CONFIG_TRACE_IRQFLAGS_SUPPORT=y
> > > CONFIG_LOCKDEP_SUPPORT=y
> > > CONFIG_RWSEM_XCHGADD_ALGORITHM=y
> > > CONFIG_GENERIC_HWEIGHT=y
> > > CONFIG_ARCH_HAS_DMA_SET_COHERENT_MASK=y
> > > CONFIG_PPC=y
> > > # CONFIG_GENERIC_CSUM is not set
> > > CONFIG_EARLY_PRINTK=y
> > > CONFIG_PANIC_TIMEOUT=180
> > > CONFIG_GENERIC_NVRAM=y
> > > CONFIG_SCHED_OMIT_FRAME_POINTER=y
> > > CONFIG_ARCH_MAY_HAVE_PC_FDC=y
> > > # CONFIG_PPC_UDBG_16550 is not set
> > > # CONFIG_GENERIC_TBSYNC is not set
> > > CONFIG_AUDIT_ARCH=y
> > > CONFIG_GENERIC_BUG=y
> > > # CONFIG_EPAPR_BOOT is not set
> > > # CONFIG_DEFAULT_UIMAGE is not set
> > > CONFIG_ARCH_HIBERNATION_POSSIBLE=y
> > > CONFIG_ARCH_SUSPEND_POSSIBLE=y
> > > # CONFIG_PPC_DCR_NATIVE is not set
> > > # CONFIG_PPC_DCR_MMIO is not set
> > > CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
> > > CONFIG_ARCH_SUPPORTS_UPROBES=y
> > > CONFIG_PGTABLE_LEVELS=2
> > > CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
> > > CONFIG_IRQ_WORK=y
> > > CONFIG_BUILDTIME_EXTABLE_SORT=y
> > >
> > > #
> > > # General setup
> > > #
> > > CONFIG_BROKEN_ON_SMP=y
> > > CONFIG_INIT_ENV_ARG_LIMIT=32
> > > CONFIG_CROSS_COMPILE=""
> > > # CONFIG_COMPILE_TEST is not set
> > > CONFIG_LOCALVERSION=""
> > > CONFIG_LOCALVERSION_AUTO=y
> > > CONFIG_HAVE_KERNEL_GZIP=y
> > > CONFIG_KERNEL_GZIP=y
> > > CONFIG_DEFAULT_HOSTNAME="pohl"
> > > CONFIG_SWAP=y
> > > CONFIG_SYSVIPC=y
> > > CONFIG_SYSVIPC_SYSCTL=y
> > > CONFIG_POSIX_MQUEUE=y
> > > CONFIG_POSIX_MQUEUE_SYSCTL=y
> > > CONFIG_CROSS_MEMORY_ATTACH=y
> > > CONFIG_FHANDLE=y
> > > # CONFIG_USELIB is not set
> > > # CONFIG_AUDIT is not set
> > > CONFIG_HAVE_ARCH_AUDITSYSCALL=y
> > >
> > > #
> > > # IRQ subsystem
> > > #
> > > CONFIG_GENERIC_IRQ_SHOW=y
> > > CONFIG_GENERIC_IRQ_SHOW_LEVEL=y
> > > CONFIG_IRQ_DOMAIN=y
> > > # CONFIG_IRQ_DOMAIN_DEBUG is not set
> > > CONFIG_IRQ_FORCED_THREADING=y
> > > CONFIG_SPARSE_IRQ=y
> > > # CONFIG_GENERIC_IRQ_DEBUGFS is not set
> > > CONFIG_GENERIC_TIME_VSYSCALL=y
> > > 

Re: STRICT_KERNEL_RWX on PPC32 is broken on PowerMac G4

2017-11-17 Thread LEROY Christophe

Meelis Roos wrote:


> For me, 4.13 worked and 4.14 hangs early during boot. Bisecting led to
> the following commit. I had STRICT_KERNEL_RWX enabled when I met the
> option. When I disabled STRICT_KERNEL_RWX, the same kernel booted fine.

Can you please check that 4.13 boots properly with the 'nobats' kernel parameter?


Yes, 4.13.0 with nobats works.


How early does it hang? Any oops or trace?

Christophe





Christophe

>
>
> 95902e6c8864d39b09134dcaa3c99d8161d1deea is the first bad commit
> commit 95902e6c8864d39b09134dcaa3c99d8161d1deea
> Author: Christophe Leroy 
> Date:   Wed Aug 2 15:51:05 2017 +0200
>
>powerpc/mm: Implement STRICT_KERNEL_RWX on PPC32
>
>This patch implements STRICT_KERNEL_RWX on PPC32.
>
>As for CONFIG_DEBUG_PAGEALLOC, it deactivates BAT and LTLB mappings
>in order to allow page protection setup at the level of each page.
>
>As BAT/LTLB mappings are deactivated, there might be a performance
>impact.
>
>Signed-off-by: Christophe Leroy 
>Signed-off-by: Michael Ellerman 
>
> :04 04 1eac3de57642856e31a914da2e1fe5368095f04b
> ee3634b9ae309852feebc69b8a6bd473944e212c M  arch
>
>
> Config:
>
> #
> # Automatically generated file; DO NOT EDIT.
> # Linux/powerpc 4.13.0-rc2 Kernel Configuration
> #
> # CONFIG_PPC64 is not set
>
> #
> # Processor support
> #
> CONFIG_PPC_BOOK3S_32=y
> # CONFIG_PPC_85xx is not set
> # CONFIG_PPC_8xx is not set
> # CONFIG_40x is not set
> # CONFIG_44x is not set
> # CONFIG_E200 is not set
> CONFIG_PPC_BOOK3S=y
> CONFIG_6xx=y
> CONFIG_PPC_FPU=y
> CONFIG_ALTIVEC=y
> CONFIG_PPC_STD_MMU=y
> CONFIG_PPC_STD_MMU_32=y
> # CONFIG_PPC_MM_SLICES is not set
> CONFIG_PPC_HAVE_PMU_SUPPORT=y
> # CONFIG_FORCE_SMP is not set
> # CONFIG_SMP is not set
> # CONFIG_PPC_DOORBELL is not set
> CONFIG_VDSO32=y
> CONFIG_CPU_BIG_ENDIAN=y
> CONFIG_PPC32=y
> CONFIG_32BIT=y
> # CONFIG_ARCH_PHYS_ADDR_T_64BIT is not set
> # CONFIG_ARCH_DMA_ADDR_T_64BIT is not set
> CONFIG_MMU=y
> CONFIG_ARCH_MMAP_RND_BITS_MAX=17
> CONFIG_ARCH_MMAP_RND_BITS_MIN=11
> CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=17
> CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=11
> # CONFIG_HAVE_SETUP_PER_CPU_AREA is not set
> # CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK is not set
> CONFIG_NR_IRQS=512
> CONFIG_STACKTRACE_SUPPORT=y
> CONFIG_TRACE_IRQFLAGS_SUPPORT=y
> CONFIG_LOCKDEP_SUPPORT=y
> CONFIG_RWSEM_XCHGADD_ALGORITHM=y
> CONFIG_GENERIC_HWEIGHT=y
> CONFIG_ARCH_HAS_DMA_SET_COHERENT_MASK=y
> CONFIG_PPC=y
> # CONFIG_GENERIC_CSUM is not set
> CONFIG_EARLY_PRINTK=y
> CONFIG_PANIC_TIMEOUT=180
> CONFIG_GENERIC_NVRAM=y
> CONFIG_SCHED_OMIT_FRAME_POINTER=y
> CONFIG_ARCH_MAY_HAVE_PC_FDC=y
> # CONFIG_PPC_UDBG_16550 is not set
> # CONFIG_GENERIC_TBSYNC is not set
> CONFIG_AUDIT_ARCH=y
> CONFIG_GENERIC_BUG=y
> # CONFIG_EPAPR_BOOT is not set
> # CONFIG_DEFAULT_UIMAGE is not set
> CONFIG_ARCH_HIBERNATION_POSSIBLE=y
> CONFIG_ARCH_SUSPEND_POSSIBLE=y
> # CONFIG_PPC_DCR_NATIVE is not set
> # CONFIG_PPC_DCR_MMIO is not set
> CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
> CONFIG_ARCH_SUPPORTS_UPROBES=y
> CONFIG_PGTABLE_LEVELS=2
> CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
> CONFIG_IRQ_WORK=y
> CONFIG_BUILDTIME_EXTABLE_SORT=y
>
> #
> # General setup
> #
> CONFIG_BROKEN_ON_SMP=y
> CONFIG_INIT_ENV_ARG_LIMIT=32
> CONFIG_CROSS_COMPILE=""
> # CONFIG_COMPILE_TEST is not set
> CONFIG_LOCALVERSION=""
> CONFIG_LOCALVERSION_AUTO=y
> CONFIG_HAVE_KERNEL_GZIP=y
> CONFIG_KERNEL_GZIP=y
> CONFIG_DEFAULT_HOSTNAME="pohl"
> CONFIG_SWAP=y
> CONFIG_SYSVIPC=y
> CONFIG_SYSVIPC_SYSCTL=y
> CONFIG_POSIX_MQUEUE=y
> CONFIG_POSIX_MQUEUE_SYSCTL=y
> CONFIG_CROSS_MEMORY_ATTACH=y
> CONFIG_FHANDLE=y
> # CONFIG_USELIB is not set
> # CONFIG_AUDIT is not set
> CONFIG_HAVE_ARCH_AUDITSYSCALL=y
>
> #
> # IRQ subsystem
> #
> CONFIG_GENERIC_IRQ_SHOW=y
> CONFIG_GENERIC_IRQ_SHOW_LEVEL=y
> CONFIG_IRQ_DOMAIN=y
> # CONFIG_IRQ_DOMAIN_DEBUG is not set
> CONFIG_IRQ_FORCED_THREADING=y
> CONFIG_SPARSE_IRQ=y
> # CONFIG_GENERIC_IRQ_DEBUGFS is not set
> CONFIG_GENERIC_TIME_VSYSCALL=y
> CONFIG_GENERIC_CLOCKEVENTS=y
> CONFIG_GENERIC_CMOS_UPDATE=y
>
> #
> # Timers subsystem
> #
> CONFIG_TICK_ONESHOT=y
> CONFIG_NO_HZ_COMMON=y
> # CONFIG_HZ_PERIODIC is not set
> CONFIG_NO_HZ_IDLE=y
> CONFIG_NO_HZ=y
> CONFIG_HIGH_RES_TIMERS=y
>
> #
> # CPU/Task time and stats accounting
> #
> CONFIG_TICK_CPU_ACCOUNTING=y
> # CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set
> # CONFIG_IRQ_TIME_ACCOUNTING is not set
> # CONFIG_BSD_PROCESS_ACCT is not set
> # CONFIG_TASKSTATS is not set
>
> #
> # RCU Subsystem
> #
> CONFIG_TINY_RCU=y
> # CONFIG_RCU_EXPERT is not set
> CONFIG_SRCU=y
> CONFIG_TINY_SRCU=y
> # CONFIG_TASKS_RCU is not set
> # CONFIG_RCU_STALL_COMMON is not set
> # CONFIG_RCU_NEED_SEGCBLIST is not set
> # CONFIG_BUILD_BIN2C is not set
> # CONFIG_IKCONFIG is not set
> CONFIG_LOG_BUF_SHIFT=17
> CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
> CONFIG_CGROUPS=y
> # CONFIG_MEMCG is not set
> # 

Re: STRICT_KERNEL_RWX on PPC32 is broken on PowerMac G4

2017-11-16 Thread LEROY Christophe

Meelis Roos wrote:


For me, 4.13 worked and 4.14 hangs early during boot. Bisecting led to
the following commit. I had STRICT_KERNEL_RWX enabled when I met the
option. When I disabled STRICT_KERNEL_RWX, the same kernel booted fine.


Can you please check that 4.13 boots properly with the 'nobats' kernel parameter?

Christophe




95902e6c8864d39b09134dcaa3c99d8161d1deea is the first bad commit
commit 95902e6c8864d39b09134dcaa3c99d8161d1deea
Author: Christophe Leroy 
Date:   Wed Aug 2 15:51:05 2017 +0200

powerpc/mm: Implement STRICT_KERNEL_RWX on PPC32

This patch implements STRICT_KERNEL_RWX on PPC32.

As for CONFIG_DEBUG_PAGEALLOC, it deactivates BAT and LTLB mappings
in order to allow page protection setup at the level of each page.

As BAT/LTLB mappings are deactivated, there might be a performance
impact.

Signed-off-by: Christophe Leroy 
Signed-off-by: Michael Ellerman 

:04 04 1eac3de57642856e31a914da2e1fe5368095f04b  
ee3634b9ae309852feebc69b8a6bd473944e212c M  arch



Config:

#
# Automatically generated file; DO NOT EDIT.
# Linux/powerpc 4.13.0-rc2 Kernel Configuration
#
# CONFIG_PPC64 is not set

#
# Processor support
#
CONFIG_PPC_BOOK3S_32=y
# CONFIG_PPC_85xx is not set
# CONFIG_PPC_8xx is not set
# CONFIG_40x is not set
# CONFIG_44x is not set
# CONFIG_E200 is not set
CONFIG_PPC_BOOK3S=y
CONFIG_6xx=y
CONFIG_PPC_FPU=y
CONFIG_ALTIVEC=y
CONFIG_PPC_STD_MMU=y
CONFIG_PPC_STD_MMU_32=y
# CONFIG_PPC_MM_SLICES is not set
CONFIG_PPC_HAVE_PMU_SUPPORT=y
# CONFIG_FORCE_SMP is not set
# CONFIG_SMP is not set
# CONFIG_PPC_DOORBELL is not set
CONFIG_VDSO32=y
CONFIG_CPU_BIG_ENDIAN=y
CONFIG_PPC32=y
CONFIG_32BIT=y
# CONFIG_ARCH_PHYS_ADDR_T_64BIT is not set
# CONFIG_ARCH_DMA_ADDR_T_64BIT is not set
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MAX=17
CONFIG_ARCH_MMAP_RND_BITS_MIN=11
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=17
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=11
# CONFIG_HAVE_SETUP_PER_CPU_AREA is not set
# CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK is not set
CONFIG_NR_IRQS=512
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_HAS_DMA_SET_COHERENT_MASK=y
CONFIG_PPC=y
# CONFIG_GENERIC_CSUM is not set
CONFIG_EARLY_PRINTK=y
CONFIG_PANIC_TIMEOUT=180
CONFIG_GENERIC_NVRAM=y
CONFIG_SCHED_OMIT_FRAME_POINTER=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
# CONFIG_PPC_UDBG_16550 is not set
# CONFIG_GENERIC_TBSYNC is not set
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_BUG=y
# CONFIG_EPAPR_BOOT is not set
# CONFIG_DEFAULT_UIMAGE is not set
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
# CONFIG_PPC_DCR_NATIVE is not set
# CONFIG_PPC_DCR_MMIO is not set
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_PGTABLE_LEVELS=2
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_KERNEL_GZIP=y
CONFIG_DEFAULT_HOSTNAME="pohl"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_FHANDLE=y
# CONFIG_USELIB is not set
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_SHOW_LEVEL=y
CONFIG_IRQ_DOMAIN=y
# CONFIG_IRQ_DOMAIN_DEBUG is not set
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set
# CONFIG_IRQ_TIME_ACCOUNTING is not set
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set

#
# RCU Subsystem
#
CONFIG_TINY_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TINY_SRCU=y
# CONFIG_TASKS_RCU is not set
# CONFIG_RCU_STALL_COMMON is not set
# CONFIG_RCU_NEED_SEGCBLIST is not set
# CONFIG_BUILD_BIN2C is not set
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=17
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
CONFIG_CGROUPS=y
# CONFIG_MEMCG is not set
# CONFIG_BLK_CGROUP is not set
# CONFIG_CGROUP_SCHED is not set
# CONFIG_CGROUP_PIDS is not set
# CONFIG_CGROUP_RDMA is not set
# CONFIG_CGROUP_FREEZER is not set
# CONFIG_CGROUP_DEVICE is not set
# CONFIG_CGROUP_CPUACCT is not set
# CONFIG_CGROUP_BPF is not set
# CONFIG_CGROUP_DEBUG is not set
# CONFIG_SOCK_CGROUP_DATA is not set
# CONFIG_CHECKPOINT_RESTORE is not set
CONFIG_NAMESPACES=y
# CONFIG_UTS_NS is not set
# 

Re: [PATCH 23/24] powerpc/mm: Cleanup check for stack expansion

2017-07-24 Thread LEROY Christophe

Michael Ellerman <m...@ellerman.id.au> wrote:


LEROY Christophe <christophe.le...@c-s.fr> writes:


Benjamin Herrenschmidt <b...@kernel.crashing.org> wrote:


When hitting below a VM_GROWSDOWN vma (typically growing the stack),
we check whether it's a valid stack-growing instruction and we
check the distance to GPR1. This is largely open coded with lots
of comments, so move it out to a helper.


Did you have a look at the following patch? It's been waiting for
application for some weeks now.
https://patchwork.ozlabs.org/patch/771869


I actually merged it last merge window, but found I had no good way to
test it, so I took it out again until I can write a test case for it.

The way I realised it wasn't being tested was by removing all the
store_updates_sp logic entirely and having my system run happily for
several days :}


Which demonstrates how unlikely this is, hence doing that get_user()
at every fault is a waste of time.


How do you plan to handle that in parallel with Ben's series?

I'll be back from vacation next week and may help find a way to
test that. (A test program using alloca()?)


Christophe



cheers





Re: [PATCH 02/24] powerpc/mm: Pre-filter SRR1 bits before do_page_fault()

2017-07-22 Thread LEROY Christophe

Benjamin Herrenschmidt wrote:


By filtering the relevant SRR1 bits in the assembly rather than
in do_page_fault() itself, we avoid a conditional branch (since we
already come from different path for data and instruction faults).

This will allow more simplifications later

Signed-off-by: Benjamin Herrenschmidt 
---
 arch/powerpc/kernel/head_32.S  |  2 +-
 arch/powerpc/kernel/head_8xx.S |  4 ++--
 arch/powerpc/mm/fault.c| 14 ++
 3 files changed, 5 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index a5301248b098..f11d1cd6e314 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -409,7 +409,7 @@ InstructionAccess:
mr  r4,r12  /* SRR0 is fault address */
bl  hash_page
 1: mr  r4,r12
-   mr  r5,r9
+   andis.  r5,r9,0x4820/* Filter relevant SRR1 bits */
EXC_XFER_LITE(0x400, handle_page_fault)

 /* External interrupt */
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index c032fe8c2d26..da3afa2c1658 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -569,8 +569,8 @@ _ENTRY(DTLBMiss_jmp)
 InstructionTLBError:
EXCEPTION_PROLOG
mr  r4,r12
-   mr  r5,r9
-   andis.  r10,r5,0x4000
+   andis.  r5,r9,0x4820/* Filter relevant SRR1 bits */
+   andis.  r10,r9,0x4000
beq+1f
tlbie   r4
 itlbie:
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index faddc87d0205..f04bc9f6b134 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -203,23 +203,13 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,

unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
int code = SEGV_MAPERR;
int is_write = 0;
-   int trap = TRAP(regs);
-   int is_exec = trap == 0x400;
+   int is_exec = TRAP(regs) == 0x400;


Don't we have a tab/space issue here?


int is_user = user_mode(regs);
int fault;
int rc = 0, store_update_sp = 0;

 #if !(defined(CONFIG_4xx) || defined(CONFIG_BOOKE))
-   /*
-* Fortunately the bit assignments in SRR1 for an instruction
-* fault and DSISR for a data fault are mostly the same for the
-* bits we are interested in.  But there are some bits which
-* indicate errors in DSISR but can validly be set in SRR1.
-*/
-   if (is_exec)
-   error_code &= 0x4820;
-   else
-   is_write = error_code & DSISR_ISSTORE;
+   is_write = error_code & DSISR_ISSTORE;
 #else
is_write = error_code & ESR_DST;
 #endif /* CONFIG_4xx || CONFIG_BOOKE */
--
2.13.3





Re: [PATCH 07/24] powerpc/mm: Move out definition of CPU specific is_write bits

2017-07-22 Thread LEROY Christophe

Benjamin Herrenschmidt wrote:


Define a common page_fault_is_write() helper and use it

Signed-off-by: Benjamin Herrenschmidt 
---
 arch/powerpc/mm/fault.c | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index f257965b54b5..26ec0dd4f419 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -183,6 +183,16 @@ static int mm_fault_error(struct pt_regs *regs, unsigned long addr, int fault)

 }

 /*
+ * Define the correct "is_write" bit in error_code based
+ * on the processor family
+ */
+#if (defined(CONFIG_4xx) || defined(CONFIG_BOOKE))
+#define page_fault_is_write(__err) ((__err) & ESR_DST)
+#else
+#define page_fault_is_write(__err) ((__err) & DSISR_ISSTORE)
+#endif


Doesn't the Linux kernel coding style prefer static inline
functions to macros?



+
+/*
  * For 600- and 800-family processors, the error_code parameter is DSISR
  * for a data fault, SRR1 for an instruction fault. For 400-family  
processors

  * the error_code parameter is ESR for a data fault, 0 for an instruction
@@ -202,18 +212,12 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,

struct mm_struct *mm = current->mm;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
int code = SEGV_MAPERR;
-   int is_write = 0;
int is_exec = TRAP(regs) == 0x400;
int is_user = user_mode(regs);
+   int is_write = page_fault_is_write(error_code);
int fault;
int rc = 0, store_update_sp = 0;

-#if !(defined(CONFIG_4xx) || defined(CONFIG_BOOKE))
-   is_write = error_code & DSISR_ISSTORE;
-#else
-   is_write = error_code & ESR_DST;
-#endif /* CONFIG_4xx || CONFIG_BOOKE */
-
 #ifdef CONFIG_PPC_ICSWX
/*
 * we need to do this early because this "data storage
--
2.13.3





Re: [PATCH 23/24] powerpc/mm: Cleanup check for stack expansion

2017-07-21 Thread LEROY Christophe

Benjamin Herrenschmidt wrote:


When hitting below a VM_GROWSDOWN vma (typically growing the stack),
we check whether it's a valid stack-growing instruction and we
check the distance to GPR1. This is largely open coded with lots
of comments, so move it out to a helper.


Did you have a look at the following patch? It's been waiting for
application for some weeks now:
https://patchwork.ozlabs.org/patch/771869

It limits the number of calls to get_user().
Can you have a look and merge it with your series?



While at it, make store_update_sp a boolean.


My patch requires a tristate here

Christophe



Signed-off-by: Benjamin Herrenschmidt 
---
 arch/powerpc/mm/fault.c | 84 -

 1 file changed, 48 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index a229fd2d82d6..c2720ebb6a62 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -71,15 +71,15 @@ static inline bool notify_page_fault(struct pt_regs *regs)

  * Check whether the instruction at regs->nip is a store using
  * an update addressing form which will update r1.
  */
-static int store_updates_sp(struct pt_regs *regs)
+static bool store_updates_sp(struct pt_regs *regs)
 {
unsigned int inst;

if (get_user(inst, (unsigned int __user *)regs->nip))
-   return 0;
+   return false;
/* check for 1 in the rA field */
if (((inst >> 16) & 0x1f) != 1)
-   return 0;
+   return false;
/* check major opcode */
switch (inst >> 26) {
case 37:/* stwu */
@@ -87,7 +87,7 @@ static int store_updates_sp(struct pt_regs *regs)
case 45:/* sthu */
case 53:/* stfsu */
case 55:/* stfdu */
-   return 1;
+   return true;
case 62:/* std or stdu */
return (inst & 3) == 1;
case 31:
@@ -99,10 +99,10 @@ static int store_updates_sp(struct pt_regs *regs)
case 439:   /* sthux */
case 695:   /* stfsux */
case 759:   /* stfdux */
-   return 1;
+   return true;
}
}
-   return 0;
+   return false;
 }
 /*
  * do_page_fault error handling helpers
@@ -222,6 +222,43 @@ static bool bad_kernel_fault(bool is_exec, unsigned long error_code,

return is_exec || (address >= TASK_SIZE);
 }

+static bool bad_stack_expansion(struct pt_regs *regs, unsigned long address,
+   struct vm_area_struct *vma,
+   bool store_update_sp)
+{
+   /*
+* N.B. The POWER/Open ABI allows programs to access up to
+* 288 bytes below the stack pointer.
+* The kernel signal delivery code writes up to about 1.5kB
+* below the stack pointer (r1) before decrementing it.
+* The exec code can write slightly over 640kB to the stack
+* before setting the user r1.  Thus we allow the stack to
+* expand to 1MB without further checks.
+*/
+   if (address + 0x10 < vma->vm_end) {
+   /* get user regs even if this fault is in kernel mode */
+   struct pt_regs *uregs = current->thread.regs;
+   if (uregs == NULL)
+   return true;
+
+   /*
+* A user-mode access to an address a long way below
+* the stack pointer is only valid if the instruction
+* is one which would update the stack pointer to the
+* address accessed if the instruction completed,
+* i.e. either stwu rs,n(r1) or stwux rs,r1,rb
+* (or the byte, halfword, float or double forms).
+*
+* If we don't check this then any write to the area
+* between the last mapped region and the stack will
+* expand the stack rather than segfaulting.
+*/
+   if (address + 2048 < uregs->gpr[1] && !store_update_sp)
+   return true;
+   }
+   return false;
+}
+
 static bool access_error(bool is_write, bool is_exec,
 struct vm_area_struct *vma)
 {
@@ -350,7 +387,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,

int is_user = user_mode(regs);
int is_write = page_fault_is_write(error_code);
int fault, major = 0;
-   int store_update_sp = 0;
+   bool store_update_sp = false;

 #ifdef CONFIG_PPC_ICSWX
/*
@@ -458,36 +495,11 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,

if (unlikely(!(vma->vm_flags & VM_GROWSDOWN)))
return bad_area(regs, address);

-   /*
-* N.B. The POWER/Open ABI allows programs to access up to
-

Re: [RFC 0/2] Consolidate patch_instruction

2017-05-17 Thread LEROY Christophe

Balbir Singh <bsinghar...@gmail.com> wrote:


On Tue, 2017-05-16 at 22:20 +0200, LEROY Christophe wrote:

Balbir Singh <bsinghar...@gmail.com> wrote:

> patch_instruction is enhanced in this RFC to support
> patching via a different virtual address (text_poke_area).
> The mapping of text_poke_area->addr is RW and not RWX.
> This way the mapping allows write for patching and then we tear
> down the mapping. The downside is that we introduce a spinlock
> which serializes our patching to one patch at a time.

Very nice patch, would fit great with my patch for implementing
CONFIG_DEBUG_RODATA (https://patchwork.ozlabs.org/patch/754289).
Would avoid having to set the text area back to RW for patching



Awesome! It seems like you have some of the work for CONFIG_STRICT_KERNEL_RWX.
Any reason why this is under CONFIG_DEBUG_RODATA? But I think there is
reuse capability across the future patches and the current set.



Indeed it looks the same, see https://patchwork.kernel.org/patch/9554881

Christophe



Cheers,
Balbir  Singh.





Re: [RFC 0/2] Consolidate patch_instruction

2017-05-16 Thread LEROY Christophe

Balbir Singh wrote:


patch_instruction is enhanced in this RFC to support
patching via a different virtual address (text_poke_area).
The mapping of text_poke_area->addr is RW and not RWX.
This way the mapping allows write for patching and then we tear
down the mapping. The downside is that we introduce a spinlock
which serializes our patching to one patch at a time.


Very nice patch, would fit great with my patch for implementing
CONFIG_DEBUG_RODATA (https://patchwork.ozlabs.org/patch/754289).

Would avoid having to set the text area back to RW for patching

Christophe



In this patchset we also consolidate instruction changes
in kprobes to use patch_instruction().

Balbir Singh (2):
  powerpc/lib/code-patching: Enhance code patching
  powerpc/kprobes: Move kprobes over to patch_instruction

 arch/powerpc/kernel/kprobes.c|  4 +-
 arch/powerpc/lib/code-patching.c | 88 ++--

 2 files changed, 86 insertions(+), 6 deletions(-)

--
2.9.3





Re: [PATCH V3] powerpc/hugetlb: Add ABI defines for supported HugeTLB page sizes

2017-04-06 Thread LEROY Christophe

Hi

Anshuman Khandual wrote:


This just adds user space exported ABI definitions for 2MB, 16MB, 1GB,
16GB non default huge page sizes to be used with mmap() system call.


Why not add all possible huge page sizes?
For instance the 8xx (only) supports 512k and 8M huge pages.

Christophe



Signed-off-by: Anshuman Khandual 
---
These defined values will be used along with MAP_HUGETLB while calling
mmap() system call if the desired HugeTLB page size is not the default
one. Follows similar definitions present in x86.

arch/x86/include/uapi/asm/mman.h:#define MAP_HUGE_2MB (21 << MAP_HUGE_SHIFT)
arch/x86/include/uapi/asm/mman.h:#define MAP_HUGE_1GB (30 << MAP_HUGE_SHIFT)


Changes in V3:
- Added comment about how these defines will help create 'flags'
  argument for mmap() system call per Balbir

Changes in V2:
- Added definitions for 2MB and 1GB HugeTLB pages per Aneesh

 arch/powerpc/include/uapi/asm/mman.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/powerpc/include/uapi/asm/mman.h b/arch/powerpc/include/uapi/asm/mman.h
index 03c06ba..0c84e14 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -29,4 +29,14 @@
 #define MAP_STACK	0x2		/* give out an address that is best suited for process/thread stacks */

 #define MAP_HUGETLB0x4 /* create a huge page mapping */

+/*
+ * These defines should be used for creating 'flags' argument
+ * (26:31 bits) for the mmap() system call should the caller
+ * decide to use non default HugeTLB page size.
+ */
+#define MAP_HUGE_2MB   (21 << MAP_HUGE_SHIFT)/* 2MB HugeTLB Page */
+#define MAP_HUGE_16MB  (24 << MAP_HUGE_SHIFT)/* 16MB HugeTLB Page */
+#define MAP_HUGE_1GB   (30 << MAP_HUGE_SHIFT)/* 1GB HugeTLB Page */
+#define MAP_HUGE_16GB  (34 << MAP_HUGE_SHIFT)/* 16GB HugeTLB Page */
+
 #endif /* _UAPI_ASM_POWERPC_MMAN_H */
--
1.8.5.2





Re: [PATCH] powerpc: Avoid taking a data miss on every userspace instruction miss

2017-04-03 Thread LEROY Christophe

Anton Blanchard wrote:


From: Anton Blanchard 

Early on in do_page_fault() we call store_updates_sp(), regardless of
the type of exception. For an instruction miss this doesn't make
sense, because we only use this information to detect if a data miss
is the result of a stack expansion instruction or not.

Worse still, it results in a data miss within every userspace
instruction miss handler, because we try and load the very instruction
we are about to install a pte for!

A simple exec microbenchmark runs 6% faster on POWER8 with this fix:

 #include 
 #include 
 #include 

int main(int argc, char *argv[])
{
unsigned long left = atol(argv[1]);
char leftstr[16];

if (left-- == 0)
return 0;

sprintf(leftstr, "%ld", left);
execlp(argv[0], argv[0], leftstr, NULL);
perror("exec failed\n");

return 0;
}

Pass the number of iterations on the command line (eg 1) and time
how long it takes to execute.

Signed-off-by: Anton Blanchard 
---
 arch/powerpc/mm/fault.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index fd6484fc2fa9..3a7d580fdc59 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -287,7 +287,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,

 * can result in fault, which will cause a deadlock when called with
 * mmap_sem held
 */
-   if (user_mode(regs))
+   if (!is_exec && user_mode(regs))


Shouldn't it also check 'is_write'?
If it is a store, is_write should be set, shouldn't it?

Christophe


store_update_sp = store_updates_sp(regs);

if (user_mode(regs))
--
2.11.0





_PAGE_PRESENT and _PAGE_ACCESSED

2016-07-26 Thread LEROY Christophe
In the ppc8xx TLB miss handler, we consider a page valid if both
_PAGE_PRESENT and _PAGE_ACCESSED are set.

Is there any chance to have _PAGE_ACCESSED set and not _PAGE_PRESENT?
Otherwise we could simplify the handler by considering the page valid
only when _PAGE_ACCESSED is set.


Christophe
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Strange reports of perf events on powerpc 83xx

2015-08-27 Thread leroy christophe

Hi,

Has anybody already used the 'perf' tool on a powerpc MPC83xx?

I have been successfully using perf on MPC8xx, but on MPC83xx I get
something strange.


perf record/report reports addresses on user stack, as if it was mixing 
up D accesses and I accesses.


Any idea of what the problem can be ?

# Samples: 8K of event 'cpu-clock'
# Event count (approx.): 219600
#
# Overhead  Command   Shared Object   Symbol
# ........  ............  ..................  ..........................
#
 2.62%  perf_reseau4  libpthread-2.18.so  [.] __libc_send
 2.56%  perf_reseau4  [kernel.kallsyms]   [k] __ip_make_skb
 1.62%  perf_reseau4  [kernel.kallsyms]   [k] __ip_append_data.isra.39
 1.55%  perf_reseau4  [kernel.kallsyms]   [k] ip_finish_output
 1.33%  perf_reseau4  [unknown]   [k] 0x7d94
 1.33%  perf_reseau4  [unknown]   [k] 0x7d95
 1.28%  perf_reseau4  [unknown]   [k] 0x7d97
 1.26%  perf_reseau4  [unknown]   [k] 0x7da3
 1.24%  perf_reseau4  [unknown]   [k] 0x7d98
 1.22%  perf_reseau4  [unknown]   [k] 0x7d92
 1.22%  perf_reseau4  [unknown]   [k] 0x7d9b
 1.22%  perf_reseau4  [unknown]   [k] 0x7daa
 1.21%  perf_reseau4  [unknown]   [k] 0x7d96
 1.18%  perf_reseau4  [unknown]   [k] 0x7da7
 1.17%  perf_reseau4  [unknown]   [k] 0x7d8d
 1.17%  perf_reseau4  [unknown]   [k] 0x7d99
 1.13%  perf_reseau4  [unknown]   [k] 0x7d90
 1.13%  perf_reseau4  [unknown]   [k] 0x7da2
 1.12%  perf_reseau4  [kernel.kallsyms]   [k] __local_bh_enable_ip
 1.12%  perf_reseau4  [unknown]   [k] 0x7d9c
 1.12%  perf_reseau4  [unknown]   [k] 0x7d9e
 1.10%  perf_reseau4  [unknown]   [k] 0x7da0
 1.08%  perf_reseau4  [unknown]   [k] 0x7d9f
 1.08%  perf_reseau4  [unknown]   [k] 0x7da6
 1.05%  perf_reseau4  [unknown]   [k] 0x7da8
 1.02%  perf_reseau4  [unknown]   [k] 0x7d9a
 1.01%  perf_reseau4  [unknown]   [k] 0x7db0
 1.00%  perf_reseau4  [unknown]   [k] 0x7d89
 1.00%  perf_reseau4  [unknown]   [k] 0x7d8b
 1.00%  perf_reseau4  [unknown]   [k] 0x7dac


Christophe

Re: [PATCH v2 2/2] powerpc32: optimise csum_partial() loop

2015-08-17 Thread leroy christophe



On 17/08/2015 12:56, leroy christophe wrote:



On 07/08/2015 01:25, Segher Boessenkool wrote:

On Thu, Aug 06, 2015 at 05:45:45PM -0500, Scott Wood wrote:
If this makes performance non-negligibly worse on other 32-bit 
chips, and is
an important improvement on 8xx, then we can use an ifdef since 8xx 
already
requires its own kernel build.  I'd prefer to see a benchmark 
showing that it

actually does make things worse on those chips, though.
And I'd like to see a benchmark that shows it *does not* hurt 
performance

on most chips, and does improve things on 8xx, and by how much. But it
isn't *me* who has to show that, it is not my patch.
Ok, following this discussion I made some additional measurement and 
it looks like:

* There is almost no change on the 885
* There is a non-negligible degradation on the 8323 (19.5 tb ticks
instead of 15.3)


Thanks for pointing this out, I think my patch is therefore not good.

Oops, I was talking about my other patch, the one that was to optimise
ip_fast_csum().

I still have to measure csum_partial

Christophe

Re: [PATCH v2 2/2] powerpc32: optimise csum_partial() loop

2015-08-17 Thread leroy christophe



On 07/08/2015 01:25, Segher Boessenkool wrote:

On Thu, Aug 06, 2015 at 05:45:45PM -0500, Scott Wood wrote:

If this makes performance non-negligibly worse on other 32-bit chips, and is
an important improvement on 8xx, then we can use an ifdef since 8xx already
requires its own kernel build.  I'd prefer to see a benchmark showing that it
actually does make things worse on those chips, though.

And I'd like to see a benchmark that shows it *does not* hurt performance
on most chips, and does improve things on 8xx, and by how much.  But it
isn't *me* who has to show that, it is not my patch.
Ok, following this discussion I made some additional measurement and it 
looks like:

* There is almost no change on the 885
* There is a non-negligible degradation on the 8323 (19.5 tb ticks
instead of 15.3)


Thanks for pointing this out, I think my patch is therefore not good.

Christophe


Re: [PATCH v2 2/2] powerpc32: optimise csum_partial() loop

2015-08-17 Thread leroy christophe



On 17/08/2015 13:00, leroy christophe wrote:



On 17/08/2015 12:56, leroy christophe wrote:



On 07/08/2015 01:25, Segher Boessenkool wrote:

On Thu, Aug 06, 2015 at 05:45:45PM -0500, Scott Wood wrote:
If this makes performance non-negligibly worse on other 32-bit 
chips, and is
an important improvement on 8xx, then we can use an ifdef since 8xx 
already
requires its own kernel build.  I'd prefer to see a benchmark 
showing that it

actually does make things worse on those chips, though.
And I'd like to see a benchmark that shows it *does not* hurt 
performance

on most chips, and does improve things on 8xx, and by how much. But it
isn't *me* who has to show that, it is not my patch.
Ok, following this discussion I made some additional measurement and 
it looks like:

* There is almost no change on the 885
* There is a non-negligible degradation on the 8323 (19.5 tb ticks
instead of 15.3)


Thanks for pointing this out, I think my patch is therefore not good.

Oops, I was talking about my other patch, the one that was to optimise
ip_fast_csum().

I still have to measure csum_partial

Now, I have the results for csum_partial(). The measurement is done with
mftbl() before and after calling the function, with IRQ off to get a
stable measure. Measurement is done with a transfer of the vmlinux file done
3 times via scp toward the target. We get approximately 5 calls to
csum_partial()


On MPC885:
1/ Without the patchset, mean time spent in csum_partial() is 167 tb ticks.
2/ With the patchset, mean time is 150 tb ticks

On MPC8323:
1/ Without the patchset, mean time is 287 tb ticks
2/ With the patchset, mean time is 256 tb ticks

The improvement is approximately 10% in both cases

So, unlike my patch on ip_fast_csum(), this one is worth it.

Christophe
